Recent advances in understanding the role of phasic dopamine activity

The latest animal neurophysiology has revealed that the dopamine reward prediction error signal drives neuronal learning in addition to behavioral learning and reflects subjective reward representations beyond explicit contingency. The signal complies with formal economic concepts and functions in real-world consumer choice and social interaction. An early response component is influenced by physical impact, reward environment, and novelty but does not fully code prediction error. Some dopamine neurons are activated by aversive stimuli, which may reflect physical stimulus impact or true aversiveness, but they do not seem to code general negative value or aversive prediction error. The reward prediction error signal is complemented by distinct, heterogeneous, smaller and slower changes reflecting sensory and motor contributors to behavioral activation, such as substantial movement (as opposed to precise motor control), reward expectation, spatial choice, vigor, and motivation. The different dopamine signals seem to defy a simple unifying concept and should be distinguished to better understand phasic dopamine functions.


Introduction
The question "What is dopamine doing?" keeps stubbornly popping up after the discovery of the brain's dopamine system and its relationships to Parkinson's disease, psychosis, and drug addiction. Although the efficacy of dopamine receptor-stimulating drugs in alleviating Parkinsonian movement disorders pointed initially to a mere tonic, modulatory role, it became increasingly clear that dopamine is a neurotransmitter not unlike other transmitters and has its own synapses and phasic activity related to stimuli and actions. The ensuing research efforts revealed an amazing array of heterogeneous functions at various time courses and levels of specificity that range from general behavioral activation to precise reward signaling for biological learning, machine learning, and economic choice 1 . The complexity defies the notion of "one neuronal system equals one function" but likely reflects the workings of an evolutionarily ancient system that governs the individual's requirements for survival.
This overview describes further conceptual, biological, and economic characterizations of the dopamine reward signal in animals from the past few years, its involvement in social processes, and its distinction from aversive, novelty, sensory, and motor processing. I will follow the notion that the function of an information-processing system can be defined by the relationship of its internal signals to behavior. This knowledge would provide a firm basis for investigating molecular, cellular, and circuit mechanisms. However, detailed descriptions of the recently elucidated fine network properties of dopamine neurons would exceed the topic and limits of this brief review, nor will I be able to discuss molecular signaling, human brain signals, and effects of lesions and systemic dopaminergic drugs that indicate tonic permissive rather than phasic driving influences.

Further characterization of the reward prediction error signal
Rather than coding rewards and reward-predicting stimuli as they appear in the environment, phasic, sub-second responses in the majority of midbrain dopamine neurons code a reward prediction error. Their activity is increased for one hundred or two hundred milliseconds when a reward or reward-predicting stimulus is better than predicted, their activity is unchanged when these events have the same reward value as their prediction, and their activity is briefly depressed when these events have lower reward value than predicted 1 .

Rewarding effect of dopamine neuron stimulation
Electrical or optical stimulation of dopamine neurons serves as a teaching signal for lever pressing, nose poking, place preference, unblocking, and prevention of extinction 2-6 ; conversely, optogenetic dopamine inhibition induces place avoidance and behavioral inhibition 7-9 . These behavioral effects likely reflect the elicitation of positive and negative reward prediction error signals, respectively. Recent research shows that these behavioral learning functions extend to neuronal learning: monkey dopamine neurons acquire stronger responses to an intrinsically neutral visual stimulus that is followed by optogenetic dopamine stimulation added to juice reward, as compared with a stimulus associated with only that reward (Figure 1A, B) 10 . Concomitantly, the animal develops choice preference over 20 to 25 repetitions for the stimulation-associated fractal over an alternative, non-stimulated fractal, even without natural reward. In rats, optogenetic dopamine excitation at the time of reward induces dopamine responses to the stimulus along with driving approach and locomotion (Figure 1C, D) 11 . In a further step, dopamine stimulation serves as reward for operantly controlling cortical firing patterns 12 . These effects together support the hypothesis that bidirectional dopamine reward prediction error responses influence neuronal and behavioral learning.
Figure 1 (partial caption). (B) Behavioral learning: gradual increase of choice probability between the two stimuli. Ticks indicate choices in channelrhodopsin-injected animals (blue) and non-injected controls (red). Adapted from Stauffer et al. 10 Figure 6.B, CC BY 4.0. (C) Graded neuronal learning in rats induced by dopamine excitation at reward time. P = probability of excitation per stimulus appearance 11 . (D) Behavioral learning: acquisition of locomotion following the stimulus associated with optogenetic excitation 11 .
Dopamine neurons access reward predictions without explicit association
Standard reward learning paradigms rely on the contingent association with a stimulus, whereas higher learning theories postulate a role for representations beyond explicit reward contingency. Dopamine neurons follow this latter notion 13 : during sensory preconditioning, two stimuli (A and B) are first presented sequentially. Then reward occurs only with the later stimulus presented alone (B). Then the earlier stimulus (A) is tested for reward prediction. Indeed, dopamine neurons are activated by the test stimulus (A) although it had never been explicitly paired with the reward. Thus, the neurons access a reward representation via the test stimulus (A) that had earlier been associated with the then-unrewarded stimulus (B), defying the simple requirement for direct stimulus-reward contingency.
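The logic of sensory preconditioning can be illustrated with a toy learner. The sketch below is only an illustration under stated assumptions, not the analysis used in the cited study: the delta-rule updates, the learning rate `alpha`, the trial count, and the idea of multiplying a learned A-to-B link by the value of B are all hypothetical choices made for clarity. It contrasts a value for A learned from direct stimulus-reward contingency (which stays at zero) with a value inferred through the earlier A-B association.

```python
def sensory_preconditioning(alpha=0.2, n_trials=50):
    """Toy model: return (direct value of A, inferred value of A)."""
    link_a_to_b = 0.0                 # learned association A -> B (hypothetical)
    value = {"A": 0.0, "B": 0.0}      # values learned from direct reward pairing

    # Phase 1: A is followed by B, no reward. Only the A -> B link is learned;
    # direct values stay at zero because no reward is ever delivered.
    for _ in range(n_trials):
        link_a_to_b += alpha * (1.0 - link_a_to_b)

    # Phase 2: B is presented alone and rewarded (reward = 1.0).
    for _ in range(n_trials):
        value["B"] += alpha * (1.0 - value["B"])

    # Test: A was never paired with reward, so its direct value is zero,
    # but a reward prediction can be inferred via the A -> B link.
    inferred_value_a = link_a_to_b * value["B"]
    return value["A"], inferred_value_a


direct_a, inferred_a = sensory_preconditioning()
print(f"direct value of A: {direct_a:.2f}, inferred value of A: {inferred_a:.2f}")
```

In this sketch, a response to A at test can arise only from the inferred route, which is the sense in which the neurons are said to go beyond explicit contingency.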

Prediction error responses reveal what's on dopamine's mind
The reward prediction error response depends on both the reward and the prediction: reward received minus reward predicted. If we know the reward and measure the dopamine response, we can infer the prediction the neuron is accessing.
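As a minimal sketch of this arithmetic, the subtraction and its rearrangement can be written out directly. The function names and the numbers (a reward of 1.0 unit delivered with probability 0.75) are hypothetical and serve only as an illustration; real dopamine responses are not expected to be a noise-free linear readout of the error.

```python
def prediction_error(reward, prediction):
    """Reward received minus reward predicted."""
    return reward - prediction


def infer_prediction(reward, error):
    """Rearranged form: the prediction implied by a known reward and a measured error."""
    return reward - error


# Hypothetical example: a reward of 1.0 unit delivered with probability 0.75,
# so the predicted (expected) value is 0.75.
predicted = 0.75
error_delivered = prediction_error(1.0, predicted)   # reward occurs: positive error
error_omitted = prediction_error(0.0, predicted)     # reward omitted: negative error

# Knowing the reward and the error, the accessed prediction can be recovered.
recovered = infer_prediction(1.0, error_delivered)
print(error_delivered, error_omitted, recovered)
```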
The idea started with a stimulus sequence that always ends with a reward after a short but random number of steps. A monkey registering only repeated reward omissions would expect progressively less reward, but with experience it would know the reward would come more likely the longer the wait is (increasing hazard rate). Thus, with longer waiting, reward prediction increases and the error when the reward occurs decreases. Indeed, the dopamine response to the reward decreased during waiting, indicating that the neurons accessed the temporally increasing reward prediction derived from the overall task experience (rather than a decreasing prediction derived from the repeating reward omissions) 14 . A recent experiment confirmed this result in mice but also tested slightly uncertain rewards (probability of P = 0.9). Here, the animal never knew for sure whether the reward would ultimately come and might increasingly expect none as time advances (like humans giving up waiting for an unreliable bus). But when the reward does occur, the prediction error and the dopamine response are higher the longer the wait was 15 . Thus, the dopamine response reflects access to reward predictions that are inferred from the temporal structure of reward probabilities rather than deriving entirely from the occurrence or omission of last rewards. Interestingly, reward-predicting responses in amygdala also reflect temporal reward probability 16 , indicating that reward neurons in general may access more sophisticated reward representations than hitherto assumed.
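The two intuitions in this paragraph can be made concrete with a toy calculation. This is a sketch under simplifying assumptions that are not taken from the cited experiments: reward time is assumed to be uniformly distributed over a small number of discrete steps, and the function names are hypothetical. The first function gives the hazard rate when reward is certain; the second gives the probability that a reward is still coming at all when only a fraction of trials is rewarded.

```python
def hazard_rates(n_steps):
    """P(reward at step t | no reward before t), reward certain, timing uniform."""
    rates = []
    for t in range(1, n_steps + 1):
        p_now = 1.0 / n_steps                   # unconditional chance of reward at step t
        p_not_yet = 1.0 - (t - 1) / n_steps     # chance that no reward came before step t
        rates.append(p_now / p_not_yet)
    return rates


def belief_reward_still_coming(n_steps, p_reward):
    """P(trial is rewarded | no reward during the first t steps), for t = 0..n_steps-1."""
    beliefs = []
    for t in range(n_steps):
        rewarded_and_waiting = p_reward * (n_steps - t) / n_steps
        unrewarded = 1.0 - p_reward
        beliefs.append(rewarded_and_waiting / (rewarded_and_waiting + unrewarded))
    return beliefs


certain_hazard = hazard_rates(5)
belief_certain = belief_reward_still_coming(5, 1.0)
belief_uncertain = belief_reward_still_coming(5, 0.9)
print([round(h, 2) for h in certain_hazard])
print([round(b, 2) for b in belief_uncertain])
```

With certain reward, the hazard rate rises with waiting and the belief that reward will come stays at 1; with P = 0.9, the belief that any reward will come declines as the wait lengthens, which is the "unreliable bus" intuition.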
Reward predictions accessed by dopamine neurons derive from probability distributions of reward amounts. A larger reward compared with the expected value (predicted mean) of a predicted distribution activates dopamine neurons in monkeys, and a smaller reward induces a depression 17-20 . Dopamine responses change their gain depending on the variance of the distribution 21 , suggesting access to at least the first two statistical moments of distributions. By contrast, with a predicted distribution of only two fixed reward amounts, something unexpected happens in mice: there is no response when either of the two predicted rewards occurs but a graded response in rare probe trials that tends to increase with the absolute difference to each of the two predicted rewards; the response is positive for amounts slightly above the lower reward, negative for amounts slightly below the upper reward, and zero for amounts right between the two rewards 22 . For an intuitive example, imagine a restaurant with two randomly alternating chefs with widely different ability: when the food is almost but not quite spectacular, we realize the good chef was cooking but may have overlooked something, thus generating a negative prediction error (relative to the predicted superb meal from that chef), even though the food was better than from the other chef and above the mean from both chefs. Thus, dopamine neurons access rich reward probability distributions via their statistical moments but can access individual elements when distributions are very restricted. As seen during waiting 14,15 and reward reversal 23 , the reward predictions accessed by dopamine neurons derive not only from recent rewards but also from the overall reward structure of the environment.
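The two-chef intuition can be sketched numerically. The following is a toy illustration, not the model from the cited study: the reward amounts (2 and 8 units), the Gaussian weighting of the two candidate predictions, and the width `sigma` are all assumptions introduced here. The received amount is used to weight which of the two predicted rewards was "meant", and the error is computed against that belief-weighted prediction rather than against the overall mean.

```python
import math


def belief_weighted_error(amount, low, high, sigma=1.0):
    """Error relative to a prediction weighted by how well each reward explains the amount."""
    # Hypothetical Gaussian weights: the closer a predicted reward is to the
    # received amount, the more that prediction is believed to apply.
    w_low = math.exp(-((amount - low) ** 2) / (2 * sigma ** 2))
    w_high = math.exp(-((amount - high) ** 2) / (2 * sigma ** 2))
    prediction = (w_low * low + w_high * high) / (w_low + w_high)
    return amount - prediction


def mean_based_error(amount, low, high):
    """Conventional error relative to the mean of two equiprobable rewards."""
    return amount - (low + high) / 2


LOW, HIGH = 2.0, 8.0   # hypothetical reward amounts
for probe in (2.0, 3.0, 5.0, 7.0, 8.0):
    print(probe,
          round(belief_weighted_error(probe, LOW, HIGH), 3),
          mean_based_error(probe, LOW, HIGH))
```

In this sketch, a probe of 7 yields a negative belief-weighted error even though it lies above the mean of 5, while the two predicted amounts and the midpoint yield errors near zero, matching the qualitative pattern described above.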
Perceptual choices help to further reveal what's on dopamine's mind. Dopamine responses to a set of choice options reflect the animal's future choice. When a monkey chooses the more frequently rewarded option, the stimulus response is stronger compared with choosing the less often rewarded option, despite identical option presentation. As reward probability constitutes value, the neurons code "chosen value" (that is, the value of the option the animal chooses) rather than the mean value of all options 24,25 . The chosen value response occurs to the stimulus and partly precedes and thus predicts the choice. In these straightforward tests, the animal chooses, with some stochasticity, between values that are firmly associated with the options. By contrast, in perceptual random-dot motion choice tasks, the value depends on the animal's discrimination of motion direction, and the reward probabilities are not firmly associated with constant, unequivocally marked options. Higher motion coherence allows better discrimination and thus increases the probability of getting a reward. Thus, with higher coherence, reward value increases monotonically when choosing the correct motion direction but decreases monotonically when choosing the opposite, incorrect direction. Dopamine neurons in monkeys and mice show exactly this graded chosen value response during random-dot motion and contrast detection tasks 26,27 . The value responses before each choice derive from the combination of the animal's stimulus assessment and the subjective probability of making a correct discrimination ("subjective" in the sense of perception rather than individual economic probability weighing). As the targets are not distinctly marked for value, the responses cannot simply reflect the experienced reward probability for a given target.
Taken together, dopamine neurons have access to representations of future rewards that not only are associated with explicit stimuli but also derive from environmental factors like context, task structure, and time. These internal representations may be more globally called belief states and, when they reflect prior probabilities, Bayesian belief states 22,26 . These representations or beliefs are parts of reward predictions that affect dopamine neurons, which report their deviation from the actual obtained primary and conditioned rewards as "reward prediction error".

Neuroeconomics
Rewards don't exist; they are made up by our minds. The third steak during a dinner is not attractive although it is pretty similar to the first two appetizing steaks. Plenty of other examples confirm that reward value is subjective and depends on non-physical factors like satiety, delay, and risk. While we can forever test individual cases of subjective value, economic theory provides concepts for understanding subjective value and preferences and predicting behavioral choices under various conditions, including risk. An example is the utility signal of dopamine neurons that transcends the ad-hoc coding of subjective value 19 . This neuronal result aligns biological reward to economic choice and constitutes a prerequisite for understanding how individuals maximize utility for momentary and evolutionary benefit.
But what would a dopamine signal for such a theoretical decision variable do in a real-world scenario? One of the most intuitive and reliable phenomena in economics is the price-demand relationship. As the price goes up, consumption goes down; people buy less stuff when it gets more expensive. But if the good becomes more valuable, demand increases, which shifts price-demand curves to the right. Price can be modeled as number of lever presses in rats, and value can be enhanced by dopamine stimulation, although further known factors affecting consumption may be too extensive for an initial, well-controlled study, such as availability of alternatives, time, and effort. How then would a dopamine economic value (utility) signal affect consumer choice? Indeed, inducing a positive dopamine reward prediction error signal by optogenetic excitation at the reward shifts the curves upward and rightward, indicating that the stimulation enhances value, thereby increasing demand at the same price and maintaining the same consumption despite higher price (Figure 2) 28 . Stimulation at the reward-predicting cue has the opposite effect (by lowering reward value due to a negative prediction error elicited by the reward following the enhanced value prediction). This well-conceptualized situation, even with the restrictions imposed on an initial study, demonstrates that the dopamine utility signal has a very practical application; it affects daily consumer choice by influencing the value of a good. This beautiful result, outside the beaten path, suggests many follow-up experiments.
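The direction of these curve shifts can be illustrated with a deliberately simple demand function. The exponential form, the baseline `q0`, and the way value scales the price axis are assumptions chosen for this sketch and are not the analysis used in the cited study. The sketch only shows that raising the value of a good both increases consumption at a given price (upward shift) and raises the price at which a given consumption is maintained (rightward shift).

```python
import math


def consumption(price, value, q0=10.0):
    """Illustrative demand curve: consumption falls exponentially with price / value."""
    return q0 * math.exp(-price / value)


def price_for_consumption(target, value, q0=10.0):
    """Price at which consumption equals `target` (inverse of the curve above)."""
    return -value * math.log(target / q0)


baseline_value = 1.0    # hypothetical value of the good without stimulation
enhanced_value = 2.0    # hypothetical value after a positive prediction error signal

# Same price (here 4 lever presses), higher value: more consumption.
print(consumption(4, baseline_value), consumption(4, enhanced_value))

# Same consumption (here 5 units), higher value: tolerated at a higher price.
print(price_for_consumption(5, baseline_value), price_for_consumption(5, enhanced_value))
```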
Social settings: valuing own and other's reward
Rewards are fine for me but may not be so great when somebody else receives them instead of me. Monkeys see it the same way; they value rewards more when they occur more frequently for themselves but not so much when they occur for another monkey, as shown by licking and binary choice. Dopamine neurons follow this social reward valuation; higher probability of own water reward elicits stronger responses, confirming standard reward value coding, whereas higher reward probability for the other monkey reduces own dopamine responses 29 . It seems that this disadvantageous reward inequity has negative reward value for dopamine neurons. Thus, dopamine neurons register everybody's rewards but value them only relative to their host. Their primary concern with own reward resembles that of most reward neurons in the striatum 30 , some of which sense disadvantageous reward inequity 31 .

The dopamine prediction error signal: purely reward?
A response that is only a component
Environmental rewards and reward-predicting stimuli contain a non-value component that impacts on sensory receptors, but their identification and evaluation take a few tens or hundreds of milliseconds. Dopamine neurons, in analogy to other neuronal systems, show an early unselective activation, which reflects sensory detection of the stimulus 32 and constitutes a default signal for any potential reward in the environment; it is quickly replaced, before any behavioral action, by the subsequent prediction error component that codes reward value 19,33-35 ; recent studies confirm this notion 36 . Thus, the initial, non-reward activation constitutes an integral part of the dopamine reward response. Its identification requires temporal resolution in the ten-millisecond range and is often difficult, in particular with unrewarded, value-less stimuli not allowing independent variation of sensory and reward parameters.
Several factors affect the initial, sensory dopamine activation. First, it increases with physical impact and salience, irrespective of reward or aversive value 34 . Second, it is elicited and enhanced by neutral or punishment-predicting stimuli that resemble rewards or occur in rewarding contexts 37-39 . Finally, it occurs with novel stimuli in humans, monkeys, and mice 25,40-42 . The novelty component decays during conditioning (due to repetition), whereas the reward-predicting component increases 25,42 . The unpredicted occurrence of an unrewarded picture and positive sensory prediction errors enhance the initial-component response but, in contrast to bidirectional reward prediction error coding, picture omission does not seem to elicit a dopamine depression in monkeys and rats 33,38,43 (Figure 3A-D). Thus, the initial dopamine response component seems to code surprise salience rather than a full, bidirectional prediction error. In contrast to the initial sensory component, delivery of different juices with different sensory attributes elicits a bidirectional reward prediction error response that reflects the value of the juices (Figure 3E, F).
Figure 3. Surprise salience coding with non-rewarding stimuli contrasts with reward prediction error coding. (A) Bidirectional prediction error coding for juice reward. The animal received juice reward in 75% of trials but not in 25% of trials. Hence, a reward that did occur generated a 25% positive prediction error, and an omitted reward generated a 75% negative prediction error. (B) With similar 75 to 25% presentation of non-rewarding arbitrary (fractal) picture, unidirectional response enhancement with surprising picture occurrence (+25% picture prediction error), without negative error coding with picture omission (−75% picture error). (C) Reward response increase with unpredicted reward delivery (compatible with positive reward prediction error coding). Closed circles indicate significant differences (P <0.05; t test). (D) Smaller response enhancement with unpredicted picture occurrence, reflecting surprise salience. A-D are reused from Kobayashi and Schultz 38 Figure 4 (A, B, E, F), CC BY 3.0. (E) Preference for blackcurrant over orange juice in binary, simultaneous choice (same liquid amounts), indicating higher value of blackcurrant than orange juice 18 . (F) Dopamine prediction error response for juice identity reflects reward value. The concentric stimulus predicts equiprobable delivery of either blackcurrant or orange juice; the neuronal response reflects the prediction error between the value of the received juice and the stimulus-predicted mean value of the two juices (green: positive; blue: negative, with initial-component activation) 18 .

Aversive responses
For 40 years, many studies, including our own, reported activations by aversive stimuli in some dopamine neurons (for references, see 35). However, aversive events contain several components, as do rewards, and dissociating these components led to the conclusion that dopamine activations by aversive stimuli reflect physical impact (first component) rather than aversiveness 34 ; aversiveness is coded not at all 34 or as depression of activity reflecting negative reward value (second component) 44,45 . Dopamine reward neurons are also activated by negative punishment prediction error, which has positive value (double negative) 39,45,46 , by rebound from aversive depression 34,45 , and by prediction of relief from punishment 45-47 , which is rewarding 48,49 . Thus, some of the recently reported activations by aversive air puff, sound, and foot shock 44,45 might reflect rewarding relief from the threat these stimuli might pose to the animals, even if these neurons do not code standard reward.
In contrast to these reward responses, recent studies report activations in dopamine subgroups in lateral substantia nigra, striatum tail, and ventro-medial nucleus accumbens shell in response to air puff, intense sound, and foot shock but not with physically less intense aversive quinine nor much with reward 42,44,45 . These responses may reflect physical impact or aversion or both. The foot shock activation transfers to predictive stimuli during learning in ventro-medial nucleus accumbens shell 45 . This result would refute a possible relation to physical impact, which is unchanged, but it might also reflect temporal surprise salience; it might even indicate transfer of an early-component sensory impact response in analogy to the known transfer of the subsequent value component. Nonetheless, these neurons differ in molecular and physiological properties and have striatal projection territories different from those of the typical, straightforward reward-processing dopamine neurons 44,45 . Foot shock omission fails to elicit depressions in these dopamine neurons 45 ; this lack of bidirectional prediction error coding would make an involvement in reinforcement learning less direct. Furthermore, optogenetic excitation of dopamine axons in striatum tail elicits behavioral aversion 44 , indicating a truly aversive function (though without completely mimicking the brain's mechanics of natural excitation). The physically less intense quinine is ineffective despite its behavioral aversiveness 44 , which argues for a contribution of physical impact and against general negative value coding.
Thus, if physical impact remains an option for explaining activations by aversive stimuli, we might be dealing with the opposite tails of two continuous probability distributions: one for physical impact and one for value. Then dopamine neurons with activations by aversive stimuli might lie at the high end of the physical impact distribution, and their weak reward coding would be at the low end of the value distribution. On the other hand, despite all the caveats, optogenetics may have uncovered groups of dopamine neurons that are truly activated by specific punishers and thus differ qualitatively from reward-processing dopamine neurons 45 , after 40 years of trying to nail them. If so, they might be parts of an ancient system detecting fear (of air puff, intense sound, foot shock, and novelty) rather than disgust (quinine) 44 and contrast with the abundant reward-coding dopamine neurons that are depressed by aversive stimuli and code outcome value monotonically from negative to positive 39,44 . Dopamine neurons in fruit flies show similar response diversity (about 130 neurons code reward and 12 neurons code punishment 50 ), suggesting preservation across a huge evolutionary range. So, ten years from now, will we know whether the dopamine activations by aversive stimuli reflect physical impact or aversiveness or maybe both?

Behavioral activation
Even though the common assumption of one brain system equals one function may not hold for dopamine 1 , such multifunctionality seems perplexing and gives rise to the question "What is dopamine doing?"

Movement or not movement
The earliest behavioral studies of midbrain dopamine neurons and striatal dopamine concentrations in monkeys and rats report heterogeneous activations and depressions for a second or more with movements 51-55 . Dopamine changes are associated with task events such as large contralateral or ipsilateral arm reaching movements (16-44% and 15-17% of neurons, respectively), self-initiated arm movements (12%), reward delivery and mouth movements (9%), and full trial duration (5%). However, such changes are absent with more concise movements, such as well-controlled arm flexion-extension 56 , stereotyped reaching 41 , sluggish reaching elicited by offset of a stimulus 57 , and spontaneous and stimulus-driven eye movements 57 . The monitoring of large numbers of individual muscles in monkeys ( Figure 4) shows that these heterogeneous dopamine changes are unrelated to specific movements or motor control but reflect the behavioral activation underlying large movements, derived from the activity of many muscles 55,57,58 and of sensory receptors in muscle, joint, and skin associated with such movements, a global process that might also be called vigor or even motivation.

Movement activation
The advent of dopamine voltammetry, molecular identification, optogenetics, and optical recording allows us to further characterize these behavior-related changes, associate them with different neuronal populations and their projection territories, and distinguish them from reward prediction error responses. Recent studies describe dopamine changes when rodents move in open fields, small chambers, levers, nose poke ports, T-mazes, running wheels, and trackballs 6,59-68 , whereas specific motor processes engaging only few muscles are ineffective 69 . The dopamine changes are heterogeneous in terms of timing during test trials, behavioral variable being encoded, and midbrain location. Thus, early in each trial, activity in distinct dopamine neurons varies with different movement parameters like speed and acceleration, whereas at trial end more neurons code mouth movement or reward 68 . While some studies provide fine-grained statistical dissociation 68 , some of the effective behavioral variables, like reward expectation leading to faster movement and movement speed reflecting vigor and motivation, might be intercorrelated; indeed, a common variable underlying these behaviors might be arousal and general behavioral activation. The molecular, cellular, and input heterogeneity of dopamine neuron groups and the differential projection topography between midbrain and striatum 71-73 would allow specific dopamine influences on particular postsynaptic targets. Correspondingly, optogenetic dopamine excitation elicits locomotion and biases choice depending on the midbrain region being stimulated, whereas inhibition elicits opposite effects 61,64,65 , suggesting an active behavioral role of the observed dopamine changes (even without knowing the animal's "feeling" when receiving a dopamine shock without accompanying sensory or motor cortex activity).
By contrast, some motivation-related changes in striatal dopamine concentration are not associated with dopamine impulse changes in the soma 67 and may derive from local presynaptic influences that have long been recognized 74,75 . (As with other neurotransmitter systems, dopamine function depends on transmitter release and postsynaptic receptors in addition to the temporally precise impulse responses.)

Comparison with reward prediction error coding
The amazing spectrum and heterogeneity of dopamine relationships to behavioral activation contrast with the rather stereotyped reward prediction error response that varies across neurons in only a single scalar parameter 36 . The prediction error response stands out more; it is more phasic and has a higher instantaneous impulse rate and a shorter duration than the changes related to behavioral activation. These differences are particularly evident with the high temporal resolution of neurophysiological impulse responses. Nevertheless, the detection of prediction error responses requires explicit events that allow predictions to be identified and their value to be subtracted from that of the reward. Analyses using reinforcement models help to further identify dopamine prediction error responses in elaborate tasks 64,76 .
How might these seemingly separate modes of dopamine action relate to each other? Despite attempts to derive a common activational role 77 , it is currently unclear how the heterogeneous relationships to behavioral activation might emerge from prediction error coding. One may dissociate the behavioral activation from prediction error coding by their respective spatial and non-spatial specificity 78 or explain the dopamine voltammetry signal during movement and reward expectation by prediction error coding 79-81 , or behavioral activation and reward prediction error might be coded in different dopamine groups. In rodents, movement relationships are more frequent in substantia nigra dopamine neurons and their striatum-projecting regions, whereas reward prediction error coding is abundant in ventral tegmental area neurons and their nucleus accumbens projection 6,62,67,68 . These differences are gradual and do not constitute the strong medio-lateral midbrain or the ventro-dorsal striatum dichotomy seen in regional lesion experiments. Similar graded, rather than strict, differences are seen in monkeys, whose dopamine neurons in substantia nigra signal reward less frequently (<60%) than in ventral tegmental area (>70-80%) 41,82 ; in corresponding striatal projection territories, reward expectation affects 40 to 50% of caudate and anterior putamen neurons and more than 75% of nucleus accumbens neurons 83 .

Multiple dopamine functions
Thus, the notion of one neuronal system having exactly one function may not be valid for dopamine neurons, however hard we try. Maybe such an evolutionarily ancient system, which exists already in fruit flies, has multiple functions that are difficult to capture in a single term. A common denominator for the role of phasic dopamine activity might be to get the animal what it needs to survive, like detecting reward and coding the action for obtaining it (the two key components of motivation), although that sounds awfully superficial given the intricate complexity of the system.

The future
The investigation of dopamine function and the underlying networks is currently in full swing. The past several years have revealed many details that help us get a better understanding of dopamine function, and lots of mysticism has disappeared. We are not dealing with a system with clear-cut and well-parcellated functions, but we know that some of the dopamine functions are crucial for the animal's survival. What we don't know are at least two things.
How does the dopamine reward signal, as the strongest component of dopamine function, get us the best reward and thus help evolutionary fitness? An obvious approach is to study economic decision-making, which has well-developed concepts for maximizing utility. This approach assumes that decision makers identify, process, and deliberate about all available options and have clear preferences, which underlies the first Von Neumann-Morgenstern utility axiom ("completeness"). But there are many exceptions to rational decision-making, and many decisions are not based on identifiable options. We often just do what we do without actively considering the alternatives. What is the role of dopamine neurons in these processes?
As the investigation of dopamine function has revealed a number of important processes, then what are the other "neuromodulatory" systems hiding? Can we get a handle on norepinephrine after its attentional functions have been so well described 84 ? And what about serotonin: would it have several, diverse functions 85,86 but ultimately a coherent denominator? And what about acetylcholine? We have tons of work to do.
Of course, all of these processes may go wrong in brain disorders, which affect more than 20% of the population and present a major human challenge. For that reason, we should invest substantial portions of our wealth into all fields of neuroscience.