Dopamine-independent effect of rewards on choices through hidden-state inference

February 2024 | Marta Blanco-Pozo, Thomas Akam & Mark E. Walton
Dopamine is involved in adaptive behavior through reward prediction error (RPE) signals that update value estimates. However, it is unclear how these RPE-based accounts of reward-guided decision-making should be integrated with evidence that animals in structured environments use inference processes to facilitate behavioral flexibility. Using a two-step task for mice, the study shows that dopamine reports RPEs using value information inferred from knowledge of the task structure, alongside information about reward rate and movement. Despite rewards strongly influencing choices and dopamine activity, neither activating nor inhibiting dopamine neurons at trial outcome affected future choice. These data were recapitulated by a neural network model in which cortex learned to track hidden task states by predicting observations, while basal ganglia learned values and actions via RPEs. This shows that the influence of rewards on choices can stem from dopamine-independent information they convey about the world's state, not the dopaminergic RPEs they produce.

Adaptive behavior requires learning which actions lead to desired outcomes and updating this knowledge when the world changes. Reinforcement learning (RL) has provided an influential account of how this works in the brain, with RPEs updating estimates of the values of states and/or actions, in turn driving choices. In support of this framework, dopamine activity resembles RPEs in many behaviors, and causal manipulations can reinforce or suppress behaviors consistent with dopamine acting functionally as an RPE. However, value learning is not the only way we adapt to changes in the environment. For example, we behave differently on weekdays and weekends, but this is clearly not because we relearn the value of going to work versus spending time with family each Saturday morning.
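The standard RL account described above can be illustrated with a minimal sketch: an agent whose choices are driven by action values that an RPE updates after every outcome. This is not the paper's model; the learning rate, softmax temperature, and the 0.8/0.2 reward probabilities are illustrative assumptions.

```python
import math
import random

def simulate_rl(n_trials=1000, alpha=0.1, beta=5.0, seed=0):
    """Minimal RL agent: RPEs update action values, which drive choice."""
    rng = random.Random(seed)
    q = [0.0, 0.0]            # value estimates for the two options
    p_reward = [0.8, 0.2]     # illustrative (fixed) reward probabilities
    choices = []
    for _ in range(n_trials):
        # softmax choice between the two options
        p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        c = 0 if rng.random() < p0 else 1
        r = 1.0 if rng.random() < p_reward[c] else 0.0
        rpe = r - q[c]        # reward prediction error
        q[c] += alpha * rpe   # only the chosen option's value is updated
        choices.append(c)
    return q, choices

q, choices = simulate_rl()
```

Note that the update only touches the chosen option: under this account, a reward on one option carries no information about the other. That is exactly the assumption that hidden-state inference relaxes.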
Rather, although the world looks the same when we wake up, we understand that the state of the world is in fact different, and this calls for different behavior. Formally, the decision problem we face is partially observable: our current sensory observations only partially constrain the true state of the world. In such environments, it is typically possible to estimate the current state better using the history of observations than using just the current sensory input. It is increasingly clear that this ability to infer hidden (that is, not directly observable) states of the world plays an important role even in simple laboratory reward-guided decision-making. For example, in probabilistic reversal learning tasks where the reward probabilities of the two options are anticorrelated, both behavior and brain activity indicate that subjects understand this statistical relationship. This is not predicted by standard RL models in which RPEs update the value of preceding actions, but is predicted by models which assume subjects understand there is a hidden state that controls both reward probabilities. Intriguingly, brain recordings have shown that not only prefrontal cortex (PFC) but also the dopamine system can reflect knowledge of such hidden states.

Integrating these two accounts of behavioral flexibility raises several pressing questions. If state inference, not RL, mediates flexible reward-guided behavior, why does dopamine look and act like an RPE?
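The hidden-state account of such reversal tasks can be sketched as a Bayesian belief update over which option is currently "good". This is an illustrative sketch, not the paper's model: the 0.8/0.2 reward probabilities and the 0.05 reversal hazard rate are assumptions chosen for the example.

```python
def update_belief(b_good_a, choice, reward, p_hi=0.8, p_lo=0.2, hazard=0.05):
    """Posterior probability that option A is currently the good option.

    A single hidden state controls both options' (anticorrelated) reward
    probabilities, so one outcome is evidence about both options at once.
    """
    # likelihood of the observed outcome under each hidden state
    if choice == "A":
        p_r_a_good, p_r_b_good = p_hi, p_lo
    else:
        p_r_a_good, p_r_b_good = p_lo, p_hi
    lik_a = p_r_a_good if reward else 1 - p_r_a_good
    lik_b = p_r_b_good if reward else 1 - p_r_b_good
    # Bayes' rule over the two hidden states
    post = b_good_a * lik_a / (b_good_a * lik_a + (1 - b_good_a) * lik_b)
    # account for a possible reversal before the next trial (hazard rate)
    return post * (1 - hazard) + (1 - post) * hazard

b = 0.5                                 # start uncertain
b = update_belief(b, "A", reward=True)  # reward on A raises P(A is good)
```

Because a single state governs both options, a reward on A simultaneously lowers the inferred value of B, which is the behavioral signature that standard RPE-driven value updating cannot produce.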