The Reasons that Agents Act: Intention and Instrumental Goals

May 6–10, 2024 | Francis Rhys Ward, Matt MacDermott, Francesco Belardinelli, Francesca Toni, Tom Everitt
The paper presents a formal definition of intention for AI agents, grounded in structural causal influence models (SCIMs), and relates it to earlier concepts such as actual causality and instrumental goals. Intention is characterised as the reasons for which an agent chooses its actions, with a focus on instrumental goals, which are central to safe AI. The authors introduce three operationalisations of intention: (1) intention to cause an outcome, (2) intention to cause multiple outcomes, and (3) intention in a random setting. These operationalisations are shown to capture the intuitive notion of intent and to satisfy desiderata for a definition of algorithmic intent.

The paper also connects the definition to prior concepts, including actual causality and instrumental control incentives (ICIs), and demonstrates how it can be used to infer the intentions of reinforcement learning agents and language models from their behaviour. The authors argue that their definition is behaviourally testable and allows artificial systems' intentions to be characterised precisely in intuitively understandable language. Finally, the paper discusses the challenges of assessing the intentions of real-world systems, highlighting the limitations of current approaches and the need for further research.
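To make the core idea concrete, here is a minimal sketch of the kind of behavioural intention test the paper motivates: intervene to guarantee an outcome and ask whether the agent could then switch to a different action without losing utility. If it could, the outcome was part of its reason for acting; if not, the outcome was a mere side effect. The toy single-decision model below (the medicine story, the variable names, and the `intends` function) is our own illustrative assumption, not the paper's exact formalism.

```python
# A minimal sketch, assuming a toy single-decision causal model of our own
# devising; it illustrates the flavour of the counterfactual test, not the
# paper's precise SCIM-based definition.

ACTIONS = ["treat", "wait"]

# Structural functions: each non-decision variable is computed from the
# action, unless a do()-style intervention fixes it to a given value.
MECHANISMS = {
    "recovered":    lambda a: a == "treat",   # outcome O
    "bottle_empty": lambda a: a == "treat",   # side effect S
}

def evaluate(action, interventions=None):
    """Compute all variables under `action`, with do()-style overrides."""
    interventions = interventions or {}
    return {v: interventions.get(v, f(action)) for v, f in MECHANISMS.items()}

def utility(action, world):
    """U rewards recovery and charges a small cost for treating;
    it does not depend on the side effect at all."""
    return (1.0 if world["recovered"] else 0.0) - (0.1 if action == "treat" else 0.0)

def best_action():
    """The agent maximises expected utility over its actions."""
    return max(ACTIONS, key=lambda a: utility(a, evaluate(a)))

def intends(a_star, variable, value):
    """Counterfactual test in the spirit of the paper: the agent intends
    `variable = value` with action a_star if, were that value guaranteed
    by intervention, some *other* action would perform at least as well,
    i.e. the outcome is part of the reason the action was chosen."""
    u_actual = utility(a_star, evaluate(a_star))
    return any(
        utility(a, evaluate(a, {variable: value})) >= u_actual
        for a in ACTIONS if a != a_star
    )

a_star = best_action()                          # "treat": utility 0.9 vs 0.0
print(intends(a_star, "recovered", True))       # True  -> intended outcome
print(intends(a_star, "bottle_empty", True))    # False -> mere side effect
```

In this sketch, fixing "recovered" to true frees the agent to choose "wait" and save the treatment cost, so recovery is intended; fixing "bottle_empty" changes nothing the agent cares about, so it is an unintended side effect. This distinction between intended outcomes and side effects is what makes the definition useful for behavioural testing.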