2024-04-23 | Seliem El-Sayed, Canfer Akbulut, Amanda McCroskey, Geoff Keeling, Zachary Kenton, Zaria Jalan, Nahema Marchal, Arianna Manzini, Toby Shevlane, Shannon Vallor, Daniel Susser, Matija Franklin, Sophie Bridgers, Harry Law, Matthew Rahtz, Murray Shanahan, Michael Henry Tessler, Arthur Douillard, Tom Everitt and Sasha Brown
This paper presents a mechanism-based approach to mitigating harms from persuasive generative AI. It defines persuasive generative AI as systems that can shape, reinforce, or change users' beliefs, behaviors, or preferences through rational persuasion or manipulation. Rational persuasion involves providing relevant facts, sound reasoning, or trustworthy evidence, while manipulation exploits cognitive biases, heuristics, or misrepresentation of information. The paper maps harms from AI persuasion, including economic, physical, environmental, psychological, sociocultural, political, privacy, and autonomy harms. It also introduces a map of mechanisms contributing to harmful persuasion, such as trust and rapport, anthropomorphism, personalisation, deception, manipulative strategies, and alteration of choice environments. The paper emphasizes the need to focus on process harms, which arise from the manipulation elements of persuasion, rather than outcome harms. It proposes mitigation strategies, including prompt engineering for manipulation classification and red teaming. The paper highlights the importance of understanding the mechanisms underlying AI persuasion to develop effective mitigation strategies. The key contribution is providing a map of mechanisms of persuasive AI, coupled with mitigation strategies targeting these mechanisms. The paper also discusses the role of contextual conditions in AI persuasion and the need for further research on the mechanisms and model features of generative AI persuasion.This paper presents a mechanism-based approach to mitigating harms from persuasive generative AI. It defines persuasive generative AI as systems that can shape, reinforce, or change users' beliefs, behaviors, or preferences through rational persuasion or manipulation. Rational persuasion involves providing relevant facts, sound reasoning, or trustworthy evidence, while manipulation exploits cognitive biases, heuristics, or misrepresentation of information. The paper maps harms from AI persuasion, including economic, physical, environmental, psychological, sociocultural, political, privacy, and autonomy harms. It also introduces a map of mechanisms contributing to harmful persuasion, such as trust and rapport, anthropomorphism, personalisation, deception, manipulative strategies, and alteration of choice environments. The paper emphasizes the need to focus on process harms, which arise from the manipulation elements of persuasion, rather than outcome harms. It proposes mitigation strategies, including prompt engineering for manipulation classification and red teaming. The paper highlights the importance of understanding the mechanisms underlying AI persuasion to develop effective mitigation strategies. The key contribution is providing a map of mechanisms of persuasive AI, coupled with mitigation strategies targeting these mechanisms. The paper also discusses the role of contextual conditions in AI persuasion and the need for further research on the mechanisms and model features of generative AI persuasion.