A Mechanism-Based Approach to Mitigating Harms from Persuasive Generative AI


2024-04-24 | Seliem El-Sayed, Canfer Akbulut, Amanda McCroskery, Geoff Keeling, Zachary Kenton, Zaria Jalan, Nahema Marchal, Arianna Manzini, Toby Shevlane, Shannon Vallor, Daniel Susser, Matija Franklin, Sophie Bridgers, Harry Law, Matthew Rahtz, Murray Shanahan, Michael Henry Tessler, Arthur Douillard, Tom Everitt and Sasha Brown
This paper addresses the growing concerns about the persuasive capabilities of generative AI systems and the potential harms they can cause. The authors define and distinguish between rationally persuasive and manipulative generative AI, highlighting the need for a systematic study of AI persuasion. They propose a framework to understand and mitigate these harms, focusing on process harms rather than outcome harms. The paper outlines various mechanisms that contribute to harmful persuasion, including trust and rapport building, anthropomorphism, personalization, deception, and manipulative strategies. It also provides a map of the risks associated with these mechanisms and suggests sociotechnical mitigations such as evaluation and monitoring, prompt engineering, classifiers, reinforcement learning, scalable oversight, and interpretability. The authors emphasize the importance of ongoing research, active participation from civil society, and continuous monitoring to address the evolving nature of persuasive AI.
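Of the mitigations listed, a classifier that screens model output for manipulative strategies is the most directly implementable. The sketch below shows where such a guardrail might sit between a generative model and the user; it is a minimal illustration, not the paper's method. The cue list, the `score_manipulation` heuristic, the `moderate` threshold, and all names are illustrative assumptions, and a real deployment would replace the keyword heuristic with a trained classifier.

```python
# Minimal sketch of a classifier-based mitigation for manipulative model
# output, one of the sociotechnical mitigations the paper discusses.
# All names and values here are illustrative assumptions, not from the paper.

from dataclasses import dataclass

# Toy lexical cues loosely corresponding to manipulative strategies the
# paper names (e.g. guilt-tripping, false urgency, fear appeals).
MANIPULATION_CUES = {
    "guilt": ["you owe me", "after all i've done"],
    "false_urgency": ["act now", "before it's too late"],
    "fear": ["something terrible will happen"],
}


def score_manipulation(text: str) -> float:
    """Return a crude [0, 1] manipulation score from lexical cue hits.

    A production system would use a trained classifier; this keyword
    heuristic only illustrates where such a model would sit in the pipeline.
    """
    lowered = text.lower()
    hits = sum(
        cue in lowered for cues in MANIPULATION_CUES.values() for cue in cues
    )
    total = sum(len(cues) for cues in MANIPULATION_CUES.values())
    return hits / total


@dataclass
class MitigationResult:
    text: str
    score: float
    blocked: bool


def moderate(model_output: str, threshold: float = 0.1) -> MitigationResult:
    """Block or pass a candidate model response based on its score."""
    score = score_manipulation(model_output)
    return MitigationResult(model_output, score, blocked=score >= threshold)


if __name__ == "__main__":
    result = moderate("Act now, before it's too late to change your vote!")
    print(result)  # blocked=True: two cues hit, score 0.4 >= threshold
```

In this design the classifier runs after generation and before delivery, which mirrors the paper's framing of process-level mitigations: it targets how the model persuades (the strategy in the text) rather than the downstream outcome of the persuasion.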