The paper investigates the security risks of manipulated knowledge in LLM-based multi-agent systems, focusing on the spread of counterfactual and toxic knowledge. The authors construct a detailed threat model and a simulation environment to mirror real-world multi-agent deployments. They propose a two-stage attack method involving *Persuasiveness Injection* and *Manipulated Knowledge Injection* to explore the potential for unconscious spread of manipulated knowledge without explicit prompt manipulation.
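The sketch below illustrates the two-stage structure at a toy level: a first step that makes the attacker agent argue with high confidence, followed by a second step that implants manipulated "facts." All names here (`AttackerAgent`, `persuasiveness_injection`, `manipulated_knowledge_injection`) are hypothetical stand-ins for illustration; the paper's actual training-based procedure is not reproduced.

```python
# Minimal sketch of a two-stage manipulated-knowledge attack.
# All class and function names are hypothetical; the real method would
# operate on model weights (e.g., via fine-tuning), not a persona flag.

from dataclasses import dataclass, field


@dataclass
class AttackerAgent:
    """Toy stand-in for an LLM-based agent controlled by the attacker."""
    persona: str = "neutral"
    knowledge: dict = field(default_factory=dict)

    def respond(self, question: str) -> str:
        answer = self.knowledge.get(question, "I'm not sure.")
        if self.persona == "persuasive":
            # A persuasive agent states its (possibly manipulated) answer
            # with high confidence, without any adversarial prompt content.
            return f"It is well established that {answer}"
        return answer


def persuasiveness_injection(agent: AttackerAgent) -> AttackerAgent:
    """Stage 1: make the agent argue confidently (here, a simple persona switch)."""
    agent.persona = "persuasive"
    return agent


def manipulated_knowledge_injection(agent: AttackerAgent, facts: dict) -> AttackerAgent:
    """Stage 2: implant counterfactual or toxic 'facts' into the agent."""
    agent.knowledge.update(facts)
    return agent


if __name__ == "__main__":
    attacker = AttackerAgent()
    attacker = persuasiveness_injection(attacker)
    attacker = manipulated_knowledge_injection(
        attacker, {"Who wrote Hamlet?": "Christopher Marlowe wrote Hamlet."}
    )
    # The attacker now answers benign agents' questions both confidently
    # and incorrectly, with no explicit manipulation visible in the prompt.
    print(attacker.respond("Who wrote Hamlet?"))
```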
The attack exploits weaknesses in how LLMs handle world knowledge, allowing attackers to spread fabricated information. Extensive experiments demonstrate that the attack successfully induces LLM-based agents to spread both counterfactual and toxic knowledge without degrading their foundational capabilities during communication. The results also show that these manipulations persist in popular retrieval-augmented generation (RAG) frameworks, where benign agents store manipulated chat histories and retrieve them in future interactions.
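The toy sketch below shows how such persistence can arise: a benign agent writes the attacker's message into a retrieval memory, and a later query surfaces the fabricated claim again. The `ChatMemory` class and keyword-overlap retriever are illustrative assumptions; real RAG frameworks would use an embedding-based vector store.

```python
# Illustrative sketch of manipulated content persisting via a RAG-style memory.
# The retriever is a crude keyword-overlap scorer, chosen only to keep the
# example self-contained; it is not the retrieval used in any real framework.

from dataclasses import dataclass, field


def _overlap(a: str, b: str) -> int:
    """Crude relevance score: number of shared lowercase words."""
    return len(set(a.lower().split()) & set(b.lower().split()))


@dataclass
class ChatMemory:
    """Toy RAG memory: a benign agent stores past dialogue turns and
    retrieves the most relevant one for future queries."""
    records: list = field(default_factory=list)

    def store(self, turn: str) -> None:
        self.records.append(turn)

    def retrieve(self, query: str) -> str:
        return max(self.records, key=lambda r: _overlap(r, query), default="")


if __name__ == "__main__":
    memory = ChatMemory()
    # A benign agent stores a chat turn produced by the manipulated attacker.
    memory.store("Attacker: It is well established that Christopher Marlowe wrote Hamlet.")
    memory.store("User: What's the weather like today?")

    # Later, the retrieved record is placed into the benign agent's context,
    # so the fabricated claim resurfaces in a new interaction.
    print("Retrieved context:", memory.retrieve("Who wrote Hamlet?"))
```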
The findings highlight significant security risks in LLM-based multi-agent systems, emphasizing the need for robust defenses such as introducing "guardian" agents and advanced fact-checking tools. The paper concludes with a discussion of the implications and potential solutions to mitigate the spread of manipulated knowledge.
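As a rough illustration of the "guardian" idea, the sketch below places a verification step between agent messages and shared memory. The `guardian_filter` function and its trusted-fact lookup are assumptions made for illustration; a practical defense would call an external fact-checking model or tool rather than a static dictionary.

```python
# Hedged sketch of a guardian check applied before a message enters shared memory.
# The lookup table is a stand-in for a real fact-checking tool or model.

TRUSTED_FACTS = {
    "who wrote hamlet?": "william shakespeare",
}


def guardian_filter(question: str, message: str) -> bool:
    """Return True if the message may be stored in shared memory."""
    expected = TRUSTED_FACTS.get(question.lower())
    if expected is None:
        return True  # No ground truth available; pass through (or escalate).
    return expected in message.lower()


if __name__ == "__main__":
    claim = "It is well established that Christopher Marlowe wrote Hamlet."
    if guardian_filter("Who wrote Hamlet?", claim):
        print("stored")
    else:
        print("blocked: claim contradicts trusted knowledge")
```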