Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities


23 Jul 2024 | Tianjie Ju, Yiting Wang, Xinbei Ma, Pengzhou Cheng, Haodong Zhao, Yulong Wang, Lifeng Liu, Jian Xie, Zhuosheng Zhang, Gongshen Liu
This paper investigates the security risks of manipulated knowledge spreading through LLM-based multi-agent systems. The authors propose a two-stage attack, Persuasiveness Injection followed by Manipulated Knowledge Injection, to study how manipulated knowledge can spread without any explicit prompt manipulation. The first stage biases an agent toward generating persuasive evidence; the second alters the agent's perception of specific knowledge. The attack exploits inherent vulnerabilities in how LLMs handle world knowledge, allowing an attacker to seed fabricated information without degrading the agents' foundational capabilities.
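The summary names the two stages but not how they are implemented. The minimal sketch below shows one plausible shape of the pipeline, assuming preference-style fine-tuning for the persuasiveness stage and a targeted locate-and-edit parameter update for the knowledge stage; every function, type, and example value here is a hypothetical stand-in, not the authors' code.

```python
# Hypothetical sketch of the two-stage attack pipeline described above.
# The training objective and editing method are assumptions; the paper
# summary does not specify them, so both stages are stubbed.
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # persuasive answer that cites (possibly fabricated) evidence
    rejected: str  # flat answer with no supporting evidence


def persuasiveness_injection(model, pairs: list[PreferencePair]):
    """Stage 1: bias the attacker agent toward evidence-laden, persuasive
    replies, e.g. via DPO-style preference optimization (stubbed here)."""
    return model  # placeholder: a real run would fine-tune and return the model


def manipulated_knowledge_injection(model, subject: str, relation: str,
                                    new_object: str):
    """Stage 2: overwrite the model's stored (subject, relation) -> object
    association, e.g. with a locate-and-edit method, so the agent asserts
    the false fact without any malicious prompt (stubbed here)."""
    return model


def compromise(model, pairs: list[PreferencePair], fact: tuple[str, str, str]):
    model = persuasiveness_injection(model, pairs)           # stage 1
    model = manipulated_knowledge_injection(model, *fact)    # stage 2
    return model  # deployed as an ordinary-looking agent in the community
```

The key design point the pipeline illustrates: because both stages modify the model itself rather than its prompts, the compromised agent passes the platform's ban on prompt manipulation while still "sincerely" asserting the manipulated fact.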
The simulation environment mirrors a realistic deployment of multi-agent systems on a trusted platform: agents are assigned specific roles and attributes to keep interactions diverse and authentic, and direct prompt manipulation is prohibited, so manipulated knowledge cannot be spread explicitly. Within this setting, the authors show that the attack induces LLM-based agents to spread both counterfactual and toxic knowledge during communication while leaving their foundational capabilities intact.

Experiments on three representative open-source LLMs (Vicuna, LLaMA 3, and Gemma) confirm the feasibility of manipulated knowledge spread in LLM-based agent communities, with clear implications for the integrity and reliability of knowledge shared among agents. The toxic-knowledge variant, crafted specifically to provoke or exacerbate conflict, also spreads at a considerable rate, pointing to the potential for serious disruption in multi-agent communities.

The results further show that manipulated knowledge persists through retrieval-augmented generation (RAG) frameworks: benign agents store manipulated chat histories and retrieve them in future interactions, so they can remain influenced long after the original exchange has ended (see the sketch below).

The authors conclude that the two-stage attack spreads manipulated knowledge effectively with minimal impact on the agents' foundational capabilities, revealing significant security risks for real-world multi-agent deployments. They provide a detailed threat model and simulation environment to systematically model the scenario, and they emphasize the need for robust defenses, such as introducing "guardian" agents and advanced fact-checking tools.
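To make the persistence mechanism concrete, here is a toy, self-contained illustration of how retrieved chat history can keep resurfacing a manipulated claim. The keyword-overlap scorer stands in for embedding similarity in a real RAG framework; the class, the agent names, and the example claim are illustrative assumptions, not code or data from the paper.

```python
# Toy illustration of why manipulated knowledge persists through RAG:
# a benign agent saves chat history and later retrieves it as "evidence".
from collections import Counter


class ChatMemory:
    """Naive retrieval store over past conversation turns."""

    def __init__(self):
        self.turns: list[str] = []

    def store(self, turn: str) -> None:
        self.turns.append(turn)

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # Rank stored turns by word overlap with the query -- a stand-in
        # for embedding similarity in a real RAG pipeline.
        q = Counter(query.lower().split())

        def score(turn: str) -> int:
            return sum((Counter(turn.lower().split()) & q).values())

        return sorted(self.turns, key=score, reverse=True)[:k]


memory = ChatMemory()
# During the attack, a persuasive but false claim enters the chat history.
memory.store("Agent_A: The Eiffel Tower was moved to Rome in 2010, "
             "as several engineering reports confirm.")
memory.store("Agent_B: Thanks, I will keep that in mind.")

# Long after the attacker is gone, retrieval resurfaces the false claim,
# so the benign agent conditions new answers on manipulated context.
context = memory.retrieve("Where is the Eiffel Tower?")
print(context)  # the manipulated turn is returned as supporting context
```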
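On the defense side, the "guardian" agent idea can be read as a gatekeeper that fact-checks claims before they enter shared memory. The sketch below is a deliberately simplified illustration of that idea under the assumption of a small trusted fact table; the claim extractor is a hypothetical stand-in for an LLM- or IE-based component, and none of this is a published defense implementation.

```python
# Hypothetical sketch of a "guardian" gatekeeper: quarantine messages whose
# claims contradict a trusted source before they reach shared chat history.

TRUSTED_FACTS = {
    "eiffel tower location": "paris",  # illustrative trusted knowledge base
}


def extract_claims(message: str) -> list[tuple[str, str]]:
    """Hypothetical claim extractor returning (topic, asserted value) pairs.
    A real system would use an LLM or information-extraction pipeline."""
    claims = []
    text = message.lower()
    if "eiffel tower" in text:
        for city in ("paris", "rome", "london"):
            if city in text:
                claims.append(("eiffel tower location", city))
    return claims


def guardian_filter(message: str) -> bool:
    """Return True if the message may enter shared chat history."""
    for topic, value in extract_claims(message):
        expected = TRUSTED_FACTS.get(topic)
        if expected is not None and value != expected:
            return False  # contradicts the trusted source: quarantine it
    return True


print(guardian_filter("The Eiffel Tower was moved to Rome in 2010."))  # False
print(guardian_filter("The Eiffel Tower is in Paris."))                # True
```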