10 Feb 2024 | Jonathan Evertz, Merlin Chlosta, Lea Schönherr, Thorsten Eisenhofer
The paper "Whispers in the Machine: Confidentiality in LLM-integrated Systems" by Jonathan Evertz, Merlin Chlosta, Lea Schönherr, and Thorsten Eisenhofer addresses the growing concern of confidentiality in large language model (LLM)-integrated systems. These systems, which integrate LLMs with external tools and services, can expose confidential data if not properly secured. The authors propose a systematic approach to evaluate confidentiality in such systems by formalizing a "secret key" game that captures the model's ability to conceal private information. They assess eight previously published attacks and four defense strategies, finding that current defenses lack generalization across different attack strategies. To address this, they propose a robustness fine-tuning method inspired by adversarial training, which effectively reduces the success rate of attackers and improves the system's resilience against unknown attacks. The paper also discusses the trade-offs between utility and robustness in LLMs and provides a detailed evaluation of the proposed methods, demonstrating their effectiveness in enhancing the confidentiality of LLMs.The paper "Whispers in the Machine: Confidentiality in LLM-integrated Systems" by Jonathan Evertz, Merlin Chlosta, Lea Schönherr, and Thorsten Eisenhofer addresses the growing concern of confidentiality in large language model (LLM)-integrated systems. These systems, which integrate LLMs with external tools and services, can expose confidential data if not properly secured. The authors propose a systematic approach to evaluate confidentiality in such systems by formalizing a "secret key" game that captures the model's ability to conceal private information. They assess eight previously published attacks and four defense strategies, finding that current defenses lack generalization across different attack strategies. To address this, they propose a robustness fine-tuning method inspired by adversarial training, which effectively reduces the success rate of attackers and improves the system's resilience against unknown attacks. The paper also discusses the trade-offs between utility and robustness in LLMs and provides a detailed evaluation of the proposed methods, demonstrating their effectiveness in enhancing the confidentiality of LLMs.