2 May 2024 | Dario Pasquini, Martin Strohmeier, Carmela Troncoso
The paper introduces a new family of prompt injection attacks called Neural Exec, which differ from traditional handcrafted triggers by being generated through a differentiable search problem using learning-based methods. The authors demonstrate that motivated adversaries can create more effective and flexible execution triggers that bypass existing blacklist-based detection methods. These triggers can persist through multi-stage preprocessing pipelines, such as those used in Retrieval-Augmented Generation (RAG) applications. The optimization-based approach allows for the imposition of arbitrary biases, leading to triggers with unseen properties and functionalities. The paper also introduces the concept of robustness to pre-processing for indirect prompt injection attacks, showing how adversaries can design triggers to be resilient against common RAG-based pipelines. The evaluation results indicate that the generated Neural Execs are significantly more effective than existing handcrafted triggers, achieving an improvement of 200% to 500% in effectiveness. The triggers are also shown to be robust against RAG pipelines, making them effective against real-world applications that rely on RAG.The paper introduces a new family of prompt injection attacks called Neural Exec, which differ from traditional handcrafted triggers by being generated through a differentiable search problem using learning-based methods. The authors demonstrate that motivated adversaries can create more effective and flexible execution triggers that bypass existing blacklist-based detection methods. These triggers can persist through multi-stage preprocessing pipelines, such as those used in Retrieval-Augmented Generation (RAG) applications. The optimization-based approach allows for the imposition of arbitrary biases, leading to triggers with unseen properties and functionalities. The paper also introduces the concept of robustness to pre-processing for indirect prompt injection attacks, showing how adversaries can design triggers to be resilient against common RAG-based pipelines. The evaluation results indicate that the generated Neural Execs are significantly more effective than existing handcrafted triggers, achieving an improvement of 200% to 500% in effectiveness. The triggers are also shown to be robust against RAG pipelines, making them effective against real-world applications that rely on RAG.