Accelerating Greedy Coordinate Gradient and General Prompt Optimization via Probe Sampling

27 May 2024 | Yiran Zhao1*, Wenyue Zheng1, Tianle Cai2, Xuan Long Do1, Kenji Kawaguchi1, Anirudh Goyal3, Michael Shieh1*
The paper introduces a new algorithm called Probe Sampling, which accelerates the Greedy Coordinate Gradient (GCG) method for optimizing adversarial prompts that break aligned Large Language Models (LLMs). GCG is effective but slow because it must compute losses for many candidate prompts with the full target model. Probe Sampling reduces this computational burden by using a smaller draft model to filter out unpromising candidates early in each optimization step. The core of the algorithm is a mechanism that dynamically determines how similar the draft model's predictions are to the target model's on the candidate prompts, and uses this agreement to decide how aggressively to filter. This yields significant speedups, up to 5.6× with Llama2-7b-chat, while maintaining or improving the attack success rate (ASR) on the AdvBench dataset. Probe Sampling is also shown to accelerate other prompt-optimization and adversarial methods, namely AutoPrompt, APE, and AutoDAN, with speedups of 1.8×, 2.4×, and 2.4×, respectively. The paper also discusses the implementation details, experimental results, and limitations of the proposed method.
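The filtering loop described above can be sketched in Python. This is a minimal toy illustration, not the paper's implementation: the scalar `candidates` and the `target_loss`/`draft_loss` functions are stand-ins for real prompt candidates scored by full and draft LLMs, and the rule mapping rank agreement to filtered-set size is a simplified assumption of the adaptive mechanism.

```python
import random

# Hypothetical stand-ins for model forward passes (assumptions, not the
# paper's code): target_loss plays the expensive full model, draft_loss
# the cheap draft model that roughly agrees with it.
def target_loss(candidate):
    return (candidate - 0.5) ** 2

def draft_loss(candidate):
    return (candidate - 0.5) ** 2 + random.uniform(-0.01, 0.01)

def spearman(xs, ys):
    """Spearman rank correlation between two equal-length score lists
    (assumes no ties, which holds for these continuous toy losses)."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

def probe_sampling_step(candidates, probe_size=8):
    # 1. Score every candidate with the cheap draft model.
    draft_scores = [draft_loss(c) for c in candidates]
    # 2. Probe: score a small random subset with the target model too,
    #    and measure how well the two models' rankings agree.
    probe_idx = random.sample(range(len(candidates)), probe_size)
    agreement = spearman([draft_scores[i] for i in probe_idx],
                         [target_loss(candidates[i]) for i in probe_idx])
    # 3. Adaptive filtering (simplified rule): the stronger the
    #    agreement, the fewer candidates the target model must re-score.
    keep = max(1, int(len(candidates) * (1 - max(agreement, 0.0))))
    filtered = sorted(range(len(candidates)),
                      key=lambda i: draft_scores[i])[:keep]
    # 4. Re-score only the filtered set with the target model.
    best = min(filtered, key=lambda i: target_loss(candidates[i]))
    return candidates[best]
```

In this sketch the expensive `target_loss` is called only `probe_size + keep` times per step instead of once per candidate, which is where the reported speedup would come from when the draft model ranks candidates similarly to the target.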