DrAttack is a novel jailbreaking method that uses prompt decomposition and reconstruction to bypass the safety mechanisms of large language models (LLMs). The method first decomposes a malicious prompt into sub-prompts, then implicitly reconstructs them through in-context learning with benign demonstration examples, and finally searches over synonyms of the sub-prompts to find the most effective variant. Because the malicious intent is fragmented across innocuous-looking pieces and only reassembled implicitly by the target model, the attack is harder for LLMs to detect and refuse.

This approach yields a substantially higher success rate than existing methods: 78.0% on GPT-4 with only 15 queries, surpassing the previous state of the art by 33.1%. The framework is effective across multiple open-source and closed-source LLMs, demonstrating both robustness and query efficiency. The synonym-search component contributes further gains by swapping sub-prompts for semantically equivalent phrasings that are more likely to elicit a compliant response.

The paper presents extensive empirical results showing that DrAttack outperforms existing jailbreaking techniques in both effectiveness and efficiency. The framework also remains resilient when tested against common defensive strategies, including content moderation, perplexity filtering, and random attack rejection. The study highlights the vulnerability of LLMs to decomposition-based jailbreaking attacks, underscores the need for more robust defense mechanisms, and contributes to the understanding of LLM security.
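To make the three-stage pipeline concrete, the sketch below illustrates the decompose / reconstruct / synonym-search structure at a purely schematic level. All names (`decompose`, `build_icl_prompt`, `synonym_search`) and the greedy search rule are hypothetical stand-ins, not the authors' implementation: the actual method uses a semantic parse tree for decomposition and an LLM-based judge to score responses, both of which are abstracted away here.

```python
# Minimal, hypothetical sketch of a DrAttack-style pipeline, assuming a
# naive phrase split and an injected scoring callable. Not the paper's code.
import itertools
from typing import Callable

def decompose(prompt: str) -> list[str]:
    """Split a prompt into sub-phrases. The paper derives these from a
    semantic parse tree; a fixed-width word grouping stands in for it here."""
    words = prompt.split()
    return [" ".join(words[i:i + 3]) for i in range(0, len(words), 3)]

def build_icl_prompt(sub_prompts: list[str], benign_demo: str) -> str:
    """Embed the sub-prompts as fill-in slots after a benign in-context
    demonstration, so the target model reconstructs the original request
    implicitly instead of seeing it verbatim."""
    slots = " ".join(f"[PART{i}]={s!r}" for i, s in enumerate(sub_prompts))
    return (f"{benign_demo}\n"
            f"Now combine the following parts and respond to the result:\n{slots}")

def synonym_search(sub_prompts: list[str],
                   candidates: list[list[str]],
                   score: Callable[[list[str]], float],
                   budget: int = 15) -> list[str]:
    """Greedy search over per-sub-prompt synonym candidates, keeping the
    highest-scoring combination within a fixed query budget. `candidates[i]`
    is assumed to include the original i-th sub-prompt."""
    best, best_score = sub_prompts, score(sub_prompts)
    for combo in itertools.islice(itertools.product(*candidates), budget - 1):
        s = score(list(combo))
        if s > best_score:
            best, best_score = list(combo), s
    return best
```

In the real pipeline, the `score` callable would wrap a query to the target model via `build_icl_prompt` followed by a judge model's assessment of the response; here it is left abstract, since the point is only the shape of the decomposition and the budget-limited synonym search, not a working attack.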