10 Jun 2024 | Weize Kong, Spurthi Amba Hombaiah, Mingyang Zhang, Qiaozhu Mei, Michael Bendersky
PRewrite is an automated prompt engineering method that uses reinforcement learning (RL) to optimize prompts for downstream tasks. It trains a large language model (LLM), the rewriter, to transform an initial, under-optimized prompt into a more effective one. At each training step, the rewriter generates a rewritten prompt, a task LLM uses that prompt to produce the final output, a reward is computed by comparing the output against the ground truth, and the rewriter is fine-tuned with RL to maximize that reward.
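To make the loop concrete, here is a minimal sketch of one training step. The `rewriter_llm`, `task_llm`, and `ppo_trainer` objects, the `META_PROMPT` wording, and the exact-match reward are all illustrative assumptions, not the paper's actual API or configuration (the paper trains the rewriter with PPO, but the details below are simplified).

```python
# Hypothetical meta prompt wrapping the initial prompt for the rewriter.
META_PROMPT = "Rewrite the following instruction into a better prompt:\n{prompt}"

def training_step(initial_prompt, batch, ppo_trainer, rewriter_llm, task_llm):
    # 1. The rewriter LLM turns the under-optimized prompt into a candidate.
    rewriter_input = META_PROMPT.format(prompt=initial_prompt)
    rewritten_prompt = rewriter_llm.generate(rewriter_input)

    rewards = []
    for example in batch:
        # 2. The task LLM answers using the rewritten prompt.
        output = task_llm.generate(f"{rewritten_prompt}\n{example['input']}")
        # 3. Reward compares the output against the ground truth
        #    (exact match here; the metric depends on the task).
        rewards.append(float(output.strip() == example["target"]))

    # 4. Only the rewriter's weights are updated; the task LLM stays frozen.
    reward = sum(rewards) / len(rewards)
    ppo_trainer.step(rewriter_input, rewritten_prompt, reward)
    return rewritten_prompt, reward
```

Note that the task LLM is treated purely as a black box here, which is what allows PRewrite to optimize prompts for models available only through an API.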
PRewrite addresses limitations of previous methods by producing interpretable prompts, allowing unconstrained exploration of the rewrite space, and leveraging larger models like PaLM 2. It proposes two rewriting strategies: an inference strategy, which generates a single rewritten prompt, and a search strategy, which samples multiple candidate rewrites from the rewriter LLM and keeps the best-performing one. The search strategy proves more effective at improving prompt performance.
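The two strategies might look roughly like the sketch below, reusing `META_PROMPT` from the previous snippet. The function names, the `temperature` argument, the candidate count `k`, and the `evaluate_on_dev` helper (which would score a prompt by running the task LLM over a dev set) are all assumptions for illustration.

```python
def rewrite_inference(rewriter_llm, initial_prompt):
    # Inference strategy: a single rewrite from the trained rewriter.
    return rewriter_llm.generate(META_PROMPT.format(prompt=initial_prompt))

def rewrite_search(rewriter_llm, task_llm, initial_prompt, dev_set, k=8):
    # Search strategy: sample k candidate rewrites and keep the one that
    # scores highest on a held-out dev set.
    candidates = [
        rewriter_llm.generate(META_PROMPT.format(prompt=initial_prompt),
                              temperature=1.0)
        for _ in range(k)
    ]
    return max(candidates,
               key=lambda p: evaluate_on_dev(task_llm, p, dev_set))
```

The extra cost of the search strategy is paid once, offline, at prompt-selection time; serving the chosen prompt afterwards costs the same as any hand-written prompt.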
Experiments on diverse benchmark datasets, spanning classification, question answering, and arithmetic reasoning tasks, demonstrate the effectiveness of PRewrite. It outperforms several baseline methods, including AutoPrompt, RLPrompt, TEMPERA, APE, OPRO, and Promptbreeder, and achieves state-of-the-art performance on tasks such as AG News, SST-2, GSM8K, and NQ. The gains are largest when the initial prompt leaves more room for improvement, and the search strategy consistently outperforms the inference strategy.
PRewrite also produces interpretable prompts, unlike methods such as RLPrompt that often generate uninterpretable text. This interpretability stems from using a capable LLM as the rewriter and from a KL-divergence penalty in the RL training process, which keeps the rewriter close to its pre-trained distribution. The method remains effective even with API-only access to larger models like PaLM 2. Its limitations include the use of a limited set of initial and meta prompts, and the authors note that exploring multiple rewritten prompts for greater diversity is a direction for further research.
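The KL penalty mentioned above is the standard trick from RL fine-tuning of LLMs: the task reward is discounted by how far the fine-tuned rewriter drifts from its frozen reference model. A minimal sketch, assuming per-token log-probabilities are available from both models and an illustrative coefficient `beta`:

```python
def kl_penalized_reward(task_reward, policy_logprobs, ref_logprobs, beta=0.1):
    # Per-token KL estimate between the fine-tuned rewriter (policy) and
    # its frozen pre-trained reference over the generated rewrite.
    kl = sum(p - r for p, r in zip(policy_logprobs, ref_logprobs))
    # Drifting toward unnatural, high-reward-but-gibberish text incurs a
    # penalty, which is what keeps the rewritten prompts human-readable.
    return task_reward - beta * kl
```

Without this term, reward maximization alone can push the rewriter toward the kind of uninterpretable token sequences that RLPrompt is known to produce.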