20 Jun 2024 | Can Jin, Hongwu Peng, Shiyu Zhao, Zhenting Wang, Wujiang Xu, Ligong Han, Jiahui Zhao, Kai Zhong, Sanguthevar Rajasekaran, Dimitris N. Metaxas
APEER is a novel automatic prompt engineering algorithm designed to enhance large language model (LLM) reranking in information retrieval (IR). The algorithm iteratively refines prompts through feedback and preference optimization, significantly improving performance over existing manual prompts. APEER reduces the human effort of prompt design and makes prompt optimization more effective for reranking tasks. It is evaluated on four LLMs and ten datasets, demonstrating substantial performance improvements and, in particular, strong transferability across diverse tasks and models. Extensive experiments show that APEER outperforms manual prompts and other methods on benchmarks including TREC and BEIR. Its ability to generate prompts that remain effective across different models and datasets highlights its practical utility in real-world applications. The study also explores the impact of training dataset size and the contribution of preference optimization to prompt quality. Overall, APEER provides a robust solution for improving LLM reranking performance with minimal human intervention.
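The iterative refinement described above can be illustrated with a small sketch. It is not the authors' implementation: the names `refine_prompt`, `toy_score`, `toy_propose`, and `HINTS` are hypothetical, and the toy functions stand in for the LLM calls and reranking metric that a real system would use. The sketch only shows the general shape of the loop: propose candidate prompts, keep the best-scoring one, and record (preferred, rejected) pairs that later proposals could draw on.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (preferred prompt, rejected prompt)


def refine_prompt(
    seed_prompt: str,
    score: Callable[[str], float],
    propose: Callable[[str, List[Pair]], List[str]],
    rounds: int = 3,
) -> Tuple[str, List[Pair]]:
    """Greedy loop: keep the best-scoring prompt and log preference pairs."""
    best, best_score = seed_prompt, score(seed_prompt)
    preferences: List[Pair] = []
    for _ in range(rounds):
        # Candidates are generated once per round from the current best prompt.
        for candidate in propose(best, preferences):
            candidate_score = score(candidate)
            if candidate_score > best_score:
                preferences.append((candidate, best))
                best, best_score = candidate, candidate_score
            else:
                preferences.append((best, candidate))
    return best, preferences


# Hypothetical stand-ins (assumptions, not part of APEER).
HINTS = [
    "Rank passages by relevance to the query.",
    "Output only the ranked passage identifiers.",
]


def toy_score(prompt: str) -> float:
    # Stand-in for a reranking metric measured on training queries.
    return float(sum(hint in prompt for hint in HINTS))


def toy_propose(prompt: str, preferences: List[Pair]) -> List[str]:
    # Stand-in for an LLM that rewrites the prompt; the preference
    # history is accepted but ignored in this toy version.
    return [prompt + " " + hint for hint in HINTS if hint not in prompt]
```

Calling `refine_prompt("You are a search assistant.", toy_score, toy_propose)` returns the seed prompt extended with both hints, together with the preference pairs collected along the way. In a real setting, `score` would run the reranker on held-out queries and `propose` would query an LLM with the current prompt and the preference history.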