APEER: Automatic Prompt Engineering Enhances Large Language Model Reranking


20 Jun 2024 | Can Jin, Hongwu Peng, Shiyu Zhao, Zhenhui Wang, Wujiang Xu, Ligong Han, Jiahui Zhao, Kai Zhong, Sanguthevar Rajasekaran, Dimitris N. Metaxas
The paper introduces APEER (Automatic Prompt Engineering Enhances Large Language Model Reranking), a novel automatic prompt engineering algorithm designed to improve the performance of large language models (LLMs) on information retrieval (IR) tasks, particularly relevance ranking. APEER aims to reduce the human effort required for prompt engineering by optimizing prompts through feedback and preference optimization: the algorithm iteratively generates refined prompts by gathering feedback on the current prompt and learning preferences from positive and negative prompt demonstrations. Extensive experiments with four LLMs (including GPT-4, LLaMA3, and Qwen2) and ten datasets from the TREC and BEIR benchmarks demonstrate that APEER significantly outperforms existing manual prompts and baseline methods in reranking performance. APEER prompts also transfer well across diverse datasets and model architectures, highlighting their practical utility in real-world applications. The paper closes by discussing the limitations of the work and emphasizing ethical considerations in the use of LLMs.
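The iterative refine-and-select loop described above can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: the helper callables `llm_call` and `evaluate_ndcg`, the meta-prompt wording, and the parameter names are hypothetical stand-ins for an LLM client and an IR evaluation routine (e.g., nDCG@10).

```python
from typing import Callable, List

def apeer_optimize(
    initial_prompt: str,
    train_queries: list,
    llm_call: Callable[[str], str],               # hypothetical hook to an LLM API
    evaluate_ndcg: Callable[[str, list], float],  # hypothetical reranking evaluator
    num_iterations: int = 10,
) -> str:
    """Sketch of APEER-style iterative prompt refinement (assumed structure)."""
    current = initial_prompt
    positives: List[str] = []  # prompt demonstrations that improved reranking
    negatives: List[str] = []  # prompt demonstrations that did not

    for _ in range(num_iterations):
        score = evaluate_ndcg(current, train_queries)

        # Feedback optimization: critique the current prompt, then rewrite it.
        feedback = llm_call(f"Critique this reranking prompt:\n{current}")
        candidate = llm_call(
            f"Improve the prompt using this feedback.\n"
            f"Feedback:\n{feedback}\nPrompt:\n{current}"
        )

        # Preference optimization: contrast positive and negative demonstrations.
        if positives and negatives:
            candidate = llm_call(
                "Rewrite the prompt to resemble the good example and avoid "
                f"the bad one.\nGood:\n{positives[-1]}\n"
                f"Bad:\n{negatives[-1]}\nPrompt:\n{candidate}"
            )

        # Keep the candidate only if it improves reranking quality;
        # otherwise record it as a negative demonstration.
        if evaluate_ndcg(candidate, train_queries) > score:
            positives.append(candidate)
            current = candidate
        else:
            negatives.append(candidate)

    return current
```

In this sketch, rejected candidates become negative demonstrations for later iterations, so the preference signal accumulates automatically without human-labeled prompts.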