Misconfidence-based Demonstration Selection for LLM In-Context Learning


12 Jan 2024 | Shangqing Xu and Chao Zhang
The article introduces In-Context Reflection (ICR), a method for selecting effective demonstrations for large language model (LLM) in-context learning. ICR aims to improve LLM performance by strategically selecting demonstrations that reduce the discrepancy between the LLM's output distribution and the task's actual input-output mappings. Unlike existing methods that rely on external supervision or frequent interactions with the LLM, ICR uses a metric called misconfidence, which measures how confidently the LLM misclassifies a given input, and selects demonstrations with high misconfidence to refine the prompt.

ICR starts with a random set of initial demonstrations and refines it iteratively. In each iteration, it scores a pool of candidate examples by misconfidence to identify those most likely to challenge the LLM's current understanding, and these most confusing examples replace the less informative demonstrations in the current set. The process continues until the prompt is optimized.
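The selection loop can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `score_labels` callable (assumed to return the LLM's probability for every candidate label given the current demonstrations and a query), the `Example` and `ScoreFn` type aliases, the `icr_select` function name, and the exact misconfidence formula used here (the highest probability assigned to any wrong label divided by the probability of the gold label) are assumptions made for the sketch.

```python
import random
from typing import Callable, Dict, List, Tuple

# An example is an (input_text, gold_label) pair. `score_labels` is a
# user-supplied function that prompts the LLM with the current demonstrations
# plus a query and returns a probability for every candidate label.
Example = Tuple[str, str]
ScoreFn = Callable[[List[Example], str], Dict[str, float]]


def misconfidence(label_probs: Dict[str, float], gold: str) -> float:
    """How confidently the model prefers a wrong label over the gold label.

    Sketched here as the maximum probability over incorrect labels divided by
    the probability of the correct label; values above 1 are confident mistakes.
    """
    wrong = max((p for label, p in label_probs.items() if label != gold), default=0.0)
    right = max(label_probs.get(gold, 0.0), 1e-12)  # guard against division by zero
    return wrong / right


def icr_select(pool: List[Example],
               score_labels: ScoreFn,
               k: int = 8,
               iterations: int = 3,
               seed: int = 0) -> List[Example]:
    """Iteratively refine a k-shot demonstration set.

    Start from random demonstrations, score every remaining candidate by
    misconfidence under the current prompt, and keep the k most confusing
    candidates as the next demonstration set.
    """
    rng = random.Random(seed)
    demos = rng.sample(pool, k)  # random initial demonstrations
    for _ in range(iterations):
        candidates = [ex for ex in pool if ex not in demos]
        ranked = sorted(
            candidates,
            key=lambda ex: misconfidence(score_labels(demos, ex[0]), ex[1]),
            reverse=True,
        )
        demos = ranked[:k]  # swap in the most confusing examples
    return demos
```

For simplicity, this sketch replaces the entire demonstration set each round; the paper describes replacing only the less informative demonstrations, so a partial-swap rule would match the described behavior more closely.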
The method was evaluated across five diverse datasets encompassing 13 subtasks, where it achieved an average performance boost of 4% over existing methods. The results show that ICR improves LLM performance across a variety of tasks and demonstrates strong cross-task generalization. Additionally, when ICR was tested on different tasks from the same task family, it performed comparably to uniform sampling, indicating its robustness.

The key contributions of the paper are: proposing misconfidence as a metric to quantify the discrepancy between the LLM's output distribution and the task's input-output mappings; introducing ICR as a method for selecting demonstrations that supply the "lacking knowledge" LLMs need to adapt to specific tasks; and demonstrating, through experiments on 13 tasks from 5 task sets, that prompts constructed with ICR are both effective and robust. The study also underscores the importance of carefully selecting demonstrations for in-context learning and shows that ICR is a computationally efficient approach to this challenge.