Many-Shot In-Context Learning

2024-5-24 | Rishabh Agarwal, Avi Singh, Lei M. Zhang, Bernd Bohnet, Luis Rosias, Stephanie Chan, Biao Zhang, Ankesh Anand, Zaheer Abbas, Azade Nova, John D. Co-Reyes, Eric Chu, Feryal Behbahani, Aleksandra Faust and Hugo Larochelle
Many-shot in-context learning (ICL) conditions a large language model (LLM) on a large number of input-output examples (shots) placed directly in the context, steering it toward a task without updating model weights. Enabled by recently expanded context windows, this approach yields significant performance improvements across tasks including problem-solving, question-answering, summarization, planning, and sentiment analysis. Unlike few-shot ICL, many-shot ICL can override pre-training biases, learn high-dimensional functions with numerical inputs, and perform comparably to fine-tuning. Its main bottleneck is the availability of human-generated rationales. To address this, the paper explores two approaches: Reinforced ICL, which replaces human rationales with model-generated ones, and Unsupervised ICL, which prompts with problem inputs alone. Both are effective in the many-shot regime, particularly for complex reasoning tasks.

The paper evaluates many-shot ICL on a wide range of tasks, including machine translation, summarization, planning, and reward modeling. Many-shot ICL consistently outperforms few-shot ICL, especially on tasks requiring complex reasoning. In machine translation, it achieves state-of-the-art results on low-resource languages such as Bemba and Kurdish; in summarization, it improves performance on XLSum but not on XSum; in planning, it significantly raises success rates on the Logistics domain; and in reward modeling, it outperforms prompting with human-generated solutions at verifying code correctness.

The paper also investigates whether many-shot ICL can overcome pre-training biases and learn non-NLP tasks. Results show that it can overcome such biases and perform comparably to fine-tuning on high-dimensional prediction tasks. However, the order of examples in the prompt can significantly affect performance, indicating that many-shot ICL is sensitive to example ordering.
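The core mechanic described above is simply packing a long list of shots into the prompt ahead of the test query. The sketch below is a minimal, hypothetical illustration (the helper name and the `Input:`/`Output:` labels are assumptions; the paper's actual prompt formats vary by task):

```python
def build_many_shot_prompt(examples, query, input_label="Input", output_label="Output"):
    """Concatenate many input-output shots, then the test query with an open output slot."""
    parts = [f"{input_label}: {inp}\n{output_label}: {out}" for inp, out in examples]
    parts.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(parts)

# Three sentiment shots for illustration; in the many-shot regime this list
# would hold hundreds or thousands of examples, enabled by long context windows.
shots = [
    ("I loved this film.", "positive"),
    ("Terrible service.", "negative"),
    ("The soup was fine.", "neutral"),
]
prompt = build_many_shot_prompt(shots, "What a fantastic day!")
```

The resulting string ends with an open `Output:` slot for the model to complete; scaling `shots` far beyond the few-shot regime is what distinguishes many-shot ICL.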
Additionally, the paper finds that next-token prediction loss is not a reliable indicator of ICL performance on problem-solving and reasoning tasks. The study concludes that many-shot ICL is a promising approach for improving LLM capabilities, enabling them to adapt to unseen tasks and domains. However, further research is needed to fully understand the limitations and potential of many-shot ICL.
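The two data-generation strategies discussed above, Reinforced ICL and Unsupervised ICL, can be sketched as follows. This is a hedged illustration rather than the paper's implementation: `sample_rationales` stands in for an actual LLM sampling call, and the `Answer:`-line convention for extracting a final answer is an assumption.

```python
def final_answer(rationale):
    # Assumption: each rationale ends with a line like "Answer: 42".
    return rationale.strip().splitlines()[-1].removeprefix("Answer:").strip()

def reinforced_icl_examples(problems, sample_rationales, n_samples=4):
    """Reinforced ICL sketch: keep model-generated rationales whose final
    answer matches the known reference answer, for use as in-context shots."""
    kept = []
    for problem, reference in problems:
        for rationale in sample_rationales(problem, n_samples):
            if final_answer(rationale) == reference:
                kept.append((problem, rationale))
                break  # one verified rationale per problem suffices here
    return kept

def unsupervised_icl_prompt(problems, query):
    """Unsupervised ICL sketch: show only the problem inputs, no solutions."""
    shown = "\n\n".join(f"Problem: {p}" for p, _ in problems)
    return f"{shown}\n\nProblem: {query}\nSolution:"
```

Filtering by final-answer correctness means Reinforced ICL needs reference answers but no human-written rationales, while Unsupervised ICL drops outputs from the prompt entirely.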