Many-Shot In-Context Learning

2024-5-24 | Rishabh Agarwal, Avi Singh, Lei M. Zhang, Bernd Bohnet, Luis Rosias, Stephanie Chan, Biao Zhang, Ankesh Anand, Zaheer Abbas, Azade Nova, John D. Co-Reyes, Eric Chu, Feryal Behbahani, Aleksandra Faust and Hugo Larochelle
Many-shot in-context learning (ICL) conditions a large language model (LLM) on a large number of input-output examples (shots) placed directly in the context, steering it toward a task without updating model weights. Enabled by recently expanded context windows, this approach yields significant performance improvements across tasks including problem-solving, question-answering, summarization, planning, and sentiment analysis. Unlike few-shot ICL, many-shot ICL can override pre-training biases, learn high-dimensional functions with numerical inputs, and perform comparably to fine-tuning. Its main bottleneck is the availability of human-generated rationales. To address this, the paper explores two approaches: Reinforced ICL, which replaces human rationales with model-generated ones, and Unsupervised ICL, which prompts with problem inputs alone. Both are effective in the many-shot regime, particularly for complex reasoning tasks.

The paper evaluates many-shot ICL on a wide range of tasks, including machine translation, summarization, planning, and reward modeling. Many-shot ICL consistently outperforms few-shot ICL, especially on tasks requiring complex reasoning. In machine translation, it achieves state-of-the-art results on low-resource languages such as Bemba and Kurdish; in summarization, it improves performance on XLSum but not on XSum; in planning, it significantly raises success rates on the Logistics domain; and in reward modeling, it outperforms prompting with human-generated solutions at verifying code correctness.

The paper also investigates whether many-shot ICL can overcome pre-training biases and learn non-NLP tasks. Results show that it can overcome such biases and perform comparably to fine-tuning on high-dimensional prediction tasks. However, the order of examples in the prompt can significantly affect performance, indicating that many-shot ICL is sensitive to example ordering.
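The core mechanic described above is simply packing a long list of shots into the prompt ahead of the test query. The sketch below is a minimal, hypothetical illustration (the helper name and the `Input:`/`Output:` labels are assumptions; the paper's actual prompt formats vary by task):

```python
def build_many_shot_prompt(examples, query, input_label="Input", output_label="Output"):
    """Concatenate many input-output shots, then the test query with an open output slot."""
    parts = [f"{input_label}: {inp}\n{output_label}: {out}" for inp, out in examples]
    parts.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(parts)

# Three sentiment shots for illustration; in the many-shot regime this list
# would hold hundreds or thousands of examples, enabled by long context windows.
shots = [
    ("I loved this film.", "positive"),
    ("Terrible service.", "negative"),
    ("The soup was fine.", "neutral"),
]
prompt = build_many_shot_prompt(shots, "What a fantastic day!")
```

The resulting string ends with an open `Output:` slot for the model to complete; scaling `shots` far beyond the few-shot regime is what distinguishes many-shot ICL.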
Additionally, the paper finds that next-token prediction loss is not a reliable indicator of ICL performance on problem-solving and reasoning tasks. The study concludes that many-shot ICL is a promising approach for improving LLM capabilities, enabling them to adapt to unseen tasks and domains. However, further research is needed to fully understand the limitations and potential of many-shot ICL.
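The two data-generation strategies discussed above, Reinforced ICL and Unsupervised ICL, can be sketched as follows. This is a hedged illustration rather than the paper's implementation: `sample_rationales` stands in for an actual LLM sampling call, and the `Answer:`-line convention for extracting a final answer is an assumption.

```python
def final_answer(rationale):
    # Assumption: each rationale ends with a line like "Answer: 42".
    return rationale.strip().splitlines()[-1].removeprefix("Answer:").strip()

def reinforced_icl_examples(problems, sample_rationales, n_samples=4):
    """Reinforced ICL sketch: keep model-generated rationales whose final
    answer matches the known reference answer, for use as in-context shots."""
    kept = []
    for problem, reference in problems:
        for rationale in sample_rationales(problem, n_samples):
            if final_answer(rationale) == reference:
                kept.append((problem, rationale))
                break  # one verified rationale per problem suffices here
    return kept

def unsupervised_icl_prompt(problems, query):
    """Unsupervised ICL sketch: show only the problem inputs, no solutions."""
    shown = "\n\n".join(f"Problem: {p}" for p, _ in problems)
    return f"{shown}\n\nProblem: {query}\nSolution:"
```

Filtering by final-answer correctness means Reinforced ICL needs reference answers but no human-written rationales, while Unsupervised ICL drops outputs from the prompt entirely.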