Dual Operating Modes of In-Context Learning

2024 | Ziqian Lin, Kangwook Lee
This paper studies the dual operating modes of in-context learning (ICL) in large language models (LLMs): task learning and task retrieval. The authors propose a probabilistic model of pretraining data that assumes a latent clustered structure with multiple task groups and task-dependent input distributions, which makes it possible to analyze both modes quantitatively. The analysis explains the "early ascent" phenomenon, in which ICL risk first increases and then decreases as more in-context examples are provided: with only a few examples, the model may retrieve an incorrect skill, and this error is corrected as additional examples accumulate. The analysis also predicts a bounded efficacy of ICL under biased labels: performance improves at first but eventually degrades as the model learns the bias from the in-context examples. Experiments with Transformers trained from scratch and with pretrained LLMs support these predictions. The work deepens the understanding of ICL and sets the stage for further research in this area.
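The sketch below is a minimal illustration of the two modes, not the paper's actual model or code. It assumes a toy setup: 1-D linear regression tasks y = w*x + noise, with a two-cluster Gaussian mixture prior over w standing in for the latent task groups. Tracking the Bayes-optimal predictor as more in-context examples arrive shows the shift from task retrieval (the estimate hugs the retrieved cluster mean) to task learning (the estimate converges to the true w).

```python
# Toy illustration of the two ICL modes (task retrieval vs. task learning).
# Assumption: 1-D linear regression y = w*x + noise, with a two-component
# Gaussian mixture prior over w playing the role of latent task groups.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

sigma2 = 0.25                  # observation noise variance
tau2 = 0.05                    # within-cluster variance of w
mus = np.array([-2.0, 2.0])    # cluster means for w ("task groups")
pis = np.array([0.5, 0.5])     # mixing weights

w_true = 1.6                   # true task: near cluster 2 but not its mean


def posterior(xs, ys):
    """Posterior over clusters and posterior-mean of w given in-context examples."""
    log_w, means = [], []
    for mu, pi in zip(mus, pis):
        # Marginal likelihood under this cluster: y ~ N(mu*x, sigma2*I + tau2*x x^T)
        cov = sigma2 * np.eye(len(xs)) + tau2 * np.outer(xs, xs)
        log_w.append(np.log(pi) + multivariate_normal.logpdf(ys, mean=mu * xs, cov=cov))
        # Conjugate Gaussian posterior over w within this cluster
        prec = 1.0 / tau2 + xs @ xs / sigma2
        means.append((mu / tau2 + xs @ ys / sigma2) / prec)
    log_w = np.array(log_w)
    weights = np.exp(log_w - log_w.max())
    weights /= weights.sum()
    return weights, np.array(means)


xs_all = rng.normal(size=64)
ys_all = w_true * xs_all + np.sqrt(sigma2) * rng.normal(size=64)

for k in [1, 2, 4, 8, 16, 32, 64]:
    weights, means = posterior(xs_all[:k], ys_all[:k])
    w_hat = weights @ means    # posterior-mean estimate of w
    print(f"k={k:2d}  cluster posterior={np.round(weights, 3)}  w_hat={w_hat:+.2f}")

# With few examples the predictor leans on the retrieved cluster mean
# (task retrieval); with many examples w_hat approaches w_true (task learning).
```

Under this toy prior, early predictions are pulled toward the nearest cluster mean; if the nearest cluster were the wrong one, risk would rise before falling, which is the intuition behind the early-ascent behavior described above.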