In-Context Language Learning: Architectures and Algorithms

30 Jan 2024 | Ekin Akyürek, Bailin Wang, Yoon Kim, Jacob Andreas
In-context language learning (ICLL) is a critical capability of large-scale neural language models (LMs), enabling them to infer novel functions from examples supplied in their input. This paper investigates ICLL through a new family of model problems focused on regular languages generated by random finite automata. The study evaluates a range of neural sequence models, including Transformers, RNNs, and convolutional models, on ICLL tasks, aiming to answer three key questions: (1) Which model classes are capable of ICLL? (2) What algorithmic solutions do successful models use? (3) How can architectural changes improve ICLL in less performant models?

Transformers significantly outperform the other models on ICLL tasks, particularly at generating strings from regular languages. Their success is attributed to specialized "n-gram heads" that compute input-conditional next-token distributions. Hard-wiring these heads into other models improves performance on both synthetic ICLL and natural language modeling, reducing perplexity by up to 1.14 points on the SlimPajama dataset.

The study introduces REGBENCH, a benchmark dataset for ICLL consisting of problem instances generated from probabilistic finite automata. Evaluations on tasks such as in-context language learning and associative recall show that Transformers are more data-efficient and perform better than the other model classes.
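To give a sense of what such a problem instance looks like, the sketch below samples strings from a small random probabilistic finite automaton and formats them as an in-context prompt. The automaton sizes, separator, and function names are illustrative assumptions, not the actual REGBENCH generation code.

```python
import random

def random_pfa(n_states=4, alphabet="abcd", n_edges_per_state=2, seed=0):
    """Build a small random probabilistic finite automaton (PFA).

    Each state gets a few outgoing edges, each labeled with a symbol and a
    transition probability. Sizes here are illustrative only, not the
    actual REGBENCH settings.
    """
    rng = random.Random(seed)
    pfa = {}
    for state in range(n_states):
        edges = [(rng.choice(alphabet), rng.randrange(n_states))
                 for _ in range(n_edges_per_state)]
        weights = [rng.random() for _ in edges]
        total = sum(weights)
        pfa[state] = [(sym, tgt, w / total) for (sym, tgt), w in zip(edges, weights)]
    return pfa

def sample_string(pfa, max_len=10, stop_prob=0.2, rng=None):
    """Sample one string by walking the PFA from state 0."""
    rng = rng or random.Random()
    state, out = 0, []
    while len(out) < max_len and rng.random() > stop_prob:
        symbols, targets, probs = zip(*pfa[state])
        i = rng.choices(range(len(probs)), weights=probs)[0]
        out.append(symbols[i])
        state = targets[i]
    return "".join(out)

# An ICLL prompt: several strings drawn from the same (hidden) language;
# the model must continue the prompt with another string from that language.
pfa = random_pfa(seed=42)
rng = random.Random(7)
examples = [sample_string(pfa, rng=rng) for _ in range(5)]
print(" | ".join(examples) + " | ")
```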
These results indicate that the ability to compute in-context n-gram statistics is central to ICLL, and that adding n-gram heads to other architectures improves their performance. The paper highlights the value of ICLL as a model problem for understanding ICL in natural text and suggests that the setup can be extended to more complex languages, offering insights into real-world language modeling. The findings add to a growing body of evidence that some instances of ICL are best understood as LMs simulating smaller models using known parameter estimation and inference algorithms. Overall, the study provides a framework for evaluating and improving neural sequence models through the lens of ICLL.
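To make "in-context n-gram statistics" concrete, the following minimal sketch estimates a next-token distribution purely from bigram counts inside the prompt, which is the quantity an in-context bigram (2-gram) head would approximate. It is a conceptual illustration with a made-up token sequence, not the paper's actual attention-head construction.

```python
from collections import Counter

def in_context_bigram_distribution(context):
    """Next-token distribution conditioned on the last token of the context,
    estimated only from bigram counts within the context itself.

    Wherever the current token appeared earlier in the prompt, collect the
    token that followed it and normalize those counts. (Conceptual sketch,
    not the paper's head construction.)
    """
    query = context[-1]
    counts = Counter(
        nxt for prev, nxt in zip(context[:-1], context[1:]) if prev == query
    )
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()} if total else {}

# Example: after "b", the context contains "b a" twice and "b c" once,
# so the in-context estimate is P(a|b) = 2/3 and P(c|b) = 1/3.
tokens = list("abacbabcb")
print(in_context_bigram_distribution(tokens))
```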