10 Apr 2024 | Aaditya K. Singh, Ted Moskovitz, Felix Hill, Stephanie C. Y. Chan, Andrew M. Saxe
This paper investigates the emergence and dynamics of induction heads (IHs) in transformer models, focusing on their role in in-context learning (ICL). IHs perform a match-and-copy operation: when the current token appeared earlier in the context, they attend to the token that followed that earlier occurrence and copy it forward. They are critical for ICL and often emerge abruptly, coinciding with a phase change in the training loss. The study uses a controlled setting with synthetic data to develop an optogenetics-inspired causal framework for modifying activations throughout training. This framework identifies three underlying subcircuits whose interaction drives IH formation and produces the phase change. The research reveals that IHs operate additively and exhibit emergent redundancy, with multiple IHs jointly contributing to minimizing the loss.
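To make the match-and-copy operation concrete, here is a minimal sketch of an idealized induction circuit acting directly on a token list. It assumes hard attention (a single most-recent match) and has no learned weights, so it is not the authors' model or code. The function names previous_token_features and induction_predict are hypothetical: the first mimics a previous-token head that writes each position's predecessor, and the second matches the current token against those stored predecessors and copies the token that followed.

```python
from typing import List, Optional


def previous_token_features(tokens: List[str]) -> List[Optional[str]]:
    """Idealized previous-token head: position i stores the token at i - 1."""
    return [None] + tokens[:-1]


def induction_predict(tokens: List[str]) -> Optional[str]:
    """Idealized induction head applied at the last position.

    Match: find a position j whose stored previous token equals the current
    (last) token. Copy: return tokens[j], the token that followed that earlier
    occurrence. Uses the most recent match (hard attention); returns None if
    nothing matches.
    """
    if not tokens:
        return None
    prev = previous_token_features(tokens)
    current = tokens[-1]
    # Scan from the most recent position back to position 1
    # (position 0 has no previous token).
    for j in range(len(tokens) - 1, 0, -1):
        if prev[j] == current:
            return tokens[j]
    return None
```

For example, the sequence A B C D A yields B, because the earlier A was followed by B. A trained IH uses soft attention and can weight several matching positions rather than picking exactly one.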
The study also identifies a many-to-many wiring pattern between previous-token heads and IHs, indicating that these heads perform redundant operations. By clamping subsets of activations, the authors demonstrate that the phase change in IH formation is driven by the interaction of these subcircuits. Additionally, data-dependent properties of IH formation, such as the timing of the phase change, are explained by the individual dynamics of these subcircuits. The findings shed light on the mechanisms underlying ICL and give a more detailed picture of the training dynamics of transformer models.
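The ideas of additive heads and clamping can be illustrated with a toy NumPy sketch. It rests on simplifying assumptions: each head's contribution to the residual stream is a fixed vector, and the contributions are summed. The function combine_heads and its clamps argument are hypothetical names. Clamping here simply overwrites a chosen head's output with a supplied vector (zeros to ablate it, or any stored value), which is only loosely analogous to the paper's framework for modifying activations throughout training.

```python
from typing import Dict, Optional

import numpy as np


def combine_heads(head_outputs: np.ndarray,
                  clamps: Optional[Dict[int, np.ndarray]] = None) -> np.ndarray:
    """Sum per-head outputs into one residual-stream update.

    head_outputs: array of shape (n_heads, d_model).
    clamps: optional map from head index to a fixed vector that replaces
        that head's output (a hypothetical stand-in for clamping).
    """
    outputs = head_outputs.copy()  # leave the caller's array untouched
    for h, value in (clamps or {}).items():
        outputs[h] = value
    # Heads act additively: the update is the sum of their contributions.
    return outputs.sum(axis=0)


# Toy setup: heads 0 and 1 each write half of a "copy" signal along the
# first axis; head 2 writes something unrelated.
copy_direction = np.array([1.0, 0.0, 0.0])
heads = np.stack([
    0.5 * copy_direction,
    0.5 * copy_direction,
    np.array([0.0, 0.2, 0.0]),
])

full = combine_heads(heads)                               # copy signal 1.0
ablated = combine_heads(heads, clamps={0: np.zeros(3)})  # copy signal 0.5
```

Because the two copy heads are redundant in this toy setup, clamping one of them to zero halves the copy signal instead of removing it, a crude picture of the redundancy described above.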