25 Jul 2024 | Jie Ren*1,2, Qipeng Guo†2, Hang Yan2,4, Dongrui Liu1, Quanshi Zhang1, Xipeng Qiu3, Dahua Lin2,4
This paper investigates the semantic induction heads in large language models (LLMs) to better understand their in-context learning (ICL) capabilities. The authors analyze the operations of attention heads, focusing on two types of relationships between tokens: syntactic dependencies and semantic relationships within knowledge graphs. They find that certain attention heads exhibit a pattern where they recall tail tokens and increase the output logits of those tail tokens when attending to head tokens. These semantic induction heads are closely correlated with the emergence of ICL in LLMs. The study categorizes ICL into three levels: loss reduction, format compliance, and pattern discovery, and observes their gradual emergence during training. The results show that semantic induction heads play a crucial role in facilitating ICL, particularly in pattern discovery. The paper also discusses the limitations of the study and suggests future directions for research.
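To make the abstract's description concrete, below is a minimal sketch (not the authors' code) of how one might probe a single attention head for the semantic-induction pattern: capture that head's write to the residual stream and check whether it raises the logit of a known tail token when the prompt contains the head token. The model (GPT-2), the layer/head indices, the prompt, and the target token are illustrative assumptions, and the final layer norm is ignored for simplicity, so the number is only an approximate logit contribution.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tok = GPT2Tokenizer.from_pretrained("gpt2")

LAYER, HEAD = 9, 6                    # hypothetical head to inspect
prompt = "The capital of France is"   # head token: "France"; expected tail token: " Paris"
tail_id = tok.encode(" Paris")[0]

captured = {}

def grab_head_inputs(module, inputs):
    # Input to c_proj has shape (batch, seq, n_head * head_dim),
    # i.e. the per-head outputs before they are mixed together.
    captured["x"] = inputs[0].detach()

attn = model.transformer.h[LAYER].attn
handle = attn.c_proj.register_forward_pre_hook(grab_head_inputs)

with torch.no_grad():
    ids = tok(prompt, return_tensors="pt").input_ids
    model(ids)
handle.remove()

n_head = model.config.n_head
d_head = model.config.n_embd // n_head
x_last = captured["x"][0, -1]                         # per-head outputs at the last position
x_head = x_last[HEAD * d_head:(HEAD + 1) * d_head]    # slice belonging to the chosen head
W_o = attn.c_proj.weight[HEAD * d_head:(HEAD + 1) * d_head, :]  # Conv1D weight is (in, out)
head_out = x_head @ W_o                               # this head's write to the residual stream
logit_boost = head_out @ model.lm_head.weight[tail_id]
print(f"layer {LAYER} head {HEAD} contribution to logit of ' Paris': {logit_boost.item():+.3f}")
```

A head behaving like a semantic induction head in this sense would yield a consistently positive contribution to the tail token's logit across many (head, relation, tail) prompts, whereas unrelated heads would contribute values near zero.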