25 Jul 2024 | Jie Ren, Qipeng Guo, Hang Yan, Dongrui Liu, Quanshi Zhang, Xipeng Qiu, Dahua Lin
This paper investigates the role of semantic induction heads in understanding in-context learning (ICL) in large language models (LLMs). The authors analyze the operations of attention heads in LLMs to better understand how these models process and learn from contextual information. They find that certain attention heads encode semantic relationships between tokens, such as syntactic dependencies and relations drawn from knowledge graphs. These semantic induction heads are crucial for the emergence of ICL ability in LLMs: the study shows that their formation is closely correlated with the development of ICL over the course of training.
The authors categorize ICL into three levels of capability: loss reduction, format compliance, and pattern discovery. They observe that these levels emerge sequentially during training. The results show that semantic induction heads facilitate ICL by encoding and leveraging semantic relationships between tokens. The study provides new insights into the mechanisms of in-context learning and enhances our understanding of how attention heads operate in transformers, highlighting the importance of attention-head behavior for improving the learning capabilities of LLMs.
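For readers unfamiliar with the underlying mechanism, the classic (token-level) induction head from prior interpretability work attends from the current token to the token that *followed* an earlier occurrence of that same token, enabling `[A][B] … [A] → [B]` copying; the "semantic" heads studied here generalize this from literal token matches to semantic relations. A minimal, self-contained sketch (illustrative only, not the paper's code; `induction_score` and its inputs are hypothetical names) of how such a token-level induction pattern can be scored from an attention matrix:

```python
def induction_targets(tokens):
    """For each position i, list earlier positions j such that tokens[j-1]
    equals tokens[i], i.e. j immediately follows a previous copy of tokens[i].
    A perfect induction head would put its attention mass on these targets."""
    return {i: [j for j in range(1, i) if tokens[j - 1] == tok]
            for i, tok in enumerate(tokens)}

def induction_score(tokens, attention):
    """Average attention mass placed on induction targets, taken over
    positions that have at least one target. `attention[i][j]` is the
    weight from query position i to key position j (rows sum to 1)."""
    targets = induction_targets(tokens)
    per_position = [sum(attention[i][j] for j in tgt)
                    for i, tgt in targets.items() if tgt]
    return sum(per_position) / len(per_position) if per_position else 0.0

# Repeated sequence "a b c a b c": an idealized induction head at the
# second "a" (index 3) attends to index 1 ("b"), and so on.
tokens = list("abcabc")
n = len(tokens)
tgt = induction_targets(tokens)
attn = [[0.0] * n for _ in range(n)]
for i in range(n):
    if tgt[i]:
        attn[i][tgt[i][0]] = 1.0   # all mass on the induction target
    else:
        attn[i][i] = 1.0           # no target: attend to self

print(induction_score(tokens, attn))  # a perfect head scores 1.0
```

Averaging this score over a dataset of repeated sequences is one common way such heads are located in real models; the paper's semantic variant instead checks whether attention lands on semantically related tokens (e.g. syntactic heads or knowledge-graph neighbors) rather than exact repeats.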