Emergence of a High-Dimensional Abstraction Phase in Language Transformers

24 May 2024 | Emily Cheng, Diego Doimo, Corentin Kervadec, Iuri Macocco, Jade Yu, Alessandro Laio, Marco Baroni
The paper "Emergence of a High-Dimensional Abstraction Phase in Language Transformers" by Emily Cheng and colleagues explores the geometric properties of language model (LM) representations and their role in linguistic processing. The study analyzes five pre-trained transformer-based LMs on three input datasets and reveals a distinct phase characterized by high intrinsic dimensionality (ID). This phase is marked by:

1. **High Intrinsic Dimensionality**: The representations in this phase correspond to the first full linguistic abstraction of the input.
2. **Transferability**: These representations are the first that transfer effectively to downstream tasks.
3. **Cross-Model Prediction**: Representations in this phase are mutually predictable across different LMs.
4. **Performance Correlation**: An earlier onset of this phase strongly predicts better language modeling performance.

The authors use geometric tools, in particular the Generalized Ratios Intrinsic Dimension Estimator (GRIDE), to estimate the intrinsic dimension of the representations at each layer. They find that the high-dimensional phase emerges in the intermediate layers of the models, indicating that these layers encode complex, abstract linguistic information. The phase marks a transition in layer function, from processing superficial to abstract linguistic information, and is crucial to the model's ability to predict the next token.
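To make the per-layer measurement concrete, here is a minimal, self-contained sketch of intrinsic-dimension estimation using the TwoNN estimator, which is the k=1 special case of the GRIDE family referenced in the paper. The function name `two_nn_id` and the synthetic stand-in data are illustrative assumptions; the authors' actual pipeline applies GRIDE at multiple scales to token representations extracted from each transformer layer.

```python
import numpy as np
from scipy.spatial import cKDTree

def two_nn_id(X: np.ndarray) -> float:
    """Maximum-likelihood TwoNN intrinsic-dimension estimate
    (the k=1 special case of the GRIDE family): uses the ratio of
    each point's second to first nearest-neighbor distance."""
    tree = cKDTree(X)
    # k=3 returns, for each point: itself, its 1st NN, and its 2nd NN
    dists, _ = tree.query(X, k=3)
    r1, r2 = dists[:, 1], dists[:, 2]
    mask = r1 > 0  # drop exact duplicates, which give zero distances
    mu = r2[mask] / r1[mask]
    return mask.sum() / np.sum(np.log(mu))

# Toy stand-in for per-layer hidden states: in the paper's setting these
# would be representations extracted from each layer of a pre-trained LM.
rng = np.random.default_rng(0)
n_layers, n_tokens, hidden_dim = 12, 2000, 768
for layer in range(n_layers):
    latent_dim = 5 + layer  # hypothetical: latent dimension varies by layer
    Z = rng.normal(size=(n_tokens, latent_dim))
    X = Z @ rng.normal(size=(latent_dim, hidden_dim))  # embed into R^768
    print(f"layer {layer:2d}: estimated ID ~ {two_nn_id(X):.1f}")
```

Because each synthetic layer lies on a low-dimensional linear subspace of the 768-dimensional ambient space, the estimator recovers roughly the latent dimension rather than the embedding dimension, which is the property that makes ID a useful per-layer signature.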
The study further argues that the high-dimensional phase is a geometric signature of learned structure, and that the representations from this phase are the ones most readily transferred to downstream tasks. Discussing the implications for overall model quality, the authors note that better models show higher ID peaks and an earlier onset of the phase. Overall, the findings suggest that a central high-dimensionality phase underlies core linguistic processing in many common LM architectures, offering insight into both the geometric and functional aspects of language modeling.
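As an illustration of how cross-model predictability of representations can be tested, the sketch below fits a ridge-regression probe that maps one model's layer representations onto another's and reports held-out R². The variable names and the synthetic data are hypothetical stand-ins; the paper's analysis uses activations actually extracted from the compared LMs on the same inputs.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical stand-ins for layer representations of the same inputs
# from two different LMs with different hidden sizes.
n_sents, dim_a, dim_b = 3000, 768, 1024
shared = rng.normal(size=(n_sents, 64))  # shared latent "linguistic" content
reps_a = shared @ rng.normal(size=(64, dim_a)) + 0.1 * rng.normal(size=(n_sents, dim_a))
reps_b = shared @ rng.normal(size=(64, dim_b)) + 0.1 * rng.normal(size=(n_sents, dim_b))

# Fit a linear map from model A's layer to model B's layer; a high held-out
# R^2 indicates the two representations are mutually predictable.
Xa_tr, Xa_te, Xb_tr, Xb_te = train_test_split(reps_a, reps_b, random_state=0)
probe = Ridge(alpha=1.0).fit(Xa_tr, Xb_tr)
print(f"cross-model held-out R^2: {probe.score(Xa_te, Xb_te):.3f}")
```

The same probing setup, with labels from a downstream task in place of the second model's representations, is one common way to operationalize the transferability claim.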