24 May 2024 | Emily Cheng, Diego Doimo, Corentin Kervadec, Iuri Macocco, Jade Yu, Alessandro Laio, Marco Baroni
This paper investigates the emergence of a high-dimensional abstraction phase in language transformers, revealing how representations evolve across layers. The study analyzes five pre-trained transformer-based language models (LMs) and three input datasets, identifying a distinct phase characterized by high intrinsic dimensionality (ID). During this phase, representations correspond to the first full linguistic abstraction of the input, are the first to transfer to downstream tasks, and predict each other across different LMs. The onset of this phase strongly predicts better language modeling performance, suggesting that a central high-dimensionality phase underlies core linguistic processing in many common LM architectures.
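This summary does not specify which ID estimator the paper uses; a common choice in this literature is the TwoNN estimator of Facco et al. (2017), and the minimal sketch below (with illustrative function name `twonn_id`, using numpy and scipy) shows how an estimator of this family recovers dimension from nearest-neighbor distance ratios.

```python
import numpy as np
from scipy.spatial.distance import cdist

def twonn_id(X: np.ndarray) -> float:
    """Estimate intrinsic dimension with the TwoNN estimator (Facco et al., 2017).

    For each point, the ratio mu = r2 / r1 of its second- to first-nearest-
    neighbor distance follows a Pareto distribution whose shape parameter is
    the intrinsic dimension d, so d has a closed-form maximum-likelihood fit.
    """
    dists = cdist(X, X)                  # pairwise Euclidean distances
    dists.sort(axis=1)                   # each row: 0 (self), r1, r2, ...
    r1, r2 = dists[:, 1], dists[:, 2]
    mu = r2[r1 > 0] / r1[r1 > 0]         # drop duplicate points (r1 == 0)
    return len(mu) / np.sum(np.log(mu))  # MLE of the Pareto exponent

# Sanity check: ~10-dimensional data linearly embedded in 768 dimensions
# should yield an estimate close to 10 despite the high ambient dimension.
X = np.random.randn(2000, 10) @ np.random.randn(10, 768)
print(twonn_id(X))
```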
The research explores how the intrinsic dimension of representations changes across layers, revealing a profile that generalizes across models and inputs. The ID peak is significantly reduced in the presence of random text and absent in untrained models, and the layer at which the peak appears correlates with LM quality. The highest-dimensional representations of different networks predict each other, whereas neither the initial representation of the input nor the representations in later layers do. The ID peak thus marks an approximate boundary between representations that perform poorly and those that perform fairly well on syntactic and semantic probing tasks, as well as in transfer to downstream NLP tasks.
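As a concrete sketch of how such a layer-wise profile could be computed, the following code extracts per-layer hidden states with the Hugging Face transformers library and applies the `twonn_id` estimator from above. Representing each input by its final-token hidden state, and the model/corpus names in the usage comment, are assumptions of this sketch rather than details confirmed by the summary.

```python
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

def id_profile(model_name: str, texts: list[str]) -> list[float]:
    """Estimate one intrinsic dimension per layer of a pre-trained LM.

    Each input contributes one point per layer (here: the hidden state of
    its final token); the TwoNN estimator is then applied layer by layer.
    """
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
    model.eval()

    per_layer: list[list[np.ndarray]] = []
    with torch.no_grad():
        for text in texts:
            inputs = tok(text, return_tensors="pt", truncation=True)
            hidden = model(**inputs).hidden_states  # embeddings + every layer
            if not per_layer:
                per_layer = [[] for _ in hidden]
            for vecs, h in zip(per_layer, hidden):
                vecs.append(h[0, -1].numpy())       # final-token vector

    return [twonn_id(np.stack(vecs)) for vecs in per_layer]

# profile = id_profile("gpt2", corpus_sentences)  # peak expected mid-network
```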
The study also finds that the ID peak is associated with a transition to abstract linguistic processing: layers around the peak retain less surface-form information while excelling at syntactic and semantic tasks, and their representations are the first whose content transfers to downstream tasks. The results further suggest that higher ID peaks and an earlier peak onset correlate with better language modeling performance.
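The probing setup itself is not detailed in this summary; a minimal version of the standard recipe, assuming scikit-learn and a logistic-regression probe over frozen layer features, might look as follows. Comparing scores for surface-form targets (e.g., sentence length) against syntactic or semantic targets, layer by layer, is one way to surface the transition described above.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def probe_score(features, labels, folds: int = 5) -> float:
    """Cross-validated accuracy of a linear probe on frozen representations.

    A layer "contains" a property (e.g., part of speech, surface length) to
    the extent that a simple classifier can decode it from that layer alone;
    comparing scores across layers locates where abstract information emerges.
    """
    probe = LogisticRegression(max_iter=1000)
    return cross_val_score(probe, features, labels, cv=folds).mean()
```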
The paper also explores the relationship between ID and the ability of LMs to transfer to downstream tasks, finding the same pattern: higher ID peaks and an earlier peak onset are associated with better performance. The study highlights the importance of the high-dimensional processing phase, which is where the model builds abstract representations of the input and from which this information transfers to downstream tasks. The findings suggest that the high-dimensional phase is a key factor in language model performance.
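Finally, the claim that the highest-dimensional representations of different networks predict each other can be operationalized as a linear map between representation spaces. The sketch below, assuming scikit-learn's ridge regression and row-aligned feature matrices from two models on the same inputs, scores such a map by held-out R².

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

def cross_model_r2(X_src: np.ndarray, X_tgt: np.ndarray) -> float:
    """Held-out R^2 of a linear map from one LM's layer to another's.

    X_src and X_tgt hold representations of the same inputs from two models
    (rows aligned). A high score means the source layer linearly predicts
    the target layer, the sense in which high-ID layers "predict each other".
    """
    Xtr, Xte, Ytr, Yte = train_test_split(X_src, X_tgt, test_size=0.2,
                                          random_state=0)
    reg = Ridge(alpha=1.0).fit(Xtr, Ytr)
    return r2_score(Yte, reg.predict(Xte))
```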