Linguistic Collapse: Neural Collapse in (Large) Language Models

28 May 2024 | Robert Wu, Vardan Papyan
Neural collapse (NC) is a phenomenon observed in classification tasks where top-layer representations collapse onto their class means, which in turn become equinorm, equiangular, and aligned with the classifier weights. These properties are associated with generalization and robustness, and they typically emerge under specific conditions: few classes, balanced classes, noise-free labels, and training past zero loss. This paper investigates NC in language modeling, where these conditions are not met: language models, including large language models (LLMs), are trained on heavily imbalanced vocabularies and are often undertrained. The study empirically examines how scaling the architectures and training of causal language models (CLMs) affects their progression towards NC. The results link NC properties to generalization, and some of these relationships persist independently of scale. The findings suggest that NC extends beyond classification to language modeling, offering insight into LLMs and neural networks more broadly and motivating further research on the phenomenon.
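The collapse properties named above (vanishing within-class variability, equinorm and equiangular class means, and alignment with the classifier) can be estimated directly from last-layer features. The sketch below is a minimal, hypothetical illustration of such measurements, not the authors' code: the names `features`, `labels`, and `classifier_weights` are placeholders, and in a CLM the "classes" would be vocabulary tokens and the classifier the unembedding matrix.

```python
# Minimal sketch (assumed setup, not the paper's implementation) of
# classic neural-collapse statistics computed from last-layer features.
import numpy as np

def nc_metrics(features, labels, classifier_weights):
    """Estimate NC1-NC3 style statistics.

    features:           (N, d) last-layer representations
    labels:             (N,)   integer class index per sample (token id in a CLM)
    classifier_weights: (C, d) rows of the final linear (unembedding) layer
    """
    classes = np.unique(labels)
    d = features.shape[1]
    mu_global = features.mean(axis=0)

    # Class means, centered at the global mean.
    class_means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    centered_means = class_means - mu_global

    # Within-class and between-class scatter matrices.
    sigma_w = np.zeros((d, d))
    for c, mu_c in zip(classes, class_means):
        diffs = features[labels == c] - mu_c
        sigma_w += diffs.T @ diffs / len(features)
    sigma_b = centered_means.T @ centered_means / len(classes)

    # NC1: within-class variability relative to between-class variability.
    nc1 = np.trace(sigma_w @ np.linalg.pinv(sigma_b)) / len(classes)

    # NC2 (equinorm): spread of class-mean norms; 0 means perfectly equinorm.
    norms = np.linalg.norm(centered_means, axis=1)
    equinorm_cv = norms.std() / norms.mean()

    # NC2 (equiangularity): spread of pairwise cosines between centered means.
    unit_means = centered_means / norms[:, None]
    cosines = unit_means @ unit_means.T
    off_diag = cosines[~np.eye(len(classes), dtype=bool)]
    equiangularity_std = off_diag.std()

    # NC3 (self-duality): alignment of classifier rows with centered class means.
    w = classifier_weights[classes]
    w_unit = w / np.linalg.norm(w, axis=1, keepdims=True)
    mean_alignment = (w_unit * unit_means).sum(axis=1).mean()

    return {"nc1": nc1, "equinorm_cv": equinorm_cv,
            "equiangularity_std": equiangularity_std,
            "mean_alignment": mean_alignment}
```

Under this sketch, stronger collapse corresponds to smaller nc1, equinorm_cv, and equiangularity_std and a mean_alignment approaching 1; the paper studies how measures of this kind progress as CLM architectures and training are scaled.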