On the Origins of Linear Representations in Large Language Models

6 Mar 2024 | Yibo Jiang, Goutham Rajendran, Pradeep Ravikumar, Bryon Aragam, and Victor Veitch
This paper investigates the origins of linear representations in large language models (LLMs). The authors propose a latent variable model that formalizes concept dynamics in next-token prediction and show that the next-token prediction objective (softmax with cross-entropy), together with the implicit bias of gradient descent, promotes linear representations of concepts.

Concretely, the paper introduces a latent conditional model in which context sentences and next tokens share a latent space, with binary random variables representing the underlying concepts. Two main results show that these concepts become linearly represented in the learned space: (1) matching the log-odds of the data-generating process induces linear structure, and (2) the implicit bias of gradient descent promotes linear structure even when exact log-odds matching fails. The model also predicts approximately orthogonal representations of distinct concepts, as observed in LLaMA-2; this is attributed to the structure of the model together with the implicit bias of gradient descent.

Experiments confirm that linear and orthogonal representations emerge when learning from simulated data matching the latent variable model, and the predictions are further validated on LLaMA-2. The paper concludes that the latent conditional model provides a theoretical framework for understanding linear representations in LLMs, that the implicit bias of gradient descent plays a key role in promoting linear structure, and that the model's simple structure can yield generalizable insights into LLM behavior.
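To make the simulated-data experiment concrete, the sketch below is a minimal illustration (not the authors' code) of the setup described above: binary latent concepts generate a next-token distribution, a log-bilinear model (free context embeddings plus token unembeddings) is fit with softmax cross-entropy by plain gradient descent, and we then check whether each concept's embedding "difference direction" is consistent across contexts (linearity) and roughly orthogonal across concepts. The number of concepts, the data-generating logits, and the plain Euclidean cosine check are illustrative assumptions, not the paper's exact construction.

```python
# Minimal simulation sketch of a latent conditional model with binary concepts.
# Assumptions (not from the paper): m = 3 concepts, one "on"/"off" token pair
# per concept, and Euclidean cosine similarity as the linearity/orthogonality check.
import itertools
import numpy as np

rng = np.random.default_rng(0)
m, d, alpha = 3, 16, 4.0                                      # concepts, embedding dim, logit gap
states = np.array(list(itertools.product([0, 1], repeat=m)))  # all 2^m latent states
V = 2 * m                                                     # vocabulary size

# Ground-truth next-token distribution p*(y | c): concept i being on (off)
# boosts token 2i (token 2i + 1).
true_logits = np.zeros((len(states), V))
for i in range(m):
    true_logits[:, 2 * i] = alpha * states[:, i]
    true_logits[:, 2 * i + 1] = alpha * (1 - states[:, i])
p_star = np.exp(true_logits)
p_star /= p_star.sum(axis=1, keepdims=True)

# Learnable parameters: one free embedding per latent state, plus unembeddings.
E = 0.01 * rng.standard_normal((len(states), d))
U = 0.01 * rng.standard_normal((V, d))

lr = 0.5
for step in range(5000):
    logits = E @ U.T
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    g_logits = (probs - p_star) / len(states)    # gradient of expected cross-entropy
    g_E, g_U = g_logits @ U, g_logits.T @ E
    E -= lr * g_E
    U -= lr * g_U

# Concept direction for concept i: embedding difference when flipping c_i,
# holding the remaining concepts fixed.
def concept_directions(i):
    dirs = []
    for s_idx, s in enumerate(states):
        if s[i] == 1:
            partner = s.copy(); partner[i] = 0
            p_idx = np.where((states == partner).all(axis=1))[0][0]
            dirs.append(E[s_idx] - E[p_idx])
    return np.array(dirs)

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Linearity: the direction for concept i should be (nearly) the same in every context.
for i in range(m):
    D = concept_directions(i)
    within = np.mean([cos(D[0], v) for v in D[1:]])
    print(f"concept {i}: mean cosine across contexts = {within:.3f}")

# Orthogonality: directions for different concepts should be (nearly) uncorrelated.
mean_dirs = [concept_directions(i).mean(axis=0) for i in range(m)]
print("cross-concept cosines:",
      [round(float(cos(mean_dirs[i], mean_dirs[j])), 3)
       for i in range(m) for j in range(i + 1, m)])
```

Running the sketch prints within-concept cosine similarities (near 1 indicates a consistent linear direction per concept) and cross-concept cosines (near 0 indicates orthogonality), mirroring the diagnostics the paper reports for its simulated data and for LLaMA-2 representations.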