3 Jun 2024 | Kiho Park¹, Yo Joong Choe², Yibo Jiang³, and Victor Veitch¹²
This paper investigates how categorical and hierarchical concepts are represented in the embedding spaces of large language models (LLMs). Extending the linear representation hypothesis, the authors propose that simple categorical concepts are represented as simplices, that hierarchically related concepts (e.g., a concept such as animal and its subordinate mammal) are orthogonal in the representation space, and that complex concepts are represented as polytopes constructed from direct sums of simplices, reflecting their hierarchical structure. They validate these claims on the Gemma large language model using concepts drawn from WordNet, demonstrating that the geometric structure of the representation space aligns with WordNet's semantic hierarchy. The paper also discusses the implications of these findings for understanding and interpreting LLMs, arguing that interpretability methods should respect hierarchical semantics, and contributes to the broader understanding of how semantic meaning is encoded in the representation spaces of LLMs.
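To make the orthogonality claim concrete, here is a minimal sketch (not the authors' exact procedure) of the kind of check the paper performs: estimate a linear representation for a parent and a child concept, then test whether the child-minus-parent difference vector is near-orthogonal to the parent's vector. The variable names, the mean-vector estimator, and the random placeholder data are all illustrative assumptions; in the paper, the vectors come from Gemma's unembedding matrix grouped by WordNet synsets, and orthogonality is measured under a causal inner product rather than the plain Euclidean one used here.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 256  # embedding dimension (illustrative)

# Hypothetical unembedding vectors for tokens belonging to each concept;
# in practice these would be rows of an LLM's unembedding matrix whose
# tokens fall under the WordNet synsets 'animal' and 'mammal'.
animal_tokens = rng.normal(size=(50, d))
mammal_tokens = rng.normal(size=(20, d))

def concept_vector(token_vecs: np.ndarray) -> np.ndarray:
    """Estimate a concept's linear representation as the mean of its
    token vectors (a simplification of the paper's estimator)."""
    return token_vecs.mean(axis=0)

animal = concept_vector(animal_tokens)
mammal = concept_vector(mammal_tokens)

# Hierarchy claim in this notation: the child-minus-parent difference
# should be near-orthogonal to the parent's own direction.
diff = mammal - animal
cos = diff @ animal / (np.linalg.norm(diff) * np.linalg.norm(animal))
print(f"cos(mammal - animal, animal) = {cos:.3f}  (near 0 => orthogonal)")
```

Run against a real model, a cosine near zero for many such WordNet parent-child pairs would support the claim that semantic hierarchy is encoded as orthogonality between representations.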