Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability

2024 | Zhuoyan Xu, Zhenmei Shi, Yingyu Liang
This paper investigates the compositional ability of large language models (LLMs) on composite tasks, which combine two or more simple tasks. The study examines how LLMs perform on a variety of composite tasks, particularly ones not encountered during pretraining.

The research finds that LLMs exhibit divergent behaviors: for simpler composite tasks that apply distinct mapping mechanisms to different input segments, models demonstrate decent compositional ability, and this ability improves with larger model sizes. For more complex composite tasks that require sequential reasoning, however, models typically underperform, and scaling up model size generally brings no improvement. Theoretical analysis suggests that models exhibit compositional capability when the task handles different parts of the input separately.

The study introduces a test suite of composite tasks, including linguistic and logical challenges, and evaluates them across different LLM families. The results show that models often fail to compose knowledge from simple tasks even when they have the representation power to solve the composite task. The study also provides theoretical insight into the conditions needed for success on separable composite tasks. These findings highlight the importance of task characteristics and model scale in determining LLMs' compositional abilities, and the paper contributes a theoretical framework for analyzing their performance on composite tasks. The dataset and code are available at https://github.com/OliverXUZY/LLM_Compose.
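To make the notion of a "separable" composite task concrete, the sketch below builds a hypothetical example: two simple tasks act on disjoint segments of the input (a word and a list of numbers), so the composition can be solved by handling each segment independently, with no sequential dependency. The task names, formats, and prompt layout here are illustrative assumptions, not the paper's exact test suite.

```python
def task_upper(word: str) -> str:
    """Simple task A: capitalize a word (acts only on the word segment)."""
    return word.upper()

def task_plus_one(nums: list) -> list:
    """Simple task B: increment each number by one (acts only on the number segment)."""
    return [n + 1 for n in nums]

def composite(word: str, nums: list):
    """Separable composite task: apply A and B to different segments of the
    same input. Neither step depends on the other's output."""
    return task_upper(word), task_plus_one(nums)

def make_few_shot_prompt(examples, query) -> str:
    """Format in-context demonstrations of the composite task, ending with an
    unanswered query, in the style of an ICL prompt."""
    lines = []
    for (word, nums), (out_word, out_nums) in examples:
        lines.append(f"Input: {word} ; {nums} -> Output: {out_word} ; {out_nums}")
    lines.append(f"Input: {query[0]} ; {query[1]} -> Output:")
    return "\n".join(lines)

examples = [((w, ns), composite(w, ns)) for w, ns in [("cat", [1, 2]), ("dog", [5])]]
print(make_few_shot_prompt(examples, ("bird", [3, 4])))
```

A sequential composite task, by contrast, would feed one task's output into the other (e.g., transform the word and then reason about the result), which is the regime where the paper reports that scaling does not help.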