Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability

2024 | Zhuoyan Xu, Zhenmei Shi, Yingyu Liang
This paper investigates the compositional ability of large language models (LLMs) on composite tasks, which combine two or more simple tasks. The study examines how LLMs perform on a variety of composite tasks, particularly ones not encountered during pretraining.

The research finds that LLMs exhibit divergent behaviors: for simpler composite tasks that apply distinct mapping mechanisms to different input segments, models demonstrate decent compositional ability, and this ability improves with larger model sizes. For more complex composite tasks that require sequential reasoning, however, models typically underperform, and scaling up model size generally brings no improvement. Theoretical analysis suggests that models exhibit compositional capability when the task handles different parts of the input separately.

The study introduces a test suite of composite tasks, including linguistic and logical challenges, and evaluates them across different LLM families. The results show that models often fail to compose knowledge from simple tasks even when they have the representation power to solve the composite task. The study also provides theoretical insight into the conditions needed for success on separable composite tasks. These findings highlight the importance of task characteristics and model scale in determining LLMs' compositional abilities, and the paper contributes a theoretical framework for analyzing their performance on composite tasks. The dataset and code are available at https://github.com/OliverXUZY/LLM_Compose.
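To make the notion of a "separable" composite task concrete, the sketch below builds a hypothetical example: two simple tasks act on disjoint segments of the input (a word and a list of numbers), so the composition can be solved by handling each segment independently, with no sequential dependency. The task names, formats, and prompt layout here are illustrative assumptions, not the paper's exact test suite.

```python
def task_upper(word: str) -> str:
    """Simple task A: capitalize a word (acts only on the word segment)."""
    return word.upper()

def task_plus_one(nums: list) -> list:
    """Simple task B: increment each number by one (acts only on the number segment)."""
    return [n + 1 for n in nums]

def composite(word: str, nums: list):
    """Separable composite task: apply A and B to different segments of the
    same input. Neither step depends on the other's output."""
    return task_upper(word), task_plus_one(nums)

def make_few_shot_prompt(examples, query) -> str:
    """Format in-context demonstrations of the composite task, ending with an
    unanswered query, in the style of an ICL prompt."""
    lines = []
    for (word, nums), (out_word, out_nums) in examples:
        lines.append(f"Input: {word} ; {nums} -> Output: {out_word} ; {out_nums}")
    lines.append(f"Input: {query[0]} ; {query[1]} -> Output:")
    return "\n".join(lines)

examples = [((w, ns), composite(w, ns)) for w, ns in [("cat", [1, 2]), ("dog", [5])]]
print(make_few_shot_prompt(examples, ("bird", [3, 4])))
```

A sequential composite task, by contrast, would feed one task's output into the other (e.g., transform the word and then reason about the result), which is the regime where the paper reports that scaling does not help.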