ComposerX: Multi-Agent Symbolic Music Composition with LLMs

This paper introduces ComposerX, a multi-agent symbolic music composition framework that uses the reasoning capabilities of large language models (LLMs) to generate high-quality polyphonic music. Unlike traditional methods that rely on extensive training data and computational resources, ComposerX is training-free, cost-effective, and unified. It draws on the musical capabilities already present in GPT-4-turbo to generate music of comparable or superior quality to dedicated symbolic music generation systems. The framework consists of a group leader, melody agent, harmony agent, instrument agent, reviewer agent, and arrangement agent, each responsible for a specific task in the composition process. The agents communicate in a structured pattern that allows iterative refinement and feedback, with the aim of keeping the output coherent and faithful to the user's instructions.

The system is evaluated with both automatic metrics and human listening tests. Results show that ComposerX outperforms single-agent baselines in music quality and length, and that its output reaches a 32.2% rate of being perceived as human-like. It also performs strongly in comparison with other text-to-music generation models. The paper highlights the advantages of ComposerX, including its controllability, training-free nature, and cost-effectiveness, and identifies limitations in musical expression, in translating instructions into notation, and in instrument range compliance. The study contributes to the field of music generation by introducing a multi-agent approach that enhances the capabilities of LLMs in music composition.
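
The role-based collaboration described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the `Agent` class, the `compose` function, the `stub_llm` placeholder, and the exact speaking order (leader, then melody, harmony, and instrument agents, then reviewer, repeated for a fixed number of rounds before the arrangement agent finalizes) are all assumptions made for illustration. A real system would replace `stub_llm` with calls to an LLM such as GPT-4-turbo.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for an LLM call: (role, conversation so far) -> reply text.
LLM = Callable[[str, List[str]], str]


def stub_llm(role: str, history: List[str]) -> str:
    """Placeholder reply; a real system would query an LLM here."""
    return f"contribution after {len(history)} messages"


@dataclass
class Agent:
    role: str
    llm: LLM

    def act(self, history: List[str]) -> str:
        # Each agent sees the shared conversation and appends a tagged message.
        return f"[{self.role}] {self.llm(self.role, history)}"


# Assumed speaking order for the composing agents (illustrative only).
COMPOSER_ROLES = ["melody", "harmony", "instrument"]


def compose(prompt: str, llm: LLM = stub_llm, max_rounds: int = 2) -> List[str]:
    """Run a fixed number of compose-and-review rounds over a shared history."""
    history = [f"[user] {prompt}"]

    # The group leader breaks the user request into tasks for the other agents.
    history.append(Agent("leader", llm).act(history))

    composers = [Agent(role, llm) for role in COMPOSER_ROLES]
    reviewer = Agent("reviewer", llm)

    for _ in range(max_rounds):
        for agent in composers:
            history.append(agent.act(history))
        # The reviewer's feedback becomes part of the context for the next round.
        history.append(reviewer.act(history))

    # The arrangement agent assembles the final piece from the conversation.
    history.append(Agent("arrangement", llm).act(history))
    return history


if __name__ == "__main__":
    for message in compose("a calm piano piece in C major"):
        print(message)
```

Keeping every message in one shared history is the simplest way to let the reviewer's feedback reach the composing agents in the next round; the number of rounds is fixed here only to keep the sketch short.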

COMPOSERX: MULTI-AGENT SYMBOLIC MUSIC COMPOSITION WITH LLMs

30 Apr 2024 | Qixin Deng, Qikai Yang, Ruibin Yuan, Yipeng Huang, Yi Wang, Xubo Liu, Zeyue Tian, Jiahao Pan, Ge Zhang, Hanfeng Lin, Yizhi Li, Yinghao Ma, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenwu Wang, Guangyu Xia, Wei Xue, Yike Guo