18 Jun 2024 | Shirley Wu, Shiyu Zhao, Qian Huang, Kexin Huang, Michihiro Yasunaga, Kaidi Cao, Vassilis N. Ioannidis, Karthik Subbian, Jure Leskovec, James Zou
AVATAR is a framework that optimizes large language model (LLM) agents to use external tools effectively and improve performance on complex tasks. It pairs an actor LLM, which carries out the task, with a comparator LLM, which contrasts positive and negative examples of the actor's behavior and generates holistic prompts that teach the actor better tool usage and task strategies. A memory bank stores past instructions, which helps the optimized actor generalize to new queries.

Evaluated on four complex multimodal retrieval datasets covering textual, relational, and image retrieval, AVATAR achieves significant gains on metrics such as Hit@1 and Recall@20, with an average relative improvement of 14% on Hit@1, and outperforms existing agent methods in both task performance and generalization. The key contributions are the comparator module that automatically generates holistic prompts by comparing positive and negative examples, the demonstration of superior performance on complex multi-step tasks, and a comprehensive analysis of how the actor evolves during optimization. By automatically generating instructions for effective tool usage, AVATAR offers a promising approach to real-world complex problem-solving.
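To make the actor-comparator loop concrete, here is a minimal sketch of the optimization procedure described above. It assumes hypothetical helpers (call_actor, call_comparator, evaluate) standing in for LLM API calls and the retrieval metric; the splitting threshold, batch size, and prompt handling are illustrative assumptions, not the authors' implementation.

```python
import random

def call_actor(instructions: str, query: str) -> str:
    """Placeholder for the actor LLM: uses the current instructions to call tools and answer a query."""
    raise NotImplementedError("Plug in an LLM API call here.")

def call_comparator(positives: list, negatives: list, current_instructions: str) -> str:
    """Placeholder for the comparator LLM: contrasts well- and poorly-handled queries
    and returns improved holistic instructions for the actor."""
    raise NotImplementedError("Plug in an LLM API call here.")

def evaluate(answer: str, query: dict) -> float:
    """Placeholder retrieval metric, e.g. Hit@1 against the query's ground-truth items."""
    raise NotImplementedError

def optimize(queries: list, n_iters: int = 10, batch_size: int = 20) -> str:
    instructions = "Use the available retrieval tools to answer the query."
    memory_bank = []  # stores past instructions with their batch scores

    for _ in range(n_iters):
        batch = random.sample(queries, min(batch_size, len(queries)))
        scored = [(q, evaluate(call_actor(instructions, q["text"]), q)) for q in batch]

        # Split the batch into positive (well-handled) and negative (poorly-handled) examples.
        positives = [q for q, s in scored if s >= 0.5]
        negatives = [q for q, s in scored if s < 0.5]

        # The comparator contrasts the two groups and proposes improved holistic instructions.
        instructions = call_comparator(positives, negatives, instructions)
        memory_bank.append((instructions, sum(s for _, s in scored) / len(scored)))

    # Return the instructions that performed best across the optimization run.
    return max(memory_bank, key=lambda item: item[1])[0]
```

The design choice to contrast groups of examples, rather than critique one failure at a time, is what lets the comparator produce robust, generalizable instructions; the memory bank then preserves those instructions so later iterations do not discard strategies that already worked.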