Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration


26 May 2024 | Yang Zhang, Shixin Yang, Chenjia Bai, Fei Wu, Xiu Li, Zhen Wang, Xuelong Li
This paper proposes Reinforced Advantage feedback (ReAd), a framework for efficient LLM grounding in embodied multi-agent collaboration that enables efficient self-refinement of plans. The key idea is to learn a sequential advantage function from LLM-planned data via critic regression, and then treat the LLM planner as an optimizer that generates actions maximizing this advantage, which lets the LLM discern whether an action contributes to accomplishing the final task. Theoretically, the paper extends advantage-weighted regression in reinforcement learning to multi-agent systems. Two optional plan-refinement schemes are presented: Sequential Individual Plan Refinement with the local advantage (ReAd-S) and Joint Plan Refinement with the joint advantage (ReAd-J).
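As a point of reference, the standard single-agent advantage-weighted regression objective, which the paper generalizes to the multi-agent setting, can be sketched as follows (notation here follows the usual RL convention rather than the paper's exact formulation):

$$\pi_{k+1} = \arg\max_{\pi}\; \mathbb{E}_{s \sim d^{\pi_k},\, a \sim \pi_k(\cdot \mid s)} \left[ \log \pi(a \mid s)\, \exp\!\left(\tfrac{1}{\beta} A^{\pi_k}(s, a)\right) \right],$$

where $A^{\pi_k}(s,a) = Q^{\pi_k}(s,a) - V^{\pi_k}(s)$ is the advantage and $\beta > 0$ is a temperature. Intuitively, actions with higher advantage are weighted more heavily; ReAd applies the same principle by asking the LLM planner to propose actions that maximize a learned advantage estimate.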
Experiments on Overcooked-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate while significantly reducing both the interaction steps of agents and the query rounds of LLMs, demonstrating efficient LLM grounding. The paper also reviews related work on task planning with LLMs and on grounding LLMs with RL. Ablation studies show that plan refinement has a remarkable impact on grounding the LLM. The advantage score plays two roles in ReAd: (i) prompting as optimizing, where the LLM is asked to generate actions with the highest score, and (ii) feedback as refinement, where the plan is re-planned whenever the score falls below a threshold. Because an action can be refined over multiple rounds, policy refinement makes the method a multi-step process; a sketch of this loop is given below. The paper concludes that ReAd is a novel form of LLM feedback for closed-loop planning in multi-agent collaboration, and that advantage feedback can handle sudden disturbances and is crucial for refinement.
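A minimal sketch of this closed-loop refinement idea, assuming a hypothetical advantage critic advantage(state, action) learned from LLM-planned data and a hypothetical llm_plan(state, feedback) wrapper around the LLM planner (names, prompt format, and environment interface are illustrative, not the authors' implementation):

```python
# Illustrative sketch of advantage-feedback plan refinement (not the paper's code).
# Assumes: llm_plan(state, feedback) queries an LLM planner for a joint action,
# advantage(state, action) is a critic-regressed advantage estimate, and
# env.step(action) returns (next_state, done).

THRESHOLD = 0.0        # re-plan when the estimated advantage falls below this value
MAX_REFINEMENTS = 4    # cap on refinement rounds per environment step

def plan_with_read(env, llm_plan, advantage, max_steps=50):
    state = env.reset()
    for _ in range(max_steps):
        feedback = None
        action = llm_plan(state, feedback)
        # Feedback as refinement: re-prompt while the advantage score is too low.
        for _ in range(MAX_REFINEMENTS):
            score = advantage(state, action)
            if score >= THRESHOLD:
                break
            feedback = (f"Advantage score {score:.2f} is below the threshold; "
                        f"propose a joint action with a higher score.")
            action = llm_plan(state, feedback)
        state, done = env.step(action)
        if done:
            break
    return state
```

In this sketch the score is treated as a joint advantage over the full joint action, matching ReAd-J; ReAd-S would apply the same check agent by agent using local advantages.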