This paper introduces a Reinforcement Learning-based Human-Agent Collaboration method, ReHAC, for complex task-solving using Large Language Models (LLMs). The goal is to enhance the performance of LLM-based agents by integrating human intuition and wisdom, enabling more effective collaboration between humans and agents. ReHAC is designed to dynamically determine the optimal stages for human intervention during task-solving, balancing the effectiveness and efficiency of human-agent collaboration.
The method formulates the human-agent collaboration problem as a Markov Decision Process (MDP), where a policy model is trained to identify the most advantageous moments for human input. A dataset of tasks collaboratively completed by humans and LLM-based agents is used for offline reinforcement learning. The policy model is trained using two popular LLM-based agent frameworks, ReAct and "Try-again," on three multi-step reasoning datasets: HotpotQA, StrategyQA, and InterCode. The results show that ReHAC effectively allocates human intervention in human-agent collaboration scenarios, achieving a balance between effectiveness and efficiency.
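To make the MDP framing concrete, the following is a minimal illustrative sketch, not the paper's implementation: a placeholder policy decides at each step whether the LLM agent or a human acts, and the episode reward trades task success against an assumed per-intervention cost weight `LAMBDA`. All function and variable names here are hypothetical.

```python
import random

LAMBDA = 0.2  # assumed cost weight per human intervention (hypothetical value)

def policy(state):
    """Placeholder policy: request human help after repeated agent failures.
    In ReHAC this decision is made by a learned policy model; here it is a stub."""
    return 1 if state["consecutive_failures"] >= 2 else 0

def rollout(max_steps=5):
    """Simulate one collaborative episode under the sketched MDP."""
    state = {"consecutive_failures": 0, "solved": False}
    interventions = 0
    for _ in range(max_steps):
        if policy(state) == 1:
            # Human acts: assumed (for this toy sketch) to always make progress.
            interventions += 1
            state["solved"] = True
        else:
            # Agent acts: succeeds stochastically in this simulation.
            if random.random() < 0.5:
                state["solved"] = True
            else:
                state["consecutive_failures"] += 1
        if state["solved"]:
            break
    task_reward = 1.0 if state["solved"] else 0.0
    # Reward balances effectiveness (task success) and efficiency (human cost).
    return task_reward - LAMBDA * interventions, interventions
```

Training the real policy model amounts to maximizing this kind of penalized return over a dataset of human-agent trajectories via offline reinforcement learning.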
Experiments demonstrate that ReHAC outperforms other baselines, including agent-only, human-only, random, prompt-based, and imitation learning methods. It achieves higher rewards with fewer human interventions, indicating its ability to introduce human intervention dynamically in real human-agent collaboration scenarios. ReHAC also performs well in simulations that use GPT-4 to stand in for the human collaborator, showing its generalizability across datasets and scenarios.
The paper discusses the challenges and limitations of LLM-based agents, including the need for human intervention in complex tasks and the importance of safety and alignment in human-agent collaboration. It proposes three extended research directions to enhance the effectiveness, safety, and intelligence of human-agent collaboration: multi-level human-agent collaboration, development stages of LLM-based agents, and safety and super alignment.
The study highlights the importance of integrating human intuition and expertise with the computational power of LLM-based agents, particularly in complex decision-making tasks. ReHAC provides a practical pathway for the application of LLM-based agents in real-world scenarios, demonstrating its potential to improve task performance through effective human-agent collaboration.