20 Feb 2024 | Marta Skreta*, Zihan Zhou*, Jia Lin Yuan*, Kourosh Darvish, Alán Aspuru-Guzik, and Animesh Garg
**REPLAN: Robotic Replanning with Perception and Language Models**
**Authors:** Marta Skreta, Zihan Zhou, Jia Lin Yuan, Kourosh Darvish, Alán Aspuru-Guzik, and Animesh Garg
**Institutions:** University of Toronto, Vector Institute for Artificial Intelligence, Georgia Institute of Technology, NVIDIA
**Abstract:**
This paper introduces REPLAN, a novel framework for robotic replanning that integrates large language models (LLMs) and vision-language models (VLMs) to enable online replanning for long-horizon tasks. REPLAN generates high-level plans and low-level reward functions, and uses perception to diagnose and address issues during task execution. The framework is evaluated on a Reasoning and Control (RC) benchmark with eight long-horizon tasks, demonstrating that REPLAN can successfully adapt to unforeseen obstacles and achieve open-ended goals, where baseline models often fail. The authors also present a detailed analysis of the framework's components and their contributions to task completion rates.
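The abstract describes a loop: an LLM proposes high-level plan steps and low-level reward functions, a controller executes each step, and perception-based verification triggers replanning when a step fails. Below is a minimal sketch of that loop under stated assumptions; every callable name (`plan_with_llm`, `reward_from_llm`, `run_controller`, `verify_with_vlm`, `diagnose_with_vlm`) is a hypothetical placeholder, not the paper's actual interface.

```python
from typing import Callable, List

def replan_loop(
    task: str,
    plan_with_llm: Callable[[str, str], List[str]],   # (task, feedback) -> plan steps
    reward_from_llm: Callable[[str], str],             # plan step -> reward code
    run_controller: Callable[[str], None],             # execute the reward with a controller
    verify_with_vlm: Callable[[str], bool],            # did the scene show the step succeeded?
    diagnose_with_vlm: Callable[[str], str],           # textual feedback on the failure
    max_replans: int = 5,
) -> bool:
    """Plan, execute step by step, verify with perception, and replan on failure."""
    feedback = ""
    for _ in range(max_replans):
        steps = plan_with_llm(task, feedback)          # high-level plan from the LLM
        for step in steps:
            run_controller(reward_from_llm(step))      # low-level execution of the step
            if not verify_with_vlm(step):              # step-wise verification
                feedback = diagnose_with_vlm(step)     # perception-based diagnosis
                break                                  # trigger replanning with feedback
        else:
            return True                                # every step verified: task complete
    return False
```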
**Key Contributions:**
1. **REPLAN Framework:** An end-to-end framework for multi-level replanning with verification, incorporating high-level planning and low-level reward generation.
2. **Vision-Language Signals:** Utilizes vision-language signals for grounding and feedback correction, enhancing the framework's adaptability (see the verification sketch after this list).
3. **Task Adaptation:** Demonstrates the ability to solve ambiguous tasks and adapt plans online as challenges arise without human intervention.
4. **Benchmark Evaluation:** Presents a Reasoning and Control (RC) benchmark with eight tasks, showing that REPLAN achieves significantly higher success rates compared to baselines.
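As a hypothetical illustration of the step-wise verification in contribution 2, the snippet below queries a VLM with the current camera image and a yes/no question about the plan step. The OpenAI Python client is used here only as one possible VLM backend; the prompt wording, model choice, and answer parsing are assumptions for illustration, not the paper's protocol.

```python
import base64
from openai import OpenAI

def verify_step_with_vlm(step: str, image_path: str) -> bool:
    """Ask a VLM whether the captured scene shows that `step` was completed."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Has this step been completed: '{step}'? Answer yes or no."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return "yes" in response.choices[0].message.content.lower()
```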
**Related Work:**
The paper reviews existing approaches in long-horizon robot planning, including rule-based methods, learning-based techniques, and the integration of LLMs and VLMs. It highlights the limitations of current methods, such as the need for extensive domain knowledge, intricate reward engineering, and the inability to handle complex, open-ended tasks.
**Experimental Evaluation:**
The authors evaluate REPLAN using a Franka Emika Panda robot and MuJoCo MPC for physics simulation and real-time predictive control. They compare REPLAN with other state-of-the-art methods, including Language to Rewards, and demonstrate the importance of each module in the pipeline through ablation studies. The results show that REPLAN achieves a 4× improvement in task completion rates compared to the baseline method.
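To make the role of the generated low-level rewards concrete, here is a hedged sketch of the kind of reward term an LLM might emit for a single plan step such as "move the gripper above the apple". The function signature and the 5 cm hover offset are illustrative assumptions; the paper's actual reward interface to MuJoCo MPC may differ.

```python
import numpy as np

def reach_above_reward(ee_pos: np.ndarray, obj_pos: np.ndarray) -> float:
    """Reward that grows (toward zero) as the end effector nears a point 5 cm above the object."""
    target = obj_pos + np.array([0.0, 0.0, 0.05])   # hover point above the object
    return -float(np.linalg.norm(ee_pos - target))  # negative distance as reward
```

A model-predictive controller maximizing this term drives the end effector toward the hover point, which is the division of labor the paper describes: the LLM writes the objective, the controller handles the motion.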
**Real-Robot Environment:**
The paper also presents real-world experiments, adapting one of the tasks to involve a lemon and an apple, and discusses limitations, such as the reliance on VLMs for spatial state interpretation and communication failures between LLMs and VLMs.
**Conclusion:**
REPLAN is a robust solution for multi-stage planning, leveraging LLMs for plan generation and VLMs for insightful feedback. The multi-level planning approach, coupled with step-wise verification and replanning, demonstrates promising results in addressing complex, open-ended tasks.