DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences

5 Jun 2024 | Yidong Huang, Jacob Sansom, Ziqiao Ma, Felix Gervits, Joyce Chai
DriVLMe is a video-language-model-based autonomous driving agent that improves communication between humans and autonomous vehicles by integrating embodied and social experiences. The agent is trained on both simulated driving experience and real human dialogue to strengthen its navigation and dialogue-response capabilities.

DriVLMe performs strongly in open-loop benchmarks and closed-loop human studies. On the SDN benchmark it outperforms baselines in both dialogue response and navigation tasks, and in closed-loop experiments it follows human instructions and replans routes. However, it struggles with multi-turn interactions and dynamic environmental changes, and it faces practical challenges such as long inference times, imbalanced training data, limited visual understanding, and difficulty handling unexpected situations.

The study highlights the potential of foundation models for autonomous driving while identifying key limitations, including the need for better world modeling, visual understanding, and dialogue generation. Future work aims to address these challenges through enhanced data augmentation, improved world modeling, and stronger dialogue capabilities. Overall, DriVLMe shows promise for effective human-agent communication in autonomous driving, but further research is needed to overcome its remaining technical and theoretical limitations.
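To make the agent's input/output interface concrete, below is a minimal sketch of how a video-language-model driving agent of this kind might consume recent video frames plus the dialogue history and emit both a navigation action and a dialogue response. All names here (`VideoDialogueAgent`, `Observation`, the `ACTION:`/`REPLY:` output convention) are illustrative assumptions, not taken from the DriVLMe paper or its released code.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Observation:
    """What the agent sees at one step: recent video frames plus the dialogue so far."""
    video_frames: List[object]      # egocentric frames (e.g. decoded images)
    dialogue_history: List[str]     # alternating human/agent utterances
    route_state: str                # short description of the current plan


@dataclass
class AgentOutput:
    physical_action: str            # navigation action, e.g. "turn left at the next light"
    dialogue_response: str          # natural-language reply to the human


class VideoDialogueAgent:
    """Wraps a video-language model that jointly produces a navigation
    action and a dialogue response from video + dialogue context."""

    def __init__(self, vlm: Callable[..., str]):
        self.vlm = vlm  # any callable: (video, text) -> raw model output string

    def step(self, obs: Observation) -> AgentOutput:
        prompt = self._build_prompt(obs)
        raw = self.vlm(video=obs.video_frames, text=prompt)
        action, reply = self._parse(raw)
        return AgentOutput(physical_action=action, dialogue_response=reply)

    def _build_prompt(self, obs: Observation) -> str:
        return "\n".join(obs.dialogue_history + [f"Current route: {obs.route_state}"])

    @staticmethod
    def _parse(raw: str) -> Tuple[str, str]:
        # Assumed output convention: "ACTION: ... | REPLY: ..."
        action_part, _, reply_part = raw.partition("|")
        return (action_part.replace("ACTION:", "").strip(),
                reply_part.replace("REPLY:", "").strip())


if __name__ == "__main__":
    # Stand-in model for demonstration; a real system would call a trained VLM here.
    dummy_vlm = lambda video, text: "ACTION: pull over at the cafe ahead | REPLY: Sure, stopping at the cafe."
    agent = VideoDialogueAgent(dummy_vlm)
    out = agent.step(Observation(video_frames=[],
                                 dialogue_history=["Human: stop at the next cafe"],
                                 route_state="heading north on Main St"))
    print(out.physical_action, "//", out.dialogue_response)
```

The single `step` call reflects the summary's framing of the agent as producing navigation behavior and a conversational reply from the same video-plus-dialogue context; how the real model structures its outputs is not specified here and is assumed for illustration.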