VLM-Social-Nav: Socially Aware Robot Navigation through Scoring using Vision-Language Models

7 Jul 2024 | Daeun Song, Jing Liang, Amirreza Payandeh, Xuesu Xiao, and Dinesh Manocha
VLM-Social-Nav is an approach that uses Vision-Language Models (VLMs) to enable robots to navigate human-centered environments while adhering to social norms. It integrates a VLM with a motion planner and a perception model to generate socially compliant navigation behavior, reducing reliance on large training datasets and improving adaptability in decision-making.

The VLM interprets contextual information in the robot's observations, reasons about the current social interaction, and produces an immediate preferred robot action. A VLM-based scoring module converts this preference into a social cost term, which the motion planner combines with its other objectives to output socially appropriate and effective robot actions. A real-time perception model detects social entities and triggers VLM queries only when necessary, keeping the system efficient enough for real-time navigation.

The system was evaluated in four real-world scenarios with a Turtlebot: frontal approach, frontal approach with gesture, intersection, and narrow doorway. It achieved at least a 27.38% improvement in average success rate and a 19.05% improvement in average collision rate, and it received the highest user-study scores for socially compliant navigation behavior, outperforming the other methods in both social compliance and navigation efficiency.
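To make the pipeline concrete, below is a minimal Python sketch of how a VLM-derived social cost term might be folded into a sampling-based local planner, with the perception model gating VLM queries. This is an illustrative assumption, not the authors' implementation: the function names, cost forms, and weights are hypothetical, and query_vlm() stands in for prompting the VLM with the current observation and parsing its output into a preferred action.

```python
# Minimal sketch of the idea described above: a perception model gates VLM
# queries, and the VLM's preferred action becomes an extra "social" cost term
# added to the planner's usual goal and obstacle costs. All names, weights,
# and helpers here are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Action:
    v: float      # linear velocity (m/s)
    w: float      # angular velocity (rad/s)

def social_cost(candidate: Action, preferred: Action) -> float:
    """Penalize deviation from the VLM's preferred action (hypothetical form)."""
    return (candidate.v - preferred.v) ** 2 + (candidate.w - preferred.w) ** 2

def select_action(candidates, goal_cost, obstacle_cost,
                  detections, query_vlm,
                  w_goal=1.0, w_obs=1.0, w_social=1.0) -> Action:
    """Score candidate actions; add the social term only when people are detected."""
    preferred = query_vlm() if detections else None   # query the VLM only when necessary
    best, best_cost = None, float("inf")
    for a in candidates:
        cost = w_goal * goal_cost(a) + w_obs * obstacle_cost(a)
        if preferred is not None:
            cost += w_social * social_cost(a, preferred)
        if cost < best_cost:
            best, best_cost = a, cost
    return best
```

Keeping the social term as one additional cost, rather than replacing the planner, matches the paper's design: when no social entity is detected, the planner behaves like a standard local planner and no VLM query is issued.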
The system is designed to handle complex, human-centered environments; future work includes extending it to outdoor navigation and to scenes with multiple individuals. Overall, the method shows that the contextual understanding of VLMs can enable socially aware robot navigation.