VLM-Social-Nav: Socially Aware Robot Navigation through Scoring using Vision-Language Models

7 Jul 2024 | Daeun Song, Jing Liang, Amirreza Payandeh, Xuesu Xiao, and Dinesh Manocha
VLM-Social-Nav is an approach that uses Vision-Language Models (VLMs) to enable robots to navigate human-centered environments while adhering to social norms. It integrates a VLM with a motion planner and a perception model to generate socially compliant navigation behavior, reducing reliance on large training datasets and improving adaptability in decision-making.

The VLM interprets contextual information in the robot's observations, reasons about the current social interaction, and produces an immediate preferred robot action. A VLM-based scoring module converts this preference into a social cost term, which the motion planner combines with its other objectives to output socially appropriate and effective robot actions. A real-time perception model detects social entities and triggers VLM queries only when necessary, keeping the system efficient enough for real-time navigation.

The system was evaluated in four real-world scenarios with a Turtlebot: frontal approach, frontal approach with gesture, intersection, and narrow doorway. It achieved at least a 27.38% improvement in average success rate and a 19.05% improvement in average collision rate, and it received the highest user-study scores for socially compliant navigation behavior, outperforming the other methods in both social compliance and navigation efficiency.
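To make the pipeline concrete, below is a minimal Python sketch of how a VLM-derived social cost term might be folded into a sampling-based local planner, with the perception model gating VLM queries. This is an illustrative assumption, not the authors' implementation: the function names, cost forms, and weights are hypothetical, and query_vlm() stands in for prompting the VLM with the current observation and parsing its output into a preferred action.

```python
# Minimal sketch of the idea described above: a perception model gates VLM
# queries, and the VLM's preferred action becomes an extra "social" cost term
# added to the planner's usual goal and obstacle costs. All names, weights,
# and helpers here are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Action:
    v: float      # linear velocity (m/s)
    w: float      # angular velocity (rad/s)

def social_cost(candidate: Action, preferred: Action) -> float:
    """Penalize deviation from the VLM's preferred action (hypothetical form)."""
    return (candidate.v - preferred.v) ** 2 + (candidate.w - preferred.w) ** 2

def select_action(candidates, goal_cost, obstacle_cost,
                  detections, query_vlm,
                  w_goal=1.0, w_obs=1.0, w_social=1.0) -> Action:
    """Score candidate actions; add the social term only when people are detected."""
    preferred = query_vlm() if detections else None   # query the VLM only when necessary
    best, best_cost = None, float("inf")
    for a in candidates:
        cost = w_goal * goal_cost(a) + w_obs * obstacle_cost(a)
        if preferred is not None:
            cost += w_social * social_cost(a, preferred)
        if cost < best_cost:
            best, best_cost = a, cost
    return best
```

Keeping the social term as one additional cost, rather than replacing the planner, matches the paper's design: when no social entity is detected, the planner behaves like a standard local planner and no VLM query is issued.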
The system is designed to handle complex, human-centered environments; future work includes extending it to outdoor navigation and to scenes with multiple individuals. Overall, the method shows that the contextual understanding of VLMs can enable socially aware robot navigation.