**VLM-Social-Nav: Socially Aware Robot Navigation through Scoring using Vision-Language Models**
**Abstract:**
VLM-Social-Nav is a novel approach that uses Vision-Language Models (VLMs) to enable robots to navigate in human-centered environments while adhering to social norms. The system makes real-time decisions on robot actions that comply with human social expectations. It uses a perception model to detect social entities and prompts a VLM to generate guidance for socially compliant behavior. A VLM-based scoring module then computes a social cost term that steers the robot toward appropriate and effective actions. The approach reduces reliance on large training datasets and enhances adaptability in decision-making, resulting in improved social compliance in human-shared environments. Evaluations in four real-world scenarios with a Turtlebot robot show at least 27.38% improvement in average success rate and 19.05% improvement in average collision rate compared to other methods.
**Keywords:**
Motion and Path Planning, Task and Motion Planning, Integrated Planning and Control
**Introduction:**
The paper addresses the challenge of social navigation, focusing on robots' ability to navigate while adhering to social etiquette and contextual appropriateness. It integrates a VLM with a motion planner and a perception model; the perception model detects social entities so that the VLM is queried only when needed, which keeps navigation efficient. The VLM-based scoring module translates robot observations and textual instructions into a social cost term, which the motion planner uses to output appropriate actions. The approach is evaluated in four indoor scenarios, showing superior social compliance compared to existing methods.
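To make the role of the social cost term concrete, the following is a minimal sketch of how a planner might combine it with a goal cost when choosing among candidate velocity commands. Everything here is an assumption for illustration, not the paper's implementation: the `Action` type, the distance-based `social_cost`, the simplified `goal_cost`, the weights `w_goal` and `w_social`, and the idea that the VLM's guidance has already been mapped to a `preferred` velocity command.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Action:
    v: float  # linear velocity in m/s
    w: float  # angular velocity in rad/s (negative = turn right, an assumed convention)


def social_cost(action: Action, preferred: Action) -> float:
    """Assumed form: distance between a candidate and the VLM-preferred action."""
    return math.hypot(action.v - preferred.v, action.w - preferred.w)


def goal_cost(action: Action, heading_error: float) -> float:
    """Assumed form: heading error left after applying w for one unit time step."""
    return abs(heading_error - action.w)


def select_action(
    candidates: Sequence[Action],
    heading_error: float,
    preferred: Optional[Action],
    w_goal: float = 1.0,    # hypothetical weight
    w_social: float = 2.0,  # hypothetical weight
) -> Action:
    """Return the candidate with the lowest weighted total cost."""

    def total(a: Action) -> float:
        cost = w_goal * goal_cost(a, heading_error)
        if preferred is not None:  # the social term applies only when guidance exists
            cost += w_social * social_cost(a, preferred)
        return cost

    return min(candidates, key=total)


candidates = [Action(0.5, 0.0), Action(0.5, -0.4), Action(0.2, -0.4)]
no_guidance = select_action(candidates, heading_error=0.0, preferred=None)
with_guidance = select_action(candidates, heading_error=0.0, preferred=Action(0.2, -0.4))
```

Without guidance the sketch picks the straight, full-speed command; with guidance equivalent to "move right and slow down" the social term outweighs the small heading penalty and the slower right-turning command wins.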
**Main Results:**
- VLM-Social-Nav integrates VLMs with motion planners and a perception model.
- A VLM-based scoring module translates observations and instructions into a social cost term.
- Evaluations in four indoor scenarios show at least 27.38% improvement in average success rate and 19.05% improvement in average collision rate over other methods.
- User studies confirm VLM-Social-Nav's superior social compliance.
**Related Work:**
- Safety requirements and contextual appropriateness in social navigation.
- Large Foundation Models (LFMs) for navigation.
- Previous work on social navigation using VLMs and LLMs.
**Approach:**
- Problem definition: Social navigation as a Markov Decision Process (MDP).
- VLM-based scoring module: Inference of socially compatible navigation behavior.
- Real-time perception model: Detection of social entities to reduce VLM queries.
- Algorithm overview: Integration of perception, VLM, and motion planning.
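The interplay of the components listed above can be sketched as a single control step in which the perception model gates the VLM query. This is a hedged illustration only: `navigation_step`, the stub functions, the `"person"` label, and the guidance string are all hypothetical stand-ins, and the real system's detector, prompt, and planner interfaces are not specified in this summary.

```python
from typing import Callable, List, Optional


def navigation_step(
    frame: List[str],
    detect: Callable[[List[str]], List[str]],
    query_vlm: Callable[[List[str], List[str]], str],
    plan: Callable[[Optional[str]], str],
) -> str:
    """One control step: perceive, optionally ask the VLM, then plan."""
    entities = detect(frame)
    # Query the (slow) VLM only when a social entity is in view.
    guidance = query_vlm(frame, entities) if entities else None
    return plan(guidance)


# --- Hypothetical stand-ins so the sketch runs without a robot or a model ---
vlm_calls: List[List[str]] = []


def fake_detect(frame: List[str]) -> List[str]:
    # Pretend each frame is a list of object labels; keep only people.
    return [label for label in frame if label == "person"]


def fake_query_vlm(frame: List[str], entities: List[str]) -> str:
    vlm_calls.append(frame)  # record the call so we can count queries
    return "move right, slow down"


def fake_plan(guidance: Optional[str]) -> str:
    return "goal-directed" if guidance is None else f"social: {guidance}"


frames = [["wall"], ["person", "wall"], []]
actions = [navigation_step(f, fake_detect, fake_query_vlm, fake_plan) for f in frames]
```

In this toy run only the middle frame contains a person, so the VLM stub is called once and the other two steps fall back to plain goal-directed planning.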
**Experiments:**
- Implementation details: Hardware and software setup.
- Qualitative results: Comparison with other methods in four scenarios.
- Quantitative results: Success rate, collision rate, and user study scores.
- Discussion: Real-time navigation, socially aware navigation, and future directions.
**Conclusion:**
VLM-Social-Nav is a novel approach that uses VLMs to enable socially compliant robot navigation in human-centered environments. It reduces reliance on large datasets and enhances adaptability, resulting in improved social compliance in real-world scenarios.