VISIONGPT: LLM-ASSISTED REAL-TIME ANOMALY DETECTION FOR SAFE VISUAL NAVIGATION

19 Mar 2024 | Hao Wang, Jiayou Qin, Ashish Bastola, Xiwen Chen, Zihao Gong, John Suchanek, Abolfazl Razi
VisionGPT is a real-time anomaly detection system that integrates large language models (LLMs) with open-world object detection to support safe visual navigation for visually impaired individuals. The system uses the YOLO-World model for real-time object detection and specialized prompts that let an LLM identify anomalies in camera-captured frames, generating concise audio descriptions to assist navigation. Users can customize the detection classes to their needs, and the LLM drives dynamic scene transitions, switching detection scenarios as the environment changes. By combining the speed of open-world detection with the reasoning of LLMs, the framework delivers low-latency, real-time feedback; the system is open-sourced and available for use.

Key contributions include zero-shot anomaly detection, real-time feedback, and dynamic scene transition. The authors also evaluate the contribution of individual prompt components and discuss future improvements in visual accessibility. Tested on custom video data, the framework detects anomalies with high accuracy and efficiency while making economical use of computational resources, providing real-time scene descriptions and hazard alerts that keep visually impaired users safe. The research highlights the potential of integrating computer vision and LLMs to improve accessibility and safety in daily life.
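The loop the summary describes (per-frame open-world detection feeding an LLM that emits a short spoken alert) can be sketched as below. This is a minimal illustration assuming the ultralytics YOLOWorld API and an OpenAI-style chat endpoint; the class list, prompt wording, and the describe_anomalies helper are hypothetical and do not reproduce the authors' released implementation.

```python
# Minimal VisionGPT-style loop: open-vocabulary detection on each camera frame,
# with an LLM turning the detected labels into a one-sentence hazard alert.
# Assumes the `ultralytics` and `openai` packages are installed; the classes
# and prompt below are illustrative, not the paper's exact ones.
import cv2
from ultralytics import YOLOWorld
from openai import OpenAI

detector = YOLOWorld("yolov8s-world.pt")  # open-world detector used by the paper
# User-customizable detection classes (hypothetical example set).
detector.set_classes(["person", "car", "bicycle", "pothole", "stairs"])
llm = OpenAI()  # reads OPENAI_API_KEY from the environment

def describe_anomalies(labels):
    """Ask the LLM for a concise hazard alert (hypothetical prompt)."""
    prompt = (
        "You assist a visually impaired pedestrian. Objects ahead: "
        f"{', '.join(labels)}. In one short sentence, warn about any hazard."
    )
    resp = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

cap = cv2.VideoCapture(0)  # wearable or phone camera
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = detector.predict(frame, verbose=False)[0]
    labels = [detector.names[int(c)] for c in result.boxes.cls]
    if labels:  # only query the LLM when something is detected
        print(describe_anomalies(labels))  # swap print() for TTS to get audio output
cap.release()
```

In the full system a text-to-speech stage would replace the print() call, and the LLM would also handle scenario switching; the sketch only shows the detect-then-describe core.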