VISIONGPT: LLM-ASSISTED REAL-TIME ANOMALY DETECTION FOR SAFE VISUAL NAVIGATION

19 Mar 2024 | Hao Wang, Jiayou Qin, Ashish Bastola, Xiwen Chen, Zihao Gong, John Suchanek, Abolfazl Razi
This paper explores the integration of Large Language Models (LLMs) with real-time object detection models to enhance safe visual navigation for visually impaired individuals. The proposed framework, named VisionGPT, leverages the Yolo-World model for open-vocabulary object detection and specialized prompts to identify anomalies in camera-captured frames. It generates concise, audio-delivered descriptions of abnormalities and provides safety notifications, enabling users to navigate complex environments with greater ease and safety. The system supports dynamic scene transitions and allows users to interact with the LLM module to switch detection classes based on their needs. The paper also discusses the performance contributions of different prompt components and explores future improvements in visual accessibility. The system is designed to be efficient and low-latency, suitable for mobile devices, and has been tested on various platforms, demonstrating its potential for real-world applications.
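The two LLM-facing pieces of the pipeline described above — turning per-frame detections into a concise anomaly prompt, and letting the user switch detection classes through the LLM module — can be sketched roughly as follows. This is a minimal illustrative stand-in, not the paper's implementation: the class presets, the `Detection` structure, and the prompt wording are all assumptions, and the real system's specialized multi-part prompts and Yolo-World integration are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Detection:
    """One open-vocabulary detection from a camera frame (hypothetical shape)."""
    label: str
    confidence: float
    box: Tuple[float, float, float, float]  # normalized (x, y, w, h)

# Hypothetical default class list for a walking scenario; the paper's
# actual detection vocabulary is not reproduced here.
DEFAULT_CLASSES = ["person", "car", "bicycle", "pothole", "stairs", "dog"]

def build_anomaly_prompt(detections: List[Detection], scene: str) -> str:
    """Assemble a short LLM prompt asking for an audio-ready safety warning."""
    listed = "; ".join(
        f"{d.label} (conf {d.confidence:.2f}) at {d.box}" for d in detections
    )
    return (
        f"Scene: {scene}. Detected objects: {listed}. "
        "Identify any abnormality that threatens a visually impaired "
        "pedestrian and reply with one short audio-ready warning."
    )

def switch_classes(user_request: str, current: List[str]) -> List[str]:
    """Naive stand-in for the LLM-mediated class switch: map a keyword in
    the user's request to a preset class list, else keep the current one."""
    presets = {
        "indoor": ["door", "chair", "table", "stairs"],
        "street": ["car", "bicycle", "traffic light", "pothole"],
    }
    for key, classes in presets.items():
        if key in user_request.lower():
            return classes
    return current

if __name__ == "__main__":
    dets = [Detection("pothole", 0.81, (0.5, 0.7, 0.1, 0.1))]
    print(build_anomaly_prompt(dets, "sidewalk"))
    print(switch_classes("switch to street mode", DEFAULT_CLASSES))
```

In the actual framework the prompt is sent to the LLM each time an anomaly candidate appears, and the reply is spoken aloud; a keyword map like `switch_classes` would instead be the LLM interpreting the user's spoken request.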