Enhancing Emergency Vehicle Detection: A Deep Learning Approach with Multimodal Fusion

13 May 2024 | Muhammad Zohaib, Muhammad Asim, Mohammed ELAffendi
This paper presents a deep learning approach for enhancing emergency vehicle detection using multimodal fusion of acoustic and visual information. The study aims to improve the accuracy and response time of emergency vehicle detection systems by integrating advanced deep learning techniques for both acoustic and visual data. The proposed framework includes an attention-based temporal spectrum network (ATSN) for ambulance siren sound detection and a Multi-Level Spatial Fusion YOLO (MLSF-YOLO) architecture for visual detection. These components are combined using stacking ensemble learning to create a robust framework for emergency vehicle detection. The ATSN model is designed to extract semantic features for siren sound classification, while the MLSF-YOLO model enhances visual detection by incorporating multi-level spatial fusion techniques. The study demonstrates that the proposed approach significantly improves detection accuracy and efficiency, achieving a misdetection rate of only 3.81% and an accuracy of 96.19% for visual data containing emergency vehicles. The results show that the multimodal approach outperforms existing methods in terms of accuracy and robustness, particularly in adverse conditions. The study also evaluates the performance of the proposed models on various datasets, including the VEVD + KITTI dataset for visual detection and an acoustic dataset for emergency vehicle sound classification. The MLSF-YOLO model achieves a mean average precision (mAP) of 71.1% at an input size of 608 × 608, demonstrating its efficiency and accuracy in detecting emergency vehicles. The ATSN model achieves high accuracy rates across different input lengths and noise levels, with performance exceeding 93% for most input durations. The study highlights the effectiveness of the proposed multimodal approach in improving emergency vehicle detection systems, with applications in real-time traffic management and road safety. The results demonstrate the potential of the proposed framework in enhancing the accuracy and reliability of emergency vehicle detection in real-world scenarios.
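The abstract describes combining the acoustic (ATSN) and visual (MLSF-YOLO) detectors via stacking ensemble learning. A minimal sketch of that idea, assuming each base model emits a per-observation confidence score in [0, 1] that a meta-learner then fuses into a final decision; all scores, the simulated data, and the choice of logistic regression as meta-learner are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of stacking-ensemble fusion: confidence scores from
# an acoustic model ("ATSN") and a visual model ("MLSF-YOLO") are stacked
# as features, and a logistic-regression meta-learner combines them into a
# final emergency-vehicle decision. Data here are simulated, not real.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulated per-frame confidence scores from the two base models.
n = 200
labels = rng.integers(0, 2, size=n)  # 1 = emergency vehicle present
acoustic = np.clip(labels * 0.7 + rng.normal(0.15, 0.15, n), 0.0, 1.0)
visual = np.clip(labels * 0.6 + rng.normal(0.20, 0.15, n), 0.0, 1.0)

# Stack the two modality scores as a feature matrix for the meta-learner.
X = np.column_stack([acoustic, visual])
meta = LogisticRegression().fit(X, labels)

# Fused decision for a new observation (hypothetical base-model scores).
fused = meta.predict_proba([[0.9, 0.8]])[0, 1]
print(f"fused emergency-vehicle probability: {fused:.2f}")
```

The meta-learner can weight the modalities unevenly, which is one way a fused system stays robust when a single modality (e.g. vision in adverse weather) degrades.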