Low-latency automotive vision with event cameras

30 May 2024 | Daniel Gehrig & Davide Scaramuzza
This paper introduces a hybrid event- and frame-based object detector that combines the advantages of both modalities to achieve efficient and accurate object detection in automotive settings. Event cameras capture changes in intensity asynchronously, offering high temporal resolution and sparsity, which reduces bandwidth and latency requirements. However, purely event-based methods struggle with accuracy, because events cannot capture slowly varying signals and converting events into frame-like representations is inefficient.

The proposed method uses a standard CNN for images and an efficient asynchronous graph neural network (GNN) for events. By exploiting the high temporal resolution and sparsity of events together with the rich but low-rate information in standard images, it produces high-rate object detections with reduced perceptual and computational latency. In this hybrid setup, a 20 fps RGB camera paired with an event camera achieves the latency of a 5,000 fps camera at the bandwidth of a 45 fps camera, without compromising accuracy, enabling efficient and robust perception in edge-case scenarios. The events are processed as a streaming data structure rather than a dense one, making the event branch four orders of magnitude more efficient, and redesigned architectural building blocks allow the network depth to be scaled while remaining more efficient than competing asynchronous methods.

As a result, the detector can find objects in the blind time between frames and maintain high detection performance throughout this period. It leverages the rich context information in images and the sparse, high-rate event information to provide additional certifiable snapshots of reality that show objects before they become visible in the next image, or that capture object movements encoding the intent or trajectory of traffic participants.

Evaluated on several datasets, including DSEC-Detection, the method outperforms state-of-the-art event- and frame-based object detectors in both accuracy and efficiency, achieving higher mAP at lower computational complexity. The combination of a 20 fps RGB camera with an event camera reaches a 0.2-ms perceptual latency, on par with that of a 5,000 fps RGB camera, while using only 4% more data bandwidth than a 45 fps automotive sensor. It also shows significant improvements over image-based approaches for nonlinearly moving or deformable objects.
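To make the interleaving of the two branches concrete, the sketch below runs a stubbed CNN at frame times and a stubbed event-driven update in the blind time between frames. It is a minimal illustration only: the function bodies, the 20 fps period, and the box values are placeholders invented for this example, not the authors' implementation.

# Minimal sketch of interleaving a low-rate frame branch with a
# high-rate event branch. All values and stubs are illustrative.

FRAME_PERIOD_S = 1.0 / 20.0   # assumed 20 fps RGB camera

def run_cnn_detector(frame):
    """Stand-in for the dense CNN that runs on every full RGB image."""
    return [{"box": (100, 80, 40, 90), "score": 0.9, "label": "car"}]

def update_with_events(detections, events):
    """Stand-in for the asynchronous event branch: it refreshes the current
    detections from sparse events instead of re-running the full CNN."""
    return detections  # a real system would shift, rescore or add boxes here

def detection_stream(frames, event_packets):
    """Yield (timestamp, detections) at the rate of the event packets:
    the dense CNN fires only at frame times, and cheap event-driven
    updates cover the blind time in between."""
    detections = []
    frame_iter = iter(frames)
    next_frame_t = 0.0
    for t, packet in event_packets:
        if t >= next_frame_t:                 # a new frame has arrived
            frame = next(frame_iter, None)
            if frame is not None:
                detections = run_cnn_detector(frame)
            next_frame_t += FRAME_PERIOD_S
        detections = update_with_events(detections, packet)
        yield t, detections

The key property of this loop is that the output rate is set by the event packets, not by the 20 fps frames, so detections stay fresh between images.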
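The "streaming data structure" mentioned above refers to a spatiotemporal graph of events that grows one node at a time rather than being rebuilt as a dense tensor. The toy structure below illustrates the idea; the spatial radius and time window are invented for the example and are not the paper's parameters.

import math
from collections import deque

class StreamingEventGraph:
    """Toy spatiotemporal event graph: each incoming event becomes a node
    and is linked to recent, spatially close events, so the structure is
    updated per event instead of being re-densified into a frame."""

    def __init__(self, radius_px: float = 5.0, time_window_s: float = 0.05):
        self.radius = radius_px
        self.window = time_window_s
        self.next_id = 0
        self.nodes = deque()   # (node_id, x, y, t, polarity), oldest first
        self.edges = []        # (older_node_id, newer_node_id)

    def add_event(self, x: float, y: float, t: float, polarity: int) -> int:
        # Slide the temporal window: forget expired nodes and their edges.
        while self.nodes and t - self.nodes[0][3] > self.window:
            expired_id, *_ = self.nodes.popleft()
            self.edges = [e for e in self.edges if expired_id not in e]
        # Connect the new event to nearby recent events.
        node_id = self.next_id
        self.next_id += 1
        for nid, xi, yi, _, _ in self.nodes:
            if math.hypot(x - xi, y - yi) <= self.radius:
                self.edges.append((nid, node_id))
        self.nodes.append((node_id, x, y, t, polarity))
        return node_id

The efficiency gain of an asynchronous GNN comes from operating on such incremental updates: only the activations influenced by the newly inserted node and its edges need to be recomputed, rather than re-running the network on a dense representation.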
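The latency-versus-bandwidth trade-off can also be illustrated with a back-of-the-envelope calculation. The resolution, bytes-per-pixel, event rate, and bytes-per-event below are assumptions chosen for illustration only; the 4% figure reported above comes from the paper's own sensor parameters, not from this sketch.

# Back-of-the-envelope comparison of frame vs. event data rates.
# All figures are illustrative assumptions, not the paper's measurements.

WIDTH, HEIGHT = 640, 480      # assumed sensor resolution
BYTES_PER_PIXEL = 3           # assumed 8-bit RGB
BYTES_PER_EVENT = 8           # assumed packed (x, y, timestamp, polarity)

def frame_bandwidth_bps(fps: float) -> float:
    """Raw data rate of a frame camera, in bits per second."""
    return WIDTH * HEIGHT * BYTES_PER_PIXEL * fps * 8

def event_bandwidth_bps(events_per_second: float) -> float:
    """Raw data rate of an event camera at a scene-dependent event rate."""
    return events_per_second * BYTES_PER_EVENT * 8

# A hybrid setup sends 20 fps frames plus the sparse event stream,
# whose inter-event spacing (not the frame period) sets the latency.
hybrid = frame_bandwidth_bps(20) + event_bandwidth_bps(2e6)  # assumed 2 Mev/s
high_speed = frame_bandwidth_bps(5000)    # latency-equivalent frame camera
reference = frame_bandwidth_bps(45)       # typical automotive frame rate

print(f"hybrid     : {hybrid / 1e6:8.1f} Mbit/s")
print(f"5000 fps   : {high_speed / 1e6:8.1f} Mbit/s")
print(f"45 fps ref : {reference / 1e6:8.1f} Mbit/s")

Under these assumed numbers the hybrid stream stays within the same order of magnitude as a 45 fps frame camera, while a 5,000 fps frame camera with comparable latency would require two orders of magnitude more bandwidth.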