29 May 2024 | Daniel Gehrig & Davide Scaramuzza
The paper addresses the bandwidth-latency trade-off in advanced driver assistance systems (ADAS) by proposing a hybrid event- and frame-based object detector. Traditional ADAS systems rely on frame-based RGB cameras, which force a trade-off: high frame rates reduce perceptual latency but inflate bandwidth, while low frame rates save bandwidth at the cost of latency. Event cameras, which measure per-pixel intensity changes asynchronously, offer high temporal resolution and sparse output, mitigating both issues. However, purely event-based algorithms typically sacrifice accuracy or efficiency.
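For readers unfamiliar with the event-camera data format, the following is a minimal illustrative sketch (not taken from the paper): each event is a per-pixel brightness-change measurement carrying its own microsecond timestamp, so data is produced only where the scene changes. The `Event` class and the example stream are hypothetical.

```python
# Illustrative sketch: an event encodes a single per-pixel brightness change
# as (x, y, timestamp, polarity). Only changing pixels generate data, unlike
# a frame, which transmits every pixel at a fixed rate.
from dataclasses import dataclass

@dataclass
class Event:
    x: int          # pixel column
    y: int          # pixel row
    t_us: int       # timestamp in microseconds (high temporal resolution)
    polarity: int   # +1 for a brightness increase, -1 for a decrease

# A short hypothetical stream: sparse, asynchronous measurements.
stream = [
    Event(x=120, y=64, t_us=1_000, polarity=+1),
    Event(x=121, y=64, t_us=1_250, polarity=+1),
    Event(x=300, y=18, t_us=1_900, polarity=-1),
]
```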
To overcome these limitations, the authors propose a hybrid method that combines a standard CNN for images with an efficient asynchronous graph neural network (GNN) for events. This approach leverages the rich context in images and the sparse, high-rate event stream to produce high-rate object detections efficiently and with reduced perceptual latency. The method is designed to work with a 20 fps RGB camera and an event camera, achieving the same latency as a 5,000 fps camera while maintaining comparable accuracy and using only 4% more bandwidth.
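The timing of this fusion can be sketched as follows; `cnn_features`, `gnn_update`, and `detect` are hypothetical placeholders standing in for the authors' networks, not their actual API. The point is the schedule: dense image context is refreshed only when a frame arrives, while detections can be updated after every incoming event.

```python
# Scheduling sketch of the hybrid idea, under assumed interfaces.
def cnn_features(frame):
    return {"frame": frame}              # stand-in for dense CNN features

def gnn_update(state, event):
    state.setdefault("events", []).append(event)   # stand-in for an async GNN step
    return state

def detect(state):
    return f"detections from frame={state['frame']} + {len(state.get('events', []))} events"

def run(frames, events_between_frames):
    for frame, events in zip(frames, events_between_frames):
        state = cnn_features(frame)      # low-rate dense context (every 50 ms at 20 fps)
        for ev in events:                # high-rate sparse updates between frames
            state = gnn_update(state, ev)
            print(detect(state))         # a detection can be emitted after every event

run(frames=["f0", "f1"], events_between_frames=[[(1, 2, 10, +1)], [(5, 6, 60, -1)]])
```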
The proposed method, called Deep Asynchronous GNN (DAGr), processes events recursively, updating its output event by event rather than recomputing from scratch, which minimizes redundant computation. Features from a dense CNN running on the low-rate images are shared with the GNN, which then processes each new event efficiently. The GNN constructs spatio-temporal graphs from the events and processes them together with the image features through a sequence of graph convolution and pooling layers. Specialized layers, such as graph residual layers and directed voxel grid max pooling, further improve efficiency.
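A rough, simplified sketch of the spatio-temporal graph construction follows; the fixed connectivity radii, the brute-force neighbor search, and all names are assumptions for illustration only, and the paper's asynchronous update machinery and specialized layers are not reproduced here.

```python
# Simplified sketch: connect events that are close in space and time.
import numpy as np

def build_graph(events, radius_px=5.0, radius_us=5_000.0):
    """events: (N, 4) array of (x, y, t_us, polarity). Returns a directed edge list."""
    edges = []
    for i in range(len(events)):
        for j in range(i):
            dx = events[i, 0] - events[j, 0]
            dy = events[i, 1] - events[j, 1]
            dt = events[i, 2] - events[j, 2]
            # Directing edges from past to present keeps updates causal when
            # new events arrive, so earlier computations need not be redone.
            if dx**2 + dy**2 <= radius_px**2 and 0 <= dt <= radius_us:
                edges.append((j, i))
    return np.array(edges)

events = np.array([[10, 10, 0, 1], [12, 11, 800, 1], [200, 50, 900, -1]], dtype=float)
print(build_graph(events))   # only the two spatio-temporally nearby events are connected
```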
The authors evaluate the method on the DSEC Detection dataset, showing that it outperforms state-of-the-art event- and frame-based object detectors in terms of both efficiency and accuracy. The method demonstrates the potential of event cameras in edge-case scenarios, providing certifiable snapshots of reality and improving object detection for nonlinearly moving or deformable objects. The combination of a 20 fps RGB camera and an event camera achieves a 2.0 ms perceptual latency, on par with a 5,000 fps camera, while requiring only 4% more data bandwidth.