Understanding Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles

This paper introduces CoHFF, a collaborative semantic occupancy prediction framework that uses hybrid feature fusion to improve 3D semantic occupancy prediction in connected automated vehicles (CAVs). The framework fuses semantic and occupancy task features with compressed orthogonal attention features shared between vehicles to improve each vehicle's local 3D semantic occupancy prediction. To evaluate it, the authors extend the existing collaborative perception dataset OPV2V with 3D semantic occupancy labels. In the experiments, collaborative semantic occupancy prediction outperforms single-vehicle prediction by over 30%, and models based on semantic occupancy outperform state-of-the-art collaborative 3D detection techniques in subsequent perception applications, indicating higher accuracy and better semantic awareness of road environments.

CoHFF consists of four key modules: (1) Occupancy Prediction Task Net for occupancy feature extraction, (2) Semantic Segmentation Task Net for creating semantic plane-based embeddings, (3) V2X Feature Fusion for merging CAV features via deformable self-attention, and (4) Task Feature Fusion for uniting all task features to enhance semantic occupancy prediction. The framework achieves camera-based collaborative semantic occupancy prediction by sharing plane-based semantic features via V2X communication. According to the authors, this hybrid feature fusion both enables efficient collaboration among CAVs and significantly outperforms models pre-trained solely for occupancy prediction or for semantic voxel segmentation. The extended OPV2V dataset used for evaluation includes 12 different 3D semantic occupancy labels. The results show that collaboration improves semantic occupancy prediction, and that some categories a single vehicle could not detect, such as traffic signs and bridges, become detectable with collaboration.
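
The evaluation described above compares predicted and labelled semantic occupancy grids class by class. The summary does not name the metric, so the following is only a minimal sketch that assumes the common per-class intersection-over-union (IoU) and its mean (mIoU). The function names, the tiny 3-class example, and the relative-gain helper are all hypothetical illustrations, not the authors' code or data.

```python
from typing import Dict, Sequence


def per_class_iou(pred: Sequence[int], target: Sequence[int],
                  num_classes: int) -> Dict[int, float]:
    """Per-class IoU over flattened voxel labels (assumed metric).

    Classes that appear in neither prediction nor target are skipped.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of voxels")
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            # A mismatched voxel adds to the union of both classes involved.
            union[p] += 1
            union[t] += 1
    return {c: inter[c] / union[c] for c in range(num_classes) if union[c] > 0}


def mean_iou(ious: Dict[int, float]) -> float:
    """Unweighted mean of the per-class IoU values."""
    return sum(ious.values()) / len(ious)


def relative_gain(collab: float, single: float) -> float:
    """Relative improvement of a collaborative score over a single-vehicle one.

    Assumption: the summary does not say whether its reported gain is
    relative or absolute; this helper computes the relative form only.
    """
    if single == 0:
        raise ValueError("single-vehicle score must be non-zero")
    return (collab - single) / single


# Hypothetical 6-voxel scene with 3 classes (the extended dataset has 12).
target = [0, 0, 1, 1, 2, 2]
single_pred = [0, 0, 1, 0, 0, 0]   # class 2 is never predicted
collab_pred = [0, 0, 1, 1, 2, 0]   # class 2 is partly recovered

single_iou = per_class_iou(single_pred, target, num_classes=3)
collab_iou = per_class_iou(collab_pred, target, num_classes=3)
print("single mIoU:", round(mean_iou(single_iou), 3))
print("collab mIoU:", round(mean_iou(collab_iou), 3))
print("relative gain:", round(relative_gain(mean_iou(collab_iou),
                                            mean_iou(single_iou)), 3))
```

In this toy scene, class 2 has an IoU of 0 for the single-vehicle prediction and 0.5 for the collaborative one, which mirrors in miniature the reported effect of a category becoming detectable only after collaboration.
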
The framework is also robust under low communication budgets: performance remains stable even when communication volume is reduced by a factor of 97. Visual results illustrate CoHFF in several scenarios, including vehicle geometry completion and occluded vehicle detection. It predicts more complete vehicle shapes than are visible in the ego vehicle's field of view (FoV) and detects vehicles outside the FoV from minimal pixel information. Overall, integrating features from different tasks and from multiple CAVs improves perception performance by over 30%. The framework is also shown to be robust to GPS noise.
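
To give a feel for why sharing plane-based features can cut communication volume, the sketch below counts feature elements for a dense voxel volume versus three axis-aligned planes. It assumes that "plane-based" means features laid out on the three orthogonal planes of the voxel grid. The grid size and channel count are hypothetical, and the calculation does not reproduce the 97-fold reduction reported in the summary, which also depends on details not given here.

```python
def dense_volume_elems(x: int, y: int, z: int, channels: int) -> int:
    """Number of feature elements in a dense X*Y*Z voxel volume."""
    return x * y * z * channels


def triplane_elems(x: int, y: int, z: int, channels: int) -> int:
    """Number of feature elements on three orthogonal planes (XY, XZ, YZ).

    Assumption: one feature map per axis-aligned plane, same channel count.
    """
    return (x * y + x * z + y * z) * channels


def compression_ratio(x: int, y: int, z: int, channels: int) -> float:
    """How many times smaller the plane representation is than the volume."""
    return dense_volume_elems(x, y, z, channels) / triplane_elems(x, y, z, channels)


# Hypothetical grid: 100 x 100 x 8 voxels with a single feature channel.
grid = (100, 100, 8, 1)
print("dense elements:   ", dense_volume_elems(*grid))
print("tri-plane elements:", triplane_elems(*grid))
print("ratio:             ", round(compression_ratio(*grid), 2))
```

The ratio grows with grid height, so the saving from a plane layout alone depends strongly on the chosen resolution. Any further reduction, for example from compressing the plane features themselves, is outside what this count models.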

Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles

25 Apr 2024 | Rui Song, Chenwei Liang, Hu Cao, Zhiran Yan, Walter Zimmer, Markus Gross, Andreas Festag, Alois Knoll