8 May 2024 | Lingdong Kong, Xiang Xu, Jiawei Ren, Wenwei Zhang, Liang Pan, Kai Chen, Wei Tsang Ooi, Ziwei Liu
LaserMix++ is a data-efficient framework for 3D scene understanding in autonomous driving that combines LiDAR and camera data to enhance feature learning through spatial and textural synergies. It extends semi-supervised learning to multi-modal inputs with three components: multi-modal LaserMix operations, camera-to-LiDAR feature distillation, and language-driven knowledge guidance. Together, these components generate robust auxiliary supervision signals that improve the model's generalization across scenarios. The approach builds on the strong spatial prior of LiDAR scans, whose points are structured by the sensor's laser inclination angles, and improves data efficiency through entropy minimization and consistency regularization; camera images complement the sparse LiDAR geometry with dense texture, further improving robustness and accuracy.

LaserMix++ is agnostic to the underlying LiDAR representation and is validated on the nuScenes, SemanticKITTI, and ScribbleKITTI driving perception benchmarks, where it outperforms fully supervised counterparts while using fewer annotations and delivers gains in both low- and high-data regimes, underscoring the potential of semi-supervised learning to reduce reliance on extensive labeled data. The framework is implemented in PyTorch, trained on eight NVIDIA A100 GPUs, and publicly available at https://github.com/ldkong1205/LaserMix.
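As a rough illustration of the core mixing operation, the sketch below partitions two LiDAR scans into bands by laser inclination angle and swaps alternating bands between them. The number of areas, the pitch range, and the NumPy-based interface are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def lasermix(points_a, labels_a, points_b, labels_b,
             num_areas=6, pitch_min=-25.0, pitch_max=3.0):
    """Mix two LiDAR scans by swapping alternating inclination-angle bands.

    points_*: (N, 4) arrays of (x, y, z, intensity); labels_*: (N,) arrays.
    The defaults here (6 areas, pitch in [-25, 3] degrees) are assumptions
    chosen for illustration.
    """
    def pitch(points):
        # Inclination angle of each point relative to the sensor, in degrees.
        rho = np.linalg.norm(points[:, :2], axis=1)
        return np.degrees(np.arctan2(points[:, 2], rho))

    bounds = np.linspace(pitch_min, pitch_max, num_areas + 1)
    area_a = np.clip(np.digitize(pitch(points_a), bounds) - 1, 0, num_areas - 1)
    area_b = np.clip(np.digitize(pitch(points_b), bounds) - 1, 0, num_areas - 1)

    # Take even-indexed bands from scan A and odd-indexed bands from scan B,
    # producing one intertwined scan (swapping the roles gives the complement).
    mask_a = area_a % 2 == 0
    mask_b = area_b % 2 == 1
    mixed_points = np.concatenate([points_a[mask_a], points_b[mask_b]])
    mixed_labels = np.concatenate([labels_a[mask_a], labels_b[mask_b]])
    return mixed_points, mixed_labels
```

In a semi-supervised setup, one scan would carry ground-truth labels and the other teacher-generated pseudo-labels, so the mixed scan supplies supervision for both labeled and unlabeled regions.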
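The entropy minimization and consistency regularization objectives can be sketched as a teacher-student scheme over per-point predictions. The confidence threshold and the specific loss forms below are common choices assumed for illustration, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def semi_supervised_losses(student_logits, teacher_logits, threshold=0.9):
    """Consistency regularization plus entropy minimization on unlabeled points.

    student_logits / teacher_logits: (N, C) per-point class logits for the same
    scan; the teacher is typically an EMA copy of the student. The 0.9
    confidence threshold is an illustrative assumption.
    """
    with torch.no_grad():
        teacher_probs = F.softmax(teacher_logits, dim=-1)
        conf, pseudo_labels = teacher_probs.max(dim=-1)
        mask = conf > threshold  # keep only confident pseudo-labels

    # Consistency: the student should match the teacher's confident predictions.
    if mask.any():
        consistency = F.cross_entropy(student_logits[mask], pseudo_labels[mask])
    else:
        consistency = student_logits.sum() * 0.0  # zero loss, keeps the graph

    # Entropy minimization: sharpen the student's own predictive distribution.
    student_probs = F.softmax(student_logits, dim=-1)
    entropy = -(student_probs * torch.log(student_probs + 1e-8)).sum(-1).mean()

    return consistency, entropy
```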
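Camera-to-LiDAR feature distillation can likewise be sketched as pulling each point's LiDAR feature toward the paired pixel feature from a camera branch. The projection interface, feature shapes, and MSE objective below are hypothetical details assumed for illustration.

```python
import torch
import torch.nn.functional as F

def camera_to_lidar_distillation(point_feats, image_feats, uv, valid):
    """A minimal sketch of camera-to-LiDAR feature distillation.

    point_feats: (N, D) features from the LiDAR branch.
    image_feats: (D, H, W) feature map from a (frozen) camera branch.
    uv: (N, 2) pixel coordinates of each point projected into the image,
        normalized to [-1, 1]; valid: (N,) bool mask of points in view.
    All of these interfaces are assumptions, not taken from the paper.
    """
    # Sample the camera feature at each projected point location.
    grid = uv[valid].view(1, -1, 1, 2)                        # (1, M, 1, 2)
    sampled = F.grid_sample(image_feats.unsqueeze(0), grid,
                            align_corners=False)              # (1, D, M, 1)
    target = sampled.squeeze(0).squeeze(-1).t()               # (M, D)

    # Pull LiDAR point features toward the paired camera features; the
    # camera target is detached so gradients only flow into the LiDAR branch.
    return F.mse_loss(point_feats[valid], target.detach())
```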