14 Mar 2024 | Chengyao Wang, Li Jiang, Xiaoyang Wu, Zhuotao Tian, Bohao Peng, Hengshuang Zhao, Jiaya Jia
GroupContrast is a self-supervised representation learning method for 3D scene understanding that combines segment grouping with semantic-aware contrastive learning. It addresses the "semantic conflict" problem of previous approaches, in which semantically similar points are treated as negative pairs (false negatives) and are therefore pushed toward dissimilar representations. Segment grouping partitions points into semantically meaningful regions, improving semantic coherence, while semantic-aware contrastive learning uses these grouped segments to guide the choice of positive and negative pairs, reducing semantic conflict during training.

The framework adopts a dual-network design with a teacher and a student network to keep contrastive learning stable and consistent; the teacher is updated as an exponential moving average of the student's parameters. It further incorporates informative-aware distillation and confidence-weighted contrastive learning to improve performance.

Evaluated on multiple 3D scene understanding tasks, GroupContrast shows strong transfer learning performance, outperforming existing self-supervised 3D representation learning approaches on 3D semantic segmentation, instance segmentation, and object detection. It is also highly data efficient, reaching state-of-the-art results on data-efficient semantic segmentation benchmarks. Overall, GroupContrast achieves state-of-the-art results across a range of 3D scene perception tasks.
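Two mechanisms in the summary are concrete enough to sketch: the exponential-moving-average (EMA) teacher update and a confidence-weighted contrastive objective over matched segment features. The PyTorch sketch below is a minimal illustration under assumptions, not the authors' implementation: the function names (`ema_update`, `weighted_info_nce`), the temperature and momentum values, and the shape conventions are all hypothetical.

```python
# Minimal sketch (assumptions, not the paper's code): EMA teacher update and a
# confidence-weighted, InfoNCE-style contrastive loss over per-segment features.
import torch
import torch.nn.functional as F


@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module, momentum: float = 0.999) -> None:
    """Update teacher parameters as an exponential moving average of the student's."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(momentum).add_(s_param.detach(), alpha=1.0 - momentum)


def weighted_info_nce(student_feats: torch.Tensor,
                      teacher_feats: torch.Tensor,
                      confidence: torch.Tensor,
                      temperature: float = 0.07) -> torch.Tensor:
    """Confidence-weighted contrastive loss between matched segment features.

    student_feats, teacher_feats: (N, D) features for N matched segments, where row i of
    each tensor comes from the same segment seen in two augmented views.
    confidence: (N,) weights in [0, 1] that down-weight unreliable positive pairs, one way
    to soften the effect of false negatives ("semantic conflict").
    """
    s = F.normalize(student_feats, dim=-1)
    t = F.normalize(teacher_feats, dim=-1)
    logits = s @ t.T / temperature                      # (N, N) similarity matrix
    targets = torch.arange(s.size(0), device=s.device)  # positives lie on the diagonal
    per_pair = F.cross_entropy(logits, targets, reduction="none")
    return (confidence * per_pair).sum() / confidence.sum().clamp(min=1e-6)
```

In a training loop one would, roughly, pool point features per segment in both augmented views, match segments across views, compute `weighted_info_nce` on the matched features (back-propagating through the student only), and then call `ema_update` so the teacher tracks the student smoothly.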