25 Jul 2024 | Aljoša Ošep, Tim Meinhardt, Francesco Ferroni, Neehar Peri, Deva Ramanan, Laura Leal-Taixé
The paper introduces SAL (Segment Anything in Lidar), a method for zero-shot instance segmentation and classification of objects in Lidar point clouds. SAL consists of two main components: a pseudo-label engine and a zero-shot model. The pseudo-label engine distills 2D vision foundation models (SAM for segmentation, CLIP for semantics) into pseudo-labels for Lidar data, and the zero-shot model is trained on these pseudo-labels. This lets SAL perform class-agnostic segmentation and zero-shot classification without any manual supervision. The key contributions of SAL include:
1. **Pseudo-Label Engine**: Lifts instance masks and semantic features from 2D vision foundation models into the Lidar domain via camera calibration, producing pseudo-labels without manual annotation (see the first sketch after this list).
2. **Zero-Shot Model**: Trains on the generated pseudo-labels to perform class-agnostic segmentation and zero-shot classification via text prompts (see the second sketch at the end of this summary).
3. **Performance**: Reaches 91% of fully supervised state-of-the-art performance on class-agnostic segmentation and 54% on zero-shot Lidar Panoptic Segmentation (LPS) on the SemanticKITTI dataset, outperforming several baselines.
4. **Generalizability**: Can be easily extended to new datasets and supports arbitrary class prompts.
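The pseudo-label engine rests on a simple geometric idea: SAM masks and per-mask CLIP features live in the image plane, so they can be transferred to Lidar points through the camera calibration. Below is a minimal sketch of that lifting step, assuming a single camera with known intrinsics and extrinsics; the function name, argument layout, and the availability of one CLIP feature per mask are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch of the pseudo-label lifting idea (not SAL's actual code):
# transfer SAM instance masks and per-mask CLIP features from an image
# to Lidar points via the camera projection. All names are hypothetical.
import numpy as np

def lift_masks_to_lidar(points, masks, mask_clip_feats, K, T_cam_lidar, img_hw):
    """points: (N, 3) Lidar points; masks: (M, H, W) boolean SAM masks;
    mask_clip_feats: (M, D) CLIP features, one per mask;
    K: (3, 3) camera intrinsics; T_cam_lidar: (4, 4) Lidar-to-camera extrinsics."""
    H, W = img_hw
    # Transform points into the camera frame and project with the pinhole model.
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]
    in_front = pts_cam[:, 2] > 0.1                      # keep points in front of the camera
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    visible = in_front & (u >= 0) & (u < W) & (v >= 0) & (v < H)

    point_instance = np.full(len(points), -1)            # -1 = unlabeled
    point_feats = np.zeros((len(points), mask_clip_feats.shape[1]))
    for m, (mask, feat) in enumerate(zip(masks, mask_clip_feats)):
        # Points whose projection lands inside this mask inherit its labels.
        hit = visible & mask[np.clip(v, 0, H - 1), np.clip(u, 0, W - 1)]
        point_instance[hit] = m                          # pseudo instance label
        point_feats[hit] = feat                          # pseudo semantic feature
    return point_instance, point_feats
```

Points that fall outside every mask remain unlabeled, and overlapping masks are resolved here by simple overwriting; a full pipeline would also need to reconcile multiple camera views and noisy projections, which this sketch ignores.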
The paper also discusses the challenges and limitations of existing methods in LPS and highlights the effectiveness of SAL in handling a wide range of object classes and datasets. The authors release all models and code to facilitate further research and application.
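To make the prompt-based classification in contribution 2 concrete, here is a minimal sketch of zero-shot labeling, assuming the trained model outputs one CLIP-aligned feature vector per predicted instance and that text embeddings for the class prompts are precomputed with a CLIP text encoder (e.g., `model.encode_text(clip.tokenize(prompts))` in OpenAI's CLIP). All names here are hypothetical.

```python
# Hedged sketch of zero-shot classification via text prompts (illustrative,
# not SAL's actual inference code): per-instance features predicted by the
# zero-shot model are matched to CLIP text embeddings by cosine similarity.
import numpy as np

def classify_instances(instance_feats, text_feats, class_names):
    """instance_feats: (M, D) per-instance features; text_feats: (C, D)
    CLIP text embeddings, one per prompt such as 'a photo of a car'."""
    # L2-normalize so the dot product equals cosine similarity.
    inst = instance_feats / np.linalg.norm(instance_feats, axis=1, keepdims=True)
    txt = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    sim = inst @ txt.T                     # (M, C) similarity matrix
    best = sim.argmax(axis=1)              # highest-scoring prompt per instance
    return [class_names[i] for i in best]

# Usage: because classes enter only through the prompts, swapping in a new
# vocabulary requires no retraining. Random stand-ins replace real features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 512))          # stand-in for predicted features
texts = rng.normal(size=(3, 512))          # stand-in for CLIP text embeddings
print(classify_instances(feats, texts, ["car", "pedestrian", "bicycle"]))
```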