Better Call SAL: Towards Learning to Segment Anything in Lidar

25 Jul 2024 | Aljoša Ošep, Tim Meinhardt, Francesco Ferrari, Neehar Peri, Deva Ramanan, Laura Leal-Taixé
The paper introduces SAL (Segment Anything in Lidar), a novel method for zero-shot lidar panoptic segmentation. SAL combines a text-promptable zero-shot model that segments and classifies any object in lidar data with a pseudo-labeling engine that enables training without manual supervision. The method leverages 2D vision foundation models to generate 3D supervision "for free," distilling them into a lidar-specific model. SAL reaches 91% of fully supervised performance in class-agnostic segmentation and 54% in zero-shot lidar panoptic segmentation, outperforming several baselines. The model supports arbitrary class prompts and can be extended to new datasets.

SAL's pseudo-label engine uses SAM to generate segmentation masks and CLIP for text-based classification, transferring both to lidar via a calibrated sensor setup. The lidar model is then trained on these pseudo-labels, which allows it to perform zero-shot classification without relying on image features at inference time. SAL demonstrates strong performance on standard benchmarks, achieving 42% and 54% of the performance of fully supervised models on SemanticKITTI and nuScenes, respectively, while operating on full lidar point clouds. The approach shows significant potential to improve with increasing amounts of self-labeled data and opens the door to training lidar segmentation models without manual supervision.
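To make the pseudo-labeling step concrete, the sketch below illustrates how 2D foundation-model outputs can be lifted into a lidar scan: SAM produces class-agnostic image masks, the lidar points are projected into the camera with the dataset's calibration, points falling inside a mask inherit its instance id, and CLIP text embeddings of arbitrary class prompts provide the zero-shot vocabulary. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the open-source `segment_anything` and `open_clip` packages, and `project_points` is a hypothetical helper standing in for the camera-lidar calibration.

```python
# Minimal sketch of SAM/CLIP pseudo-label transfer to lidar (illustrative only).
import numpy as np
import torch
import open_clip
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry


def project_points(points_lidar: np.ndarray, T_cam_lidar: np.ndarray, K: np.ndarray):
    """Project N x 3 lidar points into pixel coordinates (hypothetical calibration helper)."""
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]
    in_front = pts_cam[:, 2] > 0                      # keep points in front of the camera
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                       # perspective division
    return uv, in_front


def pseudo_label_scan(image, points_lidar, T_cam_lidar, K, class_prompts):
    """Lift SAM masks onto a lidar scan and prepare CLIP text prompts for zero-shot classes."""
    # 1) Class-agnostic 2D masks from SAM (checkpoint path is an assumption).
    sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
    masks = SamAutomaticMaskGenerator(sam).generate(image)

    # 2) Project lidar points into the image using the calibrated setup.
    uv, valid = project_points(points_lidar, T_cam_lidar, K)
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)

    # 3) CLIP text embeddings for arbitrary (zero-shot) class prompts.
    clip_model, _, _ = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
    tokenizer = open_clip.get_tokenizer("ViT-B-32")
    with torch.no_grad():
        text_feat = clip_model.encode_text(
            tokenizer([f"a photo of a {c}" for c in class_prompts]))
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

    # 4) Assign each lidar point the instance id of the mask it projects into.
    point_instance = np.full(len(points_lidar), -1)
    for i, m in enumerate(masks):
        seg = m["segmentation"]                       # H x W boolean mask
        h, w = seg.shape
        inside = valid & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = inside.copy()
        hit[inside] = seg[v[inside], u[inside]]
        point_instance[hit] = i
        # In the full pipeline, a per-mask CLIP image feature would also be stored and
        # later matched against `text_feat` so the distilled lidar model can be
        # classified with text prompts alone, without image features at test time.
    return point_instance, text_feat
```

The returned per-point instance ids serve as pseudo-labels for training the lidar-only model, while the normalized text embeddings define the open vocabulary that the distilled model is prompted with at inference; both the helper names and the exact prompt template are assumptions for illustration.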