25 Jun 2024 | Yuchen Zhou, Jiayuan Gu, Tung Yen Chiang, Fanbo Xiang, Hao Su
Point-SAM is a promptable 3D segmentation model for point clouds that extends the Segment Anything Model (SAM) to the 3D domain. It uses a transformer-based architecture to predict segmentation masks from a point cloud and a set of prompts, handles data from varied sources, and produces results at multiple levels of granularity. Point-SAM outperforms state-of-the-art models on several indoor and outdoor benchmarks and supports applications such as 3D annotation.

The model is trained on a mixture of heterogeneous datasets, including PartNet and ScanNet, covering both part- and object-level annotations. A data engine distills 2D knowledge from SAM into the 3D model by generating pseudo labels with diverse masks, which further improves zero-shot transferability. The main contributions are a native 3D foundation model for promptable segmentation on point clouds, the pseudo-label data engine, and the scaling-up of both model and training data for 3D segmentation. Point-SAM shows strong zero-shot transferability to unseen point-cloud distributions and to new tasks.
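To make the promptable setup concrete, here is a minimal PyTorch sketch of a SAM-style interface adapted to point clouds: an encoder produces per-point features, clicked prompt points are embedded as tokens, and a small decoder outputs a foreground logit for every point. All module names, layer sizes, and the per-point MLP encoder are illustrative stand-ins rather than the authors' implementation; the paper uses a transformer backbone and a richer mask decoder.

```python
# Hypothetical sketch (not the authors' code) of a SAM-style promptable
# interface for point clouds: point features + prompt tokens -> per-point mask logits.
import torch
import torch.nn as nn

class PromptablePointSegmenter(nn.Module):
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Stand-in point-cloud encoder: a per-point MLP. A transformer backbone,
        # as used in the paper, would slot in here instead.
        self.point_encoder = nn.Sequential(
            nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, feat_dim)
        )
        # Prompt encoder: embeds each click as (xyz, positive/negative flag).
        self.prompt_encoder = nn.Sequential(
            nn.Linear(3 + 1, 128), nn.ReLU(), nn.Linear(128, feat_dim)
        )
        # Mask decoder: cross-attention from point features to prompt tokens,
        # followed by a per-point logit head.
        self.cross_attn = nn.MultiheadAttention(feat_dim, num_heads=8, batch_first=True)
        self.mask_head = nn.Linear(feat_dim, 1)

    def forward(self, points: torch.Tensor, prompt_xyz: torch.Tensor,
                prompt_labels: torch.Tensor) -> torch.Tensor:
        # points:        (B, N, 3) input point cloud
        # prompt_xyz:    (B, K, 3) clicked points
        # prompt_labels: (B, K)    1 = foreground click, 0 = background click
        point_feats = self.point_encoder(points)                         # (B, N, C)
        prompt_in = torch.cat([prompt_xyz, prompt_labels.unsqueeze(-1).float()], dim=-1)
        prompt_tokens = self.prompt_encoder(prompt_in)                   # (B, K, C)
        fused, _ = self.cross_attn(point_feats, prompt_tokens, prompt_tokens)
        return self.mask_head(fused).squeeze(-1)                         # (B, N) mask logits

# Usage: one foreground click on a 4096-point cloud yields per-point logits.
model = PromptablePointSegmenter()
pts = torch.rand(1, 4096, 3)
logits = model(pts, prompt_xyz=pts[:, :1], prompt_labels=torch.ones(1, 1))
mask = logits.sigmoid() > 0.5
```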
Point-SAM is evaluated on datasets including PartNet-Mobility, ScanObjectNN, S3DIS, and KITTI-360, where it achieves superior performance in zero-shot point-prompted segmentation and object proposal generation. It also excels at few-shot part segmentation, outperforming other methods on tasks that require part-level masks. Ablation studies confirm that scaling up the training data and adding pseudo labels significantly improve zero-shot transferability. The architecture is designed around the irregularity and scale of point clouds, which makes it effective for 3D segmentation, and the results underscore the importance of native 3D foundation models for addressing the field's core challenges: the lack of unified data formats, of lightweight models, and of labeled data with diverse masks.
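Zero-shot point-prompted segmentation is typically scored by simulating interactive clicks. The sketch below shows one such loop under assumed conventions: start from a single foreground click, place each subsequent click on a currently mislabeled point, and record IoU after every round. The `predict_mask` callable is a placeholder for any promptable segmenter (for example, the sketch above); the click-selection rule and defaults are illustrative and not necessarily the paper's exact evaluation protocol.

```python
# Hedged sketch of an interactive-prompting evaluation loop for a promptable
# 3D segmenter; the protocol details here are assumptions, not the paper's spec.
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / max(float(union), 1.0)

def simulate_clicks(points, gt_mask, predict_mask, num_clicks=10, rng=None):
    """points: (N, 3); gt_mask: (N,) bool;
    predict_mask(points, prompt_xyz, prompt_labels) -> (N,) bool prediction."""
    if rng is None:
        rng = np.random.default_rng(0)
    # First prompt: a random ground-truth foreground point.
    idx = rng.choice(np.flatnonzero(gt_mask))
    prompts, labels, ious = [points[idx]], [1], []
    for _ in range(num_clicks):
        pred = predict_mask(points, np.stack(prompts), np.array(labels))
        ious.append(iou(pred, gt_mask))
        errors = np.flatnonzero(pred != gt_mask)
        if errors.size == 0:
            break
        # Next click corrects an error: positive if the point is missed
        # foreground, negative if it is a false positive.
        idx = rng.choice(errors)
        prompts.append(points[idx])
        labels.append(int(gt_mask[idx]))
    return ious
```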