13 Jun 2024 | Zhao Mandi, Yijia Weng, Dominik Bauer, Shuran Song
Real2Code is a novel approach for reconstructing articulated objects via code generation. Given visual observations of an object, Real2Code first reconstructs its part geometry using an image segmentation model and a shape completion model. It then represents each object part with an oriented bounding box (OBB), and a fine-tuned large language model (LLM) takes these boxes as input and predicts the object's joint articulation as code. By leveraging pre-trained vision and language models, Real2Code scales elegantly with the number of articulated parts and generalizes from synthetic training data to real-world objects in unstructured environments. Experiments show that Real2Code significantly outperforms previous state-of-the-art methods in reconstruction accuracy, and that it is the first approach to extrapolate beyond the structural complexity of its training set, reconstructing objects with up to 10 articulated parts. When combined with a stereo reconstruction model, Real2Code generalizes to real-world objects from a few multi-view RGB images, without needing depth or camera information.
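To make the OBB abstraction concrete, the sketch below fits an oriented bounding box to each reconstructed part and serializes the boxes into a text prompt for the joint-prediction LLM. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes each part is available as a point cloud, uses trimesh's `oriented_bounds` as a stand-in for the paper's OBB fitting, and the prompt format is hypothetical.

```python
# Minimal sketch of the OBB abstraction step (not the paper's implementation):
# fit an oriented bounding box to each reconstructed part point cloud, then
# serialize the boxes into a compact text prompt for the joint-prediction LLM.
import numpy as np
import trimesh

def part_to_obb(points: np.ndarray):
    """Fit an OBB to an (N, 3) part point cloud; return center, rotation, extents."""
    to_origin, extents = trimesh.bounds.oriented_bounds(points)
    box_to_world = np.linalg.inv(to_origin)  # pose of the box in the world frame
    return box_to_world[:3, 3], box_to_world[:3, :3], extents

def obbs_to_prompt(parts: list) -> str:
    """Serialize per-part OBBs into the text the LLM conditions on (format is hypothetical)."""
    lines = []
    for i, points in enumerate(parts):
        center, rotation, extents = part_to_obb(points)
        lines.append(
            f"part_{i}: center={np.round(center, 3).tolist()}, "
            f"extents={np.round(extents, 3).tolist()}, "
            f"rotation={np.round(rotation, 3).tolist()}"
        )
    return "\n".join(lines)

# Example: two random point clouds standing in for reconstructed parts.
prompt = obbs_to_prompt([np.random.rand(500, 3), np.random.rand(500, 3) + 1.0])
print(prompt)
```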
Concretely, the method uses a pre-trained SAM model for part segmentation and a learned shape completion model to extract watertight part meshes; a fine-tuned LLM then predicts the joint parameters as code that can be executed directly in simulation (both steps are sketched below). Evaluated on the PartNet-Mobility dataset, Real2Code outperforms baselines in both 3D reconstruction and joint prediction accuracy, and it is the only method that reliably reconstructs objects with more than three articulated parts, where prior methods fail. Its effectiveness in both synthetic and real-world settings, reconstructing complex objects with up to 10 parts from only a few pose-free RGB images, makes it a promising approach for future research in articulated object reconstruction.
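For the segmentation step, the paper fine-tunes SAM for part-level masks; as a rough stand-in, a minimal sketch with the off-the-shelf segment-anything package looks like this (the checkpoint path, input image, and keep-largest-masks heuristic are placeholders):

```python
# Rough stand-in for the part segmentation step: the paper fine-tunes SAM for
# part-level masks, but the off-the-shelf automatic mask generator gives the
# general flavor. Checkpoint path and filtering heuristic are placeholders.
import cv2
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("cabinet.png"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)  # dicts with "segmentation", "area", "bbox", ...

# Crude part-mask selection; the fine-tuned model would output part masks directly.
part_masks = sorted(masks, key=lambda m: m["area"], reverse=True)[:10]
```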
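And to illustrate what "joint parameters as code" means in practice: the LLM's output is a short program a simulator can execute directly. The hypothetical example below writes a single revolute joint as MJCF and loads it into MuJoCo; the body names, geometry, and joint values are invented for illustration, not the paper's actual output.

```python
# Hypothetical example of the kind of program the fine-tuned LLM emits:
# joint parameters written as MJCF that a simulator executes directly.
# Body names, geometry, and joint values here are invented for illustration.
import mujoco

MJCF = """
<mujoco model="cabinet">
  <worldbody>
    <body name="base">
      <geom type="box" size="0.4 0.3 0.5"/>
      <body name="door" pos="0.4 -0.3 0">
        <!-- Revolute joint: the position, axis, and range are what the LLM predicts. -->
        <joint name="door_hinge" type="hinge" axis="0 0 1" range="0 1.57"/>
        <geom type="box" size="0.02 0.3 0.5" pos="0 0.3 0"/>
      </body>
    </body>
  </worldbody>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(MJCF)
data = mujoco.MjData(model)
mujoco.mj_step(model, data)  # the reconstructed object is immediately simulable
```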