13 Jun 2024 | Zhao Mandi, Yijia Weng, Dominik Bauer, Shuran Song
Real2Code is a novel approach for reconstructing articulated objects via code generation. Given visual observations of an object, Real2Code first reconstructs its part geometry using an image segmentation model and a shape completion model. It then represents each object part with an oriented bounding box (OBB), and a fine-tuned large language model (LLM) takes these boxes as input and predicts the object's joint articulation as code. By leveraging pre-trained vision and language models, Real2Code scales elegantly with the number of articulated parts and generalizes from synthetic training data to real-world objects in unstructured environments. Experiments show that Real2Code significantly outperforms previous state-of-the-art methods in reconstruction accuracy, and that it is the first approach to extrapolate beyond the structural complexity of its training set, reconstructing objects with up to 10 articulated parts. When combined with a stereo reconstruction model, Real2Code generalizes to real-world objects from a few multi-view RGB images, without needing depth or camera information.
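To make the OBB abstraction concrete, the sketch below fits an oriented bounding box to each reconstructed part and serializes the boxes into a text prompt for the joint-prediction LLM. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes each part is available as a point cloud, uses trimesh's `oriented_bounds` as a stand-in for the paper's OBB fitting, and the prompt format is hypothetical.

```python
# Minimal sketch of the OBB abstraction step (not the paper's implementation):
# fit an oriented bounding box to each reconstructed part point cloud, then
# serialize the boxes into a compact text prompt for the joint-prediction LLM.
import numpy as np
import trimesh

def part_to_obb(points: np.ndarray):
    """Fit an OBB to an (N, 3) part point cloud; return center, rotation, extents."""
    to_origin, extents = trimesh.bounds.oriented_bounds(points)
    box_to_world = np.linalg.inv(to_origin)  # pose of the box in the world frame
    return box_to_world[:3, 3], box_to_world[:3, :3], extents

def obbs_to_prompt(parts: list) -> str:
    """Serialize per-part OBBs into the text the LLM conditions on (format is hypothetical)."""
    lines = []
    for i, points in enumerate(parts):
        center, rotation, extents = part_to_obb(points)
        lines.append(
            f"part_{i}: center={np.round(center, 3).tolist()}, "
            f"extents={np.round(extents, 3).tolist()}, "
            f"rotation={np.round(rotation, 3).tolist()}"
        )
    return "\n".join(lines)

# Example: two random point clouds standing in for reconstructed parts.
prompt = obbs_to_prompt([np.random.rand(500, 3), np.random.rand(500, 3) + 1.0])
print(prompt)
```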
Concretely, the method uses a pre-trained SAM model for part segmentation and a learned shape completion model to extract watertight part meshes; a fine-tuned LLM then predicts the joint parameters as code that can be executed directly in simulation (both steps are sketched below). Evaluated on the PartNet-Mobility dataset, Real2Code outperforms baselines in both 3D reconstruction and joint prediction accuracy, and it is the only method that reliably reconstructs objects with more than three articulated parts, where prior methods fail. Its effectiveness in both synthetic and real-world settings, reconstructing complex objects with up to 10 parts from only a few pose-free RGB images, makes it a promising approach for future research in articulated object reconstruction.
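For the segmentation step, the paper fine-tunes SAM for part-level masks; as a rough stand-in, a minimal sketch with the off-the-shelf segment-anything package looks like this (the checkpoint path, input image, and keep-largest-masks heuristic are placeholders):

```python
# Rough stand-in for the part segmentation step: the paper fine-tunes SAM for
# part-level masks, but the off-the-shelf automatic mask generator gives the
# general flavor. Checkpoint path and filtering heuristic are placeholders.
import cv2
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("cabinet.png"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)  # dicts with "segmentation", "area", "bbox", ...

# Crude part-mask selection; the fine-tuned model would output part masks directly.
part_masks = sorted(masks, key=lambda m: m["area"], reverse=True)[:10]
```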
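And to illustrate what "joint parameters as code" means in practice: the LLM's output is a short program a simulator can execute directly. The hypothetical example below writes a single revolute joint as MJCF and loads it into MuJoCo; the body names, geometry, and joint values are invented for illustration, not the paper's actual output.

```python
# Hypothetical example of the kind of program the fine-tuned LLM emits:
# joint parameters written as MJCF that a simulator executes directly.
# Body names, geometry, and joint values here are invented for illustration.
import mujoco

MJCF = """
<mujoco model="cabinet">
  <worldbody>
    <body name="base">
      <geom type="box" size="0.4 0.3 0.5"/>
      <body name="door" pos="0.4 -0.3 0">
        <!-- Revolute joint: the position, axis, and range are what the LLM predicts. -->
        <joint name="door_hinge" type="hinge" axis="0 0 1" range="0 1.57"/>
        <geom type="box" size="0.02 0.3 0.5" pos="0 0.3 0"/>
      </body>
    </body>
  </worldbody>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(MJCF)
data = mujoco.MjData(model)
mujoco.mj_step(model, data)  # the reconstructed object is immediately simulable
```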