SHAPELLM is the first 3D multimodal large language model (LLM) designed for embodied interaction, aiming at universal 3D object understanding from 3D point clouds and language. The model is built on ReCon++, an improved 3D point-cloud encoder that extends the original ReCon with multi-view image distillation and advanced 3D representation learning to strengthen geometric understanding. SHAPELLM is trained on constructed instruction-following data and evaluated on the newly established 3D MM-Vet benchmark, which assesses four levels of capability in embodied interaction scenarios, from fundamental recognition to control-statement generation. On this benchmark, SHAPELLM achieves state-of-the-art performance in 3D geometry understanding and language-unified 3D interaction tasks.
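To make the multi-view image distillation idea concrete, the following is a minimal, self-contained sketch (not the authors' implementation) of how a point-cloud encoder can be aligned to precomputed per-view image embeddings via a cosine-similarity objective. All module and tensor names (`PointEncoder`, `view_queries`, `multiview_distill_loss`) are illustrative assumptions, not part of the released ReCon++ code.

```python
# Hypothetical sketch of multi-view image distillation for a point-cloud encoder.
# Assumption: frozen per-view image features (e.g. from a 2D vision backbone) are
# available as targets; the point encoder pools its tokens into one feature per view.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointEncoder(nn.Module):
    """Toy stand-in for a transformer point-cloud encoder (ReCon-style, simplified)."""
    def __init__(self, dim=384, n_views=4):
        super().__init__()
        self.patch_embed = nn.Linear(3, dim)            # embed raw xyz coordinates
        self.blocks = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=6, batch_first=True),
            num_layers=2,
        )
        # one learnable query per rendered view, used to pool point tokens
        self.view_queries = nn.Parameter(torch.randn(n_views, dim))
        self.attn_pool = nn.MultiheadAttention(dim, num_heads=6, batch_first=True)

    def forward(self, points):                          # points: (B, N, 3)
        tokens = self.blocks(self.patch_embed(points))
        queries = self.view_queries.unsqueeze(0).expand(points.size(0), -1, -1)
        view_feats, _ = self.attn_pool(queries, tokens, tokens)
        return view_feats                                # (B, n_views, dim)

def multiview_distill_loss(view_feats, image_feats):
    """Pull pooled point features toward frozen multi-view image embeddings."""
    return 1.0 - F.cosine_similarity(view_feats, image_feats, dim=-1).mean()

if __name__ == "__main__":
    B, N, V, D = 2, 1024, 4, 384
    pts = torch.randn(B, N, 3)
    img_feats = torch.randn(B, V, D)                     # placeholder for frozen view features
    model = PointEncoder(dim=D, n_views=V)
    loss = multiview_distill_loss(model(pts), img_feats)
    loss.backward()
    print(f"distillation loss: {loss.item():.4f}")
```

In practice the image-feature targets would come from a frozen 2D encoder applied to rendered views of the object, so the point encoder inherits view-aware semantics while its geometric branch is trained separately; the sketch above only illustrates the alignment term, under those stated assumptions.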