The paper introduces 3D Diffuser Actor, a neural policy that combines 3D scene representations with diffusion models to predict robot actions. 3D Diffuser Actor is designed to handle multimodal action distributions and to generalize across camera viewpoints. A 3D denoising transformer fuses information from the 3D visual scene, the language instruction, and proprioception to predict the noise added to noised 3D robot pose trajectories; at inference, this noise is iteratively removed to produce an action trajectory. The model is evaluated on the RLBench and CALVIN benchmarks, where it achieves state-of-the-art performance with significant improvements over prior methods, and it also learns from real-world demonstrations, performing well on multi-task manipulation. The paper underscores the importance of 3D scene representations and diffusion objectives in robot policy learning, supported by comparisons with related work and ablation studies.
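
To make the described training objective concrete, the sketch below shows a minimal conditional denoiser of the kind the summary describes: a transformer that attends jointly over noised trajectory tokens, 3D scene tokens, language tokens, and a proprioception token, trained to predict the injected noise. This is not the authors' implementation; all module names, token shapes, and hyperparameters (e.g. `TrajectoryDenoiser`, a 9-D pose of position plus 6D rotation, 1000 diffusion steps) are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's code) of a diffusion-style policy:
# a transformer denoiser conditioned on 3D scene, language, and proprioception
# tokens is trained to predict the noise added to 3D robot pose trajectories.
import torch
import torch.nn as nn


class TrajectoryDenoiser(nn.Module):
    """Transformer that predicts the noise component of a noised trajectory."""

    def __init__(self, dim=256, pose_dim=9, n_layers=4, n_heads=8):
        super().__init__()
        self.pose_in = nn.Linear(pose_dim, dim)        # embed noised poses
        self.t_embed = nn.Embedding(1000, dim)         # diffusion-step embedding
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.noise_out = nn.Linear(dim, pose_dim)      # predicted noise per step

    def forward(self, noisy_traj, t, scene_tokens, lang_tokens, proprio_tokens):
        # noisy_traj: (B, horizon, pose_dim); context tokens: (B, N_i, dim)
        traj_tokens = self.pose_in(noisy_traj) + self.t_embed(t)[:, None, :]
        tokens = torch.cat(
            [traj_tokens, scene_tokens, lang_tokens, proprio_tokens], dim=1)
        fused = self.backbone(tokens)
        # Read noise predictions off the trajectory token positions only.
        return self.noise_out(fused[:, : noisy_traj.shape[1]])


def diffusion_training_step(model, traj, scene, lang, proprio, alphas_cumprod):
    """One denoising-diffusion training step with an epsilon-prediction loss."""
    B = traj.shape[0]
    t = torch.randint(0, alphas_cumprod.shape[0], (B,))
    noise = torch.randn_like(traj)
    a_bar = alphas_cumprod[t].view(B, 1, 1)
    noisy_traj = a_bar.sqrt() * traj + (1 - a_bar).sqrt() * noise
    pred_noise = model(noisy_traj, t, scene, lang, proprio)
    return nn.functional.mse_loss(pred_noise, noise)


if __name__ == "__main__":
    model = TrajectoryDenoiser()
    betas = torch.linspace(1e-4, 0.02, 1000)
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
    loss = diffusion_training_step(
        model,
        traj=torch.randn(2, 16, 9),       # e.g. 3D position + 6D rotation
        scene=torch.randn(2, 64, 256),    # 3D scene feature tokens (assumed)
        lang=torch.randn(2, 10, 256),     # language instruction tokens
        proprio=torch.randn(2, 1, 256),   # proprioception token
        alphas_cumprod=alphas_cumprod,
    )
    loss.backward()
    print(float(loss))
```

At inference, the same network would be applied iteratively to a trajectory initialized from Gaussian noise, removing the predicted noise step by step; the multimodality of demonstrated actions is captured because different noise samples can denoise to different valid trajectories.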