26 Jun 2024 | Yuxi Wei, Zi Wang, Yifan Lu, Chenxin Xu, Changxing Liu, Hao Zhao, Siheng Chen, Yanfeng Wang
**ChatSim: Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents**
This paper introduces ChatSim, a system that enables editable photo-realistic 3D driving scene simulations using natural language commands and external digital assets. To address the limitations of existing editable scene simulation approaches, ChatSim leverages a large language model (LLM) agent collaboration framework to enable high command flexibility. It employs a novel multi-camera neural radiance field method (McNeRF) for generating photo-realistic outcomes and a multi-camera lighting estimation method (McLight) to achieve scene-consistent asset rendering.
**Key Contributions:**
1. **LLM-Agents Collaboration:** ChatSim uses a collaborative LLM-agent framework to handle complex and abstract user commands, ensuring intuitive and dynamic editing of driving scenes.
2. **McNeRF for Photo-Realistic Rendering:** McNeRF incorporates multi-camera inputs to generate high-fidelity rendering, addressing camera pose misalignment and brightness inconsistency.
3. **McLight for Realistic Asset Integration:** McLight estimates lighting conditions for external digital assets, enabling seamless integration with the scene.
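The collaboration pattern in contribution 1 can be sketched as a dispatcher: a project-manager agent decomposes a natural-language command into sub-tasks and routes each to a specialized editing agent. The sketch below is a minimal, hypothetical illustration of that control flow; the agent names, the rule-based command parsing (standing in for LLM prompting), and the `Scene` structure are all assumptions for illustration, not the paper's actual prompts or interfaces.

```python
# Hypothetical sketch of ChatSim-style agent collaboration: a project-manager
# agent splits a natural-language edit command into sub-tasks and dispatches
# them to specialized agents. The rule-based parsing below is a toy stand-in
# for what the real system delegates to an LLM.
from dataclasses import dataclass, field

@dataclass
class Scene:
    assets: list = field(default_factory=list)   # assets currently in the scene
    edits: list = field(default_factory=list)    # log of applied edits

def addition_agent(scene, task):
    """Specialized agent: insert an external digital asset into the scene."""
    scene.assets.append(task["asset"])
    scene.edits.append(f"added {task['asset']}")

def deletion_agent(scene, task):
    """Specialized agent: remove an asset from the scene if present."""
    if task["asset"] in scene.assets:
        scene.assets.remove(task["asset"])
        scene.edits.append(f"removed {task['asset']}")

AGENTS = {"add": addition_agent, "delete": deletion_agent}

def project_manager(scene, command):
    """Toy decomposition: split the command into per-asset sub-tasks
    (the real system would prompt an LLM for this step)."""
    for clause in command.split(" and "):
        verb, _, asset = clause.partition(" ")
        action = "add" if verb.lower() in ("add", "insert") else "delete"
        AGENTS[action](scene, {"action": action, "asset": asset})
    return scene

scene = project_manager(Scene(assets=["white sedan"]),
                        "add a red SUV and delete white sedan")
print(scene.assets)  # → ['a red SUV']
```

The point of the design is that each agent stays small and single-purpose, so the manager can compose them to satisfy complex or abstract commands.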
**Experiments:**
- **Waymo Open Dataset:** ChatSim demonstrates the ability to handle complex language commands and generate photo-realistic scene videos.
- **Performance Evaluation:** ChatSim achieves state-of-the-art photo-realism in both standard and wide-angle rendering, outperforming existing methods.
**Conclusion:**
ChatSim is the first system to enable editable photo-realistic 3D driving scene simulations via natural language commands and external digital assets. It addresses the limitations of existing approaches by leveraging LLM-agent collaboration, McNeRF, and McLight, achieving high-quality and flexible scene simulations.