26 Jun 2024 | Yuxi Wei, Zi Wang, Yifan Lu, Chenxin Xu, Changxing Liu, Hao Zhao, Siheng Chen, Yanfeng Wang
**ChatSim: Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents**
This paper introduces ChatSim, a system that enables editable photo-realistic 3D driving scene simulations using natural language commands and external digital assets. To address the limitations of existing editable scene simulation approaches, ChatSim leverages a large language model (LLM) agent collaboration framework to enable high command flexibility. It employs a novel multi-camera neural radiance field method (McNeRF) for generating photo-realistic outcomes and a multi-camera lighting estimation method (McLight) to achieve scene-consistent asset rendering.
**Key Contributions:**
1. **LLM-Agents Collaboration:** ChatSim uses a collaborative LLM-agent framework to handle complex and abstract user commands, ensuring intuitive and dynamic editing of driving scenes.
2. **McNeRF for Photo-Realistic Rendering:** McNeRF incorporates multi-camera inputs to generate high-fidelity rendering, addressing camera pose misalignment and brightness inconsistency.
3. **McLight for Realistic Asset Integration:** McLight estimates lighting conditions for external digital assets, enabling seamless integration with the scene.
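The collaboration pattern in contribution 1 can be sketched as a dispatcher: a project-manager agent decomposes a natural-language command into sub-tasks and routes each to a specialized editing agent. The sketch below is a minimal, hypothetical illustration of that control flow; the agent names, the rule-based command parsing (standing in for LLM prompting), and the `Scene` structure are all assumptions for illustration, not the paper's actual prompts or interfaces.

```python
# Hypothetical sketch of ChatSim-style agent collaboration: a project-manager
# agent splits a natural-language edit command into sub-tasks and dispatches
# them to specialized agents. The rule-based parsing below is a toy stand-in
# for what the real system delegates to an LLM.
from dataclasses import dataclass, field

@dataclass
class Scene:
    assets: list = field(default_factory=list)   # assets currently in the scene
    edits: list = field(default_factory=list)    # log of applied edits

def addition_agent(scene, task):
    """Specialized agent: insert an external digital asset into the scene."""
    scene.assets.append(task["asset"])
    scene.edits.append(f"added {task['asset']}")

def deletion_agent(scene, task):
    """Specialized agent: remove an asset from the scene if present."""
    if task["asset"] in scene.assets:
        scene.assets.remove(task["asset"])
        scene.edits.append(f"removed {task['asset']}")

AGENTS = {"add": addition_agent, "delete": deletion_agent}

def project_manager(scene, command):
    """Toy decomposition: split the command into per-asset sub-tasks
    (the real system would prompt an LLM for this step)."""
    for clause in command.split(" and "):
        verb, _, asset = clause.partition(" ")
        action = "add" if verb.lower() in ("add", "insert") else "delete"
        AGENTS[action](scene, {"action": action, "asset": asset})
    return scene

scene = project_manager(Scene(assets=["white sedan"]),
                        "add a red SUV and delete white sedan")
print(scene.assets)  # → ['a red SUV']
```

The point of the design is that each agent stays small and single-purpose, so the manager can compose them to satisfy complex or abstract commands.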
**Experiments:**
- **Waymo Open Dataset:** ChatSim demonstrates the ability to handle complex language commands and generate photo-realistic scene videos.
- **Performance Evaluation:** ChatSim achieves state-of-the-art photo-realism in both standard and wide-angle rendering, outperforming existing methods.
**Conclusion:**
ChatSim is the first system to enable editable photo-realistic 3D driving scene simulations via natural language commands and external digital assets. It addresses the limitations of existing approaches by leveraging LLM-agent collaboration, McNeRF, and McLight, achieving high-quality and flexible scene simulations.