26 Jun 2024 | Yuxi Wei*, Zi Wang*, Yifan Lu*, Chenxin Xu*, Changxing Liu, Hao Zhao, Siheng Chen, Yanfeng Wang
ChatSim is a system for editable, photo-realistic 3D driving scene simulation driven by natural language commands and external digital assets. It uses a collaborative framework of large language model (LLM) agents to make editing flexible and efficient. For photo-realistic rendering, ChatSim introduces McNeRF, a multi-camera neural radiance field that handles camera pose misalignment and brightness inconsistency across cameras, and McLight, a multi-camera lighting estimation method that enables scene-consistent rendering of inserted external assets.

Evaluated on the Waymo Open Dataset, ChatSim handles complex and multi-round language commands and generates photo-realistic scene videos, achieving state-of-the-art photorealism and outperforming existing methods in lighting estimation. Its collaborative LLM agents enable precise, efficient scene editing, with each agent responsible for a specific task such as background rendering, foreground rendering, or vehicle motion. The ability to integrate external digital assets under realistic lighting makes ChatSim a useful tool for data expansion and customization in autonomous driving simulation.
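The agent-collaboration idea can be pictured as a dispatcher that splits a language command into sub-tasks handled by specialist agents. The sketch below is a minimal, hypothetical illustration in Python: the names (`Scene`, `project_manager`, `background_agent`, and so on) are assumptions made for exposition and do not reflect ChatSim's actual code or API, and a real system would query an LLM where this stub uses simple keyword rules.

```python
# Hypothetical sketch of an LLM-agent collaboration loop for scene editing.
# Names and logic are illustrative only, not ChatSim's implementation.

from dataclasses import dataclass, field


@dataclass
class Scene:
    """Minimal stand-in for an editable driving scene."""
    background: str = "waymo_segment_000"
    vehicles: list = field(default_factory=list)
    edits: list = field(default_factory=list)


def background_agent(scene: Scene, instruction: str) -> None:
    # Would re-render the background (e.g. with a multi-camera NeRF).
    scene.edits.append(f"background: {instruction}")


def foreground_agent(scene: Scene, instruction: str) -> None:
    # Would insert an external 3D asset and relight it to match the scene.
    scene.vehicles.append(instruction)
    scene.edits.append(f"foreground: {instruction}")


def motion_agent(scene: Scene, instruction: str) -> None:
    # Would plan a trajectory for an inserted vehicle.
    scene.edits.append(f"motion: {instruction}")


AGENTS = {
    "background": background_agent,
    "foreground": foreground_agent,
    "motion": motion_agent,
}


def project_manager(command: str) -> list[tuple[str, str]]:
    """Decompose a language command into (agent, instruction) sub-tasks.

    A real system would ask an LLM to do this decomposition; the keyword
    rules here exist only so the example runs end to end.
    """
    command_lower = command.lower()
    tasks = []
    if "add" in command_lower or "insert" in command_lower:
        tasks.append(("foreground", command))
    if "drive" in command_lower or "move" in command_lower:
        tasks.append(("motion", command))
    if "remove" in command_lower or "night" in command_lower:
        tasks.append(("background", command))
    return tasks


if __name__ == "__main__":
    scene = Scene()
    for agent_name, instruction in project_manager(
        "Add a red sports car and make it drive ahead slowly"
    ):
        AGENTS[agent_name](scene, instruction)
    print(scene.edits)
```

Decoupling the dispatcher from the specialist agents is what lets each sub-task (background rendering, asset insertion, motion planning) be handled, and supports the multi-round editing described above, since later commands simply produce further sub-tasks against the same scene state.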