The paper introduces Versatile Behavior Diffusion (VBD), a framework for generating realistic and controllable traffic scenarios with multiple interacting agents. VBD applies diffusion models, a class of generative models, to multi-agent interactions in traffic simulation. The key contributions of the paper include:
1. **Conceptual Connection**: The paper draws a conceptual connection between imitation learning (IL) and diffusion-based generative modeling, which motivates the use of diffusion models for scenario generation.
2. **Model Architecture**: VBD consists of three main components: a scene context encoder, a denoiser, and a multi-modal trajectory predictor. These components work together to generate scene-consistent multi-agent interactions and enable scenario editing through multi-step guidance and refinement.
3. **State-of-the-Art Performance**: VBD achieves state-of-the-art performance on the Waymo Sim Agents benchmark, demonstrating its effectiveness in generating realistic and controllable traffic scenarios.
4. **Versatility**: VBD can be adapted to various applications, including conditioning on priors, integrating with model-based optimization, sampling multi-modal scene-consistent scenarios, and generating safety-critical scenarios using a game-theoretic solver.
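The guidance idea mentioned in the contributions above can be illustrated with a toy sampling loop. This is a minimal sketch under stated assumptions, not VBD's actual sampler: `toy_denoiser`, `goal_cost_grad`, `guided_sample`, the noise schedule, and the goal-reaching cost are all hypothetical stand-ins chosen for illustration. It only shows the general pattern of nudging each denoising step with the gradient of a user-defined cost.

```python
import numpy as np


def toy_denoiser(x_noisy, context):
    """Hypothetical stand-in for a learned denoiser (a real one is a neural network).

    Predicts a clean sample by pulling the noisy sample toward the context.
    """
    return 0.5 * x_noisy + 0.5 * context


def goal_cost_grad(x, goal):
    """Gradient of 0.5 * ||x[:, -1] - goal||^2 with respect to x.

    Only the final waypoint of each agent contributes to this assumed cost.
    """
    grad = np.zeros_like(x)
    grad[:, -1] = x[:, -1] - goal
    return grad


def guided_sample(context, goal, num_steps=50, guidance_scale=0.5, seed=0):
    """Toy guided denoising loop over trajectories of shape (agents, horizon, 2)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(context.shape)  # start from pure noise
    for step in range(num_steps, 0, -1):
        # 1. Predict the clean trajectories from the current noisy sample.
        x0_hat = toy_denoiser(x, context)
        # 2. Guidance: take a gradient step that lowers the user-defined cost.
        x0_hat = x0_hat - guidance_scale * goal_cost_grad(x0_hat, goal)
        # 3. Re-inject a shrinking amount of noise (assumed linear schedule);
        #    the last step adds none.
        noise_level = (step - 1) / num_steps
        x = x0_hat + 0.1 * noise_level * rng.standard_normal(x.shape)
    return x


# Two agents driving along the x-axis, each steered toward a different goal.
context = np.zeros((2, 5, 2))
context[:, :, 0] = np.arange(5)
goal = np.array([[4.0, 3.0], [4.0, -3.0]])
guided = guided_sample(context, goal)
```

With `guidance_scale=0.0` the loop simply recovers the context trajectories; a positive scale trades fidelity to the denoiser's prediction against the cost, which is the balance a guided sampler has to strike.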
The paper also discusses the training and evaluation of VBD, including the use of multi-task learning to train the encoder, denoiser, and predictor components. Experimental results show that VBD can generate diverse, realistic, and interactive traffic scenarios, and it effectively handles various user-defined objectives and constraints. The paper concludes with a discussion on the limitations and future directions, emphasizing the potential for enhancing runtime efficiency and applying VBD to AV planning tests.
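To make the multi-task training idea concrete, here is a hedged sketch of how a denoising objective and a multi-modal prediction objective might be combined. The summary does not state the exact losses or how they are weighted, so everything here is an assumption: the function names, the mean-squared-error denoising term, the winner-takes-all prediction term (a common choice in trajectory prediction), and the weighted sum are illustrative only.

```python
import numpy as np


def denoising_loss(pred_clean, target):
    """Assumed denoiser objective: mean squared error to the ground truth."""
    return float(np.mean((pred_clean - target) ** 2))


def multimodal_prediction_loss(mode_trajs, mode_logits, target):
    """Assumed winner-takes-all loss for a multi-modal predictor.

    mode_trajs: (K, T, 2) candidate trajectories, mode_logits: (K,) scores,
    target: (T, 2) ground-truth trajectory.
    """
    # Mean squared displacement of each mode from the ground truth, shape (K,).
    errors = np.mean(np.sum((mode_trajs - target) ** 2, axis=-1), axis=-1)
    best = int(np.argmin(errors))
    # Numerically stable log-softmax over the mode scores.
    shifted = mode_logits - np.max(mode_logits)
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    # Regress only the closest mode and push probability mass toward it.
    return float(errors[best] - log_probs[best])


def multitask_loss(pred_clean, mode_trajs, mode_logits, target,
                   w_denoise=1.0, w_predict=1.0):
    """Weighted sum of both objectives; the weights are hypothetical."""
    return (w_denoise * denoising_loss(pred_clean, target)
            + w_predict * multimodal_prediction_loss(mode_trajs, mode_logits, target))


# Tiny example: one agent, three timesteps, two predicted modes.
target = np.zeros((3, 2))
pred_clean = target + 1.0
mode_trajs = np.stack([np.zeros((3, 2)), np.ones((3, 2))])
mode_logits = np.zeros(2)
total = multitask_loss(pred_clean, mode_trajs, mode_logits, target, w_predict=0.5)
```

In a setup like this, both terms would be backpropagated into a shared scene encoder, which is one way the three components could be trained jointly.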