SMART: Scalable Multi-agent Real-time Simulation via Next-token Prediction


24 May 2024 | Wei Wu*, Xiaoxin Feng*, Ziyan Gao*, Yuheng Kan
**Title:** SMART: Scalable Multi-agent Real-time Simulation via Next-token Prediction

**Authors:** Wei Wu*, Xiaoxin Feng*, Ziyan Gao*, Yuheng Kan

**Abstract:** This paper introduces SMART, a novel paradigm for autonomous driving motion generation that addresses the limitations of dataset size and domain gaps. SMART encodes vectorized map and agent trajectory data as discrete sequence tokens, which are processed by a decoder-only transformer trained for next-token prediction. This GPT-style approach lets the model learn the distribution of motion in real driving scenarios. SMART achieves state-of-the-art performance, ranking 1st on the Waymo Open Motion Dataset (WOMD) Sim Agents challenge leaderboard. It also demonstrates zero-shot generalization, reaching a competitive score of 0.71 on the Sim Agents challenge when trained only on the NuPlan dataset and validated on WOMD. Training on over 1 billion motion tokens drawn from multiple datasets confirms the model's scalability. SMART's contributions include a novel framework for motion generation, zero-shot generalizability across datasets, and state-of-the-art performance on the Sim Agents challenge. Single-frame inference takes under 15 ms, meeting the real-time requirements of interactive simulation for autonomous driving.

**Introduction:** The paper discusses the limitations of existing motion generation methods, such as the inability to handle future interactions between agent motions and poor generalizability across datasets. SMART addresses these issues by tokenizing agent trajectories and map data and using a decoder-only transformer for next-token prediction. This approach improves spatial and temporal understanding and, in turn, generative performance in autonomous driving.

**Method:** SMART employs a tokenizer for map data and a next-token prediction task for agent trajectories. The architecture includes an encoder for the road map and a motion decoder that predicts distributions over motion tokens. The training tasks focus on understanding temporal and spatial relationships in traffic scenes; a minimal sketch of this tokenize-and-predict recipe is shown after the summary.

**Experiments:** Experiments validate SMART's generalizability and scalability. SMART achieves superior performance on the Sim Agents challenge and demonstrates zero-shot generalization to different datasets and unseen scenarios. Scalability is confirmed through power-law scaling, with test loss decreasing predictably as model size increases (see the illustrative fit below).

**Conclusion:** SMART is a scalable, zero-shot generalizable model for autonomous driving motion generation. The release of all code encourages further exploration and development in the field, contributing to more reliable autonomous driving systems. Future work will focus on improving the model's performance and verifying its applicability to planning and prediction tasks.
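The tokenize-and-predict recipe described in the Method section can be illustrated with a short, self-contained sketch. The code below is a minimal illustration under simplifying assumptions, not the authors' implementation: it assumes a discrete motion vocabulary built by uniformly quantizing per-step displacements (SMART's actual tokenizer, map tokens, and agent-interaction modeling are more elaborate), and all names such as `MotionTokenizer` and `MotionGPT` are hypothetical.

```python
# Minimal sketch of GPT-style next-token prediction over discrete motion tokens.
# Hypothetical names and a toy tokenizer; not the paper's implementation.
import torch
import torch.nn as nn

class MotionTokenizer:
    """Uniformly quantize per-step (dx, dy) displacements into a discrete vocabulary."""
    def __init__(self, num_bins=32, max_disp=5.0):
        self.num_bins, self.max_disp = num_bins, max_disp
        self.vocab_size = num_bins * num_bins

    def encode(self, deltas):  # deltas: (T, 2) displacements in meters
        bins = ((deltas.clamp(-self.max_disp, self.max_disp) + self.max_disp)
                / (2 * self.max_disp) * (self.num_bins - 1)).long()
        return bins[:, 0] * self.num_bins + bins[:, 1]  # (T,) token ids

class MotionGPT(nn.Module):
    """Decoder-only Transformer trained to predict the next motion token."""
    def __init__(self, vocab_size, d_model=128, nhead=8, num_layers=4, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):  # tokens: (B, T) token ids
        B, T = tokens.shape
        pos = torch.arange(T, device=tokens.device)
        x = self.tok_emb(tokens) + self.pos_emb(pos)
        # Causal mask so each step attends only to past tokens.
        mask = torch.triu(torch.full((T, T), float("-inf"), device=tokens.device), diagonal=1)
        x = self.blocks(x, mask=mask)
        return self.head(x)  # (B, T, vocab_size) next-token logits

# Training step: shift targets by one position, standard cross-entropy.
tokenizer = MotionTokenizer()
model = MotionGPT(tokenizer.vocab_size)
tokens = tokenizer.encode(torch.randn(65, 2)).unsqueeze(0)  # fake trajectory, batch of 1
logits = model(tokens[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, logits.size(-1)),
                                   tokens[:, 1:].reshape(-1))
```

At inference time the same model can be rolled out autoregressively (sample a token, decode it back to a displacement, append, repeat), which is what makes the GPT-style formulation suitable for closed-loop simulation.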
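The scaling claim in the Experiments section is an empirical power-law fit of test loss against model size. A generic way to check such a trend, assuming the common form loss ≈ a · N^(−b) and using made-up placeholder measurements (not SMART's numbers), is a least-squares fit in log-log space:

```python
# Minimal sketch of fitting a power-law scaling trend, loss ≈ a * N**(-b).
# The parameter counts and losses below are hypothetical placeholders.
import numpy as np

params = np.array([1e6, 4e6, 16e6, 64e6, 256e6])       # model parameter counts
test_loss = np.array([3.10, 2.71, 2.40, 2.15, 1.96])   # measured test losses

# log(loss) = log(a) - b * log(N)  ->  ordinary least squares in log-log space
slope, intercept = np.polyfit(np.log(params), np.log(test_loss), deg=1)
a, b = np.exp(intercept), -slope
print(f"fitted power law: loss ~ {a:.2f} * N^(-{b:.3f})")

# Extrapolate the trend to a larger model (meaningful only if the power law holds).
print("predicted loss at 1B params:", a * (1e9) ** (-b))
```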