August 25-29, 2024 | Yuan Yuan, Jingtao Ding, Jie Feng, Depeng Jin, Yong Li
UniST is a universal model for urban spatio-temporal prediction, designed to handle diverse scenarios with a single model. Inspired by large language models, UniST combines diverse spatio-temporal data, effective pre-training, and knowledge-guided prompts to improve generalization.

The model is trained on extensive data spanning multiple domains and cities, using spatio-temporal patching to unify heterogeneous data into a sequential format. It adopts a Transformer-based encoder-decoder architecture, pre-trained with masked token modeling, and uses prompt learning to capture complex spatio-temporal relationships. This design lets UniST adapt to new scenarios even with limited or no training data.

Extensive experiments on more than 20 spatio-temporal scenarios demonstrate state-of-the-art performance, particularly in few-shot and zero-shot prediction, with consistently superior results over existing methods across diverse datasets. These gains are attributed to prompt learning's ability to capture shared spatio-temporal patterns, which allows the model to adapt to new scenarios without extensive retraining. A scalability study further shows that larger models generally perform better, although spatio-temporal prediction exhibits diminishing returns with increasing model size. Overall, the results indicate that UniST is a promising solution for urban spatio-temporal prediction, capable of handling diverse scenarios with high accuracy and strong generalization.
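To make the pipeline concrete, the patching and masked-token steps can be sketched as follows. This is a minimal illustration, not the paper's implementation: the patch sizes (`pt`, `ph`, `pw`), the masking ratio, and the function names are assumptions chosen for clarity, and the actual UniST model applies further embedding and Transformer stages on top of these tokens.

```python
import numpy as np

def patchify(x, pt=2, ph=4, pw=4):
    """Split a (T, H, W) spatio-temporal tensor into a sequence of
    flattened patch tokens of shape (num_patches, pt*ph*pw).
    Patch sizes here are illustrative, not the paper's settings."""
    T, H, W = x.shape
    x = x.reshape(T // pt, pt, H // ph, ph, W // pw, pw)
    x = x.transpose(0, 2, 4, 1, 3, 5)   # group the three patch axes together
    return x.reshape(-1, pt * ph * pw)  # (num_patches, patch_dim)

def random_mask(tokens, ratio=0.5, seed=0):
    """Hide a fraction of tokens, as in masked-token pre-training:
    the model would be trained to reconstruct the masked ones."""
    rng = np.random.default_rng(seed)
    n = tokens.shape[0]
    idx = rng.permutation(n)
    n_keep = int(n * (1 - ratio))
    keep, masked = np.sort(idx[:n_keep]), np.sort(idx[n_keep:])
    return tokens[keep], keep, masked

# Toy example: 8 time steps over a 16x16 spatial grid.
x = np.arange(8 * 16 * 16, dtype=float).reshape(8, 16, 16)
tokens = patchify(x)                      # (64, 32) token sequence
visible, keep, masked = random_mask(tokens)
```

The key point is the unification: any gridded urban signal, whatever its domain or city, becomes a flat token sequence that a single Transformer encoder-decoder can consume, with reconstruction of the masked tokens serving as the pre-training objective.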