2024 | Zhonghang Li, Lianghao Xia, Jiabin Tang, Yong Xu, Lei Shi, Long Xia, Dawei Yin and Chao Huang
**UrbanGPT: Spatio-Temporal Large Language Models**
**Authors:** Zhonghang Li, Lianghao Xia, Jiabin Tang, Yong Xu, Lei Shi, Long Xia, Dawei Yin, Chao Huang
**Institutions:** The University of Hong Kong, South China University of Technology, Baidu Inc.
**Project Page:** https://urban-gpt.github.io/
**GitHub:** https://github.com/HKUDS/UrbanGPT
**Abstract:**
Spatio-temporal prediction aims to forecast and gain insights into the dynamic nature of urban environments across time and space. Despite the development of neural network techniques for accurate predictions, data scarcity remains a significant challenge in practical urban sensing scenarios. To address this, the authors propose UrbanGPT, a spatio-temporal large language model (LLM) that integrates a spatio-temporal dependency encoder with the instruction-tuning paradigm. This integration enables LLMs to understand complex inter-dependencies across time and space, enhancing their predictive capabilities under data scarcity. Extensive experiments on various datasets demonstrate UrbanGPT's superior performance compared to state-of-the-art baselines, highlighting its potential in zero-shot spatio-temporal learning scenarios.
**Contributions:**
- UrbanGPT is the first attempt to develop a spatio-temporal LLM capable of predicting diverse urban phenomena across different datasets, especially under limited data conditions.
- The model integrates a spatio-temporal dependency encoder and instruction-tuning, aligning spatio-temporal context with LLMs.
- Experiments on benchmark datasets show UrbanGPT's robust generalization capacity in zero-shot spatio-temporal learning.
**Key Components:**
1. **Spatio-Temporal Dependency Encoder:** Captures intricate temporal dynamics using a multi-level temporal convolutional network.
2. **Spatio-Temporal Instruction-Tuning:** Aligns textual and spatio-temporal information through a lightweight alignment module, enhancing the model's ability to process complex spatio-temporal patterns.
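The two components above can be illustrated with a minimal sketch: a stack of dilated causal 1-D convolutions (the "multi-level" temporal encoder) followed by a linear projection standing in for the lightweight alignment module that maps encoder features into the LLM's embedding space. This is not the authors' code; layer counts, kernel weights, and dimensions are illustrative placeholders.

```python
# Illustrative sketch of UrbanGPT's two key components (not the official code):
# (1) multi-level temporal convolutional encoder with dilated causal convs,
# (2) lightweight linear "alignment" projection into an LLM-embedding space.
import random

def causal_dilated_conv(seq, kernel, dilation):
    """1-D causal convolution: output[t] depends only on inputs at or before t."""
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * seq[idx]
        out.append(acc)
    return out

def tcn_encode(seq, kernels):
    """Stack conv levels with dilation 1, 2, 4, ... so deeper levels see longer history."""
    h = seq
    for level, kernel in enumerate(kernels):
        h = [max(0.0, v) for v in causal_dilated_conv(h, kernel, 2 ** level)]  # ReLU
    return h

def align_to_llm(features, proj):
    """Lightweight alignment: project encoder features to LLM-embedding dimensions."""
    return [sum(w * f for w, f in zip(row, features)) for row in proj]

random.seed(0)
traffic = [random.random() for _ in range(12)]   # one region's 12-step history
kernels = [[0.5, 0.5], [0.6, 0.4], [0.7, 0.3]]   # 3 encoder levels (placeholder weights)
encoded = tcn_encode(traffic, kernels)
proj = [[random.uniform(-1, 1) for _ in encoded] for _ in range(4)]
st_token = align_to_llm(encoded, proj)           # a 4-dim spatio-temporal "token"
print(len(encoded), len(st_token))
```

In the paper's pipeline such projected spatio-temporal tokens are interleaved with text-instruction tokens before being fed to the LLM; the toy dimensions here (12-step history, 4-dim projection) are arbitrary.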
**Evaluation:**
- **Zero-Shot Learning Scenarios:** UrbanGPT outperforms baselines in predicting spatio-temporal patterns in unseen regions and cities.
- **Supervised Learning Scenarios:** The model demonstrates enhanced long-term forecasting abilities and spatial semantic understanding.
- **Ablation Study:** Key components, such as the spatio-temporal encoder and regression layer, significantly impact the model's performance.
**Conclusion:**
UrbanGPT is a promising spatio-temporal LLM that generalizes well across diverse urban scenarios. Future work will focus on enhancing data diversity and improving interpretability.