The paper introduces a novel generative pre-training framework, GPD, for spatio-temporal few-shot learning in urban settings. GPD addresses the challenge of data scarcity and heterogeneity by pre-training a generative diffusion model on a collection of neural network parameters optimized from source cities. This approach allows the model to adaptively generate tailored neural networks for target cities, improving performance in tasks such as traffic speed and crowd flow prediction. The framework is model-agnostic, ensuring compatibility with various spatio-temporal prediction models. Extensive experiments on multiple real-world datasets demonstrate that GPD outperforms state-of-the-art baselines, achieving an average improvement of 7.87% in prediction accuracy. The key contributions of GPD include its ability to effectively transfer knowledge across cities and its flexibility in handling diverse data distributions.
The implementation of GPD is available at <https://github.com/tsinghua-fib-lab/GPD>.
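
The core idea above, treating optimized network parameters as the data a diffusion model learns from, can be illustrated with a minimal sketch. It is not the authors' implementation (the linked repository is the authoritative source). It assumes a standard DDPM-style forward process with a linear noise schedule, and every name in it (`flatten_params`, `forward_noise`, `recover_x0`, the toy `source_city_params`) is hypothetical. A real system would train a denoising network, conditioned on information about the target city, to predict the noise. Here the true noise stands in for that prediction so the round trip can be checked.

```python
import math
import random


def flatten_params(params):
    """Flatten a dict of name -> 2-D weight matrix (list of lists) into one vector."""
    vector, shapes = [], []
    for name in sorted(params):
        matrix = params[name]
        shapes.append((name, len(matrix), len(matrix[0])))
        for row in matrix:
            vector.extend(row)
    return vector, shapes


def unflatten_params(vector, shapes):
    """Inverse of flatten_params: rebuild the dict of weight matrices."""
    params, offset = {}, 0
    for name, rows, cols in shapes:
        params[name] = [
            vector[offset + r * cols: offset + (r + 1) * cols] for r in range(rows)
        ]
        offset += rows * cols
    return params


def alpha_bar(t, betas):
    """Cumulative product of (1 - beta_s) for s = 0..t."""
    prod = 1.0
    for beta in betas[: t + 1]:
        prod *= 1.0 - beta
    return prod


def forward_noise(x0, t, betas, rng):
    """Sample x_t from q(x_t | x_0) in closed form (assumed DDPM forward process)."""
    a_bar = alpha_bar(t, betas)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e for x, e in zip(x0, eps)]
    return xt, eps


def recover_x0(xt, eps_hat, t, betas):
    """Invert the forward step given a noise estimate eps_hat.

    In a trained model eps_hat would come from a conditional denoiser.
    """
    a_bar = alpha_bar(t, betas)
    return [
        (x - math.sqrt(1.0 - a_bar) * e) / math.sqrt(a_bar)
        for x, e in zip(xt, eps_hat)
    ]


# Toy stand-in for the parameters of a model optimized on one source city.
source_city_params = {
    "w_in": [[0.1, -0.2], [0.3, 0.4]],
    "w_out": [[0.5], [-0.6]],
}
# Linear schedule from 1e-4 to 0.02 over 100 steps (an assumed choice).
betas = [1e-4 + i * (0.02 - 1e-4) / 99 for i in range(100)]
rng = random.Random(0)

x0, shapes = flatten_params(source_city_params)
xt, eps = forward_noise(x0, t=50, betas=betas, rng=rng)
x0_hat = recover_x0(xt, eps, t=50, betas=betas)  # true noise used as an oracle
restored = unflatten_params(x0_hat, shapes)
```

Flattening the parameters into a single vector is what makes the approach independent of the prediction model's architecture in this sketch: the diffusion process only ever sees a vector, and `unflatten_params` maps a generated vector back into whatever layer shapes the downstream model expects.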