DPOT: Auto-Regressive Denoising Operator Transformer for Large-Scale PDE Pre-Training

7 May 2024 | Zhongkai Hao, Chang Su, Songming Liu, Julius Berner, Chengyang Ying, Hang Su, Anima Anandkumar, Jian Song, Jun Zhu
This paper introduces DPOT (Auto-Regressive Denoising Operator Transformer), a novel pre-training strategy for neural operators in the context of Partial Differential Equations (PDEs). DPOT addresses the challenges of training on diverse and complex PDE datasets, which vary in dimensionality, number of temporal steps, resolution, and geometric configuration. The proposed method injects Gaussian noise into the training data and predicts the next timestep from the noisy inputs, improving robustness and generalization. The model architecture, based on Fourier attention, efficiently learns kernel integral transforms in the frequency domain, making it scalable and flexible enough to handle diverse PDE data. Extensive experiments on multiple PDE benchmarks and downstream tasks show that DPOT achieves state-of-the-art performance, with significant error reductions and strong generalization. The largest model, with 1 billion parameters, outperforms both smaller variants and prior methods, underscoring the effectiveness of the proposed pre-training strategy and model architecture. The code for DPOT is available at <https://github.com/thu-ml/DPOT>.
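The denoising pre-training objective described above fits in a few lines. The sketch below is a minimal illustration, assuming a PyTorch model that maps a field snapshot at time t to the snapshot at t+1; the function name, tensor shapes, noise scale, and MSE loss are assumptions for illustration, not the authors' exact implementation (see the official repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def denoising_ar_step(model: nn.Module,
                      u_t: torch.Tensor,      # field at time t, e.g. (B, C, H, W)
                      u_next: torch.Tensor,   # clean target field at time t+1
                      noise_std: float = 1e-2  # noise scale is an assumption
                      ) -> torch.Tensor:
    """One denoising auto-regressive training step (illustrative sketch):
    corrupt the current frame with Gaussian noise, predict the next
    timestep from the noisy input, and penalize error against the
    clean target."""
    noisy = u_t + noise_std * torch.randn_like(u_t)
    pred = model(noisy)
    return F.mse_loss(pred, u_next)
```

At inference time the model is rolled out auto-regressively on its own predictions; the intuition behind training on noisy inputs is that accumulated rollout error resembles the injected noise, which is why the strategy improves robustness over long horizons.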
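For the Fourier-attention backbone, the core idea of frequency-domain mixing can be sketched as: transform a 2D field with an FFT, apply learned weights to a truncated set of low-frequency modes, and transform back. The toy layer below is a stand-in under stated assumptions (the class name, mode truncation, and weight shapes are illustrative); DPOT's actual Fourier attention layer is defined in the paper and repository.

```python
import torch
import torch.nn as nn

class FourierMixer(nn.Module):
    """Toy frequency-domain mixing layer (illustrative, not DPOT's
    actual Fourier attention): FFT -> learned complex weights on the
    retained low-frequency modes -> inverse FFT."""

    def __init__(self, channels: int, modes: int):
        super().__init__()
        self.modes = modes
        scale = 1.0 / channels
        # Learned complex weights for the retained modes (assumption).
        self.weight = nn.Parameter(
            scale * torch.randn(channels, modes, modes, dtype=torch.cfloat))

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        # u: (batch, channels, H, W) spatial field
        u_hat = torch.fft.rfft2(u)            # to frequency domain
        out_hat = torch.zeros_like(u_hat)
        m = self.modes
        # Mix only the lowest m x m modes; higher frequencies are dropped.
        out_hat[..., :m, :m] = u_hat[..., :m, :m] * self.weight
        return torch.fft.irfft2(out_hat, s=u.shape[-2:])  # back to space
```

Because the learnable parameters live on a fixed number of Fourier modes rather than on a pixel grid, the same layer can ingest inputs of different resolutions, which is what makes this style of architecture flexible across heterogeneous PDE datasets.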