Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Date: 2021-03-28
Author: Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, Wancai Zhang
Pages: 15
Summary: The paper "Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting" addresses the challenge of long sequence time-series forecasting (LSTF) by proposing an efficient transformer-based model named Informer. The main obstacles for vanilla transformers in LSTF are quadratic time complexity, high memory usage, and the slow step-by-step inference imposed by the encoder-decoder architecture. To overcome these challenges, Informer introduces three key innovations:

1. **ProbSparse Self-Attention Mechanism**: Reduces time complexity and memory usage to $\mathcal{O}(L \log L)$ by letting only the dominant queries attend over the keys, achieving efficient dependency alignment on long sequences (see the sketch after this list).
2. **Self-Attention Distilling**: Highlights the dominating attention scores by halving the input length of each cascading layer, reducing the total space complexity to $\mathcal{O}((2 - \epsilon) L \log L)$ and enabling longer input sequences.
3. **Generative-Style Decoder**: Predicts the entire long output sequence in a single forward pass, avoiding the step-by-step inference of vanilla transformers and significantly improving inference speed.

Extensive experiments on four large-scale datasets show that Informer outperforms existing methods, providing a new solution to the LSTF problem. The paper also includes a detailed analysis of the model's performance, parameter sensitivity, and ablation studies confirming the effectiveness of each component.
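
To make the ProbSparse idea concrete, below is a minimal single-head sketch under my own assumptions, not the authors' released implementation: the function name `probsparse_attention`, the sampling `factor`, and the PyTorch framing are illustrative. It follows the mechanism summarized above: score each query by a max-minus-mean measure over a sampled set of keys, let only the top-$u \approx c \ln L$ queries attend over all keys, and give the remaining "lazy" queries the mean of $V$ as output, so the dominant cost scales as roughly $\mathcal{O}(L \log L)$ instead of $\mathcal{O}(L^2)$.

```python
import math
import torch

def probsparse_attention(Q, K, V, factor=5):
    """Illustrative single-head ProbSparse self-attention sketch.

    Q, K, V: tensors of shape (L, d). Only the top-u "active" queries
    (ranked by a max-minus-mean sparsity score) attend over all keys;
    the remaining queries fall back to the mean of V.
    """
    L, d = Q.shape
    u = min(L, max(1, int(factor * math.log(L + 1))))         # number of active queries
    n_sample = min(L, max(1, int(factor * math.log(L + 1))))  # sampled keys used for scoring

    # Score each query on a random subset of keys: M(q) = max - mean of scaled dot products.
    idx = torch.randint(0, L, (n_sample,))
    sample_scores = Q @ K[idx].T / math.sqrt(d)                # (L, n_sample)
    sparsity = sample_scores.max(dim=-1).values - sample_scores.mean(dim=-1)

    top_queries = sparsity.topk(u).indices                     # indices of "active" queries

    # Lazy queries: output the mean of V (the self-attention fallback).
    out = V.mean(dim=0, keepdim=True).expand(L, d).clone()

    # Active queries: full softmax attention over all keys.
    attn = torch.softmax(Q[top_queries] @ K.T / math.sqrt(d), dim=-1)  # (u, L)
    out[top_queries] = attn @ V
    return out

# Example: a 512-step sequence with a 64-dimensional head.
L, d = 512, 64
Q, K, V = (torch.randn(L, d) for _ in range(3))
print(probsparse_attention(Q, K, V).shape)  # torch.Size([512, 64])
```

The key design point is that the expensive full attention is computed for only $u$ queries, while the key-sampling step keeps the query scoring itself cheap; the self-attention distilling and generative decoder described above are separate components and are not shown here.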