SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention

2024 | Romain Ilbert, Ambroise Odonnat, Vasiliy Feofanov, Aladin Virmaux, Giuseppe Paolo, Themis Palpanas, Ievgen Redko
SAMformer is a lightweight transformer designed to improve multivariate time series forecasting by addressing the limitations of standard transformer architectures. The paper shows that, despite their expressive power, transformers generalize poorly in this setting because training suffers from an unstable attention mechanism and a sharp loss landscape. SAMformer trains with sharpness-aware minimization (SAM), which steers optimization toward flatter local minima and improves generalization. It also replaces temporal attention with channel-wise attention, which proves more effective in this context.

SAMformer outperforms existing state-of-the-art methods on a range of real-world multivariate forecasting datasets, including electricity, traffic, weather, and exchange-rate data, while using significantly fewer parameters than the largest foundation model, MOIRAI. The results show that it generalizes consistently and is robust to different weight initializations and prediction horizons.

The paper also analyzes why previous transformer approaches underperform, pointing to entropy collapse in the attention matrices and the sharpness of the loss landscape. By combining SAM with channel-wise attention, SAMformer avoids both failure modes, yielding a more stable and effective model. It also compares favorably with recent transformer-based methods such as iTransformer and PatchTST.
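Conceptually, one SAM update first takes a gradient ascent step to the worst-case weights within a small neighborhood, then descends using the gradient computed at that perturbed point. Here is a minimal pure-Python sketch on a toy quadratic loss; the loss function and the `rho` and `lr` values are illustrative assumptions, not the paper's training setup:

```python
import math

def loss(w):
    # Toy quadratic loss L(w) = ||w||^2, minimized at the origin.
    return sum(x * x for x in w)

def grad(w):
    # Gradient of the toy loss: dL/dw = 2w.
    return [2.0 * x for x in w]

def sam_step(w, rho=0.05, lr=0.1):
    g = grad(w)
    norm = math.sqrt(sum(x * x for x in g)) + 1e-12
    # Ascent step: perturb weights toward the locally worst direction,
    # staying within a ball of radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descent step: apply the gradient evaluated at the perturbed point,
    # which penalizes sharp minima where nearby losses are high.
    g_adv = grad(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

w = [1.0, -2.0]
for _ in range(50):
    w = sam_step(w)   # loss shrinks toward the (flat) minimum at the origin
```

The two gradient evaluations per step are the essential cost of SAM; the neighborhood radius `rho` controls how strongly flatness is rewarded.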
The model's efficiency and effectiveness make it a promising solution for multivariate time series forecasting.
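The channel-wise attention idea can be illustrated with shapes: for an input window of L time steps and D channels, attention is computed across the D channels, producing a D×D attention matrix rather than the usual L×L temporal one. A minimal sketch, assuming identity query/key projections for simplicity instead of the paper's learned ones:

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def channel_attention(x):
    # x: L rows of D values (time steps x channels).
    channels = list(map(list, zip(*x)))            # shape (D, L)
    scale = math.sqrt(len(channels[0]))            # sqrt of key dimension L
    # Pairwise channel similarities -> a D x D attention matrix.
    scores = [[sum(a * b for a, b in zip(ci, cj)) / scale
               for cj in channels] for ci in channels]
    attn = [softmax(row) for row in scores]
    # Each output channel is an attention-weighted mix of input channels.
    out = [[sum(w * cj[t] for w, cj in zip(row, channels))
            for t in range(len(channels[0]))] for row in attn]
    return list(map(list, zip(*out)))              # back to (L, D)

L, D = 8, 3
x = [[math.sin(t + c) for c in range(D)] for t in range(L)]
y = channel_attention(x)   # same (L, D) shape as the input
```

Because D is typically far smaller than L, the D×D attention matrix is cheaper and, per the paper, less prone to the instabilities of temporal attention.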