xLSTM: Extended Long Short-Term Memory

7 May 2024 | Maximilian Beck, Korbinian Pöppel, Markus Spanring, Andreas Auer, Oleksandra Prudnikova, Günter Klambauer, Johannes Brandstetter, Sepp Hochreiter
xLSTM is an enhanced version of the Long Short-Term Memory (LSTM) architecture, designed to address the limitations of traditional LSTMs while maintaining their effectiveness in language modeling. Its key innovations are exponential gating with normalization and stabilization techniques, and a modified LSTM memory structure that yields two new variants: sLSTM and mLSTM. sLSTM features a scalar memory, a scalar update, and new memory mixing, while mLSTM introduces a matrix memory with a covariance update rule, enabling full parallelization.

The xLSTM architecture integrates these modified LSTM variants into residual block backbones, producing xLSTM blocks that are stacked to form xLSTM architectures. These architectures are designed to perform favorably against state-of-the-art Transformers and State Space Models in both performance and scalability. The exponential gating and modified memory structures allow xLSTM to revise storage decisions, provide larger storage capacity, and overcome the lack of parallelizability inherent in traditional LSTMs.
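To make the two update rules concrete, the sketch below implements one recurrence step of each variant in NumPy. This is a minimal illustration based on the description above, not the authors' implementation: function names and argument shapes are assumptions, the forget gates are taken to be sigmoid (the paper also permits exponential forget gates), and the recurrent memory-mixing terms of sLSTM are assumed to be folded into the gate pre-activations.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def slstm_step(c, n, m, z_pre, i_pre, f_pre, o_pre):
    """One sLSTM step for a single scalar cell (hypothetical sketch).

    c, n, m : scalar memory, normalizer, and stabilizer states
    *_pre   : cell-input / gate pre-activations; in the full model these
              also contain recurrent "memory mixing" contributions.
    """
    z = np.tanh(z_pre)                       # cell input
    log_f = np.log(sigmoid(f_pre))           # log of sigmoid forget gate
    m_new = max(log_f + m, i_pre)            # stabilizer: running max in log space
    i_gate = np.exp(i_pre - m_new)           # stabilized exponential input gate
    f_gate = np.exp(log_f + m - m_new)       # stabilized forget gate

    c_new = f_gate * c + i_gate * z          # scalar memory update
    n_new = f_gate * n + i_gate              # normalizer update
    h = sigmoid(o_pre) * (c_new / n_new)     # normalized, output-gated hidden state
    return c_new, n_new, m_new, h

def mlstm_step(C, n, m, q, k, v, i_pre, f_pre, o_pre):
    """One mLSTM step with matrix memory and covariance update (hypothetical sketch).

    C : (d, d) matrix memory, n : (d,) normalizer, m : scalar stabilizer
    q, k, v : (d,) query / key / value projections of the current input
    """
    d = k.shape[0]
    k = k / np.sqrt(d)                       # scale keys

    log_f = np.log(sigmoid(f_pre))
    m_new = max(log_f + m, i_pre)
    i_gate = np.exp(i_pre - m_new)
    f_gate = np.exp(log_f + m - m_new)

    C_new = f_gate * C + i_gate * np.outer(v, k)    # covariance (outer-product) update
    n_new = f_gate * n + i_gate * k                 # normalizer update

    h_tilde = C_new @ q / max(abs(n_new @ q), 1.0)  # normalized memory read-out
    h = sigmoid(o_pre) * h_tilde                    # output gate
    return C_new, n_new, m_new, h
```

Because the mLSTM recurrence has no memory mixing across cells, each timestep's contribution can be computed independently and combined afterwards, which is what the summary above refers to as full parallelization; the sLSTM, by contrast, must be evaluated step by step.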
xLSTM has been evaluated on synthetic tasks, associative recall, long-range sequence processing, and language modeling. The results show that it performs favorably against existing methods, particularly on tasks requiring state tracking and large memory capacity. The architecture also exhibits strong scaling behavior, maintaining low perplexity for longer contexts and performing well on downstream tasks.

Despite these advantages, xLSTM has some limitations: the memory mixing in sLSTM rules out a parallel formulation of its recurrence, the current CUDA kernels for mLSTM are not yet fully optimized, and the matrix memory of mLSTM adds computational cost. These limitations are considered minor, and xLSTM shows significant potential in further deep learning applications, including Reinforcement Learning, Time Series Prediction, and the modeling of physical systems.