DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models


5 Mar 2024 | Wei He*, Kai Han*, Yehui Tang, Chengcheng Wang, Yujie Yang, Tianyu Guo, Yunhe Wang
This paper introduces DenseSSM, a novel approach to enhance the flow of hidden information between layers in State Space Models (SSMs). SSMs, which offer lower computational complexity than the widely used Transformer architecture, have not fully matched Transformer performance. DenseSSM addresses this issue by selectively integrating shallow-layer hidden states into deeper layers, retaining fine-grained information crucial for the final output. The method maintains the training parallelizability and inference efficiency of SSMs while achieving significant improvements. It is applicable to various SSM types, such as RetNet and Mamba, and demonstrates up to 5% accuracy improvement on public benchmarks at similar model sizes. The paper also includes a detailed analysis of hidden state degradation in conventional SSMs and introduces a dense connection mechanism to preserve richer information for deeper layers. The effectiveness of DenseSSM is validated through comprehensive experiments on different architectures, showing superior performance in both autoregressive and parallelizable convolution modes.
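The summary describes the core mechanism only at a high level: hidden states from preceding shallow layers are projected into the current layer's representation space and selectively fused with its own hidden state. The PyTorch sketch below is a minimal illustration of such a dense hidden connection, assuming a per-layer linear projection, a sigmoid gate, and additive fusion over the last m layers; the class name DenseHiddenFusion and all parameter names are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class DenseHiddenFusion(nn.Module):
    """Fuses the current layer's SSM hidden state with hidden states from
    up to `m` preceding (shallower) layers.

    Minimal sketch of the dense hidden connection idea; module layout and
    names are assumptions, not the paper's code.
    """

    def __init__(self, d_hidden: int, m: int = 2):
        super().__init__()
        self.m = m
        # One projection and one gate per retained shallow layer (assumed design):
        # the projection maps a shallow hidden state into the current layer's
        # space, and the gate decides how much of it to inject ("selective").
        self.proj = nn.ModuleList([nn.Linear(d_hidden, d_hidden) for _ in range(m)])
        self.gate = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_hidden, d_hidden), nn.Sigmoid()) for _ in range(m)]
        )

    def forward(self, h_current: torch.Tensor, shallow_states: list) -> torch.Tensor:
        # h_current:      (batch, seq_len, d_hidden) hidden state of this layer
        # shallow_states: hidden states of preceding layers, newest last,
        #                 each with the same shape as h_current
        fused = h_current
        for proj, gate, h_prev in zip(self.proj, self.gate, shallow_states[-self.m:]):
            fused = fused + gate(h_prev) * proj(h_prev)  # selectively add shallow information
        return fused


# Illustrative usage inside a hypothetical SSM stack: each layer keeps a
# rolling list of earlier hidden states and fuses them into its own.
if __name__ == "__main__":
    batch, seq_len, d_hidden = 2, 16, 64
    fusion = DenseHiddenFusion(d_hidden, m=2)
    h_layer3 = torch.randn(batch, seq_len, d_hidden)
    earlier = [torch.randn(batch, seq_len, d_hidden) for _ in range(2)]
    print(fusion(h_layer3, earlier).shape)  # torch.Size([2, 16, 64])
```

Summing gated projections adds only a few linear layers per block, which is consistent with the paper's claim that training parallelizability and inference efficiency are preserved, though the exact projection and fusion operators used in DenseSSM may differ from this sketch.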