This paper investigates the in-context learning (ICL) capabilities of state-space models (SSMs), particularly Mamba, in comparison with Transformer models. The study evaluates how well these models perform ICL tasks without any parameter updates. Results show that SSMs, including Mamba, match Transformers on standard regression tasks and outperform them on tasks such as sparse parity learning, but struggle on tasks that require non-standard retrieval functionality. To address these limitations, the authors introduce a hybrid model, MambaFormer, which combines Mamba with attention blocks; it surpasses either architecture alone on the tasks where each struggles independently, achieving best-of-both-worlds performance across ICL tasks including retrieval and parity learning. The findings suggest that hybrid architectures offer a promising avenue for enhancing ICL in language models. The research underscores the importance of building a broader understanding of ICL beyond Transformers, given the significant recent progress in attention-free architectures, and further demonstrates that hybrid models can perform as well as or better than Transformers on formal-language ICL tasks, indicating their potential for language modeling and in-context learning.
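The summary describes MambaFormer only as a combination of Mamba and attention blocks, so the following is a minimal PyTorch sketch of one plausible way to interleave an SSM block with causal self-attention inside a single residual layer. The MambaBlock placeholder, the layer ordering, and all dimensions are illustrative assumptions rather than the paper's exact architecture; a real implementation would substitute a selective state-space scan from an SSM library for the placeholder.

```python
# Minimal sketch of a hybrid "Mamba + attention" layer (illustrative assumptions only).
import torch
import torch.nn as nn


class MambaBlock(nn.Module):
    """Placeholder for a selective SSM (Mamba) block; a gated projection stands in here."""

    def __init__(self, d_model: int):
        super().__init__()
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u, gate = self.in_proj(x).chunk(2, dim=-1)
        # A real Mamba block would run a selective state-space scan over `u` here.
        return self.out_proj(u * torch.sigmoid(gate))


class HybridBlock(nn.Module):
    """One hybrid layer: causal self-attention followed by an SSM block,
    each with pre-norm and a residual connection."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ssm_norm = nn.LayerNorm(d_model)
        self.ssm = MambaBlock(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seq_len = x.size(1)
        # Boolean mask: True blocks attention to future positions (causal).
        causal = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        h = self.attn_norm(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal, need_weights=False)
        x = x + attn_out                     # residual around attention
        x = x + self.ssm(self.ssm_norm(x))   # residual around the SSM block
        return x


if __name__ == "__main__":
    block = HybridBlock(d_model=64, n_heads=4)
    tokens = torch.randn(2, 16, 64)   # (batch, in-context sequence length, d_model)
    print(block(tokens).shape)        # torch.Size([2, 16, 64])
```

In an ICL evaluation of the kind described above, the input sequence would encode alternating input-output example pairs, and the model's prediction at the final position would be compared against the query's true label with no weight updates.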