Needle in the Haystack for Memory Based Large Language Models

12 Jul 2024 | Elliot Nelson, Georgios Kollias, Payel Das, Subhajit Chaudhury, Soham Dan
This paper introduces Larimar, a language model with an external associative memory that enables efficient long-context recall. Unlike standard transformer-based models, which struggle with long-context tasks because of quadratic attention complexity and information mixing across tokens, Larimar uses a dynamically updatable memory to store and retrieve information from arbitrarily long contexts.

The external memory is structured similarly to the Kanerva Machine, but it is updated with least-squares solutions rather than Gaussian posterior updates (a minimal sketch of this write/read mechanism is given below). This allows the model to generalize to contexts far longer than those seen during training, even when only a small fraction of the context is task-relevant. Memory operations are performed on the CPU, so the model scales to longer contexts without increasing GPU memory usage. Although Larimar is trained on relatively short contexts (384 tokens), it handles much longer contexts at inference time: the memory readout conditions the decoder, which generates the correct output from the retrieved content.

Performance is evaluated on two tasks, the passkey test and the needle-in-the-haystack test. On the passkey test, Larimar achieves high accuracy even with very long contexts, outperforming other models. On the needle-in-the-haystack test, it maintains strong recall beyond 100K tokens, whereas competing models degrade at much shorter context lengths.

The paper also discusses limitations of the approach: because the context is divided into segments before being written to memory, the model loses information about cross-segment correlations and sequence order. It remains effective, however, for tasks where the relevant information is contained within individual segments. The authors suggest that future work could compute reading and writing keys from the full context, allowing memory to be allocated dynamically to the most task-relevant data. They conclude that Larimar provides a compact, efficient solution for long-context recall without task-specific training, making it a promising alternative to larger, more complex models.
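To make the memory mechanism concrete, here is a minimal NumPy sketch of a least-squares associative write and read. The sizes, random keys, and random "encodings" are hypothetical stand-ins, so this illustrates the general idea rather than the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

K, D, N = 16, 64, 8          # memory slots, encoding dim, number of segments (hypothetical sizes)
Z = rng.normal(size=(N, D))  # encoded context segments (stand-ins for encoder outputs)
W = rng.normal(size=(N, K))  # writing keys, one per segment

# Write: the memory matrix M is the least-squares solution of W @ M ~= Z
M, *_ = np.linalg.lstsq(W, Z, rcond=None)   # shape (K, D)

# Read: querying with the key used to write segment 3 retrieves its encoding
w_query = W[3]
z_read = w_query @ M                        # approximate reconstruction of Z[3]

print("relative read error:", np.linalg.norm(z_read - Z[3]) / np.linalg.norm(Z[3]))
```

Because the system W M ≈ Z is underdetermined here (more memory slots than written segments), the minimum-norm least-squares solution reconstructs each written encoding almost exactly on readout.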
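The segment-wise, CPU-resident workflow described above might look roughly like the following PyTorch sketch. The encoder, decoder, tensor shapes, and the 100K-token context are all hypothetical placeholders; the point is only that the memory matrix stays on the CPU while the decoder consumes a small readout on the GPU.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

seg_len, d_model, n_slots = 384, 64, 512      # hypothetical sizes; 384 matches the training context length

# Placeholder modules standing in for the trained encoder and memory-conditioned decoder.
encoder = torch.nn.Linear(seg_len, d_model)             # maps one segment to an encoding (CPU)
decoder = torch.nn.Linear(d_model, seg_len).to(device)  # generates output from a memory readout (GPU)

long_context = torch.randn(100_000)                     # stand-in for a tokenized ~100K-token context

# Split the long context into fixed-length segments and encode each one on the CPU.
segments = long_context.split(seg_len)[:-1]             # drop the ragged tail for simplicity
keys = torch.randn(len(segments), n_slots)              # one writing key per segment, kept on CPU
Z = torch.stack([encoder(s) for s in segments])         # segment encodings, kept on CPU

# Least-squares write: M solves keys @ M ~= Z; it lives on the CPU, so GPU memory stays flat.
M = torch.linalg.pinv(keys) @ Z                         # shape (n_slots, d_model)

# Read: only the small readout vector is moved to the GPU to condition the decoder.
readout = (keys[3] @ M).to(device)                      # retrieves the encoding written for segment 3
output = decoder(readout)
```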