Needle in the Haystack for Memory Based Large Language Models

12 Jul 2024 | Elliot Nelson, Georgios Kollias, Payel Das, Subhajit Chaudhury, Soham Dan
This paper introduces Larimar, a language model with an external associative memory that enables efficient long-context recall. Unlike standard transformer-based models, which struggle with long-context tasks because of quadratic attention complexity and information mixing across tokens, Larimar uses a dynamically updatable memory to store and retrieve information from arbitrarily long contexts.

The external memory is structured similarly to the Kanerva Machine, but it is updated with least-squares solutions rather than Gaussian posterior updates (a minimal sketch of this write/read mechanism is given below). This allows the model to generalize to contexts far longer than those seen during training, even when only a small fraction of the context is task-relevant. Memory operations are performed on the CPU, so the model scales to longer contexts without increasing GPU memory usage. Although Larimar is trained on relatively short contexts (384 tokens), it handles much longer contexts at inference time: the memory readout conditions the decoder, which generates the correct output from the retrieved content.

Performance is evaluated on two tasks, the passkey test and the needle-in-the-haystack test. On the passkey test, Larimar achieves high accuracy even with very long contexts, outperforming other models. On the needle-in-the-haystack test, it maintains strong recall beyond 100K tokens, whereas competing models degrade at much shorter context lengths.

The paper also discusses limitations of the approach: because the context is divided into segments before being written to memory, the model loses information about cross-segment correlations and sequence order. It remains effective, however, for tasks where the relevant information is contained within individual segments. The authors suggest that future work could compute reading and writing keys from the full context, allowing memory to be allocated dynamically to the most task-relevant data. They conclude that Larimar provides a compact, efficient solution for long-context recall without task-specific training, making it a promising alternative to larger, more complex models.
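To make the memory mechanism concrete, here is a minimal NumPy sketch of a least-squares associative write and read. The sizes, random keys, and random "encodings" are hypothetical stand-ins, so this illustrates the general idea rather than the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

K, D, N = 16, 64, 8          # memory slots, encoding dim, number of segments (hypothetical sizes)
Z = rng.normal(size=(N, D))  # encoded context segments (stand-ins for encoder outputs)
W = rng.normal(size=(N, K))  # writing keys, one per segment

# Write: the memory matrix M is the least-squares solution of W @ M ~= Z
M, *_ = np.linalg.lstsq(W, Z, rcond=None)   # shape (K, D)

# Read: querying with the key used to write segment 3 retrieves its encoding
w_query = W[3]
z_read = w_query @ M                        # approximate reconstruction of Z[3]

print("relative read error:", np.linalg.norm(z_read - Z[3]) / np.linalg.norm(Z[3]))
```

Because the system W M ≈ Z is underdetermined here (more memory slots than written segments), the minimum-norm least-squares solution reconstructs each written encoding almost exactly on readout.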
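The segment-wise, CPU-resident workflow described above might look roughly like the following PyTorch sketch. The encoder, decoder, tensor shapes, and the 100K-token context are all hypothetical placeholders; the point is only that the memory matrix stays on the CPU while the decoder consumes a small readout on the GPU.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

seg_len, d_model, n_slots = 384, 64, 512      # hypothetical sizes; 384 matches the training context length

# Placeholder modules standing in for the trained encoder and memory-conditioned decoder.
encoder = torch.nn.Linear(seg_len, d_model)             # maps one segment to an encoding (CPU)
decoder = torch.nn.Linear(d_model, seg_len).to(device)  # generates output from a memory readout (GPU)

long_context = torch.randn(100_000)                     # stand-in for a tokenized ~100K-token context

# Split the long context into fixed-length segments and encode each one on the CPU.
segments = long_context.split(seg_len)[:-1]             # drop the ragged tail for simplicity
keys = torch.randn(len(segments), n_slots)              # one writing key per segment, kept on CPU
Z = torch.stack([encoder(s) for s in segments])         # segment encodings, kept on CPU

# Least-squares write: M solves keys @ M ~= Z; it lives on the CPU, so GPU memory stays flat.
M = torch.linalg.pinv(keys) @ Z                         # shape (n_slots, d_model)

# Read: only the small readout vector is moved to the GPU to condition the decoder.
readout = (keys[3] @ M).to(device)                      # retrieves the encoding written for segment 3
output = decoder(readout)
```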