**MEMORYLLM: Towards Self-Updatable Large Language Models**
**Abstract:**
Existing large language models (LLMs) often remain static after deployment, making it challenging to incorporate new knowledge. This paper introduces MEMORYLLM, a model designed to include a significant portion of self-updatable parameters, enabling effective and efficient integration of new knowledge. MEMORYLLM integrates a transformer and a fixed-size memory pool within the latent space of the transformer, allowing it to self-update with text knowledge and retain previously injected knowledge. Evaluations demonstrate MEMORYLLM's ability to effectively incorporate new knowledge, as evidenced by its performance on model editing benchmarks and long-context QA benchmarks. The model also exhibits long-term information retention, as validated through custom-designed evaluations and long-context benchmarks. Even after nearly a million memory updates, MEMORYLLM maintains operational integrity without performance degradation. The code and model are open-sourced at <https://github.com/wangyu-ustc/MemoryLLM>.
**Introduction:**
The paper addresses the challenge of updating LLMs with new knowledge after deployment. Previous solutions fall into three classes: retrieval-based methods, model editing, and long-context methods. Retrieval-based methods fetch relevant facts from an external knowledge base, model editing makes targeted modifications to knowledge stored in the model's parameters, and long-context methods place all new knowledge directly in the model's context. MEMORYLLM instead embeds a substantial, fixed-size memory pool within its latent space, designed to manage new knowledge integration and minimize forgetting. The memory pool consists of hidden vectors (memory tokens) within each transformer layer that represent compressed knowledge. The self-update mechanism propagates new knowledge to every layer of the memory pool, while previously stored knowledge is gradually phased out rather than abruptly erased. Evaluations focus on integration of new knowledge, knowledge retention, and robustness, demonstrating MEMORYLLM's effectiveness across benchmarks.
**Problem Statement:**
The primary challenge is designing an LLM capable of efficiently integrating new knowledge while minimizing the degradation of previously learned knowledge. Key properties include efficiency, efficacy, knowledge retention, integrity, and non-redundancy.
**MEMORYLLM Structure:**
- **Memory Pool:** MEMORYLLM is instantiated from an off-the-shelf LLM (Llama2) and augmented with a fixed-size pool of memory tokens, i.e. hidden vectors attached to every transformer layer that store compressed knowledge.
- **Self-Update Process:** incoming text is compressed into new memory tokens that are merged into the pool at every layer while the pool size stays fixed; the update is differentiable, so gradients flow through the memory during training (a minimal sketch follows this list).
- **Analysis of Forgetting:** because each update replaces a random subset of memory tokens, previously injected knowledge decays exponentially over successive updates, giving gradual, human-like forgetting rather than abrupt erasure.
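To make the self-update concrete, the snippet below gives a minimal single-layer sketch in PyTorch. The pool size `N`, replacement count `K`, hidden size, and the function name `self_update` are illustrative assumptions rather than the released implementation, which derives the new memory tokens from the transformer's own hidden states before merging them into the pool.

```python
import torch

# Illustrative configuration, not the paper's exact numbers:
# N memory tokens per layer, of which K are replaced at each update.
N, K, HIDDEN = 256, 64, 4096

def self_update(memory: torch.Tensor, new_tokens: torch.Tensor) -> torch.Tensor:
    """Merge K new memory tokens into a fixed-size (N, HIDDEN) pool.

    memory:     (N, HIDDEN) existing memory tokens for one transformer layer.
    new_tokens: (K, HIDDEN) hidden states compressing the incoming text.
    """
    # Randomly drop K old tokens; keep the rest in their original order.
    keep = torch.randperm(N)[: N - K].sort().values
    # Append the new tokens; the pool size stays at N.
    return torch.cat([memory[keep], new_tokens], dim=0)

# Forgetting analysis: after t updates an old token survives with probability
# (1 - K / N) ** t, so previously injected knowledge decays exponentially
# instead of being erased all at once.
```

Dropping a random subset each time is what keeps the memory bounded while letting old knowledge degrade gracefully instead of disappearing in one step.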
**Training Strategy:**
- **New Knowledge Incorporation:** injects a context into the memory pool via self-update and trains the model to make predictions that rely on the injected knowledge.
- **Enhancing Continuous Context Understanding:** addresses the long-context problem by splitting a document into multiple chunks, compressing them into the memory pool one after another, and predicting the continuation from the accumulated memory.
- **Mitigating Forgetting Problems:** interleaves unrelated documents between the relevant context and the prediction target, so the model must recall knowledge injected several updates earlier (see the data-construction sketch after this list).
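The sketch below illustrates how a training example for the latter two tasks might be assembled. The function name `build_training_example`, the chunking scheme, and the number of interleaved documents are assumptions for illustration, not the paper's exact curriculum.

```python
import random

def build_training_example(doc_chunks, unrelated_docs, num_interleaved=2):
    """Assemble one training example; all names here are illustrative.

    doc_chunks:     ordered chunks of a single long document. The final chunk
                    is the prediction target; the earlier ones are injected
                    into the memory pool one by one (continuous-context task).
    unrelated_docs: pool of other documents used for the forgetting task.
    """
    *contexts, target = doc_chunks

    # Continuous-context understanding: the related chunks are injected first.
    injections = list(contexts)

    # Mitigating forgetting: interleave unrelated documents before the target,
    # so the model must answer from knowledge injected several updates earlier.
    injections += random.sample(unrelated_docs, k=num_interleaved)

    # The model self-updates its memory on each element of `injections` in
    # order, then is trained to generate `target` from the memory pool alone.
    return injections, target
```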
**Experiments:**
- **Evaluation Protocols:** assesses new knowledge integration (on model editing and long-context QA benchmarks), knowledge retention over successive updates, and robustness after nearly a million memory updates.
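A hedged sketch of how a retention probe of this kind could look: inject a fact, apply a series of unrelated memory updates, and check whether the model still answers a question about the fact. The `inject` and `generate` methods are hypothetical APIs used only for illustration.

```python
def retention_curve(model, fact, question, answer, distractor_docs, max_updates=50):
    """Hypothetical retention probe: inject one fact, then measure whether the
    model still answers correctly after t unrelated memory updates."""
    accuracies = []
    model.inject(fact)                       # assumed memory-injection API
    for t, doc in enumerate(distractor_docs[:max_updates], start=1):
        model.inject(doc)                    # unrelated update displaces old tokens
        pred = model.generate(question)      # assumed generation API
        accuracies.append((t, answer.lower() in pred.lower()))
    return accuracies
```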