July 1, 2024 | Hongkang Yang, Zehao Lin, Wenjin Wang, Hao Wu, Zhiyu Li, Bo Tang, Wenqiang Wei, Jinbo Wang, Zeyun Tang, Shichao Song, Chenyang Xi, Yu Yu, Kai Chen, Feiyu Xiong, Linpeng Tang, and Weinan E
Memory $^3$ introduces a novel approach to language modeling that equips large language models (LLMs) with explicit memory, reducing the cost of both training and inference. Inspired by the memory hierarchy of the human brain, explicit memory serves as a cheaper knowledge format than either model parameters or the text retrieval used in retrieval-augmented generation (RAG). Externalizing knowledge into this format lets the LLM keep a smaller parameter size, lowering training and inference costs while maintaining higher decoding speed than RAG models. The model, named Memory $^3$, treats explicit memory as the third form of memory in LLMs, after implicit memory (model parameters) and working memory (context key-values). A memory circuitry theory is introduced to support the externalization of knowledge, alongside novel techniques such as a memory sparsification mechanism and a two-stage pretraining scheme.

The paper lays out the theoretical foundation of Memory $^3$, its architecture and training scheme, and experimental results on general benchmarks and professional tasks, where the model outperforms larger LLMs and RAG models while decoding faster than RAG. It also compares Memory $^3$ with related work such as retrieval-augmented training and sparse computation, and discusses the memory hierarchy of LLMs, the cost of knowledge storage, and how explicit memory reduces the cost of pretraining. The authors argue that explicit memory enables LLMs to develop more human-like capabilities, including infinitely long context, memory consolidation, and factuality, and conclude that explicit memory is a promising approach for improving the efficiency and effectiveness of LLMs.
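To make the summarized mechanism concrete, below is a minimal NumPy sketch of the core idea: retrieved explicit-memory key-values are sparsified and prepended to the working-memory (context) key-values before attention. This is an illustration under assumptions, not the authors' implementation; the function names (`sparsify_memory`, `attend_with_explicit_memory`) and the simple top-k-tokens rule are hypothetical stand-ins for the paper's memory sparsification mechanism.

```python
# Minimal sketch (assumptions labeled): single-head, single-query attention that
# reads from retrieved "explicit memory" key-values, sparsified by keeping only
# the top-k scoring memory tokens, then concatenated with the context key-values.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparsify_memory(mem_k, mem_v, query, top_k_tokens=8):
    """Keep only the memory tokens whose keys score highest against the query.
    (Illustrative stand-in for the paper's sparsification mechanism.)"""
    scores = mem_k @ query                      # (num_mem_tokens,)
    keep = np.argsort(scores)[-top_k_tokens:]   # indices of the strongest tokens
    return mem_k[keep], mem_v[keep]

def attend_with_explicit_memory(query, ctx_k, ctx_v, mem_k, mem_v, top_k_tokens=8):
    """Attention over [sparsified explicit memory ; working-memory context]."""
    mem_k, mem_v = sparsify_memory(mem_k, mem_v, query, top_k_tokens)
    k = np.concatenate([mem_k, ctx_k], axis=0)  # memory tokens precede the context
    v = np.concatenate([mem_v, ctx_v], axis=0)
    weights = softmax(k @ query / np.sqrt(query.shape[0]))
    return weights @ v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 64
    query = rng.standard_normal(d)
    ctx_k, ctx_v = rng.standard_normal((16, d)), rng.standard_normal((16, d))
    # Pretend these key-values were retrieved from an external explicit-memory store.
    mem_k, mem_v = rng.standard_normal((128, d)), rng.standard_normal((128, d))
    out = attend_with_explicit_memory(query, ctx_k, ctx_v, mem_k, mem_v)
    print(out.shape)  # (64,)
```

The sketch only conveys why the format is cheap to read: the memory is stored as pre-encoded, heavily sparsified key-values, so attention touches a handful of extra tokens per retrieved memory rather than re-encoding full retrieved passages as RAG does.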