7 May 2024 | Dongseong Hwang, Weiran Wang, Zhuoyuan Huo, Khe Chai Sim, Pedro Mengibar
TransformerFAM is a novel Transformer architecture in which the network attends to its own latent representations through a feedback loop, fostering the emergence of working memory. The feedback attention introduces no additional weights, so it integrates seamlessly with pre-trained models, and it allows the model to process indefinitely long input sequences: computational complexity is O(L) in the sequence length and memory complexity is O(1), because past information is compressed into a fixed-size feedback state that is carried forward over an indefinite horizon. The design builds on block sliding window attention (TransformerBSWA): in each block, the queries attend to the feedback memory in addition to the local window, and the feedback memory is then updated from the block's activations. This working-memory mechanism is what lets the model maintain long-term dependencies, compressing and retaining important contextual information within extremely long contexts.
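To make the mechanism concrete, here is a minimal NumPy sketch of block-wise attention with a fixed-size feedback memory. It is an illustration under simplifying assumptions, not the authors' implementation: it uses a single head and omits projections, positional information, normalization, and training; the names `fam_block_attention`, `block_len`, and `memory_segments` are placeholders rather than identifiers from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention (single head, no masking, no projections).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def fam_block_attention(x, fam, block_len, memory_segments=1):
    """Process a long sequence block by block with a fixed-size feedback memory.

    x:   (seq_len, d) input activations for one layer.
    fam: (fam_len, d) feedback memory carried across blocks; its size never
         grows, which is what keeps the carried state O(1) in sequence length.
    """
    outputs = []
    past_blocks = []  # bounded local window of previous blocks (sliding window)
    for start in range(0, x.shape[0], block_len):
        block = x[start:start + block_len]
        # Keys/values for the block: feedback memory + local window + current block.
        context = np.concatenate([fam] + past_blocks + [block], axis=0)
        # 1) Block queries attend to the feedback memory and the local window.
        block_out = attention(block, context, context)
        # 2) Feedback-memory queries attend to the block output and to themselves,
        #    compressing the block into the fixed-size memory for the next step.
        fam_context = np.concatenate([block_out, fam], axis=0)
        fam = attention(fam, fam_context, fam_context)
        outputs.append(block_out)
        past_blocks = (past_blocks + [block])[-memory_segments:]
    return np.concatenate(outputs, axis=0), fam

# Example: a 1,024-token sequence with dimension 64, processed in blocks of 128.
rng = np.random.default_rng(0)
x = rng.normal(size=(1024, 64))
fam0 = rng.normal(size=(8, 64))  # the paper learns an initial state; random here
y, fam = fam_block_attention(x, fam0, block_len=128)
print(y.shape, fam.shape)  # (1024, 64) (8, 64)
```

Because each block only ever attends to a bounded context (the fixed-size memory plus a bounded local window), total computation grows linearly with sequence length while the state carried between blocks stays constant in size, which is the intuition behind the O(L) compute and O(1) memory claims.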
In experiments on 1B, 8B, and 24B models, TransformerFAM significantly improves performance on long-context tasks over TransformerBSWA baselines, including PassKey Retrieval and a suite of long-context benchmarks, and compares favorably with other long-context approaches in the literature. The results indicate that feedback-based working memory is a promising way for Large Language Models (LLMs) to handle infinitely long input sequences.
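For context on the PassKey Retrieval evaluation, below is a small, hypothetical sketch of how such a prompt is typically constructed: a single relevant sentence is buried inside a long stretch of filler text, so answering the final question requires retaining that detail across the whole context. The exact filler and wording used in the paper's evaluation are assumptions here.

```python
import random

def make_passkey_prompt(passkey: str, n_filler: int = 2000, seed: int = 0) -> str:
    """Build a PassKey-style prompt: a short fact buried in long filler text."""
    rng = random.Random(seed)
    filler = "The grass is green. The sky is blue. The sun is yellow. "
    lines = [filler] * n_filler
    # Hide the passkey sentence at a random position inside the filler.
    lines.insert(rng.randrange(n_filler), f"The pass key is {passkey}. Remember it. ")
    return "".join(lines) + "What is the pass key?"

prompt = make_passkey_prompt("71432")
print(len(prompt.split()), "words; ends with:", prompt[-60:])
```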