This paper proposes an Interactive Continual Learning (ICL) framework that integrates fast thinking (System1) and slow thinking (System2) models to achieve effective continual learning. The framework is inspired by the Complementary Learning System theory and leverages the strengths of different models to enhance memory retention and reasoning capabilities. System1 is implemented using a Vision Transformer (ViT) model, while System2 is based on a multimodal large language model. The ICL framework enables collaborative learning between these models, with System1 handling memory tasks and System2 facilitating complex reasoning.
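The System1/System2 division of labor described above can be sketched as a simple routing loop. This is a minimal illustration, not the paper's implementation; all function names (`system1_classify`, `system2_reason`, `is_hard`) are hypothetical stand-ins for the ViT classifier, the multimodal LLM, and the hard-example criterion.

```python
def icl_inference(image_feature, system1_classify, system2_reason, is_hard):
    """Hypothetical sketch of the ICL interaction loop.

    System1 (a fast ViT-style classifier) answers queries it is confident
    about; examples it finds hard are deferred to System2 (a slower
    multimodal LLM) for deliberate reasoning over System1's candidate.
    """
    label, confidence = system1_classify(image_feature)
    if is_hard(confidence):
        # Defer to slow thinking: System2 reasons over the input
        # together with System1's candidate answer.
        label = system2_reason(image_feature, candidate=label)
    return label
```

In use, `is_hard` would encode the paper's vMF-ODI criterion; here any confidence test will do for illustration.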
To improve memory retrieval and task-specific knowledge acquisition, the paper introduces the Class-Knowledge-Task Multi-Head Attention (CKT-MHA) module, which uses class features and knowledge from the ViT to aid retrieval of task-related knowledge. Additionally, the CL-vMF mechanism models memory representations with the von Mises-Fisher (vMF) distribution and updates them via an Expectation-Maximization (EM) strategy, improving the distinguishability of stored memories at retrieval time. Finally, the von Mises-Fisher Outlier Detection and Interaction (vMF-ODI) strategy identifies hard examples, enabling System1 and System2 to collaborate more effectively on complex reasoning.
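The vMF-based memory and outlier-detection ideas can be illustrated with a toy sketch: each class keeps a unit-norm mean direction, scores are (unnormalized) vMF log-likelihoods, and a low best score flags a hard example to route to System2. This is a simplified stand-in for CL-vMF and vMF-ODI, assuming a shared concentration `kappa` and a simple running-mean update in place of the paper's EM strategy; the class and method names are illustrative.

```python
import numpy as np

class VMFMemory:
    """Toy CL-vMF-style memory: one unit-norm vMF mean direction per class,
    with a shared concentration parameter kappa (simplified from the paper)."""

    def __init__(self, kappa=10.0):
        self.kappa = kappa
        self.mu = {}      # class id -> unit mean direction
        self.count = {}   # class id -> number of features absorbed

    def update(self, label, feature):
        # Running-mean update of the class direction, re-projected to the
        # unit sphere (a simplification of the paper's EM update).
        f = feature / np.linalg.norm(feature)
        if label not in self.mu:
            self.mu[label], self.count[label] = f, 1
        else:
            n = self.count[label]
            m = (self.mu[label] * n + f) / (n + 1)
            self.mu[label] = m / np.linalg.norm(m)
            self.count[label] = n + 1

    def scores(self, feature):
        # vMF log-likelihood up to an additive constant: kappa * cos(mu, x).
        f = feature / np.linalg.norm(feature)
        return {c: self.kappa * float(mu @ f) for c, mu in self.mu.items()}

    def route(self, feature, threshold):
        # vMF-ODI-style rule: if even the best class scores below the
        # threshold, treat the input as a hard example and defer to System2.
        s = self.scores(feature)
        best = max(s, key=s.get)
        return (best, "system1") if s[best] >= threshold else (best, "system2")
```

The design point this sketch captures is that directional (cosine-based) class statistics stay comparable across tasks, so old and new class memories can be scored in a single shared space.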
The ICL framework is evaluated on various benchmarks, including the challenging ImageNet-R dataset. The results demonstrate that ICL significantly mitigates catastrophic forgetting and outperforms existing methods in accuracy. The framework's ability to adapt to new data while maintaining performance on previously learned tasks is a key contribution. The proposed ICL framework provides a novel approach to continual learning by emphasizing the interaction between fast and slow thinking models, aligning with the principles of the Complementary Learning System theory.