Learning without Forgetting

14 Feb 2017 | Zhizhong Li, Derek Hoiem, Member, IEEE
The paper "Learning without Forgetting" by Zhizhong Li and Derek Hoiem addresses the challenge of adapting a Convolutional Neural Network (CNN) to new tasks while preserving performance on existing tasks, without access to training data for the original tasks. The authors propose a method called Learning without Forgetting (LwF), which uses only new task data to train the network while maintaining the original capabilities. LwF is compared to common techniques such as feature extraction, fine-tuning, and multitask learning, showing superior performance on new tasks and comparable or better performance on old tasks compared to fine-tuning with original task data. LwF combines elements of knowledge distillation and fine-tuning, optimizing both for high accuracy on the new task and preserving responses on the old tasks. The method is effective in various scenarios, including single and multiple new tasks, and with different dataset sizes. Experiments on image classification tasks using AlexNet and VGGnet show that LwF outperforms other methods on new tasks and performs similarly to joint training on old tasks. The paper also explores alternative network modifications and design choices, concluding that LwF is a robust and efficient approach for adapting CNNs to new tasks while preserving performance on existing ones.The paper "Learning without Forgetting" by Zhizhong Li and Derek Hoiem addresses the challenge of adapting a Convolutional Neural Network (CNN) to new tasks while preserving performance on existing tasks, without access to training data for the original tasks. The authors propose a method called Learning without Forgetting (LwF), which uses only new task data to train the network while maintaining the original capabilities. LwF is compared to common techniques such as feature extraction, fine-tuning, and multitask learning, showing superior performance on new tasks and comparable or better performance on old tasks compared to fine-tuning with original task data. LwF combines elements of knowledge distillation and fine-tuning, optimizing both for high accuracy on the new task and preserving responses on the old tasks. The method is effective in various scenarios, including single and multiple new tasks, and with different dataset sizes. Experiments on image classification tasks using AlexNet and VGGnet show that LwF outperforms other methods on new tasks and performs similarly to joint training on old tasks. The paper also explores alternative network modifications and design choices, concluding that LwF is a robust and efficient approach for adapting CNNs to new tasks while preserving performance on existing ones.