22 Oct 2020 | Pietro Buzzega, Matteo Boschini, Angelo Porrello, Davide Abati, Simone Calderara
Dark Experience Replay (DER) is a simple baseline for General Continual Learning (GCL), a setting in which a model learns from a stream of data where task boundaries are blurred and domain and class distributions shift gradually or suddenly. The method combines rehearsal with knowledge distillation and regularization: it matches the network's logits sampled throughout the optimization trajectory, promoting consistency with the network's past responses. By design, DER is compatible with the GCL requirements: no task boundaries, no test-time oracle, and constant memory. DER++ is an enhanced variant that further improves performance by adding a term promoting higher conditional likelihood of the ground-truth labels on buffered examples, at minimal memory overhead.

Through extensive analysis on standard benchmarks (CIFAR-10, Tiny ImageNet) and a novel GCL evaluation setting (MNIST-360), DER outperforms existing approaches while requiring limited resources, preventing catastrophic forgetting and maintaining performance across tasks. DER and DER++ perform strongly on most benchmarks, especially in Domain-IL scenarios, and handle both sharp and smooth distribution shifts. DER also converges to flatter minima and exhibits better model calibration: it is less overconfident and its outputs are easier to interpret. The authors recommend DER as a strong baseline for future studies in both Continual Learning and General Continual Learning.
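To make the mechanics concrete, here is a minimal NumPy sketch of the two ingredients the summary describes: a constant-memory buffer filled by reservoir sampling along the optimization trajectory (no task boundaries needed), and the DER++ objective, which adds to the usual cross-entropy an `alpha`-weighted MSE between the network's current logits and the logits stored in the buffer, plus a `beta`-weighted cross-entropy on a second buffered example. All names (`ReservoirBuffer`, `derpp_loss`) and the single-example formulation are illustrative simplifications, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)


class ReservoirBuffer:
    """Fixed-size memory filled by reservoir sampling: every stream
    example has equal probability of being retained, with no knowledge
    of task boundaries (hypothetical minimal implementation)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []   # list of (x, y, logits) triples
        self.seen = 0    # stream examples observed so far

    def add(self, x, y, logits):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append((x, y, logits))
        else:
            # Replace a stored item with probability capacity / seen.
            idx = rng.integers(0, self.seen)
            if idx < self.capacity:
                self.data[idx] = (x, y, logits)

    def sample(self, n):
        idxs = rng.choice(len(self.data), size=min(n, len(self.data)),
                          replace=False)
        return [self.data[i] for i in idxs]


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def derpp_loss(logits, y, replay_logits, stored_logits,
               replay_logits2, stored_y2, alpha=0.5, beta=0.5):
    """DER++ objective for one example of each kind:
    cross-entropy on the current stream example, plus
    alpha * MSE(current logits on a buffered input, stored logits)
    (the DER logit-matching term), plus
    beta * cross-entropy on a second buffered example (DER++ term).
    Setting beta = 0 recovers plain DER."""
    ce_stream = -np.log(softmax(logits)[y])
    logit_mse = np.mean((replay_logits - stored_logits) ** 2)
    ce_buffer = -np.log(softmax(replay_logits2)[stored_y2])
    return ce_stream + alpha * logit_mse + beta * ce_buffer
```

The logit-matching term is the "dark knowledge" part: instead of replaying only labels, the buffer stores the network's own past logits, so rehearsal doubles as self-distillation across the optimization trajectory.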