14 Apr 2024 | Yang He, Lingao Xiao, Joey Tianyi Zhou, Ivor W. Tsang
The paper introduces Multisize Dataset Condensation (MDC), a method for dataset condensation in on-device scenarios, where storage and compute budgets vary. Traditional dataset condensation methods often suffer from the "subset degradation problem": a subset taken from a condensed dataset is less representative than a dataset of the same size condensed directly from the full dataset. MDC compresses multiple condensation processes into a single process, reducing storage requirements and computational overhead. The key contribution is the "adaptive subset loss," which adaptively selects the Most Learnable Subset (MLS) to mitigate the subset degradation problem. The method is validated on several datasets (SVHN, CIFAR-10, CIFAR-100, ImageNet) and networks (ConvNet, ResNet, DenseNet), showing significant accuracy improvements, especially for small subsets.
Experiments demonstrate that MDC outperforms existing methods in terms of accuracy and efficiency, making it a promising solution for on-device dataset condensation.
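The idea of serving several dataset sizes from one condensed set can be illustrated with a small sketch. It is not the authors' implementation: the summary does not say how subsets are formed or how the MLS is chosen, so the code assumes that subsets are nested prefixes (the first k synthetic images per class) and that the "most learnable" subset is the one whose loss dropped fastest between two steps. All function names are hypothetical.

```python
from typing import List, Sequence


def nested_subsets(images_per_class: int) -> List[range]:
    """Assumed subset layout: the first k images of each class, k = 1..N.

    One stored condensed set of N images per class then serves N sizes.
    """
    return [range(k) for k in range(1, images_per_class + 1)]


def most_learnable_subset(prev_losses: Sequence[float],
                          curr_losses: Sequence[float]) -> int:
    """Return the index of the subset with the largest relative loss drop.

    Assumption: "learnability" is measured as (prev - curr) / prev.
    """
    if not prev_losses or len(prev_losses) != len(curr_losses):
        raise ValueError("loss sequences must be non-empty and equal length")
    rates = [(p - c) / p if p > 0 else 0.0
             for p, c in zip(prev_losses, curr_losses)]
    return max(range(len(rates)), key=rates.__getitem__)


def adaptive_subset_loss(full_loss: float,
                         subset_losses: Sequence[float],
                         mls_index: int) -> float:
    """Assumed combination: full-set loss plus the loss of the chosen subset."""
    return full_loss + subset_losses[mls_index]


if __name__ == "__main__":
    subsets = nested_subsets(3)
    prev, curr = [1.0, 2.0, 4.0], [0.9, 1.0, 3.0]
    mls = most_learnable_subset(prev, curr)
    print([list(s) for s in subsets], mls, adaptive_subset_loss(0.5, curr, mls))
```

In a real condensation loop the losses would come from a matching objective computed on each subset, and the selection would be repeated as training progresses, so the subset that receives the extra loss term can change over time.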