14 May 2024 | Yulin Wang, Yang Yue, Rui Lu, Yizeng Han, Shiji Song, Senior Member, IEEE, and Gao Huang†, Member, IEEE
The paper "EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training" addresses the issue of costly training procedures for modern computer vision backbones, such as vision Transformers trained on large datasets like ImageNet-1K/22K. The authors propose a generalized curriculum learning approach to reduce training time without sacrificing accuracy. They reformulate the training curriculum as a soft-selection function that progressively introduces more difficult patterns within each example, rather than selecting easier-to-harder samples. The key contributions include:
1. **Generalized Curriculum Learning**: The authors introduce a soft-selection function that dynamically extracts 'easier-to-learn' patterns (lower-frequency components) from each example, while gradually introducing harder patterns as training progresses.
2. **Efficient Training Techniques**: They propose a cropping operation in the Fourier spectrum that retains only the lower-frequency components of each image, reducing the computational cost of early training (see the code sketch after this list). They also suggest using weaker data augmentation in the earlier stages of training to leverage the original, more learnable patterns.
3. **EfficientTrain++**: An enhanced version of EfficientTrain that improves efficiency through:
- A computation-constrained sequential search algorithm that determines the curriculum schedule automatically.
- An efficient low-frequency down-sampling operation that reduces CPU-GPU I/O costs (a training-loop sketch illustrating this idea appears at the end of this summary).
4. **Implementation Techniques**: Two techniques to facilitate large-scale parallel training and reduce data preprocessing loads.
5. **Experimental Results**: The method reduces the training time of various visual backbones (e.g., ResNet, ConvNeXt, DeiT, PVT, Swin, CSWin, CAFormer) by 1.5–3.0× on ImageNet-1K/22K without compromising accuracy, and it also proves effective for self-supervised learning (e.g., MAE).
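As a concrete illustration of the Fourier-spectrum cropping in contributions 1–2, here is a minimal PyTorch sketch. The function name `low_freq_crop`, the `(B, C, H, W)` tensor layout, and the rescaling convention are assumptions made for illustration, not the authors' released code.

```python
import torch

def low_freq_crop(x: torch.Tensor, band: int) -> torch.Tensor:
    """Keep only the central `band` x `band` low-frequency window of the
    2D Fourier spectrum, then return to pixel space at the reduced size.

    x: images of shape (B, C, H, W); band <= min(H, W).
    """
    B, C, H, W = x.shape
    # FFT, with the zero-frequency (DC) component shifted to the center.
    spec = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    # Crop the centered low-frequency window.
    top, left = (H - band) // 2, (W - band) // 2
    spec = spec[..., top:top + band, left:left + band]
    # Inverse FFT; rescale so pixel intensities keep their original scale
    # (ifft2 divides by band*band instead of H*W after the crop).
    out = torch.fft.ifft2(torch.fft.ifftshift(spec, dim=(-2, -1)))
    return out.real * (band * band) / (H * W)

# Example: a 224x224 batch reduced to its 160x160 low-frequency content.
images = torch.randn(8, 3, 224, 224)
smaller = low_freq_crop(images, band=160)  # shape: (8, 3, 160, 160)
```

Because the output is smaller, every forward and backward pass in the early epochs becomes cheaper, which is where the training-time savings come from.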
The paper provides a comprehensive evaluation of the proposed method, including comparisons with state-of-the-art efficient training methods and demonstrations of its applicability to different training scenarios and downstream tasks.
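To make the curriculum concrete, below is a hedged training-loop sketch showing how a searched schedule and the low-frequency down-sampling from contribution 3 could be wired together. The schedule values, the helper names (`bandwidth_at`, `low_freq_downsample`), and the use of bilinear interpolation as a cheap stand-in for the paper's exact down-sampling operation are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Hypothetical curriculum: (fraction of training elapsed, input resolution).
# The real schedule in EfficientTrain++ comes from the paper's
# computation-constrained sequential search, not from these values.
CURRICULUM = [(0.0, 96), (0.3, 128), (0.6, 160), (0.8, 192), (0.9, 224)]

def bandwidth_at(progress: float) -> int:
    """Map training progress in [0, 1] to the current input resolution."""
    size = CURRICULUM[0][1]
    for start, s in CURRICULUM:
        if progress >= start:
            size = s
    return size

def low_freq_downsample(x: torch.Tensor, size: int) -> torch.Tensor:
    """Approximate low-pass filtering by bilinear down-sampling; smaller
    tensors also mean less CPU-GPU traffic per batch."""
    if x.shape[-1] == size:
        return x
    return F.interpolate(x, size=(size, size), mode="bilinear", align_corners=False)

# Inside a training loop, each batch would be reduced before the forward pass:
images = torch.randn(4, 3, 224, 224)
for progress in (0.0, 0.5, 0.95):
    batch = low_freq_downsample(images, bandwidth_at(progress))
    print(f"progress={progress:.2f} -> input {tuple(batch.shape[-2:])}")
```

The design point is that early batches are both cheaper to compute on and cheaper to move to the GPU, while the final portion of training still sees full-resolution inputs.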