14 May 2024 | Yulin Wang, Yang Yue, Rui Lu, Yizeng Han, Shiji Song, Senior Member, IEEE, and Gao Huang†, Member, IEEE
The paper "EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training" addresses the issue of costly training procedures for modern computer vision backbones, such as vision Transformers trained on large datasets like ImageNet-1K/22K. The authors propose a generalized curriculum learning approach to reduce training time without sacrificing accuracy. They reformulate the training curriculum as a soft-selection function that progressively introduces more difficult patterns within each example, rather than selecting easier-to-harder samples. The key contributions include:
1. **Generalized Curriculum Learning**: The authors introduce a soft-selection function that dynamically extracts 'easier-to-learn' patterns (lower-frequency components) from each example, while gradually introducing harder patterns as training progresses.
2. **Efficient Training Techniques**: They propose a cropping operation in the Fourier spectrum that retains only the lower-frequency components of each image, reducing the computational cost of early training (see the code sketch after this list). They also suggest using weaker data augmentation in the earlier stages of training to leverage the original, more learnable patterns.
3. **EfficientTrain++**: An enhanced version of EfficientTrain that improves efficiency through:
- A computation-constrained sequential search algorithm that determines the curriculum schedule automatically.
- An efficient low-frequency down-sampling operation that reduces CPU-GPU I/O costs (a training-loop sketch illustrating this idea appears at the end of this summary).
4. **Implementation Techniques**: Two techniques to facilitate large-scale parallel training and reduce data preprocessing loads.
5. **Experimental Results**: The method reduces the training time of various visual backbones (e.g., ResNet, ConvNeXt, DeiT, PVT, Swin, CSWin, CAFormer) by 1.5–3.0× on ImageNet-1K/22K without compromising accuracy, and it also proves effective for self-supervised learning (e.g., MAE).
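As a concrete illustration of the Fourier-spectrum cropping in contributions 1–2, here is a minimal PyTorch sketch. The function name `low_freq_crop`, the `(B, C, H, W)` tensor layout, and the rescaling convention are assumptions made for illustration, not the authors' released code.

```python
import torch

def low_freq_crop(x: torch.Tensor, band: int) -> torch.Tensor:
    """Keep only the central `band` x `band` low-frequency window of the
    2D Fourier spectrum, then return to pixel space at the reduced size.

    x: images of shape (B, C, H, W); band <= min(H, W).
    """
    B, C, H, W = x.shape
    # FFT, with the zero-frequency (DC) component shifted to the center.
    spec = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    # Crop the centered low-frequency window.
    top, left = (H - band) // 2, (W - band) // 2
    spec = spec[..., top:top + band, left:left + band]
    # Inverse FFT; rescale so pixel intensities keep their original scale
    # (ifft2 divides by band*band instead of H*W after the crop).
    out = torch.fft.ifft2(torch.fft.ifftshift(spec, dim=(-2, -1)))
    return out.real * (band * band) / (H * W)

# Example: a 224x224 batch reduced to its 160x160 low-frequency content.
images = torch.randn(8, 3, 224, 224)
smaller = low_freq_crop(images, band=160)  # shape: (8, 3, 160, 160)
```

Because the output is smaller, every forward and backward pass in the early epochs becomes cheaper, which is where the training-time savings come from.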
The paper provides a comprehensive evaluation of the proposed method, including comparisons with state-of-the-art efficient training methods and demonstrations of its applicability to different training scenarios and downstream tasks.
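To make the curriculum concrete, below is a hedged training-loop sketch showing how a searched schedule and the low-frequency down-sampling from contribution 3 could be wired together. The schedule values, the helper names (`bandwidth_at`, `low_freq_downsample`), and the use of bilinear interpolation as a cheap stand-in for the paper's exact down-sampling operation are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Hypothetical curriculum: (fraction of training elapsed, input resolution).
# The real schedule in EfficientTrain++ comes from the paper's
# computation-constrained sequential search, not from these values.
CURRICULUM = [(0.0, 96), (0.3, 128), (0.6, 160), (0.8, 192), (0.9, 224)]

def bandwidth_at(progress: float) -> int:
    """Map training progress in [0, 1] to the current input resolution."""
    size = CURRICULUM[0][1]
    for start, s in CURRICULUM:
        if progress >= start:
            size = s
    return size

def low_freq_downsample(x: torch.Tensor, size: int) -> torch.Tensor:
    """Approximate low-pass filtering by bilinear down-sampling; smaller
    tensors also mean less CPU-GPU traffic per batch."""
    if x.shape[-1] == size:
        return x
    return F.interpolate(x, size=(size, size), mode="bilinear", align_corners=False)

# Inside a training loop, each batch would be reduced before the forward pass:
images = torch.randn(4, 3, 224, 224)
for progress in (0.0, 0.5, 0.95):
    batch = low_freq_downsample(images, bandwidth_at(progress))
    print(f"progress={progress:.2f} -> input {tuple(batch.shape[-2:])}")
```

The design point is that early batches are both cheaper to compute on and cheaper to move to the GPU, while the final portion of training still sees full-resolution inputs.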