5 May 2020 | Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, and Neil Houlsby
Big Transfer (BiT) is a method for general visual representation learning that achieves strong performance across a wide range of tasks and data regimes. The approach involves pre-training on large supervised datasets and fine-tuning on target tasks. BiT-L, trained on the JFT-300M dataset, achieves 87.5% top-1 accuracy on ILSVRC-2012, 99.4% on CIFAR-10, and 76.3% on the VTAB benchmark. BiT also performs well on small datasets, achieving 76.8% on ILSVRC-2012 with 10 examples per class and 97.0% on CIFAR-10 with 10 examples per class.
The method replaces Batch Normalization with Group Normalization combined with Weight Standardization, a pairing that trains well at both large and small batch sizes and transfers better. A simple heuristic, BiT-HyperRule, sets the fine-tuning hyperparameters and works well across diverse tasks. BiT is efficient: it requires only a single pre-training phase, after which fine-tuning on each downstream task is cheap, with no extensive per-task hyperparameter search.
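The two normalization techniques can be summarized as simple statistics: Weight Standardization re-parameterizes each convolutional filter to zero mean and unit variance, while Group Normalization normalizes activations within groups of channels, independent of batch size. A minimal sketch of that math on flat lists (real implementations operate on 4-D conv tensors; the function names here are illustrative, not from the paper's code):

```python
import math

def weight_standardize(filt, eps=1e-10):
    """Weight Standardization: rescale one filter's weights to
    zero mean and unit variance before the convolution uses them."""
    n = len(filt)
    mean = sum(filt) / n
    var = sum((w - mean) ** 2 for w in filt) / n
    return [(w - mean) / math.sqrt(var + eps) for w in filt]

def group_norm(x, num_groups, eps=1e-5):
    """Group Normalization: split the channels into groups and
    normalize each group with its own mean/variance -- no batch
    statistics involved, so small batches are not a problem."""
    per_group = len(x) // num_groups
    out = []
    for g in range(num_groups):
        group = x[g * per_group:(g + 1) * per_group]
        mean = sum(group) / len(group)
        var = sum((v - mean) ** 2 for v in group) / len(group)
        out.extend((v - mean) / math.sqrt(var + eps) for v in group)
    return out

w = weight_standardize([0.2, -0.5, 1.3, 0.0])
print(sum(w))                      # ~0: standardized weights have zero mean
print(group_norm([1.0, 2.0, 10.0, 20.0], num_groups=2))
```

Because neither operation depends on batch statistics, the pre-trained statistics do not need to be recomputed or frozen when transferring to a new task, which is part of why fine-tuning stays simple.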
BiT is evaluated on a range of benchmarks, including ILSVRC-2012, CIFAR-10, Oxford-IIIT Pet, Oxford Flowers-102, and VTAB, and outperforms previous state-of-the-art models on many of them. It is also effective in low-data regimes, remaining strong with as few as one example per class.
The paper analyzes the method's components, including model scaling, normalization, and training hyperparameters, and shows that larger models combined with more pre-training data lead to better performance. BiT also transfers to object detection, achieving strong results on COCO-2017. Efficient and effective across a wide range of tasks and data regimes, BiT is a valuable tool for transfer learning.