Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey

16 Feb 2019 | Longlong Jing and Yingli Tian*
This paper surveys deep learning-based self-supervised visual feature learning methods for images and videos. It covers the motivation, general pipeline, and terminology of self-supervised learning; summarizes the deep neural network architectures commonly used for it; reviews the schema and evaluation metrics of self-supervised methods; and reviews existing methods together with the image and video datasets commonly used to train and evaluate them. It also summarizes quantitative performance comparisons of the reviewed methods on benchmark datasets for both image and video feature learning.
The stated key contributions are: the first comprehensive survey of self-supervised visual feature learning with deep ConvNets, an in-depth review of recently developed self-supervised learning methods and datasets, a quantitative performance analysis and comparison of existing methods, and a set of possible future directions for self-supervised learning.
The paper compares supervised, semi-supervised, weakly supervised, and unsupervised learning methods. It reviews common deep network architectures for learning image and video features: AlexNet, VGG, ResNet, GoogLeNet, and DenseNet for image features, and 2DConvNet-based, 3DConvNet-based, and LSTM-based methods for video features. It also discusses common pretext tasks used in self-supervised learning, such as image generation, context-based tasks, free semantic label-based tasks, and cross-modal-based tasks. To evaluate the quality of learned image and video features, it summarizes commonly used downstream tasks, including semantic segmentation, object detection, image classification, and human action recognition, as well as qualitative evaluation methods.
The paper concludes with a discussion of future directions for self-supervised visual feature learning.