A review of convolutional neural networks in computer vision

Accepted: 4 February 2024 / Published online: 23 March 2024 | Xia Zhao, Limin Wang, Yufei Zhang, Xuming Han, Muhammet Deveci, Milan Parmar
This paper provides a comprehensive review of Convolutional Neural Networks (CNNs) in computer vision, focusing on their components, applications, and challenges. CNNs have revolutionized image classification, object detection, and video prediction through their ability to learn and extract features from raw data. The paper begins by introducing the basic components of CNNs, including convolution layers, pooling layers, activation functions, batch normalization, dropout, and fully connected layers. It then delves into the historical development of CNNs, highlighting key models such as AlexNet, VGG, GoogLeNet, ResNet, SENet, and MobileNet. Each model's architecture and performance are discussed, emphasizing their contributions to the field. The paper also addresses the challenges faced by deep CNNs, such as overfitting, gradient vanishing, and the curse of dimensionality, and proposes solutions like network pruning, knowledge distillation, and tensor decomposition. Finally, the paper outlines future research directions, emphasizing the need for more efficient and interpretable models, as well as the integration of attention mechanisms and domain adaptation techniques.
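To make the basic components concrete, the following is a minimal NumPy sketch, not taken from the paper, of the three core operations the abstract lists: a convolution layer, a ReLU activation, and a max-pooling layer. The function names and the single-channel, single-kernel setup are simplifying assumptions for illustration; real CNN layers operate on batched, multi-channel tensors with learned kernels.

```python
import numpy as np

def conv2d(x, kernel):
    """Valid-mode 2-D cross-correlation of a single-channel image with one kernel
    (deep-learning frameworks call this operation 'convolution')."""
    h, w = x.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Slide the kernel over the image and take the elementwise product sum.
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Rectified linear activation: zero out negative responses."""
    return np.maximum(x, 0)

def maxpool2x2(x):
    """Non-overlapping 2x2 max pooling; trims odd rows/columns for simplicity."""
    h2, w2 = x.shape[0] // 2, x.shape[1] // 2
    return x[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2).max(axis=(1, 3))

# Tiny forward pass: 4x4 image -> 3x3 feature map -> activation -> 1x1 pooled output.
image = np.arange(16.0).reshape(4, 4)
features = maxpool2x2(relu(conv2d(image, np.ones((2, 2)))))
```

Stacking such convolution/activation/pooling stages, followed by fully connected layers, is the pattern shared by the architectures the paper surveys, from AlexNet through ResNet.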
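Among the compression solutions the abstract names, network pruning is simple to sketch. Below is a hedged, illustrative NumPy implementation of unstructured magnitude pruning, not the paper's specific method: the smallest-magnitude fraction of weights is zeroed, on the assumption that low-magnitude connections contribute least to the output.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out roughly the `sparsity` fraction of smallest-magnitude weights.

    Illustrative unstructured pruning: ties at the threshold may zero a few
    extra weights. Returns a new array; the original is left unchanged.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask
```

In practice, pruning is typically followed by fine-tuning to recover accuracy, and structured variants (removing whole filters or channels) are preferred when actual inference speedups, rather than just parameter counts, are the goal.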