[slides] Deep High-Resolution Representation Learning for Visual Recognition

This paper introduces a high-resolution network (HRNet) for visual recognition tasks, which maintains high-resolution representations throughout the entire process. Unlike traditional methods that first encode images into low-resolution representations and then recover high-resolution ones, HRNet connects high-to-low resolution convolution streams in parallel and repeatedly exchanges information across resolutions. This approach results in semantically richer and spatially more precise representations. The HRNet is evaluated on various tasks, including human pose estimation, semantic segmentation, and object detection, demonstrating its effectiveness as a strong backbone for computer vision problems. The network has two versions: HRNetV1, which outputs only the high-resolution representation, and HRNetV2, which combines representations from all resolutions. HRNetV2p further constructs a multi-level representation from HRNetV2's output. The HRNet is shown to outperform existing methods in terms of performance and efficiency. The paper also discusses the design of HRNet, including its architecture, training process, and evaluation on multiple datasets. The results indicate that HRNet achieves state-of-the-art performance in various visual recognition tasks, particularly in semantic segmentation and object detection. The HRNet's ability to maintain high-resolution representations throughout the process makes it a promising approach for position-sensitive vision problems.This paper introduces a high-resolution network (HRNet) for visual recognition tasks, which maintains high-resolution representations throughout the entire process. Unlike traditional methods that first encode images into low-resolution representations and then recover high-resolution ones, HRNet connects high-to-low resolution convolution streams in parallel and repeatedly exchanges information across resolutions. This approach results in semantically richer and spatially more precise representations. The HRNet is evaluated on various tasks, including human pose estimation, semantic segmentation, and object detection, demonstrating its effectiveness as a strong backbone for computer vision problems. The network has two versions: HRNetV1, which outputs only the high-resolution representation, and HRNetV2, which combines representations from all resolutions. HRNetV2p further constructs a multi-level representation from HRNetV2's output. The HRNet is shown to outperform existing methods in terms of performance and efficiency. The paper also discusses the design of HRNet, including its architecture, training process, and evaluation on multiple datasets. The results indicate that HRNet achieves state-of-the-art performance in various visual recognition tasks, particularly in semantic segmentation and object detection. The HRNet's ability to maintain high-resolution representations throughout the process makes it a promising approach for position-sensitive vision problems.

Deep High-Resolution Representation Learning for Visual Recognition

MARCH 2020 | Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, Wenyu Liu, and Bin Xiao