MARCH 2020 | Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, Wenyu Liu, and Bin Xiao
The paper introduces a novel network architecture called High-Resolution Network (HRNet) designed to maintain high-resolution representations throughout the entire process, addressing the limitations of existing methods that either encode low-resolution representations or recover high-resolution representations from low-resolution outputs. HRNet connects high-to-low resolution convolution streams in parallel and repeatedly exchanges information across resolutions, resulting in semantically richer and spatially more precise representations. The paper presents two versions of HRNet: HRNetV1, which outputs high-resolution representations, and HRNetV2, which combines representations from all high-to-low resolution streams. HRNetV2 is further extended to HRNetV2p, which constructs a multi-level representation for object detection and instance segmentation. The authors demonstrate the superior performance of HRNet in human pose estimation, semantic segmentation, and object detection on various datasets, showing that HRNet is a stronger backbone for computer vision problems. The paper also includes an ablation study to validate the effectiveness of the proposed architecture.
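The difference between the three output variants, and the cross-resolution exchange that precedes them, can be illustrated with a toy NumPy sketch. This is not the authors' implementation, and everything in it is an assumption made for illustration: the function names (`exchange`, `hrnet_v1_head`, `hrnet_v2_head`, `hrnet_v2p_head`), the stream sizes and channel widths, the use of nearest-neighbour upsampling and average pooling in place of learned or interpolated resampling, and the random 1x1 projections standing in for learned convolutions.

```python
import numpy as np

def upsample(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def downsample(x, factor):
    """Average pooling of a (C, H, W) map with a factor-by-factor window."""
    c, h, w = x.shape
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))

def resize(x, target_hw):
    """Resize a square (C, H, W) map to target_hw (integer size ratios only)."""
    h = x.shape[1]
    if h < target_hw:
        return upsample(x, target_hw // h)
    if h > target_hw:
        return downsample(x, h // target_hw)
    return x

def exchange(streams, rng):
    """One illustrative exchange step across parallel streams.

    Each output stream is its own input plus every other stream, resized to
    its resolution and mapped to its channel count by a random 1x1 projection
    (an assumed stand-in for learned convolutions).
    """
    fused = []
    for target in streams:
        out_c, out_hw = target.shape[0], target.shape[1]
        total = target.copy()
        for src in streams:
            if src is target:
                continue
            weight = rng.standard_normal((out_c, src.shape[0]))
            total += np.einsum("oc,chw->ohw", weight, resize(src, out_hw))
        fused.append(total)
    return fused

def hrnet_v1_head(streams):
    """HRNetV1-style output: only the highest-resolution stream."""
    return streams[0]

def hrnet_v2_head(streams):
    """HRNetV2-style output: all streams brought to the highest resolution
    and concatenated along the channel axis."""
    hw = streams[0].shape[1]
    return np.concatenate([resize(s, hw) for s in streams], axis=0)

def hrnet_v2p_head(streams, levels=4):
    """HRNetV2p-style output: the combined map at several scales
    (assumed here to be produced by repeated 2x average pooling)."""
    full = hrnet_v2_head(streams)
    return [downsample(full, 2 ** k) for k in range(levels)]

rng = np.random.default_rng(0)
# Four toy streams at 1, 1/2, 1/4 and 1/8 resolution with 4, 8, 16, 32 channels.
streams = [rng.standard_normal((4 * 2 ** i, 32 // 2 ** i, 32 // 2 ** i)) for i in range(4)]
streams = exchange(streams, rng)

print(hrnet_v1_head(streams).shape)                 # (4, 32, 32)
print(hrnet_v2_head(streams).shape)                 # (60, 32, 32)
print([f.shape for f in hrnet_v2p_head(streams)])   # [(60, 32, 32), (60, 16, 16), (60, 8, 8), (60, 4, 4)]
```

The shapes make the distinction concrete: the V1-style head discards the lower-resolution streams, the V2-style head keeps their channels at full resolution, and the V2p-style head turns that combined map into a pyramid of the kind detection frameworks typically consume.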