Hypercolumns for object segmentation and fine-grained localization

This paper introduces hypercolumns as a representation for object segmentation and fine-grained localization. The hypercolumn at a pixel is the vector of activations of all CNN units above that pixel: the outputs of the units above that location, at all layers of the CNN, stacked into one vector. Because it draws on every layer, the representation captures both semantic and spatial information.

The authors propose a general framework for fine-grained localization tasks: frame each task as pixel classification and use hypercolumns as the pixel descriptors. To handle location-specific features, they use a grid of classifiers and interpolate between them. The entire system is formulated as a neural network, so it can be trained end-to-end for a particular task by changing the target labels.

Using hypercolumns as pixel descriptors improves results on three tasks. For simultaneous detection and segmentation (SDS), the state-of-the-art mean AP^r rises from 49.7 to 60.0. For keypoint localization, the method gains 3.3 points over prior methods. For part labeling, it gains 6.6 points over a strong baseline. The authors conclude that hypercolumns provide large gains on all three tasks and suggest they may also be useful for other fine-grained tasks such as attribute or action classification.
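
To make the definition concrete, here is a minimal sketch of how a hypercolumn for a single pixel could be assembled from feature maps of different resolutions. It is an illustration only, not the authors' implementation: the function names (`bilinear_sample`, `hypercolumn`), the nested-list feature-map layout, and the corner-aligned mapping from pixel to feature-map coordinates are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Assumed layout for one layer's output: [channel][row][col]
FeatureMap = List[List[List[float]]]


def bilinear_sample(channel: Sequence[Sequence[float]], y: float, x: float) -> float:
    """Bilinearly interpolate a 2D grid at fractional coordinates (y, x)."""
    h, w = len(channel), len(channel[0])
    # Clamp so that positions on or beyond the border reuse edge values.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = channel[y0][x0] * (1 - dx) + channel[y0][x1] * dx
    bottom = channel[y1][x0] * (1 - dx) + channel[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def hypercolumn(
    layers: Sequence[FeatureMap],
    image_size: Tuple[int, int],
    pixel: Tuple[int, int],
) -> List[float]:
    """Stack the activations of every layer "above" one pixel into a vector."""
    img_h, img_w = image_size
    py, px = pixel
    column: List[float] = []
    for fmap in layers:
        h, w = len(fmap[0]), len(fmap[0][0])
        # Assumption: corner-aligned mapping from image to feature-map coords.
        fy = py * (h - 1) / (img_h - 1) if img_h > 1 else 0.0
        fx = px * (w - 1) / (img_w - 1) if img_w > 1 else 0.0
        # One interpolated value per channel of this layer.
        column.extend(bilinear_sample(ch, fy, fx) for ch in fmap)
    return column


# Toy example: a fine 4x4 map with 1 channel and a coarse 2x2 map with 2 channels.
fine = [[[0.0, 1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0, 7.0],
         [8.0, 9.0, 10.0, 11.0],
         [12.0, 13.0, 14.0, 15.0]]]
coarse = [[[0.0, 2.0], [4.0, 6.0]],
          [[1.0, 1.0], [1.0, 1.0]]]

vec = hypercolumn([fine, coarse], image_size=(4, 4), pixel=(3, 3))
# vec == [15.0, 6.0, 1.0]: one entry per channel, across both layers
```

The resulting vector has one entry per channel summed over all layers, so its length is independent of each layer's spatial resolution. Sampling per pixel, as done here, avoids materializing full-resolution copies of every feature map; upsampling whole maps once and then reading off columns would be a natural alternative when descriptors are needed for many pixels.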

Hypercolumns for Object Segmentation and Fine-grained Localization

25 Apr 2015 | Bharath Hariharan, Pablo Arbeláez, Ross Girshick, Jitendra Malik