1 Apr 2024 | Stephanie Fu*,†, Mark Hamilton*, Axel Feldmann, Zhoutong Zhang, Laura Brandt, William T. Freeman
FeatUp is a framework for enhancing the spatial resolution of deep features from any model backbone, improving their performance in dense prediction tasks such as segmentation and depth estimation. The framework introduces two variants: one that uses a feedforward network, guided by a high-resolution signal, to upsample features in a single forward pass, and another that fits an implicit model to a single image to reconstruct features at any resolution. Both variants use a multi-view consistency loss, analogous to the one used in NeRFs, to ensure that the upsampled features retain their original semantics. FeatUp can be used as a drop-in replacement in existing applications without retraining, and it improves class activation map generation, transfer learning for segmentation and depth prediction, and end-to-end training for semantic segmentation. The framework is efficient and produces high-quality features aligned with object edges, making it a useful tool for enhancing the resolution and performance of deep features in computer vision tasks.
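The multi-view consistency idea can be illustrated with a small toy, offered here only as a sketch under stated assumptions rather than FeatUp's actual implementation. The idea is that high-resolution features, once transformed and downsampled, should match the low-resolution features the backbone produces on the same transformed image. In the sketch below, `toy_backbone`, `jitter`, and `multiview_consistency_loss` are hypothetical names; average pooling stands in for both the backbone and the downsampler, and a circular shift stands in for the set of image views.

```python
import numpy as np

def avg_pool(x, k):
    """Average-pool an (H, W, C) array over non-overlapping k x k blocks."""
    h, w, c = x.shape
    return x.reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))

def toy_backbone(image, k=4):
    # Hypothetical stand-in for a real backbone: returns low-resolution "features".
    return avg_pool(image, k)

def jitter(x, dy, dx):
    # Circular shift used as a simple stand-in for a small image transformation.
    return np.roll(x, shift=(dy, dx), axis=(0, 1))

def multiview_consistency_loss(image, hr_feats, shifts, k=4):
    """Mean squared error between downsampled, transformed high-res features
    and the backbone's features on the identically transformed image."""
    total = 0.0
    for dy, dx in shifts:
        target = toy_backbone(jitter(image, dy, dx), k)   # low-res view from the backbone
        pred = avg_pool(jitter(hr_feats, dy, dx), k)      # transform, then downsample
        total += np.mean((pred - target) ** 2)
    return total / len(shifts)

rng = np.random.default_rng(0)
image = rng.random((16, 16, 3))
shifts = [(0, 0), (1, 2), (3, 1)]

# Candidate 1: nearest-neighbour upsampling of the low-res features.
naive = np.repeat(np.repeat(toy_backbone(image), 4, axis=0), 4, axis=1)
# Candidate 2: in this toy the image itself is the ideal high-res feature map.
loss_naive = multiview_consistency_loss(image, naive, shifts)
loss_ideal = multiview_consistency_loss(image, image, shifts)
print(loss_naive > loss_ideal)
```

In this toy, the nearest-neighbour candidate agrees with the backbone only for the unshifted view, while the ideal candidate agrees under every shift, which is why observing several views constrains the high-resolution features more than a single view does.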