Understanding FeatUp: A Model-Agnostic Framework for Features at Any Resolution

FeatUp is a model-agnostic framework that increases the spatial resolution of deep features without altering their original semantics. Deep features are typically produced at low spatial resolution, which limits their use in dense prediction tasks such as segmentation and depth estimation. FeatUp comes in two variants: one uses a guided upsampling network to upsample features in a single forward pass, and the other learns an implicit model to reconstruct features at any resolution. Both variants are trained with a multi-view consistency loss, inspired by NeRF, that keeps the high-resolution features consistent across different views.

The resulting high-resolution features can be used as drop-in replacements in downstream tasks, improving performance without retraining. FeatUp outperforms other feature upsampling and image super-resolution methods on class activation map generation, transfer learning for segmentation and depth prediction, and end-to-end training for semantic segmentation. The framework is efficient, thanks to a fast CUDA implementation of Joint Bilateral Upsampling, and applies to a variety of vision models, including convolutional networks and vision transformers. The implicit upsampler trains a deep implicit network to generate high-resolution features, which allows outputs at arbitrary resolution with low storage cost.
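To make the idea of guided upsampling more concrete, the following is a minimal sketch of classic Joint Bilateral Upsampling on a single-channel feature map, written in plain Python. It is not FeatUp's CUDA implementation or its learned upsampler. The function name `joint_bilateral_upsample`, its parameters, the fixed Gaussian kernels, and the assumption that guidance values lie roughly in [0, 1] are all illustrative choices rather than details taken from the paper.

```python
import math


def joint_bilateral_upsample(lowres, guide, radius=1,
                             sigma_spatial=1.0, sigma_range=0.1):
    """Upsample a 2D low-res map to the size of `guide` (hypothetical helper).

    lowres: list of rows of floats (low-resolution feature channel).
    guide:  list of rows of floats (high-resolution guidance image,
            assumed here to hold values roughly in [0, 1]).
    """
    h_lo, w_lo = len(lowres), len(lowres[0])
    h_hi, w_hi = len(guide), len(guide[0])
    sy, sx = h_hi / h_lo, w_hi / w_lo  # upsampling factors
    out = [[0.0] * w_hi for _ in range(h_hi)]

    for y in range(h_hi):
        for x in range(w_hi):
            # Position of this high-res pixel in low-res coordinates.
            cy = (y + 0.5) / sy - 0.5
            cx = (x + 0.5) / sx - 0.5
            iy, ix = round(cy), round(cx)
            num = den = 0.0
            for qy in range(iy - radius, iy + radius + 1):
                for qx in range(ix - radius, ix + radius + 1):
                    if not (0 <= qy < h_lo and 0 <= qx < w_lo):
                        continue
                    # Guidance value sampled near the centre of low-res cell q.
                    gy = min(h_hi - 1, int((qy + 0.5) * sy))
                    gx = min(w_hi - 1, int((qx + 0.5) * sx))
                    # Spatial kernel: distance measured in low-res coordinates.
                    d2 = (qy - cy) ** 2 + (qx - cx) ** 2
                    # Range kernel: difference in the high-res guidance signal.
                    r2 = (guide[gy][gx] - guide[y][x]) ** 2
                    w = (math.exp(-d2 / (2 * sigma_spatial ** 2))
                         * math.exp(-r2 / (2 * sigma_range ** 2)))
                    num += w * lowres[qy][qx]
                    den += w
            # Fall back to the nearest low-res value if all weights underflow.
            out[y][x] = num / den if den > 0 else lowres[iy][ix]
    return out


# Usage: a 2x2 feature map upsampled to 4x4, guided by a vertical edge.
low = [[0.0, 1.0], [0.0, 1.0]]
guide = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
up = joint_bilateral_upsample(low, guide)
```

In this toy case the upsampled values on either side of the guidance edge stay close to 0 and 1 respectively, whereas plain bilinear interpolation would blur across the edge. The range kernel is what ties the output to the high-resolution guidance signal.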

FeatUp: A Model-Agnostic Framework for Features at Any Resolution

2024 | Stephanie Fu, Mark Hamilton, Laura Brandt, Axel Feldmann, Zhoutong Zhang, William T. Freeman