This paper introduces the Open Long-Tailed Recognition (OLTR) task, which involves learning from naturally distributed long-tailed and open-ended data to optimize classification accuracy over a balanced test set that includes head, tail, and open classes. OLTR must handle imbalanced classification, few-shot learning, and open-set recognition in one integrated algorithm, unlike existing methods that focus on a single aspect. The key challenges are sharing visual knowledge between head and tail classes and reducing confusion between tail and open classes.
The authors propose an integrated OLTR algorithm that maps an image into a feature space where visual concepts relate to one another through a learned metric that respects closed-world classification while acknowledging open-world novelty. Their dynamic meta-embedding combines a direct image feature with an associated memory feature, and the resulting feature norm indicates familiarity with the known classes. This approach outperforms state-of-the-art methods on three large-scale OLTR benchmarks curated from the object-centric ImageNet, scene-centric Places, and face-centric MS1M datasets.
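To make this concrete, the following PyTorch sketch combines a direct feature with a memory feature built from class centroids and scales the result by the inverse distance to the nearest centroid, so unfamiliar inputs end up with a small feature norm. The module and layer names (hallucinator, concept_selector), the softmax over memory slots, and the layer sizes are illustrative assumptions, not the authors' exact implementation.

```python
# A minimal sketch of a dynamic meta-embedding, assuming a fixed bank of class
# centroids serves as the visual memory; names and sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicMetaEmbedding(nn.Module):
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        # Visual memory: one centroid per known class (assumed to be maintained
        # elsewhere, e.g. as running means of direct features per class).
        self.register_buffer("centroids", torch.zeros(num_classes, feat_dim))
        # Hallucinator: produces coefficients over the memory slots.
        self.hallucinator = nn.Linear(feat_dim, num_classes)
        # Concept selector: gates how much of the memory feature is injected.
        self.concept_selector = nn.Linear(feat_dim, feat_dim)

    def forward(self, v_direct: torch.Tensor) -> torch.Tensor:
        # Memory feature: soft combination of the class centroids.
        coeff = F.softmax(self.hallucinator(v_direct), dim=1)       # (B, C)
        v_memory = coeff @ self.centroids                           # (B, D)
        # Tanh-gated selection of memory concepts.
        e = torch.tanh(self.concept_selector(v_direct))             # (B, D)
        # Reachability: distance to the nearest centroid; scaling by its
        # inverse gives unfamiliar (open-set) inputs a small feature norm.
        dist = torch.cdist(v_direct, self.centroids)                # (B, C)
        reachability = dist.min(dim=1, keepdim=True).values         # (B, 1)
        v_meta = (v_direct + e * v_memory) / (reachability + 1e-12)
        return v_meta
```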
Two components address these challenges. The dynamic meta-embedding improves the robustness of tail-class recognition by combining the direct feature with an induced memory feature, and it calibrates the embedding norm against the visual memory, scaling it inversely with the distance to the nearest class centroid so that samples far from all known classes receive small norms. Modulated attention, in turn, maintains discrimination between head and tail classes by encouraging them to attend to different spatial features.
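A minimal sketch of one way to realize modulated attention follows: a non-local-style self-attention branch is gated by a conditional spatial attention map and added back residually. The layer shapes, reduction factor, and exact composition are assumptions for illustration rather than the paper's configuration.

```python
# An illustrative modulated-attention block over a CNN feature map (B, C, H, W).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModulatedAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        inner = channels // reduction
        # Self-attention (non-local) branch.
        self.query = nn.Conv2d(channels, inner, kernel_size=1)
        self.key = nn.Conv2d(channels, inner, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        # Conditional spatial attention: one map, normalized over space.
        self.spatial = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        b, c, h, w = f.shape
        q = self.query(f).flatten(2).transpose(1, 2)         # (B, HW, C')
        k = self.key(f).flatten(2)                            # (B, C', HW)
        v = self.value(f).flatten(2).transpose(1, 2)          # (B, HW, C)
        attn = F.softmax(q @ k, dim=-1)                        # (B, HW, HW)
        sa = (attn @ v).transpose(1, 2).reshape(b, c, h, w)    # self-attention feature
        # Spatial modulation map, softmaxed over all spatial positions.
        ma = F.softmax(self.spatial(f).flatten(2), dim=-1).reshape(b, 1, h, w)
        # Residual combination lets head and tail classes emphasize
        # different spatial regions.
        return f + ma * sa
```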
The authors conduct extensive experiments on these benchmarks, demonstrating that their approach consistently outperforms existing methods, with across-the-board gains on many-shot, medium-shot, few-shot, and open classes. The code and data are publicly available, enabling future research that transfers directly to real-world applications, and the work fills a void in practical benchmarks that jointly cover imbalanced classification, few-shot learning, and open-set recognition.