25 Jan 2019 | Boris N. Oreshkin, Pau Rodriguez, Alexandre Lacoste
TADAM: Task dependent adaptive metric for improved few-shot learning
This paper introduces TADAM, a task-dependent adaptive metric for few-shot learning. The authors show that metric scaling and task conditioning both matter for few-shot performance. They propose a simple and effective way of conditioning the learner on the task's sample set, yielding a task-dependent metric space, together with a practical end-to-end optimization procedure based on auxiliary task co-training. The resulting model, built on a task-dependent scaled metric, achieves state-of-the-art results on mini-Imagenet, and the authors confirm these results on a second few-shot dataset derived from CIFAR100. Their code is publicly available at https://github.com/ElementAI/TADAM.
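At its core, the classifier resembles a prototypical-network head whose similarity metric is multiplied by a learned scaling factor before the softmax. The following is a minimal sketch in PyTorch; the function name, the squared Euclidean metric, and the shapes are illustrative choices, not the authors' exact implementation:

```python
import torch
import torch.nn.functional as F

def scaled_prototype_logits(query_emb, support_emb, support_labels, n_way, alpha):
    """Class logits as a scaled negative distance to class prototypes.

    query_emb:      (n_query, d) embeddings of the query images
    support_emb:    (n_way * n_shot, d) embeddings of the support set
    support_labels: (n_way * n_shot,) integer class labels in [0, n_way)
    alpha:          scalar metric-scaling factor (a learnable parameter)
    """
    # Class prototypes: mean embedding of each class's support examples.
    prototypes = torch.stack(
        [support_emb[support_labels == k].mean(dim=0) for k in range(n_way)]
    )                                                   # (n_way, d)
    # Squared Euclidean distance between every query and every prototype.
    dists = torch.cdist(query_emb, prototypes) ** 2     # (n_query, n_way)
    # Scaling the metric before the softmax is the knob studied in the paper.
    return -alpha * dists

# Usage: cross-entropy over the scaled logits, e.g.
# loss = F.cross_entropy(scaled_prototype_logits(q, s, y_s, 5, alpha), y_q)
```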
The authors analyze the effect of metric scaling on the performance of few-shot learning algorithms and show that simple metric scaling alone can improve accuracy by up to 14% on the mini-Imagenet 5-way 5-shot classification task. They also propose a task conditioning mechanism in which a task encoding network extracts a task representation from the task's sample set; this representation modulates the feature extractor through FiLM (feature-wise linear modulation) layers. Finally, they show that co-training the feature extractor on a conventional supervised classification task reduces training complexity and improves generalization.
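The conditioning can be pictured as follows: the task representation (e.g. an average of the episode's class prototypes) is mapped to per-channel scale and shift parameters that modulate the backbone's activations. A hedged sketch of one FiLM-conditioned block, with module and parameter names chosen for illustration rather than taken from the authors' code:

```python
import torch
import torch.nn as nn

class FiLMBlock(nn.Module):
    """A conv block whose activations are modulated by task-dependent FiLM parameters."""

    def __init__(self, channels, task_dim):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(channels)
        # Small head of the task encoding network: predicts gamma and beta per channel.
        self.film = nn.Linear(task_dim, 2 * channels)

    def forward(self, x, task_repr):
        # task_repr: (task_dim,), e.g. the mean of the class prototypes for this episode.
        gamma, beta = self.film(task_repr).chunk(2, dim=-1)
        h = self.bn(self.conv(x))
        # Feature-wise linear modulation: scale and shift every channel.
        h = gamma.view(1, -1, 1, 1) * h + beta.view(1, -1, 1, 1)
        return torch.relu(h)
```

The auxiliary co-training simply adds an ordinary multi-class classification head on the same backbone over the training classes, so the episodic few-shot loss and the standard supervised loss share the feature extractor during training.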
The proposed few-shot learning architecture, based on this task-dependent scaled metric, achieves superior performance on two challenging few-shot image classification datasets: up to 8.5% absolute accuracy improvement over the baseline (Snell et al. [28]) and 4.8% over the state of the art [17] on the 5-shot, 5-way mini-Imagenet classification task, reaching 76.7% accuracy, the best reported on this dataset.
The authors also perform an ablation study to evaluate the importance of each component of the proposed architecture, finding that metric scaling, task conditioning, and auxiliary task co-training each contribute to few-shot performance. They further observe that optimal performance is achieved between the two asymptotic regimes of the softmax, which raises the research question of explicitly designing loss functions and α schedules that are optimal for few-shot learning.
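To make the two regimes concrete: with a distance d between a query embedding z and a class representation c_k, the scaled softmax reads as follows (our notation, following the prototypical-network setup the paper builds on):

```latex
p_\alpha(y = k \mid z) \;=\;
  \frac{\exp\!\bigl(-\alpha\, d(z, c_k)\bigr)}
       {\sum_{k'} \exp\!\bigl(-\alpha\, d(z, c_{k'})\bigr)}
```

As α → 0 the softmax flattens and the gradient treats all classes nearly uniformly; as α → ∞ it approaches a hard max dominated by the closest competing class. The ablation indicates that the best accuracy is obtained at an intermediate value of α, learned end to end.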
The authors conclude that the scaling factor is a necessary standard component of any few-shot learning algorithm relying on a similarity metric and the cross-entropy loss function. Since optimal performance lies between the two asymptotic regimes of the softmax, explicitly designing loss functions and α schedules for few-shot learning remains an open research question. They also propose task representation conditioning as a way to improve a feature extractor's performance on the few-shot classification task; in this context, designing more powerful task representations, for example based on higher-order statistics of class embeddings, looks like a very promising avenue for future work.