16 Jan 2018 | Chao-Yuan Wu*, R. Manmatha, Alexander J. Smola, Philipp Krähenbühl
This paper investigates the importance of sampling strategies in deep embedding learning, showing that sample selection plays a role equal to or greater than that of the loss function. The authors propose distance weighted sampling, which selects more informative and stable examples than traditional methods, and a simple margin-based loss that outperforms other loss functions. Together, the two achieve state-of-the-art performance on image retrieval, clustering, and face verification across multiple datasets, including Stanford Online Products, CARS196, CUB200-2011, and LFW.

The paper first analyzes existing sampling strategies and explains why they succeed or fail. It then proposes a new strategy that corrects the bias induced by the geometry of the embedding space while ensuring every data point has a chance of being sampled. This strategy reduces gradient variance, stabilizing training and producing better embeddings.

The margin-based loss is shown to be more robust and effective than the traditional contrastive loss, and when combined with distance weighted sampling it yields superior performance. The paper also discusses the relationship between the margin-based loss and isotonic regression, showing that the loss can be viewed as a ranking problem over distances. Evaluations on the datasets above outperform previous state-of-the-art results, and the paper concludes that sampling matters as much as, or more than, the choice of loss function in deep embedding learning.
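To make the sampling idea concrete, below is a minimal NumPy sketch of distance weighted negative sampling, assuming L2-normalized embeddings; it weights candidates by the inverse of the density of pairwise distances on the unit n-sphere that the paper analyzes. The function name, cutoff, and clipping constant are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def distance_weighted_sampling(anchor, negatives, dim, cutoff=0.5, clip=None, rng=None):
    """Sketch of distance weighted sampling for negatives (illustrative).

    anchor    : (dim,) L2-normalized embedding of the anchor
    negatives : (N, dim) L2-normalized candidate negative embeddings
    dim       : embedding dimension n

    On the unit n-sphere, pairwise distances d follow a density
    q(d) ~ d^(n-2) * (1 - d^2/4)^((n-3)/2), which concentrates near
    sqrt(2).  Sampling negatives with probability proportional to 1/q(d)
    corrects this geometric bias, so examples are drawn across the whole
    range of distances rather than only the uninformative region.
    """
    if rng is None:
        rng = np.random.default_rng()
    d = np.linalg.norm(negatives - anchor, axis=1)
    d = np.maximum(d, cutoff)  # ignore overly hard negatives with noisy gradients
    # log q(d) up to a constant; clip the log argument for numerical stability
    log_q = (dim - 2) * np.log(d) + 0.5 * (dim - 3) * np.log(
        np.clip(1.0 - 0.25 * d ** 2, 1e-8, None)
    )
    w = np.exp(-(log_q - log_q.min()))  # proportional to 1/q(d), rescaled to avoid overflow
    if clip is not None:
        w = np.minimum(w, clip)         # optional cap on individual sampling weights
    w /= w.sum()
    return rng.choice(len(negatives), p=w)
```

The key design point is that the weighting depends only on the distance distribution induced by the embedding geometry: it neither concentrates on the very hardest negatives, as hard-negative mining does, nor on the uninformative near-sqrt(2) region that uniform sampling favors.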
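Similarly, here is a minimal sketch of the margin-based loss as summarized above: positive pairs are pulled inside a boundary beta - alpha and negative pairs pushed beyond beta + alpha. The constant values and the fixed (rather than learned) beta are illustrative assumptions; in the paper beta is itself a learnable parameter.

```python
import numpy as np

def margin_loss(dist, is_positive, beta=1.2, alpha=0.2):
    """Sketch of the margin-based loss (constants are illustrative).

    dist        : (N,) array of pairwise distances D_ij
    is_positive : (N,) boolean array, True where the pair shares a label

    Positive pairs incur (alpha + D_ij - beta)_+ , i.e. they are pulled
    inside the boundary beta - alpha; negative pairs incur
    (alpha - (D_ij - beta))_+ , i.e. they are pushed beyond beta + alpha.
    """
    y = np.where(is_positive, 1.0, -1.0)
    return np.maximum(0.0, alpha + y * (np.asarray(dist) - beta))
```

Unlike the contrastive loss, which drives all positive pairs toward zero distance, this relaxed formulation only asks that positives fall within the boundary, which is part of what makes it more robust in practice.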