17 Apr 2014 | Jiang Wang, Yang Song, Thomas Leung, Chuck Rosenberg, Jingbin Wang, James Philbin, Bo Chen, Ying Wu
This paper proposes a deep ranking model that learns fine-grained image similarity directly from images and outperforms both models based on hand-crafted features and deep classification models. The model is trained with a triplet-based hinge loss ranking function. This loss penalizes a (query, positive, negative) triplet whenever the query is not closer to the positive than to the negative by at least a margin. The model also uses a multiscale neural network architecture to capture both global visual properties and image semantics. A novel online triplet sampling algorithm efficiently generates meaningful, discriminative triplets for training. Evaluated on a human-labeled dataset, the model shows superior performance in image retrieval tasks.
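The triplet hinge loss and the buffering step behind online sampling can be sketched as follows. This is a minimal illustration that assumes embeddings have already been computed by some network and are passed in as plain Python lists. The loss takes the form max(0, g + D(q, p+) - D(q, p-)), with D the squared Euclidean distance and g a gap (margin) parameter. `ReservoirBuffer` is standard reservoir sampling (Algorithm R). It is shown as an assumption about how fixed-size image buffers could be maintained over a stream of training images. The relevance-weighted choice of queries, positives, and negatives from those buffers is not shown. All function and class names are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance D between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_hinge_loss(query: Sequence[float], positive: Sequence[float],
                       negative: Sequence[float], gap: float) -> float:
    """max(0, gap + D(query, positive) - D(query, negative)).

    The loss is zero once the negative is farther from the query than the
    positive by at least `gap`; otherwise it grows linearly with the violation.
    """
    return max(0.0, gap + squared_distance(query, positive)
               - squared_distance(query, negative))


class ReservoirBuffer:
    """Keeps a uniform random sample of at most `capacity` items from a stream
    (reservoir sampling, Algorithm R). Hypothetical helper for illustration."""

    def __init__(self, capacity: int, rng: Optional[random.Random] = None):
        self.capacity = capacity
        self.items: List[object] = []
        self.seen = 0  # number of items offered so far
        self.rng = rng if rng is not None else random.Random()

    def add(self, item: object) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            # Buffer not full yet: always keep the item.
            self.items.append(item)
            return
        # Otherwise keep the new item with probability capacity / seen,
        # overwriting a uniformly chosen slot.
        j = self.rng.randrange(self.seen)
        if j < self.capacity:
            self.items[j] = item


# Usage: a triplet that already satisfies a margin of 0.5 contributes no loss.
q, p, n = [0.0, 0.0], [0.1, 0.0], [1.0, 0.0]
print(triplet_hinge_loss(q, p, n, gap=0.5))  # 0.0
```

Only triplets that violate the margin produce a nonzero loss and therefore a gradient. This is one reason a sampler that favors discriminative triplets can be more useful than drawing triplets uniformly at random.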
The key contributions are a deep ranking model for fine-grained image similarity, a multiscale network structure, an efficient online triplet sampling algorithm, and a high-quality dataset of image triplets with similarity ranking information. The model outperforms existing methods by learning image similarity directly from images rather than relying on hand-crafted features or on models pre-trained for classification. In the experiments, the deep ranking model outperforms state-of-the-art methods on both similarity precision and score-at-top-30. Because it captures both visual appearance and image semantics, the model is suited to applications such as exemplar-based object recognition and image deduplication.
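Similarity precision, one of the two reported metrics, can be read as the fraction of labeled triplets that a model ranks correctly. The sketch below assumes each triplet is given as precomputed (query, positive, negative) embeddings. It counts a triplet as correct when the positive is strictly closer to the query under squared Euclidean distance. Treating ties as incorrect is an assumption of this sketch. The function returns a fraction rather than a percentage, and its names are hypothetical. Score-at-top-30 is not sketched here.

```python
from typing import Sequence, Tuple

# Hypothetical alias: (query, positive, negative) embedding vectors.
Triplet = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def similarity_precision(triplets: Sequence[Triplet]) -> float:
    """Fraction of (query, positive, negative) triplets ranked correctly,
    i.e. the positive is strictly closer to the query than the negative.
    Ties count as incorrect (an assumption of this sketch)."""
    if not triplets:
        raise ValueError("need at least one triplet")
    correct = sum(
        1 for q, p, n in triplets
        if squared_distance(q, p) < squared_distance(q, n)
    )
    return correct / len(triplets)


# Usage: one correctly and one incorrectly ranked triplet give 0.5.
print(similarity_precision([([0.0], [0.1], [1.0]), ([0.0], [1.0], [0.1])]))  # 0.5
```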