Stacked Cross Attention for Image-Text Matching

23 Jul 2018 | Kuang-Huei Lee, Xi Chen, Gang Hua, Houdong Hu, and Xiaodong He
This paper addresses image-text matching: inferring the latent semantic alignment between objects or other salient regions in an image and the corresponding words in a sentence. The authors propose Stacked Cross Attention (SCAN), a mechanism that discovers the full set of latent alignments by using both image regions and words as context when computing image-text similarity. SCAN achieves state-of-the-art results on the MS-COCO and Flickr30K datasets, outperforming prior methods by significant margins on both text retrieval with image queries and image retrieval with text queries. Image regions are detected and encoded with bottom-up attention, and a bi-directional GRU maps each word, together with its sentence context, into a shared embedding space with the regions. The paper includes extensive experiments, ablation studies, and visualizations demonstrating the effectiveness and interpretability of the method.
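The core idea of using words as context for each image region can be illustrated with a small, dependency-free sketch. It assumes region and word vectors already live in the shared embedding space (in the paper they come from bottom-up attention features and a bi-directional GRU; here they are hand-written toy vectors). Each region attends over the words, the attended sentence vector is compared with the region, and the per-region scores are pooled. The specific choices below (clipping negative similarities, normalizing each word's similarities over regions, the inverse temperature `lambda1 = 9.0`, and average pooling) are assumptions of this sketch, not details taken from the summary above; consult the paper for the exact formulation.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def image_to_text_similarity(regions: List[Vector], words: List[Vector],
                             lambda1: float = 9.0, eps: float = 1e-8) -> float:
    """Score an (image, sentence) pair by letting each region attend over the words.

    Assumed details (illustration only): negative similarities are clipped,
    each word's column is L2-normalized over regions, lambda1 acts as an
    inverse softmax temperature, and per-region relevance scores are averaged.
    """
    if not regions or not words:
        raise ValueError("need at least one region and one word")
    # s[i][j]: cosine similarity between region i and word j, negatives clipped to 0
    s = [[max(cosine(v, e), 0.0) for e in words] for v in regions]
    # Normalize each word's column of similarities over all regions
    col_norms = [math.sqrt(sum(s[i][j] ** 2 for i in range(len(regions)))) + eps
                 for j in range(len(words))]
    relevances = []
    for i, v in enumerate(regions):
        # Attention weights of region i over the words
        alpha = softmax([lambda1 * s[i][j] / col_norms[j] for j in range(len(words))])
        # Attended sentence vector for region i: weighted sum of word vectors
        attended = [sum(alpha[j] * words[j][d] for j in range(len(words)))
                    for d in range(len(v))]
        # Relevance of region i to the sentence
        relevances.append(cosine(v, attended))
    # Average pooling over regions gives the image-text similarity
    return sum(relevances) / len(relevances)


image = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]      # two toy region features
matching = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # word features aligned with the regions
unrelated = [[0.0, 0.0, 1.0]]                   # a word orthogonal to both regions
print(image_to_text_similarity(image, matching) > image_to_text_similarity(image, unrelated))  # prints True
```

Because every region gets its own attended view of the sentence, a caption scores highly only if each region finds words that describe it, which is what lets the score reflect fine-grained alignments rather than a single global comparison.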