15 Mar 2024 | Pingping Zhang, Yuhao Wang, Yang Liu, Zhengzheng Tu, Huchuan Lu
This paper proposes EDITOR, a novel feature learning framework for multi-modal object re-identification (ReID) that selects diverse tokens from vision Transformers. The key components of EDITOR are a shared vision Transformer for feature extraction, a Spatial-Frequency Token Selection (SFTS) module that adaptively selects object-centric tokens, and a Hierarchical Masked Aggregation (HMA) module that facilitates feature interactions across modalities. The framework also introduces two new loss functions, Background Consistency Constraint (BCC) and Object-Centric Feature Refinement (OCFR), to suppress background effects and improve feature discrimination. EDITOR is evaluated on three multi-modal ReID benchmarks, RGBNT201, RGBNT100, and MSVR310, where it outperforms existing methods in mAP and rank metrics, indicating its potential for practical applications in complex visual scenarios. Its ability to select diverse tokens and suppress background effects makes it a promising approach for multi-modal object ReID.
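To make the described pipeline concrete, below is a minimal, hypothetical sketch of the overall flow: a shared backbone produces patch tokens per modality, a token-selection step keeps object-centric tokens, and an aggregation step fuses the selected tokens into one identity feature. The summary does not specify how SFTS scores tokens or how HMA masks and aggregates, so the module internals, class names, and dimensions below are illustrative placeholders rather than the authors' implementation.

```python
# Hypothetical sketch of the EDITOR-style pipeline described above.
# All module internals are placeholders; only the overall flow
# (shared tokens -> token selection -> cross-modal aggregation) follows the summary.
import torch
import torch.nn as nn


class TopKTokenSelection(nn.Module):
    """Stand-in for SFTS: keep the k highest-scoring patch tokens."""

    def __init__(self, dim: int, keep: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # per-token relevance score (placeholder criterion)
        self.keep = keep

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_patches, dim)
        scores = self.score(tokens).squeeze(-1)               # (batch, num_patches)
        idx = scores.topk(self.keep, dim=1).indices           # indices of kept tokens
        idx = idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
        return tokens.gather(1, idx)                          # (batch, keep, dim)


class SimpleAggregation(nn.Module):
    """Stand-in for HMA: fuse selected tokens from all modalities."""

    def __init__(self, dim: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens_per_modality: list) -> torch.Tensor:
        fused = torch.cat(tokens_per_modality, dim=1)         # (batch, total_tokens, dim)
        fused = self.norm(fused + self.attn(fused, fused, fused)[0])
        return fused.mean(dim=1)                              # (batch, dim) identity feature


# Usage with dummy RGB / NIR / TIR patch tokens; the shared ViT backbone is
# omitted, and 196 patches of dimension 768 are assumed for illustration.
batch, patches, dim = 2, 196, 768
select = TopKTokenSelection(dim, keep=64)
aggregate = SimpleAggregation(dim)
modalities = [torch.randn(batch, patches, dim) for _ in range(3)]
feature = aggregate([select(m) for m in modalities])
print(feature.shape)  # torch.Size([2, 768])
```

In this sketch the fused feature would feed a ReID head trained with the identity losses plus the BCC and OCFR objectives mentioned above; those losses are not reproduced here since their formulations are not given in the summary.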