IFViT: Interpretable Fixed-Length Representation for Fingerprint Matching via Vision Transformer

12 Apr 2024 | Yuhang Qiu, Honghui Chen, Xingbo Dong, Zheng Lin, Iman Yi Liao, Member, IEEE, Massimo Tistarelli, Senior Member, IEEE, Zhe Jin, Member, IEEE
The paper introduces IFViT (Interpretable Fixed-length Representation for Fingerprint Matching via Vision Transformer), a multi-stage fingerprint matching network designed to make matching more interpretable while maintaining high performance. The network consists of two primary modules: an interpretable dense registration module and an interpretable fixed-length representation extraction and matching module. The dense registration module uses a Vision Transformer (ViT)-based Siamese network to capture long-range dependencies and global context in fingerprint pairs, and it outputs dense pixel-wise correspondences between feature points that are then used for alignment.
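To make the alignment step concrete, the following is a minimal sketch of how matched points could be turned into an alignment. It fits a 2D rigid transform (rotation plus translation) to point pairs by least squares. The rigid model, the function names `estimate_rigid_transform` and `apply_rigid_transform`, and the point format are assumptions for illustration only; this summary does not state which transformation model or estimator IFViT uses.

```python
import math


def estimate_rigid_transform(src, dst):
    """Least-squares 2D rotation + translation mapping src onto dst.

    src, dst: equal-length lists of (x, y) matched points. These are
    hypothetical stand-ins for dense correspondences between two prints.
    Returns (theta, tx, ty) such that dst ~= R(theta) * src + (tx, ty).
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matched point pairs")
    n = len(src)
    # Centroids of both point sets.
    cx_s = sum(p[0] for p in src) / n
    cy_s = sum(p[1] for p in src) / n
    cx_d = sum(p[0] for p in dst) / n
    cy_d = sum(p[1] for p in dst) / n
    # Accumulate dot and cross products of the centered pairs.
    dot = 0.0
    cross = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        ax, ay = xs - cx_s, ys - cy_s
        bx, by = xd - cx_d, yd - cy_d
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    # Closed-form optimal rotation angle for the 2D case.
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the target centroid.
    tx = cx_d - (c * cx_s - s * cy_s)
    ty = cy_d - (s * cx_s + c * cy_s)
    return theta, tx, ty


def apply_rigid_transform(points, theta, tx, ty):
    """Rotate points by theta (radians), then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]
```

In practice, learned correspondences contain outliers, so a robust estimator or a non-rigid model may be more appropriate than this plain least-squares fit.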
The fixed-length representation extraction and matching module combines local and global representations to produce discriminative fixed-length representations together with interpretable dense pixel-wise correspondences, which makes the matching process easier to interpret. Extensive experiments on various fingerprint datasets demonstrate that IFViT achieves superior performance in dense registration and matching and significantly improves the interpretability of fingerprint matching based on deep fixed-length representations.
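As an illustration of why fixed-length representations are convenient for matching, the sketch below compares two fingerprints with a single vector similarity. It assumes that the global and local representations are fused by concatenating their L2-normalized vectors and that the score is cosine similarity with a decision threshold. The fusion rule, the metric, the threshold value, and all function names are hypothetical; the summary does not specify them.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vec]


def fuse_representations(global_vec, local_vec):
    """Concatenate normalized global and local parts (assumed fusion rule)."""
    return l2_normalize(global_vec) + l2_normalize(local_vec)


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("representations must have the same length")
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def is_match(similarity, threshold=0.5):
    """Accept the pair when the score reaches the (assumed) threshold."""
    return similarity >= threshold
```

Because every fingerprint maps to a vector of the same length, comparing a probe against a gallery reduces to one similarity computation per enrolled template, with no per-pair minutiae pairing at match time.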