Understanding IFViT: Interpretable Fixed-Length Representation for Fingerprint Matching via Vision Transformer

IFViT: Interpretable Fixed-Length Representation for Fingerprint Matching via Vision Transformer

Yuhang Qiu, Honghui Chen, Xingbo Dong, Zheng Lin, Iman Yi Liao, Member, IEEE, Massimo Tistarelli, Senior Member, IEEE, Zhe Jin, Member, IEEE

Abstract—Determining dense feature points on fingerprints used in constructing deep fixed-length representations for accurate matching, particularly at the pixel level, is of significant interest. To explore the interpretability of fingerprint matching, we propose a multi-stage interpretable fingerprint matching network, namely Interpretable Fixed-length Representation for Fingerprint Matching via Vision Transformer (IFViT), which consists of two primary modules. The first module, an interpretable dense registration module, establishes a Vision Transformer (ViT)-based Siamese Network to capture long-range dependencies and the global context in fingerprint pairs. It provides interpretable dense pixel-wise correspondences of feature points for fingerprint alignment and enhances the interpretability in the subsequent matching stage. The second module takes into account both local and global representations of the aligned fingerprint pair to achieve interpretable fixed-length representation extraction and matching. It employs the ViTs trained in the first module with an additional fully connected layer and retrains them to simultaneously produce the discriminative fixed-length representation and interpretable dense pixel-wise correspondences of feature points. Extensive experimental results on diverse publicly available fingerprint databases demonstrate that the proposed framework not only exhibits superior performance on dense registration and matching but also significantly promotes the interpretability of deep fixed-length representation-based fingerprint matching.

Index Terms—Interpretable Fingerprint Recognition, Vision Transformers, Fingerprint Registration and Matching, Fixed-Length Fingerprint Representation

### I. INTRODUCTION

The fingerprint is an immutable and unique biological trait widely used for human authentication in various scenarios, including forensics, bank identification and physical access. As a crucial part of authentication, fingerprint matching compares an input fingerprint pattern with those stored in a database to determine whether they belong to the same finger. Minutiae, e.g., ridge endings and bifurcations, are commonly considered reliable features for the matching process. However, extracting minutiae can be challenging when fingerprint quality is low, for example when the skin is dry or wet. In contrast, deep learning-based approaches can extract discriminative fixed-length fingerprint representations and are considered a promising alternative that addresses the limitations of traditional minutiae-based matching methods. Despite significant progress, improving the interpretability of deep learning-based fingerprint matching is still in its infancy.

Machine Learning (ML) methods have achieved tremendous success in major fields thanks to their powerful inferential capabilities. Within ML, explainable Artificial Intelligence (XAI) is currently one of the key areas of focus. XAI aims to make the outcomes of artificial intelligence systems more comprehensible and transparent, in order to facilitate reliable real-world data-driven applications. Understanding the underlying reasons behind ML decision-making is crucial, particularly for black-box deep learning
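
To make the idea of matching with fixed-length representations concrete, the following is a minimal sketch, not the authors' implementation. It assumes each fingerprint has already been mapped to a vector of the same dimension and that two vectors are compared with cosine similarity against a decision threshold. The text above does not state the similarity measure or threshold, so cosine similarity, the value `0.5`, the toy 4-dimensional vectors, and the function names `l2_normalize`, `cosine_similarity` and `is_same_finger` are all illustrative assumptions.

```python
import math
from typing import Sequence


def l2_normalize(vec: Sequence[float]) -> list[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two representations of equal length."""
    if len(a) != len(b):
        raise ValueError("fixed-length representations must share one dimension")
    a_unit, b_unit = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_unit, b_unit))


def is_same_finger(a: Sequence[float], b: Sequence[float],
                   threshold: float = 0.5) -> bool:
    """Declare a match when similarity reaches the (assumed) threshold."""
    return cosine_similarity(a, b) >= threshold


# Toy embeddings invented for illustration; real representations would be
# produced by a trained network and have many more dimensions.
probe = [0.9, 0.1, 0.3, 0.2]
gallery_same = [0.8, 0.2, 0.35, 0.1]
gallery_other = [-0.1, 0.9, -0.4, 0.3]

print("probe vs. same finger: ", is_same_finger(probe, gallery_same))
print("probe vs. other finger:", is_same_finger(probe, gallery_other))
```

Because every representation has the same length, a comparison costs one dot product regardless of how many minutiae the fingerprint contains, which is one practical reason such representations are attractive for searching large databases.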

IFViT: Interpretable Fixed-Length Representation for Fingerprint Matching via Vision Transformer

12 Apr 2024 | Yuhang Qiu, Honghui Chen, Xingbo Dong, Zheng Lin, Iman Yi Liao, Member, IEEE, Massimo Tistarelli, Senior Member, IEEE, Zhe Jin, Member, IEEE