This paper presents a method for learning a general similarity function that compares image patches directly from raw pixel data, without relying on manually designed features. The authors propose deep convolutional neural network (CNN) models to encode this similarity function, accounting for a wide range of appearance changes such as viewpoint, illumination, and occlusion. They explore several architectures, including 2-channel, Siamese, pseudo-Siamese, and spatial pyramid pooling (SPP) networks, and evaluate them on several benchmark datasets. The results show that the proposed approach significantly outperforms the state of the art, including manually designed descriptors such as SIFT as well as previously learned descriptors. The paper also highlights the importance of multi-resolution information and the efficiency of the resulting descriptors.
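To make the architectural contrast concrete, the sketch below shows the two key variants in PyTorch: a 2-channel network, which stacks both patches as input channels and compares them jointly from the first layer, and a Siamese network, which extracts a descriptor from each patch through weight-shared branches before a small decision network scores the pair. The layer widths, kernel sizes, and 64x64 patch size here are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class TwoChannelNet(nn.Module):
    """Illustrative 2-channel network: both grayscale patches are stacked
    as input channels, so the network compares them jointly from layer one."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 96, kernel_size=7, stride=3), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(96, 192, kernel_size=5), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(192, 256, kernel_size=3), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),  # raw similarity score
        )

    def forward(self, patch_a, patch_b):
        # Stack the two (N, 1, 64, 64) patches along the channel dim -> (N, 2, 64, 64).
        x = torch.cat([patch_a, patch_b], dim=1)
        return self.head(self.features(x))


class SiameseNet(nn.Module):
    """Illustrative Siamese network: a single weight-shared branch encodes
    each patch into a descriptor; a small decision network compares them."""

    def __init__(self):
        super().__init__()
        self.branch = nn.Sequential(  # shared between the two patches
            nn.Conv2d(1, 96, kernel_size=7, stride=3), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(96, 192, kernel_size=5), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(192, 256, kernel_size=3), nn.ReLU(),
            nn.Flatten(),  # 256-dim descriptor per patch
        )
        self.decision = nn.Sequential(
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, patch_a, patch_b):
        # Each patch passes through the same branch (shared weights).
        da, db = self.branch(patch_a), self.branch(patch_b)
        return self.decision(torch.cat([da, db], dim=1))


if __name__ == "__main__":
    a = torch.randn(8, 1, 64, 64)  # batch of 8 grayscale 64x64 patches
    b = torch.randn(8, 1, 64, 64)
    print(TwoChannelNet()(a, b).shape)  # torch.Size([8, 1])
    print(SiameseNet()(a, b).shape)     # torch.Size([8, 1])
```

The trade-off this sketch illustrates: the 2-channel design tends to be more accurate because the patches interact early, while the Siamese design lets each branch's output be cached as a standalone descriptor, which matters for matching efficiency at test time.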