Dendritic Learning-Incorporated Vision Transformer for Image Recognition

February 2024 | Zhiming Zhang, Zhenyu Lei, Masaaki Omura, Hideyuki Hasegawa, Shangce Gao
This letter proposes an approach to image recognition that integrates a dendritic learning network with the Vision Transformer (ViT). The resulting model, the Dendritic Learning-Incorporated Vision Transformer (DVT), is designed to improve both the accuracy and the interpretability of image recognition. Inspired by the structure of dendritic neurons in neuroscience, DVT pairs ViT, which excels at capturing global context and long-range dependencies, with a biologically inspired classification network. This hybrid architecture is intended to improve the efficiency and performance of image recognition, particularly on complex tasks. The study reports that DVT outperforms existing state-of-the-art methods on three image recognition benchmarks. The dendritic network has three layers (synapse, dendrite, and soma) that classify the extracted features, and a feature normalization operation is added to stabilize learning and improve performance. Experimental results on datasets such as CIFAR10, SVHN, CIFAR100, and Tiny-ImageNet show that DVT achieves higher accuracy than the compared methods, especially as the classification difficulty increases. The method slices each input image into patches, applies a linear projection and position encoding, and passes the resulting features through multiple Transformer blocks. The self-attention mechanism lets the network focus on relevant information, while the dendritic network performs the final classification. An ablation study examines the architecture in terms of learnable parameters and computational efficiency. The results indicate that DVT is a promising approach for image recognition, offering improved accuracy and biological interpretability.
The study concludes that integrating dendritic networks with ViT can lead to more efficient and effective image recognition models, with potential applications in various computer vision tasks.
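
The patch pipeline mentioned in the summary (slice the image into patches, project each patch linearly, add a position encoding) can be illustrated with a minimal pure-Python sketch. The function names, the 8x8 toy image, the patch size of 4, and the embedding width of 16 are all assumptions chosen for illustration; none of them come from the letter, and a real ViT would use a tensor library and learned parameters.

```python
import random

def image_to_patches(image, patch):
    """Split an H x W x C nested-list image into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    assert height % patch == 0 and width % patch == 0, "image must tile evenly"
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            flat = []
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    flat.extend(image[r][c])  # append every channel value of this pixel
            patches.append(flat)
    return patches

def embed_patches(patches, weight, pos):
    """Project each flattened patch linearly, then add its position embedding."""
    tokens = []
    for idx, p in enumerate(patches):
        # weight has one row per output dimension; each row spans the flattened patch
        projected = [sum(x * w for x, w in zip(p, row)) for row in weight]
        tokens.append([v + e for v, e in zip(projected, pos[idx])])
    return tokens

# Toy sizes (hypothetical, for illustration only)
rng = random.Random(0)
H = 8
W = 8
C = 3
P = 4
D = 16

image = [[[rng.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]
patches = image_to_patches(image, P)

# Randomly initialized stand-ins for the learned projection and position table
weight = [[rng.gauss(0, 0.02) for _ in range(P * P * C)] for _ in range(D)]
pos = [[rng.gauss(0, 0.02) for _ in range(D)] for _ in range(len(patches))]

tokens = embed_patches(patches, weight, pos)
print(len(tokens), len(tokens[0]))  # 4 16
```

The resulting token list is what the Transformer blocks would consume; the sketch stops before self-attention.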
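
To make the synapse, dendrite, and soma layers more concrete, here is a minimal sketch of a dendritic classification head. This summary does not give the letter's equations, so the code assumes the commonly used dendritic neuron model formulation (sigmoid synapses, a product along each dendritic branch, a sum followed by a sigmoid at the soma), one neuron per class, and a zero-mean, unit-variance feature normalization. All names and sizes are hypothetical, and DVT's actual layer definitions may differ.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def normalize(features, eps=1e-6):
    """Zero-mean, unit-variance normalization of one feature vector (assumed form)."""
    mean = sum(features) / len(features)
    var = sum((f - mean) ** 2 for f in features) / len(features)
    return [(f - mean) / math.sqrt(var + eps) for f in features]

def dendritic_neuron(features, weights, thresholds, k=1.0, soma_threshold=0.5):
    """One neuron: synapse -> dendrite -> soma.

    weights[j][i] and thresholds[j][i] belong to input i on dendritic branch j.
    """
    branch_outputs = []
    for w_row, q_row in zip(weights, thresholds):
        # Synapse layer: each input passes through its own sigmoid connection
        synapses = [sigmoid(k * (w * x - q))
                    for x, w, q in zip(features, w_row, q_row)]
        # Dendrite layer: multiplicative interaction along the branch
        branch_outputs.append(math.prod(synapses))
    # Soma layer: sum the branches and squash to (0, 1)
    return sigmoid(k * (sum(branch_outputs) - soma_threshold))

# Toy sizes (hypothetical): 6 features, 3 branches per neuron, 4 classes
rng = random.Random(0)
n_features, n_branches, n_classes = 6, 3, 4

features = normalize([rng.random() for _ in range(n_features)])

# One (weights, thresholds) pair per class, randomly initialized instead of trained
params = [
    (
        [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_branches)],
        [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_branches)],
    )
    for _ in range(n_classes)
]

scores = [dendritic_neuron(features, w, q) for w, q in params]
predicted = max(range(n_classes), key=scores.__getitem__)
print(predicted, [round(s, 3) for s in scores])
```

In a full model the input features would come from the Transformer blocks and the parameters would be trained end to end; here they are random, so the predicted class carries no meaning.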