Demonstrating the Efficacy of Kolmogorov-Arnold Networks in Vision Tasks

21 Jun 2024 | Minjong Cheon
This preprint studies the effectiveness of Kolmogorov-Arnold Networks (KANs) in vision tasks. It demonstrates that KAN-Mixer, a KAN-based architecture, performs well on image classification. KAN-Mixer is evaluated on the MNIST, CIFAR10, and CIFAR100 datasets using a batch size of 32.

The KAN-Mixer architecture processes an image by first dividing it into patches and then applying KAN layers to the patches. These layers perform both token mixing and channel mixing, allowing the model to capture spatial and channel dependencies. The final output is obtained by aggregating the transformed patches and applying a KANLinear layer. The study also investigates the parameters of the KAN layer and finds that n_channels=64 and n_hiddens=128 provide the best balance of performance and resource utilization.

KAN-Mixer achieves competitive performance, particularly on MNIST, where it reaches a test accuracy of 98.16%. On CIFAR10 and CIFAR100 it outperforms the original MLP-Mixer but performs slightly worse than ResNet-18.

The study makes three main contributions: (1) demonstrating the efficiency of KAN-based algorithms for visual tasks, (2) providing extensive empirical assessments across various vision benchmarks, and (3) pioneering the use of natural KAN layers in visual tasks. The findings suggest that KANs have significant potential for vision-related tasks and that further tuning and architectural modifications could improve their performance on more complex datasets.
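To make the patch, token-mixing, channel-mixing, and KAN-head pipeline described in the summary concrete, here is a minimal forward-pass sketch in NumPy. It is not the author's implementation. The names `KANLayer`, `MixerBlock`, and `KANMixerSketch` are hypothetical, and several details are assumptions that the summary does not state: a patch size of 4, two mixer blocks, a linear patch embedding, layer normalization with residual connections (borrowed from MLP-Mixer), global average pooling as the aggregation step, `n_hiddens` used as the hidden width of both mixing stages, and Gaussian bumps standing in for the B-spline basis commonly used in KAN implementations. Weights are random, so only shapes and data flow are meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)


class KANLayer:
    """KAN-style layer: y_j = sum_i phi_ij(x_i), each phi_ij a learnable 1-D function.

    Assumption: phi_ij is a weighted sum of Gaussian bumps on a fixed grid,
    a simple stand-in for a B-spline parameterization.
    """

    def __init__(self, in_dim, out_dim, n_basis=8):
        self.centers = np.linspace(-2.0, 2.0, n_basis)
        self.width = self.centers[1] - self.centers[0]
        # coef[i, k, j]: weight of basis function k on the edge from input i to output j
        self.coef = rng.normal(0.0, 0.1, size=(in_dim, n_basis, out_dim))

    def __call__(self, x):
        # x: (..., in_dim) -> basis activations: (..., in_dim, n_basis)
        basis = np.exp(-(((x[..., None] - self.centers) / self.width) ** 2))
        # Sum over inputs i and basis functions k to get (..., out_dim).
        return np.einsum("...ik,ikj->...j", basis, self.coef)


def layer_norm(x, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def patchify(img, patch):
    """Split an (H, W, C) image into flattened non-overlapping patches."""
    h, w, c = img.shape
    x = img.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, patch * patch * c)  # (n_patches, patch*patch*C)


class MixerBlock:
    def __init__(self, n_patches, n_channels, n_hiddens):
        self.tok1 = KANLayer(n_patches, n_hiddens)
        self.tok2 = KANLayer(n_hiddens, n_patches)
        self.ch1 = KANLayer(n_channels, n_hiddens)
        self.ch2 = KANLayer(n_hiddens, n_channels)

    def __call__(self, x):
        # x: (n_patches, n_channels)
        # Token mixing: transpose so the KAN layers act across patches.
        y = layer_norm(x).T  # (n_channels, n_patches)
        x = x + self.tok2(self.tok1(y)).T
        # Channel mixing: KAN layers act across channels within each patch.
        y = layer_norm(x)
        return x + self.ch2(self.ch1(y))


class KANMixerSketch:
    def __init__(self, image_size=32, in_ch=3, patch=4, n_channels=64,
                 n_hiddens=128, n_blocks=2, n_classes=10):
        self.patch = patch
        n_patches = (image_size // patch) ** 2
        # Assumed linear patch embedding into n_channels features.
        self.embed = rng.normal(0.0, 0.1, size=(patch * patch * in_ch, n_channels))
        self.blocks = [MixerBlock(n_patches, n_channels, n_hiddens)
                       for _ in range(n_blocks)]
        self.head = KANLayer(n_channels, n_classes)  # plays the role of the final KAN layer

    def __call__(self, img):
        x = patchify(img, self.patch) @ self.embed  # (n_patches, n_channels)
        for block in self.blocks:
            x = block(x)
        pooled = layer_norm(x).mean(axis=0)  # aggregate patches by averaging
        return self.head(pooled)  # (n_classes,) class scores


model = KANMixerSketch()  # CIFAR-sized input with n_channels=64, n_hiddens=128
logits = model(rng.normal(size=(32, 32, 3)))
print(logits.shape)
```

With a 32x32 input and the assumed patch size of 4, the image becomes 64 patches, so token mixing maps 64 patch positions to 128 hidden units and back, while channel mixing maps 64 channels to 128 hidden units and back. No activation function sits between the paired KAN layers here because each edge function is already nonlinear.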