ReLU-KAN: New Kolmogorov-Arnold Networks that Only Need Matrix Addition, Dot Multiplication, and ReLU


12 Aug 2024 | Qi Qiu, Tao Zhu, Helin Gong, Liming Chen, Huansheng Ning
This paper introduces ReLU-KAN, a novel architecture that enhances the performance of Kolmogorov-Arnold Networks (KANs) by replacing the original basis function (B-spline) with a new one that is better suited to parallel computation. The proposed basis function, composed solely of matrix addition, element-wise (dot) multiplication, and ReLU activation, enables efficient GPU parallelization. Unlike static B-splines, the new basis function incorporates two trainable parameters, allowing it to adapt its shape and position to the specific fitting task. This adaptability gives ReLU-KAN a significant advantage in modeling complex functions. Experimental results on a four-layer network show a 20-fold speedup in backpropagation and a two-to-three-orders-of-magnitude improvement in accuracy compared to the original KAN. Notably, ReLU-KAN preserves the original model's ability to avoid catastrophic forgetting. The paper also discusses the simplification of the basis function, the efficient matrix-based operations for GPU acceleration, and a convolutional implementation that integrates seamlessly with existing deep learning frameworks. Future work will explore the application of ReLU-KAN to more complex tasks and its potential combination with other neural network architectures.
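To make the construction concrete, below is a minimal sketch (not the authors' reference code) of a ReLU-KAN-style layer in PyTorch. The basis used here is the square of the product of two ReLUs, scaled so its peak is 1, with trainable interval endpoints; the class name `ReLUKANLayer`, the grid initialization, and the final linear mixing step are illustrative assumptions. The paper describes the mixing as a convolution for compatibility with existing frameworks; a plain linear map is used here to keep the sketch short.

```python
import torch
import torch.nn as nn


class ReLUKANLayer(nn.Module):
    """Sketch of a ReLU-KAN-style layer.

    Each univariate basis function has the bump-like form
        r_i(x) = 16 / (e_i - s_i)**4 * (relu(e_i - x) * relu(x - s_i))**2,
    supported on [s_i, e_i] with peak value 1. The endpoints s_i, e_i are
    kept as trainable parameters (an assumption matching the summary's
    "two trainable parameters" that let each basis shift and rescale).
    """

    def __init__(self, in_dim: int, out_dim: int, grid: int = 5, k: int = 3):
        super().__init__()
        n_basis = grid + k
        # Overlapping intervals covering roughly [0, 1], one set per input.
        s = torch.arange(-k, grid, dtype=torch.float32) / grid        # starts
        e = torch.arange(1, grid + k + 1, dtype=torch.float32) / grid  # ends
        self.s = nn.Parameter(s.repeat(in_dim, 1))   # (in_dim, n_basis)
        self.e = nn.Parameter(e.repeat(in_dim, 1))   # (in_dim, n_basis)
        # Mixing basis responses into outputs is a plain matrix multiply,
        # so the whole layer reduces to matmuls, element-wise products, ReLU.
        self.mix = nn.Linear(in_dim * n_basis, out_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim) -> broadcast against (in_dim, n_basis)
        x = x.unsqueeze(-1)
        left = torch.relu(self.e - x)    # ReLU(e_i - x)
        right = torch.relu(x - self.s)   # ReLU(x - s_i)
        width = (self.e - self.s).clamp_min(1e-6)
        basis = (left * right) ** 2 * (16.0 / width ** 4)
        return self.mix(basis.flatten(1))


# Toy usage: fit a 2-input target with a small two-layer ReLU-KAN stack.
if __name__ == "__main__":
    model = nn.Sequential(ReLUKANLayer(2, 8), ReLUKANLayer(8, 1))
    x = torch.rand(64, 2)
    y = torch.sin(torch.pi * x[:, :1]) * x[:, 1:]
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()  # gradients flow through only ReLU, products, and matmuls
```

Because every operation in the forward pass is an element-wise ReLU, an element-wise product, or a matrix multiplication, the layer parallelizes well on GPUs without the recursive B-spline evaluation that slows down the original KAN.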