ReLU-KAN: New Kolmogorov-Arnold Networks that Only Need Matrix Addition, Dot Multiplication, and ReLU


12 Aug 2024 | Qi Qiu, Tao Zhu, Helin Gong, Liming Chen, Huansheng Ning
ReLU-KAN is a novel Kolmogorov-Arnold Network (KAN) that replaces the original B-spline basis function with a new one composed of matrix addition, dot multiplication, and ReLU activation. This new basis function enables efficient GPU parallelization and allows the model to dynamically adapt the shape and position of each basis function during training. The proposed ReLU-KAN architecture significantly improves training speed and fitting accuracy compared to the original KAN, achieving up to 20 times faster training and accuracy improvements of one to three orders of magnitude, while retaining KAN's ability to avoid catastrophic forgetting.

The basis function of ReLU-KAN is defined as $ R_{i}(x)=[\mathrm{ReLU}(e_{i}-x)\times\mathrm{ReLU}(x-s_{i})]^{2}\times 16/(e_{i}-s_{i})^{4} $, where $ e_{i} $ and $ s_{i} $ are trainable parameters, so each basis function can adapt its support to the fitting task at hand. The architecture is implemented entirely with matrix operations and convolution, which facilitates efficient GPU acceleration.

The paper presents the ReLU-KAN architecture, its implementation details, and experimental results comparing it with KAN in terms of training speed, fitting ability, and catastrophic forgetting. The experiments show that ReLU-KAN outperforms KAN in training speed, convergence stability, and fitting accuracy, particularly for larger networks, while maintaining resistance to catastrophic forgetting. The paper also discusses future research directions, including the application of ReLU-KAN to more complex tasks and its potential combination with other neural network architectures.
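To make the basis function concrete, here is a minimal PyTorch sketch of $ R_{i}(x) $ as described above. This is an illustrative reconstruction, not the authors' code: the class name `ReLUKANBasis`, the parameters `grid` and `k`, and the initial placement of $ s_{i} $ and $ e_{i} $ are assumptions, and the convolution that the paper uses to combine basis outputs into the next layer is omitted.

```python
# Sketch of the ReLU-KAN basis function (assumes PyTorch is available).
import torch
import torch.nn as nn

class ReLUKANBasis(nn.Module):
    """Evaluates R_i(x) = [ReLU(e_i - x) * ReLU(x - s_i)]^2 * 16 / (e_i - s_i)^4."""

    def __init__(self, in_features: int, grid: int = 5, k: int = 3):
        super().__init__()
        # s_i (start) and e_i (end) delimit each basis function's support.
        # They are trainable, so the basis can shift and rescale during
        # training, as the paper notes. Initialization here is a guess:
        # grid + k overlapping windows covering roughly [0, 1].
        s = torch.arange(-k, grid, dtype=torch.float32) / grid
        e = torch.arange(1, grid + k + 1, dtype=torch.float32) / grid
        self.s = nn.Parameter(s.repeat(in_features, 1))  # (in_features, grid + k)
        self.e = nn.Parameter(e.repeat(in_features, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_features) -> (batch, in_features, grid + k)
        x = x.unsqueeze(-1)
        r = torch.relu(self.e - x) * torch.relu(x - self.s)  # element-wise product
        return r * r * 16.0 / (self.e - self.s) ** 4         # square and normalize

# Usage: evaluate the basis on a batch of inputs with two features.
basis = ReLUKANBasis(in_features=2, grid=5, k=3)
y = basis(torch.rand(8, 2))  # shape (8, 2, 8)
```

Note that the normalization factor $ 16/(e_{i}-s_{i})^{4} $ makes each squared "bump" peak at 1 at the midpoint $ (s_{i}+e_{i})/2 $, and the whole computation reduces to the additions, element-wise products, and ReLU calls named in the title, which is what makes it GPU-friendly.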