Kolmogorov-Arnold Networks (KANs) - What are they and how do they work?

Created on 2025-04-13T15:10:09-05:00

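Compared to a standard multilayer perceptron (MLP), a KAN swaps the roles of activation functions and weights. In an MLP, each connection carries a learned scalar weight and each node applies a fixed activation function. In a KAN, every variable is still sent through an activation function, but the activation *function* itself is learned, separately for each variable connection, while each node's output is always just the sum of its incoming activations.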

Thus what is learned is the set of functions that transform each variable before it is summed.
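For comparison, here is one layer of each architecture written out. The notation ($x_{l,i}$ for the $i$-th value in layer $l$, $n_l$ for the width of layer $l$, $\phi_{l,j,i}$ for the learned function on the connection from input $i$ to output $j$) is my own shorthand, loosely following the original KAN paper; treat it as a sketch rather than that paper's exact formulation.

```latex
% MLP layer: learned weights W_l and bias b_l on the connections,
% fixed activation \sigma applied at the nodes
x_{l+1} = \sigma\left( W_l \, x_l + b_l \right)

% KAN layer: a learned one-dimensional function \phi_{l,j,i} on every
% connection i -> j; each output node only sums its incoming activations
x_{l+1,j} = \sum_{i=1}^{n_l} \phi_{l,j,i}\left( x_{l,i} \right), \qquad j = 1, \dots, n_{l+1}

% Kolmogorov-Arnold representation of a continuous f on [0,1]^n:
% the same "apply 1-D functions, then sum" pattern, twice
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
```

The last line shows where the name comes from: two stacked KAN layers with widths $n \to 2n+1 \to 1$ have exactly the shape of the Kolmogorov-Arnold representation, with the inner functions $\phi_{q,p}$ in the first layer and the outer functions $\Phi_q$ in the second.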

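Learning arbitrary functions is hard (or else we'd already have smart machines), but a one-dimensional function can be approximated with a B-spline: a weighted sum of fixed, piecewise-polynomial basis functions. The weights of that sum (the spline coefficients) can be learned with gradient descent (and other techniques). Thus each connection's function is a spline, the spline coefficients take the place of the MLP's scalar weights, and those coefficients are what is learned.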
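To make this concrete, here is a minimal NumPy sketch of a single learnable connection function: it builds B-spline basis functions with the Cox-de Boor recursion and learns the spline coefficients by plain gradient descent on a mean squared error, fitting sin(x). The helper names (`bspline_basis`, `make_grid`, `fit_spline_gd`, `spline_eval`), the uniform knot grid extended past the data range, and the hyperparameters (8 intervals, cubic degree, learning rate 2.0, 5000 steps) are all assumptions for illustration, not taken from any KAN library.

```python
import numpy as np


def bspline_basis(x, knots, degree):
    """Evaluate every B-spline basis function B_{i,degree} at the points x.

    Cox-de Boor recursion. Assumes strictly increasing knots (no repeats),
    so no denominator is zero. Returns shape (len(x), len(knots) - degree - 1).
    """
    x = np.asarray(x, dtype=float)[:, None]  # column: one row per sample
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicator of each half-open knot interval [t_i, t_{i+1}).
    B = ((x >= t[:-1]) & (x < t[1:])).astype(float)
    for k in range(1, degree + 1):
        # Blend neighbouring degree-(k-1) functions into degree-k functions.
        left = (x - t[:-(k + 1)]) / (t[k:-1] - t[:-(k + 1)]) * B[:, :-1]
        right = (t[k + 1:] - x) / (t[k + 1:] - t[1:-k]) * B[:, 1:]
        B = left + right
    return B


def make_grid(a, b, n_intervals, degree):
    """Uniform knots on [a, b], extended by `degree` extra knots on each side.

    Assumption: a simple uniform grid chosen for this sketch, so that the
    basis functions sum to 1 everywhere on [a, b].
    """
    h = (b - a) / n_intervals
    return a + h * np.arange(-degree, n_intervals + degree + 1)


def fit_spline_gd(x, y, n_intervals=8, degree=3, lr=2.0, steps=5000):
    """Learn coefficients c so that sum_i c_i * B_i(x) approximates y,
    using plain gradient descent on the mean squared error."""
    knots = make_grid(x.min(), x.max(), n_intervals, degree)
    B = bspline_basis(x, knots, degree)  # fixed basis, shape (N, n_basis)
    coef = np.zeros(B.shape[1])  # the learnable parameters
    for _ in range(steps):
        residual = B @ coef - y
        grad = 2.0 * B.T @ residual / len(x)  # d(MSE)/d(coef)
        coef -= lr * grad
    return knots, coef


def spline_eval(x, knots, coef, degree=3):
    """Evaluate the learned spline sum_i coef_i * B_i(x)."""
    return bspline_basis(x, knots, degree) @ coef


x = np.linspace(-np.pi, np.pi, 200)
y = np.sin(x)
knots, coef = fit_spline_gd(x, y)
mse = np.mean((spline_eval(x, knots, coef) - y) ** 2)
print(f"learned {coef.size} coefficients, MSE = {mse:.2e}")
```

Because the spline's output is linear in its coefficients, the gradient is just `B.T @ residual` scaled by 2/N. In a full KAN, every connection would hold its own coefficient vector like `coef`, and an autodiff framework would also push gradients through the splines to earlier layers. The original KAN paper additionally adds a fixed SiLU "base" term to each connection function, which this sketch omits.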

Previous video on the Kolmogorov-Arnold theorem