Kolmogorov-Arnold Networks (KANs) - What are they and how do they work?
Created on 2025-04-13T15:10:09-05:00
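KANs swap the roles of activation functions and weights. In a standard network, each connection carries a learned scalar weight and each node applies a fixed activation function. In a KAN, each connection carries its own learned activation *function* of a single input variable, and each node's output is simply the sum of its incoming activation outputs.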
Thus what is learned is the set of functions applied to each variable before summing.
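A minimal sketch of the contrast, in plain Python. The names `mlp_layer`, `kan_layer`, and `edge_functions` are made up for illustration, and the edge functions are hand-picked stand-ins; in a real KAN they would be learned.

```python
import math

def mlp_layer(x, weights, activation):
    """Standard layer: learned scalar weights, one fixed activation per node."""
    return [activation(sum(w * xi for w, xi in zip(row, x))) for row in weights]

def kan_layer(x, edge_functions):
    """KAN-style layer: one function per (output, input) edge, then a plain sum."""
    return [sum(phi(xi) for phi, xi in zip(row, x)) for row in edge_functions]

x = [0.5, -1.0]

# Hypothetical fixed edge functions standing in for learned ones.
# Row j holds the functions feeding output j, one per input variable.
edge_functions = [
    [math.sin, abs],
    [lambda v: v * v, math.tanh],
]

y = kan_layer(x, edge_functions)       # y[0] = sin(x0) + |x1|, y[1] = x0^2 + tanh(x1)
z = mlp_layer(x, [[1.0, 2.0]], math.tanh)  # z[0] = tanh(1.0*x0 + 2.0*x1)
```

Note that `kan_layer` has no weights and no activation after the sum; all of the layer's behavior lives in the per-edge functions.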
Learning arbitrary functions directly is hard (or else we'd already have smart machines), but functions can be approximated with B-splines. The coefficients of a B-spline can be learned with gradient descent (and other techniques). Thus each connection's scalar weight is replaced by a spline, and the spline coefficients are the parameters that get learned.
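To make "a spline learned by gradient descent" concrete, here is a rough sketch that fits a single spline to `sin` on [0, pi]. It assumes degree-1 B-splines (hat functions) on uniform knots for brevity, and the helper names, learning rate, and step count are illustrative choices, not taken from any KAN implementation.

```python
import math

def hat_basis(x, knots):
    """Degree-1 B-spline (hat) basis values at x, for uniformly spaced knots."""
    h = knots[1] - knots[0]
    return [max(0.0, 1.0 - abs(x - k) / h) for k in knots]

def spline(x, coeffs, knots):
    """Spline value: a weighted sum of basis functions; coeffs are the learnable part."""
    return sum(c * b for c, b in zip(coeffs, hat_basis(x, knots)))

def fit_spline(target, knots, xs, lr=2.0, steps=500):
    """Plain gradient descent on mean squared error over the sample points xs."""
    coeffs = [0.0] * len(knots)
    for _ in range(steps):
        grads = [0.0] * len(knots)
        for x in xs:
            basis = hat_basis(x, knots)
            err = sum(c * b for c, b in zip(coeffs, basis)) - target(x)
            for i, b in enumerate(basis):
                grads[i] += 2.0 * err * b / len(xs)  # d(MSE)/d(coeff_i)
        coeffs = [c - lr * g for c, g in zip(coeffs, grads)]
    return coeffs

knots = [i * math.pi / 8 for i in range(9)]      # 9 knots on [0, pi]
xs = [i * math.pi / 100 for i in range(101)]     # training sample points
coeffs = fit_spline(math.sin, knots, xs)
max_err = max(abs(spline(x, coeffs, knots) - math.sin(x)) for x in xs)
```

Because the spline is linear in its coefficients, the loss is a simple quadratic and gradient descent behaves well here. In a KAN, one such spline would sit on every connection, with all coefficients trained jointly by backpropagation.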