It seems plausible to build clustering algorithms on gravity or the Coulomb force, with each point’s weight acting as its mass or charge. A “cluster” then becomes the resulting “solar system”. For example, if we represented every object in the solar system by its mass and position, the model would label them all as one cluster (“Sol”).
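Below is a minimal sketch of the gravity version. Everything beyond “points attract in proportion to their weights” is an assumption made for illustration. `gravity_cluster` and all of its parameters are hypothetical. Bodies move with velocity proportional to the softened gravitational field (overdamped motion, so they fall together rather than orbit). Each move is capped so two bodies cannot skip past each other between checks. Bodies closer than `merge_radius` fuse at their center of mass. Points that end up inside the same body share a label.

```python
import numpy as np

def gravity_cluster(points, masses=None, steps=200, dt=0.01, max_step=0.01,
                    merge_radius=0.05, softening=0.01):
    """Hypothetical gravity-based clustering: bodies fall together and merge.

    Assumption: overdamped motion (velocity = dt * gravitational field),
    so bodies collapse instead of orbiting each other.
    """
    pos = np.asarray(points, dtype=float).copy()
    n = len(pos)
    mass = np.ones(n) if masses is None else np.asarray(masses, dtype=float).copy()
    members = [[i] for i in range(n)]  # original point indices carried by each body

    for _ in range(steps):
        # Fuse any pair of bodies closer than merge_radius, one pair at a time.
        merged = True
        while merged:
            merged = False
            for i in range(len(pos)):
                for j in range(i + 1, len(pos)):
                    if np.linalg.norm(pos[i] - pos[j]) < merge_radius:
                        total = mass[i] + mass[j]
                        pos[i] = (mass[i] * pos[i] + mass[j] * pos[j]) / total  # center of mass
                        mass[i] = total
                        members[i] += members[j]
                        pos = np.delete(pos, j, axis=0)
                        mass = np.delete(mass, j)
                        del members[j]
                        merged = True
                        break
                if merged:
                    break
        if len(pos) == 1:
            break

        # Softened field at body i: sum_j m_j * d_ij / (|d_ij|^2 + eps^2)^1.5
        diff = pos[None, :, :] - pos[:, None, :]  # diff[i, j] = pos[j] - pos[i]
        dist2 = np.sum(diff ** 2, axis=2) + softening ** 2
        np.fill_diagonal(dist2, np.inf)  # no self-attraction
        field = np.sum(mass[None, :, None] * diff / dist2[:, :, None] ** 1.5, axis=1)

        # Cap each move at max_step so bodies cannot jump over one another.
        step = dt * field
        norms = np.linalg.norm(step, axis=1, keepdims=True)
        pos += step * np.minimum(1.0, max_step / np.maximum(norms, 1e-300))

    labels = np.empty(n, dtype=int)
    for label, group in enumerate(members):
        labels[group] = label
    return labels

# Usage: two tight groups far apart.
pts = [(0, 0), (0.1, 0), (0, 0.1), (10, 10), (10.1, 10), (10, 10.1)]
print(gravity_cluster(pts))
```

The fixed step count matters. With purely attractive forces, everything would eventually collapse into one body. The number of steps (or some other stopping rule) therefore effectively sets the scale at which separate “solar systems” are still told apart.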
Another idea I’ve been toying with is to bring the concept of physical momentum to gradient descent (I don’t believe this is the same thing as the existing technique called “gradient descent with momentum”). The idea is to keep an “energy counter” that increases when the path slopes downward (in proportion to the gradient’s magnitude) and decreases when it slopes upward. The optimization then “rolls” down slopes and carries straight through small local minima, which tend to be pathological, instead of stopping in them. The result is wherever the rolling comes to a halt. (Never mind: this is in fact the same thing, or nearly so.)
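The note concludes that the idea is (nearly) the existing technique, so here is a minimal sketch of classical heavy-ball (Polyak) momentum beside plain gradient descent. The velocity `v` plays the role of the energy counter. It grows while the path runs downhill, shrinks while it runs uphill, and decays by the factor `beta`, which stands in for friction so that the rolling eventually halts. The tilted double-well function and all hyperparameter values are arbitrary choices for illustration.

```python
def f(x):
    """Tilted double well: shallow minimum near x = 0.96, deeper one near x = -1.04."""
    return (x ** 2 - 1) ** 2 + 0.3 * x

def grad(x):
    return 4 * x * (x ** 2 - 1) + 0.3

def gradient_descent(x, lr=0.01, steps=5000):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def heavy_ball(x, lr=0.01, beta=0.9, steps=5000):
    """Classical (Polyak) momentum; beta < 1 acts as friction."""
    v = 0.0
    for _ in range(steps):
        # Speed builds while moving downhill and is lost while moving uphill.
        v = beta * v - lr * grad(x)
        x += v
    return x

x_gd = gradient_descent(2.0)
x_hb = heavy_ball(2.0)
print(f"plain GD:   x = {x_gd:.3f}, f = {f(x_gd):.3f}")
print(f"heavy ball: x = {x_hb:.3f}, f = {f(x_hb):.3f}")
```

Starting from x = 2 with these settings, plain gradient descent stops in the shallow well near x ≈ 0.96. The heavy-ball run carries enough speed over the barrier near x ≈ 0.08 to settle in the deeper well near x ≈ -1.04. With enough friction (a small enough `beta`, down to `beta = 0`, which is plain gradient descent), it stops in the shallow well too.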
Of course, I still think it would be much faster, and possibly more accurate, to estimate where the minimum of the MSE curve lies from the points already evaluated, then move there and check.
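One standard instance of “estimate the minimum from what is already known, then move there and check” is successive parabolic interpolation. It fits a parabola through the three most recent (parameter, MSE) samples, jumps to its vertex, evaluates the curve there, and repeats. The sketch assumes a one-dimensional curve that is roughly parabolic near its minimum, and the function names are hypothetical. It has none of the safeguards (for example, against a downward-opening fit) that Brent’s method adds, as used by SciPy’s `minimize_scalar`.

```python
def parabolic_vertex(a, fa, b, fb, c, fc):
    """x-coordinate of the vertex of the parabola through (a, fa), (b, fb), (c, fc)."""
    num = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
    den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    if den == 0:
        return None  # collinear samples: no parabola to jump to
    return b - 0.5 * num / den

def successive_parabolic_min(f, a, b, c, max_iter=50, tol=1e-10):
    """Jump to the fitted vertex, evaluate there, keep the three newest samples."""
    pts = [(a, f(a)), (b, f(b)), (c, f(c))]
    for _ in range(max_iter):
        (x0, f0), (x1, f1), (x2, f2) = pts
        x_new = parabolic_vertex(x0, f0, x1, f1, x2, f2)
        if x_new is None:
            break
        if abs(x_new - x2) < tol:  # the model keeps pointing at the same spot
            return x_new
        pts = [pts[1], pts[2], (x_new, f(x_new))]
    return min(pts, key=lambda p: p[1])[0]  # fall back to the best sample seen

# Usage: a quadratic "MSE curve" with its minimum at x = 3.
print(successive_parabolic_min(lambda x: (x - 3) ** 2 + 1, 0.0, 1.0, 5.0))
```

On an exactly quadratic curve, the first jump lands on the true minimum, so the search needs only a couple of evaluations beyond the initial three. On other curves, convergence is fast near the minimum but not guaranteed from far away.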
Finally, another idea is to deform a surface so that, in each of several regions, it minimizes the local MSE over that region’s k nearest neighbors. I’m not sure whether this replicates the behavior of a kernel SVM. Because the constraints are local, however, it should probably run much faster than training a kernel SVM, whose quadratic program can take on the order of n³ time for n training points.
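What follows is one possible reading of the idea, sketched under stated assumptions. A handful of anchor points stand in for the “regions”. At each anchor, an independent linear least-squares model is fitted to that anchor’s k nearest training points, and each query uses the model of its nearest anchor. This is essentially piecewise local linear regression on ±1 labels, not an SVM. `fit_local_models`, `predict_local`, the nearest-anchor rule, and the XOR-style toy data are all hypothetical choices.

```python
import numpy as np

def fit_local_models(X, y, anchors, k):
    """For each anchor, fit y ~ w.x + b by least squares on its k nearest training points."""
    models = []
    for a in anchors:
        nearest = np.argsort(np.linalg.norm(X - a, axis=1))[:k]
        A = np.hstack([X[nearest], np.ones((k, 1))])  # design matrix with a bias column
        coef, *_ = np.linalg.lstsq(A, y[nearest], rcond=None)
        models.append(coef)
    return np.array(models)

def predict_local(Xq, anchors, models):
    """Evaluate each query point with the local model of its nearest anchor."""
    dists = np.linalg.norm(Xq[:, None, :] - anchors[None, :, :], axis=2)
    owner = np.argmin(dists, axis=1)
    A = np.hstack([Xq, np.ones((len(Xq), 1))])
    return np.sum(A * models[owner], axis=1)

# XOR-style toy data: four tight clusters, label = sign(x * y).
centers = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
offsets = np.array([[0, 0], [0.1, 0], [0, 0.1], [-0.1, 0], [0, -0.1]])
X = np.vstack([c + offsets for c in centers])
y = np.repeat(np.sign(centers[:, 0] * centers[:, 1]), len(offsets))

models = fit_local_models(X, y, anchors=centers, k=len(offsets))
local_pred = np.sign(predict_local(X, centers, models))

# For comparison: a single global linear least-squares fit on the same data.
A_all = np.hstack([X, np.ones((len(X), 1))])
global_coef, *_ = np.linalg.lstsq(A_all, y, rcond=None)
print("local fits correct:", np.mean(local_pred == y))
print("global coefficients:", global_coef)
```

Here the global fit comes out as essentially zero because the classes are arranged symmetrically, while the four local fits reproduce every training label. Each local fit is a least-squares solve over only k rows, which is where the hoped-for speed advantage would come from. Whether this matches a kernel SVM’s margin behavior is a separate question that this sketch does not answer.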