Here’s one that must already exist:
Linear regression is given by the closed-form matrix solution:
θ = (XᵀX)⁻¹XᵀY.
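As a quick sanity check, here is a minimal numpy sketch of that closed form; the data and the true coefficients (intercept 1, slope 2) are made up for illustration:

```python
import numpy as np

# Hypothetical data: y = 1 + 2x, fit with a bias column.
rng = np.random.default_rng(0)
x = rng.standard_normal(50)
X = np.column_stack([np.ones_like(x), x])  # 50x2 design matrix (tall, not square)
y = 1.0 + 2.0 * x

# Normal equation: theta = (X^T X)^{-1} X^T y,
# solved as a linear system rather than forming the inverse explicitly.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # close to [1.0, 2.0]
```

Solving the system with `np.linalg.solve` instead of computing `(XᵀX)⁻¹` directly is the standard numerically safer route.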
We have a rule we can apply here: (AB)⁻¹ = B⁻¹A⁻¹.
Which gives us: θ = X⁻¹(Xᵀ)⁻¹XᵀY.
But (Xᵀ)⁻¹Xᵀ is the identity matrix, so those two factors cancel.
This leaves us with:
θ = X⁻¹Y.
Now, surely there must be some reason it isn’t taught this way. Is it because the rule (AB)⁻¹ = B⁻¹A⁻¹ requires X and Xᵀ to be square and invertible, which they generally aren’t? Can we use a pseudoinverse instead to solve this?
Update: Indeed we can. Now I understand why it’s presented that way: when X has full column rank, the pseudoinverse is exactly (XᵀX)⁻¹Xᵀ! Therefore, we can also just say the optimal parameters are given by X⁺Y, where X⁺ is the Moore–Penrose pseudoinverse.
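A small numpy sketch confirming the two routes agree for a tall, full-column-rank X (the data here is random, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))  # tall matrix, full column rank with probability 1
y = rng.standard_normal(50)

# Pseudoinverse route: theta = X^+ y
theta_pinv = np.linalg.pinv(X) @ y

# Normal-equation route: theta = (X^T X)^{-1} X^T y
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose(theta_pinv, theta_ne))  # True when X has full column rank
```

When X does not have full column rank, XᵀX is singular and the normal-equation route breaks down, but X⁺ (computed via the SVD) still exists and picks out the minimum-norm least-squares solution.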
“Reinvention is talent crying out for background”, I suppose.