SVMs are a nice technique, but they’re slow to train ($O(N^3)$) and often give worse classification results than techniques such as radial basis function networks or even bagged / boosted kNN, Bayes, and/or decision tree models. So why are they so popular?
Simplifying the closed form of linear regression
Here’s one that must already exist:
Linear regression is given by the closed matrix form:
$\theta = (X^T X)^{-1} X^T Y$.
We have a rule we can apply here: $(AB)^{-1} = B^{-1} A^{-1}$
Which gives us: $\theta = X^{-1} (X^T)^{-1} X^T Y$.
But the transpose and its inverse cancel, yielding the identity matrix when multiplied.
This leaves us with:
$\theta = X^{-1} Y$.
Now, surely there must be some reason that it’s not taught this way. Is it because this requires X to be a square matrix? Can we use a pseudoinverse instead to solve this?
Update: Indeed we can. Now I understand why it’s presented that way; the pseudoinverse is given by $(X^T X)^{-1} X^T$ (at least when $X$ has full column rank)! Therefore, we can also just say the optimal parameters are given by $X^+ Y$, where $X^+$ is the pseudoinverse.
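As a quick sanity check of the update, here is a minimal NumPy sketch. The toy data, the seed, and names such as `theta_normal` are made up for illustration, and it assumes X has more rows than columns and full column rank; under those assumptions the normal-equation solution and the pseudoinverse solution should agree to floating-point precision.

```python
import numpy as np

# Hypothetical toy data: 20 samples, 3 features, known weights plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
true_theta = np.array([1.0, -2.0, 0.5])
Y = X @ true_theta + 0.1 * rng.normal(size=20)

# Closed form from the normal equations: theta = (X^T X)^{-1} X^T Y.
# Solving the linear system avoids forming the inverse explicitly.
theta_normal = np.linalg.solve(X.T @ X, X.T @ Y)

# Same parameters via the Moore-Penrose pseudoinverse: theta = X^+ Y.
X_pinv = np.linalg.pinv(X)
theta_pinv = X_pinv @ Y

# For full-column-rank X, the pseudoinverse should match (X^T X)^{-1} X^T.
explicit_pinv = np.linalg.inv(X.T @ X) @ X.T

print(np.allclose(theta_normal, theta_pinv))
print(np.allclose(X_pinv, explicit_pinv))
```

Both checks should come out true here. Note that X is 20 by 3 in this sketch, so the tempting shortcut of inverting X directly is not available: `np.linalg.inv(X)` raises an error for a non-square matrix, which is exactly why the pseudoinverse form is the one that gets taught.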
“Reinvention is talent crying out for background”, I suppose.
Reading and another Einstein quote
Here’s another one that I think is good advice for scientists:
“Reading, after a certain age, diverts the mind too much from its creative pursuits. Any man who reads too much and uses his own brain too little falls into lazy habits of thinking.”
–Albert Einstein
The idea is one I’ve long held, though it flies in the face of conventional ideas: reading is useful only for learning what other people have thought. After a while, however, you need to move on to creating your own thoughts, using other people’s thoughts only as stepping stones – if at all. There’s no novelty to be found in the thoughts of others – they’re better at thinking in their own ways than you are.
Concentrating too much on the past will only prevent you from expending the effort in more useful ways.
Dissertation – Week 7
I’m not going to write any new content this week. Rather, I’m going to polish the existing 50 pages so I can send them to my advisor.
A grad student, but… not.
Most of the grad students in Facebook groups appear to be working far more hours a week than I am, and for much longer spans of time than I have been, and yet appear to be making far less progress (some people are in their fourth year, work 60+ hours a week, and haven’t even started their dissertations!).
I generally write about 3 hours per day, four days a week. That’s 12 hours. My class adds another 3, plus 9 for the assignments, so that brings me up to 24. Working on other projects generally takes far less time than the dissertation; perhaps another 5 or 6 hours a week. That makes about 30.
And that’s it, unless you count commuting back and forth, which adds 3-12 more hours each week; that is one reason why I try to work remotely and minimize the number of days I need to be in the lab.
I still feel like my life as a graduate student is very much out of place. I easily managed to get an MS in one year simply by taking four courses a semester – my undergraduate workload was heavier (but still left me enough free time that I had to wedge three jobs into my week to avoid becoming bored). On the dissertation, it’s only been 6 weeks since I started and I’m already 1/3 of the way through (granted, the easiest third)… and it’s still only the beginning of my second year, when no one else in the department has even come up with a topic!
I’m also the youngest student there by at least five years. This isn’t speculation – there was a meeting where all of the graduate students mentioned their ages.
Some of it is the fellowship. Some of it is motivation: the drive to recover my personal autonomy so I can again pursue great things. Some of it is probably the fact that I follow what I think I’ve shown to be a much more powerful personal philosophy for acquiring knowledge than the one typical students follow (sacrifice breadth and you sacrifice your very creativity – don’t hyperspecialize). That doesn’t account for everything, however – the rest is probably the school.
Dissertation – Week 6
The “10 pages per week” goal I’ve been setting for myself has generally worked up until this point (save for one week, when I simply had too heavy an assignment for machine learning to work on my dissertation), but it’s going to fall apart very soon.
At page 50, I’ve reported all of the experiments we’ve done so far. I’ve even made up a lot of figures and sprinkled them throughout the paper. I am now idle. It’s bad when you think you can fill another page or two just by writing the acknowledgments early. I simply can’t cram anything more into that dissertation until we do more experiments. However, we can’t do more experiments until the next meeting with the CMU people.
If this continues every week, I am going to slip behind my schedule fast, since I’m working on an entirely different timeline than the rest of my group. While I can push myself as hard as I’d like, it’s unfair to my group to force them to move at a breakneck pace because I want to finish quickly; it’s not like they’re slacking off. Anyway, I don’t want such a disparity to occur, so I need to do something. Perhaps taking on more work myself is a good idea, since what I have thus far for my dissertation isn’t particularly challenging or time-consuming, not to mention that the rest of the group would probably be happy to offload more work onto my shoulders. Once I finish machine learning, I’m going to have a lot of time as well, as it will essentially start the winter break that never ends 🙂
I’m wondering if I can do some experiments of my own and put them in the paper. It is supposed to be my own original work, after all, and I do have some good ideas which I’m fairly sure are original, having done a fairly extensive background search before and during writing.
Every conference paper I’m writing now (and there are a lot!) has some relevance as well; I can probably find a way to include all of them in my dissertation as experiments without losing the focus on tensors and medical imaging that I’ve established.
I need to polish some things up and then I’ll send a draft to my advisor. Yes, the first draft I’m sending for review is going to be 50 pages long. Hopefully this won’t end up with me rewriting what I expect to be 1/3 of my dissertation 🙂
What does "best paper" mean anyway?
I find it odd that research that contains blatantly incorrect material can win a best paper award, but I just stumbled upon such a paper in the course of my dissertation research. Just more evidence that we really can’t estimate the worth of ideas, I suppose.
Dissertation – Week 5
This week marks the completion and reporting of the first experiment. The background section is complete now, though I will likely make some further revisions here and there. From now on, maintaining my pace will become difficult because I cannot rely on other people’s knowledge to fill my thesis; every gain I make will henceforth come from my own labor. Because of this, I may be forced to conform somewhat to the timetable my group chooses to set out. I can complete at least one experiment a week (once my class ends, anyway; the assignment this week is twice as long as usual), but I cannot simply forge ahead or I will end up walking over someone else’s work.
Multi-Index Notation
My vector idea for notation has indeed been formalized already (which is good, because this way I don’t need to make up my own notation): it’s known as “multi-index notation”. I deviate slightly from it in that I make the vector nature of the indices explicit by putting an arrow over them, but it otherwise appears to be the same.
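For reference, a short sketch of the standard multi-index conventions, written with the arrow over the index as described above. The symbols $\vec{\alpha}$, $x$, and $n$ are placeholders chosen for illustration rather than the notation used in the dissertation itself.

```latex
\begin{align*}
\vec{\alpha} &= (\alpha_1, \alpha_2, \ldots, \alpha_n), \qquad \alpha_i \in \{0, 1, 2, \ldots\} \\
|\vec{\alpha}| &= \alpha_1 + \alpha_2 + \cdots + \alpha_n \\
\vec{\alpha}! &= \alpha_1! \, \alpha_2! \cdots \alpha_n! \\
x^{\vec{\alpha}} &= x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n}
\end{align*}
```

The appeal is that a single symbol stands in for a whole tuple of indices, so sums and products over many indices collapse into one-index expressions.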
I am perhaps the worst person to ask to review papers
Because of my unique “panidealist” philosophy, I am perhaps the worst person to ask to review papers. If we cannot assess the full impact of an idea (an axiom in the philosophy), something would have to be horribly wrong for me to reject a paper, because doing so may very well impose my ignorance upon others.
I hate the way the scientific establishment works. No small group of people should act as gatekeepers. Just let the ideas go before the community as a whole and let them choose which to work with. If you are too worried about massive amounts of quackery, let them “digg” or “bury” papers and sort by number of “diggs” / citations. It will work.