01.09.10
Posted in Biology, Ideas, Research at 4:55 pm by Michael
Idea: a data classification metamodel based on the immune system. Train a small bag of classifiers and clone the ones that perform well, with a small chance of random mutation in the hyperparameters. Weight the classifiers created this way so that each weight decays exponentially with the number of iterations since that classifier's last correct classification. Keep a “memory threshold” below which the weight will not fall, in case that pattern is encountered again.
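A rough sketch of the two mechanisms, to make the idea concrete. Everything here is an assumption for illustration: the function names, the decay rate, the memory floor, the mutation rate, and the representation of a classifier as a plain dictionary of hyperparameters.

```python
import math
import random

def decayed_weight(iters_since_correct, decay=0.5, memory_floor=0.05):
    """Weight shrinks exponentially with the number of iterations since the
    last correct classification, but never drops below the memory floor."""
    return max(memory_floor, math.exp(-decay * iters_since_correct))

def clone_with_mutation(params, rng, mutation_rate=0.1, scale=0.2):
    """Copy a classifier's hyperparameters; each one is perturbed by up to
    +/- `scale` (relative) with probability `mutation_rate`."""
    child = dict(params)
    for name, value in child.items():
        if rng.random() < mutation_rate:
            child[name] = value * (1.0 + rng.uniform(-scale, scale))
    return child

def next_generation(pool, scores, rng, keep=2, clones_per_parent=2):
    """Keep the `keep` best-scoring classifiers and add mutated clones of each."""
    ranked = sorted(zip(scores, range(len(pool))), reverse=True)
    parents = [pool[i] for _, i in ranked[:keep]]
    children = [clone_with_mutation(p, rng)
                for p in parents for _ in range(clones_per_parent)]
    return parents + children

# Hypothetical pool: each classifier is described only by its hyperparameters.
pool = [{"threshold": 0.1}, {"threshold": 0.5}, {"threshold": 0.9}]
scores = [0.6, 0.9, 0.7]
new_pool = next_generation(pool, scores, random.Random(0))
```

Training and scoring the classifiers is left out; the sketch only covers the cloning step and the weight schedule.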
12.27.09
Posted in Ideas, Research at 10:00 pm by Michael
Decisions in the kNN framework are reached through a majority vote of an observation’s k nearest neighbors (given some distance metric). When aggregating many kNN decisions and weighing them against one much more important kNN decision, one strategy I’ve found to work well is to copy Congress:
The critical neighbor is “the President” and can’t “pass” the vote, but can “veto” it.
A decision is made to “pass” either on the vote of a majority of the neighbors in the absence of a veto, or given a 2/3 majority in its presence.
One example of this is aggregating decisions over a market index. Each individual asset in the index has an impact on its overall movement, but the index itself (the President) can also be analyzed directly.
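The voting rule can be sketched in a few lines. The function name and the boolean encoding (True means “pass”, and a False from the President counts as a veto) are my own assumptions.

```python
def congress_vote(neighbor_votes, president_vote):
    """Aggregate boolean kNN decisions under the Congress rule.

    neighbor_votes: list of bool, one decision per ordinary neighbor.
    president_vote: bool; False is treated as a veto.
    """
    n = len(neighbor_votes)
    if n == 0:
        return False  # the President cannot pass a vote alone
    yes = sum(neighbor_votes)
    if president_vote:
        return 2 * yes > n       # simple majority, no veto
    return 3 * yes >= 2 * n      # 2/3 majority overrides the veto

# Three of five neighbors vote to pass.
no_veto = congress_vote([True, True, True, False, False], True)
vetoed = congress_vote([True, True, True, False, False], False)
```

Integer comparisons are used instead of fractions so that an exact 2/3 share is not lost to floating-point rounding.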
11.11.09
Posted in Ideas, Research at 1:19 am by Michael
Many spam mails that land in my inbox tend to be thematically similar, though the messages have slight variations (perhaps they’re being sent by the same spammer). Ordinary messages do not cluster so well. Clusters formed on these spam messages should thus be “tighter” than clusters to which ordinary messages belong. Cluster membership and validity may thus be used as a feature in subsequent spam classification.
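A minimal sketch of turning cluster tightness into a feature. It assumes the messages have already been converted to numeric vectors and clustered by some other method; the tightness measure (mean distance to the centroid) and all names are illustrative choices, not the only reasonable ones.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def tightness(points):
    """Mean Euclidean distance of members to their centroid (smaller = tighter)."""
    c = centroid(points)
    return sum(math.dist(p, c) for p in points) / len(points)

def cluster_features(message, clusters):
    """Return (index of the nearest cluster, tightness of that cluster)."""
    centroids = [centroid(c) for c in clusters]
    nearest = min(range(len(clusters)),
                  key=lambda i: math.dist(message, centroids[i]))
    return nearest, tightness(clusters[nearest])

# Toy 2-D vectors: a tight "spam-like" cluster and a loose "ordinary" one.
tight = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1]]
loose = [[5.0, 0.0], [0.0, 5.0], [8.0, 8.0]]
index, spread = cluster_features([1.0, 1.05], [tight, loose])
```

Both returned values could then be appended to the feature vector used by the downstream spam classifier.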
07.19.09
Posted in Ideas, Research at 10:12 am by Michael
LSI and higher-order techniques, such as PARAFAC, can decompose term-document matrices into individual concepts. One potential application of this may be to predict the topics of damaged files based on filename, metadata, and what data is present and uncorrupted.
(Note to self: potential future research area).
06.25.09
Posted in Research, Sociology at 2:20 pm by Michael

[Image not shown. Credit: PhD Comics.]
This is a 12-step guide for all of the researchers permanently stuck in primary integration out there. Here’s how to succeed without the obligation of forming an authentic personality:
1. Look at new papers to figure out what’s about to become hot.
2. Apply the standard techniques in this field to a new or understudied domain.
3. Find an eager young grad. student/fellow with ideas about a dissertation topic/research project.
4. Ignore said student’s ideas and unload your project onto him.
5. Eventually he will hit a roadblock that he can’t seem to get around. Tell him that what he is trying to do is impossible. (Otherwise, you’ll need to learn the subject enough to give him advice, which requires thinking).
6. A few weeks later, he’ll come back with a finished method. Tell him to write a paper on it. Stick your name on the paper. Tell him to keep going.
7. Once the method is complete, the student will start writing. Reviewing the drafts takes thought, so just ignore them.
8. If at any point the student gets close to completion, ask him some stock questions to keep him busy and tell him he needs to stay longer. (Warning signs: drafts exceed 100 pages, complete framework built around the new technique, work begins to be applied in actual practical applications, student gets restless, or job/marriage obligations arise…)
9. Repeat until grad. student suffers a nervous breakdown.
10. Copy and paste half of his draft into a grant application (do save them, even if you don’t read them – one needs material to get funded).
11. Recruit new student.
12. Repeat from step 1 until dead.
03.04.09
Posted in Personal, Research at 11:14 pm by Michael
15 pages left to go in my dissertation. I may be able to finish the first draft within the week…
01.28.09
Posted in Ideas, Literature, Philosophy, Programming, Research at 6:48 pm by Michael
There’s a problem with Matlab. Even though it’s great for high-level programming, it just has too many functions, of which even the most seasoned developer is doomed to know only a small fraction. For example, the function pdist returns the pairwise distances between observations (as a vector, which squareform converts to a matrix), yet I’ve seen the same thing done manually countless times.
Rather than cutting down the functionality provided by the language or forcing the user to “specialize” in studying certain Matlab functionality, I thought of an alternative approach:
The user knows what he wants to do. He wants to “compute the pairwise distance between observations in matrix A”. At this moment, his best bet is issuing the command “lookfor pairwise” and sifting through the results.
But wouldn’t it be nice if he could type “I want to compute the pairwise distance between observations in matrix A”, and, based on the documentation and some tagging of the functions, Matlab would automatically fill in “pdist(A)”?
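A toy version of the lookup, sketched in Python rather than Matlab. The tag index below is hypothetical and hand-made; in the scheme described above, the tags would come from the documentation. The sketch only picks a function name by keyword overlap and does not attempt to fill in the arguments.

```python
import re

# Hypothetical tag index (assumption): function name -> descriptive tags.
FUNCTION_TAGS = {
    "pdist": {"pairwise", "distance", "observations"},
    "sort": {"sort", "order", "ascending", "descending"},
    "mean": {"average", "mean"},
}

def suggest_function(request, index=FUNCTION_TAGS):
    """Return the function whose tags overlap most with the words of the
    request, or None if no tag matches at all."""
    words = set(re.findall(r"[a-z]+", request.lower()))
    best = max(index, key=lambda name: len(words & index[name]))
    return best if words & index[best] else None

suggestion = suggest_function(
    "I want to compute the pairwise distance between observations in matrix A"
)
```

A real system would need stemming, synonyms, and some way to map “matrix A” onto the argument list, none of which is attempted here.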
This can apply to any language, of course. Java seems another good candidate, given the number of standard classes in J2SE.
01.09.09
Posted in Ideas, Research at 9:34 pm by Michael
When performing working memory processing tasks, the anterior cingulate cortex lights up. It does not during memory recall (the medial temporal lobes do, which sort of makes sense as the medial temporal lobe is the location of the hippocampus). The ACC also responds to pain, fear, and other unpleasant “avoidance” sensations.
Maybe there’s a physiological basis for the avoidance people display toward challenging mental tasks? Could it be that the body interprets such tasks as another form of pain?
Edit: Ooh, it increases with task difficulty too.
12.03.08
Posted in Biology, Ideas, Research at 11:20 pm by Michael
One of the most intriguing connections between biology and machine learning is in the learning ability of the adaptive immune system. If you abstract away the biology, it appears to be a very complex problem of classification: is something an invader or not? False positives cause autoimmune diseases. False negatives cause dangerous infections.
Just as in machine learning, we can use the concepts of sensitivity and specificity, positive and negative predictive values, classification, clustering, and feature extraction.
And we can also use the techniques to guide treatments.
How about using AdaBoost to train the immune system? Expose it only to the examples it initially misclassifies? There are so many places where these two fields can intersect…
11.24.08
Posted in Biology, Ideas, Research at 2:10 pm by Michael
I am wondering whether there are associations between the properties of the skin and, say, the presence of a noncutaneous infection in the bloodstream (or even a change in the normal levels of various hormones and other things that we currently need to draw blood to test). At the least, one would expect quantifiable changes in the skin as a result of, say, leukocytosis, and those changes could serve as a noninvasive biomarker for infection. It’s a potential area to explore using data mining.