15 pages left to go in my dissertation. I may be able to finish the first draft within the week…
Category Archives: Research
Programming by Intent via Automatic Search and Function Binding
There’s a problem with Matlab. Even though it’s great for high-level programming, it just has too many functions, of which even the most seasoned developer is doomed to know only a small fraction. For example, the function pdist returns the pairwise distances between observations (as a condensed vector, which squareform expands into a matrix), yet I’ve seen the same thing done by hand countless times.
Rather than cutting down the functionality provided by the language or forcing the user to “specialize” in studying certain Matlab functionality, I thought of an alternative approach:
The user knows what he wants to do. He wants to “compute the pairwise distance between observations in matrix A”. At the moment, his best bet is to issue the command “lookfor pairwise” and sift through the results.
But wouldn’t it be nice if he could type “I want to compute the pairwise distance between observations in matrix A”, and, based on the documentation and some tagging of the functions, Matlab would automatically fill in “pdist(A)”?
This can apply to any language, of course. Java seems another good candidate, given the number of standard classes in J2SE.
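To make the idea concrete, here is a minimal sketch in Python of the lookup step, assuming the “tagging of the functions” is just a set of keywords per function. The catalog, its tags, the `bind_intent` name, and the rule for picking the argument (the word after “matrix”) are all made up for illustration; a real system would mine the tags from the documentation and would need far better language handling.

```python
import re

# Hypothetical catalog: function name -> (tag words, call template).
# The tags stand in for what would be extracted from the documentation.
CATALOG = {
    "pdist": ({"pairwise", "distance", "distances", "observations"}, "pdist({arg})"),
    "sort": ({"sort", "order", "ascending", "descending"}, "sort({arg})"),
    "cumsum": ({"cumulative", "running", "sum"}, "cumsum({arg})"),
}


def bind_intent(sentence):
    """Return the call whose tags overlap most with the sentence, or None."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    # Assumption: the argument is the identifier that follows the word "matrix".
    match = re.search(r"matrix\s+(\w+)", sentence)
    arg = match.group(1) if match else "x"
    best_name, best_score = None, 0
    for name, (tags, _template) in CATALOG.items():
        score = len(tags & words)  # number of tag words present in the sentence
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None:
        return None  # nothing in the catalog matched
    return CATALOG[best_name][1].format(arg=arg)


print(bind_intent("I want to compute the pairwise distance between observations in matrix A"))
```

With this toy catalog the sentence from above is bound to pdist(A). Plain keyword overlap is the simplest possible scoring rule; it is here only to show the shape of the search-then-bind step.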
An interesting brain/learning tidbit…
When performing working memory processing tasks, the anterior cingulate cortex lights up. It does not during memory recall (the medial temporal lobes do, which sort of makes sense as the medial temporal lobe is the location of the hippocampus). The ACC also responds to pain, fear, and other unpleasant “avoidance” sensations.
Maybe there’s a physiological basis for the avoidance people display to challenging mental tasks? Can it be that the body interprets it as another form of pain?
Edit: Ooh, it increases with task difficulty too.
AdaBoosting the Immune System
One of the most intriguing connections between biology and machine learning is in the learning ability of the adaptive immune system. If you abstract away the biology, it appears to be a very complex problem of classification: is something an invader or not? False positives cause autoimmune diseases. False negatives cause dangerous infections.
Just as in machine learning, we can use the concepts of sensitivity and specificity, positive and negative predictive values, classification, clustering, and feature extraction.
And we can also use the techniques to guide treatments.
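The measures named above all come from the four counts of a confusion matrix. A small Python sketch, where “positive” means “classified as an invader”; the function name and the counts are invented purely to exercise the formulas and are not immunological data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard classifier metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of real invaders that are attacked
        "specificity": tn / (tn + fp),  # share of self that is left alone
        "ppv": tp / (tp + fp),          # share of attacks that hit a real invader
        "npv": tn / (tn + fn),          # share of tolerated items that are truly self
    }


# Hypothetical counts: fp corresponds to autoimmune responses,
# fn to infections that slip through.
metrics = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```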
How about using AdaBoost to train the immune system? Expose it only to the examples it initially misclassifies? There are so many places where these two fields can intersect…
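For reference, this is roughly what one round of AdaBoost’s reweighting looks like, as a Python sketch. Note that the standard algorithm does not expose the learner only to the misclassified examples; it keeps every example and shifts weight toward the ones that were misclassified. The function name and the toy labels below are assumptions for illustration.

```python
import math


def adaboost_reweight(weights, labels, predictions):
    """One AdaBoost round: return (alpha, new_weights).

    labels and predictions are sequences of +1 / -1.
    """
    total = sum(weights)
    # Weighted error of the current weak classifier.
    err = sum(w for w, y, h in zip(weights, labels, predictions) if y != h) / total
    if not 0 < err < 1:
        raise ValueError("weighted error must be strictly between 0 and 1")
    alpha = 0.5 * math.log((1 - err) / err)  # vote given to this classifier
    # y * h is -1 for a mistake, so mistakes are scaled up and hits scaled down.
    scaled = [w * math.exp(-alpha * y * h)
              for w, y, h in zip(weights, labels, predictions)]
    norm = sum(scaled)
    return alpha, [w / norm for w in scaled]


# Toy example: five equally weighted examples, the last one misclassified.
labels = [1, 1, -1, -1, 1]
predictions = [1, 1, -1, -1, -1]
alpha, new_weights = adaboost_reweight([0.2] * 5, labels, predictions)
print(alpha, new_weights)
```

After the update the misclassified examples together carry half of the total weight, which is what forces the next weak classifier to concentrate on them.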
Computer-Aided Cutaneous Testing?
I am wondering whether there are certain associations between the properties of the skin and, say, the presence of a noncutaneous infection in the bloodstream (or even a change in the normal levels of various hormones and other things that we currently need to take blood to test). At the least, one would expect leukocytosis, say, to produce quantifiable changes in the skin, and those changes could serve as a highly noninvasive biomarker for infection. It’s a potential area to explore using data mining.
More generally.
Remember σ(pn) = p * σ(n) + σ(n / p^α)?
More generally, σ(p^k n) = p^k * σ(n) + (p^k - 1) / (p - 1) * σ(n / p^α).
Again, p is a prime and α is its multiplicity in n’s prime factorization. σ is the divisor function, of course.
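The identity is easy to sanity-check numerically. Below is a Python sketch (not the Maple derivation) that compares both sides for a few small primes, exponents, and values of n; the helper names and the tested ranges are arbitrary choices made here.

```python
def sigma(n):
    """Sum of the positive divisors of n, by trial division up to sqrt(n)."""
    total, d = 0, 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
        d += 1
    return total


def multiplicity(p, n):
    """Exponent of the prime p in the factorization of n."""
    alpha = 0
    while n % p == 0:
        n //= p
        alpha += 1
    return alpha


def recurrence_rhs(p, k, n):
    """Right-hand side: p^k * sigma(n) + (p^k - 1)/(p - 1) * sigma(n / p^alpha)."""
    alpha = multiplicity(p, n)
    cofactor = n // p**alpha
    geometric = (p**k - 1) // (p - 1)  # 1 + p + ... + p^(k-1), an exact integer
    return p**k * sigma(n) + geometric * sigma(cofactor)


for p in (2, 3, 5, 7):
    for k in (1, 2, 3):
        for n in range(1, 101):
            assert sigma(p**k * n) == recurrence_rhs(p, k, n), (p, k, n)
print("identity holds on all tested cases")
```

Setting k = 1 makes the geometric factor equal to 1, which recovers the first formula.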
All I need to do now is figure out how to extend it to a composite number and I’ll have a complete multiplicative recurrence on the divisor function, which I can use to obtain a closed-form rate of growth. I’ve empirically calculated it to grow at approximately 1.6449*n, but my goal is to obtain a tight worst-case bound. I could not find anything special about this number, except that it is the two-sided 90% critical value of a standard normal distribution.
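The empirical constant can be reproduced with a simple divisor-sum sieve. This Python sketch is not the original computation; `sigma_sieve`, `N`, and `mean_ratio` are names chosen here, and the cutoff of 20000 is arbitrary. It averages σ(n)/n, so it measures average growth rather than the worst case. The result is printed next to π²/6 ≈ 1.6449 for comparison, since the average order of σ(n) is known to be π²n/6.

```python
import math


def sigma_sieve(limit):
    """Return a list s with s[n] = sigma(n) for 1 <= n <= limit (s[0] is unused)."""
    s = [0] * (limit + 1)
    for d in range(1, limit + 1):
        # d contributes to the divisor sum of each of its multiples.
        for multiple in range(d, limit + 1, d):
            s[multiple] += d
    return s


N = 20000
s = sigma_sieve(N)
mean_ratio = sum(s[n] / n for n in range(1, N + 1)) / N
print("mean of sigma(n)/n for n <= N:", mean_ratio)
print("pi^2 / 6:                     ", math.pi**2 / 6)
```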
Here’s a messy Maple worksheet containing the derivation (among a whole bunch of stuff not related to the derivation that I was experimenting with today).
Conjectures
Do highly composite numbers of the form n^2-1 occur infinitely often? What about when n is prime? Most values of n for which n^2-1 is highly composite seem to be prime. (Which makes sense: if a number m is highly composite, m+1 shares none of m’s prime factors and so is going to be deficient, and squares of primes certainly fit the bill.)
(Actually, can we always say that a highly composite number + 1 is either prime or the square of a prime? All of the values I checked were.
Things start to change around 5040…)
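Both questions are cheap to explore by brute force. The Python sketch below lists the highly composite numbers up to a limit and labels each successor as a prime, the square of a prime, or neither; the function names and the limit of 6000 are choices made here, and the brute-force divisor count is only practical for small limits.

```python
import math


def num_divisors(n):
    """Count the divisors of n by trial division up to sqrt(n)."""
    count, d = 0, 1
    while d * d <= n:
        if n % d == 0:
            count += 1 if d * d == n else 2
        d += 1
    return count


def highly_composite(limit):
    """Numbers up to limit that have more divisors than every smaller number."""
    found, record = [], 0
    for n in range(1, limit + 1):
        c = num_divisors(n)
        if c > record:
            found.append(n)
            record = c
    return found


def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def classify(m):
    """Label m as 'prime', 'prime square', or 'other'."""
    if is_prime(m):
        return "prime"
    r = math.isqrt(m)
    if r * r == m and is_prime(r):
        return "prime square"
    return "other"


for h in highly_composite(6000):
    print(h, h + 1, classify(h + 1))
```

Rows labeled “prime square” are exactly the highly composite numbers of the form n^2-1 with n prime, and any row labeled “other” is a counterexample to the second conjecture.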
False negatives in animal tests.
Lots of treatments work very well in mice but fail to show benefits in human trials. They’re false positives, and they get lots of people excited over treatments that never end up working in humans.
(Why do they work so well in mice, I wonder? Is it because so much of our research uses them? I wonder, if we were willing to completely throw morals out the window, could we get those sorts of results in humans by experimenting on them directly? Not that I’m advocating this.)
I just realized something blindingly obvious: there are false negatives too. But how are these handled? Treatments that don’t work in mice never make it to human trials, even though they may work in humans. Without doing human trials on treatments that failed to work in mice, we can’t evaluate a false negative rate, but it could potentially be high. Certainly it’s nonzero in any case.
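A toy Python example of why the false negative rate cannot be estimated from trial data alone. The candidate list is entirely hypothetical: each pair records whether a treatment works in mice and whether it would work in humans, and the second value is only ever observed for treatments that passed the mouse stage.

```python
# Hypothetical candidates: (works_in_mice, works_in_humans).
candidates = [
    (True, True), (True, False), (True, False), (True, True),
    (False, False), (False, True), (False, False), (False, True),
]


def observable_outcomes(candidates):
    """Human outcomes that trials actually reveal: mouse successes only."""
    return [human for mice, human in candidates if mice]


def false_negative_rate(candidates):
    """Share of treatments that work in humans but failed in mice.

    Needs the human outcome of mouse failures, which is never measured.
    """
    tp = sum(1 for mice, human in candidates if mice and human)
    fn = sum(1 for mice, human in candidates if not mice and human)
    return fn / (fn + tp)


print("seen in human trials:", observable_outcomes(candidates))
print("false negative rate (needs hidden data):", false_negative_rate(candidates))
```

Everything in the first printed list is something a trial can report. The second number depends entirely on pairs whose first entry is False, and those are the ones that never reach a human trial.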
This is another example of snap judgments shooting down ideas, but it is far less clear-cut than most such cases, because skipping the analysis of a treatment prior to human trials can endanger people’s health.
I think that what we need are better computer models.
Patent
Apparently the recent work I’ve done for my dissertation is patentable and the team wishes to apply for one. On the one hand, I disagree with the very concept of patenting an algorithm; on the other, this is a huge addition to a CV which would very definitely put me ahead of others when I seek a research position.
Maybe I can get the patent then license it freely?
Maybe killing malignant cells isn't the answer.
Unless the specificity of an anticancer drug approaches 100%, anything that kills off cancer cells is going to kill off some normal ones as well. This means side effects, often quite nasty.
But what if, instead of killing off cancerous cells, we just shut down their invasive potential?
I’ve been reading up on what differentiates noninvasive cancer cells – carcinoma in situ – from invasive cells. The literature on this has been surprisingly sparse, so either I’m not looking for the right things (quite possible), or this is a very understudied approach. The papers I’ve read have identified a few gene loci and a protein called Twist, but that is as far as I can take my search, lacking the resources to experimentally pursue such lines of study.
My point is this: carcinoma in situ is harmless except in its ability to become invasive cancer. Most of the proteins that seem to cause aberrant behavior in cancer cells seem to be those that are present during embryonic development (which makes sense in a way, since embryonic development is high-rate controlled, regulated division, whereas carcinogenesis is high-rate uncontrolled, unregulated division), but these proteins are all but absent in adults.
So rather than attempting to kill off the cancer cells, why not attempt to remove their ability to invade (and thus metastasize, destroy tissue, and cause other problems)? Even if the treatment were nonspecific, side effects should be far milder than the “killer” drugs, since normal cells are not known to depend on the function of the identified proteins. And unlike drugs that kill cells, there is little selective pressure against the treatment.
I see so many solutions to this problem. How I wish I could take part…