Category Archives: Research

Dissertation – Week 3

I’m running somewhat behind this week. It really can’t be helped – I had to travel to Philadelphia three times this week (once I’m there, I have a very difficult time concentrating on my work, and the three-hour commute drains me for more or less the rest of the day), I’m trying to perform experiments for the first time since I began writing, and my machine learning workload is transitioning from “unreasonably heavy” to “sadistic”. Despite this, I did manage to finish about five pages so far, mostly dealing with the nuances of the tensor/outer product and with things like PARAFAC. The methodology section is probably going to need to wait until next week, since I don’t know how quickly I can get experimental results with all of the other things going on.

Fortunately, I planned a timeline that has me finishing approximately 6 months before I need to defend – I have plenty of time to spare 🙂

If my ML load were a bit lighter this week, I could even meet my weekly goal by tomorrow. Unfortunately, not only is it heavy, but the professor decided to make it a competition. I abhor competitive research almost as much as competitive programming (I only competed in one programming contest, when I was 15, which I won to the detriment of some very talented programmers who really should have been lauded for their efforts and talent) – it fosters a spirit of treachery and ensures that the total number of people performing well remains very small. There’s already enough adversity in research; I fail to see why we need to add more.

In any case, I am now bound by the Red Queen hypothesis. I can’t simply submit functional homework; now I have to keep improving it to the extent that it is more accurate than the other students’ submissions.

I’m probably safe if I can get my cross-validated accuracy above 90%.

"Big comma" operator for multidimensional indexing?

I’m running into an annoying problem on my dissertation. Since it’s on tensors, I’m doing a lot of multidimensional indexing, and I find myself doing a lot of this:

$X_{i_1, i_2, \ldots, i_n}$.

Now, Einstein notation lets me avoid writing sums (again, though, I’m not a fan of it, since there already exists a very nice Σ operator for sums), but it doesn’t do a thing for sequences like this one. In fact, I can’t find any operators that can represent this more concisely.

Now, I can probably treat i as a vector rather than a sequence of scalars, but that might confuse people.

If all else fails, I can probably define something like this:
$X_{\mathop{\text{\Large ,}}\limits_{j=1}^{n} i_j}$, with a big comma in that phrase similar to the huge sigma you use in a sum, but that’s difficult to represent, as the fact that I need to explain what I was writing demonstrates.

I’m leaning towards the vector solution, but if someone knows a good notation for this, please let me know.
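For what it's worth, the vector approach is also how array libraries tend to handle this. Below is a minimal sketch, assuming NumPy and a made-up 2×3×4 tensor (the tensor and the particular index are purely illustrative, not from the dissertation): the whole sequence of indices is collected into one multi-index and used as a single subscript.

```python
import numpy as np

# Hypothetical 3-way tensor, used only for illustration.
X = np.arange(24).reshape(2, 3, 4)

# The index sequence i_1, i_2, ..., i_n collected into one multi-index i.
i = (1, 2, 3)

# Indexing with the tuple is the same as writing every index out by hand.
assert X[i] == X[1, 2, 3]

# np.ndindex enumerates every multi-index of the given shape, so a sum over
# all entries needs one loop variable instead of n nested ones.
total = sum(X[idx] for idx in np.ndindex(*X.shape))
```

On paper, this corresponds to writing X with a single bold subscript i and defining i = (i_1, …, i_n) once up front.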

Implementation of Media Similarity Search

Similarity search technology in images, music, and other multimedia content has been researched to death. This idea is not about research in any of those areas (I save that for work at Temple). The idea is simply an implementation of these techniques: something like Google Images that allows you to upload images and query based on similarity to the given image. Small-scale systems exist, but I have yet to find any that are as large as mainstream keyword-based image searches, such as Google Image Search. I suggested this to Google when I was in their NYC office (I even gave them my BACH paper to suggest how they could do it for music!), but as far as I know, they still lack this feature (though they are joined by all of the other large search engines).

Large-scale query-by-humming systems already exist, so the lack of those isn’t a problem, but video could also benefit from such an approach (find video with this sound, find video with this frame, etc.). Images could be broken down using MPEG-7 descriptors, time series analysis after linearization by a Hilbert curve, or vector quantization, among other techniques. Music could be broken down by a Fourier transform/power spectrum analysis; even the mood of the piece can be accurately predicted by this technique (according to the literature). Video can be treated as a simple sequence of images (frames) plus audio, and video search solved by bagging the previous two methods.
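To make the Hilbert curve step concrete, here is a rough sketch of the linearization only. It assumes a square grayscale image stored as a list of rows whose side is a power of two; the function names `hilbert_d2xy` and `linearize` are made up for this example. The index-to-coordinate mapping is the standard iterative one, and what you do with the resulting series (time series features, distance measures) is left open.

```python
def hilbert_d2xy(order, d):
    """Map distance d along a Hilbert curve to (x, y) on a 2**order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the quadrant so the sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def linearize(image):
    """Read a square image (side a power of two) along the Hilbert curve."""
    side = len(image)
    if side < 1 or side & (side - 1) != 0:
        raise ValueError("image side must be a power of two")
    if any(len(row) != side for row in image):
        raise ValueError("image must be square")
    order = side.bit_length() - 1
    return [image[y][x]
            for x, y in (hilbert_d2xy(order, d) for d in range(side * side))]


# Toy 2x2 "image"; the pixel values are arbitrary.
series = linearize([[1, 2],
                    [3, 4]])
```

The appeal over a plain row-by-row scan is that consecutive samples in the series are always neighbouring pixels, so spatial locality is better preserved in the 1-D signal.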

Dissertation – Week 2

Tensor theory is deep. It’s one of those areas that’s much more complicated than it needs to be. The good news is that I can still pick the fundamentals up within an hour and it’s going to enable me to write a lot of background if I want to (I’m thinking around a 35 page background section is sufficient, leaving me around 80 or 90 for methodology and the remainder for other sections and appendices). I’m done talking about SVD and its equivalents, but now I get to mention PCA, ICA, and LSI! 🙂

I wanted to perform an actual experiment this week and get started on the methodology, but unfortunately, I don’t understand how our data is formatted and the only one who can explain it to me isn’t around when I am until next Tuesday, so that is going to have to wait.

I’ve written 4.5 pages so far; 6 left to meet the weekly goal. It may all be background, but I can do it if I take tomorrow and maybe Saturday to write.

And Einstein notation? I admire Einstein as much as the next guy, but that’s a pretty stupid idea, honestly, whether Einstein introduced it or not. Think about it: not only are you summing without using a + or summation symbol, but you’re now using both subscripts and superscripts to describe things that are not true indices or exponents. I certainly don’t think I’ll be using it in my dissertation.
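For readers who haven't run into it, here is the same matrix-vector product written both ways. The symbols A, x, y and the dimension n are just placeholders for this comparison.

```latex
% Einstein convention: the repeated index j (once up, once down) is summed implicitly.
y^{i} = A^{i}{}_{j}\, x^{j}

% The same statement with an explicit summation sign and ordinary subscripts.
y_{i} = \sum_{j=1}^{n} A_{ij}\, x_{j}
```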

Dissertation – Week 1

Writing this week encompasses the remainder of the preliminaries, plus the beginning of the background section. In this, I explain the mechanics of general Singular Value Decomposition and the Tucker decomposition, as well as the Higher-Order Singular Value Decomposition (HOSVD) generalization. I’m not writing the abstract yet, as I’m not precisely sure what the full scope of the experiments will be. It’ll depend on the results, so I’ll probably write the abstract last (or right before the dedications, anyway).
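As a rough illustration of how these pieces fit together, here is a minimal HOSVD sketch, assuming NumPy. It is not the dissertation's code: the helper names (`unfold`, `mode_product`, `hosvd`, `reconstruct`) and the random test tensor are invented for the example. Each factor matrix comes from an ordinary SVD of one mode unfolding, and the core tensor is what remains after projecting onto those factors, which gives a Tucker-style decomposition.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def hosvd(tensor):
    """Return (core, factors), one factor matrix per mode."""
    factors = []
    for mode in range(tensor.ndim):
        # Left singular vectors of each unfolding become that mode's factor.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u)
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Multiply the core by every factor matrix to rebuild the tensor."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


# Random tensor purely for demonstration.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4, 5))
core, factors = hosvd(X)
X_rebuilt = reconstruct(core, factors)
```

Without truncation the reconstruction should match the input up to floating-point error; keeping only the leading columns of each factor would give a compressed approximation instead.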

The pagination guidelines are interesting: the copyright page is second, but it begins the numbering with “iii”. So the second page is numbered third? (Is something inserted post-defense between the title and copyright pages?)

I’m going to pwn this dissertation.

Oh, and I’m a member of another honor society now: Golden Key. That makes five honor societies; all I’m waiting for now is Upsilon Pi Epsilon, who should have already inducted me according to their criteria.

Motor learning rate

Here’s an interesting idea, and one I’m in a position to test to boot!

Over several iterations of performing a simple motor task, an interesting pattern of activity occurs in the frontal cortex of the brain: the amount of activity diminishes with each iteration until a certain threshold is reached, indicating what appears to be motor learning behavior. It’s more or less linear, but I believe that the slope differs between subjects.

Now, my hypothesis is this: that the learning rate from a simple motor task could in fact be used to estimate the rapidity of motor learning in general. In other words, if I could stick you in an fMRI scanner for a few seconds and have you twiddle your thumbs, I could predict how fast you would be able to type or how long it would take you to learn to play the piano, for example.

But that’s only the beginning: that this is taking place in the frontal lobe rather than the cerebellum suggests that the processing may be somewhat unified with the process of normal cognitive learning, and thus may be a form of intelligence.

So can we build an IQ test from this? Maybe. I’m going to perform some quick experiments on data that we already have at the lab as soon as I can. Since I don’t have test data from the subjects, the initial analysis will be clustering, but if that succeeds, I may attempt to test them and perform regression.
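The per-subject slope is the number everything else would hang on, so here is a small sketch of how it might be estimated. The activation values are made up for two hypothetical subjects (they are not real fMRI data), the name `learning_slope` is invented for the example, and the fit only covers the roughly linear part of the decline before the threshold.

```python
def learning_slope(activity):
    """Least-squares slope of activity against iteration number (0, 1, 2, ...)."""
    n = len(activity)
    if n < 2:
        raise ValueError("need at least two iterations to fit a slope")
    mean_t = (n - 1) / 2
    mean_a = sum(activity) / n
    cov = sum((t - mean_t) * (a - mean_a) for t, a in enumerate(activity))
    var = sum((t - mean_t) ** 2 for t in range(n))
    return cov / var


# Made-up activation levels per iteration; NOT real measurements.
subjects = {
    "fast_learner": [10.0, 8.0, 6.0, 4.0, 2.0],
    "slow_learner": [10.0, 9.5, 9.0, 8.5, 8.0],
}

# One slope per subject; these would be the features fed to clustering
# (or to regression, once behavioural test scores are available).
slopes = {name: learning_slope(series) for name, series in subjects.items()}
```

A steeper negative slope here would mean activity drops off faster across iterations, which is the quantity the hypothesis ties to motor learning speed.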

If both of these hypotheses are true, Gardner’s theory of multiple intelligences may need to be amended.

Dissertation – Week 0

There’s only one good answer to the situation I’m in – leave as quickly as possible. So long as I am at Temple, particularly when I am not free, I believe I will never achieve my full potential. Thus, I am moving my original 3-year graduation goal up to 2 years – that is, I’d like to have my Ph.D. by the end of this year. My productivity in other areas will probably be attenuated, but I will NOT completely abandon other fields; I’ll just work on them less until I finish. Yes, that includes the “Treatise on the Objective Reality of Ideas”. Sorry.

After graduating, I will likely never publish anything again, nor participate in formal academia, since I am sorely disappointed with (a) the constant gatekeeping and (b) the lack of objective and original thought. I will most likely do what Einstein did: work at a relatively unchallenging job that leaves plenty of freedom to imagine (Programming? That’s fun, but hasn’t been a challenge since I was a teenager, so it may be perfect), refining my own ideas while doing so until they are hopefully revolutionary. It’s how I’ve done my best work to this point, so why would I abandon this strategy? Look what abandoning it did to me this past year!

I have completed my literature search and have a general idea of what I specifically want to research. I am starting to write right now. I’m going to San Diego for a conference over the weekend, so this week doesn’t count, but I am going to set my target for 10 pages per week starting next week (right now, my writing is just taking care of the preliminaries: topic, ToC, etc.). I anticipate a 150 page dissertation (I don’t believe in being long winded in a research paper, even a dissertation; even that is too long, but it’s the minimum I can get away with), so at this rate it will take me 15 weeks, which is approximately the length of one semester, giving me an ETA sometime around February. Provided my advisor doesn’t attempt to stop me simply because I’m trying to graduate too quickly (the way Temple operates, it wouldn’t surprise me if the faculty tried to keep me doing their research as long as possible) and that I pass the writing and preliminary exams (I’m already past quals so long as I don’t completely screw up the class I’m in now, which is possible because the midterm is scheduled the day I come back from San Diego and the professor won’t reschedule it), I should have this done well in advance of when I need to graduate. My effulgent hatred of what my life has become (I call it “the dark year”) motivates me.

I’ve already filled out all of the paperwork; even though I haven’t formed my committee yet, I know who is going to be on it.

Now, dates. I do this for my own benefit because it gives me a solid framework to work within (I need deadlines to operate):

The graduation date I am going to shoot for is August 31, 2008. This makes my final thesis due date August 1, 2008, which makes my defense deadline roughly July 15, 2008. This means I have to finish my thesis by July 1 and pass Prelim II by approximately June 17, 2008, though I should pass the Prelim by May if possible because I need six credits of CIS9999 (Dissertation research) between Prelim II and the defense and the Summer I and Summer II semesters would be good opportunities to take these courses. This should indicate a Prelim I completion date of approximately April 15, 2008 at the latest and completion of the writing exam (which by now I should have no problems with) during the first week of February.

This brings us roughly back to the present.

So there’s the strategy: work backwards from the final deadline, get everything but writing and research out of the way, and set regular goals. If all else fails, I’ll take another semester and have a very easy Fall (and a December ETA, which I can definitely meet).

The Research Classroom

The idea is simple, but should be effective: A class that meets solely to do research. It puts 30 or so minds on a project at once, and has everyone collaborating. Some people are naturally more effective in this environment too. It’s win/win.

Another (very high-level) cancer treatment idea

If killing cancer cells off proves too difficult, undoing the process of malignant transformation and turning them back into benign tumor cells would still greatly reduce mortality from the disease.

I find it sort of pathetic that so little attention is being given to alternate immortality pathways in tumor cells. Come on, people! If you can subject the cells to aging, they will not divide indefinitely. If they do not divide indefinitely, they won’t survive very long. If tumors don’t survive very long, cancer is no longer a problem. Blocking telomerase is good (though there is some present in the body itself, especially in children, it is probably worth the tradeoff, and it would just be a temporary treatment measure anyway), but you need to block ALL of the ways cells have of circumventing the Hayflick limit.

Plus more telomerase research might dramatically extend the human lifespan, as we begin studying what is likely one major cause of aging. Not that I think we need even more people on the planet. If anything, we have too many already.

So let’s get some more people on it! It has to start with the funding, of course. Everyone follows the funding.

I’d help if I could, but I’m sort of… stuck.

It’s almost all self-study, but I have both a broad and deep knowledge of medicine. It’s an interesting field and one that I tend to absorb like a sponge. I can diagnose most diseases based on symptoms as well as any doctor. I know about protein regulation (it’s actually just a very complicated graph theory problem) and interaction. I can do gene sequencing, though I don’t think that would be a good area to put my skills to use. I can read most biology papers with ease. I easily know more about the diseases we study than any of the other computer scientists on the team, and I’ve occasionally surprised the biologists on our team as well. As a computer scientist in bioinformatics, I have an extensive ability to support my experiments with my own computational models. I have enough experience to read some types of medical images (though not nearly as well as a trained radiologist). And, as readers would see, I come up with all sorts of wacky treatment ideas both routinely and subconsciously, which means I’ll never run out of approaches.

SO SOMEBODY LET ME USE MY TALENT!

What I can’t do is use the equipment or get access to a lab. This is preventing me from doing experiments. I’m trying to find a bio course that will give me such access and enough training that I don’t blow stuff up or contaminate the lab’s cell lines, but this is Temple we’re talking about, not Polymath – I need to jump through more hoops than Shamu to enroll in a course outside of my major (OTOH, Temple is very well known in medicine; even more of a pity).

Maybe I care too much. I just hate being barred from the implementation of my ideas. Especially when I think those ideas are for the good of the very society that shuns them. I’m sometimes tempted to simply leave society on its own and go live the rest of my life on a farm somewhere, but I just can’t do it – I am going to create because I am the type that simply must create. It’s the “compulsion of ideas” that I speak of in my Treatise, but it’s of course a facet of one’s personality rather than the ideas themselves. Of course, I also argue that one’s perception of an idea forms the relative basis (as opposed to the absolute one) of the idea’s reality, so the compulsion is intrinsic to the combination of the idea and a receptive person in a sense.

Working on another classifier right now. This one needs to compare ROIs with individual codebooks. I’m not even sure we can meaningfully compare them, since the codeword indices don’t mean the same thing in different ROIs. I might need to use wavelets, which I had hoped to avoid. Tensor decomposition is the next step (and one I should get familiar with, because I’ll be working with it a lot in my dissertation).

I just keep getting older and the work keeps getting less exciting.

I’m going to continue my math research from where I left off soon as well. Maybe people would shut up about wondering how applicable a pure math result in number theory is if I make an attempt at proving RH with it (Robin’s theorem lets me do that).

Probably not. Probably only if I solved it, which I probably can’t do with my current results, since my recurrence still fundamentally unrolls in accordance with the distribution of the primes that make up n. That never was the goal, though; I just like doing number theory. A lot.
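For context, Robin's theorem as it is usually stated says the Riemann hypothesis is equivalent to σ(n) < e^γ · n · ln ln n holding for every n > 5040, where σ is the sum-of-divisors function and γ is the Euler–Mascheroni constant. The sketch below just checks that inequality numerically for individual n. It has nothing to do with my recurrence, the function names are made up for the example, and a finite check obviously proves nothing.

```python
import math

# Euler-Mascheroni constant, to double precision.
EULER_GAMMA = 0.5772156649015329


def sigma(n):
    """Sum of the positive divisors of n, by trial division up to sqrt(n)."""
    total = 0
    for d in range(1, math.isqrt(n) + 1):
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
    return total


def robin_holds(n):
    """True if sigma(n) < e^gamma * n * ln(ln(n)); requires n >= 2."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return sigma(n) < math.exp(EULER_GAMMA) * n * math.log(math.log(n))


# 5040 is the largest known n that violates the inequality.
largest_exception_fails = not robin_holds(5040)
```

Since σ is determined by the prime factorization of n, any approach through this inequality ends up depending on how the primes dividing n are distributed, which is the same wall described above.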