Category Archives: Research

My algorithm is new. I'm going to publish it.

I have not found any prior work describing my online manifold learning technique, so I am going to write it up and submit it to a journal. That might take a year as well, but I suppose there’s nothing I can do about it.

Ordinarily, I wouldn’t publish it at all, as I don’t particularly like playing that game, but this is something worth disseminating.

Turnaround time

We have finally submitted the journal paper. We can expect to hear back… oh, probably in about a year.

Just think – the state of the art is constantly at least one year behind simply due to the turnaround time of research journals!

Researcher's Golden Rule no. 2

These rules are good research conventions that I’ve adopted based on both their intuitive appeal and the observed consequences of ignoring them. I originally intended to state only one, but then realized that there are a number of unstated rules behind good research productivity. The first was “it always needs more study”, which refers both to the perfectionism that can keep people from ever accomplishing anything and to the convention of saying so in papers. The second can be given as follows:

“Don’t be sloppy.”

The methodology and algorithm should be clean and easy to understand, and so should the way the data is formatted. Make sure that the function of each file is immediately clear and that the entry point for running the experiments is easy to spot (a name like run_classification_experiments.m works well; see the sketch below). Program generically, since your dataset and analysis will probably change at some point. Don’t program only for yourself: at some point, someone else will need to run your analysis, and that person will not think highly of you if you make their life difficult. Finally, don’t program unless you know how to program well; it is a vital skill in computer science research, and you should be as proficient at it as a professional programmer.
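A minimal sketch of what such an entry point might look like; every helper and field named here (experiment_config, load_dataset, run_classifier) is invented for illustration:

% run_classification_experiments.m -- hypothetical entry point.
% The point is a single, obvious, generic top-level script.
config  = experiment_config();                 % paths, parameters, method list
data    = load_dataset(config);                % generic: swap datasets here
results = struct([]);
for m = 1:numel(config.methods)
    results(m).method   = config.methods{m};
    results(m).accuracy = run_classifier(config.methods{m}, data, config);
end
save(fullfile(config.outputDir, 'results.mat'), 'results');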

I spent the majority of this weekend wrapping up data from over a thousand .hdr/.img files into one MATLAB “data” structure. The fields of the structure correspond to properties of the data. For example:

data.source             % 'DVD 1'
data.task               % 'Left Squeeze'
data.subject            % 'John Doe'
data.volume             % raw image data
data.foregroundIndices  % indices into volume marking foreground voxels
data.wavelets           % wavelet descriptors of the volume
etc.

This is neat. Any researcher just joining the project could easily follow what is going on in this structure.
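For the curious, here is a hedged sketch of how such a structure might be assembled as a struct array, one element per file. The directory layout and metadata values are invented; analyze75info/analyze75read (Image Processing Toolbox) read Analyze 7.5 .hdr/.img pairs, and wavedec3 (Wavelet Toolbox) computes a 3-D wavelet decomposition:

% Hypothetical assembly sketch; paths and metadata parsing are invented.
files = dir(fullfile('raw', '*.hdr'));
data  = struct([]);
for i = 1:numel(files)
    info = analyze75info(fullfile('raw', files(i).name));
    vol  = analyze75read(info);                         % reads the paired .img
    data(i).source  = 'DVD 1';                          % e.g. parsed from the path
    data(i).task    = 'Left Squeeze';                   % e.g. parsed from the filename
    data(i).subject = 'John Doe';
    data(i).volume  = vol;
    data(i).foregroundIndices = find(vol > 0);          % crude foreground mask
    data(i).wavelets = wavedec3(double(vol), 3, 'db4'); % 3-level 3-D wavelets
end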

Fractional Tensor Modes

Today’s random thought: tensors have an integer rank, but what would happen if we extended the notion of rank to the entire real line (or even to the complex numbers)? What would it mean for a tensor to have a rank of 2.5? Would the tensor have a fractal structure? What about a tensor of rank i?

Not the sort of question I have the time to chase, but an interesting one.
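For reference, the integer notion being generalized here is CP rank: for an order-3 tensor $T$,

\[ \operatorname{rank}(T) = \min\Big\{ r : T = \sum_{i=1}^{r} a_i \otimes b_i \otimes c_i \Big\}, \]

which is a minimum over a discrete set, so any fractional extension would need a fundamentally different definition.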

Dissertation – Week 4

I’m playing catch-up from last week, due to the hefty machine learning workload I was given then. I’ve finished my 10 pages for this week, so the remainder is simply make-up work.

I’m done with the background, though I’ve just decided to add CUR and CANDECOMP to the mix. I’m performing the wavelet experiments now; with luck, we’ll be able to apply tensor methods to them once this week is over. I also found a few problems with the way SVD is described in our grant proposal; I’ve made sure to avoid replicating those mistakes in my paper.

The paper is starting to get very… verbose. But I guess that’s to be expected after writing 30 pages on something that really doesn’t need it.

On the upside, I’m more than 1/5 of the way done, according to the number of pages I’ve written. Yay 🙂

Manifold learning in AI

Manifold learning techniques such as SDE can take data from a high-dimensional space and describe it in terms of its underlying degrees of freedom. Low-level concepts such as “collection of pixels” thereby become integrated into higher-level concepts such as “teapot rotated at this angle”.

In other words, this is how you teach a system abstraction, so something of this nature may be a necessary component of an artificially intelligent system. The only problem is that current methods may be computationally infeasible for this use; approximation seems the natural remedy.
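A toy sketch of the idea, using classical MDS as a computationally cheap stand-in (this is not SDE, and the data and dimensions are invented). A “rotating object” lives in a 100-dimensional pixel space but has a single degree of freedom, the angle, which the embedding recovers:

n     = 200;
theta = linspace(0, 2*pi*(1 - 1/n), n)';   % the hidden degree of freedom
[P, ~] = qr(randn(100, 2), 0);             % orthonormal embedding into "pixels"
X = [cos(theta), sin(theta)] * P';         % n x 100 high-dimensional data
D = squareform(pdist(X));                  % pairwise distances (Stats Toolbox)
Y = cmdscale(D);                           % classical MDS embedding
angleHat = atan2(Y(:,2), Y(:,1));          % recovers theta up to rotation/flip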

"Fit", "Broader Impacts"

The world would be a much better place if everyone stopped worrying about whether ideas “fit” the purposes of their specific organization / community and simply accepted them on their perceived merit (again, my philosophy holds that the absolute merit of an idea is inestimable).

And I’d love to stop having to explain how my theoretical computer science research helps every minority under the sun (but not white males; that’s taboo). First, it’s very difficult to explain how developing a streaming kernel PCA algorithm helps starving children in Africa. Second, my research ultimately helps everyone (by adding to knowledge, which can then be used in all sorts of ways) or no one (if nothing is ever built on it), and which is the case depends entirely on how the ideas are used.

If these things are more important than the quality of the research, it’s no wonder the USA is losing its technical edge!

Streaming Semidefinite Embedding

I’m posting this just for the purpose of timestamping. Today I proposed an idea for streaming kernel learning techniques such as semidefinite embedding. The trick is to feed the kernel matrix in one row and column at a time (actually, since the matrix is symmetric, just one row) and to update using incremental kernel PCA. This yields an algorithm that only needs to store on the order of N elements in memory at a time rather than N^2.
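A hedged sketch of what the update might look like, using a Brand-style incremental eigendecomposition as a stand-in for the incremental kernel PCA step (the function name and details are my own invention, not the actual proposal). With the rank k fixed, only the N-by-k factor is kept in memory, so storage is O(Nk) rather than O(N^2):

function [V, s] = stream_kernel_eig(row_fn, N, k)
% Hypothetical sketch of a streaming eigenupdate. row_fn(t) returns the
% t-th kernel row K(t, 1:t); the full N-by-N kernel matrix is never
% stored, only a rank-k factorisation V*diag(s)*V'.
    V = []; s = [];
    for t = 1:N
        kt = row_fn(t);                    % K(t, 1:t)
        a  = kt(1:t-1)';                   % new off-diagonal column
        d  = kt(t);                        % new diagonal entry
        if t == 1
            V = 1; s = d;
            continue;
        end
        p   = V' * a;                      % component of a in span(V)
        r   = a - V * p;                   % residual orthogonal to span(V)
        rho = norm(r);
        if rho > 1e-12
            q = r / rho;
        else
            q = zeros(t-1, 1); rho = 0;    % a already lies in span(V)
        end
        kk = numel(s);
        % Bordered matrix expressed in the small basis [V, q, e_t]
        M = [diag(s),     zeros(kk,1), p;
             zeros(1,kk), 0,           rho;
             p',          rho,         d];
        [W, L]     = eig((M + M') / 2);
        [lam, idx] = sort(diag(L), 'descend');
        W = W(:, idx);
        % Rotate the expanded basis and truncate back to rank k
        B = [[V, q]; zeros(1, kk + 1)];
        B = [B, [zeros(t - 1, 1); 1]];     % append e_t
        r_keep = min(k, t);
        V = B * W(:, 1:r_keep);
        s = lam(1:r_keep);
    end
end

For example, with a linear kernel generated on the fly:

X = randn(500, 20);
row_fn = @(t) X(t,:) * X(1:t,:)';    % t-th kernel row, computed on demand
[V, s] = stream_kernel_eig(row_fn, 500, 10);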