Searching academic papers

While doing a literature review on online decision tree construction algorithms, it struck me that Google’s PageRank model, though designed to rank web pages as if they were papers in academic journals, is backwards for certain types of literature reviews. Suppose, for example, that I am looking for state-of-the-art research rather than “classic” results in the field.

Under the PageRank model, papers that are cited often receive high scores (depending on the number of citations and the PageRank of the citing papers), while most citing papers tend to receive lower scores, especially if they are new. However, if I am interested in the state of the art, I am most interested in papers that build upon many previous results; that is, papers with many outgoing citations. I would also like to eliminate very frequently cited sources (such as The Art of Computer Programming or Introduction to Algorithms, which are cited for such trivial concepts as the definition of a tree) from the analysis, as those citations have lost almost all meaning.

This is the exact opposite of what the PageRank algorithm does as described in “The Anatomy of a Large-Scale Hypertextual Web Search Engine”. What I want is closer to techniques such as tf-idf, if anything.
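To make the idea concrete, here is a minimal sketch of one way such a ranking could work. It is only an illustration under my own assumptions, not an established method: each paper is scored by its outgoing citations, and each cited work is weighted by an idf-style factor, log(N / number of papers citing it), so that a source cited by every paper contributes nothing. The function name `outgoing_citation_scores` and the toy corpus are hypothetical.

```python
import math


def outgoing_citation_scores(citations):
    """Score papers by idf-weighted outgoing citations.

    `citations` maps each paper to the set of works it cites.
    This is a hypothetical scoring scheme, not PageRank.
    """
    n_papers = len(citations)

    # Count how many papers in the corpus cite each work
    # (the analogue of document frequency in tf-idf).
    cited_by = {}
    for refs in citations.values():
        for ref in set(refs):
            cited_by[ref] = cited_by.get(ref, 0) + 1

    # A work cited by every paper gets weight log(1) = 0;
    # rarely cited works get the largest weights.
    scores = {}
    for paper, refs in citations.items():
        scores[paper] = sum(
            math.log(n_papers / cited_by[ref]) for ref in set(refs)
        )
    return scores


# Toy corpus (made up for illustration): everyone cites "TAOCP".
corpus = {
    "survey_2008": {"TAOCP", "classic_1986", "classic_1993", "online_2000"},
    "online_2000": {"TAOCP", "classic_1986"},
    "classic_1993": {"TAOCP", "classic_1986"},
    "classic_1986": {"TAOCP"},
}

scores = outgoing_citation_scores(corpus)
for paper in sorted(scores, key=scores.get, reverse=True):
    print(f"{paper}: {scores[paper]:.3f}")
```

In this toy corpus the paper that builds on the most prior work ranks first, while the citation to the universally cited textbook adds nothing to anyone's score. A real system would presumably also want to account for publication date and citation quality, which this sketch ignores.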
