Thanks to Rahul Dave for a pointer to Taher H. Haveliwala's WWW2002 paper on topic-sensitive page ranking. It takes a while to download all the PNGs that display the equations, but the key points of this mathematically dense paper are easy to grasp:
- It's computationally feasible today to precompute sets of topically biased PageRank vectors. In simple terms, this means you could differentiate between blues in an Arts sense (the musical genre) and blues in a Health sense (depression), using the Open Directory as a source of classification data.
- A fruitful area for future work is the incorporation of context into queries. For example: "A search for 'basketball' followed up with a search for 'Jordan' presents an opportunity for disambiguating the latter."
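To make the two points concrete, here is a toy Python sketch. It is not Haveliwala's implementation. The four-page graph, the `arts/` and `health/` page names standing in for Open Directory categories, the teleport probability `alpha=0.15`, and the hand-set topic weights are all assumptions for illustration. Each topic gets its own precomputed PageRank vector whose random jump lands only on that topic's pages. At query time the vectors are blended with weights that, in the spirit of the second point, might come from context such as a prior query.

```python
# Hypothetical toy web: page -> pages it links to.
# The "arts/" and "health/" prefixes stand in for Open Directory categories.
LINKS = {
    "arts/jazz": ["arts/blues-music"],
    "arts/blues-music": ["arts/jazz"],
    "health/mood": ["health/blues-depression"],
    "health/blues-depression": ["health/mood"],
}


def biased_pagerank(links, teleport, alpha=0.15, iters=100):
    """Power iteration for PageRank whose random jump follows `teleport`.

    alpha is the jump probability (an illustrative choice); teleport maps
    pages to jump probabilities summing to 1 (missing pages get 0).
    """
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        # With probability alpha the surfer jumps to a page drawn from teleport.
        new = {p: alpha * teleport.get(p, 0.0) for p in pages}
        for p in pages:
            out = links[p]
            if out:
                # Otherwise rank flows evenly along outgoing links.
                share = (1 - alpha) * rank[p] / len(out)
                for q in out:
                    new[q] += share
            else:
                # Dangling page: redistribute its rank via the teleport vector.
                for q in pages:
                    new[q] += (1 - alpha) * rank[p] * teleport.get(q, 0.0)
        rank = new
    return rank


def topic_teleport(pages, topic):
    """Uniform jump distribution over the pages filed under one topic."""
    members = [p for p in pages if p.startswith(topic + "/")]
    return {p: 1.0 / len(members) for p in members}


# Precomputed offline: one biased rank vector per topic.
TOPICS = ["arts", "health"]
TOPIC_RANKS = {t: biased_pagerank(LINKS, topic_teleport(LINKS, t)) for t in TOPICS}


def rank_results(matches, weights):
    """Order matching pages by a weighted blend of the topic vectors.

    weights maps topic -> importance for this query (hypothetical values,
    e.g. nudged toward "arts" after a prior music-related query).
    """
    def score(page):
        return sum(w * TOPIC_RANKS[t][page] for t, w in weights.items())
    return sorted(matches, key=score, reverse=True)


matches = [p for p in LINKS if "blues" in p]
print(rank_results(matches, {"arts": 0.9, "health": 0.1}))
print(rank_results(matches, {"arts": 0.1, "health": 0.9}))
```

With weights tilted toward Arts, `arts/blues-music` comes first; tilting them toward Health puts `health/blues-depression` first. The expensive part, one PageRank run per topic, happens ahead of time, and the query-time blend is just a weighted sum, which is what makes the approach feasible.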
An architecture of community-based page collections would, in principle, supply a lot of context that could be used to disambiguate queries. Hopefully RCS and systems like it will move us in that direction.
Former URL: http://weblog.infoworld.com/udell/2002/05/09.html#a228