Collaborative filtering with del.icio.us

There are currently 6550 del.icio.us folk with whom I share common bookmarks. As nobody will be surprised to see, my link affinity with that population displays the now-familiar long tail:
[Figure: del.icio.us affinity]
There's a recommendation engine lurking in there somewhere, and I've decided to try to flush it out. The prototype is a two-stroke engine. First, it captures the set of del.icio.us users on the steep part of the curve -- the ones with whom I have the most link affinity. Then it reads all their RSS feeds, coalesces the links, and applies another filter to select just the links above a threshold of commonality.
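Here is a minimal sketch of that two-stroke idea, not the prototype itself. It assumes the bookmarks have already been fetched into plain Python sets, and every name in it (top_affinity_users, recommend, min_shared, min_commonality) is made up for illustration. Dropping links I've already bookmarked is also my assumption about what a useful feed would do.

```python
from collections import Counter


def top_affinity_users(my_links, bookmarks_by_user, min_shared):
    """Stroke 1: keep users who share at least `min_shared` links with me."""
    affinity = {
        user: len(my_links & links)
        for user, links in bookmarks_by_user.items()
    }
    return {user for user, shared in affinity.items() if shared >= min_shared}


def recommend(my_links, bookmarks_by_user, min_shared, min_commonality):
    """Stroke 2: pool the neighbors' links and keep the widely shared ones."""
    neighbors = top_affinity_users(my_links, bookmarks_by_user, min_shared)
    counts = Counter()
    for user in neighbors:
        # Assumption: links I already have are not worth recommending.
        counts.update(bookmarks_by_user[user] - my_links)
    # Most common first; ties broken alphabetically for a stable order.
    return sorted(
        (link for link, n in counts.items() if n >= min_commonality),
        key=lambda link: (-counts[link], link),
    )


if __name__ == "__main__":
    mine = {"a", "b", "c"}
    others = {
        "alice": {"a", "b", "x", "y"},
        "bob": {"b", "c", "x"},
        "carol": {"a", "z"},
    }
    print(recommend(mine, others, min_shared=2, min_commonality=2))
```

With the toy data above, alice and bob clear the affinity threshold, and only the link they both hold clears the commonality threshold.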

I'm still playing with the two thresholds -- personal affinity and link commonality -- but the first cut at a synthesized recommendation feed looks like a promising way to identify an implicit community of interest and tap into its emergent group mind.

Once things settle down I'll publish the code, but meanwhile here's another kind of recommendation for you. Greg Wilson (disclosure: friend of mine) has extended the Pragmatic Bookshelf with a wonderfully pragmatic volume entitled Data Crunching. I'd read it a while ago, but I thought of it again while working on this new del.icio.us hack.

My recommender uses a mixture of shell scripting, Python scripting, regular-expression pattern matching, and XML parsing. Greg's book gently introduces these techniques as well as others: XSLT transformation, packing and unpacking binary data, basic SQL. When, why, and how to combine these methods is something we don't teach often enough or well enough. We've needed a book like this forever; I'm delighted that it has finally arrived.
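To give a flavor of the XML-parsing piece, here is a small sketch that pulls item links out of an RSS document using Python's standard xml.etree.ElementTree module. It is not the recommender's actual code. The sample feed is invented, the helper names (local_name, extract_links) are hypothetical, and matching tags by local name is just one way to avoid assuming a particular RSS flavor or namespace.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def extract_links(feed_xml):
    """Return the <link> text of every <item> in an RSS document, in order."""
    root = ET.fromstring(feed_xml)
    links = []
    for element in root.iter():
        if local_name(element.tag) != "item":
            continue
        for child in element:
            if local_name(child.tag) == "link" and child.text:
                links.append(child.text.strip())
                break  # one link per item is enough
    return links


# Invented sample feed, for illustration only.
SAMPLE = """<rss version="2.0"><channel>
<title>example</title>
<link>http://example.com/</link>
<item><title>One</title><link>http://example.com/1</link></item>
<item><title>Two</title><link>http://example.com/2</link></item>
</channel></rss>"""

if __name__ == "__main__":
    for link in extract_links(SAMPLE):
        print(link)
```

The channel-level link is skipped because only links nested inside an item are collected.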


Former URL: http://weblog.infoworld.com/udell/2005/06/23.html#a1256