More collaborative filtering

When I reran my collaborative filter last night, I forgot to limit the degree of personal affinity. So I wound up checking all of the 6789 folks with whom I share common bookmarks -- that's a few hundred more than a week ago. ("Please wait AT LEAST ONE SECOND between queries," the API documentation says; my script does so.) The group included the minority on the steep part of this curve, with whom I share dozens of common links, plus the majority with whom I share just one link, plus everyone in between.

As before, the report mentions only those items bookmarked by at least three people besides me, ranked from most to least popular.
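For the curious, here is a minimal sketch of that kind of report. It is not the actual script: the names `gather_bookmarks`, `recommend`, `fetch`, and `min_shared` are hypothetical, `fetch` stands in for whatever API call returns one user's bookmarks, and it assumes that items already in my own bookmarks are left out of the report.

```python
import time
from collections import Counter


def gather_bookmarks(users, fetch, delay=1.0, sleep=time.sleep):
    """Fetch each user's bookmarks, pausing `delay` seconds between queries.

    `fetch` is a hypothetical callable: user name -> iterable of URLs.
    """
    bookmarks = {}
    for i, user in enumerate(users):
        if i > 0:
            sleep(delay)  # wait at least one second between queries
        bookmarks[user] = set(fetch(user))
    return bookmarks


def recommend(mine, others, min_count=3, min_shared=1):
    """Rank URLs I lack by how many sufficiently similar users bookmarked them.

    mine:       set of my bookmarked URLs
    others:     dict of user -> set of that user's URLs
    min_count:  keep only items bookmarked by at least this many other users
    min_shared: affinity limit; skip users sharing fewer links than this with me
    """
    counts = Counter()
    for urls in others.values():
        urls = set(urls)
        if len(urls & mine) < min_shared:
            continue  # not enough common bookmarks
        counts.update(urls - mine)  # assumption: exclude what I already have
    # Most popular first; ties broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(url, n) for url, n in ranked if n >= min_count]
```

Leaving `min_shared` at 1 reproduces the "forgot to limit the affinity" case: everyone who shares even a single link gets counted.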

Reading from the top down, there were few surprises. Most everything above a certain threshold of popularity had already come to my attention. But reading from the bottom up yielded some gems. In the following examples, the parenthesized numbers are Bloglines' count of references to these items.

Podiobooks - Serialized audio books in podcast form (40)

Authors receive one half of all the proceeds from the donations from listeners. The other half goes to the maintenance and upkeep of

Find-A-Human -- IVR Phone System Shortcuts (0)

Chase. 800-CHASE24. Hit five, pause, then hit one, four, star, zero.

OpenMap(tm) (22)

BBN Technologies' OpenMap(tm) package is an Open Source JavaBeans(tm) based programmer's toolkit.

Drawn! The Illustration Blog (666)

Drawn! is a collaborative weblog for illustrators, artists, cartoonists, and anyone who likes to draw.

State Machine (40)

This is a dynamic visualization of the relationship of the members of the United States Senate to different sources of campaign funding.

Howtoons (45)

Howtoons are one-page cartoons showing 5-to-15 year-old kids "How To" build things.

Snippets 0.2 (152)

Snippets is a public code repository. You can easily add code to your personal collection of code snippets, categorize your code snippets with keywords (known as 'tags'), and share your snippets via this site.

Fundable (1)

Fundable is a new service that lets groups of people pool money to raise funds or make purchases.

Trend graphing | BlogPulse (267)

BlogPulse applies machine-learning and natural-language processing techniques to discover trends in the highly dynamic world of blogs.

StormReportMap.Com - An Interactive Look at the Latest Storm Reports (12)

This site uses data from the Storm Prediction Center (SPC).

A tutorial on character code issues (30)

This document tries to clarify the concepts of character repertoire, character code, and character encoding especially in the Internet context.

- the generative art resource (17)

is a collection of work and research by various artists interested in the possibilities of generative art.

The first version of the list was ridiculously long but, if I use it as a filter, future lists should be much more manageable. Ideally a fully automated process will be able to produce a set of items interesting to me and -- if you are one of my 6789 (and counting) fellow travelers -- useful to you as well. Until I determine whether that will work, though, I'll manually prune the list as I've done here, and use those results to update the feed.
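Using an earlier list as a filter could look something like the sketch below. The function name `new_items` and the idea of keeping previously reported URLs in a plain set are assumptions for illustration, not a description of the real process.

```python
def new_items(ranked, already_reported):
    """Drop items that appeared in an earlier report.

    ranked:           list of (url, count) pairs, most popular first
    already_reported: set of URLs seen in previous reports (hypothetical store)
    """
    return [(url, n) for url, n in ranked if url not in already_reported]


def remember(ranked, already_reported):
    """Return the earlier set extended with every URL in this report."""
    return already_reported | {url for url, _ in ranked}
```

Each run would call `new_items` first, then `remember`, so the next report shows only what has turned up since.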

This is starting to feel like the kind of tangential discovery I'm aiming for. For example, though I'm not an artist, I've lately gotten more interested in ways to visualize information. That's why a number of the links that caught my eye are related to illustration. Some of these I would probably have found in due course, but others I probably wouldn't have.

The fact that the Bloglines counts are all over the map is something I regard as a feature rather than a bug. Measuring popularity isn't the goal here. We have plenty of Top 100 lists. It's great to find a popular item that I've missed (like BlogPulse), but the goal here is to mine the long tail.

Of course one person's long tail is the steep part of another person's curve. So arguably this really is about popularity, but in a more nuanced way. My bookmarks (and my RSS subscriptions) declare an affinity with a group that is mainly interested in Internet technologies. Social bookmarking and blogging inform me about interesting developments in that realm, and they do so in a way that is remarkably natural, reliable, and comprehensive. But there isn't yet a natural, reliable, or comprehensive way to connect me to interesting developments in tangentially related realms, where interest is determined by a group connected to mine by weak ties. Nothing yet automates the bridging function of the Gladwellian connector. That might be doable, and it's certainly worth a try.

