A progress report on InfoWorld's del.icio.us experiment

Now that InfoWorld's experiment with del.icio.us tagging has been running for a while, it's a good time to step back and assess how things are going. Let's start with this column on AJAX, which I wrote back in April. If you visit that page, you'll find this widget near the top of the right column:

The See Also links, assigned by editors, point to a pair of stories about Firefox security. That's a useful perspective to add: the column talks about Greasemonkey, and Greasemonkey does raise security concerns.

The tags were also assigned by editors. As a list of words -- AJAX, DHTML, Greasemonkey, JavaScript, appdev, Firefox -- they usefully describe the column. That same list, by the way, can appear in other contexts. This query, for example, returns (in part):

The effect is uneven because only recent articles are tagged, but note how surfacing the tags in this context provides clues that help you decide which items to drill down into. (When you search just my blog, the effect is more pronounced because nearly all my items are tagged.) In the long run we'll want a subtler design -- maybe with the tags inside the title attribute of the link, so they pop up while hovering -- but you get the idea.
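As a rough sketch of that hover idea, here is one way a search-results template might fold an item's tags into the title attribute of its link. The function name `link_with_tags` and the example URL are made up for illustration; this is not how InfoWorld's search pages are actually built.

```python
from html import escape


def link_with_tags(url, text, tags):
    """Return an <a> element whose title attribute lists the item's tags."""
    title = "tags: " + ", ".join(tags)
    return '<a href="{}" title="{}">{}</a>'.format(
        escape(url, quote=True),    # escape &, <, >, and quotes in the URL
        escape(title, quote=True),  # same for the hover text
        escape(text),               # and for the visible link text
    )


# Hypothetical URL and link text; the tags are the ones the editors
# assigned to the AJAX column.
print(link_with_tags(
    "http://www.example.com/ajax-column",
    "A column on AJAX",
    ["AJAX", "DHTML", "Greasemonkey", "JavaScript", "appdev", "Firefox"],
))
```

Most browsers show the title text as a tooltip on hover, so the tags stay out of the way until you ask for them.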

Unlike the See Also items, which are hardcoded links from the column to other articles, the tags associate the column with other articles through a level of indirection that supports several kinds of collaboration.

Consider, for example, the current version of del.icio.us/infoworld/AJAX:

It's possible that the same InfoWorld editor tagged each of these stories, but more likely two or three did. Working as a loosely coupled team, and without really intending to, they've created a short list of what are clearly the three most essential AJAX-related stories InfoWorld has published.

The related tags do a couple of interesting things here. Those that overlap with AJAX provide a kind of thesaurus. If you hadn't heard the term AJAX before, its strong congruence with DHTML and JavaScript would give you an important clue. The outliers are useful in a different way. The fact that Rohit Khare's article is tagged with push_technology, for example, conspires with its title -- What's next after AJAX? -- to suggest how this article differs from the others in the AJAX set.

These collaborative effects are local to a single del.icio.us account, http://del.icio.us/infoworld, which happens to be shared by multiple editors. But even in this view, we can see global effects. Each of these three AJAX stories was also tagged by other del.icio.us users. The number of people who bookmarked each story tells us, a priori, something about the level of interest in that story. When you click on the "and NN other people" links, you can see how the group mind has processed them. Here, for example, are the common tags assigned by the 68 people who bookmarked Peter Wayner's story:

common tags
55 ajax
23 javascript
8 dhtml
8 programming
7 web
5 internet
5 xml
4 article
3 xmlhttprequest
3 development
2 webdev
2 history
2 dev
2 webdesign

As we'd expect, the dominant tags are AJAX, JavaScript, and DHTML. But at this level XMLHttpRequest also shows up. That hadn't occurred to the InfoWorld editor who tagged this story, but the fact that three other people used the tag could influence future decisions.

It might also make sense to mine the database, compare the community's vocabulary to our own, and adjust accordingly. In the del.icio.us screencast, I do that comparison and adjustment manually. Of course it could also be automated.
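Here is a minimal sketch of what the automated version of that comparison might look like. The community counts are copied from the common tags list for Peter Wayner's story. The editor's tag list is an assumption (I haven't shown which tags the editor actually used), and `suggest_tags` and its `min_count` threshold are names invented for this example.

```python
from collections import Counter

# Community counts, copied from the common tags list for Peter Wayner's story.
community = Counter({
    "ajax": 55, "javascript": 23, "dhtml": 8, "programming": 8,
    "web": 7, "internet": 5, "xml": 5, "article": 4,
    "xmlhttprequest": 3, "development": 3, "webdev": 2,
    "history": 2, "dev": 2, "webdesign": 2,
})

# ASSUMPTION: a stand-in for the tags the InfoWorld editor assigned.
editor_tags = ["AJAX", "DHTML", "JavaScript"]


def suggest_tags(community_counts, own_tags, min_count=3):
    """List community tags we don't use, most popular first.

    Tags are compared case-insensitively, and only tags used by at
    least min_count people are suggested.
    """
    own = {tag.lower() for tag in own_tags}
    return [
        (tag, count)
        for tag, count in community_counts.most_common()
        if count >= min_count and tag.lower() not in own
    ]


for tag, count in suggest_tags(community, editor_tags):
    print(count, tag)
```

With the threshold at 3, xmlhttprequest makes the list of suggestions. Whether to act on a suggestion is still an editorial call; the script only surfaces the candidates.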

Consider, for example, this InfoWorld-assigned tag: del.icio.us/infoworld/Athlon_64. If you look in that bucket you'll find two InfoWorld stories. In the corresponding global bucket -- del.icio.us/tag/athlon_64 -- you'll find the same two stories. Nobody else is using that tag. However, people are putting items into this global bucket: del.icio.us/tag/Athlon64.

This, to me, is the most fascinating aspect of del.icio.us. There are no right or wrong categorizations, there are only statistical clusters. The Athlon_64 cluster that InfoWorld's editors have created is useful, in the same way that the AJAX cluster is useful. It's a collaborative way to put related items into a bucket. Or, another way I like to think about this, it's a collaborative way to create a self-updating list. If I give you the del.icio.us/infoworld/athlon_64 link today, it's a list of two items. But if you resolve that link in July, it might be a list of four items. Recall also that every del.icio.us tag is an RSS feed. So you can subscribe to InfoWorld's Athlon_64 stories.

The Athlon_64 cluster is, however, not as useful as it might be. The Athlon64 cluster is bigger and more active. Nothing requires us to bow to the will of the group mind -- by changing Athlon_64 to Athlon64, or perhaps by using Athlon64 in addition to Athlon_64 -- but it would be in our interest to do so. If more people are congregating around Athlon64 than around Athlon_64, why wouldn't we want to be there too?

In general, I'm getting the impression that InfoWorld's use of underscores in tags isn't common. Since it's easy to add or rename tags, and since there's no penalty for maintaining variants, it might be useful to take a pass through InfoWorld's items and add variants that don't use underscores.
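That pass could be partly mechanical. The sketch below expands a list of tags so that each underscored tag is followed by its underscore-free variant. The helper name `with_variants` is hypothetical, and actually writing the new tags back to del.icio.us is left out.

```python
def with_variants(tags):
    """Return the tags plus underscore-free variants, in order, without duplicates."""
    result = []
    seen = set()
    for tag in tags:
        # Keep the original tag, then add the variant with underscores removed.
        for candidate in (tag, tag.replace("_", "")):
            if candidate not in seen:
                seen.add(candidate)
                result.append(candidate)
    return result


print(with_variants(["Athlon_64", "push_technology", "AJAX"]))
# ['Athlon_64', 'Athlon64', 'push_technology', 'pushtechnology', 'AJAX']
```

Keeping the original tag alongside the variant preserves the existing InfoWorld bucket while also placing the item in the busier global one.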

It would be interesting to do a full analysis of the intersection between the global tagspace and InfoWorld's tagspace, across the set of InfoWorld-tagged items. The ability to visualize and tune these kinds of intersecting vocabularies is clearly one of the next frontiers for social tagging.

