Del.icio.us is a database

From time to time, I get requests for pointers to one or another of my less technical, more general-interest screencasts. Clearly I needed a bucket for these, so I added a del.icio.us tag that defines the subset of popular/general-interest screencasts. In a similar vein, as I work through the backlog of transcripts of my Friday podcasts, I added a tag that defines the subset of podcasts for which transcripts are available. Although it's intuitively obvious to me, I suspect that most people don't yet appreciate how easily, and powerfully, tagging systems can work as databases for personal (yet shareable) information management.

Del.icio.us isn't simply backed by a database; it can function as a database to which you add (a lot of) queryable columns. For example, I use del.icio.us to keep track of two broad categories of items: those I've written and published to this blog, and those others have written and published elsewhere. I mark the first category with the tag jonudell, so this del.icio.us query:

del.icio.us/judell/ical+jonudell

is kind of like this SQL query:

select * from bookmarks where ical IS NOT NULL and jonudell IS NOT NULL
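To make the analogy concrete, here is a minimal Python sketch of how a +-joined tag query behaves like that SQL conjunction. It is not how del.icio.us is implemented: the BOOKMARKS data, the example.com URLs, and the query function are all hypothetical, invented for illustration.

```python
# Hypothetical bookmarks: URL -> set of tags. The URLs and tag sets are
# illustrative only, not real del.icio.us data.
BOOKMARKS = {
    "http://example.com/my-ical-post": {"ical", "jonudell"},
    "http://example.com/someone-elses-ical-post": {"ical"},
    "http://example.com/my-screencast-post": {"screencast", "jonudell"},
}


def query(tag_expr, bookmarks=BOOKMARKS):
    """Return the URLs that carry every tag in a '+'-joined expression."""
    wanted = set(tag_expr.split("+"))
    # A bookmark matches when the wanted tags are a subset of its own tags,
    # which plays the role of "tagA IS NOT NULL and tagB IS NOT NULL".
    return sorted(url for url, tags in bookmarks.items() if wanted <= tags)


print(query("ical+jonudell"))
print(query("ical"))
```

Each tag acts like a column that is either present or absent for a given bookmark, so adding a tag to the query narrows the result the way an extra IS NOT NULL clause would.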

It's too bad that you can't issue this del.icio.us query:

del.icio.us/judell/ical-jonudell

which would correspond to this SQL query:

select * from bookmarks where ical IS NOT NULL and jonudell IS NULL
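For readers who want to see what that negation would compute, here is a rough sketch using Python's built-in sqlite3 module. It assumes a normalized bookmark_tags table with one row per (url, tag) pair, which is my own illustrative schema and says nothing about how del.icio.us actually stores tags. The URLs and the tagged function are hypothetical too.

```python
import sqlite3

# In-memory database with an assumed schema: one row per (url, tag) pair.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bookmark_tags (url TEXT NOT NULL, tag TEXT NOT NULL);
    INSERT INTO bookmark_tags VALUES
        ('http://example.com/a', 'ical'),
        ('http://example.com/a', 'jonudell'),
        ('http://example.com/b', 'ical'),
        ('http://example.com/c', 'jonudell');
""")


def tagged(include, exclude):
    """Return URLs that have the tag `include` but not the tag `exclude`."""
    rows = conn.execute(
        """
        SELECT DISTINCT t.url
        FROM bookmark_tags AS t
        WHERE t.tag = ?
          AND NOT EXISTS (
              SELECT 1 FROM bookmark_tags AS x
              WHERE x.url = t.url AND x.tag = ?
          )
        ORDER BY t.url
        """,
        (include, exclude),
    )
    return [url for (url,) in rows]


# The hypothetical ical-jonudell query: tagged ical, but not written by me.
print(tagged("ical", "jonudell"))
```

In the one-column-per-tag picture the negation is a simple IS NULL test; in a row-per-tag layout like this one it becomes a NOT EXISTS subquery.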

But as a practical matter, for most people most of the time, this kind of negation isn't too important. You mainly want to be able to easily define, and query for, multiple overlapping subsets. So for example:

del.icio.us/judell/fridaypocast

del.icio.us/judell/fridaypocast+transcriptavailable

I think most people overlook this incredibly useful capability because they're not yet comfortable with queries involving multiple terms. That may be changing, though. I recently heard JotSpot's Joe Kraus say that the old adage that people will, on average, type only 1.3 keywords no longer holds; now they'll type two or three. The samples I've seen from the AOL data spill support that claim.

It strikes me that there's a sweet spot somewhere between this shoestring approach and the likes of Dabble DB, an application that offers powerful web-based data management. Consider how dBase and later Access were overkill for most people's recipe lists and address books, and how 1-2-3 and Excel wound up meeting the need instead. Tag systems might turn out to be the spreadsheets of modern information management.

If you buy that notion, here are a couple of next steps. First, beef up the query language with support for term negation, along with string, numeric, and date comparison.
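As a thought experiment, here is a small Python sketch of what such a beefed-up query might look like. The syntax is entirely made up: space-separated terms, a leading - for negation, and a posted>= term for a date comparison. None of it is an existing del.icio.us feature, and the bookmark records are hypothetical.

```python
from datetime import date

# Hypothetical bookmarks with tags and a posting date (illustrative data only).
BOOKMARKS = [
    {"url": "http://example.com/a", "tags": {"ical", "jonudell"},
     "posted": date(2006, 8, 1)},
    {"url": "http://example.com/b", "tags": {"ical"},
     "posted": date(2005, 3, 15)},
    {"url": "http://example.com/c", "tags": {"ical"},
     "posted": date(2006, 7, 4)},
]

DATE_PREFIX = "posted>="


def matches(bookmark, terms):
    """Check one bookmark against every term of the made-up query syntax."""
    for term in terms:
        if term.startswith("-"):
            # Negation: the bookmark must NOT carry this tag.
            if term[1:] in bookmark["tags"]:
                return False
        elif term.startswith(DATE_PREFIX):
            # Date comparison: posted on or after the given ISO date.
            cutoff = date.fromisoformat(term[len(DATE_PREFIX):])
            if bookmark["posted"] < cutoff:
                return False
        elif term not in bookmark["tags"]:
            # Plain term: the bookmark must carry this tag.
            return False
    return True


def search(query):
    """Return the URLs of bookmarks matching a space-separated query."""
    terms = query.split()
    return [b["url"] for b in BOOKMARKS if matches(b, terms)]


print(search("ical -jonudell"))
print(search("ical -jonudell posted>=2006-01-01"))
```

Putting the minus sign at the start of a term, rather than between terms, keeps hyphenated tags such as longhorn-udell-2004-04 unambiguous. String and numeric comparisons could follow the same pattern as the date term.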

Second, enable privacy along the tag axis as well. For example, when I was collecting resources for my 2004 special report on Longhorn, I used the tag longhorn-udell-2004-04. Private bookmarks weren't then available. Now they are, so I could track a set of such resources privately. But I can't track a set of resources publicly while also tagging them with private identifiers for my eyes only. That'd be handy.
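Here is one way that idea could be modeled, sketched in Python. It assumes each bookmark keeps two tag sets, one public and one private to its owner. The Bookmark class, the find function, and the example.com URLs are hypothetical, and this is not a description of any del.icio.us feature.

```python
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    """A public bookmark whose tags are split into public and private sets."""
    url: str
    public_tags: set = field(default_factory=set)
    private_tags: set = field(default_factory=set)

    def visible_tags(self, viewer_is_owner):
        # The owner sees both sets; everyone else sees only the public tags.
        if viewer_is_owner:
            return self.public_tags | self.private_tags
        return set(self.public_tags)


def find(bookmarks, tag, viewer_is_owner=False):
    """Return URLs whose tags, as visible to this viewer, include `tag`."""
    return [b.url for b in bookmarks if tag in b.visible_tags(viewer_is_owner)]


bookmarks = [
    Bookmark("http://example.com/longhorn-overview",
             public_tags={"longhorn"},
             private_tags={"longhorn-udell-2004-04"}),
    Bookmark("http://example.com/longhorn-rumor",
             public_tags={"longhorn"}),
]

print(find(bookmarks, "longhorn"))
print(find(bookmarks, "longhorn-udell-2004-04"))
print(find(bookmarks, "longhorn-udell-2004-04", viewer_is_owner=True))
```

Both bookmarks stay publicly discoverable under longhorn, while the project-specific tag works as a query term only for the owner.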

Social tagging systems are also darned useful personal information managers. If we thought more about them from that perspective, they could become even more useful.

Update: Deeje Cooley, who works on Flex stuff at Adobe (and who is also one of the co-contributors of the term screencast) pointed me to Adobe Labs' experimental NoteTag. As shown in the NoteTag screencast, this note-taking and task-managing application stores its metadata on a tag server (e.g. del.icio.us) and its freestyle content on a blog server (e.g. blogger.com). Interesting!

Former URL: http://weblog.infoworld.com/udell/2006/08/22.html#a1510