Tag clouds that manage data in the cloud

Tag clouds have become a favorite way to visualize unstructured textual data. Run the 2007 US State of the Union address through a tag cloud generator and the words that dominate -- iraq, terrorist, qaeda, baghdad -- tell you a lot about what George W. Bush said and what was top-of-mind for Americans in that moment.

There's a very different way to use tag clouds, though. They can work with structured data too. And you can use tag clouds not only to visualize, but also to manage, such data. I first hit on this idea back when del.icio.us (as it was originally spelled) was one of the darlings of the mid-2000s Web 2.0 movement. That was before Joshua Schachter's brainchild was acquired by Yahoo! -- and, years later, by AVOS.

What I discovered, back in 2006, was that del.icio.us is a database. Now obviously the service, like all tag-oriented services, runs on top of a database. That's not what I meant. Instead I meant that Delicious is a database. More precisely, it's a cloud-based database that enables people to craft useful applications to manage personal or shared data. And those applications (in their basic form) require no conventional programming. The trick can be accomplished purely by means of disciplined use of tag names and tagging conventions.

You can still use Delicious that way, but the AVOS makeover obscures the possibility. Also, the AVOS version is free. As I said in the inaugural column of this series, that's not necessarily a good thing. Information may or may not want to be free, but services that manage free information want to be valuable. So nowadays I use a commercial service, Pinboard, which faithfully replicates (and in some ways improves on) the original del.icio.us service, and I'll use it here to illustrate the unconventional style of database development that I pioneered in Delicious.

There's a particular kind of application for which this approach is best suited: directories of web resources. It's a niche, to be sure, but one that we all find ourselves occupying at one time or another. Maybe you're a web DJ making playlists. Maybe you're a locavore building a directory of local food producers. Maybe you're an environmental group cataloguing geotagged photos of drainage culverts. In all these cases there's a common thread woven through the data you need to manage: URLs. For the playlists you link to MP3s. For the local food producers you link to their websites. For the drainage culverts you link to images. In each case you are making a list of web resources. The most natural way to do that is to bookmark them. The most effective way to do that bookmarking is to do it in the cloud. But what then? How do you turn those bookmarks into an online directory that people can view and interact with?

The answer is simple but not obvious, which is why most people will conclude that a cloud spreadsheet on Google Drive or SkyDrive is the right solution. And indeed those are good solutions. The alternative I'll show you here isn't for everyone; it's for power users. But if you are a power user, or aspire to be one, read on.

Here's a real problem I had to solve recently. As part of my calendar syndication project I needed to build a particular kind of online resource directory. It's a compendium of applications and services that people can use to publish iCalendar feeds, along with examples of live sites that are using those applications and services to manage their web calendars. Conceptually it's just two parallel lists of bookmarks. Over time, as I find URLs that belong on one or another of those lists, I bookmark them. How do I target a bookmark to one or the other list? Using tags. Everything on the list of example sites is tagged ical-example. Everything on the list of iCalendar producers supporting those examples is tagged ical-producer.

Because Pinboard, like Delicious before it, can deliver views of bookmarks filtered by individual tags (or, as we'll see, by combinations of tags), I can point to those two lists like so: ical-example, ical-producer. So far so good. But how can we match examples to producers?

As you may have guessed, by using more tags. Let's consider one example/producer pair. The example is the Brown University calendar. The corresponding producer is Bedework, an open-source enterprise calendar system that supports (and, indeed, as a member of CalConnect, has participated in the development of) Internet calendar standards. So when I bookmarked those two URLs -- Brown's web calendar and Bedework's home page -- I joined them with a common tag, like so:

Brown           Bedework
ical-example    ical-producer
id:brown        id:brown

As a result, when you locate the Brown example in the examples list, it links, via the id:brown tag, to Bedework. Likewise, when you locate Bedework in the producers list, it links, again via the id:brown tag, to Brown.
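
Seen as a database operation, that cross-link is a join on the id: tag. Here is a minimal sketch of the idea in Python. The record shape (a title plus a set of tags) and the pair_by_id helper are assumptions made for illustration; only the tag names come from the example above, and nothing here is Pinboard's own API.

```python
from collections import defaultdict

# Hypothetical bookmark records: (title, set of tags). Only the tag names
# come from the text; the record shape is an assumption for illustration.
BOOKMARKS = [
    ("Brown", {"ical-example", "id:brown"}),
    ("Bedework", {"ical-producer", "id:brown"}),
]


def pair_by_id(bookmarks):
    """Group bookmark titles by id: tag, split into examples and producers."""
    pairs = defaultdict(lambda: {"examples": [], "producers": []})
    for title, tags in bookmarks:
        for tag in tags:
            if not tag.startswith("id:"):
                continue
            if "ical-example" in tags:
                pairs[tag]["examples"].append(title)
            if "ical-producer" in tags:
                pairs[tag]["producers"].append(title)
    return dict(pairs)


pairs = pair_by_id(BOOKMARKS)
print(pairs["id:brown"])
# {'examples': ['Brown'], 'producers': ['Bedework']}
```

The service itself never performs this join. It only serves the two filtered views, and the shared tag lets a reader (or a script) hop from one to the other.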

The syntax I'm using here implies an id: namespace that's meaningful in some way. And it is, but only because I'm imposing that meaning. From Pinboard's point of view there's nothing special about colon-delimited prefix/suffix combinations. To it, id:brown is just a tag name, a string of characters like ical-producer or xyz123. I, however, can choose to regard id: as a namespace that links examples to producers.
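
Since the namespace exists only by convention, any code that consumes these tags has to impose the convention itself. A small sketch, assuming the rule is simply "split on the first colon"; the split_tag helper is hypothetical:

```python
def split_tag(tag):
    """Split a tag into (namespace, value) on the first colon.

    Tags without a colon get an empty namespace. This convention lives
    entirely in the client code; the tagging service only sees a string.
    """
    namespace, sep, value = tag.partition(":")
    if not sep:
        return "", tag
    return namespace, value


print(split_tag("id:brown"))       # ('id', 'brown')
print(split_tag("ical-producer"))  # ('', 'ical-producer')
```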

I can invent other namespaces too. For example, iCalendar producers are often associated with content management systems. I'm often asked how WordPress blogs can host calendars that provide iCalendar feeds. So I keep a list of WordPress plugins that can do that. So far I've found four of them, and I can point to that list using the tag combo cms:wordpress+ical-producer. Or I can point to the parallel list of sites that illustrate the use of those plugins using the tag combo cms:wordpress+ical-example.

Here again the parallel lists are linked by way of the id: convention. So if you locate GigPress in the producers list, it points to the Boston Music Intelligencer, which uses that WordPress plugin. And if you locate the Boston Music Intelligencer in the examples list, it points to GigPress, the provider of the plugin. Both link, via cms:wordpress, to the full set of WordPress examples and producers. When I discover another plugin/example pair I'll just coin an id: tag for it and save a pair of bookmarks pointing to the plugin and its example. The id: tag will link them together. The ical-producer and ical-example tags will target those respective lists. And the cms:wordpress tag will link both items to the family of WordPress-related examples and producers.
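
The whole discipline fits in a few lines. The sketch below, in Python, shows which tags a new pair receives and how a tag combination selects a view. The helpers tags_for_pair and view are hypothetical, and so is the identifier "gigpress", which I'm inventing here purely for illustration.

```python
def tags_for_pair(pair_id, cms=None):
    """Return (example_tags, producer_tags) for a new example/producer pair."""
    shared = {f"id:{pair_id}"}
    if cms:
        shared.add(f"cms:{cms}")
    return shared | {"ical-example"}, shared | {"ical-producer"}


def view(bookmarks, *tags):
    """Return titles of bookmarks that carry every one of the given tags."""
    wanted = set(tags)
    return [title for title, have in bookmarks if wanted <= have]


# "gigpress" is a made-up identifier used only for this illustration.
example_tags, producer_tags = tags_for_pair("gigpress", cms="wordpress")
BOOKMARKS = [
    ("Boston Music Intelligencer", example_tags),
    ("GigPress", producer_tags),
    ("Brown", {"ical-example", "id:brown"}),
    ("Bedework", {"ical-producer", "id:brown"}),
]

print(view(BOOKMARKS, "cms:wordpress", "ical-producer"))  # ['GigPress']
print(view(BOOKMARKS, "cms:wordpress", "ical-example"))   # ['Boston Music Intelligencer']
print(view(BOOKMARKS, "cms:wordpress"))
# ['Boston Music Intelligencer', 'GigPress']
```

A combination of tags behaves like an AND across conditions, which is why the view function tests that the wanted tags are a subset of each bookmark's tags.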

The result is a densely interlinked and usefully navigable web of information. Basic functions, like sorting views by title or time, come for free. And the tag cloud shown in every view highlights important items. In the producers view, for example, cms:wordpress dominates the cms: namespace.
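
What the tag cloud computes for a view is essentially a frequency count over the tags in that view. Here's a rough sketch in Python, restricted to one namespace. The sample view is invented: the id: values, the cms:drupal tag, and the counts are placeholders, not figures from my actual directory.

```python
from collections import Counter

# Hypothetical producers view: each entry is one bookmark's tag set.
# The tags and counts are invented for illustration only.
PRODUCERS_VIEW = [
    {"ical-producer", "cms:wordpress", "id:a"},
    {"ical-producer", "cms:wordpress", "id:b"},
    {"ical-producer", "cms:drupal", "id:c"},
    {"ical-producer", "id:brown"},
]


def cloud(view, namespace):
    """Count the tags in one namespace, most frequent first."""
    prefix = namespace + ":"
    counts = Counter(
        tag for tags in view for tag in tags if tag.startswith(prefix)
    )
    return counts.most_common()


print(cloud(PRODUCERS_VIEW, "cms"))
# [('cms:wordpress', 2), ('cms:drupal', 1)]
```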

That interface is, admittedly, unconventional. So I also provide a conventional view that flows the information through an HTML template. That required a bit of programming, to be sure, but only a bit. Like Delicious before it, Pinboard provides data feeds for views driven by individual tags or combinations of tags. For anyone with even minimal programming skill it's easy to fetch those feeds, recombine them, and flow them through templates. That isn't what limits the power of this approach. Instead the only limit is your imagination. If you can dream up a way to use tags to organize sets of resources, and then apply that method consistently, any service that supports free tagging -- and also supports queries for tags or combinations of tags -- is a cloud platform you can use to manage directories of resources in the cloud.
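
To give a sense of how little programming the conventional view needs, here is a sketch of the templating step using only Python's standard library. It assumes the feeds have already been fetched and joined into (example, example URL, producer, producer URL) tuples. The fetch itself is left out because the feed URL format and field names depend on the service, and the URLs below are placeholders. This is not the code behind my own directory.

```python
from html import escape
from string import Template

# One list item per example/producer pair.
ROW = Template(
    '<li><a href="$example_url">$example</a> uses '
    '<a href="$producer_url">$producer</a></li>'
)


def render(pairs):
    """Render (example, example_url, producer, producer_url) tuples as HTML."""
    rows = [
        ROW.substitute(
            example=escape(ex),
            example_url=escape(ex_url, quote=True),
            producer=escape(pr),
            producer_url=escape(pr_url, quote=True),
        )
        for ex, ex_url, pr, pr_url in pairs
    ]
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"


# Placeholder URLs; real ones would come from the service's data feeds.
page = render([
    ("Brown", "https://example.org/brown", "Bedework", "https://example.org/bedework"),
])
print(page)
```

Escaping titles and URLs before substitution matters because bookmark titles are free text and may contain characters that are special in HTML.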