The InfoWorld metadata explorer

After noodling some more on the question of tag-oriented query and tag discovery [1,2] I came up with an application I'm calling the InfoWorld explorer. It's a Firefox 1.5-only deal for now, partly because I'm relying heavily on Mozilla's XPath search API and partly because only Firefox 1.5 seems to pass this CSS test. So if this link just brings up a big table that doesn't do anything interesting, you can watch the screencast to see what I'm talking about.

Before I look into porting this app to other browsers, or recasting it in a form that's more conventionally server-based, I need to shake it down for a while. In previous efforts along these lines I've focused on search, but here it's all about navigation. You're driving around in the following collection of metadata:

article types: Features (FE), Opinion (OP), Test Center reviews (TC)
author names
dates, written as months in the form 2006-01
tags from del.icio.us/infoworld
URLs
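To make the shape of this metadata concrete, here's a minimal Python sketch of one row. The class name `Item`, its field names, and the `facets` helper are hypothetical, not taken from the actual generator script, and the example URL is a placeholder. The point is that tags are the only multivalued field, and that every value except the URL can act as a filter.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One row in the explorer (hypothetical model of a tagged InfoWorld article)."""

    kind: str              # article type: "FE", "OP", or "TC"
    author: str            # author name
    month: str             # publication month, e.g. "2006-01"
    tags: frozenset[str]   # del.icio.us/infoworld tags: the only multivalued field
    url: str               # link to the article; it navigates rather than filters

    def facets(self) -> set[tuple[str, str]]:
        """Every clickable (field, value) pair on this row, i.e. every filter it offers."""
        pairs = {("kind", self.kind), ("author", self.author), ("month", self.month)}
        pairs.update(("tag", t) for t in self.tags)
        return pairs


# Hypothetical example row; the URL is a placeholder.
example = Item("OP", "Ephraim Schwartz", "2006-01",
               frozenset({"macosx", "mobile"}), "http://example.com/article")
```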

Everything on the page is clickable. The URLs, of course, lead to InfoWorld.com articles. But every other element toggles a filter. Clicking OP hides or reveals non-Opinion items; clicking Ephraim Schwartz hides or reveals non-Ephraim items; clicking macosx hides or reveals items not tagged with macosx.
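The toggling behavior can be sketched as a set of active filters. This sketch assumes that filters combine with AND, so each active value narrows the rows further; the names `facets`, `toggle`, and `visible` and the sample rows are hypothetical, and the real app does its filtering in the browser with JavaScript rather than Python.

```python
def facets(row: dict) -> set[tuple[str, str]]:
    """All clickable (field, value) pairs on a row; each tag is its own facet."""
    found = {("kind", row["kind"]), ("author", row["author"]), ("month", row["month"])}
    found.update(("tag", t) for t in row["tags"])
    return found


def toggle(active: set[tuple[str, str]], facet: tuple[str, str]) -> None:
    """A click activates the filter, or releases it if it is already active."""
    if facet in active:
        active.discard(facet)
    else:
        active.add(facet)


def visible(rows: list[dict], active: set[tuple[str, str]]) -> list[dict]:
    """Rows carrying every active value; an empty filter set shows everything."""
    return [row for row in rows if active <= facets(row)]


# Hypothetical sample rows.
rows = [
    {"kind": "OP", "author": "Ephraim Schwartz", "month": "2006-01", "tags": {"macosx"}},
    {"kind": "FE", "author": "Ephraim Schwartz", "month": "2006-01", "tags": {"mobile"}},
    {"kind": "OP", "author": "Tom Yager", "month": "2005-12", "tags": {"macosx", "virtualization"}},
]

active: set[tuple[str, str]] = set()
toggle(active, ("kind", "OP"))           # hide non-Opinion items: 2 rows remain
toggle(active, ("author", "Tom Yager"))  # narrow further: 1 row remains
toggle(active, ("kind", "OP"))           # release OP: still just Tom's row
```

Because a second click on the same value removes it, every narrowing step can be undone independently, in any order.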

It's really just query-by-example. You start with the whole dataset, which is currently the 800+ InfoWorld features, reviews, and columns that have been tagged at del.icio.us/infoworld, and you narrow or widen the scope by clicking on metadata values. But because tags are multivalued, there's an element of tag discovery that I find really interesting.
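The tag-discovery effect falls out of the multivalued tags: once you've narrowed to rows carrying some set of tags, the other tags on those rows are the related topics worth clicking next. Here's a minimal sketch, assuming each row stores its tags as a Python set; `related_tags` and the sample rows are hypothetical.

```python
from collections import Counter


def related_tags(rows: list[dict], selected: set[str]) -> Counter:
    """Count the other tags on rows that carry every selected tag."""
    counts: Counter = Counter()
    for row in rows:
        if selected <= row["tags"]:                # row survives the current tag filters
            counts.update(row["tags"] - selected)  # its remaining tags are candidates to explore
    return counts


# Hypothetical sample rows (only the tags matter here).
rows = [
    {"tags": {"virtualization", "storage"}},
    {"tags": {"virtualization", "storage", "iscsi"}},
    {"tags": {"virtualization", "cpu"}},
    {"tags": {"storage"}},
]
```

With nothing selected, the counts cover the whole dataset; each added tag narrows the rows and, with them, the pool of related tags.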

The screencast, for example, begins by toggling the virtualization tag. The resulting display differs from del.icio.us/infoworld/virtualization in two ways. First, it's sparser because I'm excluding (for now) the InfoWorld news stories that are the majority of our tagged items in del.icio.us. But second and more importantly, it places the tags in a metadata framework that's specific to this dataset.

At a glance, I can see that the InfoWorld authors most closely associated with virtualization are Galen Gruman and Tom Yager. What's more, I can see how they approach that topic in different ways: Galen's focus is storage virtualization, Tom's is CPU virtualization.
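That at-a-glance reading amounts to a grouping: among rows with a given tag, count the rows per author and tally each author's other tags. Here's a sketch with hypothetical sample rows; `authors_for_tag` is an illustrative name, and the `cpu` tag is a stand-in rather than a confirmed del.icio.us/infoworld tag.

```python
from collections import Counter, defaultdict


def authors_for_tag(rows: list[dict], tag: str):
    """Among rows tagged `tag`, rank authors by row count and tally each author's other tags."""
    counts: Counter = Counter()
    companions: defaultdict = defaultdict(Counter)
    for row in rows:
        if tag in row["tags"]:
            counts[row["author"]] += 1
            companions[row["author"]].update(row["tags"] - {tag})
    return counts.most_common(), companions


# Hypothetical sample rows; tag names beyond those in the text are illustrative.
rows = [
    {"author": "Galen Gruman", "tags": {"virtualization", "storage"}},
    {"author": "Galen Gruman", "tags": {"virtualization", "storage"}},
    {"author": "Tom Yager", "tags": {"virtualization", "cpu"}},
    {"author": "Tom Yager", "tags": {"virtualization", "cpu"}},
    {"author": "Mario Apicella", "tags": {"virtualization", "storage"}},
    {"author": "Mario Apicella", "tags": {"storage"}},
]

ranking, companions = authors_for_tag(rows, "virtualization")
```

The ranking answers "who writes about this?", and each author's companion tags answer "from what angle?"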

Mario Apicella also turns up in the context of storage virtualization. The combination of his name with two tags -- storage and virtualization -- yields just one row. But releasing the virtualization filter yields a flock of results: Mario is our storage specialist. Clicking 2005-11 shows that Mario was, in fact, the author of all five storage-tagged items that month. Releasing the date filter and switching authors to Galen, while retaining the storage filter, reveals the tag iscsi. Switching to that tag surfaces a couple of other authors: Logan Harbaugh and Paul Venezia. Again, the tags tell you something about their interests. By focusing on one or the other, and then releasing the iscsi filter, you can instantly apprehend their full ranges of topics.
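That walk is a series of edits to the active filter set: release one value, add another, or switch the value of a field while keeping the rest. Here's a sketch that replays the first few steps against hypothetical sample rows. The `switch` helper assumes that switching authors means releasing the current author filter and activating the new one, which may not be exactly how the app's clicks behave.

```python
def facets(row: dict) -> set[tuple[str, str]]:
    """Clickable (field, value) pairs on a row; each tag is its own facet."""
    return {("author", row["author"]), ("month", row["month"])} | {("tag", t) for t in row["tags"]}


def visible(rows: list[dict], active: set[tuple[str, str]]) -> list[dict]:
    """Rows matching every active filter."""
    return [r for r in rows if active <= facets(r)]


def switch(active: set[tuple[str, str]], field: str, value: str) -> None:
    """Swap whatever value is active for `field` (e.g. one author for another), keeping other filters."""
    active.difference_update({f for f in active if f[0] == field})
    active.add((field, value))


# Hypothetical sample rows.
rows = [
    {"author": "Mario Apicella", "month": "2005-11", "tags": {"storage", "virtualization"}},
    {"author": "Mario Apicella", "month": "2005-11", "tags": {"storage"}},
    {"author": "Mario Apicella", "month": "2005-10", "tags": {"storage"}},
    {"author": "Galen Gruman", "month": "2005-12", "tags": {"storage", "iscsi"}},
    {"author": "Logan Harbaugh", "month": "2005-09", "tags": {"iscsi"}},
]

active = {("author", "Mario Apicella"), ("tag", "storage"), ("tag", "virtualization")}
step1 = visible(rows, active)                 # author plus two tags: 1 row
active.discard(("tag", "virtualization"))
step2 = visible(rows, active)                 # release virtualization: 3 rows
active.add(("month", "2005-11"))
step3 = visible(rows, active)                 # narrow to one month: 2 rows
active.discard(("month", "2005-11"))
switch(active, "author", "Galen Gruman")
step4 = visible(rows, active)                 # Galen plus storage: 1 row, tagged iscsi
```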

There's a lot of geeky stuff under the covers here. The application is self-contained and works offline; only the clickthroughs to InfoWorld.com articles need a network connection. It's produced by a Python script that writes out a combination of code (JavaScript) and data (XHTML). Although the HTML file doesn't have to be well-formed XML in order for the JavaScript XPath search idioms to work, the fact that it is well-formed means that there's a kind of query portability. If you save the file you've got an XML dataset that will respond to the same XPath queries used in the interactive version.
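The query-portability idea can be sketched with Python's standard `xml.etree.ElementTree`: build a well-formed table, serialize it, reparse the saved text, and run path queries against it. The row layout (`td` cells with `class` attributes, one `a` element per tag) is an assumption, not the generator's actual markup, and the XHTML namespace is omitted for brevity. ElementTree implements only a subset of XPath, so the author and tag conditions are applied in two steps rather than in the single expression a full XPath 1.0 engine could evaluate.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample rows: (type, author, month, tags, url); URLs are placeholders.
ROWS = [
    ("TC", "Mario Apicella", "2005-11", ["storage", "virtualization"], "http://example.com/a"),
    ("TC", "Mario Apicella", "2005-11", ["storage"], "http://example.com/b"),
    ("OP", "Tom Yager", "2005-12", ["virtualization"], "http://example.com/c"),
]


def build_table(rows):
    """Write each item as a <tr>: one <td> per field, one <a> per tag, one link cell."""
    table = ET.Element("table")
    for kind, author, month, tags, url in rows:
        tr = ET.SubElement(table, "tr")
        for cls, text in (("type", kind), ("author", author), ("month", month)):
            ET.SubElement(tr, "td", {"class": cls}).text = text
        tag_cell = ET.SubElement(tr, "td", {"class": "tags"})
        for tag in tags:
            ET.SubElement(tag_cell, "a", {"class": "tag"}).text = tag
        link_cell = ET.SubElement(tr, "td", {"class": "url"})
        ET.SubElement(link_cell, "a", {"href": url}).text = url
    return table


# "Save" the page as text, then reload it as a plain XML dataset.
saved = ET.tostring(build_table(ROWS), encoding="unicode")
reloaded = ET.fromstring(saved)

# Step 1: rows with a cell whose text is the author's name.
marios = reloaded.findall("tr[td='Mario Apicella']")
# Step 2: keep those whose tag cell contains a link with the text "storage" (Python 3.7+).
storage = [tr for tr in marios if tr.find("td[@class='tags']/a[.='storage']") is not None]
```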

But all this is really beside the point. The question is whether it's ultimately useful. If you're an InfoWorld author or editor, it's going to be very useful, and I've got a hunch that this app will motivate some upgrades to Firefox 1.5. People more casually involved with InfoWorld-related data, though, will be less excited to be able to explore it in this way.

Broader appeal will come, I suspect, when people can bring their own filters to the application. Everybody has a unique mix of preferred information sources, authors, and associated tags. Combining those personal filters with site-specific filters, in the context of this kind of application, would be really compelling.

Former URL: http://weblog.infoworld.com/udell/2006/01/23.html#a1375