Dynamic categories

A while back I stopped assigning the items I post here to categories. It wasn't because I couldn't be bothered to do the categorization. Quite the contrary, I'm really interested in achieving that result, and more than willing to put some effort into it. But, although I'm generally a huge proponent of the publishing technique I call static serving of dynamically-generated pages, it increasingly seemed like the wrong way to deal with categories.

Lately it has become clear that the XPath search technology I've been working with will enable a fully dynamic approach to categories. For example, after posting yesterday's item, it struck me that the two labels I'd have wanted to attach to it were books and AV clips. So I added these two queries to the list of canned queries on the search page:

books: //p[contains(.//a/@href,'amazon.com') or contains(.//a/@href,'allconsuming')]

AV clips: //p[contains(.//a/@href,'.mp3') or contains(.//a/@href,'.wav') or contains(.//a/@href,'.mov') or contains(.//a/@href,'.ram')]

Each of these queries finds yesterday's item (and this one too, actually). Each also forms a result page that could serve as a category page. There are a bunch of other queries that haven't been written down yet, but that implicitly categorize the same item in other ways. For example: Doc Searls quotations. Or Jeremy Rifkin's The Age of Access. Query. Gotta love it.
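Here's a minimal sketch of the "query is the category" idea, using only Python's standard library. The entry keys, sample markup, and function names are all hypothetical. Because ElementTree's XPath subset has no contains(), the sketch does the substring test in plain Python rather than running the queries above verbatim. One caveat worth knowing: in XPath 1.0, contains() applied to a node-set such as .//a/@href tests only the first node, so the queries as written look at the first link in each paragraph, whereas this sketch checks every link.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample entries (assumption): real entries would be full XHTML
# documents, usually in the XHTML namespace, which this sketch leaves out.
ENTRIES = {
    "2004/01/14": (
        "<div>"
        "<p>Reading <a href='http://www.amazon.com/example-book'>a book</a>.</p>"
        "<p>Listen to <a href='http://example.com/talk.mp3'>this clip</a>.</p>"
        "</div>"
    ),
    "2004/01/10": "<div><p>Just <a href='http://example.com/'>a plain link</a>.</p></div>",
}

# Each category is a name plus the substrings its query looks for in link targets.
CATEGORIES = {
    "books": ("amazon.com", "allconsuming"),
    "AV clips": (".mp3", ".wav", ".mov", ".ram"),
}

def matching_paragraphs(root, needles):
    """Return the <p> elements containing any link whose href includes a needle."""
    hits = []
    for p in root.iter("p"):
        hrefs = [a.get("href", "") for a in p.iter("a")]
        if any(needle in href for href in hrefs for needle in needles):
            hits.append(p)
    return hits

def categorize(entries, categories):
    """Map each category name to the keys of the entries its query matches."""
    result = {name: [] for name in categories}
    for key, xhtml in entries.items():
        root = ET.fromstring(xhtml)
        for name, needles in categories.items():
            if matching_paragraphs(root, needles):
                result[name].append(key)
    return result

category_pages = categorize(ENTRIES, CATEGORIES)
```

Adding a category here means adding one line to CATEGORIES. Nothing gets re-tagged or republished, which is the appeal of the dynamic approach.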

I also added some instrumentation to the search page that reports the number of entries searched (213, as of this one), and the date of the earliest entry searched (April 2003). Here are some next steps:

XHTML-ize the 500+ earlier entries. That's done, pending some cleanup, thanks to HTMLTidy.
Just say no to WYSIWYG editors, such as the MS DHTML edit control, that insist on mangling your content.
Expand the roster of queries. The earlier entries contain implicit metadata that, once exposed to search, will suggest additional query possibilities.
RSS-ify queries. This handy technique, already practiced by Technorati, Feedster, and others, could be quite interesting in this context. If I refine a query so that it reshapes a category, you'd be notified. Could be annoying too, if I fiddle around too much, but we'll see how it goes.
Upgrade the search server. I'm currently running with an almost perversely minimal setup. The next incarnation, which is in the pipeline, uses Berkeley DB XML instead of a bare XSLT processor. I'm still deciding whether to stick with Python's mini-httpd (BaseHTTPServer), or switch to something else.

But here's a larger issue to consider. Most bloggers don't have the ability to maintain any non-standard server-side infrastructure. So if this approach is going to scale, it can't require that. I've been thinking about this for a while. It ties back to RSS. Any feed that includes well-formed XHTML content can deliver that content to a search service. So Technorati, or Feedster, or another service that's already in the business of aggregating and searching feeds could also offer XPath (or ultimately XQuery) services. I would love to see that happen.
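To make the feed-based idea concrete, here's a rough sketch of what the search side might do with such a feed. It assumes, purely for illustration, that each RSS item carries its content as a namespaced XHTML body element. The feed text, URLs, and function name are hypothetical, and a real service would presumably evaluate actual XPath or XQuery rather than this hand-rolled substring test.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Hypothetical feed (assumption): each item embeds well-formed XHTML in a
# namespaced <body> element, so the content can be parsed and searched directly.
FEED = """<rss version="2.0">
  <channel>
    <title>Example weblog</title>
    <item>
      <title>Books and clips</title>
      <link>http://example.com/2004/01/14.html</link>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>See <a href="http://www.amazon.com/example-book">this book</a>.</p>
      </body>
    </item>
    <item>
      <title>Nothing to see</title>
      <link>http://example.com/2004/01/10.html</link>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>Just <a href="http://example.com/">a link</a>.</p>
      </body>
    </item>
  </channel>
</rss>"""

def search_feed(feed_xml, needles):
    """Return (title, link) for each item whose XHTML body links to any needle."""
    channel = ET.fromstring(feed_xml).find("channel")
    hits = []
    for item in channel.findall("item"):
        body = item.find(XHTML + "body")
        if body is None:
            continue  # item has no well-formed XHTML content to search
        hrefs = [a.get("href", "") for a in body.iter(XHTML + "a")]
        if any(needle in href for href in hrefs for needle in needles):
            hits.append((item.findtext("title"), item.findtext("link")))
    return hits

book_items = search_feed(FEED, ("amazon.com", "allconsuming"))
```

The blogger's side of this needs nothing beyond publishing the feed. All of the parsing and querying happens at the aggregator.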

Former URL: http://weblog.infoworld.com/udell/2004/01/15.html#a887