How dynamic categories work

In the spirit of the lightweight browser-based solution, I decided to create an equally lightweight server-based version based on Python and libxml2/libxslt. (I'm also working on a slightly heftier, but more powerful variation based on Berkeley DB XML; we'll explore that one next time.) [O'Reilly Network]

This article spells out, in more detail than I've gone into here, an approach to dynamic categories. During yesterday's roundtable at RSS WinterFest, I mentioned one use of this technique: querying for items, by date, that include QuickTime movies. Kevin Marks, who's now director of engineering for Technorati, pointed out, correctly, that there's nothing special about searching by date. What is special is a search that combines the sort of standard metadata captured by any content management system with what we might call "inline metadata" that emerges from the content itself.

A clearer example, because it involves only inline metadata, is this dynamic category for items related to books. It's a content-aware query that returns paragraphs (along with links, images, and other markup) containing URLs to amazon.com or allconsuming.com, the two book sites I commonly refer to. As a matter of fact, when I wrote the query I forgot about a third book site I commonly refer to: Safari. When I amended the query accordingly, a few more items appeared in the category. Note also that, because the query is content-aware, it can return more context (for example, entire items), or less context (for example, just links), by adjusting its scope.
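To make the scope idea concrete, here is a minimal sketch of a content-aware books query. It is not the implementation described in the article, which uses libxml2/libxslt; it swaps in Python's standard-library ElementTree so it runs without extra packages. The sample document, the `item`/`body` element names, the `book_query` function, and the `safari.oreilly.com` substring are all assumptions made for illustration, and the sample omits the XHTML namespace a real feed would carry.

```python
import xml.etree.ElementTree as ET

# Substrings that mark a link as book-related. The first two come from the
# query in the post; the Safari host name is an assumption.
BOOK_SITES = ("amazon.com", "allconsuming", "safari.oreilly.com")

# Hypothetical, namespace-free stand-in for XHTML-ized weblog items.
SAMPLE = """<items>
  <item>
    <title>Dynamic categories</title>
    <body>
      <p>See <a href="http://www.amazon.com/exec/obidos/ASIN/0000000000">this book</a>.</p>
      <p>No links in this paragraph.</p>
    </body>
  </item>
  <item>
    <title>Unrelated</title>
    <body><p>A link to <a href="http://example.com/">somewhere else</a>.</p></body>
  </item>
</items>"""


def is_book_link(a):
    """True if the anchor's href mentions one of the book sites."""
    href = a.get("href", "")
    return any(site in href for site in BOOK_SITES)


def book_query(root, scope="p"):
    """Return elements of the given scope that contain a book link.

    scope="item" returns whole items (more context), scope="p" returns
    paragraphs, and scope="a" returns just the links (less context).
    """
    if scope == "a":
        return [a for a in root.iter("a") if is_book_link(a)]
    return [el for el in root.iter(scope)
            if any(is_book_link(a) for a in el.iter("a"))]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for scope in ("item", "p", "a"):
        print(scope, len(book_query(root, scope)))
```

Adding a site to `BOOK_SITES` is the equivalent of amending the query, and changing `scope` is the equivalent of widening or narrowing the context that comes back.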

Now, since the mountain will not come to Mohammed, Mohammed will go to the mountain. By that I mean: if the majority of blogs to which I subscribe won't provide me with XHTML content to search, then I will endeavor to XHTML-ize the feeds that they do supply. The reason: to extend these dynamic categories across the whole set of blogs I read. Here's a preview of a books query against the last few days' worth of my inbound feeds:

Writing is hard (Evan @ PCSeattle.org: 2003-10-23T17:06:10-08:00)

I: That's all well and good. "Being true to yourself". Sounds like you've been listening to that William Zinsser tape that O'Reilly sent you.


Computer Control (Evan @ PCSeattle.org: 2003-02-10T01:28:51-08:00)

And I read this in the context of being introduced to the "simplicity movement" with the help of Living Simply with Children and Your Money or Your Life, both of which I've only begun to read. And both of which are causing a stirring in my soul, as well as Lisa's.


Writing style and blogging (Joi Ito's Web: Sun, 18 Jan 2004 17:56:47 +0900)

My favorite reference is the Chicago Manual of Style.


Inequality and the role of "fitness" in power laws (Joi Ito's Web: Sat, 17 Jan 2004 23:41:26 +0900)

In Linked Albert-Laszlo talks a lot about power laws and makes a few interesting points. First of all, power laws on the web make two assumptions, that the network is growing and that people tend to link to sites that have the most links. Laszlo cites work by Paul Krapivsky and Sid Redner from Boston University, working with Francois Leyvraz from Mexico,


Dynamic categories (Jon's Radio (full-length descriptions): 2004-01-15T09:42:26-05:00)

books: //p[contains(.//a/@href,'amazon.com') or contains(.//a/@href,'allconsuming')]


It's All About Your Point of "View" (Dare Obasanjo aka Carnage4Life: Mon, 19 Jan 2004 06:30:09 GMT)

Once an XML representation of the relevant information users are interested has been designed (i.e. the XML schema for books, reviews and wishlists that could be exposed by sites like Amazon or Barnes & Nobles) the next technical problem to be solved is uniform access mechanisms... Then there's deployment, adoption and evangelism...


It's All About Your Point of "View" (Dare Obasanjo aka Carnage4Life: Mon, 19 Jan 2004 06:30:09 GMT)

A few days ago I got a response to this post from Michael Brundage, author of XQuery : The XML Query Language and a lead developer of the XML<->relational database technologies the WebData XML team at Microsoft produces, on a possible solution to this problem that doesn't require lots of disparate parties to agree on schemas, data model or web service endpoints. Michael wrote


The Dork Watch Up Close (Dare Obasanjo aka Carnage4Life: Thu, 15 Jan 2004 07:00:36 GMT)

Today I picked up my rash and purely impulsive Christmas buy, a Fossil Wrist.NET Smart watch. It was probably sub-consciously induced by the new kid who came to our school (around 1977) with a calculator on his watch. No matter that it was impossible to press any of the buttons to do even the most simple sums and that this was tremendously useless, the fact that it was on a watch with a calculator built in made it ultra cool and an instant friend maker.


XML For You and Me, Your Mama and Your Cousin Too (Dare Obasanjo aka Carnage4Life: Tue, 06 Jan 2004 16:17:31 GMT)

Once an XML representation of the relevant information users are interested has been designed (i.e. the XML schema for books, reviews and wishlists that could be exposed by sites like Amazon or Barnes & Nobles) the next technical problem to be solved is uniform access mechanisms. The eternal REST vs. SOAP vs. XML-RPC that has plagued a number of online discussions. Then there's deployment, adoption and evangelism.


Google Pocket Guide out now (DJ's Weblog: None)

I don't think I mentioned it directly here (perhaps partly a cause and effect of the recent blogging hiatus) but the Google Pocket Guide has recently been released. Hurrah! It's a book I worked on with Rael and Tara (nice work, you two!). Talking to people at O'Reilly last week at OSCON, it seems the guide is selling well. Hurrah again!

I haven't yet normalized the dates of the items, and there are some conversion artifacts to deal with, but you get the idea.
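The results above mix at least two date styles (ISO 8601 with an offset, and RFC 822-style strings such as "Sun, 18 Jan 2004 17:56:47 +0900"), plus one item whose date came through as "None". A minimal sketch of one way to normalize them, using only the Python standard library, might look like this. The function name `normalize_date` and the choice of UTC as the common form are assumptions, not something the post specifies.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def normalize_date(raw):
    """Return an ISO 8601 timestamp in UTC, or None if raw can't be parsed."""
    raw = (raw or "").strip()
    try:
        # Handles strings like 2004-01-15T09:42:26-05:00
        parsed = datetime.fromisoformat(raw)
    except ValueError:
        try:
            # Handles strings like Sun, 18 Jan 2004 17:56:47 +0900
            parsed = parsedate_to_datetime(raw)
        except (TypeError, ValueError):
            # Unparseable input (for example the literal "None") stays undated
            return None
    if parsed.tzinfo is None:
        # Assumption: treat dates without an offset as UTC
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).isoformat()


if __name__ == "__main__":
    for raw in ("2004-01-15T09:42:26-05:00",
                "Sun, 18 Jan 2004 17:56:47 +0900",
                "Mon, 19 Jan 2004 06:30:09 GMT",
                "None"):
        print(raw, "->", normalize_date(raw))
```

Because every result is expressed in UTC with the same layout, the strings sort chronologically, which is convenient for date-ordered queries across feeds.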


Former URL: http://weblog.infoworld.com/udell/2004/01/22.html#a893