Lightweight XML search servers, part 2

In last month's installment I showed a simple search service that uses libxslt to reduce a file of XML content (my weblog writing) to just the elements matching an XPath expression. This month's challenge was to scale up to a database-backed implementation using Berkeley DB XML. [Full story at XML.com]

After looking at my implementation, John Merrells, the creator of DB XML, wrote to ask why I was using the libxml2 XPath feature to search within documents returned by DB XML XPath queries. Didn't I know that DB XML offered a document-level XPath query function, as well as a database-level one? Heh. Actually, I hadn't known.
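To make the two levels concrete, here is a minimal sketch of the pattern: one query picks out the documents that contain a match, and a second query runs inside each of those documents. It is an illustration only. It uses neither DB XML nor libxml2; the standard library's ElementTree, with its limited XPath subset, stands in for both, and the `CONTAINER` dictionary, the sample documents, and the function names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a container of XML documents (not the DB XML API).
CONTAINER = {
    "post1": "<item><title>Search</title><body><p>Intro</p>"
             "<p class='code'>example one</p></body></item>",
    "post2": "<item><title>Habits</title><body><p>No code here</p></body></item>",
}

def query_container(container, path):
    """Database-level step: return (name, root) for documents with at least one match."""
    hits = []
    for name, text in container.items():
        root = ET.fromstring(text)
        if root.find(path) is not None:
            hits.append((name, root))
    return hits

def query_document(root, path):
    """Document-level step: return the text of each matching element in one document."""
    return [el.text for el in root.findall(path)]

def search(container, path):
    """Run both steps: narrow to matching documents, then extract the matching elements."""
    return [(name, query_document(root, path))
            for name, root in query_container(container, path)]

if __name__ == "__main__":
    for name, matches in search(CONTAINER, ".//p[@class='code']"):
        print(name, matches)
```

In this sketch both steps use the same engine, which is the point John was making: if a single library handles both levels, there is no need to bring in a second XPath implementation for the inner query.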

There's some sort of object lesson here. Lately I've grown extremely fond of the libxml2/Python combination. When I need to process XML, that's how I want to do it. But having developed this habit, I also need to break it from time to time. Materializing the libxml2/Python combination, on a given platform, can absorb time and energy that may be better spent elsewhere, and it can even lead to compromises.

Case in point: my original implementation of this service used Jython to talk to the DB XML Java API. This was actually a great combination. It married Python's flexibility to a more robust and complete DB XML API than is available from the C flavor of Python. However, it lacked my new old friend, libxml2. So I wound up using an older version of DB XML (1.2, rather than the latest 1.2.1) in order to be able to use C Python. Which, as it now turns out, was unnecessary, since DB XML supports both database-level and document-level querying.

It's amazing how one wrong or missing piece of information can wind up dictating a major architectural choice. And how one unexamined habit can make us vulnerable to that outcome.

Former URL: http://weblog.infoworld.com/udell/2004/02/23.html#a926