Enjoying XPath search

I've made a few refinements to yesterday's XPath search hack. And, now that it's so easy to classify and locate code fragments, I wanted to update yesterday's posting with a detail I forgot to mention. Apart from the XML escaping, the data that came out of Radio wasn't quite what I needed. I wanted to turn elements like <table name="00000666"> into <table name="a666">, and elements like <date name="when" value="Wed, 27 Aug 2003 12:42:28 GMT"/> into <date name="when" value="2003/08/27"/>, so that the search script could more easily form URLs pointing back to found items. Used to be, I'd reach for Perl on occasions like this. But nowadays, Python seems to have become the tool of choice. Here's what I did:

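import re
from email.utils import parsedate  # stands in for the rfc822 module, which Python 3 removed

def dateRepl(matchobj):
    # Rewrite the RFC 822 date inside the matched tag as YYYY/MM/DD.
    ret = matchobj.group(0)
    dt = matchobj.group(1)
    l = parsedate(dt)
    if l is None:
        return ret  # unparseable date: leave the tag untouched
    d = "%04d/%02d/%02d" % (l[0], l[1], l[2])
    return ret.replace(dt, d)

def permaRepl(matchobj):
    # Replace the run of leading zeros in the item number with the letter 'a'.
    ret = matchobj.group(0)
    oldperma = matchobj.group(1)
    newperma = re.sub(r'^0+', 'a', oldperma)
    return ret.replace(oldperma, newperma)

with open('weblog.xml') as f:
    s = f.read()
r = re.sub(r'<date value = "([^"]+)" name = "when" >', dateRepl, s)
r = re.sub(r'<table name = "(\d+)" >', permaRepl, r)
print(r)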
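
To see why those two rewrites pay off, here is a minimal sketch of how a search script might combine the cleaned-up date and name into a link. The helper name `permalink` and the `BASE` constant are illustrative assumptions, with the URL shape inferred from this entry's own address; the actual search script is not shown here.

```python
# Hypothetical helper: the real search script is not part of this entry.
BASE = "http://weblog.infoworld.com/udell"  # assumed from this entry's own URL

def permalink(when, name):
    """Build a link from a 'YYYY/MM/DD' date value and an 'aNNN' table name."""
    return "%s/%s.html#%s" % (BASE, when, name)

print(permalink("2003/08/27", "a666"))
```

With the original values ("Wed, 27 Aug 2003 ..." and "00000666") the same link would need date parsing and zero stripping at search time, which is the work the cleanup script does once, up front.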

Next, I wanted to reverse the top-level <table> elements in Radio's XML dump. The XSLT search finds things in document order, and I wanted to reverse that, to make newest results come first. Here's the XSLT transform to reverse the order of the items:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes" encoding="us-ascii"/>
<xsl:template match="node() | @*">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
</xsl:template>
<xsl:template match="/blog">
<blog> <!-- wrapper element assumed, so the output keeps a single root -->
<xsl:for-each select="table[starts-with(@name, 'a')]" >
<xsl:sort  select="@name" order="descending"/>
<table name="{@name}">
<xsl:apply-templates />
</table>
</xsl:for-each>
</blog>
</xsl:template>
</xsl:stylesheet>
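
For comparison, the same reversal can be sketched in Python with the standard library's ElementTree. This is not what the transform above does internally, just an equivalent under some assumptions: the sample data, the `reverse_items` name, and the `<blog>` root with `<table>` children are stand-ins for the real Radio dump.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the Radio dump; the real file is much larger.
SAMPLE = """<blog>
<table name="a665"><body>older</body></table>
<table name="a666"><body>newer</body></table>
<table name="prefs"><body>not an item</body></table>
</blog>"""

def reverse_items(xml_text):
    """Return a new <blog> holding only the 'a...' tables, highest name first."""
    root = ET.fromstring(xml_text)
    items = [t for t in root.findall("table")
             if t.get("name", "").startswith("a")]
    # Names are compared as strings, as with xsl:sort's default text ordering.
    items.sort(key=lambda t: t.get("name"), reverse=True)
    out = ET.Element("blog")
    out.extend(items)
    return out

result = reverse_items(SAMPLE)
print([t.get("name") for t in result])
```

Both versions compare names as strings, so "newest first" holds only while the item names have the same number of digits.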

Now these snippets in this entry will show up first in searches for Python and XSLT fragments. The whole entry will show up in the canned search I just added, entitled "complete entries containing the phrase 'XPath search'." If I modify the query to read "//body[contains ( . , 'XPath search' ) and contains (., 'reverse')]" I'll currently find just this entry.
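
What that two-part predicate tests can be sketched in Python. The standard library's ElementTree does not evaluate XPath's contains() function, so this emulates it by hand; the sample entries and the `bodies_containing` name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two made-up entries; only the second mentions both phrases.
SAMPLE = """<blog>
<table name="a783"><body>Notes on <b>XPath search</b> only.</body></table>
<table name="a784"><body>XPath search, plus how to reverse items.</body></table>
</blog>"""

def bodies_containing(root, *phrases):
    """Yield <body> elements whose full text contains every phrase."""
    for body in root.iter("body"):
        # Join all descendant text, like the string value of '.' in XPath.
        text = "".join(body.itertext())
        if all(p in text for p in phrases):
            yield body

root = ET.fromstring(SAMPLE)
hits = list(bodies_containing(root, "XPath search", "reverse"))
print(len(hits))
```

Dropping the second phrase widens the result to both entries, which is the same narrowing effect the extra contains() clause has in the query.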

Cool! Now there's a virtuous circle. The various flavors of query -- 'body contains phrase', 'element has class attribute with value', 'link text contains text', 'link address contains text' -- remind me what's possible. Each of these queries is a template that encourages substitution and variation. As I do the substitution and create new variations, I think of new kinds of elements that might exist, and new kinds of searches they could enable.
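
The query-as-template idea can be sketched as a small table of patterns with one slot each. The XPath forms below are guesses at what each flavor might look like, and `TEMPLATES` and `make_query` are hypothetical names, not part of the actual search page.

```python
# Hypothetical query templates; the XPath forms are guesses at each flavor.
TEMPLATES = {
    "body contains phrase": "//body[contains(., '{text}')]",
    "element has class attribute with value": "//*[@class='{text}']",
    "link text contains text": "//a[contains(., '{text}')]",
    "link address contains text": "//a[contains(@href, '{text}')]",
}

def make_query(flavor, text):
    """Fill one template; single quotes are rejected rather than escaped."""
    if "'" in text:
        raise ValueError("text must not contain a single quote")
    return TEMPLATES[flavor].format(text=text)

print(make_query("link address contains text", "infoworld"))
```

Adding a new kind of search then amounts to adding one more entry to the table.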

A particularly nifty aspect of this, which took me very much by surprise when I first realized it, is the effect of dynamically collapsing the document to just the found elements, while preserving their style and structural integrity. This has an interesting -- and to me pleasing -- visual effect. But there's also a universal-canvas kind of thing happening. In MSIE, a View Source of the generated results page only shows you the script. Likewise in Mozilla, but if you select a fragment and do right-click and then View Selection Source, you'll get the nearest enclosing XHTML element that contains your selection. You can then capture the element, for purposes of quoting, most likely, with little effort and no loss of integrity. That's very interesting.

Former URL: http://weblog.infoworld.com/udell/2003/08/28.html#a784