XQuery adventures

I've outfitted my Mark Logic-based XPath query service with a stats page that summarizes the feeds I'm collecting. To give you a sense of Mark Logic's model for CGI-style XQuery programming, the text of the script appears below.

I have some thoughts on the programming model, but before I jump to conclusions I think I'll wait for reactions to this example. I'm a complete XQuery novice, and there are almost certainly better idioms for what I'm doing here.

I do, however, want to use this entry to show how structured search could enhance the blogosphere. Most of the sample queries in the XPath demo rely on what I'll call the natural vocabulary of HTML: links, image tags, tables, and whatnot. But there are a couple of examples that use what I'll call extended vocabulary. For example, this query finds links decorated with the rel="tag" idiom used by Technorati tags. In my current batch of feeds, Jim McGee's is the only one using that idiom, and my query shows that he's used it four times in the items I've collected. One of those uses, the tag SocialNetworking, occurs in this item. That means the item should also appear here, as in fact it does.
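For readers who haven't tried the demo, the rel="tag" search boils down to the XPath expression //a[@rel = "tag"]. Here is a rough sketch of how that could be rolled up per feed. It assumes the same layout the stats script relies on (item elements carrying a channel attribute) and that the XHTML inside each item is in no namespace. The tagged element is a name I made up for the output.

```xquery
(: Count rel="tag" links per channel, busiest channel first. :)
(: "tagged" is a hypothetical element name used only for this report. :)
for $channel in distinct-values(//item/@channel)
let $n := count(//item[@channel = $channel]//a[@rel = "tag"])
where $n > 0
order by $n descending
return <tagged channel="{ $channel }" count="{ $n }"/>
```

Note that the exact comparison @rel = "tag" will miss links whose rel attribute lists several values, such as rel="tag nofollow".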

In the same spirit, I'm going to instrument this item in a couple of ways. First, the XQuery code shown below will be tagged like so:

<pre class="code xquery"> ... </pre>

Second, I'm going to mention that I hope to attend the ETech conference next month and, as I do so, I'm going to use this syntax that I just made up:

<span class="event etech mayattend"> ... </span>

Once the database assimilates this entry, these queries should work:

xquery fragments

items about events
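
Here's roughly what those two queries could look like. Treat them as sketches: they assume the pre and span elements land in the database in no namespace, inside item elements, the way the stats script expects. The first finds the tagged code fragments:

```xquery
(: pre elements whose class attribute mentions "xquery" :)
//pre[contains(@class, "xquery")]
```

The second finds whole items that mention an event:

```xquery
(: items containing a span whose class list includes the token "event" :)
//item[.//span[contains(concat(" ", normalize-space(@class), " "), " event ")]]
```

A bare contains(@class, "event") would also work, but it matches any class value that merely contains those letters, such as "eventual". Padding the attribute value with spaces makes the test match whole class names only.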

Now we could argue until the cows come home about syntax, except I won't. As I explained here, languages evolve when we speak, listen to others, and imitate one another. I don't care which syntaxes emerge; I only care that they emerge.

The tagging services have created flexible and supportive environments in which people can play with metadata vocabularies. If (or when) the blogosphere is fully indexed for structured search, will it create a similarly flexible and supportive environment in which to evolve some simple ways to talk about things like events, product descriptions, and media objects?

If your feed is one that I read and you want to play along, try adding some simple constructs like these to your content so we can search for them. Note that in order for this to work, whatever you want to include will have to be visible to the RSS world. In other words, it has to appear in the abbreviated part of the entry, if that's all your feed exposes. Your content will also have to be convertible, by HTML Tidy, into well-formed XHTML -- a test that most of my feeds seem to pass.


XQuery source code for the stats page

xdmp:set-response-content-type("text/html");
  
define function count-items ($channel)
{
count(//item[@channel=$channel])
}
  
define function most-recent-items ($channel)
{
let $items := //item[@channel=$channel]
for $sorted in $items
order by $sorted/date descending
return $sorted/date
}
 
define function most-recent-item ($channel)
{
let $dates := most-recent-items($channel)
return $dates[1]
}
 
define function stats-row($channel)
{
<tr>
<td> { $channel }</td>
<td align="right">
<a title="details" 
  href="http://udell.infoworld.com:8006/details.xqy?channel={$channel}">
    { count-items($channel) }</a> 
</td>
<td> {most-recent-item($channel)}</td>
</tr>
}
 
define function stats-rows($sort)
{
if ( $sort eq 'name' )
 then stats-rows-by-name()
 else
   if ( $sort eq 'count' )
     then stats-rows-by-count()
     else stats-rows-by-recent()
}
 
define function stats-rows-by-name ()
{
for $channel in distinct-values(//item/@channel)
order by $channel
return stats-row($channel)
}
 
define function stats-rows-by-count ()
{
for $channel in distinct-values(//item/@channel)
order by count-items($channel) descending,
 most-recent-item($channel)
return stats-row($channel)
}
 
define function stats-rows-by-recent ()
{
for $channel in distinct-values(//item/@channel)
order by most-recent-item($channel) descending, $channel
return stats-row($channel)
}
 
let $sort-param := xdmp:get-request-field("sort",""),
$sort := if ( $sort-param eq '' ) then 'name' else $sort-param,
$stats := stats-rows($sort)
 
return
 
<html><head><title>XPath query of Jon's feeds: stats</title>
<style>
<!--
body, td {
font-family: verdana,arial,sans-serif;
font-size: 10pt;
margin-left: 5%;
margin-right: 5%;
padding-right: 40px;
}
-->
</style>
</head>
<body>
 
<p align="center">stats for Jon's feeds 
(<a href="http://udell.infoworld.com:8006/xpath.xqy">queries</a>)</p>
 
<table align="center" cellpadding="2" cellspacing="0">
<tr>
<td align="left">
 <a href="http://udell.infoworld.com:8006/stats.xqy?sort=name">by name</a>
</td>
<td align="center">
  <a href="http://udell.infoworld.com:8006/stats.xqy?sort=count">by count</a>
</td>
<td align="center">
  <a href="http://udell.infoworld.com:8006/stats.xqy?sort=recent">by most recent</a>
</td>
</tr>
{ $stats }
</table>
  
</body>
</html>

Former URL: http://weblog.infoworld.com/udell/2005/02/18.html#a1180