Querying the blogosphere

Here's something you don't see every day. It's a view of my inbound feeds, ranking the bloggers by how many of their items cite books on Amazon.com:
Koranteng's Toli 19
Bernie Thompson 4
Michael Rys 4
Miguel de Icaza 4
Ted Leung on the air 4
All Things Distributed 3
BookBlog 3
Don Box's Spoutlet 3
Jon's Radio 3
Jon's Radio (full-length descriptions) 3
McGee's Musings 3
Push Button Paradise 3
Evan @ PCSeattle.org 2
Infectious Greed 2
Mark O'Neill's Radio Weblog 2
rc3.org Daily 2
Chad Dickerson: CTO Connection 1
CollabuTech 1
DarrenBarefoot.com 1
Kim Cameron's Identity Weblog 1
Kingsley Idehen's Blog 1
Lessig Blog 1
Making it stick. 1
Open Access News 1
Phil Windley's Technometria 1
Ray Ozzie's Weblog 1
Smalltalk Tidbits, Industry Rants 1
The Now Economy 1
WebMink 1
bob mcwhirter 1
d2r 1

Clicking the name of a blog gets you a listing of all the items for that blogger, and clicking the number gets you just the items that cite books. All those links simply reuse the XPath query service I introduced on Wednesday and extended on Friday, which is pretty cool. But this example, the live version of which is here, takes the next step. XPath can do a lot, but with an XQuery implementation like Mark Logic's you can do a lot more. I intend to find out just how much more.
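
To make those links concrete, here is a sketch of the two queries behind a single row, written out for the channel Lessig Blog. Both are lifted from the link templates in the source further down. They are joined with a comma only so that the block is one expression, and they assume, as that source does, a database of item elements carrying a channel attribute, where a leading // searches everything stored.

```xquery
(: Linked from the blog name: the title of every item in that channel :)
//item[contains(@channel, 'Lessig Blog')]/title,

(: Linked from the count: only the Amazon links whose URL contains
   nine digits followed by a digit or X :)
//a[contains(ancestor::item/@channel, 'Lessig Blog')
    and contains(@href, 'amazon.com')
    and matches(@href, '\d{9,9}[\dX]')]
```

Any other channel name from the list can be swapped in for Lessig Blog, as long as an apostrophe in the name is doubled.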

I'll include the source code again, for those of you who are geek-inclined, but I hope those of you who aren't won't be put off by that. I'd really like everyone to consider the possibility of a blogosphere that's queryable in these kinds of ways. What kinds of questions would you like to be able to ask? What kinds of views of the blogosphere would you like to see? Our imagination is the limit.


XQuery source for the book blogger view of Jon's feeds

xdmp:set-response-content-type("text/html");
 
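(: Every feed item that links to Amazon with an ISBN-like string
   in the URL: nine digits followed by a digit or X :)
define function bookitems()
{
  let $items := //a[contains(@href,'amazon.com') and
    matches(@href,'\d{9,9}[\dX]')]/ancestor::item
  return $items
}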
 
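(: Number of book-citing items that belong to one channel :)
define function bookitems-in-channel($items, $channel)
{
  count($items[@channel=$channel])
}
 
(: Double any apostrophe so a channel name can be embedded in a
   single-quoted XPath string literal :)
define function quote($s)
{
  replace($s,"'","''")
}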
 
(: One table row per channel that cites books, most citations first.
   Both links go to the XPath query service. The regex braces appear
   as %7b and %7d; a bare brace inside a constructed attribute would
   be read as the start of an embedded expression. :)
define function bookchannels ()
{
  let $items := bookitems(),
      $channels := distinct-values($items/@channel)
  for $channel in $channels
  let $allitems := count(//item[@channel=$channel]),
      $bookitems := bookitems-in-channel($items,$channel)
  order by $bookitems descending, $channel
  return
 
  <tr>
  <td align="right"><a title="all {$allitems} items"
    href="http://udell.infoworld.com:8006/xpath.xqy?xpath=//item[contains(@channel,'{quote($channel)}')]/title">
    {$channel}</a> </td>
  <td align="right"><a title="just the book items for {$channel}"
    href="http://udell.infoworld.com:8006/xpath.xqy?xpath=//a[contains(ancestor::item/@channel,'{quote($channel)}')
    and contains (@href,'amazon.com')
    and matches(@href,'\d%7b9,9%7d[\dX]')]">
    { $bookitems }</a></td>
  </tr>
}
 
let $bookchannels := bookchannels()
 
return
 
<html>
<head><title>Book bloggers in Jon's feeds</title>
<style>
<!--
body, td {
font-family: verdana,arial,sans-serif;
font-size: 10pt;
margin-left: 5%;
margin-right: 5%;
padding-right: 40px;
}
-->
</style>
</head>
<body>
<p align="center">a view of my <a
  href="http://udell.infoworld.com:8006/stats.xqy">feeds</a>
  by book bloggers (<a href="http://weblog.infoworld.com/udell">how
  this works</a>)</p>
 
<table>
{ $bookchannels }
</table>
 
</body>
</html>
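
Two small pieces of that source are easy to try in isolation: the ISBN-like pattern and the quote() helper. The sketch below is only an illustration. The sample hrefs are hypothetical, not taken from the feeds, and it uses nothing beyond the standard matches() and replace() functions.

```xquery
(: Hypothetical sample hrefs, for illustration only :)
let $hrefs := (
  "http://www.amazon.com/exec/obidos/ASIN/0123456789/",
  "http://www.amazon.com/exec/obidos/ASIN/012345678X/",
  "http://www.amazon.com/gp/help/"
)
return (
  (: keep the hrefs that contain nine digits followed by a digit
     or X, the shape of a ten-character ISBN :)
  for $h in $hrefs
  where contains($h, "amazon.com") and matches($h, "\d{9,9}[\dX]")
  return $h,

  (: double the apostrophe, as quote() does, so the name can sit
     inside a single-quoted XPath string literal :)
  replace("Jon's Radio", "'", "''")
)
```

This should return the first two hrefs followed by the string Jon''s Radio. Note that the pattern is unanchored, so any href containing a run of ten or more digits matches too, whether or not it is an ISBN.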
 

Former URL: http://weblog.infoworld.com/udell/2005/02/19.html#a1181