Moving forward with microformats

If I were near Sandy, Utah, I'd love to attend the Asterisk users group meeting that Phil Windley mentions today. It would be fascinating to learn more about what people are doing with that open source PBX. Phil's announcement was interesting for another reason too. He's using microformats to transmit structured data about the event, and a combination of local and remote code to activate that data.

Here's the structured data:

<div class="veventFormat"> 
  <div class="vevent"> 
    <div class="summary"> Asterix Users Group Meeting </div>
    <abbr class="dtstart" title="20060111T1900-0700"> 
      January 11, 2006 @ 19:00 </abbr> 
    <abbr class="dtend" title="20060111T2030-0700"> - 20:30 </abbr>
    <div class="location"> 8160 S. Highland Drive in Sandy Utah 
      (<a href="http://utaug.org/files/directions.html">map</a>)
    </div> 
    <div class="description">
      Meet in the executive conference room of Willow Creek Plaza. 
      Speakers include Dave Packham, longtime Asterisk Guru. 
   </div>
  </div>
</div>

This strategy combines ordinary HTML that's meaningful to the browser with CSS class names that double as semantic cues, so you can extract and reuse the structured data. The extraction can happen client-side, as I showed a couple of years ago in my XML.com article on interactive microcontent [1], or it can happen server-side. Phil's doing it server-side, though it's not obvious because a formatting bug prevents the server-side URL from being activated. Here's what you're supposed to see:

download to calendar

This link feeds the URL of Phil's page to Brian Suda's X2V service, which extracts hCalendar (or hCard) data from the page.
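To make the extraction step concrete, here is a minimal sketch in Python. It is not how X2V works internally (X2V's error messages elsewhere in this post suggest it is XSLT-based); it just walks the markup with the standard library's html.parser and collects the hCalendar properties that appear in Phil's snippet. The names HCalendarParser and extract_events are my own, and the sketch assumes reasonably balanced markup and non-nested events.

```python
from html.parser import HTMLParser

# hCalendar property names used in Phil's markup.
PROPS = ("summary", "dtstart", "dtend", "location", "description")
# Elements that never get a closing tag, so they are kept off the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class HCalendarParser(HTMLParser):
    """Collect hCalendar properties from each class="vevent" element."""

    def __init__(self):
        super().__init__()
        self.events = []   # one dict per vevent, in document order
        self._open = []    # per open element: (is_vevent, property name or None)

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attrs = dict(attrs)
        # Match whole class tokens, so "veventFormat" is not mistaken for "vevent".
        classes = (attrs.get("class") or "").split()
        is_vevent = "vevent" in classes
        if is_vevent:
            self.events.append({})
        inside = is_vevent or any(flag for flag, _ in self._open)
        prop = next((c for c in classes if c in PROPS), None) if inside else None
        if prop and tag == "abbr" and attrs.get("title"):
            # <abbr title="..."> holds the machine-readable value.
            self.events[-1][prop] = attrs["title"]
            prop = None  # skip the human-readable text inside the abbr
        elif prop:
            self.events[-1].setdefault(prop, "")
        self._open.append((is_vevent, prop))

    def handle_endtag(self, tag):
        if tag not in VOID and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text belongs to every property element that is still open.
        for _, prop in self._open:
            if prop:
                event = self.events[-1]
                event[prop] = event.get(prop, "") + data


def extract_events(html):
    """Return a list of dicts, one per vevent, with whitespace normalized."""
    parser = HCalendarParser()
    parser.feed(html)
    parser.close()
    return [{k: " ".join(v.split()) for k, v in event.items()}
            for event in parser.events]
```

Fed the snippet above, extract_events should return a single event whose dtstart is the title attribute's value, 20060111T1900-0700, and whose location includes the link text "(map)", much as X2V's output does.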

Once I got the link working, I clicked it and was surprised in two ways. First, because two events loaded into my calendar:

BEGIN:VCALENDAR
PRODID:-//suda.co.uk//X2V 0.6.5 (BETA)//EN
X-ORIGINAL-URL: http://www.windley.com/archives/2006/01/asterix_users_g.shtml
X-WR-CALNAME: Phil Windley's Technometria | Asterix Users Group in Utah
VERSION:2.0
 
BEGIN:VEVENT
SUMMARY:Univ. of Hawaii Network Testing Lab
DTSTART:20060126
DTEND:20060130
END:VEVENT
 
BEGIN:VEVENT
DESCRIPTION:Meet in the executive conference room of Willow Creek Plaza. 
LOCATION:8160 S. Highland Drive in Sandy Utah (map)
SUMMARY:Asterix Users Group Meeting
DTSTART:20060112T020000Z
DTEND:20060112T033000Z
URL:http://utaug.org/?q=node/9
END:VEVENT
 
END:VCALENDAR

The problem is that Phil's page also includes an upcoming events blurb, which uses the same technique. The hCalendar parser finds every VEVENT on the page, not just the one Phil intended.
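One way around this, sketched below in Python, is to scope the search to a single container instead of the whole page. This is an illustration of the idea, not a feature I know X2V to have. The page, the ids "sidebar" and "entry", and the helper names are all hypothetical, and the sketch assumes the input is well-formed XML.

```python
import xml.etree.ElementTree as ET


def has_class(element, name):
    """True if `name` is one of the whitespace-separated class tokens."""
    return name in element.get("class", "").split()


def find_vevents(root, container_id=None):
    """Return vevent elements, optionally only those inside one container."""
    if container_id is not None:
        root = next((el for el in root.iter() if el.get("id") == container_id), None)
        if root is None:
            return []
    return [el for el in root.iter() if has_class(el, "vevent")]


def summary_of(vevent):
    """Text of the first descendant carrying the 'summary' class, or None."""
    for el in vevent.iter():
        if has_class(el, "summary"):
            return "".join(el.itertext()).strip()
    return None


# Hypothetical page: the ids "sidebar" and "entry" are illustrative only.
PAGE = """<html><body>
  <div id="sidebar">
    <div class="vevent"><span class="summary">Univ. of Hawaii Network Testing Lab</span></div>
  </div>
  <div id="entry">
    <div class="veventFormat">
      <div class="vevent"><div class="summary">Asterix Users Group Meeting</div></div>
    </div>
  </div>
</body></html>"""

root = ET.fromstring(PAGE)
print(len(find_vevents(root)))                               # 2: every vevent on the page
print([summary_of(v) for v in find_vevents(root, "entry")])  # ['Asterix Users Group Meeting']
```

The design choice here is that the publisher, not the parser, decides which event a link refers to, by pointing at a container rather than at the page as a whole.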

The second surprise was that the event loaded into iCal, because that was still the default for .ICS files on the Mac I'm using at the moment. When I switched the default to Sunbird, no joy: it doesn't yet seem to handle .ICS files on launch.

If this all sounds like a lot of hassle, well, it is, but my purpose is not to complain. The only way we'll move forward is by doing the kinds of experiments Phil has done here, discussing them, and then applying the lessons we learn.

Another important experiment is Bob Wyman's structured blogging initiative. When Paul Kedrosky predicted that structured blogging will flop, Bob Wyman shot back that it already is, in fact, playing in Peoria. He pointed to an entry on a WordPress blog that announces a city council meeting in Peoria, using the WordPress event plugin provided by StructuredBlogging.com.

Can X2V read that page? Let's try. Nope:

Warning: Sablotron error on line 136: XML parser error 4: not well-formed

How about this page? Yup:

BEGIN:VCALENDAR
PRODID:-\//suda.co.uk//X2V 0.6.5 (BETA)\//EN
X-ORIGINAL-URL: http://jonudell.net/udell/gems/peoria.xhtml
X-WR-CALNAME: Peoria Pundit - Blog Archive - City Council will meet Tuesday
VERSION:2.0
BEGIN:VEVENT
LOCATION:Peoria City Hall
SUMMARY:CITY OF PEORIA\, ILLINOIS CITY COUNCIL AGENDA
DTSTART:20051220T181500
DTEND:20051220T181500
END:VEVENT

The difference? I ran the page through HTML Tidy to get well-formed XML [2].
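The failure mode is easy to reproduce with any strict XML parser. The Python sketch below does not call Tidy or X2V. It only shows that ordinary tag-soup HTML is rejected while its XHTML equivalent parses. The two fragments are made up for illustration and are not taken from the Peoria page, and well_formedness_error is my own helper name.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(markup):
    """Return None if `markup` parses as XML, else the parser's error message."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return str(err)
    return None


# Illustrative fragments, not taken from the Peoria page.
tag_soup = "<p>City Council will meet<br>Tuesday</p>"   # <br> is never closed
tidied = "<p>City Council will meet<br />Tuesday</p>"   # XHTML-style empty element

print(well_formedness_error(tag_soup))  # an error message from the parser
print(well_formedness_error(tidied))    # None
```

A check like this, run before handing a page to an XML-based extractor, would at least tell a publisher that the page needs tidying first.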

As gnarly as all this seems, I see progress and even momentum. Interactive tools -- the data-entry plugins and display templates that Bob Wyman's team offers, the viewers and extractors that Brian Suda provides -- are part of the equation. The other part will involve blog aggregators and search engines that pay attention to these embedded structures. I've been demonstrating structured search on this blog, and publishing microformats in my RSS feed, for a couple of years. For a while I was using the MarkLogic XQuery engine to extend structured search over the whole set of blogs that I read. Here's hoping that somebody will take the next step this year and extend structured search -- at least for a handful of well-known information types -- across the whole blogosphere.

Update: Scott Reynen has a spider that looks for microformatted content. Here's a query for event information about Peoria. And here are all the events it knows about so far. Excellent! One of the event sources I found there, by the way, is Brian Suda's page of lesser-known holidays. Apparently today is International Thank You Day.


[1] The code used in that article to illustrate the DOM Range API fails, I've just discovered, in Firefox 1.5. If anybody knows why, please clue me in.

[2] It wasn't entirely automatic, unfortunately; I had to wrestle with character encoding issues too.


Former URL: http://weblog.infoworld.com/udell/2006/01/11.html#a1368