Word 11, XML, and the universal canvas

I'm in LA at Fusion, a gathering of Microsoft partners and resellers. In this morning's keynote, Jeff Raikes recapitulated his PC Expo talk on Office productivity futures and the Tablet PC. The new part of the talk, which brought me to the edge of my seat, was a preview of a genuinely XML-capable version of Word. By that I mean, and Microsoft seems to mean, not just the ability to export to XML, or to consume SOAP services -- capabilities that are in parts of Office XP today (Excel, Access). More profoundly, it's about a writing environment that natively produces XML that is valid with respect to an arbitrary XML Schema.

This has been a long time coming, and it's still a year away, since Office 11 isn't expected to ship until next summer. As a longtime fanatic on the subject of creating and using semi-structured information, I'm happy to see Word finally stepping up to the plate. But another part of Raikes' demo showed me what a long hard slog still lies ahead of us. Showing the Tablet PC, Raikes browsed to a sports web page, circled some numbers, inked some comments onto them, and fired off an email. Unfortunately, what he sent from Outlook was just a picture of the data plus the ink. So the recipient could look at the numbers, but not work with them.

Let's think about why not. First, the web page was HTML, not XML. It could have been XML, rendered by IE or Mozilla, but the reality is that routine delivery of schema-valid XML as web content is a distant dream. This isn't anybody's fault; HTML was just too successful.

Even so, imagine that the capture software could XML-ify and, in limited ways, even Schema-ize the content it grabs. That should be doable, but what happens when you dump this enriched stuff into Outlook? The mail client, which is the point of capture for most of the world's keystrokes, has no use for it. That's true for all mail clients, of course, not just Outlook.
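To make "XML-ify" concrete, here is a minimal sketch in Python of what such capture software might do with a grabbed HTML table. Everything in it is an assumption for illustration: the names TableGrabber and table_to_xml, the records/record element names, and the sample data are hypothetical, and it only handles a simple table whose first row is a header.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TableGrabber(HTMLParser):
    """Hypothetical helper: collect the cell text of each row in an HTML table."""

    def __init__(self):
        super().__init__()
        self.rows = []      # list of rows, each a list of cell strings
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def table_to_xml(html, record_name="record"):
    """Use the header row for element names and emit one element per data row."""
    grabber = TableGrabber()
    grabber.feed(html)
    grabber.close()
    header, *body = grabber.rows
    root = ET.Element("records")
    for row in body:
        record = ET.SubElement(root, record_name)
        for name, value in zip(header, row):
            # "Home Runs" becomes <home_runs>; real data would need stricter name cleanup.
            ET.SubElement(record, name.lower().replace(" ", "_")).text = value
    return root


# Made-up sample standing in for the captured web content.
SAMPLE = """
<table>
  <tr><th>Player</th><th>Home Runs</th></tr>
  <tr><td>Example A</td><td>31</td></tr>
  <tr><td>Example B</td><td>27</td></tr>
</table>
"""

print(ET.tostring(table_to_xml(SAMPLE), encoding="unicode"))
```

The result names its fields, which is the "limited" sense of Schema-izing: the structure is inferred from the header row, not validated against a real schema.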

Now, granted, it's routine today to copy an HTML table from a web page and send it as an HTML email. So there's some structure there to work with. But the data doesn't describe itself. As the recipient of Raikes' email, what you'd really like to get is a package that includes an image of the annotated data, the data itself, and metadata that defines the parts and their relationships so your software can decide how best to use it.
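Here is one way such a package could look, sketched with Python's standard email library. This is an illustration under stated assumptions, not a description of anything Outlook or the Tablet PC does: the function build_package, the manifest vocabulary (package, part, relation), the filenames, and the sample data are all hypothetical.

```python
from email.message import EmailMessage
import xml.etree.ElementTree as ET


def build_package(image_png: bytes, data_xml: str) -> EmailMessage:
    """Bundle a rendering, the underlying data, and a manifest into one message."""
    # Hypothetical manifest: names the parts and says how they relate.
    manifest = ET.Element("package")
    ET.SubElement(manifest, "part", {"id": "view", "href": "annotated.png", "role": "rendering"})
    ET.SubElement(manifest, "part", {"id": "data", "href": "data.xml", "role": "source-data"})
    ET.SubElement(manifest, "relation", {"type": "depicts", "from": "view", "to": "data"})

    msg = EmailMessage()
    msg["Subject"] = "Annotated numbers"
    msg.set_content("Picture attached, along with the data behind it.")
    # The first attachment converts the message to multipart/mixed.
    msg.add_attachment(image_png, maintype="image", subtype="png",
                       filename="annotated.png")
    msg.add_attachment(data_xml.encode("utf-8"), maintype="application",
                       subtype="xml", filename="data.xml")
    msg.add_attachment(ET.tostring(manifest, encoding="utf-8"),
                       maintype="application", subtype="xml",
                       filename="manifest.xml")
    return msg


# Placeholder bytes standing in for the inked screenshot; not a complete PNG.
FAKE_PNG = b"\x89PNG\r\n\x1a\n"
# Made-up data standing in for the captured numbers.
DATA = "<records><record><player>Example A</player><home_runs>31</home_runs></record></records>"

msg = build_package(FAKE_PNG, DATA)
for part in msg.iter_attachments():
    print(part.get_filename(), part.get_content_type())
```

A mail client that knows nothing about the manifest still shows the picture; one that does can hand data.xml to a spreadsheet. That graceful fallback is the reason to ride on ordinary MIME attachments in this sketch.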

The endgame is what Microsoft has called the universal canvas. In the long run, that means migrating software to a common storage model. That won't happen any time soon, but there's a big near-term opportunity to leverage XML as an exchange format much more aggressively. I'd like to see that happen across the suite of Microsoft's clients by the time Office 11 ships. I'm not holding my breath, though.


Former URL: http://weblog.infoworld.com/udell/2002/07/13.html#a338