Open document formats, revisited

The XML-enhanced version of Office has been on the scene for two years now. When a Microsoft spokesperson contacted me recently about a status update, my response was: "Hmm. Now that you mention it, where are the third-party XML-enhanced Office apps?" His reply: "Let me get back to you on that." While the rolodex is spinning, it occurred to me to put out a public invitation here. Have you built a groundbreaking application of Word 2003 or Excel 2003? Something that not only benefits from the openness of XML, but that leverages custom schemas? If so, drop me a line. Meanwhile, I'm equally interested to learn more about the XForms support in the next version of OpenOffice. Micah Dubinko, who is an editor of the XForms spec, says on his blog that Sun's demo of this capability at XML 2004 was a highlight of the show.

This subject is timely because I'll be in Boston today at the Gilbane Conference. One of the topics my keynote panel will discuss is the effort by the European Commission's IDA (Interchange of Data Between Administrations) to promote open document exchange. As I've mentioned here before, the IDA's Valoris report, published in June, highlighted the Microsoft and Sun/OASIS XML formats as the two best options for open document exchange.

It's interesting to note the formats used by the IDA to publish its own report and recommendations, as well as the responses from vendors. HTML is the glue that holds this collection of documents together, and PDF is the format for individual documents -- including, weirdly, a Microsoft response which appears as a PDF image of a crookedly scanned fax.

It doesn't get any more real-world than this crazy quilt of technologies. Interoperable XML formats are clearly part of the answer. But formats alone won't transport monolithic "office suites" and "desktop productivity software" into the 21st century. I use such software rarely, for a tiny fraction of the innumerable documents I write, read, and edit. Dismiss me as an emacs-addicted geek, if you must, but for many (maybe most) regular folks it's not too different. Their email clients and browsers deliver less format fidelity than their word processors, but see more action. Why? Because email and the web contextualize content. The documents belonging to an email thread, or to a blog conversation, form a kind of network whose value rises with its interconnectedness.

Musing on the European Commission's findings, Tim Bray wondered if maybe XHTML had gotten short shrift:

They considered, and rejected, XHTML as a standard office document format. I think that it can do most things you need in a modern office document and has remarkably few real drawbacks. [ongoing]

I'm not ready to go along with the other conclusion he reaches in that posting -- that custom schemas are a red herring. But I agree that XHTML is more valuable than most people think. For the vast majority of useful documents, it can have as much structure as we need, and for the rest it can be extended internally with namespaced inclusions. But the real power arises from its hypertextual nature. For me, increasingly, there is no office, and there is no desktop, there is only a network of linked documents. A successful open document format will have to be supremely well-adapted to that environment, as XHTML is.
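
To make "namespaced inclusions" concrete, here is a minimal sketch in Python. The invoice vocabulary, its namespace URI (http://example.com/ns/invoice), and the extract function are all made up for illustration; nothing here comes from the IDA report or from any vendor's schema. The point is only that one well-formed XHTML file can carry ordinary hyperlinked prose and a block of custom-schema data side by side, and that a standard XML parser can pull out either.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Hypothetical namespace for the custom data; not a real published schema.
INV_NS = "http://example.com/ns/invoice"

DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:inv="http://example.com/ns/invoice">
  <head><title>Invoice 1001</title></head>
  <body>
    <p>Thanks for your order. See the <a href="orders.html">orders page</a>.</p>
    <inv:invoice>
      <inv:number>1001</inv:number>
      <inv:total currency="EUR">250.00</inv:total>
    </inv:invoice>
  </body>
</html>"""


def extract(doc):
    """Return the title, the outbound links, and the embedded invoice data."""
    root = ET.fromstring(doc)
    ns = {"x": XHTML_NS, "inv": INV_NS}
    total = root.find(".//inv:total", ns)
    return {
        # Ordinary XHTML content: the title and the hyperlinks.
        "title": root.findtext("x:head/x:title", namespaces=ns),
        "links": [a.get("href") for a in root.iterfind(".//x:a", ns)],
        # Custom-schema content carried inside the same document.
        "number": root.findtext(".//inv:number", namespaces=ns),
        "total": total.text,
        "currency": total.get("currency"),
    }


if __name__ == "__main__":
    print(extract(DOC))
```

The links are what tie the document into a network; the inv: elements are what a custom schema would validate. Neither gets in the way of the other.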
