Feeling Chad's pain

In his column this week, Chad Dickerson fesses up to the dirty secret of XML content management. The blurb reads: "XML isn't a panacea, especially if the semantic integrity of data hasn't been maintained properly." You can say that again. Chad, I feel your pain! I've been involved in electronic publishing of one form or another for almost 20 years. Nobody ever admits how hard it is to push the boulder up the hill of entropy, and how hard it is to keep it from rolling back down the other side.

Five years ago I wrote my book using what would soon be called XHTML. I had a DTD, and scripts to validate the stuff and transform it into various deliverables. If I had to do the same today, I'd probably do it roughly the same way: edit in emacs, transform using an expat-bound scripting language -- then Perl, now more likely Python. The better tools that were right around the corner are still...right around the corner.
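
For readers who haven't seen that style of processing, here is a minimal sketch of an expat-driven transform in Python. It is not the script used for the book. The sample markup and the headings_outline function are invented for illustration, and expat only checks well-formedness, so validating against a DTD would need a separate tool.

```python
import xml.parsers.expat

HEADINGS = ("h1", "h2", "h3")

def headings_outline(xml_text):
    """Return (level, text) for each h1-h3 element in an XHTML-like document."""
    outline = []
    current = None  # [level, text chunks] while the parser is inside a heading

    def start(name, attrs):
        nonlocal current
        if name in HEADINGS:
            current = [int(name[1]), []]

    def chars(data):
        # Text inside nested inline elements (em, code, ...) is collected too.
        if current is not None:
            current[1].append(data)

    def end(name):
        nonlocal current
        if current is not None and name in HEADINGS:
            outline.append((current[0], "".join(current[1]).strip()))
            current = None

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(xml_text, True)  # raises ExpatError if not well-formed
    return outline

# Hypothetical sample input, not taken from the book's sources.
SAMPLE = """<html><body>
<h1>Chapter 1</h1><p>Intro text.</p>
<h2>Background <em>and</em> scope</h2><p>More text.</p>
</body></html>"""

for level, text in headings_outline(SAMPLE):
    print("  " * (level - 1) + text)
```

The same three handlers, pointed at different elements, are enough to drive most one-off deliverables: a table of contents, an index, a plain-text rendering.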

To be sure, there are richer XML script-language bindings nowadays. Next time I'm in need of such a thing, I'll give Python's minidom a try. It's a lightweight DOM that can be used in elegant ways, as Mark Pilgrim has shown. There have been rumors of an X# from Microsoft, and Adam Bosworth continues to evangelize intrinsic programming-language support for XML.

These are ideas whose time has come. And not a moment too soon. Chad writes:

At InfoWorld, we started our data migration project with high hopes, approaching our mother lode of XML data with the tools that any self-respecting 21st century developer would use: Java and XSL. It was all in XML -- how could we lose? In the end, we shuffled away from the XML scrap heap with heavy hearts and a mountain of one-off Perl scripts that got the data migration job done. We prevailed, but ultimately it was what you hear some football coaches call winning ugly.

You're being too hard on yourself, Chad. You didn't have the luxury of single-handed control of the archive. People had to evolve it, and there weren't -- and aren't yet -- tools sufficiently deployable and usable to enable the necessary delegation of control. That's exactly why I'm jazzed about Office 11's XML support, and the forthcoming XDocs. But these won't be slam-dunks either. It's going to take a really long time for all this stuff to get cooked.

And you know what? Even then, migrations will require elbow grease. As a longtime content wrangler, I'm guilty of assuming that any structure is mappable to any other structure. And that's true. But it's never as trivial as we like to imagine. Transformation is work. We'll always need to do some of it programmatically. The good news is that XSLT, which I've made my peace with and learned to use somewhat productively, was only a first draft of the kind of programming-language bindings that are in the pipeline. Things will improve. There will always be times when we need to "win ugly." Over time, though, it'll get less ugly.
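
To make "transformation is work" concrete, here is a small sketch of a structure-to-structure mapping using Python's minidom. Everything about the data is hypothetical: the story, hed, byline and author element names, the target article format, and the migrate function are assumptions made up for this example, not InfoWorld's actual markup. The point is only that the mapping itself is a few lines, while the drifted conventions (two ways to record an author, or none at all) are where the effort goes.

```python
from xml.dom import minidom

# Hypothetical archive in which the author convention drifted over time.
OLD = """<archive>
  <story><hed>First story</hed><byline>By Pat Example</byline></story>
  <story><hed>Second story</hed><author>Lee Sample</author></story>
  <story><hed>Third story</hed></story>
</archive>"""

def text_of(node):
    """Concatenate the text nodes directly under an element."""
    return "".join(
        c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE
    ).strip()

def migrate(old_xml):
    """Map each <story> to <article title=... author=...>.

    Returns the new document plus the titles whose author could not be found.
    """
    src = minidom.parseString(old_xml)
    dst = minidom.Document()
    root = dst.createElement("articles")
    dst.appendChild(root)
    unresolved = []
    for story in src.getElementsByTagName("story"):
        title = text_of(story.getElementsByTagName("hed")[0])
        article = dst.createElement("article")
        article.setAttribute("title", title)
        root.appendChild(article)
        authors = story.getElementsByTagName("author")
        bylines = story.getElementsByTagName("byline")
        if authors:
            article.setAttribute("author", text_of(authors[0]))
        elif bylines:
            # Older convention: free text such as "By Pat Example".
            name = text_of(bylines[0])
            if name.startswith("By "):
                name = name[3:]
            article.setAttribute("author", name)
        else:
            unresolved.append(title)  # needs a human, or another script
    return dst, unresolved

new_doc, unresolved = migrate(OLD)
print(new_doc.documentElement.toxml())
print("needs attention:", unresolved)
```

Each new inconsistency found in the archive adds another branch like the byline case, and that is how a mountain of one-off scripts accumulates.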


Former URL: http://weblog.infoworld.com/udell/2003/02/11.html#a603