Secrets of the XML gods

text, code, data

It's the season for confession. First Tim Bray reveals a dirty secret: "a lot of input data these days is XML...in most cases, I use the perl regexp engine to read and process it." Then Sean McGrath fesses up to his Python habit: "I know I should be invoking a WF [well-formed] parser on the content.xml string but gee Ma, I've got work to do."

Text is code, code is data, data is text. Around and around we go. If the XML gods are resorting to Perl and Python hackery to shred documents, are we just spinning our wheels? I don't think so. But this is, perhaps, an unusual case. Normally, as we climb the ladder of abstraction, we are happy to lose sight of the rungs below. I cannot usefully manipulate the blocks and sectors of my disk, or the assembly code my software compiles down to. I can, however, make excellent use of the text stream underlying XML abstractions. So, which way to regard a document becomes a kind of Necker cube puzzle. The bad news: it's confusing. The good news: it's useful.
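To make the two views concrete, here is a minimal sketch in Python. The sample document and the function names are invented for illustration; they are not taken from Bray's or McGrath's posts. It reads the same string once as a text stream (a regular expression) and once as a parsed tree (the standard library's xml.etree.ElementTree), and shows where the two views part ways.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = """<feed>
  <item><title>Text is code</title></item>
  <item><title>Code is data</title></item>
  <item><title>Data is text</title></item>
</feed>"""


def titles_as_text(doc):
    """View 1: the document is a text stream; pattern-match on it."""
    return re.findall(r"<title>(.*?)</title>", doc)


def titles_as_tree(doc):
    """View 2: the document is a tree; parse it, then walk the elements."""
    root = ET.fromstring(doc)
    return [title.text for title in root.iter("title")]


def is_well_formed(doc):
    """Report whether the parser accepts the document as well-formed XML."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(titles_as_text(DOC))
    print(titles_as_tree(DOC))

    # Drop the closing root tag: the text view still works,
    # while the tree view refuses the document.
    broken = DOC.replace("</feed>", "")
    print(titles_as_text(broken))
    print(is_well_formed(broken))
```

On the intact document both views return the same titles. On the truncated one, the regular expression carries on regardless while the parser rejects the input. That is the trade in miniature: the text view is quick and forgiving, and the tree view is the only one that tells you whether the document is really XML.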

Former URL: http://weblog.infoworld.com/udell/2003/03/18.html#a642