Inventing languages and monitoring queries

It's wonderful to hear from Tim Bray on a regular basis nowadays. Today he meditates on the pros and cons of inventing, rather than reusing, markup languages:

The cost of inventing a new language is lower than you might think, because it turns out to be fairly easy to transform XML to meet whatever your business needs are. On the other hand, it's higher than you might think, because language design is hard and easy to get wrong. [ongoing]

The whole idea of language design is a bit odd, when you stop to think about it. We are of course linguistic animals, but we have no knowledge of the process that shaped our languages and not much awareness of their architecture. So I guess you could argue that language design is an unnatural act. I've committed a few such acts -- nothing fancy, but enough to have a sense of the tradeoffs Tim alludes to. I'd add that, even though transformation seems pretty easy, it also winds up being more costly than you'd think. More moving parts means more things that can break. So I've always leaned toward a unified approach -- for example, writing valid XHTML that doesn't need to be transformed in order to be rendered. Or using class attributes for both stylistic and semantic effect.

Tim can debate mind-numbingly esoteric stuff with the XML experts, but he never loses touch with the earthy reality of the Web. For example, he didn't need any XML to rip through his referrer log and check out the queries that brought people to his site. In six months, once he's accumulated more blog entries, those queries will become a much more accurate digest of his interests. Here, for example, were my queries for today: [1]

[1] There's a ringer in the list: "tracking porn users." What's up with that? GoogleBox. It's easy to forget that it can put things on your page that you didn't mean to put there. Here's the GoogleBox that drew that referral.
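
For anyone who wants to try the same kind of referrer-log pass, here is a minimal sketch in Python. It is not Tim's script or mine, just an illustration under a few assumptions: the log is in Apache's combined format, search engines put the query in a `q` parameter, and the function name `search_queries` is made up for this example.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Assumption: Apache "combined" log format, where the quoted request is
# followed by status, size, the quoted referrer, and the quoted user agent.
LOG_LINE = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referrer>[^"]*)" "[^"]*"')


def search_queries(lines, param="q"):
    """Count the search phrases found in the referrer field of each log line.

    `param` is the query-string key holding the search terms; "q" is an
    assumption that fits Google-style referrers but not every search engine.
    """
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # not a combined-format line; skip it
        query = parse_qs(urlsplit(match.group("referrer")).query)
        for terms in query.get(param, []):
            # Normalize case and whitespace so repeated queries tally together.
            counts[terms.strip().lower()] += 1
    return counts
```

Feed it the lines of an access log (any iterable of strings will do) and call `most_common()` on the result to see the day's queries ranked by frequency. A ringer like the one in the footnote would turn up in that list just like any other query, which is how you'd notice it.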

