Inventing languages and monitoring queries

It's wonderful to hear from Tim Bray on a regular basis nowadays. Today he meditates on the pros and cons of inventing, rather than reusing, markup languages:

The cost of inventing a new language is lower than you might think, because it turns out to be fairly easy to transform XML to meet whatever your business needs are. On the other hand, it's higher than you might think, because language design is hard and easy to get wrong. [ongoing]

The whole idea of language design is a bit odd, when you stop to think about it. We are of course linguistic animals, but we have no knowledge of the process that shaped our languages and not much awareness of their architecture. So I guess you could argue that language design is an unnatural act. I've committed a few such acts -- nothing fancy, but enough to have a sense of the tradeoffs Tim alludes to. I'd add that, even though transformation seems pretty easy, it also winds up being more costly than you'd think. More moving parts means more things that can break. So I've always leaned toward a unified approach -- for example, writing valid XHTML that doesn't need to be transformed in order to be rendered. Or using class attributes for both stylistic and semantic effect.

Tim can debate mind-numbingly esoteric stuff with the XML experts, but he never loses touch with the earthy reality of the Web. For example, he didn't need any XML to rip through his referrer log and check out the queries that brought people to his site. In six months, once he's accumulated more blog entries, those queries will become a much more accurate digest of his interests. Here, for example, were my queries for today: 1

jon 10
jon udell 7
udell 6
xslt tutorial 5
bookmarklet 4
library lookup 3
valdis krebs 3
space junk 3
udell flash mx 2
er design 2
infopath 2
choicemail 2
word 2
yukon filesystem 2
vonage business 2
flash mx listbox 2
xsl url link 2
phone google udell 2
pst outlook 2
delta airlines 2
utah cto blog 2
tacit knowledge 2
mozilla outlook 2
xforms infopath 2
talis 2
rich edit 2
mud wrestling 2
er 2
open source rbac 2
microcontent 2
jon 2
nntp rss 2
john udell 2
rss xslt 2
wi-fi primer 2
soft security 2
intellij regular expression search 1
zope external ruby 1
html pagename 1
picture of a radio 1
esp test 1
architecture manifesto 1
jon udell weblog 1
manufactured cultures 1
rss the economist 1
pst format 1
pki failure rate 1
true names vinge 1
choicemail one review 1
how pipelining works with http 1
outlook.pst 1
rss nt weblog 1
groove 1
tme grid 1
galactic structure 1
clifford pickover 1
what is integration 1
open locked pst files 1
bookmarklet udell 1
isbn lookup 1
putnam social capital 1
rss nasdaq scripting 1
wsdl afault .net 1
why wired news uses css-p 1
video blog 1
pst mozilla outlook 1
508 character limit 1
udell and web services 1
importing pst. files into outlook 1
where angels fear to tread 1
flash mx wysiwyg editor 1
why my typing moving up side 1
python collaboration web 1
reading books online 1
mindreef 1
radio picture 1
why use extreme programming 1
flash mx transparency 1
outlook pst linux 1
example of forward integration 1
rss 2.0 1
xmldom rss 1
microcontent browser 1
aircraft gps in degree decimal minutes 1
book sites 1
thread.jsp 1
where angels fear to tread 1
pics the hives 1
sherlock vs watson 1
lookup law book in library 1
priceless 1
rss aggregators windows 1
perl osascript 1
xcopy and networks 1
delta airlines security 1
allconsuming rss example 1
social networking software 1
service oriented architecture 1
who is the asp running as in iis? 1
jon udell blog 1
active paper 1
genus species 1
rss .0 1
xslt tutorial apply-templates 1
underwear outside 1
new york, travel, airports 1
build bookmarklet 1
rss yahoo finance 1
longhorn filesystem 1
tracking porn users 1
infopath j2ee 1
pst files 1
what is multiline 1
msiconfig for xp 1
transition flash component 1
notes is dead 1
jon on finance 1
manufactured serendipity 1
<xsl value-of tutorial 1
radio innovative rock 1
mac collaboration software 1
zope page templates 1
regina health library catalog 1
javascript rss security 1
design spaces watchmaker 1
what is lego made of 1
e.s.p. test 1
vonage ata device 1
delta airlines 1
userland license key 1
ipac and security and microsoft 1
reading books online 1
outlook pst format 1
true nyms and crypto anarchy 1
rss xsl xslt 1 with iis 1
library search jon udell 1
javascript setcookie 1
cooperate 1
literary forms 1
video blog 1
puzzell 1
paul graham python 1
reading books online 1
subatomic 1
consciousness 1
windows server internet apache home cassini 1
xdocs things 1
glue intellij 1
xsl template tutorial 1
exchange rss 1
video blog 1
office infopath 1
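
For the curious, a tally like the one above takes only a few lines of script. What follows is a minimal Python sketch, not the script used for this list or for Tim's. It assumes the referrer URLs have already been pulled out of the log, one string per hit, and that the search phrase travels in a query-string parameter named q, p, or query (an assumption; search engines vary). The function names are made up for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Assumed parameter names that carry the search phrase; real engines vary.
QUERY_KEYS = ("q", "p", "query")


def extract_query(referrer):
    """Return the lowercased search phrase in a referrer URL, or None."""
    params = parse_qs(urlsplit(referrer).query)
    for key in QUERY_KEYS:
        if key in params:
            # Collapse runs of whitespace so "xslt  tutorial" matches "xslt tutorial".
            phrase = " ".join(params[key][0].lower().split())
            return phrase or None
    return None


def tally_queries(referrers):
    """Count how many times each search phrase appears among the referrers."""
    counts = Counter()
    for referrer in referrers:
        phrase = extract_query(referrer)
        if phrase:
            counts[phrase] += 1
    return counts


def format_tally(counts):
    """Render 'phrase count' lines, most frequent first."""
    return [f"{phrase} {n}" for phrase, n in counts.most_common()]


if __name__ == "__main__":
    # Hypothetical referrers, standing in for one day's log.
    sample = [
        "http://www.google.com/search?q=xslt+tutorial&hl=en",
        "http://www.google.com/search?q=XSLT+Tutorial",
        "http://search.yahoo.com/search?p=bookmarklet",
        "http://example.com/blog/",
    ]
    for line in format_tally(tally_queries(sample)):
        print(line)
```

Lowercasing and collapsing whitespace merges obvious variants of the same phrase. A tally that skips that step, or that keeps each search engine separate, will show the same phrase on more than one line.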

1 There's a ringer in the list: "tracking porn users." What's up with that? GoogleBox. It's easy to forget that it can put things on your page that you didn't mean to put there. Here's the GoogleBox that drew that referral.
