13th century standards

Traveling in France in 2001, I visited Chartres Cathedral and was lucky enough to show up in time for Malcolm Miller's lecture. Seemingly unchanged from the last time I'd seen him, in 1978, Miller again made the architecture and stained glass come alive in his inimitable way. This time, though, I heard something I hadn't the first time -- about standards. When the construction project drew in artisans from the 13th-century French countryside, the first order of business was to agree on standard weights and measures. I wonder what those negotiations were like!

It all seemed kind of quaint until, a couple of days later, I found myself in an Internet cafe struggling with a French keyboard. The @ symbol was the showstopper. I finally abandoned typing and, feeling ridiculous, copied the symbol from a web page and pasted it into the email message I was composing.

What reminded me of all this was the title of last Thursday's entry: "Active résumés." To be honest, I took the lazy route at first and wrote it as "Active resumes" because I knew that using a LATIN SMALL LETTER E WITH ACUTE would likely cause some problems. But then, mindful of Sam Ruby's recent admonition to test international characters "in every nook and cranny you can find," I went with the correct spelling.

Since I write in XML, my input strategy was to use numeric references, which meant typing this string of characters: "r&#233;sum&#233;s" -- and that's exactly what showed up on the InfoWorld home page when the item was excerpted there. Evidently the process that creates those excerpts is reading, but not parsing, RSS feeds.
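To make the reading-versus-parsing distinction concrete, here is a rough Python sketch. The feed fragment and both function names are made up for illustration, and I'm assuming the decimal form of the reference (&#233;) for LATIN SMALL LETTER E WITH ACUTE; I don't know how the actual excerpting process is written.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS fragment; the title spells each accented "e" as the
# numeric character reference &#233; (an assumption about the form used).
raw_item = "<item><title>Active r&#233;sum&#233;s</title></item>"


def title_by_reading(xml_text):
    """Slice the title out as plain text, without parsing the XML."""
    start = xml_text.index("<title>") + len("<title>")
    end = xml_text.index("</title>")
    return xml_text[start:end]


def title_by_parsing(xml_text):
    """Let an XML parser resolve the character references."""
    return ET.fromstring(xml_text).findtext("title")


print(title_by_reading(raw_item))  # the references come through untouched
print(title_by_parsing(raw_item))  # the references become real characters
```

The first function hands back the literal ampersand-and-digits string, which is what a home page would show if it pasted the excerpt in as already-escaped text. The second returns the word with its accents, because the parser resolves the references.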

The item itself displayed correctly, but other subtleties emerged. For example, Technorati and Feedster produce hits when searching for the wrong spelling but not when searching for the right one. (Update: Hmm. Technorati does find "active résumé," though. So does Google, but it finds a lot more instances of "active resume.")

I discovered that my own XPath search does find the entry, though entering the search term presents a bit of a challenge. Copying an instance of 'résumé' into the search form works, as does the extra-geeky method of writing the URL-encoded version ('r%C3%A9sum%C3%A9s') directly into the URL. But the resulting display was wrong until I switched the browser's text encoding to UTF-8. I guess I should have my search server declare the encoding itself, by sending a Content-Type header that specifies charset=utf-8.
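For the curious, here is a small Python sketch of where that URL-encoded string comes from and why the display can go wrong. It assumes the browser was falling back to Latin-1 (ISO-8859-1), which is my guess rather than something I verified, and the header tuple at the end is only an illustration of what a server might send.

```python
from urllib.parse import quote, unquote

term = "r\u00e9sum\u00e9s"  # the word "résumés", written with escapes

# quote() percent-encodes the UTF-8 bytes of each non-ASCII character.
encoded = quote(term)
print(encoded)  # r%C3%A9sum%C3%A9s

# Decoding those bytes as UTF-8 recovers the original word.
decoded_ok = unquote(encoded, encoding="utf-8")

# Reading the same UTF-8 bytes as Latin-1 (assumed browser default)
# turns each two-byte sequence into two separate characters.
garbled = term.encode("utf-8").decode("latin-1")
print(garbled)  # rÃ©sumÃ©s

# Hypothetical response header that tells the browser which encoding to use.
content_type_header = ("Content-Type", "text/html; charset=utf-8")
```

Each é takes two bytes in UTF-8 (C3 A9), which is why it shows up as two percent-escapes in the URL and as two wrong characters when the page is read with a single-byte encoding.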

Sam's essay points to a Joel Spolsky article that is the single most lucid treatise I've seen on the subject of internationalization. We've come a long way with Unicode, but there's still some distance to go. Chartres Cathedral still stands, so apparently those 13th-century carpenters and stonemasons got things sorted out reasonably well. I trust we will too.


Former URL: http://weblog.infoworld.com/udell/2004/04/26.html#a982