My recent mangling of Diego Doval's name in a print column was a harsh reminder that I neglect one tradition of print journalism at my peril. That tradition is a fact-checking mechanism called CQ. The idea is that an author, when writing the name of a person, company, or product, should CQ it to indicate that the spelling has been double-checked. (The acronym "CQ" is itself unCQ-able, since nobody owns the term or seems to know what it stands for.) Of course a copy editor shouldn't automatically trust an author's CQ. But it's one layer of a defense-in-depth strategy.
Here's a real-life example. In next week's column I mention a certain Microsoft initiative, now mothballed. I wasn't sure about "Hailstorm" versus "HailStorm", but found some examples (via Google) that convinced me to go with the former. Having double-checked in this way, I should have written "Hailstorm [CQ]," but didn't. My editor, who had a clear memory of the Hailstorm spelling, did CQ it that way. But in fact, the correct spelling -- thankfully caught at the last minute by an eagle-eyed copy editor -- appears to be "HailStorm."
HailStorm was originally described in a Microsoft whitepaper, now 404. The original press release, still online, uses both spellings. If you search Google or even Microsoft.com, you'll also find examples of both spellings. About the best that can be said, as my editor pointed out, is that spellings with the cap S are more frequent.
I've long been fascinated with the way in which Google can perpetuate misspellings. Compare, for example, the count of results for embarrass (count: 401,000) and embarass (count: 41,400). Obviously you shouldn't use Google as a dictionary; you should instead go here or here. But I'll bet a lot of people do look up "embarass" on Google, find evidence to support their misspellings, and thus perpetuate them. I've even wondered if there's a feedback loop here that will increase the ratio of incorrect to correct spellings over time.
Although you shouldn't use Google as a dictionary, note the difference between looking up the wrong and right spellings there:
|embarass||Results 1 - 100 of about 41,400 for embarass.|
|embarrass||Results 1 - 100 of about 401,000 for embarrass [definition]|
In the latter case, Google refers you to an authoritative source -- in this case, dictionary.com. Of course, CQ-able facts usually can't be found in a dictionary. The authority that governs them is the person who owns the name in question, or the company that owns the name or product. At least, that's how it ought to be. But look at what really happens:
|infoworld "john udell"||Results 1 - 100 of about 7,740|
|infoworld "jon udell"||Results 1 - 100 of about 17,900|
I own the spelling of my name. InfoWorld, as my employer, has some ownership interest in that fact too. Microsoft, even though it has 404'd the HailStorm whitepaper, still owns that piece of its institutional history. Shouldn't these responsible parties control such facts about themselves?
HailStorm, of course, was based on a mechanism for publishing machine-readable facts. There are other ways to skin the cat. FOAF, for example, is a way for individuals to assert facts about themselves. Currently Google sees 14,700 foaf.rdf files and 416 foaf.xml files -- not including mine, which I just added today. I resisted FOAF until now because I've worried about asserting things which can't be asserted, such as relationships. But the core concept of FOAF, as captured in the tagline "a Web of machine-readable homepages," is indisputably valid.
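To make "machine-readable homepage" a bit more concrete, here is a minimal sketch in Python of how a fact-checking tool might read the name a person asserts in a FOAF file. The RDF and FOAF namespace URIs are the standard ones; the embedded document and the helper names (asserted_names, is_cq) are illustrative assumptions, not anything FOAF itself prescribes.

```python
import xml.etree.ElementTree as ET

# A tiny FOAF document, embedded here for illustration only.
# A real tool would fetch the person's published foaf.rdf instead.
FOAF_DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Jon Udell</foaf:name>
  </foaf:Person>
</rdf:RDF>
"""

NS = {"foaf": "http://xmlns.com/foaf/0.1/"}


def asserted_names(foaf_xml):
    """Return every foaf:name that a foaf:Person in the document asserts."""
    root = ET.fromstring(foaf_xml)
    names = root.iterfind(".//foaf:Person/foaf:name", namespaces=NS)
    return [el.text.strip() for el in names if el.text]


def is_cq(spelling, foaf_xml):
    """Hypothetical check: does the spelling exactly match an asserted name?"""
    return spelling in asserted_names(foaf_xml)


if __name__ == "__main__":
    print(asserted_names(FOAF_DOC))
    print(is_cq("John Udell", FOAF_DOC))
```

The point of the sketch is only that the owner of the name, not the search index, becomes the authority consulted.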
If you removed FOAF's "friend-of-a-friend" branding, the concept might make more sense to organizations. For example, the homepage of infoworld.com or microsoft.com might contain:
<link rel="dictionary" type="tbd" href="dictionary.xml">
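As a rough sketch of how a copy desk's tool might use such a link, the Python below finds the rel="dictionary" element in a homepage and then checks a candidate spelling against the file it points to. Everything about the file format is assumed: the dictionary and term elements, the kind attribute, and the function names are hypothetical, since no such format has been defined. The documents are embedded as strings so the example runs without a network.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Illustrative homepage fragment containing the proposed link element.
HOMEPAGE = '<html><head><link rel="dictionary" type="tbd" href="dictionary.xml"></head></html>'

# Hypothetical dictionary.xml; the element and attribute names are invented here.
DICTIONARY_XML = """<dictionary>
  <term kind="product">HailStorm</term>
</dictionary>
"""


class DictionaryLinkFinder(HTMLParser):
    """Remembers the href of the first <link rel="dictionary"> it sees."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and attributes.get("rel") == "dictionary" and self.href is None:
            self.href = attributes.get("href")


def find_dictionary_href(html):
    """Return the dictionary link target, or None if the page has none."""
    finder = DictionaryLinkFinder()
    finder.feed(html)
    return finder.href


def canonical_spelling(candidate, dictionary_xml):
    """Return the owner's spelling for a case-insensitive match, else None."""
    root = ET.fromstring(dictionary_xml)
    for term in root.iter("term"):
        text = (term.text or "").strip()
        if text.casefold() == candidate.casefold():
            return text
    return None


if __name__ == "__main__":
    print(find_dictionary_href(HOMEPAGE))
    print(canonical_spelling("Hailstorm", DICTIONARY_XML))
```

Matching case-insensitively and returning the owner's form is a design choice for this sketch: it lets the tool turn "Hailstorm" into a suggested correction rather than a bare failure.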
The dictionary.xml file would assert public facts: names of employees, organizational units, products. These would reflect internal records. How would an organization mark facts in its internal databases as being both correct and releasable? In my mind's eye, I see a Web form. On the form there is a button. And the button says:
Former URL: http://weblog.infoworld.com/udell/2004/07/13.html#a1038