A parable about data provenance

Earlier this week Lorcan Dempsey, who is VP and chief strategist with the Online Computer Library Center (OCLC), blogged about an enhancement to an OCLC service that searches the Library of Congress Name Authority File. The new version uses fuzzy matching, which means that the common misspelling of my name as 'John' will find me as well as my alter ego Jon G. Udell.

This reminded me that for years, in various online venues, I've seen my book, Practical Internet Groupware, attributed to Jon G. Udell, author of The economics of the American newspaper. It turns out that's because the authoritative record at the Library of Congress has had it wrong all this time. Lorcan kindly referred the matter to an OCLC colleague who made the correction and reported it to the LC. So at some point my book as seen in WorldCat will be correctly attributed, and eventually that change should propagate to the libraries that subscribe to WorldCat.

How, in general, can authors resolve such problems? The OCLC advises:

We get lots of comments from authors via the Comments button on FirstSearch record displays as well as through the general oclc@oclc.org email address. In addition, the general Contacts page on the OCLC web site contains links to forms that can be used to request changes to bibliographic and authority records.

The Library of Congress gets similar comments via a feature on the record displays in their online catalog that allows users to submit an Error Report Form. It's kind of hidden at the very bottom of the display.

One caution, since catalogers work from title pages and other information in the material being cataloged, we often have to ask for proof before making a change. Proof may be a faxed copy of the title-page or its verso, etc.

We are often the best authorities for information about ourselves, and we often encounter errors that we could easily fix. Why don't we? Because the connection between the authoritative source of a fact and its erroneous manifestation is rarely explicit.

Given that the infosphere is becoming a web of syndicated facts, we'll want to make those connections explicit. As a best practice, data provenance should be accessible at the point of display and use.
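As a rough sketch of what that could look like in practice, a syndicated fact might carry its source and a correction route along with it, so whoever displays the fact can show both. The `Fact` and `Provenance` types and the `display` function below are hypothetical names invented for illustration; they are not part of any OCLC or Library of Congress interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """Hypothetical record of where a fact came from and how to fix it."""
    source: str              # the authoritative source asserting the fact
    correction_route: str    # where a reader can report an error


@dataclass(frozen=True)
class Fact:
    """Hypothetical syndicated fact that travels with its provenance."""
    subject: str
    claim: str
    provenance: Provenance


def display(fact: Fact) -> str:
    """Render the fact with its source and correction route beside it."""
    p = fact.provenance
    return (
        f"{fact.subject}: {fact.claim} "
        f"[source: {p.source}; report errors via: {p.correction_route}]"
    )


# Illustrative values drawn from the story above; the wording is an assumption.
attribution = Fact(
    subject="Practical Internet Groupware",
    claim="author is Jon Udell",
    provenance=Provenance(
        source="Library of Congress Name Authority File",
        correction_route="Error Report Form on the LC catalog record display",
    ),
)

print(display(attribution))
```

The point of the design is only that the provenance is never separated from the claim: any site that republishes the `Fact` has what it needs to tell a reader who asserted it and where to send a correction.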


Former URL: http://weblog.infoworld.com/udell/2006/08/10.html#a1503