Canonical URLs and network effects

After retracing his steps in order to correctly credit a link he had recently cited, Darren Barefoot wondered whether it had been worth the trouble:

Generally, I just choose the site closest to the source, and credit them. That probably doesn't make sense, as I should be crediting the source where I found them. Or is it important to show the entire 'chain of evidence'? Ultimately, who really cares? [Darren Barefoot: The (boring) problem of attribution]

I think that it is worth the trouble, and that publishing platforms and blogging tools ought to conspire to help automate the tedious chore. The reason usually given is that the original source deserves credit, and that it's unfair to redirect that credit. That's true, but there's a deep systemic principle at work here too. Canonical URLs create powerful network effects that we dilute at our peril.

Consider this set of Bloglines-assembled conversations about the Reuters article that was the original link in the chain of evidence: 1, 2, 3, 4. In such cases the conversation based on the original item is typically the most complete. Conversations based on derived items usually capture only a subset of the original conversation, while perhaps also adding new items that can't easily be connected to the original conversation.

Here's another example. The InfoWorld review originally published at this URL now also appears in the new product guide at this URL. When I mentioned this to Chad Dickerson, he pointed out that even if InfoWorld.com were to enforce a canonical-URL policy internally, our stuff is syndicated out to places we don't control. So, for example, the InfoWorld column at this URL also shows up as the Computerworld story at this URL.

When a piece of content is syndicated into a new context, there's no reason why that new context shouldn't have its own canonical URL -- particularly if it adds value in the form of direct or indirect commentary. But I'd also argue there's no reason to sever the connection to the original context, and a strong reason not to. An article in this month's Wired called The Long Tail (which, ironically, I can't cite because its URL isn't yet known) helps explain why. The article describes how a 1988 book, Touching the Void, rose from the remainder pile a decade later when Into Thin Air became a hit.

What happened? In short, Amazon.com recommendations. The online bookseller's software noted patterns in buying behavior and suggested that readers who liked Into Thin Air would also like Touching the Void. [Wired]

The tagline of the Wired story nails it: "Forget squeezing millions from a few megahits at the top of the charts," it says. "The future of entertainment is in the millions of niche markets at the shallow end of the bitstream."

If you buy that argument, then try combining it with this one: nobody should own the conversations about these "long-tail" products. And sooner or later, nobody will be able to. In Next-generation infoware, for example, I suggested that Amazon's ownership of book reviews is obsolete. Increasingly, the authors of the reviews will own their words, on their own blogs. Those words will be syndicated out to Amazon and to others, who will all compete to add value to them in ways that facilitate sales.

One way or another, conversations that refer to the same things will be aggregated. We can fight that by fragmenting the conversations and diluting their network effects. Or we can go with the flow.
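To make the aggregation idea concrete, here is a minimal sketch of how an aggregator might fold citations of syndicated copies back into one conversation about the original item. It is only an illustration under stated assumptions: the CANONICAL table, the example.* URLs, and the function names are all hypothetical, and a real service would have to discover the copy-to-original mapping somehow, which is exactly the link this essay argues should not be severed.

```python
from collections import defaultdict

# Hypothetical mapping from syndicated or derived URLs to the canonical URL
# of the original item. Every URL here is invented for illustration.
CANONICAL = {
    "http://syndicator.example/story/42": "http://origin.example/article/7",
    "http://guide.example/reviews/7": "http://origin.example/article/7",
}


def canonical_url(url):
    """Return the canonical URL for url, or url itself if no mapping is known."""
    return CANONICAL.get(url, url)


def aggregate(citations):
    """Group citing posts by the canonical URL of the item they cite.

    citations is an iterable of (citing_post, cited_url) pairs.
    """
    conversations = defaultdict(list)
    for citing_post, cited_url in citations:
        conversations[canonical_url(cited_url)].append(citing_post)
    return dict(conversations)


# Three posts cite the same item through different URLs; one cites something else.
citations = [
    ("blog-a", "http://origin.example/article/7"),
    ("blog-b", "http://syndicator.example/story/42"),
    ("blog-c", "http://guide.example/reviews/7"),
    ("blog-d", "http://elsewhere.example/unrelated"),
]

for url, posts in aggregate(citations).items():
    print(url, posts)
```

Without the mapping, blog-a, blog-b, and blog-c would land in three separate conversations; with it, they collapse into one, and only the unrelated citation stays on its own.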

Former URL: http://weblog.infoworld.com/udell/2004/09/27.html#a1083