An RSS/RDF epiphany

Some fascinating conversations have been weaving their way through blogspace and email in the last few days. As a result, I think I've reached a new understanding of the seemingly endless debate about whether and how to use RDF (Resource Description Framework) and RSS together. I mentioned Dan Brickley's comments the other day. He expands on his remarks over on Shelley Powers' blog:

For me, this is all about data mixing, and it's the only real way I know how to do it in XML. I'm just used to it, maybe. With RDF I can take a couple of RDF documents and merge them by adding the triples together. I just don't know how to do that with non-RDF XML.

RDF's syntax is hard to learn, and the underlying triples model isn't obvious to those without a built-in mental RDF parser. But there's also truth in the concern that we don't really know how to freely inter-mix independently defined XML namespaces when those namespaces are defined with XML rather than RDF schema languages... So imho spending time trying to have best of both worlds isn't a waste. [Commenting in Shelley Powers' Practical RDF]

Arguably I should get a life :-), but for me this remark was an epiphany. I've long suspected that we won't really understand what it means to mix XML namespaces until we do some large-scale experimentation. What I hadn't fully appreciated, until just now, is the deep connection between RDF and namespace-mixing. Dan's original hard-line position, he now explains, was that there is no sane way to mix namespaces without some higher-order model, and that RDF is that model. That he is now modulating that position, and saying that none of us yet knows whether or not that is true, strikes me as both intellectually honest and potentially a logjam-breaker.
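To make Dan's point about merging concrete, here is a minimal sketch, not a real RDF toolkit. It assumes triples can be modelled as plain (subject, predicate, object) tuples. The example.org URIs, the job vocabulary, and the `merge` and `describe` helpers are all made up for illustration. It also ignores blank nodes, which a real RDF library has to keep apart when merging.

```python
# Triples modelled as plain (subject, predicate, object) tuples.
# ITEM, the jobs# vocabulary, and all values are hypothetical.
ITEM = "http://example.org/feed/item1"
DC = "http://purl.org/dc/elements/1.1/"

# Two independently written descriptions of the same resource.
doc_a = {
    (ITEM, DC + "title", "Python developer wanted"),
    (ITEM, DC + "creator", "Example Corp"),
}
doc_b = {
    (ITEM, "http://example.org/jobs#salary", "50000"),
    (ITEM, DC + "title", "Python developer wanted"),  # overlaps with doc_a
}


def merge(*graphs):
    """Merge graphs by set union; duplicate triples collapse automatically."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Collect every predicate/value pair known about one subject."""
    return {p: o for s, p, o in graph if s == subject}


merged = merge(doc_a, doc_b)
for predicate, value in sorted(describe(merged, ITEM).items()):
    print(predicate, "=", value)
```

Neither document needed to know about the other's vocabulary for the merge to work, which is the property Dan says he does not know how to get from non-RDF XML.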

Meanwhile, in email, Stefano Mazzocchi made this striking comment (which I hope he won't mind being quoted here):

The mental model that XML promotes is basically a tree of couples.

The mental model that RDF promotes is basically a collection of triples.

Sounds familiar doesn't it? The Hierarchical vs. Relational war over again 30 years later?

Indeed, it does. Stefano's formulation suggests to me that the troubled relationship between RSS and RDF may have been a red herring all along. Either we do or don't need some higher-order model to manage mixed namespaces sanely. Nobody knows yet. That the question arose in the context of RSS may simply have been an unfortunate historical accident -- RSS happened to be a likely candidate for the necessary large-scale experimentation, and got caught in the crossfire.

Atom is headed into the same field of fire, but if I'm right in my analysis, this isn't about syndication at all. It's about the general question of using XML namespaces. And yet, again and again, RSS gets entangled in the discussion. Today, for example, Patrick Phelan referred me to this article in which Danny Ayers writes:

There's no consistent means of interpreting material from other namespaces that may appear in an RSS 2.0 document.

To which I responded:

Shouldn't we then substitute XML for RSS 2.0 in that sentence, and say there is no consistent way to interpret material from other namespaces in any XML document, period?

Shouldn't we then say, there is no reason to create any mixed-namespace XML document that is not RDF?

This is the conclusion which it seems Dan Brickley is, recently, trying to avoid. I'm glad to see him raising the issue. This has been pigeonholed as an RSS thing for too long, it's really much larger I think.

Given this analysis, Dave Winer's comment, over on Shelley's blog, also merits deep consideration:

Jon, I'd add this -- the working from both ends towards the middle should take place away from ongoing commercial development. It would be like experimenting with space travel on the construction site for the Golden Gate Bridge. The purpose of the bridge might be confusing to the motorists.

An excellent point. Over on his blog, commenting on my Plain Old Metadata proposal, Danny Ayers writes:

He [me] also talks of "plain old metadata" - ok, how are we going to present this - in a random, inconsistent HTML tag soup kind of a fashion? Or shall we try and do it in a way that tries to maximise the potential utility of the data? [Danny's Raw Blog]

Well, I'm with Dan Brickley on this:

Of these three paths for job-data-in-RSS, 'entity escape it and stuff it in the description', 'use non-RDF namespace extensions', and 'use RDF namespace extensions', to my mind only one of them stands out as clearly the worst way forward. [Commenting in Shelley Powers' Practical RDF]

What we have now is 'entity escape and stuff in description', and I doubt anyone will argue that's good. Like Dan, I don't know which of the other two options is best. I do know, however, that I can easily shred an XML document with XPath and XSLT, pick out subsets -- whether or not they're namespace-qualified -- and do useful things with them. I don't believe it's a bad idea to do that without first settling on a higher-order semantic model. Far from it. It's abundantly clear to me that we've wasted years, that we must do that experiment ASAP, and that it will yield new killer applications. No agreement on the higher-order model need be reached as a precondition. If some higher-order model ultimately prevails, then a lot of existing data will have to be converted into it. Would you rather convert 'entity-escaped-and-stuffed-in-the-description' data, which is all we have now, or XML data that you can at least shred and manipulate? That choice seems clear to me.
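As a rough illustration of the shredding I mean, here is a sketch using the limited XPath support in Python's standard `xml.etree.ElementTree`. The feed, the `job` namespace URI, its element names, and the `shred` helper are all invented for the example. No real job-data vocabulary is implied, and no higher-order model is consulted.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed carrying elements from a hypothetical
# non-RDF extension namespace (http://example.org/jobs).
FEED = """<rss version="2.0" xmlns:job="http://example.org/jobs">
  <channel>
    <title>Example jobs feed</title>
    <item>
      <title>Python developer wanted</title>
      <job:salary currency="USD">50000</job:salary>
      <job:location>Keene, NH</job:location>
    </item>
    <item>
      <title>Plain item with no extensions</title>
    </item>
  </channel>
</rss>"""

# Prefix-to-URI map used when resolving the namespace-qualified paths.
NS = {"job": "http://example.org/jobs"}


def shred(feed_xml):
    """Pick out core RSS fields and extension fields, item by item."""
    root = ET.fromstring(feed_xml)
    rows = []
    for item in root.findall("./channel/item"):
        rows.append({
            # Unqualified path: a core RSS 2.0 element.
            "title": item.findtext("title"),
            # Namespace-qualified paths: None when the element is absent.
            "salary": item.findtext("job:salary", namespaces=NS),
            "location": item.findtext("job:location", namespaces=NS),
        })
    return rows


rows = shred(FEED)
for row in rows:
    print(row)
```

Items that lack the extension elements simply come back with empty fields, so a consumer that knows nothing about the job namespace is unaffected, and one that does know it can pull the data out directly.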

Finally, a plea to all concerned. Let's stop punishing RSS syndication for its success by asking it to carry the whole burden of XML usage in the semantic Web.
