Symbol grounding and extensible aggregators

Last week's items about RDF evoked lots of feedback. First, Patrick Logan:

This is beyond RSS, beyond RDF, even beyond XML.

This is known as The Symbol Grounding Problem.

Namespaces allow you to create unique (enough) symbols. There is no consistent way to interpret them. All XML-based standards should fully support namespaces. The minimum acceptable standards should support distinguished symbols from among various standards.

What those mixtures "mean" in any specific context will be, well, context dependent. All we can hope for from XML alone is an arrangement of symbols. Whoever tells us of the arrangement will also have to tell us how to interpret the arrangement. [Patrick Logan]

The bibliography in the article Patrick refers to is full of names familiar to me: Chomsky, Dennett, Fodor, Haugeland, Miller, Minsky, Newell, Penrose, Pylyshyn. I've read all these writers, and have had a decades-long fascination with the relationship between human and computer languages. Will we ever figure out a useful mapping between the two? I hope so, but I won't make a short-term bet either way. Therefore, it seems to me, we need a strategy that doesn't depend on the outcome of that bet. In that vein, I found this exchange on the Atom mailing list to be noteworthy:

One thing about the syntax that concerns me greatly is that there doesn't yet appear to be a consistent way of interpreting material from other namespaces. I believe this to be a make or break issue for interop. [Danny Ayers writing on the atom-syntax mailing list]

This problem has never been solved in the general case that I know of. So I really hope that you're wrong on its make-or-break-ness. Worth taking a whack at, but don't underestimate the difficulty. [Tim Bray writing on the atom-syntax mailing list]

Meanwhile, Bill de hÓra, responding to my question -- "Shouldn't we then say, there is no reason to create any mixed-namespace XML document that is not RDF?" -- writes:

Yes we should say that, but that would be saying the Emperor has no clothes. Does anyone want to hear it? [Bill de hÓra]

Absolutely. If there's a naked emperor's butt flapping in the breeze, I definitely do want to know about it. In my experience, though, XPath search and XSLT transformation are quite effective, and Bill seems to agree:

Web services for the most part are predicated on XML namespaced vocabularies, as are any number of behind the firewall integration efforts. In those worlds, there's historically been zero agreement on uniform content models, which is precisely why transformation is such an effective technology for integrating systems. Get the data into XML and start pipelining. And though neither the declarative or the API/RPC school of integration may like the idea of chaining processes with XML, in my and my employer's experience, the results speak for themselves. In truth, XML Namespaces are incidental to a transformation architecture. [Bill de hÓra]

But Bill also believes that a uniform content model is the best strategy, and he defines it in an interesting way:

By the way, if you don't like all the semantic web stuff that RDF is associated with, here's another way of looking at it. Think of RDF as a CVM, a Content Virtual Machine, out of which any content can be described and by which content codecs can interoperate, by sharing a uniform view of the data. That's all there really is to RDF - an instruction set for content description. This is no more naive a view than Java's WORA. [Bill de hÓra]

I find this formulation very appealing in the abstract. I'm still not sure what it means concretely, though. To get a better picture of how the CVM works, I read Shelley Powers' very well-written new book, Practical RDF. I read it online, actually. Very cool to be able to do that. (Tank, I need a pilot program for a B-212 helicopter.) My eyelids fluttered for a while, and when I opened them again it was Chapter 10: Querying RDF: RDF as Data that emerged as pivotal. Let's look at an example:

SELECT ?value
WHERE (?x, <pstcn:presentation>, ?resource),
(?resource, <pstcn:requires>, ?resource2),
(?resource2, <pstcn:type>, "stylesheet"),
(?resource2, <rdf:value>, ?value)
USING pstcn FOR <http://burningbird.net/postcon/elements/1.0/>,
      rdf FOR <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
 
The result from running this query is:
 
http://burningbird.net/de.css

The backstory here is that a resource -- and the running example throughout the book is this article (monsters1.htm) -- has an RDF description (monsters1.rdf) based on an RDF vocabulary, called PostCon, whose development the book demonstrates. When you run monsters1.rdf through an RDF parser (e.g. http://www.w3.org/RDF/Validator), you get a list of subject-predicate-object triples. For example, the 57th triple says:

subject	predicate	object
http://burningbird/articles/monsters1.htm	http://burningbird.net/postcon/elements/1.0/reason	"Collapsed into Burningbird"

The context (an intentionally loaded word!) of this triple is something like: "The subject URL is partly described by an RDF vocabulary, PostCon, which can be used to track the history of its 'movement' -- that is, from one Web address to another. Whenever such a move occurs, there is a reason given. This triple gives the reason for one such movement."

Armed with this model, and with an understanding of the PostCon vocabulary, whose domain elements are detailed in this section of the book, we can see how the query works its way through the triples to answer the question: "What CSS resource is required (in the PostCon sense) by the subject URL?"

This is cool. RDF triples are relations, and here we see that they're amenable to relational processing. I can grok that.
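
To make the relational reading concrete, here is a minimal Python sketch of how a query like the one above could be evaluated as a chain of joins over a list of triples. It is not how any particular RDQL engine works. The predicate names and the final CSS URL come from the query above, but the sample triples and the intermediate node names (_:p1, _:r1) are assumptions invented for illustration, as are the helper names match and query.

```python
# Hypothetical data: only the predicates and the CSS URL come from the
# query in the text; the blank-node names (_:p1, _:r1) are invented.
PSTCN = "http://burningbird.net/postcon/elements/1.0/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ARTICLE = "http://burningbird/articles/monsters1.htm"

TRIPLES = [
    (ARTICLE, PSTCN + "presentation", "_:p1"),
    ("_:p1", PSTCN + "requires", "_:r1"),
    ("_:r1", PSTCN + "type", "stylesheet"),
    ("_:r1", RDF + "value", "http://burningbird.net/de.css"),
    (ARTICLE, PSTCN + "reason", "Collapsed into Burningbird"),
]


def match(pattern, triple, bindings):
    """Unify one pattern with one triple; return extended bindings or None."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must agree with any value it was bound to earlier.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(patterns, triples):
    """Apply the patterns left to right, like a series of self-joins."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in triples
            if (extended := match(pattern, triple, bindings)) is not None
        ]
    return solutions


# The four WHERE clauses of the query, with prefixes expanded.
PATTERNS = [
    ("?x", PSTCN + "presentation", "?resource"),
    ("?resource", PSTCN + "requires", "?resource2"),
    ("?resource2", PSTCN + "type", "stylesheet"),
    ("?resource2", RDF + "value", "?value"),
]

values = [solution["?value"] for solution in query(PATTERNS, TRIPLES)]
print(values)
```

Each pattern narrows the set of variable bindings produced by the previous one, which is the same thing a SQL engine does when it joins a table of triples to itself on shared columns.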

Now, back to this notion of the Content Virtual Machine. Commenting on my Plain Old Metadata proposal, which focused on the idea of putting job-board postings into RSS as structured payloads, Danny Ayers wrote:

Ok, so what happens when we need a vacation language? Right, build it all again from scratch, I'm sure those aggregator developers will welcome the opportunity to do virtually the same work all over again... [Danny Ayers]

This isn't just idle speculation. A very real situation looms for both RSS and Atom alike, as Ted Leung points out:

Jon Udell is writing about extending RSS 2.0, asking whether it should be done via namespaces or via RDF. Either way you do it, you've just entered the realm of extensible aggregators, because the jobs namespace is just the first of many that will come pouring through the gate once we open it. The question then becomes, how do you build an aggregator in such a way that we don't have download after download of new aggregator binaries, or aggregator extension/plugin binaries? [Ted Leung on the air]

Exactly. Now, what the RDF advocates appear to be saying is that if extensions show up as sets of RDF triples, then the problem is solved. An aggregator that can consume job-related triples already "knows what to do with" vacation-related triples.

I'm with Patrick Logan here: you can't finesse the symbol grounding problem so easily. When I write an RDF query involving job-related and vacation-related RDF triples, I'll need to know which predicates exist in these vocabularies, what they are documented to mean, and how to construe operations that combine them.
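
A small Python sketch may help separate the two halves of this argument. Everything in it is hypothetical: the jobs and vacation namespace URIs, the predicates, and the sample items are invented for illustration. The generic half (storing and indexing triples) works for any vocabulary. The grounded half (saying what a predicate means) has to be supplied by someone who has read each vocabulary's documentation.

```python
# Hypothetical vocabularies, invented for illustration only.
JOBS = "http://example.org/jobs/1.0/"
VACATION = "http://example.org/vacation/1.0/"

triples = [
    ("urn:item:1", JOBS + "location", "Boston"),
    ("urn:item:2", VACATION + "location", "Boston"),
]


def by_predicate(triples):
    """Generic part: index triples from any vocabulary by predicate."""
    index = {}
    for subject, predicate, obj in triples:
        index.setdefault(predicate, []).append((subject, obj))
    return index


# Grounded part: a human-supplied interpretation of each predicate.
# Nothing in the triples themselves provides this mapping.
MEANING = {
    JOBS + "location": "place of work",
    VACATION + "location": "travel destination",
}


def describe(triples):
    """Render each triple, falling back when a predicate is uninterpreted."""
    lines = []
    for subject, predicate, obj in triples:
        label = MEANING.get(predicate, "uninterpreted predicate " + predicate)
        lines.append(f"{subject}: {label} = {obj}")
    return lines


for line in describe(triples):
    print(line)
```

Both items carry the value "Boston" under a predicate named location, and the store handles them identically. Whether the two can sensibly be combined in one query is a question only the MEANING table, or its human author, can answer.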

I do absolutely see value in a common processing model, and I like the RDF style of triple-oriented querying. But I also like XPath-oriented querying, and I especially like the emerging styles of XQuery for cross-document joins in pure XML space, and SQL/XML for joins across relational and XML spaces.
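
For comparison, here is a minimal sketch of XPath-style querying over a namespaced payload, using Python's standard xml.etree.ElementTree module, which supports only a limited XPath subset. The feed, the jobs namespace URI, and its element names are all assumptions made up for this example; no real job-posting vocabulary is implied.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed carrying a structured payload in an
# invented "jobs" namespace.
FEED = """<rss version="2.0" xmlns:jobs="http://example.org/jobs/1.0/">
  <channel>
    <title>Example job board</title>
    <item>
      <title>XML developer</title>
      <jobs:posting>
        <jobs:location>Boston</jobs:location>
        <jobs:salary currency="USD">90000</jobs:salary>
      </jobs:posting>
    </item>
    <item>
      <title>An ordinary blog entry</title>
    </item>
  </channel>
</rss>"""

# Prefix-to-URI map used to resolve "jobs:" in the path expressions.
NS = {"jobs": "http://example.org/jobs/1.0/"}

root = ET.fromstring(FEED)

# Collect (title, location) for every item that carries a job posting;
# items without the payload are simply skipped.
matches = []
for item in root.findall("./channel/item"):
    location = item.find("./jobs:posting/jobs:location", NS)
    if location is not None:
        matches.append((item.findtext("title"), location.text))

print(matches)
```

The query machinery is generic, but the path expression still encodes knowledge of the jobs vocabulary, just as an RDF query encodes knowledge of its predicates.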

If the RDF folks have really solved the symbol grounding problem, I'm all ears. I'll never turn down a free lunch! If the claim is, more modestly, that RDF gives us a common processing model for content -- a Content Virtual Machine -- then I will assert a counter-claim. XML is a kind of Content Virtual Machine too, and XPath, XQuery, and SQL/XML are examples of unifying processing models. As we move into the realm of extensible aggregators we'll face the same old issues of platform support and code mobility. Nothing new there. However, as XQuery and SQL/XML move into the mainstream -- as is rapidly occurring -- aggregator developers are going to find themselves in possession of new data-management tools that can combine and query structured payloads. Those tools will not, because they cannot, know a priori what those payloads mean. But they'll provide leverage, and will simplify otherwise more complex chores. I can't see the endgame, but for me this is enough to justify doing the experiment.

Former URL: http://weblog.infoworld.com/udell/2003/08/11.html#a775