Dueling simplicities

Most of my writing goes straight to the web, but my column still takes a detour through the print magazine. Usually that's no problem. I'm not a news hound. And while I'm comfortable editing myself, I enjoy the thoughtful feedback I get from Neil McAllister, who, in addition to editing my column, writes his own. Every now and then, though, I wish I could bypass the print loop. Case in point: next week's column on two-way RSS. I wrote it last week; it will appear on InfoWorld.com tomorrow; magazine subscribers will get it after the holiday. In that column I discuss how both Microsoft and Google plan to use XML syndication for two-way data exchange and, more broadly, to bring database-like capabilities to the web of linked documents that we are all collectively building. If I'd blogged it last week, I'd look really prescient now.

Microsoft first signaled its intentions in my September interview with Bill Gates, who mentioned "some ideas internally...about making RSS work two-way." But no details were forthcoming.

Then Google telegraphed its punch. Adam Bosworth, writing in ACM Queue, elaborated on the database-of-the-future vision he's been evolving for a couple of years now. We can obviously read RSS and Atom feeds, and query them in simple ways, but how do we update and delete? Here's how:

Atom contains a simple HTTP-based way to INSERT, DELETE, and REPLACE <entry>s within a <feed>. [ACM Queue: Learning from the Web]

Note that Google Base doesn't work this way, at least not yet. For now, you bulk-load RSS or Atom items, and are expected to repeat the process "each time information in your bulk upload changes or becomes inaccurate." But Google seems to regard the Atom publishing protocol as a key strategy for managing sets of XML fragments which, in both RSS and Atom contexts, we can simply call items.

Yesterday, Microsoft spelled out its vision of two-way RSS: an extension to RSS (and OPML) for synchronizing flat or nested lists of items. "What we really longed for," wrote Ray Ozzie on his newly revived blog, "was 'the RSS of synchronization' ... something simple that would catch on very quickly."

Now that more of the cards are on the table, we can begin to compare two fascinatingly different approaches to building out the data web. Before we examine how they work, let's consider what kinds of data they manage and what problems they aim to solve. Google Base extends RSS and Atom with the following predefined "information types": Course schedules, Events, Jobs, Housing, News and articles, Wanted ads, Products, Vehicles, Personals, Research studies and Publications, Reviews, Services, Travel, and Business listings. It's open-ended, though. I defined a type called Bicycle Route, corresponding to del.icio.us/judell/bicycleroute, and inserted this instance into Google Base.

As with items posted to blogs or bookmarked in del.icio.us, every Google Base item is controlled by an individual account. Collaboration happens in the metadata layer where -- at least in theory -- Google disintermediates eBay and craigslist by empowering buyers and sellers to extend the core schemas in exploratory ways, secure in the knowledge that brute-force search will find whatever falls through the cracks.

With Microsoft's Simple Sharing Extensions (SSE), in contrast, collaboration involves primary data that may be jointly owned. For many people, calendar sharing is the poster-child example. Two years ago, on his old blog¹, Ray Ozzie wrote:

Each fall, as I manually enter the entire Celtics season schedule, my company's holidays and my childrens' school calendars into my own personal calendar, I am again reminded how ridiculous it is that The Net has not yet ubiquitously embraced the everyday exchange of virtual objects so basic as calendars and as vCards - which can also likewise be subscribed-to, aggregated into Contact Lists and auto-updated via personal RSS feeds. Bizarre. [www.ozzie.net/blog]

You couldn't pick a better Microsoft CTO to own this problem. Who else would tackle it using Creative Commons-licensed extensions to a grassroots XML standard?

So we have both Google and Microsoft flying the banner of simplicity -- a word² that can mean different things in different contexts. For Google, it means mapping the core database primitives (CRUD: Create, Retrieve, Update, Delete) onto the core HTTP verbs -- including the underutilized PUT and DELETE. As Tim Bray noted the other day:

6. If you want to update a post, you HTTP-PUT it in Atom format straight to its URI.

7. If you want to get rid of a post, you HTTP-DELETE it using its URI.

8. There is no Step 8. That's all there is to it.

This is pure unadulterated goodness.
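To make the verb mapping concrete, here is a minimal sketch in Python. It only builds the requests and never sends them. The function name crud_request and the example.org URIs are made up for illustration, and using POST for "create" is my assumption based on the Atom publishing protocol drafts; the steps quoted above only spell out PUT and DELETE.

```python
from urllib.request import Request

ATOM_TYPE = "application/atom+xml"

# Assumed mapping of CRUD onto HTTP verbs; POST-for-create is an assumption.
VERBS = {
    "create": "POST",
    "retrieve": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


def crud_request(action, uri, entry_xml=None):
    """Build (but do not send) the HTTP request for one CRUD action.

    `uri` is the collection URI for "create" and the entry's own URI
    for the other three actions. Both are hypothetical here.
    """
    method = VERBS[action]
    body = entry_xml.encode("utf-8") if entry_xml is not None else None
    headers = {"Content-Type": ATOM_TYPE} if body is not None else {}
    return Request(uri, data=body, headers=headers, method=method)


if __name__ == "__main__":
    entry = (
        "<entry xmlns='http://www.w3.org/2005/Atom'>"
        "<title>Bicycle route</title></entry>"
    )
    # Hypothetical entry URI; a real server would hand this back on create.
    update = crud_request("update", "http://example.org/feed/entries/1", entry)
    remove = crud_request("delete", "http://example.org/feed/entries/1")
    print(update.get_method(), update.full_url)
    print(remove.get_method(), remove.full_url)
```

The point of the sketch is how little there is to it: one lookup table, one request per item, and the item's URI does all the addressing.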

For Microsoft, SSE entails a different kind of goodness:

SSE defines the minimum extensions necessary to enable loosely cooperating applications to use RSS as the basis for item sharing -- that is, the bidirectional, asynchronous replication of new and changed items among two or more cross-subscribed feeds. [XML Developer Center: SSE FAQ]

And a different kind of simplicity:

RSS is compelling because of the power inherent in its simplicity.
...
This got me to thinking about simplicity. Notes had just about the simplest possible replication mechanism imaginable.
...
Notefiles replicate by using a very simple mechanism based on GUID assignment, with clocks and tie-breakers to detect and deterministically propagate modifications.
...
It's designed in such a way that the minimum implementation is incredibly easy, and so that higher-level capabilities such as conflict handling can be implemented in those applications that want to do such things. [Ray Ozzie: Really simple sharing]
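For a feel of what "GUIDs, clocks, and tie-breakers" might look like in practice, here is a deliberately tiny sketch in Python. It is not the SSE change-processing algorithm and not how Notes does it; the Item fields (version, modified, author) and the ordering rule are my own assumptions, chosen only to show how two replicas can reach the same answer without talking to a central server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    guid: str        # globally unique id, assigned once when the item is created
    version: int     # bumped on every local edit (a simple per-item clock)
    modified: float  # wall-clock time of the edit, used only to break ties
    author: str      # final tie-breaker so every replica picks the same winner
    body: str

    def rank(self):
        # Tuples compare left to right: version, then timestamp, then author.
        return (self.version, self.modified, self.author)


def merge(local, remote):
    """Return a new dict holding, for each GUID, the higher-ranked item.

    Assumes two different revisions never share the same rank; under that
    assumption the result does not depend on which side is 'local'.
    """
    merged = dict(local)
    for guid, item in remote.items():
        mine = merged.get(guid)
        if mine is None or item.rank() > mine.rank():
            merged[guid] = item
    return merged


if __name__ == "__main__":
    mine = {"g1": Item("g1", 2, 100.0, "alice", "Celtics vs. Knicks, 7pm")}
    theirs = {
        "g1": Item("g1", 2, 100.0, "bob", "Celtics vs. Knicks, 7:30pm"),
        "g2": Item("g2", 1, 90.0, "bob", "School holiday"),
    }
    for guid, item in sorted(merge(mine, theirs).items()):
        print(guid, item.author, item.body)
```

Note what the sketch leaves out: it silently discards the losing edit. Preserving and surfacing conflicts is exactly the kind of higher-level capability the quote above leaves to applications that want it.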

Of course things that look simple to the guy who created Notes and Groove present a bit more of a challenge to most of us. A whole bunch of folks can wrap their heads around the idea of invoking HTTP PUT on an XML fragment. Relatively few will ever implement the change processing algorithms spelled out in the SSE spec. Synchronization solves a harder problem than CRUD does, and it's necessarily more complex.

We're not comparing apples to apples here, though, and if it's apples and oranges then I want both. At first glance, Google's take on the RSS data web seems both simpler and more generally applicable than Microsoft's. Ah, but the plot thickens. Adam Bosworth probably regrets having shown me the Alchemy prototype in the summer of 2004. I'm like a dog with that bone, constantly gnawing. Alchemy was about pushing data intelligence into the browser. Its core components were a lightweight XML datastore and...wait for it...a simple synchronization engine.

This is going to be fun.

¹ It's too bad that all of my links to Ray Ozzie's pre-November-2005 writing now redirect blindly to the homepage of spaces.msn.com/members/rayozzie. If I hadn't cached that quote it would be lost, and a lot of other important stuff is. Update: Ray says that if you replace www.ozzie.net with spaces.msn.com/editorial/rayozzie/old -- like so -- any of the older URLs will resolve.

² This immortal sound clip is also featured in Robert Lefkowitz's The semasiology of open source (part 2), one of the most entertaining geek lectures that you are ever likely to hear.

Former URL: http://weblog.infoworld.com/udell/2005/11/22.html#a1343