Tangled in the Threads

Jon Udell, October 4, 2000

From messaging to syndication

Why send messages? Send message metadata instead.

Email is an anachronism. We don't always need to send each other all the data. We only need to send notifications that refer to data.

Last week, Matt O'Donnell kicked off a couple of long threads with this question:

Are there systems that combine both a newsgroup (with web-forum) and a mailing list, so that users can read and write in all three mediums?

Matt's question goes to the heart of an issue that has long fascinated me. Today's Net is a hodge-podge of communications applications and protocols. What I consider to be the standard Internet client is, in fact, a suite of applications: the browser, the mailer, the newsreader. Each uses its own protocol, and its own kind of data store. Well, OK, mail and news are very close, but they're different enough to matter when you're building services that integrate both. The bottom line is that while it's possible to integrate these things, it's never easy, and the results are never entirely satisfactory.

The short answer to the question is that yes, there are systems that combine mail, news, and the web, and enable users to read and write in all three media. One such system is Macrobyte Resources' Conversant, which you can try online at the Free-Conversant site. At first glance, Conversant may remind you of UserLand's Manila -- a web-based content management system that's useful for calendar-driven weblogging. And indeed, the similarities run deeper. Like Manila, Conversant is built on the Frontier scripting platform. Like Manila, Conversant has an object database at its core, and in that database, every piece of content is first and foremost a message.

What I love about this approach is the inherent duality of every document. A page on a Conversant (or Manila) site is always, potentially, the starting point for a threaded discussion. In Conversant, such a discussion is tri-modal: it can happen on the web, in email, or -- a feature near and dear to my heart -- in your newsreader.

Other groupware systems are similarly multi-modal, for example WebCrossing. I applaud this multi-modal approach, and over the years I've built a few such systems myself. But I have to admit the whole idea troubles me somewhat. On the one hand, it's a good thing that the web, email, and news are different media with different strengths. The web's strengths are rich content and (almost) no local state. Email's best for offline use, and for most people it's the natural way to communicate. News handles the threading that email never got quite right. But there are two dilemmas here. For implementors of multi-modal systems, it's a major challenge to support three very different interfaces and protocols. For users of such systems, the problem is that no single interface unites all the best features.

Defending diversity

The diversity exists for a reason. These applications came from different places, in response to different kinds of requirements, as Bjørn Borud points out:

The reason for diversity among protocols is because it is hard to satisfy many demands at the same time.

It is far easier to design, implement and deploy a protocol that is specific to a single problem domain than it is to do the same with a more general protocol -- and do it well. The more general a protocol is, the more complex it will be and the more effort will go into providing implementations. If you have a simple protocol that is easier to implement, more people will use it.

I'll give you one example: the Internet. The Internet is _only_ possible because its core set of protocols consists of specific yet simple protocols. Things like SMTP, DNS, NNTP, FTP, HTTP.

Most "web-reinventions" are bad client applications with a remote interface. What surprises me is the willingness of the user to accept a considerable decrease in speed, reliability and functionality merely for a single aspect: potential mobility.

Defending unity

But many of us do, nonetheless, long for unification -- especially when it comes to the many different ways that we communicate. It's instructive to consider Lotus Notes from this perspective. Whatever you think of Notes (and I happen to admire it, despite my preference for native Internet tools), the model was very powerful: a single protocol, a single data format. Its equivalents of a web page, a discussion message, and an email are all the same kind of thing: records in a Notes database, which propagates (when it needs to) in a standard way, and is usable online or offline.

It's not a simple or straightforward proposition to say "let's just mush all this Internet stuff into a common protocol and data format." It may not be the right idea at all. But it's hard not to want the resulting unity. And it looks as though an XML-over-HTTP call/response protocol such as SOAP, coupled with XML-oriented logical (and maybe, where appropriate, physical) data representation, may get us there.
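The mechanics of such a call/response protocol are simple enough to sketch. Here is a minimal Python illustration of the XML-over-HTTP pattern; the envelope shape and the getMessage method are invented for this example, not real SOAP bindings. The point is only that request and response are both plain XML documents that any HTTP client and server can exchange.

```python
# A sketch of the XML-over-HTTP call/response idea. The <request>/<response>
# envelope and the "getMessage" method are hypothetical, chosen just to
# show the shape of the exchange.
import xml.etree.ElementTree as ET

def build_request(method: str, **params) -> bytes:
    """Serialize a call as an XML document, ready to POST over HTTP."""
    root = ET.Element("request", method=method)
    for name, value in params.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root)

def handle_request(body: bytes) -> bytes:
    """A toy server handler: parse the XML call, return an XML response."""
    req = ET.fromstring(body)
    if req.get("method") == "getMessage":
        msg_id = req.findtext("id")
        resp = ET.Element("response")
        ET.SubElement(resp, "subject").text = f"Message {msg_id}"
        return ET.tostring(resp)
    return ET.tostring(ET.Element("fault"))

# Simulate one round trip in-process; in practice the request bytes travel
# in an HTTP POST and the response comes back in the reply body.
reply = ET.fromstring(handle_request(build_request("getMessage", id=42)))
print(reply.findtext("subject"))  # Message 42
```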

Mark Wilcox:

You do realize that Microsoft is heading towards this, using a combination of WebDAV and SOAP?

Yes. And Microsoft is busily at work on another key piece of the puzzle: .NET. I've talked a lot lately about the sudden interest in peer-to-peer (P2P) networking. What the .NET runtime aims to deliver, onto every PC, is technology that will, among other things, peer-enable the client. It will help to blur the distinction between local services and data, and remote services and data. To see why this matters, consider Derek Robinson's take on a universal messaging interface:

I've been musing about collaborative cumulative web pages -- sortable by date, thread or sender, either on the client or server-side -- with in-situ editing embedded directly in the HTML (as it is at Standard Brains). Visitors could simply start writing in a special DIV which has been made editable by attaching the appropriate event handlers, with the rest of the page off-limits to editing by visitors.

The sort of web page I'm imagining would operate somewhere between a home page, web-logging, mail, and discussion lists/newsgroups. There could be a 'log-on' with passwords (done automatically by cookies) that would tailor the page content and behavior, as displayed in visitors' browsers, to only those things they are registered for: e.g., people belonging to one group may see the entire page, family members could see and leave private notes for one another, while people subscribed to different discussion groups would get only the messages of interest to them. RSS syndication could automatically interleave content from many such personal 'omnibus' pages, asynchronously updated as individual pages within the 'web-ring' are changed.

Thus, your home page would be a personal portal, aggregating material from other such 'personal portals' in your 'personal internet' of friends, family, colleagues. For example, the function now served by email could be served by 'reciprocal syndication' with correspondents -- to send an email message to someone, you'd compose the (rich content!) message in your home page, hit 'send', whereupon a URL would be sent alerting them that there is a new message waiting for them (for their eyes only!) on your home page.
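Derek's reciprocal-syndication idea can be made concrete in a few lines: the message body stays put on the sender's page, and only a small XML notice travels. The <notice> element names below are invented for this sketch, not any real syndication format.

```python
# A sketch of "reciprocal syndication": instead of mailing the message
# body, the sender publishes it on their own page and sends a lightweight
# XML notice carrying only metadata and a URL. Element names are
# illustrative, not a standard.
import xml.etree.ElementTree as ET

def make_notice(sender: str, subject: str, url: str) -> bytes:
    """Build the notification that travels in place of the message."""
    notice = ET.Element("notice")
    ET.SubElement(notice, "from").text = sender
    ET.SubElement(notice, "subject").text = subject
    ET.SubElement(notice, "link").text = url  # address of the canonical copy
    return ET.tostring(notice)

wire = make_notice("derek@example.org",
                   "Photos from the weekend",
                   "http://derek.example.org/messages/2000-10-03")
# The recipient follows the link to read the message in its home context.
print(ET.fromstring(wire).findtext("link"))
```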

Dumb replication and smart replication

This sounds great. As described here, though, it's a web-server-centric solution, and that would be a problem for lots of people. The wireless revolution notwithstanding, I don't think we can put the disconnected client onto the endangered species list anytime soon. Data replication to and from intermittently-connected devices is one of the main reasons we depend on email. So, as Derek elsewhere acknowledges, that personal home page ought to be mobile. P2P infrastructure ought to blur the distinction between the page here on my PC, and the page out there on the server.

It's crucial to note the distinction between the kind of data replication that email does, and the kind I'm talking about here. Derek's proposal raises a profound question: "Why does email need to travel through the network?" Why don't we just transmit metadata (e.g., message headers) alerting one another to the availability and nature of corresponding data (messages)? The web, of course, works just this way. The Usenet doesn't, but arguably should.

Its architecture, if you stop to think about it, is an anachronism. Circa 1985 there were relatively few full-time Internet nodes. A store-and-forward technology, UUCP, enabled intermittently-connected nodes to access the Internet. The first incarnation of the Usenet was therefore, of necessity, a discussion system based on data replication. By the time the web emerged, it was no longer necessary to rely on replication as the way to move data around. The world was sufficiently interconnected so that metadata (hyperlinks) could refer to canonical (singular, non-replicated) data. Well, to be fair, caching servers that mirror parts of the web are part of the story too, and always have been.

But here's the key point: the web may opt to replicate data, to maximize convenience of access, but is not required to replicate data in order for people to have use of it. I've argued elsewhere that the Usenet ought to catch up with the web in this regard. The Usenet is drastically over-replicated. It ought to be refactored. Instead of many copies of shallow wells of information, it ought to reformulate itself as fewer copies of deeper and richer wells.

Likewise email. It no longer makes sense, in many cases, to transport actual message bodies. Derek puts it very nicely: "The function now served by email could be served by 'reciprocal syndication' with correspondents." What we really need to exchange, in many cases, is only the message metadata. Like RSS headline syndication, this syndication of message metadata would preserve context. In the case of RSS, a headline refers to a document on a website, and that document lives in a context. It's surrounded by similar documents, by navigation and search tools that are (one hopes) optimized for these documents.

Email sorely lacks this context-preserving property. Bits of correspondence and documents end up scattered all over the place. There is no transcript of an email conversation, no reliable thread structure, no single logical container that holds it all together, no canonical set of messages and documents, no tools specialized to work with them. If messages are stored centrally, and only metadata about messages is distributed, then message stores become vastly more useful. One of the key benefits of news- or web-based discussions, and the reason I advocate this mode of communication so fiercely, is precisely this centralization. Collaboration is fluid, and if I'm pulled into an email conversation midstream, I should be able to jump into a complete context and bring myself up to speed.
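The model argued for here -- canonical messages in one store, metadata fanned out to participants -- can be sketched in a few lines. The field names and the in-memory dict standing in for the message store are assumptions of this illustration, not a protocol.

```python
# A sketch of centralized messages plus distributed metadata: messages
# live in one canonical store, keyed by thread, and what travels to
# subscribers is header metadata plus a reference. Joining midstream
# means fetching the whole thread by its id.
store = {}   # thread_id -> list of messages (the canonical copies)

def post(thread_id, sender, subject, body):
    msg = {"from": sender, "subject": subject, "body": body}
    store.setdefault(thread_id, []).append(msg)
    # what travels to subscribers: metadata only, never the body
    return {"thread": thread_id, "from": sender, "subject": subject}

def catch_up(thread_id):
    """Pull the complete context for someone joining midstream."""
    return store.get(thread_id, [])

notice = post("t1", "matt", "Tri-modal systems?", "Are there systems that...")
post("t1", "jon", "Re: Tri-modal systems?", "Yes -- Conversant, for one.")
print(notice["subject"], "->", len(catch_up("t1")), "messages in thread")
```

Note that the notice carries no body at all: the conversation's one complete transcript stays in the store, which is exactly the context-preserving property the column says email lacks.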

Ultimately, of course, it's not centralization per se that we crave. It's availability, completeness, and coherence. If my messages and documents live out there in the cloud, and so do yours, and we notify one another by reciprocal syndication, life's great until the network fails or we disconnect. Centralization may buy us completeness and coherence, but availability is an all-or-none proposition. As we use email today, our messages and documents are distributed. Some stuff lives out in the cloud, some stuff lives down on our PCs, it's a mess. In this case decentralization buys us availability, but we sacrifice completeness and coherence.

Can we have the best of both worlds? Yes. If Derek's machine can synchronize with peers, then his home-page/mail-server/discussion-list can live out there in the cloud and down on his PC. It's always backed up, it's always as complete and coherent as the latest synchronization, and it's always available to everybody online, and to Derek offline. If the cloud-based version fails but Derek's online, people might even be able to access his local version directly.

In this scenario there is still mobility of data, but it's a different kind of mobility. Messaging does not involve wholesale replication of data to all recipients. It's just lightweight notification, with references. Those references point to coherent clusters of data. If those clusters replicate, they do it in a smart way. They only go to the few strategic locations where they need to go.
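The "smart replication" between Derek's PC and the cloud can be sketched as a simple two-copy synchronization. The timestamps and the newest-version-wins rule below stand in for whatever change-tracking a real replicator would use; they're assumptions of this sketch, not a design.

```python
# A sketch of two-copy synchronization: the page lives both on the PC and
# in the cloud, and a sync pass makes each copy as complete as the newer
# of the two. Integer timestamps stand in for real change-tracking.
def synchronize(local, remote):
    """Merge two copies of a keyed store; the newer version of each entry wins."""
    for key in set(local) | set(remote):
        a, b = local.get(key), remote.get(key)
        newest = max((v for v in (a, b) if v is not None),
                     key=lambda v: v["ts"])
        local[key] = remote[key] = newest

pc    = {"home": {"ts": 1, "text": "old draft"}}
cloud = {"home": {"ts": 2, "text": "edited online"},
         "notes": {"ts": 1, "text": "posted while the PC was offline"}}

synchronize(pc, cloud)
print(pc["home"]["text"], "/", sorted(pc))  # edited online / ['home', 'notes']
```

After the pass, both copies hold both entries, and each is as current as the latest synchronization -- available to everybody online via the cloud copy, and to its owner offline via the local one.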


Jon Udell (http://udell.roninhouse.com/) was BYTE Magazine's executive editor for new media, the architect of the original www.byte.com, and author of BYTE's Web Project column. He's now an independent Web/Internet consultant, and is the author of Practical Internet Groupware, from O'Reilly and Associates. His recent BYTE.com columns are archived at http://www.byte.com/index/threads

This work is licensed under a Creative Commons License.