Tangled in the Threads
Jon Udell, September 13, 2000
Dimensions of peer computing
The peer model itself has nothing to say about key issues such as trust, identity, security, and scalability. It all depends on what you're trying to accomplish.
A couple of weeks ago, at the Intel Developer Forum, there was a lively and well-attended panel discussion on peer computing. The panel, which included David Anderson (SETI@Home), Ian Clarke (FreeNet), Andrew Grimshaw (Applied MetaComputing), and Ray Ozzie (Groove Networks), was asked to address the issue of security in the peer-to-peer (P2P) model. Here are (paraphrases of) some of the responses:
Ray Ozzie: For enterprise IT, a hybrid approach is better than a pure peer model. You want to centrally determine a policy for your network of peers, and then distribute that policy to the individual desktops.
Ian Clarke: In FreeNet we assume that any node could be hostile, written specifically to damage the network, and we design the network with that assumption.
David Anderson: There's also the problem of knowing, when you get back the result of a computation, that it was your unmodified program that computed it.
At this point, Ray Ozzie (best known as the creator of Lotus Notes) added:
Both Ian's and David's responses reflect the security view of protecting the network against a rogue client. My reply was the inverse of that: protecting the client from the others on the network. Both are interesting viewpoints, depending on whether you are using peer networking to serve the individual, or using the peer network as a resource.
Exactly. There is, of course, nothing new about peer computing. It's fundamental to the architecture of the Internet, and it's something we've been doing routinely on LANs for many years. It ended up being Napster that alerted the world to the possibilities of peer networking at Internet scale, but of course Napster's model isn't purely P2P; it relies on central coordination. From that perspective, the killer P2P application was and still is email. It too relies on central coordination, although significantly, its "central" servers actually form a peer network amongst themselves. But to users, email just looks like a way to form ad-hoc peer-to-peer communication channels. There are many such examples. Another is eBay, which empowers users to form the kinds of peer networks that we call markets.
So when you hear someone say that security (or scalability, or trust, or anonymity) is fundamental to the peer model, you should throw down the flag and ask "For which flavor of peer-enabled application?" Today, we can identify at least these four major types:
- data distribution
- code distribution
There are many implementations of each of these types of applications. Each implementation proceeds from its own set of requirements. Those requirements dictate where each implementation falls along multiple continua. Here are some of the paired endpoints that characterize these continua:
- A fully decentralized architecture versus a hybrid architecture involving central coordination of decentralized peers.
- An architecture that demands (or encourages) strong proof of identity versus one that demands (or encourages) concealment of identity.
- An architecture that must scale essentially without limit, versus one that can happily tolerate limits.
- An architecture that deals primarily with data, versus one that deals primarily with metadata.
From requirements to architectures
For any P2P application, you can have an interesting discussion about where it falls along any of these four continua, and why, and where it might go next. Take Napster for example. It falls somewhere in the middle of the identity continuum. You usually don't know the real-world identity of a Napster peer, but neither is that person fully anonymous. You know a lot about the person by the evidence of his or her musical tastes. That's the essence of Napster's collaborative function -- the person's collection alerts you to songs you didn't know to search for.
Recently, some anti-Napster individuals have set out to undermine the integrity of Napster. They do this by sharing files that purport to be popular songs, but that are booby-trapped with audio segments that ruin the enjoyment of the songs. Napster's technology of superdistribution is so aggressive that these damaged songs can spread rapidly and widely, thus compromising the quality of the entire Napster service.
One response to this threat might be a new requirement that nudges Napster along the identity continuum, closer to the end marked "strong proof of identity." Like eBay, Napster might benefit from a reputation system that helps users decide whether or not to transact with other users.
Towards which end of the centralization continuum might this new requirement move Napster? Arguably either. On the one hand, an eBay-like model would suggest that the reputation system might be centrally managed. But you could equally well argue in favor of a decentralized, PGP-like model with many different webs of trust forming, in a more peer-like way, among like-minded groups of Napster users. Napster's community model is, after all, not very highly evolved. Within music culture, there are many small and focused communities. In real life, such people often tend to know one another. To the extent that a Napster-like service explicitly supported the formation of communities of opera lovers or rockabilly fans, it could leverage -- and extend -- existing real-world trust relationships within those communities.
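The decentralized, PGP-like alternative can be sketched concretely. In the hypothetical sketch below (all names, ratings, and the hop-discount factor are invented for illustration), each peer rates the peers it has transacted with, and trust in a stranger is estimated by following chains of endorsements outward, discounted at each hop -- the essence of a web of trust. A real system would also sign and verify the endorsements themselves.

```python
# Hypothetical sketch of a decentralized, PGP-style web of trust for a
# music-sharing community. Ratings, names, and the discount factor are
# invented; a real system would cryptographically sign endorsements.

def trust_score(graph, me, target, discount=0.5, max_hops=3):
    """Estimate trust in `target` by following endorsement edges
    outward from `me`, discounting each additional hop."""
    best = 0.0
    frontier = [(me, 1.0, 0)]
    seen = {me}
    while frontier:
        node, weight, hops = frontier.pop()
        for friend, rating in graph.get(node, {}).items():
            score = weight * rating
            if friend == target:
                best = max(best, score)
            elif friend not in seen and hops + 1 < max_hops:
                seen.add(friend)
                frontier.append((friend, score * discount, hops + 1))
    return best

# Each peer rates the peers it has transacted with (0.0 - 1.0).
web = {
    "alice": {"bob": 0.9},
    "bob":   {"carol": 0.8},
}

print(trust_score(web, "alice", "bob"))    # direct endorsement
print(trust_score(web, "alice", "carol"))  # indirect trust, via bob, discounted
```

Because trust decays with distance, such webs naturally stay scoped to small, like-minded groups -- which fits the genre-based communities described above.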
Suppose Napster went in this latter direction -- partitioning the music-sharing community into smaller, genre-based communities. Where would it go on the scalability continuum? If there were multiple genre-based servers, the burden on each, in terms of quantity of storage, numbers of connections, and so on, would lessen. Ideally -- and this would be a beautiful way to exploit the peer model -- any Napster client could be a community server, not just a song server. I could host my own little club, or several of them, each devoted to some very specific musical interest. Scalability doesn't matter much, maybe hardly at all, when you scope out a very small group of peers in this way.
Let's push the thought experiment one step further. Where does this hypothetical partitioned Napster end up on the data vs metadata continuum? Clearly, metadata becomes a lot more important. Today, Napster deals only with titles, artists, and users. A partitioned service implies that songs are tagged by genre, and that this dimension of metadata is controlled somewhat more carefully than Napster's current metadata is. Here we encounter the always vexing question of how to categorize data using metadata, and how to manage repositories of metadata, or directories. Which in turn leads us back to the centralization continuum. On MP3.com, for example, a genre is a directory managed by a central service. If I start my own peer-enabled music club, in the context of a genre-partitioned Napster, where would I list it? At a sufficiently small scale of operation, I don't need to. I just tell my friends about the service. But if I decide to publicize more widely, I'll want to appear in a directory. Separately, I may also want to be a directory. Now we're moving up the stack, from metadata that is information about songs, to metadata that is information about different kinds of music-sharing services.
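The "moving up the stack" idea can be made concrete with a small sketch. Here, a directory holds listings tagged with metadata, and the same structure serves at both levels: a community server lists songs, while a higher-level directory lists the community servers themselves. All names and tags are invented for illustration.

```python
# Hypothetical sketch of the metadata stack: the same directory
# structure describes songs at one level and music-sharing services
# at the next level up. Names, kinds, and tags are invented.

class Listing:
    def __init__(self, name, kind, tags):
        self.name, self.kind, self.tags = name, kind, set(tags)

class Directory:
    def __init__(self, name):
        self.name = name
        self.listings = []

    def register(self, listing):
        self.listings.append(listing)

    def find(self, tag):
        return [l.name for l in self.listings if tag in l.tags]

# Song-level metadata: a small genre-based community server.
rockabilly = Directory("rockabilly-club")
rockabilly.register(Listing("Mystery Train", "song", ["rockabilly"]))

# Service-level metadata: a directory of music-sharing services.
root = Directory("music-services")
root.register(Listing("rockabilly-club", "community-server", ["rockabilly"]))
root.register(Listing("opera-circle", "community-server", ["opera"]))

print(root.find("rockabilly"))  # which services carry this genre?
```

The design choice worth noting is that nothing distinguishes a directory of songs from a directory of services except what its listings describe -- which is exactly why a Napster client could double as a community server.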
"The P in P2P is people"
Dave Winer says it beautifully in his newsletter: "The P in P2P is people." Real life is a peer-to-peer environment. People can deal directly with one another when they want to. Computers, which carry our interactions into cyberspace, must ensure that when we're there we can continue to deal directly with one another as we do in the real world. Although networks intrinsically support the peer model, the recent evolution of the Internet has raised artificial barriers. As Clay Shirky has pointed out (see, for example, this message in the eGroups decentralization forum), the preponderance of new Internet users have appeared at dynamic IP addresses, and/or behind NATs or firewalls or other proxies that block unmediated P2P connections.
For all P2P applications, the first order of business is to restore the Internet's lost P2P capability. This, as Shirky notes, was Napster's great conceptual leap. (Though, if you buy my email-as-P2P argument, then email got there first.) How to assure the possibility of peer communication, on an Internet dominated by dynamic IP addressing, firewalls, and NATs, is going to be a fruitful area for standardization.
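Napster's conceptual leap can be sketched in a few lines: a rendezvous service lets a peer with a dynamic IP address register its current location under a stable name, so other peers can find it and then connect directly. This is a minimal sketch under invented names and addresses; a real system would also handle NAT traversal, authentication, and keep-alives.

```python
# Minimal sketch of a rendezvous registry: peers behind dynamic IP
# addresses register a stable id -> current address mapping, so other
# peers can locate them. All ids and addresses here are invented.

import time

class Rendezvous:
    def __init__(self, ttl=300):
        self.ttl = ttl      # seconds before a registration goes stale
        self.peers = {}     # stable peer id -> (address, timestamp)

    def register(self, peer_id, address):
        self.peers[peer_id] = (address, time.time())

    def lookup(self, peer_id):
        entry = self.peers.get(peer_id)
        if entry is None:
            return None
        address, stamp = entry
        if time.time() - stamp > self.ttl:
            del self.peers[peer_id]   # address presumed stale
            return None
        return address

hub = Rendezvous()
hub.register("alice", "203.0.113.7:6699")  # today's dynamic address
print(hub.lookup("alice"))                 # found: connect directly
print(hub.lookup("bob"))                   # unknown peer
```

Note that the registry only brokers introductions; the actual file transfer happens peer to peer -- which is why Napster is a hybrid, not a pure, P2P architecture.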
Beyond that, I'm not sure that P2P, in and of itself, mandates anything in particular about security, or scalability, or identity, or trust. These issues arise in the context of particular applications. Where P2P applications can borrow from existing technologies that address these issues, they should, and they will. What will be especially exciting are the new, yet-to-be-conceived P2P applications -- which will lead us to find novel uses for familiar technologies, as well as invent new ones. The richness of human experience is reflected in cyberspace, so far, only to the palest degree. As applications more deeply attuned to human experience come online, they'll be driven by requirements which in turn govern architectures. What are those requirements? We only need to look to ourselves. In our everyday lives, we embody and enact the protocols that P2P applications transplant into cyberspace.
Jon Udell (http://udell.roninhouse.com/) was BYTE Magazine's executive editor for new media, the architect of the original www.byte.com, and author of BYTE's Web Project column. He's now an independent Web/Internet consultant, and is the author of Practical Internet Groupware, from O'Reilly and Associates. His recent BYTE.com columns are archived at http://www.byte.com/index/threads
This work is licensed under a Creative Commons License.