The next wave of peer-to-peer

When we consider the exponential growth of storage, we often forget that our most essential data is textual and numeric. And that stuff tends to grow only linearly. For example, my 2005 e-mail archive tops 100 megabytes, but a big chunk of it is PowerPoint attachments people have sent me. Boiled down to their textual and numeric essence, they'd occupy a fraction of the space.

There's nothing new about in-memory databases. They come in many different flavors, all of which are still fairly exotic, but emerging technologies such as LINQ (language integrated query) promise to pull this approach into the mainstream. For our most vital and most volatile data, it's a strategy whose time has come. [Full story at InfoWorld.com]

I'm constantly using my new in-memory search engine. Just now, for example, it found me the link for LINQ. As I mention in this week's column, it reminds me of my 1997 group calendar, which used a Java Dictionary as a simple but effective in-memory database.
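
To make the dictionary-as-database idea concrete, here is a minimal sketch in Python rather than Java. It is not the searcher described above; the sample entries and the names `tokenize`, `build_index`, and `search` are hypothetical, chosen only to show how a plain in-memory mapping from words to entry ids can answer queries.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


# Hypothetical sample entries, standing in for blog items held in memory.
DOCS = {
    "a1": "LINQ promises to pull in-memory query into the mainstream",
    "a2": "A group calendar backed by an in-memory dictionary",
    "a3": "Peer-to-peer experiments with lightweight services",
}

INDEX = build_index(DOCS)
print(search(INDEX, "in-memory"))          # ['a1', 'a2']
print(search(INDEX, "memory dictionary"))  # ['a2']
```

Everything lives in one dictionary, so the same few lines run unchanged on a laptop or a server. That portability is most of the appeal.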

It also recalls my peer-to-peer experiments from that era. My in-memory searcher is a lightweight service that runs as easily on my PowerBook as on the server to which it's deployed. I run it locally when I'm developing it, and I can do the same whenever I happen to be offline.

During the last wave of peer-to-peer excitement, we imagined every client also being a server. That's still an interesting scenario, but this time around I think it'll expand to include peers that live in the cloud as well as on client machines.

For example, my blog works differently from the rest of the InfoWorld blogs. The content is well-formed XML, and it follows certain self-imposed rules. InfoWorld's search engine doesn't exploit those regularities, but my own search services do. Now suppose that ten other InfoWorld bloggers also follow their own private rules. So long as we all comply with a common format for search results, we're free to invent and exploit personal conventions.

Searching across the federation doesn't mean contacting everyone's personal machine; it means contacting everyone's uniquely configured search service. Agents might be a better term for such peer services. If I have my own value-added representation for search results, my agent can make that representation available to others.
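
One way to picture a common result format that still leaves room for personal conventions is sketched below. The field names, the `extras` dictionary, and the example.org URLs are all assumptions made for illustration; no such format has actually been agreed on.

```python
from dataclasses import dataclass, field


@dataclass
class SearchResult:
    # Fields every agent in the federation agrees to supply (assumed set).
    source: str
    title: str
    url: str
    snippet: str
    # Optional value-added data that only some agents know how to produce.
    extras: dict = field(default_factory=dict)


def to_common(result):
    """Reduce a result to the shared fields any consumer can rely on."""
    return {
        "source": result.source,
        "title": result.title,
        "url": result.url,
        "snippet": result.snippet,
    }


def merge(result_lists):
    """Flatten per-agent result lists into one list, dropping duplicate URLs."""
    seen = set()
    merged = []
    for results in result_lists:
        for r in results:
            if r.url not in seen:
                seen.add(r.url)
                merged.append(r)
    return merged


# Hypothetical output from two agents; the second adds its own extras.
agent_a = [SearchResult("blog-a", "On LINQ", "http://example.org/a/1", "LINQ and memory")]
agent_b = [
    SearchResult("blog-b", "P2P redux", "http://example.org/b/7", "peers in the cloud",
                 extras={"element": "blockquote"}),
    SearchResult("blog-b", "On LINQ", "http://example.org/a/1", "a link to blog-a"),
]

for r in merge([agent_a, agent_b]):
    print(to_common(r), r.extras)
```

A consumer that only understands the shared fields ignores `extras`, while one that knows a particular agent's conventions can put them to use.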

The client machine might also be a server, but perhaps more commonly it will be a smart intermediary. Suppose, for example, that each of the ten blogs has an in-memory searcher like mine. A multi-threaded consumer of those services could produce results pretty quickly, but where should that consumer reside? It's convenient to put it in the cloud. But if it has to wait until the last thread finishes, there's a potential bottleneck. Alternatively, it can live in the browser and progressively render results as they show up. This sort of progressive rendering is one of the most interesting aspects of Greasemonkey-style intermediation.
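
The difference between waiting for the last thread and rendering progressively can be sketched with Python's standard `concurrent.futures` module. A browser-based intermediary would be written in JavaScript, so treat this only as an illustration of the two strategies. The agent names and delays are invented, and `query_agent` merely sleeps where a real consumer would make a network call.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def query_agent(name, delay):
    """Stand-in for a network call to one blogger's search service."""
    time.sleep(delay)  # simulated latency; a real agent would be queried over HTTP
    return f"results from {name}"


def gather_all(agents):
    """Wait-for-everyone consumer: returns only after the slowest agent answers."""
    with ThreadPoolExecutor(max_workers=max(1, len(agents))) as pool:
        futures = [pool.submit(query_agent, n, d) for n, d in agents.items()]
        return [f.result() for f in futures]  # submission order


def gather_progressively(agents):
    """Progressive consumer: yields each answer as soon as it arrives."""
    with ThreadPoolExecutor(max_workers=max(1, len(agents))) as pool:
        futures = [pool.submit(query_agent, n, d) for n, d in agents.items()]
        for future in as_completed(futures):
            yield future.result()  # completion order


# Hypothetical agents mapped to simulated response times in seconds.
AGENTS = {"slow-blog": 0.5, "fast-blog": 0.01, "medium-blog": 0.2}

for line in gather_progressively(AGENTS):
    print(line)  # fast-blog's results appear first, slow-blog's last
```

Both functions take about as long as the slowest agent to finish. The progressive version simply gives the reader something to look at in the meantime.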

Former URL: http://weblog.infoworld.com/udell/2006/02/20.html#a1391