Over the next few days I want to explore a series of questions about the "pillars" of Longhorn -- WinFS, Avalon, and Indigo. Last fall, when this stuff was first announced, I reacted with an entry entitled Replace and Defend. I argued then that Longhorn reinvents quite a few wheels. Nobody can blame Microsoft for seeking new ways to keep customers locked into its Windows franchise. That's a business strategy that every rational player must pursue, in one way or another. In chapter 6 of Information Rules, entitled Managing Lock-In, Carl Shapiro and Hal Varian write:
The great fortunes of the information age lie in the hands of companies that have successfully established proprietary architectures that are used by a large installed base of locked-in customers. And many of the biggest headaches of the information age are visited upon companies that are locked into information systems that are inferior, orphaned, or monopolistically supplied.
There's no question that Longhorn aims for lock-in -- it has to. But what is the nature of the bargain that's being offered? What kinds of benefits will it yield? And what kinds of headaches will accompany those benefits?
With respect to WinFS, Longhorn's new storage system -- an object/relational engine that also doubles as a conventional file system -- the claimed benefits are:
Finding stuff. Those of us who sometimes blog things just so we'll be assured of finding them later have a special appreciation of the absurdity of the current situation. Unless we use an add-on to Windows such as X1, we can often find things on the Internet more easily and more reliably than we can find things on our own hard disks.
Organizing stuff. We know that hierarchical foldering systems adapt poorly to the chaos of real life. Unix has always supported the concept of symbolic links, which give you the flexibility to construct alternate paths to the same thing. And indeed, modern versions of Windows do too. A little-known fact is that Junction, yet another wonderful utility from the indefatigable Mark Russinovich, enables you to create and delete symbolic links on Win2K or WinXP. But symlinking isn't something any normal user would be able to do routinely, and in any case it doesn't really solve the essence of the organizational problem, which is that we want to be able to group items dynamically based on the contents of individual items, and also -- crucially -- on relationships that tie sets of items together.
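The organizational problem described above, grouping items by their contents and by the relationships among them rather than by where they sit in a folder tree, can be sketched in a few lines of Python. The Item class, its fields, and the sample paths are hypothetical and chosen only for illustration. This is not a model of how WinFS or any other real store represents items.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    """A hypothetical stored item: where it lives, plus properties and links."""
    path: str
    props: dict = field(default_factory=dict)
    related: set = field(default_factory=set)  # paths of items this one points to

def group_by(items, key):
    """Group item paths by the value of one property, regardless of folder."""
    groups = {}
    for item in items:
        value = item.props.get(key)
        if value is not None:
            groups.setdefault(value, []).append(item.path)
    return groups

def related_to(items, path):
    """Paths of items linked to `path`, following links in either direction."""
    by_path = {i.path: i for i in items}
    linked = set(by_path[path].related)
    linked.update(i.path for i in items if path in i.related)
    linked.discard(path)
    return sorted(linked)

# Hypothetical sample data: two vacation photos and a work file that refers to one
items = [
    Item("/Family/Vacation/Photos/beach.jpg", {"event": "Vacation 2003"}),
    Item("/Family/Vacation/Photos/dunes.jpg", {"event": "Vacation 2003"}),
    Item("/Work/Sales/flyer.doc", {"project": "Spring sale"},
         {"/Family/Vacation/Photos/beach.jpg"}),
]

print(group_by(items, "event"))
print(related_to(items, "/Family/Vacation/Photos/beach.jpg"))
```

The groups are computed on demand from properties and links, so the same photo can show up under an event, a project, or a related document without being copied or symlinked.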
Nobody would turn down these benefits. The way in which Microsoft proposes to deliver them, though, contains some assumptions that I'd like to start unpacking. Let's start with the first benefit: finding stuff. Here's an example of a Longhorn search scenario:
For example, a user may want to use some pictures taken on a family vacation on her business Web site to promote a sale. She can tag these pictures already stored in a "\Family\Vacation\Photos" folder with a "Promote Sale" keyword when the sale begins. The application managing her Web site can then load all the pictures of this category and have them displayed as a slide show. When the sale ends, she can remove the tag from the pictures in a "WinFS" store. The website will stop showing them to the site visitors afterwards. [Longhorn SDK Documentation]
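Stripped of the storage details, the scenario in that passage comes down to three operations: attach a tag, query by tag, and remove the tag. Here is a minimal in-memory Python sketch of that lifecycle. The TagStore class and its method names are hypothetical. They are not the WinFS API, which the SDK documentation describes in its own terms.

```python
from collections import defaultdict

class TagStore:
    """Hypothetical in-memory store mapping keywords to item paths."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def tag(self, path, keyword):
        self._by_tag[keyword].add(path)

    def untag(self, path, keyword):
        self._by_tag[keyword].discard(path)

    def find(self, keyword):
        # Sorted so a slide show built from the result has a stable order
        return sorted(self._by_tag.get(keyword, ()))

store = TagStore()
photos = [r"\Family\Vacation\Photos\beach.jpg", r"\Family\Vacation\Photos\dunes.jpg"]

for p in photos:                      # the sale begins: tag the photos
    store.tag(p, "Promote Sale")
slide_show = store.find("Promote Sale")

for p in photos:                      # the sale ends: remove the tag
    store.untag(p, "Promote Sale")
after_sale = store.find("Promote Sale")
```

The photos never move out of their folder; only the keyword association changes, which is the whole point of the scenario.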
There's no need to wait until 2007 to see what this would be like. Just now, for example, I opened up Word 2003, wrote a short document, assigned it the keyword "Promote Sale," and saved it as XML. Here's a script to insert the document into a Berkeley DB XML database:
```python
from dbxml import *

db = 'winfs.dbxml'
container = XmlContainer(None, db)
container.open(None, DB_CREATE)        # create the container if it doesn't exist

doc = XmlDocument()
item = open('myDocument.xml').read()   # the Word 2003 document saved as XML
doc.setContent(item)
container.putDocument(None, doc)
container.close()
```
And here's a script that finds that document in the database, based on the keyword:
```python
from dbxml import *

db = 'winfs.dbxml'
container = XmlContainer(None, db)
container.open(None)

# Bind the "o" prefix to the Office namespace used in Word 2003 XML
context = XmlQueryContext(0, 0)
context.setNamespace('o', 'urn:schemas-microsoft-com:office:office')

# Query for Keywords elements that mention "Promote Sale"
xmlResults = container.queryWithXPath(None, "//o:Keywords[contains(.,'Promote Sale')]", context)
```
A growing number of applications -- notably, Microsoft's own latest generation of Office apps -- can store XML data in ways amenable to XPath search. The same XML data will be open to the more powerful kinds of search available in the newer XML technologies now coming online: XPath 2.0 and XQuery. Meanwhile, a growing number of databases are gearing up to do this kind of search efficiently, often in combination with both relational and free-text querying.
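The keyword search itself doesn't strictly require a database. As a rough sketch, assuming a simplified Word 2003 XML document in which the keyword lives in an o:Keywords element, Python's standard xml.etree.ElementTree can find it. The sample document below is trimmed by hand and is not real Word output, and the helper names are hypothetical. ElementTree supports only a subset of XPath with no contains() function, so the substring test happens in Python.

```python
import xml.etree.ElementTree as ET

# Namespace for Office document properties, as used in the query above
NS = {"o": "urn:schemas-microsoft-com:office:office"}

# Hand-trimmed sample; real Word 2003 XML carries much more markup
SAMPLE = """<?xml version="1.0"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
                xmlns:o="urn:schemas-microsoft-com:office:office">
  <o:DocumentProperties>
    <o:Keywords>Promote Sale</o:Keywords>
  </o:DocumentProperties>
</w:wordDocument>"""

def has_keyword(xml_text, keyword):
    """True if any o:Keywords element in the document contains `keyword`."""
    root = ET.fromstring(xml_text)
    return any(keyword in (el.text or "") for el in root.findall(".//o:Keywords", NS))

def matching_documents(docs, keyword):
    """Names of documents (a name -> XML text mapping) whose keywords match."""
    return sorted(name for name, text in docs.items() if has_keyword(text, keyword))

docs = {
    "myDocument.xml": SAMPLE,
    "other.xml": SAMPLE.replace("Promote Sale", "Budget"),
}
print(matching_documents(docs, "Promote Sale"))
```

Scanning files this way is fine for a handful of documents; the indexing that XML-aware databases add is what makes the same query fast across thousands.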
The power of pervasive free-text search, by the way, is something that Microsoft seems consistently to underestimate. Outlook, even in its latest incarnation, is helpless to find anything quickly. Everybody has to rely on third-party add-ons for this essential function. There's a hole in the market that you could drive a truck through, and the name on the side of that truck is Gmail, but I digress.
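The standard technique behind fast free-text search is an inverted index: map each word to the set of items that contain it, so that a query intersects a few small sets instead of rescanning every message. Here is a toy Python sketch with a deliberately naive tokenizer and made-up message IDs. It illustrates the general idea only and makes no claim about how Outlook, Gmail, or any particular add-on works internally.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Naive tokenizer: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """IDs of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

# Hypothetical messages
messages = {
    "msg1": "Lunch on Friday?",
    "msg2": "The sale starts Friday",
    "msg3": "Promote the sale on the site",
}
index = build_index(messages)
print(search(index, "sale friday"))
```

The index is built once and updated as messages arrive; each query then touches only the postings for its own words.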
Here's the point of this installment. To the extent that our personal information stores contain information represented in XML, we have standard ways to search them. What's more, two powerful trends point to a brighter future for this scenario: the growing use of open XML file formats, and the steady advance of databases that can index and search XML content. WinFS embraces neither trend, and that looks to me like a looming headache. Personal information management, in Longhorn, will be a walled garden with its own notion of schema, and its own query language. To give users the benefit of finding stuff, Longhorn-style, developers will have to implement the Longhorn model. And then they'll have to find ways to unify that approach with the XML-oriented model prevailing in the world at large -- and indeed, even on pre-Longhorn Windows systems.
The justification for this headache, if there is one, must lie not in the realm of "finding stuff" but in the realm of "organizing stuff." WinFS relationships, in other words, must be capable of delivering such compelling benefits that there was no choice but to invent a proprietary storage model from the ground up. I'll explore that proposition next time.
Former URL: http://weblog.infoworld.com/udell/2004/06/02.html#a1012