Questions about Longhorn, part 1: WinFS

Over the next few days I want to explore a series of questions about the "pillars" of Longhorn -- WinFS, Avalon, and Indigo. Last fall, when this stuff was first announced, I reacted with an entry entitled Replace and Defend. I argued then that Longhorn reinvents quite a few wheels. Nobody can blame Microsoft for seeking new ways to keep customers locked into its Windows franchise. That's a business strategy that every rational player must pursue, in one way or another. In chapter 6 of Information Rules, entitled Managing Lock-In, Carl Shapiro and Hal Varian write:

The great fortunes of the information age lie in the hands of companies that have successfully established proprietary architectures that are used by a large installed base of locked-in customers. And many of the biggest headaches of the information age are visited upon companies that are locked into information systems that are inferior, orphaned, or monopolistically supplied.
There's no question that Longhorn aims for lock-in -- it has to. But what is the nature of the bargain that's being offered? What kinds of benefits will it yield? And what kinds of headaches will accompany those benefits?

With respect to WinFS, Longhorn's new storage system -- an object/relational engine that also doubles as a conventional file system -- the claimed benefits are:

Nobody wouldn't want these benefits. The way in which Microsoft proposes to deliver them, though, contains some assumptions that I'd like to start unpacking. Let's start with the first benefit: finding stuff. Here's an example of a Longhorn search scenario:

For example, a user may want to use some pictures taken on a family vacation on her business Web site to promote a sale. She can tag these pictures already stored in a "\Family\Vacation\Photos" folder with a "Promote Sale" keyword when the sale begins. The application managing her Web site can then load all the pictures of this category and have them displayed as a slide show. When the sale ends, she can remove the tag from the pictures in a "WinFS" store. The website will stop showing them to the site visitors afterwards. [Longhorn SDK Documentation]

There's no need to wait until 2007 to see what this would be like. Just now, for example, I opened up Word 2003, wrote a short document, assigned it the keyword "Promote Sale," and saved it as XML. Here's a script to insert the document into a Berkeley DB XML database:

from dbxml import *
db = 'winfs.dbxml'
container = XmlContainer(None, db)
container.open(None,DB_CREATE)
doc = XmlDocument()
item = open ('myDocument.xml').read()
doc.setContent(item)
container.putDocument(None, doc)
container.close()

And here's a script that finds that document in the database, based on the keyword:

from dbxml import *
db = 'winfs.dbxml'
container = XmlContainer(None, db)
container.open(None)
context = XmlQueryContext(0,0)
context.setNamespace ('o', 'urn:schemas-microsoft-com:office:office')
xmlResults = container.queryWithXPath(None, 
    "//o:Keywords[contains(.,'Promote Sale')]", context)         
A growing number number of applications -- notably, Microsoft's own latest generation of Office apps -- can store XML data in ways amenable to XPath search. The same XML data will be open to the more powerful kinds of search available in the newer XML technologies now coming online: XPath 2.0, XQuery. Meanwhile, a growing number of databases are gearing up to do this kind of search efficiently, often in combination with both relational and free-text querying.

The power of pervasive free-text search, by the way, is something that Microsoft seems consistently to underestimate. Outlook, even in its latest incarnation, is helpless to find anything quickly. Everybody has to rely on third-party add-ons for this essential function. There's a hole in the market that you could drive a truck through, and the name on the side of that truck is Gmail, but I digress.

Here's the point of this installment. To the extent that our personal information stores contain information represented in XML, we have standard ways to search them. What's more, two powerful trends point to a brighter future for this scenario: the growing use of open XML file formats, and the steady advance of databases that can index and search XML content. WinFS embraces neither trend, and that looks to me like a looming headache. Personal information management, in Longhorn, will be a walled garden with its own notion of schema, and its own query language. To give users the benefit of finding stuff, Longhorn-style, developers will have to implement the Longhorn model. And then they'll have to find ways to unify that approach with the XML-oriented model prevailing in the world at large -- and indeed, even on pre-Longhorn Windows systems.

The justification for this headache, if there is one, must lie not in the realm of "finding stuff" but in the realm of "organizing stuff." WinFS relationships, in other words, must be capable of delivering such compelling benefits that there was no choice but to invent a proprietary storage model from the ground up. I'll explore that proposition next time.


Former URL: http://weblog.infoworld.com/udell/2004/06/02.html#a1012