Tangled in the Threads
Jon Udell, March 29, 2000
Object data and the Procrustean bed
One of these evil-doers was called Procrustes, or the Stretcher. He had an iron bedstead, on which he used to tie all travellers who fell into his hands. If they were shorter than the bed, he stretched their limbs to make them fit it; if they were longer than the bed, he lopped off a portion.
-- Bulfinch's mythology, The Age of Fable, Chapter XX
Object data has to be tortured to fit into SQL tables. The Web's overdue for a new storage paradigm.
Last week in the databases newsgroup, Alexander Staubo referred to a letter from Henry G. Baker to the ACM Forum which decries the last 20 years as a "Dark Ages of commercial data processing." Baker wrote, in part:
Why were relational databases such a Procrustean bed? Because organizations, budgets, products, etc., are hierarchical; hierarchies require transitive closures for their "explosions"; and transitive closures cannot be expressed within the classical Codd model using only a finite number of joins (I wrote a paper in 1971 discussing this problem). Perhaps this sounds like 20-20 hindsight, but most manufacturing databases of the late 1960's were of the "Bill of Materials" type, which today would be characterized as "object-oriented". Parts "explosions" and budgets "explosions" were the norm, and these databases could easily handle the complexity of large amounts of CAD-equivalent data. These databases could also respond quickly to "real-time" requests for information, because the data was readily accessible through pointers and hash tables--without performing "joins".
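To make the parts "explosion" concrete, here is a minimal Python sketch of the pointer-and-hash-table approach Baker describes. It is only an illustration: the `BOM` table, the part names, and the quantities are all hypothetical, and `explode` is a name invented for this example.

```python
from collections import defaultdict

# Hypothetical bill of materials: each assembly maps to (component, quantity) pairs.
# Parts that do not appear as keys are treated as leaf parts.
BOM = {
    "bicycle": [("frame", 1), ("wheel", 2)],
    "wheel": [("rim", 1), ("spoke", 32), ("tire", 1)],
    "frame": [("tube", 3)],
}


def explode(part, quantity=1, totals=None):
    """Accumulate the leaf-part quantities needed to build `quantity` of `part`."""
    if totals is None:
        totals = defaultdict(int)
    components = BOM.get(part)
    if not components:
        # Leaf part: nothing further to expand.
        totals[part] += quantity
        return totals
    for child, count in components:
        # Follow the reference to the child; no join is involved.
        explode(child, quantity * count, totals)
    return totals


parts = dict(explode("bicycle"))
print(parts)
```

The recursion goes exactly as deep as the data does. A fixed query in the classical relational model would need one self-join per level of the hierarchy, which is the limitation Baker is complaining about.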
In one of the chapters of my book, I made the same point, also with respect to the difficulty of supporting complex, organic data in a relational model:
"The servlets we'll explore in this chapter are ... in essence, just Web APIs to in-memory Java objects that are made persistent using serialization. These objects aren't constrained by SQL's tabular row-and-column format. Like Perl, Java has hashtables and lists, and can combine these to make arbitrarily complex structures -- such as the hashes-of-hashes-of-lists (HoHoLs) or lists-of-hashes (LoHs) that we found to be so useful in earlier chapters. Groupware needs these kinds of data structures because it has to model what people really do, and that's messy. Relationships among people, tasks, events, and resources are complex and fluid. Forcing these organic structures into the Procrustean bed of normalized relational tables can be a painful and unproductive exercise. Object storage can model webs of relationships much more directly than SQL databases can."
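The same idea can be sketched in Python, whose `pickle` module plays roughly the role that Java serialization plays in the servlets described above. This is a stand-in, not the book's code: the `calendar` structure and the names in it are hypothetical.

```python
import pickle

# Hypothetical groupware data: a hash of hashes of lists (HoHoL).
calendar = {
    "alice": {"tasks": ["review draft", "book room"], "events": ["staff meeting"]},
    "bob": {"tasks": ["write agenda"], "events": ["staff meeting", "demo"]},
}

# Serialize the whole nested structure in one step; no table schema is declared.
blob = pickle.dumps(calendar)

# Later (or in another process) the same shape comes back intact.
restored = pickle.loads(blob)

# The restored copy is an independent object graph that can be changed freely.
restored["bob"]["tasks"].append("send minutes")
print(restored["bob"]["tasks"])
```

Mapping the same data to normalized tables would take at least a table of people, a table of tasks, a table of events, and the keys to tie them together.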
So I do sympathize with Mr. Baker's position. However, I think it's disingenuous to say that two decades of SQL was a waste of time. Will object databases save us? It's not yet clear that they will; so far, they've hardly made a dent. In chapter 12 of Philip and Alex's Guide to Web Publishing, Philip Greenspun notes, hilariously:
"If you believed everything in the object database vendors' literature, then you'd be surprised that Larry Ellison still has $100 bills to fling to peasants as he roars past in his Acura NSX."
I know developers who would love to persist their Java and C++ objects directly to an ODB, rather than write gobs of code to translate between objects and tables, and back again. Why not go the ODB route, then? The most oft-cited objection is the lack of standard query and update tools. Vendor-specific SQL extensions abound, but at least there is a useful lowest common denominator as expressed in umbrella APIs such as ODBC. The Object Database Management Group has defined OQL, an object query language, and a number of object database vendors support it, but OQL never gained the traction its inventors hoped for. What's more, now that object databases are repositioning themselves as XML data servers, XQL (XML query language) looks like it might usurp the role of OQL.
I don't dislike OODBMSes in principle -- quite the contrary, I think OO *is* the way to go. Unfortunately, in my opinion no "killer" candidates have surfaced thus far. OODBMS vendors are scrambling to support XML and related standards, and hopefully something good will come out of this. OODBMS products to this date have been mystical applications, each with its own idiosyncratic design principles, tools, and language support.
An experienced Zope developer, Alexander would like to rely more heavily on Zope's object database, ZODB, but notes these objections:
The ZODB seriously lacks mature object design tools. I can't easily tell Zope that objects of type A refer to objects of type B (mutual relationships), and that objects of type B should not be deletable as long as an object of type A is referring to it (referential integrity).
Furthermore, I am building a site with enormous performance requirements -- huge mass of data, a potentially huge number of visitors. Even with round-robin schemes, load balancing IP routers, and Digital Creations' Zope Enterprise Option product, you can scale only so far.
What would be an example of an object design tool? Well, the POET and Versant object databases, for example, support the Rational Rose modeler. ZODB itself is defined by a UML (Unified Modeling Language) model, and one can imagine integrating Zope with a variety of UML tools.
ZODB and ZEO
Zope's speed and scalability are another matter. This week, Digital Creations announced that ZEO (Zope Enterprise Option), currently a high-end commercial product, will be open-sourced along with the rest of Zope.
Prior to this announcement, Digital Creations' Michel Pelletier had posted a nice description of how ZEO works:
Zope uses a client/server storage architecture. In the regular distribution of Zope, ZODB manages 'Storages'. The default Storage that comes with Zope is FileStorage which stores information on the filesystem.
ZEO is a Storage that talks via TCP/IP to a remote component that takes care of the actual storage. The Storage component that plugs into Zope is called the 'ClientStorage'. The Server component is the 'Storage Server'.
Multiple ClientStorages can connect to one Storage Server. ClientStorages maintain a local disk and memory cache of objects, so an object is really only ever fetched once. If a ClientStorage writes an object, a cache invalidation protocol makes sure that all clients are up to date.
As an added bonus, the Storage Server itself has a Storage backend. By default, the Storage Server uses FileStorage, but there is no saying that the Storage Server couldn't use a ClientStorage to connect to another Storage Server. This allows you to distribute your entire object database over an N-deep hierarchy of machines.
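The caching and invalidation behavior described above can be modeled in a few lines of Python. This is a toy, in-process sketch, not ZEO's actual API: the class names echo the terms in Michel's post, but every method here is hypothetical, there is no TCP/IP, and the cache lives in memory only.

```python
class StorageServer:
    """Toy stand-in for the Storage Server: holds the authoritative objects."""

    def __init__(self):
        self._data = {}
        self._clients = []

    def register(self, client):
        self._clients.append(client)

    def load(self, oid):
        return self._data[oid]

    def store(self, oid, value, writer):
        self._data[oid] = value
        # Tell every other client that its cached copy is now stale.
        for client in self._clients:
            if client is not writer:
                client.invalidate(oid)


class ClientStorage:
    """Toy client: caches objects locally and counts fetches from the server."""

    def __init__(self, server):
        self._server = server
        self._cache = {}
        self.fetches = 0
        server.register(self)

    def load(self, oid):
        if oid not in self._cache:
            self._cache[oid] = self._server.load(oid)
            self.fetches += 1
        return self._cache[oid]

    def store(self, oid, value):
        self._cache[oid] = value
        self._server.store(oid, value, writer=self)

    def invalidate(self, oid):
        self._cache.pop(oid, None)


server = StorageServer()
a = ClientStorage(server)
b = ClientStorage(server)

a.store("page", "v1")
first = b.load("page")   # fetched from the server
second = b.load("page")  # served from b's cache
a.store("page", "v2")    # the server invalidates b's cached copy
third = b.load("page")   # fetched again, now up to date
print(first, second, third, b.fetches)
```

In this sketch an object is fetched once per client until another client writes it, which is the property the invalidation protocol is meant to preserve.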
Soon to be shorn of its $20,000 price tag, ZEO will doubtless receive a lot of tire-kicking over the next few months. It'll be interesting to see how it shakes out. As I mentioned in my keynote talk at the Zope track of this year's Python conference, I think Zope would also do well to ally itself with commercial object databases offering maturity and tool support that ZODB/ZEO lack.
Apple's Enterprise Objects Framework
Another Zope developer, Jeffrey P Shell, put in a good word for Apple's Enterprise Objects Framework, the object modeler and object-to-relational translation layer for the WebObjects application server. "It's still light years ahead of anybody," says Jeffrey, "and it's absolutely amazing."
You can read about EOF on the Apple developer's site. A new version of WebObjects, v4.5, was released in March.
Here's why Jeffrey likes EOF so much:
It's actually much more of a persistence framework than it is a RDBMS-integration framework. At least, that's how it feels. It does a lot of the things that Zope's ZODB does:
- Managing graphs of objects, not actually retrieving the real object until it's needed (ZODB has "ghosted" objects, EOF uses a faulting scheme)
- Transaction management including dealing with multiple transactionable systems
- Transparent usage of the persistence (in EOF, you basically NEVER have to write SQL unless you REALLY want to)
- Pluggable object stores. Unless you use very specific features of the underlying database (custom SQL) in EOF, you should be able to switch the Connector (Oracle, MySQL, FlatFile, etc.) without affecting any behaviour.
It differs from the ZODB in that it fits a well-defined schema, and is always driven from the Model. ZODB applications, like Zope, can have very wild and always-changing schema.
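The deferred loading in the first bullet, which ZODB calls ghosting and EOF calls faulting, can be illustrated with a small Python sketch. It shows the general technique only and is neither product's implementation; `Ghost`, `load_record`, and the `RECORDS` table are hypothetical names made up for the example.

```python
# Hypothetical backing store, keyed by object id.
RECORDS = {
    1: {"name": "Ada", "manager": 2},
    2: {"name": "Grace", "manager": None},
}

loads = []  # records every trip to the backing store


def load_record(oid):
    loads.append(oid)
    return dict(RECORDS[oid])


class Ghost:
    """Placeholder that fetches its real state on first attribute access."""

    def __init__(self, oid, loader):
        self._oid = oid
        self._loader = loader
        self._loaded = False

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal attribute lookup fails,
        # so this runs for state that has not been loaded yet.
        if name.startswith("_") or self._loaded:
            raise AttributeError(name)
        self.__dict__.update(self._loader(self._oid))
        self._loaded = True
        return getattr(self, name)


employee = Ghost(1, load_record)
loads_before = list(loads)        # nothing has been fetched yet
name = employee.name              # first access triggers the load
manager_id = employee.manager     # already loaded, no second fetch
print(name, manager_id, loads)
```

Creating the object costs nothing; the store is touched only when the program actually asks for the data, and only once.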
NeXT/Apple do such great OO in these frameworks that it puts much of the world to shame, in my opinion. The separation of layers of communication in EOF is very clear. It's definitely a product that has been very well thought out since its inception. NeXT had some truly great engineers. Having them at Apple is really going to ensure the flexibility and power of MacOS X as Apple starts nudging developers to start using the Cocoa (aka OpenStep/Yellowbox) frameworks.
It took a long time for the SQL discipline to mature. And even now, as SQL guru Joe Celko notes in one of the articles mentioned below, RDBMS technology itself has captured only a small fraction of business data-processing, much of which continues to rely on more archaic databases. So we shouldn't be surprised to see OODB technology following a long, gradual adoption curve as well. We like to think that everything's running on Internet time, but disciplines as fundamental as these evolve much more slowly. Renaming object databases as XML data servers was a good marketing move, but object data will probably be squirming on the Procrustean bed for a long time to come.
Object database articles from the BYTE.com archives:
- October 1997 / BYTE Software Lab Report
  - The Object Is to Manage Data, by Todd Zino
- October 1997 / Features
  - Debunking Object-Database Myths, by Joe Celko and Jackie Celko
- April 1994 / State of the Art
This work is licensed under a Creative Commons License.