Open source lock-in

With the release of MySQL 4.0, the licensing policy of the wildly popular open source database underwent a subtle change. The code libraries that client programs use to access the native MySQL API, formerly licensed under the LGPL (Lesser General Public License), were converted to the GPL. The LGPL was designed to exempt "nonfree" programs that link against open source libraries from the GPL's strong requirement to release source code. The purpose of the LGPL, according to the Free Software Foundation, is "to encourage the widest possible use of a certain library, so that it becomes a de-facto standard." And indeed, MySQL has become the database pillar of the so-called LAMP platform, whose acronym expands to Linux, Apache, MySQL, and the trio of Perl, Python, and PHP. [Full story at InfoWorld.com]
Here's an interesting bit of backstory. As originally filed, my use of the terms LGPL and GPL in the lead paragraph was backwards. Not because I don't know the difference, but because it's so darned easy to get yourself mixed up when talking about this stuff. The error got past my own proofreading, and got by several editorial checks as well, but was fortunately caught before it went to print. I'm tempted to say that the complexity of open source licensing can make your eyes bleed, and that's true, but I guess it applies to all software licensing. Oracle, for example, is apparently now offering licensing seminars where you go to learn, not how to use Oracle, but how to pay for it.

This week's column is only partly about licensing, though. It's also a cautionary tale about getting locked into database-specific access technologies. I referred to a posting by Kingsley Idehen which says, in part:

I have been an ardent ODBC supporter since its inception simply because data is timelessly important, and ODBC provides a critical solution for separating application logic from data repositories. There is a lot of SQL data driving mission critical business applications globally, and failure to comprehend ODBC's value proposition ultimately results in loss of control over Data, which is the foundation from which Information and Knowledge are derived.

You should never find yourself locked into any database vendor, programming language vendor, operating system vendor, or business application vendor, simply becuase you want exploit your own data. [Kingsley Idehen]

As several readers correctly pointed out, there are different kinds of database-access lock-in. Technologies such as ODBC and JDBC are ways to avoid the kind of lock-in I was talking about about in the column; we might call that "transport" or "access" lock-in. Of course, abstraction at this level doesn't help at all with another kind of lock-in; let's call that "SQL dialect lock-in." One reader, Jim Penny, argued forcefully in email (quoted with permission) that the former is far less worrisome than the latter:

The transport layer is so easy to replace that lockin at this level is hardly an issue. It should be, at most, a replacement of two or three routines and a re-linkage. In any Unix program that cares about database independence, it should be as easy as selecting some Makefile or autoconf options. (ODBC is a single transport layer, and locking in to it is hardly different in degree of lockin to locking in to any other transport layer, JDBC, libpq, etc.)

The common subset problem is much larger. It is really hard to write efficient, working SQL for multiple backends, especially if you don't know in advance what set of databases need to be supported. Nobody really attempts to be 100% compliant with the standard(s), everyone has extensions, missing features, and quirks. I have in mind things like subselect support, 'in' support, representation of booleans and dates, cast format, blob support, and varchar support. In the more exotic realm, triggers, prepared statements, and procedural support vary wildly.

If transport layer independence were truly important, more people would be using something like SQL Relay. SQL Relay offers other advantages, as well. [Jim Penny]

Jim makes excellent points. (And SQL Relay, of which I'd not heard, sounds interesting.) I agree that SQL-dialect lock-in is even more pernicious than transport data-access lock-in. Although many useful apps can write to SQL's common subset, including (fortunately for me) some I've worked on, many others can't, and that's a huge problem. That said, I don't think transport neutrality is a non-issue. I'm as tempted as the next developer to think that if some simple hack will get me from transport A to transport B, then A and B are effectively the same. But really, they're not. What Microsoft understood very well about ODBC, as Kingsley has been saying for years, is that transport-neutral data access from the desktop would be a tremendous enabler if, and only if, it were always and automatically available.


Former URL: http://weblog.infoworld.com/udell/2004/01/23.html#a895