Translucent databases

I had lunch with my old pal Peter Wayner yesterday, and he gave me a copy of his new book, Translucent Databases. In the book, Peter defines translucency as an approach that "lets some light escape the system while still providing a layer of secrecy."

Conventionally, databases store information in the clear and rely on a fortress security model. Break into the fortress (or subvert it from inside), and you can scoop up all the information. Over lunch Peter sketched a scenario that might well be a non-starter given that risk. Imagine a web service that enables parents to find available babysitters. A compromise would disastrously reveal vulnerable households where parents are absent and teenage girls are present. Translucency, in this case, means encrypting sensitive data (identities of parents, identities and schedules of babysitters) so that it is hidden even from the database itself, while still enabling the two parties (parents, babysitters) to rendezvous.

The techniques used to accomplish this trick are simple, but the protocols -- like all cryptographic protocols -- require some thought. In general, they elaborate on the possibilities inherent in one-way hashing, like that used to guard passwords in the Unix /etc/passwd file. For example, this SQL statement:

INSERT INTO babysitter1 VALUES (MD5("Chris Jones/swordfish"), "No practice and no school.", 1, 1, "2002-01-02 16:00:00", "2002-01-02 23:00:00")

means: "Chris Jones (password swordfish) is available Jan 2, from four to eleven." A parent to whom Chris has vouchsafed her password queries Chris' schedule using:

SELECT * FROM babysitter1 WHERE idHash=MD5("Chris Jones/swordfish");
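To make the rendezvous concrete, here is a minimal Python sketch of the same idea. It is not code from the book: the in-memory SQLite table, its four-column layout, and the helper names id_hash and lookup are my own assumptions for illustration. MD5 is used only because the SQL above uses it.

```python
import hashlib
import sqlite3


def id_hash(name: str, password: str) -> str:
    """Hex MD5 digest of "name/password", mirroring the SQL MD5() call above."""
    return hashlib.md5(f"{name}/{password}".encode("utf-8")).hexdigest()


# Assumed, simplified schema: the real babysitter1 table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE babysitter1 (idHash TEXT, comment TEXT, startTime TEXT, endTime TEXT)"
)

# The database only ever sees the digest, never the name or the password.
conn.execute(
    "INSERT INTO babysitter1 VALUES (?, ?, ?, ?)",
    (
        id_hash("Chris Jones", "swordfish"),
        "No practice and no school.",
        "2002-01-02 16:00:00",
        "2002-01-02 23:00:00",
    ),
)


def lookup(name: str, password: str) -> list:
    """Return the schedule rows for whoever knows both the name and the password."""
    return conn.execute(
        "SELECT comment, startTime, endTime FROM babysitter1 WHERE idHash = ?",
        (id_hash(name, password),),
    ).fetchall()


rows_ok = lookup("Chris Jones", "swordfish")   # parent who was given the password
rows_bad = lookup("Chris Jones", "guess")      # anyone else gets nothing back
stored_id = conn.execute("SELECT idHash FROM babysitter1").fetchone()[0]
```

Someone who dumps the table sees a schedule attached to an opaque digest, while a parent who holds the name and password can recompute the digest and find the row.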

Most of the book spins out variations on these kinds of examples, using simple Java code to generate standard SQL. Some other techniques include misdirection (adding fake data, and certifying the real data with digital signatures), and quantization (rounding off data that doesn't need to be individually precise, as also described in an NY Times Circuits story yesterday).
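Quantization is the easiest of these to picture. The following Python sketch is my own illustration, not the book's Java: the function names quantize_time and quantize_coord, the one-hour bucket, and the 0.01-degree grid are all assumed values chosen for the example.

```python
from datetime import datetime


def quantize_time(ts: datetime, minutes: int = 60) -> datetime:
    """Round a timestamp down to the start of its `minutes`-wide bucket within the day."""
    minute_of_day = ts.hour * 60 + ts.minute
    bucket = minute_of_day // minutes * minutes
    return ts.replace(hour=bucket // 60, minute=bucket % 60, second=0, microsecond=0)


def quantize_coord(value: float, step: float = 0.01) -> float:
    """Snap a coordinate to the nearest multiple of `step` (hypothetical grid size)."""
    return round(round(value / step) * step, 6)


# Store only the coarse values; the precise ones never reach the database.
coarse_time = quantize_time(datetime(2002, 1, 2, 16, 47, 12))
coarse_half_hour = quantize_time(datetime(2002, 1, 2, 16, 47, 12), minutes=30)
coarse_lat = quantize_coord(42.3601)
```

The coarser the bucket, the less a stolen record says about any one person, at the cost of less precise queries.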

The book was poorly copyedited, unfortunately, and it has an annoying number of typos. But it's an excellent exploration of what will doubtless be an important emerging field: the intersection of databases and cryptography. Perhaps in time Microsoft's initial Hailstorm proposal will be seen in a slightly different light. It was, after all, a translucent database.


Former URL: http://weblog.infoworld.com/udell/2002/07/19.html#a345