As we migrate personal data to the cloud, it seems that we trade privacy for convenience. It's convenient, for example, to access my address book from any connected device I happen to use. But when I park my address book in the cloud in order to gain this benefit, I expose my data to the provider of that cloud service. When the service is offered for free, supported by ads that use my personal info to profile me, this exposure is the price I pay for convenient access to my own data. The provider may promise not to use the data in ways I don't like, but I can't be sure that promise will be kept.
Is this a reasonable trade-off? For many people, in many cases, it appears to be. Of course we haven't, so far, been given other choices. But other choices can exist. Storing your data in the cloud doesn't necessarily mean, for example, that the cloud operator can read all the data you put there. There are ways to transform it so that it's useful only to you, or to you and designated others, or to the service provider but only in restricted ways.
Early Unix systems kept users' passwords in an unprotected system file, /etc/passwd, that anyone could read. This seemed crazy when I first learned about it many years ago. But there was a method to the madness. The file was readable, so anyone could see the usernames. But the passwords were transformed, using a cryptographic hash function, into gibberish. The system didn't need to remember your cleartext password. It only needed to verify that, when you typed your password at login, repeating the transformation that originally encoded its /etc/passwd equivalent yielded a matching result.
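To make the idea concrete, here's a minimal sketch in Python. It uses a salted SHA-256 hash rather than the DES-based crypt() of early Unix, but the principle is the same: store only the transformed password, and check a login attempt by repeating the transformation.

```python
# Illustration of the /etc/passwd idea with a modern hash (not the original
# Unix crypt() algorithm): the system stores only a salted hash, never the
# cleartext password, and verifies a login by repeating the hash.
import hashlib
import os

def make_entry(password: str) -> str:
    """Return 'salt$hash', the only thing the password file needs to store."""
    salt = os.urandom(8).hex()
    digest = hashlib.sha256((salt + password).encode()).hexdigest()
    return f"{salt}${digest}"

def verify(password: str, entry: str) -> bool:
    """Re-run the same transformation and compare the results."""
    salt, digest = entry.split("$")
    return hashlib.sha256((salt + password).encode()).hexdigest() == digest

stored = make_entry("hunter2")      # what lands in the password file
print(verify("hunter2", stored))    # True  -- correct password
print(verify("guess", stored))      # False -- wrong password
```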
Everything old is new again. When it was recently discovered that some iPhone apps were uploading users' contacts to the cloud, one proposed remedy was to modify iOS to require explicit user approval. But in one typical scenario that's not a choice a user should have to make. A social service that uses contacts to find which of a new user's friends are already members doesn't need cleartext email addresses. If I upload hashes of my contacts, and you upload hashes of yours, the service can match hashes without knowing the email addresses from which they're derived.
In "Hashing for privacy in social apps," Matt Gemmell shows how it can be done. Why wasn't it? Not for nefarious reasons, Gemmell says, but rather because developers simply weren't aware of the option to use hashes as a proxy for email addresses.
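To see why that works, here's a rough sketch in Python -- my own toy illustration of the general idea, not Gemmell's code. Both sides normalize and hash their email addresses the same way, and the service compares only the hashes.

```python
# Hash-based contact matching: the service sees hashes on both sides,
# yet can still find the overlap between two address books.
import hashlib

def hash_contacts(emails):
    """Normalize each address, then replace it with a one-way hash."""
    return {hashlib.sha256(e.strip().lower().encode()).hexdigest() for e in emails}

my_contacts  = ["alice@example.com", "bob@example.com"]
your_uploads = hash_contacts(["Bob@example.com", "carol@example.com"])

mutual = hash_contacts(my_contacts) & your_uploads
print(len(mutual))   # 1 -- bob@example.com matches, but is never disclosed
```

A real service would want to strengthen this -- email addresses are guessable, so unsalted hashes raise the bar rather than eliminate the risk -- but the sketch shows that cleartext addresses aren't required for matching.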
The best general treatise I've read on this topic is Peter Wayner's Translucent Databases. I reviewed the first edition a decade ago; the revised and expanded second edition came out in 2009. A translucent system, Peter says, "lets some light escape while still providing a layer of secrecy."
Here's my favorite example from Peter's book. Consider a social app that enables parents to find available babysitters. A conventional implementation would store sensitive data -- identities and addresses of parents, identities and schedules of babysitters -- as cleartext. If evildoers break into the service, there will be another round of headlines and unsatisfying apologies.
A translucent solution encrypts the sensitive data so that it is hidden even from the operator of the service, while still enabling the two parties (parents and babysitters) to rendezvous.
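Here's one way such a rendezvous could work, sketched in Python. This is a hypothetical design of my own, not Wayner's exact scheme: the server indexes rows by a one-way hash of a neighborhood name, and contact details are encrypted with a key derived from that same name, which the server never sees in cleartext. A real implementation would use an authenticated cipher such as AES-GCM rather than the toy XOR keystream shown here.

```python
# Translucent storage sketch: the operator stores only hashed lookup keys
# and ciphertext, yet parents who know the neighborhood can still find and
# read the matching rows.
import hashlib

def lookup_key(neighborhood: str) -> str:
    """One-way hash used as the row index the server stores and matches on."""
    return hashlib.sha256(("lookup:" + neighborhood.lower()).encode()).hexdigest()

def keystream(neighborhood: str, n: int) -> bytes:
    """Derive n bytes of keystream from the neighborhood name (toy cipher)."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(f"enc:{neighborhood.lower()}:{counter}".encode()).digest()
        counter += 1
    return out[:n]

def encrypt(neighborhood: str, text: str) -> bytes:
    data = text.encode()
    return bytes(a ^ b for a, b in zip(data, keystream(neighborhood, len(data))))

def decrypt(neighborhood: str, blob: bytes) -> str:
    plain = bytes(a ^ b for a, b in zip(blob, keystream(neighborhood, len(blob))))
    return plain.decode()

# The babysitter's client submits only a hash and ciphertext; the server
# stores this row without ever learning who or where the sitter is.
server_db = [(lookup_key("Elmwood"), encrypt("Elmwood", "Dana, free Fri/Sat, 555-0100"))]

# A parent in the same neighborhood can find and read the row; the operator cannot.
wanted = lookup_key("Elmwood")
for key, blob in server_db:
    if key == wanted:
        print(decrypt("Elmwood", blob))
```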
How many applications can benefit from translucency? We won't know until we start looking. The translucent approach doesn't lie along the path of least resistance, though. It takes creative thinking and hard work to craft applications that don't unnecessarily require users to disclose, or services to store, personal data. But if you can solve a problem in a translucent way, you should. We can all live without more of those headlines and apologies.