Towards politically neutral infrastructure: Amazon's S3

By now you've heard the pitch: Amazon is offering metered storage for blobs of data in quantities ranging from 1 byte to 5 GB. S3 provides a simple key/value store, like the ever-popular Berkeley DB -- albeit without locking or transactional features. Objects can be world-readable or governed by a range of access controls. REST (Representational State Transfer) and SOAP APIs are provided, along with wrappers in a variety of popular languages. Pricing is aggressive for storage, somewhat less so for data transfer. Amazon's commerce engine handles the billing.
...
Creating what's called the energy web -- a marketplace where smart producers and consumers of power exchange price signals in real time -- will require a massive overhaul of our legacy power grid. There's just no way for us to start from scratch. But in the realm of Web services, we're just now building the grid. Given a clean slate, maybe we can figure out how to aggregate demand, meter usage, and value services for what they do rather than just for the eyeballs they attract. [Full story at InfoWorld.com]

I signed up for S3 and created a bucket with a couple of objects in it. Here they are:

http://s3.amazonaws.com/jon/public

http://s3.amazonaws.com/jon/private

Some observations about these resources:

Naming

The name of the bucket is jon, and the names of the objects within it are public and private. The bucket namespace is global, which means that as long as jon is owned by my S3 developer account, nobody else can use that name. Will this lead to a namespace land grab? We'll see. Meanwhile, I've got mine, and although I may never again top Jon Stewart as Google's #1 Jon, his people are going to have to talk to my people if they want my Amazon bucket.
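To make the addressing concrete, here is a minimal sketch of how a bucket name and an object name combine into the two URLs listed above. The helper name object_url is hypothetical, not part of any Amazon wrapper, and the percent-encoding of unusual characters is an assumption about how such names would be handled.

```python
from urllib.parse import quote

S3_ENDPOINT = "http://s3.amazonaws.com"  # host used by the URLs in this post


def object_url(bucket: str, key: str) -> str:
    """Build a path-style URL of the form endpoint/bucket/key.

    Hypothetical helper for illustration only. Slashes inside a key are
    left alone so that keys can look like paths.
    """
    return f"{S3_ENDPOINT}/{quote(bucket, safe='')}/{quote(key, safe='/')}"


print(object_url("jon", "public"))   # http://s3.amazonaws.com/jon/public
print(object_url("jon", "private"))  # http://s3.amazonaws.com/jon/private
```

Because the bucket is the first path segment under a shared host, two accounts cannot both own jon, which is the global-namespace point made above.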

Billing

It's costing me fifteen cents per gigabyte per month to store this data, and twenty cents per gigabyte transferred for reads and writes. Since the content of /public is currently just the string 'public data' and /private is just 'private data', it's going to be a long time before the bill adds up to a penny. And this fascinates me. A lot of the early discussion of S3 has focused on high-volume storage -- notably backup. But at the other end of the spectrum, our active working sets of essential data are pretty small. As a matter of fact, I keep a chunk of mine in network-accessible memory for easy access. I can very well imagine parking other chunks in S3 for guaranteed availability. (I'd keep my own backup, of course.)
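A back-of-envelope sketch of that arithmetic, using the two rates quoted above. It makes several assumptions: the twenty-cent rate is treated as a charge per gigabyte transferred, a gigabyte is taken as 10^9 bytes, and no minimum charge or rounding applies. None of those details are stated in this post.

```python
STORAGE_RATE = 0.15   # dollars per GB per month of storage
TRANSFER_RATE = 0.20  # dollars per GB read or written (assumed unit)
GB = 10**9            # assumption: decimal gigabytes


def monthly_cost(stored_bytes: int, transferred_bytes: int = 0) -> float:
    """Estimate one month's bill in dollars under the assumptions above."""
    storage = stored_bytes / GB * STORAGE_RATE
    transfer = transferred_bytes / GB * TRANSFER_RATE
    return storage + transfer


# The two objects in the bucket hold these strings.
stored = len("public data") + len("private data")  # 23 bytes
storage_only = monthly_cost(stored)
months_to_a_penny = 0.01 / storage_only

print(f"{stored} bytes cost about ${storage_only:.2e} per month to store")
print(f"roughly {months_to_a_penny:,.0f} months of storage to reach one cent")
```

Under these assumptions the storage charge comes to a few billionths of a dollar per month, so a penny is millions of months away unless the objects get read a great deal.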

Access

As you'll discover by clicking, /public is readable and /private is not. However, /private is readable at this URL, which includes a signature that I created using the API. It's only readable for the next 24 hours, though -- until about 8AM EST on March 23. That's because I asked for a 24-hour expiration when I created the signed URL. I love this feature! Although I prefer file uploads over email attachments as a means of collaboration, I tend to leave a trail of infoturds that I forget to delete. Expiring read access is a great way to limit unnecessary exposure. I'd even use Mission Impossible-style automatic deletion if it were available, though I can see that Amazon wouldn't want to encourage expiring revenue.
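For readers curious about what goes into such a link, here is a sketch of the query-string authentication scheme described in S3's REST documentation: an HMAC-SHA1 signature over the request verb, the expiry time, and the resource path. (This is the scheme now usually called signature version 2, since superseded by newer signing methods.) The credentials are placeholders and the function name signed_url is hypothetical; this is not necessarily the code that produced the link above.

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import quote


def signed_url(bucket, key, access_key_id, secret_key, expires_at):
    """Return a GET URL for a private object, valid until expires_at.

    expires_at is a Unix timestamp in seconds.
    """
    resource = f"/{bucket}/{key}"
    # String to sign: verb, Content-MD5, Content-Type, expiry, resource.
    # Content-MD5 and Content-Type are empty for a plain GET.
    string_to_sign = f"GET\n\n\n{expires_at}\n{resource}"
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return (
        f"http://s3.amazonaws.com{resource}"
        f"?AWSAccessKeyId={quote(access_key_id, safe='')}"
        f"&Expires={expires_at}"
        f"&Signature={quote(signature, safe='')}"
    )


# Placeholder credentials, not real ones.
expires = int(time.time()) + 24 * 60 * 60  # 24 hours from now
print(signed_url("jon", "private", "EXAMPLE_KEY_ID", "EXAMPLE_SECRET", expires))
```

The expiry is part of the signed string, so anyone who edits the Expires parameter to extend access invalidates the signature.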

The access control regime also includes a feature called email grantees which, at first, sounded really great. Here's the access control list for /private:

<AccessControlList>
<Grant>
  <Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xsi:type="CanonicalUser">
    <ID>7bba2e6...2bdd8</ID>
    <DisplayName>jonudell</DisplayName>
  </Grantee>
  <Permission>FULL_CONTROL</Permission>
</Grant>
<Grant>
  <Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xsi:type="CanonicalUser">
    <ID>1b17b62....840b</ID>
    <DisplayName>maryjones</DisplayName>
  </Grantee>
  <Permission>READ</Permission>
</Grant>
</AccessControlList>

As the owner I have full control, and I can also delegate access using signed URLs as shown above. But I've also given read permission to Mary Jones, using an API call that resolves an email address (say, mary_jones@infoworld.com) to an Amazon.com customer identity.
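A small sketch of reading that list programmatically, using Python's standard XML parser on the fragment exactly as shown, elided IDs included. The function name list_grants is hypothetical. A live response from the service may wrap these elements in an XML namespace, in which case the lookups below would need namespace-qualified names.

```python
import xml.etree.ElementTree as ET

# The access control list from this post, copied verbatim (IDs are elided).
ACL_XML = """<AccessControlList>
<Grant>
  <Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:type="CanonicalUser">
    <ID>7bba2e6...2bdd8</ID>
    <DisplayName>jonudell</DisplayName>
  </Grantee>
  <Permission>FULL_CONTROL</Permission>
</Grant>
<Grant>
  <Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:type="CanonicalUser">
    <ID>1b17b62....840b</ID>
    <DisplayName>maryjones</DisplayName>
  </Grantee>
  <Permission>READ</Permission>
</Grant>
</AccessControlList>"""


def list_grants(acl_xml):
    """Return (display name, permission) pairs, one per Grant element."""
    root = ET.fromstring(acl_xml)
    grants = []
    for grant in root.findall("Grant"):
        name = grant.findtext("Grantee/DisplayName")
        permission = grant.findtext("Permission")
        grants.append((name, permission))
    return grants


print(list_grants(ACL_XML))
# [('jonudell', 'FULL_CONTROL'), ('maryjones', 'READ')]
```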

I wasn't the only one to jump to the conclusion that this mechanism would enable authenticated access control using Amazon's customer identity system. But apparently not. Email grants are currently just a shorthand way to identify other S3 developers and, as such, not very useful. That could change, though, and I hope it will.

"I agree that this functionality [authentication by Amazon customer identity] would be really cool," wrote Amazon's John Cormie on the S3 developer forum, "and we hope to add it to a future version of S3." My last point illustrates why I think that change would matter.

Political neutrality

At an InfoWorld forum a couple of years ago, Ray Ozzie talked about how Groove was being used in Iraq to coordinate activities among various military and governmental actors. None would have agreed to use an IT infrastructure that was owned and operated by any of the others, so Groove's fully decentralized architecture was chosen as much for political as for technical reasons.

I see S3 in a similar light. Today, if I'm with company A and you're with company B and we also want to work with a freelancer, the impulse to collaborate is often sandbagged by politics. I don't want to use your SharePoint server, you don't want to use my Plone server, and neither of us wants to roust our IT guy and ask him to create an account for the freelancer.

With a service like S3, we could all agree to use Amazon's politically neutral object store. With the right wrappers, we could even continue to use our own preferred applications.

The notion of Amazon as a politically neutral identity system is equally intriguing. Sure, it's a silo, but it's one that happens to include all of us -- or easily can. While we're waiting for federated identity to sort itself out, we could do a lot of useful work with this kind of model.


Former URL: http://weblog.infoworld.com/udell/2006/03/22.html#a1411