A couple of weeks ago I uploaded a home video to my Amazon S3 account and sent the link around to friends and family. Featuring kittens and bunnies, it's the kind of thing you'd expect to find in the cute animals category on the various video sites. Over the weekend somebody suggested that I upload it to YouTube, so I registered there and did that. YouTube reported:
Uploaded (processing, please wait).
Twenty-four hours later, it was still processing. Meanwhile, I received my June Amazon S3 bill for the princely sum of fifteen cents:
Since that 15-megabyte movie is the only thing I'm storing on AWS at the moment, my fourteen cents' worth of bandwidth usage translates to about 47 downloads. Let's round it to 50 and consider the following progression:
  views       $$$
 ------   -------
     50      0.15
    500      1.50
   5000     15.00
  50000    150.00
 500000   1500.00
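The progression is a straight linear extrapolation from the bill. Here is a minimal sketch of that arithmetic, assuming (as the rounding above does) that the whole fifteen cents scales with views. The function and constant names are mine, for illustration only.

```python
# Linear extrapolation from the June bill: roughly 15 cents per 50 views.
# Assumption: the small storage charge is lumped in with bandwidth,
# as the rounded figures in the progression do.
CENTS_PER_50_VIEWS = 15


def monthly_cost_dollars(views):
    """Estimated monthly S3 cost in dollars for a given number of views."""
    cents = views * CENTS_PER_50_VIEWS / 50
    return cents / 100


for views in (50, 500, 5000, 50000, 500000):
    print(f"{views:>7} {monthly_cost_dollars(views):>9.2f}")
```

In reality storage is a fixed monthly charge and only bandwidth grows with views, so this slightly overstates the cost at the high end.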
The vast majority of friends-and-family videos will never exceed 50 views a month, for which fifteen cents a month is effectively free. And even at 500 views a month, you'll hardly notice the buck fifty.
But what if your video becomes popular? The top 20 videos on YouTube on any given day average about 50,000 views, and you'd certainly notice the hundred and fifty bucks that would cost on S3, not to mention the thousands you'd pay for a month of that level of interest.
For a variety of reasons, I've been thinking about the kinds of services provided by S3 and by video sites like YouTube, and about how such services might fruitfully combine.
The long processing delay at YouTube occurs, I suppose, because YouTube transcodes the files people upload, from QuickTime or Windows Media to Flash, which is currently regarded as the most universally viewable format. (Blip.TV handles this more intelligently, by the way. It publishes your primary file right away, and adds the Flash video later.) Video transcoding requires a lot of computational horsepower, so it's not surprising to see it become a bottleneck.
One consequence of that bottleneck is that YouTube is not useful for near-realtime citizen journalism. When I documented the flooding last fall in Keene, NH, my first upload was available and in circulation during the event. A long processing delay would have been intolerable.
In my case I published the file to my webserver, but most people can't easily post a large media object to the web. Hosting is one major service provided by YouTube, Blip.TV, and the rest. The other major service is convenient sharing, tagging, ranking, and community participation.
In principle the video services can, and arguably should, be remixed along both of these axes. In practice we're not there yet, but it's interesting to imagine the possibilities.
Take my friends-and-family video for example. I originally posted it to my AWS account at this URL. That was fine for friends and family, but if you hit that URL now you'll be redirected to YouTube.
Could I establish the S3 URL as the canonical one for this video, while creating the option to delegate hosting and/or community services as needed? Today, YouTube and Blip and the rest are walled gardens. Each creates its own namespace to which all community activity refers. But there's no straightforward way to associate this YouTube URL with this Blip.TV URL. As a result, there's no way to combine the service-specific tagging, comments, and viewership data for each of these instances. And there's no way for a blogger to refer canonically to the video, or for the blogosphere to aggregate such references.
Now consider this scenario. I upload my video to S3, or to some other storage service in the cloud where I choose to manage a canonical chunk of web namespace. (As I mentioned in my S3 writeup, its bucket namespace is global, which means that I control http://s3.amazonaws.com/jon so long as I choose to.) The video is immediately available at that URL. I make it public or, as noted in my S3 writeup, private at an unguessable and possibly time-limited URL that I communicate only to friends and family.
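For the private case, here is a rough sketch of how a time-limited URL can be built with S3's original query-string authentication (the HMAC-SHA1 scheme now called signature version 2). Treat it as an illustration, not a recipe: newer S3 regions require a different signing scheme, and an official SDK is the safer route. The key name and credentials below are placeholders, and the function name is hypothetical.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def signed_url(bucket, key, access_key_id, secret_key, expires_epoch):
    """Build a time-limited GET URL using S3's legacy query-string auth.

    expires_epoch is the expiry time in seconds since the Unix epoch.
    Assumes a simple key name that needs no URL-encoding in the path.
    """
    # Legacy string-to-sign: verb, Content-MD5, Content-Type, expiry, resource.
    string_to_sign = f"GET\n\n\n{expires_epoch}\n/{bucket}/{key}"
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return (
        f"http://s3.amazonaws.com/{bucket}/{key}"
        f"?AWSAccessKeyId={quote(access_key_id, safe='')}"
        f"&Expires={expires_epoch}"
        f"&Signature={quote(signature, safe='')}"
    )


# Placeholder values only; "kittens.mov" is a made-up key name.
print(signed_url("jon", "kittens.mov", "EXAMPLEKEYID", "example-secret", 1153000000))
```

Because the expiry is part of the signed string, changing the Expires value in the URL invalidates the signature, which is what makes the link both unguessable and time-limited.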
Let's say I make it public, because I hope to attract attention. In the unlikely event that I succeed wildly, the pennies I pay S3 could add up alarmingly.
BitTorrent is one way to defray the cost of distributing an object that becomes hugely popular. S3 can automatically produce a BitTorrent URL for any hosted object -- a very cool feature! But that only helps with distribution. Viral communication is the other half of the equation. That's what YouTube and the rest provide, in addition to hosting. And they do it in ways that optimize for communication within, rather than across, the walled gardens.
I'm looking for ways to break down those walls. Merely redirecting my S3 URL to YouTube, as I've done, is a very imperfect solution. In the unlikely event that my cute animals video becomes popular, I'll be spared the cost. And the odds against that unlikely event are reduced somewhat by submitting the video to YouTube. But my YouTube instance doesn't know about my Blip.TV instance, and vice versa.
My S3 instance does, however, know about both of those. If you point a raw HTTP client at my video on S3 you'll see, among the other HTTP headers, these:
x-amz-meta-blip: http://www.blip.tv/file/47189
x-amz-meta-youtube: http://www.youtube.com/watch?v=YrrCuCyQVNk
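A client could discover those alternate locations with nothing more than a HEAD request. The sketch below is one way to do it, using only Python's standard library. The function names are mine, and the convention that each x-amz-meta-* header names a video service is just the ad hoc one used here, not anything S3 defines.

```python
import urllib.request

META_PREFIX = "x-amz-meta-"


def fetch_headers(url):
    """Issue a HEAD request and return the response headers as (name, value) pairs."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return list(response.headers.items())


def alternate_locations(headers):
    """Map service name -> URL from x-amz-meta-* headers.

    Header names are compared case-insensitively, since HTTP header
    names are not case-sensitive.
    """
    return {
        name[len(META_PREFIX):].lower(): value
        for name, value in headers
        if name.lower().startswith(META_PREFIX)
    }


# Offline example using the two headers shown above.
sample = [
    ("Content-Type", "text/html"),
    ("x-amz-meta-blip", "http://www.blip.tv/file/47189"),
    ("x-amz-meta-youtube", "http://www.youtube.com/watch?v=YrrCuCyQVNk"),
]
print(alternate_locations(sample))
```

Against a live object you would call alternate_locations(fetch_headers(url)) with the object's S3 URL.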
Clearly that list can be extended to other video services. Now, I'm sure there will soon be -- perhaps already is -- a service that takes your video and submits it to a bunch of video services, while providing a canonical URL for the video. But it strikes me that S3 is the kind of foundation service that could support that kind of arrangement not only for video, but for any kind of collection.
One detail that's missing: I wasn't able to retain my video on S3. I'd hoped to be able to leave it in place, and tell S3 to issue a server-side redirect to one or another of the video services. Ideally I'd be able to have S3 do that intelligently, according to preferences that might be expressed in the request. But as I read the docs, there's no way to get S3 to emit the HTTP Location header that would accomplish a server-side redirect. So instead I replaced the video with an HTML file containing a client-side redirect -- i.e., the old <META HTTP-EQUIV="refresh" CONTENT="0;http://..."> trick. If there is a way to do a server-side redirect, I'd like to know about it. It'd be nice to be able to use S3 as canonical storage as well as canonical namespace. It really would cost pennies to keep files there, safely backed up (I presume), and then transmit them to other services, only once for each service, by way of an S3 private/unguessable URL.
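For what it's worth, the stand-in HTML file is easy to generate. This is only a sketch of the client-side trick described above, not a substitute for a real server-side redirect. The function name is hypothetical, and I've used the more commonly documented "0; url=..." form of the content attribute along with a fallback link for clients that ignore the refresh.

```python
from html import escape


def redirect_page(target_url, delay_seconds=0):
    """Return a tiny HTML document that client-side redirects to target_url."""
    # Escape the URL so characters like & or " cannot break the markup.
    target = escape(target_url, quote=True)
    return (
        "<html><head>\n"
        f'<meta http-equiv="refresh" content="{delay_seconds}; url={target}">\n'
        "</head><body>\n"
        f'<a href="{target}">{target}</a>\n'
        "</body></html>\n"
    )


print(redirect_page("http://www.youtube.com/watch?v=YrrCuCyQVNk"))
```

The resulting file would need to be stored with a text/html content type for browsers to act on it.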
In the big picture, none of this matters until we establish the idea that naming, storage, content management, tagging, and community participation are separable concerns. Achieving that separation is technically feasible and very much in the interest of everyone except, of course, the builders of the walled gardens. For obvious reasons they won't want to go there. But I wonder for how long they'll be able to remain insular.
Former URL: http://weblog.infoworld.com/udell/2006/07/05.html#a1481