Hypermedia and personal productivity

It's true that we desperately need better integration between media players and browsers. It's also true that we need ways to smooth out the differences between video formats and delivery mechanisms (i.e., streaming versus downloading). But in order to empower regular folks to weave hypertext together with hypermedia in routine conversation (for example, on blogs), we're going to have to solve a much more basic problem. The popular media players are built for an audience of consumers, not producers. They assume that you'll watch and listen, perhaps scanning backward and forward. But if you want to republish and contextualize, it's insanely hard. Before the nasty complexity of video formats and MIME types and segmentation syntaxes can even become relevant, you first have to be able to select your segments, and the players afford no reasonable ways to do that.

What we need are the kinds of features found in video editors: jump to a frame, move forward or backward a frame at a time, and select a range of frames. Substitute the word "character" for "frame" in the previous sentence, and imagine how bizarre it would be for a text player (i.e., browser) to lack such features. Yet, that's just what using a media player is like. If we want to empower people to create finely granular and richly contextualized AV experiences, it's got to change. [Full story at O'Reilly Network]
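To make the missing features concrete, here is a minimal sketch of the editor-style controls described above: jump to a frame, step a frame at a time, and mark a range, then express the range as timecodes that could be cited in a link. It assumes a clip with a constant frame rate and ignores drop-frame timecode. `FrameCursor` and `timecode` are hypothetical names for illustration, not part of any real player's API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FrameCursor:
    """Hypothetical player-side cursor over a clip (assumes a constant frame rate)."""
    fps: float
    total_frames: int
    frame: int = 0
    mark_in: Optional[int] = None

    def jump_to(self, frame: int) -> None:
        # Jump to a frame, clamped to the valid range [0, total_frames - 1].
        self.frame = max(0, min(frame, self.total_frames - 1))

    def step(self, delta: int = 1) -> None:
        # Move forward (positive delta) or backward (negative delta) frame by frame.
        self.jump_to(self.frame + delta)

    def mark(self) -> None:
        # Remember the current frame as one end of the selection.
        self.mark_in = self.frame

    def selection(self) -> Tuple[int, int]:
        # Return the selected range as (start, end) frame indices, inclusive.
        if self.mark_in is None:
            raise ValueError("no in-point marked")
        return (min(self.mark_in, self.frame), max(self.mark_in, self.frame))

    def to_seconds(self, frame: int) -> float:
        return frame / self.fps


def timecode(seconds: float) -> str:
    # Format seconds as HH:MM:SS.mmm, rounding to the nearest millisecond.
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


if __name__ == "__main__":
    cursor = FrameCursor(fps=30, total_frames=9000)
    cursor.jump_to(300)   # jump to frame 300 (10 seconds in at 30 fps)
    cursor.mark()
    cursor.step(45)       # step forward 45 frames
    start, end = cursor.selection()
    print(timecode(cursor.to_seconds(start)), timecode(cursor.to_seconds(end)))
```

Run as written, this prints `00:00:10.000 00:00:11.500`. The analogy to text holds: a frame index plays the role of a character offset, and a marked range plays the role of a text selection.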

This piece, written shortly after the first presidential debate, illustrates two ways to carve up a video into a segmented and annotated presentation. Ideally I'd have applied the same techniques to last night's third debate. But I didn't have time to slog through the procedure, and it's likely that no one else will either.

The debates, of course, have been fully transcribed and obsessively deconstructed. So perhaps my example was poorly chosen. The real opportunity lies with common everyday AV content. Today that's an oxymoron. Hypermedia is as remote a concept to most people as hypertext was a decade ago. But consider these examples:

A college student who attends a lecture knows that a video recording will be available at a private URL. As a result, she takes fewer notes and is able to spend more time attending to the lecture and related materials (chalkboard, lab demonstration, AV clips). But the notes she does take are time-coded, and when she reviews them she can access the corresponding segments of the video.
Substitute "employee" for "college student" and "business meeting" for "lecture" in the above scenario.
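The lecture scenario boils down to a small data-mapping step: each time-coded note points into the recording, and the player needs a segment to play for it. The sketch below is one hypothetical way to do that, not a description of any existing tool. It assumes note times are seconds from the start of the recording, that each segment begins a fixed lead-in before the note (30 seconds here, on the guess that a note is usually written just after the point is made) and runs until the next note, and that the player understands the W3C Media Fragments `#t=start,end` syntax. The base URL is a placeholder.

```python
from typing import List, NamedTuple, Tuple


class Note(NamedTuple):
    t: float    # seconds from the start of the recording
    text: str


def segments(notes: List[Note], duration: float,
             lead: float = 30.0) -> List[Tuple[str, float, float]]:
    """Map each note to a (text, start, end) segment of the recording.

    Hypothetical policy: start `lead` seconds before the note (never before 0)
    and end at the next note's time, or at the end of the recording.
    """
    ordered = sorted(notes, key=lambda n: n.t)
    out = []
    for i, note in enumerate(ordered):
        start = max(0.0, note.t - lead)
        end = float(ordered[i + 1].t) if i + 1 < len(ordered) else float(duration)
        out.append((note.text, start, end))
    return out


def fragment_url(base: str, start: float, end: float) -> str:
    # Temporal fragment per W3C Media Fragments, e.g. lecture.mp4#t=95,610
    return f"{base}#t={start:g},{end:g}"


if __name__ == "__main__":
    notes = [Note(125, "Definition of entropy"),
             Note(610, "Lab demo: ideal gas"),
             Note(20, "Intro")]
    for text, start, end in segments(notes, duration=3000):
        # Placeholder URL standing in for the lecture's private address.
        print(text, fragment_url("https://example.com/lecture.mp4", start, end))
```

The same mapping fits the business-meeting variant unchanged; only the source of the notes and the recording differ.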

Commenting on the article, Rob Lanphier -- who is the development support manager for RealNetworks -- writes:

This is a great article showing where the potential is. The tricky part about realizing that potential is that there isn't market (i.e. $$$) demand for client-side, hackable hypermedia right now. As much as I would love to see SMIL take off (I was on the working group), we've got a bit of a chicken-and-egg problem getting this bootstrapped. Someone has to do the work of coding this stuff up, and no one has figured out how to get paid doing it.

The good news is that practically all portions of the RealPlayer technology referenced in your article are open source, including SMIL, RealText, our HTTP implementation, our client core that allows for new open source plugins. We would love to see someone build the technology you are looking for. For example, someone could conceivably turn the Gecko HTML renderer into a RealText-replacement technology. Or, for that matter, an incremental approach would be to enhance RealText to add the features you feel are missing.

The current podcasting craze will clearly help drive market demand. And I'm seeing references to my work on link-addressable audio in various parts of the iPodder community.

Advances in telephony will likewise help push things along. As I recently pointed out, we need not -- and should not -- wait for ubiquitous VoIP in order to begin making better use of our audio data. Adam Curry gives a great example in the September 28 installment of The Daily Source Code: in a two-minute segment (starting at 19:58) he explains how and why interviewees can turn the tables on their interviewers. As a frequent interviewer and occasional interviewee, I agree. Increasingly the "tape recorder" can, and should, be running on both ends of the call. As a result we'll have more transparency and better information. Adam says in that clip:

I want to have the right to put that source code [i.e., the audio data] on the Internet. No-one has rejected me so far, and I also have never put the source code of the interview on the Internet, because there was no need to. But boy, do they really drill down on their facts and do their checking.

Customer service calls are a variant of this scenario. How many times have you heard this? "Your call may be recorded in order to assure quality customer service." Lately I've started repeating the line back to them and then recording on my end too. If you can pinpoint what an agent said on a previous call, you can alter the balance of power. That can be more than just a convenience. If you're a cancer patient negotiating the insurance reimbursement maze, for example, you need every advantage you can get.

Entertainment is the tail that usually wags this dog, but I've got a hunch that personal productivity -- or maybe I should say empowerment -- will also emerge as a driving force. Meanwhile it's great to know that the raw materials needed to create the necessary tools are available in the open source commons.

Former URL: http://weblog.infoworld.com/udell/2004/10/14.html#a1095