Improving the audio circulatory system

At a new service called Talkr you can subscribe to audio versions of text blogs. The system runs the text through a text-to-speech (TTS) converter to produce downloadable audio. If you check out the samples you'll find that the quality of Talkr's TTS translation is quite good. It's still TTS, though, and that means I'm unlikely to want to listen to long passages rendered that way. However, I am trying out TTS in a more restricted way in an experimental soundbites podcast.

I mentioned yesterday that I'm bookmarking and tagging audio fragments. The soundbites podcast is a transformation of that tag's RSS feed. The first incarnation just wrapped the bookmarked URLs -- which refer through my clipping service to segments of MP3 files -- in RSS 2.0 enclosures. But there was no context for the soundbites. They needed introductions, and I realized that the metadata -- if converted to audio -- could automatically supply them.
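To make the enclosure-wrapping step concrete, here's a minimal sketch of building one RSS 2.0 item around a clip URL. The feed item title and clip URL are hypothetical; `length` is the enclosure's byte count, which RSS 2.0 requires (zero as a placeholder when unknown):

```python
# Minimal sketch: wrap a bookmarked clip URL in an RSS 2.0 enclosure item.
# The title and URL below are hypothetical placeholders.
import xml.etree.ElementTree as ET

def enclosure_item(title, clip_url, length=0):
    """Build one <item> carrying the clip as an enclosure."""
    item = ET.Element('item')
    ET.SubElement(item, 'title').text = title
    ET.SubElement(item, 'enclosure', {
        'url': clip_url,
        'length': str(length),   # byte length; 0 if unknown
        'type': 'audio/mpeg',    # the clips resolve to MP3 data
    })
    return ET.tostring(item, encoding='unicode')

print(enclosure_item('A soundbite', 'http://example.com/clip?id=42'))
```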

TTS was the most straightforward piece of this exercise. I've actually got two versions, one based on the wonderful pyTTS, which marries the Microsoft speech engine to Win32 Python, and another based on AT&T's online TTS demo. AT&T Audrey sounds way better than Microsoft Mary, not surprisingly, but it may not be kosher to use the AT&T service this way; we'll see how it goes.
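The pyTTS version amounts to composing an introduction string from a clip's metadata and handing it to the speech engine. A sketch, with assumed metadata fields (`title`, `source`) and the pyTTS `SpeakToWave` call, which is Windows-only and guarded here so the composition step works anywhere:

```python
# Sketch: turn clip metadata into a spoken introduction.
# The metadata fields and intro wording are assumptions, and the
# pyTTS call (Microsoft speech engine, Windows-only) is guarded.
def intro_text(meta):
    """Compose an introduction from a clip's metadata."""
    return "Next: %s, from %s." % (meta['title'], meta['source'])

def render_intro(meta, wav_path):
    text = intro_text(meta)
    try:
        import pyTTS                      # requires Win32 Python + SAPI
        tts = pyTTS.Create()
        tts.SpeakToWave(wav_path, text)   # write the intro as a WAV file
    except ImportError:
        pass                              # no engine here; text still useful
    return text

print(intro_text({'title': 'On RSS enclosures', 'source': 'a weblog'}))
# -> Next: On RSS enclosures, from a weblog.
```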

The harder piece was integrating with podcatchers. The TTS snippets are conventional MP3 URLs, so they're no problem. But the clip URLs don't look like MP3 files; they're just references to MP3 data. Neither iPodder nor the new iTunes 4.9 knew how to convert them to local MP3 filenames.

My sordid solution for iTunes (though not for iPodder) was to append a bogus &ext=.mp3 parameter to the URLs. (Will we ever unify Web content types and local file types?) However, iTunes (unlike iPodder) doesn't automatically create playlists, so that's an extra manual step for now.
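The workaround is a one-liner's worth of URL surgery: tack an `ext=.mp3` parameter onto the query string so the URL ends in ".mp3". A sketch, with a hypothetical clip URL:

```python
# Sketch of the iTunes workaround: append a bogus ext=.mp3 query
# parameter so the clip URL ends in ".mp3". The URL is hypothetical.
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

def add_mp3_hint(url):
    """Append ext=.mp3 to a URL's query string."""
    parts = urlparse(url)
    query = parse_qsl(parts.query) + [('ext', '.mp3')]
    return urlunparse(parts._replace(query=urlencode(query)))

print(add_mp3_hint('http://example.com/clip?id=42'))
# -> http://example.com/clip?id=42&ext=.mp3
```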

The devil's always in the details, and there are lots of them to keep track of here. For example, the playlist gives you intros interspersed with clips in the appropriate order, but if your player is shuffling tunes it won't preserve that order. So the intros should probably be bound together with the clips.
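One way to bind an intro to its clip is to concatenate the two MP3 files into a single enclosure, so shuffle mode can't separate them. MPEG audio frames are self-delimiting, so most players tolerate naive byte concatenation (ID3 tags aside); a sketch, with hypothetical file names:

```python
# Sketch: bind an intro to its clip by byte-concatenating the MP3s.
# Naive concatenation usually plays fine (ID3 tags aside), since
# MPEG audio frames are self-delimiting. File names are hypothetical.
def bind(intro_path, clip_path, out_path):
    """Write intro followed by clip into one output file."""
    with open(out_path, 'wb') as out:
        for path in (intro_path, clip_path):
            with open(path, 'rb') as src:
                out.write(src.read())
```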

Then there's the question of metadata. It's all over the place! The MP3s from which the clips are excerpted carry their own metadata. The origin sites wrap more metadata around them. The bookmarking-and-tagging layer, in this case, adds still more. And finally there's listener-generated metadata in desktop players and on devices.

Despite this apparent chaos, I'm optimistic. If resources and metadata can flow freely among these domains, collaborative filtering will do its job. We just need to make sure the circulatory system keeps pumping.
