Ron Owens: Intervoice

I've mentioned Amtrak Julie a couple of times before. In today's podcast I interview Ron Owens, director of software application engineering and professional services for Intervoice, the company responsible for the Julie application. Topics include voiceprint authentication, SALT and VoiceXML, the relationship between speech servers and IVR (interactive voice response) systems, and design principles for IVR software. It was an interesting conversation, and I learned a lot from it.

Here's the download (39 min, 37 MB), which is also enclosed in my RSS feed. In addition to learning more about IVR, this exercise showed me that I've got a lot to learn about working with audio.

Editing, for example, is quite an interesting challenge. Avoid it if you can. It's way harder to seamlessly edit digital media than to seamlessly edit text. But in this case, I'd argue that the reduction of the nearly hour-long original to the 39-minute final cut was warranted.

I'm still using Audacity for editing. I'm sure professional tools can add lots of value, and if I do a lot more of this kind of thing I'll want to start exploring them. But while my first efforts leave plenty to be desired, I'm not blaming the tool. Audio editing is a subtle process, and I suspect that -- as always -- good results depend more on the skill of the user than on the features of the tool. Now that I've got my feet wet, I have renewed respect for the level of production that goes into, for example, a typical NPR show.

I also have renewed respect for the sound quality of a typical ITConversations show. In the case of today's podcast, my download wound up being twice the size it should have been. I'd inadvertently saved the raw audio data the wrong way, then couldn't revert to an uncompressed original. Another lesson learned there.
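
For a rough sense of what "twice the size" means, here's a back-of-the-envelope sketch. It uses the 39-minute, 37 MB figures quoted for the download and assumes 1 MB is 10^6 bytes; the function name is purely illustrative.

```python
def average_bitrate_kbps(size_mb: float, minutes: float) -> float:
    """Average bitrate in kilobits per second, taking 1 MB as 10**6 bytes."""
    bits = size_mb * 1_000_000 * 8
    seconds = minutes * 60
    return bits / seconds / 1000


# The file as published: 37 MB for 39 minutes of audio.
actual = average_bitrate_kbps(37, 39)

# The same audio at half the file size.
target = average_bitrate_kbps(37 / 2, 39)

print(f"as published:     {actual:.0f} kbps")
print(f"at half the size: {target:.0f} kbps")
```

That works out to roughly 126 kbps as published versus roughly 63 kbps at half the size. Those land near the common 128 and 64 kbps encoder settings, though that's an inference from the arithmetic, not something I can confirm from the file itself.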

For both podcasts and screencasts, I'm still trying to strike the right balance between effort and quality. The whole point, of course, is that this isn't NPR or NBC. I don't have the time or skills to produce things in the way those media companies do. Nor, arguably, would you expect me to. What I can do is capture conversations with interesting people -- conversations that I'm having anyway, in the course of my ongoing research into various topics -- and make those conversations available to you. In the same way that it's useful to compress the media files for optimal delivery (as I failed to do properly in this case), it's also useful to compress the content down to just the most interesting stuff. Doing that efficiently enough to make it practical on a regular basis is a real challenge, but I'm getting there.

If I can also improve the fluency of my speaking, it'll cut down on the need for post-production. Everyone knows that it's a shock to hear yourself on tape. But as I listened to this recording, I realized something else. I've known for a while that my public speaking style is full of unexpected pauses, and it now strikes me that this is related to my writing habits. As a writer I emit sentences in fits and starts, rearranging them as I go, and do hardly any editing later. My hunch is that I'm trying to do the same thing in the voice realm, but I don't have the cycles to do the syntactic processing in real time. I'm not sure what to do about this, but awareness -- and of course, once again, practice -- will probably help.

Former URL: http://weblog.infoworld.com/udell/2004/12/20.html#a1137