Speech recognition circa 2004

If you've never tried dictation, you can get a sense of how it works by watching a video screencast I made shortly after I installed Version 8 of NaturallySpeaking. The out-of-the-box experience was dramatically better than before. It got even better when I fed the program all the articles and blog entries I've written during the past few years.

What I find most interesting about this process is how I train the computer to be an intelligent assistant. Because accurate recognition is such a difficult problem, dictation software has to pay very close attention to me. It has to learn everything it can about my speech patterns, vocabulary, and writing style, and it must make the fullest possible use of that information.

Perhaps because we imagine that other application domains are not as challenging, other programs pay strikingly little attention to what we do. Sure, the browser will remember the last thing that you typed into a field on a form, and your e-mail program will help you keep track of whom you've replied to. But by and large, our so-called productivity software does not monitor what we do, is not meaningfully trainable, and does not grow more valuable over time as our relationship with it deepens. We are creatures of habit, but we are ill-served by software that does not notice or respond to those habits. When I organize my e-mail or conduct research on the Web, I exhibit predictable patterns of behavior. We have long expected but rarely experienced personal productivity software that absorbs those patterns, automates repetitive chores, and can be taught to improve its performance. [Full story at InfoWorld.com]

People reacted to the screencast in quite different ways. It knocked Chris Sells' socks off, and Jeremy Zawodny found it oddly compelling, but Darren Barefoot was underwhelmed. That's understandable. Software whose performance is so intimately related to human performance can't easily be assimilated. The acceptance threshold will vary wildly from one person to the next, and crossing it takes you deeper into cyborg territory.

Richard Sprague, who works on speech technologies at Microsoft, has a chart that illustrates how speech recognition is on the glide path toward becoming "uncannily useful." I'm inclined to agree. It'll be fascinating to compare my 2004 dictation screencast with future editions!

Former URL: http://weblog.infoworld.com/udell/2004/11/18.html#a1117