Mobile speech recognition

On Monday I visited Nuance for an update on the company's speech recognition products and initiatives. Two years ago, my screencast on Dragon NaturallySpeaking 8 demonstrated what was then the state of the art in automatic dictation. Dragon has for years been asymptotically approaching the point at which dictation becomes routine and general-purpose. For most of us, it hasn't yet reached that point. I didn't upgrade to the latest version 9 because, despite improvements, I didn't think it would yet cross my threshold for routine use. Nuance's demo of Dragon 9 confirmed that hunch.

Peter Mahoney, Nuance's marketing VP, showed me how he uses Dragon 9 for dictation. When he read a prepared statement, the results were perfect. Then I handed him a copy of Newsweek and asked him to read from a random article. The results were still very good. True, the Arabic names in the story had to be spelled out. But that wouldn't be the case if those names were common in your domain of discourse. And training Dragon to absorb specialized vocabulary is both easy and effective.

The real problem, at least for me, lies elsewhere. And the test I gave Peter yielded a stunning example of it. At one point he read:

...it's rarely so simple...

Dragon wrote:

...it's really so simple...

Because Dragon works so hard to produce plausible results, this class of error resists casual proofreading. In this case, you would have to read very carefully to notice that Dragon had reversed the intended meaning of the sentence. For me, anyway, the cost of finding and fixing these kinds of subtle errors outweighs the benefit of routine dictation, at least when a keyboard is available.

Keyboards aren't always available, though, and that fact made the second part of the demo a real eye-opener. Check out this 55-second video of Peter dictating to his Treo:

In case you can't play this video, it shows two examples of speech recognition. First Peter dictates a brief memo, and uses his voice to change "LaGuardia" to "Logan". Then he speaks the query "Eastern equine encephalitis" to Google and reviews the results. Very cool!

How do you shoehorn Dragon onto a mobile gadget? You don't. There's only a small client that relays recorded audio to a server and receives recognized text. This kind of mobile dictation should be available as a carrier-provided service, for the popular handheld operating systems, sometime next year. I'll be curious to see who uses it, and how.
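
To make the division of labor concrete, here is a minimal sketch of that thin-client shape. It is not Nuance's protocol, which wasn't described in the demo. The names `ThinDictationClient`, `Transport`, and `fake_recognition_server` are made up for illustration, and the "server" is an in-process stand-in that treats the audio bytes as if they were already text.

```python
from typing import Callable, Iterable

# Hypothetical transport: takes recorded audio bytes, returns recognized text.
# On a real handset this would be a network call to the recognition server.
Transport = Callable[[bytes], str]


class ThinDictationClient:
    """Does no recognition itself; it only relays audio and receives text."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def dictate(self, audio_chunks: Iterable[bytes]) -> str:
        # Gather the recorded chunks and hand them off in one request.
        audio = b"".join(audio_chunks)
        return self._transport(audio)


def fake_recognition_server(audio: bytes) -> str:
    # Stand-in for the server side (an assumption for this sketch):
    # pretends the "audio" is UTF-8 text and simply decodes it.
    return audio.decode("utf-8").strip()


client = ThinDictationClient(fake_recognition_server)
memo = client.dictate([b"my flight lands ", b"at logan at noon"])
print(memo)
```

The point of the design is that the handset needs only a microphone, a buffer, and a connection. Everything expensive lives behind the transport, so the recognizer can be upgraded without touching the phone.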

In our follow-on discussion we talked about how Nuance's software is being used in the automotive realm. Cars themselves offer a growing range of voice-controllable functions, such as temperature and navigation. Passengers' Bluetooth-equipped gadgets paired to cars' audio I/O systems are another emerging domain for voice control.

What about those of us who drive older cars and use older cellphones? I think there are still all kinds of untapped opportunities. For example, while driving I'd love to be able to speak questions like these and hear the answers:

How many new emails from Jill in the last 4 hours?

What are the subject headers?

Can you read the message entitled "New panelist for your session"?

Given the kind of client/server architecture that Nuance has developed, even my lowly LG VX4400 should be able to handle a protocol like this. The magic would all be in the cloud, where the speech recognizer and my mail server would consummate a service-oriented marriage.
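
Here is a rough sketch of the server-side half of that marriage, assuming the recognizer has already turned my question into text. Everything in it is hypothetical: the `Message` record, the `COUNT_QUERY` pattern, and the in-memory inbox stand in for a real mail server and a real grammar, and only the first of the three questions is handled.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Message:
    sender: str
    subject: str
    received: datetime


# Hypothetical pattern for one spoken question; a real service would
# need a far richer grammar than a single regular expression.
COUNT_QUERY = re.compile(
    r"how many new emails from (?P<sender>\w+) in the last (?P<hours>\d+) hours?",
    re.IGNORECASE,
)


def recent_from(inbox: List[Message], sender: str, hours: int,
                now: datetime) -> List[Message]:
    # Keep messages from the named sender received within the time window.
    cutoff = now - timedelta(hours=hours)
    return [m for m in inbox
            if m.sender.lower() == sender.lower() and m.received >= cutoff]


def answer(recognized_text: str, inbox: List[Message], now: datetime) -> str:
    # Map recognized text to a mailbox lookup and a speakable reply.
    match = COUNT_QUERY.search(recognized_text)
    if match is None:
        return "Sorry, I did not understand that."
    hits = recent_from(inbox, match["sender"], int(match["hours"]), now)
    if not hits:
        return f"No new emails from {match['sender']}."
    subjects = "; ".join(m.subject for m in hits)
    return f"{len(hits)} new from {match['sender']}: {subjects}"


now = datetime(2006, 11, 1, 12, 0)
inbox = [
    Message("Jill", "New panelist for your session", datetime(2006, 11, 1, 10, 30)),
    Message("Jill", "Old thread", datetime(2006, 11, 1, 6, 0)),
    Message("Bob", "Lunch", datetime(2006, 11, 1, 11, 0)),
]
reply = answer("How many new emails from Jill in the last 4 hours?", inbox, now)
print(reply)
```

The phone never sees any of this. It sends audio and plays back whatever reply the service composes, which is why an older handset could plausibly take part.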

Former URL: http://weblog.infoworld.com/udell/2006/11/01.html#a1556