Efficient Correction Interfaces for Speech Recognition

Keith Vertanen

PhD thesis, University of Cambridge, 2009.

The recognition of speech by computers is a challenging task, and recognition errors are ultimately unavoidable. Error correction is thus a crucial part of any speech recognition interface. In this thesis, I look at how to improve the correction process in speech recognition.

Before errors can be corrected, they must first be detected. I look at improving error detection by visualizing the recognizer's confidence in each word. After detection, errors must be corrected. I examine three distinct ways of correcting errors: by speech, by touch, and by navigation. I also look at applying touch-based correction to the problem of entering web search queries by voice.

I tested several new correction interfaces in a series of user studies. Using my touch-based interface, Parakeet, users wrote at 13 words per minute while walking outdoors. Using my navigation-based interface, Speech Dasher, users wrote at 40 words per minute using only speech and a gaze tracker. Using a system I built for entering web search queries by voice, users entered queries in about 18 seconds while walking indoors. In these user studies, the speech recognizer's initial error rate, prior to user correction, was high. Nevertheless, with a good correction interface, users were able to complete their tasks easily and efficiently.