Test sets and adaptation results on the Nokia N800


Test Sets

I'm using an acoustic model adapted to my voice, so I needed to record my own test sets. The first was a newswire test set (si_dt_s2, 207 utterances): 4.4% out-of-vocabulary (OOV) rate at my 20K vocab, non-verbalized punctuation, single sentences. I recorded this set twice, once on the N800 and once using a desktop mic.
si_dt_s2 test set
si_dt_s2 test set, with my audio

For the second test set, I wanted email-like utterances, so I took 300 sentences from the Enron corpus: 2.3% OOV at 20K vocab, non-verbalized punctuation, single sentences.
Enron test set
Enron test set, with my audio

For the third test set, I wanted SMS-like utterances. I couldn't find a decent SMS text corpus, so I took 262 messages from the sent items on my phone and did some manual cleanup, anonymizing, etc. The result: 5.5% OOV at 20K vocab, verbalized punctuation, single and multiple sentences.
SMS test set
SMS test set, with my audio
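
The OOV percentages quoted for the three test sets are the share of word tokens in the reference text that do not appear in the recognizer's vocabulary. Below is a minimal sketch of that calculation. The function name, the whitespace tokenization and the lowercasing are assumptions made for illustration, not necessarily how the figures above were produced.

```python
def oov_rate(sentences, vocab):
    """Return the fraction of word tokens in `sentences` missing from `vocab`.

    Assumptions (not from the original page): tokens are split on
    whitespace and compared case-insensitively.
    """
    known = {word.lower() for word in vocab}
    total = 0
    missing = 0
    for sentence in sentences:
        for token in sentence.lower().split():
            total += 1
            if token not in known:
                missing += 1
    # An empty test set has no tokens, so report 0.0 instead of dividing by zero.
    return missing / total if total else 0.0


if __name__ == "__main__":
    # Toy data only; the real test sets and 20K vocabulary are not reproduced here.
    toy_vocab = ["the", "cat", "sat", "dog"]
    toy_sentences = ["the cat sat", "the dog barked"]
    print(f"OOV rate: {oov_rate(toy_sentences, toy_vocab):.1%}")
```

With verbalized punctuation, as in the SMS set, the spoken punctuation words would count as ordinary tokens in this calculation.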

Amount of Adaptation Data

I recorded 600 adaptation utterances in total and created adapted models using 25% to 100% of the data. On all three test sets, most of the gain came from the first 25% (150 utterances).
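
One way to set up this kind of experiment is to take nested prefixes of the adaptation recordings, so that each larger subset contains the smaller ones. The sketch below assumes 25% steps (25%, 50%, 75%, 100%) and a prefix split. The page only states the 25% to 100% range, so both choices, along with the function name and the utterance IDs, are hypothetical.

```python
def adaptation_subsets(utterances, fractions=(0.25, 0.5, 0.75, 1.0)):
    """Map each fraction to the leading slice of `utterances` of that size.

    Assumption: subsets are nested prefixes of the recording order; the
    original experiment may have selected utterances differently.
    """
    n = len(utterances)
    return {frac: utterances[: round(n * frac)] for frac in fractions}


if __name__ == "__main__":
    # Hypothetical utterance IDs standing in for the 600 recordings.
    utterance_ids = [f"adapt_{i:03d}" for i in range(600)]
    for frac, subset in adaptation_subsets(utterance_ids).items():
        print(f"{frac:.0%}: {len(subset)} utterances")
```

For 600 utterances, the smallest subset holds 150 utterances, matching the 25% figure above.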

Memory usage at different vocab sizes