Simon/Tips, Tricks and Best Practices

From KDE Wiki Sandbox
Revision as of 15:07, 12 July 2012 by Bedahr (talk | contribs)

Recordings

When Simon uses a user-generated model, it builds a speech model specifically for each user, so the training corpus is one of the most important factors in achieving good recognition rates.

This section lists some common mistakes made when recording training utterances, along with possible solutions.

Loudness

If you have not used your microphone with Simon before, please double-check that its volume is set to an appropriate level.

Louder is usually better, but your microphone should never clip. Start with a low level and increase it step by step until the signal just reaches the maximum amplitude, without clipping, when you speak loudly (you can check the current amplitude with a tool such as Audacity).
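If you prefer to check a recorded sample from a script rather than in Audacity, the peak level can be read directly from the file. The following is only a minimal sketch, assuming the sample is an uncompressed 16-bit PCM WAV file; the function name `peak_level` is made up for this illustration and is not part of Simon.

```python
import array
import sys
import wave


def peak_level(path):
    """Return (peak, clipped) for a 16-bit PCM WAV file.

    peak is the largest absolute sample value, scaled to the range 0.0..1.0.
    clipped is True if any sample reaches the extreme value of the format,
    which usually means the input level was too high.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())

    # WAV data is little-endian; swap bytes on big-endian machines.
    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()

    if not samples:
        return 0.0, False

    peak = max(abs(s) for s in samples)
    clipped = peak >= 32767
    # abs(-32768) is slightly above full scale, so cap the result at 1.0.
    return min(peak / 32767.0, 1.0), clipped
```

Calling `peak_level("sample.wav")` (the file name is a placeholder) on a loudly spoken sample should give a peak close to 1.0 with `clipped` still `False`. If `clipped` is `True`, lower the microphone level and record again.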

Do not, however, "boost" the volume artificially (this is often offered as a volume setting above 100% or as a toggle called "Mic Boost"). These options only amplify the existing signal; they do not add any new information. This does not help recognition rates and can even hurt them.
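A toy calculation can illustrate why boosting adds no information. The sketch below assumes a simplified model in which the boost is a plain multiplication applied to speech and background noise alike, followed by hard clipping at full scale. Real sound hardware and drivers may behave differently, and all names and values here are made up for the illustration.

```python
import math


def power(samples):
    """Mean power of a list of samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech, noise):
    """Signal-to-noise ratio in decibels."""
    return 10 * math.log10(power(speech) / power(noise))


def boost(samples, gain, limit=1.0):
    """Multiply by gain, then clip to the range -limit..limit."""
    return [max(-limit, min(limit, s * gain)) for s in samples]


# Synthetic "speech" (a sine wave) and a much quieter synthetic "noise".
speech = [0.5 * math.sin(2 * math.pi * i / 50) for i in range(200)]
noise = [0.01 * (-1) ** i for i in range(200)]

before = snr_db(speech, noise)
after = snr_db(boost(speech, 1.5), boost(noise, 1.5))
print("SNR before boost: %.2f dB" % before)
print("SNR after boost:  %.2f dB" % after)

# A larger gain pushes the speech into clipping and flattens its peaks.
clipped = sum(1 for s in boost(speech, 4.0) if abs(s) >= 1.0)
print("samples clipped at gain 4.0:", clipped)
```

In this model the two ratios are identical, because the noise is amplified together with the speech. The larger gain only flattens the peaks of the speech, which distorts the recording.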

Newer versions of Simon include a level meter that is displayed while recording samples. It tells you whether your volume is set up correctly.

Pauses

Simon tries to learn the pronunciation of its users. But of course Simon never hears only what the user is saying: it also picks up all of the environmental noise.

That is why Simon must also learn what "silence" sounds like. This depends on your environment but also on the microphone that you are using.

Simon treats everything at the very beginning and at the very end of a sample as "silence". For that to work, it is best to leave about one or two seconds of silence at the beginning and end of each recording.
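To check whether existing recordings leave enough room, the length of the quiet stretch at each end can be estimated from the file. This is only a rough sketch, assuming 16-bit mono PCM WAV files. The function name `silence_margins` and the 2% threshold are assumptions chosen for this illustration and have nothing to do with how Simon itself detects silence.

```python
import array
import sys
import wave


def silence_margins(path, threshold=0.02):
    """Return (lead_seconds, trail_seconds) of quiet audio in a WAV file.

    A sample counts as quiet when its absolute value is below
    threshold times full scale. This is a crude per-sample test:
    a single click is treated like speech.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())

    # WAV data is little-endian; swap bytes on big-endian machines.
    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()

    limit = threshold * 32767
    loud = [i for i, s in enumerate(samples) if abs(s) >= limit]
    if not loud:
        # Nothing above the threshold: the whole file is quiet.
        total = len(samples) / rate
        return total, total

    lead = loud[0] / rate
    trail = (len(samples) - 1 - loud[-1]) / rate
    return lead, trail
```

If either value is well below one second, the recording probably starts or ends too abruptly. In a noisy room the threshold will need to be raised, since background noise then exceeds 2% of full scale.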