Simon/Tips, Tricks and Best Practices: Difference between revisions
Revision as of 14:55, 12 July 2012
Recordings
Because Simon, when using a user-generated model, creates a speech model specifically for each user, the training corpus is one of the most important factors in achieving good recognition rates.
Frequent Mistakes
This section lists some common mistakes made when recording training utterances, along with possible solutions.
Loudness
If you have not used your microphone with Simon before, please double-check that its volume is set to an appropriate level.
Louder is usually better. However, your microphone should never clip (see http://en.wikipedia.org/wiki/Clipping_(audio)). It is therefore best to start low and raise the level step by step until the signal approaches, but does not exceed, the maximum amplitude when you speak loudly. You can check the current amplitude with, for example, Audacity (http://audacity.sourceforge.net/).
Do not, however, "boost" the volume artificially (this is often represented as increasing the volume over 100% or activating a toggle called "Mic Boost"). These options only make the signal more pronounced but do not introduce new information. This doesn't help (and can even hurt) recognition rates.
Newer versions of Simon include a level meter that is displayed while recording samples. It will tell you whether your volume is set up correctly.
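If you want to check an already recorded sample outside of Simon, the idea of "loud but never clipping" can be sketched in a few lines of Python. This is only an illustration and not how Simon's level meter works: the function names are made up for this example, and it assumes 16-bit PCM WAV files, where a sample sitting at the full-scale value is treated as a sign of clipping.

```python
import array
import sys
import wave

FULL_SCALE = 32768  # magnitude limit of a signed 16-bit sample


def read_pcm16(path):
    """Read a 16-bit PCM WAV file and return its samples as a list of ints.

    Channels are left interleaved, which is fine for peak measurements.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        data = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        data.byteswap()  # WAV sample data is stored little-endian
    return list(data)


def peak_fraction(samples):
    """Return the largest absolute sample value as a fraction of full scale."""
    if not samples:
        return 0.0
    return max(abs(s) for s in samples) / FULL_SCALE


def clipped_fraction(samples):
    """Return the share of samples that sit at the 16-bit limits."""
    if not samples:
        return 0.0
    at_limit = sum(1 for s in samples if abs(s) >= FULL_SCALE - 1)
    return at_limit / len(samples)
```

A recording whose `peak_fraction` is high while `clipped_fraction` stays at zero matches the advice above. A non-zero `clipped_fraction` suggests the input level should be lowered.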
Pauses
Simon tries to learn the pronunciation of its users. Of course, Simon never hears only what the user is saying: it also picks up all of the environmental noise.
That is why Simon must also learn what "silence" sounds like. This depends on your environment, but also on the microphone that you are using.
Simon treats everything at the beginning and at the end of the sample as "silence". For that to work, it is best if the user leaves about one or two seconds of silence at the beginning and end of each recording.
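To check whether a recording leaves enough room at both ends, one could measure how long the signal stays quiet before the first and after the last loud sample. The following Python sketch does this for a list of mono 16-bit samples. The function names and the threshold of 500 are assumptions chosen for illustration; this is not how Simon itself determines silence, and a suitable threshold depends on your microphone and environment.

```python
def edge_silence_seconds(samples, sample_rate, threshold=500):
    """Return (leading, trailing) seconds during which |sample| < threshold."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        # Nothing exceeds the threshold: the whole recording counts as quiet.
        total = len(samples) / sample_rate
        return total, total
    leading = loud[0] / sample_rate
    trailing = (len(samples) - 1 - loud[-1]) / sample_rate
    return leading, trailing


def has_enough_padding(samples, sample_rate, min_seconds=1.0, threshold=500):
    """Check that both ends of the recording are quiet for min_seconds."""
    leading, trailing = edge_silence_seconds(samples, sample_rate, threshold)
    return leading >= min_seconds and trailing >= min_seconds
```

With the recommendation above, `has_enough_padding` should return `True` for `min_seconds=1.0`. If it does not, either the pauses were too short or the background noise exceeds the chosen threshold.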