Revision as of 14:55, 12 July 2012

Recordings

When Simon uses a user-generated model, it creates a speech model specifically for each user, so the training corpus is one of the most important factors in achieving good recognition rates.

Frequent Mistakes

This section lists some common mistakes made when recording training utterances, along with possible solutions.

Loudness

If you have not used your microphone with Simon before, please double-check that its volume is set to an appropriate level.

Louder is usually better. However, your microphone should never clip, that is, the signal must never exceed the largest value the recording can represent. Start with a low level and raise it step by step until speaking loudly nearly reaches the maximum amplitude. You can check the current amplitude with a tool such as Audacity.
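Audacity shows clipping visually. If you prefer to check a test recording programmatically, the following Python sketch reads a WAV file and reports its peak level and the number of samples pinned at the edges of the 16-bit range. It assumes 16-bit PCM WAV files (Simon's own sample format may differ), and the function names are hypothetical, not part of Simon.

```python
import array
import sys
import wave

# Assumption: recordings are 16-bit PCM WAV files. The helper names below are
# hypothetical and are not part of Simon.
FULL_SCALE = 32768  # magnitude of the most negative 16-bit sample


def read_pcm16(path_or_file):
    """Return all samples of a 16-bit PCM WAV file as a list of ints.

    Multichannel data is returned interleaved; channels are not separated.
    """
    with wave.open(path_or_file, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is stored little-endian
    return samples.tolist()


def peak_level(samples):
    """Peak absolute amplitude as a fraction of full scale (1.0 = maximum)."""
    return max((abs(s) for s in samples), default=0) / FULL_SCALE


def clipped_count(samples, limit=32767):
    """Number of samples pinned at the edges of the 16-bit range."""
    return sum(1 for s in samples if s >= limit or s <= -limit)
```

For example, with a loud test sentence saved as `loud_test.wav` (a placeholder name), `clipped_count(read_pcm16("loud_test.wav"))` should ideally be 0, while `peak_level` comes out close to, but below, 1.0.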

Do not, however, "boost" the volume artificially (this is often presented as raising the volume above 100% or enabling a toggle called "Mic Boost"). These options only amplify the existing signal, noise included, and do not add any new information. They do not improve recognition rates and can even make them worse.
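To illustrate why a boost adds no information, the following Python sketch models it as a purely digital gain applied to 16-bit samples. This is an assumption: hardware "Mic Boost" stages differ in detail, but they likewise amplify noise along with speech. Speech and noise are kept as separate toy signals here only so their ratio can be measured; the signals and function names are hypothetical.

```python
import math

INT16_MIN, INT16_MAX = -32768, 32767


def apply_gain(samples, gain):
    """Scale every sample by `gain` and clamp to the 16-bit range (a modelled digital boost)."""
    return [max(INT16_MIN, min(INT16_MAX, round(s * gain))) for s in samples]


def rms(samples):
    """Root-mean-square amplitude of a non-empty list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(speech, noise):
    """Signal-to-noise ratio in decibels, given speech and noise separately."""
    return 20 * math.log10(rms(speech) / rms(noise))


# Hypothetical toy signals: a loud "speech" component and a quiet noise floor.
speech = [8000, -8000] * 100
noise = [100, -100] * 100

# A 2x boost scales speech and noise alike, so the ratio does not change.
print(snr_db(speech, noise), snr_db(apply_gain(speech, 2), apply_gain(noise, 2)))

# An 8x boost pushes the speech past the 16-bit limit, so it clips.
print(apply_gain(speech, 8)[:2])  # prints [32767, -32768]
```

The boosted recording is louder, but the speech-to-noise ratio is unchanged, and too much gain additionally distorts the signal through clipping.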

Newer versions of Simon include a level meter which is displayed while recording samples. It indicates whether your volume is set up correctly.
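As a rough illustration of what a level meter measures, the following Python sketch computes the RMS level of successive blocks of 16-bit samples in dBFS (decibels relative to full scale, where 0 dBFS is the maximum). It does not describe how Simon's meter is implemented; the block size of 1600 samples (100 ms at an assumed 16 kHz sample rate) and the function names are hypothetical choices.

```python
import math

FULL_SCALE = 32768.0  # magnitude of the most negative 16-bit sample


def dbfs(block):
    """RMS level of one block of 16-bit samples in dBFS (-inf for digital silence)."""
    if not block:
        return float("-inf")
    mean_square = sum(s * s for s in block) / len(block)
    if mean_square == 0:
        return float("-inf")
    return 10 * math.log10(mean_square / (FULL_SCALE * FULL_SCALE))


def meter_readings(samples, block_size=1600):
    """One level reading per consecutive block.

    Assumed: 16 kHz audio, so the default of 1600 samples is 100 ms per reading.
    """
    return [dbfs(samples[i:i + block_size]) for i in range(0, len(samples), block_size)]
```

A reading near 0 dBFS means the signal is close to clipping; very low readings while speaking suggest the input level is too quiet.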

Pauses

Simon tries to learn the pronunciation of its users. However, Simon never hears only what the user is saying: it also picks up all of the environmental noise.

That is why Simon must also learn what "silence" sounds like. This depends on your environment as well as on the microphone you are using.

Simon treats everything at the beginning and at the end of a sample as "silence". For this to work, leave about one to two seconds of silence at the beginning and end of each recording.
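To verify that a recording leaves enough silence at both ends, the following Python sketch marks short blocks as speech when their RMS level exceeds a threshold and measures how much silence precedes the first and follows the last speech block. This simple energy threshold is an assumption, not how Simon separates silence from speech; the -40 dBFS threshold, the 160-sample blocks and the function names are hypothetical and may need adjusting for a noisy room.

```python
import math


def silence_margins(samples, rate, block_size=160, threshold_dbfs=-40.0):
    """Return (leading, trailing) silence in seconds for a list of 16-bit samples.

    Assumption: a block counts as speech when its RMS level exceeds threshold_dbfs.
    """
    threshold = 32768.0 * 10 ** (threshold_dbfs / 20)
    active = []
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        level = math.sqrt(sum(s * s for s in block) / len(block))
        active.append(level > threshold)
    if not any(active):
        total = len(samples) / rate
        return total, total  # no speech detected at all
    first = active.index(True)
    last = len(active) - 1 - active[::-1].index(True)
    leading = first * block_size / rate
    trailing = (len(samples) - (last + 1) * block_size) / rate
    return leading, max(trailing, 0.0)


def margins_ok(samples, rate, minimum=1.0):
    """True if both margins reach `minimum` seconds (assumed lower bound of the advice)."""
    leading, trailing = silence_margins(samples, rate)
    return leading >= minimum and trailing >= minimum
```

For a 16 kHz recording with one second of silence, half a second of speech and 1.5 seconds of silence, `silence_margins` reports margins of 1.0 and 1.5 seconds, so `margins_ok` accepts it.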
