I've started using the .NET speech-to-text library (SpeechRecognizer).
While googling and searching this site, I found this code sample:
var c = new Choices();
for (var i = 0; i <= 100; i++)
c.Add(i.ToString());
var gb = new GrammarBuilder(c);
var g = new Grammar(gb);
rec.UnloadAllGrammars();
rec.LoadGrammar(g);
rec.Enabled = true;
This helped me get started. I changed these two lines
for (var i = 0; i <= 100; i++)
c.Add(i.ToString());
to suit my needs:
c.Add("Open");
c.Add("Close");
But when I say 'Close', the Windows speech recognizer closes my application!
In addition, is there a better way to recognize speech than creating my own dictionary? I would like the user to say something like "Write a note to myself", and then the user will speak and I'll write down what they say.
Sorry for asking two questions in the same post; both seem to be relevant to my one problem.
You are using the shared speech recognizer (SpeechRecognizer). When you instantiate SpeechRecognizer you get a recognizer that can be shared by other applications; it is typically used for building applications that control Windows and the applications running on the desktop.
It sounds like you want to use your own private recognition engine (SpeechRecognitionEngine). So instantiate a SpeechRecognitionEngine instead.
See SpeechRecognizer Class.
Disable built-in speech recognition commands? may also have some helpful info.
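A minimal sketch of that switch, assuming the Open/Close grammar from the question (unlike the shared recognizer, the inproc engine needs its own audio input and has to be started explicitly):

var rec = new SpeechRecognitionEngine();    // inproc engine, private to your app
var c = new Choices("Open", "Close");
rec.LoadGrammar(new Grammar(new GrammarBuilder(c)));
rec.SetInputToDefaultAudioDevice();         // you supply the input yourself
rec.SpeechRecognized += (s, e) => Console.WriteLine(e.Result.Text);
rec.RecognizeAsync(RecognizeMode.Multiple); // keep listening after the first result

Because this engine is private, saying "Close" no longer triggers the Windows shell command.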
Microsoft's desktop recognizers include a special grammar, called a dictation grammar, that can be used to transcribe arbitrary words spoken by the user. You can use the dictation grammar to do transcription-style recognition. See DictationGrammar Class and SAPI and Windows 7 Problem.
I have a better answer...
Try adding a DictationGrammar to your recognizer; it seems to disable all the built-in commands like select/delete/close, etc.
You then need to use the SpeechRecognized event and SendKeys to add text to the page. My findings so far indicate that you can't have your SAPI cake and eat it too.
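Roughly like this (a sketch; assumes the shared recognizer from the question and a Windows Forms project for SendKeys):

recognizer.LoadGrammar(new DictationGrammar()); // loading a dictation grammar suppresses the built-in commands
recognizer.SpeechRecognized += (s, e) =>
{
    // forward the recognized text to whichever window has focus
    // (note: SendKeys treats characters like +, ^, % and () specially)
    System.Windows.Forms.SendKeys.SendWait(e.Result.Text + " ");
};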
I think the solution above should work for you if you've not already solved it (or moved on).
Related
I am wondering if there is any way to record whatever I say as full dictation. What I mean by that is, for example, having a program record whatever is picked up by the microphone and write that information to a .txt file. How can I make it so the program doesn't just "guess" at the sentences? I find that using dictation grammars like this:
DictationGrammar spellingDictationGrammar = new DictationGrammar("grammar:dictation#spelling");
spellingDictationGrammar.Name = "spelling dictation";
spellingDictationGrammar.Enabled = true;
recEngine.LoadGrammar(spellingDictationGrammar);
//question dictation grammar
DictationGrammar customDictationGrammar = new DictationGrammar("grammar:dictation");
customDictationGrammar.Name = "question dictation";
customDictationGrammar.Enabled = true;
recEngine.LoadGrammar(customDictationGrammar);
just guesses at whatever is being said; it is nowhere near accurate. Is there any way to tune the dictation, or some workaround, so it gets somewhat close to accurate?
I am using C# and Windows speech recognition to communicate with my program. The only word to be recognized is "Yes", and this works fine. The only problem is that, while speech recognition is active, it types in whatever I am saying. Is there a way to limit the speech recognition to only one word, in this case the word "yes"?
Thank you
What do you mean by "since the speech recognition is activated it will type in whatever I am saying"? Are you saying that the desktop recognizer continues to run and handle commands? Perhaps you should be using an inproc recognizer rather than the shared recognizer (see Using System.Speech.Recognition opens Windows Speech Recognition).
Are you using a dictation grammar? If you only want to recognize a limited set of words or commands, do not use the dictation grammar. Use a GrammarBuilder (or similar) and create a simple grammar. See http://msdn.microsoft.com/en-us/library/hh361596
There is a very good article that was published a few years ago at http://msdn.microsoft.com/en-us/magazine/cc163663.aspx. It is probably the best introductory article I've found so far. It is a little out of date, but very helpful. (The AppendResultKeyValue method was dropped after the beta.) Look at the examples of how they build the grammars for ordering pizza.
One thing to keep in mind: a grammar with only one word may produce many false positives, since the recognizer will try to match everything it hears to something in your grammar. You may want to put in at least "Yes" and "No" so it has something to compare against.
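Something along these lines (a minimal sketch with an inproc engine and a two-word grammar):

var engine = new SpeechRecognitionEngine();
engine.LoadGrammar(new Grammar(new GrammarBuilder(new Choices("yes", "no"))));
engine.SetInputToDefaultAudioDevice();
engine.SpeechRecognized += (s, e) =>
{
    // react only to "yes"; "no" exists mainly to give the recognizer an alternative
    if (e.Result.Text == "yes")
        Console.WriteLine("yes");
};
engine.RecognizeAsync(RecognizeMode.Multiple);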
If your code is similar to the following:
SpeechRecognitionEngine recognitionEngine = new SpeechRecognitionEngine();
recognitionEngine.SetInputToDefaultAudioDevice();
recognitionEngine.SpeechRecognized += (s, args) =>
{
    foreach (RecognizedWordUnit word in args.Result.Words)
    {
        Console.WriteLine(word.Text);
    }
};
recognitionEngine.LoadGrammar(new DictationGrammar());
recognitionEngine.RecognizeAsync(RecognizeMode.Multiple); // don't forget to start the engine
Just use an if statement:
foreach (RecognizedWordUnit word in args.Result.Words)
{
    if (word.Text == "yes")
        Console.WriteLine(word.Text);
}
Note that recognitionEngine.SpeechRecognized is an event that fires whenever the engine recognizes speech, and it can be used in other ways, such as:

recognitionEngine.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);
//this method is static because I called it from a console main method. It can be changed.
static void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    Console.WriteLine(e.Result.Text);
}
My examples are in a console application, but it works the same for a GUI.
I am writing a little application that is supposed to listen for user commands and send keystrokes to another program. I am using the SpeechRecognitionEngine class, but my script doesn't work properly.
If I use a custom grammar (with very few words like "start" or "exit"), the program always recognizes one of my words, even when I say something completely different.
For instance, I say "stackoverflow" and the program recognizes "start".
With a default dictionary the program becomes almost impossible to use (I have to pronounce things 100% correctly, otherwise it won't understand).
The strange thing is that if I use SpeechRecognizer instead of SpeechRecognitionEngine, my program works perfectly, but of course every time I say something unrelated it messes things up, because Windows Speech Recognition handles the result, and I don't want that to happen. That is the reason I am using SpeechRecognitionEngine in the first place.
What am I doing wrong?
Choices c = new Choices(new string[] { "use", "menu", "map", "save", "talk", "esc" });
GrammarBuilder gb = new GrammarBuilder(c);
Grammar g = new Grammar(gb);
sr = new SpeechRecognitionEngine();
sr.LoadGrammar(g);
sr.SetInputToDefaultAudioDevice();
sr.SpeechRecognized += sr_SpeechRecognized;
Almost forgot: I don't know if it's relevant, but I am using Visual Studio 11 Ultimate Beta.
For each speech recognition result you also receive a confidence value for the recognition. A low confidence level indicates that the engine is "not so sure" about the result, and you might want to reject it, e.g.:
private void SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    if (e.Result.Confidence >= 0.7)
    {
        // high enough confidence, use the result
    }
    else
    {
        // reject the result
    }
}
I'm trying to use the SpeechRecognizer with a custom Grammar to handle the following pattern:
"Can you open {item}?" where {item} uses DictationGrammar.
I'm using the speech engine built into Vista and .NET 4.0.
I would like to be able to get the confidences for the SemanticValues returned. See example below.
If I simply use "recognizer.LoadGrammar( new DictationGrammar() )", I can browse through e.Result.Alternates and view the confidence values of each alternate. That works if the DictationGrammar is at the top level.
Made up example:
Can you open Firefox? .95
Can you open Fairfax? .93
Can you open file fax? .72
Can you pen Firefox? .85
Can you pin Fairfax? .63
But if I build a grammar that looks for "Can you open {semanticValue Key='item' GrammarBuilder=new DictationGrammar()}?", then I get this:
Can you open Firefox? .91 - Semantics = {GrammarBuilder.Name = "can you open"}
Can you open Fairfax? .91 - Semantics = {GrammarBuilder.Name = "can you open"}
Can you open file fax? .91 - Semantics = {GrammarBuilder.Name = "can you open"}
Can you pen Firefox? .85 - Semantics = null
Can you pin Fairfax? .63 - Semantics = null
The .91 shows me how confident it is that it matched the pattern "Can you open {item}?", but it doesn't distinguish any further.
However, if I then look at e.Result.Alternates.Semantics.Where( s => s.Key == "item" ), and view their Confidence, I get this:
Firefox 1.0
Fairfax 1.0
file fax 1.0
Which doesn't help me much.
What I really want is something like this when I view the Confidence of the matching SemanticValues:
Firefox .95
Fairfax .93
file fax .85
It seems like it should work that way...
Am I doing something wrong? Is there even a way to do that within the Speech framework?
I'm hoping there's some inbuilt mechanism so that I can do it the "right" way.
As for another approach that will probably work (sketched in code below)...
Use the SemanticValue approach to match on the pattern
For anything that matches on that pattern, extract the raw Audio for {item} (use RecognitionResult.Words and RecognitionResult.GetAudioForWordRange)
Run the raw audio for {item} through a SpeechRecognizer with the DictationGrammar to get the Confidence
... but that's more processing than I really want to do.
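For reference, a rough sketch of that fallback. Hedged assumptions: result is the RecognitionResult from the pattern match, the {item} words start at index 3 (after "can you open"; the index is purely illustrative), and System.IO is imported for MemoryStream:

RecognizedWordUnit first = result.Words[3];
RecognizedWordUnit last = result.Words[result.Words.Count - 1];
RecognizedAudio itemAudio = result.GetAudioForWordRange(first, last);

using (var stream = new MemoryStream())
{
    itemAudio.WriteToWaveStream(stream);
    stream.Position = 0;
    using (var dictation = new SpeechRecognitionEngine())
    {
        dictation.LoadGrammar(new DictationGrammar());
        dictation.SetInputToWaveStream(stream);
        RecognitionResult itemResult = dictation.Recognize(); // synchronous; null if nothing matched
        if (itemResult != null)
            Console.WriteLine("{0} ({1})", itemResult.Text, itemResult.Confidence);
    }
}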
I think a dictation grammar only does transcription. It converts speech to text without extracting semantic meaning, because by definition a dictation grammar supports all words and has no clues about your specific semantic mapping. You need to use a custom grammar to extract semantic meaning. If you supply an SRGS grammar, or build one in code or with the SpeechServer tools, you can specify semantic mappings for certain words and phrases. Then the recognizer can extract semantic meaning and give you a semantic confidence.
You should be able to get a Confidence value from the recognizer on the recognition; try System.Speech.Recognition.RecognitionResult.Confidence.
The help file that comes with the Microsoft Server Speech Platform 10.2 SDK has more details. (This is the Microsoft.Speech API for server applications, which is very similar to the System.Speech API for client applications.) See http://www.microsoft.com/downloads/en/details.aspx?FamilyID=1b1604d3-4f66-4241-9a21-90a294a5c9a4 or the Microsoft.Speech documentation at http://msdn.microsoft.com/en-us/library/microsoft.speech.recognition.semanticvalue(v=office.13).aspx
For the SemanticValue class it says:
All Speech platform-based recognition engines provide valid instances of SemanticValue for all recognized output, even phrases with no explicit semantic structure.
The SemanticValue instance for a phrase is obtained using the Semantics property on the RecognizedPhrase object (or objects which inherit from it, such as RecognitionResult).
SemanticValue objects obtained for recognized phrases without semantic structure are characterized by:
Having no children (Count is 0)
The Value property is null.
An artificial confidence level of 1.0 (returned by Confidence)
Typically, applications create instances of SemanticValue indirectly, adding them to Grammar objects by using SemanticResultValue and SemanticResultKey instances in conjunction with Choices and GrammarBuilder objects. Direct construction of a SemanticValue is useful during the creation of strongly typed grammars.
When you use the SemanticValue features in a grammar, you are typically trying to map different phrases to a single meaning. In your case, the phrases "I E" and "Internet Explorer" should both map to the same semantic meaning. You set up choices in your grammar to cover each phrase that can map to a specific meaning. Here is a simple WinForms example:
private void btnTest_Click(object sender, EventArgs e)
{
    SpeechRecognitionEngine myRecognizer = new SpeechRecognitionEngine();
    Grammar testGrammar = CreateTestGrammar();
    myRecognizer.LoadGrammar(testGrammar);

    // use microphone
    try
    {
        myRecognizer.SetInputToDefaultAudioDevice();
        WriteTextOutput(""); // WriteTextOutput is a form helper (not shown) that displays a string
        RecognitionResult result = myRecognizer.Recognize();

        string item = null;
        float confidence = 0.0F;
        if (result != null && result.Semantics.ContainsKey("item"))
        {
            item = result.Semantics["item"].Value.ToString();
            confidence = result.Semantics["item"].Confidence;
            WriteTextOutput(String.Format("Item is '{0}' with confidence {1}.", item, confidence));
        }
    }
    catch (InvalidOperationException exception)
    {
        WriteTextOutput(String.Format("Could not recognize input from default audio device. Is a microphone or sound card available?\r\n{0} - {1}.", exception.Source, exception.Message));
        myRecognizer.UnloadAllGrammars();
    }
}

private Grammar CreateTestGrammar()
{
    // item choices, with several phrases mapping to a single semantic value
    Choices item = new Choices();
    SemanticResultValue itemSRV;
    itemSRV = new SemanticResultValue("I E", "explorer");
    item.Add(itemSRV);
    itemSRV = new SemanticResultValue("explorer", "explorer");
    item.Add(itemSRV);
    itemSRV = new SemanticResultValue("firefox", "firefox");
    item.Add(itemSRV);
    itemSRV = new SemanticResultValue("mozilla", "firefox");
    item.Add(itemSRV);
    itemSRV = new SemanticResultValue("chrome", "chrome");
    item.Add(itemSRV);
    itemSRV = new SemanticResultValue("google chrome", "chrome");
    item.Add(itemSRV);
    SemanticResultKey itemSemKey = new SemanticResultKey("item", item);

    // build the permutations of choices...
    GrammarBuilder gb = new GrammarBuilder();
    gb.Append(itemSemKey);

    // now build the complete pattern...
    GrammarBuilder itemRequest = new GrammarBuilder();
    // preamble: "Can you open" / "Open" / "Please open"
    itemRequest.Append(new Choices("Can you open", "Open", "Please open"));
    itemRequest.Append(gb, 0, 1);

    Grammar TestGrammar = new Grammar(itemRequest);
    return TestGrammar;
}
I want to write a simple Windows app in Visual C#/C++ that lets users input different segments of text, and then press a set of hotkeys to hear the various text segments in TTS at any time. The program should accept hotkeys while running in background or even when fullscreen applications have focus.
Example use case: user enters "hello world" and saves it as the first text segment, and then enters "stack overflow" and saves it as the second text segment. The user can switch to another program, then press hotkey CTRL-1 to hear the TTS say "hello world" or CTRL-2 to hear the TTS say "stack overflow." The program should of course be able to run entirely offline (just in case that affects any suggestions)
As a side note, I'm fairly new to programming in Visual whatever, but I have a decent enough background in C#/C++, so even though I'm mainly looking for help with the TTS part, I'm open to suggestions of any kind if someone's done this kind of thing before.
If you want to speak text in C#, use Interop.SpeechLib.dll (the COM interop assembly for SAPI).
E.g.:
private void ReadText()
{
    int iCounter = 0;
    // repeat the phrase as many times as the NumericUpDown control specifies
    while (Convert.ToInt32(numericUpDown1.Value) > iCounter)
    {
        SpVoice spVoice = new SpVoice();
        spVoice.Speak("Hello World", SpeechVoiceSpeakFlags.SVSFPurgeBeforeSpeak);
        spVoice.WaitUntilDone(Timeout.Infinite);
        iCounter = iCounter + 1;
    }
}
read this: Speech Technologies
Reference System.Speech.dll. You can instantiate a System.Speech.Synthesis.SpeechSynthesizer and call .Speak("TEXT HERE");
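For example (a minimal sketch; Speak blocks until the phrase finishes, SpeakAsync does not):

using System.Speech.Synthesis; // reference System.Speech.dll

class TtsDemo
{
    static void Main()
    {
        using (var synth = new SpeechSynthesizer())
        {
            synth.SetOutputToDefaultAudioDevice();
            synth.Speak("hello world");
        }
    }
}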
You have to use the Microsoft Speech SDK.
Have a look at this link for details:
http://dhavalshah.wordpress.com/2008/09/16/text-to-speech-in-c/