Full dictation mode using voice recognition - C#

I am wondering if there is any way to record whatever I say in full dictation mode. What I mean is, for example, a program that records whatever the microphone picks up and writes that information to a .txt file. How can I make it so the program doesn't just "guess" at the sentences? I find that using dictation grammars like this:
DictationGrammar spellingDictationGrammar = new DictationGrammar("grammar:dictation#spelling");
spellingDictationGrammar.Name = "spelling dictation";
spellingDictationGrammar.Enabled = true;
recEngine.LoadGrammar(spellingDictationGrammar);
//question dictation grammar
DictationGrammar customDictationGrammar = new DictationGrammar("grammar:dictation");
customDictationGrammar.Name = "question dictation";
customDictationGrammar.Enabled = true;
recEngine.LoadGrammar(customDictationGrammar);
just guesses whatever is being said. It is not even close to accurate. Is there any way to add a dictation grammar, or work around the problem, so the output gets somewhat close to accurate?
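For reference, continuous dictation written to a text file can be sketched like this (the file name dictation.txt is my own choice, not from the question; accuracy still depends on the engine's trained profile):

```csharp
using System;
using System.IO;
using System.Speech.Recognition;

class DictationToFile
{
    static void Main()
    {
        using (var recEngine = new SpeechRecognitionEngine())
        {
            // A plain dictation grammar accepts arbitrary speech.
            recEngine.LoadGrammar(new DictationGrammar());
            recEngine.SetInputToDefaultAudioDevice();

            recEngine.SpeechRecognized += (s, e) =>
            {
                // Append every recognized phrase to the transcript file.
                File.AppendAllText("dictation.txt", e.Result.Text + Environment.NewLine);
            };

            recEngine.RecognizeAsync(RecognizeMode.Multiple);
            Console.WriteLine("Dictating... press Enter to stop.");
            Console.ReadLine();
            recEngine.RecognizeAsyncStop();
        }
    }
}
```

Running the built-in Windows speech training a few times noticeably improves what the desktop engine produces here.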

Related

Voice to text with c# - Detecting a sentence

I have been messing around with C# voice-to-text, which has been pretty easy. However, I am trying to figure out how to detect a free-form sentence, as opposed to matching against the preset commands I've made.
Currently I can do various things by listening for keywords:
SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
public Form1()
{
    _recognizer.SetInputToDefaultAudioDevice();
    _recognizer.LoadGrammar(new Grammar(new GrammarBuilder(new Choices(File.ReadAllLines(@"Commands.txt")))));
    _recognizer.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(_recognizer_SpeechRecognized);
    _recognizer.RecognizeAsync(RecognizeMode.Multiple);
}
void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    //magic happens here
}
This works great as I said before. I've tried to use some of the other functions associated with speech recognition such as SpeechHypothesized but it will only guess words based on the grammar loaded into the program which are preset commands. This makes sense. However if I load a full library into the grammar then my commands will be much less accurate.
I am trying to setup the program so when a keyword is said (one of the commands) it will listen to try to transcribe an actual sentence.
I was looking for "speech dictation." Instead of loading my command list into the grammar, I was able to use the DictationGrammar class built into System.Speech to listen for a complete sentence.
SpeechRecognitionEngine recognitionEngine = new SpeechRecognitionEngine();
recognitionEngine.SetInputToDefaultAudioDevice();
recognitionEngine.LoadGrammar(new DictationGrammar());
RecognitionResult result = recognitionEngine.Recognize(new TimeSpan(0, 0, 20));
foreach (RecognizedWordUnit word in result.Words)
{
    Console.Write("{0} ", word.Text);
}
This method doesn't seem very accurate, though; even when I check word.Confidence, it seems to guess the correct word less than half the time.
I may try the Google Voice API instead, sending a FLAC file for post-processing.
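One way to sketch the keyword-then-dictation flow from the question is to load both a command grammar and a dictation grammar, then toggle which one is enabled ("take a note" is a trigger phrase of my own invention, not from the question):

```csharp
using System;
using System.Speech.Recognition;

class KeywordThenDictation
{
    static void Main()
    {
        var engine = new SpeechRecognitionEngine();

        // Commands stay small and accurate; dictation starts disabled.
        var commands = new Grammar(new GrammarBuilder(new Choices("take a note", "stop")))
        {
            Name = "commands"
        };
        var dictation = new DictationGrammar { Name = "dictation", Enabled = false };

        engine.LoadGrammar(commands);
        engine.LoadGrammar(dictation);
        engine.SetInputToDefaultAudioDevice();

        engine.SpeechRecognized += (s, e) =>
        {
            if (e.Result.Grammar.Name == "commands" && e.Result.Text == "take a note")
            {
                // Keyword heard: switch to free dictation for the next utterance.
                commands.Enabled = false;
                dictation.Enabled = true;
            }
            else if (e.Result.Grammar.Name == "dictation")
            {
                Console.WriteLine("Transcribed: " + e.Result.Text);
                // One sentence captured: switch back to command mode.
                dictation.Enabled = false;
                commands.Enabled = true;
            }
        };

        engine.RecognizeAsync(RecognizeMode.Multiple);
        Console.ReadLine();
    }
}
```

Toggling Grammar.Enabled avoids unloading grammars while recognition is running, and keeps the command matching accurate because the engine only ever considers one grammar at a time.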

Speech Recognition Engine is not working as it should be

I am writing a little application that is supposed to listen for user commands and send keystrokes to another program. I am using the SpeechRecognitionEngine class, but my script doesn't work properly.
If I use a custom grammar (with very few words, like "start" or "exit"), the program will always recognize one of my words even though I said something completely different.
For instance, I say "stackoverflow" and the program recognizes "start".
With a default dictionary the program becomes almost impossible to use (I have to be 100% accurate, otherwise it won't understand).
The strange thing is that if I use SpeechRecognizer instead of SpeechRecognitionEngine, my program works perfectly, but of course every time I say something unrelated it messes up, because Windows Speech Recognition handles the result, and I don't want that to happen. That is the reason I am using SpeechRecognitionEngine in the first place.
What am I doing wrong?
Choices c = new Choices(new string[] { "use", "menu", "map", "save", "talk", "esc" });
GrammarBuilder gb = new GrammarBuilder(c);
Grammar g = new Grammar(gb);
sr = new SpeechRecognitionEngine();
sr.LoadGrammar(g);
sr.SetInputToDefaultAudioDevice();
sr.SpeechRecognized += sr_SpeechRecognized;
Almost forgot, I don't know if that's relevant but I am using Visual Studio 11 Ultimate Beta.
For each speech recognition result detected you also receive the confidence for the recognition - a low confidence level would indicate that the engine is "not so sure" about the result and you might want to reject it, e.g.:
private void SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    if (e.Result.Confidence >= 0.7)
    {
        // high enough confidence, use the result
    }
    else
    {
        // reject the result
    }
}

Speech-to-text: disable the Windows auto handler and write what I say

I've started using the .NET speech-to-text library (SpeechRecognizer).
While googling and searching this site, I found this code sample:
var c = new Choices();
for (var i = 0; i <= 100; i++)
c.Add(i.ToString());
var gb = new GrammarBuilder(c);
var g = new Grammar(gb);
rec.UnloadAllGrammars();
rec.LoadGrammar(g);
rec.Enabled = true;
which helped me get started. I changed these two lines
for (var i = 0; i <= 100; i++)
c.Add(i.ToString());
to my need
c.Add("Open");
c.Add("Close");
But, when I say 'Close', the speech recognizer of windows closes my application!
In addition, is there a better way to recognize speech than creating my own dictionary? I would like the user to say something like "Write a note to myself", and then the user will speak and I'll write it down.
Sorry for asking two questions in one, but both seem to be relevant to my one problem.
You are using the shared speech recognizer (SpeechRecognizer). When you instantiate
SpeechRecognizer you get a recognizer that can be shared by other applications and is typically used for building applications to control windows and applications running on the desktop.
It sounds like you want to use your own private recognition engine (SpeechRecognitionEngine). So instantiate a SpeechRecognitionEngine instead.
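A minimal sketch of the question's Open/Close commands moved onto a private SpeechRecognitionEngine, so the shared Windows recognizer never handles them:

```csharp
using System;
using System.Speech.Recognition;

class PrivateRecognizer
{
    static void Main()
    {
        // An in-process engine: results are delivered only to this application,
        // so saying "Close" no longer triggers the shared Windows recognizer.
        var rec = new SpeechRecognitionEngine();
        rec.LoadGrammar(new Grammar(new GrammarBuilder(new Choices("Open", "Close"))));
        rec.SetInputToDefaultAudioDevice();
        rec.SpeechRecognized += (s, e) =>
            Console.WriteLine("Heard: " + e.Result.Text);
        rec.RecognizeAsync(RecognizeMode.Multiple);
        Console.ReadLine();
    }
}
```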
See the SpeechRecognizer Class documentation.
Disable built-in speech recognition commands? may also have some helpful info.
Microsoft's desktop recognizers include a special grammar called a dictation grammar that can be used to transcribe arbitrary words spoken by the user. You can use the dictation grammar to do transcription style recognition. See DictationGrammar Class and SAPI and Windows 7 Problem
I have a better answer...
Try adding a dictation grammar to your recognizer; it seems to disable all built-in commands like select/delete/close.
You then need to handle the SpeechRecognized event and use SendKeys to add text to the page. My findings so far indicate that you can't have your SAPI cake and eat it.
I think the solution above should work for you if you've not already solved it (or moved on).
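That combination of a dictation grammar plus SendKeys can be sketched like this (whether loading a dictation grammar suppresses the built-in commands is the answer's observation, not documented behavior):

```csharp
using System;
using System.Speech.Recognition;
using System.Windows.Forms;

class DictateIntoFocusedWindow
{
    [STAThread]
    static void Main()
    {
        var rec = new SpeechRecognitionEngine();
        // Dictation grammar: accepts arbitrary speech instead of fixed commands.
        rec.LoadGrammar(new DictationGrammar());
        rec.SetInputToDefaultAudioDevice();
        rec.SpeechRecognized += (s, e) =>
            // Type the recognized text into whichever window has focus.
            SendKeys.SendWait(e.Result.Text + " ");
        rec.RecognizeAsync(RecognizeMode.Multiple);
        Console.ReadLine();
    }
}
```

Note that SendKeys.SendWait sends the text to the foreground window, so the target application must have focus when the phrase is recognized.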

Why is the Microsoft Speech Recognition SemanticValue.Confidence value always 1?

I'm trying to use the SpeechRecognizer with a custom Grammar to handle the following pattern:
"Can you open {item}?" where {item} uses DictationGrammar.
I'm using the speech engine built into Vista and .NET 4.0.
I would like to be able to get the confidences for the SemanticValues returned. See example below.
If I simply use "recognizer.LoadGrammar( new DictationGrammar() )", I can browse through e.Results.Alternates and view the confidence values of each alternate. That works if the DictationGrammar is at the top level.
Made up example:
Can you open Firefox? .95
Can you open Fairfax? .93
Can you open file fax? .72
Can you pen Firefox? .85
Can you pin Fairfax? .63
But if I build a grammar that looks for "Can you open {semanticValue Key='item' GrammarBuilder=new DictationGrammar()}?", then I get this:
Can you open Firefox? .91 - Semantics = {GrammarBuilder.Name = "can you open"}
Can you open Fairfax? .91 - Semantics = {GrammarBuilder.Name = "can you open"}
Can you open file fax? .91 - Semantics = {GrammarBuilder.Name = "can you open"}
Can you pen Firefox? .85 - Semantics = null
Can you pin Fairfax? .63 - Semantics = null
The .91 shows me that how confident it is that it matched the pattern of "Can you open {item}?" but doesn't distinguish any further.
However, if I then look at e.Result.Alternates.Semantics.Where( s => s.Key == "item" ), and view their Confidence, I get this:
Firefox 1.0
Fairfax 1.0
file fax 1.0
Which doesn't help me much.
What I really want is something like this when I view the Confidence of the matching SemanticValues:
Firefox .95
Fairfax .93
file fax .85
It seems like it should work that way...
Am I doing something wrong? Is there even a way to do that within the Speech framework?
I'm hoping there's some inbuilt mechanism so that I can do it the "right" way.
As for another approach that will probably work...
Use the SemanticValue approach to match on the pattern
For anything that matches on that pattern, extract the raw Audio for {item} (use RecognitionResult.Words and RecognitionResult.GetAudioForWordRange)
Run the raw audio for {item} through a SpeechRecognizer with the DictationGrammar to get the Confidence
... but that's more processing than I really want to do.
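That fallback can be sketched roughly as follows (a sketch only; it assumes the word indices of {item} within the result are already known, and the helper name RescoreItem is my own):

```csharp
using System.IO;
using System.Speech.Recognition;

static class ItemRescorer
{
    // Given a result that matched "Can you open {item}", re-run just the
    // {item} audio through a dictation-only engine to get a usable
    // confidence for the item words.
    public static float RescoreItem(RecognitionResult result,
                                    int firstItemWord, int lastItemWord)
    {
        // Extract the raw audio spoken for the {item} word range.
        RecognizedAudio itemAudio = result.GetAudioForWordRange(
            result.Words[firstItemWord], result.Words[lastItemWord]);

        using (var stream = new MemoryStream())
        using (var dictationEngine = new SpeechRecognitionEngine())
        {
            itemAudio.WriteToWaveStream(stream);
            stream.Position = 0;

            // Recognize that audio against a dictation grammar alone, where
            // alternates carry real (non-artificial) confidence values.
            dictationEngine.LoadGrammar(new DictationGrammar());
            dictationEngine.SetInputToWaveStream(stream);

            RecognitionResult rescored = dictationEngine.Recognize();
            return rescored != null ? rescored.Confidence : 0f;
        }
    }
}
```

As the question notes, this means running recognition twice per utterance, which is the extra processing cost being weighed.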
I think a dictation grammar only does transcription. It does speech to text without extracting semantic meaning because by definition a dictation grammar supports all words and doesn't have any clues to your specific semantic mapping. You need to use a custom grammar to extract semantic meaning. If you supply an SRGS grammar or build one in code or with SpeechServer tools, you can specify Semantic mappings for certain words and phrases. Then the recognizer can extract semantic meaning and give you a semantic confidence.
You should be able to get Confidence value from the recognizer on the recognition, try System.Speech.Recognition.RecognitionResult.Confidence.
The help file that comes with the Microsoft Server Speech Platform 10.2 SDK has more details. (This is the Microsoft.Speech API for server applications, which is very similar to the System.Speech API for client applications.) See http://www.microsoft.com/downloads/en/details.aspx?FamilyID=1b1604d3-4f66-4241-9a21-90a294a5c9a4 or the Microsoft.Speech documentation at http://msdn.microsoft.com/en-us/library/microsoft.speech.recognition.semanticvalue(v=office.13).aspx
For SemanticValue Class it says:
All Speech platform-based recognition engines provide valid instances of SemanticValue for all recognized output, even phrases with no explicit semantic structure.
The SemanticValue instance for a phrase is obtained using the Semantics property on the RecognizedPhrase object (or objects which inherit from it, such as RecognitionResult).
SemanticValue objects obtained for recognized phrases without semantic structure are characterized by:
Having no children (Count is 0)
The Value property is null
An artificial confidence level of 1.0 (returned by Confidence)
Typically, applications create instances of SemanticValue indirectly, adding them to Grammar objects by using SemanticResultValue and SemanticResultKey instances in conjunction with Choices and GrammarBuilder objects. Direct construction of a SemanticValue is useful during the creation of strongly typed grammars.
When you use the SemanticValue features in the grammar you are typically trying to map different phrases to a single meaning. In your case the phrase "I.E" or "Internet Explorer" should both map to the same semantic meaning. You set up choices in your grammar to understand each phrase that can map to a specific meaning. Here is a simple Winform example:
private void btnTest_Click(object sender, EventArgs e)
{
    SpeechRecognitionEngine myRecognizer = new SpeechRecognitionEngine();
    Grammar testGrammar = CreateTestGrammar();
    myRecognizer.LoadGrammar(testGrammar);

    // use microphone
    try
    {
        myRecognizer.SetInputToDefaultAudioDevice();
        WriteTextOuput("");
        RecognitionResult result = myRecognizer.Recognize();
        string item = null;
        float confidence = 0.0F;
        if (result.Semantics.ContainsKey("item"))
        {
            item = result.Semantics["item"].Value.ToString();
            confidence = result.Semantics["item"].Confidence;
            WriteTextOuput(String.Format("Item is '{0}' with confidence {1}.", item, confidence));
        }
    }
    catch (InvalidOperationException exception)
    {
        WriteTextOuput(String.Format("Could not recognize input from default audio device. Is a microphone or sound card available?\r\n{0} - {1}.", exception.Source, exception.Message));
        myRecognizer.UnloadAllGrammars();
    }
}
private Grammar CreateTestGrammar()
{
    // item: map several spoken phrases onto one semantic value each
    Choices item = new Choices();
    SemanticResultValue itemSRV;
    itemSRV = new SemanticResultValue("I E", "explorer");
    item.Add(itemSRV);
    itemSRV = new SemanticResultValue("explorer", "explorer");
    item.Add(itemSRV);
    itemSRV = new SemanticResultValue("firefox", "firefox");
    item.Add(itemSRV);
    itemSRV = new SemanticResultValue("mozilla", "firefox");
    item.Add(itemSRV);
    itemSRV = new SemanticResultValue("chrome", "chrome");
    item.Add(itemSRV);
    itemSRV = new SemanticResultValue("google chrome", "chrome");
    item.Add(itemSRV);
    SemanticResultKey itemSemKey = new SemanticResultKey("item", item);

    // build the permutations of choices...
    GrammarBuilder gb = new GrammarBuilder();
    gb.Append(itemSemKey);

    // now build the complete pattern...
    GrammarBuilder itemRequest = new GrammarBuilder();
    // preamble: "Can you open" / "Open" / "Please open"
    itemRequest.Append(new Choices("Can you open", "Open", "Please open"));
    itemRequest.Append(gb, 0, 1);

    Grammar TestGrammar = new Grammar(itemRequest);
    return TestGrammar;
}

How to implement text-to-speech (TTS) in Visual C#/C++?

I want to write a simple Windows app in Visual C#/C++ that lets users input different segments of text, and then press a set of hotkeys to hear the various text segments in TTS at any time. The program should accept hotkeys while running in background or even when fullscreen applications have focus.
Example use case: user enters "hello world" and saves it as the first text segment, and then enters "stack overflow" and saves it as the second text segment. The user can switch to another program, then press hotkey CTRL-1 to hear the TTS say "hello world" or CTRL-2 to hear the TTS say "stack overflow." The program should of course be able to run entirely offline (just in case that affects any suggestions)
As a sidenote, I'm fairly new to programming in Visual whatever, but have a decent enough background in C#/C++, so even though I'm mainly looking for help on the TTS part, I'm open to suggestions of any kind if someone's done this kind of thing before.
If you want your C# app to speak, you can use the SpeechLib COM interop assembly (Interop.SpeechLib.dll).
E.g:
private void ReadText()
{
    // Speak the phrase once per count selected in the NumericUpDown.
    SpVoice spVoice = new SpVoice();
    for (int iCounter = 0; iCounter < Convert.ToInt32(numericUpDown1.Value); iCounter++)
    {
        spVoice.Speak("Hello World", SpeechVoiceSpeakFlags.SVSFPurgeBeforeSpeak);
        spVoice.WaitUntilDone(Timeout.Infinite);
    }
}
read this: Speech Technologies
Reference System.Speech.dll. You can instantiate a System.Speech.Synthesis.SpeechSynthesizer and call .Speak("TEXT HERE");
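A minimal sketch of that approach, speaking the two example phrases from the question (the hotkey registration is a separate concern, typically done with the Win32 RegisterHotKey API, and is not shown here):

```csharp
using System.Speech.Synthesis;

class TtsExample
{
    static void Main()
    {
        using (var synth = new SpeechSynthesizer())
        {
            synth.SetOutputToDefaultAudioDevice();
            // Speak blocks until the phrase finishes; this works fully offline
            // using the voices installed with Windows.
            synth.Speak("hello world");
            synth.Speak("stack overflow");
        }
    }
}
```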
You have to use the Microsoft Speech SDK.
Have a look at this link for details:
http://dhavalshah.wordpress.com/2008/09/16/text-to-speech-in-c/
