I'd like to transcribe the speech a user says into text. Can I do this with the Microsoft Speech Platform? Perhaps I'm just misunderstanding how it's supposed to work and what its intended use case is.
I've got this console application now:
static void Main(string[] args)
{
    Choices words = new Choices();
    words.Add(new string[] { "test", "hello", "blah" });

    GrammarBuilder gb = new GrammarBuilder();
    gb.Append(words);
    Grammar g = new Grammar(gb);

    SpeechRecognitionEngine sre = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));
    sre.LoadGrammar(g);
    sre.SetInputToDefaultAudioDevice();
    // add listeners
    sre.Recognize(); // performs a single, synchronous recognition operation
    Console.ReadLine();
}
And it only seems to output the words that I specify in Choices.
Would I have to add an entire dictionary of words if I wanted to match (most) of what a user will say?
Furthermore, it stops right after it matches a single word. What if I wanted to capture entire sentences?
I'm looking for solutions for A) capturing a wide array of words, and B) capturing more than one word at once.
Edit:
I found this: http://www.codeproject.com/Articles/483347/Speech-recognition-speech-to-text-text-to-speech-a#torecognizeallspeech
As seen on that page, the DictationGrammar class provides a basic library of common words.
To capture more than one word at a time, I used:
sre.RecognizeAsync(RecognizeMode.Multiple);
So my code is now this:
public static SpeechRecognitionEngine sre;

static void Main(string[] args)
{
    sre = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));
    sre.LoadGrammar(new Grammar(new GrammarBuilder("exit")));
    sre.LoadGrammar(new DictationGrammar());
    sre.SetInputToDefaultAudioDevice();
    sre.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(sre_SpeechRecognized);
    sre.RecognizeAsync(RecognizeMode.Multiple); // start continuous recognition
    Console.ReadLine();
}
private static void sre_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    if (e.Result.Text == "exit")
    {
        sre.RecognizeAsyncStop();
    }
    Console.WriteLine("You said: " + e.Result.Text);
}
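Note that RecognizeAsyncStop lets the recognition operation currently in progress finish before stopping; RecognizeAsyncCancel would stop immediately and discard the in-progress result.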
Related
I'm using C# and System.Speech.Recognition to load a couple of simple grammars I defined. When I say phrases matching the grammars, the engine recognizes the grammar correctly with confidence around 0.95.
But when I pronounce words that are not even in the grammar (even from different languages, or gibberish), the engine randomly returns a match to a grammar with random text that was never pronounced, still with high confidence like 0.92.
Is there something I need to set on the SpeechRecognitionEngine object or on each Grammar object to avoid this problem?
I think I found a solution that works for me, but it would still be nice to find a more elegant one if it exists:
I define a dictation grammar and a "placeholder" grammar. Then I load my command grammars and disable them immediately.
using System.Collections.Generic;
using System.Linq; // needed for .Any() in the handler below
using System.Speech.Recognition;
...
private DictationGrammar dictationGrammar;
private Grammar placeholderGrammar;
private List<Grammar> commands;

public void Initialize()
{
    dictationGrammar = new DictationGrammar();
    recognizer.LoadGrammarAsync(dictationGrammar);

    var builder = new GrammarBuilder();
    builder.Append("MYPLACEHOLDER");
    placeholderGrammar = new Grammar(builder);
    recognizer.LoadGrammarAsync(placeholderGrammar);

    // Load the command grammars, but keep them disabled until the placeholder is heard.
    commands = new List<Grammar>();
    foreach (var grammar in grammarManager.GetGrammars())
    {
        commands.Add(grammar);
        grammar.Enabled = false;
        recognizer.LoadGrammarAsync(grammar);
    }
}
Then, in the SpeechRecognized event, I put the logic: if the placeholder is recognized, enable the commands; if a command is recognized, re-enable the dictation and disable all commands:
private async void speechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    if (e.Result.Grammar == placeholderGrammar)
    {
        // go to command mode
        placeholderGrammar.Enabled = false;
        dictationGrammar.Enabled = false;
        foreach (var item in commands)
            item.Enabled = true;
    }
    else if (commands.Any(x => e.Result.Grammar == x))
    {
        Do_something_with_recognized_command("!!");
        // go back to normal mode
        placeholderGrammar.Enabled = true;
        dictationGrammar.Enabled = true;
        foreach (var item in commands)
            item.Enabled = false;
    }
    else
    {
        // this is dictation... nothing to do
    }
}
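For context, recognizer and grammarManager are fields that are not shown in the snippet above. Below is a minimal sketch of how the surrounding wiring might look; the GrammarManager stand-in, its phrases, and the Start method are assumptions for illustration, not part of the original answer:
using System.Collections.Generic;
using System.Speech.Recognition;

// Hypothetical stand-in for the grammarManager used in Initialize():
// builds one command grammar per phrase (the phrases are made up).
public class GrammarManager
{
    public IEnumerable<Grammar> GetGrammars()
    {
        foreach (var phrase in new[] { "play", "stop", "rewind" })
        {
            yield return new Grammar(new GrammarBuilder(phrase)) { Name = phrase };
        }
    }
}

// Fields assumed by Initialize():
private SpeechRecognitionEngine recognizer;
private GrammarManager grammarManager;

public void Start()
{
    recognizer = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));
    grammarManager = new GrammarManager();

    Initialize(); // loads the dictation, placeholder, and command grammars as above

    recognizer.SetInputToDefaultAudioDevice();
    recognizer.SpeechRecognized += speechRecognized;
    recognizer.RecognizeAsync(RecognizeMode.Multiple);
}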
I tried to follow this tutorial in building a voice recognition C# app; the only difference is that I wanted a console app, not a WinForms app, so I wrote this code:
using System;
using System.Speech.Recognition;
//using System.Speech.Synthesis;

namespace Voice_Recognation
{
    class Program
    {
        static void Main(string[] args)
        {
            SpeechRecognitionEngine recEngine = new SpeechRecognitionEngine();
            recEngine.SetInputToDefaultAudioDevice();

            Choices commands = new Choices();
            commands.Add(new string[] { "say Hi", "say Hello" });
            GrammarBuilder gb = new GrammarBuilder();
            gb.Append(commands);
            Grammar g = new Grammar(gb);
            recEngine.LoadGrammarAsync(g);

            recEngine.RecognizeAsync(RecognizeMode.Multiple);
            recEngine.SpeechRecognized += recEngine_SpeechRecognized;
        }

        // Create a simple handler for the SpeechRecognized event
        static void recEngine_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            Console.WriteLine("Speech recognized: {0}", e.Result.Text);
            switch (e.Result.Text)
            {
                case "Red":
                    Console.WriteLine("you said hi");
                    break;
                default:
                    break;
            }
        }
    }
}
and compiled it using Mono as below:
c:\> mcs /reference:System.Speech.dll Program.cs
after adding System.Speech.dll to the project folder, which generated the Program.exe file.
Once I run the program in the terminal, it exits immediately, without giving me any chance to say anything!
I have 2 questions:
What am I missing here, and what did I do wrong?
and
Is there a better way to add the '.dll' file? I tried adding it to the project.json file as below, but it did not work, though I did not get any error when running dotnet restore:
"frameworks": {
"netcoreapp1.0": {
"bin": {
"assembly": "D:/2016/Speech/CORE/System.Speech.dll"
},
}
}
I solved the first part by adding Thread.Sleep so the thread gets enough time to keep listening; another option is an endless loop using while(true);.
I still couldn't solve the second part, which is how to make VS Code recognize the existence of the assembly file.
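A note on that second part: as far as I know, System.Speech is a Windows-only .NET Framework assembly and was not available for netcoreapp1.0, which would explain why dotnet restore succeeded without the assembly ever becoming usable. When targeting the full .NET Framework instead, a classic .csproj reference is enough; a sketch:
<ItemGroup>
  <Reference Include="System.Speech" />
</ItemGroup>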
The new full code, if anyone is interested, is below, and a more comprehensive example can be found here:
using System;
using System.Speech.Recognition;
using System.Threading;
//using System.Speech.Synthesis;

namespace Voice_Recognation
{
    class Program
    {
        static void Main(string[] args)
        {
            SpeechRecognitionEngine recEngine = new SpeechRecognitionEngine();
            recEngine.SetInputToDefaultAudioDevice();

            Choices commands = new Choices();
            commands.Add(new string[] { "say Hi", "say Hello" });
            GrammarBuilder gb = new GrammarBuilder();
            gb.Append(commands);
            Grammar g = new Grammar(gb);
            recEngine.LoadGrammarAsync(g);

            recEngine.SpeechRecognized += recEngine_SpeechRecognized;

            Console.WriteLine("Starting asynchronous recognition...");
            recEngine.RecognizeAsync(RecognizeMode.Multiple);

            // Keep the main thread alive for 30 seconds so recognition can run.
            Thread.Sleep(TimeSpan.FromSeconds(30));
            // or
            // while(true);
        }

        // Create a simple handler for the SpeechRecognized event
        static void recEngine_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            Console.WriteLine("Speech recognized: {0}", e.Result.Text);
            switch (e.Result.Text)
            {
                case "say Hello":
                    Console.WriteLine("you said hello");
                    break;
                default:
                    break;
            }
        }
    }
}
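Note that while(true); busy-spins a CPU core; since this is a console app, a plain Console.ReadLine(); (as in the earlier examples on this page) keeps the process alive without spinning.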
I have run into an interesting issue with my voice recognition code for C#. I have had this code work before, but I migrated it to another project and it just won't work. I must be missing something, because there are no errors or warnings about the speech recognition, and I do have the reference for speech. Here is the main function:
static void Main(string[] args)
{
    Program prgm = new Program();
    string[] argument = prgm.readConfigFile();
    if (argument[2].ToLower().Contains("true"))
    {
        recognizer = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));
        recognizer.LoadGrammar(new DictationGrammar());
        recognizer.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);
        recognizer.SetInputToDefaultAudioDevice();
        recognizer.RecognizeAsync(RecognizeMode.Multiple);
    }
    prgm._con.updateConsole(argument, prgm._list);
}

static void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    Console.WriteLine(e.Result.Text);
}
along with the recognizer:
recognizer = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));
I did add the using System.Speech at the top of my code. Whenever I start talking, the event handler should fire, but it never gets hit (checked with a breakpoint). What am I doing wrong?
I need a way to make the speech-to-text smarter, as it is getting many of the words wrong in the transcription. I cannot find much help on adding a list of words (not commands or a grammar, but plain words) to help it better transcribe an audio recording.
Here is the code I found on the web, and this works, but I need a way to train the engine or make it smarter. Any ideas?
Thanks.
static void Main(string[] args)
{
    // Create an in-process speech recognizer for the en-US locale.
    using (SpeechRecognitionEngine recognizer =
        new SpeechRecognitionEngine(
            new System.Globalization.CultureInfo("en-US")))
    {
        // Create and load a dictation grammar.
        recognizer.LoadGrammar(new DictationGrammar());

        // Add a handler for the speech recognized event.
        recognizer.SpeechRecognized +=
            new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);

        // Configure input to the speech recognizer.
        recognizer.SetInputToWaveFile(@"c:\test2.wav");

        // Start asynchronous, continuous speech recognition.
        recognizer.RecognizeAsync(RecognizeMode.Multiple);

        // Keep the console window open.
        while (true)
        {
            Console.ReadLine();
        }
    }
}
// Handle the SpeechRecognized event.
static void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    Console.WriteLine("Recognized text: " + e.Result.Text);
    using (System.IO.StreamWriter file = new System.IO.StreamWriter(@"C:\WriteLines2.txt", true))
    {
        file.WriteLine(e.Result.Text); // log the recognized text, not an empty line
    }
}
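One thing System.Speech does support, and which other examples on this page already use, is loading a small domain-specific grammar alongside the DictationGrammar, so that words and phrases you know will occur tend to be preferred over free-dictation guesses. A minimal sketch, assuming a made-up word list, placed before the RecognizeAsync call:
// Bias recognition toward known domain words by loading a Choices-based
// grammar next to the DictationGrammar (the word list is a made-up example).
Choices domainWords = new Choices(new string[] { "invoice", "ledger", "accrual" });
Grammar domainGrammar = new Grammar(new GrammarBuilder(domainWords));
domainGrammar.Name = "domainWords";
recognizer.LoadGrammar(domainGrammar);
In the handler, e.Result.Grammar.Name then tells you whether the match came from the word list or from dictation.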
I'm making a program that uses the System.Speech namespace (it's a simple program that will launch movies). I load all of the filenames from a folder and add them to the grammars I want to use. It's working remarkably well; however, there is a hitch: I DON'T want Windows Speech Recognition to interact with Windows at all (i.e. when I say "start", I don't want the Start menu to open; I don't want anything to happen).
Likewise, I have a listbox for the moment that lists all of the movies found in the directory. When I say the show/movie that I want to open, the program isn't recognizing that the name was said, because Windows Speech Recognition is selecting the ListBox item from the list instead of passing it to my program.
The recognition is otherwise working, because I have words like "stop", "play", and "rewind" in the grammar, and when I catch listener_SpeechRecognized, it correctly knows the word(s)/phrase I'm saying (and currently just types it in a textbox).
Any idea how I might be able to do this?
I'd use the SpeechRecognitionEngine class rather than the SpeechRecognizer class. This creates a speech recognizer that is completely disconnected from Windows Speech Recognition.
private bool Status = false;
SpeechRecognitionEngine sre = new SpeechRecognitionEngine();
Choices dic = new Choices(new String[] {
    "word1",
    "word2",
});

public Form1()
{
    InitializeComponent();

    Grammar gmr = new Grammar(new GrammarBuilder(dic));
    gmr.Name = "myGMR";

    // My Dic
    sre.LoadGrammar(gmr);
    sre.SpeechRecognized +=
        new EventHandler<SpeechRecognizedEventArgs>(sre_SpeechRecognized);
    sre.SetInputToDefaultAudioDevice();
    sre.RecognizeAsync(RecognizeMode.Multiple);
}

private void button1_Click(object sender, EventArgs e)
{
    if (Status)
    {
        button1.Text = "START";
        Status = false;
        stslable.Text = "Stopped";
    }
    else
    {
        button1.Text = "STOP";
        Status = true;
        stslable.Text = "Started";
    }
}

public void sre_SpeechRecognized(object sender, SpeechRecognizedEventArgs ev)
{
    String theText = ev.Result.Text;
    MessageBox.Show(theText);
}
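As written, button1_Click only updates the labels. If the button is also meant to start and stop listening (an assumption about the intended behavior), it could be wired to the engine as well; this sketch assumes the RecognizeAsync call is removed from the Form1 constructor so that recognition starts with the first click:
private void button1_Click(object sender, EventArgs e)
{
    if (Status)
    {
        sre.RecognizeAsyncCancel(); // stop immediately, discarding any in-progress result
        button1.Text = "START";
        Status = false;
        stslable.Text = "Stopped";
    }
    else
    {
        sre.RecognizeAsync(RecognizeMode.Multiple); // begin continuous recognition
        button1.Text = "STOP";
        Status = true;
        stslable.Text = "Started";
    }
}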