I need to way to make the speech to text smarter as many of the words it is just getting incorrect in the translation. I cannot find much help on adding a list of words, not commands or grammar but words to help better translate audio recording.
Here is the code I found on the web, and this works, but I need to way to train, or make the engine smarter. Any ideas?
Thanks.
static void Main(string[] args)
{
// Create an in-process speech recognizer for the en-US locale.
using (SpeechRecognitionEngine recognizer =
new SpeechRecognitionEngine(
new System.Globalization.CultureInfo("en-US")))
{
// Create and load a dictation grammar.
recognizer.LoadGrammar(new DictationGrammar());
// Add a handler for the speech recognized event.
recognizer.SpeechRecognized +=
new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);
// Configure input to the speech recognizer.
recognizer.SetInputToWaveFile(#"c:\test2.wav");
// Start asynchronous, continuous speech recognition.
recognizer.RecognizeAsync(RecognizeMode.Multiple);
// Keep the console window open.
while (true)
{
Console.ReadLine();
}
}
}
// Handle the SpeechRecognized event.
static void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
Console.WriteLine("Recognized text: " + e.Result.Text);
using (System.IO.StreamWriter file = new System.IO.StreamWriter(#"C:\WriteLines2.txt", true))
{
file.WriteLine("");
}
}
Related
Im trying to have my windows forms program to continuously listen to my microphone to detect speech and then display that information on the gui.
Here is my SpeechListener class:
{
public class SpeechListener
{
GUI gui;
public SpeechListener(GUI gui) { this.gui = gui; }
public void StartListening() {
gui.setLabel("speech activated");
// Create an in-process speech recognizer for the en-US locale.
using (
SpeechRecognitionEngine recognizer =
new SpeechRecognitionEngine(
new System.Globalization.CultureInfo("en-US")))
{
// Create and load a dictation grammar.
recognizer.LoadGrammar(new DictationGrammar());
// Add a handler for the speech recognized event.
recognizer.SpeechRecognized +=
new EventHandler<SpeechRecognizedEventArgs>(Recognizer_SpeechRecognized);
// Configure input to the speech recognizer.
recognizer.SetInputToDefaultAudioDevice();
// Start asynchronous, continuous speech recognition.
recognizer.RecognizeAsync(RecognizeMode.Multiple);
}
}
// Handle the SpeechRecognized event.
public void Recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
//this is where i want to get
gui.setLabel("Recognized text: " + e.Result.Text);
}
}
}
This class is instantiated and StartListening() is called from a form class (my gui).
I never reach the method that handles the SpeechRecognized event. However when i change
recognizer.RecognizeAsync(RecognizeMode.Multiple);
to
recognizer.Recognize();
the speech detection works (but only once, and it freezes my gui). Why doesn't the async method work?
I've used this same code on a console program and it works perfectly.
I have been searching for this problem for a while but I didn't find anything helpful.
I want perform a continuous speech recognition using System.Speech.Recognition; from a wav file however , I want to return the result when the recognition engine finish its work.
I found a solution to use RecognizeCompleted event handler but it doesn't fire at all.
Recognition code:
SpeechRecognitionEngine speechRecognitionEngine = null;
bool completed = false;
// create the engine
speechRecognitionEngine = createSpeechEngine("en-GB");
// hook to events
speechRecognitionEngine.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(engine_SpeechRecognized);
speechRecognitionEngine.RecognizeCompleted += new EventHandler<RecognizeCompletedEventArgs>(recognizer_RecognizeCompleted);
// load dictionary
speechRecognitionEngine.LoadGrammar(new DictationGrammar());
speechRecognitionEngine.SetInputToWaveFile("C:\\Converted Audio\\BBC1 cut sport.wav");
// start listening for continuous speech recognition
speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
while (!completed)
{
Thread.Sleep(333);
}
Event handler code:
void recognizer_RecognizeCompleted(object sender, RecognizeCompletedEventArgs e)
{
completed = true;
}
I have ran into an interesting issue with my voice recognition code for C#. I have had this code work before, but I migrated it to another project and it just wont work. I must be missing something, because there are no errors or warnings about the speech recognition and I do have the reference for speech. Here is the main function:
static void Main(string[] args)
{
Program prgm = new Program();
string[] argument = prgm.readConfigFile();
if(argument[2].ToLower().Contains("true"))
{
recognizer = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));
recognizer.LoadGrammar(new DictationGrammar());
recognizer.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);
recognizer.SetInputToDefaultAudioDevice();
recognizer.RecognizeAsync(RecognizeMode.Multiple);
}
prgm._con.updateConsole(argument, prgm._list);
}
static void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
Console.WriteLine(e.Result.Text);
}
along with the recognizer:
recognizer = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));
I did add the using System.Speech at the top of my code. When ever I start talking the event handler should start, but it never gets hit (checked with breakpoint). What am I doing wrong?
Please have a look at the following code
private void button2_Click(object sender, EventArgs e)
{
SpeechRecognizer sr = new SpeechRecognizer();
Choices colors = new Choices();
colors.Add(new string[] { "red arrow", "green", "blue" });
GrammarBuilder gb = new GrammarBuilder();
gb.Append(colors);
Grammar g = new Grammar(gb);
sr.LoadGrammar(g);
// SpeechSynthesizer s = new SpeechSynthesizer();
// s.SpeakAsync("start speaking");
sr.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(sr_SpeechRecognized);
}
void sr_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
MessageBox.Show(e.Result.Text);
}
This is normal speech recognition code which uses the MS speech engine. You can see here that I have loaded some grammar. But, there is an issue as well. That is, this is not responding only to the given grammar but also to the MS Built-In speech commands! Like speech command to minimize a window, open start menu etc!
I really don't need that. My application should only respond to my grammar and not to MS built-in commands. Is there is a way I can achieve this?
The SpeechRecognizer object builds on top of the existing Windows Speech system. From MSDN:
Applications use the shared recognizer to access Windows Speech
Recognition. Use the SpeechRecognizer object to add to the Windows
speech user experience.
Consider using a SpeechRecognitionEngine object instead as this runs in-process rather than system-wide.
i want to use kinect sdk voice recognition to run application from Metro UI.
For Example when i say the word:News, it would run from Metro UI the application of News.
Thanks to everybody!
Regards!
First you need to Make the connection with the audio stream and start listening:
private KinectAudioSource source;
private SpeechRecognitionEngine sre;
private Stream stream;
private void CaptureAudio()
{
this.source = KinectSensor.KinectSensors[0].AudioSource;
this.source.AutomaticGainControlEnabled = false;
this.source.EchoCancellationMode = EchoCancellationMode.CancellationOnly;
this.source.BeamAngleMode = BeamAngleMode.Adaptive;
RecognizerInfo info = SpeechRecognitionEngine.InstalledRecognizers()
.Where(r => r.Culture.TwoLetterISOLanguageName.Equals("en"))
.FirstOrDefault();
if (info == null) { return; }
this.sre = new SpeechRecognitionEngine(info.Id);
if(!isInitialized) CreateDefaultGrammars();
sre.LoadGrammar(CreateGrammars()); //Important step
this.sre.SpeechRecognized +=
new EventHandler<SpeechRecognizedEventArgs>
(sre_SpeechRecognized);
this.sre.SpeechHypothesized +=
new EventHandler<SpeechHypothesizedEventArgs>
(sre_SpeechHypothesized);
this.stream = this.source.Start();
this.sre.SetInputToAudioStream(this.stream, new SpeechAudioFormatInfo(
EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
this.sre.RecognizeAsync(RecognizeMode.Multiple);
}
First you can see in the sample that there's one important step sre.LoadGrammar(CreateGrammars()); which creates and loads the grammar so you have to create the method CreateGrammars():
private Grammar CreateGrammars()
{
var KLgb = new GrammarBuilder();
KLgb.Culture = sre.RecognizerInfo.Culture;
KLgb.Append("News");
return Grammar(KLgb);
}
The above sample create a grammar listening for the word "News". Once it is recognized (the probability that the word said is one in your grammar is higher than a threshold), the speech recognizer engine (sre) raise the SpeechRecognized event.
Of course you need to add the proper handler for the two events (Hypothetize, Recognize):
private void sre_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
this.RecognizedWord = e.Result.Text;
if (e.Result.Confidence > 0.65) InterpretCommand(e);
}
Know all you have to do is to write the InterpretCommand method which does whatever you want (as running a metro app ;) ). If you have multiple words in a dictionnary the method has to parse the text recognized and verify that this is the word news wich was recognized.
Here you can download samples of a great book on Kinect: Beginning Kinect Programming with the Microsoft Kinect SDK (unfortunately the book itself is not free). On the folder Chapter7\PutThatThereComplete\ you have a sample using audio that you can be inspired by.