I want to use the Kinect SDK's voice recognition to launch an application from the Metro UI.
For example, when I say the word "News", it would launch the News application from the Metro UI.
Thanks to everybody!
Regards!
First you need to make the connection with the audio stream and start listening:
private KinectAudioSource source;
private SpeechRecognitionEngine sre;
private Stream stream;

private void CaptureAudio()
{
    this.source = KinectSensor.KinectSensors[0].AudioSource;
    this.source.AutomaticGainControlEnabled = false;
    this.source.EchoCancellationMode = EchoCancellationMode.CancellationOnly;
    this.source.BeamAngleMode = BeamAngleMode.Adaptive;

    RecognizerInfo info = SpeechRecognitionEngine.InstalledRecognizers()
        .Where(r => r.Culture.TwoLetterISOLanguageName.Equals("en"))
        .FirstOrDefault();
    if (info == null) { return; }

    this.sre = new SpeechRecognitionEngine(info.Id);
    this.sre.LoadGrammar(CreateGrammars()); // Important step
    this.sre.SpeechRecognized +=
        new EventHandler<SpeechRecognizedEventArgs>(sre_SpeechRecognized);
    this.sre.SpeechHypothesized +=
        new EventHandler<SpeechHypothesizedEventArgs>(sre_SpeechHypothesized);

    this.stream = this.source.Start();
    this.sre.SetInputToAudioStream(this.stream, new SpeechAudioFormatInfo(
        EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
    this.sre.RecognizeAsync(RecognizeMode.Multiple);
}
You can see in the sample that there's one important step, sre.LoadGrammar(CreateGrammars());, which creates and loads the grammar, so you have to create the CreateGrammars() method:
private Grammar CreateGrammars()
{
    var KLgb = new GrammarBuilder();
    KLgb.Culture = sre.RecognizerInfo.Culture;
    KLgb.Append("News");
    return new Grammar(KLgb);
}
The above sample creates a grammar listening for the word "News". Once it is recognized (i.e. the probability that the word said is one in your grammar exceeds a threshold), the speech recognition engine (sre) raises the SpeechRecognized event.
Of course you need to add the proper handlers for the two events (SpeechHypothesized, SpeechRecognized):
private void sre_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    this.RecognizedWord = e.Result.Text;
    if (e.Result.Confidence > 0.65) InterpretCommand(e);
}
Now all you have to do is write the InterpretCommand method, which does whatever you want (such as running a Metro app ;) ). If you have multiple words in your dictionary, the method has to parse the recognized text and verify that it was indeed the word "News" that was recognized.
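A minimal sketch of what InterpretCommand could look like (the "bingnews:" protocol URI is an assumption for illustration; launching a specific Metro/Store app generally requires knowing its registered URI scheme):

```csharp
// Hypothetical sketch: dispatch on the recognized text.
private void InterpretCommand(SpeechRecognizedEventArgs e)
{
    switch (e.Result.Text)
    {
        case "News":
            // Assumption: the News app registers the "bingnews:" URI scheme;
            // replace with the actual protocol of the app you want to launch.
            System.Diagnostics.Process.Start("bingnews:");
            break;
        // Add further cases for the other words in your grammar.
    }
}
```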
Here you can download the samples of a great book on Kinect: Beginning Kinect Programming with the Microsoft Kinect SDK (unfortunately the book itself is not free). In the folder Chapter7\PutThatThereComplete\ there is a sample using audio that you can take inspiration from.
Related
I'm using C# and System.Speech.Recognition to load a couple of simple grammars I defined. When I say phrases matching the grammars, the engine recognizes the grammar correctly with confidences around 0.95.
But when I pronounce words that are not even in the grammar (even words from different languages, or gibberish), the engine randomly returns a match to a grammar, with random text never pronounced, and still with a high confidence like 0.92.
Is there something I need to set on the SpeechRecognitionEngine object, or on each Grammar object, to avoid this problem?
I think I found a solution that works for me, but it would still be nice to find a more elegant one if it exists:
I define a dictation grammar and a "placeholder" grammar. Then I load my command grammars and disable them immediately.
using System.Speech.Recognition;
...
private DictationGrammar dictationGrammar;
private Grammar placeholderGrammar;
private List<Grammar> commands;

public void Initialize()
{
    dictationGrammar = new DictationGrammar();
    recognizer.LoadGrammarAsync(dictationGrammar);

    var builder = new GrammarBuilder();
    builder.Append("MYPLACEHOLDER");
    placeholderGrammar = new Grammar(builder);
    recognizer.LoadGrammarAsync(placeholderGrammar);

    // grammarManager supplies the application's command grammars;
    // load them disabled so only the placeholder keyword can enable them.
    commands = new List<Grammar>();
    foreach (var grammar in grammarManager.GetGrammars())
    {
        commands.Add(grammar);
        grammar.Enabled = false;
        recognizer.LoadGrammarAsync(grammar);
    }
}
Then, in the SpeechRecognized event handler, I put the logic: if the placeholder is recognized, enable the commands; if a command is recognized, re-enable the dictation and disable all commands:
private void speechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    if (e.Result.Grammar == placeholderGrammar)
    {
        // go to command mode
        placeholderGrammar.Enabled = false;
        dictationGrammar.Enabled = false;
        foreach (var item in commands)
            item.Enabled = true;
    }
    else if (commands.Any(x => e.Result.Grammar == x))
    {
        Do_something_with_recognized_command("!!");
        // go back to normal mode
        placeholderGrammar.Enabled = true;
        dictationGrammar.Enabled = true;
        foreach (var item in commands)
            item.Enabled = false;
    }
    else
    {
        // this is dictation... nothing to do
    }
}
I got stuck trying to implement a file picker for a Windows Phone app. I need to choose files from the gallery using FileOpenPicker, but I didn't get how it works. Here is my code:
private readonly FileOpenPicker photoPicker = new FileOpenPicker();

// This is a constructor
public MainPage()
{
    // < ... >
    photoPicker.SuggestedStartLocation = PickerLocationId.PicturesLibrary;
    photoPicker.FileTypeFilter.Add(".jpg");
}

// I have a button on the UI. On click, the app shows a picker where I can choose a file
private void bChoosePhoto_OnClick(object sender, RoutedEventArgs e)
{
    photoPicker.PickMultipleFilesAndContinue();
}
So, what do I do next? I guess I need to get a file object or something.
I found this link. It is the MSDN explanation, where a custom ContinuationManager class is implemented. This solution looks weird and ugly, and I am not sure it is the best one. Please help!
PickAndContinue is the only method that works on Windows Phone 8.1. It's not so weird and ugly; here is a simple example without the ContinuationManager.
Let's assume you want to pick a .jpg file; you use a FileOpenPicker:
FileOpenPicker picker = new FileOpenPicker();
picker.FileTypeFilter.Add(".jpg");
picker.ContinuationData.Add("keyParameter", "Parameter"); // some data which you can pass
picker.PickSingleFileAndContinue();
Once you call PickSingleFileAndContinue();, your app is deactivated. When you finish picking a file, the OnActivated event is fired, where you can read the file(s) you have picked:
protected async override void OnActivated(IActivatedEventArgs args)
{
    var continuationEventArgs = args as IContinuationActivatedEventArgs;
    if (continuationEventArgs != null)
    {
        switch (continuationEventArgs.Kind)
        {
            case ActivationKind.PickFileContinuation:
                FileOpenPickerContinuationEventArgs arguments = continuationEventArgs as FileOpenPickerContinuationEventArgs;
                string passedData = (string)arguments.ContinuationData["keyParameter"];
                StorageFile file = arguments.Files.FirstOrDefault(); // your picked file
                // do what you want with the file
                break;
            // rest of the code - other continuations, window activation etc.
        }
    }
}
Note that when you launch the file picker, your app is deactivated, and in some rare situations it can even be terminated by the OS (low resources, for example).
The ContinuationManager is only a helper that makes some things easier. Of course, you can implement your own behaviour for simpler cases.
Please have a look at the following code:
private void button2_Click(object sender, EventArgs e)
{
    SpeechRecognizer sr = new SpeechRecognizer();
    Choices colors = new Choices();
    colors.Add(new string[] { "red arrow", "green", "blue" });
    GrammarBuilder gb = new GrammarBuilder();
    gb.Append(colors);
    Grammar g = new Grammar(gb);
    sr.LoadGrammar(g);
    // SpeechSynthesizer s = new SpeechSynthesizer();
    // s.SpeakAsync("start speaking");
    sr.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(sr_SpeechRecognized);
}
void sr_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    MessageBox.Show(e.Result.Text);
}
This is normal speech recognition code using the MS speech engine, and you can see that I have loaded some grammar. But there is an issue: it responds not only to the given grammar but also to the MS built-in speech commands, like the commands to minimize a window, open the Start menu, etc.
I really don't need that. My application should respond only to my grammar and not to the MS built-in commands. Is there a way I can achieve this?
The SpeechRecognizer object builds on top of the existing Windows Speech system. From MSDN:
Applications use the shared recognizer to access Windows Speech Recognition. Use the SpeechRecognizer object to add to the Windows speech user experience.
Consider using a SpeechRecognitionEngine object instead as this runs in-process rather than system-wide.
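As a sketch, your button handler could be rewritten along these lines (note that, unlike the shared SpeechRecognizer, the in-process SpeechRecognitionEngine needs its audio input and recognition mode set explicitly, as shown in the last answer below):

```csharp
// In-process recognizer: does not interact with Windows Speech Recognition,
// so saying "start" or "minimize" will no longer trigger Windows commands.
SpeechRecognitionEngine engine = new SpeechRecognitionEngine();

private void button2_Click(object sender, EventArgs e)
{
    Choices colors = new Choices();
    colors.Add(new string[] { "red arrow", "green", "blue" });
    Grammar g = new Grammar(new GrammarBuilder(colors));
    engine.LoadGrammar(g);
    engine.SpeechRecognized += sr_SpeechRecognized;

    // Required for the in-process engine:
    engine.SetInputToDefaultAudioDevice();
    engine.RecognizeAsync(RecognizeMode.Multiple);
}
```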
I need a way to make the speech-to-text smarter, as it gets many of the words wrong in the transcription. I cannot find much help on adding a list of words (not commands or grammar, but plain words) to help better transcribe an audio recording.
Here is the code I found on the web. It works, but I need a way to train the engine, or otherwise make it smarter. Any ideas?
Thanks.
static void Main(string[] args)
{
    // Create an in-process speech recognizer for the en-US locale.
    using (SpeechRecognitionEngine recognizer =
        new SpeechRecognitionEngine(
            new System.Globalization.CultureInfo("en-US")))
    {
        // Create and load a dictation grammar.
        recognizer.LoadGrammar(new DictationGrammar());

        // Add a handler for the speech recognized event.
        recognizer.SpeechRecognized +=
            new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);

        // Configure input to the speech recognizer.
        recognizer.SetInputToWaveFile(@"c:\test2.wav");

        // Start asynchronous, continuous speech recognition.
        recognizer.RecognizeAsync(RecognizeMode.Multiple);

        // Keep the console window open.
        while (true)
        {
            Console.ReadLine();
        }
    }
}

// Handle the SpeechRecognized event.
static void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    Console.WriteLine("Recognized text: " + e.Result.Text);
    using (System.IO.StreamWriter file = new System.IO.StreamWriter(@"C:\WriteLines2.txt", true))
    {
        file.WriteLine(e.Result.Text);
    }
}
I'm making a program that uses the System.Speech namespace (it's a simple program that will launch movies). I load all of the filenames from a folder and add them to the grammars I want to use. It's working remarkably well; however, there is a hitch: I DON'T want Windows Speech Recognition to interact with Windows at all (i.e. when I say "start", I don't want the Start menu to open... I don't want anything to happen).
Likewise, I have a listbox for the moment that lists all of the movies found in the directory. When I say the show/movie that I want to open, the program isn't recognizing that the name was said, because Windows Speech Recognition is selecting the ListBoxItem from the list instead of passing it to my program.
The recognition is otherwise working, because I have words like "stop", "play", "rewind" in the grammar, and when I catch listener_SpeechRecognized, it correctly knows the word(s)/phrase that I'm saying (currently it just types it in a textbox).
Any idea how I might be able to do this?
I'd use the SpeechRecognitionEngine class rather than the SpeechRecognizer class. This creates a speech recognizer that is completely disconnected from Windows Speech Recognition.
private bool Status = false;
SpeechRecognitionEngine sre = new SpeechRecognitionEngine();
Choices dic = new Choices(new String[] {
    "word1",
    "word2",
});

public Form1()
{
    InitializeComponent();
    Grammar gmr = new Grammar(new GrammarBuilder(dic));
    gmr.Name = "myGMR";

    // Load my dictionary
    sre.LoadGrammar(gmr);
    sre.SpeechRecognized +=
        new EventHandler<SpeechRecognizedEventArgs>(sre_SpeechRecognized);
    sre.SetInputToDefaultAudioDevice();
    sre.RecognizeAsync(RecognizeMode.Multiple);
}

private void button1_Click(object sender, EventArgs e)
{
    if (Status)
    {
        button1.Text = "START";
        Status = false;
        stslable.Text = "Stopped";
    }
    else
    {
        button1.Text = "STOP";
        Status = true;
        stslable.Text = "Started";
    }
}

public void sre_SpeechRecognized(object sender, SpeechRecognizedEventArgs ev)
{
    String theText = ev.Result.Text;
    MessageBox.Show(theText);
}