First off, I must admit that I'm fairly new to C#. I am using C# speech recognition to write an application that will interface between a human and a robot. An example of a dialogue is as follows:
Human: Ok Robot, Drill.
Robot: Where?
Human shows where.
Robot: I am ready to drill.
Human: Ok Robot, start.
My approach is to use two speech recognizers. The first one is for higher-level commands such as "Drill", "Cut Square", "Cut Rectangle", "Play Game", etc. The second one is for start/stop commands for each of the higher-level tasks.
This is my current code:
using System.IO;
using System.Speech.Recognition;
using System.Speech.Synthesis;
namespace SpeechTest
{
class RobotSpeech
{
public SpeechRecognitionEngine MainRec = new SpeechRecognitionEngine();
public SpeechRecognitionEngine SideRec = new SpeechRecognitionEngine();
public SpeechSynthesizer Synth = new SpeechSynthesizer();
private Grammar mainGrammar;
private Grammar sideGrammar;
private const string MainCorpusFile = @"../../../MainRobotCommands.txt";
private const string SideCorpusFile = @"../../../SideRobotCommands.txt";
public RobotSpeech()
{
Synth.SelectVoice("Microsoft Server Speech Text to Speech Voice (en-US, ZiraPro)");
MainRec.SetInputToDefaultAudioDevice();
SideRec.SetInputToDefaultAudioDevice();
BuildGrammar('M');
BuildGrammar('S');
}
private void BuildGrammar(char w)
{
var gBuilder = new GrammarBuilder("Ok Robot");
switch (w)
{
case 'M':
gBuilder.Append(new Choices(File.ReadAllLines(MainCorpusFile)));
mainGrammar = new Grammar(gBuilder) { Name = "Main Robot Speech Recognizer" };
break;
case 'S':
gBuilder.Append(new Choices(File.ReadAllLines(SideCorpusFile)));
sideGrammar = new Grammar(gBuilder) { Name = "Side Robot Speech Recognizer" };
break;
}
}
public void Say(string msg)
{
Synth.Speak(msg);
}
public void MainSpeechOn()
{
Say("Speech recognition enabled");
MainRec.LoadGrammarAsync(mainGrammar);
MainRec.RecognizeAsync(RecognizeMode.Multiple);
}
public void SideSpeechOn()
{
SideRec.LoadGrammarAsync(sideGrammar);
SideRec.RecognizeAsync();
}
public void MainSpeechOff()
{
Say("Speech recognition disabled");
MainRec.UnloadAllGrammars();
MainRec.RecognizeAsyncStop();
}
public void SideSpeechOff()
{
SideRec.UnloadAllGrammars();
SideRec.RecognizeAsyncStop();
}
}
}
In my main program I have the speech-recognized event handler as follows:
private RobotSpeech voiceIntr;
voiceIntr.MainRec.SpeechRecognized += MainSpeechRecognized;
private void MainSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
if (!e.Result.Text.Contains("Ok Bishop")) return;
switch (e.Result.Text.Substring(10))
{
case "Go Home":
voiceIntr.Say("Going to home position.");
UR10MoveRobot.GoHome();
break;
case "Say Hello":
voiceIntr.Say("Hello. My name is Bishop.");
break;
case "Drill":
voiceIntr.Say("Show me where you want me to drill.");
// Actual code will observe the gestured points and
// return the number of points observed
var msg = "I am ready to drill those " + new Random().Next(2, 5) + " holes.";
voiceIntr.Say(msg);
voiceIntr.SideSpeechOn();
voiceIntr.SideSpeechOff();
break;
case "Cut Outlet":
voiceIntr.Say("Show me where you want me to cut the outlet.");
// Launch gesture recognition to get point for cutting outlet
break;
case "Stop Program":
voiceIntr.Say("Exiting Application");
Thread.Sleep(2200);
Application.Exit();
break;
}
}
The problem I am having is that when one of the MainRec events gets triggered, I am inside one of the cases here. At that point I only want to listen for "Ok Robot Start" and nothing else, which is handled by the SideRec. If I subscribe to that event here, it will go to another event handler with its own switch cases, and from there I wouldn't know how to get back to the main thread.
Also, after telling the human that the robot is ready for drilling, I would like it to block until it receives an answer from the user, for which I need to use a synchronous speech recognizer. However, after a particular task I want to switch off the recognizer, which I can't do if it's synchronous.
Here are the files for the grammars:
MainRobotCommands.txt
Go Home
Say Hello
Stop Program
Drill
Start Drilling
Cut Outlet
Cut Shap
Play Tic-Tac-Toe
Ready To Play
You First
SideRobotCommands.txt:
Start
Stop
The speech recognition is only part of a bigger application, hence it has to be async unless I want it to block everything else. I am sure there is a better way to design this code, but I'm not sure my knowledge of C# is enough for that. Any help is greatly appreciated!
Thanks.
My approach is to use two speech recognizers.
There is no need for two recognizers; you can have just one recognizer and load/unload grammars when you need them.
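For example, a minimal sketch of swapping grammars on a single engine (reusing the mainGrammar and sideGrammar fields you already build):
var rec = new SpeechRecognitionEngine();
rec.SetInputToDefaultAudioDevice();
rec.LoadGrammar(mainGrammar);               // listen for the high-level commands
rec.RecognizeAsync(RecognizeMode.Multiple);
// ...when a task such as drilling starts, switch to the start/stop grammar:
rec.UnloadAllGrammars();
rec.LoadGrammar(sideGrammar);
// ...and switch back to mainGrammar when the task is finished.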
private void BuildGrammar(char w)
Using a switch statement and invoking the same function twice with different arguments is not a straightforward programming style. Just create the two grammars sequentially.
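For example, a sketch of a single BuildGrammars method that could replace the switch (using the same fields and corpus files as in your class):
private void BuildGrammars()
{
    // Each grammar starts with the "Ok Robot" prefix followed by one phrase from its corpus file.
    var mainBuilder = new GrammarBuilder("Ok Robot");
    mainBuilder.Append(new Choices(File.ReadAllLines(MainCorpusFile)));
    mainGrammar = new Grammar(mainBuilder) { Name = "Main Robot Speech Recognizer" };

    var sideBuilder = new GrammarBuilder("Ok Robot");
    sideBuilder.Append(new Choices(File.ReadAllLines(SideCorpusFile)));
    sideGrammar = new Grammar(sideBuilder) { Name = "Side Robot Speech Recognizer" };
}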
The problem I am having is that when one of the MainRec events gets triggered, I am inside one of the cases here. At that point I only want to listen for "Ok Robot Start" and nothing else, which is handled by the SideRec. If I subscribe to that event here, it will go to another event handler with its own switch cases, and from there I wouldn't know how to get back to the main thread.
If you have one recognizer, it's enough to have a single handler and do all the work there.
Also, after telling the human that the robot is ready for drilling, I would like it to block until it receives an answer from the user, for which I need to use a synchronous speech recognizer. However, after a particular task I want to switch off the recognizer, which I can't do if it's synchronous.
It is not easy to mix async and sync styles in the same design; you need to use either sync programming or async programming. For event-based software there is actually no need to mix: you can work in a strictly async paradigm without waiting. Just start the drilling action inside the MainSpeechRecognized event handler when "Ok Robot, Drill" is recognized. You can also synthesize audio in async mode (SpeakAsync) without waiting for the result and continue processing in the SpeakCompleted handler.
To track the state of your software you can create a state variable and check it in the event handlers to understand which state you are in and choose the next action.
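A rough sketch of that idea (the RobotState enum and its values are only illustrative, not part of your code; it assumes the recognized text always starts with "Ok Robot "):
private enum RobotState { Idle, WaitingForDrillStart, Drilling }
private RobotState state = RobotState.Idle;

private void MainSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    var command = e.Result.Text.Substring(9); // text after "Ok Robot "
    switch (state)
    {
        case RobotState.Idle:
            if (command == "Drill")
            {
                voiceIntr.Say("Show me where you want me to drill."); // or Synth.SpeakAsync(...) to avoid blocking
                state = RobotState.WaitingForDrillStart;
            }
            break;
        case RobotState.WaitingForDrillStart:
            if (command == "Start")
            {
                state = RobotState.Drilling;
                // start the drilling action here, asynchronously
            }
            break;
    }
}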
This programming paradigm is called "event-driven programming"; you can read a lot about it online, check the Wikipedia page, and start with this tutorial.
Related
In my current project, I have music that plays depending on the game state. Like most games, once the soundtrack ends it then repeats. I'd like to implement such a feature in my game so it plays automatically; however, I'm uncertain where I'd start.
The code/algorithm must detect that the soundtrack has ended and repeat it, bearing in mind that it will also have more code connected to it once I create the game settings and sounds sub-menu where the user can change the sounds etc. This is what I have so far:
static public WindowsMediaPlayer Introthemetune = new WindowsMediaPlayer();
public LaunchScreen()
{
this.Opacity = 0;
InitializeComponent();
Introthemetune.URL = "Finalised Game Soundtrack.mp3";
}
You can do this in two ways:
1. Subscribe to PlayStateChange Event
You would need to subscribe to the PlayStateChange event and check the new state (with WMPLib.WindowsMediaPlayer, as in your code, the handler receives the new state as an int):
Introthemetune.PlayStateChange += Introthemetune_PlayStateChange;
private void Introthemetune_PlayStateChange(int NewState)
{
    if (NewState == (int)WMPLib.WMPPlayState.wmppsMediaEnded) // 8 = media ended
    {
        // Play Again
    }
}
You can read more on the different play states here
2. Set Loop Mode.
The easiest approach would be to set the loop mode.
Introthemetune.settings.setMode("loop", true);
Introthemetune.URL = "Finalised Game Soundtrack.mp3";
This would ensure the track is repeated continuously.
We had our thesis revision, and one panelist said that one way to make our speech recognition system better is to add a way to track a command while it is being processed.
For example, if I give a default command:
case "What time is it?":
WVASRS.Speak("time");
break;
I want the system to be able to tell the user that the command is still being processed before the user can give another command. Something like:
// I give a command while the other command is still processing
case "What day is it":
WVASRS.Speak("day");
break;
WVASRS.Speak("Another command is still processing. Please Wait for a few seconds before you can give another command.");
Well, I found something on MSDN but it just does nothing. Here is the snippet:
// Assign input to the recognizer and start asynchronous
// recognition.
recognizer.SetInputToDefaultAudioDevice();
completed = false;
Console.WriteLine("Starting asynchronous recognition...");
recognizer.RecognizeAsync(RecognizeMode.Multiple);
// Wait 30 seconds, and then cancel asynchronous recognition.
Thread.Sleep(TimeSpan.FromSeconds(30));
recognizer.RecognizeAsyncCancel();
// Wait for the operation to complete.
while (!completed)
{
Thread.Sleep(333);
}
Console.WriteLine("Done.");
The code from MSDN is more or less correct; you are just missing the handler for the recognized command. See the complete example here:
recognizer.SpeechRecognized +=
new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);
and later
// Handle the SpeechRecognized event.
static void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    if (actionRunning) {
        speak("Busy");
        return; // do not start a new command while the previous one is still running
    }
    if (e.Result.Text.Equals("what is the time")) {
        actionRunning = true;
        startAction();
    }
    Console.WriteLine("Speech recognized: " + e.Result.Text);
}
I have a program that is similar to an ATM application. I use my Android device as the keypad. There is no problem transmitting data from the phone to my C# program; I use threading. My problem is how to really "start" the ATM application.
The only input device is the Android device; I will not use a computer keyboard or anything else. So once the program is shown, the first thing it does is ask what kind of transaction it is.
private void FrontPage_Shown(object sender, EventArgs e)
{
Ask_Transaction();
}
and inside that... (Edited part)
public void Ask_Transaction()
{
string decoded_input = "";
decoded_input = KT.CheckPress(input); //"input" is the data from the arduino. the checkpress will decode it.
do
{
try
{
//input = serialPort.ReadLine();
switch (KT.CheckPress(input))
{
case "S5": //Card;
break;
case "S6": //Cardless
{
PF.Format_Cardless();
AskLanguage(); //if cardless, it will now ask the language. Once again, another input from serial is needed
}
break;
case "Cancel": PF.Format_TransactionCancelled();
break;
default:
break;
}
}
catch
{
//To catch ReadTimeout after 6 seconds
//"You do not respond to the alloted time.."
}
}
while (decoded_input != "S5"
|| decoded_input != "S6"
|| decoded_input != "Cancel");
}
My problem is that when I try to loop Ask_Transaction until I get the correct inputs (S5, S6, or Cancel) using a do-while loop, my program lags and eventually crashes. No errors are displayed.
EDIT:
Of course we cannot assume that the user will press the correct key. Usually, when you press just numbers on the keypad, an ATM program will not notify you at first; it will just wait for possible correct inputs. Also, if the user enters S6, the program will then ask for another input using the keypad keys S5 and S6. The problem is that the program is not "continuous". Any kind of help will be appreciated.
I am once again a bit stuck in my practising.
I want an MP3 file to play when I open my program. I can do that; I've got music.
I also want a checkbox which allows pausing the music, but either I'm very tired or the thing won't work: nothing happens when I check/uncheck it. I've done it like this:
public void PlayPause(int Status)
{
WMPLib.WindowsMediaPlayer wmp = new WMPLib.WindowsMediaPlayer();
switch (Status)
{
case 0:
wmp.URL = "Musik.mp3";
break;
case 1:
wmp.controls.play();
break;
case 2:
wmp.controls.pause();
break;
}
}
Upon opening the program, the method is called with case 0. Music plays. All good.
However, this doesn't work, and I don't get why, as it is pretty simple code.
public void checkBox1_CheckedChanged(object sender, EventArgs e)
{
if (checkBox1.Checked == true)
{
PlayPause(2);
}
else if (checkBox1.Checked == false)
{
PlayPause(1);
}
}
Any idea as to why checking the checkbox doesn't pause/unpause the music?
You're instantiating a completely new WindowsMediaPlayer object each time you call that PlayPause function.
Thus, when you call pause later on, you're pausing nothing.
You need to hold or pass a reference to that WMP object around, so that you're operating on the same one.
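For example, a minimal sketch that keeps one player instance in a field and reuses it (same names as in your code):
private readonly WMPLib.WindowsMediaPlayer wmp = new WMPLib.WindowsMediaPlayer();

public void PlayPause(int status)
{
    switch (status)
    {
        case 0:
            wmp.URL = "Musik.mp3"; // assigning the URL also starts playback
            break;
        case 1:
            wmp.controls.play();   // resume the same player instance
            break;
        case 2:
            wmp.controls.pause();  // pause the same player instance
            break;
    }
}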
Well it's because you are creating a new media player every time you call PlayPause. Create it in the constructor and it should be fine.
I want to use Kinect SDK voice recognition to run an application from the Metro UI.
For example, when I say the word "News", it would launch the News application from the Metro UI.
Thanks to everybody!
Regards!
First you need to make the connection with the audio stream and start listening:
private KinectAudioSource source;
private SpeechRecognitionEngine sre;
private Stream stream;
private void CaptureAudio()
{
this.source = KinectSensor.KinectSensors[0].AudioSource;
this.source.AutomaticGainControlEnabled = false;
this.source.EchoCancellationMode = EchoCancellationMode.CancellationOnly;
this.source.BeamAngleMode = BeamAngleMode.Adaptive;
RecognizerInfo info = SpeechRecognitionEngine.InstalledRecognizers()
.Where(r => r.Culture.TwoLetterISOLanguageName.Equals("en"))
.FirstOrDefault();
if (info == null) { return; }
this.sre = new SpeechRecognitionEngine(info.Id);
if(!isInitialized) CreateDefaultGrammars();
sre.LoadGrammar(CreateGrammars()); //Important step
this.sre.SpeechRecognized +=
new EventHandler<SpeechRecognizedEventArgs>
(sre_SpeechRecognized);
this.sre.SpeechHypothesized +=
new EventHandler<SpeechHypothesizedEventArgs>
(sre_SpeechHypothesized);
this.stream = this.source.Start();
this.sre.SetInputToAudioStream(this.stream, new SpeechAudioFormatInfo(
EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
this.sre.RecognizeAsync(RecognizeMode.Multiple);
}
First, you can see in the sample that there is one important step, sre.LoadGrammar(CreateGrammars());, which creates and loads the grammar, so you have to create the CreateGrammars() method:
private Grammar CreateGrammars()
{
var KLgb = new GrammarBuilder();
KLgb.Culture = sre.RecognizerInfo.Culture;
KLgb.Append("News");
return new Grammar(KLgb);
}
The above sample creates a grammar listening for the word "News". Once it is recognized (the probability that the spoken word is one in your grammar is higher than a threshold), the speech recognition engine (sre) raises the SpeechRecognized event.
Of course you need to add the proper handlers for the two events (SpeechHypothesized, SpeechRecognized):
private void sre_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
this.RecognizedWord = e.Result.Text;
if (e.Result.Confidence > 0.65) InterpretCommand(e);
}
Now all you have to do is write the InterpretCommand method, which does whatever you want (such as running a Metro app ;) ). If you have multiple words in the dictionary, the method has to parse the recognized text and verify that it was the word "News" that was recognized.
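A rough sketch of what InterpretCommand could look like (launching the app through a protocol URI; "bingnews:" is only an assumed example scheme, substitute the URI scheme registered by the app you actually want to start):
private void InterpretCommand(SpeechRecognizedEventArgs e)
{
    string word = e.Result.Text;
    if (word == "News")
    {
        // Start the Metro app via a URI scheme it registers with Windows.
        System.Diagnostics.Process.Start("bingnews:");
    }
}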
Here you can download the samples of a great book on Kinect: Beginning Kinect Programming with the Microsoft Kinect SDK (unfortunately the book itself is not free). In the folder Chapter7\PutThatThereComplete\ there is a sample using audio that you can take inspiration from.