I am making a flashlight application in Unity with C#. The application is almost complete; I just want to add a voice command feature, so that when I say "ON" the flashlight turns on and when I say "OFF" it turns off. The application is for Android devices. I have seen several tutorials about calling functions on voice commands, but they were all for the Windows platform only. Please help if you know how to do this on Android. Thanks.
I have not used any speech recognition tools myself, but it is not very difficult to implement if you can create a Java plugin and use it to call the native functions. Anyway, I have found a few SDKs:
You can check out the PocketSphinx demos for speech recognition.
https://github.com/cmusphinx/pocketsphinx
https://github.com/cmusphinx/pocketsphinx-android-demo
Here is a repo I found which uses AndroidSpeechRecognition.
https://github.com/gsssrao/UnityAndroidSpeechRecognition
Programmer has given a nice explanation of implementing voice recognition natively:
How to add Speech Recognition to Unity project?
Then there is the Watson SDK for Unity. It appears to be cloud-based, but you can check it out:
https://github.com/watson-developer-cloud/unity-sdk
And if you don't mind paying, there is a plugin called Android SpeakNow that you can get from the Asset Store:
https://assetstore.unity.com/packages/tools/integration/android-speaknow-16781
These are some cloud-based packages from the Asset Store. I doubt you will need them for this, but they are listed here for anyone who may require them at some point:
https://assetstore.unity.com/packages/add-ons/machinelearning/google-cloud-speech-recognition-vr-ar-desktop-desktop-72625
https://assetstore.unity.com/packages/tools/integration/yandex-cloud-speech-recognition-vr-ar-mobile-desktop-75155
And finally there is DictationRecognizer; as of Unity 2018.2 it is available by default only on Windows 10, so it is out of the question. My best bet would be CMUSphinx or a native implementation, which I believe would be more suitable for your needs. Check them out, try to implement one or two, and let us know whether you were successful.
If anyone knows of more SDKs for voice recognition, feel free to add links. That would be really great.
If you only need ON and OFF voice inputs, you can use the following code:
Speech to text in unity
If you need exact speech recognition, refer to the following code:
Speech recognition in unity
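Whichever recognizer you end up with, the last step for the ON/OFF case is turning the recognized text into a flashlight action. Below is a minimal sketch of that step, written in Java on the assumption that it lives in the Android plugin suggested above. FlashlightCommandParser and its Command enum are hypothetical names, and the input is assumed to be the list of candidate strings the recognizer reports (Android's SpeechRecognizer, for example, delivers its results as a list of strings).

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper: maps recognizer hypotheses to a flashlight command.
final class FlashlightCommandParser {
    enum Command { ON, OFF, NONE }

    private FlashlightCommandParser() {}

    // Returns the command for the first hypothesis that contains the
    // whole word "on" or "off"; returns NONE if no hypothesis matches.
    static Command parse(List<String> hypotheses) {
        for (String hypothesis : hypotheses) {
            if (hypothesis == null) {
                continue;
            }
            // Lower-case, trim, and split on anything that is not a letter,
            // so " OFF " and "Turn it off!" are both handled.
            String[] words = hypothesis.toLowerCase(Locale.ROOT).trim().split("[^a-z]+");
            for (String word : words) {
                if (word.equals("on")) {
                    return Command.ON;
                }
                if (word.equals("off")) {
                    return Command.OFF;
                }
            }
        }
        return Command.NONE;
    }
}
```

For example, FlashlightCommandParser.parse(Arrays.asList("turn it off")) returns Command.OFF, and the plugin could then forward that result to the Unity script that toggles the light. Matching whole words rather than substrings avoids treating "bonfire" as "on" or "office" as "off".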
I'm considering porting a speech 2D HTML5 web game I've built to Unity2D for iPhone and Android. I'm a full-stack web developer, and not a Unity developer, so an agency would help me build the Unity app. Before signing with them, I need to be sure both Speech to Text (STT) and Text to Speech (TTS) services are available for Mandarin, Spanish, and English, otherwise I'd waste a lot of money up front.
For Web, Webkit Speech (STT Docs, STT Demo, TTS Docs, TTS Demo) is easily accessible via the browser. I've found that IBM Watson has an API available, and has demos for STT and TTS, and I've found that they have a Unity SDK here, but I don't have the skillsets to test the Unity SDK.
I'm looking for guidance on great STT and TTS APIs that the agency can use for those three foreign languages.
Does the Unity SDK provide support for frontend STT and TTS audio streaming? STT needs to capture users' voice input and transcribe it quickly. Likewise, TTS needs to allow the user to hover over a target language word and listen to a near-native pronunciation.
Does it offer both STT and TTS for Spanish, Mandarin, and English?
What other NLP APIs are there which meet my requirements?
Apologies, I'm completely new to Unity/phone development so any guidance here would be extremely helpful. If no APIs exist that meet these requirements then Unity won't work for my app since STT and TTS is critical.
Overall, real-time audio recording in Unity is awful; the system is simply not designed to record audio continuously. You can record a clip with Microphone.Start and play it back through an AudioSource, but that is a clip of fixed length, not a streaming solution.
For streaming you can get the audio with OnAudioFilterRead, but that is not really an API for recording; it is meant for effects. Used for recording, it has unpredictable latency and also slows down the UI significantly.
As a result, you can only have a push-to-talk kind of interaction, not real-time interaction.
If you have other alternatives, you had better consider them too. For example, you could consider a native app.
I know how to make it trigger on different phrases, but I want one SpeechRecognitionEngine to listen for a wake-up word (example: trigger), then start another engine to listen for everything until there is enough silence, and then save what I've said. If this isn't possible with SpeechRecognitionEngine, is there a way to do it in C# with an API of some sort?
Porcupine supports C#; it runs on Windows, Linux, and macOS at the moment. You can check the quick start in its GitHub repo.
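The two-stage flow described in the question (wait for the wake word, then capture everything until enough silence, then save) can be sketched as a small state machine that is independent of the engine. This is only an illustration under stated assumptions: it is written in Java, WakeWordDictation is a hypothetical class, and it assumes the wake-word detector and the dictation engine are wrapped so that they deliver one event per interval, with null standing for a silent interval. The same structure should carry over to C# event handlers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-stage listener: idle until the wake word is heard,
// then collect text until enough consecutive silent intervals pass.
final class WakeWordDictation {
    enum State { WAITING_FOR_WAKE_WORD, CAPTURING }

    private final String wakeWord;
    private final int silenceLimit; // silent intervals that end a capture
    private final StringBuilder current = new StringBuilder();
    private final List<String> saved = new ArrayList<>();
    private State state = State.WAITING_FOR_WAKE_WORD;
    private int silentIntervals = 0;

    WakeWordDictation(String wakeWord, int silenceLimit) {
        this.wakeWord = wakeWord;
        this.silenceLimit = silenceLimit;
    }

    // Feed one recognizer event; pass null (or blank text) for silence.
    void onEvent(String text) {
        boolean silent = text == null || text.trim().isEmpty();
        if (state == State.WAITING_FOR_WAKE_WORD) {
            // Stage 1: ignore everything except the wake word.
            if (!silent && text.trim().equalsIgnoreCase(wakeWord)) {
                state = State.CAPTURING;
                silentIntervals = 0;
                current.setLength(0);
            }
            return;
        }
        // Stage 2: accumulate speech, count consecutive silence.
        if (silent) {
            silentIntervals++;
            if (silentIntervals >= silenceLimit) {
                if (current.length() > 0) {
                    saved.add(current.toString());
                }
                state = State.WAITING_FOR_WAKE_WORD;
            }
            return;
        }
        silentIntervals = 0;
        if (current.length() > 0) {
            current.append(' ');
        }
        current.append(text.trim());
    }

    State getState() { return state; }

    List<String> getSaved() { return saved; }
}
```

With a silence limit of 2, feeding "Trigger", "buy milk", null, "and eggs", null, null saves the single utterance "buy milk and eggs" and returns to waiting for the wake word. Resetting the silence counter on every spoken event is what lets short pauses pass without cutting the capture off.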
I'm developing an AI to act as a personal assistant, which takes voice commands as input, matches them against a preloaded grammar, and does some jobs based on the result. Everything is fine, except that because I use Microsoft.Speech, it gets used to my voice. So it gets better and better at recognizing my voice, but it doesn't recognize my wife's voice at all!
So I guess my options are:
1. Dynamically switch profiles.
2. Go with another library.
What should I do?
Any other suggestion would be nice.
I can inject any other library into my robot if you know a better one.
After searching for a while, I finally used wit.ai speech recognition. It works fine for me at the moment, except that it takes a couple of seconds to show me the result. I'm going to try the Google speech engine as well. I'll let you know if it works better than wit.ai.
I saw in the documentation of the Bing Speech API that it is possible to stream microphone input, as it is being recorded, to the REST service (https://learn.microsoft.com/en-us/azure/cognitive-services/speech/home):
Real-time continuous recognition. The speech recognition API enables
users to transcribe audio into text in real time, and supports to
receive the intermediate results of the words that have been
recognized so far.
However, I was not able to find a sample showing how this could be achieved in a cross-platform fashion using Xamarin Forms.
I have found the following tutorial: https://developer.xamarin.com/guides/xamarin-forms/cloud-services/cognitive-services/speech-recognition/
But in that tutorial, the audio stream sent to the API comes from an already existing audio file. What I would like to achieve, however, is to stream the microphone input of the device running the app (Android, iOS, UWP).
Any insight would be appreciated.
I am afraid that there are no libraries compatible with Xamarin that support the real-time Microsoft Speech API. The only compatible option is the Bing Speech API, which uses the REST protocol and does not offer real-time transcription.
Real-time transcription requires the Speech Service WebSocket protocol, which is fully documented. You could implement this interface yourself, but doing it reliably may be quite a complex task.
There are, however, native libraries for iOS and Android which do support real-time streaming. See the tutorial for iOS and the tutorial for Android.
What you could do then is use Xamarin Binding Libraries to bind the native libraries into your Xamarin project. For the Java library see this tutorial, and for the Objective-C library see this tutorial.
Creating the Objective-C binding in particular might be a daunting task, and it is usually easier to create an Objective-C library that acts as a facade over the native library. You will know the interface of your facade library, so you will be able to create the binding more easily. You may also consider asking the Xamarin team to create the binding for you, as they maintain a growing collection of third-party library bindings on GitHub.
I have a cross-platform solution using Bing Speech. I got iOS working but never tested the Android solution.
There is a great library here that should fit your needs:
https://github.com/NateRickard/Xamarin.Cognitive.BingSpeech
I need to perform actions in my Desktop app when a user says certain things, for example, "Save Document" or "Save As" or "Save changes" will raise its corresponding event.
But I don't want to rely on, or even implement buttons (this is an app for me). So setting the AccessibleName or whatever is not good enough. I need more control.
Is there a way to "listen" for commands in a Windows WPF Desktop app? Then raise an event when that command has been spoken?
Since everyone is posting links to Microsoft Speech API, you might still be lost at how to use it.
So here is a tutorial for using the Microsoft Speech API.
Have you seen the Microsoft Speech API, which supports speech recognition?
You are looking for the Microsoft Speech API. (This is a Get Started with Speech Recognition guide with a neat code example; though it is for WinForms, it should work for WPF too.) It allows you to create a grammar to be recognized and to handle the recognized input.
I'm looking into adding speech recognition to my fork of the Kinect-based Hotspotizer app (http://github.com/birbilis/hotspotizer)
After some search I see you can't markup the actionable UI elements
with related speech commands in order to simulate user actions on them
as one would expect if Speech input was integrated in WPF. I'm
thinking of making a XAML markup extension to do that, unless someone
can point to pre-existing work on this that I could reuse...
The following links should be useful:
http://www.wpf-tutorial.com/audio-video/speech-recognition-making-wpf-listen/
http://www.c-sharpcorner.com/uploadfile/mahesh/programming-speech-in-wpf-speech-recognition/
http://blogs.msdn.com/b/rlucero/archive/2012/01/17/speech-recognition-exploring-grammar-based-recognition.aspx
https://msdn.microsoft.com/en-us/library/hh855387.aspx (make use of Kinect mic array audio input)
http://kin-educate.blogspot.gr/2012/06/speech-recognition-for-kinect-easy-way.html
https://channel9.msdn.com/Series/KinectQuickstart/Audio-Fundamentals
https://msdn.microsoft.com/en-us/library/hh855359.aspx?f=255&MSPPError=-2147217396#Software_Requirements
https://www.microsoft.com/en-us/download/details.aspx?id=27225
https://www.microsoft.com/en-us/download/details.aspx?id=27226
http://www.redmondpie.com/speech-recognition-in-a-c-wpf-application/
http://www.codeproject.com/Articles/55383/A-WPF-Voice-Commanded-Database-Management-Applicat
http://www.codeproject.com/Articles/483347/Speech-recognition-speech-to-text-text-to-speech-a
http://www.c-sharpcorner.com/uploadfile/nipuntomar/speech-to-text-in-wpf/
http://www.w3.org/TR/speech-grammar/
https://msdn.microsoft.com/en-us/library/hh361625(v=office.14).aspx
https://msdn.microsoft.com/en-us/library/hh323806.aspx
https://msdn.microsoft.com/en-us/library/system.speech.recognition.speechrecognitionengine.requestrecognizerupdate.aspx
http://blogs.msdn.com/b/rlucero/archive/2012/02/03/speech-recognition-using-multiple-grammars-to-improve-recognition.aspx