How to recognize adjectives in speech? - c#

For a project, I would like to have people talk in front of a Kinect (v1) and every adjective they say should appear on a screen.
Unfortunately, I'm new to Kinect development and I'm having real trouble finding good documentation and tutorials.
I did some testing of the idea, but the best I could do is hack one of the Kinect SDK samples for basic speech recognition and put some adjectives in the grammar.
The problem is that this doesn't work well when saying full sentences in front of the Kinect, and you have to speak pretty loudly and close to the Kinect. I will not be able to place the Kinect that close to the speaker.
I've also tried using System.Speech and (like the sample) Microsoft.Speech. However, neither seems to support the Dictation grammar. I researched what the error (Grammar referenced by grammar not found) means, and it seems to indicate that the dictation functionality is not installed on my system, or something similar. I don't know how to proceed from here.
Can you point me to some other things I could try to recognize the adjectives people are saying in front of a Kinect (or another microphone)?

I ended up using the grammar-based speech recognition which I simply prefilled with a lot of adjectives. Users were limited to saying one word at a time, instead of being able to speak freely, but I couldn't make it work otherwise.

Related

Xamarin Forms Image Recognition

I would like to know if there is any recognition system for Xamarin Forms that can recognise a point (for example a green filled circle) with the camera, in order to extract info from that point (like coordinates).
I know that EmguCV might be able to do that, but the samples are not working, and to use it with Xamarin Forms you have to pay for a commercial license, which does not make sense to me if I can't test it first.
Any info about this would be greatly appreciated.
I see 3 ways for you:
- Use EmguCV. I use EmguCV with Xamarin Forms, and it works pretty well, but it's pretty complicated to configure. Try this tutorial: Using Emgu with Xamarin Forms. I think you can test it without buying a licence, but only on a simulator.
- Use an Azure service called "Custom Vision". You can train a model (a neural network, I think) to recognize objects in your pictures. Take a look here (there is a free plan): Custom vision Azure service
- Finally, if you have enough skill in image processing, you can do it yourself (there are many tutorials on the web).
For me the first solution is the best (Emgu is really powerful), so if you plan to use it for several projects, I suggest you buy a licence.
The "Custom Vision" Azure service looks really convenient, but I don't know if it fits your needs. You will have to test it, and the free plan is limited too.
Good luck
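For the "do it yourself" route, the core idea can be sketched in a few lines: threshold the pixels that look green and average their coordinates. This is only a minimal sketch under assumptions that are not in the answer above: the camera frame has already been copied into a row-major byte array with 3 bytes per pixel in R, G, B order (how you obtain that array is platform specific and not shown), and the thresholds and the `GreenDotFinder` name are hypothetical values chosen for illustration.

```csharp
using System;

public static class GreenDotFinder
{
    // Returns the centroid (in pixel coordinates) of all "green enough" pixels,
    // or null if no pixel passes the threshold.
    // rgb is assumed to hold width * height * 3 bytes, row-major, in R, G, B order.
    public static (double X, double Y)? FindCentroid(byte[] rgb, int width, int height)
    {
        long sumX = 0, sumY = 0, count = 0;
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                int i = (y * width + x) * 3;
                byte r = rgb[i], g = rgb[i + 1], b = rgb[i + 2];

                // Hypothetical thresholds: strong green channel, weak red and blue.
                if (g > 150 && r < 100 && b < 100)
                {
                    sumX += x;
                    sumY += y;
                    count++;
                }
            }
        }

        if (count == 0) return null;
        return ((double)sumX / count, (double)sumY / count);
    }
}
```

Calling `GreenDotFinder.FindCentroid(pixels, width, height)` on each frame gives the coordinates of the dot. Fixed RGB thresholds are fragile under changing light, which is one reason a library such as EmguCV may still be the more robust choice.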

different images from different point of view

I want different images to be displayed depending on the viewer's point of view. For an explanation of the whole concept, please look at the images; they explain my idea.
As you can see in the first image, three people are looking at the monitor from different angles. I want the webcam to track their eyes and show a particular predefined image to the user. For example: if the user is at a 45 degree angle, then show image1.png.
The computer should choose the image depending on the user's viewing perspective.
(the lady is the game character for representation purpose)
Can you please guide me on what steps I can take to accomplish this? Is there any plugin available for Unity that tracks faces?
Also thanks for the compliments on my sketching skills xD
Stack Overflow is not really meant for recommending plugins, since the choice is usually opinion-based and there is no exact answer.
That being said, one of the most commonly used APIs for computer vision (meaning interpreting images, including face recognition) is OpenCV, so that could be a good place for you to start.
And fortunately for you, there is a Unity plugin for OpenCV.
It is too broad to give you more details about how it works here. You should try to make it work, and if you have a problem with your code, open a new question with the code portion that you struggle with.
PS: nice sketching skills
Perhaps an easier option would be to use a Kinect (trying to detect a face or eyes from that far away might be shaky).
With a Kinect you can get skeletons for multiple people, and getting the angle between the target and those Kinect avatars would be easy.
If there is no space to put the Kinect in a good position, you could consider placing it on the ceiling above (and then use only the depth data to detect people in its view).
The only issue is that Microsoft has apparently stopped supporting the Kinect on Windows, so you would need to find second-hand units. (The Unity Asset Store still has some Kinect plugins and examples available.)
https://www.polygon.com/2018/1/2/16842072/xbox-one-kinect-adapter-out-of-stock-production-ended
Or look for Kinect alternatives that work with Unity, such as RealSense cameras:
https://www.intel.sg/content/www/xa/en/architecture-and-technology/realsense-overview.html
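Whichever tracker you end up with, the "angle between the screen and each tracked person" step can be sketched as follows. This is a minimal sketch, not part of any Kinect or RealSense API: it assumes the tracker gives you the person's position relative to the sensor as a sideways offset `x` and a forward distance `z` in the same unit, and the `ViewAngle` class, the 15 degree threshold, and the image file names are hypothetical.

```csharp
using System;

public static class ViewAngle
{
    // Horizontal viewing angle in degrees: 0 means straight in front of the sensor.
    // The sign of the result follows the sign of x, which depends on your tracker's
    // coordinate convention (an assumption you need to check for your device).
    public static double FromPosition(double x, double z)
    {
        return Math.Atan2(x, z) * 180.0 / Math.PI;
    }

    // Maps an angle to one of three hypothetical image names.
    public static string PickImage(double angleDegrees)
    {
        if (angleDegrees < -15.0) return "image_negative_side.png";
        if (angleDegrees > 15.0) return "image_positive_side.png";
        return "image_center.png";
    }
}
```

For example, a person standing one metre to the side and one metre in front of the sensor is at roughly 45 degrees, so `ViewAngle.PickImage(ViewAngle.FromPosition(1.0, 1.0))` selects the positive-side image. In Unity you would call this once per frame and swap the displayed texture when the result changes.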

Language learning program, word recognition and comparing

I'm trying to create a Polish/English language learning program.
I'm using C#, and ideally would like to support Windows XP, Vista and (obviously) newer versions.
At the beginning, the computer selects a random Polish or English word and "says" it. The user is then expected to say the same word in the other language, and the program evaluates the answer. If the user said the correct word, they gain a point; otherwise they lose a point.
My first idea was to use a speech-to-text library (like System.Speech), but it turns out that:
- the Polish language is not very well supported
- speech-to-text is (afaik) not optimized for comparing words
Is there a better way to do it?
Do you know of any library that can do this? (Ideally a managed library, but I'm OK with creating my own C# wrapper around unmanaged code.)
Is there a name for what I want to achieve (comparing spoken words)?
Should I stick to speech-to-text libraries or find another algorithm?
I really tried to google a solution, but I wasn't sure which keywords to search for. The best I could find was this thread: Language learning speech recognition tools. The solution presented there kind of works for me, but it is problematic to deploy (I want a standalone application with minimal installation), and testing the 'correctness' of a word that way is a bit weird (I am only 'recognising' a single word).
Any help would be really appreciated. Sorry for my poor English.
You might want to read about speech recognition using neural networks if you intend to do some work in this area.
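If you do stay with a speech-to-text library, the comparison and scoring step described in the question can be kept separate from the recognizer. The sketch below is one possible approach, not something proposed in the thread: it assumes some recognizer has already returned a text string, and it uses Levenshtein edit distance so that a small transcription slip still counts as correct. The `WordMatcher` class and the default tolerance of one edit are hypothetical choices.

```csharp
using System;

public static class WordMatcher
{
    // Levenshtein edit distance: the minimum number of single-character
    // insertions, deletions and substitutions needed to turn a into b.
    public static int EditDistance(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            var tmp = prev; prev = curr; curr = tmp; // reuse the two row buffers
        }
        return prev[b.Length];
    }

    // Compares the recognized text with the expected word, ignoring case and
    // surrounding whitespace, and tolerating up to maxDistance edits.
    public static bool IsMatch(string recognized, string expected, int maxDistance = 1)
    {
        string r = recognized.Trim().ToLowerInvariant();
        string e = expected.Trim().ToLowerInvariant();
        return EditDistance(r, e) <= maxDistance;
    }

    // Scoring rule from the question: +1 for a correct word, -1 otherwise.
    public static int UpdateScore(int score, string recognized, string expected)
    {
        return IsMatch(recognized, expected) ? score + 1 : score - 1;
    }
}
```

For example, `WordMatcher.EditDistance("kitten", "sitting")` is 3, so those two words would not match with the default tolerance. Note that a tolerance of one edit is too generous for very short words ("cat" would match "car"), so you may want to scale `maxDistance` with word length or set it to 0 for short words.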

C# Speech Recognition from System Audio (Speaker Sound)

I've seen speech recognition from input devices (obviously) and I've seen speech recognition from files (http://gotspeech.net/forums/thread/6835.aspx). However, I was wondering whether it would be possible to run speech recognition on system audio in real time. By system audio, I mean the sound that comes out of your speakers.
It would be a great tool for those who are hard of hearing: as they watch YouTube videos, the C# application could transcribe what's being said.
How could I go about doing this?
Very easily - Go to the sound mixer, choose input and enable/unmute "Stereo Mix". You should, of course, mute the mic if you don't want to record that too. Then, just start recording the same way you'd record the mic - now you'll get the same feed as the speakers at digital quality.
This can be done programmatically, although it can be fiddly, especially if you want to support WinXP as well as Vista/Win7 (sound was overhauled in Vista and I believe the APIs are significantly different, although I haven't had to use them yet).
You're almost certainly going to need to filter the sound before attempting recognition. Unless the speech recognition library you're using is designed to work in adverse conditions, music and special effects will interfere with proper recognition, as will multiple people speaking at the same time.
If you haven't got a super-robust library, filters to attenuate non-vocal frequencies are going to be a must. You may also need to apply volume normalisation to account for loud/quiet scenes - There are hundreds of filters that could potentially improve matching.
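As an illustration of the volume normalisation step, here is a minimal sketch of simple peak normalisation. It assumes the captured audio has already been converted to floating-point samples in the range -1 to 1; the `AudioLevel` class and the 0.9 target peak are hypothetical, and a real pipeline would more likely normalise over a sliding window than over a whole buffer at once.

```csharp
using System;

public static class AudioLevel
{
    // Scales the buffer so that its loudest sample reaches targetPeak.
    // Returns a new array; the input is left untouched.
    public static float[] NormalizePeak(float[] samples, float targetPeak = 0.9f)
    {
        // Find the largest absolute sample value in the buffer.
        float peak = 0f;
        foreach (float s in samples)
        {
            peak = Math.Max(peak, Math.Abs(s));
        }

        var result = new float[samples.Length];
        if (peak == 0f) return result; // pure silence: nothing to scale

        float gain = targetPeak / peak;
        for (int i = 0; i < samples.Length; i++)
        {
            result[i] = samples[i] * gain;
        }
        return result;
    }
}
```

A quiet buffer whose loudest sample is 0.45 would be scaled by a gain of 2, bringing that sample to 0.9. Peak normalisation only evens out overall level between scenes; it does not remove music or effects, so it complements frequency filtering rather than replacing it.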
You may want to access the recognition API at the lowest level to get as much control as possible - You'll need to tweak it to cope with people shouting, breathless, crying, etc... If you start designing for flexible low-level access, it will probably save you weeks if you find you need it later on and have to re-architect.
I'd suggest you look into NAudio as a starting point for audio processing
I suspect you'll be able to get something which works under ideal conditions without too much effort - but tweaking it to work well in all eventualities may be a mammoth task. That said, it sounds like a fun project.
You could improve recognition chance considerably by creating genre-, user- or show-specific dictionaries. These could either be pre-generated, or built automatically using a weighted feedback loop - perhaps also allowing the user to correct mistakes.

Picking my next graphics engine (Java vs. C#)

Requirements
I am developing a music game that requires access to the audio line-in and classes to help me analyze a MIDI file (playing the MIDI is NOT necessary for me). Secondly, I need a graphics engine that allows easy and quick development (within reason). The game's focus is not cutting edge graphics - think along the lines of Audiosurf.
Issue 1
Java provides easy-to-use and well-documented audio line-in input and MIDI file support built right into the API, which I could not find for C#. I found some resources for reading from the line-in and some MIDI helper classes for C#, but they don't have much documentation or support and seem to be workarounds for a lack of built-in support in C#.
Issue 2
The second aspect of the game is of course the graphics engine. On the C# side, XNA seems to be the clear choice for my needs. On the Java side, I'm leaning towards JMonkeyEngine (or ogre4j as a second choice). JMonkeyEngine seems to be fine for my graphical uses but the documentation is scattered and sparse.
Deciding
Both issues are of equal importance. Also, I know the community here is predominantly .NET programmers, so try to consider both languages if possible.
Use Processing: http://www.processing.org/
It seems that for now you mostly want to test and see whether your concept can actually be done (and is cool).
Processing is more or less made for this sort of thing: it is a programmatic sketchpad for audio and visuals. With very little code you can see whether your ideas hold up the way you want.
It's based on Java, so you could use Java inside or outside of it, depending on some factors.
Yes, you could use something in .NET, XNA/WPF or whatever, but to me that seems premature.
Test your ideas first.
For the .NET and audio side of things, I have written some code to read and write MIDI files and included it as part of NAudio. Have a look at MIDI File Mapper for an example of how to make use of this. NAudio also includes the capability to capture microphone input.
