I want to make a program that recognizes the notes I play on my guitar over the microphone, but I'm not sure how to make my program recognize the sounds I play and then choose between a bunch of notes.
Can I have any help with this? I basically need a library that can recognize sound played over the microphone and compare it to different audio files to see which one is closest to the played note.
I hope you guys understand this; it is hard to explain.
As Dan Bryant mentioned, you basically want to do an FFT, which gives you the amount of energy at different frequencies. Find the frequency with maximum energy, then choose the note whose frequency is closest to that. This is what is going on in the little digital tuners you buy to help you tune your guitar. There are several available libraries that will do the FFT for you; you just need to specify an FFT size that gives you enough frequency resolution to distinguish between notes.
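For the last step, mapping the peak frequency to a note is just a bit of arithmetic. A rough C# sketch, assuming you already have a magnitude spectrum from whichever FFT library you pick (the array name, sample rate and FFT size here are placeholders):

using System;

static string NearestNote(float[] magnitudes, int sampleRate, int fftSize)
{
    // Find the bin with the most energy (skip bin 0, the DC component).
    int peakBin = 1;
    for (int i = 2; i < magnitudes.Length; i++)
        if (magnitudes[i] > magnitudes[peakBin]) peakBin = i;

    // Convert the bin index to a frequency in Hz.
    double frequency = (double)peakBin * sampleRate / fftSize;

    // Convert the frequency to the nearest MIDI note number (A4 = 440 Hz = note 69).
    int midi = (int)Math.Round(69 + 12 * Math.Log(frequency / 440.0, 2));

    string[] names = { "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B" };
    return names[midi % 12] + (midi / 12 - 1);   // e.g. "E2" for the low E string
}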
I want different images to be displayed from different points of view. For the whole concept explanation, please look at the images; they explain my idea/query!
As in the first image, you can see there are three people looking at the monitor from different angles. Now I want the webcam to track the eyes and show a particular predefined image to the user. For example: if the user is at a 45 degree angle, then show image1.png.
The computer should show the image depending on the user's viewing perspective.
(The lady is the game character, for representation purposes.)
Can you please guide me on what steps can be taken to accomplish this? Is there any plugin available for Unity that tracks faces? Please guide me.
Also, thanks for the compliments on my sketching skills xD
Stack Overflow is not really meant for recommending plugins, since the choice is usually opinion based, so there is no exact answer.
That being said, one of the most commonly used APIs for computer vision (meaning interpreting images, including face recognition) is OpenCV, so that could be a good place for you to start.
And fortunately for you, there is a Unity plugin for OpenCV.
It is too broad to give you more details about how it works here. You should try to get it working, and if you have a problem with your code, open a new question with the code portion that you are struggling with.
PS: nice sketching skills
Perhaps an easier option would be to use a Kinect
(trying to detect faces or eyes from that far might be shaky?).
With a Kinect you can get skeletons for multiple people, and getting the angle between the target and those Kinect avatars would be easy.
If there is no space to put the Kinect in a good position,
you could consider placing it on the ceiling above (and then use the depth data only to detect people in its view).
The only issue is that Microsoft has apparently stopped Windows Kinect support,
so you would need to find second-hand units (the Unity Asset Store still has some Kinect plugins and examples available):
https://www.polygon.com/2018/1/2/16842072/xbox-one-kinect-adapter-out-of-stock-production-ended
Or look for Kinect alternatives that work with Unity, for example RealSense cameras:
https://www.intel.sg/content/www/xa/en/architecture-and-technology/realsense-overview.html
I've seen speech recognition from input devices (obviously) and I've seen speech recognition from files (http://gotspeech.net/forums/thread/6835.aspx). However, I was wondering whether it would be possible to run speech recognition on system audio in real time. By system audio, I mean the sound that comes out of your speakers.
It would be a great tool for those who are hard of hearing: as they watch YouTube videos, the C# application could transcribe what's being said.
How could I go about doing this?
Very easily - go to the sound mixer, choose input, and enable/unmute "Stereo Mix". You should, of course, mute the mic if you don't want to record that too. Then just start recording the same way you'd record the mic - now you'll get the same feed as the speakers at digital quality.
This can be done programmatically, although it can be fiddly - especially if you want to support WinXP as well as Vista/Win7 (sound was overhauled in Vista and I believe the APIs are significantly different, although I haven't had to use them yet).
You're almost certainly going to need to filter the sound before attempting recognition. Unless the speech recog. library you're using is designed to work in adverse conditions, music and special effects will interfere with proper recognition as will multiple people speaking at the same time.
If you haven't got a super-robust library, filters to attenuate non-vocal frequencies are going to be a must. You may also need to apply volume normalisation to account for loud/quiet scenes - There are hundreds of filters that could potentially improve matching.
You may want to access the recognition API at the lowest level to get as much control as possible - You'll need to tweak it to cope with people shouting, breathless, crying, etc... If you start designing for flexible low-level access, it will probably save you weeks if you find you need it later on and have to re-architect.
I'd suggest you look into NAudio as a starting point for audio processing
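For example, a minimal loopback-capture sketch with NAudio's WasapiLoopbackCapture (Vista and later only; writing to a WAV file is just a placeholder - you would feed the buffers to your recognizer instead):

using System;
using NAudio.Wave;

// Capture whatever is currently playing through the default output device.
var capture = new WasapiLoopbackCapture();
var writer = new WaveFileWriter("loopback.wav", capture.WaveFormat);

capture.DataAvailable += (s, e) =>
{
    // e.Buffer holds raw audio from the speakers; pass it on or save it.
    writer.Write(e.Buffer, 0, e.BytesRecorded);
};
capture.RecordingStopped += (s, e) => writer.Dispose();

capture.StartRecording();
Console.ReadLine();        // record until Enter is pressed
capture.StopRecording();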
I suspect you'll be able to get something which works under ideal conditions without too much effort - but tweaking it to work well in all eventualities may be a mammoth task. That said, it sounds like a fun project.
You could improve recognition chance considerably by creating genre-, user- or show-specific dictionaries. These could either be pre-generated, or built automatically using a weighted feedback loop - perhaps also allowing the user to correct mistakes.
I'm currently building a game on Windows Phone 7 using XNA.
I'm trying to get the beats per minute from the song playing as the background song.
I'm also not quite sure if what I want is BPM; what I want is something like the pace or tempo of the music: the faster the pace, the faster the sprites move. What I'm thinking right now is that BPM is how often a frequency from the music hits a defined constant range, e.g. 20 MHz - 30 MHz.
Feel free to correct me if I'm wrong; I'm not really familiar with audio stuff. I have tried using VisualizationData from the XNA MediaLibrary, but after some googling they said that VisualizationData doesn't work on WP7. I also tried it and the output is a 256-length float array containing 0 values. Or if I could do some FFT with it, I'll give it a try.
Thank you...
Like you were saying, you can't get the beats directly; you'll have to interpret this data. If you can preprocess the music yourself and ship it with your title, that would be your best bet.
In XNA you really only have MediaPlayer.GetVisualizationData to work with. There isn't anything built in that allows you to predetermine this sort of thing. It's used like the following, and gets you information about the different frequencies that are playing.
MediaPlayer.IsVisualizationEnabled = true;            // must be enabled before requesting data
VisualizationData visData = new VisualizationData();  // holds 256 frequency and 256 sample values
MediaPlayer.GetVisualizationData(visData);            // call each Update to refresh the data
So how do you take this frequency stuff and make it worthwhile for your application? There's a great breakdown of how you can do this that's on the App Hub forums in this thread called "Audio Analysis" in the reply by jwatte. Essentially, you're going to look at the low frequencies and try to figure out when the beats are coming in. Nothing perfect, but hopefully you'll get something that you approve of.
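As a rough illustration of that idea (the bin count, threshold and smoothing factor here are arbitrary values you would tune by ear, not anything from that thread):

// Fields on your Game class.
float runningAverage = 0f;

bool IsBeat(VisualizationData visData)
{
    // Sum the energy in the lowest bins, where kick drums and bass usually live.
    float lowEnergy = 0f;
    for (int i = 0; i < 16; i++)
        lowEnergy += visData.Frequencies[i];

    // A "beat" is a sudden jump well above the recent average energy.
    bool beat = lowEnergy > runningAverage * 1.4f;

    // Exponential moving average keeps the threshold adapting to the song.
    runningAverage = 0.9f * runningAverage + 0.1f * lowEnergy;
    return beat;
}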
Good luck!
I want to develop a "People Counting System" using OpenCV (or Emgu CV).
Please guide me on how to implement or lead me to some examples or open source projects.
(I have done some work: extracting the diff, then thresholding to remove the background, using motion history and the like; still no good results.)
Edit 1: I am counting a high flow of people (a dozen of them may come through simultaneously).
Edit 2: It must be at least 80% accurate. People are walking through a door that is almost 5 meters wide. The problem is I have no control over the position or angle of the camera. The camera is shooting the place from a 10 m distance at a 2.5 m height.
Thank you
If by a people counting system you mean a system that counts the people who are in a room, then I recommend you implement the hardware with a microcontroller, 2 lasers (normal laser toys work) and 2 photoresistors. For the microcontroller I recommend you use an Arduino. Then make a C# application that has a SerialPort object and reads the data that the Arduino sends over USB. The Arduino would send 1 for "someone entered the room" and 0 for "someone left the room", for example. The logging and statistics can then be done easily in C#.
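On the C# side, a minimal sketch assuming the Arduino sends a '1' or '0' line per event (the COM port name and baud rate are placeholders):

using System;
using System.IO.Ports;

var port = new SerialPort("COM3", 9600);
int peopleInRoom = 0;

port.DataReceived += (s, e) =>
{
    string line = port.ReadLine().Trim();
    if (line == "1") peopleInRoom++;          // someone entered
    else if (line == "0") peopleInRoom--;     // someone left
    Console.WriteLine("People in room: " + peopleInRoom);
};
port.Open();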
Arduino site: here
Photoresistor for $1: here
This solution is a lot cheaper and easier to implement than using a camera of fairly good quality.
Hope I helped you.
Check out the HOG pedestrian detector that comes with recent versions of OpenCV (>= 2.2).
See modules/objdetect/src/hog.cpp and samples/cpp/peopledetect.cpp in the OpenCV sources. Unfortunately there is no official documentation about it yet.
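If you end up using Emgu CV rather than the C++ samples, the equivalent calls look roughly like this (the exact DetectMultiScale return type has changed between Emgu releases, so treat it as a sketch):

using System;
using Emgu.CV;
using Emgu.CV.Structure;

Mat image = CvInvoke.Imread("frame.jpg");

var hog = new HOGDescriptor();
hog.SetSVMDetector(HOGDescriptor.GetDefaultPeopleDetector());

// Each detection is a rectangle (plus a confidence score) around a person.
MCvObjectDetection[] people = hog.DetectMultiScale(image);
Console.WriteLine("Detected " + people.Length + " people in this frame");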
This would help you to count moving things including people: Motion Detection project on CodeProject
Are people the only kind of "entities" in the scene? If not, do you care if you count some other kind of moving thing as a person? Because if that is the case, you could just count blobs that enter or leave the scene. It may sound a bit naive, but I would take some kind of motion image and group motion pixels into clusters by distance. Your distance metric could take some restrictions into account, such as that people will "often" be standing, so pixels in a cluster should group around some kind of regression line (a straight vertical line if the camera is aligned with the floor). It shouldn't be necessary to track them through the scene, just to notice when they enter or leave, though you'd get some issues with, for example, people entering the scene on their own and leaving in pairs or in groups... Good luck :)
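A rough sketch of that blob-counting idea with Emgu CV's built-in background subtractor (class namespaces move around between Emgu versions, and the area threshold is an arbitrary value you would tune for your camera):

using Emgu.CV;
using Emgu.CV.CvEnum;
using Emgu.CV.Util;

var capture = new VideoCapture(0);              // or a video file path
var subtractor = new BackgroundSubtractorMOG2();
var frame = new Mat();
var foreground = new Mat();

while (capture.Read(frame))
{
    // Motion mask: pixels that differ from the learned background model.
    subtractor.Apply(frame, foreground);
    CvInvoke.Threshold(foreground, foreground, 128, 255, ThresholdType.Binary);

    // Group the motion pixels into blobs and count the big ones.
    var contours = new VectorOfVectorOfPoint();
    CvInvoke.FindContours(foreground, contours, null, RetrType.External,
                          ChainApproxMethod.ChainApproxSimple);

    int blobs = 0;
    for (int i = 0; i < contours.Size; i++)
        if (CvInvoke.ContourArea(contours[i]) > 2000)   // ignore small noise
            blobs++;
}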
I think if you have a dense crowd of people with a lot of occlusions, you have to use some machine learning algorithm; for example, you could use the Implicit Shape Model for features.
It really depends on the position of the camera. Assuming that you can get front facing profiles of the people in the images:
This problem is basically face detection and recognition.
There are many ways to go about finding faces, but this is the approach that I'm a little more familiar with.
For the face detection you need to do image segmentation on the skin tone color. This will extract skin regions. [Arms, the chest (for those wearing V cut tops), face, legs, etc] Then you would need to line up the profiles of the skin regions to the profile of your trained faces.
[You'll need to use Eigenfaces to create a generic profile of what a face looks like]
If the skin region lines up and doesn't deviate too far from the profile, then it is considered a face. Once the face is confirmed, add it into the eigenfaces data store [for recognition]. To save processing, you might want to consider limiting the search area if you are looking for a previously seen face [given the frame rate and the last time the person was seen].
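As a rough illustration of the skin-tone segmentation step in Emgu CV (the HSV bounds here are commonly quoted ballpark values, not something tuned for your footage):

using Emgu.CV;
using Emgu.CV.Structure;

var frame = new Image<Bgr, byte>("frame.jpg");
var hsv = frame.Convert<Hsv, byte>();

// Keep pixels whose hue/saturation/value fall inside a "skin-like" range.
Image<Gray, byte> skinMask = hsv.InRange(new Hsv(0, 48, 80), new Hsv(20, 255, 255));

// Clean up speckle noise before looking for face-shaped regions.
skinMask = skinMask.SmoothMedian(5);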
If you are referring to "Crowd flow" I think you just mean the density of faces in a crowd.
Now you've confirmed that a moving object in the video is a person. Now you just need to note that and then make sure that you don't consider them as a new person again.
This approach really depends on your ability to detect face regions. It may not work if the people in the video are looking down, don't fit the profile of the trained data, etc. It may also be affected if a person puts on sunglasses during the video [they would probably be considered a "new face"].
I want to add machine sounds to my router table simulator program. The idea is to hear the stepper motors accelerate and decelerate, with a change in pitch synced with my graphics. Same with the spindle motor: it has a pitch change as its RPM goes up and down. I thought of either adding real recorded motor sounds and modifying the pitch at run time, or creating synthetic sounds simulating the motors. Can anyone suggest the easiest way to achieve this? I have hardly any experience in sound programming besides the most basic stuff. I am programming in C# in Visual Studio .NET.
check these links out:
http://www.codeproject.com/KB/audio-video/cswavplayfx.aspx
http://channel9.msdn.com/coding4fun/articles/Generating-Sound-Waves-with-C-Wave-Oscillators
These should give you an idea of the options you mentioned (playing pre-recorded audio with some effects versus generating the sound synthetically), either way...
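If you go the synthetic route, the core of it is just writing samples whose frequency tracks your motor speed. A rough, self-contained sketch that renders a two-second speed ramp to a WAV file (the sample rate and frequency range are arbitrary choices):

using System;
using System.IO;

int sampleRate = 44100;
int samples = sampleRate * 2;                        // two seconds of audio
short[] pcm = new short[samples];

double phase = 0;
for (int i = 0; i < samples; i++)
{
    double t = (double)i / samples;
    double frequency = 100 + t * 700;                // ramp from 100 Hz up to 800 Hz
    phase += 2 * Math.PI * frequency / sampleRate;
    pcm[i] = (short)(Math.Sin(phase) * short.MaxValue * 0.3);
}

// Minimal 16-bit mono PCM WAV header followed by the samples.
using (var writer = new BinaryWriter(File.Create("motor.wav")))
{
    int dataSize = samples * 2;
    writer.Write(new[] { 'R', 'I', 'F', 'F' }); writer.Write(36 + dataSize);
    writer.Write(new[] { 'W', 'A', 'V', 'E', 'f', 'm', 't', ' ' });
    writer.Write(16); writer.Write((short)1); writer.Write((short)1);   // PCM, mono
    writer.Write(sampleRate); writer.Write(sampleRate * 2);             // byte rate
    writer.Write((short)2); writer.Write((short)16);                    // block align, bits per sample
    writer.Write(new[] { 'd', 'a', 't', 'a' }); writer.Write(dataSize);
    foreach (short sample in pcm) writer.Write(sample);
}

For real-time use you would generate the same kind of samples into a streaming buffer instead of a file, updating the frequency from your simulated motor speed on each block.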