Reduce or predict sound latency - C#

I know that a certain delay between the “play-order” and the actual start of the playback of a sound is inevitable.
However, for my current project, I must be able to start sound playback at a specific moment in time. This moment is known, so the solution is either to reduce the delay as much as possible or to somehow predict the latency and issue the play-order earlier by the predicted amount.
I describe the problem in detail here:
https://naudio.codeplex.com/discussions/662236
My current solution is to use NAudio to play a sound while simultaneously monitoring the output volume. This way I can measure the latency and use it to time the “play-order” for the following sounds.
This gives decent results (about 30 ms deviation from the intended play time), but I wanted to ask if you guys have better suggestions.
Best regards and many thanks
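A minimal sketch of the measure-then-predict idea, assuming you already get one latency measurement per sound from the output-volume monitoring described above. `LatencyPredictor`, `FindOnset`, `SamplesToMs` and the smoothing factor are hypothetical names, not NAudio APIs. `FindOnset` locates the first sample above a threshold in a monitored buffer. `Record` folds each measurement into an exponential moving average, so a single outlier shifts the prediction only partially.

```csharp
using System;

// Hypothetical helper (not part of NAudio): turns latency measurements into a
// prediction and tells you when to issue the next play-order.
public sealed class LatencyPredictor
{
    private readonly double smoothing; // weight of the newest measurement, 0 < smoothing <= 1
    private double? estimateMs;        // null until the first measurement arrives

    public LatencyPredictor(double smoothing = 0.2)
    {
        if (smoothing <= 0 || smoothing > 1)
            throw new ArgumentOutOfRangeException(nameof(smoothing));
        this.smoothing = smoothing;
    }

    public double? EstimateMs => estimateMs;

    // Index of the first sample whose magnitude exceeds the threshold, or -1 if none does.
    public static int FindOnset(float[] samples, float threshold)
    {
        for (int i = 0; i < samples.Length; i++)
            if (Math.Abs(samples[i]) > threshold) return i;
        return -1;
    }

    // Converts a sample offset into milliseconds.
    public static double SamplesToMs(int samples, int sampleRate) => samples * 1000.0 / sampleRate;

    // Folds one measured latency (in ms) into an exponential moving average.
    public void Record(double measuredLatencyMs)
    {
        estimateMs = estimateMs.HasValue
            ? estimateMs.Value + smoothing * (measuredLatencyMs - estimateMs.Value)
            : measuredLatencyMs;
    }

    // Moment at which to issue the play-order so the sound starts at targetStart.
    public DateTime IssueTimeFor(DateTime targetStart) =>
        targetStart - TimeSpan.FromMilliseconds(estimateMs ?? 0);
}
```

The smoothing value of 0.2 is an arbitrary starting point. A larger value follows changes in device latency faster, and a smaller one rejects measurement jitter better.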

Related

Beat Detection Algorithm

I'm currently working on an idea for a game that involves beat detection. The engine I'm working with is Unity, and I've never worked with audio in code before, so be gentle :)
I've read several articles and tested several algorithms, including some of my own, but none were really successful or accurate enough, and I feel like I've been getting something wrong this entire time.
Specifically, I've tried implementing the ideas presented here:
http://archive.gamedev.net/archive/reference/programming/features/beatdetection/index.html
but with little success. I still think I'm missing something, and I can't quite pinpoint it.
If someone could explain how to make a genuinely accurate beat detector, I would be very grateful.
EDIT:
Some people were confused about what I'm having trouble with. Here is my latest attempt at detecting beats; I still don't understand why it's so inaccurate:
http://pastebin.com/BD8y9tfz
In this attempt I used equation (R1) from the article linked above to compute the instant energy of the 1024 samples I took. Then I used (R3) to calculate the local average sound energy from the buffer holding all the previous instant-energy values. Finally, I checked whether the instant energy rises significantly above the local average energy: if it does, there is a beat; if it doesn't, the program continues as usual.
(The stupid reputation system doesn't let me post links and pictures. :( )
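For comparison, here is a minimal sketch of the R1/R3 detector as described above. It assumes stereo blocks of 1024 float samples at 44.1 kHz, so 43 blocks are roughly one second of history, and a fixed sensitivity constant C of 1.3, which is an assumption to tune. Two details are easy to get wrong. First, the current block is compared against the history before it is added. Second, nothing is reported until the history is full; otherwise the average starts near zero and almost every block looks like a beat. `EnergyBeatDetector` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Sketch of the simple sound-energy detector: instant energy of one block (R1)
// compared with the average energy of the previous HistorySize blocks (R3).
public sealed class EnergyBeatDetector
{
    public const int HistorySize = 43; // about 1 s of 1024-sample blocks at 44.1 kHz
    private readonly Queue<double> history = new Queue<double>();
    private readonly double sensitivity; // the constant C

    public EnergyBeatDetector(double sensitivity = 1.3) => this.sensitivity = sensitivity;

    // R1: sum of squared samples of both channels (arrays must have equal length).
    public static double InstantEnergy(float[] left, float[] right)
    {
        double e = 0;
        for (int k = 0; k < left.Length; k++)
            e += left[k] * left[k] + right[k] * right[k];
        return e;
    }

    // Feeds one block; returns true if it looks like a beat.
    public bool Process(float[] left, float[] right)
    {
        double e = InstantEnergy(left, right);
        bool beat = false;
        if (history.Count == HistorySize)
        {
            double sum = 0;
            foreach (double h in history) sum += h;
            double average = sum / HistorySize; // R3: local average energy <E>
            beat = e > sensitivity * average;
            history.Dequeue();                  // drop the oldest block
        }
        history.Enqueue(e);                     // the current block joins the history last
        return beat;
    }
}
```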
Edit 2:
Added implementations of R4, R5 and R6; still not working, though.
I added some debug output, and for some reason the constant is ridiculously small (large and negative), for example:
Constant: -103416
Constant: -54793.28
I've got no clue why I'm getting these numbers. Any help?
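On Edit 2: the linked article's variance-based constant is a linear function of the variance V of the energy history, C = -0.0025714 * V + 1.5142857. C becomes negative as soon as V exceeds about 1.5142857 / 0.0025714 ≈ 589. Constants like -103416 therefore simply mean V is in the tens of millions, i.e. the energies are on a much larger scale than the one those two coefficients were fitted to. The sketch below only computes V and C and exposes that crossover. How to rescale the energies (or re-fit the two coefficients) for your data is left open. `BeatSensitivity` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BeatSensitivity
{
    // Coefficients of the linear formula C = Slope * V + Intercept from the linked article.
    public const double Slope = -0.0025714;
    public const double Intercept = 1.5142857;

    // Variance above which C turns negative (about 589).
    public static double ZeroCrossingVariance => -Intercept / Slope;

    // Variance of the energy history around its mean <E> (divides by the history length).
    public static double Variance(IReadOnlyCollection<double> energies)
    {
        if (energies.Count == 0) throw new ArgumentException("Empty history.", nameof(energies));
        double mean = energies.Average();
        return energies.Sum(e => (e - mean) * (e - mean)) / energies.Count;
    }

    public static double Constant(double variance) => Slope * variance + Intercept;
}
```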

Frequencies and amplitudes of sound file in C#

I'm trying to create a program which gets the various "notes" in a sound file (WAV or MP3) and can get the frequency and amplitude of each. I've been searching around for this, and of course there is the problem of distinguishing individual "notes" in a music file which isn't a MIDI, but it seems that something along these lines can be done with NAudio or DirectSound. Any ideas?
Thanks!
What you are asking to do is extremely difficult.
Step one is to convert your audio from the time domain to the frequency domain. That is, you take a block of samples and apply a Fourier transform (in software, a fast Fourier transform, or FFT).
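A minimal, self-contained sketch of step one in C#: an in-place radix-2 FFT over System.Numerics.Complex, plus the bin-to-frequency conversion (bin k corresponds to k * sampleRate / N Hz). Class and method names are illustrative. Libraries such as NAudio include their own FFT helpers, which you would normally use instead. In practice you would also apply a window function (e.g. Hann) before the transform to reduce spectral leakage.

```csharp
using System;
using System.Numerics;

public static class Spectrum
{
    // In-place iterative radix-2 Cooley-Tukey FFT; data.Length must be a power of two.
    public static void Fft(Complex[] data)
    {
        int n = data.Length;
        if (n == 0 || (n & (n - 1)) != 0) throw new ArgumentException("Length must be a power of two.");
        // Reorder the input into bit-reversed index order.
        for (int i = 1, j = 0; i < n; i++)
        {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) (data[i], data[j]) = (data[j], data[i]);
        }
        // Combine sub-transforms of doubling length with butterfly operations.
        for (int len = 2; len <= n; len <<= 1)
        {
            double angle = -2 * Math.PI / len;
            var wLen = new Complex(Math.Cos(angle), Math.Sin(angle));
            for (int i = 0; i < n; i += len)
            {
                Complex w = Complex.One;
                for (int k = 0; k < len / 2; k++)
                {
                    Complex u = data[i + k];
                    Complex v = data[i + k + len / 2] * w;
                    data[i + k] = u + v;
                    data[i + k + len / 2] = u - v;
                    w *= wLen;
                }
            }
        }
    }

    // Magnitudes of bins 0..N/2 for a block of real samples.
    public static double[] Magnitudes(float[] samples)
    {
        var data = new Complex[samples.Length];
        for (int i = 0; i < samples.Length; i++) data[i] = new Complex(samples[i], 0);
        Fft(data);
        var mags = new double[samples.Length / 2 + 1];
        for (int k = 0; k < mags.Length; k++) mags[k] = data[k].Magnitude;
        return mags;
    }

    public static double BinToHz(int bin, int sampleRate, int fftSize) => (double)bin * sampleRate / fftSize;
}
```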
Next, you begin deciding what counts as a note. This is not as simple as picking out the loudest frequency! Different instruments have different timbres, which are created by various harmonics. If you had a song made of nothing but sine waves, this would be much simpler. In practice, you'll find yourself seeing notes where your ear tells you there are none.
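To illustrate why the loudest bin is not necessarily the note, here is a sketch of one common heuristic, the harmonic product spectrum (a swapped-in technique, not something the answer prescribes). It multiplies the magnitude at bin k by the magnitudes at 2k, 3k, ..., so a bin whose harmonics are also strong wins over an isolated loud peak. It takes a magnitude spectrum such as an FFT produces, and the number of harmonics is an assumption to tune. It still fails when the fundamental is essentially absent, because that bin's near-zero magnitude enters the product.

```csharp
using System;

public static class PitchEstimate
{
    // Harmonic product spectrum: returns the bin k (k >= 1) maximising
    // magnitudes[k] * magnitudes[2k] * ... * magnitudes[harmonics * k].
    public static int HarmonicProductPeak(double[] magnitudes, int harmonics = 3)
    {
        if (harmonics < 1 || magnitudes.Length < harmonics + 1)
            throw new ArgumentException("Spectrum too short for the requested harmonics.");
        int maxBin = (magnitudes.Length - 1) / harmonics; // keeps k * harmonics in range
        int best = 1;
        double bestValue = double.MinValue;
        for (int k = 1; k <= maxBin; k++) // bin 0 (DC) is skipped
        {
            double product = 1;
            for (int h = 1; h <= harmonics; h++) product *= magnitudes[k * h];
            if (product > bestValue) { bestValue = product; best = k; }
        }
        return best;
    }
}
```

In a spectrum where the second harmonic is the loudest peak, this picks the fundamental instead of that peak.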
Now psychoacoustics come into play. It is entirely possible for humans to "hear" notes that do not even have a fundamental, particularly in a musical context. If I were to take a trombone and play a scale downward, at some point the fundamental disappears or is mostly gone. However, you will still perceive the scale as going downward, even though the fundamental has all but disappeared. Things get really tricky at this point.
To answer your question, start with an FFT. Maybe this is sufficient for your needs. If not, begin reading the significant amount of technical literature on the subject.

WebCam: Switch on and off, or keep on?

Good day,
I wasn't sure if I should post this on the software or hardware stack; I apologize in advance if this is an invalid question.
I wrote a little application that I am using to make time lapse videos - currently it only takes the pictures using a webcam. I know there are already a few available for download, but none of them did 100% what I wanted, and some of them were a little buggy, so I decided I'd rather create my own.
The interval at which the pictures are taken can be configured anywhere from 5 seconds up. Version 1.x would activate the camera and keep it on while in "Time Lapse Mode" and save images to disc at the specified intervals. This approach proved to be very memory intensive - understandably, in retrospect.
I decided to start from scratch with Version 2.x. This version keeps the camera off and only switches it on when it needs to take a picture, then switches it off again. This approach proved much more efficient. The minimum interval is 5 seconds because the camera takes about 1 second to switch on and roughly the same amount of time to switch off. Perhaps in the future I could keep the camera on when the interval is under 5 seconds. For now, however, this will do perfectly for what I actually want to use it for.
When I was little, we were told that repeatedly switching an incandescent bulb on and off is bad for the bulb; according to a colleague of mine who is reliable in that field, this is true.
This got me thinking. Could it be harmful to my webcam if I switch it on and off at, say, 10 second intervals for, let's say, a day or two? And how would switching on and off compare to keeping the device on for a few days? I don't understand what happens on a hardware level so I can't say.
I suppose I have a couple of options:
Switch the application on and off as required to take the pictures. This could result in the camera being switched a few thousand times a day.
Keep the camera on. This could mean the camera might be active for very long periods of time. What if I want to create a time lapse video over a month? Or even a year? Not to mention the memory problem.
Switch between the two modes: interval < 2 minutes ? keep on : switch. This seems like the best of both worlds, but I'm still faced with the memory problem when the interval is under 2 minutes.
Thank you in advance for any and all comments and suggestions - much appreciated.
Kind regards,
me.
Could it be harmful to my webcam if I switch it on and off at, say, 10 second intervals for, let's say, a day or two?
Switching the camera on and off will have no effect on its lifespan.
How about keeping it on for long periods of time?
Well, that really depends on the camera, but for something as low-powered as a webcam you should be able to run it for many years before it begins to fail.
I'm not sure how you are getting frames from your camera, but it should not be extremely memory intensive. Using AForge.NET you can grab frames from your camera quite simply (there is a tutorial for this). If you could post your code, I could better point you in the direction of optimizing it.
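If you keep the camera on (always, or only for short intervals), memory use should stay flat as long as you keep only the frames you need and let the rest go. The sketch below is a hypothetical, hardware-independent helper for that decision. Call `ShouldCapture` with each incoming frame's timestamp (with AForge.NET, from the handler of VideoCaptureDevice's NewFrame event) and copy or save the frame only when it returns true. Whether this addresses the Version 1.x memory growth is an assumption, since that code isn't shown.

```csharp
using System;

// Hypothetical helper: in keep-on mode the camera delivers frames continuously,
// and only frames that have fallen due are kept; all others should be dropped.
public sealed class CaptureScheduler
{
    private readonly TimeSpan interval;
    private DateTime? lastCapture; // null until the first frame is kept

    public CaptureScheduler(TimeSpan interval) => this.interval = interval;

    public bool ShouldCapture(DateTime frameTime)
    {
        if (lastCapture.HasValue && frameTime - lastCapture.Value < interval) return false;
        lastCapture = frameTime;
        return true;
    }
}
```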

How to find a solution for my Gin problem?

Recently I played Gin with my grandmother. We played a whole afternoon and, as far as I can remember, I didn't win a single game.
So I told her that with the help of computers one could become a much better player. She couldn't believe computers could be useful there, and that's why I want to demonstrate it.
I have already implemented part of the logic, but my solver is not very elegant because it is mainly based on brute force: I calculate all the possibilities, score them according to their chances of winning, and choose the best one. Is there a more sophisticated approach?
I'm talking about standard Gin. The implementation is done in C#.
I'm not 100% familiar with gin, but one thing to keep in mind is each strategy can be broken.
When you play, you have to play against other players. Do they have the same strategy, or different strategies? If we played, you might beat me in gin, but I might beat your grandma. How does your grandma know how to play? If you were given the same hand she has, how would she play it differently from you? Yes, you can look at the statistics, but your grandma doesn't play by statistics; she plays by experience. If you want to make it more sophisticated, ask yourself "How can I factor experience into the hand?"

How can I compare two captures to see which one is louder?

Given two byte arrays of data captured from a microphone, how can I determine which one has more spikes in noise? I would assume there is an algorithm I can apply to the data, but I have no idea where to start.
Getting down to it, I need to be able to determine when a baby is crying vs ambient noise in the room.
If it helps, I am using the Microsoft.Xna.Framework.Audio.Microphone class to capture the sound.
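You can convert each sample (normalised to the range -1.0 to 1.0) into a decibel value by applying the formula
dB = 20 * log10(|sample-value|) (take the absolute value, since the logarithm of a negative number is undefined; a sample of 0 gives minus infinity)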
To be honest, so long as you don't mind the occasional false positive, and your microphone is set up OK, you should have no problem telling the difference between a baby crying and ambient background noise, without going through the hassle of doing an FFT.
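A minimal sketch that compares two captures by their RMS level in dBFS, which is far less jumpy than per-sample dB. It assumes the byte arrays hold 16-bit signed little-endian mono PCM, which is our understanding of what XNA's Microphone.GetData returns. The class and method names are illustrative.

```csharp
using System;

public static class Loudness
{
    // RMS level in dBFS of a buffer of 16-bit signed little-endian mono PCM
    // (assumed to be the Microphone.GetData format). Silence gives -infinity.
    public static double RmsDbfs(byte[] pcm16)
    {
        int count = pcm16.Length / 2;
        if (count == 0) return double.NegativeInfinity;
        double sumSquares = 0;
        for (int i = 0; i < count; i++)
        {
            // Assemble the little-endian sample explicitly, independent of machine endianness.
            short raw = unchecked((short)(pcm16[2 * i] | (pcm16[2 * i + 1] << 8)));
            double s = raw / 32768.0; // normalise to [-1.0, 1.0)
            sumSquares += s * s;
        }
        double rms = Math.Sqrt(sumSquares / count);
        return 20 * Math.Log10(rms); // 0 dB means the RMS equals full scale
    }

    // Positive if capture a is louder on average than capture b, negative if quieter, 0 if equal.
    public static int CompareLoudness(byte[] a, byte[] b) => RmsDbfs(a).CompareTo(RmsDbfs(b));
}
```

RMS weights all frequencies equally, so a loud low rumble counts as much as a cry; the frequency-based suggestions in the other answers address that.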
I'd recommend having a look at the source code for a noise gate, which does pretty much what you are after, with configurable attack times and thresholds.
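The level-detection part of a noise gate can be sketched as a one-pole envelope follower. The envelope rises toward the rectified signal with an attack time constant and falls with a release time constant, and the gate counts as open while the envelope is above the threshold. This is a simplified, hypothetical detector: a real gate also applies gain and usually a hold time. The threshold, attack and release values are assumptions to tune against recordings.

```csharp
using System;

// Simplified gate detector: reports open/closed but does not process the audio.
public sealed class GateDetector
{
    private readonly double threshold, attackCoeff, releaseCoeff;
    private double envelope;

    // attackMs / releaseMs are the time constants of the rising / falling envelope.
    public GateDetector(double threshold, double attackMs, double releaseMs, int sampleRate)
    {
        this.threshold = threshold;
        attackCoeff = 1 - Math.Exp(-1.0 / (attackMs * 0.001 * sampleRate));
        releaseCoeff = 1 - Math.Exp(-1.0 / (releaseMs * 0.001 * sampleRate));
    }

    public bool IsOpen => envelope > threshold;

    // Feeds one normalised sample in [-1, 1]; returns whether the gate is open afterwards.
    public bool Process(double sample)
    {
        double level = Math.Abs(sample);
        double coeff = level > envelope ? attackCoeff : releaseCoeff;
        envelope += coeff * (level - envelope);
        return IsOpen;
    }
}
```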
First use a Fast Fourier Transform to transform the signal into the frequency domain.
Then check if the signal in the typical "cry-frequencies" is significantly higher than the other amplitudes.
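A sketch of the band check, assuming you choose the "cry" band yourself from recordings; the answer doesn't give frequencies, and neither does this sketch. It returns the fraction of a block's spectral energy that lies inside [lowHz, highHz]. A naive DFT keeps it self-contained, though for continuous monitoring you would use an FFT. `BandEnergy` is a hypothetical name.

```csharp
using System;

public static class BandEnergy
{
    // Fraction of the block's spectral energy (bins 1..N/2) inside [lowHz, highHz].
    // Naive O(N^2) DFT to stay self-contained; use an FFT for real-time work.
    public static double FractionInBand(float[] block, int sampleRate, double lowHz, double highHz)
    {
        int n = block.Length;
        double inBand = 0, total = 0;
        for (int k = 1; k <= n / 2; k++)
        {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++)
            {
                double angle = 2 * Math.PI * k * t / n;
                re += block[t] * Math.Cos(angle);
                im -= block[t] * Math.Sin(angle);
            }
            double power = re * re + im * im;
            double freq = (double)k * sampleRate / n;
            total += power;
            if (freq >= lowHz && freq <= highHz) inBand += power;
        }
        return total > 0 ? inBand / total : 0;
    }
}
```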
The preprocessor of the speex codec supports noise vs signal detection, but I don't know if you can get it to work with XNA.
Or, if you really want some kind of loudness measure, calculate the sum of squares of the amplitudes of the frequencies you're interested in (for example 50-20000 Hz), and sound the alarm if its average over the last 30 seconds is significantly higher than the average over the last 10 minutes, or if it exceeds a certain absolute threshold.
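The short-term versus long-term comparison can be sketched with two moving averages over per-block energies. Window lengths are given in blocks, so with 1-second blocks, 30 and 600 would correspond to the 30 seconds and 10 minutes mentioned above. The ratio and absolute threshold are hypothetical tuning parameters, and the per-block energy (band-limited or not) is computed elsewhere. `LoudnessAlarm` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical alarm: short-window mean energy vs long-window mean energy,
// plus an absolute ceiling on the short-window mean.
public sealed class LoudnessAlarm
{
    private readonly MovingAverage shortTerm, longTerm;
    private readonly double ratio, absoluteThreshold;

    public LoudnessAlarm(int shortBlocks, int longBlocks, double ratio, double absoluteThreshold)
    {
        shortTerm = new MovingAverage(shortBlocks);
        longTerm = new MovingAverage(longBlocks);
        this.ratio = ratio;
        this.absoluteThreshold = absoluteThreshold;
    }

    // blockEnergy: e.g. the sum of squares of one block's amplitudes. Returns true to alarm.
    public bool Add(double blockEnergy)
    {
        shortTerm.Add(blockEnergy);
        longTerm.Add(blockEnergy);
        return shortTerm.Value > ratio * longTerm.Value || shortTerm.Value > absoluteThreshold;
    }

    private sealed class MovingAverage
    {
        private readonly Queue<double> window = new Queue<double>();
        private readonly int size;
        private double sum;

        public MovingAverage(int size) => this.size = size;

        public double Value => window.Count == 0 ? 0 : sum / window.Count;

        public void Add(double x)
        {
            window.Enqueue(x);
            sum += x;
            if (window.Count > size) sum -= window.Dequeue(); // keep only the last `size` blocks
        }
    }
}
```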
Louder at what point? The signal's average amplitude will tell you which one is louder on average, but that is kind of a dumb, brute force way to go about it. It may work for you in practice though.
Getting down to it, I need to be able to determine when a baby is crying vs ambient noise in the room.
Ok, so, I'm just throwing out ideas here; I am by no means an expert on audio processing.
If you know your input, i.e., a baby crying (relatively loud with a high pitch) versus ambient noise (relatively quiet), you should be able to analyze the signal in terms of pitch (frequency) and amplitude (loudness). Of course, if during the recording someone drops some pots and pans onto the kitchen floor, that will be tough to discern.
As a first pass I would simply traverse the signal, maintaining a standard deviation of pitch and amplitude throughout, and then set a flag when those deviations jump beyond some threshold that you will have to define. When they come back down you may be able to safely assume that you captured the baby's cry.
Again, just throwing you an idea here. You will have to see how it works in practice with actual data.
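One reading of this idea, sketched under assumptions: keep a running mean and standard deviation of a per-block feature (RMS amplitude, or a pitch estimate computed elsewhere) with Welford's algorithm. Then flag a block whose value lies more than k standard deviations above the running mean. `DeviationFlag`, k and the warm-up length are hypothetical. In this sketch flagged blocks still feed the statistics, so a long cry slowly raises the baseline; you may prefer to skip updates while flagged.

```csharp
using System;

// Running mean/standard deviation (Welford's algorithm) of a per-block feature,
// flagging values more than k standard deviations above the mean seen so far.
public sealed class DeviationFlag
{
    private readonly double k;
    private readonly int warmup; // blocks to observe before any flag is raised
    private long count;
    private double mean, m2;     // m2 = sum of squared deviations from the mean

    public DeviationFlag(double k = 3.0, int warmup = 20) { this.k = k; this.warmup = warmup; }

    public double StdDev => count > 1 ? Math.Sqrt(m2 / (count - 1)) : 0; // sample standard deviation

    public bool Add(double value)
    {
        bool flagged = count >= warmup && value > mean + k * StdDev;
        // Welford update of mean and m2.
        count++;
        double delta = value - mean;
        mean += delta / count;
        m2 += delta * (value - mean);
        return flagged;
    }
}
```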
I agree with @Ed Swangren: it will take a lot of playing with data samples from a lot of sources. To me, the trick will be to limit, or hopefully eliminate, false positives. My experience with babies is that they are much louder when crying than the environment. So keep track of the average measurements (frequency/amplitude/??) of the normal environment, and then classify how well the changes match the characteristics of a crying baby. Those characteristics change from kid to kid, so you'll probably want a system that 'learns'. Best of luck.
Update: you might find this library useful: http://naudio.codeplex.com/
