Given two byte arrays of data captured from a microphone, how can I determine which one has more spikes in noise? I would assume there is an algorithm I can apply to the data, but I have no idea where to start.
Getting down to it, I need to be able to determine when a baby is crying vs ambient noise in the room.
If it helps, I am using the Microsoft.Xna.Framework.Audio.Microphone class to capture the sound.
You can convert each sample (normalised to the range -1.0 to 1.0) into a decibel rating by applying the formula
dB = 20 * log10(|sample value|)
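In C# that works out to something like the following (a minimal sketch; the small floor is just to avoid taking the log of zero during silence):

    using System;

    static double SampleToDb(double sample)
    {
        // Take the absolute value: samples swing between -1.0 and 1.0,
        // and the log of a negative number is undefined.
        double magnitude = Math.Abs(sample);
        // The floor avoids log10(0) == -infinity during pure silence.
        return 20.0 * Math.Log10(Math.Max(magnitude, 1e-10));
    }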
To be honest, so long as you don't mind the occasional false positive, and your microphone is set up OK, you should have no problem telling the difference between a baby crying and ambient background noise, without going through the hassle of doing an FFT.
I'd recommend having a look at the source code for a noise gate, which does pretty much what you are after, with configurable attack times and thresholds.
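For illustration, the core of a gate boils down to something like this; the threshold and attack/release constants are made-up values you would tune for your setup:

    using System;

    // Minimal noise-gate-style detector: smooths the signal envelope and
    // reports "open" (loud) when it crosses a threshold. All constants
    // are illustrative and need tuning for a real microphone setup.
    class NoiseGate
    {
        double envelope;                    // smoothed signal level
        const double Threshold = 0.2;       // open the gate above this level
        const double AttackCoeff = 0.1;     // how fast the envelope rises
        const double ReleaseCoeff = 0.001;  // how slowly it falls back

        public bool Process(double sample)
        {
            double level = Math.Abs(sample);
            double coeff = level > envelope ? AttackCoeff : ReleaseCoeff;
            envelope += coeff * (level - envelope);
            return envelope > Threshold;    // true while "loud", e.g. crying
        }
    }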
First use a Fast Fourier Transform to transform the signal into the frequency domain.
Then check if the signal in the typical "cry-frequencies" is significantly higher than the other amplitudes.
The preprocessor of the speex codec supports noise vs signal detection, but I don't know if you can get it to work with XNA.
Or if you really want some kind of loudness calculate the sum of squares of the amplitudes from the frequencies you're interested in (for example 50-20000Hz) and if the average of that over the last 30 seconds is significantly higher than the average over the last 10 minutes or exceeds a certain absolute threshold sound the alarm.
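Here's a sketch of that bookkeeping, assuming you feed it one energy value (the sum of squared amplitudes in the band you care about) per 20 ms frame; the window lengths, ratio, and absolute threshold are all illustrative numbers to tune:

    using System;
    using System.Collections.Generic;

    // Keeps a short-term and a long-term average of per-frame energy and
    // raises the alarm when the short-term average is much higher, or
    // exceeds an absolute threshold. Window sizes assume 50 frames/second.
    class LoudnessAlarm
    {
        readonly Queue<double> shortWin = new Queue<double>();  // ~30 s
        readonly Queue<double> longWin = new Queue<double>();   // ~10 min
        double shortSum, longSum;
        const int ShortLen = 30 * 50, LongLen = 600 * 50;
        const double Ratio = 3.0;            // "significantly higher" factor
        const double AbsoluteThreshold = 0.1;

        public bool AddFrameEnergy(double energy)  // sum of squared amplitudes
        {
            shortWin.Enqueue(energy); shortSum += energy;
            if (shortWin.Count > ShortLen) shortSum -= shortWin.Dequeue();
            longWin.Enqueue(energy); longSum += energy;
            if (longWin.Count > LongLen) longSum -= longWin.Dequeue();

            double shortAvg = shortSum / shortWin.Count;
            double longAvg = longSum / longWin.Count;
            return shortAvg > Ratio * longAvg || shortAvg > AbsoluteThreshold;
        }
    }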
Louder at what point? The signal's average amplitude will tell you which one is louder on average, but that is a fairly dumb, brute-force way to go about it. It may work for you in practice, though.
"Getting down to it, I need to be able to determine when a baby is crying vs ambient noise in the room."
Ok, so, I'm just throwing out ideas here; I am by no means an expert on audio processing.
If you know your input, i.e., a baby crying (relatively loud with a high pitch) versus ambient noise (relatively quiet), you should be able to analyze the signal in terms of pitch (frequency) and amplitude (loudness). Of course, if during the recording someone drops some pots and pans onto the kitchen floor, that will be tough to discern.
As a first pass I would simply traverse the signal, maintaining a running standard deviation of pitch and amplitude throughout, and then set a flag when those deviations jump beyond some threshold that you will have to define. When they come back down, you may be able to safely assume that you captured the baby's cry.
Again, just throwing you an idea here. You will have to see how it works in practice with actual data.
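To make that concrete, here's a minimal sketch of the running-deviation idea, using Welford's algorithm to maintain the statistics. It tracks amplitude only (pitch tracking is considerably more work), and the K threshold is a placeholder you'd have to tune:

    using System;

    // Tracks a running mean and variance of amplitude (Welford's algorithm)
    // and flags samples that deviate more than K standard deviations.
    class DeviationFlagger
    {
        long n; double mean, m2;
        const double K = 3.0;   // threshold in standard deviations (tune this)

        public bool Update(double amplitude)
        {
            n++;
            double delta = amplitude - mean;
            mean += delta / n;
            m2 += delta * (amplitude - mean);
            double stdDev = n > 1 ? Math.Sqrt(m2 / (n - 1)) : 0.0;
            return stdDev > 0 && Math.Abs(amplitude - mean) > K * stdDev;
        }
    }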
I agree with @Ed Swangren; it will take a lot of playing with samples of data from a lot of sources. To me, it sounds like the trick will be to limit, or hopefully eliminate, false positives. My experience with babies is that they are much louder crying than the environment, so keep track of the average measurements (frequency/amplitude/etc.) of the normal environment, then classify how well any changes match the characteristics of a crying baby. Those characteristics vary from kid to kid, so you'll probably want a system that 'learns'. Best of luck.
Update: you might find this library useful: http://naudio.codeplex.com/
I got a (basic) voxel engine running and a water system that looks (and I assume basically works) like this: https://www.youtube.com/watch?v=Q_TdeGIOOts (not my game).
The water values are stored in a 3D array of floats, and every 0.05 s it calculates water flow by checking the voxel below and the adjacent voxels (y-1, x-1, x+1, z-1, z+1) and adding the value.
This system works fine (70+ fps) for small amounts of water, but when I start calculating water on 8+ chunks, it gets too much.
(I disabled all rendering and mesh creation to check whether that was the bottleneck; it isn't. It's purely the flow calculations.)
I am not a very experienced programmer, so I wouldn't know where to start optimizing, apart from making the calculations happen in a coroutine, as I already did.
In this post: https://gamedev.stackexchange.com/questions/55414/how-to-define-areas-filled-with-water (near the bottom) Boreal suggests running it in a compute shader. Is this the way to go for me? And how would I go about such a thing?
Any help is much appreciated.
If you're really calculating a voxel-based simulation, the number of calculations grows with the cube of the volume's linear size, so you will quickly run out of processing power on larger volumes.
A compute shader is great for doing massively parallel calculations quickly, although it's a very different programming paradigm that takes some getting used to. A compute shader looks at the contents of a buffer (i.e., a 'texture' for us civilians) and does things to it very quickly; in your case the buffer would probably be one whose pixel values represent water cells. If you want to do something really simple, like increment the values up or down, the compute shader uses the parallel processing power of the GPU to do it really fast.
The hard part is that GPUs are optimized for parallel processing. This means that you can't write code like "texelA.value += texelB.value": without extra work on your part, each fragment of the buffer is processed with zero knowledge of what happens in the other fragments. To reference other texels you need to read the texture again somehow. Some techniques read one texture multiple times with offsets (this GL example does this to implement blurs); others do it by repeatedly processing a texture, putting the result into a temporary texture, and then reprocessing that ("ping-ponging" between the two).
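To make the ping-pong pattern concrete, here's what it looks like on the CPU in C#; the same read-from-one-buffer, write-to-the-other structure is what you would port into a compute shader kernel. The flow rule itself is just a stand-in for yours:

    // Double-buffered ("ping-pong") update: read only from 'src', write
    // only to 'dst', then swap. This is the structure a compute shader
    // needs, because each cell is processed with no knowledge of its
    // neighbours' updates within the same pass.
    static void StepWater(float[,,] src, float[,,] dst, int sx, int sy, int sz)
    {
        for (int x = 1; x < sx - 1; x++)
        for (int y = 1; y < sy - 1; y++)
        for (int z = 1; z < sz - 1; z++)
        {
            // Illustrative flow rule: average the cell with the five
            // neighbours the question mentions (below plus four sides).
            float sum = src[x, y, z] + src[x, y - 1, z]
                      + src[x - 1, y, z] + src[x + 1, y, z]
                      + src[x, y, z - 1] + src[x, y, z + 1];
            dst[x, y, z] = sum / 6f;
        }
    }
    // Each frame: StepWater(a, b, ...); then swap the references a <-> b.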
At the 10,000-foot level: yes, a compute shader is a good tool for this kind of problem, since it involves tons of self-similar calculation. But it won't be easy to do off the bat. If you have not done conventional shader programming before, you may want to look at that first to get used to the way GPUs work. Even really basic tools (if-then-else or loops) have very different performance implications and uses in GPU programming, and it takes some time to get your head around the differences. As of this writing (1/10/13) it looks like Nvidia and Udacity are offering an intro to compute shader course which might be a good way to get up to speed.
FWIW you also need pretty modern hardware for compute shaders, which may limit your audience.
I've been doing a lot of research on this and I'm still having trouble, so I'm hoping someone with a strong knowledge of Digital (Audio) Signal Processing can point me in the right direction.
I've been surprised at how hard it is to find a library that can perform accurate beat detection. I know next to nothing about DSP and FFTs. What I would really like is a library where I can simply say:
BPMDetect detector = new BPMDetect();
float bpm = detector.GetBpm(filename);
But apparently this is too much to ask for. The closest I've gotten is by using the SoundTouch library, but I've recently discovered that the BPM detection there is very unreliable. I know BPM detection isn't an exact science, but SoundTouch claimed that one of my music files was 170 BPM, while abyssmedia's BPM Counter program accurately puts it at 120 BPM. So I know it's possible. I'm more concerned with accuracy than speed.
So my question is: is there a C# library that can do this without having to know a lot about DSP?
You should be able to do basic beat detection without resorting to FFTs. I've built beat detection algorithms that did this.
The basic idea is to take your audio signal and partition it into small buckets of time, say 10 ms or so. For each bucket compute the power: for each sample s[i] in the bucket, normalize to -1.0...1.0 and then compute the sum of s[i]**2.
Now you've got an array of power (= loudness) for little grains of time.
Next take the rate of change (derivative) of the power from bucket to bucket: d[i] = p[i+1] - p[i], where p[i] is the power of bucket i.
The array of derivatives tells you how fast the loudness is increasing from grain to grain. The faster the change, the more intense the beat.
Here's where it gets a bit artful. You put a threshold on these d[i] values to decide which ones are sudden enough to constitute a beat. Then you do autocorrelation to see if they are steady and line up in a regular pattern.
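Putting the bucketing, derivative, and threshold steps together, a rough sketch might look like this (the autocorrelation stage is left out, and the bucket size and threshold are placeholders to tune):

    using System;
    using System.Collections.Generic;

    // Returns the indices of candidate beats: buckets where the power
    // rises faster than 'threshold' compared to the previous bucket.
    static List<int> FindBeatCandidates(float[] samples, int sampleRate,
                                        double threshold = 0.05)
    {
        int bucketSize = sampleRate / 100;            // ~10 ms buckets
        int buckets = samples.Length / bucketSize;
        var power = new double[buckets];

        for (int b = 0; b < buckets; b++)
        {
            double sum = 0;
            for (int i = 0; i < bucketSize; i++)
            {
                double s = samples[b * bucketSize + i]; // assumed -1.0..1.0
                sum += s * s;
            }
            power[b] = sum / bucketSize;               // mean-square power
        }

        var beats = new List<int>();
        for (int b = 0; b + 1 < buckets; b++)
            if (power[b + 1] - power[b] > threshold)   // sudden rise in loudness
                beats.Add(b + 1);
        return beats;                                  // feed into autocorrelation
    }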
I'm trying to create a program which gets the various "notes" in a sound file (WAV or MP3) and can get the frequency and amplitude of each. I've been searching around for this, and of course there is the problem of distinguishing individual "notes" in a music file which isn't MIDI, but it seems that something along these lines can be done with NAudio or DirectSound. Any ideas?
Thanks!
What you are asking to do is extremely difficult.
Step one would be to convert your audio from a time domain to a frequency domain. That is, you take a number of samples, and do a Fourier transform (implemented in your software as FFT).
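Since NAudio comes up for this kind of thing, here's roughly what that first step looks like with its NAudio.Dsp.FastFourierTransform helper. I'm writing this from memory, so double-check the exact signatures; the buffer length must be a power of two:

    using System;
    using NAudio.Dsp;   // Complex, FastFourierTransform

    // Transform one window of samples (length must be a power of two)
    // into magnitudes per frequency bin. Bin i corresponds to roughly
    // i * sampleRate / samples.Length Hz.
    static float[] Spectrum(float[] samples)
    {
        int m = (int)Math.Log(samples.Length, 2.0);
        var buffer = new Complex[samples.Length];
        for (int i = 0; i < samples.Length; i++)
        {
            // Window the samples to reduce spectral leakage.
            buffer[i].X = (float)(samples[i] *
                FastFourierTransform.HammingWindow(i, samples.Length));
            buffer[i].Y = 0f;
        }
        FastFourierTransform.FFT(true, m, buffer);

        var magnitudes = new float[samples.Length / 2]; // second half mirrors
        for (int i = 0; i < magnitudes.Length; i++)
            magnitudes[i] = (float)Math.Sqrt(buffer[i].X * buffer[i].X +
                                             buffer[i].Y * buffer[i].Y);
        return magnitudes;
    }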
Next, you begin deciding what you call a note or not. This is not as simple as picking out the loudest of the frequencies! Different instruments have different timbre, which is created by various harmonics. If you had a song of nothing but sine waves, this would be much simpler. However, you'll find that you'll start seeing notes where your ear tells you they don't exist.
Now, psychoacoustics comes into play. It is entirely possible for humans to "hear" notes that do not even have a fundamental. This is particularly true in a musical context. If I were to take a trombone and start playing a scale downward, at some point the fundamental disappears or is mostly gone. However, you will still perceive that scale as going downward, when in fact the fundamental sound has all but disappeared. Things get really tricky at this point.
To answer your question, start with an FFT. Maybe this is sufficient for your needs. If not, begin reading the significant amount of technical literature on the subject.
I am looking for a way to approximate a volume of fluid moving over a heightmap. The easiest solution I can think of is to approximate it as a large number of non-drawn spheres, of small diameter (<0.1m). I would then place a visible plane representing the surface of the water on "top" of the spheres, at the locations they came to rest. To my knowledge, no managed physics engines contain a built in fluid simulator, hence the question.
Implementation would consist of using a physics engine such as JigLibX, which is capable of simulating the motion of the spheres. To determine the height of the planes, I was thinking of averaging the maximum height of each sphere that is on the top layer of a grouping.
I don't expect performance to be great, but would it be approachable in real time? If not, could I use this simulation to pre-bake lines of flow?
I hope this makes sense. I really want opinions/suggestions as to whether this is feasible, or whether there is a better way of approaching this.
Thanks for any help, Venatu
(If it's relevant, my target platform is XNA 4.0, using C#. Windows-only at this point in time, so PhysX/Havok are possibilities for the simulation, but I would prefer a managed solution.)
I haven't yet seen realistic fluid dynamics in real time without using something like PhysX, probably because the calculations needed are so complicated! The problem with your approach, as I see it, would be the resting contact of all those spheres as they settle down, which takes a lot of processing power. Lots of resting contact points are notorious for eating into performance very quickly, even on the most powerful of desktops.
If you are going down this route, then I'd recommend modelling the fluid as an elastic but solid body using spring-based physics, where a force applied to one part of the water propagates out to the rest through the springs. This gives you the option of setting a breaking point for the springs and separating the body into two or more bodies when that happens (and the reverse for coming back together). This can give you the foundation for things like spray. It's also a more versatile approach in terms of performance, because you can choose the number of particles and springs you use to approximate your model.
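A minimal sketch of one such spring, assuming XNA's Vector3 and per-particle force accumulators; the stiffness and breaking constants are illustrative:

    using Microsoft.Xna.Framework;   // Vector3 (the question targets XNA 4.0)

    // One spring connecting two fluid particles. The force follows Hooke's
    // law; the spring "breaks" past a stretch limit, letting the body split.
    class Spring
    {
        public int A, B;                 // indices into the particle arrays
        public float RestLength;
        public bool Broken;
        const float Stiffness = 50f;     // tune for your fluid
        const float BreakStretch = 2.0f; // break at twice the rest length

        public void Apply(Vector3[] positions, Vector3[] forces)
        {
            if (Broken) return;
            Vector3 d = positions[B] - positions[A];
            float len = d.Length();
            if (len < 1e-5f) return;     // avoid dividing by zero
            if (len > RestLength * BreakStretch) { Broken = true; return; }
            // Hooke's law: force proportional to extension, along the spring.
            Vector3 f = d * (Stiffness * (len - RestLength) / len);
            forces[A] += f;              // pull A toward B...
            forces[B] -= f;              // ...and B toward A
        }
    }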
It's a big and complicated topic, but I hope that provided at least some insight!
The most popular method to simulate fluids in real time is Smoothed-Particle Hydrodynamics (SPH).
Several useful links:
http://en.wikipedia.org/wiki/Smoothed-particle_hydrodynamics
http://http.developer.nvidia.com/GPUGems/gpugems_ch38.html
http://www.plunk.org/~trina/thesis/html/thesis_toc.html
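For a flavour of the core of SPH, here's the standard density estimate with the poly6 smoothing kernel (the same formulation covered in the GPU Gems chapter above), written as a naive O(n^2) pass for clarity; a real implementation would use the broad-phase structures mentioned below:

    using System;
    using Microsoft.Xna.Framework;   // Vector3, MathHelper

    // Naive SPH density pass with the poly6 kernel:
    // W(r, h) = 315 / (64 * pi * h^9) * (h^2 - r^2)^3 for r < h.
    static float[] ComputeDensities(Vector3[] pos, float particleMass, float h)
    {
        float h2 = h * h;
        float poly6 = 315f / (64f * MathHelper.Pi * (float)Math.Pow(h, 9));
        var density = new float[pos.Length];

        for (int i = 0; i < pos.Length; i++)
            for (int j = 0; j < pos.Length; j++)
            {
                float r2 = Vector3.DistanceSquared(pos[i], pos[j]);
                if (r2 < h2)
                {
                    float diff = h2 - r2;
                    density[i] += particleMass * poly6 * diff * diff * diff;
                }
            }
        return density;
    }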
In addition to the simulation itself, you will also need some specialized broad-phase collision detection algorithms, such as sweep-and-prune or hashed grid cells.
And you're right, there are no complete 2D solutions for fluid dynamics.
I have a particle system with X particles.
Each particle tests for collision with every other particle. This gives X*X = X^2 collision tests per frame. At 60 fps, this corresponds to 60*X^2 collision tests per second.
What is the best technological approach for these intensive calculations? Should I use F#, C, C++ or C#, or something else?
The following are the constraints:
1. The code is written in C# with the latest XNA
2. Multi-threading may be considered
3. No special algorithm that tests the collision with the nearest neighbors or that otherwise reduces the problem
The last constraint may be strange, so let me explain.
Regardless of constraint 3: given a problem with enormous computational requirements, what would be the best approach to solving it?
An algorithm reduces the problem; still, the same algorithm may behave differently depending on the technology. Consider the pros and cons of the CLR vs native C.
The simple answer is "measure it". But take a look at this graph (that I borrowed from this question, which is worth reading).
C++ is maybe 10% faster than MS's C# implementation (for this particular calculation) and faster still against Mono's C# implementation. But in real world terms, C++ is not all that much faster than C#.
If you're doing hard-core number crunching, you will want to use the SIMD/SSE unit of your CPU. This is something that C# does not normally support, but Mono is adding support for it through Mono.Simd. You can see from the graph that using the SIMD unit gives a significant performance boost to both languages.
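For a flavour of what that looks like, here's a sketch written against Mono.Simd from memory, so treat the exact types and operators as an assumption:

    using Mono.Simd;   // Vector4f: four packed floats in one SSE register

    // Squared distances for four particle pairs at once. Each Vector4f
    // holds one coordinate for four different particles, so each operator
    // below maps to a single packed SSE instruction under Mono's JIT.
    static Vector4f DistanceSquared4(Vector4f ax, Vector4f ay,
                                     Vector4f bx, Vector4f by)
    {
        Vector4f dx = ax - bx;
        Vector4f dy = ay - by;
        return dx * dx + dy * dy;
    }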
(It's worth noting that while C++ is still "faster" than C#, the choice of language has only a small effect on performance, compared to the choice of what hardware to use. As was mentioned in the comments - your choice of algorithm will have by far the greatest effect.)
And finally, as Jerry Coffin mentioned in his answer, you could also do the processing on the GPU. I imagine that it would be even faster than SIMD (but as I said: measure it). Using the GPU has the added benefit of leaving the CPU free to do other tasks. The downside is that your end-users will need a reasonable GPU.
You should probably consider doing this on the GPU using something like CUDA, OpenCL, or a DirectX compute shader.
Sweep and prune is a broad-phase collision detection algorithm that may be well suited to this task. If you can make use of temporal coherence (from frame to frame, the location differences are generally small), you can reduce the processing further. A good book on the subject is "Real-Time Collision Detection".
For a simple speed-up you could sort by one axis first and loop through, checking for overlap on that axis before doing a full check. For each particle you only need to look further along the array until you find one that doesn't overlap on that axis; then you can move on to the next particle.
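Here's a rough sketch of that idea, assuming simple circular particles; the Particle struct and names are made up for illustration:

    using System;
    using System.Collections.Generic;

    struct Particle { public float X, Y, Radius; }

    // Sort indices by X, then for each particle only scan forward while
    // the X-intervals still overlap; everything further right can't collide.
    static List<Tuple<int, int>> FindCollisions(Particle[] p)
    {
        var order = new int[p.Length];
        for (int i = 0; i < p.Length; i++) order[i] = i;
        Array.Sort(order, (a, b) => p[a].X.CompareTo(p[b].X));

        var pairs = new List<Tuple<int, int>>();
        for (int i = 0; i < order.Length; i++)
        {
            Particle a = p[order[i]];
            for (int j = i + 1; j < order.Length; j++)
            {
                Particle b = p[order[j]];
                if (b.X - a.X > a.Radius + b.Radius) break; // X gap too wide: stop early
                float dx = a.X - b.X, dy = a.Y - b.Y;
                float r = a.Radius + b.Radius;
                if (dx * dx + dy * dy < r * r)              // full circle-circle test
                    pairs.Add(Tuple.Create(order[i], order[j]));
            }
        }
        return pairs;
    }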