The float array buffers I'm getting from NAudio seem really odd: when I replay the audio it sounds perfect, but graphing the buffer showed a picture that looked mostly like noise. It took me a while, but I think I've made some headway; now I'm a little stuck.
The float array that comes out has a block align of 8, so 4 floats per sample (I'm recording at 16-bit, so one float should easily hold this). However, there are 2 and often 3 (for load) floats provided per sample. I ended up graphing it - Charts of Data. The top picture is the closest I can get to reconstructing the wave, the bottom is the wave as recorded, and the middle is a chart of the raw data.
It seems to me that each float is simply holding a byte value, but I'm very confused by the first value, which appears to be some kind of scaling factor.
Before I go into too much detail on what I've found, I might just leave it at that, in the hope that Mark will know exactly how/why I am seeing this.
My current best attempt to decode this data is to convert the numbers to bytes and then left-shift them together, which produces the top chart of the attached. I'm fairly sure there is more to it, however.
OK, so after a bit more tweaking I figured out that the float array was in fact an array of the raw bytes of floats. Not sure if that makes sense: each "float" among the 4 floats per sample held raw bits that, together, made up the real floats.
In the end this made it incredibly easy to process the buffer into an array of floats, as follows:
_samplesToProcess = floatsIn.Length / WaveFormat.BlockAlign * WaveFormat.Channels;
if (_rawFloatsOut.Length < _samplesToProcess)
    _rawFloatsOut = new float[_samplesToProcess];
// Buffer.BlockCopy copies raw bytes, so this reinterprets the incoming bits
// rather than converting element by element.
Buffer.BlockCopy(floatsIn, 0, _rawFloatsOut, 0, floatsIn.Length);
BufferProcessor(_rawFloatsOut);
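The same reinterpretation idea can be sketched outside .NET. This is a hypothetical Python illustration (not NAudio code) of the discovery above: the raw IEEE-754 bytes of one float sample arrive spread across separate array slots, and reassembling those bytes recovers the sample, which is effectively what the Buffer.BlockCopy call does.

```python
import struct

# Hypothetical example: one 32-bit float sample whose four raw bytes
# arrived spread across four separate slots, as described above.
sample = 0.731
raw = struct.pack('<f', sample)               # the 4 raw IEEE-754 bytes
spread = [float(b) for b in raw]              # one slot per byte, like the odd buffer
rebuilt = struct.unpack('<f', bytes(int(v) for v in spread))[0]
# rebuilt is the original sample again (to 32-bit float precision)
```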
I want to send a series of integers to HLSL in the form of a 3D array, using Unity. I've been trying to do this for a couple of days now, without success. I tried to nest the buffers (StructuredBuffer<StructuredBuffer<StructuredBuffer<int>>>), but it simply won't work. And I need to make this thing resizable, so I can't use fixed-size arrays in structs. What should I do?
EDIT: To clarify a bit more what I am trying to do here: this is a medical program. When a scan is made of your body, some files are generated. Those files are called DICOM files (.dcm), and they are given to a doctor. The doctor opens the program, selects all of the DICOM files and loads them. Each DICOM file contains an image. However, those images are not like the normal images used in our daily life: they are grayscale, and each pixel has a value that ranges from -1000 to a couple of thousand, so each pixel is saved as 2 bytes (an Int16). I need to generate a 3D model of the body that got scanned, so I'm using the Marching Cubes algorithm to generate it (have a look at Polygonising a Scalar Field). The problem is that I used to loop over each pixel in about 360 images of 512*512 pixels, which took too much time; on the CPU I read the pixel data from each file as I needed it. Now I'm trying to make this process happen at runtime, so I need to send all of the pixel data to the GPU before processing it. That's my problem: I would need the GPU to read data from disk. Because that isn't possible, I need to send 360*512*512*4 bytes of data to the GPU in the form of a 3D array of ints. I'm also planning to keep the data there, to avoid retransferring that huge amount of memory. What should I do? Refer to this link to know more about what I'm doing.
From what I've understood, I would suggest trying the following:
Flatten your data (nested buffers are not what you want on your GPU)
Split your data across multiple ComputeBuffers if necessary (when I played around with them on an Nvidia Titan X I could store approximately 1GB of data per buffer; I was rendering a 3D point cloud with 1.5GB of data or so, so the 360MB of data you mentioned should not be a problem)
If you need multiple buffers: let them overlap as needed for your marching cubes algorithm
Do all of your calculations in a ComputeShader (I think this requires DX11; if you have multiple buffers, run it multiple times and accumulate your results), and then use the results in a standard shader which you call from the OnPostRender function (use Graphics.DrawProcedural inside it to just draw points, or build a mesh on the GPU)
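The flattening in the first point is just index arithmetic. A small sketch (Python for brevity; the same formula works in C# when filling the buffer and in HLSL when reading it):

```python
def flat_index(x, y, z, width, height):
    # Row-major flattening: voxel (x, y, z) of a width*height*depth volume
    # maps to one slot of a 1D buffer, e.g. a StructuredBuffer<int> on the GPU.
    return x + y * width + z * width * height

def unflatten(i, width, height):
    # Inverse mapping, useful inside a compute shader kernel.
    return (i % width, (i // width) % height, i // (width * height))
```

For the 512*512*360 volume from the question, flat_index(x, y, slice, 512, 512) picks the same voxel the CPU loop would have visited.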
Edit (Might be interesting to you)
If you want to append data to a gpu buffer (because you don't know the exact size or you can't write it to the gpu at once), you can use AppendBuffers and a ComputeShader.
C# Script Fragments:
struct DataStruct
{
...
}
DataStruct[] yourData;
yourData = loadStuff();
ComputeBuffer tmpBuffer = new ComputeBuffer(512, Marshal.SizeOf(typeof(DataStruct)));
ComputeBuffer gpuData = new ComputeBuffer(MAX_SIZE, Marshal.SizeOf(typeof(DataStruct)), ComputeBufferType.Append);
for (int i = 0; i < yourData.Length / 512; i++) {
    // write data subset to temporary buffer on gpu
    tmpBuffer.SetData(yourData.Skip(i * 512).Take(512).ToArray()); // use Linq to select the i-th 512-element subset
    // set up and run compute shader for appending data to "gpuData" buffer
    AppendComputeShader.SetBuffer(0, "inBuffer", tmpBuffer);
    AppendComputeShader.SetBuffer(0, "appendBuffer", gpuData);
    AppendComputeShader.Dispatch(0, 512 / 8, 1, 1); // 8 = gpu work group size -> use 512/8 work groups
}
ComputeShader:
struct DataStruct // replicate struct in shader
{
...
}
#pragma kernel append
StructuredBuffer<DataStruct> inBuffer;
AppendStructuredBuffer<DataStruct> appendBuffer;
[numthreads(8,1,1)]
void append(uint id : SV_DispatchThreadID) {
appendBuffer.Append(inBuffer[id]);
}
Note:
AppendComputeShader has to be assigned via the Inspector.
512 is an arbitrary batch size; there is an upper limit to how much data you can append to a gpu buffer at once, but I think that depends on the hardware (for me it seemed to be 65536 * 4 bytes).
You have to provide a maximum size for gpu buffers up front (on the Titan X it seems to be ~1GB).
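The Skip/Take loop in the C# fragment above is just fixed-size batching. The same idea in a short sketch (Python standing in for the Linq pipeline), including the final partial batch, which a loop bounded by `yourData.Length / 512` would silently drop; a real uploader would pad that last chunk or dispatch fewer work groups for it:

```python
def batches(data, size=512):
    # Yield consecutive fixed-size chunks of `data`; the last chunk may be
    # shorter than `size` when the length is not an exact multiple.
    for i in range(0, len(data), size):
        yield data[i:i + size]

chunks = list(batches(list(range(1030)), 512))
# 1030 elements -> two full 512-element batches plus a 6-element remainder
```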
In Unity we currently have the MaterialPropertyBlock that allows SetMatrixArray and SetVectorArray, and to make this even sweeter, we can set globally using the Shader static helpers SetGlobalVectorArray and SetGlobalMatrixArray. I believe that these will help you out.
In case you prefer the old way, please look at this quite nice article showing how to pass arrays of vectors.
I am trying to implement a volume meter to help users select their microphone, using NAudio. I need to do my best to weed out devices that just have background noise and ensure I show something when they talk.
We are currently using version 1.7.3 within a Unity3D application, so none of the MMDevice-related approaches are available, as they crash.
I am using a WaveInEvent that I feed into a WaveInProvider that I subsequently feed to a SampleChannel. I feed the SampleChannel into a MeteringSampleProvider which I have subscribed to the StreamVolume event.
In my OnPostVolumeMeter event handler when I receive the StreamVolumeEventArgs (I named the parameter e) I'm wondering how to calculate decibels. I have seen plenty of examples that fish out the peak volume (or sometimes it seems to be referred to as an amplitude) from e.MaxSampleValues[0]. Some examples check whether it is a stereo signal and will grab the max between e.MaxSampleValues[0] or e.MaxSampleValues[1].
Anyway, what is the range of these values? Are they percentages? They are relatively small decimals (10^-3 or 10^-4) when I hillbilly-debug to the console.
Is the calculation something like,
var peak = e.MaxSampleValues[0];
if (e.MaxSampleValues.Length > 1)
{
peak = Mathf.Max(e.MaxSampleValues[0], e.MaxSampleValues[1]);
}
var dB = Mathf.Max(20.0f*Mathf.Log10(peak), -96.0f);
or do I need to divide peak by 32768.0? As in,
var dB = Mathf.Max(20.0f*Mathf.Log10(peak/32768.0f), -96.0f);
Or is this approach totally incorrect, and do I need to collect a buffer of samples and do an RMS-style calculation, where I take the square root of the mean of the squared samples, divide by 32768, and feed that into Log10?
I've seen several references to look at the AudioPlaybackPanel of the NAudioDemo, and it sets the volumeMeter Amplitude to the values of e.MaxSampleValues[0] and e.MaxSampleValues[1].
Looking at the date of your post, this is probably a solved issue for you, but for the benefit of others, here goes.
Audio signals swing between negative and positive values in a wave. The frequency of the swing and the amplitude (or height) of the swing affect what you hear.
You are correct in saying you are looking at the amplitude to see if audio is present.
For a meter, since the sample rate is much higher than the refresh rate of any meter you are likely to display, you will need to either record the peak using Math.Max or average over a number of samples. In your case either would work; unless you are trying to show an accurate meter in dBFS, the dB calculation is not needed.
In apps where I have been looking to trigger things based on the presence of audio, or the lack thereof, I normally convert the samples to floats. This gives you a range between 0 and 1; then pick a threshold, say 0.2, and say that if any sample is above it, we have audio.
A float also makes a nice indicative meter for display. Note that if your app were a pro-audio application and you were asking about accurate metering, my answer would be totally different.
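For what it's worth, the conversion the question asks about can be sketched as follows. This assumes, as NAudio's sample providers do, that samples are already floats in [-1, 1], so no division by 32768 is needed (the function names are my own, not NAudio's):

```python
import math

def peak_to_dbfs(peak, floor=-96.0):
    # 0 dBFS corresponds to full scale (peak == 1.0); clamp at a noise floor.
    if peak <= 0.0:
        return floor
    return max(20.0 * math.log10(peak), floor)

def rms_to_dbfs(samples, floor=-96.0):
    # RMS variant: root of the mean of the squared samples, then to dBFS.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return peak_to_dbfs(rms, floor)
```

peak_to_dbfs(1.0) gives 0.0 and peak_to_dbfs(0.5) about -6.0, i.e. halving the amplitude costs roughly 6 dB.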
Hi, I'm a noob at audio-related coding, and I'm working on a pitch-tracking DLL that I will use to try to create a sort of open-source version of the video game Rocksmith, as a learning experience.
So far I have managed to get the FFT to work, so I can detect the pitch frequency (Hz); then, using an algorithm and the table below, I can determine the octave (2nd to 6th) and the note (C to B) of the played note.
The next step is to detect the string, so I can determine the fret.
I've been thinking about it, and in theory I can work with this: I will know when the user is playing the right note. But the game could be "hacked", because by using only the Hz the game cannot detect whether a note is played on the right string. For example, 5th string + 1st fret = C4 (261.63Hz), which is the same as 6th string + 5th fret = C4 (261.63Hz).
The chances of a user playing a note on the wrong string and getting it right are low, but I think it would be really good to know the string, so I can give users some error feedback when they play the wrong string (like "you should go a string up or down").
Do you know what I can do to detect the string? Thanks in advance :)
[edit]
The guitar and strings being used affect the timbre, so analyzing the timbre does not seem to be an easy way of detecting strings:
"Variations in timbre on your guitar are produced by an enormous number of factors from pickup design and position, the natural resonances and damping in your guitar due to the wood used (that's a different sort of timber!) and its construction and shape, the gauge and age of your strings, your playing technique, where you fret and pluck the string, and so on."
This might be a little late, because the post is one year old, but here's a solution which I found after long research into pitch detection for guitar.
THIS IS WHY FFT DOESN'T WORK:
You cannot use a plain FFT, since the result gives you a linearly spaced array while musical pitch is logarithmic (exponential distance between notes). Plus, the FFT gives you an array of bins in which your frequency COULD be; it doesn't give you a precise result.
THIS IS WHAT I SUGGEST:
Use dywapitchtrack. It's a library that uses a wavelet algorithm, which works directly on your wave instead of calculating large bins like the FFT.
description:
The dywapitchtrack is based on a custom-tailored algorithm which is of very high quality:
both very accurate (precision < 0.05 semitones), very low latency (< 23 ms) and
very low error rate. It has been thoroughly tested on human voice.
It can best be described as a dynamic wavelet algorithm (dywa):
DOWNLOAD : https://github.com/inniyah/sndpeek/tree/master/src/dywapitchtrack
USE(C++):
Put the .c and .h files where you need them and import them into your project.
Include the header file.
//Create a dywapitchtracker Object
dywapitchtracker pitchtracker;
//Initialise the object with this function
dywapitch_inittracking(&pitchtracker);
When your buffer is full (the buffer needs to be at 44100 Hz resolution and a power-of-2 length; mine is 2048):
//use this function with your buffer
double thePitch = dywapitch_computepitch(&pitchtracker, yourBuffer, 0, 2048);
And voilà, thePitch contains precisely what you need. (Feel free to ask questions if something is unclear.)
A simple FFT peak estimator is not a good guitar pitch detector/estimator, due to many potentially strong overtones. More robust pitch estimation algorithms exist (search Stack Overflow and DSP.stackexchange). But if you require the players to pre-characterize each string on their individual instruments, both open and fretted, before starting the game, an FFT fingerprint of those characterizations might be able to differentiate the same note played on different strings on some guitars. The thicker strings will give off slightly different ratios of energy in some of the higher overtones, as well as different amounts of slight inharmonicity.
The other answers seem to suggest a simple pitch detection method. However, it is something you will have to research.
Specifically, compare the overtones of the 5th string 1st fret to the 6th string 5th fret. That is, look only at 261.63*2, 261.63*3, *4, etc. Also try looking at 261.63*0.5. Compare the amplitudes of the two signals at these frequencies; there might be a pattern that can be detected.
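One way to measure those overtone amplitudes is a single-bin DFT (a Goertzel-style correlation) at each harmonic frequency. A minimal Python sketch, assuming a mono float buffer; the function names are mine, not from any library:

```python
import math

def harmonic_amplitude(samples, sample_rate, freq):
    # Correlate the buffer with a complex exponential at `freq`
    # (a single-bin DFT); returns the amplitude of that component.
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq * i / sample_rate)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * i / sample_rate)
             for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n

def overtone_profile(samples, sample_rate, fundamental, harmonics=5):
    # Ratio of each overtone to the fundamental: the "fingerprint"
    # that might differ between two strings playing the same note.
    base = harmonic_amplitude(samples, sample_rate, fundamental)
    return [harmonic_amplitude(samples, sample_rate, fundamental * k) / base
            for k in range(2, harmonics + 1)]

# A pure tone has an empty overtone fingerprint; a plucked string would not.
sr = 8000
tone = [math.cos(2 * math.pi * 260 * i / sr) for i in range(800)]
```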
Hello
I'm exploring the audio possibilities of the WP7 platform, and the first stumble I've had is trying to implement an FFT using the Cooley-Tukey method. The result is that the spectrogram shows 4 identical images, in this order: one normal, one reversed, one normal, one reversed.
The code was taken from another C# project (for desktop); the implementation and all variables seem in line with the algorithm.
So I can see two problems right away: reduced resolution, and CPU wasted generating four identical spectrograms.
Given a sample size of 1600 (could be 2048), I now have only 512 usable frequency bins, which leaves me with roughly 15Hz resolution over an 8kHz frequency span. Not bad, but not so good either.
Should I just give up on the code and use NAudio? I cannot seem to find an explanation for why the spectrum is quadrupled; the input data is OK, and the algorithm seems OK.
This sounds correct. You have 2 mirrors; I can only assume that one is the real part and the other is the imaginary part. This is standard FFT.
From the real and imaginary parts you can compute the magnitude or amplitude of each harmonic, which is more common, or compute the angle or phase shift of each harmonic, which is less common.
Gilad.
I switched to NAudio and now the FFT works. However, I might have found the cause (I probably won't try to test it again): when I was constructing an array of doubles to feed into the FFT function, I did something like:
for (int i = 0; i < buffer.Length; i += sizeof(short))
{
    // note the bug: 'i' advances by 2 but also indexes 'samples',
    // so every odd entry of 'samples' was left at zero
    samples[i] = ReadSample(buffer, i);
}
For reference, 'samples' is the double[] input to the FFT, and ReadSample is a helper that takes care of little/big endian. I can't remember exactly how the code was, but it was skipping every odd sample.
My math knowledge has never been great, but I'm thinking this induced some aliasing pattern, which might in the end have produced the effect I experienced.
Anyway, the problem is worked around; thanks for your input, and if you can still explain the phenomenon, I'd be grateful.
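The skipped-sample theory is easy to check with a toy DFT: writing only every other slot of the output array is equivalent to interleaving zeros, and zero-stuffing by a factor of 2 makes the whole spectrum repeat once, which together with the usual mirror symmetry of a real signal gives exactly four copies. A small sketch (plain Python DFT, not the WP7 code):

```python
import cmath
import math

def dft_mag(x):
    # Direct DFT magnitudes (O(n^2), fine for a toy example)
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]

# A cosine with 3 cycles over 16 samples...
tone = [math.cos(2 * math.pi * 3 * t / 16) for t in range(16)]
# ...with zeros interleaved, as the buggy i += 2 loop effectively produced.
stuffed = []
for s in tone:
    stuffed += [s, 0.0]
mags = dft_mag(stuffed)
# The 32-point spectrum now peaks at bins 3, 13, 19 and 29: the original
# peak, its mirror, and both duplicated once more by the zero-stuffing.
```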
((Answer selected - see Edit 5 below.))
I need to write a simple pink-noise generator in C#. The problem is, I've never done any audio work before, so I don't know how to interact with the sound card, etc. I do know that I want to stay away from using DirectX, mostly because I don't want to download a massive SDK just for this tiny project.
So I have two problems:
How do I generate Pink Noise?
How do I stream it to the sound card?
Edit: I really want to make a pink noise generator... I'm aware there are other ways to solve the root problem. =)
Edit 2: Our firewall blocks streaming audio and video - otherwise I'd just go to www.simplynoise.com as suggested in the comments. :(
Edit 3: I've got the generation of white-noise down, as well as sending output to the sound card - now all I need to know is how to turn the white-noise into pink noise. Oh - and I don't want to loop a wav file because every application I've tried to use for looping ends up with a tiny little break in between loops, which is jarring enough to have prompted me in this direction in the first place...
Edit 4: ... I'm surprised so many people have jumped in to very explicitly not answer a question. I probably would have gotten a better response if I lied about why I need pink noise... This question is more about how to generate and stream data to the sound card than it is about what sort of headphones I should be using. To that end I've edited out the background details - you can read about it in the edits...
Edit 5: I've selected Paul's answer below because the link he provided gave me the formula to convert white noise (which is easily generated via the random number generator) into pink noise. In addition to this, I used Ianier Munoz's CodeProject entry "Programming Audio Effects in C#" to learn how to generate, modify, and output sound data to the sound card. Thank you guys for your help. =)
Maybe you can convert the C/C++ code here to C#:
http://www.firstpr.com.au/dsp/pink-noise/
The easiest way to get sound to the sound card is to generate a WAV file (spit out some hardcoded headers and then the sample data). Then you can play the .wav file.
Pink noise is just white noise put through a -3dB/octave low-pass filter. You can generate white noise using rand() (or any function that generates uniformly distributed random numbers).
Streaming stuff to the soundcard is reasonably trivial, as long as you have Google handy. If you choose to avoid DirectX, consider using PortAudio or ASIO for interfacing with the soundcard... although I think you're gonna have to use C++ or C.
Other than that, why waste CPU time generating it? Loop a damn WAV file!
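The -3dB/octave filtering described above can be approximated with Paul Kellet's "economy" pink filter from the firstpr.com.au page linked in the earlier answer. A Python sketch, assuming the coefficients from that page are remembered correctly; the 0.05 output scaling is my own rough normalization, not part of the published filter:

```python
import random

def pink_noise(n, seed=None):
    # White noise run through Paul Kellet's "economy" pink filter:
    # three leaky integrators plus a direct term approximate -3 dB/octave.
    rng = random.Random(seed)
    b0 = b1 = b2 = 0.0
    out = []
    for _ in range(n):
        white = rng.uniform(-1.0, 1.0)
        b0 = 0.99765 * b0 + white * 0.0990460
        b1 = 0.96300 * b1 + white * 0.2965164
        b2 = 0.57000 * b2 + white * 1.0526913
        out.append((b0 + b1 + b2 + white * 0.1848) * 0.05)  # rough scaling
    return out

samples = pink_noise(10000, seed=1)
```

Each output sample depends on the filter state, so unlike white noise the sequence is correlated from sample to sample, which is exactly what shifts the energy toward low frequencies.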
A bit late to this, I realise, but anyone coming across it for answers should know that pink noise is white noise with -3dB/octave, not -6dB/octave, which is actually brown noise.
Here's a very simple way to create pink noise, which just sums lots of waves spaced logarithmically apart, together! It may be too slow for your purposes if you want the sound created in realtime, but further optimization is surely possible (e.g: a faster cosine function).
The function outputs a double array with values from -1 to 1, representing the lowest and highest points in the waveform respectively.
The quality parameter represents the number of waves produced to make the sound. I find 5000 waves (about 40 intervals per semitone) is just about the threshold where I can't detect any noticeable improvement from higher values, but to be on the safe side you could (optionally) increase this to about 10,000 waves or higher. Also, according to Wikipedia, 20 hertz is around the lower limit of human hearing, but you can change this too if you want.
Note that the sound gets quieter with a higher quality value for technical reasons, so you may (optionally) want to adjust the volume via the volumeAdjust parameter.
public double[] createPinkNoise(double seconds, int quality = 5000, double lowestFrequency = 20, double highestFrequency = 20000, double volumeAdjust = 1.0)
{
    long samples = (long)(44100 * seconds);
    double[] d = new double[samples];
    double lowestWavelength = highestFrequency / lowestFrequency;
    Random r = new Random();
    for (int j = 0; j < quality; j++)
    {
        double wavelength = Math.Pow(lowestWavelength, (j * 1.0) / quality) * 44100 / highestFrequency;
        double offset = r.NextDouble() * Math.PI * 2; // The random phase offset is important: without it all the waves are almost in phase, which ruins the effect!
        for (long i = 0; i < samples; i++)
        {
            d[i] += Math.Cos(i * Math.PI * 2 / wavelength + offset) / quality * volumeAdjust;
        }
    }
    return d;
}
Here's an example of what the playback thread looks like. I'm using DirectSound to create a SecondaryBuffer where the samples are written. As you can see, it's pretty straightforward:
/// <summary>
/// Thread in charge of feeding the playback buffer.
/// </summary>
private void playbackThreadFn()
{
// Begin playing the sound buffer.
m_playbackBuffer.Play( 0, BufferPlayFlags.Looping );
// Change playing state.
IsPlaying = true;
// Playback loop.
while( IsPlaying )
{
// Suspend thread until the playback cursor steps into a trap...
m_trapEvent.WaitOne();
// ...read audio from the input stream... (In this case from your pink noise buffer)
Input.Collect( m_target, m_target.Length );
// ...calculate the next writing position...
var writePosition = m_traps[ ((1 & m_pullCounter++) != 0) ? 0 : 1 ].Offset;
// ...and copy audio to the device buffer.
m_playbackBuffer.Write( writePosition, m_deviceBuffer, LockFlag.None );
}
// Stop playback.
m_playbackBuffer.Stop();
}
If you need more details on how it works I'll be glad to help.
If you're on Linux, you can use SoX (you may have it already; try the play command).
play -t sl - synth 3 pinknoise band -n 1200 200 tremolo .1 40 < /dev/zero
As a quick and dirty way to do it, how about just looping a pink-noise WAV file in your audio player? (Yes, I know part of the fun is making it yourself...)
What about an .mp3 sample of Pink Noise on repeat?
You could use Audacity to generate as much pink noise as you want, and then repeat it.
Or you could dig into the source code and see how Audacity does the pink noise generation.
I can't speak about C#, but you might be better off with some good noise canceling headphones and your favorite mp3's.