Getting sample data from NAudio WasapiLoopbackCapture - c#

I am using WasapiLoopbackCapture from the NAudio C# library to capture audio being sent to my computer speakers. My goal is to get the samples and send them to an FFT function to get the spectrum data.
I'm trying to figure out exactly how to parse the incoming audio data, but the answer to my question seems to be scattered across several blog posts and Stack Overflow answers, so I would like a definitive answer for my situation.
My waveIn.WaveFormat says I have 32 bits/sample and 2 channels. It also says that the encoding is IeeeFloat.
From what I've gathered, left and right channel samples are interleaved (left, right, left, right, etc.), and if there are 4 bytes per sample, every 4 bytes in the buffer make up one sample (this is probably obvious to most, but I want to verify). Also, is the data little-endian, and will the BitConverter.ToSingle() function take care of that?
This is my code:
static void Main(string[] args)
{
    IWaveIn waveIn = new WasapiLoopbackCapture();
    waveIn.DataAvailable += waveIn_DataAvailable;
    waveIn.StartRecording();
    Console.ReadLine();   // keep the process alive; Main would otherwise exit before any data arrives
}

static void waveIn_DataAvailable(object sender, WaveInEventArgs e)
{
    // 2 channels * 4 bytes per 32-bit float sample = 8 bytes per frame
    for (int i = 0; i < e.BytesRecorded; i += 8)
    {
        float leftSample = BitConverter.ToSingle(e.Buffer, i);
        float rightSample = BitConverter.ToSingle(e.Buffer, i + 4);
    }
}
Does this recover the samples correctly?
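For what it's worth, here is a sketch of the same loop that also folds the two channels down to one value per frame for the FFT (assuming the stereo 32-bit IeeeFloat format reported above; the data is little-endian, and BitConverter.ToSingle reads in the platform's byte order, which is little-endian on x86/x64 Windows, so no swapping should be needed):
static void waveIn_DataAvailable(object sender, WaveInEventArgs e)
{
    const int bytesPerFrame = 8;   // 2 channels * 4 bytes per 32-bit float
    for (int i = 0; i + bytesPerFrame <= e.BytesRecorded; i += bytesPerFrame)
    {
        float left = BitConverter.ToSingle(e.Buffer, i);
        float right = BitConverter.ToSingle(e.Buffer, i + 4);
        float mono = (left + right) / 2f;   // average channels before the FFT
        // append 'mono' to the FFT input block here
    }
}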

Related

Get Basic audio spectrum data in unity

I want to visualize whether an audio clip has sound or not. The microphone and the AudioSource are working correctly, but I am stuck on the visualization part. I have a hard time understanding the official documentation and would like a solution.
I tried the following code:
void Update()
{
    AnalyzeSound();
    text1.text = "sound!\n" + " rmsValue : " + rmsValue;
}

void AnalyzeSound()
{
    audio.GetOutputData(samples, 0);

    // compute the RMS value
    float sum = 0;
    for (int i = 0; i < SAMPLE_SIZE; i++)
    {
        sum += samples[i] * samples[i];   // accumulate the squared samples (the original assigned instead of adding)
    }
    rmsValue = Mathf.Sqrt(sum / SAMPLE_SIZE);

    // get the dB value (0.1 is the reference amplitude)
    dbValue = 20 * Mathf.Log10(rmsValue / 0.1f);
}
Can I take rmsValue as the indicator of sound on the microphone, or should I take the dbValue? What should the threshold value be?
In a few words: when can I say the microphone has sound?
There is no hard and fast definition that would separate noise from silence in all cases. It really depends on how loud the background noise is. Compare, for example, silence recorded in an anechoic chamber vs. silence recorded next to an HVAC system. The easiest thing to try is to experiment with different dB threshold values below which you consider the signal as noise and above which it is considered signal. Then adjust the threshold value up or down to suit your needs. Depending on the nature of the signal (e.g. music vs. speech) you could look into other techniques such as Voice Activity Detection (https://en.wikipedia.org/wiki/Voice_activity_detection) or a convolutional neural network to segment speech and music.
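A minimal sketch of that thresholding idea, assuming the dbValue field computed in AnalyzeSound() above; the -40 dB cutoff is an arbitrary starting point to tune against your background noise:
// hypothetical threshold; raise or lower it by experiment
const float silenceThresholdDb = -40f;

bool MicrophoneHasSound()
{
    // treat anything louder than the threshold as signal, quieter as noise
    return dbValue > silenceThresholdDb;
}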

NAudio: Using MixingSampleProvider correctly with VolumeSampleProvider

I have been using NAudio with the
"Fire and Forget Audio Playback with NAudio" tutorial (thank you Mark for this awesome utility!) as written here:
http://mark-dot-net.blogspot.nl/2014/02/fire-and-forget-audio-playback-with.html
I managed to add a VolumeSampleProvider to it, using the MixingSampleProvider as input. However, when I now play two sounds right after each other, the first sound always gets the volume of the second as well, even though the first is already playing.
So my question is: How do I add sounds with an individual volume per sound?
This is what I used:
mixer = new MixingSampleProvider(waveformat);
mixer.ReadFully = true;
volumeProvider = new VolumeSampleProvider(mixer);
panProvider = new PanningSampleProvider(volumeProvider);
outputDevice.Init(panProvider);
outputDevice.Play();
I realized (thanks to itsmatt) that the only way to make this work is to leave the mixer alone and adjust the panning and volume of each CachedSound individually, before adding it to the mixer. Therefore I needed to rewrite the CachedSoundSampleProvider, using a pan and volume as extra input parameters.
This is the new constructor:
public CachedSoundSampleProvider(CachedSound cachedSound, float volume = 1, float pan = 0)
{
    this.cachedSound = cachedSound;
    LeftVolume = volume * (0.5f - pan / 2);
    RightVolume = volume * (0.5f + pan / 2);
}
And this is the new Read() function:
public int Read(float[] buffer, int offset, int count)
{
    long availableSamples = cachedSound.AudioData.Length - position;
    long samplesToCopy = Math.Min(availableSamples, count);
    int destOffset = offset;
    for (int sourceSample = 0; sourceSample < samplesToCopy; sourceSample += 2)
    {
        float outL = cachedSound.AudioData[position + sourceSample + 0];
        float outR = cachedSound.AudioData[position + sourceSample + 1];
        buffer[destOffset + 0] = outL * LeftVolume;
        buffer[destOffset + 1] = outR * RightVolume;
        destOffset += 2;
    }
    position += samplesToCopy;
    return (int)samplesToCopy;
}
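A hypothetical call site for the modified provider (mySound being a CachedSound loaded elsewhere):
// volume and pan are baked in per sound before it reaches the mixer
mixer.AddMixerInput(new CachedSoundSampleProvider(mySound, volume: 0.8f, pan: -0.25f));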
I'm not 100% certain of what you are asking, and I don't know if you solved this already, but here's my take on this.
ISampleProvider objects play the "pass the buck" game with their source ISampleProvider via the Read() method. Eventually, someone does some actual reading of audio samples. Individual ISampleProvider classes do whatever they do to the samples.
MixingSampleProvider, for instance, takes N audio sources... those get mixed. When Read() is called, it iterates the audio sources and reads count samples from each.
Passing it to a VolumeSampleProvider handles all the samples (from those various sources) as a group... it says:
buffer[offset+n] *= volume;
That's going to adjust the samples across the board... so every sample in the buffer gets multiplied by the volume.
The PanningSampleProvider just provides a multiplier to the stereo audio and adjusts the samples accordingly, doing the same sort of thing as the VolumeSampleProvider.
If you want to individually handle audio source volumes, you need to handle that upstream of the MixingSampleProvider. Essentially, the things that you pass to the MixingSampleProvider need to be able to have their volume adjusted independently.
If you passed a bunch of SampleChannel objects to your MixingSampleProvider... you could accomplish independent volume adjustment. The SampleChannel class incorporates a VolumeSampleProvider object and provides a Volume property that allows one to set the volume on that VolumeSampleProvider object.
SampleChannel also incorporates a MeteringSampleProvider that provides reporting of the maximum sample value during a given period. It raises an event that gives you an array of those values, one per channel.
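For illustration, a rough sketch of that approach (file names are placeholders; outputDevice is the same output device used in the question):
// Each SampleChannel wraps its own VolumeSampleProvider, so volumes
// can be changed independently, even while both sounds are playing.
var channel1 = new SampleChannel(new AudioFileReader("sound1.wav"));
var channel2 = new SampleChannel(new AudioFileReader("sound2.wav"));
channel1.Volume = 0.5f;
channel2.Volume = 1.0f;

var mixer = new MixingSampleProvider(new ISampleProvider[] { channel1, channel2 });
mixer.ReadFully = true;   // keep the mixer producing output when inputs run dry

outputDevice.Init(mixer);
outputDevice.Play();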

Using Naudio WaveIn for both performing FFT transformation and writing to disk in real-time (NOT ASIO)

I am currently creating a WinForms application for Windows 8.1. I have been able to perform an FFT on the input data from the device's microphone using AsioOut; however, to be able to use ASIO on my machine I needed to download ASIO4ALL.
This causes a huge amount of feedback in the microphone and results in very inaccurate frequency readings (to make sure it was the sound itself, I wrote a copy to disk for playback).
To get around this I have been trying to adapt my code to work with NAudio's WaveIn class, but this returns either no data or NaN from the FFT algorithm (although I can save a recording to disk which plays back with no issues).
I've been trying to fix this for some time now and am sure it is just a silly mistake somewhere; any help would be greatly appreciated!
Below is the code for the "OnDataAvailable" event (where I'm 99% sure I am going wrong):
void OnDataAvailable(object sender, WaveInEventArgs e)
{
    if (this.InvokeRequired)
    {
        this.BeginInvoke(new EventHandler<WaveInEventArgs>(OnDataAvailable), sender, e);
    }
    else
    {
        byte[] buffer = e.Buffer;
        int bytesRecorded = e.BytesRecorded;
        int bufferIncrement = waveIn.WaveFormat.BlockAlign;
        for (int index = 0; index < bytesRecorded; index += bufferIncrement)
        {
            float sample32 = BitConverter.ToSingle(buffer, index);
            sampleAggregator.Add(sample32);
        }
        if (waveFile != null)
        {
            waveFile.Write(e.Buffer, 0, e.BytesRecorded);
            waveFile.Flush();
        }
    }
}
If any more details and/or code is required please let me know.
waveFile: Name of the file writer
e.Buffer: The buffer containing the recorded data
e.BytesRecorded: The total number of bytes recorded
For reference, below is the working code when using the ASIO class:
void asio_DataAvailable(object sender, AsioAudioAvailableEventArgs e)
{
    byte[] buf = new byte[e.SamplesPerBuffer * 4];
    for (int i = 0; i < e.InputBuffers.Length; i++)
    {
        Marshal.Copy(e.InputBuffers[i], buf, 0, e.SamplesPerBuffer * 4);
    }
    for (int i = 0; i < e.SamplesPerBuffer * 4; i++)
    {
        float sample32 = Convert.ToSingle(buf[i]);
        sampleAggregator.Add(sample32);
    }
}
EDIT: The samples being returned are now accurate after changing the conversion to Int16, as per the advice on this page; I had some other issues in my code which prevented actual results from being returned originally.
However, the file being written to disk is very choppy. I'm sure this is a problem with my laptop and the number of processes it is trying to run; could anyone please advise a way around this issue?
In the NAudio WPF demo project there is an example of calculating FFTs during playback with a class called SampleAggregator, which stores up blocks of 1024 samples and then performs FFTs on them.
It looks like you are trying to do something similar. I suspect the problem is that you are getting 16-bit samples, not 32-bit. Try using BitConverter.ToInt16 on every pair of bytes.
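For example, a minimal sketch of that conversion inside OnDataAvailable (assuming 16-bit PCM input, i.e. two bytes per sample value):
for (int index = 0; index < e.BytesRecorded; index += 2)
{
    short sample16 = BitConverter.ToInt16(e.Buffer, index);
    sampleAggregator.Add(sample16 / 32768f);   // scale to [-1, 1) for the FFT
}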
mWaveInDevice = new WaveIn();
mWaveInDevice.WaveFormat = WaveFormat.CreateIeeeFloatWaveFormat(44100, 2);
Set an IEEE float WaveFormat with CreateIeeeFloatWaveFormat, and then you will get the right values after the FFT.

NAudio Asio Record and Playback

I'm trying to write my own VST host, and for that I need to record and play audio from an ASIO driver (in my case for an audio interface). That's why I'm trying to use NAudio's AsioOut.
For testing purposes I'm currently just trying to record the input, copy it and play it back to the output.
My code looks like this:
var asioout = new AsioOut();
BufferedWaveProvider wavprov = new BufferedWaveProvider(new WaveFormat(44100, 2));
asioout.AudioAvailable += new EventHandler<AsioAudioAvailableEventArgs>(asio_DataAvailable);
asioout.InitRecordAndPlayback(wavprov, 2, 25);
asioout.Play();
...
void asio_DataAvailable(object sender, AsioAudioAvailableEventArgs e)
{
    Array.Copy(e.InputBuffers, e.OutputBuffers, e.InputBuffers.Length);
    e.WrittenToOutputBuffers = true;
}
This way I can't hear any output. I also tried it this way:
void asio_DataAvailable(object sender, AsioAudioAvailableEventArgs e)
{
    byte[] buf = new byte[e.SamplesPerBuffer];
    for (int i = 0; i < e.InputBuffers.Length; i++)
    {
        //Marshal.Copy(e.InputBuffers[i], e.OutputBuffers, 0, e.InputBuffers.Length);
        //also tried the line above, but this way I also couldn't hear anything
        Marshal.Copy(e.InputBuffers[i], buf, 0, e.SamplesPerBuffer);
        Marshal.Copy(buf, 0, e.OutputBuffers[i], e.SamplesPerBuffer);
    }
    e.WrittenToOutputBuffers = true;
}
This way I can hear sound at the volume of my input, but it's very distorted.
What am I doing wrong here?
PS: I know "How to record and playback...." exists, but I couldn't really get a complete answer from that thread, just the idea to try Marshal.Copy.
Your second attempt is more correct than the first: each input buffer must be copied separately. However, the final parameter of Marshal.Copy should be the number of bytes, not the number of samples, in the buffer. This will typically be 3 or 4 bytes per sample, depending on your ASIO bit depth.
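Put together, the handler could look something like this (a sketch assuming 4 bytes per sample; check the driver's actual sample format):
void asio_DataAvailable(object sender, AsioAudioAvailableEventArgs e)
{
    const int bytesPerSample = 4;   // assumption: 32-bit ASIO sample format
    byte[] buf = new byte[e.SamplesPerBuffer * bytesPerSample];
    for (int i = 0; i < e.InputBuffers.Length; i++)
    {
        // copy each input channel to the matching output channel, counting in bytes
        Marshal.Copy(e.InputBuffers[i], buf, 0, buf.Length);
        Marshal.Copy(buf, 0, e.OutputBuffers[i], buf.Length);
    }
    e.WrittenToOutputBuffers = true;
}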

Converting a WAV file to a spectrogram

Hi, I'm very new to this, so please bear with me. I am trying to convert a WAV file to a spectrogram but am not sure where to begin. I read something that says to read the PCM data (which I think is my WAV file), store it in an array in the WavReader class, then apply the FFT on it and render the result in the GUI. I'm currently using NAudio to achieve this but could not find anything that shows how to convert the WAV file to a spectrogram. Thanks.
Edit: I found out about converting PCM to FFT with NAudio, and I'm stuck.
using (var reader = new AudioFileReader("test1.wav"))
{
    // test1.wav is my file to process
    // test0.wav is my temp file
    IWaveProvider stream16 = new WaveFloatTo16Provider(reader);
    using (WaveFileWriter converted = new WaveFileWriter("test0.wav", stream16.WaveFormat))
    {
        // buffer length needs to be a power of 2 for FFT to work nicely
        // however, make the buffer too long and pitches aren't detected fast enough
        // successful buffer sizes: 8192, 4096, 2048, 1024
        // (some pitch detection algorithms need at least 2048)
        byte[] buffer = new byte[8192];
        int bytesRead;
        do
        {
            bytesRead = stream16.Read(buffer, 0, buffer.Length);
            converted.WriteData(buffer, 0, bytesRead);
        } while (bytesRead != 0 && converted.Length < reader.Length);
    }
}
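From here, one way to get spectrogram columns is to skip the 16-bit conversion entirely and read float samples straight from the AudioFileReader into NAudio's built-in FFT (NAudio.Dsp.FastFourierTransform). This is only a sketch; the 1024-sample window and Hann windowing are my assumptions:
using System;
using NAudio.Dsp;    // Complex, FastFourierTransform
using NAudio.Wave;

const int fftLength = 1024;   // power of two, as the comment above notes
using (var reader = new AudioFileReader("test1.wav"))
{
    int channels = reader.WaveFormat.Channels;
    float[] samples = new float[fftLength * channels];
    var fftBuffer = new Complex[fftLength];

    while (reader.Read(samples, 0, samples.Length) == samples.Length)
    {
        for (int i = 0; i < fftLength; i++)
        {
            // average the channels to mono and apply a Hann window
            float mono = 0;
            for (int ch = 0; ch < channels; ch++)
                mono += samples[i * channels + ch];
            mono /= channels;
            fftBuffer[i].X = (float)(mono * FastFourierTransform.HannWindow(i, fftLength));
            fftBuffer[i].Y = 0;
        }
        FastFourierTransform.FFT(true, (int)Math.Log(fftLength, 2.0), fftBuffer);
        // the magnitudes sqrt(X*X + Y*Y) of bins 0 .. fftLength/2
        // form one column of the spectrogram
    }
}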
Edit : I would also like to know if it is possible to compare 2 spectrograms of 2 different files programmatically.
You could also use the BASS.NET library, which natively provides all these features and is free.
The Visuals.CreateSpectrum3DVoicePrint method does exactly that.
Feel free to ask for assistance if you're having a hard time using it.
EDIT: here's a quick and dirty sample:
public partial class Form1 : Form
{
    private int _handle;
    private int _pos;
    private BASSTimer _timer;
    private Visuals _visuals;

    public Form1()
    {
        InitializeComponent();
    }

    private void timer_Tick(object sender, EventArgs e)
    {
        bool spectrum3DVoicePrint = _visuals.CreateSpectrum3DVoicePrint(_handle, pictureBox1.CreateGraphics(),
                                                                        pictureBox1.Bounds, Color.Cyan, Color.Green,
                                                                        _pos, false, true);
        _pos++;
        if (_pos >= pictureBox1.Width)
        {
            _pos = 0;
        }
    }

    private void Form1_Load(object sender, EventArgs e)
    {
        string file = "..\\..\\mysong.mp3";
        if (Bass.BASS_Init(-1, 44100, BASSInit.BASS_DEVICE_DEFAULT, Handle))
        {
            _handle = Bass.BASS_StreamCreateFile(file, 0, 0, BASSFlag.BASS_DEFAULT);
            if (Bass.BASS_ChannelPlay(_handle, false))
            {
                _visuals = new Visuals();
                _timer = new BASSTimer((int)(1.0d / 10 * 1000));   // tick 10 times per second
                _timer.Tick += timer_Tick;
                _timer.Start();
            }
        }
    }
}
EDIT 2
You can provide a file name, but you can also provide your own audio data using the other overload that accepts an IntPtr, or use Bass.BASS_StreamCreatePush with Bass.BASS_StreamPutData.
Regarding comparing spectrograms, you could do the following:
- Resize the image to a smaller size and reduce information by dithering it to 8-bit (with a good algorithm, however)
- Compare the two images
However, for comparing audio data I'd strongly suggest you use fingerprints; that does roughly the same thing but is much more robust than my suggestion.
Here's a fingerprinting library that is free to use:
http://www.codeproject.com/Articles/206507/Duplicates-detector-via-audio-fingerprinting
Not entirely sure it would work for small samples, though.
EDIT 3
I'm afraid I can't find the link where I read that, but that is what they do: reduce the data and compare the resulting images. The Echo Nest fingerprint post shows an example of such a reduced representation (from http://blog.echonest.com/post/545323349/the-echo-nest-musical-fingerprint-enmfp); a lower resolution can give better yields.
Now a very basic explanation of the process: take a comparison source A and a comparison source B (B being A with one region changed), then produce the comparison result by diffing the two (done with Paint.NET by adding the images as layers and setting the second layer's blending to Difference instead of Normal). [The A, B and difference images from the original answer are omitted here.]
If the fingerprints were to be identical the resulting image would be completely black.
And by reducing the data to an 8-bit image you are easing the comparison process, but keep in mind you will need a good dithering algorithm.
This one is quite good:
http://www.codeproject.com/Articles/66341/A-Simple-Yet-Quite-Powerful-Palette-Quantizer-in-C
Well, it's not on par with Photoshop's or Hypersnap's (which IMO is exceptional), but it might be enough for the task.
And avoid at all costs Floyd–Steinberg dithering or anything that does error diffusion.
Here are some attempts at creating dithering algorithms: http://bisqwit.iki.fi/story/howto/dither/jy/
Take this with caution as I'm not an expert in the field, but that's roughly how it's done.
Go to https://dsp.stackexchange.com/ and ask a few questions there; you might get useful hints on achieving this.
