NAudio: Using MixingSampleProvider correctly with VolumeSampleProvider


I have been using NAudio with the "Fire and Forget Audio Playback with NAudio" tutorial (thank you Mark for this awesome utility!) as written here:
http://mark-dot-net.blogspot.nl/2014/02/fire-and-forget-audio-playback-with.html
I managed to add a VolumeSampleProvider to it, using the MixingSampleProvider as input. However, when I now play two sounds right after each other, the first sound always gets the volume of the second as well, even though the first is already playing.
So my question is: How do I add sounds with an individual volume per sound?
This is what I used:
mixer = new MixingSampleProvider(waveformat);
mixer.ReadFully = true;
volumeProvider = new VolumeSampleProvider(mixer);
panProvider = new PanningSampleProvider(volumeProvider);
outputDevice.Init(panProvider);
outputDevice.Play();

I realized (thanks to itsmatt) that the only way to make this work is to leave the mixer alone and adjust the panning and volume of each CachedSound individually, before adding it to the mixer. Therefore I needed to rewrite the CachedSoundSampleProvider to take volume and pan as extra constructor parameters.
This is the new constructor:
public CachedSoundSampleProvider(CachedSound cachedSound, float volume = 1, float pan = 0)
{
    this.cachedSound = cachedSound;
    LeftVolume = volume * (0.5f - pan / 2);
    RightVolume = volume * (0.5f + pan / 2);
}
And this is the new Read() function:
public int Read(float[] buffer, int offset, int count)
{
    long availableSamples = cachedSound.AudioData.Length - position;
    long samplesToCopy = Math.Min(availableSamples, count);
    int destOffset = offset;
    for (int sourceSample = 0; sourceSample < samplesToCopy; sourceSample += 2)
    {
        float outL = cachedSound.AudioData[position + sourceSample + 0];
        float outR = cachedSound.AudioData[position + sourceSample + 1];
        buffer[destOffset + 0] = outL * LeftVolume;
        buffer[destOffset + 1] = outR * RightVolume;
        destOffset += 2;
    }
    position += samplesToCopy;
    return (int)samplesToCopy;
}

I'm not 100% certain of what you are asking and I don't know if you solved this already but here's my take on this.
ISampleProvider objects play the "pass the buck" game with their source ISampleProvider via the Read() method. Eventually, something does the actual reading of audio data. Each ISampleProvider class along the way does whatever it does to the samples.
MixingSampleProvider, for instance, takes N audio sources and mixes them. When Read() is called, it iterates over the audio sources and reads count samples from each.
Passing the mixer to a VolumeSampleProvider handles all the samples (from those various sources) as a group. It says:
buffer[offset+n] *= volume;
That adjusts the samples across the board, so every sample in the buffer is scaled by the volume multiplier.
The PanningSampleProvider just applies a multiplier to each channel of the stereo audio and adjusts the samples accordingly, doing the same sort of thing as the VolumeSampleProvider.
If you want to individually handle audio source volumes, you need to handle that upstream of the MixingSampleProvider. Essentially, the things that you pass to the MixingSampleProvider need to be able to have their volume adjusted independently.
If you passed a bunch of SampleChannel objects to your MixingSampleProvider, you could accomplish independent volume adjustment. The SampleChannel class incorporates a VolumeSampleProvider object and provides a Volume property that allows one to set the volume on that VolumeSampleProvider object.
SampleChannel also incorporates a MeteringSampleProvider that provides reporting of the maximum sample value during a given period. It raises an event that gives you an array of those values, one per channel.
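To make the "adjust volume upstream of the mixer" idea concrete, here is a minimal sketch that runs without NAudio. ISimpleSampleSource, ConstantSource, GainSource and SimpleMixer are hypothetical stand-ins written for this illustration (they only imitate the roles of ISampleProvider, VolumeSampleProvider and MixingSampleProvider); they are not NAudio's own classes.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for NAudio's ISampleProvider, reduced to Read() so the sketch runs without NAudio.
public interface ISimpleSampleSource
{
    int Read(float[] buffer, int offset, int count);
}

// Hypothetical test source: emits the same sample value forever.
public sealed class ConstantSource : ISimpleSampleSource
{
    private readonly float value;
    public ConstantSource(float value) { this.value = value; }

    public int Read(float[] buffer, int offset, int count)
    {
        for (int n = 0; n < count; n++) buffer[offset + n] = value;
        return count;
    }
}

// Scales a single source, in the role VolumeSampleProvider plays for its input.
public sealed class GainSource : ISimpleSampleSource
{
    private readonly ISimpleSampleSource source;
    public float Volume { get; set; } = 1f;
    public GainSource(ISimpleSampleSource source) { this.source = source; }

    public int Read(float[] buffer, int offset, int count)
    {
        int read = source.Read(buffer, offset, count);
        for (int n = 0; n < read; n++) buffer[offset + n] *= Volume;
        return read;
    }
}

// Sums all inputs sample by sample, in the role MixingSampleProvider plays.
public sealed class SimpleMixer : ISimpleSampleSource
{
    private readonly List<ISimpleSampleSource> inputs = new List<ISimpleSampleSource>();
    public void AddInput(ISimpleSampleSource input) { inputs.Add(input); }

    public int Read(float[] buffer, int offset, int count)
    {
        Array.Clear(buffer, offset, count);
        var scratch = new float[count];
        foreach (var input in inputs)
        {
            int read = input.Read(scratch, 0, count);
            for (int n = 0; n < read; n++) buffer[offset + n] += scratch[n];
        }
        return count;
    }
}

public static class PerSourceVolumeDemo
{
    // Mixes two constant sources of value 1.0, each with its own volume.
    public static float MixedSample(float volumeA, float volumeB)
    {
        var mixer = new SimpleMixer();
        mixer.AddInput(new GainSource(new ConstantSource(1f)) { Volume = volumeA });
        mixer.AddInput(new GainSource(new ConstantSource(1f)) { Volume = volumeB });
        var buffer = new float[4];
        mixer.Read(buffer, 0, buffer.Length);
        return buffer[0];
    }
}
```

Because each input carries its own gain, changing the volume of one sound leaves the others untouched. With NAudio itself, the corresponding move would be to wrap each input in its own VolumeSampleProvider (or SampleChannel) before handing it to the mixer's AddMixerInput.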

Related

MediaStreamSource.SampleRequested event is not called (in UWP) when video encoding is created with specific height/width

I'm using MediaStreamSource to wrap video frames encoded in I420 format (code below). Whenever I get frames of certain sizes (e.g. height 1037 and width 1932), the Media Foundation playback pipeline does not raise the "SampleRequested" event to start playing the video frames. But when the frames received are close to height 600 and width 1080, for instance, the event is raised. I'm not sure what could be preventing the Media Foundation pipeline from raising the "SampleRequested" event. From the logs, I could see:
No suitable transform was found to encode or decode the content.
(Exception from HRESULT: 0xC00D5212).
But I'm not sure how to rectify this. Any pointers?
private MediaStreamSource CreateI420VideoStreamSource(uint width, uint height, int framerate)
{
    if (width == 0)
    {
        throw new ArgumentException("Invalid zero width for video.", "width");
    }
    if (height == 0)
    {
        throw new ArgumentException("Invalid zero height for video.", "height");
    }
    // Note: IYUV and I420 have same memory layout (though different FOURCC)
    // https://learn.microsoft.com/en-us/windows/desktop/medfound/video-subtype-guids
    var videoProperties = VideoEncodingProperties.CreateUncompressed(MediaEncodingSubtypes.Iyuv,
        width, height);
    var videoStreamDesc = new Windows.Media.Core.VideoStreamDescriptor(videoProperties);
    videoStreamDesc.EncodingProperties.FrameRate.Numerator = (uint)framerate;
    videoStreamDesc.EncodingProperties.FrameRate.Denominator = 1;
    videoStreamDesc.EncodingProperties.Height = height;
    videoStreamDesc.EncodingProperties.Width = width;
    // Bitrate in bits per second : framerate * frame pixel size * I420=12bpp
    //videoStreamDesc.EncodingProperties.Bitrate = ((uint)framerate * width * height * 12);
    var videoStreamSource = new Windows.Media.Core.MediaStreamSource(videoStreamDesc);
    videoStreamSource.BufferTime = new TimeSpan(0,0,0,0,0);
    videoStreamSource.SampleRequested += OnMediaStreamSourceRequested;
    videoStreamSource.IsLive = true; // Enables optimizations for live sources
    videoStreamSource.CanSeek = false; // Cannot seek live WebRTC video stream
    return videoStreamSource;
}

Sorry for the delay. I got some more information about your issue. First of all, the error message is correct: the codec (IYUV here) does not support the height/width that you are trying to use. You could refer to this document to enumerate the codecs and find all the supported values: Encoding video and audio with Media Services.
The document also mentions that if you are creating custom presets, all values for height and width on AVC content must be a multiple of four.
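As a rough illustration of checking frame dimensions against an alignment rule, here is a small sketch. FrameSizeAlignment and its methods are hypothetical helpers, and which alignment the IYUV path actually requires is an assumption to verify against the supported values; the answer above only states the multiple-of-four rule for AVC content.

```csharp
using System;

public static class FrameSizeAlignment
{
    // True when value is a whole multiple of alignment.
    public static bool IsAligned(uint value, uint alignment)
    {
        return value % alignment == 0;
    }

    // Rounds value down to the nearest multiple of alignment.
    public static uint AlignDown(uint value, uint alignment)
    {
        return value - (value % alignment);
    }
}
```

For the sizes in the question, a width of 1932 is a multiple of four while a height of 1037 is not; AlignDown(1037, 4) gives 1036. Whether cropping or padding the frame to such a size is acceptable depends on the application.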

How to avoid audio glitches whilst switching volume of an Audiosource each Update()-Frame?

I'm developing an audio tool that plays 64 AudioSources simultaneously. To do so I created four arrays containing 16 AudioSources each. Each array of AudioSources is routed to its own Mixer. Furthermore, two mixers output to the left channel and two to the right. My DSP Buffer Size is set to Best Performance, meaning 1024 samples, and there are enough real / virtual voices available.
In the beginning, 60 AudioSources are set to Volume = 0, while four of them are running with Volume = 0.5. Each Update() frame, I set the volume of those playing at 0.5 to zero and set four new AudioSources that were zero before to 0.5.
Something like this:
void SwitchSources()
{
    noseposInd++;
    if (noseposInd > 15) noseposInd = 0;
    audioSources_Lm[noseposIndTemp].volume = 0.0f;
    audioSources_Ln[noseposIndTemp].volume = 0.0f;
    audioSources_Rm[noseposIndTemp].volume = 0.0f;
    audioSources_Rn[noseposIndTemp].volume = 0.0f;
    audioSources_Lm[noseposInd].volume = 0.5f;
    audioSources_Ln[noseposInd].volume = 0.5f;
    audioSources_Rm[noseposInd].volume = 0.5f;
    audioSources_Rn[noseposInd].volume = 0.5f;
    noseposIndTemp = noseposInd;
}
For test purposes, I loaded a rectangle signal with f = 2Hz (results in an audible click per second) into each Audiosource. Recording my output with Audacity results in something that can be seen on the attached photo:
It seems that the buffer of one of the four signals is not written to the output, because the amplitude of a positive or negative pulse is just half. The width of the "notches" is exactly one block length, meaning 1024 samples at a sample rate of 44.1 kHz, so there is no output for about 23 ms.
Increasing the rate of changing the volume also increases the occurrences of notches / timeouts, or whatever this should be called. Has anyone had the same problem, or can anyone help out with some knowledge about how the Update() method and the audio block writing of the mixers interfere?
Thanks in advance!

Get Basic audio spectrum data in unity

I want to visualize whether an audio clip has sound or not. The microphone and the AudioSource are working correctly, but I am stuck on the visualizing part. I have a hard time understanding the official documentation and I want a solution.
I tried the following code:
void Update()
{
    AnalyzeSound();
    text1.text = "sound!\n" + " rmsValue : " + rmsValue;
}

void AnalyzeSound()
{
    audio.GetOutputData(samples, 0);
    // Compute the RMS: accumulate the squared samples, then take the root of the mean.
    float sum = 0;
    for (int i = 0; i < SAMPLE_SIZE; i++)
    {
        sum += samples[i] * samples[i];
    }
    rmsValue = Mathf.Sqrt(sum / SAMPLE_SIZE);
    // Get the dB value relative to a reference level of 0.1.
    dbValue = 20 * Mathf.Log10(rmsValue / 0.1f);
}
Can I take rmsValue as the input level of the sound on the microphone, or should I take dbValue? What should the threshold value be?
In a few words: when can I say the microphone has sound?

There is no hard and fast definition that would separate noise from silence in all cases. It really depends on how loud the background noise is. Compare, for example, silence recorded in an anechoic chamber vs silence recorded next to an HVAC system. The easiest thing to try is to experiment with different dB threshold values below which you consider the signal as noise and above which it is considered signal. Then adjust the threshold value up or down to suit your needs. Depending on the nature of the signal (e.g. music vs. speech) you could look into other techniques such as Voice Activity Detection (https://en.wikipedia.org/wiki/Voice_activity_detection) or a convolutional neural network to segment speech and music.
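The threshold approach described above could be sketched as follows, in plain C# with no Unity dependency. LevelMeter and its method names are hypothetical, the dB value is taken relative to full scale (1.0) rather than the 0.1 reference used in the question, and any concrete threshold such as -40 dB is only a starting point to tune by experiment.

```csharp
using System;

public static class LevelMeter
{
    // Root-mean-square of the samples (0 for an empty buffer).
    public static double Rms(float[] samples)
    {
        if (samples.Length == 0) return 0.0;
        double sum = 0.0;
        foreach (float s in samples) sum += (double)s * s;
        return Math.Sqrt(sum / samples.Length);
    }

    // Level in dB relative to full scale (1.0); digital silence maps to negative infinity.
    public static double ToDecibels(double rms)
    {
        return rms > 0.0 ? 20.0 * Math.Log10(rms) : double.NegativeInfinity;
    }

    // thresholdDb is a tuning value chosen by experiment for the recording environment.
    public static bool HasSound(float[] samples, double thresholdDb)
    {
        return ToDecibels(Rms(samples)) > thresholdDb;
    }
}
```

A call such as LevelMeter.HasSound(samples, -40.0) would then be made once per analysis block; raising or lowering the threshold trades missed quiet sounds against false triggers from background noise.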

Using Naudio WaveIn for both performing FFT transformation and writing to disk in real-time (NOT ASIO)

I am currently creating a WinForms application for Windows 8.1. I have been able to perform an FFT on the input data from the device's microphone using ASIO Out; however, to be able to use ASIO on my machine I needed to download ASIO4ALL.
This is causing a huge amount of feedback in the microphone and is resulting in very inaccurate frequency readings (to make sure it was the sound itself, I wrote a copy to disk to play back).
So to get around this I have been trying to adapt my code to work with NAudio's WaveIn class; however, this is returning either no data or NaN for the FFT algorithm (although I can save a recording to disk which plays back with no issues).
I've been trying to fix this for some time now and am sure it is just a silly mistake somewhere. Any help would be greatly appreciated!
Below is the code for the "OnDataAvailable" event (where I'm 99% sure I am going wrong):
void OnDataAvailable(object sender, WaveInEventArgs e)
{
    if (this.InvokeRequired)
    {
        this.BeginInvoke(new EventHandler<WaveInEventArgs>(OnDataAvailable), sender, e);
    }
    else
    {
        byte[] buffer = e.Buffer;
        int bytesRecorded = e.BytesRecorded;
        int bufferIncrement = waveIn.WaveFormat.BlockAlign;
        for (int index = 0; index < bytesRecorded; index += bufferIncrement)
        {
            float sample32 = BitConverter.ToSingle(buffer, index);
            sampleAggregator.Add(sample32);
        }
        if (waveFile != null)
        {
            waveFile.Write(e.Buffer, 0, e.BytesRecorded);
            waveFile.Flush();
        }
    }
}
If any more details and/or code is required please let me know.
waveFile: Name of the file writer
e.Buffer: The buffer containing the recorded data
e.BytesRecorded: The total number of bytes recorded
For reference below is the working code when using the ASIO class:
void asio_DataAvailable(object sender, AsioAudioAvailableEventArgs e)
{
    byte[] buf = new byte[e.SamplesPerBuffer * 4];
    for (int i = 0; i < e.InputBuffers.Length; i++)
    {
        Marshal.Copy(e.InputBuffers[i], buf, 0, e.SamplesPerBuffer * 4);
    }
    for (int i = 0; i < e.SamplesPerBuffer * 4; i++)
    {
        float sample32 = Convert.ToSingle(buf[i]);
        sampleAggregator.Add(sample32);
    }
}
EDIT: The samples being returned are now accurate after changing the conversion to Int16 as per the advice on this page; some other issues in my code had prevented actual results from being returned originally.
However, the file being written to disk is very choppy. I'm sure this is a problem with my laptop and the number of processes it is trying to run. Could anyone please advise a way around this issue?

In the NAudio WPF demo project there is an example of calculating FFTs while playback is happening with a class called SampleAggregator, that stores up blocks of 1024 samples and then performs FFTs on them.
It looks like you are trying to do something similar to this. I suspect the problem is that you are getting 16 bit samples, not 32 bit. Try using BitConverter.ToInt16 on every pair of bytes.
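A minimal sketch of that conversion, assuming the recorded buffer holds 16-bit PCM: BitConverter.ToInt16 is the .NET method that reads a 16-bit value from two bytes. Pcm16Converter is a hypothetical helper name, and dividing by 32768 to scale into the float range is a common convention rather than something the answer specifies.

```csharp
using System;

public static class Pcm16Converter
{
    // Converts 16-bit PCM bytes to floats in the range [-1, 1).
    // BitConverter follows the host byte order, which is little-endian on typical
    // Windows machines and matches the byte order of PCM wave data.
    public static float[] ToFloats(byte[] buffer, int bytesRecorded)
    {
        int sampleCount = bytesRecorded / 2; // two bytes per 16-bit sample
        var samples = new float[sampleCount];
        for (int n = 0; n < sampleCount; n++)
        {
            short sample16 = BitConverter.ToInt16(buffer, n * 2);
            samples[n] = sample16 / 32768f;
        }
        return samples;
    }
}
```

In an OnDataAvailable handler this would be called with e.Buffer and e.BytesRecorded, and each resulting float passed on to the FFT stage. For stereo input the samples come back interleaved (left, right, left, right), so an FFT of one channel would take every second value.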

mWaveInDevice = new WaveIn();
mWaveInDevice.WaveFormat = WaveFormat.CreateIeeeFloatWaveFormat(44100, 2);
Set the WaveFormat using CreateIeeeFloatWaveFormat, and then you will get the right values after the FFT.

Applying a linear fade at a specific position using NAudio

I am making use of NAudio in a C# program I've written.
I want to apply a linear fade at a certain position within a piece of audio I'm working with.
In the NAudio example project is a file called FadeInOutSampleProvider.cs which has BeginFadeIn(double fadeDurationInMilliseconds) and BeginFadeOut(double fadeDurationInMilliseconds) methods.
I've reworked these methods to
BeginFadeIn(double fadeDurationInMilliseconds, double beginFadeAtMilliseconds)
and
BeginFadeOut(double fadeDurationInMilliseconds, double beginFadeAtMilliseconds)
However I'm having difficulty implementing the interval logic for these changes to work.
My first thought was to introduce code in the Read() method. When called it would divide the number of bytes being requested by the sample rate, which would give the number of seconds of audio requested.
I could then keep track of this and, when the correct amount of audio data had been read, allow the fade to be applied.
However I'm not getting the numbers in my calculations I would expect to see. I'm sure there's a better way to approach this.
Any help would be very much appreciated.

It sounds like you are working along the right lines. As you say the amount of audio being requested can be calculated by dividing number of samples requested by the sample rate. But you must also take into account channels as well. In a stereo file there are twice as many samples per second as the sample rate.
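The arithmetic can be written out as a small sketch. AudioTimeMath and its method names are hypothetical; "samples" here means interleaved values across all channels, which is what the count argument of Read refers to.

```csharp
using System;

public static class AudioTimeMath
{
    // Number of interleaved samples (all channels) covering the given duration.
    // Frames are computed first so the result is always a whole number of frames.
    public static int MillisecondsToSamples(double milliseconds, int sampleRate, int channels)
    {
        int frames = (int)(milliseconds * sampleRate / 1000.0);
        return frames * channels;
    }

    // Duration represented by a Read() request for count interleaved samples.
    public static double SamplesToMilliseconds(int count, int sampleRate, int channels)
    {
        return 1000.0 * count / ((double)sampleRate * channels);
    }
}
```

For 44.1 kHz stereo, one second of audio is 88200 interleaved samples, so a Read call asking for 88200 samples has requested 1000 ms, not 2000 ms. Forgetting the channel count makes every position come out off by that factor.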
I've put a very basic code sample of a delayed fade out in a GitHub gist here. There are improvements that could be made, such as allowing the fade-out to begin part-way through the audio returned from a call to Read, but hopefully this gives you a rough idea of how it can be achieved with a few small modifications to FadeInOutSampleProvider.
The main changes are an extra parameter to BeginFadeOut, that sets two new variables (fadeOutDelaySamples, fadeOutDelayPosition):
/// <summary>
/// Requests that a fade-out begins after a delay (counted from the next call to Read)
/// </summary>
/// <param name="fadeAfterMilliseconds">Delay before the fade-out starts, in milliseconds</param>
/// <param name="fadeDurationInMilliseconds">Duration of fade in milliseconds</param>
public void BeginFadeOut(double fadeAfterMilliseconds, double fadeDurationInMilliseconds)
{
    lock (lockObject)
    {
        fadeSamplePosition = 0;
        fadeSampleCount = (int)((fadeDurationInMilliseconds * source.WaveFormat.SampleRate) / 1000);
        fadeOutDelaySamples = (int)((fadeAfterMilliseconds * source.WaveFormat.SampleRate) / 1000);
        fadeOutDelayPosition = 0;
        // fadeState is now switched to FadingOut in Read, once the delay has elapsed
        //fadeState = FadeState.FadingOut;
    }
}
Then in the Read method we keep track of how far into the delay we are, and once the delay has elapsed, we start the fade-out:
public int Read(float[] buffer, int offset, int count)
{
    int sourceSamplesRead = source.Read(buffer, offset, count);
    lock (lockObject)
    {
        if (fadeOutDelaySamples > 0)
        {
            fadeOutDelayPosition += sourceSamplesRead / WaveFormat.Channels;
            if (fadeOutDelayPosition >= fadeOutDelaySamples)
            {
                fadeOutDelaySamples = 0;
                fadeState = FadeState.FadingOut;
            }
        }
        if (fadeState == FadeState.FadingIn)
        {
            FadeIn(buffer, offset, sourceSamplesRead);
        }
        else if (fadeState == FadeState.FadingOut)
        {
            FadeOut(buffer, offset, sourceSamplesRead);
        }
        else if (fadeState == FadeState.Silence)
        {
            ClearBuffer(buffer, offset, count);
        }
    }
    return sourceSamplesRead;
}
