I need to normalize a playing audio stream using BASS. For this, I'm following these steps:
Play the stream
Create another stream from the file, and determine the peak value in a background worker
Apply DSP_Gain with the appropriate gain value to the stream that is playing.
I realize the normalization will only occur after the worker is done with the task, which can seem ugly, but that isn't the point.
The trouble is that the peak value I determine is an integer between 0 and 32768 (the bigger the value, the louder the sound), whereas DSP_Gain has two properties for setting the amplification, neither of which is an integer: Gain, a double between 0 and 1024, and Gain_dBV, a double between -infinity and 60. Passing the peak value directly as the factor resulted in enormous clipping in the playing stream. My question is: how do I translate this peak value into the correct parameter for DSP_Gain? Below is the code for getting the peak value:
int strm = Bass.BASS_StreamCreateFile(filename, 0, 0, BASSFlag.BASS_STREAM_DECODE);
// decode stream used only for determining the peak value
int peak = 0; // this value will be between 0 and 32768
while (System.Convert.ToBoolean(Bass.BASS_ChannelIsActive(strm)))
{
    // measures the peak of a 20 ms frame and advances; loops until the stream is over
    int level = Bass.BASS_ChannelGetLevel(strm);
    int left = Utils.LowWord32(level);   // the left channel level
    int right = Utils.HighWord32(level); // the right channel level
    if (peak < left) peak = left;
    if (peak < right) peak = right;
}
Applying the DSP_Gain:
DSPGain = new DSP_Gain();
DSPGain.ChannelHandle = stream; //this stream is the already playing one
DSPGain.Gain = *SOME VALUE*
DSPGain.Start();
Just reading the links you posted, it seems Gain is a multiplying factor applied to the signal: values below 1.0 reduce the level, values above 1.0 increase it. So you need to calculate how much you want to change the level by. Say you want a maximum peak value of 30000 and your measured peak is 32000; then your gain would be 30000 / 32000 = 0.9375.
Gain_dBV is the same gain ratio expressed in decibels, typically calculated as either 10 * log10(power out / power in) or 20 * log10(peak-to-peak volts out / peak-to-peak volts in). The dB value is converted back to the linear gain before being applied to the signal as above. In the example, the gain in dB would be 20 * log10(0.9375) ≈ -0.56.
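Putting that together with the snippets in the question, a minimal sketch (assuming peak is the 0..32768 value from the measuring loop and a target peak you choose yourself):
double targetPeak = 30000.0; // whatever maximum peak you want after normalization
DSPGain = new DSP_Gain();
DSPGain.ChannelHandle = stream; // the stream that is already playing
DSPGain.Gain = targetPeak / peak; // linear factor, e.g. 30000 / 32000 = 0.9375
// or, equivalently, in decibels:
// DSPGain.Gain_dBV = 20.0 * Math.Log10(targetPeak / peak);
DSPGain.Start();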
I pass slices of a buffer to my Goertzel filter. The buffer contains a frequency of 18 kHz and silence; the duration of each is 75 ms. The sampling rate is 44.1 kHz.
It works like FSK.
I'm trying to determine the detection threshold of my Goertzel filter at 18 kHz.
I thought of measuring the average energy of a buffer with the following formula:
Energy = (1/N) * Sum(|x[n]|) // where N is the total number of samples in the slice x
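In C#, that formula is simply (a minimal sketch):
static float AverageEnergy(float[] x)
{
    // Energy = (1/N) * Sum(|x[n]|)
    float sum = 0f;
    for (int n = 0; n < x.Length; n++)
        sum += Math.Abs(x[n]);
    return sum / x.Length;
}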
Now, my problem is how that energy relates to the returned value of the Goertzel filter. I have noticed that the magnitude depends strongly on the frequency the filter is tuned to.
For example, when my Goertzel filter is tuned to detect 13 kHz, I get values of roughly 50 to 100;
for 18 kHz I get even smaller numbers: 0.00001 to 0.005.
The array I pass is a float[] and all values are in the range ±1.
Is there any good solution for that?
Thanks
I'm trying to calculate the loudest peak in dB of a 16-bit wav file. In my current situation I don't need an RMS value, just the dB value of the loudest peak in the file, because one requirement is to detect wav files that have errors in them, for example a loudest peak at +2 dB.
I tried it like in this thread: get peak out of wave
Here is my Code:
var streamBuffer = File.ReadAllBytes(@"C:\peakTest.wav");
double peak = 0;
for (var i = 44; i < streamBuffer.Length; i += 2) // start after the 44-byte wav header
{
    var sample = BitConverter.ToInt16(streamBuffer, i);
    if (sample > peak)
        peak = sample;
    else if (sample < -peak)
        peak = -sample;
}
var db = 20 * Math.Log10(peak / short.MaxValue);
I manually altered this file so there is a peak in it at +2 dB. The value of the peak variable is now 32768, so the formula gives me a dB value of 0.0.
I can't get a positive value out of it, because 32768 is simply the maximum a short can represent.
So my question is: how can I get the "correct" peak value of +2 dB?
Your requirement is fatally flawed: the definition of clipping, in both analogue and digital systems, is a signal that exceeds the maximum amplitude the channel is capable of conveying. In either case, once the signal is clipped, it's too late to recover all of the information that was previously in it.
In the case of a digital system there are two possible outcomes: either the signal saturates (in which case you might see a number of consecutive samples at peak amplitude) or the signed int wraps (in which case very large positive values become very large negative ones, or vice versa).
On the subject of detecting clipping after the event, a number of approaches are possible:
Look for consecutive runs of samples at the maximum and minimum sample values
Look for large discontinuities in signal amplitude between samples; real signals don't tend to have large sample-to-sample amplitude differences
Perform frequency-domain analysis on the signal; clipping introduces high-frequency components that will be apparent in a spectrogram
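For instance, the first approach could be sketched like this (a minimal sketch; the 16-bit short[] input and the run-length threshold of 4 are assumptions):
static bool LooksClipped(short[] samples, int runThreshold = 4)
{
    int run = 0;
    foreach (var s in samples)
    {
        if (s == short.MaxValue || s == short.MinValue)
        {
            // another full-scale sample: extend the current run
            if (++run >= runThreshold)
                return true; // several consecutive full-scale samples: likely clipped
        }
        else
        {
            run = 0; // run broken by a normal sample
        }
    }
    return false;
}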
I have written a C# program that captures video from a specialized camera through the camera manufacturer's proprietary API. I am able to write captured frames to disk through a FileStream object, but I am at the mercy of the camera and disk I/O when it comes to the framerate.
What is the best way to make sure I write to disk at the required framerate? Is there a certain algorithm available that would compute the real-time average framerate and then add/discard frames to maintain a certain desired framerate?
It's difficult to tell much because of the lack of information.
What's the format? Is there compression?
How is the camera API sending the frames? Are they timed, so that the camera sends the frame rate you asked for? If so, you are really dealing with I/O speed.
If you need high quality and are writing without compression, you could experiment with some lossless compression algorithms to balance processing against drive I/O. You could gain some speed if the bottleneck is drive I/O.
For dropping frames, there are ways to implement that. Normally frames carry a timestamp; you can compare timestamps and discard frames that arrive too soon after the one you last kept.
Let's say you want 60 fps, so the spacing between frames is 1000/60 ≈ 16 ms; if the frame you get has a timestamp only 13 ms after the last one, you can discard it and not write it to disk.
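A minimal sketch of that check (OnFrameCaptured and WriteToDisk are hypothetical stand-ins for the camera callback and your FileStream write):
TimeSpan minInterval = TimeSpan.FromMilliseconds(1000.0 / 60.0); // target 60 fps
DateTime lastWritten = DateTime.MinValue;

void OnFrameCaptured(byte[] frameData, DateTime timestamp)
{
    // drop frames that arrive too soon after the last one we kept
    if (timestamp - lastWritten < minInterval)
        return;
    WriteToDisk(frameData);
    lastWritten = timestamp;
}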
In a perfect world, you would check the first second, which gives you the number of frames per second your system supports.
Say your camera is capturing 60 fps but your computer can really only handle 45 fps. What you have to do is skip a total of 15 frames per second in order to keep up. Up to here, that's easy enough.
The math in this basic case is:
60 / 15 = 4
So skip one frame every four incoming frames like so (keep frames marked with an X, skip the others):
000000000011111111112222222222333333333344444444445555555555
012345678901234567890123456789012345678901234567890123456789
XXX XXX XXX XXX XXX XXX XXX XXX XXX XXX XXX XXX XXX XXX XXX
Of course, it is likely that you will not get such a clear-cut case. The math remains the same; it's just that you'll end up skipping frames at what look like varying points in time:
// simple algorithm based on fps from source & destination,
// which fails to keep up long term
double camera_fps = 60.0;
double io_fps = camera_fps;
for(;;)
{
    double frames_step = camera_fps / io_fps;
    double start_time = time(); // assuming you get at least ms precision
    double frame = 0.0;
    for(int i(0); i < camera_fps; ++i)
    {
        buffer frame_data = capture_frame();
        if(i == (int)frame) // <- part of the magic happens here
        {
            save_frame(frame_data);
        }
        frame += frames_step;
    }
    double end_time = time();
    double new_fps = camera_fps / (end_time - start_time);
    if(new_fps < io_fps)
    {
        io_fps = new_fps;
    }
}
As this algorithm shows, you want to adjust your fps as time goes by. During the first second it is likely to give you an invalid result, for all sorts of reasons. For example, writing to disk is likely to be buffered, so it goes really fast at first and it may look as if you could support 60 fps; later the fps will slow down and you may find that your maximum I/O speed is really 57 fps.
One issue with this basic algorithm is that it can easily reduce the number of frames to make the loop fit within 1 second, but it will only ever reduce the fps (io_fps is updated only when new_fps is smaller). If you find the correct number for io_fps, you're fine; if you went too far, you're dropping frames when you shouldn't, because once you are keeping up, (end_time - start_time) is exactly 1 second, new_fps equals camera_fps, and io_fps is never adjusted back up.
The solution to this issue is to time your save_frame() function: if the total time spent saving within the inner loop is less than 1 second, you can increase the number of frames you save. This works better if you can use two threads: one thread reads frames and pushes them onto an in-memory FIFO, and the other thread retrieves the frames from that FIFO. This means the time it takes to capture one frame doesn't affect (as much) the time it takes to save a frame.
bool stop = false;

// capture
for(;;)
{
    buffer frame_data = capture_frame();
    fifo.push(frame_data);
    if(stop)
    {
        break;
    }
}

// save to disk
double expected_time_to_save_one_frame = 1.0 / camera_fps;
double next_start_time = time();
for(;;)
{
    buffer frame_data = fifo.pop();
    double start_time = next_start_time;
    save_frame(frame_data);
    next_start_time = time();
    double end_time = time();
    if(end_time - start_time > expected_time_to_save_one_frame)
    {
        fifo.pop(); // saving overran the per-frame budget: skip one frame
    }
}
This is just pseudo code and I may have made a few mistakes. It also assumes that the gap between start_time and end_time is never more than one frame (i.e. if the capture is 60 fps and the I/O supports less than 30 fps, you would often have to skip two frames in a row).
For people who are compressing frames, keep in mind that the timing of one call to save_frame() will vary a lot: at times the frame can be compressed easily (no horizontal movement) and at times it's really slow. This is where such dynamism can help tremendously. Your disk I/O, assuming not much else happens while you are recording, should not vary much once you reach the maximum supported speed.
IMPORTANT NOTE: these algorithms assume that the camera fps is fixed; this is likely not quite true, so you can also time the camera and adjust the camera_fps parameter accordingly (which means the expected_time_to_save_one_frame variable can also vary over time).
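Since the question is C#, the two-thread FIFO above could be sketched with a BlockingCollection; CaptureFrame, SaveFrame, cameraFps and the queue bound of 120 frames are assumptions, not part of the original code:
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

var fifo = new BlockingCollection<byte[]>(boundedCapacity: 120);
var cts = new CancellationTokenSource();
double cameraFps = 60.0; // assumed; measure it if the camera is not fixed-rate

// capture thread: push frames as fast as the camera delivers them
var capture = Task.Run(() =>
{
    while (!cts.IsCancellationRequested)
        fifo.Add(CaptureFrame()); // hypothetical camera-API call
    fifo.CompleteAdding();
});

// save thread: drop the next frame whenever a save overruns its budget
double budget = 1.0 / cameraFps;
var sw = Stopwatch.StartNew();
foreach (var frame in fifo.GetConsumingEnumerable())
{
    double t0 = sw.Elapsed.TotalSeconds;
    SaveFrame(frame); // hypothetical FileStream write
    if (sw.Elapsed.TotalSeconds - t0 > budget)
        fifo.TryTake(out _); // saving took too long: skip one frame
}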
So I'm coding a synthesizer from scratch in C# using NAudio. I've gotten it to play different frequencies, which is cool, but I notice that the higher pitches are significantly louder than the lower pitches. Is that due to this effect:
http://en.wikipedia.org/wiki/Equal-loudness_contour
Or am I doing something wrong when I'm generating the sine wave? How would I implement an Equal-loudness contour curve if it is indeed necessary?
Thanks
My Code:
NAudio expects a buffer filled with floating point values in the range of -1 to +1 to represent the waveform.
Generating the sine wave:
buffer[n + offset] = (float)(Amplitude * Math.Sin(angle));
angle = (angle + angleIncrement) % (2 * Math.PI);
Setting a frequency:
public double Frequency
{
    set
    {
        angleIncrement = 2 * Math.PI * value / sampleRate;
    }
    get
    {
        return angleIncrement * sampleRate / 2 / Math.PI;
    }
}
Controlling the amplitude of the audio from your synthesizer based on equal-loudness contours is probably not what you want.
In theory, you would need to know the absolute level (SPL) produced by the speakers in order to choose the appropriate contour. In practice, a bigger issue arises when you extend your synthesizer to use complex waveforms instead of merely pure tones, possibly processed by filters etc. The equal-loudness contours are based on pure tones, and when you generate complex signals (i.e. containing many frequencies) you would instead need a loudness model to estimate the loudness of your synthesized sound.
I'm sampling a real-world sensor and I need to display its filtered value. The signal is sampled at a rate of 10 Hz, and between samples it could rise by as much as 80 per cent of the maximum range.
Earlier I used root mean square as a filter, just applying it to the last five values I logged. That wouldn't be good for this application, because I don't store unchanged values. In other words, I need to consider time in my filter...
I've read the DSP Guide, but I didn't get much out of it. Is there a tutorial aimed specifically at programmers rather than Mathcad engineers? Are there some simple code snippets that could help?
Update: After several spreadsheet tests I've taken the executive decision to log all samples, and apply a Butterworth filter.
You always need to store some values (but not necessarily all input values): a filter's current output depends on a number of input values and possibly some past output values.
The simplest filter would be a first-order Butterworth low-pass filter, which only requires you to store one past output value. The current output of the filter, y(n), is:
y(n) = x(n) - a1 * y(n-1)
where x(n) is the current input and y(n-1) is the previous output of the filter. a1 depends on the cut-off frequency and the sampling frequency; note that for a low-pass filter in this form, a1 is negative (between -1 and 0). The cut-off frequency must be less than 5 Hz (half the sampling frequency), sufficiently low to filter out the noise, but not so low that the output is unduly delayed with respect to the input. And of course not so low that the real signal is filtered out!
In code (mostly C#):
double a1 = -0.57; // example value only; must be between -1 and 0 for a low-pass
double lastY = 0.0;
while (true)
{
    double x = <get an input value>;
    double y = x - a1 * lastY;
    <Use y somehow>
    lastY = y;
}
Whether a first-order filter is sufficient depends on your requirements and the characteristics of the input signal (a higher-order filter may be able to suppress more of the noise, at the expense of a higher delay of the output signal). For higher-order filters, more values have to be stored and the code becomes a little more complicated: usually the values need to be shifted down in arrays, one array for past y values and one for past x values.
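As an illustration, a second-order section in that style might look like the following sketch (direct form I; the coefficients b0, b1, b2, a1 and a2 come from the filter design and are left as placeholders here):
class Biquad
{
    // y(n) = b0*x(n) + b1*x(n-1) + b2*x(n-2) - a1*y(n-1) - a2*y(n-2)
    private readonly double b0, b1, b2, a1, a2;
    private double x1, x2, y1, y2; // stored past inputs and outputs

    public Biquad(double b0, double b1, double b2, double a1, double a2)
    {
        this.b0 = b0; this.b1 = b1; this.b2 = b2;
        this.a1 = a1; this.a2 = a2;
    }

    public double Process(double x)
    {
        double y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
        x2 = x1; x1 = x; // shift past inputs down
        y2 = y1; y1 = y; // shift past outputs down
        return y;
    }
}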
In DSP, the term "filter" usually refers to the amplification or attenuation (i.e. "lowering") of frequency components within a continuous signal. One common way to do this is with the Fast Fourier Transform (FFT). The FFT starts with a signal recorded over a given length of time (the data are in what's called the "time domain") and transforms these values into what's called the "frequency domain", where the results indicate the strength of the signal in a series of frequency "bins" that range from 0 Hz up to half the sampling rate (5 Hz in your case, since you sample at 10 Hz). So, as a rough example, an FFT of one second's worth of your data (10 samples) would tell you the strength of your signal at roughly 0-1 Hz, 1-2 Hz, 2-3 Hz, 3-4 Hz, and 4-5 Hz.
To "filter" these data, you would increase or decrease any or all of these signal-strength values, and then perform an inverse FFT to transform them back into a time-domain signal. So, for example, say you wanted to low-pass filter the transformed data with a cut-off frequency of 3 Hz (in other words, you want to remove any frequency components in your signal above 3 Hz): you would programmatically set the values of the bins above 3 Hz to zero, and then do the inverse FFT.
I mention all this because it doesn't sound like "filtering" is really what you want to do here. I think you just want to display the current value of your sensor, but smoothed so that it doesn't respond excessively to transient fluctuations in the measured value. The best way to do that is with a simple running average, possibly with the more recent values weighted more heavily than older ones.
A running average is very easy to program (much easier than an FFT, trust me) by storing a collection of the most recent measurements. You mention that your app only stores values that differ from the prior value; assuming you also store the time at which each value is recorded, it should be easy for your running-average code to fill in the "missing values" using the recorded prior values.
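A sketch of that idea, using an exponentially weighted average whose weight accounts for the gap between recorded timestamps (the class name and the time-constant parameter are illustrative):
class TimedAverage
{
    private readonly double tau; // time constant in seconds: larger = smoother
    private double average;
    private DateTime lastTime;
    private bool hasValue;

    public TimedAverage(double tauSeconds) { tau = tauSeconds; }

    public double Update(double value, DateTime time)
    {
        if (!hasValue)
        {
            average = value; // first sample: start from it
            hasValue = true;
        }
        else
        {
            double dt = (time - lastTime).TotalSeconds;
            double alpha = 1.0 - Math.Exp(-dt / tau); // bigger gap = more weight on the new value
            average += alpha * (value - average);
        }
        lastTime = time;
        return average;
    }
}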
I don't have a tutorial that will help you, but in C# you may want to consider using Reactive LINQ - see blog post Reactive programming (II.) - Introducing Reactive LINQ.
It gives you a way to treat the samples as a stream of events, so you can do your processing as each new value comes in, without having to store all the values.
To consider time, you could just use an exponential with a negative exponent to decrease the impact of the past measurements.
Yes: for complex real-time systems that sample multiple streams of data, there can be issues with the data processing (calculation and storage of data) and with data consistency.