My problem is very specific. I want to create narrowband noise for my small WPF application. I am using the NAudio library to create an infinite stream of noise that the user can stop and start. I have created Tone (a simple sine wave), Warble (a sine wave modulated by another wave) and White noise.
This is the class I use so that any stereo ISampleProvider can send sound to the left channel, the right channel, or both, depending on what the user wants.
using NAudio.Wave;
using System;
namespace AppForSoundCard
{
public class SignalStereoProvider : ISampleProvider
{
private readonly ISampleProvider sample;
public WaveFormat WaveFormat => sample.WaveFormat;
public float LeftVolume { get; set; }
public float RightVolume { get; set; }
public SignalStereoProvider(ISampleProvider sample)
{
if (sample.WaveFormat.Channels != 2)
throw new ArgumentException("Source sample provider must be stereo");
this.sample = sample;
}
public int Read(float[] buffer, int offset, int count)
{
int samplesRead = sample.Read(buffer, offset, count);
for (int n = 0; n + 1 < samplesRead; n += 2) // scale only the samples that were actually read
{
buffer[offset + n] *= LeftVolume;
buffer[offset + n + 1] *= RightVolume;
}
return samplesRead;
}
}
}
I use the following code to generate the audio stream that I mentioned. You may ask what NarrowBandProvider32 is. It is the class that is supposed to generate narrowband noise from white noise. I will show its code after the next paragraph.
signalGenerator = new SignalGenerator();
signalGenerator.Type = SignalGeneratorType.White;
signalGenerator.Gain = 1.0;
signalGenerator.Frequency = Frequency;
narrowBand = new NarrowBandProvider32(signalGenerator, Frequency, 96000, 100, dB);
stereoProvider = new SignalStereoProvider(narrowBand)
{
RightVolume = !((ComboBoxItem)RoutingCombobox.SelectedItem).Tag.ToString().Equals("Left") ? (float)Math.Pow(10, (dB - 80) / 20.0) * (float)Settings.Default["ReferenceAmplitudeFor" + Frequency.ToString()] : 0.0f,
LeftVolume = !((ComboBoxItem)RoutingCombobox.SelectedItem).Tag.ToString().Equals("Right") ? (float)Math.Pow(10, (dB - 80) / 20.0) * (float)Settings.Default["ReferenceAmplitudeFor" + Frequency.ToString()] : 0.0f
};
Output.Init(stereoProvider);
I first generate white noise using NAudio's SignalGenerator class, then pass it to the NarrowBandProvider32 that I wrote, which is supposed to turn the white noise into narrowband noise. After that, I route the sound to the left, the right, or both.
The left and right volumes are the amplitude values for the decibel value given by the user. There is a routing combobox in which you can choose left, right, or bilateral. Depending on your choice, the amplitude is applied to LeftVolume, to RightVolume, or to both.
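The decibel-to-amplitude mapping used for LeftVolume and RightVolume can be checked in isolation. The sketch below is a Python restatement of the expression in the snippet above. The 80 dB reference level comes from that expression, while the function name and the reference_amplitude parameter (standing in for the per-frequency value read from Settings) are assumptions for illustration.

```python
def db_to_amplitude(db, reference_amplitude=1.0, reference_db=80.0):
    """Linear amplitude for a level in dB relative to reference_db.

    Mirrors Math.Pow(10, (dB - 80) / 20.0) * reference from the C# snippet;
    reference_amplitude is a hypothetical stand-in for the stored setting.
    """
    return 10.0 ** ((db - reference_db) / 20.0) * reference_amplitude

# Every 20 dB below the reference level divides the amplitude by 10.
for level in (80, 60, 40):
    print(level, db_to_amplitude(level))
```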
I have visited several sites about how narrowband noise can be generated. They all suggested bandpass filtering white noise to generate narrowband noise. I tried that. It sort of did what I wanted, but the result was narrower than I wanted. Here is the frequency response of the noise that I generated for 500 Hz.
Here is the NarrowBandProvider32 class code for that noise:
using NAudio.Dsp;
using NAudio.Wave;
using System;
namespace AppForSoundCard
{
class NarrowBandProvider32 : ISampleProvider
{
ISampleProvider sample;
float lowFreq;
float highFreq;
BiQuadFilter biQuad;
public WaveFormat WaveFormat => sample.WaveFormat;
public NarrowBandProvider32(ISampleProvider sample, float frequency, float sampleRate, float q, float dB)
{
if (sample.WaveFormat.Channels != 2)
throw new ArgumentException("Source sample provider must be stereo");
this.sample = sample;
//Low and High frequency variables are defined like this in audiometry.
//these variables are the boundaries for narrowband noise
lowFreq = (float)Math.Round(frequency / Math.Pow(2, 1.0 / 4.0));
highFreq = (float)Math.Round(frequency * Math.Pow(2, 1.0 / 4.0));
biQuad = BiQuadFilter.BandPassFilterConstantSkirtGain(sampleRate, frequency, q);
biQuad.SetHighPassFilter(sampleRate, lowFreq, q);
biQuad.SetLowPassFilter(sampleRate, highFreq, q);
}
public int Read(float[] buffer, int offset, int count)
{
int samplesRead = sample.Read(buffer, offset, count);
for (int i = 0; i < samplesRead; i++)
buffer[offset + i] = biQuad.Transform(buffer[offset + i]);
return samplesRead;
}
}
}
These are the arguments I gave:
narrowBand = new NarrowBandProvider32(signalGenerator, Frequency, 96000, 100, dB);
As I said, this noise is close to the narrowband noise that is defined in audiometry, but it is narrower. Narrowband noise for 500 Hz in audiometry has this frequency response.
As you can see, it is wider than the noise that I generated. How can I generate narrowband noise that is close to the audiometric narrowband noise for any frequency? I only gave 500 Hz examples in the images, but my code can generate noise for centre frequencies from 150 Hz to 8000 Hz. What filter should I use on the white noise in order to generate that type of narrowband noise? Any help is appreciated.
Edit:
I found a standard which explains what narrowband noise should look like for any frequency and decibel level.
Where narrow-band masking is required, the noise band shall be centred geometrically
around the test frequency. The band limits for the masking noise are given in Table 4.
Outside these band limits the sound pressure spectrum density level of the noise shall fall at
a rate of at least 12 dB per octave for at least three octaves and outside these three octaves it
shall be at least 36 dB below the level at the centre frequency. Measurements are required in
the range from 31,5 Hz to 10 kHz for instruments limited to 8 kHz. For EHF instruments
measurements are required up to 20 kHz.
Due to limitations of transducers, ear simulators, acoustic couplers and mechanical couplers,
measurements of the bandwidth at 4 kHz and above may not accurately describe the
spectrum of the masking noise. Therefore at centre frequencies above 3,15 kHz
measurements shall be made electrically across the transducer terminals.
With that definition, I guess a standard bandpass filter wouldn't work and I have to define a custom filter for the noise. Is there a C# library that allows defining custom filters? If there is, how should I define the custom filter in order to produce noise that meets that standard?
They were all suggesting bandpass filtering a white noise to generate narrow band noise. I tried that. It sort of did what I wanted but it was narrower than I wanted.
The approach of applying a bandpass filter to a white noise source makes sense. The problem is just that the bandpass filter design is too narrow. You can make it wider by reducing the q, moving the lowFreq and highFreq a bit outward, or switching to a different filter design method.
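To put a rough number on "reducing the q": for a band-pass filter, Q is approximately the centre frequency divided by the -3 dB bandwidth. The Python sketch below applies that relation to the band edges from the question (f / 2^(1/4) and f * 2^(1/4)). The function name is made up for illustration, and treating those edges as the -3 dB points is an assumption.

```python
def band_edges_and_q(f0, octave_fraction=0.25):
    """Return (low, high, Q) for a band centred geometrically on f0.

    low and high follow the question's f0 / 2**(1/4) and f0 * 2**(1/4);
    Q = f0 / (high - low) assumes the edges are the -3 dB points.
    """
    ratio = 2.0 ** octave_fraction
    low = f0 / ratio
    high = f0 * ratio
    return low, high, f0 / (high - low)

low, high, q = band_edges_and_q(500.0)
print(f"low={low:.1f} Hz, high={high:.1f} Hz, Q={q:.2f}")
# Bandwidth implied by the Q of 100 passed in the question.
print(f"bandwidth at Q=100: {500.0 / 100:.1f} Hz")
```

For 500 Hz this gives a Q of about 2.9, whereas a Q of 100 corresponds to a band only about 5 Hz wide.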
I suggest that rather than coding directly in C#, it might be useful to prototype this first in Python using the scipy.signal library, which has various tools for designing and working with filters.
In the code below, I vary the c parameter to tweak the low and high edges of the band.
Code:
# Copyright 2022 Google LLC.
# SPDX-License-Identifier: Apache-2.0
import matplotlib.pyplot as plt
import numpy as np
import scipy.signal as sig
fs = 96000 # Sample rate.
f0 = 500 # Center frequency in Hz.
# Generate noise with a few different bandwidths.
for c in [1.03, 1.07, 1.15]:
    # Design a Butterworth bandpass filter (N=2 yields a 4th-order bandpass).
    sos = sig.butter(2, [f0 / c, f0 * c], 'bandpass', output='sos', fs=fs)
    # Generate white noise.
    white_noise = np.random.randn(fs)
    # Run it through the filter.
    output = sig.sosfilt(sos, white_noise)
    # Use Welch's method to estimate the PSD of the filtered noise.
    f, psd = sig.welch(output, fs, nperseg=4096)
    plt.semilogx(f, 10 * np.log10(psd), label=f'c = {c}')
plt.axvline(x=f0, color='k')
plt.xlim(50, fs/2)
plt.ylim(-140, -40)
plt.xlabel('Frequency (Hz)', fontsize=15)
plt.ylabel('PSD (dB)', fontsize=15)
plt.legend()
plt.show()
Output:
This is code that I found online somewhere. It works quite well, but I don't fully understand how it converts a bunch of math into an audio wave:
public static void Beeps(int Amplitude, int Frequency, int Duration)
{
double A = ((Amplitude * (System.Math.Pow(2, 15))) / 1000) - 1;
double DeltaFT = 2 * Math.PI * Frequency / 44100.0;
int Samples = 441 * Duration / 10;
int Bytes = Samples * 4;
int[] Hdr =
{ 0X46464952, 36 + Bytes, 0X45564157,
0X20746D66, 16, 0X20001, 44100, 176400, 0X100004,
0X61746164, Bytes };
using (MemoryStream MS = new MemoryStream(44 + Bytes))
{
using (BinaryWriter BW = new BinaryWriter(MS))
{
for (int I = 0; I < Hdr.Length; I++)
{
BW.Write(Hdr[I]);
}
for (int T = 0; T < Samples; T++)
{
short Sample = System.Convert.ToInt16(A * Math.Sin(DeltaFT * T));
BW.Write(Sample);
BW.Write(Sample);
}
BW.Flush();
MS.Seek(0, SeekOrigin.Begin);
using (SoundPlayer SP = new SoundPlayer(MS))
{
SP.PlaySync();
}
}
}
}
It looks like all it does is beep at certain pitches. The reason math converts into sound is because when the data is fed to your speaker, it's really bytes telling it how to vibrate during that instant.
If you're asking about how sound works, it's based on how vibrations move through the air. Vibrations exist as waves; they literally shake the air in patterns that your brain interprets as sound through your ears. If the sound has a higher pitch, the wave crests are closer together, and if it has a lower pitch, they're further apart. This is why a computer can "convert a bunch of math into an audio wave", because that's all sound really is: a constantly changing wave. That method takes a frequency (Frequency), creates a sine wave based on it, converts it to bytes, and feeds it to your speaker at a certain volume (Amplitude) for a certain duration. Cool stuff, right?
Also, you're looking at a "method", not a class. :)
Here's more about sound if you're interested: https://en.wikipedia.org/wiki/Sound#Sound_wave_properties_and_characteristics
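As a rough illustration of the "sine wave to bytes" step, here is a Python sketch of the same arithmetic the Beeps method uses to build its samples (amplitude scaling, phase step per sample, 44.1 samples per millisecond). The function name is an assumption, and the sketch only computes the sample values; it does not write a header or play anything.

```python
import math

SAMPLE_RATE = 44100

def beep_samples(amplitude, frequency, duration_ms):
    """16-bit sample values for a sine tone, following the Beeps arithmetic."""
    peak = amplitude * 2 ** 15 / 1000 - 1          # amplitude 1000 -> 32767
    delta = 2 * math.pi * frequency / SAMPLE_RATE  # phase step per sample
    count = 441 * duration_ms // 10                # 44.1 samples per ms
    return [round(peak * math.sin(delta * t)) for t in range(count)]

# 441 Hz at 44.1 kHz has a period of exactly 100 samples.
samples = beep_samples(1000, 441, 10)
print(len(samples), samples[0], max(samples), min(samples))
```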
This answer has a good overview of how wav files work:
Simply sample the waveform at fixed intervals, and write the amplitude at each interval into your file.
That's what the BW.Write calls are doing. T is the sample index, i.e. time measured in sample periods.
In order to play the sound, that data goes after the Hdr section, which is simply the correct header for a standard .wav file. 0X46464952 is ASCII for "RIFF" and 0X45564157 is "WAVE" (BinaryWriter stores each int little-endian, so the bytes come out in reading order). The player needs to know what rate the wave was sampled at. In this case it's 44100, which is a common standard.
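The header constants can be decoded to confirm this reading. The Python sketch below packs each 32-bit word little-endian, as BinaryWriter does, and prints the resulting ASCII tags and fields. The helper name fourcc is an assumption for illustration.

```python
import struct

HEADER_TAGS = [0x46464952, 0x45564157, 0x20746D66, 0x61746164]

def fourcc(word):
    """Decode a 32-bit little-endian word into its four ASCII characters."""
    return struct.pack("<I", word).decode("ascii")

for word in HEADER_TAGS:
    print(hex(word), repr(fourcc(word)))

# 0x20001 packs two 16-bit fields: format tag 1 (PCM) and 2 channels.
format_tag, channels = struct.unpack("<HH", struct.pack("<I", 0x20001))
# 0x100004 packs the block align (4 bytes per frame) and 16 bits per sample.
block_align, bits_per_sample = struct.unpack("<HH", struct.pack("<I", 0x100004))
print(format_tag, channels, block_align, bits_per_sample)
```

The remaining header values are plain integers: 44100 is the sample rate and 176400 is the byte rate (44100 frames per second times 4 bytes per frame).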
I am totally new to signal processing and am trying to make a program that shows the amplitude of low frequency signals in a PCM (WAV) file.
So far, I've been able to read in the WAV file and populate an array of float with the data points of the sound file taken from the WAV (actually a multi-dimensional array, one per channel, but let's consider a single channel here). Each data point is an amplitude. In short, I have the time-domain representation of the sound wave.
I use this to draw a graph of the amplitude of the wave with respect to time, which looks like:
My goal is to do exactly the same, but only display frequencies below a certain value (eg. 350Hz). To be clear, it's not that I want to display a graph in the frequency domain (ie. after a Fast Fourier Transform). I want to display the same amplitude vs. time graph, but for frequencies in the range [0, 350Hz].
I'm looking for a function that can do:
// Returns an array of data points that contains
// amplitude data points, after a low pass filter
float[] low_pass_filter(float[] original_data, float low_pass_freq=350.0f)
{
...
}
I've read up on the FFT, read Chris Lomont's code for the FFT and understand the "theory" behind a low-pass filter, but I'm finding it difficult to get my head around how to actually implement this specific function (above). Any help (+ explanations) would be greatly appreciated!
I ended up using this example which works really well. I wrapped it a bit nicer and ended up with:
/// <summary>
/// Returns a low-pass filter of the data
/// </summary>
/// <param name="data">Data to filter</param>
/// <param name="cutoff_freq">The frequency below which data will be preserved</param>
private float[] lowPassFilter(ref float[] data, float cutoff_freq, int sample_rate, float quality_factor=1.0f)
{
// Calculate filter parameters
float O = (float)(2.0 * Math.PI * cutoff_freq / sample_rate);
float C = quality_factor / O;
float L = 1 / quality_factor / O;
// Loop through and apply the filter
float[] output = new float[data.Length];
float V = 0, I = 0, T;
for (int s = 0; s < data.Length; s++)
{
T = (I - V) / C;
I += (data[s] * O - V) / L;
V += T;
output[s] = V / O;
}
return output;
}
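To sanity-check what this loop does, here is a Python port of it (same variable roles and update order, but using double-precision floats), fed with two test tones. The test frequencies, the 350 Hz cutoff and the helper names are assumptions chosen for illustration. With those values the 50 Hz tone should pass almost unchanged while the 5000 Hz tone is strongly attenuated.

```python
import math

def low_pass(data, cutoff_freq, sample_rate, quality_factor=1.0):
    """Python port of the lowPassFilter loop above (same update order)."""
    o = 2.0 * math.pi * cutoff_freq / sample_rate
    cap = quality_factor / o          # "C" in the C# code
    ind = 1.0 / quality_factor / o    # "L" in the C# code
    v = i = 0.0
    out = []
    for x in data:
        t = (i - v) / cap
        i += (x * o - v) / ind
        v += t
        out.append(v / o)
    return out

def rms(values):
    return math.sqrt(sum(x * x for x in values) / len(values))

def tone(freq, sample_rate, n):
    return [math.sin(2.0 * math.pi * freq * k / sample_rate) for k in range(n)]

fs = 44100
levels = {}
for freq in (50.0, 5000.0):
    filtered = low_pass(tone(freq, fs, fs), 350.0, fs)
    # Use only the second half so the start-up transient is excluded.
    levels[freq] = rms(filtered[fs // 2:])
    print(freq, round(levels[freq], 4))
```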
The output of both regular and low-pass waveforms:
And isolating the regular waveforms vs low-pass waveforms:
I've been experimenting with the FFT algorithm. I use NAudio along with working FFT code from the internet. Based on my observations, the resulting pitch is inaccurate.
What happens is that I have a MIDI file (generated from GuitarPro) converted to a WAV file (44.1 kHz, 16-bit, mono) that contains a pitch progression starting from E2 (the lowest guitar note) up to about E6. For the lower notes (around E2-B3) the result is generally very wrong. From C4 upward it is somewhat correct, in that you can already see the proper progression (the next note is C#4, then D4, etc.). However, the problem there is that the pitch detected is a half-note lower than the actual pitch (e.g. C4 should be the note but D#4 is displayed).
What do you think may be wrong? I can post the code if necessary. Thanks very much! I'm still beginning to grasp the field of DSP.
Edit: Here is a rough sketch of what I'm doing
byte[] buffer = new byte[8192];
int bytesRead;
do
{
bytesRead = stream16.Read(buffer, 0, buffer.Length);
} while (bytesRead != 0);
And then: (waveBuffer is simply a class that is there to convert the byte[] into float[] since the function only accepts float[])
public int Read(byte[] buffer, int offset, int bytesRead)
{
int frames = bytesRead / sizeof(float);
float pitch = DetectPitch(waveBuffer.FloatBuffer, frames);
}
And lastly: (SmbPitchShift is the class that has the FFT algorithm. I believe there's nothing wrong with it, so I'm not posting it here.)
private float DetectPitch(float[] buffer, int inFrames)
{
Func<int, int, float> window = HammingWindow;
if (prevBuffer == null)
{
prevBuffer = new float[inFrames]; //only contains zeroes
}
// double frames since we are combining present and previous buffers
int frames = inFrames * 2;
if (fftBuffer == null)
{
fftBuffer = new float[frames * 2]; // times 2 because it is complex input
}
for (int n = 0; n < frames; n++)
{
if (n < inFrames)
{
fftBuffer[n * 2] = prevBuffer[n] * window(n, frames);
fftBuffer[n * 2 + 1] = 0; // need to clear out as fft modifies buffer
}
else
{
fftBuffer[n * 2] = buffer[n - inFrames] * window(n, frames);
fftBuffer[n * 2 + 1] = 0; // need to clear out as fft modifies buffer
}
}
SmbPitchShift.smbFft(fftBuffer, frames, -1);
}
And for interpreting the result:
float binSize = sampleRate / frames;
int minBin = (int)(82.407 / binSize); //lowest E string on the guitar
int maxBin = (int)(1244.508 / binSize); //highest E string on the guitar
float maxIntensity = 0f;
int maxBinIndex = 0;
for (int bin = minBin; bin <= maxBin; bin++)
{
float real = fftBuffer[bin * 2];
float imaginary = fftBuffer[bin * 2 + 1];
float intensity = real * real + imaginary * imaginary;
if (intensity > maxIntensity)
{
maxIntensity = intensity;
maxBinIndex = bin;
}
}
return binSize * maxBinIndex;
UPDATE (if anyone is still interested):
So, one of the answers below stated that the frequency peak from the FFT is not always equivalent to pitch. I understand that. But I wanted to check for myself whether that was the case here (on the assumption that there are times when the frequency peak IS the resulting pitch). So I got two programs (SpectraPLUS and FFT Properties by DewResearch; credits to them) that can display the frequency domain of audio signals.
So here are the results of the frequency peaks in the time domain:
SpectraPLUS
and FFT Properties:
This was done using a test note of A2 (around 110 Hz). Looking at the images, the frequency peaks are in the range of 102-112 Hz for SpectraPLUS and at about 108 Hz for FFT Properties. With my code, I get 104 Hz (I use blocks of 8192 samples at a sample rate of 44.1 kHz; 8192 is then doubled to make it complex input, so in the end I get a bin size of around 5 Hz, compared to the 10 Hz bin size of SpectraPLUS).
So now I'm a bit confused, since the programs seem to return the correct result, but my code always gives 104 Hz (note that I have compared the FFT function I used with others such as Math.Net, and it seems to be correct).
Do you think the problem may be with my interpretation of the data? Or do the programs do something else before displaying the frequency spectrum? Thanks!
It sounds like you may have an interpretation problem with your FFT output. A few random points:
the FFT has a finite resolution - each output bin has a resolution of Fs / N, where Fs is the sample rate and N is the size of the FFT
for notes which are low on the musical scale, the difference in frequency between successive notes is relatively small, so you will need a sufficiently large N to discriminate between notes which are a semitone apart (see note 1 below)
the first bin (index 0) contains energy centered at 0 Hz but includes energy from +/- Fs / 2N
bin i contains energy centered at i * Fs / N but includes energy from +/- Fs / 2N either side of this center frequency
you will get spectral leakage from adjacent bins - how bad this is depends on what window function you use - no window (== rectangular window) and spectral leakage will be very bad (very broad peaks) - for frequency estimation you want to pick a window function that gives you sharp peaks
pitch is not the same thing as frequency - pitch is a percept, frequency is a physical quantity - the perceived pitch of a musical instrument may be slightly different from the fundamental frequency, depending on the type of instrument (some instruments do not even produce significant energy at their fundamental frequency, yet we still perceive their pitch as if the fundamental were present)
My best guess from the limited information available though is that perhaps you are "off by one" somewhere in your conversion of bin index to frequency, or perhaps your FFT is too small to give you sufficient resolution for the low notes, and you may need to increase N.
You can also improve your pitch estimation via several techniques, such as cepstral analysis, or by looking at the phase component of your FFT output and comparing it for successive FFTs (this allows for a more accurate frequency estimate within a bin for a given FFT size).
Notes
(1) Just to put some numbers on this, E2 is 82.4 Hz, F2 is 87.3 Hz, so you need a resolution somewhat better than 5 Hz to discriminate between the lowest two notes on a guitar (and much finer than this if you actually want to do, say, accurate tuning). At a 44.1 kHz sample rate you probably need an FFT of at least N = 8192 to give you sufficient resolution (44100 / 8192 = 5.4 Hz); N = 16384 would probably be better.
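The arithmetic in this note can be reproduced with a few lines of Python. The sketch below compares the bin width Fs / N for several FFT sizes against the gap between E2 and the next semitone. The function names are assumptions for illustration, and equal temperament is assumed for the semitone spacing.

```python
SAMPLE_RATE = 44100.0
E2 = 82.407  # Hz, lowest open string on a guitar

def bin_width(sample_rate, fft_size):
    """Frequency spacing between adjacent FFT bins, Fs / N."""
    return sample_rate / fft_size

def semitone_gap(freq):
    """Distance in Hz from freq to the note one semitone above it."""
    return freq * (2.0 ** (1.0 / 12.0) - 1.0)

print(f"E2 to F2 gap: {semitone_gap(E2):.2f} Hz")
for n in (2048, 4096, 8192, 16384):
    print(f"N={n:5d}: bin width {bin_width(SAMPLE_RATE, n):.2f} Hz")
```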
I thought this might help you. I made some plots of the 6 open strings of a guitar. The code is in Python using pylab, which I recommend for experimenting:
# analyze distorted guitar notes from
# http://www.freesound.org/packsViewSingle.php?id=643
#
# 329.6 E - open 1st string
# 246.9 B - open 2nd string
# 196.0 G - open 3rd string
# 146.8 D - open 4th string
# 110.0 A - open 5th string
# 82.4 E - open 6th string
from pylab import *
import wave
fs = 44100.0
N = 8192 * 10
t = r_[:N] / fs
f = r_[:N//2+1] * fs / N
gtr_fun = [329.6, 246.9, 196.0, 146.8, 110.0, 82.4]
gtr_wav = [wave.open('dist_gtr_{0}.wav'.format(n),'r') for n in r_[1:7]]
# frombuffer replaces the deprecated binary mode of fromstring
gtr = [frombuffer(g.readframes(N), dtype='int16') for g in gtr_wav]
gtr_t = [g / float64(max(abs(g))) for g in gtr]
gtr_f = [2 * abs(rfft(g)) / N for g in gtr_t]
def make_plots():
    for n in r_[:len(gtr_t)]:
        fig = figure()
        fig.subplots_adjust(wspace=0.5, hspace=0.5)
        subplot2grid((2,2), (0,0))
        plot(t, gtr_t[n]); axis('tight')
        title('String ' + str(n+1) + ' Waveform')
        subplot2grid((2,2), (0,1))
        plot(f, gtr_f[n]); axis('tight')
        title('String ' + str(n+1) + ' DFT')
        subplot2grid((2,2), (1,0), colspan=2)
        M = int(gtr_fun[n] * 16.5 / fs * N)
        plot(f[:M], gtr_f[n][:M]); axis('tight')
        title('String ' + str(n+1) + ' DFT (16 Harmonics)')
if __name__ == '__main__':
    make_plots()
    show()
String 1, fundamental = 329.6 Hz:
String 2, fundamental = 246.9 Hz:
String 3, fundamental = 196.0 Hz:
String 4, fundamental = 146.8 Hz:
String 5, fundamental = 110.0 Hz:
String 6, fundamental = 82.4 Hz:
The fundamental frequency isn't always the dominant harmonic. It determines the spacing between harmonics of a periodic signal.
I had a similar question, and the answer for me was to use Goertzel instead of the FFT. If you know which tones you are looking for (MIDI), Goertzel is capable of detecting the tones to within one sine wave (one cycle). It does this by generating the sine wave of the sound and "placing it on top of the raw data" to see if it exists. The FFT samples large amounts of data to provide an approximate frequency spectrum.
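For reference, a minimal Python sketch of the Goertzel recurrence follows. It evaluates the power of a single DFT bin, so it is only useful when the candidate frequencies are known in advance. The test signal (a 110 Hz tone at an 8 kHz sample rate, 800 samples) and the function name are assumptions chosen so that both probe frequencies fall exactly on bin centres; the sketch does not attempt to demonstrate the single-cycle detection claimed above.

```python
import math

def goertzel_power(samples, sample_rate, target_freq):
    """Squared magnitude of the DFT bin nearest target_freq (Goertzel)."""
    n = len(samples)
    k = round(n * target_freq / sample_rate)   # nearest integer bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2

fs = 8000
n = 800
signal = [math.sin(2.0 * math.pi * 110.0 * t / fs) for t in range(n)]
# The tone is present at 110 Hz and absent at 150 Hz.
for freq in (110.0, 150.0):
    print(freq, goertzel_power(signal, fs, freq))
```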
Musical pitch is different from a frequency peak. Pitch is a psycho-perceptual phenomenon that may depend more on the overtones and such. The frequency of what a human would call the pitch could be missing or quite small in the actual signal spectrum.
And a frequency peak in a spectrum can be different from any FFT bin center. The FFT bin center frequencies will change in frequency and spacing depending only on the FFT length and sample rate, not the spectra in the data.
So you have at least 2 problems with which to contend. There are a ton of academic papers on frequency estimation as well as the separate subject of pitch estimation. Start there.
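On the first of those problems (a peak that falls between bin centres), one common refinement is parabolic interpolation over the peak bin and its two neighbours. It is not mentioned in this thread, so treat the Python sketch below as an illustration under assumptions: a clean 110 Hz sine, a Hann window, an 8 kHz sample rate, and a direct DFT sum over a handful of bins in place of a real FFT. All function names are made up.

```python
import math

def windowed_bin_magnitude(samples, k):
    """Magnitude of DFT bin k of the Hann-windowed samples (direct sum)."""
    n = len(samples)
    re = im = 0.0
    for t, x in enumerate(samples):
        w = 0.5 - 0.5 * math.cos(2.0 * math.pi * t / n)  # Hann window
        angle = 2.0 * math.pi * k * t / n
        re += w * x * math.cos(angle)
        im -= w * x * math.sin(angle)
    return math.hypot(re, im)

def refine_peak(samples, sample_rate, k_min, k_max):
    """Peak frequency via parabolic interpolation on log magnitudes."""
    n = len(samples)
    mags = {k: windowed_bin_magnitude(samples, k)
            for k in range(k_min - 1, k_max + 2)}
    peak = max(range(k_min, k_max + 1), key=lambda k: mags[k])
    a, b, c = (math.log(mags[peak + d]) for d in (-1, 0, 1))
    offset = 0.5 * (a - c) / (a - 2.0 * b + c)   # fraction of a bin
    return (peak + offset) * sample_rate / n

fs, n = 8000, 1024
signal = [math.sin(2.0 * math.pi * 110.0 * t / fs) for t in range(n)]
estimate = refine_peak(signal, fs, 8, 24)
print(f"bin width {fs / n:.2f} Hz, estimate {estimate:.2f} Hz")
```

This only sharpens the frequency estimate of a spectral peak. It does nothing for the second problem, where the perceived pitch is not the strongest peak at all.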
I'm trying to write a program to programmatically determine the tilt or angle of rotation in an arbitrary image.
Images have the following properties:
Consist of dark text on a light background
Occasionally contain horizontal or vertical lines which only intersect at 90 degree angles.
Skewed between -45 and 45 degrees.
See this image as a reference (it's been skewed 2.8 degrees).
So far, I've come up with this strategy: Draw a route from left to right, always selecting the nearest white pixel. Presumably, the route from left to right will prefer to follow the path between lines of text along the tilt of the image.
Here's my code:
private bool IsWhite(Color c) { return c.GetBrightness() >= 0.5 || c == Color.Transparent; }
private bool IsBlack(Color c) { return !IsWhite(c); }
private double ToDegrees(decimal slope) { return (180.0 / Math.PI) * Math.Atan(Convert.ToDouble(slope)); }
private void GetSkew(Bitmap image, out double minSkew, out double maxSkew)
{
decimal minSlope = 0.0M;
decimal maxSlope = 0.0M;
for (int start_y = 0; start_y < image.Height; start_y++)
{
int end_y = start_y;
for (int x = 1; x < image.Width; x++)
{
int above_y = Math.Max(end_y - 1, 0);
int below_y = Math.Min(end_y + 1, image.Height - 1);
Color center = image.GetPixel(x, end_y);
Color above = image.GetPixel(x, above_y);
Color below = image.GetPixel(x, below_y);
if (IsWhite(center)) { /* no change to end_y */ }
else if (IsWhite(above) && IsBlack(below)) { end_y = above_y; }
else if (IsBlack(above) && IsWhite(below)) { end_y = below_y; }
}
decimal slope = (Convert.ToDecimal(start_y) - Convert.ToDecimal(end_y)) / Convert.ToDecimal(image.Width);
minSlope = Math.Min(minSlope, slope);
maxSlope = Math.Max(maxSlope, slope);
}
minSkew = ToDegrees(minSlope);
maxSkew = ToDegrees(maxSlope);
}
This works well on some images, not so well on others, and it's slow.
Is there a more efficient, more reliable way to determine the tilt of an image?
I've made some modifications to my code, and it certainly runs a lot faster, but it's not very accurate.
I've made the following improvements:
Using Vinko's suggestion, I avoid GetPixel in favor of working with bytes directly, now the code runs at the speed I needed.
My original code simply used "IsBlack" and "IsWhite", but this isn't granular enough. The original code traces the following paths through the image:
http://img43.imageshack.us/img43/1545/tilted3degtextoriginalw.gif
Note that a number of paths pass through the text. I now compare the actual brightness values of the center, above, and below pixels and select the brightest one. Basically I'm treating the bitmap as a heightmap, and the path from left to right follows the contours of the image, resulting in a better path:
http://img10.imageshack.us/img10/5807/tilted3degtextbrightnes.gif
As suggested by Toaomalkster, a Gaussian blur smooths out the height map, and I get even better results:
http://img197.imageshack.us/img197/742/tilted3degtextblurredwi.gif
Since this is just prototype code, I blurred the image using GIMP rather than writing my own blur function.
The selected path is pretty good for a greedy algorithm.
As Toaomalkster suggested, choosing the min/max slope is naive. A simple linear regression provides a better approximation of the slope of a path. Additionally, I should cut a path short once I run off the edge of the image, otherwise the path will hug the top of the image and give an incorrect slope.
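The least-squares slope mentioned here can be sketched on its own, independent of any image code. The Python below fits a line through the (x, y) points of one path and converts the slope to degrees. The staircase-shaped test path and the function name are hypothetical.

```python
import math

def least_squares_slope(points):
    """Slope of the best-fit line through (x, y) points.

    slope = (N*sum(xy) - sum(x)*sum(y)) / (N*sum(x^2) - sum(x)^2)
    """
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_xx = sum(x * x for x, _ in points)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x * sum_x)

# Hypothetical path that steps 1 pixel in y every 20 columns.
path = [(x, 100 + x // 20) for x in range(400)]
slope = least_squares_slope(path)
print(f"slope {slope:.4f}, skew {math.degrees(math.atan(slope)):.2f} degrees")
```

Unlike a min/max of end points, the fit uses every point on the path, so a single stray step has little effect on the result.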
Code
private double ToDegrees(double slope) { return (180.0 / Math.PI) * Math.Atan(slope); }
private double GetSkew(Bitmap image)
{
BrightnessWrapper wrapper = new BrightnessWrapper(image);
LinkedList<double> slopes = new LinkedList<double>();
for (int y = 0; y < wrapper.Height; y++)
{
int endY = y;
long sumOfX = 0;
long sumOfY = y;
long sumOfXY = 0;
long sumOfXX = 0;
int itemsInSet = 1;
for (int x = 1; x < wrapper.Width; x++)
{
int aboveY = endY - 1;
int belowY = endY + 1;
if (aboveY < 0 || belowY >= wrapper.Height)
{
break;
}
int center = wrapper.GetBrightness(x, endY);
int above = wrapper.GetBrightness(x, aboveY);
int below = wrapper.GetBrightness(x, belowY);
if (center >= above && center >= below) { /* no change to endY */ }
else if (above >= center && above >= below) { endY = aboveY; }
else if (below >= center && below >= above) { endY = belowY; }
itemsInSet++;
sumOfX += x;
sumOfY += endY;
sumOfXX += (x * x);
sumOfXY += (x * endY);
}
// least squares slope = (NΣ(XY) - (ΣX)(ΣY)) / (NΣ(X^2) - (ΣX)^2), where N = elements in set
if (itemsInSet > image.Width / 2) // path covers at least half of the image
{
decimal sumOfX_d = Convert.ToDecimal(sumOfX);
decimal sumOfY_d = Convert.ToDecimal(sumOfY);
decimal sumOfXY_d = Convert.ToDecimal(sumOfXY);
decimal sumOfXX_d = Convert.ToDecimal(sumOfXX);
decimal itemsInSet_d = Convert.ToDecimal(itemsInSet);
decimal slope =
((itemsInSet_d * sumOfXY_d) - (sumOfX_d * sumOfY_d))
/
((itemsInSet_d * sumOfXX_d) - (sumOfX_d * sumOfX_d));
slopes.AddLast(Convert.ToDouble(slope));
}
}
double mean = slopes.Average();
double sumOfSquares = slopes.Sum(d => Math.Pow(d - mean, 2));
double stddev = Math.Sqrt(sumOfSquares / (slopes.Count - 1));
// select items within 1 standard deviation of the mean
var testSample = slopes.Where(x => Math.Abs(x - mean) <= stddev);
return ToDegrees(testSample.Average());
}
class BrightnessWrapper
{
byte[] rgbValues;
int stride;
public int Height { get; private set; }
public int Width { get; private set; }
public BrightnessWrapper(Bitmap bmp)
{
Rectangle rect = new Rectangle(0, 0, bmp.Width, bmp.Height);
System.Drawing.Imaging.BitmapData bmpData =
bmp.LockBits(rect,
System.Drawing.Imaging.ImageLockMode.ReadOnly,
bmp.PixelFormat);
IntPtr ptr = bmpData.Scan0;
int bytes = bmpData.Stride * bmp.Height;
this.rgbValues = new byte[bytes];
System.Runtime.InteropServices.Marshal.Copy(ptr,
rgbValues, 0, bytes);
this.Height = bmp.Height;
this.Width = bmp.Width;
this.stride = bmpData.Stride;
bmp.UnlockBits(bmpData); // release the lock now that the pixel data has been copied
}
public int GetBrightness(int x, int y)
{
// Assumes a 24bpp pixel format: 3 bytes per pixel, stored in B, G, R order.
int position = (y * this.stride) + (x * 3);
int b = rgbValues[position];
int g = rgbValues[position + 1];
int r = rgbValues[position + 2];
return (r + r + b + g + g + g) / 6;
}
}
The code is good, but not great. Large amounts of whitespace cause the program to draw a relatively flat line, resulting in a slope near 0, which causes the code to underestimate the actual tilt of the image.
There is no appreciable difference in the accuracy of the tilt by selecting random sample points vs sampling all points, because the ratio of "flat" paths selected by random sampling is the same as the ratio of "flat" paths in the entire image.
GetPixel is slow. You can get an order of magnitude speed up using the approach listed here.
If text is left (right) aligned you can determine the slope by measuring the distance between the left (right) edge of the image and the first dark pixel in two random places and calculate the slope from that. Additional measurements would lower the error while taking additional time.
First I must say I like the idea. But I've never had to do this before and I'm not sure what to suggest to improve reliability. The first thing I can think of is throwing out statistical anomalies. If the slope suddenly changes sharply, then you know you've found a white section of the image that dips into the edge, skewing (no pun intended) your results. So you'd want to throw that stuff out somehow.
But from a performance standpoint there are a number of optimizations you could make which may add up.
Namely, I'd change this snippet from your inner loop from this:
Color center = image.GetPixel(x, end_y);
Color above = image.GetPixel(x, above_y);
Color below = image.GetPixel(x, below_y);
if (IsWhite(center)) { /* no change to end_y */ }
else if (IsWhite(above) && IsBlack(below)) { end_y = above_y; }
else if (IsBlack(above) && IsWhite(below)) { end_y = below_y; }
To this:
Color center = image.GetPixel(x, end_y);
if (IsWhite(center)) { /* no change to end_y */ }
else
{
Color above = image.GetPixel(x, above_y);
Color below = image.GetPixel(x, below_y);
if (IsWhite(above) && IsBlack(below)) { end_y = above_y; }
else if (IsBlack(above) && IsWhite(below)) { end_y = below_y; }
}
It's the same effect but should drastically reduce the number of calls to GetPixel.
Also consider putting the values that don't change into variables before the madness begins. Things like image.Height and image.Width have a slight overhead every time you call them. So store those values in your own variables before the loops begin. The thing I always tell myself when dealing with nested loops is to optimize everything inside the innermost loop at the expense of everything else.
Also... as Vinko Vrsalovic suggested, you may look at his GetPixel alternative for yet another boost in speed.
At first glance, your code looks overly naive.
Which explains why it doesn't always work.
I like the approach Steve Wortham suggested,
but it might run into problems if you have background images.
Another approach that often helps with images is to blur them first.
If you blur your example image enough, each line of text will end up
as a blurry smooth line. You then apply some sort of algorithm to
basically do a regression analysis. There's lots of ways to do
that, and lots of examples on the net.
Edge detection might be useful, or it might cause more problems than it's worth.
By the way, a gaussian blur can be implemented very efficiently if you search hard enough for the code. Otherwise, I'm sure there's lots of libraries available.
Haven't done much of that lately so don't have any links on hand.
But a search for Image Processing library will get you good results.
I'm assuming you're enjoying the fun of solving this, so not much in actual implementation details here.
Measuring the angle of every line seems like overkill, especially given the performance of GetPixel.
I wonder if you would have better performance luck by looking for a white triangle in the upper-left or upper-right corner (depending on the slant direction) and measuring the angle of the hypotenuse. All text should follow the same angle on the page, and the upper-left corner of a page won't get tricked by the descenders or whitespace of content above it.
Another tip to consider: rather than blurring, work within a greatly-reduced resolution. That will give you both the smoother data you need, and fewer GetPixel calls.
For example, I made a blank page detection routine once in .NET for faxed TIFF files that simply resampled the entire page to a single pixel and tested the value for a threshold value of white.
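The reduced-resolution idea can be sketched without GetPixel at all, assuming the page has already been read into a grayscale byte[,] array (0 = black, 255 = white). The names Downsampler, BlockAverage and IsBlank are made up for this illustration, and averaging the whole page is only one plausible reading of "resampled to a single pixel", not the original routine.

```csharp
using System;

public static class Downsampler
{
    // Shrinks a grayscale image by averaging each factor x factor block.
    // Trailing rows/columns that do not fill a whole block are dropped.
    public static byte[,] BlockAverage(byte[,] gray, int factor)
    {
        if (factor < 1) throw new ArgumentOutOfRangeException(nameof(factor));
        int outHeight = gray.GetLength(0) / factor;
        int outWidth = gray.GetLength(1) / factor;
        var result = new byte[outHeight, outWidth];
        long area = (long)factor * factor;
        for (int oy = 0; oy < outHeight; oy++)
        {
            for (int ox = 0; ox < outWidth; ox++)
            {
                long sum = 0;
                for (int dy = 0; dy < factor; dy++)
                    for (int dx = 0; dx < factor; dx++)
                        sum += gray[oy * factor + dy, ox * factor + dx];
                result[oy, ox] = (byte)(sum / area);
            }
        }
        return result;
    }

    // Treats the page as blank when its mean brightness reaches the threshold.
    // This is equivalent to box-resampling the page down to one pixel.
    public static bool IsBlank(byte[,] gray, byte whiteThreshold)
    {
        if (gray.Length == 0) return true;
        long sum = 0;
        foreach (byte value in gray) sum += value;
        return sum / gray.Length >= whiteThreshold;
    }
}
```

Running the skew detection on the output of BlockAverage(page, 4) would touch roughly one sixteenth of the pixels, and the averaging smooths the text in much the same way a blur would.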
What are your constraints in terms of time?
The Hough transform is a very effective mechanism for determining the skew angle of an image. It can be costly in time, but if you're going to use Gaussian blur, you're already burning a pile of CPU time. There are also other ways to accelerate the Hough transform that involve creative image sampling.
Your latest output is confusing me a little.
When you superimposed the blue lines on the source image, did you offset it a bit? It looks like the blue lines are about 5 pixels above the centre of the text.
Not sure about that offset, but you definitely have a problem with the derived line "drifting" away at the wrong angle. It seems to have too strong a bias towards producing a horizontal line.
I wonder if increasing your mask window from 3 pixels (centre, one above, one below) to 5 might improve this (two above, two below). You'll also get this effect if you follow richardtallent's suggestion and resample the image smaller.
Very cool path finding application.
I wonder if this other approach would help or hurt with your particular data set.
Assume a black and white image:
Project all black pixels to the right (EAST). This should give a result of a one dimensional array with a size of IMAGE_HEIGHT. Call the array CANVAS.
As you project all the pixels EAST, keep track numerically of how many pixels project into each bin of CANVAS.
Rotate the image an arbitrary number of degrees and re-project.
Pick the result that gives the highest peaks and lowest valleys for values in CANVAS.
I imagine this will not work well if you really have to account for a full -45 to +45 degrees of tilt. If the actual range is smaller (say +/- 10 degrees), this might be a pretty good strategy. Once you have an initial result, you could re-run with a smaller increment of degrees to fine-tune the answer. I would therefore write this as a function that accepts a float degree_tick as a parameter, so I could run both a coarse and a fine pass (or a whole spectrum of coarseness) with the same code.
This might be computationally expensive. To optimize, you might consider selecting just a portion of the image to project-test-rotate-repeat on.
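A minimal sketch of the project-rotate-score idea, assuming the image is already a bool[,] where true means a black pixel. Instead of rotating the bitmap, it rotates each black pixel's coordinates and bins the resulting row, which gives the same CANVAS counts. Scoring by the sum of squared bin counts is my own choice for "highest peaks and lowest valleys", and SkewEstimator, Score and Estimate are hypothetical names.

```csharp
using System;

public static class SkewEstimator
{
    // Projects every black pixel sideways after rotating by the candidate
    // angle, counts pixels per row bin (CANVAS), and returns the sum of
    // squared counts. Sharp peaks and empty valleys give a larger value.
    // Convention: a text line y = x * tan(angle) + c (y grows downward)
    // collapses into a single bin when the candidate equals that angle.
    public static double Score(bool[,] black, double angleDegrees)
    {
        int height = black.GetLength(0);
        int width = black.GetLength(1);
        double radians = angleDegrees * Math.PI / 180.0;
        double sin = Math.Sin(radians);
        double cos = Math.Cos(radians);
        int offset = height + width;            // keeps every bin index non-negative
        var canvas = new int[2 * offset + 1];
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                if (black[y, x])
                    canvas[(int)Math.Round(y * cos - x * sin) + offset]++;
        double score = 0;
        foreach (int count in canvas) score += (double)count * count;
        return score;
    }

    // Tries angles from center - halfRange to center + halfRange in steps of
    // degreeTick and returns the best-scoring one.
    public static double Estimate(bool[,] black, double center, double halfRange, double degreeTick)
    {
        if (degreeTick <= 0) throw new ArgumentOutOfRangeException(nameof(degreeTick));
        int steps = (int)Math.Floor(2 * halfRange / degreeTick + 1e-9);
        double best = center;
        double bestScore = double.NegativeInfinity;
        for (int i = 0; i <= steps; i++)
        {
            double angle = center - halfRange + i * degreeTick;
            double score = Score(black, angle);
            if (score > bestScore) { bestScore = score; best = angle; }
        }
        return best;
    }
}
```

A coarse pass followed by a fine pass would then look like: double coarse = SkewEstimator.Estimate(black, 0, 10, 1); double fine = SkewEstimator.Estimate(black, coarse, 1, 0.1);. To limit the cost, the same calls can be made on a cropped or downsampled copy of the image.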
Where can I find a free, very quick, and reliable implementation of FFT in C#?
That can be used in a product? Or are there any restrictions?
The guy that did AForge did a fairly good job, but it's not commercial quality. It's great to learn from, but you can tell he was learning too, so it has some pretty serious mistakes, like assuming the size of an image instead of using the correct bits per pixel.
I'm not knocking the guy; I respect the heck out of him for learning all that and showing us how to do it. I think he's a Ph.D. now, or at least about to be, so he's really smart. It's just not a commercially usable library.
The Math.Net library has its own weirdness when working with Fourier transforms and complex images/numbers. If I'm not mistaken, it outputs the Fourier transform in a human-viewable format, which is nice if you want to look at a picture of the transform but not so good when you are expecting the data in the normal format. I could be mistaken about that, but I remember there was some weirdness, so I went to the original code they used for the Fourier stuff and it worked much better. (ExocortexDSP v1.2 http://www.exocortex.org/dsp/)
Math.net also had some other funkiness I didn't like when dealing with the data from the FFT. I can't remember what it was; I just know it was much easier to get what I wanted out of the ExoCortex DSP library. I'm not a mathematician or engineer, though; to those guys it might make perfect sense.
So! I use the FFT code yanked from ExoCortex, which Math.Net is based on, without anything else and it works great.
And finally, I know it's not C#, but I've started looking at using FFTW (http://www.fftw.org/). And this guy already made a C# wrapper so I was going to check it out but haven't actually used it yet. (http://www.sdss.jhu.edu/~tamas/bytes/fftwcsharp.html)
OH! I don't know if you are doing this for school or work but either way there is a GREAT free lecture series given by a Stanford professor on iTunes University.
https://podcasts.apple.com/us/podcast/the-fourier-transforms-and-its-applications/id384232849
AForge.net is a free (open-source) library with Fast Fourier Transform support. (See Sources/Imaging/ComplexImage.cs for usage, Sources/Math/FourierTransform.cs for implementation)
Math.NET's Iridium library provides a fast, regularly updated collection of math-related functions, including the FFT. It's licensed under the LGPL so you are free to use it in commercial products.
I see this is an old thread, but for what it's worth, here's a free (MIT License) 1-D power-of-2-length-only C# FFT implementation I wrote in 2010.
I haven't compared its performance to other C# FFT implementations. I wrote it mainly to compare the performance of Flash/ActionScript and Silverlight/C#. The latter is much faster, at least for number crunching.
/**
* Performs an in-place complex FFT.
*
* Released under the MIT License
*
* Copyright (c) 2010 Gerald T. Beauregard
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to
* deal in the Software without restriction, including without limitation the
* rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
* sell copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*/
public class FFT2
{
// Element for linked list in which we store the
// input/output data. We use a linked list because
// for sequential access it's faster than array index.
class FFTElement
{
public double re = 0.0; // Real component
public double im = 0.0; // Imaginary component
public FFTElement next; // Next element in linked list
public uint revTgt; // Target position post bit-reversal
}
private uint m_logN = 0; // log2 of FFT size
private uint m_N = 0; // FFT size
private FFTElement[] m_X; // Vector of linked list elements
/**
*
*/
public FFT2()
{
}
/**
* Initialize class to perform FFT of specified size.
*
* @param logN Log2 of FFT length. e.g. for 512 pt FFT, logN = 9.
*/
public void init(
uint logN )
{
m_logN = logN;
m_N = (uint)(1 << (int)m_logN);
// Allocate elements for linked list of complex numbers.
m_X = new FFTElement[m_N];
for (uint k = 0; k < m_N; k++)
m_X[k] = new FFTElement();
// Set up "next" pointers.
for (uint k = 0; k < m_N-1; k++)
m_X[k].next = m_X[k+1];
// Specify target for bit reversal re-ordering.
for (uint k = 0; k < m_N; k++ )
m_X[k].revTgt = BitReverse(k,logN);
}
/**
* Performs in-place complex FFT.
*
* @param xRe Real part of input/output
* @param xIm Imaginary part of input/output
* @param inverse If true, do an inverse FFT
*/
public void run(
double[] xRe,
double[] xIm,
bool inverse = false )
{
uint numFlies = m_N >> 1; // Number of butterflies per sub-FFT
uint span = m_N >> 1; // Width of the butterfly
uint spacing = m_N; // Distance between start of sub-FFTs
uint wIndexStep = 1; // Increment for twiddle table index
// Copy data into linked complex number objects
// If it's an IFFT, we divide by N while we're at it
FFTElement x = m_X[0];
uint k = 0;
double scale = inverse ? 1.0/m_N : 1.0;
while (x != null)
{
x.re = scale*xRe[k];
x.im = scale*xIm[k];
x = x.next;
k++;
}
// For each stage of the FFT
for (uint stage = 0; stage < m_logN; stage++)
{
// Compute a multiplier factor for the "twiddle factors".
// The twiddle factors are complex unit vectors spaced at
// regular angular intervals. The angle by which the twiddle
// factor advances depends on the FFT stage. In many FFT
// implementations the twiddle factors are cached, but because
// array lookup is relatively slow in C#, it's just
// as fast to compute them on the fly.
double wAngleInc = wIndexStep * 2.0*Math.PI/m_N;
if (inverse == false)
wAngleInc *= -1;
double wMulRe = Math.Cos(wAngleInc);
double wMulIm = Math.Sin(wAngleInc);
for (uint start = 0; start < m_N; start += spacing)
{
FFTElement xTop = m_X[start];
FFTElement xBot = m_X[start+span];
double wRe = 1.0;
double wIm = 0.0;
// For each butterfly in this stage
for (uint flyCount = 0; flyCount < numFlies; ++flyCount)
{
// Get the top & bottom values
double xTopRe = xTop.re;
double xTopIm = xTop.im;
double xBotRe = xBot.re;
double xBotIm = xBot.im;
// Top branch of butterfly has addition
xTop.re = xTopRe + xBotRe;
xTop.im = xTopIm + xBotIm;
// Bottom branch of butterfly has subtraction,
// followed by multiplication by twiddle factor
xBotRe = xTopRe - xBotRe;
xBotIm = xTopIm - xBotIm;
xBot.re = xBotRe*wRe - xBotIm*wIm;
xBot.im = xBotRe*wIm + xBotIm*wRe;
// Advance butterfly to next top & bottom positions
xTop = xTop.next;
xBot = xBot.next;
// Update the twiddle factor, via complex multiply
// by unit vector with the appropriate angle
// (wRe + j wIm) = (wRe + j wIm) x (wMulRe + j wMulIm)
double tRe = wRe;
wRe = wRe*wMulRe - wIm*wMulIm;
wIm = tRe*wMulIm + wIm*wMulRe;
}
}
numFlies >>= 1; // Divide by 2 by right shift
span >>= 1;
spacing >>= 1;
wIndexStep <<= 1; // Multiply by 2 by left shift
}
// The algorithm leaves the result in a scrambled order.
// Unscramble while copying values from the complex
// linked list elements back to the input/output vectors.
x = m_X[0];
while (x != null)
{
uint target = x.revTgt;
xRe[target] = x.re;
xIm[target] = x.im;
x = x.next;
}
}
/**
* Do bit reversal of specified number of places of an int
* For example, 1101 bit-reversed is 1011
*
* @param x Number to be bit-reversed.
* @param numBits Number of bits in the number.
*/
private uint BitReverse(
uint x,
uint numBits)
{
uint y = 0;
for (uint i = 0; i < numBits; i++)
{
y <<= 1;
y |= x & 0x0001;
x >>= 1;
}
return y;
}
}
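One way to sanity-check an FFT such as the class above is to compare it against a direct O(N^2) DFT on a small input. The sketch below is not part of the original post; NaiveDft and Transform are hypothetical names. It uses the same conventions the FFT2 code appears to use: a negative exponent and no scaling for the forward transform, and a 1/N scale on the inverse.

```csharp
using System;

public static class NaiveDft
{
    // Direct evaluation of the DFT sum. Slow, but simple enough to trust
    // as a reference for small sizes.
    public static void Transform(double[] re, double[] im, bool inverse,
                                 out double[] outRe, out double[] outIm)
    {
        int n = re.Length;
        outRe = new double[n];
        outIm = new double[n];
        double sign = inverse ? 1.0 : -1.0;
        double scale = inverse ? 1.0 / n : 1.0;
        for (int k = 0; k < n; k++)
        {
            double sumRe = 0.0;
            double sumIm = 0.0;
            for (int t = 0; t < n; t++)
            {
                // Reduce k*t modulo n so the angle stays small and accurate.
                long index = ((long)k * t) % n;
                double angle = sign * 2.0 * Math.PI * index / n;
                double c = Math.Cos(angle);
                double s = Math.Sin(angle);
                // (re + j im) * (c + j s)
                sumRe += re[t] * c - im[t] * s;
                sumIm += re[t] * s + im[t] * c;
            }
            outRe[k] = scale * sumRe;
            outIm[k] = scale * sumIm;
        }
    }
}
```

To compare, copy the input first (run works in place), then call init(4) and run(xRe, xIm) on an FFT2 instance for a 16-point signal and check that each bin matches the NaiveDft output to within rounding error. With this unscaled forward convention, a real cosine of amplitude 1 that falls exactly on bin k shows up as N/2 in bins k and N-k.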
An old question but it still shows up in Google results...
A very unrestrictive MIT-licensed C# / .NET library can be found at:
https://www.codeproject.com/articles/1107480/dsplib-fft-dft-fourier-transform-library-for-net
This library is fast because it runs in parallel threads across multiple cores, and it is very complete and ready to use.
Here's another; a C# port of the Ooura FFT. It's reasonably fast. The package also includes overlap/add convolution and some other DSP stuff, under the MIT license.
https://github.com/hughpyle/inguz-DSPUtil/blob/master/Fourier.cs
http://www.exocortex.org/dsp/ is an open-source C# mathematics library with FFT algorithms.
The Numerical Recipes website (http://www.nr.com/) has an FFT if you don't mind typing it in. I am working on a project converting a LabVIEW program to C# 2008, .NET 3.5 to acquire data and then look at the frequency spectrum. Unfortunately Math.NET requires the latest .NET Framework, so I couldn't use that FFT. I tried the Exocortex one; it worked, but the results did not match the LabVIEW results, and I don't know enough FFT theory to know what is causing the problem. So I tried the FFT on the Numerical Recipes website and it worked! I was also able to program the LabVIEW low sidelobe window (and had to introduce a scaling factor).
You can read the chapter of the Numerical Recipes book as a guest on their site, but the book is so useful that I highly recommend purchasing it, even if you do end up using the Math.NET FFT.
For a multi-threaded implementation tuned for Intel processors I'd check out Intel's MKL library. It's not free, but it's affordable (less than $100) and blazing fast. You'd need to call its C DLLs via P/Invoke, though. The Exocortex project stopped development 6 years ago, so I'd be careful using it if this is an important project.