Convert Audio data in IeeeFloat buffer to PCM in buffer - c#

I use NAudio to capture sound input and the input appears as a buffer containing the sound information in IeeeFloat format.
Now that I have this data in the buffer, I want to translate it to PCM at a different sampling rate.
I have already figured out how to convert from IeeeFloat to PCM, and also convert between mono and stereo. Converting the sampling rate is the tough one.
Is there a solution, preferably using NAudio, that can convert the IeeeFloat buffer to a buffer in a PCM format of my choice (including changing the sampling rate)?

If you want to resample while you receive data, then you need to perform input driven resampling. I wrote an article on this a while ago.
NAudio has some helper classes to go from mono to stereo, and float to PCM, but they tend to operate on IWaveProvider or ISampleProvider inputs. Typically, if I just had the samples as a raw block of bytes, I'd write my own simple code to go from float to PCM and double up the samples. It's not that hard to do, and the WaveBuffer class will allow you to read float samples directly from a byte[].

I recently had to do this and couldn't find a built in way to do it, so I did just what Mark is talking about, converting the raw data manually. Below is code to downsample IeeeFloat (32 bit float samples), 48000 samples/second, 2 channels to 16 bit short, 16000 samples/second, 1 channel.
I hardcoded some things because my formats were known and fixed, but the same principles apply.
private void DownsampleFile()
{
var file = {your file}
using (var reader = new NAudio.Wave.WaveFileReader(file.FullName))
using (var writer = new NAudio.Wave.WaveFileWriter({your output file}, MyWaveFormat))
{
float[] floats;
//a variable to flag the mod 3-ness of the current sample
//we're mapping 48000 --> 16000, so we need to average 3 source
//samples to make 1 output sample
var arity = -1;
var runningSamples = new short[3];
while ((floats = reader.ReadNextSampleFrame()) != null)
{
//simple average to collapse 2 channels into 1
float mono = (float)((double)floats[0] + (double)floats[1]) / 2;
//convert (-1, 1) range float to short
short sixteenbit = (short)(mono * 32767);
//the input is 48000Hz and the output is 16000Hz, so we need 1/3rd of the data points
//so save up 3 running samples and then mix and write to the file
arity = (arity + 1) % 3;
runningSamples[arity] = sixteenbit;
//on the third of 3 running samples
if (arity == 2)
{
//simple average of the 3 and put in the 0th position
runningSamples[0] = (short)(((int)runningSamples[0] + (int)runningSamples[1] + (int)runningSamples[2]) / 3);
//write the one 16 bit short to the output
writer.WriteData(runningSamples, 0, 1);
}
}
}
}
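For prototyping away from NAudio, the same idea (average the two channels, then average every 3 mono samples) can be sketched in Python on a raw byte buffer. This is only a sketch under the same fixed-format assumptions as the code above: interleaved little-endian 32-bit float stereo at 48000 Hz in, 16-bit mono at 16000 Hz out. The function name is made up for illustration. Note that a plain 3-sample average is a crude low-pass filter, so some aliasing is to be expected compared with a proper resampler.

```python
import struct

def downsample_float_stereo_to_pcm16_mono(raw: bytes) -> bytes:
    """48 kHz stereo IEEE float -> 16 kHz mono 16-bit PCM (hypothetical helper)."""
    n_floats = len(raw) // 4
    samples = struct.unpack("<%df" % n_floats, raw[: n_floats * 4])

    # Collapse each left/right pair into one mono sample.
    mono = [(samples[i] + samples[i + 1]) / 2.0
            for i in range(0, n_floats - 1, 2)]

    out = []
    # 48000 -> 16000 means one output sample per 3 input samples.
    for i in range(0, len(mono) - 2, 3):
        avg = (mono[i] + mono[i + 1] + mono[i + 2]) / 3.0
        avg = max(-1.0, min(1.0, avg))   # clamp so the cast cannot overflow
        out.append(int(avg * 32767))     # scale the [-1, 1] float to a short

    return struct.pack("<%dh" % len(out), *out)
```

Any incomplete trailing group (fewer than 3 mono samples) is dropped here; for streaming input you would carry it over to the next buffer instead.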

Related

How can I create narrow band noise in C#?

My problem is very specific. I want to create narrow band noise for my small WPF application. I am using the NAudio library to create an infinite stream of noise that can be stopped and started by the user. I have created Tone (a simple sine wave), Warble (a sine wave that is modulated by another wave) and White noise.
This is the class I use in order to make any ISampleProvider that is stereo be able to give sound to left, right or both depending what user wants.
using NAudio.Wave;
using System;
namespace AppForSoundCard
{
public class SignalStereoProvider : ISampleProvider
{
private readonly ISampleProvider sample;
public WaveFormat WaveFormat => sample.WaveFormat;
public float LeftVolume { get; set; }
public float RightVolume { get; set; }
public SignalStereoProvider(ISampleProvider sample)
{
if (sample.WaveFormat.Channels != 2)
throw new ArgumentException("Source sample provider must be stereo");
this.sample = sample;
}
public int Read(float[] buffer, int offset, int count)
{
int samplesRead = sample.Read(buffer, offset, count);
for (int n = 0; n < count; n += 2)
{
buffer[offset + n] *= LeftVolume;
buffer[offset + n + 1] *= RightVolume;
}
return samplesRead;
}
}
}
I use this code to generate the audio stream that I mentioned. You will ask what NarrowBandProvider32 is. It is the class that is supposed to generate narrowband noise from white noise. I will show its code after the next paragraph.
signalGenerator = new SignalGenerator();
signalGenerator.Type = SignalGeneratorType.White;
signalGenerator.Gain = 1.0;
signalGenerator.Frequency = Frequency;
narrowBand = new NarrowBandProvider32(signalGenerator, Frequency, 96000, 100, dB);
stereoProvider = new SignalStereoProvider(narrowBand)
{
RightVolume = !((ComboBoxItem)RoutingCombobox.SelectedItem).Tag.ToString().Equals("Left") ? (float)Math.Pow(10, (dB - 80) / 20.0) * (float)Settings.Default["ReferenceAmplitudeFor" + Frequency.ToString()] : 0.0f,
LeftVolume = !((ComboBoxItem)RoutingCombobox.SelectedItem).Tag.ToString().Equals("Right") ? (float)Math.Pow(10, (dB - 80) / 20.0) * (float)Settings.Default["ReferenceAmplitudeFor" + Frequency.ToString()] : 0.0f
};
Output.Init(stereoProvider);
I first generate white noise using NAudio's SignalGenerator class. Then I pass it to the NarrowBandProvider32 that I wrote, which is supposed to make the white noise narrowband. After all that, I make the sound go left, right or both.
The left and right volumes are the amplitude values for the decibel value given by the user. There is a combobox for routing in which you can choose left, right or bilateral. Depending on your choice, the left volume is set to the amplitude, the right volume is set to the amplitude, or both are.
I have visited several sites about how narrowband noise can be generated. They all suggested bandpass filtering white noise to generate narrow band noise. I tried that. It sort of did what I wanted, but the result was narrower than I wanted. You can find the frequency response of the noise that I generated for 500 Hz.
Here is the NarrowBandProvider32 class code for that noise:
using NAudio.Dsp;
using NAudio.Wave;
using System;
namespace AppForSoundCard
{
class NarrowBandProvider32 : ISampleProvider
{
ISampleProvider sample;
float lowFreq;
float highFreq;
BiQuadFilter biQuad;
public WaveFormat WaveFormat => sample.WaveFormat;
public NarrowBandProvider32(ISampleProvider sample, float frequency, float sampleRate, float q, float dB)
{
if (sample.WaveFormat.Channels != 2)
throw new ArgumentException("Source sample provider must be stereo");
this.sample = sample;
//Low and High frequency variables are defined like this in audiometry.
//these variables are the boundaries for narrowband noise
lowFreq = (float)Math.Round(frequency / Math.Pow(2, 1.0 / 4.0));
highFreq = (float)Math.Round(frequency * Math.Pow(2, 1.0 / 4.0));
biQuad = BiQuadFilter.BandPassFilterConstantSkirtGain(sampleRate, frequency, q);
biQuad.SetHighPassFilter(sampleRate, lowFreq, q);
biQuad.SetLowPassFilter(sampleRate, highFreq, q);
}
public int Read(float[] buffer, int offset, int count)
{
int samplesRead = sample.Read(buffer, offset, count);
for (int i = 0; i < samplesRead; i++)
buffer[offset + i] = biQuad.Transform(buffer[offset + i]);
return samplesRead;
}
}
}
These are the arguments I passed:
narrowBand = new NarrowBandProvider32(signalGenerator, Frequency, 96000, 100, dB);
As I said, this noise is close to the narrowband noise that is defined in audiometry, but it is narrower. Narrowband noise for 500 Hz in audiometry has this frequency response.
As you can see, it is wider than the noise that I generated. How can I generate narrowband noise that is close to the narrowband noise in audiometry for any frequency? I only gave 500 Hz examples in the images, but my code can generate noise between 150 Hz and 8000 Hz. What filter should I use on white noise in order to generate that type of narrowband noise? Any help is appreciated.
Edit:
I found a standard which explains how narrowband noise should look for any frequency and decibel level.
Where narrow-band masking is required, the noise band shall be centred geometrically
around the test frequency. The band limits for the masking noise are given in Table 4.
Outside these band limits the sound pressure spectrum density level of the noise shall fall at
a rate of at least 12 dB per octave for at least three octaves and outside these three octaves it
shall be at least 36 dB below the level at the centre frequency. Measurements are required in
the range from 31,5 Hz to 10 kHz for instruments limited to 8 kHz. For EHF instruments
measurements are required up to 20 kHz.
Due to limitations of transducers, ear simulators, acoustic couplers and mechanical couplers,
measurements of the bandwidth at 4 kHz and above may not accurately describe the
spectrum of the masking noise. Therefore at centre frequencies above 3,15 kHz
measurements shall be made electrically across the transducer terminals.
With that definition, I guess a standard bandpass filter alone wouldn't work and I have to define a custom filter for the noise. Is there a C# library that allows defining custom filters? If there is, how should I define the custom filter in order to generate noise that meets that standard?
They were all suggesting bandpass filtering a white noise to generate narrow band noise. I tried that. It sort of did what I wanted but it was narrower than I wanted.
The approach of applying a bandpass filter to a white noise source makes sense. The problem is just that the bandpass filter design is too narrow. You can make it wider by reducing the q, moving the lowFreq and highFreq a bit outward, or switching to a different filter design method.
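To get a feel for how far off q = 100 is, here is a small sketch that computes the band edges the question uses (frequency divided and multiplied by 2^(1/4)) and the Q they correspond to. It assumes the common definition Q = center frequency / bandwidth; whether NAudio's BiQuadFilter interprets its q parameter exactly this way is an assumption on my part, and the function names are made up for illustration.

```python
def band_edges(f0: float, octave_fraction: float = 0.5):
    """Edges of a band centred geometrically on f0.

    octave_fraction=0.5 gives f0 / 2**(1/4) and f0 * 2**(1/4),
    the limits used in the question's code.
    """
    half = octave_fraction / 2.0
    return f0 / 2.0 ** half, f0 * 2.0 ** half

def q_for_band(f0: float, low: float, high: float) -> float:
    """Q under the usual definition Q = f0 / bandwidth (an assumption here)."""
    return f0 / (high - low)

low, high = band_edges(500.0)
print(round(low, 1), round(high, 1))        # band limits in Hz
print(round(q_for_band(500.0, low, high), 2))
print(500.0 / 100.0)                        # bandwidth in Hz implied by q = 100
```

Under that definition, the band in the question corresponds to a Q of about 2.87, while q = 100 at 500 Hz implies a band only about 5 Hz wide, which fits the observation that the generated noise is much too narrow.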
I suggest that rather than coding directly in C#, it might be useful to prototype this first in Python using the scipy.signal library, which has various tools for designing and working with filters.
In the code below, I vary the c parameter to tweak the low and high edges of the band.
Code:
# Copyright 2022 Google LLC.
# SPDX-License-Identifier: Apache-2.0
import matplotlib.pyplot as plt
import numpy as np
import scipy.signal as sig
fs = 96000 # Sample rate.
f0 = 500 # Center frequency in Hz.
# Generate noise with a few different bandwidths.
for c in [1.03, 1.07, 1.15]:
    # Design a second-order Butterworth bandpass filter.
    sos = sig.butter(2, [f0 / c, f0 * c], 'bandpass', output='sos', fs=fs)
    # Generate white noise.
    white_noise = np.random.randn(fs)
    # Run it through the filter.
    output = sig.sosfilt(sos, white_noise)
    # Use Welch's method to estimate the PSD of the filtered noise.
    f, psd = sig.welch(output, fs, nperseg=4096)
    plt.semilogx(f, 10 * np.log10(psd), label=f'c = {c}')
plt.axvline(x=f0, color='k')
plt.xlim(50, fs/2)
plt.ylim(-140, -40)
plt.xlabel('Frequency (Hz)', fontsize=15)
plt.ylabel('PSD (dB)', fontsize=15)
plt.legend()
plt.show()
Output:

Can someone explain how this class (which generates a sine wave frequency that can be changed) works?

This is the code that I found online somewhere. It works quite well, but I don't fully understand how it converts a bunch of math into an audio wave:
public static void Beeps(int Amplitude, int Frequency, int Duration)
{
double A = ((Amplitude * (System.Math.Pow(2, 15))) / 1000) - 1;
double DeltaFT = 2 * Math.PI * Frequency / 44100.0;
int Samples = 441 * Duration / 10;
int Bytes = Samples * 4;
int[] Hdr =
{ 0X46464952, 36 + Bytes, 0X45564157,
0X20746D66, 16, 0X20001, 44100, 176400, 0X100004,
0X61746164, Bytes };
using (MemoryStream MS = new MemoryStream(44 + Bytes))
{
using (BinaryWriter BW = new BinaryWriter(MS))
{
for (int I = 0; I < Hdr.Length; I++)
{
BW.Write(Hdr[I]);
}
for (int T = 0; T < Samples; T++)
{
short Sample = System.Convert.ToInt16(A * Math.Sin(DeltaFT * T));
BW.Write(Sample);
BW.Write(Sample);
}
BW.Flush();
MS.Seek(0, SeekOrigin.Begin);
using (SoundPlayer SP = new SoundPlayer(MS))
{
SP.PlaySync();
}
}
}
}
It looks like all it does is beep at certain pitches. The reason math converts into sound is because when the data is fed to your speaker, it's really bytes telling it how to vibrate during that instant.
If you're asking about how sound works, it's based on how vibrations move through the air. Vibrations exist as waves; they literally are shaking the air in certain patterns that your brain interprets as noise through your ears. If the sound has a higher pitch, the soundwaves are closer to each other, and if it's a lower pitch, they're further apart. This is why a computer can "convert a bunch of math into an audio wave", because that's all sound really is: a constantly manipulated wave. That method takes a frequency (Frequency) and creates a sine wave based on it, converts it to bytes, and feeds it to your speaker with a certain volume (Amplitude) and for a certain duration. Cool stuff right?
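To make the "math into samples" step concrete, here is a rough Python sketch of what the inner loop of Beeps computes: one integer sample per time step, taken from a sine wave. The function name and the example values are mine, not from the original code.

```python
import math

def sine_samples(amplitude: float, frequency: float,
                 sample_rate: float, count: int) -> list:
    """Sample amplitude * sin(2*pi*frequency*t/sample_rate) for t = 0..count-1."""
    delta = 2 * math.pi * frequency / sample_rate   # phase step, like DeltaFT
    return [round(amplitude * math.sin(delta * t)) for t in range(count)]

# A tone at one quarter of the sample rate advances a quarter cycle per sample.
print(sine_samples(1000, 11025, 44100, 8))
```

With the tone at a quarter of the sample rate, the samples simply cycle through zero, the positive peak, zero and the negative peak, which is the wave the speaker is asked to trace out.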
Also, you're looking at a "method", not a class. :)
Here's more about sound if you're interested: https://en.wikipedia.org/wiki/Sound#Sound_wave_properties_and_characteristics
This answer has a good overview of how wav files work:
Simply sample the waveform at fixed intervals, and write the amplitude at each interval into your file.
That's what the BW.Write calls are doing. T represents the Time.
In order to play the sound, that data goes after the Hdr section, which is simply the correct header for a standard .wav file. 0X46464952 is ascii for "RIFF" and 0X45564157 is "WAVE". The player needs to know what rate the wave was sampled at. In this case it's 44100, which is a common standard.
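As a sketch of what those Hdr integers encode, the snippet below builds the same 44-byte header field by field with Python's struct module and compares it with the eleven little-endian ints from the question. The field layout follows the standard PCM WAV header; the function name is made up for illustration.

```python
import struct

def wav_header(data_bytes: int, sample_rate: int = 44100,
               channels: int = 2, bits: int = 16) -> bytes:
    """Build a canonical 44-byte PCM WAV header (hypothetical helper)."""
    block_align = channels * bits // 8          # bytes per sample frame
    byte_rate = sample_rate * block_align       # bytes per second
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_bytes, b"WAVE",      # RIFF chunk
        b"fmt ", 16,                            # fmt chunk id and size
        1, channels,                            # 1 = PCM; packed together as 0x20001
        sample_rate, byte_rate,                 # 44100 and 176400
        block_align, bits,                      # packed together as 0x100004
        b"data", data_bytes)                    # data chunk id and size

data_bytes = 400
hdr_ints = [0x46464952, 36 + data_bytes, 0x45564157,
            0x20746D66, 16, 0x20001, 44100, 176400, 0x100004,
            0x61746164, data_bytes]
print(wav_header(data_bytes) == struct.pack("<11i", *hdr_ints))
```

So 0X20001 is really two 16-bit fields (format 1 = PCM, 2 channels) and 0X100004 is block align 4 and 16 bits per sample, which is why each Sample is written twice (once per channel) and Bytes is Samples * 4.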

Reading Geo tiff Latitude and Longitude [duplicate]

I have acquired Digital Elevation Maps(Height Map of Earth) of some area. My aim was to create Realistic Terrains.
Terrain Generation is no problem. I have practiced that using VC# & XNA framework.
The problem is that those height map files are in GeoTIFF format, which I don't know how to read. I also have no previous experience reading image files, so I can't experiment with the few tips available on the internet about reading GeoTIFF files. So far I have been unsuccessful.
The GeoTIFF files I have are 3601 x 3601 files.
Each file has two versions: a decimal-valued file and a num-valued file.
Each file has data for every arc-second of longitude and latitude, along with the height map, i.e. lon, lat, height above sea level.
How do I read these files?
The files I have are from ASTER G-DEM Version 2 (link to official description). According to them, GeoTIFF is pretty standard, which seems right because some GeoTIFF visualizers I downloaded show me the correct data.
I am gonna be using C#. I would appreciate if we talk in relation to this language.
E D I T
Okay, I got libtiff and this is what I have done:
using (Tiff tiff = Tiff.Open(@"Test\N41E071_dem.tif", "r"))
{
int width = tiff.GetField(TiffTag.IMAGEWIDTH)[0].ToInt();
int height = tiff.GetField(TiffTag.IMAGELENGTH)[0].ToInt();
double dpiX = tiff.GetField(TiffTag.XRESOLUTION)[0].ToDouble();
double dpiY = tiff.GetField(TiffTag.YRESOLUTION)[0].ToDouble();
byte[] scanline = new byte[tiff.ScanlineSize()];
ushort[] scanline16Bit = new ushort[tiff.ScanlineSize() / 2];
for (int i = 0; i < height; i++)
{
tiff.ReadScanline(scanline, i); //Loading ith Line
MultiplyScanLineAs16BitSamples(scanline, scanline16Bit, 16,i);
}
}
private static void MultiplyScanLineAs16BitSamples(byte[] scanline, ushort[] temp, ushort factor,int row)
{
if (scanline.Length % 2 != 0)
{
// each two bytes define one sample so there should be even number of bytes
throw new ArgumentException();
}
Buffer.BlockCopy(scanline, 0, temp, 0, scanline.Length);
for (int i = 0; i < temp.Length; i++)
{
temp[i] *= factor;
MessageBox.Show("Row:"+row.ToString()+"Column:"+(i/2).ToString()+"Value:"+temp[i].ToString());
}
}
In the message box I am displaying the corresponding values. Am I doing it right? I am asking because this is my first experience with images and the 8/16 bit problem. I think that, unlike the official libtiff tutorials, I should be using short instead of ushort, because the images I am using are "GeoTIFF, signed 16 bits".
There are some SDKs out there usable from C# to read GeoTIFF files:
http://www.bluemarblegeo.com/global-mapper/developer/developer.php#details (commercial)
http://bitmiracle.com/libtiff/ (free)
http://trac.osgeo.org/gdal/wiki/GdalOgrInCsharp (free?)
UPDATE:
The spec for GeoTIFF can be found here - to me it seems that GeoTIFFs can contain different "subtypes" of information which in turn need to be interpreted appropriately...
Here's a guy that did it without GDAL: http://build-failed.blogspot.com.au/2014/12/processing-geotiff-files-in-net-without.html
GDAL is available in NuGet, though.
If the GeoTIFF contains tiles, you need a different approach. This is how to read a GeoTiff that contains 32bit floats with height data:
int buffersize = 1000000;
using (Tiff tiff = Tiff.Open(geotifffile, "r"))
{
int nooftiles = tiff.GetField(TiffTag.TILEBYTECOUNTS).Length;
int width = tiff.GetField(TiffTag.TILEWIDTH)[0].ToInt();
int height = tiff.GetField(TiffTag.TILELENGTH)[0].ToInt();
byte[] buffer = new byte[buffersize];
for (int i = 0; i < nooftiles; i++)
{
int size = tiff.ReadEncodedTile(i, buffer, 0, buffersize);
float[,] data = new float[width, height];
Buffer.BlockCopy(buffer, 0, data, 0, size); // Convert byte array to x,y array of floats (height data)
// Do whatever you want with the height data (calculate hillshade images etc.)
}
}
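For the latitude/longitude part of the question, GeoTIFF files commonly carry a ModelTiepointTag (tag 33922: I, J, K, X, Y, Z) and a ModelPixelScaleTag (tag 33550: ScaleX, ScaleY, ScaleZ). Assuming your file uses that tiepoint-plus-scale form in a geographic coordinate system (files can use a full transformation matrix instead), the mapping from pixel to coordinates is simple arithmetic. The sketch below is in Python for brevity; the function name and the example tag values are made up, so read the real values from your file.

```python
def pixel_to_lon_lat(col: float, row: float, tiepoint, pixel_scale):
    """Map raster (col, row) to (lon, lat) from tiepoint + pixel scale tags.

    tiepoint    = (I, J, K, X, Y, Z): raster point (I, J) maps to model (X, Y)
    pixel_scale = (ScaleX, ScaleY, ScaleZ): model units per pixel
    """
    i, j, _k, x, y, _z = tiepoint
    scale_x, scale_y, _scale_z = pixel_scale
    lon = x + (col - i) * scale_x
    lat = y - (row - j) * scale_y     # rows grow downward, latitude grows upward
    return lon, lat

# Illustrative values only (1 arc-second grid, upper-left at 71E 42N).
tiepoint = (0.0, 0.0, 0.0, 71.0, 42.0, 0.0)
pixel_scale = (1.0 / 3600.0, 1.0 / 3600.0, 0.0)
print(pixel_to_lon_lat(1800, 1800, tiepoint, pixel_scale))
```

Whether (col, row) refers to a pixel corner or a pixel centre depends on the file's raster type (PixelIsArea or PixelIsPoint), so expect a possible half-pixel offset.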

Read binary code transmitted via radio in WAV recording

I have some WAV files that were recorded from a radio transmission. They contain information about who sent the transmission, and I want to be able to read this information.
The information is transmitted by sending x Hz for a 0 and y Hz for a 1 (more about AFSK on Wikipedia).
My problem is: how do I get the binary data out of the wave file? Ready-made controls for C# would be nice, but some source code for better understanding would be even better.
Any ideas?
The WAV file specification is your blueprint for reading the sound data from the WAV file. Sample code for reading and manipulating WAV files can be found in this CodeProject article.
To achieve the tone mapping, you can read this article, which describes how to write software to transfer data between two sound cards. For example, to find out how much of a given frequency is present in a particular segment of the WAV file, you would use a Fourier Transform.
Something like this:
double fourier1(double x_in[], double n, int length) {
double x_complex[2] = { 0, 0 };
int i;
for(i = 0; i < length; i++)
{
x_complex[0] += x_in[i] * cos(M_PI * 2 * i * n / (double) length);
x_complex[1] += x_in[i] * sin(M_PI * 2 * i * n / (double) length);
}
return sqrt(x_complex[0]*x_complex[0] + x_complex[1]*x_complex[1]) / (double) length;
}
Where x_in is a series of numbers between -1 and 1, and n is the modified frequency:
(length * frequency / rate)
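To tie this back to the AFSK question, here is a Python sketch that applies the same single-frequency transform to one segment of samples and picks whichever of the two tones is stronger. The tone frequencies (1200 Hz and 2200 Hz), the sample rate and the helper names are assumptions for illustration; use the x and y frequencies and bit length of your actual transmission.

```python
import math

def tone_magnitude(samples, frequency: float, rate: float) -> float:
    """Magnitude of one frequency in a segment (same math as fourier1)."""
    length = len(samples)
    n = length * frequency / rate               # the "modified frequency"
    re = sum(x * math.cos(2 * math.pi * i * n / length)
             for i, x in enumerate(samples))
    im = sum(x * math.sin(2 * math.pi * i * n / length)
             for i, x in enumerate(samples))
    return math.hypot(re, im) / length

def decode_bit(samples, f_zero: float, f_one: float, rate: float) -> int:
    """Return 1 if the 'one' tone dominates the segment, else 0."""
    one = tone_magnitude(samples, f_one, rate)
    zero = tone_magnitude(samples, f_zero, rate)
    return 1 if one > zero else 0

def make_tone(frequency: float, rate: float, count: int):
    """Test signal: a unit-amplitude sine wave."""
    return [math.sin(2 * math.pi * frequency * i / rate) for i in range(count)]

rate = 48000
print(decode_bit(make_tone(2200, rate, 480), 1200, 2200, rate))
print(decode_bit(make_tone(1200, rate, 480), 1200, 2200, rate))
```

A real decoder would also need to find where each bit starts and cope with noise, which this sketch does not attempt.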

FFT Inaccuracy for C#

I've been experimenting with the FFT algorithm. I use NAudio along with working FFT code from the internet. Based on my observations, the resulting pitch is inaccurate.
What happens is that I have a MIDI file (generated from GuitarPro) converted to a WAV file (44.1 kHz, 16-bit, mono) that contains a pitch progression starting from E2 (the lowest guitar note) up to about E6. For the lower notes (around E2-B3) the result is generally very wrong. From C4 upward it's somewhat correct, in that you can already see the proper progression (the next note is C#4, then D4, etc.). However, the problem there is that the pitch detected is a half-note lower than the actual pitch (e.g. C4 should be the note but D#4 is displayed).
What do you think may be wrong? I can post the code if necessary. Thanks very much! I'm still beginning to grasp the field of DSP.
Edit: Here is a rough sketch of what I'm doing
byte[] buffer = new byte[8192];
int bytesRead;
do
{
bytesRead = stream16.Read(buffer, 0, buffer.Length);
} while (bytesRead != 0);
And then: (waveBuffer is simply a class that is there to convert the byte[] into float[] since the function only accepts float[])
public int Read(byte[] buffer, int offset, int bytesRead)
{
int frames = bytesRead / sizeof(float);
float pitch = DetectPitch(waveBuffer.FloatBuffer, frames);
}
And lastly: (SmbPitchShift is the class that has the FFT algorithm ... I believe there's nothing wrong with it, so I'm not posting it here)
private float DetectPitch(float[] buffer, int inFrames)
{
Func<int, int, float> window = HammingWindow;
if (prevBuffer == null)
{
prevBuffer = new float[inFrames]; //only contains zeroes
}
// double frames since we are combining present and previous buffers
int frames = inFrames * 2;
if (fftBuffer == null)
{
fftBuffer = new float[frames * 2]; // times 2 because it is complex input
}
for (int n = 0; n < frames; n++)
{
if (n < inFrames)
{
fftBuffer[n * 2] = prevBuffer[n] * window(n, frames);
fftBuffer[n * 2 + 1] = 0; // need to clear out as fft modifies buffer
}
else
{
fftBuffer[n * 2] = buffer[n - inFrames] * window(n, frames);
fftBuffer[n * 2 + 1] = 0; // need to clear out as fft modifies buffer
}
}
SmbPitchShift.smbFft(fftBuffer, frames, -1);
}
And for interpreting the result:
float binSize = (float)sampleRate / frames; // cast avoids integer division if sampleRate is an int
int minBin = (int)(82.407 / binSize); //lowest E string on the guitar
int maxBin = (int)(1244.508 / binSize); //highest E string on the guitar
float maxIntensity = 0f;
int maxBinIndex = 0;
for (int bin = minBin; bin <= maxBin; bin++)
{
float real = fftBuffer[bin * 2];
float imaginary = fftBuffer[bin * 2 + 1];
float intensity = real * real + imaginary * imaginary;
if (intensity > maxIntensity)
{
maxIntensity = intensity;
maxBinIndex = bin;
}
}
return binSize * maxBinIndex;
UPDATE (if anyone is still interested):
So, one of the answers below stated that the frequency peak from the FFT is not always equivalent to pitch. I understand that. But I wanted to check for myself whether that was the case here (on the assumption that there are times when the frequency peak IS the resulting pitch). So basically, I got two programs (SpectraPLUS and FFTProperties by DewResearch; credits to them) that can display the frequency domain of audio signals.
So here are the results of the frequency peaks in the time domain:
SpectraPLUS
and FFT Properties:
This was done using a test note of A2 (around 110 Hz). Looking at the images, they show frequency peaks in the range of 102-112 Hz for SpectraPLUS and at 108 Hz for FFT Properties. In my code, I get 104 Hz (I use blocks of 8192 and a sample rate of 44.1 kHz ... 8192 is then doubled to make it complex input, so in the end I get a bin size of around 5 Hz, compared to the 10 Hz bin size of SpectraPLUS).
So now I'm a bit confused, since the other programs seem to return the correct result, but my code always gives 104 Hz (note that I have compared the FFT function that I used with others such as Math.Net and it seems to be correct).
Do you think that the problem may be with my interpretation of the data? Or do those programs do something else before displaying the frequency spectrum? Thanks!
It sounds like you may have an interpretation problem with your FFT output. A few random points:
the FFT has a finite resolution - each output bin has a resolution of Fs / N, where Fs is the sample rate and N is the size of the FFT
for notes which are low on the musical scale, the difference in frequency between successive notes is relatively small, so you will need a sufficiently large N to discriminate between notes which are a semitone apart (see note 1 below)
the first bin (index 0) contains energy centered at 0 Hz but includes energy from +/- Fs / 2N
bin i contains energy centered at i * Fs / N but includes energy from +/- Fs / 2N either side of this center frequency
you will get spectral leakage from adjacent bins - how bad this is depends on what window function you use - no window (== rectangular window) and spectral leakage will be very bad (very broad peaks) - for frequency estimation you want to pick a window function that gives you sharp peaks
pitch is not the same thing as frequency - pitch is a percept, frequency is a physical quantity - the perceived pitch of a musical instrument may be slightly different from the fundamental frequency, depending on the type of instrument (some instruments do not even produce significant energy at their fundamental frequency, yet we still perceive their pitch as if the fundamental were present)
My best guess from the limited information available though is that perhaps you are "off by one" somewhere in your conversion of bin index to frequency, or perhaps your FFT is too small to give you sufficient resolution for the low notes, and you may need to increase N.
You can also improve your pitch estimation via several techniques, such as cepstral analysis, or by looking at the phase component of your FFT output and comparing it for successive FFTs (this allows for a more accurate frequency estimate within a bin for a given FFT size).
Notes
(1) Just to put some numbers on this, E2 is 82.4 Hz, F2 is 87.3 Hz, so you need a resolution somewhat better than 5 Hz to discriminate between the lowest two notes on a guitar (and much finer than this if you actually want to do, say, accurate tuning). At a 44.1 kHz sample rate you probably need an FFT of at least N = 8192 to give you sufficient resolution (44100 / 8192 = 5.4 Hz); N = 16384 would probably be better.
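The arithmetic in note 1 can be checked with a few lines of Python. This is only a sketch: the helper names are made up, and it uses the standard equal-temperament formula with A4 = 440 Hz (MIDI note 69) to get the note frequencies.

```python
def note_frequency(midi_note: int) -> float:
    """Equal-temperament frequency, A4 (MIDI 69) = 440 Hz."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)

def bin_resolution(sample_rate: float, fft_size: int) -> float:
    """Width of one FFT bin in Hz (Fs / N)."""
    return sample_rate / fft_size

def min_fft_size(sample_rate: float, f_low: float, f_high: float) -> int:
    """Smallest power-of-two N whose bin width is below f_high - f_low."""
    n = 1
    while sample_rate / n >= f_high - f_low:
        n *= 2
    return n

e2, f2 = note_frequency(40), note_frequency(41)   # MIDI 40 = E2, 41 = F2
print(round(e2, 1), round(f2, 1))
print(round(bin_resolution(44100, 8192), 2))
print(min_fft_size(44100, e2, f2))
```

The bin width at N = 8192 is slightly larger than the E2 to F2 gap, which is why the larger size is the safer choice for the lowest notes.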
I thought this might help you. I made some plots of the 6 open strings of a guitar. The code is in Python using pylab, which I recommend for experimenting:
# analyze distorted guitar notes from
# http://www.freesound.org/packsViewSingle.php?id=643
#
# 329.6 E - open 1st string
# 246.9 B - open 2nd string
# 196.0 G - open 3rd string
# 146.8 D - open 4th string
# 110.0 A - open 5th string
# 82.4 E - open 6th string
from pylab import *
import wave
fs = 44100.0
N = 8192 * 10
t = r_[:N] / fs
f = r_[:N/2+1] * fs / N
gtr_fun = [329.6, 246.9, 196.0, 146.8, 110.0, 82.4]
gtr_wav = [wave.open('dist_gtr_{0}.wav'.format(n),'r') for n in r_[1:7]]
gtr = [fromstring(g.readframes(N), dtype='int16') for g in gtr_wav]
gtr_t = [g / float64(max(abs(g))) for g in gtr]
gtr_f = [2 * abs(rfft(g)) / N for g in gtr_t]
def make_plots():
    for n in r_[:len(gtr_t)]:
        fig = figure()
        fig.subplots_adjust(wspace=0.5, hspace=0.5)
        subplot2grid((2,2), (0,0))
        plot(t, gtr_t[n]); axis('tight')
        title('String ' + str(n+1) + ' Waveform')
        subplot2grid((2,2), (0,1))
        plot(f, gtr_f[n]); axis('tight')
        title('String ' + str(n+1) + ' DFT')
        subplot2grid((2,2), (1,0), colspan=2)
        M = int(gtr_fun[n] * 16.5 / fs * N)
        plot(f[:M], gtr_f[n][:M]); axis('tight')
        title('String ' + str(n+1) + ' DFT (16 Harmonics)')
if __name__ == '__main__':
    make_plots()
    show()
String 1, fundamental = 329.6 Hz:
String 2, fundamental = 246.9 Hz:
String 3, fundamental = 196.0 Hz:
String 4, fundamental = 146.8 Hz:
String 5, fundamental = 110.0 Hz:
String 6, fundamental = 82.4 Hz:
The fundamental frequency isn't always the dominant harmonic. It determines the spacing between harmonics of a periodic signal.
I had a similar question, and the answer for me was to use Goertzel instead of FFT. If you know what tones you are looking for (MIDI), Goertzel is capable of detecting the tones to within one sine wave (one cycle). It does this by generating the sine wave of the sound and "placing it on top of the raw data" to see if it exists. FFT samples large amounts of data to provide an approximate frequency spectrum.
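For reference, here is a minimal Python sketch of the standard Goertzel recurrence, which measures the power at one chosen frequency without computing a whole FFT. The function names and the test signal are mine, and this is not necessarily the exact variant used by the answer above.

```python
import math

def goertzel_power(samples, target_freq: float, sample_rate: float) -> float:
    """Squared magnitude of the signal at target_freq (Goertzel recurrence)."""
    w = 2 * math.pi * target_freq / sample_rate
    coeff = 2 * math.cos(w)
    s_prev = 0.0
    s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2 = s_prev
        s_prev = s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def make_tone(frequency: float, sample_rate: float, count: int):
    """Test signal: a unit-amplitude sine wave."""
    return [math.sin(2 * math.pi * frequency * i / sample_rate)
            for i in range(count)]

# 0.1 s of A2 (110 Hz) at 44.1 kHz, probed at A2 and at E2 (82.41 Hz).
tone = make_tone(110.0, 44100, 4410)
print(goertzel_power(tone, 110.0, 44100) > goertzel_power(tone, 82.41, 44100))
```

To detect a note you would run this once per candidate note frequency and pick the largest power, which is cheap when the set of expected tones is small.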
Musical pitch is different from a frequency peak. Pitch is a psycho-perceptual phenomenon that may depend more on the overtones and such. The frequency of what a human would call the pitch could be missing or quite small in the actual signal spectrum.
And a frequency peak in a spectrum can be different from any FFT bin center. The FFT bin center frequencies will change in frequency and spacing depending only on the FFT length and sample rate, not the spectra in the data.
So you have at least 2 problems with which to contend. There are a ton of academic papers on frequency estimation as well as the separate subject of pitch estimation. Start there.
