Accessing processed values from FFT - c#

I am attempting to use Lomont FFT in order to return complex numbers to build a spectrogram / spectral density chart using c#.
I am having trouble understanding how to return values from the class.
Here is the code I have put together thus far which appears to be working.
int read = 0;
Double[] data;
byte[] buffer = new byte[1024];
FileStream wave = new FileStream(args[0], FileMode.Open, FileAccess.Read);
read = wave.Read(buffer, 0, 44);
read = wave.Read(buffer, 0, 1024);
data = new Double[read];
for (int i = 0; i < read; i+=2)
{
data[i] = BitConverter.ToInt16(buffer, i) / 32768.0;
Console.WriteLine(data[i]);
}
LomontFFT LFFT = new LomontFFT();
LFFT.FFT(data, true);
What I am not clear on is, how to return/access the values from Lomont FFT implementation back into my application (console)?
Being pretty new to c# development, I'm thinking I am perhaps missing a fundamental aspect of understanding regarding how to retrieve processed values from the instance of the Lomont Class, or perhaps even calling it incorrectly.
Console.WriteLine(LFFT.A); // Returns 0
Console.WriteLine(LFFT.B); // Returns 1
I have been searching for a code snippet or explanation of how to do this, but so far have come up with nothing that I understand or explains this particular aspect of the issue I am facing. Any guidance would be greatly appreciated.
A subset of the results held in data array noted in the code above can be found below and based on my current understanding, appear to be valid:
0.00531005859375
0.0238037109375
0.041473388671875
0.0576171875
0.07183837890625
0.083465576171875
0.092193603515625
0.097625732421875
0.099639892578125
0.098114013671875
0.0931396484375
0.0848388671875
0.07354736328125
0.05963134765625
0.043609619140625
0.026031494140625
0.007476806640625
-0.011260986328125
-0.0296630859375
-0.047027587890625
-0.062713623046875
-0.076141357421875
-0.086883544921875
-0.09454345703125
-0.098785400390625
-0.0994873046875
-0.0966796875
-0.090362548828125
-0.080810546875
-0.06842041015625
-0.05352783203125
-0.036712646484375
-0.0185546875
What am I actually attempting to do? (perspective)
I am looking to load a wave file into a console application and return a spectrogram/spectral density chart/image as a jpg/png for further processing.
The wave files I am reading are mono in format
UPDATE 1
I Receive slightly different results depending on which FFT is used.
Using RealFFT
for (int i = 0; i < read; i+=2)
{
data[i] = BitConverter.ToInt16(buffer, i) / 32768.0;
//Console.WriteLine(data[i]);
}
LomontFFT LFFT = new LomontFFT();
LFFT.RealFFT(data, true);
for (int i = 0; i < buffer.Length / 2; i++)
{
System.Console.WriteLine("{0}",
Math.Sqrt(data[2 * i] * data[2 * i] + data[2 * i + 1] * data[2 * i + 1]));
}
Partial Result of RealFFT
0.314566983321381
0.625242818210924
0.30314888696868
0.118468857708093
0.0587697011760449
0.0369034115568654
0.0265842582236275
0.0207195964060356
0.0169601273233317
0.0143745438577886
0.012528799609089
0.0111831275153128
0.0102313284519146
0.00960198279358434
0.00920236001619566
Using FFT
for (int i = 0; i < read; i+=2)
{
data[i] = BitConverter.ToInt16(buffer, i) / 32768.0;
//Console.WriteLine(data[i]);
}
double[] bufferB = new double[2 * data.Length];
for (int i = 0; i < data.Length; i++)
{
bufferB[2 * i] = data[i];
bufferB[2 * i + 1] = 0;
}
LomontFFT LFFT = new LomontFFT();
LFFT.FFT(bufferB, true);
for (int i = 0; i < bufferB.Length / 2; i++)
{
System.Console.WriteLine("{0}",
Math.Sqrt(bufferB[2 * i] * bufferB[2 * i] + bufferB[2 * i + 1] * bufferB[2 * i + 1]));
}
Partial Result of FFT:
0.31456698332138
0.625242818210923
0.303148886968679
0.118468857708092
0.0587697011760447
0.0369034115568653
0.0265842582236274
0.0207195964060355
0.0169601273233317
0.0143745438577886
0.012528799609089
0.0111831275153127
0.0102313284519146
0.00960198279358439
0.00920236001619564

Looking at the LomontFFT.FFT documentation:
Compute the forward or inverse Fourier Transform of data, with
data containing complex valued data as alternating real and
imaginary parts. The length must be a power of 2. The data is
modified in place.
This tells us a few things. First the function is expecting complex-valued data whereas your data is real. A quick fix for this is to create another buffer of twice the size and setting all the imaginary parts to 0:
double[] buffer = new double[2*data.Length];
for (int i=0; i<data.Length; i++)
{
buffer[2*i] = data[i];
buffer[2*i+1] = 0;
}
The documentation also tells us that the computation is done in place. That means that after the call to FFT returns, the input array is replaced with the computed result. You could thus print the spectrum with:
LomontFFT LFFT = new LomontFFT();
LFFT.FFT(buffer, true);
for (int i = 0; i < buffer.Length/2; i++)
{
System.Console.WriteLine("{0}",
Math.Sqrt(buffer[2*i]*buffer[2*i]+buffer[2*i+1]*buffer[2*i+1]));
}
Note since your input data is real valued you could also use LomontFFT.RealFFT. In that case, given a slightly different packing rule, you would obtain the FFT results using:
LomontFFT LFFT = new LomontFFT();
LFFT.RealFFT(data, true);
System.Console.WriteLine("{0}", Math.Abs(data[0]);
for (int i = 1; i < data.Length/2; i++)
{
System.Console.WriteLine("{0}",
Math.Sqrt(data[2*i]*data[2*i]+data[2*i+1]*data[2*i+1]));
}
System.Console.WriteLine("{0}", Math.Abs(data[1]);
This would give you the non-redundant lower half of the spectrum (Unlike LomontFFT.FFT which provides the entire spectrum). Also, numerical differences on the order of double precision (around 1e-16 times the spectrum peak value) with respect to LomontFFT.FFT can be expected.

Related

Average loss stuck on local minima (Self developed Perceptron in C#)

I am trying to build a custom machine learning library in C# , I have researched fairly about the topic. My first example (XOR estimator) was a success, I was able to lower the average loss to almost cero. Then I tried to build a model to classify handwritten digits (using MNIST text database).The problem is ,no matter how I configure the model, I always get stuck on a certain average loss over the data set.Second problem, because the MNIST dataset is very large, the model takes a lot of time to compute, maybe I can use some advice on how to carry on the slowest parts of the algorithm (I am using stochastic gradient descent).I am going to show the main method that performs most of the work.
I have tried using MSE and CrossEntropy loss functions, also tanh , sigmoid , reLu and softPlus activation functions. The model I am trying to build is a 4 layer one. First layer, 784 input neurons ; Second , 16 neurons,sigmoid ; Third , 16 neurons , sigmoid and Output layer, 10 neurons (one hot encoded digits) with sigmoid. I am aware that the code below may not be a minimal reproducible example, but it represents the algorithm I am trying to figure. I have also uploaded the solution to GitHub, maybe somebody can give me a hand figuring the problem. This is the link https://github.com/juan-carvajal/MachineLearningFramework
Running the Main method of the app will first, execute the XOR classifier, that runs good. Then the MNIST classifier.
The model is best represented here:
DataSet dataSet = new DataSet("mnist2.txt", ' ', 10, false);
//This creates a model with batching=128 , learningRate=0.5 and
//CrossEntropy loss function
var p = new Perceptron(128, 0.5, ErrorFunction.CrossEntropy())
.Layer(784, ActivationFunction.Sigmoid())
.Layer(16, ActivationFunction.Sigmoid())
.Layer(16, ActivationFunction.Sigmoid())
.Layer(10, ActivationFunction.Sigmoid());
//1000 is the number of epochs
p.Train2(dataSet, 1000);
Actual Algorithm (stochastic gradiente descent):
Console.WriteLine("Initial Loss:"+ CalculateMeanErrorOverDataSet(dataSet));
for (int i = 0; i < epochs; i++)
{
//Shuffle the data in every step
dataSet.Shuffle();
List<DataRow> batch = dataSet.NextBatch(this.Batching);
//Gets random batch from the dataSet
int count = 0;
foreach (DataRow example in batch)
{
count++;
double[] result = this.FeedForward(example.GetFeatures());
double[] labels = example.GetLabels();
if (result.Length != labels.Length)
{
throw new Exception("Inconsistent array size, Incorrect implementation.");
}
else
{
//What follows is the calculation of the gradient for this example, every example affects the current gradient, then all those changes are averaged an every parameter is updated.
double error = CalculateExampleLost(example);
for (int l = this.Layers.Count - 1; l > 0; l--)
{
if (l == this.Layers.Count - 1)
{
for (int j = 0; j < this.Layers[l].CostDerivatives.Length; j++)
{
this.Layers[l].CostDerivatives[j] = ErrorFunction.GetDerivativeValue(labels[j], this.Layers[l].Activations[j]);
}
}
else
{
for (int j = 0; j < this.Layers[l].CostDerivatives.Length; j++)
{
double acum = 0;
for (int j2 = 0; j2 < Layers[l + 1].Size; j2++)
{
acum += Layers[l + 1].WeightMatrix[j2, j] * this.Layers[l+1].ActivationFunction.GetDerivativeValue(Layers[l + 1].WeightedSum[j2]) * Layers[l + 1].CostDerivatives[j2];
}
this.Layers[l].CostDerivatives[j] = acum;
}
}
for (int j = 0; j < this.Layers[l].Activations.Length; j++)
{
this.Layers[l].BiasVectorChangeRecord[j] += this.Layers[l].ActivationFunction.GetDerivativeValue(Layers[l].WeightedSum[j]) * Layers[l].CostDerivatives[j];
for (int k = 0; k < Layers[l].WeightMatrix.GetLength(1); k++)
{
this.Layers[l].WeightMatrixChangeRecord[j, k] += Layers[l - 1].Activations[k]
* this.Layers[l].ActivationFunction.GetDerivativeValue(Layers[l].WeightedSum[j])
* Layers[l].CostDerivatives[j];
}
}
}
}
}
TakeGradientDescentStep(batch.Count);
if ((i + 1) % (epochs / 10) == 0)
{
Console.WriteLine("Epoch " + (i + 1) + ", Avg.Loss:" + CalculateMeanErrorOverDataSet(dataSet));
}
}
This is an example of what the local minima looks like in the current model.
In my research I found out that similar models may archieve accuracy up to 90%. My model barely got 10%.

Tensorflowsharp results getvalue() is very slow

I am using TensorflowSharp to run evaluations using a neural network on an Android phone. I am building the project with Unity.
I am using the tensorflowsharp unity plugin listed under the requirements here: https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Using-TensorFlow-Sharp-in-Unity.md.
Everything is working, however extracting the result is very slow.
The network I am running is an autoencoder and the output is an image with dimensions of 128x128x16 (yes there is a lot of output channels).
The evaluation is done in ~ 0.2 seconds which is acceptable. However when i need to extract the result data using results[0].GetValue() it is VERY slow.
This is my code where i run the neural network
var runner = session.GetRunner();
runner.AddInput(graph[INPUT_NAME][0], tensor).Fetch(graph[OUTPUT_NAME][0]);
var results = runner.Run();
float[,,,] heatmaps = results[0].GetValue() as float[,,,]; // <- this is SLOW
The problem:
The last line where i convert the result to floats is taking ~1.2 seconds.
Can it realy be true that reading the result data into a float array is taking more than 5 times as long as the actual evaluation of the network?
Is there another way to extract the result values?
So I have found a solution to this. I still do not know why the GetValue() call is so slow, but I found another way to retrieve the data.
I chose to manually read the raw tensor data available at results[0].Data
I created a small function to handle this as a drop in for GetValue, (Here just with the dimensions i am expecting hardcoded)
private float[,,,] TensorToFLoats(TFTensor tensor)
{
IntPtr resData = tensor.Data;
UIntPtr dataSize = tensor.TensorByteSize;
byte[] s_ImageBuffer = new byte[(int)dataSize];
System.Runtime.InteropServices.Marshal.Copy(resData, s_ImageBuffer, 0, (int)dataSize);
int floatsLength = s_ImageBuffer.Length / 4;
float[] floats = new float[floatsLength];
for (int n = 0; n < s_ImageBuffer.Length; n += 4)
{
floats[n / 4] = BitConverter.ToSingle(s_ImageBuffer, n);
}
float[,,,] result = new float[1, 128, 128, 16];
int i = 0;
for (int y = 0; y < 128; y++)
{
for (int x = 0; x < 128; x++)
{
for (int p = 0; p < 16; p++)
{
result[0, y, x, p] = floats[i++];
}
}
}
return result;
}
Given this i can replace the code in my question with the following
var runner = session.GetRunner();
runner.AddInput(graph[INPUT_NAME][0], tensor).Fetch(graph[OUTPUT_NAME][0]);
var results = runner.Run();
float[,,,] heatmaps = TensorToFLoats(results[0]);
This is insanely much faster. Where GetValue took ~1 second the TensorToFloats function i created got the same data in ~0.02 seconds

Getting a neural network to output anything inbetween -1.0 and 1.0

I am trying to make a back propagation neural network.
Based upon the the tutorials i found here : MSDN article by James McCaffrey. He gives many examples but all his networks are based upon the same problem to solve. So his networks look like 4:7:3 >> 4input - 7hidden - 3output.
His output is always binary 0 or 1, one output gets a 1, to classify an Irish flower, into one of the three categories.
I would like to solve another problem with a neural network and that would require me 2 neural networks where one needs an output inbetween 0..255 and another inbetween 0 and 2times Pi. (a full turn, circle). Well essentially i think i need an output that range from 0.0 to 1.0 or from -1 to 1 and anything in between, so that i can multiply it to becomme 0..255 or 0..2Pi
I think his network does behave, like it does because of his computeOutputs
Which I show below here :
private double[] ComputeOutputs(double[] xValues)
{
if (xValues.Length != numInput)
throw new Exception("Bad xValues array length");
double[] hSums = new double[numHidden]; // hidden nodes sums scratch array
double[] oSums = new double[numOutput]; // output nodes sums
for (int i = 0; i < xValues.Length; ++i) // copy x-values to inputs
this.inputs[i] = xValues[i];
for (int j = 0; j < numHidden; ++j) // compute i-h sum of weights * inputs
for (int i = 0; i < numInput; ++i)
hSums[j] += this.inputs[i] * this.ihWeights[i][j]; // note +=
for (int i = 0; i < numHidden; ++i) // add biases to input-to-hidden sums
hSums[i] += this.hBiases[i];
for (int i = 0; i < numHidden; ++i) // apply activation
this.hOutputs[i] = HyperTanFunction(hSums[i]); // hard-coded
for (int j = 0; j < numOutput; ++j) // compute h-o sum of weights * hOutputs
for (int i = 0; i < numHidden; ++i)
oSums[j] += hOutputs[i] * hoWeights[i][j];
for (int i = 0; i < numOutput; ++i) // add biases to input-to-hidden sums
oSums[i] += oBiases[i];
double[] softOut = Softmax(oSums); // softmax activation does all outputs at once for efficiency
Array.Copy(softOut, outputs, softOut.Length);
double[] retResult = new double[numOutput]; // could define a GetOutputs method instead
Array.Copy(this.outputs, retResult, retResult.Length);
return retResult;
The network uses the folowing hyperTan function
private static double HyperTanFunction(double x)
{
if (x < -20.0) return -1.0; // approximation is correct to 30 decimals
else if (x > 20.0) return 1.0;
else return Math.Tanh(x);
}
In above a function makes for the output layer use of Softmax() and it is i think critical to problem here. In that I think it makes his output all binary, and it looks like this :
private static double[] Softmax(double[] oSums)
{
// determine max output sum
// does all output nodes at once so scale doesn't have to be re-computed each time
double max = oSums[0];
for (int i = 0; i < oSums.Length; ++i)
if (oSums[i] > max) max = oSums[i];
// determine scaling factor -- sum of exp(each val - max)
double scale = 0.0;
for (int i = 0; i < oSums.Length; ++i)
scale += Math.Exp(oSums[i] - max);
double[] result = new double[oSums.Length];
for (int i = 0; i < oSums.Length; ++i)
result[i] = Math.Exp(oSums[i] - max) / scale;
return result; // now scaled so that xi sum to 1.0
}
How to rewrite softmax ?
So the network will be able to give non binary answers ?
Notice the full code of the network is here. if you would like to try it out.
Also as to test the network the following accuracy function is used, maybe the binary behaviour emerges from it
public double Accuracy(double[][] testData)
{
// percentage correct using winner-takes all
int numCorrect = 0;
int numWrong = 0;
double[] xValues = new double[numInput]; // inputs
double[] tValues = new double[numOutput]; // targets
double[] yValues; // computed Y
for (int i = 0; i < testData.Length; ++i)
{
Array.Copy(testData[i], xValues, numInput); // parse test data into x-values and t-values
Array.Copy(testData[i], numInput, tValues, 0, numOutput);
yValues = this.ComputeOutputs(xValues);
int maxIndex = MaxIndex(yValues); // which cell in yValues has largest value?
int tMaxIndex = MaxIndex(tValues);
if (maxIndex == tMaxIndex)
++numCorrect;
else
++numWrong;
}
return (numCorrect * 1.0) / (double)testData.Length;
}
Just in case that someone gets into the same situation.
If you need some example code of a neural network regression
(a NNR) That's how they are called.
Here is link to sample code in C#, and here is a good article about it. Notice the guy writes more articles there, you wont find everything but there's a lot there. Despite I was following this man for a while I missed this specific article as I didn't know how they where called, when I asked the question here on stack overflow.
I'm a bit rusty at Neural Netowrks but I think, if you want to have a range of values from your output then you need to make sure your activation functions on your output layer are linear (or something that has a similar effect).
Try adding this method:
private static double[] Linear(double[] oSums)
{
double sum = oSums.Sum(d => Math.Abs(d));
double[] result = new double[oSums.Length];
for (int i = 0; i < oSums.Length; ++i)
result[i] = Math.Abs(oSums[i]) / sum;
// scaled so that xi sum to 1.0
return result;
}
And then in the ComputeOutputs method you need to use this new activation function for the output (rather than Softmax):
...
//double[] softOut = Softmax(oSums); // all outputs at once for efficiency
double[] softOut = Linear(oSums); // all outputs at once for efficiency
Array.Copy(softOut, outputs, softOut.Length);
...
This should now output linear values.

EmguCv: Reduce the grayscales

Is there a way to reduce the grayscales of an gray-image in openCv?
Normaly i have grayvalues from 0 to 256 for an
Image<Gray, byte> inputImage.
In my case i just need grayvalues from 0-10. Is there i good way to do that with OpenCV, especially for C# ?
There's nothing built-in on OpenCV that allows this sort of thing.
Nevertheless, you can write something yourself. Take a look at this C++ implementation and just translate it to C#:
void colorReduce(cv::Mat& image, int div=64)
{
int nl = image.rows; // number of lines
int nc = image.cols * image.channels(); // number of elements per line
for (int j = 0; j < nl; j++)
{
// get the address of row j
uchar* data = image.ptr<uchar>(j);
for (int i = 0; i < nc; i++)
{
// process each pixel
data[i] = data[i] / div * div + div / 2;
}
}
}
Just send a grayscale Mat to this function and play with the div parameter.

AccessViolationException with sound buffer conversion

I'm using Naudio AsioOut object to pass data from input buffer to my delayProc() function and then to output buffer.
The delayProc() needs float[] buffer type, and this is possible using e.GetAsInterleavedSamples(). The problem is I need to re-convert it to a multidimensional IntPtr, to do this I'm using AsioSampleConvertor class.
When I try to apply the effect it shows me an error: AccessViolationException on the code of AsioSampleConvertor class.
So I think the problem is due to the conversion from float[] to IntPtr[]..
I give you some code:
OnAudioAvailable()
floatIn = new float[e.SamplesPerBuffer * e.InputBuffers.Length];//*2
e.GetAsInterleavedSamples(floatIn);
floatOut = delayProc(floatIn, e.SamplesPerBuffer * e.InputBuffers.Length, 1.5f);
//conversione da float[] a IntPtr[L][R]
Outp = Marshal.AllocHGlobal(sizeof(float)*floatOut.Length);
Marshal.Copy(floatOut, 0, Outp, floatOut.Length);
NAudio.Wave.Asio.ASIOSampleConvertor.ConvertorFloatToInt2Channels(Outp, e.OutputBuffers, e.InputBuffers.Length, floatOut.Length);
delayProc()
private float[] delayProc(float[] sourceBuffer, int sampleCount, float delay)
{
if (OldBuf == null)
{
OldBuf = new float[sampleCount];
}
float[] BufDly = new float[(int)(sampleCount * delay)];
int delayLength = (int)(BufDly.Length - (BufDly.Length / delay));
for (int j = sampleCount - delayLength; j < sampleCount; j++)
for (int i = 0; i < delayLength; i++)
BufDly[i] = OldBuf[j];
for (int j = 0; j < sampleCount; j++)
for (int i = delayLength; i < BufDly.Length; i++)
BufDly[i] = sourceBuffer[j];
for (int i = 0; i < sampleCount; i++)
OldBuf[i] = sourceBuffer[i];
return BufDly;
}
AsioSampleConvertor
public static void ConvertorFloatToInt2Channels(IntPtr inputInterleavedBuffer, IntPtr[] asioOutputBuffers, int nbChannels, int nbSamples)
{
unsafe
{
float* inputSamples = (float*)inputInterleavedBuffer;
int* leftSamples = (int*)asioOutputBuffers[0];
int* rightSamples = (int*)asioOutputBuffers[1];
for (int i = 0; i < nbSamples; i++)
{
*leftSamples++ = clampToInt(inputSamples[0]);
*rightSamples++ = clampToInt(inputSamples[1]);
inputSamples += 2;
}
}
}
ClampToInt()
private static int clampToInt(double sampleValue)
{
sampleValue = (sampleValue < -1.0) ? -1.0 : (sampleValue > 1.0) ? 1.0 : sampleValue;
return (int)(sampleValue * 2147483647.0);
}
If you need some other code, just ask me.
When you call ConvertorFloatToInt2Channels you are passing in the total number of samples across all channels, then trying to read that many pairs of samples. So you are trying to read twice as many samples from your input buffer as are actually there. Using unsafe code you are trying to address well past the end of the allocated block, which results in the access violation you are getting.
Change the for loop in your ConvertorFloatToInt2Channels method to read:
for (int i = 0; i < nbSamples; i += 2)
This will stop your code from trying to read double the number of items actually present in the source memory block.
Incidentally, why are you messing around with allocating global memory and using unsafe code here? Why not process them as managed arrays? Processing the data itself isn't much slower, and you save on all the overheads of copying data to and from unmanaged memory.
Try this:
public static void FloatMonoToIntStereo(float[] samples, float[] leftChannel, float[] rightChannel)
{
for (int i = 0, j = 0; i < samples.Length; i += 2, j++)
{
leftChannel[j] = (int)(samples[i] * Int32.MaxValue);
rightChannel[j] = (int)(samples[i + 1] * Int32.MaxValue);
}
}
On my machine that processes around 12 million samples per second, converting the samples to integer and splitting the channels. About half that speed if I allocate the buffers for every set of results. About half again when I write that to use unsafe code, AllocHGlobal etc.
Never assume that unsafe code is faster.

Categories