I am trying to build a custom machine learning library in C#, and I have researched the topic fairly thoroughly. My first example (an XOR estimator) was a success: I was able to lower the average loss to almost zero. Then I tried to build a model to classify handwritten digits (using a text version of the MNIST database). The problem is that no matter how I configure the model, I always get stuck at a certain average loss over the data set. A second problem: because the MNIST data set is very large, the model takes a long time to train, so I could use some advice on how to speed up the slowest parts of the algorithm (I am using stochastic gradient descent). Below I show the main method that performs most of the work.
I have tried the MSE and cross-entropy loss functions, as well as the tanh, sigmoid, ReLU and softplus activation functions. The model I am trying to build has 4 layers: an input layer of 784 neurons; a second layer of 16 neurons (sigmoid); a third layer of 16 neurons (sigmoid); and an output layer of 10 neurons (one-hot encoded digits) with sigmoid. I am aware that the code below may not be a minimal reproducible example, but it represents the algorithm I am trying to figure out. I have also uploaded the solution to GitHub, in case somebody can give me a hand finding the problem. This is the link: https://github.com/juan-carvajal/MachineLearningFramework
Running the Main method of the app first executes the XOR classifier, which runs fine, and then the MNIST classifier.
The model is best represented here:
DataSet dataSet = new DataSet("mnist2.txt", ' ', 10, false);
//This creates a model with batching=128 , learningRate=0.5 and
//CrossEntropy loss function
var p = new Perceptron(128, 0.5, ErrorFunction.CrossEntropy())
.Layer(784, ActivationFunction.Sigmoid())
.Layer(16, ActivationFunction.Sigmoid())
.Layer(16, ActivationFunction.Sigmoid())
.Layer(10, ActivationFunction.Sigmoid());
//1000 is the number of epochs
p.Train2(dataSet, 1000);
Actual algorithm (stochastic gradient descent):
Console.WriteLine("Initial Loss:"+ CalculateMeanErrorOverDataSet(dataSet));
for (int i = 0; i < epochs; i++)
{
//Shuffle the data in every step
dataSet.Shuffle();
List<DataRow> batch = dataSet.NextBatch(this.Batching);
//Gets random batch from the dataSet
int count = 0;
foreach (DataRow example in batch)
{
count++;
double[] result = this.FeedForward(example.GetFeatures());
double[] labels = example.GetLabels();
if (result.Length != labels.Length)
{
throw new Exception("Inconsistent array size, Incorrect implementation.");
}
else
{
//What follows is the calculation of the gradient for this example. Every example contributes to the current gradient; then all those changes are averaged and every parameter is updated.
double error = CalculateExampleLost(example);
for (int l = this.Layers.Count - 1; l > 0; l--)
{
if (l == this.Layers.Count - 1)
{
for (int j = 0; j < this.Layers[l].CostDerivatives.Length; j++)
{
this.Layers[l].CostDerivatives[j] = ErrorFunction.GetDerivativeValue(labels[j], this.Layers[l].Activations[j]);
}
}
else
{
for (int j = 0; j < this.Layers[l].CostDerivatives.Length; j++)
{
double acum = 0;
for (int j2 = 0; j2 < Layers[l + 1].Size; j2++)
{
acum += Layers[l + 1].WeightMatrix[j2, j] * this.Layers[l+1].ActivationFunction.GetDerivativeValue(Layers[l + 1].WeightedSum[j2]) * Layers[l + 1].CostDerivatives[j2];
}
this.Layers[l].CostDerivatives[j] = acum;
}
}
for (int j = 0; j < this.Layers[l].Activations.Length; j++)
{
this.Layers[l].BiasVectorChangeRecord[j] += this.Layers[l].ActivationFunction.GetDerivativeValue(Layers[l].WeightedSum[j]) * Layers[l].CostDerivatives[j];
for (int k = 0; k < Layers[l].WeightMatrix.GetLength(1); k++)
{
this.Layers[l].WeightMatrixChangeRecord[j, k] += Layers[l - 1].Activations[k]
* this.Layers[l].ActivationFunction.GetDerivativeValue(Layers[l].WeightedSum[j])
* Layers[l].CostDerivatives[j];
}
}
}
}
}
TakeGradientDescentStep(batch.Count);
if ((i + 1) % (epochs / 10) == 0)
{
Console.WriteLine("Epoch " + (i + 1) + ", Avg.Loss:" + CalculateMeanErrorOverDataSet(dataSet));
}
}
This is an example of what the local minimum looks like in the current model.
In my research I found that similar models can achieve accuracy of up to 90%. My model barely reached 10%.
Related
I'm trying to tackle the classic handwritten digit recognition problem with a feedforward neural network and backpropagation, using the MNIST dataset. I'm using Michael Nielsen's book to learn the essentials and 3Blue1Brown's YouTube video for the backpropagation algorithm.
I finished writing it some time ago and have been debugging it since, because the results are quite bad. At its best the network can recognize ~4000/10000 samples after 1 epoch, and that number only drops in the following epochs, which led me to believe there's some issue with the backpropagation algorithm. I've been drowning in index hell trying to debug this for the last few days and can't figure out where the issue is. I'd appreciate any help in pointing it out.
A bit of background: 1) I'm not using any matrix multiplication and no external frameworks, but doing everything with for loops because that's how I learned it from the video. 2) Unlike the book, I'm storing both weights and biases in the same array. The biases for every layer are a column at the end of the weight matrix for that layer.
And finally for the code, this is the Backpropagate method of the NeuralNetwork class, which is called in UpdateMiniBatch, which itself is called in SGD:
/// <summary>
/// Returns the partial derivative of the cost function on one sample with respect to every weight in the network.
/// </summary>
public List<double[,]> Backpropagate(ITrainingSample sample)
{
// Forwards pass
var (weightedInputs, activations) = GetWeightedInputsAndActivations(sample.Input);
// The derivative with respect to the activation of the last layer is simple to compute: activation - expectedActivation
var errors = activations.Last().Select((a, i) => a - sample.Output[i]).ToArray();
// Backwards pass
List<double[,]> delCostOverDelWeights = Weights.Select(x => new double[x.GetLength(0), x.GetLength(1)]).ToList();
List<double[]> delCostOverDelActivations = Weights.Select(x => new double[x.GetLength(0)]).ToList();
delCostOverDelActivations[delCostOverDelActivations.Count - 1] = errors;
// Comment notation:
// Cost function: C
// Weight connecting the i-th neuron on the (l + 1)-th layer to the j-th neuron on the l-th layer: w[l][i, j]
// Bias of the i-th neuron on the (l + 1)-th layer: b[l][i]
// Activation of the i-th neuron on the l-th layer: a[l][i]
// Weighted input of the i-th neuron on the l-th layer: z[l][i] // which doesn't make sense on layer 0, but is left for index convenience
// Notice that weights, biases, delCostOverDelWeights and delCostOverDelActivations all start at layer 1 (the 0-th layer is irrelevant to their meanings) while activations and weightedInputs start at the 0-th layer
for (int l = Weights.Count - 1; l >= 0; l--)
{
//Calculate ∂C/∂w for the current layer:
for (int i = 0; i < Weights[l].GetLength(0); i++)
for (int j = 0; j < Weights[l].GetLength(1); j++)
delCostOverDelWeights[l][i, j] = // ∂C/∂w[l][i, j]
delCostOverDelActivations[l][i] * // ∂C/∂a[l + 1][i]
SigmoidPrime(weightedInputs[l + 1][i]) * // ∂a[l + 1][i]/∂z[l + 1][i] = ∂(σ(z[l + 1][i]))/∂z[l + 1][i] = σ′(z[l + 1][i])
(j < Weights[l].GetLength(1) - 1 ? activations[l][j] : 1); // ∂z[l + 1][i]/∂w[l][i, j] = a[l][j] ||OR|| ∂z[l + 1][i]/∂b[l][i] = 1
// Calculate ∂C/∂a for the previous layer(a[l]):
if (l != 0)
for (int i = 0; i < Weights[l - 1].GetLength(0); i++)
for (int j = 0; j < Weights[l].GetLength(0); j++)
delCostOverDelActivations[l - 1][i] += // ∂C/∂a[l][i] = sum over j:
delCostOverDelActivations[l][j] * // ∂C/∂a[l + 1][j]
SigmoidPrime(weightedInputs[l + 1][j]) * // ∂a[l + 1][j]/∂z[l + 1][j] = ∂(σ(z[l + 1][j]))/∂z[l + 1][j] = σ′(z[l + 1][j])
Weights[l][j, i]; // ∂z[l + 1][j]/∂a[l][i] = w[l][j, i]
}
return delCostOverDelWeights;
}
GetWeightedInputsAndActivations:
public (List<double[]>, List<double[]>) GetWeightedInputsAndActivations(double[] input)
{
List<double[]> activations = new List<double[]>() { input }.Concat(Weights.Select(x => new double[x.GetLength(0)])).ToList();
List<double[]> weightedInputs = activations.Select(x => new double[x.Length]).ToList();
for (int l = 0; l < Weights.Count; l++)
for (int i = 0; i < Weights[l].GetLength(0); i++)
{
double value = 0;
for (int j = 0; j < Weights[l].GetLength(1) - 1; j++)
value += Weights[l][i, j] * activations[l][j];// weights
weightedInputs[l + 1][i] = value + Weights[l][i, Weights[l].GetLength(1) - 1];// bias
activations[l + 1][i] = Sigmoid(weightedInputs[l + 1][i]);
}
return (weightedInputs, activations);
}
The entire NeuralNetwork as well as everything else can be found here.
EDIT: after many significant changes to the repo the above link might no longer be functional, but should hopefully be irrelevant considering the answer. For completeness' sake this is a functional link to the changed repository.
Fixed. The issue was: I didn't divide the pixel inputs by 255. Everything else seems to work correctly, and I'm now getting +9000/10000 on the first epoch.
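For anyone hitting the same symptom, the fix amounts to rescaling the raw 0..255 pixel values into [0, 1] before they reach the first layer. A minimal sketch, assuming the pixels arrive as bytes; `PixelScaling` and `Normalize` are hypothetical names and are not taken from the repository:

```csharp
using System;

public static class PixelScaling
{
    // Hypothetical helper: maps raw 8-bit grayscale pixels (0..255)
    // to doubles in the range [0, 1].
    public static double[] Normalize(byte[] pixels)
    {
        var scaled = new double[pixels.Length];
        for (int i = 0; i < pixels.Length; i++)
            scaled[i] = pixels[i] / 255.0;
        return scaled;
    }
}
```

The result of `Normalize` would be passed as the `input` argument of GetWeightedInputsAndActivations. With unscaled inputs, the weighted sums feeding the first sigmoid layer tend to be very large in magnitude, so the sigmoid saturates and SigmoidPrime returns values close to zero, which starves the weight updates.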
I work in an IBM emulator environment. I'm trying to grab a row of text and split it by spaces to return each word in the row. However, when printed to the console, it shows only the first letters. This code has worked in the past on raw text and API results, but the same code does not work when run against an IBM emulator.
For example, if the row from the emulator is HEALTHSELECT OF TEXAS, the code below produces the first and second console prints correctly, but the double for-loop prints only the first letters. The actual console output is below:
looking for healthselect of texas in away from home care
healthselect
of
texas
looking for h in a
looking for h in w
looking for h in a
looking for h in y
looking for e in a
looking for e in w
looking for e in a
looking for e in y
looking for a in a
looking for a in w
looking for a in a
looking for a in y
C# code:
public bool inStringAll(string needle, string haystack)
{
needle = Regex.Replace(needle.ToLower(), @"\r\n", string.Empty);
haystack = Regex.Replace(haystack.ToLower(), @"\r\n", string.Empty);
Console.WriteLine($"looking for {needle} in {haystack}");
string[] needleArr = needle.Split(' ');
string[] haystackArr = haystack.Split(' ');
for(int j = 0; j < needleArr.Length; j++)
Console.WriteLine(needleArr[j]);
int percent = 0;
int increment;
bool decision = false;
if (needleArr.Length > 3)
increment = 100 / 3;
else increment = 100 / needleArr.Length;
for (int i = 0; i < 3; i++)
{
for (int j = 0; j < haystackArr.Length; j++)
{
Console.WriteLine("looking for " + needle[i] + " in " + haystack[j]);
if (needleArr[i] == haystackArr[j])
{
percent += increment;
}
}
}
Console.WriteLine($"Percent: {percent}%");
if (percent >= 50)
decision = true;
return decision;
}
Has anyone worked with an emulator of this kind and had issues getting these values correctly? I've worked with old Attachmate systems using EXTRA! Basic and there were no issues, but this is my first client with IBM systems using C# .NET, so it may be different.
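One detail worth checking before suspecting the emulator: in the code as posted, the log line inside the double loop indexes the strings `needle` and `haystack`, not the arrays `needleArr` and `haystackArr`, and indexing a `string` yields a single `char`. The comparison on the next line does use the arrays, so only the logging would be affected. A small sketch of the difference, using the sample row from above (`IndexingDemo` and its methods are hypothetical names):

```csharp
using System;

public static class IndexingDemo
{
    // Indexing a string returns one character of it.
    public static char FirstCharOf(string s) => s[0];

    // Indexing the result of Split returns a whole word.
    public static string FirstWordOf(string s) => s.Split(' ')[0];
}
```

Here `IndexingDemo.FirstCharOf("healthselect of texas")` returns the character h, while `IndexingDemo.FirstWordOf("healthselect of texas")` returns the word healthselect. Note also that the outer loop always runs to `i < 3`, so `needleArr[i]` would throw an IndexOutOfRangeException for a needle with fewer than three words.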
I am trying to make a backpropagation neural network.
It is based on the tutorials I found here: an MSDN article by James McCaffrey. He gives many examples, but all his networks are built to solve the same problem, so they look like 4:7:3 (4 input, 7 hidden, 3 output).
His output is always binary, 0 or 1: one output gets a 1 to classify an Iris flower into one of three categories.
I would like to solve a different problem, which would require two neural networks: one needs an output between 0 and 255, and the other an output between 0 and 2π (a full turn of a circle). Essentially, I think I need an output that ranges from 0.0 to 1.0, or from -1 to 1, with anything in between, so that I can scale it to 0..255 or 0..2π.
I think his network behaves the way it does because of his ComputeOutputs method,
which I show below:
private double[] ComputeOutputs(double[] xValues)
{
if (xValues.Length != numInput)
throw new Exception("Bad xValues array length");
double[] hSums = new double[numHidden]; // hidden nodes sums scratch array
double[] oSums = new double[numOutput]; // output nodes sums
for (int i = 0; i < xValues.Length; ++i) // copy x-values to inputs
this.inputs[i] = xValues[i];
for (int j = 0; j < numHidden; ++j) // compute i-h sum of weights * inputs
for (int i = 0; i < numInput; ++i)
hSums[j] += this.inputs[i] * this.ihWeights[i][j]; // note +=
for (int i = 0; i < numHidden; ++i) // add biases to input-to-hidden sums
hSums[i] += this.hBiases[i];
for (int i = 0; i < numHidden; ++i) // apply activation
this.hOutputs[i] = HyperTanFunction(hSums[i]); // hard-coded
for (int j = 0; j < numOutput; ++j) // compute h-o sum of weights * hOutputs
for (int i = 0; i < numHidden; ++i)
oSums[j] += hOutputs[i] * hoWeights[i][j];
for (int i = 0; i < numOutput; ++i) // add biases to hidden-to-output sums
oSums[i] += oBiases[i];
double[] softOut = Softmax(oSums); // softmax activation does all outputs at once for efficiency
Array.Copy(softOut, outputs, softOut.Length);
double[] retResult = new double[numOutput]; // could define a GetOutputs method instead
Array.Copy(this.outputs, retResult, retResult.Length);
return retResult;
}
The network uses the following HyperTan function:
private static double HyperTanFunction(double x)
{
if (x < -20.0) return -1.0; // approximation is correct to 30 decimals
else if (x > 20.0) return 1.0;
else return Math.Tanh(x);
}
The function above uses Softmax() for the output layer, and I think that is critical to the problem here: I think it makes all of his outputs binary. It looks like this:
private static double[] Softmax(double[] oSums)
{
// determine max output sum
// does all output nodes at once so scale doesn't have to be re-computed each time
double max = oSums[0];
for (int i = 0; i < oSums.Length; ++i)
if (oSums[i] > max) max = oSums[i];
// determine scaling factor -- sum of exp(each val - max)
double scale = 0.0;
for (int i = 0; i < oSums.Length; ++i)
scale += Math.Exp(oSums[i] - max);
double[] result = new double[oSums.Length];
for (int i = 0; i < oSums.Length; ++i)
result[i] = Math.Exp(oSums[i] - max) / scale;
return result; // now scaled so that xi sum to 1.0
}
How can I rewrite Softmax so that the network is able to give non-binary answers?
Note that the full code of the network is here, if you would like to try it out.
Also, the following accuracy function is used to test the network; maybe the binary behaviour emerges from it:
public double Accuracy(double[][] testData)
{
// percentage correct using winner-takes all
int numCorrect = 0;
int numWrong = 0;
double[] xValues = new double[numInput]; // inputs
double[] tValues = new double[numOutput]; // targets
double[] yValues; // computed Y
for (int i = 0; i < testData.Length; ++i)
{
Array.Copy(testData[i], xValues, numInput); // parse test data into x-values and t-values
Array.Copy(testData[i], numInput, tValues, 0, numOutput);
yValues = this.ComputeOutputs(xValues);
int maxIndex = MaxIndex(yValues); // which cell in yValues has largest value?
int tMaxIndex = MaxIndex(tValues);
if (maxIndex == tMaxIndex)
++numCorrect;
else
++numWrong;
}
return (numCorrect * 1.0) / (double)testData.Length;
}
In case someone ends up in the same situation: if you need example code for neural network regression (NNR, as it is called), here is a link to sample code in C#, and here is a good article about it. Note that the author has written more articles there; you won't find everything, but there is a lot. Although I had been following this author for a while, I missed this specific article because I didn't know what the technique was called when I asked the question here on Stack Overflow.
I'm a bit rusty on neural networks, but I think that if you want a range of values from your output, you need to make sure the activation function on your output layer is linear (or something that has a similar effect).
Try adding this method:
private static double[] Linear(double[] oSums)
{
double sum = oSums.Sum(d => Math.Abs(d));
double[] result = new double[oSums.Length];
for (int i = 0; i < oSums.Length; ++i)
result[i] = Math.Abs(oSums[i]) / sum;
// scaled so that xi sum to 1.0
return result;
}
And then in the ComputeOutputs method you need to use this new activation function for the output (rather than Softmax):
...
//double[] softOut = Softmax(oSums); // all outputs at once for efficiency
double[] softOut = Linear(oSums); // all outputs at once for efficiency
Array.Copy(softOut, outputs, softOut.Length);
...
This should now output linear values.
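Another way to read "linear output" is to leave the output sums untouched (an identity activation) and train against targets scaled to a fixed interval. A minimal sketch, assuming targets are normalized to [0, 1] for training and mapped back afterwards; `OutputScaling`, `Identity`, `ToUnit` and `FromUnit` are hypothetical helpers and not part of McCaffrey's code:

```csharp
using System;

public static class OutputScaling
{
    // Identity activation: the output layer returns a copy of its
    // weighted sums, unchanged.
    public static double[] Identity(double[] oSums)
    {
        var result = new double[oSums.Length];
        Array.Copy(oSums, result, oSums.Length);
        return result;
    }

    // Maps a value from [min, max] to [0, 1], e.g. to prepare training targets.
    public static double ToUnit(double value, double min, double max)
        => (value - min) / (max - min);

    // Maps a network output from [0, 1] back to [min, max].
    public static double FromUnit(double unit, double min, double max)
        => min + unit * (max - min);
}
```

For the two ranges in the question, `OutputScaling.FromUnit(y, 0, 255)` and `OutputScaling.FromUnit(y, 0, 2 * Math.PI)` would convert a trained output `y` back to the original units. If the output activation is swapped this way, the training code must change with it: the derivative of an identity output is 1, so the softmax-specific gradient term no longer applies.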
I am attempting to use Lomont FFT in order to return complex numbers to build a spectrogram / spectral density chart using c#.
I am having trouble understanding how to return values from the class.
Here is the code I have put together thus far which appears to be working.
int read = 0;
Double[] data;
byte[] buffer = new byte[1024];
FileStream wave = new FileStream(args[0], FileMode.Open, FileAccess.Read);
read = wave.Read(buffer, 0, 44);
read = wave.Read(buffer, 0, 1024);
data = new Double[read];
for (int i = 0; i < read; i+=2)
{
data[i] = BitConverter.ToInt16(buffer, i) / 32768.0;
Console.WriteLine(data[i]);
}
LomontFFT LFFT = new LomontFFT();
LFFT.FFT(data, true);
What I am not clear on is, how to return/access the values from Lomont FFT implementation back into my application (console)?
Being pretty new to c# development, I'm thinking I am perhaps missing a fundamental aspect of understanding regarding how to retrieve processed values from the instance of the Lomont Class, or perhaps even calling it incorrectly.
Console.WriteLine(LFFT.A); // Returns 0
Console.WriteLine(LFFT.B); // Returns 1
I have been searching for a code snippet or explanation of how to do this, but so far have come up with nothing that I understand or explains this particular aspect of the issue I am facing. Any guidance would be greatly appreciated.
A subset of the results held in data array noted in the code above can be found below and based on my current understanding, appear to be valid:
0.00531005859375
0.0238037109375
0.041473388671875
0.0576171875
0.07183837890625
0.083465576171875
0.092193603515625
0.097625732421875
0.099639892578125
0.098114013671875
0.0931396484375
0.0848388671875
0.07354736328125
0.05963134765625
0.043609619140625
0.026031494140625
0.007476806640625
-0.011260986328125
-0.0296630859375
-0.047027587890625
-0.062713623046875
-0.076141357421875
-0.086883544921875
-0.09454345703125
-0.098785400390625
-0.0994873046875
-0.0966796875
-0.090362548828125
-0.080810546875
-0.06842041015625
-0.05352783203125
-0.036712646484375
-0.0185546875
What am I actually attempting to do? (perspective)
I am looking to load a wave file into a console application and return a spectrogram/spectral density chart/image as a jpg/png for further processing.
The wave files I am reading are mono in format
UPDATE 1
I receive slightly different results depending on which FFT is used.
Using RealFFT
for (int i = 0; i < read; i+=2)
{
data[i] = BitConverter.ToInt16(buffer, i) / 32768.0;
//Console.WriteLine(data[i]);
}
LomontFFT LFFT = new LomontFFT();
LFFT.RealFFT(data, true);
for (int i = 0; i < buffer.Length / 2; i++)
{
System.Console.WriteLine("{0}",
Math.Sqrt(data[2 * i] * data[2 * i] + data[2 * i + 1] * data[2 * i + 1]));
}
Partial Result of RealFFT
0.314566983321381
0.625242818210924
0.30314888696868
0.118468857708093
0.0587697011760449
0.0369034115568654
0.0265842582236275
0.0207195964060356
0.0169601273233317
0.0143745438577886
0.012528799609089
0.0111831275153128
0.0102313284519146
0.00960198279358434
0.00920236001619566
Using FFT
for (int i = 0; i < read; i+=2)
{
data[i] = BitConverter.ToInt16(buffer, i) / 32768.0;
//Console.WriteLine(data[i]);
}
double[] bufferB = new double[2 * data.Length];
for (int i = 0; i < data.Length; i++)
{
bufferB[2 * i] = data[i];
bufferB[2 * i + 1] = 0;
}
LomontFFT LFFT = new LomontFFT();
LFFT.FFT(bufferB, true);
for (int i = 0; i < bufferB.Length / 2; i++)
{
System.Console.WriteLine("{0}",
Math.Sqrt(bufferB[2 * i] * bufferB[2 * i] + bufferB[2 * i + 1] * bufferB[2 * i + 1]));
}
Partial Result of FFT:
0.31456698332138
0.625242818210923
0.303148886968679
0.118468857708092
0.0587697011760447
0.0369034115568653
0.0265842582236274
0.0207195964060355
0.0169601273233317
0.0143745438577886
0.012528799609089
0.0111831275153127
0.0102313284519146
0.00960198279358439
0.00920236001619564
Looking at the LomontFFT.FFT documentation:
Compute the forward or inverse Fourier Transform of data, with
data containing complex valued data as alternating real and
imaginary parts. The length must be a power of 2. The data is
modified in place.
This tells us a few things. First, the function expects complex-valued data, whereas your data is real. A quick fix for this is to create another buffer of twice the size and set all the imaginary parts to 0:
double[] buffer = new double[2*data.Length];
for (int i=0; i<data.Length; i++)
{
buffer[2*i] = data[i];
buffer[2*i+1] = 0;
}
The documentation also tells us that the computation is done in place. That means that after the call to FFT returns, the input array is replaced with the computed result. You could thus print the spectrum with:
LomontFFT LFFT = new LomontFFT();
LFFT.FFT(buffer, true);
for (int i = 0; i < buffer.Length/2; i++)
{
System.Console.WriteLine("{0}",
Math.Sqrt(buffer[2*i]*buffer[2*i]+buffer[2*i+1]*buffer[2*i+1]));
}
Note since your input data is real valued you could also use LomontFFT.RealFFT. In that case, given a slightly different packing rule, you would obtain the FFT results using:
LomontFFT LFFT = new LomontFFT();
LFFT.RealFFT(data, true);
System.Console.WriteLine("{0}", Math.Abs(data[0]));
for (int i = 1; i < data.Length/2; i++)
{
System.Console.WriteLine("{0}",
Math.Sqrt(data[2*i]*data[2*i]+data[2*i+1]*data[2*i+1]));
}
System.Console.WriteLine("{0}", Math.Abs(data[1]));
This would give you the non-redundant lower half of the spectrum (Unlike LomontFFT.FFT which provides the entire spectrum). Also, numerical differences on the order of double precision (around 1e-16 times the spectrum peak value) with respect to LomontFFT.FFT can be expected.
Given a collection of points defined by x and y coordinates.
In this collection I have the start point, the end point and the other n-2 points.
I have to find the shortest path between the start point and the end point that goes through all the other points. The result should be the length of the shortest path and, if possible, the order in which the points are visited.
At first glance this seems to be a graph problem, but I am not so sure about that right now. In any case, I am trying to find this shortest path using only geometric relations, since all the information I currently have is the x and y coordinates of the points, and which point is the start and which is the end.
My question is: can this path be found using only geometric relations?
I am trying to implement this in C#, so if any helpful packages are available, please let me know.
The simplest heuristic with reasonable performance is 2-opt. Put the points in an array, with the start point first and the end point last, and repeatedly attempt to improve the solution as follows. Choose a starting index i and an ending index j and reverse the subarray from i to j. If the total cost is less, then keep this change, otherwise undo it. Note that the total cost will be less if and only if d(p[i - 1], p[i]) + d(p[j], p[j + 1]) > d(p[i - 1], p[j]) + d(p[i], p[j + 1]), so you can avoid performing the swap unless it's an improvement.
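The 2-opt loop described above can be sketched in C# roughly as follows. This is a minimal illustration, not a tuned implementation: points are assumed to be `(double X, double Y)` tuples, the names `TwoOpt`, `Improve` and `Length` are hypothetical, and a small tolerance is assumed to avoid looping on floating-point noise.

```csharp
using System;

public static class TwoOpt
{
    static double Dist((double X, double Y) a, (double X, double Y) b)
        => Math.Sqrt((a.X - b.X) * (a.X - b.X) + (a.Y - b.Y) * (a.Y - b.Y));

    // Improves the path in place. The first element (start) and the
    // last element (end) are never moved.
    public static void Improve((double X, double Y)[] p)
    {
        bool improved = true;
        while (improved)
        {
            improved = false;
            for (int i = 1; i < p.Length - 2; i++)
                for (int j = i + 1; j < p.Length - 1; j++)
                {
                    // Only the two edges at the ends of the reversed
                    // subarray change, so compare just those.
                    double before = Dist(p[i - 1], p[i]) + Dist(p[j], p[j + 1]);
                    double after = Dist(p[i - 1], p[j]) + Dist(p[i], p[j + 1]);
                    if (before > after + 1e-12)
                    {
                        Array.Reverse(p, i, j - i + 1);
                        improved = true;
                    }
                }
        }
    }

    // Total length of the path in its current order.
    public static double Length((double X, double Y)[] p)
    {
        double total = 0;
        for (int i = 0; i + 1 < p.Length; i++)
            total += Dist(p[i], p[i + 1]);
        return total;
    }
}
```

The loop stops when a full pass finds no improving reversal, which gives a local optimum rather than a guaranteed shortest path.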
There are a number of possible improvements to this method. 3-opt and k-opt consider more possible moves, resulting in better solution quality. Data structures for geometric search, kd-trees for example, decrease the time to find improving moves. As far as I know, the state of the art in local search algorithms for TSP is Keld Helsgaun's LKH.
Another family of algorithms is branch and bound. These return optimal solutions. Concorde (as far as I know) is the state of the art here.
Here's a Java implementation of the O(n^2 2^n) DP that Niklas described. There are many possible improvements, e.g., caching the distances between points, switching to floats (maybe), and reorganizing the iteration so that subsets are enumerated in increasing order of size (to allow only the most recent layer of minTable to be retained, resulting in a significant space saving).
class Point {
private final double x, y;
Point(double x, double y) {
this.x = x;
this.y = y;
}
double distanceTo(Point that) {
return Math.hypot(x - that.x, y - that.y);
}
public String toString() {
return x + " " + y;
}
}
public class TSP {
public static int[] minLengthPath(Point[] points) {
if (points.length < 2) {
throw new IllegalArgumentException();
}
int n = points.length - 2;
if ((1 << n) <= 0) {
throw new IllegalArgumentException();
}
byte[][] argMinTable = new byte[1 << n][n];
double[][] minTable = new double[1 << n][n];
for (int s = 0; s < (1 << n); s++) {
for (int i = 0; i < n; i++) {
int sMinusI = s & ~(1 << i);
if (sMinusI == s) {
continue;
}
int argMin = -1;
double min = points[0].distanceTo(points[1 + i]);
for (int j = 0; j < n; j++) {
if ((sMinusI & (1 << j)) == 0) {
continue;
}
double cost =
minTable[sMinusI][j] +
points[1 + j].distanceTo(points[1 + i]);
if (argMin < 0 || cost < min) {
argMin = j;
min = cost;
}
}
argMinTable[s][i] = (byte)argMin;
minTable[s][i] = min;
}
}
int s = (1 << n) - 1;
int argMin = -1;
double min = points[0].distanceTo(points[1 + n]);
for (int i = 0; i < n; i++) {
double cost =
minTable[s][i] +
points[1 + i].distanceTo(points[1 + n]);
if (argMin < 0 || cost < min) {
argMin = i;
min = cost;
}
}
int[] path = new int[1 + n + 1];
path[1 + n] = 1 + n;
int k = n;
while (argMin >= 0) {
path[k] = 1 + argMin;
k--;
int temp = s;
s &= ~(1 << argMin);
argMin = argMinTable[temp][argMin];
}
path[0] = 0;
return path;
}
public static void main(String[] args) {
Point[] points = new Point[20];
for (int i = 0; i < points.length; i++) {
points[i] = new Point(Math.random(), Math.random());
}
int[] path = minLengthPath(points);
for (int i = 0; i < points.length; i++) {
System.out.println(points[path[i]]);
System.err.println(points[i]);
}
}
}
The Euclidean travelling salesman problem can be reduced to this and it's NP-hard. So unless your point set is small or you have a very particular structure, you should probably look out for an approximation. Note that the Wikipedia article mentions the existence of a PTAS for the problem, which could turn out to be quite effective in practice.
UPDATE: Since your instances seem to have only a few nodes, you can use a simple exponential-time dynamic programming approach. Let f(S, p) be the minimum cost to connect all the points in the set S, ending at the point p. We have f({start}, start) = 0 and we are looking for f(P, end), where P is the set of all points. To compute f(S, p), we can check all potential predecessors of p in the tour, so we have
f(S, p) = MIN(q in S \ {p}, f(S \ {p}, q) + distance(p, q))
You can represent S as a bitvector to save space (just use a single-word integer for maximum simplicity). Also use memoization to avoid recomputing subproblem results.
The runtime will be O(2^n * n^2), and the algorithm can be implemented with a rather low constant factor, so I predict it will be able to solve instances with n = 25 within a reasonable amount of time.
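Since the question asks for C#, here is a compact sketch of that recurrence that returns only the minimum length. It fills the table bottom-up instead of using memoization, keeps the start and end points out of the bitmask, and assumes a cap of 20 intermediate points so the full table fits in memory; `PathDp` and `MinLength` are hypothetical names.

```csharp
using System;

public static class PathDp
{
    static double Dist((double X, double Y) a, (double X, double Y) b)
        => Math.Sqrt((a.X - b.X) * (a.X - b.X) + (a.Y - b.Y) * (a.Y - b.Y));

    // pts[0] is the start and the last element is the end. Returns the
    // minimum length of a path from start to end that visits every
    // other point exactly once.
    public static double MinLength((double X, double Y)[] pts)
    {
        if (pts.Length < 2) throw new ArgumentException("Need a start and an end point.");
        int n = pts.Length - 2;                  // intermediate points
        if (n > 20) throw new ArgumentException("Too many points for this sketch.");
        if (n == 0) return Dist(pts[0], pts[1]);

        // f[s, i]: cheapest way to leave the start, visit exactly the
        // intermediate points in bitmask s, and stop at intermediate i.
        var f = new double[1 << n, n];
        for (int s = 1; s < (1 << n); s++)
            for (int i = 0; i < n; i++)
            {
                if ((s & (1 << i)) == 0) continue;   // i must belong to s
                int rest = s & ~(1 << i);
                if (rest == 0)
                {
                    f[s, i] = Dist(pts[0], pts[1 + i]);
                    continue;
                }
                double best = double.PositiveInfinity;
                for (int j = 0; j < n; j++)          // try each predecessor j
                    if ((rest & (1 << j)) != 0)
                        best = Math.Min(best, f[rest, j] + Dist(pts[1 + j], pts[1 + i]));
                f[s, i] = best;
            }

        int all = (1 << n) - 1;
        double answer = double.PositiveInfinity;
        for (int i = 0; i < n; i++)
            answer = Math.Min(answer, f[all, i] + Dist(pts[1 + i], pts[n + 1]));
        return answer;
    }
}
```

Because `rest` is always numerically smaller than `s`, iterating the masks in increasing order guarantees every subproblem is ready when it is needed. Recovering the visiting order would additionally require storing the best predecessor for each table entry.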
This can be solved using an evolutionary algorithm.
Look at this: http://johnnewcombe.net/blog/post/23
You might want to look at TPL (Task Parallel Library) to speed up the application.
EDIT
I found this Link which has a Traveling Salesman algorithm:
http://msdn.microsoft.com/en-us/magazine/gg983491.aspx
The Source Code is said to be at:
http://archive.msdn.microsoft.com/mag201104BeeColony