Replace 2D-matrix by "double-index List<>"? - c#

I'm implementing the Floyd-Warshal algorithm from this website, as proposed in this other StackExchange question, in a C# environment.
Everything is working fine, but the used graph datatype is not very useful (it's a basic 4x4 2D matrix), hereby the example from the website itself:
int[,] graph_matrix = {{ 0, 5, INF, 8},
{INF, 0, 3, INF},
{INF, INF, 0, 1},
{INF, INF, INF, 0}};
I would like to create a graph, which consists of a large number of vertices, which are more or less only connected to their closest neighbour, like in this image:
There are about 600 vertices in my graph, and having a 600x600 matrix might blow up my memory, so I'm looking for a datatype, like a double_index_List<int> where I could do something like:
double_index_List<int> graph_connections = new double_index_List<int>();
for (int index=0, index <= 600; index++)
graph_connections.insertAt(index,index) = 0;
// make the connections between all subsequent points
graph_connections.insertAt(1, 2) = 5000;
graph_connections.insertAt(2, 3) = 5100;
graph_connections.insertAt(3, 4) = 600;
...
graph_connections.insertAt(600, 1) = 4800;
// foresee the possibility that some non-subsequent vertices in the graph are reachable.
graph_connections.insertAt(100, 20) = 4200;
graph_connections.insertAt(200, 124) = 500;
graph_connections.insertAt(320, 240) = 600;
...
graph_connections.insertAt(600, 540) = 480;
The idea is that I have a function, like:
// Explanation: If graph_connections(x, y) has been created, then return that entry.
// If not, return the default value (being `INF`).
graph_connections.get(x, y, INF)
Does such "doubly indexed List" or "double-index List" exist in C#?

having a 600x600 matrix might blow up my memory
Why? such a matrix should only require 1.44Mb of memory, that should really not be a problem, even on 32 bit systems.
Does such "doubly indexed List" or "double-index List" exist in C#?
What you are describing is essentially a sparse matrix, there is no such builtin. But you might want to take a look at Math.Net SparseMatrix.
However, it would not be very difficult to write your own based on a dictionary
var mySparseMatrix = new Dictionary<(int x, int y), int>();
mySparseMatrix[(1, 2)] = 5000;
var value = mySparseMatrix.TryGet((1, 2), out var r) ? r : 0
Remark:
The previous lines of source are based on System.ValueTuple. The more recent .Net frameworks cover this by default but in case yours doesn't, there's a Nuget library "System.ValueTuple" which can be used for this purpose.
Note that ValueTuple that this uses does not have a good GetHashCode implementation, so the performance of this will be poor. You should probably implement your own Point/Vector2i type with GetHashCode & Equals implemented. Or maybe use System.Drawing.Point. You should probably also wrap the dictionary in a custom class to provide the interface you want.

Related

Removing an array of int from a list of type int array

I intend to have a field of type List<int[]> that holds some int array (used in Unity to record some grid positions), looks roughly like:
{
{0, 0},
{0, 1},
{0, 2}
}
But when I try to remove elements from this list, it seems to be having some difficulty doing so:
int[] target = new int[] {0, 0};
// Simplified, in the actual code I have to do some calculations.
// But during debug, I confirmed that the same array I want to remove
// is in the intArrayList
intArrayList.Remove(target);
// {0, 0} seems to be still here
Is that supposed to happen? If so, how can I fix that?
The problem is that you are deleting a different instance of the same list. Since C# array uses the default equality checker, the instance needs to be the same in order for the other array to get removed.
A quick fix is to write your own method that searches the list for the appropriate array first, and then removing the item at that index:
var target = new int[] {0, 0};
var indexToRemove = intArrayList.FindIndex(a => a.SequenceEqual(target));
if (indexToRemove >= 0) {
intArrayList.RemoveAt(indexToRemove);
}
A good fix is to stop using the "naked" array: wrap it into your own class with an appropriate comparison semantic, and use that class instead. This would make your code more readable, give you more control over what goes into the array, and let you protect the array from writing if necessary. It will also let you use the original removal code, which is a small bonus on top of the other great things you are going to get for writing a meaningful wrapper on top of the array.
Use the correct data structure!
What you have is a little bit like an XY Problem. You have an issue with the attempted solution using a bad data structure for what you are actually trying to achieve.
If this is about Unity grid positions as you say, do not use int[] at all!
Rather simply use Vector2Int which already provides a structure for two coupled int values (coordinates) and implements all the necessary interfaces for successfully compare it for equality:
List<Vector2Int> yourList = new List<Vector2Int>()
{
new Vector2Int(0, 0),
new Vector2Int(0, 1),
new Vector2Int(0, 2),
...
}
var target = new Vector2Int(0, 1);
yourList.Remove(target);
Since Vector2Int implements IEquatable<Vector2Int> and GetHashCode these kind of operations on lists and dictionaries can be done implicit.
You are trying to remove the array by ref and most likely your intention is to remove it by value.
this should work:
intArrayList.RemoveAll(p => p[0] == 0 && p[1] == 0);
Another option is to use records instead of int[2] arrays since they have built-in implementation of value equality
record GridPosition(int X, int Y);
//...
List<GridPosition> intArrayList = new();
intArrayList.Add(new GridPosition(0, 0));
intArrayList.Add(new GridPosition(1, 0));
var target = new GridPosition(0, 0);
intArrayList.Remove(target);
Assert.AreEqual(1, intArrayList.Count);

What is wrong with my Fourier Transformation (Convolution) in MathNet.Numerics? - C#

I am trying to do a simple Convolution between 2 audio files using the MathNet.Numerics's FFT (Fast Fourier Transformation), but I get some weird background sounds, after the IFFT.
I tested if it's the Convolution or the Transformations, thats causing the problem, and I found out that the problem shows already in the FFT -> IFFT (Inverze FFT) conversion.
My code for a simple FFT and IFFT:
float[] sound; //here are stored my samples
Complex[] complexInput = new Complex[sound.Length];
for (int i = 0; i < complexInput.Length; i++)
{
Complex tmp = new Complex(sound[i],0);
complexInput[i] = tmp;
}
MathNet.Numerics.IntegralTransforms.Fourier.Forward(complexInput);
//do some stuff
MathNet.Numerics.IntegralTransforms.Fourier.Inverse(complexInput);
float[] outSamples = new float[complexInput.Length];
for (int i = 0; i < outSamples.Length; i++)
outSamples[i] = (float)complexInput[i].Real;
After this, the outSamples are corrupted with some wierd background sound/noise, even though I'm not doing anything between the FFT and IFFT.
What am I missing?
The current implementation of MathNet.Numerics.IntegralTransform.Fourier (see Fourier.cs and Fourier.Bluestein.cs)
uses Bluestein's algorithm for any FFT lengths that are not powers of 2.
This algorithm involves the creation of a Bluestein sequence (which includes terms proportional to n2) which up to version 3.6.0 was using the following code:
static Complex[] BluesteinSequence(int n)
{
double s = Constants.Pi/n;
var sequence = new Complex[n];
for (int k = 0; k < sequence.Length; k++)
{
double t = s*(k*k); // <--------------------- (k*k) 32-bit int expression
sequence[k] = new Complex(Math.Cos(t), Math.Sin(t));
}
return sequence;
}
For any size n greater than 46341, the intermediate expression (k*k) in this implementation is computed using int arithmetic (a 32-bit type as per MSDN integral type reference table) which results in numeric overflows for the largest values of k. As such the current implementation of MathNet.Numerics.IntegralTransfom.Fourier only supports input array sizes which are either powers of 2 or non-powers of 2 up to 46341 (included).
Thus for large input arrays, a workaround could be to pad your input to the next power of 2.
Note: this observation is based on version 3.6.0 of MathNet.Numerics, although the limitation appears to have been present in earlier releases (the Bluestein sequence code has not changed significantly going as far back as version 2.1.1).
Update 2015/04/26:
After I posted this and a comment on an similar issue on github bugtracking, the issue was quickly fixed in MathNet.Numerics. The fix should now be available in version 3.7.0. Note however that you may still want to pad to a power of two for performance reasons, especially since you already need to zero pad for the linear convolution.

How to best implement K-nearest neighbours in C# for large number of dimensions?

I'm implementing the K-nearest neighbours classification algorithm in C# for a training and testing set of about 20,000 samples each, and 25 dimensions.
There are only two classes, represented by '0' and '1' in my implementation. For now, I have the following simple implementation :
// testSamples and trainSamples consists of about 20k vectors each with 25 dimensions
// trainClasses contains 0 or 1 signifying the corresponding class for each sample in trainSamples
static int[] TestKnnCase(IList<double[]> trainSamples, IList<double[]> testSamples, IList<int[]> trainClasses, int K)
{
Console.WriteLine("Performing KNN with K = "+K);
var testResults = new int[testSamples.Count()];
var testNumber = testSamples.Count();
var trainNumber = trainSamples.Count();
// Declaring these here so that I don't have to 'new' them over and over again in the main loop,
// just to save some overhead
var distances = new double[trainNumber][];
for (var i = 0; i < trainNumber; i++)
{
distances[i] = new double[2]; // Will store both distance and index in here
}
// Performing KNN ...
for (var tst = 0; tst < testNumber; tst++)
{
// For every test sample, calculate distance from every training sample
Parallel.For(0, trainNumber, trn =>
{
var dist = GetDistance(testSamples[tst], trainSamples[trn]);
// Storing distance as well as index
distances[trn][0] = dist;
distances[trn][1] = trn;
});
// Sort distances and take top K (?What happens in case of multiple points at the same distance?)
var votingDistances = distances.AsParallel().OrderBy(t => t[0]).Take(K);
// Do a 'majority vote' to classify test sample
var yea = 0.0;
var nay = 0.0;
foreach (var voter in votingDistances)
{
if (trainClasses[(int)voter[1]] == 1)
yea++;
else
nay++;
}
if (yea > nay)
testResults[tst] = 1;
else
testResults[tst] = 0;
}
return testResults;
}
// Calculates and returns square of Euclidean distance between two vectors
static double GetDistance(IList<double> sample1, IList<double> sample2)
{
var distance = 0.0;
// assume sample1 and sample2 are valid i.e. same length
for (var i = 0; i < sample1.Count; i++)
{
var temp = sample1[i] - sample2[i];
distance += temp * temp;
}
return distance;
}
This takes quite a bit of time to execute. On my system it takes about 80 seconds to complete. How can I optimize this, while ensuring that it would also scale to larger number of data samples? As you can see, I've tried using PLINQ and parallel for loops, which did help (without these, it was taking about 120 seconds). What else can I do?
I've read about KD-trees being efficient for KNN in general, but every source I read stated that they're not efficient for higher dimensions.
I also found this stackoverflow discussion about this, but it seems like this is 3 years old, and I was hoping that someone would know about better solutions to this problem by now.
I've looked at machine learning libraries in C#, but for various reasons I don't want to call R or C code from my C# program, and some other libraries I saw were no more efficient than the code I've written. Now I'm just trying to figure out how I could write the most optimized code for this myself.
Edited to add - I cannot reduce the number of dimensions using PCA or something. For this particular model, 25 dimensions are required.
Whenever you are attempting to improve the performance of code, the first step is to analyze the current performance to see exactly where it is spending its time. A good profiler is crucial for this. In my previous job I was able to use the dotTrace profiler to good effect; Visual Studio also has a built-in profiler. A good profiler will tell you exactly where you code is spending time method-by-method or even line-by-line.
That being said, a few things come to mind in reading your implementation:
You are parallelizing some inner loops. Could you parallelize the outer loop instead? There is a small but nonzero cost associated to a delegate call (see here or here) which may be hitting you in the "Parallel.For" callback.
Similarly there is a small performance penalty for indexing through an array using its IList interface. You might consider declaring the array arguments to "GetDistance()" explicitly.
How large is K as compared to the size of the training array? You are completely sorting the "distances" array and taking the top K, but if K is much smaller than the array size it might make sense to use a partial sort / selection algorithm, for instance by using a SortedSet and replacing the smallest element when the set size exceeds K.

C#: Loop to find minima of function

I currently have this function:
public double Max(double[] x, double[] y)
{
//Get min and max of x array as integer
int xMin = Convert.ToInt32(x.Min());
int xMax = Convert.ToInt32(x.Max());
// Generate a list of x values for input to Lagrange
double i = 2;
double xOld = Lagrange(xMin,x,y);
double xNew = xMax;
do
{
xOld = xNew;
xNew = Lagrange(i,x,y);
i = i + 0.01;
} while (xOld > xNew);
return i;
}
This will find the minimum value on a curve with decreasing slope...however, given this curve, I need to find three minima.
How can I find the three minima and output them as an array or individual variables? This curve is just an example--it could be inverted--regardless, I need to find multiple variables. So once the first min is found it needs to know how to get over the point of inflection and find the next... :/
*The Lagrange function can be found here.** For all practical purposes, the Lagrange function will give me f(x) when I input x...visually, it means the curve supplied by wolfram alpha.
*The math-side of this conundrum can be found here.**
Possible solution?
Generate an array of input, say x[1,1.1,1.2,1.3,1.4...], get an array back from the Lagrange function. Then find the three lowest values of this function? Then get the keys corresponding to the values? How would I do this?
It's been a while since I've taken a numerical methods class, so bear with me. In short there are a number of ways to search for the root(s) of a function, and depending on what your your function is (continuous? differentiable?), you need to choose one that is appropriate.
For your problem, I'd probably start by trying to use Newton's Method to find the roots of the second degree Lagrange polynomial for your function. I haven't tested out this library, but there is a C# based numerical methods package on CodePlex that implements Newton's Method that is open source. If you wanted to dig through the code you could.
The majority of root finding methods have cousins in the broader CS topic of 'search'. If you want a really quick and dirty approach, or you have a very large search space, consider something like Simulated Annealing. Finding all of your minima isn't guaranteed but it's fast and easy to code.
Assuming you're just trying to "brute force" calculate this to a certain level of prcision, you need your algorithm to basically find any value where both neighbors are greater than the current value of your loop.
To simplify this, let's just say you have an array of numbers, and you want to find the indices of the three local minima. Here's a simple algorithm to do it:
public void Test()
{
var ys = new[] { 1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 3, 4, 5, 4, 3, 4, 5, 4 };
var indices = GetMinIndices(ys);
}
public List<int> GetMinIndices(int[] ys)
{
var minIndices = new List<int>();
for (var index = 1; index < ys.Length; index++)
{
var currentY = ys[index];
var previousY = ys[index - 1];
if (index < ys.Length - 1)
{
var neytY = ys[index + 1];
if (previousY > currentY && neytY > currentY) // neighbors are greater
minIndices.Add(index); // add the index to the list
}
else // we're at the last index
{
if (previousY > currentY) // previous is greater
minIndices.Add(index);
}
}
return minIndices;
}
So, basically, you pass in your array of function results (ys) that you calculated for an array of inputs (xs) (not shown). What you get back from this function is the minimum indices. So, in this example, you get back 8, 14, and 17.

What is the use of the ArraySegment<T> class?

I just came across the ArraySegment<byte> type while subclassing the MessageEncoder class.
I now understand that it's a segment of a given array, takes an offset, is not enumerable, and does not have an indexer, but I still fail to understand its usage. Can someone please explain with an example?
ArraySegment<T> has become a lot more useful in .NET 4.5+ and .NET Core as it now implements:
IList<T>
ICollection<T>
IEnumerable<T>
IEnumerable
IReadOnlyList<T>
IReadOnlyCollection<T>
as opposed to the .NET 4 version which implemented no interfaces whatsoever.
The class is now able to take part in the wonderful world of LINQ so we can do the usual LINQ things like query the contents, reverse the contents without affecting the original array, get the first item, and so on:
var array = new byte[] { 5, 8, 9, 20, 70, 44, 2, 4 };
array.Dump();
var segment = new ArraySegment<byte>(array, 2, 3);
segment.Dump(); // output: 9, 20, 70
segment.Reverse().Dump(); // output 70, 20, 9
segment.Any(s => s == 99).Dump(); // output false
segment.First().Dump(); // output 9
array.Dump(); // no change
It is a puny little soldier struct that does nothing but keep a reference to an array and stores an index range. A little dangerous, beware that it does not make a copy of the array data and does not in any way make the array immutable or express the need for immutability. The more typical programming pattern is to just keep or pass the array and a length variable or parameter, like it is done in the .NET BeginRead() methods, String.SubString(), Encoding.GetString(), etc, etc.
It does not get much use inside the .NET Framework, except for what seems like one particular Microsoft programmer that worked on web sockets and WCF liking it. Which is probably the proper guidance, if you like it then use it. It did do a peek-a-boo in .NET 4.6, the added MemoryStream.TryGetBuffer() method uses it. Preferred over having two out arguments I assume.
In general, the more universal notion of slices is high on the wishlist of principal .NET engineers like Mads Torgersen and Stephen Toub. The latter kicked off the array[:] syntax proposal a while ago, you can see what they've been thinking about in this Roslyn page. I'd assume that getting CLR support is what this ultimately hinges on. This is actively being thought about for C# version 7 afaik, keep your eye on System.Slices.
Update: dead link, this shipped in version 7.2 as Span.
Update2: more support in C# version 8.0 with Range and Index types and a Slice() method.
Buffer partioning for IO classes - Use the same buffer for simultaneous
read and write operations and have a
single structure you can pass around
the describes your entire operation.
Set Functions - Mathematically speaking you can represent any
contiguous subsets using this new
structure. That basically means you
can create partitions of the array,
but you can't represent say all odds
and all evens. Note that the phone
teaser proposed by The1 could have
been elegantly solved using
ArraySegment partitioning and a tree
structure. The final numbers could
have been written out by traversing
the tree depth first. This would have
been an ideal scenario in terms of
memory and speed I believe.
Multithreading - You can now spawn multiple threads to operate over the
same data source while using segmented
arrays as the control gate. Loops
that use discrete calculations can now
be farmed out quite easily, something
that the latest C++ compilers are
starting to do as a code optimization
step.
UI Segmentation - Constrain your UI displays using segmented
structures. You can now store
structures representing pages of data
that can quickly be applied to the
display functions. Single contiguous
arrays can be used in order to display
discrete views, or even hierarchical
structures such as the nodes in a
TreeView by segmenting a linear data
store into node collection segments.
In this example, we look at how you can use the original array, the Offset and Count properties, and also how you can loop through the elements specified in the ArraySegment.
using System;
class Program
{
static void Main()
{
// Create an ArraySegment from this array.
int[] array = { 10, 20, 30 };
ArraySegment<int> segment = new ArraySegment<int>(array, 1, 2);
// Write the array.
Console.WriteLine("-- Array --");
int[] original = segment.Array;
foreach (int value in original)
{
Console.WriteLine(value);
}
// Write the offset.
Console.WriteLine("-- Offset --");
Console.WriteLine(segment.Offset);
// Write the count.
Console.WriteLine("-- Count --");
Console.WriteLine(segment.Count);
// Write the elements in the range specified in the ArraySegment.
Console.WriteLine("-- Range --");
for (int i = segment.Offset; i < segment.Count+segment.Offset; i++)
{
Console.WriteLine(segment.Array[i]);
}
}
}
ArraySegment Structure - what were they thinking?
What's about a wrapper class? Just to avoid copy data to temporal buffers.
public class SubArray<T> {
private ArraySegment<T> segment;
public SubArray(T[] array, int offset, int count) {
segment = new ArraySegment<T>(array, offset, count);
}
public int Count {
get { return segment.Count; }
}
public T this[int index] {
get {
return segment.Array[segment.Offset + index];
}
}
public T[] ToArray() {
T[] temp = new T[segment.Count];
Array.Copy(segment.Array, segment.Offset, temp, 0, segment.Count);
return temp;
}
public IEnumerator<T> GetEnumerator() {
for (int i = segment.Offset; i < segment.Offset + segment.Count; i++) {
yield return segment.Array[i];
}
}
} //end of the class
Example:
byte[] pp = new byte[] { 1, 2, 3, 4 };
SubArray<byte> sa = new SubArray<byte>(pp, 2, 2);
Console.WriteLine(sa[0]);
Console.WriteLine(sa[1]);
//Console.WriteLine(b[2]); exception
Console.WriteLine();
foreach (byte b in sa) {
Console.WriteLine(b);
}
Ouput:
3
4
3
4
The ArraySegment is MUCH more useful than you might think. Try running the following unit test and prepare to be amazed!
[TestMethod]
public void ArraySegmentMagic()
{
var arr = new[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
var arrSegs = new ArraySegment<int>[3];
arrSegs[0] = new ArraySegment<int>(arr, 0, 3);
arrSegs[1] = new ArraySegment<int>(arr, 3, 3);
arrSegs[2] = new ArraySegment<int>(arr, 6, 3);
for (var i = 0; i < 3; i++)
{
var seg = arrSegs[i] as IList<int>;
Console.Write(seg.GetType().Name.Substring(0, 12) + i);
Console.Write(" {");
for (var j = 0; j < seg.Count; j++)
{
Console.Write("{0},", seg[j]);
}
Console.WriteLine("}");
}
}
You see, all you have to do is cast an ArraySegment to IList and it will do all of the things you probably expected it to do in the first place. Notice that the type is still ArraySegment, even though it is behaving like a normal list.
OUTPUT:
ArraySegment0 {0,1,2,}
ArraySegment1 {3,4,5,}
ArraySegment2 {6,7,8,}
In simple words: it keeps reference to an array, allowing you to have multiple references to a single array variable, each one with a different range.
In fact it helps you to use and pass sections of an array in a more structured way, instead of having multiple variables, for holding start index and length. Also it provides collection interfaces to work more easily with array sections.
For example the following two code examples do the same thing, one with ArraySegment and one without:
byte[] arr1 = new byte[] { 1, 2, 3, 4, 5, 6 };
ArraySegment<byte> seg1 = new ArraySegment<byte>(arr1, 2, 2);
MessageBox.Show((seg1 as IList<byte>)[0].ToString());
and,
byte[] arr1 = new byte[] { 1, 2, 3, 4, 5, 6 };
int offset = 2;
int length = 2;
byte[] arr2 = arr1;
MessageBox.Show(arr2[offset + 0].ToString());
Obviously first code snippet is more preferred, specially when you want to pass array segments to a function.

Categories