How can I speed up my pathfinding algorithm? (C#)

How can I speed up my code, and what am I doing wrong? I have a two-dimensional array of cells that stores what each cell contains. The map is only 100x100, yet with just 10 colonists it already causes freezes, even though my game is still quite bare-bones.
Also, when should I rebuild a colonist's route? Every step he takes? Because if a newly built wall unexpectedly appears in his way, he has to change his route immediately.
using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class PathFinding : MonoBehaviour
{
    public MainWorld MainWorld;
    public MainWorld.Cell CurrentWorldCell;
    public MainWorld.Cell NeighborWorldCell;
    public List<MainWorld.Cell> UnvisitedWorldCells;
    public List<MainWorld.Cell> VisitedWorldCells;
    public Dictionary<MainWorld.Cell, MainWorld.Cell> PathTraversed;
    public List<MainWorld.Cell> PathToObject;

    public List<MainWorld.Cell> FindPath(MainWorld.Cell StartWorldCell, string ObjectID)
    {
        CurrentWorldCell = new MainWorld.Cell();
        NeighborWorldCell = new MainWorld.Cell();
        UnvisitedWorldCells = new List<MainWorld.Cell>();
        VisitedWorldCells = new List<MainWorld.Cell>();
        PathTraversed = new Dictionary<MainWorld.Cell, MainWorld.Cell>();
        UnvisitedWorldCells.Add(StartWorldCell);
        while (UnvisitedWorldCells.Count > 0)
        {
            CurrentWorldCell = UnvisitedWorldCells[0];
            NeighborWorldCell = MainWorld.Data[CurrentWorldCell.Position.x, CurrentWorldCell.Position.y + 1];
            CheckWorldCell(CurrentWorldCell, NeighborWorldCell, ObjectID);
            NeighborWorldCell = MainWorld.Data[CurrentWorldCell.Position.x + 1, CurrentWorldCell.Position.y];
            CheckWorldCell(CurrentWorldCell, NeighborWorldCell, ObjectID);
            NeighborWorldCell = MainWorld.Data[CurrentWorldCell.Position.x, CurrentWorldCell.Position.y - 1];
            CheckWorldCell(CurrentWorldCell, NeighborWorldCell, ObjectID);
            NeighborWorldCell = MainWorld.Data[CurrentWorldCell.Position.x - 1, CurrentWorldCell.Position.y];
            CheckWorldCell(CurrentWorldCell, NeighborWorldCell, ObjectID);
            UnvisitedWorldCells.Remove(CurrentWorldCell);
            if (CurrentWorldCell.ObjectID == ObjectID)
            {
                return CreatePath(StartWorldCell, CurrentWorldCell);
            }
        }
        return null;
    }

    public void CheckWorldCell(MainWorld.Cell CurrentWorldCell, MainWorld.Cell NeighborWorldCell, string ObjectID)
    {
        if (VisitedWorldCells.Contains(NeighborWorldCell) == false)
        {
            if (NeighborWorldCell.IsPassable == true ||
                NeighborWorldCell.ObjectID == ObjectID)
            {
                UnvisitedWorldCells.Add(NeighborWorldCell);
                VisitedWorldCells.Add(NeighborWorldCell);
                PathTraversed.Add(NeighborWorldCell, CurrentWorldCell);
            }
            else
            {
                VisitedWorldCells.Add(NeighborWorldCell);
            }
        }
    }

    public List<MainWorld.Cell> CreatePath(MainWorld.Cell StartWorldCell, MainWorld.Cell EndWorldCell)
    {
        PathToObject = new List<MainWorld.Cell>();
        PathToObject.Add(EndWorldCell);
        while (PathToObject[PathToObject.Count - 1] != StartWorldCell)
        {
            PathToObject.Add(PathTraversed[PathToObject[PathToObject.Count - 1]]);
        }
        PathToObject.Reverse();
        return PathToObject;
    }
}

On each iteration of your pathfinding loop, you check each of the four neighboring cells. Within each check you search the VisitedWorldCells list; once that list grows, those linear scans are your first problem. You then add the neighbor to one or three lists, and manipulating lists like this is slow. Finally, to create a path you build yet another list and add a bunch of elements to it. Doing all of that for 10 colonists every frame adds up very quickly.
Use preallocated 2D boolean arrays; you know how large the map is.
bool[,] array = new bool[mapSizeX, mapSizeY];
Using this, you can check a cell directly with something like
if (!VisitedWorldCells[14, 16])
instead of scanning the entire VisitedWorldCells list (which can also be a slow comparison, depending on how the MainWorld.Cell type implements equality).
Preallocated arrays and directly checking/setting the values will speed this up by orders of magnitude.
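A minimal sketch of that idea (the method names are mine, and width/height are assumed to match the map dimensions):

// One entry per map cell; allocate once and reuse between searches.
bool[,] visited;

void ResetVisited(int width, int height)
{
    if (visited == null)
        visited = new bool[width, height];
    else
        System.Array.Clear(visited, 0, visited.Length); // cheap reset, no reallocation
}

// Constant-time check-and-mark instead of scanning a List<Cell>.
bool TryVisit(int x, int y)
{
    if (visited[x, y]) return false;
    visited[x, y] = true;
    return true;
}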

As far as I can tell, your algorithm is a simple breadth-first search. The first step should be to replace the UnvisitedWorldCells list (often called the 'open set') with a queue, and the VisitedWorldCells list (often called the 'closed set') with a HashSet, or just a 2D array for a constant-time lookup. You might also consider a more compact representation of your node, such as a simple pair of x/y coordinates.
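As a concrete illustration, here is a minimal sketch of the same breadth-first search written against a plain walkability grid. The bool[,] grid and the fixed goal cell are simplifications of mine; the question's search for the nearest cell with a matching ObjectID would only change the termination test.

using System.Collections.Generic;
using UnityEngine;

public static class GridSearch
{
    // Sketch only: 'walkable[x, y]' stands in for the passability check on MainWorld.Data.
    public static List<Vector2Int> FindPath(bool[,] walkable, Vector2Int start, Vector2Int goal)
    {
        int width = walkable.GetLength(0), height = walkable.GetLength(1);
        var open = new Queue<Vector2Int>();                      // O(1) dequeue, unlike List.Remove
        var visited = new bool[width, height];                   // O(1) membership test, unlike List.Contains
        var cameFrom = new Dictionary<Vector2Int, Vector2Int>();

        open.Enqueue(start);
        visited[start.x, start.y] = true;

        var offsets = new[] { Vector2Int.up, Vector2Int.right, Vector2Int.down, Vector2Int.left };
        while (open.Count > 0)
        {
            Vector2Int current = open.Dequeue();
            if (current == goal)
            {
                // Walk the cameFrom chain back to the start.
                var path = new List<Vector2Int> { current };
                while (current != start)
                {
                    current = cameFrom[current];
                    path.Add(current);
                }
                path.Reverse();
                return path;
            }
            foreach (var offset in offsets)
            {
                Vector2Int next = current + offset;
                if (next.x < 0 || next.y < 0 || next.x >= width || next.y >= height) continue; // stay on the map
                if (visited[next.x, next.y] || !walkable[next.x, next.y]) continue;
                visited[next.x, next.y] = true;
                cameFrom[next] = current;
                open.Enqueue(next);
            }
        }
        return null; // goal not reachable
    }
}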
The next improvement would be Dijkstra's shortest-path algorithm, and there are plenty of examples of how it works if you search around a bit. It proceeds in a similar way, but uses a priority queue (usually a min-heap) for the unvisited cells, which allows different traversal 'costs' to be specified.
The next step after that would probably be A*, the standard 'optimal' algorithm for pathfinding, although with only 100x100 nodes I would not expect a huge difference. It uses a heuristic to decide which nodes to visit first, so the priority queue orders nodes not just by the cost to reach them but also by the estimated remaining cost.
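For a grid with 4-directional movement, the Manhattan distance is the usual heuristic. A small sketch of how the priority would be formed (note that PriorityQueue<TElement, TPriority> only exists in .NET 6+, so under Unity's current runtime you would substitute a small binary heap of your own):

using UnityEngine;

// Manhattan distance: an admissible heuristic for a 4-directional grid.
static int Heuristic(Vector2Int a, Vector2Int b)
{
    return Mathf.Abs(a.x - b.x) + Mathf.Abs(a.y - b.y);
}

// Inside the search loop, neighbors are then ordered by f = g + h instead of FIFO:
// open.Enqueue(next, costSoFar[next] + Heuristic(next, goal));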
As always when it comes to performance you should measure and profile your code to find bottlenecks.
When to pathfind is a more complicated issue. A fairly simple and probably effective solution is to create a path once and just follow it; if a node on it becomes blocked, redo the pathfinding then. You could also redo the pathfinding every n seconds in case a better path has appeared. If you have multiple units you might introduce additional requirements, like "pushing" blocking units out of the way or keeping groups of units compact as they navigate tight passages. You can probably spend years taking every possible feature into consideration.

Related

Optimal algorithm for merging connected parallel 3D line segments

I have a sketch with 3D line segments. Each segment has a start and an end 3D point. My task is to merge segments if they are parallel and connected. I've implemented it in C#. The algorithm is recursive. Can this code be optimized? Can it be made non-recursive?
/// <summary>
/// Merges segments if they compose a straight line.
/// </summary>
/// <param name="segments">List of segments.</param>
/// <returns>Merged list of segments.</returns>
internal static List<Segment3d> MergeSegments(List<Segment3d> segments)
{
    var result = new List<Segment3d>(segments);
    for (var i = 0; i < result.Count - 1; i++)
    {
        var firstLine = result[i];
        for (int j = i + 1; j < result.Count; j++)
        {
            var secondLine = result[j];
            var startToStartConnected = firstLine.P1.Equals(secondLine.P1);
            var startToEndConnected = firstLine.P1.Equals(secondLine.P2);
            var endToStartConnected = firstLine.P2.Equals(secondLine.P1);
            var endToEndConnected = firstLine.P2.Equals(secondLine.P2);
            if (firstLine.IsParallel(secondLine) && (startToStartConnected || startToEndConnected || endToStartConnected || endToEndConnected))
            {
                Segment3d mergedLine = null;
                if (startToStartConnected)
                {
                    mergedLine = new Segment3d(firstLine.P2, secondLine.P2);
                }
                if (startToEndConnected)
                {
                    mergedLine = new Segment3d(firstLine.P2, secondLine.P1);
                }
                if (endToStartConnected)
                {
                    mergedLine = new Segment3d(firstLine.P1, secondLine.P2);
                }
                if (endToEndConnected)
                {
                    mergedLine = new Segment3d(firstLine.P1, secondLine.P1);
                }
                // Remove duplicate.
                if (firstLine == secondLine)
                {
                    mergedLine = new Segment3d(firstLine.P1, firstLine.P2);
                }
                result.Remove(firstLine);
                result.Remove(secondLine);
                result.Add(mergedLine);
                result = MergeSegments(result);
                return result;
            }
        }
    }
    return result;
}
The classes Segment3d and Point3D are pretty simple:
Class Segment3d:
internal class Segment3d
{
    public Segment3d(Point3D p1, Point3D p2)
    {
        this.P1 = p1;
        this.P2 = p2;
    }

    public bool IsParallel(Segment3d segment)
    {
        // check if segments are parallel
        return true;
    }

    public Point3D P1 { get; }
    public Point3D P2 { get; }
}

internal class Point3D
{
    public double X { get; set; }
    public double Y { get; set; }
    public double Z { get; set; }

    public override bool Equals(object obj)
    {
        // Implement equality logic,
        return true;
    }
}
The optimizations
You are asking how to remove the recursion. However, that isn't the only, nor the largest, problem of your current solution, so I will try to give you an outline of possible directions for the optimization. Unfortunately, since your code still isn't a self-contained minimal reproducible example, it is rather tricky to debug. If I have time in the future, I might revisit this.
First step: Limiting the number of comparisons.
Currently you are performing an unnecessary number of comparisons, because you compare every possible pair of line segments and every possible alignment they can have.
The first step to lowering the number of comparisons is to separate the line segments by their direction. Currently, when you compare two segments, you check whether their directions align, and only if that is the case do you proceed with the rest of the comparisons.
If we sort the segments by direction, we naturally group them into buckets of sorts. Sorting the segments might sound weird, since there are three axes to sort by. That is fine, though, because the only thing we actually care about is that if two normalized direction vectors (x, y, z) and (a, b, c) differ in at least one coordinate, the segments are not parallel.
This sorting can be done by implementing the IComparable interface and then calling the Sort method as you normally would; this has already been described, for example, here. The sorting has complexity O(N * log(N)), which is quite a lot better than your current O(N^2) approach. (Actually, your algorithm is even worse than that, since each recursion step performs another N^2 comparisons, so you might be dangerously close to O(N^4) territory. There be dragons.)
Note: if you are feeling adventurous and O(N * log(N)) is not enough, it may be possible to reach O(N) territory by overriding GetHashCode for your direction vector (System.HashCode.Combine helps here) and finding the groups of parallel lines with sets or dictionaries. However, this can be rather tricky, since floating-point numbers and equality comparisons aren't exactly a friendly combination, so caution is required. Sorting should be enough for most use cases.
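As a rough illustration of the grouping-by-direction idea (the CanonicalDirection helper below is mine, and the plain OrderBy glosses over the floating-point tolerance caveats mentioned above):

using System;
using System.Collections.Generic;
using System.Linq;

internal static class SegmentGrouping
{
    // Hypothetical helper: a normalized direction with a canonical sign, so that
    // segments pointing in opposite directions end up with the same key.
    static (double X, double Y, double Z) CanonicalDirection(Segment3d s)
    {
        double dx = s.P2.X - s.P1.X, dy = s.P2.Y - s.P1.Y, dz = s.P2.Z - s.P1.Z;
        double len = Math.Sqrt(dx * dx + dy * dy + dz * dz);
        dx /= len; dy /= len; dz /= len;
        // Flip the sign so the first nonzero component is positive.
        if (dx < 0 || (dx == 0 && (dy < 0 || (dy == 0 && dz < 0))))
        {
            dx = -dx; dy = -dy; dz = -dz;
        }
        return (dx, dy, dz);
    }

    // Sorting by the direction tuple puts (near-)parallel segments next to each
    // other, so later steps only need to compare neighbors within a bucket.
    internal static List<Segment3d> SortByDirection(IEnumerable<Segment3d> segments)
    {
        return segments.OrderBy(CanonicalDirection).ToList();
    }
}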
So now we have the line segments separated by the direction vectors. Even if you proceed now with your current approach the results should be much better since we lowered the size of the groups to be compared. However we can go further.
Second step: Smart comparison of the segment ends.
Since you want to merge the segments if any of their endpoints coincide (you covered the start-start, start-end, end-start and end-end combinations in your question), we can simply start merging any two identical points we find. The easiest way to do that is, once more, either sorting or hashing. Since we are not normalizing here, which could introduce marginal differences, hashing should be viable; however, sorting is still the more straightforward and "safer" way to go.
One of the many ways this could be done:
Put all of the endpoints into a single linked list as tuples (endpoint, parent segment). Arrays are not suitable, since deletion would be costly.
Sort the list by the endpoints. (Once again, you will need to implement the IComparable interface for your points)
Go through the list. Whenever two neighboring points are equal, remove them and merge their siblings into a new line segment (or delete one of them and the closer sibling if the matching points are both start points or both end points of their segments).
When you have passed through the whole list, all the segments have either been merged or have no more neighbors.
Pick up your survivors and you are done.
The complexity of this step is once again O(N * log(N)), as in the previous case. No recursion necessary, and the speedup should be nice. A rough sketch of the endpoint pass follows.
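The sketch below is heavily simplified: it performs a single merge per scan and does not handle chains of collinear segments, and the PointKey helper is mine. It only illustrates how sorting brings equal endpoints next to each other.

using System.Collections.Generic;
using System.Linq;

internal static class EndpointMerge
{
    // Hypothetical helper: a sortable key for a Point3D.
    static (double X, double Y, double Z) PointKey(Point3D p) => (p.X, p.Y, p.Z);

    internal static List<Segment3d> MergeOnce(List<Segment3d> segments)
    {
        // Flatten every segment into its two endpoints, remembering the parent
        // segment and the opposite endpoint.
        var endpoints = segments
            .SelectMany(s => new[]
            {
                (Point: s.P1, Other: s.P2, Parent: s),
                (Point: s.P2, Other: s.P1, Parent: s)
            })
            .OrderBy(e => PointKey(e.Point))
            .ToList();

        // Equal points end up adjacent after sorting; merge the first matching pair.
        for (int i = 1; i < endpoints.Count; i++)
        {
            var a = endpoints[i - 1];
            var b = endpoints[i];
            if (a.Parent != b.Parent && a.Point.Equals(b.Point) && a.Parent.IsParallel(b.Parent))
            {
                var merged = new Segment3d(a.Other, b.Other);
                var result = segments.Where(s => s != a.Parent && s != b.Parent).ToList();
                result.Add(merged);
                return result;
            }
        }
        return segments;
    }
}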

Where is the likely performance bug here? [closed]

Many of the test cases are timing out. I've made sure I'm using lazy evaluation everywhere, linear (or better) routines, etc. I'm shocked that this is still not meeting the performance benchmarks.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class Mine
{
    public int Distance { get; set; } // from river
    public int Gold { get; set; }     // in tons
}

class Solution
{
    static void Main(String[] args)
    {
        // helper function for reading lines
        Func<string, int[]> LineToIntArray = (line) => Array.ConvertAll(line.Split(' '), Int32.Parse);
        int[] line1 = LineToIntArray(Console.ReadLine());
        int N = line1[0], // # of mines
            K = line1[1]; // # of pickup locations

        // Populate mine info
        List<Mine> mines = new List<Mine>();
        for (int i = 0; i < N; ++i)
        {
            int[] line = LineToIntArray(Console.ReadLine());
            mines.Add(new Mine() { Distance = line[0], Gold = line[1] });
        }

        // helper function for cost of a move
        Func<Mine, Mine, int> MoveCost = (mine1, mine2) =>
            Math.Abs(mine1.Distance - mine2.Distance) * mine1.Gold;

        int sum = 0; // running total of move costs

        // all move combinations on the current list of mines,
        // given by the indices of the mines
        var indices = Enumerable.Range(0, N);
        var moves = from i1 in indices
                    from i2 in indices
                    where i1 != i2
                    select new int[] { i1, i2 };

        while (N != K) // while number of mines hasn't been consolidated to K
        {
            // get move with the least cost
            var cheapest = moves.Aggregate(
                (prev, cur) => MoveCost(mines[prev[0]], mines[prev[1]])
                               < MoveCost(mines[cur[0]], mines[cur[1]])
                               ? prev : cur
            );
            int i = cheapest[0], // index of source mine of cheapest move
                j = cheapest[1]; // index of destination mine of cheapest move

            // add cost to total
            sum += MoveCost(mines[i], mines[j]);
            // move gold from source to destination
            mines[j].Gold += mines[i].Gold;
            // remove from moves any that had the i-th mine as a destination or source
            moves = from move in moves
                    where move[0] == i || move[1] == i
                    select move;
            // update size number of mines after consolidation
            --N;
        }
        Console.WriteLine(sum);
    }
}
Lazy evaluation will not make bad algorithms perform better. It will just delay when those performance problems will affect you. What lazy evaluation can help with is space complexity, i.e. reducing the amount of memory you need to execute your algorithm. Since the data is generated lazily, you will not (necessarily) have all the data in the memory at the same time.
However, relying on lazy evaluation to fix your space (or time) complexity problems can easily shoot you in the foot. Look at the following example code:
var moves = Enumerable.Range(0, 5).Select(x => {
    Console.WriteLine("Generating");
    return x;
});

var aggregate = moves.Aggregate((p, c) => {
    Console.WriteLine("Aggregating");
    return p + c;
});

var newMoves = moves.Where(x => {
    Console.WriteLine("Filtering");
    return x % 2 == 0;
});

newMoves.ToList();
As you can see, both the aggregate and the newMoves rely on the lazily evaluated moves enumerable. Since the original count of moves is 5, we will see 4 “Aggregating” lines in the output, and 5 “Filtering” lines. But how often do you expect “Generating” to appear in the console?
The answer is 10. This is because moves is a generator and is being evaluated lazily. When multiple places request it, an iterator will be created for each, which ultimately means that the generator will execute multiple times (to generate independent results).
This is not necessarily a problem, but in your case it very quickly becomes one. Assume we continue the example code above with another round of aggregating. That second aggregate will consume newMoves, which in turn will consume the original moves. So to aggregate, we re-run both the original moves generator and the newMoves generator. And if we were to add another level of filtering, the next round of aggregating would run three interlocked generators, again re-running the original moves generator.
Since your original moves generator creates an enumerable of quadratic size, and has an actual time complexity of O(n²), this is a real problem. With each iteration you add another round of filtering, which is linear in the size of the moves enumerable, and you consume the enumerable completely for the aggregation. So you end up with O(n² + n³ + n⁴ + n⁵ + …), which is eventually the sum of n^j for j from 2 up to N-K. That is a very bad time complexity, and all just because you were trying to save memory by evaluating the moves lazily.
So the first step to make this better is to avoid lazy evaluation. You are constantly iterating moves and filtering it, so you should keep it in memory. Yes, that means you have an array of quadratic size, but you won't actually need more than that. This also bounds your time complexity: you still need to filter the list in linear time (so O(n²), since its size is n²), and you do that inside a loop, so you end up with cubic time (O(n³)), but that is already your upper limit (iterating the list a constant number of times within the loop only increases the time complexity by a constant factor, and that doesn't matter).
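A sketch of that eager version, reusing N, i and the move representation from the question (the removal predicate is an assumption based on the comment in the original code):

// Materialize the candidate moves once, then mutate the list in place
// instead of stacking Where() filters on a lazy enumerable.
var moves = new List<int[]>();
for (int i1 = 0; i1 < N; ++i1)
    for (int i2 = 0; i2 < N; ++i2)
        if (i1 != i2)
            moves.Add(new[] { i1, i2 });

// ... later, inside the consolidation loop, after picking the cheapest move (i, j):
// drop every move that still references the emptied mine i.
moves.RemoveAll(m => m[0] == i || m[1] == i);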
Once you have done that, you should consider your original problem, think about what you are actually doing. I believe you could probably reduce the computational complexity further if you use the information you have better, and use data structures (e.g. hash sets, or some graph where the move cost is already stored within) that aid you in the filtering and aggregation. I can’t give you exact ideas since I don’t know your original problem, but I’m sure there is something you can do.
Finally, if you have performance problems, always remember to profile your code. The profiler will tell you which parts of your code are the most expensive, so you get a clear idea of what to optimize and what not to focus on (optimizing already-fast parts will not make the program noticeably faster).

Working with micro changes in floats/doubles

The last couple of days have been full of calculations and formulas, and I'm beginning to lose my mind (a little bit), so now I'm turning to you for some insight/help.
Here's the problem: I'm working with Bluetooth beacons that are placed all over a floor of a building to make an indoor GPS showcase. You can use your phone to connect to these beacons, which gives you your longitude and latitude position. These are large float/double values, looking like this:
lat: 52.501288451787076
lng: 6.079107635606511
The actual changes happen at the 4th and 5th position after the decimal point. I'm converting these numbers to the Cartesian coordinate system using:
x = R * cos(lat) * cos(lon)
z = R * sin(lat)
Now the coordinates from this conversion are kind of solid. They are numbers with which I can work with. I use them in a 3d engine (Unity3d) to make a real-time map where you can see where someone is walking.
Now for the actual problem! These beacons are not entirely accurate. The numbers 'jump' up and down even when the phone is lying still, ranging (taking the same latitude as above) from 52.501280 to 52.501296. If we convert this and use it as coordinates in the 3D engine, the user's 'avatar' jumps from one position to another (more small jumps than large ones).
What is a good way to cope with these jumping numbers? I've tried checking for big jumps and ignoring those, but the jumps are still too big, and a broader check results in almost no movement even when the phone really is moving. Or is there a better way to convert the lat and lng values for use in a 3D engine?
If there is someone who has had the same problem as me, some mathematical wonder who can give a good conversion/formula to start with or someone who knows what I'm possibly doing wrong then please, help a fellow programmer out.
Moving Average
You could use a moving average over the last few samples (taken from here: https://stackoverflow.com/a/1305/5089204).
Attention: please read the comments on that answer, as this implementation has some flaws. It is just for a quick test and demonstration.
public class LimitedQueue<T> : Queue<T>
{
    private int limit = -1;

    public int Limit
    {
        get { return limit; }
        set { limit = value; }
    }

    public LimitedQueue(int limit)
        : base(limit)
    {
        this.Limit = limit;
    }

    public new void Enqueue(T item)
    {
        if (this.Count >= this.Limit)
        {
            this.Dequeue();
        }
        base.Enqueue(item);
    }
}
Just test it like this:
var queue = new LimitedQueue<float>(4);
queue.Enqueue(52.501280f);
var avg1 = queue.Average(); //52.50128
queue.Enqueue(52.501350f);
var avg2 = queue.Average(); //52.5013161
queue.Enqueue(52.501140f);
var avg3 = queue.Average(); //52.50126
queue.Enqueue(52.501022f);
var avg4 = queue.Average(); //52.5011978
queue.Enqueue(52.501635f);
var avg5 = queue.Average(); //52.50129
queue.Enqueue(52.501500f);
var avg6 = queue.Average(); //52.5013237
queue.Enqueue(52.501505f);
var avg7 = queue.Average(); //52.5014153
queue.Enqueue(52.501230f);
var avg8 = queue.Average(); //52.50147
The limited queue will not grow; you just define the number of elements you want to average over (in this case I specified 4). The 5th element pushes the first one out, and so on.
The average will always slide smoothly :-)

Limiting List Iterations in a For Loop (Closest To Player)

I have a List<Collider> colliders which is for a tile map.
One approach I thought of is to check the full list of colliders' positions, compare them to the player's position, and add the closest colliders to a temporary list for faster iteration. Doing this only once every 100 ms would reduce the performance cost. But I'd imagine there are better ways than this, right?
I've read a different post on here for collision optimization, and it mentioned using a "CPU Budget", which I intend to implement for this and others as well. I have not read about threading yet.
Shorter question: How can I limit the maximum iterations to only the colliders closest to player?
To reduce the number of colliders to check, you can exclude most of them from the candidate list up front. Keep the colliders in a Dictionary<TKey, Collider> colliders, and also create a second Dictionary<Vector2, List<TKey>> collidersByChunks. As the key of the second dictionary use the coordinate of a chunk, and as the value use the sublist of collider keys in that chunk. Something like this:
class ColliderManager<TKey>
{
    Dictionary<TKey, Collider> colliders = new Dictionary<TKey, Collider>();
    Dictionary<Vector2, List<TKey>> collidersByChunks = new Dictionary<Vector2, List<TKey>>();

    public void AddCollider(TKey pKey, Collider pCollider)
    {
        this.colliders.Add(pKey, pCollider);
        foreach (Vector2 chunkCoord in this.GetChunkCoords(pCollider.Rectangle))
        {
            List<TKey> collidersAtChunk = null;
            if (!this.collidersByChunks.TryGetValue(chunkCoord, out collidersAtChunk))
            {
                collidersAtChunk = new List<TKey>();
                this.collidersByChunks.Add(chunkCoord, collidersAtChunk);
            }
            collidersAtChunk.Add(pKey);
        }
    }

    private Vector2[] GetChunkCoords(Rectangle pRectangle)
    {
        // return all chunks pRectangle intersects
    }
}
When checking, determine which chunks the scanned object crosses and compare it only against the colliders from those chunks.
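For illustration, the lookup side could look something like this (GetCollidersNear is a hypothetical addition to the class above, reusing its GetChunkCoords helper):

// Collect the collider keys registered for every chunk the query rectangle
// touches; the HashSet removes duplicates for colliders spanning several chunks.
public IEnumerable<Collider> GetCollidersNear(Rectangle pRectangle)
{
    var nearbyKeys = new HashSet<TKey>();
    foreach (Vector2 chunkCoord in this.GetChunkCoords(pRectangle))
    {
        if (this.collidersByChunks.TryGetValue(chunkCoord, out List<TKey> keys))
        {
            foreach (TKey key in keys)
                nearbyKeys.Add(key);
        }
    }
    foreach (TKey key in nearbyKeys)
        yield return this.colliders[key];
}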
I found that using List.Where() works best in this case. With a list of 100,000 colliders (somewhat extreme, but quite plausible on a 512x512 map), using:
foreach (TestCollider test in integers.Where(c => c.IsInRange() == true)) { }
Using Stopwatch, iterated through 1,000 times, with a range of 20:
List of 10,000: 0ms always
List of 50,000: 2ms avg.
List of 100,000: 4ms avg.
List of 250,000: 12 ms avg.
If anyone has any suggestions for improvements, I'd be very happy to update this.
You could try something like:
foreach (Collider collider in colliders.OrderBy(c => (c.Position - player.Position).LengthSquared()))
{
    // The tiles closest to the player come first in the ordering.
    // (LengthSquared() assumes an XNA/MonoGame-style Vector2; ordering by the
    // squared distance avoids the square root.)
    if (collider.Rectangle.Intersects(player.Rectangle))
    {
        // Do whatever you want to do here
        break;
    }
}
Don't forget to add using System.Linq; at the top of your code.

How to best implement K-nearest neighbours in C# for large number of dimensions?

I'm implementing the K-nearest neighbours classification algorithm in C# for a training and testing set of about 20,000 samples each, and 25 dimensions.
There are only two classes, represented by '0' and '1' in my implementation. For now, I have the following simple implementation :
// testSamples and trainSamples consists of about 20k vectors each with 25 dimensions
// trainClasses contains 0 or 1 signifying the corresponding class for each sample in trainSamples
static int[] TestKnnCase(IList<double[]> trainSamples, IList<double[]> testSamples, IList<int> trainClasses, int K)
{
    Console.WriteLine("Performing KNN with K = " + K);
    var testResults = new int[testSamples.Count()];
    var testNumber = testSamples.Count();
    var trainNumber = trainSamples.Count();

    // Declaring these here so that I don't have to 'new' them over and over again in the main loop,
    // just to save some overhead
    var distances = new double[trainNumber][];
    for (var i = 0; i < trainNumber; i++)
    {
        distances[i] = new double[2]; // Will store both distance and index in here
    }

    // Performing KNN ...
    for (var tst = 0; tst < testNumber; tst++)
    {
        // For every test sample, calculate distance from every training sample
        Parallel.For(0, trainNumber, trn =>
        {
            var dist = GetDistance(testSamples[tst], trainSamples[trn]);
            // Storing distance as well as index
            distances[trn][0] = dist;
            distances[trn][1] = trn;
        });

        // Sort distances and take top K (?What happens in case of multiple points at the same distance?)
        var votingDistances = distances.AsParallel().OrderBy(t => t[0]).Take(K);

        // Do a 'majority vote' to classify test sample
        var yea = 0.0;
        var nay = 0.0;
        foreach (var voter in votingDistances)
        {
            if (trainClasses[(int)voter[1]] == 1)
                yea++;
            else
                nay++;
        }
        if (yea > nay)
            testResults[tst] = 1;
        else
            testResults[tst] = 0;
    }
    return testResults;
}

// Calculates and returns square of Euclidean distance between two vectors
static double GetDistance(IList<double> sample1, IList<double> sample2)
{
    var distance = 0.0;
    // assume sample1 and sample2 are valid i.e. same length
    for (var i = 0; i < sample1.Count; i++)
    {
        var temp = sample1[i] - sample2[i];
        distance += temp * temp;
    }
    return distance;
}
This takes quite a bit of time to execute; on my system it takes about 80 seconds to complete. How can I optimize this while ensuring that it will also scale to a larger number of data samples? As you can see, I've tried using PLINQ and parallel for loops, which did help (without them, it was taking about 120 seconds). What else can I do?
I've read about KD-trees being efficient for KNN in general, but every source I read stated that they're not efficient for higher dimensions.
I also found this stackoverflow discussion about this, but it seems like this is 3 years old, and I was hoping that someone would know about better solutions to this problem by now.
I've looked at machine learning libraries in C#, but for various reasons I don't want to call R or C code from my C# program, and some other libraries I saw were no more efficient than the code I've written. Now I'm just trying to figure out how I could write the most optimized code for this myself.
Edited to add - I cannot reduce the number of dimensions using PCA or something. For this particular model, 25 dimensions are required.
Whenever you are attempting to improve the performance of code, the first step is to analyze the current performance to see exactly where it is spending its time. A good profiler is crucial for this. In my previous job I used the dotTrace profiler to good effect; Visual Studio also has a built-in profiler. A good profiler will tell you exactly where your code is spending time, method by method or even line by line.
That being said, a few things come to mind in reading your implementation:
You are parallelizing the inner loop. Could you parallelize the outer loop instead? There is a small but nonzero cost associated with each delegate call (see here or here), which may be hitting you in the Parallel.For callback.
Similarly, there is a small performance penalty for indexing through an array via its IList interface. You might consider declaring the array arguments to GetDistance() explicitly as double[].
How large is K compared to the size of the training array? You are completely sorting the distances array and taking the top K, but if K is much smaller than the array size it might make sense to use a partial sort / selection algorithm, for instance by keeping a SortedSet of the K best candidates and evicting the largest element whenever the set grows past K; a sketch follows below.
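A minimal sketch of that selection idea (the method name and the tuple representation are mine; ties on distance are broken by the index, so the set entries stay distinct):

using System.Collections.Generic;

// Keep only the K nearest (distance, index) pairs instead of sorting the whole
// distances array: O(n log K) instead of O(n log n).
static List<(double Distance, int Index)> KSmallest(double[] distances, int k)
{
    var best = new SortedSet<(double Distance, int Index)>();
    for (int i = 0; i < distances.Length; i++)
    {
        best.Add((distances[i], i));
        if (best.Count > k)
            best.Remove(best.Max); // evict the current worst candidate
    }
    return new List<(double Distance, int Index)>(best);
}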
