Linq to find pair of points with longest length? - c#

I have the following code:
foreach (Tuple<Point, Point> pair in pointsCollection)
{
var points = new List<Point>()
{
pair.Item1,
pair.Item2
};
}
Within this foreach, I would like to determine which pair of points has the greatest length between its two coordinates.
So, let's say that points are made up of the following pairs:
(1) var points = new List<Point>()
{
new Point(0,100),
new Point(100,100)
};
(2) var points = new List<Point>()
{
new Point(150,100),
new Point(200,100)
};
So I have the two pairs of points above, and both plot a horizontal line. I am interested in the best approach to find the pair of points with the greatest distance between them, whether vertical or horizontal. In the two examples above, the first pair has a difference of 100 in the X coordinate, so it is the pair with the most significant difference. But given a collection of pairs, where some plot a vertical line and some a horizontal line, what is the best approach for retrieving the pair whose difference, again vertical or horizontal, is the greatest in the collection?
Thanks!
Chris

Use OrderBy to create an ordering based on your criteria, then select the first one. In this case, order by the distance between the two points, in descending order.
EDIT: Actually, I think you should be doing this on the Tuples themselves, right? I'll work on adapting the example to that.
First, let's add an extension method for Tuple<Point,Point> to calculate its length.
public static class TupleExtensions
{
public static double Length( this Tuple<Point,Point> tuple )
{
var first = tuple.Item1;
var second = tuple.Item2;
double deltaX = first.X - second.X;
double deltaY = first.Y - second.Y;
return Math.Sqrt( deltaX * deltaX + deltaY * deltaY );
}
}
Now we can order the tuples by their length
var max = pointsCollection.OrderByDescending( t => t.Length() )
.FirstOrDefault();
Note: it is faster to just iterate over the collection and keep track of the maximum rather than sorting/selecting with LINQ.
Tuple<Point,Point> max = null;
foreach (var tuple in pointsCollection)
{
if (max == null || tuple.Length() > max.Length())
{
max = tuple;
}
}
Obviously, this could be refactored to an IEnumerable extension if you used it in more than one place.
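One small refinement to the loop above, sketched here: it recomputes max.Length() on every comparison, so caching the best length seen so far avoids the repeated square roots.

Tuple<Point, Point> max = null;
double maxLength = double.NegativeInfinity;
foreach (var tuple in pointsCollection)
{
    double length = tuple.Length();
    if (length > maxLength) // any real length beats -infinity, so no null check needed
    {
        maxLength = length;
        max = tuple;
    }
}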

You'll need a function, probably using the Pythagorean theorem, to calculate the distances:
a^2 + b^2 = c^2
Where a would be the difference in Point.X, b would be the difference in Point.Y, and c would be your distance. And once that function has been written, then you can go to LINQ and order on the results.
Here's what I did. (Note: I do not have C# 4, so it's not apples to apples.)
private double GetDistance(Point a, Point b)
{
return Math.Pow(Math.Pow(Math.Abs(a.X - b.X), 2) + Math.Pow(Math.Abs(a.Y - b.Y), 2), 0.5);
}
You can turn that into an anonymous method or Func if you prefer, obviously.
var query = pointlistCollection.OrderByDescending(pair => GetDistance(pair[0], pair[1])).First();
Where pointlistCollection is a List<List<Point>>, each inner list having two items. Quick example, but it works.
List<List<Point>> pointlistCollection
= new List<List<Point>>()
{
new List<Point>() { new Point(0,0), new Point(3,4)},
new List<Point>() { new Point(5,5), new Point (3,7)}
};
Here is my GetDistance function in Func form.
Func<Point, Point, double> getDistance
= (a, b)
=> Math.Pow(Math.Pow(Math.Abs(a.X - b.X), 2) + Math.Pow(Math.Abs(a.Y - b.Y), 2), 0.5);
var query = pointlistCollection.OrderByDescending(pair => getDistance(pair[0], pair[1])).First();

As commented above: Don't sort the list in order to get a maximum.
public static double Norm(Point x, Point y)
{
return Math.Sqrt(Math.Pow(x.X - y.X, 2) + Math.Pow(x.Y - y.Y, 2));
}
Max() needs only O(n) instead of O(n*log n)
pointsCollection.Max(t => Norm(t.Item1, t.Item2));
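Note that Max as written returns the greatest length itself, not the winning pair. If you need the pair without sorting, .NET 6 and later also provide MaxBy, which performs the same O(n) scan and returns the element; a minimal sketch:

// .NET 6+ only: returns the tuple with the largest Norm, or null for an empty sequence
var longest = pointsCollection.MaxBy(t => Norm(t.Item1, t.Item2));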

tvanfosson's answer is good, but I would like to suggest a slight improvement: you don't actually need to sort the collection to find the max, you just have to enumerate the collection and keep track of the maximum value. Since it's a very common scenario, I wrote an extension method to handle it:
public static class EnumerableExtensions
{
public static T WithMax<T, TValue>(this IEnumerable<T> source, Func<T, TValue> selector)
{
var max = default(TValue);
var withMax = default(T);
bool first = true;
foreach (var item in source)
{
var value = selector(item);
// Always take the first item; afterwards keep whichever value compares larger
int compare = Comparer<TValue>.Default.Compare(value, max);
if (compare > 0 || first)
{
max = value;
withMax = item;
}
first = false;
}
return withMax;
}
}
You can then do something like this:
Tuple<Point, Point> max = pointsCollection.WithMax(t => t.Length());

Related

C# - Comparing items between 2 lists

I've got two lists with the following identical fields (but different content):
TriangleID
Perimeter (in pixels)
My task is to extract the pairs of triangles whose perimeter difference is smaller than a fixed threshold.
I'd like to do it with Linq.
It's not LINQ that matters here, but the collection's size (N). In the worst case (all triangles equal) you have to return every possible pair as the solution; there are
N * N
pairs. With N ~ 1e6 triangles you would get on the order of a trillion (1e12) pairs as an answer, which is too much for a modern personal computer (on a supercomputer, however, you can try solving the problem).
Let's assume that you don't have the worst case and you expect to obtain at most ~N pairs. You can do it like this (C# pseudocode):
// Sort triangles by their perimeters
firstList.Sort((t1, t2) => t1.Perimeter.CompareTo(t2.Perimeter));
foreach (left in secondList) {
//TODO: you have to implement BinarySearchIndex
int from = firstList.BinarySearchIndex(left.Perimeter - threshold);
int to = firstList.BinarySearchIndex(left.Perimeter + threshold);
// Scan all triangles within borders
for (int i = from; i <= to; ++i) {
triangle right = firstList[i];
// return pair if right and left are different triangles
if (right.Id != left.Id)
yield return Pair(left, right);
}
}
Time complexity is
O(N log N) /* sorting */
+ O(N log N) /* foreach (N) * binary search (log N) * inner scan (O(1) outside the worst case) */
= O(N log N)
I'm assuming it's ok to return a Triangle and associate a collection of nearby other Triangles, instead of returning a list of pairs.
The idea is to sort both lists and then iterate through them. The first list is iterated over every single item, and its perimeter is compared to items in the second list. But not every item in the second list needs to be checked: since both lists are sorted, you can iterate through the second list only until perimeters fall outside the cutoff. The other time saver is advancing the starting index of the second list based on the perimeter in the first list; one possible version of that step is sketched after the code.
public class Triangle
{
public int TriangleId {get; set;}
public int Perimeter {get; set;}
}
// Returns a dictionary, that each triangle has an associated list of other triangles
// with a perimeter within a specified distance. This list may be empty.
public Dictionary<Triangle, List<Triangle>> NearbyPerimeter(List<Triangle> primary, List<Triangle> compareList, int maxDistance)
{
// sort ~ O(n log n)
// The sort is required to make an orderly advance through both lists, otherwise
// every element needs to be compared to every other element.
var sorteda = primary.OrderBy(x => x.Perimeter);
// Call ToList to allow indexing with []
var sortedb = compareList.OrderBy(x => x.Perimeter).ToList();
var results = new Dictionary<Triangle, List<Triangle>>();
int minCompareIndex = 0;
int compareCount = compareList.Count;
// ~ O(n)
foreach (var tprime in sorteda)
{
var neighbors = new List<Triangle>();
// Add logic to advance minCompareIndex based on
// which is larger, tprime.Perimeter or sortedb[minCompareIndex].Perimeter
int i = minCompareIndex;
var foundMatch = false;
// Until the missing logic above is added, this is O(n) x O(n) so ~ O(n^2)
while (i < compareCount)
{
var second = sortedb[i];
if (Math.Abs(tprime.Perimeter - second.Perimeter) < maxDistance)
{
neighbors.Add(second);
foundMatch = true;
}
else if (foundMatch)
{
break;
}
i++;
}
results.Add(tprime, neighbors);
}
return results;
}
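For completeness, here is one possible version of the missing advance logic, a sketch only: because sorteda is processed in ascending perimeter order, any second-list item whose perimeter is already maxDistance or more below the current one can never match again and can be skipped permanently.

// Inside the foreach, before the while loop:
while (minCompareIndex < compareCount &&
       sortedb[minCompareIndex].Perimeter <= tprime.Perimeter - maxDistance)
{
    minCompareIndex++; // this difference only grows as tprime.Perimeter increases
}

With that in place, the two loops together make a single forward pass over each list, so the scan after the O(n log n) sorts is roughly O(n) plus the size of the output.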
You can do it with Linq but with a small artefact - basically what you need is a Cartesian product of both collections and a filter on the differences. The Cartesian product can be obtained using Join and an always true condition.
The code below should do the trick (I'm assuming the lists contain a class called Triangle; if this is not the case adjust the code to your needs):
var results = list1.Join(list2,
_ => true,
_ => true,
(t1, t2) => new { Triangle1 = t1, Triangle2 = t2})
.Where(pair => Math.Abs(pair.Triangle1.Perimeter - pair.Triangle2.Perimeter) < threshold)
.Select(pair => new{/*…*/});
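As a side note, the same Cartesian product reads a bit more naturally as a cross join (two from clauses, i.e. SelectMany) than as a Join with constant keys; a sketch using the same assumed Triangle class and threshold:

var results = from t1 in list1
              from t2 in list2
              where Math.Abs(t1.Perimeter - t2.Perimeter) < threshold
              select new { Triangle1 = t1, Triangle2 = t2 };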

I have a list with over a million objects in it, trying to find the fastest way to search through it

I have a list that stores well over a million objects within it. I need to look through the list and update the objects found through the below query as efficiently as possible.
I was thinking of using a Dictionary or HashSet, but I'm relatively new to C# and could not figure out how to implement these other two approaches. My current code is simply a LINQ statement searching through an IList.
public IList<LandObject> landObjects = new List<LandObject>();
var lObjsToUpdate = from obj in landObjects
where
obj.position.x >= land.x - size.x &&
obj.position.x <= land.x + size.x &&
obj.position.y >= land.y - size.y &&
obj.position.y <= land.y + size.y
select obj;
foreach(var item in lObjsToUpdate)
{
//do what I need to do with records I've found
}
Could anyone be so kind as to suggest how I might approach this efficiently?
The real answer should involve performance tests and comparisons, and depends on your runtime environment (memory vs. cpu, etc).
The first thing I would try: since land and size are constant, you can maintain a simple HashSet<LandObject> of objects that fit the criteria (in addition to a list or set of all objects, or just of all other objects). Every time a new object is added, check whether it fits the criteria and, if so, add it to that set. I don't know how good HashSet is at avoiding collisions when working with object references, but you can try to measure it.
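A minimal sketch of that idea, assuming the land and size values from the question and a hypothetical AddLandObject method through which all insertions go:

HashSet<LandObject> inRange = new HashSet<LandObject>();

public void AddLandObject(LandObject obj)
{
    landObjects.Add(obj);
    // Classify once at insertion time instead of scanning a million items later
    if (obj.position.x >= land.x - size.x && obj.position.x <= land.x + size.x &&
        obj.position.y >= land.y - size.y && obj.position.y <= land.y + size.y)
    {
        inRange.Add(obj);
    }
}

The update loop then just iterates inRange instead of filtering landObjects.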
BTW, this related question discussing memory limits of HashSet<int> might interest you. With .NET < 4.5 your HashSet should be ok up to several million items. From what I understand, with some configuration .NET 4.5 removes the limitation of 2gb max object size and you'll be able to go crazy, assuming you have a 64-bit machine.
One thing that will probably help with that many iterations is to do the calculations once and use precomputed variables inside your query. It should also help a little to widen the range by 1 on each end so you can drop the equality checks:
public IList<LandObject> landObjects = new List<LandObject>();
float xmax = land.x + size.x + 1;
float xmin = land.x - size.x - 1;
float ymax = land.y + size.y + 1;
float ymin = land.y - size.y - 1;
var lObjsToUpdate = from obj in landObjects
where
obj.position.x > xmin &&
obj.position.x < xmax &&
obj.position.y > ymin &&
obj.position.y < ymax
select obj;
Ideally you need a way to partition elements so that you don't have to test every single one to find the ones that fit and the ones that should be thrown out. How you partition will depend on how dense the items are - it might be as simple as partitioning on the integer portion of the X coordinate for instance, or by some suitable scaled value of that coordinate.
Given a method (let's call it Partition for now) that takes an X coordinate and returns a partition value for it, you can filter on the X coordinate fairly quickly as a first-pass to reduce the total number of nodes you need to check. You might need to play with the partition function a little to get the right distribution though.
For example, say that you have floating-point coordinates in the range -100 < X <= 100, with your 1,000,000+ objects distributed fairly uniformly across that range. That would divide the list into 200 partitions of (on average) 5000 entries if partitioned on integer values of X. That means that for every integer step in the X dimension of your search range you only have ~5,000 entries to test.
Here's some code:
public interface IPosition2F
{
float X { get; }
float Y { get; }
}
public class CoordMap<T> where T : IPosition2F
{
SortedDictionary<int, List<T>> map = new SortedDictionary<int,List<T>>();
readonly Func<float, int> xPartition = (x) => (int)Math.Floor(x);
public void Add(T entry)
{
int xpart = xPartition(entry.X);
List<T> col;
if (!map.TryGetValue(xpart, out col))
{
col = new List<T>();
map[xpart] = col;
}
col.Add(entry);
}
public T[] ExtractRange(float left, float top, float right, float bottom)
{
var rngLeft = xPartition(left) - 1;
var rngRight = xPartition(right) + 1;
var cols =
from keyval in map
where keyval.Key >= rngLeft && keyval.Key <= rngRight
select keyval.Value;
var cells =
from cell in cols.SelectMany(c => c)
where cell.X >= left && cell.X <= right &&
cell.Y >= top && cell.Y <= bottom
select cell;
return cells.ToArray();
}
public CoordMap()
{ }
// Create instance with custom partition function
public CoordMap(Func<float, int> partfunc)
{
xPartition = partfunc;
}
}
That will partition on the X coordinate, reducing your final search space. If you wanted to take it a step further you could also partition on the Y coordinate... I'll leave that as an exercise for the reader :)
If your partition function is very finely grained and could result in a large number of partitions, it might be useful to add a ColumnRange function similar to:
public IEnumerable<List<T>> ColumnRange(int left, int right)
{
using (var mapenum = map.GetEnumerator())
{
bool finished = mapenum.MoveNext();
while (!finished && mapenum.Current.Key < left)
finished = mapenum.MoveNext();
while (!finished && mapenum.Current.Key <= right)
{
yield return mapenum.Current.Value;
finished = mapenum.MoveNext();
}
}
}
The ExtractRange method can then use that like so:
public T[] ExtractRange(float left, float top, float right, float bottom)
{
var rngLeft = xPartition(left) - 1;
var rngRight = xPartition(right) + 1;
var cells =
from cell in ColumnRange(rngLeft, rngRight).SelectMany(c => c)
where cell.X >= left && cell.X <= right &&
cell.Y >= top && cell.Y <= bottom
select cell;
return cells.ToArray();
}
I used SortedDictionary for convenience, and because it makes it possible to do an ExtractRange method that is reasonably quick. There are other container types that are possibly better suited to the task.

Alternatives to nested Select in Linq

Working on a clustering project, I stumbled upon this, and I'm trying to figure out if there's a better solution than the one I've come up with.
PROBLEM: Given a List<Point> Points of points in R^n (you can think of every Point as a double array of dimension n), a double minDistance and a distance Func<Point,Point,double> dist, write a LINQ expression that returns, for each point, the set of other points in the list that are closer to it than minDistance according to dist.
My solution is the following:
var lst = Points.Select(
x => Points.Where(z => dist(x, z) < minDistance)
.ToList() )
.ToList();
So, after noticing that
Using LINQ is probably not the best idea, because you get to calculate every distance twice
The problem doesn't have much practical use
My code, even if bad looking, works
I have the following questions:
Is it possible to translate my code in query expression? and if so, how?
Is there a better way to solve this in dot notation?
The problem definition, that you want "for each point, the set of other points", makes it impossible to solve without the inner query - you could only disguise it in a clever manner. If you can change your data storage policy and don't have to stick to LINQ, then, in general, there are many approaches to the Nearest Neighbour Search problem. You could, for example, hold the points sorted according to their values on one axis, which can speed up neighbour queries by eliminating some candidates early without a full distance calculation. Here is a paper with this approach: Flexible Metric Nearest Neighbor Classification.
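To give a concrete flavour of that pruning, here is a minimal sketch; it assumes dist is the Euclidean metric and that a Point can be indexed by coordinate (both assumptions, not given in the question). If two points differ by minDistance or more along any single axis, their Euclidean distance is at least minDistance, so a scan over points sorted by one coordinate can stop early:

// Sort once by the first coordinate, then only compare nearby runs
var sorted = Points.OrderBy(p => p[0]).ToList();
for (int i = 0; i < sorted.Count; i++)
{
    for (int j = i + 1; j < sorted.Count; j++)
    {
        // Later points only get farther away along axis 0, so stop here
        if (sorted[j][0] - sorted[i][0] >= minDistance) break;
        if (dist(sorted[i], sorted[j]) < minDistance)
        {
            // sorted[i] and sorted[j] are neighbours of each other
        }
    }
}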
Because Points is a List you can take advantage of the fact that you can access each item by its index. So you can avoid comparing each item twice with something like this:
var lst =
from i in Enumerable.Range(0, Points.Count)
from j in Enumerable.Range(i + 1, Points.Count - i - 1)
where dist(Points[i], Points[j]) < minDistance
select new
{
x = Points[i], y = Points[j]
};
This will return a set composed of all pairs of points within minDistance of each other, which is not exactly the result you wanted. If you want to turn it into some kind of Lookup so you can see which points are close to a given point, you can do this:
var lst =
(from i in Enumerable.Range(0, Points.Count)
from j in Enumerable.Range(i + 1, Points.Count - i - 1)
where dist(Points[i], Points[j]) < minDistance
select new { x = Points[i], y = Points[j] })
.SelectMany(pair => new[] { pair, new { x = pair.y, y = pair.x } })
.ToLookup(pair => pair.x, pair => pair.y);
You could add a bool property to your Point class to mark that it has already been visited, to avoid calling dist twice; something like this:
public class Point {
//....
public bool IsBrowsed {get;set;}
}
var lst = Points.Select(
x => {
var list = Points.Where(z => !z.IsBrowsed && dist(x, z) < minDistance).ToList();
x.IsBrowsed = true;
return list;
})
.ToList();

Multidimensional array, elements accessed by vector

Is there any multidimensional array/collection/whatever datatype in .Net, elements of which can be accessed by vector (to vary number of dimensions easily)? Like this (C#):
var array = new Smth<double>(capacity: new int[] {xCap, yCap, zCap});
array[new int[] {x, y, z}] = 10.0;
To clarify: there is no need to explain how I can write such a datatype manually.
Update:
I mean varying before creation, not after.
// 3D array
var array = new Smth<double>(capacity: new int[] {xCap, yCap, zCap});
array[new int[] {x, y, z}] = 10.0;
// 6D array
var array = new Smth<double>(capacity: new int[] {xCap, yCap, zCap, tCap, vCap, mCap});
array[new int[] {x, y, z, t, v, m}] = 10.0;
Although there are no off-the-shelf collections like that, you can easily emulate them using a Dictionary<int[],double> and a custom IEqualityComparer<int[]>, like this:
class ArrayEq : IEqualityComparer<int[]> {
public bool Equals(int[] a, int[] b) {
return a.SequenceEqual(b);
}
public int GetHashCode(int[] a) {
return a.Aggregate(0, (p, v) => 31*p + v);
}
}
With this equality comparer in hand, you can do this:
// The number of dimensions does not matter: if you pass a different number
// of dimensions, nothing bad is going to happen.
IDictionary<int[],double> array = new Dictionary<int[],double>(new ArrayEq());
array[new[] {1,2,3}] = 4.567;
array[new[] {1,2,-3}] = 7.654; // Negative indexes are OK
double x = array[new[] {1,2,3}]; // Get 4.567 back
If you need to have a certain capacity and a specific number of dimensions, you can modify the ArrayEq to be more strict at validating the data.
If you knew the number of dimensions at compile-time, you could use one of the Tuple<...> classes instead of arrays for potentially better performance. You could also define extension methods on multi-dimensional, say, double[,,,], arrays, to take vectors of indexes. Neither of these two approaches offers the same flexibility, though (which is a common trade-off -- better performance can often be gained by reducing flexibility).
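For instance, with a fixed three dimensions, the Tuple variant might look like this sketch; Tuple implements structural equality and hashing, so no custom comparer is needed:

var array = new Dictionary<Tuple<int, int, int>, double>();
array[Tuple.Create(1, 2, 3)] = 4.567;
double x = array[Tuple.Create(1, 2, 3)]; // 4.567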
EDIT: If you need to pre-allocate the storage and avoid storing your indexes, you could implement a multi-dimensional array yourself - like this:
class MultiD<T> {
private readonly T[] data;
private readonly int[] mul;
public MultiD(int[] dim) {
// Add some validation here:
// - Make sure dim has at least one dimension
// - Make sure that all dim's elements are positive
var size = dim.Aggregate(1, (p, v) => p * v);
data = new T[size];
mul = new int[dim.Length];
mul[0] = 1;
for (int i = 1; i < mul.Length; i++) {
mul[i] = mul[i - 1] * dim[i - 1];
}
}
private int GetIndex(IEnumerable<int> ind) {
// Row-major mapping: dot product of the index vector with the stride vector
return ind.Zip(mul, (a, b) => a*b).Sum();
}
public T this[int[] index] {
get { return data[GetIndex(index)]; }
set { data[GetIndex(index)] = value; }
}
}
This is a straightforward implementation of row-major indexing scheme that uses generics.
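Usage mirrors the example from the question:

// 4 x 5 x 6 array, addressed by an index vector
var array = new MultiD<double>(new[] { 4, 5, 6 });
array[new[] { 1, 2, 3 }] = 10.0;
double v = array[new[] { 1, 2, 3 }]; // 10.0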

What is the best way in linq to calculate the percentage from a list?

I have a list of 1s and 0s and I now have to calculate the percentage: a 1 means he achieved it, a 0 means he didn't. So e.g. -
{1,1,0,0,0}
So, for example, if the list has 5 items and he got 2 ones, then his percentage is 40%. Is there a function or way in LINQ to do it easily, maybe in one line? I am sure LINQ experts have a suave way of doing it.
What about
var list = new List<int>{1,1,0,0,0};
var percentage = ((double)list.Sum())/list.Count*100;
or if you want to get the percentage of a specific element
var percentage = ((double)list.Count(i=>i==1))/list.Count*100;
EDIT
Note BrokenGlass's solution and use the Average extension method for the first case as in
var percentage = list.Average() * 100;
In this special case you can also use Average():
var list = new List<int> {1,1,0,0,0};
double percent = list.Average() * 100;
If you're working with any ICollection<T> (such as List<T>) the Count property will probably be O(1); but in the more general case of any sequence the Count() extension method is going to be O(N), making it less than ideal. Thus for the most general case you might consider something like this which counts elements matching a specified predicate and all elements in one go:
public static double Percent<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
int total = 0;
int count = 0;
foreach (T item in source)
{
++count;
if (predicate(item))
{
total += 1;
}
}
return (100.0 * total) / count;
}
Then you'd just do:
var list = new List<int> { 1, 1, 0, 0, 0 };
double percent = list.Percent(i => i == 1);
Output:
40
Best way to do it:
var percentage = ((double)list.Count(i=>i==1))/list.Count*100;
or
var percentage = ((double)list.Count(i=>i <= yourValueHere))/list.Count*100;
If You
want to do it in one line
don't want to maintain an extension method
can't take advantage of list.Sum() because your list data isn't 1s and 0s
you can do something like this:
percentAchieved = (int)
((double)(from MyClass myClass
in myList
where myClass.SomeProperty == "SomeValue"
select myClass).ToList().Count /
(double)myList.Count *
100.0
);
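For what it's worth, the predicate overload of Count collapses the same computation into one shorter line (assuming the same myList of MyClass items):

percentAchieved = (int)(100.0 * myList.Count(mc => mc.SomeProperty == "SomeValue") / myList.Count);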
