Remove duplicates in a list of XYZ points - c#

Mylist.GroupBy(x => new{x.X, x.Y}).Select(g => g.First()).ToList<XYZ>();
The code above works fine for me, but I only want to compare the points on their components rounded to 5 decimal places.
For example, x.X = 16.838974347323224 should be compared only as x.X = 16.83897, because I have seen inaccuracies beyond the fifth decimal place. Any suggestions?
Solution:
Mylist.GroupBy(x => new { X = Math.Round(x.X,5), Y = Math.Round(x.Y,5) })
.Select(g => g.First()).ToList();

Using Round can create a situation where two numbers, even though incredibly close to each other, can end up being considered distinct.
Take this example:
var Mylist = new []
{
new { X = 1.0000051, Y = 1.0 },
new { X = 1.0000049, Y = 1.0 },
new { X = 1.1, Y = 1.0 },
new { X = 1.0, Y = 1.005 },
};
The first two values are very close; in fact they only start to differ in the 6th decimal place.
But what if we run this code:
var result =
Mylist
.GroupBy(x => new
{
X = Math.Round(x.X,5, MidpointRounding.AwayFromZero),
Y = Math.Round(x.Y,5, MidpointRounding.AwayFromZero)
})
.Select(g => g.First())
.ToList();
The result still contains all four items.
The rounding has allowed the first two values to be kept, because they round to 1.00001 and 1.00000 respectively.
The correct approach is to filter by distance. If a subsequent value is within a threshold of the previous values it should be discarded.
Here's the code that does that:
var threshold = 0.000001;
Func<double, double, double, double, double> distance
= (x0, y0, x1, y1) =>
Math.Sqrt(Math.Pow(x1 - x0, 2.0) + Math.Pow(y1 - y0, 2.0));
var result = Mylist.Skip(1).Aggregate(Mylist.Take(1).ToList(), (xys, xy) =>
{
if (xys.All(xy2 => distance(xy.X, xy.Y, xy2.X, xy2.Y) >= threshold))
{
xys.Add(xy);
}
return xys;
});
Now if we run that on the Mylist data, the second value is discarded and three items remain.
This is a better idea for removing duplicates.
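To make the difference between the two approaches easy to reproduce, here is a minimal self-contained sketch. The class name DedupDemo, the method names ByRounding and ByDistance, and the use of value tuples in place of the XYZ type are assumptions made for illustration; they are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DedupDemo
{
    // The four sample points used in the answer above.
    public static readonly (double X, double Y)[] Sample =
    {
        (1.0000051, 1.0), (1.0000049, 1.0), (1.1, 1.0), (1.0, 1.005)
    };

    // Groups by coordinates rounded to `digits` decimals and keeps the first point of each group.
    public static List<(double X, double Y)> ByRounding(IEnumerable<(double X, double Y)> points, int digits)
    {
        return points
            .GroupBy(pt => (Math.Round(pt.X, digits, MidpointRounding.AwayFromZero),
                            Math.Round(pt.Y, digits, MidpointRounding.AwayFromZero)))
            .Select(g => g.First())
            .ToList();
    }

    // Keeps a point only if it is at least `threshold` away from every point kept so far.
    public static List<(double X, double Y)> ByDistance(IEnumerable<(double X, double Y)> points, double threshold)
    {
        var kept = new List<(double X, double Y)>();
        foreach (var pt in points)
        {
            bool farFromAll = kept.All(q =>
                Math.Sqrt((pt.X - q.X) * (pt.X - q.X) + (pt.Y - q.Y) * (pt.Y - q.Y)) >= threshold);
            if (farFromAll)
            {
                kept.Add(pt);
            }
        }
        return kept;
    }
}
```

Calling DedupDemo.ByRounding(DedupDemo.Sample, 5) returns all four points, while DedupDemo.ByDistance(DedupDemo.Sample, 0.000001) returns three. Note that the distance filter compares every point with every kept point, so it is quadratic in the worst case.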

To do so use Math.Round:
var result = Mylist.GroupBy(x => new { X = Math.Round(x.X,5, MidpointRounding.AwayFromZero), Y = Math.Round(x.Y,5, MidpointRounding.AwayFromZero) })
.Select(g => g.First()).ToList();
However if what you want is to remove duplicates then instead of GroupBy go for one of these:
Select rounded and then Distinct:
var result = Mylist.Select(item => new XYZ { X = Math.Round(item.X,5, MidpointRounding.AwayFromZero),
Y = Math.Round(item.Y,5, MidpointRounding.AwayFromZero)})
.Distinct().ToList();
Distinct with Equals and GetHashCode overridden on the class (Equals would do the rounding); I wouldn't suggest this
Distinct and implement a custom IEqualityComparer:
public class RoundedXyzComparer : IEqualityComparer<XYZ>
{
public int RoundingDigits { get; set; }
public RoundedXyzComparer(int roundingDigits)
{
RoundingDigits = roundingDigits;
}
public bool Equals(XYZ x, XYZ y)
{
return Math.Round(x.X, RoundingDigits, MidpointRounding.AwayFromZero) == Math.Round(y.X, RoundingDigits, MidpointRounding.AwayFromZero) &&
Math.Round(x.Y,RoundingDigits, MidpointRounding.AwayFromZero) == Math.Round(y.Y, RoundingDigits, MidpointRounding.AwayFromZero);
}
public int GetHashCode(XYZ obj)
{
return Math.Round(obj.X, RoundingDigits, MidpointRounding.AwayFromZero).GetHashCode() ^
Math.Round(obj.Y, RoundingDigits, MidpointRounding.AwayFromZero).GetHashCode();
}
}
//Use:
var result = Mylist.Distinct(new RoundedXyzComparer(5)).ToList();


Merge first list with second list based on standard deviation of second list C#

Given 2 datasets (both are sequences of standard deviations away from a number; we are looking for the overlapping sections):
var list1 = new decimal[] { 357.06m, 366.88m, 376.70m, 386.52m, 406.15m };
var list2 = new decimal[] { 370.51m, 375.62m, 380.72m, 385.82m, 390.93m };
I would like to perform a merge with items from List2 being placed closest to items of List1, within a certain range, i.e. merge List2 element within 5.10 (standard deviation) of List1 element:
357.06
366.88 => 370.51
376.70 => 375.62, 380.72
386.52 => 390.93
406.15
The idea is to cluster values from List2 and count them; in this case the element with value 376.70 would have the highest significance, as it has 2 close neighbors, 375.62 and 380.72 (whereas 366.88 and 386.52 have only 1 match, and the remaining elements have none within range).
Which C# math/stats libraries could be used for this (or would there be a better way to combine statistically)?
If this is more of a computer science or stats question, apologies in advance; I will close it and reopen it on the relevant SO site.
Assuming that list2 is sorted (if not, put Array.Sort(list2);) you can try Binary Search:
Given:
var list1 = new decimal[] { 357.06m, 366.88m, 376.70m, 386.52m, 406.15m };
var list2 = new decimal[] { 370.51m, 375.62m, 380.72m, 385.82m, 390.93m };
decimal sd = 5.10m;
Code:
// Array.Sort(list2); // Uncomment, if list2 is not sorted
List<(decimal value, decimal[] list)> result = new List<(decimal value, decimal[] list)>();
foreach (decimal value in list1) {
int leftIndex = Array.BinarySearch<decimal>(list2, value - sd);
if (leftIndex < 0)
leftIndex = -leftIndex - 1;
else // edge case: value - sd occurs in list2; step back to its first occurrence
for (; leftIndex >= 1 && list2[leftIndex - 1] == value - sd; --leftIndex) ;
int rightIndex = Array.BinarySearch<decimal>(list2, value + sd);
if (rightIndex < 0)
rightIndex = -rightIndex - 1;
else // edge case: value + sd occurs in list2; step past its last occurrence so it is included
for (++rightIndex; rightIndex < list2.Length && list2[rightIndex] == value + sd; ++rightIndex) ;
result.Add((value, list2.Skip(leftIndex).Take(rightIndex - leftIndex).ToArray()));
}
Let's have a look:
string report = string.Join(Environment.NewLine, result
.Select(item => $"{item.value} => [{string.Join(", ", item.list)}]"));
Console.Write(report);
Outcome:
357.06 => []
366.88 => [370.51]
376.70 => [375.62, 380.72]
386.52 => [385.82, 390.93]
406.15 => []
Something like this should work:
var list1 = new double[] { 357.06, 366.88, 376.70, 386.52, 406.15 };
var list2 = new double[] { 370.51, 375.62, 380.72, 385.82, 390.93 };
double dev = 5.1;
var result = new Dictionary<double, List<double>>();
foreach (var l in list2) {
var diffs = list1.Select(r => new { diff = Math.Abs(r - l), r })
.Where(d => d.diff <= dev)
.MinBy(r => r.diff)
.FirstOrDefault();
if (diffs == null) {
continue;
}
List<double> list;
if (! result.TryGetValue(diffs.r, out list)) {
list = new List<double>();
result.Add(diffs.r, list);
}
list.Add(l);
}
It uses MinBy from MoreLinq, but it is easy to modify to work without it.
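As a rough illustration of the "without MinBy" variant mentioned above, the sketch below replaces the MoreLinq call with OrderBy followed by FirstOrDefault from plain LINQ. The class name NearestMerge and the method name Merge are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NearestMerge
{
    // Assigns each value of list2 to the closest value of list1 that lies within dev.
    public static Dictionary<double, List<double>> Merge(double[] list1, double[] list2, double dev)
    {
        var result = new Dictionary<double, List<double>>();
        foreach (var l in list2)
        {
            // OrderBy + FirstOrDefault stands in for MoreLinq's MinBy;
            // FirstOrDefault returns null when no list1 value is within dev.
            var nearest = list1
                .Select(r => new { diff = Math.Abs(r - l), r })
                .Where(d => d.diff <= dev)
                .OrderBy(d => d.diff)
                .FirstOrDefault();
            if (nearest == null)
            {
                continue;
            }
            List<double> list;
            if (!result.TryGetValue(nearest.r, out list))
            {
                list = new List<double>();
                result.Add(nearest.r, list);
            }
            list.Add(l);
        }
        return result;
    }
}
```

Unlike the binary-search answer, this assigns each list2 value to exactly one list1 value (the nearest one). Sorting the candidates for every element is slower than a true minimum search, which is unlikely to matter for lists of this size.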
In fact, you don't need extra libs or something else. You can use just LINQ for this.
internal class Program
{
private static void Main(string[] args)
{
var deviation = 5.1M;
var list1 = new decimal[] { 357.06M, 366.88M, 376.70M, 386.52M, 406.15M };
var list2 = new decimal[] { 370.51M, 375.62M, 380.72M, 385.82M, 390.93M };
var result = GetDistribution(list1.ToList(), list2.ToList(), deviation);
result.ForEach(x => Console.WriteLine($"{x.BaseValue} => {string.Join(", ", x.Destribution)} [{x.Weight}]"));
Console.ReadLine();
}
private static List<Distribution> GetDistribution(List<decimal> baseList, List<decimal> distrebutedList, decimal deviation)
{
return baseList.Select(x =>
new Distribution
{
BaseValue = x,
Destribution = distrebutedList.Where(y => x - deviation < y && y < x + deviation).ToList()
}).ToList();
}
}
internal class Distribution
{
public decimal BaseValue { get; set; }
public List<decimal> Destribution { get; set; }
public int Weight => Destribution.Count;
}
I hope it was useful for you.

Convert array of doubles to list object using linq

I have the following array of coordinates:
double[] points = { 1, 2, 3, 4, 5, 6 };
Then I have the following class:
public class clsPoint
{
public double X { get; set; }
public double Y { get; set; }
}
I need to copy the points into a list of objects, where the first value in each pair from the array is the X and the second is the Y. Here is what I have so far, but it is not correct:
List<clsPoint> lstPoints = points
.Select(coord => new clsPoint
{
X = coord[0],
Y = coord[1]
}).ToList();
Expected Results
clsPoint Objects List (lstPoints)
X = 1 , Y = 2
X = 3 , Y = 4
X = 5 , Y = 6
Any help would be appreciated. Thanks.
You can generate a sequence of consecutive integers from 0 up to half the length of your array, then project using those values as indices to get the pairs.
var result = Enumerable.Range(0, points.Length / 2).Select(i => new clsPoint { X = points[2 * i], Y = points[2 * i + 1] });
Update
This is another solution using Zip extension method and one overload of Where extension method to get the index:
var r2 = points.Where((e, i) => i % 2 == 0)
.Zip(points.Where((e, i) => i % 2 != 0), (a, b) => new clsPoint{X= a, Y= b });
I think there is probably a better way for you to compose your points prior to feeding them into your class. A simple for loop may suffice better in this situation as well.
However, in LINQ, you would first use a projection to gather the index so that you could group based on pairs and then use a second projection from the grouping to populate the class.
It looks like this:
points.Select((v,i) => new {
val = v,
i = i
}).GroupBy(o => o.i%2 != 0 ? o.i-1 : o.i).Select(g => new clsPoint() {
X = g.First().val,
Y = g.Last().val
});
Using the overload of Select that receives the current index you can set a grouping rule (in this case a different id for each 2 numbers), then group by it and eventually create your new clsPoint:
double[] points = { 1, 2, 3, 4, 5, 6 };
var result = points.Select((item, index) => new { item, index = index / 2 })
.GroupBy(item => item.index, item => item.item)
.Select(group => new clsPoint { X = group.First(), Y = group.Last() })
.ToList();
Doing it with a simple for loop would look like:
List<clsPoint> result = new List<clsPoint>();
for (int i = 0; i < points.Length; i += 2)
{
result.Add(new clsPoint { X = points[i], Y = points.ElementAtOrDefault(i+1) });
}
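If you can target .NET 6 or later (an assumption; the question does not state a framework version), Enumerable.Chunk offers another way to express the same pairing. The class name PointPairs and the method name FromFlatArray are made up for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class clsPoint
{
    public double X { get; set; }
    public double Y { get; set; }
}

public static class PointPairs
{
    // Splits the flat array into chunks of two values and maps each chunk to a point.
    public static List<clsPoint> FromFlatArray(double[] points)
    {
        return points
            .Chunk(2) // requires .NET 6 or later
            .Select(pair => new clsPoint
            {
                X = pair[0],
                // A trailing unpaired value gets Y = 0, mirroring ElementAtOrDefault in the loop version.
                Y = pair.Length > 1 ? pair[1] : 0.0
            })
            .ToList();
    }
}
```

For the array { 1, 2, 3, 4, 5, 6 } this yields the three points (1, 2), (3, 4) and (5, 6).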

How to group in Linq based on previous Value

I want to group a point cloud based on 2 conditions.
First, simply on Y, so I wrote pointcloudH.GroupBy(KVP => KVP.Value.Y), where KVP is a KeyValuePair<string,System.Drawing.Point>.
Now I also want to group it by X if X == (previousX + 1).
As far as I know I should use ThenBy(), but what do I have to write between the brackets?
Here is an example to better illustrate what I want to achieve:
Sample pointcloud
(x|y) (1|1),(2|1),(4|1),(1|2),(2|3),(3|3),(4|3),(5|8),(9|10)
after step 1. it looks like this
group1 (1|1),(2|1),(4|1)
group2 (1|2)
group3 (2|3),(3|3),(4|3)
group4 (5|8)
group5 (9|10)
after step 2. it should look like this
group1 (1|1),(2|1)
group2 (4|1)
group3 (1|2)
group4 (2|3),(3|3),(4|3)
group5 (5|8)
group6 (9|10)
current code
var Hgroup = pointcloudH.OrderBy(KVP => KVP.Value.Y) // order by Y
.GroupBy(KVP => KVP.Value.Y) // group by Y
.ThenBy(KVP => KVP.Value.X); // group by X ???
I don't think LINQ is the best tool for this kind of job, but it can be achieved. The important part is to think of the relation between your Point.X and the index of the relative Point in the Point.Y group. Once you realize you want to group them by Point.X - Index, you can do:
var Hgroup = pointcloudH.OrderBy(p => p.Y)
.GroupBy(p => p.Y)
.SelectMany(yGrp =>
yGrp.Select((p, i) => new {RelativeIndex = p.X - i, Point = p})
.GroupBy(ip => ip.RelativeIndex, ip => ip.Point)
.Select(ipGrp => ipGrp.ToList()))
.ToList();
Note that this will probably perform worse than a regular iterative algorithm. My pointcloudH is an array, but you can just change the lambda to reflect your own list. Also, remove the ToList() if you want to defer execution; it is only there to ease inspecting the result in the debugger.
If you want to group all points in a Point.Y group regardless of their original order (i.e. order by Point.X as well), add ThenBy(p => p.X) after the first OrderBy clause.
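For comparison, the regular iterative algorithm mentioned above could look roughly like the following. This is only a sketch under assumptions: it uses value tuples instead of System.Drawing.Point, and the names PointRuns and GroupRuns are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PointRuns
{
    // Sorts by Y, then X, and starts a new group whenever Y changes
    // or X is not exactly one more than the previous X.
    public static List<List<(int X, int Y)>> GroupRuns(IEnumerable<(int X, int Y)> points)
    {
        var groups = new List<List<(int X, int Y)>>();
        List<(int X, int Y)> current = null;
        foreach (var pt in points.OrderBy(q => q.Y).ThenBy(q => q.X))
        {
            bool continuesRun = current != null
                && current[current.Count - 1].Y == pt.Y
                && current[current.Count - 1].X + 1 == pt.X;
            if (!continuesRun)
            {
                current = new List<(int X, int Y)>();
                groups.Add(current);
            }
            current.Add(pt);
        }
        return groups;
    }
}
```

On the sample point cloud from the question this produces the six groups listed under step 2.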
Your problem cannot be solved by doing 2 separate group by clauses. I have created a little sample which should work for your problem. These are the key things that are happening in the code:
Construct 'mirror' array and insert a copy of the first item at index 0, this is used to keep track of the previous point
Create a variable that is incremented whenever a 'chain' is broken. This is whenever the next value is not equal to the previous + 1. This way we can group by an unique key per 'chain'.
class Program
{
public struct Point
{
public static Point Create(int x, int y)
{
return new Point() { X = x, Y = y };
}
public int X { get; set; }
public int Y { get; set; }
public override string ToString()
{
return string.Format("({0}|{1})", X, Y);
}
}
static void Main(string[] args)
{
//helper to avoid too many keystrokes :)
var f = new Func<int, int, Point>(Point.Create);
//compose the point array
//(1|1),(2|1),(4|1),(1|2),(2|3),(3|3),(4|3),(5|8),(9|10)
var points = new[] { f(1, 1), f(2, 1), f(4, 1), f(1, 2), f(2, 3), f(3, 3), f(4, 3), f(5, 8), f(9, 10) }.OrderBy(p => p.Y).ThenBy(p => p.X);
//create a 'previous point' array which is a copy of the source array with a item inserted at index 0
var firstPoint = points.FirstOrDefault();
var prevPoints = new[] { f(firstPoint.X - 1, firstPoint.Y) }.Concat(points); // Concat rather than Union: Union would drop duplicate points and shift the pairing
//keep track of a counter which will be the second group by key. The counter is raised whenever the previous X was not equal
//to the current - 1
int counter = 0;
//the actual group by query
var query = from point in points.Select((x, ix) => new { current = x, prev = prevPoints.ElementAt(ix) })
group point by new { point.current.Y, prev = (point.prev.X == point.current.X - 1 ? counter : ++counter) };
//method chaining equivalent
query = points.Select((x, ix) => new { current = x, prev = prevPoints.ElementAt(ix) })
.GroupBy(point => new { point.current.Y, prev = (point.prev.X == point.current.X - 1 ? counter : ++counter) });
//print results
foreach (var item in query)
Console.WriteLine(string.Join(", ", item.Select(x=> x.current)));
Console.Read();
}
}

Finding maximum of Nth elements of a list of lists in C#

I have a list of lists in C#, where each sublist has three doubles, representing a 3D point:
{{x1, y1, z1},
{x2, y2, z2},
{x3, y3, z3}}
I want to find the 3D bounding box of this dataset, and that means finding minimum X, maximum X, minimum Y, maximum Y, etc.
With Python/Numpy, I would get it with, say, zmax = list_of_lists[:,2].max(), etc.
Is there an elegant way to do this in C#? I suspect Linq is the way to go, but I haven't understood how it works yet (if some answer includes Linq, please explain how it works, please :o)
Create your list like this, possibly use a specific class for 3D points rather than tuples if you require more functionality.
var points = new[] {
Tuple.Create(x1, y1, z1),
Tuple.Create(x2, y2, z2),
Tuple.Create(x3, y3, z3)
};
Then, like @dasblinkenlight writes, you can use LINQ to select Max() or Min():
var maxX = points.Select(pt => pt.Item1).Max();
Or even shorter:
var maxX = points.Max(pt => pt.Item1);
Edit: A simple class for ease of use:
class Point3D {
public double X { get; set; }
public double Y { get; set; }
public double Z { get; set; }
public Point3D(double x, double y, double z) {
this.X = x;
this.Y = y;
this.Z = z;
}
}
var points = new[] {
new Point3D(x1, y1, z1),
new Point3D(x2, y2, z2),
new Point3D(x3, y3, z3)
};
var maxX = points.Max(pt => pt.X);
Yes, you can do that in C# with LINQ, like this:
var orig = new List<List<double>>();
var maxX = orig.Select(pt => pt[0]).Max();
var maxY = orig.Select(pt => pt[1]).Max();
var maxZ = orig.Select(pt => pt[2]).Max();
The way this works in LINQ is that the list of points is traversed, for each point, the requested coordinate is selected, and then the Max is computed. There is also a Min function to get you the other corner of the bounding box.
This is somewhat suboptimal, because the list is traversed multiple times. A pair of nested loops would probably do the same thing more efficiently, while remaining just as readable:
var min = new List<double>{double.MaxValue, double.MaxValue, double.MaxValue};
var max = new List<double>{double.MinValue, double.MinValue, double.MinValue};
foreach (var point in orig) {
for (var i = 0 ; i != 3 ; i++) {
min[i] = Math.Min(min[i], point[i]);
max[i] = Math.Max(max[i], point[i]);
}
}
If this is really an IEnumerable of IEnumerables of doubles, always in the same order X, Y, Z, then you could do something like this:
var list = new []
{
new [] { 1, 2, 3 },
new [] { 4, 5, 6 },
new [] { 7, 8, 9 }
};
var maxX = list.Select(s => s.First()).Max(); // first element of each sub-sequence is X
var minY = list.Select(s => s.Skip(1).First()).Min(); // second element of each sub-sequence is Y
If all elements are of the same type and the property you're interested in is named, say, Z, you can do this:
var max = list_of_points.Max(p => p.Z);
In your case, since you seem to be using a double[][], you can do this:
var max = list_of_lists.Max(p => p[0]);
So the full bounding box would be defined by two points, like this:
var topLeft = new double[]
{
list_of_lists.Min(p => p[0]),
list_of_lists.Min(p => p[1]),
list_of_lists.Min(p => p[2])
};
var bottomRight = new double[]
{
list_of_lists.Max(p => p[0]),
list_of_lists.Max(p => p[1]),
list_of_lists.Max(p => p[2])
};
However, this requires you to make 6 passes through the list (one per coordinate for each of the two corners). You can do this for some performance improvement:
var topLeft = list_of_lists.Aggregate((s, p) => new double[] { Math.Min(s[0], p[0]), Math.Min(s[1], p[1]), Math.Min(s[2], p[2]) });
var bottomRight = list_of_lists.Aggregate((s, p) => new double[] { Math.Max(s[0], p[0]), Math.Max(s[1], p[1]), Math.Max(s[2], p[2]) });
This would make only 2 passes through the list.
NOTE: The same code can be done with List<List<double>> rather than double[][] by simply adding .ToList() after each array declaration.
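Going one step further, both corners can be computed in a single pass by giving Aggregate a seed that holds the running minimum and maximum. The sketch below is an illustration only; the names BoundingBoxSketch and Compute are assumptions, and it expects every point to have at least three coordinates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BoundingBoxSketch
{
    // Walks the points once and updates the running minimum and maximum of each coordinate.
    public static (double[] Min, double[] Max) Compute(IEnumerable<IReadOnlyList<double>> points)
    {
        var seed = (
            Min: new[] { double.PositiveInfinity, double.PositiveInfinity, double.PositiveInfinity },
            Max: new[] { double.NegativeInfinity, double.NegativeInfinity, double.NegativeInfinity });

        return points.Aggregate(seed, (acc, p) =>
        {
            for (int i = 0; i < 3; i++)
            {
                acc.Min[i] = Math.Min(acc.Min[i], p[i]);
                acc.Max[i] = Math.Max(acc.Max[i], p[i]);
            }
            return acc;
        });
    }
}
```

For an empty input the result keeps the infinite seed values, which the caller would need to check for. The accumulator arrays are mutated in place, so this is Aggregate used as a compact loop rather than a purely functional fold.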
Have you taken a look at the SelectMany linq operator? I think it may be helpful for your problem.

Filtering two arrays to avoid Inf/NaN values

I have two arrays of doubles of the same size, containing X and Y values for some plots.
I need some kind of protection against INF/NaN values: I need to find all pairs of values (X, Y) for which neither X nor Y is INF or NaN.
If I have one array, I can do it using lambdas:
var filteredValues = someValues.Where(d=> !(double.IsNaN(d) || double.IsInfinity(d))).ToList();
Now, for two arrays I use the following loop:
List<double> filteredX=new List<double>();
List<double> filteredY=new List<double>();
for(int i=0;i<XValues.Count;i++)
{
if(!double.IsNaN(XValues[i]) &&
!double.IsInfinity(XValues[i]) &&
!double.IsNaN(YValues[i]) &&
!double.IsInfinity(YValues[i]) )
{
filteredX.Add(XValues[i]);
filteredY.Add(YValues[i]);
}
}
Is there a way of filtering two arrays at the same time using LINQ/lambdas, as it was done for the single array?
Unfortunately I can use only .NET 3.5.
Slight correction for Mark's original answer:
var filteredValues = XValues.Zip(YValues, (x,y) => new { x, y })
.Where(p => !(double.IsNaN(p.x) || double.IsNaN(p.y) ||
double.IsInfinity(p.x) || double.IsInfinity(p.y)))
.ToList();
Alternatively, you might want to make it slightly neater:
Func<double, bool> valid = z => !double.IsNaN(z) && !double.IsInfinity(z);
var filteredValues = XValues.Zip(YValues, (x,y) => new { x, y })
.Where(p => valid(p.x) && valid(p.y))
.ToList();
If you then need the results back into two lists, you can do:
var filteredX = filteredValues.Select(p => p.x).ToList();
var filteredY = filteredValues.Select(p => p.y).ToList();
.NET 4.0 introduces the Enumerable.Zip extension method to accomplish iterating over enumerables "in parallel" as you describe.
I haven't used it myself, but it should be something like:
var filteredValues =
XValues.Zip(YValues, (x,y) => new { X = x, Y = y})
.Where( o =>
!(double.IsNaN(o.X) || double.IsNaN(o.Y) || double.IsInfinity(o.X) || double.IsInfinity(o.Y)))
.ToList();
(Sorry for the funny indentation, wanted it to be more readable on SO)
OK, so you can't use .NET 4.0 and therefore can't use the Zip extension.
Or can you?
public static IEnumerable<TResult> Zip<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
using (var eFirst = first.GetEnumerator())
using (var eSecond = second.GetEnumerator())
{
while (eFirst.MoveNext() && eSecond.MoveNext())
yield return resultSelector(eFirst.Current, eSecond.Current);
}
}
See for yourself :)
static void Main(string[] args)
{
var x = new double[] { 0.0, 1.0, 2.0, double.NaN, 4.0, 5.0 };
var y = new double[] { 0.5, 1.5, double.PositiveInfinity, 3.5, 4.5, 5.5 };
// note: using KeyValuePair<double, double> --
// you could just as easily use your own custom type
// (probably a simple struct)
var zipped = x.Zip(y, (a, b) => new KeyValuePair<double, double>(a, b))
.Where(kvp => IsValid(kvp.Key) && IsValid(kvp.Value))
.ToList();
foreach (var z in zipped)
Console.WriteLine("X: {0}, Y: {1}", z.Key, z.Value);
}
static bool IsValid(double value)
{
return !double.IsNaN(value) && !double.IsInfinity(value);
}
Output:
X: 0, Y: 0.5
X: 1, Y: 1.5
X: 4, Y: 4.5
X: 5, Y: 5.5
You can try to do this:
doubleArray1.Zip(doubleArray2, (x, y) => Tuple.Create(x, y))
.Where(tup => !double.IsNaN(tup.Item1) &&
!double.IsNaN(tup.Item2) &&
!double.IsInfinity(tup.Item1) &&
!double.IsInfinity(tup.Item2));
Alternatively, you could make a method for filtering and zipping at the same time; the major benefit is that you don't depend on Enumerable.Zip from .NET 4:
public static IEnumerable<Tuple<TOne, TTwo>> DualWhere<TOne, TTwo>(this IEnumerable<TOne> one, IEnumerable<TTwo> two, Func<TOne, TTwo, bool> predicate)
{
// using blocks dispose both enumerators even if the caller stops iterating early
using (var oneEnumerator = one.GetEnumerator())
using (var twoEnumerator = two.GetEnumerator())
{
while (oneEnumerator.MoveNext() && twoEnumerator.MoveNext())
{
if (predicate(oneEnumerator.Current, twoEnumerator.Current))
yield return Tuple.Create(oneEnumerator.Current, twoEnumerator.Current);
}
}
}
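Edit: The latter should work with .NET 3.5, provided you replace Tuple (which was only added in .NET 4) with your own pair type or KeyValuePair.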
Here's a solution that will work in C# 3 and .NET 3.5
List<double> list1 = new List<double>() { 1.2, 3.8, double.NaN, 17.8 };
List<double> list2 = new List<double>() { 9.4, double.PositiveInfinity, 10.4, 26.2 };
var query = from x in list1.Select((item, idx) => new { item, idx })
where !double.IsNaN(x.item) && !double.IsInfinity(x.item)
join y in list2.Select((item, idx) => new { item, idx })
on x.idx equals y.idx
where !double.IsNaN(y.item) && !double.IsInfinity(y.item)
select new { X = x.item, Y = y.item };
Iterating over the query would produce pairs containing 1.2 & 9.4 and 17.8 & 26.2. The middle two pairs would be discarded because one contains NaN and the other contains infinity.
foreach (var pair in query)
{
Console.WriteLine("{0}\t{1}", pair.X, pair.Y);
}
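Another option that stays within .NET 3.5 is to filter on the index instead of zipping or joining. The following is a minimal sketch; the class name PairFilter and the method names are assumptions, and it deliberately avoids tuples and Zip.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairFilter
{
    private static bool IsValid(double value)
    {
        return !double.IsNaN(value) && !double.IsInfinity(value);
    }

    // Keeps only the positions where both the X and the Y value are finite numbers.
    public static void Filter(IList<double> xs, IList<double> ys,
                              out List<double> filteredX, out List<double> filteredY)
    {
        // Collect the indices of the valid pairs once, then project both lists from them.
        List<int> keep = Enumerable.Range(0, Math.Min(xs.Count, ys.Count))
            .Where(i => IsValid(xs[i]) && IsValid(ys[i]))
            .ToList();
        filteredX = keep.Select(i => xs[i]).ToList();
        filteredY = keep.Select(i => ys[i]).ToList();
    }
}
```

Because the indices are materialized first, the two output lists stay aligned, which matches the behaviour of the original for loop.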
