Alternatives to nested Select in Linq - c#

Working on a clustering project, I stumbled upon this, and I'm trying to figure out if there's a better solution than the one I've come up with.
PROBLEM: Given a List<Point> Points of points in R^n (you can think of every Point as a double array of dimension n), a double minDistance and a distance Func<Point,Point,double> dist, write a LINQ expression that returns, for each point, the set of other points in the list that are closer to it than minDistance according to dist.
My solution is the following:
var lst = Points.Select(
        x => Points.Where(z => dist(x, z) < minDistance)
                   .ToList())
    .ToList();
So, after noticing that
Using LINQ is probably not the best idea, because you get to calculate every distance twice
The problem doesn't have much practical use
My code, even if bad looking, works
I have the following questions:
Is it possible to translate my code into a query expression? And if so, how?
Is there a better way to solve this in dot notation?

The problem definition, which asks for "for each point, the set of other points", makes it impossible to solve without an inner query; at best you could disguise it in a clever manner. If you can change how the data is stored and do not have to stick to LINQ, there are many approaches to the nearest neighbour search problem in general. You could, for example, keep the points sorted by their value on one axis, which speeds up neighbour queries by eliminating some candidates early, without a full distance calculation. This approach is described in the paper Flexible Metric Nearest Neighbor Classification.
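As a rough illustration of the axis-sorting idea, here is a minimal sketch. It is not taken from the paper: the names NeighbourSearch and Neighbours are hypothetical, points are modelled as double[] rather than the question's Point class, and it assumes that dist(a, b) is never smaller than the difference on the first coordinate (true for the Euclidean distance) and that every array is a distinct instance.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NeighbourSearch
{
    // Maps every point to the other points closer than minDistance.
    // Assumption: dist(a, b) >= |a[0] - b[0]|, as for the Euclidean distance.
    public static Dictionary<double[], List<double[]>> Neighbours(
        IEnumerable<double[]> points,
        Func<double[], double[], double> dist,
        double minDistance)
    {
        // Sort once by the first coordinate.
        var sorted = points.OrderBy(p => p[0]).ToList();
        // Keys are compared by reference, so each array must be a distinct instance.
        var result = sorted.ToDictionary(p => p, _ => new List<double[]>());

        for (int i = 0; i < sorted.Count; i++)
        {
            for (int j = i + 1; j < sorted.Count; j++)
            {
                // Once the gap on the first axis reaches minDistance,
                // no later point in the sorted order can qualify.
                if (sorted[j][0] - sorted[i][0] >= minDistance) break;

                // Each pair is measured only once and recorded in both directions.
                if (dist(sorted[i], sorted[j]) < minDistance)
                {
                    result[sorted[i]].Add(sorted[j]);
                    result[sorted[j]].Add(sorted[i]);
                }
            }
        }
        return result;
    }
}
```

The early break is what saves the distance calculations; in the worst case (all points close together on the first axis) it still compares every pair.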

Because Points is a List you can take advantage of the fact that you can access each item by its index. So you can avoid comparing each item twice with something like this:
var lst =
    from i in Enumerable.Range(0, Points.Count)
    from j in Enumerable.Range(i + 1, Points.Count - i - 1)
    where dist(Points[i], Points[j]) < minDistance
    select new
    {
        x = Points[i], y = Points[j]
    };
This returns every pair of points that lie within minDistance of each other, which is not exactly the result you wanted. If you want to turn it into a Lookup, so you can see which points are close to a given point, you can do this:
var lst =
    (from i in Enumerable.Range(0, Points.Count)
     from j in Enumerable.Range(i + 1, Points.Count - i - 1)
     where dist(Points[i], Points[j]) < minDistance
     select new { x = Points[i], y = Points[j] })
    .SelectMany(pair => new[] { pair, new { x = pair.y, y = pair.x } })
    .ToLookup(pair => pair.x, pair => pair.y);

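I think you could add a bool property to your Point class to mark that it has already been visited, which prevents dist from being called twice for the same pair. Note that each point's list then only contains points that have not been visited yet. Something like this: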
public class Point {
    //....
    public bool IsBrowsed { get; set; }
}
var lst = Points.Select(
        x => {
            var list = Points.Where(z => !z.IsBrowsed && dist(x, z) < minDistance).ToList();
            x.IsBrowsed = true;
            return list;
        })
    .ToList();

Related

How to get biggest element in HashSet of object by field?

Suppose I have the class
public class Point {
    public float x, y, z;
}
And I've created this hashset:
HashSet<Point> H;
How can I get the element of H with the biggest z? It doesn't need necessarily to use Linq.
You can use Aggregate to mimic MaxBy functionality (note that you need to check if collection has any elements first):
var maxByZ = H.Aggregate((point, point1) => point.z > point1.z ? point : point1);
.NET 6 and later have a built-in MaxBy.
You could do this:
float maxZ = H.Max(point => point.z);
var maxPointByZ = H.Where(point => point.z == maxZ).FirstOrDefault();
This works by first retrieving the largest value of z in the set:
H.Max(point2 => point2.z) //Returns the largest value of z in the set
And then by using a simple Where clause to get the element whose z is equal to that value. If there are multiple matches, it returns the first one, so you may want to sort the enumerable in advance.
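Both answers above either need a non-empty collection or walk the set twice. A single-pass helper might look like the sketch below; MaxHelper, MaxByOrDefault and Point3 are hypothetical names introduced for illustration (Point3 stands in for the question's Point class), and returning null for an empty input is an assumed design choice.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the question's Point class.
public class Point3
{
    public float x, y, z;
}

public static class MaxHelper
{
    // Returns the element with the largest key, or null when the sequence is empty.
    public static T MaxByOrDefault<T>(IEnumerable<T> source, Func<T, float> key) where T : class
    {
        T best = null;
        float bestKey = float.NegativeInfinity;
        foreach (var item in source)
        {
            float k = key(item);
            // The first element always wins; later ones only if strictly larger.
            if (best == null || k > bestKey)
            {
                best = item;
                bestKey = k;
            }
        }
        return best;
    }
}
```

With the HashSet from the question, the call would be MaxHelper.MaxByOrDefault(H, p => p.z).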

Get sample set from large dataset

I have an in-memory dataset from which I'm trying to get an evenly distributed sample using LINQ. From what I've seen, there isn't anything that does this out of the box, so I'm trying to come up with some kind of composition or extension that will perform the sampling.
What I'm hoping for is something that I can use like this:
var sample = dataset.Sample(100);
var smallSample = smallDataset.Sample(100);
Assert.IsTrue(dataset.Count() > 100);
Assert.IsTrue(sample.Count() == 100);
Assert.IsTrue(smallDataset.Count() < 100);
Assert.IsTrue(smallSample.Count() == smallDataset.Count());
The composition I started with, which only works some of the time, is this:
var sample = dataset
    .Select((v,i) => new Tuple<string, int>(v,i))
    .Where(t => t.Item2 / (double)(dataset.Count() / SampleSize) % 1 != 0)
    .Select(t => t.Item1);
This works when the dataset size and the sample size share a common divisor and the sample size is greater than 50% of the dataset size. Or something like that.
Any help would be excellent!
Update: So I have the following non-LINQ logic that works, but I'm trying to figure out if this can be "LINQ'd" somehow.
var sample = new List<T>();
// Cast before dividing; integer division would truncate the ratio.
double sampleRatio = (double)dataset.Count() / sampleSize;
for (var i = 0; i < dataset.Count(); i++)
{
    if ((sample.Count() * sampleRatio) <= i)
        sample.Add(dataset.Skip(i).FirstOrDefault());
}
I can't find a satisfactory LINQ solution, mainly because iterating LINQ statements are not aware of the length of the sequence they work on -- which is OK: it totally fits LINQ's deferred-execution and streaming approach. Of course it's possible to store the length in a variable and use this in a Where statement, but that's not in line with LINQ's functional (stateless) paradigm, so I always try to avoid that.
The Aggregate statement can be stateless and length-aware, but I tend to find solutions using Aggregate rather contrived and hard to read. It's nothing but a covert stateful loop; for and foreach take some more lines, but are far easier to follow.
I can offer you an extension method that does what you want:
public static IEnumerable<T> TakeProrated<T>(this IEnumerable<T> sequence, int numberOfItems)
{
    var local = sequence.ToList();
    var n = Math.Min(local.Count, numberOfItems);
    if (n <= 0) yield break; // nothing to return; also avoids dividing by zero below
    var dist = (decimal)local.Count / n;
    for (int i = 0; i < n; i++)
    {
        var index = (int)(Math.Ceiling(i * dist));
        yield return local[index];
    }
}
The idea is that the required distance between items is calculated first. Then the requested number of items is returned, each time skipping roughly this distance, sometimes more, sometimes less, but evenly distributed. The choice between Math.Ceiling and Math.Floor is arbitrary: one introduces a bias toward items higher in the sequence, the other toward items lower.
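To make the spacing concrete, the following sketch only computes the indices such a prorated pick would visit, using the same rounding as the extension method. ProratedIndices and Pick are hypothetical names used for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ProratedIndices
{
    // Lists the indices visited when picking n items out of count items,
    // rounding each position up with Math.Ceiling.
    public static List<int> Pick(int count, int n)
    {
        var indices = new List<int>();
        n = Math.Min(count, n);
        if (n <= 0) return indices;

        decimal step = (decimal)count / n; // required distance between picks
        for (int i = 0; i < n; i++)
        {
            indices.Add((int)Math.Ceiling(i * step));
        }
        return indices;
    }
}
```

For example, Pick(10, 4) uses a step of 2.5 and yields the indices 0, 3, 5 and 8; when more items are requested than exist, as in Pick(3, 100), every index is returned once.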
I think I understand what you're looking for. From what I understand, you're looking to return only a certain quantity of entities in a dataset. As my comment to your original post asks, have you tried using the Take operator? What you're looking for is something like this.
// .Skip is optional, but you can use it with it.
// Just ensure that instead of .FirstOrDefault(), you use Take(quantity)
var sample = dataSet.Skip(amt).Take(dataSet.Count() / desiredSampleSize);

I'm trying to simplify with linq a statement that takes 2 lists of numbers and subtracts the first one from the second one

I'm trying to simplify, with linq, and hopefully make cheaper, a statement that takes 2 lists of numbers and subtracts the first one from the second one. I have something that works but I think it could be cleaner and more efficient.
double[] main = _mainPower.Select(i => i.Decibels).ToArray();
double[] reference = _referencePower.Select(i => i.Decibels).ToArray();
List<double> amplitudeList = new List<double>();
for (int i = 0; i < main.Count(); i++)
{
    if (!double.IsNaN(main[i] - reference[i]))
    {
        amplitudeList.Add(main[i] - reference[i]);
    }
}
return amplitudeList;
If I have 2 lists List1 = {8,5,3} and List2 = {5,2,1} the list returned would be {3,3,2}
I have tried
return _mainPower.Select(i => i.Decibels - _referencePower.Select(a => a.Decibels));
but it obviously does not work. Is there a way to turn my function into a nice linq query? One thing that I haven't allowed for is if the lists are 2 different sizes. If the sizes are different then the longer list should be trimmed from the end to make them the same as the smaller one.
Any help would be appreciated.
Thank you,
--EDIT--
Thanks for the help I used the post from StriplingWarrior to get what I needed.
_mainPower.Zip(_referencePower, (v1, v2) => v1.Decibels - v2.Decibels).Where(i => !double.IsNaN(i));
This should do:
return _mainPower.Zip(_referencePower, (v1, v2) => v1.Decibels - v2.Decibels);
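Zip also covers the case of lists with different sizes, because it stops as soon as the shorter input runs out. The sketch below works on plain doubles instead of the question's power objects; AmplitudeDiff and Subtract are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AmplitudeDiff
{
    // Subtracts reference from main position by position and drops NaN results.
    public static List<double> Subtract(IEnumerable<double> main, IEnumerable<double> reference)
    {
        return main
            .Zip(reference, (m, r) => m - r) // pairs by position, stops at the shorter input
            .Where(d => !double.IsNaN(d))
            .ToList();
    }
}
```

With the lists from the question, {8,5,3} and {5,2,1}, this returns {3,3,2}.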

calculate average function of several functions

I have several ordered Lists of X/Y Pairs and I want to calculate an ordered List of X/Y Pairs representing the average of these Lists.
All these Lists (including the "average list") will then be drawn onto a chart (see example picture below).
I have several problems:
The different lists don't have the same amount of values
The X and Y values can increase and decrease and increase (and so on) (see example picture below)
I need to implement this in C#, although I guess that's not really important for the algorithm itself.
Sorry, that I can't explain my problem in a more formal or mathematical way.
EDIT: I replaced the term "function" with "List of X/Y Pairs" which is less confusing.
I would use the method Justin proposes, with one adjustment. He suggests using a mapping table with fractional indices, whereas I would suggest integer indices. This might sound a little mathematical, but there's no shame in having to read the following twice (I'd have to as well). Suppose the point at index i in a list of pairs A has been matched to its closest point in another list B, and that closest point is at index j. To find the closest point in B to A[i+1], you should only consider points in B with an index equal to or larger than j. It will probably be j + 1, but it could be j, j + 2, j + 3 and so on, but never below j. Even if the point closest to A[i+1] has an index smaller than j, you still shouldn't use that point to interpolate with, since that would result in an unexpected average and graph. I'll take a moment now to create some sample code for you. I hope you see that this optimization makes sense.
EDIT: While implementing this, I realised that j is not only bounded from below(by the method described above), but also bounded from above. When you try the distance from A[i+1] to B[j], B[j+1], B[j+2] etc, you can stop comparing when the distance A[i+1] to B[j+...] stops decreasing. There's no point in searching further in B. The same reasoning applies as when j was bounded from below: even if some point elsewhere in B would be closer, that's probably not the point you want to interpolate with. Doing so would result in an unexpected graph, probably less smooth than you'd expect. And an extra bonus of this second bound is the improved performance. I've created the following code:
IEnumerable<Tuple<double, double>> Average(List<Tuple<double, double>> A, List<Tuple<double, double>> B)
{
    if (A == null || B == null || A.Any(p => p == null) || B.Any(p => p == null)) throw new ArgumentException();
    Func<double, double> square = d => d * d;//squares its argument
    Func<int, int, double> euclidianDistance = (a, b) => Math.Sqrt(square(A[a].Item1 - B[b].Item1) + square(A[a].Item2 - B[b].Item2));//computes the distance from A[first argument] to B[second argument]
    int previousIndexInB = 0;
    for (int i = 0; i < A.Count; i++)
    {
        double distance = euclidianDistance(i, previousIndexInB);//distance between A[i] and B[j - 1], initially
        for (int j = previousIndexInB + 1; j < B.Count; j++)
        {
            var distance2 = euclidianDistance(i, j);//distance between A[i] and B[j]
            if (distance2 < distance)//if it's closer than the previously checked point, keep searching. Otherwise stop the search and return an interpolated point.
            {
                distance = distance2;
                previousIndexInB = j;
            }
            else
            {
                break;//don't place the yield return statement here, because that could go wrong at the end of B.
            }
        }
        yield return LinearInterpolation(A[i], B[previousIndexInB]);
    }
}
Tuple<double, double> LinearInterpolation(Tuple<double, double> a, Tuple<double, double> b)
{
    return new Tuple<double, double>((a.Item1 + b.Item1) / 2, (a.Item2 + b.Item2) / 2);
}
For your information, the function Average returns as many interpolated points as the list A contains, which is probably fine, but you should think about this for your specific application. I've added some comments to clarify some details, and I've described all aspects of this code in the text above. I hope it's clear; otherwise feel free to ask questions.
SECOND EDIT: I misread and thought you had only two lists of points. I have created a generalised function of that above accepting multiple lists. It still uses only those principles explained above.
IEnumerable<Tuple<double, double>> Average(List<List<Tuple<double, double>>> data)
{
    if (data == null || data.Count < 2 || data.Any(list => list == null || list.Any(p => p == null))) throw new ArgumentException();
    Func<double, double> square = d => d * d;
    Func<Tuple<double, double>, Tuple<double, double>, double> euclidianDistance = (a, b) => Math.Sqrt(square(a.Item1 - b.Item1) + square(a.Item2 - b.Item2));
    var firstList = data[0];
    int[] previousIndices = new int[data.Count];//the indices of points which are closest to the previous point firstList[i - 1]
    //(or zero if i == 0). This is kept track of per list, except the first list.
    //It is declared outside the loop so that the search never moves backwards.
    for (int i = 0; i < firstList.Count; i++)
    {
        var closests = new Tuple<double, double>[data.Count];//an array of points used for caching, of which the average will be yielded.
        closests[0] = firstList[i];
        for (int listIndex = 1; listIndex < data.Count; listIndex++)
        {
            var list = data[listIndex];
            double distance = euclidianDistance(firstList[i], list[previousIndices[listIndex]]);
            for (int j = previousIndices[listIndex] + 1; j < list.Count; j++)
            {
                var distance2 = euclidianDistance(firstList[i], list[j]);
                if (distance2 < distance)//if it's closer than the previously checked point, keep searching. Otherwise stop the search and return an interpolated point.
                {
                    distance = distance2;
                    previousIndices[listIndex] = j;
                }
                else
                {
                    break;
                }
            }
            closests[listIndex] = list[previousIndices[listIndex]];
        }
        yield return new Tuple<double, double>(closests.Select(p => p.Item1).Average(), closests.Select(p => p.Item2).Average());
    }
}
Actually, doing the specific case for two lists separately might have been a good thing: it is easily explained and offers a step towards understanding the generalised version. Furthermore, the square root could be taken out, since it doesn't change the order of the distances when sorted, just their lengths.
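A minimal sketch of the square-root remark: comparing squared distances gives the same "which one is closer" answers, because the square root is monotonic on non-negative values. SquaredDistance and Between are hypothetical names, not part of the code above.

```csharp
using System;

public static class SquaredDistance
{
    // Squared Euclidean distance between two points; no Math.Sqrt needed
    // when the result is only used to decide which point is closer.
    public static double Between(Tuple<double, double> a, Tuple<double, double> b)
    {
        double dx = a.Item1 - b.Item1;
        double dy = a.Item2 - b.Item2;
        return dx * dx + dy * dy;
    }
}
```

Keep the square root if the actual length is needed, for example when comparing against a threshold expressed in the original units.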
THIRD EDIT: In the comments it became clear there might be a bug. I think there are none, aside from the mentioned small bug, which shouldn't make any difference except for at the end of the graphs. As a proof that it actually works, this is the result of it(the dotted line is the average):
I'll use a metaphor of your functions being cars racing down a curvy racetrack, where you want to extract the center-line of the track given the cars' positions. Each car's position can be described as a function of time:
p1(t) = (x1(t), y1(t))
p2(t) = (x2(t), y2(t))
p3(t) = (x3(t), y3(t))
The crucial problem is that the cars are racing at different speeds, which means that p1(10) could be twice as far down the race track as p2(10). If you took a naive average of these two points, and there was a sharp curve in the track between the cars, the average may be far from the track.
If you could just transform your functions to no longer be a function of time, but a function of the distance along the track, then you would be able to do what you want.
One way you could do this would be to choose the slowest car (i.e., the one with the greatest number of samples). Then, for each sample of the slowest car's position, look at all of the other cars' paths, find the two closest points, and choose the point on the interpolated line which is closest to the slowest car's position. Then average these points together. Once you do this for all of the slow car's samples, you have an average path.
I'm assuming that all of the cars start and end in roughly the same places; if any of the cars just race a small portion of the track, you will need to add some more logic to detect that.
A possible improvement (for both performance and accuracy), is to keep track of the most recent sample you are using for each car and the speed of each car (the relative sampling rate). For your slowest car, it would be a simple map: 1 => 1, 2 => 2, 3 => 3, ... For the other cars, though, it could be more like: 1 => 0.3, 2 => 0.7, 3 => 1.6 (fractional values are due to interpolation). The speed would be the inverse of the change in sample number (e.g., the slow car would have speed 1, and the other car would have speed 1/(1.6-0.7)=1.11). You could then ensure that you don't accidentally backtrack on any of the cars. You could also improve the calculation speed because you don't have to search through the whole set of all points on each path; instead, you can assume that the next sample will be somewhere close to the current sample plus 1/speed.
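The step "choose the point on the interpolated line which is closest to the slowest car's position" amounts to projecting a point onto a line segment. A possible sketch follows; Segment and ClosestPoint are hypothetical names, and points are modelled as Tuple<double, double>.

```csharp
using System;

public static class Segment
{
    // Returns the point on the segment from a to b that is closest to p.
    public static Tuple<double, double> ClosestPoint(
        Tuple<double, double> a, Tuple<double, double> b, Tuple<double, double> p)
    {
        double dx = b.Item1 - a.Item1;
        double dy = b.Item2 - a.Item2;
        double lengthSquared = dx * dx + dy * dy;
        if (lengthSquared == 0) return a; // degenerate segment: a and b coincide

        // Parameter of the orthogonal projection of p onto the line through a and b.
        double t = ((p.Item1 - a.Item1) * dx + (p.Item2 - a.Item2) * dy) / lengthSquared;
        t = Math.Max(0, Math.Min(1, t)); // clamp so the result stays on the segment
        return Tuple.Create(a.Item1 + t * dx, a.Item2 + t * dy);
    }
}
```

Here a and b would be the two closest samples of another car's path and p the current sample of the slowest car.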
As these are not y=f(x) functions, are they perhaps something like (x,y)=f(t)?
If so, you could interpolate along t, and calculate avg(x) and avg(y) for each t.
EDIT This of course assumes that t can be made available to your code - so that you have an ordered list of T/X/Y triples.
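If no explicit t is available, one rough stand-in is to let t run from 0 to 1 uniformly with the sample index of each list. The sketch below does exactly that, which is an assumption and not part of the answer; CurveAverage, At and Average are hypothetical names, and every curve is assumed to be non-empty.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CurveAverage
{
    // Linearly interpolates the curve at parameter t in [0, 1],
    // assuming t grows uniformly with the sample index.
    public static Tuple<double, double> At(IList<Tuple<double, double>> curve, double t)
    {
        double position = t * (curve.Count - 1);
        int lower = (int)Math.Floor(position);
        int upper = Math.Min(lower + 1, curve.Count - 1);
        double f = position - lower; // fraction of the way from lower to upper
        return Tuple.Create(
            curve[lower].Item1 + f * (curve[upper].Item1 - curve[lower].Item1),
            curve[lower].Item2 + f * (curve[upper].Item2 - curve[lower].Item2));
    }

    // Averages x and y across all curves at evenly spaced values of t.
    public static List<Tuple<double, double>> Average(
        IList<IList<Tuple<double, double>>> curves, int samples)
    {
        var result = new List<Tuple<double, double>>();
        for (int k = 0; k < samples; k++)
        {
            double t = samples == 1 ? 0 : (double)k / (samples - 1);
            var points = curves.Select(c => At(c, t)).ToList();
            result.Add(Tuple.Create(points.Average(p => p.Item1), points.Average(p => p.Item2)));
        }
        return result;
    }
}
```

This handles lists of different lengths, but it only gives a sensible average when the lists are sampled at a roughly constant rate along the curve.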
There are several ways this can be done. One is to combine all of your data into one single set of points, and do a best-fit curve through the combined set.
Suppose you have, for example, 2 "functions":
fc1 = { {1,0.3} {2, 0.5} {3, 0.1} }
fc2 = { {1,0.1} {2, 0.8} {3, 0.4} }
You want the arithmetic mean (slang: "average") of the two functions. To do this you just calculate the pointwise arithmetic mean:
fc3 = { {1, (0.3+0.1)/2} ... }
Optimization:
If you have large numbers of points you should first convert your "ordered List of X/Y Pairs" into a Matrix OR at least store the points column-wise like so:
{0.3, 0.1}, {0.5, 0.8}, {0.1, 0.4}
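The pointwise mean described in this answer could be sketched as follows. It assumes both lists contain the same X values in the same order, which the question says is not generally the case; PointwiseMean and Of are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PointwiseMean
{
    // Averages the Y values of two functions sampled at identical X positions.
    public static List<Tuple<double, double>> Of(
        IEnumerable<Tuple<double, double>> f1, IEnumerable<Tuple<double, double>> f2)
    {
        return f1
            .Zip(f2, (p, q) => Tuple.Create(p.Item1, (p.Item2 + q.Item2) / 2))
            .ToList();
    }
}
```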

How to sort a list of 2D points using C#?

Sort a list of points in descending order according to X and then Y.
list.Sort((a,b)=>{
    int result = a.X.CompareTo(b.X);
    if(result==0) result = a.Y.CompareTo(b.Y);
    return result;
});
An alternative to Marc Gravell's answer (which sorts the list itself) is the LINQ query syntax, which gives you an IEnumerable<T> that can be turned into a list with .ToList():
var ordered = from v in yourList
              orderby v.X, v.Y
              select v;
var orderedList = ordered.ToList();
But unless you want to leave the original list untouched, or you only have, say, an IEnumerable, List.Sort would be better.
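The question asks for descending order, while the comparisons shown so far sort ascending. In LINQ method syntax a descending sort might look like the sketch below. PointSorting and SortDescending are hypothetical names, and points are modelled as (int X, int Y) value tuples (C# 7 or later) rather than the question's point type.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PointSorting
{
    // Returns a new list sorted by X descending, then by Y descending.
    public static List<(int X, int Y)> SortDescending(IEnumerable<(int X, int Y)> points)
    {
        return points
            .OrderByDescending(p => p.X)
            .ThenByDescending(p => p.Y)
            .ToList();
    }
}
```

With List.Sort, the same effect can be had by swapping the operands of each CompareTo call.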
// List<T>.Sort sorts the list in place and returns void, so there is no result to assign.
MyList.Sort(
    delegate(Point p1, Point p2)
    {
        int r = p1.X.CompareTo(p2.X);
        if (r == 0) return p1.Y.CompareTo(p2.Y);
        else return r;
    }
);
If the commenters are indeed correct, and you're searching for a solution to a homework problem, I suspect the real assignment is teaching you how to sort integer values. So I'll just help get you started.
Hint: The Point structure has two properties that you may find useful, X and Y, which return the coordinate values for those two axes, respectively.
There are many methods to sort a list.
For example, to sort your list from smallest (x, y) to biggest you can try this algorithm:
1. Compare the first item in the list with the second
2. If the second point is smaller than the first (x1 > x2 || (x1 == x2 && y1 > y2)) then swap them around
3. Compare the second point with the third in the same way, and so on until you get to the end of the list
4. Go back to the beginning of the list and run the comparisons again until the last but one element
5. Repeat step 4 but stopping one element earlier each time until you've got no elements left to sort
This is an inefficient algorithm, but it will get the job done.
For better algorithms, have a look at http://www.sorting-algorithms.com/
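The steps listed above describe a bubble sort. A minimal sketch of them follows, assuming points are modelled as (int X, int Y) value tuples (C# 7 or later); BubbleSortExample is a hypothetical name.

```csharp
using System.Collections.Generic;

public static class BubbleSortExample
{
    // Sorts the list in place from smallest (x, y) to biggest.
    public static void Sort(IList<(int X, int Y)> points)
    {
        // After each pass the largest remaining element sits at the end,
        // so every pass can stop one element earlier than the previous one.
        for (int end = points.Count - 1; end > 0; end--)
        {
            for (int i = 0; i < end; i++)
            {
                var first = points[i];
                var second = points[i + 1];
                bool secondIsSmaller =
                    first.X > second.X || (first.X == second.X && first.Y > second.Y);
                if (secondIsSmaller)
                {
                    // Swap the neighbours.
                    points[i] = second;
                    points[i + 1] = first;
                }
            }
        }
    }
}
```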
