Optimization: How should I optimize this LINQ Concat of collections in C#?

Is there any way I can optimize this?
public static IEnumerable<IEnumerable<int>> GenerateCombinedPatterns(
    IEnumerable<IEnumerable<int>> patterns1,
    IEnumerable<IEnumerable<int>> patterns2)
{
    return patterns1
        .Join(patterns2, p1key => 1, p2key => 1, (p1, p2) => p1.Concat(p2))
        .Where(r => r.Sum() <= stockLen)
        .AsParallel()
        as IEnumerable<IEnumerable<int>>;
}

If you're looking for every combination, use SelectMany instead, usually written with multiple "from" clauses:
return from p1 in patterns1
       from p2 in patterns2
       let combination = p1.Concat(p2)
       where combination.Sum() <= stockLen
       select combination;
That's without any parallelism though... depending on the expected collections, I'd probably just parallelize at one level, e.g.
return from p1 in patterns1.AsParallel()
       from p2 in patterns2
       let combination = p1.Concat(p2)
       where combination.Sum() <= stockLen
       select combination;
Note that there's no guarantee as to the order in which the results come out with the above - you'd need to tweak it if you wanted the original ordering.
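If you do need the original ordering, PLINQ can preserve it with AsOrdered(), at some cost in parallel efficiency. A minimal sketch, assuming the same patterns1/patterns2/stockLen names as above:
// AsOrdered() asks PLINQ to keep the source order of patterns1 in the output.
return from p1 in patterns1.AsParallel().AsOrdered()
       from p2 in patterns2
       let combination = p1.Concat(p2)
       where combination.Sum() <= stockLen
       select combination;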

There's no point in making the query parallel at the very end. Update: Jon was right; my initial solution was incorrect, and it turns out my corrected solution was essentially the same as his.
public static IEnumerable<IEnumerable<int>> GenerateCombinedPatterns(
    IEnumerable<IEnumerable<int>> patterns1,
    IEnumerable<IEnumerable<int>> patterns2)
{
    var parallel1 = patterns1.AsParallel();
    return parallel1.SelectMany(p1 => patterns2.Select(p2 => p1.Concat(p2)))
                    .Where(r => r.Sum() <= stockLen);
}

Related

C# linq - Order by alphabetical, then by certain value [duplicate]

Can anyone explain what the difference is between:
tmp = invoices.InvoiceCollection
    .OrderBy(sort1 => sort1.InvoiceOwner.LastName)
    .OrderBy(sort2 => sort2.InvoiceOwner.FirstName)
    .OrderBy(sort3 => sort3.InvoiceID);
and
tmp = invoices.InvoiceCollection
    .OrderBy(sort1 => sort1.InvoiceOwner.LastName)
    .ThenBy(sort2 => sort2.InvoiceOwner.FirstName)
    .ThenBy(sort3 => sort3.InvoiceID);
Which is the correct approach if I wish to order by 3 items of data?
You should definitely use ThenBy rather than multiple OrderBy calls.
I would suggest this:
tmp = invoices.InvoiceCollection
    .OrderBy(o => o.InvoiceOwner.LastName)
    .ThenBy(o => o.InvoiceOwner.FirstName)
    .ThenBy(o => o.InvoiceID);
Note how you can use the same name each time. This is also equivalent to:
tmp = from o in invoices.InvoiceCollection
      orderby o.InvoiceOwner.LastName,
              o.InvoiceOwner.FirstName,
              o.InvoiceID
      select o;
If you call OrderBy multiple times, each call reorders the sequence completely, so the final call is effectively the dominant one. You can (in LINQ to Objects) write
foo.OrderBy(x).OrderBy(y).OrderBy(z)
which would be equivalent to
foo.OrderBy(z).ThenBy(y).ThenBy(x)
because the sort is stable, but you absolutely shouldn't:
It's hard to read
It doesn't perform well (because it reorders the whole sequence)
It may well not work in other providers (e.g. LINQ to SQL)
It's basically not how OrderBy was designed to be used.
The point of OrderBy is to provide the "most important" ordering projection; then use ThenBy (repeatedly) to specify secondary, tertiary etc ordering projections.
Effectively, think of it this way: OrderBy(...).ThenBy(...).ThenBy(...) allows you to build a single composite comparison for any two objects, and then sort the sequence once using that composite comparison. That's almost certainly what you want.
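To see the difference concretely in LINQ to Objects, here's a small demo with made-up data:
var people = new[]
{
    new { Last = "Smith", First = "Anna" },
    new { Last = "Jones", First = "Zoe" },
    new { Last = "Smith", First = "Bob" }
};

// Composite comparison: one sort by Last, tie-broken by First.
var composite = people.OrderBy(p => p.Last).ThenBy(p => p.First);
// Jones/Zoe, Smith/Anna, Smith/Bob

// Repeated OrderBy: the final key (First) dominates; Last only survives as a stable tiebreak.
var repeated = people.OrderBy(p => p.Last).OrderBy(p => p.First);
// Smith/Anna, Smith/Bob, Jones/Zoe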
I found this distinction annoying in trying to build queries in a generic manner, so I made a little helper to produce OrderBy/ThenBy in the proper order, for as many sorts as you like.
// Note: requires using System.Linq.Expressions;
public class EFSortHelper
{
    public static EFSortHelper<TModel> Create<TModel>(IQueryable<TModel> query)
    {
        return new EFSortHelper<TModel>(query);
    }
}

public class EFSortHelper<TModel> : EFSortHelper
{
    protected IQueryable<TModel> unsorted;
    protected IOrderedQueryable<TModel> sorted;

    public EFSortHelper(IQueryable<TModel> unsorted)
    {
        this.unsorted = unsorted;
    }

    public void SortBy<TCol>(Expression<Func<TModel, TCol>> sort, bool isDesc = false)
    {
        if (sorted == null)
        {
            // First sort key: OrderBy / OrderByDescending
            sorted = isDesc ? unsorted.OrderByDescending(sort) : unsorted.OrderBy(sort);
            unsorted = null;
        }
        else
        {
            // Subsequent sort keys: ThenBy / ThenByDescending
            sorted = isDesc ? sorted.ThenByDescending(sort) : sorted.ThenBy(sort);
        }
    }

    public IOrderedQueryable<TModel> Sorted
    {
        get { return sorted; }
    }
}
There are a lot of ways you might use this depending on your use case, but if you were for example passed a list of sort columns and directions as strings and bools, you could loop over them and use them in a switch like:
var query = db.People.AsNoTracking();
var sortHelper = EFSortHelper.Create(query);

foreach (var sort in sorts)
{
    switch (sort.ColumnName)
    {
        case "Id":
            sortHelper.SortBy(p => p.Id, sort.IsDesc);
            break;
        case "Name":
            sortHelper.SortBy(p => p.Name, sort.IsDesc);
            break;
        // etc
    }
}

var sortedQuery = sortHelper.Sorted;
The result in sortedQuery is sorted in the desired order, instead of being re-sorted over and over, as the other answer here cautions against.
If you want to sort by more than one field, go for ThenBy, like this:
list.OrderBy(person => person.LastName)
    .ThenBy(person => person.FirstName)
Yes, you should never chain multiple OrderBy calls when sorting on multiple keys; ThenBy is the safer bet, since it applies after OrderBy without re-sorting the whole sequence.

Lambda Max and Max and Max

Quick and probably easy Lambda question:
I have a restaurant with reviews.
I want to query for the one with the:
Max(AverageRating)
And the Max(ReviewCount)
And the Max(NewestReviewDate)
And the Min(DistanceAway)
Something like this:
var _Result = AllRestaurants
    .Max(x => x.AverageRating)
    .AndMax(x => x.ReviewCount)
    .AndMax(x => x.NewestReviewDate)
    .AndMin(x => x.DistanceAway);
Now, I know that is pseudo code. But it describes it perfectly!
Of course, in multiple statements, this is simple.
Just wondering if this is possible in one statement without killing the readability.
Thank you in advance. I know some of you love the query questions!
You can't take several maxes and mins at once; that doesn't make sense. You'll need some kind of heuristic, like:
.Max(x => x.AverageRating * x.ReviewCount - x.DaysSinceLastReview - x.DistanceAway)
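Spelled out, that heuristic might look something like this; the weights are invented for illustration and would need tuning for your data:
// Sketch: a single weighted score, highest wins. The weights are assumptions, not from the question.
var best = AllRestaurants
    .OrderByDescending(x => 0.7 * x.AverageRating
                          + 0.1 * x.ReviewCount
                          - 0.1 * x.DaysSinceLastReview
                          - 0.1 * x.DistanceAway)
    .FirstOrDefault();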
Perhaps this would do?
var bestRestaurant = AllRestaurants
    .OrderByDescending(r => r.AverageRating)
    .ThenByDescending(r => r.ReviewCount)
    .ThenByDescending(r => r.NewestReviewDate)
    .ThenBy(r => r.DistanceAway)
    .FirstOrDefault();
You'd need to change the order of the statements to reflect which is the most important.
An alternative to having some weighted heuristic is to order by AverageRating, then ReviewCount, then ...
Something like this should work:
var _Result = AllRestaurants
    .OrderByDescending(x => x.AverageRating)
    .ThenByDescending(x => x.ReviewCount)
    .ThenByDescending(x => x.NewestReviewDate)
    .ThenBy(x => x.DistanceAway);
// *Descending puts the higher values first; DistanceAway stays ascending because you want the minimum
Consider something like this...
List<RestaurantRecord> _Restaurants;

public RestaurantRecord Best()
{
    return _Restaurants.Where(
        x =>
            x.AverageRating >= _BestRating &&
            x.ReviewCount >= _MinReviews &&
            x.Distance <= _MaxDistance)
        .FirstOrDefault();
}
That being said, using a lambda in this case will have maintainability consequences down the road. It would be a good idea to refactor this, such that if other criteria appear in the future (e.g.: smartphone access? Cuisine type?), your app can be more easily modified to adapt to those.
On that note, a slightly better implementation might be something like:
public RestaurantRecord Best()
{
    IQueryable<RestaurantRecord> temp = _Restaurants.AsQueryable();
    temp = temp.Where(x => x.AverageRating >= _BestRating);
    temp = temp.Where(x => x.ReviewCount >= _MinReviews);
    // ...snip...
    return temp.FirstOrDefault();
}
I hope this sets you on the right track. :)
If I'm understanding your question, I think the best approach is going to be writing individual statements as you mention...
var HighestRating = AllRestaurants.Max(x => x.AverageRating);
var HighestReviewCount = AllRestaurants.Max(x => x.ReviewCount);
var LatestReviewDate = AllRestaurants.Max(x => x.NewestReviewDate);
var ShortestDistanceAway = AllRestaurants.Min(x => x.DistanceAway);
Retrieving various maxes and mins from a single Linq query would get pretty messy and I'm not sure there'd be any advantage with efficiency, either.
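That said, if iterating the source four times bothers you, a single Aggregate pass can track all four values at once. This is just a sketch, assuming C# 7 tuple syntax and that AverageRating/DistanceAway are doubles, ReviewCount is an int, and NewestReviewDate is a DateTime:
// Fold the sequence once, carrying running (max rating, max reviews, latest date, min distance).
var stats = AllRestaurants.Aggregate(
    (MaxRating: double.MinValue, MaxReviews: int.MinValue,
     Latest: DateTime.MinValue, MinDistance: double.MaxValue),
    (acc, x) => (Math.Max(acc.MaxRating, x.AverageRating),
                 Math.Max(acc.MaxReviews, x.ReviewCount),
                 x.NewestReviewDate > acc.Latest ? x.NewestReviewDate : acc.Latest,
                 Math.Min(acc.MinDistance, x.DistanceAway)));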

Return Modal Average in LINQ (Mode)

I am not sure if CopyMost is the correct term to use here, but it's the term my client used ("CopyMost Data Protocol"). Sounds like he wants the mode? I have a set of data:
Increment    Value
.02          1
.04          1
.06          1
.08          2
.10          2
I need to return whichever Value occurs the most (the "CopyMost"). In this case, the value is 1. Right now I had planned on writing an extension method for IEnumerable to do this for integer values. Is there something built into LINQ that already does this easily? Or is it best for me to write an extension method that would look something like this:
records.CopyMost(x => x.Value);
EDIT
Looks like I am looking for the modal average. I've provided an updated answer that allows for a tiebreaker condition. It's meant to be used like this, and is generic.
records.CopyMost(x => x.Value, x => x == 0);
In this case x.Value would be an int, and if the count of 0s was the same as the counts of 1s and 3s, it would tie-break on 0.
Well, here's one option:
var query = (from item in data
             group 1 by item.Value into g
             orderby g.Count() descending
             select g.Key).First();
Basically we're using GroupBy to group by the value - but all we're interested in for each group is the size of the group and the key (which is the original value). We sort the groups by size, and take the first element (the one with the most elements).
Does that help?
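As a side note, on .NET 6 or later (an assumption about your target framework) MaxBy can replace the sort entirely:
// MaxBy scans once for the largest group instead of ordering all of them.
var mode = data.GroupBy(item => item.Value)
               .MaxBy(g => g.Count())
               .Key;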
Jon beat me to it, but the term you're looking for is the modal average.
Edit:
If I'm right in thinking that it's the modal average you need, then the following should do the trick:
var i = (from t in data
         group t by t.Value into aggr
         orderby aggr.Count() descending
         select aggr.Key).First();
This method has been updated several times in my code over the years. It's become a very important method, and is much different than it used to be.
One thing I did not think I would need was a tiebreaker of some sort. I have now overloaded the method to include a tiebreaker.
public static K CopyMost<T, K>(this IEnumerable<T> records, Func<T, K> propertySelector, Func<K, bool> tieBreaker)
{
    // Group by the selected key and record each group's size.
    var grouped = records.GroupBy(x => propertySelector(x))
                         .Select(x => new { Group = x, Count = x.Count() });
    var maxCount = grouped.Max(x => x.Count);
    var subGroup = grouped.Where(x => x.Count == maxCount);
    // A single winner means no tie; otherwise let the tiebreaker pick among the tied keys.
    if (subGroup.Count() == 1)
        return subGroup.Single().Group.Key;
    else
        return subGroup.Where(x => tieBreaker(x.Group.Key)).Single().Group.Key;
}
The above assumes the user enters a legitimate tiebreaker condition. You may want to check and see if the tiebreaker returns a valid value, and if not, throw an exception. And here's my normal method.
public static K CopyMost<T, K>(this IEnumerable<T> records, Func<T, K> propertySelector)
{
    return records.GroupBy(x => propertySelector(x))
                  .OrderByDescending(x => x.Count())
                  .Select(x => x.Key)
                  .First();
}
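A quick usage sketch against the question's sample data, using anonymous objects in place of the real record type (an assumption for illustration):
var records = new[]
{
    new { Increment = .02, Value = 1 },
    new { Increment = .04, Value = 1 },
    new { Increment = .06, Value = 1 },
    new { Increment = .08, Value = 2 },
    new { Increment = .10, Value = 2 }
};

var mode = records.CopyMost(x => x.Value);              // 1 (three occurrences vs two)
var safe = records.CopyMost(x => x.Value, k => k == 1); // the tiebreaker is only consulted on a tie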

Lambda expression to find difference

With the following data
string[] data = { "a", "a", "b" };
I'd very much like to find duplicates and get this result:
a
I tried the following code
var a = data.Distinct().ToList();
var b = a.Except(a).ToList();
Obviously this didn't work; I can see what is happening above, but I'm not sure how to fix it.
When runtime is no problem, you could use
var duplicates = data.Where(s => data.Count(t => t == s) > 1).Distinct().ToList();
Good old O(n²) =)
Edit: Now for a better solution. =)
If you define a new extension method like
static class Extensions
{
    public static IEnumerable<T> Duplicates<T>(this IEnumerable<T> input)
    {
        HashSet<T> hash = new HashSet<T>();
        foreach (T item in input)
        {
            if (!hash.Contains(item))
            {
                hash.Add(item);
            }
            else
            {
                yield return item;
            }
        }
    }
}
you can use
var duplicates = data.Duplicates().Distinct().ToArray();
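As a side note, HashSet<T>.Add already returns false when the item was present, so the same extension can be written more compactly; the behaviour is identical:
public static IEnumerable<T> Duplicates<T>(this IEnumerable<T> input)
{
    var seen = new HashSet<T>();
    foreach (T item in input)
    {
        // Add returns false if the item was already in the set, i.e. a duplicate.
        if (!seen.Add(item))
            yield return item;
    }
}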
Use the group by stuff; the performance of these methods is reasonably good. The only concern is the big memory overhead if you are working with large data sets.
from g in (from x in data group x by x)
where g.Count() > 1
select g.Key;
Or, if you prefer extension methods:
data.GroupBy(x => x)
    .Where(x => x.Count() > 1)
    .Select(x => x.Key)
Where Count() == 1, those are your distinct items; where Count() > 1, those are the duplicates.
Since LINQ is kind of lazy, if you don't want to reevaluate your computation you can do this:
var g = (from x in data group x by x).ToList(); // grouping result, evaluated once

// duplicates
var duplicates = from x in g
                 where x.Count() > 1
                 select x.Key;

// distinct
var distinct = from x in g
               where x.Count() == 1
               select x.Key;
When the grouping is created, a set of sets is built. Assuming O(1) insertion into each set, the running time of the group-by approach is O(n). The cost incurred per operation is somewhat high, but it equates to near-linear performance.
Sort the data, iterate through it, and remember the last item. When the current item is the same as the last, it's a duplicate. This can easily be implemented either iteratively or using a lambda expression in O(n log n) time.
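A rough sketch of that sort-and-scan approach, using the string[] data from the question (O(n log n) for the sort, then one linear pass); the usual System.Linq and System.Collections.Generic usings are assumed:
// Sort first; equal values are then adjacent, so one pass finds them.
var sorted = data.OrderBy(x => x).ToArray();
var duplicates = new List<string>();
for (int i = 1; i < sorted.Length; i++)
{
    // Report each duplicated value only once.
    if (sorted[i] == sorted[i - 1] &&
        (duplicates.Count == 0 || duplicates[duplicates.Count - 1] != sorted[i]))
    {
        duplicates.Add(sorted[i]);
    }
}
// For { "a", "a", "b" } this yields just "a".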
