Linq ForEach vs All Performance review

Most of the time I use All (with a lambda that returns true) instead of ForEach. Is it good practice to always use All instead of ForEach when working with an IEnumerable? I understand that All can be run on any IEnumerable, whereas ForEach is available only on a list.
var wells = GlobalDataModel.WellList.Where(u => u.RefProjectName == project.OldProjectName);
if (wells.Any())
{
    wells.All(u =>
    {
        u.RefProjectName = project.ProjectName;
        return true;
    });
}
var wellsList = GlobalDataModel.WellList.Where(u => u.RefProjectName == project.OldProjectName).ToList();
wellsList.ForEach(u => u.RefProjectName = project.ProjectName);

No, you're abusing the All method. Take a look at the documentation:
"Determines whether all elements of a sequence satisfy a condition."
It should be used to determine whether all elements satisfy some condition. It is not meant to be used to produce side effects.
List.ForEach is meant to be used for side effects. You may use it if you already have a List<T> up front. Calling ToList and creating a new List just for the sake of List.ForEach is not worth it. It adds another O(n) operation.
In short, don't use All for side effects. List.ForEach is barely acceptable when you already have a list. The recommended way is to use a loop of your choice; nothing is better than that.
Eric Lippert has something to say about ForEach. Note that it has been removed in Modern UI apps and may be removed from the desktop version of .NET too.

If you're checking whether all elements satisfy some condition, use All.
But if you need to perform some operation on each element, don't use All or any other LINQ operator (Where, Select, Any, etc.), as they are meant to be used in a purely functional way. You can iterate over the elements with a foreach loop or, if you prefer, with the List<T>.ForEach method. However, as you mention, ForEach is part of List<T>, which makes your code slightly harder to change (e.g. from a list to another enumerable).
See here for a discussion about the "right way" to use LINQ.
For example, you can write your code like this:
foreach (var u in GlobalDataModel.WellList
    .Where(u => u.RefProjectName == project.OldProjectName))
{
    u.RefProjectName = project.ProjectName;
}
It's more obvious that side effects are being done. Also, this will only iterate once over the sequence, skipping the elements that don't satisfy the condition.
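To make the single-pass point concrete, here is a minimal sketch that counts how often the Where predicate runs for the Any-then-All pattern from the question versus a plain foreach. The projectNames array is a hypothetical stand-in for GlobalDataModel.WellList, and the code assumes C# 9 top-level statements.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the well list: just the project names.
string[] projectNames = { "Old", "Other", "Old", "Other", "Old" };

int checks = 0; // how many times the Where predicate has run

// Lazy query: nothing is evaluated until it is enumerated.
var matches = projectNames.Where(name => { checks++; return name == "Old"; });

// Pattern from the question: Any() enumerates until the first match,
// then All() starts over and enumerates the whole query again.
if (matches.Any())
{
    matches.All(name => true);
}
int checksAnyThenAll = checks;

// Plain foreach: one pass over the source.
checks = 0;
foreach (var name in matches)
{
    // the property update would go here
}
int checksForeach = checks;

Console.WriteLine($"Any + All: {checksAnyThenAll} predicate calls");
Console.WriteLine($"foreach:   {checksForeach} predicate calls");
```

The foreach version evaluates the predicate once per source element; the Any-then-All version adds however many elements Any() had to inspect before finding the first match.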

Related

Using LINQ Where result in foreach: hidden if statement, double foreach?

foreach (Person criminal in people.Where(person => person.isCriminal))
{
    // do something
}
I have this piece of code and want to know how it actually works. Is it equivalent to an if statement nested inside the foreach iteration, or does it first loop through the list of people and then repeat the loop with the selected values? I would like to know more about this from the perspective of efficiency.
foreach (Person criminal in people)
{
    if (criminal.isCriminal)
    {
        // do something
    }
}

Where uses deferred execution.
This means that the filtering does not occur immediately when you call Where. Instead, each time you call GetEnumerator().MoveNext() on the return value of Where, it checks if the next element in the sequence satisfies the condition. If it does not, it skips over this element and checks the next one. When there is an element that satisfies the condition, it stops advancing and you can get the value using Current.
Basically, it is like having an if statement inside a foreach loop.

To understand what happens, you must know how IEnumerable<T> works (because LINQ to Objects always works on IEnumerable<T>). An IEnumerable<T> returns an IEnumerator<T>, which implements an iterator. This iterator is lazy, i.e. it yields only one element of the sequence at a time. No looping is done in advance, unless you have an OrderBy or another operator that requires it.
So if you have ...
foreach (string name in source.Where(x => x.IsChecked).Select(x => x.Name)) {
    Console.WriteLine(name);
}
... this will happen: The foreach-statement requires the first item which is requested from the Select, which in turn requires one item from Where, which in turn retrieves one item from the source. The first name is printed to the console.
Then the foreach-statement requires the second item which is requested from the Select, which in turn requires one item from Where, which in turn retrieves one item from the source. The second name is printed to the console.
and so on.
This means that both of your code snippets are logically equivalent.
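The item-by-item pull described above can be observed with a small sketch that logs each step. The names and the length filter are made up for illustration (they replace the IsChecked and Name members from the example), and the code assumes C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var log = new List<string>();
string[] source = { "ann", "bob", "cy" };

// Building the query runs neither lambda.
var query = source
    .Where(s => { log.Add($"Where({s})"); return s.Length == 3; })
    .Select(s => { log.Add($"Select({s})"); return s.ToUpper(); });

log.Add("query built");

// Each iteration pulls exactly one item through Where and Select.
foreach (var name in query)
{
    log.Add($"body({name})");
}

Console.WriteLine(string.Join(Environment.NewLine, log));
```

In the log, "query built" comes first, and the Where, Select and loop-body entries for one element appear together before anything is logged for the next element. There is no separate filtering pass.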

It depends on what people is.
If people is an IEnumerable object (like a collection, or the result of a method using yield) then the two pieces of code in your question are indeed equivalent.
A naïve Where could be implemented as:
public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    // Error handling left out for simplicity.
    foreach (TSource item in source)
    {
        if (predicate(item))
        {
            yield return item;
        }
    }
}
The actual code in Enumerable is a bit different to make sure that errors from passing a null source or predicate happen immediately rather than on the deferred execution, and to optimise for a few cases (e.g. source.Where(x => x.IsCriminal).Where(x => x.IsOnParole) is turned into the equivalent of source.Where(x => x.IsCriminal && x.IsOnParole) so that there's one fewer step in the chains of iterations), but that's the basic principle.
If however people is an IQueryable then things are different, and depend on the details of the query provider in question.
The simplest possibility is that the query provider can't do anything special with the Where and so it ends up just doing pretty much the above, because that will still work.
But often the query provider can do something else. Let's say people is a DbSet<Person> in Entity Framework associated with a table in a database called people. If you do:
foreach (var person in people)
{
    DoSomething(person);
}
Then Entity Framework will run SQL similar to:
SELECT *
FROM people
And then create a Person object for each row returned. We could do the same filtering as in the naïve implementation of Where above, but we can also do better.
If you do:
foreach (Person criminal in people.Where(person => person.isCriminal))
{
    DoSomething(criminal);
}
Then Entity Framework will run SQL similar to:
SELECT *
FROM people
WHERE isCriminal = 1
This means that the logic of deciding which elements to return is executed in the database before anything comes back to .NET. It allows indices to be used in computing the WHERE, which can be much more efficient. Even in the worst case, where there are no useful indices and the database has to do a full scan, the records we don't care about are never sent back from the database and no object is created for them just to be thrown away again, so the difference in performance can be immense.
"I care to know more about this from the perspective of efficiency"
You are hopefully satisfied that there's no double pass as you suggested might happen, and happy to learn that, where the query provider allows it, this is even more efficient than the foreach and if you suggested.
A bare foreach and if will still beat .Where() against an IEnumerable (but not against a database source) as there are a few overheads to Where that foreach and if don't have, but it's to a degree that is only worth caring about in very hot paths. Generally Where can be used with reasonable confidence in its efficiency.

Will the compiler optimise a comparison against IEnumerable<T>.Count()?

As a naive tip, you often hear to use IEnumerable.Any() because then the entire enumerable does not necessarily need to be traversed.
I just wrote a little segment of code that tries to see if the Enumerable contains a single item or multiple.
if (reportInfo.OrebodyAndPits.SelectMany(ob => ob.Pits).Count() > 1)
{
    ws.Cells[row, col++].Value = "Pits";
}
else
{
    ws.Cells[row, col++].Value = "Pit";
}
That made me wonder: will the comparison be compiled into a form that is smart enough to stop counting as soon as it enumerates past the first item?
If not, is there a way to write a linq extension method that would do that?
(Please note, I'm not terribly interested in the performance impact of this piece of code. I'm mainly curious.)

No, it will not. Your code will count all the items in the sequence. This is because LINQ statements are not optimized by the compiler: what you write is what you get.
An equivalent, more efficient way of checking whether a sequence contains more than 1 item is:
reportInfo.OrebodyAndPits.SelectMany(ob => ob.Pits).Skip(1).Any();
This will check, after skipping the first item, whether there are any items left.
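A rough way to see the difference is to count how many items a lazy source actually hands out under each approach. The Items() generator below is a hypothetical stand-in for the SelectMany result, and the code assumes C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int produced = 0; // how many items the source has handed out

// Hypothetical lazy source of 1000 items; it counts every item it yields.
IEnumerable<int> Items()
{
    for (int i = 0; i < 1000; i++)
    {
        produced++;
        yield return i;
    }
}

// Count() has to walk the whole sequence before the comparison happens.
bool moreThanOneViaCount = Items().Count() > 1;
int producedByCount = produced;

// Skip(1).Any() stops as soon as a second item has been seen.
produced = 0;
bool moreThanOneViaSkip = Items().Skip(1).Any();
int producedBySkipAny = produced;

Console.WriteLine($"Count() > 1:   {moreThanOneViaCount}, items produced: {producedByCount}");
Console.WriteLine($"Skip(1).Any(): {moreThanOneViaSkip}, items produced: {producedBySkipAny}");
```

Both expressions give the same answer, but only the Count() version pulls all 1000 items from the source.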

If you want to know how something works, why not look at the source code?
Here's the Any() method: https://github.com/dotnet/corefx/blob/master/src/System.Linq/src/System/Linq/AnyAll.cs#L20
Here is the Count() method: https://github.com/dotnet/corefx/blob/master/src/System.Linq/src/System/Linq/Count.cs#L12
The compiler cannot make an optimisation like you describe. It asks for the count and gets a number then it compares that number with what's in your conditional statement.
The Count() method itself does, however, try to optimise. As you can see from its source, it checks whether the IEnumerable already exposes a Count property and uses that, because it is faster than counting all the elements again. If no such property is available, it has to move through the entire sequence and count each element individually.
If you want to write a LINQ method (which is just an extension method on IEnumerable<T>) that determines whether an IEnumerable contains at least two elements, that should be easy enough. Something like this:
public static bool AtLeastTwo<TSource>(this IEnumerable<TSource> source)
{
    if (source == null)
    {
        // Error.ArgumentNull is an internal corefx helper; outside corefx, throw directly.
        throw new ArgumentNullException(nameof(source));
    }
    using (IEnumerator<TSource> e = source.GetEnumerator())
    {
        // The first MoveNext steps onto the first element (if any);
        // the second is only attempted when the first succeeded.
        return e.MoveNext() && e.MoveNext();
    }
}

C# Paradigms: Side effects on Lists

I am trying to evolve my understanding of side effects and how they should be controlled and applied.
In the following list of flights, I want to set a property of each flight satisfying a condition:
IEnumerable<FlightResults> fResults = getResultsFromProvider();
//Set all non-stop flights description
fResults.Where(flight => flight.NonStop)
    .Select(flight => flight.Description = "Fly Direct!");
In this expression, I have a side effect on my list. From my limited knowledge I know for ex. "LINQ is used for queries only" and "There are only a few operations to lists and assigning or setting values is not one of them" and "lists should be immutable".
What is wrong with my LINQ statement above and how should it be changed?
Where can I get more information on the fundamental paradigms on the scenario I have described above?

You have two ways of achieving it the LINQ way:
An explicit foreach loop:
foreach (var f in fResults.Where(flight => flight.NonStop))
    f.Description = "Fly Direct!";
A ForEach operator, made for side effects:
fResults.Where(flight => flight.NonStop)
    .ForEach(flight => flight.Description = "Fly Direct!");
The first way is quite heavy for such a simple task, the second way should only be used with very short bodies.
Now, you might ask yourself why there isn't a ForEach operator in the LINQ stack. It's quite simple - LINQ is supposed to be a functional way of expressing query operations, which especially means that none of the operators are supposed to have side effects. The design team decided against adding a ForEach operator to the stack because the only usage is its side effect.
A usual implementation of the ForEach operator would be like this:
public static class EnumerableExtension
{
    public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        if (source == null)
            throw new ArgumentNullException("source");
        foreach (T obj in source)
            action(obj);
    }
}

One problem with that approach is that it won't work at all. The query is lazy, which means that it won't execute the code in the Select until you actually read something from the query, and you never do that.
You could get around that by adding .ToList() at the end of the query, but the code would still be using side effects and throwing away the actual result. You should use the result to do the update instead:
//Set all non-stop flights description
foreach (var flight in fResults.Where(flight => flight.NonStop)) {
    flight.Description = "Fly Direct!";
}
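The laziness described in this answer can be checked with a short sketch. The flight numbers and the updates counter are made up for illustration (the counter stands in for the Description assignment), and the code assumes C# 9 top-level statements.

```csharp
using System;
using System.Linq;

int updates = 0; // stands in for the side effect inside Select
string[] flights = { "AB100", "CD200", "EF300" };

// Building the query does not run the lambda.
var query = flights.Select(f => { updates++; return f; });
int afterBuilding = updates;

// Forcing enumeration runs the side effect once per element...
query.ToList();
int afterFirstPass = updates;

// ...and enumerating the same query again runs it all over again.
query.ToList();
int afterSecondPass = updates;

Console.WriteLine($"after building: {afterBuilding}");
Console.WriteLine($"after first ToList: {afterFirstPass}");
Console.WriteLine($"after second ToList: {afterSecondPass}");
```

Besides never running when the query is not enumerated, a side effect hidden in Select runs again on every enumeration, which is another reason to keep the update in an explicit foreach.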

Your LINQ code does not "directly" violate the guidelines you mention, because you are not modifying the list itself; you are just modifying some property on the contents of the list.
However, the main objection that drives these guidelines remains: you should not be modifying data with LINQ (also, you are abusing Select to perform your side effects).
Not modifying any data can be justified pretty easily. Consider this snippet:
fResults.Where(flight => flight.NonStop)
Do you see where this is modifying the flight properties? Neither will many maintenance programmers, since they will stop reading after the Where -- the code that follows is obviously free of side effects since this is a query, right?
[Nitpick: Certainly, seeing a query whose return value is not retained is a dead giveaway that the query does have side effects or that the code should have been removed; in any case, that "something is wrong". But it's so much easier to say that when there are only 2 lines of code to look at instead of pages upon pages.]
As a correct solution, I would recommend this:
foreach (var x in fResults.Where(flight => flight.NonStop))
{
    x.Description = "Fly Direct!";
}
Pretty easy to both write and read.

There is nothing wrong with it per se, except that you need to iterate it somehow, for example by calling Count() on it.
From a 'style' perspective it is not good. One would not expect an iterator to mutate a list value/property.
IMO the following would be better:
foreach (var x in fResults.Where(flight => flight.NonStop))
{
    x.Description = "Fly Direct!";
}
The intent is much clearer to the reader or maintainer of the code.

You should break that up into two blocks of code, one for the retrieval and one for setting the value:
var nonStopFlights = fResults.Where(f => f.NonStop);
foreach (var flight in nonStopFlights)
    flight.Description = "Fly Direct!";
Or, if you really hate the look of foreach you could try:
var nonStopFlights = fResults.Where(f => f.NonStop).ToList();
// ForEach is a method on List that is acceptable to make modifications inside.
nonStopFlights.ForEach(f => f.Description = "Fly Direct!");

I like using foreach when I'm actually changing something. Something like
foreach (var flight in fResults.Where(f => f.NonStop))
{
    flight.Description = "Fly Direct!";
}
and so does Eric Lippert in his article about why LINQ does not have a ForEach helper method:
"But we can go a bit deeper here. I am philosophically opposed to providing such a method, for two reasons.
The first reason is that doing so violates the functional programming principles that all the other sequence operators are based upon. Clearly the sole purpose of a call to this method is to cause side effects."

Change foreach to lambda

I need help simplifying this statement. How can I change the foreach to a lambda?
var r = mp.Call(c => c.GetDataset()); // returns IEnumerable of dataset
foreach (DatasetUserAppsUsage item in r)
{
    datasetUserAppsUsage.Merge(item.AppsUsageSummary);
}

Lambdas and loops are orthogonal. It is inappropriate to try to brute-force one into the other. That code is fine. Leave it.
You can get .ForEach implementations, but it isn't going to make the code better (in fact, it will be harder to follow, i.e. worse), and it won't be more efficient (in fact, it will be marginally slower, i.e. worse).

You can do the following:
r.ToList().ForEach(item => datasetUserAppsUsage.Merge(item.AppsUsageSummary));

Personally, I don't think I would merge this into a single lambda. You could do:
mp.Call(c => c.GetDataset()).ToList().ForEach(item => datasetUserAppsUsage.Merge(item.AppsUsageSummary));
However, I would avoid it, as it's purposefully causing side effects, which really violates the expectations of LINQ, and is not very clear in its intent.

I agree that the purpose of lambdas is different, but sometimes I use this trick:
mp.Call(c => c.GetDataset())
    .All(a => { datasetUserAppsUsage.Merge(a.AppsUsageSummary); return true; });
The trick is to use All() and return true to avoid break.
And do not change the underlying collection when inside enumerator of course :)
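Why the lambda has to return true can be seen in a small sketch: All stops at the first element for which the lambda returns false, so any side effect for the remaining elements is silently skipped. The numbers array and the counters are made up for illustration, and the code assumes C# 9 top-level statements.

```csharp
using System;
using System.Linq;

int[] numbers = { 1, 2, 3, 4, 5 };

// Lambda always returns true: every element is visited.
int visitedAlwaysTrue = 0;
bool allTrue = numbers.All(n => { visitedAlwaysTrue++; return true; });

// Lambda returns false for 2: All stops there and skips 3, 4 and 5.
int visitedEarlyFalse = 0;
bool allSmall = numbers.All(n => { visitedEarlyFalse++; return n < 2; });

Console.WriteLine($"always true: visited {visitedAlwaysTrue}, result {allTrue}");
Console.WriteLine($"early false: visited {visitedEarlyFalse}, result {allSmall}");
```

This makes the trick fragile: a later edit that lets the lambda return false on some path quietly truncates the loop, which a plain foreach cannot do by accident.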

Should I use a simple foreach or Linq when collecting data out of a collection

For a simple case, where class foo has a member i, and I have a collection of foos, say IEnumerable<Foo> foos, and I want to end up with a collection of foo's member i, say List<TypeOfi> result.
Question: is it preferable to use a foreach (Option 1 below), some form of LINQ (Option 2 below), or some other method? Or, perhaps, is it not even worth concerning myself with (just choose my personal preference)?
Option 1:
foreach (Foo foo in foos)
    result.Add(foo.i);
Option 2:
result.AddRange(foos.Select(foo => foo.i));
To me, Option 2 looks cleaner, but I'm wondering if LINQ is too heavy-handed for something that can be achieved with such a simple foreach loop.
Looking for all opinions and suggestions.

I prefer the second option over the first. However, unless there is a reason to pre-create the List<T> and use AddRange, I would avoid it. Personally, I would use:
List<TypeOfi> results = foos.Select(f => f.i).ToList();
In addition, I would not necessarily even use ToList() unless you actually need a true List<T>, or need to force the execution to be immediate instead of deferred. If you just need the collection of "i" values to iterate, I would simply use:
var results = foos.Select(f => f.i);
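The difference between the deferred query and the ToList() version shows up when the source changes after the query is built. A minimal sketch, using a list of ints as a hypothetical stand-in for the collection of Foo objects and assuming C# 9 top-level statements:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var foos = new List<int> { 1, 2, 3 };

// Deferred: evaluated each time it is enumerated.
IEnumerable<int> deferred = foos.Select(i => i * 10);

// Immediate: ToList() copies the results right now.
List<int> snapshot = foos.Select(i => i * 10).ToList();

// Change the source after both were created.
foos.Add(4);

string deferredText = string.Join(",", deferred);
string snapshotText = string.Join(",", snapshot);

Console.WriteLine($"deferred: {deferredText}"); // sees the added element
Console.WriteLine($"snapshot: {snapshotText}"); // does not
```

Which behaviour is wanted depends on the caller; the point is only that dropping ToList() changes when the projection runs, not just the static type of the result.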

I definitely prefer the second. It is far more declarative and easier to understand (to me, at least).
LINQ is here to make our lives more declarative so I would hardly consider it heavy handed even in cases as seemingly "trivial" as this.
As Reed said, though, you could improve the quality by using:
var result = foos.Select(f => f.i).ToList();
As long as there is no data already in the result collection.

LINQ isn't heavy-handed in any way. Both the foreach and the LINQ code do about the same work; in the second case the foreach is just hidden away.
It really is just a matter of preference, at least for LINQ to Objects. If your source collection is a LINQ to Entities query or something similar, it is a completely different case: the second option would push the query into the database, which is much more efficient. In this simple case the difference probably won't be large, but if you add a Where operator or others and make the query non-trivial, the LINQ query will most likely perform better.

I think you could also just do
foos.Select(foo => foo.i).ToList<TypeOfi>();
