Sometimes Resharper warns about:
Possible multiple enumeration of IEnumerable
There's an SO question on how to handle this issue, and the ReSharper site also explains things here. It has some sample code that tells you to do this instead:
IEnumerable<string> names = GetNames().ToList();
My question is about this specific suggestion: won't this still result in enumerating through the collection twice in the 2 for-each loops?
GetNames() returns an IEnumerable. So if you store that result:
IEnumerable foo = GetNames();
Then every time you enumerate foo, the work behind GetNames() is effectively done again (not literally a second call to the method; each enumeration asks for a fresh enumerator via IEnumerable.GetEnumerator()).
ReSharper sees this and suggests that you store the result of enumerating GetNames() in a local variable, for example by materializing it in a list:
IEnumerable fooEnumerated = GetNames().ToList();
This will make sure that the GetNames() result is only enumerated once, as long as you refer to fooEnumerated.
This does matter because you usually want to enumerate only once, for example when GetNames() performs a (slow) database call.
Because you materialized the results in a list, it doesn't matter anymore that you enumerate fooEnumerated twice; you'll be iterating over an in-memory list twice.
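To see why it matters, here's a minimal sketch, assuming a hypothetical deferred version of GetNames() that uses yield return (the Console.WriteLine stands in for a slow database call):
IEnumerable<string> GetNames()
{
    Console.WriteLine("Slow call (e.g. database query)"); // runs once per enumeration
    yield return "Fred";
    yield return "Wilma";
}

var foo = GetNames();
foreach (var name in foo) { }             // "Slow call" prints here...
foreach (var name in foo) { }             // ...and again here: the work is repeated

var fooEnumerated = GetNames().ToList();  // the work happens once, during ToList()
foreach (var name in fooEnumerated) { }   // in-memory list, no extra work
foreach (var name in fooEnumerated) { }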
I found this to be the best and easiest explanation of multiple enumeration:
C# LINQ: Possible Multiple Enumeration of IEnumerable
https://helloacm.com/c-linq-possible-multiple-enumeration-of-ienumerable-resharper/
GetNames() is not called twice. IEnumerable.GetEnumerator() is called each time you enumerate the collection with foreach, and if the enumeration itself performs some expensive calculation, that is the reason to be concerned.
Yes, you'll no doubt be enumerating it twice. The point is that if GetNames() returns a lazy LINQ query that is very expensive to compute, it will be computed twice unless you call ToList() or ToArray().
Just because a method returns IEnumerable doesn't mean there will be deferred execution.
E.g.
IEnumerable<string> GetNames()
{
Console.WriteLine("Yolo");
return new string[] { "Fred", "Wilma", "Betty", "Barney" };
}
var names = GetNames(); // Yolo prints out here! and only here!
foreach (var name in names)
{
    // Some code...
}
foreach (var name in names)
{
    // Some code...
}
Back to the question, if:
a. There is deferred execution (e.g. LINQ's .Where(), .Select(), etc.): the method returns a "promise" that knows how to iterate over the collection, so the iteration only happens when .ToList() is called, and the resulting list is stored in memory (a sketch of this case follows the example below).
b. There is no deferred execution (e.g. the method returns a List): assuming GetNames returns a list, it's basically like calling .ToList() on that list:
var names = GetNames().ToList();
// 1. "Yolo" prints out
// 2. The list is returned
// 3. ToList() is called on the returned list
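And for case (a), a sketch assuming a hypothetical GetNamesLazy() built from a deferred LINQ query (System.Linq in scope):
IEnumerable<string> GetNamesLazy()
{
    var source = new string[] { "Fred", "Wilma", "Betty", "Barney" };
    // Select is deferred: the lambda (and its WriteLine) runs per element,
    // every time the result is enumerated.
    return source.Select(n => { Console.WriteLine("Projecting " + n); return n.ToUpper(); });
}

var lazyNames = GetNamesLazy();        // nothing printed yet
foreach (var name in lazyNames) { }    // "Projecting ..." prints four times
foreach (var name in lazyNames) { }    // ...and four more times: the work repeats

var cachedNames = GetNamesLazy().ToList();  // the projection runs once, here
foreach (var name in cachedNames) { }       // in-memory list, no further work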
PS: I left the following comment on ReSharper's documentation:
Hi,
Can you please make it clear in the documentation that this would only
be an issue if GetNames() implements deferred execution?
For example, if GetNames() uses yield under the hood or follows a
deferred execution approach like most LINQ operators (.Select(),
.Where(), etc.).
Otherwise, if under the hood GetNames() is not returning an
IEnumerable that implements deferred execution, then there are no
performance or data integrity issues here, e.g. if GetNames returns a
List.
Related
I need to take the first element of an IEnumerable, but without iterating the whole thing. I used First(), but it can cause some bugs because it iterates. I know I can do it with the enumerator, but how?
IEnumerable<T> is what the name says, an enumerable thing. The only method it provides is IEnumerator<T> GetEnumerator(). Every extension method will enumerate the IEnumerable<T> to some extent.
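And for the "do it with the enumerator" part: a sketch of pulling just the first element by hand, which is essentially what First() and FirstOrDefault() do under the hood (assuming names is an IEnumerable<string>):
using (IEnumerator<string> e = names.GetEnumerator())
{
    if (e.MoveNext())
    {
        string first = e.Current;  // only the first element is ever produced
        // use first...
    }
}
Either way, at least one element has to be pulled from the sequence; there is no way to get the first element without asking the enumerable for it.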
If lazy evaluation/multiple enumeration is problematic (I've had that with result sets from the database, that have been evaluated after the connection has been disposed, see here), consider enumerating once, e.g. by converting the IEnumerable<T> to a List<T> with IEnumerable<T>.ToList().
Remarks: If enumerating the enumerable causes errors, your design is flawed. Consider using another interface.
You can try FirstOrDefault; it returns the first element, or the default value if the sequence is empty:
int[] numbers = { };
int first = numbers.FirstOrDefault();
Console.WriteLine(first);
Is it required or not to use ToList() after Select() in this code:
var names = someStorage.GetItems().Select(x => x.Name).ToList();
The Enumerable.ToList method causes the data to be fetched; if you do not call it, the data won't be fetched and names will remain a query.
The ToList<TSource>(IEnumerable<TSource>) method forces immediate
query evaluation and returns a List<T> that contains the query
results. You can append this method to your query in order to obtain a
cached copy of the query results. (MSDN)
It completely depends on what your code does subsequently. The ToList() method causes the query that you defined by using Select() to run immediately against the datastore. Without it, its execution would be delayed until you access the names variable for the first time.
The other aspect is that, if you don't use ToList(), the query will be run against the datastore each time you use the names variable, not just once as is the case with ToList(). So it also heavily depends on how often you use the names variable: if you use it only once (e.g. in a loop), there is no difference; otherwise ToList() is much more efficient.
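A small sketch of that difference, assuming someStorage.GetItems() issues a deferred query against the datastore:
var namesQuery = someStorage.GetItems().Select(x => x.Name);          // just a query, nothing runs yet
var count1 = namesQuery.Count();                                      // query executes against the datastore
var count2 = namesQuery.Count();                                      // query executes again

var namesList = someStorage.GetItems().Select(x => x.Name).ToList();  // executes once, results cached
var count3 = namesList.Count;                                         // in-memory, no further datastore access
var count4 = namesList.Count;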
It depends on the variable you are assigning to: if you are assigning to a list, then you need to convert.
If you do not call ToList, the result will be an IEnumerable<TSource>, which supports simple iteration over a collection of a specified type.
ToList converts the source sequence into a list. Some points to note:
The signature specifies List<T>, not just IList<T>. Of course it
could return a subclass of List<T>, but there seems little point.
It uses immediate execution - nothing is deferred here.
The parameter (source) mustn't be null.
It's optimized for the case when source implements ICollection<T>.
It always creates a new, independent list.
The last two points are worth a bit more discussion. Firstly, the optimization for ICollection<T> isn't documented, but it makes a lot of sense (a rough sketch follows below):
List<T> stores its data in an array internally.
ICollection<T> exposes a Count property so the List<T> can create an
array of exactly the right size to start with.
ICollection<T> exposes a CopyTo method so that the List<T> can copy
all the elements into the newly created array in bulk.
Source to refer
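To make the ICollection<T> fast path concrete, here is a rough, simplified sketch of the kind of check involved (a hypothetical ToListSketch, not the actual framework source):
using System;
using System.Collections.Generic;

public static class ToListSketchExtensions
{
    public static List<T> ToListSketch<T>(this IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        if (source is ICollection<T> collection)
        {
            // Count is known up front, so the backing array is sized exactly once
            // and filled in bulk via CopyTo.
            var items = new T[collection.Count];
            collection.CopyTo(items, 0);
            return new List<T>(items); // illustrative; the real code avoids this extra copy
        }

        // Fallback: enumerate and let the list grow its backing array as needed.
        var result = new List<T>();
        foreach (var item in source)
            result.Add(item);
        return result;
    }
}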
I have an ArrayList ids containing String objects that are IDs, and another ArrayList objs containing objects which have a string ID field. Right now I have code to find which ids don't have a match in objs, which looks like this:
var missing = new List<string>();
foreach (MyObj obj in objs)
{
if (!ids.Contains(obj.ID))
{
missing.Add(obj.ID);
}
}
This works fine. But I rewrote it to this as an exercise to better "think in LINQ":
var missing = objs.Cast<MyObj>().Select(x => x.ID).Except(ids.Cast<string>());
I expected this LINQ to be slower than the foreach + Contains approach (especially due to the Cast calls), but the LINQ runs significantly faster. What is the LINQ approach doing differently that gives the performance benefit?
LINQ's Except uses a HashSet internally, whose Contains is O(1), whereas it's O(n) for ArrayList. That's why it's faster.
But as Tim pointed out in his comment, your Except approach does not really produce any results. It just defines a query. The query is executed as soon as you need results, and it may be executed more than once. You should add a ToList() call to get a List<T> explicitly:
var missing = objs.Cast<MyObj>().Select(x => x.ID).Except(ids.Cast<string>()).ToList();
By the way, why are you using ArrayList instead of generic List<T>?
Except uses a HashSet<T> (or something similar) to efficiently find what object are the same, while your code uses the less-efficient List<T>.Contains (or similar) method.
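Roughly speaking, Except behaves like the following sketch (a hypothetical ExceptSketch, not the exact framework implementation): it builds a set from the second sequence and then yields the distinct items of the first sequence that are not in that set.
using System.Collections.Generic;

public static class ExceptSketchExtensions
{
    public static IEnumerable<TSource> ExceptSketch<TSource>(
        this IEnumerable<TSource> first, IEnumerable<TSource> second)
    {
        var seen = new HashSet<TSource>(second);  // O(m) to build the set
        foreach (var item in first)
        {
            if (seen.Add(item))                   // O(1) membership test per item
                yield return item;                // also de-duplicates the output
        }
    }
}
That is why replacing an O(n) ArrayList.Contains lookup per element with an O(1) hash lookup makes the LINQ version faster overall.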
I am using some of the LINQ select stuff to create some collections, which return IEnumerable<T>.
In my case I need a List<T>, so I am passing the result to List<T>'s constructor to create one.
I am wondering about the overhead of doing this. The items in my collections are usually in the millions, so I need to consider this.
I assume that if the IEnumerable<T> contains value types, the performance is at its worst.
Am I right? What about reference types? Either way, there is also the cost of calling List<T>.Add a million times, right?
Any way to solve this? Like, can I "overload" methods like LINQ's Select using extension methods?
No, there's no particular penalty for the element type being value types, assuming you're using IEnumerable<T> instead of IEnumerable. You won't get any boxing going on.
If you actually know the size of the result beforehand (which the result of Select probably won't) you might want to consider creating the list with that size of buffer, then using AddRange to add the values. Otherwise the list will have to resize its buffer every time it fills it.
For instance, instead of doing:
Foo[] foo = new Foo[100];
IEnumerable<string> query = foo.Select(x => x.Name);
List<string> queryList = new List<string>(query);
you might do:
Foo[] foo = new Foo[100];
IEnumerable<string> query = foo.Select(x => x.Name);
List<string> queryList = new List<string>(foo.Length);
queryList.AddRange(query);
You know that calling Select will produce a sequence of the same length as the original query source, but nothing in the execution environment has that information as far as I'm aware.
It would be best to avoid the need for a list. If you can keep your caller using IEnumerable<T>, you will save yourself some headaches.
LINQ's ToList() will take your enumerable, and just construct a new List<T> directly from it, using the List<T>(IEnumerable<T>) constructor. This will be the same as making the list yourself, performance wise (although LINQ does a null check, as well).
If you're adding the elements yourself, use the AddRange method instead of the Add. ToList() is very similar to AddRange (since it's using the constructor which takes IEnumerable<T>), which typically will be your best bet, performance wise, in this case.
Generally speaking, a method returning IEnumerable doesn't have to evaluate any of the items before an item is actually needed. So, theoretically, when you return an IEnumerable, none of your items need to exist at that time.
So creating a list means that you will really need to evaluate items, get them and place them somewhere in memory (at least their references). There is nothing that can be done about this - if you really need to have a list.
A number of other responders have already provided ideas for how to improve the performance of copying an IEnumerable<T> into a List<T> - I don't think that much can be added on that front.
However, based on what you have described you need to do with the results, and the fact that you get rid of the list when you're done (which I presume means that the intermediate results are not interesting) - you may want to consider whether you really need to materialize a List<T>.
Rather than creating a List<T> and operating on the contents of that list - consider writing a lazy extension method for IEnumerable<T> that performs the same processing logic. I've done this myself in a number of cases, and writing such logic in C# is not so bad when using the yield return syntax supported by the compiler.
This approach works well if all you're trying to do is visit each item in the results and collect some information from it. Often, what you need to do is just visit each element in the collection on demand, do some processing with it, and then move on. This approach is generally more scalable and performant than creating a copy of the collection just to iterate over it.
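As an illustration, a minimal lazy extension method of the kind described, using yield return (the names and the processing step are hypothetical placeholders):
using System;
using System.Collections.Generic;

public static class EnumerableExtensions
{
    // Visits each element on demand; nothing is materialized into a list.
    public static IEnumerable<TResult> ProcessLazily<TSource, TResult>(
        this IEnumerable<TSource> source, Func<TSource, TResult> process)
    {
        foreach (var item in source)
            yield return process(item);  // runs only when the caller pulls the next element
    }
}
Callers can then foreach over the result directly without ever holding all of the items in memory at once.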
Now, this advice may not work for you for other reasons, but it's worth considering as an alternative to finding the most efficient way to materialize a very large list.
Don't pass an IEnumerable to the List constructor. IEnumerable has a ToList() method, which can't possibly do worse than that, and has nicer syntax (IMHO).
That said, that only changes the answer to your question to "it depends" - in particular, it depends on what the IEnumerable actually is behind the scenes. If it happens to be a List already, ToList() will go much faster than if it were another type, but it's still not super-fast.
The best way to solve this, of course, is to try to figure out how to do your processing on an IEnumerable rather than a List. That may not be possible.
Edit: Some people in the comments are debating whether or not ToList() will actually be any faster when called on a List than on some other IEnumerable, and whether ToList() will be any faster than the list constructor. At this point, speculating is getting pointless, so here's some code:
using System;
using System.Linq;
using System.Collections.Generic;

public static class ToListTest
{
    public static int Main(string[] args)
    {
        List<int> intlist = new List<int>();
        for (int i = 0; i < 1000000; i++)
            intlist.Add(i);

        IEnumerable<int> intenum = intlist;
        for (int i = 0; i < 1000; i++)
        {
            List<int> foo = intenum.ToList();
        }
        return 0;
    }
}
Running this code with an IEnumerable that's really a List goes about 6-10 times faster than if I replace it with a LinkedList or Stack (on my pokey 2.4 GHz P4, using Mono 1.2.6). Conceivably this could be due to some unfortunate interaction between ToList() and the particular implementations of LinkedList or Stack's enumerations, but at least the point remains: speed will depend on the underlying type of the IEnumerable. That said, even with a List as the source, it still takes 6 seconds for me to make 1000 ToList() calls, so it's far from free.
The next question is whether ToList() is any more intelligent than the List constructor. The answer to that turns out to be no: the List constructor is just as fast as ToList(). In hindsight, Jon Skeet's reasoning makes sense - I was just forgetting that ToList() was an extension method. I still (much) prefer ToList() syntactically, but there's no performance reason to use it.
So the short version is that the best answer is still "don't convert to a List if you can avoid it". Barring that, actual performance will depend drastically on what the IEnumerable actually is, but at best it'll be sluggish, as opposed to glacial. I've amended my original answer to reflect this.
From reading the various comments and the question, I get the following requirement:
for a collection of data, you need to run through that collection, filter out some objects and then perform some transformation on the remaining objects. If that's the case, you can do something like this:
var result = from item in collection
where item.Id > 10 //or some more sensible condition
select Operation(item);
and if you need to perform more filtering and transformation, you can nest your LINQ queries like this:
var result = from filteredItem in (from item in collection
where item.Id > 10 //or some more sensible condition
select Operation(item))
where filteredItem.SomePropertyAvailableAfterFirstTransformation == "new"
select SecondTransfomation(filteredItem);
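For reference, the same pipeline can be written in method syntax, which avoids the nesting (same identifiers as in the queries above):
var result = collection
    .Where(item => item.Id > 10) // or some more sensible condition
    .Select(item => Operation(item))
    .Where(x => x.SomePropertyAvailableAfterFirstTransformation == "new")
    .Select(x => SecondTransfomation(x));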