When are collections enumerated (IEnumerable) - c#

Recently, I ran into a strange problem where I had a method generate an IEnumerable collection of objects. This method contained four yield return statements that returned four objects. I assigned the result to a variable named result using the var keyword.
var result = GenerateCollection().ToList();
This effectively meant: List<MyType> result = GenerateCollection().ToList();
I made a simple for loop over the elements of this collection. What surprised me is that the collection was reenumerated for each call to the list (for each result[i]). Later I used the result collection in a LINQ query which had some bad results performance-wise due to the continual reenumeration of the collection.
I solved the problem by converting the result to an array instead of a list.
What this makes me wonder now is: when are collections enumerated? Which method calls cause a collection to be re-enumerated?
EDIT: The GenerateCollection() method looked similar to this:
public static IEnumerable<MyType> GenerateCollection()
{
var array = data.AsParallel(); //data is a simple collection of sublists of strings
yield return new MyType("a", array.Where(x => x.Sublist.Count(y => y == 'a') == 0));
yield return new MyType("b", array.Where(x => x.Sublist.Count(y => y == 'b') == 0));
yield return new MyType("c", array.Where(x => x.Sublist.Count(y => y == 'c') == 0));
yield return new MyType("d", array.Where(x => x.Sublist.Count(y => y == 'd') == 0));
}

You are yielding objects that contain queries. What you pass to the MyType constructor is not a sequence of array values; it's an iterator object, and it is not executed at the point where you pass it. When you create the list of MyType objects
var result = GenerateCollection().ToList();
all MyType instances are yielded and saved into the list, but unless the MyType constructor enumerates the iterators, the queries are not executed. What's more, they are executed again every time you call an operator that runs the query, e.g.
result[i].ArrayIterator.Count(); // first execution
foreach(var item in result[i].ArrayIterator) // second execution
// ...
You can fix this by passing the result of executing the query to the MyType constructor:
yield return new MyType("a", array.Where(x => !x.Sublist.Contains('a')).ToList());
Now you are passing a list of items instead of an iterator (you can also use ToArray()). The query is executed when the MyType instance is yielded, and it will not be executed again.
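The difference between passing a deferred query and passing a materialized list can be sketched as follows. This is a minimal illustration, not the asker's code: the data list and the predicateCalls counter are made up for the example, and it assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative data; not from the original question.
var data = new List<string> { "apple", "berry", "cherry" };
int predicateCalls = 0;

// Deferred: nothing runs here, the lambda is only stored in the query object.
IEnumerable<string> deferred = data.Where(s => { predicateCalls++; return !s.Contains('a'); });

int firstCount = deferred.Count();   // runs the query over all 3 items
int secondCount = deferred.Count();  // runs the whole query again
int callsAfterDeferred = predicateCalls;

// Materialized: ToList() runs the query once and stores the results.
predicateCalls = 0;
List<string> materialized = data.Where(s => { predicateCalls++; return !s.Contains('a'); }).ToList();
int thirdCount = materialized.Count;   // property read, no query execution
int fourthCount = materialized.Count;  // still no query execution
int callsAfterMaterialized = predicateCalls;

Console.WriteLine($"deferred: {callsAfterDeferred} predicate calls");         // 6
Console.WriteLine($"materialized: {callsAfterMaterialized} predicate calls"); // 3
```

Each Count() on the deferred query re-runs the predicate for every element, while the materialized list pays that cost only once.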

array.Where(x => x.Sublist.Count(y => y == 'a') == 0)
This piece of code will be enumerated every time you access it in MyType. Use ToList or ToArray to ensure it is enumerated only once, at the place where the code is written.

Sequences based on deferred execution (for example, LINQ queries exposed as IEnumerable or IQueryable) are enumerated when you use them, and again each time you use them. Collections based on immediate execution, such as List, are populated once, when they are created.

Related

LINQ yield all elements

When I want to get an IEnumerable to eagerly materialize/yield all its results I usually use ToList() like this:
var myList= new List<int>();
IEnumerable<int> myXs = myList.Select(item => item.x).ToList();
I do this usually when locking a method returning the result of a Linq query.
In these kinds of cases I am not actually interested in the collection becoming a list, and I often don't want to know its type. I am just using ToList() for its side effect: yielding all the elements.
If, for example, I change the type from List to Array, I will also have to remember to change the ToList() to ToArray() or suffer some performance hit.
I can do foreach (var e in myList) { } but I am not sure whether this will be optimized away at some point.
I am looking for something like myList.Select(item => item.x).yield()
What is the best way to do it? Is there a way to simply tell a LINQ result to yield all its elements that is better than ToList?
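If the point is just to exercise the sequence, and you don't want to construct or allocate a collection of any kind, you can use Last(), which will simply iterate over all the elements until it gets to the last one (see source). Note that Last() throws an InvalidOperationException on an empty sequence, and that when the source itself is an IList<T> it indexes straight to the last element instead of iterating.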
If you are actually interested in the results, in most cases you should simply use ToList() and don't overthink it.
There is no way to avoid allocating some sort of storage if you want to retrieve the results later. There is no magic IEnumerable<T> container that has no concrete type; you have to choose one, and ToList() is the most obvious choice with low overhead.
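If the goal really is only to force enumeration without keeping the results, a plain foreach that discards each element does the job. The sketch below wraps that in a hypothetical helper named Consume (it is not part of LINQ; the name and the selectorCalls counter are made up for the illustration), written as a C# 9+ top-level program.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of LINQ: enumerates a sequence and discards every element.
static void Consume<T>(IEnumerable<T> source)
{
    foreach (var _ in source) { }
}

var numbers = new List<int> { 1, 2, 3, 4 };
int selectorCalls = 0;

// Deferred: building the query does not call the selector.
IEnumerable<int> doubled = numbers.Select(n => { selectorCalls++; return n * 2; });
int callsBefore = selectorCalls;

// Forces one full pass over the query without allocating a list or array.
Consume(doubled);
int callsAfter = selectorCalls;

Console.WriteLine($"{callsBefore} -> {callsAfter}"); // 0 -> 4
```

Nothing is stored, so enumerating doubled again later runs the selector again; if the results are needed afterwards, ToList() remains the simpler choice.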
If the query is an Entity Framework IQueryable, don't forget ToListAsync() if you'd rather not block while it finishes.
Just an FYI, since maybe that is the issue:
You don't have to write LINQ operations as a one-liner; you can build a query up step by step.
For example:
var myList = new List<T>();
var result = myList.Select(x => x.Foo).Where(x => x.City == "Vienna").Where(x => x.Big == true).ToList();
Could be re-written to:
var myList = new List<T>();
//get an IEnumerable<Foo>
var foos = myList.Select(x => x.Foo);
//get an IEnumerable<Foo> which is filtered by the City Vienna
var foosByCity = foos.Where(x => x.City == "Vienna");
//get an IEnumerable<Foo> which is further filtered by Big == true
var foosByCityByBig = foosByCity.Where(x => x.Big == true);
//now you could call ToList on the last IEnumerable, but you don't have to
var result = foosByCityByBig.ToList();
So whatever your real goal is, maybe you can change your lines
var myList= new List<int>();
IEnumerable<int> myXs = myList.Select(item => item.x).ToList();
To this:
var myList= new List<int>();
IEnumerable<int> myXs = myList.Select(item => item.x);
And continue your work with myXs as an IEnumerable<int>.

C# IEnumerable being reset in child method

I have the below method:
private static List<List<job>> SplitJobsByMonth(IEnumerable<job> inactiveJobs)
{
List<List<job>> jobsByMonth = new List<List<job>>();
DateTime cutOff = DateTime.Now.Date.AddMonths(-1).Date;
cutOff = cutOff.AddDays(-cutOff.Day + 1);
List<job> temp;
while (inactiveJobs.Count() > 0)
{
temp = inactiveJobs.Where(j => j.completeddt >= cutOff).ToList();
jobsByMonth.Add(temp);
inactiveJobs = inactiveJobs.Where(a => !temp.Contains(a));
cutOff = cutOff.AddMonths(-1);
}
return jobsByMonth;
}
It aims to split the jobs by month. 'job' is a class, not a struct. In the while loop, the passed in IEnumerable is reset with each iteration to remove the jobs that have been processed:
inactiveJobs = inactiveJobs.Where(a => !temp.Contains(a));
Typically this reduces the content of this collection by quite a lot. However, on the next iteration the line:
temp = inactiveJobs.Where(j => j.completeddt >= cutOff).ToList();
restores the inactiveJobs object to the state it was when it was passed into the method - so the collection is full again.
I have solved this problem by refactoring this method slightly, but I am curious as to why this issue occurs as I can't explain it. Can anyone explain why this is happening?
Why not just use a group by?
private static List<List<job>> SplitJobsByMonth(IEnumerable<job> inactiveJobs)
{
var jobsByMonth = (from job in inactiveJobs
group job by new DateTime(job.completeddt.Year, job.completeddt.Month, 1)
into g
select g.ToList()).ToList();
return jobsByMonth;
}
This happens because of deferred execution of LINQ's Where.
When you do this
inactiveJobs = inactiveJobs.Where(a => !temp.Contains(a));
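no evaluation actually happens until you start iterating the IEnumerable. The lambda also captures the variable temp rather than its current contents, so when the stacked Where filters finally run, every one of them checks against whatever temp holds at that moment, and jobs excluded by an earlier temp reappear. If you add ToList after Where, the iteration happens right away, so the content of inactiveJobs really is reduced: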
inactiveJobs = inactiveJobs.Where(a => !temp.Contains(a)).ToList();
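The effect can be reproduced in a few lines. The sketch below is a simplified stand-in for the question's method, using ints instead of job objects; the names source, processed, remaining, and done are made up for the illustration, and it assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var source = new List<int> { 1, 2, 3, 4, 5, 6 };

// Deferred version: each Where stores a lambda that reads the variable
// 'processed' when the query finally runs, not when the query is built.
IEnumerable<int> remaining = source;
List<int> processed = new List<int> { 1, 2 };
remaining = remaining.Where(n => !processed.Contains(n));
processed = new List<int> { 3, 4 };          // reassigning changes what BOTH filters see
remaining = remaining.Where(n => !processed.Contains(n));
List<int> deferredResult = remaining.ToList();

// Materialized version: ToList() applies each filter immediately,
// while the variable still holds the intended list.
List<int> remainingList = source;
List<int> done = new List<int> { 1, 2 };
remainingList = remainingList.Where(n => !done.Contains(n)).ToList();
done = new List<int> { 3, 4 };
remainingList = remainingList.Where(n => !done.Contains(n)).ToList();

Console.WriteLine(string.Join(",", deferredResult));  // 1,2,5,6
Console.WriteLine(string.Join(",", remainingList));   // 5,6
```

In the deferred version 1 and 2 come back because both filters end up checking against the second list, which mirrors how the processed jobs reappear in the question.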
In LINQ, queries have two different behaviors of execution: immediate and deferred.
The query is actually executed when the query variable is iterated over, not when the query variable is created. This is called deferred execution.
You can also force a query to execute immediately, which is useful for caching query results.
To do this, add .ToList() at the end of your line:
inactiveJobs = inactiveJobs.Where(a => !temp.Contains(a)).ToList();
This executes the created query immediately and writes result to your variable.
You can see more about this example Here.

How to Get a Object from IEnumerable collection using LINQ Lambda?

storageColl contains an IStorageObject whose "Id" property is "Test".
What I am doing:
string id="Test";
IEnumerable<IStorageObject> storageColl = getStorageCollection();
IStorageObject storageObject = storageColl.ToList().Where(m => m.Properties["Id"] == id)
.ToList()
.Cast<IStorageObject>().ToArray()[0];
Is there a better way to do this? As written, it may throw an index-out-of-range exception if storageColl does not contain that "Test" item.
You can use FirstOrDefault on the IEnumerable.
var storageObject = storageColl.Where(m => m.Properties["Id"] == id).FirstOrDefault();
Or as David Hedlund pointed out, use the predicate overload on FirstOrDefault and remove the Where.
var storageObject = storageColl.FirstOrDefault(m => m.Properties["Id"] == id);
Your storageColl is a sequence of objects that implement IStorageObject. The use of the Where only limits the elements you get when you enumerate over the sequence, it does not change them.
It is a waste of processing power to convert the sequence to a list when you only need the first element of the sequence or a subset of it.
Familiarize yourself with the following LINQ methods:
Any() returns true if the sequence contains at least one element
Any(item => ...) returns true if any of the elements in the sequence meets the requirement
First() returns the first element of the sequence, and throws an exception if the sequence is empty
FirstOrDefault() returns the first element of the sequence, or the default value (null for reference types) if the sequence is empty
The nice thing about these functions is that they don't have to enumerate over all elements in the sequence, but can stop as soon as they found something.
If you use ToList(), the code enumerates over all elements, throws most of them away, and uses only the first one. FirstOrDefault() would have stopped as soon as it found the first matching element.
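The early exit is easy to observe by counting predicate calls. This is a minimal sketch with plain strings standing in for IStorageObject; the items list and the two counters are made up for the illustration, and it assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Plain strings stand in for the storage objects of the question.
var items = new List<string> { "x", "Test", "y", "z" };
int checkedViaList = 0;
int checkedViaFirst = 0;

// Where(...).ToList()[0] tests every element before the indexer is applied.
var viaList = items.Where(s => { checkedViaList++; return s == "Test"; }).ToList()[0];

// FirstOrDefault(predicate) stops at the first match.
var viaFirst = items.FirstOrDefault(s => { checkedViaFirst++; return s == "Test"; });

// With no match, FirstOrDefault returns null for reference types instead of throwing.
var missing = items.FirstOrDefault(s => s == "absent");

Console.WriteLine($"ToList()[0] checked {checkedViaList} items");      // 4
Console.WriteLine($"FirstOrDefault checked {checkedViaFirst} items");  // 2
Console.WriteLine(missing is null);                                    // True
```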
Since the elements of the collection already implement IStorageObject, you don't need to cast them. To get an item by index you can use an array or any class that implements IList.
Since LINQ operates on IEnumerable (an array is itself enumerable), you don't need to convert to an array. You can use the ElementAt method or an IList class (such as List).
IStorageObject storageObject = storageColl.Where(m => m.Properties["Id"] == id).First();
You can simply achieve this by
var result = storageColl.Where(m => m.Properties["Id"] == id);
You should check the result of FirstOrDefault, because it can return null.
var storageObject = storageColl.Where(p => p.Properties["Id"] == id).FirstOrDefault();
if (storageObject != null)
{
// Do some Work
}

difference between linq.first() vs array[0] [duplicate]

I am wondering what happens under the hood of list.first() and list[0] and which performs better.
For example which is faster?
for(int i = 0; i < 999999999999... i++)
{
str.Split(';').First() vs. str.Split(';')[0]
list.Where(x => x > 1).First() vs. list.Where(x => x > 1).ToList()[0]
}
Sorry In case of a duplicate question
Which performs better? The array accessor, since that doesn't need a method to be put on the stack and doesn't have to execute the First method to eventually get to the array accessor.
As the Reference Source of Enumerable shows, First() is actually:
IList<TSource> list = source as IList<TSource>;
if (list != null) {
if (list.Count > 0) return list[0];
}
So it doesn't do anything else, it just takes more steps to get there.
For your second part (list.Where(x => x > 1).First() vs. list.Where(x => x > 1).ToList()[0]):
Where returns an IEnumerable, which isn't IList so it doesn't go for the first part of the First method but the second part:
using (IEnumerator<TSource> e = source.GetEnumerator()) {
if (e.MoveNext()) return e.Current;
}
This will traverse each item one by one until it gets the desired index. In this case 0, so it will get there very soon. The other one calling ToList will always be less efficient since it has to create a new object and put all items in there in order to get the first one. The first call is definitely faster.
Simple comparison of performance http://pastebin.com/bScgyDaM
str.Split(';').First(); : 529103
str.Split(';')[0]; : 246753
list.Where(x => x == "a").First(); : 98590
list.Where(x => x == "a").ToList()[0]; : 230858
First vs [0]
If you have a plain array, [0] is faster because it only has to calculate an address in memory.
But if you combine it with other LINQ operators, First() is faster. For example, Where().First() searches only until it finds the first matching element, whereas Where().ToList()[0] finds all matching elements, copies them into a list, and only then does the simple index lookup.
Another thing is that Where() is a deferred method. A query that contains only deferred methods is not executed until the items in the result are enumerated.
So you can write:
var query = list.Where(x => x > 12);
list.Add(10);
list.Add(13);
foreach (int item in query)
{
Console.WriteLine(item);
}
13 will appear in the result but 10 will not: both were added to the list before the query actually ran, so the filter saw them both, and only 13 passes x > 12.
If you want to know more about LINQ, you can read the book Pro LINQ by Joseph Rattz and Adam Freeman http://www.apress.com/9781430226536
There is no significant difference between these, you will get pretty much the same results:
str.Split(';').First() vs. str.Split(';')[0]
For your second comparison, here you are asking only the first element
list.Where(x => x > 1).First()
So as soon as WhereIterator returns an item, it's done. But in the second case you are putting all the results into a list and then getting the first item using the indexer, so it will be slower.
list.Where(x => x > 1).First() vs. list.Where(x => x > 1).ToList()[0]
First() should be faster when applied to an Enumerable because of deferred execution. In your case, the result is returned as soon as one item of your list has been found that matches the criteria Where(x => x > 1).
In the second example, your initial list has to be fully enumerated, and ALL items matching the criteria are put in a temporary list, of which you get the first item with the indexer.
str.Split(';').First() vs. str.Split(';')[0]
In that case the method Split() already returns an array. The array accessor might be marginally faster, but the performance gain will be negligible in most cases.

What does LINQ return when the results are empty

I have a question about LINQ queries. Normally a query returns an IEnumerable<T>. If the result is empty, I am not sure whether it is null or not. I am also not sure whether the following ToList() will throw an exception or just return an empty List<string> if nothing is found in the IEnumerable result.
List<string> list = new List<string> { "a" };
// is the result null or something else?
IEnumerable<string> ilist = from x in list where x == "ABC" select x;
// Or directly to a list, exception thrown?
List<string> list1 = (from x in list where x == "ABC" select x).ToList();
I know it is a very simple question, but I don't have VS available for the time being.
It will return an empty enumerable. It won't be null. You can sleep sound :)
You can also check the .Any() method:
if (!YourResult.Any())
Just a note that .Any will still retrieve the records from the database; doing a .FirstOrDefault()/.Where() will be just as much overhead but you would then be able to catch the object(s) returned from the query
var lst = new List<int>() { 1, 2, 3 };
var ans = lst.Where( i => i > 3 );
(ans == null).Dump(); // False
(ans.Count() == 0 ).Dump(); // True
(Dump is from LinqPad)
.ToList returns an empty list. (same as new List<T>() );
In LINQ to SQL, if you try to get the first element of a query with no results, you get a "Sequence contains no elements" error. That is not the same error as "Object reference not set to an instance of an object".
In conclusion: no, it won't return null. If it did, you would get "Object reference not set to an instance of an object" rather than "Sequence contains no elements" ;)
Other posts here have made it clear that the result is an "empty" IQueryable, which ToList() will correctly change to be an empty list etc.
Do be careful with some of the operators, as they will throw if you send them an empty enumerable. This can happen when you chain them together.
It won't throw exception, you'll get an empty list.
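The answers above can be checked with a short LINQ to Objects sketch. It reuses the list from the question; the other variable names are made up for the illustration, it assumes C# 9 or later (top-level statements), and it says nothing about how a database provider behaves.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<string> { "a" };

// A query with no matches is an empty sequence, not null.
IEnumerable<string> query = from x in list where x == "ABC" select x;
bool queryIsNull = query is null;
bool hasAny = query.Any();

// ToList() on an empty sequence returns an empty list and does not throw.
List<string> asList = query.ToList();

// FirstOrDefault() returns null for an empty sequence of strings.
var firstOrDefault = query.FirstOrDefault();

// First() is one of the operators that throws on an empty sequence.
bool firstThrew = false;
try
{
    query.First();
}
catch (InvalidOperationException)
{
    firstThrew = true;
}

Console.WriteLine($"null: {queryIsNull}, any: {hasAny}, count: {asList.Count}");
Console.WriteLine($"FirstOrDefault is null: {firstOrDefault is null}, First threw: {firstThrew}");
```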
