To ToList() or not to ToList()? - c#

Given an in memory (not LINQ to SQL) list of classes:
List<MyClass> myItems = /*lots and lots of items*/;
which I am grouping using a GroupBy() statement:
myItems.GroupBy(g => g.Ref)
and then immediately consuming in a foreach loop, is there any difference between calling .ToList() on the grouping and just using the IEnumerable directly?
So full code examples:
With ToList()
List<List<MyClass>> groupedItemsA = new List<List<MyClass>>();
List<List<MyClass>> groupedItemsB = new List<List<MyClass>>();
List<MyClass> myItems = /*lots and lots of items*/;
List<IGrouping<string, MyClass>> groupedItems = myItems.GroupBy(g => g.Ref).ToList();
foreach(IGrouping<string, MyClass> item in groupedItems)
{
if (/*check something*/)
{
groupedItemsA.Add(item.ToList());
}
else
{
groupedItemsB.Add(item.ToList());
}
}
or
Using IEnumerable
List<List<MyClass>> groupedItemsA = new List<List<MyClass>>();
List<List<MyClass>> groupedItemsB = new List<List<MyClass>>();
List<MyClass> myItems = /*lots and lots of items*/;
IEnumerable<IGrouping<string, MyClass>> groupedItems = myItems.GroupBy(g => g.Ref);
foreach(IGrouping<string, MyClass> item in groupedItems)
{
if (/*check something*/)
{
groupedItemsA.Add(item.ToList());
}
else
{
groupedItemsB.Add(item.ToList());
}
}
Is there any difference in the execution plan of these "under the hood"? Would either of these be more efficient or does it not really matter?
I am not using the groupedItems list after this.

Yes there is a difference and it can be significant.
ToList() will iterate and append each iterated item into a new list. This has the effect of creating a temporary list which consumes memory.
Sometimes you might want to take the memory penalty, especially if you intend to iterate the list multiple times and the original data is not in memory.
In your particular example using the ToList() you actually end up iterating twice - once to build the list and a second time in your foreach. Depending on the size of the list and your application this may or may not be a concern.

If you are sure you'll use groupedItems only once, then using .ToList() has a single advantage: if there is an exception during the grouping (for example because your code is doing funny things when calculating .Ref), the exception will be thrown on the .ToList() line instead of inside the foreach. I don't think it is a big advantage (and perhaps it is a disadvantage).
To clarify:
public class MyClass
{
public string Ref
{
get
{
// sometimes I like to throw an exception!
if (DateTime.Now.Ticks % 10 == 0) throw new Exception();
return "Foo";
}
}
}
Note that you have explicitly tagged this question as IEnumerable, and in your example myItems is a List<>, so I won't discuss the difference between calling ToList() or not when you are reading data from a database through Entity Framework/LINQ to SQL.
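To make the exception-timing point concrete, here is a minimal sketch. The class name, the FailingKey helper, and the WhereDoesItThrow method are made up for illustration; FailingKey always throws, standing in for a Ref getter that sometimes fails.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExceptionTimingDemo
{
    // Hypothetical key selector that always fails (stand-in for a throwing Ref getter).
    private static string FailingKey(int x) => throw new InvalidOperationException("bad key");

    // Reports where the exception surfaced: "GroupBy", "ToList", "foreach" or "none".
    public static string WhereDoesItThrow(bool materialize)
    {
        var items = new List<int> { 1, 2, 3 };
        IEnumerable<IGrouping<string, int>> groups;

        // GroupBy only builds the query; the key selector is not called yet.
        try { groups = items.GroupBy(x => FailingKey(x)); }
        catch (InvalidOperationException) { return "GroupBy"; }

        if (materialize)
        {
            // ToList enumerates the query, so the grouping runs on this line.
            try { groups = groups.ToList(); }
            catch (InvalidOperationException) { return "ToList"; }
        }

        // Without ToList, the grouping runs when the foreach starts.
        try { foreach (var g in groups) { } }
        catch (InvalidOperationException) { return "foreach"; }

        return "none";
    }
}
```

Under these assumptions, WhereDoesItThrow(true) reports "ToList" and WhereDoesItThrow(false) reports "foreach"; the GroupBy call itself never throws.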

Is there any difference?
Yes, .ToList() creates a new list from iterating the grouped collection:
The ToList<TSource>(IEnumerable<TSource>) method forces immediate query evaluation and returns a List that contains the query results.
Whether this is noticeable should be benchmarked by you.
If you only iterate the grouped collection once, the .ToList() step is unnecessary and will be relatively slower than directly enumerating the GroupBy() result.

If you use the IEnumerable rather than the List, then each time you enumerate it the GroupBy expression is re-evaluated against your myItems list.
This means that if you add another item to myItems and then enumerate groupedItems again, that item will be included in the grouping.
When you call ToList, however, you create a new List, and later changes to myItems will not be reflected in groupedItems.
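A small sketch of that difference, using a list of strings instead of MyClass to keep it short (the class and method names here are illustrative only):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredGroupingDemo
{
    // Returns how many groups the deferred query and the ToList() snapshot
    // contain after one extra item has been added to the source list.
    public static (int Deferred, int Snapshot) CountGroupsAfterAdd()
    {
        var myItems = new List<string> { "a", "a", "b" };

        // Deferred: nothing is grouped yet; the query just remembers myItems.
        IEnumerable<IGrouping<string, string>> deferred = myItems.GroupBy(s => s);

        // Snapshot: the grouping runs now and the groups are copied into a list.
        List<IGrouping<string, string>> snapshot = myItems.GroupBy(s => s).ToList();

        myItems.Add("c"); // modify the source afterwards

        // deferred.Count() runs the grouping now and sees "c"; snapshot does not.
        return (deferred.Count(), snapshot.Count);
    }
}
```

Here the deferred query reports three groups (a, b, c) while the snapshot still holds the two groups that existed when ToList() was called.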

Related

Is the IEnumerable<T> function with a yield more efficient than the List<T> function?

I am coding a C# forms application, and would like to know if the following two functions achieve the same result:
public List<object> Method1(int parentId)
{
List<object> allChildren = new List<object>();
foreach (var item in list.Where(c => c.parentHtmlNodeForeignKey == parentId))
{
allChildren.Add(item);
allChildren.AddRange(Method1(item.id));
}
return allChildren;
}
public IEnumerable<object> Method2(int parentId)
{
foreach (var item in list.Where(c => c.parentHtmlNodeForeignKey == parentId))
{
yield return item;
foreach (var itemy in Method2(item.id))
{
yield return itemy;
}
}
}
Am I correct in saying that the Method1 function is more efficient than the Method2?
Also, can either of the above functions be coded to be more efficient?
EDIT
I am using the function to return some objects that are then displayed in a ListView. I am then looping through these same objects to check if a string occurs.
Thanks.
This depends heavily on what you want to do. For example, if you use FirstOrDefault(p => ....), the yield method can be faster because it doesn't have to store everything in a list; if the first element is the right one, the list method has done unnecessary work. (Of course the yield method also has overhead, but as I said, it depends.)
If you want to iterate over and over again over the data then you should go with the list.
It depends on lots of things.
Here are some reasons to use IEnumerable<T> over List<T>:
When you are iterating a part of a collection (e.g. using FirstOrDefault, Any, Take etc.).
When you have a very large (or unbounded) collection and you cannot ToList() it (e.g. the Fibonacci series).
When you shouldn't use IEnumerable<T> over List<T>:
When you are enumerating a DB query multiple times with different conditions (You may want the results in memory).
When you want to iterate the whole collection more than once - There is no need to create iterators each time.
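As an illustration of the first two points, here is a sketch of an unbounded Fibonacci iterator that is only ever consumed partially. The names are invented for this example.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LazySequenceDemo
{
    // An endless Fibonacci sequence. Calling ToList() directly on it would
    // never complete, so it only makes sense as a lazy IEnumerable.
    public static IEnumerable<long> Fibonacci()
    {
        long a = 0, b = 1;
        while (true)
        {
            yield return a;
            (a, b) = (b, unchecked(a + b)); // wraps on overflow instead of throwing
        }
    }

    // Take(10) stops the iterator after ten values, so ToList() is safe here.
    public static List<long> FirstTen() => Fibonacci().Take(10).ToList();

    // First(...) stops as soon as a match is found.
    public static long FirstAbove(long limit) => Fibonacci().First(n => n > limit);
}
```

FirstTen() yields 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, and FirstAbove(100) returns 144; in both cases only the needed prefix of the sequence is ever computed.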

ToList method in Linq

If I am not wrong, the ToList() method iterates over each element of the provided collection, adds them to a new List instance, and returns that instance. Consider this example:
//using linq
list = Students.Where(s => s.Name == "ABC").ToList();
//traditional way
foreach (var student in Students)
{
if (student.Name == "ABC")
list.Add(student);
}
I think the traditional way is faster, as it loops only once, whereas the LINQ version above iterates twice: once for the Where method and then again for the ToList() method.
The project I am working on makes extensive use of Lists, and I see a lot of ToList() calls and similar methods like the above. These could perhaps be improved if I declared the list variable as IEnumerable, removed .ToList(), and used it further as an IEnumerable.
Do these things make any impact on performance?
Do these things make any impact on performance?
That depends on your code. Most of the time, using LINQ does cause a small performance hit. In some cases, this hit can be significant for you, but you should avoid LINQ only when you know that it is too slow for you (i.e. if profiling your code showed that LINQ is reason why your code is slow).
But you're right that using ToList() too often can cause significant performance problems. You should call ToList() only when you have to. Be aware that there are also cases where adding ToList() can improve performance a lot (e.g. when the collection is loaded from database every time it's iterated).
Regarding the number of iterations: it depends on what exactly you mean by “iterates twice”. If you count the number of times MoveNext() is called on some collection, then yes, using Where() this way leads to iterating twice. The sequence of operations goes like this (to simplify, I'm going to assume that all items match the condition):
Where() is called, no iteration for now, Where() returns a special enumerable.
ToList() is called, calling MoveNext() on the enumerable returned from Where().
Where() now calls MoveNext() on the original collection and gets the value.
Where() calls your predicate, which returns true.
MoveNext() called from ToList() returns, ToList() gets the value and adds it to the list.
…
What this means is that if all n items in the original collection match the condition, MoveNext() will be called 2n times, n times from Where() and n times from ToList().
var list = Students.Where(s=>s.Name == "ABC");
This only creates a query; it does not loop over the elements until the query is used. Calling ToList() then executes the query, and so loops over your elements only once.
List<Student> studentList = new List<Student>();
var list = Students.Where(s=>s.Name == "ABC");
foreach(Student s in list)
{
studentList.Add(s);
}
This example will also only iterate once, because the query is only used once. Keep in mind that list will iterate over all students every time it is enumerated, not just those whose name is ABC, since it is a query.
For the later discussion I've made a test example. Perhaps it's not the very best implementation of IEnumerable, but it does what it's supposed to do.
First we have our list
public class TestList<T> : IEnumerable<T>
{
private TestEnumerator<T> _Enumerator;
public TestList()
{
_Enumerator = new TestEnumerator<T>();
}
public IEnumerator<T> GetEnumerator()
{
_Enumerator.Reset(); // the enumerator is shared, so start from the beginning each time
return _Enumerator;
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
throw new NotImplementedException();
}
internal void Add(T p)
{
_Enumerator.Add(p);
}
}
And since we want to count how many times MoveNext is called, we have to implement our own enumerator as well. Note that MoveNext increments a counter that is a static field in our program.
public class TestEnumerator<T> : IEnumerator<T>
{
public Item<T> FirstItem = null;
public Item<T> CurrentItem = null;
public TestEnumerator()
{
}
public T Current
{
get { return CurrentItem.Value; }
}
public void Dispose()
{
}
object System.Collections.IEnumerator.Current
{
get { throw new NotImplementedException(); }
}
public bool MoveNext()
{
Program.Counter++;
if (CurrentItem == null)
{
CurrentItem = FirstItem;
return CurrentItem != null; // false when the list is empty
}
if (CurrentItem != null && CurrentItem.NextItem != null)
{
CurrentItem = CurrentItem.NextItem;
return true;
}
return false;
}
public void Reset()
{
CurrentItem = null;
}
internal void Add(T p)
{
if (FirstItem == null)
{
FirstItem = new Item<T>(p);
return;
}
Item<T> lastItem = FirstItem;
while (lastItem.NextItem != null)
{
lastItem = lastItem.NextItem;
}
lastItem.NextItem = new Item<T>(p);
}
}
And then we have a custom item that just wraps our value
public class Item<T>
{
public Item(T item)
{
Value = item;
}
public T Value;
public Item<T> NextItem;
}
To use the actual code we create a "list" with 3 entries.
public static int Counter = 0;
static void Main(string[] args)
{
TestList<int> list = new TestList<int>();
list.Add(1);
list.Add(2);
list.Add(3);
var v1 = list.Where(c => c == 2).ToList(); //will use MoveNext 4 times
var v2 = list.Where(c => true).ToList(); //will also use MoveNext 4 times
List<int> tmpList = new List<int>(); //And the loop in OP question
foreach(var i in list)
{
tmpList.Add(i);
} //Also 4 times.
}
And the conclusion? How does it affect performance?
MoveNext is called n+1 times in this case, regardless of how many items we have.
The Where clause does not matter either: MoveNext still runs 4 times, because the query always runs over the whole initial list.
The only performance hit we take is the overhead of the LINQ framework and its calls. The number of loop iterations is the same.
And before anyone asks why it is n+1 times and not n times: MoveNext returns false on the final call, when it has run out of elements. That makes it the number of elements plus one call for the end of the list.
To answer this completely, it depends on the implementation. If you are talking about LINQ to SQL/EF, there will be only one iteration in this case when .ToList is called, which internally calls .GetEnumerator. The query expression is then parsed into TSQL and passed to the database. The resulting rows are then iterated over (once) and added to the list.
In the case of LINQ to Objects, there is only one pass through the data as well. The use of yield return in the Where clause sets up a state machine internally which keeps track of where the process is in the iteration. Where does NOT do a full iteration, create a temporary list, and then pass those results to the rest of the query. It just determines whether an item meets a criterion and only passes on those that match.
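The streaming behaviour can be sketched with a hand-written filter. MyWhere below is an illustrative stand-in and not the framework's actual source; the log parameter exists only to record the order of events.

```csharp
using System;
using System.Collections.Generic;

public static class StreamingWhereDemo
{
    // A simplified Where built on yield return. Each matching item is handed
    // to the caller before the next source item is examined.
    public static IEnumerable<T> MyWhere<T>(
        IEnumerable<T> source, Func<T, bool> predicate, List<string> log)
    {
        foreach (T item in source)
        {
            log.Add("test " + item);
            if (predicate(item))
                yield return item;
        }
    }

    // Consumes the filter and records when each item is tested and when it is received.
    public static List<string> Trace()
    {
        var log = new List<string>();
        foreach (int n in MyWhere(new[] { 1, 2, 3, 4 }, x => x % 2 == 0, log))
        {
            log.Add("add " + n);
        }
        return log;
    }
}
```

Trace() returns "test 1", "test 2", "add 2", "test 3", "test 4", "add 4": testing and consuming are interleaved in a single pass, with no intermediate list.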
First of all, why are you even asking? Measure for yourself and see.
That said, Where, Select, OrderBy and the other LINQ IEnumerable extension methods, in general, are implemented as lazy as possible (the yield keyword is used often). That means that they do not work on the data unless they have to. From your example:
var list = Students.Where(s => s.Name == "ABC");
won't execute anything. This will return momentarily even if Students is a list of 10 million objects. The predicate won't be called at all until the result is actually requested somewhere, and that is practically what ToList() does: It says "Yes, the results - all of them - are required immediately".
There is however, some initial overhead in calling of the LINQ methods, so the traditional way will, in general, be faster, but composability and the ease-of-use of the LINQ methods, IMHO, more than compensate for that.
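One way to check the laziness claim yourself is to count predicate calls. This is a sketch with made-up names and a plain list of strings in place of Students:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredWhereDemo
{
    // Returns how often the predicate had run before and after ToList().
    public static (int BeforeToList, int AfterToList) PredicateCalls()
    {
        var students = new List<string> { "ABC", "DEF", "ABC" };
        int calls = 0;

        // Building the query does not invoke the predicate.
        var query = students.Where(s => { calls++; return s == "ABC"; });
        int before = calls;

        // ToList() pulls every element through the predicate exactly once.
        var list = query.ToList();
        int after = calls;

        return (before, after);
    }
}
```

With three students, the counts come out as 0 before ToList() and 3 after it.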
If you like to take a look at how these methods are implemented, they are available for reference from Microsoft Reference Sources.

C# foreach on IEnumerable vs. List - element modification persistent only for array - Why?

In C#, I have noticed that if I am running a foreach loop on a LINQ generated IEnumerable<T> collection and try to modify the contents of each T element, my modifications are not persistent.
On the other hand, if I apply the ToArray() or ToList() method when creating my collection, modification of the individual elements in the foreach loop are persistent.
I suspect that this is in some way related to deferred execution, but exactly how is not entirely obvious to me. I would really appreciate an explanation to this difference in behavior.
Here is some example code - I have a class MyClass with a constructor and auto-implemented property:
public class MyClass
{
public MyClass(int val) { Str = val.ToString(); }
public string Str { get; set; }
}
In my example application I use LINQ Select() to create two collections of MyClass objects based on a collection of integers, one IEnumerable<MyClass>, and one IList<MyClass> by applying the ToList() method in the end.
var ints = Enumerable.Range(1, 10);
var myClassEnumerable = ints.Select(i => new MyClass(i));
var myClassArray = ints.Select(i => new MyClass(i)).ToList();
Next, I run a foreach loop over each of the collections, and modify the contents of the looped-over MyClass objects:
foreach (var obj in myClassEnumerable) obj.Str = "Something";
foreach (var obj in myClassArray) obj.Str = "Something else";
Finally, I output the Str member of the first element in each collection:
Console.WriteLine(myClassEnumerable.First().Str);
Console.WriteLine(myClassArray.First().Str);
Somewhat counter-intuitively, the output is:
1
Something else
Deferred execution is indeed the key point.
Executing myClassEnumerable.First().Str re-executes your query ints.Select(i => new MyClass(i)), so it gives you freshly created MyClass objects rather than the ones you modified in the foreach loop.
You can see this in action using your debugger. Put a breakpoint on the new MyClass(i) part of the Select and you will see that it gets hit again when Console.WriteLine executes.
You are right, it is deferred execution. A new MyClass instance is created each time you iterate the IEnumerable. By calling ToList or ToArray you then create a List or Array and populate it with the new MyClass instances created from the iteration of the IEnumerable.
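The same effect can be observed without a debugger by comparing object references. In this sketch, Box is a hypothetical stand-in for MyClass:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for MyClass.
public class Box
{
    public string Str { get; set; }
}

public static class FreshInstancesDemo
{
    // Compares the first element of two enumerations of the same source.
    public static (bool DeferredSame, bool ListSame) CompareFirstElements()
    {
        IEnumerable<Box> deferred =
            Enumerable.Range(1, 3).Select(i => new Box { Str = i.ToString() });
        List<Box> list = deferred.ToList();

        // Each First() call re-runs the Select and constructs a new Box.
        bool deferredSame = object.ReferenceEquals(deferred.First(), deferred.First());

        // The list stores the instances, so both calls return the same object.
        bool listSame = object.ReferenceEquals(list.First(), list.First());

        return (deferredSame, listSame);
    }
}
```

The deferred query gives a different object on each enumeration (false), while the list keeps returning the same stored instance (true), which is why modifications only stick in the ToList() case.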

Problems removing elements from a list when iterating through the list

I have a loop that iterates through elements in a list. I am required to remove elements from this list within the loop based on certain conditions. When I try to do this in C#, I get an exception. Apparently, it is not allowed to remove elements from the list which is being iterated through. The problem was observed with a foreach loop. Is there any standard way to get around this problem?
Note : One solution I could think of is to create a copy of the list solely for iteration purpose and to remove elements from the original list within the loop. I am looking for a better way of dealing with this.
When using List<T> the ToArray() method helps in this scenario vastly:
List<MyClass> items = new List<MyClass>();
foreach (MyClass item in items.ToArray())
{
if (/* condition */) items.Remove(item);
}
The alternative is to use a for loop instead of a foreach, but then you have to decrement the index variable whenever you remove an element i.e.
List<MyClass> items = new List<MyClass>();
for (int i = 0; i < items.Count; i++)
{
if (/* condition */)
{
items.RemoveAt(i);
i--;
}
}
If your list is an actual List<T> then you can use the built-in RemoveAll method to delete items based on a predicate:
int numberOfItemsRemoved = yourList.RemoveAll(x => ShouldThisItemBeDeleted(x));
You could use LINQ to replace the initial list by a new list by filtering out items:
IEnumerable<Foo> initialList = FetchList();
initialList = initialList.Where(x => SomeFilteringConditionOnElement(x));
// Now initialList will be filtered according to the condition
// The filtered elements will be subject to garbage collection
This way you don't have to worry about loops.
You can use integer indexing to remove items:
List<int> xs = new List<int> { 1, 2, 3, 4 };
for (int i = 0; i < xs.Count; ++i)
{
// Remove even numbers.
if (xs[i] % 2 == 0)
{
xs.RemoveAt(i);
--i;
}
}
This can be weird to read and tough to maintain, though, especially if the logic in the loop gets any more complex.
Another trick is to loop through the list backwards.. removing an item won't affect any of the items you are going to encounter in the rest of the loop.
I'm not recommending this or anything else though. Everything you need this for can probably be done using LINQ statements to filter the list on your requirements.
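For reference, a backwards loop might look like the following sketch (the method name and the even-number condition are just for illustration):

```csharp
using System.Collections.Generic;

public static class ReverseRemoveDemo
{
    // Removes even numbers in place by walking from the last index to the first.
    // Removing at index i only shifts elements after i, which were already visited,
    // so no index adjustment is needed.
    public static List<int> RemoveEvens(List<int> xs)
    {
        for (int i = xs.Count - 1; i >= 0; i--)
        {
            if (xs[i] % 2 == 0)
                xs.RemoveAt(i);
        }
        return xs;
    }
}
```

Compared with the forward loop, there is no --i correction to forget, and consecutive matches are handled without special care.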
You can iterate with foreach this way:
List<Customer> custList = Customer.Populate();
foreach (var cust in custList.ToList())
{
custList.Remove(cust);
}
Note: ToList() is called on the list variable, so the loop iterates over the copy created by ToList() while removing items from the original list.
Hope this helps.
The recommended solution is to put all the elements you want to remove in a separate list and, after the first loop, add a second loop that iterates over the remove-list and removes those elements from the first list.
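A minimal sketch of this two-pass approach, written as a generic helper (RemoveWhere and its parameter names are invented here):

```csharp
using System;
using System.Collections.Generic;

public static class TwoPassRemoveDemo
{
    // First pass: collect the items to remove. Second pass: remove them.
    // The list being enumerated is never modified during its own foreach.
    public static void RemoveWhere<T>(List<T> items, Func<T, bool> shouldRemove)
    {
        var toRemove = new List<T>();
        foreach (T item in items)
        {
            if (shouldRemove(item))
                toRemove.Add(item);
        }

        foreach (T item in toRemove)
        {
            items.Remove(item); // removes the first element equal to item
        }
    }
}
```

Each Remove call searches the list from the start, so this can get slow for large lists with many removals; for a plain List<T>, the built-in RemoveAll does the same job in one pass.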
The reason you get an error is that you're using a foreach loop. If you think about how a foreach loop works, this makes sense. The foreach loop calls the GetEnumerator method on the List. If you were to change the number of elements in the List, the enumerator the foreach loop holds would no longer match the list: after a removal it could read past the end, and after an insertion it could miss an item. List<T> guards against this by throwing an InvalidOperationException as soon as the loop continues after the modification.
If you like LINQ and lambda expressions I would recommend Darin Dimitrov's solution; otherwise I would use the solution provided by Chris Schmich.

Union two List in C#

I want to union (merge) two lists into one List that contains the items of both. This is my code; how can I define a list for this purpose?
if (e.CommandName == "AddtoSelected")
{
List<DetalleCita> lstAux = new List<DetalleCita>();
foreach (GridViewRow row in this.dgvEstudios.Rows)
{
var GridData = GetValues(row);
var GridData2 = GetValues(row);
IList AftList2 = GridData2.Values.Where(r => r != null).ToList();
AftList2.Cast<DetalleCita>();
chkEstudio = dgvEstudios.Rows[index].FindControl("ChkAsignar") as CheckBox;
if (chkEstudio.Checked)
{
IList AftList = GridData.Values.Where(r => r != null).ToList();
lstAux.Add(
new DetalleCita
{
codigoclase = Convert.ToInt32(AftList[0]),
nombreestudio = AftList[1].ToString(),
precioestudio = Convert.ToDouble(AftList[2]),
horacita = dt,
codigoestudio = AftList[4].ToString()
});
}
index++;
//this line to merge
lstAux.ToList().AddRange(AftList2);
}
dgvEstudios.DataSource = lstAux;
dgvEstudios.DataBind();
}
this is inside a rowcommand event.
If you want to add all entries from AftList2 to lstAux you should define AftList2 as IEnumerable<> with elements of type DetalleCita (being IEnumerable<DetalleCita> is enough to be used as parameter of AddRange() on List<DetalleCita>). For example like this:
var AftList2 = GridData2.Values.Where(r => r != null).Cast<DetalleCita>();
And then you can add all its elements to lstAux:
lstAux.AddRange(AftList2);
Clarification:
I think you are misunderstanding what the ToList() extension method does. It creates a new list from an IEnumerable<T>, and its result is not connected to the original IEnumerable<T> it was applied to.
That is why list.ToList().AddRange(...) does nothing useful: you copy list into another list (newly created by ToList()), update that copy, and then throw it away. Since you are not even doing something like list2 = var1.ToList(), the original var1 stays unchanged. You most likely want to store the result of ToList() if you are calling it.
Also, you don't usually need to convert one list to another list. ToList() is useful when you need a List<T> but have an IEnumerable<T>, either because it is not indexable and you need fast access by index, or because it is lazily evaluated and you need all results computed right now. Both situations can arise with the result of a LINQ to Objects query, for example: IEnumerable<int> ints = from i in anotherInts where i > 20 select i; Even if anotherInts was a List<int>, the query result ints cannot be cast to List<int>, because it is not a list but some other implementation of IEnumerable<int>. In this case you can use ToList() to get a list anyway: List<int> ints = (from i in anotherInts where i > 20 select i).ToList();
UPDATE:
If you really mean union semantics (e.g. for { 1, 2 } and { 1, 3 } the union would be { 1, 2, 3 }, with no duplication of equal elements from the two collections), consider switching to HashSet<T> (most likely available in your situation, since you are using C# 3.0 and I suppose you have a recent .NET Framework), or use the Union() extension method instead of AddRange. I don't think the latter is better than the first solution, and be careful, because it works more like ToList(): a.Union(b) returns a new collection and does NOT update either a or b.
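The three options can be compared side by side in a small sketch. It uses int elements rather than DetalleCita, and the class and method names are illustrative:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class UnionDemo
{
    // AddRange semantics: plain concatenation, duplicates are kept.
    public static List<int> Concatenate(List<int> a, List<int> b)
    {
        var result = new List<int>(a);
        result.AddRange(b);
        return result;
    }

    // Union() returns a new sequence without duplicates; a and b are untouched.
    public static List<int> UnionCopy(List<int> a, List<int> b)
    {
        return a.Union(b).ToList();
    }

    // HashSet<T>.UnionWith modifies the set itself and ignores duplicates.
    public static HashSet<int> UnionInPlace(HashSet<int> a, IEnumerable<int> b)
    {
        a.UnionWith(b);
        return a;
    }
}
```

For { 1, 2 } and { 1, 3 }, Concatenate gives 1, 2, 1, 3, while UnionCopy gives 1, 2, 3 and UnionInPlace leaves the set containing 1, 2 and 3. For a custom class such as DetalleCita, the set-based variants rely on Equals and GetHashCode (or a supplied comparer) to decide what counts as a duplicate.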
