Is the IEnumerable<T> function with a yield more efficient than the List<T> function? - c#

I am coding a C# forms application, and would like to know if the following two functions achieve the same result:
public List<object> Method1(int parentId)
{
    List<object> allChildren = new List<object>();
    foreach (var item in list.Where(c => c.parentHtmlNodeForeignKey == parentId))
    {
        allChildren.Add(item);
        allChildren.AddRange(Method1(item.id));
    }
    return allChildren;
}
public IEnumerable<object> Method2(int parentId)
{
    foreach (var item in list.Where(c => c.parentHtmlNodeForeignKey == parentId))
    {
        yield return item;
        foreach (var itemy in Method2(item.id))
        {
            yield return itemy;
        }
    }
}
Am I correct in saying that the Method1 function is more efficient than the Method2?
Also, can either of the above functions be coded to be more efficient?
EDIT
I am using the function to return some objects that are then displayed in a ListView. I am then looping through these same objects to check if a string occurs.
Thanks.

This depends heavily on what you want to do. For example, if you use FirstOrDefault(p => ....), the yield method can be faster because it does not have to store everything in a list first; if the first element is the right one, the list method has built the whole list for nothing. (Of course, the yield method has overhead too, but as I said, it depends.)
If you want to iterate over the data again and again, you should go with the list.

It depends on lots of things.
Here are some reasons to use IEnumerable<T> over List<T>:
When you are iterating a part of a collection (e.g. using FirstOrDefault, Any, Take etc.).
When you have a large (or unbounded) collection and you can't ToList() it (e.g. the Fibonacci series).
When you shouldn't use IEnumerable<T> over List<T>:
When you are enumerating a DB query multiple times with different conditions (You may want the results in memory).
When you want to iterate the whole collection more than once - There is no need to create iterators each time.
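To make the Fibonacci case above concrete, here is a minimal sketch. The class and method names (LazySequences, Fibonacci, FirstAbove, FirstTen) are made up for illustration. The sequence never ends, so it can only be consumed lazily, through operators such as First or Take that stop early.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazySequences
{
    // Hypothetical helper: an endless Fibonacci sequence produced on demand.
    public static IEnumerable<long> Fibonacci()
    {
        long previous = 0, current = 1;
        while (true)
        {
            yield return previous;
            long next = previous + current;
            previous = current;
            current = next;
        }
    }

    // Only as many values as needed are computed. Calling ToList()
    // directly on Fibonacci() would never finish.
    public static long FirstAbove(long threshold)
    {
        return Fibonacci().First(value => value > threshold);
    }

    // Take(10) bounds the sequence, so ToList() is safe here.
    public static List<long> FirstTen()
    {
        return Fibonacci().Take(10).ToList();
    }
}
```

A List<T>-returning version of Fibonacci() could not be written at all without deciding the length up front, which is the trade-off the list above describes.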

Related

Which is more efficient in this case? LINQ Query or FOREACH loop?

In my project, I implemented a service class which has a function named GetList(), which is as follows:
IList<SUB_HEAD> GetList(string u)
{
var collection = (from s in context.DB.SUB_HEAD where (s.head_code.Equals(u))
select s);
return collection.ToList();
}
which can also be implemented as
ArrayList unitlist = new ArrayList();
ObjectSet<SUB_HEAD> List = subheadService.GetAll();
foreach (SUB_HEAD unit in List)
{
    unitlist.Add(unit.sub_head_code);
}
The purpose of doing this is to populate a drop-down menu.
My question is: which of the above methods is more efficient in terms of processing? My project has a lot of places where I need a drop-down menu.
Please, just use the LINQ version. You can perform optimizations later if you profile and determine this is too slow (by the way, it won't be). Also, you can use the functional-style LINQ to make a single expression that I think reads better.
IList<SUB_HEAD> GetList(string u)
{
return context.DB.SUB_HEAD.Where(s => s.head_code == u).ToList();
}
The ToList() method is going to do exactly the same thing as you're doing manually. The implementation in the .NET framework looks something like this:
public static class Enumerable
{
public static List<T> ToList<T>(this IEnumerable<T> source)
{
var list = new List<T>();
foreach (var item in source)
{
list.Add(item);
}
return list;
}
}
If you can express these 4 lines of code with the characters "ToList()" then you should do so. Code duplication is bad, even when it's for something this simple.

ToList method in Linq

If I am not wrong, the ToList() method iterates over each element of the provided collection, adds it to a new List instance, and returns that instance. Consider this example:
//using linq
list = Students.Where(s => s.Name == "ABC").ToList();
//traditional way
foreach (var student in Students)
{
if (student.Name == "ABC")
list.Add(student);
}
I think the traditional way is faster, as it loops only once, whereas the LINQ version above iterates twice: once for the Where method and then again for the ToList() method.
The project I am working on now makes extensive use of Lists, and I see a lot of this kind of use of ToList() and other methods. These could be improved, as above, if I declared the list variable as IEnumerable, removed .ToList(), and used it as an IEnumerable from then on.
Do these things make any impact on performance?
Do these things make any impact on performance?
That depends on your code. Most of the time, using LINQ does cause a small performance hit. In some cases this hit can be significant, but you should avoid LINQ only when you know that it is too slow for you (i.e. if profiling your code showed that LINQ is the reason your code is slow).
But you're right that using ToList() too often can cause significant performance problems. You should call ToList() only when you have to. Be aware that there are also cases where adding ToList() can improve performance a lot (e.g. when the collection is loaded from database every time it's iterated).
Regarding the number of iterations: it depends on what exactly you mean by “iterates twice”. If you count the number of times MoveNext() is called on some collection, then yes, using Where() this way leads to iterating twice. The sequence of operations goes like this (to simplify, I'm going to assume that all items match the condition):
Where() is called, no iteration for now, Where() returns a special enumerable.
ToList() is called, calling MoveNext() on the enumerable returned from Where().
Where() now calls MoveNext() on the original collection and gets the value.
Where() calls your predicate, which returns true.
MoveNext() called from ToList() returns, ToList() gets the value and adds it to the list.
…
What this means is that if all n items in the original collection match the condition, MoveNext() will be called 2n times, n times from Where() and n times from ToList().
var list = Students.Where(s=>s.Name == "ABC");
This only creates a query; it does not loop over the elements until the query is used. Calling ToList() then executes the query, and so loops over your elements only once.
List<Student> studentList = new List<Student>();
var list = Students.Where(s => s.Name == "ABC");
foreach (Student s in list)
{
    studentList.Add(s);
}
This example also iterates only once, because the query is used only once. Keep in mind that list will iterate over all students every time it is enumerated, not just those whose names are ABC, since it is a query rather than a stored result.
For the discussion that follows, I've made a test example. Perhaps it's not the very best implementation of IEnumerable, but it does what it's supposed to do.
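The point about a query re-running each time it is enumerated can be checked with a small sketch. The names here (ReEnumerationDemo, CountPredicateCalls) are invented for illustration, and the three-element list stands in for Students.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReEnumerationDemo
{
    // Returns how many times the predicate ran when the same data is
    // looped over twice: first as a live query, then as a cached list.
    public static (int withoutCache, int withCache) CountPredicateCalls()
    {
        var names = new List<string> { "ABC", "DEF", "ABC" };

        int calls = 0;
        IEnumerable<string> query = names.Where(n => { calls++; return n == "ABC"; });

        // Each foreach re-runs the query over all three names.
        foreach (string name in query) { }
        foreach (string name in query) { }
        int withoutCache = calls;

        calls = 0;
        List<string> cached = query.ToList(); // the query runs once, here
        foreach (string name in cached) { }
        foreach (string name in cached) { }
        int withCache = calls;

        return (withoutCache, withCache);
    }
}
```

With three names, the live query tests six items across the two loops, while the cached version tests three. Whether that difference matters depends on how expensive the source and the predicate are.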
First we have our list
public class TestList<T> : IEnumerable<T>
{
private TestEnumerator<T> _Enumerator;
public TestList()
{
_Enumerator = new TestEnumerator<T>();
}
public IEnumerator<T> GetEnumerator()
{
_Enumerator.Reset(); // the enumerator is shared, so rewind it before every enumeration
return _Enumerator;
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
throw new NotImplementedException();
}
internal void Add(T p)
{
_Enumerator.Add(p);
}
}
And since we want to count how many times MoveNext is called, we have to implement our own enumerator as well. Note that MoveNext increments a counter that is a static field in our program.
public class TestEnumerator<T> : IEnumerator<T>
{
public Item<T> FirstItem = null;
public Item<T> CurrentItem = null;
public TestEnumerator()
{
}
public T Current
{
get { return CurrentItem.Value; }
}
public void Dispose()
{
}
object System.Collections.IEnumerator.Current
{
get { throw new NotImplementedException(); }
}
public bool MoveNext()
{
Program.Counter++;
if (CurrentItem == null)
{
CurrentItem = FirstItem;
return true;
}
if (CurrentItem != null && CurrentItem.NextItem != null)
{
CurrentItem = CurrentItem.NextItem;
return true;
}
return false;
}
public void Reset()
{
CurrentItem = null;
}
internal void Add(T p)
{
if (FirstItem == null)
{
FirstItem = new Item<T>(p);
return;
}
Item<T> lastItem = FirstItem;
while (lastItem.NextItem != null)
{
lastItem = lastItem.NextItem;
}
lastItem.NextItem = new Item<T>(p);
}
}
And then we have a custom item that just wraps our value
public class Item<T>
{
public Item(T item)
{
Value = item;
}
public T Value;
public Item<T> NextItem;
}
To use the actual code we create a "list" with 3 entries.
public static int Counter = 0;
static void Main(string[] args)
{
TestList<int> list = new TestList<int>();
list.Add(1);
list.Add(2);
list.Add(3);
var v1 = list.Where(c => c == 2).ToList(); //will use movenext 4 times
var v2 = list.Where(c => true).ToList(); //will also use movenext 4 times
List<int> tmpList = new List<int>(); //And the loop in OP question
foreach(var i in list)
{
tmpList.Add(i);
} //Also 4 times.
}
And the conclusion? How does it affect performance?
MoveNext is called n+1 times in this case, however many items we have.
The Where clause does not matter either: it will still run MoveNext 4 times, because we always run our query against the initial list.
The only performance hit we take is the overhead of the LINQ framework and its calls. The actual loops performed are the same.
And before anyone asks why it's n+1 times and not n times: MoveNext returns false the last time, when it has run out of elements. That makes one call per element plus one for the end of the list.
To answer this completely, it depends on the implementation. If you are talking about LINQ to SQL/EF, there will be only one iteration in this case when .ToList is called, which internally calls .GetEnumerator. The query expression is then parsed into TSQL and passed to the database. The resulting rows are then iterated over (once) and added to the list.
In the case of LINQ to Objects, there is only one pass through the data as well. The use of yield return in the where clause sets up a state machine internally which keeps track of where the process is in the iteration. Where does NOT do a full iteration creating a temporary list and then passing those results to the rest of the query. It just determines whether an item meets the criteria and only passes on those that match.
First of all, why are you even asking me? Measure for yourself and see.
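As a rough illustration of that single pass, here is a simplified stand-in for Where written with yield return. MyWhere and Trace are hypothetical names, and this is a sketch of the idea rather than the framework's actual source.

```csharp
using System;
using System.Collections.Generic;

public static class WhereSketch
{
    // Hypothetical MyWhere: filters lazily, one item at a time.
    public static IEnumerable<T> MyWhere<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
            {
                yield return item; // hand one match to the caller, then pause here
            }
        }
    }

    // Records the order in which items are tested and consumed.
    public static List<string> Trace()
    {
        var log = new List<string>();
        var numbers = new[] { 1, 2, 3, 4 };
        foreach (int even in MyWhere(numbers, n => { log.Add("test " + n); return n % 2 == 0; }))
        {
            log.Add("got " + even);
        }
        return log;
    }
}
```

The log interleaves "test" and "got" entries ("got 2" appears before "test 3"), which shows that the filter and the consumer share one pass over the source and that no temporary list is built in between.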
That said, Where, Select, OrderBy and the other LINQ IEnumerable extension methods, in general, are implemented as lazy as possible (the yield keyword is used often). That means that they do not work on the data unless they have to. From your example:
var list = Students.Where(s => s.Name == "ABC");
won't execute anything. This will return almost immediately even if Students is a list of 10 million objects. The predicate won't be called at all until the result is actually requested somewhere, and that is practically what ToList() does: It says "Yes, the results - all of them - are required immediately".
There is however, some initial overhead in calling of the LINQ methods, so the traditional way will, in general, be faster, but composability and the ease-of-use of the LINQ methods, IMHO, more than compensate for that.
If you like to take a look at how these methods are implemented, they are available for reference from Microsoft Reference Sources.

LINQ's ForEach on HashSet?

I am curious as to what restrictions led to the design decision that HashSets cannot use LINQ's ForEach query.
What's really going on differently behind the scenes for these two implementations:
var myHashSet = new HashSet<T>();
foreach (var item in myHashSet) { item.Stuff(); }
vs
var myHashSet = new HashSet<T>();
myHashSet.ForEach(item => item.Stuff());
I'm (pretty) sure that this is just because HashSet does not implement IEnumerable -- but what is a normal ForEach loop doing differently that makes it more supported by a HashSet?
Thanks
LINQ doesn't have ForEach. Only the List<T> class has a ForEach method.
It's also important to note that HashSet does implement IEnumerable<T>.
Remember, LINQ stands for Language INtegrated Query. It is meant to query collections of data. ForEach has nothing to do with querying. It simply loops over the data. Therefore it really doesn't belong in LINQ.
LINQ is meant to query data, I'm guessing it avoided ForEach() because there's a chance it could mutate data that would affect the way the data could be queried (i.e. if you changed a field that affected the hash code or equality).
You may be confused with the fact that List<T> has a ForEach()?
It's easy enough to write one, of course, but it should be used with caution because of those aforementioned concerns...
public static class EnumerableExtensions
{
public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
{
if (source == null) throw new ArgumentNullException("source");
if (action == null) throw new ArgumentNullException("action");
foreach(var item in source)
{
action(item);
}
}
}
var myHashSet = new HashSet<T>();
myHashSet.ToList().ForEach( x => x.Stuff() );
The first uses HashSet's GetEnumerator method.
The second uses the ForEach method.
Maybe the second uses GetEnumerator behind the scenes, but I'm not sure.

Retrieve IEnumerable's method parameters

Consider this method:
public IEnumerable<T> GetList(int Count)
{
foreach (var X in Y)
{
// do a lot of expensive stuff here (e.g. traverse a large HTML document)
// when we reach "Count" then exit out of the foreach loop
}
}
and I'd call it like so: Class.GetList(10); which would return 10 items - this is fine.
I'd like to use IEnumerable's Take() method, calling Class.GetList().Take(10) instead, and I want the foreach loop to be able to somehow grab the number I passed into Take(). Is this possible? It seems a cleaner approach, and ultimately a less expensive one, as I can grab a full list with Class.GetList() once and then use IEnumerable's methods to grab x items afterwards.
Thanks guys!
You can do this by implementing your method as an iterator block:
public IEnumerable<T> GetList()
{
foreach (var X in Y)
{
// do something
yield return something;
}
}
When you call Take(10) only the first 10 elements will be yielded - the rest of the method won't be run.
In your GetList method use yield return. This way you will be able to use Take(10) without the GetList method returning the entire result first.
No, this is not possible. However, what Take(10) does is that it just gets 10 items of the Enumerable. What you are trying to accomplish with the Count check, the Take() does this for you because it will simply stop getting items from your enumerator when it has 10 items.
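The effect described in these answers can be observed with a counter. This is a sketch under assumptions: TakeDemo, ExpensiveSteps and FirstThree are made-up names, and the loop over 1000 integers stands in for the expensive traversal in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TakeDemo
{
    // Counts how many loop bodies actually ran.
    public static int ExpensiveSteps;

    // Hypothetical stand-in for the question's GetList(): no Count parameter needed.
    public static IEnumerable<int> GetList()
    {
        for (int i = 0; i < 1000; i++)
        {
            ExpensiveSteps++; // pretend this is the expensive work
            yield return i;
        }
    }

    // Take(3) stops pulling from the iterator once it has three items.
    public static List<int> FirstThree()
    {
        ExpensiveSteps = 0;
        return GetList().Take(3).ToList();
    }
}
```

After FirstThree() returns, ExpensiveSteps is a handful rather than 1000: the iterator is abandoned as soon as Take has what it needs, so the method never has to know the number passed to Take.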

Pushing Items into stack with LINQ

How can I programmatically push an array of strings onto a generic Stack?
string array
string[] array=new string[]{"Liza","Ana","Sandra","Diya"};
Stack Setup
public class stack<T>
{
private int index;
List<T> list;
public stack()
{
list = new List<T>();
index=-1;
}
public void Push(T obj)
{
list.Add(obj);
index++;
}
...........
}
What change do I need to make here?
stack<string> slist = new stack<string>();
var v = from vals in array select (p => slist.Push(p));
Error Report :
The type of the expression in the select clause is incorrect.
LINQ is a query language/framework. What you want to perform here is a modification to a collection object rather than a query (selection) - this is certainly not what LINQ is designed for (or even capable of).
What you might like to do, however, is define an extension method for the Stack<T> class. Note that it also makes sense here to use the BCL Stack<T> type, which is exactly what you need, instead of reinventing the wheel using List<T>.
public static void PushRange<T>(this Stack<T> source, IEnumerable<T> collection)
{
foreach (var item in collection)
source.Push(item);
}
Which would then allow you to do the following:
myStack.PushRange(myCollection);
And if you're not already convinced, another philosophical reason: LINQ was created to bring functional paradigms to C#/.NET, and at the core of functional programming is side-effect free code. Combining LINQ with state-modifying code would thus be quite inconsistent.
The first issue is that your Push returns void. Select expects a value of some kind.
You are just doing a loop and don't need to use LINQ.
Since your stack internally stores a list, you can create the list by passing it an array.
So in your case
List<string> myList = new List<string>(array);
creates the list.
Change
public void Push(T obj)
to
public T Push(T obj)
and ignore the return values.
Disclaimer: I would not recommend mutation like this.
Try this
string[] arr = new string[]{"a","f"};
var stack = new Stack<string>();
arr.ToList().ForEach(stack.Push);
While this is "cool", it isn't any better than a for loop.
Push needs a return type for you to be able to use it in a select clause. As it is, it returns void. Your example is, I think, a horrible abuse of LINQ. Even if it worked, you'd be using a side-effect of the function in the select clause to accomplish something totally unrelated to the task that the select is intended for. If there was a "ForEach" extension, then it would be reasonable to use LINQ here, but I'd avoid it and stick with a foreach loop.
foreach (var val in array)
{
slist.Push(val);
}
This is much clearer in intent and doesn't leave one scratching their head over what you're trying to accomplish.
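For comparison, here is a small sketch that puts the plain loop next to one alternative not mentioned above: the BCL Stack<T> constructor that accepts an IEnumerable<T>. StackDemo and its method names are invented for illustration, and this applies to the BCL Stack<T>, not to the custom stack<T> class from the question.

```csharp
using System;
using System.Collections.Generic;

public static class StackDemo
{
    // The loop from the answer above, using the BCL Stack<T>.
    public static Stack<string> BuildWithLoop(string[] names)
    {
        var stack = new Stack<string>();
        foreach (string name in names)
        {
            stack.Push(name);
        }
        return stack;
    }

    // Stack<T> can also be constructed from any IEnumerable<T>;
    // the elements are pushed in the order they are read.
    public static Stack<string> BuildWithConstructor(string[] names)
    {
        return new Stack<string>(names);
    }
}
```

Both versions leave the last array element on top of the stack, so for the array in the question Peek() returns "Diya" either way.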
