Which is more efficient in conditional looping? - C#

Suppose I have the following collection:
IEnumerable<car> cars = new List<car>();
Now I need to loop over this collection.
I need to call a different function depending on the car's color, so I can do it in one of the following ways:
Method A:
foreach (var item in cars)
{
    if (item.color == white)
    {
        doSomething();
    }
    else
    {
        doSomeOtherThing();
    }
}
or the other way:
Method B:
foreach (var item in cars.Where(c => c.color == white))
{
    doSomething();
}
foreach (var item in cars.Where(c => c.color != white))
{
    doSomeOtherThing();
}
To me Method A seems better because I loop over the collection only once,
while Method B is enticing because the framework does the looping and filtering for you.
So which method is better and faster?

Well, it depends on how complicated the filtering process is. It may be so insanely efficient that it's irrelevant, especially in light of the fact that you're no longer having to do your own filtering with the if statement.
I'll say one thing: unless your collections are massive, it probably won't make enough of a difference to care. And, sometimes, it's better to optimise for readability rather than speed :-)
But, if you really want to know, you measure! Time the operations in your environment with suitable production-like test data. That's the only way to be certain.
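For example, a minimal Stopwatch harness along these lines would do (a sketch only: the Car class, the string colors, and the no-op methods are hypothetical stand-ins for your real types):

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Car { public string Color { get; set; } }

class Program
{
    static void DoSomething() { }
    static void DoSomeOtherThing() { }

    static void Main()
    {
        // Hypothetical test data; size it like your production collection.
        List<Car> cars = Enumerable.Range(0, 1000000)
            .Select(i => new Car { Color = i % 2 == 0 ? "white" : "red" })
            .ToList();

        Stopwatch sw = Stopwatch.StartNew();
        // Method A: one pass, branch inside the loop.
        foreach (var item in cars)
        {
            if (item.Color == "white") DoSomething();
            else DoSomeOtherThing();
        }
        Console.WriteLine("Method A: {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        // Method B: two passes, each filtered by Where.
        foreach (var item in cars.Where(c => c.Color == "white")) DoSomething();
        foreach (var item in cars.Where(c => c.Color != "white")) DoSomeOtherThing();
        Console.WriteLine("Method B: {0} ms", sw.ElapsedMilliseconds);
    }
}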

Method A is more readable than Method B. Just one question: is it car.color or item.color?

Related

Difference between using where/lambda and for each

I'm new to C# and came across this function performed on a dictionary.
_objDictionary.Keys.Where(a => (a is fooObject)).ToList().ForEach(a => ((fooObject)a).LaunchMissles());
My understanding is that this essentially puts every key that is a fooObject into a list, then performs the LaunchMissles function of each. How is that different from using a foreach loop like this?
foreach(var entry in _objDictionary.Keys)
{
if (entry is fooObject)
{
entry.LaunchMissles();
}
}
EDIT: The resounding opinion appears to be that there is no functional difference.
This is a good example of abusing LINQ - the statement did not become more readable or better in any other way; some people just like to put LINQ everywhere. Though in this case you might take the best from both worlds by doing:
foreach(var entry in _objDictionary.Keys.OfType<FooObject>())
{
entry.LaunchMissles();
}
Note that in your foreach example you are missing a cast to FooObject to invoke LaunchMissles.
In general, LINQ is no voodoo magic; it does the same stuff under the hood that you would need to write yourself if you weren't using it. LINQ just makes things easier to write, but it won't beat regular code performance-wise (if the code really is equivalent).
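To illustrate: a Where clause boils down to roughly this kind of lazy iterator (a simplified sketch; the real Enumerable.Where adds argument checks and some optimizations):

using System;
using System.Collections.Generic;

static class MyEnumerable
{
    // Roughly what a Where clause does under the hood:
    // a lazy iterator that tests each item as it is requested.
    public static IEnumerable<T> MyWhere<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
            if (predicate(item))
                yield return item;
    }
}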
In your case, your "oldschool" approach is perfectly fine and, in my opinion, the preferable one:
foreach(var entry in _objDictionary.Keys)
{
fooObject foo = entry as fooObject;
if (foo != null)
{
foo.LaunchMissles();
}
}
Regarding the LINQ approach:
Materializing the sequence into a List just to call a method on it that does the same as the code above is simply wasting resources and making the code less readable.
In your example it doesn't make a difference, but if the source weren't a collection (like Dictionary.Keys is) but an IEnumerable that really works lazily, there could be a huge impact.
Lazy evaluation is designed to yield items as they are needed; calling ToList in between would first gather all items before actually executing the ForEach,
while the plain foreach approach gets one item, processes it, then gets the next, and so on.
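Here is a small runnable sketch of that difference, using a hypothetical lazy source that logs each yield:

using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    // A deliberately lazy source: each item is produced on demand.
    static IEnumerable<int> Source()
    {
        for (int i = 0; i < 3; i++)
        {
            Console.WriteLine("yielding " + i);
            yield return i;
        }
    }

    static void Main()
    {
        Console.WriteLine("-- streaming foreach --");
        foreach (int i in Source())
            Console.WriteLine("processing " + i);   // yields and processing interleave

        Console.WriteLine("-- ToList().ForEach --");
        Source().ToList().ForEach(i =>
            Console.WriteLine("processing " + i));  // all yields happen before any processing
    }
}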
If you really want to use a "LINQ foreach", then don't use the List implementation but roll your own extension method (as mentioned in the comments below your question):
public static class EnumerableExtensionMethods
{
public static void ForEach<T>(this IEnumerable<T> sequence, Action<T> action)
{
foreach(T item in sequence)
action(item);
}
}
Even then, sticking with a regular foreach should be preferred, unless you put the foreach body into a separate method:
sequence.ForEach(_methodThatDoesThejob);
That is the only way of using this that is acceptable to me.

Does a foreach loop work more slowly when used with a list or array that is not stored?

I am wondering whether a foreach loop works more slowly when the list or array it iterates over is not stored in a variable first.
I mean like this:
foreach (var tour in list.OrderBy(x => x.Value))
{
// DoSomething();
}
Does the loop in this code recalculate the sorting on every iteration or not?
The loop using a stored value:
List<Tour> list = tours.OrderBy(x => x.Value) as List<Tour>;
foreach (var tour in list)
{
// DoSomething();
}
And if it does, which code shows the better performance, storing the value or not?
This is often counter-intuitive, but generally speaking the best option for performance is to wait as long as possible to materialize results into a concrete structure like a list or array. Keep in mind that this is a generalization, so there are plenty of cases where it doesn't hold. Nevertheless, as a first instinct, avoid creating the list for as long as possible.
To demonstrate with your sample, we have these two options:
var list = tours.OrderBy(x => x.Value).ToList();
foreach (var tour in list)
{
// DoSomething();
}
vs this option:
foreach (var tour in tours.OrderBy(x => x.Value))
{
// DoSomething();
}
To understand what is going on here, you need to look at the .OrderBy() extension method. Reading the linked documentation, you'll see it returns a IOrderedEnumerable<TSource> object. With an IOrderedEnumerable, all of the sorting needed for the foreach loop is already finished when you first start iterating over the object (and that, I believe, is the crux of your question: No, it does not re-sort on each iteration). Also note that both samples use the same OrderBy() call. Therefore, both samples have the same problem to solve for ordering the results, and they accomplish it the same way, meaning they take exactly the same amount of time to reach that point in the code.
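To see that for yourself, here is a small sketch with a key selector that logs: every "key for" line prints before the first loop iteration runs, and none print again afterwards:

using System;
using System.Linq;

class Program
{
    static void Main()
    {
        int[] values = { 3, 1, 2 };

        var ordered = values.OrderBy(x =>
        {
            Console.WriteLine("key for " + x);  // runs while sorting
            return x;
        });

        Console.WriteLine("before foreach");    // deferred: nothing sorted yet
        foreach (int v in ordered)
            Console.WriteLine("got " + v);      // all "key for" lines appear before the first "got"
    }
}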
The difference in the code samples, then, is entirely in using the foreach loop directly vs first calling .ToList(), because in both cases we start from an IOrderedEnumerable. Let's look closely at those differences.
When you call .ToList(), what do you think happens? This method is not magic. There is still code here which must execute in order to produce the list. This code still effectively uses its own foreach loop that you can't see. Additionally, where once you only needed enough RAM to handle one object at a time, you are now forcing your program to allocate a new block of RAM large enough to hold references for the entire collection. Moving beyond references, you may also need new memory allocations for the full objects if you were reading from a stream or database reader that previously only needed one object in RAM at a time. This is an especially big deal on systems where memory is the primary constraint, which is often the case with web servers, where you may be maintaining session RAM for many, many sessions, but each session only occasionally uses any CPU time to request a new page.
Now I am making one assumption here: that you are working with something that is not already a list. What I mean is that the previous paragraphs talked about needing to convert an IOrderedEnumerable into a List, but not about converting a List into some form of IEnumerable. I need to admit that there is some small overhead in creating and operating the state machine that .NET uses to implement those objects. However, I think this is a good assumption; it turns out to be true far more often than we realize. Even in the samples for this question, we're paying this cost regardless, by the simple virtue of calling the OrderBy() function.
In summary, there can be some additional overhead in using a raw IEnumerable vs converting to a List, but there probably isn't. Additionally, you are almost certainly saving yourself some RAM by avoiding the conversions to List whenever possible... potentially a lot of RAM.
Yes and no.
Yes, the foreach statement will seem to work more slowly.
No, your program has the same total amount of work to do, so you will not be able to measure a difference from the outside.
What you need to focus on is not using a lazy operation (in this case OrderBy) multiple times without a .ToList() or .ToArray() in between. In this case you are only using it once (in the foreach), but it is an easy thing to miss.
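A minimal sketch of that trap, assuming a Tour class with an int Value property (the names are placeholders):

using System.Collections.Generic;
using System.Linq;

class Tour { public int Value { get; set; } }

class Program
{
    static void Main()
    {
        var tours = new List<Tour> { new Tour { Value = 2 }, new Tour { Value = 1 } };

        var lazy = tours.OrderBy(x => x.Value);  // deferred: nothing sorted yet
        foreach (var t in lazy) { }              // sort runs here...
        foreach (var t in lazy) { }              // ...and runs again here

        var stored = tours.OrderBy(x => x.Value).ToList();  // sorts exactly once
        foreach (var t in stored) { }            // just walks the stored list
        foreach (var t in stored) { }            // no re-sort
    }
}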
Edit: Just to be clear, the as statement in the question will not work as intended, but my answer assumes no .ToList() after the OrderBy.
This line won't run:
List<Tour> list = tours.OrderBy(x => x.Value) as List<Tour>; // Returns null.
Instead, you want to store the results this way:
List<Tour> list = tours.OrderBy(x => x.Value).ToList();
And yes, the second option (storing the results) will enumerate much faster, as it skips re-sorting every time the stored list is enumerated.

Should I use a simple foreach or Linq when collecting data out of a collection

For a simple case, where class Foo has a member i and I have a collection of Foos, say IEnumerable<Foo> foos, I want to end up with a collection of each Foo's member i, say List<TypeOfi> result.
Question: is it preferable to use a foreach (Option 1 below), some form of LINQ (Option 2 below), or some other method? Or is it perhaps not even worth concerning myself with (just choose my personal preference)?
Option 1:
foreach (Foo foo in foos)
result.Add(foo.i);
Option 2:
result.AddRange(foos.Select(foo => foo.i));
To me, Option 2 looks cleaner, but I'm wondering if LINQ is too heavy-handed for something that can be achieved with such a simple foreach loop.
Looking for all opinions and suggestions.
I prefer the second option over the first. However, unless there is a reason to pre-create the List<T> and use AddRange, I would avoid it. Personally, I would use:
List<TypeOfi> results = foos.Select(f => f.i).ToList();
In addition, I would not necessarily even use ToList() unless you actually need a true List<T>, or need to force the execution to be immediate instead of deferred. If you just need the collection of "i" values to iterate, I would simply use:
var results = foos.Select(f => f.i);
I definitely prefer the second. It is far more declarative and easier to understand (to me, at least).
LINQ is here to make our lives more declarative, so I would hardly consider it heavy-handed, even in cases as seemingly "trivial" as this.
As Reed said, though, you could improve the quality by using:
var result = foos.Select(f => f.i).ToList();
As long as there is no data already in the result collection.
LINQ isn't heavy-handed in any way; both the foreach and the LINQ code do about the same thing, the foreach in the second case is just hidden away.
It really is just a matter of preference, at least concerning LINQ to Objects. If your source collection is a LINQ to Entities query or something similar, it is a completely different case - the second approach would push the query into the database, which is much more efficient. In this simple case the difference probably won't be big, but if you throw in a Where operator or otherwise make the query non-trivial, the LINQ query will most likely have better/faster performance.
I think you could also just do
foos.Select(foo => foo.i).ToList<TypeOfi>();

LINQ optimization

Here is a piece of code:
void MyFunc(List<MyObj> objects)
{
MyFunc1(objects);
foreach( MyObj obj in objects.Where(obj1=>obj1.Good))
{
// Do Action With Good Object
}
}
void MyFunc1(List<MyObj> objects)
{
int iGoodCount = objects.Where(obj1=>obj1.Good).Count();
BeHappy(iGoodCount);
// do other stuff with 'objects' collection
}
Here we see that the collection is analyzed twice, and each time the value of the 'Good' property is checked for every member: the first time when calculating the count of good objects, the second when iterating through all good objects.
It is desirable to optimize that, and here is a straightforward solution:
before the call to MyFunc1, create an additional temporary collection of good objects only (goodObjects; it can be an IEnumerable);
get the count of these objects and pass it as an additional parameter to MyFunc1;
in the 'MyFunc' method, iterate not over 'objects.Where(...)' but over the 'goodObjects' collection.
Not too bad an approach (as far as I can see), but an additional variable has to be created in the 'MyFunc' method and an additional parameter has to be passed.
Question: is there any LINQ out-of-the-box functionality that allows caching during the first Where().Count(), remembering the processed collection and reusing it in the next iteration?
Any thoughts are welcome.
Thanks.
No, LINQ queries are not optimized in this way (what you describe is similar to the way SQL Server reuses a query execution plan). LINQ does not (and, for practical purposes, cannot) know enough about your objects in order to optimize this way. As far as it knows, your collection has changed (or is entirely different) between the two calls.
You're obviously aware of the ability to persist your query into a new List<T>, but apart from that there's really nothing that I can recommend without knowing more about your class and where else MyFunc is used.
As long as MyFunc1 doesn't need to modify the list by adding/removing objects, this will work.
void MyFunc(List<MyObj> objects)
{
ILookup<bool, MyObj> objLookup = objects.ToLookup(obj1 => obj1.Good);
MyFunc1(objLookup[true]);
foreach(MyObj obj in objLookup[true])
{
//..
}
}
void MyFunc1(IEnumerable<MyObj> objects)
{
//..
}

Filtering IEnumerable Pattern

Consider the following simple code pattern:
foreach(Item item in itemList)
{
if(item.Foo)
{
DoStuff(item);
}
}
If I want to parallelize it using Parallel Extensions (PE), I might simply replace the loop construct as follows:
Parallel.ForEach(itemList, delegate(Item item)
{
if(item.Foo)
{
DoStuff(item);
}
});
However, PE will perform unnecessary work assigning items to threads for those items where Foo turns out to be false. Thus I was thinking an intermediate wrapper/filtering IEnumerable might be a reasonable approach here. Do you agree? If so, what is the simplest way of achieving this? (BTW I'm currently using C# 2, so I'd be grateful for at least one example that doesn't use lambda expressions etc.)
I'm not sure how the partitioning in PE for .NET 2 works, so it's difficult to say there. If each element is being pushed into a separate work item (which would be a fairly poor partitioning strategy), then filtering in advance would make quite a bit of sense.
If, however, item.Foo happened to be at all expensive (I wouldn't expect this, given that it's a property, but it's always possible), allowing it to be parallelized could be advantageous.
In addition, in .NET 4, the partitioning strategy used by the TPL will handle this fairly well. It was specifically designed to handle situations with varying levels of work. It does partitioning in "chunks", so one item does not get sent to one thread, but rather a thread gets assigned a set of items, which it processes in bulk. Depending on the frequency of item.Foo being false, paralellizing (using TPL) would quite possibly be faster than filtering in advance.
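To make the chunking idea concrete, here is a sketch using .NET 4's Partitioner to hand each thread a whole range of indexes (so it doesn't apply to the asker's C# 2 setup; Item and DoStuff are stand-ins):

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class Item { public bool Foo { get; set; } }

class Program
{
    static void DoStuff(Item item) { /* real work here */ }

    static void Main()
    {
        var itemList = new List<Item>();
        for (int i = 0; i < 100000; i++)
            itemList.Add(new Item { Foo = i % 3 == 0 });

        // Range partitioning: each thread processes a chunk of indexes,
        // so the cheap Foo == false items don't each pay a scheduling cost.
        Parallel.ForEach(Partitioner.Create(0, itemList.Count), range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
                if (itemList[i].Foo)
                    DoStuff(itemList[i]);
        });
    }
}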
That all factors down to this single line:
Parallel.ForEach(itemList.Where(i => i.Foo), DoStuff);
But reading a comment on another post I now see you're on .NET 2.0, so some of this may be a bit tricky to sneak past the compiler.
For .NET 2.0, I think you can do it like this (I'm a little unclear on whether passing the method names as delegates will still just work, but I think it will):
public static IEnumerable<T> Where<T>(IEnumerable<T> source, Predicate<T> predicate)
{
foreach(T item in source)
if (predicate(item))
yield return item;
}
public static bool HasFoo(Item item) { return item.Foo; }
Parallel.ForEach(Where(itemList, HasFoo), DoStuff);
If I were to implement this, I would simply filter the list before calling the foreach:
var actionableResults = from x in itemList where x.Foo select x;
This will filter the list down to the items that can be acted upon.
NOTE: this might be a premature optimization and may not make a major difference in your performance.
