Difference between using Where/lambda and foreach - C#

I'm new to C# and came across this operation performed on a dictionary.
_objDictionary.Keys.Where(a => (a is fooObject)).ToList().ForEach(a => ((fooObject)a).LaunchMissles());
My understanding is that this essentially puts every key that is a fooObject into a list, then calls the LaunchMissles method on each. How is that different from using a foreach loop like this?
foreach (var entry in _objDictionary.Keys)
{
    if (entry is fooObject)
    {
        entry.LaunchMissles();
    }
}
EDIT: The resounding opinion appears to be that there is no functional difference.

This is a good example of abusing LINQ: the statement did not become more readable or better in any other way, but some people just like to put LINQ everywhere. In this case, though, you can take the best of both worlds by doing:
foreach (var entry in _objDictionary.Keys.OfType<FooObject>())
{
    entry.LaunchMissles();
}
Note that in your foreach example you are missing a cast to FooObject to invoke LaunchMissles.
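For reference, a corrected version of the loop from the question just needs that cast (keeping the question's fooObject/LaunchMissles names):

foreach (var entry in _objDictionary.Keys)
{
    if (entry is fooObject)
    {
        ((fooObject)entry).LaunchMissles();
    }
}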

In general, LINQ is not voodoo magic; under the hood it does the same work you would otherwise have to write yourself. LINQ just makes things easier to write, but it won't beat equivalent hand-written code performance-wise.
In your case, the "old school" approach is perfectly fine and, in my opinion, preferable:
foreach (var entry in _objDictionary.Keys)
{
    fooObject foo = entry as fooObject;
    if (foo != null)
    {
        foo.LaunchMissles();
    }
}
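As an aside, on C# 7 or later the check and the cast can be combined with a pattern match; this is just a stylistic variant of the loop above:

foreach (var entry in _objDictionary.Keys)
{
    if (entry is fooObject foo)
    {
        foo.LaunchMissles();
    }
}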
Regarding the LINQ approach:
Materializing the sequence into a List just to call a method on it that does the same as the code above wastes resources and makes the code less readable.
In your example it doesn't make a difference, but if the source weren't a collection (like Dictionary.Keys is) but an IEnumerable that is genuinely lazy, the impact could be huge.
Lazy evaluation is designed to yield items as they are needed; calling ToList in between gathers all the items before the ForEach even starts.
The plain foreach approach, by contrast, gets one item, processes it, gets the next, and so on.
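To make that visible, here is a small sketch (hypothetical Produce method, console program with System, System.Collections.Generic and System.Linq imported) where the source logs every item it yields:

static IEnumerable<int> Produce()
{
    for (int i = 0; i < 3; i++)
    {
        Console.WriteLine("yield " + i);
        yield return i;
    }
}

// Plain foreach interleaves: yield 0, handle 0, yield 1, handle 1, ...
foreach (int i in Produce())
    Console.WriteLine("handle " + i);

// ToList().ForEach materializes first: yield 0, yield 1, yield 2, handle 0, handle 1, handle 2
Produce().ToList().ForEach(i => Console.WriteLine("handle " + i));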
If you really want a "LINQ foreach", then don't use the List implementation; roll your own extension method instead (as mentioned in the comments below your question):
public static class EnumerableExtensionMethods
{
    public static void ForEach<T>(this IEnumerable<T> sequence, Action<T> action)
    {
        foreach (T item in sequence)
            action(item);
    }
}
Even then, a regular foreach should be preferred, unless you put the foreach body into a separate method:
sequence.ForEach(_methodThatDoesThejob);
That is the only way of using this that I find acceptable.

Related

Using LINQ Where result in foreach: hidden if statement, double foreach?

foreach (Person criminal in people.Where(person => person.isCriminal))
{
    // do something
}
I have this piece of code and want to know how it actually works. Is it equivalent to an if statement nested inside the foreach, or does it first loop through the list of people and then loop again over the selected values? I ask mainly from the perspective of efficiency.
foreach (Person criminal in people)
{
    if (criminal.isCriminal)
    {
        // do something
    }
}
Where uses deferred execution.
This means that the filtering does not occur immediately when you call Where. Instead, each time you call GetEnumerator().MoveNext() on the return value of Where, it checks if the next element in the sequence satisfies the condition. If it does not, it skips over this element and checks the next one. When there is an element that satisfies the condition, it stops advancing and you can get the value using Current.
Basically, it is like having an if statement inside a foreach loop.
To understand what happens, you must know how IEnumerable<T> works (LINQ to Objects always works on IEnumerable<T>). An IEnumerable<T> returns an IEnumerator<T>, which implements an iterator. This iterator is lazy: it only ever yields one element of the sequence at a time. No looping is done in advance, unless you have an OrderBy or another operator that requires it.
So if you have ...
foreach (string name in source.Where(x => x.IsChecked).Select(x => x.Name)) {
    Console.WriteLine(name);
}
... this will happen: the foreach statement requests the first item from the Select, which in turn requests an item from the Where, which in turn retrieves an item from the source. The first name is printed to the console.
Then the foreach statement requests the second item from the Select, which again requests an item from the Where, which retrieves another item from the source. The second name is printed to the console.
and so on.
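You can watch this pull order yourself by putting logging into the lambdas (a throwaway sketch with a plain string array standing in for source):

var source = new[] { "a", "bb", "ccc" };

foreach (string name in source
    .Where(x => { Console.WriteLine("Where sees " + x); return x.Length > 1; })
    .Select(x => { Console.WriteLine("Select sees " + x); return x.ToUpper(); }))
{
    Console.WriteLine("foreach got " + name);
}

// Output (interleaved per element, no full pass up front):
// Where sees a
// Where sees bb
// Select sees bb
// foreach got BB
// Where sees ccc
// Select sees ccc
// foreach got CCC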
This means that both of your code snippets are logically equivalent.
It depends on what people is.
If people is an IEnumerable object (like a collection, or the result of a method using yield) then the two pieces of code in your question are indeed equivalent.
A naïve Where could be implemented as:
public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    // Error handling left out for simplicity.
    foreach (TSource item in source)
    {
        if (predicate(item))
        {
            yield return item;
        }
    }
}
The actual code in Enumerable is a bit different, to make sure that errors from passing a null source or predicate are thrown immediately rather than on deferred execution, and to optimise a few cases (e.g. source.Where(x => x.IsCriminal).Where(x => x.IsOnParole) is turned into the equivalent of source.Where(x => x.IsCriminal && x.IsOnParole), so there's one fewer step in the chain of iterators), but that's the basic principle.
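For illustration only (this is not the real Enumerable source, just a sketch of the idea), the chained call behaves as if the two predicates were folded into a single iterator:

static IEnumerable<T> WhereBoth<T>(IEnumerable<T> source, Func<T, bool> first, Func<T, bool> second)
{
    foreach (T item in source)
    {
        if (first(item) && second(item)) // one combined check instead of two chained iterators
        {
            yield return item;
        }
    }
}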
If however people is an IQueryable then things are different, and depend on the details of the query provider in question.
The simplest possibility is that the query provider can't do anything special with the Where and so it ends up just doing pretty much the above, because that will still work.
But often the query provider can do something else. Let's say people is a DbSet<Person> in Entity Framework associated with a table in a database called people. If you do:
foreach (var person in people)
{
    DoSomething(person);
}
Then Entity Framework will run SQL similar to:
SELECT *
FROM people
And then create a Person object for each row returned. We could do the same sort of in-memory filtering to implement Where, but we can also do better.
If you do:
foreach (Person criminal in people.Where(person => person.isCriminal))
{
    DoSomething(criminal);
}
Then Entity Framework will run SQL similar to:
SELECT *
FROM people
WHERE isCriminal = 1
This means that the logic of deciding which elements to return is executed in the database before anything comes back to .NET. It allows indices to be used in computing the WHERE, which can be much more efficient, but even in the worst case, where there are no useful indices and the database has to do a full scan, it still means that the records we don't care about are never sent back from the database and no object is created for them just to be thrown away again, so the difference in performance can be immense.
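One way to see the difference for yourself (a sketch, assuming an Entity Framework DbSet<Person> called people): AsEnumerable() switches from IQueryable to LINQ to Objects, so everything after it runs in memory instead of being translated to SQL.

// Filter translated to SQL: only matching rows come back from the database.
var filteredInDb = people.Where(p => p.isCriminal).ToList();

// Filter applied in memory: every row is fetched and materialized as a Person,
// then the non-criminals are discarded on the client.
var filteredInMemory = people.AsEnumerable().Where(p => p.isCriminal).ToList();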
I care to know more about this from the perspective of efficiency
You are hopefully satisfied that there's no double pass, as you suggested might happen, and happy to learn that, where possible, it's even more efficient than the foreach / if you suggested.
A bare foreach and if will still beat .Where() against an in-memory IEnumerable (though not against a database source), because Where has a few overheads that foreach and if don't, but only to a degree worth caring about in very hot paths. Generally, Where can be used with reasonable confidence in its efficiency.

Which is more efficient in conditional looping?

Suppose I have the following collection:
IEnumerable<car> cars = new List<car>();
Now I need to loop over this collection and call a different function depending on the car, so I can do it in one of the following ways:
Method A:
foreach (var item in cars)
{
    if (item.color == white)
    {
        doSomething();
    }
    else
    {
        doSomeOtherThing();
    }
}
or the other way:
Method B:
foreach (var item in cars.Where(c => c.color == white))
{
    doSomething();
}
foreach (var item in cars.Where(c => c.color != white))
{
    doSomeOtherThing();
}
To me, Method A seems better because I loop over the collection only once, while Method B is enticing because the framework does the looping and filtering for you.
So which method is better and faster?
Well, it depends on how complicated the filtering process is. It may be so insanely efficient that it's irrelevant, especially in light of the fact that you're no longer having to do your own filtering with the if statement.
I'll say one thing: unless your collections are massive, it probably won't make enough of a difference to care. And sometimes it's better to optimise for readability rather than speed :-)
But, if you really want to know, you measure! Time the operations in your environment with suitable production-like test data. That's the only way to be certain.
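A rough way to do that measurement, reusing the question's hypothetical cars/white/doSomething names (a sketch only; for real numbers, warm up and repeat the runs, or use a benchmarking library):

var sw = System.Diagnostics.Stopwatch.StartNew();
foreach (var item in cars)
{
    if (item.color == white) doSomething(); else doSomeOtherThing();
}
sw.Stop();
Console.WriteLine("Method A: " + sw.ElapsedMilliseconds + " ms");

sw.Restart();
foreach (var item in cars.Where(c => c.color == white)) doSomething();
foreach (var item in cars.Where(c => c.color != white)) doSomeOtherThing();
sw.Stop();
Console.WriteLine("Method B: " + sw.ElapsedMilliseconds + " ms");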
Method A is more readable than Method B. Just one question: is it car.color or item.color?

Remove in foreach just before break

I know there are lots of ways to do this much better, but I've seen it in existing code and now I'm wondering whether it could have any negative side effects. Please note the break right after Remove. Because of it, I don't care about the iterator in general; I do, however, care about unexpected behavior (-> potential exceptions).
foreach (var item in items)
{
    //do stuff
    if (item.IsSomething)
    {
        items.Remove(item); //is this safe???
        break;
    }
}
Could it also be possible the compiler optimizes something in a way I don't expect?
The compiler generates a call to Dispose() on the enumerator that is executed in a finally block, but that shouldn't be a problem. If you break right after removing the item, nothing bad should happen, since you don't use the enumerator anymore.
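Roughly, the compiler turns the foreach into something like the following (a simplified sketch), which is why breaking right after the Remove is harmless: the enumerator is only disposed afterwards, never advanced again.

using (var e = items.GetEnumerator()) // Dispose() runs in the compiler-generated finally block
{
    while (e.MoveNext())
    {
        var item = e.Current;
        // do stuff
        if (item.IsSomething)
        {
            items.Remove(item);
            break; // no further MoveNext(), so no "collection was modified" exception
        }
    }
}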
If you want to do it a different way though (for style reasons or whatever), you could do this:
var item = items.FirstOrDefault(i => i.IsSomething);
if (item != null) {
    items.Remove(item);
}
It's also a bit shorter :) (I am assuming here you are using a reference or nullable type in your collection).
The compiler, and everything else that touches your application, guarantees SC-DRF (sequential consistency for data-race-free programs), so you won't see any difference between the program you wrote and the program that is actually executed (which is anything but the same). Assuming items is not shared between multiple threads, this is completely safe to write and has no unexpected behavior beyond what calling Remove outside the loop would have.
You can't change the list while iterating over it with foreach.
The underlying collection cannot be modified while it's being enumerated. A standard approach is to collect the items to remove in a second list and then, after items has been enumerated, remove each of them from items.
Then you can do this; it's more efficient when dealing with large lists (assuming Entity Framework):
var reducedList = items.Where(a => a.IsSomething).ToList();
foreach (var item in reducedList)
{
    items.Remove(item);
}
This reduces the number of foreach iterations.
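If items is a List<T> and the goal really is to drop every matching element (rather than just the first one, as in the question's break), RemoveAll expresses that in a single call:

items.RemoveAll(i => i.IsSomething);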

Enumerator problem, Any way to avoid two loops?

I have a third party api, which has a class that returns an enumerator for different items in the class.
I need to remove an item from that enumeration, so I cannot use foreach. The only option I can think of is to get the count by iterating over the enumeration and then run a normal for loop to remove the items.
Does anyone know of a way to avoid the two loops?
Thanks
[update] sorry for the confusion but Andrey below in comments is right.
Here is some pseudo code off the top of my head that won't work, and for which I am looking for a solution that doesn't involve two loops (though I guess it's not possible):
foreach (var myProperty in MyProperty)
{
    if (checking some criteria here)
        MyProperty.Remove(myProperty);
}
MyProperty is the third party class that implements the enumerator and the remove method.
Common pattern is to do something like this:
List<Item> forDeletion = new List<Item>();
foreach (Item i in somelist)
    if (condition for deletion) forDeletion.Add(i);
foreach (Item i in forDeletion)
    somelist.Remove(i); // or however you delete items
Loop through it once and create a second array which contains the items which should not be deleted.
If you know it's a collection, you can go with a reversed for loop:
for (int i = items.Count - 1; i >= 0; i--)
{
    if (/* removal criteria */)
    {
        items.RemoveAt(i);
    }
}
Otherwise, you'll have to do two loops.
You can create something like this:
public IEnumerable<item> GetMyList()
{
    foreach (var x in thirdParty)
    {
        if (x == ignore)
            continue;
        yield return x;
    }
}
I need to remove an item in that enumerator
As long as this is a single item that's not a problem. The rule is that you cannot continue to iterate after modifying the collection. Thus:
foreach (var item in collection) {
    if (item.Equals(toRemove)) {
        collection.Remove(toRemove);
        break; // <== stop iterating!!
    }
}
It is not possible to remove an item through an enumerator. What you can do is copy or filter (or both) the content of the whole enumerated sequence.
You can achieve this with LINQ, doing something like this:
YourEnumerationReturningFunction().Where(item => yourRemovalCriteria);
Can you elaborate on the API and the API calls you are using?
If you receive an IEnumerator<T> or IEnumerable<T>, you cannot remove any item from the sequence behind the enumerator because there is no method to do so. And you should of course not rely on downcasting a received object, because the implementation may change. (Actually, a well-designed API should not expose mutable objects holding internal state at all.)
If you receive an IList<T> or something similar, you can just use a normal for loop from back to front and remove the items as needed, because there is no iterator whose state could be corrupted. (The rule about exposing mutable state applies here again: modifying the returned collection should not change any internal state.)
Enumerable's Count() decides at run time what it needs to do: enumerate the sequence to count it, or detect that the source is actually a collection and use its Count property directly.
I like SJoerd's suggestion but I worry about how many items we may be talking about.
Why not something like ..
// you don't want 2 and 3
IEnumerable<int> fromAPI = Enumerable.Range(0, 10);
IEnumerable<int> result = fromAPI.Except(new[] { 2, 3 });
A clean, readable way to do this is as follows (I'm guessing at the third-party container's API here since you haven't specified it.)
foreach (var delItem in ThirdPartyContainer.Items
                        .Where(item => ShouldIDeleteThis(item))
                        //or: .Where(ShouldIDeleteThis)
                        .ToArray())
{
    ThirdPartyContainer.Remove(delItem);
}
The call to .ToArray() ensures that all items to be deleted have been greedily cached before the foreach iteration begins.
Behind the scenes this involves an array and an extra iteration over that, but that's generally very cheap, and the advantage of this method over the other answers to this question is that it works on plain enumerables and does not involve tricky mutable state issues that are hard to read and easy to get wrong.
By contrast, iterating in reverse, while not rocket science, is much more prone to off-by-one errors and harder to read; it also relies on internals of the collection, such as the order not changing between deletions (it had better not be a binary heap, say). Manually adding items that should be deleted to a temporary list is just unnecessary code - that's what .ToArray() does just fine :-).
An enumerator always has a private field pointing to the real collection.
You can get it via reflection and modify it.
Have fun.

Filtering IEnumerable Pattern

Consider the following simple code pattern:
foreach(Item item in itemList)
{
if(item.Foo)
{
DoStuff(item);
}
}
If I want to parallelize it using Parallel Extensions (PE), I might simply replace the foreach construct as follows:
Parallel.ForEach(itemList, delegate(Item item)
{
    if (item.Foo)
    {
        DoStuff(item);
    }
});
However, PE will perform unnecessary work assigning work to threads for those items where Foo turns out to be false. Thus I was thinking an intermediate wrapper/filtering IEnumerable might be a reasonable approach here. Do you agree? If so, what is the simplest way of achieving this? (BTW, I'm currently using C# 2, so I'd be grateful for at least one example that doesn't use lambda expressions etc.)
I'm not sure how the partitioning in PE for .NET 2 works, so it's difficult to say there. If each element is being pushed into a separate work item (which would be a fairly poor partitioning strategy), then filtering in advance would make quite a bit of sense.
If, however, item.Foo happened to be at all expensive (I wouldn't expect this, given that it's a property, but it's always possible), allowing it to be parallelized could be advantageous.
In addition, in .NET 4 the partitioning strategy used by the TPL handles this fairly well. It was specifically designed to handle situations with varying amounts of work per item. It partitions in "chunks", so one item does not get sent to one thread; rather, a thread gets assigned a set of items, which it processes in bulk. Depending on how often item.Foo is false, parallelizing (using the TPL) could well be faster than filtering in advance.
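In .NET 4, if you want explicit control over that chunking, you can hand Parallel.ForEach a range partitioner so each worker processes a block of indices (a sketch, assuming itemList is an IList<Item>):

using System.Collections.Concurrent;
using System.Threading.Tasks;

Parallel.ForEach(Partitioner.Create(0, itemList.Count), range =>
{
    // Each worker receives a (fromInclusive, toExclusive) range and loops over it locally.
    for (int i = range.Item1; i < range.Item2; i++)
    {
        if (itemList[i].Foo)
        {
            DoStuff(itemList[i]);
        }
    }
});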
That all factors down to this single line:
Parallel.ForEach(itemList.Where(i => i.Foo), DoStuff);
But reading a comment on another post, I now see you're still on .NET 2.0, so some of this may be a bit tricky to sneak past the compiler.
For .NET 2.0, I think you can do it like this (I'm a little unclear whether passing the method names as delegates will still just work, but I think it will):
public IEnumerable<T> Where<T>(IEnumerable<T> source, Predicate<T> predicate)
{
    foreach (T item in source)
        if (predicate(item))
            yield return item;
}

public bool HasFoo(Item item) { return item.Foo; }

Parallel.ForEach(Where(itemList, HasFoo), DoStuff);
If I were to implement this, I would simply filter the list before calling the foreach.
var actionableResults = from x in ItemList where x.Foo select x;
This filters the list down to the items that can be acted upon.
NOTE: this might be a premature optimization and may not make a major difference to your performance.
