Related
I recently came across an issue where I was able to change the IEnumerable object that I was iterating over in a foreach loop. It's my understanding that in C#, you aren't supposed to be able to edit the list you're iterating over, but after some frustration, I found that this is exactly what was happening. I basically looped through a LINQ query and used the object IDs to make changes in the database on those objects and those changes affected the values in the .Where() statement.
Does anybody have an explanation for this? It seems like the LINQ query re-runs every time it's iterated over
NOTE: The fix for this is adding .ToList() after the .Where(), but my question is why this issue is happening at all i.e. if it's a bug or something I'm unaware of
using System;
using System.Linq;
namespace MyTest {
class Program {
static void Main () {
var aArray = new string[] {
"a", "a", "a", "a"
};
var i = 3;
var linqObj = aArray.Where(x => x == "a");
foreach (var item in linqObj ) {
aArray[i] = "b";
i--;
}
foreach (var arrItem in aArray) {
Console.WriteLine(arrItem); //Why does this only print out 2 a's and 2 b's, rather than 4 b's?
}
Console.ReadKey();
}
}
}
This code is just a reproducible mockup, but I'd expect it to loop through 4 times and change all of the strings in aArray into b's. However, it only loops through twice and turns the last two strings in aArray into b's
EDIT: After some feedback and to be more concise, my main question here is this: "Why am I able to change what I'm looping over as I'm looping over it". Looks like the overwhelming answer is that LINQ does deferred execution, so it's re-evaluating as I'm looping through the LINQ IEnumerable.
EDIT 2: Actually looking through, it seems that everyone is concerned with the .Count() function, thinking that is what the issue here is. However, you can comment out that line and I still have the issue of the LINQ object changing. I updated the code to reflect the main issue
Why am I able to edit a LINQ list while iterating over it?
All of the answers that say that this is because of deferred "lazy" execution are wrong, in the sense that they do not adequately address the question that was asked: "Why am I able to edit a list while iterating over it?" Deferred execution explains why running the query twice gives different results, but does not address why the operation described in the question is possible.
The problem is actually that the original poster has a false belief:
I recently came across an issue where I was able to change the IEnumerable object that I was iterating over in a foreach loop. It's my understanding that in C#, you aren't supposed to be able to edit the list you're iterating over
Your understanding is wrong, and that's where the confusion comes from. The rule in C# is not "it is impossible to edit an enumerable from within an enumeration". The rule is you are not supposed to edit an enumerable from within an enumeration, and if you choose to do so, arbitrarily bad things can happen.
Basically what you're doing is running a stop sign and then asking "Running a stop sign is illegal, so why did the police not prevent me from running the stop sign?" The police are not required to prevent you from doing an illegal act; you are responsible for not making the attempt in the first place, and if you choose to do so, you take the chance of getting a ticket, or causing a traffic accident, or any other bad consequence of your poor choice. Usually the consequences of running a stop sign are no consequences at all, but that does not mean that it's a good idea.
Editing an enumerable while you're enumerating it is a bad practice, but the runtime is not required to be a traffic cop and prevent you from doing so. Nor is it required to flag the operation as illegal with an exception. It may do so, and sometimes it does do so, but there is not a requirement that it does so consistently.
You've found a case where the runtime does not detect the problem and does not throw an exception, but you do get a result that you find unexpected. That's fine. You broke the rules, and this time it just happens that the consequence of breaking the rules was an unexpected outcome. The runtime is not required to make the consequence of breaking the rules into an exception.
If you tried to do the same thing where, say, you called Add on a List<T> while enumerating the list, you'd get an exception because someone wrote code in List<T> that detects that situation.
No one wrote that code for "linq over an array", and so, no exception. The authors of LINQ were not required to write that code; you were required to not write the code you wrote! You chose to write a bad program that violates the rules, and the runtime is not required to catch you every time you write a bad program.
It seems like the LINQ query re-runs every time it's iterated over
That is correct. A query is a question about a data structure. If you change that data structure, the answer to the question can change. Enumerating the query answers the question.
However, that is an entirely different issue than the one in the title of your question. You really have two questions here:
Why can I edit an enumerable while I am enumerating it?
You can do this bad practice because nothing stops you from writing a bad program except your good sense; write better programs that do not do this!
Does a query re-execute from scratch every time I enumerate it?
Yes; a query is a question, not an answer. An enumeration of the query is an answer, and the answer can change over time.
The explanation to your first question, why your LINQ query re-runs every time it's iterated over is because of Linq's deferred execution.
This line just declares the linq exrpession and does not execute it:
var linqLIST = aArray.Where(x => x == "a");
and this is where it gets executed:
foreach (var arrItem in aArray)
and
Console.WriteLine(linqList.Count());
An explict call ToList() would run the Linq expression immediately. Use it like this:
var linqList = aArray.Where(x => x == "a").ToList();
Regarding the edited question:
Of course, the Linq expression is evaluated in every foreach iteration. The issue is not the Count(), instead every call to the LINQ expression re-evaluates it. As mentioned above, enumerate it to a List and iterate over the list.
Late edit:
Concerning #Eric Lippert's critique, I will also refer and go into detail for the rest of the OP's questions.
//Why does this only print out 2 a's and 2 b's, rather than 4 b's?
In the first loop iteration i = 3, so after aArray[3] = "b"; your array will look like this:
{ "a", "a", "a", "b" }
In the second loop iteration i(--) has now the value 2 and after executing aArray[i] = "b"; your array will be:
{ "a", "a", "b", "b" }
At this point, there are still a's in your array but the LINQ query returns IEnumerator.MoveNext() == false and as such the loop reaches its exit condition because the IEnumerator internally used, now reaches the third position in the index of the array and as the LINQ is re-evaluated it doesn't match the where x == "a" condition any more.
Why am I able to change what I'm looping over as I'm looping over it?
You are able to do so because the build in code analyser in Visual Studio is not detecting that you modify the collection within the loop. At runtime the array is modified, changing the outcome of the LINQ query but there is no handling in the implementation of the array iterator so no exception is thrown.
This missing handling seems by design, as arrays are of fixed size oposed to lists where such an exception is thrown at runtime.
Consider following example code which should be equivalent with your initial code example (before edit):
using System;
using System.Linq;
namespace MyTest {
class Program {
static void Main () {
var aArray = new string[] {
"a", "a", "a", "a"
};
var iterationList = aArray.Where(x => x == "a").ToList();
foreach (var item in iterationList)
{
var index = iterationList.IndexOf(item);
iterationList.Remove(item);
iterationList.Insert(index, "b");
}
foreach (var arrItem in aArray)
{
Console.WriteLine(arrItem);
}
Console.ReadKey();
}
}
}
This code will compile and iterate the loop once before throwing an System.InvalidOperationException with the message:
Collection was modified; enumeration operation may not execute.
Now the reason why the List implementation throws this error while enumerating it, is because it follows a basic concept: For and Foreach are iterative control flow statements that need to be deterministic at runtime. Furthermore the Foreach statement is a C# specific implementation of the iterator pattern, which defines an algorithm that implies sequential traversal and as such it would not change within the execution. Thus the List implementation throws an exception when you modify the collection while enumerating it.
You found one of the ways to modify a loop while iterating it and re-eveluating it in each iteration. This is a bad design choice because you might run into an infinite loop if the LINQ expression keeps changing the results and never meets an exit condition for the loop. This will make it hard to debug and will not be obvious when reading the code.
In contrast there is the while control flow statement which is a conditional construct and is ment to be non-deterministic at runtime, having a specific exit condition that is expected to change while execution.
Consider this rewrite base on your example:
using System;
using System.Linq;
namespace MyTest {
class Program {
static void Main () {
var aArray = new string[] {
"a", "a", "a", "a"
};
bool arrayHasACondition(string x) => x == "a";
while (aArray.Any(arrayHasACondition))
{
var index = Array.FindIndex(aArray, arrayHasACondition);
aArray[index] = "b";
}
foreach (var arrItem in aArray)
{
Console.WriteLine(arrItem); //Why does this only print out 2 a's and 2 b's, rather than 4 b's?
}
Console.ReadKey();
}
}
}
I hope this should outline the technical background and explain your false expectations.
Enumerable.Where returns an instance that represents a query definition. When it is enumerated*, the query is evaluted. foreach allows you to work with each item at the time it is found by the query. The query is deferred, but it also pause-able/resume-able, by the enumeration mechanisms.
var aArray = new string[] { "a", "a", "a", "a" };
var i = 3;
var linqObj = aArray.Where(x => x == "a");
foreach (var item in linqObj )
{
aArray[i] = "b";
i--;
}
At the foreach loop, linqObj is enumerated* and the query is started.
The first item is examined and a match is found. The query is paused.
The loop body happens: item="a", aArray[3]="b", i=2
Back to the foreach loop, the query is resumed.
The second item is examined and a match is found. The query is paused.
The loop body happens: item="a", aArray[2]="b", i=2
Back to the foreach loop, the query is resumed.
The third item is examined and is "b", not a match.
The fourth item is examined and is "b", not a match.
The loop exits and the query concludes.
Note: is enumerated* : this means GetEnumerator and MoveNext are called. This does not mean that the query is fully evaluated and results held in a snapshot.
For further understanding, read up on yield return and how to write a method that uses that language feature. If you do this, you'll understand what you need in order to write Enumerable.Where
IEnumerable in c# is lazy. This means whenever you force it to evaluate you get the result. In your case Count() forces the linqLIST to evaluate every time you call it. by the way, linqLIST is not a list right now
You could upgrade the «avoid side-effects while enumerating an array» advice to a requirement, by utilizing the extension method below:
private static IEnumerable<T> DontMessWithMe<T>(this T[] source)
{
var copy = source.ToArray();
return source.Zip(copy, (x, y) =>
{
if (!EqualityComparer<T>.Default.Equals(x, y))
throw new InvalidOperationException(
"Array was modified; enumeration operation may not execute.");
return x;
});
}
Now chain this method to your query and watch what happens. 😃
var linqObj = aArray.DontMessWithMe().Where(x => x == "a");
Of course this comes with a cost. Now every time you enumerate the array, a copy is created. This is why I don't expect that anyone will use this extension, ever!
foreach (Person criminal in people.Where(person => person.isCriminal)
{
// do something
}
I have this piece of code and want to know how does it actually work. Is it equivalent to an if statement nested inside the foreach iteration or does it first loop through the list of people and repeats the loop with selected values? I care to know more about this from the perspective of efficiency.
foreach (Person criminal in people)
{
if (criminal.isCriminal)
{
// do something
}
}
Where uses deferred execution.
This means that the filtering does not occur immediately when you call Where. Instead, each time you call GetEnumerator().MoveNext() on the return value of Where, it checks if the next element in the sequence satisfies the condition. If it does not, it skips over this element and checks the next one. When there is an element that satisfies the condition, it stops advancing and you can get the value using Current.
Basically, it is like having an if statement inside a foreach loop.
To understand what happens, you must know how IEnumerables<T> work (because LINQ to Objects always work on IEnumerables<T>. IEnumerables<T> return an IEnumerator<T> which implements an iterator. This iterator is lazy, i.e. it always only yields one element of the sequence at once. There is no looping done in advance, unless you have an OrderBy or another command which requires it.
So if you have ...
foreach (string name in source.Where(x => x.IsChecked).Select(x => x.Name)) {
Console.WriteLine(name);
}
... this will happen: The foreach-statement requires the first item which is requested from the Select, which in turn requires one item from Where, which in turn retrieves one item from the source. The first name is printed to the console.
Then the foreach-statement requires the second item which is requested from the Select, which in turn requires one item from Where, which in turn retrieves one item from the source. The second name is printed to the console.
and so on.
This means that both of your code snipptes are logically equivalent.
It depends on what people is.
If people is an IEnumerable object (like a collection, or the result of a method using yield) then the two pieces of code in your question are indeed equivalent.
A naïve Where could be implemented as:
public static IEnumerable<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
// Error handling left out for simplicity.
foreach (TSource item in source)
{
if (predicate(item))
{
yield return item;
}
}
}
The actual code in Enumerable is a bit different to make sure that errors from passing a null source or predicate happen immediately rather than on the deferred execution, and to optimise for a few cases (e.g. source.Where(x => x.IsCriminal).Where(x => x.IsOnParole) is turned into the equivalent of source.Where(x => x.IsCriminal && x.IsOnParole) so that there's one fewer step in the chains of iterations), but that's the basic principle.
If however people is an IQueryable then things are different, and depend on the details of the query provider in question.
The simplest possibility is that the query provider can't do anything special with the Where and so it ends up just doing pretty much the above, because that will still work.
But often the query provider can do something else. Let's say people is a DbSet<Person> in Entity Framework assocated with a table in a database called people. If you do:
foreach(var person in people)
{
DoSomething(person);
}
Then Entity Framework will run SQL similar to:
SELECT *
FROM people
And then create a Person object for each row returned. We could do the same filtering in about to implement Where but we can also do better.
If you do:
foreach (Person criminal in people.Where(person => person.isCriminal)
{
DoSomething(person);
}
Then Entity Framework will run SQL similar to:
SELECT *
FROM people
WHERE isCriminal = 1
This means that the logic of deciding which elements to return is done in the database before it comes back to .NET. It allows for indices to be used in computing the WHERE which can be much more efficient, but even in the worse case of there being no useful indices and the database having to do a full scan it will still mean that those records we don't care about are never reported back from the database and there is no object created for them just to be thrown away again, so the difference in performance can be immense.
I care to know more about this from the perspective of efficiency
You are hopefully satisfied that there's no double pass as you suggested might happen, and happy to learn that it's even more efficient than the foreach … if you suggested when possible.
A bare foreach and if will still beat .Where() against an IEnumerable (but not against a database source) as there are a few overheads to Where that foreach and if don't have, but it's to a degree that is only worth caring about in very hot paths. Generally Where can be used with reasonable confidence in its efficiency.
I came across this implementation in Enumerable.cs by reflector.
public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
//check parameters
TSource local = default(TSource);
long num = 0L;
foreach (TSource local2 in source)
{
if (predicate(local2))
{
local = local2;
num += 1L;
//I think they should do something here like:
//if (num >= 2L) throw Error.MoreThanOneMatch();
//no necessary to continue
}
}
//return different results by num's value
}
I think they should break the loop if there are more than 2 items meets the condition, why they always loop through the whole collection? In case of that reflector disassembles the dll incorrectly, I write a simple test:
class DataItem
{
private int _num;
public DataItem(int num)
{
_num = num;
}
public int Num
{
get{ Console.WriteLine("getting "+_num); return _num;}
}
}
var source = Enumerable.Range(1,10).Select( x => new DataItem(x));
var result = source.Single(x => x.Num < 5);
For this test case, I think it will print "getting 0, getting 1" and then throw an exception. But the truth is, it keeps "getting 0... getting 10" and throws an exception.
Is there any algorithmic reason they implement this method like this?
EDIT Some of you thought it's because of side effects of the predicate expression, after a deep thought and some test cases, I have a conclusion that side effects doesn't matter in this case. Please provide an example if you disagree with this conclusion.
Yes, I do find it slightly strange especially because the overload that doesn't take a predicate (i.e. works on just the sequence) does seem to have the quick-throw 'optimization'.
In the BCL's defence however, I would say that the InvalidOperation exception that Single throws is a boneheaded exception that shouldn't normally be used for control-flow. It's not necessary for such cases to be optimized by the library.
Code that uses Single where zero or multiple matches is a perfectly valid possibility, such as:
try
{
var item = source.Single(predicate);
DoSomething(item);
}
catch(InvalidOperationException)
{
DoSomethingElseUnexceptional();
}
should be refactored to code that doesn't use the exception for control-flow, such as (only a sample; this can be implemented more efficiently):
var firstTwo = source.Where(predicate).Take(2).ToArray();
if(firstTwo.Length == 1)
{
// Note that this won't fail. If it does, this code has a bug.
DoSomething(firstTwo.Single());
}
else
{
DoSomethingElseUnexceptional();
}
In other words, we should leave the use of Single to cases when we expect the sequence to contain only one match. It should behave identically to Firstbut with the additional run-time assertion that the sequence doesn't contain multiple matches. Like any other assertion, failure, i.e. cases when Single throws, should be used to represent bugs in the program (either in the method running the query or in the arguments passed to it by the caller).
This leaves us with two cases:
The assertion holds: There is a single match. In this case, we want Single to consume the entire sequence anyway to assert our claim. There's no benefit to the 'optimization'. In fact, one could argue that the sample implementation of the 'optimization' provided by the OP will actually be slower because of the check on every iteration of the loop.
The assertion fails: There are zero or multiple matches. In this case, we do throw later than we could, but this isn't such a big deal since the exception is boneheaded: it is indicative of a bug that must be fixed.
To sum up, if the 'poor implementation' is biting you performance-wise in production, either:
You are using Single incorrectly.
You have a bug in your program. Once the bug is fixed, this particular performance problem will go away.
EDIT: Clarified my point.
EDIT: Here's a valid use of Single, where failure indicates bugs in the calling code (bad argument):
public static User GetUserById(this IEnumerable<User> users, string id)
{
if(users == null)
throw new ArgumentNullException("users");
// Perfectly fine if documented that a failure in the query
// is treated as an exceptional circumstance. Caller's job
// to guarantee pre-condition.
return users.Single(user => user.Id == id);
}
Update:
I got some very good feedback to my answer, which has made me re-think. Thus I will first provide the answer that states my "new" point of view; you can still find my original answer just below. Make sure to read the comments in-between to understand where my first answer misses the point.
New answer:
Let's assume that Single should throw an exception when it's pre-condition is not met; that is, when Single detects than either none, or more than one item in the collection matches the predicate.
Single can only succeed without throwing an exception by going through the whole collection. It has to make sure that there is exactly one matching item, so it will have to check all items in the collection.
This means that throwing an exception early (as soon as it finds a second matching item) is essentially an optimization that you can only benefit from when Single's pre-condition cannot be met and when it will throw an exception.
As user CodeInChaos says clearly in a comment below, the optimization wouldn't be wrong, but it is meaningless, because one usually introduces optimizations that will benefit correctly-working code, not optimizations that will benefit malfunctioning code.
Thus, it is actually correct that Single could throw an exception early; but it doesn't have to, because there's practically no added benefit.
Old answer:
I cannot give a technical reason why that method is implemented the way it is, since I didn't implement it. But I can state my understanding of the Single operator's purpose, and from there draw my personal conclusion that it is indeed badly implemented:
My understanding of Single:
What is the purpose of Single, and how is it different from e.g. First or Last?
Using the Single operator basically expresses one's assumption that exactly one item must be returned from the collection:
If you don't specify a predicate, it should mean that the collection is expected to contain exactly one item.
If you do specify a predicate, it should mean that exactly one item in the collection is expected to satisfy that condition. (Using a predicate should have the same effect as items.Where(predicate).Single().)
This is what makes Single different from other operators such as First, Last, or Take(1). None of those operators have the requirement that there should be exactly one (matching) item.
When should Single throw an exception?
Basically, when it finds that your assumption was wrong; i.e. when the underlying collection does not yield exactly one (matching) item. That is, when there are zero or more than one items.
When should Single be used?
The use of Single is appropriate when your program's logic can guarantee that the collection will yield exactly one item, and one item only. If an exception gets thrown, that should mean that your program's logic contains a bug.
If you process "unreliable" collections, such as I/O input, you should first validate the input before you pass it to Single. Single, together with an exception catch block, is not appropriate for making sure that the collection has only one matching item. By the time you invoke Single, you should already have made sure that there'll be only one matching item.
Conclusion:
The above states my understanding of the Single LINQ operator. If you follow and agree with this understanding, you should come to the conclusion that Single ought to throw an exception as early as possible. There is no reason to wait until the end of the (possibly very large) collection, because the pre-condition of Single is violated as soon as it detects a second (matching) item in the collection.
When considering this implementation we must remember that this is the BCL: general code that is supposed to work good enough in all sorts of scenarios.
First, take these scenarios:
Iterate over 10 numbers, where the first and second elements are equal
Iterate over 1.000.000 numbers, where the first and third elements are equal
The original algorithm will work well enough for 10 items, but 1M will have a severe waste of cycles. So in these cases where we know that there are two or more early in the sequences, the proposed optimization would have a nice effect.
Then, look at these scenarios:
Iterate over 10 numbers, where the first and last elements are equal
Iterate over 1.000.000 numbers, where the first and last elements are equal
In these scenarios the algorithm is still required to inspect every item in the lists. There is no shortcut. The original algorithm will perform good enough, it fulfills the contract. Changing the algorithm, introducing an if on each iteration will actually decrease performance. For 10 items it will be negligible, but 1M it will be a big hit.
IMO, the original implementation is the correct one, since it is good enough for most scenarios. Knowing the implementation of Single is good though, because it enables us to make smart decisions based on what we know about the sequences we use it on. If performance measurements in one particular scenario shows that Single is causing a bottleneck, well: then we can implement our own variant that works better in that particular scenario.
Update: as CodeInChaos and Eamon have correctly pointed out, the if test introduced in the optimization is indeed not performed on each item, only within the predicate match block. I have in my example completely overlooked the fact that the proposed changes will not affect the overall performance of the implementation.
I agree that introducing the optimization would probably benefit all scenarios. It is good to see though that eventually, the decision to implement the optimization is made on the basis of performance measurements.
I think it's a premature optimization "bug".
Why this is NOT reasonable behavior due to side effects
Some have argued that due to side effects, it should be expected that the entire list is evaluated. After all, in the correct case (the sequence indeed has just 1 element) it is completely enumerated, and for consistency with this normal case it's nicer to enumerate the entire sequence in all cases.
Although that's a reasonable argument, it flies in the face of the general practice throughout the LINQ libraries: they use lazy evaluation everywhere. It's not general practice to fully enumerate sequences except where absolutely necessary; indeed, several methods prefer using IList.Count when available over any iteration at all - even when that iteration may have side effects.
Further, .Single() without predicate does not exhibit this behavior: that terminates as soon as possible. If the argument were that .Single() should respect side-effects of enumeration, you'd expect all overloads to do so equivalently.
Why the case for speed doesn't hold
Peter Lillevold made the interesting observation that it may be faster to do...
foreach(var elem in elems)
if(pred(elem)) {
retval=elem;
count++;
}
if(count!=1)...
than
foreach(var elem in elems)
if(pred(elem)) {
retval=elem;
count++;
if(count>1) ...
}
if(count==0)...
After all, the second version, which would exit the iteration as soon as the first conflict is detected, would require an extra test in the loop - a test which in the "correct" is purely ballast. Neat theory, right?
Except, that's not bourne out by the numbers; for example on my machine (YMMV) Enumerable.Range(0,100000000).Where(x=>x==123).Single() is actually faster than Enumerable.Range(0,100000000).Single(x=>x==123)!
It's possibly a JITter quirk of this precise expression on this machine - I'm not claiming that Where followed by predicateless Single is always faster.
But whatever the case, the fail-fast solution is very unlikely to be significantly slower. After all, even in the normal case, we're dealing with a cheap branch: a branch that is never taken and thus easy on the branch predictor. And of course; the branch is further only ever encountered when pred holds - that's once per call in the normal case. That cost is simply negligible compared to the cost of the delegate call pred and its implementation, plus the cost of the interface methods .MoveNext() and .get_Current() and their implementations.
It's simply extremely unlikely that you'll notice the performance degradation caused by one predictable branch in comparison to all that other abstraction penalty - not to mention the fact that most sequences and predicates actually do something themselves.
It seems very clear to me.
Single is intended for the case where the caller knows that the enumeration contains exactly one match, since in any other case an expensive exception is thrown.
For this use case, the overload that takes a predicate must iterate over the whole enumeration. It is slightly faster to do so without an additional condition on every loop.
In my view the current implementation is correct: it is optimized for the expected use case of an enumeration that contains exactly one matching element.
That does appear to be a bad implementation, in my opinion.
Just to illustrate the potential severity of the problem:
var oneMillion = Enumerable.Range(1, 1000000)
.Select(x => { Console.WriteLine(x); return x; });
int firstEven = oneMillion.Single(x => x % 2 == 0);
The above will output all the integers from 1 to 1000000 before throwing an exception.
It's a head-scratcher for sure.
I only found this question after filing a report at https://connect.microsoft.com/VisualStudio/feedback/details/810457/public-static-tsource-single-tsource-this-ienumerable-tsource-source-func-tsource-bool-predicate-doesnt-throw-immediately-on-second-matching-result#
The side-effect argument doesn't hold water, because:
Having side-effects isn't really functional, and they're called Func for a reason.
If you do want side-effects, it makes no more sense to claim the version that has the side-effects throughout the whole sequence is desirable than it does to claim so for the version that throws immediately.
It does not match the behaviour of First or the other overload of Single.
It does not match at least some other implementations of Single, e.g. Linq2SQL uses TOP 2 to ensure that only the two matching cases needed to test for more than one match are returned.
We can construct cases where we should expect a program to halt, but it does not halt.
We can construct cases where OverflowException is thrown, which is not documented behaviour, and hence clearly a bug.
Most importantly of all, if we're in a condition where we expected the sequence to have only one matching element, and yet we're not, then something has clearly gone wrong. Apart from the general principle that the only thing you should do upon detecting an error state is clean-up (and this implementation delays that) before throwing, the case of an sequence having more than one matching element is going to overlap with the case of a sequence having more elements in total than expected - perhaps because the sequence has a bug that is causing it to loop unexpectedly. So it's precisely in one possible set of bugs that should trigger the exception, that the exception is most delayed.
Edit:
Peter Lillevold's mention of a repeated test may be a reason why the author chose to take the approach they did, as an optimisation for the non-exceptional case. If so it was needless though, even aside from Eamon Nerbonne showing it wouldn't improve much. There's no need to have a repeated test in the initial loop, as we can just change what we're testing for upon the first match:
public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
if(source == null)
throw new ArgumentNullException("source");
if(predicate == null)
throw new ArgumentNullException("predicate");
using(IEnumerator<TSource> en = source.GetEnumerator())
{
while(en.MoveNext())
{
TSource cur = en.Current;
if(predicate(cur))
{
while(en.MoveNext())
if(predicate(en.Current))
throw new InvalidOperationException("Sequence contains more than one matching element");
return cur;
}
}
}
throw new InvalidOperationException("Sequence contains no matching element");
}
For most of the time I am using All(and returns true) instead of ForEach. Is it a good practice to use ALL instead of ForEach all the time(in case of IEnumerable), I understand All could be run on IEnumerable whereas foreach runs only on list
var wells = GlobalDataModel.WellList.Where(u => u.RefProjectName == project.OldProjectName);
if (wells.Any())
{
wells.All(u =>
{
u.RefProjectName = project.ProjectName;
return true;
});
}
var wellsList = GlobalDataModel.WellList.Where(u => u.RefProjectName == project.OldProjectName).ToList();
wellsList.ForEach(u => u.RefProjectName = project.ProjectName);
Nope, You're abusing the All method. Take a look at documentation
Determines whether all elements of a sequence satisfy a condition.
It should be used to determine all elements are true/false based on some condition, It is not meant to used to produce side effects.
List.ForEach is meant to be used for side effects. You may use it if you already have List<T> upfront. Calling ToList and creating new List just for the sake of List.ForEach is not worth. It adds another O(n) operation.
In short don't use All for side effects, List.ForEach is barely acceptable when you have list already. Recommended way is use loop of your choice, nothing can be better than that.
Ericlippert has something to say about ForEach, note that it is removed in ModernUI apps, may be removed in desktop version of .net too.
If you're checking whether all elements satisfy some condition, use All.
But, if you need to perform some operation on each element, don't use All or any other LINQ predicate (Where, Select, Any, etc.), as they are meant to be used in a purely functional way. You can iterate over the elements using a foreach .. in loop or if you prefer with the List<T>.ForEach method. However as you mention it is part of List<T> and makes your code slightly harder to change (e.g. from list to another enumerable).
See here for a discussion about the "right way" to use LINQ.
For example, you can write your code like this:
foreach (var u in GlobalDataModel.WellList
.Where(u => u.RefProjectName == project.OldProjectName))
{
u.RefProjectName = project.ProjectName;
}
It's more obvious that side effects are being done. Also, this will only iterate once over the sequence, skipping the elements that don't satisfy the condition.
I'm learning the basics of programming here (C#) but I think this question is generic in its nature.
What are some simple practical situations that lend themselves closer to a particular type of loop?
The while and for loops seem pretty similar and there are several SO questions addressing the differences between the two. How about foreach? From my basic understanding, its seems I ought to be able to do everything a foreach loop does within a for loop.
Which ever works best for code readability. In other words use the one that fits the situation best.
while: When you have a condition that needs to be checked at the start of each loop. e.g. while(!file.EndOfFile) { }
for: When you have an index or counter you are incrementing on each loop. for (int i = 0; i<array.Length; i++) { }. Essentially, the thing you are looping over is an indexable collection, array, list, etc.
foreach: When you are looping over a collection of objects or other Enumerable. In this event you may not know (or care) the size of the collection, or the collection is not index based (e.g. a set of objects). Generally I find foreach loops to be the most readable when I'm not interested in the index of something or any other exit conditions.
Those are my general rules of thumb anyway.
1. foreach and for
A foreach loop works with IEnumerator, when a for loop works with an index (in object myObject = myListOfObjects[i], i is the index).
There is a big difference between the two:
an index can access directly any object based on its position within a list.
an enumerator can only access the first element of a list, and then move to the next element (as described in the previous link from the msdn). It cannot access an element directly, just knowing the index of the element within a list.
So an enumerator may seem less powerful, but:
you don't always know the position of elements in a group, because all groups are not ordered/indexed.
you don't always know the number of elements in a list (think about a linked list).
even when it's ordered, the indexed access of a list may be based internally on an enumerator, which means that each time you're accessing an element by its position you may be actually enumerating all elements of the list up until the element you want.
indexes are not always numeric. Think about Dictionary.
So actually the big strength of the foreach loop and the underlying use of IEnumerator is that it applies to any type which implements IEnumerable (implementing IEnumerable just means that you provide a method that returns an enumerator). Lists, Arrays, Dictionaries, and all other group types all implement IEnumerable. And you can be sure that the enumerator they have is as good as it gets: you won't find a fastest way to go through a list.
So, the for loop can generally be considered as a specialized foreach loop:
public void GoThrough(List<object> myList)
{
for (int i=0; i<myList.Count; i++)
{
MessageBox.Show(myList[i].ToString());
}
}
is perfectly equivalent to:
public void GoThrough(List<object> myList)
{
foreach (object item in myList)
{
MessageBox.Show(item.ToString());
}
}
I said generally because there is an obvious case when the for loop is necessary: when you need the index (i.e. the position in the list) of the object, for some reason (like displaying it). You will though eventually realize that this happens only in specific cases when you do good .NET programming, and that foreach should be your default candidate for loops over a group of elements.
Now to keep comparing the foreach loop, it is indeed just an eye-candy specific while loop:
public void GoThrough(IEnumerable myEnumerable)
{
foreach (object obj in myEnumerable)
{
MessageBox.Show(obj.ToString());
}
}
is perfectly equivalent to:
public void GoThrough(IEnumerable myEnumerable)
{
IEnumerator myEnumerator = myEnumerable.GetEnumerator();
while (myEnumerator.MoveNext())
{
MessageBox.Show(myEnumerator.Current.ToString());
}
}
The first writing is a lot simpler though.
2. while and do..while
The while (condition) {action} loop and the do {action} while (condition) loop just differ from each other by the fact that the first one tests the condition before applying the action, when the second one applies the action, then tests the condition. The do {..} while (..) loop is used quite marginally compared to the others, since it runs the action at least once even if the condition is initially not met (which can lead to trouble, since the action is generally dependent on the condition).
The while loop is more general than the for and foreach ones, which apply specifically to lists of objects. The while loop just has a condition to go on, which can be based on anything. For example:
string name = string.empty;
while (name == string.empty)
{
Console.WriteLine("Enter your name");
name = Console.ReadLine();
}
asks the user to input his name then press Enter, until he actually inputs something. Nothing to do with lists, as you can see.
3. Conclusion
When you are going through a list, you should use foreach unless you need the numeric index, in which case you should use for.
When it doesn't have anything to do with list, and it's just a procedural construction, you should use while(..) {..}.
Now to conclude with something less restrictive: your first goal with .NET should be to make your code readable/maintainable and make it run fast, in that order of priority. Anything that achieves that is good for you. Personally though, I think the foreach loop has the advantage that potentially, it's the most readable and the fastest.
Edit: there is an other case where the for loop is useful: when you need indexing to go through a list in a special way or if you need to modify the list when in the loop. For example, in this case because we want to remove every null element from myList:
for (int i=myList.Count-1; i>=0; i--)
{
if (myList[i] == null) myList.RemoveAt(i);
}
You need the for loop here because myList cannot be modified from within a foreach loop, and we need to go through it backwards because if you remove the element at the position i, the position of all elements with an index >i will change.
But the use for these special constructions have been reduced since LINQ. The last example can be written like this in LINQ for example:
myList.RemoveAll(obj => obj == null);
LINQ is a second step though, learn the loops first.
when you know how many iterations there will be use for
when you don't know use while, when don't know and need to execute code at least once use do
when you iterate through collection and don't need index use foreach
(also you can not use collection[i] on everything that you can use foreach on)
As others have said, 'it depends'.
I find I use simple 'for' loops very rarely nowadays. If you start to use Linq you'll find you either don't need loops at all and when you do it's the 'foreach' loop that's called for.
Ultimately I agree with Colin Mackay - code for readability!
The do while loop has been forgotten, I think :)
Taken from here.
The C# while statement executes a statement or a block of statements until a specified expression evaluates to false . In some situation you may want to execute the loop at least one time and then check the condition. In this case you can use do..while loop.
The difference between do..while and while is that do..while evaluates its expression at the bottom of the loop instead of the top. Therefore, the statements within the do block are always executed at least once. From the following example you can understand how do..while loop function.
using System;
using System.Windows.Forms;
namespace WindowsApplication1
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void button1_Click(object sender, EventArgs e)
{
int count = 5;
do{
MessageBox.Show(" Loop Executed ");
count++;
}while (count <=4);
}
private void button2_Click(object sender, EventArgs e)
{
int count = 5;
while (count <=4){
MessageBox.Show(" Loop Executed ");
count++;
}
}
}
}
If you have a collection and you kow upfront you're going to systematically pass through all values, use foreach as it is usually easier to work with the "current" instance.
If some condition can make you stop iterating, you can use for or while. They are pretty similar, the big difference being that for takes control of when the current index or value is updated (in the for declaration) as in a while you decide when and where in the while block to update some values that are then checked in the while predicate.
If you are writing a parser class, lets say an XMLParser that will read XML nodes from a given source, you can use while loop as you don't know how many tags are there.
Also you can use while when you iterate if the variable is true or not.
You can use for loop if you want to have a bit more control over your iterations