.NET loop integrity 101 - c#

I've always been confused about this one. Consider the following loops:
int [] list = new int [] { 1, 2, 3 };
for (int i=0; i < list.Length; i++) { }
foreach (int i in list) { }
while (list.GetEnumerator().MoveNext()) { } // Yes, yes you wouldn't call GetEnumerator with the while. Actually never tried that.
The [list] above is hard-coded. If the list was changed externally while the loop was going through iterations, what would happen?
What if the [list] was a readonly property e.g. int List{get{return(new int [] {1,2,3});}}? Would this upset the loop? If not, would it create a new instance in each iteration?

Well:
The for loop checks against list.Length on each iteration; you don't actually access list within the loop, so the contents would be irrelevant.
The foreach loop only uses list to get the iterator; changing it to refer to a different list would make no difference, but if you modified the list itself structurally (e.g. by adding a value, if this were really a List<int> instead of an int[]), that would invalidate the iterator
Your third example as written would go on forever unless the list were cleared, given that it'll get a new iterator each time. If you want a more sensible explanation, post more sensible code
Fundamentally you need to understand the difference between the contents of an array vs changing which object a variable refers to - and then give us a very concrete situation to explain. In general though, a foreach loop only directly touches the source expression once, when it fetches the iterator - whereas a for loop has no magic in it, and how often it's accessed simply depends on the code - both the condition and the "step" part of the for loop are executed on each iteration, so if you refer to the variable in either of those parts, you'll see any changes...
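To make the difference concrete, here is a minimal sketch of how often each loop touches its source expression. The class LoopSourceDemo and the Reads counter are hypothetical names made up for illustration; the property mirrors the read-only property from the question.

```csharp
using System;

public static class LoopSourceDemo
{
    // Hypothetical counter: how many times the List getter has run.
    public static int Reads;

    // Mirrors the question's read-only property: a new array on every access.
    public static int[] List
    {
        get { Reads++; return new int[] { 1, 2, 3 }; }
    }

    // The for condition is evaluated before every iteration, so the getter
    // runs once per check: three passing checks plus the final failing one.
    public static int ReadsInFor()
    {
        Reads = 0;
        for (int i = 0; i < List.Length; i++) { }
        return Reads;
    }

    // foreach evaluates the source expression once, when it starts iterating.
    public static int ReadsInForeach()
    {
        Reads = 0;
        foreach (int i in List) { }
        return Reads;
    }
}
```

Under these assumptions ReadsInFor() returns 4 and ReadsInForeach() returns 1, so the property does create a new instance on every check of the for condition, but only one instance for the whole foreach.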

Related

C# ForEach Loop (string declaration in loop)

I have this string array of names.
string[] someArray = { "Messi", "Ronaldo", "Bale", "Neymar", "Pele" };
foreach (string item in someArray)
{
Console.WriteLine(item);
}
When I run in debug mode with breakpoint on foreach line, and using Step Into, string item is highlighted each time with each pass like it declares it again. I suppose it should highlight item only. Does this mean that with each time item is declared again?
Note: using VS 2012.
The answer here is subtle because the question is (accidentally) unclear. Let me clarify it by breaking it apart into multiple questions.
Is the loop variable logically redeclared every time I go through a foreach loop?
As the other answer correctly notes, in C# 5 and higher, yes. In C# 1 through 4, no.
What is the observable consequence of that change?
In C# 1 there is no way to tell the difference. (And the spec is actually unclear as to where the variable is declared.)
In C# 2 the team added closures in the form of anonymous methods. If there are multiple closures that capture a loop variable in C# 2, 3 and 4 they all capture the same variable, and they all see it mutate as the loop runs. This is almost never what you want.
In C# 5 we took the breaking change to say that a new loop variable is declared every time through the loop, and so each closure captures a different variable and therefore does not observe it changing.
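A small sketch of the observable difference, assuming a C# 5 or later compiler. The class and method names are hypothetical. Each method stores closures created inside a loop and invokes them only after the loop has finished.

```csharp
using System;
using System.Collections.Generic;

public static class ClosureCaptureDemo
{
    // Each closure captures the foreach variable. With C# 5 or later there is
    // a fresh variable per iteration, so every closure keeps its own value.
    public static List<int> CaptureInForeach()
    {
        var actions = new List<Func<int>>();
        foreach (int item in new[] { 1, 2, 3 })
        {
            actions.Add(() => item);
        }
        return InvokeAll(actions);
    }

    // The for loop was not changed: all closures share the single variable i,
    // which holds 4 by the time the loop has ended.
    public static List<int> CaptureInFor()
    {
        var actions = new List<Func<int>>();
        for (int i = 1; i <= 3; i++)
        {
            actions.Add(() => i);
        }
        return InvokeAll(actions);
    }

    // Runs the stored closures after the loops are done.
    private static List<int> InvokeAll(List<Func<int>> actions)
    {
        var results = new List<int>();
        foreach (Func<int> action in actions)
        {
            results.Add(action());
        }
        return results;
    }
}
```

With a C# 5 or later compiler, CaptureInForeach() yields 1, 2, 3 while CaptureInFor() yields 4, 4, 4. A C# 4 compiler would give 3, 3, 3 for the foreach version.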
Why does the debugger go back to the loop variable declaration every time I step through the loop?
To inform you that the loop variable is being mutated. If you have
IEnumerable<string> someCollection = whatever;
foreach(string item in someCollection)
{
Console.WriteLine(item);
}
That is logically equivalent to
IEnumerable<string> someCollection = whatever;
{
IEnumerator<string> enumerator = someCollection.GetEnumerator();
try
{
while( enumerator.MoveNext() )
{
string item; // In C# 4.0 and before, this is outside the while
item = enumerator.Current;
{
Console.WriteLine(item);
}
}
}
finally
{
if (enumerator != null)
((IDisposable)enumerator).Dispose();
}
}
If you were debugging around that expanded version of the code, you would see the debugger highlight the invocations of MoveNext and Current. Well, the debugger does the same thing in the compact form; when MoveNext is about to be called the debugger highlights the in, for lack of anything better to highlight. When Current is about to be called it highlights item. Again, what would be the better thing to highlight?
So the fact that the debugger highlights the variable has nothing to do with whether the variable is redeclared?
Correct. It highlights the variable when Current is about to be called, because that will mutate the variable. It mutates the variable regardless of whether it is a new variable, in C# 5, or the same variable as before, in C# 4.
I noticed that you changed my example to use a generic sequence rather than an array. Why did you do that?
Because if the C# compiler knows that the collection in the foreach is an array, it actually just generates a good old fashioned for loop rather than calling MoveNext and Current and all that. Doing so is typically faster and produces less collection pressure, so it's a win all around.
However, the C# team wants the debugging experience to feel the same regardless. So again, the code generator creates stepping points on in -- this time where the generated for's loop variable is mutated -- and again, on the code which mutates the foreach loop variable. That way the debugging experience is consistent.
It depends on which version of C# you are targeting. In C# 5, it's a new variable. Before that, the variable was reused.
See Eric Lippert's blog post here: http://blogs.msdn.com/b/ericlippert/archive/2009/11/12/closing-over-the-loop-variable-considered-harmful.aspx
From the post:
UPDATE: We are taking the breaking change. In C# 5, the loop variable
of a foreach will be logically inside the loop, and therefore closures
will close over a fresh copy of the variable each time. The "for" loop
will not be changed. We return you now to our original article.

When to use each type of loop?

I'm learning the basics of programming here (C#) but I think this question is generic in its nature.
What are some simple practical situations that lend themselves closer to a particular type of loop?
The while and for loops seem pretty similar and there are several SO questions addressing the differences between the two. How about foreach? From my basic understanding, it seems I ought to be able to do everything a foreach loop does within a for loop.
Whichever works best for code readability. In other words, use the one that fits the situation best.
while: When you have a condition that needs to be checked at the start of each loop. e.g. while(!file.EndOfFile) { }
for: When you have an index or counter you are incrementing on each loop. for (int i = 0; i<array.Length; i++) { }. Essentially, the thing you are looping over is an indexable collection, array, list, etc.
foreach: When you are looping over a collection of objects or other Enumerable. In this event you may not know (or care) the size of the collection, or the collection is not index based (e.g. a set of objects). Generally I find foreach loops to be the most readable when I'm not interested in the index of something or any other exit conditions.
Those are my general rules of thumb anyway.
1. foreach and for
A foreach loop works with an IEnumerator, whereas a for loop works with an index (in object myObject = myListOfObjects[i], i is the index).
There is a big difference between the two:
an index can access directly any object based on its position within a list.
an enumerator can only access the first element of a list, and then move to the next element (as described in the previous link from the msdn). It cannot access an element directly, just knowing the index of the element within a list.
So an enumerator may seem less powerful, but:
you don't always know the position of elements in a group, because not all groups are ordered/indexed.
you don't always know the number of elements in a list (think about a linked list).
even when it's ordered, the indexed access of a list may be based internally on an enumerator, which means that each time you're accessing an element by its position you may be actually enumerating all elements of the list up until the element you want.
indexes are not always numeric. Think about Dictionary.
So actually the big strength of the foreach loop and the underlying use of IEnumerator is that it applies to any type which implements IEnumerable (implementing IEnumerable just means that you provide a method that returns an enumerator). Lists, arrays, dictionaries, and all other collection types implement IEnumerable. And you can be sure that the enumerator they have is as good as it gets: you won't find a faster way to go through a list.
So, the for loop can generally be considered as a specialized foreach loop:
public void GoThrough(List<object> myList)
{
for (int i=0; i<myList.Count; i++)
{
MessageBox.Show(myList[i].ToString());
}
}
is perfectly equivalent to:
public void GoThrough(List<object> myList)
{
foreach (object item in myList)
{
MessageBox.Show(item.ToString());
}
}
I said generally because there is an obvious case when the for loop is necessary: when you need the index (i.e. the position in the list) of the object, for some reason (like displaying it). You will though eventually realize that this happens only in specific cases when you do good .NET programming, and that foreach should be your default candidate for loops over a group of elements.
Now to keep comparing the foreach loop, it is indeed just an eye-candy specific while loop:
public void GoThrough(IEnumerable myEnumerable)
{
foreach (object obj in myEnumerable)
{
MessageBox.Show(obj.ToString());
}
}
is essentially equivalent to (leaving aside the disposal of the enumerator that foreach also takes care of):
public void GoThrough(IEnumerable myEnumerable)
{
IEnumerator myEnumerator = myEnumerable.GetEnumerator();
while (myEnumerator.MoveNext())
{
MessageBox.Show(myEnumerator.Current.ToString());
}
}
The first writing is a lot simpler though.
2. while and do..while
The while (condition) {action} loop and the do {action} while (condition) loop differ only in that the first one tests the condition before applying the action, whereas the second one applies the action and then tests the condition. The do {..} while (..) loop is used quite marginally compared to the others, since it runs the action at least once even if the condition is initially not met (which can lead to trouble, since the action is generally dependent on the condition).
The while loop is more general than the for and foreach ones, which apply specifically to lists of objects. The while loop just has a condition to go on, which can be based on anything. For example:
string name = string.Empty;
while (name == string.Empty)
{
Console.WriteLine("Enter your name");
name = Console.ReadLine();
}
asks the user to input his name then press Enter, until he actually inputs something. Nothing to do with lists, as you can see.
3. Conclusion
When you are going through a list, you should use foreach unless you need the numeric index, in which case you should use for.
When it doesn't have anything to do with list, and it's just a procedural construction, you should use while(..) {..}.
Now to conclude with something less restrictive: your first goal with .NET should be to make your code readable/maintainable and make it run fast, in that order of priority. Anything that achieves that is good for you. Personally though, I think the foreach loop has the advantage that potentially, it's the most readable and the fastest.
Edit: there is another case where the for loop is useful: when you need indexing to go through a list in a special way, or if you need to modify the list while in the loop. For example, in this case, because we want to remove every null element from myList:
for (int i=myList.Count-1; i>=0; i--)
{
if (myList[i] == null) myList.RemoveAt(i);
}
You need the for loop here because myList cannot be modified from within a foreach loop, and we need to go through it backwards because if you remove the element at the position i, the position of all elements with an index >i will change.
But the need for these special constructions has been reduced since lambda expressions and LINQ arrived. The last example, for instance, can be written with List<T>.RemoveAll and a lambda:
myList.RemoveAll(obj => obj == null);
LINQ is a second step though, learn the loops first.
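To see why the direction of the loop matters, here is a hedged sketch comparing a forward loop, the backward loop from above, and RemoveAll on the same input. The class name and the sample list are assumptions made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class RemoveNullsDemo
{
    // Forward loop: after RemoveAt(i) the next element slides into slot i,
    // and i++ then steps over it, so one of two adjacent nulls survives.
    public static int CountAfterForwardLoop(List<object> list)
    {
        for (int i = 0; i < list.Count; i++)
        {
            if (list[i] == null) list.RemoveAt(i);
        }
        return list.Count;
    }

    // Backward loop: removing at i only shifts elements that were already visited.
    public static int CountAfterBackwardLoop(List<object> list)
    {
        for (int i = list.Count - 1; i >= 0; i--)
        {
            if (list[i] == null) list.RemoveAt(i);
        }
        return list.Count;
    }

    // List<T>.RemoveAll takes a predicate and removes every match in one call.
    public static int CountAfterRemoveAll(List<object> list)
    {
        list.RemoveAll(obj => obj == null);
        return list.Count;
    }
}
```

For a list holding "a", null, null, "b", the forward loop leaves 3 elements (one null is skipped), while the backward loop and RemoveAll both leave 2.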
When you know how many iterations there will be, use for.
When you don't know, use while; when you don't know and need to execute the code at least once, use do.
When you iterate through a collection and don't need the index, use foreach.
(Also, you cannot use collection[i] on everything that you can use foreach on.)
As others have said, 'it depends'.
I find I use simple 'for' loops very rarely nowadays. If you start to use Linq you'll find you either don't need loops at all or, when you do, it's the 'foreach' loop that's called for.
Ultimately I agree with Colin Mackay - code for readability!
The do while loop has been forgotten, I think :)
Taken from here.
The C# while statement executes a statement or a block of statements until a specified expression evaluates to false. In some situations you may want to execute the loop at least one time and then check the condition. In this case you can use a do..while loop.
The difference between do..while and while is that do..while evaluates its expression at the bottom of the loop instead of the top. Therefore, the statements within the do block are always executed at least once. The following example shows how the do..while loop functions.
using System;
using System.Windows.Forms;
namespace WindowsApplication1
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void button1_Click(object sender, EventArgs e)
{
int count = 5;
do{
MessageBox.Show(" Loop Executed ");
count++;
}while (count <=4);
}
private void button2_Click(object sender, EventArgs e)
{
int count = 5;
while (count <=4){
MessageBox.Show(" Loop Executed ");
count++;
}
}
}
}
If you have a collection and you know upfront you're going to systematically pass through all values, use foreach, as it is usually easier to work with the "current" instance.
If some condition can make you stop iterating, you can use for or while. They are pretty similar, the big difference being that for takes control of when the current index or value is updated (in the for declaration), whereas in a while loop you decide when and where in the block to update the values that are then checked in the while predicate.
If you are writing a parser class, let's say an XMLParser that reads XML nodes from a given source, you can use a while loop, as you don't know how many tags there are.
You can also use while when you want to keep iterating for as long as some variable is true.
You can use for loop if you want to have a bit more control over your iterations

Why can't I modify the loop variable in a foreach?

Why is a foreach loop a read only loop? What reasons are there for this?
I'm not sure exactly what you mean by a "readonly loop" but I'm guessing that you want to know why this doesn't compile:
int[] ints = { 1, 2, 3 };
foreach (int x in ints)
{
x = 4;
}
The above code will give the following compile error:
Cannot assign to 'x' because it is a 'foreach iteration variable'
Why is this disallowed? Trying to assign to it probably wouldn't do what you want: it wouldn't modify the contents of the original collection. This is because the variable x is not a reference to the elements in the list; it is a copy. To avoid people writing buggy code, the compiler disallows this.
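A minimal sketch of the copy semantics described above. The class and method names are hypothetical. Since assigning to x itself does not compile, the foreach version assigns to a local copy instead, which shows what such an assignment would amount to.

```csharp
using System;

public static class IterationVariableDemo
{
    // Writing through the index changes the array itself.
    public static int[] SetAllWithFor(int[] values, int newValue)
    {
        for (int i = 0; i < values.Length; i++)
        {
            values[i] = newValue;
        }
        return values;
    }

    // Overwriting a copy of the element changes only the local; the array
    // keeps its original contents. Returns the sum of the overwritten copies.
    public static int SumOfOverwrittenCopies(int[] values, int newValue)
    {
        int sum = 0;
        foreach (int x in values)
        {
            int copy = x;     // copy of the element's value
            copy = newValue;  // affects only the local variable
            sum += copy;
        }
        return sum;
    }
}
```

For an array holding 1, 2, 3, SumOfOverwrittenCopies(array, 4) returns 12 and leaves the array as 1, 2, 3, whereas SetAllWithFor(array, 4) turns it into 4, 4, 4.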
I would assume it's how the iterator travels through the list.
Say you have a sorted list:
Alaska
Nebraska
Ohio
In the middle of
foreach(var s in States)
{
}
You do a States.Add("Missouri")
How do you handle that? Do you then jump to Missouri even if you're already past that index?
If, by this, you mean:
Why shouldn't I modify the collection that's being foreach'd over?
There's no surety that the items that you're getting come out in a given order, and that adding an item, or removing an item won't cause the order of items in the collection to change, or even the Enumerator to become invalid.
Imagine if you ran the following code:
var items = GetListOfTOfSomething(); // Returns 10 items
int i = 0;
foreach(var item in items)
{
i++;
if (i == 5)
{
items.Remove(item);
}
}
As soon as you hit the loop where i is 6 (i.e. after the item is removed) anything could happen. The Enumerator might have been invalidated due to you removing an item, everything might have "shuffled up by one" in the underlying collection causing an item to take the place of the removed one, meaning you "skip" one.
If you meant "why can't I change the value that is provided on each iteration" then, if the collection you're working with contains value types, any changes you make won't be preserved as it's a value you're working with, rather than a reference.
The foreach command uses the IEnumerable interface to loop through the collection. The interface only defines methods for stepping through a collection and getting the current item; there are no methods for updating the collection.
As the interface only defines the minimal methods required to read the collection in one direction, the interface can be implemented by a wide range of collections.
As you only access a single item at a time, the entire collection doesn't have to exist at the same time. This is for example used by LINQ expressions, where it creates the result on the fly as you read it, instead of first creating the entire result and then let you loop through it.
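The on-the-fly behaviour can be sketched with an iterator method. The names Squares, Produced and FirstSquareAbove are hypothetical; the Produced list exists only to record which values were actually generated.

```csharp
using System;
using System.Collections.Generic;

public static class LazySequenceDemo
{
    // Hypothetical log of the inputs that have been processed so far.
    public static readonly List<int> Produced = new List<int>();

    // An iterator method: each value is computed only when the consumer
    // asks for the next item, not when Squares is called.
    public static IEnumerable<int> Squares(int count)
    {
        for (int i = 1; i <= count; i++)
        {
            Produced.Add(i);
            yield return i * i;
        }
    }

    // Stops at the first match; the remaining values are never produced.
    public static int FirstSquareAbove(int threshold, int count)
    {
        Produced.Clear();
        foreach (int square in Squares(count))
        {
            if (square > threshold) return square;
        }
        return -1;
    }
}
```

FirstSquareAbove(10, 1000) returns 16, and afterwards Produced holds only 1, 2, 3, 4: the other 996 values were never created.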
Not sure what you mean with read-only but I'm guessing that understanding what the foreach loop is under the hood will help. It's syntactic sugar and could also be written something like this:
IEnumerator<T> enumerator = list.GetEnumerator();
while(enumerator.MoveNext())
{
T element = enumerator.Current;
//body goes here
}
If you change the collection (list), it becomes hard or even impossible to figure out how to continue the iteration.
Assigning to element (in the foreach version) could be viewed as either trying to assign to enumerator.Current which is read only or trying to change the value of the local holding a ref to enumerator.Current in which case you might as well introduce a local yourself because it no longer has anything to do with the enumerated list anymore.
foreach works with everything implementing the IEnumerable interface. In order to avoid synchronization issues, the enumerable shall never be modified while iterating on it.
The problems arise if you add or remove items (for example, from another thread) while iterating: depending on where you are, you might miss an item or apply your code to an extra item. The enumerators of built-in collections such as List<T> detect this in many cases, though not reliably when another thread is involved, and throw an exception:
System.InvalidOperationException was unhandled
Message="Collection was modified; enumeration operation may not execute."
foreach tries to get next item on each iteration which can cause trouble if you are modifying it from another thread at the same time.
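The exception does not need a second thread; a single-threaded sketch is enough to trigger it with List<T>. The class and method names are hypothetical, and the snapshot approach shown second is just one possible workaround, not the only one.

```csharp
using System;
using System.Collections.Generic;

public static class ModifyDuringForeachDemo
{
    // Returns true if removing an item inside foreach made the enumerator throw.
    public static bool RemoveInsideForeachThrows()
    {
        var items = new List<int> { 1, 2, 3, 4, 5 };
        try
        {
            foreach (int item in items)
            {
                if (item == 2) items.Remove(item); // invalidates the enumerator
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // Iterating over a snapshot (ToArray) leaves the original list free to change.
    public static List<int> RemoveEvensViaSnapshot(List<int> items)
    {
        foreach (int item in items.ToArray())
        {
            if (item % 2 == 0) items.Remove(item);
        }
        return items;
    }
}
```

RemoveInsideForeachThrows() returns true, and RemoveEvensViaSnapshot applied to 1, 2, 3, 4, 5 leaves 1, 3, 5.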

Using foreach (...) syntax while also incrementing an index variable inside the loop

When looking at C# code, I often see patterns like this:
DataType[] items = GetSomeItems();
OtherDataType[] itemProps = new OtherDataType[items.Length];
int i = 0;
foreach (DataType item in items)
{
// Do some stuff with item, then finally
itemProps[i] = item.Prop;
i++;
}
The foreach loop iterates over the objects in items, but also keeps a counter (i) for iterating over itemProps as well. I personally don't like this extra i hanging around, and instead would probably do something like:
DataType[] items = GetSomeItems();
OtherDataType[] itemProps = new OtherDataType[items.Length];
for (int i = 0; i < items.Length; i++)
{
// Do some stuff with items[i], then finally
itemProps[i] = items[i].Prop;
}
Is there perhaps some benefit to the first approach I'm not aware of? Is this a result of everybody trying to use that fancy foreach (...) syntax? I'm interested in your opinions on this.
If you are using C# 3.0, this will be better:
OtherDataType[] itemProps = items.Select(i=>i.Prop).ToArray();
With i declared outside the loop, it would be available after the loop completes. If you wanted to count the number of items and the collection didn't provide a .Count or .UBound property then this could be useful.
Like you I would normally use the second method, looks much cleaner to me.
In this case, I don't think so. Sometimes, though, the collection doesn't implement this[int index] but it does implement GetEnumerator(). In the latter case, you don't have much choice.
Some data structures are not well suited for random access but can be iterated over very fast (trees, linked lists, etc.). So if you need to iterate over one of these but need a count for some reason, you're doomed to go the ugly way...
Semantically they may be equivalent, but in fact using foreach over an enumerator gives the compiler more scope to optimise.
I don't remember all the arguments off the top of my head, but they are well covered in Effective C#, which is recommended reading.
foreach (DataType item in items)
This foreach loop makes it crystal clear that you're iterating over every DataType item in, well yes, items. Maybe it makes the code a little longer, but it's not "bad" code. For the other for loop, you need to read what is inside the parentheses to get an idea of what the loop is used for.
The problem with this example lies in the fact that you're iterating over two different arrays at the same time, which we don't do that often, so we are stuck between two strategies: either we "hack a bit" the fancy foreach, as you call it, or we go back to the old, not-so-loved for(int i = 0; i ...). (There are other ways than those two, of course.)
So, I think it's the Vim vs Emacs thing coming back in your question with the for vs foreach loop :) People who like for() will say this foreach is useless, might cause performance issues and is just big. People who prefer foreach will say something like: we don't care if there are two extra lines, as long as we can read the code and maintain it easily.
Finally, the i is outside the loop's scope in the first example and inside it in the second. Is there a reason for that? Because if you use the i outside of your foreach, I would have named it differently. And, in my opinion, the foreach way is preferable because you see immediately what is happening. You also don't have to think about whether it's < or <=. You know immediately that you are iterating over the whole list. However, sadly, people will forget about the i++ at the end :D So, I say Vim!
Let's not forget that some collections do not implement a direct access operator[] and that you have to iterate using the IEnumerable interface, which is most easily accessed with foreach().
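For a collection without an indexer, one way to get a position without a hand-maintained counter is the LINQ Select overload that passes the element's index to the selector. This is only a sketch: the class name, the Label method and the use of LinkedList<string> are assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexedEnumerationDemo
{
    // LinkedList<T> has no indexer, so names[i] is not available here.
    public static string[] Label(LinkedList<string> names)
    {
        var labels = new List<string>();
        // Select((element, index) => ...) supplies the zero-based position.
        foreach (var entry in names.Select((name, index) => new { name, index }))
        {
            labels.Add(entry.index + ": " + entry.name);
        }
        return labels.ToArray();
    }
}
```

Label applied to a linked list holding "a" and "b" gives "0: a" and "1: b".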

DataTable Loop Performance Comparison

Which of the following has the best performance?
I have seen method two implemented in JavaScript with huge performance gains, however, I was unable to measure any gain in C# and was wondering if the compiler already does method 2 even when written like method 1.
The theory behind method 2 is that the code doesn't have to access DataTable.Rows.Count on every iteration; it can simply access the int c.
Method 1
for (int i = 0; i < DataTable.Rows.Count; i++) {
// Do Something
}
Method 2
for (int i = 0, c = DataTable.Rows.Count; i < c; i++) {
// Do Something
}
No, it can't do that since there is no way to express constant over time for a value.
If the compiler should be able to do that, there would have to be a guarantee from the code returning the value that the value is constant, and for the duration of the loop won't change.
But, in this case, you're free to add new rows to the data table as part of your loop, and thus it's up to you to make that guarantee, in the way you have done it.
So in short, the compiler will not do that optimization if the end-index is anything other than a variable.
In the case of a variable, where the compiler can just look at the loop-code and see that this particular variable is not changed, it might do that and load the value into a register before starting the loop, but any performance gain from this would most likely be negligible, unless your loop body is empty.
Conclusion: If you know, or are willing to accept, that the end loop index is constant for the duration of the loop, place it into a variable.
Edit: Re-read your post, and yes, you might see negligible performance gains for your two cases as well, because the JITter optimizes the code. The JITter might optimize your end-index read into a direct access to the variable inside the data table that contains the row count, and a memory read isn't all that expensive anyway. If, on the other hand, reading that property was a very expensive operation, you'd see a more noticeable difference.
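The two methods are not only a performance question; they behave differently when rows are added inside the loop. A minimal sketch of that, with a hypothetical class name and a made-up three-row table with a single int column:

```csharp
using System;
using System.Data;

public static class RowCountDemo
{
    // Builds a small table with one int column and the given number of rows.
    private static DataTable MakeTable(int rows)
    {
        var table = new DataTable();
        table.Columns.Add("n", typeof(int));
        for (int i = 0; i < rows; i++)
        {
            table.Rows.Add(i);
        }
        return table;
    }

    // Method 1: Rows.Count is read on every check, so the row added during
    // the first iteration is visited as well.
    public static int IterationsReadingCountEachTime()
    {
        DataTable table = MakeTable(3);
        int iterations = 0;
        for (int i = 0; i < table.Rows.Count; i++)
        {
            if (i == 0) table.Rows.Add(99);
            iterations++;
        }
        return iterations;
    }

    // Method 2: the count is captured once in c, so the added row is not visited.
    public static int IterationsWithCachedCount()
    {
        DataTable table = MakeTable(3);
        int iterations = 0;
        for (int i = 0, c = table.Rows.Count; i < c; i++)
        {
            if (i == 0) table.Rows.Add(99);
            iterations++;
        }
        return iterations;
    }
}
```

Here IterationsReadingCountEachTime() returns 4 and IterationsWithCachedCount() returns 3, which is why the compiler cannot safely rewrite method 1 into method 2 on its own.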
