Is it bad practice to use a break statement inside a for loop?
Say I am searching for a value in an array. I compare inside a for loop, and when the value is found, break; to exit the for loop.
Is this bad practice? I have seen the alternative: define a variable vFound, set it to true when the value is found, and check vFound in the for statement's condition. But is it necessary to create a new variable just for this purpose?
I am asking in the context of a normal C or C++ for loop.
P.S.: The MISRA coding guidelines advise against using break.
No, break is the correct solution.
Adding a boolean variable makes the code harder to read and adds a potential source of errors.
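For illustration, a minimal C# sketch of the two styles (the values array and target are hypothetical; the same shape applies directly to C or C++):

// With break: exit as soon as the value is found.
int[] values = { 3, 1, 4, 1, 5, 9 };
int target = 4;
int foundIndex = -1;
for (int i = 0; i < values.Length; i++)
{
    if (values[i] == target)
    {
        foundIndex = i;
        break;
    }
}

// With a vFound flag: an extra variable, and the loop condition must test it too.
bool vFound = false;
for (int i = 0; i < values.Length && !vFound; i++)
{
    if (values[i] == target)
    {
        foundIndex = i;
        vFound = true;
    }
}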
Lots of answers here, but I haven't seen this mentioned yet:
Most of the "dangers" associated with using break or continue in a for loop are negated if you write tidy, easily-readable loops. If the body of your loop spans several screen lengths and has multiple nested sub-blocks, yes, you could easily forget that some code won't be executed after the break. If, however, the loop is short and to the point, the purpose of the break statement should be obvious.
If a loop is getting too big, use one or more well-named function calls within the loop instead. The only real reason to avoid doing so is for processing bottlenecks.
You can find all sorts of professional code with 'break' statements in it. It makes perfect sense to use break whenever necessary. In your case this option is better than creating a separate variable just for the purpose of getting out of the loop.
Using break as well as continue in a for loop is perfectly fine.
It simplifies the code and improves its readability.
Far from being bad practice, Python (and other languages?) extended the for loop structure so that part of it is executed only if the loop doesn't break.
for n in range(5):
    for m in range(3):
        if m >= n:
            print('stop!')
            break
        print(m, end=' ')
    else:
        print('finished.')
Output:
stop!
0 stop!
0 1 stop!
0 1 2 finished.
0 1 2 finished.
Equivalent code without break and that handy else:
for n in range(5):
    aborted = False
    for m in range(3):
        if not aborted:
            if m >= n:
                print('stop!')
                aborted = True
            else:
                print(m, end=' ')
    if not aborted:
        print('finished.')
General rule: if following a rule requires you to do something more awkward and harder to read than breaking the rule, then break the rule.
In the case of looping until you find something, you run into the problem of distinguishing found versus not found when you get out. That is:
for (int x = 0; x < fooCount; ++x)
{
    Foo foo = getFooSomehow(x);
    if (foo.bar == 42)
        break;
}
// So when we get here, did we find one, or did we fall out the bottom?
So okay, you can set a flag, or initialize a "found" value to null. But then you're carrying extra state around just to report the result. That's why, in general, I prefer to push my searches into functions:
Foo findFoo(int wantBar)
{
    for (int x = 0; x < fooCount; ++x)
    {
        Foo foo = getFooSomehow(x);
        if (foo.bar == wantBar)
            return foo;
    }
    // Not found
    return null;
}
This also helps to unclutter the code. In the main line, "find" becomes a single statement, and when the conditions are complex, they're only written once.
There is nothing inherently wrong with using a break statement, but nested loops can get confusing. To improve readability, many languages (at least Java does) support breaking to labels:
int[] iArray = new int[]{0,1,2,3,4,5,6,7,8,9};
int[] jArray = new int[]{0,1,2,3,4,5,6,7,8,9};

// label for i loop
iLoop: for (int i = 0; i < iArray.length; i++) {
    // label for j loop
    jLoop: for (int j = 0; j < jArray.length; j++) {
        if (iArray[i] < jArray[j]) {
            // break i and j loops
            break iLoop;
        } else if (iArray[i] > jArray[j]) {
            // breaks only j loop
            break jLoop;
        } else {
            // unclear which loop is ending
            // (breaks only the j loop)
            break;
        }
    }
}
I will say that break (and return) statements often increase cyclomatic complexity, which makes it harder to prove that code is doing the correct thing in all cases.
If you're considering using a break while iterating over a sequence for some particular item, you might want to reconsider the data structure used to hold your data. Using something like a Set or Map may provide better results.
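For example, a minimal C# sketch, assuming the values are hashable:

using System.Collections.Generic;

// A HashSet membership test replaces the scan-and-break entirely.
var values = new HashSet<int> { 3, 1, 4, 1, 5, 9 };
bool found = values.Contains(4); // average O(1); no loop, no break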
break is a completely acceptable statement to use (so is continue, btw). It's all about code readability -- as long as you don't have overcomplicated loops and such, it's fine.
It's not like they're in the same league as goto. :)
It depends on the language. While you can possibly check a boolean variable here:
for (int i = 0; i < 100 && stayInLoop; i++) { ... }
it is not possible to do so when iterating over an array:
for element in bigList: ...
Anyway, break would make both versions more readable.
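For example, a C# sketch of the foreach case (bigList and IsWhatWeWant are hypothetical):

// With foreach there is no loop-condition slot to hang a flag on,
// so break is the natural early exit.
foreach (var element in bigList)
{
    if (IsWhatWeWant(element))
        break;
}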
I agree with others who recommend using break. The obvious consequential question is why would anyone recommend otherwise? Well... when you use break, you skip the rest of the code in the block, and the remaining iterations. Sometimes this causes bugs, for example:
a resource acquired at the top of the block may be released at the bottom (this is true even for blocks inside for loops), but that release step may be accidentally skipped when a "premature" exit is caused by a break statement (in "modern" C++, "RAII" is used to handle this in a reliable and exception-safe way: basically, object destructors free resources reliably no matter how a scope is exited)
someone may change the conditional test in the for statement without noticing that there are other delocalised exit conditions
ndim's answer observes that some people may avoid breaks to maintain a relatively consistent loop run-time, but you were comparing break against use of a boolean early-exit control variable where that doesn't hold
Every now and then people observing such bugs realise they can be prevented/mitigated by this "no breaks" rule... indeed, there's a whole related strategy for "safer" programming called "structured programming", where each function is supposed to have a single entry and exit point too (i.e. no goto, no early return). It may eliminate some bugs, but it doubtless introduces others. Why do they do it?
they have a development framework that encourages a particular style of programming/code, and they have statistical evidence that this produces a net benefit in that limited framework, or
they've been influenced by programming guidelines or experience within such a framework, or
they're just dictatorial idiots, or
any of the above + historical inertia (relevant in that the justifications are more applicable to C than modern C++).
In your example you do not know the number of iterations of the for loop in advance. Why not use a while loop instead, which allows the number of iterations to be indeterminate from the outset?
It is hence not generally necessary to use a break statement, as the loop can be better stated as a while loop.
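For example, a minimal C# sketch of that restatement (values and target are hypothetical):

// The exit condition lives in one place: the while condition.
int i = 0;
while (i < values.Length && values[i] != target)
{
    i++;
}
bool found = i < values.Length; // true if the loop stopped on a match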
I did some analysis on the codebase I'm currently working on (40,000 lines of JavaScript).
I found only 22 break statements. Of those:
19 were used inside switch statements (we only have 3 switch statements in total!).
2 were used inside for loops - code that I immediately classified as due for refactoring into separate functions with a return statement.
As for the final break, which was inside a while loop... I ran git blame to see who wrote this crap!
So according to my statistics: If break is used outside of switch, it is a code smell.
I also searched for continue statements. Found none.
It's perfectly valid to use break - as others have pointed out, it's nowhere in the same league as goto.
Although you might want to use the vFound variable when you want to check outside the loop whether the value was found in the array. Also from a maintainability point of view, having a common flag signalling the exit criteria might be useful.
I don't see any reason why it would be bad practice, PROVIDED that you want to completely STOP processing at that point.
In the embedded world, there is a lot of code out there that uses the following construct:
while (1)
{
    if (RCIF)
        gx();

    if (command_received == command_we_are_waiting_on)
        break;
    else if ((num_attempts > MAX_ATTEMPTS) || (TickGet() - BaseTick > MAX_TIMEOUT))
        return ERROR;

    num_attempts++;
}

if (call_some_bool_returning_function())
    return TRUE;
else
    return FALSE;
This is a very generic example, lots of things are happening behind the curtain, interrupts in particular. Don't use this as boilerplate code, I'm just trying to illustrate an example.
My personal opinion is that there is nothing wrong with writing a loop in this manner as long as appropriate care is taken to prevent remaining in the loop indefinitely.
Depends on your use case. There are applications where the runtime of a for loop needs to be constant (e.g. to satisfy some timing constraints, or to hide your data internals from timing based attacks).
In those cases it will even make sense to set a flag and only check the flag value AFTER all the for loop iterations have actually run. Of course, all the for loop iterations need to run code that still takes about the same time.
If you do not care about the run time... use break; and continue; to make the code easier to read.
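A minimal C# sketch of that flag-checked-afterwards idea (values and target are hypothetical):

// Constant-time scan: every iteration does the same work, and the flag
// is only examined after the loop completes.
bool found = false;
for (int i = 0; i < values.Length; i++)
{
    found |= (values[i] == target); // no early exit, no data-dependent branch
}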
Under the MISRA '98 rules, which are used at my company for C development, the break statement shall not be used...
Edit: break is allowed in MISRA '04.
Of course, break; is the solution to stop a for loop or foreach loop. I used it in PHP in foreach and for loops and found it working.
I think it can make sense to have your checks at the top of your for loop like so
for (int i = 0; i < myCollection.Length && myCollection[i].SomeValue != "Break Condition"; i++)
{
    // loop body
}
or if you need to process the row first
for (int i = 0; i < myCollection.Length && (i == 0 ? true : myCollection[i-1].SomeValue != "Break Condition"); i++)
{
    // loop body
}
This way you can have a singular body function without breaks.
for (int i = 0; i < myCollection.Length && (i == 0 ? true : myCollection[i-1].SomeValue != "Break Condition"); i++)
{
    PerformLogic(myCollection[i]);
}
The break condition can also be moved into its own function:
for (int i = 0; ShouldContinueLooping(i, myCollection); i++)
{
    PerformLogic(myCollection[i]);
}
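ShouldContinueLooping isn't defined above; one plausible sketch (MyItem is a stand-in for the collection's element type):

static bool ShouldContinueLooping(int i, MyItem[] myCollection)
{
    // Keep looping while we're in bounds and the previous row
    // didn't signal the break condition.
    return i < myCollection.Length
        && (i == 0 || myCollection[i - 1].SomeValue != "Break Condition");
}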
From this post (and not only this one) we learned that the ++ operator cannot be applied to expressions that return a value.
And it's really obvious that 5++ is better written as 5 + 1. I just want to summarize the whole thing around the increment/decrement operator, so let's go through these snippets of code that could be helpful to somebody stuck on ++ for the first time.
// Literal
int x = 0++; // Error

// Constant
const int Y = 1;
double f = Y++; // Error. Makes sense; constants are not variables.

// Method call
int z = AddFoo()++; // Error
Summary: ++ works for variables, properties (through syntactic sugar) and indexers (the same).
Now the interesting part: literal expressions are optimized by the C# compiler (csc), so when we write, say,
int g = 5 + 1; // This is compiled to 6 in IL as one could expect.
IL_0001: ldc.i4.6 // Pushes the integer value of 6 onto the evaluation stack as an int32.
5++ doesn't have to mean 5 becomes 6; it could be shorthand for 5 + 1, just as x++ is shorthand for x = x + 1.
What's the real reason behind this restriction?
int p = Foo()++; // ? Yes, you increase the return value of Foo() by 1; what's wrong with that?
Examples of code that can lead to logical issues are appreciated.
One real-life example could be performing one more iteration than the count:
for (int i = 0; i < GetCount()++; i++) { }
Maybe the lack of use cases prompts compiler teams to avoid similar features?
I don't insist this is a feature we're missing; I just want to understand the dark side of this for compiler writers, perhaps, though I'm not one. I know C++ allows this when the method returns a reference, but I'm not a C++ guy either (very poor knowledge); I just want to get the real gist of the restriction.
Like, is it just that the C# designers opted to restrict ++ to variable expressions, or are there definite cases that lead to unpredictable results?
In order for a feature to be worth supporting, it really needs to be useful. The code you've presented is in every case less readable than the alternative, which is just to use the normal binary addition operator. For example:
for (int i = 0; i < GetCount() + 1; i++) { }
I'm all in favour of the language team preventing you from writing unreadable code when in every case where you could do it, there's a simpler alternative.
Well, before using these operators, you should try to read up on how they do what they do. In particular, you should understand the difference between postfix and prefix, which helps figure out what is and isn't allowed.
The ++ and -- operators modify their operand, which means that the operand must be modifiable. If you can assign a value to the expression in question, then it is modifiable, and in C# that usually means it is a variable.
Taking a look at what these operators actually do: the postfix operator increments after your line of code executes, while the prefix operator needs access to the value before the method has even been called. The way I read the syntax, ++lvalue (or ++variable) converts to the memory operations [read, write, read], and lvalue++ to [read, read, write], though many compilers probably optimize away the secondary reads.
So looking at foo()++, the value is going to be plopped dead in the center of the executing code, which means the compiler would need to save the value somewhere longer-term for operations to be performed on it after the line of code has finished executing. That is no doubt the exact reason C++ does not support this syntax either.
If you were returning a reference, the compiler wouldn't have any trouble with the postfix. Of course, in C#, value types (i.e. int, char, float, etc.) cannot be returned by reference, as they are value types.
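Worth noting: C# 7 later added ref returns, which make exactly this work. A call that returns a ref is classified as a variable, so the postfix ++ compiles. A minimal sketch, with a hypothetical Counter type:

class Counter
{
    private int[] slots = new int[4];

    // Returns a reference to the slot, not a copy of its value.
    public ref int Slot(int i) => ref slots[i];
}

// var c = new Counter();
// c.Slot(0)++; // legal in C# 7+: Slot(0) denotes a storage location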
I have a list of Equity Analytics (stock) objects and am doing a calculation for daily returns.
I was thinking there must be a pairwise solution to do this:
for (int i = 0; i < sdata.Count; i++) {
    sdata[i].DailyReturn = (i > 0) ? (sdata[i-1].AdjClose / sdata[i].AdjClose) - 1
                                   : 0.0;
}
LINQ stands for: "Language-Integrated Query".
LINQ should not (and almost cannot) be used for assignments: LINQ doesn't change the given IEnumerable; it creates a new one.
As suggested in a comment, there is a way to create a new IEnumerable with LINQ, but it will be slower and a lot less readable.
Though LINQ is nice, an important thing is to know when not to use it.
Just use the good old for loop.
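For illustration, such a LINQ version might look like this (a sketch, assuming the same sdata list and that AdjClose and DailyReturn are doubles); it builds a new sequence instead of assigning in place:

using System.Linq;

var dailyReturns = sdata
    .Select((s, i) => i > 0 ? (sdata[i - 1].AdjClose / s.AdjClose) - 1 : 0.0)
    .ToList();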
I'm new to LINQ; I started using it because of Stack Overflow, so I tried to play with your question. I know this may attract downvotes, but I tried it, and it does what you want as long as there is at least one element in the list.
sdata[0].DailyReturn = 0.0;
sdata.GetRange(1, sdata.Count - 1)
     .ForEach(c => c.DailyReturn = (sdata[sdata.IndexOf(c) - 1].AdjClose / c.AdjClose) - 1);
But I must say that avoiding for loops isn't the best practice. From my point of view, LINQ should be used where convenient, not everywhere. Good old loops are sometimes easier to maintain.
One of Steve McConnell's checklist items is that you should not monkey with the loop index (Chapter 16, page 25, Loop Indexes, PDF format).
This makes intuitive sense and is a practice I've always followed except maybe as I learned how to program back in the day.
In a recent code review I found this awkward loop and immediately flagged it as suspect.
for (int i = 0; i < this.MyControl.TabPages.Count; i++)
{
    this.MyControl.TabPages.Remove(this.MyControl.TabPages[i]);
    i--;
}
It's almost amusing since it manages to work by keeping the index at zero until all TabPages are removed.
This loop could have been written as
while (MyControl.TabPages.Count > 0)
    MyControl.TabPages.RemoveAt(0);
And since the control was in fact written at about the same time as the loop it could even have been written as
MyControl.TabPages.Clear();
I've since been challenged about the code-review issue and found that my articulation of why it is bad practice was not as strong as I'd have liked. I said it was harder to understand the flow of the loop and therefore harder to maintain and debug and ultimately more expensive over the lifetime of the code.
Is there a better articulation of why this is bad practice?
I think your articulation is great. Maybe it can be worded like so:
Since the logic can be expressed much more clearly, it should be.
Well, this adds confusion for little purpose - you could just as easily write:
while (MyControl.TabPages.Count > 0)
{
    MyControl.TabPages.Remove(MyControl.TabPages[0]);
}
or (simpler)
while (MyControl.TabPages.Count > 0)
{
    MyControl.TabPages.RemoveAt(0);
}
or (simplest)
MyControl.TabPages.Clear();
In all of the above, I don't have to squint and think about any edge-cases; it is pretty clear what happens when. If you are modifying the loop index, you can quickly make it quite hard to understand at a glance.
It's all about expectation.
When one uses a loop counter, you expect it to be incremented (or decremented) on each iteration of the loop by the same amount.
If you mess (or monkey, if you like) with the loop counter, your loop does not behave as expected. This means it is harder to understand, and it increases the chance that your code will be misinterpreted, which introduces bugs.
Or, to (mis)quote a wise but fictional character:
complexity leads to misunderstanding;
misunderstanding leads to bugs;
bugs lead to the dark side.
I agree with your challenge. If they want to keep a for loop, the code:
for (int i = 0; i < this.MyControl.TabPages.Count; i++) {
    this.MyControl.TabPages.Remove(this.MyControl.TabPages[i]);
    i--;
}
reduces as follows:
for (int i = 0; i < this.MyControl.TabPages.Count; ) {
    this.MyControl.TabPages.Remove(this.MyControl.TabPages[i]);
}
and then to:
for ( ; 0 < this.MyControl.TabPages.Count; ) {
    this.MyControl.TabPages.Remove(this.MyControl.TabPages[0]);
}
But a while loop or a Clear() method, if that exists, are clearly preferable.
I think you could build a stronger argument by invoking Knuth's concepts of literate programming, that programs should not be written for computers, but to communicate concepts to other programmers, thus the simpler loop:
while (this.MyControl.TabPages.Count > 0)
{
    this.MyControl.TabPages.Remove(this.MyControl.TabPages[0]);
}
more clearly illustrates the intent: remove the first tab page until there are none left. I think most people would grok that much more quickly than the original example.
This might be clearer:
while (this.MyControl.TabPages.Count > 0)
{
    this.MyControl.TabPages.Remove(this.MyControl.TabPages[0]);
}
One argument that could be used is that it is much more difficult to debug such code, where the index is being changed twice.
The original code is highly redundant, bending the action of the for loop to what is necessary. The increment is unnecessary and merely balanced by the decrement. (Those should be pre-increments, not post-increments, as well, because conceptually the post-increment is wrong here.) The comparison with the tab page count is semi-redundant, since it's a hackish way of checking that the container is not yet empty.
In short, it's unnecessary cleverness; it adds rather than removes redundancy. Since the code can be both obviously simpler and obviously shorter, it's wrong.
The only reason to bother with an index at all would be if one were selectively erasing things. Even in that case, I would think it preferable to say:

i = 0;
while (i < MyControl.TabPages.Count)
    if (wantToDelete(MyControl.TabPages[i]))
        MyControl.TabPages.RemoveAt(i);
    else
        i++;

rather than jinxing the loop index after each removal. Or, better yet, have the index count downward so that when an item is removed it won't affect the index of future items needing removal (see the sketch below). If many items are deleted, this may also help minimize the amount of time spent moving items around after each deletion.
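For illustration, a minimal sketch of that downward-counting variant (wantToDelete is the same hypothetical predicate as above):

// Removing item i never shifts the indices of the items we haven't visited yet.
for (int i = MyControl.TabPages.Count - 1; i >= 0; i--)
{
    if (wantToDelete(MyControl.TabPages[i]))
        MyControl.TabPages.RemoveAt(i);
}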
I think pointing out that the loop iterations are being controlled not by the i++, as anyone would expect, but by the crazy i-- setup should have been enough.
I also think that altering the state of i by evaluating the count, and then altering the count inside the loop, may lead to potential problems. I would expect a for loop to generally have a "fixed" number of iterations, with the loop variable i being the only part of the for loop's condition that changes.
I was working on some code recently and came across a method that had 3 for-loops that worked on 2 different arrays.
Basically, what was happening was a foreach loop would walk through a vector and convert a DateTime from an object, and then another foreach loop would convert a long value from an object. Each of these loops would store the converted value into lists.
The final loop would go through these two lists and store those values into yet another list because one final conversion needed to be done for the date.
Then, after all that is said and done, the final two lists are converted to arrays using ToArray().
Ok, bear with me, I'm finally getting to my question.
So, I decided to make a single for loop to replace the first two foreach loops and convert the values in one fell swoop (the third loop is quasi-necessary, although I'm sure with some work I could fold it into the single loop too).
But then I read the article "What your computer does while you wait" by Gustav Duarte and started thinking about memory management and what the data was doing while it's being accessed in the for-loop where two lists are being accessed simultaneously.
So my question is: what is the best approach for something like this? Try to condense the for loops so it happens in as few loops as possible, causing multiple data accesses for the different lists? Or allow the multiple loops and let the system bring in the data it's anticipating? These lists and arrays can be potentially large, and looping through 3 lists, perhaps 4 depending on how ToArray() is implemented, can get very costly (O(n^3)??). But from what I understood in said article and from my CS classes, having to fetch data can be expensive too.
Would anyone like to provide any insight? Or have I completely gone off my rocker and need to relearn what I have unlearned?
Thank you
The best approach? Write the most readable code, work out its complexity, and work out if that's actually a problem.
If each of your loops is O(n), then you've still only got an O(n) operation.
Having said that, it does sound like a LINQ approach would be more readable... and quite possibly more efficient as well. Admittedly we haven't seen the code, but I suspect it's the kind of thing which is ideal for LINQ.
For reference, the article is "What your computer does while you wait" by Gustav Duarte. There is also a guide to big-O notation.
It's impossible to answer the question without being able to see code/pseudocode. The only reliable answer is "use a profiler". Assuming what your loops are doing is a disservice to you and anyone who reads this question.
Well, you've got complications if the two vectors are of different sizes. As has already been pointed out, this doesn't increase the overall complexity of the issue, so I'd stick with the simplest code - which is probably 2 loops, rather than 1 loop with complicated test conditions re the two different lengths.
Actually, these length tests could easily make the two loops quicker than a single loop. You might also get better memory fetch performance with 2 loops - i.e. you are looking at contiguous memory - i.e. A[0],A[1],A[2]... B[0],B[1],B[2]..., rather than A[0],B[0],A[1],B[1],A[2],B[2]...
So in every way, I'd go with 2 separate loops ;-p
Am I understanding you correctly in this?
You have these loops:
for (...) {
    // Do A
}

for (...) {
    // Do B
}

for (...) {
    // Do C
}
And you converted it into
for (...) {
    // Do A
    // Do B
}

for (...) {
    // Do C
}
and you're wondering which is faster?
If not, some pseudocode would be nice, so we could see what you meant. :)
Impossible to say. It could go either way. You're right, fetching data is expensive, but locality is also important. The first version may be better for data locality, but on the other hand, the second has bigger blocks with no branches, allowing more efficient instruction scheduling.
If the extra performance really matters (as Jon Skeet says, it probably doesn't, and you should pick whatever is most readable), you really need to measure both options, to see which is fastest.
My gut feeling says the second, with more work being done between jump instructions, would be more efficient, but it's just a hunch, and it can easily be wrong.
Aside from cache thrashing on large functions, there may be benefits for tiny functions as well. This applies to any auto-vectorizing compiler (I'm not sure if the Java JIT will do this yet, but you can count on it eventually).
Suppose this is your code:
// if this compiles down to a raw memory copy with a bitmask...
Date morningOf(Date d) { return Date(d.year, d.month, d.day, 0, 0, 0); }

Date timestamps[N];
Date mornings[N];

// ... then this can be parallelized using SSE or other SIMD instructions
for (int i = 0; i != N; ++i)
    mornings[i] = morningOf(timestamps[i]);

// ... and this will just run like normal
for (int i = 0; i != N; ++i)
    doOtherCrap(mornings[i]);
For large data sets, splitting the vectorizable code out into a separate loop can be a big win (provided caching doesn't become a problem). If it was all left as a single loop, no vectorization would occur.
This is something that Intel recommends in their C/C++ optimization manual, and it really can make a big difference.
... working on one piece of data but with two functions can sometimes make it so that the code acting on that data doesn't fit in the processor's low-level caches:
for (i = 0; i < 10; i++) {
    myObject object = array[i];
    object.functionreallybig1(); // pushes functionreallybig2 out of cache
    object.functionreallybig2(); // pushes functionreallybig1 out of cache
}

vs

for (i = 0; i < 10; i++) {
    myObject object = array[i];
    object.functionreallybig1(); // this stays in the cache next time through the loop
}

for (i = 0; i < 10; i++) {
    myObject object = array[i];
    object.functionreallybig2(); // this stays in the cache next time through the loop
}
But it was probably a mistake (usually this type of trick is commented).
When data is cyclically loaded and unloaded like this, it is called cache thrashing, by the way.
This is a separate issue from the data these functions are working on, as the processor typically caches that separately.
I apologize for not responding sooner and for not providing any kind of code. I got sidetracked on my project and had to work on something else.
To answer anyone still monitoring this question:
Yes, like jalf said, the function is something like:
PrepareData(vectorA, vectorB, xArray, yArray):
    listA
    listB
    foreach (value in vectorA)
        convert value, insert in listA
    foreach (value in vectorB)
        convert value, insert in listB
    listC
    listD
    for (int i = 0; i < listB.count; i++)
        listC[i] = listB[i] converted to something
        listD[i] = listA[i]
    xArray = listC.ToArray()
    yArray = listD.ToArray()
I changed it to:
PrepareData(vectorA, vectorB, ref xArray, ref yArray):
    listA
    listB
    for (int i = 0; i < vectorA.count && i < vectorB.count; i++)
        convert values, insert in listA
        convert values, insert in listB
    listC
    listD
    for (int i = 0; i < listB.count; i++)
        listC[i] = listB[i] converted to something
        listD[i] = listA[i]
    xArray = listC.ToArray()
    yArray = listD.ToArray()
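For concreteness, here is one hypothetical C# rendering of that combined version; ObjA/ObjB and the ConvertDate/ConvertLong/ConvertFinal helpers are stand-ins for the real conversions the pseudocode leaves abstract:

void PrepareData(List<ObjA> vectorA, List<ObjB> vectorB,
                 ref double[] xArray, ref DateTime[] yArray)
{
    var listA = new List<DateTime>();
    var listB = new List<long>();
    int n = Math.Min(vectorA.Count, vectorB.Count);
    for (int i = 0; i < n; i++)
    {
        listA.Add(ConvertDate(vectorA[i])); // was the first foreach
        listB.Add(ConvertLong(vectorB[i])); // was the second foreach
    }

    var listC = new List<double>(n);
    var listD = new List<DateTime>(n);
    for (int i = 0; i < listB.Count; i++)
    {
        listC.Add(ConvertFinal(listB[i])); // "converted to something"
        listD.Add(listA[i]);
    }

    xArray = listC.ToArray();
    yArray = listD.ToArray();
}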
Keeping in mind that the vectors can potentially hold a large number of items, I figured the second one would be better, so that the program wouldn't have to loop n times 2 or 3 different times. But then I started to wonder about the effects of memory fetching, or prefetching, or what have you.
So, I hope this helps to clear up the question, although a good number of you have provided excellent answers.
Thank you everyone for the information. Thinking in terms of Big-O and how to optimize has never been my strong point. I believe I am going to put the code back to the way it was; I should have trusted the way it was written before instead of jumping on my novice instincts. Also, in the future I will provide more context so everyone can understand what the heck I'm talking about (clarity is also not a strong point of mine :-/).
Thank you again.