Reasons to not use 'break' statement in loops [duplicate] - c#

Is it a bad practice to use break statement inside a for loop?
Say I am searching for a value in an array. I compare inside a for loop and, when the value is found, break; to exit the for loop.
Is this a bad practice? I have seen the alternative used: define a variable vFound and set it to true when the value is found and check vFound in the for statement condition. But is it necessary to create a new variable just for this purpose?
I am asking in the context of a normal C or C++ for loop.
P.S: The MISRA coding guidelines advise against using break.

No, break is the correct solution.
Adding a boolean variable makes the code harder to read and adds a potential source of errors.
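To make the comparison concrete, here is a minimal sketch of both styles. It is written in Python only for brevity; the same shape applies to a C or C++ for loop. The function names and the found flag are illustrative and do not come from the question.

```python
def index_with_break(values, target):
    """Linear search that leaves the loop as soon as the target is seen."""
    result = -1
    for i, value in enumerate(values):
        if value == target:
            result = i
            break  # later elements are never examined
    return result


def index_with_flag(values, target):
    """Same search, but with an explicit flag tested in the loop condition."""
    result = -1
    found = False
    i = 0
    while i < len(values) and not found:
        if values[i] == target:
            result = i
            found = True
        i += 1
    return result
```

Both return the same index; the flag version simply carries one more piece of state that has to be initialized, updated, and tested correctly.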

Lots of answers here, but I haven't seen this mentioned yet:
Most of the "dangers" associated with using break or continue in a for loop are negated if you write tidy, easily-readable loops. If the body of your loop spans several screen lengths and has multiple nested sub-blocks, yes, you could easily forget that some code won't be executed after the break. If, however, the loop is short and to the point, the purpose of the break statement should be obvious.
If a loop is getting too big, use one or more well-named function calls within the loop instead. The only real reason to avoid doing so is for processing bottlenecks.

You can find all sorts of professional code with break statements in it. It makes perfect sense to use break whenever necessary. In your case, this option is better than creating a separate variable just for the purpose of getting out of the loop.

Using break as well as continue in a for loop is perfectly fine.
It simplifies the code and improves its readability.

Far from bad practice, Python (and other languages?) extended the for loop structure so part of it will only be executed if the loop doesn't break.
for n in range(5):
    for m in range(3):
        if m >= n:
            print('stop!')
            break
        print(m, end=' ')
    else:
        print('finished.')
Output:
stop!
0 stop!
0 1 stop!
0 1 2 finished.
0 1 2 finished.
Equivalent code without break and that handy else:
for n in range(5):
    aborted = False
    for m in range(3):
        if not aborted:
            if m >= n:
                print('stop!')
                aborted = True
            else:
                print(m, end=' ')
    if not aborted:
        print('finished.')

General rule: If following a rule requires you to do something more awkward and difficult to read than breaking the rule, then break the rule.
In the case of looping until you find something, you run into the problem of distinguishing found versus not found when you get out. That is:
for (int x = 0; x < fooCount; ++x)
{
    Foo foo = getFooSomehow(x);
    if (foo.bar == 42)
        break;
}
// So when we get here, did we find one, or did we fall out the bottom?
So okay, you can set a flag, or initialize a "found" value to null. But
That's why in general I prefer to push my searches into functions:
Foo findFoo(int wantBar)
{
    for (int x = 0; x < fooCount; ++x)
    {
        Foo foo = getFooSomehow(x);
        if (foo.bar == wantBar)
            return foo;
    }
    // Not found
    return null;
}
This also helps to unclutter the code. In the main line, "find" becomes a single statement, and when the conditions are complex, they're only written once.

There is nothing inherently wrong with using a break statement, but nested loops can get confusing. To help with that, some languages (Java, at least) support breaking to a label, which makes it explicit which loop is being exited.
int[] iArray = new int[]{0,1,2,3,4,5,6,7,8,9};
int[] jArray = new int[]{0,1,2,3,4,5,6,7,8,9};
// label for i loop
iLoop: for (int i = 0; i < iArray.length; i++) {
    // label for j loop
    jLoop: for (int j = 0; j < jArray.length; j++) {
        if (iArray[i] < jArray[j]) {
            // break i and j loops
            break iLoop;
        } else if (iArray[i] > jArray[j]) {
            // breaks only j loop
            break jLoop;
        } else {
            // unclear which loop is ending
            // (breaks only the j loop)
            break;
        }
    }
}
I will say that break (and return) statements often increase cyclomatic complexity which makes it harder to prove code is doing the correct thing in all cases.
If you're considering using a break while iterating over a sequence for some particular item, you might want to reconsider the data structure used to hold your data. Using something like a Set or Map may provide better results.
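As a rough illustration of that last point, the sketch below contrasts a linear scan with a keyed lookup. It is in Python for brevity; the record layout, the field name "id", and both helper names are assumptions made up for the example, and ids are assumed to be unique.

```python
def find_by_scan(records, wanted_id):
    """Linear search: examines records one by one and stops at the first match."""
    for record in records:
        if record["id"] == wanted_id:
            return record
    return None


def build_index(records):
    """Build a dict keyed by id once; later lookups need no loop at all."""
    return {record["id"]: record for record in records}


records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
index = build_index(records)
by_scan = find_by_scan(records, 2)
by_index = index.get(2)  # None if the id is absent
```

Whether the index pays off depends on how often you search compared with how often the data changes, so treat this as an option to weigh rather than a rule.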

break is a completely acceptable statement to use (so is continue, btw). It's all about code readability -- as long as you don't have overcomplicated loops and such, it's fine.
It's not like they were in the same league as goto. :)

It depends on the language. While you can possibly check a boolean variable here:
for (int i = 0; i < 100 && stayInLoop; i++) { ... }
it is not possible to do that when iterating over a collection directly:
for element in bigList: ...
Either way, break would make both snippets more readable.

I agree with others who recommend using break. The obvious consequential question is why would anyone recommend otherwise? Well... when you use break, you skip the rest of the code in the block, and the remaining iterations. Sometimes this causes bugs, for example:
a resource acquired at the top of the block may be released at the bottom (this is true even for blocks inside for loops), but that release step may be accidentally skipped when a "premature" exit is caused by a break statement (in "modern" C++, "RAII" is used to handle this in a reliable and exception-safe way: basically, object destructors free resources reliably no matter how a scope is exited)
someone may change the conditional test in the for statement without noticing that there are other delocalised exit conditions
ndim's answer observes that some people may avoid breaks to maintain a relatively consistent loop run-time, but you were comparing break against use of a boolean early-exit control variable where that doesn't hold
Every now and then people observing such bugs realise they can be prevented/mitigated by this "no breaks" rule... indeed, there's a whole related strategy for "safer" programming called "structured programming", where each function is supposed to have a single entry and exit point too (i.e. no goto, no early return). It may eliminate some bugs, but it doubtless introduces others. Why do they do it?
they have a development framework that encourages a particular style of programming / code, and they've statistical evidence that this produces a net benefit in that limited framework, or
they've been influenced by programming guidelines or experience within such a framework, or
they're just dictatorial idiots, or
any of the above + historical inertia (relevant in that the justifications are more applicable to C than modern C++).
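The first failure mode listed above (a release step skipped by an early exit) can be sketched as follows. This is a Python illustration, not C++: the Resource class is hypothetical, and the with block stands in for the role RAII plays in C++.

```python
class Resource:
    """Hypothetical resource that records whether it was released."""

    def __init__(self):
        self.released = False

    def release(self):
        self.released = True

    # Context-manager protocol: __exit__ runs however the block is left.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()
        return False  # do not swallow exceptions


def leaky(items, stop_at):
    """The release at the bottom of the body is skipped when break fires."""
    used = []
    for item in items:
        res = Resource()
        used.append(res)
        if item == stop_at:
            break  # jumps past res.release() below
        res.release()
    return used


def safe(items, stop_at):
    """A with block releases the resource even on the break path."""
    used = []
    for item in items:
        with Resource() as res:
            used.append(res)
            if item == stop_at:
                break  # __exit__ still runs before the loop is left
    return used
```

With items [1, 2, 3] and stop_at 2, the leaky version leaves the second resource unreleased, while the safe version releases both.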

In your example you do not know the number of iterations for the for loop. Why not use while loop instead, which allows the number of iterations to be indeterminate at the beginning?
Hence it is not generally necessary to use a break statement, as the loop can be better stated as a while loop.

I did some analysis on the codebase I'm currently working on (40,000 lines of JavaScript).
I found only 22 break statements, of those:
19 were used inside switch statements (we only have 3 switch statements in total!).
2 were used inside for loops - a code that I immediately classified as to be refactored into separate functions and replaced with return statement.
As for the final break inside while loop... I ran git blame to see who wrote this crap!
So according to my statistics: If break is used outside of switch, it is a code smell.
I also searched for continue statements. Found none.

It's perfectly valid to use break - as others have pointed out, it's nowhere in the same league as goto.
Although you might want to use the vFound variable when you want to check outside the loop whether the value was found in the array. Also from a maintainability point of view, having a common flag signalling the exit criteria might be useful.

I don't see any reason why it would be a bad practice PROVIDED that you want to completely STOP processing at that point.

In the embedded world, there is a lot of code out there that uses the following construct:
while (1)
{
    if (RCIF)
        gx();
    if (command_received == command_we_are_waiting_on)
        break;
    else if ((num_attempts > MAX_ATTEMPTS) || (TickGet() - BaseTick > MAX_TIMEOUT))
        return ERROR;
    num_attempts++;
}
if (call_some_bool_returning_function())
    return TRUE;
else
    return FALSE;
This is a very generic example, lots of things are happening behind the curtain, interrupts in particular. Don't use this as boilerplate code, I'm just trying to illustrate an example.
My personal opinion is that there is nothing wrong with writing a loop in this manner as long as appropriate care is taken to prevent remaining in the loop indefinitely.

Depends on your use case. There are applications where the runtime of a for loop needs to be constant (e.g. to satisfy some timing constraints, or to hide your data internals from timing based attacks).
In those cases it will even make sense to set a flag and only check the flag value AFTER all the for loop iterations have actually run. Of course, all the for loop iterations need to run code that still takes about the same time.
If you do not care about the run time... use break; and continue; to make the code easier to read.
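A minimal sketch of the difference, using byte-string comparison as the example. Both function names are made up for illustration, and Python itself gives no hard timing guarantees, so this only shows the shape of the loop. For real secret comparison in Python, the standard library's hmac.compare_digest is the usual choice.

```python
def equal_early_exit(a: bytes, b: bytes) -> bool:
    """Stops at the first mismatch, so run time depends on where it occurs."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True


def equal_full_scan(a: bytes, b: bytes) -> bool:
    """Visits every byte and only inspects the accumulated result at the end."""
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y  # stays 0 only if every pair matched
    return diff == 0
```

The second version does the same amount of loop work whether the mismatch is in the first byte or the last, which is the property the flag-and-check-afterwards approach is after.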

Under the MISRA '98 rules, which my company uses for C development, the break statement shall not be used...
Edit: break is allowed in MISRA '04.

Of course, break; is the solution to stop a for loop or foreach loop. I have used it in PHP in both foreach and for loops and found that it works.

I think it can make sense to have your checks at the top of your for loop like so
for(int i = 0; i < myCollection.Length && myCollection[i].SomeValue != "Break Condition"; i++)
{
//loop body
}
or if you need to process the row first
for(int i = 0; i < myCollection.Length && (i == 0 ? true : myCollection[i-1].SomeValue != "Break Condition"); i++)
{
//loop body
}
This way you can have a singular body function without breaks.
for(int i = 0; i < myCollection.Length && (i == 0 ? true : myCollection[i-1].SomeValue != "Break Condition"); i++)
{
PerformLogic(myCollection[i]);
}
It can also be modified to move Break into its own function as well.
for(int i = 0; ShouldContinueLooping(i, myCollection); i++)
{
PerformLogic(myCollection[i]);
}

Related

In C# is a for(;;) safe and what does it really do?

I found an empty for statement in an existing bit of code and I'm wondering what it does and is it "safe". It just feels wrong.
for(;;)
{
//some if statements and a case statement
}
Thanks!
This is one way of creating an infinite loop. It's a regular for loop, but with empty initialization, condition, and increment expressions. Because the condition is omitted, it is treated as always true, so the loop never exits on its own. It's perfectly "safe" as long as it has a terminating condition (a break or return statement [or even a goto, I suppose]) somewhere.
Personally, I prefer to write infinite loops with whiles:
while (true)
{
//some statements and a case statement
}
(because for is for iteration and while is for repetition).
However, after reading this question (linked by #jailf), I now prefer while (42) { ... }.
It's equivalent to writing an infinite loop as:
while (true) {
}
It's safe. You need to provide an exit mechanism yourself, though, i.e. a break within the for loop.
This is a common idiom for an indefinite or infinite loop. You purposely might have an indefinite loop if you are looking for a condition that is not finite at the beginning -- such as user input or the end of a file of unknown size. You might also see while(1) or while(true) for the same thing. It says 'do this thing { whatever } until there is no more...'
Inside that loop structure is probably a conditional and a break statement, such as:
for (;;)
{
    Console.Write("Enter your selection (1, 2, or 3): ");
    string s = Console.ReadLine();
    int n = Int32.Parse(s);
    switch (n)
    {
        case 1:
            Console.WriteLine("Current value is {0}", 1);
            break;
        case 2:
            Console.WriteLine("Current value is {0}", 2);
            break;
        case 3:
            Console.WriteLine("Current value is {0}", 3);
            break;
        default:
            Console.WriteLine("Sorry, invalid selection.");
            break;
    }
    if (n == 1 || n == 2 || n == 3)
        break; // out of the for(;;) loop
}
The key to whether it is "safe" is to work out exactly how you leave that loop; otherwise your indefinite loop will become an unintended infinite loop and a bug.
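One way to keep the exits easy to audit is to give the loop a bounded fallback in addition to the normal exit. The sketch below is in Python rather than C#; read_selection, its parameters, and the attempt limit are all assumptions made up for the example, and the inputs are passed in as a list so the behaviour can be checked without a console.

```python
def read_selection(inputs, max_attempts=5):
    """Loop 'forever', but with two explicit ways out: success or giving up."""
    attempts = 0
    source = iter(inputs)
    while True:
        attempts += 1
        text = next(source, None)  # None means the input ran out
        if text is None or attempts > max_attempts:
            return None  # bounded exit: give up
        if text.strip() in ("1", "2", "3"):
            return int(text)  # normal exit: valid selection
```

For example, read_selection(["x", "7", "2"]) skips the two invalid entries and returns 2, while an empty input list returns None instead of spinning.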
More in the C# documentation for the for statement:
All of the expressions of the for statement are optional; for example, the following statement is used to write an infinite loop:
for (; ; ) {
    // ...
}
Taken from MSDN
It's the same as while (true) { /**/ } ... infinite loop until break or return or similar occurs.
All it really "does" is look ugly IMO ;)
This has been asked multiple times on SO. The best discussion on the topic is at the following link:
Is "for(;;)" faster than "while (TRUE)"? If not, why do people use it?
It's valid syntax for an infinite loop. You need to "break;" out of it. This was popular back in the C++ days IIRC.
As far as being safe, you're right in feeling "wrong" about this. Usually there would be an "if" condition inside where you would decide if you continue or break the loop. If you don't verify all execution paths it could very well lead to an infinite loop. I would try and do this some other way.
It's sometimes called a "forever" loop, because that's what it does. Look for either a break; or return; statement inside the loop, or for the loop to be wrapped in a try/catch block. Personally, I try to avoid that kind of thing.
Were there any break, return, or throw statements? That's the only way out. Is it safe? It depends if you feel safe inside an infinite loop. Some applications need one.
This is very common on embedded systems without an operating system. If your program terminates, there is no underlying system to handle that. So mostly there's one huge infinite loop in which most of the operations are handled.

C# - Suggestions of control statement needed

I'm a student and I got a homework i need some minor help with =)
Here is my task:
Write an application that prompts the user to enter the size of a square and displays a square of asterisks with sides equal to the entered integer. Your application works for side sizes from 2 to 16. If the user enters a number less than 2 or greater than 16, your application should display a square of size 2 or 16, respectively, and an error message.
This is how far I've come:
start:
int x;
string input;
Console.Write("Enter a number between 2-16: ");
input = Console.ReadLine();
x = Int32.Parse(input);
Console.WriteLine("\n");
if (x <= 16 & x >= 2)
{
    control statement
    code
    code
    code
}
else
{
    Console.WriteLine("You must enter a number between 2 and 16");
    goto start;
}
I need help with...
... what control statement (if, for, while, do-while, case, boolean) to use inside the "if" control.
My ideas are like...
do I write a code that writes out the boxes for every type of number entered? That's a lot of code...
..there must be a code containing some "variable++" that could do the task for me, but then what control statement suits the task best?
But if I use a "variable++" how am I supposed to write the spaces in the output, because after all, it has to be a SQUARE?!?! =)
I'd love some suggestions on what type of statements to use, or maybe just a hint, of course not the whole solution as I am a student!
It's not the answer you're looking for, but I do have a few suggestions for clean code:
Your use of Int32.Parse is a potential exception that can crash the application. Look into Int32.TryParse (or just int.TryParse, which I personally think looks cleaner) instead. You'll pass it what it's parsing and an "out" parameter of the variable into which the value should be placed (in this case, x).
Try not to declare your variables until you actually use them. Getting into the habit of declaring them all up front (especially without instantiated values) can later lead to difficult to follow code. For my first suggestions, x will need to be declared ahead of time (look into default in C# for default instantiation... it's, well, by default, but it's good information to understand), but the string doesn't need to be.
Try to avoid using goto when programming :) For this code, it would be better to break out the code which handles the value and returns what needs to be drawn into a separate method and have the main method just sit around and wait for input. Watch for hard infinite loops, though.
It's never too early to write clean and maintainable code, even if it's just for a homework assignment that will never need to be maintained :)
You do not have to write code for every type of number entered. Instead, you have to use loops (for keyword).
Probably I must stop here and let you do the work, but I would just give a hint: you may want to do it with two loops, one embedded in another.
I have also noted some things I want to comment in your code:
Int32.Parse: do not use Int32, but int. It will not change the meaning of your code. I will not explain why you must use int instead: it is quite difficult to explain, and you would understand it later for sure.
Avoid using goto statement, except if you were told to use it in the current case by your teacher.
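Console.WriteLine("\n");: avoid hard-coding "\n". The line terminator is platform dependent ("\n" on Linux/Unix and modern macOS, "\r\n" on Windows, and "\r" on classic Mac OS). Use Environment.NewLine instead.
x <= 16 & x >= 2: why & and not &&? Both give the same result here, but && short-circuits and is the conventional operator for combining conditions.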
You can write string input = Console.ReadLine(); instead of string input; followed by input = Console.ReadLine();.
Since it's homework, we can't give you the answer. But here are some hints (assuming solid *'s, not white space in-between):
You're going to want to iterate from 1 to N. See for (int...
There's a String constructor that will allow you to avoid the second loop. Look at all of the various constructors.
Your current error checking does not meet the specifications. Read the spec again.
You're going to throw an exception if somebody enters a non-parsable integer.
goto's went out of style before bell-bottoms. You actually don't need any outer control for the spec you were given, because it's "one shot and go". Normally, you would write a simple console app like this to look for a special value (e.g., -1) and exit when you see that value. In that case you would use while (!<end of input>) as the outer control flow.
If x is greater than 16, why not assign 16 to it (since you'll eventually need to draw a square with a side of length 16) and add an appropriate message? The same goes for values below 2.
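The clamping step on its own might look like the sketch below. It is written in Python rather than C# and deliberately leaves the drawing to the reader; clamp_side, the constant names, and the message wording are all made up for the illustration.

```python
MIN_SIDE, MAX_SIDE = 2, 16


def clamp_side(requested):
    """Return (side, error): side is forced into 2..16, error is None if in range."""
    side = max(MIN_SIDE, min(MAX_SIDE, requested))
    error = None if side == requested else (
        f"{requested} is outside {MIN_SIDE}-{MAX_SIDE}; using {side}"
    )
    return side, error
```

Because out-of-range input is corrected rather than rejected, no goto or retry loop is needed for this part of the spec.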
the control statement is:
for (int i = 0; i < x; i++)
{
    for (int j = 0; j < x; j++)
    {
        Console.Write("*");
    }
    Console.WriteLine();
}
This should print an x by x square of asterisks!
I'm a teacher and I set the same task for my students a while ago. I hope you're not one of them! :)

for and while loop in c#

for (i = 0; i <= 10; i++)
{
    ..
    ..
}
i = 0;
while (i <= 10)
{
    ..
    ..
    i++;
}
Of the for and while loops above, which one is better performance-wise?
(update)
Actually - there is one scenario where the for construct is more efficient; looping on an array. The compiler/JIT has optimisations for this scenario as long as you use arr.Length in the condition:
for (int i = 0; i < arr.Length; i++) {
    Console.WriteLine(arr[i]); // skips bounds check
}
In this very specific case, it skips the bounds checking, as it already knows that it will never be out of bounds. Interestingly, if you "hoist" arr.Length to try to optimize it manually, you prevent this from happening:
int len = arr.Length;
for (int i = 0; i < len; i++) {
    Console.WriteLine(arr[i]); // performs bounds check
}
However, with other containers (List<T> etc), hoisting is fairly reasonable as a manual micro-optimisation.
(end update)
Neither; a for loop is evaluated as a while loop under the hood anyway.
For example 12.3.3.9 of ECMA 334 (definite assignment) dictates that a for loop:
for ( for-initializer ; for-condition ; for-iterator ) embedded-statement
is essentially equivalent (from a definite assignment perspective, which is not quite the same as saying "the compiler must generate this IL") to:
{
    for-initializer ;
    while ( for-condition ) {
        embedded-statement ;
    LLoop:
        for-iterator ;
    }
}
with continue statements that target the for statement being translated to goto statements targeting the label LLoop. If the for-condition is omitted from the for statement, then evaluation of definite assignment proceeds as if for-condition were replaced with true in the above expansion.
Now, this doesn't mean that the compiler has to do exactly the same thing, but in reality it pretty much does...
I would say they are the same and you should never do such micro-optimizations anyway.
The performance will be the same. However, unless you need to access the i variable outside the loop then you should use the for loop. This will be cleaner since i will only have scope within the block.
Program efficiency comes from proper algorithms, good object-design, smart program architecture, etc.
Shaving a cycle or two with for loops vs while loops will NEVER make a slow program fast, or a fast program slow.
If you want to improve program performance in this section, find a way to either partially unroll the loop (see Duff's Device), or improve performance of what is done inside the loop.
Neither one. They are equivalent. You can think of the 'for' loop being a more compact way of writing the while-loop.
Yes, they are equivalent code snippets.

Why is it bad to "monkey with the loop index"?

One of Steve McConnell's checklist items is that you should not monkey with the loop index (Chapter 16, page 25, Loop Indexes, PDF format).
This makes intuitive sense and is a practice I've always followed except maybe as I learned how to program back in the day.
In a recent code review I found this awkward loop and immediately flagged it as suspect.
for ( int i=0 ; i < this.MyControl.TabPages.Count ; i++ )
{
    this.MyControl.TabPages.Remove ( this.MyControl.TabPages[i] );
    i--;
}
It's almost amusing since it manages to work by keeping the index at zero until all TabPages are removed.
This loop could have been written as
while(MyControl.TabPages.Count > 0)
MyControl.TabPages.RemoveAt(0);
And since the control was in fact written at about the same time as the loop it could even have been written as
MyControl.TabPages.Clear();
I've since been challenged about the code-review issue and found that my articulation of why it is bad practice was not as strong as I'd have liked. I said it was harder to understand the flow of the loop and therefore harder to maintain and debug and ultimately more expensive over the lifetime of the code.
Is there a better articulation of why this is bad practice?
I think your articulation is great. Maybe it can be worded like so:
Since the logic can be expressed much more clearly, it should be.
Well, this adds confusion for little purpose - you could just as easily write:
while(MyControl.TabPages.Count > 0)
{
MyControl.TabPages.Remove(MyControl.TabPages[0]);
}
or (simpler)
while(MyControl.TabPages.Count > 0)
{
MyControl.TabPages.RemoveAt(0);
}
or (simplest)
MyControl.TabPages.Clear();
In all of the above, I don't have to squint and think about any edge-cases; it is pretty clear what happens when. If you are modifying the loop index, you can quickly make it quite hard to understand at a glance.
It's all about expectation.
When one uses a loop counter, you expect it to be incremented (or decremented) by the same amount on each iteration of the loop.
If you mess (or monkey, if you like) with the loop counter, your loop does not behave as expected. This means it is harder to understand, and it increases the chance that your code is misinterpreted, which introduces bugs.
Or to (mis)quote a wise but fictional character:
complexity leads to misunderstanding
misunderstanding leads to bugs
bugs lead to the dark side.
I agree with your challenge. If they want to keep a for loop, the code:
for ( int i=0 ; i < this.MyControl.TabPages.Count ; i++ ) {
this.MyControl.TabPages.Remove ( this.MyControl.TabPages[i] );
i--;
}
reduces as follows:
for ( int i=0 ; i < this.MyControl.TabPages.Count ; ) {
this.MyControl.TabPages.Remove ( this.MyControl.TabPages[i] );
}
and then to:
for ( ; 0 < this.MyControl.TabPages.Count ; ) {
this.MyControl.TabPages.Remove ( this.MyControl.TabPages[0] );
}
But a while loop or a Clear() method, if that exists, are clearly preferable.
I think you could build a stronger argument by invoking Knuth's concepts of literate programming, that programs should not be written for computers, but to communicate concepts to other programmers, thus the simpler loop:
while (this.MyControl.TabPages.Count>0)
{
this.MyControl.TabPages.Remove ( this.MyControl.TabPages[0] );
}
more clearly illustrates the intent - remove the first tab page until there are none left. I think most people would grok that much quicker than the original example.
This might be clearer:
while (this.MyControl.TabPages.Count > 0)
{
this.MyControl.TabPages.Remove ( this.MyControl.TabPages[0] );
}
One argument that could be used is that it is much more difficult to debug such code, where the index is being changed twice.
The original code is highly redundant to bend the action of the for-loop to what is necessary. The increment is unnecessary, and balanced by the decrement. Those should be PRE-increments, not POST-increments as well, because conceptually the post-increment is wrong. The comparison with the tabpages count is semi-redundant since that's a hackish way of checking that the container is empty.
In short, it's unnecessary cleverness, it adds rather than removes redundancy. Since it can be both obviously simpler and obviously shorter, it's wrong.
The only reason to bother with an index at all would be if one were selectively erasing things. Even in that case, I would think it preferable to say:
i = 0;
while (i < MyControl.TabPages.Count)
    if (wantToDelete(MyControl.TabPages[i]))
        MyControl.TabPages.RemoveAt(i);
    else
        i++;
rather than jinxing the loop index after each removal. Or, better yet, have the index count downward so that when an item is removed it won't affect the index of future items needing removal. If many items are deleted, this may also help minimize the amount of time spent moving items around after each deletion.
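The count-downward variant can be sketched like this, in Python rather than C#. remove_matching and should_remove are hypothetical names standing in for the tab-page removal and the wantToDelete test.

```python
def remove_matching(items, should_remove):
    """Delete matching elements in place, walking the indices from high to low."""
    for i in range(len(items) - 1, -1, -1):
        if should_remove(items[i]):
            del items[i]  # only shifts elements at higher indices, already visited
    return items
```

Because a deletion only moves elements that the loop has already passed, the index never needs adjusting inside the body.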
I think pointing out that the loop iterations are being controlled not by the "i++", as anyone would expect, but by the crazy "i--" setup should have been enough.
I also think that evaluating the count in the loop condition and then altering both "i" and the count inside the loop may lead to problems. I would expect a for loop to generally have a "fixed" number of iterations, and the only part of the for loop condition that changes to be the loop variable "i".

Back to basics; for-loops, arrays/vectors/lists, and optimization

I was working on some code recently and came across a method that had 3 for-loops that worked on 2 different arrays.
Basically, what was happening was a foreach loop would walk through a vector and convert a DateTime from an object, and then another foreach loop would convert a long value from an object. Each of these loops would store the converted value into lists.
The final loop would go through these two lists and store those values into yet another list because one final conversion needed to be done for the date.
Then after all that is said and done, The final two lists are converted to an array using ToArray().
Ok, bear with me, I'm finally getting to my question.
So, I decided to make a single for loop to replace the first two foreach loops and convert the values in one fell swoop (the third loop is quasi-necessary, although, I'm sure with some working I could also put it into the single loop).
But then I read the article "What your computer does while you wait" by Gustavo Duarte and started thinking about memory management and what the data was doing while it's being accessed in the for-loop where two lists are being accessed simultaneously.
So my question is, what is the best approach for something like this? Try to condense the for-loops so the work happens in as few loops as possible, causing multiple data accesses for the different lists? Or allow the multiple loops and let the system bring in the data it's anticipating? These lists and arrays can potentially be large, and looping through 3 lists, perhaps 4 depending on how ToArray() is implemented, can get very costly (O(n^3) ??). But from what I understood in said article and from my CS classes, having to fetch data can be expensive too.
Would anyone like to provide any insight? Or have I completely gone off my rocker and need to relearn what I have unlearned?
Thank you
The best approach? Write the most readable code, work out its complexity, and work out if that's actually a problem.
If each of your loops is O(n), then you've still only got an O(n) operation.
Having said that, it does sound like a LINQ approach would be more readable... and quite possibly more efficient as well. Admittedly we haven't seen the code, but I suspect it's the kind of thing which is ideal for LINQ.
For reference, the article is "What your computer does while you wait" by Gustavo Duarte.
Also there's a guide to big-O notation.
It's impossible to answer the question without being able to see code/pseudocode. The only reliable answer is "use a profiler". Assuming what your loops are doing is a disservice to you and anyone who reads this question.
Well, you've got complications if the two vectors are of different sizes. As has already been pointed out, this doesn't increase the overall complexity of the issue, so I'd stick with the simplest code - which is probably 2 loops, rather than 1 loop with complicated test conditions re the two different lengths.
Actually, these length tests could easily make the two loops quicker than a single loop. You might also get better memory fetch performance with 2 loops - i.e. you are looking at contiguous memory - i.e. A[0],A[1],A[2]... B[0],B[1],B[2]..., rather than A[0],B[0],A[1],B[1],A[2],B[2]...
So in every way, I'd go with 2 separate loops ;-p
Am I understanding you correctly in this?
You have these loops:
for (...){
// Do A
}
for (...){
// Do B
}
for (...){
// Do C
}
And you converted it into
for (...){
// Do A
// Do B
}
for (...){
// Do C
}
and you're wondering which is faster?
If not, some pseudocode would be nice, so we could see what you meant. :)
Impossible to say. It could go either way. You're right, fetching data is expensive, but locality is also important. The first version may be better for data locality, but on the other hand, the second has bigger blocks with no branches, allowing more efficient instruction scheduling.
If the extra performance really matters (as Jon Skeet says, it probably doesn't, and you should pick whatever is most readable), you really need to measure both options, to see which is fastest.
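A measurement along those lines could look like the sketch below. It is in Python, and the two "conversions" (doubling and adding one) are placeholders, since the real conversion code was not posted; the function names and the data size are likewise invented for the example.

```python
import timeit


def three_loops(a, b):
    """One pass per conversion, as in the original method."""
    xs = [v * 2 for v in a]
    ys = [v + 1 for v in b]
    return [x + y for x, y in zip(xs, ys)]


def fused_loop(a, b):
    """The same conversions fused into a single pass over both inputs."""
    return [v * 2 + (w + 1) for v, w in zip(a, b)]


a = list(range(10_000))
b = list(range(10_000))
same_result = three_loops(a, b) == fused_loop(a, b)

# Time each variant; the numbers depend on the machine, so none are quoted here.
t_three = timeit.timeit(lambda: three_loops(a, b), number=20)
t_fused = timeit.timeit(lambda: fused_loop(a, b), number=20)
print(f"three loops: {t_three:.4f}s  fused: {t_fused:.4f}s")
```

Checking that both variants produce the same output before comparing timings avoids optimizing a version that is subtly wrong.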
My gut feeling says the second, with more work being done between jump instructions, would be more efficient, but it's just a hunch, and it can easily be wrong.
Aside from cache thrashing on large functions, there may be benefits on tiny functions as well. This applies on any auto-vectorizing compiler (not sure if Java JIT will do this yet, but you can count on it eventually).
Suppose this is your code:
// if this compiles down to a raw memory copy with a bitmask...
Date morningOf(Date d) { return Date(d.year, d.month, d.day, 0, 0, 0); }
Date timestamps[N];
Date mornings[N];
// ... then this can be parallelized using SSE or other SIMD instructions
for (int i = 0; i != N; ++i)
mornings[i] = morningOf(timestamps[i]);
// ... and this will just run like normal
for (int i = 0; i != N; ++i)
doOtherCrap(mornings[i]);
For large data sets, splitting the vectorizable code out into a separate loop can be a big win (provided caching doesn't become a problem). If it was all left as a single loop, no vectorization would occur.
This is something that Intel recommends in their C/C++ optimization manual, and it really can make a big difference.
... working on one piece of data but with two functions can sometimes make it so that code to act on that data doesn't fit in the processor's low level caches.
for (i = 0; i < 10; i++) {
    myObject obj = array[i];
    obj.functionreallybig1(); // pushes functionreallybig2 out of cache
    obj.functionreallybig2(); // pushes functionreallybig1 out of cache
}
vs
for (i = 0; i < 10; i++) {
    myObject obj = array[i];
    obj.functionreallybig1(); // this stays in the cache next time through loop
}
for (i = 0; i < 10; i++) {
    myObject obj = array[i];
    obj.functionreallybig2(); // this stays in the cache next time through loop
}
But it was probably a mistake (usually this type of trick is commented)
When data is cyclically loaded and unloaded like this, it is called cache thrashing, btw.
This is a separate issue from the data these functions are working on, as typically the processor caches that separately.
I apologize for not responding sooner and providing any kind of code. I got sidetracked on my project and had to work on something else.
To answer anyone still monitoring this question;
Yes, like jalf said, the function is something like:
PrepareData(vectorA, vectorB, xArray, yArray):
    listA
    listB
    foreach(value in vectorA)
        convert values insert in listA
    foreach(value in vectorB)
        convert values insert in listB
    listC
    listD
    for(int i = 0; i < listB.count; i++)
        listC[i] = listB[i] converted to something
        listD[i] = listA[i]
    xArray = listC.ToArray()
    yArray = listD.ToArray()
I changed it to:
PrepareData(vectorA, vectorB, ref xArray, ref yArray):
    listA
    listB
    for(int i = 0; i < vectorA.count && i < vectorB.count; i++)
        convert values insert in listA
        convert values insert in listB
    listC
    listD
    for(int i = 0; i < listB.count; i++)
        listC[i] = listB[i] converted to something
        listD[i] = listA[i]
    xArray = listC.ToArray()
    yArray = listD.ToArray()
Keep in mind that the vectors can potentially have a large number of items. I figured the second one would be better, so that the program wouldn't have to loop over n items 2 or 3 different times. But then I started to wonder about the effects of memory fetching, or prefetching, or what have you.
So, I hope this helps to clear up the question, although a good number of you have provided excellent answers.
Thank you every one for the information. Thinking in terms of Big-O and how to optimize has never been my strong point. I believe I am going to put the code back to the way it was, I should have trusted the way it was written before instead of jumping on my novice instincts. Also, in the future I will put more reference so everyone can understand what the heck I'm talking about (clarity is also not a strong point of mine :-/).
Thank you again.
