Which of the following has the best performance?
I have seen method two implemented in JavaScript with huge performance gains, however, I was unable to measure any gain in C# and was wondering if the compiler already does method 2 even when written like method 1.
The theory behind method 2 is that the code doesn't have to access DataTable.Rows.Count on every iteration, it can simple access the int c.
Method 1
for (int i = 0; i < DataTable.Rows.Count; i++) {
// Do Something
}
Method 2
for (int i = 0, c = DataTable.Rows.Count; i < c; i++) {
// Do Something
}
No, it can't do that since there is no way to express constant over time for a value.
If the compiler should be able to do that, there would have to be a guarantee from the code returning the value that the value is constant, and for the duration of the loop won't change.
But, in this case, you're free to add new rows to the data table as part of your loop, and thus it's up to you to make that guarantee, in the way you have done it.
So in short, the compiler will not do that optimization if the end-index is anything other than a variable.
In the case of a variable, where the compiler can just look at the loop-code and see that this particular variable is not changed, it might do that and load the value into a register before starting the loop, but any performance gain from this would most likely be negligible, unless your loop body is empty.
Conclusion: If you know, or is willing to accept, that the end loop index is constant for the duration of the loop, place it into a variable.
Edit: Re-read your post, and yes, you might see negligible performance gains for your two cases as well, because the JITter optimizes the code. The JITter might optimize your end-index read into a direct access to the variable inside the data table that contains the row count, and a memory read isn't all that expensive anyway. If, on the other hand, reading that property was a very expensive operation, you'd see a more noticable difference.
Related
I know that in the following code array 'Length' property is not called on every loop iteration because Jit compiler is clever enough to recognize it as property (not method) and optimize the code to call it only once storing internally the value in the temporary variable:
Int32[] myArr = new Int32[100];
for(Int32 index = 0; index < myArr.Length; index++) {
// Do something with the current item
}
So there is no need for developer to trying to optimize this by caching the length to a local variable.
My question: is it true for all collection types in .Net? For instance, suppose I have a List and call 'Count' property in for loop. Shouldn't I optimize this?
I know that in the following code array 'Length' property is not called on every loop iteration because Jit compiler is clever enough to recognize it as property (not method)
Though Length is, from the perspective of the type system a property, from the perspective of the jitter it is a special instruction just for loading the length of an array. That's the thing that enables the jitter to perform this optimization.
I note also that there is a larger optimization which you failed to mention. Checking the length is cheap. The larger optimization here is that the jitter can elide checks on the index operation into the array because it knows that the loop variable will always be in the bounds of the array. Since those checks can throw exceptions, by eliding them the jitter makes it easier to analyze the control flow of the method, and can therefore potentially do even better optimizations to the rest of the method.
is it true for all collection types in .Net?
The jitter is permitted to make that optimization for other collection types if it can prove that doing so is correct. Whether it actually does so or not you can test using science; observe the code generated by the jitter and see if it has this optimization. My guess would be that the jitter does not have this optimization.
For instance, suppose I have a List and call 'Count' property in for loop. Shouldn't I optimize this?
Now we come to the real crux of the question. Absolutely you should not optimize this. Optimization is incredibly expensive; your employer pays you for every minute you're spending optimizing your code, so you should be optimizing the thing that produces the biggest user-observable win. Getting the count of a list takes nanoseconds. There is no program in the world whose success in the marketplace was determined by whether or not someone removed a couple nanoseconds from a loop that checked a list count unnecessarily.
The way to spend your time doing performance optimizations is first have a customer-focused goal. That's what lets you know when you can stop worrying about performance and spend your money on something more important. Second, measure progress against that goal every day. Third, if and ONLY if you are not meeting your goal, use a profiler to determine where the code is actually not sufficiently performant. Then optimize the heck out of that thing, and that thing alone.
Nano-optimizations are not the way to engineer performance. They just make the code harder to read and maintain. Use a good engineering discipline when analyzing performance, not a collection of tips and tricks.
I'm reading about how to force an operation to throw an overflow exception, and on the "try it yourself" section, I had it in a different place than the book. I'm curious if there's a performance issue associated with one spot or the other, as I'm not certain the underlying mechanics of the checked keyword.
The example in the book was doing a factorial, which will quickly throw an overflow, even with an unsigned long. This is the code that I came up with:
static long Factorial (long number) {
long result = 1;
for (int i = 2; i <= number; i++) {
checked {
result *= i;
}
}
return result;
}
However, looking at the answer page in the back of the book, they had the checked wrapped around the entire body of the function, including the return and long result = 1;. Obviously you'll never need one in those places, so if anything, I would just wrap the for loop in the check.
Is the presence of it inside the loop causing some underlying CLR code to be generated repeatedly? (Like why you declare a variable before entering a for loop.) Or is there no overhead having it inside the loop?
There is going to be little difference in terms of the compiled results.
The main difference is that any arithmetic operation within the checked block will use a different IL instruction. There aren't more instructions, just different ones. Instead of mul, you get mul.ovf - instead of add, you get add.ovf, etc.
Your version actually has slightly different behavior, however. Since you're putting the checked block in the tigher scope, the variable increment (i++) will still be unchecked. The original would have been checked all the way through, which means that the i++ could throw, not just the multiplication operation. This does mean your version is faster, but only because you're avoiding the overflow checks and changing the resulting behavior, not because of the scope change.
Is the presence of it inside the loop causing some underlying CLR code to be generated repeatedly?
No, it just means those IL instructions inside of that scope will get the different IL operation codes with overflow checks instead of the standard ones.
Or is there no overhead having it inside the loop?
There is no overhead (other than the extra overhead of the checks in the instructions itself).
I am trying to optimize my code and was running VS performance monitor on it.
It shows that simple assignment of float takes up a major chunk of computing power?? I don't understand how is that possible.
Here is the code for TagData:
public class TagData
{
public int tf;
public float tf_idf;
}
So all I am really doing is:
float tag_tfidf = td.tf_idf;
I am confused.
I'll post another theory: it might be the cache miss of the first access to members of td. A memory load takes 100-200 cycles which in this case seems to amount to about 1/3 of the total duration of the method.
Points to test this theory:
Is your data set big? It bet it is.
Are you accessing the TagData's in random memory order? I bet they are not sequential in memory. This causes the memory prefetcher of the CPU to be dysfunctional.
Add a new line int dummy = td.tf; before the expensive line. This new line will now be the most expensive line because it will trigger the cache miss. Find some way to do a dummy load operation that the JIT does not optimize out. Maybe add all td.tf values to a local and pass that value to GC.KeepAlive at the end of the method. That should keep the memory load in the JIT-emitted x86.
I might be wrong but contrary to the other theories so far mine is testable.
Try making TagData a struct. That will make all items of term.tags sequential in memory and give you a nice performance boost.
Are you using LINQ? If so, LINQ uses lazy enumeration so the first time you access the value you pulled out, it's going to be painful.
If you are using LINQ, call ToList() after your query to only pay the price once.
It also looks like your data structure is sub optimal but since I don't have access to your source (and probably couldn't help even if I did :) ), I can't tell you what would be better.
EDIT: As commenters have pointed out, LINQ may not be to blame; however my question is based on the fact that both foreach statements are using IEnumerable. The TagData assignment is a pointer to the item in the collection of the IEnumerable (which may or may not have been enumerated yet). The first access of legitimate data is the line that pulls the property from the object. The first time this happens, it may be executing the entire LINQ statement and since profiling uses the average, it may be off. The same can be said for tagScores (which I'm guessing is database backed) whose first access is really slow and then speeds up. I wasn't pointing out the solution just a possible problem given my understanding of IEnumerable.
See http://odetocode.com/blogs/scott/archive/2008/10/01/lazy-linq-and-enumerable-objects.aspx
As we can see that next line to the suspicious one takes only 0.6 i.e
float tag_tfidf = td.tf_idf;//29.6
string tagName =...;//0.6
I suspect this is caused bu the excessive number of calls, and also note float is a value type, meaning they are copied by value. So everytime you assign it, runtime creates new float (Single) struct and initializes it by copying the value from td.tf_idf which takes huge time.
You can see string tagName =...; doesn't takes much because it is copied by reference.
Edit: As comments pointed out I may be wrong in that respect, this might be a bug in profiler also, Try re profiling and see if that makes any difference.
I read today a post about performance improvement in C# and Java.
I still stuck on this one:
19. Do not overuse instance variables
Performance can be improved by using local variables. The code in example 1 will execute faster than the code in Example 2.
Example1:
public void loop() {
int j = 0;
for ( int i = 0; i<250000;i++){
j = j + 1;
}
}
Example 2:
int i;
public void loop() {
int j = 0;
for (i = 0; i<250000;i++){
j = j + 1;
}
}
Indeed, I do not understand why it should be faster to instantiate some memory and release it every time a call to the loop function is done when I could do a simple access to a field.
It's pure curiosity, I'm not trying to put the variable 'i' in the class' scope :p
Is that true that's faster to use local variables? Or maybe just in some case?
Stack faster then Heap.
void f()
{
int x = 123; // <- located in stack
}
int x; // <- located in heap
void f()
{
x = 123
}
Do not forget the principle of locality data. Local data should be better cached in CPU cache. If the data are close, they will loaded entirely into the CPU cache, and the CPU does not have to get them from memory.
The performance is down to the number of steps required to get the variable. Local variable addresses are known at compile time (they are a known offset on the stack), to access a member you load the object 'this' to get the address of the actual object, before you can get the address of the member variable.
In C# another minor difference is the number of generated MSIL instructions (I guess it's similar in Java).
It takes two instructions to load an instance field:
ldarg.0 // load "this" reference onto stack
ldfld MyClass.myField // find the field and load its value
...but it only takes one instruction to load a local variable:
ldloc.0 // load the value at index 0 from the list of local variables
Even if it will be, there will be almost non measurable difference in this cases. Probabbly in first case, there is some optimization done on processor registry level, but again:
it's almost irrelevant
and what is more important, often unpredictable.
In terms of memory, it's exactly the same, there is no any difference.
The first case it generaly better: as you declare variable there were it's imediately used, which is commonly used good pattern, as it's
easy to understand (scopes of responsibilities)
easy refactor
I tested a calculation with 500,000 iterations where I used about 20 variables locally and one that does it with fields. The local variable test was about 20 milliseconds and the one with fields was about 30 milliseconds. A significant performance gain when you use local variables.
Whether the performance difference is relevant, depends on the project. In your average business application the performance gain may not be noticeable and it is better to go for readable / maintainable code, but I am working on sound synthesis software where nano-optimizations like this actually become relevant.
I suspect there's very little difference, however in the case where the variable is a member of the object, each access requires an indirection via this (effectively), whereas the local variable does not.
More generally, the object has no need for a member i, it's only used in the context of the loop, so making it local to its use is better in any case.
One of my programming philosophy is that defining variables just before it is really being used the first time. For example the way of defining variable 'x', I usually don't write code like this:
var total =0;
var x;
for(int i=0;i<100000;i++)
{
x = i;
total += x;
}
Instead, I prefer to this:
var total = 0;
for(int i=0;i<100000;i++)
{
var x = i;
total = +x;
}
This is just an example code, don't care about the real meaning of the code.
what downsides is the second way? performance?
Don't bother yourself with performance unless you really really need to (hint: 99% of the time you don't need to).
My usual philosophy (which has been confirmed by books like "The Art of Readable Code") is to declare variables in the smallest scope possible. The reason being that in terms of readability and code comprehension the less variables you have to think about at any one time the better. And defining variables in a smaller scope definitely helps with that.
Also, often times if a compiler is able to determine that (in the case of your example) moving the variable outside of the for loop to save having to create/destroy it every iteration won't change the outcome but will help performance he'll do it for you. And that's another reason not to bother with performance, the compiler is usually smarter about it than we are.
There is no performance implications, only the scope ones. You should always define variables in the innermost scope possible. This improves readability of your program.
The only "downside" is that the second version need compiler support. Old compilers needed to know all the variables the function(or a scope inside it) will be using, so you had to declare the variables in a special section(Pascal) or in the beginning of the block(C). This is not really a problem nowadays - C is the only language that does not support declaring variables anywhere and still being widely used.
The problem is that C is the most common first-language they teach in schools and universities. They teach you C, and force you to declare all variables at the beginning of the block. Then they teach you a more modern language, and because you are already used to declaring all variables at the beginning, they need to teach you to not do it.
If your first language allows you to declare a variable anywhere in the function's body, you would instinctively declare it just before you use it, and they wouldn't need to tell you that declaring variables beforehand is bad just like they don't need to tell you that smashing your computer with a 5 Kilo hammer is bad.
I recommend, like most, to keep variables within an inner scope, but exceptions
occur and I think that is what you are seeking.
C++ potentially has expensive constructor/destructor time that would be best paid for once, rather than N times. Compare
void TestPrimacyOfNUnsignedLongs(int n) {
PrimeList List(); // Makes a list of all unsigned long primes
for (int i = 0; i<n; i++) {
unsinged long x = random_ul();
if (List.IsAPrime(x)) DoThis();
}
}
or
void TestPrimacyOfNUnsignedLongs(int n) {
for (int i = 0; i<n; i++) {
PrimeList List(); // Makes a list of all unsigned long primes
unsinged long lx = random_ul();
if (List.IsAPrime(x)) DoThis();
}
}
Certainly, I could put List inside the for loop, but at a significant run time cost.
Having all variables of the same scope in the same location of the code is easier to see what variables you have and what data type there are. You don't have to look through the entire code to find it.
You have different scopes for the x variable. In the second example, you won't be able to use the x variable outside the loop.