What are the downsides of declaring variables just before their first use? - c#

One of my programming philosophies is to define a variable just before it is used for the first time. For example, when defining a variable 'x', I usually don't write code like this:
var total = 0;
int x; // 'var' requires an initializer, so the type must be spelled out here
for (int i = 0; i < 100000; i++)
{
    x = i;
    total += x;
}
Instead, I prefer this:
var total = 0;
for (int i = 0; i < 100000; i++)
{
    var x = i;
    total += x;
}
This is just example code; don't worry about its real meaning.
What are the downsides of the second way? Performance?

Don't bother yourself with performance unless you really really need to (hint: 99% of the time you don't need to).
My usual philosophy (which has been confirmed by books like "The Art of Readable Code") is to declare variables in the smallest scope possible. The reason is that, in terms of readability and code comprehension, the fewer variables you have to think about at any one time the better. And defining variables in a smaller scope definitely helps with that.
Also, oftentimes if the compiler can determine that (in the case of your example) moving the variable outside the for loop to avoid creating/destroying it every iteration won't change the outcome but will help performance, it will do it for you. And that's another reason not to bother with performance: the compiler is usually smarter about it than we are.

There are no performance implications, only scope ones. You should always define variables in the innermost scope possible. This improves the readability of your program.

The only "downside" is that the second version needs compiler support. Old compilers needed to know all the variables a function (or a scope inside it) would be using, so you had to declare variables in a special section (Pascal) or at the beginning of the block (C). This is not really a problem nowadays - pre-C99 C is about the only widely used language that does not support declaring variables anywhere.
The problem is that C is the most common first language taught in schools and universities. They teach you C and force you to declare all variables at the beginning of the block. Then they teach you a more modern language, and because you are already used to declaring all variables at the beginning, they have to teach you not to do it.
If your first language allows you to declare a variable anywhere in the function body, you will instinctively declare it just before you use it, and nobody will need to tell you that declaring variables ahead of time is bad, just as nobody needs to tell you that smashing your computer with a 5-kilo hammer is bad.

I recommend, like most, keeping variables within an inner scope, but exceptions occur, and I think that is what you are seeking.
C++ potentially has expensive constructor/destructor time that would be best paid for once, rather than N times. Compare
void TestPrimacyOfNUnsignedLongs(int n) {
    PrimeList List; // Makes a list of all unsigned long primes
    for (int i = 0; i < n; i++) {
        unsigned long x = random_ul();
        if (List.IsAPrime(x)) DoThis();
    }
}
or
void TestPrimacyOfNUnsignedLongs(int n) {
    for (int i = 0; i < n; i++) {
        PrimeList List; // Makes a list of all unsigned long primes
        unsigned long x = random_ul();
        if (List.IsAPrime(x)) DoThis();
    }
}
Certainly, I could put List inside the for loop, but at a significant run time cost.

Having all variables of the same scope in the same location in the code makes it easier to see what variables you have and what data types they are. You don't have to look through the entire code to find them.
You have different scopes for the x variable. In the second example, you won't be able to use the x variable outside the loop.
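To make the scope point concrete, here is a minimal sketch (the loop bound is shrunk to 1,000 so the int sum doesn't overflow): a variable declared inside the loop body simply does not exist afterwards, and the compiler enforces that.

```csharp
using System;

class ScopeDemo
{
    public static int Sum(int n)
    {
        var total = 0;
        for (int i = 0; i < n; i++)
        {
            var x = i;         // x exists only inside this loop body
            total += x;
        }
        // Console.WriteLine(x); // compile error: 'x' does not exist in this scope
        return total;
    }

    static void Main()
    {
        Console.WriteLine(Sum(1000)); // 499500
    }
}
```

If you need the value after the loop, that is exactly the case where declaring it in the outer scope (the first style) is the right choice.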

Related

Is there a performance gain, or another reason, for declaring all local variables beforehand?

I see this style a lot in our code base, and online as well: if a function has for loops and if statements, all the variables that only they use (and nothing else uses) are declared outside them. For example:
void process()
{
    int i;
    int count = 100;
    vector3 point;
    vector sum;
    for (i = 0; i < count; ++i)
    {
        import(this, "pos", point);
        sum += point;
    }
    sum /= count;
}
Or is this premature optimization? I am curious about it for C++, C# and Python which are the languages I use and where I saw these over and over again.
A lot of older code does this because it was required in C89/90. Well, to be technical, it was never required that variables be defined at the beginning of the function, only at the beginning of a block. For example:
int f() {
    int x; // allowed
    x = 1;
    int y; // allowed in C++, but not C89
    {
        int z = 0; // beginning of new block, so allowed even in C89
        // code that uses `z` here
    }
    return x;
}
C++ has never had this restriction (and C hasn't in quite a while either), but for some, old habits die hard. For others, maintaining consistency across the code base outweighs the benefits of defining variables close to where they're used.
As far as optimization goes, none of this will normally have any effect at all.
It makes a difference in Python. It's a scoping issue: Python first searches the dictionary containing local variables and then works its way up through globals and then built-ins.
There is a slight speed increase for this in Python, although generally not a lot. Check THIS question to see more details for Python, including some tests.
I can't comment on C++ or C#, but because they are compiled languages it shouldn't really matter.
It makes no difference. It's on the stack either way.

Local variables or class fields?

I read today a post about performance improvement in C# and Java.
I'm still stuck on this one:
19. Do not overuse instance variables
Performance can be improved by using local variables. The code in example 1 will execute faster than the code in Example 2.
Example1:
public void loop() {
    int j = 0;
    for (int i = 0; i < 250000; i++) {
        j = j + 1;
    }
}
Example 2:
int i;
public void loop() {
    int j = 0;
    for (i = 0; i < 250000; i++) {
        j = j + 1;
    }
}
Indeed, I do not understand why it should be faster to allocate some memory and release it every time the loop function is called, when I could simply access a field.
It's pure curiosity, I'm not trying to put the variable 'i' in the class' scope :p
Is it true that it's faster to use local variables? Or maybe only in some cases?
The stack is faster than the heap.
void f()
{
    int x = 123; // <- located on the stack
}

int x; // <- located on the heap
void f()
{
    x = 123;
}
Do not forget the principle of data locality. Local data is cached better in the CPU cache. If the data are close together, they will be loaded entirely into the CPU cache, and the CPU will not have to fetch them from memory.
The performance difference comes down to the number of steps required to get at the variable. Local variable addresses are known at compile time (they are at a known offset on the stack); to access a member, you first load the object reference 'this' to get the address of the actual object, and only then can you get the address of the member variable.
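A common manual version of this is to copy a field into a local before a hot loop and write it back once at the end, so each iteration touches only a stack slot (or register) instead of going through 'this'. The class and method names below are illustrative, not from the post:

```csharp
using System;

class Accumulator
{
    private int total;                      // instance field: every access goes through 'this'

    public int SumViaField(int[] values)
    {
        total = 0;
        for (int i = 0; i < values.Length; i++)
            total += values[i];             // field load + field store each iteration
        return total;
    }

    public int SumViaLocal(int[] values)
    {
        int t = 0;                          // local: no 'this' indirection per iteration
        for (int i = 0; i < values.Length; i++)
            t += values[i];
        total = t;                          // write the field once at the end
        return t;
    }

    static void Main()
    {
        var a = new Accumulator();
        var data = new[] { 1, 2, 3, 4 };
        Console.WriteLine(a.SumViaField(data)); // 10
        Console.WriteLine(a.SumViaLocal(data)); // 10
    }
}
```

Both methods compute the same result; in practice the JIT often performs this transformation itself, so measure before rewriting code this way.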
In C# another minor difference is the number of generated MSIL instructions (I guess it's similar in Java).
It takes two instructions to load an instance field:
ldarg.0 // load "this" reference onto stack
ldfld MyClass.myField // find the field and load its value
...but it only takes one instruction to load a local variable:
ldloc.0 // load the value at index 0 from the list of local variables
Even if there were a difference, it would be almost unmeasurable in these cases. Probably in the first case there is some optimization done at the processor register level, but again:
it's almost irrelevant
and, what is more important, often unpredictable.
In terms of memory, it's exactly the same; there is no difference.
The first case is generally better: you declare the variable where it's immediately used, which is a common good pattern, as it's
easy to understand (scopes of responsibilities)
easy to refactor
I tested a calculation with 500,000 iterations that used about 20 variables locally, and one that did the same with fields. The local-variable test took about 20 milliseconds and the one with fields about 30 milliseconds - a significant performance gain when you use local variables.
Whether the performance difference is relevant, depends on the project. In your average business application the performance gain may not be noticeable and it is better to go for readable / maintainable code, but I am working on sound synthesis software where nano-optimizations like this actually become relevant.
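A Stopwatch harness along the lines described might look like the sketch below (the iteration count, the use of Math.Sqrt as a stand-in workload, and all names are illustrative assumptions, not the poster's actual test; absolute timings will vary by machine and JIT):

```csharp
using System;
using System.Diagnostics;

class FieldVsLocalBench
{
    static double fieldTotal;                   // accumulator held in a field

    public static long TimeFieldLoop(int n)
    {
        fieldTotal = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < n; i++)
            fieldTotal += Math.Sqrt(i);         // field access on every iteration
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static long TimeLocalLoop(int n)
    {
        double local = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < n; i++)
            local += Math.Sqrt(i);              // local access on every iteration
        sw.Stop();
        fieldTotal = local;                     // keep the result observable so the
        return sw.ElapsedMilliseconds;          // JIT can't drop the loop entirely
    }

    static void Main()
    {
        const int N = 500_000;
        Console.WriteLine($"field: {TimeFieldLoop(N)} ms");
        Console.WriteLine($"local: {TimeLocalLoop(N)} ms");
    }
}
```

Note the write-back of the local into the field: without some observable use of the result, an optimizer is free to eliminate the loop and the benchmark measures nothing.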
I suspect there's very little difference, however in the case where the variable is a member of the object, each access requires an indirection via this (effectively), whereas the local variable does not.
More generally, the object has no need for a member i, it's only used in the context of the loop, so making it local to its use is better in any case.

Is it always best practice to declare a variable?

I'm new to C# and to programming in general, and I have a question that seems to divide those in the know in my university faculty. That question is simply: do I always have to declare a variable? As a basic example of what I'm talking about: if I have int pounds and int pence, do I need to declare int money into which to put the answer, or is it OK to just have:
textbox1.Text = (pounds + pence).ToString();
I know both work, but I'm thinking in terms of best practice.
Thanks in advance.
In my opinion the answer is "no". You should, however, use variables in some cases:
Whenever a value is used multiple times
When a call to an expensive function is made, or one that has side effects
When the expression needs to be made more self-explanatory; variables with meaningful names do help
Basically, follow your common sense. The code should be self-explaining and clear, if introducing variables helps with that then use them.
Maintenance is king. It's the most expensive part of software development and anything you can do to make it easier is good.
Variables are good because when debugging, you can inspect the results of functions and calculations.
Absolutely not. The only time I would create a variable for a single use like this is if it significantly increases the readability of my code.
In my opinion if you do something like
int a = SomeFunc();
int b = SomeFunc2();
int c = a + b;
SomeFunc3(c);
it is better to just do
int a = SomeFunc();
int b = SomeFunc2();
SomeFunc3(a + b);
or even just
SomeFunc3(SomeFunc() + SomeFunc2());
If I am not manipulating the variable after it's calculated, then I think it's better not to declare it, because you just get more lines of code and more room to make a mistake later on when your code gets bigger.
Variables come to serve the following two purposes above all else:
Place holder for data (and information)
Readability enhancer
of course more can be said about the job of a variable, but those other tasks are less important here, and are far less relevant.
The above two points have the same importance as far as I'm concerned.
If you think that declaring a variable will enhance readability, or if you think that the data stored in that variable will be needed many times (and in which case, storing it in a well name var will again increase readability), then by all means create a new variable.
The only time I strictly advise against creating more variables is when the clutter of too many variables hurts readability more than it aids it, and this cannot be undone by method extraction.
I would suggest that logging frequently makes variable declaration worthwhile: when you need to know what something specific is, you need to be able to track that specific value. And you are logging, aren't you? Logging is good. Logging is right. Logging is freedom and unicorns and happy things.
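As a sketch of that point: naming the intermediate result gives you something concrete to log (and to inspect in a debugger) before returning it, which a bare returned expression does not. The eligibility rule and all names here are hypothetical:

```csharp
using System;

class EligibilityCheck
{
    // Hypothetical business rule; the named local 'eligible' exists purely
    // so the decision can be logged before it is returned.
    public static bool IsEligible(int age, int yearsOfService)
    {
        bool eligible = age >= 65 || yearsOfService >= 30;
        Console.WriteLine($"IsEligible(age={age}, years={yearsOfService}) -> {eligible}");
        return eligible;
    }

    static void Main()
    {
        IsEligible(40, 31); // logs: IsEligible(age=40, years=31) -> True
        IsEligible(40, 10); // logs: IsEligible(age=40, years=10) -> False
    }
}
```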
I don't always use a variable. For example, if I have a method evaluating something and returning true/false, I typically return the expression directly. The results are logged elsewhere, and I have the inputs logged, so I always know what happened.
Localisation and scope
For a programmer, knowledge of local variables - their content and scopes - is an essential part of the mental effort in comprehending and evolving the code. When you reduce the number of concurrent variables, you "free up" the programmer to consider other factors. Minimising scope is a part of this. Every little decision implies something about your program.
void f()
{
    int x = ...; // "we need x (or side effect) in next scope AND thereafter..."
    {
        int n = ...; // "n isn't needed at function scope..."
        ...
    } // can "pop" that n mentally...
    ...
}
The smallest scope is a literal or temporary result. If a value is only used once I prefer to use comments rather than a variable (they're not restricted to A-Za-z0-9_ either :-)):
x = employees.find("*", // name
retirement.qualify_at(), // age
num_wives() + num_kids()); // # dependents
Concision
Keeping focused on what your program is achieving is important. If a lot of your screen real estate (i.e. lines of code) goes to getting fields into variables, you have less code on screen that's actually responsible for getting algorithmic-level work done, and hence the algorithm is less tangible to the programmer. That's another reason to keep the code concise, so:
keep useful documentation targeted and concise
It depends on the situation. There is no one practice that would be best for all. For something this simple, you can skip creating a new variable but the best thing to do is step back and see how readable your expression is and see if introducing an intermediate variable will help the situation.
There are two objectives in making this decision:
Readability - the code is readable and self-explanatory
Code optimization - the code doesn't have any unnecessary calculation
If you look at this as an optimization problem it might seem less subjective
Rate readability on a scale from 1 to 10, with 10 being the easiest. Using sensible variable names may give you a 2; showing the calculation inline may give you a 3 (since the reader doesn't have to look up what "money" is - it's right there in that line of code), and so on. This piece is subjective: you and the companies you work for define what is readable, and you can build this cost model from that experience.
Optimal execution is not subjective. If you write "pounds + pence" everywhere you want the money calculation to go, you are wasting processor time. Yes, I know addition is a bad example, but the point still holds. Say the minimum execution of a process is simplified to memory allocation of variables, assignments, and calculations. Maybe one or two of these repeated additions in the code will be fine for readability, but at some point it becomes a complete waste on the processor. This is why variables exist: allocate some space to store the value, name it money so the reader knows what it is, and reference that variable "money" everywhere it's needed.
This makes more sense when you look at loops. Let's say you want to sum 1000 values with the following expression:
money = factor[1] + factor[2] + ... + factor[n]
You could write this everywhere you want to use the value money so that anyone reading your code knows what money consists of; instead, just do it once, and write some comments where you first calculate money so any programmer can come back and reference them.
Long story short: if you only use money once and it's clear what the inline calculation means, then of course don't make a variable. If you plan on using it throughout your code and its meaning becomes remotely confusing, then declare a variable, save the processor, and be done with it!
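The hoisting idea above can be sketched as follows (the factor array and its contents are illustrative; the point is that the sum is computed once and the name is reused):

```csharp
using System;
using System.Linq;

class MoneyDemo
{
    public static int SumFactors(int[] factor) => factor.Sum();

    static void Main()
    {
        int[] factor = Enumerable.Range(1, 1000).ToArray();

        // Compute the sum once, name it, and reuse the name everywhere,
        // rather than re-summing the 1000 values at every use site.
        int money = SumFactors(factor);

        Console.WriteLine(money);     // 500500
        Console.WriteLine(money * 2); // reuse costs nothing: 1001000
    }
}
```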
Note: partially kidding about this approach; I just thought it was funny to answer something like this in a cost-model format :) still useful, I'd say.
I don't recall ever seeing anything like this, and I think it's more tied to different "styles" of programming. Some styles, such as Spartan programming, actually attempt to declare as few variables as possible. If you aren't trying to follow a particular style, then it's best to go by readability. In your example I wouldn't declare a special variable to hold it. If you were calculating taxes based on some percentage of the total, then I might - or at the very least I would comment on what I was calculating.

Assigning integers fields/properties to zero in a constructor

During a recent code review, a colleague suggested that, for a class with 4 int properties, assigning each to zero in the constructor would result in a performance penalty.
For example,
public Example()
{
    this.major = 0;
    this.minor = 0;
    this.revision = 0;
    this.build = 0;
}
His point was that this is redundant as they will be set to zero by default and you are introducing overhead by essentially performing the same task twice. My point was that the performance hit would be negligible if one existed at all and this is more readable (there are several constructors) as the intention of the state of the object after calling this constructor is very clear.
What do you think? Is there a performance gain worth caring about here?
No, there is not. The compiler will optimize out these operations; the same task will not be performed twice. Your colleague is wrong.
[Edit based upon input from the always-excellent Jon Skeet]
The compiler SHOULD optimize out the operations, but apparently they are not completely optimized out; however, the optimization gain is completely negligible, and the benefit from having the assignment be so explicit is good. Your colleague may not be completely wrong, but they're focusing on a completely trivial optimization.
I don't believe they're the same operation, and there is a performance difference. Here's a microbenchmark to show it:
using System;
using System.Diagnostics;

class With
{
    int x;
    public With()
    {
        x = 0;
    }
}

class Without
{
    int x;
    public Without()
    {
    }
}

class Test
{
    static void Main(string[] args)
    {
        int iterations = int.Parse(args[0]);
        Stopwatch sw = Stopwatch.StartNew();
        if (args[1] == "with")
        {
            for (int i = 0; i < iterations; i++)
            {
                new With();
            }
        }
        else
        {
            for (int i = 0; i < iterations; i++)
            {
                new Without();
            }
        }
        sw.Stop();
        Console.WriteLine(sw.ElapsedMilliseconds);
    }
}
Results:
c:\Users\Jon\Test>test 1000000000 with
8427
c:\Users\Jon\Test>test 1000000000 without
881
c:\Users\Jon\Test>test 1000000000 with
7568
c:\Users\Jon\Test>test 1000000000 without
819
Now, would that make me change the code? Absolutely not. Write the most readable code first. If it's more readable with the assignment, keep the assignment there. Even though a microbenchmark shows it has a cost, that's still a small cost in the context of doing any real work. Even though the proportional difference is high, it's still creating a billion instances in 8 seconds in the "slow" route. My guess is that there's actually some sort of optimization for completely-empty constructors chaining directly to the completely empty object() constructor. The difference between assigning to two fields and only assigning to one field is much smaller.
As to why the compiler can't optimize it out, bear in mind that a base constructor could be modifying the value by reflection, or perhaps via a virtual method call. The compiler could potentially notice those cases, but it seems a strange optimization.
My understanding is that an object's memory is cleared to zero with a simple and very fast memory wipe; these explicit assignments, however, take additional IL. Indeed, some tools will spot you assigning the default value (in a field initialiser) and advise against it.
So I would say: don't do this - it is potentially marginally slower. But not by much. In short I think your friend is correct.
Unfortunately I'm on a mobile device right now, without the right tools to prove it.
You should focus on code clarity, that is the most important thing. If performance becomes an issue, then measure performance, and see what your bottlenecks are, and improve them. It's not worth it to spend so much time worrying about performance when ease of understanding code is more important.
You can initialize them as fields directly:
public int number = 0;
And it is also clear.
The more important question is: is there really a readability gain? If the people who maintain the code already know that ints default to zero, this is just more code they have to parse. Perhaps the code would be cleaner without lines that do nothing.
In fact, I'd use the assignment in the constructor, just for readability and for marking the 'I didn't forget to initialize those' intention. Relying on default behavior tends to confuse other developers.
I don't think you should worry about the performance hit; there are usually many other places where a program can be optimized. On the other hand, I don't see any gain from assigning these values in the constructor, since they are going to be set to 0 anyway.

DataTable Loop Performance Comparison

Which of the following has the best performance?
I have seen method two implemented in JavaScript with huge performance gains; however, I was unable to measure any gain in C#, and I was wondering if the compiler already does method 2 even when the code is written like method 1.
The theory behind method 2 is that the code doesn't have to access DataTable.Rows.Count on every iteration; it can simply access the int c.
Method 1
for (int i = 0; i < DataTable.Rows.Count; i++) {
    // Do Something
}
Method 2
for (int i = 0, c = DataTable.Rows.Count; i < c; i++) {
    // Do Something
}
No, it can't do that, since there is no way to express that a value is constant over time.
If the compiler were to do that, there would have to be a guarantee from the code returning the value that the value is constant and won't change for the duration of the loop.
But in this case, you're free to add new rows to the data table as part of your loop, and thus it's up to you to make that guarantee, in the way you have done it.
So in short, the compiler will not do that optimization if the end index is anything other than a variable.
In the case of a variable, where the compiler can just look at the loop code and see that this particular variable is not changed, it might load the value into a register before starting the loop, but any performance gain from this would most likely be negligible unless your loop body is empty.
Conclusion: if you know, or are willing to accept, that the end loop index is constant for the duration of the loop, place it into a variable.
Edit: Re-read your post, and yes, you might see negligible performance gains for your two cases as well, because the JITter optimizes the code. The JITter might optimize your end-index read into a direct access to the variable inside the data table that contains the row count, and a memory read isn't all that expensive anyway. If, on the other hand, reading that property were a very expensive operation, you'd see a more noticeable difference.
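The same pattern applies to any collection whose count is read through a property. Here is a minimal sketch using a List&lt;int&gt; as a stand-in for DataTable.Rows (assumed equivalent for the purpose of the loop-bound question):

```csharp
using System;
using System.Collections.Generic;

class LoopBoundDemo
{
    public static int SumRows(List<int> rows)
    {
        // Method 2 from the question: read Count once into c, telling both
        // the reader and the compiler that the bound is constant. This is
        // safe only because the loop body doesn't add or remove rows.
        int sum = 0;
        for (int i = 0, c = rows.Count; i < c; i++)
        {
            sum += rows[i];
        }
        return sum;
    }

    static void Main()
    {
        Console.WriteLine(SumRows(new List<int> { 1, 2, 3, 4, 5 })); // 15
    }
}
```

If the loop body did mutate the collection, caching the count this way would be a bug (you would skip or over-read elements), which is exactly why the compiler cannot make this guarantee for you.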
