In C#, is there a benefit or drawback to re-initializing a previously declared variable instead of declaring and initializing a new one? (Ignoring thoughts on conciseness and human readability.)
For example, compare these two samples:
DataColumn col = new DataColumn();
col.ColumnName = "Subsite";
dataTable.Columns.Add(col);
col = new DataColumn(); // Re-use the "col" declaration.
col.ColumnName = "Library";
dataTable.Columns.Add(col);
vs
DataColumn col1 = new DataColumn();
col1.ColumnName = "Subsite";
gridDataTable.Columns.Add(col1);
DataColumn col2 = new DataColumn(); // Declare a new variable instead.
col2.ColumnName = "Library";
gridDataTable.Columns.Add(col2);
A similar example involving loops:
string str;
for (int i = 0; i < 100; i++)
{
str = "This is string #" + i.ToString(); // Re-initialize the "str" variable.
Console.WriteLine(str);
}
vs
for (int i = 0; i < 100; i++)
{
string str = "This is string #" + i.ToString(); // Declare a new "str" each iteration.
Console.WriteLine(str);
}
Edit: Thank you all for your answers so far. After reading them, I thought I'd expand on my question a bit:
Please correct me if I'm wrong.
When I declare and initialize a reference type like a System.String, I have a pointer to that object, which exists on the stack, and the object's contents, which exist on the heap (only accessible through the pointer).
In the first looping example, it seems like we create only one pointer, "str", and we create 100 instances of the String class, each of which exists on the heap. In my mind, as we iterate through the loop, we are merely changing the "str" pointer to point at a new instance of the String class each time. Those "old" strings that no longer have a pointer to them will be garbage collected--although I'm not sure when that would occur.
In the second looping example, it seems like we create 100 pointers in addition to creating 100 instances of the String class.
I'm not sure what happens to items on the stack that are no longer needed, though. I didn't think the garbage collector got rid of those items too; perhaps they are immediately removed from the stack as soon as you exit their scope? Even if that's true, I'd think that creating only one pointer and updating what it points to is more efficient than creating 100 different pointers, each pointing to a unique instance.
I understand the "premature optimization is evil" argument, but I'm only trying to gain a deeper understanding of things, not optimize my programs to death.
Your second example has a much clearer answer: the second version is the better one. The reason is that the variable str is only used within the for block. Declaring it outside the for block makes it possible for another piece of code to bind to this variable by mistake and cause bugs in your application. You should declare all variables in the most specific scope possible to prevent accidental usage.
For the first sample I believe it's more a matter of preference. I choose to create a new variable because I believe each variable should have a single purpose. If I am reusing variables, it's usually a sign that I need to refactor my method.
This sounds suspiciously like a question designed to provide information for premature optimization. I doubt that either scenario is different in any way that matters in 99.9% of software. The memory is being created and used either way. The only difference is the variable references.
To find out if there is a benefit or drawback, you would need a situation where you really care about the size or the performance of the assembly. If you can't meet the size requirements, then measure the assembly size differences between the two choices (although you're more likely to make gains in other areas). If you can't meet the performance requirements, then use a profiler to see which part of your code is working too slowly.
It is primarily a readability issue. Whether you use the same declared name or not doesn't matter, since either way you are creating two separate objects. You really should create objects or variables with a single focus in mind; it will make your life easier.
As for your second example, the only real difference in initialization is that by placing your string outside the scope of the "for" loop, you are leaving it exposed to more outside influences, which can be useful at times. There is no memory or speed benefit to declaring it inside or outside of the loop. Remember, strings are immutable, so any time you change a string variable, you are essentially creating a new string. So, for example:
string test = "new string";
test = "and now I am reusing the string";
is the same as creating two separate strings, like:
string test1 = "new string";
string test2 = "and now I am reusing the string";
To get around this, you would use the StringBuilder class, which allows you to modify a string without creating a new string, and should be used in situations where a string will be heavily modified, especially inside a loop.
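As a rough illustration of the StringBuilder point above, here is a minimal sketch. It assumes C# 9 or later (top-level statements), and the variable names viaConcat and viaBuilder are made up for the example. Both loops end with the same text; the difference is that the first creates a new string object on every +=, while the second appends into one buffer and only creates a string at ToString().

```csharp
using System;
using System.Text;

// Concatenation: strings are immutable, so each += produces a new string object.
string viaConcat = "";
for (int i = 0; i < 5; i++)
{
    viaConcat += i.ToString();
}

// StringBuilder: appends go into one growable buffer;
// a string object is only created when ToString() is called.
var builder = new StringBuilder();
for (int i = 0; i < 5; i++)
{
    builder.Append(i);
}
string viaBuilder = builder.ToString();

Console.WriteLine(viaConcat);   // 01234
Console.WriteLine(viaBuilder);  // 01234
```

For a handful of concatenations the difference is unlikely to be measurable; it tends to matter only when a string is modified many times.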
What about the "using" statement, which disposes of the object when execution leaves its braces?
I am not sure, but that's what I think. I'd like to know your opinions too.
For the second example, I always use the second version. I am not sure about this either, but in some problems at the ACM-ICPC contest I used to have lots of bugs from forgetting to re-initialize variables, so I adopted this approach.
For the most part, no, there should be no difference. The main difference would be that local variables (in the case of classes, their "pointer") are stored on the stack and in your first case, if your function were for some reason recursive, having two local vars instead of one will cause you to run out of stack space faster in a deep-recursive function. Getting close to that limit in either case would be a sign that you should probably use a non-recursive method.
Also, just to mention it, you could skip the variables altogether and write:
dataTable.Columns.Add(new DataColumn() { ColumnName = "Subsite" });
dataTable.Columns.Add(new DataColumn() { ColumnName = "Library" });
I believe that performance-wise this will be like having two local variables, but I could be wrong there. I can't remember exactly what IL that produces, though.
Related
What is the difference in both and which is recommended and why?
var getDetailsReq = new getTransactionDetailsRequest
{
transId = transactionResponse.Payload.Id
};
var getDetailsCont = new getTransactionDetailsController(getDetailsReq);
vs
var getDetailsCont = new getTransactionDetailsController(new getTransactionDetailsRequest
{
transId = transactionResponse.Payload.Id
});
The first one holds the address of the object in memory, and it will be cleaned up when it is disposed of.
The second one will go untraceable and will be lost somewhere in memory
Does that make sense, or do you have something to correct?
They are functionally equivalent if the reference is not used again by the caller. It's very possible the optimizer will remove the temporary variable anyway if it is not used by the calling scope.
The second one will go untraceable and will be lost somewhere in memory
Well, you pass it to getTransactionDetailsController, so presumably it does something with the reference. Once the garbage collector detects that no objects have a reference to the object, it will be garbage collected (not disposed).
So use whichever one you feel is better - there is no practical guidance that I know of.
Performance wise? Nothing.
Readability-wise? I'd argue that the first one is more readable and maintainable than the second one:
You can set a breakpoint on the appropriate constructor when debugging.
You can easily inspect / watch the values of getDetailsReq.
If you need getDetailsReq somewhere else in your code, use method 1. Otherwise, it shouldn't make a difference
I was having a discussion with a colleague the other day about this hypothetical situation. Consider this pseudocode:
public void Main()
{
MyDto dto = Repository.GetDto();
foreach(var row in dto.Rows)
{
ProcessStrings(row);
}
}
public void ProcessStrings(DataRow row)
{
string string1 = GetStringFromDataRow(row, 1);
string string2 = GetStringFromDataRow(row, 2);
// do something with the strings
}
Then this functionally identical alternative:
public void Main()
{
string string1 = null;
string string2 = null;
MyDto dto = Repository.GetDto();
foreach(var row in dto.Rows)
{
ProcessStrings(row, string1, string2);
}
}
public void ProcessStrings(DataRow row, string string1, string string2)
{
string1 = GetStringFromDataRow(row, 1);
string2 = GetStringFromDataRow(row, 2);
// do something with the strings
}
How will these differ in processing when running the compiled code? Are we right in thinking the second version is marginally more efficient because the string variables will take up less memory and only be disposed once, whereas in the first version, they're disposed of on each pass of the loop?
Would it make any difference if the strings in the second version were passed by ref or as out parameters?
When you're dealing with a "marginally more efficient" level of optimization, you risk not seeing the whole picture and ending up "marginally less efficient".
This answer here risks the same thing, but with that caveat, let's look at the hypothesis:
Storing a string into a variable creates a new instance of the string
No, not at all. A string is an object, what you're storing in the variable is a reference to that object. On 32-bit systems this reference is 4 bytes in size, on 64-bit it is 8. Nothing more, nothing less. Moving 4/8 bytes around is overhead that you're not really going to notice a lot.
So neither of the two examples, given the very little information we have about the methods being called, creates more or fewer strings than the other, so on this count they're equivalent.
So what is different?
Well in one example you're storing the two string references into local variables. This is most likely going to be cpu registers. Could be memory on the stack. Hard to say, depends on the rest of the code. Does it matter? Highly unlikely.
In the other example you're passing in two parameters as null and then reusing those parameters locally. These parameters can be passed as cpu registers or stack memory. Same as the other. Did it matter? Not at all.
So most likely there is going to be absolutely no difference at all.
Note one thing, you're mentioning "disposal". This term is reserved for the usage of objects implementing IDisposable and then the act of disposing of these by calling IDisposable.Dispose on those objects. Strings are not such objects, this is not relevant to this question.
If, instead, by disposal you mean "garbage collection", then since I already established that neither of the two examples creates more or fewer objects than the other due to the differences you asked about, this is also irrelevant.
This is not important, however. It isn't important what you or I or your colleague thinks is going to have an effect. Knowing is quite different, which leads me to...
The real tip I can give about optimization:
Measure
Measure
Measure
Understand
Verify that you understand it correctly
Change, if possible
You measure, use a profiler to find the real bottlenecks and real time spenders in your code, then understand why those are bottlenecks, then ensure your understanding is correct, then you can see if you can change it.
In your code I will venture a guess that if you were to profile your program you would find that those two examples will have absolutely no effect whatsoever on the running time. If they do have effect it is going to be on order of nanoseconds. Most likely, the very act of looking at the profiler results will give you one or more "huh, that's odd" realizations about your program, and you'll find bottlenecks that are far bigger fish than the variables in play here.
In both of your alternatives, GetStringFromDataRow creates a new string every time. Whether you store a reference to this string in a local variable or in a parameter variable (which in your case is not much different from a local variable) does not matter. Imagine you did not assign the result of GetStringFromDataRow to any variable at all: the string instance is still created and stays in memory until it is garbage collected. Passing your strings by reference won't make much difference either. You would be able to reuse the memory location that stores the reference to the created string (you can think of it as the memory address of the string instance), but not the memory location of the string contents.
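To make the by-value versus by-reference point concrete, here is a small sketch. It assumes C# 9 or later (top-level statements); AssignByValue and AssignByRef are hypothetical helpers invented for the illustration, not anything from the question. Without ref, the method only reassigns its own copy of the reference, so the caller's variable stays null, which is what happens to string1 and string2 in the second version of the question.

```csharp
using System;

string outer = null;

// By value: the method receives a copy of the reference.
// Assigning to the parameter changes only that copy.
void AssignByValue(string s) { s = "assigned"; }

// By ref: the parameter is an alias for the caller's variable.
void AssignByRef(ref string s) { s = "assigned"; }

AssignByValue(outer);
bool stillNull = outer == null;
Console.WriteLine(stillNull);   // True

AssignByRef(ref outer);
Console.WriteLine(outer);       // assigned
```

In both calls exactly one reference is copied or aliased; no extra string object is created by the way the variable is passed.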
One of my programming philosophies is to define a variable just before it is used for the first time. Taking the variable 'x' as an example, I usually don't write code like this:
var total =0;
int x;
for(int i=0;i<100000;i++)
{
x = i;
total += x;
}
Instead, I prefer this:
var total = 0;
for(int i=0;i<100000;i++)
{
var x = i;
total += x;
}
This is just example code; don't worry about what it actually computes.
What are the downsides of the second way? Performance?
Don't bother yourself with performance unless you really really need to (hint: 99% of the time you don't need to).
My usual philosophy (which has been confirmed by books like "The Art of Readable Code") is to declare variables in the smallest scope possible. The reason is that, in terms of readability and code comprehension, the fewer variables you have to think about at any one time, the better. Defining variables in a smaller scope definitely helps with that.
Also, if a compiler is able to determine that (in the case of your example) moving the variable outside of the for loop won't change the outcome but will help performance by avoiding creating/destroying it every iteration, it will often do that for you. That's another reason not to bother with performance: the compiler is usually smarter about it than we are.
There are no performance implications, only scope ones. You should always define variables in the innermost scope possible. This improves the readability of your program.
The only "downside" is that the second version needs compiler support. Old compilers needed to know all the variables the function (or a scope inside it) would be using, so you had to declare the variables in a special section (Pascal) or at the beginning of the block (C). This is not really a problem nowadays. C, in its older pre-C99 form, is the only widely used language that does not support declaring variables anywhere.
The problem is that C is the most common first-language they teach in schools and universities. They teach you C, and force you to declare all variables at the beginning of the block. Then they teach you a more modern language, and because you are already used to declaring all variables at the beginning, they need to teach you to not do it.
If your first language allows you to declare a variable anywhere in the function's body, you would instinctively declare it just before you use it, and they wouldn't need to tell you that declaring variables beforehand is bad just like they don't need to tell you that smashing your computer with a 5 Kilo hammer is bad.
I recommend, like most, keeping variables within an inner scope, but exceptions occur, and I think that is what you are looking for.
C++ potentially has expensive constructor/destructor time that would be best paid for once, rather than N times. Compare
void TestPrimacyOfNUnsignedLongs(int n) {
PrimeList List; // Makes a list of all unsigned long primes
for (int i = 0; i<n; i++) {
unsigned long x = random_ul();
if (List.IsAPrime(x)) DoThis();
}
}
or
void TestPrimacyOfNUnsignedLongs(int n) {
for (int i = 0; i<n; i++) {
PrimeList List; // Makes a list of all unsigned long primes
unsigned long x = random_ul();
if (List.IsAPrime(x)) DoThis();
}
}
Certainly, I could put List inside the for loop, but at a significant run time cost.
Having all variables of the same scope declared in one place makes it easier to see what variables you have and what data types they are. You don't have to look through the entire code to find them.
You have different scopes for the x variable. In the second example, you won't be able to use the x variable outside the loop.
What's the best practice for dealing with objects in for or foreach loops? Should we create one object outside the loops and recreate it all over again (using new... ) or create new one for every loop iteration?
Example:
foreach(var a in collection)
{
SomeClass sc = new SomeClass();
sc.id = a;
sc.Insert();
}
or
SomeClass sc = null;
foreach(var a in collection)
{
sc = new SomeClass();
sc.id = a;
sc.Insert();
}
Which is better?
The first way is better as it more clearly conveys the intended scope of the variable and prevents errors from accidentally using an object outside of the intended scope.
One reason for wanting to use the second form is if you want to break out of the loop and still have a reference to the object you last reached in the loop.
A bad reason for choosing the second form is performance. It might seem at first glance that the second method uses fewer resources or that you are only creating one object and reusing it. This isn't the case here. The repeated declaration of a variable inside a loop doesn't consume any extra resources or clock cycles so you don't gain any performance benefit from pulling the declaration outside the loop.
First off, I note that you mean "creating variables" when you say "creating objects". The object references go in the variables, but they are not the variables themselves.
Note that the scenario you describe introduces a semantic difference when the loop contains an anonymous function and the variable is a closed-over outer variable of the anonymous function. See
http://ericlippert.com/2009/11/12/closing-over-the-loop-variable-considered-harmful-part-one/
for details.
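A minimal sketch of that semantic difference, assuming C# 9 or later (top-level statements); the names shared and fresh are invented for the illustration. When the variable is declared outside the loop, every lambda closes over the same variable and sees its final value. When it is declared inside the loop, each iteration gets its own variable.

```csharp
using System;
using System.Collections.Generic;

// Variable declared once, outside the loop: all lambdas capture the same variable.
var sharedActions = new List<Func<int>>();
int shared = 0;
for (int i = 0; i < 3; i++)
{
    shared = i * 10;
    sharedActions.Add(() => shared);
}

// Variable declared inside the loop: each iteration captures a new variable.
var freshActions = new List<Func<int>>();
for (int i = 0; i < 3; i++)
{
    int fresh = i * 10;
    freshActions.Add(() => fresh);
}

var sharedResults = new List<int>();
foreach (var f in sharedActions) sharedResults.Add(f());

var freshResults = new List<int>();
foreach (var f in freshActions) freshResults.Add(f());

Console.WriteLine(string.Join(", ", sharedResults)); // 20, 20, 20
Console.WriteLine(string.Join(", ", freshResults));  // 0, 10, 20
```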
I'm sure someone might whip out the MSIL analysis, but practically there is no discernible difference in execution or performance. The only thing you're affecting is the storage of an object reference.
I say keep it clean and simple; declare the variable inside the loop. This provides the open/closed principle in practice, so you know the scope the variable is used and is not reused elsewhere. On the next loop, the variable loses scope and is reinitialized automatically.
You are creating a new object in each loop iteration in both cases (since you call new SomeClass()).
The former approach makes it clear that sc is only used inside the loop, which might be an advantage from a maintenance point of view.
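A quick way to check the "new object in each iteration" claim is to keep the references and compare them. This is only a sketch, assuming C# 9 or later (top-level statements), with plain object standing in for SomeClass. The variable is declared outside the loop here, yet each pass still produces a distinct object.

```csharp
using System;
using System.Collections.Generic;

var created = new List<object>();
object sc = null;              // declared once, outside the loop
for (int i = 0; i < 3; i++)
{
    sc = new object();         // still a new object on every iteration
    created.Add(sc);
}

// Compare the three references pairwise; no two point at the same object.
bool allDistinct =
    !object.ReferenceEquals(created[0], created[1]) &&
    !object.ReferenceEquals(created[1], created[2]) &&
    !object.ReferenceEquals(created[0], created[2]);

Console.WriteLine(allDistinct); // True
```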
I think it does not matter for performance, but I prefer the first one. I always try to keep declaration and instantiation together if possible.
I would go with option 2 to be tidy, to keep all declarations in one place.
You may say that
"objects should only be declared where and when they are needed"
but your loop would probably be in its own little method.
I would use the first one, but for the compiler it is the same, because the compiler moves variable declarations out of loops. I bet that after compiling, the code would look like the second one.
Which of the following has the best performance?
I have seen method two implemented in JavaScript with huge performance gains. However, I was unable to measure any gain in C#, and I was wondering if the compiler already does method 2 even when the code is written like method 1.
The theory behind method 2 is that the code doesn't have to access DataTable.Rows.Count on every iteration; it can simply access the int c.
Method 1
for (int i = 0; i < DataTable.Rows.Count; i++) {
// Do Something
}
Method 2
for (int i = 0, c = DataTable.Rows.Count; i < c; i++) {
// Do Something
}
No, it can't do that, since there is no way to express that a value is constant over time.
If the compiler should be able to do that, there would have to be a guarantee from the code returning the value that the value is constant, and for the duration of the loop won't change.
But, in this case, you're free to add new rows to the data table as part of your loop, and thus it's up to you to make that guarantee, in the way you have done it.
So in short, the compiler will not do that optimization if the end-index is anything other than a variable.
In the case of a variable, where the compiler can just look at the loop-code and see that this particular variable is not changed, it might do that and load the value into a register before starting the loop, but any performance gain from this would most likely be negligible, unless your loop body is empty.
Conclusion: If you know, or are willing to accept, that the end loop index is constant for the duration of the loop, place it into a variable.
Edit: Re-read your post, and yes, you might see negligible performance gains for your two cases as well, because the JITter optimizes the code. The JITter might optimize your end-index read into a direct access to the variable inside the data table that contains the row count, and a memory read isn't all that expensive anyway. If, on the other hand, reading that property was a very expensive operation, you'd see a more noticeable difference.
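To see how often the loop condition is evaluated in each method, here is a small sketch, assuming C# 9 or later (top-level statements). GetCount is a hypothetical stand-in for a property getter such as DataTable.Rows.Count, with a counter added so the reads can be observed. It shows the language semantics only and says nothing about how much time a real property read costs.

```csharp
using System;

int accesses = 0;

// Hypothetical stand-in for a property getter; counts how often it is read.
int GetCount() { accesses++; return 5; }

// Method 1: the condition re-reads the count before every iteration.
for (int i = 0; i < GetCount(); i++) { }
int method1Accesses = accesses;   // 5 passing checks plus the final failing check

// Method 2: the count is read once, in the loop initializer.
accesses = 0;
for (int i = 0, c = GetCount(); i < c; i++) { }
int method2Accesses = accesses;

Console.WriteLine(method1Accesses); // 6
Console.WriteLine(method2Accesses); // 1
```

Because the stand-in has a visible side effect, it cannot be optimized away here; a trivial getter that just returns a field may well be inlined by the JIT, which fits with the difference being hard to measure.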