I am coding a function that will return a 2D array. This leds me to think about the implications of this in C# (with garbage collection) and in C++(no GC)
(Why in both-you may ask: I am now writing it on a windows platform with C# but in some months I will implement my algorithms on an embedded device using C++)
So basically I have a 2D array say table, and through a function I assigned to it the return value. My question is: what happen to the original piece of memory that hold the original table??
Now to the code:
In C#
using System;
public class Test
{
public static void Main()
{
int[,] table= new int [10,10]; //Here some memory is separated for table
int[,] table= createTable(10,10); //Here the return value of createTable is assigned to the original value
//WHAT HAPPENED TO THE ORIGINAL MEMORY THAT table HAD?
printTable(table,10,10); //disregard this. Not that important
}
public static int[,] createTable(int rows, int columns)
{
int[,] t = new int[rows,columns];
for(int i=0;i<rows;i++)
for(int j=0;j<columns;j++)
t[i,j]=(i+j);
return t;
}
public static void printTable(int[,]t, int rows, int columns)
{
for(int i=0;i<rows;i++)
for(int j=0;j<columns;j++)
Console.WriteLine(t[i,j]);
foreach( var im in t)
Console.WriteLine(im);
}
}
(Please don't tell me that the first new int is not necessary, etc. It is necessary for the question, it could be replaced by calling createTable twice)
I am guessing that in C#, the garbage collector takes care of this and I don't have to worry?
Now in C++
#include <cstdio>
#include <cstdlib>
int** createTable(int rows, int columns){
int** table = new int*[rows];
for(int i = 0; i < rows; i++) {
table[i] = new int[columns];
for(int j = 0; j < columns; j++){ table[i][j] = (i+j); }// sample set value;
}
return table;
}
void freeTable(int** table, int rows){
if(table){
for(int i = 0; i < rows; i++){ if(table[i]){ delete[] table[i]; } }
delete[] table;
}
}
void printTable(int** table, int rows, int columns){
for(int i = 0; i < rows; i++){
for(int j = 0; j < columns; j++){
printf("(%d,%d) -> %d\n", i, j, table[i][j]);
}
}
}
int main(int argc, char** argv){
int** table = createTable(5, 5);
table = createTable(10,10);
printTable(table, 10, 10);
freeTable(table, 10);
return 0;
}
What happened to the original memory hold by table (5x5). Does it create a memory leak when table get assigned the 10x10 table?? how do I avoid this?
In C# the garbage collector will take care of memory that isn't reachable any more.
I compiled your C++ program with gcc (Version: (Debian 6.4.0-1) 6.4.0 20170704
) on Debian Buster.
valgrind which is a very useful tool for checking for various kinds of memory problems, gave me the following output with valgrind --leak-check=full ./test :
Memcheck, a memory error detector
Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
Command: ./test
[... program output ...]
HEAP SUMMARY:
in use at exit: 140 bytes in 6 blocks
total heap usage: 19 allocs, 13 frees, 74,348 bytes allocated
140 (40 direct, 100 indirect) bytes in 1 blocks are definitely lost in loss record 2 of 2
at 0x4C2C97F: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x10883E: createTable(int, int) (in /home/user/code/test/test)
by 0x108A26: main (in /home/user/code/test/test)
LEAK SUMMARY:
definitely lost: 40 bytes in 1 blocks
indirectly lost: 100 bytes in 5 blocks
possibly lost: 0 bytes in 0 blocks
still reachable: 0 bytes in 0 blocks
suppressed: 0 bytes in 0 blocks
For counts of detected and suppressed errors, rerun with: -v
ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
As you can see, you are definitely loosing memory.
Either call your freeTable() function first and then reassign your pointer or use smart pointers to get around this problem.
For C++ part, yes, you create memory leak.you have to call freeTable:
int** table = createTable(5, 5);
freeTable(table, 5);
table = createTable(10,10);
printTable(table, 10, 10);
freeTable(table, 10);
Better to use Raii object as vector<vector<int>> (or dedicated class Matrix) to avoid to worry about memory deallocation (and you can add size information in class too).
So it would simply be:
auto table = createTable(5, 5);
table = createTable(10,10);
printTable(table);
At this line:
//WHAT HAPPENED TO THE ORIGINAL MEMORY THAT table HAD?
You're overwriting its reference. Since the original table is not referenced anymore, garbage collection will clean up the used space once it runs.
I can't answer your C++ regarding question, sorry for that.
I've recently ran a benchmark to see whether access times are less for a variable that is declared at the end of a block of variable declarations or after.
Benchmark code (selected variable declared at end of block),
// Benchmark 1
for (long i = 0; i < 6000000000; i++)
{
var a1 = 0;
var b1 = 0;
var c1 = 0;
// 53 variables later...
var x2 = 0;
var y2 = 0;
var z2 = 0;
z2 = 1; // Write
var _ = z2; // Read
}
Benchmark code (selected variable declared at start of block),
// Benchmark 2
for (long i = 0; i < 6000000000; i++)
{
var a1 = 0;
var b1 = 0;
var c1 = 0;
// 53 variables later...
var x2 = 0;
var y2 = 0;
var z2 = 0;
a1 = 1; // Write
var _ = a1; // Read
}
To my surprise the results (averaged over 3 runs, excluding first build and without optimizations) are as follows,
Benchmark 1: 9,419.7 milliseconds.
Benchmark 2: 12,262 milliseconds.
As you can see accessing the "newer" variable in the above benchmark is 23.18% (2842.3 ms) faster, but why?
Normally, unused locals are deleted by optimizations in basically any optimizing compiler in the world. You are only writing to most variables. This is an easy case for deletion of their physical storage.
The relation between logical locals and their physical storage is highly complex. They might be deleted, enregistered or spilled.
So don't think that var _ = a1; actually result in a read from a1 and a write to _. It does nothing.
The JIT switches off a few optimizations in functions with many (I believe 64) local variables because some algorithms have quadratic running time in the number of locals. Maybe that's why those locals impact performance.
Try it with fewer variables and you will not be able to distinguish variations of this function from one another.
Or, try it with VC++, GCC or Clang. They all should delete the entire loop. I'd be very disappointed if they didn't.
I don't think you are measuring something relevant here. Whatever the result of your benchmark - it helps you nothing with real-world code. If this was an interesting case I'd look at the disassembly but as I said I think this is irrelevant. Whatever I would find it would not be an interesting find.
If you want to learn what code a compiler typically generates you should probably write some simple functions and look at the generated machine code. This can be very instructional.
Trying to think in assembler/closer to hardware, it might be something like this:
In the faster version, you still have the address of the previously accessed variable z2 stored in the current register, which then directly can be used again without needing to change its contents (=recalculate the correct memory address) to do the write and read.
It could be an automatic optimization done by the interpreter/compiler.
Have you tried other variables instead of z2 for your W/R test at the end of the loop?
What happens if you use x2 or y2 or even any of the other variables in the middle?
Are the access times for all the variables other than z2 equal or do they differ as well?
I have always wondered if, in general, declaring a throw-away variable before a loop, as opposed to repeatedly inside the loop, makes any (performance) difference?
A (quite pointless) example in Java:
a) declaration before loop:
double intermediateResult;
for(int i=0; i < 1000; i++){
intermediateResult = i;
System.out.println(intermediateResult);
}
b) declaration (repeatedly) inside loop:
for(int i=0; i < 1000; i++){
double intermediateResult = i;
System.out.println(intermediateResult);
}
Which one is better, a or b?
I suspect that repeated variable declaration (example b) creates more overhead in theory, but that compilers are smart enough so that it doesn't matter. Example b has the advantage of being more compact and limiting the scope of the variable to where it is used. Still, I tend to code according example a.
Edit: I am especially interested in the Java case.
Which is better, a or b?
From a performance perspective, you'd have to measure it. (And in my opinion, if you can measure a difference, the compiler isn't very good).
From a maintenance perspective, b is better. Declare and initialize variables in the same place, in the narrowest scope possible. Don't leave a gaping hole between the declaration and the initialization, and don't pollute namespaces you don't need to.
Well I ran your A and B examples 20 times each, looping 100 million times.(JVM - 1.5.0)
A: average execution time: .074 sec
B: average execution time : .067 sec
To my surprise B was slightly faster.
As fast as computers are now its hard to say if you could accurately measure this.
I would code it the A way as well but I would say it doesn't really matter.
It depends on the language and the exact use. For instance, in C# 1 it made no difference. In C# 2, if the local variable is captured by an anonymous method (or lambda expression in C# 3) it can make a very signficant difference.
Example:
using System;
using System.Collections.Generic;
class Test
{
static void Main()
{
List<Action> actions = new List<Action>();
int outer;
for (int i=0; i < 10; i++)
{
outer = i;
int inner = i;
actions.Add(() => Console.WriteLine("Inner={0}, Outer={1}", inner, outer));
}
foreach (Action action in actions)
{
action();
}
}
}
Output:
Inner=0, Outer=9
Inner=1, Outer=9
Inner=2, Outer=9
Inner=3, Outer=9
Inner=4, Outer=9
Inner=5, Outer=9
Inner=6, Outer=9
Inner=7, Outer=9
Inner=8, Outer=9
Inner=9, Outer=9
The difference is that all of the actions capture the same outer variable, but each has its own separate inner variable.
The following is what I wrote and compiled in .NET.
double r0;
for (int i = 0; i < 1000; i++) {
r0 = i*i;
Console.WriteLine(r0);
}
for (int j = 0; j < 1000; j++) {
double r1 = j*j;
Console.WriteLine(r1);
}
This is what I get from .NET Reflector when CIL is rendered back into code.
for (int i = 0; i < 0x3e8; i++)
{
double r0 = i * i;
Console.WriteLine(r0);
}
for (int j = 0; j < 0x3e8; j++)
{
double r1 = j * j;
Console.WriteLine(r1);
}
So both look exactly same after compilation. In managed languages code is converted into CL/byte code and at time of execution it's converted into machine language. So in machine language a double may not even be created on the stack. It may just be a register as code reflect that it is a temporary variable for WriteLine function. There are a whole set optimization rules just for loops. So the average guy shouldn't be worried about it, especially in managed languages. There are cases when you can optimize manage code, for example, if you have to concatenate a large number of strings using just string a; a+=anotherstring[i] vs using StringBuilder. There is very big difference in performance between both. There are a lot of such cases where the compiler cannot optimize your code, because it cannot figure out what is intended in a bigger scope. But it can pretty much optimize basic things for you.
This is a gotcha in VB.NET. The Visual Basic result won't reinitialize the variable in this example:
For i as Integer = 1 to 100
Dim j as Integer
Console.WriteLine(j)
j = i
Next
' Output: 0 1 2 3 4...
This will print 0 the first time (Visual Basic variables have default values when declared!) but i each time after that.
If you add a = 0, though, you get what you might expect:
For i as Integer = 1 to 100
Dim j as Integer = 0
Console.WriteLine(j)
j = i
Next
'Output: 0 0 0 0 0...
I made a simple test:
int b;
for (int i = 0; i < 10; i++) {
b = i;
}
vs
for (int i = 0; i < 10; i++) {
int b = i;
}
I compiled these codes with gcc - 5.2.0. And then I disassembled the main ()
of these two codes and that's the result:
1º:
0x00000000004004b6 <+0>: push rbp
0x00000000004004b7 <+1>: mov rbp,rsp
0x00000000004004ba <+4>: mov DWORD PTR [rbp-0x4],0x0
0x00000000004004c1 <+11>: jmp 0x4004cd <main+23>
0x00000000004004c3 <+13>: mov eax,DWORD PTR [rbp-0x4]
0x00000000004004c6 <+16>: mov DWORD PTR [rbp-0x8],eax
0x00000000004004c9 <+19>: add DWORD PTR [rbp-0x4],0x1
0x00000000004004cd <+23>: cmp DWORD PTR [rbp-0x4],0x9
0x00000000004004d1 <+27>: jle 0x4004c3 <main+13>
0x00000000004004d3 <+29>: mov eax,0x0
0x00000000004004d8 <+34>: pop rbp
0x00000000004004d9 <+35>: ret
vs
2º
0x00000000004004b6 <+0>: push rbp
0x00000000004004b7 <+1>: mov rbp,rsp
0x00000000004004ba <+4>: mov DWORD PTR [rbp-0x4],0x0
0x00000000004004c1 <+11>: jmp 0x4004cd <main+23>
0x00000000004004c3 <+13>: mov eax,DWORD PTR [rbp-0x4]
0x00000000004004c6 <+16>: mov DWORD PTR [rbp-0x8],eax
0x00000000004004c9 <+19>: add DWORD PTR [rbp-0x4],0x1
0x00000000004004cd <+23>: cmp DWORD PTR [rbp-0x4],0x9
0x00000000004004d1 <+27>: jle 0x4004c3 <main+13>
0x00000000004004d3 <+29>: mov eax,0x0
0x00000000004004d8 <+34>: pop rbp
0x00000000004004d9 <+35>: ret
Which are exaclty the same asm result. isn't a proof that the two codes produce the same thing?
It is language dependent - IIRC C# optimises this, so there isn't any difference, but JavaScript (for example) will do the whole memory allocation shebang each time.
I would always use A (rather than relying on the compiler) and might also rewrite to:
for(int i=0, double intermediateResult=0; i<1000; i++){
intermediateResult = i;
System.out.println(intermediateResult);
}
This still restricts intermediateResult to the loop's scope, but doesn't redeclare during each iteration.
In my opinion, b is the better structure. In a, the last value of intermediateResult sticks around after your loop is finished.
Edit:
This doesn't make a lot of difference with value types, but reference types can be somewhat weighty. Personally, I like variables to be dereferenced as soon as possible for cleanup, and b does that for you,
I suspect a few compilers could optimize both to be the same code, but certainly not all. So I'd say you're better off with the former. The only reason for the latter is if you want to ensure that the declared variable is used only within your loop.
As a general rule, I declare my variables in the inner-most possible scope. So, if you're not using intermediateResult outside of the loop, then I'd go with B.
A co-worker prefers the first form, telling it is an optimization, preferring to re-use a declaration.
I prefer the second one (and try to persuade my co-worker! ;-)), having read that:
It reduces scope of variables to where they are needed, which is a good thing.
Java optimizes enough to make no significant difference in performance. IIRC, perhaps the second form is even faster.
Anyway, it falls in the category of premature optimization that rely in quality of compiler and/or JVM.
There is a difference in C# if you are using the variable in a lambda, etc. But in general the compiler will basically do the same thing, assuming the variable is only used within the loop.
Given that they are basically the same: Note that version b makes it much more obvious to readers that the variable isn't, and can't, be used after the loop. Additionally, version b is much more easily refactored. It is more difficult to extract the loop body into its own method in version a. Moreover, version b assures you that there is no side effect to such a refactoring.
Hence, version a annoys me to no end, because there's no benefit to it and it makes it much more difficult to reason about the code...
Well, you could always make a scope for that:
{ //Or if(true) if the language doesn't support making scopes like this
double intermediateResult;
for (int i=0; i<1000; i++) {
intermediateResult = i;
System.out.println(intermediateResult);
}
}
This way you only declare the variable once, and it'll die when you leave the loop.
I think it depends on the compiler and is hard to give a general answer.
I've always thought that if you declare your variables inside of your loop then you're wasting memory. If you have something like this:
for(;;) {
Object o = new Object();
}
Then not only does the object need to be created for each iteration, but there needs to be a new reference allocated for each object. It seems that if the garbage collector is slow then you'll have a bunch of dangling references that need to be cleaned up.
However, if you have this:
Object o;
for(;;) {
o = new Object();
}
Then you're only creating a single reference and assigning a new object to it each time. Sure, it might take a bit longer for it to go out of scope, but then there's only one dangling reference to deal with.
My practice is following:
if type of variable is simple (int, double, ...) I prefer variant b (inside).
Reason: reducing scope of variable.
if type of variable is not simple (some kind of class or struct) I prefer variant a (outside).
Reason: reducing number of ctor-dtor calls.
I had this very same question for a long time. So I tested an even simpler piece of code.
Conclusion: For such cases there is NO performance difference.
Outside loop case
int intermediateResult;
for(int i=0; i < 1000; i++){
intermediateResult = i+2;
System.out.println(intermediateResult);
}
Inside loop case
for(int i=0; i < 1000; i++){
int intermediateResult = i+2;
System.out.println(intermediateResult);
}
I checked the compiled file on IntelliJ's decompiler and for both cases, I got the same Test.class
for(int i = 0; i < 1000; ++i) {
int intermediateResult = i + 2;
System.out.println(intermediateResult);
}
I also disassembled code for both the case using the method given in this answer. I'll show only the parts relevant to the answer
Outside loop case
Code:
stack=2, locals=3, args_size=1
0: iconst_0
1: istore_2
2: iload_2
3: sipush 1000
6: if_icmpge 26
9: iload_2
10: iconst_2
11: iadd
12: istore_1
13: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
16: iload_1
17: invokevirtual #3 // Method java/io/PrintStream.println:(I)V
20: iinc 2, 1
23: goto 2
26: return
LocalVariableTable:
Start Length Slot Name Signature
13 13 1 intermediateResult I
2 24 2 i I
0 27 0 args [Ljava/lang/String;
Inside loop case
Code:
stack=2, locals=3, args_size=1
0: iconst_0
1: istore_1
2: iload_1
3: sipush 1000
6: if_icmpge 26
9: iload_1
10: iconst_2
11: iadd
12: istore_2
13: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
16: iload_2
17: invokevirtual #3 // Method java/io/PrintStream.println:(I)V
20: iinc 1, 1
23: goto 2
26: return
LocalVariableTable:
Start Length Slot Name Signature
13 7 2 intermediateResult I
2 24 1 i I
0 27 0 args [Ljava/lang/String;
If you pay close attention, only the Slot assigned to i and intermediateResult in LocalVariableTable is swapped as a product of their order of appearance. The same difference in slot is reflected in other lines of code.
No extra operation is being performed
intermediateResult is still a local variable in both cases, so there is no difference access time.
BONUS
Compilers do a ton of optimization, take a look at what happens in this case.
Zero work case
for(int i=0; i < 1000; i++){
int intermediateResult = i;
System.out.println(intermediateResult);
}
Zero work decompiled
for(int i = 0; i < 1000; ++i) {
System.out.println(i);
}
From a performance perspective, outside is (much) better.
public static void outside() {
double intermediateResult;
for(int i=0; i < Integer.MAX_VALUE; i++){
intermediateResult = i;
}
}
public static void inside() {
for(int i=0; i < Integer.MAX_VALUE; i++){
double intermediateResult = i;
}
}
I executed both functions 1 billion times each.
outside() took 65 milliseconds. inside() took 1.5 seconds.
I tested for JS with Node 4.0.0 if anyone is interested. Declaring outside the loop resulted in a ~.5 ms performance improvement on average over 1000 trials with 100 million loop iterations per trial. So I'm gonna say go ahead and write it in the most readable / maintainable way which is B, imo. I would put my code in a fiddle, but I used the performance-now Node module. Here's the code:
var now = require("../node_modules/performance-now")
// declare vars inside loop
function varInside(){
for(var i = 0; i < 100000000; i++){
var temp = i;
var temp2 = i + 1;
var temp3 = i + 2;
}
}
// declare vars outside loop
function varOutside(){
var temp;
var temp2;
var temp3;
for(var i = 0; i < 100000000; i++){
temp = i
temp2 = i + 1
temp3 = i + 2
}
}
// for computing average execution times
var insideAvg = 0;
var outsideAvg = 0;
// run varInside a million times and average execution times
for(var i = 0; i < 1000; i++){
var start = now()
varInside()
var end = now()
insideAvg = (insideAvg + (end-start)) / 2
}
// run varOutside a million times and average execution times
for(var i = 0; i < 1000; i++){
var start = now()
varOutside()
var end = now()
outsideAvg = (outsideAvg + (end-start)) / 2
}
console.log('declared inside loop', insideAvg)
console.log('declared outside loop', outsideAvg)
A) is a safe bet than B).........Imagine if you are initializing structure in loop rather than 'int' or 'float' then what?
like
typedef struct loop_example{
JXTZ hi; // where JXTZ could be another type...say closed source lib
// you include in Makefile
}loop_example_struct;
//then....
int j = 0; // declare here or face c99 error if in loop - depends on compiler setting
for ( ;j++; )
{
loop_example loop_object; // guess the result in memory heap?
}
You are certainly bound to face problems with memory leaks!. Hence I believe 'A' is safer bet while 'B' is vulnerable to memory accumulation esp working close source libraries.You can check usinng 'Valgrind' Tool on Linux specifically sub tool 'Helgrind'.
It's an interesting question. From my experience there is an ultimate question to consider when you debate this matter for a code:
Is there any reason why the variable would need to be global?
It makes sense to only declare the variable once, globally, as opposed to many times locally, because it is better for organizing the code and requires less lines of code. However, if it only needs to be declared locally within one method, I would initialize it in that method so it is clear that the variable is exclusively relevant to that method. Be careful not to call this variable outside the method in which it is initialized if you choose the latter option--your code won't know what you're talking about and will report an error.
Also, as a side note, don't duplicate local variable names between different methods even if their purposes are near-identical; it just gets confusing.
this is the better form
double intermediateResult;
int i = byte.MinValue;
for(; i < 1000; i++)
{
intermediateResult = i;
System.out.println(intermediateResult);
}
1) in this way declared once time both variable, and not each for cycle.
2) the assignment it's fatser thean all other option.
3) So the bestpractice rule is any declaration outside the iteration for.
Tried the same thing in Go, and compared the compiler output using go tool compile -S with go 1.9.4
Zero difference, as per the assembler output.
I use (A) when I want to see the contents of the variable after exiting the loop. It only matters for debugging. I use (B) when I want the code more compact, since it saves one line of code.
Even if I know my compiler is smart enough, I won't like to rely on it, and will use the a) variant.
The b) variant makes sense to me only if you desperately need to make the intermediateResult unavailable after the loop body. But I can't imagine such desperate situation, anyway....
EDIT: Jon Skeet made a very good point, showing that variable declaration inside a loop can make an actual semantic difference.
I'm making a small game, and this has a lot of loops, which all use a certain variable adjacentSquares. After every loop however, this should be set to 0. What would be faster, creating this variable again every time or just setting it to 0? Is there maybe a certain 'exotic' approach, that will perform even better?
The associated (unfinished) code:
void Update ()
{
int adjacentSquares = 0;
for (int x = 0; x <= gridX; x++)
{
for (int y = 0; y <= gridY; y++)
{
if (grid[x - 1,y - 1] == true)
adjacentSquares += 1;
//and some more logic
}
}
}
Why not experiment and measure the time elapsed using the System.Diagnostics.Stopwatch class? http://msdn.microsoft.com/en-us/library/system.diagnostics.stopwatch.aspx
Set up a Stopwatch object before that loop and then measure elapsed time after it. Then, report back with your findings :D
The real answer here is: try it out and see!
But, I would not expect there to be a difference in speed. If anything, you're stack will use 4 bytes more memory (per variable), but even that is not a guarantee. There's a good change that (if there is a performance benefit here) either the C# compiler or the JIT compiler will recognized that the first variable is no longer used, so it will simply use that same memory for the subsequent variables. But I'll echo what I said before: run some tests - that's the only true answer to your question.
If you really want to improve performance here you could look at doing a parallel solution here, depending on if each individual calculation relies on all the previous ones you have done.
You can probably even do this with LINQ depending on the "some more logic" you are doing.
Just for improving a little bit more the performance:
void Update ()
{
int adjacentSquares = 0;
for (int x = -1; x < gridX; x++)
{
for (int y = -1; y < gridY; y++)
{
if (grid[x, y])
adjacentSquares++;
//and some more logic
}
}
}
I don't know exactly why you need to start from -1 (0 - 1), but if you have, then put that on the for instead of executing the same each time.