I am facing this strange problem with strings.
I assigned a string like this:
string temp = DateTime.UtcNow.ToString("s");
_snapShotTime = string.Copy(temp);
//here threads started....
//while thread progressing I am passing _snapShotTime to create a directory.
//same in second threads.
But the time held in the private variable _snapShotTime keeps changing, and I don't know why. I used a local variable and copied its value into the field.
Thanks
I suspect your thread uses a lambda expression (or anonymous function) which captures _snapShotTime. That would indeed allow it to be changed. It's hard to say for sure without any code though.
If this is the problem, it's typically that you're referring to a captured variable which is declared outside the loop, but changed on every iteration of a loop. You can fix this by declaring a new variable which takes a copy of the original variable inside the loop, and only using that copy variable in the lambda expression. You'll get a "new" variable inside the loop on each iteration, so you won't have problems.
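A minimal sketch of that effect, assuming the field is read inside a lambda that runs later. The class name, the timestamps, and the delegate list standing in for threads are all hypothetical; the delegates are invoked after the loop so the result is deterministic:

```csharp
using System;
using System.Collections.Generic;

public static class SnapshotCaptureDemo
{
    // Hypothetical stand-in for the asker's private field.
    private static string _snapShotTime;

    public static List<string> Run()
    {
        var actions = new List<Func<string>>();
        string[] stamps = { "2010-01-01T00:00:00", "2010-01-01T00:00:01" };

        foreach (string stamp in stamps)
        {
            _snapShotTime = stamp;        // shared field, reassigned on every pass
            string copy = _snapShotTime;  // fresh local variable for this iteration
            // The field is read when the lambda runs; 'copy' keeps this iteration's value.
            actions.Add(() => _snapShotTime + " / " + copy);
        }

        var results = new List<string>();
        foreach (Func<string> action in actions)
            results.Add(action());
        return results;
    }

    public static void Main()
    {
        foreach (string line in Run())
            Console.WriteLine(line);
    }
}
```

The first delegate reports the later timestamp for the field but the earlier one for its copy, which is the behaviour described in the question.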
Strings are immutable, they do not change unless a variable is reassigned to a new string.
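As a small illustration (the class name and values are made up for this sketch), reassigning one variable leaves any other variable that referred to the old string untouched:

```csharp
using System;

public static class StringImmutabilityDemo
{
    public static string[] Run()
    {
        string first = "2010-01-01T00:00:00";
        string second = first;          // both variables refer to the same string object
        first = "2010-01-01T00:00:05";  // rebinds 'first'; the original string is not modified
        return new[] { first, second };
    }

    public static void Main()
    {
        string[] values = Run();
        Console.WriteLine(values[0]);
        Console.WriteLine(values[1]);
    }
}
```

So if the value appears to change, some code is assigning a new string to the variable being read.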
We need to see more code in order to help pinpoint the problem.
Why don't you just do
_snapShotTime = DateTime.UtcNow.ToString("s");
Also, place a breakpoint on that line and see when it is being called.
When it does break, see the stack and it will clarify things.
I suspect that your threads change the value of _snapShotTime
Related
This is a longshot but...
I understand there is a way to tell whether a variable (var1, var2, ..., varX) points to a particular instance of a class by using Equals:
List<string> _mainInstance = new();
var var1 = _mainInstance;
var var2 = _mainInstance;
var var3 = _mainInstance;
//I understand how to tell if variable points to _mainInstance
if (var1.Equals(_mainInstance))
{
//It does because it points by reference
}
//I want to check if any variable still point to _mainInstance
if (!_mainInstance.HasOthersPointingToIt())
{
//Safe to delete _mainInstance
}
I don't know at design time how many pointers to _mainInstance there will be. I want to check every so often whether my _mainInstance has any variables still pointing to it. I thought maybe reflection or garbage collection could help, but all my research there has come up with nothing.
Is there a way to examine a variable (in this case an instance of a class) and tell if anything else (variables, properties of a class instance) still point to it?
Edit:
@GuruStron asks what is the underlying problem I am trying to solve?
Answer: I have a parent/child "tree" that I need to keep track of. That in itself is pretty easy. My difficulty is that the tree has blocks that have a single definition and that definition gets reused.
In the graph below it shows the top tree with stars. The stars represents pointers to block definitions. Below that on the left are the actual block definitions.
The block definition gets substituted and the final tree looks like this:
When a block pointer is deleted I need to make sure that nothing else is pointing to that block definition and delete the block definition when it is no longer needed.
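As far as I know, ordinary C# code has no supported way to ask the runtime which variables still reference an object, so one possible approach is to count uses explicitly. The sketch below is only an assumption about how that could look: BlockRegistry, AddUse, RemoveUse and IsInUse are hypothetical names, and blocks are identified by a string key for simplicity.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical registry; none of these names come from the original post.
public sealed class BlockRegistry
{
    private readonly Dictionary<string, int> _useCounts = new Dictionary<string, int>();

    // Call whenever a tree node starts pointing at the named block definition.
    public void AddUse(string blockName)
    {
        _useCounts.TryGetValue(blockName, out int count); // count is 0 when the key is missing
        _useCounts[blockName] = count + 1;
    }

    // Call when a block pointer is deleted.
    // Returns true when that was the last use, so the definition can be deleted.
    public bool RemoveUse(string blockName)
    {
        if (!_useCounts.TryGetValue(blockName, out int count))
            throw new InvalidOperationException("Unknown block: " + blockName);

        if (count <= 1)
        {
            _useCounts.Remove(blockName);
            return true;
        }

        _useCounts[blockName] = count - 1;
        return false;
    }

    public bool IsInUse(string blockName)
    {
        return _useCounts.ContainsKey(blockName);
    }
}

public static class BlockRegistryDemo
{
    public static void Main()
    {
        var registry = new BlockRegistry();
        registry.AddUse("door");
        registry.AddUse("door");
        Console.WriteLine(registry.RemoveUse("door")); // another pointer still exists
        Console.WriteLine(registry.RemoveUse("door")); // last pointer removed
    }
}
```

The cost of this design is that every place that creates or deletes a block pointer has to go through the registry; in exchange the check is cheap and does not depend on the garbage collector.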
In the following tutorial: http://www.albahari.com/threading/
They say that the following code :
for (int i = 0; i < 10; i++)
new Thread (() => Console.Write (i)).Start();
is non deterministic and can produce the following answer :
0223557799
I thought that when one uses lambda expressions the compiler creates some kind of anonymous class that captures the variables that are in use by creating members like them in the capturing class.
But i is a value type, so I thought that it would be copied by value.
Where is my mistake?
It would be very helpful if the answer explained how a closure works, how it holds a "pointer" to a specific int, and what code is generated in this specific case.
The key point here is that closures close over variables, not over values. As such, the value of a given variable at the time you close over it is irrelevant. What matters is the value of that variable at the time the anonymous method is invoked.
How this happens is easy enough to see when you see what the compiler transforms the closure into. It'll create something morally similar to this:
public class ClosureClass1
{
public int i;
public void AnonymousMethod1()
{
Console.WriteLine(i);
}
}
static void Main(string[] args)
{
ClosureClass1 closure1 = new ClosureClass1();
for (closure1.i = 0; closure1.i < 10; closure1.i++)
new Thread(closure1.AnonymousMethod1).Start();
}
So here we can see a bit more clearly what's going on. There is one copy of the variable, and that variable has now been promoted to a field of a new class, instead of being a local variable. Anywhere that would have modified the local variable now modifies the field of this instance. We can now see why your code prints what it does. After starting the new thread, but before it can actually execute, the for loop in the main thread is going back and incrementing the variable in the closure. The variable that hasn't yet been read by the closure.
To produce the desired result, make sure that each iteration of the loop closes over its own variable instead of every iteration closing over a single shared one:
for (int i = 0; i < 10; i++)
{
int copy = i;
new Thread(() => Console.WriteLine(copy)).Start();
}
Now the copy variable is never changed after it is closed over, and our program will print out 0-9 (although in an arbitrary order, because threads can be scheduled however the OS wants).
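The difference can also be reproduced without threads, which makes the outcome deterministic. This is only an illustrative sketch: the class and method names are hypothetical, and the delegates are stored in a list and invoked after the loop has finished instead of being run on new threads.

```csharp
using System;
using System.Collections.Generic;

public static class LoopCaptureDemo
{
    // Every lambda closes over the single loop variable 'i'.
    public static List<int> SharedVariable()
    {
        var readers = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
            readers.Add(() => i);
        return Invoke(readers);
    }

    // Each lambda closes over its own 'copy' variable.
    public static List<int> CopyPerIteration()
    {
        var readers = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            int copy = i;
            readers.Add(() => copy);
        }
        return Invoke(readers);
    }

    private static List<int> Invoke(List<Func<int>> readers)
    {
        var values = new List<int>();
        foreach (Func<int> read in readers)
            values.Add(read());
        return values;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", SharedVariable()));   // 3,3,3
        Console.WriteLine(string.Join(",", CopyPerIteration())); // 0,1,2
    }
}
```

By the time the first version's delegates run, the loop has ended and the shared variable holds 3, so every delegate returns 3.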
As Albahari states, although the arguments passed are value types, each thread captures the memory location, which leads to unexpected results.
This happens because, before the thread has had time to start, the loop has already changed the value stored in i.
To avoid that, use a temp variable as Albahari suggests, or only capture a variable when you know it is not going to change.
The i in Console.Write(i) is evaluated only when that statement is about to be executed. That happens once the thread has been fully created, has started running, and has reached that code. By then the loop has moved forward a few times, so i can hold any later value. Closures, unlike regular functions, can see the local variables of the function in which they are defined (which is what makes them useful, and also a way to shoot oneself in the foot).
Suppose we have the following code:
void AFunction()
{
foreach(AClass i in AClassCollection)
{
listOfLambdaFunctions.AddLast( () => { PrintLine(i.name); } );
}
}
void Main()
{
AFunction();
foreach( var i in listOfLambdaFunctions)
i();
}
One might think that the above code would output the same as the following:
void Main()
{
foreach(AClass i in AClassCollection)
PrintLine(i.name);
}
However, it doesn't. Instead, it prints the name of the last item in AClassCollection every time.
It appears as if the same item was being used in each lambda function. I suspect there might be some delay from when the lambda was created to when the lambda took a snapshot of the external variables used in it.
Essentially, the lambda is holding a reference to the local variable i, instead of taking a "snapshot" of i's value when the lambda was created.
To test this theory, I tried this code:
string astr = "a string";
AFunc fnc = () => { System.Diagnostics.Debug.WriteLine(astr); };
astr = "changed";
fnc();
and, surprise, it outputs changed!
I am using XNA 3.1, and whichever version of C# that comes with it.
My questions are:
What is going on?
Does the lambda function somehow store a 'reference' to the variable or something?
Is there any way around this problem?
This is a modified closure
See: similar questions like Access to Modified Closure
To work around the issue you have to store a copy of the variable inside the scope of the foreach loop:
foreach(AClass i in AClassCollection)
{
AClass anotherI= i;
listOfLambdaFunctions.AddLast( () => { PrintLine(anotherI.name); } );
}
does the lambda function somehow store a 'reference' to the variable or something?
Close. The lambda function captures the variable itself. There is no need to store a reference to a variable, and in fact, in .NET it is impossible to permanently store a reference to a variable. You just capture the entire variable. You never capture the value of the variable.
Remember, a variable is a storage location. The name "i" refers to a particular storage location, and in your case, it always refers to the same storage location.
Is there any way around this problem?
Yes. Create a new variable every time through the loop. The closure then captures a different variable every time.
This is one of the most frequently reported problems with C#. We're considering changing the semantics of the loop variable declaration so that a new variable is created every time through the loop.
For more details on this issue see my articles on the subject:
http://ericlippert.com/2009/11/12/closing-over-the-loop-variable-considered-harmful-part-one/
what is going on? does the lambda function somehow store a 'reference' to the variable or something?
Yes, exactly that: C# lambdas capture the variable itself, not the value of the variable. You can usually get around this by introducing a temp variable and binding to that:
string astr = "a string";
var tmp = astr;
AFunc fnc = () => { System.Diagnostics.Debug.WriteLine(tmp); };
especially in foreach where this is notorious.
Yes, the lambda stores a reference to the variable (conceptually speaking, anyway).
A very simple workaround is this:
foreach(AClass i in AClassCollection)
{
AClass j = i;
listOfLambdaFunctions.AddLast( () => { PrintLine(j.name); } );
}
In every iteration of the foreach loop, a new j gets created, which the lambda captures.
i on the other hand, is the same variable throughout, but gets updated with every iteration (so all the lambdas end up seeing the last value)
And I agree that this is a bit surprising. :)
I've been caught by this one as well. As Calgary Coder said, it is a modified closure. I really had trouble spotting them until I got ReSharper. Since it is one of the warnings that ReSharper watches for, I am now much better at identifying them as I code.
What's the best practice for dealing with objects in for or foreach loops? Should we create one object outside the loops and recreate it all over again (using new... ) or create new one for every loop iteration?
Example:
foreach(var a in collection)
{
SomeClass sc = new SomeClass();
sc.id = a;
sc.Insert();
}
or
SomeClass sc = null;
foreach(var a in collection)
{
sc = new SomeClass();
sc.id = a;
sc.Insert();
}
Which is better?
The first way is better as it more clearly conveys the intended scope of the variable and prevents errors from accidentally using an object outside of the intended scope.
One reason for wanting to use the second form is if you want to break out of the loop and still have a reference to the object you last reached in the loop.
A bad reason for choosing the second form is performance. It might seem at first glance that the second method uses fewer resources or that you are only creating one object and reusing it. This isn't the case here. The repeated declaration of a variable inside a loop doesn't consume any extra resources or clock cycles so you don't gain any performance benefit from pulling the declaration outside the loop.
First off, I note that you mean "creating variables" when you say "creating objects". The object references go in the variables, but they are not the variables themselves.
Note that the scenario you describe introduces a semantic difference when the loop contains an anonymous function and the variable is a closed-over outer variable of the anonymous function. See
http://ericlippert.com/2009/11/12/closing-over-the-loop-variable-considered-harmful-part-one/
for details.
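A short sketch of that semantic difference, using hypothetical names and plain int values in place of SomeClass. The lambdas are invoked only after the loop completes:

```csharp
using System;
using System.Collections.Generic;

public static class DeclarationScopeDemo
{
    // Variable declared once, outside the loop: all lambdas share it.
    public static List<int> DeclaredOutside(int[] ids)
    {
        var readers = new List<Func<int>>();
        int current = 0;
        foreach (int id in ids)
        {
            current = id;
            readers.Add(() => current);
        }
        return Invoke(readers);
    }

    // Variable declared inside the loop body: a new variable on each iteration.
    public static List<int> DeclaredInside(int[] ids)
    {
        var readers = new List<Func<int>>();
        foreach (int id in ids)
        {
            int current = id;
            readers.Add(() => current);
        }
        return Invoke(readers);
    }

    private static List<int> Invoke(List<Func<int>> readers)
    {
        var values = new List<int>();
        foreach (Func<int> read in readers)
            values.Add(read());
        return values;
    }

    public static void Main()
    {
        int[] ids = { 1, 2, 3 };
        Console.WriteLine(string.Join(",", DeclaredOutside(ids))); // 3,3,3
        Console.WriteLine(string.Join(",", DeclaredInside(ids)));  // 1,2,3
    }
}
```

Without a lambda in the loop the two placements behave the same; the difference only appears once the variable is captured.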
I'm sure someone might whip out the MSIL analysis, but practically there is no discernible difference in execution or performance. The only thing you're affecting is the storage of an object reference.
I say keep it clean and simple; declare the variable inside the loop. This provides the open/closed principle in practice, so you know the scope the variable is used and is not reused elsewhere. On the next loop, the variable loses scope and is reinitialized automatically.
You are creating a new object in each loop iteration in both cases (since you call new SomeClass()).
The former approach makes it clear that sc is only used inside the loop, which might be an advantage from a maintenance point of view.
I think it does not matter for performance, but I prefer the first one. I always try to keep declaration and instantiation together if possible.
I would go with option 2 to be tidy, to keep all declarations in one place.
You may say that
"objects should only be declared where and when they are needed"
but your loop would probably be in its own little method.
I would use the first one, but for the compiler it is the same, because the compiler moves variable declarations out of loops. I bet that after compiling, the code would look like the second one.
In C#, is there a benefit or drawback to re-initializing a previously declared variable instead of declaring and initializing a new one? (Ignoring thoughts on conciseness and human readability.)
For example, compare these two samples:
DataColumn col = new DataColumn();
col.ColumnName = "Subsite";
dataTable.Columns.Add(col);
col = new DataColumn(); // Re-use the "col" declaration.
col.ColumnName = "Library";
dataTable.Columns.Add(col);
vs
DataColumn col1 = new DataColumn();
col1.ColumnName = "Subsite";
dataTable.Columns.Add(col1);
DataColumn col2 = new DataColumn(); // Declare a new variable instead.
col2.ColumnName = "Library";
dataTable.Columns.Add(col2);
A similar example involving loops:
string str;
for (int i = 0; i < 100; i++)
{
str = "This is string #" + i.ToString(); // Re-initialize the "str" variable.
Console.WriteLine(str);
}
vs
for (int i = 0; i < 100; i++)
{
string str = "This is string #" + i.ToString(); // Declare a new "str" each iteration.
Console.WriteLine(str);
}
Edit: Thank you all for your answers so far. After reading them, I thought I'd expand on my question a bit:
Please correct me if I'm wrong.
When I declare and initialize a reference type like a System.String, I have a pointer to that object, which exists on the stack, and the object's contents, which exist on the heap (only accessible through the pointer).
In the first looping example, it seems like we create only one pointer, "str", and we create 100 instances of the String class, each of which exists on the heap. In my mind, as we iterate through the loop, we are merely changing the "str" pointer to point at a new instance of the String class each time. Those "old" strings that no longer have a pointer to them will be garbage collected--although I'm not sure when that would occur.
In the second looping example, it seems like we create 100 pointers in addition to creating 100 instances of the String class.
I'm not sure what happens to items on the stack that are no longer needed, though. I didn't think the garbage collector got rid of those items too; perhaps they are immediately removed from the stack as soon as you exit their scope? Even if that's true, I'd think that creating only one pointer and updating what it points to is more efficient than creating 100 different pointers, each pointing to a unique instance.
I understand the "premature optimization is evil" argument, but I'm only trying to gain a deeper understanding of things, not optimize my programs to death.
Your second example has a much clearer answer: the second version is the better one. The reason is that the variable str is only used within the for block. Declaring the variable outside the for block means that it's possible for another piece of code to incorrectly bind to this variable and hence cause bugs in your application. You should declare all variables in the most specific scope possible to prevent accidental usage.
For the first sample I believe it's more a matter of preference. For me, I chose to create a new variable because I believe each variable should have a single purpose. If I am reusing variables it's usually a sign that I need to refactor my method.
This sounds suspiciously like a question designed to provide information for premature optimization. I doubt that either scenario is different in any way that matters in 99.9% of software. The memory is being created and used either way. The only difference is the variable references.
To find out if there is a benefit or drawback, you would need a situation where you really care about the size or the performance of the assembly. If you can't meet the size requirements, then measure the assembly size differences between the two choices (although you're more likely to make gains in other areas). If you can't meet the performance requirements, then use a profiler to see which part of your code is working too slowly.
It is primarily a readability issue. Whether you use the same declared name or not doesn't matter, since either way you are creating two separate objects. You really should create objects and variables with a single focus in mind; it will make your life easier.
As for your second example, the only real difference in initialization is that by placing your string outside the scope of the "for" loop, you are leaving it exposed to more outside influences, which can be useful at times. There is no memory or speed benefit for declaring it inside or outside of the loop. Remember, anytime you make a change to a string variable, you are essentially creating a new string. So, for example:
string test = "new string";
test = "and now I am reusing the string";
is the same as creating two separate strings, like:
string test1 = "new string";
string test2 = "and now I am reusing the string";
To get around this, you would use the StringBuilder class, which allows you to modify a string without creating a new string, and should be used in situations where a string will be heavily modified, especially inside a loop.
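A minimal sketch of that approach, with a hypothetical helper that builds one string inside a loop by appending to a single StringBuilder and creating the final string once at the end:

```csharp
using System;
using System.Text;

public static class StringBuilderDemo
{
    // Builds "#0, #1, ..." by appending to one buffer.
    public static string Build(int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            if (i > 0)
                sb.Append(", ");
            sb.Append('#').Append(i);
        }
        return sb.ToString(); // the only point where the final string is created
    }

    public static void Main()
    {
        Console.WriteLine(Build(3)); // #0, #1, #2
    }
}
```

For a handful of concatenations the difference is unlikely to matter; the benefit shows up when a string is modified many times.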
What about the "using" keyword, which disposes of the object when the block ends?
I am not sure but that's what I think. I'd like to know your opinions too.
For the second example, I always use the second form. I am not sure either, but in some problems at the ACM-ICPC contest I used to have lots of bugs from forgetting to re-initialize, so I adopted this approach.
For the most part, no, there should be no difference. The main difference would be that local variables (in the case of classes, their "pointer") are stored on the stack and in your first case, if your function were for some reason recursive, having two local vars instead of one will cause you to run out of stack space faster in a deep-recursive function. Getting close to that limit in either case would be a sign that you should probably use a non-recursive method.
Also, just to mention it, you could skip the variables altogether and write:
dataTable.Columns.Add(new DataColumn() { ColumnName = "Subsite" });
dataTable.Columns.Add(new DataColumn() { ColumnName = "Library" });
Which I believe will perform like having two local variables, but I could be wrong there. I can't remember exactly what that produces in IL code though.