How does string works in c#? [closed] - c#

I know strings are immutable: once created, they cannot be changed. I've read that if we create a string, assign a value to it, and then assign another value to the same variable, internally another object is actually created and assigned the new value. Let's say I have:
string str = "dog";
str = "cat";
If I write Console.WriteLine(str); it prints cat.
So internally there are two objects? But they have the same name? How does it work? I've done some research on Google but haven't yet found anything convincing enough to clarify my thoughts about this.
I know strings are reference types, so we have an object on the stack with a reference to a value on the heap. What's happening in this case (see code above)?
I've uploaded a picture; apologies if my idea of the stack and the heap is wrong, that's why I'm asking this question.
Does the picture reflect what happens in the first line of code (string str = "dog";)? And what happens in the second line of code? Does the dog value in the heap change? Is a new object then created on the stack referencing it? What happens to the object that was there before? Do they have the same name?
I'm sorry for so many questions, but I think it is very important to understand this correctly and to know what's happening behind the scenes...

When you assign str to "dog", it does as you describe above in memory: the reference variable str now is "pointing at" the location of the string you've just instantiated.
str => MEMORY LOCATION "1": "dog"
MEMORY LOCATION "2":
MEMORY LOCATION "3":
When str is reassigned to your new string, "cat", it too is created in memory, and now str is adjusted so it points at "cat" in the new location.
MEMORY LOCATION "1": "dog"
str => MEMORY LOCATION "2": "cat"
MEMORY LOCATION "3":
What happens to "dog"? It's effectively inaccessible now, since we no longer have a reference to its location (in memory, or on the heap; the terms are interchangeable in this situation). Later, when the garbage collector reviews memory for cleanup, it will see that nothing references "dog" and treat its memory as free to be reclaimed and reused as needed.

You're close. Your picture accurately represents what happens in the first line of code. However, things are a little different from what you describe for the second line of code.
For the line str = "cat";, a second string object is created in the heap and the str variable is changed to refer to that new object. You're left with str pointing to "cat" and an orphan "dog" object on the heap with no references to it.
The "dog" object might be cleaned up by the garbage collector because there are no references to it.

Read up on string interning (also called the .NET string intern table or the CLR intern pool).
Basically, the Common Language Runtime (CLR) maintains a table of unique string values, the intern pool. String literals in your code are added to this table automatically, so identical literals refer to the same string object rather than to separate copies. Strings built at run time (for example by concatenation or Substring) are not looked up in the pool unless you call String.Intern, which returns the pooled reference if the value is already there and adds it otherwise. Interned strings are kept alive by the pool itself, so they are generally not garbage collected while the program runs; ordinary, non-interned strings that are no longer referenced are.
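One way to observe the intern pool is to compare references rather than values. The sketch below is only illustrative (the class and method names are made up for this example) and assumes default compiler settings, where string literals are interned.

```csharp
using System;

public static class InternDemo
{
    // Two identical literals resolve to the same interned object.
    public static bool LiteralsShareInstance()
    {
        string a = "dog";
        string b = "dog";
        return object.ReferenceEquals(a, b);
    }

    // A string built at run time is a separate object, even with equal contents.
    public static bool RuntimeStringSharesInstance()
    {
        string literal = "dog";
        string built = new string(new[] { 'd', 'o', 'g' });
        return object.ReferenceEquals(literal, built);
    }

    // String.Intern returns the pooled instance for that value.
    public static bool InternedStringSharesInstance()
    {
        string literal = "dog";
        string built = new string(new[] { 'd', 'o', 'g' });
        return object.ReferenceEquals(literal, string.Intern(built));
    }

    public static void Main()
    {
        Console.WriteLine(LiteralsShareInstance());        // True
        Console.WriteLine(RuntimeStringSharesInstance());  // False
        Console.WriteLine(InternedStringSharesInstance()); // True
    }
}
```

Note that == on strings compares contents, so all three pairs would compare equal with ==; only ReferenceEquals reveals whether they are the same object.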

Yes, there are two objects. No, they don't have the same name. Try not to think of a variable as a "name" for the object itself per se - it's more like a temporary name for the object's location in memory. (The reason it's somewhat misleading to think of a variable as a "name" for the object is that you can have several variables referring to the same object; it's not the case that the object has several "names" per se, or that there are several objects - that's just how you happen to be storing the reference).
"string str" initially has a reference to the string "dog." After you assign "cat" to "str", the variable now has a reference to the string "cat."
Both strings still exist in memory (at least temporarily), but the "dog" string is no longer accessible because you don't have a reference to it (and therefore no longer "know" its location). You don't know in advance how long they'll both exist in memory though since the garbage collector could delete the "dog" string from memory at any point since there are no longer any references to it.
You're correct about the value on the stack with the reference to the object on the heap by the way - that's a good distinction.
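The point that a variable holds a reference, and is not a "name" glued to the object, can be checked with a second variable. This is a minimal sketch; ReassignDemo and its method are hypothetical names used only for illustration.

```csharp
using System;

public static class ReassignDemo
{
    // Returns what a second variable sees after the first one is reassigned.
    public static string SecondVariableAfterReassign()
    {
        string str = "dog";   // str refers to the "dog" string object
        string other = str;   // other now refers to the same object
        str = "cat";          // only str is redirected to a different object
        return other;         // the "dog" object itself was never changed
    }

    public static void Main()
    {
        Console.WriteLine(SecondVariableAfterReassign()); // prints "dog"
    }
}
```

Because other still refers to "dog", that object stays reachable here; in the question's code, where str was the only reference, nothing in the program can reach it after the reassignment.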

Related

If I cannot call a destructor in C#, what is the solution to destroying an object for the sake of security? [duplicate]

CustomerData newObject = new CustomerData(number); // number is a long holding the credit card #
// uses newObject
// need to delete object here?
I am creating an object that will hold customer information; the object will call a method to put that info in a database. When I am done, how do I get rid of the object?
Honestly, an attacker who can read the information out of memory has probably beaten your security already. You cannot use the information unless it is clear and unencrypted in memory, and an attacker who can read the memory can just read it then. And even if you remove all the references to that instance, the bit pattern is still written in RAM until something happens to overwrite it.
I also cannot think of any customer-related number that I would consider "secure information". You write a person's or group's customer number onto every bill you send them. If you did not, you could not use it in any meaningful way for processing.
There are fringe cases where strings are security relevant, and that is what SecureString covers. You know, passwords. But beyond that, most types are irrelevant.
In C# you can drop your reference to an object by assigning null:
newObject = null;
Then you can force the garbage collector to run using
GC.Collect();
But forcing a collection is not recommended.
Actually, setting a variable to null does not delete the object at all.
newObject = null;
only discards that reference. Once no other references remain, the object becomes eligible for cleanup by the garbage collector, but you can't know when that will happen.
If an object has a Dispose() method, you need to call it before it goes out of scope. You can use the using keyword to do this for you automatically
using (Graphics gr = Graphics.FromImage(bm))
{
//some code
} // gr.Dispose() is called here
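To see when Dispose runs without depending on System.Drawing, here is a small sketch with a made-up disposable type (TrackedResource and UsingDemo are hypothetical names). It records the order of events in a list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical resource that logs its own lifetime events.
public sealed class TrackedResource : IDisposable
{
    private readonly List<string> _log;

    public TrackedResource(List<string> log)
    {
        _log = log;
        _log.Add("created");
    }

    public void Dispose()
    {
        _log.Add("disposed");
    }
}

public static class UsingDemo
{
    public static List<string> Run()
    {
        var log = new List<string>();
        using (var resource = new TrackedResource(log))
        {
            log.Add("in use");
        } // resource.Dispose() is called here, even if the body throws
        log.Add("after using");
        return log;
    }

    public static void Main()
    {
        // prints "created, in use, disposed, after using"
        Console.WriteLine(string.Join(", ", Run()));
    }
}
```

Dispose releases whatever the object chooses to release (handles, connections, and so on); it does not free the managed object's memory, which is still left to the garbage collector.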

C# - how does variable scope and disposal impact processing efficiency?

I was having a discussion with a colleague the other day about this hypothetical situation. Consider this pseudocode:
public void Main()
{
MyDto dto = Repository.GetDto();
foreach(var row in dto.Rows)
{
ProcessStrings(row);
}
}
public void ProcessStrings(DataRow row)
{
string string1 = GetStringFromDataRow(row, 1);
string string2 = GetStringFromDataRow(row, 2);
// do something with the strings
}
Then this functionally identical alternative:
public void Main()
{
string string1 = null;
string string2 = null;
MyDto dto = Repository.GetDto();
foreach(var row in dto.Rows)
{
ProcessStrings(row, string1, string2);
}
}
public void ProcessStrings(DataRow row, string string1, string string2)
{
string1 = GetStringFromDataRow(row, 1);
string2 = GetStringFromDataRow(row, 2);
// do something with the strings
}
How will these differ in processing when running the compiled code? Are we right in thinking the second version is marginally more efficient because the string variables will take up less memory and only be disposed once, whereas in the first version, they're disposed of on each pass of the loop?
Would it make any difference if the strings in the second version were passed by ref or as out parameters?
When you're dealing with "marginally more efficient" level of optimizations you risk not seeing the whole picture and end up being "marginally less efficient".
This answer here risks the same thing, but with that caveat, let's look at the hypothesis:
Storing a string into a variable creates a new instance of the string
No, not at all. A string is an object, what you're storing in the variable is a reference to that object. On 32-bit systems this reference is 4 bytes in size, on 64-bit it is 8. Nothing more, nothing less. Moving 4/8 bytes around is overhead that you're not really going to notice a lot.
So neither of the two examples, given the very little information we have about the methods being called, creates more or fewer strings than the other, so on this count they're equivalent.
So what is different?
Well in one example you're storing the two string references into local variables. This is most likely going to be cpu registers. Could be memory on the stack. Hard to say, depends on the rest of the code. Does it matter? Highly unlikely.
In the other example you're passing in two parameters as null and then reusing those parameters locally. These parameters can be passed as cpu registers or stack memory. Same as the other. Did it matter? Not at all.
So most likely there is going to be absolutely no difference at all.
Note one thing, you're mentioning "disposal". This term is reserved for the usage of objects implementing IDisposable and then the act of disposing of these by calling IDisposable.Dispose on those objects. Strings are not such objects, this is not relevant to this question.
If, instead, by disposal you mean "garbage collection", then since I already established that neither of the two examples creates more or fewer objects than the other due to the differences you asked about, this is also irrelevant.
This is not important, however. It isn't important what you or I or your colleague thinks is going to have an effect. Knowing is quite different, which leads me to...
The real tip I can give about optimization:
Measure
Measure
Measure
Understand
Verify that you understand it correctly
Change, if possible
You measure, use a profiler to find the real bottlenecks and real time spenders in your code, then understand why those are bottlenecks, then ensure your understanding is correct, then you can see if you can change it.
In your code I will venture a guess that if you were to profile your program you would find that those two examples will have absolutely no effect whatsoever on the running time. If they do have effect it is going to be on order of nanoseconds. Most likely, the very act of looking at the profiler results will give you one or more "huh, that's odd" realizations about your program, and you'll find bottlenecks that are far bigger fish than the variables in play here.
In both of your alternatives, GetStringFromDataRow creates a new string every time. Whether you store a reference to this string in a local variable or in a parameter variable (which is essentially not much different from a local variable in your case) does not matter. Imagine you did not even assign the result of GetStringFromDataRow to any variable: the string instance is still created and stored somewhere in memory until garbage collected. Passing your strings by reference won't make much difference either. You would be able to reuse the memory location that stores the reference to the created string (you can think of it as the memory address of the string instance), but not the memory location of the string contents.
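On the question of ref parameters: the observable difference is in who sees the reassignment, not in how many strings exist. A minimal sketch, with hypothetical method names chosen for this example:

```csharp
using System;

public static class ParameterDemo
{
    // The parameter is a copy of the caller's reference; reassigning it is local.
    private static void AssignByValue(string s)
    {
        s = "changed";
    }

    // With ref, the parameter aliases the caller's variable itself.
    private static void AssignByRef(ref string s)
    {
        s = "changed";
    }

    public static string AfterByValue()
    {
        string s = "original";
        AssignByValue(s);
        return s;
    }

    public static string AfterByRef()
    {
        string s = "original";
        AssignByRef(ref s);
        return s;
    }

    public static void Main()
    {
        Console.WriteLine(AfterByValue()); // prints "original"
        Console.WriteLine(AfterByRef());   // prints "changed"
    }
}
```

So in the second version of the question's code, string1 and string2 in Main would stay null after every call, since the parameters are passed by value.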

How does String.Remove() operates regarding memory?

I was wondering how does .NET's string.Remove() method operates regarding memory.
If I have the following piece of code:
string sample = "abc";
sample = sample.Remove(0);
What will actually happen in memory?
If I understand correctly, We've allocated a string consisting of 3 chars, and then we removed all of them on a new copy of the string, assigned the copy to the old reference, by that overriding it, and then what? What happens to those 3 characters?
If we're not pointing to them anymore, and they're not freed up (at least not that I'm aware of), they will remain in memory as garbage.
However, I'm sure the CLR has some way of detecting it and freeing them up eventually.
So any of you guys know what happens here? Thanks in advance!
First, Remove is going to produce a string that has no characters in it (an empty string). In general Remove returns a newly allocated string object holding the remaining characters; for an empty result the runtime may simply hand back the shared String.Empty instance instead of allocating. Then you'll assign a reference to that string to your local variable.
Since the string "abc" is a literal string, it'll still exist in the intern pool, unless you've disabled interning of compile-time literal strings, so it won't be garbage collected.
So in summary, nothing was modified in place: the call produced a separate result string, and you changed the reference in the variable sample from the old object to that one.
According to the source code: http://referencesource.microsoft.com/#mscorlib/system/string.cs
The method Remove() does not modify the original; it returns a string object containing the result.
In your code sample, the sample variable ends up referring to a string that no longer has any characters, because Remove(0) deletes everything from index 0 to the end.
When the garbage collector fires, the orphaned string is reclaimed.
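A quick way to check what Remove(0) does to the original is to keep a second reference to it. This is just a sketch; RemoveDemo is a hypothetical name.

```csharp
using System;

public static class RemoveDemo
{
    // Returns the original text and the length of the result, separated by '|'.
    public static string Run()
    {
        string sample = "abc";
        string original = sample;   // second reference to the "abc" object
        sample = sample.Remove(0);  // removes everything from index 0 onward
        return original + "|" + sample.Length;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints "abc|0"
    }
}
```

The "abc" object is untouched; only the variable sample was pointed at the (empty) result.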

Memory assigned to strings

I know that strings in C# are immutable i.e. when I change the value of a string variable a new string variable with the same name is created with the new value and the older one is collected by GC. Am I right?
string s1 = "abc";
s1 = s1.Substring(0, 1);
If what I said is right, then my doubt is if a new string is created, then is it created in the same memory location?
if a new string is created, then is it created in the same memory location?
No, a separate string object is created, in a separate bit of memory.
You're then replacing the value of s1 with a reference to the newly-created string. That may or may not mean that the original string can be garbage collected - it depends on whether anything else has references to it. In the case of a string constant (as in your example, with a string literal) I suspect that won't be garbage collected anyway, although it's an implementation detail.
If you have:
string text = "original";
text = text.Substring(0, 5);
text = text.Substring(0, 3);
then the intermediate string created by the first call to Substring will be eligible for garbage collection, because nothing else refers to it. That doesn't mean it will be garbage collected immediately though, and it certainly doesn't mean that its memory will be reused for the string created by the final line.
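The same chain can be written with each result kept in its own variable, which makes the three distinct strings visible. A small sketch (SubstringDemo is a made-up name); here nothing becomes eligible for collection inside Run, because every string is still referenced.

```csharp
using System;

public static class SubstringDemo
{
    public static string[] Run()
    {
        string text = "original";
        string first = text.Substring(0, 5);   // separate string: "origi"
        string second = first.Substring(0, 3); // another separate string: "ori"
        // text itself is unchanged by either call.
        return new[] { text, first, second };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints "original, origi, ori"
    }
}
```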

Re-initialize variable or declare new?

In C#, is there a benefit or drawback to re-initializing a previously declared variable instead of declaring and initializing a new one? (Ignoring thoughts on conciseness and human readability.)
For example, compare these two samples:
DataColumn col = new DataColumn();
col.ColumnName = "Subsite";
dataTable.Columns.Add(col);
col = new DataColumn(); // Re-use the "col" declaration.
col.ColumnName = "Library";
dataTable.Columns.Add(col);
vs
DataColumn col1 = new DataColumn();
col1.ColumnName = "Subsite";
dataTable.Columns.Add(col1);
DataColumn col2 = new DataColumn(); // Declare a new variable instead.
col2.ColumnName = "Library";
dataTable.Columns.Add(col2);
A similar example involving loops:
string str;
for (int i = 0; i < 100; i++)
{
str = "This is string #" + i.ToString(); // Re-initialize the "str" variable.
Console.WriteLine(str);
}
vs
for (int i = 0; i < 100; i++)
{
string str = "This is string #" + i.ToString(); // Declare a new "str" each iteration.
Console.WriteLine(str);
}
Edit: Thank you all for your answers so far. After reading them, I thought I'd expand on my question a bit:
Please correct me if I'm wrong.
When I declare and initialize a reference type like a System.String, I have a pointer to that object, which exists on the stack, and the object's contents, which exist on the heap (only accessible through the pointer).
In the first looping example, it seems like we create only one pointer, "str", and we create 100 instances of the String class, each of which exists on the heap. In my mind, as we iterate through the loop, we are merely changing the "str" pointer to point at a new instance of the String class each time. Those "old" strings that no longer have a pointer to them will be garbage collected--although I'm not sure when that would occur.
In the second looping example, it seems like we create 100 pointers in addition to creating 100 instances of the String class.
I'm not sure what happens to items on the stack that are no longer needed, though. I didn't think the garbage collector got rid of those items too; perhaps they are immediately removed from the stack as soon as you exit their scope? Even if that's true, I'd think that creating only one pointer and updating what it points to is more efficient than creating 100 different pointers, each pointing to a unique instance.
I understand the "premature optimization is evil" argument, but I'm only trying to gain a deeper understanding of things, not optimize my programs to death.
Your second example has a much clearer answer, the second example is the better one. The reason why is that the variable str is only used within the for block. Declaring the variable outside the for block means that it's possible for another piece of code to incorrectly bind to this variable and hence cause bugs in your application. You should declare all variables in the most specific scope possible to prevent accidental usage.
For the first sample I believe it's more a matter of preference. For me, I choose to create a new variable because I believe each variable should have a single purpose. If I am reusing variables, it's usually a sign that I need to refactor my method.
This sounds suspiciously like a question designed to provide information for premature optimization. I doubt that either scenario is different in any way that matters in 99.9% of software. The memory is being created and used either way. The only difference is the variable references.
To find out if there is a benefit or drawback, you would need a situation where you really care about the size or the performance of the assembly. If you can't meet the size requirements, then measure the assembly size differences between the two choices (although you're more likely to make gains in other areas). If you can't meet the performance requirements, then use a profiler to see which part of your code is working too slowly.
It is primarily a readability issue, whether you use the same declared name or not doesn't matter, since either way you are creating two separate objects. You really should create the objects or variables with a single focus in mind, it will make your life easier.
As for your second example, the only real difference in initialization is that by placing your string outside the scope of the "for" loop, you are leaving it exposed to more outside influences, which can be useful at times. There is no memory or speed benefit to declaring it inside or outside of the loop. Remember, any time you make a change to a string variable, you are essentially creating a new string. So, for example:
string test = "new string";
test = "and now I am reusing the string";
is the same as creating two separate strings, like:
string test1 = "new string";
string test2 = "and now I am reusing the string";
To get around this, you would use the StringBuilder class, which allows you to modify a string without creating a new string, and should be used in situations where a string will be heavily modified, especially inside a loop.
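As a rough sketch of that suggestion, the loop below appends to one StringBuilder buffer instead of producing a new string on every concatenation. The class name, method name, and the "item" text are arbitrary choices for the example.

```csharp
using System;
using System.Text;

public static class BuilderDemo
{
    // Builds "item0;item1;..." by appending to a single mutable buffer.
    public static string Build(int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            sb.Append("item").Append(i).Append(';');
        }
        return sb.ToString(); // one final string is created here
    }

    public static void Main()
    {
        Console.WriteLine(Build(3)); // prints "item0;item1;item2;"
    }
}
```

For a handful of concatenations the difference is unlikely to matter; StringBuilder tends to pay off when a string is extended many times, as in a long loop.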
What about the using keyword, which disposes of the object when its braces end?
I am not sure but that's what I think. I'd like to know your opinions too.
For the second example, I always use the second version. I am not sure either, but in some problems at the ACM-ICPC contest I used to have lots of bugs from forgetting to re-initialize, so I use this approach.
For the most part, no, there should be no difference. The main difference would be that local variables (in the case of classes, their "pointer") are stored on the stack and in your first case, if your function were for some reason recursive, having two local vars instead of one will cause you to run out of stack space faster in a deep-recursive function. Getting close to that limit in either case would be a sign that you should probably use a non-recursive method.
Also, just to mention it, you could skip the variables altogether and write:
dataTable.Columns.Add(new DataColumn() { ColumnName = "Subsite" });
dataTable.Columns.Add(new DataColumn() { ColumnName = "Library" });
Which I believe performance-wise will be like having 2 local variables, but I could be wrong there. I can't remember exactly what that produces in IL code, though.
