How many string objects are created when using string concatenation in C#?

I'm a beginner in C# and have a question about string concatenation.
string str = "My name is";
str += "John";
Q1- Does C# (.NET) have the same concept as the string pool in Java?
Q2- How many string objects are created?

Q1- Does C# (.NET) have the same concept as the string pool in Java?
Yes, it is conceptually the same thing. (I originally answered no, because I was mistaken about the details of Java's string pool.)
C# commonly calls it string interning.
You can read more about it on Eric Lippert's blog, Fabulous Adventures In Coding, in the post "String interning and String.Empty":
If you have two identical string literals in one compilation unit then
the code we generate ensures that only one string object is created by
the CLR for all instances of that literal within the assembly. This
optimization is called "string interning".
String interning is a CLI feature that reuses a string instance in certain situations:
string literals, which are loaded via the ldstr IL instruction
strings interned explicitly by calling string.Intern
Q2- How many string objects are created?
Because strings in C# are immutable, your 2 statements produce 3 string objects:
// 1st string
string str = "My name is";
// 2nd string
// "John"
// 3rd string, which is the concatenation of the first 2
str += "John";
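A minimal sketch of the count above, assuming a small console program. The names ConcatDemo and Append are made up for illustration. It shows that a literal is reused rather than created again, while the concatenation produces a separate third object at run time.

```csharp
using System;

static class ConcatDemo
{
    // Hypothetical helper: concatenates at run time, like `str += "John"`.
    public static string Append(string head, string tail)
    {
        return head + tail;
    }

    static void Main()
    {
        string first = "My name is";            // 1st object: literal
        string second = "John";                 // 2nd object: literal
        string result = Append(first, second);  // 3rd object: created by the concatenation

        // The same literal in the same assembly refers to the same object.
        Console.WriteLine(object.ReferenceEquals(first, "My name is"));  // True
        // The concatenation result is a different object from its inputs.
        Console.WriteLine(object.ReferenceEquals(result, first));        // False
        Console.WriteLine(result);                                       // My name isJohn
    }
}
```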

Yes, there is such a thing.
The common language runtime conserves string storage by maintaining a table, called the intern pool, that contains a single reference to each unique literal string declared or created programmatically in your program. Consequently, an instance of a literal string with a particular value only exists once in the system.
source
In your case, I believe there will be three allocations.

Related

Is it possible to mix string resources when using string interpolation?

I'm refactoring some code and one task is to place hard-coded strings into resources. I have this string in code:
var s = $"Last Updated: {DateTime.Now}";
So how can I extract this string into a resource? Currently I think only this is possible:
var s = string.Format(MyStrings.LastUpdated, DateTime.Now);
where MyStrings.LastUpdated would be:
Last Updated: {0}
Is this the only way or is there a newer way available?
The $ special character identifies a string literal as an interpolated string. An interpolated string is a string literal that might contain interpolation expressions. When an interpolated string is resolved to a result string, items with interpolation expressions are replaced by the string representations of the expression results.
As stated in the language reference for string interpolation with $, the operator identifies a string literal, so you cannot use a resource or variable for the format string itself.
Extracting the format string and formatting it with string.Format is a reasonable way to do this. Keep in mind, though, that these format strings may contain placeholders with names or numbers and other format specifiers. Make it clear to your translators that they must not change or remove them, otherwise your application might show unexpected behavior. This can be inconvenient, but the only way to protect the placeholders from modification would be to keep them out of your localizable resources and concatenate the pieces yourself, which is infeasible.
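A minimal sketch of that approach, assuming the resource value is "Last Updated: {0}". The constant below is a hypothetical stand-in for the generated MyStrings.LastUpdated property, and FormatLastUpdated is an illustrative helper, not part of any library.

```csharp
using System;
using System.Globalization;

static class MyStringsSketch
{
    // Hypothetical stand-in for the resource property MyStrings.LastUpdated.
    public const string LastUpdated = "Last Updated: {0}";

    public static string FormatLastUpdated(DateTime when, CultureInfo culture)
    {
        // The {0} placeholder lives in the resource; the value is supplied in code.
        return string.Format(culture, LastUpdated, when);
    }

    static void Main()
    {
        Console.WriteLine(FormatLastUpdated(DateTime.Now, CultureInfo.CurrentCulture));
    }
}
```

Passing the culture explicitly keeps the date formatting in step with the language of the resource string.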

Why does `String.Trim()` not trim the object itself?

Not often, but sometimes, I need to use String.Trim() to remove whitespace from a string.
If it has been a while since I last wrote trimming code, I write:
string s = " text ";
s.Trim();
and am surprised that s is not changed. I need to write:
string s = " text ";
s = s.Trim();
Why are some string methods designed in this (not very intuitive) way? Is there something special with strings?
Strings are immutable. Any string operation generates a new string without changing the original string.
From MSDN:
Strings are immutable--the contents of a string object cannot be
changed after the object is created, although the syntax makes it
appear as if you can do this.
s.Trim() creates a new trimmed version of the original string and returns it instead of storing the new version in s. So, what you have to do is to store the trimmed instance in your variable:
s = s.Trim();
This pattern is followed in all the string methods and extension methods.
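A short sketch of the behavior described above, assuming a console program (TrimDemo is an illustrative name). The original string stays as it was, and the trimmed text only exists in the returned string.

```csharp
using System;

static class TrimDemo
{
    static void Main()
    {
        string s = " text ";
        string trimmed = s.Trim();   // returns a new string; s itself is untouched

        Console.WriteLine("[" + s + "]");        // [ text ]
        Console.WriteLine("[" + trimmed + "]");  // [text]
    }
}
```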
The fact that strings are immutable is not what dictated this pattern; immutability has to do with how strings are kept in memory. These methods could have been designed to create the new modified string instance in memory and point the variable to the new instance.
It's also good to remember that if you need to make lots of modifications to a string, it's much better to use a StringBuilder, which behaves like a "mutable" string and is much more efficient for this kind of operation.
As it is written in MSDN Library:
A String object is called immutable (read-only), because its value
cannot be modified after it has been created. Methods that appear to
modify a String object actually return a new String object that
contains the modification.
Because strings are immutable, string manipulation routines that
perform repeated additions or deletions to what appears to be a single
string can exact a significant performance penalty.
See this link.
In addition to all the good answers, I feel that another reason is thread safety.
Let's say
string s = " any text ";
s.Trim();
When you do this, nothing stops another thread from working with s. If the same string could be modified in place, say the other thread removes 'a' from s, then what would the result of s.Trim() be?
But when Trim returns a new string, then even if s is being used by the other thread, Trim can make a local copy, modify it, and return the modified string.

Why string.Replace("X","Y") works only when assigned to new string?

I guess it has something to do with string being a reference type, but I don't get why simply string.Replace("X","Y") does not work.
Why do I need to do string A = stringB.Replace("X","Y")? I thought it is just a method to be done on specified instance.
EDIT: Thank you so far. I extend my question: Why does b+="FFF" work but b.Replace does not?
Because strings are immutable. Any time you change a string, .NET creates a new string object. It's a property of the class.
Immutable objects
String Object
Why doesn't stringA.Replace("X","Y") work?
Why do I need to do stringB = stringA.Replace("X","Y"); ?
Because strings are immutable in .NET. You cannot change the value of an existing string object, you can only create new strings. string.Replace creates a new string which you can then assign to something if you wish to keep a reference to it. From the documentation:
Returns a new string in which all occurrences of a specified string in the current instance are replaced with another specified string.
Emphasis mine.
So if strings are immutable, why does b += "FFF"; work?
Good question.
First note that b += "FFF"; is equivalent to b = b + "FFF"; (except that b is only evaluated once).
The expression b + "FFF" creates a new string with the correct result without modifying the old string. The reference to the new string is then assigned to b replacing the reference to the old string. If there are no other references to the old string then it will become eligible for garbage collection.
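A small sketch of that explanation (PlusEqualsDemo and alias are illustrative names). A second variable keeps pointing at the old string, which makes it visible that += rebinds b instead of changing the object.

```csharp
using System;

static class PlusEqualsDemo
{
    static void Main()
    {
        string b = "abc";
        string alias = b;   // second reference to the same string object

        b += "FFF";         // b now refers to a newly created string

        Console.WriteLine(b);                                 // abcFFF
        Console.WriteLine(alias);                             // abc
        Console.WriteLine(object.ReferenceEquals(b, alias));  // False
    }
}
```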
Strings are immutable, which means that once they are created, they cannot be changed anymore. This has several reasons, as far as I know mainly for performance (how strings are represented in memory).
See also (among many):
http://en.wikipedia.org/wiki/Immutable_object
http://channel9.msdn.com/forums/TechOff/58729-Why-are-string-types-immutable-in-C/
As a direct consequence of that, each string operation creates a new string object. In particular, if you do things like
foreach (string msg in messages)
{
    totalMessage = totalMessage + msg;
    totalMessage = totalMessage + "\n";
}
you actually create potentially dozens or hundreds of string objects. So, if you need to do more extensive string manipulation, follow GvS's hint and use the StringBuilder.
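For comparison, a sketch of the same loop written with StringBuilder. JoinDemo and JoinLines are illustrative names, and the input is assumed to be any sequence of strings.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class JoinDemo
{
    // Builds the same text as the loop above, but appends into one buffer
    // instead of creating a new string on every iteration.
    public static string JoinLines(IEnumerable<string> messages)
    {
        var builder = new StringBuilder();
        foreach (string msg in messages)
        {
            builder.Append(msg);
            builder.Append('\n');
        }
        return builder.ToString();   // the final string is created here
    }

    static void Main()
    {
        Console.Write(JoinLines(new[] { "first", "second" }));
    }
}
```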
Strings are immutable. Any operation changing them has to create a new string.
A StringBuilder supports the inline Replace method.
Use the StringBuilder if you need to do a lot of string manipulation.
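A brief sketch contrasting the two Replace methods (BuilderReplaceDemo is an illustrative name). StringBuilder.Replace changes the builder itself, while string.Replace only returns a new string.

```csharp
using System;
using System.Text;

static class BuilderReplaceDemo
{
    static void Main()
    {
        var sb = new StringBuilder("aXbX");
        sb.Replace("X", "Y");               // modifies sb itself; no assignment needed
        Console.WriteLine(sb.ToString());   // aYbY

        string s = "aXbX";
        s.Replace("X", "Y");                // result is discarded; s is unchanged
        Console.WriteLine(s);               // aXbX
    }
}
```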
Why does b += "FFF" work but b.Replace does not?
Because the += operator assigns the results back to the left hand operand, of course. It's just a short hand for b = b + "FFF";.
The simple fact is that you can't change any string in .Net. There are no instance methods for strings that alter the content of that string - you must always assign the results of an operation back to a string reference somewhere.
Yes, it's a method of System.String. But you can try
a = a.Replace("X","Y");
String.Replace is a method of the string class that returns a new string; it does not modify the current object. b.Replace("a","b") on its own would be similar to a line that only has c+1. So just like c=c+1 assigns the value of c+1 to c, b=b.Replace("a","b") assigns the returned string to b.
As everyone above has said, strings are immutable.
This means that when you do your replace, you get a new string, rather than changing the existing string.
If you don't store this new string in a variable (such as in the variable that it was declared as) your new string won't be saved anywhere.
To answer your extended question, b+="FFF" is equivalent to b = b + "FFF", so basically you are creating a new string here also.
Just to be more explicit: string.Replace("X","Y") returns a new string, but since you are not assigning the new string to anything, it is lost.

Is it because of string pooling by CLR or by the GetHashCode() method?

Is it because of string pooling by the CLR, or because the GetHashCode() method of both strings returns the same value?
string s1 = "xyz";
string s2 = "xyz";
Console.WriteLine(" s1 reference equals s2 : {0}", object.ReferenceEquals(s1, s2));
The console writes: " s1 reference equals s2 : True"
I believe it's not because GetHashCode() returns the same value for both string instances, because I tested with a custom object and overrode the GetHashCode() method to return a single constant every time. Two separate instances of that object are not reference-equal.
Please let me know what is happening behind the scenes.
thanks
123Developer
It sounds like string interning - a method of storing only one copy of a string. It requires strings to be an immutable type in the language you are dealing with, and .Net satisfies that and uses string interning.
In string interning a string "xyz" is stored in the intern pool, and whenever you say "xyz" internally it references the entry in the pool. This can save space by only storing the string once. So a comparison of "xyz" == "xyz" will get interpreted as [pointer to 34576] == [pointer to 34576] which is true.
This is definitely due to string interning. Hash codes are never calculated when comparing references with object.ReferenceEquals.
From the C# spec, section 2.4.4.5:
Each string literal does not
necessarily result in a new string
instance. When two or more string
literals that are equivalent according
to the string equality operator
(§7.9.7) appear in the same program,
these string literals refer to the
same string instance.
Note that string constant expressions count as literals in this case, so:
string x = "a" + "b";
string y = "ab";
It's guaranteed that x and y refer to the same object too (i.e. they are the same references).
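A sketch of that guarantee, assuming everything is compiled into one assembly (ConstantFoldingDemo and Prefix are illustrative names). Both concatenations are constant expressions, so the compiler folds them into the literal "ab".

```csharp
using System;

static class ConstantFoldingDemo
{
    static void Main()
    {
        const string Prefix = "a";

        string x = "a" + "b";      // constant expression, folded at compile time
        string w = Prefix + "b";   // a const plus a literal is also a constant expression
        string y = "ab";

        Console.WriteLine(object.ReferenceEquals(x, y));   // True
        Console.WriteLine(object.ReferenceEquals(w, y));   // True
    }
}
```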
When the spec says "program" by the way, it really means "assembly". The behaviour of equal strings in different assemblies depends on things like CompilationRelaxations.NoStringInterning and the precise CLR implementation and execution time situation (e.g. whether the assembly is ngen'd or not).
It's similar to string pooling, but it's not done at runtime but at compile time.
Any string literal in an assembly only exists once. The compiler uses the same constant string for all occurrences of the string literal "xyz". As strings are immutable (you can never change the value of a string instance), the compiler can safely use the same string instance for separate string references.
If you instead create a string at runtime, you get a separate instance:
string s1 = "xyz";
string s2 = "xy";
s2 += "z";
Console.WriteLine("s1 ref = s2 : {0}", object.ReferenceEquals(s1, s2));
Output:
s1 ref = s2 : False
Totally agree with Tom's answer...
Excerpt from CIL Specification (page 126):
The CLI guarantees that the result of
two ldstr instructions referring to
two metadata tokens that have the same
sequence of characters, return
precisely the same string object (a
process known as “string interning”).
String interning has nothing to do with it.
I would be very surprised to find out that the .NET/C# compiler calls Intern implicitly. Checking for matching strings at runtime would put too much stress on the CPU.

C# string concatenation and string interning

When performing string concatenation on an existing string in the intern pool, is a new string entered into the intern pool, or is a reference returned to the existing string in the intern pool? According to this article, String.Concat and StringBuilder will insert new string instances into the intern pool?
http://community.bartdesmet.net/blogs/bart/archive/2006/09/27/4472.aspx
Can anyone explain how concatenation works with the intern pool?
If you create new strings, they will not automatically be put into the intern pool, unless you concatenate compile-time constants, in which case the compiler will create one result string, which is interned like any other literal as part of the JIT process.
You can see whether a string has been interned by calling String.IsInterned. The call returns a reference to the interned string that is equal to the string passed as an argument, or null if no such string has been interned.
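A minimal sketch of String.IsInterned and String.Intern, assuming a console program (InternDemo is an illustrative name). A GUID string is used only because it is a run-time value that is certain not to be in the pool already.

```csharp
using System;

static class InternDemo
{
    static void Main()
    {
        string unique = Guid.NewGuid().ToString();   // created at run time
        string concat = unique + "!";                // run-time concatenation

        // Neither string was put into the intern pool automatically.
        Console.WriteLine(string.IsInterned(unique) == null);   // True
        Console.WriteLine(string.IsInterned(concat) == null);   // True

        // Explicit interning adds the value to the pool.
        string pooled = string.Intern(unique);

        // A separate object with the same characters...
        string copy = new string(unique.ToCharArray());
        Console.WriteLine(object.ReferenceEquals(copy, pooled));                    // False
        // ...maps back to the single pooled reference.
        Console.WriteLine(object.ReferenceEquals(string.IsInterned(copy), pooled)); // True
    }
}
```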
