What happens when we initialize a string in C#? - c#

Kindly look at the following program:
static void Main()
{
string s1 = "Hello";
string s2 = "Hello";
Console.WriteLine ( ( object ) s1 == ( object ) s2 );
Console.ReadLine();
}
The output of this snippet is "True". Now my question is:
Does string s1 = "Hello"; create a new string object? If so, how does it create a new object without calling a constructor and without using the new operator?
If string s1 = "Hello" and string s2 = "Hello" create two objects, then how come the answer is True?

If you intend to compare object references, it's clearer to do it like so:
Console.WriteLine ( object.ReferenceEquals(s1, s2 ));
rather than like this:
Console.WriteLine ( ( object ) s1 == ( object ) s2 );
That said, let's rewrite your code a little:
using System;
public class Program
{
public static void Main()
{
string s1 = "Hello";
string s2 = string2();
Console.WriteLine ( object.ReferenceEquals(s1, s2 )); // true
string s3 = "Hel";
s3 = s3 + "lo";
Console.WriteLine ( object.ReferenceEquals(s1, s3 )); // false
// This is the equivalent of the line above:
Console.WriteLine ( ( object ) s1 == ( object ) s3 ); // also false
Console.WriteLine (s1 == s3); // true (comparing string contents)
s3 = string.Intern(s3);
Console.WriteLine ( object.ReferenceEquals(s1, s3 )); // now true
Console.ReadLine();
}
private static string string2()
{
return "Hello";
}
}
Ok, so the question is, "Why do the first two strings have the same reference"?
The answer is that identical string literals are stored only once. The compiler records each distinct literal once in the assembly, and the runtime keeps a table of the literals it has loaded so far; if a literal it encounters is already in that table, it doesn't create a new string but returns a reference to the string that is already there. This is called string interning.
The next thing to note is that if you create a new string by concatenating two strings at runtime, then that new string does NOT have the same reference as an existing string. A brand new string is created.
However if you use == to compare that string with another string that has a different reference but the same contents, true will be returned. That's because string == compares the contents of the string.
The following line in the above code demonstrates this:
Console.WriteLine (s1 == s3); // true
Finally, note that the runtime can "intern" strings, that is, use a reference to an existing string rather than a new string. However, it does not do this automatically for strings created at runtime.
You can call string.Intern() to explicitly intern a string, as the code above shows.

does string s1 = "HELLO" ; create a new string object? If yes, how
does it create a new object without calling the constructor and
without using the new operator??
Yes, not only does it create a new string, it also bakes it into the assembly's metadata under the "User Strings" section (this is the basis of what is otherwise called "string interning"), so the runtime can pull it directly from there at run time and save the allocation time. You can view it using ILDASM:
User Strings
-------------------------------------------------------
70000001 : ( 5) L"Hello"
You can also see the compiler recognize it as a StringLiteralToken when it parses the syntax tree.
The compiler is aware of the special syntax for strings and gives you this syntactic sugar.
If string s1 = "HELLO", and string s2 = "HELLO" create two objects,
then how come the answer is TRUE??
As I said in the first part, the string literal is only loaded at run time. The string is loaded once, cached, and then compared against itself, which is why this reference equality check yields true.
You can see this in the emitted IL (Compiled in Release mode):
IL_0000: ldstr "Hello"
IL_0005: ldstr "Hello"
IL_000A: stloc.0 // s2
IL_000B: ldloc.0 // s2
IL_000C: ceq

Related

Is there a way to replace the arguments in a string multiple times?

I'm declaring a string at initialisation as follows
string a = string.Format("Hello {0}", "World.");
Is there a way to subsequently replace the zeroth argument for something else?
If there's no obvious solution, does anybody know of a better way to address the issue. For context, at initialisation I create a number of strings. The text needs to be updated as the program proceeds. I could create a structure comprising an array of strings and an array of objects and then build the final string as required, but this seems ugly, particularly as each instance could have a different number of arguments.
For example,
public class TextThingy
{
List<String> strings;
List<String> arguments;
...
public override string ToString()
{
return strings[0] + arguments[0] + strings[1] ...
}
}
I've tried this, but to no avail.
string b = string.Format(a, "Universe.");
I guess that the argument {0} once populated is then baked into the string that one time.
You could move the format string to a variable, like this.
Would that work? If not, please add some more info for us.
string fmt = "Hello {0}";
string a = string.Format(fmt, "World.");
string b = string.Format(fmt, "Universe.");
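If the template needs to travel with the object, the same idea can be wrapped in a small type. This is only a sketch: MessageTemplate and Render are hypothetical names, not part of the framework, and the only real API used is string.Format. The point is that the unformatted template is kept, and a fresh string is produced each time the arguments change.

```csharp
using System;

// Hypothetical helper: keeps the unformatted template so it can be re-applied.
public sealed class MessageTemplate
{
    private readonly string _format;

    public MessageTemplate(string format)
    {
        if (format == null) throw new ArgumentNullException(nameof(format));
        _format = format;
    }

    // Builds a fresh string on every call; the stored template is never modified.
    public string Render(params object[] args)
    {
        return string.Format(_format, args);
    }
}

public static class TemplateDemo
{
    public static void Main()
    {
        var greeting = new MessageTemplate("Hello {0}");
        Console.WriteLine(greeting.Render("World."));    // Hello World.
        Console.WriteLine(greeting.Render("Universe.")); // Hello Universe.
    }
}
```

Because params object[] accepts any number of arguments, each instance can use a template with a different number of placeholders.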
Try string replace, like this:
StringBuilder sb = new StringBuilder("11223344");
string myString =
sb
.Replace("1", string.Empty)
.Replace("2", string.Empty)
.Replace("3", string.Empty)
.ToString();

Why is string interning failing here (or is it)? [duplicate]

string s1 = "test";
string s5 = s1.Substring(0, 3)+"t";
string s6 = s1.Substring(0,4)+"";
Console.WriteLine("{0} ", object.ReferenceEquals(s1, s5)); //False
Console.WriteLine("{0} ", object.ReferenceEquals(s1, s6)); //True
Both s5 and s6 have the same value as s1 ("test"). Based on the string interning concept, both statements should have evaluated to true. Can someone please explain why s5 doesn't have the same reference as s1?
You should get false for calls of ReferenceEquals on string objects that are not string literals.
Essentially, the last line prints True by coincidence: what happens is that when you pass an empty string for string concatenation, library optimization recognizes this, and returns the original string. This has nothing to do with interning, as the same thing will happen with strings that you read from console or construct in any other way:
var s1 = Console.ReadLine();
var s2 = s1+"";
var s3 = ""+s1;
Console.WriteLine(
"{0} {1} {2}"
, object.ReferenceEquals(s1, s2)
, object.ReferenceEquals(s1, s3)
, object.ReferenceEquals(s2, s3)
);
The above prints
True True True
The CLR doesn't intern all strings. All string literals are interned by default. The following, however:
Console.WriteLine("{0} ", object.ReferenceEquals(s1, s6)); //True
Returns true, since the line here:
string s6 = s1.Substring(0,4)+"";
Is effectively optimized to return the same reference back. It happens to (likely) be interned, but that's coincidental. If you want to see whether a string is interned, you should use String.IsInterned().
If you want to intern strings at runtime, you can use String.Intern and store the reference, as per the MSDN documentation here: String.Intern Method (String). However, I strongly suggest you not use this method, unless you have a good reason to do so: it has performance considerations and potentially unwanted side-effects (for example, strings that have been interned cannot be garbage collected).
From msdn documentation of object.ReferenceEquals here:
When comparing strings. If objA and objB are strings, the ReferenceEquals method returns true if the string is interned. It does not perform a test for value equality. In the following example, s1 and s2 are equal because they are two instances of a single interned string. However, s3 and s4 are not equal, because although they have identical string values, that string is not interned.
using System;
public class Example
{
public static void Main()
{
String s1 = "String1";
String s2 = "String1";
Console.WriteLine("s1 = s2: {0}", Object.ReferenceEquals(s1, s2));
Console.WriteLine("{0} interned: {1}", s1,
String.IsNullOrEmpty(String.IsInterned(s1)) ? "No" : "Yes");
String suffix = "A";
String s3 = "String" + suffix;
String s4 = "String" + suffix;
Console.WriteLine("s3 = s4: {0}", Object.ReferenceEquals(s3, s4));
Console.WriteLine("{0} interned: {1}", s3,
String.IsNullOrEmpty(String.IsInterned(s3)) ? "No" : "Yes");
}
}
// The example displays the following output:
// s1 = s2: True
// String1 interned: Yes
// s3 = s4: False
// StringA interned: No
Strings in .NET can be interned. It isn't said anywhere that 2 identical strings should be the same string instance. Typically, the compiler will intern identical string literals, but this isn't true for all strings, and is certainly not true of strings created dynamically at runtime.
The Substring method is smart enough to return the original string in the case where the substring being requested is exactly the original string. Link to the Reference Source found in a comment by @DanielA.White. So s1.Substring(0,4) returns s1 when s1 is of length 4. And apparently the + operator has a similar optimization such that
string s6 = s1.Substring(0,4)+"";
is functionally equivalent to:
string s6 = s1;
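The two shortcuts described above can be checked without any literals being involved. The sketch below builds its input from a character array at runtime; SameInstanceDemo and ReturnsOriginal are hypothetical names. Note that returning the original instance is an implementation detail of the runtime and compiler, not a documented guarantee, so treat the result as an observation rather than something to rely on.

```csharp
using System;

public static class SameInstanceDemo
{
    // True when both operations hand back the very instance they were given.
    public static bool ReturnsOriginal(string s)
    {
        string whole = s.Substring(0, s.Length); // substring covering the full string
        string padded = s + "";                  // concatenation with the empty string
        return object.ReferenceEquals(s, whole) && object.ReferenceEquals(s, padded);
    }

    public static void Main()
    {
        // Built from characters at runtime, so this is not a string literal.
        string runtime = new string(new[] { 't', 'e', 's', 't' });
        Console.WriteLine(ReturnsOriginal(runtime));
    }
}
```

If this prints True, it shows the same thing as the s6 case in the question: the reference matches because no new string was created, not because of interning.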

System.String Type in C#

I know that it may sound like a weird question but this has been going on in my mind for a while.
I know that the System.String type in C# is actually a class with a constructor that has a character array parameter. For example the following code is legal and causes no error:
System.String s = new System.String("Hello".ToCharArray());
My question is: what makes it possible for the System.String class to accept an array of characters simply this way:
System.String s = "Hello";
When you call:
System.String s = new System.String("Hello".ToCharArray());
You are explicitly invoking a constructor.
When you write:
string foo = "bar";
An IL instruction (Ldstr) pushes a new object reference to that string literal. It's not the same as calling a constructor.
This is possible because the C# language specifies that string literals are possible (see §2.4.4.5 String literals). The C# compiler and CIL/CLR have good support for how these literals are used, e.g. with the ldstr opcode.
There is no support for including such literals for your own custom types.
Strings are kind of a special CLR type. They are an immutable reference type with their own literal syntax and dedicated runtime support.
Here are several things which may help you to understand string type:
var a = "Hello";
var b = new String("Hello".ToCharArray());
var c = String.Intern(b); // 'interns' the string...
var equalsString = a == b; // true
var equalsObj = (object)a == (object)b; // false
var equalsInterned = (object)a == (object)c; // true !!
// a[0] = 't'; // does not compile, because a string is immutable. Instead, do it this way:
var array = b.ToCharArray();
array[0] = 't';
a = new String(array); // a is now "tello"

why use string constructor with char array for constants?

I found this piece of code and I'd like to understand why the developer used the string constructor with a char array instead of just a literal constant string:
static string atomLang = new String("lang".ToCharArray());
The only reason I can think of is to avoid getting a reference to the interned instance of the string.
string str1 = "lang";
string str2 = "lang";
string str3 = new String("lang".ToCharArray());
Console.WriteLine(object.ReferenceEquals(str1, str2)); // Output: true
Console.WriteLine(object.ReferenceEquals(str1, str3)); // Output: false
Not that this will have any practical effects on your code (other than marginal performance differences).

Difference in String concatenation performance

I know you should use a StringBuilder when concatenating strings but I was just wondering if there is a difference in concatenating string variables and string literals. So, is there a difference in performance in building s1, s2, and s3?
string foo = "foo";
string bar = "bar";
string s1 = "foo" + "bar";
string s2 = foo + "bar";
string s3 = foo + bar;
In the case you present, it's actually better to use the concatenation operator on the string class. This is because it can pre-compute the lengths of the strings and allocate the buffer once and do a fast copy of the memory into the new string buffer.
And this is the general rule for concatenating strings. When you have a set number of items that you want to concatenate together (be it 2, or 2000, etc) it's better to just concatenate them all with the concatenation operator like so:
string result = s1 + s2 + ... + sn;
It should be noted in your specific case for s1:
string s1 = "foo" + "bar";
The compiler sees that it can optimize the concatenation of string literals here and transforms the above into this:
string s1 = "foobar";
Note, this is only for the concatenation of two string literals together. So if you were to do this:
string s2 = foo + "a" + bar;
Then it does nothing special (but it still makes a call to Concat and precomputes the length). However, in this case:
string s2 = foo + "a" + "nother" + bar;
The compiler will translate that into:
string s2 = foo + "another" + bar;
If the number of strings that you are concatenating is variable (as in a loop where you don't know beforehand how many elements there are), then StringBuilder is the most efficient way of concatenating those strings, because plain concatenation would have to reallocate the buffer every time a new entry is added (and you don't know how many are left).
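As a minimal sketch of the variable-count case, the loop below appends into a single StringBuilder and calls ToString once at the end. JoinDemo and JoinAll are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class JoinDemo
{
    // Appends an unknown number of items into one growing buffer.
    public static string JoinAll(IEnumerable<string> parts)
    {
        var sb = new StringBuilder();
        foreach (string part in parts)
        {
            sb.Append(part);
        }
        return sb.ToString(); // the final string is created once, here
    }

    public static void Main()
    {
        var words = new List<string> { "foo", "bar", "baz" };
        Console.WriteLine(JoinAll(words)); // foobarbaz
    }
}
```

When the items are already in a collection, string.Concat(IEnumerable<string>) or string.Join does the same job in one call; the explicit loop matters when the pieces are produced one at a time.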
The compiler can concatenate literals at compile time, so "foo" + "bar" get compiled to "foobar" directly, and there's no need to do anything at runtime.
Other than that, I doubt there's any significant difference.
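One way to convince yourself that the literal concatenation happens at compile time is that the expression is accepted where only a constant is allowed. The sketch below uses a hypothetical ConstantFoldingDemo class; a const field initializer must be a compile-time constant, so the declaration compiling at all is the evidence.

```csharp
using System;

public static class ConstantFoldingDemo
{
    // Compiles only because "foo" + "bar" is a compile-time constant expression.
    public const string Folded = "foo" + "bar";

    public static void Main()
    {
        Console.WriteLine(Folded); // foobar

        // By contrast, a concatenation involving a variable is not a constant:
        // string foo = "foo";
        // const string notConstant = foo + "bar"; // would not compile
    }
}
```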
Your "knowledge" is incorrect. You should sometimes use a StringBuilder when concatenating strings. In particular, you should do it when you can't perform the concatenation all in one expression.
In this case, the code is compiled as:
string foo = "foo";
string bar = "bar";
string s1 = "foobar";
string s2 = String.Concat(foo, "bar");
string s3 = String.Concat(foo, bar);
Using a StringBuilder would make any of this less efficient - in particular it would push the concatenation for s1 from compile time to execution time. For s2 and s3 it would force the creation of an extra object (the StringBuilder) as well as probably allocating a string which is unnecessarily large.
I have an article which goes into more detail on this.
There is no difference between s2 and s3. The compiler will take care of s1 for you, and concatenate it during compile time.
I'd say the compiler should decide this, because all of your string building can be optimized when the values are already known.
I guess StringBuilder pre-allocates space for appending more strings. Since + is a binary operator, one might expect s4 = s1 + s2 + s3 to build an intermediate string (s1 + s2) first and only then s4; in practice the C# compiler turns the whole expression into a single String.Concat(s1, s2, s3) call, so no intermediate string is created.
