I found this piece of code and I'd like to understand why the developer used the string constructor with a char array instead of just a literal constant string:
static string atomLang = new String("lang".ToCharArray());
The only reason I can think of is to avoid getting a reference to the interned instance of the string.
string str1 = "lang";
string str2 = "lang";
string str3 = new String("lang".ToCharArray());
Console.WriteLine(object.ReferenceEquals(str1, str2)); // Output: true
Console.WriteLine(object.ReferenceEquals(str1, str3)); // Output: false
Not that this will have any practical effects on your code (other than marginal performance differences).
I need to compare two strings in C# and check whether either one contains the other. Suppose we have the following strings:
string str1 = "Hello World Test";
string str2 = "Hello World";
string str3 = "Hello World Test Example";
string str4 = "No match";
so that a function like the following is required:
Compare(str1, str2); // true
Compare(str1, str3); // true
Compare(str1, str4); // false
I am currently using the following, but I am looking for a better alternative:
if (str1.Contains(str2) || str2.Contains(str1))
Is there a way to make that check with a single call?
Check the lengths of the strings and always call Contains so that the shorter string is searched for in the longer one. For example, if str1 is the longer string, use str1.Contains(str2).
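One way to fold the length check and the Contains call into a single helper is sketched below. The name EitherContains is hypothetical, and the sketch assumes an ordinal, case-sensitive comparison and treats null as "no match".

```csharp
using System;

public static class StringContainment
{
    // Hypothetical helper (not a framework method): returns true when
    // either string contains the other, using string.Contains, which
    // performs an ordinal, case-sensitive search.
    public static bool EitherContains(string a, string b)
    {
        // Assumption: a null argument never matches.
        if (a == null || b == null)
        {
            return false;
        }

        // Search for the shorter string inside the longer one.
        return a.Length >= b.Length ? a.Contains(b) : b.Contains(a);
    }
}
```

With the strings from the question, EitherContains(str1, str2) and EitherContains(str1, str3) return true, and EitherContains(str1, str4) returns false.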
string s1 = "test";
string s5 = s1.Substring(0, 3)+"t";
string s6 = s1.Substring(0,4)+"";
Console.WriteLine("{0} ", object.ReferenceEquals(s1, s5)); //False
Console.WriteLine("{0} ", object.ReferenceEquals(s1, s6)); //True
Both s5 and s6 have the same value as s1 ("test"). Based on the concept of string interning, I expected both statements to evaluate to true. Can someone please explain why s5 doesn't have the same reference as s1?
You should get false for calls of ReferenceEquals on string objects that are not string literals.
Essentially, the last line prints True by coincidence: what happens is that when you pass an empty string for string concatenation, library optimization recognizes this, and returns the original string. This has nothing to do with interning, as the same thing will happen with strings that you read from console or construct in any other way:
var s1 = Console.ReadLine();
var s2 = s1+"";
var s3 = ""+s1;
Console.WriteLine(
    "{0} {1} {2}"
,   object.ReferenceEquals(s1, s2)
,   object.ReferenceEquals(s1, s3)
,   object.ReferenceEquals(s2, s3)
);
The above prints
True True True
The CLR doesn't intern all strings. All string literals are interned by default. The following, however:
Console.WriteLine("{0} ", object.ReferenceEquals(s1, s6)); //True
Returns true, since the line here:
string s6 = s1.Substring(0,4)+"";
Is effectively optimized to return the same reference back. It happens to (likely) be interned, but that's coincidental. If you want to see whether a string is interned, you should use String.IsInterned().
If you want to intern strings at runtime, you can use String.Intern and store the reference, as per the MSDN documentation here: String.Intern Method (String). However, I strongly suggest you not use this method, unless you have a good reason to do so: it has performance considerations and potentially unwanted side-effects (for example, strings that have been interned cannot be garbage collected).
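As a minimal sketch of what String.Intern and String.IsInterned do, the hypothetical helper below builds two separate instances with the same value and interns both. It assumes a non-empty input, because the char[] constructor returns String.Empty for an empty array rather than a fresh instance.

```csharp
using System;

public static class InternSketch
{
    // Hypothetical helper: returns true when two separately built copies
    // of the same value resolve to a single pooled instance.
    public static bool InternedCopiesShareReference(string value)
    {
        // Two distinct heap instances with identical content
        // (assuming value is non-empty).
        string a = new string(value.ToCharArray());
        string b = new string(value.ToCharArray());

        // Intern returns the pooled reference, adding the value if needed.
        string pooledA = string.Intern(a);
        string pooledB = string.Intern(b);

        // IsInterned returns the pooled reference, or null when the
        // value is not in the intern pool.
        bool isPooled = string.IsInterned(b) != null;

        return isPooled && object.ReferenceEquals(pooledA, pooledB);
    }
}
```

The caller has to keep and use the reference returned by String.Intern; the argument itself is not changed into the pooled instance.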
From the MSDN documentation of object.ReferenceEquals:
When comparing strings. If objA and objB are strings, the ReferenceEquals method returns true if the string is interned. It does not perform a test for value equality. In the following example, s1 and s2 are equal because they are two instances of a single interned string. However, s3 and s4 are not equal, because although they have identical string values, that string is not interned.
using System;

public class Example
{
    public static void Main()
    {
        String s1 = "String1";
        String s2 = "String1";
        Console.WriteLine("s1 = s2: {0}", Object.ReferenceEquals(s1, s2));
        Console.WriteLine("{0} interned: {1}", s1,
                          String.IsNullOrEmpty(String.IsInterned(s1)) ? "No" : "Yes");

        String suffix = "A";
        String s3 = "String" + suffix;
        String s4 = "String" + suffix;
        Console.WriteLine("s3 = s4: {0}", Object.ReferenceEquals(s3, s4));
        Console.WriteLine("{0} interned: {1}", s3,
                          String.IsNullOrEmpty(String.IsInterned(s3)) ? "No" : "Yes");
    }
}
// The example displays the following output:
//       s1 = s2: True
//       String1 interned: Yes
//       s3 = s4: False
//       StringA interned: No
Strings in .NET can be interned. It isn't said anywhere that 2 identical strings should be the same string instance. Typically, the compiler will intern identical string literals, but this isn't true for all strings, and is certainly not true of strings created dynamically at runtime.
The Substring method is smart enough to return the original string when the requested substring is exactly the original string (see the Reference Source link in the comment by @DanielA.White). So s1.Substring(0,4) returns s1 when s1 has length 4. Apparently the + operator has a similar optimization, such that
string s6 = s1.Substring(0,4)+"";
is functionally equivalent to:
string s6 = s1;
For example:
public string ReplaceXYZ(string text)
{
    string replacedText = text;
    replacedText = replacedText.Replace("X", String.Empty);
    replacedText = replacedText.Replace("Y", String.Empty);
    replacedText = replacedText.Replace("Z", String.Empty);
    return replacedText;
}
If I were to call "ReplaceXYZ" even for strings that do not contain "X", "Y", or "Z", would 3 new strings be created each time?
I spotted code similar to this in one of our projects. It's called repeatedly as it loops through a large collection of strings.
It does not return a new instance if there is nothing to replace:
string text1 = "hello world", text2 = text1.Replace("foo", "bar");
bool referenceEqual = object.ReferenceEquals(text1, text2);
After that code executes, referenceEqual is set to true.
Even better, this behavior is documented:
If oldValue is not found in the current instance, the method returns the current instance unchanged.
Otherwise, this would be implementation-dependent and could change in the future.
Note that there is a similar, documented optimization for calling Substring(0) on a string value:
If startIndex is equal to zero, the method returns the original string unchanged
How do I set a new value for a character in a string by its index?
I tried:
string a = "abc";
a[0] = "A";
This does not work for strings, but it does for char arrays. Why?
Strings in C# (and other .NET languages, which use System.String from the base class library) are immutable. That is, you can't modify a string character by character that way (nor, for that matter, can you ever modify an existing string in place).
If you want to modify a string based on the index, you have to convert it to an array using System.String.ToCharArray() first. You convert it back to a string using System.String's constructor, passing in the modified array.
Your example would have to be changed to look like:
string a = "abc";
char[] array = a.ToCharArray();
array[0] = 'A'; //Note single quotes, not double quotes
a = new string(array);
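The same steps can be wrapped in a small helper. ReplaceCharAt is a hypothetical name, not a framework method; the sketch returns a new string and leaves the original untouched.

```csharp
using System;

public static class StringEditSketch
{
    // Hypothetical helper: returns a copy of source in which the
    // character at the given index has been replaced.
    public static string ReplaceCharAt(string source, int index, char replacement)
    {
        if (source == null)
        {
            throw new ArgumentNullException(nameof(source));
        }
        if (index < 0 || index >= source.Length)
        {
            throw new ArgumentOutOfRangeException(nameof(index));
        }

        // Copy the characters into a mutable array, change one, rebuild.
        char[] chars = source.ToCharArray();
        chars[index] = replacement;
        return new string(chars);
    }
}
```

For the example above, a = StringEditSketch.ReplaceCharAt(a, 0, 'A'); leaves a holding "Abc".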
The System.String type does not permit writing by index (or by any other means: to change the content of a String variable, one must replace it with a reference to an entirely new String). The System.Text.StringBuilder type does, however, permit writing by index. One may create a new System.Text.StringBuilder object (optionally passing a string to the constructor), manipulate it, and then use its ToString method to convert it back to a string.
A replacement would be this:
string a = "abc";
a = a.Remove(0, 1);
a = a.Insert(0, "A");
or, to replace the c instead:
string a = "abc";
a = a.Remove(2, 1);
a = a.Insert(2, "C");
Using a StringBuilder also works, as per http://msdn.microsoft.com/en-us/library/362314fe.aspx
StringBuilder sb = new StringBuilder("abc");
sb[0] = 'A';
sb[2] = 'C';
string str = sb.ToString();
Use StringBuilder if you need a mutable String.
Also: a[0] represents a single character (a char), while "A" is a string, so the assignment is illegal.
For a char array, a[0] is a storage location to which you can assign a value.
string, on the other hand, is a class, and in this case a[0] is actually a call to its indexer. That indexer is read-only (it has only a get accessor), so you can't assign a value through it.
I know you should use a StringBuilder when concatenating strings but I was just wondering if there is a difference in concatenating string variables and string literals. So, is there a difference in performance in building s1, s2, and s3?
string foo = "foo";
string bar = "bar";
string s1 = "foo" + "bar";
string s2 = foo + "bar";
string s3 = foo + bar;
In the case you present, it's actually better to use the concatenation operator on the string class. This is because it can pre-compute the lengths of the strings and allocate the buffer once and do a fast copy of the memory into the new string buffer.
And this is the general rule for concatenating strings. When you have a set number of items that you want to concatenate together (be it 2, or 2000, etc) it's better to just concatenate them all with the concatenation operator like so:
string result = s1 + s2 + ... + sn;
It should be noted in your specific case for s1:
string s1 = "foo" + "bar";
The compiler sees that it can optimize the concatenation of string literals here and transforms the above into this:
string s1 = "foobar";
Note, this is only for the concatenation of two string literals together. So if you were to do this:
string s2 = foo + "a" + bar;
Then it does nothing special (but it still makes a call to Concat and precomputes the length). However, in this case:
string s2 = foo + "a" + "nother" + bar;
The compiler will translate that into:
string s2 = foo + "another" + bar;
If the number of strings you are concatenating is variable (for example, in a loop where you don't know beforehand how many elements there are), then StringBuilder is the most efficient way of concatenating them, because plain concatenation would have to reallocate and copy the buffer for every new entry, and you don't know how many are left.
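The variable-count case might look like the sketch below. ConcatAll is a hypothetical helper name; it assumes the number of parts is only known once the sequence has been enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class JoinSketch
{
    // Hypothetical helper: concatenates an unknown number of strings.
    public static string ConcatAll(IEnumerable<string> parts)
    {
        // The builder grows its internal buffer as needed, so no
        // intermediate string is created for each appended part.
        var builder = new StringBuilder();
        foreach (string part in parts)
        {
            builder.Append(part);
        }
        return builder.ToString();
    }
}
```

When the parts are already available as a sequence, String.Concat and String.Join also accept an IEnumerable<string>, so a hand-written loop like this is mostly useful when the strings are produced step by step.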
The compiler can concatenate literals at compile time, so "foo" + "bar" gets compiled to "foobar" directly, and there's no need to do anything at runtime.
Other than that, I doubt there's any significant difference.
Your "knowledge" is incorrect. You should sometimes use a StringBuilder when concatenating strings. In particular, you should do it when you can't perform the concatenation all in one expression.
In this case, the code is compiled as:
string foo = "foo";
string bar = "bar";
string s1 = "foobar";
string s2 = String.Concat(foo, "bar");
string s3 = String.Concat(foo, bar);
Using a StringBuilder would make any of this less efficient - in particular it would push the concatenation for s1 from compile time to execution time. For s2 and s3 it would force the creation of an extra object (the StringBuilder) as well as probably allocating a string which is unnecessarily large.
I have an article which goes into more detail on this.
There is no difference between s2 and s3. The compiler will take care of s1 for you, and concatenate it during compile time.
I'd say the compiler should decide this, because all of your string building can be optimized when the values are already known.
I guess StringBuilder pre-allocates space for appending more strings. Although + is a binary operator, the C# compiler turns a chain such as s4 = s1 + s2 + s3 into a single String.Concat(s1, s2, s3) call, so no intermediate string (s1 + s2) has to be built first.