I know you should use a StringBuilder when concatenating strings but I was just wondering if there is a difference in concatenating string variables and string literals. So, is there a difference in performance in building s1, s2, and s3?
string foo = "foo";
string bar = "bar";
string s1 = "foo" + "bar";
string s2 = foo + "bar";
string s3 = foo + bar;
In the case you present, it's actually better to use the concatenation operator on the string class. This is because it can pre-compute the lengths of the strings and allocate the buffer once and do a fast copy of the memory into the new string buffer.
And this is the general rule for concatenating strings. When you have a set number of items that you want to concatenate together (be it 2, or 2000, etc) it's better to just concatenate them all with the concatenation operator like so:
string result = s1 + s2 + ... + sn;
It should be noted in your specific case for s1:
string s1 = "foo" + "bar";
The compiler sees that it can optimize the concatenation of string literals here and transforms the above into this:
string s1 = "foobar";
Note, this is only for the concatenation of two string literals together. So if you were to do this:
string s2 = foo + "a" + bar;
Then it does nothing special (but it still makes a call to Concat and precomputes the length). However, in this case:
string s2 = foo + "a" + "nother" + bar;
The compiler will translate that into:
string s2 = foo + "another" + bar;
If the number of strings you are concatenating is variable (for example, a loop where you don't know beforehand how many elements there are), then StringBuilder is the most efficient way to concatenate them: the buffer has to grow as new entries are added, because you don't know how many are left, and StringBuilder handles that growth without creating a new string at every step.
The compiler can concatenate literals at compile time, so "foo" + "bar" gets compiled to "foobar" directly, and there's no need to do anything at runtime.
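To make the fixed-versus-variable distinction concrete, here is a minimal sketch. The class and method names (ConcatSketch, JoinFixed, JoinAll) are made up for illustration and are not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ConcatSketch
{
    // Fixed number of pieces: one expression, so the total length is known
    // up front and the result can be allocated once.
    public static string JoinFixed(string a, string b, string c)
    {
        return a + b + c;
    }

    // Unknown number of pieces: append to a single growing buffer instead of
    // creating a new string on every iteration.
    public static string JoinAll(IEnumerable<string> parts)
    {
        var sb = new StringBuilder();
        foreach (string part in parts)
        {
            sb.Append(part);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(JoinFixed("foo", "-", "bar"));          // foo-bar
        Console.WriteLine(JoinAll(new[] { "a", "b", "c", "d" })); // abcd
    }
}
```

For a plain sequence with no separators, string.Concat(IEnumerable<string>) does the same job as JoinAll in one call.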
Other than that, I doubt there's any significant difference.
Your "knowledge" is incorrect. You should sometimes use a StringBuilder when concatenating strings. In particular, you should do it when you can't perform the concatenation all in one expression.
In this case, the code is compiled as:
string foo = "foo";
string bar = "bar";
string s1 = "foobar";
string s2 = String.Concat(foo, "bar");
string s3 = String.Concat(foo, bar);
Using a StringBuilder would make any of this less efficient - in particular it would push the concatenation for s1 from compile time to execution time. For s2 and s3 it would force the creation of an extra object (the StringBuilder) as well as probably allocating a string which is unnecessarily large.
I have an article which goes into more detail on this.
There is no difference between s2 and s3. The compiler will take care of s1 for you, and concatenate it during compile time.
I'd say the compiler should decide this, because all of your string building can be optimized: the values are already known.
I guess StringBuilder pre-allocates space for appending more strings. Note that although + is a binary operator, the C# compiler does not build an intermediate string for a chain of concatenations: s4 = s1 + s2 + s3 is compiled into a single String.Concat(s1, s2, s3) call.
Related
I'm declaring a string at initialisation as follows
string a = string.Format("Hello {0}", "World.");
Is there a way to subsequently replace the zeroth argument for something else?
If there's no obvious solution, does anybody know of a better way to address the issue. For context, at initialisation I create a number of strings. The text needs to be updated as the program proceeds. I could create a structure comprising an array of strings and an array of objects and then build the final string as required, but this seems ugly, particularly as each instance could have a different number of arguments.
For example,
public class TextThingy
{
List<String> strings;
List<String> arguments;
...
public override string ToString()
{
return strings[0] + arguments[0] + strings[1] ...
}
}
I've tried this, but to no avail.
string b = string.Format(a, "Universe.");
I guess that the argument {0} once populated is then baked into the string that one time.
You could move the format string into a variable and reuse it, as in the code below. Would that work? If not, please add some more info for us.
string fmt = "Hello {0}";
string a = string.Format(fmt, "World.");
string b = string.Format(fmt, "Universe.");
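Building on that idea, the template and its arguments can be kept together in a small object and only formatted when the text is needed. This is a rough sketch, not a definitive design: FormattedText and SetArguments are hypothetical names invented for this example.

```csharp
using System;

// Hypothetical helper: stores the format string and its arguments separately,
// so the arguments can be swapped at any time before the text is rendered.
public sealed class FormattedText
{
    private readonly string format;
    private object[] args;

    public FormattedText(string format, params object[] args)
    {
        this.format = format;
        this.args = args;
    }

    // Replace all arguments; the number of arguments may differ per instance.
    public void SetArguments(params object[] newArgs)
    {
        args = newArgs;
    }

    // The final string is built on demand from the untouched template.
    public override string ToString()
    {
        return string.Format(format, args);
    }
}

public static class FormattedTextDemo
{
    public static void Main()
    {
        var text = new FormattedText("Hello {0}", "World.");
        Console.WriteLine(text);   // Hello World.
        text.SetArguments("Universe.");
        Console.WriteLine(text);   // Hello Universe.
    }
}
```

Because the template is never overwritten with its own output, the {0} placeholder stays available for later updates.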
try string replace, like ...
StringBuilder sb = new StringBuilder("11223344");
string myString =
sb
.Replace("1", string.Empty)
.Replace("2", string.Empty)
.Replace("3", string.Empty)
.ToString();
string s1 = "test";
string s5 = s1.Substring(0, 3)+"t";
string s6 = s1.Substring(0,4)+"";
Console.WriteLine("{0} ", object.ReferenceEquals(s1, s5)); //False
Console.WriteLine("{0} ", object.ReferenceEquals(s1, s6)); //True
Both strings s5 and s6 have the same value as s1 ("test"). Based on the concept of string interning, both statements should have evaluated to true. Can someone please explain why s5 doesn't have the same reference as s1?
You should get false for calls of ReferenceEquals on string objects that are not string literals.
Essentially, the last line prints True by coincidence: what happens is that when you pass an empty string for string concatenation, library optimization recognizes this, and returns the original string. This has nothing to do with interning, as the same thing will happen with strings that you read from console or construct in any other way:
var s1 = Console.ReadLine();
var s2 = s1+"";
var s3 = ""+s1;
Console.WriteLine(
"{0} {1} {2}"
, object.ReferenceEquals(s1, s2)
, object.ReferenceEquals(s1, s3)
, object.ReferenceEquals(s2, s3)
);
The above prints
True True True
Demo.
The CLR doesn't intern all strings. All string literals are interned by default. The following, however:
Console.WriteLine("{0} ", object.ReferenceEquals(s1, s6)); //True
Returns true, since the line here:
string s6 = s1.Substring(0,4)+"";
Is effectively optimized to return the same reference back. It happens to (likely) be interned, but that's coincidental. If you want to see if a string is interned, you should use String.IsInterned().
If you want to intern strings at runtime, you can use String.Intern and store the reference, as per the MSDN documentation here: String.Intern Method (String). However, I strongly suggest you not use this method, unless you have a good reason to do so: it has performance considerations and potentially unwanted side-effects (for example, strings that have been interned cannot be garbage collected).
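As a small illustration of the two methods mentioned above, the following sketch interns a string that was built at run time. The class and helper names (InternSketch, Build) are made up for the example.

```csharp
using System;

public static class InternSketch
{
    // Builds the string at run time, so each call returns a new object.
    public static string Build(string suffix)
    {
        return "String" + suffix;
    }

    public static void Main()
    {
        string first = Build("A");
        string second = Build("A");

        Console.WriteLine(first == second);                       // True (value equality)
        Console.WriteLine(object.ReferenceEquals(first, second)); // False (two objects)

        // String.Intern returns the single pooled instance for a given value.
        string a = string.Intern(first);
        string b = string.Intern(second);
        Console.WriteLine(object.ReferenceEquals(a, b));          // True

        // String.IsInterned returns null when the value is not in the pool.
        Console.WriteLine(string.IsInterned(second) != null);     // True
    }
}
```

Storing the reference returned by String.Intern is the important part: the original objects (first and second here) are not changed by the call.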
From msdn documentation of object.ReferenceEquals here:
When comparing strings. If objA and objB are strings, the ReferenceEquals method returns true if the string is interned. It does not perform a test for value equality. In the following example, s1 and s2 are equal because they are two instances of a single interned string. However, s3 and s4 are not equal, because although they have identical string values, that string is not interned.
using System;
public class Example
{
public static void Main()
{
String s1 = "String1";
String s2 = "String1";
Console.WriteLine("s1 = s2: {0}", Object.ReferenceEquals(s1, s2));
Console.WriteLine("{0} interned: {1}", s1,
String.IsNullOrEmpty(String.IsInterned(s1)) ? "No" : "Yes");
String suffix = "A";
String s3 = "String" + suffix;
String s4 = "String" + suffix;
Console.WriteLine("s3 = s4: {0}", Object.ReferenceEquals(s3, s4));
Console.WriteLine("{0} interned: {1}", s3,
String.IsNullOrEmpty(String.IsInterned(s3)) ? "No" : "Yes");
}
}
// The example displays the following output:
// s1 = s2: True
// String1 interned: Yes
// s3 = s4: False
// StringA interned: No
Strings in .NET can be interned. It isn't said anywhere that 2 identical strings should be the same string instance. Typically, the compiler will intern identical string literals, but this isn't true for all strings, and is certainly not true of strings created dynamically at runtime.
The Substring method is smart enough to return the original string in the case where the substring being requested is exactly the original string. Link to the Reference Source found in comment by @DanielA.White. So s1.Substring(0,4) returns s1 when s1 is of length 4. And apparently the + operator has a similar optimization such that
string s6 = s1.Substring(0,4)+"";
is functionally equivalent to:
string s6 = s1;
I would like to know the different ways of inserting a variable into a string, in C#.
I am currently trying to insert values into a json string that I am building:
Random rnd = new Random();
int ID = rnd.Next(1, 999);
string body = @"{""currency"":""country"",""gold"":1,""detail"":""detailid-979095986"",""tId"":""help here""}";
How could I add the "ID" to the string body?
In a typical string inserting scenario, I'd do one of these:
string body = string.Format("My ID is {0}", ID);
string body = "My ID is " + ID;
However, your string is apparently JSON serialized data. I'd expect that I'd want to parse that into a class in order to work with it.
var myObj = JsonConvert.DeserializeObject<MyClass>(someString);
myObj.TID = ID;
// maybe do other things with it, then if I need JSON again...
string body = JsonConvert.SerializeObject(myObj);
One reason to take this approach is to make sure that any data I put in still makes the JSON valid. For example, if my ID were, instead of an int, a string with characters that needed escaping, directly inserting "\"\n\"" would not be the right thing to do.
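A quick way to see the escaping problem is to try both approaches with a value that contains a double quote. This sketch uses System.Text.Json rather than the Json.NET JsonConvert calls shown above, purely so that it runs without an extra package; that swap is an assumption of the example. The names JsonEscapingSketch, BuildNaive, BuildSafe and IsValidJson are hypothetical.

```csharp
using System;
using System.Text.Json;

public static class JsonEscapingSketch
{
    // Naive approach: splice the raw value straight into the JSON text.
    public static string BuildNaive(string id)
    {
        return "{\"tId\":\"" + id + "\"}";
    }

    // Serializer approach: the library escapes the value as needed.
    public static string BuildSafe(string id)
    {
        return JsonSerializer.Serialize(new { tId = id });
    }

    // Returns true when the text parses as JSON.
    public static bool IsValidJson(string text)
    {
        try
        {
            using (JsonDocument.Parse(text))
            {
                return true;
            }
        }
        catch (JsonException)
        {
            return false;
        }
    }

    public static void Main()
    {
        string id = "a\"b"; // hypothetical ID containing a double quote

        Console.WriteLine(IsValidJson(BuildNaive(id))); // False
        Console.WriteLine(IsValidJson(BuildSafe(id)));  // True
    }
}
```

With an int ID the naive version happens to work, which is why this kind of bug tends to show up only later, when the data changes.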
String interpolation is the easiest way these days:
int myIntValue = 123;
string myStringValue = "one two three";
string interpolatedString = $"my int is: {myIntValue}. My string is: {myStringValue}.";
Output would be "my int is: 123. My string is: one two three.".
You can experiment with this sample yourself, over here.
The $ special character identifies a string literal as an interpolated
string. An interpolated string is a string literal that might contain
interpolation expressions. When an interpolated string is resolved to
a result string, items with interpolation expressions are replaced by
the string representations of the expression results. This feature is available starting with C# 6.
You could try this:
string body = @"{""currency"":""country"",""gold"":1,""detail"":""detailid-979095986"",""tId"":""" + ID + @"""}";
You can also use string.Concat:
string body = string.Concat(@"{""currency"":""country"",""gold"":1,""detail"":""detailid-979095986"",""tId"":""", ID, @"""}");
There are a number of ways to inject values into strings, however it's easy to lose sight of encodings, and cause major breakage.
If you just want to inject a value into another string, you can use:
string concatenation
string building
string formatting
Concatenation:
The simplest and most common way to build strings is by simply concatenating them together with the + operator:
var foo = 5;
var bar = "example-" + foo;
Concatenation can be difficult to read which makes it easy to introduce bugs, but for most simple tasks is the right tool for the job.
In this case, it's a poor choice:
string body = @"{""currency"":""country"",""gold"":1,""detail"":""detailid-979095986"",""tId"":""" + ID.ToString() + @"""}";
String Building
The StringBuilder class is useful for building large strings particularly when built iteratively.
var sb = new StringBuilder();
for (var i = 0; i < 1000; i++) {
sb.Append(i.ToString());
sb.Append(" ");
}
var output = sb.ToString();
It can still be difficult to read and hard to debug, but for cases where you're joining lots of strings together, it's super efficient.
In this case, it's a poor choice:
StringBuilder sb = new StringBuilder();
sb.Append(@"{""currency"":""country"",""gold"":1,""detail"":""detailid-979095986"",""tId"":""");
sb.Append(ID.ToString());
sb.Append(@"""}");
string body = sb.ToString();
String formatting
The string.Format method makes templating data into a string super easy and efficient. If you plan on reusing the same string over and over, using a format string makes it much easier to read and debug code, particularly when there are lots of replacements:
var foo = 5;
var bar = string.Format("example-{0}", foo);
Format strings can also automatically apply culturally accurate formatting to particular data types, so that a DateTime is appropriately displayed, or so that a number has the appropriate number of trailing zeros.
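As a brief illustration of format specifiers, the sketch below formats a number and a date. It pins the culture to CultureInfo.InvariantCulture so the output does not depend on the machine's regional settings; that choice, and the class name FormatSketch, are assumptions of this example.

```csharp
using System;
using System.Globalization;

public static class FormatSketch
{
    // Fixed-point with two decimals.
    public static string TwoDecimals(double value)
    {
        return string.Format(CultureInfo.InvariantCulture, "{0:F2}", value);
    }

    // Group separators plus two decimals.
    public static string Grouped(double value)
    {
        return string.Format(CultureInfo.InvariantCulture, "{0:N2}", value);
    }

    // Custom date pattern.
    public static string IsoDate(DateTime value)
    {
        return string.Format(CultureInfo.InvariantCulture, "{0:yyyy-MM-dd}", value);
    }

    public static void Main()
    {
        Console.WriteLine(TwoDecimals(3.14159));              // 3.14
        Console.WriteLine(Grouped(1234.5));                   // 1,234.50
        Console.WriteLine(IsoDate(new DateTime(2020, 1, 2))); // 2020-01-02
    }
}
```

Passing a different CultureInfo as the first argument changes separators and date layout without touching the format string.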
In this case, it's a poor choice:
string body = string.Format(@"{{""currency"":""country"",""gold"":1,""detail"":""detailid-979095986"",""tId"":""{0}""}}", ID);
The right choice
You're not dumping data into any old string. That's JSON encoded data. If you just concatenate/build/format in any old value, you can break your string. For example, if the ID variable contained a " character, you'd break the entire JSON dataset.
Additionally, the length of the string and necessary quotes make it super difficult to read, which makes it difficult to maintain. Good luck when you get around to needing to add another formatted value, it's going to be a pain to change any existing value or add in new dynamic ones.
Instead of writing a JSON literal, write an object and encode it to JSON:
var bodyData =
new
{
currency = "country",
gold = 1,
detail = "detailid-979095986",
tId = ID //here's where you set the ID
};
var jss = new JavaScriptSerializer();
var body = jss.Serialize(bodyData);
This code is much easier to modify when the data changes, and will actually encode your data correctly. You don't need to worry about all those annoying double quote characters any more either.
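You can use String.Format, doubling the literal braces so that they are not read as placeholders:
String.Format(@"{{""currency"":""country"",""gold"":1,""detail"":""detailid-979095986"",""tId"":""{0}""}}", ID)
Since the arguments are declared as params object[], you can use as many {n} placeholders as you want.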
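One detail to watch when a format string contains JSON: String.Format treats single braces as placeholder delimiters, so literal braces have to be written as {{ and }}. A minimal sketch, with the hypothetical names BraceEscapingSketch, BuildBody and SingleBracesThrow:

```csharp
using System;

public static class BraceEscapingSketch
{
    // {{ and }} produce literal braces; {0} is the only placeholder.
    public static string BuildBody(int id)
    {
        return string.Format(@"{{""tId"":""{0}""}}", id);
    }

    // Shows what happens when the literal braces are not doubled.
    public static bool SingleBracesThrow()
    {
        try
        {
            string.Format(@"{""tId"":""{0}""}", 1);
            return false;
        }
        catch (FormatException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(BuildBody(42));       // {"tId":"42"}
        Console.WriteLine(SingleBracesThrow()); // True
    }
}
```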
Instead of using one string, you could concatenate strings together using +, which would allow you to insert text between the generated strings.
string body = @"***" + ID + @"***";
Assume I have the following string constants:
const string constString1 = "Const String 1";
const string constString2 = "Const String 2";
const string constString3 = "Const String 3";
const string constString4 = "Const String 4";
Now I can append the strings in two ways:
Option1:
string resultString = constString1 + constString2 + constString3 + constString4;
Option2:
string resultString = string.Format("{0}{1}{2}{3}",constString1,constString2,constString3,constString4);
Internally string.Format uses StringBuilder.AppendFormat. Now given the fact that I am appending constant strings, which of the options (option1 or option 2) is better with respect to performance and/or memory?
The first one will be done by the compiler (at least the Microsoft C# Compiler) (in the same way that the compiler does 1+2), the second one must be done at runtime. So clearly the first one is faster.
As an added benefit, in the first one the string is interned; in the second one it isn't.
And String.Format is quite slow :-) (read this: http://msmvps.com/blogs/jon_skeet/archive/2008/10/06/formatting-strings.aspx). NOT "slow enough to be a problem", UNLESS all your program does all day is format strings (MILLIONS of them, not TENS). Then you could probably do it faster by appending them to a StringBuilder.
The first variant will be best, but only when you are using constant strings.
There are two compiler optimizations (from the C# compiler, not the JIT compiler) that are in effect here. Let's take one example of a program:
const string A = "Hello ";
const string B = "World";
...
string test = A + B;
First optimization is constant propagation that will change your code basically into this:
string test = "Hello " + "World";
Then a second optimization, the concatenation of literal strings (which is what they are now, thanks to the first optimization), will kick in and change it to
string test = "Hello World";
So if you write any variants of the program shown above, the actual IL will be the same (or at least very similar) due to the optimizations done by the C# compiler.
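One way to convince yourself that the concatenation of constants happens at compile time is that the result can itself be declared const, which the compiler only accepts for compile-time constant expressions. A minimal sketch (the class name ConstFoldingSketch and the constant Combined are made up for this example):

```csharp
using System;

public static class ConstFoldingSketch
{
    const string A = "Hello ";
    const string B = "World";

    // This declaration only compiles because A + B is a constant expression
    // that the compiler evaluates itself.
    public const string Combined = A + B;

    public static void Main()
    {
        Console.WriteLine(Combined); // Hello World
    }
}
```

Replacing either A or B with a non-const field makes the Combined declaration a compile error, because the value would then only be known at run time.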
I could do this in C#..
int number = 2;
string str = "Hello " + number + " world";
..and str ends up as "Hello 2 world".
In VB.NET i could do this..
Dim number As Integer = 2
Dim str As String = "Hello " + number + " world"
..but I get an InvalidCastException "Conversion from string "Hello " to type 'Double' is not valid."
I am aware that I should use .ToString() in both cases, but what's going on here with the code as it is?
In VB I believe the string concatenation operator is & rather than + so try this:
Dim number As Integer = 2
Dim str As String = "Hello " & number & " world"
Basically when VB sees + I suspect it tries to do numeric addition or use the addition operator defined in a type (or no doubt other more complicated things, based on options...) Note that System.String doesn't define an addition operator - it's all hidden in the compiler by calls to String.Concat. (This allows much more efficient concatenation of multiple strings.)
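On the C# side, the point about String.Concat can be sketched as follows. The second method is only an approximation of what the compiler generates (the exact call differs between compiler versions), and the names ConcatOperatorSketch, WithOperator and WithConcat are hypothetical.

```csharp
using System;

public static class ConcatOperatorSketch
{
    // What you write: + between strings and an int.
    public static string WithOperator(int number)
    {
        return "Hello " + number + " world";
    }

    // Roughly what it turns into: one String.Concat call over all the pieces,
    // with the int converted to its string form.
    public static string WithConcat(int number)
    {
        return string.Concat("Hello ", number.ToString(), " world");
    }

    public static void Main()
    {
        Console.WriteLine(WithOperator(2)); // Hello 2 world
        Console.WriteLine(WithConcat(2));   // Hello 2 world
    }
}
```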
Visual Basic makes a distinction between the + and & operators. The & will make the conversion to a string if an expression is not a string.
& Operator (Visual Basic)
The + operator uses more complex evaluation logic to determine what to make the final cast into (for example, it's affected by things like the Option Strict configuration).
+ Operator (Visual Basic)
I'd suggest to stay away from raw string concatenation, if possible.
Good alternatives are using String.Format:
str = String.Format("Hello {0} world", number)
Or using the System.Text.StringBuilder class, which is also more efficient on larger string concatenations.
Both automatically cast their parameters to string.
The VB plus (+) operator is ambiguous.
If you don't have Option Strict on, if my memory serves me right, it is possible to do this:
Dim str = 1 + "2"
and get str as the number 3 rather than the string "12".
If you explicitly want a string concatenation, use the ampersand operator
Dim str = "Hello " & number & " world"
And it'll happily convert number to string for you.
I think this behavior is left in for backward compatibility.
When you program in VB, always use an ampersand to concatenate strings.