Question on string interning performance

Question on string interning performance - c#

I'm curious. The scenario is a web app/site with e.g. 100's of concurrent connections and many (20?) page loads per second.
If the app needs to server a formatted string
string.Format("Hello, {0}", username);
Will the "Hello, {0}" be interned? Or would it only be interned with
string hello = "Hello, {0}";
string.Format(hello, username);
Which, with regard to interning, would give better performance: the above or,
StringBuilder builder = new StringBuilder()
builder.Append("Hello, ");
builder.Append(username);
or even
string hello = "Hello, {0}";
StringBuilder builder = new StringBuilder()
builder.Append("Hello, ");
builder.Append(username);
So my main questions are:
1) Will a string.Format literal be interned
2) Is it worth setting a variable name for a stringbuilder for a quick lookup, or
3) Is the lookup itself quite heavy (if #1 above is a no)
I realise this would probably result in minuscule gains, but as I said I am curious.

There is a static method String.IsInterned(str) method. You could do some testing and find out!
http://msdn.microsoft.com/en-us/library/system.string.isinterned.aspx

String.Format actually uses a StringBuilder internally, so there is no reason to call it directly in your code. As far as interning of the literal is concerned, the two code versions are the same as the C# compiler will create a temporary variable to store the literal.
Finally, the effect of interning in a web page is negligible. Page rendering is essentially a heavy-duty string manipulation operation so the difference interning makes is negligible. You can achieve much greater performance benefits in a much easier way by using page and control caching.

Quick answer: run a 100k iterations and find out.

You can't beat
return "Hello, " + username;
if your scenario is really that simple.

Related

Memory fragmentation when concatenating or adding strings but not with string.Format?

So a professor in university just told me that using concatenation on strings in C# (i.e. when you use the plus sign operator) creates memory fragmentation, and that I should use string.Format instead.
Now, I've searched a lot in stack overflow and I found a lot of threads about performance, which concatenating strings win hands down. (Some of them include this, this and this)
I can't find someone who talks about memory fragmentation though. I opened .NET's string.Format using ILspy and apparently it uses the same string builder than the string.Concat method does (which if I understand is what the + sign is overloaded to). In fact: it uses the code in string.Concat!
I found this article from 2007 but I doubt it's accurate today (or ever!). Apparently the compiler is smart enough to avoid that today, cause I can't seem to reproduce the issue. Both adding strings with string.format and plus signs end up using the same code internally. As said before, the string.Format uses the same code string.Concat uses.
So now I'm starting to doubt his claim. Is it true?

So a professor in university just told me that using concatenation on strings in C# (i.e. when you use the plus sign operator) creates memory fragmentation, and that I should use string.Format instead.
No, what you should do instead is do user research, set user-focussed real-world performance metrics, and measure the performance of your program against those metrics. When, and only when you find a performance problem, you should use the appropriate profiling tools to determine the cause of the performance issue. If the cause is "memory fragmentation" then address that by identifying the causes of the "fragmentation" and trying experiments to determine what techniques mitigate the effect.
Performance is not achieved by "tips and tricks" like "avoid string concatenation". Performance is achieved by applying engineering discipline to realistic problems.
To address your more specific problem: I have never heard the advice to eschew concatenation in favor of formatting for performance reasons. The advice usually given is to eschew iterated concatenation in favor of builders. Iterated concatenation is quadratic in time and space and creates collection pressure. Builders allocate unnecessary memory but are linear in typical scenarios. Neither creates fragmentation of the managed heap; iterated concatenation tends to produce contiguous blocks of garbage.
The number of times I've had a performance problem that came down to unnecessary fragmentation of a managed heap is exactly one; in an early version of Roslyn we had a pattern where we would allocate a small long lived object, then a small short lived object, then a small long lived object... several hundred thousand times in a row, and the resulting maximally fragmented heap caused user-impacting performance problems on collections; we determined this by careful measurement of the performance in the relevant scenarios, not by ad hoc analysis of the code from our comfortable chairs.
The usual advice is not to avoid fragmentation, but rather to avoid pressure. We found during the design of Roslyn that pressure was far more impactful on GC performance than fragmentation, once our aforementioned allocation pattern problem was fixed.
My advice to you is to either press your professor for an explanation, or to find a professor who has a more disciplined approach to performance metrics.
Now, all that said, you should use formatting instead of concatenation, but not for performance reasons. Rather, for code readability, localizability, and similar stylistic concerns. A format string can be made into a resource, it can be localized, and so on.
Finally, I caution you that if you are putting strings together in order to build something like a SQL query or a block of HTML to be served to a user, then you want to use none of these techniques. These applications of string building have serious security impacts when you get them wrong. Use libraries and tools specifically designed for construction of those objects, rather than rolling your own with strings.

The problem with string concatenation is that strings are immutable. string1 + string2 does not concatenate string2 onto string1, it creates a whole new string. Using a StringBuilder (or string.Format) does not have this problem. Internally, the StringBuilder holds a char[], which it over-allocates. Appending something to a StringBuilder does not create any new objects unless it runs out of room in the char[] (in which case it over-allocates a new one).
I ran a quick benchmark. I think it proves the point :)
StringBuilder sb = new StringBuilder();
string st;
Stopwatch sw;
sw = Stopwatch.StartNew();
for (int i = 0 ; i < 100000 ; i++)
{
sb.Append("a");
}
st = sb.ToString();
sw.Stop();
Debug.WriteLine($"Elapsed: {sw.Elapsed}");
st = "";
sw = Stopwatch.StartNew();
for (int i = 0 ; i < 100000 ; i++)
{
st = st + "a";
}
sw.Stop();
Debug.WriteLine($"Elapsed: {sw.Elapsed}");
The console output:
Elapsed: 00:00:00.0011883 (StringBuilder.Append())
Elapsed: 00:00:01.7791839 (+ operator)

C# big string array to string

I have a string array of about 20,000,000 values.
And i need to convert it to a string
I've tried:
string data = "";
foreach (var i in tm)
{
data = data + i;
}
But that takes too long time
does someone know a faster way?

Try StringBuilder:
StringBuilder sb = new StringBuilder();
foreach (var i in tm)
{
sb.Append(i);
}
To get the resulting String use ToString():
string result = sb.ToString();

The answer is going to depend on the size of the output string and the amount of memory you have available and usable. The hard limit on string length appears to be 2^31-1 (int.MaxValue) characters, occupying just over 4GB of memory. Whether you can actually allocate that is dependent on your framework version, etc. If you're going to be producing a larger output then you can't put it into a single string anyway.
You've already discovered that naive concatenation is going to be tragically slow. The problem is that every pass through the loop creates a new string, then immediately discards it on the next iteration. This is going to fill up memory pretty quickly, forcing the Garbage Collector to work overtime finding old strings to clear out of memory, not to mention the amount of memory fragmentation and all that stuff that modern programmers don't pay much attention to.
A StringBuiler, is a reasonable solution. Internally it allocates blocks of characters that it then stitches together at the end using pointers and memory copies. Saves a lot of hassles that way and is quite speedy.
As for String.Join... it uses a StringBuilder. So does String.Concat although it is certainly quicker when not inserting separator characters.
For simplicity I would use String.Concat and be done with it.
But then I'm not much for simplicity.
Here's an untested and possibly horribly slow answer using LINQ. When I get time I'll test it and see how it performs, but for now:
string result = new String(lines.SelectMany(l => (IEnumerable<char>)l).ToArray());
Obviously there is a potential overflow here since the ToArray call can potentially create an array larger than the String constructor can handle. Try it out and see if it's as quick as String.Concat.

So you can do it in LINQ, like such.
string data = tm.Aggregate("", (current, i) => current + i);
Or you can use the string.Join function
string data = string.Join("", tm);

Cant check it right now but I'm curious on how this option would perform:
var data = String.Join(string.Empty, tm);
Is Join optimized and ignores concatenation a with String.Empty?

For this big data unfortunately memory based methods will fail and this will be a real headache for GC. For this operation create a file and put every string in it. Like this:
using (StreamWriter sw = new StreamWriter("some_file_to_write.txt")){
for (int i=0; i<tm.Length;i++)
sw.Write(tm[i]);
}
Try to avoid using "var" on this performance demanding approach. Correction: "var" does not effect perfomance. "dynamic" does.

Is there a difference between concatenation and place holding in C# output or is it a matter of preference?

Just started working with C# for the first time, and while looking through the tutorial, I found nothing on the difference between the Concatenation (console.writeline("Hello" + user) where user is a string variable) and the place holder (console.writeline("Hello {0}" , user) where user is a string variable) methods for output. Is there a difference or is it simply which way you find easier

Its not really specific to C#, lots of languages support both styles. The latter form is usually thought of as 'safer', but I can't quote any specific reason why. It is useful if the item needs to appear in more than 1 place, or if you want to save the format string as a constant. Take a look at this thread for more info: When is it better to use String.Format vs string concatenation?.

Using string formatters, as opposed to string concatenation, is almost entirely about readability. What they actually do, and even how they perform, is close enough to the same.
For such a simple case both look all right, but when you have a complex string with lots of values mixed in format strings can end up looking a lot nicer:
Here's a better example:
string output = "Hello " + username + ". I have spent " + executionTime + " seconds trying to figure out that the answer to life is: " + output;
vs
string output = string.Format("Hello {0}. I have spent {1} seconds trying to figure out that the answer to life is: {2}"
, username, executionTime, output);

As Matt said Place holding is considered as the safer approach then the simple concatenation, but I am not sure for that reasons(I need to explore on it). But yes one thing is sure that Place Holding is a bit costly operation then Concatenation in terms of Performance. Check this Blog entry "Formatting Strings" by Jon Skeet.
Although performance will be effected significantly only if you are using Place Holders for like thousands times or so.

String Concatenation Vs String Builder Append

So...I have this scenario where I have a Foreach loop that loops through a List of Checkboxes to check which are selected. For every selected checkbox, I have to do a pretty long string concatenation, involving 30 different strings of an average length of 20 characters, and then send it out as a HTTP request. 2 of the strings are dependant on the index/value of the checkbox selected.
The length of the List of Checkboxes is also variable depending upon the user's data. I would say the average length of the List would be 20, but it can go up to 50-60. So the worst case scenario would be performing the whole string concatenation 60 or so times.
For now I'm doing it with simple string concatenation via the '+' operator, but I'm wondering if it would be faster to do it with Stringbuilder. Of course, that means I'd have to either create a Stringbuilder object within the loop, or create it before the loop and call Stringbuilder.Remove at the end of it after sending out the HTTP request.
I appreciate any insights anybody can share regarding this issue.
EDIT
Thanks for all the replies everybody, so from what I've gathered, the best way for me to go about doing this would be something like:
StringBuilder sb = new StringBuilder();
foreach (CheckBox item in FriendCheckboxList)
{
if (item.Checked)
{
sb.Append(string1);
sb.Append(string2);
sb.Append(string3);
.
.
.
sb.Append(stringLast);
SendRequest(sb.ToString());
sb.Length = 0;
}
}

Use StringBuilder. That's what it's for.
Strings are immutable. String concatenation creates a new string, needing more memory, and is generally considered slow:
string a = "John" + " " + "Saunders";
This creates a string "John ", then creates another string "John Saunders", then finally, assigns that to "a". The "John " is left for garbage collection.
string a = "John";
a += " ";
a += "Saunders";
This is about the same, as "John" is replaced by a new string "John ", which is replaced by a new string "John Saunders". The originals are left to be garbage collected.
On the other hand, StringBuilder is designed to be appended, removed, etc.
Example:
StringBuilder sb = new StringBuilder();
for (int i=0; i<n; i++)
{
sb.Length = 0;
sb.Append(field1[i]);
sb.Append(field2[i]);
...
sb.Append(field30[i]);
// Do something with sb.ToString();
}

This topic has been analysed to death over the years. The end result is that if you are doing a small, known number of concatenations, use '+', otherwise use stringbuilder. From what you've said, concatenate with '+' should be faster. There are a gazillion (give or take) sites out there analysing this - google it.
For the size of string you are talking about, it's negligible anyway.
EDIT: on second thought, SB is probably faster. But like I said, who cares?

I know this has been answered, but I wanted to point out that I actually think the "blindly accepted as gospel" approach of always using StringBuilder is sometimes wrong, and this is one case.
For background, see this blog entry: http://geekswithblogs.net/johnsperfblog/archive/2005/05/27/40777.aspx
The short of it is, for this particular case, as described, you will see better performance by avoiding StringBuilder and making use of the + operator thusly:
foreach (CheckBox item in FriendCheckboxList)
{
if (item.Checked)
{
string request = string1 +
string2 +
string3 +
.
.
.
stringLast;
SendRequest(request);
}
}
The reason is that the C# compiler (as of .NET 1.1), will convert that statement into a single IL call to String.Concat passing an array of Strings as an argument. The blog entry does an excellent job outlining the implementation details of String.Concat, but suffice to say, it is extremely efficient for this case.

If your asking this question, chances are you should use StringBuilder for many reasons, but i'll provide two.
When you use string concatenation it has to allocate a new buffer and and copy the data in the other string into the new string variable. So you are going to incur many repeated allocations. Which in the end ends up fragmenting the memory, using up heap space, and making more work for the Garbage collector.
The StringBuilder on the other hand pre-allocates a buffer and as you add strings to it doesn't need to keep re-allocating (assuming initial buffer is large enough). Which increases performance and is far less taxing on memory.
As developers we should try to anticipate future growth. Let's say that your list grows substantially over time and then all of a sudden starts performing slowly. If you can prevent this with little effort now, why wouldn't you do it?

In general I would recommend to use a StringBuilder.
Have you tested this and checked the performance? Is the performance an issue vs how long it will take you to rewrite the code?

Is it worth using StringBuilder in web apps?

In web app I am splitting strings and assigning to link names or to collections of strings. Is there a significant performance benefit to using stringbuilder for a web application?
EDIT: 2 functions: splitting up a link into 5-10 strings. THen repackaging into another string. Also I append one string at a time to a link everytime the link is clicked.

How many strings will you be concatenating? Do you know for sure how many there will be, or does it depend on how many records are in the database etc?
See my article on this subject for more details and guidelines - but basically, being in a web app makes no difference to how expensive string concatenation is vs using a StringBuilder.
EDIT: I'm afraid it's still not entirely clear from the question exactly what you're doing. If you've got a fixed set of strings to concatenate, and you can do it all in one go, then it's faster and probably more readable to do it using concatenation. For instance:
string faster = first + " " + second + " " + third + "; " + fourth;
string slower = new StringBuilder().Append(first)
.Append(" ")
.Append(second)
.Append(" ")
.Append(third)
.Append("; ")
.Append(fourth)
.ToString();
Another alternative is to use a format string of course. This may well be the slowest, but most readable:
string readable = string.Format("{0} {1} {2}; {3}",
first, second, third, fourth);
The part of your question mentioning "adding a link each time" suggests using a StringBuilder for that aspect though - anything which naturally leads to a loop is more efficient (for moderate to large numbers) using StringBuilder.

You should take a look at this excellent article by Jon Skeet about concatenating strings.

Yes, concatenating regular strings is expensive (really appending on string on to the end of another). Each time a string is changed, .net drops the old string and creates a new one with the new values. It is an immutable object.
EDIT:
Stringbuilder should be used with caution, and evaluated like any other approach. Sometimes connactenting two strings together will be more efficient, and should be evaluated on a case by case basis.
Atwood has an interesting article related to this.

Why would the performance be any different in a web application or a winforms application?
Using stringbuilder is a matter of good practice because of memory and object allocation, the rules apply no matter why you are building the code.

If you're making the string in a loop with a high number of iterations, then it's a good idea to use stringbuilder. Otherwise, string concatenation is your best bet.

FIRSTLY, Are you still writing this application? If yes then STOP Performance tuning!
SECONDLY, Prioritise Correctness over Speed. Readability is way more important in the long run for obvious reasons.
THIRDLY, WE don't know the exact situation and code you are writing. We can't really advise you if this micro optimisation is important to the performance of your code or not. MEASURE the difference. I highly recommend Red Gate's Ants Profiler.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.