StringBuilder performance in C#?


I have a StringBuilder object to which I am appending some strings, and I want to know which of the following is the better approach. The first one is this:
StringBuilder sb = new StringBuilder();
sb.Append("Hello" + "How" + "are" + "you");
and the second one is:
StringBuilder sb = new StringBuilder();
sb.Append("Hello").Append("How").Append("are").Append("you");

In your current example, the string literals:
"Hello" + "How" + "are" + "you"
Will be compiled into one constant string literal by the compiler, so it is technically faster than:
sb.Append("Hello").Append("How").Append("are").Append("you");
However, were you to use string variables:
sb.Append(s1 + s2 + s3 + s4);
Then the chained Append version could be faster, because the concatenation could potentially create a series of intermediate strings before the final string is passed to the Append method, whereas the chained calls avoid those extra strings (at the cost of extra method calls and possible internal buffer resizing).
Update: For further clarity, in this exact situation where there are only 4 items being concatenated, the compiler will emit a call to String.Concat(string, string, string, string), which knowing the length and number of strings will be more efficient than StringBuilder.
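To make the difference between the two cases concrete, here is a minimal sketch. The class and method names (ConcatSketch, FoldedLiteral, ConcatVariables) are made up for illustration, and the exact code the compiler emits for the variable case can differ between compiler versions.

```csharp
using System;

public static class ConcatSketch
{
    // All operands are compile-time constants, so the C# compiler folds
    // the expression into the single literal "HelloHowareyou".
    public static string FoldedLiteral()
    {
        return "Hello" + "How" + "are" + "you";
    }

    // The operands are only known at run time, so the concatenation
    // happens at run time (typically lowered to a String.Concat call).
    public static string ConcatVariables(string s1, string s2, string s3, string s4)
    {
        return s1 + s2 + s3 + s4;
    }

    // Equal string constants in the same assembly refer to the same
    // interned instance, so this reference comparison is expected to be true.
    public static bool LiteralIsFolded()
    {
        return object.ReferenceEquals(FoldedLiteral(), "HelloHowareyou");
    }
}
```

ConcatVariables("Hello", "How", "are", "you") produces an equal string, but builds it at run time rather than reusing a constant.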

The first will be more efficient. The compiler will convert it to the following single call:
StringBuilder sb = new StringBuilder();
sb.Append("HelloHowareyou");
Measuring the performance
The best way to know which is faster is to measure it. I'll get straight to the point: here are the results (smaller times mean faster):
sb.Append("Hello" + "How" + "are" + "you") : 11.428s
sb.Append("Hello").Append("How").Append("are").Append("you"): 15.314s
sb.Append(a + b + c + d) : 21.970s
sb.Append(a).Append(b).Append(c).Append(d) : 15.529s
The number given is the number of seconds to perform the operation 100 million times in a tight loop.
Conclusions
The fastest is using string literals and +.
But if you have variables, using Append is faster than +. The first version is slower because of an extra call to String.Concat.
In case you want to test this yourself, here's the program I used to get the above timings:
using System;
using System.Text;
public class Program
{
public static void Main()
{
DateTime start, end;
int numberOfIterations = 100000000;
start = DateTime.UtcNow;
for (int i = 0; i < numberOfIterations; ++i)
{
StringBuilder sb = new StringBuilder();
sb.Append("Hello" + "How" + "are" + "you");
}
end = DateTime.UtcNow;
DisplayResult("sb.Append(\"Hello\" + \"How\" + \"are\" + \"you\")", start, end);
start = DateTime.UtcNow;
for (int i = 0; i < numberOfIterations; ++i)
{
StringBuilder sb = new StringBuilder();
sb.Append("Hello").Append("How").Append("are").Append("you");
}
end = DateTime.UtcNow;
DisplayResult("sb.Append(\"Hello\").Append(\"How\").Append(\"are\").Append(\"you\")", start, end);
string a = "Hello";
string b = "How";
string c = "are";
string d = "you";
start = DateTime.UtcNow;
for (int i = 0; i < numberOfIterations; ++i)
{
StringBuilder sb = new StringBuilder();
sb.Append(a + b + c + d);
}
end = DateTime.UtcNow;
DisplayResult("sb.Append(a + b + c + d)", start, end);
start = DateTime.UtcNow;
for (int i = 0; i < numberOfIterations; ++i)
{
StringBuilder sb = new StringBuilder();
sb.Append(a).Append(b).Append(c).Append(d);
}
end = DateTime.UtcNow;
DisplayResult("sb.Append(a).Append(b).Append(c).Append(d)", start, end);
Console.ReadLine();
}
private static void DisplayResult(string name, DateTime start, DateTime end)
{
Console.WriteLine("{0,-60}: {1,6:0.000}s", name, (end - start).TotalSeconds);
}
}
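The program above times each loop with DateTime.UtcNow. As a hedged alternative, the same kind of measurement can be sketched with System.Diagnostics.Stopwatch, which is designed for measuring elapsed time. The TimingSketch class and its Measure and BuildWithAppend helpers are hypothetical names, not part of the original program, and the untimed first call is an assumption about how you might want to keep JIT cost out of the numbers.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class TimingSketch
{
    // Hypothetical helper: runs the action once untimed (so one-off JIT
    // cost is not counted), then times `iterations` further calls.
    public static TimeSpan Measure(Action action, int iterations)
    {
        action();
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; ++i)
        {
            action();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // One of the four variants from the benchmark, wrapped in a method
    // so it can be passed to Measure as a lambda.
    public static string BuildWithAppend(string a, string b, string c, string d)
    {
        return new StringBuilder().Append(a).Append(b).Append(c).Append(d).ToString();
    }
}
```

A call such as TimingSketch.Measure(() => TimingSketch.BuildWithAppend("Hello", "How", "are", "you"), 1000000) returns the elapsed time for the timed iterations; the absolute numbers will differ from the ones quoted above.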

String constants will be concatenated at compile time by the compiler. If you are concatenating no more than four string expressions, the compiler will emit a call to String.Concat:
s + t + u + v ==> String.Concat(s, t, u, v)
This performs faster than StringBuilder, as StringBuilder might have to resize its internal buffer, while Concat can calculate the total resulting length in advance. If you know the maximum length of the resulting string in advance, however, you can initialize the StringBuilder by specifying an initial working buffer size:
var sb = new StringBuilder(initialBufferSize);
StringBuilder is often used in a loop and other dynamic scenarios and performs faster than s += t in such cases.
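As a rough sketch of the pre-sizing idea in a loop scenario: when the pieces are already in a collection, the total length can be summed first and passed to the StringBuilder constructor. CapacitySketch and JoinAll are hypothetical names, and the sketch assumes the collection contains no null entries.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CapacitySketch
{
    // Sums the lengths first so the builder's buffer is allocated once
    // at the final size and never has to grow while appending.
    // Assumption: `parts` contains no null elements.
    public static string JoinAll(IReadOnlyList<string> parts)
    {
        int totalLength = 0;
        foreach (string part in parts)
        {
            totalLength += part.Length;
        }

        var sb = new StringBuilder(totalLength);
        foreach (string part in parts)
        {
            sb.Append(part);
        }
        return sb.ToString();
    }
}
```

For a plain join like this, String.Concat(parts) does the same length calculation internally; the pattern is more useful when each iteration appends something computed on the fly.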

In the first case the compiler will construct a single string, so you'll only call Append once. However, I doubt this will make much of a difference. What did your measurements show?

The second one is the better approach. Strings are immutable meaning that when you use sb.Append("Hello" + "How" + "Are" + "You") you are creating multiple copies of the string
e.g.
"Hello"
then
"HelloHow"
then
"HelloHowAre"
etc.
The second piece of code is much more performant
edit: Of course this doesn't take into consideration compiler optimisations, but it's best to use the class as intended
Ok as people have pointed out since these are literals the compiler takes care of optimising these operations away - but my point is that doing string concatenation is something that StringBuilder tries to avoid
For instance, looping several times as such:
var someString = "";
foreach (var s in someListOfStrings)
{
someString += s;
}
Is not as good as doing:
var sb = new StringBuilder();
foreach(var s in someListOfStrings)
{
sb.Append(s);
}
sb.ToString();
As this will likely be much quicker since, as I said before, strings are immutable
I assumed the OP was talking about using concatenation in general since
sb.Append("Hello" + "How");
Seems completely pointless when
sb.Append("HelloHow");
Would be more logical...?
It seems to me that in the OPs mind, the placeholder text would eventually become a shedload of variables...

Related

C# Extension method slower than chained Replace unless in tight loop. Why?

I have an extension method to remove certain characters from a string (a phone number) which is performing much slower than I think it should vs chained Replace calls. The weird bit, is that in a loop it overtakes the Replace thing if the loop runs for around 3000 iterations, and after that it's faster. Lower than that and chaining Replace is faster. It's like there's a fixed overhead to my code which Replace doesn't have. What could this be!?
Quick look. When only testing 10 numbers, mine takes about 0.3ms, while Replace takes only 0.01ms. A massive difference! But when running 5 million, mine takes around 1700ms while Replace takes about 2500ms.
Phone numbers will only have 0-9, +, -, (, )
Here's the relevant code:
Building test cases, I'm playing with testNums.
int testNums = 5_000_000;
Console.WriteLine("Building " + testNums + " tests");
Random rand = new Random();
string[] tests = new string[testNums];
char[] letters =
{
'0','1','2','3','4','5','6','7','8','9',
'+','-','(',')'
};
for(int t = 0; t < tests.Length; t++)
{
int length = rand.Next(5, 20);
char[] word = new char[length];
for(int c = 0; c < word.Length; c++)
{
word[c] = letters[rand.Next(letters.Length)];
}
tests[t] = new string(word);
}
Console.WriteLine("Tests built");
string[] stripped = new string[tests.Length];
Using my extension method:
Stopwatch stopwatch = Stopwatch.StartNew();
for (int i = 0; i < stripped.Length; i++)
{
stripped[i] = tests[i].CleanNumberString();
}
stopwatch.Stop();
Console.WriteLine("Clean: " + stopwatch.Elapsed.TotalMilliseconds + "ms");
Using chained Replace:
stripped = new string[tests.Length];
stopwatch = Stopwatch.StartNew();
for (int i = 0; i < stripped.Length; i++)
{
stripped[i] = tests[i].Replace(" ", string.Empty)
.Replace("-", string.Empty)
.Replace("(", string.Empty)
.Replace(")", string.Empty)
.Replace("+", string.Empty);
}
stopwatch.Stop();
Console.WriteLine("Replace: " + stopwatch.Elapsed.TotalMilliseconds + "ms");
Extension method in question:
public static string CleanNumberString(this string s)
{
Span<char> letters = stackalloc char[s.Length];
int count = 0;
for (int i = 0; i < s.Length; i++)
{
if (s[i] >= '0' && s[i] <= '9')
letters[count++] = s[i];
}
return new string(letters.Slice(0, count));
}
What I've tried:
I've run them around the other way. Makes a tiny difference, but not enough.
Make it a normal static method, which was significantly slower than extension. As a ref parameter was slightly slower, and in parameter was about the same as extension method.
Aggressive Inlining. Doesn't make any real difference. I'm in release mode, so I suspect the compiler inlines it anyway. Either way, not much change.
I have also looked at memory allocations, and that's as I expect. Mine allocates only one string per iteration on the managed heap (the new string at the end), while Replace allocates a new object for each Replace call. So the memory used by the Replace version is much higher. But it's still faster!
Is it calling native C code and doing something crafty there? Is the higher memory usage triggering the GC and slowing it down? (That still doesn't explain the insanely fast time on only one or two iterations.)
Any ideas?
(Yes, I know not to bother optimising things like this, it's just bugging me because I don't know why it's doing this)

After doing some benchmarks, I think I can safely assert that your initial statement is wrong, for the exact reason you mentioned in your deleted answer: the loading time of the method is the only thing that misled you.
Here's the full benchmark on a simplified version of the problem:
static void Main(string[] args)
{
// Build string of n consecutive "ab"
int n = 1000;
Console.WriteLine("N: " + n);
char[] c = new char[n];
for (int i = 0; i < n; i+=2)
c[i] = 'a';
for (int i = 1; i < n; i += 2)
c[i] = 'b';
string s = new string(c);
Stopwatch stopwatch;
// Make sure everything is loaded
s.CleanNumberString();
s.Replace("a", "");
s.UnsafeRemove();
// Tests to remove all 'a' from the string
// Unsafe remove
stopwatch = Stopwatch.StartNew();
string a1 = s.UnsafeRemove();
stopwatch.Stop();
Console.WriteLine("Unsafe remove:\t" + stopwatch.Elapsed.TotalMilliseconds + "ms");
// Extension method
stopwatch = Stopwatch.StartNew();
string a2 = s.CleanNumberString();
stopwatch.Stop();
Console.WriteLine("Clean method:\t" + stopwatch.Elapsed.TotalMilliseconds + "ms");
// String replace
stopwatch = Stopwatch.StartNew();
string a3 = s.Replace("a", "");
stopwatch.Stop();
Console.WriteLine("String.Replace:\t" + stopwatch.Elapsed.TotalMilliseconds + "ms");
// Make sure the returned strings are identical
Console.WriteLine(a1.Equals(a2) && a2.Equals(a3));
Console.ReadKey();
}
public static string CleanNumberString(this string s)
{
char[] letters = new char[s.Length];
int count = 0;
for (int i = 0; i < s.Length; i++)
if (s[i] == 'b')
letters[count++] = 'b';
return new string(letters.SubArray(0, count));
}
public static T[] SubArray<T>(this T[] data, int index, int length)
{
T[] result = new T[length];
Array.Copy(data, index, result, 0, length);
return result;
}
// Taken from https://stackoverflow.com/a/2183442/6923568
public static unsafe string UnsafeRemove(this string s)
{
int len = s.Length;
char* newChars = stackalloc char[len];
char* currentChar = newChars;
for (int i = 0; i < len; ++i)
{
char c = s[i];
switch (c)
{
case 'a':
continue;
default:
*currentChar++ = c;
break;
}
}
return new string(newChars, 0, (int)(currentChar - newChars));
}
When run with different values of n, it is clear that your extension method (or at least my somewhat equivalent version of it) has logic that makes it faster than String.Replace(). In fact, it is more performant on both small and big strings:
N: 100
Unsafe remove: 0,0024ms
Clean method: 0,0015ms
String.Replace: 0,0021ms
True
N: 100000
Unsafe remove: 0,3889ms
Clean method: 0,5308ms
String.Replace: 1,3993ms
True
I highly suspect optimizations for the replacement of strings (not to be compared to removal) in String.Replace() to be the culprit here. I also added a method from this answer to have another comparison on removal of characters. That one's times behave similarly to your method but gets faster on higher values (80k+ on my tests) of n.
With all that being said, since your question is based on an assumption that we found was false, if you need more explanation on why the opposite is true (i.e. "Why is String.Replace() slower than my method"), plenty of in-depth benchmarks about string manipulation already do so.

I ran the clean method a couple more times. Interestingly, it is a lot faster than Replace; only the first run was slower. Sorry that I can't explain why it's slower the first time, but once I ran the method more times the result was as expected.
Building 100 tests
Tests built
Replace: 0.0528ms
Clean: 0.4526ms
Clean: 0.0413ms
Clean: 0.0294ms
Replace: 0.0679ms
Replace: 0.0523ms
used dotnet core 2.1

So I've found with help from daehee Kim and Mat below that it's only the first iteration, but it's for the whole first loop. Every loop after there is ok.
I use the following line to force the JIT to do its thing and initialise this method:
RuntimeHelpers.PrepareMethod(typeof(CleanExtension).GetMethod("CleanNumberString", BindingFlags.Public | BindingFlags.Static).MethodHandle);
I find the JIT usually takes about 2-3ms to do its thing here (including Reflection time of about 0.1ms). Note that you should probably not be doing this because you're now getting the Reflection cost as well, and the JIT will be called right after this anyway, but it's probably a good idea for benchmarks to fairly compare.
The more you know!
My benchmark for a loop of 5000 iterations, repeated 5000 times with random strings and averaged is:
Clean: 0.41078ms
Replace: 1.4974ms

How can I concatenate a list

I need to concatenate these values. I've seen examples using StringBuilder but I can't quite figure it out.
I am trying to recreate the lineStrings of https://api.tfl.gov.uk/Line/140/Route/Sequence/Inbound
However, the results I have to return have more than one string of co-ords, hence the added "[" and "]".
//
for (int i = 0; i < R.geometry.coordinates.Count; i++)
foreach (List<List<double>> C in R.geometry.coordinates)
{
RS.lineStrings.Add(i++.ToString());
RS.lineStrings.Add("[");
foreach (List<double> a in C)
{
// These values are to be concatentated, I'm wanting to create a string of RS.lineStrings.Add($"[{a[1]},{a[0]}]");
RS.lineStrings.Add($"[{a[1]},{a[0]}]");
}
RS.lineStrings.Add("]");
RS.lineStrings.Add(",");
}

Considering in your code C is List<List<double>>. Then you can use LINQ to concatenate
var sb = new StringBuilder(C.Count * 20); // approximate length, to avoid resizing
C.ForEach(item => sb.AppendFormat("[{0},{1}]", item[1], item[0]));
var str = sb.ToString(); // This is concatenation.
If you want list of strings
C.Select(item => $"[{item[1]},{item[0]}]").ToList();
Based on your new update (I am trying to return "[[0,1],[2,3],[4,5]]") do this
var result = "[" + string.Join(",", C.Select(item => $"[{item[1]},{item[0]}]")) + "]";
Which method you choose should depend on the details of your list. You can still do it with a StringBuilder for better memory management:
var sb = new StringBuilder(C.Count * 20); // approximate length, to avoid resizing
C.ForEach(item => sb.AppendFormat("[{0},{1}],", item[1], item[0])); // note the trailing comma: ],
sb.Insert(0, "[").Replace(',', ']', sb.Length - 1, 1); // replaces the last comma with the closing bracket
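For reference, the string.Join approach can be wrapped into a small self-contained helper, sketched below. LineStringSketch and ToLineString are hypothetical names. Formatting with CultureInfo.InvariantCulture is an added assumption, not something from the answer above: it keeps the decimal separator a '.' regardless of the machine's regional settings. The sketch also assumes every inner list has at least two elements.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class LineStringSketch
{
    // Builds "[[b0,a0],[b1,a1],...]" from a list of coordinate pairs,
    // swapping the two values of each pair as in the question's code.
    // Assumption: each inner list has at least two elements.
    public static string ToLineString(List<List<double>> coordinates)
    {
        IEnumerable<string> pairs = coordinates.Select(pair =>
            "[" + pair[1].ToString(CultureInfo.InvariantCulture)
            + "," + pair[0].ToString(CultureInfo.InvariantCulture) + "]");

        return "[" + string.Join(",", pairs) + "]";
    }
}
```

Each call returns one complete string, so it can be added to RS.lineStrings once per coordinate list instead of once per bracket or pair.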

You can use string.Join() to join them:
string result = string.Join(",", C.Select(a => $"[{a[1]},{a[0]}]"));

Strings are immutable. So if you do a lot of string concatenation, that can leave a lot of dead strings in memory. The GC will deal with them, but it is still a performance issue. Especially on a web server it should be avoided. And then there are things like string interning too: a lot of minor optimisations that will get in the way if you do mass operations on strings.
StringBuilder is the closest we can get to a mutable string, and it gets around those optimisations (which may be a hindrance here). The only difference in use is that you call "Append" rather than "Add".

There are multiple possibilities:
RS.lineStrings.Add($"[{a[1]}" + "," + $"{a[0]}]");
RS.lineStrings.Add(string.Concat($"[{a[1]}", ",", $"{a[0]}]"));
Documentation https://learn.microsoft.com/en-us/dotnet/csharp/how-to/concatenate-multiple-strings

Here's a solution with a StringBuilder. It's wordy as all get out, but it should be much faster and produce much less garbage for a large number of items. It uses no string concatenation.
var listOfLists = new List<List<double>>
{
new List<double> {1.0, 2.0, 3.0},
new List<double> {3.14, 42.0}
};
var buffer = new StringBuilder();
buffer.Append('[');
var firstOuter = true;
foreach (var list in listOfLists)
{
var firstInner = true;
// Separate this inner list from the previous one before opening it
if (!firstOuter)
{
buffer.Append(',');
}
buffer.Append('[');
foreach (var item in list)
{
if (!firstInner)
{
buffer.Append(',');
}
firstInner = firstOuter = false;
buffer.Append(item.ToString());
}
buffer.Append(']');
}
buffer.Append(']');
var concatenated = buffer.ToString();

String vs. StringBuilder when editing a long string?

I have a string that I have to edit quite a lot. Length is undefined. Replace(string, string) will be the most used method for this.
What is better string.Replace("", "") or StringBuilder.Replace("", "") ?
public static string FormatMyString(string input)
{
// ...
}
(The code examples are plain and stupid. They just serve the purpose to show you what I'm doing. I always get questions: "What are you trying to do ?")

What is better string.Replace("", "") or StringBuilder.Replace("", "") ?
Neither. They both do nothing useful. In the more general case:
if you're doing one replacement, the internal-call of String.Replace should be fine
if you're doing lots of replacements, consider StringBuilder to avoid intermediate strings
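A minimal sketch of the "lots of replacements" case follows. The ReplaceSketch class, the EscapeMarkup method and the particular replacements are invented for illustration; the point is only that StringBuilder.Replace edits the builder in place and returns the same builder, so calls can be chained and a string is materialised once at the end. Whether this beats chained String.Replace for a given input is something to measure.

```csharp
using System.Text;

public static class ReplaceSketch
{
    // Applies several replacements to one StringBuilder instead of
    // creating a new intermediate string after every Replace call.
    public static string EscapeMarkup(string input)
    {
        var sb = new StringBuilder(input);
        sb.Replace("&", "&amp;")   // must run first so later entities are not re-escaped
          .Replace("<", "&lt;")
          .Replace(">", "&gt;");
        return sb.ToString();
    }
}
```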

IMHO String Builder is the way to go.
It has a creation cost, but would offer much better performance in general due to much more efficient string manipulation and concatenation.
As I said there is a cost so you should consider what does "edit quite a lot" mean.
Sorry I can't provide actual benchmark results right now, but I assume the threshold for using String Builder should be very low...
Hope this helps.

You just don't get any good value just "asking" about these kinds of things. You need to benchmark. Take this code for example:
var sw = Stopwatch.StartNew();
var cc0 = GC.CollectionCount(0);
var s = (string)null;
for (var i = 0; i < 10000000; i++)
{
s = "a";
s += "b";
}
var cc1 = GC.CollectionCount(0);
sw.Stop();
Console.WriteLine(
"collections == {0}, ms == {1}, string == \"{2}\"",
cc1 - cc0,
sw.ElapsedMilliseconds,
s);
Versus this code:
var sw = Stopwatch.StartNew();
var cc0 = GC.CollectionCount(0);
var sb = (StringBuilder)null;
for (var i = 0; i < 10000000; i++)
{
sb = new StringBuilder();
sb.Append("a");
sb.Append("b");
}
var cc1 = GC.CollectionCount(0);
Console.WriteLine(
"collections == {0}, ms == {1}, string == \"{2}\"",
cc1 - cc0,
sw.ElapsedMilliseconds,
sb.ToString());
The two results I get are:
collections == 63, ms == 336, string == "ab" // +=
collections == 228, ms == 692, string == "ab" // StringBuilder
The StringBuilder takes over twice as long and causes over 3.5 times more garbage collections to occur.
It's certainly the case that if I were to concatenate very long strings that StringBuilder will perform better, but I won't know that point unless I measure it.
You need to provide more detail as to what code you are running and what you mean by "better" (faster, less memory, easy to read code, etc) before we can say what is best.

.NET String performance question

Is it better, from a performance standpoint, to use "Example1"? I'm assuming that "Example2" would create a new string on the heap in each iteration while "Example1" would not...
Example1:
StringBuilder letsCount = new StringBuilder("Let's count! ");
string sep = ", ";
for(int i=0; i< 100; i++)
{
letsCount.Append(i + sep);
}
Example2:
StringBuilder letsCount = new StringBuilder("Let's count! ");
for(int i=0; i< 100; i++)
{
letsCount.Append(i + ", ");
}

The .NET CLR is much smarter than that. It "interns" string literals so that there is only one instance.
It's also worth noting that if you were truly concerned about string concatenation, you would want to turn the single Append call into two append calls. The reality, however, is that the overhead of two calls probably outweighs any minor concatenation cost. In either case, it's probably nearly immeasurable except in very controlled conditions.
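The "two Append calls" variant mentioned above could look like the sketch below. CountSketch and its Count method are hypothetical names; the limit parameter stands in for the hard-coded 100 from the question.

```csharp
using System.Text;

public static class CountSketch
{
    // Appends the number and the separator separately, so no
    // intermediate string such as "7, " is created per iteration.
    public static string Count(int limit)
    {
        var letsCount = new StringBuilder("Let's count! ");
        const string sep = ", ";
        for (int i = 0; i < limit; i++)
        {
            letsCount.Append(i).Append(sep);
        }
        return letsCount.ToString();
    }
}
```

As the answer says, the difference from Append(i + sep) is likely to be very small for 100 iterations.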

They are identical.

Actually a much faster way to do it would be
string letsCount = "Let's count! ";
string[] numbers = new string[100];
for(int i=0; i< 100; i++)
{
numbers[i]=i+", ";
}
string result = letsCount + String.Concat(numbers);
See here

Inverse String.Replace - Faster way of doing it?

I have a method to replace every character except those I specify. For example,
ReplaceNot("test. stop; or, not", ".;/\\".ToCharArray(), '*');
would return
"****.*****;********".
Now, this is not an instance of premature optimization. I call this method quite a few times during a network operation. I found that on longer strings, it is causing some latency, and removing it helped a bit. Any help to speed this up would be appreciated.
public static string ReplaceNot(this string original, char[] pattern, char replacement)
{
int index = 0;
int old = -1;
StringBuilder sb = new StringBuilder(original.Length);
while ((index = original.IndexOfAny(pattern, index)) > -1)
{
sb.Append(new string(replacement, index - old - 1));
sb.Append(original[index]);
old = index++;
}
if (original.Length - old > 1)
{
sb.Append(new string(replacement, original.Length - (old + 1)));
}
return sb.ToString();
}
Final #'s. I also added a test case for a 3K character string, ran at 100K times instead of 1M to see how well each of these scales. The only surprise was that the regular expression 'scaled better' than the others, but it is no help since it is very slow to begin with:
User        Short * 1M  Long * 100K  Scale
John               319         2125   6.66
Luke               360         2659   7.39
Guffa              409         2827   6.91
Mine               447         3372   7.54
DirkGently        1094         9134   8.35
Michael           1591        12785   8.04
Peter            21106        94386   4.47
Update: I made the creation of the regular expression for Peter's version a static variable, and set it to RegexOptions.Compiled to be fair:
User        Short * 1M  Long * 100K  Scale
Peter             8997        74715   8.30
Pastebin link to my testing code, please correct me if it is wrong:
http://pastebin.com/f64f260ee

Can't you use Regex.Replace like so:
Regex regex = new Regex(@"[^.;/\\]");
string s = regex.Replace("test. stop; or, not", "*");

Alright, on a ~60KB string, this will perform about 40% faster than your version:
public static string ReplaceNot(this string original, char[] pattern, char replacement)
{
int index = 0;
StringBuilder sb = new StringBuilder(new string(replacement, original.Length));
while ((index = original.IndexOfAny(pattern, index)) > -1)
{
sb[index] = original[index++];
}
return sb.ToString();
}
The trick is to initialize a new string with all replacement characters, since most of them will be replaced.

I don't know if this will be any faster, but it avoids newing up strings just so they can be appended to the string builder, which may help:
public static string ReplaceNot(this string original, char[] pattern, char replacement)
{
StringBuilder sb = new StringBuilder(original.Length);
foreach (char ch in original) {
if (Array.IndexOf( pattern, ch) >= 0) {
sb.Append( ch);
}
else {
sb.Append( replacement);
}
}
return sb.ToString();
}
If the number of chars in pattern will be of any size (which I'm guessing it generally won't), it might pay to sort it and perform an Array.BinarySearch() instead of the Array.IndexOf().
For such a simple transformation, I'd bet that it'll have no problem being faster than a regex, too.
Also, since your set of characters in pattern are likely to usually come from a string anyway (at least that's been my general experience with this type of API), why don't you have the method signature be:
public static string ReplaceNot(this string original, string pattern, char replacement)
or better yet, have an overload where pattern can be a char[] or string?
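The two suggestions in this answer (a string pattern parameter and a sorted pattern searched with Array.BinarySearch) can be combined roughly as follows. This is an untimed sketch under assumptions: ReplaceNotSketch is a made-up class name, and writing into a char[] instead of a StringBuilder is a choice made here for brevity. For a pattern of only a handful of characters, the sort may well cost more than it saves.

```csharp
using System;

public static class ReplaceNotSketch
{
    // Keeps every character that occurs in `pattern` and replaces all
    // others with `replacement`.
    public static string ReplaceNot(this string original, string pattern, char replacement)
    {
        // Sort a private copy of the pattern so it can be binary-searched.
        char[] sorted = pattern.ToCharArray();
        Array.Sort(sorted);

        char[] buffer = new char[original.Length];
        for (int i = 0; i < buffer.Length; i++)
        {
            char ch = original[i];
            // BinarySearch returns a non-negative index when ch is found.
            buffer[i] = Array.BinarySearch(sorted, ch) >= 0 ? ch : replacement;
        }
        return new string(buffer);
    }
}
```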

Here's another version for you. My tests suggest that its performance is pretty good.
public static string ReplaceNot(
this string original, char[] pattern, char replacement)
{
char[] buffer = new char[original.Length];
for (int i = 0; i < buffer.Length; i++)
{
bool replace = true;
for (int j = 0; j < pattern.Length; j++)
{
if (original[i] == pattern[j])
{
replace = false;
break;
}
}
buffer[i] = replace ? replacement : original[i];
}
return new string(buffer);
}

The StringBuilder has an overload that takes a character and a count, so you don't have to create intermediate strings to add to the StringBuilder. I get about 20% improvement by replacing this:
sb.Append(new string(replacement, index - old - 1));
with:
sb.Append(replacement, index - old - 1);
and this:
sb.Append(new string(replacement, original.Length - (old + 1)));
with:
sb.Append(replacement, original.Length - (old + 1));
(I tested the code that you said was about four times faster, and I find it about 15 times slower...)

It's going to be O(n). You seem to be replacing all alphabets and whitespaces by *, why not just test if the current character is an alphabet/whitespace and replace it?
