StringBuilder.Append Vs StringBuilder.AppendFormat - c#

I was wondering about StringBuilder and I've got a question that I was hoping the community would be able to explain.
Let's just forget about code readability, which of these is faster and why?
StringBuilder.Append:
StringBuilder sb = new StringBuilder();
sb.Append(string1);
sb.Append("----");
sb.Append(string2);
StringBuilder.AppendFormat:
StringBuilder sb = new StringBuilder();
sb.AppendFormat("{0}----{1}",string1,string2);

It's impossible to say, not knowing the size of string1 and string2.
With the call to AppendFormat, it will preallocate the buffer just once given the length of the format string and the strings that will be inserted and then concatenate everything and insert it into the buffer. For very large strings, this will be advantageous over separate calls to Append which might cause the buffer to expand multiple times.
However, the three calls to Append might or might not trigger growth of the buffer and that check is performed each call. If the strings are small enough and no buffer expansion is triggered, then it will be faster than the call to AppendFormat because it won't have to parse the format string to figure out where to do the replacements.
More data is needed for a definitive answer
It should be noted that there is little discussion of using the static Concat method on the String class (Jon's answer using AppendWithCapacity reminded me of this). His test results show that to be the best case (assuming you don't need specific format specifiers). String.Concat does the same thing in that it will predetermine the length of the strings to concatenate and preallocate the buffer (with slightly more overhead due to looping constructs through the parameters). Its performance is going to be comparable to Jon's AppendWithCapacity method.
Or, just the plain addition operator, since it compiles to a call to String.Concat anyways, with the caveat that all of the additions are in the same expression:
// One call to String.Concat.
string result = a + b + c;
NOT
// Two calls to String.Concat.
string result = a + b;
result = result + c;
For all those putting up test code
You need to run your test cases in separate runs (or at the least, perform a GC between the measuring of separate test runs). The reason for this is that if you do say, 1,000,000 runs, creating a new StringBuilder in each iteration of the loop for one test, and then you run the next test that loops the same number of times, creating an additional 1,000,000 StringBuilder instances, the GC will more than likely step in during the second test and hinder its timing.
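If separate runs are not practical, one way to approximate this inside a single process is to settle the heap before each timed section. The following is only a minimal sketch: the Measure helper, the class name, and the string sizes are assumptions for illustration, not part of anyone's posted test code.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class IsolatedTiming
{
    // Hypothetical helper: warms up, settles the heap, then times 'iterations' calls.
    public static TimeSpan Measure(Action action, int iterations)
    {
        action(); // warm-up call so JIT compilation is not part of the measurement

        // Collect garbage left behind by earlier tests so that it is
        // less likely to be collected in the middle of this measurement.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Main()
    {
        string s1 = new string('x', 1000);
        string s2 = new string('y', 1000);

        TimeSpan append = Measure(() =>
        {
            StringBuilder sb = new StringBuilder();
            sb.Append(s1).Append("----").Append(s2);
        }, 100000);

        TimeSpan format = Measure(() =>
        {
            StringBuilder sb = new StringBuilder();
            sb.AppendFormat("{0}----{1}", s1, s2);
        }, 100000);

        Console.WriteLine("Append: {0} ms", append.TotalMilliseconds);
        Console.WriteLine("AppendFormat: {0} ms", format.TotalMilliseconds);
    }
}
```

Garbage created during a measurement can still trigger collections, so this narrows the problem rather than removing it.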

casperOne is correct. Once you reach a certain threshold, the Append() method becomes slower than AppendFormat(). Here are the different lengths and elapsed ticks of 100,000 iterations of each method:
Length: 1
Append() - 50900
AppendFormat() - 126826
Length: 1000
Append() - 1241938
AppendFormat() - 1337396
Length: 10,000
Append() - 12482051
AppendFormat() - 12740862
Length: 20,000
Append() - 61029875
AppendFormat() - 60483914
When strings with a length near 20,000 are introduced, the AppendFormat() function will slightly outperform Append().
Why does this happen? See casperOne's answer.
Edit:
I reran each test individually under Release configuration and updated the results.

casperOne is entirely accurate that it depends on the data. However, suppose you're writing this as a class library for 3rd parties to consume - which would you use?
One option would be to get the best of both worlds - work out how much data you're actually going to have to append, and then use StringBuilder.EnsureCapacity to make sure we only need a single buffer resize.
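As a rough sketch of that idea, assuming an existing builder that is being reused (the Join helper and its parameter names are made up for illustration):

```csharp
using System;
using System.Text;

public static class EnsureCapacityDemo
{
    // Hypothetical helper: appends "string1----string2" to an existing builder.
    public static string Join(StringBuilder sb, string string1, string string2)
    {
        const string separator = "----";

        // Work out the final length up front...
        int needed = sb.Length + string1.Length + separator.Length + string2.Length;

        // ...and ask the builder to grow at most once before appending.
        sb.EnsureCapacity(needed);

        sb.Append(string1);
        sb.Append(separator);
        sb.Append(string2);
        return sb.ToString();
    }

    public static void Main()
    {
        StringBuilder sb = new StringBuilder();
        Console.WriteLine(Join(sb, "left", "right")); // prints "left----right"
    }
}
```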
If I weren't too bothered though, I'd use Append x3 - it seems "more likely" to be faster, as parsing the string format tokens on every call is clearly make-work.
Note that I've asked the BCL team for a sort of "cached formatter" which we could create using a format string and then re-use repeatedly. It's crazy that the framework has to parse the format string each time it's used.
EDIT: Okay, I've edited John's code somewhat for flexibility and added an "AppendWithCapacity" which just works out the necessary capacity first. Here are the results for the different lengths - for length 1 I used 1,000,000 iterations; for all other lengths I used 100,000. (This was just to get sensible running times.) All times are in millis.
Unfortunately tables don't really work in SO. The lengths were 1, 1000, 10000, 20000
Times:
Append: 162, 475, 7997, 17970
AppendFormat: 392, 499, 8541, 18993
AppendWithCapacity: 139, 189, 1558, 3085
So as it happened, I never saw AppendFormat beat Append - but I did see AppendWithCapacity win by a very substantial margin.
Here's the full code:
using System;
using System.Diagnostics;
using System.Text;
public class StringBuilderTest
{
static void Append(string string1, string string2)
{
StringBuilder sb = new StringBuilder();
sb.Append(string1);
sb.Append("----");
sb.Append(string2);
}
static void AppendWithCapacity(string string1, string string2)
{
int capacity = string1.Length + string2.Length + 4;
StringBuilder sb = new StringBuilder(capacity);
sb.Append(string1);
sb.Append("----");
sb.Append(string2);
}
static void AppendFormat(string string1, string string2)
{
StringBuilder sb = new StringBuilder();
sb.AppendFormat("{0}----{1}", string1, string2);
}
static void Main(string[] args)
{
int size = int.Parse(args[0]);
int iterations = int.Parse(args[1]);
string method = args[2];
Action<string,string> action;
switch (method)
{
case "Append": action = Append; break;
case "AppendWithCapacity": action = AppendWithCapacity; break;
case "AppendFormat": action = AppendFormat; break;
default: throw new ArgumentException();
}
string string1 = new string('x', size);
string string2 = new string('y', size);
// Make sure it's JITted
action(string1, string2);
GC.Collect();
Stopwatch sw = Stopwatch.StartNew();
for (int i=0; i < iterations; i++)
{
action(string1, string2);
}
sw.Stop();
Console.WriteLine("Time: {0}ms", (int) sw.ElapsedMilliseconds);
}
}

Append will be faster in most cases because there are many overloads to that method that allow the compiler to call the correct method. Since you are using Strings the StringBuilder can use the String overload for Append.
AppendFormat takes a String and then an Object[] which means that the format will have to be parsed and each Object in the array will have to be ToString'd before it can be added to the StringBuilder's internal array.
Note: To casperOne's point - it is difficult to give an exact answer without more data.

StringBuilder also has cascaded appends: Append() returns the StringBuilder itself, so you can write your code like this:
StringBuilder sb = new StringBuilder();
sb.Append(string1)
.Append("----")
.Append(string2);
Clean, and it generates less IL-code (although that's really a micro-optimization).

Of course profile to know for sure in each case.
That said, I think in general it will be the former because you aren't repeatedly parsing the format string.
However, the difference would be very small. To the point that you really should consider using AppendFormat in most cases anyway.

I'd assume it was the call that did the least amount of work. Append just concatenates strings, where AppendFormat is doing string substitutions. Of course these days, you never can tell...

1 should be faster because it's simply appending the strings whereas 2 has to create a string based on a format and then append the string. So there's an extra step in there.

Faster is 1 in your case however it isn't a fair comparison. You should ask StringBuilder.AppendFormat() vs StringBuilder.Append(string.Format()) - where the first one is faster due to internal working with char array.
Your second option is more readable though.

Related

C# big string array to string

I have a string array of about 20,000,000 values.
And I need to convert it to a string.
I've tried:
string data = "";
foreach (var i in tm)
{
data = data + i;
}
But that takes too long.
Does someone know a faster way?
Try StringBuilder:
StringBuilder sb = new StringBuilder();
foreach (var i in tm)
{
sb.Append(i);
}
To get the resulting String use ToString():
string result = sb.ToString();
The answer is going to depend on the size of the output string and the amount of memory you have available and usable. The hard limit on string length appears to be 2^31-1 (int.MaxValue) characters, which at two bytes per character would occupy just under 4GB of memory. Whether you can actually allocate that is dependent on your framework version, etc. If you're going to be producing a larger output then you can't put it into a single string anyway.
You've already discovered that naive concatenation is going to be tragically slow. The problem is that every pass through the loop creates a new string, then immediately discards it on the next iteration. This is going to fill up memory pretty quickly, forcing the Garbage Collector to work overtime finding old strings to clear out of memory, not to mention the amount of memory fragmentation and all that stuff that modern programmers don't pay much attention to.
A StringBuilder is a reasonable solution. Internally it allocates blocks of characters that it then stitches together at the end using pointers and memory copies. Saves a lot of hassles that way and is quite speedy.
As for String.Join... it uses a StringBuilder. So does String.Concat although it is certainly quicker when not inserting separator characters.
For simplicity I would use String.Concat and be done with it.
But then I'm not much for simplicity.
Here's an untested and possibly horribly slow answer using LINQ. When I get time I'll test it and see how it performs, but for now:
string result = new String(lines.SelectMany(l => (IEnumerable<char>)l).ToArray());
Obviously there is a potential overflow here since the ToArray call can potentially create an array larger than the String constructor can handle. Try it out and see if it's as quick as String.Concat.
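For comparison, the String.Concat route mentioned above could look like this minimal sketch. The array contents and the JoinAll name are stand-ins for the real data and are not from the question.

```csharp
using System;

public static class ConcatDemo
{
    // Concatenates every element of the array with no separator.
    public static string JoinAll(string[] parts)
    {
        // string.Concat(params string[]) treats null elements as empty strings.
        return string.Concat(parts);
    }

    public static void Main()
    {
        string[] tm = { "alpha", "beta", "gamma" }; // small stand-in for the large array
        Console.WriteLine(JoinAll(tm)); // prints "alphabetagamma"
    }
}
```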
So you can do it in LINQ, like such.
string data = tm.Aggregate("", (current, i) => current + i);
Or you can use the string.Join function
string data = string.Join("", tm);
Can't check it right now, but I'm curious how this option would perform:
var data = String.Join(string.Empty, tm);
Is Join optimized to skip the concatenation of String.Empty?
For this big data unfortunately memory based methods will fail and this will be a real headache for GC. For this operation create a file and put every string in it. Like this:
using (StreamWriter sw = new StreamWriter("some_file_to_write.txt")){
for (int i=0; i<tm.Length;i++)
sw.Write(tm[i]);
}
Try to avoid using "var" on this performance demanding approach. Correction: "var" does not affect performance. "dynamic" does.

Insert a carriage return every 64 characters of a string

I have a Base64 encoded string like this :
SWwgw6l0YWl0IHVuIHBldGl0IG5hdmlyZS [...] 0IG5hdmlyZSA=
The input string can be large (> 1MB). And for interoperability reasons, I need to add a carriage return into that large string every 64 characters.
The first guess I had was to use a stringbuilder and use the method "AppendLine" every 64 characters like this :
string InputB64_Without_CRLF = "SWwgw6l0YWl0IHVuIHBldGl0IG5hdmlyZS [...] 0IG5hdmlyZSA=";
int BufferSize = 64;
int Index = 0;
StringBuilder sb = new StringBuilder();
while (Index < InputB64_Without_CRLF.Length) {
// The last chunk may be shorter than BufferSize
int Count = Math.Min(BufferSize, InputB64_Without_CRLF.Length - Index);
sb.AppendLine(InputB64_Without_CRLF.Substring(Index, Count));
Index += BufferSize;
}
string Output_With_CRLF = sb.ToString();
But I'm worried about the performance of that portion of code. Is there a better means to insert a character into a string at a certain position without rebuilding another string ?
Is there a better means to insert a character into a string at a certain position without rebuilding another string?
.NET strings are immutable, which means that they cannot be modified once they have been created.
Therefore, if you want to insert characters into a string, there is no other way but to create a new one. And StringBuilder is quite probably the most efficient way to go about this, because it allows you to perform as many string-building steps as needed, and only create one single new string in the end.
Unless you've actually noticed performance problems in a real-world scenario, keep your current solution. It looks fine to me, at least from a performance point of view.
Some further fine points to consider:
If you're still not happy with your solution, I can think of only a few minor things that might make your current solution more efficient:
Declare the StringBuilder's required capacity up-front, so that its backing character buffer won't have to be resized:
var additionalCharactersCount = Environment.NewLine.Length * (input.Length / 64);
var sb = new StringBuilder(capacity: input.Length + additionalCharactersCount);
Insert the complete input string into the StringBuilder first, then repeatedly .Insert(…, Environment.NewLine) every 64 characters.
I am not at all certain whether this would actually improve execution speed, but it would get rid of the repeated string creation caused by .Substring. Measure for yourself whether it's faster than your solution or not.
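A third variation, which is not what either suggestion above describes, is to keep appending chunk by chunk but use the StringBuilder.Append(string, int, int) overload so that no temporary Substring is created. This is a sketch only; the Wrap helper and its parameter names are assumptions, and like AppendLine it also puts a line break after the last chunk.

```csharp
using System;
using System.Text;

public static class LineWrapDemo
{
    // Hypothetical helper: copies 'input' into a new string with a line break after every 'width' characters.
    public static string Wrap(string input, int width)
    {
        if (width <= 0) throw new ArgumentOutOfRangeException("width");

        // Number of output lines, rounding up for a short final line.
        int lineCount = (input.Length + width - 1) / width;
        StringBuilder sb = new StringBuilder(input.Length + lineCount * Environment.NewLine.Length);

        for (int index = 0; index < input.Length; index += width)
        {
            // The final chunk may be shorter than 'width'.
            int count = Math.Min(width, input.Length - index);
            sb.Append(input, index, count); // copies the slice directly, no Substring
            sb.Append(Environment.NewLine);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // 130 characters become lines of 64, 64 and 2 characters.
        Console.Write(Wrap(new string('A', 130), 64));
    }
}
```

Whether this is measurably faster than the Substring version is something to test against real input sizes.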
Your code is not inefficient; trying to save 100ms or less is usually not worth the effort. But if you are concerned, here is another slightly more efficient way to insert a new line (which is sometimes \r\n, not just \n) every 64 characters:
//Start at the last multiple of 64 and work backwards so that our new line inserts do not move the text still to be processed
//Dividing and then multiplying again looks redundant, but integer division rounds down to a multiple of 64
StringBuilder sb = new StringBuilder(InputB64_Without_CRLF);
for (int i = (InputB64_Without_CRLF.Length / 64) * 64; i >= 64; i -= 64)
sb.Insert(i, Environment.NewLine);
string Output_With_CRLF = sb.ToString();
This will only be a tiny bit more efficient than your original code, you likely won't notice much difference.
After talking with stakx I had this idea. By using the StringBuilder you do not create many strings over and over. The StringBuilder is very efficient and will handle its insert without creating more objects.

Better option for String Manipulation - .NET

I'm working with huge string data for a project in C#. I'm confused about which approach should I use to manipulate my string data.
First Approach:
StringBuilder myString = new StringBuilder().Append(' ', 1024);
while(someString[++counter] != someChar)
myString[i++] = someString[counter];
Second Approach:
String myString;
int i = counter;
while(someString[++counter] != someChar);
myString = someString.Substring(i, counter - i);
Which one of the two would be faster (and more efficient)? Considering the strings I'm working with are huge.
The strings are already in the RAM.
The size of the string can vary from 32MB-1GB.
You should use IndexOf rather than doing individual character manipulations in a loop, and add whole chunks of string to the result:
StringBuilder myString = new StringBuilder();
int pos = someString.IndexOf(someChar, counter);
myString.Append(someString.Substring(counter, pos - counter));
For "huge" strings, it may make sense to take a streamed approach and not load the whole thing into memory. For the best raw performance, you can sometimes squeeze a little more speed out by using pointer math to search and capture pieces of strings.
To be clear, I'm stating two completely different approaches.
1 - Stream
The OP doesn't say how big these strings are, but it may be impractical to load them into memory. Perhaps they are being read from a file, from a data reader connected to a DB, from an active network connection, etc.
In this scenario, I would open a stream, read forward, buffering my input in a StringBuilder until the criteria was met.
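A minimal sketch of that streamed approach, assuming the stop criterion is a single character. StringReader stands in for a file, database or network reader here, and the ReadUntil name is made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamScanDemo
{
    // Reads forward from 'reader' until 'stopChar' or end of input,
    // returning everything read before the stop character.
    public static string ReadUntil(TextReader reader, char stopChar)
    {
        StringBuilder buffer = new StringBuilder();
        int next;
        while ((next = reader.Read()) != -1) // Read() returns -1 at end of input
        {
            char c = (char)next;
            if (c == stopChar)
            {
                break;
            }
            buffer.Append(c);
        }
        return buffer.ToString();
    }

    public static void Main()
    {
        // StringReader is only a stand-in for a real stream-backed reader.
        using (TextReader reader = new StringReader("header|payload"))
        {
            Console.WriteLine(ReadUntil(reader, '|')); // prints "header"
        }
    }
}
```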
2 - Unsafe Char Manipulation
This requires that you do have the complete string. You can obtain a char* to the start of a string quite simply:
// fix entire string in memory so that we can work w/ memory range safely
fixed( char* pStart = bigString )
{
char* pChar = pStart; // unfixed pointer to start of string
char* pEnd = pStart + bigString.Length;
}
You can now increment pChar and examine each character. You can buffer it (e.g. if you want to examine multiple adjacent characters) or not as you choose. Once you determine the ending memory location, you now have a range of data that you can work with.
Unsafe Code and Pointers in c#
2.1 - A Safer Approach
If you are familiar with unsafe code, it is very fast, expressive, and flexible. If not, I would still use a similar approach, but without the pointer math. This is similar to the approach which @supercat suggested, namely:
Get a char[].
Read through it character by character.
Buffer where needed. StringBuilder is good for this; set an initial size and reuse the instance.
Analyze buffer where needed.
Dump buffer often.
Do something with the buffer when it contains the desired match.
And an obligatory disclaimer for unsafe code: The vast majority of the time the framework methods are a better solution. They are safe, tested, and invoked millions of times per second. Unsafe code puts all of the responsibility on the developer. It does not make any assumptions; it's up to you to be a good framework/OS citizen (e.g. not overwriting immutable strings, allowing buffer overruns, etc.). Because it does not make any assumptions and removes the safeguards, it will often yield a performance increase. It's up to the developer to determine if there is indeed a benefit, and to decide if the advantages are significant enough.
Per request from OP, here are my test results.
Assumptions:
Big string is already in memory, no requirement for reading from disk
Goal is to not use any native pointers/unsafe blocks
The "checking" process is simple enough that something like Regex is not needed. For now simplifying to a single char comparison. The below code can easily be modified to consider multiple chars at once, this should have no effect on the relative performance of the two approaches.
public static void Main()
{
string bigStr = GenString(100 * 1024 * 1024);
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < 10; i++)
{
int counter = -1;
StringBuilder sb = new StringBuilder();
while (bigStr[++counter] != 'x')
sb.Append(bigStr[counter]);
Console.WriteLine(sb.ToString().Length);
}
sw.Stop();
Console.WriteLine("StringBuilder: {0}", sw.Elapsed.TotalSeconds);
sw = Stopwatch.StartNew();
for (int i = 0; i < 10; i++)
{
int counter = -1;
while (bigStr[++counter] != 'x') ;
Console.WriteLine(bigStr.Substring(0, counter).Length);
}
sw.Stop();
Console.WriteLine("Substring: {0}", sw.Elapsed.TotalSeconds);
}
public static string GenString(int size)
{
StringBuilder sb = new StringBuilder(size);
for (int i = 0; i < size - 1; i++)
{
sb.Append('a');
}
sb.Append('x');
return sb.ToString();
}
Results (release build, .NET 4):
StringBuilder ~7.9 sec
Substring ~1.9 sec
StringBuilder was consistently > 3x slower, with a variety of different sized strings.
There's an IndexOf operation which would search more quickly for someChar, but I'll assume your real function to find the desired length is more complicated than that. In that scenario, I would recommend copying someString to a Char[], doing the search, and then using the new String(Char[], Int32, Int32) constructor to produce the final string. Indexing a Char[] is going to be so much more efficient than indexing a String or StringBuilder that unless you expect that you'll typically be needing only a small fraction of the string, copying everything to the Char[] will be a 'win' (unless, of course, you could simply use something like IndexOf).
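As a sketch of that recommendation, with a single-character comparison standing in for the more complicated real search (the PrefixBefore name is an assumption for illustration):

```csharp
using System;

public static class CharArrayScanDemo
{
    // Returns the part of 'source' before the first 'stopChar',
    // or the whole string if 'stopChar' never occurs.
    public static string PrefixBefore(string source, char stopChar)
    {
        char[] chars = source.ToCharArray(); // one copy of the whole string

        int length = 0;
        while (length < chars.Length && chars[length] != stopChar)
        {
            length++;
        }

        // String(Char[], Int32, Int32): start index 0, 'length' characters.
        return new string(chars, 0, length);
    }

    public static void Main()
    {
        Console.WriteLine(PrefixBefore("aaaaxbbb", 'x')); // prints "aaaa"
    }
}
```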
Even if the length of the string will often be much larger than the length of interest, you may still be best off using a Char[]. Pre-initialize the Char[] to some size, and then do something like:
Char[] temp = new Char[1024];
int i=0;
while (i < theString.Length)
{
int subLength = theString.Length - i;
if (subLength > temp.Length) // May impose other constraints on subLength, provided
subLength = temp.Length; // it's greater than zero.
theString.CopyTo(i, temp, 0, subLength);
... do stuff with the array
i+=subLength;
}
Once you're all done, you may then use a single Substring call to construct a string with the necessary characters from the original. If your application requires building a string whose characters differ from the original, you could use a StringBuilder and, within the above loop, use the Append(Char[], Int32, Int32) method to add processed characters to it.
Note also that when using the above loop construct, one may decide to reduce subLength at any point in the loop provided it is not reduced to zero. For example, if one is trying to find whether the string contains a prime number of sixteen or fewer digits enclosed by parentheses, one could start by scanning for an open-paren; if one finds it and it's possible that the data one is looking for might extend beyond the array, set subLength to the position of the open-paren, and reloop. Such an approach will result in a small amount of redundant copying, but not much (often none), and will eliminate the need to keep track of parsing state between loops. A very convenient pattern.
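To illustrate the StringBuilder variant of this loop, here is a small sketch that processes the string chunk by chunk and appends the processed characters. Upper-casing is only a placeholder for the real per-character work, and the ToUpperChunked name and chunkSize parameter are assumptions.

```csharp
using System;
using System.Text;

public static class ChunkedTransformDemo
{
    // Copies 'theString' through a fixed-size char[] and builds a transformed copy.
    public static string ToUpperChunked(string theString, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");

        char[] temp = new char[chunkSize];
        StringBuilder result = new StringBuilder(theString.Length);

        int i = 0;
        while (i < theString.Length)
        {
            // Take a full chunk, or whatever is left at the end of the string.
            int subLength = Math.Min(temp.Length, theString.Length - i);
            theString.CopyTo(i, temp, 0, subLength);

            // Placeholder processing: upper-case each character in the chunk.
            for (int j = 0; j < subLength; j++)
            {
                temp[j] = char.ToUpperInvariant(temp[j]);
            }

            // Append(Char[], Int32, Int32): add only the filled part of the array.
            result.Append(temp, 0, subLength);
            i += subLength;
        }
        return result.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(ToUpperChunked("hello world", 4)); // prints "HELLO WORLD"
    }
}
```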
You always want to use StringBuilder when manipulating strings. This is because strings are immutable, so every time a new object needs to be created.

String Concatenation Vs String Builder Append

So...I have this scenario where I have a Foreach loop that loops through a List of Checkboxes to check which are selected. For every selected checkbox, I have to do a pretty long string concatenation, involving 30 different strings of an average length of 20 characters, and then send it out as a HTTP request. 2 of the strings are dependent on the index/value of the checkbox selected.
The length of the List of Checkboxes is also variable depending upon the user's data. I would say the average length of the List would be 20, but it can go up to 50-60. So the worst case scenario would be performing the whole string concatenation 60 or so times.
For now I'm doing it with simple string concatenation via the '+' operator, but I'm wondering if it would be faster to do it with Stringbuilder. Of course, that means I'd have to either create a Stringbuilder object within the loop, or create it before the loop and call Stringbuilder.Remove at the end of it after sending out the HTTP request.
I appreciate any insights anybody can share regarding this issue.
EDIT
Thanks for all the replies everybody, so from what I've gathered, the best way for me to go about doing this would be something like:
StringBuilder sb = new StringBuilder();
foreach (CheckBox item in FriendCheckboxList)
{
if (item.Checked)
{
sb.Append(string1);
sb.Append(string2);
sb.Append(string3);
.
.
.
sb.Append(stringLast);
SendRequest(sb.ToString());
sb.Length = 0;
}
}
Use StringBuilder. That's what it's for.
Strings are immutable. String concatenation creates a new string, needing more memory, and is generally considered slow:
string a = "John" + " " + "Saunders";
Conceptually this creates a string "John ", then creates another string "John Saunders", then finally assigns that to "a". In practice the compiler can combine a single expression like this into one operation, so the intermediate strings really show up when the concatenation is spread over several statements:
string a = "John";
a += " ";
a += "Saunders";
Here "John" is replaced by a new string "John ", which is replaced by a new string "John Saunders". The originals are left to be garbage collected.
On the other hand, StringBuilder is designed to be appended, removed, etc.
Example:
StringBuilder sb = new StringBuilder();
for (int i=0; i<n; i++)
{
sb.Length = 0;
sb.Append(field1[i]);
sb.Append(field2[i]);
...
sb.Append(field30[i]);
// Do something with sb.ToString();
}
This topic has been analysed to death over the years. The end result is that if you are doing a small, known number of concatenations, use '+', otherwise use stringbuilder. From what you've said, concatenate with '+' should be faster. There are a gazillion (give or take) sites out there analysing this - google it.
For the size of string you are talking about, it's negligible anyway.
EDIT: on second thought, SB is probably faster. But like I said, who cares?
I know this has been answered, but I wanted to point out that I actually think the "blindly accepted as gospel" approach of always using StringBuilder is sometimes wrong, and this is one case.
For background, see this blog entry: http://geekswithblogs.net/johnsperfblog/archive/2005/05/27/40777.aspx
The short of it is, for this particular case, as described, you will see better performance by avoiding StringBuilder and making use of the + operator thusly:
foreach (CheckBox item in FriendCheckboxList)
{
if (item.Checked)
{
string request = string1 +
string2 +
string3 +
.
.
.
stringLast;
SendRequest(request);
}
}
The reason is that the C# compiler (as of .NET 1.1), will convert that statement into a single IL call to String.Concat passing an array of Strings as an argument. The blog entry does an excellent job outlining the implementation details of String.Concat, but suffice to say, it is extremely efficient for this case.
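To make that concrete, the sketch below compares a single + expression with an explicit String.Concat call over an array. The explicit call is written by hand as an approximation of what the answer describes; the exact overload a given compiler version picks may differ, and the method names here are made up for illustration.

```csharp
using System;

public static class ConcatEquivalenceDemo
{
    // One expression: all five operands are combined in a single concatenation.
    public static string WithPlus(string a, string b, string c, string d, string e)
    {
        return a + b + c + d + e;
    }

    // Hand-written approximation of the call described above.
    public static string WithConcat(string a, string b, string c, string d, string e)
    {
        return string.Concat(new[] { a, b, c, d, e });
    }

    public static void Main()
    {
        string plus = WithPlus("GET ", "/friends", "?id=", "42", "&x=1");
        string concat = WithConcat("GET ", "/friends", "?id=", "42", "&x=1");
        Console.WriteLine(plus == concat); // prints "True"
    }
}
```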
If you're asking this question, chances are you should use StringBuilder for many reasons, but I'll provide two.
When you use string concatenation it has to allocate a new buffer and copy the data from the other string into the new string variable. So you are going to incur many repeated allocations, which in the end fragments memory, uses up heap space, and makes more work for the garbage collector.
The StringBuilder on the other hand pre-allocates a buffer and as you add strings to it doesn't need to keep re-allocating (assuming initial buffer is large enough). Which increases performance and is far less taxing on memory.
As developers we should try to anticipate future growth. Let's say that your list grows substantially over time and then all of a sudden starts performing slowly. If you can prevent this with little effort now, why wouldn't you do it?
In general I would recommend to use a StringBuilder.
Have you tested this and checked the performance? Is the performance an issue vs how long it will take you to rewrite the code?

String vs. StringBuilder

I understand the difference between String and StringBuilder (StringBuilder being mutable) but is there a large performance difference between the two?
The program I’m working on has a lot of case driven string appends (500+). Is using StringBuilder a better choice?
Yes, the performance difference is significant. See the KB article "How to improve string concatenation performance in Visual C#".
I have always tried to code for clarity first, and then optimize for performance later. That's much easier than doing it the other way around! However, having seen the enormous performance difference in my applications between the two, I now think about it a little more carefully.
Luckily, it's relatively straightforward to run performance analysis on your code to see where you're spending the time, and then to modify it to use StringBuilder where needed.
To clarify what Gillian said about 4 strings, if you have something like this:
string a,b,c,d;
a = b + c + d;
then it would be faster using strings and the plus operator. This is because (like Java, as Eric points out), it internally uses StringBuilder automatically (Actually, it uses a primitive that StringBuilder also uses)
However, if what you are doing is closer to:
string a,b,c,d;
a = a + b;
a = a + c;
a = a + d;
Then you need to explicitly use a StringBuilder. .Net doesn't automatically create a StringBuilder here, because it would be pointless. At the end of each line, "a" has to be an (immutable) string, so it would have to create and dispose a StringBuilder on each line. For speed, you'd need to use the same StringBuilder until you're done building:
string a,b,c,d;
StringBuilder e = new StringBuilder();
e.Append(b);
e.Append(c);
e.Append(d);
a = e.ToString();
StringBuilder is preferable IF you are doing multiple loops, or forks in your code pass... however, for PURE performance, if you can get away with a SINGLE string declaration, then that is much more performant.
For example:
string myString = "Some stuff" + var1 + " more stuff"
+ var2 + " other stuff" .... etc... etc...;
is more performant than
StringBuilder sb = new StringBuilder();
sb.Append("Some Stuff");
sb.Append(var1);
sb.Append(" more stuff");
sb.Append(var2);
sb.Append("other stuff");
// etc.. etc.. etc..
In this case, StringBuild could be considered more maintainable, but is not more performant than the single string declaration.
9 times out of 10 though... use the string builder.
On a side note: string + var is also more performant than the string.Format approach (generally), which uses a StringBuilder internally (when in doubt... check Reflector!)
A simple example to demonstrate the difference in speed when using String concatenation vs StringBuilder:
System.Diagnostics.Stopwatch time = new Stopwatch();
string test = string.Empty;
time.Start();
for (int i = 0; i < 100000; i++)
{
test += i;
}
time.Stop();
System.Console.WriteLine("Using String concatenation: " + time.ElapsedMilliseconds + " milliseconds");
Result:
Using String concatenation: 15423 milliseconds
StringBuilder test1 = new StringBuilder();
time.Reset();
time.Start();
for (int i = 0; i < 100000; i++)
{
test1.Append(i);
}
time.Stop();
System.Console.WriteLine("Using StringBuilder: " + time.ElapsedMilliseconds + " milliseconds");
Result:
Using StringBuilder: 10 milliseconds
As a result, the first loop using string concatenation took 15423 ms while the second loop using StringBuilder took 10 ms.
It looks to me that using StringBuilder is faster, a lot faster.
This benchmark shows that regular concatenation is faster when combining 3 or fewer strings.
http://www.chinhdo.com/20070224/stringbuilder-is-not-always-faster/
StringBuilder can make a very significant improvement in memory usage, especially in your case of adding 500 strings together.
Consider the following example:
string buffer = "The numbers are: ";
for( int i = 0; i < 5; i++)
{
buffer += i.ToString();
}
return buffer;
What happens in memory? The following strings are created:
1 - "The numbers are: "
2 - "0"
3 - "The numbers are: 0"
4 - "1"
5 - "The numbers are: 01"
6 - "2"
7 - "The numbers are: 012"
8 - "3"
9 - "The numbers are: 0123"
10 - "4"
11 - "The numbers are: 01234"
By adding those five numbers to the end of the string we created 11 string objects! And 10 of them were useless! Wow!
StringBuilder fixes this problem. It is not a "mutable string" as we often hear (all strings in .NET are immutable). It works by keeping an internal buffer, an array of char. Calling Append() or AppendLine() adds the string to the empty space at the end of the char array; if the array is too small, it creates a new, larger array, and copies the buffer there. So in the example above, StringBuilder might only need a single array to contain all 5 additions to the string-- depending on the size of its buffer. You can tell StringBuilder how big its buffer should be in the constructor.
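For comparison, here is a sketch of the same "The numbers are:" example written with a StringBuilder whose capacity is set in the constructor. The Build name is made up, and the capacity calculation assumes single-digit numbers; the capacity is only a starting size, so a wrong guess costs a resize rather than an error.

```csharp
using System;
using System.Text;

public static class NumbersDemo
{
    // Builds "The numbers are: 012..." using one StringBuilder.
    public static string Build(int count)
    {
        const string prefix = "The numbers are: ";

        // Assumption: each number is a single digit, so one extra char per number.
        StringBuilder buffer = new StringBuilder(prefix.Length + count);
        buffer.Append(prefix);

        for (int i = 0; i < count; i++)
        {
            buffer.Append(i); // Append(int) writes the number's text form
        }
        return buffer.ToString(); // the only full result string created here
    }

    public static void Main()
    {
        Console.WriteLine(Build(5)); // prints "The numbers are: 01234"
    }
}
```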
String vs. StringBuilder:
First, you have to know which namespace each of these two classes lives in:
string is in the System namespace,
and
StringBuilder is in the System.Text namespace.
To declare a string, include the System namespace, like this:
using System;
To declare a StringBuilder, include the System.Text namespace, like this:
using System.Text;
Now we come to the actual question.
What is the difference between string and StringBuilder?
The main difference between the two is that:
string is immutable,
and
StringBuilder is mutable.
So now let's discuss the difference between immutable and mutable.
Mutable means changeable.
Immutable means not changeable.
For example:
using System;
namespace StringVsStringBuilder
{
    class Program
    {
        static void Main(string[] args)
        {
            // String example: every + below creates a new string object
            string name = "Rehan";
            name = name + " Shah";
            name = name + " RS";
            name = name + " ---";
            name = name + " I love to write programs.";
            Console.WriteLine(name);
            // Output: Rehan Shah RS --- I love to write programs.
        }
    }
}
So in this case we assign a value to the same variable five times.
So the obvious question is: what actually happens under the hood when we change the same string five times?
This is what happens when we change the same string five times.
Let's look at the figure.
Explanation:
When we first initialize the variable name to "Rehan", i.e. string name = "Rehan", the variable name is created on the stack and points to the "Rehan" string object on the heap.
After the first concatenation is executed, the reference variable no longer points to the "Rehan" object; it now points to a new string object that holds the concatenated result, and so on.
So string is immutable, meaning that once we create the object in memory we can't change it.
So when we concatenate to the name variable, the previous object remains in memory and another new string object is created.
So in the above figure we have five objects; four of them are thrown away and never used again. They still remain in memory and occupy space.
The garbage collector is responsible for cleaning those objects out of memory.
So with string, any time we manipulate the string over and over again, many objects are created and stay in memory until they are collected.
So this is the story of the string variable.
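To see this for yourself, here is a small sketch (the class and method names are made up for illustration). It keeps a second reference to the original string and checks that concatenation gives back a different object while the original stays untouched.

```csharp
using System;

public static class ImmutableStringDemo
{
    // Returns true if concatenation produced a different object
    // and left the original string unchanged.
    public static bool ConcatCreatesNewObject()
    {
        string name = "Rehan";
        string original = name;   // second reference to the same string object
        name = name + "Shah";     // allocates a new string; "Rehan" itself is not modified
        return !object.ReferenceEquals(name, original) && original == "Rehan";
    }

    public static void Main()
    {
        Console.WriteLine(ConcatCreatesNewObject()); // True
    }
}
```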
Now let's look at the StringBuilder object.
For example:
using System;
using System.Text;
namespace StringVsStringBuilder
{
    class Program
    {
        static void Main(string[] args)
        {
            // StringBuilder example: every Append changes the same object
            StringBuilder name = new StringBuilder();
            name.Append("Rehan");
            name.Append(" Shah");
            name.Append(" RS");
            name.Append(" ---");
            name.Append(" I love to write programs.");
            Console.WriteLine(name.ToString());
            // Output: Rehan Shah RS --- I love to write programs.
        }
    }
}
So in this case we change the same object five times.
So the obvious question is: what actually happens under the hood when we change the same StringBuilder five times?
This is what happens when we change the same StringBuilder five times.
Let's look at the figure.
Explanation:
In the case of a StringBuilder object, you don't get a new object. The same object is changed in memory, so even if you change the object, let's say, 10,000 times, we will still have only one StringBuilder object.
You don't end up with a lot of garbage objects or unreferenced StringBuilder objects, because the object itself can be changed. It is mutable, meaning it can change over time.
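The same kind of check for StringBuilder, again only as a sketch with made-up names: a second reference to the builder still points at the same object after Append, and it sees the appended text.

```csharp
using System;
using System.Text;

public static class MutableBuilderDemo
{
    // Returns true if Append modified the existing builder in place.
    public static bool AppendKeepsSameObject()
    {
        var name = new StringBuilder("Rehan");
        StringBuilder alias = name;   // second reference to the same builder
        name.Append("Shah");          // changes the builder's contents in place
        // Both references still point at one object, and the alias sees the change.
        return object.ReferenceEquals(name, alias) && alias.ToString() == "RehanShah";
    }

    public static void Main()
    {
        Console.WriteLine(AppendKeepsSameObject()); // True
    }
}
```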
Differences:
string is in the System namespace, whereas StringBuilder is in the System.Text namespace.
string is immutable, whereas StringBuilder is mutable.
Yes, StringBuilder gives better performance when performing repeated operations on a string. That is because all the changes are made to a single instance, so it can save a lot of time compared with creating a new instance each time, as String does.
String vs. StringBuilder
String
  under System namespace
  immutable (read-only) instance
  performance degrades when the value changes continuously
  thread safe
StringBuilder (mutable string)
  under System.Text namespace
  mutable instance
  shows better performance since new changes are made to the existing instance
I strongly recommend the dotnet mob article: String Vs StringBuilder in C#.
Related Stack Overflow question: Mutability of string when string doesn't change in C#?
StringBuilder reduces the number of allocations and assignments, at a cost of extra memory used. Used properly, it can completely remove the need for the runtime to allocate larger and larger strings over and over until the result is found.
string result = "";
for (int i = 0; i != N; ++i)
{
    result = result + i.ToString(); // allocates a new string, then assigns it to result, which gets repeated N times
}
vs.
String result;
StringBuilder sb = new StringBuilder(10000); // create a buffer of 10k
for (int i = 0; i != N; ++i)
{
    sb.Append(i.ToString()); // fill the buffer, resizing if it overflows the buffer
}
result = sb.ToString(); // assigns once
The performance of a concatenation operation for a String or StringBuilder object depends on how often a memory allocation occurs. A String concatenation operation always allocates memory, whereas a StringBuilder concatenation operation only allocates memory if the StringBuilder object buffer is too small to accommodate the new data. Consequently, the String class is preferable for a concatenation operation if a fixed number of String objects are concatenated. In that case, the individual concatenation operations might even be combined into a single operation by the compiler. A StringBuilder object is preferable for a concatenation operation if an arbitrary number of strings are concatenated; for example, if a loop concatenates a random number of strings of user input.
Source: MSDN
Consider 'The Sad Tragedy of Micro-Optimization Theater'.
StringBuilder is better for building up a string from many non-constant values.
If you're building up a string from a lot of constant values, such as multiple lines of values in an HTML or XML document or other chunks of text, you can get away with just appending to the same string, because almost all compilers do "constant folding", a process of reducing the parse tree when you have a bunch of constant manipulation (it's also used when you write something like int minutesPerYear = 24 * 365 * 60). And for simple cases with non-constant values appended to each other in a single expression, the C# compiler will reduce your code to a single String.Concat call, which allocates the result only once, much as a StringBuilder would.
But when your append can't be reduced to something simpler by the compiler, you'll want a StringBuilder. As fizch points out, that's more likely to happen inside of a loop.
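A small sketch of the two "cheap" cases, with illustrative names of my own. A const has to be a compile-time constant, so the fact that Header compiles at all shows that the compiler joins the literals itself. For Join, I'm assuming the usual behaviour of the C# compiler, which emits one concatenation call for the whole expression.

```csharp
using System;

public static class FoldingDemo
{
    // Legal only because the compiler folds the literals into one constant;
    // no concatenation happens at run time.
    public const string Header = "<html>" + "<body>" + "<p>";

    // A single expression with non-constant parts: no loop,
    // so no growing chain of intermediate strings.
    public static string Join(string first, string second)
    {
        return first + "----" + second;
    }

    public static void Main()
    {
        Console.WriteLine(Header);         // <html><body><p>
        Console.WriteLine(Join("a", "b")); // a----b
    }
}
```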
Further to the previous answers, the first thing I always do when thinking of issues like this is to create a small test application. Inside this app, perform some timing test for both scenarios and see for yourself which is quicker.
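Something along these lines would do as a test app. It is only a rough sketch: the names and the 20,000 iteration count are my own picks, and a single Stopwatch run like this is not a careful benchmark (no warm-up, no repeated runs).

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class ConcatTiming
{
    // Builds a string of n 'A' characters with repeated string concatenation.
    public static string WithString(int n)
    {
        string s = string.Empty;
        for (int i = 0; i < n; i++)
        {
            s += "A";
        }
        return s;
    }

    // Builds the same string with a StringBuilder.
    public static string WithBuilder(int n)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < n; i++)
        {
            sb.Append('A');
        }
        return sb.ToString();
    }

    public static void Main()
    {
        const int n = 20000; // illustrative size; raise it to see the gap widen

        var sw = Stopwatch.StartNew();
        WithString(n);
        sw.Stop();
        Console.WriteLine("string +=     : " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        WithBuilder(n);
        sw.Stop();
        Console.WriteLine("StringBuilder : " + sw.ElapsedMilliseconds + " ms");
    }
}
```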
IMHO, appending 500+ string entries should definitely use StringBuilder.
I believe StringBuilder is faster if you have more than 4 strings you need to append together. Plus it can do some cool things like AppendLine.
In .NET, StringBuilder is still faster than appending strings. I'm pretty sure that in Java, the compiler just creates a StringBuffer (a StringBuilder in later versions) under the hood when you append strings in a single expression, so there isn't really a difference in that case. I'm not sure why they haven't done this in .NET yet.
StringBuilder is significantly more efficient but you will not see that performance unless you are doing a large amount of string modification.
Below is a quick chunk of code to give an example of the performance. As you can see you really only start to see a major performance increase when you get into large iterations.
As you can see the 200,000 iterations took 22 seconds while the 1 million iterations using the StringBuilder was almost instant.
string s = string.Empty;
StringBuilder sb = new StringBuilder();
Console.WriteLine("Beginning String + at " + DateTime.Now.ToString());
for (int i = 0; i <= 50000; i++)
{
    s = s + 'A';
}
Console.WriteLine("Finished String + at " + DateTime.Now.ToString());
Console.WriteLine();
Console.WriteLine("Beginning String + at " + DateTime.Now.ToString());
for (int i = 0; i <= 200000; i++)
{
    s = s + 'A';
}
Console.WriteLine("Finished String + at " + DateTime.Now.ToString());
Console.WriteLine();
Console.WriteLine("Beginning Sb append at " + DateTime.Now.ToString());
for (int i = 0; i <= 1000000; i++)
{
    sb.Append("A");
}
Console.WriteLine("Finished Sb append at " + DateTime.Now.ToString());
Console.ReadLine();
Result of the above code:
Beginning String + at 28/01/2013 16:55:40.
Finished String + at 28/01/2013 16:55:40.
Beginning String + at 28/01/2013 16:55:40.
Finished String + at 28/01/2013 16:56:02.
Beginning Sb append at 28/01/2013 16:56:02.
Finished Sb append at 28/01/2013 16:56:02.
Using strings for concatenation can lead to a runtime complexity on the order of O(n^2).
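To make the O(n^2) concrete, here is a simple counting model rather than a measurement. It assumes each concatenation writes out a brand-new string containing everything so far plus one new character; the names are made up for illustration.

```csharp
using System;

public static class CopyCount
{
    // Counts characters written when a string is grown one character at a
    // time: step k builds a new string of length k, so k chars are written.
    public static long CharsCopied(int n)
    {
        long total = 0;
        for (int k = 1; k <= n; k++)
        {
            total += k;
        }
        return total; // equals n * (n + 1) / 2, i.e. it grows like n^2
    }

    public static void Main()
    {
        Console.WriteLine(CharsCopied(1000)); // 500500
    }
}
```

So under this model 1,000 one-character appends already write about half a million characters, while a builder with enough capacity writes each character once.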
If you use a StringBuilder, there is a lot less copying of memory that has to be done. With the StringBuilder(int capacity) you can increase performance if you can estimate how large the final String is going to be. Even if you're not precise, you'll probably only have to grow the capacity of StringBuilder a couple of times which can help performance also.
I have seen significant performance gains from using the EnsureCapacity(int capacity) method call on an instance of StringBuilder before using it for any string storage. I usually call that on the line of code after instantiation. It has the same effect as if you instantiate the StringBuilder like this:
var sb = new StringBuilder(capacity); // capacity is an int
This call allocates needed memory ahead of time, which causes fewer memory allocations during multiple Append() operations. You have to make an educated guess on how much memory you will need, but for most applications this should not be too difficult. I usually err on the side of a little too much memory (we are talking 1k or so).
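A short sketch of the EnsureCapacity call, assuming you have some estimate of the final length (the 1024 here is just an example figure, and the class name is made up):

```csharp
using System;
using System.Text;

public static class CapacityDemo
{
    // Creates a builder, reserves room up front, and reports the capacity.
    public static int Reserve(int expectedLength)
    {
        var sb = new StringBuilder();
        sb.EnsureCapacity(expectedLength); // grows the buffer now if it is smaller
        return sb.Capacity;                // at least expectedLength afterwards
    }

    public static void Main()
    {
        Console.WriteLine(Reserve(1024) >= 1024); // True
    }
}
```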
If you're doing a lot of string concatenation, use a StringBuilder. When you concatenate with a String, you create a new String each time, using up more memory.
Alex
String is immutable, while StringBuilder is mutable: it has built-in buffers which allow its size to be managed more efficiently. Only when the StringBuilder needs to resize is more memory allocated on the heap. By default it is sized to 16 characters; you can set this in the constructor.
eg.
StringBuilder sb = new StringBuilder(50);
String concatenation will cost you more.
In Java, you can use either StringBuffer or StringBuilder based on your need.
If you want a synchronized, thread-safe implementation, go for StringBuffer. This will be faster than String concatenation.
If you do not need a synchronized or thread-safe implementation, go for StringBuilder.
This will be faster than String concatenation and also faster than StringBuffer, as there is no synchronization overhead.
My approach has always been to use StringBuilder when concatenating 4 or more strings
OR
When I don't know how many concatenations are to take place.
Good performance related article on it here
StringBuilder will perform better from a memory standpoint. As for processing, the difference in execution time may be negligible.
StringBuilder is probably preferable. The reason is that it allocates more space than currently needed (you set the number of characters) to leave room for future appends. Then those future appends that fit in the current buffer don't require any memory allocation or garbage collection, which can be expensive. In general, I use StringBuilder for complex string concatenation or multiple formatting, then convert to a normal String when the data is complete and I want an immutable object again.
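As a sketch of that workflow (names and the capacity of 64 are illustrative only): mix plain appends and formatting on the builder, then call ToString() once at the end to get an ordinary immutable string back.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class ReportDemo
{
    // Builds a two-line report in a StringBuilder, then converts it
    // to an immutable string in one step.
    public static string BuildReport(string user, int count)
    {
        var sb = new StringBuilder(64); // assumed to be roomy enough for a short report
        sb.Append("User: ").Append(user).Append('\n');
        // Invariant culture keeps the number formatting predictable.
        sb.AppendFormat(CultureInfo.InvariantCulture, "Items: {0}", count);
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(BuildReport("bob", 3));
    }
}
```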
As a general rule of thumb, if I have to set the value of the string more than once, or if there are any appends to the string, then it needs to be a string builder. I have seen applications that I wrote in the past, before learning about string builders, that had a huge memory footprint that just seemed to keep growing and growing. Changing these programs to use the string builder cut down the memory usage significantly. Now I swear by the string builder.
