I'm testing the efficiency of an extension method to see which variant is the fastest in terms of processing time. Memory consumption isn't an issue at this point.
I've created a small console app to generate an array of random strings, which then has the extension methods applied to it. I'm currently using the Stopwatch class to measure the time taken to run the extension methods. I then average the total time of each method over a number of iterations.
I'm not excluding the highest or lowest results at this point.
Extension Methods being tested:
public static String ToString1(this String[] s) {
StringBuilder sb = new StringBuilder();
foreach (String item in s) {
sb.AppendLine(item);
}
return sb.ToString();
}
public static String ToString2(this String[] s) {
return String.Join("\n", s);
}
Program.cs
static void Main(string[] args)
{
long s1Total = 0;
long s2Total = 0;
double s1Avg = 0;
double s2Avg = 0;
int iteration = 1;
int size = 100000;
while (iteration <= 25)
{
Console.WriteLine("Iteration: {0}", iteration);
Test(ref s1Total, ref s2Total, ref iteration, size);
}
s1Avg = (double)s1Total / (iteration - 1); // iteration is one past the last run here
s2Avg = (double)s2Total / (iteration - 1);
Console.WriteLine("Version\t\tTotal\t\tAvg");
Console.WriteLine("StringBuilder\t\t{0}\t\t{1}",s1Total, s1Avg);
Console.WriteLine("String.Join:\t\t{0}\t\t{1}",s2Total, s2Avg);
Console.WriteLine("Press any key..");
Console.ReadKey();
}
private static void Test(ref long s1Total, ref long s2Total, ref int iteration, int size)
{
String[] data = new String[size];
Random r = new Random();
for (int i = 0; i < size; i++)
{
data[i] = r.NextString(50);
}
Stopwatch s = new Stopwatch();
s.Start();
data.ToString1();
s.Stop();
s1Total += s.ElapsedTicks;
s.Reset();
s.Start();
data.ToString2();
s.Stop();
s2Total += s.ElapsedTicks;
iteration++;
}
Other extension methods used in the above code, for completeness:
Random extension:
public static String NextString(this Random r,int size)
{
return NextString(r,size,false);
}
public static String NextString(this Random r,int size, bool lowerCase)
{
StringBuilder sb = new StringBuilder();
char c;
for (int i = 0; i < size; i++)
{
c = Convert.ToChar(Convert.ToInt32(Math.Floor(26*r.NextDouble() + 65)));
sb.Append(c);
}
if (lowerCase) {
return sb.ToString().ToLower();
}
return sb.ToString();
}
Running the above code, my results indicate that the StringBuilder based method is faster than String.Join based method.
My Questions:
Is this the right way to be performing this type of measurement?
Is there a better way of doing this?
Are my results in this instance correct, and if so is using a StringBuilder actually faster than String.Join in this situation?
Thanks.
Next time you want to compare performance, you can look at the source code with Reflector. You can easily find that String.Join uses a StringBuilder to construct the string, so the two should differ only slightly in performance.
I got
StringBuilder 3428567 131867
String.Join: 1245078 47887
Note that ToString1 adds an extra newline.
Also, you can improve it by setting the StringBuilder's capacity.
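As a minimal sketch of the capacity suggestion, assuming the same String[] input as in the question: the name ToString3 is hypothetical (it is not part of the original code). It computes the final length first so the builder should not need to grow, and it joins with "\n" without a trailing newline so that the output is comparable with ToString2.

```csharp
using System;
using System.Text;

public static class StringArrayExtensions
{
    // Hypothetical third variant: presized StringBuilder, "\n" separators,
    // no trailing newline (so the output matches String.Join("\n", s)).
    public static string ToString3(this string[] s)
    {
        if (s.Length == 0) return string.Empty;

        // Total characters = all items plus one separator between each pair.
        int capacity = s.Length - 1;
        foreach (string item in s)
        {
            capacity += item?.Length ?? 0;
        }

        var sb = new StringBuilder(capacity);
        for (int i = 0; i < s.Length; i++)
        {
            if (i > 0) sb.Append('\n');
            sb.Append(s[i]);
        }
        return sb.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        string[] data = { "alpha", "beta", "gamma" };
        Console.WriteLine(data.ToString3() == string.Join("\n", data)); // prints True
    }
}
```

Comparing this variant against ToString2 is a fairer test than ToString1, because both now produce the same string.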
I've created a small Windows Forms application to generate random unique alphanumeric strings of length 8. The application works fine for small counts, but it hangs seemingly forever when I try to generate 40 million strings (as my requirement demands). Please help me make it efficient so that the strings can be generated quickly.
The following is the complete code I've used for it.
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Threading.Tasks;
namespace RandomeString
{
public partial class Form1 : Form
{
private const string Letters = "abcdefghijklmnpqrstuvwxyz";
private readonly char[] alphanumeric = (Letters + Letters.ToLower() + "abcdefghijklmnpqrstuvwxyz0123456789").ToCharArray();
private static Random random = new Random();
private int _ticks;
public Form1()
{
InitializeComponent();
}
private void button1_Click(object sender, EventArgs e)
{
if (string.IsNullOrEmpty(textBox1.Text) || string.IsNullOrWhiteSpace(textBox2.Text))
{
string message = "Please provide required length and numbers of strings count.";
string title = "Input Missing";
MessageBoxButtons buttons = MessageBoxButtons.OK;
DialogResult result = MessageBox.Show(message, title, buttons, MessageBoxIcon.Warning);
}
else
{
int ValuesCount;
ValuesCount = Convert.ToInt32(textBox2.Text);
for (int i = 1; i <= ValuesCount; i++)
{
listBox1.Items.Add(RandomString(Convert.ToInt32(textBox1.Text)));
}
}
}
public static string RandomString(int length)
{
const string chars = "abcdefghijklmnpqrstuvwxyz0123456789";
return new string(Enumerable.Repeat(chars, length)
.Select(s => s[random.Next(s.Length)]).ToArray());
}
private void button2_Click(object sender, EventArgs e)
{
try
{
StringBuilder sb = new StringBuilder();
foreach (object row in listBox1.Items)
{
sb.Append(row.ToString());
sb.AppendLine();
}
sb.Remove(sb.Length - 1, 1); // Just to avoid copying last empty row
Clipboard.SetData(System.Windows.Forms.DataFormats.Text, sb.ToString());
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
}
private void timer1_Tick(object sender, EventArgs e)
{
_ticks++;
this.Text = _ticks.ToString();
}
}
}
One way to speed things up is to avoid LINQ. For example, take a look at these two implementations:
public static string LinqStuff(int length)
{
const string chars = "abcdefghijklmnpqrstuvwxyz0123456789";
return new string(Enumerable.Repeat(chars, length)
.Select(s => s[random.Next(s.Length)]).ToArray());
}
public static string ManualStuff(int length)
{
const string chars = "abcdefghijklmnpqrstuvwxyz0123456789";
const int clength = 35;
var buffer = new char[length];
for(var i = 0; i < length; ++i)
{
buffer[i] = chars[random.Next(clength)];
}
return new string(buffer);
}
Running it through this:
private void TestThis(long iterations)
{
Console.WriteLine($"Running {iterations} iterations...");
var sw = new Stopwatch();
sw.Start();
for (long i = 0; i < iterations; ++i)
{
LinqStuff(20);
}
sw.Stop();
Console.WriteLine($"LINQ took {sw.ElapsedMilliseconds} ms.");
sw.Reset();
sw.Start();
for (long i = 0; i < iterations; ++i)
{
ManualStuff(20);
}
sw.Stop();
Console.WriteLine($"Manual took {sw.ElapsedMilliseconds} ms.");
}
With this:
TestThis(50_000_000);
Yielded these results:
LINQ took 28272 ms.
Manual took 9449 ms.
So by using LINQ, you roughly triple the time it takes to generate the strings.
You could probably tweak this further and squeeze out a few more seconds (for example, by passing the same char[] buffer to each call).
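The buffer-reuse idea could look roughly like this. It is only a sketch: the class and method names are made up for illustration, and sharing one buffer like this is not thread-safe, so it assumes single-threaded generation.

```csharp
using System;

public static class RandomStrings
{
    private const string Chars = "abcdefghijklmnpqrstuvwxyz0123456789";
    private static readonly Random Rng = new Random();

    // Fills the caller-supplied buffer and builds the string from it.
    // The buffer is reused across calls, so only the string itself is allocated.
    public static string Next(char[] buffer)
    {
        for (int i = 0; i < buffer.Length; i++)
        {
            buffer[i] = Chars[Rng.Next(Chars.Length)];
        }
        return new string(buffer);
    }

    public static void Main()
    {
        var buffer = new char[8]; // allocated once, outside the loop
        for (int i = 0; i < 5; i++)
        {
            Console.WriteLine(Next(buffer));
        }
    }
}
```

The string constructor copies the buffer, so overwriting the buffer on the next call does not change strings that were already returned.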
Don't use LINQ.
Pre-allocate the memory.
Don't put the results into a UI control.
Use as many cores and threads as you can.
Use direct memory.
Write the results to a file instead of using the clipboard.
This could likely be done quicker and even more efficiently (see the notes). However, I can generate the chars in under 200 ms.
Note: Span<T> would give better results; however, due to the lambdas it's just easier to take the small hit from fixed and use pointers.
private const string Chars = "abcdefghijklmnpqrstuvwxyz0123456789";
private static readonly ThreadLocal<Random> _random =
new ThreadLocal<Random>(() => new Random());
public static unsafe void Do(byte[] array, int index)
{
var r = _random.Value;
fixed (byte* pArray = array)
{
var pLen = pArray + ((index + 1) * 1000000);
int i = 1;
// Each call fills one 1,000,000-byte chunk: 8 random chars, then \r\n, per 10-byte line.
for (var p = pArray + (index * 1000000); p < pLen; p++, i++)
if ((i % 10) == 9) *p = (byte)'\r';
else if ((i % 10) == 0) *p = (byte)'\n';
else *p = (byte)Chars[r.Next(35)];
}
}
public static async Task Main(string[] args)
{
var array = new byte[40000000 * ( 8 + 2)];
var sw = Stopwatch.StartNew();
Parallel.For(0, 40, (index) => Do(array, index)); // the upper bound is exclusive, so this runs 40 chunks
Console.WriteLine(sw.Elapsed);
sw = Stopwatch.StartNew();
await using (var fs = new FileStream(@"D:\asdasd.txt", FileMode.Create,FileAccess.Write,FileShare.None, 1024*1024,FileOptions.Asynchronous|FileOptions.SequentialScan))
await fs.WriteAsync(array,0, array.Length);
Console.WriteLine(sw.Elapsed);
}
Output
00:00:00.1768141
00:00:00.4369418
Note 1: I haven't really put much thought into this apart from the raw generation; obviously there are other considerations.
Note 2: This array will end up on the large object heap (LOH), so buyer beware. You would need to generate the strings straight to a file to keep this off the LOH.
Note 3: I give no guarantees about the random distribution; a different random number generator would likely be better overall.
Note 4: I used 40 indexes because the math was easy; you would get slightly better results if you matched your threads to cores.
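For the "straight to file" idea in Note 2, a single-threaded sketch might look like the following. It is not the pointer-based approach from this answer: it swaps in a plain StreamWriter, and it adds a HashSet for the uniqueness the question asks for. The class name, the temp-file location, and the assumption that the HashSet fits in memory (it will be large for 40 million entries) are all mine.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class RandomFileWriter
{
    private const string Chars = "abcdefghijklmnpqrstuvwxyz0123456789";

    // Writes `count` distinct random strings of `length` characters, one per line.
    // Assumption: the HashSet used for the uniqueness check fits in memory, and
    // `count` is far below the number of possible strings (otherwise this loops for a long time).
    public static void Write(string path, int count, int length)
    {
        var rng = new Random();
        var seen = new HashSet<string>();
        var buffer = new char[length];

        using (var writer = new StreamWriter(path, false, Encoding.ASCII, 1 << 16))
        {
            while (seen.Count < count)
            {
                for (int i = 0; i < length; i++)
                {
                    buffer[i] = Chars[rng.Next(Chars.Length)];
                }
                var candidate = new string(buffer);
                if (seen.Add(candidate)) // false means duplicate: draw again
                {
                    writer.WriteLine(candidate);
                }
            }
        }
    }

    public static void Main()
    {
        // Hypothetical output location, chosen only for the demo.
        string path = Path.Combine(Path.GetTempPath(), "random-strings.txt");
        Write(path, 1000, 8);
        Console.WriteLine(File.ReadAllLines(path).Length); // prints 1000
    }
}
```

Because each string is written as soon as it is generated, no single large byte array is needed.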
I am doing it this way, but it drops the characters before the capitalized one: the output is (Magic, Agic, Gic, Ic, C), but I want the whole string kept before and after the capital letter.
public string[] Transform(string st)
{
string[] arr = new string[st.Length];
string[] arr1 = new string[st.Length];
for (int x = 0; x < st.Length; x++)
{
arr1[x] = char.ToLower(st[x]) + "".ToString();
}
for (int i = 0; i < st.Length; i++)
{
string st1 = "";
{
st1 = char.ToUpper(st[i]) + st.Substring(i + 1);
}
arr[i] = st1;
}
return arr;
}
You can do this with a single loop:
public static string[] Transform(string str)
{
var strs = new List<string>();
var sb = new StringBuilder();
for (int i = 0; i < str.Length; i++)
{
sb.Clear();
sb.Append(str);
sb[i] = char.ToUpper(str[i]);
strs.Add(sb.ToString());
}
return strs.ToArray();
}
This adds str to a StringBuilder and then replaces the character at the current index with its upper-case version. For example, the input abcde gives:
Abcde
aBcde
abCde
abcDe
abcdE
Try it out on DotNetFiddle
If you wanted to get really fancy I'm sure there is some convoluted LINQ that can do the same, but this gives you a basic framework for how it can work.
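For what it's worth, one possible LINQ version is sketched below. It is an illustration rather than a recommendation: the class name is made up, and it assumes you want every other character lower-cased (as in the Magic, mAgic, ... example), which the loop version above does not do. The invariant-culture casing methods are used so that the result does not depend on the machine's locale.

```csharp
using System;
using System.Linq;

public static class Transformer
{
    // For each index i, lower-case the whole string and upper-case only position i.
    public static string[] Transform(string str)
    {
        string lower = str.ToLowerInvariant();
        return Enumerable.Range(0, lower.Length)
            .Select(i => lower.Substring(0, i)
                         + char.ToUpperInvariant(lower[i])
                         + lower.Substring(i + 1))
            .ToArray();
    }

    public static void Main()
    {
        foreach (string s in Transform("abcde"))
        {
            Console.WriteLine(s);
        }
        // prints Abcde, aBcde, abCde, abcDe, abcdE (one per line)
    }
}
```

Each element allocates two substrings plus the concatenated result, so this is expected to be slower than the single-buffer loops.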
You forgot to add the left part of the string. Try this:
st1 = st.Substring(0, i).ToLower() + char.ToUpper(st[i]) + st.Substring(i + 1);
Here. This is twice as fast as the method that uses a string builder and a List
public static string[] Transform(string str)
{
var strs = new string [str.Length];
var sb = str.ToCharArray();
char oldCh;
for (int i = 0; i < str.Length; i++)
{
oldCh = sb[i];
sb[i] = char.ToUpper(sb[i]);
strs[i] = new string (sb);
sb[i] = oldCh;
}
return strs;
}
There's no need to clear the StringBuilder and re-add the string on every iteration. We also know the size of the array, so it can be allocated up front.
I wrote an answer to your question (it's the second code snippet). You can modify it for your needs, for example by changing the return type to string[] or by calling the ToArray() extension method on the result. I think it's more readable this way.
I also added a small profile at the end to compare CPU usage and memory against @Ron Beyer's answer.
Here is my first attempt:
public static void Main()
{
var result = Transform("abcde");
result.ToList().ForEach(WriteLine);
}
public static IEnumerable<string> Transform(string str)
{
foreach (var w in str)
{
var split = str.Split(w);
yield return split[0] + char.ToUpper(w) + split[1];
}
}
Result:
Abcde
aBcde
abCde
abcDe
abcdE
Code fiddle https://dotnetfiddle.net/gnsAGX
There is one huge drawback to the code above: it works only if the word passed in has unique letters, so "aaaaa" won't produce the proper result.
Here is my second attempt, which seems to work with any string input. It reuses a single StringBuilder instance to reduce the number of objects that need to be created, instead of copying the string around repeatedly.
public static void Main()
{
var result = Transform("aaaaa");
result.ToList().ForEach(WriteLine);
}
public static IEnumerable<string> Transform(string str)
{
var result = new StringBuilder(str.ToLower());
for( int i = 0; i < str.Length; i++)
{
result[i] = char.ToUpper(str[i]);
yield return result.ToString();
result[i] = char.ToLower(str[i]);
}
}
Result:
Aaaaa
aAaaa
aaAaa
aaaAa
aaaaA
Code fiddle: https://dotnetfiddle.net/tzhXtP
Measuring execution time and memory use.
I will use the dotnetfiddle.net status panel to keep this simple.
Fiddle has a few limitations, such as a 10-second execution limit and a cap on memory, but the differences are still significant.
I tested both programs with 14,000 repetitions; my code additionally converts the output to an array.
My answer (https://dotnetfiddle.net/1fLVw9)
Last Run: 12:23:09 pm
Compile: 0.046s
Execute: 7.563s
Memory: 16.22Gb
CPU: 7.609s
Compared answer (https://dotnetfiddle.net/Zc88F2)
Compile: 0.031s
Execute: 9.953s
Memory: 16.22Gb
CPU: 9.938s
It slightly reduces the execution time.
Hope this helps!
I'm implementing a word count feature for my ASP.NET server, and I was wondering what would be the fastest method of doing so, as I'm not sure using a simple
text.AsParallel().Count(Char.IsWhiteSpace);
is the fastest possible method. Since this feature might be used quite a bit on relatively long walls of text, I want it to be as fast as possible, even if it means using unsafe methods.
Edit: Some benchmarking with Rufus L's code as well as my own unsafe method:
public static unsafe int CountWords(string s)
{
int count = 0;
fixed (char* ps = s)
{
int len = s.Length;
char* pc = ps;
while (len-- > 0)
{
if (char.IsWhiteSpace(*pc++))
{
count++;
}
}
}
return count;
}
Split(null): 681979 words in 415867 ticks.
Count(WhiteSpace): 681978 words in 147860 ticks.
AsParallel: 681978 words in 401077 ticks.
Unsafe: 681978 words in 98139 ticks.
I'm still open to any better ideas :)
EDIT2:
Rewrote the function, taking care of multiple white spaces too:
public static unsafe int CountWords(string s)
{
int count = 0;
fixed (char* ps = s)
{
int len = s.Length;
bool inWord = false;
char* pc = ps;
while (len-- > 0)
{
if (char.IsWhiteSpace(*pc++))
{
if (!inWord)
{
inWord = true;
}
}
else
{
if (inWord)
{
inWord = false;
count++;
}
}
if (len == 0)
{
if (inWord)
{
count++;
}
}
}
}
return count;
}
Split(null): 681979 words in 517055 ticks.
Count(WhiteSpace): 681978 words in 148952 ticks.
AsParallel: 681978 words in 410289 ticks.
Unsafe: 660000 words in 114833 ticks.
According to my tests, this is much faster, by a factor of 4 (but see the update below for different results):
wordCount = text.Split(null).Length;
Here's the test, in case you want to try it out. Note that adding AsParallel() slows the process down on my machine, due to the cost of task switching:
public static void Main()
{
var text = File.ReadAllText("d:\\public\\temp\\temp.txt");
int wordCount;
var sw = new Stopwatch();
sw.Start();
wordCount = text.Split(null).Length;
sw.Stop();
Console.WriteLine("Split(null): {0} words in {1} ticks.", wordCount,
sw.ElapsedTicks);
sw.Restart();
wordCount = text.Count(Char.IsWhiteSpace);
sw.Stop();
Console.WriteLine("Count(WhiteSpace): {0} words in {1} ticks.", wordCount,
sw.ElapsedTicks);
sw.Restart();
wordCount = text.AsParallel().Count(Char.IsWhiteSpace);
sw.Stop();
Console.WriteLine("AsParallel: {0} words in {1} ticks.", wordCount,
sw.ElapsedTicks);
}
Output:
Split(null): 964 words in 629 ticks.
Count(WhiteSpace): 963 words in 2377 ticks.
AsParallel: 963 words in 208983 ticks.
Update
After making the string MUCH longer (OP mentioned 100's of 1000's of words), the results became much more similar, and the Count(WhiteSpace) method became faster than the Split(null) method:
Code change:
var text = File.ReadAllText("d:\\public\\temp\\temp.txt");
var textToSearch = new StringBuilder();
for (int i = 0; i < 500; i++) textToSearch.Append(text);
text = textToSearch.ToString();
Output:
Split(null): 481501 words in 185135 ticks.
Count(WhiteSpace): 481500 words in 101373 ticks.
AsParallel: 481500 words in 336117 ticks.
After some benchmarking, the following unsafe code yielded the fastest result in every case with 500+ words:
public static unsafe int CountWords(string s)
{
int count = 0;
fixed (char* ps = s)
{
int len = s.Length;
bool inWord = false;
char* pc = ps;
while (len-- > 0)
{
if (char.IsWhiteSpace(*pc++))
{
if (!inWord)
{
inWord = true;
}
}
else
{
if (inWord)
{
inWord = false;
count++;
}
}
if (len == 0)
{
if (inWord)
{
count++;
}
}
}
}
return count;
}
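One caveat: the function above counts a word whenever a non-whitespace character follows whitespace, so it appears to miss a first word that has no whitespace in front of it ("a b c" comes out as 2). A safe-code sketch that counts word starts instead is shown below. The class name is made up, and I have not benchmarked it against the unsafe version.

```csharp
using System;

public static class WordCounter
{
    // Counts a word each time a non-whitespace character follows whitespace
    // (or starts the string), so runs of whitespace are not over-counted.
    public static int CountWords(string s)
    {
        int count = 0;
        bool inWord = false;
        foreach (char c in s)
        {
            if (char.IsWhiteSpace(c))
            {
                inWord = false;
            }
            else if (!inWord)
            {
                inWord = true;
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountWords("  two   words ")); // prints 2
        Console.WriteLine(CountWords("a b c"));          // prints 3
    }
}
```

The same state machine can be dropped into the fixed/pointer loop if the unsafe version measures faster on your data.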
To answer your question: the AsParallel() method is very fast, but there are other options, for example:
Using Regex:
string input = "text text text text";
string pattern = @"\s+";
string[] substrings = Regex.Split(input, pattern); // Split on runs of whitespace
Console.WriteLine("Words: {0}", substrings.Length);
But I reiterate, the AsParallel() method is very fast. You can do a proof of concept to find out which is better: start a Stopwatch at the beginning of the code, stop it at the end, and compare the AsParallel() run time with the Regex one to get a more exact answer.
Update
Using Parallel.For:
static void Main(string[] args)
{
var builder = new StringBuilder(@"text text text text text text text text text text ");
int count = 0;
Console.WriteLine("Generating words, wait...");
// Build the text sequentially: appending to a shared string from Parallel.For is a data race.
for (int i = 0; i < 100000; i++)
{
builder.Append(@"text text text text text text text text text text ");
}
string text = builder.ToString();
var sw = new Stopwatch();
sw.Start();
Parallel.For(0, text.Length, i =>
{
if (char.IsWhiteSpace(text[i]))
Interlocked.Increment(ref count); // count++ is not atomic across threads
});
sw.Stop();
Console.WriteLine("Words: {0} in {1} ticks", count, sw.ElapsedTicks);
Console.ReadLine();
}
Results:
PS: Note that the Parallel.For used is not managed
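Updating a shared counter from several threads needs synchronization. One common pattern is the Parallel.For overload with thread-local state, sketched here: each worker keeps its own running count and the counts are merged once per worker. The class and method names are made up for illustration, and I make no claim that this beats a plain sequential loop for this workload.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelWhitespaceCounter
{
    public static int CountWhiteSpace(string text)
    {
        int total = 0;
        Parallel.For(
            0, text.Length,
            () => 0,                                    // per-worker running count
            (i, state, local) => char.IsWhiteSpace(text[i]) ? local + 1 : local,
            local => Interlocked.Add(ref total, local)  // merge once per worker
        );
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(CountWhiteSpace("text text text text ")); // prints 4
    }
}
```

The body never touches shared state, so the only synchronized operation is the final Interlocked.Add per worker.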
Is there a way to remove characters from a character array and save the result to a new character array? The following is my code:
string s1 = "move";
string s2 = "remove";
char[] c1 = s1.ToCharArray();
char[] c2 = s2.ToCharArray();
for (int i = 0; i < s2.Length; i++)
{
for (int p = 0; p < s1.Length; p++)
{
if (c2[i] == c1[p])
{
// REMOVE LETTER FROM C2
}
}
}
// IN THE END I SHOULD JUST HAVE c3 = re (ALL THE MATCHING CHARACTERS M-O-V-E SHOULD BE DELETED)
I would appreciate your help.
You can create a third array, c3, where you will add characters from c2 that are not to be removed.
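A rough sketch of the third-array idea follows, assuming each character of s1 should cancel exactly one matching character of s2 (so "remove" minus "move" leaves "re"). The helper name RemoveOnce and the used[] bookkeeping array are my own additions for illustration.

```csharp
using System;

public static class CharRemover
{
    // Copies into a third array every character of s2 that is not matched,
    // one-for-one, by a character of s1.
    public static char[] RemoveOnce(string s1, string s2)
    {
        char[] c1 = s1.ToCharArray();
        char[] c2 = s2.ToCharArray();
        bool[] used = new bool[c1.Length]; // marks s1 characters already consumed
        char[] c3 = new char[c2.Length];
        int n = 0;

        foreach (char ch in c2)
        {
            int match = -1;
            for (int p = 0; p < c1.Length; p++)
            {
                if (!used[p] && c1[p] == ch)
                {
                    match = p;
                    break;
                }
            }

            if (match >= 0)
            {
                used[match] = true; // consume this s1 character
            }
            else
            {
                c3[n++] = ch;       // keep it in the new array
            }
        }

        Array.Resize(ref c3, n);    // trim to the number of kept characters
        return c3;
    }

    public static void Main()
    {
        Console.WriteLine(new string(RemoveOnce("move", "remove"))); // prints re
    }
}
```

This keeps the nested-loop shape of the question's code, so it is O(N^2), which should be fine for short strings.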
You may also use Replace.
string s3 = s2.Replace(s1,"");
The original O(N^2) approach is wasteful. And I don't see how the other two answers actually perform the work you seem to be trying to accomplish. I hope this example, which has O(N) performance, works better for you:
string s1 = "move";
string s2 = "remove";
HashSet<char> excludeCharacters = new HashSet<char>(s1);
StringBuilder sb = new StringBuilder();
// Copy every character from the original string, except those to be excluded
foreach (char ch in s2)
{
if (!excludeCharacters.Contains(ch))
{
sb.Append(ch);
}
}
return sb.ToString();
Granted, for short strings the performance isn't likely to matter. But IMHO this is also easier to comprehend than the alternatives.
EDIT:
It is still not entirely clear to me what the OP is trying to do here. The most obvious task would be to remove whole words, but neither of his descriptions seem to say that's what he really wants. So, on the assumption that the above is not addressing his needs, but that he also does not want to remove whole words, here are a couple of other options...
1) O(N), the best approach for strings of non-trivial length, but is somewhat more complicated:
string s1 = "move";
string s2 = "remove";
Dictionary<char, int> excludeCharacters = new Dictionary<char, int>();
foreach (char ch in s1)
{
int count;
excludeCharacters.TryGetValue(ch, out count);
excludeCharacters[ch] = ++count;
}
StringBuilder sb = new StringBuilder();
foreach (char ch in s2)
{
int count;
if (!excludeCharacters.TryGetValue(ch, out count) || count == 0)
{
sb.Append(ch);
}
else
{
excludeCharacters[ch] = --count;
}
}
return sb.ToString();
2) An O(N^2) implementation which at least minimizes other unnecessary inefficiencies and which would suffice if all the inputs are relatively short:
StringBuilder sb = new StringBuilder(s2);
foreach (char ch in s1)
{
for (int i = 0; i < sb.Length; i++)
{
if (sb[i] == ch)
{
sb.Remove(i, 1);
break;
}
}
}
return sb.ToString();
This isn't particularly efficient, but it will probably be fast enough for short strings:
string s1 = "move";
string s2 = "remove";
foreach (char charToRemove in s1)
{
int index = s2.IndexOf(charToRemove);
if (index >= 0)
s2 = s2.Remove(index, 1);
}
// Result is now in s2.
Console.WriteLine(s2);
This avoids converting to a char array.
However, just to emphasize: This will be VERY slow for large strings.
[EDIT]
I have done some testing, and it turns out that this code is in fact pretty fast.
Here I'm comparing the code with the optimized code from another answer. However, note that we're not comparing entirely fairly, since the code here correctly implements the OP's requirement while the other code doesn't. However, it does demonstrate that the use of HashSet doesn't help as much as one might think.
I tested this code on a release build, not run inside a debugger (if you run it in a debugger, it does a debug build not a release build which will give incorrect timings).
This test uses target strings of length 1024 and chars to remove == "SKFPBPENAALDKOWJKFPOSKLW".
My results, where test1() is the incorrect but supposedly optimized solution from another answer, and test2() is my unoptimized but correct solution:
test1() took 00:00:00.2891665
test2() took 00:00:00.1004743
test1() took 00:00:00.2720192
test2() took 00:00:00.0993898
test1() took 00:00:00.2753971
test2() took 00:00:00.0997268
test1() took 00:00:00.2754325
test2() took 00:00:00.1026486
test1() took 00:00:00.2785548
test2() took 00:00:00.1039417
test1() took 00:00:00.2818029
test2() took 00:00:00.1029695
test1() took 00:00:00.2727377
test2() took 00:00:00.0995654
test1() took 00:00:00.2711982
test2() took 00:00:00.1009849
As you can see, test2() consistently outperforms test1(). This remains true even if the strings are increased to length 8192.
The test code:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;
namespace Demo
{
public static class Program
{
private static void Main(string[] args)
{
var sw = new Stopwatch();
string text = randomString(8192, 27367);
string charsToRemove = "SKFPBPENAALDKOWJKFPOSKLW";
int dummyLength = 0;
int iters = 10000;
for (int trial = 0; trial < 8; ++trial)
{
sw.Restart();
for (int i = 0; i < iters; ++i)
dummyLength += test1(text, charsToRemove).Length;
Console.WriteLine("test1() took " + sw.Elapsed);
sw.Restart();
for (int i = 0; i < iters; ++i)
dummyLength += test2(text, charsToRemove).Length;
Console.WriteLine("test2() took " + sw.Elapsed);
Console.WriteLine();
}
}
private static string randomString(int length, int seed)
{
var rng = new Random(seed);
var sb = new StringBuilder(length);
for (int i = 0; i < length; ++i)
sb.Append((char) rng.Next(65, 65 + 26*2));
return sb.ToString();
}
private static string test1(string text, string charsToRemove)
{
HashSet<char> excludeCharacters = new HashSet<char>(charsToRemove);
StringBuilder sb = new StringBuilder();
foreach (char ch in text)
{
if (!excludeCharacters.Contains(ch))
{
sb.Append(ch);
}
}
return sb.ToString();
}
private static string test2(string text, string charsToRemove)
{
foreach (char charToRemove in charsToRemove)
{
int index = text.IndexOf(charToRemove);
if (index >= 0)
text = text.Remove(index, 1);
}
return text;
}
}
}
[EDIT 2]
Here's a much more optimized solution:
public static string RemoveChars(string text, string charsToRemove)
{
char[] result = new char[text.Length];
char[] targets = charsToRemove.ToCharArray();
int n = 0;
int m = targets.Length;
foreach (char ch in text)
{
if (m == 0)
{
result[n++] = ch;
}
else
{
int index = findFirst(targets, ch, m);
if (index < 0)
{
result[n++] = ch;
}
else
{
if (m > 1)
{
--m;
targets[index] = targets[m];
}
else
{
m = 0;
}
}
}
}
return new string(result, 0, n);
}
private static int findFirst(char[] chars, char target, int n)
{
for (int i = 0; i < n; ++i)
if (chars[i] == target)
return i;
return -1;
}
Plugging that into my test program above shows that it runs around 3 times faster than test2().
Why is StringBuilder slower when compared to + concatenation?
StringBuilder was meant to avoid extra object creation, but why does it penalize performance?
static void Main(string[] args)
{
int max = 1000000;
for (int times = 0; times < 5; times++)
{
Console.WriteLine("\ntime: {0}", (times+1).ToString());
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < max; i++)
{
string msg = "Your total is ";
msg += "$500 ";
msg += DateTime.Now;
}
sw.Stop();
Console.WriteLine("String +\t: {0}ms", ((int)sw.ElapsedMilliseconds).ToString().PadLeft(6));
sw = Stopwatch.StartNew();
for (int j = 0; j < max; j++)
{
StringBuilder msg = new StringBuilder();
msg.Append("Your total is ");
msg.Append("$500 ");
msg.Append(DateTime.Now);
}
sw.Stop();
Console.WriteLine("StringBuilder\t: {0}ms", ((int)sw.ElapsedMilliseconds).ToString().PadLeft(6));
}
Console.Read();
}
EDIT: Moving variables out of the loop scope, as suggested:
Changed so that the StringBuilder isn't instantiated on every iteration; instead it is reset with .Clear():
time: 1
String + : 3348ms
StringBuilder : 3151ms
time: 2
String + : 3346ms
StringBuilder : 3050ms
etc.
Note that this still tests exactly the same functionality, but tries to reuse resources a bit smarter.
Code: (also live on http://ideone.com/YuaqY)
using System;
using System.Text;
using System.Diagnostics;
public class Program
{
static void Main(string[] args)
{
int max = 1000000;
for (int times = 0; times < 5; times++)
{
{
Console.WriteLine("\ntime: {0}", (times+1).ToString());
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < max; i++)
{
string msg = "Your total is ";
msg += "$500 ";
msg += DateTime.Now;
}
sw.Stop();
Console.WriteLine("String +\t: {0}ms", ((int)sw.ElapsedMilliseconds).ToString().PadLeft(6));
}
{
Stopwatch sw = Stopwatch.StartNew();
StringBuilder msg = new StringBuilder();
for (int j = 0; j < max; j++)
{
msg.Clear();
msg.Append("Your total is ");
msg.Append("$500 ");
msg.Append(DateTime.Now);
}
sw.Stop();
Console.WriteLine("StringBuilder\t: {0}ms", ((int)sw.ElapsedMilliseconds).ToString().PadLeft(6));
}
}
Console.Read();
}
}
You are creating a new instance of StringBuilder with every iteration, and that incurs some overhead. Since you are not using it for what it's actually meant to do (ie: build large strings which would otherwise require many string concatenation operations), it's not surprising to see worse performance than concatenation.
A more common comparison / usage of StringBuilder is something like:
string msg = "";
for (int i = 0; i < max; i++)
{
msg += "Your total is ";
msg += "$500 ";
msg += DateTime.Now;
}
StringBuilder msg_sb = new StringBuilder();
for (int j = 0; j < max; j++)
{
msg_sb.Append("Your total is ");
msg_sb.Append("$500 ");
msg_sb.Append(DateTime.Now);
}
With this, you'll observe a significant performance difference between StringBuilder and concatenation. And by "significant" I mean orders of magnitude, not the ~ 10% difference you are observing in your examples.
Since StringBuilder doesn't have to build tons of intermediary strings that will just get thrown away, you get much better performance. That's what it's meant for. For smaller strings, you are better off using string concatenation for simplicity and clarity.
The benefits of StringBuilder should be noticeable with longer strings.
Every time you concatenate a string you create a new string object, so the longer the string, the more is needed to copy from the old string to the new string.
Also, creating many temporary objects may have an adverse effect on performance that a Stopwatch around the loop does not capture, because it "pollutes" the managed heap with temporary objects and may cause more garbage collection cycles.
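One way to make that garbage-collection cost visible is to count collections around each variant. This is only a sketch: the helper name Gen0Collections and the loop size are arbitrary choices of mine, and the actual numbers depend on the runtime and GC settings, so none are shown here.

```csharp
using System;
using System.Text;

public static class GcPressureDemo
{
    // Returns how many gen-0 collections ran while the action executed.
    public static int Gen0Collections(Action action)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        int before = GC.CollectionCount(0);
        action();
        return GC.CollectionCount(0) - before;
    }

    public static void Main()
    {
        const int n = 20000;

        int concat = Gen0Collections(() =>
        {
            string s = "";
            for (int i = 0; i < n; i++) s += "x"; // a new string on every pass
        });

        int builder = Gen0Collections(() =>
        {
            var sb = new StringBuilder();
            for (int i = 0; i < n; i++) sb.Append("x");
            string s = sb.ToString();
        });

        Console.WriteLine("Concatenation gen-0 collections: " + concat);
        Console.WriteLine("StringBuilder gen-0 collections: " + builder);
    }
}
```

GC.CollectionCount(0) is a process-wide counter, so other threads allocating at the same time will also show up in the difference.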
Modify your test to create (much) longer strings and use (many) more concatenations / append operations and the StringBuilder should perform better.
Note that
string msg = "Your total is ";
msg += "$500 ";
msg += DateTime.Now;
compiles down to
string msg = String.Concat("Your total is ", "$500 ");
msg = String.Concat(msg, DateTime.Now.ToString());
This totals two concats and one ToString per iteration. Also, a single String.Concat is really fast, because it knows how large the resulting string will be, so it only allocates the resulting string once, and then quickly copies the source strings into it. This means that in practice
String.Concat(x, y);
will always outperform
StringBuilder builder = new StringBuilder();
builder.Append(x);
builder.Append(y);
because StringBuilder cannot take such shortcuts (you could call a third Append, or a Remove; that's not possible with String.Concat).
The way a StringBuilder works is by allocating an initial buffer and setting the string length to 0. With each Append, it has to check the buffer, possibly allocate more buffer space (usually copying the old buffer to the new buffer), copy the string and increment the string length of the builder. String.Concat does not need to do all this extra work.
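The buffer growth described above can be observed through the Capacity property. The sketch below is only an illustration with made-up names; the exact growth policy is an implementation detail that differs between runtime versions, so no concrete capacities are claimed.

```csharp
using System;
using System.Text;

public static class CapacityDemo
{
    // Appends one character at a time and counts how often Capacity changes,
    // i.e. how often the builder had to obtain more buffer space.
    public static int CountGrowths(int appends)
    {
        var sb = new StringBuilder();
        int lastCapacity = sb.Capacity;
        int growths = 0;
        for (int i = 0; i < appends; i++)
        {
            sb.Append('x');
            if (sb.Capacity != lastCapacity)
            {
                lastCapacity = sb.Capacity;
                growths++;
            }
        }
        return growths;
    }

    public static void Main()
    {
        Console.WriteLine("Initial capacity: " + new StringBuilder().Capacity);
        Console.WriteLine("Growths over 1000 appends: " + CountGrowths(1000));
    }
}
```

Passing an initial capacity to the StringBuilder constructor is the usual way to avoid these growth steps when the final size is known.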
So for simple string concatenations, x + y (i.e., String.Concat) will always outperform StringBuilder.
Now, you'll start to get benefits from StringBuilder once you start concatenating lots of strings into a single buffer, or you're doing lots of manipulations on the buffer, where you'd need to keep creating new strings when not using a StringBuilder. This is because StringBuilder only occasionally allocates new memory, in chunks, but String.Concat, String.SubString, etc. (nearly) always allocate new memory. (Something like "".SubString(0,0) or String.Concat("", "") won't allocate memory, but those are degenerate cases.)
In addition to not using StringBuilder in the most efficient manner, you're also not using string concatenation as efficiently as possible. If you know ahead of time how many strings you're concatenating, then doing it all in one expression should be fastest. The compiler optimizes the operation so that no intermediate strings are generated.
I added a couple more test cases. One is basically the same as what sehe suggested, and the other generates the string in one line:
sw = Stopwatch.StartNew();
builder = new StringBuilder();
for (int j = 0; j < max; j++)
{
builder.Clear();
builder.Append("Your total is ");
builder.Append("$500 ");
builder.Append(DateTime.Now);
}
sw.Stop();
Console.WriteLine("StringBuilder (clearing)\t: {0}ms", ((int)sw.ElapsedMilliseconds).ToString().PadLeft(6));
sw = Stopwatch.StartNew();
for (int i = 0; i < max; i++)
{
msg = "Your total is " + "$500" + DateTime.Now;
}
sw.Stop();
Console.WriteLine("String + (one line)\t: {0}ms", ((int)sw.ElapsedMilliseconds).ToString().PadLeft(6));
And here is an example of the output I see on my machine:
time: 1
String + : 3707ms
StringBuilder : 3910ms
StringBuilder (clearing) : 3683ms
String + (one line) : 3645ms
time: 2
String + : 3703ms
StringBuilder : 3926ms
StringBuilder (clearing) : 3666ms
String + (one line) : 3625ms
In general:
- StringBuilder does better if you're building a large string in a lot of steps, or you don't know how many strings will be concatenated together.
- Mashing them all together in a single expression is better whenever it's a reasonable option.
I think it's better to compare the efficiency of String and StringBuilder rather than just the time.
What MSDN says:
A String is called immutable because its value cannot be modified once it has been created. Methods that appear to modify a String actually return a new String containing the modification. If it is necessary to modify the actual contents of a string-like object, use the System.Text.StringBuilder class.
string msg = "Your total is "; // a new string object
msg += "$500 "; // a new string object
msg += DateTime.Now; // a new string object
see which one is better.
Here is an example that demonstrates a situation in which StringBuilder will execute more quickly than string concatenation:
static void Main(string[] args)
{
const int sLen = 30, Loops = 10000;
DateTime sTime, eTime;
int i;
string sSource = new String('X', sLen);
string sDest = "";
//
// Time StringBuilder.
//
for (int times = 0; times < 5; times++)
{
sTime = DateTime.Now;
System.Text.StringBuilder sb = new System.Text.StringBuilder((int)(sLen * Loops * 1.1));
Console.WriteLine("Result # " + (times + 1).ToString());
for (i = 0; i < Loops; i++)
{
sb.Append(sSource);
}
sDest = sb.ToString();
eTime = DateTime.Now;
Console.WriteLine("String Builder took :" + (eTime - sTime).TotalSeconds + " seconds.");
//
// Time string concatenation.
//
sTime = DateTime.Now;
for (i = 0; i < Loops; i++)
{
sDest += sSource;
//Console.WriteLine(i);
}
eTime = DateTime.Now;
Console.WriteLine("Concatenation took : " + (eTime - sTime).TotalSeconds + " seconds.");
Console.WriteLine("\n");
}
//
// Make the console window stay open
// so that you can see the results when running from the IDE.
//
}
Result # 1
String Builder took :0 seconds.
Concatenation took : 8.7659616 seconds.
Result # 2
String Builder took :0 seconds.
Concatenation took : 8.7659616 seconds.
Result # 3
String Builder took :0 seconds.
Concatenation took : 8.9378432 seconds.
Result # 4
String Builder took :0 seconds.
Concatenation took : 8.7972128 seconds.
Result # 5
String Builder took :0 seconds.
Concatenation took : 8.8753408 seconds.
StringBuilder is much faster than + concatenation here.
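Two details of the listing above affect the numbers: DateTime.Now has a coarse resolution (hence the 0 seconds), and sDest is not reset before the concatenation loop, so that loop starts from the builder's full result. A variant of the same comparison using Stopwatch and a fresh destination string might look like this. It is a sketch with made-up method names, and no timings are shown because they depend on the machine.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class ConcatVsBuilder
{
    public static string WithBuilder(string source, int loops)
    {
        var sb = new StringBuilder(source.Length * loops);
        for (int i = 0; i < loops; i++) sb.Append(source);
        return sb.ToString();
    }

    public static string WithConcat(string source, int loops)
    {
        string dest = ""; // starts empty on every run
        for (int i = 0; i < loops; i++) dest += source;
        return dest;
    }

    public static void Main()
    {
        const int loops = 10000;
        string source = new string('X', 30);

        var sw = Stopwatch.StartNew();
        string a = WithBuilder(source, loops);
        sw.Stop();
        Console.WriteLine("StringBuilder: " + sw.Elapsed.TotalMilliseconds + " ms");

        sw.Restart();
        string b = WithConcat(source, loops);
        sw.Stop();
        Console.WriteLine("Concatenation: " + sw.Elapsed.TotalMilliseconds + " ms");

        Console.WriteLine(a == b); // prints True
    }
}
```

Checking that both variants produce the same string guards against accidentally timing two different workloads.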