Performance issues with nested loops and string concatenations - c#

Can someone please explain why this code is taking so long to run (i.e. >24 hours):
The number of rows is 5000, whilst the number of columns is 2000 (i.e. Approximately 10m loops).
Is there a better way to do this????
for (int i = 0; i < m.rows; i++)
{
for (int j = 0; j < m.cols; j++)
{
textToWrite += m[i, j].ToString() + ",";
}
//remove the final comma.
textToWrite = textToWrite.Substring(0,textToWrite.Length-2);
textToWrite += Environment.NewLine;
}

Yes, the += operator is not very efficient. Use StringBuilder instead.
In the .NET framework a string is immutable, which means it cannot be modified in place. This means the += operator has to create a new string every time, which means allocating memory, copying the value of the existing string and writing it to the new location. It's ok for one or two concatenations, but as soon as you put it in a loop you need to use an alternative.
http://support.microsoft.com/kb/306822
You'll see a massive performance improvement by using the following code:
var textToWriteBuilder = new StringBuilder();
for (int i = 0; i < m.rows; i++)
{
for (int j = 0; j < m.cols; j++)
{
textToWriteBuilder.Append(m[i, j].ToString() + ",");
}
// I've modified the logic on the following line, I assume you want to
// concatenate the value instead of overwriting it as you do in your question.
textToWriteBuilder.Append(textToWriteBuilder.Substring(0, textToWriteBuilder.Length - 2));
textToWriteBuilder.Append(Environment.NewLine);
}
string textToWrite = textToWriteBuilder.ToString();

Because you are creating tons of strings.
You should use StringBuilder for this.
StringBuilder sb = new StringBuildeR();
for (int i = 0; i < m.rows; i++)
{
bool first = true;
for (int j = 0; j < m.cols; j++)
{
sb.Append(m[i, j]);
if (first)
{
first = false;
}
else
{
sb.Append(",");
}
}
sb.AppendLine();
}
string output = sb.ToString();

Your code is taking so long because you're appending strings, creating thousands of new temporary strings as you go. The memory manager needs to find memory for these strings (which increase in memory requirements, as they get longer) and the operation copies the characters you have so far (the number of which increases with every iteration) to the newest string.
The alternative is to use a single StringBuilder, on which you call Append() to append more efficiently and, finally, ToString() when you're done to get the finalized string that you want to use.

The biggest issue I see with this is the fact you're using textToWrite as a string.
As strings are immutable so each time the string is changed new memory must be reserved copied from the previous version.
A far more efficient approach is to use the StringBuilder class which is designed for exactly this type of scenario. For example:
StringBuilder sb = new StringBuilder();
for (int i = 0; i < m.rows; i++)
{
for (int j = 0; j < m.cols; j++)
{
sb.Append(m[i, j].ToString());
if(j < m.cols - 1) // don't add a comma on the last element
{
sb.Append(",");
}
}
sb.AppendLine();
}

Supposing that textToWrite is a String, you should use StringBuilder instead. String is immutable and it is very ineffective to add small parts.
Ideally you would initialize StringBuilder with a reasonable size (see doc).

Use a StringBuilder instead of several million concatenations.
If you concatenate 2 strings, this means the system allocates new memory to contain both of them, and then copies both in. A zillion large memory allocations and copy actions become slow very fast.
What StringBuilder does is reduce this immensely by allocating 'in advance', thus only having to grow the buffer a few times and just copying it in, eliminating the by far slowest factor of your loop.

Assume the matrix is of size MxM and has N elements. You are building the string in a way that takes O(N^2) (or O(M^4)) in the number of iterations. Each operation must copy what's already there. The issue is not some constant-factor overhead like temporary strings.
Use StringBuilder.
String concatenation is more efficient for small number of concatenated strings. For a dynamic number of strings, use StringBuilder.

The reason that it takes so long to run is because you are using string concatenation to create a string. For each iteration it will copy the entire string to a new string, so in the end you will have copied strings that adds up to several million times the final string.
Use a StringBuilder to create the string:
StringBuilder textToWrite = new StringBuilder();
for (int i = 0; i < m.rows; i++)
{
for (int j = 0; j < m.cols; j++)
{
if (j > 0) textToWrite.Append(',');
textToWrite.Append(m[i, j]);
}
textToWrite.AppendLine();
}

Related

C# Extension method slower than chained Replace unless in tight loop. Why?

I have an extension method to remove certain characters from a string (a phone number) which is performing much slower than I think it should vs chained Replace calls. The weird bit, is that in a loop it overtakes the Replace thing if the loop runs for around 3000 iterations, and after that it's faster. Lower than that and chaining Replace is faster. It's like there's a fixed overhead to my code which Replace doesn't have. What could this be!?
Quick look. When only testing 10 numbers, mine takes about 0.3ms, while Replace takes only 0.01ms. A massive difference! But when running 5 million, mine takes around 1700ms while Replace takes about 2500ms.
Phone numbers will only have 0-9, +, -, (, )
Here's the relevant code:
Building test cases, I'm playing with testNums.
int testNums = 5_000_000;
Console.WriteLine("Building " + testNums + " tests");
Random rand = new Random();
string[] tests = new string[testNums];
char[] letters =
{
'0','1','2','3','4','5','6','7','8','9',
'+','-','(',')'
};
for(int t = 0; t < tests.Length; t++)
{
int length = rand.Next(5, 20);
char[] word = new char[length];
for(int c = 0; c < word.Length; c++)
{
word[c] = letters[rand.Next(letters.Length)];
}
tests[t] = new string(word);
}
Console.WriteLine("Tests built");
string[] stripped = new string[tests.Length];
Using my extension method:
Stopwatch stopwatch = Stopwatch.StartNew();
for (int i = 0; i < stripped.Length; i++)
{
stripped[i] = tests[i].CleanNumberString();
}
stopwatch.Stop();
Console.WriteLine("Clean: " + stopwatch.Elapsed.TotalMilliseconds + "ms");
Using chained Replace:
stripped = new string[tests.Length];
stopwatch = Stopwatch.StartNew();
for (int i = 0; i < stripped.Length; i++)
{
stripped[i] = tests[i].Replace(" ", string.Empty)
.Replace("-", string.Empty)
.Replace("(", string.Empty)
.Replace(")", string.Empty)
.Replace("+", string.Empty);
}
stopwatch.Stop();
Console.WriteLine("Replace: " + stopwatch.Elapsed.TotalMilliseconds + "ms");
Extension method in question:
public static string CleanNumberString(this string s)
{
Span<char> letters = stackalloc char[s.Length];
int count = 0;
for (int i = 0; i < s.Length; i++)
{
if (s[i] >= '0' && s[i] <= '9')
letters[count++] = s[i];
}
return new string(letters.Slice(0, count));
}
What I've tried:
I've run them around the other way. Makes a tiny difference, but not enough.
Make it a normal static method, which was significantly slower than extension. As a ref parameter was slightly slower, and in parameter was about the same as extension method.
Aggressive Inlining. Doesn't make any real difference. I'm in release mode, so I suspect the compiler inlines it anyway. Either way, not much change.
I have also looked at memory allocations, and that's as I expect. My one allocates on the managed heap only one string per iteration (the new string at the end) which Replace allocates a new object for each Replace. So the memory used by the Replace one is much, higher. But it's still faster!
Is it calling native C code and doing something crafty there? Is the higher memory usage triggering the GC and slowing it down (still doesn't explane the insanely fast time on only one or two iterations)
Any ideas?
(Yes, I know not to bother optimising things like this, it's just bugging me because I don't know why it's doing this)
After doing some benchmarks, I think can safely assert that your initial statement is wrong for the exact reason you mentionned in your deleted answer: the loading time of the method is the only thing that misguided you.
Here's the full benchmark on a simplified version of the problem:
static void Main(string[] args)
{
// Build string of n consecutive "ab"
int n = 1000;
Console.WriteLine("N: " + n);
char[] c = new char[n];
for (int i = 0; i < n; i+=2)
c[i] = 'a';
for (int i = 1; i < n; i += 2)
c[i] = 'b';
string s = new string(c);
Stopwatch stopwatch;
// Make sure everything is loaded
s.CleanNumberString();
s.Replace("a", "");
s.UnsafeRemove();
// Tests to remove all 'a' from the string
// Unsafe remove
stopwatch = Stopwatch.StartNew();
string a1 = s.UnsafeRemove();
stopwatch.Stop();
Console.WriteLine("Unsafe remove:\t" + stopwatch.Elapsed.TotalMilliseconds + "ms");
// Extension method
stopwatch = Stopwatch.StartNew();
string a2 = s.CleanNumberString();
stopwatch.Stop();
Console.WriteLine("Clean method:\t" + stopwatch.Elapsed.TotalMilliseconds + "ms");
// String replace
stopwatch = Stopwatch.StartNew();
string a3 = s.Replace("a", "");
stopwatch.Stop();
Console.WriteLine("String.Replace:\t" + stopwatch.Elapsed.TotalMilliseconds + "ms");
// Make sure the returned strings are identical
Console.WriteLine(a1.Equals(a2) && a2.Equals(a3));
Console.ReadKey();
}
public static string CleanNumberString(this string s)
{
char[] letters = new char[s.Length];
int count = 0;
for (int i = 0; i < s.Length; i++)
if (s[i] == 'b')
letters[count++] = 'b';
return new string(letters.SubArray(0, count));
}
public static T[] SubArray<T>(this T[] data, int index, int length)
{
T[] result = new T[length];
Array.Copy(data, index, result, 0, length);
return result;
}
// Taken from https://stackoverflow.com/a/2183442/6923568
public static unsafe string UnsafeRemove(this string s)
{
int len = s.Length;
char* newChars = stackalloc char[len];
char* currentChar = newChars;
for (int i = 0; i < len; ++i)
{
char c = s[i];
switch (c)
{
case 'a':
continue;
default:
*currentChar++ = c;
break;
}
}
return new string(newChars, 0, (int)(currentChar - newChars));
}
When ran with different values of n, it is clear that your extension method (or at least my somewhat equivalent version of it) has a logic that makes it faster than String.Replace(). In fact, it is more performant on either small or big strings:
N: 100
Unsafe remove: 0,0024ms
Clean method: 0,0015ms
String.Replace: 0,0021ms
True
N: 100000
Unsafe remove: 0,3889ms
Clean method: 0,5308ms
String.Replace: 1,3993ms
True
I highly suspect optimizations for the replacement of strings (not to be compared to removal) in String.Replace() to be the culprit here. I also added a method from this answer to have another comparison on removal of characters. That one's times behave similarly to your method but gets faster on higher values (80k+ on my tests) of n.
With all that being said, since your question is based on an assumption that we found was false, if you need more explanation on why the opposite is true (i.e. "Why is String.Replace() slower than my method"), plenty of in-depth benchmarks about string manipulation already do so.
I ran the clean method a couple more. interestingly, it is a lot faster than the Replace. Only the first time run was slower. Sorry that I couldn't explain why it's slower the first time but I ran more of the method then the result was expected.
Building 100 tests
Tests built
Replace: 0.0528ms
Clean: 0.4526ms
Clean: 0.0413ms
Clean: 0.0294ms
Replace: 0.0679ms
Replace: 0.0523ms
used dotnet core 2.1
So I've found with help from daehee Kim and Mat below that it's only the first iteration, but it's for the whole first loop. Every loop after there is ok.
I use the following line to force the JIT to do its thing and initialise this method:
RuntimeHelpers.PrepareMethod(typeof(CleanExtension).GetMethod("CleanNumberString", BindingFlags.Public | BindingFlags.Static).MethodHandle);
I find the JIT usually takes about 2-3ms to do its thing here (including Reflection time of about 0.1ms). Note that you should probably not be doing this because you're now getting the Reflection cost as well, and the JIT will be called right after this anyway, but it's probably a good idea for benchmarks to fairly compare.
The more you know!
My benchmark for a loop of 5000 iterations, repeated 5000 times with random strings and averaged is:
Clean: 0.41078ms
Replace: 1.4974ms

PadRight in string of arrays doesn't add chars

I created array of strings which includes strings with Length from 4 to 6. I am trying to PadRight 0's to get length for every element in array to 6.
string[] array1 =
{
"aabc", "aabaaa", "Abac", "abba", "acaaaa"
};
for (var i = 0; i <= array1.Length-1; i++)
{
if (array1[i].Length < 6)
{
for (var j = array1[i].Length; j <= 6; j++)
{
array1[i] = array1[i].PadRight(6 - array1[i].Length, '0');
}
}
Console.WriteLine(array1[i]);
}
Right now the program writes down the exact same strings I have in array without adding 0's at the end. I made a little research and found some informations about that strings are immutable, but still there are some example with changing strings inside, but I couldn't find any with PadRight or PadLeft and I fell like there must be a way to do it, but I just can't figure it out.
Any ideas on how to fix that issue?
The first argument to PadRight is the total length you want. You've specified 6 - array1[i].Length - and as all your strings start off with at least 3 characters, you're padding to at most 3 characters, so it's not doing anything.
You don't need your inner loop, and your outer loop condition is more conventionally written as <. This is one way I'd write that code:
using System;
public class Test
{
static void Main()
{
string[] array =
{
"aabc", "aabaaa", "Abac", "abba", "acaaaa"
};
for (var i = 0; i < array.Length; i++)
{
array[i] = array[i].PadRight(6, '0');
Console.WriteLine(array[i]);
}
}
}
In fact I'd probably use foreach, or even Select, but that's a different matter. I've left this using an array to be a bit closer to your original code.

C# Char permutation with repetition on large set of chars

Hello Im trying to get all possible combinations with repetitions of given char array.
Char array consists of alphabet letters(only lower) and I need to generate strings with length of 30 or more chars.
I tried with method of many for-loops,but when I try to get all combinations of char in char array with length of string more then 5 I get out of Memory Exception.
So I created similar Method that takes only first 200000 strings,then next 2000000 and so on this was proven sucessfull but only with smaller length strings.
This was my method with length of 7 chars:
public static int Progress = 0;
public static ArrayList CreateRngUrl7()
{
ArrayList AllCombos = new ArrayList();
int passed = 0;
int Too = Progress + 200000;
char[] alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ".ToLower().ToCharArray();
for (int i = 0; i < alpha.Length; i++)
for (int i1 = 0; i1 < alpha.Length; i1++)
for (int i2 = 0; i2 < alpha.Length; i2++)
for (int i3 = 0; i3 < alpha.Length; i3++)
for (int i4 = 0; i4 < alpha.Length; i4++)
for (int i5 = 0; i5 < alpha.Length; i5++)
for (int i6 = 0; i6 < alpha.Length; i6++)
{
if (passed > (Too - 200000) && passed < Too)
{
string word = new string(new char[] { alpha[i], alpha[i1], alpha[i2], alpha[i3], alpha[i4], alpha[i5],alpha[i6] });
AllCombos.Add(word);
}
passed++;
}
if (Too >= passed)
{
MessageBox.Show("All combinations of RNG7 were returned");
}
Progress = Too;
return AllCombos;
}
I tried adding 30 for-loops with in way described above so i Would get strings with lenghts of 30 but application just hangs.Is there any better way to do this? All answers would be much appreciated. Thank you in advance!
Can someone please just post method how it is done with larger legth strings I just want to see an example? I don't have to store that data,I just need to compare it with something and release it from memory. I used alphabet for example I don't need whole alphabet.Question was not how long it would take or how much combinations would it be!!!!!
You get an OutOfMemoryException because inside the loop you allocate a string and store it in an ArrayList. The strings have to stay in memory until the ArrayList is garbage collected and your loop creates more strings than you will be able to store.
If you simply want to check the string for a condition you should put the check inside the loop:
for ( ... some crazy loop ...) {
var word = ... create word ...
if (!WordPassesTest(word)) {
Console.WriteLine(word + " failed test.");
return false;
}
}
return true;
Then you only need storage for a single word. Of course, if the loop is crazy enough, it will not terminate before the end of the universe as we know it.
If you need to execute many nested but similar loops you can use recursion to simplify the code. Here is an example that is not incredible efficient, but at least it is simple:
Char[] chars = "ABCD".ToCharArray();
IEnumerable<String> GenerateStrings(Int32 length) {
if (length == 0) {
yield return String.Empty;
yield break;
}
var strings = chars.SelectMany(c => GenerateStrings(length - 1), (c, s) => c + s);
foreach (var str in strings)
yield return str;
}
Calling GenerateStrings(3) will generate all strings of length 3 using lazy evaluation (so no additional storage is required for the strings).
Building on top of an IEnumerable generating your strings you can create primites to buffer and process buffers of strings. An easy solution is to using Reactive Extensions for .NET. Here you already have a Buffer primitive:
GenerateStrings(3)
.ToObservable()
.Buffer(10)
.Subscribe(list => ... ship the list to another computer and process it ...);
The lambda in Subscribe will be called with a List<String> with at most 10 strings (the parameter provided in the call to Buffer).
Unless you have an infinte number of computers you will still have to pull the computers from a pool and only recycle them back to the pool when they have finished the computation.
It should be obvious from the comments on this question that you will not be able to process 26^30 strings even if you have multiple computers at your disposal.
I don't have time right now to write some code but essentially if you are running out of RAM use disk. I'm thinking along the lines of one thread running an algorithm to find the combinations and another persisting the results to disk and releasing the RAM.

.NET String performance question

Is it better, from a performance standpoint, to use "Example1"? I'm assuming that "Example2" would create a new string on the heap in each iteration while "Example1" would not...
Example1:
StringBuilder letsCount = new StringBuilder("Let's count! ");
string sep = ", ";
for(int i=; i< 100; i++)
{
letsCount.Append(i + sep);
}
Example2:
StringBuilder letsCount = new StringBuilder("Let's count! ");
for(int i=; i< 100; i++)
{
letsCount.Append(i + ", ");
}
The .NET CLR is much smarter than that. It "interns" string literals so that there is only one instance.
It's also worth noting that if you were truly concerned about string concatenation, you would want to turn the single Append call into two append calls. The reality, however, is that the overhead of two calls probably outweighs any minor concatenation cost. In either case, it's probably nearly immeasurable except in very controlled conditions.
They are identical.
Actually a much faster way to do it would be
string letsCount = "Let's count! ";
string[] numbers = new string[100];
for(int i=0; i< 100; i++)
{
numbers[i]=i+", ";
}
String.Join(letsCount, numbers);
See here

Is using an arraylist of chars faster for performing multiple string concatenation?

I'm using the .Net micro framework so the StringBuilder is not available.
I have seen some code from apt professionals to use an Arraylist of chars to concat and build strings, as opposed to the + operator. They essentially build a managed code StringBuilder.
Is there a performance advantage to this? Assume the number of concatenations are higher than 10 and string lengths are also larger than 10.
No, don't use an ArrayList of char values. That will box every char - performance will be horrible, as will memory usage. (Size of a reference + size of a boxed char for each character... yikes!)
Use a char[] internally and "resize" it (create a new array and copy the contents in) when you need to, perhaps doubling in size each time. (EDIT: You don't resize it to the exact size you need - you would start off with, say, 16 chars and keep doubling - so most Append operations don't need to "resize" the array.)
That's similar to how StringBuilder works anyway. (It's even closer to how Java's StringBuilder works.)
I suggest you actually build your own StringBuilder type with the most important members. Unit test the heck out of it, and profile where appropriate.
Let me know if you want a short example.
The only reason that using an ArrayList of chars to build a string would be considered performant is if you compare it to something that has really bad performance. Concatenating a huge string using += would be an example of something that would have such bad performance.
You can make the string concatenation a lot more efficient if you just concatenate into several shorter strings instead of one large string.
This code, for example:
string[] parts = new string[1000];
for (int i = 0; i < parts.Length; i++) {
string part = String.Empty;
for (int j=0; j < 100; j++) {
part += "*";
}
parts[i] = part;
}
string result = String.Concat(parts);
Is about 450 times faster than this code:
string result = string.Empty;
for (int i = 0; i < 100000; i++) {
result += "*";
}
A StringBuilder is still faster, but it's only about four times faster than the first example. So by using shorter strings you can cut the time by 99.78%, and using a StringBuilder would only cut another 0.16%.

Categories