String vs. StringBuilder when editing a long string? - c#

I have a string that I have to edit quite a lot. Length is undefined. Replace(string, string) will be the most used method for this.
What is better string.Replace("", "") or StringBuilder.Replace("", "") ?
public static string FormatMyString(string input)
{
// ...
}
(The code examples are plain and stupid; they just serve to show you what I'm doing. Otherwise I always get questions like: "What are you trying to do?")

What is better string.Replace("", "") or StringBuilder.Replace("", "") ?
Neither. As written they both do nothing useful (replacing an empty string isn't even valid). In the more general case:
if you're doing one replacement, a single String.Replace call should be fine
if you're doing lots of replacements, consider StringBuilder to avoid the intermediate strings
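A minimal sketch of the two approaches (the template tokens and values here are made up for illustration):

```csharp
using System;
using System.Text;

public static class Formatting
{
    // One replacement: a plain String.Replace is fine.
    public static string ReplaceOnce(string input) =>
        input.Replace("{name}", "Alice");

    // Many replacements: StringBuilder.Replace mutates one buffer
    // instead of allocating an intermediate string per call.
    public static string ReplaceMany(string input)
    {
        var sb = new StringBuilder(input);
        sb.Replace("{name}", "Alice")
          .Replace("{city}", "Oslo")
          .Replace("{day}", "Monday");
        return sb.ToString();
    }
}
```

Both produce the same text; the difference is only in how many intermediate strings are allocated along the way.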

IMHO StringBuilder is the way to go.
It has a creation cost, but in general it offers much better performance due to much more efficient string manipulation and concatenation.
As I said, there is a cost, so you should consider what "edit quite a lot" actually means.
Sorry I can't provide actual benchmark results right now, but I assume the threshold for using StringBuilder should be very low...
Hope this helps.

You just don't get any good value from "asking" about these kinds of things; you need to benchmark. Take this code for example:
var sw = Stopwatch.StartNew();
var cc0 = GC.CollectionCount(0);
var s = (string)null;
for (var i = 0; i < 10000000; i++)
{
s = "a";
s += "b";
}
var cc1 = GC.CollectionCount(0);
sw.Stop();
Console.WriteLine(
"collections == {0}, ms == {1}, string == \"{2}\"",
cc1 - cc0,
sw.ElapsedMilliseconds,
s);
Versus this code:
var sw = Stopwatch.StartNew();
var cc0 = GC.CollectionCount(0);
var sb = (StringBuilder)null;
for (var i = 0; i < 10000000; i++)
{
sb = new StringBuilder();
sb.Append("a");
sb.Append("b");
}
var cc1 = GC.CollectionCount(0);
sw.Stop();
Console.WriteLine(
"collections == {0}, ms == {1}, string == \"{2}\"",
cc1 - cc0,
sw.ElapsedMilliseconds,
sb.ToString());
The two results I get are:
collections == 63, ms == 336, string == "ab" // +=
collections == 228, ms == 692, string == "ab" // StringBuilder
The StringBuilder takes over twice as long and causes over 3.5 times more garbage collections to occur.
It's certainly the case that if I were to concatenate very long strings, StringBuilder would perform better, but I won't know at what point unless I measure it.
You need to provide more detail as to what code you are running and what you mean by "better" (faster, less memory, easy to read code, etc) before we can say what is best.

Related

C# Extension method slower than chained Replace unless in tight loop. Why?

I have an extension method to remove certain characters from a string (a phone number), and it performs much slower than I think it should versus chained Replace calls. The weird bit is that in a loop it overtakes the Replace version once the loop runs for around 3,000 iterations, after which it's faster. Below that, chaining Replace is faster. It's like there's a fixed overhead to my code which Replace doesn't have. What could this be!?
Quick look. When only testing 10 numbers, mine takes about 0.3ms, while Replace takes only 0.01ms. A massive difference! But when running 5 million, mine takes around 1700ms while Replace takes about 2500ms.
Phone numbers will only have 0-9, +, -, (, )
Here's the relevant code:
Building test cases, I'm playing with testNums.
int testNums = 5_000_000;
Console.WriteLine("Building " + testNums + " tests");
Random rand = new Random();
string[] tests = new string[testNums];
char[] letters =
{
'0','1','2','3','4','5','6','7','8','9',
'+','-','(',')'
};
for(int t = 0; t < tests.Length; t++)
{
int length = rand.Next(5, 20);
char[] word = new char[length];
for(int c = 0; c < word.Length; c++)
{
word[c] = letters[rand.Next(letters.Length)];
}
tests[t] = new string(word);
}
Console.WriteLine("Tests built");
string[] stripped = new string[tests.Length];
Using my extension method:
Stopwatch stopwatch = Stopwatch.StartNew();
for (int i = 0; i < stripped.Length; i++)
{
stripped[i] = tests[i].CleanNumberString();
}
stopwatch.Stop();
Console.WriteLine("Clean: " + stopwatch.Elapsed.TotalMilliseconds + "ms");
Using chained Replace:
stripped = new string[tests.Length];
stopwatch = Stopwatch.StartNew();
for (int i = 0; i < stripped.Length; i++)
{
stripped[i] = tests[i].Replace(" ", string.Empty)
.Replace("-", string.Empty)
.Replace("(", string.Empty)
.Replace(")", string.Empty)
.Replace("+", string.Empty);
}
stopwatch.Stop();
Console.WriteLine("Replace: " + stopwatch.Elapsed.TotalMilliseconds + "ms");
Extension method in question:
public static string CleanNumberString(this string s)
{
Span<char> letters = stackalloc char[s.Length];
int count = 0;
for (int i = 0; i < s.Length; i++)
{
if (s[i] >= '0' && s[i] <= '9')
letters[count++] = s[i];
}
return new string(letters.Slice(0, count));
}
What I've tried:
I've run them around the other way. Makes a tiny difference, but not enough.
Making it a normal static method, which was significantly slower than the extension method. Passing the string as a ref parameter was slightly slower, and as an in parameter about the same as the extension method.
Aggressive Inlining. Doesn't make any real difference. I'm in release mode, so I suspect the compiler inlines it anyway. Either way, not much change.
I have also looked at memory allocations, and that's as I expect. Mine allocates only one string per iteration on the managed heap (the new string at the end), whereas Replace allocates a new object for each Replace call. So the memory used by the Replace version is much higher. But it's still faster!
Is it calling native C code and doing something crafty there? Is the higher memory usage triggering the GC and slowing it down? (That still doesn't explain the insanely fast time on only one or two iterations.)
Any ideas?
(Yes, I know not to bother optimising things like this, it's just bugging me because I don't know why it's doing this)
After doing some benchmarks, I think I can safely assert that your initial statement is wrong, for the exact reason you mentioned in your deleted answer: the loading time of the method is the only thing that misled you.
Here's the full benchmark on a simplified version of the problem:
static void Main(string[] args)
{
// Build string of n consecutive "ab"
int n = 1000;
Console.WriteLine("N: " + n);
char[] c = new char[n];
for (int i = 0; i < n; i+=2)
c[i] = 'a';
for (int i = 1; i < n; i += 2)
c[i] = 'b';
string s = new string(c);
Stopwatch stopwatch;
// Make sure everything is loaded
s.CleanNumberString();
s.Replace("a", "");
s.UnsafeRemove();
// Tests to remove all 'a' from the string
// Unsafe remove
stopwatch = Stopwatch.StartNew();
string a1 = s.UnsafeRemove();
stopwatch.Stop();
Console.WriteLine("Unsafe remove:\t" + stopwatch.Elapsed.TotalMilliseconds + "ms");
// Extension method
stopwatch = Stopwatch.StartNew();
string a2 = s.CleanNumberString();
stopwatch.Stop();
Console.WriteLine("Clean method:\t" + stopwatch.Elapsed.TotalMilliseconds + "ms");
// String replace
stopwatch = Stopwatch.StartNew();
string a3 = s.Replace("a", "");
stopwatch.Stop();
Console.WriteLine("String.Replace:\t" + stopwatch.Elapsed.TotalMilliseconds + "ms");
// Make sure the returned strings are identical
Console.WriteLine(a1.Equals(a2) && a2.Equals(a3));
Console.ReadKey();
}
public static string CleanNumberString(this string s)
{
char[] letters = new char[s.Length];
int count = 0;
for (int i = 0; i < s.Length; i++)
if (s[i] == 'b')
letters[count++] = 'b';
return new string(letters.SubArray(0, count));
}
public static T[] SubArray<T>(this T[] data, int index, int length)
{
T[] result = new T[length];
Array.Copy(data, index, result, 0, length);
return result;
}
// Taken from https://stackoverflow.com/a/2183442/6923568
public static unsafe string UnsafeRemove(this string s)
{
int len = s.Length;
char* newChars = stackalloc char[len];
char* currentChar = newChars;
for (int i = 0; i < len; ++i)
{
char c = s[i];
switch (c)
{
case 'a':
continue;
default:
*currentChar++ = c;
break;
}
}
return new string(newChars, 0, (int)(currentChar - newChars));
}
When run with different values of n, it is clear that your extension method (or at least my somewhat equivalent version of it) has logic that makes it faster than String.Replace(). In fact, it is more performant on both small and big strings:
N: 100
Unsafe remove: 0,0024ms
Clean method: 0,0015ms
String.Replace: 0,0021ms
True
N: 100000
Unsafe remove: 0,3889ms
Clean method: 0,5308ms
String.Replace: 1,3993ms
True
I highly suspect that optimizations in String.Replace() for the replacement (as opposed to removal) of strings are the culprit here. I also added a method from this answer to have another comparison for removal of characters. Its times behave similarly to your method's, but it gets faster at higher values of n (80k+ in my tests).
With all that said, since your question is based on an assumption we found to be false, if you need more explanation of why the opposite is true (i.e. "Why is String.Replace() slower than my method?"), plenty of in-depth benchmarks about string manipulation already cover that.
I ran the clean method a couple more times. Interestingly, it is a lot faster than Replace; only the first run was slower. Sorry that I can't explain why it's slower the first time, but after running the method more times the results were as expected.
Building 100 tests
Tests built
Replace: 0.0528ms
Clean: 0.4526ms
Clean: 0.0413ms
Clean: 0.0294ms
Replace: 0.0679ms
Replace: 0.0523ms
used dotnet core 2.1
So I've found, with help from daehee Kim and Mat below, that it's not just the first iteration that's slow but the whole first loop; every loop after that is fine.
I use the following line to force the JIT to do its thing and initialise this method:
RuntimeHelpers.PrepareMethod(typeof(CleanExtension).GetMethod("CleanNumberString", BindingFlags.Public | BindingFlags.Static).MethodHandle);
I find the JIT usually takes about 2-3ms to do its thing here (including Reflection time of about 0.1ms). Note that you should probably not do this in normal code, because you now pay the Reflection cost and the JIT would have been invoked right afterwards anyway, but it's probably a good idea in benchmarks so the methods are compared fairly.
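A cheaper warm-up, if the Reflection cost is a concern, is simply to call the method once on a throwaway input before starting the stopwatch. A sketch (the sample phone number is made up; the extension method is repeated here so the snippet is self-contained):

```csharp
using System;
using System.Diagnostics;

static class CleanExtension
{
    public static string CleanNumberString(this string s)
    {
        Span<char> letters = stackalloc char[s.Length];
        int count = 0;
        for (int i = 0; i < s.Length; i++)
            if (s[i] >= '0' && s[i] <= '9')
                letters[count++] = s[i];
        return new string(letters.Slice(0, count));
    }
}

class Benchmark
{
    static void Main()
    {
        // Untimed warm-up call: the JIT compiles the method here,
        // so the timed call below does not include compilation cost.
        "warmup".CleanNumberString();

        var sw = Stopwatch.StartNew();
        string cleaned = "+1 (555) 867-5309".CleanNumberString();
        sw.Stop();
        Console.WriteLine("{0} in {1}ms", cleaned, sw.Elapsed.TotalMilliseconds);
    }
}
```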
The more you know!
My benchmark for a loop of 5000 iterations, repeated 5000 times with random strings and averaged is:
Clean: 0.41078ms
Replace: 1.4974ms

How can I concatenate a list

I need to concatenate these values; I've seen examples using StringBuilder but I can't quite figure it out.
I am trying to recreate the linestrings of https://api.tfl.gov.uk/Line/140/Route/Sequence/Inbound
However, the results I have to return have more than one string of co-ords, hence the adding of "[" and "]".
for (int i = 0; i < R.geometry.coordinates.Count; i++)
foreach (List<List<double>> C in R.geometry.coordinates)
{
RS.lineStrings.Add(i++.ToString());
RS.lineStrings.Add("[");
foreach (List<double> a in C)
{
// These values are to be concatenated; I want to create one string from them
RS.lineStrings.Add($"[{a[1]},{a[0]}]");
}
RS.lineStrings.Add("]");
RS.lineStrings.Add(",");
}
Considering that in your code C is a List<List<double>>, you can use a StringBuilder to concatenate:
var sb = new StringBuilder(C.Count * 20); // approx length, to avoid resizing
C.ForEach(item => sb.AppendFormat("[{0},{1}]", item[1], item[0]));
var str = sb.ToString(); // this is the concatenation
If you want list of strings
C.Select(item => $"[{item[1]},{item[0]}]").ToList();
Based on your new update (I am trying to return "[[0,1],[2,3],[4,5]]") do this
var result = "[" + string.Join(",", C.Select(item => $"[{item[1]},{item[0]}]")) + "]";
Which method you choose should depend on the details of your list. You can still do it with a StringBuilder for better memory management:
var sb = new StringBuilder(C.Count * 20); // approx length, to avoid resizing
C.ForEach(item => sb.AppendFormat("[{0},{1}],", item[1], item[0])); // note the trailing comma
sb.Insert(0, "[").Replace(',', ']', sb.Length - 1, 1); // replaces the last comma with the closing bracket
You can use string.Join() to join the formatted pairs:
string result = string.Join(",", C.Select(item => $"[{item[1]},{item[0]}]"));
Strings are immutable, so a lot of string concatenation can leave a lot of dead strings in memory. The GC will deal with them, but it is still a performance issue, and on a web server especially it should be avoided. And then there are things like string interning: a lot of minor optimisations that get in the way when you do mass operations on strings.
StringBuilder is the closest we can get to a mutable string, and it gets around those optimisations (which may be a hindrance here). The only usage difference is that you call "Append" rather than "Add".
There are multiple possibilities:
RS.lineStrings.Add(string.Concat("[", a[1], ",", a[0], "]"));
RS.lineStrings.Add($"[{a[1]},{a[0]}]");
Documentation https://learn.microsoft.com/en-us/dotnet/csharp/how-to/concatenate-multiple-strings
Here's a solution with a StringBuilder. It's wordy as all get out, but it should be much faster and produce much less garbage for a large number of items. It uses no string concatenation.
var listOfLists = new List<List<double>>
{
new List<double> {1.0, 2.0, 3.0},
new List<double> {3.14, 42.0}
};
var buffer = new StringBuilder();
buffer.Append('[');
var firstOuter = true;
foreach (var list in listOfLists)
{
var firstInner = true;
if (!firstOuter)
{
buffer.Append(',');
}
buffer.Append('[');
foreach (var item in list)
{
if (!firstInner)
{
buffer.Append(',');
}
firstInner = firstOuter = false;
buffer.Append(item.ToString());
}
buffer.Append(']');
}
buffer.Append(']');
var concatenated = buffer.ToString();

Faster ways of console I/O in C#

I recently started using C# on programming-contest sites like Sphere Online Judge. One thing I noticed is that console I/O can really slow down my programs in C#.
I am mainly using the Console.ReadLine and Console.WriteLine methods. For integer parsing I have written my own parser, because the built-in parsers are quite slow.
I am aware that writing to the console is slow, so when there is a lot to be written, I use a StringBuilder to build up all the output and write it out once using a single Console.WriteLine(sb.ToString()) call.
Are there any more optimizations I could make to speed up I/O? Are there any other ways of doing I/O than what I mentioned above?
(Please spare the you-should-check-your-algorithm-first kind of replies, this question is specifically about fast I/O. Thanks for understanding.)
I am speeding up Console I/O in this way:
Parsing long arrays
static IEnumerable<T> ReadArray<T>(string arrayLine, Func<string, T> parseFunction, char separator = ' ')
{
int from = 0;
for (int i = 0; i < arrayLine.Length; i++)
{
if (arrayLine[i] == separator)
{
yield return parseFunction(arrayLine.Substring(from, i - from));
from = i + 1;
}
}
yield return parseFunction(arrayLine.Substring(from));
}
and you can use the method this way
var array = ReadArray(Console.ReadLine(), s => int.Parse(s)).ToArray();
you can gain extra milliseconds by avoiding the ToArray() call and returning an array directly, as most of the time the array length is part of the input.
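For example, if the line before the array gives its length N, a sketch that parses straight into a pre-sized array (the method name and signature are mine, not from the answer):

```csharp
using System;

static class FastParse
{
    // Parse a separator-delimited line of integers into an array of
    // known length, skipping the iterator and the ToArray() copy.
    public static int[] ReadIntArray(string line, int n, char separator = ' ')
    {
        var result = new int[n];
        int from = 0, idx = 0;
        for (int i = 0; i <= line.Length; i++)
        {
            // Treat end-of-line like a separator to flush the last token.
            if (i == line.Length || line[i] == separator)
            {
                result[idx++] = int.Parse(line.Substring(from, i - from));
                from = i + 1;
            }
        }
        return result;
    }
}
```

Used as `var array = FastParse.ReadIntArray(Console.ReadLine(), n);` when n was read from a previous line.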
Print efficiently (sometimes even this is not enough)
class Program
{
static StreamWriter output = new StreamWriter(Console.OpenStandardOutput());
internal static void Run()
{
int testCount = int.Parse(Console.ReadLine());
for (int t = 0; t < testCount; t++)
{
// Do your logic here. For example
var result = FooBarSolution(...);
output.WriteLine(result);
}
output.Flush();
}
}
Here is the GIST.
Reading input as a string and converting it back to an integer or long etc. has unwanted overhead.
Things to think through:
Can you read multiple test cases in a single go?
Can you read the inputs in a format that is supported and easier to convert to other datatypes? [Strings are just simple for developers ;)]
Can you read them in a buffered way?
Does the SPOJ C# compiler support unsafe code? If so, you can save a few more ticks. And so on.
Here is my take on INOUT test
You should use a huge buffer (byte[]) and read all console input as bytes, then loop through them byte by byte until you see a whitespace character and convert that segment to an integer, long or string as per your input format.
It is quite effective and has saved lot of time for problems with strict time limits.
Here is my sample code
public int ReadInt()
{
byte readByte;
while ((readByte = GetByte()) < '-') ;
var neg = false;
if (readByte == '-')
{
neg = true;
readByte = GetByte();
}
var m = readByte - '0';
while (true)
{
readByte = GetByte();
if (readByte < '0') break;
m = m * 10 + (readByte - '0');
}
return neg ? -m : m;
}
This is my code used in SPOJ
https://github.com/davidsekar/C-sharp-Programming-IO/blob/master/ConsoleInOut/InputOutput.cs
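GetByte() in the snippet above comes from the author's full code (linked); a minimal buffered version might look like the following. The stream is parameterised here so it can be exercised off the console; pass Console.OpenStandardInput() in a real contest. The buffer size is arbitrary:

```csharp
using System;
using System.IO;

class FastReader
{
    readonly Stream stream;
    readonly byte[] buffer = new byte[1 << 16]; // 64 KB read buffer
    int length, position;

    public FastReader(Stream stream) { this.stream = stream; }

    // Returns the next raw byte, refilling the buffer as needed.
    // Returns 0 at end of input, which is safe for whitespace-delimited
    // parsers like ReadInt() above (0 is below '0' so the loop breaks).
    public byte GetByte()
    {
        if (position == length)
        {
            length = stream.Read(buffer, 0, buffer.Length);
            position = 0;
            if (length <= 0) return 0;
        }
        return buffer[position++];
    }
}
```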
Thanks,
davidsekar

StringBuilder performance in C#?

I have a StringBuilder object where I am adding some strings like follows:
I want to know which one is better approach here, first one is this:
StringBuilder sb = new StringBuilder();
sb.Append("Hello" + "How" + "are" + "you");
and the second one is:
StringBuilder sb = new StringBuilder();
sb.Append("Hello").Append("How").Append("are").Append("you");
In your current example, the string literals:
"Hello" + "How" + "are" + "you"
Will be compiled into one constant string literal by the compiler, so it is technically faster than:
sb.Append("Hello").Append("How").Append("are").Append("you");
However, were you to use string variables:
sb.Append(s1 + s2 + s3 + s4);
Then the latter would be faster as the former could potentially create a series of strings (because of the concatenation) before passing the final string into the Append method, whereas the latter would avoid the extra string creations (but trades off extra method calls and internal buffer resizing).
Update: For further clarity, in this exact situation where there are only 4 items being concatenated, the compiler will emit a call to String.Concat(string, string, string, string), which knowing the length and number of strings will be more efficient than StringBuilder.
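That lowering can be sketched by hand; both lines below produce the same string, and Concat can size the destination buffer once up front:

```csharp
using System;

class ConcatDemo
{
    static void Main()
    {
        string s1 = "Hello", s2 = "How", s3 = "are", s4 = "you";

        // What you write:
        string viaPlus = s1 + s2 + s3 + s4;

        // Roughly what the compiler emits for four operands:
        string viaConcat = string.Concat(s1, s2, s3, s4);

        Console.WriteLine(viaPlus == viaConcat); // True
    }
}
```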
The first will be more efficient. The compiler will convert it to the following single call:
StringBuilder sb = new StringBuilder();
sb.Append("HelloHowareyou");
Measuring the performance
The best way to know which is faster is to measure it. I'll get straight to the point: here are the results (smaller times means faster):
sb.Append("Hello" + "How" + "are" + "you") : 11.428s
sb.Append("Hello").Append("How").Append("are").Append("you"): 15.314s
sb.Append(a + b + c + d) : 21.970s
sb.Append(a).Append(b).Append(c).Append(d) : 15.529s
The number given is the number of seconds to perform the operation 100 million times in a tight loop.
Conclusions
The fastest is using string literals and +.
But if you have variables, using Append is faster than +. The first version is slower because of an extra call to String.Concat.
In case you want to test this yourself, here's the program I used to get the above timings:
using System;
using System.Text;
public class Program
{
public static void Main()
{
DateTime start, end;
int numberOfIterations = 100000000;
start = DateTime.UtcNow;
for (int i = 0; i < numberOfIterations; ++i)
{
StringBuilder sb = new StringBuilder();
sb.Append("Hello" + "How" + "are" + "you");
}
end = DateTime.UtcNow;
DisplayResult("sb.Append(\"Hello\" + \"How\" + \"are\" + \"you\")", start, end);
start = DateTime.UtcNow;
for (int i = 0; i < numberOfIterations; ++i)
{
StringBuilder sb = new StringBuilder();
sb.Append("Hello").Append("How").Append("are").Append("you");
}
end = DateTime.UtcNow;
DisplayResult("sb.Append(\"Hello\").Append(\"How\").Append(\"are\").Append(\"you\")", start, end);
string a = "Hello";
string b = "How";
string c = "are";
string d = "you";
start = DateTime.UtcNow;
for (int i = 0; i < numberOfIterations; ++i)
{
StringBuilder sb = new StringBuilder();
sb.Append(a + b + c + d);
}
end = DateTime.UtcNow;
DisplayResult("sb.Append(a + b + c + d)", start, end);
start = DateTime.UtcNow;
for (int i = 0; i < numberOfIterations; ++i)
{
StringBuilder sb = new StringBuilder();
sb.Append(a).Append(b).Append(c).Append(d);
}
end = DateTime.UtcNow;
DisplayResult("sb.Append(a).Append(b).Append(c).Append(d)", start, end);
Console.ReadLine();
}
private static void DisplayResult(string name, DateTime start, DateTime end)
{
Console.WriteLine("{0,-60}: {1,6:0.000}s", name, (end - start).TotalSeconds);
}
}
String constants will be concatenated at compile time by the compiler. If you are concatenating no more than four string expressions, the compiler will emit a call to String.Concat
s + t + u + v ==> String.Concat(s, t, u, v)
This performs faster than StringBuilder, as StringBuilder might have to resize its internal buffer, while Concat can calculate the total resulting length in advance. If you know the maximum length of the resulting string in advance, however, you can initialize the StringBuilder by specifying an initial working buffer size
var sb = new StringBuilder(initialBufferSize);
StringBuilder is often used in a loop and other dynamic scenarios and performs faster than s += t in such cases.
In the first case the compiler will construct a single string, so you'll only call Append once. However, I doubt this will make much of a difference. What did your measurements show?
The second one is the better approach. Strings are immutable, meaning that when you use sb.Append("Hello" + "How" + "Are" + "You") you are creating multiple copies of the string,
e.g.
"Hello"
then
"HelloHow"
then
"HelloHowAre"
etc.
The second piece of code is much more performant.
edit: Of course this doesn't take compiler optimisations into consideration, but it's best to use the class as intended.
OK, as people have pointed out, since these are literals the compiler optimises these operations away; but my point is that string concatenation is exactly what StringBuilder tries to avoid.
For instance, looping several times as such:
var someString = "";
foreach (var s in someListOfStrings)
{
someString += s;
}
Is not as good as doing:
var sb = new StringBuilder();
foreach(var s in someListOfStrings)
{
sb.Append(s);
}
sb.ToString();
As this will likely be much quicker since, as I said before, strings are immutable
I assumed the OP was talking about using concatenation in general, since
sb.Append("Hello" + "How");
Seems completely pointless when
sb.Append("HelloHow");
Would be more logical...?
It seems to me that in the OP's mind, the placeholder text would eventually become a shedload of variables...

Fast string suffix checking in C# (.NET 4.0)?

What is the fastest method of checking string suffixes in C#?
I need to check each string in a large list (anywhere from 5000 to 100000 items) for a particular term. The term is guaranteed never to be embedded within the string. In other words, if the string contains the term, it will be at the end of the string. The string is also guaranteed to be longer than the suffix. Cultural information is not important.
These are how different methods performed against 100000 strings (half of them have the suffix):
1. Substring Comparison - 13.60ms
2. String.Contains - 22.33ms
3. CompareInfo.IsSuffix - 24.60ms
4. String.EndsWith - 29.08ms
5. String.LastIndexOf - 30.68ms
These are average times. [Edit] Forgot to mention that the strings also get put into separate lists, but this is not important. It does add to the running time though.
On my system substring comparison (extracting the end of the string using the String.Substring method and comparing it to the suffix term) is consistently the fastest when tested against 100000 strings. The problem with using substring comparison though is that Garbage Collection can slow it down considerably (more than the other methods) because String.Substring creates new strings. The effect is not as bad in .NET 4.0 as it was in 3.5 and below, but it is still noticeable. In my tests, String.Substring performed consistently slower on sets of 12000-13000 strings. This will obviously differ between systems and implementations.
[EDIT]
Benchmark code:
http://pastebin.com/smEtYNYN
[EDIT]
FlyingStreudel's code runs fast, but Jon Skeet's recommendation of using EndsWith in conjunction with StringComparison.Ordinal appears to be the best option.
If that's the time taken to check 100,000 strings, does it really matter?
Personally I'd use string.EndsWith on the grounds that it's the most descriptive: it says exactly what you're trying to test.
I'm somewhat suspicious of the fact that it appears to be performing worst though... if you could post your benchmark code, that would be very useful. (In particular, it really shouldn't have to do as much work as string.Contains.)
Have you tried specifying an ordinal match? That may well make it significantly faster:
if (x.EndsWith(y, StringComparison.Ordinal))
Of course, you shouldn't do that unless you want an ordinal comparison - are you expecting culturally-sensitive matches? (Developers tend not to consider this sort of thing, and I very firmly include myself in that category.)
Jon is absolutely right; this is potentially not an apples-to-apples comparison because different string methods have different defaults for cultural sensitivity. Be very sure that you are getting the comparison semantics you intend in each one.
In addition to Jon's answer, I'd add that the relevant question is not "which is fastest?" but rather "which is too slow?" What's your performance goal for this code? The slowest method still finds the result in less time than it takes a movie projector to advance to the next frame, and obviously that is not noticeable by humans. If your goal is that the search appears instantaneous to the user then you're done; any of those methods works. If your goal is that the search take less than a millisecond then none of those methods works; they are all orders of magnitude too slow. What's the budget?
I took a look at your benchmark code and frankly, it looks dodgy.
You are measuring all kinds of extraneous things along with what it is you want to measure; you're measuring the cost of the foreach and the adding to a list, both of which might have costs of the same order of magnitude as the thing you are attempting to test.
Also, you are not throwing out the first run; remember, the JIT compiler is going to jit the code that you call the first time through the loop, and it is going to be hot and ready to go the second time, so your results will therefore be skewed; you are averaging one potentially very large thing with many small things. In the past when I have done this I have discovered situations where the jit time actually dominated the time of everything else. Is that realistic? Do you mean to measure the jit time, or should it be not considered as part of the average?
I dunno how fast this is, but this is what I would do?
static bool HasSuffix(string check, string suffix)
{
int offset = check.Length - suffix.Length;
for (int i = 0; i < suffix.Length; i++)
{
if (check[offset + i] != suffix[i])
{
return false;
}
}
return true;
}
edit: OOPS x2
edit: So I wrote my own little benchmark... does this count? It runs 25 trials of evaluating one million strings and takes the average of the difference in performance. The handful of times I ran it, it consistently reported that CharCompare was faster by ~10-40ms over one million records. So that is a hugely unimportant increase in efficiency (on the order of 10 nanoseconds per call) :) All in all I doubt it will matter which method you implement.
class Program
{
volatile static List<string> strings;
static double[] results = new double[25];
static void Main(string[] args)
{
strings = new List<string>();
Random r = new Random();
for (int rep = 0; rep < 25; rep++)
{
Console.WriteLine("Run " + rep);
strings.Clear();
for (int i = 0; i < 1000000; i++)
{
string temp = "";
for (int j = 0; j < r.Next(3, 101); j++)
{
temp += Convert.ToChar(
Convert.ToInt32(
Math.Floor(26 * r.NextDouble() + 65)));
}
if (i % 4 == 0)
{
temp += "abc";
}
strings.Add(temp);
}
OrdinalWorker ow = new OrdinalWorker(strings);
CharWorker cw = new CharWorker(strings);
if (rep % 2 == 0)
{
cw.Run();
ow.Run();
}
else
{
ow.Run();
cw.Run();
}
Thread.Sleep(1000);
results[rep] = ow.finish.Subtract(cw.finish).TotalMilliseconds;
}
double tDiff = 0;
for (int i = 0; i < 25; i++)
{
tDiff += results[i];
}
double average = tDiff / 25;
if (average < 0)
{
average = average * -1;
Console.WriteLine("Char compare faster by {0}ms average",
average.ToString("0.00"));
}
else
{
Console.WriteLine("EndsWith faster by {0}ms average",
average.ToString("0.00"));
}
}
}
class OrdinalWorker
{
List<string> list;
int count;
public Thread t;
public DateTime finish;
public OrdinalWorker(List<string> l)
{
list = l;
}
public void Run()
{
t = new Thread(() => {
string suffix = "abc";
for (int i = 0; i < list.Count; i++)
{
count = (list[i].EndsWith(suffix, StringComparison.Ordinal)) ?
count + 1 : count;
}
finish = DateTime.Now;
});
t.Start();
}
}
class CharWorker
{
List<string> list;
int count;
public Thread t;
public DateTime finish;
public CharWorker(List<string> l)
{
list = l;
}
public void Run()
{
t = new Thread(() =>
{
string suffix = "abc";
for (int i = 0; i < list.Count; i++)
{
count = (HasSuffix(list[i], suffix)) ? count + 1 : count;
}
finish = DateTime.Now;
});
t.Start();
}
static bool HasSuffix(string check, string suffix)
{
int offset = check.Length - suffix.Length;
for (int i = 0; i < suffix.Length; i++)
{
if (check[offset + i] != suffix[i])
{
return false;
}
}
return true;
}
}
Did you try direct access?
I mean, you can loop over the characters and compare them in place; it could be faster than making a substring while having the same behaviour.
foreach (string testing in lists)
{
    int offset = testing.Length - PATTERN.Length;
    if (offset < 0)
        continue; // string shorter than the pattern
    bool ok = true;
    for (int j = 0; j < PATTERN.Length; j++)
    {
        if (testing[offset + j] != PATTERN[j])
        {
            ok = false;
            break;
        }
    }
    if (ok) return testing;
}
Moreover, if the strings are big, you could try using hashes.
I don't profess to be an expert in this area, however I felt compelled to at least profile this to some extent (knowing full well that my fictitious scenario will differ substantially from your own) and here is what I came up with:
It seems, at least for me, EndsWith takes the lead with LastIndexOf consistently coming in second, some timings are:
SubString: 00:00:00.0191877
Contains: 00:00:00.0201980
CompareInfo: 00:00:00.0255181
EndsWith: 00:00:00.0120296
LastIndexOf: 00:00:00.0133181
These were gleaned from processing 100,000 strings where the desired suffix appeared in all strings and so to me simply echoes Jon's answer (where the benefit is both speed and descriptiveness). And the code used to come to these results:
class Program
{
class Profiler
{
private Stopwatch Stopwatch = new Stopwatch();
public TimeSpan Elapsed { get { return Stopwatch.Elapsed; } }
public void Start()
{
Reset();
Stopwatch.Start();
}
public void Stop()
{
Stopwatch.Stop();
}
public void Reset()
{
Stopwatch.Reset();
}
}
static string suffix = "_sfx";
static Profiler profiler = new Profiler();
static List<string> input = new List<string>();
static List<string> output = new List<string>();
static void Main(string[] args)
{
GenerateSuffixedStrings();
FindStringsWithSuffix_UsingSubString(input, suffix);
Console.WriteLine("SubString: {0}", profiler.Elapsed);
FindStringsWithSuffix_UsingContains(input, suffix);
Console.WriteLine("Contains: {0}", profiler.Elapsed);
FindStringsWithSuffix_UsingCompareInfo(input, suffix);
Console.WriteLine("CompareInfo: {0}", profiler.Elapsed);
FindStringsWithSuffix_UsingEndsWith(input, suffix);
Console.WriteLine("EndsWith: {0}", profiler.Elapsed);
FindStringsWithSuffix_UsingLastIndexOf(input, suffix);
Console.WriteLine("LastIndexOf: {0}", profiler.Elapsed);
Console.WriteLine();
Console.WriteLine("Press any key to exit...");
Console.ReadKey();
}
static void GenerateSuffixedStrings()
{
for (var i = 0; i < 100000; i++)
{
input.Add(Guid.NewGuid().ToString() + suffix);
}
}
static void FindStringsWithSuffix_UsingSubString(IEnumerable<string> strings, string suffix)
{
output.Clear();
profiler.Start();
foreach (var s in strings)
{
if(s.Substring(s.Length - 4) == suffix)
output.Add(s);
}
profiler.Stop();
}
static void FindStringsWithSuffix_UsingContains(IEnumerable<string> strings, string suffix)
{
output.Clear();
profiler.Start();
foreach (var s in strings)
{
if (s.Contains(suffix))
output.Add(s);
}
profiler.Stop();
}
static void FindStringsWithSuffix_UsingCompareInfo(IEnumerable<string> strings, string suffix)
{
var ci = CompareInfo.GetCompareInfo("en-GB");
output.Clear();
profiler.Start();
foreach (var s in strings)
{
if (ci.IsSuffix(s, suffix))
output.Add(s);
}
profiler.Stop();
}
static void FindStringsWithSuffix_UsingEndsWith(IEnumerable<string> strings, string suffix)
{
output.Clear();
profiler.Start();
foreach (var s in strings)
{
if (s.EndsWith(suffix))
output.Add(s);
}
profiler.Stop();
}
static void FindStringsWithSuffix_UsingLastIndexOf(IEnumerable<string> strings, string suffix)
{
output.Clear();
profiler.Start();
foreach (var s in strings)
{
if (s.LastIndexOf(suffix) == s.Length - 4)
output.Add(s);
}
profiler.Stop();
}
}
EDIT:
As commented, I attempted this again with only some of the strings having a suffix applied and these are the results:
SubString: 00:00:00.0079731
Contains: 00:00:00.0243696
CompareInfo: 00:00:00.0334056
EndsWith: 00:00:00.0196668
LastIndexOf: 00:00:00.0229599
The string generator method was updated as follows, to produce the strings:
static void GenerateSuffixedStrings()
{
var rnd = new Random();
for (var i = 0; i < 100000; i++)
{
input.Add(Guid.NewGuid().ToString() +
(rnd.Next(0, 2) == 0 ? suffix : string.Empty));
}
}
Further, this trend continues if none of the string have a suffix:
SubString: 00:00:00.0055584
Contains: 00:00:00.0187089
CompareInfo: 00:00:00.0228983
EndsWith: 00:00:00.0114227
LastIndexOf: 00:00:00.0199328
However, this gap shortens again when assigning a quarter of the inputs a suffix (the first quarter, then sorting to randomise the coverage):
SubString: 00:00:00.0302997
Contains: 00:00:00.0305685
CompareInfo: 00:00:00.0306335
EndsWith: 00:00:00.0351229
LastIndexOf: 00:00:00.0322899
Conclusion? IMO, and agreeing with Jon, EndsWith seems the way to go (based on this limited test, anyway).
Further Edit:
To cure Jon's curiosity I ran a few more tests on EndsWith, with and without Ordinal string comparison...
On 100,000 strings with a quarter of them suffixed:
EndsWith: 00:00:00.0795617
OrdinalEndsWith: 00:00:00.0240631
On 1,000,000 strings with a quarter of them suffixed:
EndsWith: 00:00:00.5460591
OrdinalEndsWith: 00:00:00.2807860
On 10,000,000 strings with a quarter of them suffixed:
EndsWith: 00:00:07.5889581
OrdinalEndsWith: 00:00:03.3248628
Note that I only ran the last test once as generating the strings proved this laptop is in need of a replacement
There's a lot of good information here. I wanted to note that if your suffix is short, it could be even faster to look at the last few characters individually. My modified version of the benchmark code in question is here: http://pastebin.com/6nNdbEvW. It gives these results:
Last char equality: 1.52 ms (50000)
Last 2 char equality: 1.56 ms (50000)
EndsWith using StringComparison.Ordinal: 3.75 ms (50000)
Contains: 11.10 ms (50000)
LastIndexOf: 14.85 ms (50000)
IsSuffix: 11.30 ms (50000)
Substring compare: 17.69 ms (50000)
