I'm facing a problem while developing an application.
Basically,
I have a fixed string, let's say "IHaveADream"
I now want the user to enter another string (of a fixed length, for my purposes), and then interleave the characters of the fixed string with the characters of the string entered by the user.
e.g.
The user inserts "ByeBye"
then the output would be:
"IBHyaevBeyAeDream".
How to accomplish this?
I have tried with String.Concat and String.Join, inside a for statement, with no luck.
One memory-efficient option is to use a string builder, since both the original string and the user input could potentially be rather large. As mentioned by Kris, you can initialize your StringBuilder capacity to the combined length of both strings.
void Main()
{
var start = "IHaveADream";
var input = "ByeBye";
var sb = new StringBuilder(start.Length + input.Length);
for (int i = 0; i < start.Length; i++)
{
sb.Append(start[i]);
if (input.Length >= i + 1)
sb.Append(input[i]);
}
sb.ToString().Dump();
}
This only safely accounts for the input string being shorter or equal in length to the starting string. If you had a longer input string, you'd want to take the longer length as the end point for your for loop iteration and check that each array index is not out of bounds.
void Main()
{
var start = "IHaveADream";
var input = "ByeByeByeByeBye";
var sb = new StringBuilder(start.Length + input.Length);
var length = start.Length >= input.Length ? start.Length : input.Length;
for (int i = 0; i < length; i++)
{
if (start.Length >= i + 1)
sb.Append(start[i]);
if (input.Length >= i + 1)
sb.Append(input[i]);
}
sb.ToString().Dump();
}
You can create an array of characters and then re-combine them in the order you want.
char[] chars1 = "IHaveADream".ToCharArray();
char[] chars2 = "ByeBye".ToCharArray();
// you can create a custom algorithm to handle
// different size strings.
char[] c = new char[17];
c[0] = chars1[0];
c[1] = chars2[0];
...
c[16] = chars1[10];
string s = new string(c);
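The "custom algorithm" mentioned in the comment could look roughly like the sketch below. It is only one possible way to do it; the class and method names (`CharArrayInterleaver`, `Interleave`) are made up for illustration. It fills a preallocated char array in a loop instead of assigning each index by hand, and it copes with either string being the longer one.

```csharp
using System;

public static class CharArrayInterleaver
{
    // Hypothetical helper: interleaves two strings of any lengths
    // by writing into a char[] sized for both of them.
    public static string Interleave(string first, string second)
    {
        var buffer = new char[first.Length + second.Length];
        int longest = Math.Max(first.Length, second.Length);
        int pos = 0; // next free slot in the buffer

        for (int i = 0; i < longest; i++)
        {
            // Take one character from each string while it still has some left.
            if (i < first.Length) buffer[pos++] = first[i];
            if (i < second.Length) buffer[pos++] = second[i];
        }

        return new string(buffer);
    }
}
```

Calling `CharArrayInterleaver.Interleave("IHaveADream", "ByeBye")` gives the string from the question, "IBHyaevBeyAeDream".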
var firstString = "Ihaveadream";
var secondString = "ByeBye";
var stringBuilder = new StringBuilder();
for (int i = 0; i < firstString.Length; i++) {
stringBuilder.Append(firstString[i]);
if (i < secondString.Length) {
stringBuilder.Append(secondString[i]);
}
}
var result = stringBuilder.ToString();
If you don't care much about memory usage or performance you can just use:
public static string ConcatStrings(string value, string value2)
{
string result = "";
int i = 0;
for (i = 0; i < Math.Max(value.Length, value2.Length) ; i++)
{
if (i < value.Length) result += value[i].ToString();
if (i < value2.Length) result += value2[i].ToString();
}
return result;
}
Usage:
string conststr = "IHaveADream";
string input = "ByeBye";
var result = ConcatStrings(conststr, input);
Console.WriteLine(result);
Output: IBHyaevBeyAeDream
P.S.
I just checked the performance of both methods (StringBuilder and simple concatenation), and both take about the same time to execute if you have just one operation. The main reason is that a StringBuilder takes considerable time to initialize, while concatenation doesn't need that.
But if you have to process something like 1500 strings, it's a different story, and StringBuilder is more of an option.
For 100 000 method executions it showed 85 ms (StringBuilder) vs 22 ms (concatenation).
My Code
I'm using a parallel loop to call a web service because a plain for loop is too slow. However, the results come out with some of the items skipped.
Code:
private void readCSV(string FilePath, string Extension)
{
switch (Extension)
{
case ".csv":
var reader = new StreamReader(File.OpenRead(FilePath));
int counter = 0;
List<int> phoneNo = new List<int>();
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
phoneNo.Add(int.Parse(line));
}
reader.Close();
Parallel.For(0, phoneNo.Count, (index) =>
{
counter++;
Literal1.Text += counter + " " + phoneNo[index] + " " + webserviceClass123.callWebserviceMethod(phoneNo[index]) + "<br/>";
});
break;
}
}
So the results should be like (example)
1 4189291 40.10
2 5124910 23.10
3 5123145 12.11
...
...
50 4124919 20.58
but it comes out as
3 8581892 41.10
1 9281989 10.99
50 4199289 02.22
It is jumbled up, and it misses a lot of data
How do I get it to be in order and ensure that all the data is represented?
It's not at all clear that you should expect Literal1.Text += ... to be thread-safe. I would suggest you use the Parallel.For loop just to collect the data, and then change Literal1.Text afterwards.
For example, you could write:
var results = new WhateverType[phoneNo.Count];
Parallel.For(0, phoneNo.Count,
index => results[index] = webserviceClass123.callWebserviceMethod(phoneNo[index]));
var builder = new StringBuilder();
for (int i = 0; i < phoneNo.Count; i++)
{
builder.AppendFormat("{0} {1} {2}<br/>",
i, phoneNo[i], results[i]);
}
Literal1.Text = builder.ToString();
It would quite possibly be even cleaner to use Parallel LINQ:
var results = phoneNo
.AsParallel()
.AsOrdered() // must directly follow AsParallel() to preserve source order
.Select(number => new {
number,
result = webserviceClass123.callWebserviceMethod(number)
})
.ToList();
var builder = new StringBuilder();
for (int i = 0; i < results.Count; i++)
{
builder.AppendFormat("{0} {1} {2}<br/>",
i, results[i].number, results[i].result);
}
Literal1.Text = builder.ToString();
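As for why items go missing in the original loop: `counter++` and `Literal1.Text += ...` are both read-modify-write operations on shared state, so two threads can read the same old value and one update gets lost. Here is a minimal sketch of that effect on the counter alone. The class and method names are made up for illustration, and it swaps in `Interlocked.Increment` as the fix for the counter, which is a different technique from the collect-then-format approach above.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class SharedCounterSketch
{
    // Unsafe: counter++ is a read, an add and a write, and parallel
    // iterations can interleave between those steps and lose increments.
    public static int CountUnsafely(int iterations)
    {
        int counter = 0;
        Parallel.For(0, iterations, i => { counter++; });
        return counter;
    }

    // Safe: Interlocked.Increment performs the whole update atomically.
    public static int CountWithInterlocked(int iterations)
    {
        int counter = 0;
        Parallel.For(0, iterations, i => { Interlocked.Increment(ref counter); });
        return counter;
    }
}
```

The unsafe version may return fewer than `iterations` (how many fewer varies from run to run), while the Interlocked version always returns exactly `iterations`. For the output text itself, writing each result into its own array slot, as shown above, avoids shared mutable state altogether and is probably the simpler route.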
Why is StringBuilder slower when compared to + concatenation?
StringBuilder was meant to avoid extra object creation, but why does it penalize performance?
static void Main(string[] args)
{
int max = 1000000;
for (int times = 0; times < 5; times++)
{
Console.WriteLine("\ntime: {0}", (times+1).ToString());
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < max; i++)
{
string msg = "Your total is ";
msg += "$500 ";
msg += DateTime.Now;
}
sw.Stop();
Console.WriteLine("String +\t: {0}ms", ((int)sw.ElapsedMilliseconds).ToString().PadLeft(6));
sw = Stopwatch.StartNew();
for (int j = 0; j < max; j++)
{
StringBuilder msg = new StringBuilder();
msg.Append("Your total is ");
msg.Append("$500 ");
msg.Append(DateTime.Now);
}
sw.Stop();
Console.WriteLine("StringBuilder\t: {0}ms", ((int)sw.ElapsedMilliseconds).ToString().PadLeft(6));
}
Console.Read();
}
EDIT: Moving out of scope variables as suggested:
Change so that the StringBuilder isn't instantiated all the time, instead .Clear() it:
time: 1
String + : 3348ms
StringBuilder : 3151ms
time: 2
String + : 3346ms
StringBuilder : 3050ms
etc.
Note that this still tests exactly the same functionality, but tries to reuse resources a bit smarter.
Code: (also live on http://ideone.com/YuaqY)
using System;
using System.Text;
using System.Diagnostics;
public class Program
{
static void Main(string[] args)
{
int max = 1000000;
for (int times = 0; times < 5; times++)
{
{
Console.WriteLine("\ntime: {0}", (times+1).ToString());
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < max; i++)
{
string msg = "Your total is ";
msg += "$500 ";
msg += DateTime.Now;
}
sw.Stop();
Console.WriteLine("String +\t: {0}ms", ((int)sw.ElapsedMilliseconds).ToString().PadLeft(6));
}
{
Stopwatch sw = Stopwatch.StartNew();
StringBuilder msg = new StringBuilder();
for (int j = 0; j < max; j++)
{
msg.Clear();
msg.Append("Your total is ");
msg.Append("$500 ");
msg.Append(DateTime.Now);
}
sw.Stop();
Console.WriteLine("StringBuilder\t: {0}ms", ((int)sw.ElapsedMilliseconds).ToString().PadLeft(6));
}
}
Console.Read();
}
}
You are creating a new instance of StringBuilder with every iteration, and that incurs some overhead. Since you are not using it for what it's actually meant to do (i.e., build large strings which would otherwise require many string concatenation operations), it's not surprising to see worse performance than concatenation.
A more common comparison / usage of StringBuilder is something like:
string msg = "";
for (int i = 0; i < max; i++)
{
msg += "Your total is ";
msg += "$500 ";
msg += DateTime.Now;
}
StringBuilder msg_sb = new StringBuilder();
for (int j = 0; j < max; j++)
{
msg_sb.Append("Your total is ");
msg_sb.Append("$500 ");
msg_sb.Append(DateTime.Now);
}
With this, you'll observe a significant performance difference between StringBuilder and concatenation. And by "significant" I mean orders of magnitude, not the ~ 10% difference you are observing in your examples.
Since StringBuilder doesn't have to build tons of intermediary strings that will just get thrown away, you get much better performance. That's what it's meant for. For smaller strings, you are better off using string concatenation for simplicity and clarity.
The benefits of StringBuilder should be noticeable with longer strings.
Every time you concatenate a string you create a new string object, so the longer the string, the more is needed to copy from the old string to the new string.
Also, creating many temporary objects may have an adverse effect on performance that is not measurable by a Stopwatch, because it "pollutes" the managed heap with temporary objects and may cause more garbage collection cycles.
Modify your test to create (much) longer strings and use (many) more concatenations / append operations and the StringBuilder should perform better.
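One rough way to look at the garbage collection side, which a Stopwatch alone doesn't show, is to compare `GC.CollectionCount(0)` before and after each variant. The sketch below is only an illustration under that assumption; the helper names are made up, and the actual counts depend on the runtime, the GC mode and the machine, so none are shown here.

```csharp
using System;
using System.Text;

public static class GcPressureSketch
{
    // Builds the string with +=, creating a new, longer string on every step.
    public static string BuildWithConcat(string piece, int count)
    {
        string s = "";
        for (int i = 0; i < count; i++) s += piece;
        return s;
    }

    // Builds the same string in one preallocated buffer.
    public static string BuildWithBuilder(string piece, int count)
    {
        var sb = new StringBuilder(piece.Length * count);
        for (int i = 0; i < count; i++) sb.Append(piece);
        return sb.ToString();
    }

    // Hypothetical helper: number of generation 0 collections that
    // happened while the given work was running.
    public static int Gen0CollectionsDuring(Action work)
    {
        int before = GC.CollectionCount(0);
        work();
        return GC.CollectionCount(0) - before;
    }
}
```

For example, `GcPressureSketch.Gen0CollectionsDuring(() => GcPressureSketch.BuildWithConcat("XXXXXXXXXX", 20000))` can be compared against the same call with `BuildWithBuilder`.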
Note that
string msg = "Your total is ";
msg += "$500 ";
msg += DateTime.Now;
compiles down to
string msg = String.Concat("Your total is ", "$500 ");
msg = String.Concat(msg, DateTime.Now.ToString());
This totals two concats and one ToString per iteration. Also, a single String.Concat is really fast, because it knows how large the resulting string will be, so it only allocates the resulting string once, and then quickly copies the source strings into it. This means that in practice
String.Concat(x, y);
will always outperform
StringBuilder builder = new StringBuilder();
builder.Append(x);
builder.Append(y);
because StringBuilder cannot take such shortcuts (you could call a third Append, or a Remove; that's not possible with String.Concat).
The way a StringBuilder works is by allocating an initial buffer and setting the string length to 0. With each Append, it has to check the buffer, possibly allocate more buffer space (usually copying the old buffer to the new buffer), copy the string and increment the string length of the builder. String.Concat does not need to do all this extra work.
So for simple string concatenations, x + y (i.e., String.Concat) will always outperform StringBuilder.
Now, you'll start to get benefits from StringBuilder once you start concatenating lots of strings into a single buffer, or you're doing lots of manipulations on the buffer, where you'd need to keep creating new strings when not using a StringBuilder. This is because StringBuilder only occasionally allocates new memory, in chunks, but String.Concat, String.Substring, etc. (nearly) always allocate new memory. (Something like "".Substring(0, 0) or String.Concat("", "") won't allocate memory, but those are degenerate cases.)
In addition to not using StringBuilder in the most efficient manner, you're also not using string concatenation as efficiently as possible. If you know how many strings you're concatenating ahead of time, then doing it all on one line should be fastest. The compiler optimizes the operation so that no intermediate strings are generated.
I added a couple more test cases. One is basically the same as what sehe suggested, and the other generates the string in one line:
sw = Stopwatch.StartNew();
builder = new StringBuilder();
for (int j = 0; j < max; j++)
{
builder.Clear();
builder.Append("Your total is ");
builder.Append("$500 ");
builder.Append(DateTime.Now);
}
sw.Stop();
Console.WriteLine("StringBuilder (clearing)\t: {0}ms", ((int)sw.ElapsedMilliseconds).ToString().PadLeft(6));
sw = Stopwatch.StartNew();
for (int i = 0; i < max; i++)
{
msg = "Your total is " + "$500" + DateTime.Now;
}
sw.Stop();
Console.WriteLine("String + (one line)\t: {0}ms", ((int)sw.ElapsedMilliseconds).ToString().PadLeft(6));
And here is an example of the output I see on my machine:
time: 1
String + : 3707ms
StringBuilder : 3910ms
StringBuilder (clearing) : 3683ms
String + (one line) : 3645ms
time: 2
String + : 3703ms
StringBuilder : 3926ms
StringBuilder (clearing) : 3666ms
String + (one line) : 3625ms
In general:
- StringBuilder does better if you're building a large string in a lot of steps, or you don't know how many strings will be concatenated together.
- Mashing them all together in a single expression is better whenever it's a reasonable option.
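To make the two rules of thumb concrete, here is a small sketch. The method names are made up for illustration. The first method has a fixed, small number of pieces, so it uses one expression; the second doesn't know how many pieces it will get, so it appends into a StringBuilder.

```csharp
using System.Collections.Generic;
using System.Text;

public static class ConcatChoiceSketch
{
    // Known number of pieces: a single + expression, which the compiler
    // turns into one String.Concat call with no intermediate strings.
    public static string FormatTotal(string amount, string date)
    {
        return "Your total is " + amount + " " + date;
    }

    // Unknown number of pieces: append everything into one growing buffer.
    public static string JoinLines(IEnumerable<string> lines)
    {
        var sb = new StringBuilder();
        foreach (var line in lines)
        {
            sb.Append(line).Append('\n');
        }
        return sb.ToString();
    }
}
```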
I think it's better to compare efficiency between String and StringBuilder rather than time.
What MSDN says:
A String is called immutable because its value cannot be modified once it has been created. Methods that appear to modify a String actually return a new String containing the modification. If it is necessary to modify the actual contents of a string-like object, use the System.Text.StringBuilder class.
string msg = "Your total is "; // a new string object
msg += "$500 "; // a new string object
msg += DateTime.Now; // a new string object
see which one is better.
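A quick way to see the difference the quote describes is to compare object references. This is just a sketch with made-up names: after `+=` the variable refers to a different string object and the original is untouched, while `Append` keeps working on the same StringBuilder instance.

```csharp
using System.Text;

public static class ImmutabilitySketch
{
    // += does not modify the existing string; it creates a new one.
    public static bool ConcatCreatesNewObject()
    {
        string original = "Your total is ";
        string msg = original;
        msg += "$500 "; // msg now refers to a newly allocated string
        return !object.ReferenceEquals(original, msg)
            && original == "Your total is ";
    }

    // Append mutates the builder in place and returns the same instance.
    public static bool AppendKeepsSameObject()
    {
        var sb = new StringBuilder("Your total is ");
        StringBuilder returned = sb.Append("$500 ");
        return object.ReferenceEquals(sb, returned)
            && sb.ToString() == "Your total is $500 ";
    }
}
```

Both methods return true.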
Here is an example that demonstrates a situation in which StringBuilder will execute more quickly than string concatenation:
static void Main(string[] args)
{
const int sLen = 30, Loops = 10000;
DateTime sTime, eTime;
int i;
string sSource = new String('X', sLen);
string sDest = "";
//
// Time StringBuilder.
//
for (int times = 0; times < 5; times++)
{
sTime = DateTime.Now;
System.Text.StringBuilder sb = new System.Text.StringBuilder((int)(sLen * Loops * 1.1));
Console.WriteLine("Result # " + (times + 1).ToString());
for (i = 0; i < Loops; i++)
{
sb.Append(sSource);
}
sDest = sb.ToString();
eTime = DateTime.Now;
Console.WriteLine("String Builder took :" + (eTime - sTime).TotalSeconds + " seconds.");
//
// Time string concatenation.
//
sTime = DateTime.Now;
for (i = 0; i < Loops; i++)
{
sDest += sSource;
//Console.WriteLine(i);
}
eTime = DateTime.Now;
Console.WriteLine("Concatenation took : " + (eTime - sTime).TotalSeconds + " seconds.");
Console.WriteLine("\n");
}
//
// Make the console window stay open
// so that you can see the results when running from the IDE.
//
}
Result # 1
String Builder took :0 seconds.
Concatenation took : 8.7659616 seconds.
Result # 2
String Builder took :0 seconds.
Concatenation took : 8.7659616 seconds.
Result # 3
String Builder took :0 seconds.
Concatenation took : 8.9378432 seconds.
Result # 4
String Builder took :0 seconds.
Concatenation took : 8.7972128 seconds.
Result # 5
String Builder took :0 seconds.
Concatenation took : 8.8753408 seconds.
StringBuilder is much faster than + concatenation here.
I'm calling a REST API and am receiving an XML response back. It returns a list of workspace names, and I'm writing a quick IsExistingWorkspace() method. Since all workspaces consist of contiguous characters with no whitespace, I'm assuming the easiest way to find out if a particular workspace is in the list is to remove all whitespace (including newlines) and do this (XML is the string received from the web request):
XML.Contains("<name>" + workspaceName + "</name>");
I know it's case-sensitive, and I'm relying on that. I just need a way to remove all whitespace in a string efficiently. I know RegEx and LINQ can do it, but I'm open to other ideas. I am mostly just concerned about speed.
This is the fastest way I know of, even though you said you didn't want to use regular expressions:
Regex.Replace(XML, @"\s+", "");
Crediting @hypehuman in the comments, if you plan to do this more than once, create and store a Regex instance. This will save the overhead of constructing it every time, which is more expensive than you might think.
private static readonly Regex sWhitespace = new Regex(@"\s+");
public static string ReplaceWhitespace(string input, string replacement)
{
return sWhitespace.Replace(input, replacement);
}
I have an alternative way without regexp, and it seems to perform pretty well. It is a continuation of Brandon Moretz's answer:
public static string RemoveWhitespace(this string input)
{
return new string(input.ToCharArray()
.Where(c => !Char.IsWhiteSpace(c))
.ToArray());
}
I tested it in a simple unit test:
[Test]
[TestCase("123 123 1adc \n 222", "1231231adc222")]
public void RemoveWhiteSpace1(string input, string expected)
{
string s = null;
for (int i = 0; i < 1000000; i++)
{
s = input.RemoveWhitespace();
}
Assert.AreEqual(expected, s);
}
[Test]
[TestCase("123 123 1adc \n 222", "1231231adc222")]
public void RemoveWhiteSpace2(string input, string expected)
{
string s = null;
for (int i = 0; i < 1000000; i++)
{
s = Regex.Replace(input, @"\s+", "");
}
Assert.AreEqual(expected, s);
}
For 1,000,000 attempts the first option (without regexp) runs in less than a second (700 ms on my machine), and the second takes 3.5 seconds.
Try the replace method of the string in C#.
XML.Replace(" ", string.Empty);
My solution is to use Split and Join and it is surprisingly fast, in fact the fastest of the top answers here.
str = string.Join("", str.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));
Timings for 10,000 loops on a simple string with whitespace, including newlines and tabs:
split/join = 60 milliseconds
linq chararray = 94 milliseconds
regex = 437 milliseconds
Improve this by wrapping it up in method to give it meaning, and also make it an extension method while we are at it ...
public static string RemoveWhitespace(this string str) {
return string.Join("", str.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));
}
Building on Henk's answer, I created some test methods with his answer and some added, more optimized methods. I found that the results differ based on the size of the input string, so I tested with two input sizes. The linked source for the fastest method has an even faster variant, but since it is characterized as unsafe I have left it out.
Long input string results:
InPlaceCharArray: 2021 ms (Sunsetquest's answer) - (Original source)
String split then join: 4277ms (Kernowcode's answer)
String reader: 6082 ms
LINQ using native char.IsWhitespace: 7357 ms
LINQ: 7746 ms (Henk's answer)
ForLoop: 32320 ms
RegexCompiled: 37157 ms
Regex: 42940 ms
Short input string results:
InPlaceCharArray: 108 ms (Sunsetquest's answer) - (Original source)
String split then join: 294 ms (Kernowcode's answer)
String reader: 327 ms
ForLoop: 343 ms
LINQ using native char.IsWhitespace: 624 ms
LINQ: 645ms (Henk's answer)
RegexCompiled: 1671 ms
Regex: 2599 ms
Code:
public class RemoveWhitespace
{
public static string RemoveStringReader(string input)
{
var s = new StringBuilder(input.Length); // (input.Length);
using (var reader = new StringReader(input))
{
int i = 0;
char c;
for (; i < input.Length; i++)
{
c = (char)reader.Read();
if (!char.IsWhiteSpace(c))
{
s.Append(c);
}
}
}
return s.ToString();
}
public static string RemoveLinqNativeCharIsWhitespace(string input)
{
return new string(input.ToCharArray()
.Where(c => !char.IsWhiteSpace(c))
.ToArray());
}
public static string RemoveLinq(string input)
{
return new string(input.ToCharArray()
.Where(c => !Char.IsWhiteSpace(c))
.ToArray());
}
public static string RemoveRegex(string input)
{
return Regex.Replace(input, @"\s+", "");
}
private static Regex compiled = new Regex(@"\s+", RegexOptions.Compiled);
public static string RemoveRegexCompiled(string input)
{
return compiled.Replace(input, "");
}
public static string RemoveForLoop(string input)
{
for (int i = input.Length - 1; i >= 0; i--)
{
if (char.IsWhiteSpace(input[i]))
{
input = input.Remove(i, 1);
}
}
return input;
}
public static string StringSplitThenJoin(string str) // no "this": the containing class is not static
{
return string.Join("", str.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));
}
public static string RemoveInPlaceCharArray(string input)
{
var len = input.Length;
var src = input.ToCharArray();
int dstIdx = 0;
for (int i = 0; i < len; i++)
{
var ch = src[i];
switch (ch)
{
case '\u0020':
case '\u00A0':
case '\u1680':
case '\u2000':
case '\u2001':
case '\u2002':
case '\u2003':
case '\u2004':
case '\u2005':
case '\u2006':
case '\u2007':
case '\u2008':
case '\u2009':
case '\u200A':
case '\u202F':
case '\u205F':
case '\u3000':
case '\u2028':
case '\u2029':
case '\u0009':
case '\u000A':
case '\u000B':
case '\u000C':
case '\u000D':
case '\u0085':
continue;
default:
src[dstIdx++] = ch;
break;
}
}
return new string(src, 0, dstIdx);
}
}
Tests:
[TestFixture]
public class Test
{
// Short input
//private const string input = "123 123 \t 1adc \n 222";
//private const string expected = "1231231adc222";
// Long input
private const string input = "123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222";
private const string expected = "1231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc222";
private const int iterations = 1000000;
[Test]
public void RemoveInPlaceCharArray()
{
string s = null;
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
s = RemoveWhitespace.RemoveInPlaceCharArray(input);
}
stopwatch.Stop();
Console.WriteLine("InPlaceCharArray: " + stopwatch.ElapsedMilliseconds + " ms");
Assert.AreEqual(expected, s);
}
[Test]
public void RemoveStringReader()
{
string s = null;
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
s = RemoveWhitespace.RemoveStringReader(input);
}
stopwatch.Stop();
Console.WriteLine("String reader: " + stopwatch.ElapsedMilliseconds + " ms");
Assert.AreEqual(expected, s);
}
[Test]
public void RemoveLinqNativeCharIsWhitespace()
{
string s = null;
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
s = RemoveWhitespace.RemoveLinqNativeCharIsWhitespace(input);
}
stopwatch.Stop();
Console.WriteLine("LINQ using native char.IsWhitespace: " + stopwatch.ElapsedMilliseconds + " ms");
Assert.AreEqual(expected, s);
}
[Test]
public void RemoveLinq()
{
string s = null;
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
s = RemoveWhitespace.RemoveLinq(input);
}
stopwatch.Stop();
Console.WriteLine("LINQ: " + stopwatch.ElapsedMilliseconds + " ms");
Assert.AreEqual(expected, s);
}
[Test]
public void RemoveRegex()
{
string s = null;
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
s = RemoveWhitespace.RemoveRegex(input);
}
stopwatch.Stop();
Console.WriteLine("Regex: " + stopwatch.ElapsedMilliseconds + " ms");
Assert.AreEqual(expected, s);
}
[Test]
public void RemoveRegexCompiled()
{
string s = null;
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
s = RemoveWhitespace.RemoveRegexCompiled(input);
}
stopwatch.Stop();
Console.WriteLine("RegexCompiled: " + stopwatch.ElapsedMilliseconds + " ms");
Assert.AreEqual(expected, s);
}
[Test]
public void RemoveForLoop()
{
string s = null;
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
s = RemoveWhitespace.RemoveForLoop(input);
}
stopwatch.Stop();
Console.WriteLine("ForLoop: " + stopwatch.ElapsedMilliseconds + " ms");
Assert.AreEqual(expected, s);
}
[Test]
public void StringSplitThenJoin()
{
string s = null;
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
s = RemoveWhitespace.StringSplitThenJoin(input);
}
stopwatch.Stop();
Console.WriteLine("StringSplitThenJoin: " + stopwatch.ElapsedMilliseconds + " ms");
Assert.AreEqual(expected, s);
}
}
Edit: Tested a nice one liner from Kernowcode.
Just an alternative because it looks quite nice :) - NOTE: Henk's answer is the quickest of these.
input.ToCharArray()
.Where(c => !Char.IsWhiteSpace(c))
.Select(c => c.ToString())
.Aggregate((a, b) => a + b);
Testing 1,000,000 loops on "This is a simple Test"
This method = 1.74 seconds
Regex = 2.58 seconds
new String (Henks) = 0.82 seconds
I found a nice write-up on this on CodeProject by Felipe Machado (with help by Richard Robertson)
He tested ten different methods. This one is the fastest safe version...
public static string TrimAllWithInplaceCharArray(string str) {
var len = str.Length;
var src = str.ToCharArray();
int dstIdx = 0;
for (int i = 0; i < len; i++) {
var ch = src[i];
switch (ch) {
case '\u0020': case '\u00A0': case '\u1680': case '\u2000': case '\u2001':
case '\u2002': case '\u2003': case '\u2004': case '\u2005': case '\u2006':
case '\u2007': case '\u2008': case '\u2009': case '\u200A': case '\u202F':
case '\u205F': case '\u3000': case '\u2028': case '\u2029': case '\u0009':
case '\u000A': case '\u000B': case '\u000C': case '\u000D': case '\u0085':
continue;
default:
src[dstIdx++] = ch;
break;
}
}
return new string(src, 0, dstIdx);
}
And the fastest unsafe version... (some improvements by Sunsetquest 5/26/2021)
public static unsafe void RemoveAllWhitespace(ref string str)
{
fixed (char* pfixed = str)
{
char* dst = pfixed;
for (char* p = pfixed; *p != 0; p++)
{
switch (*p)
{
case '\u0020': case '\u00A0': case '\u1680': case '\u2000': case '\u2001':
case '\u2002': case '\u2003': case '\u2004': case '\u2005': case '\u2006':
case '\u2007': case '\u2008': case '\u2009': case '\u200A': case '\u202F':
case '\u205F': case '\u3000': case '\u2028': case '\u2029': case '\u0009':
case '\u000A': case '\u000B': case '\u000C': case '\u000D': case '\u0085':
continue;
default:
*dst++ = *p;
break;
}
}
uint* pi = (uint*)pfixed;
ulong len = ((ulong)dst - (ulong)pfixed) >> 1;
pi[-1] = (uint)len;
pfixed[len] = '\0';
}
}
There are also some nice independent benchmarks on Stack Overflow by Stian Standahl that also show how Felipe's function is about 300% faster than the next fastest function. Also, for the one I modified, I used this trick.
If you need superb performance, you should avoid LINQ and regular expressions in this case. I did some performance benchmarking, and it seems that if you want to strip white space from beginning and end of the string, string.Trim() is your ultimate function.
If you need to strip all white spaces from a string, the following method works fastest of all that has been posted here:
public static string RemoveWhitespace(this string input)
{
int j = 0, inputlen = input.Length;
char[] newarr = new char[inputlen];
for (int i = 0; i < inputlen; ++i)
{
char tmp = input[i];
if (!char.IsWhiteSpace(tmp))
{
newarr[j] = tmp;
++j;
}
}
return new String(newarr, 0, j);
}
Regex is overkill; just use extension on string (thanks Henk). This is trivial and should have been part of the framework. Anyhow, here's my implementation:
public static partial class Extension
{
public static string RemoveWhiteSpace(this string self)
{
return new string(self.Where(c => !Char.IsWhiteSpace(c)).ToArray());
}
}
I think a lot of people come here just to remove spaces:
string s = "my string is nice";
s = s.Replace(" ", "");
Here is a simple linear alternative to the RegEx solution. I am not sure which is faster; you'd have to benchmark it.
static string RemoveWhitespace(string input)
{
StringBuilder output = new StringBuilder(input.Length);
for (int index = 0; index < input.Length; index++)
{
if (!Char.IsWhiteSpace(input, index))
{
output.Append(input[index]);
}
}
return output.ToString();
}
I needed to replace white space in a string with spaces, but not duplicate spaces. e.g., I needed to convert something like the following:
"a b c\r\n d\t\t\t e"
to
"a b c d e"
I used the following method
private static string RemoveWhiteSpace(string value)
{
if (value == null) { return null; }
var sb = new StringBuilder();
var lastCharWs = false;
foreach (var c in value)
{
if (char.IsWhiteSpace(c))
{
if (lastCharWs) { continue; }
sb.Append(' ');
lastCharWs = true;
}
else
{
sb.Append(c);
lastCharWs = false;
}
}
return sb.ToString();
}
I assume your XML response looks like this:
var xml = @"<names>
<name>
foo
</name>
<name>
bar
</name>
</names>";
The best way to process XML is to use an XML parser, such as LINQ to XML:
var doc = XDocument.Parse(xml);
var containsFoo = doc.Root
.Elements("name")
.Any(e => ((string)e).Trim() == "foo");
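Wrapped up as the IsExistingWorkspace() method from the question, this could look something like the following sketch. It assumes the response uses plain `name` elements without an XML namespace, as in the sample above; the class name is made up, and `Descendants` is used so that the nesting depth of the `name` elements doesn't matter.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class WorkspaceLookup
{
    // Returns true if any <name> element's trimmed text equals workspaceName.
    // The comparison is case-sensitive, as the question requires.
    public static bool IsExistingWorkspace(string xml, string workspaceName)
    {
        var doc = XDocument.Parse(xml);
        return doc.Descendants("name")
                  .Any(e => e.Value.Trim() == workspaceName);
    }
}
```

Because the whitespace is trimmed per element, there is no need to strip it from the whole response first.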
We can use:
public static string RemoveWhitespace(this string input)
{
if (input == null)
return null;
return new string(input.ToCharArray()
.Where(c => !Char.IsWhiteSpace(c))
.ToArray());
}
Using Linq, you can write a readable method this way :
public static string RemoveAllWhitespaces(this string source)
{
return string.IsNullOrEmpty(source) ? source : new string(source.Where(x => !char.IsWhiteSpace(x)).ToArray());
}
Here is yet another variant:
public static string RemoveAllWhitespace(string aString)
{
return String.Join(String.Empty, aString.Where(aChar => !Char.IsWhiteSpace(aChar)));
}
As with most of the other solutions, I haven't performed exhaustive benchmark tests, but this works well enough for my purposes.
I have found different results to be true. I am trying to replace all whitespace with a single space and the regex was extremely slow.
return( Regex::Replace( text, L"\\s+", L" " ) );
What worked best for me (in C++/CLI) was:
String^ ReduceWhitespace( String^ text )
{
String^ newText;
bool inWhitespace = false;
Int32 posStart = 0;
Int32 pos = 0;
for( pos = 0; pos < text->Length; ++pos )
{
wchar_t cc = text[pos];
if( Char::IsWhiteSpace( cc ) )
{
if( !inWhitespace )
{
if( pos > posStart ) newText += text->Substring( posStart, pos - posStart );
inWhitespace = true;
newText += L' ';
}
posStart = pos + 1;
}
else
{
if( inWhitespace )
{
inWhitespace = false;
posStart = pos;
}
}
}
if( pos > posStart ) newText += text->Substring( posStart, pos - posStart );
return( newText );
}
I tried the above routine first by replacing each character separately, but had to switch to doing substrings for the non-space sections. When applying to a 1,200,000 character string:
the above routine gets it done in 25 seconds
the above routine + separate character replacement in 95 seconds
the regex aborted after 15 minutes.
The straightforward way to remove all whitespace from a string, where "example" is your initial string:
String.Concat(example.Where(c => !Char.IsWhiteSpace(c)))
I'm getting a "FormatException: Input string was not in a correct format" error that I don't understand.
I'm using the following lines to write a string to a text file:
using (StreamWriter sw = new StreamWriter(myfilename, false, System.Text.Encoding.GetEncoding(enc)))
{
sw.Write(mystring, Environment.NewLine);
}
(the encoding part is because I do have an option in my application to set it to utf-8 or iso-8859-1... but I think that's irrelevant).
All of my strings write out just fine except this one string that is different from the others because it actually has a snippet of javascript code in it. I'm sure that one of the special characters there might be causing the problem but how do I know?
The one thing I tried was to insert the following line just before the sw.Write statement above:
System.Console.WriteLine(mystring);
and it wrote out to the console just fine - no error.
Help?
Thanks! (and Happy New Year!)
-Adeena
The overload you are using takes the format as the first parameter, and objects to inject after that.
You can do either of the following:
sw.Write(mystring + Environment.NewLine);
or
sw.Write("{0}{1}", mystring, Environment.NewLine);
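Here is a small sketch of why only the JavaScript string failed. It uses a StringWriter instead of a file so it can run anywhere, and the class and method names are made up. With the `Write(string, object)` overload the first argument is parsed as a composite format string, so a brace that isn't a valid placeholder such as `{0}` throws, while text without braces passes through unchanged.

```csharp
using System;
using System.IO;

public static class FormatOverloadSketch
{
    // Reproduces the call from the question: text ends up as the format string.
    public static bool ThrowsWhenUsedAsFormat(string text)
    {
        using (var sw = new StringWriter())
        {
            try
            {
                sw.Write(text, Environment.NewLine); // Write(string format, object arg0)
                return false;
            }
            catch (FormatException)
            {
                return true;
            }
        }
    }

    // The fix: keep the format string fixed and pass the text as an argument.
    public static string WriteLiterally(string text)
    {
        using (var sw = new StringWriter())
        {
            sw.Write("{0}{1}", text, "\n");
            return sw.ToString();
        }
    }
}
```

`ThrowsWhenUsedAsFormat("function f(){return 1;}")` returns true, while the same call with "plain text" returns false, which matches the behaviour described in the question.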
In response to the comments from DK, I tested to what extent string concatenation is slower. I made this setup with three options:
concatenating the string
calling sw.Write twice
calling sw.WriteLine
On my machine, the second option is the fastest. At 10000000 iterations the three options take 3517, 2420 and 3385 milliseconds respectively, so the second takes about 22% less time than the average of the three.
It should only be significant if this is code that is called many times in your program.
using System;
using System.IO;
using System.Text;
class Program
{
static void Main(string[] args)
{
const string myString = "kdhlkhldhcøehdhkjehdkhekdhk";
int iterations=getIntFromParams(args, 0, 10);
int method = getIntFromParams(args, 1, 0);
var fileName=Path.GetTempFileName();
using (StreamWriter sw = new StreamWriter(fileName, false, Encoding.Default))
{
switch (method)
{
case 0:
Console.WriteLine("Starting method with concatenation. Iterations: " + iterations);
var start0 = DateTimeOffset.Now;
for (int i = 0; i < iterations; i++)
{
sw.Write(myString + Environment.NewLine);
}
var time0 = DateTimeOffset.Now - start0;
Console.WriteLine("End at " + time0.TotalMilliseconds + " ms.");
break;
case 1:
Console.WriteLine("Starting method without concatenation. Iterations: " + iterations);
var start1 = DateTimeOffset.Now;
for (int i = 0; i < iterations; i++)
{
sw.Write(myString);
sw.Write(Environment.NewLine);
}
var time1 = DateTimeOffset.Now - start1;
Console.WriteLine("End at " + time1.TotalMilliseconds + " ms.");
break;
case 2:
Console.WriteLine("Starting method without concatenation, using WriteLine. Iterations: " + iterations);
var start2 = DateTimeOffset.Now;
for (int i = 0; i < iterations; i++)
{
sw.WriteLine(myString);
}
var time2 = DateTimeOffset.Now - start2;
Console.WriteLine("End at " + time2.TotalMilliseconds + " ms.");
break;
}
}
}
private static int getIntFromParams(string[] args, int index, int @default)
{
int value;
try
{
if (!int.TryParse(args[index], out value))
{
value = @default;
}
}
catch(IndexOutOfRangeException)
{
value = @default;
}
}
return value;
}
}