Efficient way to remove ALL whitespace from String? - c#

I'm calling a REST API and am receiving an XML response back. It returns a list of a workspace names, and I'm writing a quick IsExistingWorkspace() method. Since all workspaces consist of contiguous characters with no whitespace, I'm assuming the easiest way to find out if a particular workspace is in the list is to remove all whitespace (including newlines) and doing this (XML is the string received from the web request):
XML.Contains("<name>" + workspaceName + "</name>");
I know it's case-sensitive, and I'm relying on that. I just need a way to remove all whitespace in a string efficiently. I know RegEx and LINQ can do it, but I'm open to other ideas. I am mostly just concerned about speed.

This is fastest way I know of, even though you said you didn't want to use regular expressions:
Regex.Replace(XML, #"\s+", "");
Crediting #hypehuman in the comments, if you plan to do this more than once, create and store a Regex instance. This will save the overhead of constructing it every time, which is more expensive than you might think.
private static readonly Regex sWhitespace = new Regex(#"\s+");
public static string ReplaceWhitespace(string input, string replacement)
{
return sWhitespace.Replace(input, replacement);
}

I have an alternative way without regexp, and it seems to perform pretty good. It is a continuation on Brandon Moretz answer:
public static string RemoveWhitespace(this string input)
{
return new string(input.ToCharArray()
.Where(c => !Char.IsWhiteSpace(c))
.ToArray());
}
I tested it in a simple unit test:
[Test]
[TestCase("123 123 1adc \n 222", "1231231adc222")]
public void RemoveWhiteSpace1(string input, string expected)
{
string s = null;
for (int i = 0; i < 1000000; i++)
{
s = input.RemoveWhitespace();
}
Assert.AreEqual(expected, s);
}
[Test]
[TestCase("123 123 1adc \n 222", "1231231adc222")]
public void RemoveWhiteSpace2(string input, string expected)
{
string s = null;
for (int i = 0; i < 1000000; i++)
{
s = Regex.Replace(input, #"\s+", "");
}
Assert.AreEqual(expected, s);
}
For 1,000,000 attempts the first option (without regexp) runs in less than a second (700 ms on my machine), and the second takes 3.5 seconds.

Try the replace method of the string in C#.
XML.Replace(" ", string.Empty);

My solution is to use Split and Join and it is surprisingly fast, in fact the fastest of the top answers here.
str = string.Join("", str.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));
Timings for 10,000 loop on simple string with whitespace inc new lines and tabs
split/join = 60 milliseconds
linq chararray = 94 milliseconds
regex = 437 milliseconds
Improve this by wrapping it up in method to give it meaning, and also make it an extension method while we are at it ...
public static string RemoveWhitespace(this string str) {
return string.Join("", str.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));
}

Building on Henks answer I have created some test methods with his answer and some added, more optimized, methods. I found the results differ based on the size of the input string. Therefore, I have tested with two result sets. In the fastest method, the linked source has a even faster way. But, since it is characterized as unsafe I have left this out.
Long input string results:
InPlaceCharArray: 2021 ms (Sunsetquest's answer) - (Original source)
String split then join: 4277ms (Kernowcode's answer)
String reader: 6082 ms
LINQ using native char.IsWhitespace: 7357 ms
LINQ: 7746 ms (Henk's answer)
ForLoop: 32320 ms
RegexCompiled: 37157 ms
Regex: 42940 ms
Short input string results:
InPlaceCharArray: 108 ms (Sunsetquest's answer) - (Original source)
String split then join: 294 ms (Kernowcode's answer)
String reader: 327 ms
ForLoop: 343 ms
LINQ using native char.IsWhitespace: 624 ms
LINQ: 645ms (Henk's answer)
RegexCompiled: 1671 ms
Regex: 2599 ms
Code:
public class RemoveWhitespace
{
public static string RemoveStringReader(string input)
{
var s = new StringBuilder(input.Length); // (input.Length);
using (var reader = new StringReader(input))
{
int i = 0;
char c;
for (; i < input.Length; i++)
{
c = (char)reader.Read();
if (!char.IsWhiteSpace(c))
{
s.Append(c);
}
}
}
return s.ToString();
}
public static string RemoveLinqNativeCharIsWhitespace(string input)
{
return new string(input.ToCharArray()
.Where(c => !char.IsWhiteSpace(c))
.ToArray());
}
public static string RemoveLinq(string input)
{
return new string(input.ToCharArray()
.Where(c => !Char.IsWhiteSpace(c))
.ToArray());
}
public static string RemoveRegex(string input)
{
return Regex.Replace(input, #"\s+", "");
}
private static Regex compiled = new Regex(#"\s+", RegexOptions.Compiled);
public static string RemoveRegexCompiled(string input)
{
return compiled.Replace(input, "");
}
public static string RemoveForLoop(string input)
{
for (int i = input.Length - 1; i >= 0; i--)
{
if (char.IsWhiteSpace(input[i]))
{
input = input.Remove(i, 1);
}
}
return input;
}
public static string StringSplitThenJoin(this string str)
{
return string.Join("", str.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));
}
public static string RemoveInPlaceCharArray(string input)
{
var len = input.Length;
var src = input.ToCharArray();
int dstIdx = 0;
for (int i = 0; i < len; i++)
{
var ch = src[i];
switch (ch)
{
case '\u0020':
case '\u00A0':
case '\u1680':
case '\u2000':
case '\u2001':
case '\u2002':
case '\u2003':
case '\u2004':
case '\u2005':
case '\u2006':
case '\u2007':
case '\u2008':
case '\u2009':
case '\u200A':
case '\u202F':
case '\u205F':
case '\u3000':
case '\u2028':
case '\u2029':
case '\u0009':
case '\u000A':
case '\u000B':
case '\u000C':
case '\u000D':
case '\u0085':
continue;
default:
src[dstIdx++] = ch;
break;
}
}
return new string(src, 0, dstIdx);
}
}
Tests:
[TestFixture]
public class Test
{
// Short input
//private const string input = "123 123 \t 1adc \n 222";
//private const string expected = "1231231adc222";
// Long input
private const string input = "123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222";
private const string expected = "1231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc222";
private const int iterations = 1000000;
[Test]
public void RemoveInPlaceCharArray()
{
string s = null;
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
s = RemoveWhitespace.RemoveInPlaceCharArray(input);
}
stopwatch.Stop();
Console.WriteLine("InPlaceCharArray: " + stopwatch.ElapsedMilliseconds + " ms");
Assert.AreEqual(expected, s);
}
[Test]
public void RemoveStringReader()
{
string s = null;
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
s = RemoveWhitespace.RemoveStringReader(input);
}
stopwatch.Stop();
Console.WriteLine("String reader: " + stopwatch.ElapsedMilliseconds + " ms");
Assert.AreEqual(expected, s);
}
[Test]
public void RemoveLinqNativeCharIsWhitespace()
{
string s = null;
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
s = RemoveWhitespace.RemoveLinqNativeCharIsWhitespace(input);
}
stopwatch.Stop();
Console.WriteLine("LINQ using native char.IsWhitespace: " + stopwatch.ElapsedMilliseconds + " ms");
Assert.AreEqual(expected, s);
}
[Test]
public void RemoveLinq()
{
string s = null;
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
s = RemoveWhitespace.RemoveLinq(input);
}
stopwatch.Stop();
Console.WriteLine("LINQ: " + stopwatch.ElapsedMilliseconds + " ms");
Assert.AreEqual(expected, s);
}
[Test]
public void RemoveRegex()
{
string s = null;
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
s = RemoveWhitespace.RemoveRegex(input);
}
stopwatch.Stop();
Console.WriteLine("Regex: " + stopwatch.ElapsedMilliseconds + " ms");
Assert.AreEqual(expected, s);
}
[Test]
public void RemoveRegexCompiled()
{
string s = null;
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
s = RemoveWhitespace.RemoveRegexCompiled(input);
}
stopwatch.Stop();
Console.WriteLine("RegexCompiled: " + stopwatch.ElapsedMilliseconds + " ms");
Assert.AreEqual(expected, s);
}
[Test]
public void RemoveForLoop()
{
string s = null;
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
s = RemoveWhitespace.RemoveForLoop(input);
}
stopwatch.Stop();
Console.WriteLine("ForLoop: " + stopwatch.ElapsedMilliseconds + " ms");
Assert.AreEqual(expected, s);
}
[TestMethod]
public void StringSplitThenJoin()
{
string s = null;
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
s = RemoveWhitespace.StringSplitThenJoin(input);
}
stopwatch.Stop();
Console.WriteLine("StringSplitThenJoin: " + stopwatch.ElapsedMilliseconds + " ms");
Assert.AreEqual(expected, s);
}
}
Edit: Tested a nice one liner from Kernowcode.

Just an alternative because it looks quite nice :) - NOTE: Henks answer is the quickest of these.
input.ToCharArray()
.Where(c => !Char.IsWhiteSpace(c))
.Select(c => c.ToString())
.Aggregate((a, b) => a + b);
Testing 1,000,000 loops on "This is a simple Test"
This method = 1.74 seconds
Regex = 2.58 seconds
new String (Henks) = 0.82 seconds

I found a nice write-up on this on CodeProject by Felipe Machado (with help by Richard Robertson)
He tested ten different methods. This one is the fastest safe version...
public static string TrimAllWithInplaceCharArray(string str) {
var len = str.Length;
var src = str.ToCharArray();
int dstIdx = 0;
for (int i = 0; i < len; i++) {
var ch = src[i];
switch (ch) {
case '\u0020': case '\u00A0': case '\u1680': case '\u2000': case '\u2001':
case '\u2002': case '\u2003': case '\u2004': case '\u2005': case '\u2006':
case '\u2007': case '\u2008': case '\u2009': case '\u200A': case '\u202F':
case '\u205F': case '\u3000': case '\u2028': case '\u2029': case '\u0009':
case '\u000A': case '\u000B': case '\u000C': case '\u000D': case '\u0085':
continue;
default:
src[dstIdx++] = ch;
break;
}
}
return new string(src, 0, dstIdx);
}
And the fastest unsafe version... (some inprovements by Sunsetquest 5/26/2021 )
public static unsafe void RemoveAllWhitespace(ref string str)
{
fixed (char* pfixed = str)
{
char* dst = pfixed;
for (char* p = pfixed; *p != 0; p++)
{
switch (*p)
{
case '\u0020': case '\u00A0': case '\u1680': case '\u2000': case '\u2001':
case '\u2002': case '\u2003': case '\u2004': case '\u2005': case '\u2006':
case '\u2007': case '\u2008': case '\u2009': case '\u200A': case '\u202F':
case '\u205F': case '\u3000': case '\u2028': case '\u2029': case '\u0009':
case '\u000A': case '\u000B': case '\u000C': case '\u000D': case '\u0085':
continue;
default:
*dst++ = *p;
break;
}
}
uint* pi = (uint*)pfixed;
ulong len = ((ulong)dst - (ulong)pfixed) >> 1;
pi[-1] = (uint)len;
pfixed[len] = '\0';
}
}
There are also some nice independent benchmarks on Stack Overflow by Stian Standahl that also show how Felipe's function is about 300% faster than the next fastest function. Also, for the one I modified, I used this trick.

If you need superb performance, you should avoid LINQ and regular expressions in this case. I did some performance benchmarking, and it seems that if you want to strip white space from beginning and end of the string, string.Trim() is your ultimate function.
If you need to strip all white spaces from a string, the following method works fastest of all that has been posted here:
public static string RemoveWhitespace(this string input)
{
int j = 0, inputlen = input.Length;
char[] newarr = new char[inputlen];
for (int i = 0; i < inputlen; ++i)
{
char tmp = input[i];
if (!char.IsWhiteSpace(tmp))
{
newarr[j] = tmp;
++j;
}
}
return new String(newarr, 0, j);
}

Regex is overkill; just use extension on string (thanks Henk). This is trivial and should have been part of the framework. Anyhow, here's my implementation:
public static partial class Extension
{
public static string RemoveWhiteSpace(this string self)
{
return new string(self.Where(c => !Char.IsWhiteSpace(c)).ToArray());
}
}

I think alot of persons come here for removing spaces. :
string s = "my string is nice";
s = s.replace(" ", "");

Here is a simple linear alternative to the RegEx solution. I am not sure which is faster; you'd have to benchmark it.
static string RemoveWhitespace(string input)
{
StringBuilder output = new StringBuilder(input.Length);
for (int index = 0; index < input.Length; index++)
{
if (!Char.IsWhiteSpace(input, index))
{
output.Append(input[index]);
}
}
return output.ToString();
}

I needed to replace white space in a string with spaces, but not duplicate spaces. e.g., I needed to convert something like the following:
"a b c\r\n d\t\t\t e"
to
"a b c d e"
I used the following method
private static string RemoveWhiteSpace(string value)
{
if (value == null) { return null; }
var sb = new StringBuilder();
var lastCharWs = false;
foreach (var c in value)
{
if (char.IsWhiteSpace(c))
{
if (lastCharWs) { continue; }
sb.Append(' ');
lastCharWs = true;
}
else
{
sb.Append(c);
lastCharWs = false;
}
}
return sb.ToString();
}

I assume your XML response looks like this:
var xml = #"<names>
<name>
foo
</name>
<name>
bar
</name>
</names>";
The best way to process XML is to use an XML parser, such as LINQ to XML:
var doc = XDocument.Parse(xml);
var containsFoo = doc.Root
.Elements("name")
.Any(e => ((string)e).Trim() == "foo");

We can use:
public static string RemoveWhitespace(this string input)
{
if (input == null)
return null;
return new string(input.ToCharArray()
.Where(c => !Char.IsWhiteSpace(c))
.ToArray());
}

Using Linq, you can write a readable method this way :
public static string RemoveAllWhitespaces(this string source)
{
return string.IsNullOrEmpty(source) ? source : new string(source.Where(x => !char.IsWhiteSpace(x)).ToArray());
}

Here is yet another variant:
public static string RemoveAllWhitespace(string aString)
{
return String.Join(String.Empty, aString.Where(aChar => aChar !Char.IsWhiteSpace(aChar)));
}
As with most of the other solutions, I haven't performed exhaustive benchmark tests, but this works well enough for my purposes.

I have found different results to be true. I am trying to replace all whitespace with a single space and the regex was extremely slow.
return( Regex::Replace( text, L"\s+", L" " ) );
What worked the most optimally for me (in C++ cli) was:
String^ ReduceWhitespace( String^ text )
{
String^ newText;
bool inWhitespace = false;
Int32 posStart = 0;
Int32 pos = 0;
for( pos = 0; pos < text->Length; ++pos )
{
wchar_t cc = text[pos];
if( Char::IsWhiteSpace( cc ) )
{
if( !inWhitespace )
{
if( pos > posStart ) newText += text->Substring( posStart, pos - posStart );
inWhitespace = true;
newText += L' ';
}
posStart = pos + 1;
}
else
{
if( inWhitespace )
{
inWhitespace = false;
posStart = pos;
}
}
}
if( pos > posStart ) newText += text->Substring( posStart, pos - posStart );
return( newText );
}
I tried the above routine first by replacing each character separately, but had to switch to doing substrings for the non-space sections. When applying to a 1,200,000 character string:
the above routine gets it done in 25 seconds
the above routine + separate character replacement in 95 seconds
the regex aborted after 15 minutes.

The straightforward way to remove all whitespaces from a string, "example" is your initial string.
String.Concat(example.Where(c => !Char.IsWhiteSpace(c))

Related

Thousands separator after the decimal point [duplicate]

I wonder what would be the best way to format numbers so that the NumberGroupSeparator would work not only on the integer part to the left of the comma, but also on the fractional part, on the right of the comma.
Math.PI.ToString("###,###,##0.0##,###,###,###") // As documented ..
// ..this doesn't work
3.14159265358979 // result
3.141,592,653,589,79 // desired result
As documented on MSDN the NumberGroupSeparator works only to the left of the comma. I wonder why??
A little clunky, and it won't work for scientific numbers but here is a try:
class Program
{
static void Main(string[] args)
{
var π=Math.PI*10000;
Debug.WriteLine(Display(π));
// 31,415.926,535,897,931,899
}
static string Display(double x)
{
int s=Math.Sign(x);
x=Math.Abs(x);
StringBuilder text=new StringBuilder();
var y=Math.Truncate(x);
text.Append((s*y).ToString("#,#"));
x-=y;
if (x>0)
{
// 15 decimal places is max reasonable precision
y=Math.Truncate(x*Math.Pow(10, 15));
text.Append(".");
text.Append(y.ToString("#,#").TrimEnd('0'));
}
return text.ToString();
}
}
It might be best to work with the string generated by your .ToString():
class Program
{
static string InsertSeparators(string s)
{
string decSeparator = System.Threading.Thread.CurrentThread.CurrentCulture.NumberFormat.NumberDecimalSeparator;
int separatorPos = s.IndexOf(decSeparator);
if (separatorPos >= 0)
{
string decPart = s.Substring(separatorPos + decSeparator.Length);
// split the string into parts of 3 or less characters
List<String> parts = new List<String>();
for (int i = 0; i < decPart.Length; i += 3)
{
string part = "";
for (int j = 0; (j < 3) && (i + j < decPart.Length); j++)
{
part += decPart[i + j];
}
parts.Add(part);
}
string groupSeparator = System.Threading.Thread.CurrentThread.CurrentCulture.NumberFormat.NumberGroupSeparator;
s = s.Substring(0, separatorPos) + decSeparator + String.Join(groupSeparator, parts);
}
return s;
}
static void Main(string[] args)
{
for (int n = 0; n < 15; n++)
{
string s = Math.PI.ToString("0." + new string('#', n));
Console.WriteLine(InsertSeparators(s));
}
Console.ReadLine();
}
}
Outputs:
3
3.1
3.14
3.142
3.141,6
3.141,59
3.141,593
3.141,592,7
3.141,592,65
3.141,592,654
3.141,592,653,6
3.141,592,653,59
3.141,592,653,59
3.141,592,653,589,8
3.141,592,653,589,79
OK, not my strong side, but I guess this may be my best bet:
string input = Math.PI.ToString();
string decSeparator = System.Threading.Thread.CurrentThread
.CurrentCulture.NumberFormat.NumberGroupSeparator;
Regex RX = new Regex(#"([0-9]{3})");
string result = RX.Replace(input , #"$1" + decSeparator);
Thanks for listening..

Using indexOf (C#) to remove a set of characters from a string

public static string shita1(string st1)
{
string st2 = "", stemp = st1;
int i;
for(i=0; i<stemp.Length; i++)
{
if (stemp.IndexOf("cbc") == i)
{
i += 2 ;
stemp = "";
stemp = st1.Substring(i);
i = 0;
}
else
st2 = st2 + stemp[i];
}
return st2;
}
static void Main(string[] args)
{
string st1;
Console.WriteLine("enter one string:");
st1 = Console.ReadLine();
Console.WriteLine(shita1(st1));
}
}
i got a challange from my college, the challange is to move any "cbc" characters from a string...
this is my code... it works when i use only one "cbc" but when i use 2 of them it stucks... help please :)
The IndexOf Method gives you everything you need to know.
Per the documentation.
Reports the zero-based index of the first occurrence of a specified
Unicode character or string within this instance. The method returns
-1 if the character or string is not found in this instance.
This means you can create a loop that repeats as long as the returned index is not -1 and you don't have to loop through the string testing letter by letter.
I think this should work just tested it on some examples. Doesn't use string.Replace or IndexOf
static void Main(string[] args)
{
Console.WriteLine("enter one string:");
var input = Console.ReadLine();
Console.WriteLine(RemoveCBC(input));
}
static string RemoveCBC(string source)
{
var result = new StringBuilder();
for (int i = 0; i < source.Length; i++)
{
if (i + 2 == source.Length)
break;
var c = source[i];
var b = source[i + 1];
var c2 = source[i + 2];
if (c == 'c' && c2 == 'c' && b == 'b')
i = i + 2;
else
result.Append(source[i]);
}
return result.ToString();
}
You can use Replace to remove/replace all occurances of a string inside another string:
string original = "cbc_STRING_cbc";
original = original.Replace("cbc", String.Empty);
If you want remove characters from string using only IndexOf method you can use this code.
public static string shita1(string st1)
{
int index = -1;
string yourMatchingString = "cbc";
while ((index = st1.IndexOf(yourMatchingString)) != -1)
st1 = st1.Remove(index, yourMatchingString.Length);
return st1;
}
This code remove all inputs of you string.
But you can do it just in one line:
st1 = st1.Replace("cbc", string.Empty);
Hope this help.

Decimal group seperator for the fractional part

I wonder what would be the best way to format numbers so that the NumberGroupSeparator would work not only on the integer part to the left of the comma, but also on the fractional part, on the right of the comma.
Math.PI.ToString("###,###,##0.0##,###,###,###") // As documented ..
// ..this doesn't work
3.14159265358979 // result
3.141,592,653,589,79 // desired result
As documented on MSDN the NumberGroupSeparator works only to the left of the comma. I wonder why??
A little clunky, and it won't work for scientific numbers but here is a try:
class Program
{
static void Main(string[] args)
{
var π=Math.PI*10000;
Debug.WriteLine(Display(π));
// 31,415.926,535,897,931,899
}
static string Display(double x)
{
int s=Math.Sign(x);
x=Math.Abs(x);
StringBuilder text=new StringBuilder();
var y=Math.Truncate(x);
text.Append((s*y).ToString("#,#"));
x-=y;
if (x>0)
{
// 15 decimal places is max reasonable precision
y=Math.Truncate(x*Math.Pow(10, 15));
text.Append(".");
text.Append(y.ToString("#,#").TrimEnd('0'));
}
return text.ToString();
}
}
It might be best to work with the string generated by your .ToString():
class Program
{
static string InsertSeparators(string s)
{
string decSeparator = System.Threading.Thread.CurrentThread.CurrentCulture.NumberFormat.NumberDecimalSeparator;
int separatorPos = s.IndexOf(decSeparator);
if (separatorPos >= 0)
{
string decPart = s.Substring(separatorPos + decSeparator.Length);
// split the string into parts of 3 or less characters
List<String> parts = new List<String>();
for (int i = 0; i < decPart.Length; i += 3)
{
string part = "";
for (int j = 0; (j < 3) && (i + j < decPart.Length); j++)
{
part += decPart[i + j];
}
parts.Add(part);
}
string groupSeparator = System.Threading.Thread.CurrentThread.CurrentCulture.NumberFormat.NumberGroupSeparator;
s = s.Substring(0, separatorPos) + decSeparator + String.Join(groupSeparator, parts);
}
return s;
}
static void Main(string[] args)
{
for (int n = 0; n < 15; n++)
{
string s = Math.PI.ToString("0." + new string('#', n));
Console.WriteLine(InsertSeparators(s));
}
Console.ReadLine();
}
}
Outputs:
3
3.1
3.14
3.142
3.141,6
3.141,59
3.141,593
3.141,592,7
3.141,592,65
3.141,592,654
3.141,592,653,6
3.141,592,653,59
3.141,592,653,59
3.141,592,653,589,8
3.141,592,653,589,79
OK, not my strong side, but I guess this may be my best bet:
string input = Math.PI.ToString();
string decSeparator = System.Threading.Thread.CurrentThread
.CurrentCulture.NumberFormat.NumberGroupSeparator;
Regex RX = new Regex(#"([0-9]{3})");
string result = RX.Replace(input , #"$1" + decSeparator);
Thanks for listening..

Best way to convert Pascal Case to a sentence

What is the best way to convert from Pascal Case (upper Camel Case) to a sentence.
For example starting with
"AwaitingFeedback"
and converting that to
"Awaiting feedback"
C# preferable but I could convert it from Java or similar.
public static string ToSentenceCase(this string str)
{
return Regex.Replace(str, "[a-z][A-Z]", m => m.Value[0] + " " + char.ToLower(m.Value[1]));
}
In versions of visual studio after 2015, you can do
public static string ToSentenceCase(this string str)
{
return Regex.Replace(str, "[a-z][A-Z]", m => $"{m.Value[0]} {char.ToLower(m.Value[1])}");
}
Based on: Converting Pascal case to sentences using regular expression
I will prefer to use Humanizer for this. Humanizer is a Portable Class Library that meets all your .NET needs for manipulating and displaying strings, enums, dates, times, timespans, numbers and quantities.
Short Answer
"AwaitingFeedback".Humanize() => Awaiting feedback
Long and Descriptive Answer
Humanizer can do a lot more work other examples are:
"PascalCaseInputStringIsTurnedIntoSentence".Humanize() => "Pascal case input string is turned into sentence"
"Underscored_input_string_is_turned_into_sentence".Humanize() => "Underscored input string is turned into sentence"
"Can_return_title_Case".Humanize(LetterCasing.Title) => "Can Return Title Case"
"CanReturnLowerCase".Humanize(LetterCasing.LowerCase) => "can return lower case"
Complete code is :
using Humanizer;
using static System.Console;
namespace HumanizerConsoleApp
{
class Program
{
static void Main(string[] args)
{
WriteLine("AwaitingFeedback".Humanize());
WriteLine("PascalCaseInputStringIsTurnedIntoSentence".Humanize());
WriteLine("Underscored_input_string_is_turned_into_sentence".Humanize());
WriteLine("Can_return_title_Case".Humanize(LetterCasing.Title));
WriteLine("CanReturnLowerCase".Humanize(LetterCasing.LowerCase));
}
}
}
Output
Awaiting feedback
Pascal case input string is turned into sentence
Underscored input string is turned into sentence Can Return Title Case
can return lower case
If you prefer to write your own C# code you can achieve this by writing some C# code stuff as answered by others already.
Here you go...
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace CamelCaseToString
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine(CamelCaseToString("ThisIsYourMasterCallingYou"));
}
private static string CamelCaseToString(string str)
{
if (str == null || str.Length == 0)
return null;
StringBuilder retVal = new StringBuilder(32);
retVal.Append(char.ToUpper(str[0]));
for (int i = 1; i < str.Length; i++ )
{
if (char.IsLower(str[i]))
{
retVal.Append(str[i]);
}
else
{
retVal.Append(" ");
retVal.Append(char.ToLower(str[i]));
}
}
return retVal.ToString();
}
}
}
This works for me:
Regex.Replace(strIn, "([A-Z]{1,2}|[0-9]+)", " $1").TrimStart()
This is just like #SSTA, but is more efficient than calling TrimStart.
Regex.Replace("ThisIsMyCapsDelimitedString", "(\\B[A-Z])", " $1")
Found this in the MvcContrib source, doesn't seem to be mentioned here yet.
return Regex.Replace(input, "([A-Z])", " $1", RegexOptions.Compiled).Trim();
Just because everyone has been using Regex (except this guy), here's an implementation with StringBuilder that was about 5x faster in my tests. Includes checking for numbers too.
"SomeBunchOfCamelCase2".FromCamelCaseToSentence == "Some Bunch Of Camel Case 2"
public static string FromCamelCaseToSentence(this string input) {
if(string.IsNullOrEmpty(input)) return input;
var sb = new StringBuilder();
// start with the first character -- consistent camelcase and pascal case
sb.Append(char.ToUpper(input[0]));
// march through the rest of it
for(var i = 1; i < input.Length; i++) {
// any time we hit an uppercase OR number, it's a new word
if(char.IsUpper(input[i]) || char.IsDigit(input[i])) sb.Append(' ');
// add regularly
sb.Append(input[i]);
}
return sb.ToString();
}
Here's a basic way of doing it that I came up with using Regex
public static string CamelCaseToSentence(this string value)
{
var sb = new StringBuilder();
var firstWord = true;
foreach (var match in Regex.Matches(value, "([A-Z][a-z]+)|[0-9]+"))
{
if (firstWord)
{
sb.Append(match.ToString());
firstWord = false;
}
else
{
sb.Append(" ");
sb.Append(match.ToString().ToLower());
}
}
return sb.ToString();
}
It will also split off numbers which I didn't specify but would be useful.
string camel = "MyCamelCaseString";
string s = Regex.Replace(camel, "([A-Z])", " $1").ToLower().Trim();
Console.WriteLine(s.Substring(0,1).ToUpper() + s.Substring(1));
Edit: didn't notice your casing requirements, modifed accordingly. You could use a matchevaluator to do the casing, but I think a substring is easier. You could also wrap it in a 2nd regex replace where you change the first character
"^\w"
to upper
\U (i think)
I'd use a regex, inserting a space before each upper case character, then lowering all the string.
string spacedString = System.Text.RegularExpressions.Regex.Replace(yourString, "\B([A-Z])", " \k");
spacedString = spacedString.ToLower();
It is easy to do in JavaScript (or PHP, etc.) where you can define a function in the replace call:
var camel = "AwaitingFeedbackDearMaster";
var sentence = camel.replace(/([A-Z].)/g, function (c) { return ' ' + c.toLowerCase(); });
alert(sentence);
Although I haven't solved the initial cap problem... :-)
Now, for the Java solution:
String ToSentence(String camel)
{
if (camel == null) return ""; // Or null...
String[] words = camel.split("(?=[A-Z])");
if (words == null) return "";
if (words.length == 1) return words[0];
StringBuilder sentence = new StringBuilder(camel.length());
if (words[0].length() > 0) // Just in case of camelCase instead of CamelCase
{
sentence.append(words[0] + " " + words[1].toLowerCase());
}
else
{
sentence.append(words[1]);
}
for (int i = 2; i < words.length; i++)
{
sentence.append(" " + words[i].toLowerCase());
}
return sentence.toString();
}
System.out.println(ToSentence("AwaitingAFeedbackDearMaster"));
System.out.println(ToSentence(null));
System.out.println(ToSentence(""));
System.out.println(ToSentence("A"));
System.out.println(ToSentence("Aaagh!"));
System.out.println(ToSentence("stackoverflow"));
System.out.println(ToSentence("disableGPS"));
System.out.println(ToSentence("Ahh89Boo"));
System.out.println(ToSentence("ABC"));
Note the trick to split the sentence without loosing any character...
Pseudo-code:
NewString = "";
Loop through every char of the string (skip the first one)
If char is upper-case ('A'-'Z')
NewString = NewString + ' ' + lowercase(char)
Else
NewString = NewString + char
Better ways can perhaps be done by using regex or by string replacement routines (replace 'X' with ' x')
An xquery solution that works for both UpperCamel and lowerCamel case:
To output sentence case (only the first character of the first word is capitalized):
declare function content:sentenceCase($string)
{
let $firstCharacter := substring($string, 1, 1)
let $remainingCharacters := substring-after($string, $firstCharacter)
return
concat(upper-case($firstCharacter),lower-case(replace($remainingCharacters, '([A-Z])', ' $1')))
};
To output title case (first character of each word capitalized):
declare function content:titleCase($string)
{
let $firstCharacter := substring($string, 1, 1)
let $remainingCharacters := substring-after($string, $firstCharacter)
return
concat(upper-case($firstCharacter),replace($remainingCharacters, '([A-Z])', ' $1'))
};
Found myself doing something similar, and I appreciate having a point-of-departure with this discussion. This is my solution, placed as an extension method to the string class in the context of a console application.
using System;
using System.Text;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string piratese = "avastTharMatey";
string ivyese = "CheerioPipPip";
Console.WriteLine("{0}\n{1}\n", piratese.CamelCaseToString(), ivyese.CamelCaseToString());
Console.WriteLine("For Pete\'s sake, man, hit ENTER!");
string strExit = Console.ReadLine();
}
}
public static class StringExtension
{
public static string CamelCaseToString(this string str)
{
StringBuilder retVal = new StringBuilder(32);
if (!string.IsNullOrEmpty(str))
{
string strTrimmed = str.Trim();
if (!string.IsNullOrEmpty(strTrimmed))
{
retVal.Append(char.ToUpper(strTrimmed[0]));
if (strTrimmed.Length > 1)
{
for (int i = 1; i < strTrimmed.Length; i++)
{
if (char.IsUpper(strTrimmed[i])) retVal.Append(" ");
retVal.Append(char.ToLower(strTrimmed[i]));
}
}
}
}
return retVal.ToString();
}
}
}
Most of the preceding answers split acronyms and numbers, adding a space in front of each character. I wanted acronyms and numbers to be kept together so I have a simple state machine that emits a space every time the input transitions from one state to the other.
/// <summary>
/// Add a space before any capitalized letter (but not for a run of capitals or numbers)
/// </summary>
internal static string FromCamelCaseToSentence(string input)
{
if (string.IsNullOrEmpty(input)) return String.Empty;
var sb = new StringBuilder();
bool upper = true;
for (var i = 0; i < input.Length; i++)
{
bool isUpperOrDigit = char.IsUpper(input[i]) || char.IsDigit(input[i]);
// any time we transition to upper or digits, it's a new word
if (!upper && isUpperOrDigit)
{
sb.Append(' ');
}
sb.Append(input[i]);
upper = isUpperOrDigit;
}
return sb.ToString();
}
And here's some tests:
[TestCase(null, ExpectedResult = "")]
[TestCase("", ExpectedResult = "")]
[TestCase("ABC", ExpectedResult = "ABC")]
[TestCase("abc", ExpectedResult = "abc")]
[TestCase("camelCase", ExpectedResult = "camel Case")]
[TestCase("PascalCase", ExpectedResult = "Pascal Case")]
[TestCase("Pascal123", ExpectedResult = "Pascal 123")]
[TestCase("CustomerID", ExpectedResult = "Customer ID")]
[TestCase("CustomABC123", ExpectedResult = "Custom ABC123")]
public string CanSplitCamelCase(string input)
{
return FromCamelCaseToSentence(input);
}
Mostly already answered here
Small chage to the accepted answer, to convert the second and subsequent Capitalised letters to lower case, so change
if (char.IsUpper(text[i]))
newText.Append(' ');
newText.Append(text[i]);
to
if (char.IsUpper(text[i]))
{
newText.Append(' ');
newText.Append(char.ToLower(text[i]));
}
else
newText.Append(text[i]);
Here is my implementation. This is the fastest that I got while avoiding creating spaces for abbreviations.
public static string PascalCaseToSentence(string input)
{
if (string.IsNullOrEmpty(input) || input.Length < 2)
return input;
var sb = new char[input.Length + ((input.Length + 1) / 2)];
var len = 0;
var lastIsLower = false;
for (int i = 0; i < input.Length; i++)
{
var current = input[i];
if (current < 97)
{
if (lastIsLower)
{
sb[len] = ' ';
len++;
}
lastIsLower = false;
}
else
{
lastIsLower = true;
}
sb[len] = current;
len++;
}
return new string(sb, 0, len);
}

Concatenating an array of strings to "string1, string2 or string3"

Consider the following code:
string[] s = new[] { "Rob", "Jane", "Freddy" };
string joined = string.Join(", ", s);
// joined equals "Rob, Jane, Freddy"
For UI reasons I might well want to display the string "Rob, Jane or Freddy".
Any suggestions about the most concise way to do this?
Edit
I am looking for something that is concise to type. Since I am only concatenating small numbers (<10) of strings I am not worried about run-time performance here.
Concise meaning to type? or to run? The fastest to run will be hand-cranked with StringBuilder. But to type, probably (edit handle 0/1 etc):
string joined;
switch (s.Length) {
case 0: joined = ""; break;
case 1: joined = s[0]; break;
default:
joined = string.Join(", ", s, 0, s.Length - 1)
+ " or " + s[s.Length - 1];
break;
}
The StringBuilder approach might look something like:
static string JoinOr(string[] values) {
switch (values.Length) {
case 0: return "";
case 1: return values[0];
}
StringBuilder sb = new StringBuilder();
for (int i = 0; i < values.Length - 2; i++) {
sb.Append(values[i]).Append(", ");
}
return sb.Append(values[values.Length-2]).Append(" or ")
.Append(values[values.Length-1]).ToString();
}
Concatenate all but the last one. Do the last one manually.
Create an extension method on string[] that implement the same logic as string.Join but the last item will be appended with "or".
string[] s = new[] { "Rob", "Jane", "Freddy" };
Console.WriteLine(s.BetterJoin(", ", " or "));
// ---8<----
namespace ExtensionMethods
{
public static class MyExtensions
{
public static string BetterJoin(this string[] items, string separator, string lastSeparator)
{
StringBuilder sb = new StringBuilder();
int length = items.Length - 2;
int i = 0;
while (i < length)
{
sb.AppendFormat("{0}{1}", items[i++], separator);
}
sb.AppendFormat("{0}{1}", items[i++], lastSeparator);
sb.AppendFormat("{0}", items[i]);
return sb.ToString();
}
}
}
What about:
if (s.Length > 1)
{
uiText = string.Format("{0} and {1}", string.Join(", ", s, 0, s.Length - 1), s[s.Length - 1]);
}
else
{
uiText = s.Length > 0 ? s[0] : "";
}
Generic solution for any type T.
static class IEnumerableExtensions
{
public static string Join<T>(this IEnumerable<T> items,
string seperator, string lastSeperator)
{
var sep = "";
return items.Aggregate("", (current, item) =>
{
var result = String.Concat(current,
// not first OR not last
current == "" || !items.Last().Equals(item) ? sep : lastSeperator,
item.ToString());
sep = seperator;
return result;
});
}
}
Usage:
var three = new string[] { "Rob", "Jane", "Freddy" };
var two = new string[] { "Rob", "Jane" };
var one = new string[] { "Rob" };
var threeResult = three.Join(", ", " or "); // = "Rob, Jane or Freddy"
var twoResult = two.Join(", ", " or "); // = "Rob or Jane"
var oneResult = one.Join(", ", " or "); // = "Rob"
The most memory efficient and scalable would be using a StringBuilder and precalculating the length of the final string to elliminate buffer reallocations. (This is similar to how the String.Concat method works.)
public static string Join(string[] items, string separator, string lastSeparator) {
int len = separator.Length * (items.Length - 2) + lastSeparator.Length;
foreach (string s in items) len += s.Length;
StringBuilder builder = new StringBuilder(len);
for (int i = 0; i < items.Length; i++) {
builder.Append(items[i]);
switch (items.Length - i) {
case 1: break;
case 2: builder.Append(lastSeparator); break;
default: builder.Append(separator); break;
}
}
return builder.ToString();
}
Usage:
string joined = Join(s, ", ", " or ");
An interresting solution would be using a recursive algorithm. It works well for a reasonably small number of strings, but it doesn't scale very well.
public static string Join(string[] items, int index , string separator, string lastSeparator) {
return items[index++] + (index == items.Length-1 ? lastSeparator + items[index] : separator + Join(items, index, separator, lastSeparator));
}
Usage:
string joined = Join(s, 0, ", ", " or ");
string[] name_storage = new[] { "emre" , "balc" };
name_storage[name_storage.Count() - 1] += "ı";
string name = name_storage[0];
string sur_name = name_storage[1];
divElement.InnerHtml += name + " - " + sur_name;
//result = emre - balcı

Categories