Better way to clean a string? - c#

I am using this method to clean a string:
public static string CleanString(string dirtyString)
{
string removeChars = " ?&^$##!()+-,:;<>’\'-_*";
string result = dirtyString;
foreach (char c in removeChars)
{
result = result.Replace(c.ToString(), string.Empty);
}
return result;
}
This method gives the correct result. However, there is a performance glitch in this method. Every time I pass the string, every character goes into the loop. If I have a large string then it will take too much time to return the object.
Is there a better way of doing the same thing? Maybe using LINQ or jQuery/JavaScript?
Any suggestions would be appreciated.

OK, consider the following test:
public class CleanString
{
//by MSDN http://msdn.microsoft.com/en-us/library/844skk0h(v=vs.71).aspx
public static string UseRegex(string strIn)
{
// Replace invalid characters with empty strings.
return Regex.Replace(strIn, #"[^\w\.#-]", "");
}
// by Paolo Tedesco
public static String UseStringBuilder(string strIn)
{
const string removeChars = " ?&^$##!()+-,:;<>’\'-_*";
// specify capacity of StringBuilder to avoid resizing
StringBuilder sb = new StringBuilder(strIn.Length);
foreach (char x in strIn.Where(c => !removeChars.Contains(c)))
{
sb.Append(x);
}
return sb.ToString();
}
// by Paolo Tedesco, but using a HashSet
public static String UseStringBuilderWithHashSet(string strIn)
{
var hashSet = new HashSet<char>(" ?&^$##!()+-,:;<>’\'-_*");
// specify capacity of StringBuilder to avoid resizing
StringBuilder sb = new StringBuilder(strIn.Length);
foreach (char x in strIn.Where(c => !hashSet.Contains(c)))
{
sb.Append(x);
}
return sb.ToString();
}
// by SteveDog
public static string UseStringBuilderWithHashSet2(string dirtyString)
{
HashSet<char> removeChars = new HashSet<char>(" ?&^$##!()+-,:;<>’\'-_*");
StringBuilder result = new StringBuilder(dirtyString.Length);
foreach (char c in dirtyString)
if (removeChars.Contains(c))
result.Append(c);
return result.ToString();
}
// original by patel.milanb
public static string UseReplace(string dirtyString)
{
string removeChars = " ?&^$##!()+-,:;<>’\'-_*";
string result = dirtyString;
foreach (char c in removeChars)
{
result = result.Replace(c.ToString(), string.Empty);
}
return result;
}
// by L.B
public static string UseWhere(string dirtyString)
{
return new String(dirtyString.Where(Char.IsLetterOrDigit).ToArray());
}
}
static class Program
{
/// <summary>
/// The main entry point for the application.
/// </summary>
[STAThread]
static void Main()
{
var dirtyString = "sdfdf.dsf8908()=(=(sadfJJLef#ssyd€sdöf////fj()=/§(§&/(\"&sdfdf.dsf8908()=(=(sadfJJLef#ssyd€sdöf////fj()=/§(§&/(\"&sdfdf.dsf8908()=(=(sadfJJLef#ssyd€sdöf";
var sw = new Stopwatch();
var iterations = 50000;
sw.Start();
for (var i = 0; i < iterations; i++)
CleanString.<SomeMethod>(dirtyString);
sw.Stop();
Debug.WriteLine("CleanString.<SomeMethod>: " + sw.ElapsedMilliseconds.ToString());
sw.Reset();
....
<repeat>
....
}
}
Output
CleanString.UseReplace: 791
CleanString.UseStringBuilder: 2805
CleanString.UseStringBuilderWithHashSet: 521
CleanString.UseStringBuilderWithHashSet2: 331
CleanString.UseRegex: 1700
CleanString.UseWhere: 233
Conclusion
It probably does not matter which method you use.
The difference in time between the fastest (UseWhere: 233ms) and the slowest (UseStringBuilder: 2805ms) method is 2572ms when called 50000 (!) times in a row. If you don't run the method that often, the difference does not really matter.
But if performance is critical, use the UseWhere method (written by L.B). Note, however, that its behavior is slightly different.

If it's purely speed and efficiency you are after, I would recommend doing something like this:
public static string CleanString(string dirtyString)
{
HashSet<char> removeChars = new HashSet<char>(" ?&^$##!()+-,:;<>’\'-_*");
StringBuilder result = new StringBuilder(dirtyString.Length);
foreach (char c in dirtyString)
if (!removeChars.Contains(c)) // prevent dirty chars
result.Append(c);
return result.ToString();
}
RegEx is certainly an elegant solution, but it adds extra overhead. By specifying the starting length of the string builder, it will only need to allocate the memory once (and a second time for the ToString at the end). This will cut down on memory usage and increase the speed, especially on longer strings.
However, as L.B. said, if you are using this to properly encode text that is bound for HTML output, you should be using HttpUtility.HtmlEncode instead of doing it yourself.

use regex [?&^$##!()+-,:;<>’\'-_*] for replacing with empty string

I don't know if, performance-wise, using a Regex or LINQ would be an improvement.
Something that could be useful, would be to create the new string with a StringBuilder instead of using string.Replace each time:
using System.Linq;
using System.Text;
static class Program {
static void Main(string[] args) {
const string removeChars = " ?&^$##!()+-,:;<>’\'-_*";
string result = "x&y(z)";
// specify capacity of StringBuilder to avoid resizing
StringBuilder sb = new StringBuilder(result.Length);
foreach (char x in result.Where(c => !removeChars.Contains(c))) {
sb.Append(x);
}
result = sb.ToString();
}
}

This one is even faster!
use:
string dirty=#"tfgtf$#$%gttg%$% 664%$";
string clean = dirty.Clean();
public static string Clean(this String name)
{
var namearray = new Char[name.Length];
var newIndex = 0;
for (var index = 0; index < namearray.Length; index++)
{
var letter = (Int32)name[index];
if (!((letter > 96 && letter < 123) || (letter > 64 && letter < 91) || (letter > 47 && letter < 58)))
continue;
namearray[newIndex] = (Char)letter;
++newIndex;
}
return new String(namearray).TrimEnd();
}

Give this a try: http://msdn.microsoft.com/en-us/library/xwewhkd1.aspx

Perhaps it helps to first explain the 'why' and then the 'what'. The reason you're getting slow performance is because c# copies-and-replaces the strings for each replacement. From my experience using Regex in .NET isn't always better - although in most scenario's (I think including this one) it'll probably work just fine.
If I really need performance I usually don't leave it up to luck and just tell the compiler exactly what I want: that is: create a string with the upper bound number of characters and copy all the chars in there that you need. It's also possible to replace the hashset with a switch / case or array in which case you might end up with a jump table or array lookup - which is even faster.
The 'pragmatic' best, but fast solution is:
char[] data = new char[dirtyString.Length];
int ptr = 0;
HashSet<char> hs = new HashSet<char>() { /* all your excluded chars go here */ };
foreach (char c in dirtyString)
if (!hs.Contains(c))
data[ptr++] = c;
return new string(data, 0, ptr);
BTW: this solution is incorrect when you want to process high surrogate Unicode characters - but can easily be adapted to include these characters.
-Stefan.

I use this in my current project and it works fine. It takes a sentence, it removes all the non alphanumerical characters, it then returns the sentence with all the words in the first letter upper case and everything else in lower case. Maybe I should call it SentenceNormalizer. Naming is hard :)
internal static string StringSanitizer(string whateverString)
{
whateverString = whateverString.Trim().ToLower();
Regex cleaner = new Regex("(?:[^a-zA-Z0-9 ])", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Compiled);
var listOfWords = (cleaner.Replace(whateverString, string.Empty).Split(' ', StringSplitOptions.RemoveEmptyEntries)).ToList();
string cleanString = string.Empty;
foreach (string word in listOfWords)
{
cleanString += $"{word.First().ToString().ToUpper() + word.Substring(1)} ";
}
return cleanString;
}

I am not able to spend time on acid testing this but this line did not actually clean slashes as desired.
HashSet<char> removeChars = new HashSet<char>(" ?&^$##!()+-,:;<>’\'-_*");
I had to add slashes individually and escape the backslash
HashSet<char> removeChars = new HashSet<char>(" ?&^$##!()+-,:;<>’'-_*");
removeChars.Add('/');
removeChars.Add('\\');

Related

Reverse a String without using Reverse. It works, but why?

Ok, so a friend of mine asked me to help him out with a string reverse method that can be reused without using String.Reverse (it's a homework assignment for him). Now, I did, below is the code. It works. Splendidly actually. Obviously by looking at it you can see the larger the string the longer the time it takes to work. However, my question is WHY does it work? Programming is a lot of trial and error, and I was more pseudocoding than actual coding and it worked lol.
Can someone explain to me how exactly reverse = ch + reverse; is working? I don't understand what is making it go into reverse :/
class Program
{
static void Reverse(string x)
{
string text = x;
string reverse = string.Empty;
foreach (char ch in text)
{
reverse = ch + reverse;
// this shows the building of the new string.
// Console.WriteLine(reverse);
}
Console.WriteLine(reverse);
}
static void Main(string[] args)
{
string comingin;
Console.WriteLine("Write something");
comingin = Console.ReadLine();
Reverse(comingin);
// pause
Console.ReadLine();
}
}
If the string passed through is "hello", the loop will be doing this:
reverse = 'h' + string.Empty
reverse = 'e' + 'h'
reverse = 'l' + 'eh'
until it's equal to
olleh
If your string is My String, then:
Pass 1, reverse = 'M'
Pass 2, reverse = 'yM'
Pass 3, reverse = ' yM'
You're taking each char and saying "that character and tack on what I had before after it".
I think your question has been answered. My reply goes beyond the immediate question and more to the spirit of the exercise. I remember having this task many decades ago in college, when memory and mainframe (yikes!) processing time was at a premium. Our task was to reverse an array or string, which is an array of characters, without creating a 2nd array or string. The spirit of the exercise was to teach one to be mindful of available resources.
In .NET, a string is an immutable object, so I must use a 2nd string. I wrote up 3 more examples to demonstrate different techniques that may be faster than your method, but which shouldn't be used to replace the built-in .NET Replace method. I'm partial to the last one.
// StringBuilder inserting at 0 index
public static string Reverse2(string inputString)
{
var result = new StringBuilder();
foreach (char ch in inputString)
{
result.Insert(0, ch);
}
return result.ToString();
}
// Process inputString backwards and append with StringBuilder
public static string Reverse3(string inputString)
{
var result = new StringBuilder();
for (int i = inputString.Length - 1; i >= 0; i--)
{
result.Append(inputString[i]);
}
return result.ToString();
}
// Convert string to array and swap pertinent items
public static string Reverse4(string inputString)
{
var chars = inputString.ToCharArray();
for (int i = 0; i < (chars.Length/2); i++)
{
var temp = chars[i];
chars[i] = chars[chars.Length - 1 - i];
chars[chars.Length - 1 - i] = temp;
}
return new string(chars);
}
Please imagine that you entrance string is "abc". After that you can see that letters are taken one by one and add to the start of the new string:
reverse = "", ch='a' ==> reverse (ch+reverse) = "a"
reverse= "a", ch='b' ==> reverse (ch+reverse) = b+a = "ba"
reverse= "ba", ch='c' ==> reverse (ch+reverse) = c+ba = "cba"
To test the suggestion by Romoku of using StringBuilder I have produced the following code.
public static void Reverse(string x)
{
string text = x;
string reverse = string.Empty;
foreach (char ch in text)
{
reverse = ch + reverse;
}
Console.WriteLine(reverse);
}
public static void ReverseFast(string x)
{
string text = x;
StringBuilder reverse = new StringBuilder();
for (int i = text.Length - 1; i >= 0; i--)
{
reverse.Append(text[i]);
}
Console.WriteLine(reverse);
}
public static void Main(string[] args)
{
int abcx = 100; // amount of abc's
string abc = "";
for (int i = 0; i < abcx; i++)
abc += "abcdefghijklmnopqrstuvwxyz";
var x = new System.Diagnostics.Stopwatch();
x.Start();
Reverse(abc);
x.Stop();
string ReverseMethod = "Reverse Method: " + x.ElapsedMilliseconds.ToString();
x.Restart();
ReverseFast(abc);
x.Stop();
Console.Clear();
Console.WriteLine("Method | Milliseconds");
Console.WriteLine(ReverseMethod);
Console.WriteLine("ReverseFast Method: " + x.ElapsedMilliseconds.ToString());
System.Console.Read();
}
On my computer these are the speeds I get per amount of alphabet(s).
100 ABC(s)
Reverse ~5-10ms
FastReverse ~5-15ms
1000 ABC(s)
Reverse ~120ms
FastReverse ~20ms
10000 ABC(s)
Reverse ~16,852ms!!!
FastReverse ~262ms
These time results will vary greatly depending on the computer but one thing is for certain if you are processing more than 100k characters you are insane for not using StringBuilder! On the other hand if you are processing less than 2000 characters the overhead from the StringBuilder definitely seems to catch up with its performance boost.

c# how to check whether a Long string contains any letters or not?

Okay so I have a long string of digits, but I need to make sure it hasn't got any digits, whats the easiest way to do this?
You could use char.IsDigit:
var containsOnlyDigits = "007".All(char.IsDigit); // true
Scan the string, test the char... s.All(c => Char.IsDigit(c))
Enumerable.All will exit as soon as it finds a non-digit characted. IsDigit is very fast at checking the char. Cost is O(N) (as good as it can get); for sure, it is bettet than trying to parse the string (which will fail if the string is really long) or use a regexpr...
If you try this solution and see it is too slow for you, you can always go back to good old loops to scan the string..
foreach (char c in s) {
if (!Char.IsDigit(c))
return false;
}
return true;
or even better:
for (int i = 0; i < s.Length; i++){
if (!Char.IsDigit(s[i]))
return false;
}
return true;
EDIT: Benchmarks, at last!
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;
namespace FindTest
{
class Program
{
const int Iterations = 1000;
static string TestData;
static Regex regex;
static bool ValidResult = false;
static void Test(Func<string, bool> function)
{
Console.Write("{0}... ", function.Method.Name);
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < Iterations; i++)
{
bool result = function(TestData);
if (result != ValidResult)
{
throw new Exception("Bad result: " + result);
}
}
sw.Stop();
Console.WriteLine(" {0}ms", sw.ElapsedMilliseconds);
GC.Collect();
}
static void InitializeTestDataEnd(int length)
{
TestData = new string(Enumerable.Repeat('1', length - 1).ToArray()) + "A";
}
static void InitializeTestDataStart(int length)
{
TestData = "A" + new string(Enumerable.Repeat('1', length - 1).ToArray());
}
static void InitializeTestDataMid(int length)
{
TestData = new string(Enumerable.Repeat('1', length / 2).ToArray()) + "A" + new string(Enumerable.Repeat('1', length / 2 - 1).ToArray());
}
static void InitializeTestDataPositive(int length)
{
TestData = new string(Enumerable.Repeat('1', length).ToArray());
}
static bool LinqScan(string s)
{
return s.All(Char.IsDigit);
}
static bool ForeachScan(string s)
{
foreach (char c in s)
{
if (!Char.IsDigit(c))
return false;
}
return true;
}
static bool ForScan(string s)
{
for (int i = 0; i < s.Length; i++)
{
if (!Char.IsDigit(s[i]))
return false;
}
return true;
}
static bool Regexp(string s)
{
// String contains numbers
return regex.IsMatch(s);
// String contains letters
//return Regex.IsMatch(s, "\\w", RegexOptions.Compiled);
}
static void Main(string[] args)
{
regex = new Regex(#"^\d+$", RegexOptions.Compiled);
Console.WriteLine("Positive (all digitis)");
InitializeTestDataPositive(100000);
ValidResult = true;
Test(LinqScan);
Test(ForeachScan);
Test(ForScan);
Test(Regexp);
Console.WriteLine("Negative (char at beginning)");
InitializeTestDataStart(100000);
ValidResult = false;
Test(LinqScan);
Test(ForeachScan);
Test(ForScan);
Test(Regexp);
Console.WriteLine("Negative (char at end)");
InitializeTestDataEnd(100000);
ValidResult = false;
Test(LinqScan);
Test(ForeachScan);
Test(ForScan);
Test(Regexp);
Console.WriteLine("Negative (char in middle)");
InitializeTestDataMid(100000);
ValidResult = false;
Test(LinqScan);
Test(ForeachScan);
Test(ForScan);
Test(Regexp);
Console.WriteLine("Done");
}
}
}
I tested positive, and three negatives, to 1) test which regex is the correct one, 2) look for confirmation of a suspect I had...
My opinion was that Regexp.IsMatch had to scan the string as well, and so it seems to be:
Times are consistent with scans, only 3x worse:
Positive (all digitis)
LinqScan... 952ms
ForeachScan... 1043ms
ForScan... 869ms
Regexp... 3074ms
Negative (char at beginning)
LinqScan... 0ms
ForeachScan... 0ms
ForScan... 0ms
Regexp... 0ms
Negative (char at end)
LinqScan... 921ms
ForeachScan... 958ms
ForScan... 867ms
Regexp... 3986ms
Negative (char in middle)
LinqScan... 455ms
ForeachScan... 476ms
ForScan... 430ms
Regexp... 1982ms
Credits: I borrowed the Test function from Jon Skeet
Conclusions: s.All(Char.IsDigit) is efficient, and really easy (which was the original question, after all). Personally, I find it easier than regular expressions (I had to look on SO which was the correct one, as I am not familiar with C# regexp syntax - which is standard, but I didn't know - and the proposed solution was wrong).
So.. measure, and don't believe in myths like "LINQ is slow" or "RegExp are slow".
After all, they are both OK for the task (it really depends on what you need it for), pick the one you prefer.
Try using TryParse
bool longStringisInt = Int64.TryParse(longString, out number);
If the string (i.e. longString) cannot be converted to an int (i.e. has letters in it) then the bool is false, otherwise it would be true.
EDIT: Changed to Int64 to ensure wider coverage
You can use IndexOfAny
bool containsDigits = mystring.IndexOfAny("0123456789".ToCharArray()) != -1;
In Linq you'd have to do:
bool containsDigits = mystring.Any(char.IsDigit);
Edit:
I timed this and it turns out the Linq solution is slower.
For string length 1,000,000 the execution time is ~13ms for the linq solution and 1ms for IndexOfAny.
For string length 10,000,000 the execution time for Linq is still ~122ms, whereas IndexOfAny is ~18ms.
Use regular expressions:
using System.Text.RegularExpressions;
string myString = "some long characters...";
if(Regex.IsMatch(myString, #"^\d+$", RegexOptions.Compiled))
{
// String contains only numbers
}
if(Regex.IsMatch(myString, #"^\w+$", RegexOptions.Compiled))
{
// String contains only letters
}

Replace multiple characters in a C# string

Is there a better way to replace strings?
I am surprised that Replace does not take in a character array or string array. I guess that I could write my own extension but I was curious if there is a better built in way to do the following? Notice the last Replace is a string not a character.
myString.Replace(';', '\n').Replace(',', '\n').Replace('\r', '\n').Replace('\t', '\n').Replace(' ', '\n').Replace("\n\n", "\n");
You can use a replace regular expression.
s/[;,\t\r ]|[\n]{2}/\n/g
s/ at the beginning means a search
The characters between [ and ] are the characters to search for (in any order)
The second / delimits the search-for text and the replace text
In English, this reads:
"Search for ; or , or \t or \r or (space) or exactly two sequential \n and replace it with \n"
In C#, you could do the following: (after importing System.Text.RegularExpressions)
Regex pattern = new Regex("[;,\t\r ]|[\n]{2}");
pattern.Replace(myString, "\n");
If you are feeling particularly clever and don't want to use Regex:
char[] separators = new char[]{' ',';',',','\r','\t','\n'};
string s = "this;is,\ra\t\n\n\ntest";
string[] temp = s.Split(separators, StringSplitOptions.RemoveEmptyEntries);
s = String.Join("\n", temp);
You could wrap this in an extension method with little effort as well.
Edit: Or just wait 2 minutes and I'll end up writing it anyway :)
public static class ExtensionMethods
{
public static string Replace(this string s, char[] separators, string newVal)
{
string[] temp;
temp = s.Split(separators, StringSplitOptions.RemoveEmptyEntries);
return String.Join( newVal, temp );
}
}
And voila...
char[] separators = new char[]{' ',';',',','\r','\t','\n'};
string s = "this;is,\ra\t\n\n\ntest";
s = s.Replace(separators, "\n");
You could use Linq's Aggregate function:
string s = "the\nquick\tbrown\rdog,jumped;over the lazy fox.";
char[] chars = new char[] { ' ', ';', ',', '\r', '\t', '\n' };
string snew = chars.Aggregate(s, (c1, c2) => c1.Replace(c2, '\n'));
Here's the extension method:
public static string ReplaceAll(this string seed, char[] chars, char replacementCharacter)
{
return chars.Aggregate(seed, (str, cItem) => str.Replace(cItem, replacementCharacter));
}
Extension method usage example:
string snew = s.ReplaceAll(chars, '\n');
This is the shortest way:
myString = Regex.Replace(myString, #"[;,\t\r ]|[\n]{2}", "\n");
Strings are just immutable char arrays
You just need to make it mutable:
either by using StringBuilder
go in the unsafe world and play with pointers (dangerous though)
and try to iterate through the array of characters the least amount of times. Note the HashSet here, as it avoids to traverse the character sequence inside the loop. Should you need an even faster lookup, you can replace HashSet by an optimized lookup for char (based on an array[256]).
Example with StringBuilder
public static void MultiReplace(this StringBuilder builder,
char[] toReplace,
char replacement)
{
HashSet<char> set = new HashSet<char>(toReplace);
for (int i = 0; i < builder.Length; ++i)
{
var currentCharacter = builder[i];
if (set.Contains(currentCharacter))
{
builder[i] = replacement;
}
}
}
Edit - Optimized version (only valid for ASCII)
public static void MultiReplace(this StringBuilder builder,
char[] toReplace,
char replacement)
{
var set = new bool[256];
foreach (var charToReplace in toReplace)
{
set[charToReplace] = true;
}
for (int i = 0; i < builder.Length; ++i)
{
var currentCharacter = builder[i];
if (set[currentCharacter])
{
builder[i] = replacement;
}
}
}
Then you just use it like this:
var builder = new StringBuilder("my bad,url&slugs");
builder.MultiReplace(new []{' ', '&', ','}, '-');
var result = builder.ToString();
Ohhh, the performance horror!
The answer is a bit outdated, but still...
public static class StringUtils
{
#region Private members
[ThreadStatic]
private static StringBuilder m_ReplaceSB;
private static StringBuilder GetReplaceSB(int capacity)
{
var result = m_ReplaceSB;
if (null == result)
{
result = new StringBuilder(capacity);
m_ReplaceSB = result;
}
else
{
result.Clear();
result.EnsureCapacity(capacity);
}
return result;
}
public static string ReplaceAny(this string s, char replaceWith, params char[] chars)
{
if (null == chars)
return s;
if (null == s)
return null;
StringBuilder sb = null;
for (int i = 0, count = s.Length; i < count; i++)
{
var temp = s[i];
var replace = false;
for (int j = 0, cc = chars.Length; j < cc; j++)
if (temp == chars[j])
{
if (null == sb)
{
sb = GetReplaceSB(count);
if (i > 0)
sb.Append(s, 0, i);
}
replace = true;
break;
}
if (replace)
sb.Append(replaceWith);
else
if (null != sb)
sb.Append(temp);
}
return null == sb ? s : sb.ToString();
}
}
You may also simply write these string extension methods, and put them somewhere in your solution:
using System.Text;
public static class StringExtensions
{
public static string ReplaceAll(this string original, string toBeReplaced, string newValue)
{
if (string.IsNullOrEmpty(original) || string.IsNullOrEmpty(toBeReplaced)) return original;
if (newValue == null) newValue = string.Empty;
StringBuilder sb = new StringBuilder();
foreach (char ch in original)
{
if (toBeReplaced.IndexOf(ch) < 0) sb.Append(ch);
else sb.Append(newValue);
}
return sb.ToString();
}
public static string ReplaceAll(this string original, string[] toBeReplaced, string newValue)
{
if (string.IsNullOrEmpty(original) || toBeReplaced == null || toBeReplaced.Length <= 0) return original;
if (newValue == null) newValue = string.Empty;
foreach (string str in toBeReplaced)
if (!string.IsNullOrEmpty(str))
original = original.Replace(str, newValue);
return original;
}
}
Call them like this:
"ABCDE".ReplaceAll("ACE", "xy");
xyBxyDxy
And this:
"ABCDEF".ReplaceAll(new string[] { "AB", "DE", "EF" }, "xy");
xyCxyF
Use RegEx.Replace, something like this:
string input = "This is text with far too much " +
"whitespace.";
string pattern = "[;,]";
string replacement = "\n";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, replacement);
Here's more info on this MSDN documentation for RegEx.Replace
Performance-Wise this probably might not be the best solution but it works.
var str = "filename:with&bad$separators.txt";
char[] charArray = new char[] { '#', '%', '&', '{', '}', '\\', '<', '>', '*', '?', '/', ' ', '$', '!', '\'', '"', ':', '#' };
foreach (var singleChar in charArray)
{
str = str.Replace(singleChar, '_');
}
string ToBeReplaceCharacters = #"~()##$%&+,'"<>|;\/*?";
string fileName = "filename;with<bad:separators?";
foreach (var RepChar in ToBeReplaceCharacters)
{
fileName = fileName.Replace(RepChar.ToString(), "");
}
A .NET Core version for replacing a defined set of string chars to a specific char. It leverages the recently introduced Span type and string.Create method.
The idea is to prepare a replacement array, so no actual comparison operations would be required for the each string char. Thus, the replacement process reminds the way a state machine works. In order to avoid initialization of all items of the replacement array, let's store oldChar ^ newChar (XOR'ed) values there, what gives the following benefits:
If a char is not changing: ch ^ ch = 0 - no need to initialize non-changing items
The final char can be found by XOR'ing: ch ^ repl[ch]:
ch ^ 0 = ch - not changed chars case
ch ^ (ch ^ newChar) = newChar - replaced char
So the only requirement would be to ensure that the replacement array is zero-ed when initialized. We'll be using ArrayPool<char> to avoid allocations each time the ReplaceAll method is called. And, in order to ensure that the arrays are zero-ed without expensive call to Array.Clear method, we'll be maintaining a pool dedicated for the ReplaceAll method. We'll be clearing the replacement array (exact items only) before returning it to the pool.
public static class StringExtensions
{
private static readonly ArrayPool<char> _replacementPool = ArrayPool<char>.Create();
public static string ReplaceAll(this string str, char newChar, params char[] oldChars)
{
// If nothing to do, return the original string.
if (string.IsNullOrEmpty(str) ||
oldChars is null ||
oldChars.Length == 0)
{
return str;
}
// If only one character needs to be replaced,
// use the more efficient `string.Replace`.
if (oldChars.Length == 1)
{
return str.Replace(oldChars[0], newChar);
}
// Get a replacement array from the pool.
var replacements = _replacementPool.Rent(char.MaxValue + 1);
try
{
// Intialize the replacement array in the way that
// all elements represent `oldChar ^ newChar`.
foreach (var oldCh in oldChars)
{
replacements[oldCh] = (char)(newChar ^ oldCh);
}
// Create a string with replaced characters.
return string.Create(str.Length, (str, replacements), (dst, args) =>
{
var repl = args.replacements;
foreach (var ch in args.str)
{
dst[0] = (char)(repl[ch] ^ ch);
dst = dst.Slice(1);
}
});
}
finally
{
// Clear the replacement array.
foreach (var oldCh in oldChars)
{
replacements[oldCh] = char.MinValue;
}
// Return the replacement array back to the pool.
_replacementPool.Return(replacements);
}
}
}
I know this question is super old, but I want to offer 2 options that are more efficient:
1st off, the extension method posted by Paul Walls is good but can be made more efficient by using the StringBuilder class, which is like the string data type but made especially for situations where you will be changing string values more than once. Here is a version I made of the extension method using StringBuilder:
public static string ReplaceChars(this string s, char[] separators, char newVal)
{
StringBuilder sb = new StringBuilder(s);
foreach (var c in separators) { sb.Replace(c, newVal); }
return sb.ToString();
}
I ran this operation 100,000 times and using StringBuilder took 73ms compared to 81ms using string. So the difference is typically negligible, unless you're running many operations or using a huge string.
Secondly, here is a 1 liner loop you can use:
foreach (char c in separators) { s = s.Replace(c, '\n'); }
I personally think this is the best option. It is highly efficient and doesn't require writing an extension method. In my testing this ran the 100k iterations in only 63ms, making it the most efficient.
Here is an example in context:
string s = "this;is,\ra\t\n\n\ntest";
char[] separators = new char[] { ' ', ';', ',', '\r', '\t', '\n' };
foreach (char c in separators) { s = s.Replace(c, '\n'); }
Credit to Paul Walls for the first 2 lines in this example.
I also fiddled around with that problem, and found that most of the solutions here are very slow. The fastest one was actually the LINQ + Aggregate method that dodgy_coder posted.
But I thought, well that might be also quite heavy in memory allocations depending upon how many old characters there are. So I came out with this:
The idea here is to have a cached replacement map of the old characters for the current thread, to safe allocations. And other than that just working with a character array of the input that later on is returned as string again. Whereas the character array is modified as less as possible.
[ThreadStatic]
private static bool[] replaceMap;
public static string Replace(this string input, char[] oldChars, char newChar)
{
if (input == null) throw new ArgumentNullException(nameof(input));
if (oldChars == null) throw new ArgumentNullException(nameof(oldChars));
if (oldChars.Length == 1) return input.Replace(oldChars[0], newChar);
if (oldChars.Length == 0) return input;
replaceMap = replaceMap ?? new bool[char.MaxValue + 1];
foreach (var oldChar in oldChars)
{
replaceMap[oldChar] = true;
}
try
{
var count = input.Length;
var output = input.ToCharArray();
for (var i = 0; i < count; i++)
{
if (replaceMap[input[i]])
{
output[i] = newChar;
}
}
return new string(output);
}
finally
{
foreach (var oldChar in oldChars)
{
replaceMap[oldChar] = false;
}
}
}
For me this is at most two allocations for the actual input string to work on. A StringBuilder turned out to be much slower for me for some reasons. And it is 2 times faster than the LINQ variant.
No "Replace" (Linq only):
string myString = ";,\r\t \n\n=1;;2,,3\r\r4\t\t5 6\n\n\n\n7=";
char NoRepeat = '\n';
string ByeBye = ";,\r\t ";
string myResult = myString.ToCharArray().Where(t => !"STOP-OUTSIDER".Contains(t))
.Select(t => "" + ( ByeBye.Contains(t) ? '\n' : t))
.Aggregate((all, next) => (
next == "" + NoRepeat && all.Substring(all.Length - 1) == "" + NoRepeat
? all : all + next ) );
Having built my own solution, and looking at the solution used here, I leveraged an answer that isn't using complex code and is generally efficient for most parameters.
Cover base cases where other methods are more appropriate. If there are no chars to replacement, return the original string. If there is only one, just use the Replace method.
Use a StringBuilder and initialize the capacity to the length of the original string. After all, the new string being built will have the same length of the original string if its just chars being replaced. This ensure only 1 memory allocation is used for the new string.
Assuming that the 'char' length could be small or large will impact performance. Large collections are better with hashsets, while smaller collections are not. This is a near-perfect use case for Hybrid Dictionaries. They switch to using a Hash based lookup once the collection gets too large. However, we don't care about the value of the dictionary, so I just set it to "true".
Have different methods for StringBuilder verse just a string will prevent unnecessary memory allocation. If its just a string, don't instantiate a StringBuilder unless the base cases were checked. If its already a StringBuilder, then perform the replacements and return the StringBuilder itself (as other StringBuilder methods like Append do).
I put the replacement char first, and the chars to check at the end. This way, I can leverage the params keyword for easily passing additional strings. However, you don't have to do this if you prefer the other order.
namespace Test.Extensions
{
public static class StringExtensions
{
public static string ReplaceAll(this string str, char replacementCharacter, params char[] chars)
{
if (chars.Length == 0)
return str;
if (chars.Length == 1)
return str.Replace(chars[0], replacementCharacter);
StringBuilder sb = new StringBuilder(str.Length);
var searcher = new HybridDictionary(chars.Length);
for (int i = 0; i < chars.Length; i++)
searcher[chars[i]] = true;
foreach (var c in str)
{
if (searcher.Contains(c))
sb.Append(replacementCharacter);
else
sb.Append(c);
}
return sb.ToString();
}
public static StringBuilder ReplaceAll(this StringBuilder sb, char replacementCharacter, params char[] chars)
{
if (chars.Length == 0)
return sb;
if (chars.Length == 1)
return sb.Replace(chars[0], replacementCharacter);
var searcher = new HybridDictionary(chars.Length);
for (int i = 0; i < chars.Length; i++)
searcher[chars[i]] = true;
for (int i = 0; i < sb.Length; i++)
{
var val = sb[i];
if (searcher.Contains(val))
sb[i] = replacementCharacter;
}
return sb;
}
}
}

Best approach of word censoring - C# 4.0

For my custom made chat screen i am using the code below for checking censored words. But i wonder can this code performance improved. Thank you.
if (srMessageTemp.IndexOf(" censored1 ") != -1)
return;
if (srMessageTemp.IndexOf(" censored2 ") != -1)
return;
if (srMessageTemp.IndexOf(" censored3 ") != -1)
return;
C# 4.0 . actually list is a lot more long but i don't put here as it goes away.
I would use LINQ or regular expression for this:
LINQ: How to: Query for Sentences that Contain a Specified Set of Words (LINQ)
Regular Expression: Highlight a list of words using a regular expression in c#
You can simplify it. Here listOfCencoredWords will contains all the censored words
if (listOfCensoredWords.Any(item => srMessageTemp.Contains(item)))
return;
If you want to make it really fast, you can use Aho-Corasick automaton. This is how antivirus software checks thousands of viruses at once. But I don't know where you can get the implementation done, so it will require much more work from you compared to using just simple slow methods like regular expressions.
See the theory here: http://en.wikipedia.org/wiki/Aho-Corasick
First, I hope you aren't really "tokenizing" the words as written. You know, just because someone doesn't put a space before a bad word, it doesn't make the word less bad :-) Example ,badword,
I'll say that I would use a Regex here :-) I'm not sure if a Regex or a man-made parser would be faster, but at least a Regex would be a good starting point. As others wrote, you begin by splitting the text in words and then checking an HashSet<string>.
I'm adding a second version of the code, based on ArraySegment<char>. I speak later of this.
class Program
{
class ArraySegmentComparer : IEqualityComparer<ArraySegment<char>>
{
public bool Equals(ArraySegment<char> x, ArraySegment<char> y)
{
if (x.Count != y.Count)
{
return false;
}
int end = x.Offset + x.Count;
for (int i = x.Offset, j = y.Offset; i < end; i++, j++)
{
if (!x.Array[i].ToString().Equals(y.Array[j].ToString(), StringComparison.InvariantCultureIgnoreCase))
{
return false;
}
}
return true;
}
public override int GetHashCode(ArraySegment<char> obj)
{
unchecked
{
int hash = 17;
int end = obj.Offset + obj.Count;
int i;
for (i = obj.Offset; i < end; i++)
{
hash *= 23;
hash += Char.ToUpperInvariant(obj.Array[i]);
}
return hash;
}
}
}
static void Main()
{
var rx = new Regex(#"\b\w+\b", RegexOptions.Compiled);
var sampleText = #"For my custom made chat screen i am using the code below for checking censored words. But i wonder can this code performance improved. Thank you.
if (srMessageTemp.IndexOf("" censored1 "") != -1)
return;
if (srMessageTemp.IndexOf("" censored2 "") != -1)
return;
if (srMessageTemp.IndexOf("" censored3 "") != -1)
return;
C# 4.0 . actually list is a lot more long but i don't put here as it goes away.
And now some accented letters àèéìòù and now some letters with unicode combinable diacritics àèéìòù";
//sampleText += sampleText;
//sampleText += sampleText;
//sampleText += sampleText;
//sampleText += sampleText;
//sampleText += sampleText;
//sampleText += sampleText;
//sampleText += sampleText;
HashSet<string> prohibitedWords = new HashSet<string>(StringComparer.InvariantCultureIgnoreCase) { "For", "custom", "combinable", "away" };
Stopwatch sw1 = Stopwatch.StartNew();
var words = rx.Matches(sampleText);
foreach (Match word in words)
{
string str = word.Value;
if (prohibitedWords.Contains(str))
{
Console.Write(str);
Console.Write(" ");
}
else
{
//Console.WriteLine(word);
}
}
sw1.Stop();
Console.WriteLine();
Console.WriteLine();
HashSet<ArraySegment<char>> prohibitedWords2 = new HashSet<ArraySegment<char>>(
prohibitedWords.Select(p => new ArraySegment<char>(p.ToCharArray())),
new ArraySegmentComparer());
var sampleText2 = sampleText.ToCharArray();
Stopwatch sw2 = Stopwatch.StartNew();
int startWord = -1;
for (int i = 0; i < sampleText2.Length; i++)
{
if (Char.IsLetter(sampleText2[i]) || Char.IsDigit(sampleText2[i]))
{
if (startWord == -1)
{
startWord = i;
}
}
else
{
if (startWord != -1)
{
int length = i - startWord;
if (length != 0)
{
var wordSegment = new ArraySegment<char>(sampleText2, startWord, length);
if (prohibitedWords2.Contains(wordSegment))
{
Console.Write(sampleText2, startWord, length);
Console.Write(" ");
}
else
{
//Console.WriteLine(sampleText2, startWord, length);
}
}
startWord = -1;
}
}
}
if (startWord != -1)
{
int length = sampleText2.Length - startWord;
if (length != 0)
{
var wordSegment = new ArraySegment<char>(sampleText2, startWord, length);
if (prohibitedWords2.Contains(wordSegment))
{
Console.Write(sampleText2, startWord, length);
Console.Write(" ");
}
else
{
//Console.WriteLine(sampleText2, startWord, length);
}
}
}
sw2.Stop();
Console.WriteLine();
Console.WriteLine();
Console.WriteLine(sw1.ElapsedTicks);
Console.WriteLine(sw2.ElapsedTicks);
}
}
I'll note that you could go faster doing the parsing "in" the original string. What does this means: if you subdivide the "document" in words and each word is put in a string, clearly you are creating n string, one for each word of your document. But what if you skipped this step and operated directly on the document, simply keeping the current index and the length of the current word? Then it would be faster! Clearly then you would need to create a special comparer for the HashSet<>.
But wait! C# has something similar... It's called ArraySegment. So your document would be a char[] instead of a string and each word would be an ArraySegment<char>. Clearly this is much more complex! You can't simply use Regexes, you have to build "by hand" a parser (but I think converting the \b\w+\b expression would be quite easy). And creating a comparer for HashSet<char> would be a little complex (hint: you would use HashSet<ArraySegment<char>> and the words to be censored would be ArraySegments "pointing" to a char[] of a word and with size equal to the char[].Length, like var word = new ArraySegment<char>("tobecensored".ToCharArray());)
After some simple benchmark, I can see that an unoptimized version of the program using ArraySegment<string> is as much fast as the Regex version for shorter texts. This probably because if a word is 4-6 char long, it's as much "slow" to copy it around than it's to copy around an ArraySegment<char> (an ArraySegment<char> is 12 bytes, a word of 6 characters is 12 bytes. On top of both of these we have to add a little overhead... But in the end the numbers are comparable). But for longer texts (try decommenting the //sampleText += sampleText;) it becomes a little faster (10%) in Release -> Start Without Debugging (CTRL-F5)
I'll note that comparing strings character by character is wrong. You should always use the methods given to you by the string class (or by the OS). They know how to handle "strange" cases much better than you (and in Unicode there isn't any "normal" case :-) )
You can use linq for this but it's not required if you use a list to hold your list of censored values. The solution below uses the build in list functions and allows you to do your searches case insensitive.
private static List<string> _censoredWords = new List<string>()
{
"badwordone1",
"badwordone2",
"badwordone3",
"badwordone4",
};
static void Main(string[] args)
{
string badword1 = "BadWordOne2";
bool censored = ShouldCensorWord(badword1);
}
private static bool ShouldCensorWord(string word)
{
return _censoredWords.Contains(word.ToLower());
}
What you think about this:
string[] censoredWords = new[] { " censored1 ", " censored2 ", " censored3 " };
if (censoredWords.Contains(srMessageTemp))
return;

Best way to convert Pascal Case to a sentence

What is the best way to convert from Pascal Case (upper Camel Case) to a sentence.
For example starting with
"AwaitingFeedback"
and converting that to
"Awaiting feedback"
C# preferable but I could convert it from Java or similar.
public static string ToSentenceCase(this string str)
{
return Regex.Replace(str, "[a-z][A-Z]", m => m.Value[0] + " " + char.ToLower(m.Value[1]));
}
In versions of visual studio after 2015, you can do
public static string ToSentenceCase(this string str)
{
return Regex.Replace(str, "[a-z][A-Z]", m => $"{m.Value[0]} {char.ToLower(m.Value[1])}");
}
Based on: Converting Pascal case to sentences using regular expression
I will prefer to use Humanizer for this. Humanizer is a Portable Class Library that meets all your .NET needs for manipulating and displaying strings, enums, dates, times, timespans, numbers and quantities.
Short Answer
"AwaitingFeedback".Humanize() => Awaiting feedback
Long and Descriptive Answer
Humanizer can do a lot more work other examples are:
"PascalCaseInputStringIsTurnedIntoSentence".Humanize() => "Pascal case input string is turned into sentence"
"Underscored_input_string_is_turned_into_sentence".Humanize() => "Underscored input string is turned into sentence"
"Can_return_title_Case".Humanize(LetterCasing.Title) => "Can Return Title Case"
"CanReturnLowerCase".Humanize(LetterCasing.LowerCase) => "can return lower case"
Complete code is :
using Humanizer;
using static System.Console;
namespace HumanizerConsoleApp
{
class Program
{
static void Main(string[] args)
{
WriteLine("AwaitingFeedback".Humanize());
WriteLine("PascalCaseInputStringIsTurnedIntoSentence".Humanize());
WriteLine("Underscored_input_string_is_turned_into_sentence".Humanize());
WriteLine("Can_return_title_Case".Humanize(LetterCasing.Title));
WriteLine("CanReturnLowerCase".Humanize(LetterCasing.LowerCase));
}
}
}
Output
Awaiting feedback
Pascal case input string is turned into sentence
Underscored input string is turned into sentence Can Return Title Case
can return lower case
If you prefer to write your own C# code you can achieve this by writing some C# code stuff as answered by others already.
Here you go...
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace CamelCaseToString
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine(CamelCaseToString("ThisIsYourMasterCallingYou"));
}
private static string CamelCaseToString(string str)
{
if (str == null || str.Length == 0)
return null;
StringBuilder retVal = new StringBuilder(32);
retVal.Append(char.ToUpper(str[0]));
for (int i = 1; i < str.Length; i++ )
{
if (char.IsLower(str[i]))
{
retVal.Append(str[i]);
}
else
{
retVal.Append(" ");
retVal.Append(char.ToLower(str[i]));
}
}
return retVal.ToString();
}
}
}
This works for me:
Regex.Replace(strIn, "([A-Z]{1,2}|[0-9]+)", " $1").TrimStart()
This is just like #SSTA, but is more efficient than calling TrimStart.
Regex.Replace("ThisIsMyCapsDelimitedString", "(\\B[A-Z])", " $1")
Found this in the MvcContrib source, doesn't seem to be mentioned here yet.
return Regex.Replace(input, "([A-Z])", " $1", RegexOptions.Compiled).Trim();
Just because everyone has been using Regex (except this guy), here's an implementation with StringBuilder that was about 5x faster in my tests. Includes checking for numbers too.
"SomeBunchOfCamelCase2".FromCamelCaseToSentence == "Some Bunch Of Camel Case 2"
public static string FromCamelCaseToSentence(this string input) {
if(string.IsNullOrEmpty(input)) return input;
var sb = new StringBuilder();
// start with the first character -- consistent camelcase and pascal case
sb.Append(char.ToUpper(input[0]));
// march through the rest of it
for(var i = 1; i < input.Length; i++) {
// any time we hit an uppercase OR number, it's a new word
if(char.IsUpper(input[i]) || char.IsDigit(input[i])) sb.Append(' ');
// add regularly
sb.Append(input[i]);
}
return sb.ToString();
}
Here's a basic way of doing it that I came up with using Regex
public static string CamelCaseToSentence(this string value)
{
var sb = new StringBuilder();
var firstWord = true;
foreach (var match in Regex.Matches(value, "([A-Z][a-z]+)|[0-9]+"))
{
if (firstWord)
{
sb.Append(match.ToString());
firstWord = false;
}
else
{
sb.Append(" ");
sb.Append(match.ToString().ToLower());
}
}
return sb.ToString();
}
It will also split off numbers which I didn't specify but would be useful.
string camel = "MyCamelCaseString";
string s = Regex.Replace(camel, "([A-Z])", " $1").ToLower().Trim();
Console.WriteLine(s.Substring(0,1).ToUpper() + s.Substring(1));
Edit: didn't notice your casing requirements, modifed accordingly. You could use a matchevaluator to do the casing, but I think a substring is easier. You could also wrap it in a 2nd regex replace where you change the first character
"^\w"
to upper
\U (i think)
I'd use a regex, inserting a space before each upper case character, then lowering all the string.
string spacedString = System.Text.RegularExpressions.Regex.Replace(yourString, "\B([A-Z])", " \k");
spacedString = spacedString.ToLower();
It is easy to do in JavaScript (or PHP, etc.) where you can define a function in the replace call:
var camel = "AwaitingFeedbackDearMaster";
var sentence = camel.replace(/([A-Z].)/g, function (c) { return ' ' + c.toLowerCase(); });
alert(sentence);
Although I haven't solved the initial cap problem... :-)
Now, for the Java solution:
String ToSentence(String camel)
{
if (camel == null) return ""; // Or null...
String[] words = camel.split("(?=[A-Z])");
if (words == null) return "";
if (words.length == 1) return words[0];
StringBuilder sentence = new StringBuilder(camel.length());
if (words[0].length() > 0) // Just in case of camelCase instead of CamelCase
{
sentence.append(words[0] + " " + words[1].toLowerCase());
}
else
{
sentence.append(words[1]);
}
for (int i = 2; i < words.length; i++)
{
sentence.append(" " + words[i].toLowerCase());
}
return sentence.toString();
}
System.out.println(ToSentence("AwaitingAFeedbackDearMaster"));
System.out.println(ToSentence(null));
System.out.println(ToSentence(""));
System.out.println(ToSentence("A"));
System.out.println(ToSentence("Aaagh!"));
System.out.println(ToSentence("stackoverflow"));
System.out.println(ToSentence("disableGPS"));
System.out.println(ToSentence("Ahh89Boo"));
System.out.println(ToSentence("ABC"));
Note the trick to split the sentence without loosing any character...
Pseudo-code:
NewString = "";
Loop through every char of the string (skip the first one)
If char is upper-case ('A'-'Z')
NewString = NewString + ' ' + lowercase(char)
Else
NewString = NewString + char
Better ways can perhaps be done by using regex or by string replacement routines (replace 'X' with ' x')
An xquery solution that works for both UpperCamel and lowerCamel case:
To output sentence case (only the first character of the first word is capitalized):
declare function content:sentenceCase($string)
{
let $firstCharacter := substring($string, 1, 1)
let $remainingCharacters := substring-after($string, $firstCharacter)
return
concat(upper-case($firstCharacter),lower-case(replace($remainingCharacters, '([A-Z])', ' $1')))
};
To output title case (first character of each word capitalized):
declare function content:titleCase($string)
{
let $firstCharacter := substring($string, 1, 1)
let $remainingCharacters := substring-after($string, $firstCharacter)
return
concat(upper-case($firstCharacter),replace($remainingCharacters, '([A-Z])', ' $1'))
};
Found myself doing something similar, and I appreciate having a point-of-departure with this discussion. This is my solution, placed as an extension method to the string class in the context of a console application.
using System;
using System.Text;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string piratese = "avastTharMatey";
string ivyese = "CheerioPipPip";
Console.WriteLine("{0}\n{1}\n", piratese.CamelCaseToString(), ivyese.CamelCaseToString());
Console.WriteLine("For Pete\'s sake, man, hit ENTER!");
string strExit = Console.ReadLine();
}
}
public static class StringExtension
{
public static string CamelCaseToString(this string str)
{
StringBuilder retVal = new StringBuilder(32);
if (!string.IsNullOrEmpty(str))
{
string strTrimmed = str.Trim();
if (!string.IsNullOrEmpty(strTrimmed))
{
retVal.Append(char.ToUpper(strTrimmed[0]));
if (strTrimmed.Length > 1)
{
for (int i = 1; i < strTrimmed.Length; i++)
{
if (char.IsUpper(strTrimmed[i])) retVal.Append(" ");
retVal.Append(char.ToLower(strTrimmed[i]));
}
}
}
}
return retVal.ToString();
}
}
}
Most of the preceding answers split acronyms and numbers, adding a space in front of each character. I wanted acronyms and numbers to be kept together so I have a simple state machine that emits a space every time the input transitions from one state to the other.
/// <summary>
/// Add a space before any capitalized letter (but not for a run of capitals or numbers)
/// </summary>
internal static string FromCamelCaseToSentence(string input)
{
if (string.IsNullOrEmpty(input)) return String.Empty;
var sb = new StringBuilder();
bool upper = true;
for (var i = 0; i < input.Length; i++)
{
bool isUpperOrDigit = char.IsUpper(input[i]) || char.IsDigit(input[i]);
// any time we transition to upper or digits, it's a new word
if (!upper && isUpperOrDigit)
{
sb.Append(' ');
}
sb.Append(input[i]);
upper = isUpperOrDigit;
}
return sb.ToString();
}
And here's some tests:
[TestCase(null, ExpectedResult = "")]
[TestCase("", ExpectedResult = "")]
[TestCase("ABC", ExpectedResult = "ABC")]
[TestCase("abc", ExpectedResult = "abc")]
[TestCase("camelCase", ExpectedResult = "camel Case")]
[TestCase("PascalCase", ExpectedResult = "Pascal Case")]
[TestCase("Pascal123", ExpectedResult = "Pascal 123")]
[TestCase("CustomerID", ExpectedResult = "Customer ID")]
[TestCase("CustomABC123", ExpectedResult = "Custom ABC123")]
public string CanSplitCamelCase(string input)
{
return FromCamelCaseToSentence(input);
}
Mostly already answered here
Small chage to the accepted answer, to convert the second and subsequent Capitalised letters to lower case, so change
if (char.IsUpper(text[i]))
newText.Append(' ');
newText.Append(text[i]);
to
if (char.IsUpper(text[i]))
{
newText.Append(' ');
newText.Append(char.ToLower(text[i]));
}
else
newText.Append(text[i]);
Here is my implementation. This is the fastest that I got while avoiding creating spaces for abbreviations.
public static string PascalCaseToSentence(string input)
{
if (string.IsNullOrEmpty(input) || input.Length < 2)
return input;
var sb = new char[input.Length + ((input.Length + 1) / 2)];
var len = 0;
var lastIsLower = false;
for (int i = 0; i < input.Length; i++)
{
var current = input[i];
if (current < 97)
{
if (lastIsLower)
{
sb[len] = ' ';
len++;
}
lastIsLower = false;
}
else
{
lastIsLower = true;
}
sb[len] = current;
len++;
}
return new string(sb, 0, len);
}

Categories