I would like to split a string with delimiters but keep the delimiters in the result.
How would I do this in C#?
If the split chars were ,, ., and ;, I'd try:
using System.Text.RegularExpressions;
...
string[] parts = Regex.Split(originalString, @"(?<=[.,;])");
(?<=PATTERN) is positive look-behind for PATTERN. It should match at any place where the preceding text fits PATTERN so there should be a match (and a split) after each occurrence of any of the characters.
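As a minimal sketch of the look-behind split described above (the class and method names here are made up for illustration), the following shows each delimiter staying attached to the end of the piece before it. Note that an input ending in a delimiter may produce a trailing empty string, which you might want to filter out.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindSplitDemo
{
    // Hypothetical helper: splits after each '.', ',' or ';' so the
    // delimiter stays at the end of the preceding piece.
    public static string[] SplitKeepingTrailingDelimiter(string text)
    {
        return Regex.Split(text, @"(?<=[.,;])");
    }

    public static void Main()
    {
        foreach (string part in SplitKeepingTrailingDelimiter("a,b;c.d"))
        {
            Console.WriteLine("'{0}'", part);
        }
        // Prints 'a,' then 'b;' then 'c.' then 'd'
    }
}
```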
If you want the delimiter to be its "own split", you can use Regex.Split e.g.:
string input = "plum-pear";
string pattern = "(-)";
string[] substrings = Regex.Split(input, pattern); // Split on hyphens
foreach (string match in substrings)
{
Console.WriteLine("'{0}'", match);
}
// The method writes the following to the console:
// 'plum'
// '-'
// 'pear'
So if you want to split a mathematical formula, you can use the following regex:
@"([*()\^\/]|(?<!E)[\+\-])"
This ensures that constants like 1E-02 stay intact instead of being split into 1E, - and 02.
So:
Regex.Split("10E-02*x+sin(x)^2", @"([*()\^\/]|(?<!E)[\+\-])")
Yields:
10E-02
*
x
+
sin
(
x
)
^
2
Building off from BFree's answer, I had the same goal, but I wanted to split on an array of characters similar to the original Split method, and I also have multiple splits per string:
public static IEnumerable<string> SplitAndKeep(this string s, char[] delims)
{
int start = 0, index;
while ((index = s.IndexOfAny(delims, start)) != -1)
{
if(index-start > 0)
yield return s.Substring(start, index - start);
yield return s.Substring(index, 1);
start = index + 1;
}
if (start < s.Length)
{
yield return s.Substring(start);
}
}
Just in case anyone wants this answer as well...
Instead of string[] parts = Regex.Split(originalString, @"(?<=[.,;])") you could use string[] parts = Regex.Split(originalString, @"(?=yourmatch)"), where yourmatch is whatever your separator is.
Supposing the original string was
777- cat
777 - dog
777 - mouse
777 - rat
777 - wolf
Regex.Split(originalString, @"(?=777)") would return
777 - cat
777 - dog
and so on
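A small sketch of the look-ahead variant, assuming the marker is a literal string (the helper name is hypothetical). Because a look-ahead can also match at the very start of the input, the sketch filters out empty pieces rather than relying on how the leading match is reported.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaheadSplitDemo
{
    // Hypothetical helper: splits just before every occurrence of marker,
    // so each piece starts with the marker. Empty pieces are dropped.
    public static string[] SplitBeforeMarker(string text, string marker)
    {
        string pattern = "(?=" + Regex.Escape(marker) + ")";
        return Regex.Split(text, pattern)
                    .Where(piece => piece.Length > 0)
                    .ToArray();
    }

    public static void Main()
    {
        foreach (string piece in SplitBeforeMarker("777 - cat 777 - dog", "777"))
        {
            Console.WriteLine("'{0}'", piece);
        }
        // Prints '777 - cat ' then '777 - dog'
    }
}
```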
This version does not use LINQ or Regex and so it's probably relatively efficient. I think it might be easier to use than the Regex because you don't have to worry about escaping special delimiters. It returns an IList<string> which is more efficient than always converting to an array. It's an extension method, which is convenient. You can pass in the delimiters as either an array or as multiple parameters.
/// <summary>
/// Splits the given string into a list of substrings, while outputting the splitting
/// delimiters (each in its own string) as well. It's just like String.Split() except
/// the delimiters are preserved. No empty strings are output.</summary>
/// <param name="s">String to parse. Can be null or empty.</param>
/// <param name="delimiters">The delimiting characters. Can be an empty array.</param>
/// <returns></returns>
public static IList<string> SplitAndKeepDelimiters(this string s, params char[] delimiters)
{
var parts = new List<string>();
if (!string.IsNullOrEmpty(s))
{
int iFirst = 0;
do
{
int iLast = s.IndexOfAny(delimiters, iFirst);
if (iLast >= 0)
{
if (iLast > iFirst)
parts.Add(s.Substring(iFirst, iLast - iFirst)); //part before the delimiter
parts.Add(new string(s[iLast], 1));//the delimiter
iFirst = iLast + 1;
continue;
}
//No delimiters were found, but at least one character remains. Add the rest and stop.
parts.Add(s.Substring(iFirst, s.Length - iFirst));
break;
} while (iFirst < s.Length);
}
return parts;
}
Some unit tests:
text = "[a link|http://www.google.com]";
result = text.SplitAndKeepDelimiters('[', '|', ']');
Assert.IsTrue(result.Count == 5);
Assert.AreEqual(result[0], "[");
Assert.AreEqual(result[1], "a link");
Assert.AreEqual(result[2], "|");
Assert.AreEqual(result[3], "http://www.google.com");
Assert.AreEqual(result[4], "]");
A lot of answers to this! One I knocked up to split by various strings (the original answer caters for just characters i.e. length of 1). This hasn't been fully tested.
public static IEnumerable<string> SplitAndKeep(string s, params string[] delims)
{
var rows = new List<string>() { s };
foreach (string delim in delims)//delimiter counter
{
for (int i = 0; i < rows.Count; i++)//row counter
{
int index = rows[i].IndexOf(delim);
if (index > -1
&& rows[i].Length > index + 1)
{
string leftPart = rows[i].Substring(0, index + delim.Length);
string rightPart = rows[i].Substring(index + delim.Length);
rows[i] = leftPart;
rows.Insert(i + 1, rightPart);
}
}
}
return rows;
}
This seems to work, but it's not been tested much.
public static string[] SplitAndKeepSeparators(string value, char[] separators, StringSplitOptions splitOptions)
{
List<string> splitValues = new List<string>();
int itemStart = 0;
for (int pos = 0; pos < value.Length; pos++)
{
for (int sepIndex = 0; sepIndex < separators.Length; sepIndex++)
{
if (separators[sepIndex] == value[pos])
{
// add the section of string before the separator
// (unless its empty and we are discarding empty sections)
if (itemStart != pos || splitOptions == StringSplitOptions.None)
{
splitValues.Add(value.Substring(itemStart, pos - itemStart));
}
itemStart = pos + 1;
// add the separator
splitValues.Add(separators[sepIndex].ToString());
break;
}
}
}
// add anything after the final separator
// (unless its empty and we are discarding empty sections)
if (itemStart != value.Length || splitOptions == StringSplitOptions.None)
{
splitValues.Add(value.Substring(itemStart, value.Length - itemStart));
}
return splitValues.ToArray();
}
Recently I wrote an extension method to do this:
public static class StringExtensions
{
public static IEnumerable<string> SplitAndKeep(this string s, string seperator)
{
string[] obj = s.Split(new string[] { seperator }, StringSplitOptions.None);
for (int i = 0; i < obj.Length; i++)
{
string result = i == obj.Length - 1 ? obj[i] : obj[i] + seperator;
yield return result;
}
}
}
I'd say the easiest way to accomplish this (except for the argument Hans Kesting brought up) is to split the string the regular way, then iterate over the array and add the delimiter to every element but the last.
To avoid putting the delimiter on a line of its own, try this:
string[] substrings = Regex.Split(input, @"(?<=[-])");
result = originalString.Split(separator);
for(int i = 0; i < result.Length - 1; i++)
result[i] += separator;
(EDIT - this is a bad answer - I misread his question and didn't see that he was splitting by multiple characters.)
(EDIT - a correct LINQ version is awkward, since the separator shouldn't get concatenated onto the final string in the split array.)
Iterate through the string character by character (which is what regex does anyway).
When you find a splitter, spin off a substring.
A rough sketch:
string toSplit = "a,b;c";
char[] splitChars = { ',', ';' };
var afterSplit = new List<string>();
int hold = 0;
for (int counter = 0; counter < toSplit.Length; counter++)
{
if (Array.IndexOf(splitChars, toSplit[counter]) >= 0)
{
// Substring takes (start, length); the + 1 keeps the splitter with the preceding text
afterSplit.Add(toSplit.Substring(hold, counter - hold + 1));
hold = counter + 1;
}
}
// add whatever follows the last splitter
if (hold < toSplit.Length)
afterSplit.Add(toSplit.Substring(hold));
Note that Substring takes a start index and a length, not an end index, which is where an off-by-one error is easy to make.
That will do what you're asking.
veggerby's answer modified to
have no empty string items in the list
have a fixed string as delimiter, like "ab", instead of a single character
var delimiter = "ab";
var text = "ab33ab9ab";
var parts = Regex.Split(text, $@"({Regex.Escape(delimiter)})")
.Where(p => p != string.Empty)
.ToList();
// parts = "ab", "33", "ab", "9", "ab"
The Regex.Escape() is there just in case your delimiter contains characters which regex interprets as special pattern commands (like *, ( or )) and which therefore have to be escaped.
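To illustrate why the escaping matters, here is a minimal sketch with a delimiter made entirely of regex metacharacters. The helper name and the sample delimiter "*+" are assumptions for the example; without Regex.Escape the pattern would be treated as quantifiers rather than literal text.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EscapedDelimiterSplit
{
    // Hypothetical helper: splits on a literal delimiter string and keeps
    // each delimiter as its own item; empty items are dropped.
    public static string[] SplitKeepingDelimiter(string text, string delimiter)
    {
        // The capturing group makes Regex.Split return the delimiter too.
        string pattern = "(" + Regex.Escape(delimiter) + ")";
        return Regex.Split(text, pattern)
                    .Where(part => part.Length > 0)
                    .ToArray();
    }

    public static void Main()
    {
        string[] parts = SplitKeepingDelimiter("1*+2*+3", "*+");
        Console.WriteLine(string.Join(" | ", parts));
        // Prints: 1 | *+ | 2 | *+ | 3
    }
}
```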
using System.Collections.Generic;
using System.Text.RegularExpressions;
namespace ConsoleApplication9
{
class Program
{
static void Main(string[] args)
{
string input = @"This;is:a.test";
char sep0 = ';', sep1 = ':', sep2 = '.';
string pattern = string.Format("[{0}{1}{2}]|[^{0}{1}{2}]+", sep0, sep1, sep2);
Regex regex = new Regex(pattern);
MatchCollection matches = regex.Matches(input);
List<string> parts=new List<string>();
foreach (Match match in matches)
{
parts.Add(match.ToString());
}
}
}
}
I wanted to do this with a multiline string but needed to keep the line breaks, so I did this:
string x =
@"line 1 {0}
line 2 {1}
";
foreach (var line in string.Format(x, "one", "two")
.Split('\n')
// the lambda parameter must not reuse the name x of the outer variable
.Select(l => l.Contains('\r') ? l + '\n' : l))
{
Console.Write(line);
}
yields
line 1 one
line 2 two
I came across the same problem but with multiple delimiters. Here's my solution:
public static string[] SplitLeft(this string @this, char[] delimiters, int count)
{
var splits = new List<string>();
int next = -1;
while (splits.Count + 1 < count && (next = @this.IndexOfAny(delimiters, next + 1)) >= 0)
{
splits.Add(@this.Substring(0, next));
// keep the delimiter at the start of the remainder and search past it next time
@this = @this.Substring(next);
next = 0;
}
splits.Add(@this);
return splits.ToArray();
}
Sample with separating CamelCase variable names:
var variableSplit = variableName.SplitLeft(
Enumerable.Range('A', 26).Select(i => (char)i).ToArray(), int.MaxValue);
I wrote this code to split and keep delimiters:
private static string[] SplitKeepDelimiters(string toSplit, char[] delimiters, StringSplitOptions splitOptions = StringSplitOptions.None)
{
var tokens = new List<string>();
int idx = 0;
for (int i = 0; i < toSplit.Length; ++i)
{
if (delimiters.Contains(toSplit[i]))
{
tokens.Add(toSplit.Substring(idx, i - idx)); // token found
tokens.Add(toSplit[i].ToString()); // delimiter
idx = i + 1; // start idx for the next token
}
}
// last token
tokens.Add(toSplit.Substring(idx));
if (splitOptions == StringSplitOptions.RemoveEmptyEntries)
{
tokens = tokens.Where(token => token.Length > 0).ToList();
}
return tokens.ToArray();
}
Usage example:
string toSplit = "AAA,BBB,CCC;DD;,EE,";
char[] delimiters = new char[] {',', ';'};
string[] tokens = SplitKeepDelimiters(toSplit, delimiters, StringSplitOptions.RemoveEmptyEntries);
foreach (var token in tokens)
{
Console.WriteLine(token);
}
I want to remove all underscores from a string and uppercase the character following each underscore. So for example: _my_string_ becomes: MyString similarly: my_string becomes MyString
Is there a simpler way to do it? I currently have the following (assuming no input has two consecutive underscores):
StringBuilder sb = new StringBuilder();
int i;
for (i = 0; i < input.Length - 1; i++)
{
if (input[i] == '_')
sb.Append(char.ToUpper(input[++i]));
else if (i == 0)
sb.Append(char.ToUpper(input[i]));
else
sb.Append(input[i]);
}
if (i < input.Length && input[i] != '_')
sb.Append(input[i]);
return sb.ToString();
Now I know this is not totally related, but I thought to run some numbers on the implementations provided in the answers, and here are the results in Milliseconds for each implementation using 1000000 iterations of the string: "_my_string_121_a_" :
Achilles: 313
Raj: 870
Damian: 7916
Dmitry: 5380
Equalsk: 574
method utilised:
Stopwatch stp = new Stopwatch();
stp.Start();
for (int i = 0; i < 1000000; i++)
{
sb = Test("_my_string_121_a_");
}
stp.Stop();
long timeConsumed= stp.ElapsedMilliseconds;
In the end I think I'll go with Raj's implementation, because it's just very simple and easy to understand.
This should do it with ToTitleCase, from the System.Globalization namespace:
static string toCamel(string input)
{
TextInfo info = CultureInfo.CurrentCulture.TextInfo;
input= info.ToTitleCase(input).Replace("_", string.Empty);
return input;
}
Shorter (regular expressions), but I doubt if it's better (regular expressions are less readable):
string source = "_my_string_123__A_";
// MyString123A
string result = Regex
// _ + lower case Letter -> upper case letter (thanks to Wiktor Stribiżew)
.Replace(source, @"(_+|^)(\p{Ll})?", match => match.Groups[2].Value.ToUpper())
// all the other _ should be just removed
.Replace("_", "");
Loops over each character and converts to uppercase as necessary.
public string GetNewString(string input)
{
var convert = false;
var sb = new StringBuilder();
foreach (var c in input)
{
if (c == '_')
{
convert = true;
continue;
}
if (convert)
{
sb.Append(char.ToUpper(c));
convert = false;
continue;
}
sb.Append(c);
}
return sb.ToString().First().ToString().ToUpper() + sb.ToString().Substring(1);
}
Usage:
GetNewString("my_string");
GetNewString("___this_is_anewstring_");
GetNewString("___this_is_123new34tring_");
Output:
MyString
ThisIsAnewstring
ThisIs123new34tring
Try with Regex:
var regex = new Regex("^[a-z]|_[a-z]?");
var result = regex.Replace("my_string_1234", x => x.Value== "_" ? "" : x.Value.Last().ToString().ToUpper());
Tested with:
my_string -> MyString
_my_string -> MyString
_my_string_ -> MyString
You need to convert snake case to camel case. You can use this code; it works for me:
var x ="_my_string_".Split(new[] {"_"}, StringSplitOptions.RemoveEmptyEntries)
.Select(s => char.ToUpperInvariant(s[0]) + s.Substring(1, s.Length - 1))
.Aggregate(string.Empty, (s1, s2) => s1 + s2);
x = MyString
static string toCamel(string input)
{
StringBuilder sb = new StringBuilder();
int i;
for (i = 0; i < input.Length; i++)
{
if ((i == 0) || (i > 0 && input[i - 1] == '_'))
sb.Append(char.ToUpper(input[i]));
else
sb.Append(char.ToLower(input[i]));
}
return sb.ToString();
}
My string is like this:
Abc , xyz , pqr
Final output:
Abc,xyz,pqr
I want to remove all surrounding whitespace (leading and trailing) from each word whenever I encounter a comma in my string. The words may be separated by commas, by spaces, or by both.
Eg:
Abc pqr, ttt ooo
output:
Abc,pqr,ttt,ooo
(no space before or after the word)
So all spaces and commas are separators and you want to remove all consecutive duplicates. You can use String.Split with StringSplitOptions.RemoveEmptyEntries and String.Join:
string[] parts = input.Split(new []{' ', ',' }, StringSplitOptions.RemoveEmptyEntries);
string result = string.Join(",", parts);
This is my favourite since it's readable, maintainable and efficient. I've tested it against a regex approach with a 60000-length string and 1000 repetitions:
Regex: 11.68 seconds
String.Split + String.Join: 1.28 seconds
But if the string is very large you might want to use a StringBuilder approach.
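For reference, a self-contained sketch of the Split + Join approach applied to the two inputs from the question. The class and method names are made up for the example.

```csharp
using System;

public static class SeparatorNormalizer
{
    // Hypothetical helper: treats spaces and commas as separators, drops
    // the empty entries between consecutive separators, and rejoins the
    // words with a single comma.
    public static string NormalizeToCommas(string input)
    {
        char[] separators = { ' ', ',' };
        string[] parts = input.Split(separators, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(",", parts);
    }

    public static void Main()
    {
        Console.WriteLine(NormalizeToCommas("Abc , xyz , pqr"));  // Abc,xyz,pqr
        Console.WriteLine(NormalizeToCommas("Abc pqr, ttt ooo")); // Abc,pqr,ttt,ooo
    }
}
```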
This is the best result so far:
public static string SplitAnyKeepSingleSeparator(string input, string separator, params char[] delimiter)
{
if(input == null) return null;
input = input.Trim(delimiter);
StringBuilder sb = new StringBuilder(input.Length);
int index = 0;
int delimiterIndex = input.IndexOfAny(delimiter);
while (delimiterIndex != -1)
{
string token = input.Substring(index, delimiterIndex - index);
sb.Append(token).Append(separator);
index = delimiterIndex + 1;
while (delimiter.Contains(input[index])) index++;
delimiterIndex = input.IndexOfAny(delimiter, index);
}
sb.Append(input.Substring(index));
return sb.ToString();
}
But with a 60,000 character-string it's still less efficient than the String.Split+Join approach.
The simplest and the best in performance:
private static string SplitWordsByComma(string s)
{
return Regex.Replace(s.Trim(' ', ','), @"(?<=\b\w+\b)[\s,]+", ",");
}
fastest (and properly working for bound cases):
private static string SplitWordsByComma(string s)
{
var sb = new StringBuilder(s.Length);
for (int i = 0; i < s.Length; i++)
{
while (i < s.Length && !char.IsLetter(s[i]))
{
i++;
}
while (i < s.Length && char.IsLetter(s[i]))
{
sb.Append(s[i++]);
}
sb.Append(',');
}
return sb.Remove(sb.Length - 1, 1).ToString();
}
This should work :)
string inputStr = "ABC, cde , fgh, IJk";
string outputStr = inputStr.Replace(" ", "");
OR
string outputStr = string.Join(",", inputStr.Split(new char[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries));
Try this one..
string input = "input:plumber, plumber output:plumber,,plumber";
input = input.Replace(" ", ",").Trim();
while (input.Contains(",,"))
{
input = input.Replace(",,", ",");
}
Edited...
I have tested the answers provided and compared with mine. (in VB.net)
Dim input As String = IO.File.ReadAllText("C:\Users\SARVESH\Desktop\abc.txt")
Dim stp As New Diagnostics.Stopwatch
stp.Start()
input = input.Replace(" ", ",").Trim()
While input.Contains(",,")
input = input.Replace(",,", ",")
End While
stp.Stop()
MessageBox.Show(stp.ElapsedMilliseconds)
input = IO.File.ReadAllText("C:\Users\SARVESH\Desktop\abc.txt")
stp.Reset()
stp.Restart()
input = System.Text.RegularExpressions.Regex.Replace(input.Trim(" "c, ","c), "(?<=\b\w+\b)[\s,]+", ",")
stp.Stop()
MessageBox.Show(stp.ElapsedMilliseconds)
input = IO.File.ReadAllText("C:\Users\SARVESH\Desktop\abc.txt")
stp.Reset()
stp.Restart()
Dim parts As String() = input.Split({" "c, ","c}, StringSplitOptions.RemoveEmptyEntries)
Dim result As String = String.Join(",", parts)
stp.Stop()
MessageBox.Show(stp.ElapsedMilliseconds)
Tested on a string of 10616532 characters.
Tim Schmelter's code works much faster; it does the task in 141 ms.
Mine does the same in 512 ms.
regex does the same in 1221 ms.
I want to replace some invalid characters in the name of a file uploaded to my application.
I've searched up to something on the internet and found some complex algorithms to do it, here's one:
public static string RemoverAcentuacao(string palavra)
{
string palavraSemAcento = null;
string caracterComAcento = "áàãâäéèêëíìîïóòõôöúùûüçáàãâÄéèêëíìîïóòõÖôúùûÜç, ?&:/!;ºª%‘’()\"”“";
string caracterSemAcento = "aaaaaeeeeiiiiooooouuuucAAAAAEEEEIIIIOOOOOUUUUC___________________";
if (!String.IsNullOrEmpty(palavra))
{
for (int i = 0; i < palavra.Length; i++)
{
if (caracterComAcento.IndexOf(Convert.ToChar(palavra.Substring(i, 1))) >= 0)
{
int car = caracterComAcento.IndexOf(Convert.ToChar(palavra.Substring(i, 1)));
palavraSemAcento += caracterSemAcento.Substring(car, 1);
}
else
{
palavraSemAcento += palavra.Substring(i, 1);
}
}
string[] cEspeciais = { "#39", "---", "--", "'", "#", "\r\n", "\n", "\r" };
for (int q = 0; q < cEspeciais.Length; q++)
{
palavraSemAcento = palavraSemAcento.Replace(cEspeciais[q], "-");
}
for (int x = (cEspeciais.Length - 1); x > -1; x--)
{
palavraSemAcento = palavraSemAcento.Replace(cEspeciais[x], "-");
}
palavraSemAcento = palavraSemAcento.Replace("+", "-").Replace(Environment.NewLine, "").TrimStart('-').TrimEnd('-').Replace("<i>", "-").Replace("<-i>", "-").Replace("<br>", "").Replace("--", "-");
}
else
{
palavraSemAcento = "indefinido";
}
return palavraSemAcento.ToLower();
}
Is there a way to do it with a less complex algorithm?
This algorithm seems far too complex for such a simple task, but I can't think of anything different.
I want to replace some invalid characters in the name of a file
if this is really what you want then it is easy
string ToLegalFileName(string s)
{
var invalidChars = new HashSet<char>(Path.GetInvalidFileNameChars());
return String.Join("", s.Select(c => invalidChars.Contains(c) ? '_' : c));
}
if your intent is to replace accented chars with their ascii counterparts then
string RemoverAcentuacao(string s)
{
return String.Join("",
s.Normalize(NormalizationForm.FormD)
.Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark));
}
and this is the 3rd version which replaces accented chars + other chars with '_'
string RemoverAcentuacao2(string s)
{
return String.Join("",
s.Normalize(NormalizationForm.FormD)
.Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
.Select(c => char.IsLetterOrDigit(c) ? c : '_')
.Select(c => (int)c < 128 ? c : '_'));
}
A solution using regular expressions:
string ReplaceSpecial(string input, string replace, char replacewith)
{
char[] back = input.ToCharArray();
// Regex.Matches takes the input first, then the pattern
var matches = Regex.Matches(input, String.Format("[{0}]", replace));
foreach (Match m in matches)
back[m.Index] = replacewith;
return new string(back);
}
A somewhat simpler solution using String.Replace:
string ReplaceSpecial(string input, char[] replace, char replacewith)
{
string back = input;
foreach (char i in replace)
back = back.Replace(i, replacewith); // strings are immutable, so keep the result
return back;
}
static string RemoverAcentuacao(string s)
{
string caracterComAcento = "áàãâäéèêëíìîïóòõôöúùûüçáàãâÄéèêëíìîïóòõÖôúùûÜç, ?&:/!;ºª%‘’()\"”“";
string caracterSemAcento = "aaaaaeeeeiiiiooooouuuucAAAAAEEEEIIIIOOOOOUUUUC___________________";
return new String(s.Select(c =>
{
int i = caracterComAcento.IndexOf(c);
return (i == -1) ? c : caracterSemAcento[i];
}).ToArray());
}
Here is a really simple method that I've used recently.
I hope it meets your requirements. To be honest, the code is a bit difficult to read due to the language of the variable declarations.
static readonly List<char> InvalidCharacters = new List<char>() { 'a','b','c' };
static string StripInvalidCharactersFromField(string field)
{
for (int i = 0; i < field.Length; i++)
{
// compare the char directly; the list holds chars, not strings
if (InvalidCharacters.Contains(field[i]))
{
field = field.Remove(i, 1);
i--;
}
}
return field;
}
Is there a better way to replace strings?
I am surprised that Replace does not take in a character array or string array. I guess that I could write my own extension but I was curious if there is a better built in way to do the following? Notice the last Replace is a string not a character.
myString.Replace(';', '\n').Replace(',', '\n').Replace('\r', '\n').Replace('\t', '\n').Replace(' ', '\n').Replace("\n\n", "\n");
You can use a replace regular expression.
s/[;,\t\r ]|[\n]{2}/\n/g
s/ at the beginning means a search
The characters between [ and ] are the characters to search for (in any order)
The second / delimits the search-for text and the replace text
In English, this reads:
"Search for ; or , or \t or \r or (space) or exactly two sequential \n and replace it with \n"
In C#, you could do the following: (after importing System.Text.RegularExpressions)
Regex pattern = new Regex("[;,\t\r ]|[\n]{2}");
myString = pattern.Replace(myString, "\n");
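One caveat worth sketching: the alternation above replaces separators one at a time, so a run such as ";;" becomes two newlines rather than one. Assuming the goal is a single newline per run of separators, a quantified character class does that in one pass. The class and method names below are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SeparatorCollapser
{
    // Hypothetical helper: replaces every run of separator characters
    // (semicolon, comma, tab, CR, LF, space) with a single newline.
    public static string CollapseSeparators(string text)
    {
        return Regex.Replace(text, @"[;,\t\r\n ]+", "\n");
    }

    public static void Main()
    {
        string result = CollapseSeparators("this;is,\ra\t\n\n\ntest");
        // Show the newlines as '|' to make the output easy to read.
        Console.WriteLine(result.Replace("\n", "|"));
        // Prints: this|is|a|test
    }
}
```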
If you are feeling particularly clever and don't want to use Regex:
char[] separators = new char[]{' ',';',',','\r','\t','\n'};
string s = "this;is,\ra\t\n\n\ntest";
string[] temp = s.Split(separators, StringSplitOptions.RemoveEmptyEntries);
s = String.Join("\n", temp);
You could wrap this in an extension method with little effort as well.
Edit: Or just wait 2 minutes and I'll end up writing it anyway :)
public static class ExtensionMethods
{
public static string Replace(this string s, char[] separators, string newVal)
{
string[] temp;
temp = s.Split(separators, StringSplitOptions.RemoveEmptyEntries);
return String.Join( newVal, temp );
}
}
And voila...
char[] separators = new char[]{' ',';',',','\r','\t','\n'};
string s = "this;is,\ra\t\n\n\ntest";
s = s.Replace(separators, "\n");
You could use Linq's Aggregate function:
string s = "the\nquick\tbrown\rdog,jumped;over the lazy fox.";
char[] chars = new char[] { ' ', ';', ',', '\r', '\t', '\n' };
string snew = chars.Aggregate(s, (c1, c2) => c1.Replace(c2, '\n'));
Here's the extension method:
public static string ReplaceAll(this string seed, char[] chars, char replacementCharacter)
{
return chars.Aggregate(seed, (str, cItem) => str.Replace(cItem, replacementCharacter));
}
Extension method usage example:
string snew = s.ReplaceAll(chars, '\n');
This is the shortest way:
myString = Regex.Replace(myString, @"[;,\t\r ]|[\n]{2}", "\n");
Strings are just immutable char arrays.
You just need to make it mutable:
either by using StringBuilder
or by going into the unsafe world and playing with pointers (dangerous though)
and try to iterate through the array of characters as few times as possible. Note the HashSet here, as it avoids traversing the character sequence inside the loop. Should you need an even faster lookup, you can replace the HashSet with an optimized lookup for char (based on an array[256]).
Example with StringBuilder
public static void MultiReplace(this StringBuilder builder,
char[] toReplace,
char replacement)
{
HashSet<char> set = new HashSet<char>(toReplace);
for (int i = 0; i < builder.Length; ++i)
{
var currentCharacter = builder[i];
if (set.Contains(currentCharacter))
{
builder[i] = replacement;
}
}
}
Edit - Optimized version (only valid for ASCII)
public static void MultiReplace(this StringBuilder builder,
char[] toReplace,
char replacement)
{
var set = new bool[256];
foreach (var charToReplace in toReplace)
{
set[charToReplace] = true;
}
for (int i = 0; i < builder.Length; ++i)
{
var currentCharacter = builder[i];
if (set[currentCharacter])
{
builder[i] = replacement;
}
}
}
Then you just use it like this:
var builder = new StringBuilder("my bad,url&slugs");
builder.MultiReplace(new []{' ', '&', ','}, '-');
var result = builder.ToString();
Ohhh, the performance horror!
The answer is a bit outdated, but still...
public static class StringUtils
{
#region Private members
[ThreadStatic]
private static StringBuilder m_ReplaceSB;
private static StringBuilder GetReplaceSB(int capacity)
{
var result = m_ReplaceSB;
if (null == result)
{
result = new StringBuilder(capacity);
m_ReplaceSB = result;
}
else
{
result.Clear();
result.EnsureCapacity(capacity);
}
return result;
}
#endregion
public static string ReplaceAny(this string s, char replaceWith, params char[] chars)
{
if (null == chars)
return s;
if (null == s)
return null;
StringBuilder sb = null;
for (int i = 0, count = s.Length; i < count; i++)
{
var temp = s[i];
var replace = false;
for (int j = 0, cc = chars.Length; j < cc; j++)
if (temp == chars[j])
{
if (null == sb)
{
sb = GetReplaceSB(count);
if (i > 0)
sb.Append(s, 0, i);
}
replace = true;
break;
}
if (replace)
sb.Append(replaceWith);
else
if (null != sb)
sb.Append(temp);
}
return null == sb ? s : sb.ToString();
}
}
You may also simply write these string extension methods, and put them somewhere in your solution:
using System.Text;
public static class StringExtensions
{
public static string ReplaceAll(this string original, string toBeReplaced, string newValue)
{
if (string.IsNullOrEmpty(original) || string.IsNullOrEmpty(toBeReplaced)) return original;
if (newValue == null) newValue = string.Empty;
StringBuilder sb = new StringBuilder();
foreach (char ch in original)
{
if (toBeReplaced.IndexOf(ch) < 0) sb.Append(ch);
else sb.Append(newValue);
}
return sb.ToString();
}
public static string ReplaceAll(this string original, string[] toBeReplaced, string newValue)
{
if (string.IsNullOrEmpty(original) || toBeReplaced == null || toBeReplaced.Length <= 0) return original;
if (newValue == null) newValue = string.Empty;
foreach (string str in toBeReplaced)
if (!string.IsNullOrEmpty(str))
original = original.Replace(str, newValue);
return original;
}
}
Call them like this:
"ABCDE".ReplaceAll("ACE", "xy");
xyBxyDxy
And this:
"ABCDEF".ReplaceAll(new string[] { "AB", "DE", "EF" }, "xy");
xyCxyF
Use RegEx.Replace, something like this:
string input = "This is text with far too much " +
"whitespace.";
string pattern = "[;,]";
string replacement = "\n";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, replacement);
Here's more info on this MSDN documentation for RegEx.Replace
Performance-wise this is probably not the best solution, but it works.
var str = "filename:with&bad$separators.txt";
char[] charArray = new char[] { '#', '%', '&', '{', '}', '\\', '<', '>', '*', '?', '/', ' ', '$', '!', '\'', '"', ':', '#' };
foreach (var singleChar in charArray)
{
str = str.Replace(singleChar, '_');
}
string ToBeReplaceCharacters = @"~()@#$%&+,'""<>|;\/*?";
string fileName = "filename;with<bad:separators?";
foreach (var RepChar in ToBeReplaceCharacters)
{
fileName = fileName.Replace(RepChar.ToString(), "");
}
A .NET Core version for replacing a defined set of chars in a string with a specific char. It leverages the Span type and the string.Create method.
The idea is to prepare a replacement array, so that no actual comparison operations are required for each char of the string. In that respect the replacement process resembles the way a state machine works. To avoid initializing every item of the replacement array, let's store oldChar ^ newChar (XOR'ed) values there, which gives the following benefits:
If a char is not changing: ch ^ ch = 0 - no need to initialize non-changing items
The final char can be found by XOR'ing: ch ^ repl[ch]:
ch ^ 0 = ch - not changed chars case
ch ^ (ch ^ newChar) = newChar - replaced char
So the only requirement would be to ensure that the replacement array is zero-ed when initialized. We'll be using ArrayPool<char> to avoid allocations each time the ReplaceAll method is called. And, in order to ensure that the arrays are zero-ed without expensive call to Array.Clear method, we'll be maintaining a pool dedicated for the ReplaceAll method. We'll be clearing the replacement array (exact items only) before returning it to the pool.
public static class StringExtensions
{
private static readonly ArrayPool<char> _replacementPool = ArrayPool<char>.Create();
public static string ReplaceAll(this string str, char newChar, params char[] oldChars)
{
// If nothing to do, return the original string.
if (string.IsNullOrEmpty(str) ||
oldChars is null ||
oldChars.Length == 0)
{
return str;
}
// If only one character needs to be replaced,
// use the more efficient `string.Replace`.
if (oldChars.Length == 1)
{
return str.Replace(oldChars[0], newChar);
}
// Get a replacement array from the pool.
var replacements = _replacementPool.Rent(char.MaxValue + 1);
try
{
// Initialize the replacement array in the way that
// all elements represent `oldChar ^ newChar`.
foreach (var oldCh in oldChars)
{
replacements[oldCh] = (char)(newChar ^ oldCh);
}
// Create a string with replaced characters.
return string.Create(str.Length, (str, replacements), (dst, args) =>
{
var repl = args.replacements;
foreach (var ch in args.str)
{
dst[0] = (char)(repl[ch] ^ ch);
dst = dst.Slice(1);
}
});
}
finally
{
// Clear the replacement array.
foreach (var oldCh in oldChars)
{
replacements[oldCh] = char.MinValue;
}
// Return the replacement array back to the pool.
_replacementPool.Return(replacements);
}
}
}
I know this question is super old, but I want to offer 2 options that are more efficient:
1st off, the extension method posted by Paul Walls is good but can be made more efficient by using the StringBuilder class, which is like the string data type but made especially for situations where you will be changing string values more than once. Here is a version I made of the extension method using StringBuilder:
public static string ReplaceChars(this string s, char[] separators, char newVal)
{
StringBuilder sb = new StringBuilder(s);
foreach (var c in separators) { sb.Replace(c, newVal); }
return sb.ToString();
}
I ran this operation 100,000 times and using StringBuilder took 73ms compared to 81ms using string. So the difference is typically negligible, unless you're running many operations or using a huge string.
Secondly, here is a 1 liner loop you can use:
foreach (char c in separators) { s = s.Replace(c, '\n'); }
I personally think this is the best option. It is highly efficient and doesn't require writing an extension method. In my testing this ran the 100k iterations in only 63ms, making it the most efficient.
Here is an example in context:
string s = "this;is,\ra\t\n\n\ntest";
char[] separators = new char[] { ' ', ';', ',', '\r', '\t', '\n' };
foreach (char c in separators) { s = s.Replace(c, '\n'); }
Credit to Paul Walls for the first 2 lines in this example.
I also fiddled around with that problem, and found that most of the solutions here are very slow. The fastest one was actually the LINQ + Aggregate method that dodgy_coder posted.
But I thought, well that might be also quite heavy in memory allocations depending upon how many old characters there are. So I came out with this:
The idea here is to keep a cached replacement map of the old characters for the current thread, to save allocations. Other than that, it just works on a character array copy of the input, which is later returned as a string again, and the character array is modified as little as possible.
[ThreadStatic]
private static bool[] replaceMap;
public static string Replace(this string input, char[] oldChars, char newChar)
{
if (input == null) throw new ArgumentNullException(nameof(input));
if (oldChars == null) throw new ArgumentNullException(nameof(oldChars));
if (oldChars.Length == 1) return input.Replace(oldChars[0], newChar);
if (oldChars.Length == 0) return input;
replaceMap = replaceMap ?? new bool[char.MaxValue + 1];
foreach (var oldChar in oldChars)
{
replaceMap[oldChar] = true;
}
try
{
var count = input.Length;
var output = input.ToCharArray();
for (var i = 0; i < count; i++)
{
if (replaceMap[input[i]])
{
output[i] = newChar;
}
}
return new string(output);
}
finally
{
foreach (var oldChar in oldChars)
{
replaceMap[oldChar] = false;
}
}
}
For me this is at most two allocations for the actual input string to work on. A StringBuilder turned out to be much slower for me for some reason. And this is 2 times faster than the LINQ variant.
No "Replace" (Linq only):
string myString = ";,\r\t \n\n=1;;2,,3\r\r4\t\t5 6\n\n\n\n7=";
char NoRepeat = '\n';
string ByeBye = ";,\r\t ";
string myResult = myString.ToCharArray().Where(t => !"STOP-OUTSIDER".Contains(t))
.Select(t => "" + ( ByeBye.Contains(t) ? '\n' : t))
.Aggregate((all, next) => (
next == "" + NoRepeat && all.Substring(all.Length - 1) == "" + NoRepeat
? all : all + next ) );
Having built my own solution, and having looked at the solutions used here, I settled on an answer that doesn't use complex code and is generally efficient for most parameters.
Cover base cases where other methods are more appropriate. If there are no chars to replace, return the original string. If there is only one, just use the Replace method.
Use a StringBuilder and initialize the capacity to the length of the original string. After all, the new string being built will have the same length as the original string if it's just chars being replaced. This ensures only one memory allocation is used for the new string.
Whether the 'chars' collection is small or large will impact performance. Large collections are better served by hash sets, while smaller collections are not. This is a near-perfect use case for a HybridDictionary, which switches to a hash-based lookup once the collection gets too large. However, we don't care about the value of the dictionary, so I just set it to "true".
Having different methods for StringBuilder versus just a string prevents unnecessary memory allocation. If it's just a string, don't instantiate a StringBuilder until the base cases have been checked. If it's already a StringBuilder, then perform the replacements and return the StringBuilder itself (as other StringBuilder methods like Append do).
I put the replacement char first, and the chars to check at the end. This way, I can leverage the params keyword for easily passing additional chars. However, you don't have to do this if you prefer the other order.
namespace Test.Extensions
{
public static class StringExtensions
{
public static string ReplaceAll(this string str, char replacementCharacter, params char[] chars)
{
if (chars.Length == 0)
return str;
if (chars.Length == 1)
return str.Replace(chars[0], replacementCharacter);
StringBuilder sb = new StringBuilder(str.Length);
var searcher = new HybridDictionary(chars.Length);
for (int i = 0; i < chars.Length; i++)
searcher[chars[i]] = true;
foreach (var c in str)
{
if (searcher.Contains(c))
sb.Append(replacementCharacter);
else
sb.Append(c);
}
return sb.ToString();
}
public static StringBuilder ReplaceAll(this StringBuilder sb, char replacementCharacter, params char[] chars)
{
if (chars.Length == 0)
return sb;
if (chars.Length == 1)
return sb.Replace(chars[0], replacementCharacter);
var searcher = new HybridDictionary(chars.Length);
for (int i = 0; i < chars.Length; i++)
searcher[chars[i]] = true;
for (int i = 0; i < sb.Length; i++)
{
var val = sb[i];
if (searcher.Contains(val))
sb[i] = replacementCharacter;
}
return sb;
}
}
}