Replacing special chars in a string with a single unique char - c#

I have a string like so:
string inputStr = "Name*&^%LastName*##";
The following Regex will replace all the special chars with a '-'
Regex rgx = new Regex("[^a-zA-Z0-9 - _]");
someStr = rgx.Replace(someStr, "-");
That produces an output something like:
Name---LastName---
How do I replace '---' with a single '-' so the output looks like this:
Name-LastName
So the question is how do I replace all the special chars with a single '-'?
Regards.

Try this
Regex rgx = new Regex("[^a-zA-Z0-9 \- _]+");//note - character is escaped
or
Regex rgx = new Regex("[^a-zA-Z0-9 _-]+");//or use - as last character
But this will give Name-LastName- Is this okay or..?
If you don't need - at last position you can use the following code as well. Credit goes to
#MatthewStrawbridge. You can see in comments.
string someStr = rgx.Replace(inputStr, "-").TrimEnd('-');
will output Name-LastName.
Edit: As #pguardiario pointed in comments updated my answer to escape - since range([]) has special meaning for - character. If we need - as a literal we need to escape it or make it first or last character of the character class in order to behave as literal.

Related

How replace whitespaces (unicode to utf-8) with a regex C#

I'm trying to do a replace regex in C #. The method that I'm trying to write replace some unicode character (spaces) by normal space in UTF-8.
Let me explain with code. I'm not good writting regular expressions, culture information and regex.
//This method replace white spaces in unicode by whitespaces UTF-8
public static string cleanUnicodeSpaces(string value)
{
//This first pattern works but, remove other special characteres
//For example: mark accents
//string pattern = #"[^\u0000-\u007F]+";
string cleaned = "";
string pattern = #"[^\u0020\u0009\u000D]+"; //Unicode characters
string replacement = ""; //Replace by UTF-8 space
Regex regex = new Regex(pattern);
cleaned = regex.Replace(value, replacement).Trim(); //Trim by quit spaces
return cleaned;
}
Unicode spaces
HT:U+0009 = Character tabulation
LF:U+000A = Line Feed
CR:U+000D = Carriage Return
What I doing wrong?
Source
Unicode Characteres: https://unicode-table.com/en
White Spaces:https://en.wikipedia.org/wiki/Whitespace_character
Regex: https://msdn.microsoft.com/es-es/library/system.text.regularexpressions.regex(v=vs.110).aspx
SOLUTION
Thanks to #wiktor-stribiżew and #mathias-r-jessen, solution:
string pattern = #"[\u0020\u0009\u000D\u00A0]+";
//I include \u00A0 for replace &nbsp
Your regex - [^\u0020\u0009\u000D]+ - is a negated character class that matches any 1+ chars other than a regular space (\u0020), tab (\u0009) and carriage return (\u000D). You actually are looking for a positive character class that would match one of the three chars you indicated (\x0A for a newline, \x0D for a carriage return and \x09 for a tab) in the question with a regular space (\x20).
You may just use
var res = Regex.Replace(s, #"[\x0A\x0D\x09]", " ");
See the regex demo

Regex.IsMatch is not working when text including "$"

Regex.IsMatch method returns the wrong result while checking the following condition,
string text = "$0.00";
Regex compareValue = new Regex(text);
bool result = compareValue.IsMatch(text);
The above code returns as "False". Please let me know if i missed anything.
The Regex class has a special method for escaping characters in a pattern: Regex.Escape()
Change your code like this:
string text = "$0.00";
Regex compareValue = new Regex(Regex.Escape(text)); // Escape characters in text
bool result = compareValue.IsMatch(text);
"$" is a special character in C# regex. Escape it first.
Regex compareValue = new Regex(#"\$0\.00");
bool result = compareValue.IsMatch("$0.00");
Regex expressions: https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx
Both '.' and '$' are special characters and thus you need to escape them if you want to match the character itself. '.' matches any character and '$' matches the end of a string
see: https://regex101.com/r/pK2uY6/1
You have to escape $ since it is a special (reserved) character which means "end of string". In case . means just dot (say, decimal separator) you have to escape it as well (when not escaped, . means "any symbol"):
string pattern = #"\$0\.00";
bool result = RegEx.IsMatch(text, pattern);
As for your original pattern, it has no chance to match any string, since $0.00 means
$ end of string, followed by
0 zero
. any character
0 zero
0 zero
but end of string can't be followed by...

Regex to remove specific string if exist

I wanna remove the -L from the end of my string if exists
So
ABCD => ABCD
ABCD-L => ABCD
at the moment I'm using something like the line below which uses the if/else type of arrangement in my Regex, however, I have a feeling that it should be way more easier than this.
var match = Regex.Match("...", #"(?(\S+-L$)\S+(?=-L)|\S+)");
How about just doing:
Regex rgx = new Regex("-L$");
string result = rgx.Replace("ABCD-L", "");
So basically: if the string ends with -L, replace that part with an empty string.
If you want to not only invoke the replacement at the end of the string, but also at the end of a word, you can add an additional switch to detect word boundaries (\b) in addition to the end of the string:
Regex rgx = new Regex("-L(\b|$)");
string result = rgx.Replace("ABCD-L ABCD ABCD-L", "");
Note that detecting word boundaries can be a little ambiguous. See here for a list of characters that are considered to be word characters in C#.
You also can use String.Replace() method to find a specific string inside a string and replace it with another string in this case with an empty string.
http://msdn.microsoft.com/en-us/library/fk49wtc1(v=vs.110).aspx
Use Regex.Replace function,
Regex.Replace(string, #"(\S+?)-L(?=\s|$)", "$1")
DEMO
Explanation:
( group and capture to \1:
\S+? non-whitespace (all but \n, \r, \t, \f,
and " ") (1 or more times)
) end of \1
-L '-L'
(?= look ahead to see if there is:
\s whitespace (\n, \r, \t, \f, and " ")
| OR
$ before an optional \n, and the end of
the string
) end of look-ahead
You certainly can use Regex for this, but why when using normal string functions is clearer?
Compare this:
text = text.EndsWith("-L")
? text.Substring(0, text.Length - "-L".Length)
: text;
to this:
text = Regex.Replace(text, #"(\S+?)-L(?=\s|$)", "$1");
Or better yet, define an extension method like this:
public static string RemoveIfEndsWith(this string text, string suffix)
{
return text.EndsWith(suffix)
? text.Substring(0, text.Length - suffix.Length)
: text;
}
Then your code can look like this:
text = text.RemoveIfEndsWith("-L");
Of course you can always define the extension method using the Regex. At least then your calling code looks a lot cleaner and is far more readable and maintainable.

Regex removing empty spaces when using replace

My situation is not about removing empty spaces, but keeping them. I have this string >[database values] which I would like to find. I created this RegEx to find it then go in and remove the >, [, ]. The code below takes a string that is from a document. The first pattern looks for anything that is surrounded by >[some stuff] it then goes in and "removes" >, [, ]
string decoded = "document in string format";
string pattern = #">\[[A-z, /, \s]*\]";
string pattern2 = #"[>, \[, \]]";
Regex rgx = new Regex(pattern);
Regex rgx2 = new Regex(pattern2);
foreach (Match match in rgx.Matches(decoded))
{
string replacedValue= rgx2.Replace(match.Value, "");
Console.WriteLine(match.Value);
Console.WriteLine(replacedValue);
What I am getting in first my Console.WriteLine is correct. So I would be getting things like >[123 sesame St]. But my second output shows that my replace removes not just the characters but the spaces so I would get something like this 123sesameSt. I don't see any space being replaced in my Regex. Am I forgetting something, perhaps it is implicitly in a replace?
The [A-z, /, \s] and [>, \[, \]] in your patterns are also looking for commas and spaces. Just list the characters without delimiting them, like this: [A-Za-z/\s]
string pattern = #">\[[A-Za-z/\s]*\]";
string pattern2 = #"[>,\[\]]";
Edit to include Casimir's tip.
After rereading your question (if I understand well) I realize that your two steps approach is useless. You only need one replacement using a capture group:
string pattern = #">\[([^]]*)]";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(yourtext, "$1");
pattern details:
>\[ # literals: >[
( # open the capture group 1
[^]]* # all that is not a ]
) # close the capture group 1
] # literal ]
the replacement string refers to the capture group 1 with $1
By defining [>, \[, \]] in pattern2 you define a character group consisting of single characters like >, ,, , [ and every other character you listed in the square brackets. But I guess you don't want to match space and ,. So if you don't want to match them leave them out like
string pattern2 = #"[>\[\]]";
Alternatively, you could use
string pattern2 = #"(>\[|\])";
Thereby, you either match >[ or ] which better expresses your intention.

Regex to match alphanumeric and spaces

What am I doing wrong here?
string q = "john s!";
string clean = Regex.Replace(q, #"([^a-zA-Z0-9]|^\s)", string.Empty);
// clean == "johns". I want "john s";
just a FYI
string clean = Regex.Replace(q, #"[^a-zA-Z0-9\s]", string.Empty);
would actually be better like
string clean = Regex.Replace(q, #"[^\w\s]", string.Empty);
This:
string clean = Regex.Replace(dirty, "[^a-zA-Z0-9\x20]", String.Empty);
\x20 is ascii hex for 'space' character
you can add more individual characters that you want to be allowed.
If you want for example "?" to be ok in the return string add \x3f.
I got it:
string clean = Regex.Replace(q, #"[^a-zA-Z0-9\s]", string.Empty);
Didn't know you could put \s in the brackets
The following regex is for space inclusion in textbox.
Regex r = new Regex("^[a-zA-Z\\s]+");
r.IsMatch(textbox1.text);
This works fine for me.
I suspect ^ doesn't work the way you think it does outside of a character class.
What you're telling it to do is replace everything that isn't an alphanumeric with an empty string, OR any leading space. I think what you mean to say is that spaces are ok to not replace - try moving the \s into the [] class.
There appear to be two problems.
You're using the ^ outside a [] which matches the start of the line
You're not using a * or + which means you will only match a single character.
I think you want the following regex #"([^a-zA-Z0-9\s])+"
bottom regex with space, supports all keyboard letters from different culture
string input = "78-selim güzel667.,?";
Regex regex = new Regex(#"[^\w\x20]|[\d]");
var result= regex.Replace(input,"");
//selim güzel
The circumflex inside the square brackets means all characters except the subsequent range. You want a circumflex outside of square brackets.
This regex will help you to filter if there is at least one alphanumeric character and zero or more special characters i.e. _ (underscore), \s whitespace, -(hyphen)
string comparer = "string you want to compare";
Regex r = new Regex(#"^([a-zA-Z0-9]+[_\s-]*)+$");
if (!r.IsMatch(comparer))
{
return false;
}
return true;
Create a set using [a-zA-Z0-9]+ for alphanumeric characters, "+" sign (a quantifier) at the end of the set will make sure that there will be at least one alphanumeric character within the comparer.
Create another set [_\s-]* for special characters, "*" quantifier is to validate that there can be special characters within comparer string.
Pack these sets into a capture group ([a-zA-Z0-9]+[_\s-]*)+ to say that the comparer string should occupy these features.
[RegularExpression(#"^[A-Z]+[a-zA-Z""'\s-]*$")]
Above syntax also accepts space

Categories