Regex for special characters? - c#

string Val = Regex.Replace(TextBox1.Text, @"[^a-z, A-z, 0-9]", string.Empty);
This expression does not match the characters ^ and _. What should I do to match those values?
One more thing: if the TextBox1.Text value is longer than 10 characters, the last character (the 11th) should match.

Note that ^ has a special meaning when it is the first character inside square brackets: it negates the character class, so the class matches every character except those listed.
If you want to match a literal "^", put the caret anywhere other than directly after the opening bracket, or escape it with a backslash: "\^".
To match every non-word character as well as the underscore, use:
[\W_]
Note that this class alone does not limit the string to 10 characters; that requires a separate length check or quantifier.
string Val = Regex.Replace(TextBox1.Text, @"[\W_]", string.Empty);

Your problem is A-z.
This matches all ASCII letters A through Z, then the characters that lie between Z and a (which include, among others, ^ and _), then all ASCII letters between a and z. Because the class is negated, ^ and _ won't be matched by your regex (and neither will the comma and space, which you also included in the class).
To clarify, your regex could also have been written as
[^a-zA-Z0-9\[\\\]^_` ,]
You probably wanted
string Val = Regex.Replace(TextBox1.Text, @"[^a-zA-Z0-9]", string.Empty);
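To see the difference between the two classes, here is a minimal sketch. The class and method names (RangeDemo, OriginalPattern, KeepAlphanumeric) are made up for illustration and are not from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class RangeDemo
{
    // The pattern from the question: the A-z range also covers [ \ ] ^ _ and `,
    // and the comma and space are listed literally, so none of them are removed.
    public static string OriginalPattern(string input) =>
        Regex.Replace(input, @"[^a-z, A-z, 0-9]", string.Empty);

    // The corrected pattern: removes everything except ASCII letters and digits.
    public static string KeepAlphanumeric(string input) =>
        Regex.Replace(input, @"[^a-zA-Z0-9]", string.Empty);
}
```

With the input "a^b_c, d!", OriginalPattern should remove only the "!" (giving "a^b_c, d"), while KeepAlphanumeric should give "abcd".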

Related

Regex for including a special character

I have a requirement to remove all special characters from a string except " and '.
ClientName = Regex.Replace(ClientName, @"\(.*?\)", " ").Trim();
This is the regex I am using. I want to remove all special characters except " and '.
Example:
clientName= "S"unny, Cool. Mr"
Output should be
"S"unny Cool Mr"

Consider using the following pattern:
@"[^\p{L}\p{Nd}'""\s]+"
This matches runs of special characters while leaving single and double quotes, as well as whitespace, untouched.
string clientName = "S\"unny, Cool. Mr";
string output = Regex.Replace(clientName, @"[^\p{L}\p{Nd}'""\s]+", "");
Console.WriteLine(output);
This prints:
S"unny Cool Mr
The character classes \p{L} and \p{Nd} represent all Unicode letters and decimal digits, so placing them in a negated character class removes anything that is not a letter or digit (apart from the quotes and whitespace listed alongside them).

Regex.IsMatch is not working when text including "$"

Regex.IsMatch method returns the wrong result while checking the following condition,
string text = "$0.00";
Regex compareValue = new Regex(text);
bool result = compareValue.IsMatch(text);
The above code returns False. Please let me know if I missed anything.

The Regex class has a special method for escaping characters in a pattern: Regex.Escape()
Change your code like this:
string text = "$0.00";
Regex compareValue = new Regex(Regex.Escape(text)); // Escape characters in text
bool result = compareValue.IsMatch(text);

"$" is a special character in C# regex. Escape it first.
Regex compareValue = new Regex(@"\$0\.00");
bool result = compareValue.IsMatch("$0.00");
Regex expressions: https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx

Both '.' and '$' are special characters, so you need to escape them if you want to match the character itself. '.' matches any character and '$' matches the end of a string.
see: https://regex101.com/r/pK2uY6/1

You have to escape $ since it is a special (reserved) character which means "end of string". In case . means just dot (say, decimal separator) you have to escape it as well (when not escaped, . means "any symbol"):
string pattern = @"\$0\.00";
bool result = Regex.IsMatch(text, pattern);
As for your original pattern, it has no chance to match any string, since $0.00 means
$ end of string, followed by
0 zero
. any character
0 zero
0 zero
but end of string can't be followed by...
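The answers above can be checked with a small sketch. The class and method names (EscapeDemo, MatchesAsPattern, MatchesLiterally) are hypothetical; only Regex.IsMatch and Regex.Escape are real .NET calls.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class EscapeDemo
{
    // Uses the text directly as a pattern, so $ and . keep their special meaning.
    public static bool MatchesAsPattern(string pattern, string input) =>
        Regex.IsMatch(input, pattern);

    // Escapes the text first, so every character is matched literally.
    public static bool MatchesLiterally(string literal, string input) =>
        Regex.IsMatch(input, Regex.Escape(literal));
}
```

For "$0.00" tested against itself, MatchesAsPattern should return false and MatchesLiterally should return true. Regex.Escape("$0.00") produces the same \$0\.00 pattern that the other answers write by hand.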

Regex matches unspecified ampersand character in C#.NET

I'm trying to match a set of characters with a pattern. But ampersand is matching without specifying. Could you please explain why Regex behaves like this?
string input = "<font face=\"Verdana\">É-øá-É-</font><font face=\"Arial\"> ;&: ant ;ghj\n</font>";
Regex Matcher = new Regex("</font><font face=\"[\\w\\s-_]+\">[ -,:;\\.\\r\\n\\/\\]\\)]+");
string output = Matcher.Match(input).Value;
I need the output to be
"</font><font face=\"Arial\"> ;"
since the characters allowed after the font start tag don't include the & character.
But the actual output I'm getting is
"</font><font face=\"Arial\"> ;&: "
Why does this regex match the & character too?

You should escape the dash -.
[ -,
means match all characters between the space and the comma.
SPACE => 32
COMMA => 44
AMPERSAND => 38 (matches)
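A minimal sketch of the effect, using a shortened version of the character class from the question. The class name DashDemo and the method FirstMatch are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class DashDemo
{
    // Returns the text of the first match, or an empty string if there is none.
    public static string FirstMatch(string pattern, string input) =>
        Regex.Match(input, pattern).Value;
}
```

For the input " ;&: ant", the pattern @"[ -,:;]+" (unescaped dash, so a range from space to comma) should match " ;&: ", while @"[ \-,:;]+" (escaped dash) should stop before the ampersand and match only " ;".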

You have forgotten to escape the dash '-'. Change it to this:
Regex Matcher = new Regex("</font><font face=\"[\\w\\s-_]+\">[ \\-,:;\\.\\r\\n\\/\\]\\)]+");

How to ignore regex matches in C#?

An input string:
string datar = "aag, afg, agg, arg";
I am trying to get matches: "aag" and "arg", but following won't work:
string regr = "a[a-z&&[^fg]]g";
string regr = "a[a-z[^fg]]g";
What is the correct way of ignoring regex matches in C#?

The obvious way is to use a[a-eh-z]g, but you could also try a negative lookbehind like this:
string regr = "a[a-z](?<!f|g)g";
Explanation:
a Match the character "a"
[a-z] Match a single character in the range between "a" and "z"
(?<!XXX) Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind)
f|g Match the character "f" or match the character "g"
g Match the character "g"

Character classes aren't quite that fancy. The simple solution is:
a[a-eh-z]g
If you really want to explicitly list out the letters that don't belong, you could try something like:
a[^\W\d_A-Zfg]g
This character class matches everything except:
\W excludes non-word characters, i.e. punctuation, whitespace, and other special characters. What's left are letters, digits, and the underscore _.
\d removes digits so now we have letters and the underscore _.
_ removes the underscore so now we only match letters.
A-Z removes uppercase letters so now we only match lowercase letters.
Finally at this point we can list the individual lowercase letters we don't want to match.
All in all way more complicated than we'd likely ever want. That's regular expressions for ya!

What you're using is Java's set intersection syntax:
a[a-z&&[^fg]]g
...meaning the intersection of the two sets ('a' THROUGH 'z') and (ANYTHING EXCEPT 'f' OR 'g'). No other regex flavor that I know of uses that notation. The .NET flavor uses the simpler set subtraction syntax:
a[a-z-[fg]]g
...that is, the set ('a' THROUGH 'z') minus the set ('f', 'g').
Java demo:
String s = "aag, afg, agg, arg, a%g";
Matcher m = Pattern.compile("a[a-z&&[^fg]]g").matcher(s);
while (m.find())
{
    System.out.println(m.group());
}
C# demo:
string s = @"aag, afg, agg, arg, a%g";
foreach (Match m in Regex.Matches(s, @"a[a-z-[fg]]g"))
{
    Console.WriteLine(m.Value);
}
Output of both is
aag
arg

Try this if you want to match only arg and aag:
a[ar]g
If you want to match everything except afg and agg, you need this regex (note that [^fg] also matches non-letters, so a%g would match too):
a[^fg]g

It seems like you're trying to match any three alphabetic characters, with the condition that the second character cannot be f or g. If this is the case, why not use the following regular expression:
string regr = "a[a-eh-z]g";

Regex: a[a-eh-z]g.
Then use Regex.Matches to get the matched substrings.
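The patterns suggested in the answers above can be compared side by side with a small sketch. The class name ExcludeDemo and the method FindAll are hypothetical; Regex.Matches is the real .NET call.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class ExcludeDemo
{
    // Collects the text of every match of the pattern in the input.
    public static string[] FindAll(string input, string pattern) =>
        Regex.Matches(input, pattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();
}
```

On the input "aag, afg, agg, arg, a%g", the patterns a[a-eh-z]g, a[a-z-[fg]]g and a[a-z](?<!f|g)g should each return aag and arg, whereas a[^fg]g should additionally return a%g because its class is not limited to letters.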

Regex to match alphanumeric and spaces

What am I doing wrong here?
string q = "john s!";
string clean = Regex.Replace(q, @"([^a-zA-Z0-9]|^\s)", string.Empty);
// clean == "johns". I want "john s";

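Just an FYI:
string clean = Regex.Replace(q, @"[^a-zA-Z0-9\s]", string.Empty);
could also be written more compactly as
string clean = Regex.Replace(q, @"[^\w\s]", string.Empty);
Note that \w also keeps the underscore and, in .NET, non-ASCII letters and digits, so the two patterns are not strictly equivalent.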

Try this:
string clean = Regex.Replace(dirty, "[^a-zA-Z0-9\x20]", String.Empty);
\x20 is the ASCII hex code for the space character.
You can add more individual characters that you want to allow.
For example, if you want "?" to be kept in the returned string, add \x3f.

I got it:
string clean = Regex.Replace(q, @"[^a-zA-Z0-9\s]", string.Empty);
Didn't know you could put \s in the brackets
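As a rough sketch of how the patterns in this thread behave, the helper below applies a pattern and removes every match. The names CleanDemo and Clean are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class CleanDemo
{
    // Removes every match of the pattern from the input.
    public static string Clean(string input, string pattern) =>
        Regex.Replace(input, pattern, string.Empty);
}
```

For "john s!", the question's pattern @"([^a-zA-Z0-9]|^\s)" should give "johns", because the space is not alphanumeric and is removed by the first alternative, while @"[^a-zA-Z0-9\s]" should give "john s". The \w variant is broader: for "john_s é!", @"[^\w\s]" should keep the underscore and the é, whereas @"[^a-zA-Z0-9\s]" removes both.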

The following regex allows spaces in a textbox.
Regex r = new Regex("^[a-zA-Z\\s]+");
r.IsMatch(textbox1.Text);
This works fine for me. Note that without a trailing $ it only checks the beginning of the text.

I suspect ^ doesn't work the way you think it does outside of a character class.
What you're telling it to do is replace everything that isn't an alphanumeric with an empty string, OR any leading space. I think what you mean to say is that spaces are ok to not replace - try moving the \s into the [] class.

There appear to be two problems.
You're using the ^ outside a [], where it matches the start of the line.
You're not using a * or +, which means you will only match a single character at a time.
I think you want the following regex: @"([^a-zA-Z0-9\s])+"

The regex below keeps spaces and supports letters from different cultures:
string input = "78-selim güzel667.,?";
Regex regex = new Regex(@"[^\w\x20]|[\d]");
var result = regex.Replace(input, "");
// selim güzel

The circumflex inside the square brackets negates the class: it matches every character except those listed. Outside square brackets, a circumflex anchors the match to the start of the string.

This regex checks that the string starts with at least one alphanumeric character and otherwise contains only alphanumerics and the special characters _ (underscore), \s (whitespace), and - (hyphen):
string comparer = "string you want to compare";
Regex r = new Regex(@"^([a-zA-Z0-9]+[_\s-]*)+$");
return r.IsMatch(comparer);
[a-zA-Z0-9]+ is a set of alphanumeric characters; the + quantifier requires at least one of them.
[_\s-]* is a set of special characters; the * quantifier allows zero or more of them.
Wrapping both sets in a repeated capture group, ([a-zA-Z0-9]+[_\s-]*)+, requires the whole string to consist of one or more such sequences.
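A minimal sketch of this validation wrapped in a method. The class name ValidationDemo and the method IsValid are hypothetical; the pattern is the one from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class ValidationDemo
{
    // One or more groups of alphanumerics, each optionally followed by
    // underscores, whitespace, or hyphens, anchored to the whole string.
    private static readonly Regex Pattern =
        new Regex(@"^([a-zA-Z0-9]+[_\s-]*)+$");

    public static bool IsValid(string comparer) => Pattern.IsMatch(comparer);
}
```

Under this pattern, "abc_def" and "abc def-1" should be accepted, while "_abc" (starts with a special character), "abc!" (contains a character outside both sets) and the empty string should be rejected.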

[RegularExpression(@"^[A-Z]+[a-zA-Z""'\s-]*$")]
The pattern above also accepts spaces.
