Regex pattern failing

I am trying to strip everything from a string that is not a letter, a number, or a space, so I created this regex:
private static Regex _NonAlphaChars = new Regex("[^[A-Za-z0-9 ]]", RegexOptions.Compiled);
However, when I call _NonAlphaChars.Replace("Scott,", ""); it returns "Scott,".
What am I doing wrong that keeps it from matching the comma?

private static Regex _NonAlphaChars =
new Regex("[^A-Za-z0-9 ]", RegexOptions.Compiled);

You did something funny with the double bracketing. Change it to just
[^A-Za-z0-9 ]
Dropping your original expression into The Regex Coach produced this explanation:
The regular expression is a sequence consisting of the expression '[^[A-Za-z0-9 ]' and the character ']'.
For contrast, the explanation of the alternative I wrote is:
The regular expression is a character class representing everything but the range of characters from the character 'A' to the character 'Z', the range of characters from the character 'a' to the character 'z', the range of characters from the character '0' to the character '9', and the character ' '.
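
To see the difference in practice, here is a minimal sketch that runs both patterns against the input from the question. The class and method names (NonAlphaDemo, StripBroken, StripFixed) are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NonAlphaDemo
{
    // Original pattern: the negated class "[^[A-Za-z0-9 ]" followed by a literal ']'.
    private static readonly Regex Broken =
        new Regex("[^[A-Za-z0-9 ]]", RegexOptions.Compiled);

    // Corrected pattern: a single negated character class.
    private static readonly Regex Fixed =
        new Regex("[^A-Za-z0-9 ]", RegexOptions.Compiled);

    public static string StripBroken(string s) => Broken.Replace(s, "");
    public static string StripFixed(string s) => Fixed.Replace(s, "");

    public static void Main()
    {
        Console.WriteLine(StripBroken("Scott,"));   // Scott,  (no ']' after the comma, so no match)
        Console.WriteLine(StripBroken("Scott,]"));  // Scott   (",]" is the only thing it can match)
        Console.WriteLine(StripFixed("Scott,"));    // Scott
    }
}
```

The second call shows what the original pattern really looks for: one unwanted character immediately followed by a closing bracket.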

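Try this
[^A-Za-z0-9\s]
or
\W
Note that \s keeps every whitespace character (tabs and newlines too), and \W also strips spaces while keeping underscores, so neither is an exact equivalent of the original intent.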

Related

Search for string with few missing characters

I have a big array of strings (characters a-z, A-Z and digits 0-9). I also have an input string in which a few characters are missing (example: /np/tSt/ing/), where '/' represents a missing character. I have to find every string in my array that can be formed from the input string by replacing each '/' with any character (example: inputString1, AnpAtStAngA, 1np2tSt3ing4...). Is there any non-brute-force solution for this problem?

You can use regular expressions for this. Something like .np.tSt.ing.
public static Regex regex = new Regex(".np.tSt.ing.", RegexOptions.CultureInvariant | RegexOptions.Compiled);
...
bool IsMatch = regex.IsMatch(InputText);
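
As a rough sketch of how this could be applied to the whole array, the code below builds the pattern from the input string instead of hard-coding it. The names WildcardSearch, BuildPattern and FindMatches are hypothetical, and the ^ and $ anchors are an addition on my part: without them IsMatch also accepts longer strings that merely contain a match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WildcardSearch
{
    // Builds an anchored pattern from a template in which '/' marks a missing character.
    public static Regex BuildPattern(string template)
    {
        // Escape each literal piece, then join the pieces with '.' (any character).
        string body = string.Join(".", template.Split('/').Select(Regex.Escape));
        return new Regex("^" + body + "$", RegexOptions.CultureInvariant);
    }

    // Returns every candidate that the template can be completed to.
    public static List<string> FindMatches(IEnumerable<string> candidates, string template)
    {
        Regex regex = BuildPattern(template);
        return candidates.Where(c => regex.IsMatch(c)).ToList();
    }

    public static void Main()
    {
        var candidates = new[] { "inputString1", "1np2tSt3ing4", "xxinputString1", "inputStrin" };
        foreach (string match in FindMatches(candidates, "/np/tSt/ing/"))
            Console.WriteLine(match);   // inputString1, then 1np2tSt3ing4
    }
}
```

This still tests every string in the array once, but each test is a single regex match rather than a hand-written character comparison.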

Regex.IsMatch is not working when text including "$"

The Regex.IsMatch method returns the wrong result when checking the following condition:
string text = "$0.00";
Regex compareValue = new Regex(text);
bool result = compareValue.IsMatch(text);
The above code returns False. Please let me know if I missed anything.

The Regex class has a special method for escaping characters in a pattern: Regex.Escape()
Change your code like this:
string text = "$0.00";
Regex compareValue = new Regex(Regex.Escape(text)); // Escape characters in text
bool result = compareValue.IsMatch(text);

"$" is a special character in C# regex. Escape it first.
Regex compareValue = new Regex(@"\$0\.00");
bool result = compareValue.IsMatch("$0.00");
Regex expressions: https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx

Both '.' and '$' are special characters, so you need to escape them if you want to match the character itself. '.' matches any character and '$' matches the end of a string.
See: https://regex101.com/r/pK2uY6/1

You have to escape $ since it is a special (reserved) character which means "end of string". In case . means just dot (say, decimal separator) you have to escape it as well (when not escaped, . means "any symbol"):
string pattern = @"\$0\.00";
bool result = Regex.IsMatch(text, pattern);
As for your original pattern, it has no chance to match any string, since $0.00 means
$ end of string, followed by
0 zero
. any character
0 zero
0 zero
but end of string can't be followed by...
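
A small sketch that puts the two variants side by side: the raw text used as a pattern, and the same text passed through Regex.Escape first. The class and method names (EscapeDemo, MatchesRaw, MatchesEscaped) are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Uses the text itself as the pattern, metacharacters and all.
    public static bool MatchesRaw(string text) => new Regex(text).IsMatch(text);

    // Escapes the text first so that every character is matched literally.
    public static bool MatchesEscaped(string text) =>
        new Regex(Regex.Escape(text)).IsMatch(text);

    public static void Main()
    {
        string text = "$0.00";
        Console.WriteLine(Regex.Escape(text));    // \$0\.00
        Console.WriteLine(MatchesRaw(text));      // False
        Console.WriteLine(MatchesEscaped(text));  // True
    }
}
```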

Add prefix to special characters with Regular Expressions

I have a list of special characters: ^ $ ( ) % . [ ] * + - ?. I want to put % in front of each of these special characters in a string value.
I need this to generate a Lua script to use in Redis.
For example, Test$String? must be changed to Test%$String%?.
Is there any way to do this with regular expressions in C#?

In C#, you just need a Regex.Replace:
var LuaEscapedString = Regex.Replace(input, @"[][$^()%.*+?-]", "%$&");
See the regex demo
The [][$^()%.*+?-] character class matches a single character, one of ], [, $, ^, (, ), %, ., *, +, ? or -. The replacement pattern prepends a % character and reinserts the matched character with the $& substitution.
A lookahead is just a redundant overhead here (or a show-off trick for your boss).
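
For reference, a minimal sketch that wraps the replacement above in a reusable helper. LuaPatternEscaper and its Escape method are hypothetical names, and the second input is just an extra sample I picked to exercise the brackets and the dash.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LuaPatternEscaper
{
    // Same character class as above: ']' right after '[' and '-' right before ']' are literals.
    private static readonly Regex LuaSpecial = new Regex(@"[][$^()%.*+?-]");

    // Prefixes every special character with '%'; "$&" reinserts the matched character.
    public static string Escape(string input) => LuaSpecial.Replace(input, "%$&");

    public static void Main()
    {
        Console.WriteLine(Escape("Test$String?"));  // Test%$String%?
        Console.WriteLine(Escape("a-b[1]"));        // a%-b%[1%]
    }
}
```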

You can use lookaheads and replace with %
/(?=[]*+$?)[(.-])/
Regex Demo
(?=[]*+$?)[(.-]) Positive lookahead: checks whether the next character is one of those listed in the character class. If so, the empty match in front of it is replaced with %.

You can use this regex: ([\\^$()%.\\[\\]*+\\-?])
It will match and capture characters inside the character class. Then you can use $1 to reference the captured character and insert % before it, like so: %$1.
Here is an example code and demo:
string input = "Test$String?";
string pattern = "([\\^$()%.\\[\\]*+\\-?])";
string replacement = "%$1";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, replacement);
Console.WriteLine("Original String: {0}", input);
Console.WriteLine("Replacement String: {0}", result);

You can use the regex (?=[\^\$()\%.\[\]*+\-\?]) and replace each match with "%".

Regex matches unspecified ampersand character in C#.NET

I'm trying to match a set of characters with a pattern, but the ampersand is matched even though I did not specify it. Could you please explain why Regex behaves like this?
string input = "<font face=\"Verdana\">É-øá-É-</font><font face=\"Arial\"> ;&: ant ;ghj\n</font>";
Regex Matcher = new Regex("</font><font face=\"[\\w\\s-_]+\">[ -,:;\\.\\r\\n\\/\\]\\)]+");
string output = Matcher.Match(input).Value;
I need the output to be
"</font><font face=\"Arial\"> ;"
since the characters allowed after the font start tag don't include the & character.
But the actual output I'm getting is
"</font><font face=\"Arial\"> ;&: "
Why does this regex match the & character too?

You should escape the dash -.
[ -,
means match any character between the space and the comma.
SPACE => 32
COMMA => 44
AMPERSAND => 38 (matches)

You have forgotten to escape the dash '-'. Change it to this:
Regex Matcher = new Regex("</font><font face=\"[\\w\\s-_]+\">[ \\-,:;\\.\\r\\n\\/\\]\\)]+");
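
The effect of the unescaped dash can be reproduced with a much smaller pattern. The sketch below uses a shortened input and a reduced character class, both chosen only for illustration, and a hypothetical helper named FirstMatch.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DashRangeDemo
{
    // Returns the text of the first match of the pattern in the input.
    public static string FirstMatch(string pattern, string input) =>
        Regex.Match(input, pattern).Value;

    public static void Main()
    {
        string input = " ;&: ant";

        // ' ' is 32, '&' is 38, ',' is 44, so '&' falls inside the range " -,".
        Console.WriteLine($"{(int)' '} {(int)'&'} {(int)','}");          // 32 38 44

        // Unescaped dash: a range from space to comma, which includes '&'.
        Console.WriteLine("[" + FirstMatch("[ -,:;]+", input) + "]");    // [ ;&: ]

        // Escaped dash: only space, '-', ',', ':' and ';' are allowed.
        Console.WriteLine("[" + FirstMatch(@"[ \-,:;]+", input) + "]");  // [ ;]
    }
}
```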

How to ignore regex matches in C#?

An input string:
string datar = "aag, afg, agg, arg";
I am trying to get the matches "aag" and "arg", but the following won't work:
string regr = "a[a-z&&[^fg]]g";
string regr = "a[a-z[^fg]]g";
What is the correct way of ignoring regex matches in C#?

The obvious way is to use a[a-eh-z]g, but you could also try a negative lookbehind like this:
string regr = "a[a-z](?<!f|g)g";
Explanation:
a Match the character "a"
[a-z] Match a single character in the range between "a" and "z"
(?<!XXX) Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind)
f|g Match the character "f" or match the character "g"
g Match the character "g"

Character classes aren't quite that fancy. The simple solution is:
a[a-eh-z]g
If you really want to explicitly list out the letters that don't belong, you could try something like:
a[^\W\d_A-Zfg]g
This character class matches everything except:
\W excludes non-word characters, i.e. punctuation, whitespace, and other special characters. What's left are letters, digits, and the underscore _.
\d removes digits so now we have letters and the underscore _.
_ removes the underscore so now we only match letters.
A-Z removes uppercase letters so now we only match lowercase letters.
Finally at this point we can list the individual lowercase letters we don't want to match.
All in all way more complicated than we'd likely ever want. That's regular expressions for ya!

What you're using is Java's set intersection syntax:
a[a-z&&[^fg]]g
...meaning the intersection of the two sets ('a' THROUGH 'z') and (ANYTHING EXCEPT 'f' OR 'g'). No other regex flavor that I know of uses that notation. The .NET flavor uses the simpler set subtraction syntax:
a[a-z-[fg]]g
...that is, the set ('a' THROUGH 'z') minus the set ('f', 'g').
Java demo:
String s = "aag, afg, agg, arg, a%g";
Matcher m = Pattern.compile("a[a-z&&[^fg]]g").matcher(s);
while (m.find())
{
System.out.println(m.group());
}
C# demo:
string s = @"aag, afg, agg, arg, a%g";
foreach (Match m in Regex.Matches(s, @"a[a-z-[fg]]g"))
{
Console.WriteLine(m.Value);
}
Output of both is
aag
arg

Try this if you want to match arg and aag:
a[ar]g
If you want to match everything except afg and agg, you need this regex:
a[^fg]g

It seems like you're trying to match three-letter sequences that start with a and end with g, with the condition that the second character cannot be f or g. If this is the case, why not use the following regular expression:
string regr = "a[a-eh-z]g";

Regex: a[a-eh-z]g.
Then use Regex.Matches to get the matched substrings.
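
To compare the patterns suggested in the answers above, here is a small sketch that runs each of them with Regex.Matches over the test string from the C# demo (the one with the extra a%g entry). ExcludeLettersDemo and Find are made-up names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ExcludeLettersDemo
{
    // Collects the text of every match of the pattern in the input.
    public static string[] Find(string pattern, string input) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        string s = "aag, afg, agg, arg, a%g";
        string[] patterns =
        {
            "a[a-eh-z]g",        // explicit ranges
            "a[a-z-[fg]]g",      // .NET character class subtraction
            "a[a-z](?<!f|g)g",   // negative lookbehind
            "a[^fg]g",           // negated class: also accepts non-letters
        };

        foreach (string pattern in patterns)
            Console.WriteLine($"{pattern}: {string.Join(" ", Find(pattern, s))}");
        // The first three print "aag arg"; a[^fg]g prints "aag arg a%g".
    }
}
```

The first three agree on this input, while a[^fg]g also picks up a%g, which matters if the middle character must be a letter.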
