regex ismatch logic with special character

I know this statement returns false, as expected:
Regex.IsMatch("+", @"[a-zA-Z0-9]")
But why do these statements match, although (from my understanding) they shouldn't?
Regex.IsMatch("C++", @"[a-zA-Z0-9]")
Regex.IsMatch("C++", @"[a-zA-Z0-9]+")

Those match because the pattern is not required to match the entire string: IsMatch succeeds as soon as any substring matches, and here it matches the C in C++.
Use ^ and $ to anchor the pattern to the beginning and end of the string:
bool onlyAlphaNumeric = Regex.IsMatch("C++", @"^[a-zA-Z0-9]+$"); // will be false
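
To see the difference side by side, here is a minimal sketch. It assumes C# 9 or later (top-level statements); the helper name IsAlphaNumeric and the sample input "Csharp9" are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Unanchored: succeeds as soon as ANY substring matches.
Console.WriteLine(Regex.IsMatch("+", @"[a-zA-Z0-9]"));     // False: no alphanumeric character at all
Console.WriteLine(Regex.IsMatch("C++", @"[a-zA-Z0-9]"));   // True: matches the "C"
Console.WriteLine(Regex.IsMatch("C++", @"[a-zA-Z0-9]+"));  // True: still just the "C"

// Anchored: the whole string must consist of alphanumerics.
Console.WriteLine(IsAlphaNumeric("C++"));      // False
Console.WriteLine(IsAlphaNumeric("Csharp9"));  // True

// Hypothetical helper wrapping the anchored pattern from the answer.
static bool IsAlphaNumeric(string s) => Regex.IsMatch(s, @"^[a-zA-Z0-9]+$");
```

One caveat: in .NET, $ also matches just before a final newline, so "abc\n" would pass. If that matters, \z is the stricter end-of-string anchor.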

Related

Regex problems with equal sign?

In C# I'm trying to validate a string that looks like:
I#paramname='test'
or
O#paramname=2827
Here is my code:
string t1 = "I#parameter='test'";
string r = @"^([Ii]|[Oo])#\w=\w";
var re = new Regex(r);
If I take the "=\w" off the end of the variable r, I get True. With the "=\w" after the \w, it's False. I want the characters between # and = to be any alphanumeric value. Anything after the = sign can contain alphanumerics and ' (single quotes). What am I doing wrong here? I have very rarely used regular expressions and can normally find an example, but this is a custom format, and even with cheat sheets I'm having issues.

^([Ii]|[Oo])#\w+=(?<q>'?)[\w\d]+\k<q>$
Regular expression:
^ start of string
([Ii]|[Oo]) either (I or i) or (O or o)
# literal # character
\w+ 1 or more word characters
= equals sign
(?<q>'?) capture 0 or 1 quotes in named group q
[\w\d]+ 1 or more word or digit characters
\k<q> repeat of what was captured in named group q
$ end of string
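
A quick way to check how the named group and its backreference behave is to run the pattern against a few inputs. This is only a sketch, assuming C# 9 or later for top-level statements; the variable name paramPattern and the two unbalanced inputs are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer above: the optional opening quote is captured in
// group q, and \k<q> requires the same text (a quote or nothing) at the end.
var paramPattern = new Regex(@"^([Ii]|[Oo])#\w+=(?<q>'?)[\w\d]+\k<q>$");

string[] samples =
{
    "I#parameter='test'",  // balanced quotes   -> True
    "O#paramname=2827",    // no quotes         -> True
    "I#parameter='test",   // opening quote only -> False
    "I#parameter=test'",   // closing quote only -> False
};

foreach (string s in samples)
{
    Console.WriteLine($"{s} -> {paramPattern.IsMatch(s)}");
}
```

The backreference is what rejects the unbalanced cases: a plain '? on both sides would accept them.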

Use \w+ instead of \w to match one or more characters, or \w* to match zero or more.
Try this:
^([Ii]|[Oo])#\w+=\'*\w+\'*

If you want to be a bit more strict and require the literal name paramname:
^([Ii]|[Oo])#paramname=[']?[\w]+[']?

You could try something like this:
Regex rx = new Regex(@"^([IO])#(\w+)=(.*)$", RegexOptions.IgnoreCase);
Match group 1 will give you the value of I or O (the parameter direction?)
Match group 2 will give you the name of the parameter
Match group 3 will give you the value of the parameter
You could be stricter about the 3rd group and match it as
(([^']+)|('(('')|([^']+))*'))
The first alternative matches one or more non-quote characters; the second alternative matches a quoted string literal in which any embedded quotes are escaped by doubling them, so it would match things like
'' (the empty string)
'foo bar'
'That''s All, Folks!'
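
Here is a rough sketch of how the three groups could be read back, and how the stricter value pattern differs from (.*). It assumes C# 9 or later; the names loose and strict and the sample inputs are hypothetical. The strict pattern is simply the stricter third group substituted into the first regex.

```csharp
using System;
using System.Text.RegularExpressions;

// Loose form from the answer: anything at all is accepted as the value.
var loose = new Regex(@"^([IO])#(\w+)=(.*)$", RegexOptions.IgnoreCase);

// Same regex with the stricter value alternative in place of (.*).
var strict = new Regex(
    @"^([IO])#(\w+)=(([^']+)|('(('')|([^']+))*'))$",
    RegexOptions.IgnoreCase);

Match m = strict.Match("O#msg='That''s All, Folks!'");
if (m.Success)
{
    Console.WriteLine(m.Groups[1].Value); // O
    Console.WriteLine(m.Groups[2].Value); // msg
    Console.WriteLine(m.Groups[3].Value); // 'That''s All, Folks!'
}

// An unterminated literal passes the loose pattern but not the strict one.
Console.WriteLine(loose.IsMatch("O#msg='unterminated"));  // True
Console.WriteLine(strict.IsMatch("O#msg='unterminated")); // False
```

Note that group 3 still includes the surrounding quotes and the doubled inner quote; unescaping the value would be a separate step.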

Replace any character before <usernameredacted#example.com> with an empty string

I have this string
AnyText: "jonathon" <usernameredacted#example.com>
Desired Output Using Regex
AnyText: <usernameredacted#example.com>
Omit anything in between !
I am still a rookie at regular expressions. Could anyone out there help me with the matching & replacing expression for the above scenario?

Try this:
string input = "jonathon <usernameredacted#example.com>";
string output = Regex.Match(input, @"<[^>]+>").Groups[0].Value;
Console.WriteLine(output); // <usernameredacted#example.com>

You could use the following regex to match all the characters that you want to replace with an empty string:
^[^<]*
The first ^ anchors the match to the beginning of the string. The ^ inside the character class negates it, i.e. any character that isn't a < will match. The * is a greedy quantifier. In summary, this regex swallows all characters from the beginning of the string up to (but not including) the first <.
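
In C# that pattern would be used with Regex.Replace. A small sketch, assuming C# 9 or later for top-level statements and using the string from the question:

```csharp
using System;
using System.Text.RegularExpressions;

string input = "AnyText: \"jonathon\" <usernameredacted#example.com>";

// Remove everything from the start of the string up to the first '<'.
string stripped = Regex.Replace(input, @"^[^<]*", "");

Console.WriteLine(stripped); // <usernameredacted#example.com>
```

Be aware that this also removes the leading "AnyText: " label, so on its own it does not produce the exact output the question asks for.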

Here is the way to do it in the VBA flavor: replace "^[^""]*" with "".
^ marks the start of the string.
[^""]* matches any run of characters other than a quote sign (the quote is doubled because it sits inside a VBA string literal).
UPDATE:
Since you mentioned in your additional comment that you want to grab the "From:" and the email address, but none of the junk in between or after, I figure that extracting would be better than replacing. Here is a VBA function written for Excel that returns all the subgroup matches (everything you put in parentheses) and nothing else.
Function RegexExtract(ByVal text As String, _
                      ByVal extract_what As String) As String
    Application.ScreenUpdating = False
    Dim i As Long
    Dim result As String
    Dim allMatches As Object
    Dim RE As Object
    Set RE = CreateObject("vbscript.regexp")
    RE.Pattern = extract_what
    RE.Global = True
    Set allMatches = RE.Execute(text)
    ' Concatenate the submatches (capture groups) of the first match, if any
    If allMatches.Count > 0 Then
        For i = 0 To allMatches.Item(0).submatches.count - 1
            result = result & allMatches.Item(0).submatches.Item(i)
        Next
    End If
    RegexExtract = result
    Application.ScreenUpdating = True
End Function
Using this code, your regex pattern would be: "^(.+: ).+(<.+>).*"
^ denotes the start of the string
(.+: ) denotes the first match group: .+ is one or more characters, followed by : and a space
.+ denotes one or more characters
(<.+>) denotes the second match group: < is a literal <, then .+ for one or more characters, then the final >
.* denotes zero or more characters
So in Excel you'd use (assuming the cell is A1):
=RegexExtract(A1, "^(.+: ).+(<.+>).*")
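
Since the question is tagged C#, here is a rough C# counterpart of the same idea: concatenate the capture groups of the first match. This is a sketch, not a port of the VBA function; it assumes C# 9 or later, and the RegexExtract name is reused only for readability.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

string input = "AnyText: \"jonathon\" <usernameredacted#example.com>";
Console.WriteLine(RegexExtract(input, @"^(.+: ).+(<.+>).*"));
// AnyText: <usernameredacted#example.com>

// Returns the concatenated capture groups of the first match,
// or an empty string when the pattern does not match.
static string RegexExtract(string text, string pattern)
{
    Match m = Regex.Match(text, pattern);
    if (!m.Success) return string.Empty;

    var sb = new StringBuilder();
    // Groups[0] is the whole match, so start at 1.
    for (int i = 1; i < m.Groups.Count; i++)
    {
        sb.Append(m.Groups[i].Value);
    }
    return sb.ToString();
}
```

The same result could be had with Regex.Replace(input, pattern, "$1$2"), which may be simpler when the number of groups is fixed.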

Regex and the colon (:)

I have the following code. The idea is to detect whole words.
bool contains = Regex.IsMatch("Hello1 Hello2", @"\bHello\b"); // yields false
bool contains = Regex.IsMatch("Hello Hello2", @"\bHello\b"); // yields true
bool contains = Regex.IsMatch("Hello: Hello2", @"\bHello\b"); // yields true, but should yield false
Seems that Regex is ignoring the colon. How can I modify the code such that the last line will return false?

\b means "word boundary". The colon is not a word character, so there is a word boundary between Hello and the colon, and the expression matches.
Maybe you want an expression like this:
(^|\s)Hello(\s|$)
Which means: the string "Hello", preceded by either the start of the string or whitespace, and followed by either the end of the string or whitespace.
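
Running the three inputs from the question through that pattern, as a quick sketch (assumes C# 9 or later; the variable name wholeToken is made up):

```csharp
using System;
using System.Text.RegularExpressions;

// "Hello" must be delimited by whitespace or by the string's start/end.
var wholeToken = new Regex(@"(^|\s)Hello(\s|$)");

Console.WriteLine(wholeToken.IsMatch("Hello1 Hello2")); // False
Console.WriteLine(wholeToken.IsMatch("Hello Hello2"));  // True
Console.WriteLine(wholeToken.IsMatch("Hello: Hello2")); // False
```

Keep in mind that the match itself now includes the surrounding whitespace, which matters if you use Match.Value or Regex.Replace rather than IsMatch.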

The Regex isn't ignoring the colon. The position before the colon is where \b matches, because \b matches word boundaries, that is, positions between a word character and a non-word character.
If you want whitespace to follow your word 'Hello', then use "\bHello\s".

To match a whole word not directly followed with a colon, use
\bHello\b(?!:)
\bHello(?![:\w])
Details:
\b - a word boundary
Hello - a word
(?![:\w]) - a negative lookahead that fails the match if there is : or a word char immediately to the right of the current location.
C# code:
bool contains = Regex.IsMatch("Hello: Hello2", @"\bHello\b(?!:)");
Console.WriteLine(contains); // => False
Console.WriteLine(Regex.IsMatch("Hello: Hello2", @"\bHello(?![:\w])"));
// => False

How to ignore regex matches in C#?

An input string:
string datar = "aag, afg, agg, arg";
I am trying to get matches: "aag" and "arg", but following won't work:
string regr = "a[a-z&&[^fg]]g";
string regr = "a[a-z[^fg]]g";
What is the correct way of ignoring regex matches in C#?

The obvious way is to use a[a-eh-z]g, but you could also try a negative lookbehind like this:
string regr = "a[a-z](?<!f|g)g";
Explanation:
a Match the character "a"
[a-z] Match a single character in the range between "a" and "z"
(?<!XXX) Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind)
f|g Match the character "f" or match the character "g"
g Match the character "g"
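
Applied to the string from the question, the lookbehind version could be exercised like this. A sketch only, assuming C# 9 or later; the variable names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string datar = "aag, afg, agg, arg";

// [a-z] consumes the middle letter, then the lookbehind rejects it
// if that letter was f or g.
string[] found = Regex.Matches(datar, @"a[a-z](?<!f|g)g")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(", ", found)); // aag, arg
```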

Character classes aren't quite that fancy. The simple solution is:
a[a-eh-z]g
If you really want to explicitly list out the letters that don't belong, you could try something like:
a[^\W\d_A-Zfg]g
This character class matches everything except:
\W excludes non-word characters, i.e. punctuation, whitespace, and other special characters. What's left are letters, digits, and the underscore _.
\d removes digits so now we have letters and the underscore _.
_ removes the underscore so now we only match letters.
A-Z removes uppercase letters so now we only match lowercase letters.
Finally at this point we can list the individual lowercase letters we don't want to match.
All in all way more complicated than we'd likely ever want. That's regular expressions for ya!
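
To compare the two character classes, here is a small sketch (C# 9 or later assumed; the FindAll helper and the extra test tokens a1g, a_g, aBg and a\u00e9g are invented for illustration). One difference worth knowing: \w in .NET is Unicode-aware by default, so the negated class also lets non-ASCII letters through, while a-eh-z does not.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Last token contains U+00E9 (e with acute accent) as the middle letter.
string data = "aag, afg, agg, arg, a1g, a_g, aBg, a\u00e9g";

// Explicit ranges: ASCII lowercase letters except f and g.
Console.WriteLine(FindAll(data, @"a[a-eh-z]g"));       // aag, arg

// Negated class: word characters minus digits, underscore, A-Z, f and g.
// This keeps the Unicode letter as well.
Console.WriteLine(FindAll(data, @"a[^\W\d_A-Zfg]g"));  // aag, arg, a\u00e9g

// Hypothetical helper: joins all match values with ", ".
static string FindAll(string input, string pattern) =>
    string.Join(", ", Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));
```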

What you're using is Java's set intersection syntax:
a[a-z&&[^fg]]g
...meaning the intersection of the two sets ('a' THROUGH 'z') and (ANYTHING EXCEPT 'f' OR 'g'). No other regex flavor that I know of uses that notation. The .NET flavor uses the simpler set subtraction syntax:
a[a-z-[fg]]g
...that is, the set ('a' THROUGH 'z') minus the set ('f', 'g').
Java demo:
String s = "aag, afg, agg, arg, a%g";
Matcher m = Pattern.compile("a[a-z&&[^fg]]g").matcher(s);
while (m.find())
{
    System.out.println(m.group());
}
C# demo:
string s = @"aag, afg, agg, arg, a%g";
foreach (Match m in Regex.Matches(s, @"a[a-z-[fg]]g"))
{
    Console.WriteLine(m.Value);
}
Output of both is
aag
arg

Try this if you want match arg and aag:
a[ar]g
If you want to match everything except afg and agg, you need this regex:
a[^fg]g

It seems like you're trying to match any three alphabetic characters, with the condition that the second character cannot be f or g. If this is the case, why not use the following regular expression:
string regr = "a[a-eh-z]g";

Regex: a[a-eh-z]g.
Then use Regex.Matches to get the matched substrings.

regular expression that matches a string which comprises of only specific letters

I've tried several regex combinations to figure this out, but one condition or another always fails.
I have an input string that may only contain characters from a given set,
let's say A, B or C.
How do I match something like this?
ABBBCCC -- isMatch true
AAASDFDCCC -- isMatch false
P.S. I'm using C#.

^[ABC]+$
That should be enough: it uses a character class (character set).
The anchors '^' and '$' are there to ensure that the whole string contains only those characters, from start to end.
Regex.Match("ABACBA", "^[ABC]+$"); // => matches
Note that a character set does not constrain the order of the characters matched.
Regex.Match("ABACBA", "^A+B+C+$"); // => no match
would enforce the order.
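
Put together with the inputs from the question, a minimal sketch might look like this (C# 9 or later assumed; the helper names OnlyAbc and AbcInOrder are made up):

```csharp
using System;
using System.Text.RegularExpressions;

Console.WriteLine(OnlyAbc("ABBBCCC"));     // True
Console.WriteLine(OnlyAbc("AAASDFDCCC"));  // False: S, D and F are not allowed
Console.WriteLine(OnlyAbc("ABACBA"));      // True: order does not matter

Console.WriteLine(AbcInOrder("ABBBCCC"));  // True
Console.WriteLine(AbcInOrder("ABACBA"));   // False: letters are out of order

// Whole string made up of A, B and C only, in any order.
static bool OnlyAbc(string s) => Regex.IsMatch(s, "^[ABC]+$");

// One or more A, then one or more B, then one or more C.
static bool AbcInOrder(string s) => Regex.IsMatch(s, "^A+B+C+$");
```

Both patterns are case-sensitive as written; RegexOptions.IgnoreCase would be needed to accept lowercase input.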

I think you are looking for this:
Match m = Regex.Match("abracadabra", "^[ABC]*$");
if (m.Success) {
    // Match
}
