Regex not matching when input string contains an ampersand

Regex not matching when input string contains an ampersand - c#

I am trying to come up with a regex that starts with a letter followed by only letters, spaces, commas, dots, ampersands, apostrophes and hyphens.
However, the ampersand character is giving me headaches. Whenever it appears in the input string, the regex no longer matches.
I am using the regex in an ASP.net project using C# in the 'Format' property of a TextInput (a custom control created in the project). In it, I am using Regex.IsMatch(Text, Format) to match it.
For example, using this regex:
^[a-zA-Z][a-zA-Z&.,'\- ]*$
The results are:
John' william-david Pass
John, william'david allen--tony-'' Pass
John, william&david Fail
Whenever I put a & in the input string the regex no longer matches, but without it everything works fine.
How can I fix my issue? Why would the ampersand be causing a problem?
Notes:
I've tried to escape the ampersand with ^[a-zA-Z][a-zA-Z\&.,'\- ]*$ but it has the same issue
I've tried to put the ampersand at the beginning or end o ^[a-zA-Z][&a-zA-Z.,'\- ]*$ or ^[a-zA-Z][a-zA-Z.,'\-\& ]*$ but it also doesn't work

Your problem is somewhere else. The following expression evaluates to true:
Regex.IsMatch(#"John, william&david", #"^[a-zA-Z][a-zA-Z&.,'\- ]*$")
See https://dotnetfiddle.net/WDvQNP

You mentioned in the comments that your problem pertains to C#, so I'll answer your question in that context. If ampersand (&) is truly giving you issues in your character class, you should specify it in an alternate manner.
Luckily, C# supports hex escape sequences which means that you can specifying & as \x26.
For example, instead of:
^[a-zA-Z][a-zA-Z&.,'\- ]*$
use
^[a-zA-Z][a-zA-Z\x26.,'\- ]*$
If that doesn't fix your issue, then your issue is not the &, it's something else.

Related

ASP.net core RegularException attribute - multiple conditions

I have two regex that should be matched:
"^[a-z0-9\\!#\\$\\^&\\-\\+%\\=_\\(\\)\\{\\}\\<\\>'\";\\:/\\.,~`\\|\\\\]+$"
and
".*(g[o0]+gle).*"
The first one accept any alpha numeric character (with few more extras). Like helloworld123. The second one should reject any string that contain the word "google" (in diffrent forms - like: gooo0gle).
Allowed:
hello
helloworld
helloworld123
Disallowed:
hellogoogle
google
...
I want to use the RegularExpression to match this string. Thought about something like:
[RegularExpression("^[a-z0-9\\!#\\$\\^&\\-\\+%\\=_\\(\\)\\{\\}\\<\\>'\";\\:/\\.,~`\\|\\\\]+$|.*(g[o0]+gle).*"]
But it's not working since the second part (.*(g[o0]+gle).*) should be NOT.
How to do it right?
Thanks.

You can use your second regex by placing it in a negative look ahead and use the first regex as character set and combine both to get following regex that you can use,
^(?!.*g[o0]+gle)[-a-z0-9!#$^&+%=_(){}<>'";:\/.,~`|]+$
Here, this (?!.*g[o0]+gle) negative look ahead will reject any strings that contains google or any variation as supported by your regex, and this character set [-a-z0-9!#$^&+%=_(){}<>'";:\/.,~|]+` will match one or more characters allowed by it.
Also, you don't need to escape most special characters while they are in character set, hence I have unescaped most of them except / and also always place the hyphen - either as the very first character or very last character in the character set, else depending upon the regex dialects, you may see weird behavior.
Regex Demo

Conditional match without false force a match?

I'm using the following regex in c# to match some input cases:
^
(?<entry>[#])?
(?(entry)(?<id>\w+))
(?<value>.*)
$
The options are ignoring pattern whitespaces.
My input looks as follows:
hello
#world
[xxx]
This all can be tested here: DEMO
My problem is that this regex will not match the last line. Why?
What I'm trying to do is to check for an entry character. If it's there I force an identifier by \w+. The rest of the input should be captured in the last group.
This is a simplyfied regex and simplyfied input.
The problem can be fixed if I change the id regex to something like (?(entry)(?<id>\w+)|), (?(entry)(?<id>\w+))? or (?(entry)(?<id>\w+)?).
I try to understand why the conditional group doesn't match as stated in original regex.
I'm firm in regex and know that the regex can be simplyfied to ^(\#(?<id>\w+))?(?<value>.*)$ to match my needs. But the real regex contains two more optional groups:
^
(?<entry>[#])?
(\?\:)?
(\(\?(?:\w+(?:-\w+)?|-\w+)\))?
(?(entry)(?<id>\w+))
(?<value>.*)
$
That's the reason why I'm trying to use a conditional match.
UPDATE 10/12/2018
I tested a little arround it. I found the following regex that should match on every input, even an empty one - but it doesn't:
(?(a)a).*
DEMO
I'm of the opinion that this is a bug in .net regex and reported it to microsoft: See here for more information

There is no error in the regex parser, but in one's usage of the . wildcard specifier. The . specifier will consume all characters, wait for it, except the linefeed character \n. (See Character Classes in Regular Expressions "the any character" .])
If you want your regex to work you need to consume all characters including the linefeed and that can be done by specify the option SingleLine. Which to paraphrase what is said
Singline tells the parser to handle the . to match all characters including the \n.
Why does it still fail when not in singleline mode for the other lines are consumed? That is because the final match actually places the current position at the \n and the only option (as specified is use) is the [.*]; which as we mentioned cannot consume it, hence stops the parser. Also the $ will lock in the operations at this point.
Let me demonstrate what is happening by a tool I have created which illustrates the issue. In the tool the upper left corner is what we see of the example text. Below that is what the parser sees with \r\n characters represented by ↵¶ respectively. Included in that pane is what happens to be matched at the time in yellow boxes enclosing the match. The middle box is the actual pattern and the final right side box shows the match results in detail by listening out the return structures and also showing the white space as mentioned.
Notice the second match (as index 1) has world in group capture id and value as ↵.
I surmise your token processor isn't getting what you want in the proper groups and because one doesn't actually see the successful match of value as the \r, it is overlooked.
Let us turn on Singline and see what happens.
Now everything is consumed, but there is a different problem. :-)

Use OR in Regex Expression

I have a regex to match the following:
somedomain.com/services/something
Basically I need to ensure that /services is present.
The regex I am using and which is working is:
\/services*
But I need to match /services OR /servicos. I tried the following:
(\/services|\/servicos)*
But this shows 24 matches?! https://regex101.com/r/jvB1lr/1
How to create this regex?

The (\/services|\/servicos)* matches 0+ occurrences of /services or /servicos, and that means it can match an empty string anywhere inside the input string.
You can group the alternatives like /(services|servicos) and remove the * quantifier, but for this case, it is much better to use a character class [oe] as the strings only differ in 1 char.
You want to use the following pattern:
/servic[eo]s
See the regex demo
To make sure you match a whole subpart, you may append (?:/|$) at the pattern end, /servic[eo]s(?:/|$).
In C#, you may use Regex.IsMatch with the pattern to see if there is a match in a string:
var isFound = Regex.IsMatch(s, #"/servic[eo]s(?:/|$)");
Note that you do not need to escape / in a .NET regex as it is not a special regex metacharacter.
Pattern details
/ - a /
servic[eo]s - services or servicos
(?:/|$) - / or end of string.

Well the * quantifier means zero or more, so that is the problem. Remove that and it should work fine:
(\/services|\/servicos)
Keep in mind that in your example, you have a typo in the URL so it will correctly not match anything as it stands.
Here is an example with the typo in the URL fixed, so it shows 1 match as expected.

First off you specify C# (really .Net is the library which holds regex not the language) in this post but regex101 in your example is set to PHP. That is providing you with invalid information such as needed to escape a forward slash / with \/ which is unnecessary in .Net regular expressions. The regex language is the same but there are different tools which behave differently and php is not like .Net regex.
Secondly the star * on the ( ) is saying that there may be nothing in the parenthesis and your match is getting null nothing matches on every word.
Thirdly one does not need to split the whole word. I would just extract the commonality in the words into a set [ ]. That will allow the "or-ness" you need to match on either services or servicos. Such as
(/servic[oe]s)
Will inform you if services are found or not. Nothing else is needed.

How to add apostrophe as allowed character to existing regular expression?

I have the following regular expression for one of my name fields in C# web app:
^[A-Za-zÀ-ſ0-9.,#&-/'_!#;]?[a-zA-ZÀ-ſ0-9 '#&-/.,_:!#;]*[A-Za-zÀ-ſ0-9.,#-/_!#;]$
How can I properly modify it to add apostrophe/single quote character (') as an allowed character to it?

' is used for declaring a char, so put a backslash in front of the ' to escape it, like this \'.

It turned out that the RegEx has been fine, and it was the way the data has been input into database that caused the problem. Insert statements should have the apostrophes escaped. Even though the apostrophes were getting displayed correctly, they had been failing the RegEx check due to lack of escaping apostrophes. Thanks for your advice and sorry in case of any disapointment!

Character 'e' is not recognized by simple regular expression - why?

I wrote a very simple regular expression that need to match the next pattern:
word.otherWord
- Word must have at least 2 characters and must not start with digit.
I wrote the next expression:
[a-zA-Z][a-zA-Z](.[a-zA-Z0-9])+
I tested it using Regex tester and it seems to be working at most of the cases but when I try some inputs that ends with 'e' it's not working.
for example:
Hardware.Make does not work but Hardware.Makee is works fine, why? How can I fix it?

That's because your regex looks for inputs which length is even.
You have two characters matched by [a-zA-Z][a-zA-Z] and then another two characters matched by (.[a-zA-Z0-9]) as a group which is repeated one or more times (because of +).
You can see it here: http://regex101.com/r/fW2bC1
I think you need that:
[a-zA-Z]+(\.[a-zA-Z0-9]+)+

Actually, the dot is a regex metacharacter, which stands for "any character". You'll need to escape the dot.
For your situation, I'd do this:
[a-zA-Z]{2,}\.[a-zA-Z0-9]+
The {2,} means, at least 2 characters from the previous range.

In regex, the dot period is one of the most commonly used metacharacters and unfortunately also commonly misused metacharacter. The dot matches a single character without caring what that character is...
So u would also re-write it like
[a-zA-Z]+(\.[a-zA-Z0-9]+)+

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.