Regex - optional suffix triggered by dash - c#

I have a simple format that I'm already validating, but I want to allow users to append an optional dash followed by whatever they want, while still validating the first part. The dash should act as a trigger that tells the regex it can accept whatever comes after.
So if my existing regex is something like:
^\d{7}
then I want to be able to update my regex to pass these:
1234567-Covid19
1234567-Scenario
1234567-AnyString
but not these since they are missing the dash:
12345678
1234567*AnyString
1245567AnyString
Any help is much appreciated!

I fiddled around with a regex tester a bit and came up with the below to satisfy my initial question:
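^\d{7}(-[A-Za-z0-9]{1,10})?$
This optionally matches a dash + 1-10 additional alphanumeric characters. The trailing $ makes the pattern cover the whole input, so 12345678 and 1234567*AnyString are rejected.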
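A minimal sketch of how this could be checked in C#, assuming the pattern is anchored with a trailing $ so that the whole input has to match. The names idPattern and IsValidId are made up for illustration. Note that \d in .NET also matches non-ASCII digits unless you write [0-9] or use RegexOptions.ECMAScript.

```csharp
using System;
using System.Text.RegularExpressions;

// Seven digits, optionally followed by a dash and 1-10 alphanumeric characters.
// The trailing $ is an assumption: it forces the whole input to match.
var idPattern = new Regex(@"^\d{7}(-[A-Za-z0-9]{1,10})?$");

// Hypothetical helper name, not from the original post.
bool IsValidId(string text) => idPattern.IsMatch(text);

string[] samples =
{
    "1234567",
    "1234567-Covid19",
    "1234567-Scenario",
    "1234567-AnyString",
    "12345678",
    "1234567*AnyString",
    "1245567AnyString",
};

foreach (string sample in samples)
{
    Console.WriteLine($"{sample}: {IsValidId(sample)}");
}
```

The first four samples should print True and the last three False.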

Related

ASP.NET Core RegularExpression attribute - multiple conditions

I have two regex that should be matched:
"^[a-z0-9\\!#\\$\\^&\\-\\+%\\=_\\(\\)\\{\\}\\<\\>'\";\\:/\\.,~`\\|\\\\]+$"
and
".*(g[o0]+gle).*"
The first one accepts any alphanumeric character (plus a few extras), like helloworld123. The second one should reject any string that contains the word "google" in its different forms, like gooo0gle.
Allowed:
hello
helloworld
helloworld123
Disallowed:
hellogoogle
google
...
I want to use the RegularExpression attribute to validate this string. I thought about something like:
[RegularExpression("^[a-z0-9\\!#\\$\\^&\\-\\+%\\=_\\(\\)\\{\\}\\<\\>'\";\\:/\\.,~`\\|\\\\]+$|.*(g[o0]+gle).*")]
But it doesn't work, because the second part (.*(g[o0]+gle).*) should be negated.
How to do it right?
Thanks.
You can put your second regex in a negative lookahead and use the first regex as the character set, combining both into the following regex:
^(?!.*g[o0]+gle)[-a-z0-9!#$^&+%=_(){}<>'";:\/.,~`|]+$
Here, the negative lookahead (?!.*g[o0]+gle) rejects any string that contains google or any variation supported by your regex, and the character set [-a-z0-9!#$^&+%=_(){}<>'";:\/.,~`|]+ matches one or more of the allowed characters.
Also, you don't need to escape most special characters inside a character set, so I have unescaped all of them except /. Always place the hyphen - as either the very first or the very last character in the character set; otherwise, depending on the regex dialect, you may see weird behavior.
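A rough sketch of the combined pattern in C#, using Regex.IsMatch directly instead of the attribute so it runs on its own. The names combined and IsAllowedName are hypothetical. In a verbatim string the double quote is written as "", which is the only change from the regex above.

```csharp
using System;
using System.Text.RegularExpressions;

// Negative lookahead rejects any "google" variant; the character set
// then has to cover the entire input. Inside @"..." a quote is doubled.
var combined = new Regex(@"^(?!.*g[o0]+gle)[-a-z0-9!#$^&+%=_(){}<>'"";:\/.,~`|]+$");

// Hypothetical helper name, used only for this illustration.
bool IsAllowedName(string text) => combined.IsMatch(text);

string[] samples = { "hello", "helloworld", "helloworld123", "hellogoogle", "google", "gooo0gle" };

foreach (string sample in samples)
{
    Console.WriteLine($"{sample}: {IsAllowedName(sample)}");
}
```

If you keep the [RegularExpression] attribute, the same pattern string should work there as well, since the pattern is already anchored at both ends.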

Regex to allow periods unless it's alone

I'm trying to create a name verification regex that allows users to use names like St. Germain but I don't want names that are only a period like . which it currently accepts.
My current regex is /^[A-Za-z\ -\.\']+$/
Taken from @Mong Zhu's example but allowing first word without dots as well:
\w+\.?\s?\w+
Brief
Your current regex has a potential bug: \ -\. is a range, which matches any character from space to dot. I'm not sure whether this is the intended behaviour; if so, you can use the second regex below.
Code
Version 1
^(?!\.+$)[a-zA-Z .'-]+$
Version 2
^(?!\.+$)[a-zA-Z -.']+$
Results
Input
username
.
Something.a
...
.Some
some. some
some.
Output
Note: Only matches are shown below
username
Something.a
.Some
some. some
some.
Explanation
^ Assert position at the start of the line
(?!\.+$) Negative lookahead ensuring what follows is not the dot character \. literally, one or more times, asserting the ending position at the end of the line
[a-zA-Z .'-]+ Any character in the set a-zA-Z .'- between one and unlimited times
$ Assert position at the end of the line
Additionally
You may want to use \p{L} instead of a-zA-Z to accept letters from other alphabets
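For reference, a small C# sketch that runs Version 1 against the inputs listed in the results above. The names namePattern and IsValidName are placeholders of mine, and "St. Germain" is added from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Version 1: the lookahead rejects inputs made only of dots,
// the character set allows letters, space, dot, apostrophe and hyphen.
var namePattern = new Regex(@"^(?!\.+$)[a-zA-Z .'-]+$");

// Hypothetical helper name for this illustration.
bool IsValidName(string text) => namePattern.IsMatch(text);

string[] samples = { "username", ".", "Something.a", "...", ".Some", "some. some", "some.", "St. Germain" };

foreach (string sample in samples)
{
    Console.WriteLine($"'{sample}': {IsValidName(sample)}");
}
```

Only "." and "..." should come back as False.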

Regex lookbehind only when contains colon

Today I use the C# Regex.IsMatch function to match a key:value format.
I have some code that checks whether a string has the format key:value (like H:15).
The regex pattern that I am using today is: [D,H,M,S]:[1-9]+\d?
I want to add support for a default key: when the input is 15, I would like to treat it as H:15.
So I need to improve my regex to support either key:value or only the value (without a colon); H:15 is good and 15 is also good.
I tried to use the regex alternation (|), something like: ([D,H,M,S]:[1-9]+\d?)|([1-9]+\d?)
But now it matches more things, like :1 and H:01, which are bad input for me.
I also tried to use a lookbehind, without success.
Any help would be greatly appreciated,
Nadav.
This should do the trick:
\b(?:[DHMS]:|(?<!:))[1-9][0-9]*\b
So, either match [DHMS]: or a word boundary not preceded by :.
Also, [1-9]+\d? looks very suspicious to me, so I replaced it with [1-9][0-9]*. Note that in .NET \d is not equivalent to [0-9] because it includes Unicode digits as well.
Looks like Avinash just beat me to it, but I added word boundaries with this expression, which works well in tests.
\b(?<=[DHMS]:)?[1-9]\d*\b
Seems like you want something like this,
@"^(?:[DHMS]:)?[1-9]\d*$"
[DHMS] matches a single character from the given list. ? after the non-capturing group will turn the key part to an optional one. \d* matches zero or more digit characters.
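If you also need to know which key was used, one possible sketch is to add named groups to the anchored pattern and fall back to H when the key is missing. The group names key and value, the ParseEntry helper and the choice of [0-9] over \d are my own assumptions, not part of the answers above.

```csharp
using System;
using System.Text.RegularExpressions;

// Optional "key:" prefix followed by a number without leading zeros.
// The named groups are an addition for illustration.
var entryPattern = new Regex(@"^(?:(?<key>[DHMS]):)?(?<value>[1-9][0-9]*)$");

// Returns the key (defaulting to "H") and the digits, or null for invalid input.
(string Key, string Value)? ParseEntry(string text)
{
    Match match = entryPattern.Match(text);
    if (!match.Success)
    {
        return null;
    }

    // The key group is unsuccessful when the input had no "key:" prefix.
    string key = match.Groups["key"].Success ? match.Groups["key"].Value : "H";
    return (key, match.Groups["value"].Value);
}

foreach (string sample in new[] { "H:15", "M:7", "15", ":1", "H:01" })
{
    var entry = ParseEntry(sample);
    Console.WriteLine(entry is null
        ? $"{sample} -> rejected"
        : $"{sample} -> {entry.Value.Key}:{entry.Value.Value}");
}
```

With this sketch, 15 is reported as H:15, while :1 and H:01 are rejected.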

Why is this regex not allowing this text?

I have a username validator IsValidUsername, and I am testing "baconman" but it is failing, could someone please help me out with this regex?
if (!Regex.IsMatch(str, @"^[a-zA-Z]\\w+|[0-9][0-9_]*[a-zA-Z]+\\w*$")) {
    isValid = false;
}
I want the restrictions to be: (It's very close)
Be between 5 & 17 characters long
contain at least one letter
no spaces
no special characters
You're escaping unnecessarily: if you prefix the string with @ (a verbatim string), you don't need both backslashes; one is enough.
Either:
@"\w"
or
"\\w"
Edit: I didn't make this clear: right now, because of the double escaping, your regex looks for a literal \ followed by w. So the input would need [some character]\w to match (for example, "a\w" or "a\wwwwww" would match).
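To make the difference concrete, here is a short sketch comparing the three ways of writing the pattern. Only the first alternative of the original regex is used, anchored at both ends, and the variable names are mine.

```csharp
using System;
using System.Text.RegularExpressions;

// Verbatim string, single backslash: the regex engine sees \w (word character).
bool verbatimSingle = Regex.IsMatch("baconman", @"^[a-zA-Z]\w+$");

// Regular string, doubled backslash: the compiler turns \\ into \, so the engine also sees \w.
bool regularDouble = Regex.IsMatch("baconman", "^[a-zA-Z]\\w+$");

// Verbatim string, doubled backslash: the engine sees \\w+,
// which means a literal backslash followed by one or more 'w' characters.
bool verbatimDouble = Regex.IsMatch("baconman", @"^[a-zA-Z]\\w+$");
bool verbatimDoubleOnBackslash = Regex.IsMatch(@"a\www", @"^[a-zA-Z]\\w+$");

Console.WriteLine(verbatimSingle);            // True
Console.WriteLine(regularDouble);             // True
Console.WriteLine(verbatimDouble);            // False
Console.WriteLine(verbatimDoubleOnBackslash); // True
```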
Your requirements are best taken care of in normal C#. They don't map well to a regular expression. Just code them up using LINQ which works on strings like it would on an IEnumerable<char>.
Also, understanding a query of a string is much easier than understanding a Regex with the requirements that you have.
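A possible LINQ version, as a sketch only. I'm assuming "no special characters" means ASCII letters, digits and underscore (roughly what \w was being used for in the question); the helper names are made up.

```csharp
using System;
using System.Linq;

// Assumption: only ASCII letters count as letters here.
bool IsAsciiLetter(char c) => (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');

// Assumption: allowed characters are ASCII letters, digits and underscore.
bool IsAllowedChar(char c) => IsAsciiLetter(c) || (c >= '0' && c <= '9') || c == '_';

// Each requirement from the question becomes one readable condition.
bool IsValidUsername(string name) =>
    name.Length >= 5 && name.Length <= 17   // between 5 and 17 characters
    && name.All(IsAllowedChar)              // no spaces, no special characters
    && name.Any(IsAsciiLetter);             // at least one letter

foreach (string sample in new[] { "baconman", "12345", "1234a", "abcd", "bacon man" })
{
    Console.WriteLine($"{sample}: {IsValidUsername(sample)}");
}
```

Each rule can be changed independently here, which is harder to do inside a single regex.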
It is possible to do everything as part of a Regex, however it is not pretty :-)
^(\w(?=\w*[a-zA-Z])|[a-zA-Z]|\w(?<=[a-zA-Z]\w*)){5,17}$
It does 3 checks, each of which always results in exactly 1 character being matched (so we can perform the length check at the end):
Either the character is any word character \w that comes before a [a-zA-Z]
Or it is [a-zA-Z]
Or it is any word character \w that comes after a [a-zA-Z]
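A quick way to try that regex out, as a sketch. It relies on .NET allowing variable-length lookbehind such as (?<=[a-zA-Z]\w*), which many other regex engines do not support. The names usernamePattern and MatchesUsername are placeholders.

```csharp
using System;
using System.Text.RegularExpressions;

// Every alternative consumes exactly one character, so {5,17} acts as the length check.
var usernamePattern = new Regex(@"^(\w(?=\w*[a-zA-Z])|[a-zA-Z]|\w(?<=[a-zA-Z]\w*)){5,17}$");

// Hypothetical helper name for this illustration.
bool MatchesUsername(string text) => usernamePattern.IsMatch(text);

foreach (string sample in new[] { "baconman", "12345", "1234a", "abcd", "bacon man" })
{
    Console.WriteLine($"{sample}: {MatchesUsername(sample)}");
}
```

Only baconman and 1234a should match: 12345 has no letter, abcd is too short, and "bacon man" contains a space.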

Validating email address with single character domain-names with a regex

I have a regex that I am using to validate email addresses. I like this regex because it is fairly relaxed and has proven to work quite well.
Here is the regex:
(['\"]{1,}.+['\"]{1,}\s+)?<?[\w\.\-]+@[^\.][\w\.\-]+\.[A-Za-z]{2,}>?
Ok great, basically all reasonably valid email addresses that you can throw at it will validate. I know that maybe even some invalid ones will fall through but that is ok for my specific use-case.
Now it happens to be the case that joe@x.com does not validate. And guess what: x.com is actually a domain name that exists (owned by PayPal).
Looking at the regex part that validates the domain name:
@[^\.][\w\.\-]+
It looks like this should be able to parse the x.com domain name, but it doesn't. The culprit is the part that checks that a domain name cannot begin with a dot (such as test@.test.com)
@[^\.]
If I remove the [^.] part of my regex, the domain x.com validates, but now the regex allows domain names beginning with a dot, such as .test.com; this is a little bit too relaxed for me ;-)
So my question is: how can the negative character list affect my single-character check? The way I am reading the regex is "make sure this string does not start with a dot", but apparently it does more.
Any help would be appreciated.
Regards,
Waseem
As Luis suggested, you can use [^\.][\w\.\-]* to match the domain name; however, it will now also match addresses like john@x.....com and john@@.com. You might want to make sure that there is only one period at a time, and that the first character after the @ is more restricted than just not being a period.
Match the domain name and the period (and subdomains and their periods) using:
([\w\-]+\.)+
So your pattern would be:
(['\"]{1,}.+['\"]{1,}\s+)?<?[\w\.\-]+@([\w\-]+\.)+[A-Za-z]{2,}>?
If you change [^\.][\w\.\-]+ to [^\.][\w\.\-]*, it will work as you expect!
The reason is: [^\.] matches a single character that is not a dot (in your case, the "x" in "x.com"). The pattern then needs one or more further characters, followed by a dot. The + consumes the dot after the x, and there are no more dots left to match. The * matches 0 or more characters after the first one, which is what you want.
Change the quantifier +, meaning one or more, to *, meaning zero or more.
Change @[^\.][\w\.\-]+ to @[^\.][\w\.\-]*
The reason you need this is that [^\.] matches a single character that is not a dot, here the x. That leaves nothing before the final \. for [\w\.\-]+ to consume, even though the plus sign requires a minimum of one character. Changing the plus to a star fixes this.
Look at the broader context in your pattern:
@[^\.][\w\.\-]+\.[A-Za-z]{2,}
So for joe@x.com,
[^.] matches x
[\w.-]+ matches .
\. needs a dot but finds c
Change this part to @[^.][\w-]*\.[A-Za-z]{2,}
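To see the effect, here is a small sketch that compares only the domain part of the original pattern with the adjusted one (the part starting at the @ sign). The variable names are mine, and the full email regex from the question is left out to keep the example short.

```csharp
using System;
using System.Text.RegularExpressions;

// Domain part of the original pattern: needs at least one character between
// the first domain character and the final dot.
var originalDomain = new Regex(@"@[^\.][\w\.\-]+\.[A-Za-z]{2,}");

// Adjusted domain part: zero or more word characters or hyphens, and no dots,
// between the first domain character and the final dot.
var adjustedDomain = new Regex(@"@[^.][\w-]*\.[A-Za-z]{2,}");

foreach (string sample in new[] { "joe@x.com", "joe@test.com", "test@.test.com", "john@x.....com" })
{
    Console.WriteLine(
        $"{sample}: original={originalDomain.IsMatch(sample)}, adjusted={adjustedDomain.IsMatch(sample)}");
}
```

The adjusted pattern accepts joe@x.com and still rejects test@.test.com. It also rejects john@x.....com, which the original lets through.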
