URL match regex pattern - c#

I have a bunch of URLs that I need to filter, based on whether they contain the keyword 'staff':
1. /services
2. /services/EarNoseThroat
3. /services/EarNoseThroat/Audiology
4. /services/EarNoseThroat/Audiology/CochlearImplant
5. /services/BehavioralHealth/Clinic
6. /services/BehavioralHealth/Clinic/staff
7. /services/BehavioralHealth/Clinic/staff/Jamie-Hudgins
I want to create one regex pattern to match all the URLs that have /services after the host URL, but not 'staff' anywhere in the URL. Basically, match URLs 1 to 5.
I also need a pattern that only matches URLs 6 and 7.
It seems like the negative lookahead will do the trick, except I don't know how to put it together. Can someone help me out?
Something like:
^\/services\/(?:[^\/]+\/?)*$
OR
^/services\/...any Depth here...\/(?!staff)

Regex to match the following:
/services
/services/EarNoseThroat
/services/EarNoseThroat/Audiology
/services/EarNoseThroat/Audiology/CochlearImplant
/services/BehavioralHealth/Clinic
Regex:
^\/services\/(?!.*\bstaff\b).*$
Explanation:
^ - asserts the start of the string
\/services\/ - matches /services/
(?!.*\bstaff\b) - negative lookahead to make sure that the word staff does not appear anywhere in the string
.* - matches 0+ occurrences of any character except a newline character
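$ - asserts the end of string
Note that the pattern requires a / right after /services, so the bare /services (URL 1) is not matched as written. To include it, make the rest optional: ^\/services(?:\/(?!.*\bstaff\b).*)?$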
Regex to match the following:
/services/BehavioralHealth/Clinic/staff
/services/BehavioralHealth/Clinic/staff/Jamie-Hudgins
Regex:
^\/services\/(?=.*\bstaff\b).*$
Explanation:
The only difference is the positive lookahead:
(?=.*\bstaff\b) - positive lookahead to make sure that the word staff appears somewhere in the string before the end of the string
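A minimal C# sketch of how the two lookahead patterns could be applied. The class and method names (ServiceUrlFilter, IsNonStaffService, IsStaffPage) are made up for illustration, and the non-staff pattern makes the part after /services optional so that the bare /services path is accepted too, which is an assumption about the intended behaviour:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ServiceUrlFilter
{
    // Negative lookahead: anything under /services that never contains the word "staff".
    // The optional group lets the bare "/services" path match as well (assumption).
    private static readonly Regex NonStaff =
        new Regex(@"^/services(?:/(?!.*\bstaff\b).*)?$");

    // Positive lookahead: anything under /services/ that does contain the word "staff".
    private static readonly Regex Staff =
        new Regex(@"^/services/(?=.*\bstaff\b).*$");

    public static bool IsNonStaffService(string path) => NonStaff.IsMatch(path);

    public static bool IsStaffPage(string path) => Staff.IsMatch(path);
}
```

With these helpers, ServiceUrlFilter.IsNonStaffService("/services/EarNoseThroat") should be true, while "/services/BehavioralHealth/Clinic/staff" should only satisfy IsStaffPage. The forward slash needs no escaping in a .NET pattern.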

Related

Regex to find a hyphen not surrounded by characters

I need help to build a regular expression to find all hyphens whose previous and next characters are not a-z and A-Z. The following are the examples in which hyphens should be found.
This is - test
this is -test
this is- test
this is 2- test
this is 2 -test
this is 2-2 test
Following is the example in which hyphen is ignored:
this is-test
So far I am able to write this:
(?<=[^a-z])-(?=[^a-z])
And this is only searching the following hyphens in the lines:
This is - test
this is 2- test
this is 2-2 test
Many thanks.
First, instead of using a negated class in a positive Lookahead/Lookbehind, you could use negative Lookaheads/Lookbehinds instead (unless you want to make sure that the hyphen is preceded and followed by something). Now, your pattern means:
Match a hyphen that is not preceded by [a-z] and not followed
by [a-z].
Whereas you seem to be looking for:
Match a hyphen that is not both preceded by [a-z] and followed by
[a-z] at the same time.
In which case, you may use the following:
(?<![a-z])-|-(?![a-z])
Or if you want to keep the positive Lookarounds with negated classes:
(?<=[^a-z])-|-(?=[^a-z])
Note: You mentioned that you want to check for both a-z and A-Z but in your example, you only used a-z. To check for both, you may replace [a-z] with [a-zA-Z] in the pattern above.
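As a rough C# illustration of the negative-lookaround version, using [a-zA-Z] as suggested in the note above. HyphenFinder and CountLooseHyphens are hypothetical names chosen for this sketch:

```csharp
using System;
using System.Text.RegularExpressions;

public static class HyphenFinder
{
    // Matches a hyphen unless it has a letter on BOTH sides:
    // either no letter before it, or no letter after it.
    private static readonly Regex LooseHyphen =
        new Regex(@"(?<![a-zA-Z])-|-(?![a-zA-Z])");

    public static int CountLooseHyphens(string text) =>
        LooseHyphen.Matches(text).Count;
}
```

Each of the six sample lines from the question should yield one match, while "this is-test" should yield none.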

Regex for alphanumeric string in C# accepting underscores and white space

I have already gone through many posts on SO, but I didn't find what I needed for my specific scenario.
I need a regex for an alphanumeric string,
where the following conditions should be matched:
Valid string:
ameya123 (alphabets and numbers)
ameya (only alphabets)
AMeya12 (capital and normal alphabets and numbers)
Ameya_123 (alphabets and underscore and numbers)
Ameya_ 123 (alphabets, underscore and white spaces)
Invalid string:
123 (only numbers)
_ (only underscore)
(only space) (only white spaces)
any special character other than underscore
What I have tried so far:
(?=.*[a-zA-Z])(?=.*[0-9]*[\s]*[_]*)
The above regex works in an online regex editor, but it does not work in a data annotation in C#.
Please suggest.
Based on your requirements and not your attempt, what you are in need of is this:
^(?!(?:\d+|_+| +)$)[\w ]+$
The negative lookahead looks for undesired matches to fail the whole process. Those are strings containing digits only, underscores only or spaces only. If they never happen we want to have a match for ^[\w ]+$ which is nearly the same as ^[a-zA-Z0-9_ ]+$.
Explanation:
^ Start of line / string
(?! Start of negative lookahead
(?: Start of non-capturing group
\d+ Match digits
| Or
_+ Match underscores
| Or
[ ]+ Match spaces
)$ End of non-capturing group immediately followed by end of line / string (none of previous matches should be found)
) End of negative lookahead
[\w ]+$ Match a character inside the character set up to end of input string
Note: \w is a shorthand for [a-zA-Z0-9_] unless u modifier is set.
One problem with your regex is that in annotations, the regex must match and consume the entire string input, while your pattern only contains lookarounds that do not consume any text.
You may use
^(?!\d+$)(?![_\s]+$)[A-Za-z0-9\s_]+$
See the regex demo. Note that \w (when used for a server-side validation, and thus parsed with the .NET regex engine) will also allow any Unicode letters, digits and some more stuff when validating on the server side, so I'd rather stick to [A-Za-z0-9_] to be consistent with both server- and client-side validation.
Details
^ - start of string (not necessary here, but good to have when debugging)
(?!\d+$) - a negative lookahead that fails the match if the whole string consists of digits
(?![_\s]+$) - a negative lookahead that fails the match if the whole string consists of underscores and/or whitespaces. NOTE: if you plan to only disallow ____ or " " like inputs, you need to split this lookahead into (?!_+$) and (?!\s+$)
[A-Za-z0-9\s_]+ - 1+ ASCII letters, digits, _ and whitespace chars
$ - end of string (not necessary here, but still good to have).
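A small C# sketch for trying the pattern above against the question's samples. It uses a plain Regex rather than a data annotation, and NameValidator / IsValid are illustrative names, not part of any framework:

```csharp
using System;
using System.Text.RegularExpressions;

public static class NameValidator
{
    // Whole-string match: ASCII letters, digits, underscore and whitespace,
    // rejecting inputs made only of digits or only of underscores/whitespace.
    private static readonly Regex Allowed =
        new Regex(@"^(?!\d+$)(?![_\s]+$)[A-Za-z0-9\s_]+$");

    public static bool IsValid(string input) => Allowed.IsMatch(input);
}
```

For example, NameValidator.IsValid("Ameya_ 123") should return true, while "123", "_" and a string of spaces should be rejected.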
If I understand your requirements correctly, you need to match one or more letters (uppercase or lowercase), and possibly zero or more of digits, whitespace, or underscore. This implies the following pattern:
^[A-Za-z0-9\s_]*[A-Za-z][A-Za-z0-9\s_]*$
Demo
In the demo, I have replaced \s with \t \r, because \s was matching across all lines.
Unlike the answers given by @revo and @wiktor, I don't have a fancy looking explanation to the regex. I am beautiful even without my makeup on. Honestly, if you don't understand the pattern I gave, you might want to review a good regex tutorial.
This simple RegEx should do it:
[a-zA-Z]+[0-9_ ]*
One or more Alphabet, followed by zero or more numbers, underscore and Space.
This one should be good:
[\w\s_]*[a-zA-Z]+[\w\s_]*

C# equivalent for this regex pattern

I have this regular expression pattern: .{2}\@.{2}\K|\..*(*SKIP)(?!)|.(?=.*\.)
It works perfectly for replacing the matches to get
trabc@abtrec.com.lo => ***bc@ab*****.com.lo
demomail@demodomain.com => ******il@de*********.com
But when I try to use it in C#, the \K, (*SKIP) and (*F) are not allowed.
What would the C# version of this pattern be? Or do you know a simpler way to mask the email without the unsupported pattern entries?
UPDATE:
(*SKIP): this verb causes the match to fail at the current starting position in the subject if the rest of the pattern does not match
(*F): Forces a matching failure at the given position in the pattern (the same as (?!))
Try this regex:
\w(?=.{2,}@)|(?<=@[^\.]{2,})\w
Explanation:
\w - matches a word character
(?=.{2,}@) - positive lookahead to find the position immediately followed by 2+ occurrences of any character followed by @
| - OR
(?<=@[^\.]{2,}) - positive lookbehind to find the position immediately preceded by @ followed by 2+ occurrences of any character that is not a .
\w - matches a word character.
Replace each match with a *
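In C# this could look roughly like the sketch below. It assumes the address separator is a literal @, and EmailMasker / Mask are hypothetical names. The variable-length lookbehind is something the .NET regex engine accepts:

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailMasker
{
    // First alternative: word characters that still have 2+ characters before the @.
    // Second alternative: word characters preceded by @ plus 2+ non-dot characters.
    private static readonly Regex MaskChars =
        new Regex(@"\w(?=.{2,}@)|(?<=@[^\.]{2,})\w");

    // Replaces every matched character with a single '*'.
    public static string Mask(string email) => MaskChars.Replace(email, "*");
}
```

Because each match is a single character replaced by a single *, the masked string keeps the length of the input; EmailMasker.Mask("trabc@abtrec.com.lo") gives ***bc@ab****.com.lo.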
You can achieve the same result with a regex that matches items in one block, and by applying a custom match evaluator:
var res = Regex.Replace(
    s,
    @"^.*(?=.{2}\@.{2})|(?<=.{2}\@.{2}).*(?=\.com.*$)",
    match => new string('*', match.Value.Length)
);
The regex has two parts:
The one on the left, ^.*(?=.{2}\@.{2}), matches the user name portion except the last two characters
The one on the right, (?<=.{2}\@.{2}).*(?=\.com.*$), matches the suffix of the domain up to the ".com..." ending.

Regex: Several Matches with * Quantifier

My regex ends with the quantifier *, but I have several matches in one string. How can I make it still find all of them? My regex:
((CMD1|CMD2)+(?::|;)+.*)
And the test string is "cmd1: test. test. test cmd2: test2. test2. test2"
So I need to get matches:
cmd1: test. test. test
cmd2: test2. test2. test2
Commands could be random words like "Look", "Take", "Go". There could be any number of occurrences of any command in one string.
Example:
Go: some sentences. and more. Take: other more sentences, and even more text here. Look: more and more. and more.
You could use a positive lookahead:
\w+:.*?(?= \w+:|$)
Match a word character one or more times \w+
Match a colon :
Match any character zero or more times .*
Make it non greedy ?
A positive lookahead which asserts a space and a word character one or more times \w+ followed by a colon : or | the end of the string (?= \w+:|$)
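A short C# sketch of that pattern in use, collecting every command together with its text. CommandSplitter and Split are names invented for this example:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CommandSplitter
{
    // A word and a colon, then as little text as possible up to
    // the next " word:" or the end of the string.
    private static readonly Regex Command =
        new Regex(@"\w+:.*?(?= \w+:|$)");

    public static string[] Split(string input) =>
        Command.Matches(input)
               .Cast<Match>()
               .Select(m => m.Value)
               .ToArray();
}
```

For the question's test string, CommandSplitter.Split should return two items: "cmd1: test. test. test" and "cmd2: test2. test2. test2". Since . does not match a newline by default, input spanning several lines would need RegexOptions.Singleline or separate handling.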
A general rule when writing regex is that when you want to find all occurrences of a pattern and put each pattern into its own match, you write a regex for that pattern, not that pattern quantified * times. Otherwise, you will end up putting the whole string into one single match.
I edited the regex for you:
CMD(?:1|2)(?::|;).*?(?=$|CMD)
The beginning is pretty much self-explanatory. Towards the end, I matched . with a lazy quantifier *?. This will stop matching as soon as the string after it matches the lookahead. The lookahead just matches another CMD or the end of the string.
Remember to turn on case insensitive option!
string s = "Go: some sentences. and more. Take: other more sentences, and even more text here. Look: more and more. and more.";
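var matches = Regex.Matches(s, @"(?i)(go|take|look):.+?(?=\s+\w+:|$)");
The |$ alternative in the lookahead lets the last command, which has no other command after it, match as well. You can remove \s+, but in that case you should call Trim on the resulting string.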

C# Regex match on special characters

I know this stuff has been talked about a lot, but I'm having a problem trying to match the following...
Example input: "test test 310-315"
I need a regular expression that recognizes a number followed by a dash, and returns 310. How do I include the dash in the regex, though? The final match result would be: "310".
Thanks a lot - kcross
EDIT: Also, how would I do the same thing with the dash preceding the number, while taking into account that the number following the dash could be negative? I didn't think of this when I first wrote the question. For example: "test test 310--315" returns -315 and "test 310-315" returns 315.
Regex regex = new Regex(@"\d+(?=\-)");
\d+ - Looks for one or more digits
(?=\-) - Makes sure it is followed by a dash
The @ just eliminates the need to escape the backslashes to keep the compiler happy.
Also, you may want this instead:
\d+(?=\-\d+)
This will check for one or more digits, followed by a dash, followed by one or more digits, but only match the first set.
In response to your comment, here's a regex that will check for a number following a -, while accounting for potential negative (-) numbers:
Regex regex = new Regex(@"(?<=\-)\-?\d+");
(?<=\-) - Positive lookbehind which will check and make sure there is a preceding -
\-? - Checks for either zero or one dashes
\d+ - One or more digits
(?'number'\d+)- will work (no need to escape the dash). In this example, the group containing the single number is the named group 'number'.
If you want to match both groups with an optional sign, try:
@"(?'first'-?\d+)-(?'second'-?\d+)"
To describe it: nothing complicated, just -? to match an optional - and \d+ to match one or more digits. A literal - matches itself.
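A hedged C# sketch of reading both named groups. RangeParser and TryParse are illustrative names, and parsing with the invariant culture is an assumption about how the numbers should be interpreted:

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class RangeParser
{
    // Two optionally signed integers separated by a literal dash.
    private static readonly Regex Pair =
        new Regex(@"(?'first'-?\d+)-(?'second'-?\d+)");

    public static bool TryParse(string input, out int first, out int second)
    {
        first = 0;
        second = 0;
        Match m = Pair.Match(input);
        if (!m.Success) return false;

        // Named groups are read back by name from the Groups collection.
        first = int.Parse(m.Groups["first"].Value, CultureInfo.InvariantCulture);
        second = int.Parse(m.Groups["second"].Value, CultureInfo.InvariantCulture);
        return true;
    }
}
```

For "test test 310--315" this should yield 310 and -315, and for "test 310-315" it should yield 310 and 315.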
Here's some documentation that I use:
http://www.mikesdotnetting.com/Article/46/CSharp-Regular-Expressions-Cheat-Sheet
In the comments section of that page, it suggests escaping the dash with '\-'.
Make sure you escape your escape character \.
You would escape the special meaning of - in regex language (means range) using a backslash (\). Since backslash has a special meaning in C# literals to escape quotes or be part of some characters, you need to escape that with another backslash(\). So essentially it would be \d+\\-.
\b\d*(?=\-) you will want to look ahead for the dash
\b = start at a word boundary
\d = match any decimal digit
* = match the previous as many times as needed
(?=\-) = look ahead for the dash
Edited for Formatting issue with the slash not showing after posting
