Regular expression to match capture group not preceded by certain characters - c#

I want to write a regex that will match if and only if the pattern is not preceded by the characters "Etc/".
Strings that should match:
GMT+01:00
UTC+01:00
UTC+01
+01:00
...
Strings that should not match:
Etc/GMT+01:00
Etc/UTC+01:00
Etc/UTC+01
...
This is what I have so far:
(?<!Etc\/)((UTC|GMT)?(\+|\-){1}(\d{1,2})(:|\.)?(\d{1,2})?)
The right part of the above regular expression already matches the UTC and GMT offset and covers all the cases I need, but I can't manage to implement the exclusions mentioned above.
I expected the above regex not to match the string Etc/GMT+01:00, but in fact it matches the part +01:00 and only skips Etc/GMT.
How can I make sure that the following regular expression does not match if it is preceded by "Etc/"?
(UTC|GMT)?(\+|\-){1}(\d{1,2})(:|\.)?(\d{1,2})?
Here I have an example with most of the use cases I need.

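Your lookbehind only checks the text immediately before the start of the match. Because (UTC|GMT)? is optional, the engine simply starts the match at +, where the preceding text is GMT rather than Etc/. You may add \S* after Etc/ so that Etc/ is still detected when zero or more non-whitespace chars sit between Etc/ and the expected match: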
(?<!\bEtc/\S*)((UTC|GMT)?([+-])(\d{1,2})[:.]?(\d{1,2})?)
See the .NET regex demo
Details:
(?<!\bEtc/\S*) - a negative lookbehind that matches a location that is not immediately preceded with a whole word Etc/ and then zero or more non-whitespace chars
(UTC|GMT)? - an optional substring, UTC or GMT
([+-]) - + or -
(\d{1,2}) - one or two digits
[:.]? - an optional : or .
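(\d{1,2})? - an optional sequence of one or two digits (matches the same text as (\d{0,2}), although with (\d{0,2}) the group always participates and may capture an empty string).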
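A minimal C# sketch of the lookbehind approach above, run against the sample strings from the question. The class and method names (OffsetMatcher, FindOffset) are hypothetical and only serve the illustration.

```csharp
using System.Text.RegularExpressions;

public static class OffsetMatcher
{
    // Pattern from the answer above. The .NET engine accepts the
    // unbounded \S* inside a lookbehind; many other engines do not.
    private static readonly Regex Offset = new Regex(
        @"(?<!\bEtc/\S*)((UTC|GMT)?([+-])(\d{1,2})[:.]?(\d{1,2})?)");

    // Returns the matched offset text, or an empty string when nothing matches
    // (Match.Value is empty for a failed match).
    public static string FindOffset(string input)
    {
        return Offset.Match(input).Value;
    }
}
```

With this sketch, FindOffset("GMT+01:00") returns the whole input, while FindOffset("Etc/GMT+01:00") returns an empty string.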

As you are already capturing all data in groups, another way could be getting all the matches of Etc/ out of the way, and use your pattern to capture what you want in the groups.
Note that you can change groupings of single chars like (:|\.) to a character class ([:.])
\bEtc/\S*|(UTC|GMT)?([+-])(\d{1,2})([:.])?(\d{1,2})?
\bEtc/\S* Match Etc/ and optional non whitespace chars
| Or
(UTC|GMT)?([+-])(\d{1,2})([:.])?(\d{1,2})? Your pattern with all the separate groups.
Regex demo
Or with just a single group:
\bEtc/\S*|((?:GMT|UTC)?\+\d{2}(?:[:.]\d{2})?)
Regex demo
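The alternation approach can be sketched in C# as follows: every match is inspected, and only those where the sign group took part are kept. The names OffsetExtractor and ExtractOffsets are made up for this example, and the multi-group variant of the pattern is assumed.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OffsetExtractor
{
    // The first alternative consumes "Etc/..." tokens; the second is the
    // offset pattern with its separate groups.
    private static readonly Regex Pattern = new Regex(
        @"\bEtc/\S*|(UTC|GMT)?([+-])(\d{1,2})([:.])?(\d{1,2})?");

    public static List<string> ExtractOffsets(string input)
    {
        var offsets = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            // Group 2 (the sign) only participates when the second
            // alternative matched, so "Etc/..." matches are skipped here.
            if (m.Groups[2].Success)
            {
                offsets.Add(m.Value);
            }
        }
        return offsets;
    }
}
```

For the input "Etc/GMT+01:00 GMT+02:00 +03:30" this returns the two entries GMT+02:00 and +03:30.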

Related

Match up to the comma - Regex

I have created a Regex Pattern (?<=[TCC|TCC_BHPB]\s\d{3,4})[-_\s]\d{1,2}[,]
This pattern matches only:
TCC 6005_5,
What should I change at the end so that it matches both of these strings:
TCC 6005-5 ,
TCC 6005_5,

You can add a non-greedy wildcard to your expression (.*?):
(?<=(?:TCC|TCC_BHPB)\s\d{3,4})[-_\s]\d{1,2}.*?[,]
                                           ^^^
This will now also match any characters between the last digit and the comma.
As has been pointed out in the comments, [TCC|TCC_BHPB] is a character class rather than a literal match, so I've changed this to (?:TCC|TCC_BHPB) which is presumably what your intention was.
Try it online

This part of the pattern [TCC|TCC_BHPB] is a character class that matches one of the listed characters. It might also be written for example as [|_TCBHP]
To "match" both strings, you can match all parts instead of using a positive lookbehind.
\bTCC(?:_BHPB)?\s\d{3,4}[-_\s]\d{1,2}\s?,
See a regex demo
\bTCC A word boundary to prevent a partial match, then match TCC
(?:_BHPB)?\s\d{3,4} Optionally match _BHPB, match a whitespace char and 3-4 digits (Use [0-9] to match a digit 0-9)
[-_\s]\d{1,2} Match one of - _ or a whitespace char, then 1-2 digits
\s?, Match an optional space and ,
Note that \s can also match a newline.
Using the lookbehind:
(?<=TCC(?:_BHPB)?\s\d{3,4})[-_\s]\d{1,2}\s?,
Regex demo
Or, if you want to match only horizontal whitespace (spaces and tabs) and never a newline:
\bTCC(?:_BHPB)?[\p{Zs}\t][0-9]{3,4}[-_\p{Zs}\t][0-9]{1,2}[\p{Zs}\t]*,
Regex demo
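A small C# sketch of the full-match variant (the first pattern of this answer, without a lookbehind), checked against the two strings from the question. TccMatcher and Matches are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class TccMatcher
{
    // TCC, optional _BHPB, whitespace, 3-4 digits, a separator,
    // 1-2 digits, an optional whitespace char and the comma.
    private static readonly Regex Pattern = new Regex(
        @"\bTCC(?:_BHPB)?\s\d{3,4}[-_\s]\d{1,2}\s?,");

    public static bool Matches(string input)
    {
        return Pattern.IsMatch(input);
    }
}
```

Both "TCC 6005-5 ," and "TCC 6005_5," are accepted; a string such as "XTCC 6005_5," is rejected because of the word boundary.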

Regex for alpha number string in c# accepting underscore and white spaces

I have already gone through many posts on SO, but I didn't find what I need for my specific scenario.
I need a regex for an alphanumeric string
where the following conditions should be met.
Valid strings:
ameya123 (letters and numbers)
ameya (only letters)
AMeya12 (uppercase and lowercase letters and numbers)
Ameya_123 (letters, underscore and numbers)
Ameya_ 123 (letters, underscore and white space)
Invalid strings:
123 (only numbers)
_ (only underscore)
" " (only white spaces)
any special character other than underscore
What I have tried so far:
(?=.*[a-zA-Z])(?=.*[0-9]*[\s]*[_]*)
The above regex works in an online regex editor, but it does not work in a data annotation in C#.
Please suggest.

Based on your requirements and not your attempt, what you are in need of is this:
^(?!(?:\d+|_+| +)$)[\w ]+$
The negative lookahead looks for undesired matches to fail the whole process. Those are strings containing digits only, underscores only or spaces only. If they never happen we want to have a match for ^[\w ]+$ which is nearly the same as ^[a-zA-Z0-9_ ]+$.
See live demo here
Explanation:
^ Start of line / string
(?! Start of negative lookahead
(?: Start of non-capturing group
\d+ Match digits
| Or
_+ Match underscores
| Or
[ ]+ Match spaces
)$ End of non-capturing group immediately followed by end of line / string (none of previous matches should be found)
) End of negative lookahead
[\w ]+$ Match a character inside the character set up to end of input string
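Note: \w is shorthand for [a-zA-Z0-9_] only in ASCII-based engines or modes; in .NET it also matches Unicode letters and digits unless RegexOptions.ECMAScript is set.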
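As an illustration, the pattern explained above could be wrapped in a small C# helper like this. NameValidator and IsValid are hypothetical names; the inputs come from the question.

```csharp
using System.Text.RegularExpressions;

public static class NameValidator
{
    // The negative lookahead rejects digit-only, underscore-only and
    // space-only strings; the rest allows word chars and spaces.
    private static readonly Regex Valid = new Regex(
        @"^(?!(?:\d+|_+| +)$)[\w ]+$");

    public static bool IsValid(string input)
    {
        return Valid.IsMatch(input);
    }
}
```

IsValid("Ameya_ 123") is true, while IsValid("123"), IsValid("_") and IsValid("ame-ya") are false.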

One problem with your regex is that in annotations, the regex must match and consume the entire string input, while your pattern only contains lookarounds that do not consume any text.
You may use
^(?!\d+$)(?![_\s]+$)[A-Za-z0-9\s_]+$
See the regex demo. Note that \w, when parsed by the .NET regex engine for server-side validation, also allows any Unicode letters and digits plus a few other characters, so I'd rather stick to [A-Za-z0-9_] to be consistent between server- and client-side validation.
Details
^ - start of string (not necessary here, but good to have when debugging)
(?!\d+$) - a negative lookahead that fails the match if the whole string consists of digits
(?![_\s]+$) - a negative lookahead that fails the match if the whole string consists of underscores and/or whitespaces. NOTE: if you only want to reject inputs made entirely of underscores or entirely of whitespace (while still allowing a mix of the two), split this lookahead into (?!_+$) and (?!\s+$)
[A-Za-z0-9\s_]+ - 1+ ASCII letters, digits, _ and whitespace chars
$ - end of string (not necessary here, but still good to have).
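To show the pattern in the data annotation context the question mentions, here is a hedged sketch using RegularExpressionAttribute and Validator from System.ComponentModel.DataAnnotations. The UserInput model and the UserInputValidator helper are invented for this example and are not part of the question.

```csharp
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Hypothetical model, used only to carry the annotation.
public class UserInput
{
    [RegularExpression(@"^(?!\d+$)(?![_\s]+$)[A-Za-z0-9\s_]+$")]
    public string Name { get; set; }
}

public static class UserInputValidator
{
    public static bool IsValid(string name)
    {
        var model = new UserInput { Name = name };
        var context = new ValidationContext(model);
        var results = new List<ValidationResult>();
        // validateAllProperties: true makes the validator evaluate the
        // RegularExpression attribute, not only [Required] attributes.
        return Validator.TryValidateObject(
            model, context, results, validateAllProperties: true);
    }
}
```

UserInputValidator.IsValid("Ameya_ 123") returns true, and IsValid("123") or IsValid("_ ") returns false. Note that the attribute treats a null or empty value as valid, so a separate [Required] would be needed to reject those.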

If I understand your requirements correctly, you need to match one or more letters (uppercase or lowercase), and possibly zero or more of digits, whitespace, or underscore. This implies the following pattern:
^[A-Za-z0-9\s_]*[A-Za-z][A-Za-z0-9\s_]*$
Demo
In the demo, I have replaced \s with \t \r, because \s was matching across all lines.
Unlike the answers given by @revo and @wiktor, I don't have a fancy looking explanation to the regex. I am beautiful even without my makeup on. Honestly, if you don't understand the pattern I gave, you might want to review a good regex tutorial.
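For completeness, a short C# sketch of this "at least one letter" pattern; LetterRequiredValidator is a hypothetical name. Because the pattern insists on a letter somewhere, it also rejects mixed inputs without letters, such as "_ 1".

```csharp
using System.Text.RegularExpressions;

public static class LetterRequiredValidator
{
    // Allowed characters on both sides of one mandatory ASCII letter.
    private static readonly Regex Valid = new Regex(
        @"^[A-Za-z0-9\s_]*[A-Za-z][A-Za-z0-9\s_]*$");

    public static bool IsValid(string input)
    {
        return Valid.IsMatch(input);
    }
}
```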

This simple RegEx should do it:
[a-zA-Z]+[0-9_ ]*
One or more letters, followed by zero or more digits, underscores, or spaces.

This one should be good:
[\w\s_]*[a-zA-Z]+[\w\s_]*

C# equivalent for this regex pattern

I have this regular expression pattern: .{2}\#.{2}\K|\..*(*SKIP)(?!)|.(?=.*\.)
It works perfectly when I replace each match with * to get:
trabc#abtrec.com.lo => ***bc#ab*****.com.lo
demomail#demodomain.com => ******il#de*********.com
But when I try to use it in C#, \K, (*SKIP) and (*F) are not supported.
What would the C# version of this pattern be? Or do you know a simpler way to mask the email without the unsupported constructs?
Demo
UPDATE:
(*SKIP): this verb causes the match to fail at the current starting position in the subject if the rest of the pattern does not match
(*F): Forces a matching failure at the given position in the pattern (the same as (?!))

Try this regex:
\w(?=.{2,}#)|(?<=#[^\.]{2,})\w
Click for Demo
Explanation:
\w - matches a word character
(?=.{2,}#) - positive lookahead to find the position immediately followed by 2+ occurrences of any character followed by #
| - OR
(?<=#[^\.]{2,}) - positive lookbehind to find the position immediately preceded by # followed by 2+ occurrences of any character that is not a .
\w - matches a word character.
Replace each match with a *
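A minimal C# sketch of this replacement, keeping the # separator exactly as it appears in the samples above. EmailMasker and Mask are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class EmailMasker
{
    // First alternative: a word char with at least two chars before '#'.
    // Second alternative: a word char at least two non-dot chars after '#'.
    private static readonly Regex Maskable = new Regex(
        @"\w(?=.{2,}#)|(?<=#[^\.]{2,})\w");

    public static string Mask(string input)
    {
        // Each match is a single character, so one '*' per match
        // preserves the original length.
        return Maskable.Replace(input, "*");
    }
}
```

Mask("demomail#demodomain.com") gives ******il#de********.com and Mask("trabc#abtrec.com.lo") gives ***bc#ab****.com.lo. The asterisk count equals the number of masked characters, which is one fewer in the domain part than the sample output printed in the question.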

You can achieve the same result with a regex that matches items in one block, and applying a custom match evaluator:
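using System.Text.RegularExpressions;

// s holds the input address, e.g. "demomail#demodomain.com".
var res = Regex.Replace(
    s,
    @"^.*(?=.{2}\#.{2})|(?<=.{2}\#.{2}).*(?=.com.*$)",
    // Replace each matched block with the same number of asterisks.
    match => new string('*', match.Value.Length)
);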
The regex has two parts:
The one on the left ^.*(?=.{2}\#.{2}) matches the user name portion except the last two characters
The one on the right (?<=.{2}\#.{2}).*(?=.com.*$) matches the suffix of the domain up to the ".com..." ending.
Demo.

Regex to insert and replace characters in a string C#

I have a string which looks like this :-
"$.ConfigSettings.DatabaseSettings.DatabaseConnections.SqlConnectionString.0.Id"
and I want the result to look like this :-
"$.ConfigSettings.DatabaseSettings.DatabaseConnections.SqlConnectionString[0].Id"
Basically, wherever a single digit is preceded and followed by a period, I need to change .digit. to [digit]. (the brackets replace the leading period and the trailing period stays). I have seen tons of examples, but in them people only replace the matched text.
How can I do this using Regex.Replace in C#?

Regex.Replace(input, @"\.(\d)(?=\.)", "[$1]")
\. - match a literal "."
(\d) - then a single digit in a capturing group ($1 in the replacement)
(?= - start a positive lookahead
\. - that matches a "."
) - end the lookahead
So, it means : (match a dot followed by a digit in a capturing group) only if it is followed by a dot
So we matched ".0" and captured "0". We replace the entire match with "[$1]", where $1 refers to the first captured group.
See "Grouping Constructs in Regular Expressions" : https://msdn.microsoft.com/en-us/library/bs2twtah(v=vs.110).aspx for information about the different grouping constructs that I use in this solution.
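Put together as a compilable C# sketch, using the string from the question. PathNormalizer and BracketIndexes are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class PathNormalizer
{
    // Turns ".<digit>" into "[<digit>]" when another "." follows.
    // The lookahead does not consume the trailing dot, so it stays in place.
    public static string BracketIndexes(string path)
    {
        return Regex.Replace(path, @"\.(\d)(?=\.)", "[$1]");
    }
}
```

For the question's input this yields "$.ConfigSettings.DatabaseSettings.DatabaseConnections.SqlConnectionString[0].Id". A multi-digit segment such as ".12." is left unchanged, because (\d) matches exactly one digit and the lookahead then requires a dot.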

Cannot match parentheses in regex group

This is a regular expression, evaluated in .NET
I have the following input:
${guid->newguid()}
And I want to produce two matching groups, a character sequence after the ${ and before }, which are split by -> :
guid
newguid()
The pattern I am using is the following:
([^(?<=\${)(.*?)(?=})->]+)
But this doesn't match the parentheses, I am getting only the following matches:
guid
newguid
How can I modify the regex so I get the desired groups?

Your regex - ([^(?<=\${)(.*?)(?=})->]+) - matches 1+ characters other than those defined in the negated character class (that is, 1 or more chars other than (, ?, <, etc).
I suggest using a matching regex like this:
\${([^}]*?)->([^}]*)}
See the regex demo
The results you need are in match.Groups[1] and match.Groups[2].
Pattern details:
\${ - match ${ literal character sequence
([^}]*?) - Group 1 capturing 0+ chars other than } as few as possible
-> - a literal char sequence ->
([^}]*) - Group 2 capturing 0+ chars other than } as many as possible
} - a literal }.
If you know that you only have word chars inside, you may simplify the regex to a mere
\${(\w+)->(\w+\(\))}
See the regex demo. However, it is much less generic.
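A short C# sketch of reading the two groups with the generic pattern. The braces are escaped here for readability, which is assumed to be equivalent to the unescaped form shown above; PlaceholderParser and Parse are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class PlaceholderParser
{
    // Group 1: text between "${" and "->"; group 2: text between "->" and "}".
    private static readonly Regex Placeholder = new Regex(
        @"\$\{([^}]*?)->([^}]*)\}");

    // Returns the two captured parts, or an empty array when nothing matches.
    public static string[] Parse(string input)
    {
        Match m = Placeholder.Match(input);
        return m.Success
            ? new[] { m.Groups[1].Value, m.Groups[2].Value }
            : new string[0];
    }
}
```

Parse("${guid->newguid()}") returns the two strings guid and newguid().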

Is your input structure always ${identifier->identifier()}? If so, you can use ^\$\{([^-]+)->([^}]+)\}$.
Otherwise, you can modify your regex to ([^?<=\${.*??=}\->]+): with this regex you should match the input and get the desired groups, guid and newguid(). The key change is escaping the - character, which is otherwise treated as a range operator inside a character class, and dropping the parentheses from the class, since [^......(....)....] excludes parentheses from the match.
I hope that helps!
EDIT: testing with https://regex101.com helped me a lot... showing me that - was intended as range operator.
