Difference between the Regex expressions in dotnet

What is the difference between these two regexes?
new Regex(@"(([[[{""]))", RegexOptions.Compiled)
and
new Regex(@"(^[[[{""])", RegexOptions.Compiled)
I've used both, but I can't find the difference; they seem to match almost the same things.

The regex patterns are not well written because:
There are duplicate characters in the character classes (which is redundant).
The first regex wraps the whole pattern in a duplicate capture group.
The first regex - (([[[{""])) - matches 1 character, either a [, a {, or a ", and captures it into Group 1 and Group 2. See demo. It is equal to
[[{"]
Demo
The second regex - (^[[[{""]) - only matches the same characters as the pattern above, but at the beginning of a string (if RegexOptions.Multiline is not set), or the beginning of a line (if that option is set). See demo. It is equal to
^[[{"]
See demo
You can access the matched character through the Value property of the match, e.g. regex.Match(s).Value.
More about anchors
Also see Caret ^: Beginning of String (or Line)
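To make the difference concrete, here is a minimal sketch that runs both patterns from the question. The sample inputs and variable names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// The two patterns from the question, unchanged.
var anywhere = new Regex(@"(([[[{""]))", RegexOptions.Compiled);
var atStart = new Regex(@"(^[[[{""])", RegexOptions.Compiled);

// Hypothetical input: the first special character is not at index 0.
string sample = "abc{def";

Match m1 = anywhere.Match(sample);
Match m2 = atStart.Match(sample);

// The unanchored pattern finds the brace and captures it twice.
Console.WriteLine($"anywhere: {m1.Success}, '{m1.Value}' at index {m1.Index}");
Console.WriteLine($"Group 1 = '{m1.Groups[1].Value}', Group 2 = '{m1.Groups[2].Value}'");

// The anchored pattern fails because the string does not start with [, { or ".
Console.WriteLine($"atStart: {m2.Success}");

// With RegexOptions.Multiline, ^ also matches right after a line break.
var atLineStart = new Regex(@"(^[[[{""])", RegexOptions.Multiline);
Match m3 = atLineStart.Match("abc\n[x");
Console.WriteLine($"atLineStart: {m3.Success}, '{m3.Value}' at index {m3.Index}");
```

The only behavioral difference is where a match may start; the set of accepted characters is the same in both patterns.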

Related

Regex c# obtain subgroup of a captured group

It seems a simple question, but I don't think it is so easy.
From the example string AAACARACBBBBBDZAAAAEE, I want to extract the first 8 characters (= AAACARAC) and from this resulting 8-char long string, I want to extract everything except the leading 'A' characters (= CARAC).
I tried with this regex (?^[A]<WORD>\w{8}), but I don't know how to apply another regex to the captured group named WORD.

This is the regex you want:
(?=^.{8}(.*)$)A*(?<WORD>.*?)\1$
See a demo here (then click on "Table" to look at the specific matches).
The regex first matches the first eight characters inside a lookahead and captures whatever comes after them (the "tail") in the first capturing group. It then restarts from the beginning of the string, skips all the leading As, and matches as few characters as possible such that they are followed by the content of the first capturing group and the end of the string.
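A quick way to check this is a sketch like the following, using the example string from the question. In .NET, unnamed groups are numbered before named ones, so the tail is group 1 and WORD is reachable by name.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "AAACARACBBBBBDZAAAAEE";

// Pattern from the answer: the lookahead captures the tail after the first
// eight characters; the rest matches the leading As and then WORD lazily.
var regex = new Regex(@"(?=^.{8}(.*)$)A*(?<WORD>.*?)\1$");

Match m = regex.Match(input);
string tail = m.Groups[1].Value;      // everything after the first 8 chars
string word = m.Groups["WORD"].Value; // first 8 chars without the leading As

Console.WriteLine($"tail = {tail}");
Console.WriteLine($"WORD = {word}");
```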

Using C#, you might also use a positive lookbehind to assert 8 chars to the left, matching optional A's and capture the chars that follow in a group.
^A*(?<WORD>[^\sA].*)(?<=^.{8})
^ Start of string
A* match optional repetitions of A
(?<WORD> Named group WORD
[^\sA].* Match a non-whitespace char other than A, then the rest of the line (the lookbehind below forces backtracking to position 8)
) Close named group WORD
(?<=^.{8}) Assert 8 chars to the left of the current position
.NET regex demo
If you only want to match word characters:
^A*(?<WORD>[^\WA]\w*)(?<=^\w{8})
.NET Regex demo
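Both lookbehind variants can be tried with a sketch like this one. The input is the string from the question; the variable names are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "AAACARACBBBBBDZAAAAEE";

// Variant 1: any characters; the lookbehind pins the end of the match to position 8.
Match anyChars = Regex.Match(input, @"^A*(?<WORD>[^\sA].*)(?<=^.{8})");

// Variant 2: word characters only.
Match wordChars = Regex.Match(input, @"^A*(?<WORD>[^\WA]\w*)(?<=^\w{8})");

Console.WriteLine(anyChars.Groups["WORD"].Value);
Console.WriteLine(wordChars.Groups["WORD"].Value);

// The whole match is the 8-character prefix; WORD is the part after the leading As.
Console.WriteLine(anyChars.Value);
```

Note that neither variant matches when the first eight characters are all A, because WORD must start with a character other than A.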

Regex for alpha number string in c# accepting underscore and white spaces

I have already gone through many posts on SO, but I didn't find what I need for my specific scenario.
I need a regex for an alphanumeric string
where the following conditions are met.
Valid string:
ameya123 (alphabets and numbers)
ameya (only alphabets)
AMeya12(Capital and normal alphabets and numbers)
Ameya_123 (alphabets and underscore and numbers)
Ameya_ 123 (alphabets, underscore and white spaces)
Invalid string:
123 (only numbers)
_ (only underscore)
(only space) (only white spaces)
any special character other than underscore
What I have tried so far:
(?=.*[a-zA-Z])(?=.*[0-9]*[\s]*[_]*)
The above regex works in an online regex editor, but it does not work in a data annotation in C#.
Please suggest.

Based on your requirements and not your attempt, what you are in need of is this:
^(?!(?:\d+|_+| +)$)[\w ]+$
The negative lookahead looks for undesired matches to fail the whole process. Those are strings containing digits only, underscores only or spaces only. If they never happen we want to have a match for ^[\w ]+$ which is nearly the same as ^[a-zA-Z0-9_ ]+$.
See live demo here
Explanation:
^ Start of line / string
(?! Start of negative lookahead
(?: Start of non-capturing group
\d+ Match digits
| Or
_+ Match underscores
| Or
[ ]+ Match spaces
)$ End of non-capturing group immediately followed by end of line / string (none of previous matches should be found)
) End of negative lookahead
[\w ]+$ Match a character inside the character set up to end of input string
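Note: \w is a shorthand for [a-zA-Z0-9_] in engines where Unicode mode (the u modifier) is not set. In .NET, \w also matches Unicode letters and digits unless RegexOptions.ECMAScript is used.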
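As a rough check of the pattern above, this sketch runs it against the valid and invalid samples from the question. The last invalid sample (with a # in it) is made up to stand in for "any special character".

```csharp
using System;
using System.Text.RegularExpressions;

var pattern = new Regex(@"^(?!(?:\d+|_+| +)$)[\w ]+$");

// Samples taken from the question.
string[] valid = { "ameya123", "ameya", "AMeya12", "Ameya_123", "Ameya_ 123" };
// "Ameya#123" is a hypothetical sample for a disallowed special character.
string[] invalid = { "123", "_", " ", "Ameya#123" };

foreach (string s in valid)
    Console.WriteLine($"'{s}' -> {pattern.IsMatch(s)}");

foreach (string s in invalid)
    Console.WriteLine($"'{s}' -> {pattern.IsMatch(s)}");
```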

One problem with your regex is that in annotations, the regex must match and consume the entire string input, while your pattern only contains lookarounds that do not consume any text.
You may use
^(?!\d+$)(?![_\s]+$)[A-Za-z0-9\s_]+$
See the regex demo. Note that \w, when parsed by the .NET regex engine for server-side validation, also allows any Unicode letters, digits and some other characters, so I'd rather stick to [A-Za-z0-9_] to be consistent between server- and client-side validation.
Details
^ - start of string (not necessary here, but good to have when debugging)
(?!\d+$) - a negative lookahead that fails the match if the whole string consists of digits
(?![_\s]+$) - a negative lookahead that fails the match if the whole string consists of underscores and/or whitespaces. NOTE: if you plan to only disallow inputs like ____ or "   ", you need to split this lookahead into (?!_+$) and (?!\s+$).
[A-Za-z0-9\s_]+ - 1+ ASCII letters, digits, _ and whitespace chars
$ - end of string (not necessary here, but still good to have).
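The point about consuming the entire input can be illustrated with a small sketch. MatchesWholeInput is a hypothetical helper that only mimics a "the match must cover the whole string" check; it is an assumption about how the annotation treats the pattern, not the attribute's actual code.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: succeeds only if the match covers the whole input.
bool MatchesWholeInput(string pattern, string input)
{
    Match m = Regex.Match(input, pattern);
    return m.Success && m.Index == 0 && m.Length == input.Length;
}

string attempt = @"(?=.*[a-zA-Z])(?=.*[0-9]*[\s]*[_]*)";       // pattern from the question
string suggested = @"^(?!\d+$)(?![_\s]+$)[A-Za-z0-9\s_]+$";    // pattern from this answer

// The lookaround-only pattern "matches", but the match is empty.
Console.WriteLine(Regex.IsMatch("ameya123", attempt));
Console.WriteLine(MatchesWholeInput(attempt, "ameya123"));

// The suggested pattern consumes the whole input.
Console.WriteLine(MatchesWholeInput(suggested, "Ameya_ 123"));
Console.WriteLine(MatchesWholeInput(suggested, "123"));
Console.WriteLine(MatchesWholeInput(suggested, "_ "));
```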

If I understand your requirements correctly, you need to match one or more letters (uppercase or lowercase), and possibly zero or more of digits, whitespace, or underscore. This implies the following pattern:
^[A-Za-z0-9\s_]*[A-Za-z][A-Za-z0-9\s_]*$
Demo
In the demo, I have replaced \s with \t \r, because \s was matching across all lines.
Unlike the answers given by @revo and @wiktor, I don't have a fancy looking explanation to the regex. I am beautiful even without my makeup on. Honestly, if you don't understand the pattern I gave, you might want to review a good regex tutorial.

This simple RegEx should do it:
[a-zA-Z]+[0-9_ ]*
One or more letters, followed by zero or more digits, underscores or spaces.

This one should be good:
[\w\s_]*[a-zA-Z]+[\w\s_]*

C# equivalent for this regex pattern

I have this regular expression pattern: .{2}\@.{2}\K|\..*(*SKIP)(?!)|.(?=.*\.)
It works perfectly when I replace each match with * to get
trabc@abtrec.com.lo => ***bc@ab*****.com.lo
demomail@demodomain.com => ******il@de*********.com
But when I try to use it in C#, the \K, (*SKIP) and (*F) constructs are not allowed.
What would the C# version of this pattern be? Or do you know a simpler way to mask the email without the unsupported constructs?
Demo
UPDATE:
(*SKIP): this verb causes the match to fail at the current starting position in the subject if the rest of the pattern does not match
(*F): forces a matching failure at the given position in the pattern (the same as (?!))

Try this regex:
\w(?=.{2,}@)|(?<=@[^\.]{2,})\w
Click for Demo
Explanation:
\w - matches a word character
(?=.{2,}@) - positive lookahead to find the position immediately followed by 2+ occurrences of any character followed by @
| - OR
(?<=@[^\.]{2,}) - positive lookbehind to find the position immediately preceded by @ followed by 2+ occurrences of any character that is not a .
\w - matches a word character.
Replace each match with a *
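A minimal sketch of that replacement, assuming the separator in the addresses is @. Mask is a hypothetical helper name. Each masked character becomes exactly one asterisk, so for the two sample addresses the masked runs in the domain are 4 and 8 asterisks long.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: replaces every matched character with a single '*'.
string Mask(string email) =>
    Regex.Replace(email, @"\w(?=.{2,}@)|(?<=@[^\.]{2,})\w", "*");

string first = Mask("trabc@abtrec.com.lo");
string second = Mask("demomail@demodomain.com");

Console.WriteLine(first);
Console.WriteLine(second);
```

Because the replacement is one asterisk per match, the masked string always has the same length as the input.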

You can achieve the same result with a regex that matches items in one block, and applying a custom match evaluator:
var res = Regex.Replace(
s
, @"^.*(?=.{2}\@.{2})|(?<=.{2}\@.{2}).*(?=.com.*$)"
, match => new string('*', match.ToString().Length)
);
The regex has two parts:
The one on the left ^.*(?=.{2}\@.{2}) matches the user name portion except the last two characters
The one on the right (?<=.{2}\@.{2}).*(?=.com.*$) matches the suffix of the domain up to the ".com..." ending.
Demo.
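Here is a self-contained sketch of the same idea. One deliberate variation of mine: the dot before com is escaped (\.com), since an unescaped . would also accept any other character in front of "com". MaskBlocks is a hypothetical helper name.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: masks each matched block with as many '*' as it has characters.
string MaskBlocks(string email) =>
    Regex.Replace(
        email,
        // Left part: user name except its last two characters.
        // Right part: domain after its first two characters, up to ".com".
        @"^.*(?=.{2}@.{2})|(?<=.{2}@.{2}).*(?=\.com.*$)",
        match => new string('*', match.Length));

string a = MaskBlocks("trabc@abtrec.com.lo");
string b = MaskBlocks("demomail@demodomain.com");

Console.WriteLine(a);
Console.WriteLine(b);
```

This variant only works for addresses whose domain contains ".com", which is a limitation inherited from the pattern in the answer.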

Regular Expression that matches on values after a pipe in between brackets

I'm still learning a lot about regex, so please forgive any naivety.
I've been using this site to test:
http://www.systemtextregularexpressions.com/regex.match
Basically, I'm having issues writing a regular expression that will match on any value after a pipe in between brackets.
Given an example string of:
"<div> \n [dont1.dont2|match1|match2] |dont3 [dont4] dont5. \n </div>"
Expected output would be a collection:
match1,
match2
The closest I've been able to get so far is:
(?!\[.*(\|)\])(?:\|)([\w-_.,:']*)
Above gives me the values, including the pipes, and dont3.
I've also tried this guy:
\|(.*(?=\]))
but it outputs:
|match1|match2

Here's one way of doing it:
(?<=\[[^\]]*\|)[^\]|]*
Here's the meaning of the pattern:
(?<=\[[^\]]*\|) - Lookbehind expression to ensure that any match must be preceded by an open bracket, followed by any number of non-close-bracket characters, followed by a pipe character
(?<= ... ) - Declares a lookbehind expression. Something matching the lookbehind must immediately precede the text in order for it to match. However, the part matched by the lookbehind is not included in the resulting match.
\[ - Matches an open bracket character
[^\]]* - Matches any number of non-close-bracket characters
\| - Matches a pipe character
[^\]|]* - Matches any number of characters which are neither close brackets nor pipe characters.
The lookbehind is greedy, so it will allow for any number of pipes between the open bracket and the matching text.
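As a sanity check, this sketch collects the matches for the example string from the question. The variable names are only illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string input = "<div> \n [dont1.dont2|match1|match2] |dont3 [dont4] dont5. \n </div>";

// Each match is the text after a pipe, provided an open bracket precedes it
// with no close bracket in between.
var values = new List<string>();
foreach (Match m in Regex.Matches(input, @"(?<=\[[^\]]*\|)[^\]|]*"))
    values.Add(m.Value);

Console.WriteLine(string.Join(", ", values));
```

The pipe before dont3 is skipped because a close bracket sits between it and the nearest open bracket.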

try this:
\[.*?(?:\|(?<mydata>.*?))+\]
Note: the online tool will only show you the last capture inside a quantified () for a given match, but .NET remembers each capture of a group that matches multiple times.
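The captures can be read through the Captures collection of the group. A small sketch with the example string from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string input = "<div> \n [dont1.dont2|match1|match2] |dont3 [dont4] dont5. \n </div>";

Match match = Regex.Match(input, @"\[.*?(?:\|(?<mydata>.*?))+\]");

// Group.Value holds only the last capture; Captures holds all of them.
var captured = new List<string>();
foreach (Capture c in match.Groups["mydata"].Captures)
    captured.Add(c.Value);

Console.WriteLine(match.Groups["mydata"].Value);
Console.WriteLine(string.Join(", ", captured));
```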

Try this:
^<div>\s*[^|]+|([^|]+)|([^|]+)

C# Regular expression to match on a character not following pairs of the same character

Objective: Regex Matching
For this example I'm interested in matching a "|" pipe character.
I need to match it if it's alone: "aaa|aaa"
I need to match it (the last pipe) only if it's preceded by pairs of pipe: (2,4,6,8...any even number)
Another way: I want to ignore ALL pipe pairs "||" (right to left)
or I want to select bachelor bars only (the odd man out)
string twomatches = "aaaaaaaaa||||**|**aaaaaa||**|**aaaaaa";
string onematch = "aaaaaaaaa||**|**aaaaaaa||aaaaaaaa";
string noMatch = "||";
string noMatch = "||||";
I'm trying to select the last "|" only when preceded by an even sequence of "|" pairs or in a string when a single bar exists by itself.
Regardless of the number of "|"

You may use the following regex to select just the odd pipe out:
(?<=(?<!\|)(?:\|{2})*)\|(?!\|)
See regex demo.
The regex breakdown:
(?<=(?<!\|)(?:\|{2})*) - if a pipe is preceded with an even number of pipes ((?:\|{2})* - 0 or more sequences of exactly 2 pipes) from a position that has no preceding pipe ((?<!\|))
\| - match an odd pipe on the right
(?!\|) - if it is not followed by another pipe.
Please note that this regex uses a variable-width look-behind and is very resource-consuming. I'd rather use a capturing group mechanism here, but it all depends on the actual purpose of matching that odd pipe.
Here is a modified version of the regex for removing the odd one out:
var s = "1|2||3|||4||||5|||||6||||||7|||||||";
var data = Regex.Replace(s, @"(?<!\|)(?<even_pipes>(?:\|{2})*)\|(?!\|)", "${even_pipes}");
Console.WriteLine(data);
See IDEONE demo. Here, the quantified part is moved from lookbehind to an even_pipes named capturing group, so that it could be restored with the backreference in the replaced string. Regexhero.net shows 129,046 iterations per second for the version with a capturing group and 69,206 with the original version with variable-width lookbehind.
Only use variable-width look-behind if it is absolutely necessary!
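To see the lookbehind pattern from the top of this answer at work, here is a sketch that counts the odd pipes in the strings from the question. I am assuming the ** markers in the question were only highlighting, so they are left out here.

```csharp
using System;
using System.Text.RegularExpressions;

var oddPipe = new Regex(@"(?<=(?<!\|)(?:\|{2})*)\|(?!\|)");

// Strings from the question, with the assumed ** highlighting removed.
string twoMatches = "aaaaaaaaa|||||aaaaaa|||aaaaaa";
string oneMatch = "aaaaaaaaa|||aaaaaaa||aaaaaaaa";
string single = "aaa|aaa";

int CountOdd(string text) => oddPipe.Matches(text).Count;

Console.WriteLine(CountOdd(twoMatches));
Console.WriteLine(CountOdd(oneMatch));
Console.WriteLine(CountOdd(single));
Console.WriteLine(CountOdd("||"));
Console.WriteLine(CountOdd("||||"));

// The match is always the last pipe of an odd-length run.
Console.WriteLine(oddPipe.Match(twoMatches).Index);
```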

Oh, it's reopened! If you need better performance, also try this negative improved version.
\|(?!\|)(?<!(?:[^|]|^)(?:\|\|)*)
The idea here is to first match the last literal | at right side of a sequence or single | and execute a negated version of the lookbehind just after the match. This should perform considerably better.
\|(?!\|) matches literal | IF NOT followed by another pipe character (right most if sequence).
(?<!(?:[^|]|^)(?:\|\|)*) IF the position right after the matched | IS NOT preceded by (?:\|\|)* any amount of literal || back to a non-| character or the ^ start. In other words: if this position is not preceded by an even number of pipe characters.
Btw, there is no performance gain in using \|{2} over \|\|, though it might be more readable.
See demo at regexstorm
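To confirm that this version selects the same pipes as the variable-width lookbehind from the other answer, a sketch like this compares the match positions of both patterns. Indexes is a hypothetical helper, and the test string is the one used in the other answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the match start indexes as a comma-separated string.
string Indexes(string pattern, string text)
{
    var positions = new List<int>();
    foreach (Match m in Regex.Matches(text, pattern))
        positions.Add(m.Index);
    return string.Join(",", positions);
}

string s = "1|2||3|||4||||5|||||6||||||7|||||||";

string lookbehindFirst = Indexes(@"(?<=(?<!\|)(?:\|{2})*)\|(?!\|)", s);
string matchFirst = Indexes(@"\|(?!\|)(?<!(?:[^|]|^)(?:\|\|)*)", s);

Console.WriteLine(lookbehindFirst);
Console.WriteLine(matchFirst);
```

This only checks that the two patterns agree on one input; it says nothing about their relative speed.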
