Convert an Extended Regular expression to .NET compatible RegEx - c#

I've got a RegEx that works great on *NIX systems and languages that support Extended Regular Expressions (ERE). I haven't found a freely available library for .NET that supports ERE's, nor have I had any lucky trying to translate this into a non-ERE and get the same result. Here is the RegEx:
^\+(<{7} \.|={7}$|>{7} \.)
Background: the point of the RegEx is to identify if a given string appears to have the markers from an unresolved subversion merge.

It looks to me like ERE syntax is mostly upward-compatible with .NET's regex flavor, as it is with most other "Perl-compatible" flavors (Perl, PHP, Python, JavaScript, Ruby, Java...). In other words, anything you can do in an ERE regex, you should be able to do in an identical .NET regex. Certainly your example:
^\+(<{7} \.|={7}$|>{7} \.)
means the same thing in .NET as it does in ERE. The only major exception I can see is in the area of POSIX bracket expressions; .NET follows the Unicode standard instead.
It's when you go to apply the regex that things really get different. In C# you might apply that regex like this:
string result = Regex.Match(targetString, #"^\+(<{7} \.|={7}$|>{7} \.)").Value;
C#'s verbatim strings save you having to escape backslashes like in some other languages' string literals; you only have to escape quotation marks, which you do by doubling them:
#"He said, ""Look out!""";
Does that answer your question?

Are you sure you don't have a typo in that? RegexBuddy (when set to either POSIX ERE or GNU ERE) says that the "+" quantifier must be preceded by a token that can be repeated. Other than that, this appears to be a valid .NET Regex. You might want to check out one of the great O'Reilly books on regular expressions as well. If this doesn't help, please post some examples of text you're trying to match/not match.

Related

What kind of regex should I use to find if a word contains only several characters [duplicate]

This question's answers are a community effort. Edit existing answers to improve this post. It is not currently accepting new answers or interactions.
What is this?
This is a collection of common Q&A. This is also a Community Wiki, so everyone is invited to participate in maintaining it.
Why is this?
regex is suffering from give me ze code type of questions and poor answers with no explanation. This reference is meant to provide links to quality Q&A.
What's the scope?
This reference is meant for the following languages: php, perl, javascript, python, ruby, java, .net.
This might be too broad, but these languages share the same syntax. For specific features there's the tag of the language behind it, example:
What are regular expression Balancing Groups? .net
The Stack Overflow Regular Expressions FAQ
See also a lot of general hints and useful links at the regex tag details page.
Online tutorials
RegexOne ↪
Regular Expressions Info ↪
Quantifiers
Zero-or-more: *:greedy, *?:reluctant, *+:possessive
One-or-more: +:greedy, +?:reluctant, ++:possessive
?:optional (zero-or-one)
Min/max ranges (all inclusive): {n,m}:between n & m, {n,}:n-or-more, {n}:exactly n
Differences between greedy, reluctant (a.k.a. "lazy", "ungreedy") and possessive quantifier:
Greedy vs. Reluctant vs. Possessive Quantifiers
In-depth discussion on the differences between greedy versus non-greedy
What's the difference between {n} and {n}?
Can someone explain Possessive Quantifiers to me? php, perl, java, ruby
Emulating possessive quantifiers .net
Non-Stack Overflow references: From Oracle, regular-expressions.info
Character Classes
What is the difference between square brackets and parentheses?
[...]: any one character, [^...]: negated/any character but
[^] matches any one character including newlines javascript
[\w-[\d]] / [a-z-[qz]]: set subtraction .net, xml-schema, xpath, JGSoft
[\w&&[^\d]]: set intersection java, ruby 1.9+
[[:alpha:]]:POSIX character classes
[[:<:]] and [[:>:]] Word boundaries
Why do [^\\D2], [^[^0-9]2], [^2[^0-9]] get different results in Java? java
Shorthand:
Digit: \d:digit, \D:non-digit
Word character (Letter, digit, underscore): \w:word character, \W:non-word character
Whitespace: \s:whitespace, \S:non-whitespace
Unicode categories (\p{L}, \P{L}, etc.)
Escape Sequences
Horizontal whitespace: \h:space-or-tab, \t:tab
Newlines:
\r, \n:carriage return and line feed
\R:generic newline php java-8
Negated whitespace sequences: \H:Non horizontal whitespace character, \V:Non vertical whitespace character, \N:Non line feed character pcre php5 java-8
Other: \v:vertical tab, \e:the escape character
Anchors
anchor
matches
flavors
^
Start of string
Common*
^
Start of line
Commonm
$
End of line
Commonm
$
End of text
Common* except javascript
$
Very end of string
javascript*, phpD
\A
Start of string
Common except javascript
\Z
End of text
Common except javascript python
\Z
Very end of string
python
\z
Very end of string
Common except javascript python
\b
Word boundary
Common
\B
Not a word boundary
Common
\G
End of previous match
Common except javascript, python
Term
Definition
Start of string
At the very start of the string.
Start of line
At the very start of the string, andafter a non-terminal line terminator.
Very end of string
At the very end of the string.
End of text
At the very end of the string, andat a terminal line terminator.
End of line
At the very end of the string, andat a line terminator.
Word boundary
At a word character not preceded by a word character, andat a non-word character not preceded by a non-word character.
End of previous match
At a previously set position, usually where a previous match ended.At the very start of the string if no position was set.
"Common" refers to the following: icu java javascript .net objective-c pcre perl php python swift ruby
* Default |
m Multi-line mode. |
D Dollar end only mode.
Groups
(...):capture group, (?:):non-capture group
Why is my repeating capturing group only capturing the last match?
\1:backreference and capture-group reference, $1:capture group reference
What's the meaning of a number after a backslash in a regular expression?
\g<1>123:How to follow a numbered capture group, such as \1, with a number?: python
What does a subpattern (?i:regex) mean?
What does the 'P' in (?P<group_name>regexp) mean?
(?>):atomic group or independent group, (?|):branch reset
Equivalent of branch reset in .NET/C# .net
Named capture groups:
General named capturing group reference at regular-expressions.info
java: (?<groupname>regex): Overview and naming rules (Non-Stack Overflow links)
Other languages: (?P<groupname>regex) python, (?<groupname>regex) .net, (?<groupname>regex) perl, (?P<groupname>regex) and (?<groupname>regex) php
Lookarounds
Lookaheads: (?=...):positive, (?!...):negative
Lookbehinds: (?<=...):positive, (?<!...):negative
Lookbehind limits in:
Lookbehinds need to be constant-length php, perl, python, ruby
Lookarounds of limited length {0,n} java
Variable length lookbehinds are allowed .net
Lookbehind alternatives:
Using \K php, perl (Flavors that support \K)
Alternative regex module for Python python
The hacky way
JavaScript negative lookbehind equivalents External link
Modifiers
flag
modifier
flavors
a
ASCII
python
c
current position
perl
e
expression
php perl
g
global
most
i
case-insensitive
most
m
multiline
php perl python javascript .net java
m
(non)multiline
ruby
o
once
perl ruby
r
non-destructive
perl
S
study
php
s
single line
ruby
U
ungreedy
php r
u
unicode
most
x
whitespace-extended
most
y
sticky ↪
javascript
How to convert preg_replace e to preg_replace_callback?
What are inline modifiers?
What is '?-mix' in a Ruby Regular Expression
Other:
|:alternation (OR) operator, .:any character, [.]:literal dot character
What special characters must be escaped?
Control verbs (php and perl): (*PRUNE), (*SKIP), (*FAIL) and (*F)
php only: (*BSR_ANYCRLF)
Recursion (php and perl): (?R), (?0) and (?1), (?-1), (?&groupname)
Common Tasks
Get a string between two curly braces: {...}
Match (or replace) a pattern except in situations s1, s2, s3...
How do I find all YouTube video ids in a string using a regex?
Validation:
Internet: email addresses, URLs (host/port: regex and non-regex alternatives), passwords
Numeric: a number, min-max ranges (such as 1-31), phone numbers, date
Parsing HTML with regex: See "General Information > When not to use Regex"
Advanced Regex-Fu
Strings and numbers:
Regular expression to match a line that doesn't contain a word
How does this PCRE pattern detect palindromes?
Match strings whose length is a fourth power
How does this regex find triangular numbers?
How to determine if a number is a prime with regex?
How to match the middle character in a string with regex?
Other:
How can we match a^n b^n?
Match nested brackets
Using a recursive pattern php, perl
Using balancing groups .net
“Vertical” regex matching in an ASCII “image”
List of highly up-voted regex questions on Code Golf
How to make two quantifiers repeat the same number of times?
An impossible-to-match regular expression: (?!a)a
Match/delete/replace this except in contexts A, B and C
Match nested brackets with regex without using recursion or balancing groups?
Flavor-Specific Information
(Except for those marked with *, this section contains non-Stack Overflow links.)
Java
Official documentation: Pattern Javadoc ↪, Oracle's regular expressions tutorial ↪
The differences between functions in java.util.regex.Matcher:
matches()): The match must be anchored to both input-start and -end
find()): A match may be anywhere in the input string (substrings)
lookingAt(): The match must be anchored to input-start only
(For anchors in general, see the section "Anchors")
The only java.lang.String functions that accept regular expressions: matches(s), replaceAll(s,s), replaceFirst(s,s), split(s), split(s,i)
*An (opinionated and) detailed discussion of the disadvantages of and missing features in java.util.regex
.NET
How to read a .NET regex with look-ahead, look-behind, capturing groups and back-references mixed together?
Official documentation:
Boost regex engine: General syntax, Perl syntax (used by TextPad, Sublime Text, UltraEdit, ...???)
JavaScript general info and RegExp object
.NET MySQL Oracle Perl5 version 18.2
PHP: pattern syntax, preg_match
Python: Regular expression operations, search vs match, how-to
Rust: crate regex, struct regex::Regex
Splunk: regex terminology and syntax and regex command
Tcl: regex syntax, manpage, regexp command
Visual Studio Find and Replace
General information
(Links marked with * are non-Stack Overflow links.)
Other general documentation resources: Learning Regular Expressions, *Regular-expressions.info, *Wikipedia entry, *RexEgg, Open-Directory Project
DFA versus NFA
Generating Strings matching regex
Books: Jeffrey Friedl's Mastering Regular Expressions
When to not use regular expressions:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. (blog post written by Stack Overflow's founder)*
Do not use regex to parse HTML:
Don't. Please, just don't
Well, maybe...if you're really determined (other answers in this question are also good)
Examples of regex that can cause regex engine to fail
Why does this regular expression kill the Java regex engine?
Tools: Testers and Explainers
(This section contains non-Stack Overflow links.)
Online (* includes replacement tester, + includes split tester):
Debuggex (Also has a repository of useful regexes) javascript, python, pcre
*Regular Expressions 101 php, pcre, python, javascript, java
Regex Pal, regular-expressions.info javascript
Rubular ruby RegExr Regex Hero dotnet
*+ regexstorm.net .net
*RegexPlanet: Java java, Go go, Haskell haskell, JavaScript javascript, .NET dotnet, Perl perl php PCRE php, Python python, Ruby ruby, XRegExp xregexp
freeformatter.com xregexp
*+regex.larsolavtorvik.com php PCRE and POSIX, javascript
Offline:
Microsoft Windows: RegexBuddy (analysis), RegexMagic (creation), Expresso (analysis, creation, free)
MySQL 8.0: Various syntax changes were made. Note especially the doubling of backslashes in some contexts. (This Answer need further editing to reflect the differences.)

Why do these .NET capture group regular expression (regex) statements function differently? [duplicate]

This question's answers are a community effort. Edit existing answers to improve this post. It is not currently accepting new answers or interactions.
What is this?
This is a collection of common Q&A. This is also a Community Wiki, so everyone is invited to participate in maintaining it.
Why is this?
regex is suffering from give me ze code type of questions and poor answers with no explanation. This reference is meant to provide links to quality Q&A.
What's the scope?
This reference is meant for the following languages: php, perl, javascript, python, ruby, java, .net.
This might be too broad, but these languages share the same syntax. For specific features there's the tag of the language behind it, example:
What are regular expression Balancing Groups? .net
The Stack Overflow Regular Expressions FAQ
See also a lot of general hints and useful links at the regex tag details page.
Online tutorials
RegexOne ↪
Regular Expressions Info ↪
Quantifiers
Zero-or-more: *:greedy, *?:reluctant, *+:possessive
One-or-more: +:greedy, +?:reluctant, ++:possessive
?:optional (zero-or-one)
Min/max ranges (all inclusive): {n,m}:between n & m, {n,}:n-or-more, {n}:exactly n
Differences between greedy, reluctant (a.k.a. "lazy", "ungreedy") and possessive quantifier:
Greedy vs. Reluctant vs. Possessive Quantifiers
In-depth discussion on the differences between greedy versus non-greedy
What's the difference between {n} and {n}?
Can someone explain Possessive Quantifiers to me? php, perl, java, ruby
Emulating possessive quantifiers .net
Non-Stack Overflow references: From Oracle, regular-expressions.info
Character Classes
What is the difference between square brackets and parentheses?
[...]: any one character, [^...]: negated/any character but
[^] matches any one character including newlines javascript
[\w-[\d]] / [a-z-[qz]]: set subtraction .net, xml-schema, xpath, JGSoft
[\w&&[^\d]]: set intersection java, ruby 1.9+
[[:alpha:]]:POSIX character classes
[[:<:]] and [[:>:]] Word boundaries
Why do [^\\D2], [^[^0-9]2], [^2[^0-9]] get different results in Java? java
Shorthand:
Digit: \d:digit, \D:non-digit
Word character (Letter, digit, underscore): \w:word character, \W:non-word character
Whitespace: \s:whitespace, \S:non-whitespace
Unicode categories (\p{L}, \P{L}, etc.)
Escape Sequences
Horizontal whitespace: \h:space-or-tab, \t:tab
Newlines:
\r, \n:carriage return and line feed
\R:generic newline php java-8
Negated whitespace sequences: \H:Non horizontal whitespace character, \V:Non vertical whitespace character, \N:Non line feed character pcre php5 java-8
Other: \v:vertical tab, \e:the escape character
Anchors
anchor
matches
flavors
^
Start of string
Common*
^
Start of line
Commonm
$
End of line
Commonm
$
End of text
Common* except javascript
$
Very end of string
javascript*, phpD
\A
Start of string
Common except javascript
\Z
End of text
Common except javascript python
\Z
Very end of string
python
\z
Very end of string
Common except javascript python
\b
Word boundary
Common
\B
Not a word boundary
Common
\G
End of previous match
Common except javascript, python
Term
Definition
Start of string
At the very start of the string.
Start of line
At the very start of the string, andafter a non-terminal line terminator.
Very end of string
At the very end of the string.
End of text
At the very end of the string, andat a terminal line terminator.
End of line
At the very end of the string, andat a line terminator.
Word boundary
At a word character not preceded by a word character, andat a non-word character not preceded by a non-word character.
End of previous match
At a previously set position, usually where a previous match ended.At the very start of the string if no position was set.
"Common" refers to the following: icu java javascript .net objective-c pcre perl php python swift ruby
* Default |
m Multi-line mode. |
D Dollar end only mode.
Groups
(...):capture group, (?:):non-capture group
Why is my repeating capturing group only capturing the last match?
\1:backreference and capture-group reference, $1:capture group reference
What's the meaning of a number after a backslash in a regular expression?
\g<1>123:How to follow a numbered capture group, such as \1, with a number?: python
What does a subpattern (?i:regex) mean?
What does the 'P' in (?P<group_name>regexp) mean?
(?>):atomic group or independent group, (?|):branch reset
Equivalent of branch reset in .NET/C# .net
Named capture groups:
General named capturing group reference at regular-expressions.info
java: (?<groupname>regex): Overview and naming rules (Non-Stack Overflow links)
Other languages: (?P<groupname>regex) python, (?<groupname>regex) .net, (?<groupname>regex) perl, (?P<groupname>regex) and (?<groupname>regex) php
Lookarounds
Lookaheads: (?=...):positive, (?!...):negative
Lookbehinds: (?<=...):positive, (?<!...):negative
Lookbehind limits in:
Lookbehinds need to be constant-length php, perl, python, ruby
Lookarounds of limited length {0,n} java
Variable length lookbehinds are allowed .net
Lookbehind alternatives:
Using \K php, perl (Flavors that support \K)
Alternative regex module for Python python
The hacky way
JavaScript negative lookbehind equivalents External link
Modifiers
flag
modifier
flavors
a
ASCII
python
c
current position
perl
e
expression
php perl
g
global
most
i
case-insensitive
most
m
multiline
php perl python javascript .net java
m
(non)multiline
ruby
o
once
perl ruby
r
non-destructive
perl
S
study
php
s
single line
ruby
U
ungreedy
php r
u
unicode
most
x
whitespace-extended
most
y
sticky ↪
javascript
How to convert preg_replace e to preg_replace_callback?
What are inline modifiers?
What is '?-mix' in a Ruby Regular Expression
Other:
|:alternation (OR) operator, .:any character, [.]:literal dot character
What special characters must be escaped?
Control verbs (php and perl): (*PRUNE), (*SKIP), (*FAIL) and (*F)
php only: (*BSR_ANYCRLF)
Recursion (php and perl): (?R), (?0) and (?1), (?-1), (?&groupname)
Common Tasks
Get a string between two curly braces: {...}
Match (or replace) a pattern except in situations s1, s2, s3...
How do I find all YouTube video ids in a string using a regex?
Validation:
Internet: email addresses, URLs (host/port: regex and non-regex alternatives), passwords
Numeric: a number, min-max ranges (such as 1-31), phone numbers, date
Parsing HTML with regex: See "General Information > When not to use Regex"
Advanced Regex-Fu
Strings and numbers:
Regular expression to match a line that doesn't contain a word
How does this PCRE pattern detect palindromes?
Match strings whose length is a fourth power
How does this regex find triangular numbers?
How to determine if a number is a prime with regex?
How to match the middle character in a string with regex?
Other:
How can we match a^n b^n?
Match nested brackets
Using a recursive pattern php, perl
Using balancing groups .net
“Vertical” regex matching in an ASCII “image”
List of highly up-voted regex questions on Code Golf
How to make two quantifiers repeat the same number of times?
An impossible-to-match regular expression: (?!a)a
Match/delete/replace this except in contexts A, B and C
Match nested brackets with regex without using recursion or balancing groups?
Flavor-Specific Information
(Except for those marked with *, this section contains non-Stack Overflow links.)
Java
Official documentation: Pattern Javadoc ↪, Oracle's regular expressions tutorial ↪
The differences between functions in java.util.regex.Matcher:
matches()): The match must be anchored to both input-start and -end
find()): A match may be anywhere in the input string (substrings)
lookingAt(): The match must be anchored to input-start only
(For anchors in general, see the section "Anchors")
The only java.lang.String functions that accept regular expressions: matches(s), replaceAll(s,s), replaceFirst(s,s), split(s), split(s,i)
*An (opinionated and) detailed discussion of the disadvantages of and missing features in java.util.regex
.NET
How to read a .NET regex with look-ahead, look-behind, capturing groups and back-references mixed together?
Official documentation:
Boost regex engine: General syntax, Perl syntax (used by TextPad, Sublime Text, UltraEdit, ...???)
JavaScript general info and RegExp object
.NET MySQL Oracle Perl5 version 18.2
PHP: pattern syntax, preg_match
Python: Regular expression operations, search vs match, how-to
Rust: crate regex, struct regex::Regex
Splunk: regex terminology and syntax and regex command
Tcl: regex syntax, manpage, regexp command
Visual Studio Find and Replace
General information
(Links marked with * are non-Stack Overflow links.)
Other general documentation resources: Learning Regular Expressions, *Regular-expressions.info, *Wikipedia entry, *RexEgg, Open-Directory Project
DFA versus NFA
Generating Strings matching regex
Books: Jeffrey Friedl's Mastering Regular Expressions
When to not use regular expressions:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. (blog post written by Stack Overflow's founder)*
Do not use regex to parse HTML:
Don't. Please, just don't
Well, maybe...if you're really determined (other answers in this question are also good)
Examples of regex that can cause regex engine to fail
Why does this regular expression kill the Java regex engine?
Tools: Testers and Explainers
(This section contains non-Stack Overflow links.)
Online (* includes replacement tester, + includes split tester):
Debuggex (Also has a repository of useful regexes) javascript, python, pcre
*Regular Expressions 101 php, pcre, python, javascript, java
Regex Pal, regular-expressions.info javascript
Rubular ruby RegExr Regex Hero dotnet
*+ regexstorm.net .net
*RegexPlanet: Java java, Go go, Haskell haskell, JavaScript javascript, .NET dotnet, Perl perl php PCRE php, Python python, Ruby ruby, XRegExp xregexp
freeformatter.com xregexp
*+regex.larsolavtorvik.com php PCRE and POSIX, javascript
Offline:
Microsoft Windows: RegexBuddy (analysis), RegexMagic (creation), Expresso (analysis, creation, free)
MySQL 8.0: Various syntax changes were made. Note especially the doubling of backslashes in some contexts. (This Answer need further editing to reflect the differences.)

C# Regex to match attributes [duplicate]

To review regular expresions I read this tutorial. Anyways that tutorial mentions that \b matches a word boundary (between \w and \W characters). That tutorial also gives a link where you can install expresso (program that helps when creating regular expressions).
So I have created my regular expressions in expresso and I do inded get a match. Now when I copy the same regex to visual studio I do not get a match. Take a look:
Why am I not getting a match? in the immediate window I am showing the content of variable output. In expresso I do get a match and in visual studio I don't. why?
The C# language and .NET Regular Expressions both have their own distinct set of backslash-escape sequences, but the C# compiler is intercepting the "\b" in your string and converting it into an ASCII backspace character so the RegEx class never sees it. You need to make your string verbatim (prefix with an at-symbol) or double-escape the 'b' so the backslash is passed to RegEx like so:
#"\bCOMPILATION UNIT";
Or
"\\bCOMPILATION UNIT"
I'll say the .NET RegEx documentation does not make this clear. It took me a while to figure this out at first too.
Fun-fact: The \r and \n characters (carriage-return and line-break respectively) and some others are recognized by both RegEx and the C# language, so the end-result is the same, even if the compiled string is different.
You should use #"\bCOMPILATION UNIT". This is a verbatim literal. When you do "\b" instead, it parses \b into a special character. You can also do "\\b", whose double backslash is parsed into a real backslash, but it's generally easier to just use verbatims when dealing with regex.

Coverting a basic search string to a regex

Say I have a search string like:
"Hello [NAME], how are you today? I am fine."
If I were to use a regex pattern to search text I would have to convert it to something like (assuming that '\ ' is a valid regex search for a single space):
"\Hello\ \[NAME\],\ how\ are\ you\ today\?\ I\ am\ fine."
Now before I go off and try to write a function to do this myself is anyone aware of something that already does this sort of conversion? (Eclipse does something a bit like this; it converts all its searches into regular expressions before searching, even if you're not setting the search pattern to be a regex).
I'm targetting C# in this instance but feel free to add for other languages as other people might be interested in a similar thing for Java, Python et al.
Regex.Escape(string) will return a Regex pattern that matches the supplied literal string.
Specifically, it escapes \*+?|{[()^$.# and white space.

Match domain names as regular expressions

can some tell how can i write regular expression matching abc.com.ae or abc.net.af or anything ,ae which is the last in the string is optional
this is successfull with / but not . don't know why
^[a-z]{1,25}.[a-z]{3}$
answer
^[a-z]{1,30}\.[a-z]{3}((\.)[a-z]{2})?$
The answer is: write \. instead of ., because . means “any character”.
You can match specific characters using the \u escape code followed immediately by the unicode number for that character in hex format and it must also be four digits only. I think a full stop is \u002E.
In your example where '/' works but not '.', this is because a '/' is recognised as a normal character and does not need to be escaped whereas the full stop has a different meaning in c# regex (it matches any character).
If youre still not sure, here is a useful guide to regex in C#: http://www.radsoftware.com.au/articles/regexlearnsyntax.aspx
And this one has examples of regex for URLs: http://www.radsoftware.com.au/articles/regexsyntaxadvanced.aspx
I sounds like you need some environment to learn regular expressions.
The best way to learn regular expressions is with an interactive regular expression simulator: type in an expression, and it will give you the results, instantly.
There are a few free (and not so free) regular expression simulators available:
Free web version from http://lumadis.be/regex/test_regex.php.
RegExBuddy from http://www.regexbuddy.com/. I am not affiliated with this program, but I have used it with good success in the past.

Categories