Is it possible to stop a regex quantifier once a condition is true? - c#

I want to stop the quantifier as soon as the condition is true. Does anyone know how?
XXXXXX, 20. September 2017 XXX XXXXXXXXX XX
MwSt. Nummer: CHE-XXX.XXX.XXX p.A. XXXXX XXXXXX XXXXX
Rechnungs Nr.321 XX XXXXX 32
XXXXXX, (?<Date>\d{2}.\s{1,}[A-z]{1,}\s{1,}\d{4})\s{1,}(?<CompanyName>.*)\n(?(?=Rechnungs Nr\.)Rechnungs Nr\.(?<BillNumber>\d{1,})|.*\n){1,}
What I am aiming for is this:
XXXXXX, (?<Date>\d{2}.\s{1,}[A-z]{1,}\s{1,}\d{4})\s{1,}(?<CompanyName>.*)\n(?(?=Rechnungs Nr\.)Rechnungs Nr\.(?<BillNumber>\d{1,})|.*\n){2}
As you can see, this is not dynamic, and that is the problem. I want the group to repeat as many times as necessary; in some cases {2} isn't enough, so I picked {1,}. The problem is that the text following the bill number is then matched too, which is bad for me: after this loop I want to add more loops for other text sequences. I only want to match the digits (321 in this example) and then stop the conditional.
Thank you in advance.
You can get Output here: Regular Expression

As per my comment (see the demo on regex101.com):
XXXXXX,\s*
(?<Date>\d{2}.\s+[A-Za-z]+\s+\d{4})\s+
(?<CompanyName>.*)(?s:.*?)
Rechnungs\ Nr\.(?<BillNumber>\d+)
Broken down this says:
XXXXXX,\s* # XXXXXX, followed by spaces
(?<Date>\d{2}.\s+[A-Za-z]+\s+\d{4})\s+ # your original expression
# followed by at least one space
(?<CompanyName>.*) # rest of the line goes into
# group CompanyName
(?s:.*?) # DOTALL, lazily
Rechnungs\ Nr\.(?<BillNumber>\d+) # Rechnungs Nr.
# followed by digits
Leaving aside some potential optimizations, the main idea is to use
(?s:.*?)
This turns on DOTALL mode for a group, meaning that inside that group the dot matches every character (including newline characters). With the lazy quantifier (.*?) it expands only as far as needed, even across multiple lines.
As an alternative, you could use [\s\S]*?, which combines whitespace and non-whitespace characters and therefore matches any character.
Side note: \s{1,} is the same as \s+, \d{1,} is the same as \d+, and [A-z] includes more characters than [A-Za-z] (it also matches [, \, ], ^, _ and `).
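For reference, here is a minimal sketch of how the verbose pattern above could be used from C#. The class and method names (InvoiceParser, Parse) are made up for illustration, and the sketch assumes the input uses \n line endings. RegexOptions.IgnorePatternWhitespace is needed because the pattern is spread over several lines and contains # comments.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class InvoiceParser
{
    // IgnorePatternWhitespace ignores unescaped whitespace and # comments,
    // which is why the literal space in "Rechnungs Nr." is written as "\ ".
    private static readonly Regex InvoiceRegex = new Regex(@"
        XXXXXX,\s*
        (?<Date>\d{2}.\s+[A-Za-z]+\s+\d{4})\s+
        (?<CompanyName>.*)      # rest of the first line
        (?s:.*?)                # lazily skip anything, newlines included
        Rechnungs\ Nr\.(?<BillNumber>\d+)",
        RegexOptions.IgnorePatternWhitespace);

    // Returns the Match; check Success before reading the named groups.
    public static Match Parse(string text)
    {
        return InvoiceRegex.Match(text);
    }
}
```

With the sample text from the question, Parse(text).Groups["BillNumber"].Value should be 321, and the match ends right after the digits, so further patterns can continue from there.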

I found a fast and simple way:
XXXXXX, (?<Date>\d{2}.\s+[A-z]+\s+\d{4})\s{1,}(?<CompanyName>.*)\n(?(?!Rechnungs Nr\.).*\n)Rechnungs Nr\.(?<BillNumber>\d+)
You can get Output here: Regular Expression

Related

Regex to match `xyz` in `abc|qw|xzy mno`

It's driving me nuts.
The input strings are:
abc|qw|xzy mno
abc||xzy mno
abc|qw|xzy
abc|qw|
I need to extract the first word (if any) after the 2nd vertical bar: xzy in the cases above, but in general words in multiple (natural) languages.
Also, all the lines must be treated as one block, so single-line handling does not apply; in other words, the EOL is the break to account for.
Thank you, guys.
You can use the following regexp with the RegexOptions.Multiline option.
(?<=^(?:[^|]*\|){2})\w+
(?<= begins a positive lookbehind, so this matches a word that must be preceded by the beginning of the line followed by two pipe-delimited sequences.
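A small sketch of how this might look in C#, assuming the block uses \n line endings. ThirdFieldWords and Extract are hypothetical names chosen for the example.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class ThirdFieldWords
{
    // Multiline makes ^ match at the start of every line in the block.
    private static readonly Regex FirstWordAfterSecondBar =
        new Regex(@"(?<=^(?:[^|]*\|){2})\w+", RegexOptions.Multiline);

    // Returns the first word after the second bar of each line that has one.
    public static List<string> Extract(string block)
    {
        var words = new List<string>();
        foreach (Match m in FirstWordAfterSecondBar.Matches(block))
        {
            words.Add(m.Value);
        }
        return words;
    }
}
```

One caveat: [^|] also matches newlines, so if some line contains fewer than two bars the lookbehind can reach back into the previous line. Using [^|\n] instead of [^|] avoids that.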

Regex pattern in C# with empty space

I am having an issue with a regular expression and can't find the answer to my question.
I am trying to build a regex pattern that will pull in any matches that have # around them. For example, #match# or #mt# would both come back.
This works fine for that. #.*?#
However I don't want matches on ## to show up. Basically if there is nothing between the pound signs don't match.
Hope this makes sense.
Thanks.
Please use + to match 1 or more symbols:
#+.+#+
UPDATE:
If you want to only match substrings that are enclosed with single hash symbols, use:
(?<!#)#(?!#)[^#]+#(?!#)
See regex demo
Explanation:
(?<!#)#(?!#) - a # symbol that is not preceded with a # (due to the negative lookbehind (?<!#)) and not followed by a # (due to the negative lookahead (?!#))
[^#]+ - one or more symbols other than # (due to the negated character class [^#])
#(?!#) - a # symbol not followed with another # symbol.
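As a quick check, the pattern can be tried from C# along these lines. This is only a sketch; HashTokens and Find are names invented for the example.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class HashTokens
{
    // A single # on each side, with at least one non-# character in between.
    private static readonly Regex SingleHashToken =
        new Regex(@"(?<!#)#(?!#)[^#]+#(?!#)");

    // Returns every #...# token found in the input, hashes included.
    public static List<string> Find(string input)
    {
        var tokens = new List<string>();
        foreach (Match m in SingleHashToken.Matches(input))
        {
            tokens.Add(m.Value);
        }
        return tokens;
    }
}
```

For an input such as "#match# and ## and #mt#" this should return #match# and #mt#, and skip the empty ## pair.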
Instead of using * to match between zero and unlimited characters, replace it with +, which will only match if there is at least one character between the #'s. The edited regex should look like this: #.+?#. Hope this helps!
Edit
Sorry for the incorrect regex, I had not expected multiple hash signs. This should work for your sentence: #+.+?#+
Edit 2
I am pretty sure I got it. Try this: (?<!#)#[^#].*?#. It might not work as expected with triple hashes though.
Try:
[^#]?#.+#[^#]?
The [^character_group] construction matches any single character not included in the character group. Using the ? after it will let you match at the beginning/end of a string (since it matches the preceding element zero or one time). Check out the documentation here

Regular Expression to deny input of repeated characters

I want a regular expression which allows users to enter the following values: a minimum of four and a maximum of 30 characters, and the first character should be upper case.
Eg: John, Smith, Anderson, Emma
And I don't want the user to input the following types of values
Jooohnnnnnn, Smmmmith, Aaaanderson, Emmmmmmmmma
Can anyone provide me with a regular expression? I searched for quite some time but couldn't find a working regex.
I need it for my ASP.net MVC application Model validation.
Thanks
Edited: I don't know how to check for repeated characters. I just tried the following:
@"^[A-Z]{1}[a-zA-Z ]{2,29}$"
The rules that I would like to add are
1. First character Upper case
2. 4-30 characters
3. No character repeated more than twice in a row
To perform a check on your regex you can use a negative look ahead:
^(?!.*(.)\1{2})[A-Z][a-zA-Z ]{3,29}$
The look ahead (?!...) will fail the whole regex if what's inside it matches.
To look for repeated patterns, we use a capture group: (.)\1{2}. We capture the first character, then check if it is followed by (at least) two identical characters with the backreference \1.
See demo here.
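Since the question mentions model validation, here is a rough sketch of checking names with this pattern in plain C#. NameValidator and IsValid are hypothetical names, and the wiring into ASP.NET MVC is left out.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class NameValidator
{
    // (?!.*(.)\1{2}) rejects any character that occurs three times in a row;
    // the rest requires an upper-case first letter and 4 to 30 characters.
    private static readonly Regex NamePattern =
        new Regex(@"^(?!.*(.)\1{2})[A-Z][a-zA-Z ]{3,29}$");

    public static bool IsValid(string name)
    {
        return NamePattern.IsMatch(name);
    }
}
```

Names such as John or Emma should pass, while Jooohnnnnnn or Smmmmith should be rejected. The backreference is case-sensitive here, so a sequence like Aaa would not count as three repeats.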
Here is what you are looking for:
^ (?# Starting of name)
(?=[A-Z]) (?# Ensure it starts with capital A-Z without consuming the text)
(?i:([a-z]) (?# Following letters ignoring case)
(?!\1{2,}) (?# Letter cant be followed by previous letter more than twice)
){4,30} (?# Repeat the group 4 to 30 times, one letter per repetition)
$

Does regex + symbol apply to previous element only?

In order to match all strings beginning with 04 and only containing digits, will the following work?
Regex.IsMatch(str, "^04[0-9]+$")
Or is another set of brackets necessary?
Regex.IsMatch(str, "^04([0-9])+$")
In Regex:
[character_group]
Matches any single character in character_group.
\d
Matches any decimal digit.
+
Matches the previous element one or more times.
(subexpression)
Captures the matched subexpression and assigns it an ordinal number.
^
The match must start at the beginning of the string or line.
$
The match must occur at the end of the string or before \n at the end of the line or string.
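So this code could be helpful (note the verbatim string, and that in .NET \d also matches non-ASCII Unicode digits unless RegexOptions.ECMAScript is set):
Regex.IsMatch(str, @"^04\d+$")
Both of your versions also work correctly.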
Your first regex is correct, but the second one isn't. It matches the same things as the first regex, but it does a lot of unnecessary work in the process. Check it out:
Regex.IsMatch("04123", @"^04([0-9])+$")
In this example, the 1 is captured in group #1, only to be overwritten by 2 and again by 3. It's almost never a good idea to add a quantifier to a capturing group. For a detailed explanation, read this.
But maybe it's precedence rules you're asking about. Quantifiers have higher precedence than concatenation, so there's no need to isolate the character class with parentheses (if that's what you're doing).
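To make the difference visible, here is a small sketch comparing the two patterns. QuantifierDemo and its members are names invented for this example.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class QuantifierDemo
{
    // The + applies to the character class directly; no group is needed.
    public static bool IsValid(string s)
    {
        return Regex.IsMatch(s, @"^04[0-9]+$");
    }

    // Same matches, but every digit is captured into group 1 in turn.
    public static Group RepeatedGroup(string s)
    {
        return Regex.Match(s, @"^04([0-9])+$").Groups[1];
    }
}
```

For "04123", RepeatedGroup(...).Value is the last captured digit, 3, while .NET keeps the earlier captures in the Captures collection. That bookkeeping is the extra work mentioned above.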

Regular expression for a captured group with 1 to 5 words

I have a sentence like 'This is [[a captured group]].' The number of words between the brackets can be 1 to 5.
I want to pick out everything between the two brackets (including the brackets). I know I could use something like @"^.*(?<identifier>\[\[\.*\]\]).*$" but I want to try to be more precise, so I thought this would work: @"^.*(?<identifier>\[\[\w*(\b\w*){0,4}\]\]).*$"
Can anyone see why this doesn't work? It captures if there's one word between the brackets, but not multiple. I thought the (\b\w*){0,4} would allow for 0 to 4 more words.
Thanks, Bill N
I think you forgot about word delimiters (\s):
^.*(?<identifier>\[\[\w+(\s+\b\w+){0,4}\]\]).*$
Your problem is here:
(\b\w*){0,4}
This would not work since you have not allowed for spaces. Change it to:
(\s+\b\w*){0,4}
This will capture spaces but you can easily post-process (using Trim()).
You create more than one capture group, one per pair of parentheses. Try this:
@"^.*(?<identifier>\[\[\w*(?:\s\w*){0,4}\]\]).*$"
(?:) is a non-capturing group: it does not create a capture, so your result is still in the named group.
Update: And of course, as the two other answers pointed out, your main problem is the missing \s. I added this to my solution as well.
Update2: The \b is not needed once the \s is added, so I removed it.
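Putting the pieces from these answers together, a sketch in C# could look like the following. It uses \w+ and \s+ (at least one word character and at least one space) with a non-capturing group, which is a slight variation on the patterns above. BracketGroup and Find are hypothetical names.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class BracketGroup
{
    // One word, then up to four more words, each preceded by whitespace.
    private static readonly Regex Pattern = new Regex(
        @"^.*(?<identifier>\[\[\w+(?:\s+\w+){0,4}\]\]).*$");

    // Returns the bracketed text including the brackets, or null if absent.
    public static string Find(string sentence)
    {
        Match m = Pattern.Match(sentence);
        return m.Success ? m.Groups["identifier"].Value : null;
    }
}
```

For 'This is [[a captured group]].' this should return [[a captured group]], and a bracketed run of six words should not match at all.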
My preference would be something like this (untested):
^[^\[]*(?<identifier>\[\[\s*(\w+(?:\s+|(?=\]))){1,5}\]\])[\S\s]*$
^ # begin of string
[^\[]* # some optional not '[' chars
(?<identifier> # <ID> begin
\[\[ # '[['
\s* # some optional whitespace
(?:\w+ (?:\s+|(?=\])) ){1,5} # 1-5 words separated by spaces
\]\] # ']]'
) # end <ID>
[\S\s]* # some optional any chars
$ # end of string
