Repeating pattern matching with Regex - c#

I am trying to validate an input with a regular expression. Up until now all my tests fail and as my experience with regex is limited I thought someone might be able to help me out.
Pattern: digit (possibly "," digit) (possibly ;)
A String may not begin with a ; and not end with a ;.
Digits are allowed to stand alone or with
My regex (not working): ((\d)(,\d)?)(;?). The problem is that it does not seem to check until the end of the string. Also, the optional parts are giving me headaches.
Update: ^[0-9]+(,[0-9])?(;[0-9]+(,[0-9])?)+$ seems to work better, but it does not match a single digit.
OK:
2,3;4,4;3,2
2,3
2
2,3;3;4,3
NOK:
2,3,,,,
2,3asfafafa
;2,3
2,3;;3,4
2,3;3,4;

Your ^[0-9]+(,[0-9])?(;[0-9]+(,[0-9])?)+$ regex matches 1 or more digits, then an optional sequence of , and 1 digit, followed with one or more similar sequences.
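You need to match zero or more of those ;-separated sequences instead, so the final + becomes * (multi-digit numbers are also allowed after the comma here):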
^\d+(?:,\d+)?(?:;\d+(?:,\d+)?)*$
See the regex demo
Now, tweaking part:
If only single-digit numbers should be matched, use ^\d(?:,\d)?(?:;\d(?:,\d)?)*$
If the comma-separated number pairs can have the second element empty, add ? after each ,\d (if single digit numbers are to be matched) or * (if the numbers can have more than one digit): ^\d(?:,\d?)?(?:;\d(?:,\d?)?)*$ or ^\d+(?:,\d*)?(?:;\d+(?:,\d*)?)*$.
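As a rough check, the multi-digit pattern can be run against the OK and NOK samples from the question. This is only a sketch: the helper name IsValidList is made up for illustration, and the code assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer: a number, an optional ",number",
// then zero or more ";"-separated repeats of the same shape.
var listPattern = new Regex(@"^\d+(?:,\d+)?(?:;\d+(?:,\d+)?)*$");

// Hypothetical helper name, not part of the original post.
bool IsValidList(string input) => listPattern.IsMatch(input);

string[] samples =
{
    "2,3;4,4;3,2", "2,3", "2", "2,3;3;4,3",            // expected OK
    "2,3,,,,", "2,3asfafafa", ";2,3", "2,3;;3,4", "2,3;3,4;" // expected NOK
};

foreach (var sample in samples)
{
    Console.WriteLine($"{sample} -> {(IsValidList(sample) ? "OK" : "NOK")}");
}
```

Note that in .NET, $ also matches just before a final newline; if a trailing newline must be rejected, \z can be used in place of $.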

Related

Using RegEx, what's the best way to capture groups of digits, ignoring any whitespace in them

Given the following string...
ABC DEF GHI: 319 022 6543 QRS : 531 450
I'm trying to extract all ranges that start/end with a digit, and which may contain whitespace, but I want that whitespace itself removed.
For instance, the above should yield two results (since there are two 'ranges' that match what I am looking for)...
3190226543
531450
My first thought was this, but this matches the spaces between the letters...
([\d\s])
Then I tried this, but it didn't seem to have any effect...
([\d+\s*])
This one comes close, but it's grabbing the trailing spaces too. Also, this grabs the whitespace, but doesn't remove it.
(\d[\d\s]+)
If it's impossible to remove the spaces in a single statement, I can always post-process the groups if I can properly extract them. That most recent statement comes close, but how do I say it doesn't end with whitespace, but only a digit?
So what's the missing expression? Also, since sometimes people just post an answer, it would be helpful to explain out the RegEx too to help others figure out how to do this. I for one would love not just the solution, but an explanation. :)
Note: I know there can be some variations between RegEx on different platforms so that's fine if those differences are left up to the reader. I'm more interested in understanding the basic mechanics of the regex itself more so than the syntax. That said, if it helps, I'm using both Swift and C#.
You cannot get rid of whitespace from inside the match value within a single match operation. You will need to remove spaces as a post-processing step.
To match a string that starts with a digit and then optionally contains any amount of digits or whitespaces and then a digit you can use
\d(?:[\d\s]*\d)?
Details:
\d - a digit
(?:[\d\s]*\d)? - an optional non-capturing group matching
[\d\s]* - zero or more whitespaces / digits
\d - a digit.
See the regex demo.
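The two steps (match, then strip whitespace) might look like this in C#. It is a sketch only; the variable names are made up, and it assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "ABC DEF GHI: 319 022 6543 QRS : 531 450";

// Step 1: match runs that start and end with a digit and may
//         contain digits or whitespace in between.
// Step 2: remove the inner whitespace from each match afterwards.
string[] numbers = Regex.Matches(text, @"\d(?:[\d\s]*\d)?")
    .Cast<Match>()
    .Select(m => Regex.Replace(m.Value, @"\s+", ""))
    .ToArray();

foreach (var number in numbers)
{
    Console.WriteLine(number);
}
```

For the sample string this should print 3190226543 and 531450, since the trailing space after each run is given back by backtracking before the final \d.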

How to validate Regex

I'm having a hard time with grouping parts of a Regex. I want to validate a few things in a string that follows this format: I-XXXXXX.XX.XX.XX
Validate that the first set of 6 X's (I-xxxxxx.XX.XX.XX) does not contain characters and its length is no more than 6.
Validate that the third set of X's (I-XXXXXX.XX.xx.XX) does not contain characters and is only 1 or 2.
Now, I have already validation on the last set of XX's to make sure the numbers are 1-8 using
string pattern1 = @"^.+\.(0?[1-8])$";
Match match = Regex.Match(TxtWBS.Text, pattern1);
if (!match.Success)
{
    errMessage += "WBS invalid" + Environment.NewLine;
}
I just can't figure out how to target specific parts of the string. Any help would be greatly appreciated and thank you in advance!
You're having some trouble adding new validation to this string because it's very generic. Let's take a look at what you're doing:
^.+\.(0?[1-8])$
This finds the following:
^ the start of the string
.+ everything it can, other than a newline, basically jumping the engine's cursor to the end of your line
\. the last period in the string, because of the greedy quantifier in the .+ that comes before it
0? a zero, if it can
[1-8] a number between 1 and 8
()$ stores the two previous things in a group and then requires the end of the string. If the end of the string doesn't come after this, the engine may even backtrack and try the same thing from the second-to-last period instead, which we know isn't a great strategy.
This ends up matching a lot of weird stuff, for example the string "The number 0.1".
Let's try patterning something more specific, if we can:
^I-(\d{6})\.(\d{2})\.(\d{1,2})\.([1-8]{2})$
This will match:
^I- an I and a hyphen at the start of the string
(\d{6}) six digits, which it stores in a capture group
\. a period. By now, if there was any other number of digits than six, the match fails instead of trying to backtrack all over the place.
(\d{2})\. Same thing, but two digits instead of six.
(\d{1,2})\. Same thing, the comma here meaning it can match between one and two digits.
([1-8]{2}) Two digits that are each between 1 and 8.
$ The end of the string.
I hope I understood what exactly you're trying to match here. Let me know if this isn't what you had in mind.
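A short C# sketch of how the capture groups could be read, and of how the more specific pattern rejects input that the generic one accepts. The sample value I-587954.12.34.56 and the variable names are assumptions for illustration; top-level statements (C# 9 or later) are assumed.

```csharp
using System;
using System.Text.RegularExpressions;

var wbsPattern = new Regex(@"^I-(\d{6})\.(\d{2})\.(\d{1,2})\.([1-8]{2})$");

// Illustrative sample value, not taken from the question.
Match wbs = wbsPattern.Match("I-587954.12.34.56");
if (wbs.Success)
{
    // Groups 1 to 4 hold the four dot-separated parts.
    Console.WriteLine($"first set: {wbs.Groups[1].Value}, third set: {wbs.Groups[3].Value}");
}

// The generic pattern accepts unrelated text; the specific one does not.
bool genericAccepts = Regex.IsMatch("The number 0.1", @"^.+\.(0?[1-8])$");
bool specificAccepts = wbsPattern.IsMatch("The number 0.1");
Console.WriteLine($"generic: {genericAccepts}, specific: {specificAccepts}");
```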
This regex:
^.-[0-9]{6}(\.[1-8]{1,2}){3}$
will validate the following:
The first character can be any character, but is of length 1
It is followed by a dash
The dash is followed by exactly 6 numbers 0 - 9. (If this could be less than 6 characters - for example, between 3 and 6 characters - just replace {6} with {3,6}).
This is followed by 3 groups of characters. Each of these groups is preceded by a period, is of length 1 or 2, and can contain only the digits 1 - 8.
An example of a valid string is:
I-587954.12.34.56
This is also valid:
I-587954.1.3.5
But this isn't:
I-587954.12.80.356
because the second-to-last group contains a 0, and because the last group is of length 3.
Please let me know if I have misunderstood any of the rules.
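The three example strings can be checked against this pattern with a few lines of C#. This is a sketch with made-up variable names, assuming C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Any single character, a dash, six digits, then three ".d" or ".dd"
// groups where every digit is in the range 1-8.
var dottedPattern = new Regex(@"^.-[0-9]{6}(\.[1-8]{1,2}){3}$");

string[] candidates = { "I-587954.12.34.56", "I-587954.1.3.5", "I-587954.12.80.356" };
foreach (var candidate in candidates)
{
    Console.WriteLine($"{candidate}: {dottedPattern.IsMatch(candidate)}");
}
```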
^I-([0-9]{1,6})\.(.{1,2})\.(0[1-2])\.(.{1,2})$
groups delimited by . (\.) :
([0-9]{1,6}) - 1-6 digits
(.{1,2}) - any 1 or 2 characters
(0[1-2]) - 01 or 02
(.{1,2}) - any 1 or 2 characters
You can easily write and test a regex against your input data; just search for "regex online".

regex find one or two digits but not three digits in a string

I would like to find the match for:
\024jack3hall2\c$
\024jack3hall02\c$
\024jack3hall12\c$
but not for:
\024jack3hall023\c$
The difference is the number of digits in the end part. I would like to allow only 1 or 2, not 3.
My attempt:
\\\\024[a-zA-Z0-9]+[0-9]{1,2}\\[a-zA-Z]{1}\$(?!.)
I tried only on http://regexr.com/ but will implement in C#.
Is it possible to fix my attempt, or do I have to write several separate checks?
Why is {1,2} not working? \024jack3hall12343\c$ also matches.
From the examples you have shown, something as simple as:
[^\d](\d{1,2})\\
Should work. It will match 1 or 2 digits followed by a \ so long as they aren't preceded by another digit.
The matched digits are in a capture group if you need them (or you can just remove the brackets if you don't need that).
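A small C# sketch of that simple pattern applied to the four sample strings from the question. The variable names are made up, and C# 9 or later is assumed for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// A non-digit, then one or two captured digits, then a literal backslash.
var tailDigits = new Regex(@"[^\d](\d{1,2})\\");

string[] paths =
{
    @"\024jack3hall2\c$",
    @"\024jack3hall02\c$",
    @"\024jack3hall12\c$",
    @"\024jack3hall023\c$" // three trailing digits, should not match
};

foreach (var path in paths)
{
    Match m = tailDigits.Match(path);
    Console.WriteLine(m.Success
        ? $"{path} -> {m.Groups[1].Value}"
        : $"{path} -> no match");
}
```

Verbatim strings (the @ prefix) keep the backslashes in both the pattern and the sample data readable.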
As for your original effort, right here:
\\\\024[a-zA-Z0-9]+[0-9]{1,2}
You are matching 1 or more from the range a-z, A-Z or 0-9. So that will match your extra digits if they come at the end of that pattern.
Answer:
\\\\024[a-zA-Z0-9]+[^\d](\d{1,2})\\[a-zA-Z]{1}\$(?!.)
I believe you were not escaping backslash properly.
Here is the correct regex:
\\024[a-zA-Z0-9]+[0-9]{1,2}\\[a-zA-Z]{1}\$(?!.)

Need C# Regex to match a four digit sequence, but ignore any single digits preceding

OK, I need to improve this question. Let me try this again:
I need to parse out a flight time which comes after an airport code, but may have a single digit and white space between the two.
Example data:
ORD 1100
HOU 1 1215
MAD 4 1300
I tried this:
([A-Z]{3})\s?\d?\s?(\d{4})
I end up with the airport code and a single digit.
I need a regex that will ignore everything after the airport code except the 4 digit flight time.
Hope I improved my question.
The solution might be as simple as:
\d{4}
According to your inputs, you don't need to care about preceding digits.
This is the answer I would use:
@"([A-Z]{3})\s+(?:[0-9]\s+)?([0-9]{4})"
Basically it is very similar to what you were attempting to do.
The first part is ([A-Z]{3}), which looks for 3 uppercase letters and assigns them to group 1 (Group 0 is the entire string).
The second part is \s+(?:[0-9]\s+)?, which requires at least one space, with the possibility of 1 digit in there somewhere. The noncapturing group in the middle requires that if there is a single digit there, it must be followed by at least 1 space. This prevents a mismatch for something like ABC 12345.
Next we have ([0-9]{4}), which simply matches the 4 digits you are looking for. These can be found in group 2. I use [0-9] here since \d refers to more digits than what we are used to (like Eastern Arabic numerals).
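Applied to the example data from the question, the pattern could be used roughly like this. It is a sketch; the variable names are made up, and C# 9 or later is assumed for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

var flightPattern = new Regex(@"([A-Z]{3})\s+(?:[0-9]\s+)?([0-9]{4})");

string[] flightLines = { "ORD 1100", "HOU 1 1215", "MAD 4 1300" };
foreach (var flightLine in flightLines)
{
    Match flight = flightPattern.Match(flightLine);
    if (flight.Success)
    {
        // Group 1 is the airport code, group 2 the four-digit time;
        // the optional single digit is skipped by the non-capturing group.
        Console.WriteLine($"{flight.Groups[1].Value} {flight.Groups[2].Value}");
    }
}
```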
Here's a little something, using lookbehind and lookahead to be sure there are only 4 digits, with non-digits (or beginning/end) surrounding them.
"(?<=[^\d]|^)\d{4}(?=[^\d]|$)"
The two [^\d] can be replaced with [\s] to only match 4-digits with whitespace around them.
Update:
With your latest update, I merged my regex with yours (from the comment) and came up with this:
"(?<=[A-Z]{3}\s(\d\s)?)\d{4}(?=\s|$)"
There are three parts to the pattern. First is the lookbehind: (?<=PatternHere). The pattern inside this must occur/match before what we seek.
The next part is our simple main pattern: \d{4}, four digits.
The last part is the lookahead: (?=PatternHere), which is pretty much the same as lookbehind, but checks the other side, forward.
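A brief C# sketch of the lookaround version on the question's example data. Here the whole match is just the four digits, so no group index is needed. Variable names are made up, and C# 9 or later is assumed for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// The lookbehind contains an optional part, so it has variable length;
// the .NET regex engine supports this.
var timeOnly = new Regex(@"(?<=[A-Z]{3}\s(\d\s)?)\d{4}(?=\s|$)");

string[] entries = { "ORD 1100", "HOU 1 1215", "MAD 4 1300" };
foreach (var entry in entries)
{
    // Value is the matched text itself: only the flight time.
    Console.WriteLine(timeOnly.Match(entry).Value);
}
```

Many other regex flavors require fixed-length lookbehind, so this pattern may need adjusting outside .NET.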

C# Regex Validation

Can someone please validate this for me (newbie of regex match cons).
Rather than asking the question, I am writing this:
Regex rgx = new Regex (@"^{3}[a-zA-Z0-9](\d{5})|{3}[a-zA-Z0-9](\d{9})$"
Can someone tell me if it's OK...
The accounts I am trying to match are either of:
1. BAA89345 (8 chars)
2. 12345678 (8 chars)
3. 123456789112 (12 chars)
Thanks in advance.
You can use a Regex tester. Plenty of free ones online. My Regex Tester is my current favorite.
Does the value that begins with characters followed by digits always start with exactly three characters, or can it start with fewer or more than three? If it can, what are the minimum and maximum numbers of characters before the digits?
You need to place your quantifiers after the characters they are supposed to quantify. Also, character classes need to be wrapped in square brackets. This should work:
@"^(?:[a-zA-Z0-9]{3}|\d{3}\d{4})\d{5}$"
There are several good, automated regex testers out there. You may want to check out regexpal.
Although that may be a perfectly valid match, I would suggest rewriting it as:
^([a-zA-Z]{3}\d{5}|\d{8}|\d{12})$
which requires the string to match one of:
[a-zA-Z]{3}\d{5} three alpha and five numbers
\d{8} 8 digits or
\d{12} twelve digits.
Makes it easier to read, too...
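The rewritten pattern can be checked against the three account formats from the question. This is a sketch: the variable names and the rejected sample BAA8934 are made up for illustration, and C# 9 or later is assumed for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Three letters plus five digits, or exactly 8 digits, or exactly 12 digits.
var accountPattern = new Regex(@"^([a-zA-Z]{3}\d{5}|\d{8}|\d{12})$");

string[] accounts =
{
    "BAA89345", "12345678", "123456789112", // formats from the question
    "BAA8934"                               // made-up sample, one digit short
};

foreach (var account in accounts)
{
    Console.WriteLine($"{account}: {accountPattern.IsMatch(account)}");
}
```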
I'm not 100% on your objective, but there are a few problems I can see right off the bat.
When you list the acceptable characters to match, like with a-zA-Z0-9, you need to put them inside brackets, like [a-zA-Z0-9]. Using a ^ at the beginning will negate the contained characters, e.g. [^a-zA-Z0-9].
Word characters can be matched with \w, which is roughly equivalent to [a-zA-Z0-9_] (in .NET it also matches other Unicode letters and digits unless RegexOptions.ECMAScript is set).
Quantifiers need to appear at the end of the match expression. So, instead of {3}[a-zA-Z0-9], you would need to write [a-zA-Z0-9]{3} (assuming you want to match three instances of a character that matches [a-zA-Z0-9]).
