Im having a hard time with grouping parts of a Regex. I want to validate a few things in a string that follows this format: I-XXXXXX.XX.XX.XX
Validate that the first set of 6 X's (I-xxxxxx.XX.XX.XX) does not contain characters and its length is no more than 6.
Validate that the third set of X's (I-XXXXXX.XX.xx.XX) does not contain characters and is only 1 or 2.
Now, I have already validation on the last set of XX's to make sure the numbers are 1-8 using
string pattern1 = #"^.+\.(0?[1-8])$";
Match match = Regex.Match(TxtWBS.Text, pattern1);
if (match.Success)
;
else
{ errMessage += "WBS invalid"; errMessage +=
Environment.NewLine; }
I just cant figure out how to target specific parts of the string. Any help would be greatly appreciated and thank you in advance!
You're having some trouble adding new validation to this string because it's very generic. Let's take a look at what you're doing:
^.+\.(0?[1-8])$
This finds the following:
^ the start of the string
.+ everything it can, other than a newline, basically jumping the engine's cursor to the end of your line
\. the last period in the string, because of the greedy quantifier in the .+ that comes before it
0? a zero, if it can
[1-8] a number between 1 and 8
()$ stores the two previous things in a group, and if the end of the string doesn't come after this, it may even backtrace and try the same thing from the second to last period instead, which we know isn't a great strategy.
This ends up matching a lot of weird stuff, like for example the string The number 0.1
Let's try patterning something more specific, if we can:
^I-(\d{6})\.(\d{2})\.(\d{1,2})\.([1-8]{2})$
This will match:
^I- an I and a hyphen at the start of the string
(\d{6}) six digits, which it stores in a capture group
\. a period. By now, if there was any other number of digits than six, the match fails instead of trying to backtrace all over the place.
(\d{2})\. Same thing, but two digits instead of six.
(\d{1,2})\. Same thing, the comma here meaning it can match between one and two digits.
([1-8]{2}) Two digits that are each between 1 and 8.
$ The end of the string.
I hope I understood what exactly you're trying to match here. Let me know if this isn't what you had in mind.
This regex:
^.-[0-9]{6}(\.[1-8]{1,2}){3}$
will validate the following:
The first character can be any character, but is of length 1
It is followed by a dash
The dash is followed by exactly 6 numbers 0 - 9. (If this could be less than 6 characters - for example, between 3 and 6 characters - just replace {6} with {3,6}).
This is followed by 3 groups of characters. Each of this groups are proceeded by a period, are of length 1 or 2, and can be any number 1 - 8.
An example of a valid string is:
I-587954.12.34.56
This is also valid:
I-587954.1.3.5
But this isn't:
I-587954.12.80.356
because the second-to-last group contains a 0, and because the last group is of length 3.
Pleas let me know if I have misunderstood any of the rules.
^I-([0-9]{1,6})\.(.{1,2})\.(0[1-2])\.(.{1,2})$
groups delimited by . (\.) :
([0-9]{1,6}) - 1-6 digits
(.{1,2}) - 1-2 any single character
(0[1-2]) - 01 or 02
(.{1,2}) - 1-2 any single character
you can write and easy test regex on your input data, just google "regex online"
Related
I'm very new to regex And I'm trying to use a regular expression to turn a credit card number which will be part of a conversation into something like 492900******2222
As it can come from any conversation it might contain string next to it or might have an inconsistent format, so essentially all of the below should be formatted to the example above:
hello my number is492900001111222
number is 4929000011112222ok?
4929 0000 1111 2222
4929-0000-1111-2222
It needs to be a regular expression which extracts the capture group of which I will then be able to use a MatchEvaluator to turn all digits (excluding non digits) which are not the first 6 and last 4 into a *
I've seen many examples here on stack overflow for PHP and JS but none which helps me resolve this issue.
Any guidance will be appreciated
UPDATE
I need to expand upon an existing implementation which uses MatchEvaluator to mask each character that is not the first 6 or last 4 and ideally I dont want to change the MatchEvaluator and just make the masking flexible based on the regular expression, see this for an example https://dotnetfiddle.net/J2LCo0
UPDATE 2
#Matt.G and #CAustin answers do resolve what I asked for but I am hitting another barrier where I cant have it be so strict. The final captured group needs to only take into account the digits and as such maintain the format of the input text.
So for example:
If some types in my card number is 99 9988 8877776666 the output from the evaluation should be 99 9988 ******666666
OR
my card number is 9999-8888-7777-6666 it should output 9999-88**-****-6666.
Is this possible?
Changed the list to include items that are in my unit tests https://dotnetfiddle.net/tU6mxQ
Try Regex: (?<=\d{4}\d{2})\d{2}\d{4}(?=\d{4})|(?<=\d{4}( |-)\d{2})\d{2}\1\d{4}(?=\1\d{4})
Regex Demo
C# Demo
Explanation:
2 alternative regexes
(?<=\d{4}\d{2})\d{2}\d{4}(?=\d{4}) - to handle cardnumbers without any separators (- or <space>)
(?<=\d{4}( |-)\d{2})\d{2}\1\d{4}(?=\1\d{4}) - to handle cardnumbers with separator (- or <space>)
1st Alternative (?<=\d{4}\d{2})\d{2}\d{4}(?=\d{4})
Positive Lookbehind (?<=\d{4}\d{2}) - matches text that has 6 digits immediately behind it
\d{2} matches a digit (equal to [0-9])
{2} Quantifier — Matches exactly 2 times
\d{4} matches a digit (equal to [0-9])
{4} Quantifier — Matches exactly 4 times
Positive Lookahead (?=\d{4}) - matches text that is followed immediately by 4 digits
Assert that the Regex below matches
\d{4} matches a digit (equal to [0-9])
{4} Quantifier — Matches exactly 4 times
2nd Alternative (?<=\d{4}( |-)\d{2})\d{2}\1\d{4}(?=\1\d{4})
Positive Lookbehind (?<=\d{4}( |-)\d{2}) - matches text that has (4 digits followed by a separator followed by 2 digits) immediately behind it
1st Capturing Group ( |-) - get the separator as a capturing group, this is to check the next occurence of the separator using \1
\1 matches the same text as most recently matched by the 1st capturing group (separator, in this case)
Positive Lookahead (?=\1\d{4}) - matches text that is followed by separator and 4 digits
If performance is a concern, here's a pattern that only goes through 94 steps, instead of the other answer's 473, by avoiding lookaround and alternation:
\d{4}[ -]?\d{2}\K\d{2}[ -]?\d{4}
Demo: https://regex101.com/r/0XMluq/4
Edit: In C#'s regex flavor, the following pattern can be used instead, since C# allows variable length lookbehind.
(?<=\d{4}[ -]?\d{2})\d{2}[ -]?\d{4}
Demo
I am trying to validate an input with a regular expression. Up until now all my tests fail and as my experience with regex is limited I thought someone might be able to help me out.
Pattern: digit (possibly "," digit) (possibly ;)
A String may not begin with a ; and not end with a ;.
Digits are allowed to stand alone or with
My regEx (not working): ((\d)(,\d)?)(;?) the problem is it does not seem to check until the end of the string. Also the optional parts are giving me headaches.
Update: ^[0-9]+(,[0-9])?(;[0-9]+(,[0-9])?)+$this seems to work better but it does not match the single digit.
OK:
2,3;4,4;3,2
2,3
2
2,3;3;4,3
NOK:
2,3,,,,
2,3asfafafa
;2,3
2,3;;3,4
2,3;3,4;
Your ^[0-9]+(,[0-9])?(;[0-9]+(,[0-9])?)+$ regex matches 1 or more digits, then an optional sequence of , and 1 digit, followed with one or more similar sequences.
You need to match zero or more comma-separated numbers:
^\d+(?:,\d+)?(?:;\d+(?:,\d+)?)*$
^
See the regex demo
Now, tweaking part:
If only single-digit numbers should be matched, use ^\d(?:,\d)?(?:;\d(?:,\d)?)*$
If the comma-separated number pairs can have the second element empty, add ? after each ,\d (if single digit numbers are to be matched) or * (if the numbers can have more than one digit): ^\d(?:,\d?)?(?:;\d(?:,\d?)?)*$ or ^\d+(?:,\d*)?(?:;\d+(?:,\d*)?)*$.
I'm trying to write a regular expression using C#/.Net that matches 1-4 alphanumerics followed by spaces, followed by 10 digits. The catch is the number of spaces plus the number of alphanumerics must equal 4, and the spaces must follow the alphanumerics, not be interspersed.
I'm at a total loss as to how to do this. I can do ^[A-Za-z\d\s]{1,4}[\d]{10}$, but that lets the spaces fall anywhere in the first four characters. Or I could do ^[A-Za-z\d]{1,4}[\s]{0,3}[\d]{10}$ to keep the spaces together, but that would allow more than a total of four characters before the 10 digit number.
Valid:
A12B1234567890
AB1 1234567890
AB 1234567890
Invalid:
AB1 1234567890 (more than 4 characters before the numbers)
A1B1234567890 (less than 4 characters before the numbers)
A1 B1234567890 (space amidst the first 4 characters instead of at the end)
You can force the check with a look-behind (?<=^[\p{L}\d\s]{4}) that will ensure there are four allowed characters before the 10-digits number:
^[\p{L}\d]{1,4}\s{0,3}(?<=^[\p{L}\d\s]{4})\d{10}$
^^^^^^^^^^^^^^^^^^^^
See demo
If you do not plan to support all Unicode letters, just replace \p{L} with [a-z] and use RegexOptions.IgnoreCase.
Here's the regex you need:
^(?=[A-Za-z0-9 ]{4}\d{10}$)[A-Za-z0-9]{1,4} *\d{10}$
It uses a lookahead (?= ) to test if it's followed by 4 chars, either alnum or space, and then it goes back to where it was (the beggining of string, not consuming any chars).
Once that condition is met, the rest is a expression quite similar to what you were trying ([A-Za-z0-9]{1,4} *\d{10}).
Online tester
I know this is dumb, but must work exactly as required.
^[A-Za-z\d]([A-Za-z\d]{3}|[A-Za-z\d]{2}\s|[A-Za-z\d]\s{2}|\s{3})[\d]{10}$
Not sure what you are looking for, but perhaps:
^(?=.{14}$)[A-Za-z0-9]{1,4} *\d{10}
demo
Try this:
Doesn't allow char/space/char combination and starts with a char:
/\b(?!\w\s{1,2}\w+)\w(\w|\s){3}\d{10}/gm
https://regex101.com/r/fF2tR8/2
I am very new to Regular expression so I apologise for the 'noobyness' of the question...
I need to match a pattern for an ID we use at work.
So far the only specification for the pattern is that it will be 9 characters long, and comprised of Capital letters and digits. The ID can contain 1 or any number of Capital letters or digits, so long as the total length of the string is 9 characters long.
So far i have the following... [A-Z][0-9]{9}
this does not makes sure the string has atleast one letter or digit (so a 9 character long string would pass)... Alos, Im sure it matched a 9 letter word made of non capitals.
I have done a fair bit of googling, but i have not found anything dumbed down enough for me to understand.
Any help very apppreciated :)
Thanks
EDIT: Just to recap the requirements - The id has to be 9 characters long, no more no less. It will be comprised of capital letters and digits. There can be any amount of either letter or digit, so long as the id contains atleast one of each (so BH98T6YUO or R3DBLUEEE or 1234R6789
I will also post my code to make sure that bits not wrong... ??
string myRegex = "A ton of different combinations that i have tried";
Regex re = new Regex(myRegex);
// stringCombos is a List<string> containing all my strings
// The strings contain within them, my id
// I am attempting to pull out this id
// the below is just to print out all found matches for each string in the list
foreach (string s in stringCombos)
{
MatchCollection mc = re.Matches(s);
Console.WriteLine("-------------------------");
Console.Write(s);
Console.WriteLine(" --- was split into the following:");
foreach (Match mt in mc)
{
Console.WriteLine(mt.ToString());
}
}
You actually have to learn regular expressions as a language. The curve is kind of steep, but there are a ton of excellent tutorials for the basics. Also, you might get this in a chat situation (SO has a chat functionality) - that is how I originally learnt them...
I think this might work for your case:
[A-Z0-9]{1,9}
According to your update, for exactly 9 elements, use:
[A-Z0-9]{9}
Note, though, that the requirement to include at least one letter and at least one digit is not expressed in this solution. An easy way to do that would be to apply a second and third match to the first one:
[0-9]*[A-Z][0-9]*
[A-Z]*[0-9][A-Z]*
Thereby matching three times. You might be able to get this result with the fancy forward and backward reference stuff, but you cannot really capture that requirement with a regular grammar.
You need to match the start and then end of the string using ^ and $, this means that it will match 9 characters and not 10
^[0-9A-Z]$
You aren't exact clear on the requirements the above match will match 9 characters either capital or numeric.
You may find Expresso useful for trying out your expressions.
EDIT (With new requirements) if you are requiring a minimum of 1 Uppercase character you could use the following.
\b[0-9A-Z]{8}(?:(?<=.*[A-Z].*)[0-9]|(?<=.*[0-9].*)[A-Z])\b
Breakdown
\b Match a word boundry
[0-9A-Z]{8} 8 Chars that are either uppercase or numbers
(?: Begins a non capturing group, this is to enclose the or condition
(?<=.*[A-Z].*)[0-9] This basically matches [0-9] aslong as there is an A-Z somewhere before it in the first [0-9A-Z]{8} capture
| OR
(?<=.*[0-9].*)[A-Z] This basically matches [A-Z] aslong as there is an 0-9 somewhere before it in the first [0-9A-Z]{8} capture
) close non capturing group
\b match a word boundry
Basically do a match on the first 8 chars and then if the 9th char is a digit then there has to be an uppercase in the first 8, if the 9th is an A-Z then there has to be a digit in the first 8
The new edited version will now find ID's that appear within the string rather than requiring the string exactly match them.
I'm trying to write something that format Brazilian phone numbers, but I want it to do it matching from the end of the string, and not the beginning, so it would turn input strings according to the following pattern:
"5135554444" -> "(51) 3555-4444"
"35554444" -> "3555-4444"
"5554444" -> "555-4444"
Since the begining portion is what usually changes, I thought of building the match using the $ sign so it would start at the end, and then capture backwards (so I thought), replacing then by the desired end format, and after, just getting rid of the parentesis "()" in front if they were empty.
This is the C# code:
s = "5135554444";
string str = Regex.Replace(s, #"\D", ""); //Get rid of non digits, if any
str = Regex.Replace(str, #"(\d{0,2})(\d{0,4})(\d{1,4})$", "($1) $2-$3");
return Regex.Replace(str, #"^\(\) ", ""); //Get rid of empty () at the beginning
The return value was as expected for a 10 digit number. But for anything less than that, it ended up showing some strange behavior. These were my results:
"5135554444" -> "(51) 3555-4444"
"35554444" -> "(35) 5544-44"
"5554444" -> "(55) 5444-4"
It seems that it ignores the $ at the end to do the match, except that if I test with something less than 7 digits it goes like this:
"554444" -> "(55) 444-4"
"54444" -> "(54) 44-4"
"4444" -> "(44) 4-4"
Notice that it keeps the "minimum" {n} number of times of the third capture group always capturing it from the end, but then, the first two groups are capturing from the beginning as if the last group was non greedy from the end, just getting the minimum... weird or it's me?
Now, if I change the pattern, so instead of {1,4} on the third capture I use {4} these are the results:
str = Regex.Replace(str, #"(\d{0,2})(\d{0,4})(\d{4})$", "($1) $2-$3");
"5135554444" -> "(51) 3555-4444" //As expected
"35554444" -> "(35) 55-4444" //The last four are as expected, but "35" as $1?
"54444" -> "(5) -4444" //Again "4444" in $3, why nothing in $2 and "5" in $1?
I know this is probably some stupidity of mine, but wouldn't it be more reasonable if I want to capture at the end of the string, that all previous capture groups would be captured in reverse order?
I would think that "54444" would turn into "5-4444" in this last example... then it does not...
How would one accomplish this?
(I know maybe there's a better way to accomplish the very same thing using different approaches... but what I'm really curious is to find out why this particular behavior of the Regex seems odd. So, the answer tho this question should focus on explaining why the last capture is anchored at the end of the string, and why the others are not, as demonstrated in this example. So I'm not particularly interested in the actual phone # formatting problem, but to understand the Regex sintax)...
Thanks...
So you want the third part to always have four digits, the second part zero to four digits, and the first part zero to two digits, but only if the second part contains four digits?
Use
^(\d{0,2}?)(\d{0,4})(\d{4})$
As a C# snippet, commented:
resultString = Regex.Replace(subjectString,
#"^ # anchor the search at the start of the string
(\d{0,2}?) # match as few digits as possible, maximum 2
(\d{0,4}) # match up to four digits, as many as possible
(\d{4}) # match exactly four digits
$ # anchor the search at the end of the string",
"($1) $2-$3", RegexOptions.IgnorePatternWhitespace);
By adding a ? to a quantifier (??, *?, +?, {a,b}?) you make it lazy, i. e. tell it to match as few characters as possible while still allowing an overall match to be found.
Without the ? in the first group, what would happen when trying to match 123456?
First, the \d{0,2} matches 12.
Then, the \d{0,4} matches 3456.
Then, the \d{4} doesn't have anything left to match, so the regex engine backtracks until that's possible again. After four steps, the \d{4} can match 3456. The \d{0,4} gives up everything it had matched greedily for this.
Now, an overall match has been found - no need to try any more combinations. Therefore, the first and third groups will contain parts of the match.
You have to tell it that it's OK if the first matching groups aren't there, but not the last one:
(\d{0,2}?)(\d{0,4}?)(\d{1,4})$
Matches your examples properly in my testing.