This is something of a follow up from a previous question. The requirements have changed, and I'm looking for some help with coming up with regex for either a comma separated number or a number with no commas. Sample input is below, along with some failed attempts.
Any help is appreciated.
Acceptable input:
1,234,567
1234567
Unacceptable input:
1234,567
1,234567
Some failed attempts:
^(\d{1,3}(?:[,]\d{3})*)|((\d)*)$
^\d{1,3}((?:[,]?\d{3})*|(?:[,]\d{3})*)$
Original success:
^\d{1,3}(?:[,]\d{3})*$
Just add an "or one or more digits" to the end:
^(?:\d{1,3}(?:[,]\d{3})*|\d+)$
I think you had it almost right the first time and just didn't match up all your parentheses correctly.
Try this:
^\d{1,3}(?:(?:,\d{3})+|\d*)$
This will match any sequence that begins with one to three digits, followed by either
one or more segments of a comma followed by three digits, or
zero or more digits.
Why not use Int.TryParse() rathern then a regex?
Please see this answer for a definitive treatment of the “Is it a number?” question, including allowance for correctly comma-separated digit groups.
One thing i've noticed with all these is that the first bit of the regex allows for '0' based numbers to work. For example the number:
0,123,456
Would match using the accepted answer. I've been using:
((?<!\w)[+-]?[1-9][0-9]{,2}(?:,[0-9]{3})+)
Which also ensures that the number has nothing in front of it. It does not catch numbers of less then 1,000 however. This was to prevent ill-formatted numbers from being captured at all. If the final + were a ? the following numbers would be captured:
0,123
1,2 (as 2 separate numbers)
I have a strange set of numbers to match for (integers with commas, with spaces and without both), so i'm using pretty restrictive regexs to capture these groups.
Anyway, something to think about!
Related
I have a regex
/^([a-zA-Z0-9]+)$/
this just allows only alphanumerics but also if I insert only number(s) or only character(s) then also it accepts it. I want it to work like the field should accept only alphanumeric values but the value must contain at least both 1 character and 1 number.
Why not first apply the whole test, and then add individual tests for characters and numbers? Anyway, if you want to do it all in one regexp, use positive lookahead:
/^(?=.*[0-9])(?=.*[a-zA-Z])([a-zA-Z0-9]+)$/
This RE will do:
/^(?:[0-9]+[a-z]|[a-z]+[0-9])[a-z0-9]*$/i
Explanation of RE:
Match either of the following:
At least one number, then one letter or
At least one letter, then one number plus
Any remaining numbers and letters
(?:...) creates an unreferenced group
/i is the ignore-case flag, so that a-z == a-zA-Z.
I can see that other responders have given you a complete solution. Problem with regexes is that they can be difficult to maintain/understand.
An easier solution would be to retain your existing regex, then create two new regexes to test for your "at least one alphabetic" and "at least one numeric".
So, test for this :-
/^([a-zA-Z0-9]+)$/
Then this :-
/\d/
Then this :-
/[A-Z]/i
If your string passes all three regexes, you have the answer you need.
The accepted answers is not worked as it is not allow to enter special characters.
Its worked perfect for me.
^(?=.*[0-9])(?=.*[a-zA-Z])(?=\S+$).{6,20}$
one digit must
one character must (lower or upper)
every other things optional
Thank you.
While the accepted answer is correct, I find this regex a lot easier to read:
REGEX = "([A-Za-z]+[0-9]|[0-9]+[A-Za-z])[A-Za-z0-9]*"
This solution accepts at least 1 number and at least 1 character:
[^\w\d]*(([0-9]+.*[A-Za-z]+.*)|[A-Za-z]+.*([0-9]+.*))
And an idea with a negative check.
/^(?!\d*$|[a-z]*$)[a-z\d]+$/i
^(?! at start look ahead if string does not
\d*$ contain only digits | or
[a-z]*$ contain only letters
[a-z\d]+$ matches one or more letters or digits until $ end.
Have a look at this regex101 demo
(the i flag turns on caseless matching: a-z matches a-zA-Z)
Maybe a bit late, but this is my RE:
/^(\w*(\d+[a-zA-Z]|[a-zA-Z]+\d)\w*)+$/
Explanation:
\w* -> 0 or more alphanumeric digits, at the beginning
\d+[a-zA-Z]|[a-zA-Z]+\d -> a digit + a letter OR a letter + a digit
\w* -> 0 or more alphanumeric digits, again
I hope it was understandable
What about simply:
/[0-9][a-zA-Z]|[a-zA-Z][0-9]/
Worked like a charm for me...
Edit following comments:
Well, some shortsighting of my own late at night: apologies for the inconvenience...
The - incomplete - underlying idea was that only one "transition" from a digit to an alpha or from an alpha to a digit was needed somewhere to answer the question.
But next regex should do the job for a string only comprised of alphanumeric characters:
/^[0-9a-zA-Z]*([0-9][a-zA-Z]|[a-zA-Z][0-9])[0-9a-zA-Z]*$/
which in Javascript can be furthermore simplified as:
/^[0-9a-z]*([0-9][a-z]|[a-z][0-9])[0-9a-z]*$/i
In IMHO it's more straigthforward to read and understand than some other answers (no backtraking and the like).
Hope this helps.
If you need the digit to be at the end of any word, this worked for me:
/\b([a-zA-Z]+[0-9]+)\b/g
\b word boundary
[a-zA-Z] any letter
[0-9] any number
"+" unlimited search (show all results)
OK, I need to improve this question. Let me try this again:
I need to parse out a flight time which comes after an airport code, but may have a single digit and white space between the two.
Example data:
ORD 1100
HOU 1 1215
MAD 4 1300
I tried this:
([A-Z]{3})\s?\d?\s?(\d{4})
I end up with the airport code and a single digit.
I need a regex that will ignore everything after the airport code except the 4 digit flight time.
Hope I improved my question.
The solution might be as simple as:
\d{4}
According to your inputs you don't need to care about preceeding digits..
This is the answer I would use:
#"([A-Z]{3})\s+(?:[0-9]\s+)?([0-9]{4})"
Basically it is very similar to what you were attempting to do.
The first part is ([A-Z]{3}), which looks for 3 uppercase letters and assigns them to group 1 (Group 0 is the entire string).
The second part is \s+(?:[0-9]\s+)?, which requires at least one space, with the possibility of 1 digit in there somewhere. The noncapturing group in the middle requires that if there is a single digit there, it must be followed by at least 1 space. This prevents a mismatch for something like ABC 12345.
Next we have ([0-9]{4}), which simply matched the 4 digits you are looking for. These can be found in group 2. I use [0-9] here since \d refers to more digits than what we are used to (Like Eastern Arabic numerals).
Here's a little something, using lookbehind and lookahead to be sure there are only 4 digits, with non-digits (or beginning/end) surrounding them.
"(?<=[^\d]|^)\d{4}(?=[^\d]|$)"
The two [^\d] can be replaced with [\s] to only match 4-digits with whitespace around them.
Update:
With your latest update, I merged my regex with yours (from the comment) and came up with this:
"(?<=[A-Z]{3}\s(\d\s)?)\d{4}(?=\s|$)"
There are three parts to the pattern. First is the lookbehind: (?<=PatternHere). The pattern inside this must occur/match before what we seek.
The next part is our simple main pattern: \d{4}, four digits.
The last part is the lookahead: (?=PatternHere), which is pretty much the same as lookbehind, but checks the other side, forward.
I'm currently writing a library where I wish to allow the user to be able to specify spreadsheet cell(s) under four possible alternatives:
A single cell: "A1";
Multiple contiguous cells: "A1:B10"
Multiple separate cells: "A1,B6,I60,AA2"
A mix of 2 and 3: "B2:B12,C13:C18,D4,E11000"
Then, to validate whether the input respects these formats, I intended to use a regular expression to match against. I have consulted this article on Wikipedia:
Regular Expression (Wikipedia)
And I also found this related SO question:
regex matching alpha character followed by 4 alphanumerics.
Based on the information provided within the above-linked articles, I would try with this Regex:
Default Readonly Property Cells(ByVal cellsAddresses As String) As ReadOnlyDictionary(Of String, ICell)
Get
Dim validAddresses As Regex = New Regex("A-Za-z0-9:,A-Za-z0-9")
If (Not validAddresses.IsMatch(cellsAddresses)) then _
Throw New FormatException("cellsAddresses")
// Proceed with getting the cells from the Interop here...
End Get
End Property
Questions
1. Is my regular expression correct? If not, please help me understand what expression I could use.
2. What exception is more likely to be the more meaningful between a FormatException and an InvalidExpressionException? I hesitate here, since it is related to the format under which the property expect the cells to be input, aside, I'm using an (regular) expression to match against.
Thank you kindly for your help and support! =)
I would try this one:
[A-Za-z]+[0-9]+([:,][A-Za-z]+[0-9]+)*
Explanation:
Between [] is a possible group of characters for a single position
[A-Za-z] means characters (letters) from 'A' to 'Z' and from 'a' to 'z'
[0-9] means characters (digits) from 0 to 9
A "+" appended to a part of a regex means: repeat that one or more times
A "*" means: repeat the previous part zero or more times.
( ) can be used to define a group
So [A-Za-z]+[0-9]+ matches one or more letters followed by one or more digits for a single cell-address.
Then that same block is repeated zero or more times, with a ',' or ':' separating the addresses.
Assuming that the column for the spreadsheet is any 1- or 2-letter value and the row is any positive number, a more complex but tighter answer still would be:
^[A-Z]{1,2}[1-9]\d*(:[A-Z]{1,2}[1-9]\d*)?(,[A-Z]{1,2}[1-9]\d*(:[A-Z]{1,2}[1-9]\d*)?)*$
"[A-Z]{1,2}[1-9]\d*" is the expression for a single cell reference. If you replace "[A-Z]{1,2}[1-9]\d*" in the above with then the complex expression becomes
^<cell>(:<cell>)?(,<cell>(:<cell>*)?)*$
which more clearly shows that it is a cell or a range followed by one or more "cell or range" entries with commas in between.
The row and column indicators could be further refined to give a tighter still, yet more complex expression. I suspect that the above could be simplified with look-ahead or look-behind assertions, but I admit those are not (yet) my strong suit.
I'd go with this one, I think:
(([A-Z]+[1-9]\d*:)?[A-Z]+[1-9]\d*,)*([A-Z]+[1-9]\d*:)?[A-Z]+[1-9]\d*
This only allows capital letters as the prefix. If you want case insensitivity, use RegexOptions.IgnoreCase.
You could simplify this by replacing [A-Z]+[1-9]\d* with plain old [A-Z]\d+, but that will only allow a one-letter prefix, and it also allows stuff like A0 and B01. Up to you.
EDIT:
Having thought hard about DocMax's mention of lookarounds, and using Hans Kesting's answer as inspiration, it occurs to me that this should work:
^[A-Z]+\d+((,|(?<!:\w*):)[A-Z]+\d+)*$
Or if you want something really twisted:
^([A-Z]+\d+(,|$|(?<!:\w*):))*(?<!,|:)
As in the previous example, replace \d+ with [1-9]\d* if you want to prevent leading zeros.
The idea behind the ,|(?<!\w*:): is that if a group is delimited by a comma, you want to let it through; but if it's a colon, it's only allowed if the previous delimiter wasn't a colon. The (,|$|...) version is madness, but it allows you to do it all with only one [A-Z]+\d+ block.
However! Even though this is shorter, and I'll admit I feel a teeny bit clever about it, I pity the poor fellow who has to come along and maintain it six months from now. It's fun from a code-golf standpoint, but I think it's best for practical purposes to go with the earlier version, which is a lot easier to read.
i think your regex is incorrect, try (([A-Za-z0-9]*)[:,]?)*
Edit : to correct the bug pointed out by Baud : (([A-Za-z0-9]*)[:,]?)*([A-Za-z0-9]+)
and finally - best version : (([A-Za-z]+[0-9]+)[:,]?)*([A-Za-z]+[0-9]+)
// ah ok this wont work probably... but to answer 1. - no i dont think your regex is correct
( ) form a group
[ ] form a charclass (you can use A-Z a-d 0-9 etc or just single characters)
? means 1 or 0
* means 0 or any
id suggest reading http://www.regular-expressions.info/reference.html .
thats where i learned regexes some time ago ;)
and for building expressions i use Rad Software Regular Expression Designer
Let's build this step by step.
If you are following an Excel addressing format, to match a single-cell entry in your CSL, you would use the regular expression:
[A-Z]{1,2}[1-9]\d*
This matches the following in sequence:
Any character in A to Z once or twice
Any digit in 1 to 9
Any digit zero or more times
The digit expression will prevent inputting a cell address with leading zeros.
To build the expression that allows for a cell address pair, repeat the expression preceded by a colon as optional.
[A-Z]{1,2}[1-9]\d*(:[A-Z]{1,2}[1-9]\d*)?
Now allow for repeating the pattern preceded by a comma zero or more times and add start and end string delimiters.
^[A-Z]{1,2}[1-9]\d*(:[A-Z]{1,2}[1-9]\d*)?(,[A-Z]{1,2}[1-9]\d*(:[A-Z]{1,2}[1-9]\d*)?)*$
Kind of long and obnoxious, I admit, but after trying enough variants, I can't find a way of shortening it.
Hope this is helpful.
I'm, quite frankly, completely clueless about Regular expression, more so building them. I am reading in a string that could contain any sort of combination of characters and numbers. What I know for certain is, somewhere in the string, there will be a number followed by % (1%, 13% etc.), and I want to extract that number from the string.
Examples are;
[05:37:25] Completed 21% //want to extract 21
[05:32:34] Completed 18000000 out of 50000000 steps (36%). //want to extract 36
I'm guessing I should be using either regex.Replace or regex.Split, but beyond that, I'm not sure. Any help would be appreciated.
You should be able to use something like "(\d+)%". This will match any number of consecutive digit characters, then a percent sign, and will capture the actual number so you can extract and parse it. Use this in Regex.Match(), and browse the Matches array of the result (I think it'll be the second element in the array, index 1).
If you need a decimal point, use "(\d+(\.\d+)?)%", which will match a string of digits, followed by a decimal point, then another set of digits.
The regex you want is:
/(\d+)%/
This will capture any number of digits immediately preceding a percentage sign.
([\d]+)(%)
The parentheses will group the result.
The [\d]+ gives you any digit, repeated one or more times.
The "%" is just a literal.
You will need to make sure you extract only the first grouping. Also, you will need to be sure that there are no other instances of "<number>%" in the line.
I'm not entirely sure how to make this C# specific, but I'm sure you can figure that out. :-P
Most likely you will need to use double-backslashes (\\) where I only had one.
What would be the following regular expressions for the following strings?
56AAA71064D6
56AAA7105A25
Would the regular expression change if the numbers rolled over? What I mean by this is that the above numbers happen to contain hexadecimal values and I don't know how the value changes one it reaches F. Using the first one as an example: 56AAA71064D6, if this went up to
56AAA71064F6 and then the following one would become 56AAA7106406, this would create a different regular expression because where a letter was allowed, now their is a digit, so does this make the regular expression even more difficult. Suggestions?
A manufacturer is going to enter a range of serial numbers. The problems are that different manufacturers have different formats for serial numbers (some are just numbers, some are alpha numeric, some contain extra characters like dashes, some contain hexadacimal values which makes it more difficult because I don't know how the roll over to the next serial number). The roll over issue is the biggest problem because the serial numbers are entered as a range like 5A1B - 6F12 and without knowing how the roll over, it seems to me that storing them in the database is not as easy. I was going to have the option of giving the user the option to input the pattern (expression) and storing that in the databse, but if a character or characters changes from a digit to a letter or vice versa, then the regular expression is no longer valid for certain serial numbers.
Also, the above example I gave is with just one case. There are multitude of serial numbers that would contain different expressions.
There's no single regular expression which is "the" expression to match both of those strings. Instead, there are infinitely many which will do so. Here are two options at opposite ends of the spectrum:
(56AAA71064D6)|(56AAA7105A25)
.*
The first will only match those two strings. The second will match anything. Both satisfy all the criteria you've given.
Now, if you specify more criteria, then we'd be able to give a more reasonable idea of the regular expression to provide - and that will drive the answers to the other questions. (At the moment, the only answer that makes sense is "It depends on what regex you use.")
I think you could do it this way for 12 characters. This will search for a 12 character phrase where each of the characters must be a capital (A or B or C or D or E or F or 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 0)
[A-F0-9]{12}
If you're wanting to include the possibility of dashes then do this.
[A-F0-9\-]{12}
Or you're wanting to include the possibility of dashes plus the 12 characters then do this. But that would pick up any 12-15 character item that fit the criteria though.
[A-F0-9\-]{12,15}
Or if it's surrounded by spaces (AAAAHHHh...SO is stripping out my spaces!!!)
[A-F0-9\-]{12}
Or if it's surrounded by tabs
\t[A-F0-9\-]{12}\t
This match a string that contains 12 hexa
[0-9A-F]{12}
Assuming these are all 12-digit hexadecimal numbers, which it looks like they are, the following regex should work:
[0-9A-Fa-f]{12}
Here I'm using a character class to say that I want any digit, OR A-F, OR a-f. As a bonus I'm allowing lowercase letters; if you don't want those just get them out of the regex.
As Jon Skeet and others have said, you really didn't provide enough information, so if you don't like this answer please understand that I was doing the best I can with what information you provided.
So, how about this:
[0-9A-F]{12}
Well it sounds like you're describing a 12 digit hexadecimal number:
^[A-F0-9]{12}$