Regular expression where part of string must be number between 0-100 - c#

I need to validate serial numbers. For this we use regular expressions in C#, and for a certain product, part of the serial number is the "seconds since midnight". There are 86400 seconds in a day, but how can I validate it as a 5-digit number in this string?:
654984051-86400-231324
I can't use this concept:
[0-8][0-6][0-4][0-0][0-0]
Because then 86399 wouldn't be valid. How can I overcome this? I want something like:
[00000-86400]
UPDATE
I want to make it clear that I'm aware of - and agree with - the "don't use regular expressions when there's a simpler way" school-of-thought. Jason's answer is exactly how I'd like to do it, however this serial number validation is for all serial numbers that pass through our system - there's currently no custom validation code for these specific ones. In this case I have a good reason for looking for a regex solution.
Of course, if there isn't one, then that makes the case for custom validation for these particular products undeniable, but I wanted to explore this avenue fully before going with a solution that requires code changes.

Don't use regex? If you're struggling to come up with a regex to parse this, that suggests it's too complex and you should find something simpler. I see absolutely no benefit to using regex here when a simple
int value;
if (!Int32.TryParse(s, out value)) {
    throw new ArgumentException();
}
if (value < 0 || value > 86400) {
    throw new ArgumentOutOfRangeException();
}
will work just fine. It's just so clear and easily maintainable.

You don't want to try to use regular expressions for this; you'll end up with something incomprehensible, unwieldy, and difficult to modify (somebody will probably suggest one :). What you want to do is match the string using a regex to make sure that it contains digits in the format you want, then pull out a matching group and check the range using an arithmetic comparison. For example, in pseudocode:
match regex /(\d+)-(\d+)-(\d+)/
serial = capture group 2
if serial >= 0 and serial <= 86400 then
// serial is valid
end if
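The pseudocode above can be sketched in Python roughly as follows. The 9-5-6 digit widths are an assumption taken from the sample serial in the question, and the names SERIAL_SHAPE, MAX_SECONDS and is_valid_serial are made up for illustration.

```python
import re

# Shape check only: three dash-separated digit groups. The widths 9-5-6 are
# an assumption based on the sample serial 654984051-86400-231324.
SERIAL_SHAPE = re.compile(r"([0-9]{9})-([0-9]{5})-([0-9]{6})")

MAX_SECONDS = 86400  # upper bound used in the question


def is_valid_serial(serial):
    """Return True if the serial has the expected shape and seconds field."""
    match = SERIAL_SHAPE.fullmatch(serial)
    if match is None:
        return False
    # The range check happens on the number, not on the text.
    seconds = int(match.group(2))
    return 0 <= seconds <= MAX_SECONDS
```

The regex only decides whether the text looks like a serial number; the numeric comparison stays a plain comparison, which is easy to change if the bound ever moves.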

Generate a Regular Expression to Match an Arbitrary Numeric Range
http://utilitymill.com/utility/Regex_For_Range
yields the following regex expression:
\b0*([0-9]{1,4}|[1-7][0-9]{4}|8[0-5][0-9]{3}|86[0-3][0-9]{2}|86400)\b
Description of output:
First, break into equal length ranges:
0 - 9
10 - 99
100 - 999
1000 - 9999
10000 - 86400
Second, break into ranges that yield simple regexes:
0 - 9
10 - 99
100 - 999
1000 - 9999
10000 - 79999
80000 - 85999
86000 - 86399
86400 - 86400
Turn each range into a regex:
[0-9]
[1-9][0-9]
[1-9][0-9]{2}
[1-9][0-9]{3}
[1-7][0-9]{4}
8[0-5][0-9]{3}
86[0-3][0-9]{2}
86400
Collapse adjacent powers of 10:
[0-9]{1,4}
[1-7][0-9]{4}
8[0-5][0-9]{3}
86[0-3][0-9]{2}
86400
Combining the regexes above yields:
0*([0-9]{1,4}|[1-7][0-9]{4}|8[0-5][0-9]{3}|86[0-3][0-9]{2}|86400)
Tested here:
http://osteele.com/tools/rework/
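A range regex like this is easy to get subtly wrong, so it may be worth checking it against plain arithmetic. The following is a minimal Python sketch of such a check; the function names and the upper test limit of 200000 are arbitrary choices for illustration.

```python
import re

# The combined pattern from above, without the \b anchors; fullmatch()
# anchors it at both ends instead.
RANGE_0_TO_86400 = re.compile(
    r"0*([0-9]{1,4}|[1-7][0-9]{4}|8[0-5][0-9]{3}|86[0-3][0-9]{2}|86400)"
)


def regex_accepts(number):
    """Return True if the decimal text of number matches the range regex."""
    return RANGE_0_TO_86400.fullmatch(str(number)) is not None


def find_disagreements(limit=200000):
    """Return every n in [0, limit) where regex and arithmetic disagree."""
    return [n for n in range(limit) if regex_accepts(n) != (0 <= n <= 86400)]
```

An empty list from find_disagreements() means the pattern and the comparison 0 <= n <= 86400 agree on every value tried. Note that the leading 0* also lets through inputs with more than five characters, such as 0086400.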

With the standard 'this-is-not-a-particularly-regexy-problem' caveat,
[0-7]\d{4}|8[0-5]\d{3}|86[0-3]\d{2}|86400
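When an alternation like this is embedded in a longer pattern it needs a group around it, otherwise the | splits the whole expression. A minimal Python sketch, assuming the 9-5-6 layout of the sample serial (the names FIVE_DIGIT_SECONDS, SERIAL and matches_serial are illustrative only):

```python
import re

# The five-digit alternation from the answer above.
FIVE_DIGIT_SECONDS = r"[0-7]\d{4}|8[0-5]\d{3}|86[0-3]\d{2}|86400"

# Wrap the alternation in a non-capturing group before embedding it.
# re.ASCII restricts \d to the digits 0-9.
SERIAL = re.compile(r"\d{9}-(?:" + FIVE_DIGIT_SECONDS + r")-\d{6}", re.ASCII)


def matches_serial(serial):
    """Return True if the whole string is a serial with seconds in 00000-86400."""
    return SERIAL.fullmatch(serial) is not None
```

Because the middle field is fixed at five digits here, values below 10000 must be zero-padded, for example 00042.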

If you really need a pure regex solution, I believe this would work, although the other posters make a good point about only validating that they are digits and then using a matching group to validate the actual number. (The alternatives are written without spaces around the |, since spaces in a pattern are matched literally by default.)
([0-7][0-9]{4})|(8[0-5][0-9]{3})|(86[0-3][0-9]{2})|(86400)

I would use regex combined with some .NET code to accomplish this. A pure regex solution isn't going to be easy or efficient to handle large number ranges.
But this will:
Regex myRegex = new Regex(@"\d{9}-(\d{5})-\d{6}");
String value = myRegex.Replace(@"654984051-86400-231324", "$1");
This will grab the value 86400 in this case. And then you'd just check if the captured number is between 0 and 86400 as per Jason's answer.

I don't believe this is possible in regular expressions since this isn't something that can be checked as part of a regular language. In other words, a finite-state automaton cannot recognize this string, so a regular expression cannot either.
Edit: This can be recognized by a regex, but not in an elegant way. It would require a monster OR chain (e.g.: 00000|00001|00002 or 0{1,5}|0{1,4}1|0{1,4}2). To me, having to enumerate such a large set of possibilities makes it clear that while it is technically possible, it is not feasible or manageable.

Related

Find matching sequences that have exactly one deviation

I'm trying to implement a feature which takes a sequence of digits (U.S. Social Security Numbers) as an argument and returns a collection of SSNs which match the input except for exactly one deviation.
So, the input 123456789 would return:
123356789
193456789
123450789
But would not return 123546789, etc.
I have a system in ASP.NET which does pattern matches on inputs with wildcards, like 123**6789. So I could adapt that, using a loop, to this. But if there was a single regex for this, I would just implement it in SQL and be done with it.
So, is there a regex that will do this without having to call it in a for loop?
Unfortunately I'm not a specialist in regular expressions, but I think that you can use a simple regular expression to check the absolute value of the difference between the tested value and your input value. An acceptable result should consist of exactly one nonzero digit followed by zero or more trailing zeros. For your example values:
Testing value | Input value  | ABS(Subtraction)
--------------+--------------+------------------
123356789     | 123456789    | 100000
193456789     | 123456789    | 70000000
123450789     | 123456789    | 6000
123456786     | 123456789    | 3
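One caveat with the subtraction idea: a difference of the form "one digit followed by zeros" is necessary but not sufficient, because a carry can change two digits at once (123456799 and 123456809 differ by 10 but in two positions). A position-by-position comparison avoids that. The following is a minimal Python sketch, with a made-up function name, and it runs in application code rather than as a single regex:

```python
def differs_in_exactly_one_digit(candidate, target):
    """Return True if both strings have equal length and exactly one
    position holds a different character."""
    if len(candidate) != len(target):
        return False
    mismatches = sum(1 for a, b in zip(candidate, target) if a != b)
    return mismatches == 1
```

For the input 123456789 this accepts 123356789, 193456789 and 123450789, and rejects 123546789, where two positions differ.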

Regex Specific Date Format

I was wondering if somebody could point me to a regex code that validates for this: ####/##/##
Example: 1990/05/25
In the example, the 0 (the first digit of the month) can only be 0 or 1, and the 2 (the first digit of the day) can only be 0, 1, 2, or 3. Other than that, all other digits in this set are allowed (0-9).
The code should validate that there are only 9 or 10 characters in total, including the slashes.
If you only want to validate this format, you can use a regex like...
^\d{1,4}\/[01]?\d\/[0-3]\d$
I tested it a bit on some dates here.
This will match:
1990/01/01
2012/13/34
2013/1/39
9999/0/00
But reject:
23121/32/44
12/05/013
013/000/00
If you want to reject invalid dates as well such as 2013/02/29, you can check out this thread.
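One way to reject impossible dates without a more elaborate regex is to let the regex check the format and a date type check the value. The following is a minimal Python sketch of that two-step idea; DATE_SHAPE and is_real_date are illustrative names, and the pattern is the one given above with capture groups added.

```python
import re
from datetime import date

# Format check: same pattern as above, with groups for year, month and day.
DATE_SHAPE = re.compile(r"(\d{1,4})/([01]?\d)/([0-3]\d)", re.ASCII)


def is_real_date(text):
    """Return True if text has the expected format and names a real date."""
    match = DATE_SHAPE.fullmatch(text)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        # date() raises ValueError for things like month 13 or 29 Feb 2013.
        date(year, month, day)
    except ValueError:
        return False
    return True
```

With this split, 2012/13/34 and 2013/02/29 pass the format check but are rejected by the date constructor, while 2012/02/29 is accepted.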
Try this (edit following Jerry)
[0-2][0-9]{3,3}/[01][0-9]/[0-3][0-9]
Mess about with the {a,b} notation to change the length of the general digits; it means between a and b (inclusive) repetitions of the preceding expression. It's unclear in your question where you want the digit flexibility to be.
E.g. to admit 2013/5/29, use
[0-2][0-9]{3,3}/[01]{0,1}[0-9]/[0-3][0-9]
For all things regex I have found this website to be an invaluable resource. http://www.regular-expressions.info/reference.html
Specifically this page should get you what you need and contains a full explanation of how to go about validating date input format (not value) via Regular Expressions.
http://www.regular-expressions.info/dates.html
^(19|20)\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$
would match
yyyy-mm-dd

What is the regular expression for the following strings and would the expression change if the number rolled over?

What would be the following regular expressions for the following strings?
56AAA71064D6
56AAA7105A25
Would the regular expression change if the numbers rolled over? What I mean is that the above strings appear to contain hexadecimal values, and I don't know how the value changes once a digit reaches F. Using the first one as an example, 56AAA71064D6: if this went up to 56AAA71064F6 and the following one became 56AAA7106406, that would require a different regular expression, because where a letter was allowed there is now a digit. Does this make the regular expression even more difficult? Suggestions?
A manufacturer is going to enter a range of serial numbers. The problem is that different manufacturers have different formats for serial numbers: some are just numbers, some are alphanumeric, some contain extra characters like dashes, and some contain hexadecimal values, which makes it more difficult because I don't know how they roll over to the next serial number. The roll-over issue is the biggest problem, because the serial numbers are entered as a range like 5A1B - 6F12, and without knowing how they roll over, storing them in the database is not easy. I was going to give the user the option to input the pattern (expression) and store that in the database, but if one or more characters change from a digit to a letter or vice versa, then the regular expression is no longer valid for certain serial numbers.
Also, the above example I gave is with just one case. There are multitude of serial numbers that would contain different expressions.
There's no single regular expression which is "the" expression to match both of those strings. Instead, there are infinitely many which will do so. Here are two options at opposite ends of the spectrum:
(56AAA71064D6)|(56AAA7105A25)
.*
The first will only match those two strings. The second will match anything. Both satisfy all the criteria you've given.
Now, if you specify more criteria, then we'd be able to give a more reasonable idea of the regular expression to provide - and that will drive the answers to the other questions. (At the moment, the only answer that makes sense is "It depends on what regex you use.")
I think you could do it this way for 12 characters. This will search for a 12 character phrase where each of the characters must be a capital (A or B or C or D or E or F or 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 0)
[A-F0-9]{12}
If you're wanting to include the possibility of dashes then do this.
[A-F0-9\-]{12}
Or if you want to allow for dashes in addition to the 12 characters, do this. Note that it would pick up any 12-15 character item that fits the criteria.
[A-F0-9\-]{12,15}
Or if it's surrounded by spaces (AAAAHHHh...SO is stripping out my spaces!!!)
[A-F0-9\-]{12}
Or if it's surrounded by tabs
\t[A-F0-9\-]{12}\t
This matches a string that contains 12 hexadecimal characters:
[0-9A-F]{12}
Assuming these are all 12-digit hexadecimal numbers, which it looks like they are, the following regex should work:
[0-9A-Fa-f]{12}
Here I'm using a character class to say that I want any digit, OR A-F, OR a-f. As a bonus I'm allowing lowercase letters; if you don't want those just get them out of the regex.
As Jon Skeet and others have said, you really didn't provide enough information, so if you don't like this answer please understand that I was doing the best I can with what information you provided.
So, how about this:
[0-9A-F]{12}
Well it sounds like you're describing a 12 digit hexadecimal number:
^[A-F0-9]{12}$
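If the serials really are plain hexadecimal counters (an assumption the question does not confirm), the roll-over and range questions can be handled numerically instead of with a per-range regex. A minimal Python sketch under that assumption, with illustrative names:

```python
import re

# Shape check: uppercase hexadecimal digits only (assumed format).
HEX_SERIAL = re.compile(r"[0-9A-F]+")


def to_number(serial):
    """Convert a hexadecimal serial to an integer, rejecting other text."""
    if HEX_SERIAL.fullmatch(serial) is None:
        raise ValueError("not a hexadecimal serial: " + repr(serial))
    return int(serial, 16)


def next_serial(serial, step=1):
    """Return the serial that is step values later, keeping the same width."""
    width = len(serial)
    return format(to_number(serial) + step, "0{}X".format(width))


def in_serial_range(serial, first, last):
    """Return True if serial lies between first and last, inclusive."""
    return to_number(first) <= to_number(serial) <= to_number(last)
```

Under this reading, adding 0x10 to 56AAA71064F6 carries into the next position and gives 56AAA7106506, and a range such as 5A1B - 6F12 can be stored as two numbers rather than as a pattern.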

Regular expression to match a decimal value range

Is there an easy way to take a dynamic decimal value and create a validation regular expression that can handle this?
For example, I know that /1[0-9]{1}[0-9]{1}/ should match anything from 100-199, so what would be the best way to programmatically create a similar structure given any decimal number?
I was thinking that I could just loop through each digit and build one from there, but I have no idea how I would go about that.
Ranges are difficult to handle correctly with regular expressions. REs are a tool for text-based analysis or pattern matching, not semantic analysis. The best that you can probably do safely is to recognize a string that is a number with a certain number of digits. You can build REs for the maximum or minimum number of digits for a range using a base 10 logarithm. For example, to match a number between a and b where b > a >= 1, construct the RE by:
re = "[1-9][0-9]{"
re += str(floor(log10(a)))
re += ","
re += str(floor(log10(b)))
re += "}"
Note: the example is in no particular programming language. Sorry, C# not really spoken here.
There are some boundary point issues, but the basic idea is to construct an RE like [1-9][0-9]{2} for anything between 100 and 999 and then, if the string matches the expression, convert to an integer and do the range analysis in value space instead of lexical space.
With all of that said... I would go with Mehrdad's solution and use something provided by the language like decimal.TryParse and then range check the result.
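The "parse, then range check" approach can be sketched in Python as follows. This is only an illustration of the idea, not the C# code the answer refers to; NUMBER_SHAPE and in_decimal_range are made-up names, and the accepted number format (optional minus sign, optional fractional part) is an assumption.

```python
import re
from decimal import Decimal

# Lexical check: optional minus sign, digits, optional fractional part.
NUMBER_SHAPE = re.compile(r"-?[0-9]+(\.[0-9]+)?")


def in_decimal_range(text, low, high):
    """Return True if text is a plain decimal number with low <= value <= high."""
    if NUMBER_SHAPE.fullmatch(text) is None:
        return False
    # The comparison happens in value space, so any bounds work unchanged.
    return low <= Decimal(text) <= high
```

For example, in_decimal_range("150", Decimal("100"), Decimal("199")) accepts the value, and the same function handles any other bounds without generating a new pattern.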
^[-]?\d+(\.\d+)?$
will validate a number with an optional minus sign at the front and an optional decimal part (the dot is escaped so that it matches only a literal decimal point)
No, is the simple answer. Generating the regex that will work correctly would be more complicated than doing the following:
Decimal regex (find the decimal numbers in a string). "^\$?[+-]?[\d,]*(\.\d*)?$"
Convert result to decimal and compare to your range. (decimal.TryParse)
This depends on where and what you want to parse.
Use the RegEx below to parse strings for numbers.
It can handle commas and dots.
[^\d.,](?<number>(\d{1,3}(\.\d{3})*,\d+|\d{1,3}(,\d{3})*\.\d+|\d*[,\.]\d+|\d+))[^\d.,]

how to write this regular expression?

A 20-24 character alphanumeric string with no spaces and no symbols that has at least 2 digits:
AAAAAAAAAAAAAAAAAAAA - not valid
AAAAAA0AAAAAAAAA0AAA - valid
AAAAAA01AAAAAAAAA0AAA - valid
AAAAAA0AAAAAAAAA0AAA# - not valid
I think this is only possible with look-ahead assertion:
^(?=[a-zA-Z\d]{20,24}$)[a-zA-Z]*\d[a-zA-Z]*\d[a-zA-Z\d]*$
The look-ahead assertion ((?=[a-zA-Z\d]{20,24}$)) checks if the string has the expected form (20–24 alphanumeric characters). And the second part ([a-zA-Z]*\d[a-zA-Z]*\d[a-zA-Z\d]*) checks if it contains at least two digits.
I think that this is the simplest pattern: First make a positive lookahead to check that there are at least two digits, then match 20-24 alphanumeric characters:
^(?=.*\d.*\d)[A-Za-z\d]{20,24}$
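Both lookahead patterns above can be checked against the four examples from the question. The following Python sketch does that; the sample strings are built programmatically with the lengths assumed from the question (20 or 21 characters), and the constant and function names are illustrative.

```python
import re

# The two patterns from the answers above; re.ASCII limits \d to 0-9.
PATTERN_SEQUENTIAL = re.compile(
    r"^(?=[a-zA-Z\d]{20,24}$)[a-zA-Z]*\d[a-zA-Z]*\d[a-zA-Z\d]*$", re.ASCII
)
PATTERN_LOOKAHEAD = re.compile(r"^(?=.*\d.*\d)[A-Za-z\d]{20,24}$", re.ASCII)


def check(text):
    """Return the verdicts of both patterns as a pair of booleans."""
    return (
        PATTERN_SEQUENTIAL.search(text) is not None,
        PATTERN_LOOKAHEAD.search(text) is not None,
    )


NO_DIGITS = "A" * 20                                   # not valid
TWO_DIGITS = "A" * 6 + "0" + "A" * 9 + "0" + "A" * 3   # valid, 20 chars
THREE_DIGITS = "A" * 6 + "01" + "A" * 9 + "0" + "A" * 3  # valid, 21 chars
WITH_SYMBOL = TWO_DIGITS + "#"                         # not valid
```

Both patterns should agree on all four samples: the two strings with digits and no symbol are accepted, the other two are rejected.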
I'm going to be abstract because this sounds like homework (if it is, please tag it as such).
You can restrict the number of times a pattern matches with {min,max}
You can restrict which characters match with [charlist]
You can impose additional restrictions with what's called zero-width positive lookahead (there's also a negative form). The syntax varies, so check the docs for your environment.
Update your question (& tags) if you need more help.
Gumbo has a correct expression for the requirements.
It could be shortened, but his was more clear and probably faster than the short version.
var rX=/^(?=[a-zA-Z\d]{20,24}$)([a-zA-Z]*\d){2,}/
in JS (not confident enough with C# syntax):
if (str.length >= 20 && str.length <= 24 && /^([a-z]*[0-9]){2}[a-z0-9]*$/i.test(str)) {
    // match found; the ^ and $ anchors make sure symbols such as # are rejected
}
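Basically the same idea as Gumbo's, just a little shorter (note that \w also matches digits and the underscore, so this version is slightly more permissive):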
^(?=[\w\d]{20,24}$)[\w]*\d[\w]*\d[\w\d]*$
