Find matching sequences that have exactly one deviation

I'm trying to implement a feature which takes a sequence of digits (U.S. Social Security Numbers) as an argument and returns a collection of SSNs which match the input except for exactly one deviation.
So, the input 123456789 would return:
123356789
193456789
123450789
But would not return 123546789, etc.
I have a system in ASP.NET which does pattern matches on inputs with wildcards, like 123**6789. So I could adapt that, using a loop, to this. But if there was a single regex for this, I would just implement it in SQL and be done with it.
So, is there a regex that will do this without having to call it in a for loop?

I'm not a specialist in regular expressions, but I think you can use a simple one to check the absolute value of the difference between the value under test and your input value. An acceptable result is exactly one non-zero digit followed by zero or more trailing zeroes. For your example values:
Testing value | Input value | ABS(Subtraction)
--------------+-------------+-----------------
123356789     | 123456789   | 100000
193456789     | 123456789   | 70000000
123450789     | 123456789   | 6000
123456786     | 123456789   | 3
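
One caveat with the subtraction idea: when the subtraction borrows across digits, the difference can be a single digit followed by zeroes even though several positions differ (123456800 - 123456799 = 1). Below is a minimal C# sketch of two alternatives, assuming both values are digit strings of equal length: a position-by-position comparison, and a generated pattern with one alternative per position, which would be one way to get a single regex. The names OneOff, DiffersByExactlyOne, BuildPattern and DifferenceLooksLikeOneDigit are made up for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class OneOff
{
    // True when a and b have the same length and differ at exactly one position.
    public static bool DiffersByExactlyOne(string a, string b)
    {
        if (a.Length != b.Length) return false;
        int differences = 0;
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i]) differences++;
        }
        return differences == 1;
    }

    // Builds one alternation with a branch per position. In each branch the
    // digit at that position is replaced by "any digit except the original".
    // Assumes the input consists of the digits 0-9 only.
    public static string BuildPattern(string digits)
    {
        var branches = Enumerable.Range(0, digits.Length).Select(i =>
            digits.Substring(0, i) + "(?!" + digits[i] + ")[0-9]" + digits.Substring(i + 1));
        return "^(?:" + string.Join("|", branches) + ")$";
    }

    // The subtraction heuristic: one non-zero digit followed by zeroes.
    public static bool DifferenceLooksLikeOneDigit(long a, long b)
    {
        return Regex.IsMatch(Math.Abs(a - b).ToString(), "^[1-9]0*$");
    }

    public static void Main()
    {
        string pattern = BuildPattern("123456789");
        foreach (string candidate in new[] { "123356789", "193456789", "123450789", "123546789" })
        {
            Console.WriteLine(candidate + " " + Regex.IsMatch(candidate, pattern));
        }

        // Borrowing case: the heuristic accepts it, the direct comparison does not.
        Console.WriteLine(DifferenceLooksLikeOneDigit(123456800, 123456799));
        Console.WriteLine(DiffersByExactlyOne("123456800", "123456799"));
    }
}
```

The generated pattern grows with the length of the input (nine branches for an SSN), so whether it is practical inside SQL depends on the regex support of the database in question.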

Related

Regex for numbers after a certain string part

I'm trying to extract numbers from a URL with regex.
Example Input: http://localhost:23089/generic-url-segment-c?p=5
Expected Output : 5
Example Input: http://localhost:23089/generic-url-segment-c?p=12&sort=5
Expected Output: 12
First I tried to look for the numbers with a mixture of string.replace,string.indexof and substring but thought Regex would be easier.
So far I tried using ((p=)?=.) but can't get the 5 only.
And also as shown in second example, this value might be a two digit value or there even might be other parameters after it. So maybe a search between p= and & is necessary but I don't know how Regex behaves in absence of parameters.

Try the pattern below. The plus matches one or more, so it captures one or more digits:
p=(\d+)
The parentheses form a group, so to get the value matched inside the group use
match.Groups[1].Value

You could use lookbehind:
(?<=\bp=)\d+
or
(?<=[?&]p=)\d+
Usage:
Regex.Match(str, @"(?<=[?&]p=)\d+").Value;
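
A small sketch of both approaches side by side, including what happens when the p parameter is absent (Match.Success is false, so the sketch returns null). The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QueryParam
{
    // Capture-group variant: the digits are read from group 1.
    public static int? PageWithGroup(string url)
    {
        Match m = Regex.Match(url, @"[?&]p=(\d+)");
        return m.Success ? int.Parse(m.Groups[1].Value) : (int?)null;
    }

    // Lookbehind variant: the whole match is the digits.
    public static int? PageWithLookbehind(string url)
    {
        Match m = Regex.Match(url, @"(?<=[?&]p=)\d+");
        return m.Success ? int.Parse(m.Value) : (int?)null;
    }

    public static void Main()
    {
        string[] urls =
        {
            "http://localhost:23089/generic-url-segment-c?p=5",
            "http://localhost:23089/generic-url-segment-c?p=12&sort=5",
            "http://localhost:23089/generic-url-segment-c?sort=5"
        };
        foreach (string url in urls)
        {
            int? page = PageWithLookbehind(url);
            Console.WriteLine(page.HasValue ? page.Value.ToString() : "(no p parameter)");
        }
    }
}
```

Requiring [?&] before p= keeps a parameter such as top=7 from being picked up by accident.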

Decimal or integer with negative look behind

I have the following simple negative look behind
(?<![Ø]\s*)
And the following expression to match an integer or a decimal whether with or without integer part
([0-9]*(?:[.,][0-9]+)?)
The second expression matches 8, 8.8, 8,8, .8, ,88, etc.
I am trying to combine the two expressions so that the whole match of the second expression is ignored when it is preceded by Ø, so I did
(?<![Ø]\s*)([0-9]*(?:[.,][0-9]+)?)
and those values for testing
88.88
88,88
,88
.5
Ø .8
Ø 8.8
The first 4 values match as expected, but the last 2 are partially matched, and I expected them not to match at all. Can someone please tell me what I am missing?

You can try this
(?<![Ø]\s*|[.,\d])(?=[\d.,]{1,})([0-9]*(?:[.,][0-9]+)?)
A bit simpler version suggested by bobble bubble
(?<!Ø\s*|[.,\d])(\d*[.,]?\d+)

It's always better to explain answers on SO, so here we go: the problem is that the expression can start matching anywhere in the string. If a test case contains more than one character that the pattern can match, the lookbehind may reject the first position, but a match can still start one character further in and consume the rest. This is made worse by the fact that the given expression can also match an empty string. The best way of doing things would be to:
Add a check to make sure that the match has at least one digit, and
Add a check to make sure that it is at the start of a potential match.
Both are covered by the following pattern, where the trailing \d+ requires a digit and the extended negative lookbehind rejects starts in the middle of a number: (?<!Ø\s*[.,\d]*)\d*[.,]?\d+
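
As a quick check, here is a sketch that runs this last pattern over the six test values from the question. It relies on .NET accepting variable-length lookbehind; DiameterFilter and FirstNumber are illustrative names, and returning an empty string for "no match" is a choice made for the sketch.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DiameterFilter
{
    // Number that is not preceded by Ø (optionally followed by whitespace
    // and by earlier characters of the same number).
    private static readonly Regex Number =
        new Regex(@"(?<!Ø\s*[.,\d]*)\d*[.,]?\d+");

    // Returns the first matching number, or an empty string when nothing matches.
    public static string FirstNumber(string text)
    {
        Match m = Number.Match(text);
        return m.Success ? m.Value : "";
    }

    public static void Main()
    {
        string[] inputs = { "88.88", "88,88", ",88", ".5", "Ø .8", "Ø 8.8" };
        foreach (string input in inputs)
        {
            string found = FirstNumber(input);
            Console.WriteLine("'" + input + "' -> '" + found + "'");
        }
    }
}
```

The first four inputs come back whole, and the two Ø-prefixed inputs produce no match at any start position.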

RegEx Model Validation: Whole Number or Decimal Multiples of .25

I am trying to compose a regular expression to match a numeric value expressed as a decimal multiple of .25 (ex. 1.25, 14.75).
// Must Match
1.0
1.25
1.250000
1.5
1.500
1.75
1.7500
// Must Not Match
1.2
1.46
1.501
1.99
So far I have the following expression: \d+(\.((0+)|(250*)|(50*)|(750*))). It works when I use online tooling like gskinner.com/regexr. When I use the expression in a validation attribute to seed my EntityFramework db, it produces validation errors:
[RegularExpression(@"^\d+(\.((0+)|(250*)|(50*)|(750*)))$", ErrorMessage = "Hours must be 15 minute increments expressed as decimals (ex. .0, .25, .5, .75)")]
public double Hours { get; set; }
Similar question (I am looking for a way to round the decimal portion of numbers up or down to the nearest .25, .5, .75, or whole number) but I need to use a regular expression to use the above data annotation.
Question:
Anyone see what's wrong with my expression?
Bonus points if you can extend it to support whole numbers (ex. 4 or 4.25 but not 4. or 4.62)

To match such number use regex pattern
(?!0\d)\d+(?:[.](?:25|5|75|0)0*)?(?!\d)
To validate input to be such number use regex pattern
^(?!0\d)\d+(?:[.](?:25|5|75|0)0*)?$
In both cases, the leading (?!0\d) is optional. It rejects numbers with invalid leading zeros, such as 000003.250: with it, matching skips the leading zeros and takes just 3.250, and validation fails. Without it, such numbers are accepted as they are.

This matches whole numbers too:
^\d+(\.(25|5|75|0)0*)?$
I tested it with RegexHero, which runs on the .NET regex engine. If you test all the cases together, make sure the Multiline option is selected so that ^ and $ match at each line rather than only at the start and end of the whole text.
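
A sketch that runs this pattern against the cases from the question. It also shows one possible cause of the validation errors, offered as a guess rather than a confirmed diagnosis: if the validator checks the string form of the double property, 1.0 is rendered as "1", which the question's pattern rejects because it requires a decimal part. QuarterHours and its members are hypothetical names, and the invariant-culture rendering is an assumption.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class QuarterHours
{
    // Pattern from the answer above: the decimal part is optional.
    private static readonly Regex WithWholeNumbers =
        new Regex(@"^\d+(\.(25|5|75|0)0*)?$");

    // Pattern from the question: the decimal part is mandatory.
    private static readonly Regex AskersPattern =
        new Regex(@"^\d+(\.((0+)|(250*)|(50*)|(750*)))$");

    public static bool IsQuarterMultiple(string text)
    {
        return WithWholeNumbers.IsMatch(text);
    }

    public static bool AskersPatternAccepts(string text)
    {
        return AskersPattern.IsMatch(text);
    }

    // Assumption: the value is rendered with the invariant culture.
    public static string Render(double hours)
    {
        return hours.ToString(CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        string[] cases = { "1.0", "1.25", "1.250000", "1.5", "1.500", "1.75", "1.7500",
                           "1.2", "1.46", "1.501", "1.99", "4", "4.", "4.62" };
        foreach (string text in cases)
        {
            Console.WriteLine(text + " " + IsQuarterMultiple(text));
        }

        // A double holding 1.0 renders without a decimal part.
        string rendered = Render(1.0);
        Console.WriteLine(rendered + " " + AskersPatternAccepts(rendered) + " " + IsQuarterMultiple(rendered));
    }
}
```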

Regular Expression for UK postcodes

I have a list of post codes which should be excluded from my shipping methods.
Suppose I have to exclude Scilly Isles, Isle of Man and few others.
For the above 2 areas valid post codes are IM1-IM9, IM86, IM87, IM89. And if it is IM25 or IM85 it is invalid.
I have written the following expression, but it matches even when the code is IM25 or IM85.
var regex = new Regex("(PO3[0-9]|PO4[0-1]|GY[1-9]|JE[1-5]|IM[1-9]|TR[1-9])");
If I pass IM85 to my expression, it should return false. For IM1-IM9, IM86, IM87 and IM89 it should return true.
The same goes for TR post codes. TR1-TR27 are valid post codes. If I give TR28, it should return false.
I am using '|' to separate multiple patterns. Is that the right way of including multiple patterns in one expression?

What do you expect? What should be matched and what not? And please give an example of the string you want to test.
If you match your pattern against "IM25" it will match because you do allow IM[1-9] in your pattern, so you get a valid partial match. If you want to avoid that (I am not sure what you want to achieve) and want to allow really only a single digit after the first letters, use a "word boundary" \b and specify exactly what you want to allow, something like this:
(PO3[0-9]|PO4[0-1]|GY[1-9]|JE[1-5]|IM([1-9]|8[6-9])|TR([1-9]|1[0-9]|2[0-7]))\b
For the "IM" part, this also allows 6-9 as a second digit when the first digit is 8. For the "TR" part, it allows 1-9, 10-19 and 20-27.
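
A sketch of how the word boundary changes the result for the codes from the question. Two things here are assumptions of the sketch rather than part of the answer: the pattern is anchored with ^ so that only the start of the string is tested, and TR10-TR19 are written out explicitly so that the whole TR1-TR27 range is covered. ExcludedAreas and MatchesArea are illustrative names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExcludedAreas
{
    // \b after the group prevents "IM2" from matching inside "IM25".
    private static readonly Regex Pattern = new Regex(
        @"^(PO3[0-9]|PO4[0-1]|GY[1-9]|JE[1-5]|IM([1-9]|8[6-9])|TR([1-9]|1[0-9]|2[0-7]))\b");

    public static bool MatchesArea(string postcode)
    {
        return Pattern.IsMatch(postcode);
    }

    public static void Main()
    {
        string[] samples = { "IM1", "IM9", "IM86", "IM25", "IM85", "TR1", "TR15", "TR27", "TR28", "IM1 1AA" };
        foreach (string sample in samples)
        {
            Console.WriteLine(sample + " " + MatchesArea(sample));
        }
    }
}
```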
Update
It is still not clear what the context of your task is. I assume you have a list of valid postcodes. It would probably be better to extract the postcode, or only its first part (you could use a regex for that), and check whether it is in the list.

The actual validation is on the wikipedia site... Google has the answers ;) http://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom#Validation
(GIR 0AA)|(((A[BL]|B[ABDFHLNRSTX]?|C[ABFHMORTVW]|D[ADEGHLNTY]|E[HNX]?|F[KY]|G[LUY]?|H[ADGPRSUX]|I[GMPV]|JE|K[ATWY]|L[ADELNSU]?|M[EKL]?|N[EGNPRW]?|O[LX]|P[AEHLOR]|R[GHM]|S[AEGKLMNOPRSTY]?|T[ADFNQRSW]|UB|W[ADFNRSV]|YO|ZE)[1-9]?[0-9]|((E|N|NW|SE|SW|W)1|EC[1-4]|WC[12])[A-HJKMNPR-Y]|(SW|W)([2-9]|[1-9][0-9])|EC[1-9][0-9]) [0-9][ABD-HJLNP-UW-Z]{2})

I still think you need to clarify more. As a huge regex guy, I would point out that multi-digit ranges are better handled on the code side than in the regex, just for your sanity. But I personally like to play with regex in this way. A regex reads one character at a time, so it only recognizes zero through nine, not ten, not twenty-eight. If you want to allow the following:
28 through 347
Then it becomes pretty complicated.
To put it into words, you want to allow:
If two digits, allow 2-9 for the first digit, and:
    If the first digit is 2, allow 8 or 9 for the second digit.
    Else if the first digit is 3-9, allow 0-9 for the second digit.
Else if three digits, allow 1-3 for the first digit, and:
    If the first digit is 3, allow 0-4 for the second digit, and:
        If the second digit is 4, allow 0-7 for the third digit.
        Else if the second digit is 0-3, allow 0-9 for the third digit.
    Else if the first digit is 1 or 2, allow 0-9 for both the second and third digits.
With that, you can write a proper regex like the one below, which looks for a word boundary or non-digit on either side of a two-digit or three-digit number. With this kind of problem-solving you should be able to work out your regex issue. Otherwise, let us know more about exactly what you want to match and not match:
(\b|\D)((2[89]|[3-9][0-9])(\b|\D)|(3(4[0-7]|[0-3][0-9])|[12][0-9][0-9])(\b|\D))
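
To illustrate the trade-off, here is a sketch that checks the same 28 through 347 range both ways and compares the two over 0 to 999. Unlike the pattern above, the regex in the sketch is anchored with ^ and $ so that it tests a whole string instead of searching inside a longer one. RangeCheck and its members are made-up names.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class RangeCheck
{
    // 28-29, 30-99, 100-299, 300-339, 340-347
    private static readonly Regex InRange = new Regex(
        @"^(?:2[89]|[3-9][0-9]|[12][0-9][0-9]|3(?:[0-3][0-9]|4[0-7]))$");

    public static bool ByRegex(string text)
    {
        return InRange.IsMatch(text);
    }

    // The code-side version: parse, then compare.
    public static bool ByParse(string text)
    {
        int n;
        return int.TryParse(text, NumberStyles.None, CultureInfo.InvariantCulture, out n)
            && n >= 28 && n <= 347;
    }

    // Counts the numbers in 0..999 on which the two checks disagree.
    public static int CountDisagreements()
    {
        int disagreements = 0;
        for (int i = 0; i <= 999; i++)
        {
            string text = i.ToString(CultureInfo.InvariantCulture);
            if (ByRegex(text) != ByParse(text)) disagreements++;
        }
        return disagreements;
    }

    public static void Main()
    {
        Console.WriteLine(CountDisagreements());
    }
}
```

The parse-and-compare version changes with two constants when the range changes, whereas the regex has to be rederived digit by digit.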

I have changed my approach.
Instead of using a regular expression, which keeps getting more complex, I am storing all the excluded outward codes of UK post codes.
If a post code contains one of those outward codes, I exclude that post code from the list.
Outward codes are in this format
XX-YYY
XXX-YYY
XXXX-YYY
In all of the above formats, X represents the outward code of a UK postcode.
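
A minimal sketch of that lookup approach. It assumes the inward part is always the last three characters once spaces and dashes are removed, and the codes in the set are a hypothetical sample, not a real exclusion list. ShippingFilter, OutwardCode and IsExcluded are illustrative names.

```csharp
using System;
using System.Collections.Generic;

public static class ShippingFilter
{
    // Hypothetical sample; a real list would come from configuration or a table.
    private static readonly HashSet<string> ExcludedOutwardCodes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "IM1", "IM86", "TR21", "JE2" };

    // Strips separators, then drops the three-character inward part.
    public static string OutwardCode(string postcode)
    {
        string compact = postcode.Replace(" ", "").Replace("-", "");
        return compact.Length > 3 ? compact.Substring(0, compact.Length - 3) : compact;
    }

    public static bool IsExcluded(string postcode)
    {
        return ExcludedOutwardCodes.Contains(OutwardCode(postcode));
    }

    public static void Main()
    {
        foreach (string postcode in new[] { "IM1 1AA", "im86-1ab", "IM85 1AB", "TR21 0AA" })
        {
            Console.WriteLine(postcode + " " + IsExcluded(postcode));
        }
    }
}
```

Comparing the whole outward code, rather than testing whether the post code contains it, avoids the partial-match problem from the original regex (IM2 inside IM25).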

What is the regular expression for the following strings and would the expression change if the number rolled over?

What would the regular expressions be for the following strings?
56AAA71064D6
56AAA7105A25
Would the regular expression change if the numbers rolled over? What I mean is that the above numbers happen to contain hexadecimal values, and I don't know how the value changes once it reaches F. Using the first one as an example: 56AAA71064D6, if this went up to
56AAA71064F6 and the following one then became 56AAA7106406, this would require a different regular expression, because where a letter was allowed there is now a digit. Does this make the regular expression even more difficult? Suggestions?
A manufacturer is going to enter a range of serial numbers. The problem is that different manufacturers have different formats for serial numbers (some are just numbers, some are alphanumeric, some contain extra characters like dashes, and some contain hexadecimal values, which makes it more difficult because I don't know how they roll over to the next serial number). The rollover issue is the biggest problem, because the serial numbers are entered as a range like 5A1B - 6F12, and without knowing how they roll over, storing them in the database does not seem easy. I was going to give the user the option to input the pattern (expression) and store that in the database, but if one or more characters change from a digit to a letter or vice versa, the regular expression is no longer valid for certain serial numbers.
Also, the example I gave above is just one case. There is a multitude of serial numbers that would need different expressions.

There's no single regular expression which is "the" expression to match both of those strings. Instead, there are infinitely many which will do so. Here are two options at opposite ends of the spectrum:
(56AAA71064D6)|(56AAA7105A25)
.*
The first will only match those two strings. The second will match anything. Both satisfy all the criteria you've given.
Now, if you specify more criteria, then we'd be able to give a more reasonable idea of the regular expression to provide - and that will drive the answers to the other questions. (At the moment, the only answer that makes sense is "It depends on what regex you use.")

I think you could do it this way for 12 characters. This will search for a 12 character phrase where each of the characters must be a capital (A or B or C or D or E or F or 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 0)
[A-F0-9]{12}
If you're wanting to include the possibility of dashes then do this.
[A-F0-9\-]{12}
Or if you want to allow dashes in addition to the 12 characters, do this. Note that it would pick up any 12-15 character item that fits the criteria.
[A-F0-9\-]{12,15}
Or if it's surrounded by spaces (written as [ ] here so that the spaces stay visible)
[ ][A-F0-9\-]{12}[ ]
Or if it's surrounded by tabs
\t[A-F0-9\-]{12}\t

This matches a string that contains 12 hexadecimal characters:
[0-9A-F]{12}

Assuming these are all 12-digit hexadecimal numbers, which it looks like they are, the following regex should work:
[0-9A-Fa-f]{12}
Here I'm using a character class to say that I want any digit, OR A-F, OR a-f. As a bonus I'm allowing lowercase letters; if you don't want those, just remove them from the regex.
As Jon Skeet and others have said, you really didn't provide enough information, so if you don't like this answer please understand that I was doing the best I could with the information you provided.

So, how about this:
[0-9A-F]{12}

Well it sounds like you're describing a 12 digit hexadecimal number:
^[A-F0-9]{12}$
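
If the serials really are plain 12-digit hexadecimal numbers (an assumption; the question does not confirm how a given manufacturer rolls over), the range and rollover questions can be handled numerically instead of with a per-range pattern. A sketch with hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class HexSerial
{
    // Shape check only: exactly 12 hexadecimal digits.
    private static readonly Regex Shape = new Regex("^[0-9A-Fa-f]{12}$");

    // 12 hex digits are 48 bits, so the value fits in a long.
    public static bool TryParse(string serial, out long value)
    {
        value = 0;
        if (!Shape.IsMatch(serial)) return false;
        value = Convert.ToInt64(serial, 16);
        return true;
    }

    // True when serial lies between first and last, inclusive.
    public static bool InRange(string serial, string first, string last)
    {
        long s, lo, hi;
        return TryParse(serial, out s) && TryParse(first, out lo) && TryParse(last, out hi)
            && s >= lo && s <= hi;
    }

    // Next serial under plain hexadecimal counting, padded to 12 digits.
    public static string Next(string serial)
    {
        long value = Convert.ToInt64(serial, 16);
        return (value + 1).ToString("X12");
    }

    public static void Main()
    {
        Console.WriteLine(Next("56AAA71064FF"));
        Console.WriteLine(InRange("56AAA71064D6", "56AAA7105A25", "56AAA71064FF"));
    }
}
```

Storing the parsed start and end values of a range, rather than a pattern, sidesteps the digit-versus-letter problem for this format. Formats with dashes or non-hexadecimal letters would need their own parsing rule.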
