Regex mismatch issue - C#

I am having an issue with identifying a multi-digit number in a string.
What I am attempting to do is scan from right to left and take the first run of digits for comparison.
My original regex:
Regex.Match(s, @"\d{3}", RegexOptions.RightToLeft);
This would match the first 3 digits it came across in any string. Eg:
hello123 - Output: 123
good234bye - Output: 234
see-you-456-tomorrow - Output: 456
No worries. However, we are no longer certain how long the number might be, so we have changed the Regex to this:
Regex.Match(s, @"\d*", RegexOptions.RightToLeft);
This looks for the first string of digits, of any length, and uses that. However, it returns an empty match if the number is not at the end of the string. Eg:
hello12 - Output: 12
good-bye-1234 - Output: 1234
see-1-you-2-morrow - Output: Nothing
How can I look for the first x-length of digits in a string, from right to left and disregarding any non-digit characters, without it returning an empty match?

Quantifiers
A digit repeated one or more times
\d+
A digit repeated from 3 times to 5 times
\d{3,5}
A digit repeated at least 5 times
\d{5,}
You can read more about quantifiers in this tutorial
As you may have realized, \d* can match zero digits, so with RightToLeft it succeeds immediately with an empty match:
see-1-you-2-morrow
                  ^
                  |
    \d* matches an empty position (here),
    between the last character and the end of the string
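As a minimal sketch of the fix this implies (the class and method names below are made up for illustration and are not from the question), swapping \d* for \d+ requires at least one digit, so the right-to-left scan can no longer succeed with an empty match at the end of the string:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
public static class RightmostDigits
{
    // Returns the rightmost run of digits in s, or "" when s has no digits.
    public static string Find(string s)
    {
        // \d+ needs one or more digits, so an empty match is impossible.
        Match m = Regex.Match(s, @"\d+", RegexOptions.RightToLeft);
        return m.Success ? m.Value : string.Empty;
    }
}
```

With the inputs from the question, RightmostDigits.Find("see-1-you-2-morrow") should give 2 and RightmostDigits.Find("good-bye-1234") should give 1234.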

Can you try this? I'm new to Regex, but I guess it works for you!
var s = @"see-you-456-tomorrow";
var r = Regex.Match(s, @"[[\d]]*\d*", RegexOptions.RightToLeft);
Console.WriteLine(r);
You can see it working here.

Related

How to validate Regex

I'm having a hard time with grouping parts of a Regex. I want to validate a few things in a string that follows this format: I-XXXXXX.XX.XX.XX
Validate that the first set of 6 X's (I-xxxxxx.XX.XX.XX) does not contain characters and its length is no more than 6.
Validate that the third set of X's (I-XXXXXX.XX.xx.XX) does not contain characters and is only 1 or 2.
Now, I have already validation on the last set of XX's to make sure the numbers are 1-8 using
string pattern1 = @"^.+\.(0?[1-8])$";
Match match = Regex.Match(TxtWBS.Text, pattern1);
if (!match.Success)
{
    errMessage += "WBS invalid";
    errMessage += Environment.NewLine;
}
I just can't figure out how to target specific parts of the string. Any help would be greatly appreciated, and thank you in advance!

You're having some trouble adding new validation to this string because it's very generic. Let's take a look at what you're doing:
^.+\.(0?[1-8])$
This finds the following:
^ the start of the string
.+ everything it can, other than a newline, basically jumping the engine's cursor to the end of your line
\. the last period in the string, because of the greedy quantifier in the .+ that comes before it
0? a zero, if it can
[1-8] a digit between 1 and 8
()$ stores the two previous things in a group, and if the end of the string doesn't come after this, the engine may even backtrack and try the same thing from the second-to-last period instead, which we know isn't a great strategy.
This ends up matching a lot of weird stuff, for example the string "The number 0.1".
Let's try patterning something more specific, if we can:
^I-(\d{6})\.(\d{2})\.(\d{1,2})\.([1-8]{2})$
This will match:
^I- an I and a hyphen at the start of the string
(\d{6}) six digits, which it stores in a capture group
\. a period. By now, if there was any other number of digits than six, the match fails instead of trying to backtrack all over the place.
(\d{2})\. Same thing, but two digits instead of six.
(\d{1,2})\. Same thing, the comma here meaning it can match between one and two digits.
([1-8]{2}) Two digits that are each between 1 and 8.
$ The end of the string.
I hope I understood what exactly you're trying to match here. Let me know if this isn't what you had in mind.
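To show how the capture groups in that pattern let you target individual parts of the string, here is a small sketch. WbsValidator and Parse are hypothetical names chosen for this example, and the pattern is the one proposed above, including its assumption that the last part is two digits from 1 to 8.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
public static class WbsValidator
{
    private static readonly Regex Pattern =
        new Regex(@"^I-(\d{6})\.(\d{2})\.(\d{1,2})\.([1-8]{2})$");

    // Returns the four dot-separated parts, or null when the input is invalid.
    public static string[] Parse(string wbs)
    {
        Match m = Pattern.Match(wbs);
        if (!m.Success)
        {
            return null;
        }
        // Groups[0] is the whole match; Groups[1..4] are the parenthesized parts.
        return new[]
        {
            m.Groups[1].Value,
            m.Groups[2].Value,
            m.Groups[3].Value,
            m.Groups[4].Value
        };
    }
}
```

For example, WbsValidator.Parse("I-587954.12.3.56") should return the parts 587954, 12, 3 and 56, while WbsValidator.Parse("The number 0.1") should return null.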

This regex:
^.-[0-9]{6}(\.[1-8]{1,2}){3}$
will validate the following:
The first character can be any character, but is of length 1
It is followed by a dash
The dash is followed by exactly 6 numbers 0 - 9. (If this could be less than 6 characters - for example, between 3 and 6 characters - just replace {6} with {3,6}).
This is followed by 3 groups of characters. Each of these groups is preceded by a period, is of length 1 or 2, and can contain any digit 1 - 8.
An example of a valid string is:
I-587954.12.34.56
This is also valid:
I-587954.1.3.5
But this isn't:
I-587954.12.80.356
because the second-to-last group contains a 0, and because the last group is of length 3.
Please let me know if I have misunderstood any of the rules.

^I-([0-9]{1,6})\.(.{1,2})\.(0[1-2])\.(.{1,2})$
The groups are delimited by a literal period (\.):
([0-9]{1,6}) - 1 to 6 digits
(.{1,2}) - any 1 or 2 characters
(0[1-2]) - 01 or 02
(.{1,2}) - any 1 or 2 characters
You can write and easily test a regex against your input data with an online tester; just search for "regex online".

Replace digits between 2 and 8 in length with specific character using regex

We have a security issue where a specific field in a database has some sensitive information in it. I need a way to detect numbers that are between 2 and 8 in length, replace the digits with a "filler" of the same length.
For instance:
Jim8888Dandy
Mike9999999999Thompson * Note: this is 10 in length and we don't want to replace the digits
123Area Code
Tim Johnson5555555
In these instances, any time we find a number that is between 2 and 8 digits long (inclusive), I want to replace each of its digits with 0, keeping the length of the original number.
End Result
Jim0000Dandy
Mike9999999999Thompson
000Area Code
Tim Johnson0000000
Is there an easy way to accomplish this using RegEx?

You need to provide a static evaluator method that would do the replacing. It replaces digits in the match with zeroes:
public static string Evaluate(Match m)
{
    return Regex.Replace(m.Value, "[0-9]", "0");
}
And then use it with this code:
string input = "9999999099999Thompson534543";
MatchEvaluator evaluator = new MatchEvaluator(Program.Evaluate);
string replaced = Regex.Replace(input, "(?:^|[^0-9])[0-9]{2,8}(?:$|[^0-9])", evaluator);
The regex is:
(?:^|[^0-9]) - should be at the start or preceded by a non-digit
[0-9]{2,8} - the run of between 2 and 8 digits to replace
(?:$|[^0-9]) - should be at the end or followed by a non-digit
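One thing to be aware of with the pattern above is that it consumes the non-digit on each side of the run, so two short runs separated by a single character (such as 12a34) cannot both match. A possible variant, sketched here with lookarounds instead of consumed boundary characters, avoids that. DigitMasker and Mask are hypothetical names used only for this example.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
public static class DigitMasker
{
    // A run of 2 to 8 digits that is neither preceded nor followed by a digit.
    // The lookarounds check the neighbours without consuming them.
    private static readonly Regex ShortRun =
        new Regex(@"(?<![0-9])[0-9]{2,8}(?![0-9])");

    public static string Mask(string input)
    {
        // Each match contains only digits, so replace it with zeros of the same length.
        return ShortRun.Replace(input, m => new string('0', m.Length));
    }
}
```

With the samples from the question, DigitMasker.Mask("Jim8888Dandy") should give Jim0000Dandy, and the 10-digit run in Mike9999999999Thompson should be left alone.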

Just for the clever regex department. This is not an efficient regex.
(?<=(?>(?'front'\d){0,7}))\d(?=(?'back'(?'-front'\d)){0,7}(?!\d))((?'-front')|(?'-back'))
Replace each match with 0.
/(?<=(?>(?'front'\d){0,7})) # Measure how many digits we're behind.
\d # This digit is matched
(?=
(?'back' # Measure how many digits we're in front of.
(?'-front'\d)){0,7}
# For every digit here, subtract one group from 'front',
# As to assert we'll never go over the < 8 digit requirement.
(?!\d) # no more digits
)
(
(?'-front') # At least one capturing group left for 'front' or 'back'
|(?'-back') # for > 2 digits requirement.
)/x

Regex Substring or Left Equivalent

Greetings beloved comrades.
I cannot figure out how to accomplish the following via a regex.
I need to take this format number 201101234 and transform it to 11-0123401, where digits 3 and 4 become the digits to the left of the dash, and the remaining five digits are inserted to the right of the dash, followed by a hardcoded 01.
I've tried http://gskinner.com/RegExr, but the syntax just defeats me.
This answer, Equivalent of Substring as a RegularExpression, sounds promising, but I can't get it to parse correctly.
I can create a SQL function to accomplish this, but I'd rather not hammer my server in order to reformat some strings.
Thanks in advance.

You can try this:
var input = "201101234";
var output = Regex.Replace(input, @"^\d{2}(\d{2})(\d{5})$", "${1}-${2}01");
Console.WriteLine(output); // 11-0123401
This will match:
two digits, followed by
two digits captured as group 1, followed by
five digits captured as group 2
And return a string which replaces that matched text with
group 1, followed by
a literal hyphen, followed by
group 2, followed by
a literal 01.
The start and end anchors ( ^ / $ ) ensure that if the input string does not exactly match this pattern, it will simply return the original string.

If you can use custom C# scripts, you may want to use Substring instead:
string newStr = string.Format("{0}-{1}01", old.Substring(2,2), old.Substring(4));
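Note that Substring throws an ArgumentOutOfRangeException when the input is shorter than the requested range, whereas the anchored regex simply leaves a non-matching string alone. A guarded version of the Substring approach might look like the sketch below. CaseNumberFormatter and Reformat are hypothetical names, and the 9-digit length check is an assumption based on the single sample in the question.

```csharp
// Hypothetical helper, named here only for illustration.
public static class CaseNumberFormatter
{
    // Turns "201101234" into "11-0123401"; returns any other input unchanged.
    public static string Reformat(string old)
    {
        // Assumption: valid inputs are exactly 9 ASCII digits.
        if (old == null || old.Length != 9)
        {
            return old;
        }
        foreach (char c in old)
        {
            if (c < '0' || c > '9')
            {
                return old;
            }
        }
        // Digits 3-4, a hyphen, the remaining five digits, then the hardcoded 01.
        return old.Substring(2, 2) + "-" + old.Substring(4) + "01";
    }
}
```

Returning the input unchanged on invalid data mirrors what the anchored Regex.Replace call does; throwing an exception instead would be an equally reasonable choice.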

I don't think you really need a regex here. Substring would be better. But still if you want regex only, you can use this:
string newString = Regex.Replace(input, @"^\d{2}(\d{2})(\d+)$", "$1-${2}01");
Explanation:
^\d{2} // Match first 2 digits. Will be ignored
(\d{2}) // Match next 2 digits. Capture it in group 1
(\d+)$ // Match rest of the digits. Capture it in group 2
Now, the required digits, are in group 1 and 2, which you use in the replacement string.

Do you even SQL? Pull some levers and stuff.

Regular expression for conditionally formatting a number string

(original question removed)
I am looking for a regular expression that will turn a string containing special characters, letters and numbers into a string containing only numbers.
There are special cases in which it is not enough to replace all non-numeric characters with "" (empty).
1.) Zero in brackets.
If there are only zeros in a bracket (0) these should be removed if it is the first bracket pair. (The second bracket pair containing only zeros should not be removed)
2.) Leading zero.
All leading zeros should be removed (ignoring brackets).
Examples for better understanding:
123 (0) 123 would be 123123 (zero removed)
(0) 123 -123 would be 123123 (zero and all other non-numeric characters removed)
2(0) 123 (0) would be 21230 (first zero in brackets removed)
20(0)123023(0) would be 201230230 (first zero in brackets removed)
00(0)1 would be 1 (leading zeros removed)
001(1)(0) would be 110 (leading zeros removed)
0(0)02(0) would be 20 (leading zeros removed)
123(1)3 would be 12313 (characters removed)

You could use a lookbehind to match (0) only if it's not at the beginning of the string, and replace with empty string as you're doing.
(original solution removed)
Updated again to reflect new requirements
Matches leading zeroes, matches (0) only if it's the first parenthesized item, and matches any non-digit characters:
^[0\D]+|(?<=^[^(]*)\(0\)|\D
Note that most regex engines do not support variable-length lookbehinds (i.e., the use of quantifiers like *), so this will only work in a few regex engines -- .NET's being one of them.
^[0\D]+ # zeroes and non-digits at start of string
| # or
(?<=^[^(]*) # preceded by start of string and only non-"(" chars
\(0\) # "(0)"
| # or
\D # non-digit, equivalent to "[^\d]"
(tested at regexhero.net)
You've changed and added requirements several times now. For multiple rules like this, you're probably better off coding for them individually. It could become complicated and difficult to debug if one condition matches and causes another condition not to match when it should. For example, in separate steps:
Remove parenthesized items as necessary.
Remove non-digit characters.
Remove leading zeroes.
But if you absolutely need these three conditions all matched in a single regular expression (not recommended), here it is.
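The separate steps listed above could be sketched roughly as follows. NumberCleaner and Clean are hypothetical names, and the reading of the rules (drop the first bracket pair only when it holds nothing but zeros, then strip non-digits, then strip leading zeros) is an interpretation of the examples in the question rather than a confirmed specification.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
public static class NumberCleaner
{
    public static string Clean(string input)
    {
        // Step 1: [^(]* runs up to the first "(", so this only matches when the
        // first parenthesized item contains nothing but zeros. Keep the prefix ($1).
        string s = Regex.Replace(input, @"^([^(]*)\(0+\)", "$1");

        // Step 2: remove every remaining non-digit character.
        s = Regex.Replace(s, @"\D", "");

        // Step 3: remove leading zeros.
        return s.TrimStart('0');
    }
}
```

Each step can be tested on its own, which is the main advantage over the single combined pattern. One consequence of step 3 to keep in mind is that an input consisting only of zeros ends up as an empty string.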

Regexes get much, much simpler if you can use multiple passes. I think you could do a first pass to drop your (0) if it's not the first thing in a string, then follow it with stripping out the non-digits:
var noMidStrParenZero = Regex.Replace(text, @"^([^(]+)\(0\)", "$1");
var finalStr = Regex.Replace(noMidStrParenZero, "[^0-9]", "");
Avoids a lot of regex craziness, and it's also self-documenting to an extent.
EDIT: this version should work with your new examples too.

This regex should be pretty near the one you're searching for.
(^[^\d])|([^\d](0[^\d])?)+
(You can replace everything that is caught by an empty string)
EDIT :
Your request evolved, and is now too complex to be treated with a single pass. Assuming you always have a space before a bracket group, you can use these passes (keep this order):
string[] entries = new string[7] {
    "800 (0) 123 - 1",
    "800 (1) 123",
    "(0)321 123",
    "1 (0) 1",
    "1 (12) (0) 1",
    "1 (0) (0) 1",
    "(9)156 (1) (0)"
};
foreach (string entry in entries)
{
    var output = Regex.Replace(entry, @"\(0\)\s*\(0\)", "0");
    output = Regex.Replace(output, @"\s\(0\)", "");
    output = Regex.Replace(output, @"[^\d]", "");
    System.Console.WriteLine("---");
    System.Console.WriteLine(entry);
    System.Console.WriteLine(output);
}

(?: # start grouping
^ # start of string
| # OR
^\( # start of string followed by paren
| # OR
\d # a digit
) # end grouping
(0+) # capture any number of zeros
| # OR
([1-9]) # capture any non-zero digit
This works for all of your example strings, but the entire expression does match the ( followed by the zero. You can use Regex.Matches to get the match collection using a global match and then join all of the matched groups into a string to get numbers only (or just remove any non-numbers).

.NET REGEX Matching matches empty strings

I have this
pattern:
[0-9]*\.?[0-9]*
Target:
X=113.3413475 Y=18.2054775
And I want to match the numbers. It matches fine in testing software like http://regexpal.com/ and Regex Coach.
But in Dot net and http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx
I get:
Found 11 matches:
1.
2.
3.
4.
5.
6. 113.3413475
7.
8.
9.
10. 18.2054775
11.
String literals for use in programs:
C#
@"[0-9]*[\.]?[0-9]*"
Does anyone have any idea why I'm getting all these empty matches?
Thanks and Regards,
Kevin

Yes, that will match empty string. Look at it:
[0-9]* - zero or more digits
\.? - an optional period
[0-9]* - zero or more digits
Everything's optional, so an empty string matches.
It sounds like you always want there to be digits somewhere, for example:
[0-9]+\.[0-9]*|\.[0-9]+|[0-9]+
(The order here matters, as you want it to take the most possible.)
That works for me:
using System;
using System.Text.RegularExpressions;
class Test
{
    static void Main(string[] args)
    {
        string x = "X=113.3413475 Y=18.2054775";
        Regex regex = new Regex(@"[0-9]+\.[0-9]*|\.[0-9]+|[0-9]+");
        var matches = regex.Matches(x);
        foreach (Match match in matches)
        {
            Console.WriteLine(match);
        }
    }
}
Output:
113.3413475
18.2054775
There may well be better ways of doing it, admittedly :)

Try this one:
[0-9]+(\.[0-9]+)?
It's slightly different from Jon Skeet's answer in that it won't match .45; it requires either a number alone (e.g. 8) or a real decimal (e.g. 8.1 or 0.1).

Another alternative is to keep your original regex, and just assert it must have a number in it (maybe after a dot):
[0-9]*\.?[0-9]*
Goes to:
(?=\.?[0-9])[0-9]*\.?[0-9]*
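As a quick sketch of how the lookahead version behaves, the helper below collects every match into a list. NumberExtractor and Extract are hypothetical names used only for this example. Because the lookahead demands a digit (optionally after a dot) at the current position, the pattern can no longer succeed on an empty string.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
public static class NumberExtractor
{
    public static List<string> Extract(string text)
    {
        var results = new List<string>();
        // (?=\.?[0-9]) asserts that a digit, or a dot followed by a digit, comes next.
        foreach (Match m in Regex.Matches(text, @"(?=\.?[0-9])[0-9]*\.?[0-9]*"))
        {
            results.Add(m.Value);
        }
        return results;
    }
}
```

For the target string in the question, NumberExtractor.Extract("X=113.3413475 Y=18.2054775") should contain exactly two entries, 113.3413475 and 18.2054775. Unlike the [0-9]+(\.[0-9]+)? variant, this one also accepts forms such as .45.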

The key problem is the *, which means "match zero or more of the preceding characters". The empty string matches zero or more digits, which is why you're getting all those matches.
Change your two *s to +s and you'll get what you want.

The problem with this regex is that it is completely optional in all the fields, so an empty string also is matched by it. I would consider adding all the cases. By the regex, I see you want the numbers with or without dot, and with or without a set of decimal digits. You can separate first those that contain only numbers [0-9]+, then those that contain numbers plus only a dot, [0-9]+\. and then join them all with | (or).
The problem with the regex as it is is that it allows cases that are not real numbers, for example, the cases in which the first set of numbers and the last set of numbers are empty (just a dot), so you have to put the valid cases explicitly.

Regex pattern = new Regex(@"[0-9]+[\.][0-9]+");
string info = "X=113.3413475 Y=18.2054775";
MatchCollection matches = pattern.Matches(info);
int count = 1;
foreach (Match match in matches)
{
    Console.WriteLine("{0} : {1}", count++, match.Value);
}
//output
//1 : 113.3413475
//2 : 18.2054775
Replace your * with + and remove the ? from your period case.
EDIT: from the conversation above, @"[0-9]+\.[0-9]*|\.[0-9]+|[0-9]+" is the better pattern. It catches 123, .123, 123.123, etc.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.
