regex expression help needed

regex expression help needed - c#

I was wondering if this was possible using Regex. I would like to exclude all letters (upper and lowercase) and the following 14 characters ! “ & ‘ * + , : ; < = > # _
The problem is the equal sign. In the string (which must either be 20 or 37 characters long) that I will be validating, that equal sign must either be in the 17th or 20th position because it is used as a separator in those positions. So it must check if that equal sign is anywhere other than in the 16th or 20th position (but not both). The following are some examples:
pass: 1234567890123456=12345678901234567890
pass: 1234567890123456789=12345678901234567
don't pass: 123456=890123456=12345678901234567
don't pass: 1234567890123456=12=45678901234567890
I am having a hard time with the part that I must allow the equal sign in those two positions and not sure if that's possible with Regex. Adding an if-statement would require substantial code change and regression testing because this function that stores this regex currently is used by many different plug-ins.

I'll go for
^([^a-zA-Z!"&'*+,:;<=>#_]{16}=[^a-zA-Z!"&'*+,:;<=>#_]+|[^a-zA-Z!"&'*+,:;<=>#_]{19}=[^a-zA-Z!"&'*+,:;<=>#_]*)$
Explanations :
1) Start with your allowed char :
^[^a-zA-Z!"&'*+,:;<=>#_]$
[^xxx] means all except xxx, where a-z is lower case letters A-Z upper case ones, and your others chars
2) Repeat it 16 times, then =, then others allowed chars ("allowed char" followed by '+' to tell that is repeated 1 to n times)
^[^a-zA-Z!"&'*+,:;<=>#_]{16}=[^a-zA-Z!"&'*+,:;<=>#_]+$
At this point you'll match your first case, when = is at position 17.
3) Your second case will be
^[^a-zA-Z!"&'*+,:;<=>#_]{19}=[^a-zA-Z!"&'*+,:;<=>#_]*$
with the last part followed by * instead of + to handle strings that are only 20 chars long and that ends with =
4) just use the (case1|case2) to handle both
^([^a-zA-Z!"&'*+,:;<=>#_]{16}=[^a-zA-Z!"&'*+,:;<=>#_]+|[^a-zA-Z!"&'*+,:;<=>#_]{19}=[^a-zA-Z!"&'*+,:;<=>#_]*)$
Tested OK with notepad++ and your examples
Edit to match exactly 20 or 37 chars
^([^a-zA-Z!"&'*+,:;<=>#_]{16}=[^a-zA-Z!"&'*+,:;<=>#_]{3}|[^a-zA-Z!"&'*+,:;<=>#_]{16}=[^a-zA-Z!"&'*+,:;<=>#_]{20}|[^a-zA-Z!"&'*+,:;<=>#_]{19}=|[^a-zA-Z!"&'*+,:;<=>#_]{19}=[^a-zA-Z!"&'*+,:;<=>#_]{17})$
More readable view with explanation :
`
^(
// 20 chars with = at 17
[^a-zA-Z!"&'*+,:;<=>#_]{16} // 16 allowed chars
= // followed by =
[^a-zA-Z!"&'*+,:;<=>#_]{3} // folowed by 3 allowed chars
|
[^a-zA-Z!"&'*+,:;<=>#_]{16} // 37 chars with = at 17
=
[^a-zA-Z!"&'*+,:;<=>#_]{20}
|
[^a-zA-Z!"&'*+,:;<=>#_]{19} // 20 chars with = at 20
=
|
[^a-zA-Z!"&'*+,:;<=>#_]{19} // 37 chars with = at 20
=
[^a-zA-Z!"&'*+,:;<=>#_]{17}
)$
`

I've omitted other symbols matching other symbols and just placed the [^=], you should have there code for all allowed symbols except =
var r = new Regex(#"^(([0-9\:\<\>]{16,16}=(([0-9\:\<\>]{20})|([0-9\:\<\>]{3})))|(^[^=]{19,19}=(([0-9\:\<\>]{17}))?))$");
/*
#"^(
([0-9\:\<\>]{16,16}
=
(([0-9\:\<\>]{20})|([0-9\:\<\>]{3})))
|
(^[^=]{19,19}
=
(([0-9\:\<\>]{17}))?)
)$"
*/
using {length,length} you can also specify the overall string length. The $ in the end and ^ in the beginning are important also.

Related

Extract phone numbers and exclude extraneous characters

I'm trying to create a regex which will extract a complete phone number from a string (which is the only thing in the string) but leaving out any cruft like decorative brackets, etc.
The pattern I have mostly appears to work, but returns a list of matches - whereas I want it to return the phone number with the characters removed. Unfortunately, it completely fails if I add the start and end of line matchers...
^(?!\(\d+\)\s*){1}(?:[\+\d\s]*)$
Without the ^ and $ this matches the following numbers:
12345-678-901 returns three groups: 12345 678 901
+44-123-4567-8901 returns four groups: +44 123 4567 8901
(+48) 123 456 7890 returns four groups: +48 123 456 7890
How can I get the groups to be returned as a single, joined up whole?
Other than that, the only change I would like to include is to return nothing if there are any non-numeric, non-bracket, non-+ characters anywhere. So, this should fail:
(+48) 123 burger 7890

I'd keep it simple, makes it more readable and maintainable:
public string CleanPhoneNumber(string messynumber){
if(Regex.IsMatch(messynumber, "[a-z]"))
return "";
else
return Regex.Replace(messynumber, "[^0-9+]", "");
}
If any alphameric characters are present (extend this range if you wish) return blank else replace every char that is not 0-9 or +, with nothing. This produces output like 0123456789 and +481234567 with all the brackets, spaces and hyphens etc removed too. If you want to keep those in the output, add them to the Regex
Side note: It's not immediately clear or me what you think is "cruft" that should be stripped (non a-z?) and what you think is "cruft" that should cause blank (a-z?). I struggled with this because you said (paraphrase) "non digit, non bracket, non plus should cause blank" but earlier in your examples your processing permitted numbers that had hyphens and also spaces - being strictly demanding of spec hyphens/spaces would be "cruft that causes the whole thing to return blank" too
I've assumed that it's lowercase chars from the "burger" example but as noted you can extend the range in the IF part should you need to include other chars that return blank
If you have a lot of them to do maybe pre compile a regex as a class level variable and use it in the method:
private Regex _strip = new Regex( "[^0-9+]", RegexOptions.Compiled);
public string CleanPhoneNumber(string messynumber){
if(Regex.IsMatch(messynumber, "[a-z]"))
return "";
else
return _strip.Replace(messynumber, "");
}
...
for(int x = 0; x < millionStrArray.Length; x++)
millionStrArray[x] = CleanPhoneNumber(millionStrArray[x], "");
I don't think you'll gain much from compiling the IsMatch one but you could try it in a similar pattern
Other options exist if you're avoiding regex, you cold even do it using LINQ, or looping on char arrays, stringbuilders etc. Regex is probably the easiest in terms of short maintainable code

The strategy here is to use a look ahead and kick out (fail) a match if word characters are found.
Then when there are no characters, it then captures the + and all numbers into a match group named "Phone". We then extract that from the match's "Phone" capture group and combine as such:
string pattern = #"
^
(?=[\W\d+\s]+\Z) # Only allows Non Words, decimals and spaces; stop match if letters found
(?<Phone>\+?) # If a plus found at the beginning; allow it
( # Group begin
(?:\W*) # Match but don't *capture* any non numbers
(?<Phone>[\d]+) # Put the numbers in.
)+ # 1 to many numbers.
";
var number = "+44-123-33-8901";
var phoneNumber =
string.Join(string.Empty,
Regex.Match(number,
pattern,
RegexOptions.IgnorePatternWhitespace // Allows us to comment the pattern
).Groups["Phone"]
.Captures
.OfType<Capture>()
.Select(cp => cp.Value));
// phoneNumber is `+44123338901`
If one looks a the match structure, the data it houses is this:
Match #0
[0]: +44-123-33-8901
["1"] → [1]: -8901
→1 Captures: 44, -123, -33, -8901
["Phone"] → [2]: 8901
→2 Captures: +, 44, 123, 33, 8901
As you can see match[0] contains the whole match, but we only need the captures under the "Phone" group. With those captures { +, 44, 123, 33, 8901 } we now can bring them all back together by the string.Join.

Regex to match all Romanian phone numbers

I searched the whole google to find some ways to verify if the phone number is Romanian but didn't found anything that helps me...
I want a Regex validator for the following numbers format:
074xxxxxxx
075xxxxxxx
076xxxxxxx
078xxxxxxx
072xxxxxxx
077xxxxxxx
0251xxxxxx
0351xxxxxx
This is the regex that I've made, but it is not working:
{ "Romania", new Regex("(/^(?:(?:(?:00\\s?|\\+)40\\s?|0)(?:7\\d{2}\\s?\\d{3}\\s?\\d{3}|(21|31)\\d{1}\\s?\\d{3}\\s?\\d{3}|((2|3)[3-7]\\d{1})\\s?\\d$)")}
It doesn't validate the correct numbers format.
More details:
If the number begins with other than the initial ones that I've added, then that number is not valid.
The x should contain any number, but there should not be the same number..like 0000000 1111111 etc.
It can also have the following format (but not mandatory): (072)xxxxxxx
Is there any way of doing this?
I want to implement this to store these numbers in database and check if their format is Romanian.
This is the code where I need to add the regex expression...there should be a new Regex named "Romanian"
static IDictionary<string, Regex> countryRegex = new Dictionary<string, Regex>()
{
{ "USA", new Regex("^[2-9]\\d{2}-\\d{3}-\\d{4}$")},
{ "UK", new Regex("(^1300\\d{6}$)|(^1800|1900|1902\\d{6}$)|(^0[2|3|7|8]{1}[0-9]{8}$)|(^13\\d{4}$)|(^04\\d{2,3}\\d{6}$)")},
{ "Netherlands", new Regex("(^\\+[0-9]{2}|^\\+[0-9]{2}\\(0\\)|^\\(\\+[0-9]{2}\\)\\(0\\)|^00[0-9]{2}|^0)([0-9]{9}$|[0-9\\-\\s]{10}$)")},
};

If I understand the rules correctly, this pattern should work:
^(?<paren>\()?0(?:(?:72|74|75|76|77|78)(?(paren)\))(?<first>\d)(?!\k<first>{6})\d{6}|(?:251|351)(?(paren)\))(?<first>\d)(?!\k<first>{5})\d{5})$
So, you could add it to your code like this:
static IDictionary<string, Regex> countryRegex = new Dictionary<string, Regex>()
{
{ "USA", new Regex("^[2-9]\\d{2}-\\d{3}-\\d{4}$")},
{ "UK", new Regex("(^1300\\d{6}$)|(^1800|1900|1902\\d{6}$)|(^0[2|3|7|8]{1}[0-9]{8}$)|(^13\\d{4}$)|(^04\\d{2,3}\\d{6}$)")},
{ "Netherlands", new Regex("(^\\+[0-9]{2}|^\\+[0-9]{2}\\(0\\)|^\\(\\+[0-9]{2}\\)\\(0\\)|^00[0-9]{2}|^0)([0-9]{9}$|[0-9\\-\\s]{10}$)")},
{ "Romania", new RegEx(#"^(?<paren>\()?0(?:(?:72|74|75|76|77|78)(?(paren)\))(?<first>\d)(?!\k<first>{6})\d{6}|(?:251|351)(?(paren)\))(?<first>\d)(?!\k<first>{5})\d{5})$")}
};
Here is the meaning of the pattern:
^ - Matches must start at the beginning of the input string
(?<paren>\()? - Optionally matches a ( character. If it is there, it captures it in a group named paren
0 - The number must start with a single 0
(?: - Begins an non-capturing group for the purpose of matching one of two different formats
(?:72|74|75|76|77|78)(?(paren)\))(?<first>\d)(?!\k<first>{6})\d{6} - The first format
(?:72|74|75|76|77|78) - The next two digits must be 72, 74, 75, 76, 77, or 78
(?(paren)\)) - If the opening ( exists, then there must be a closing ) here
(?<first>\d) - Matches just the first of the ending seven digits and captures it in a group named first
(?!\k<first>{6}) - A negative look-ahead which ensures that the remaining six digits are not the same as the first one
\d{6} - Matches the remaining six digits
| - The or operator
(?:251|351)(?(paren)\))(?<first>\d)(?!\k<first>{5})\d{5} - The second format
(?:251|351) - The next three digits must be 251 or 351.
(?(paren)\)) - If the opening ( exists, then there must be a closing ) here
(?<first>\d) - Matches just the first of the ending six digits and captures it in a group named first
(?!\k<first>{5}) - A negative look-ahead which ensures that the remaining five digits are not the same as the first one
\d{5} - Matches the remaining five digits
) - Ends the non-capturing group which specified the two potential formats
$ - The match must go all the way to the of the input string

Try this one: ^(?=0[723][2-8]\d{7})(?!.*(.)\1{2,}).{10}$ - The negative lookahead (?!...) is testing the repeating characters
I use http://regexr.com/ to test

This match your example:
0(([7][456728])|([23]51)).*

match regex patten between two pattern

Hi I am newbie in RegEx operations. I have a text like
[JUNCTIONS]
;ID Elev Demand Pattern
3 50 100 ;
4 50 30 ;
5 50 20 ;
6 40 20 ;
7 50 5 ;
8 30 5 ;
9 30 5 ;
2 50 80 ;
10 50 70 ;
11 50 30 ;
12 50 52 ;
13 50 40 ;
14 50 40 ;
15 50 10 ;
16 50 10 ;
17 50 10 ;
18 0 0 ;
19 0 0 ;
[RESERVOIRS]
;ID Head Pattern
1 100 ;
[TANKS]
I want to create a pattern and output the text between [JUNCTIONS] and [RESERVOIRS] then [RESERVOIRS] to [TANKS] then so on. [XXXX] is not known to me. I want to get text inside [XXX] to [XXX]. How can i do that?

Here is the regex:
(?=(\[\S+\].*?\[\S+\]))
or
(?=(\[(?:JUNCTIONS|RESERVOIRS)\].*?\[(?:RESERVOIRS|TANKS)\]))
Assuming you want to handle all the [...] things from your input.
Note: Use the make sure you are handling multiple line regex matching from your c#. And don't for get to escape the \ character if you need.

Here is some c# code to do the match, and get the results.
Be sure to add error checking, for example to make sure that the match actually worked.
Note the Singleline flag - this lets the dot (.) match all characters, including newlines. You'll also probably need to cleanup and trim the output, to remove any trailing newlines, etc.
MatchCollection matches = Regex.Matches(test, #"^\[JUNCTIONS\](.*)\[RESERVOIRS\](.*)\[TANKS\](.*)$", RegexOptions.Singleline);
GroupCollection groups = matches[0].Groups;
// JUNCTIONS text
Console.WriteLine(groups[1]);
// RESERVOIRS text
Console.WriteLine(groups[2]);
Edit - Updated to match OP's changes
If you want to match an unspecified number of matches, its a little trickier. This regex will match a [TEXT] block and anything that comes after it, until it its a [ character. The way to use this regex is to loop over the MatchCollection for each region, and use .groups[1] for the text and .groups[2] for the body.
MatchCollection matches =
Regex.Matches(test, #"\[([\w+]+)\]([^\[]+)?", RegexOptions.Singleline);
// for each block / section of the document
foreach(Match match in matches){
GroupCollection groups = match.Groups;
// [TEXT] part will be here
Console.WriteLine(groups[1]);
// The rest will be here
Console.WriteLine(groups[2]);
}

Why use a regex?
Assuming you can read this input text one line at a time, it will probably be quicker and easier to just loop over the lines, and output those you need. Some variant of:
Update:
In response to you comment below; you can probably use this to skip any lines with [something] in them, and print out the rest:
// Pattern: Any instance of [] with one or more characters of between them:
var pattern = #"\[.+\]";
while((line = file.ReadLine()) != null)
{
if(!Regex.IsMatch(line, pattern)) // Skip lines that match
{
Console.WriteLine(line);
}
}

How to validate this using Regex C#

I have a input which can be the following (Either one of these three):
1-8…in other words 1, 2,3,4,5,6,7,8
A-Z….in other words, A, B, C, D etc
01-98…in other words, 01,02,03,04 etc
I came up with this regex but it's not working not sure why:
#"[A-Z0-9][1-8]-
I am thinking to check for corner cases like just 0 and just 9 after regex check because regex check isn't validating this

Not sure I understand, but how about:
^(?:[A-Z]|[1-8]|0[1-9]|[1-8][0-9]|9[0-8])$
Explanation:
(?:...) is a group without capture.
| introduces an alternative
[A-Z] means one letter
[1-8] one digit between 1 and 8
0[1-9] a 0 followed by a digit between 1 and 9
[1-8][0-9] a digit between 1 and 8 followed by a digit between 1 and 9
9[0-8] 9 followed by a digit between 0 and 8
May be it is, depending on your real needs:
^(?:[A-Z]|[0-9]?[1-8])$

I think you may use this pattern
#"^([1-8A-Z]|0[1-9]|[1-9]{2})$"

What about ^([1-8]|([0-9][0-9])|[A-Z])$ ?
That will give a match for
A
8 (but not 9)
09

[1-8]{0,1}[A-Z]{0,1}\d{1,2}
matches all of the following
8A8 8B9 9 0 00

You can use following pattern:
^[1-8][A-Z](?:0[1-9]|[1-8][0-9]|9[0-8])$
^[1-8] - input should start with number 1-8
[A-Z] - then should be single letter A-Z
(0[1-9]|[1-8][0-9]|9[0-8])$ and it should end with two numbers which are 01-19 or 10-89 or 90-98
Test:
string pattern = #"^[1-8][A-Z](0[1-9]|[1-8][0-9]|9[0-8])$";
Regex regex = new Regex(pattern);
string[] valid = { "1A01", "8Z98" };
bool allMatch = valid.All(regex.IsMatch);
string[] invalid = { "0A01", "11A01", "1A1", "1A99", "1A00", "101", "1AA01" };
bool allNotMatch = !invalid.Any(regex.IsMatch);

Basic regex for 16 digit numbers

I currently have a regex that pulls up a 16 digit number from a file e.g.:
Regex:
Regex.Match(l, #"\d{16}")
This would work well for a number as follows:
1234567891234567
Although how could I also include numbers in the regex such as:
1234 5678 9123 4567
and
1234-5678-9123-4567

If all groups are always 4 digit long:
\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b
to be sure the delimiter is the same between groups:
\b\d{4}(| |-)\d{4}\1\d{4}\1\d{4}\b

If it's always all together or groups of fours, then one way to do this with a single regex is something like:
Regex.Match(l, #"\d{16}|\d{4}[- ]\d{4}[- ]\d{4}[- ]\d{4}")

You could try something like:
^([0-9]{4}[\s-]?){3}([0-9]{4})$
That should do the trick.
Please note:
This also allows
1234-5678 9123 4567
It's not strict on only dashes or only spaces.

Another option is to just use the regex you currently have, and strip all offending characters out of the string before you run the regex:
var input = fileValue.Replace("-",string.Empty).Replace(" ",string.Empty);
Regex.Match(input, #"\d{16}");

Here is a pattern which will get all the numbers and strip out the dashes or spaces. Note it also checks to validate that there is only 16 numbers. The ignore option is so the pattern is commented, it doesn't affect the match processing.
string value = "1234-5678-9123-4567";
string pattern = #"
^ # Beginning of line
( # Place into capture groups for 1 match
(?<Number>\d{4}) # Place into named group capture
(?:[\s-]?) # Allow for a space or dash optional
){4} # Get 4 groups
(?!\d) # 17th number, do not match! abort
$ # End constraint to keep int in 16 digits
";
var result = Regex.Match(value, pattern, RegexOptions.IgnorePatternWhitespace)
.Groups["Number"].Captures
.OfType<Capture>()
.Aggregate (string.Empty, (seed, current) => seed + current);
Console.WriteLine ( result ); // 1234567891234567
// Shows False due to 17 numbers!
Console.WriteLine ( Regex.IsMatch("1234-5678-9123-45678", pattern, RegexOptions.IgnorePatternWhitespace));

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

regex expression help needed - c#

Related

Extract phone numbers and exclude extraneous characters

Regex to match all Romanian phone numbers

match regex patten between two pattern

How to validate this using Regex C#

Basic regex for 16 digit numbers

Categories

Resources