Regular Expression for exactly N elements, not less, not more

I am trying to figure out the regex that matches exactly N occurrences, not less, not more, of a group of characters. It looks like a pretty simple task, but I have not been able to find the proper regex for it.
More specifically, I want a regex that tells whether a given string contains exactly 3 digits - not less, not more.
I thought I would be able to achieve it simply by treating the 3 digits as a group and adding a quantifier of {1} after it, but it does not work.
Alternatively, I expected [0-9][0-9][0-9] to work as well, but again it does not. Both regexes return the very same results for an input set such as
1, 12, 123, 1234, 12345.
Below is a code sample that performs what I tried, as described above.
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        // Three equivalent, unanchored patterns: each matches three digits anywhere in the input.
        List<Regex> regexes = new List<Regex> { new Regex("\\d{3}"), new Regex("[0-9][0-9][0-9]"), new Regex("(\\d{3}){1}") };
        List<int> numbers = new List<int> { 1, 12, 123, 1234, 12345 };
        foreach (Regex regex in regexes)
        {
            Console.WriteLine("Testing regex {0}", regex.ToString());
            foreach (int number in numbers)
            {
                // IsMatch succeeds if the pattern is found anywhere in the string.
                Console.WriteLine(string.Format("{0} {1}", number, regex.IsMatch(number.ToString()) ? "is a match" : "not a match"));
            }
            Console.WriteLine();
        }
    }
}
In the output of the program above, every regex reports 123, 1234 and 12345 as matches.
Clearly, only 123 should be a match among the input values.
What would be the regular expression that treats "123" alone as a match?

All of your regular expressions are for 3 digits anywhere on the input. You are looking for:
new Regex("^\\d{3}$")
The ^ matches the beginning of the input, and the $ matches the end of the input. So this regular expression states, "From the beginning, there must be three digits, then expect the end."
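As a minimal sketch of the anchored pattern applied to the inputs from the question (the class and method names are illustrative and not part of the original post):

```csharp
using System.Text.RegularExpressions;

public static class ExactDigits
{
    // ^ anchors at the start of the input, \d{3} requires three digits,
    // $ anchors at the end, so nothing may precede or follow the digits.
    private static readonly Regex ThreeDigits = new Regex(@"^\d{3}$");

    public static bool IsExactlyThreeDigits(string input)
    {
        return ThreeDigits.IsMatch(input);
    }
}
```

Calling ExactDigits.IsExactlyThreeDigits on "1", "12", "123", "1234" and "12345" accepts only "123". Note that in .NET the $ anchor also matches just before a final newline; if that matters, \z can be used in place of $.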

You should prefix with ^ to indicate the start of the string and $ to indicate its end. See http://regexr.com/3be8e for a working example.

You should be looking for n characters that are neither preceded nor followed by another such character. So, if you are looking for digits, you should be looking for n digits followed by a non-digit. Make sure that the run of digits is preceded by a non-digit as well.
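One way to express "not preceded or followed by a digit" is with lookarounds rather than literal non-digit characters. This is a swapped-in technique, not what the answer spells out: lookarounds also succeed at the very start and end of the string, where there is no neighbouring character at all. A sketch, with hypothetical class and method names:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DigitRuns
{
    // (?<!\d) : the previous character is not a digit (or there is none)
    // \d{3}   : exactly three digits
    // (?!\d)  : the next character is not a digit (or there is none)
    private static readonly Regex IsolatedRun = new Regex(@"(?<!\d)\d{3}(?!\d)");

    // Returns every run of exactly three digits found inside a longer string.
    public static List<string> FindThreeDigitRuns(string input)
    {
        var runs = new List<string>();
        foreach (Match m in IsolatedRun.Matches(input))
        {
            runs.Add(m.Value);
        }
        return runs;
    }
}
```

This variant is useful when the three digits sit inside surrounding text; for validating a whole string, the anchored form is simpler.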

Related

REGEX Matching string nonconsecutively

I'm trying to understand how to match a specific string that's held within an array (This string will always be 3 characters long, ex: 123, 568, 458 etc) and I would match that string to a longer string of characters that could be in any order (9841273 for example). Is it possible to check that at least 2 of the 3 characters in the string match (in this example) strMoves? Please see my code below for clarification.
private readonly string[] strSolutions = new string[8] { "123", "159", "147", "258", "357", "369", "456", "789" };
private static string strMoves = "1823742";
foreach (string strResult in strSolutions)
{
Regex rgxMain = new Regex("[" + strMoves + "]{2}");
if (rgxMain.IsMatch(strResult))
{
MessageBox.Show(strResult);
}
}
The portion where I have designated "{2}" in Regex is where I expected the result to check for at least 2 matching characters, but my logic is definitely flawed. It will return true IF the two characters are in consecutive order as compared to the string in strResult. If it's not in the correct order it will return false. I'm going to continue to research on this but if anyone has ideas on where to look in Microsoft's documentation, that would be greatly appreciated!
Correct order where it would return true: "144257" when matched to "123"
incorrect order: "35718" when matched to "123"
The 3 is before the 1, so it won't match.

You can use the following solution if you need to find at least two different not necessarily consecutive chars from a specified set in a longer string:
new Regex($@"([{strMoves}]).*(?!\1)[{strMoves}]", RegexOptions.Singleline)
It will look like
([1823742]).*(?!\1)[1823742]
See the regex demo.
Pattern details:
([1823742]) - Capturing group 1: one of the chars in the character class
.* - any zero or more chars as many as possible (due to RegexOptions.Singleline, . matches any char including newline chars)
(?!\1) - a negative lookahead that fails the match if the next char is a starting point of the value stored in the Group 1 memory buffer (since it is a single char here, the next char should not equal the text in Group 1, one of the specified digits)
[1823742] - one of the chars in the character class.
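The pattern above can be wrapped in a small helper, sketched below. The class and method names are hypothetical, and the sketch assumes the move string contains only digits, so it can be placed inside a character class without escaping.

```csharp
using System.Text.RegularExpressions;

public static class MoveMatcher
{
    // Returns true when at least two different characters of `candidate`
    // belong to the set of characters in `moves`, in any order.
    // Assumption: `moves` holds digits only, so no escaping is needed inside [...].
    public static bool SharesTwoDistinctChars(string moves, string candidate)
    {
        string set = "[" + moves + "]";
        // Builds e.g. ([1823742]).*(?!\1)[1823742]
        string pattern = "(" + set + @").*(?!\1)" + set;
        return Regex.IsMatch(candidate, pattern, RegexOptions.Singleline);
    }
}
```

With the moves "1823742" from the question, "123" and "357" are accepted, while "159" and "369" are rejected because only one of their characters appears in the moves.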

Extract phone numbers and exclude extraneous characters

I'm trying to create a regex which will extract a complete phone number from a string (which is the only thing in the string) but leaving out any cruft like decorative brackets, etc.
The pattern I have mostly appears to work, but returns a list of matches - whereas I want it to return the phone number with the characters removed. Unfortunately, it completely fails if I add the start and end of line matchers...
^(?!\(\d+\)\s*){1}(?:[\+\d\s]*)$
Without the ^ and $ this matches the following numbers:
12345-678-901 returns three groups: 12345 678 901
+44-123-4567-8901 returns four groups: +44 123 4567 8901
(+48) 123 456 7890 returns four groups: +48 123 456 7890
How can I get the groups to be returned as a single, joined up whole?
Other than that, the only change I would like to include is to return nothing if there are any non-numeric, non-bracket, non-+ characters anywhere. So, this should fail:
(+48) 123 burger 7890

I'd keep it simple, makes it more readable and maintainable:
public string CleanPhoneNumber(string messynumber){
if(Regex.IsMatch(messynumber, "[a-z]"))
return "";
else
return Regex.Replace(messynumber, "[^0-9+]", "");
}
If any lowercase letters are present (extend this range if you wish), return blank; otherwise replace every char that is not 0-9 or + with nothing. This produces output like 0123456789 and +481234567 with all the brackets, spaces and hyphens etc. removed too. If you want to keep those in the output, add them to the Regex.
Side note: It's not immediately clear to me what you think is "cruft" that should be stripped (non a-z?) and what you think is "cruft" that should cause blank (a-z?). I struggled with this because you said (paraphrase) "non digit, non bracket, non plus should cause blank" but earlier in your examples your processing permitted numbers that had hyphens and also spaces. Being strictly demanding of the spec, hyphens/spaces would be "cruft that causes the whole thing to return blank" too.
I've assumed that it's lowercase chars from the "burger" example but as noted you can extend the range in the IF part should you need to include other chars that return blank
If you have a lot of them to do maybe pre compile a regex as a class level variable and use it in the method:
private Regex _strip = new Regex( "[^0-9+]", RegexOptions.Compiled);
public string CleanPhoneNumber(string messynumber){
if(Regex.IsMatch(messynumber, "[a-z]"))
return "";
else
return _strip.Replace(messynumber, "");
}
...
for(int x = 0; x < millionStrArray.Length; x++)
millionStrArray[x] = CleanPhoneNumber(millionStrArray[x]);
I don't think you'll gain much from compiling the IsMatch one but you could try it in a similar pattern
Other options exist if you're avoiding regex: you could even do it using LINQ, or by looping over char arrays, StringBuilders etc. Regex is probably the easiest in terms of short, maintainable code.
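For comparison, a LINQ version might look like the sketch below. It assumes the same rules as the regex version (any lowercase ASCII letter rejects the input; only ASCII digits and + are kept), and the class name is made up for illustration.

```csharp
using System.Linq;

public static class PhoneCleaner
{
    public static string CleanPhoneNumber(string messyNumber)
    {
        // Same rule as the regex version: any lowercase ASCII letter rejects the input.
        if (messyNumber.Any(c => c >= 'a' && c <= 'z'))
        {
            return "";
        }

        // Keep only ASCII digits and '+'; brackets, spaces and hyphens are dropped.
        return new string(messyNumber
            .Where(c => (c >= '0' && c <= '9') || c == '+')
            .ToArray());
    }
}
```

For example, "(+48) 123 456 7890" becomes "+481234567890", while "(+48) 123 burger 7890" yields an empty string.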

The strategy here is to use a look ahead and kick out (fail) a match if word characters are found.
Then when there are no characters, it then captures the + and all numbers into a match group named "Phone". We then extract that from the match's "Phone" capture group and combine as such:
string pattern = @"
^
(?=[\W\d+\s]+\Z) # Only allows Non Words, decimals and spaces; stop match if letters found
(?<Phone>\+?) # If a plus found at the beginning; allow it
( # Group begin
(?:\W*) # Match but don't *capture* any non numbers
(?<Phone>[\d]+) # Put the numbers in.
)+ # 1 to many numbers.
";
var number = "+44-123-33-8901";
var phoneNumber =
string.Join(string.Empty,
Regex.Match(number,
pattern,
RegexOptions.IgnorePatternWhitespace // Allows us to comment the pattern
).Groups["Phone"]
.Captures
.OfType<Capture>()
.Select(cp => cp.Value));
// phoneNumber is `+44123338901`
If one looks at the match structure, the data it houses is this:
Match #0
[0]: +44-123-33-8901
["1"] → [1]: -8901
→1 Captures: 44, -123, -33, -8901
["Phone"] → [2]: 8901
→2 Captures: +, 44, 123, 33, 8901
As you can see match[0] contains the whole match, but we only need the captures under the "Phone" group. With those captures { +, 44, 123, 33, 8901 } we now can bring them all back together by the string.Join.

C#: Remove Excess Text From String

Okay, so after looking around here on SO, I have found a solution that meets about 95% of my requirement, although I believe it may need to be redone at this point.
ISSUE
Say I have a value range supplied as "1000 - 1009 ABC1 ABC SOMETHING ELSE" where I just need the 1000 - 1009 part. I need to be able to remove excess characters from the string supplied, even if they truly are accepted characters, but only if they are part of secondary strings with text. (Sorry if that description seems odd, my mind isn't full power today.)
CURRENT SOLUTION
I currently have a simple method utilizing Linq to return only accepted characters, however this will return "1000 - 10091" which is not the range I am needing. I've thought about looping through the strings individual characters and comparing to previous characters as I go using IsDigit and IsLetter to my advantage, but then comes the issue of replacing the unacceptable characters or removing them. I think if I gave it a day or two I could figure it out with a clear mind, but it needs to be done by the end of the day, and I am banging my head against the keyboard.
void RemoveExcessText(ref string val) {
string allowedChars = "0123456789-+>";
val = new string(val.Where(c => allowedChars.Contains(c)).ToArray());
}
// Alternatively?
char previousChar = ' ';
for (int i = 0; i < val.Length; i++) {
if (char.IsLetter(val[i])) {
previousChar = val[i];
val.Remove(i, 1);
} else if (char.IsDigit(val[i])) {
if (char.IsLetter(previousChar)) {
val.Remove(i, 1);
}
}
}
But how do I calculate white space and leave in the +, -, and > charactrers? I am losing my mind on this one today.

Why not use a regular expression?
Regex.Match("1000 - 1009 ABC1 ABC SOMETHING ELSE", @"^(\d+)([\s\-]+)(\d+)");
Should give you what you want
I made a fiddle

You use a regular expression with a capturing group:
Regex r = new Regex("^(?<v>[-0-9 ]+)");
This means "from the start of the input string (^) match [0 to 9 or space or hyphen], keep going for as many occurrences of these characters as are available (+), and store the result in the named group v (?<v>)"
We get it out like this:
r.Matches(input)[0].Groups["v"].Value
Note though that if the input string doesn't match, the match collection will be empty and indexing it with [0] will throw. To this end you might want to make it more robust with some extra error checking:
MatchCollection mc = r.Matches(input);
if(mc.Count > 0)
MessageBox.Show(mc[0].Groups["v"].Value);
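When only the first match is needed, Regex.Match together with Match.Success avoids indexing into a collection at all. The sketch below uses a slightly stricter pattern of my own (digits, a hyphen with optional surrounding whitespace, digits) rather than the character class above, and the names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class RangeExtractor
{
    // Assumed format: digits, optional whitespace around a hyphen, digits,
    // anchored at the start of the input.
    private static readonly Regex LeadingRange = new Regex(@"^(?<v>\d+\s*-\s*\d+)");

    // Returns the leading range, or null when the input does not start with one.
    public static string ExtractLeadingRange(string input)
    {
        Match m = LeadingRange.Match(input);
        return m.Success ? m.Groups["v"].Value : null;
    }
}
```

Because the group stops at the last digit of the second number, no trailing space is captured and the "1" in "ABC1" is never reached.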

You could match this with a regular expression. \d{1,4} means match a decimal digit at least once up to 4 times. Followed by space, hyphen, space, and 1 to 4 digits again, then anything else. Only the part inside parenthesis is output in your results.
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
var pattern = @"(^\d{1,4} - \d{1,4}).*";
string input = ("1000 - 1009 ABC1 ABC SOMETHING ELSE");
string replacement = "$1";
string result = Regex.Replace(input, pattern, replacement);
Console.WriteLine(result);
}
}
https://dotnetfiddle.net/cZGlX4

Regex to match all Romanian phone numbers

I searched all over Google for a way to verify whether a phone number is Romanian, but didn't find anything that helps me...
I want a Regex validator for the following numbers format:
074xxxxxxx
075xxxxxxx
076xxxxxxx
078xxxxxxx
072xxxxxxx
077xxxxxxx
0251xxxxxx
0351xxxxxx
This is the regex that I've made, but it is not working:
{ "Romania", new Regex("(/^(?:(?:(?:00\\s?|\\+)40\\s?|0)(?:7\\d{2}\\s?\\d{3}\\s?\\d{3}|(21|31)\\d{1}\\s?\\d{3}\\s?\\d{3}|((2|3)[3-7]\\d{1})\\s?\\d$)")}
It doesn't validate the correct numbers format.
More details:
If the number begins with other than the initial ones that I've added, then that number is not valid.
The x should contain any number, but there should not be the same number..like 0000000 1111111 etc.
It can also have the following format (but not mandatory): (072)xxxxxxx
Is there any way of doing this?
I want to implement this to store these numbers in database and check if their format is Romanian.
This is the code where I need to add the regex expression...there should be a new Regex named "Romanian"
static IDictionary<string, Regex> countryRegex = new Dictionary<string, Regex>()
{
{ "USA", new Regex("^[2-9]\\d{2}-\\d{3}-\\d{4}$")},
{ "UK", new Regex("(^1300\\d{6}$)|(^1800|1900|1902\\d{6}$)|(^0[2|3|7|8]{1}[0-9]{8}$)|(^13\\d{4}$)|(^04\\d{2,3}\\d{6}$)")},
{ "Netherlands", new Regex("(^\\+[0-9]{2}|^\\+[0-9]{2}\\(0\\)|^\\(\\+[0-9]{2}\\)\\(0\\)|^00[0-9]{2}|^0)([0-9]{9}$|[0-9\\-\\s]{10}$)")},
};

If I understand the rules correctly, this pattern should work:
^(?<paren>\()?0(?:(?:72|74|75|76|77|78)(?(paren)\))(?<first>\d)(?!\k<first>{6})\d{6}|(?:251|351)(?(paren)\))(?<first>\d)(?!\k<first>{5})\d{5})$
So, you could add it to your code like this:
static IDictionary<string, Regex> countryRegex = new Dictionary<string, Regex>()
{
{ "USA", new Regex("^[2-9]\\d{2}-\\d{3}-\\d{4}$")},
{ "UK", new Regex("(^1300\\d{6}$)|(^1800|1900|1902\\d{6}$)|(^0[2|3|7|8]{1}[0-9]{8}$)|(^13\\d{4}$)|(^04\\d{2,3}\\d{6}$)")},
{ "Netherlands", new Regex("(^\\+[0-9]{2}|^\\+[0-9]{2}\\(0\\)|^\\(\\+[0-9]{2}\\)\\(0\\)|^00[0-9]{2}|^0)([0-9]{9}$|[0-9\\-\\s]{10}$)")},
{ "Romania", new Regex(@"^(?<paren>\()?0(?:(?:72|74|75|76|77|78)(?(paren)\))(?<first>\d)(?!\k<first>{6})\d{6}|(?:251|351)(?(paren)\))(?<first>\d)(?!\k<first>{5})\d{5})$")}
};
Here is the meaning of the pattern:
^ - Matches must start at the beginning of the input string
(?<paren>\()? - Optionally matches a ( character. If it is there, it captures it in a group named paren
0 - The number must start with a single 0
(?: - Begins a non-capturing group for the purpose of matching one of two different formats
(?:72|74|75|76|77|78)(?(paren)\))(?<first>\d)(?!\k<first>{6})\d{6} - The first format
(?:72|74|75|76|77|78) - The next two digits must be 72, 74, 75, 76, 77, or 78
(?(paren)\)) - If the opening ( exists, then there must be a closing ) here
(?<first>\d) - Matches just the first of the ending seven digits and captures it in a group named first
(?!\k<first>{6}) - A negative look-ahead which ensures that the remaining six digits are not the same as the first one
\d{6} - Matches the remaining six digits
| - The or operator
(?:251|351)(?(paren)\))(?<first>\d)(?!\k<first>{5})\d{5} - The second format
(?:251|351) - The next three digits must be 251 or 351.
(?(paren)\)) - If the opening ( exists, then there must be a closing ) here
(?<first>\d) - Matches just the first of the ending six digits and captures it in a group named first
(?!\k<first>{5}) - A negative look-ahead which ensures that the remaining five digits are not the same as the first one
\d{5} - Matches the remaining five digits
) - Ends the non-capturing group which specified the two potential formats
$ - The match must go all the way to the end of the input string
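A small wrapper for trying the pattern against sample numbers could look like this. The class name and the sample numbers are invented for illustration; the pattern is split across two string literals only for readability.

```csharp
using System.Text.RegularExpressions;

public static class RomanianPhone
{
    // The pattern described above, relying on .NET conditionals and named groups.
    private static readonly Regex Pattern = new Regex(
        @"^(?<paren>\()?0(?:(?:72|74|75|76|77|78)(?(paren)\))(?<first>\d)(?!\k<first>{6})\d{6}" +
        @"|(?:251|351)(?(paren)\))(?<first>\d)(?!\k<first>{5})\d{5})$");

    public static bool IsValid(string number)
    {
        return Pattern.IsMatch(number);
    }
}
```

For instance, RomanianPhone.IsValid accepts "0741234567", "(072)1234567" and "0251123456", and rejects "0740000000" (repeated digit) and "0731234567" (prefix not in the list).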

Try this one: ^(?=0[723][2-8]\d{7})(?!.*(.)\1{2,}).{10}$ - The negative lookahead (?!...) is testing the repeating characters
I use http://regexr.com/ to test

This matches your examples:
0(([7][456728])|([23]51)).*

Regex for numbers only

I haven't used regular expressions at all, so I'm having difficulty troubleshooting. I want the regex to match only when the contained string is all numbers; but with the two examples below it is matching a string that contains all numbers plus an equals sign like "1234=4321". I'm sure there's a way to change this behavior, but as I said, I've never really done much with regular expressions.
string compare = "1234=4321";
Regex regex = new Regex(@"[\d]");
if (regex.IsMatch(compare))
{
//true
}
regex = new Regex("[0-9]");
if (regex.IsMatch(compare))
{
//true
}
In case it matters, I'm using C# and .NET2.0.

Use the beginning and end anchors.
Regex regex = new Regex(@"^\d$");
Use "^\d+$" if you need to match more than one digit.
Note that "\d" will match [0-9] and other digit characters like the Eastern Arabic numerals ٠١٢٣٤٥٦٧٨٩. Use "^[0-9]+$" to restrict matches to just the Arabic numerals 0 - 9.
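The difference between the digit classes can be checked with a sketch like the following. The class and method names are illustrative; RegexOptions.ECMAScript is included as a third option because it restricts \d to 0-9 in .NET.

```csharp
using System.Text.RegularExpressions;

public static class DigitClasses
{
    // \d in .NET matches any Unicode decimal digit by default.
    public static bool UnicodeDigitsOnly(string s)
    {
        return Regex.IsMatch(s, @"^\d+$");
    }

    // [0-9] matches only the ASCII digits.
    public static bool AsciiDigitsOnly(string s)
    {
        return Regex.IsMatch(s, @"^[0-9]+$");
    }

    // With RegexOptions.ECMAScript, \d behaves like [0-9].
    public static bool EcmaDigitsOnly(string s)
    {
        return Regex.IsMatch(s, @"^\d+$", RegexOptions.ECMAScript);
    }
}
```

A string of Eastern Arabic numerals such as "\u0661\u0662\u0663" passes only the first check, while "1234" passes all three.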
If you need to include any numeric representations other than just digits (like decimal values for starters), then see @tchrist's comprehensive guide to parsing numbers with regular expressions.

Your regex will match anything that contains a number, you want to use anchors to match the whole string and then match one or more numbers:
regex = new Regex("^[0-9]+$");
The ^ will anchor the beginning of the string, the $ will anchor the end of the string, and the + will match one or more of what precedes it (a number in this case).

If you need to tolerate decimal point and thousand marker
var regex = new Regex(@"^-?[0-9][0-9,\.]+$");
You will need a "-", if the number can go negative.

This works with integers and decimal numbers. It doesn't match if the number has the comma thousand separator ,
"^-?\\d*(\\.\\d+)?$"
Some strings that match:
894
923.21
76876876
.32
-894
-923.21
-76876876
-.32
Some strings that don't:
hello
9bye
hello9bye
888,323
5,434.3
-8,336.09
87078.

It is matching because it is finding "a match" not a match of the full string. You can fix this by changing your regexp to specifically look for the beginning and end of the string.
^\d+$

Perhaps my method will help you.
public static bool IsNumber(string s)
{
return s.All(char.IsDigit);
}
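Two details of the LINQ approach are worth keeping in mind: Enumerable.All returns true for an empty sequence, and char.IsDigit accepts any Unicode decimal digit, not only 0-9. A stricter variant, sketched here under the assumption that only non-empty ASCII digit strings should pass (names are hypothetical):

```csharp
using System.Linq;

public static class NumberCheck
{
    // Rejects null and empty input, and accepts only the ASCII digits 0-9.
    public static bool IsAsciiNumber(string s)
    {
        return !string.IsNullOrEmpty(s) && s.All(c => c >= '0' && c <= '9');
    }
}
```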

If you need to check whether all the characters are digits (0-9):
^[0-9]+$
Matches
1425
0142
0
1
And does not match
154a25
1234=3254

Sorry for ugly formatting.
For any number of digits:
[0-9]*
For one or more digit:
[0-9]+

^\d+$, which is "start of string", "1 or more digits", "end of string" in English.

Here is my working one:
^(-?[1-9]+\\d*([.]\\d+)?)$|^(-?0[.]\\d*[1-9]+)$|^0$
And some tests
Positive tests:
string []goodNumbers={"3","-3","0","0.0","1.0","0.1","0.0001","-555","94549870965"};
Negative tests:
string []badNums={"a",""," ","-","001","-00.2","000.5",".3","3."," -1","--1","-.1","-0"};
Checked not only for C#, but also with Java, Javascript and PHP

Use beginning and end anchors.
Regex regex = new Regex(@"^\d$");
Use "^\d+$" if you need to match more than one digit.

While none of the above solutions fit my purpose, this worked for me.
var pattern = @"^(-?[1-9]+\d*([.]\d+)?)$|^(-?0[.]\d*[1-9]+)$|^0$|^0.0$";
return Regex.Match(value, pattern, RegexOptions.IgnoreCase).Success;
Example of valid values:
"3",
"-3",
"0",
"0.0",
"1.0",
"0.7",
"690.7",
"0.0001",
"-555",
"945465464654"
Example of not valid values:
"a",
"",
" ",
".",
"-",
"001",
"00.2",
"000.5",
".3",
"3.",
" -1",
"--1",
"-.1",
"-0",
"00099",
"099"

Another way: if you would like to match international digits, such as Persian or Arabic ones, you can use the following expression:
Regex regex = new Regex(@"^[\p{N}]+$");
To also allow a literal period character, use:
regex = new Regex(@"^[\p{N}\.]+$");

Regex for integer and floating point numbers:
^[+-]?\d*\.\d+$|^[+-]?\d+(\.\d*)?$
A number can start with a period (without leading digits(s)),
and a number can end with a period (without trailing digits(s)).
Above regex will recognize both as correct numbers.
A . (period) itself without any digits is not a correct number.
That's why we need two regex parts there (separated with a "|").
Hope this helps.
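A quick way to exercise the two alternatives of that regex is sketched below; the class and method names are illustrative only.

```csharp
using System.Text.RegularExpressions;

public static class NumberSyntax
{
    // First alternative: optional sign, optional digits, a period, at least one digit (".5", "1.5").
    // Second alternative: optional sign, at least one digit, optional period and digits ("5", "5.", "5.2").
    private static readonly Regex Number =
        new Regex(@"^[+-]?\d*\.\d+$|^[+-]?\d+(\.\d*)?$");

    public static bool IsNumber(string s)
    {
        return Number.IsMatch(s);
    }
}
```

With this, "12", "-12.5", ".5" and "5." are accepted, while ".", "" and "1.2.3" are rejected.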

I think that this one is the simplest one and it accepts European and USA way of writing numbers e.g. USA 10,555.12 European 10.555,12
Also this one does not allow several commas or dots one after each other e.g. 10..22 or 10,.22
In addition to this numbers like .55 or ,55 would pass. This may be handy.
^([,|.]?[0-9])+$

console.log(/^(0|[1-9][0-9]*)$/.test(3000)) // true

If you want to extract only numbers from a string the pattern "\d+" should help.
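For extraction, Regex.Matches returns every run of digits. A minimal sketch, with hypothetical names:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NumberExtractor
{
    // Collects every maximal run of digits in the input, in order of appearance.
    public static List<string> ExtractNumbers(string input)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(input, @"\d+"))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```

Applied to the string from the question, "1234=4321", this yields the two entries "1234" and "4321".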

To check whether a string is a uint or ulong, or contains only digits with a single . (dot):
Sample inputs
Regex rx = new Regex(@"^([1-9]\d*(\.)\d*|0?(\.)\d*[1-9]\d*|[1-9]\d*)$");
string text = "12.0";
var result = rx.IsMatch(text);
Console.WriteLine(result);
Samples
123 => True
123.1 => True
0.123 => True
.123 => True
0.2 => True
3452.434.43=> False
2342f43.34 => False
svasad.324 => False
3215.afa => False

The following regex accepts only numbers (also floating point) in both English and Arabic (Persian) languages (just like Windows calculator):
^((([0\u0660\u06F0]|([1-9\u0661-\u0669\u06F1-\u06F9][0\u0660\u06F0]*?)+)(\.)[0-9\u0660-\u0669\u06F0-\u06F9]+)|(([0\u0660\u06F0]?|([1-9\u0661-\u0669\u06F1-\u06F9][0\u0660\u06F0]*?)+))|\b)$
The above regex accepts the following patterns:
11
1.2
0.3
۱۲
۱.۳
۰.۲
۲.۷
The above regex doesn't accept the following patterns:
3.
.3
0..3
.۱۲

Regex regex = new Regex("^[0-9]{1,4}=[0-9]{1,4}$");
