Parse the number with Regex with non capturing group

I'm trying to parse a phone number with a regex. Specifically, I want to get the phone number as a string using code like this:
string phoneRegex = @"^([+]|00)(\d{2,12}(?:\s*-*)){1,5}$";
string formated = Regex.Match(e.Value.ToString(), phoneRegex).Value;
As you can see, I'm trying to use the non-capturing group (?:\s*-*), but I'm doing something wrong.
The expected result is:
input (e.Value): +48 123 234 344 or +48 123234344 or +48 123-234-345
output: +48123234344
Thanks in advance for any suggestions.

Regex.Match will not alter the string for you; it only matches it. If you have a phone number string and want to format it by removing unwanted characters, use the Regex.Replace method instead:
// pattern for matching anything that is not '+' or a decimal digit
string replaceRegex = @"[^+\d]";
string formated = Regex.Replace("+48 123 234 344", replaceRegex, string.Empty);
The phone number is hard-coded in my sample purely for demonstration purposes.
As a side note, the regex in your code sample requires at least two digits in the first digit group, so it assumes the country code has two or more digits; this may not be the case. The United States has a one-digit code (1), and many countries have three-digit codes (country calling codes are between one and three digits long).
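As a minimal sketch, the Regex.Replace idea above can be paired with a validation step, so that strings that do not look like phone numbers come back empty. The PhoneNormalizer class, the Normalize method, and the validation pattern are illustrative assumptions, not part of the original question or answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneNormalizer
{
    // Assumed shape: a "+" or "00" prefix, one digit, then any mix of
    // digits, whitespace and hyphens up to the end of the string.
    private static readonly Regex Valid = new Regex(@"^(?:\+|00)\d[\d\s-]*$");

    // Anything that is not '+' or a decimal digit.
    private static readonly Regex Unwanted = new Regex(@"[^+\d]");

    public static string Normalize(string input)
    {
        if (!Valid.IsMatch(input))
            return string.Empty;
        return Unwanted.Replace(input, string.Empty);
    }

    public static void Main()
    {
        Console.WriteLine(Normalize("+48 123 234 344")); // +48123234344
        Console.WriteLine(Normalize("+48 123234344"));   // +48123234344
        Console.WriteLine(Normalize("+48 123-234-345")); // +48123234345
    }
}
```

The validation pattern deliberately does not fix the number of digits after the prefix, so one-digit and three-digit country codes pass as well.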

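This should work for numbers laid out as a prefix followed by groups of 3, 3 and 4 digits. Note that it expects exactly that layout, so an input such as +48 123 234 344 will not match it:
Match m = Regex.Match(s, @"^([+]|00)\(?(\d{3})\)?[\s\-]?(\d{3})\-?(\d{4})$");
return String.Format("{0}{1}{2}{3}", m.Groups[1], m.Groups[2], m.Groups[3], m.Groups[4]);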

Related

How to extract digits between two fixed strings in Arabic Language?

I have a string in the format:
خصم بقيمة 108 بتاريخ 31-01-2021
And I want to replace the digits between the words: بقيمة & بتاريخ with a "?" character.
And keep the digits in the date part of the string
I tried using this Regular Expression: (?<=بقيمة)(.*?)(?=بتاريخ)
Which works on https://regex101.com/
But when I implement it in C# in Regex.Replace function, it doesn't have any effect when I use the Arabic words:
e.Row.Cells[3].Text = Regex.Replace(e.Row.Cells[3].Text, "(?<=بقيمة)(.*?)(?=بتاريخ)", "?");
But it works if I use Latin letters:
e.Row.Cells[3].Text = Regex.Replace(e.Row.Cells[3].Text, "(?<=X)(.*?)(?=Y)", "?");
Is there any way to make the function work with Arabic characters?
Or is there a better approach I can take to achieve the desired result? For example excluding the date part?

Since the digits you need (the ones without "-") are bookended by spaces, just use \s(\d+)\s and read the first capture group.
var txt = "خصم بقيمة 108 بتاريخ 12-31-2021";
var pattern = @"\s(\d+)\s";
Console.WriteLine( Regex.Match(txt, pattern).Groups[1].Value ); // 108
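For the replacement the question actually asks for, here is a hedged sketch that keeps the lookbehind and lookahead on the two Arabic words but narrows the match to the digits only, so the surrounding spaces and the date survive. The AmountMasker class and Mask method are hypothetical names. If a pattern like this still has no effect on e.Row.Cells[3].Text, one possible cause (a guess, not confirmed by the question) is that the cell text is HTML-encoded, so it may be worth inspecting the raw string first.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AmountMasker
{
    // Digits preceded by "بقيمة" and followed by "بتاريخ", with optional
    // whitespace on either side. .NET allows variable-length lookbehind,
    // so \s* is permitted inside (?<=...).
    private static readonly Regex Amount =
        new Regex(@"(?<=بقيمة\s*)\d+(?=\s*بتاريخ)");

    public static string Mask(string text)
    {
        return Amount.Replace(text, "?");
    }

    public static void Main()
    {
        string txt = "خصم بقيمة 108 بتاريخ 31-01-2021";
        // The amount 108 is replaced by "?"; the date is left unchanged.
        Console.WriteLine(Mask(txt));
    }
}
```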

How to distinguish hexadecimal number and decimal number in Regex expression?

I'm not a native English speaker, so my English is poor.
If anything in this post comes across as rude, please understand.
I am using the regex library of .NET Core.
I wanted to distinguish hexadecimal numbers from decimal numbers, so I wrote some code to do this.
The code did not work, so I extracted the core logic into the code below in a new test project.
string identPattern = "(?<rwC>[_a-zA-Z][_a-zA-Z0-9]*)";
string hexaPattern = "(?<iAZ>[0x][0-9]+)";
string decimalPattern = "(?<oKZ>[0-9]+)";
string pattern = string.Format("{0}|{1}|{2}", identPattern, hexaPattern, decimalPattern);
foreach (var data in Regex.Matches("0x10;", pattern, RegexOptions.Multiline | RegexOptions.ExplicitCapture))
{
var matchData = data as Match;
}
If I run the above code, only the "0" of the "0x10" string is matched.
This means the "0x10" string is matched by decimalPattern ("(?<oKZ>[0-9]+)"). This is not the result I want.
I want the "0x10" string to be matched by hexaPattern.
Why isn't it matched by hexaPattern, and how can I solve this problem?
My test code and its execution result are as below.
Thanks for reading.

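There is an error in your hex pattern. You should not surround 0x with [], because that makes it a character set that matches only a single character (either 0 or x).
Try the pattern below for hex numbers:
(?<iAZ>0x[0-9]+)
Because the hex alternative comes before the decimal one in your combined pattern, the engine tries it first at each position, so "0x10" is then matched as hex rather than as a decimal "0". If your hex numbers can contain the digits A to F, use [0-9a-fA-F]+ instead of [0-9]+.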
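A small sketch of the corrected alternation, to show which named group wins for each token. The group names (ident, hex, dec), the TokenDemo class, and the sample input are assumptions chosen for readability; the hex alternative here also accepts an uppercase X and the digits A to F.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenDemo
{
    // Alternatives are tried left to right at each position,
    // so hex must be listed before decimal.
    private const string Pattern =
        "(?<ident>[_a-zA-Z][_a-zA-Z0-9]*)" +
        "|(?<hex>0[xX][0-9a-fA-F]+)" +
        "|(?<dec>[0-9]+)";

    public static List<string> Classify(string input)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern, RegexOptions.ExplicitCapture))
        {
            string kind = m.Groups["hex"].Success ? "hex"
                        : m.Groups["dec"].Success ? "decimal"
                        : "identifier";
            result.Add(m.Value + " -> " + kind);
        }
        return result;
    }

    public static void Main()
    {
        foreach (string line in Classify("0x10; 42; foo"))
            Console.WriteLine(line);
        // 0x10 -> hex
        // 42 -> decimal
        // foo -> identifier
    }
}
```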

Extract phone numbers and exclude extraneous characters

I'm trying to create a regex which will extract a complete phone number from a string (which is the only thing in the string) but leaving out any cruft like decorative brackets, etc.
The pattern I have mostly appears to work, but returns a list of matches - whereas I want it to return the phone number with the characters removed. Unfortunately, it completely fails if I add the start and end of line matchers...
^(?!\(\d+\)\s*){1}(?:[\+\d\s]*)$
Without the ^ and $ this matches the following numbers:
12345-678-901 returns three groups: 12345 678 901
+44-123-4567-8901 returns four groups: +44 123 4567 8901
(+48) 123 456 7890 returns four groups: +48 123 456 7890
How can I get the groups to be returned as a single, joined up whole?
Other than that, the only change I would like to include is to return nothing if there are any non-numeric, non-bracket, non-+ characters anywhere. So, this should fail:
(+48) 123 burger 7890

I'd keep it simple, makes it more readable and maintainable:
public string CleanPhoneNumber(string messynumber){
if(Regex.IsMatch(messynumber, "[a-z]"))
return "";
else
return Regex.Replace(messynumber, "[^0-9+]", "");
}
If any alphabetic characters are present (extend this range if you wish), return blank; otherwise replace every char that is not 0-9 or + with nothing. This produces output like 0123456789 and +481234567, with all the brackets, spaces, hyphens etc. removed too. If you want to keep those in the output, add them to the regex.
Side note: It's not immediately clear to me what you think is "cruft" that should be stripped (non a-z?) and what you think is "cruft" that should cause blank (a-z?). I struggled with this because you said (paraphrasing) "non-digit, non-bracket, non-plus should cause blank", but earlier in your examples your processing permitted numbers that had hyphens and spaces. Following the spec strictly, hyphens and spaces would be "cruft that causes the whole thing to return blank" too.
I've assumed that it's lowercase chars, based on the "burger" example, but as noted you can extend the range in the IF part should you need to include other chars that return blank.
If you have a lot of them to do, maybe precompile a regex as a class-level variable and use it in the method:
private Regex _strip = new Regex( "[^0-9+]", RegexOptions.Compiled);
public string CleanPhoneNumber(string messynumber){
if(Regex.IsMatch(messynumber, "[a-z]"))
return "";
else
return _strip.Replace(messynumber, "");
}
...
for(int x = 0; x < millionStrArray.Length; x++)
millionStrArray[x] = CleanPhoneNumber(millionStrArray[x]);
I don't think you'll gain much from compiling the IsMatch one, but you could try it in a similar pattern.
Other options exist if you're avoiding regex: you could even do it using LINQ, or by looping over char arrays, StringBuilders etc. Regex is probably the easiest in terms of short, maintainable code.
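For comparison, a rough sketch of the LINQ route mentioned above. The LinqPhoneCleaner class is a hypothetical name, and rejecting any letter via char.IsLetter is an assumption that is broader than the lowercase [a-z] check in the regex version.

```csharp
using System;
using System.Linq;

public static class LinqPhoneCleaner
{
    public static string Clean(string messyNumber)
    {
        // Any letter at all rejects the whole input.
        if (messyNumber.Any(char.IsLetter))
            return "";

        // Keep only ASCII digits and '+', dropping brackets, spaces, hyphens etc.
        return new string(messyNumber
            .Where(c => (c >= '0' && c <= '9') || c == '+')
            .ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(Clean("(+48) 123 456 7890"));    // +481234567890
        Console.WriteLine(Clean("12345-678-901"));         // 12345678901
        Console.WriteLine(Clean("(+48) 123 burger 7890")); // prints an empty line
    }
}
```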

The strategy here is to use a look ahead and kick out (fail) a match if word characters are found.
Then when there are no characters, it then captures the + and all numbers into a match group named "Phone". We then extract that from the match's "Phone" capture group and combine as such:
string pattern = @"
^
(?=[\W\d+\s]+\Z) # Only allows Non Words, decimals and spaces; stop match if letters found
(?<Phone>\+?) # If a plus found at the beginning; allow it
( # Group begin
(?:\W*) # Match but don't *capture* any non numbers
(?<Phone>[\d]+) # Put the numbers in.
)+ # 1 to many numbers.
";
var number = "+44-123-33-8901";
var phoneNumber =
string.Join(string.Empty,
Regex.Match(number,
pattern,
RegexOptions.IgnorePatternWhitespace // Allows us to comment the pattern
).Groups["Phone"]
.Captures
.OfType<Capture>()
.Select(cp => cp.Value));
// phoneNumber is `+44123338901`
If one looks at the match structure, the data it houses is this:
Match #0
[0]: +44-123-33-8901
["1"] → [1]: -8901
→1 Captures: 44, -123, -33, -8901
["Phone"] → [2]: 8901
→2 Captures: +, 44, 123, 33, 8901
As you can see match[0] contains the whole match, but we only need the captures under the "Phone" group. With those captures { +, 44, 123, 33, 8901 } we now can bring them all back together by the string.Join.

Extract numbers if string format matches

I want to check if an input string follows a pattern and if it does extract information from it.
My pattern looks like this: Episode 000 (Season 00). Each 0 stands for a digit from 0 to 9. Now I want to check whether the input Episode 094 (Season 02) matches this pattern and, because it does, extract those two numbers, so I end up with two integer variables, 94 and 2:
string latestFile = "Episode 094 (Season 02)";
if (!Regex.IsMatch(latestFile, @"^(Episode)\s[0-9][0-9][0-9]\s\((Season)\s[0-9][0-9]\)$"))
return;
int Episode = Int32.Parse(Regex.Match(latestFile, @"\d+").Value);
int Season = Int32.Parse(Regex.Match(latestFile, @"\d+").Value);
The first part, where I check whether the overall string matches the pattern, works, but I think it can be improved. For the second part, where I actually extract the numbers, I'm stuck: what I posted above obviously doesn't work, because both calls just grab the first run of digits in the string. If any of you could help me figure out how to extract only the three digits after Episode and the two digits after Season, that would be great.

^Episode (\d{1,3}) \(Season (\d{1,2})\)$
This captures the two numbers (allowing 1 to 3 digits for the episode and 1 to 2 for the season) and returns them as groups.
You can go even further and name your groups:
^Episode (?<episode>\d{1,3}) \(Season (?<season>\d{1,2})\)$
and then call them.
Example for using groups:
string pattern = @"abc(?<firstGroup>\d{1,3})abc";
string input = "abc234abc";
Regex rgx = new Regex(pattern);
Match match = rgx.Match(input);
string result = match.Groups["firstGroup"].Value; //=> 234
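Applied to the Episode/Season string itself, the named groups could be used roughly as follows. The EpisodeParser class and the TryParse helper are illustrative names, not from the question, and [0-9] is used in place of \d on the assumption that only ASCII digits should reach int.Parse (in .NET, \d also matches other Unicode decimal digits).

```csharp
using System;
using System.Text.RegularExpressions;

public static class EpisodeParser
{
    private static readonly Regex Pattern =
        new Regex(@"^Episode (?<episode>[0-9]{1,3}) \(Season (?<season>[0-9]{1,2})\)$");

    // Returns false when the input does not follow the expected format.
    public static bool TryParse(string input, out int episode, out int season)
    {
        episode = 0;
        season = 0;

        Match m = Pattern.Match(input);
        if (!m.Success)
            return false;

        episode = int.Parse(m.Groups["episode"].Value);
        season = int.Parse(m.Groups["season"].Value);
        return true;
    }

    public static void Main()
    {
        if (TryParse("Episode 094 (Season 02)", out int episode, out int season))
            Console.WriteLine(episode + " / " + season); // 94 / 2
    }
}
```

Doing the format check and the extraction with a single Match avoids running one regex to validate and further ones to pull the numbers out.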

In your regex ^(Episode)\s[0-9][0-9][0-9]\s\((Season)\s[0-9][0-9]\)$ you are capturing Episode and Season in a capturing group, but what you actually want to capture is the digits. You could switch your capturing groups like this:
^Episode\s([0-9][0-9][0-9])\s\(Season\s([0-9][0-9])\)$
Matching 3 digits in this way [0-9][0-9][0-9] can be written as \d{3} and [0-9][0-9] as \d{2}.
That would look like ^Episode\s(\d{3})\s\(Season\s(\d{2})\)$
To match one or more digits you could use \d+.
The \s matches a whitespace character. You could use either \s or a literal space.
Your regex could look like:
^Episode (\d{3}) \(Season (\d{2})\)$
string latestFile = "Episode 094 (Season 02)";
GroupCollection groups = Regex.Match(latestFile, @"^Episode (\d{3}) \(Season (\d{2})\)$").Groups;
int Episode = Int32.Parse(groups[1].Value);
int Season = Int32.Parse(groups[2].Value);
Console.WriteLine(Episode);
Console.WriteLine(Season);
That would result in:
94
2

Regex to extract digits followed by specific word

Using Regex, I want to extract digits which are followed by a specific word.
The number of digits is not finite.
Sample input:
My address is 1234#abc.com and you can send SMS to me.
Expected Result.
1234
In this case, the specific word is #abc.com, and the digits followed by this word need to be extracted.

Use regular expression groups (documented on MSDN).
In C#, try this:
string pattern = @"(\d+)#abc\.com";
string input = "My address is 15464684#abc.com and you can send SMS to me";
Match match = Regex.Match(input, pattern);
// Get the first capture group.
Group group1 = match.Groups[1];
Console.WriteLine("Group 1 value: {0}", group1.Success ? group1.Value : "Empty");

You will need to match 1234#abc.com and use a grouping to extract the digits:
(\d+)\#abc\.com

.* (\d+)#abc\.com .* should work.
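An alternative to a capture group is a lookahead, which asserts that the word follows without including it in the match. The sketch below assumes a hypothetical SuffixDigits.Extract helper and uses Regex.Escape so that characters such as '.' and '#' in the word are treated literally.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SuffixDigits
{
    // Returns the first run of digits immediately followed by suffix,
    // or an empty string if there is none.
    public static string Extract(string input, string suffix)
    {
        string pattern = @"\d+(?=" + Regex.Escape(suffix) + ")";
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : string.Empty;
    }

    public static void Main()
    {
        string input = "My address is 1234#abc.com and you can send SMS to me.";
        Console.WriteLine(Extract(input, "#abc.com")); // 1234
    }
}
```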
