How to extract contact numbers from long description field? - c#

My long input string contains contact numbers embedded in other text, like this:
sgsdgsdgs 123-456-7890 sdgsdgs (123) 456-7890 sdgsdgsdg 123 456 7890
sdgsdgsdg 123.456.7890 sdfsdfsdfs +91 (123) 456-7890
Now I want to extract all the numbers from the input, like this:
123-456-7890
(123) 456-7890
123 456 7890
123.456.7890
+91 (123) 456-7890
I want to store all these numbers in an array.
This is what I have tried, but I get only 2 numbers:
string pattern = @"^\s*(?:\+?(\d{1,3}))?[-. (]*(\d{3})[-. )]*(\d{3})[-. ]*(\d{4})(?: *x(\d+))?\s*$";
Regex reg = new Regex(pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase);
var a = txt.Split();
List<string> list = new List<string>();
foreach (var item in a) {
    if (reg.IsMatch(item)) {
        list.Add(item);
    }
}
Can anybody help me with this??

Do not use Split for this.
Just iterate over the matches and read each one's Groups[0].Value (the whole match), something like this:
foreach (Match m in MyRegex.Matches(myInput))
    Console.WriteLine(m.Groups[0].Value);
Tested on regexhero:
Regex: \s*(?:\+?(\d{1,3}))?[-. (]*(\d{3})[-. )]*(\d{3})[-. ]*(\d{4})(?:[ ]*x(\d+))?\s*
Input: sgsdgsdgs 123-456-7890 sdgsdgs (123) 456-7890 sdgsdgsdg 123 456 7890 sdgsdgsdg 123.456.7890 sdfsdfsdfs +91 (123) 456-7890
Output: 5 matches
123-456-7890
(123) 456-7890
123 456 7890
123.456.7890
+91 (123) 456-7890
edit: regexhero didn't like the space in the last group, had to replace it with [ ].

Try applying the regex directly to the whole string, like this:
using System.IO;
using System;
using System.Text.RegularExpressions;
using System.Collections.Generic;
class Program
{
    static void Main()
    {
        Regex regex = new Regex(@"\s*(?:\+?(\d{1,3}))?[-. (]*(\d{3})[-. )]*(\d{3})[-. ]*(\d{4})(?: *x(\d+))?\s*");
        Match match = regex.Match("sgsdgsdgs 123-456-7890 sdgsdgs (123) 456-7890 sdgsdgsdg 123 456 7890 sdgsdgsdg 123.456.7890 sdfsdfsdfs +91 (123) 456-7890");
        List<string> list = new List<string>();
        while (match.Success)
        {
            list.Add(match.Value);
            match = match.NextMatch();
        }
        list.ForEach(Console.WriteLine);
    }
}

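You are getting only two numbers because Split() with no arguments splits on whitespace. Numbers that contain spaces are broken into fragments, so only 123-456-7890 and 123.456.7890 survive as single tokens that the anchored pattern can match.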

Try this tested code.
static void Main(string[] args)
{
    string txt = "sgsdgsdgs 123-456-7890 sdgsdgs (123) 456-7890 sdgsdgsdg 123 456 7890 sdgsdgsdg 123.456.7890 sdfsdfsdfs +91 (123) 456-7890";
    Regex regex = new Regex(@"\s*(?:\+?(\d{1,3}))?[-. (]*(\d{3})[-. )]*(\d{3})[-. ]*(\d{4})(?: *x(\d+))?\s*");
    List<string> list = new List<string>();
    foreach (Match item in regex.Matches(txt))
    {
        list.Add(item.Value);
        Console.WriteLine(item.Value);
    }
    Console.ReadLine();
}
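The answers above share one idea: run the unanchored pattern over the whole string and collect every match. Here is a minimal sketch of that idea, assuming .NET 6 or later (top-level statements) and reusing the pattern from the answers; the variable name numbers is only illustrative. Because the pattern begins and ends with \s*, each match carries surrounding whitespace, so the sketch trims it before storing the values in an array:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string txt = "sgsdgsdgs 123-456-7890 sdgsdgs (123) 456-7890 sdgsdgsdg 123 456 7890 sdgsdgsdg 123.456.7890 sdfsdfsdfs +91 (123) 456-7890";

// Same unanchored pattern as in the answers above (no ^ or $).
string pattern = @"\s*(?:\+?(\d{1,3}))?[-. (]*(\d{3})[-. )]*(\d{3})[-. ]*(\d{4})(?: *x(\d+))?\s*";

// Collect every match, trimming the whitespace picked up by the leading/trailing \s*.
string[] numbers = Regex.Matches(txt, pattern)
    .Cast<Match>()
    .Select(m => m.Value.Trim())
    .ToArray();

foreach (string number in numbers)
    Console.WriteLine(number);
```

Dropping the two \s* parts from the pattern would likely make the Trim() unnecessary, but that variant is not what the answers tested.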

Related

Regex to split and ignore brackets

I need to split by comma in the text but the text also has a comma inside brackets which need to be ignored
Input text : Selectroasted peanuts,Sugars (sugar, fancymolasses),Hydrogenatedvegetable oil (cottonseed and rapeseed oil),Salt.
Expected output:
Selectroasted peanuts
Sugars (sugar, fancymolasses)
Hydrogenatedvegetable oil (cottonseed and rapeseed oil)
Salt
My code:
string pattern = @"\s*(?:""[^""]*""|\([^)]*\)|[^, ]+)";
string input = "Selectroasted peanuts,Sugars (sugar, fancymolasses),Hydrogenatedvegetable oil (cottonseed and rapeseed oil),Salt.";
foreach (Match m in Regex.Matches(input, pattern))
{
    Console.WriteLine("{0}", m.Value);
}
The output I am getting:
Selectroasted
peanuts
Sugars
(sugar, fancymolasses)
Hydrogenatedvegetable
oil
(cottonseed and rapeseed oil)
Salt
Please help.
You can use
string pattern = @"(?:""[^""]*""|\([^()]*\)|[^,])+";
string input = "Selectroasted peanuts,Sugars (sugar, fancymolasses),Hydrogenatedvegetable oil (cottonseed and rapeseed oil),Salt.";
foreach (Match m in Regex.Matches(input.TrimEnd(new[] {'!', '?', '.', '…'}), pattern))
{
    Console.WriteLine("{0}", m.Value);
}
// => Selectroasted peanuts
// Sugars (sugar, fancymolasses)
// Hydrogenatedvegetable oil (cottonseed and rapeseed oil)
// Salt
See the C# demo. See the regex demo, too. It matches one or more occurrences of
"[^"]*" - ", zero or more chars other than " and then a "
| - or
\([^()]*\) - a (, then any zero or more chars other than ( and ) and then a ) char
| - or
[^,] - a char other than a ,.
Note that the .TrimEnd(new[] {'!', '?', '.', '…'}) part in the code snippet removes the trailing sentence punctuation; if you can afford Salt. in the output, you can drop that part.
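An alternative to matching the items is to split on the separators. The sketch below uses a different technique from the answer above: it splits on every comma that is not followed by a closing parenthesis without an opening one in between. It assumes parentheses are balanced and never nested, and that the code runs as a .NET 6 or later top-level program.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Selectroasted peanuts,Sugars (sugar, fancymolasses),Hydrogenatedvegetable oil (cottonseed and rapeseed oil),Salt.";

// Split on commas outside parentheses: the negative lookahead rejects a comma
// when a ')' is reached before any other '(' or ')'.
string[] parts = Regex.Split(input.TrimEnd('.'), @",(?![^()]*\))");

foreach (string part in parts)
    Console.WriteLine(part.Trim());
```

With nested or unbalanced parentheses the lookahead can misjudge a comma, so the matching approach above may be the safer choice for messier input.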

Mask the Phone no using Regex in c#

We have a Phone no field, which can be of maximum 9 digit and we need to mask the Phone no (which is basically a string) to show on UI as a mask value.
We have tried below code snippet :
var pattern="^(/d{2})(/d{3})(/d*)$";
var regExp=new Regex(pattern);
return regExp.Replace(value, "$1-$2-$3");
this snippet works for 123456789 and displays (123) 456-789, but for 12345 it displays () 12-345.
Could you please suggest what will be the best suitable option here to display phone no as (123) 456-789 for 123456789 and (123) 45 for 12345 Phone no.
Try following :
string[] inputs = { "123456789", "12345" };
string pattern = @"^(?'one'\d{3})(?'two'\d{3})(?'three'\d{3})|(?'one'\d{3})(?'two'\d{2})";
string output = "";
foreach (string input in inputs)
{
    Match match = Regex.Match(input, pattern);
    // Group "three" is empty when only the 5-digit alternative matched.
    if (match.Groups["three"].Value == "")
        output = string.Format("({0}) {1}", match.Groups["one"].Value, match.Groups["two"].Value);
    else
        output = string.Format("({0}) {1}-{2}", match.Groups["one"].Value, match.Groups["two"].Value, match.Groups["three"].Value);
    Console.WriteLine(output);
}
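A possible generalization, offered as a sketch and not as the answer's own method: make the second and third groups optional so that any input of 1 to 9 digits is formatted progressively. The helper name FormatPhone is hypothetical, and the sketch assumes digits-only input and a .NET 6 or later top-level program.

```csharp
using System;
using System.Text.RegularExpressions;

foreach (string input in new[] { "123456789", "12345", "1234567" })
    Console.WriteLine(FormatPhone(input));

// Hypothetical helper: formats 1 to 9 digits as "(123) 456-789", omitting empty parts.
static string FormatPhone(string digits)
{
    Match m = Regex.Match(digits, @"^(\d{1,3})(\d{0,3})(\d{0,3})$");
    if (!m.Success)
        return digits; // not 1 to 9 digits: return the input unchanged

    string result = "(" + m.Groups[1].Value + ")";
    if (m.Groups[2].Length > 0) result += " " + m.Groups[2].Value;
    if (m.Groups[3].Length > 0) result += "-" + m.Groups[3].Value;
    return result;
}
```

The greedy quantifiers fill the groups from left to right, which is why 12345 yields (123) 45 instead of an empty first group.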

how to get a part from a string with regular expression in C#

How do I get 'Name' value and 'Age' value?
Case1 Data:
aaa bbbb; Name=John Lewis; ccc ddd; Age=20;
Case2 Data:
AAA bbbb; Age=21;
My regular expression:
(?:Name=(?'name'[\w\b]+)\;)[\s\S]*Age=(?'age'\d+)\;?
But no way to get values(Name, Age).
Case 1: Only Name is optional
A regex for your case should account for an optional Name field.
(?:\bName=(?<Name>[^;]+).*?;\s+)?\bAge=(?<Age>\d+)
^^^                            ^^
See the regex demo
If Name and Age data are on separate lines, use the regex with RegexOptions.Singleline flag.
Details:
(?:\bName=(?<Name>[^;]+).*?;\s+)? - an optional string of subpatterns
\bName= - a whole word "Name" + =
(?<Name>[^;]+) - Group "Name" capturing 1+ chars other than ;
.*? - any 0+ chars (other than newline if (?s) is not used)
; - a semi-colon
\s+ - 1 or more whitespaces
\bAge= - whole word Age + =
(?<Age>\d+) - Capturing group "Age" matching 1+ digits.
C# demo:
var strs = new[] { "aaa bbbb; Name=John Lewis; ccc ddd; Age=20;", "AAA bbbb; Age=21;" };
var pattern = @"(?:\bName=(?<Name>[^;]+).*?;\s+)?\bAge=(?<Age>\d+)";
foreach (var str in strs)
{
    var result = Regex.Match(str, pattern);
    if (result.Success)
        Console.WriteLine("Name: \"{0}\", Age: \"{1}\"", result.Groups["Name"].Value, result.Groups["Age"].Value);
}
// => Name: "John Lewis", Age: "20"
// Name: "", Age: "21"
Case 2: Both Name and Age are optional
Use optional groups for both fields:
(?:\bName=(?<Name>[^;]+).*?;\s+)?(?:\bAge=(?<Age>\d+))?
^^^                            ^^^^^                 ^^
See this C# demo
var strs = new[] { "aaa bbbb; Name=John Lewis; ccc ddd; Age=20;", "AAA bbbb; Age=21;", "Irrelevant", "My Name=Wiktor; no more data" };
var pattern = @"(?:\bName=(?<Name>[^;]+).*?;\s+)?(?:\bAge=(?<Age>\d+))?";
foreach (var str in strs)
{
    var results = Regex.Matches(str, pattern)
        .Cast<Match>()
        .Where(m => m.Groups["Name"].Success || m.Groups["Age"].Success)
        .Select(p => new {key=p.Groups["Name"].Value, val=p.Groups["Age"].Value} )
        .ToList();
    foreach (var r in results)
        Console.WriteLine("Name: \"{0}\", Age: \"{1}\"", r.key, r.val);
}
Else, if you want to use a more regex engine-friendly pattern, use an alternation with 2 branches where either of the two patterns are obligatory
(so as to avoid empty matches handling):
var strs = new[] { "aaa bbbb; Name=John Lewis; ccc ddd; Age=20;", "AAA bbbb; Age=21;", "Irrelevant", "My Name=Wiktor; no more data" };
var pattern = @"(?:\bName=(?<Name>[^;]+).*?;\s+)?\bAge=(?<Age>\d+)|\bName=(?<Name>[^;]+)(?:.*?;\s+\bAge=(?<Age>\d+))?";
foreach (var str in strs)
{
    var result = Regex.Match(str, pattern);
    if (result.Success)
    {
        Console.WriteLine("Name: \"{0}\", Age: \"{1}\"", result.Groups["Name"].Value, result.Groups["Age"].Value);
    }
}
See this C# demo
The (?:\bName=(?<Name>[^;]+).*?;\s+)?\bAge=(?<Age>\d+)|\bName=(?<Name>[^;]+)(?:.*?;\s+\bAge=(?<Age>\d+))? has 2 branches:
(?:\bName=(?<Name>[^;]+).*?;\s+)?\bAge=(?<Age>\d+) - the Name part is optional, Age is compulsory
| - or
\bName=(?<Name>[^;]+)(?:.*?;\s+\bAge=(?<Age>\d+))? - the Age part is optional, Name is compulsory
Here is the regex you want.
(?<Key>\w+?)=(?<Value>(?:\w|\s)+);
This pattern captures key/value pairs into the named groups Key and Value.
This solution will fail to function correctly if a key name contains a space.
C# Usage
using System;
using System.Text;
using System.Text.RegularExpressions;
using System.Linq;
public class Test
{
    public static void Main()
    {
        string input = @"aaa bbbb; Name=John Lewis; ccc ddd; Age=20;";
        string pattern = @"(?<Key>\w+?)=(?<Value>(?:\w|\s)+);";
        var matches = Regex.Matches(input, pattern);
        foreach (var match in matches.OfType<Match>())
        {
            string key = match.Groups["Key"].Value;
            string value = match.Groups["Value"].Value;
            Console.WriteLine(key + ": " + value);
        }
    }
}
Output
Name: John Lewis
Age: 20
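Building on the key/value idea, here is a sketch that collects the pairs into a dictionary so that a missing Name can be handled explicitly. The value pattern [^;]+ and the helper name ParseFields are assumptions made for illustration, not part of the answer above; the code assumes a .NET 6 or later top-level program.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

foreach (string line in new[] { "aaa bbbb; Name=John Lewis; ccc ddd; Age=20;", "AAA bbbb; Age=21;" })
{
    Dictionary<string, string> fields = ParseFields(line);
    // Fall back to a placeholder when a field is absent (as Name is in the second line).
    string name = fields.TryGetValue("Name", out var n) ? n : "(none)";
    string age = fields.TryGetValue("Age", out var a) ? a : "(none)";
    Console.WriteLine("Name: " + name + ", Age: " + age);
}

// Hypothetical helper: reads every "key=value;" pair into a dictionary.
static Dictionary<string, string> ParseFields(string input)
{
    var fields = new Dictionary<string, string>();
    foreach (Match m in Regex.Matches(input, @"(?<Key>\w+)=(?<Value>[^;]+);"))
        fields[m.Groups["Key"].Value] = m.Groups["Value"].Value;
    return fields;
}
```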
(?'name'\w+)[=]{1}(?'value'[\w ]+)
This Regex will give you name and value groups. In name you'll have Name or Age and in value John Lewis or 20
You can see how it works here.
...
.*?(?:Name=(?'name'[^;]*);)*.*?(?:Age=(?'age'\d*);)*

Regex mask replacement pattern for Phonenumber

I am new to regex.
I have this phone number regex pattern: \(?\d{3}\)?-? *\d{3}-? *-?\d{4}
I am trying to mask the phone number so that only the last 4 digits are displayed.
I am calling Regex.Replace("(123) 556-7890 ", @"\(?\d{3}\)?-? *\d{3}-? *-?\d{4}", "#")
Could someone tell me what the replacement pattern should be?
I need output like the following. The input can be XML or JSON.
Input
<PhoneNumber> (123) 556-7890 </PhoneNumber>
Output
<PhoneNumber>(XXX) XXX-7890 </PhoneNumber>
Input
<PhoneNumber> 123 556 7890 </PhoneNumber>
Output
<PhoneNumber>XXX XXX 7890 </PhoneNumber>
Input
<PhoneNumber> (123) 556- 7890 </PhoneNumber>
Output
<PhoneNumber>(XXX) XXX- 7890 </PhoneNumber>
You can use a regex to match any digit that is not within the last 4 digits at the end of the string, and replace with an X:
\d(?!\d{0,3}$)
Explanation:
\d - match a digit and...
(?!\d{0,3}$) - fail the match if only 0 to 3 digits remain between that digit and the end of the string.
See the regex demo and this C# demo:
var data = new string[] {"(123) 556-7890", "123 556 7890", "(123) 556- 7890"};
foreach (var s in data) {
    Console.WriteLine(Regex.Replace(s, @"\d(?!\d{0,3}$)", "X"));
}
Results:
(XXX) XXX-7890
XXX XXX 7890
(XXX) XXX- 7890
UPDATE showing how to use YOUR regex combined with mine
You just need to use your regex to match the phone numbers in the required format, and use mine to mask the digits inside a match evaluator:
var data = "I have this (123) 556-7890 phone number, followed with 123 556 7890, and (123) 556- 7890.";
var res = Regex.Replace(data, @"\(?\d{3}\)?-? *\d{3}-? *-?\d{4}",
    x => Regex.Replace(x.Value, @"\d(?!\d{0,3}$)", "X"));
Console.WriteLine(res);
See the IDEONE demo
NOTE that @"\(?\d{3}\)?-? *\d{3}-? *-?\d{4}\b" or @"\(?\d{3}\)?-? *\d{3}-? *-?\d{4}(?!\d)" might be better patterns for extracting phone numbers, because they ensure that the final 4 digits are not followed by a word character or by another digit, respectively.
If it's always the same number of digits, do you need to do a replace? Surely just taking the last four digits and putting (XXX) XXX- in front of it would achieve the same result?
string masked = "(XXX) XXX-" + input.Substring(input.Length - 4);
Obviously you should still use your original regex to make sure it's a valid phone number first.
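If the format is not fixed, a regex-free sketch along the same lines is to walk the string from the end, keep the last four digits, and replace every earlier digit with X while leaving punctuation untouched. The helper name MaskAllButLastFour is hypothetical, and the code assumes a .NET 6 or later top-level program.

```csharp
using System;

foreach (string s in new[] { "(123) 556-7890", "123 556 7890", "(123) 556- 7890" })
    Console.WriteLine(MaskAllButLastFour(s));

// Hypothetical helper: masks every digit except the last four, scanning right to left.
static string MaskAllButLastFour(string input)
{
    char[] chars = input.ToCharArray();
    int digitsSeen = 0;
    for (int i = chars.Length - 1; i >= 0; i--)
    {
        if (!char.IsDigit(chars[i]))
            continue; // spaces, dashes and parentheses are kept as-is
        digitsSeen++;
        if (digitsSeen > 4)
            chars[i] = 'X';
    }
    return new string(chars);
}
```

As with the Substring approach, this does no validation, so the original regex would still be needed to confirm that the input is a phone number.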
Couldn't you simply save the latest 4 digits with \d{4}$ and then mock-up the previous codes? :)
Try this
(\d)([() -]*)(\d)([() -]*)(\d)([() -]*)(\d)([() -]*)(\d)([() -]*)(\d)([() -]*)(\d+)
Substitution
x\2x\4x\6x\8x\10x\12\13
Regex demo
Input
Input
<PhoneNumber> (123) 556-7890 </PhoneNumber>
Output <PhoneNumber>(XXX) XXX-7890 </PhoneNumber>
Input
<PhoneNumber> 123 556 7890 </PhoneNumber>
Output <PhoneNumber>XXX XXX 7890 </PhoneNumber>
Input
<PhoneNumber> (123) 556- 7890 </PhoneNumber>
Output <PhoneNumber>(XXX) XXX- 7890 </PhoneNumber>
Output
Input
<PhoneNumber> (xxx) xxx-7890 </PhoneNumber>
Output <PhoneNumber>(XXX) XXX-7890 </PhoneNumber>
Input
<PhoneNumber> xxx xxx 7890 </PhoneNumber>
Output <PhoneNumber>XXX XXX 7890 </PhoneNumber>
Input
<PhoneNumber> (xxx) xxx- 7890 </PhoneNumber>
Output <PhoneNumber>(XXX) XXX- 7890 </PhoneNumber>
Check this function: pass it the phone number you want to mask and it returns the masked string.
function maskPhoneNumber(phoneNumber) {
    var regularExpression = /\(?\d{3}\)?-? *\d{3}-? *-?\d{4}/g, // regular expression to test phone numbers
        stringArray,
        maskString,
        lastString;
    // Check if given input matches the phone number pattern
    if (regularExpression.test(phoneNumber)) {
        // split phone number to an array of characters to manipulate string
        stringArray = phoneNumber.split("");
        /*
         * splice the array after reversing so that the last 4 digits are separated
         * Now stringArray will have the last 4 digits and maskString will have the remaining characters
         */
        maskString = stringArray.reverse().splice(4);
        // reverse and join the array to get the last 4 digits without any change
        lastString = stringArray.reverse().join("");
        // now replace the digits among the remaining characters with "X" and then join the array
        // concat masked string with last 4 digits to get the required format
        phoneNumber = maskString.reverse().join("").replace(/\d/g, "X") + lastString;
    }
    return phoneNumber;
}

C# regex search and replace all integer numbers surrounded by space

I apologize if this is a duplicated question. I haven't found a solution for my situation.
I would like to search for all integer numbers surrounded by space, and replace them with a space.
StringBuilder sb = new StringBuilder(" 123 123 456 789 fdsa jkl xyz x5x 456 456 123 123");
StringBuilder sbDigits = new StringBuilder(Regex.Replace(sb.ToString(), @"\s[0-9]+\s", " ", RegexOptions.Compiled));
sbDigits return value is, "123 789 fdsa jkl xyz x5x 456 123"
I would like the return value to be "fdsa jkl xyz x5x"
So, what is going on? How do I ensure that I am getting the duplicate number?
How about this:
Test string:
123 123 456 789 fdsa jkl xyz x5x 456 456 123 123 5x
Regex:
(?<=\s|^)[\d]+(?=\s|$)
Working example:
http://regex101.com/r/tJ5rA6
C#:
StringBuilder sb = new StringBuilder(" 123 123 456 789 fdsa jkl xyz x5x 456 456 123 123 5x");
StringBuilder sbDigits = new StringBuilder(Regex.Replace(sb.ToString(), @"(?<=\s|^)[\d]+(?=\s|$)", " ", RegexOptions.Compiled));
Return value:
fdsa jkl xyz x5x 5x
String fixed = Regex.Replace(originalString, "\\s*\\d+\\s+", "");
Try the following regex:
StringBuilder sb = new StringBuilder(" 123 123 456 789 fdsa jkl xyz x5x 456 456 123 123");
StringBuilder sbDigits = new StringBuilder(Regex.Replace(sb.ToString(), @"\s*[0-9]+\d\s*", " ", RegexOptions.Compiled));
Regex demo
You can use this:
search: @"( [0-9]+)(?=\1\b)"
replace: ""
If you add word boundaries (\b), you can capture only 'digit words' (which is what it sounds like you want). You can also capture zero or more whitespace characters around the digits while not matching digits inside words:
\s*\b\d+\b\s*
I don't know too much about Regex. But it can be done with a little LINQ:
var str = "123 789 fdsa jkl xyz x5x 456 123";
var parts = str.Split().Where(x => !x.All(char.IsDigit));
var result = string.Join(" ", parts); // fdsa jkl xyz x5x
What happened
Look what happens when you apply your regex, which matches a whitespace, any number of digits, and another whitespace:
"( 123 )123 456 789 fdsa jkl xyz x5x 456 456 123 123"
// regex engine matches " 123 ", first fitting pattern
"( 123 )123( 456 )789 fdsa jkl xyz x5x 456 456 123 123"
// regex engine matches " 456 ", because the first match "ate" the whitespace
"( 123 )123( 456 )789 fdsa jkl xyz x5x( 456 )456 123 123"
// matches the first " 456 "
"( 123 )123( 456 )789 fdsa jkl xyz x5x( 456 )456( 123 )123"
// matches " 123 "
"( 123 )123( 456 )789 fdsa jkl xyz x5x( 456 )456( 123 )123"
So the regex only found " 123 ", " 456 ", " 456 " and " 123 ". You replaced these matches with a whitespace and this is what caused your output.
What you want to do
You want to match word boundaries with something that won't "eat" the word boundary (here, the whitespace). As suggested by many others,
\b\d+\b
will do the trick.
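Putting that together, here is a short sketch: remove the standalone numbers with \b\d+\b, then collapse the leftover runs of whitespace. The second Replace and the Trim are additions for tidying the result, not part of the answers above, and the code assumes a .NET 6 or later top-level program.

```csharp
using System;
using System.Text.RegularExpressions;

string input = " 123 123 456 789 fdsa jkl xyz x5x 456 456 123 123";

// \b does not consume the surrounding spaces, so adjacent numbers all match;
// the 5 in "x5x" has no word boundary around it and is left alone.
string withoutNumbers = Regex.Replace(input, @"\b\d+\b", "");

// Collapse the whitespace left behind and trim both ends.
string result = Regex.Replace(withoutNumbers, @"\s+", " ").Trim();

Console.WriteLine(result);
```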
