C# regex search and replace all integer numbers surrounded by space - c#

I apologize if this is a duplicated question. I haven't found a solution for my situation.
I would like to search for all integer numbers surrounded by space, and replace them with a space.
StringBuilder sb = new StringBuilder(" 123 123 456 789 fdsa jkl xyz x5x 456 456 123 123");
StringBuilder sbDigits = new StringBuilder(Regex.Replace(sb.ToString(), @"\s[0-9]+\s", " ", RegexOptions.Compiled));
The resulting value of sbDigits is "123 789 fdsa jkl xyz x5x 456 123".
I would like the result to be "fdsa jkl xyz x5x".
So, what is going on? How do I make sure the adjacent (duplicate) numbers are matched as well?

How about this:
Test string:
123 123 456 789 fdsa jkl xyz x5x 456 456 123 123 5x
Regex:
(?<=\s|^)[\d]+(?=\s|$)
Working example:
http://regex101.com/r/tJ5rA6
C#:
StringBuilder sb = new StringBuilder(" 123 123 456 789 fdsa jkl xyz x5x 456 456 123 123 5x");
StringBuilder sbDigits = new StringBuilder(Regex.Replace(sb.ToString(), @"(?<=\s|^)[\d]+(?=\s|$)", " ", RegexOptions.Compiled));
Return value:
fdsa jkl xyz x5x 5x

// "fixed" is a reserved keyword in C#, so the variable needs a different name
string cleaned = Regex.Replace(originalString, "\\s*\\d+\\s+", "");

Try the following regex:
StringBuilder sb = new StringBuilder(" 123 123 456 789 fdsa jkl xyz x5x 456 456 123 123");
StringBuilder sbDigits = new StringBuilder(Regex.Replace(sb.ToString(), @"\s*[0-9]+\d\s*", " ", RegexOptions.Compiled));

You can use this:
search: @"( [0-9]+)(?=\1\b)"
replace: ""

If you add in word breaks (\b), you can capture only 'digit words' (which is what it sounds like you want). You can also capture zero or more whitespace characters around the digits while not matching the numbers inside letters:
\s*\b\d+\b\s*

I don't know too much about Regex. But it can be done with a little LINQ:
var str = "123 789 fdsa jkl xyz x5x 456 123";
var parts = str.Split().Where(x => !x.All(char.IsDigit));
var result = string.Join(" ", parts); // fdsa jkl xyz x5x

What happened
Look what happens when you apply your regex, which matches a whitespace, any number of digits, and another whitespace:
"( 123 )123 456 789 fdsa jkl xyz x5x 456 456 123 123"
// regex engine matches " 123 ", first fitting pattern
"( 123 )123( 456 )789 fdsa jkl xyz x5x 456 456 123 123"
// regex engine matches " 456 ", because the first match "ate" the whitespace
"( 123 )123( 456 )789 fdsa jkl xyz x5x( 456 )456 123 123"
// matches the first " 456 "
"( 123 )123( 456 )789 fdsa jkl xyz x5x( 456 )456( 123 )123"
// matches " 123 "
"( 123 )123( 456 )789 fdsa jkl xyz x5x( 456 )456( 123 )123"
So the regex only found " 123 ", " 456 ", " 456 " and " 123 ". You replaced these matches with a whitespace and this is what caused your output.
What you want to do
You want to match word boundaries with something that won't "eat" the word boundary (here, the whitespace). As suggested by many others,
\b\d+\b
will do the trick.
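A minimal sketch of the \b\d+\b approach applied to the string from the question. It assumes that leftover runs of spaces should be collapsed afterwards; the helper name RemoveIntegerWords is hypothetical, and the code is written as C# 9 top-level statements for brevity.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: removes stand-alone integers, then tidies the spacing.
static string RemoveIntegerWords(string input)
{
    // \b asserts a word boundary without consuming the surrounding whitespace,
    // so adjacent numbers such as "123 123" are both matched.
    string withoutNumbers = Regex.Replace(input, @"\b\d+\b", "");
    // Collapse the leftover runs of whitespace and trim both ends.
    return Regex.Replace(withoutNumbers, @"\s+", " ").Trim();
}

Console.WriteLine(RemoveIntegerWords(" 123 123 456 789 fdsa jkl xyz x5x 456 456 123 123"));
// prints: fdsa jkl xyz x5x
```

Note that \b also treats punctuation as a boundary, so a number such as the 5 in "a-5-b" would be removed as well; the lookaround pattern (?<=\s|^)\d+(?=\s|$) is stricter if only whitespace should count.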

Related

How to find strings between two strings in c#

I'm getting a string with some patterns, like:
A 11 A 222222 B 333 A 44444 B 55 A 66666 B
How to get all the strings between A and B in the smallest area?
For example, "A 11 A 222222 B" result in " 222222 "
And the first example should result in:
222222
333
44444
55
66666
We can try searching for all regex matches in your input string which are situated between A and B, or vice-versa. Here is a regex pattern which uses lookarounds to do this:
(?<=\bA )\d+(?= B\b)|(?<=\bB )\d+(?= A\b)
Sample script:
string input = "A 11 A 222222 B 333 A 44444 B 55 A 66666 B";
var vals = Regex.Matches(input, @"(?<=\bA )\d+(?= B\b)|(?<=\bB )\d+(?= A\b)")
.Cast<Match>()
.Select(m => m.Value)
.ToArray();
foreach (string val in vals)
{
Console.WriteLine(val);
}
This prints:
222222
333
44444
55
66666

Restart Regex indexing based on line content

I am trying to load a text file into a C# ContentBox and index the lines.
Currently the text file contains
Data
Company
Phone
Email
Company
Phone
Email
I have currently set up a Regex index to number all the lines in the text file on load.
string content = File.ReadAllText(file);
content = Regex.Replace(content, @"^\s*$\n|\r", string.Empty, RegexOptions.Multiline).TrimEnd();
var index = 0;
content = Regex.Replace(content, "^", (Match m) => (index++).ToString().PadLeft(4, '0') + " ", RegexOptions.Multiline);
ContentBox.Text = content;
This outputs
0000 Data
0001 Company
0002 Phone
0003 Email
0004 Company
0005 Phone
0006 Email
What I need to do is be able to output the following into the ContentBox.
0000 Data
0001 Company
0002 Phone
0003 Email
0001 Company
0002 Phone
0003 Email
Can anyone assist me with this?
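Your use of Regex is not needed. This code should give you the direction you need:
// ReadAllLines returns one string per line; ReadAllText would return a single
// string, and Select would then iterate over its characters.
string[] content =
    File
        .ReadAllLines(file)
        .Select((x, n) => $"{(n == 0 ? 0 : (n - 1) % 3 + 1):0000} {x}")
        .ToArray();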
Here's an example of it working:
string[] source = @"Data
ABC Bakery
0123
abc@bakery.com
DEF Pets
0124
def@pets.com".Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
string[] content =
    source
        .Select((x, n) => $"{(n == 0 ? 0 : (n - 1) % 3 + 1):0000} {x}")
        .ToArray();
That gives:
0000 Data
0001 ABC Bakery
0002 0123
0003 abc@bakery.com
0001 DEF Pets
0002 0124
0003 def@pets.com
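The modulo approach above assumes every record has exactly three lines. If the restart should instead depend on the line content, as the question title suggests, a sketch along these lines may help. It assumes a record always begins with a known marker line ("Company" in the question's sample); the helper name NumberLines and its restartMarker parameter are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: numbers lines and restarts at 1 on each marker line.
static List<string> NumberLines(IEnumerable<string> lines, string restartMarker)
{
    var result = new List<string>();
    int index = 0;
    foreach (string line in lines)
    {
        // A marker line starts a new record, so numbering goes back to 1.
        if (line == restartMarker) index = 1;
        result.Add($"{index:0000} {line}");
        index++;
    }
    return result;
}

var numbered = NumberLines(
    new[] { "Data", "Company", "Phone", "Email", "Company", "Phone", "Email" },
    "Company");
Console.WriteLine(string.Join(Environment.NewLine, numbered));
```

Unlike the modulo version, this keeps working when a record has extra or missing lines, at the cost of needing a recognizable first line for each record.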

Regex mask replacement pattern for Phonenumber

I am new to regex
I have a phone number regex pattern: \(?\d{3}\)?-? *\d{3}-? *-?\d{4}
I am trying to mask phone numbers to display only the last 4 digits.
I am using Regex.Replace("(123) 556-7890 ", @"\(?\d{3}\)?-? *\d{3}-? *-?\d{4}", "#")
Could someone tell me what the replacement pattern should be?
I need output like the following. The input can be XML or JSON.
Input
<PhoneNumber> (123) 556-7890 </PhoneNumber>
Output
<PhoneNumber>(XXX) XXX-7890 </PhoneNumber>
Input
<PhoneNumber> 123 556 7890 </PhoneNumber>
Output
<PhoneNumber>XXX XXX 7890 </PhoneNumber>
Input
<PhoneNumber> (123) 556- 7890 </PhoneNumber>
Output
<PhoneNumber>(XXX) XXX- 7890 </PhoneNumber>
You can use a regex to match any digit that is not within the last 4 digits at the end of the string, and replace with an X:
\d(?!\d{0,3}$)
Explanation:
\d - match a digit and...
(?!\d{0,3}$) - fail the match if that digit is followed by 0 to 3 more digits and then the end of the string, i.e. if it is one of the last 4 digits.
See this C# demo:
var data = new string[] {"(123) 556-7890", "123 556 7890", "(123) 556- 7890"};
foreach (var s in data) {
    Console.WriteLine(Regex.Replace(s, @"\d(?!\d{0,3}$)", "X"));
}
Results:
(XXX) XXX-7890
XXX XXX 7890
(XXX) XXX- 7890
UPDATE showing how to use YOUR regex combined with mine
You just need to use your regex to match the phone numbers in the required format, and use mine to mask the digits inside a match evaluator:
var data = "I have this (123) 556-7890 phone number, followed with 123 556 7890, and (123) 556- 7890.";
var res = Regex.Replace(data, @"\(?\d{3}\)?-? *\d{3}-? *-?\d{4}",
    x => Regex.Replace(x.Value, @"\d(?!\d{0,3}$)", "X"));
Console.WriteLine(res);
NOTE that @"\(?\d{3}\)?-? *\d{3}-? *-?\d{4}\b" or @"\(?\d{3}\)?-? *\d{3}-? *-?\d{4}(?!\d)" might be better patterns for extracting phone numbers, because they ensure the final 4 digits are not followed by a word character or by another digit, respectively.
If it's always the same number of digits, do you need to do a replace? Surely just taking the last four digits and putting (XXX) XXX- in front of it would achieve the same result?
string masked = "(XXX) XXX-" + input.Substring(input.Length - 4);
Obviously you should still use your original regex to make sure it's a valid phone number first.
Couldn't you simply capture the last 4 digits with \d{4}$ and then mock up the preceding part? :)
Try this
(\d)([() -]*)(\d)([() -]*)(\d)([() -]*)(\d)([() -]*)(\d)([() -]*)(\d)([() -]*)(\d+)
Substitution
x\2x\4x\6x\8x\10x\12\13
Input
Input
<PhoneNumber> (123) 556-7890 </PhoneNumber>
Output <PhoneNumber>(XXX) XXX-7890 </PhoneNumber>
Input
<PhoneNumber> 123 556 7890 </PhoneNumber>
Output <PhoneNumber>XXX XXX 7890 </PhoneNumber>
Input
<PhoneNumber> (123) 556- 7890 </PhoneNumber>
Output <PhoneNumber>(XXX) XXX- 7890 </PhoneNumber>
Output
Input
<PhoneNumber> (xxx) xxx-7890 </PhoneNumber>
Output <PhoneNumber>(XXX) XXX-7890 </PhoneNumber>
Input
<PhoneNumber> xxx xxx 7890 </PhoneNumber>
Output <PhoneNumber>XXX XXX 7890 </PhoneNumber>
Input
<PhoneNumber> (xxx) xxx- 7890 </PhoneNumber>
Output <PhoneNumber>(XXX) XXX- 7890 </PhoneNumber>
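The substitution string in this answer uses \2-style group references, which is not the syntax .NET expects in a replacement string. A hedged sketch of the same idea for Regex.Replace follows; .NET writes group references as $n or ${n}, and the braces avoid any ambiguity for groups 10 and above. The helper name MaskPhone is hypothetical, and an uppercase X is used to match the output requested in the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: masks the first six digits and keeps separators and the rest.
string MaskPhone(string input)
{
    // Pattern copied from the answer: six single digits, each followed by
    // optional separators, then the remaining digits.
    const string pattern =
        @"(\d)([() -]*)(\d)([() -]*)(\d)([() -]*)(\d)([() -]*)(\d)([() -]*)(\d)([() -]*)(\d+)";
    // Even-numbered groups hold the separators; group 13 holds the trailing digits.
    const string replacement = "X${2}X${4}X${6}X${8}X${10}X${12}${13}";
    return Regex.Replace(input, pattern, replacement);
}

Console.WriteLine(MaskPhone("(123) 556-7890"));  // prints: (XXX) XXX-7890
Console.WriteLine(MaskPhone("123 556 7890"));    // prints: XXX XXX 7890
Console.WriteLine(MaskPhone("(123) 556- 7890")); // prints: (XXX) XXX- 7890
```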
Check this function: pass it the phone number you want to mask, and it returns the masked string.
function maskPhoneNumber(phoneNumber) {
    var regularExpression = /\(?\d{3}\)?-? *\d{3}-? *-?\d{4}/g, // regular expression to test phone numbers
        stringArray,
        maskString,
        lastString;
    // Check if the given input matches the phone number pattern
    if (regularExpression.test(phoneNumber)) {
        // split the phone number into an array of characters to manipulate the string
        stringArray = phoneNumber.split("");
        /*
         * Splice the array after reversing so that the last 4 characters are separated.
         * Now stringArray holds the last 4 characters and maskString holds the remaining ones.
         */
        maskString = stringArray.reverse().splice(4);
        // reverse and join the array to get the last 4 characters unchanged
        lastString = stringArray.reverse().join("");
        // replace the digits in the remaining characters with "X", then
        // concatenate the masked string with the last 4 characters
        phoneNumber = maskString.reverse().join("").replace(/\d/g, "X") + lastString;
    }
    return phoneNumber;
}

How to extract contact numbers from long description field?

This is my long input string, which contains contact numbers scattered through it, like below:
sgsdgsdgs 123-456-7890 sdgsdgs (123) 456-7890 sdgsdgsdg 123 456 7890
sdgsdgsdg 123.456.7890 sdfsdfsdfs +91 (123) 456-7890
Now I want to extract all the numbers from the input, like:
123-456-7890
(123) 456-7890
123 456 7890
123.456.7890
+91 (123) 456-7890
I want to store all these numbers in an array.
This is what I have tried, but I am getting only 2 numbers:
string pattern = @"^\s*(?:\+?(\d{1,3}))?[-. (]*(\d{3})[-. )]*(\d{3})[-. ]*(\d{4})(?: *x(\d+))?\s*$";
Regex reg = new Regex(pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase);
var a = txt.Split();
List < string > list = new List < string > ();
foreach(var item in a) {
if (reg.IsMatch(item)) {
list.Add(item);
}
}
Can anybody help me with this??
Do not use Split for this.
Just go through the matches and get their Groups[0].Value; it should be something like this:
foreach (Match m in MyRegex.Matches(myInput))
    Console.WriteLine(m.Groups[0].Value);
Tested on regexhero:
Regex: \s*(?:\+?(\d{1,3}))?[-. (]*(\d{3})[-. )]*(\d{3})[-. ]*(\d{4})(?:[ ]*x(\d+))?\s*
Input: sgsdgsdgs 123-456-7890 sdgsdgs (123) 456-7890 sdgsdgsdg 123 456 7890 sdgsdgsdg 123.456.7890 sdfsdfsdfs +91 (123) 456-7890
Output: 5 matches
123-456-7890
(123) 456-7890
123 456 7890
123.456.7890
+91 (123) 456-7890
edit: regexhero didn't like the space in the last group, had to replace it with [ ].
Try to use regex directly on a String, like:
using System.IO;
using System;
using System.Text.RegularExpressions;
using System.Collections.Generic;
class Program
{
static void Main()
{
Regex regex = new Regex(@"\s*(?:\+?(\d{1,3}))?[-. (]*(\d{3})[-. )]*(\d{3})[-. ]*(\d{4})(?: *x(\d+))?\s*");
Match match = regex.Match("sgsdgsdgs 123-456-7890 sdgsdgs (123) 456-7890 sdgsdgsdg 123 456 7890 sdgsdgsdg 123.456.7890 sdfsdfsdfs +91 (123) 456-7890");
List < string > list = new List < string > ();
while (match.Success)
{
list.Add(match.Value);
match = match.NextMatch();
}
list.ForEach(Console.WriteLine);
}
}
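You are getting only two numbers because Split() with no arguments splits on whitespace, so every number that contains a space is broken into pieces that no longer match the anchored pattern.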
Try this tested code.
static void Main(string[] args)
{
string txt = "sgsdgsdgs 123-456-7890 sdgsdgs (123) 456-7890 sdgsdgsdg 123 456 7890 sdgsdgsdg 123.456.7890 sdfsdfsdfs +91 (123) 456-7890";
Regex regex = new Regex(@"\s*(?:\+?(\d{1,3}))?[-. (]*(\d{3})[-. )]*(\d{3})[-. ]*(\d{4})(?: *x(\d+))?\s*");
List<string> list = new List<string>();
foreach (var item in regex.Matches(txt))
{
list.Add(item.ToString());
Console.WriteLine(item);
}
Console.ReadLine();
}
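Because the pattern used in these answers begins and ends with \s*, each match value can carry surrounding whitespace. A minimal sketch, assuming the numbers should end up trimmed in an array as the question asks; the helper name ExtractPhoneNumbers is hypothetical, and the pattern is the one from the answers above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every phone-number match, trimmed, as an array.
static string[] ExtractPhoneNumbers(string text)
{
    var regex = new Regex(
        @"\s*(?:\+?(\d{1,3}))?[-. (]*(\d{3})[-. )]*(\d{3})[-. ]*(\d{4})(?: *x(\d+))?\s*");
    return regex.Matches(text)
        .Cast<Match>()               // MatchCollection is non-generic on older frameworks
        .Select(m => m.Value.Trim()) // drop whitespace captured by the \s* at either end
        .ToArray();
}

string txt = "sgsdgsdgs 123-456-7890 sdgsdgs (123) 456-7890 sdgsdgsdg 123 456 7890 " +
             "sdgsdgsdg 123.456.7890 sdfsdfsdfs +91 (123) 456-7890";
foreach (string number in ExtractPhoneNumbers(txt))
{
    Console.WriteLine(number);
}
```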

How to replace multiple occurrences in single pass?

I have the following string:
abc
def
abc
xyz
pop
mmm
091
abc
I need to replace all occurrences of abc with the ones from array ["123", "456", "789"] so the final string will look like this:
123
def
456
xyz
pop
mmm
091
789
I would like to do it without iteration, with just single expression. How can I do it?
Here is a "single expression version":
edit: Delegate instead of Lambda for 3.5
string[] replaces = {"123","456","789"};
Regex regEx = new Regex("abc");
int index = 0;
string result = regEx.Replace(input, delegate(Match match) { return replaces[index++];} );
do it without iteration, with just single expression
This example uses the static Regex.Replace(String, String, MatchEvaluator) method, which takes a MatchEvaluator delegate (System.Text.RegularExpressions). The delegate replaces each match with the next value from a queue, and the method returns the resulting string:
var data =
@"abc
def
abc
xyz
pop
mmm
091
abc";
var replacements = new Queue<string>(new[] {"123", "456", "789"});
string result = Regex.Replace(data,
"(abc)", // Each match is replaced with the next
(mt) => // queue item instead of one fixed string.
{
return replacements.Dequeue();
});
Result
123
def
456
xyz
pop
mmm
091
789
.Net 3.5 Delegate
whereas I am limited to 3.5.
Regex.Replace(data, "(abc)", delegate(Match match) { return replacements.Dequeue(); } )
