Regex to find and remove char with specific pattern - c#

I have this string
This is mail#mail.text #1 but page is #001#
(001 is variable, ex 01a or 021 etc)
And I want to make it
This is mail#mail.text #1 but page is 001
With this ^#([0-9]{1,3})#\z i can find a string that starts with "#" and ends with "#" with max 3 chars inside but it doesn't match inside a entire text.

You need to remove ^ (start of the string anchor) and replace the match with the contents of Group 1 using $1 backreference:
var str = "This is mail#mail.text #1 but page is #001#";
var result = Regex.Replace(str, #"#([0-9]{1,3})#\z", "$1");
See the regex demo
The #([0-9]{1,3})#\z pattern will find #, 1 to 3 digits (put inside a group), and then a # at the very end of string (\z).
Another variation: if the value may start with a digit and can be followed with an ASCII letter or digit, use
var result = Regex.Replace(str, #"#([0-9][0-9a-zA-Z]{0,2})#\z", "$1");
And if the value can just be alphanumeric, just use
var result = Regex.Replace(str, #"#([0-9a-zA-Z]{1,3})#\z", "$1");

Related

Regex only letters except set of numbers

I'm using Replace(#"[^a-zA-Z]+", "");
leave only letters, but I have a set of numbers or characters that I want to keep as well, ex: 122456 and 112466. But I'm having trouble leaving it only if it's this sequence:
ex input:
abc 1239 asm122456000
I want to:
abscasm122456
tried this: ([^a-zA-Z])+|(?!122456)
My answer doesn't applying Replace(), but achieves a similar result:
(?:[a-zA-Z]+|\d{6})
which captures the group (non-capturing group) with the alphabetic character(s) or a set of digits with 6 occurrences.
Regex 101 & Test Result
Join all the matching values into a single string.
using System.Linq;
Regex regex = new Regex("(?:[a-zA-Z]+|\\d{6})");
string input = "abc 1239 asm12245600";
string output = "";
var matches = regex.Matches(input);
if (matches.Count > 0)
output = String.Join("", matches.Select(x => x.Value));
Sample .NET Fiddle
Alternate way,
using .Split() and .All(),
string input = "abc 1239 asm122456000";
string output = string.Join("", input.Split().Where(x => !x.All(char.IsDigit)));
.NET Fiddle
It is very simple: you need to match and capture what you need to keep, and just match what you need to remove, and then utilize a backreference to the captured group value in the replacement pattern to put it back into the resulting string.
Here is the regex:
(122456|112466)|[^a-zA-Z]
See the regex demo. Details:
(122456|112466) - Capturing group with ID 1: either of the two alternatives
| - or
[^a-zA-Z] - a char other than an ASCII letter (use \P{L} if you need to match any char other than any Unicode letter).
Note the removed + quantifier as [^A-Za-z] also matches digits.
You need to use $1 in the replacement:
var result = Regex.Replace(text, #"(122456|112466)|[^a-zA-Z]", "$1");

How do I replace all instances of any special characters between each occurrence of a set of delimiters in a string?

I'm attempting to replace all instances of any special characters between each occurrence of a set of delimiters in a string. I believe the solution will include some combination of a regular expression match to retrieve the text between each set of delimiters and a regular expression replace to replace each offending character within the match with a space. Here’s what I have so far:
string input = "***XX*123456789~N3*123 E. Fake St. Apt# 456~N4*Beverly Hills*CA*902122405~REF*EI*902122405~HL*1*1*50*0~SBR*P*18*******MA~NM1*IL*1*Tom*Thompson*T***MI*123456789A~N3*456 W. False Ave.*Apt. #6B~N4*Beverly Hills*CA*90210~DMG*";
string matchPattern = "(~N3\\*)(.*?)(~N4\\*)";
string replacePattern = "[^0-9a-zA-Z ]?";
var matches = Regex.Matches(input, matchPattern);
foreach (Match match in matches)
{
match.Value = "~N3*" + Regex.Replace(match.Value, replacePattern, " ") + "~N4*";
}
MessageBox.Show(input);
I would expect the message box to show the following:
"***XX*123456789~N3*123 E Fake St Apt 456~N4*Beverly Hills*CA*902122405~REF*EI*902122405~HL*1*1*50*0~SBR*P*18*******MA~NM1*IL*1*Tom*Thompson*T***MI*123456789A~N3*456 W False Ave *Apt 6B~N4*Beverly Hills*CA*90210~DMG*"
Obviously this isn’t working because I can’t assign to the matched value inside the loop, but I hope you can follow my thought process. It is important that any characters which are not between the delimiters remain unchanged. Any direction or advice would be helpful. Thank you so much!
Use a Regex.Replace with a match evaluator where you may call the second Regex.Replace:
string input = "***XX*123456789~N3*123 E. Fake St. Apt# 456~N4*Beverly Hills*CA*902122405~REF*EI*902122405~HL*1*1*50*0~SBR*P*18*******MA~NM1*IL*1*Tom*Thompson*T***MI*123456789A~N3*456 W. False Ave.*Apt. #6B~N4*Beverly Hills*CA*90210~DMG*";
string matchPattern = #"(~N3\*)(.*?)(~N4\*)";
string replacePattern = "[^0-9a-zA-Z ]";
string res = Regex.Replace(input, matchPattern, m =>
string.Format("{0}{1}{2}",
m.Groups[1].Value,
Regex.Replace(m.Groups[2].Value, replacePattern, " "), // Here, you modify just inside the 1st regex matches
m.Groups[3].Value));
Console.Write(res); // Just to print the demo result
// => ***XX*123456789~N3*123 E Fake St Apt 456~N4*Beverly Hills*CA*902122405~REF*EI*902122405~HL*1*1*50*0~SBR*P*18*******MA~NM1*IL*1*Tom*Thompson*T***MI*123456789A~N3*456 W False Ave Apt 6B~N4*Beverly Hills*CA*90210~DMG*
See the C# demo
Actually, since ~N3* and ~N4* are literal strings, you may use a single capturing group in the pattern and then add those delimiters as hard-coded in the match evaluator, but it is up to you to decide what suits you best.

Regex pattern for splitting a delimited string in curly braces

I have the following string
{token1;token2;token3#somewhere.com;...;tokenn}
I need a Regex pattern, that would give a result in array of strings such as
token1
token2
token3#somewhere.com
...
...
...
tokenn
Would also appreciate a suggestion if can use the same pattern to confirm the format of the string, means string should start and end in curly braces and at least 2 values exist within the anchors.
You may use an anchored regex with named repeated capturing groups:
\A{(?<val>[^;]*)(?:;(?<val>[^;]*))+}\z
See the regex demo
\A - start of string
{ - a {
(?<val>[^;]*) - Group "val" capturing 0+ (due to * quantifier, if the value cannot be empty, use +) chars other than ;
(?:;(?<val>[^;]*))+ - 1 or more occurrences (thus, requiring at least 2 values inside {...}) of the sequence:
; - a semi-colon
(?<val>[^;]*) - Group "val" capturing 0+ chars other than ;
} - a literal }
\z - end of string.
.NET regex keeps each capture in a CaptureCollection stack, that is why all the values captured into "num" group can be accessed after a match is found.
C# demo:
var s = "{token1;token2;token3;...;tokenn}";
var pat = #"\A{(?<val>[^;]*)(?:;(?<val>[^;]*))+}\z";
var caps = new List<string>();
var result = Regex.Match(s, pat);
if (result.Success)
{
caps = result.Groups["val"].Captures.Cast<Capture>().Select(t=>t.Value).ToList();
}
Read it(similar to your problem): How to keep the delimiters of Regex.Split?.
For your RegEx testing use this: http://www.regexlib.com/RETester.aspx?AspxAutoDetectCookieSupport=1.
But RegEx is a very resource-intensive, slow operation.
In your case will be better to use the Split method of string class, for example : "token1;token2;token3;...;tokenn".Split(';');. It will return to you a collection of strings, that you want to obtain.

search string for everything before a set of characters in C#

I'm looking for a way to search a string for everything before a set of characters in C#. For Example, if this is my string value:
This is is a test.... 12345
I want build a new string with all of the characters before "12345".
So my new string would equal "This is is a test.... "
Is there a way to do this?
I've found Regex examples where you can focus on one character but not a sequence of characters.
You don't need to use a Regex:
public string GetBitBefore(string text, string end)
{
var index = text.IndexOf(end);
if (index == -1) return text;
return text.Substring(0, index);
}
You can use a lazy quantifier to match anything, followed by a lookahead:
var match = Regex.Match("This is is a test.... 12345", #".*?(?=\d{5})");
where:
.*? lazily matches everything (up to the lookahead)
(?=…) is a positive lookahead: the pattern must be matched, but is not included in the result
\d{5} matches exactly five digits. I'm assuming this is your lookahead; you can replace it
You can do so with help of regex lookahead.
.*(?=12345)
Example:
var data = "This is is a test.... 12345";
var rxStr = ".*(?=12345)";
var rx = new System.Text.RegularExpressions.Regex (rxStr,
System.Text.RegularExpressions.RegexOptions.IgnoreCase);
var match = rx.Match(data);
if (match.Success) {
Console.WriteLine (match.Value);
}
Above code snippet will print every thing upto 12345:
This is is a test....
For more detail about see regex positive lookahead
This should get you started:
var reg = new Regex("^(.+)12345$");
var match = reg.Match("This is is a test.... 12345");
var group = match.Groups[1]; // This is is a test....
Of course you'd want to do some additional validation, but this is the basic idea.
^ means start of string
$ means end of string
The asterisk tells the engine to attempt to match the preceding token zero or more times. The plus tells the engine to attempt to match the preceding token once or more
{min,max} indicate the minimum/maximum number of matches.
\d matches a single character that is a digit, \w matches a "word character" (alphanumeric characters plus underscore), and \s matches a whitespace character (includes tabs and line breaks).
[^a] means not so exclude a
The dot matches a single character, except line break characters
In your case there many way to accomplish the task.
Eg excluding digit: ^[^\d]*
If you know the set of characters and they are not only digit, don't use regex but IndexOf(). If you know the separator between first and second part as "..." you can use Split()
Take a look at this snippet:
class Program
{
static void Main(string[] args)
{
string input = "This is is a test.... 12345";
// Here we call Regex.Match.
MatchCollection matches = Regex.Matches(input, #"(?<MySentence>(\w+\s*)*)(?<MyNumberPart>\d*)");
foreach (Match item in matches)
{
Console.WriteLine(item.Groups["MySentence"]);
Console.WriteLine("******");
Console.WriteLine(item.Groups["MyNumberPart"]);
}
Console.ReadKey();
}
}
You could just split, not as optimal as the indexOf solution
string value = "oiasjdoiasj12345";
string end = "12345";
string result = value.Split(new string[] { end }, StringSplitOptions.None)[0] //Take first part of the result, not the quickest but fairly simple

Remove numbers in specific part of string (within parentheses)

I have a string Test123(45) and I want to remove the numbers within the parenthesis. How would I go about doing that?
So far I have tried the following:
string str = "Test123(45)";
string result = Regex.Replace(str, "(\\d)", string.Empty);
This however leads to the result Test(), when it should be Test123().
tis replaces all parenthesis, filled with digits by parenthesis
string str = "Test123(45)";
string result = Regex.Replace(str, #"\(\d+\)", "()");
\d+(?=[^(]*\))
Try this.Use with verbatinum mode #.The lookahead will make sure number have ) without ( before it.Replace by empty string.
See demo.
https://regex101.com/r/uE3cC4/4
string str = "Test123(45)";
string result = Regex.Replace(str, #"\(\d+\)", "()");
you can also try this way:
string str = "Test123(45)";
string[] delimiters ={#"("};;
string[] split = str.Split(delimiters, StringSplitOptions.None);
var b=split[0]+"()";
Remove a number that is in fact inside parentheses BUT not the parentheses and keep anything else inside them that is not a number with C# Regex.Replace means matching all parenthetical substrings with \([^()]+\) and then removing all digits inside the MatchEvaluator.
Here is a C# sample program:
var str = "Test123(45) and More (5 numbers inside parentheses 123)";
var result = Regex.Replace(str, #"\([^()]+\)", m => Regex.Replace(m.Value, #"\d+", string.Empty));
// => Test123() and More ( numbers inside parentheses )
To remove digits that are enclosed in ( and ) symbols, the ASh's \(\d+\) solution will work well: \( matches a literal (, \d+ matches 1+ digits, \) matches a literal ).

Categories