Regex replace all non-numeric characters except for certain character patterns - c#

I am having a hard time trying to figure out this regex pattern. I want to replace all non-numeric characters in a string except for certain alpha character patterns.
For example, I am trying:
string str = "The sky is blue 323.05 lnp days of the year";
str = Regex.Replace(str, "(^blue|lnp|days)[^.0-9]", "", RegexOptions.IgnoreCase);
I would like it to return:
"blue 323.05 lnp days"
but I can't figure out how to get it to match the entire character pattern in the expression.

I'd suggest capturing what you need to keep and just matching what you need to remove:
var result = Regex.Replace(text, @"(\s*\b(?:blue|lnp|days)\b\s*)|[^.0-9]", "$1").Trim();
See the regex demo. Note that any leading/trailing spaces left in the result are removed with .Trim().
The regex means:
(\s*\b(?:blue|lnp|days)\b\s*) - Group 1 ($1):
\s* - 0+ whitespaces
\b(?:blue|lnp|days)\b - one of the three words as whole words
\s* - 0+ whitespaces
| - or
[^.0-9] - any char but . and ASCII digit.
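The capture-and-replace idea above can be sketched as a small program. The class name, method name, and the keepWords parameter are illustrative and not part of the answer; Regex.Escape is added on the assumption that the kept words should be treated literally.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeepWordsDemo
{
    // Keeps digits, dots, and the listed whole words; removes every other character.
    // (Hypothetical helper: the name and signature are assumptions for illustration.)
    public static string KeepNumbersAndWords(string text, params string[] keepWords)
    {
        // Escape each word so any regex metacharacters in it are matched literally.
        string alternation = string.Join("|", Array.ConvertAll(keepWords, Regex.Escape));
        string pattern = @"(\s*\b(?:" + alternation + @")\b\s*)|[^.0-9]";

        // "$1" puts back a kept word together with its surrounding whitespace.
        // When the second alternative matched, Group 1 is empty, so the character is dropped.
        return Regex.Replace(text, pattern, "$1").Trim();
    }

    public static void Main()
    {
        string str = "The sky is blue 323.05 lnp days of the year";
        Console.WriteLine(KeepNumbersAndWords(str, "blue", "lnp", "days"));
        // prints: blue 323.05 lnp days
    }
}
```

Pass RegexOptions.IgnoreCase to Regex.Replace if the kept words should match regardless of case, as in the question.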

Related

How to extract middle values in regex

I am trying to extract the fifth and sixth value present in the stream through regex.
The stream is
12,097.00 435.00 100.00 43,037.00 3,090.00 200.00 86.00 45,890.47 7,570.00 51,514.47
I want values 200.00 and 100.00.
I tried ^(?:\S+\s+\n?){3,3} but it's selecting the string from beginning.
Can anybody help me please in getting the values that are present in the middle?
A quantifier like {3,3} can be written as {3}. Also note that in the example string the values 200.00 and 100.00 are not the 5th and the 6th values; they are the 6th and the 3rd.
With your pattern you only get the values at the beginning as the anchor ^ asserts the start of the string.
To get the third and the sixth value, you could also use 2 capture groups by using a quantifier {2} for the parts in between.
^(?:\S+\s+){2}(\S+)(?:\s+\S+){2}\s+(\S+)
^ Start of string
(?:\S+\s+){2} Repeat 2 times matching non whitespace chars followed by whitespace char
(\S+) Capture group 1, match 1+ non whitespace chars
(?:\s+\S+){2}\s+ Repeat 2 times matching whitespace chars followed by non whitespace chars, then match 1+ whitespace chars
(\S+) Capture group 2, match 1+ non whitespace chars
Regex demo
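As a rough sketch of how the two capture groups could be read in C# (the class and method names are made up for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureGroupsDemo
{
    // Returns the 3rd and 6th whitespace-separated values,
    // or null when the pattern does not match (for example, fewer than six values).
    public static (string Third, string Sixth)? ThirdAndSixth(string input)
    {
        Match m = Regex.Match(input, @"^(?:\S+\s+){2}(\S+)(?:\s+\S+){2}\s+(\S+)");
        if (!m.Success) return null;
        return (m.Groups[1].Value, m.Groups[2].Value);
    }

    public static void Main()
    {
        string stream = "12,097.00 435.00 100.00 43,037.00 3,090.00 200.00 86.00 45,890.47 7,570.00 51,514.47";
        var values = ThirdAndSixth(stream);
        if (values.HasValue)
            Console.WriteLine($"{values.Value.Third} {values.Value.Sixth}"); // prints: 100.00 200.00
    }
}
```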
Certainly, if you have access to the code itself, it would be easier to split the string and get nth chunk by its index.
If you are limited to a regex, you can use
(?<=^(?:\S+\s+){2})\S+
(?<=^(?:\S+\s+){5})\S+
Or, if there can be leading whitespaces:
(?<=^\s*(?:\S+\s+){2})\S+
(?<=^\s*(?:\S+\s+){5})\S+
See a .NET regex demo.
Details:
(?<= - start of a positive lookbehind that requires the following sequence of patterns to appear immediately to the left of the current location:
^ - start of string
\s* - zero or more whitespaces
(?:\S+\s+){2} - two occurrences of 1+ non-whitespace chars followed with 1+ whitespace chars
) - end of the lookbehind
\S+ - one or more non-whitespace chars.
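For comparison, here is a minimal sketch of both routes mentioned above: the lookbehind pattern (which relies on .NET allowing quantifiers inside a lookbehind) and plain splitting. The helper names and the zero-based, non-negative index parameter are assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NthValueDemo
{
    // Regex route: the lookbehind requires exactly `index` values (plus optional
    // leading whitespace) between the start of the string and the match.
    // Returns an empty string when there is no such value.
    public static string NthByRegex(string input, int index)
    {
        string pattern = @"(?<=^\s*(?:\S+\s+){" + index + @"})\S+";
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : string.Empty;
    }

    // Split route: break the string on whitespace and pick the chunk by index.
    public static string NthBySplit(string input, int index)
    {
        string[] parts = input.Split(new[] { ' ', '\t', '\r', '\n' },
                                     StringSplitOptions.RemoveEmptyEntries);
        return index < parts.Length ? parts[index] : string.Empty;
    }

    public static void Main()
    {
        string stream = "12,097.00 435.00 100.00 43,037.00 3,090.00 200.00 86.00 45,890.47 7,570.00 51,514.47";
        Console.WriteLine(NthByRegex(stream, 2)); // prints: 100.00
        Console.WriteLine(NthByRegex(stream, 5)); // prints: 200.00
        Console.WriteLine(NthBySplit(stream, 2)); // prints: 100.00
        Console.WriteLine(NthBySplit(stream, 5)); // prints: 200.00
    }
}
```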

Alternate regex with -SDR?

I have the following regex in my c#:
(?<!\w)M20A\w+
Actual code:
string regex = $@"(?<!\w){prefix}\w+";
Note that the prefix variable holds strings such as M20A and X50G.
It perfectly matches the following cases:
M20A0820
M20A1234
M20A7U8V
But now I got a new requirement from the business to match, for example:
M20A-SDR
It will be the prefix followed by the exact string "-SDR". Not just a dash followed by 3 alphanumerics, but literally "-SDR". The existing matches need to still work, but prefix + "-SDR" must also be matched.
What would be the regex that would match the following:
M20A0820
M20A1234
M20A7U8V
M20A-SDR
You may use
string regex = $@"(?<!\w){prefix}\w*(?:-SDR)?";
See the regex demo.
Or, to match as a whole word, you may use word boundaries:
string regex = $@"\b{prefix}\w*(?:-SDR)?\b";
See this regex demo
The \b word boundary at the start will work if all the values in prefix start with a word char, a letter, digit or _. The word boundary at the end will make sense if after -SDR, there can be no more word chars.
The (?:-SDR)? will match a -SDR string optionally.
Details
\b - word boundary
M20A - a literal string
\w* - 0+ word chars
(?:-SDR)? - a non-capturing group that matches 1 or 0 times (as there is a ? after it) an -SDR substring
\b - a word boundary.
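A small sketch that runs the word-boundary variant against the sample values. The BuildRegex helper is hypothetical, and wrapping the prefix in Regex.Escape is an addition not found in the answer; it only matters if a prefix ever contains regex metacharacters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PrefixDemo
{
    // Builds the pattern from the answer for a given prefix.
    // Regex.Escape is an assumption added here so the prefix is always matched literally.
    public static Regex BuildRegex(string prefix)
    {
        return new Regex($@"\b{Regex.Escape(prefix)}\w*(?:-SDR)?\b");
    }

    public static void Main()
    {
        Regex regex = BuildRegex("M20A");
        foreach (string s in new[] { "M20A0820", "M20A1234", "M20A7U8V", "M20A-SDR", "X50G0001" })
        {
            Match m = regex.Match(s);
            // The four M20A values are matched in full; X50G0001 is not matched.
            Console.WriteLine($"{s}: {(m.Success ? m.Value : "no match")}");
        }
    }
}
```

Because \w* allows zero characters, this pattern also accepts the bare prefix (M20A on its own) and values such as M20A1234-SDR, which may or may not be acceptable for the requirement.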

Replace values between certain characters

I am trying to replace the value between OFFSET and ROWS in a string.
I'm using the below regex and it doesn't work.
string strValue = "OFFSET NUMBER ROWS";
string strIndex = "5";
strValue = Regex.Replace(strValue , @"(?<=OFFSET)(\w+?)(?=ROWS)", strIndex);
So my desired result will be like
OFFSET 5 ROWS
Can anyone help or suggest what's wrong with this regex as it doesn't replace values.
The regex related problem here is that you have not accounted for whitespace chars on both ends of the NUMBER. Add \s* or \s+ to account for them.
Use
string strValue = "OFFSET NUMBER ROWS";
string strIndex = "5";
strValue = Regex.Replace(strValue , @"(?<=OFFSET\s*)\w+(?=\s*ROWS)", strIndex);
Console.Write(strValue); // => OFFSET 5 ROWS
Here,
(?<=OFFSET\s*) is a positive lookbehind requiring OFFSET and 0+ whitespace chars immediately to the left of the current location
\w+ - 1+ word chars
(?=\s*ROWS) - is a positive lookahead requiring 0+ whitespace chars immediately to the right of the current location and then ROWS substring.
Alternatively, use capturing groups with backreferences in the replacement pattern:
strValue = Regex.Replace(strValue , @"(OFFSET\s*)\w+(\s*ROWS)", $"${{1}}{strIndex}$2");
See the C# online demo.
The capturing-group variation is a bit tricky: the first backreference is immediately followed by a digit, so the regular $1 syntax cannot be used (it would be read together with the digit, as $15 here). You must use the unambiguous form, ${1}.
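To illustrate the backreference pitfall, the sketch below puts the ${1} form next to the ambiguous one. It also shows a MatchEvaluator lambda, which is a different technique from the one in the answer: it avoids replacement-pattern parsing altogether. All names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OffsetDemo
{
    const string Pattern = @"(OFFSET\s*)\w+(\s*ROWS)";

    // Unambiguous form: ${1} ends the group number before the inserted value starts.
    public static string WithBraces(string sql, string value) =>
        Regex.Replace(sql, Pattern, "${1}" + value + "$2");

    // Alternative (not from the answer): a MatchEvaluator inserts the value as plain
    // text, so neither a leading digit nor a $ inside the value is interpreted.
    public static string WithEvaluator(string sql, string value) =>
        Regex.Replace(sql, Pattern, m => m.Groups[1].Value + value + m.Groups[2].Value);

    // Ambiguous form, shown only as the pitfall: with value "5" the replacement
    // becomes "$15$2", which refers to group 15 instead of group 1 followed by 5.
    public static string Ambiguous(string sql, string value) =>
        Regex.Replace(sql, Pattern, "$1" + value + "$2");

    public static void Main()
    {
        string sql = "OFFSET NUMBER ROWS";
        Console.WriteLine(WithBraces(sql, "5"));    // prints: OFFSET 5 ROWS
        Console.WriteLine(WithEvaluator(sql, "5")); // prints: OFFSET 5 ROWS
        Console.WriteLine(Ambiguous(sql, "5"));     // not the intended result
    }
}
```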

Regex that removes the 2 trailing letters from a string not preceded with other letters

This is in C#. I've been racking my brain but no luck so far.
So for example
123456BVC --> 123456BVC (keep the same)
123456BV --> 123456 (remove trailing letters)
12345V --> 12345V (keep the same)
12345 --> 12345 (keep the same)
ABC123AB --> ABC123 (remove trailing letters)
It can start with anything.
I've tried @".*[a-zA-Z]{2}$" but no luck
This is in C# so that I always return a string removing the two trailing letters if they do exist and are not preceded with another letter.
Match result = Regex.Match(mystring, pattern);
return result.Value;
Your @".*[a-zA-Z]{2}$" regex matches any 0+ characters other than a newline (as many as possible) and 2 ASCII letters at the end of the string. You do not check the context, so the 2 letters are matched regardless of what comes before them.
You need a regex that will match the last two letters not preceded with a letter:
(?<!\p{L})\p{L}{2}$
See this regex demo.
Details:
(?<!\p{L}) - fails the match if a letter (\p{L}) is found before the current position (you may use [a-zA-Z] if you only want to deal with ASCII letters)
\p{L}{2} - 2 letters
$ - end of string.
In C#, use
var result = Regex.Replace(mystring, @"(?<!\p{L})\p{L}{2}$", string.Empty);
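A quick sketch that runs the lookbehind pattern over the five sample inputs from the question (the helper name StripTrailingPair is made up for this example):

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrailingLettersDemo
{
    // Removes exactly two trailing letters, but only when they are not preceded by another letter.
    public static string StripTrailingPair(string s) =>
        Regex.Replace(s, @"(?<!\p{L})\p{L}{2}$", string.Empty);

    public static void Main()
    {
        foreach (string s in new[] { "123456BVC", "123456BV", "12345V", "12345", "ABC123AB" })
            Console.WriteLine($"{s} --> {StripTrailingPair(s)}");
        // prints:
        // 123456BVC --> 123456BVC
        // 123456BV --> 123456
        // 12345V --> 12345V
        // 12345 --> 12345
        // ABC123AB --> ABC123
    }
}
```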
If you only need to remove the last two letters, you can simply do this:
string result = Regex.Replace(originalString, @"[A-Za-z]{2}$", string.Empty);
Note that this version does not check the character before the two letters, so 123456BVC becomes 123456B. Remember that in .NET regex $ matches at the end of the input or just before a final newline.

Regex search for string like "$12,56,45" using c#

I want to search for a string like "$12,56,450" using Regex in C#, but my pattern doesn't match the string.
Here is my code:
string input="Total earn for the year $12,56,450";
string pattern = @"\b(?mi)($12,56,450)\b";
Regex regex = new Regex(pattern);
if (regex.Match(input).Success)
{
return true;
}
This Regex will do the job, (?mi)(\$\d{2},\d{2},\d{3}), and here's a Regex 101 to prove it.
Now let's break it down a little:
\$ matches the literal $ at the start of the amount
\d{2} matches any two digits
, matches the literal ,
\d{2} matches any two digits
, matches the literal ,
\d{3} matches any three digits
Now, for the purposes of the demonstration I removed the word boundaries, \b, but I'm also pretty confident you don't need them anyway. See, word boundaries aren't generally necessary for such a finite string match. Consider their definition:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
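You need to escape $ and some other special regex characters.
Try this: @"(?mi)(\$12,56,450)\b";
The leading \b is dropped because there is no word boundary between the space and the $ in the input, so a pattern starting with \b\$ cannot match there.
If you want, you can use \d to match a digit, and \d{2,3} to match a run of 2 or 3 digits.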
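The points made in the answers above can be checked with a short sketch. The helper names are hypothetical; using Regex.Escape to build the literal pattern is one possible way to avoid escaping by hand, and the trailing \b assumes the amount ends in a digit.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DollarAmountDemo
{
    public const string Input = "Total earn for the year $12,56,450";

    // True if the exact amount occurs in the input. Regex.Escape turns "$" into "\$".
    // The trailing \b assumes the amount ends with a digit.
    public static bool ContainsLiteral(string input, string amount) =>
        Regex.IsMatch(input, Regex.Escape(amount) + @"\b");

    // True if the input contains any amount shaped like $dd,dd,ddd.
    public static bool ContainsAmount(string input) =>
        Regex.IsMatch(input, @"\$\d{2},\d{2},\d{3}");

    public static void Main()
    {
        // Original pattern: the unescaped $ is an end-of-line anchor, so nothing matches.
        Console.WriteLine(Regex.IsMatch(Input, @"\b(?mi)($12,56,450)\b"));  // prints: False
        // Escaped $, but \b cannot match between the space and the $ (both non-word chars).
        Console.WriteLine(Regex.IsMatch(Input, @"\b(?mi)(\$12,56,450)\b")); // prints: False
        Console.WriteLine(ContainsLiteral(Input, "$12,56,450"));            // prints: True
        Console.WriteLine(ContainsAmount(Input));                           // prints: True
    }
}
```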
