Regex replace 'whole' decimal numbers not followed by a certain string - c#

I want to replace "whole" decimal numbers not followed by pt with M.
For example, I need to replace 1, 12, and 36.7, but not 45.63 in the following.
string exp = "y=tan^-1(45.63pt)+12sin(-36.7)";
I have already tried
string newExp = Regex.Replace(exp, @"(\d+\.?\d*)(?!pt)", "M");
and it gives
"y=tan^-M(M3pt)+Msin(-M)"
It does make sense to me why it works like this, but I need to get
"y=tan^-M(45.63pt)+Msin(-M)"

The problem with the regex is that it is still matching a portion of the decimal value 45.63, up to the second-to-last decimal digit. One solution is to add a negative lookahead to the pattern to ensure that we only assert (?!pt) at the real end of every decimal value. This version is working:
string exp = "y=tan^-1(45.63pt)+12sin(-36.7)";
string newExp = Regex.Replace(exp, @"(\d+(?:\.\d+)?)(?![\d.])(?!pt)", "M");
Console.WriteLine(newExp);
This prints:
y=tan^-M(45.63pt)+Msin(-M)
Here is an explanation of the regex pattern used:
( match and capture:
\d+ one or more whole number digits
(?:\.\d+)? followed by an optional decimal component
) stop capturing
(?![\d.]) not being followed by another digit or dot
(?!pt) not followed by pt

If you need the output to be
"y=tan^-M(Mpt)+Msin(-M)"
then newExp should be
string newExp = Regex.Replace(exp, @"(\d+\.?\d*)", "M");
If the output should be
"y=tan^-M(45.63pt)+Msin(-M)"
then newExp should be
string newExp = Regex.Replace(exp, @"(\d+\.?\d*)(?![.\d]*pt)", "M");

I think you may assert the point in a string where there are no digits and dots directly followed by "pt":
\b(?![\d.]+pt)\d+(?:\.\d+)?
\b - Match a word-boundary.
(?![\d.]+pt) - Negative lookahead for 1+ digits and dots followed by "pt".
\d+ - 1+ digits.
(?: - Open non-capture group:
\.\d+ - A literal dot and 1+ digits.
)? - Close non-capture group and make it optional.
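To compare the answers side by side, here is a minimal sketch that runs the three patterns proposed above against the sample expression from the question. The array name `patterns` is just for illustration; each pattern is expected to leave 45.63pt untouched.

```csharp
using System;
using System.Text.RegularExpressions;

string exp = "y=tan^-1(45.63pt)+12sin(-36.7)";

// The three patterns proposed in the answers above.
string[] patterns =
{
    @"(\d+(?:\.\d+)?)(?![\d.])(?!pt)",
    @"(\d+\.?\d*)(?![.\d]*pt)",
    @"\b(?![\d.]+pt)\d+(?:\.\d+)?",
};

foreach (string pattern in patterns)
{
    // Replace every number that is not directly followed by "pt".
    Console.WriteLine(Regex.Replace(exp, pattern, "M"));
    // each iteration prints: y=tan^-M(45.63pt)+Msin(-M)
}
```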

Related

How to extract text that lies between parentheses

I have a string like (CAT,A)(DOG,C)(MOUSE,D)
I want to get the DOG value, C, using a regular expression.
I tried the following:
Match match = Regex.Match(rspData, @"\(DOG,*?\)");
if (match.Success)
Console.WriteLine(match.Value);
But it is not working. Could anyone help me solve this issue?
You can use
(?<=\(DOG,)\w+(?=\))
(?<=\(DOG,)[^()]*(?=\))
Details:
(?<=\(DOG,) - a positive lookbehind that matches a location that is immediately preceded with (DOG, string
\w+ - one or more letters, digits, connector punctuation
[^()]* - zero or more chars other than ( and )
(?=\)) - a positive lookahead that matches a location that is immediately followed with ).
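In C#, the lookaround version could be used roughly like this. It is only a sketch against the sample string from the question; `dogValue` is a name made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

string rspData = "(CAT,A)(DOG,C)(MOUSE,D)";

// The lookbehind and lookahead only assert the surrounding text, so the
// match value is just the part between "(DOG," and ")".
Match match = Regex.Match(rspData, @"(?<=\(DOG,)[^()]*(?=\))");
string dogValue = match.Success ? match.Value : string.Empty;
Console.WriteLine(dogValue); // C
```

Because nothing outside the lookarounds is consumed, match.Value can be used directly and no capture group is needed.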
As an alternative you can also use a capture group:
\(DOG,([^()]*)\)
Explanation
\(DOG, Match (DOG,
([^()]*) Capture group 1, match 0+ chars other than ( or )
\) Match )
string rspData = "(CAT,A)(DOG,C)(MOUSE,D)";
Match match = Regex.Match(rspData, @"\(DOG,([^()]*)\)");
if (match.Success)
{
    Console.WriteLine(match.Groups[1].Value);
}
Output
C

Regex pattern not working on my C# code however it works on an online tester

I want to extract the double value from the string that contains a specific keyword. For example:
Amount : USD 3,747,190.67
I need to extract the value "3,747,190.67" from the string above using the keyword Amount, for that I tested this pattern in different online Regex testers and it works:
(?<=\bAmount.*)(\d+\,*\.*)*
However it doesn't work on my C# code:
if (type == typeof(double))
{
double doubleVal = 0;
pattern = @"(?<=\bAmount.*)(\d+\,*\.*)*";
matchPattern = Regex.Match(textToParse, pattern);
if (matchPattern.Success)
{
double.TryParse(matchPattern.Value.ToString(), out doubleVal);
}
return doubleVal;
}
This one works:
(?<=\bAmount.*)\d+(,\d+)*(\.\d+)?
(?<=\bAmount.*) the look behind
\d+                      leading digits (at least one digit)
(,\d+)*               thousands groups (zero or more times)
(\.\d+)?             decimals (? = optional)
Note that the regex tester says "9 matches found" for your pattern. For my pattern it says "1 match found".
The problem with your pattern is that its second part (\d+\,*\.*)* can be empty because of the * at the end. The * quantifier means zero, one or more repetitions. Therefore, the pattern also produces empty matches at the positions between Amount and the number, and Regex.Match returns the first of these empty matches instead of the number. You can correct it by replacing the * with a +.
My solution does not allow consecutive commas or points but allows numbers without thousands groups or a decimal part.
The lookbehind (?<=\bAmount.*) is always true in the example data after Amount.
The first 7 matches are empty, as (\d+\,*\.*)* can not consume a character where there is no digit, but it matches at the position as the quantifier * matches 0 or more times.
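The empty matches can be made visible with a short sketch like the following, which lists every match of the original pattern with its index and length, and then shows what Regex.Match alone returns. The variable `first` is only an illustrative name.

```csharp
using System;
using System.Text.RegularExpressions;

string textToParse = "Amount : USD 3,747,190.67";
string pattern = @"(?<=\bAmount.*)(\d+\,*\.*)*";

// List every match with its position and length; empty matches have length 0.
foreach (Match m in Regex.Matches(textToParse, pattern))
{
    Console.WriteLine($"index {m.Index}, length {m.Length}: \"{m.Value}\"");
}

// Regex.Match returns only the first match, which is empty here because the
// lookbehind already succeeds directly after "Amount" (index 6).
Match first = Regex.Match(textToParse, pattern);
Console.WriteLine($"first match: index {first.Index}, length {first.Length}");
```

This is why the C# code in the question sees a successful match but parses nothing: the value handed to double.TryParse is an empty string.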
You might use
(?<=\bAmount\D*)\d{1,3}(?:,\d{3})*(?:\.\d{1,2})?\b
(?<=\bAmount\D*) Positive lookbehind, assert Amount to the left followed by optional non digits
\d{1,3} Match 1-3 digits
(?:,\d{3})* Optionally repeat , and 3 digits
(?:\.\d{1,2})? Optionally match . and 1 or 2 digits
\b A word boundary
For example
double doubleVal = 0;
var pattern = @"(?<=\bAmount\D*)\d{1,3}(?:,\d{3})*(?:\.\d{1,2})?\b";
var textToParse = "Amount : USD 3,747,190.67";
var matchPattern = Regex.Match(textToParse, pattern);
if (matchPattern.Success)
{
double.TryParse(matchPattern.Value.ToString(), out doubleVal);
}
Console.WriteLine(doubleVal);
Output
3747190.67
You can omit the word boundaries if a partial match is also valid:
(?<=Amount\D*)\d{1,3}(?:,\d{3})*(?:\.\d{1,2})?
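One thing the snippets above leave open is the culture: double.TryParse(string, out double) uses the current culture, so on a machine where "," is the decimal separator the parse may give a different result. A possible sketch, assuming the input always uses "," for thousands and "." for decimals, passes the number styles and the invariant culture explicitly (`amount` is an illustrative name):

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

string textToParse = "Amount : USD 3,747,190.67";
Match m = Regex.Match(textToParse, @"(?<=\bAmount\D*)\d{1,3}(?:,\d{3})*(?:\.\d{1,2})?\b");

double amount = 0;
if (m.Success)
{
    // AllowThousands accepts the "," group separators; InvariantCulture fixes
    // "," as the group separator and "." as the decimal separator.
    double.TryParse(m.Value,
        NumberStyles.AllowThousands | NumberStyles.AllowDecimalPoint,
        CultureInfo.InvariantCulture,
        out amount);
}
Console.WriteLine(amount.ToString(CultureInfo.InvariantCulture)); // 3747190.67
```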

Build a regex that does not contain the first and last character you are looking for in the match

I have the following problem.
This is what the regex looks like:
var regexTest = new Regex(@"'\d.*\d#");
This is what the string looks like:
var text = "dsadsadsadsa('1.222222#dsadsa'";
That is the result of what I would like to have:
1.222222
That's the result I'm getting right now ...:
'1.222222#
You want to extract the float number in between ' and #, use
var text = "dsadsadsadsa('1.222222#dsadsa'";
var regexTest = new Regex(@"'(\d+\.\d+)#");
var m = regexTest.Match(text);
if (m.Success)
{
Console.WriteLine(m.Groups[1].Value);
}
Here, (\d+\.\d+) captures 1+ digits, a ., and then 1+ digits into Group 1, which you can access using m.Groups[1].Value. Check m.Success first: when there is no match, Groups[1].Value is just an empty string, so you would silently continue without a number.
Just enclose the part you want to get in parentheses, so that you can get it as a group:
var regexTest = new Regex(@"'(\d.*\d)#");
-----------------------------^------^----
In '\d.*\d# you are matching ' followed by a digit, then any character 0+ times, followed by a digit and #. That would match '1.222222# but also, for example, '1.A2# because of the .*
To avoid matching the ' and the #, you could use a positive lookbehind and a positive lookahead to assert that they are there. If you only want to match digits, the .* can be left out.
(?<=')\d+\.\d+(?=#)
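A quick way to see the difference between the two patterns is to run both against a valid and an invalid sample. The sample strings and the names `loose` and `strict` below are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

string[] samples = { "('1.222222#", "('1.A2#" };

foreach (string s in samples)
{
    // The original pattern accepts anything between the two digits.
    bool loose = Regex.IsMatch(s, @"'\d.*\d#");
    // The lookaround pattern accepts only digits, a dot, and digits.
    bool strict = Regex.IsMatch(s, @"(?<=')\d+\.\d+(?=#)");
    Console.WriteLine($"{s}: loose={loose}, strict={strict}");
}

// With the lookarounds, the delimiters stay out of the match value.
string number = Regex.Match(samples[0], @"(?<=')\d+\.\d+(?=#)").Value;
Console.WriteLine(number); // 1.222222
```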

Match only the nth occurrence using a regular expression

I have a string with 3 dates in it like this:
XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx
I want to select the 2nd date in the string, the 20180208 one.
Is there a way to do this purely in the regex, without having to resort to pulling out the 2nd match in code? I'm using C# if that matters.
Thanks for any help.
You could use
^(?:[^_]+_){2}(\d+)
And take the first group.
Broken down, this says
^ # start of the string
(?:[^_]+_){2} # not _ + _, twice
(\d+) # capture digits
C# demo:
var pattern = @"^(?:[^_]+_){2}(\d+)";
var text = "XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx";
var result = Regex.Match(text, pattern)?.Groups[1].Value;
Console.WriteLine(result); // => 20180208
Try this one
MatchCollection matches = Regex.Matches(sInputLine, @"\d{8}");
string sSecond = matches[1].ToString();
You could use the regular expression
^(?:.*?\d{8}_){1}.*?(\d{8})
to save the 2nd date to capture group 1.
Naturally, for n > 2, replace {1} with {n-1} to obtain the nth date. To obtain the 1st date use
^(?:.*?\d{8}_){0}.*?(\d{8})
The regex engine performs the following operations.
^ # match the beginning of a line
(?: # begin a non-capture group
.*? # match 0+ chars lazily
\d{8} # match 8 digits
_ # match '_'
) # end non-capture group
{n} # execute non-capture group n (n >= 0) times
.*? # match 0+ chars lazily
(\d{8}) # match 8 digits in capture group 1
The important thing to note is that the first instance of .*?, because it is lazy, consumes only as many characters as it needs before the rest of the pattern can match. For example, in the string
_1234abcd_efghi_123456789_12345678_ABC
capture group 1 in (.*?)_\d{8}_ will contain "_1234abcd_efghi_123456789".
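The {n-1} idea can be wrapped in a small helper. This is only a sketch: `NthDate` is a hypothetical name, and it assumes, as the pattern above does, that every date is exactly 8 digits and that each date before the nth one is followed by an underscore.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the nth (1-based) 8-digit date,
// or an empty string if there is no such date.
static string NthDate(string input, int n)
{
    // {n-1} skips the earlier dates; group 1 captures the nth one.
    // Doubled braces are literal braces inside the interpolated string.
    string pattern = $@"^(?:.*?\d{{8}}_){{{n - 1}}}.*?(\d{{8}})";
    Match m = Regex.Match(input, pattern);
    return m.Success ? m.Groups[1].Value : string.Empty;
}

string text = "XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx";
Console.WriteLine(NthDate(text, 2)); // 20180208
```

Building the pattern per call keeps the example short; if the helper were called often with the same n, caching the Regex instance would probably be worthwhile.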
You can use System.Text.RegularExpressions.Regex
See the following example
Regex regex = new Regex(@"^(?:[^_]+_){2}(\d+)"); //Expression from Jan's answer just showing how to use C# to achieve your goal
GroupCollection groups = regex.Match("XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx").Groups;
if (groups.Count > 1)
{
Console.WriteLine(groups[1].Value);
}

Retrieving digits from a string with a certain pattern

I have a string value of:
"Drop 1.0.2.34 - Compatible with core revision 123456"
And I am trying to get the value 1.0.2.34 with the full stops from the string.
I am using Regex to try and get it, but it returns with a "" value.
Ex.
Match match = Regex.Match(input, "([0-9]*[.][0-9]*)*");
if (match.Success)
{
string version = match.Captures[0].Value;
}
I think there is something small that I am missing because it does find a match in the string, but it doesn't have a value. Can someone please help?
Your regex can match an empty string, and since you are only looking for one match, it returns the empty string before the first char.
Use
[0-9]+(?:\.[0-9]+)+
Details:
[0-9]+ - 1+ digits
(?:\.[0-9]+)+ - 1+ sequences of:
\. - a dot
[0-9]+ - 1+ digits.
C#:
string version = string.Empty;
// input holds the text to search
Match match = Regex.Match(input, @"[0-9]+(?:\.[0-9]+)+");
if (match.Success)
{
    version = match.Value;
}
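If the individual components are needed afterwards, the matched text could be handed to System.Version, which accepts two to four dot-separated numeric components. This goes a step beyond what the question asked, so treat it as an optional sketch; `input` and `parsed` are names chosen for the example.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Drop 1.0.2.34 - Compatible with core revision 123456";
Match match = Regex.Match(input, @"[0-9]+(?:\.[0-9]+)+");

// Try to turn the matched text into a System.Version instance.
Version version = null;
bool parsed = match.Success && Version.TryParse(match.Value, out version);

if (parsed)
{
    Console.WriteLine($"{version.Major}.{version.Minor}.{version.Build}.{version.Revision}");
    // prints 1.0.2.34
}
```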
