Regex how can I merge execution - c#

I have the following buffer returning from a .textContent
Latitude
32,6549581304256
Longitude
-16,9288643331225
I fixed the whitespace with
dwText = Regex.Replace(dwText, @"\s{2,}", "\n"); resulting in
Latitude
32,6549581304256
Longitude
-16,9288643331225
I then transformed this new output to my needs by
dwText = Regex.Replace(dwText, @"(Latitude|Longitude)(.*)\n", "$1: "); resulting in
Latitude: 32,6549581304256
Longitude: -16,9288643331225
My question is: can I do these 2 lines in one go?
dwText = Regex.Replace(dwText, @"\s{2,}", "\n");
dwText = Regex.Replace(dwText, @"(Latitude|Longitude)(.*)\n", "$1: ");
I would appreciate some help on how this can be achieved more efficiently, thank you.

Try the following (with i flag),
[\S\s]*?([a-z]+)[\S\s]*?([-\d,]+)[\S\s]*?
Replacement: $1: $2\n
C# Regex Demo
Explanation
[\S\s]*? - matches anything lazily.
[a-z]+ (first capture group) - matches alphabetical words, case insensitive.
[-\d,]+ (second capturing group) - matches digits, - (hyphen) and , (comma)

You can match the whitespace chars around the Latitude and Longitude and capture the values in 2 groups and use those 2 groups in the replacement.
\s*\b(Latitude|Longitude)\s*(-?[0-9]+(?:,[0-9]+)?)\b
Explanation
\s* Match 0+ whitespace chars
\b(Latitude|Longitude) A word boundary, capture either Latitude or Longitude in group 1
\s* Match 0+ whitespace chars
(-?[0-9]+(?:,[0-9]+)?) Capture group 2, match optional -, 1+ digits with an optional decimal part
\b A word boundary
Replace with:
$1: $2\n
.Net regex demo
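As a minimal sketch of the single-pass replacement described above: the sample buffer and its whitespace layout are assumptions, since the original .textContent output is not shown with its spacing intact. The final Trim() is also an addition, used here to drop the whitespace left after the last value.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample buffer; the exact whitespace layout is an assumption.
string dwText = "   Latitude\n      32,6549581304256\n   Longitude\n      -16,9288643331225\n   ";

// One pass: swallow the whitespace around each label and value,
// then rebuild each pair as "Label: value" followed by a newline.
string pattern = @"\s*\b(Latitude|Longitude)\s*(-?[0-9]+(?:,[0-9]+)?)\b";
string merged = Regex.Replace(dwText, pattern, "$1: $2\n").Trim();

Console.WriteLine(merged);
// Latitude: 32,6549581304256
// Longitude: -16,9288643331225
```

Because the "\n" in the replacement sits in a regular C# string literal, it is inserted as a real newline character.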

Why not parse the values out and then extract them to do what is needed with them?
By using named capture groups, (?<Name>...), one can organize and then extract the data.
In this example the whitespace is shortened, but the pattern also works across lines and with the original input:
var data = " Latitude 32,6549 Longitude -16,9288 ";
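// The separators exclude '-' and each group allows an optional leading '-',
// so a negative longitude keeps its sign.
var pattern = @"[^\d-]+(?<Lat>-?[\d,]+)[^\d-]+(?<Long>-?[\d,]+)";
var mtch = Regex.Match(data, pattern);
Console.WriteLine($"Latitude: {mtch.Groups["Lat"].Value} Longitude: {mtch.Groups["Long"].Value}");
// Latitude: 32,6549 Longitude: -16,9288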
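If the goal is to work with the coordinates as numbers, the captured text can be parsed rather than reformatted. The sketch below assumes the comma is a decimal separator (which the sample values suggest, but the question does not state). The NumberFormatInfo instance and the variable names are illustrative only.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

var data = " Latitude 32,6549 Longitude -16,9288 ";

// Named groups capture an optional minus sign, digits, and an optional ",digits" part.
var m = Regex.Match(
    data,
    @"Latitude\s*(?<Lat>-?\d+(?:,\d+)?)\s*Longitude\s*(?<Long>-?\d+(?:,\d+)?)");

// Assumption: "," is the decimal separator in this data.
var commaDecimal = new NumberFormatInfo
{
    NumberDecimalSeparator = ",",
    NumberGroupSeparator = "."
};

double lat = 0, lon = 0;
if (m.Success)
{
    lat = double.Parse(m.Groups["Lat"].Value, NumberStyles.Float, commaDecimal);
    lon = double.Parse(m.Groups["Long"].Value, NumberStyles.Float, commaDecimal);
}

Console.WriteLine(lat.ToString(CultureInfo.InvariantCulture));
Console.WriteLine(lon.ToString(CultureInfo.InvariantCulture));
```

Passing an explicit format provider keeps the result independent of the machine's current culture.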

Related

Regex only letters except set of numbers

I'm using Replace(@"[^a-zA-Z]+", "");
to leave only letters, but there are a few digit sequences that I want to keep as well, e.g. 122456 and 112466. I'm having trouble keeping the digits only when they form one of these sequences:
ex input:
abc 1239 asm122456000
I want to:
abcasm122456
tried this: ([^a-zA-Z])+|(?!122456)
My answer doesn't use Replace(), but achieves a similar result:
(?:[a-zA-Z]+|\d{6})
which matches, inside a non-capturing group, either a run of alphabetic characters or a block of 6 digits.
Regex 101 & Test Result
Join all the matching values into a single string.
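using System.Linq;
using System.Text.RegularExpressions;
// Match runs of letters or blocks of six digits.
Regex regex = new Regex("(?:[a-zA-Z]+|\\d{6})");
string input = "abc 1239 asm122456000";
var matches = regex.Matches(input);
// Cast<Match>() keeps this working on .NET Framework, where MatchCollection is non-generic.
string output = string.Join("", matches.Cast<Match>().Select(x => x.Value));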
Sample .NET Fiddle
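Alternatively, using .Split() and .All(). Note that this only drops tokens made up entirely of digits, so the trailing 000 in asm122456000 is kept: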
string input = "abc 1239 asm122456000";
string output = string.Join("", input.Split().Where(x => !x.All(char.IsDigit)));
.NET Fiddle
It is very simple: you need to match and capture what you need to keep, and just match what you need to remove, and then utilize a backreference to the captured group value in the replacement pattern to put it back into the resulting string.
Here is the regex:
(122456|112466)|[^a-zA-Z]
See the regex demo. Details:
(122456|112466) - Capturing group with ID 1: either of the two alternatives
| - or
[^a-zA-Z] - a char other than an ASCII letter (use \P{L} if you need to match any char other than any Unicode letter).
Note the removed + quantifier as [^A-Za-z] also matches digits.
You need to use $1 in the replacement:
var result = Regex.Replace(text, @"(122456|112466)|[^a-zA-Z]", "$1");
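A minimal runnable sketch of the capture-and-restore approach above, using the input from the question. The variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "abc 1239 asm122456000";

// Group 1 captures the sequences to keep. Every other non-letter is matched
// without a capture, so "$1" is empty for it and the character is removed.
string kept = Regex.Replace(text, @"(122456|112466)|[^a-zA-Z]", "$1");

Console.WriteLine(kept); // abcasm122456
```

The order of the alternatives matters: the whitelist comes first so that a kept sequence is consumed as a whole before the single-character alternative is tried.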

Regex - Get digits after a colon

I have a regex:
var topPayMatch = Regex.Match(result, @"(?<=Top Pay)(\D*)(\d+(?:\.\d+)?)", RegexOptions.IgnoreCase);
And I have to convert this to int which I did
topPayMatch = Convert.ToInt32(topPayMatchString.Groups[2].Value);
So now...
If the input is Top Pay: 1,000,000, it currently grabs only the first digit, which is 1. I want the whole number, 1000000.
If the input is Top Pay: 888,888, I want 888888.
What should I add to my regex?
You can use something as simple as @"(?<=Top Pay: )([0-9,]+)". Note that decimals will be ignored with this regex.
This will match all numbers with their commas after Top Pay:, which after you can parse it to an integer.
Example:
Regex rgx = new Regex(@"(?<=Top Pay: )([0-9,]+)");
string str = "Top Pay: 1,000,000";
Match match = rgx.Match(str);
if (match.Success)
{
string val = match.Value;
int num = int.Parse(val, System.Globalization.NumberStyles.AllowThousands);
Console.WriteLine(num);
}
Console.WriteLine("Ended");
Source:
Convert int from string with commas
If you use the lookbehind, you don't need the capture groups and you can move the \D* into the lookbehind.
To get the values, you can match 1+ digits followed by optional repetitions of , and 1+ digits.
Note that your example data contains commas and no dots, and using ? as a quantifier means 0 or 1 times.
(?<=Top Pay\D*)\d+(?:,\d+)*
The pattern matches:
(?<=Top Pay\D*) Positive lookbehind, assert what is to the left is Top Pay and optional non digits
\d+ Match 1+ digits
(?:,\d+)* Optionally repeat a , and 1+ digits
See a .NET regex demo and a C# demo
string pattern = @"(?<=Top Pay\D*)\d+(?:,\d+)*";
string input = @"Top Pay: 1,000,000
Top Pay: 888,888";
RegexOptions options = RegexOptions.IgnoreCase;
foreach (Match m in Regex.Matches(input, pattern, options))
{
var topPayMatch = int.Parse(m.Value, System.Globalization.NumberStyles.AllowThousands);
Console.WriteLine(topPayMatch);
}
Output
1000000
888888
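One caveat worth sketching: int.Parse with NumberStyles.AllowThousands uses the current culture's group separator unless a format provider is given. The variant below passes CultureInfo.InvariantCulture, assuming the comma in the input is always a thousands separator. The variable names are illustrative.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

string line = "Top Pay: 1,000,000";

// .NET supports variable-length lookbehind, so \D* can sit inside it.
Match payMatch = Regex.Match(line, @"(?<=Top Pay\D*)\d+(?:,\d+)*", RegexOptions.IgnoreCase);

int topPay = 0;
if (payMatch.Success)
{
    // The invariant culture uses "," as its group separator.
    topPay = int.Parse(payMatch.Value, NumberStyles.AllowThousands, CultureInfo.InvariantCulture);
}

Console.WriteLine(topPay); // 1000000
```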

Regex replace 'whole' decimal numbers not followed by a certain string

I want to replace "whole" decimal numbers not followed by pt with M.
For example, I need to replace 1, 12, and 36.7, but not 45.63 in the following.
string exp = "y=tan^-1(45.63pt)+12sin(-36.7)";
I have already tried
string newExp = Regex.Replace(exp, @"(\d+\.?\d*)(?!pt)", "M");
and it gives
"y=tan^-M(M3pt)+Msin(-M)"
It does make sense to me why it works like this, but I need to get
"y=tan^-M(45.63pt)+Msin(-M)"
The problem with the regex is that it is still matching a portion of the decimal value 45.63, up to the second-to-last decimal digit. One solution is to add a negative lookahead to the pattern to ensure that we only assert (?!pt) at the real end of every decimal value. This version is working:
string exp = "y=tan^-1(45.63pt)+12sin(-36.7)";
string newExp = Regex.Replace(exp, @"(\d+(?:\.\d+)?)(?![\d.])(?!pt)", "M");
Console.WriteLine(newExp);
This prints:
y=tan^-M(45.63pt)+Msin(-M)
Here is an explanation of the regex pattern used:
( match and capture:
\d+ one or more whole number digits
(?:\.\d+)? followed by an optional decimal component
) stop capturing
(?![\d.]) not being followed by another digit or dot
(?!pt) not followed by pt
If you need the output to be
"y=tan^-M(Mpt)+Msin(-M)"
then newExp should be
string newExp = Regex.Replace(exp, @"(\d+\.?\d*)", "M");
If the output should be
"y=tan^-M(45.63pt)+Msin(-M)"
then newExp should be
string newExp = Regex.Replace(exp, @"(\d+\.?\d*)(?![.\d]*pt)", "M");
I think you may assert the point in a string where there are no digits and dots directly followed by "pt":
\b(?![\d.]+pt)\d+(?:\.\d+)?
See the online demo
\b - Match a word-boundary.
(?![\d.]+pt) - Negative lookahead for 1+ digits and dots followed by "pt".
\d+ - 1+ digits.
(?: - Open non-capture group:
\.\d+ - A literal dot and 1+ digits.
)? - Close non-capture group and make it optional.
See the .NET demo
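As a quick check, the sketch below runs two of the patterns suggested in the answers against the expression from the question. Only the variable names are new.

```csharp
using System;
using System.Text.RegularExpressions;

string exp = "y=tan^-1(45.63pt)+12sin(-36.7)";

// Lookaheads after the number: it must end here and must not be followed by "pt".
string viaTrailingLookahead = Regex.Replace(exp, @"(\d+(?:\.\d+)?)(?![\d.])(?!pt)", "M");

// Lookahead before the number: reject any digits-and-dots run that ends in "pt".
string viaLeadingLookahead = Regex.Replace(exp, @"\b(?![\d.]+pt)\d+(?:\.\d+)?", "M");

Console.WriteLine(viaTrailingLookahead); // y=tan^-M(45.63pt)+Msin(-M)
Console.WriteLine(viaLeadingLookahead);  // y=tan^-M(45.63pt)+Msin(-M)
```

Both variants stop the engine from backtracking into a shorter prefix of 45.63, which is what produced M3pt in the original attempt.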

Regex match with multiple delimiters

I have a regex that extracts all parts of a string that are enclosed in parentheses.
\(([^)]*)\)
So
*- (Hello) + (World) -
returns two matches
(Hello)
(World)
I'm trying but failing to modify it so that I also get the parts in between as their own matches. Like:
*-
(Hello)
+
(World)
-
Is it even possible?
In this case, with the current regex, you may use Regex.Split with the pattern wrapped in a capturing group:
var tokens = Regex.Split(s, @"(\([^)]*\))");
Or even, when matches occur in the leading/trailing positions:
var tokens = Regex.Split(s, @"(\([^)]*\))").Where(m => !string.IsNullOrEmpty(m));
See the regex demo:
Note that you may need to convert any other capturing groups in your regex to non-capturing ones to use this feature, because Regex.Split adds every captured value to the result. If you need "technical" capturing groups that you refer to later with backreferences, you would have to build the array of non-matching substrings yourself, by matching repeatedly and calling .Substring() on the input with the match positions.
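A minimal runnable sketch of the Regex.Split approach, using the input from the question. The Trim() call and the empty-token filter are additions made here so the output matches the list in the question. Without them the in-between tokens keep their surrounding spaces.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string s = "*- (Hello) + (World) -";

// Wrapping the whole pattern in a capturing group makes Regex.Split
// return the delimiters (the parenthesized parts) along with the pieces between them.
string[] tokens = Regex.Split(s, @"(\([^)]*\))")
    .Select(t => t.Trim())
    .Where(t => t.Length > 0)
    .ToArray();

foreach (string token in tokens)
{
    Console.WriteLine(token);
}
```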
You could use an alternation to match either the parenthesis with the characters \([^)]*\) or | match one or more times the characters listed in a character class [*+-]+
\([^)]*\)|[*+-]+
string pattern = @"\([^)]*\)|[*+-]+";
string input = @"*- (Hello) + (World) - ";
foreach (Match m in Regex.Matches(input, pattern))
{
Console.WriteLine(m.Value);
}
That would give you:
*-
(Hello)
+
(World)
-
Demo C#

Regex to extract substrings in C#

I have a string as:
string subjectString = @"(((43*('\\uth\Hgh.Green.two.190ITY.PCV')*9.8)/100000+('VBNJK.PVI.10JK.PCV'))*('ASFGED.Height Density.1JKHB01.PCV')/476)";
My expected output is:
Hgh.Green.two.190ITY.PCV
VBNJK.PVI.10JK.PCV
ASFGED.Height Density.1JKHB01.PCV
Here's what I have tried:
Regex regexObj = new Regex(@"'[^\\]*.PCV");
Match matchResults = regexObj.Match(subjectString);
string val = matchResults.Value;
This works when the input string is @"(((43*('\\uth\Hgh.Green.two.190ITY.PCV')*9.8)/100000+"; but when the string grows and more than one substring has to be extracted, I get undesired results.
How do I extract three substrings from the original string?
It seems you want to match word and . chars before .PCV.
Use
[\w\s.]*\.PCV
See the regex demo
To force at least 1 word char at the start use
\w[\w\s.]*\.PCV
Optionally, if needed, add a word boundary at the start: @"\b\w[\w\s.]*\.PCV".
To force \w match only ASCII letters and digits (and _) compile the regex object with RegexOptions.ECMAScript option.
Here,
\w - matches any letter, digit or _
[\w\s.]* - matches 0+ whitespace, word or/and . chars
\. - a literal .
PCV - a PCV substring.
Sample usage:
var results = Regex.Matches(str, @"\w[\w\s.]*\.PCV")
.Cast<Match>()
.Select(m=>m.Value)
.ToList();
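Put together with the string from the question, the sample usage can be sketched as a complete program. Only the variable names are new.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

string subjectString = @"(((43*('\\uth\Hgh.Green.two.190ITY.PCV')*9.8)/100000+('VBNJK.PVI.10JK.PCV'))*('ASFGED.Height Density.1JKHB01.PCV')/476)";

// A word char, then any run of word chars, whitespace, or dots, ending in ".PCV".
// Backslashes, quotes, and parentheses are not in the class, so they bound each match.
List<string> names = Regex.Matches(subjectString, @"\w[\w\s.]*\.PCV")
    .Cast<Match>()
    .Select(match => match.Value)
    .ToList();

foreach (string name in names)
{
    Console.WriteLine(name);
}
// Hgh.Green.two.190ITY.PCV
// VBNJK.PVI.10JK.PCV
// ASFGED.Height Density.1JKHB01.PCV
```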
