Regex for both negative and positive values in dash-separated string


I'm reading weight and dimension values, separated by dashes, from a serial port.
This is what the incoming data looks like right now:
-15.0cm-47.8cm-83.1cm: 0.115 kg
And this is my pattern for it:
@"(\d+\.\d+)"
However, sometimes one of those values can be negative as well, for example:
--15.0cm-47.8cm--83.1cm: 0.115 kg
My question is: how can I get both negative and positive values at the same time? My expected output for the above string is [ "-15.0", "47.8", "-83.1", "0.115" ].

You may use a lookbehind to make sure there is a dash right before another one; that second dash is consumed, i.e. it becomes part of the match value:
(?:(?<=-)-)?\d+\.\d+
Against the string --15.0cm-47.8cm--83.1cm: 0.115 kg this pattern returns -15.0, 47.8, -83.1 and 0.115.
Here, (?:(?<=-)-)? is an optional non-capturing group that matches a - preceded by another -. The \d+\.\d+ part matches one or more digits, a ., and again one or more digits.
C# code:
var results = Regex.Matches(str, @"(?:(?<=-)-)?\d+\.\d+")
.Cast<Match>()
.Select(m => m.Value)
.ToList();
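To check the pattern outside the serial-port code, here is a minimal, self-contained sketch (top-level statements, .NET 6 or later assumed) that applies the same regex to a reading and converts each match to a double. The helper name ParseReading and the use of CultureInfo.InvariantCulture, so that "." is always read as the decimal separator, are assumptions, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts every number from one reading. A minus sign is
// kept only when it directly follows the separator dash (the "--" case).
static List<double> ParseReading(string reading) =>
    Regex.Matches(reading, @"(?:(?<=-)-)?\d+\.\d+")
        .Cast<Match>()
        // Invariant culture so "." is parsed as the decimal point on any machine.
        .Select(m => double.Parse(m.Value, CultureInfo.InvariantCulture))
        .ToList();

var values = ParseReading("--15.0cm-47.8cm--83.1cm: 0.115 kg");
Console.WriteLine(string.Join(", ", values.Select(v => v.ToString(CultureInfo.InvariantCulture))));
// prints: -15, 47.8, -83.1, 0.115
```

With the all-positive reading -15.0cm-47.8cm-83.1cm: 0.115 kg the same call returns 15, 47.8, 83.1 and 0.115, because a single separator dash is never consumed.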

Related

How can I find a formatted number with Regex?

This is my text: 0.123.456Vaaa.789.V
I want to find: 123.456V
I'm using this pattern in C#: \.[0-9]*[\.]?[0-9]*V
But the result contains 2 values: 123.456V and 789.V
I don't want the case where there are no digits after the ".", like 789.V
How can I fix my pattern?
Thank you.

In your pattern, [\.]? does not need a separate character class, and inside a character class the dot does not need to be escaped anyway. I suggest writing the optional dot pattern as \.?, which is the least ambiguous. The [0-9]* after the optional dot matches zero or more digits, so a dot directly followed by V is accepted, hence the unexpected match.
You do not seem to need the \. at the start, either.
You can use
[0-9]*\.?[0-9]+V
Details:
[0-9]* - zero or more ASCII digits
\.? - an optional .
[0-9]+ - one or more digits
V - a V char.
C# code:
var results = Regex.Matches(text, @"[0-9]*\.?[0-9]+V")
.Cast<Match>()
.Select(x => x.Value)
.ToList();
// => 123.456V
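A runnable sketch of the same pattern (top-level statements; the helper name FindVoltages and the extra sample strings are made up for illustration). Besides the question's text it also shows which edge cases [0-9]*\.?[0-9]+V accepts: integers such as 5V and values without an integer part such as .5V are matched, while 12.V is not, because at least one digit must sit right before the V.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every "number followed by V" token.
// [0-9]* allows an optional integer part, \.? an optional dot, and [0-9]+
// requires at least one digit right before the V, so "789.V" is rejected.
static List<string> FindVoltages(string text) =>
    Regex.Matches(text, @"[0-9]*\.?[0-9]+V")
        .Cast<Match>()
        .Select(m => m.Value)
        .ToList();

Console.WriteLine(string.Join(" | ", FindVoltages("0.123.456Vaaa.789.V"))); // 123.456V
Console.WriteLine(string.Join(" | ", FindVoltages("5V and .5V and 12.V"))); // 5V | .5V
```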

I think the simplest solution would be:
\d+\.\d+V
meaning: one or more digits, followed by a dot, followed by one or more digits, followed by the letter V.
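Compared with [0-9]*\.?[0-9]+V, the pattern \d+\.\d+V insists on digits on both sides of the dot, so 5V and .5V are not matched. One more .NET detail: \d matches any Unicode decimal digit (for example Arabic-Indic digits) unless RegexOptions.ECMAScript is set, whereas [0-9] is always ASCII. The sketch below checks both points; the helper All and the sample strings are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Collects all matches of a pattern as strings.
static string[] All(string input, string pattern, RegexOptions options = RegexOptions.None) =>
    Regex.Matches(input, pattern, options).Cast<Match>().Select(m => m.Value).ToArray();

const string sample = "0.123.456Vaaa.789.V 5V .5V";
Console.WriteLine(string.Join(" | ", All(sample, @"\d+\.\d+V")));       // 123.456V
Console.WriteLine(string.Join(" | ", All(sample, @"[0-9]*\.?[0-9]+V"))); // 123.456V | 5V | .5V

// "\u0661.\u0662V" uses the Arabic-Indic digits one and two.
const string arabicDigits = "\u0661.\u0662V";
Console.WriteLine(All(arabicDigits, @"\d+\.\d+V").Length);                          // 1
Console.WriteLine(All(arabicDigits, @"\d+\.\d+V", RegexOptions.ECMAScript).Length); // 0
```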

Removing commas from numbers with .NET regex

So I'm processing a report that (brilliantly, really) spits out number values with commas in them, in a .csv output. Super useful.
So I'm using C# regex positive lookahead and positive lookbehind expressions to remove commas that have digits on both sides.
If I use only the lookahead, it seems to work. However, when I add the lookbehind as well, the expression breaks down and removes nothing. Both sides of the comma can have an arbitrary number of digits, so I just want to remove the comma if it has one or more digits on each side.
Here's the expression that works with the lookahead only:
str = Regex.Replace(str, @"[,](?=(\d+))", "");
Here's the expression that doesn't work as I intend it:
str = Regex.Replace(str, @"[,](?=(\d+)?<=(\d+))", "");
What's wrong with my regex? If I had to guess, there's something I'm misunderstanding about how lookbehind works. Any ideas?

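In your second pattern, ?<= sits inside the lookahead, so it is not a lookbehind at all: (?=(\d+)?<=(\d+)) requires optional digits, the literal text <=, and more digits after the comma, which ordinary numbers never contain.
You may use any of the solutions below: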
var s = "abc,def,2,100,xyz!,:))))";
Console.WriteLine(Regex.Replace(s, @"(\d),(\d)", "$1$2")); // Does not handle 1,2,3,4 cases
Console.WriteLine(Regex.Replace(s, @"(\d),(?=\d)", "$1")); // Handles consecutive matches with capturing group+backreference/lookahead
Console.WriteLine(Regex.Replace(s, @"(?<=\d),(?=\d)", "")); // Handles consecutive matches with lookbehind/lookahead, the most efficient way
Console.WriteLine(Regex.Replace(s, @",(?<=\d,)(?=\d)", "")); // Also handles all cases
Explanations:
(\d),(\d) - matches and captures single digits on both sides of the comma; $1$2 are replacement backreferences that insert the captured texts back into the result
(\d),(?=\d) - matches and captures a digit before the comma, then matches the comma, and then the positive lookahead (?=\d) requires a digit after it; since that digit is not consumed, only $1 is needed in the replacement pattern
(?<=\d),(?=\d) - matches only a comma that is enclosed in digits, without consuming the digits ((?<=\d) is a positive lookbehind that requires its pattern to match immediately to the left of the current location)
,(?<=\d,)(?=\d) - matches a comma, and only after matching it does the regex engine check whether there is a digit and a comma immediately to the left of the current location (which is right after the comma). If that check is true, the next char is checked for a digit. If it is a digit, a match is returned.
Bonus:
You may also just match a pattern like yours with \d,\d and pass the match to a MatchEvaluator callback, where you may manipulate the match further:
Console.WriteLine(Regex.Replace(s, @"\d,\d", m => m.Value.Replace(",", string.Empty))); // Callback method
Here, m is the match object and m.Value holds the whole match value. With .Replace(",", string.Empty), you remove all commas from the match value. Like (\d),(\d), this consumes the digit after the comma, so 1,2,3,4 becomes 12,34.
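A small self-contained sketch of the lookbehind/lookahead variant, wrapped in a hypothetical helper RemoveDigitCommas and compared with the (\d),(\d) variant on a run of commas. One assumption worth stating: applied to a whole CSV line, the pattern cannot tell a thousands separator from a field separator, so two adjacent numeric fields such as 10,20 would also be joined; if your report can contain those, applying it per quoted field is safer.

```csharp
using System;
using System.Text.RegularExpressions;

// Removes every comma that has a digit immediately on both sides.
// The lookarounds do not consume the digits, so "1,234,567" loses both
// commas in a single pass.
static string RemoveDigitCommas(string line) =>
    Regex.Replace(line, @"(?<=\d),(?=\d)", "");

Console.WriteLine(RemoveDigitCommas("abc,def,2,100,xyz!,:))))")); // abc,def,2100,xyz!,:))))
Console.WriteLine(RemoveDigitCommas("1,234,567"));                // 1234567

// For comparison: the (\d),(\d) variant leaves every second comma in a run.
Console.WriteLine(Regex.Replace("1,2,3,4", @"(\d),(\d)", "$1$2")); // 12,34
```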

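You can always test your pattern on a website that evaluates regular expressions.
Note that the lookbehind has to come before the comma: once [,] has consumed the comma, a (?<=\d) placed after it looks at the comma itself and always fails. This should work:
str = Regex.Replace(str, @"(?<=\d)[,](?=\d)", "");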

RegExp multiple matches in text

I want to write a regexp to get multiple matches of a letter followed by the next three digits. Some valid examples:
A123,
V322,
R333.
I tried something like this:
[a-aA-Z](1)\d3
but it gets me just the first match!
Could you show me how to rewrite this regexp to get multiple results? Thank you so much and have a nice day!

Your regex does not work because it matches:
[a-aA-Z] - a lowercase a or an uppercase ASCII letter (a-a is a one-character range; a-z was probably intended), then
(1) - a 1 digit (and puts it into a capture)
\d - any one digit
3 - a 3 digit.
So, it matches Y193, E103, etc., even in longer words where Y and E are not the first letters.
You need to use a word boundary and fix your pattern as
\b[a-zA-Z][0-9]{3}
NOTE: if you need to match it as a whole word, add \b at the end: \b[a-zA-Z][0-9]{3}\b.
Details:
\b - leading word boundary
[a-zA-Z] - an ASCII letter
[0-9]{3} - 3 digits.
C# code:
var results = Regex.Matches(s, @"\b[a-zA-Z][0-9]{3}")
.Cast<Match>()
.Select(m => m.Value)
.ToList();
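A runnable sketch of the whole-word variant from the NOTE, using [a-zA-Z] (the a-a range would only allow a lowercase a). The helper FindCodes and the second sample string are hypothetical. If the original code called Regex.Match rather than Regex.Matches, that alone would explain getting only the first match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: finds every "one ASCII letter + exactly three digits" code
// that stands as a whole word. Regex.Matches (not Regex.Match) returns all of them.
static List<string> FindCodes(string s) =>
    Regex.Matches(s, @"\b[a-zA-Z][0-9]{3}\b")
        .Cast<Match>()
        .Select(m => m.Value)
        .ToList();

Console.WriteLine(string.Join(", ", FindCodes("A123, V322, R333."))); // A123, V322, R333
Console.WriteLine(FindCodes("XA123 B1234").Count);                    // 0
```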

Numeric substrings between dots

I am trying to make a regex that finds substrings that start with a dot (.), contain only digits, and end either with another dot or at the end of the string.
To clarify, here are a few examples:
abc.123.ds => 123
aAsd.12sd.SAs.32.asd.3123 => 32 and 3123
111.2e2 => no result
aaa.bbb.13.320.a => 13 and 320
I tried different approaches; the closest I came to a result is "^[.][0-9]+\.?$", but it still fails.
Any tips would be greatly appreciated

The ^[.][0-9]+\.?$ pattern fails because ^ forces it to match at the start of the string and $ at the end (so only the full string can match), and the \.? at the end, when it matches, consumes the . and will not let you match the next number that needs that dot in front of it.
I suggest using lookarounds:
(?<=\.)[0-9]+(?=\.|$)
Details:
(?<=\.) - there must be a . immediately to the left of the current position
[0-9]+ - 1+ digits
(?=\.|$) - there must be a . or end of string immediately to the right of the current location.
C#:
var res = Regex.Matches(str, @"(?<=\.)[0-9]+(?=\.|$)")
.Cast<Match>()
.Select(m => m.Value)
.ToList();
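The same pattern run over all four examples from the question, as a self-contained sketch (the helper name NumbersBetweenDots is hypothetical). Because the lookarounds do not consume the dots, .13.320. yields both numbers. Keep in mind that in .NET $ also matches before a final \n; if your strings can end with a newline and that should not count, \z is the stricter anchor.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: digits preceded by "." and followed by "." or the end of
// the string. The lookarounds do not consume the dots, so ".13.320." yields
// both 13 and 320.
static List<string> NumbersBetweenDots(string s) =>
    Regex.Matches(s, @"(?<=\.)[0-9]+(?=\.|$)")
        .Cast<Match>()
        .Select(m => m.Value)
        .ToList();

foreach (var input in new[] { "abc.123.ds", "aAsd.12sd.SAs.32.asd.3123", "111.2e2", "aaa.bbb.13.320.a" })
{
    // Prints e.g. "aaa.bbb.13.320.a => [13, 320]"
    Console.WriteLine($"{input} => [{string.Join(", ", NumbersBetweenDots(input))}]");
}
```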

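Remove the beginning-of-string anchor and use an alternation for the other one:
\.[0-9]+(\.|$)
The number is then part of the match together with its dots, and since the trailing dot is consumed, a number right after it (320 in aaa.bbb.13.320.a) is not found.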

It is pretty simple using capturing groups:
List<int> result = Regex.Matches(str, @"\.(\d+)\.?").Cast<Match>().Select(x => int.Parse(x.Groups[1].Value)).ToList();
Group 0 is your entire match:
\.(\d+)\.?
Group 1 is the first parenthesized expression:
\d+
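A self-contained sketch of the capturing-group approach (the helper DigitsAfterDot is hypothetical). In .NET, Groups[0] holds the whole match and Groups[1] the first parenthesized group, so the digits are read from Groups[1]. Run on the question's examples, it also shows the limits of \.(\d+)\.?: it reads 2 out of 111.2e2 and 12 out of .12sd, and because the optional trailing dot is consumed, 320 in aaa.bbb.13.320.a is missed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Groups[0] is the whole match (dots included); Groups[1] is the first
// parenthesized group, i.e. just the digits.
static List<int> DigitsAfterDot(string s) =>
    Regex.Matches(s, @"\.(\d+)\.?")
        .Cast<Match>()
        .Select(m => int.Parse(m.Groups[1].Value))
        .ToList();

Console.WriteLine(string.Join(", ", DigitsAfterDot("aaa.bbb.13.320.a")));          // 13
Console.WriteLine(string.Join(", ", DigitsAfterDot("111.2e2")));                   // 2
Console.WriteLine(string.Join(", ", DigitsAfterDot("aAsd.12sd.SAs.32.asd.3123"))); // 12, 32, 3123
```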

regex for capturing digits and digit ranges

I have the following string:
Fat mass loss was 2121,323.222 greater for GPLC (2–2.4kg vs. 0.5kg)
I want to capture:
2121,323.222
2–2.4
0.5
i.e. I want the above three results from the string.
Can anyone help me with this regex?

I noticed that the dash in 2–2.4kg is not really a hyphen; it's the Unicode character U+2013 EN DASH.
So, here is another regex in C#:
@"[0-9]+([,.\u2013-][0-9]+)*"
Test:
MatchCollection matches = Regex.Matches("Fat mass loss was 2121,323.222 greater for GPLC (2–2.4kg vs. 0.5kg)", @"[0-9]+([,.\u2013-][0-9]+)*");
foreach (Match m in matches) {
Console.WriteLine(m.Groups[0]);
}
Here are the results. My console does not support printing the Unicode character U+2013, so it shows "?", but it is matched properly.
2121,323.222
2?2.4
0.5
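If the "?" in the console output is a problem, one option is to ask the console for UTF-8 output before printing; whether the en dash then displays correctly still depends on the terminal and its font. This sketch (top-level statements) also shows an optional, assumed post-processing step that normalizes the en dash to an ASCII hyphen.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Ask the console to emit UTF-8 so U+2013 can be printed instead of "?".
Console.OutputEncoding = Encoding.UTF8;

const string text = "Fat mass loss was 2121,323.222 greater for GPLC (2\u20132.4kg vs. 0.5kg)";
string[] found = Regex.Matches(text, @"[0-9]+([,.\u2013-][0-9]+)*")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

foreach (var value in found)
    Console.WriteLine(value);

// Optional: replace the en dash with an ASCII hyphen for downstream parsing.
Console.WriteLine(found[1].Replace('\u2013', '-')); // 2-2.4
```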

Okay I didn't notice the C# tag until now. I will leave the answer but I know that's not what you expected, see if you can do something with it. Perhaps the title should have mentioned the programming language?
Sure:
Fat mass loss was (.*) greater for GPLC \((.*) vs. (.*)kg\)
Find your substrings in \1, \2 and \3.
If you use Emacs, swap all parentheses and escaped parentheses.
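Since this answer was not written for C#: in .NET the \1, \2 and \3 references correspond to match.Groups[1], Groups[2] and Groups[3]. A sketch, assuming the sentence always has exactly this wording; note that, as written, group 2 still includes the kg unit.

```csharp
using System;
using System.Text.RegularExpressions;

const string text = "Fat mass loss was 2121,323.222 greater for GPLC (2\u20132.4kg vs. 0.5kg)";

// \1, \2 and \3 from the answer correspond to Groups[1..3] in .NET.
Match m = Regex.Match(text, @"Fat mass loss was (.*) greater for GPLC \((.*) vs. (.*)kg\)");
if (m.Success)
{
    Console.WriteLine(m.Groups[1].Value); // 2121,323.222
    Console.WriteLine(m.Groups[2].Value); // 2–2.4kg (unit still attached)
    Console.WriteLine(m.Groups[3].Value); // 0.5
}
```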

How about something like this:
^.*((?:\d+,)*\d+(?:\.\d+)?).*(\d+(?:\.\d+)?(?:-\d+(?:\.\d+))?).*(\d+(?:\.\d+)).*$
A little more general, I think. I'm a little concerned about .* being greedy.

Fat mass loss was 2121,323.222 greater for GPLC (2–2.4kg vs. 0.5kg)
a generalized extractor:
/\D+?([\d\,\.\-]+)/g
explanation:
/ # start pattern
\D+? # 1 or more non-digits, as few as possible
( # capture group 1
[\d,.-]+ # character class, 1 or more of digits, comma, period, hyphen
) # end capture group 1
/g # trailing regex g modifier (make regex continue after last match)
Sorry, I don't know C# well enough for a full write-up, but the pattern (without the / delimiters and the g flag) should plug right in.
see: http://www.radsoftware.com.au/articles/regexsyntaxadvanced.aspx for some implementation examples.
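A C# translation sketch of the generalized extractor: C# has no /g flag, because Regex.Matches already returns every match, and the number sits in capture group 1. Run on the question's sentence (with its en dash), it returns 2121,323.222, 2, 2.4, a lone "." taken from "vs.", and 0.5: the en dash is not in the character class, so the range is split, and standalone punctuation from the class is picked up as well.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

const string text = "Fat mass loss was 2121,323.222 greater for GPLC (2\u20132.4kg vs. 0.5kg)";

// No /g in .NET: Regex.Matches returns all matches.
// The extracted value is in capture group 1, not in the whole match.
string[] captured = Regex.Matches(text, @"\D+?([\d,.-]+)")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToArray();

Console.WriteLine(string.Join(" | ", captured)); // 2121,323.222 | 2 | 2.4 | . | 0.5
```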

I came up with something like this atrocity:
-?\d(?:,?\d)*(?:\.(?:\d(?:,?\d)*\d|\d))?(?:[–-]-?\d(?:,?\d)*(?:\.(?:\d(?:,?\d)*\d|\d))?)?
Of which -?\d(?:,?\d)*(?:\.(?:\d(?:,?\d)*\d|\d))? is repeated twice, with [–-] in the middle (note that the first character in that class is a long dash, an en dash, not a hyphen).
This should take care of dots and commas outside of numbers, e.g. hello,23,45.2-7world will capture 23,45.2-7.

It looks like you're trying to find all numbers in the string (possibly with commas inside the number), and all ranges of numbers such as "2-2.4". Here is a regex that should work:
\d+(?:[,.-]\d+)*
From C# 3, you can use it like this:
var input = "Fat mass loss was 2121,323.222 greater for GPLC (2-2.4kg vs. 0.5kg)";
var pattern = @"\d+(?:[,.-]\d+)*";
var matches = Regex.Matches(input, pattern);
foreach (Match match in matches)
Console.WriteLine(match.Value);

Hmm, this is a tricky question, especially because the input string contains the Unicode character – (EN DASH) instead of - (HYPHEN-MINUS). Therefore the correct regex to match the numbers in the original string would be:
\d+(?:[\u2013,.]\d+)*
A more generic approach would be:
\d+(?:[\p{Pd}\p{Pc}\p{Po}]\d+)*
which matches dash punctuation, connector punctuation and other punctuation (the Unicode general categories Pd, Pc and Po) between runs of digits.
An implementation in C# would look like this (it additionally allows \p{C}, i.e. control and format characters, between runs of digits):
string input = "Fat mass loss was 2121,323.222 greater for GPLC (2–2.4kg vs. 0.5kg)";
try {
Regex rx = new Regex(@"\d+(?:[\p{Pd}\p{Pc}\p{Po}\p{C}]\d+)*", RegexOptions.IgnoreCase | RegexOptions.Multiline);
Match match = rx.Match(input);
while (match.Success) {
// matched text: match.Value
// match start: match.Index
// match length: match.Length
match = match.NextMatch();
}
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}

Let's try this one :
(?=\d)([0-9,.-]+)(?<=\d)
It captures all expressions containing only :
"[0-9,.-]" characters,
must start with a digit "(?=\d)",
must finish with a digit "(?<=\d)"
It also works with single-digit numbers and does not include leading or trailing [.,-] characters.
Hope this helps.
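A C# sketch of this lookaround-bounded pattern (the helper DigitBounded and the sample strings are made up). It confirms that leading and trailing separators are dropped and that single digits are matched; like any class limited to [0-9,.-], it splits 2–2.4 at the en dash.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Runs of digits, commas, dots and hyphens that must start and end with a digit.
static string[] DigitBounded(string s) =>
    Regex.Matches(s, @"(?=\d)([0-9,.-]+)(?<=\d)")
        .Cast<Match>()
        .Select(m => m.Value)
        .ToArray();

Console.WriteLine(string.Join(" | ", DigitBounded("hello,23,45.2-7world")));          // 23,45.2-7
Console.WriteLine(string.Join(" | ", DigitBounded("-12.5- and 7.")));                 // 12.5 | 7
Console.WriteLine(string.Join(" | ", DigitBounded("GPLC (2\u20132.4kg vs. 0.5kg)"))); // 2 | 2.4 | 0.5
```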

I got the solution to my problem.
The following is the Regex that gave my desired result:
(([0-9]+)([–.,-]*))+
