I'm reading dash-separated weight and dimension values from a serial port.
This is what the incoming data looks like right now:
-15.0cm-47.8cm-83.1cm: 0.115 kg
And this is my pattern for it:
@"(\d+\.\d+)"
However, sometimes one of those values can be negative as well, for example
--15.0cm-47.8cm--83.1cm: 0.115 kg.
My question is: how can I get both negative and positive values at the same time? My expected output for the above string is ["-15.0", "47.8", "-83.1", "0.115"].
You may use a lookbehind pattern to make sure there is a "dash" before another one (that will get consumed, i.e. added to the match value):
(?:(?<=-)-)?\d+\.\d+
See the regex demo against a --15.0cm-47.8cm--83.1cm: 0.115 kg string:
Here, (?:(?<=-)-)? is an optional non-capturing group that matches a - that is preceded with another -. The \d+\.\d+ matches 1+ digits, . and again 1 or more digits.
C# code:
var results = Regex.Matches(str, @"(?:(?<=-)-)?\d+\.\d+")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();
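To make the behaviour concrete, here is a minimal, self-contained sketch that runs the pattern against both sample lines from the question and then parses the matches as numbers. The variable and function names (`Extract`, `mixed`, `positive`, `numbers`) are made up for illustration, and parsing with the invariant culture is an assumption about how the values will be used.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// A '-' is kept as a sign only when another '-' (the separator) precedes it.
string[] Extract(string line) =>
    Regex.Matches(line, @"(?:(?<=-)-)?\d+\.\d+")
        .Cast<Match>()
        .Select(m => m.Value)
        .ToArray();

// Sample lines in the format shown in the question.
var mixed = Extract("--15.0cm-47.8cm--83.1cm: 0.115 kg");
var positive = Extract("-15.0cm-47.8cm-83.1cm: 0.115 kg");

Console.WriteLine(string.Join(", ", mixed));    // -15.0, 47.8, -83.1, 0.115
Console.WriteLine(string.Join(", ", positive)); // 15.0, 47.8, 83.1, 0.115

// Assumption: the values use '.' as the decimal separator regardless of the OS locale.
var numbers = mixed
    .Select(v => double.Parse(v, CultureInfo.InvariantCulture))
    .ToArray();
```

The leading separator dash of each line is never preceded by another dash, so it is not mistaken for a sign.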
Related
This is my text: 0.123.456Vaaa.789.V
I want to find: 123.456V
I am using this pattern in C#: \.[0-9]*[\.]?[0-9]*V
But it returns 2 values: 123.456V and 789.V
I don't want the case with nothing after the ".": 789.V
How can I fix my pattern?
Thank you.
In your pattern, [\.]? does not have to be a separate character class, or the dot does not have to be escaped. I suggest writing the optional dot pattern as \.?, it is least ambiguous. [0-9]* after the optional dot pattern matches zero or more digits, hence you get unexpected matches.
You do not seem to need the \. at the start, either.
You can use
[0-9]*\.?[0-9]+V
See the .NET regex demo.
Details:
[0-9]* - zero or more ASCII digits
\.? - an optional .
[0-9]+ - one or more digits
V - a V char.
See a C# regex demo:
var results = Regex.Matches(text, @"[0-9]*\.?[0-9]+V")
    .Cast<Match>()
    .Select(x => x.Value)
    .ToList();
// => 123.456V
I think the simplest solution would be:
\d+\.\d+V
meaning you want to find some arbitrary number of digits, followed by a dot, followed by more digits, followed by the letter V.
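A quick way to compare the original pattern with the two suggested ones is to run all three against the sample text. This is only a sketch; the helper name `Find` is hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var text = "0.123.456Vaaa.789.V";

// Returns every match of the given pattern in the sample text.
string[] Find(string pattern) =>
    Regex.Matches(text, pattern)
        .Cast<Match>()
        .Select(m => m.Value)
        .ToArray();

var original = Find(@"\.[0-9]*[\.]?[0-9]*V");   // pattern from the question
var optionalDot = Find(@"[0-9]*\.?[0-9]+V");    // first suggestion
var requiredDot = Find(@"\d+\.\d+V");           // second suggestion

Console.WriteLine(string.Join(" | ", original));    // .123.456V | .789.V
Console.WriteLine(string.Join(" | ", optionalDot)); // 123.456V
Console.WriteLine(string.Join(" | ", requiredDot)); // 123.456V
```

The two suggestions agree on this input but are not equivalent in general: the first one also accepts a value without a dot, such as 5V, while the second one requires a dot with digits on both sides.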
So I'm processing a report that (brilliantly, really) spits out number values with commas in them, in a .csv output. Super useful.
So, I'm using C# regex positive lookahead and positive lookbehind expressions to remove commas that have digits on both sides.
If I use only the lookahead, it seems to work. However when I add the lookbehind as well, the expression breaks down and removes nothing. Both ends of the comma can have arbitrary numbers of digits around them, so I just want to remove the comma if the pattern has one or more digits around it.
Here's the expression that works with the lookahead only:
str = Regex.Replace(str, @"[,](?=(\d+))", "");
Here's the expression that doesn't work as I intend it:
str = Regex.Replace(str, @"[,](?=(\d+)?<=(\d+))", "");
What's wrong with my regex? If I had to guess, there's something I'm misunderstanding about how lookbehind works. Any ideas?
You may use any of the solutions below:
var s = "abc,def,2,100,xyz!,:))))";
Console.WriteLine(Regex.Replace(s, @"(\d),(\d)", "$1$2")); // Does not handle 1,2,3,4 cases
Console.WriteLine(Regex.Replace(s, @"(\d),(?=\d)", "$1")); // Handles consecutive matches with capturing group+backreference/lookahead
Console.WriteLine(Regex.Replace(s, @"(?<=\d),(?=\d)", "")); // Handles consecutive matches with lookbehind/lookahead, the most efficient way
Console.WriteLine(Regex.Replace(s, @",(?<=\d,)(?=\d)", "")); // Also handles all cases
See the C# demo.
Explanations:
(\d),(\d) - matches and captures single digits on both sides of , and $1$2 are replacement backreferences that insert the captured texts back into the result
(\d),(?=\d) - matches and captures a digit before ,, then a comma is matched and then a positive lookahead (?=\d) requires a digit after ,, but since it is not consumed, only $1 is required in the replacement pattern
(?<=\d),(?=\d) - matches only a comma that is enclosed with digits, without consuming the digits ((?<=\d) is a positive lookbehind that requires its pattern to match immediately to the left of the current location)
,(?<=\d,)(?=\d) - matches a comma and, only after matching it, the regex engine checks if there is a digit and a comma immediately before the current location (that is, after the comma), and if the check is true, the next char is checked for a digit. If it is a digit, a match is returned.
Bonus:
You may just match a pattern like yours with \d,\d and pass the match to the MatchEvaluator method where you may manipulate the match further:
Console.WriteLine(Regex.Replace(s, @"\d,\d", m => m.Value.Replace(",", string.Empty))); // Callback method
Here, m is the match object and m.Value holds the whole match value. With .Replace(",",string.Empty), you remove all commas from the match value.
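The difference between the consuming and the non-consuming patterns only shows up when several commas are chained, so here is a small sketch on a made-up input (`sample` is not from the original report) that contains such a run.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input with a chained run of digits and commas.
var sample = "total,1,2,3,4,abc,5,x";

// Consumes the digit on the right, so every second comma in a chain is skipped.
var consuming = Regex.Replace(sample, @"(\d),(\d)", "$1$2");

// The lookahead leaves the right-hand digit available for the next match.
var captureAndLookahead = Regex.Replace(sample, @"(\d),(?=\d)", "$1");

// Both digits are only checked, never consumed.
var lookarounds = Regex.Replace(sample, @"(?<=\d),(?=\d)", "");

// Lookbehind placed after the comma, so it has to include the comma itself.
var lookbehindAfterComma = Regex.Replace(sample, @",(?<=\d,)(?=\d)", "");

Console.WriteLine(consuming);            // total,12,34,abc,5,x
Console.WriteLine(captureAndLookahead);  // total,1234,abc,5,x
Console.WriteLine(lookarounds);          // total,1234,abc,5,x
Console.WriteLine(lookbehindAfterComma); // total,1234,abc,5,x
```

Commas next to letters are left alone by all four patterns.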
You can always check a website that evaluates regex expressions.
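I think this code might be able to help you (the lookbehind has to come before the comma, because it is checked at the position where it appears in the pattern):
str = Regex.Replace(str, @"(?<=\d)[,](?=\d)", "");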
I want to write a regexp to get multiple matches of the first character and next three digits. Some valid examples:
A123,
V322,
R333.
I tried something like this
[a-zA-Z](1)\d3
but it gets me just the first match!
Could you possibly show me how to rewrite this regexp to get multiple results? Thank you so much and have a nice day!
Your regex does not work because it matches:
[a-zA-Z] - an ASCII letter, then
(1) - a 1 digit (and puts into a capture)
\d - any 1 digit
3 - a 3 digit.
So, it matches Y193, E103, etc., even in longer phrases, where Y and E are not first letters.
You need to use a word boundary and fix your pattern as
\b[a-zA-Z][0-9]{3}
NOTE: if you need to match it as a whole word, add \b at the end: \b[a-zA-Z][0-9]{3}\b.
See the regex demo.
Details:
\b - leading word boundary
[a-zA-Z] - an ASCII letter
[0-9]{3} - 3 digits.
C# code:
var results = Regex.Matches(s, @"\b[a-zA-Z][0-9]{3}")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();
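The effect of each word boundary is easiest to see side by side. The sketch below uses a made-up input (`s`) that adds two tricky tokens, XY193 and B4567, to the valid examples from the question; the helper name `Find` is hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input: three valid codes plus two tokens that should be rejected.
var s = "A123, V322, R333. XY193 B4567";

string[] Find(string pattern) =>
    Regex.Matches(s, pattern)
        .Cast<Match>()
        .Select(m => m.Value)
        .ToArray();

var noBoundary = Find(@"[a-zA-Z][0-9]{3}");        // matches inside longer tokens
var leadingOnly = Find(@"\b[a-zA-Z][0-9]{3}");     // letter must start a word
var wholeWord = Find(@"\b[a-zA-Z][0-9]{3}\b");     // token must be exactly letter + 3 digits

Console.WriteLine(string.Join(", ", noBoundary));  // A123, V322, R333, Y193, B456
Console.WriteLine(string.Join(", ", leadingOnly)); // A123, V322, R333, B456
Console.WriteLine(string.Join(", ", wholeWord));   // A123, V322, R333
```

Regex.Matches already returns every match in the input, so no extra flag is needed to get multiple results.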
I am trying to make a regex that finds substrings that start with a dot (.), contain only digits, and end either with another dot or at the end of the string.
To clarify, here are a few examples:
abc.123.ds => 123
aAsd.12sd.SAs.32.asd.3123 => 32 and 3123
111.2e2 => no result
aaa.bbb.13.320.a => 13 and 320
I tried different approaches; the closest I came to a result is "^[.][0-9]+\.?$", but it still fails.
Any tips would be greatly appreciated
The ^[.][0-9]+\.?$ fails because ^ forces the pattern to match at the start of the string and $ makes it match at the end of the string (so the full string must match), and the \.? at the end, when matched, will consume the . and will not let you match an overlapping number with a dot in front.
I suggest using lookarounds:
(?<=\.)[0-9]+(?=\.|$)
See the regex demo
Details:
(?<=\.) - there must be a . immediately to the left of the current position
[0-9]+ - 1+ digits
(?=\.|$) - there must be a . or end of string immediately to the right of the current location.
C#:
var res = Regex.Matches(str, @"(?<=\.)[0-9]+(?=\.|$)")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();
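As a rough check, the lookaround pattern can be run against the four examples from the question. The sketch also runs a variant that consumes the dots, to show why adjacent numbers get lost in that case. The helper names `DotNumbers` and `Consuming` are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Digit-only segments that follow a dot and end at a dot or at the end of the string.
string[] DotNumbers(string input) =>
    Regex.Matches(input, @"(?<=\.)[0-9]+(?=\.|$)")
        .Cast<Match>()
        .Select(m => m.Value)
        .ToArray();

// For contrast only: here the dots are part of the match and are consumed.
string[] Consuming(string input) =>
    Regex.Matches(input, @"\.[0-9]+(\.|$)")
        .Cast<Match>()
        .Select(m => m.Value)
        .ToArray();

var samples = new[] { "abc.123.ds", "aAsd.12sd.SAs.32.asd.3123", "111.2e2", "aaa.bbb.13.320.a" };
foreach (var sample in samples)
    Console.WriteLine($"{sample} => [{string.Join(", ", DotNumbers(sample))}]");
// abc.123.ds => [123]
// aAsd.12sd.SAs.32.asd.3123 => [32, 3123]
// 111.2e2 => []
// aaa.bbb.13.320.a => [13, 320]

// The consuming variant finds ".13." and then has no leading dot left for 320.
Console.WriteLine(string.Join(", ", Consuming("aaa.bbb.13.320.a"))); // .13.
```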
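Remove the beginning-of-line anchor and use an alternation for the end. Put the alternation in a lookahead so the closing dot is not consumed, and read the number from group 1:
\.([0-9]+)(?=\.|$)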
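It is pretty simple using capturing groups:
int[] result = Regex.Matches(str, @"\.(\d+)(?=\.|$)").Cast<Match>().Select(x => int.Parse(x.Groups[1].Value)).ToArray();
The lookahead (?=\.|$) takes the place of a trailing \.? so that a number followed by letters is rejected and the closing dot stays available for the next match.
Group 0 is your entire match
\.(\d+)
Group 1 is the first parenthesized expression
\d+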
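The group numbering can be checked with a short sketch: in .NET, `Groups[0]` is always the whole match and `Groups[1]` is the first parenthesized group. The pattern below uses a lookahead for the closing dot, which is one possible choice rather than the only one, and the variable names are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var input = "aaa.bbb.13.320.a";

var matches = Regex.Matches(input, @"\.(\d+)(?=\.|$)").Cast<Match>().ToArray();

// Groups[0] is the whole match (including the leading dot).
var whole = matches.Select(m => m.Groups[0].Value).ToArray();

// Groups[1] is the first capturing group (digits only), parsed as integers here.
var numbers = matches.Select(m => int.Parse(m.Groups[1].Value)).ToArray();

Console.WriteLine(string.Join(" ", whole));   // .13 .320
Console.WriteLine(string.Join(" ", numbers)); // 13 320
```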
I have the following string
Fat mass loss was 2121,323.222 greater for GPLC (2–2.4kg vs. 0.5kg)
I want to capture
2121,323.222
2–2.4
0.5
i.e. I want the above three results from the string.
Can anyone help me with this regex?
I noticed that the hyphen in 2–2.4kg is not really a hyphen; it is the Unicode character U+2013 (EN DASH).
So, here is another regex in C#
@"[0-9]+([,.\u2013-][0-9]+)*"
Test
MatchCollection matches = Regex.Matches("Fat mass loss was 2121,323.222 greater for GPLC (2–2.4kg vs. 0.5kg)", @"[0-9]+([,.\u2013-][0-9]+)*");
foreach (Match m in matches) {
    Console.WriteLine(m.Groups[0]);
}
Here are the results. My console does not support printing the Unicode character U+2013, so it shows "?", but it is matched properly.
2121,323.222
2?2.4
0.5
Okay I didn't notice the C# tag until now. I will leave the answer but I know that's not what you expected, see if you can do something with it. Perhaps the title should have mentioned the programming language?
Sure:
Fat mass loss was (.*) greater for GPLC \((.*)kg vs\. (.*)kg\)
Find your substrings in \1, \2 and \3.
If for Emacs, swap all parentheses and escaped parentheses.
How about something like this:
^.*((?:\d+,)*\d+(?:\.\d+)?).*(\d+(?:\.\d+)?(?:-\d+(?:\.\d+))?).*(\d+(?:\.\d+)).*$
A little more general, I think. I'm a little concerned about .* being greedy.
Fat mass loss was 2121,323.222 greater
for GPLC (2–2.4kg vs. 0.5kg)
a generalized extractor:
/\D+?([\d\,\.\-]+)/g
explanation:
/ # start pattern
\D+? # 1 or more non-digits, as few as possible (lazy)
( # capture group 1
[\d\,\.\-]+ # character class, 1 or more of digits, comma, period, hyphen
) # end capture group 1
/g # trailing regex g modifier (make regex continue after last match)
Sorry, I don't know C# well enough for a full writeup, but the pattern should plug right in.
see: http://www.radsoftware.com.au/articles/regexsyntaxadvanced.aspx for some implementation examples.
I came out with something like this atrocity:
-?\d(?:,?\d)*(?:\.(?:\d(?:,?\d)*\d|\d))?(?:[–-]-?\d(?:,?\d)*(?:\.(?:\d(?:,?\d)*\d|\d))?)?
Out of which -?\d(?:,?\d)*(?:\.(?:\d(?:,?\d)*\d|\d))? is repeated twice, with [–-] in the middle (note that the first character in that class is an en dash, not a hyphen).
This should take care of dots and commas outside of numbers, eg: hello,23,45.2-7world - will capture 23,45.2-7.
It looks like you're trying to find all numbers in the string (possibly with commas inside the number), and all ranges of numbers such as "2-2.4". Here is a regex that should work:
\d+(?:[,.-]\d+)*
From C# 3, you can use it like this (note that this sample uses a plain hyphen in 2-2.4kg, not the en dash from the original string):
var input = "Fat mass loss was 2121,323.222 greater for GPLC (2-2.4kg vs. 0.5kg)";
var pattern = @"\d+(?:[,.-]\d+)*";
var matches = Regex.Matches(input, pattern);
foreach (Match match in matches)
    Console.WriteLine(match.Value);
Hmm, this is a tricky question, especially because the input string contains unicode character – (EN DASH) instead of - (HYPHEN-MINUS). Therefore the correct regex to match the numbers in the original string would be:
\d+(?:[\u2013,.]\d+)*
A more generic approach would be:
\d+(?:[\p{Pd}\p{Pc}\p{Po}]\d+)*
which matches dash punctuation, connector punctuation and other punctuation. These are Unicode general categories; see the documentation on them for more information.
An implementation in C# would look like this:
string input = "Fat mass loss was 2121,323.222 greater for GPLC (2–2.4kg vs. 0.5kg)";
try {
    // Same pattern as above; no regex options are needed for it.
    Regex rx = new Regex(@"\d+(?:[\p{Pd}\p{Pc}\p{Po}]\d+)*");
    Match match = rx.Match(input);
    while (match.Success) {
        // matched text: match.Value
        // match start: match.Index
        // match length: match.Length
        match = match.NextMatch();
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
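To see why the en dash matters, the sketch below runs an ASCII-only separator class, the explicit \u2013 class and the Unicode-category class against the original string. The en dash is written as the escape \u2013 in the C# string so the example does not depend on the source file encoding; the helper name `Find` is hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// "\u2013" is the EN DASH between 2 and 2.4 in the original sentence.
var input = "Fat mass loss was 2121,323.222 greater for GPLC (2\u20132.4kg vs. 0.5kg)";

string[] Find(string pattern) =>
    Regex.Matches(input, pattern)
        .Cast<Match>()
        .Select(m => m.Value)
        .ToArray();

var asciiOnly = Find(@"\d+(?:[,.-]\d+)*");                      // hyphen-minus only
var withEnDash = Find(@"\d+(?:[\u2013,.]\d+)*");                // explicit en dash
var byCategory = Find(@"\d+(?:[\p{Pd}\p{Pc}\p{Po}]\d+)*");      // Unicode punctuation categories

Console.WriteLine(asciiOnly.Length);  // 4: the range is split into "2" and "2.4"
Console.WriteLine(withEnDash.Length); // 3
Console.WriteLine(byCategory.Length); // 3
```

The counts are printed instead of the matches themselves because, as noted above, some consoles cannot display the en dash.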
Let's try this one :
(?=\d)([0-9,.-]+)(?<=\d)
It captures all expressions containing only :
"[0-9,.-]" characters,
must start with a digit "(?=\d)",
must finish with a digit "(?<=\d)"
It works with a single digit expression and does not include beginning or trailing [.,-].
Hope this helps.
I got the solution to my problem.
The following is the Regex that gave my desired result:
(([0-9]+)([–.,-]*))+
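One thing worth checking with this last pattern is trailing punctuation: because the separator part is optional and can repeat after the final digits, a number at the end of a sentence keeps its period. The sketch below compares it with the lookaround pattern from the previous answer on a made-up input (`text` is not from the question).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input with a number that is followed by a sentence-ending period.
var text = "hello,23,45.2-7world. Total: 1,234.";

string[] Find(string pattern) =>
    Regex.Matches(text, pattern)
        .Cast<Match>()
        .Select(m => m.Value)
        .ToArray();

// Lookarounds force the match to start and end with a digit.
var trimmed = Find(@"(?=\d)([0-9,.-]+)(?<=\d)");

// Separators may follow the last digit group, so trailing punctuation is kept.
var loose = Find(@"(([0-9]+)([\u2013.,-]*))+");

Console.WriteLine(string.Join(" | ", trimmed)); // 23,45.2-7 | 1,234
Console.WriteLine(string.Join(" | ", loose));   // 23,45.2-7 | 1,234.
```

On the string from the question both give the expected three values only if the separator class contains the en dash, which the lookaround pattern as written does not.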