Hi, I need a regular expression that extracts only floating-point numbers, matching from right to left.
Example string
Earning per Equity Share (in ) face value of 2 each26 1,675.10
1,252.56
My current regex, used with RegexOptions.RightToLeft:
(\+|-)?[0-9][0-9]*(\,[0-9]*)?(\.[0-9]*)?
The current output is:
1,252.56
1,675.10
26
2
However, I do not want to match 26 or 2.
Please help.
Maybe something like this will help
Regex
/[-+]?[0-9,\.]*([,\.])[0-9]*/g
Example input
Earning -34 5 b4 pe8r blah4 t3st + - (in) 1,252.56 face
-12234,23423.342 of 1,675.10 1,252.56
Matches
1,252.56
-12234,23423.342
1,675.10
1,252.56
Explanation
[-+]? match a single character present in the list below
Quantifier: ? Between zero and one time, as many times as possible, giving back as needed [greedy]
-+ a single character in the list -+ literally
[0-9,\.]* match a single character present in the list below
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
0-9 a single character in the range between 0 and 9
, the literal character ,
\. matches the character . literally
1st Capturing group ([,\.])
[,\.] match a single character present in the list below
, the literal character ,
\. matches the character . literally
[0-9]* match a single character present in the list below
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
0-9 a single character in the range between 0 and 9
g modifier: global. All matches (don't return on first match)
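As a rough sketch of how the original requirement (decimal numbers only, scanned from the end of the text) might look in C#, the snippet below makes the fractional part mandatory and passes RegexOptions.RightToLeft. The class name RightToLeftDecimals, the method name Extract, and the tightened pattern are illustrative assumptions, not part of either post above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RightToLeftDecimals
{
    // Hypothetical pattern: optional sign, digits, optional ",digits" groups,
    // then a mandatory "." followed by digits, so plain integers are skipped.
    private const string Pattern = @"[-+]?[0-9]+(?:,[0-9]+)*\.[0-9]+";

    // Returns the matches in right-to-left order (last number in the text first).
    public static List<string> Extract(string input)
    {
        return Regex.Matches(input, Pattern, RegexOptions.RightToLeft)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();
    }
}
```

For the example string from the question, Extract should return 1,252.56 followed by 1,675.10 and leave out 26 and 2, because neither has a fractional part.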
Although this is a regex question, it is also tagged C#.
Below is an example of how you might get a little more control over your output.
It is culture-specific and only picks up numbers that contain a decimal point; on the test input below it produces no false positives.
Method
// Requires: using System; using System.Collections.Generic;
//           using System.Globalization; using System.Linq;
private List<double> GetNumbers(string input)
{
    // declare result
    var resultList = new List<double>();

    // if input is empty return empty results
    if (string.IsNullOrEmpty(input))
    {
        return resultList;
    }

    // split input into words, exclude empty entries
    var words = input.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    // set your desired culture
    var culture = CultureInfo.CreateSpecificCulture("en-GB");

    // refine words into a list that represents potential numbers:
    // must contain a decimal point, must not start or end with one
    var refinedList = words.Where(x => x.Contains(".") && !x.StartsWith(".") && !x.EndsWith("."));

    foreach (var word in refinedList)
    {
        double value;

        // parse words using the designated culture and the Number style of double.TryParse
        if (double.TryParse(word, NumberStyles.Number, culture, out value))
        {
            resultList.Add(value);
        }
    }

    return resultList;
}
Usage
var testString = "Earning -34 5 b4 , . 234. 234, ,345 45.345 $234234 234.3453.345 $23423.2342 +234 -23423 pe8r blah4 t3st + - (in) 1,252.56 face -12234,23423.342 of 1,675.10 1,252.56";
var results = GetNumbers(testString);
foreach (var item in results)
{
Debug.WriteLine("{0}", item);
}
Output
45.345
1252.56
-1223423423.342
1675.1
1252.56
Additional Notes
You can learn more about double.TryParse and its options, and about the CultureInfo class, in the .NET documentation.
Related
I want to extract the double value from the string that contains a specific keyword. For example:
Amount : USD 3,747,190.67
I need to extract the value "3,747,190.67" from the string above using the keyword Amount, for that I tested this pattern in different online Regex testers and it works:
(?<=\bAmount.*)(\d+\,*\.*)*
However, it doesn't work in my C# code:
if (type == typeof(double))
{
double doubleVal = 0;
pattern = @"(?<=\bAmount.*)(\d+\,*\.*)*";
matchPattern = Regex.Match(textToParse, pattern);
if (matchPattern.Success)
{
double.TryParse(matchPattern.Value.ToString(), out doubleVal);
}
return doubleVal;
}
This one works:
(?<=\bAmount.*)\d+(,\d+)*(\.\d+)?
(?<=\bAmount.*) the look behind
\d+ leading digits (at least one digit)
(,\d+)* thousands groups (zero or more times)
(\.\d+)? decimals (? = optional)
Note that the regex tester says "9 matches found" for your pattern. For my pattern it says "1 match found".
The problem with your pattern is that its second part (\d+\,*\.*)* can be empty because of the * at the end. The * quantifier means zero, one, or more repetitions. Therefore, the look-behind finds 8 empty matches between Amount and the number. The last of the 9 matches is the number. You can correct it by replacing the trailing * with a +. You can also test it with "your" tester and switch to the table view to see the detailed results.
My solution does not allow consecutive commas or points but allows numbers without thousands groups or a decimal part.
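To make the difference concrete, here is a minimal sketch that runs both patterns through Regex.Match, which only reports the first match. The class AmountParser and its members are hypothetical names, and the use of CultureInfo.InvariantCulture when parsing is an assumption (the question's code relies on the current culture).

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class AmountParser
{
    // Pattern from the question: the trailing * lets the group match nothing.
    public const string StarPattern = @"(?<=\bAmount.*)(\d+\,*\.*)*";

    // Corrected pattern from the answer: at least one digit is required.
    public const string FixedPattern = @"(?<=\bAmount.*)\d+(,\d+)*(\.\d+)?";

    // Returns the text of the first match, which is all Regex.Match reports.
    public static string FirstMatch(string text, string pattern)
    {
        return Regex.Match(text, pattern).Value;
    }

    // Parses the matched amount. InvariantCulture is an assumption so that
    // "," is read as a group separator and "." as the decimal point.
    public static double ParseAmount(string text)
    {
        Match m = Regex.Match(text, FixedPattern);
        double value;
        return m.Success && double.TryParse(m.Value, NumberStyles.Number, CultureInfo.InvariantCulture, out value)
            ? value
            : 0;
    }
}
```

With the input "Amount : USD 3,747,190.67", FirstMatch with StarPattern returns an empty string (the first successful match sits directly after the word Amount), which is why double.TryParse in the question leaves doubleVal at 0. FirstMatch with FixedPattern returns 3,747,190.67.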
The lookbehind (?<=\bAmount.*) is always true in the example data after Amount.
The first 7 matches are empty, as (\d+\,*\.*)* can not consume a character where there is no digit, but it matches at the position as the quantifier * matches 0 or more times.
You might use
(?<=\bAmount\D*)\d{1,3}(?:,\d{3})*(?:\.\d{1,2})?\b
(?<=\bAmount\D*) Positive lookbehind, assert Amount to the left followed by optional non digits
\d{1,3} Match 1-3 digits
(?:,\d{3})* Optionally repeat , and 3 digits
(?:\.\d{1,2})? Optionally match . and 1 or 2 digits
\b A word boundary
See a .NET regex demo
For example
double doubleVal = 0;
var pattern = @"(?<=\bAmount\D*)\d{1,3}(?:,\d{3})*(?:\.\d{1,2})?\b";
var textToParse = "Amount : USD 3,747,190.67";
var matchPattern = Regex.Match(textToParse, pattern);
if (matchPattern.Success)
{
double.TryParse(matchPattern.Value.ToString(), out doubleVal);
}
Console.WriteLine(doubleVal);
Output
3747190.67
You can omit the word boundaries if a partial match is also acceptable:
(?<=Amount\D*)\d{1,3}(?:,\d{3})*(?:\.\d{1,2})?
I'm trying to understand how to match a specific string that's held within an array (This string will always be 3 characters long, ex: 123, 568, 458 etc) and I would match that string to a longer string of characters that could be in any order (9841273 for example). Is it possible to check that at least 2 of the 3 characters in the string match (in this example) strMoves? Please see my code below for clarification.
private readonly string[] strSolutions = new string[8] { "123", "159", "147", "258", "357", "369", "456", "789" };
private static string strMoves = "1823742";
foreach (string strResult in strSolutions)
{
Regex rgxMain = new Regex("[" + strMoves + "]{2}");
if (rgxMain.IsMatch(strResult))
{
MessageBox.Show(strResult);
}
}
The portion where I have designated "{2}" in Regex is where I expected the result to check for at least 2 matching characters, but my logic is definitely flawed. It will return true IF the two characters are in consecutive order as compared to the string in strResult. If it's not in the correct order it will return false. I'm going to continue to research on this but if anyone has ideas on where to look in Microsoft's documentation, that would be greatly appreciated!
Correct order where it would return true: "144257" when matched to "123"
incorrect order: "35718" when matched to "123"
The 3 is before the 1, so it won't match.
You can use the following solution if you need to find at least two different not necessarily consecutive chars from a specified set in a longer string:
new Regex($@"([{strMoves}]).*(?!\1)[{strMoves}]", RegexOptions.Singleline)
It will look like
([1823742]).*(?!\1)[1823742]
See the regex demo.
Pattern details:
([1823742]) - Capturing group 1: one of the chars in the character class
.* - any zero or more chars as many as possible (due to RegexOptions.Singleline, . matches any char including newline chars)
(?!\1) - a negative lookahead that fails the match if the next char is a starting point of the value stored in the Group 1 memory buffer (since it is a single char here, the next char should not equal the text in Group 1, one of the specified digits)
[1823742] - one of the chars in the character class.
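A small sketch of how this pattern could be wired into the loop from the question, next to a regex-free check for comparison. MoveMatcher, HasTwoByRegex and HasTwoByCount are hypothetical names, the counting variant is a swapped-in alternative rather than the answer's method, and both assume strMoves contains only digits (so it needs no escaping inside the character class).

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class MoveMatcher
{
    // Regex approach from the answer: two different characters from the set,
    // in any order and not necessarily adjacent. Assumes moves holds only
    // digits, so it can sit inside a character class without escaping.
    public static bool HasTwoByRegex(string candidate, string moves)
    {
        var regex = new Regex($@"([{moves}]).*(?!\1)[{moves}]", RegexOptions.Singleline);
        return regex.IsMatch(candidate);
    }

    // Swapped-in alternative without regex: count the distinct characters
    // of the candidate that also occur in moves.
    public static bool HasTwoByCount(string candidate, string moves)
    {
        return candidate.Distinct().Count(c => moves.IndexOf(c) >= 0) >= 2;
    }
}
```

With strMoves = "1823742", both methods accept "123", "147", "258", "357" and "789" and reject "159", "369" and "456", since those three share only one character with the moves.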
I'm pretty bad at regex (C#), and my attempts at the following give nonsense results.
Given string: 058:09:07
where only the last two digits are guaranteed, I need the result of:
"58y 9m 7d"
The needed rules are:
The last two digits "07" are days group and always present. If "00", then only the last "0" is to be printed,
The group immediately to the left of "07", which ends with ":", signifies the months and is only present if there are enough days to carry over into months. Again, if it is "00", only the last "0" is to be printed,
The group immediately to the left of "09:", which ends with ":", signifies the years and is only present if more than 12 months are needed.
In each group a leading "0" will be dropped.
(This is the result of an age calculation where 058:09:07 means 58 years, 9 months, and 7 days old. The ":" (colon) always used to separate years from months from days).
Example:
058:09:07 --> 58y 9m 7d
01:00 --> 1m 0d
08:00:00 --> 8y 0m 0d
00 --> 0d
Any help is most appreciated.
Well, you can pretty much do this without regex.
var str = "058:09:07";
var integers = str.Split(':').Select(int.Parse).ToArray();
var result = "";
switch(integers.Length)
{
case 1:
result = string.Format("{0}d", integers[0]); break;
case 2:
result = string.Format("{0}m {1}d", integers[0], integers[1]); break;
case 3:
result = string.Format("{0}y {1}m {2}d", integers[0], integers[1], integers[2]); break;
}
If you want to use regex so bad that it starts to hurt, you can use this one instead:
var integers = Regex.Matches(str, @"\d+").Cast<Match>().Select(x => int.Parse(x.Value)).ToArray();
But it's overhead, of course. You see, regex is not a parsing language; it's a pattern-matching language and should be used as one, for example for finding substrings in strings. If you can get the final substrings simply by splitting on a character, why not do that?
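The switch above can also be folded into a single expression by aligning the parsed groups with the suffixes from the right. This is only a sketch: AgeFormatter and Format are hypothetical names, and it assumes the input consists of one to three colon-separated integer groups, as in the question.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class AgeFormatter
{
    private static readonly string[] Suffixes = { "y", "m", "d" };

    // Formats "058:09:07" as "58y 9m 7d"; shorter inputs drop the leading units.
    public static string Format(string age)
    {
        // int.Parse drops leading zeros and throws FormatException on non-numeric groups.
        int[] parts = age.Split(':')
                         .Select(p => int.Parse(p, CultureInfo.InvariantCulture))
                         .ToArray();
        if (parts.Length > Suffixes.Length)
            throw new FormatException("Expected at most three colon-separated groups.");

        // The last group is always days, so suffixes are aligned from the right.
        int offset = Suffixes.Length - parts.Length;
        return string.Join(" ", parts.Select((p, i) => p + Suffixes[i + offset]));
    }
}
```

Calling AgeFormatter.Format on the four examples from the question gives "58y 9m 7d", "1m 0d", "8y 0m 0d" and "0d".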
DISCLAIMER: I am posting this answer for educational purposes. If the whole string represents the time span, the easiest and most correct approach is the one in eocron06's answer.
The point here is that you have optional parts that go in a specific order. To match them all correctly you may use the following regex:
\b(?:(?:0*(?<h>\d+):)?0*(?<m>\d+):)?0*(?<d>\d+)\b
See the regex demo
Details:
\b - initial word boundary
(?: - start of a non-capturing optional group (see the ? at the end below)
(?:0*(?<h>\d+):)? - a nested non-capturing optional group that matches zero or more zeros (to trim this part from the start from zeros), then captures 1+ digits into Group "h" and matches a :
0*(?<m>\d+): - again, matches zero or more 0s, then captures one or more digits into Group "m" and matches a :
)? - end of the first optional group
0*(?<d>\d+) - same as the first two above, but captures 1+ digits (days) into Group "d"
\b - trailing word boundary
See the C# demo where the final string is built upon analyzing which group is matched:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
public class Test
{
public static void Main()
{
var pattern = @"\b(?:(?:0*(?<h>\d+):)?0*(?<m>\d+):)?0*(?<d>\d+)\b";
var strs = new List<string>() {"07", "09:07", "058:09:07" };
foreach (var s in strs)
{
var result = Regex.Replace(s, pattern, m =>
m.Groups["h"].Success && m.Groups["m"].Success ?
string.Format("{0}y {1}m {2}d", m.Groups["h"].Value, m.Groups["m"].Value, m.Groups["d"].Value) :
m.Groups["m"].Success ?
string.Format("{0}m {1}d", m.Groups["m"].Value, m.Groups["d"].Value) :
string.Format("{0}d", m.Groups["d"].Value)
);
Console.WriteLine(result);
}
}
}
I have the following string:
01-21-27-0000-00-048 and it is easy to split it apart because each section is separated by a -, but sometimes this string is represented as 01-21-27-0000-00048, so splitting it is not as easy because the last 2 parts are combined. How can I handle this? Also, what about the case where it might be something like 01-21-27-0000-00.048
In case anyone is curious, this is a parcel number and it varies from county to county and a county can have 1 format or they can have 100 formats.
This is a very good case for using regular expressions. Your string matches the following regexp:
(\d{2})-(\d{2})-(\d{2})-(\d{4})-(\d{2})[.-]?(\d{3})
Match the input against this expression, and harvest the six groups of digits from the match:
var str = new[] {
"01-21-27-0000-00048", "01-21-27-0000-00.048", "01-21-27-0000-00-048"
};
foreach (var s in str) {
var m = Regex.Match(s, @"(\d{2})-(\d{2})-(\d{2})-(\d{4})-(\d{2})[.-]?(\d{3})");
for (var i = 1 /* one, not zero */ ; i != m.Groups.Count ; i++) {
Console.Write("{0} ", m.Groups[i]);
}
Console.WriteLine();
}
If you would like to allow for other characters, say, letters in the segments that are separated by dashes, you could use \w instead of \d to denote a letter, a digit, or an underscore. If you would like to allow an unspecified number of such characters within a known range, say, two to four, you can use {2,4} in the regexp instead of the more specific {2}, which means "exactly two". For example,
(\w{2,3})-(\w{2})-(\w{2})-(\d{4})-(\d{2})[.-]?(\d{3})
lets the first segment contain two to three digits or letters, and also allows letters in segments two and three.
Normalize the string first.
I.e. if you know that the last part is always three characters, then insert a - as the fourth-to-last character, then split the resultant string. Along the same line, convert the dot '.' to a dash '-' and split that string.
Replace every character that is not a digit with an empty string.
Every variant of your string then takes the form
012127000000048
which you can now cut into parts of lengths (2, 2, 2, 4, 2, 3).
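The strip-then-slice idea could be sketched as follows. ParcelNumber and its Split method are hypothetical names, and the fixed widths (2, 2, 2, 4, 2, 3) are taken from this one example format, so other county formats would need their own width tables.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ParcelNumber
{
    // Segment widths for the example format only (an assumption for other formats).
    private static readonly int[] Widths = { 2, 2, 2, 4, 2, 3 };

    public static string[] Split(string parcel)
    {
        // Drop every character that is not an ASCII digit ("-", ".", spaces, ...).
        string digits = Regex.Replace(parcel, @"[^0-9]", "");
        if (digits.Length != Widths.Sum())
            throw new FormatException("Unexpected number of digits: " + digits.Length);

        // Slice the digit string into consecutive segments of the given widths.
        var parts = new string[Widths.Length];
        int position = 0;
        for (int i = 0; i < Widths.Length; i++)
        {
            parts[i] = digits.Substring(position, Widths[i]);
            position += Widths[i];
        }
        return parts;
    }
}
```

All three variants from the question (01-21-27-0000-00-048, 01-21-27-0000-00048 and 01-21-27-0000-00.048) yield the same six segments: 01, 21, 27, 0000, 00, 048.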
string sentence = "X10 cats, Y20 dogs, 40 fish and 1 programmer.";
string[] digits = Regex.Split (sentence, @"\D+");
For this code I get these values in the digits array
10,20,40,1
string sentence = "X10.4 cats, Y20.5 dogs, 40 fish and 1 programmer.";
string[] digits = Regex.Split (sentence, @"\D+");
For this code I get these values in the digits array
10,4,20,5,40,1
But I would like to get like
10.4,20.5,40,1
as decimal numbers. How can I achieve this?
Small improvement to @Michael's solution:
// NOTES: about the LINQ:
// .Where() == filters the IEnumerable (which the array is)
// (c=>...) is the lambda for dealing with each element of the array
// where c is an array element.
// .Trim() == trims all blank spaces at the start and end of the string
var doubleArray = Regex.Split(sentence, @"[^0-9\.]+")
.Where(c => c != "." && c.Trim() != "");
Returns:
10.4
20.5
40
1
The original solution was returning
[empty line here]
10.4
20.5
40
1
.
The regex for extracting decimal/float numbers differs depending on whether thousands separators are used (and which ones), which symbol denotes the decimal separator, whether you also want to match an exponent, whether to match a positive or negative sign, whether to match numbers with the leading 0 omitted, and whether to extract a number that ends with a decimal separator.
A generic regex to match the most common decimal number types is provided in Matching Floating Point Numbers with a Regular Expression:
[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?
I only changed the capturing group to a non-capturing one (added ?: after the opening parenthesis).
If you need to make it even more generic, if the decimal separator can be either a dot or a comma, replace \. with a character class (or a bracket expression) [.,]:
[-+]?[0-9]*[.,]?[0-9]+(?:[eE][-+]?[0-9]+)?
           ^^^^
Note the expressions above match both integer and floats. To match only float/decimal numbers make sure the fractional pattern part is obligatory by removing the second ? after \. (demo):
[-+]?[0-9]*\.[0-9]+(?:[eE][-+]?[0-9]+)?
            ^
Now, an integer such as 34 is no longer matched.
If you do not want to match float numbers without leading zeros (like .5) make the first digit matching pattern obligatory (by adding + quantifier, to match 1 or more occurrences of digits):
[-+]?[0-9]+\.[0-9]+(?:[eE][-+]?[0-9]+)?
          ^
Now it matches far fewer samples.
Now, what if you do not want to match <digits>.<digits> inside <digits>.<digits>.<digits>.<digits>? How to match them as whole words? Use lookarounds:
[-+]?(?<!\d\.)\b[0-9]+\.[0-9]+(?:[eE][-+]?[0-9]+)?\b(?!\.\d)
Now, what about those floats that have thousand separators, like 12 123 456.23 or 34,345,767.678? You may add (?:[,\s][0-9]+)* after the first [0-9]+ to match zero or more sequences of a comma or whitespace followed with 1+ digits:
[-+]?(?<![0-9]\.)\b[0-9]+(?:[,\s][0-9]+)*\.[0-9]+(?:[eE][-+]?[0-9]+)?\b(?!\.[0-9])
Swap the comma with \. if you need a comma as the decimal separator and a period as the thousands separator.
Now, how to use these patterns in C#?
var results = Regex.Matches(input, @"<PATTERN_HERE>")
.Cast<Match>()
.Select(m => m.Value)
.ToList();
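As one possible way to fill in the placeholder, the sketch below plugs in the generic pattern quoted earlier (without the exponent part, for brevity) and converts each match to a decimal. NumberExtractor and Extract are hypothetical names, and parsing with CultureInfo.InvariantCulture is an assumption that fits input using "." as the decimal separator.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class NumberExtractor
{
    // Generic pattern from above, minus the exponent: matches integers and decimals.
    private const string Pattern = @"[-+]?[0-9]*\.?[0-9]+";

    public static List<decimal> Extract(string input)
    {
        return Regex.Matches(input, Pattern)
                    .Cast<Match>()
                    // InvariantCulture (an assumption) treats "." as the decimal point.
                    .Select(m => decimal.Parse(m.Value, CultureInfo.InvariantCulture))
                    .ToList();
    }
}
```

For the sentence from the question, "X10.4 cats, Y20.5 dogs, 40 fish and 1 programmer.", this returns 10.4, 20.5, 40 and 1. The trailing period is not matched because the pattern requires at least one digit after an optional dot.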
try
Regex.Split (sentence, @"[^0-9\.]+")
You'll need to allow for decimal places in your regular expression. Try the following:
\d+(\.\d+)?
This will match the numbers rather than everything other than the numbers, but it should be simple to iterate through the matches to build your array.
Something to keep in mind is whether you should also be looking for negative signs, commas, etc.
Check the syntax lexers for most programming languages for a regex for decimals.
Match that regex to the string, finding all matches.
If you have Linq:
stringArray.Select(s=>decimal.Parse(s));
A foreach would also work. You may need to check that each string is actually a number (decimal.Parse throws an exception on invalid input; decimal.TryParse does not).
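A minimal sketch of that validity check, using decimal.TryParse so that unparseable tokens are skipped instead of raising an exception. SafeDecimalParser and ParseValid are hypothetical names, and the invariant culture is an assumption.

```csharp
using System.Collections.Generic;
using System.Globalization;

public static class SafeDecimalParser
{
    // Keeps only the tokens that parse as decimals; everything else is ignored.
    public static List<decimal> ParseValid(IEnumerable<string> tokens)
    {
        var result = new List<decimal>();
        foreach (string token in tokens)
        {
            decimal value;
            // TryParse returns false for "", "." or text instead of throwing.
            if (decimal.TryParse(token, NumberStyles.Number, CultureInfo.InvariantCulture, out value))
            {
                result.Add(value);
            }
        }
        return result;
    }
}
```

This pairs naturally with the Regex.Split variants above, which can leave empty strings or a lone "." in the array.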
Credit for the following goes to @code4life. All I added is a for loop that parses the integers/decimals before returning.
public string[] ExtractNumbersFromString(string input)
{
input = input.Replace(",", string.Empty);
var numbers = Regex.Split(input, @"[^0-9\.]+").Where(c => !String.IsNullOrEmpty(c) && c != ".").ToArray();
for (int i = 0; i < numbers.Length; i++)
numbers[i] = decimal.Parse(numbers[i]).ToString();
return numbers;
}