How to extract decimal number from string in C# - c#

string sentence = "X10 cats, Y20 dogs, 40 fish and 1 programmer.";
string[] digits = Regex.Split(sentence, @"\D+");
For this code I get these values in the digits array
10,20,40,1
string sentence = "X10.4 cats, Y20.5 dogs, 40 fish and 1 programmer.";
string[] digits = Regex.Split(sentence, @"\D+");
For this code I get these values in the digits array
10,4,20,5,40,1
But I would like to get like
10.4,20.5,40,1
as decimal numbers. How can I achieve this?

Small improvement to @Michael's solution:
// NOTES: about the LINQ:
// .Where() == filters the IEnumerable (which the array is)
// (c=>...) is the lambda for dealing with each element of the array
// where c is an array element.
// .Trim() == trims all blank spaces at the start and end of the string
var doubleArray = Regex.Split(sentence, @"[^0-9\.]+")
.Where(c => c != "." && c.Trim() != "");
Returns:
10.4
20.5
40
1
The original solution was returning
[empty line here]
10.4
20.5
40
1
.

The decimal/float number extraction regex can differ depending on whether and which thousands separators are used, which symbol denotes the decimal separator, whether you also want to match an exponent, whether to match a positive or negative sign, whether to match numbers with the leading 0 omitted, and whether to extract a number that ends with a decimal separator.
A generic regex to match the most common decimal number types is provided in Matching Floating Point Numbers with a Regular Expression:
[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?
I only changed the capturing group to a non-capturing one (added ?: after the opening parenthesis).
If you need to make it even more generic, if the decimal separator can be either a dot or a comma, replace \. with a character class (or a bracket expression) [.,]:
[-+]?[0-9]*[.,]?[0-9]+(?:[eE][-+]?[0-9]+)?
Note that the expressions above match both integers and floats. To match only float/decimal numbers, make the fractional part obligatory by removing the ? after \.:
[-+]?[0-9]*\.[0-9]+(?:[eE][-+]?[0-9]+)?
Now, 34 is not matched, while a number with a fractional part such as 34.5 is.
If you do not want to match float numbers without leading zeros (like .5) make the first digit matching pattern obligatory (by adding + quantifier, to match 1 or more occurrences of digits):
[-+]?[0-9]+\.[0-9]+(?:[eE][-+]?[0-9]+)?
Now it matches far fewer samples.
Now, what if you do not want to match <digits>.<digits> inside <digits>.<digits>.<digits>.<digits>? How to match them as whole words? Use lookarounds:
[-+]?(?<!\d\.)\b[0-9]+\.[0-9]+(?:[eE][-+]?[0-9]+)?\b(?!\.\d)
Now, what about those floats that have thousand separators, like 12 123 456.23 or 34,345,767.678? You may add (?:[,\s][0-9]+)* after the first [0-9]+ to match zero or more sequences of a comma or whitespace followed with 1+ digits:
[-+]?(?<![0-9]\.)\b[0-9]+(?:[,\s][0-9]+)*\.[0-9]+(?:[eE][-+]?[0-9]+)?\b(?!\.[0-9])
Swap the comma and \. in the pattern if you need to use a comma as the decimal separator and a period as the thousands separator.
Now, how to use these patterns in C#?
var results = Regex.Matches(input, @"<PATTERN_HERE>")
.Cast<Match>()
.Select(m => m.Value)
.ToList();
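As a rough sketch of how the simplest pattern above could be wired up for the sentence in the question: the class and method names here (DecimalExtractor, Extract) are made up for illustration, the exponent part of the pattern is left out because decimal.Parse would need NumberStyles.AllowExponent for it, and the invariant culture is assumed so that the dot is always the decimal separator.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any answer above.
public static class DecimalExtractor
{
    // Optional sign, optional integer digits, optional dot, at least one digit.
    private const string Pattern = @"[-+]?[0-9]*\.?[0-9]+";

    public static List<decimal> Extract(string input)
    {
        return Regex.Matches(input, Pattern)
            .Cast<Match>()
            // Parse with the invariant culture so "." is the decimal separator
            // regardless of the machine's regional settings.
            .Select(m => decimal.Parse(
                m.Value,
                NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint,
                CultureInfo.InvariantCulture))
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var sentence = "X10.4 cats, Y20.5 dogs, 40 fish and 1 programmer.";
        foreach (var number in DecimalExtractor.Extract(sentence))
        {
            Console.WriteLine(number.ToString(CultureInfo.InvariantCulture));
        }
    }
}
```

The trailing period of the sentence is not picked up, because the pattern requires at least one digit after an optional dot.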

Try:
Regex.Split(sentence, @"[^0-9\.]+")

You'll need to allow for decimal places in your regular expression. Try the following:
\d+(\.\d+)?
This will match the numbers rather than everything other than the numbers, but it should be simple to iterate through the matches to build your array.
Something to keep in mind is whether you should also be looking for negative signs, commas, etc.

Check the syntax lexers for most programming languages for a regex for decimals.
Match that regex to the string, finding all matches.

If you have Linq:
stringArray.Select(s=>decimal.Parse(s));
A foreach would also work. You may need to check that each string is actually a number first: decimal.Parse throws a FormatException on invalid input, whereas decimal.TryParse returns false instead.
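A minimal sketch of the non-throwing variant, assuming the tokens come from something like Regex.Split and may include empty strings or a lone dot. SafeParser and ParseAll are illustrative names, and the invariant culture is an assumption about the input format.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper for illustration only.
public static class SafeParser
{
    public static List<decimal> ParseAll(IEnumerable<string> tokens)
    {
        var result = new List<decimal>();
        foreach (var token in tokens)
        {
            // TryParse returns false for "", "." or "abc" instead of throwing.
            if (decimal.TryParse(token, NumberStyles.Number,
                                 CultureInfo.InvariantCulture, out var value))
            {
                result.Add(value);
            }
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        var tokens = new[] { "", "10.4", "20.5", "40", "1", "." };
        foreach (var value in SafeParser.ParseAll(tokens))
        {
            Console.WriteLine(value.ToString(CultureInfo.InvariantCulture));
        }
    }
}
```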

Credit for the following goes to @code4life. All I added is a for loop for parsing the integers/decimals before returning.
public string[] ExtractNumbersFromString(string input)
{
input = input.Replace(",", string.Empty);
var numbers = Regex.Split(input, @"[^0-9\.]+").Where(c => !String.IsNullOrEmpty(c) && c != ".").ToArray();
for (int i = 0; i < numbers.Length; i++)
numbers[i] = decimal.Parse(numbers[i]).ToString();
return numbers;
}

Related

Regex pattern not working on my C# code however it works on an online tester

I want to extract the double value from the string that contains a specific keyword. For example:
Amount : USD 3,747,190.67
I need to extract the value "3,747,190.67" from the string above using the keyword Amount, for that I tested this pattern in different online Regex testers and it works:
(?<=\bAmount.*)(\d+\,*\.*)*
However it doesn't work on my C# code:
if (type == typeof(double))
{
double doubleVal = 0;
pattern = @"(?<=\bAmount.*)(\d+\,*\.*)*";
matchPattern = Regex.Match(textToParse, pattern);
if (matchPattern.Success)
{
double.TryParse(matchPattern.Value.ToString(), out doubleVal);
}
return doubleVal;
}
This one works:
(?<=\bAmount.*)\d+(,\d+)*(\.\d+)?
(?<=\bAmount.*) the look behind
\d+                      leading digits (at least one digit)
(,\d+)*               thousands groups (zero or more times)
(\.\d+)?             decimals (? = optional)
Note that the regex tester says "9 matches found" for your pattern. For my pattern it says "1 match found".
The problem with your pattern is that its second part (\d+\,*\.*)* can be empty because of the * at the end. The * quantifier means zero, one or more repetitions. Therefore, the look-behind finds 8 empty entries between Amount and the number. The last of the 9 matches is the number. You can correct it by replacing the * with a +. See: regextester with *, regextester with +. You can also test it with "your" tester and switch to the table to see the detailed results.
My solution does not allow consecutive commas or points but allows numbers without thousands groups or a decimal part.
The lookbehind (?<=\bAmount.*) is always true in the example data after Amount.
The first 7 matches are empty, as (\d+\,*\.*)* can not consume a character where there is no digit, but it matches at the position as the quantifier * matches 0 or more times.
You might use
(?<=\bAmount\D*)\d{1,3}(?:,\d{3})*(?:\.\d{1,2})?\b
(?<=\bAmount\D*) Positive lookbehind, assert Amount to the left followed by optional non digits
\d{1,3} Match 1-3 digits
(?:,\d{3})* Optionally repeat , and 3 digits
(?:\.\d{1,2})? Optionally match . and 1 or 2 digits
\b A word boundary
See a .NET regex demo
For example
double doubleVal = 0;
var pattern = @"(?<=\bAmount\D*)\d{1,3}(?:,\d{3})*(?:\.\d{1,2})?\b";
var textToParse = "Amount : USD 3,747,190.67";
var matchPattern = Regex.Match(textToParse, pattern);
if (matchPattern.Success)
{
double.TryParse(matchPattern.Value.ToString(), out doubleVal);
}
Console.WriteLine(doubleVal);
Output
3747190.67
You can omit the word boundaries if needed if a partial match is also valid
(?<=Amount\D*)\d{1,3}(?:,\d{3})*(?:\.\d{1,2})?
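One caveat with the snippets above is that double.TryParse without a culture uses the current culture, so the commas in "3,747,190.67" may not be read as thousands separators on every machine. The following sketch, with made-up names AmountParser and TryGetAmount, pins the invariant culture and allows thousands separators explicitly; treat that culture choice as an assumption about the input.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class AmountParser
{
    private const string Pattern =
        @"(?<=\bAmount\D*)\d{1,3}(?:,\d{3})*(?:\.\d{1,2})?\b";

    public static bool TryGetAmount(string text, out double amount)
    {
        amount = 0;
        var match = Regex.Match(text, Pattern);
        // AllowThousands lets "," group digits; InvariantCulture fixes "." as the decimal point.
        return match.Success && double.TryParse(
            match.Value,
            NumberStyles.AllowThousands | NumberStyles.AllowDecimalPoint,
            CultureInfo.InvariantCulture,
            out amount);
    }
}

public static class Program
{
    public static void Main()
    {
        if (AmountParser.TryGetAmount("Amount : USD 3,747,190.67", out var amount))
        {
            Console.WriteLine(amount.ToString(CultureInfo.InvariantCulture));
        }
    }
}
```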

Regex expression for matching only floating point numbers

Hi, I need a regular expression for extracting only floating-point numbers, matching from right to left.
Example string
Earning per Equity Share (in ) face value of 2 each26 1,675.10
1,252.56
My current regex, used with RegexOptions.RightToLeft:
(\+|-)?[0-9][0-9]*(\,[0-9]*)?(\.[0-9]*)?
The current output is
1,252.56
1,675.10
26
2
However, I do not want to match 26 or 2.
Please help me
Maybe something like this will help
Regex
/[-+]?[0-9,\.]*([,\.])[0-9]*/g
Example input
Earning -34 5 b4 pe8r blah4 t3st + - (in) 1,252.56 face
-12234,23423.342 of 1,675.10 1,252.56
Matches
1,252.56
-12234,23423.342
1,675.10
1,252.56
Explanation
[-+]? match a single character present in the list below
Quantifier: ? Between zero and one time, as many times as possible, giving back as needed [greedy]
-+ a single character in the list -+ literally
[0-9,\.]* match a single character present in the list below
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
0-9 a single character in the range between 0 and 9
, the literal character ,
\. matches the character . literally
1st Capturing group ([,\.])
[,\.] match a single character present in the list below
, the literal character ,
\. matches the character . literally
[0-9]* match a single character present in the list below
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
0-9 a single character in the range between 0 and 9
g modifier: global. All matches (don't return on first match)
Although this is a regex question, it is also tagged as C#.
Below is an example of how you might get a little bit more control over your output.
It's also culture-specific and only picks up numbers with a decimal place, and has no false positives.
Method
private List<double> GetNumbers(string input)
{
// declare result
var resultList = new List<double>();
// if input is empty return empty results
if (string.IsNullOrEmpty(input))
{
return resultList;
}
// Split input in to words, exclude empty entries
var words = input.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
// set your desired culture
var culture = CultureInfo.CreateSpecificCulture("en-GB");
// Refine words into a list that represents potential numbers
// must have decimal place, must not start or end with decimal place
var refinedList = words.Where(x => x.Contains(".") && !x.StartsWith(".") && !x.EndsWith("."));
foreach (var word in refinedList)
{
double value;
// parse words using designated culture, and the Number option of double.TryParse
if (double.TryParse(word, NumberStyles.Number, culture, out value))
{
resultList.Add(value);
}
}
return resultList;
}
Usage
var testString = "Earning -34 5 b4 , . 234. 234, ,345 45.345 $234234 234.3453.345 $23423.2342 +234 -23423 pe8r blah4 t3st + - (in) 1,252.56 face -12234,23423.342 of 1,675.10 1,252.56";
var results = GetNumbers(testString);
foreach (var item in results)
{
Debug.WriteLine("{0}", item);
}
Output
45.345
1252.56
-1223423423.342
1675.1
1252.56
Additional Notes
You can learn more about double.TryParse and its options, and about the CultureInfo class, in the .NET documentation.

Get sub-strings from a string that are enclosed using some specified character

Suppose I have a string
Likes (20)
I want to fetch the sub-string enclosed in round brackets (in above case its 20) from this string. This sub-string can change dynamically at runtime. It might be any other number from 0 to infinity. To achieve this my idea is to use a for loop that traverses the whole string and then when a ( is present, it starts adding the characters to another character array and when ) is encountered, it stops adding the characters and returns the array. But I think this might have poor performance. I know very little about regular expressions, so is there a regular expression solution available or any function that can do that in an efficient way?
If you don't fancy using regex you could use Split:
string foo = "Likes (20)";
string[] arr = foo.Split(new char[]{ '(', ')' }, StringSplitOptions.None);
string count = arr[1];
Count = 20
This will work fine regardless of the number in the brackets ()
e.g:
Likes (242535345)
Will give:
242535345
Works also with pure string methods:
string result = "Likes (20)";
int index = result.IndexOf('(');
if (index >= 0)
{
result = result.Substring(index + 1); // take part behind (
index = result.IndexOf(')');
if (index >= 0)
result = result.Remove(index); // remove part from )
}
For a strict matching, you can do:
Regex reg = new Regex(@"^Likes \((\d+)\)$");
Match m = reg.Match(yourstring);
this way you'll have all you need in m.Groups[1].Value.
As suggested by I4V, assuming you have only that sequence of digits in the whole string, as in your example, you can use the simpler version:
var res = Regex.Match(str, @"\d+");
and in this case, you can get the value you are looking for with res.Value.
EDIT
In case the value enclosed in brackets is not just numbers, you can just change the \d with something like [\w\d\s] if you want to allow in there alphabetic characters, digits and spaces.
Even with Linq:
var s = "Likes (20)";
var s1 = new string(s.SkipWhile(x => x != '(').Skip(1).TakeWhile(x => x != ')').ToArray());
const string likes = "Likes (20)";
int likesCount = int.Parse(likes.Substring(likes.IndexOf('(') + 1, likes.IndexOf(')') - likes.IndexOf('(') - 1));
Matching when the part in parentheses is supposed to be a number:
string inputstring = "Likes (20)";
Regex reg = new Regex(@"\((\d+)\)");
string num = reg.Match(inputstring).Groups[1].Value;
Explanation:
By definition regexp matches a substring, so unless you indicate otherwise the string you are looking for can occur at any place in your string.
\d stand for digits. It will match any single digit.
We want it to potentially be repeated several times, and we want at least one. The + sign is regexp for previous symbol or group repeated 1 or more times.
So \d+ will match one or more digits. It will match 20.
To ensure that we get the number that is in parentheses, we say that it should be between ( and ). These are special characters in regexp, so we need to escape them.
\(\d+\) would match (20), and we are almost there.
Since we want the part inside the parentheses, not including the parentheses themselves, we tell regexp that the digits part is a single group.
We do that by using parentheses in our regexp. \((\d+)\) will still match (20), but now it will note that 20 is a subgroup of this match, and we can fetch it with Match.Groups[].
For any string in parentheses, things get a little bit harder.
Regex reg = new Regex(@"\((.+)\)");
This would work for many strings (the dot matches any character). But if the input is something like "This is an example(parantesis1)(parantesis2)", you would match (parantesis1)(parantesis2) with parantesis1)(parantesis2 as the captured subgroup. This is unlikely to be what you are after.
The solution can be to match "any character except a closing parenthesis":
Regex reg = new Regex(@"\(([^)]+)\)");
This will find (parantesis1) as the first match, with parantesis1 as .Groups[1].
It will still fail for nested parentheses, but since regular expressions are not the correct tool for nested parentheses, I feel that this case is a bit out of scope.
If you know that the string always starts with "Likes " before the group, then Save's solution is better.
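To make the "anything except a closing parenthesis" idea concrete, here is a small sketch that collects every non-nested parenthesized group in a string. ParenExtractor and Extract are illustrative names, not something from the answers above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class ParenExtractor
{
    public static List<string> Extract(string input)
    {
        // \( and \) match literal parentheses; ([^)]+) captures one or more
        // characters that are not a closing parenthesis.
        return Regex.Matches(input, @"\(([^)]+)\)")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(string.Join(", ", ParenExtractor.Extract("Likes (20)")));
        Console.WriteLine(string.Join(", ",
            ParenExtractor.Extract("This is an example(parantesis1)(parantesis2)")));
    }
}
```

As noted above, nested parentheses are not handled by this approach.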

How can I split part of a string that is inconsistent?

I have the following string:
01-21-27-0000-00-048 and it is easy to split it apart because each section is separated by a -, but sometimes this string is represented as 01-21-27-0000-00048, so splitting it is not as easy because the last 2 parts are combined. How can I handle this? Also, what about the case where it might be something like 01-21-27-0000-00.048
In case anyone is curious, this is a parcel number and it varies from county to county and a county can have 1 format or they can have 100 formats.
This is a very good case for using regular expressions. Your string matches the following regexp:
(\d{2})-(\d{2})-(\d{2})-(\d{4})-(\d{2})[.-]?(\d{3})
Match the input against this expression, and harvest the six groups of digits from the match:
var str = new[] {
"01-21-27-0000-00048", "01-21-27-0000-00.048", "01-21-27-0000-00-048"
};
foreach (var s in str) {
var m = Regex.Match(s, @"(\d{2})-(\d{2})-(\d{2})-(\d{4})-(\d{2})[.-]?(\d{3})");
for (var i = 1 /* one, not zero */ ; i != m.Groups.Count ; i++) {
Console.Write("{0} ", m.Groups[i]);
}
Console.WriteLine();
}
If you would like to allow for other characters, say, letters in the segments that are separated by dashes, you could use \w instead of \d to denote a letter, a digit, or an underscore. If you would like to allow an unspecified number of such characters within a known range, say, two to four, you can use {2,4} in the regexp instead of the more specific {2}, which means "exactly two". For example,
(\w{2,3})-(\w{2})-(\w{2})-(\d{4})-(\d{2})[.-]?(\d{3})
lets the first segment contain two to three digits or letters, and also allow for letters in segments two and three.
Normalize the string first.
I.e. if you know that the last part is always three characters, then insert a - as the fourth-to-last character, then split the resultant string. Along the same line, convert the dot '.' to a dash '-' and split that string.
Replace every character that is not a digit with an empty string.
Then each of your strings ends up in a format like
012127000000048
Now you can divide it into parts of length (2, 2, 2, 4, 2, 3).
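The strip-then-slice idea could look roughly like this. It assumes the parcel number always has exactly 15 digits in the (2, 2, 2, 4, 2, 3) layout, which the question itself says is not true for every county; ParcelSplitter and Split are made-up names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class ParcelSplitter
{
    // Assumed segment layout, taken from the example in the question.
    private static readonly int[] SegmentLengths = { 2, 2, 2, 4, 2, 3 };

    public static string[] Split(string parcel)
    {
        // Drop dashes, dots and anything else that is not an ASCII digit.
        var digits = Regex.Replace(parcel, "[^0-9]", string.Empty);
        if (digits.Length != SegmentLengths.Sum())
        {
            throw new FormatException("Unexpected number of digits: " + digits.Length);
        }

        var parts = new string[SegmentLengths.Length];
        var offset = 0;
        for (var i = 0; i < SegmentLengths.Length; i++)
        {
            parts[i] = digits.Substring(offset, SegmentLengths[i]);
            offset += SegmentLengths[i];
        }
        return parts;
    }
}

public static class Program
{
    public static void Main()
    {
        foreach (var s in new[] { "01-21-27-0000-00048", "01-21-27-0000-00.048", "01-21-27-0000-00-048" })
        {
            Console.WriteLine(string.Join(" ", ParcelSplitter.Split(s)));
        }
    }
}
```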

Regex for numbers only

I haven't used regular expressions at all, so I'm having difficulty troubleshooting. I want the regex to match only when the contained string is all numbers; but with the two examples below it is matching a string that contains all numbers plus an equals sign like "1234=4321". I'm sure there's a way to change this behavior, but as I said, I've never really done much with regular expressions.
string compare = "1234=4321";
Regex regex = new Regex(@"[\d]");
if (regex.IsMatch(compare))
{
//true
}
regex = new Regex("[0-9]");
if (regex.IsMatch(compare))
{
//true
}
In case it matters, I'm using C# and .NET2.0.
Use the beginning and end anchors.
Regex regex = new Regex(@"^\d$");
Use "^\d+$" if you need to match more than one digit.
Note that "\d" will match [0-9] and other digit characters like the Eastern Arabic numerals ٠١٢٣٤٥٦٧٨٩. Use "^[0-9]+$" to restrict matches to just the Arabic numerals 0 - 9.
If you need to include any numeric representations other than just digits (like decimal values for starters), then see @tchrist's comprehensive guide to parsing numbers with regular expressions.
Your regex will match anything that contains a number, you want to use anchors to match the whole string and then match one or more numbers:
regex = new Regex("^[0-9]+$");
The ^ will anchor the beginning of the string, the $ will anchor the end of the string, and the + will match one or more of what precedes it (a number in this case).
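A small sketch contrasting the unanchored and anchored patterns, and \d with [0-9], may make the difference easier to see. DigitCheck and its method names are hypothetical; the Eastern Arabic digits are written as \u escapes.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class DigitCheck
{
    // Unanchored: true as soon as any digit appears anywhere in the string.
    public static bool ContainsDigit(string s) => Regex.IsMatch(s, "[0-9]");

    // Anchored: the whole string must consist of ASCII digits 0-9.
    public static bool IsAllAsciiDigits(string s) => Regex.IsMatch(s, "^[0-9]+$");

    // Anchored, but \d also accepts other Unicode decimal digits in .NET by default.
    public static bool IsAllUnicodeDigits(string s) => Regex.IsMatch(s, @"^\d+$");
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(DigitCheck.ContainsDigit("1234=4321"));        // True
        Console.WriteLine(DigitCheck.IsAllAsciiDigits("1234=4321"));     // False
        Console.WriteLine(DigitCheck.IsAllAsciiDigits("1234"));          // True
        Console.WriteLine(DigitCheck.IsAllUnicodeDigits("\u0660\u0661")); // True
        Console.WriteLine(DigitCheck.IsAllAsciiDigits("\u0660\u0661"));   // False
    }
}
```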
If you need to tolerate decimal point and thousand marker
var regex = new Regex(@"^-?[0-9][0-9,\.]+$");
You will need a "-", if the number can go negative.
This works with integers and decimal numbers. It doesn't match if the number has the comma thousands separator.
"^-?\\d*(\\.\\d+)?$"
Some strings that match this:
894
923.21
76876876
.32
-894
-923.21
-76876876
-.32
Some strings that don't:
hello
9bye
hello9bye
888,323
5,434.3
-8,336.09
87078.
It is matching because it is finding "a match" not a match of the full string. You can fix this by changing your regexp to specifically look for the beginning and end of the string.
^\d+$
Perhaps my method will help you.
public static bool IsNumber(string s)
{
return s.All(char.IsDigit);
}
If you need to check whether all the characters are digits (0-9),
^[0-9]+$
Matches
1425
0142
0
1
And does not match
154a25
1234=3254
Sorry for ugly formatting.
For any number of digits:
[0-9]*
For one or more digit:
[0-9]+
^\d+$, which is "start of string", "1 or more digits", "end of string" in English.
Here is my working one:
^(-?[1-9]+\\d*([.]\\d+)?)$|^(-?0[.]\\d*[1-9]+)$|^0$
And some tests
Positive tests:
string []goodNumbers={"3","-3","0","0.0","1.0","0.1","0.0001","-555","94549870965"};
Negative tests:
string []badNums={"a",""," ","-","001","-00.2","000.5",".3","3."," -1","--1","-.1","-0"};
Checked not only for C#, but also with Java, Javascript and PHP
Use beginning and end anchors.
Regex regex = new Regex(@"^\d$");
Use "^\d+$" if you need to match more than one digit.
While none of the above solutions fit my purpose, this worked for me.
var pattern = @"^(-?[1-9]+\d*([.]\d+)?)$|^(-?0[.]\d*[1-9]+)$|^0$|^0\.0$";
return Regex.Match(value, pattern, RegexOptions.IgnoreCase).Success;
Example of valid values:
"3",
"-3",
"0",
"0.0",
"1.0",
"0.7",
"690.7",
"0.0001",
"-555",
"945465464654"
Example of not valid values:
"a",
"",
" ",
".",
"-",
"001",
"00.2",
"000.5",
".3",
"3.",
" -1",
"--1",
"-.1",
"-0",
"00099",
"099"
Another way: If you like to match international numbers such as Persian or Arabic, so you can use following expression:
Regex regex = new Regex(@"^[\p{N}]+$");
To also allow a literal period character, use:
Regex regex = new Regex(@"^[\p{N}\.]+$");
Regex for integer and floating point numbers:
^[+-]?\d*\.\d+$|^[+-]?\d+(\.\d*)?$
A number can start with a period (without leading digits(s)),
and a number can end with a period (without trailing digits(s)).
Above regex will recognize both as correct numbers.
A . (period) itself without any digits is not a correct number.
That's why we need two regex parts there (separated with a "|").
Hope this helps.
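A quick way to check the claims above is to run the pattern against the edge cases it mentions. This is only a sketch; NumberFormatCheck and IsNumber are illustrative names.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class NumberFormatCheck
{
    // First alternative: optional digits, a period, then at least one digit (".3", "1.5").
    // Second alternative: at least one digit, then an optional period and digits ("3", "3.", "1.5").
    private static readonly Regex NumberRegex =
        new Regex(@"^[+-]?\d*\.\d+$|^[+-]?\d+(\.\d*)?$");

    public static bool IsNumber(string s) => NumberRegex.IsMatch(s);
}

public static class Program
{
    public static void Main()
    {
        foreach (var s in new[] { "3", "3.", ".3", "-1.5", ".", "", "abc" })
        {
            Console.WriteLine("\"{0}\" => {1}", s, NumberFormatCheck.IsNumber(s));
        }
    }
}
```

A lone period fails both alternatives because each one requires at least one digit somewhere.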
I think that this one is the simplest one and it accepts European and USA way of writing numbers e.g. USA 10,555.12 European 10.555,12
Also this one does not allow several commas or dots one after each other e.g. 10..22 or 10,.22
In addition to this numbers like .55 or ,55 would pass. This may be handy.
^([,|.]?[0-9])+$
console.log(/^(0|[1-9][0-9]*)$/.test(3000)) // true
If you want to extract only numbers from a string the pattern "\d+" should help.
To check whether a string is an unsigned integer (uint, ulong) or consists only of digits with a single dot between them:
Sample inputs
Regex rx = new Regex(#"^([1-9]\d*(\.)\d*|0?(\.)\d*[1-9]\d*|[1-9]\d*)$");
string text = "12.0";
var result = rx.IsMatch(text);
Console.WriteLine(result);
Samples
123 => True
123.1 => True
0.123 => True
.123 => True
0.2 => True
3452.434.43=> False
2342f43.34 => False
svasad.324 => False
3215.afa => False
The following regex accepts only numbers (also floating point) in both English and Arabic (Persian) languages (just like Windows calculator):
^((([0\u0660\u06F0]|([1-9\u0661-\u0669\u06F1-\u06F9][0\u0660\u06F0]*?)+)(\.)[0-9\u0660-\u0669\u06F0-\u06F9]+)|(([0\u0660\u06F0]?|([1-9\u0661-\u0669\u06F1-\u06F9][0\u0660\u06F0]*?)+))|\b)$
The above regex accepts the following patterns:
11
1.2
0.3
۱۲
۱.۳
۰.۲
۲.۷
The above regex doesn't accept the following patterns:
3.
.3
0..3
.۱۲
Regex regex = new Regex("^[0-9]{1,4}=[0-9]{1,4}$");
