Get negative numbers from expression - c#

I'm trying to separate the tokens on a string expression. The expression looks like this:
-1-2+-3
This is the regex I'm using:
[\d\.]+|[-][\d\.]+|\+|\-|\*|\/|\^|\(|\)
This brings me these matches:
-1
-2
+
-3
I was expecting:
-1
-
2
+
-3
Any ideas how I can distinguish negative numbers from operators?

Maybe you could try this one; it makes use of a look-behind:
((?<=\d)[+*\/^()-]|\-?[\d.]+)
I tested it here.
Basically, it checks whether there is a digit before the operator to decide what to match. If a digit precedes the operator, the operator is matched on its own; otherwise, the minus is combined with the digits that follow it.
EDIT: Separated the brackets from the lot, just in case (demo):
((?<=\d)[+*\/^-]|[()]|\-?[\d.]+)
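As a rough sketch of how the look-behind pattern could be used from C#, the snippet below collects every match as a token. The class and method names (ExpressionTokenizer, Tokenize) are made up for illustration, and the pattern is the one above with the redundant escapes and the outer group dropped.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class ExpressionTokenizer
{
    // An operator is matched on its own only when a digit precedes it;
    // otherwise a leading '-' is treated as part of the number.
    private static readonly Regex TokenPattern =
        new Regex(@"(?<=\d)[+*/^-]|[()]|-?[\d.]+");

    public static List<string> Tokenize(string expression)
    {
        return TokenPattern.Matches(expression)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}
```

Calling ExpressionTokenizer.Tokenize("-1-2+-3") should yield -1, -, 2, +, -3. One limitation to keep in mind: because an operator is only recognised directly after a digit, an operator that follows a closing bracket (as in (1)+2) is not picked up by this pattern.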

This pattern should do what you're looking for:
^(?:(?<num>-?[\d\.]+)(?:(?<op>[-+*/^])|$))+$
For example:
var input = "-1-2+-3";
var pattern = @"^(?:(?<num>-?[\d\.]+)(?:(?<op>[-+*/^])|$))+$";
var match = Regex.Match(input, pattern);
// Skip group 0 (the whole match), then interleave the num and op captures by position.
var results =
    from Group g in match.Groups.Cast<Group>().Skip(1)
    from Capture c in g.Captures
    orderby c.Index
    select c.Value;
Will produce:
-1
-
2
+
-3

How do I find the Nth occurrence of a pattern with regex?

I have a string of numbers separated by some non-numeric character like this: "16-15316-273"
Is it possible to build a regex that returns the Nth match? I heard that ${n} might help, but it does not work for me, at least in this expression:
// Example: I want to get 15316
var r = new Regex(@"(\d+)${1}");
var m = r.Match("16-15316-273");
(\d+)${0} returns 16, but (\d+)${1} gives me 273 instead of the expected 15316.
So N (the index of the occurrence to extract) and the input string itself ("16-15316-273" is just an example) are dynamic values that may change while the app runs. The task is to build the regex so that the only thing that changes inside it is N, and so that it applies to any such string.
Please do not offer solutions that need additional C# code such as m.Groups[n] or Split; I'm intentionally asking for a proper regex pattern. In short, I cannot modify the code for every new N value. All I can modify is the regex, which is built dynamically, with N passed as a parameter to the method. Everything else is static and cannot be changed.
Maybe this expression will help you?
(?<=(\d+[^\d]+){1})\d+
You will need to modify {1} according to your N.
I.e.
(?<=(\d+[^\d]+){0})\d+ => 16
(?<=(\d+[^\d]+){1})\d+ => 15316
(?<=(\d+[^\d]+){2})\d+ => 273
Your regular expression
(\d+)${1}
says to match this:
(\d+): match 1 or more decimal digits, followed by
${1}: match the atomic zero-width assertion "end of input string" exactly once.
One should note that the {1} quantifier is redundant since there's normally only one end-of-input-string (unless you've turned on the multiline option).
That's why you're matching '273': it's the longest sequence of digits anchored at end-of-string.
You need to use a zero-width positive look-behind assertion. To capture the Nth field in your string, you need to capture that string of digits that is preceded by N-1 fields. Given this source string:
string input = "1-22-333-4444-55555-666666-7777777-88888888-999999999" ;
The regular expression to match the 3rd field, where the first field is 1 rather than 0 looks like this:
(?<=^(\d+(-|$)){2})\d+
It says to
match the longest sequence of digits that is preceded by
start of text, followed by
a group, consisting of
1 or more decimal digits, followed by
either a - or end-of-text
with that group repeated exactly 2 times
Here's a sample program:
string src = "1-22-333-4444-55555-666666-7777777-88888888-999999999" ;
for ( int n = 1 ; n <= 10 ; ++n )
{
int n1 = n-1 ;
string x = n1.ToString(CultureInfo.InvariantCulture) ;
string regex = @"(?<=^(\d+(-|$)){"+ x + @"})\d+" ;
Console.Write( "regex: {0} ",regex);
Regex rx = new Regex( regex ) ;
Match m = rx.Match( src ) ;
Console.WriteLine( "N={0,-2}, N-1={1,-2}, {2}" ,
n ,
n1 ,
m.Success ? "success: " + m.Value : "failure"
) ;
}
It produces this output:
regex: (?<=^(\d+(-|$)){0})\d+ N= 1, N-1=0 , success: 1
regex: (?<=^(\d+(-|$)){1})\d+ N= 2, N-1=1 , success: 22
regex: (?<=^(\d+(-|$)){2})\d+ N= 3, N-1=2 , success: 333
regex: (?<=^(\d+(-|$)){3})\d+ N= 4, N-1=3 , success: 4444
regex: (?<=^(\d+(-|$)){4})\d+ N= 5, N-1=4 , success: 55555
regex: (?<=^(\d+(-|$)){5})\d+ N= 6, N-1=5 , success: 666666
regex: (?<=^(\d+(-|$)){6})\d+ N= 7, N-1=6 , success: 7777777
regex: (?<=^(\d+(-|$)){7})\d+ N= 8, N-1=7 , success: 88888888
regex: (?<=^(\d+(-|$)){8})\d+ N= 9, N-1=8 , success: 999999999
regex: (?<=^(\d+(-|$)){9})\d+ N=10, N-1=9 , failure
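Since only N is allowed to vary, the pattern construction can be isolated in one small helper. The following is a sketch under a few assumptions: the names NthNumber, BuildPattern and Find are hypothetical, N is 1-based, the input starts with a digit, and fields are separated by any run of non-digit characters (\D+) rather than only a hyphen.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answers.
public static class NthNumber
{
    // Builds a pattern that matches the n-th run of digits (1-based):
    // the look-behind requires exactly n-1 "digits + separator" groups
    // between the start of the string and the match.
    public static string BuildPattern(int n)
    {
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));
        string skip = (n - 1).ToString(CultureInfo.InvariantCulture);
        return @"(?<=^(?:\d+\D+){" + skip + @"})\d+";
    }

    // Returns the n-th number as text, or null when there is no such field.
    public static string Find(string input, int n)
    {
        Match m = Regex.Match(input, BuildPattern(n));
        return m.Success ? m.Value : null;
    }
}
```

With the string from the question, NthNumber.Find("16-15316-273", 2) should return 15316, and asking for a fourth field returns null. Anchoring the look-behind with ^ is what keeps a later field from matching by accident.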
Try this:
string text = "16-15316-273";
Regex r = new Regex(@"\d+");
var m = r.Match(text, text.IndexOf('-'));
The output is 15316 ;)

Get sub-strings from a string that are enclosed using some specified character

Suppose I have a string
Likes (20)
I want to fetch the sub-string enclosed in round brackets (in the above case it's 20) from this string. This sub-string can change dynamically at runtime; it might be any non-negative number. My idea is to use a for loop that traverses the whole string: when a ( is found, it starts adding characters to another character array, and when a ) is encountered, it stops adding characters and returns the array. But I think this might perform poorly. I know very little about regular expressions, so is there a regular expression solution, or any function that can do this efficiently?
If you don't fancy using regex you could use Split:
string foo = "Likes (20)";
string[] arr = foo.Split(new char[]{ '(', ')' }, StringSplitOptions.None);
string count = arr[1];
Count = 20
This will work fine regardless of the number in the brackets ()
e.g:
Likes (242535345)
Will give:
242535345
Works also with pure string methods:
string result = "Likes (20)";
int index = result.IndexOf('(');
if (index >= 0)
{
result = result.Substring(index + 1); // take part behind (
index = result.IndexOf(')');
if (index >= 0)
result = result.Remove(index); // remove part from )
}
For strict matching, you can do:
Regex reg = new Regex(@"^Likes \((\d+)\)$");
Match m = reg.Match(yourstring);
This way you'll have all you need in m.Groups[1].Value.
As suggested by I4V, assuming that sequence of digits is the only one in the whole string, as in your example, you can use the simpler version:
var res = Regex.Match(str, @"\d+");
In this case, you can get the value you are looking for from res.Value.
EDIT
In case the value enclosed in brackets is not just numbers, you can just change the \d with something like [\w\d\s] if you want to allow in there alphabetic characters, digits and spaces.
Even with Linq:
var s = "Likes (20)";
var s1 = new string(s.SkipWhile(x => x != '(').Skip(1).TakeWhile(x => x != ')').ToArray());
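const string likes = "Likes (20)";
int open = likes.IndexOf('(');
int close = likes.IndexOf(')');
// The length is the distance between the brackets, excluding both of them.
int likesCount = int.Parse(likes.Substring(open + 1, close - open - 1));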
Matching when the part in parentheses is supposed to be a number:
string inputstring = "Likes (20)";
Regex reg = new Regex(@"\((\d+)\)");
string num = reg.Match(inputstring).Groups[1].Value;
Explanation:
By definition a regexp matches a substring, so unless you indicate otherwise, the text you are looking for can occur anywhere in your string.
\d stands for a digit. It will match any single digit.
We want it to potentially be repeated several times, and we want at least one. The + sign is regexp for "the previous symbol or group, repeated 1 or more times".
So \d+ will match one or more digits. It will match 20.
To ensure that we get the number that is in parentheses, we say that it should be between ( and ). These are special characters in regexp, so we need to escape them.
\(\d+\) would match (20), and we are almost there.
Since we want the part inside the parentheses, not including the parentheses themselves, we tell regexp that the digits part is a single group.
We do that by using unescaped parentheses in our regexp. \((\d+)\) will still match (20), but now it will note that 20 is a subgroup of this match, and we can fetch it with Match.Groups[].
For an arbitrary string in parentheses, things get a little bit harder.
Regex reg=new Regex(@"\((.+)\)")
This would work for many strings (the dot matches any character). But if the input is something like "This is an example(parantesis1)(parantesis2)", you would match (parantesis1)(parantesis2), with parantesis1)(parantesis2 as the captured subgroup. This is unlikely to be what you are after.
The solution can be to match "any character except a closing parenthesis":
Regex reg=new Regex(@"\(([^\)]+)\)")
This will find (parantesis1) as the first match, with parantesis1 as .Groups[1].
It will still fail for nested parentheses, but since regular expressions are not the correct tool for nested parentheses, I feel that this case is a bit out of scope.
If you know that the string always starts with "Likes " before the group, then Save's solution is better.
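To make the "anything except a bracket" idea concrete, here is a small sketch that returns every bracketed sub-string rather than only the first. The names BracketExtractor and ExtractAll are hypothetical, and the character class excludes both ( and ), which is a slightly stricter variant of the pattern discussed above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answers.
public static class BracketExtractor
{
    // Returns the text inside each pair of round brackets.
    // [^()]+ stops at the first bracket of either kind, so two adjacent
    // groups are never merged into one match.
    public static List<string> ExtractAll(string input)
    {
        return Regex.Matches(input, @"\(([^()]+)\)")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }
}
```

BracketExtractor.ExtractAll("Likes (20)") should give a single entry, 20, and the two-group example above should give parantesis1 and parantesis2 separately. For nested brackets this sketch only returns the innermost content, which matches the caveat about nesting.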

Regex regular expression

I'm programming a calculator in C# as a way to get started with the language.
I want to split a string into two variables, Nb1 and Nb2. I looked on the web for examples and found this:
var numAlpha = new Regex("(?<Alpha>[a-zA-Z]*)(?<Numeric>[0-9]*)");
var match = numAlpha.Match("codename123");
var alpha = match.Groups["Alpha"].Value; // Alpha = codename
var num = match.Groups["Numeric"].Value; // Numeric = 123
I can't manage to adapt it to handle only numbers, such as " 121165468746*1132" or "4586/6953":
Nb1 =121165468746 || 4586
Nb2 =1132 || 6953
Can you help me ? I'm going crazy :-)
var numAlpha = new Regex("(?<NumOne>[0-9]+)(?<Operator>[^0-9])(?<NumTwo>[0-9]+)");
var match = numAlpha.Match("121165468746*1132");
var nb1 = match.Groups["NumOne"].Value; // nb1 = 121165468746
var nb2 = match.Groups["NumTwo"].Value; // nb2 = 1132
var op = match.Groups["Operator"].Value; // op = *
It looks like what you're trying to do is match some pair of integers separated by an operator. The above regex uses named groups (?<GroupName> ... ), and two simple regular expressions to achieve that. [0-9]+ will match one or more digits, and [^0-9] will match any one non-digit character, which here is being assumed to be the operator.
If this isn't what you were looking for, leave a comment and I'll try to help you out. In the meantime, some reading material:
Regular Expressions Page, with plenty of tutorials and references
Javascript Regular Expressions Tester with Syntax Highlighting
Java RegEx Tester. More powerful, fewer frills.
Try to use this pattern for your Regex: it assumes that there are at least two numbers with one or more digits, separated by one or more non-digit characters (in case the operator is not only one character). The groups are called n1 and n2.
^(?<n1>\d+)[^\d]+?(?<n2>\d+)$
Use the following to match numbers with the four predefined basic operations: multiply, subtract, add, and divide. You can add more operators to the "op" expression as needed.
Regex rg = new Regex(@"(?<num1>[0-9]+)(?<op>[*\-+/])(?<num2>[0-9]+)");
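For the calculator case, the named groups can be turned into typed values in one step. This is only a sketch: the names BinaryExpression and TryParse are invented here, division is assumed to be written with a forward slash, surrounding whitespace is tolerated, and long is used because 121165468746 does not fit in an int.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answers.
public static class BinaryExpression
{
    // Two unsigned integers separated by one of + - * /,
    // with optional whitespace around each part.
    private static readonly Regex Pattern =
        new Regex(@"^\s*(?<num1>[0-9]+)\s*(?<op>[-+*/])\s*(?<num2>[0-9]+)\s*$");

    // Returns false when the input does not have the expected shape
    // or when a number is too large for a long.
    public static bool TryParse(string input, out long num1, out char op, out long num2)
    {
        num1 = 0;
        num2 = 0;
        op = '\0';
        Match m = Pattern.Match(input);
        if (!m.Success) return false;
        op = m.Groups["op"].Value[0];
        return long.TryParse(m.Groups["num1"].Value, out num1)
            && long.TryParse(m.Groups["num2"].Value, out num2);
    }
}
```

For " 121165468746*1132" this should report Nb1 = 121165468746, the operator *, and Nb2 = 1132. Returning a bool instead of throwing keeps the caller in control of how to report malformed input.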

Regex "or" Expression

This is probably a really basic question, but I can't find any answers. I need to match a string by either two or more spaces OR an equals sign.
When I split this string: 9 x 13 = (8.9 x 13.4) (89 x 134)
with ( +) I get:
part 0: 9 x 13 = (8.9 x 13.4)
part 1: (89 x 134)
When I split it with (=) I get:
part 0: 9 x 13
part 1: (8.9 x 13.4) (89 x 134)
How can I split by both? Something like: (=)OR( +)
Edit:
This does not work: (=)|( +). I was expecting:
part 0: 9 x 13
part 1: (8.9 x 13.4)
part 2: (89 x 134)
Your regex should have worked, except it would leave the spaces that were before and after the =. That's assuming you really did use two spaces in the ( +) part (which got normalized to one space by SO's formatting). This one yields the exact result you said you want:
@" {2,}|\s*=\s*"
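A minimal sketch of that split in use, assuming the input really contains two spaces between the bracketed groups. The names DimensionSplitter and SplitParts are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answers.
public static class DimensionSplitter
{
    // Splits on a run of two or more spaces, or on an equals sign
    // together with any whitespace around it.
    public static string[] SplitParts(string line)
    {
        return Regex.Split(line, @" {2,}|\s*=\s*");
    }
}
```

For "9 x 13 = (8.9 x 13.4)  (89 x 134)" this should produce three parts: 9 x 13, (8.9 x 13.4) and (89 x 134). Because the pattern contains no capturing groups, Regex.Split does not include the separators in the result.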
Simply,
Pattern = "\s*=\s*|(?!\))\s+?(?=\()"
(=)|( +)
Is that good for you?
Explanation and example:
http://msdn.microsoft.com/en-us/library/ze12yx1d.aspx , scroll down to the 3rd remark...
You can use a regex like this: [= ]+
var regex = new Regex("[= ]+");
var parts = regex.Split("this is=a test");
// parts = [ "this", "is", "a", "test" ]
If you want to keep the separators enclose the regex in parens: ([= ]+)

How to extract decimal number from string in C#

string sentence = "X10 cats, Y20 dogs, 40 fish and 1 programmer.";
string[] digits = Regex.Split (sentence, #"\D+");
For this code I get these values in the digits array
10,20,40,1
string sentence = "X10.4 cats, Y20.5 dogs, 40 fish and 1 programmer.";
string[] digits = Regex.Split (sentence, #"\D+");
For this code I get these values in the digits array
10,4,20,5,40,1
But I would like to get like
10.4,20.5,40,1
as decimal numbers. How can I achieve this?
A small improvement to @Michael's solution:
// NOTES: about the LINQ:
// .Where() == filters the IEnumerable (which the array is)
// (c=>...) is the lambda for dealing with each element of the array
// where c is an array element.
// .Trim() == trims all blank spaces at the start and end of the string
var doubleArray = Regex.Split(sentence, @"[^0-9\.]+")
    .Where(c => c != "." && c.Trim() != "");
Returns:
10.4
20.5
40
1
The original solution was returning
[empty line here]
10.4
20.5
40
1
.
The decimal/float number extraction regex can be different depending on whether and what thousand separators are used, what symbol denotes a decimal separator, whether one wants to also match an exponent, whether or not to match a positive or negative sign, whether or not to match numbers that may have leading 0 omitted, whether or not extract a number that ends with a decimal separator.
A generic regex to match the most common decimal number types is provided in Matching Floating Point Numbers with a Regular Expression:
[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?
I only changed the capturing group to a non-capturing one (added ?: after the opening parenthesis).
If you need to make it even more generic, if the decimal separator can be either a dot or a comma, replace \. with a character class (or a bracket expression) [.,]:
[-+]?[0-9]*[.,]?[0-9]+(?:[eE][-+]?[0-9]+)?
           ^^^^
Note the expressions above match both integer and floats. To match only float/decimal numbers make sure the fractional pattern part is obligatory by removing the second ? after \. (demo):
[-+]?[0-9]*\.[0-9]+(?:[eE][-+]?[0-9]+)?
            ^
Now, 34 is no longer matched.
If you do not want to match float numbers without leading zeros (like .5) make the first digit matching pattern obligatory (by adding + quantifier, to match 1 or more occurrences of digits):
[-+]?[0-9]+\.[0-9]+(?:[eE][-+]?[0-9]+)?
          ^
Now it matches far fewer samples.
Now, what if you do not want to match <digits>.<digits> inside <digits>.<digits>.<digits>.<digits>? How to match them as whole words? Use lookarounds:
[-+]?(?<!\d\.)\b[0-9]+\.[0-9]+(?:[eE][-+]?[0-9]+)?\b(?!\.\d)
Now, what about those floats that have thousand separators, like 12 123 456.23 or 34,345,767.678? You may add (?:[,\s][0-9]+)* after the first [0-9]+ to match zero or more sequences of a comma or whitespace followed with 1+ digits:
[-+]?(?<![0-9]\.)\b[0-9]+(?:[,\s][0-9]+)*\.[0-9]+(?:[eE][-+]?[0-9]+)?\b(?!\.[0-9])
Swap the comma with \. if you need a comma as the decimal separator and a period as the thousand separator.
Now, how to use these patterns in C#?
var results = Regex.Matches(input, @"<PATTERN_HERE>")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();
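Putting the pieces together, the sketch below plugs the generic pattern into that Matches call and converts each match to a decimal. The names DecimalExtractor and Extract are hypothetical; parsing with CultureInfo.InvariantCulture is an assumption made here so that the dot is always read as the decimal separator, whatever the machine's locale.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answers.
public static class DecimalExtractor
{
    // Optional sign, optional integer part, optional dot, digits, optional exponent.
    private static readonly Regex NumberPattern =
        new Regex(@"[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?");

    // Finds every number in the input and parses it as a decimal.
    // NumberStyles.Float allows the sign, decimal point and exponent.
    public static List<decimal> Extract(string input)
    {
        return NumberPattern.Matches(input)
            .Cast<Match>()
            .Select(m => decimal.Parse(m.Value, NumberStyles.Float, CultureInfo.InvariantCulture))
            .ToList();
    }
}
```

For the sentence in the question, "X10.4 cats, Y20.5 dogs, 40 fish and 1 programmer.", this should return 10.4, 20.5, 40 and 1. Matching the numbers directly avoids the empty entries and the stray "." that the Split approach has to filter out; a value with a very large exponent could still overflow decimal, so production code may prefer decimal.TryParse.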
Try:
Regex.Split(sentence, @"[^0-9\.]+")
You'll need to allow for decimal places in your regular expression. Try the following:
\d+(\.\d+)?
This will match the numbers rather than everything other than the numbers, but it should be simple to iterate through the matches to build your array.
Something to keep in mind is whether you should also be looking for negative signs, commas, etc.
Check the syntax lexers for most programming languages for a regex for decimals.
Match that regex to the string, finding all matches.
If you have Linq:
stringArray.Select(s=>decimal.Parse(s));
A foreach would also work. You may need to check that each string is actually a number, because decimal.Parse throws a FormatException on input it cannot parse (decimal.TryParse returns false instead).
Credit for the following goes to @code4life. All I added is a for loop that parses the integers/decimals before returning.
public string[] ExtractNumbersFromString(string input)
{
input = input.Replace(",", string.Empty);
var numbers = Regex.Split(input, #"[^0-9\.]+").Where(c => !String.IsNullOrEmpty(c) && c != ".").ToArray();
for (int i = 0; i < numbers.Length; i++)
numbers[i] = decimal.Parse(numbers[i]).ToString();
return numbers;
}
