Regex.Split on plus and minus sign - c#

I have a string 1.5(+1.2/-0.5). I want to use Regex to extract the numerical values: {1.5, 1.2, 0.5}.
My plan is to split the string on (, +, / and -. Splitting on ( and / works, but as soon as I also add + and -, the program throws an exception.
string[] foo = Regex.Split("1.5(+1.5/-0.5)", @"(?=[(/)])");
// OK
string[] foo = Regex.Split("1.5(+1.5/-0.5)", @"(?=[(/+-)])");
// Exception thrown
And the caught exception is:
System.ArgumentException: parsing "(?=[(/+-)])" - [x-y] range in
reverse order

The dash is a special character inside square brackets in a regexp. It denotes a range: [a-z] means any character from a to z. When you wrote [(/+-)], it actually means (, /, or any character from + to ). The error comes from the fact that in ASCII ordering ) (41) comes before + (43), so the character range +-) is reversed and therefore invalid.
To fix this, put the dash first or last inside the brackets, or escape it with a backslash.
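The three fixes can be sketched as follows. This is a minimal illustration rather than a definitive implementation; the class name DashInCharacterClass and the helper IsRejected are hypothetical names introduced only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DashInCharacterClass
{
    // Hypothetical helper: returns true when the .NET regex parser rejects the pattern.
    public static bool IsRejected(string pattern)
    {
        try
        {
            new Regex(pattern);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    public static void Main()
    {
        const string input = "1.5(+1.5/-0.5)";

        // The dash sits between + and ), so it is parsed as the reversed range +-)
        Console.WriteLine(IsRejected(@"(?=[(/+-)])"));

        // Dash first, dash last, or dash escaped: all three treat it as a literal.
        string[] fixedPatterns = { @"(?=[-(/+)])", @"(?=[(/+)-])", @"(?=[(/+\-)])" };
        foreach (string pattern in fixedPatterns)
        {
            Console.WriteLine(string.Join(" | ", Regex.Split(input, pattern)));
        }
    }
}
```

Note that the pieces still carry the delimiters (for example +1.5), because the lookahead splits before each character without consuming it.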
And I agree, I'd probably use a global regexp to pick out [0-9.]+, and not a split to cut on everything else.

Have you tried escaping characters like + and -?
And why not use a regex like /\d+\.?\d+/? It won't split the string, but it will return the numbers.
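A minimal sketch of the matching approach, assuming Regex.Matches is used to collect the numbers. The pattern here is a slight variation, \d+(?:\.\d+)?, chosen because \d+\.?\d+ needs at least two digits and would skip a single-digit number such as 5. The class and method names are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class NumberExtractor
{
    // Returns every unsigned decimal number found in the input, in order.
    public static double[] Extract(string input)
    {
        return Regex.Matches(input, @"\d+(?:\.\d+)?")
            .Cast<Match>()
            // InvariantCulture so that "." is always read as the decimal separator.
            .Select(m => double.Parse(m.Value, CultureInfo.InvariantCulture))
            .ToArray();
    }

    public static void Main()
    {
        foreach (double value in Extract("1.5(+1.2/-0.5)"))
        {
            Console.WriteLine(value.ToString(CultureInfo.InvariantCulture));
        }
    }
}
```

The signs are deliberately not part of the match, since the question asks for {1.5, 1.2, 0.5}.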

Related

Regex to match if string is exactly as defined

How can I check that a string is in the correct format? I want the comparison to pass only if the string matches exactly. The following are the correct formats:
0.#
0.##
0.###
0.####
0.#####
There can be up to 10 hashes (#) after the dot (.), but the string must consist only of 0. followed by hashes; nothing else is allowed.
Can someone please guide me on how to validate a string of this type?
In regular expressions, the caret (^) represents the start of a line and the dollar sign ($) represents the end of a line (or the position before a final newline).
For an exact match, enclose the text you want between ^ and $, making sure that any special regular expression characters are escaped. For example, the regex
^Hello World$
would match exactly on the String "Hello World" and nothing else.
You can also use digits directly. You need to escape the dot, because an unescaped . in a regular expression means any character except newline. You escape a character by putting a backslash in front of it.
Next, you should know about quantifiers. The usual ones are:
* -> 0 or more times
+ -> 1 or more times
{n} -> exactly n times
{n,} -> at least n times
{n,m} -> n to m times
So you can write:
^0\.#{1,10}$
If you use a regular (non-verbatim) C# string in double quotes ("), you must double the backslash:
^0\\.#{1,10}$
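As a rough sketch, the pattern can be wrapped in a small validator. The class name HashFormatValidator and the method IsValid are hypothetical; a verbatim string (@"...") is used so the backslash does not need doubling.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HashFormatValidator
{
    // "0", a literal dot, then between 1 and 10 hash characters, and nothing else.
    private static readonly Regex Format = new Regex(@"^0\.#{1,10}$");

    public static bool IsValid(string candidate)
    {
        return Format.IsMatch(candidate);
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("0.#"));
        Console.WriteLine(IsValid("0.##########"));   // 10 hashes
        Console.WriteLine(IsValid("0.###########"));  // 11 hashes
        Console.WriteLine(IsValid("0.#a"));
    }
}
```

Because $ also matches just before a trailing newline, \z can be used in its place if such input must be rejected as well.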

Regular expression to validate a mathematical formula

I need to validate a string using regex to confirm whether it is following a valid format.
The string can contain numbers, operators, space, dot, left parenthesis, right parenthesis, comma, these aggregate functions SUM, MAX, MIN, AVG and variables starting with letter V.
I found this regex: ^[0-9+-/()., ]+$. It checks for 0-9 (numbers), '+', '-', '*', '/', '(', ')', '.', ',' and ' ' (space), but I am not able to include the aggregate functions and the letter V in it.
Some of the valid input strings are
AVG(SUM(1, 2, 3), SUM(4, 5, 6)) * 100
SUM(V1/2,(2+7),3)+(V1+V2)
Can someone please help me with this?
From the comments on the question:
Are you trying to ensure that only valid characters, aggregate functions, and variable names appear in the string, or are you also attempting to check that the string is well formed (i.e. there is an operand on either side of an operator, parentheses are matched, etc.)?
- D M
@D M I am just trying to validate only for valid characters
- DevMJ
Since you're only looking to check that a formula contains digits, functions, variables, etc (and not that it is also valid for execution), you can add possibilities as alternatives in one group.
One possibility is the pattern ^(?:\d|\+|\-|\/|\*|\(|\)|\.|\,|AVG|SUM|MAX|MIN|V\d+| )*$ which matches the samples you provided.
Explanation:
Token      Matches
^          Start of a line
(?:        Start of the non-capturing group of alternatives
\d         A digit (equivalent to [0-9])
\+         The + character
\-         The - character
\/         The / character
\*         The * character
\(         The ( character
\)         The ) character
\.         The . character
\,         The , character
AVG        The string AVG
SUM        The string SUM
MAX        The string MAX
MIN        The string MIN
V\d+       The V character followed by one or more digits
(space)    A space
)          End of the non-capturing group of alternatives
*          Any of the alternatives zero or more times
$          End of a line
As mentioned in the comments, if you also want to check that the string can be executed successfully, you will need to look into defining a context-free grammar for your "language" and using a tool like ANTLR to parse strings using the grammar.
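A minimal sketch of how the pattern above might be used from C#. The class name FormulaCharacterCheck and the method HasOnlyAllowedTokens are hypothetical. As stated above, this only checks which tokens appear, not whether the formula is well formed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FormulaCharacterCheck
{
    // Pattern taken from the answer above: each alternative is one allowed token.
    private static readonly Regex Allowed = new Regex(
        @"^(?:\d|\+|\-|\/|\*|\(|\)|\.|\,|AVG|SUM|MAX|MIN|V\d+| )*$");

    public static bool HasOnlyAllowedTokens(string formula)
    {
        return Allowed.IsMatch(formula);
    }

    public static void Main()
    {
        Console.WriteLine(HasOnlyAllowedTokens("AVG(SUM(1, 2, 3), SUM(4, 5, 6)) * 100"));
        Console.WriteLine(HasOnlyAllowedTokens("SUM(V1/2,(2+7),3)+(V1+V2)"));
        Console.WriteLine(HasOnlyAllowedTokens("FOO(1)"));
    }
}
```

Because the group is quantified with *, an empty string also passes; changing * to + would require at least one token.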
Since all you care about is the valid characters, that's indeed a job for regexes.
A simple way to filter this is just to add letters to the valid characters:
^[A-Z0-9+-/()., ]+$
You can even add a-z if you want to allow lowercase characters as well.

Add special characters to alphanumeric regex

Brand new to using Regular Expressions. I have one that currently accepts alphanumeric characters only. I need to add the following special characters to the regex:
@ #$%*():;"',/? !+=-_
Here is the regular expression:
RegularExpression(@"^[a-zA-Z\s.,0-9-]{1,30}$",
When I try to add the special characters, I alter the Regex like so:
RegularExpression(@"^[a-zA-Z\s.,0-9-@ #$%*():;"',/? !+=-_]{1,30}$"
However this throws an error starting with the ' character that says Newline in constant.
I've tried to escape both the " and the ' characters, but without any luck.
The problem comes from the double quote, which needs to be escaped by doubling it ("") inside a verbatim string, not from the single quote.
@"^[a-zA-Z\s.,0-9@#$%*():;""'/?!+=_-]{1,30}$"
Note that the - must be in the last (or first) position in a character class, since it otherwise has a special meaning (it defines ranges).
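A small sketch of the corrected pattern in a verbatim string, using plain Regex instead of the RegularExpression attribute so that it runs on its own. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpecialCharacterCheck
{
    // Inside @"...", a double quote is written as "" and backslashes are not doubled.
    // The dash comes last in the class, so it is a literal and not a range operator.
    private static readonly Regex Allowed =
        new Regex(@"^[a-zA-Z\s.,0-9@#$%*():;""'/?!+=_-]{1,30}$");

    public static bool IsAllowed(string text)
    {
        return Allowed.IsMatch(text);
    }

    public static void Main()
    {
        Console.WriteLine(IsAllowed("Price: $5 (approx.)"));
        Console.WriteLine(IsAllowed("say \"hi\" - it's ok"));
        Console.WriteLine(IsAllowed("a<b"));
    }
}
```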
These regexes are equivalent to yours.
Both use tilde ~ as the delimiter.
Both use double quotes on the regex strings.
Note that for the dash - in a class to be interpreted literally and not as a range operator, it must be placed somewhere unambiguous, or be escaped.
A good place to put it is between valid ranges (or at the beginning or end of a class).
For example [a-z-0-9] is a good place.
Edit - the literal '-' may have to be escaped or placed at the beginning/end of the class. (The case above was for Perl/PCRE engines.)
This one ^[a-z-A-Z0-9_\s.,@#$%*():;"',/?!+=]{1,30}$ is your regex without duplicate chars.
To make it more readable, noting that the word class \w is contained in it, it can be reduced to
^[\w-\s.,@#$%*():;"',/?!+=]{1,30}$
Edit - PHP test cases removed.

regex issue c# numbers are underscores now

My Regex is replacing all digits (0-9) in my string.
I don't get why all numbers are replaced by _
EDIT: I understand that my "_" replacement changes the matched characters into underscores. But why the numbers?
Can anyone help me out? I only need to remove all the special characters.
See regex here:
string symbolPattern = "[!@#$%^&*()-=+`~{}'|]";
Regex.Replace("input here 12341234", symbolPattern, "_");
Output: "input here ________"
The problem is that your pattern uses a dash in the middle, which defines a range of ASCII characters from ) to =. Here's a breakdown:
): 41
1: 49
=: 61
As you can see, the digits (0 is 48, 1 is 49, up to 9 at 57) fall within the range 41-61, so they're matched and replaced.
You need to place the - at either the beginning or end of the character class for it to be matched literally rather than act as a range:
"[-!@#$%^&*()=+`~{}'|]"
You must escape the -, because the range )-= contains the digits:
string symbolPattern = @"[!@#$%^&*()\-=+`~{}'|]";
Move the - to the end of the list so it is seen as a literal:
"[!@#$%^&*()=+`~{}'|-]"
Or, to the front:
"[-!@#$%^&*()=+`~{}'|]"
As it stands, it will match all characters in the range )-=, which includes all numerals.
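The effect of the accidental range can be reproduced with a short sketch. The class name DashRangeDemo and the method Mask are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DashRangeDemo
{
    // Pattern from the question: the dash between ) and = forms the range )-= (41 to 61).
    public const string RangePattern = "[!@#$%^&*()-=+`~{}'|]";

    // Dash moved to the front, where it is a literal character.
    public const string LiteralPattern = "[-!@#$%^&*()=+`~{}'|]";

    public static string Mask(string input, string pattern)
    {
        return Regex.Replace(input, pattern, "_");
    }

    public static void Main()
    {
        // Code points: the digits 48..57 lie inside 41..61.
        Console.WriteLine((int)')');  // 41
        Console.WriteLine((int)'0');  // 48
        Console.WriteLine((int)'9');  // 57
        Console.WriteLine((int)'=');  // 61

        Console.WriteLine(Mask("input here 12341234", RangePattern));    // input here ________
        Console.WriteLine(Mask("input here 12341234", LiteralPattern));  // input here 12341234
    }
}
```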
You need to escape your special characters in your regex. For instance, * is a wildcard match. Look at what some of those special characters mean for your match.
I've not used C#, but typically the "*" character is also a control character that would need escaping.
The following matches a whole line of any characters, although the "^" and "$" are somewhat redundant:
^.*$
This matches any number of "A" characters that appear in a string:
A*
The "Owl" book from O'Reilly is what you really need to research this:
http://shop.oreilly.com/product/9780596528126.do?green=B5B9A1A7-B828-5E41-9D38-70AF661901B8&intcmp=af-mybuy-9780596528126.IP

Regular Expression: single word

I want to check in a C# program whether a user input is a single word. The word may only contain the characters A-Z and a-z. No spaces or other characters.
I tried [A-Za-z]*, but this doesn't work. What is wrong with this expression?
Regex regex = new Regex("[A-Za-z]*");
if (!regex.IsMatch(userinput))
{
...
}
Can you recommend a website with a comprehensive list of regex examples?
It probably works, but you aren't anchoring the regular expression. You need to use ^ and $ to anchor the expression to the beginning and end of the string, respectively:
Regex regex = new Regex("^[A-Za-z]+$");
I've also changed * to + because * will match 0 or more times while + will match 1 or more times.
You should add anchors for start and end of string: ^[A-Za-z]+$
Regarding the question of regex examples have a look at http://regexlib.com/.
For the regex, have a look at the special characters ^ and $, which represent starting and ending of string. This site can come in handy when constructing regexes in the future.
The asterisk character in regex specifies "zero or more of the preceding character class".
This explains why your expression is failing: it succeeds even when the string contains zero letters, so every input matches.
What you probably intended was to have one or more letters, in which case you should use the plus sign instead of the asterisk.
Having made that change, now it will fail if you enter a string that doesn't contain any letters, as you intended.
However, this still won't work for you entirely, because it will allow other characters in the string. If you want to restrict it to only letters, and nothing else, then you need to provide the start and end anchors (^ and $) in your regex to make the expression check that the 'one or more letters' is attached to the start and end of the string.
^[a-zA-Z]+$
This should work as intended.
Hope that helps.
For more information on regex, I recommend http://www.regular-expressions.info/reference.html as a good reference site.
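The difference between the unanchored and the anchored pattern can be sketched as follows. The class name SingleWordCheck is a hypothetical name for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SingleWordCheck
{
    // Anchored: the whole input must consist of one or more ASCII letters.
    public static bool IsSingleWord(string input)
    {
        return Regex.IsMatch(input, "^[A-Za-z]+$");
    }

    public static void Main()
    {
        // Unanchored with *: an empty match is always possible, so everything "matches".
        Console.WriteLine(Regex.IsMatch("two words", "[A-Za-z]*"));  // True
        Console.WriteLine(Regex.IsMatch("1234", "[A-Za-z]*"));       // True

        Console.WriteLine(IsSingleWord("word"));       // True
        Console.WriteLine(IsSingleWord("two words"));  // False
        Console.WriteLine(IsSingleWord(""));           // False
    }
}
```

If input ending in a newline must also be rejected, \z can be used instead of $, since $ also matches just before a final newline.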
I don't know what the C#'s regex syntax is, but try [A-Za-z]+.
Try ^[A-Za-z]+$. If you don't include the ^ and $, it will match any part of the string that contains alphabetic characters.
I know the question is only about strictly alphabetic input, but here's an interesting way of solving this which does not break on accented letters and other such special characters.
The regex "^\b.+?\b" will match the first word on the start of a string, but only if the string actually starts with a valid word character. Using that, you can simply check if A) the string matches, and B) the length of the matched string equals your full string's length:
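// Requires: using System.Text.RegularExpressions;
public Boolean IsSingleWord(String userInput)
{
    // Lazily match from the start of the string up to the next word boundary.
    Regex firstWordRegex = new Regex("^\\b.+?\\b");
    Match firstWordMatch = firstWordRegex.Match(userInput);
    // A single word means the first word spans the entire input.
    return firstWordMatch.Success && firstWordMatch.Length == userInput.Length;
}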
The others have explained how to solve the problem you know about. Now I'll talk about a problem you perhaps don't know about: diacritics :-) Your solution doesn't support àèéìòù and many other letters. A correct solution would be:
^(\p{L}\p{M}*)+$
where \p{L} is any letter and \p{M}* is zero or more diacritic marks (in Unicode, diacritics can be separate from their base letters, so you can have a combining sequence like a + ` = à, or a precomposed character like the standard à).
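A minimal sketch of this Unicode-aware check, covering both the precomposed and the combining form. The class name UnicodeWordCheck is hypothetical, and the test strings are written with escapes so they do not depend on the source file encoding.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeWordCheck
{
    // One or more letters, each optionally followed by combining marks.
    public static bool IsSingleWord(string input)
    {
        return Regex.IsMatch(input, @"^(\p{L}\p{M}*)+$");
    }

    public static void Main()
    {
        string precomposed = "\u00E0\u00E8\u00E9";  // precomposed accented letters
        string combining = "a\u0300";               // a followed by a combining grave accent

        Console.WriteLine(IsSingleWord(precomposed));
        Console.WriteLine(IsSingleWord(combining));
        Console.WriteLine(IsSingleWord("abc1"));
    }
}
```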
If you just need the characters a-z and A-Z, you could simply iterate over the characters and check whether each one is inside your ranges.
For example:
for each character c: ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')
This could improve performance.
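The loop described above might look like this in C#. The class name AsciiLetterCheck is hypothetical, and treating an empty or null input as "not a word" is an assumption made for this sketch.

```csharp
using System;

public static class AsciiLetterCheck
{
    public static bool IsAsciiLetters(string input)
    {
        // Assumption: an empty or null input does not count as a word.
        if (string.IsNullOrEmpty(input))
        {
            return false;
        }

        foreach (char c in input)
        {
            bool isLetter = ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z');
            if (!isLetter)
            {
                return false;  // stop at the first character outside both ranges
            }
        }

        return true;
    }

    public static void Main()
    {
        Console.WriteLine(IsAsciiLetters("Hello"));
        Console.WriteLine(IsAsciiLetters("Hello1"));
        Console.WriteLine(IsAsciiLetters("two words"));
    }
}
```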
