I need to validate a string with a regex to confirm that it follows a valid format.
The string can contain numbers, operators, spaces, dots, left and right parentheses, commas, the aggregate functions SUM, MAX, MIN, AVG, and variables starting with the letter V.
I found this regex ^[0-9+-/()., ]+$, which checks 0-9 (numbers); '+'; '-'; '*'; '/'; '('; ')'; '.'; ','; ' ' (space). But I am not able to include the aggregate functions and the letter V in it.
Some of the valid input strings are
AVG(SUM(1, 2, 3), SUM(4, 5, 6)) * 100
SUM(V1/2,(2+7),3)+(V1+V2)
Can someone please help me with this?
From the comments on the question:
Are you trying to ensure that only valid characters, aggregate functions, and variable names appear in the string, or are you also attempting to check that the string is well formed (i.e. there is an operand on either side of an operator, parentheses are matched, etc.)?
- D M
@D M I am only trying to validate that the characters are valid
- DevMJ
Since you're only looking to check that a formula contains digits, functions, variables, etc (and not that it is also valid for execution), you can add possibilities as alternatives in one group.
One possibility is the pattern ^(?:\d|\+|\-|\/|\*|\(|\)|\.|\,|AVG|SUM|MAX|MIN|V\d+| )*$ which matches the samples you provided.
Try it out!
Explanation:
Token    Matches
^        Start of a line
(?:      Start of the non-capturing group of alternatives
\d       A digit (equivalent to [0-9])
\+       The + character
\-       The - character
\/       The / character
\*       The * character
\(       The ( character
\)       The ) character
\.       The . character
\,       The , character
AVG      The string AVG
SUM      The string SUM
MAX      The string MAX
MIN      The string MIN
V\d+     The V character followed by one or more digits
(space)  A space
)        End of the non-capturing group of alternatives
*        Any of the alternatives zero or more times
$        End of a line
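As a minimal sketch of how this pattern could be used from C#, the check might look like the following. The class and method names are hypothetical and chosen only for illustration; the pattern itself is the one from the answer.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; only the pattern comes from the answer above.
public static class FormulaCharacterCheck
{
    private static readonly Regex AllowedTokens = new Regex(
        @"^(?:\d|\+|\-|\/|\*|\(|\)|\.|\,|AVG|SUM|MAX|MIN|V\d+| )*$");

    // Returns true when the formula consists only of the listed tokens.
    // It does not check that the formula is well formed or executable.
    public static bool HasOnlyAllowedTokens(string formula)
    {
        return AllowedTokens.IsMatch(formula);
    }
}
```

With this sketch, both sample formulas from the question are accepted, while SUM(X1) is rejected because X is not one of the alternatives. A bare V without digits is also rejected, and the empty string is accepted because the group is quantified with *.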
As mentioned in the comments, if you also want to check that the string can be executed successfully, you will need to look into defining a context-free grammar for your "language" and using a tool like ANTLR to parse strings using the grammar.
Since all you care for is the valid characters, that's indeed a job for regexes.
A simple way to filter this is just to add letters to the valid characters:
^[A-Z0-9+-/()., ]+$
You can even add a-z if you want to allow lowercase characters as well.
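Two properties of the character-class approach are worth checking before relying on it. Inside a class, +-/ is read as a range from + to /, which covers + , - . / but not *, and any run of uppercase letters is accepted, not only the four function names. The sketch below compares the class as written with a variant that escapes the hyphen and adds *; the variant and all names are my assumptions, not part of the answer.

```csharp
using System.Text.RegularExpressions;

// Hypothetical names, for illustration only.
public static class CharacterClassCheck
{
    // The class as written in the answer; "+-/" is a range (U+002B to U+002F).
    public static readonly Regex AsWritten = new Regex(@"^[A-Z0-9+-/()., ]+$");

    // Assumed variant: "-" escaped so it is a literal, and "*" added.
    public static readonly Regex WithAsterisk = new Regex(@"^[A-Z0-9+\-*/()., ]+$");
}
```

Under these assumptions, AsWritten accepts SUM(V1/2,(2+7),3)+(V1+V2) but rejects 2 * 3, while WithAsterisk accepts both. Either class also accepts FOO(1), so this approach only restricts characters, not function names.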
Related
I'm trying to make a suvat calculator so one can input decimals, a letter (e.g., S) and a question mark if you do not have a value.
Tests that will be valid include "2.3", "S", "?" but not values like "2.5s", "??", etc (only one type, can't have decimals AND a letter in the same input box)
Is there a regex expression for this? So far I have only got the regex for the decimal number:
^[0-9]\\d*(\\.\\d+)
I did also try a way simpler one but I would like a more developed expression for later on.
[0-9sS.?]
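If I understood your use case correctly, this might work:
^(\?|(\d+\.?\d+)|\S)$
Read it as: the input is either a single question mark,
or a numeric value, possibly with a dot followed by more digits (\d+\.?\d+ needs at least two digits),
or a single non-whitespace character (\S), which covers a single letter but also a single digit or symbol.
You can try it out here: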
https://regex101.com/r/wLGJhJ/1
You can use
@"^(?:[0-9]+(?:\.[0-9]+)?|[A-Za-z?])\z"
Details:
^ - start of string
(?: - start of a non-capturing group:
[0-9]+ - one or more ASCII digits
(?:\.[0-9]+)? - an optional occurrence of . and one or more ASCII digits
| - or
[A-Za-z?] - an ASCII letter or a ? char
) - end of the group
\z - the very end of string.
See a .NET regex demo online.
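A minimal sketch of how this pattern might be wrapped for the input boxes described in the question; the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; only the pattern comes from the answer above.
public static class SuvatInputCheck
{
    private static readonly Regex SingleValue = new Regex(
        @"^(?:[0-9]+(?:\.[0-9]+)?|[A-Za-z?])\z");

    // True for a decimal number, a single ASCII letter, or a single "?".
    public static bool IsValid(string input)
    {
        return SingleValue.IsMatch(input);
    }
}
```

With this sketch, 2.3, S and ? are accepted, while 2.5s and ?? are rejected, which matches the examples in the question.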
How can I check that the string is in correct format. I want the string to compare and pass only if matches exactly. Following are the correct formats :
0.#
0.##
0.###
0.####
0.#####
The hash (#) after the dot (.) can be repeated up to 10 times, but the string must consist only of 0. followed by hashes; nothing else is allowed.
Can someone please guide me on how I can validate a string of this type?
In regular expressions, the caret (^) represents start-of-line and the dollar sign ($) represents end-of-line (or the position before a trailing newline).
A regex for an exact match is just the text you want to match, enclosed by ^ and $. But you must ensure that special regular expression characters are escaped. For example, the regex
^Hello World$
would match exactly on the String "Hello World" and nothing else.
You also can use numbers directly. You need to escape the dot "." as a dot in a regular expression means any character except newline. You escape a character by adding a backslash.
Next you should know about quantifiers. The usual ones are
* -> 0 or many
+ -> 1 or many
{n} -> exactly n times
{n,} -> at least n times
{n,m} -> n to m times
So you can write:
^0\.#{1,10}$
If you use a normal (non-verbatim) C# string in double quotes ("), you must double the backslash:
^0\\.#{1,10}$
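To make the escaping point concrete, here is a small sketch that writes the same pattern as a verbatim literal and as a regular literal. The class and member names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class HashFormatCheck
{
    // Verbatim literal: the backslash is written once.
    public const string VerbatimPattern = @"^0\.#{1,10}$";

    // Regular literal: the backslash must be doubled.
    public const string RegularPattern = "^0\\.#{1,10}$";

    // True for "0." followed by one to ten '#' characters.
    public static bool IsValid(string format)
    {
        return Regex.IsMatch(format, VerbatimPattern);
    }
}
```

Both constants hold the same string, so either form can be passed to Regex.IsMatch. If a trailing newline must also be rejected, \z could be used in place of $.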
I have created the following search patterns:
1) Search numbers within given range and excludes specific numbers (excludes 1,2,8)
string numberPattern = @"^([3-7|9 ]*)$";
2) Search letters within given range and excludes specific characters (excludes B,V)
string characterPattern = @"^(?:(?![BV])[A-Z ])+$";
And there can be three kind of inputs:
Input can be just characters: ANRPIGHSAGASGG
Input can be just numbers: 34567934567967
Input can be letters and numbers: 9ANRPIG34HS56A
Question:
Is there a way to tell the regex that the number pattern should ignore letters and that the character pattern should ignore numbers? The data can be mixed, in any order, and I don't see any way other than splitting numbers and letters into separate lists and applying the related pattern to each. Is there a way to accomplish this using only regex?
I suggest using
^[3-79A-Z -[BV]]*$
See the regex demo.
Details:
^ - a start of a string anchor
[3-79A-Z -[BV]]* - zero or more (*) characters:
3-79A-Z - digits from 3 to 7, 9, uppercase ASCII letters and a space except B and V ASCII letters (the -[BV] is a character class subtraction construct)
$ - end of string anchor.
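A minimal sketch of this pattern applied to the three kinds of input from the question. Character class subtraction is a .NET regex feature; the class and method names here are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; only the pattern comes from the answer above.
public static class MixedInputCheck
{
    // Base set (3-7, 9, A-Z, space) minus the letters B and V.
    private static readonly Regex Allowed = new Regex(@"^[3-79A-Z -[BV]]*$");

    public static bool IsValid(string input)
    {
        return Allowed.IsMatch(input);
    }
}
```

With this sketch, ANRPIGHSAGASGG, 34567934567967 and 9ANRPIG34HS56A are all accepted, while any input containing 1, 2, 8, B or V is rejected. Because of the * quantifier, the empty string is accepted as well.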
Put it into a more readable state so you can maintain it.
^(?:[0-9A-Z](?<![128BV]))+$
Explained
^ # Beginning of string
(?: # Cluster group
[0-9A-Z] # Initially allow 0-9 or A-Z
(?<! [128BV] ) # Qualify, not 1,2,8,B,V
)+ # End cluster, must be at least 1 character
$ # End of string
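The commented layout above can be used almost verbatim in C# by passing RegexOptions.IgnorePatternWhitespace, which makes the engine skip unescaped whitespace and treat # as the start of a comment. A sketch with hypothetical names:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the pattern is the commented form shown above.
public static class ReadablePatternCheck
{
    private static readonly Regex Allowed = new Regex(@"
        ^                    # beginning of string
        (?:                  # cluster group
            [0-9A-Z]         # initially allow 0-9 or A-Z
            (?<! [128BV] )   # then reject 1, 2, 8, B, V
        )+                   # at least one character
        $                    # end of string",
        RegexOptions.IgnorePatternWhitespace);

    public static bool IsValid(string input)
    {
        return Allowed.IsMatch(input);
    }
}
```

Note that this pattern differs slightly from the patterns in the question: it allows the digit 0, does not allow spaces, and requires at least one character.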
I am trying to combine two Regular Expression patterns to determine if a String is either a double value or a variable. My restrictions are as follows:
The variable can only begin with an _ or alphabetical letter (A-Z, ignoring case), but it can be followed by zero or more _s, letters, or digits.
Here's what I have so far, but I can't get it to work properly.
String varPattern = @"[a-zA-Z_](?: [a-zA-Z_]|\d)*";
String doublePattern = @"(?: \d+\.\d* | \d*\.\d+ | \d+ ) (?: [eE][\+-]?\d+)?";
String pattern = String.Format("({0}) | ({1})",
varPattern, doublePattern);
Regex.IsMatch(word, varPattern, RegexOptions.IgnoreCase)
It seems that it is capturing both Regular Expression patterns, but I need it to be either/or.
For example, _A2 2 is valid using the code above, but _A2 is invalid.
Some examples of valid variables are as follows:
_X6 , _ , A , Z_2_A
And some examples of invalid variables are as follows:
2_X6 , $2 , T_2$
I guess I just need clarification on the pattern format for the Regular Expression. The format is unclear to me.
As noted, the literal whitespace you've put in your regular expressions is part of the regular expression. You won't get a match unless that same whitespace is in the text being scanned by the regular expression. If you want to use whitespace to make your regex more readable, you'll need to specify RegexOptions.IgnorePatternWhitespace. After that, if you want to match any whitespace, you'll have to do so explicitly, by specifying \s, \x20, etc.
It should be noted that if you do specify RegexOptions.IgnorePatternWhitespace, you can use Perl-style comments (# to end of line) to document your regular expression (as I've done below). For complex regular expressions, someone 5 years from now — who might be you! — will thank you for the kindness.
Your [presumably intended] patterns are also, I think, more complex than they need be. A regular expression to match the identifier rule you've specified is this:
[a-zA-Z_][a-zA-Z0-9_]*
Broken out into its constituent parts:
[a-zA-Z_] # match an upper- or lower-case letter or an underscore, followed by
[a-zA-Z0-9_]* # zero or more occurrences of an upper- or lower-case letter, decimal digit or underscore
A regular expression to match the conventional style of a numeric/floating-point literal is this:
([+-]?[0-9]+)(\.[0-9]+)?([Ee][+-]?[0-9]+)?
Broken out into its constituent parts:
( # a mandatory group that is the integer portion of the value, consisting of
[+-]? # - an optional plus- or minus-sign, followed by
[0-9]+ # - one or more decimal digits
) # followed by
( # an optional group that is the fractional portion of the value, consisting of
\. # - a decimal point, followed by
[0-9]+ # - one or more decimal digits
)? # followed by,
( # an optional group, that is the exponent portion of the value, consisting of
[Ee] # - The upper- or lower-case letter 'E' indicating the start of the exponent, followed by
[+-]? # - an optional plus- or minus-sign, followed by
[0-9]+ # - one or more decimal digits.
)? # Easy!
Note: Some grammars differ as to whether the sign of the value is a unary operator or part
of the value and whether or not a leading + sign is allowed. Grammars also vary as to whether
something like 123245. is valid (e.g., is a decimal point with no fractional digits valid?)
To combine these two regular expressions,
First, group each of them with parentheses (you might want to name the containing groups, as I've done):
(?<identifier>[a-zA-Z_][a-zA-Z0-9_]*)
(?<number>[+-]?[0-9]+(\.[0-9]+)?([Ee][+-]?[0-9]+)?)
Next, combine with the alternation operation, |:
(?<identifier>[a-zA-Z_][a-zA-Z0-9_]*)|(?<number>[+-]?[0-9]+(\.[0-9]+)?([Ee][+-]?[0-9]+)?)
Finally, enclose the whole shebang in an @"..." literal and you should be good to go.
That's about all there is to it.
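Putting the pieces together, a sketch of the combined pattern might look like this. It wraps the whole numeric literal in the number group and adds ^ and \z anchors so that only complete strings match; the anchors, the layout, and all names are my assumptions rather than part of the answer.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class TokenClassifier
{
    private static readonly Regex Token = new Regex(@"
        ^(?:
            (?<identifier> [a-zA-Z_] [a-zA-Z0-9_]* )   # letter or underscore, then letters, digits, underscores
          |
            (?<number>
                [+-]? [0-9]+                           # optional sign and integer part
                (?: \. [0-9]+ )?                       # optional fraction
                (?: [Ee] [+-]? [0-9]+ )?               # optional exponent
            )
        )\z",
        RegexOptions.IgnorePatternWhitespace);

    // Returns "identifier", "number", or "invalid".
    public static string Classify(string word)
    {
        Match match = Token.Match(word);
        if (!match.Success) return "invalid";
        return match.Groups["identifier"].Success ? "identifier" : "number";
    }
}
```

Under these assumptions, _X6, _, A and Z_2_A classify as identifiers, 3.14 and -2.5E-3 as numbers, and 2_X6, $2, T_2$ and _A2 2 as invalid. Checking which named group succeeded is what makes the result either/or.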
Spaces are not ignored in regular expressions by default, so for each space in your current expressions it is looking for a space in that string. Add the RegexOptions.IgnorePatternWhitespace flag or remove the spaces from your expressions.
You will also want to add some beginning and end of string anchors (^ and $ respectively) so you do not match just part of a string.
You should avoid having spaces in your regular expressions unless you explicitly set RegexOptions.IgnorePatternWhitespace. To make sure you only get matches on complete words, you should include the beginning-of-line (^) and end-of-line ($) anchors. I would also suggest you build the entire expression pattern instead of using String.Format("({0}) | ({1})", ...) as you have here.
The below should work given your examples:
string pattern = @"^(?:[a-zA-Z_][a-zA-Z_\d]*|\d+(?:\.\d+)?(?:[Ee][+-]?\d+)?)$";
I have trouble finding a regex matching this pattern:
A numeric (decimal separator can be . or ,), followed by
a dash -, followed by
a numeric (decimal separator can be . or ,), followed by
a semicolon or a space character
This pattern can be repeated one or more times.
The following examples should match the regex:
1-2;
1-2;3-4;5-6;
1,0-2;
1.0-2;
1,0-2.0;
1-2 3-4;
1-2 3,00-4;5.0-6;
The following examples should not match the regex:
1-2
1 2;
1_2;
1-2;3-4
Edit: updated after 1 2; was moved to the non-matching examples.
This should work:
@"^(\d+([,.]\d+)?-\d+([,.]\d+)?[ ;])+(?<=;)$"
Explanation
^ //Start of the string.
( //Start of the group to be repeated. You can also use (?: to make it non-capturing.
\d+ //One or more digits.
([,.]\d+)? //With an optional decimal
- //Separated by a dash
\d+([,.]\d+)? //Same as before.
[ ;] //Terminated by a semi-colon or a space
)+ //One or more of these groups.
(?<=;) //The last char before the end needs to be a semi-colon
$ //End of string.
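A short sketch that runs this pattern against the examples from the question; the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; only the pattern comes from the answer above.
public static class RangeListCheck
{
    private static readonly Regex RangeList = new Regex(
        @"^(\d+([,.]\d+)?-\d+([,.]\d+)?[ ;])+(?<=;)$");

    // True when the input is one or more "number-number" pairs, each followed
    // by a space or semicolon, with a semicolon as the final character.
    public static bool IsValid(string input)
    {
        return RangeList.IsMatch(input);
    }
}
```

The lookbehind (?<=;) is what rejects inputs whose last pair ends in a space rather than a semicolon, since the repeated group on its own would accept either terminator.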
Try this:
@"^([\d.,]+-[\d.,]+[ ;])*[\d.,]+-[\d.,]+;$"
Note that [\d.,]+ accepts some character sequences which wouldn't normally be considered valid "numeric" values such as 00..,.,. You might want to find a better regular expression to match numeric values and substitute it into the regular expression.