I'm looking for a regular expression that matches only when all curly braces are properly matched. Matching braces can be nested.
Examples that should match:
Hello {0}{}
Hello to the following {0}: {{Object1}}, {{Object2}}
Test { {1} { {2} { {3} { {4}}}}}
Examples that should not match:
}{Hello {0}
{{}Hello to the following {0}: {{Object1}}, {{Object2}}
Test { {1} { {2} { {3} { {4}{}
In .NET you can use balancing groups to count nesting, which lets you solve this kind of problem.
For example, to make sure { and } are balanced, you could use an expression like:
(?x)^
[^{}]*
(?:
(?:
(?'open' \{ ) # open++
[^{}]*
)+
(?:
(?'close-open' \} ) # open--, only if open > 0
[^{}]*
)+
)*
(?(open) (?!) ) # fail if open != 0
$
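To try the pattern from C#, it can be wrapped in a verbatim string and run against the examples from the question. This is a sketch: `BraceRegex` and `IsBalanced` are illustrative names, and the pattern is laid out a little more compactly but is otherwise the one above.

```csharp
using System.Text.RegularExpressions;

public static class BraceRegex
{
    // The balancing-group pattern above; (?x) lets it keep its layout and comments.
    static readonly Regex Balanced = new Regex(@"(?x)^
        [^{}]*
        (?:
            (?: (?'open' \{ ) [^{}]* )+           # one or more {, each pushed onto 'open'
            (?: (?'close-open' \} ) [^{}]* )+     # one or more }, each popping 'open'
        )*
        (?(open) (?!) )                           # fail if any { is left unclosed
        $");

    public static bool IsBalanced(string s) => Balanced.IsMatch(s);
}
```

For example, `BraceRegex.IsBalanced("Hello {0}{}")` should be true and `BraceRegex.IsBalanced("}{Hello {0}")` false.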
A non-regex alternative is to count the braces in a single pass:
bool BracesMatch(string s)
{
    int numOpen = 0, numClosed = 0;
    foreach (char c in s)
    {
        if (c == '{') numOpen++;
        if (c == '}') numClosed++;
        // a } without an earlier unmatched { can never be balanced later
        if (numClosed > numOpen) return false;
    }
    return numOpen == numClosed;
}
This should also work using .NET balancing groups:
# @"^[^{}]*(?:\{(?>[^{}]+|\{(?<Depth>)|\}(?<-Depth>))*(?(Depth)(?!))\}[^{}]*)*[^{}]*$"
^
[^{}]* # Anything (but only if we're not at the start of { or } )
(?:
\{ # Match opening {
(?> # Then either match (possessively):
[^{}]+ # Anything (but only if we're not at the start of { or } )
| # or
\{ # { (and increase the braces counter)
(?<Depth> )
| # or
\} # } (and decrease the braces counter).
(?<-Depth> )
)* # Repeat as needed.
(?(Depth) # Assert that the braces counter is at zero.
(?!) # Fail this part if depth > 0
)
\} # Then match a closing }.
[^{}]* # Anything (but only if we're not at the start of { or } )
)* # Repeat as needed
[^{}]* # Anything (but only if we're not at the start of { or } )
$
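One way to gain some confidence in the one-line form (the C# string in the comment at the top of this pattern) is to compare it with a plain brace counter, like the BracesMatch function earlier, on a few inputs. A minimal sketch; `BraceChecks`, `RegexSaysBalanced` and `CounterSaysBalanced` are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class BraceChecks
{
    // One-line version of the pattern above (no (?x), so no comments inside).
    static readonly Regex Depth = new Regex(
        @"^[^{}]*(?:\{(?>[^{}]+|\{(?<Depth>)|\}(?<-Depth>))*(?(Depth)(?!))\}[^{}]*)*[^{}]*$");

    public static bool RegexSaysBalanced(string s) => Depth.IsMatch(s);

    // Reference check: never close more than was opened, and end at depth zero.
    public static bool CounterSaysBalanced(string s)
    {
        int depth = 0;
        foreach (char c in s)
        {
            if (c == '{') depth++;
            else if (c == '}' && --depth < 0) return false;
        }
        return depth == 0;
    }
}
```

Both checks are expected to agree; the counter is the easier one to reason about, while the regex is handy when only a pattern can be supplied (for example in a validation attribute).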
Related
I'm currently developing an application and I have a problem with Regex.
I have a text file that contains emails like this:
test@test.uk
test1@test.uk
My function loademail should import the emails from the text file and add them to the result list.
But the problem is that, although it runs, it doesn't add any email.
This is my code:
public class Loademail
{
public EmailAddress email;
public List<Loademail> loademail()
{
var result = new List<Loademail>();
string fileSocks = Path.GetFullPath(Path.Combine(Application.StartupPath, "liste.txt"));
var input = File.ReadAllText(fileSocks);
var r = new Regex(@"^(([\w-]+\.)+[\w-]+|([a-zA-Z]{1}|[\w-]{2,}))@"
+ @"((([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.([0-1]?
[0-9]{1,2}|25[0-5]|2[0-4][0-9])\."
+ @"([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.([0-1]?
[0-9]{1,2}|25[0-5]|2[0-4][0-9])){1}|"
+ @"([a-zA-Z0-9]+[\w-]+\.)+[a-zA-Z]{1}[a-zA-Z0-9-]{1,23})$", RegexOptions.IgnoreCase);
foreach (Match match in r.Matches(input))
{
string Email = match.Groups[1].Value;
Loademail bi = new Loademail();
bi.email = EmailAddress.Parse(Email);
result.Add(bi);
//result.Add(Email);
}
return result;
}
}
What should I do? Thanks.
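Use RegexOptions.IgnorePatternWhitespace, so that the line breaks inside your verbatim strings are not treated as part of the pattern.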
Edit
Try it using a while loop that calls NextMatch(). Since the file has one address per line, also add RegexOptions.Multiline so that ^ and $ match at every line, like this:
Regex Rx = new Regex(
    @"
    ^(([\w-]+\.)+[\w-]+|([a-zA-Z]{1}|[\w-]{2,}))
    @
    ((([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.
      ([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.
      ([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.
      ([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])){1}
    |([a-zA-Z0-9]+[\w-]+\.)+[a-zA-Z]{1}[a-zA-Z0-9-]{1,23})
    (?=\r?$)    # end of line, also before the \r of a Windows \r\n
    ",
    RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline);
// Input is the text read from liste.txt
Match _mData = Rx.Match(Input);
while (_mData.Success)
{
    // Value is the whole address; Groups[1] is only the part before the @
    Console.WriteLine(_mData.Value);
    _mData = _mData.NextMatch();
}
Use a good tool to format and process large expressions.
Formatted:
^
( # (1 start)
( [\w-]+ \. )+ # (2)
[\w-]+
| ( [a-zA-Z]{1} | [\w-]{2,} ) # (3)
) # (1 end)
@
( # (4 start)
( # (5 start)
( # (6 start)
[0-1]? [0-9]{1,2}
| 25 [0-5]
| 2 [0-4] [0-9]
) # (6 end)
\.
( # (7 start)
[0-1]?
[0-9]{1,2}
| 25 [0-5]
| 2 [0-4] [0-9]
) # (7 end)
\.
( # (8 start)
[0-1]? [0-9]{1,2}
| 25 [0-5]
| 2 [0-4] [0-9]
) # (8 end)
\.
( # (9 start)
[0-1]?
[0-9]{1,2}
| 25 [0-5]
| 2 [0-4] [0-9]
) # (9 end)
){1} # (5 end)
|
( [a-zA-Z0-9]+ [\w-]+ \. )+ # (10)
[a-zA-Z]{1} [a-zA-Z0-9-]{1,23}
) # (4 end)
$
As a side note, this is a good email regex as well.
# http://www.w3.org/TR/html5/forms.html#valid-e-mail-address
# ^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$
^
[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+
@
[a-zA-Z0-9]
(?:
[a-zA-Z0-9-]{0,61}
[a-zA-Z0-9]
)?
(?:
\.
[a-zA-Z0-9]
(?:
[a-zA-Z0-9-]{0,61}
[a-zA-Z0-9]
)?
)*
$
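An alternative that sidesteps the ^/$ and line-ending issues entirely is to test each line of the file on its own. The sketch below does that with the W3C pattern above, written as one line with `@` between the two parts; `EmailLines.Extract` is a hypothetical helper, and splitting on '\n' plus Trim() assumes the file holds one address per line.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EmailLines
{
    // The W3C/HTML5 "valid e-mail address" pattern from above, as one line.
    static readonly Regex Email = new Regex(
        @"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$");

    // Checks each line separately, so ^ and $ always refer to a single address,
    // and Trim() removes the \r left over from Windows line endings.
    public static List<string> Extract(string text)
    {
        var result = new List<string>();
        foreach (string rawLine in text.Split('\n'))
        {
            string line = rawLine.Trim();
            if (Email.IsMatch(line)) result.Add(line);
        }
        return result;
    }
}
```

With the question's file this would be called as `EmailLines.Extract(File.ReadAllText(fileSocks))` (which needs `using System.IO;`), and each returned string could then be handed to `EmailAddress.Parse`.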
Regular expression for getting the text between parentheses ( ). I tried the following, but I am not getting the right regex for this example:
Regex.Match(script, @"\((.*?)\)").Value
Example:
add(mul(a,add(b,c)),d) + e - sub(f,g)
Expected output:
1) mul(a,add(b,c)),d
2) f,g
.NET regular expressions do not support recursion, but balancing groups let you keep count of the nesting depth instead. See Balancing Group Definitions.
using System;
using System.Text.RegularExpressions;
var input = @"add(mul(a,add(b,c)),d) + e - sub(f,g)";
var regex = new Regex(@"
    \(                      # Match (
    (
        [^()]+              # all chars except ()
      | (?<Level>\()        # or if ( then Level += 1
      | (?<-Level>\))       # or if ) then Level -= 1
    )+                      # Repeat (to go from inside to outside)
    (?(Level)(?!))          # fail if Level is still > 0 (unclosed parentheses)
    \)                      # Match )",
    RegexOptions.IgnorePatternWhitespace);
foreach (Match c in regex.Matches(input))
{
    // strip exactly one ( and one ); Trim('(', ')') would also eat nested ones, e.g. in "((a)+b)"
    Console.WriteLine(c.Value.Substring(1, c.Value.Length - 2));
}
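If a regex is not a requirement, the same top-level extraction can be done with a depth counter. A sketch, assuming the parentheses in the input are balanced; `TopLevelParens.Extract` is a hypothetical name. Unlike the regex above, it also returns an empty string for an empty pair ().

```csharp
using System.Collections.Generic;
using System.Text;

public static class TopLevelParens
{
    // Returns the text inside each outermost (...) pair, assuming balanced input.
    public static List<string> Extract(string input)
    {
        var result = new List<string>();
        var current = new StringBuilder();
        int depth = 0;
        foreach (char c in input)
        {
            if (c == ')') depth--;
            if (depth > 0) current.Append(c);     // inside an outer pair (its own parens excluded)
            if (c == '(') depth++;
            if (c == ')' && depth == 0)            // just closed an outer pair
            {
                result.Add(current.ToString());
                current.Clear();
            }
        }
        return result;
    }
}
```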
I have text like this:
This is {name1:value1}{name2:{name3:even dipper {name4:valu4} dipper} some inner text} text
I want to parse out data like that:
Name: name1
Value: value1
Name: name2
Value: {name3:even dipper {name4:valu4} dipper} some inner text
I would then recursively process each value to parse out nested fields.
Can you recommend a RegEx expression to do this?
In C# you can use balancing groups to count and balance the brackets:
{ (?'name' \w+ ) : # start of tag
(?'value' # named capture
(?> # don't backtrack
(?:
[^{}]+ # not brackets
| (?'open' { ) # count opening bracket
| (?'close-open' } ) # subtract closing bracket (matches only if open count > 0)
)*
)
(?(open)(?!)) # make sure open is not > 0
)
} # end of tag
Example:
using System;
using System.Text.RegularExpressions;
string re = @"(?x) # enable eXtended mode (comments/spaces ignored)
    { (?'name' \w+ ) :                  # start of tag
    (?'value'                           # named capture
        (?>                             # don't backtrack
            (?:
                [^{}]+                  # not brackets
              | (?'open' { )            # count opening bracket
              | (?'close-open' } )      # subtract closing bracket (matches only if open count > 0)
            )*
        )
        (?(open)(?!))                   # make sure open is not > 0
    )
    }                                   # end of tag
    ";
string str = @"This is {name1:value1}{name2:{name3:even dipper {name4:valu4} dipper} some inner text} text";
foreach (Match m in Regex.Matches(str, re))
{
    Console.WriteLine("name: {0}, value: {1}", m.Groups["name"], m.Groups["value"]);
}
Output:
name: name1, value: value1
name: name2, value: {name3:even dipper {name4:valu4} dipper} some inner text
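For the recursive processing mentioned in the question, each captured value can simply be fed back into the same regex. A minimal sketch; `NestedTags.Walk` and the indentation-by-depth output are illustrative choices, not part of the answer.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NestedTags
{
    // The balancing-group pattern from the example above.
    static readonly Regex Tag = new Regex(@"(?x)
        { (?'name' \w+ ) :
        (?'value'
            (?>
                (?:
                    [^{}]+
                  | (?'open' { )
                  | (?'close-open' } )
                )*
            )
            (?(open)(?!))
        )
        }");

    // Depth-first walk: adds each name (indented by nesting level), then
    // applies the same regex to that tag's value.
    public static void Walk(string text, int depth, List<string> output)
    {
        foreach (Match m in Tag.Matches(text))
        {
            output.Add(new string(' ', depth * 2) + m.Groups["name"].Value);
            Walk(m.Groups["value"].Value, depth + 1, output);
        }
    }
}
```

Text outside the tags (such as " some inner text") is ignored by this walk; it only collects the tag names.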
If using Perl/PHP/PCRE it's not complicated at all. You can use an expression like this (with the x modifier, since it contains comments):
{(\w+): # start of tag
((?:
[^{}]+ # not a tag
| (?R) # a tag (recurse to match the whole regex)
)*)
} # end of tag
I want to validate a string in C# that contains a Boolean expression with brackets.
The string should only contain numbers 1-9, round brackets, "OR" , "AND".
Examples of good strings:
"1 AND 2"
"2 OR 4"
"4 AND (3 OR 5)"
"2"
And so on...
I am not sure if regular expressions are flexible enough for this task.
Is there a nice short way of achieving this in C# ?
It's probably simpler to do this with a simple parser. But you can do this with .NET regex by using balancing groups and realizing that if the brackets are removed from the string you always have a string matched by a simple expression like ^\d+(?:\s+(?:AND|OR)\s+\d+)*\z.
So all you have to do is use balancing groups to make sure that the brackets are balanced (and are in the right place in the right form).
Rewriting the expression above a bit:
(?x)^
OPENING
\d+
CLOSING
(?:
\s+(?:AND|OR)\s+
OPENING
\d+
CLOSING
)*
BALANCED
\z
((?x) makes the regex engine ignore all whitespace and comments in the pattern, so it can be made more readable.)
Where OPENING matches any number (0 included) of opening brackets:
\s* (?: (?<open> \( ) \s* )*
CLOSING matches any number of closing brackets also making sure that the balancing group is balanced:
\s* (?: (?<-open> \) ) \s* )*
and BALANCED performs a balancing check, failing if there are more open brackets than closed:
(?(open)(?!))
Giving the expression:
(?x)^
\s* (?: (?<open> \( ) \s* )*
\d+
\s* (?: (?<-open> \) ) \s* )*
(?:
\s+(?:AND|OR)\s+
\s* (?: (?<open> \( ) \s* )*
\d+
\s* (?: (?<-open> \) ) \s* )*
)*
(?(open)(?!))
\z
If you do not want to allow arbitrary spaces, remove every \s*.
Example
See demo at IdeOne. Output:
matched: '2'
matched: '1 AND 2'
matched: '12 OR 234'
matched: '(1) AND (2)'
matched: '(((1)) AND (2))'
matched: '1 AND 2 AND 3'
matched: '1 AND (2 OR (3 AND 4))'
matched: '1 AND (2 OR 3) AND 4'
matched: ' ( 1 AND ( 2 OR ( 3 AND 4 ) )'
matched: '((1 AND 7) OR 6) AND ((2 AND 5) OR (3 AND 4))'
matched: '(1)'
matched: '(((1)))'
failed: '1 2'
failed: '1(2)'
failed: '(1)(2)'
failed: 'AND'
failed: '1 AND'
failed: '(1 AND 2'
failed: '1 AND 2)'
failed: '1 (AND) 2'
failed: '(1 AND 2))'
failed: '(1) AND 2)'
failed: '(1)() AND (2)'
failed: '((1 AND 7) OR 6) AND (2 AND 5) OR (3 AND 4))'
failed: '((1 AND 7) OR 6) AND ((2 AND 5 OR (3 AND 4))'
failed: ''
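The demo itself isn't included here, so as a stand-in, the combined expression can be wrapped in C# like this and checked against some of the cases from the list above. `BoolExprRegex.IsValid` is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

public static class BoolExprRegex
{
    // The combined pattern above: OPENING, number, CLOSING, then any number of
    // "AND/OR" + OPENING + number + CLOSING, then the balance check.
    static readonly Regex Expr = new Regex(@"(?x)^
        \s* (?: (?<open> \( ) \s* )*
        \d+
        \s* (?: (?<-open> \) ) \s* )*
        (?:
            \s+(?:AND|OR)\s+
            \s* (?: (?<open> \( ) \s* )*
            \d+
            \s* (?: (?<-open> \) ) \s* )*
        )*
        (?(open)(?!))
        \z");

    public static bool IsValid(string s) => Expr.IsMatch(s);
}
```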
If you consider a boolean expression as generated by a formal grammar, writing a parser is easier.
I made an open source library to interpret simple boolean expressions. You can take a look at it on GitHub, in particular look at the AstParser class and Lexer.
If you just want to validate the input string, you can write a simple parser.
Each method consumes a certain kind of input (digit, brackets, operator) and returns the remaining string after matching. An exception is thrown if no match can be made.
using System;
public class ParseException : Exception { }
public static class ExprValidator
{
    public static bool Validate(string str)
    {
        try
        {
            string term = Term(str);
            string stripTrailing = Whitespace(term);
            return stripTrailing.Length == 0;
        }
        catch (ParseException) { return false; }
    }
    // Term: optional whitespace, then a digit or a bracketed term,
    // optionally followed by "op term". Returns the unconsumed input.
    static string Term(string str)
    {
        // an empty term is invalid (rejects "" and a dangling "1 AND ")
        if (str == string.Empty) throw new ParseException();
        char current = str[0];
        string rest;
        if (current == '(')
        {
            string term = LBracket(str);
            string rBracket = Term(term);
            string temp = Whitespace(rBracket);
            rest = RBracket(temp);
        }
        else if (Char.IsDigit(current))
        {
            rest = Digit(str);
        }
        else if (Char.IsWhiteSpace(current))
        {
            string temp = Whitespace(str);
            return Term(temp);
        }
        else throw new ParseException();
        try
        {
            // possibly match "op term", after a digit or after a bracketed term,
            // so that "(1) AND 2" is accepted as well as "1 AND (2)"
            string op = Op(rest);
            return Term(op);
        }
        catch (ParseException) { return rest; }
    }
    // an operator with at least one whitespace character on each side
    static string Op(string str)
    {
        string t1 = Whitespace_(str);
        string op = MatchOp(t1);
        return Whitespace_(op);
    }
    static string MatchOp(string str)
    {
        if (str.StartsWith("AND")) return str.Substring(3);
        else if (str.StartsWith("OR")) return str.Substring(2);
        else throw new ParseException();
    }
    static string LBracket(string str)
    {
        return MatchChar('(')(str);
    }
    static string RBracket(string str)
    {
        return MatchChar(')')(str);
    }
    static string Digit(string str)
    {
        return MatchChar(Char.IsDigit)(str);
    }
    // strip any leading whitespace (possibly none)
    static string Whitespace(string str)
    {
        if (str == string.Empty) return str;
        int i = 0;
        while (i < str.Length && Char.IsWhiteSpace(str[i])) { i++; }
        return str.Substring(i);
    }
    // match at least one whitespace character
    static string Whitespace_(string str)
    {
        string stripFirst = MatchChar(Char.IsWhiteSpace)(str);
        return Whitespace(stripFirst);
    }
    static Func<string, string> MatchChar(char c)
    {
        return MatchChar(chr => chr == c);
    }
    // consume one character satisfying pred, or throw
    static Func<string, string> MatchChar(Func<char, bool> pred)
    {
        return input =>
        {
            if (input == string.Empty) throw new ParseException();
            else if (pred(input[0])) return input.Substring(1);
            else throw new ParseException();
        };
    }
}
Pretty simple:
In the first stage, determine the lexemes (digit, bracket or operator) with simple string comparison.
In the second stage, define a counter of currently unclosed brackets (bracketPairs), updated for each lexeme by the following algorithm:
if the current lexeme is '(', then bracketPairs++;
if the current lexeme is ')', then bracketPairs-- (and the input is invalid if bracketPairs drops below 0);
otherwise, do not modify bracketPairs.
At the end, if all lexemes are known and bracketPairs == 0, the brackets in the input expression are balanced. (This does not check that numbers and operators alternate correctly.)
The task is a bit more complex if you need to build an AST.
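The two stages described above can be sketched in a single pass as follows, assuming the only lexemes are the digits 1-9, '(', ')', AND, OR and whitespace. `LexemeCheck.BracketsBalanced` and the helper `At` are hypothetical names, and like the algorithm it only validates lexemes and bracket balance, so an input such as "1 2" is still accepted.

```csharp
public static class LexemeCheck
{
    // True if word occurs in s starting at index i (ordinal comparison).
    static bool At(string s, int i, string word) =>
        i + word.Length <= s.Length && string.CompareOrdinal(s, i, word, 0, word.Length) == 0;

    // Stage 1 and 2 in one pass: recognise each lexeme by string comparison
    // and keep a running count of unclosed brackets.
    public static bool BracketsBalanced(string input)
    {
        int bracketPairs = 0;
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            if (char.IsWhiteSpace(c) || (c >= '1' && c <= '9')) { i++; }
            else if (c == '(') { bracketPairs++; i++; }
            else if (c == ')')
            {
                if (--bracketPairs < 0) return false;   // ')' without a matching '('
                i++;
            }
            else if (At(input, i, "AND")) { i += 3; }
            else if (At(input, i, "OR")) { i += 2; }
            else return false;                           // unknown lexeme
        }
        return bracketPairs == 0;
    }
}
```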
What you want are balancing groups. With them you can get all the bracket definitions, and then you just need some simple string parsing:
http://blog.stevenlevithan.com/archives/balancing-groups
http://msdn.microsoft.com/en-us/library/bs2twtah.aspx#balancing_group_definition
ANTLR Parser Generator?
a short way of achieving this in C#
Although it may be overkill if it's just numbers and OR + AND.
I have an arithmetic expression
string exp = "((2+3.1)/2)*4.456";
I want to validate it using a regular expression. The expression can only contain integers, floating-point numbers, operators and parentheses.
How can I write a regular expression to validate it? Or please suggest any other way to validate that string.
Using Perl/PCRE we could verify such simple arithmetic expressions with help of a pattern structured like:
expr = pnum ( op pnum )*
pnum = num | \( expr \)
Where num and op defined as required. For example:
num = -?+\d++(?:\.\d++)?+
op = [-+*/]
Which would give us the following working expression:
(?x)^ (?&expr) $
(?(DEFINE)
(?<expr> (?&pnum) (?: (?&op) (?&pnum) )*+ )
(?<pnum> (?> (?&num) | \( (?&expr) \) ) )
(?<num> -?+\d++(?:\.\d++)?+ )
(?<op> [-+*/] )
)
But such expressions cannot be used with .NET regex, as it does not support (recursive) subpattern calls (?&name).
Instead, the .NET regex library offers its own special feature: balancing groups.
With balancing groups we could rewrite the required recursive call used in pnum, and use a structure like this instead:
expr = pnum ( op pnum )* (?(p)(?!))
pnum = (?> (?<p> \( )* num (?<-p> \) )* )
What we've done here is allow any number of optional opening parentheses before, and closing parentheses after, every number, counting the total number of open parentheses (?<p> \( ), subtracting closing parentheses from that number (?<-p> \) ), and at the end of the expression making sure that the number of open parentheses is 0 (?(p)(?!)).
(I believe this is equivalent to the original structure, although I haven't made a formal proof.)
Resulting in the following .NET pattern:
(?x)
^
(?> (?<p> \( )* (?>-?\d+(?:\.\d+)?) (?<-p> \) )* )
(?>(?:
[-+*/]
(?> (?<p> \( )* (?>-?\d+(?:\.\d+)?) (?<-p> \) )* )
)*)
(?(p)(?!))
$
C# Example:
using System;
using System.Text.RegularExpressions;
namespace RegexTest
{
    class Program
    {
        static void Main(string[] args)
        {
            var expressions = new string[] {
                "((2+3.1)/2)*4.456",
                "1",
                "(2)",
                "2+2",
                "(1+(2+3))",
                "-2*(2+-2)",
                "1+(3/(2+7-(4+3)))",
                "1-",
                "2+2)",
                "(2+2",
                "(1+(2+3)",
            };
            var regex = new Regex(@"(?x)
                ^
                (?> (?<p> \( )* (?>-?\d+(?:\.\d+)?) (?<-p> \) )* )
                (?>(?:
                    [-+*/]
                    (?> (?<p> \( )* (?>-?\d+(?:\.\d+)?) (?<-p> \) )* )
                )*)
                (?(p)(?!))
                $
                ");
            foreach (var expr in expressions)
            {
                Console.WriteLine("Expression: " + expr);
                Console.WriteLine(" Result: " + (regex.IsMatch(expr) ? "Matched" : "Failed"));
            }
        }
    }
}
Output:
Expression: ((2+3.1)/2)*4.456
Result: Matched
Expression: 1
Result: Matched
Expression: (2)
Result: Matched
Expression: 2+2
Result: Matched
Expression: (1+(2+3))
Result: Matched
Expression: -2*(2+-2)
Result: Matched
Expression: 1+(3/(2+7-(4+3)))
Result: Matched
Expression: 1-
Result: Failed
Expression: 2+2)
Result: Failed
Expression: (2+2
Result: Failed
Expression: (1+(2+3)
Result: Failed
You could write a simple lexer in F# using fslex/fsyacc. Here is an example which is very close to your requirement: http://blogs.msdn.com/b/chrsmith/archive/2008/01/18/fslex-sample.aspx