How to get text between nested parentheses? - c#

Reg Expression for Getting Text Between parenthesis ( ), I had tried but i am not getting the RegEx. For this example
Regex.Match(script, #"\((.*?)\)").Value
Example:-
add(mul(a,add(b,c)),d) + e - sub(f,g)
Output =>
1) mul(a,add(b,c)),d
2) f,g

.NET allows recursion in regular expressions. See Balancing Group Definitions
var input = #"add(mul(a,add(b,c)),d) + e - sub(f,g)";
var regex = new Regex(#"
\( # Match (
(
[^()]+ # all chars except ()
| (?<Level>\() # or if ( then Level += 1
| (?<-Level>\)) # or if ) then Level -= 1
)+ # Repeat (to go from inside to outside)
(?(Level)(?!)) # zero-width negative lookahead assertion
\) # Match )",
RegexOptions.IgnorePatternWhitespace);
foreach (Match c in regex.Matches(input))
{
Console.WriteLine(c.Value.Trim('(', ')'));
}

Related

Regex match when there could be matches within [duplicate]

I'm, looking for a regular expression that will match only when all curly braces properly match. Matching braces can be nested.
Ex.
Matches
Hello {0}{}
Hello to the following {0}: {{Object1}}, {{Object2}}
Test { {1} { {2} { {3} { {4}}}}}
Non-matches
}{Hello {0}
{{}Hello to the following {0}: {{Object1}}, {{Object2}}
Test { {1} { {2} { {3} { {4}{}
In .NET you can use balancing groups to count, which allows you to solve such problems.
For example make sure { and } are balanced you could use an expression like:
(?x)^
[^{}]*
(?:
(?:
(?'open' \{ ) # open++
[^{}]*
)+
(?:
(?'close-open' \} ) # open--, only if open > 0
[^{}]*
)+
)*
(?(open) (?!) ) # fail if open != 0
$
bool BracesMatch( string s )
{
int numOpen = 0, numClosed = 0;
foreach( char c in s.ToCharArray() )
{
if ( c == '{' ) numOpen++;
if ( c == '}' ) numClosed++;
if ( numClosed > numOpen ) return false;
}
return numOpen == numClosed;
}
This might work using the Dot-Net balanced groups as well.
# #"^[^{}]*(?:\{(?>[^{}]+|\{(?<Depth>)|\}(?<-Depth>))*(?(Depth)(?!))\}[^{}]*)*[^{}]*$"
^
[^{}]* # Anything (but only if we're not at the start of { or } )
(?:
\{ # Match opening {
(?> # Then either match (possessively):
[^{}]+ # Anything (but only if we're not at the start of { or } )
| # or
\{ # { (and increase the braces counter)
(?<Depth> )
| # or
\} # } (and decrease the braces counter).
(?<-Depth> )
)* # Repeat as needed.
(?(Depth) # Assert that the braces counter is at zero.
(?!) # Fail this part if depth > 0
)
\} # Then match a closing }.
[^{}]* # Anything (but only if we're not at the start of { or } )
)* # Repeat as needed
[^{}]* # Anything (but only if we're not at the start of { or } )
$

Regex valid AdresseEmail c#

i'm currently developping an application with , i have probleme with Regex.
i have a file txt that contain email like that:
test#test.uk
test1#test.uk
my function loademail must import email from txt and add him to list result.
but the probleme he still work he dont add any email
this is my code :
public class Loademail
{
public EmailAddress email;
public List<Loademail> loademail()
{
var result = new List<Loademail>();
string fileSocks = Path.GetFullPath(Path.Combine(Application.StartupPath, "liste.txt"));
var input = File.ReadAllText(fileSocks);
var r = new Regex(#"^(([\w-]+\.)+[\w-]+|([a-zA-Z]{1}|[\w-]{2,}))#"
+ #"((([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.([0-1]?
[0-9]{1,2}|25[0-5]|2[0-4][0-9])\."
+ #"([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.([0-1]?
[0-9]{1,2}|25[0-5]|2[0-4][0-9])){1}|"
+ #"([a-zA-Z0-9]+[\w-]+\.)+[a-zA-Z]{1}[a-zA-Z0-9-]{1,23})$", RegexOptions.IgnoreCase);
foreach (Match match in r.Matches(input))
{
string Email = match.Groups[1].Value;
Loademail bi = new Loademail();
bi.email = EmailAddress.Parse(Email);
result.Add(bi);
//result.Add(Email);
}
return result;
}
what i should do thnks?
Use ignore pattern whitespace.
Edit
Try it using a while () { next match ...}
Like this
Match _mData = Rx.Match( Input );
while (_mData.Success)
{
if (_mData.Groups[1].Success )
Console.WriteLine("{0} \r\n", _mData.Groups[1].Value);
_mData = _mData.NextMatch();
}
// -------------------
Regex Rx = new Regex(
#"
^(([\w-]+\.)+[\w-]+|([a-zA-Z]{1}|[\w-]{2,}))#((([0
-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.([0-1]?[0-9]{
1,2}|25[0-5]|2[0-4][0-9])\.([0-1]?[0-9]{1,2}|25[0-
5]|2[0-4][0-9])\.([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][
0-9])){1}|([a-zA-Z0-9]+[\w-]+\.)+[a-zA-Z]{1}[a-zA-
Z0-9-]{1,23})$
",
RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace );
Use a good tool to format and process large expressions.
Formatted:
^
( # (1 start)
( [\w-]+ \. )+ # (2)
[\w-]+
| ( [a-zA-Z]{1} | [\w-]{2,} ) # (3)
) # (1 end)
#
( # (4 start)
( # (5 start)
( # (6 start)
[0-1]? [0-9]{1,2}
| 25 [0-5]
| 2 [0-4] [0-9]
) # (6 end)
\.
( # (7 start)
[0-1]?
[0-9]{1,2}
| 25 [0-5]
| 2 [0-4] [0-9]
) # (7 end)
\.
( # (8 start)
[0-1]? [0-9]{1,2}
| 25 [0-5]
| 2 [0-4] [0-9]
) # (8 end)
\.
( # (9 start)
[0-1]?
[0-9]{1,2}
| 25 [0-5]
| 2 [0-4] [0-9]
) # (9 end)
){1} # (5 end)
|
( [a-zA-Z0-9]+ [\w-]+ \. )+ # (10)
[a-zA-Z]{1} [a-zA-Z0-9-]{1,23}
) # (4 end)
$
As a side note, this is a good email regex as well.
# http://www.w3.org/TR/html5/forms.html#valid-e-mail-address
# ^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+#[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$
^
[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+
#
[a-zA-Z0-9]
(?:
[a-zA-Z0-9-]{0,61}
[a-zA-Z0-9]
)?
(?:
\.
[a-zA-Z0-9]
(?:
[a-zA-Z0-9-]{0,61}
[a-zA-Z0-9]
)?
)*
$

Split string on whitespace ignoring parenthesis

I have a string such as this
(ed) (Karlsruhe Univ. (TH) (Germany, F.R.))
I need to split it into two such as this
ed
Karlsruhe Univ. (TH) (Germany, F.R.)
Basically, ignoring whitespace and parenthesis within a parenthesis
Is it possible to use a regex to achieve this?
If you can have more parentheses, it's better to use balancing groups:
string text = "(ed) (Karlsruhe Univ. (TH) (Germany, F.R.))";
var charSetOccurences = new Regex(#"\(((?:[^()]|(?<o>\()|(?<-o>\)))+(?(o)(?!)))\)");
var charSetMatches = charSetOccurences.Matches(text);
foreach (Match match in charSetMatches)
{
Console.WriteLine(match.Groups[1].Value);
}
ideone demo
Breakdown:
\(( # First '(' and begin capture
(?:
[^()] # Match all non-parens
|
(?<o> \( ) # Match '(', and capture into 'o'
|
(?<-o> \) ) # Match ')', and delete the 'o' capture
)+
(?(o)(?!)) # Fails if 'o' stack isn't empty
)\) # Close capture and last opening brace
\((.*?)\)\s*\((.*)\)
you will get the two values in two match groups \1 and \2
demo here : http://regex101.com/r/rP5kG2
and this is what you get if you search and replace with the pattern \1\n\2 which also seems to be what you need exactly
string str = "(ed) (Karlsruhe Univ. (TH) (Germany, F.R.))";
Regex re = new Regex(#"\((.*?)\)\s*\((.*)\)");
Match match = re.Match(str);
In general, No.
You can't describe recursive patterns in regular expression. ( Since it's not possible to recognize it with a finite automaton. )

RegEx to parse nested tags?

I have text like this:
This is {name1:value1}{name2:{name3:even dipper {name4:valu4} dipper} some inner text} text
I want to parse out data like that:
Name: name1
Value: value1
Name: name2
Value: {name3:even dipper {name4:valu4} dipper} some inner text
I would then recursively process each value to parse out nested fields.
Can you recommend a RegEx expression to do this?
In C# you can use balancing groups to count and balance the brackets:
{ (?'name' \w+ ) : # start of tag
(?'value' # named capture
(?> # don't backtrack
(?:
[^{}]+ # not brackets
| (?'open' { ) # count opening bracket
| (?'close-open' } ) # subtract closing bracket (matches only if open count > 0)
)*
)
(?(open)(?!)) # make sure open is not > 0
)
} # end of tag
Example:
string re = #"(?x) # enable eXtended mode (comments/spaces ignored)
{ (?'name' \w+ ) : # start of tag
(?'value' # named capture
(?> # don't backtrack
(?:
[^{}]+ # not brackets
| (?'open' { ) # count opening bracket
| (?'close-open' } ) # subtract closing bracket (matches only if open count > 0)
)*
)
(?(open)(?!)) # make sure open is not > 0
)
} # end of tag
";
string str = #"This is {name1:value1}{name2:{name3:even dipper {name4:valu4} dipper} some inner text} text";
foreach (Match m in Regex.Matches(str, re))
{
Console.WriteLine("name: {0}, value: {1}", m.Groups["name"], m.Groups["value"]);
}
Output:
name: name1, value: value1
name: name2, value: {name3:even dipper {name4:valu4} dipper} some inner text
If using Perl/PHP/PCRE it's not complicated at all. You can use an expression like:
{(\w+): # start of tag
((?:
[^{}]+ # not a tag
| (?R) # a tag (recurse to match the whole regex)
)*)
} # end of tag

Regular expression for validating arithmetic expression

I have an arithmetic expression
string exp = "((2+3.1)/2)*4.456";
I want to validate by using regular expression. The expression can only have integers, floating point numbers, operands and parenthesis.
How can i generate regular expression to validate please help or suggest any other way to validate that string.
Using Perl/PCRE we could verify such simple arithmetic expressions with help of a pattern structured like:
expr = pnum ( op pnum )*
pnum = num | \( expr \)
Where num and op defined as required. For example:
num = -?+\d++(?:\.\d++)?+
op = [-+*/]
Which would give us the following working expression:
(?x)^ (?&expr) $
(?(DEFINE)
(?<expr> (?&pnum) (?: (?&op) (?&pnum) )*+ )
(?<pnum> (?> (?&num) | \( (?&expr) \) ) )
(?<num> -?+\d++(?:\.\d++)?+ )
(?<op> [-+*/] )
)
But such expressions could not be used with .NET regex as it does not support (recursive) suppatern calls (?&name).
Instead .NET regex lib offers us its special feature: balancing groups.
With balancing groups we could rewrite the required recursive call used in pnum, and use a structure like this instead:
expr = pnum ( op pnum )* (?(p)(?!))
pnum = (?> (?<p> \( )* num (?<-p> \) )* )
What we've done here is to allow any number of optional opening and closing paranthesis before and after every number, counting the total number of open parentheses (?<p> \( ), subtracting closing parentheses from that number (?<-p> \) ) and at the end of the expression make sure that the number of open parentheses is 0 (?(p)(?!)).
(I believe this is equivalent to the original structure, altho I haven't made any formal proof.)
Resulting in the following .NET pattern:
(?x)
^
(?> (?<p> \( )* (?>-?\d+(?:\.\d+)?) (?<-p> \) )* )
(?>(?:
[-+*/]
(?> (?<p> \( )* (?>-?\d+(?:\.\d+)?) (?<-p> \) )* )
)*)
(?(p)(?!))
$
C# Example:
using System;
using System.Text.RegularExpressions;
namespace RegexTest
{
class Program
{
static void Main(string[] args)
{
var expressions = new string[] {
"((2+3.1)/2)*4.456",
"1",
"(2)",
"2+2",
"(1+(2+3))",
"-2*(2+-2)",
"1+(3/(2+7-(4+3)))",
"1-",
"2+2)",
"(2+2",
"(1+(2+3)",
};
var regex = new Regex(#"(?x)
^
(?> (?<p> \( )* (?>-?\d+(?:\.\d+)?) (?<-p> \) )* )
(?>(?:
[-+*/]
(?> (?<p> \( )* (?>-?\d+(?:\.\d+)?) (?<-p> \) )* )
)*)
(?(p)(?!))
$
");
foreach (var expr in expressions)
{
Console.WriteLine("Expression: " + expr);
Console.WriteLine(" Result: " + (regex.IsMatch(expr) ? "Matched" : "Failed"));
}
}
}
}
Output:
Expression: ((2+3.1)/2)*4.456
Result: Matched
Expression: 1
Result: Matched
Expression: (2)
Result: Matched
Expression: 2+2
Result: Matched
Expression: (1+(2+3))
Result: Matched
Expression: -2*(2+-2)
Result: Matched
Expression: 1+(3/(2+7-(4+3)))
Result: Matched
Expression: 1-
Result: Failed
Expression: 2+2)
Result: Failed
Expression: (2+2
Result: Failed
Expression: (1+(2+3)
Result: Failed
You could write a simple lexer in F# using fslex/fsyacc. Here is an example which is very close to your requirement: http://blogs.msdn.com/b/chrsmith/archive/2008/01/18/fslex-sample.aspx

Categories