Regex to find inner if conditions - c#

I had a regex to find a single if-then-else condition.
string pattern2 = @"if( *.*? *)then( *.*? *)(?:else( *.*? *))?endif";
Now I need to extend it to support nested if conditions, but the regex does not extract the then and else parts properly.
Example nested if condition:
if (2 > 1) then ( if(3>2) then ( if(4>3) then 4 else 3 endif ) else 2 endif) else 1 endif
Expected Result with Regex:
condition = (2>1)
then part = ( if(3>2) then ( if(4>3) then 4 else 3 endif ) else 2 endif)
else part = 1
I can check whether the then and else parts hold plain values or another condition. Then I can apply the same regex to the inner condition until everything is resolved.
The current regex returns a result like:
condition = (2 > 1)
then part = ( if( 3>2) then ( if(4>3) then 3
else part = 3
Meaning, it returns the value after the first "else" it finds, but it actually has to extract from the last else.
Can someone help me with this?

You can adapt the solution from the answer to "Can regular expressions be used to match nested patterns?" ( http://retkomma.wordpress.com/2007/10/30/nested-regular-expressions-explained/ ).
That solution shows how to match content between HTML tags, even if it contains nested tags. Applying the same idea to parenthesis pairs should solve your problem.
EDIT:
using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            String matchParenthesis = @"
                (?# line 01) \((
                (?# line 02) (?>
                (?# line 03) \( (?<DEPTH>)
                (?# line 04) |
                (?# line 05) \) (?<-DEPTH>)
                (?# line 06) |
                (?# line 07) .?
                (?# line 08) )*
                (?# line 09) (?(DEPTH)(?!))
                (?# line 10) )\)
                ";
            //string source = "if (2 > 1) then ( if(3>2) then ( if(4>3) then 4 else 3 endif ) else 2 endif) else 1 endif";
            string source = "if (2 > 1) then 2 else ( if(3>2) then ( if(4>3) then 4 else 3 endif ) else 2 endif) endif";
            string pattern = @"if\s*(?<condition>(?:[^(]*|" + matchParenthesis + @"))\s*";
            pattern += @"then\s*(?<then_part>(?:[^(]*|" + matchParenthesis + @"))\s*";
            pattern += @"else\s*(?<else_part>(?:[^(]*|" + matchParenthesis + @"))\s*endif";
            Match match = Regex.Match(source, pattern,
                RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);
            Console.WriteLine(match.Success.ToString());
            Console.WriteLine("source: " + source );
            Console.WriteLine("condition = " + match.Groups["condition"]);
            Console.WriteLine("then part = " + match.Groups["then_part"]);
            Console.WriteLine("else part = " + match.Groups["else_part"]);
        }
    }
}
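If the then and else parts are not always wrapped in parentheses, another option is to count the if/endif keywords instead of the parentheses. The following is only a sketch of that idea, not part of the answer above: the names IfSplitter and SplitOuterIf are made up for illustration, and it assumes the keywords never appear inside string literals or identifiers.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IfSplitter
{
    // Splits the outermost "if ... then ... [else ...] endif" into
    // { condition, then part, else part }. Returns null when no complete
    // outer if/then/endif is found.
    public static string[] SplitOuterIf(string source)
    {
        int depth = 0;
        int condStart = -1, thenPos = -1, thenEnd = -1;
        int elsePos = -1, elseEnd = -1, endPos = -1;

        foreach (Match m in Regex.Matches(source, @"\b(?:if|then|else|endif)\b",
                                          RegexOptions.IgnoreCase))
        {
            string word = m.Value.ToLowerInvariant();
            if (word == "if")
            {
                depth++;
                if (depth == 1) condStart = m.Index + m.Length;
            }
            else if (word == "then" && depth == 1 && thenPos < 0)
            {
                thenPos = m.Index;
                thenEnd = m.Index + m.Length;
            }
            else if (word == "else" && depth == 1 && elsePos < 0)
            {
                elsePos = m.Index;
                elseEnd = m.Index + m.Length;
            }
            else if (word == "endif")
            {
                depth--;
                if (depth == 0) { endPos = m.Index; break; }
            }
        }

        if (condStart < 0 || thenPos < 0 || endPos < 0) return null;

        int thenStop = elsePos >= 0 ? elsePos : endPos;
        string condition = source.Substring(condStart, thenPos - condStart).Trim();
        string thenPart = source.Substring(thenEnd, thenStop - thenEnd).Trim();
        string elsePart = elsePos >= 0
            ? source.Substring(elseEnd, endPos - elseEnd).Trim()
            : "";
        return new[] { condition, thenPart, elsePart };
    }
}
```

Calling SplitOuterIf on the example from the question gives (2 > 1) as the condition, the whole parenthesised inner if as the then part, and 1 as the else part. The same method can then be applied to any part that itself starts with if.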

If you replace endif with end you get
if (2 > 1) then ( if(3>2) then ( if(4>3) then 4 else 3 end) else 2 end) else 1 end
and you also have a perfectly valid Ruby expression. Download IronRuby and add references to IronRuby, IronRuby.Libraries, and Microsoft.Scripting to your project. You will find them in C:\Program Files\IronRuby 1.0v4\bin. Then add
using Microsoft.Scripting;
using Microsoft.Scripting.Hosting;
using IronRuby;
and in your code
var engine = Ruby.CreateEngine();
int result = engine.Execute("if (2 > 1) then ( if(3>2) then ( if(4>3) then 4 else 3 end ) else 2 end) else 1 end");

Related

Getting string between characters in Regex

I have a list of strings with the output below
stop = F6, quantity ( 1 ) // stop 0
stop = F8, quantity ( 1 ) // stop 1
stop = BN, quantity ( 1 ) // stop 2
stop = F6, quantity ( 1 ) // stop 3
stop = F8, quantity ( 1 ) // stop 4
stop = BN, quantity ( 1 ) // stop 5
stop = F6, quantity ( 1 ) // stop 6
stop = F8, quantity ( 1 ) // stop 7
stop = SC, quantity ( 1 ) // stop 8
etc
Using a foreach loop, I'm retrieving each line in the list, i.e.
`stop = F6, quantity ( 1 ) // stop 0`
However, I only need the code F6.
I know I need to use a regex to retrieve F6 in this instance, but I am unsure of the expression. Following a brief tutorial on regex, I tried the code below, with no luck:
`Regex.Match(output, @"=\w*,").Value.Replace("\"", "");`
Any help would be appreciated.
You can use this pattern:
"=\\s([A-Za-z0-9]{2}),"
//or
"=\\s(\\w+),"
Code:
string str = "stop = F6, quantity ( 1 ) ";
var res = Regex.Matches(str, "=\\s([A-Za-z0-9]{2}),")[0].Groups[1].Value;
I don't know much C#, but the regex you need is "= (\w+),". It captures any run of word characters (letters, digits, underscore) between "= " and ",".
In regex, an expression between parentheses is called a "capturing group". Most languages provide an API to retrieve the content captured by a group. I found this for C#: https://msdn.microsoft.com/fr-fr/library/system.text.regularexpressions.match.groups(v=vs.110).aspx
So the code to retrieve your data looks like this:
String pattern = @"=\s(\w+),";
MatchCollection matches = Regex.Matches(input, pattern);
foreach (Match match in matches)
{
Console.WriteLine("Value : {0}", match.Groups[1].Value);
}
To test your regex live, https://regex101.com/ is very useful! Use it to see what the regex does while you write it.
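To tie the answers together, here is a small sketch that applies the capturing group to a whole list of lines. The class and method names (StopParser, ExtractStops) are invented for this example, and the pattern allows optional whitespace around the code, which is an assumption about the input rather than something stated in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StopParser
{
    // Group 1 captures the word between "=" and the following comma.
    private static readonly Regex StopPattern = new Regex(@"=\s*(\w+)\s*,");

    public static List<string> ExtractStops(IEnumerable<string> lines)
    {
        var stops = new List<string>();
        foreach (string line in lines)
        {
            Match match = StopPattern.Match(line);
            // Skip lines that do not contain "= CODE," at all.
            if (match.Success) stops.Add(match.Groups[1].Value);
        }
        return stops;
    }
}
```

Checking match.Success before reading Groups[1] avoids adding empty strings for lines that do not follow the expected layout.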

Regex valid AdresseEmail c#

I'm currently developing an application in C# and I have a problem with a regex.
I have a text file that contains emails like this:
test@test.uk
test1@test.uk
My function loademail must import the emails from the text file and add them to the result list.
The problem is that it runs but does not add any email.
This is my code:
public class Loademail
{
public EmailAddress email;
public List<Loademail> loademail()
{
var result = new List<Loademail>();
string fileSocks = Path.GetFullPath(Path.Combine(Application.StartupPath, "liste.txt"));
var input = File.ReadAllText(fileSocks);
var r = new Regex(@"^(([\w-]+\.)+[\w-]+|([a-zA-Z]{1}|[\w-]{2,}))@"
+ @"((([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.([0-1]?
[0-9]{1,2}|25[0-5]|2[0-4][0-9])\."
+ @"([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.([0-1]?
[0-9]{1,2}|25[0-5]|2[0-4][0-9])){1}|"
+ @"([a-zA-Z0-9]+[\w-]+\.)+[a-zA-Z]{1}[a-zA-Z0-9-]{1,23})$", RegexOptions.IgnoreCase);
foreach (Match match in r.Matches(input))
{
string Email = match.Groups[1].Value;
Loademail bi = new Loademail();
bi.email = EmailAddress.Parse(Email);
result.Add(bi);
//result.Add(Email);
}
return result;
}
What should I do? Thanks.
Use RegexOptions.IgnorePatternWhitespace.
Edit
Try it using a while loop that advances with NextMatch().
Like this:
Match _mData = Rx.Match( Input );
while (_mData.Success)
{
if (_mData.Groups[1].Success )
Console.WriteLine("{0} \r\n", _mData.Groups[1].Value);
_mData = _mData.NextMatch();
}
// -------------------
// Line breaks fall only between tokens: IgnorePatternWhitespace does not
// skip whitespace inside a character class [...] or a quantifier {n,m}.
Regex Rx = new Regex(
@"
^(([\w-]+\.)+[\w-]+|([a-zA-Z]{1}|[\w-]{2,}))@
((([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.
([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.
([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.
([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])){1}
|([a-zA-Z0-9]+[\w-]+\.)+[a-zA-Z]{1}[a-zA-Z0-9-]{1,23})$
",
RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace );
Use a good tool to format and process large expressions.
Formatted:
^
( # (1 start)
( [\w-]+ \. )+ # (2)
[\w-]+
| ( [a-zA-Z]{1} | [\w-]{2,} ) # (3)
) # (1 end)
@
( # (4 start)
( # (5 start)
( # (6 start)
[0-1]? [0-9]{1,2}
| 25 [0-5]
| 2 [0-4] [0-9]
) # (6 end)
\.
( # (7 start)
[0-1]?
[0-9]{1,2}
| 25 [0-5]
| 2 [0-4] [0-9]
) # (7 end)
\.
( # (8 start)
[0-1]? [0-9]{1,2}
| 25 [0-5]
| 2 [0-4] [0-9]
) # (8 end)
\.
( # (9 start)
[0-1]?
[0-9]{1,2}
| 25 [0-5]
| 2 [0-4] [0-9]
) # (9 end)
){1} # (5 end)
|
( [a-zA-Z0-9]+ [\w-]+ \. )+ # (10)
[a-zA-Z]{1} [a-zA-Z0-9-]{1,23}
) # (4 end)
$
As a side note, this is a good email regex as well.
# http://www.w3.org/TR/html5/forms.html#valid-e-mail-address
# ^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$
^
[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+
@
[a-zA-Z0-9]
(?:
[a-zA-Z0-9-]{0,61}
[a-zA-Z0-9]
)?
(?:
\.
[a-zA-Z0-9]
(?:
[a-zA-Z0-9-]{0,61}
[a-zA-Z0-9]
)?
)*
$
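One possible reason no address is found in the question's code is that, without RegexOptions.Multiline, ^ and $ anchor to the start and end of the whole input, so a pattern applied to the full file content cannot match each line separately. A minimal sketch that sidesteps this by validating line by line with the HTML5 pattern above follows. EmailFilter and ValidEmails are names made up for this example, and the EmailAddress type from the question is left out because its definition is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EmailFilter
{
    // The HTML5 "valid e-mail address" pattern quoted above, on a single line.
    private static readonly Regex EmailPattern = new Regex(
        @"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$");

    // Returns the trimmed lines that look like e-mail addresses.
    public static List<string> ValidEmails(IEnumerable<string> lines)
    {
        var result = new List<string>();
        foreach (string line in lines)
        {
            string candidate = line.Trim();
            if (EmailPattern.IsMatch(candidate)) result.Add(candidate);
        }
        return result;
    }
}
```

With a file on disk, this could be called as EmailFilter.ValidEmails(File.ReadAllLines(path)), where path points at the text file.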

Regex split on parentheses getting double results

I'm taking a string like "4 + 5 + ( 7 - 9 ) + 8" and trying to split on the parentheses to get a list containing 4 + 5, (7-9), + 8. I'm using the regex below, but it gives me 4 + 5, (7-9), 7-9, + 8. Hopefully it's something easy. Thanks.
List<string> test = Regex.Split("4 + 5 + ( 7 - 9 ) + 8", @"(\(([^)]+)\))").ToList();
Remove the extra set of parentheses you have in your regex:
(\(([^)]+)\))   // your regex
(           )   // outer parens
 \(       \)    // literal parens match
   (     )      // extra parens you don't need
    [^)]+       // one or more 'not right parens'
Regex.Split includes the text captured by every capturing group in the result array. The extra parens capture what is inside the literal parens, which is the extra 7 - 9 you see.
So you should have:
@"(\([^)]+\))"
List<string> test = Regex.Split("4 + 5 + ( 7 - 9 ) + 8", @"(\([^)]+\))").ToList();
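To see the effect of the extra group side by side, the two patterns can be wrapped in small helper methods. This is only an illustrative sketch; SplitDemo and its method names are not from the question or the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // One capturing group: the parenthesised chunk appears once in the result.
    public static string[] SplitKeepingParens(string input)
    {
        return Regex.Split(input, @"(\([^)]+\))");
    }

    // Two capturing groups: the text of both groups is inserted into the result.
    public static string[] SplitWithNestedGroup(string input)
    {
        return Regex.Split(input, @"(\(([^)]+)\))");
    }
}
```

For "4 + 5 + ( 7 - 9 ) + 8" the first method returns three pieces and the second returns four, the extra one being the inner capture " 7 - 9 ". The pieces keep their surrounding spaces, so a Trim() may be wanted afterwards.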

Regex, match string ending with ) and ignore any () inbetween [duplicate]

I want to select a part of a string, but the problem is that the last character I want to select can have multiple occurrences.
I want to select 'Aggregate(' and end at the matching ')', any () in between can be ignored.
Examples:
string: Substr(Aggregate(SubQuery, SUM, [Model].Remark * [Object].Shortname + 10), 0, 1)
should return: Aggregate(SubQuery, SUM, [Model].Remark * [Object].Shortname + 10)
string: Substr(Aggregate(SubQuery, SUM, [Model].Remark * ([Object].Shortname + 10)), 0, 1)
should return: Aggregate(SubQuery, SUM, [Model].Remark * ([Object].Shortname + 10))
string: Substr(Aggregate(SubQuery, SUM, ([Model].Remark) * ([Object].Shortname + 10) ), 0, 1)
should return: Aggregate(SubQuery, SUM, ([Model].Remark) * ([Object].Shortname + 10) )
Is there a way to solve this with a regular expression? I'm using C#.
This is a little ugly, but you could use something like
Aggregate\(([^()]+|\(.*?\))*\)
It passes all your tests, but it can only match one level of nested parentheses.
This solution works with any level of nested parentheses by using .NET's balancing groups:
(?x) # allow comments and ignore whitespace
Aggregate\(
(?:
[^()] # anything but ( and )
| (?<open> \( ) # ( -> open++
| (?<-open> \) ) # ) -> open--
)*
(?(open) (?!) ) # fail if open > 0
\)
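A sketch of how the balancing-group pattern above might be used from C# follows. The wrapper class AggregateExtractor and its Extract method are hypothetical names, and RegexOptions.IgnorePatternWhitespace is used here in place of the inline (?x).

```csharp
using System;
using System.Text.RegularExpressions;

public static class AggregateExtractor
{
    private static readonly Regex Balanced = new Regex(@"
        Aggregate\(
        (?:
            [^()]             # any character that is not a parenthesis
          | (?<open> \( )     # opening parenthesis: push onto the open stack
          | (?<-open> \) )    # closing parenthesis: pop, fails if the stack is empty
        )*
        (?(open)(?!))         # fail if an opening parenthesis is still unclosed
        \)",
        RegexOptions.IgnorePatternWhitespace);

    // Returns the balanced Aggregate call, or null when there is none.
    public static string Extract(string input)
    {
        Match m = Balanced.Match(input);
        return m.Success ? m.Value : null;
    }
}
```

Because the pop alternative cannot match when the stack is empty, the loop stops at the parenthesis that closes Aggregate( itself, and the final \) consumes it.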
I'm not sure how much the input varies but for the string examples in the question something as simple as this would work:
Aggregate\(.*\)(?=,)
If you eventually consider avoiding regular expressions, here's an alternative for parsing, which uses the System.Xml.Linq namespace:
using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;
class Program
{
static void Main()
{
var input = File.ReadAllLines("input.txt");
input.ToList().ForEach(item => {
Console.WriteLine(item.GetParameter("Aggregate"));
});
}
}
static class X
{
public static string GetParameter(this string expression, string element)
{
XDocument doc;
var input1 = "<root>" + expression
.Replace("(", "<n1>")
.Replace(")", "</n1>")
.Replace("[", "<n2>")
.Replace("]", "</n2>") +
"</root>";
try
{
doc = XDocument.Parse(input1);
}
catch
{
return null;
}
// FirstNode is null for empty elements such as "()", so guard against it.
var agg = doc.Descendants()
.FirstOrDefault(d => d.FirstNode != null && d.FirstNode.ToString() == element);
if (agg == null)
return null;
var param = agg
.Elements()
.FirstOrDefault();
if (param == null)
return null;
return element +
param
.ToString()
.Replace("<n1>", "(")
.Replace("</n1>", ")")
.Replace("<n2>", "[")
.Replace("</n2>", "]");
}
}
This regex works with any number of pairs of brackets, and nested to any level:
Aggregate\(([^(]*\([^)]*\))*[^()]\)
For example, it matches the Aggregate(...) call, up to and including its closing parenthesis, in:
Substr(Aggregate(SubQuery, SUM(foo(bar), baz()), ([Model].Remark) * ([Object].Shortname + 10) ), 0, 1)
Notice the SUM(foo(bar), baz()) in there.
See a live demo on rubular.

How to parse a comma delimited string when comma and parenthesis exists in field

I have this string in C#
adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO
I want to use a RegEx to parse it to get the following:
adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO
In addition to the above example, I tested with the following, but am still unable to parse it correctly.
"%exc.uns: 8 hours let # = ABC, DEF", "exc_it = 1 day" , " summ=graffe ", " a,b,(c,d)"
The new text will be in one string
string mystr = @"""%exc.uns: 8 hours let # = ABC, DEF"", ""exc_it = 1 day"" , "" summ=graffe "", "" a,b,(c,d)""";
string str = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
var resultStrings = new List<string>();
int? firstIndex = null;
int scopeLevel = 0;
for (int i = 0; i < str.Length; i++)
{
if (str[i] == ',' && scopeLevel == 0)
{
resultStrings.Add(str.Substring(firstIndex.GetValueOrDefault(), i - firstIndex.GetValueOrDefault()));
firstIndex = i + 1;
}
else if (str[i] == '(') scopeLevel++;
else if (str[i] == ')') scopeLevel--;
}
resultStrings.Add(str.Substring(firstIndex.GetValueOrDefault()));
Even faster:
([^,]*\x28[^\x29]*\x29|[^,]+)
That should do the trick. Basically, look for either a "function thumbprint" or anything without a comma.
adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO
                  ^                   ^  ^      ^                  ^
The carets mark where each group stops.
Just this regex:
[^,()]+(\([^()]*\))?
A test example:
var s= "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
Regex regex = new Regex(@"[^,()]+(\([^()]*\))?");
var matches = regex.Matches(s)
.Cast<Match>()
.Select(m => m.Value);
returns
adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO
If you simply must use Regex, then you can split the string on the following:
, # match a comma
(?= # that is followed by
(?: # either
[^\(\)]* # no parens at all
| # or
(?: #
[^\(\)]* # ...
\( # (
[^\(\)]* # stuff in parens
\) # )
[^\(\)]* # ...
)+ # any number of times
)$ # until the end of the string
)
It breaks your input into the following:
adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO
You can also use .NET's balanced grouping constructs to create a version that works with nested parens, but you're probably just as well off with one of the non-Regex solutions.
Another way to implement what Snowbear was doing:
public static string[] SplitNest(this string s, char src, string nest, string trg)
{
int scope = 0;
if (trg == null || nest == null) return null;
if (trg.Length == 0 || nest.Length < 2) return null;
if (trg.IndexOf(src) >= 0) return null;
if (nest.IndexOf(src) >= 0) return null;
for (int i = 0; i < s.Length; i++)
{
if (s[i] == src && scope == 0)
{
s = s.Remove(i, 1).Insert(i, trg);
}
else if (s[i] == nest[0]) scope++;
else if (s[i] == nest[1]) scope--;
}
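// The string[] overload of Split is available on every .NET version.
return s.Split(new[] { trg }, StringSplitOptions.None);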
}
The idea is to replace any non-nested delimiter with another delimiter that you can then use with an ordinary string.Split(). You can also choose what type of bracket to use - (), <>, [], or even something weird like \/, ][, or `'. For your purposes you would use
string str = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
string[] result = str.SplitNest(',',"()","~");
The function would first turn your string into
adj_con(CL2,1,3,0)~adj_cont(CL1,1,3,0)~NG~ NG/CL~ 5 value of CL(JK)~ HO
then split on the ~, ignoring the nested commas.
Assuming non-nested, matching parentheses, you can easily match the tokens you want instead of splitting the string:
MatchCollection matches = Regex.Matches(data, @"(?:[^(),]|\([^)]*\))+");
var s = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
var result = string.Join("\n", Regex.Split(s, @"(?<=\)),|,\s"));
The pattern matches a comma that is preceded by ) (the lookbehind keeps the ) out of the match), or a comma followed by a whitespace character.
result =
adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO
The TextFieldParser (msdn) class seems to have the functionality built-in:
TextFieldParser Class: - Provides methods and properties for parsing structured text files.
Parsing a text file with the TextFieldParser is similar to iterating over a text file, while the ReadFields method to extract fields of text is similar to splitting the strings.
The TextFieldParser can parse two types of files: delimited or fixed-width. Some properties, such as Delimiters and HasFieldsEnclosedInQuotes are meaningful only when working with delimited files, while the FieldWidths property is meaningful only when working with fixed-width files.
See the article which helped me find that
Here's a stronger option, which parses the whole text, including nested parentheses:
string pattern = @"
\A
(?>
(?<Token>
(?:
[^,()] # Regular character
|
(?<Paren> \( ) # Opening paren - push to stack
|
(?<-Paren> \) ) # Closing paren - pop
|
(?(Paren),) # If inside parentheses, match comma.
)*?
)
(?(Paren)(?!)) # If we are not inside parentheses,
(?:,|\Z) # match a comma or the end
)*? # lazy just to avoid an extra empty match at the end,
# though it removes a last empty token.
\Z
";
Match match = Regex.Match(data, pattern, RegexOptions.IgnorePatternWhitespace);
You can get all matches by iterating over match.Groups["Token"].Captures.
