Regex split on parentheses getting double results - c#

I'm taking a string like "4 + 5 + ( 7 - 9 ) + 8" and trying to split on the parentheses to get a list containing 4 + 5, (7-9), + 8, using the regex below. But it is giving me 4 + 5, (7-9), 7-9, + 8. I'm hoping it's just something easy. Thanks.
List<string> test = Regex.Split("4 + 5 + ( 7 - 9 ) + 8", @"(\(([^)]+)\))").ToList();

Remove the extra set of parentheses you have in your regex:
(\(([^)]+)\)) // your regex
(           ) // outer parens
 \(       \)  // literal parens match
   (     )    // extra parens you don't need
    [^)]+     // one or more 'not right parens'
The extra parens create a second capturing group for 'inside the literal parens', and Regex.Split includes every captured group in its result. That is the extra 7 - 9 you see.
So you should have:
@"(\([^)]+\))"

List<string> test = Regex.Split("4 + 5 + ( 7 - 9 ) + 8", @"(\([^)]+\))").ToList();
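To see the difference side by side, here is a minimal sketch that runs both patterns on the string from the question. The class and method names (SplitDemo, SplitKeepingParens) are made up for illustration. Note that the surrounding pieces keep their spaces and the trailing "+", so some trimming is still needed afterwards.

```csharp
using System;
using System.Text.RegularExpressions;

static class SplitDemo
{
    // Hypothetical helper: splits on a parenthesized chunk. The single
    // capturing group keeps the chunk itself in the result.
    public static string[] SplitKeepingParens(string input)
    {
        return Regex.Split(input, @"(\([^)]+\))");
    }

    static void Main()
    {
        string input = "4 + 5 + ( 7 - 9 ) + 8";

        // Two capturing groups: every match adds two items to the result.
        string[] nested = Regex.Split(input, @"(\(([^)]+)\))");
        Console.WriteLine(nested.Length); // 4

        // One capturing group: every match adds one item.
        string[] single = SplitKeepingParens(input);
        Console.WriteLine(single.Length); // 3

        foreach (string part in single)
            Console.WriteLine("[" + part.Trim() + "]");
        // [4 + 5 +]
        // [( 7 - 9 )]
        // [+ 8]
    }
}
```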

Related

Append arrays and lists

For example, if the entered input is:
1 2 3 |4 5 6 | 7 8
we should manipulate it to
1 2 3|4 5 6|7 8
Another example:
7 | 4 5|1 0| 2 5 |3
we should manipulate it to
7|4 5|1 0|2 5|3
This is my idea because I want to exchange some of the subarrays (7; 4 5; 1 0; 2 5; 3).
I'm not sure this code works, or whether it can serve as the base for what I want to do, but I'm posting it so you can see my work.
static void Main(string[] args)
{
List<string> arrays = Console.ReadLine()
.Split(' ', StringSplitOptions.RemoveEmptyEntries)
.ToList();
foreach (var element in arrays)
{
Console.WriteLine("element: " + element);
}
}
You need to split your input by "|" first and then by space. After this, you can reassemble your input with string.Join. Try this code:
var input = "1 2 3 |4 5 6 | 7 8";
var result = string.Join("|", input.Split('|')
.Select(part => string.Join(" ",
part.Trim().Split(new []{' '}, StringSplitOptions.RemoveEmptyEntries))));
// now result is "1 2 3|4 5 6|7 8"
You could do this with a simple regular expression:
var result = Regex.Replace(input, @"\s?\|\s?", "|");
This will match any (optional) white space character, followed by a | character, followed by an (optional) white space character and replace it with a single | character.
Alternatively, if you need to potentially strip out multiple spaces around the |, replace the zero-or-one quantifiers (?) with zero-or-more quantifiers (*):
var result = Regex.Replace(input, @"\s*\|\s*", "|");
To also deal with multiple spaces between numbers (not just around | characters), I'd recommend something like this:
var result = Regex.Replace(input, @"\s*([\s|])\s*", "$1");
This will match any occurrence of zero or more white space characters, followed by either a white space character or a | character (captured in group 1), followed by zero or more white space characters and replace it with whatever was captured in group 1.
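As a rough sketch of how that last pattern could be wrapped up, the helper below normalizes a line and then splits it into the subarrays the question wants to exchange. The names PipeNormalizer and Normalize are invented here, and the extra Trim() is an assumption to drop the single leading or trailing space the pattern would otherwise leave behind.

```csharp
using System;
using System.Text.RegularExpressions;

static class PipeNormalizer
{
    // Hypothetical helper: removes whitespace around '|' and collapses
    // runs of whitespace between numbers to a single character.
    public static string Normalize(string input)
    {
        string collapsed = Regex.Replace(input, @"\s*([\s|])\s*", "$1");
        return collapsed.Trim();
    }

    static void Main()
    {
        string result = Normalize("7 | 4  5|1 0|  2 5 |3");
        Console.WriteLine(result); // 7|4 5|1 0|2 5|3

        // Each piece between '|' characters is one subarray.
        foreach (string sub in result.Split('|'))
            Console.WriteLine("subarray: " + sub);
    }
}
```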

How do I find the Nth occurrence of a pattern with regex?

I have a string of numbers separated by some non-numeric character like this: "16-15316-273"
Is it possible to build a regex that returns the Nth match? I heard that ${n} might help, but it does not work for me, at least in this expression:
// Example: I want to get 15316
var r = new Regex(@"(\d+)${1}");
var m = r.Match("16-15316-273");
(\d+)${0} returns 16, but (\d+)${1} gives me 273 instead of the expected 15316.
So N, the ordinal of the number to extract, and the input string itself ("16-15316-273" is just an example) are dynamic values that might change during app execution. The task is to build the regex so that the only thing that changes inside it is N, and so that it applies to any such string.
Please do not offer solutions with any additional C# code like m.Groups[n] or Split; I'm intentionally asking for a proper regex pattern. In short, I cannot modify the code for every new N value. All I can modify is the regex, which is built dynamically, with N passed as a parameter to the method. All the rest is static, with no way to change it.
Maybe this expression will help you?
(?<=(\d+[^\d]+){1})\d+
You will need to modify {1} according to your N.
I.e.
(?<=(\d+[^\d]+){0})\d+ => 16
(?<=(\d+[^\d]+){1})\d+ => 15316
(?<=(\d+[^\d]+){2})\d+ => 273
Your regular expression
(\d+)${1}
says to match this:
(\d+): match 1 or more decimal digits, followed by
${1}: match the atomic zero-width assertion "end of input string" exactly once.
One should note that the {1} quantifier is redundant since there's normally only one end-of-input-string (unless you've turned on the multiline option).
That's why you're matching 273: it's the longest sequence of digits anchored at end-of-string.
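A small sketch makes this easy to check. It runs the patterns from the question against the sample input; the class and helper names (EndAnchorDemo, FirstMatch) are only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class EndAnchorDemo
{
    // Hypothetical helper: returns the text of the first match, or "" if none.
    public static string FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    static void Main()
    {
        string input = "16-15316-273";

        // '$' repeated exactly once: digits anchored at end of input.
        Console.WriteLine(FirstMatch(input, @"(\d+)${1}")); // 273

        // Same thing without the redundant quantifier.
        Console.WriteLine(FirstMatch(input, @"(\d+)$"));    // 273

        // '$' repeated zero times: the anchor disappears, so the
        // first run of digits wins.
        Console.WriteLine(FirstMatch(input, @"(\d+)${0}")); // 16
    }
}
```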
You need to use a zero-width positive look-behind assertion. To capture the Nth field in your string, you need to capture that string of digits that is preceded by N-1 fields. Given this source string:
string input = "1-22-333-4444-55555-666666-7777777-88888888-999999999" ;
The regular expression to match the 3rd field, where the first field is 1 rather than 0 looks like this:
(?<=^(\d+(-|$)){2})\d+
It says to
match the longest sequence of digits that is preceded by
    start of text, followed by
    a group, consisting of
        1 or more decimal digits, followed by
        either a - or end-of-text
    with that group repeated exactly 2 times
Here's a sample program:
string src = "1-22-333-4444-55555-666666-7777777-88888888-999999999" ;
for ( int n = 1 ; n <= 10 ; ++n )
{
int n1 = n-1 ;
string x = n1.ToString(CultureInfo.InvariantCulture) ;
string regex = @"(?<=^(\d+(-|$)){"+ x + @"})\d+" ;
Console.Write( "regex: {0} ",regex);
Regex rx = new Regex( regex ) ;
Match m = rx.Match( src ) ;
Console.WriteLine( "N={0,-2}, N-1={1,-2}, {2}" ,
n ,
n1 ,
m.Success ? "success: " + m.Value : "failure"
) ;
}
It produces this output:
regex: (?<=^(\d+(-|$)){0})\d+ N= 1, N-1=0 , success: 1
regex: (?<=^(\d+(-|$)){1})\d+ N= 2, N-1=1 , success: 22
regex: (?<=^(\d+(-|$)){2})\d+ N= 3, N-1=2 , success: 333
regex: (?<=^(\d+(-|$)){3})\d+ N= 4, N-1=3 , success: 4444
regex: (?<=^(\d+(-|$)){4})\d+ N= 5, N-1=4 , success: 55555
regex: (?<=^(\d+(-|$)){5})\d+ N= 6, N-1=5 , success: 666666
regex: (?<=^(\d+(-|$)){6})\d+ N= 7, N-1=6 , success: 7777777
regex: (?<=^(\d+(-|$)){7})\d+ N= 8, N-1=7 , success: 88888888
regex: (?<=^(\d+(-|$)){8})\d+ N= 9, N-1=8 , success: 999999999
regex: (?<=^(\d+(-|$)){9})\d+ N=10, N-1=9 , failure
Try this:
string text = "16-15316-273";
Regex r = new Regex(#"\d+");
var m = r.Match(text, text.IndexOf('-'));
The output is 15316 ;)

Regex in c# for allowing numbers and alphabets

I am trying to write a regex that allows different sets of inputs.
The first 9 characters should be numeric - 123456789
The 10th character is optional and, if present, should be a letter - 123456789A
The 11th character, if present, should be alphanumeric - 123456789AA or 123456789A1
Characters 12 - 14, if present, should be numeric - 123456789AA123 or 123456789A1123
I tried this but it is not working:
string sMatch = "^[0-9]{9}([a-zA-Z])\?{1}([0-9A-Za-z])\?{1}([0-9])?{1}([0-9])\?{1}$";
System.Text.RegularExpressions.Regex reg = new System.Text.RegularExpressions.Regex(sMatch);
I don't know C#'s regex implementation, but how about:
\d{9}[a-zA-Z]?[a-zA-Z0-9]?\d{0,3}
Try the following
string sMatch = "^(?i)\\b\\d{9}[a-z]?[^\\W_]?\\d{0,3}\\b$";
A. You need to put an '@' character before the first quotation mark to make it a verbatim string literal, so that \? is handled properly.
B. I'd break it up into a few statements, i.e.:
string preset1 = @"^[0-9]{9}", preset2 = @"[a-zA-Z]{1}", preset3 = @"[0-9A-Za-z]{1}",
    preset4 = @"[0-9]{3}$";
if (Regex.IsMatch(input, preset1)){
    //Do fits first preset
    if (Regex.IsMatch(input, preset1 + preset2)){
        //Do fits second preset
        if (Regex.IsMatch(input, preset1 + preset2 + preset3)){
            //Do fits third preset
            if (Regex.IsMatch(input, preset1 + preset2 + preset3 + preset4)){
                //Do fits fourth preset
            }
        }
    }
}
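For comparison, here is a minimal single-pattern sketch of the rules in the question. It rests on two assumptions that the question does not spell out: each optional part may appear only when the part before it is present, and the numeric tail may be anywhere from zero to three digits. The names CodeValidator and IsValid are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

static class CodeValidator
{
    // Assumed reading of the rules:
    //   9 digits, then optionally a letter,
    //   then (only after the letter) optionally one alphanumeric,
    //   then (only after that) up to three digits.
    private static readonly Regex Pattern = new Regex(
        @"^[0-9]{9}(?:[A-Za-z](?:[0-9A-Za-z][0-9]{0,3})?)?$");

    public static bool IsValid(string value)
    {
        return Pattern.IsMatch(value);
    }

    static void Main()
    {
        string[] samples =
        {
            "123456789", "123456789A", "123456789A1",
            "123456789AA123", "1234567891", "123456789AA1234"
        };

        foreach (string sample in samples)
            Console.WriteLine(sample + " -> " + IsValid(sample));
    }
}
```

Nesting the optional groups is what rejects "1234567891"; with independent ? quantifiers, the digit in position 10 would be accepted by the alphanumeric slot.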

Combining substrings in C# with a custom format?

Part of an app I'm creating in C# replaces certain substrings in a string with a value in square brackets like [11]. Often there can be the same value straight after - so I want to reduce the amount of text by combining them into one like [11,numberOfSame]
For example, if the string contains:
blahblah[122][122][122]blahblahblahblah[18][18][18][18]blahblahblah
The desired new string would be:
blahblah[122,3]blahblahblahblah[18,4]blahblahblah
Would anyone know how I would do this? Thanks! :)
Regex.Replace("blahblah[122][122][122]blahblahblahblah[18][18][18][18]blahblahblah",
@"(\[([^]]+)])(\1)+",
m => "[" + m.Groups[2].Value + "," + (m.Groups[3].Captures.Count + 1) + "]")
Returns:
blahblah[122,3]blahblahblahblah[18,4]blahblahblah
Explanation of regex:
( Starts group 1
\[ Matches [
( Starts group 2
[^]]+ Matches 1 or more of anything but ]
) Ends group 2
] Matches ]
) Ends group 1
( Starts group 3
\1 Matches whatever was in group 1
) Ends group 3
+ Matches one or more of group 3
Explanation of lambda:
m => Accepts a Match object
"[" + A [
m.Groups[2].Value + Whatever was in group 2
"," + A ,
(m.Groups[3].Captures.Count + 1) + The number of times group 3 matched + 1
"]" A ]
I am using the Regex.Replace overload that accepts a MatchEvaluator delegate to compute the replacement value.
string input = "[122][44][122]blah[18][18][18][18]blah[122][122]";
string output = Regex.Replace(input, @"((?<firstMatch>\[(.+?)\])(\k<firstMatch>)*)", m => "[" + m.Groups[2].Value + "," + (m.Groups[3].Captures.Count + 1) + "]");
Returns:
[122,1][44,1][122,1]blah[18,4]blah[122,2]
Explanation:
(?<firstMatch>\[(.+?)\]) Matches the [123] group, names group firstMatch
\k<firstMatch> matches whatever text was matched by the firstMatch group, and adding * matches it zero or more times, giving us the count used in the lambda.
My reference for anything Regex: http://www.regular-expressions.info/
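The two approaches differ in how they treat a value that is not repeated: the + version leaves it alone, while the * version rewrites it with a count of 1. A minimal sketch of the + version as a reusable method follows; the names RunCombiner and Combine are made up, and the brackets are escaped explicitly here, which is a stylistic choice rather than a requirement.

```csharp
using System;
using System.Text.RegularExpressions;

static class RunCombiner
{
    // Hypothetical helper: collapses two or more adjacent copies of the
    // same [value] into [value,count]. Single occurrences are not touched.
    public static string Combine(string input)
    {
        return Regex.Replace(
            input,
            @"(\[([^\]]+)\])(\1)+",
            m => "[" + m.Groups[2].Value + ","
                 + (m.Groups[3].Captures.Count + 1) + "]");
    }

    static void Main()
    {
        Console.WriteLine(Combine("blahblah[122][122][122]blah[18][18][18][18]blah"));
        // blahblah[122,3]blah[18,4]blah

        Console.WriteLine(Combine("[122][44][122]"));
        // [122][44][122]
    }
}
```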

Regex to find inner if conditions

I had a regex to find single if-then-else condition.
string pattern2 = @"if( *.*? *)then( *.*? *)(?:else( *.*? *))?endif";
Now I need to extend this to support nested (looped) if conditions, but the regex does not extract the then and else parts properly.
Example Looped IF condition:
if (2 > 1) then ( if(3>2) then ( if(4>3) then 4 else 3 endif ) else 2 endif) else 1 endif
Expected Result with Regex:
condition = (2>1)
then part = ( if(3>2) then ( if(4>3) then 4 else 3 endif ) else 2 endif)
else part = 1
I can check whether the then and else parts hold real values or another condition. Then I can use the same regex on the inner condition until everything is resolved.
The current regex returns result like:
condition = (2 > 1)
then part = ( if( 3>2) then ( if(4>3) then 3
else part = 3
Meaning, it returns the value after the first "else" found, but it actually has to extract from the last else.
Can someone help me with this?
You can adapt the solution from the answer to "Can regular expressions be used to match nested patterns?" ( http://retkomma.wordpress.com/2007/10/30/nested-regular-expressions-explained/ ).
That solution shows how to match content between HTML tags, even if it contains nested tags. Applying the same idea to parenthesis pairs should solve your problem.
EDIT:
using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
String matchParenthesis = @"
(?# line 01) \((
(?# line 02) (?>
(?# line 03) \( (?<DEPTH>)
(?# line 04) |
(?# line 05) \) (?<-DEPTH>)
(?# line 06) |
(?# line 07) .?
(?# line 08) )*
(?# line 09) (?(DEPTH)(?!))
(?# line 10) )\)
";
//string source = "if (2 > 1) then ( if(3>2) then ( if(4>3) then 4 else 3 endif ) else 2 endif) else 1 endif";
string source = "if (2 > 1) then 2 else ( if(3>2) then ( if(4>3) then 4 else 3 endif ) else 2 endif) endif";
string pattern = @"if\s*(?<condition>(?:[^(]*|" + matchParenthesis + @"))\s*";
pattern += @"then\s*(?<then_part>(?:[^(]*|" + matchParenthesis + @"))\s*";
pattern += @"else\s*(?<else_part>(?:[^(]*|" + matchParenthesis + @"))\s*endif";
Match match = Regex.Match(source, pattern,
RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);
Console.WriteLine(match.Success.ToString());
Console.WriteLine("source: " + source );
Console.WriteLine("condition = " + match.Groups["condition"]);
Console.WriteLine("then part = " + match.Groups["then_part"]);
Console.WriteLine("else part = " + match.Groups["else_part"]);
}
}
}
If you replace endif with end you get
if (2 > 1) then ( if(3>2) then ( if(4>3) then 4 else 3 end) else 2 end) else 1 end
and you also get a perfectly fine Ruby expression. Download IronRuby and add references to IronRuby, IronRuby.Libraries, and Microsoft.Scripting to your project. You can find them in C:\Program Files\IronRuby 1.0v4\bin. Then add
using Microsoft.Scripting;
using Microsoft.Scripting.Hosting;
using IronRuby;
and in your code
var engine = Ruby.CreateEngine();
int result = engine.Execute("if (2 > 1) then ( if(3>2) then ( if(4>3) then 4 else 3 end ) else 2 end) else 1 end");
