Regex not able to parse last group [duplicate] - c#

This question already has answers here:
What is the difference between .*? and .* regular expressions?
(3 answers)
Closed 3 years ago.
This is the test. I expect the last group to be ".png", but this pattern returns "" instead.
var inputStr = @"C:\path\to\dir\[yyyy-MM-dd_HH-mm].png";
var pattern = @"(.*?)\[(.*?)\](.*?)";
var regex = new Regex(pattern);
var match = regex.Match(inputStr);
var thirdGroupValue = match.Groups[3].Value;
// ✓ EXPECTED: ".png"
// ✗ CURRENT: ""
The 1st and 2nd groups work fine.

This is because you made the * in Group 3 lazy:
(.*?)\[(.*?)\](.*?)
                 ^
                here
This means it will match as little as possible. What's the least .* can match? An empty string!
You can learn more about lazy vs greedy here.
You can fix this either by removing the ?, which makes the group greedy, or by putting a $ at the end, which forces the group to match up to the end of the string:
(.*?)\[(.*?)\](.*)
or
(.*?)\[(.*?)\](.*?)$
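The three variants can be compared side by side. This is a minimal sketch using the input string from the question; the class name LazyVsGreedyDemo is only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

class LazyVsGreedyDemo
{
    static void Main()
    {
        var input = @"C:\path\to\dir\[yyyy-MM-dd_HH-mm].png";

        // Lazy trailing group: nothing after it forces it to consume anything.
        var lazy = Regex.Match(input, @"(.*?)\[(.*?)\](.*?)");

        // Greedy trailing group: consumes the rest of the line.
        var greedy = Regex.Match(input, @"(.*?)\[(.*?)\](.*)");

        // Lazy but anchored: the group must grow until $ can match.
        var anchored = Regex.Match(input, @"(.*?)\[(.*?)\](.*?)$");

        Console.WriteLine($"lazy:     '{lazy.Groups[3].Value}'");     // ''
        Console.WriteLine($"greedy:   '{greedy.Groups[3].Value}'");   // '.png'
        Console.WriteLine($"anchored: '{anchored.Groups[3].Value}'"); // '.png'
    }
}
```

The anchored form is the safer choice if more pattern is later appended after the third group, since a greedy .* would otherwise swallow it first and rely on backtracking.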

Related

Get value between parentheses [duplicate]

This question already has answers here:
How do I extract text that lies between parentheses (round brackets)?
(19 answers)
Closed 4 years ago.
I need to get all the strings that sit between opening and closing parentheses. An example string is as follows:
[CDATA[[(MyTag),xi(Tag2) ]OT(OurTag3).
The output needs to be an array with MyTag, Tag2, OurTag3, i.e. the strings need to have the parentheses removed.
The code below works but retains the parentheses. How do I adjust the regex pattern to remove the parentheses from the output?
string pattern = @"\(([^)]*)\)";
string MyString = "[CDATA[[(MyTag),xi(Tag2) ]OT(OurTag3)";
Regex re = new Regex(pattern);
foreach (Match match in re.Matches(MyString))
{
Console.WriteLine(match.Groups[1]); // print the captured group 1
}
You should be able to use the following:
(?<=\().+?(?=\))
(?<=\() - positive lookbehind for (
.+? - non-greedy match for the content (one or more characters)
(?=\)) - positive lookahead for )
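Both approaches from this thread can produce the array the question asks for. The sketch below assumes the sample string from the question; the class and variable names are illustrative. With the capture-group pattern, the text without parentheses is in Groups[1]; with the lookaround pattern, it is the whole match.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class ParenthesesDemo
{
    static void Main()
    {
        var input = "[CDATA[[(MyTag),xi(Tag2) ]OT(OurTag3)";

        // Option 1: capture group. Groups[0] would include the
        // parentheses; Groups[1] holds only the text between them.
        string[] viaGroup = Regex.Matches(input, @"\(([^)]*)\)")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToArray();

        // Option 2: lookarounds. The parentheses are asserted but not
        // consumed, so the whole match is already the inner text.
        string[] viaLookaround = Regex.Matches(input, @"(?<=\().+?(?=\))")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();

        Console.WriteLine(string.Join(", ", viaGroup));      // MyTag, Tag2, OurTag3
        Console.WriteLine(string.Join(", ", viaLookaround)); // MyTag, Tag2, OurTag3
    }
}
```

One difference worth noting: [^)]* also accepts empty parentheses "()", while .+? requires at least one character between them.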

Split a string by Regex [duplicate]

This question already has answers here:
Regular expression to extract text between square brackets
(15 answers)
Closed 5 years ago.
I'm trying to work out how to split this kind of string using a regex in C#.
[01,01,01][02,03,00][03,07,00][04,06,00][05,02,00][06,04,00][07,08,00][08,05,00]
Can someone knowledgeable about regex point me to how to achieve this?
Sample regex pattern that doesn't work:
[\dd,\dd,\dd]
sample output:
[01,01,01]
[02,03,00]
[03,07,00]
[04,06,00]
[05,02,00]
[06,04,00]
[07,08,00]
[08,05,00]
This will do the job in C# (\[.+?\]), e.g.:
var s = @"[01,01,01][02,03,00][03,07,00][04,06,00][05,02,00][06,04,00][07,08,00][08,05,00]";
var reg = new Regex(@"(\[.+?\])");
var matches = reg.Matches(s);
foreach(Match m in matches)
{
Console.WriteLine($"{m.Value}");
}
EDIT This is how the expression (\[.+?\]) works:
first, the outer parentheses, ( and ), capture whatever the inner pattern matched
then the escaped square brackets, \[ and \], match the literal [ and ] in the source string
finally, .+? matches one or more characters, but as few as possible, so that a single match does not swallow everything between the first [ and the last ]
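The effect of the lazy quantifier can be checked by counting matches. A small sketch using the input from the question (the class name is illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

class LazyBracketDemo
{
    static void Main()
    {
        var s = "[01,01,01][02,03,00][03,07,00][04,06,00][05,02,00][06,04,00][07,08,00][08,05,00]";

        // Lazy: each match stops at the first closing bracket it can reach.
        Console.WriteLine(Regex.Matches(s, @"\[.+?\]").Count); // 8

        // Greedy: a single match running from the first [ to the last ].
        Console.WriteLine(Regex.Matches(s, @"\[.+\]").Count);  // 1
    }
}
```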
I know you stipulated Regex; however, it's worth looking at Split again, if only for academic purposes:
Code
var input = "[01,01,01][02,03,00][03,07,00][04,06,00][05,02,00][06,04,00][07,08,00][08,05,00]";
var output = input.Split(']',StringSplitOptions.RemoveEmptyEntries)
    .Select(x => x + "]") // add the bracket back (needs using System.Linq;)
    .ToList();
foreach(var o in output)
Console.WriteLine(o);
Output
[01,01,01]
[02,03,00]
[03,07,00]
[04,06,00]
[05,02,00]
[06,04,00]
[07,08,00]
[08,05,00]
The Regex solution below is restricted to 3 values of exactly 2 digits each, separated by commas. Inside the foreach loop you can access the matched text via match.Value.
Remember to include using System.Text.RegularExpressions;
var input = "[01,01,01][02,03,00][03,07,00][04,06,00][05,02,00][06,04,00][07,08,00][08,05,00]";
// No trailing + on the group: with it, the adjacent groups would be consumed as one single match.
foreach(Match match in Regex.Matches(input, @"(\[\d{2},\d{2},\d{2}\])"))
{
    // do stuff with match.Value
}
Thanks, all, for the answers. I also got it working with this code:
string pattern = @"\[\d\d,\d\d,\d\d]";
Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = rgx.Matches(myResult);
Debug.WriteLine(matches.Count);
foreach (Match match in matches)
Debug.WriteLine(match.Value);

C# Regular Expression Capturing Empty String [duplicate]

This question already has answers here:
C# Regex.Split: Removing empty results
(9 answers)
Closed 5 years ago.
I'm trying to create a simple regular expression in C# to split a string into tokens. The problem I'm running into is that the pattern I'm using captures an empty string, which throws off my expected results. What can I do to change my regular expression so it doesn't capture an empty string?
var input = "ID=123&User=JohnDoe";
var pattern = "(?:id=)|(?:&user=)";
var tokens = Regex.Split(input, pattern, RegexOptions.IgnoreCase);
// Expected Results
// tokens[0] == "123"
// tokens[1] == "JohnDoe"
// Actual Results
// tokens[0] == ""
// tokens[1] == "123"
// tokens[2] == "JohnDoe"
While the comments on your question about using a different approach may have merit, they don't address your specific question about the RegEx behavior.
I think the behavior you're seeing has to do with an implicit capture group (edit: or it may be enough to limit the capture behavior of the first group), but I haven't made it to the top level of the RegEx hierarchy of understanding.
Edit:
Working RegEx for the given test case:
(?>id=)|(?:&user=)
If none of this is to your liking, you could always tack a predicate to the tokens list:
tokens.Where(x => !string.IsNullOrWhiteSpace(x))
I don't think you can solve this problem with Regex.Split to be honest. One brute force way to do this is to remove every "":
var input = "ID=123&User=JohnDoe";
var pattern = "(?:id=)|(?:&user=)";
var tokens = Regex.Split(input, pattern, RegexOptions.IgnoreCase).Where(x => x != "");
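For context on why the empty entry appears: Regex.Split is documented to include an empty string in the result when a match is found at the very beginning (or end) of the input, and here the input starts with the delimiter "ID=". The sketch below reproduces the question's result and then filters it; the class name is illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class SplitDemo
{
    static void Main()
    {
        var input = "ID=123&User=JohnDoe";
        var pattern = "(?:id=)|(?:&user=)";

        // The input begins with a delimiter match ("ID="), so the
        // empty text in front of it becomes the first element.
        string[] raw = Regex.Split(input, pattern, RegexOptions.IgnoreCase);
        Console.WriteLine(raw.Length); // 3

        // Drop the empty entries to get the expected tokens.
        string[] tokens = raw.Where(t => t.Length > 0).ToArray();
        Console.WriteLine(string.Join(" | ", tokens)); // 123 | JohnDoe
    }
}
```

Filtering after the split treats every delimiter position uniformly, at the cost of also discarding empty values such as the one in "ID=&User=JohnDoe".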
I think you should use a regex that actually captures the tokens in groups.
var input = "ID=123&User=JohnDoe";
var pattern = "id=(.+)&user=(.+)";
var match = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
var id = match.Groups[1].Value;   // "123"
var user = match.Groups[2].Value; // "JohnDoe"

Easy Regex capture [duplicate]

This question already has answers here:
Regular Expression Groups in C#
(5 answers)
Closed 6 years ago.
New to using C# Regex, I am trying to capture two comma separated integers from a string into two variables.
Example: 13,567
I tried variations on
Regex regex = new Regex(@"(\d+),(\d+)");
var matches = regex.Matches("12,345");
foreach (Match itemMatch in matches)
    Debug.Print(itemMatch.Value);
This just captures 1 variable, which is the entire string. I worked around this by changing the capture pattern to "(\d+)", but that ignores the middle comma entirely, so I would still get a match if there were any other text between the integers.
How do I get it to extract both integers and also ensure there is a comma between them?
Can do this with String.Split
Why not just use a split and parse?
var results = "123,456".Split(',').Select(int.Parse).ToArray();
var left = results[0];
var right = results[1];
Alternatively, you can use a loop and use int.TryParse to handle failures but for what you're looking for this should cover it
If you're really committed to a Regex
You can do this with a Regex too; you just need to use the groups of the match.
Regex r = new Regex(@"(\d+)\,(\d+)", RegexOptions.Compiled);
var r1 = r.Match("123,456");
// Groups[0] is the entire match
Console.WriteLine(r1.Groups[0].Value);
//Then first and second groups
var left = int.Parse(r1.Groups[1].Value);
var right = int.Parse(r1.Groups[2].Value);
Console.WriteLine("Left "+ left);
Console.WriteLine("Right "+right);
Made a dotnetfiddle you can test the solution in as well
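A slightly more defensive variant of the group-based answer is sketched below. It anchors the pattern so stray text around the numbers is rejected, checks Match.Success, and uses int.TryParse so an overflowing number fails cleanly rather than throwing. TryParsePair is a hypothetical helper name, not something from the answers above.

```csharp
using System;
using System.Text.RegularExpressions;

class PairParser
{
    // Hypothetical helper: returns true only for "<digits>,<digits>".
    static bool TryParsePair(string text, out int left, out int right)
    {
        left = 0;
        right = 0;

        // ^ and $ require the whole input to be the pair.
        Match m = Regex.Match(text, @"^(\d+),(\d+)$");

        return m.Success
            && int.TryParse(m.Groups[1].Value, out left)
            && int.TryParse(m.Groups[2].Value, out right);
    }

    static void Main()
    {
        if (TryParsePair("13,567", out int left, out int right))
            Console.WriteLine($"Left {left}, Right {right}"); // Left 13, Right 567

        // No comma between the integers, so this is rejected.
        Console.WriteLine(TryParsePair("13 567", out _, out _)); // False
    }
}
```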
With Regex, you can use this:
Regex regex = new Regex(@"\d+(?=,)|(?<=,)\d+");
var matches = regex.Matches("12,345");
foreach (Match itemMatch in matches)
Console.WriteLine(itemMatch.Value);
prints:
12
345
This uses a lookahead and a lookbehind for the comma:
\d+(?=,)   <---- // Match digits followed by a ,
|          <---- // OR
(?<=,)\d+  <---- // Match digits preceded by a ,

Replace text place holders with Regular Expression [duplicate]

This question already has answers here:
Extract string between braces using RegEx, ie {{content}}
(3 answers)
Closed 6 years ago.
I have a text template that has text variables wrapped with {{ and }}.
I need a regular expression that gives me all the matches, including the surrounding {{ and }}.
For example if I have {{FirstName}} in my text I want to get {{FirstName}} back as a match to be able to replace it with the actual variable.
I already found a regular expression that probably gives me what is INSIDE { and }, but I don't know how to modify it to return what I want.
/\{([^)]+)\}/
This pattern should do the trick:
string str = "{{FirstName}} {{LastName}}";
Regex rgx = new Regex("{{.*?}}");
foreach (var match in rgx.Matches(str))
{
// {{FirstName}}
// {{LastName}}
}
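Since the stated goal is to replace each placeholder with an actual value, one possible way is Regex.Replace with a match evaluator. This is a sketch under assumptions: the dictionary contents and the template text are made up for illustration, and unknown placeholders are simply left untouched.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class TemplateDemo
{
    static void Main()
    {
        // Hypothetical values; in practice these come from your own data.
        var values = new Dictionary<string, string>
        {
            ["FirstName"] = "Jane",
            ["LastName"] = "Doe",
        };

        var template = "Hello {{FirstName}} {{LastName}}, {{Unknown}}!";

        // m.Value is the full placeholder including the braces;
        // m.Groups[1].Value is the name between them.
        string result = Regex.Replace(template, @"\{\{(.*?)\}\}", m =>
            values.TryGetValue(m.Groups[1].Value, out var v) ? v : m.Value);

        Console.WriteLine(result); // Hello Jane Doe, {{Unknown}}!
    }
}
```

The braces are escaped here for readability; the unescaped form "{{.*?}}" from the answer above matches the same text.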
Maybe:
alert(/^\{{2}[\w|\s]+\}{2}$/.test('{{FirstName}}'))
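^: Start of the string.
$: End of the string.
\{{2}: The character { exactly 2 times.
[\w|\s]+: One or more word characters (letters, digits, underscore), whitespace characters, or literal | characters; inside a character class, | is not an alternation operator.
\}{2}: The character } exactly 2 times.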
UPDATE:
alert(/(^\{{2})?[\w|\s]+(\}{2})?$/.test('FirstName'))
