Regex find all matches EXCEPT those surrounded by characters

I have the following regular expression to find all of the instances of {word} in my string. In the following string, this (correctly) matches {eeid} and {catalog}:
Expression
{([^:]*?)}
String being searched
{?:participants::lookup(.,{eeid},{catalog})}
Now - I need to "escape" one of those values, so it is NOT matched/replaced. I'm trying to use double square brackets to do so:
{?:participants::lookup(.,{eeid},[[{catalog}]])}
How can I adjust my regular expression so it ignores {catalog} (enclosed in [[ ]]) but finds {eeid}?

You can use
(?<!\[\[)\{([^:{}]*)}(?!]])
See the .NET regex demo.
Details
(?<!\[\[) - a negative lookbehind that fails the match if there is [[ immediately to the left of the current location
\{ - a { char
([^:{}]*) - Group 1: any zero or more chars other than :, { and }
} - a } char
(?!]]) - a negative lookahead that fails the match if there is ]] immediately to the right of the current location.
See the C# demo:
var s = "{?:participants::lookup(.,{eeid},[[{catalog}]])}";
var rx = @"(?<!\[\[)\{([^:{}]*)}(?!]])";
var res = Regex.Matches(s, rx).Cast<Match>().Select(x => x.Groups[1].Value);
foreach (var t in res)
Console.WriteLine(t);
// => eeid
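Since the question mentions replacing the matches, here is a minimal sketch of how the same pattern might be used with Regex.Replace and a match evaluator. The class name PlaceholderReplacer, the method Fill and the sample values are hypothetical and not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PlaceholderReplacer
{
    // Same pattern as in the answer: {word} that is not wrapped in [[ ]].
    private static readonly Regex Placeholder =
        new Regex(@"(?<!\[\[)\{([^:{}]*)}(?!]])");

    // Replaces each unescaped {name} with its value.
    // Names missing from the dictionary are left untouched (an assumption of this sketch).
    public static string Fill(string template, IDictionary<string, string> values)
    {
        return Placeholder.Replace(template, m =>
            values.TryGetValue(m.Groups[1].Value, out var value) ? value : m.Value);
    }

    public static void Main()
    {
        // Hypothetical values, for illustration only.
        var values = new Dictionary<string, string>
        {
            ["eeid"] = "12345",
            ["catalog"] = "books",
        };
        var s = "{?:participants::lookup(.,{eeid},[[{catalog}]])}";
        Console.WriteLine(Fill(s, values));
        // => {?:participants::lookup(.,12345,[[{catalog}]])}
    }
}
```

The escaped [[{catalog}]] survives because the lookbehind rejects it before the evaluator is ever called, even though catalog has a value in the dictionary.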

Related

Building a regular expression in C#

How to check the following text in C# with Regex:
key_in-get { 43243225543543543 };
or
key_in-set { password123 : 34980430943834 };
I tried to build a regular expression, but I failed after a few hours.
Here is my code:
string text1 = "key_in-get { 322389238237 };";
string text2 = "key_in-set { password123 : 322389238237 };";
string pattern = "key_in-(get|set) { .* };";
var result1 = Regex.IsMatch(text1, pattern);
Console.Write("Is valid: {0} ", result1);
var result2 = Regex.IsMatch(text2, pattern);
Console.Write("Is valid: {0} ", result2);
I have to check if there is "set" or "get".
If the pattern finds "set" then it can only accept following pattern "text123 : 123456789", and if it finds "get" then should accept only "123456789".

You can use
key_in-(?:get|(set)) {(?(1) \w+ :) \w+ };
key_in-(?:get|(set))\s*{(?(1)\s*\w+\s*:)\s*\w+\s*};
key_in-(?:get|(set))\s*{(?(1)\s*\w+\s*:)\s*\d+\s*};
See the regex demo. The second one allows any amount of any whitespace between the elements and the third one allows only digits after : or as part of the get expression.
If the whole string must match, add ^ at the start and $ at the end of the pattern.
Details:
key_in- - a substring
(?:get|(set)) - get or set (the latter is captured into Group 1)
\s* - zero or more whitespaces
{ - a { char
(?(1)\s*\w+\s*:) - a conditional construct: if Group 1 matched, match one or more word chars enclosed with zero or more whitespaces and then a colon
\s*\w+\s* - one or more word chars enclosed with zero or more whitespaces
}; - a literal substring.
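The conditional construct described above can be tried out with a small validator. This is only a sketch: it uses the third (digits-only) pattern, anchored with ^ and $, with the braces escaped for readability. The class name KeyCommandValidator and the method IsValid are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeyCommandValidator
{
    // Group 1 is set only when "set" was matched; (?(1)...) then requires "name :".
    private static readonly Regex Command = new Regex(
        @"^key_in-(?:get|(set))\s*\{(?(1)\s*\w+\s*:)\s*\d+\s*\};$");

    public static bool IsValid(string text) => Command.IsMatch(text);

    public static void Main()
    {
        var samples = new[]
        {
            "key_in-get { 322389238237 };",               // True
            "key_in-set { password123 : 322389238237 };", // True
            "key_in-get { password123 : 322389238237 };", // False: get takes digits only
            "key_in-set { 322389238237 };",               // False: set needs "name :"
        };
        foreach (var s in samples)
            Console.WriteLine($"{s} -> {IsValid(s)}");
    }
}
```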

In the pattern that you tried, key_in-(get|set) { .* };, the greedy .* runs from { to the last occurrence of };, so the pattern could also match a string such as key_in-get { }; };
As an alternative solution, you could use an alternation | specifying each of the accepted parts for the get and the set.
key_in-(?:get\s*{\s*\w+|set\s*{\s*\w+\s*:\s*\w+)\s*};
The pattern matches
key_in- Match literally
(?: Non capture group
get\s*{\s*\w+ Match get, { between optional whitespace chars and 1+ word chars
| Or
set\s*{\s*\w+\s*:\s*\w+ Match set, { between optional whitespace chars and word chars on either side with : in between.
) Close non capture group
\s*}; Match optional whitespace chars and };
Regex demo
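To see the difference between the greedy original and the alternation, the two patterns can be run side by side. This is a rough sketch; the class GreedyVsAlternation and its members are hypothetical names used only for the comparison.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsAlternation
{
    // Pattern from the question: .* accepts anything between the braces.
    public const string Original = "key_in-(get|set) { .* };";

    // Alternation from this answer: each keyword has its own accepted body.
    public const string Alternation =
        @"key_in-(?:get\s*{\s*\w+|set\s*{\s*\w+\s*:\s*\w+)\s*};";

    public static bool Matches(string pattern, string text) =>
        Regex.IsMatch(text, pattern);

    public static void Main()
    {
        var malformed = "key_in-get { }; };";
        Console.WriteLine(Matches(Original, malformed));    // True
        Console.WriteLine(Matches(Alternation, malformed)); // False

        var valid = "key_in-set { password123 : 322389238237 };";
        Console.WriteLine(Matches(Alternation, valid));     // True
    }
}
```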

Problem with brackets in regular expression in C#

can anybody help me with regular expression in C#?
I want to create a pattern for this input:
{a? ab 12 ?? cd}
This is my pattern:
([A-Fa-f0-9?]{2})+
The problem are the curly brackets. This doesn't work:
{(([A-Fa-f0-9?]{2})+)}
It just works for
{ab}

I would use {([A-Fa-f0-9?]+|[^}]+)}
It captures 1 group which:
[A-Fa-f0-9?]+ matches one or more characters from the list (hex digits or ?)
[^}]+ matches one or more characters other than }

If you allow leading/trailing whitespace within {...} string, the expression will look like
{(?:\s*([A-Fa-f0-9?]{2}))+\s*}
See this regex demo
If you only allow a single regular space only between the values inside {...} and no space after { and before }, you can use
{(?:([A-Fa-f0-9?]{2})(?: (?!}))?)+}
See this regex demo. Note this one is much stricter. Details of the first (whitespace-tolerant) pattern:
{ - a { char
(?:\s*([A-Fa-f0-9?]{2}))+ - one or more occurrences of
\s* - zero or more whitespaces
([A-Fa-f0-9?]{2}) - Capturing group 1: two hex or ? chars
\s* - zero or more whitespaces
} - a } char.
See a C# demo:
var text = "{a? ab 12 ?? cd}";
var pattern = @"{(?:([A-Fa-f0-9?]{2})(?: (?!}))?)+}";
var result = Regex.Matches(text, pattern)
.Cast<Match>()
.Select(x => x.Groups[1].Captures.Cast<Capture>().Select(m => m.Value))
.ToList();
foreach (var list in result)
Console.WriteLine(string.Join("; ", list));
// => a?; ab; 12; ??; cd

If you want to capture pairs of chars between the curly's, you can use a single capture group:
{([A-Fa-f0-9?]{2}(?: [A-Fa-f0-9?]{2})*)}
Explanation
{ Match {
( Capture group 1
[A-Fa-f0-9?]{2} Match 2 times any of the listed characters
(?: [A-Fa-f0-9?]{2})* Optionally repeat a space and again 2 of the listed characters
) Close group 1
} Match }
Regex demo | C# demo
Example code
string pattern = @"{([A-Fa-f0-9?]{2}(?: [A-Fa-f0-9?]{2})*)}";
string input = @"{a? ab 12 ?? cd}
{ab}";
foreach (Match m in Regex.Matches(input, pattern))
{
Console.WriteLine(m.Groups[1].Value);
}
Output
a? ab 12 ?? cd
ab

Regex to match string between curly braces (that allows to escape them via 'doubling')

I was using the regex from Extract values within single curly braces:
(?<!{){[^{}]+}(?!})
However, it does not cover use case #3 (see below).
I would like to know if it's possible to define a regular expression that satisfies the use cases below
Use case 1
Given:
Hola {name}
It should match {name} and capture name
But I would like to be able to escape curly braces when needed by doubling them, like C# does for interpolated strings. So, in a string like
Use case 2
Hola {name}, this will be {{unmatched}}
The {{unmatched}} part should be ignored because it uses them doubled. Notice the {{ and }}.
Use case 3
In the last, most complex case, a text like this:
Buenos {{{dias}}}
The text {dias} should be a match (and capture dias) because the first outer-most doubled curly braces should be interpreted just like another character (they are escaped) so it should match: {{{dias}}}
My ultimate goal is to replace the matches later with another string, like a variable.
EDIT
This 4th use case pretty much summarized the whole requirements:
Given:
Hola {name}, buenos {{{dias}}}
Results in:
Match 1:
Matched text: {name}
Captured text: name
Match 2:
Matched text: {dias}
Captured text: dias

To optionally match double curly's, you could use an if clause and take the value from capture group 2.
(?<!{)({{)?{([^{}]+)}(?(1)}})(?!})
Explanation
(?<!{) Assert not { directly to the left
({{)? Optionally capture {{ in group 1
{([^{}]+)} Match from { till } without matching { and } in between
(?(1)}}) If clause, if group 1 exists, match }}
(?!}) Assert not } directly to the right
.Net regex demo | C# demo
string pattern = @"(?<!{)({{)?{([^{}]+)}(?(1)}})(?!})";
string input = @"Hola {name}
Hola {name}, this will be {{unmatched}}
Buenos {{{dias}}}";
foreach (Match m in Regex.Matches(input, pattern))
{
Console.WriteLine(m.Groups[2].Value);
}
Output
name
name
dias
If the double curly's should be balanced, you might use this approach:
(?<!{){(?>(?<={){{(?<c>)|([^{}]+)|}}(?=})(?<-c>))*(?(c)(?!))}(?!})
.NET regex demo

You can use
(?<!{)(?:{{)*{([^{}]*)}(?:}})*(?!})
See the .NET regex demo.
In C#, you can use
var results = Regex.Matches(text, @"(?<!{)(?:{{)*{([^{}]*)}(?:}})*(?!})").Cast<Match>().Select(x => x.Groups[1].Value).ToList();
Alternatively, to get full matches, wrap the left- and right-hand contexts in lookarounds:
(?<=(?<!{)(?:{{)*{)[^{}]*(?=}(?:}})*(?!}))
See this regex demo.
In C#:
var results = Regex.Matches(text, @"(?<=(?<!{)(?:{{)*{)[^{}]*(?=}(?:}})*(?!}))")
.Cast<Match>()
.Select(x => x.Value)
.ToList();
Regex details
(?<=(?<!{)(?:{{)*{) - immediately to the left, there must be zero or more {{ substrings not immediately preceded with a { char and then {
[^{}]* - zero or more chars other than { and }
(?=}(?:}})*(?!})) - immediately to the right, there must be }, zero or more }} substrings not immediately followed with a } char.
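The question's stated goal is to replace the matches later, so here is one possible sketch of that step. It is a variation on the capturing pattern above, not the answer's exact expression: the doubled braces on each side are captured in their own groups so they can be written back unchanged. The names BraceTemplate and Fill, and the sample values, are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BraceTemplate
{
    // Group 1: leading {{ pairs, group 2: the name, group 3: trailing }} pairs.
    // As in the answer's pattern, the pair counts are not required to balance.
    private static readonly Regex Token = new Regex(
        @"(?<!\{)((?:\{\{)*)\{([^{}]*)\}((?:\}\})*)(?!\})");

    // Substitutes known names and keeps the escaped (doubled) braces as they are.
    // Unknown names are left untouched (an assumption of this sketch).
    public static string Fill(string template, IDictionary<string, string> values)
    {
        return Token.Replace(template, m =>
            values.TryGetValue(m.Groups[2].Value, out var value)
                ? m.Groups[1].Value + value + m.Groups[3].Value
                : m.Value);
    }

    public static void Main()
    {
        // Hypothetical values, for illustration only.
        var values = new Dictionary<string, string>
        {
            ["name"] = "Ana",
            ["dias"] = "lunes",
        };
        Console.WriteLine(Fill("Hola {name}, buenos {{{dias}}}", values));
        // => Hola Ana, buenos {{lunes}}
        Console.WriteLine(Fill("Hola {name}, this will be {{unmatched}}", values));
        // => Hola Ana, this will be {{unmatched}}
    }
}
```

Turning the remaining {{ and }} into single braces, as C# interpolated strings do, would be a separate pass after the substitution.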

Regular expression to find 3 repeated words

I'm trying to create a regular expression which matches the same word 3 times, they are separated by a comma. For example, some inputs would be:
HEY,HEY,HEY - match
NO,NO,NO - match
HEY,HI,HEY - no match
HEY,H,Y - no match
HEY,NO,HEY - no match
How can I go about doing this? I've had a look at some example but they are only good for characters, not words.

This should do the trick:
^(\w+),\1,\1$
Explanation:
^: beginning of the line. Needed to avoid matching "HHEY,HEY,HEY".
(\w+): matches one or more word characters. This is the first captured group.
,: the character comma.
\1: a backreference to the first captured group. In other words, matches whatever was matched in (\w+) before.
,: the character comma.
\1: a backreference to the first captured group.
$: end of the line. Needed to avoid matching "HEY,HEY,HEYY".
Source: https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx#Anchor_5
Example usage
static void Main()
{
var threeWords = new Regex(@"^(\w+),\1,\1$");
var lines = new[]
{
"HEY,HEY,HEY",
"NO,NO,NO",
"HEY,HI,HEY",
"HEY,H,Y",
"HEY,NO,HEY",
"HHEY,HEY,HEY",
"HEY,HEY,HEYY",
};
foreach (var line in lines)
{
var isMatch = threeWords.IsMatch(line) ? "" : "no ";
Console.WriteLine($"{line} - {isMatch}match");
}
}
Output:
HEY,HEY,HEY - match
NO,NO,NO - match
HEY,HI,HEY - no match
HEY,H,Y - no match
HEY,NO,HEY - no match
HHEY,HEY,HEY - no match
HEY,HEY,HEYY - no match

How can I use lookbehind in a C# Regex in order to skip matches of repeated prefix patterns?

How can I use lookbehind in a C# Regex in order to skip matches of repeated prefix patterns?
Example - I'm trying to have the expression match all the b characters following any number of a characters:
Regex expression = new Regex("(?<=a).*");
foreach (Match result in expression.Matches("aaabbbb"))
MessageBox.Show(result.Value);
returns aabbbb, the lookbehind matching only an a. How can I make it so that it would match all the as in the beginning?
I've tried
Regex expression = new Regex("(?<=a+).*");
and
Regex expression = new Regex("(?<=a)+.*");
with no results...
What I'm expecting is bbbb.

Are you looking for a repeated capturing group?
(.)\1*
This will return two matches.
Given:
aaabbbb
This will result in:
aaa
bbbb
This:
(?<=(.))(?!\1).*
This uses the same principle: the lookbehind captures the previous character into a backreference, and the negative lookahead then asserts that the next character is not that same character.
That matches:
bbbb

I figured it out eventually:
Regex expression = new Regex("(?<=a+)[^a]+");
foreach (Match result in expression.Matches(@"aaabbbb"))
MessageBox.Show(result.Value);
I must not allow the a characters to be matched by the part outside the lookbehind. This way, the expression will only match those runs of non-a characters that follow a repetitions.
Matching aaabbbb yields bbbb and matching aaabbbbcccbbbbaaaaaabbzzabbb results in bbbbcccbbbb, bbzz and bbb.
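The behaviour described in this thread can be checked by running the question's pattern and the final pattern against the same inputs. This is a small sketch; the class LookbehindDemo and the helper All are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Returns the text of every match of pattern in input.
    public static string[] All(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        // The lookbehind is satisfied after the first a, so .* takes the remaining as too.
        Console.WriteLine(string.Join(", ", All("aaabbbb", "(?<=a).*")));
        // => aabbbb

        // [^a]+ cannot start on an a, so the match begins at the first b.
        Console.WriteLine(string.Join(", ", All("aaabbbb", "(?<=a+)[^a]+")));
        // => bbbb

        Console.WriteLine(string.Join(", ",
            All("aaabbbbcccbbbbaaaaaabbzzabbb", "(?<=a+)[^a]+")));
        // => bbbbcccbbbb, bbzz, bbb
    }
}
```

Because a lookbehind never consumes characters, (?<=a+) and (?<=a) accept the same positions here; what changes the result is replacing .* with [^a]+.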

The look-behind only checks that a single "a" precedes the match; it does not consume or capture anything. The match therefore starts right after the first "a", and .* takes the rest of the string.
Would this pattern work for you instead? New pattern: \ba+(.+)\b
It uses a word boundary \b to anchor either end of the word. It matches at least one "a" followed by the rest of the characters up to the closing word boundary. The remaining characters are captured in a group so you can reference them easily.
string pattern = @"\ba+(.+)\b";
foreach (Match m in Regex.Matches("aaabbbb", pattern))
{
Console.WriteLine("Match: " + m.Value);
Console.WriteLine("Group capture: " + m.Groups[1].Value);
}
UPDATE: If you want to skip the first occurrence of any duplicated letters, then match the rest of the string, you could do this:
string pattern = @"\b(.)(\1)*(?<Content>.+)\b";
foreach (Match m in Regex.Matches("aaabbbb", pattern))
{
Console.WriteLine("Match: " + m.Value);
Console.WriteLine("Group capture: " + m.Groups["Content"].Value);
}
