Regex to extract text between parentheses paired with asterisks - C#

This is slightly different from similar posts in that the parentheses are paired with asterisks.
Example input:
yada yada (* need to grab this text *) yoda
I thought Jennifer's post could be adapted for this, but my attempts fail.
//Regex regex = new Regex("\\((?<TextInsideBrackets>\\w+)\\)"); //original
Regex regex = new Regex("\\(\\*(?<TextInsideBrackets>\\w+)\\*\\)"); // my attempt
string incomingValue = "Autocycleprestartcase := 20; (* Yayitme ve Konveyoru bosaltabilir *)";
string insideBrackets = null;
Match match = regex.Match(incomingValue);
if (match.Success)
{
insideBrackets = match.Groups["TextInsideBrackets"].Value;
}
Suggestions?
Also, I'd like to remove the enclosed text, together with the enclosing parenthesis/asterisk pairs, from the input line.
So for the input above, the output would be
yada yada yoda
and the value
need to grab this text
Thanks

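Change it to
Regex regex = new Regex("\\(\\*(?<TextInsideBrackets>[\\w ]+)\\*\\)");
The only change is [\\w ]+ instead of \\w+, which allows spaces in the captured text. Note that the captured value then also includes the space right after (* and the space right before *), so you may want to call Trim() on it.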

Here is a solution that gets both values while reusing the same pattern:
string incomingValue = "Autocycleprestartcase := 20; (* Yayitme ve Konveyoru bosaltabilir *)";
string pattern = @"\(\*\s*(.*?)\s*\*\)";
string insideBrackets = Regex.Match(incomingValue, pattern).Groups[1].Value; // Value is "" when there is no match
Console.WriteLine(insideBrackets); // => Yayitme ve Konveyoru bosaltabilir
Console.WriteLine(Regex.Replace(incomingValue, $@"\s*{pattern}", string.Empty)); // => Autocycleprestartcase := 20;
See the C# demo
Pattern details:
\( - a literal ( (note the single backslash: the string is defined as a verbatim string literal, @"")
\* - a literal *
\s* - 0+ whitespaces (trimming the value from the left)
(.*?) - Group 1 capturing zero or more chars other than newline, as few as possible, up to the first occurrence of the subsequent subpatterns
\s* - zero or more whitespaces (trimming from the right)
\* - a literal *
\) - a literal )
To get the second value, you can use the same pattern with \s* (zero or more whitespaces) added at the beginning, which is what Regex.Replace(incomingValue, $@"\s*{pattern}", string.Empty) does; the extra \s* also removes the space in front of (*.
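If a line can contain several (* ... *) comments, the same idea extends to Regex.Matches. The sketch below is an illustration, not part of the answer: ExtractComments is a hypothetical helper name, and it assumes a comment never spans several lines (. does not match a newline). It is written as C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every (* ... *) body, trimmed, plus the input
// with those comments and the whitespace in front of them removed.
static (string[] Comments, string Remainder) ExtractComments(string input)
{
    // Same pattern as above with the leading \s* built in; the lazy (.*?) stops at the first "*)".
    var commentRx = new Regex(@"\s*\(\*\s*(.*?)\s*\*\)");

    string[] comments = commentRx.Matches(input)
        .Cast<Match>()
        .Select(m => m.Groups[1].Value) // Group 1 excludes the delimiters and surrounding spaces
        .ToArray();

    return (comments, commentRx.Replace(input, string.Empty));
}

var (found, rest) = ExtractComments("yada yada (* need to grab this text *) yoda");
Console.WriteLine(found[0]); // need to grab this text
Console.WriteLine(rest);     // yada yada yoda
```

Compiling the regex once per call keeps the sketch short; in a hot loop you would store the Regex in a static field instead.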

Related

Building a regular expression in C#

How to check the following text in C# with Regex:
key_in-get { 43243225543543543 };
or
key_in-set { password123 : 34980430943834 };
I tried to build a regular expression, but after a few hours I still could not get it right.
Here is my code:
string text1 = "key_in-get { 322389238237 };";
string text2 = "key_in-set { password123 : 322389238237 };";
string pattern = "key_in-(get|set) { .* };";
var result1 = Regex.IsMatch(text1, pattern);
Console.WriteLine("Is valid: {0}", result1);
var result2 = Regex.IsMatch(text2, pattern);
Console.WriteLine("Is valid: {0}", result2);
I have to check whether there is "set" or "get".
If the pattern finds "set", it should accept only the form "text123 : 123456789"; if it finds "get", it should accept only "123456789".
You can use
key_in-(?:get|(set)) {(?(1) \w+ :) \w+ };
key_in-(?:get|(set))\s*{(?(1)\s*\w+\s*:)\s*\w+\s*};
key_in-(?:get|(set))\s*{(?(1)\s*\w+\s*:)\s*\d+\s*};
See the regex demo. The second one allows any amount of any whitespace between the elements and the third one allows only digits after : or as part of the get expression.
If the whole string must match, add ^ at the start and $ at the end of the pattern.
Details:
key_in- - a substring
(?:get|(set)) - get or set (the latter is captured into Group 1)
\s* - zero or more whitespaces
{ - a { char
(?(1)\s*\w+\s*:) - a conditional construct: if Group 1 matched, match one or more word chars enclosed with zero or more whitespaces and then a colon
\s*\w+\s* - one or more word chars enclosed with zero or more whitespaces
}; - a literal substring.
In the pattern you tried, key_in-(get|set) { .* };, the greedy .* matches everything up to the last occurrence of };, so it would also match something like key_in-get { }; }; and it does not treat get and set differently at all.
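To see the conditional construct at work, here is a small sketch that runs the third pattern, anchored with ^ and $ as suggested above, against the two valid strings from the question and two invalid variants. The invalid samples are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Third pattern from the answer, anchored so the whole string must match.
// (?(1)...) only requires "name :" when Group 1 (the "set" keyword) matched.
var keyRx = new Regex(@"^key_in-(?:get|(set))\s*{(?(1)\s*\w+\s*:)\s*\d+\s*};$");

string[] samples =
{
    "key_in-get { 322389238237 };",               // valid get
    "key_in-set { password123 : 322389238237 };", // valid set
    "key_in-get { password123 : 322389238237 };", // invalid: get must not have "name :"
    "key_in-set { 322389238237 };",               // invalid: set requires "name :"
};

foreach (string s in samples)
    Console.WriteLine($"{keyRx.IsMatch(s)}: {s}");
```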
As an alternative solution, you could use an alternation | specifying each of the accepted parts for the get and the set.
key_in-(?:get\s*{\s*\w+|set\s*{\s*\w+\s*:\s*\w+)\s*};
The pattern matches
key_in- Match literally
(?: Non capture group
get\s*{\s*\w+ Match get, { between optional whitespace chars and 1+ word chars
| Or
set\s*{\s*\w+\s*:\s*\w+ Match set, { between optional whitespace chars and word chars on either side with : in between.
) Close non capture group
\s*}; Match optional whitespace chars and };
Regex demo
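Because this alternation pattern is not anchored, it can also be used with Regex.Matches to pick every valid statement out of a larger text. A minimal sketch, assuming statements are separated by newlines; the config text is made up, and its last line is a set statement without the name : part, so it is not returned.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Alternation pattern from the answer, used unanchored to scan a whole text.
var statementRx = new Regex(@"key_in-(?:get\s*{\s*\w+|set\s*{\s*\w+\s*:\s*\w+)\s*};");

// Made-up input; the third line lacks "name :" and therefore does not match.
string config = string.Join("\n",
    "key_in-get { 43243225543543543 };",
    "key_in-set { password123 : 34980430943834 };",
    "key_in-set { 34980430943834 };");

string[] valid = statementRx.Matches(config)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

foreach (string v in valid)
    Console.WriteLine(v);
```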

Cannot filter content out of a string

I'm trying to get the content Camp de Futbol d'Aixovall in this string:
//<![CDATA[
//document.observe("dom:loaded", function() {mapsLoad()});
Event.observe(window, 'load', mapsLoad);
function loadMarkers()
{
setMarker(
"Camp de Futbol d'Aixovall",
42.476449269018,
1.487649679184,
null,
null,
1996,
null,
null,
null,
null,
"/venues/andorra/devk-arena/v1996/"
);
}
//]]>
I tried this regex pattern: loadMarkers[^\{]+(.*})
But it doesn't work. Could someone help me?
Update:
var name = System.Text.RegularExpressions.Regex
.Match(c, @"(?<=function\s+loadMarkers\([^()]*\)(?:\r?\n.*){2}\r?\n\s*")[^ "]+(?=")").Groups[1].Value;
For the exact string you posted, you can use the following regex
function\s+loadMarkers\([^()]*\)(?:\r?\n.*){2}\r?\n\s*"([^"]+)"
and extract the Group 1 value. See the regex demo.
Details
function\s+loadMarkers\( - this matches function, 1+ whitespaces, loadMarkers(,
[^()]* - matches any 0+ chars other than ( and )
\) - a ) char
(?:\r?\n.*){2} - matches a line break and the whole line, two times (skipping two lines this way)
\r?\n - a line break
\s* - 0+ whitespaces
" - a double quote
([^"]+) - Group 1: any 1+ chars other than "
" - a double quote.
In C#, the code will look like
var pattern = @"function\s+loadMarkers\([^()]*\)(?:\r?\n.*){2}\r?\n\s*""([^""]+)""";
var name = System.Text.RegularExpressions.Regex.Match(c, pattern).Groups[1].Value;
The result will be Camp de Futbol d'Aixovall. See this regex demo.
If, for some reason, you want to get the result as a whole match value, wrap the left and right hand contexts in lookarounds:
var pattern = @"(?<=function\s+loadMarkers\([^()]*\)(?:\r?\n.*){2}\r?\n\s*"")[^""]+(?="")";
var name = Regex.Match(c, pattern).Value;
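A quick way to check both patterns from this answer is to run them against a shortened copy of the script from the question. In this sketch the variable c stands in for the asker's input (built with "\n" line breaks, some setMarker arguments omitted); note how every double quote inside the verbatim strings is written as "".

```csharp
using System;
using System.Text.RegularExpressions;

// Shortened copy of the script from the question.
string c = string.Join("\n",
    "function loadMarkers()",
    "{",
    "setMarker(",
    "\"Camp de Futbol d'Aixovall\",",
    "42.476449269018,",
    "1.487649679184,",
    ");",
    "}");

// Group-based pattern: the name ends up in Groups[1].
string groupPattern = @"function\s+loadMarkers\([^()]*\)(?:\r?\n.*){2}\r?\n\s*""([^""]+)""";
string fromGroup = Regex.Match(c, groupPattern).Groups[1].Value;

// Lookaround-based pattern: the name is the whole match (.NET allows variable-length lookbehind).
string lookaroundPattern = @"(?<=function\s+loadMarkers\([^()]*\)(?:\r?\n.*){2}\r?\n\s*"")[^""]+(?="")";
string fromLookaround = Regex.Match(c, lookaroundPattern).Value;

Console.WriteLine(fromGroup);      // Camp de Futbol d'Aixovall
Console.WriteLine(fromLookaround); // Camp de Futbol d'Aixovall
```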
Alternatively you can also use the pattern
(?<=prefix)find(?=suffix)
This matches find only when it is preceded by prefix and followed by suffix. The lookarounds themselves consume no text, so the match value is exactly the find part and no groups are needed.
(?<=function\sloadMarkers\(\)\s+\{\s+setMarker\(\s+")[^"]+(?=")
where
prefix = function\sloadMarkers\(\)\s+\{\s+setMarker\(\s+"
find = [^"]+
suffix = "
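The prefix/find/suffix idea can be wrapped in a tiny helper that assembles the lookaround pattern from its three parts. Between is a hypothetical name, and the parts passed in are exactly the ones listed above; they must already be valid regex fragments, because the helper inserts them without escaping.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: prefix and suffix go into lookarounds, so only the
// "find" part ends up in Match.Value. The parts are regex fragments, not escaped text.
static string Between(string input, string prefix, string find, string suffix) =>
    Regex.Match(input, $"(?<={prefix}){find}(?={suffix})").Value;

// Shortened copy of the script from the question.
string script = "function loadMarkers()\n{\nsetMarker(\n\"Camp de Futbol d'Aixovall\",\n42.476449269018";

string name = Between(
    script,
    prefix: @"function\sloadMarkers\(\)\s+\{\s+setMarker\(\s+""",
    find: @"[^""]+",
    suffix: @"""");

Console.WriteLine(name); // Camp de Futbol d'Aixovall
```

If nothing matches, Match.Value is an empty string, so the helper returns "" rather than null.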

Regex pattern for splitting a delimited string in curly braces

I have the following string
{token1;token2;token3@somewhere.com;...;tokenn}
I need a regex pattern that would give me an array of strings such as
token1
token2
token3@somewhere.com
...
...
...
tokenn
I would also appreciate a suggestion on whether the same pattern can be used to validate the format of the string, i.e. the string must start and end with curly braces and contain at least 2 values between them.
You may use an anchored regex with named repeated capturing groups:
\A{(?<val>[^;]*)(?:;(?<val>[^;]*))+}\z
See the regex demo
\A - start of string
{ - a {
(?<val>[^;]*) - Group "val" capturing 0+ (due to * quantifier, if the value cannot be empty, use +) chars other than ;
(?:;(?<val>[^;]*))+ - 1 or more occurrences (thus, requiring at least 2 values inside {...}) of the sequence:
; - a semi-colon
(?<val>[^;]*) - Group "val" capturing 0+ chars other than ;
} - a literal }
\z - end of string.
.NET regex keeps every capture of a group in its CaptureCollection, which is why all the values captured into the "val" group can be accessed after a match is found.
C# demo:
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var s = "{token1;token2;token3;...;tokenn}";
var pat = @"\A{(?<val>[^;]*)(?:;(?<val>[^;]*))+}\z";
var caps = new List<string>();
var result = Regex.Match(s, pat);
if (result.Success)
{
caps = result.Groups["val"].Captures.Cast<Capture>().Select(t=>t.Value).ToList();
}
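The demo above only exercises a valid string. The sketch below wraps the same pattern in a hypothetical TryParseTokens helper to show the validation side as well: the anchored pattern fails for strings without braces or with fewer than two values, so one regex both validates and splits.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: validates "{v1;v2;...}" (at least two values) with the
// anchored pattern from the answer and returns every capture of the repeated "val" group.
static bool TryParseTokens(string input, out string[] tokens)
{
    Match m = Regex.Match(input, @"\A{(?<val>[^;]*)(?:;(?<val>[^;]*))+}\z");
    tokens = m.Success
        ? m.Groups["val"].Captures.Cast<Capture>().Select(c => c.Value).ToArray()
        : Array.Empty<string>();
    return m.Success;
}

foreach (string input in new[] { "{token1;token2;token3@somewhere.com;tokenn}", "{token1}", "token1;token2" })
{
    bool ok = TryParseTokens(input, out string[] tokens);
    Console.WriteLine($"{input} -> {ok}: [{string.Join(", ", tokens)}]");
}
```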
Read this (it is similar to your problem): How to keep the delimiters of Regex.Split?
For testing your regexes, you can use this tool: http://www.regexlib.com/RETester.aspx?AspxAutoDetectCookieSupport=1
However, a regex is comparatively expensive and more than you need for a single delimiter.
In your case it is better to use the string.Split method, for example "token1;token2;token3;...;tokenn".Split(';'), which returns an array of the strings you want. With your original input, strip the braces first, e.g. "{token1;token2}".Trim('{', '}').Split(';').
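If you follow the Split suggestion but still want the format check from the question, plain string methods are enough. This is a sketch with a hypothetical TrySplitTokens helper; it applies the rules stated in the question (braces at both ends, at least two ;-separated values) without any regex.

```csharp
using System;

// Hypothetical helper: braces at both ends and at least two ';'-separated
// values, checked with plain string methods instead of a regex.
static bool TrySplitTokens(string input, out string[] tokens)
{
    tokens = Array.Empty<string>();
    if (input.Length < 2 || input[0] != '{' || input[input.Length - 1] != '}')
        return false;                                  // must start with { and end with }

    string[] parts = input.Substring(1, input.Length - 2).Split(';');
    if (parts.Length < 2)
        return false;                                  // need at least one ';'

    tokens = parts;
    return true;
}

bool ok = TrySplitTokens("{token1;token2;tokenn}", out string[] values);
Console.WriteLine(ok);                          // True
Console.WriteLine(string.Join(" | ", values)); // token1 | token2 | tokenn
```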

Regex matching excluding a specific context

I'm trying to search a string for words within single quotes, but only if those single quotes are not within parentheses.
Example string:
something, 'foo', something ('bar')
So for the given example I'd like to match foo, but not bar.
After searching for regex examples I'm able to match within single quotes (see below code snippet), but am not sure how to exclude matches in the context previously described.
string line = "something, 'foo', something ('bar')";
Match name = Regex.Match(line, @"'([^']*)");
if (name.Success)
{
string matchedName = name.Groups[1].Value;
Console.WriteLine(matchedName);
}
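I would recommend using a lookbehind and a lookahead instead (note that they only check the characters directly next to the quotes, so ( 'bar' ) with spaces inside the parentheses would still match):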
(?<!\()'([^']*)'(?!\))
Or with C#:
string line = "something, 'foo', something ('bar')";
Match name = Regex.Match(line, @"(?<!\()'([^']*)'(?!\))");
if (name.Success)
{
Console.WriteLine(name.Groups[1].Value);
}
The easiest way to get what you need is to use alternation: only match what you do not need, and match and capture what you need:
\([^()]*\)|'([^']*)'
See the regex demo
Details:
\( - a (
[^()]* - 0+ chars other than ( and )
\) - a )
| - or
' - a '
([^']*) - Group 1 capturing 0+ chars other than '
' - a single quote.
In C#, use .Groups[1].Value to get the values you need. See the online demo:
var str = "something, 'foo', something ('bar')";
var result = Regex.Matches(str, @"\([^()]*\)|'([^']*)'")
.Cast<Match>()
.Where(m => m.Groups[1].Success) // skip the (...) matches, where Group 1 did not participate
.Select(m => m.Groups[1].Value)
.ToList();
Another alternative is the one mentioned by Thomas, but since it is .NET, you may use infinite-width lookbehind:
(?<!\([^()]*)'([^']*)'(?![^()]*\))
See this regex demo.
Details:
(?<!\([^()]*) - a negative lookbehind that fails the match if, immediately to the left of the current location, there is a ( followed by 0+ chars other than ( and )
'([^']*)' - a quote, 0+ chars other than single quote captured into Group 1, and another single quote
(?![^()]*\)) - a negative lookahead that fails the match if there are 0+ chars other than ( and ) followed with ) right after the ' from the preceding subpattern.
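The value without the enclosing ' characters is again in Group 1, so read it with .Groups[1].Value; with this pattern every match has Group 1 set, so no extra filtering is needed.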
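To compare the three approaches, the sketch below runs each pattern on a made-up variant of the question's input with spaces inside the parentheses. Quoted is a hypothetical helper that keeps Group 1 only when it participated in the match, which the alternation approach needs.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: Group 1 of every match in which Group 1 participated.
static string[] Quoted(string input, string pattern) =>
    Regex.Matches(input, pattern)
        .Cast<Match>()
        .Where(m => m.Groups[1].Success)
        .Select(m => m.Groups[1].Value)
        .ToArray();

// Made-up variant of the question's input with spaces inside the parentheses.
string line = "something, 'foo', something ( 'bar' )";

string adjacent    = @"(?<!\()'([^']*)'(?!\))";             // looks only at the chars next to the quotes
string alternation = @"\([^()]*\)|'([^']*)'";                // consumes (...) first, captures the rest
string lookaround  = @"(?<!\([^()]*)'([^']*)'(?![^()]*\))"; // .NET variable-length lookbehind

Console.WriteLine(string.Join(", ", Quoted(line, adjacent)));    // foo, bar
Console.WriteLine(string.Join(", ", Quoted(line, alternation))); // foo
Console.WriteLine(string.Join(", ", Quoted(line, lookaround)));  // foo
```

On the original input, something, 'foo', something ('bar'), all three patterns return only foo; the difference shows up once there is whitespace between the parentheses and the quotes.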

Semi-Fancy Regex.Replace() function

Words placed after these punctuation marks must be capitalized (note that there may be spaces or special characters on either side of these when used):
dash ( - ), slash ( / ), colon ( : ), period ( . ), question mark ( ? ), exclamation point ( ! ), ellipsis (... or …) (they are different)
I am sort of bogged down on this puzzle because of all of the special regex characters that I am trying to literally look for in my search. I believe I can use Regex.Escape although I cannot get it working for me right now in this case.
A few examples of input strings and the expected output:
Change this:
This is a dash - example
To this:
This is a dash - Example <--capitalize "Example" with Regex
This is another dash -example
This is another dash -Example
This is an ellipsis ... example
This is an ellipsis ... Example
This is another ellipsis …example
This is another ellipsis …Example
This is a slash / example
This is a slash / Example
This is a question mark ? example
This is a question mark ? Example
Here is the code I have so far:
private static string[] postCaps = { "-", "/", ":", "?", "!", "...", "…"};
private static string ReplacePostCaps(string strString)
{
foreach (string postCap in postCaps)
{
strString = Regex.Replace(strString, Regex.Escape(postCap), "/(?<=(" + Regex.Escape(postCap) + "))./", RegexOptions.IgnoreCase);
}
return strString;
}
Thank you very much!
You shouldn't need to iterate over a list of punctuation marks; instead, you can put them in a character class and use a single regex:
(?:[/:?!…-]|\.\.\.)\s*([a-z])
To use it with Regex.Replace():
strString = Regex.Replace(
strString,
@"(?:[/:?!…-]|\.\.\.)\s*([a-z])",
m => m.ToString().ToUpper()
);
Regex Explained:
(?: # non-capturing group
[/:?!…-] # match any of these characters
| \.\.\. # *or* match three `.` characters in a row
)
\s* # allow any whitespace between matched character and letter
([a-z]) # match, and capture, a single lowercase character
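Here is that pattern wrapped in a method (reusing the asker's method name ReplacePostCaps for illustration) and applied to the examples from the question. Note that, as written, the character class does not contain a single ., so a word after a lone period is left unchanged; add . to the class if you need that case.

```csharp
using System;
using System.Text.RegularExpressions;

// Mirrors the asker's method name; the pattern is the one from the answer above.
// A lone "." is NOT in the character class, only the three-dot ellipsis is.
static string ReplacePostCaps(string text) =>
    Regex.Replace(
        text,
        @"(?:[/:?!…-]|\.\.\.)\s*([a-z])",
        // The match is punctuation + optional whitespace + one lowercase letter;
        // only the letter changes when the whole match is uppercased.
        m => m.Value.ToUpperInvariant());

string[] inputs =
{
    "This is a dash - example",
    "This is another dash -example",
    "This is an ellipsis ... example",
    "This is another ellipsis …example",
    "This is a slash / example",
    "This is a question mark ? example",
};

foreach (string input in inputs)
    Console.WriteLine(ReplacePostCaps(input));
```

ToUpperInvariant is used so the result does not depend on the current culture (for example, the Turkish dotted/dotless i).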
Maybe this works for you:
var phrase = "This is another dash ... example";
var rx = new System.Text.RegularExpressions.Regex(@"(?<=[\-./:?!]) *\w");
var newString = rx.Replace(phrase, new System.Text.RegularExpressions.MatchEvaluator(m => m.Value.ToUpperInvariant()));
