C# regex extract string enclosed into single quotes - c#

I've the following string that I need to parse using RegEx.
abc = 'def' and size = '1 x(3\" x 5\")' and (name='Sam O\'neal')
This is an SQL filter, which I'd like to split into tokens using the following separators:
(, ), >,<,=, whitespace, <=, >=, !=
After the string is parsed, I'd like the output to be:
abc,
=,
def,
and,
size,
=,
'1 x(3\" x 5\")',
and,
(,
Sam O\'neal,
),
I've tried the following code:
string pattern = @"(<=|>=|!=|=|>|<|\)|\(|\s+)";
var tokens = new List<string>(Regex.Split(filter, pattern));
tokens.RemoveAll(x => String.IsNullOrWhiteSpace(x));
I'm not sure how to keep a string in single quotes as one token. I'm new to regex and would appreciate any help.

Your pattern needs an update with yet another alternative branch: '[^'\\]*(?:\\.[^'\\]*)*'.
It will match:
' - a single quote
[^'\\]* - 0+ chars other than ' and \
(?: - a non-capturing group matching sequences of:
\\. - any escape sequence
[^'\\]* - 0+ chars other than ' and \
)* - zero or more occurrences
' - a single quote
In C#:
string pattern = @"('[^'\\]*(?:\\.[^'\\]*)*'|<=|>=|!=|=|>|<|\)|\(|\s+)";
See the regex demo
C# demo:
var filter = @"abc = 'def' and size = '1 x(3"" x 5"")' and (name='Sam O\'neal')";
var pattern = @"('[^'\\]*(?:\\.[^'\\]*)*'|<=|>=|!=|=|>|<|\)|\(|\s+)";
var tokens = Regex.Split(filter, pattern).Where(x => !string.IsNullOrWhiteSpace(x));
foreach (var tok in tokens)
    Console.WriteLine(tok);
Output:
abc
=
'def'
and
size
=
'1 x(3" x 5")'
and
(
name
=
'Sam O\'neal'
)
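To see what the quoted-literal branch does on its own, it can be run in isolation with Regex.Matches. This is only a minimal sketch: the class and method names (QuotedLiteralDemo, FindQuoted) are made up for illustration and the input is a shortened version of the filter above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedLiteralDemo
{
    // Hypothetical helper: returns every single-quoted literal, quotes included.
    // An escaped quote (\') inside the literal does not end the match.
    public static string[] FindQuoted(string input) =>
        Regex.Matches(input, @"'[^'\\]*(?:\\.[^'\\]*)*'")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        var filter = @"abc = 'def' and (name='Sam O\'neal')";
        foreach (var literal in FindQuoted(filter))
            Console.WriteLine(literal);
        // Prints:
        // 'def'
        // 'Sam O\'neal'
    }
}
```

Because this branch comes first in the alternation, Regex.Split consumes a whole quoted literal before the operator and whitespace branches get a chance to split inside it.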

Related

Tokenize a string using multiple conditions

For the string below:
var str = "value0 'value 1/5' 'x ' value2";
Is there a way I can parse that string such that I get
arr[0] = "value0";
arr[1] = "value 1/5";
arr[2] = "x ";
arr[3] = "value2";
The order of values that might come with single quotes is arbitrary. Case does not matter.
I can get all values between single quotes using a regex like
"'(.*?)'"
but I need the order of those values relative to the other, non-single-quoted values.
Use
'(?<val>.*?)'|(?<val>\S+)
See regex proof
EXPLANATION
' - a literal single quote
(?<val> - start of the named capture group val
.*? - any character except \n, 0 or more times, as few as possible
) - end of group val
' - a literal single quote
| - OR
(?<val> - start of a second group that reuses the name val
\S+ - 1 or more non-whitespace characters, as many as possible
) - end of group val
C# code:
using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        string pattern = @"'(?<val>.*?)'|(?<val>\S+)";
        string input = @"value0 'value 1/5' 'x ' value2";
        foreach (Match m in Regex.Matches(input, pattern))
        {
            Console.WriteLine(m.Groups["val"].Value);
        }
    }
}
In C# you can reuse the same named capture group, so you could use an alternation | using the same group name for both parts.
'(?<val>[^']+)'|(?<val>\S+)
The pattern matches:
' Match a single quote
(?<val>[^']+) Capture in group val matching 1+ times any char except ' to not match an empty string
' Match a single quote
| Or
(?<val>\S+) Capture in group val matching 1+ times any non whitespace char
See a .NET regex demo or a C# demo
For example
string pattern = @"'(?<val>[^']+)'|(?<val>\S+)";
var str = "value0 'value 1/5' 'x ' value2";
foreach (Match m in Regex.Matches(str, pattern))
{
    Console.WriteLine(m.Groups["val"].Value);
}
Output
value0
value 1/5
x
value2
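The two patterns in the answers above differ on an empty pair of quotes. The sketch below compares them on a made-up input; the class name, the helper Values and the input string are assumptions for illustration only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EmptyQuotesDemo
{
    // Hypothetical helper: collects the "val" group of every match.
    public static string[] Values(string input, string pattern) =>
        Regex.Matches(input, pattern)
             .Cast<Match>()
             .Select(m => m.Groups["val"].Value)
             .ToArray();

    public static void Main()
    {
        var input = "a '' b"; // made-up input with an empty quoted value

        // .*? may match nothing, so '' yields an empty value.
        var lazy = Values(input, @"'(?<val>.*?)'|(?<val>\S+)");
        Console.WriteLine(string.Join("|", lazy));    // a||b

        // [^']+ needs at least one char, so '' falls through to \S+.
        var negated = Values(input, @"'(?<val>[^']+)'|(?<val>\S+)");
        Console.WriteLine(string.Join("|", negated)); // a|''|b
    }
}
```

Which behavior is preferable depends on whether an empty quoted value should count as a token. Note also that [^']+ can span line breaks, while .*? cannot unless RegexOptions.Singleline is set.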

Regex escape: put "\\" after the asterisk instead of before it

I need help to remove the "\" before the asterisk and put it after the asterisk, as shown in the examples below:
string str = "*10.18).xlsx"; //Other Inputs - *.18).xlsx, *.10.18).xlsx, *(23.10.18).xlsx
string reg = "id:" + Regex.Replace(str, @"[][+&|!(){}^""~*?: \\/-]", "\\$&");
Current Output :
reg = id:\\*10.18\\).xlsx
Required Output :
reg = id:*\\10.18\\).xlsx
More example :
Input - id:*(23.10.18).xlsx
Required Output - id:*\\(23.10.18\\).xlsx
You may use a match evaluator with a slightly modified regex:
var strs = new List<string> { "*10.18).xlsx", "*(23.10.18).xlsx" };
var block = @"[][+&|!(){}^""~?: \\/-]";
var rx = new Regex($@"(\*)({block}?)|{block}");
foreach (var str in strs) {
    string reg = "id:" + rx.Replace(str, m =>
        m.Groups[1].Success ? $"*\\{m.Groups[2].Value}" : $"\\{m.Value}");
    Console.WriteLine(reg);
}
See the C# demo. Output: id:*\10.18\).xlsx (for *10.18).xlsx) and id:*\(23.10.18\).xlsx (for *(23.10.18).xlsx).
The pattern will match
(\*)([][+&|!(){}^"~?: \\/-]?) - an asterisk captured into Group 1, followed by an optional char from the block (captured into Group 2)
| - or
[][+&|!(){}^"~?: \\/-] - a character class matching ], [, +, &, |, !, (, ), {, }, ^, ", ~, ?, :, a space, \, / and -
If Group 1 matched, the match is replaced with *\ followed by the Group 2 value; otherwise, a backslash is prepended to the matched char.

Regex to extract text between parentheses paired with asterisks

This is slightly different from similar posts in that the parentheses are paired with an asterisk.
example input:
yada yada (* need to grab this text *) yoda
I thought Jennifer's post could be altered for this, but my attempts fail.
//Regex regex = new Regex("\\((?<TextInsideBrackets>\\w+)\\)"); //original
Regex regex = new Regex("\\(\\*(?<TextInsideBrackets>\\w+)\\*\\)"); // my attempt
string incomingValue = "Autocycleprestartcase := 20; (* Yayitme ve Konveyoru bosaltabilir *)";
string insideBrackets = null;
Match match = regex.Match(incomingValue);
if (match.Success)
{
insideBrackets = match.Groups["TextInsideBrackets"].Value;
}
Suggestions?
Also, I'd like to remove the enclosed text, together with the enclosing parenthesis/asterisk pairs, from the input line.
So the output of above would give me
yada yada yoda
and the value
need to grab this text
Thanks
Change it to
Regex regex = new Regex("\\(\\*(?<TextInsideBrackets>[\\w ]+)\\*\\)");
The character class [\\w ] (instead of \\w) allows spaces.
Here is a solution to get both the values while re-using the pattern dynamically:
string incomingValue = "Autocycleprestartcase := 20; (* Yayitme ve Konveyoru bosaltabilir *)";
string pattern = @"\(\*\s*(.*?)\s*\*\)";
string insideBrackets = Regex.Match(incomingValue, pattern).Groups[1].Value ?? string.Empty;
Console.WriteLine(insideBrackets); // => Yayitme ve Konveyoru bosaltabilir
Console.WriteLine(Regex.Replace(incomingValue, $@"\s*{pattern}", string.Empty)); // => Autocycleprestartcase := 20;
See the C# demo
Pattern details:
\( - a literal ( (note the single backslash is used as the string is defined via a verbatim string literal, @"")
\* - a literal *
\s* - 0+ whitespaces (trimming the value from the left)
(.*?) - Group 1 capturing zero or more chars other than newline, as few as possible, up to the first occurrence of the subsequent subpatterns
\s* - zero or more whitespaces (trimming from the right)
\* - a literal *
\) - a literal )
To get the second value, you may use the same pattern with \s* (zero or more whitespaces) added at the beginning, which is what Regex.Replace(incomingValue, $@"\s*{pattern}", string.Empty) does.

Regex matching excluding a specific context

I'm trying to search a string for words within single quotes, but only if those single quotes are not within parentheses.
Example string:
something, 'foo', something ('bar')
So for the given example I'd like to match foo, but not bar.
After searching for regex examples I'm able to match within single quotes (see below code snippet), but am not sure how to exclude matches in the context previously described.
string line = "something, 'foo', something ('bar')";
Match name = Regex.Match(line, @"'([^']*)");
if (name.Success)
{
string matchedName = name.Groups[1].Value;
Console.WriteLine(matchedName);
}
I would recommend using lookarounds instead (see it live):
(?<!\()'([^']*)'(?!\))
Or with C#:
string line = "something, 'foo', something ('bar')";
Match name = Regex.Match(line, @"(?<!\()'([^']*)'(?!\))");
if (name.Success)
{
Console.WriteLine(name.Groups[1].Value);
}
The easiest way to get what you need is to use an alternation group and match and capture what you need and only match what you do not need:
\([^()]*\)|'([^']*)'
See the regex demo
Details:
\( - a (
[^()]* - 0+ chars other than ( and )
\) - a )
| - or
' - a '
([^']*) - Group 1 capturing 0+ chars other than '
' - a single quote.
In C#, use .Groups[1].Value to get the values you need. See the online demo:
var str = "something, 'foo', something ('bar')";
var result = Regex.Matches(str, @"\([^()]*\)|'([^']*)'")
    .Cast<Match>()
    .Where(m => m.Groups[1].Success) // skip the matches that only consumed (...)
    .Select(m => m.Groups[1].Value)
    .ToList();
Another alternative is the one mentioned by Thomas, but since it is .NET, you may use infinite-width lookbehind:
(?<!\([^()]*)'([^']*)'(?![^()]*\))
See this regex demo.
Details:
(?<!\([^()]*) - a negative lookbehind that fails the match if there is a ( followed by 0+ chars other than ( and ) immediately to the left of the current location
'([^']*)' - a quote, 0+ chars other than single quote captured into Group 1, and another single quote
(?![^()]*\)) - a negative lookahead that fails the match if there are 0+ chars other than ( and ) followed with ) right after the ' from the preceding subpattern.
Since you want the value without the enclosing ' chars, read Group 1 as in the code above.
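A minimal sketch of the lookbehind variant in C#, using the example string from the question. The class and method names are made up for illustration, and like the pattern itself this assumes parentheses are not nested.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class OutsideParensDemo
{
    // Hypothetical helper: returns the single-quoted values that are not
    // located between an opening and a closing parenthesis.
    public static string[] QuotedOutsideParens(string input) =>
        Regex.Matches(input, @"(?<!\([^()]*)'([^']*)'(?![^()]*\))")
             .Cast<Match>()
             .Select(m => m.Groups[1].Value) // Group 1 excludes the quotes
             .ToArray();

    public static void Main()
    {
        var line = "something, 'foo', something ('bar')";
        Console.WriteLine(string.Join(",", QuotedOutsideParens(line))); // foo
    }
}
```

Unlike the alternation approach, every match here is a wanted one, so no filtering on Group 1 is needed. The trade-off is that the variable-width lookbehind is specific to .NET and a few other engines.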

RegEx replace query to pick out wiki syntax

I've got a string of HTML that I need to grab the "[Title|http://www.test.com]" pattern out of e.g.
"dafasdfasdf, adfasd. [Test|http://www.test.com/] adf ddasfasdf [SDAF|http://www.madee.com/] assg ad"
I need to replace "[Title|http://www.test.com]" with "<a href='http://www.test.com/'>Title</a>".
What is the best way to approach this?
I was getting close with:
string test = "dafasdfasdf adfasd [Test|http://www.test.com/] adf ddasfasdf [SDAF|http://www.madee.com/] assg ad ";
string p18 = @"(\[.*?|.*?\])";
MatchCollection mc18 = Regex.Matches(test, p18, RegexOptions.Singleline | RegexOptions.IgnoreCase);
foreach (Match m in mc18)
{
string value = m.Groups[1].Value;
string fulltag = value.Substring(value.IndexOf("["), value.Length - value.IndexOf("["));
Console.WriteLine("text=" + fulltag);
}
There must be a cleaner way of getting the two values out e.g. the "Title" bit and the url itself.
Any suggestions?
Replace the pattern:
\[([^|]+)\|[^]]*]
with:
$1
A short explanation:
\[ # match the character '['
( # start capture group 1
[^|]+ # match any character except '|' and repeat it one or more times
) # end capture group 1
\| # match the character '|'
[^]]* # match any character except ']' and repeat it zero or more times
] # match the character ']'
A C# demo would look like:
string test = "dafasdfasdf adfasd [Test|http://www.test.com/] adf ddasfasdf [SDAF|http://www.madee.com/] assg ad ";
string adjusted = Regex.Replace(test, @"\[([^|]+)\|[^]]*]", "$1");
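The replacement above keeps only the title. If the URL has to be kept as well, a second capture group can hold it. The sketch below assumes the intended output is an HTML anchor, which is a guess at what the question asks for; the class and method names are made up, and the captured values are not HTML-encoded.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WikiLinkDemo
{
    // Hypothetical helper: turns [Title|url] into <a href='url'>Title</a>.
    // Group 1 is the title (no | or ]), Group 2 is the URL (no ]).
    public static string ToAnchors(string input) =>
        Regex.Replace(input, @"\[([^|\]]+)\|([^\]]*)\]", "<a href='$2'>$1</a>");

    public static void Main()
    {
        var test = "adfasd [Test|http://www.test.com/] adf [SDAF|http://www.madee.com/] assg";
        Console.WriteLine(ToAnchors(test));
        // adfasd <a href='http://www.test.com/'>Test</a> adf <a href='http://www.madee.com/'>SDAF</a> assg
    }
}
```

Excluding ] from the title class keeps a stray [ earlier in the text from swallowing everything up to the next |.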
