REGEX Expression C#. Split string by whitespace outside the quotation marks

I'm trying to define a regular expression for Regex.Split that splits a string on whitespace while ignoring any whitespace inside single quotation marks.
Example:
key1:value1 key2:'value2 value3'
I need these separate values:
key1:value1
key2:'value2 value3'
I have tried several approaches:
Regex.Split(q, @"(\s)^('\s')").ToList();
Regex.Split(q, @"(\s)(^'.\s.')").ToList();
Regex.Split(q, @"(?=.*\s)").ToList();
What is wrong with this code?
Could you please help me with this?
Thanks in advance

A working example:
(\w+):(?:(\w+)|'([^']+)')
(\w+) # key: 1 or more word chars (captured)
: # literal
(?: # non-captured grouped alternatives
(\w+) # value: 1 or more word chars (captured)
| # or
'([^']+)' # 1 or more not "'" enclosed by "'" (captured)
) # end of group
Your try:
(\s)^('\s')
^ means the beginning of a line (or of the string), and \s is a whitespace character. Negation with ^ only works inside a character class: [^\s] matches one character that is not whitespace.
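If Regex.Split is a hard requirement, one possible sketch is to split on whitespace only when an even number of single quotes remains between that position and the end of the string. This assumes the quotes are always balanced and never escaped. The names QuoteAwareSplit and SplitOutsideQuotes are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class QuoteAwareSplit
{
    // Hypothetical helper: splits on runs of whitespace that are followed by
    // an even number of single quotes, i.e. whitespace outside a quoted value.
    // Assumes balanced, unescaped single quotes.
    public static string[] SplitOutsideQuotes(string q)
    {
        return Regex.Split(q, @"\s+(?=(?:[^']*'[^']*')*[^']*$)");
    }

    static void Main()
    {
        foreach (var part in SplitOutsideQuotes("key1:value1 key2:'value2 value3'"))
            Console.WriteLine(part);
        // key1:value1
        // key2:'value2 value3'
    }
}
```

The lookahead rescans the rest of the string at every candidate split point, so for long inputs the Matches-based approaches are likely the cheaper choice.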

var st = "key1:value1 key2:'value2 value3'";
var result = Regex.Matches(st, @"\w+:\w+|\w+:\'[^']+\'");
foreach (var item in result)
    Console.WriteLine(item);
The result should be:
key1:value1
key2:'value2 value3'

Try the following:
static void Main(string[] args)
{
    string input = "key1:value1 key2:'value2 value3'";
    string pattern = @"\s*(?'key'[^:]+):((?'value'[^'][^\s]+)|'(?'value'[^']+))";
    MatchCollection matches = Regex.Matches(input, pattern);
    foreach (Match match in matches)
    {
        Console.WriteLine("Key : '{0}', Value : '{1}'", match.Groups["key"].Value, match.Groups["value"].Value);
    }
    Console.ReadLine();
}

Related

Building a regular expression in C#

How can I check the following text in C# with Regex?
key_in-get { 43243225543543543 };
or
key_in-set { password123 : 34980430943834 };
I tried to build a regular expression, but gave up after a few hours.
Here is my code:
string text1 = "key_in-get { 322389238237 };";
string text2 = "key_in-set { password123 : 322389238237 };";
string pattern = "key_in-(get|set) { .* };";
var result1 = Regex.IsMatch(text1, pattern);
Console.Write("Is valid: {0} ", result1);
var result2 = Regex.IsMatch(text2, pattern);
Console.Write("Is valid: {0} ", result2);
I have to check whether there is "set" or "get".
If the pattern finds "set", it should accept only the form "text123 : 123456789"; if it finds "get", it should accept only "123456789".

You can use
key_in-(?:get|(set)) {(?(1) \w+ :) \w+ };
key_in-(?:get|(set))\s*{(?(1)\s*\w+\s*:)\s*\w+\s*};
key_in-(?:get|(set))\s*{(?(1)\s*\w+\s*:)\s*\d+\s*};
The second pattern allows any amount of whitespace between the elements, and the third one allows only digits after the : or as the get value.
If the whole string must match, add ^ at the start and $ at the end of the pattern.
Details:
key_in- - a substring
(?:get|(set)) - get or set (the latter is captured into Group 1)
\s* - zero or more whitespaces
{ - a { char
(?(1)\s*\w+\s*:) - a conditional construct: if Group 1 matched, match one or more word chars enclosed with zero or more whitespaces and then a colon
\s*\w+\s* - one or more word chars enclosed with zero or more whitespaces
}; - a literal substring.
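As a rough check of the conditional construct described above, the following sketch runs the third pattern, anchored with ^ and $, against a few inputs. The class name KeyInValidator and the method IsValid are hypothetical, and the braces are escaped here only for readability.

```csharp
using System;
using System.Text.RegularExpressions;

static class KeyInValidator
{
    // Third pattern from the answer, anchored so the whole string must match.
    // (?(1)...) is only required when Group 1 ("set") took part in the match.
    static readonly Regex Pattern =
        new Regex(@"^key_in-(?:get|(set))\s*\{(?(1)\s*\w+\s*:)\s*\d+\s*\};$");

    public static bool IsValid(string text) => Pattern.IsMatch(text);

    static void Main()
    {
        Console.WriteLine(IsValid("key_in-get { 322389238237 };"));               // True
        Console.WriteLine(IsValid("key_in-set { password123 : 322389238237 };")); // True
        Console.WriteLine(IsValid("key_in-get { password123 : 322389238237 };")); // False
        Console.WriteLine(IsValid("key_in-set { 322389238237 };"));               // False
    }
}
```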

In the pattern that you tried, key_in-(get|set) { .* };, the greedy .* runs up to the last occurrence of }; in the input, so the pattern could also match key_in-get { }; };
As an alternative solution, you could use an alternation | that spells out the accepted form for get and for set.
key_in-(?:get\s*{\s*\w+|set\s*{\s*\w+\s*:\s*\w+)\s*};
The pattern matches
key_in- Match literally
(?: Non capture group
get\s*{\s*\w+ Match get, { between optional whitespace chars and 1+ word chars
| Or
set\s*{\s*\w+\s*:\s*\w+ Match set, { between optional whitespace chars and word chars on either side with : in between.
) Close non capture group
\s*}; Match optional whitespace chars and };
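A minimal sketch of how the alternation pattern might be used from C#, assuming the whole string should match (hence the added ^ and $). KeyInAlternation and IsValid are illustrative names, and the braces are escaped only for readability.

```csharp
using System;
using System.Text.RegularExpressions;

static class KeyInAlternation
{
    // One branch per accepted form: "get { value }" or "set { name : value }".
    static readonly Regex Pattern = new Regex(
        @"^key_in-(?:get\s*\{\s*\w+|set\s*\{\s*\w+\s*:\s*\w+)\s*\};$");

    public static bool IsValid(string text) => Pattern.IsMatch(text);

    static void Main()
    {
        Console.WriteLine(IsValid("key_in-get { 322389238237 };"));               // True
        Console.WriteLine(IsValid("key_in-set { password123 : 322389238237 };")); // True
        Console.WriteLine(IsValid("key_in-set { 322389238237 };"));               // False
        Console.WriteLine(IsValid("key_in-get { }; };"));                         // False
    }
}
```

Because this variant uses \w+ for the get value, it also accepts non-numeric values such as key_in-get { password123 };. Swap in \d+ if only digits are allowed.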

Regex - split by "_" and exclude file extension

I need to split the following string AAA_BBB_CCC.extension by "_" and exclude from the results any file extension.
Where A, B and C can be any character or space. I wish to get AAA, BBB and CCC.
I know that \.(?:.(?!\.))+$ will match .extension but I could not combine it with matching "_" for splitting.

Use the Path.GetFileNameWithoutExtension function to strip the extension from the file name.
Then use String.Split to get an array with three items:
var fileName = Path.GetFileNameWithoutExtension(fullName);
var parts = fileName.Split('_');
var partAAA = parts[0];
var partBBB = parts[1];
var partCCC = parts[2];
If the parts are always the same fixed number of characters long, you can as well extract them using the Substring function. No need to resort to regex here.

Another option is to make use of the .NET Group.Captures property and capture any char except an _ in a named capture group, which you can extract from the match using a named group.
^(?'val'[^_]+)(?:_(?'val'[^_]+))+\.\w+$
Explanation
^ Start of string
(?'val'[^_]+) Named group val, match 1+ chars other than _ using a negated character class
(?: Non capture group
_(?'val'[^_]+) Match an _ and capture again 1+ chars other than _ in same named group val
)+ Close the non capture group and repeat 1+ times for at least 1 occurrence with _
\.\w+ Match a . and 1+ word chars
$ End of string
string pattern = @"^(?'val'[^_]+)(?:_(?'val'[^_]+))+\.\w+$";
string input = @"AAA_BBB_CCC.extension";
Match m = Regex.Match(input, pattern);
foreach (Capture capture in m.Groups["val"].Captures) {
    Console.WriteLine(capture.Value);
}
Output
AAA
BBB
CCC

If you wanted to use a regex based approach here, you could try doing a find all on the following regex pattern:
[^_]+(?=.*\.\w+$)
This pattern will match every term in between underscore, except for the portion after the extension, which will be excluded by the lookahead.
Regex rx = new Regex(@"[^_]+(?=.*\.\w+$)");
string text = "AAA_BBB_CCC.extension";
MatchCollection matches = rx.Matches(text);
foreach (Match match in matches)
{
Console.WriteLine(match.Groups[0].Value);
}
This prints:
AAA
BBB
CCC

match.regex syntax with digit character and a #

I have a string in this format:
111111#1
It has 5 or 6 digits, followed by a '#' and then a single digit.
I use Regex.IsMatch like this:
if (Regex.IsMatch(input, @"^d{6}#\d{1}"))
{...}
but it does not match my string.
What is my mistake?

You're missing the backslash on the first d, so it matches the literal letter d instead of digits:
Regex.IsMatch("111111#1", @"^\d{6}#\d{1}")
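Since the question mentions 5 or 6 digits, a variant with a {5,6} quantifier may be closer to the intent. The sketch below also anchors the end with $, which assumes nothing may follow the final digit; DigitHashCheck and IsValid are illustrative names.

```csharp
using System;
using System.Text.RegularExpressions;

static class DigitHashCheck
{
    // 5 or 6 digits, a '#', then exactly one digit, with nothing before or after.
    public static bool IsValid(string input) => Regex.IsMatch(input, @"^\d{5,6}#\d$");

    static void Main()
    {
        Console.WriteLine(IsValid("111111#1"));  // True
        Console.WriteLine(IsValid("11111#1"));   // True
        Console.WriteLine(IsValid("1111#1"));    // False
        Console.WriteLine(IsValid("1111111#1")); // False
    }
}
```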

This single line Regex will capture two groups: the leading five to six digits and the '#' followed by a single digit:
(\d{5,6})(#\d{1})
Example:
string pattern = @"(\d{5,6})(#\d{1})";
string input = "111111#1";
MatchCollection matches = Regex.Matches(input, pattern);
foreach (Match match in matches)
{
var firstGroupValue = match.Groups[1]; // "111111"
var secondGroupValue = match.Groups[2]; // "#1"
}

How to grab specific elements out of a string

I need to be able to grab specific elements out of a string that start and end with curly brackets. If I had a string:
"asjfaieprnv{1}oiuwehern{0}oaiwefn"
How could I grab just the 1 followed by the 0?

Regex is very useful for this.
What you want to match is:
\{ # a curly bracket
# - we need to escape this with \ as it is a special character in regex
[^}] # then anything that is not a curly bracket
# - this is a 'negated character class'
+ # (at least one time)
\} # then a closing curly bracket
# - this also needs to be escaped as it is special
We can collapse this to one line:
\{[^}]+\}
Next, you can capture and extract the inner contents by surrounding the part you want to extract with parentheses to form a group:
\{([^}]+)\}
In C# you'd do:
var matches = Regex.Matches(input, @"\{([^}]+)\}");
foreach (Match match in matches)
{
var groupContents = match.Groups[1].Value;
}
Group 0 is the whole match (in this case including the { and }), group 1 the first parenthesized part, and so on.
A full example:
var input = "asjfaieprnv{1}oiuwehern{0}oaiwef";
var matches = Regex.Matches(input, @"\{([^}]+)\}");
foreach (Match match in matches)
{
var groupContents = match.Groups[1].Value;
Console.WriteLine(groupContents);
}
Outputs:
1
0

Use the IndexOf method:
int openBracePos = yourstring.IndexOf("{");
int closeBracePos = yourstring.IndexOf("}", openBracePos + 1);
string stringIWant = yourstring.Substring(openBracePos + 1, closeBracePos - openBracePos - 1);
That will get your first occurrence. You need to slice your string so that the first occurrence is no longer there, then repeat the above procedure to find your 2nd occurrence:
yourstring = yourstring.Substring(closeBracePos + 1);
Note: curly braces do not need escaping in an ordinary C# string literal such as "{".
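The find-and-repeat procedure above can also be written as a loop that advances a start index instead of slicing the string. This is only a sketch, assuming braces are not nested and that unmatched braces are simply ignored; BraceExtractor and Extract are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

static class BraceExtractor
{
    // Collects the text between each '{' and the next '}' after it.
    // Assumes braces are not nested; stops at the first unmatched brace.
    public static List<string> Extract(string text)
    {
        var results = new List<string>();
        int searchFrom = 0;
        while (true)
        {
            int open = text.IndexOf('{', searchFrom);
            if (open < 0) break;
            int close = text.IndexOf('}', open + 1);
            if (close < 0) break;
            results.Add(text.Substring(open + 1, close - open - 1));
            searchFrom = close + 1; // continue after the closing brace
        }
        return results;
    }

    static void Main()
    {
        foreach (var item in Extract("asjfaieprnv{1}oiuwehern{0}oaiwefn"))
            Console.WriteLine(item);
        // 1
        // 0
    }
}
```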

This looks like a job for regular expressions:
using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string str = "asjfaieprnv{1}oiuwe{}hern{0}oaiwefn";
            // Lazy .*? captures the shortest text between { and }, including empty braces.
            Regex regex = new Regex(@"\{(.*?)\}");
            foreach (Match match in regex.Matches(str))
            {
                Console.WriteLine(match.Groups[1].Value);
            }
        }
    }
}

RegEx replace query to pick out wiki syntax

I've got a string of HTML that I need to grab the "[Title|http://www.test.com]" pattern out of e.g.
"dafasdfasdf, adfasd. [Test|http://www.test.com/] adf ddasfasdf [SDAF|http://www.madee.com/] assg ad"
I need to replace "[Title|http://www.test.com]" with "http://www.test.com/'>Title".
What is the best way to approach this?
I was getting close with:
string test = "dafasdfasdf adfasd [Test|http://www.test.com/] adf ddasfasdf [SDAF|http://www.madee.com/] assg ad ";
string p18 = @"(\[.*?|.*?\])";
MatchCollection mc18 = Regex.Matches(test, p18, RegexOptions.Singleline | RegexOptions.IgnoreCase);
foreach (Match m in mc18)
{
string value = m.Groups[1].Value;
string fulltag = value.Substring(value.IndexOf("["), value.Length - value.IndexOf("["));
Console.WriteLine("text=" + fulltag);
}
There must be a cleaner way of getting the two values out e.g. the "Title" bit and the url itself.
Any suggestions?

Replace the pattern:
\[([^|]+)\|[^]]*]
with:
$1
A short explanation:
\[ # match the character '['
( # start capture group 1
[^|]+ # match any character except '|' and repeat it one or more times
) # end capture group 1
\| # match the character '|'
[^]]* # match any character except ']' and repeat it zero or more times
] # match the character ']'
A C# demo would look like:
string test = "dafasdfasdf adfasd [Test|http://www.test.com/] adf ddasfasdf [SDAF|http://www.madee.com/] assg ad ";
string adjusted = Regex.Replace(test, @"\[([^|]+)\|[^]]*]", "$1");
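The replacement above keeps only the title. If the goal is to keep both values, a second capture group can hold the URL. The sketch below assumes the intended output is an HTML anchor of the form <a href='url'>Title</a>, which is a guess based on the fragment shown in the question; WikiLinkConverter and ToAnchors are illustrative names.

```csharp
using System;
using System.Text.RegularExpressions;

static class WikiLinkConverter
{
    // Group 1 = title (no '|' or ']'), group 2 = URL (no ']').
    // The anchor format in the replacement string is an assumption.
    public static string ToAnchors(string text)
    {
        return Regex.Replace(text, @"\[([^|\]]+)\|([^\]]+)\]", "<a href='$2'>$1</a>");
    }

    static void Main()
    {
        Console.WriteLine(ToAnchors("adfasd [Test|http://www.test.com/] adf"));
        // adfasd <a href='http://www.test.com/'>Test</a> adf
    }
}
```

Note that this does no HTML encoding of the title or URL, so untrusted input would need escaping before the result is rendered.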
