Using regular expressions to parse occurrences of %SOMETEXT% out of a string - c#

I never use regular expressions because they seem so complicated, though I know they are dense and powerful. I thought I would give them a shot with your help.
How do I use regular expressions to extract all occurrences of %sometext% in a string variable and return a string array of the matching items?
For example, if the input string is:
set NewVariable=%Variable1%%Variable2%%Variable3%SomeText%Variable4%
The output array would be:
Array[0]=Variable1
Array[1]=Variable2
Array[2]=Variable3
Array[3]=Variable4

The regex should look like this:
%([^%]*)%
The delimiters are on both sides, and the capturing group is in between them.
Here is how:
var mc = Regex.Matches(
"quick%brown%%fox%jumps%over%the%lazy%%dog%"
, "%([^%]*)%"
);
foreach (Match m in mc) {
Console.WriteLine(m.Groups[1]);
}
The output of the above looks like this:
brown
fox
over
lazy
dog
Here is a demo on ideone.

var NewVariable = "%Variable1%%Variable2%%Variable3%SomeText%Variable4%";
var Array = Regex.Matches(NewVariable, @"%(.+?)%")
.Cast<Match>()
.Select(m => m.Groups[1].Value)
.ToArray();

Your regular expression is %[^%]+% . Look at the Regex.Matches method.
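To tie the answers above back to the question's own input, here is a minimal sketch written as C# top-level statements (this assumes C# 9 or later; the variable names input and names are illustrative, not from the answers). It uses %([^%]+)% so that an empty %% pair is not reported as a match.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Input taken from the question.
string input = "set NewVariable=%Variable1%%Variable2%%Variable3%SomeText%Variable4%";

// Group 1 captures the text between a pair of % delimiters.
string[] names = Regex.Matches(input, @"%([^%]+)%")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToArray();

Console.WriteLine(string.Join(", ", names));
// prints: Variable1, Variable2, Variable3, Variable4
```

SomeText is skipped because each match consumes both of its delimiters: the % after Variable3 has already been used as a closing delimiter, so SomeText is not enclosed by an unused pair.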

Related

RegExp- How to get words from entire document that match the expression [duplicate]

I'm trying to extract values from a string which are between << and >>. But they could happen multiple times.
Can anyone help with the regular expression to match these;
this is a test for <<bob>> who like <<books>>
test 2 <<frank>> likes nothing
test 3 <<what>> <<on>> <<earth>> <<this>> <<is>> <<too>> <<much>>.
I then want to foreach the GroupCollection to get all the values.
Any help greatly received.
Thanks.
Use a positive look ahead and look behind assertion to match the angle brackets, use .*? to match the shortest possible sequence of characters between those brackets. Find all values by iterating the MatchCollection returned by the Matches() method.
Regex regex = new Regex("(?<=<<).*?(?=>>)");
foreach (Match match in regex.Matches(
"this is a test for <<bob>> who like <<books>>"))
{
Console.WriteLine(match.Value);
}
LiveDemo in DotNetFiddle
While Peter's answer is a good example of using lookarounds for left- and right-hand context checking, I'd like to also add a LINQ (lambda) way to access matches/groups and show the use of simple numbered capturing groups, which come in handy when you want to extract only part of the pattern:
using System.Linq;
using System.Collections.Generic;
using System.Text.RegularExpressions;
// ...
var results = Regex.Matches(s, @"<<(.*?)>>", RegexOptions.Singleline)
.Cast<Match>()
.Select(x => x.Groups[1].Value);
Same approach with Peter's compiled regex where the whole match value is accessed via Match.Value:
var results = regex.Matches(s).Cast<Match>().Select(x => x.Value);
Note:
<<(.*?)>> is a regex matching <<, then capturing any 0 or more chars as few as possible (due to the non-greedy *? quantifier) into Group 1 and then matching >>
RegexOptions.Singleline makes . match newline (LF) chars, too (it does not match them by default)
Cast<Match>() casts the match collection to an IEnumerable<Match> that you may further access using a lambda
Select(x => x.Groups[1].Value) only returns the Group 1 value from the current x match object
Note you may further create a list or array of the obtained values by adding .ToList() or .ToArray() after Select.
In the demo C# code, string.Join(", ", results) generates a comma-separated string of the Group 1 values:
var strs = new List<string> { "this is a test for <<bob>> who like <<books>>",
"test 2 <<frank>> likes nothing",
"test 3 <<what>> <<on>> <<earth>> <<this>> <<is>> <<too>> <<much>>." };
foreach (var s in strs)
{
var results = Regex.Matches(s, @"<<(.*?)>>", RegexOptions.Singleline)
.Cast<Match>()
.Select(x => x.Groups[1].Value);
Console.WriteLine(string.Join(", ", results));
}
Output:
bob, books
frank
what, on, earth, this, is, too, much
You can try one of these:
(?<=<<)[^>]+(?=>>)
(?<=<<)\w+(?=>>)
However you will have to iterate the returned MatchCollection.
Something like this:
(<<(?<element>[^>]*)>>)*
This program might be useful:
http://sourceforge.net/projects/regulator/
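The named-group pattern above can be read back by name rather than by index. The sketch below is written as C# top-level statements (assuming C# 9 or later) and drops the outer (...)* wrapper, which is my simplification: with the wrapper the pattern can also match the empty string, so Regex.Matches would return many empty matches. Variable names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "test 3 <<what>> <<on>> <<earth>> <<this>> <<is>> <<too>> <<much>>.";

// (?<element>...) names the capture so it can be read with Groups["element"].
string[] elements = Regex.Matches(text, @"<<(?<element>[^>]*)>>")
    .Cast<Match>()
    .Select(m => m.Groups["element"].Value)
    .ToArray();

Console.WriteLine(string.Join(", ", elements));
// prints: what, on, earth, this, is, too, much
```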

How can I split a regex into exact words?

I need a little help regarding Regular Expressions in C#
I have the following string
"[[Sender.Name]]\r[[Sender.AdditionalInfo]]\r[[Sender.Street]]\r[[Sender.ZipCode]] [[Sender.Location]]\r[[Sender.Country]]\r"
The string could also contain spaces and theoretically any other characters, so I really need to match the [[words]].
What I need is a text array like this
"[[Sender.Name]]",
"[[Sender.AdditionalInfo]]",
"[[Sender.Street]]",
// ... And so on.
I'm pretty sure that this is perfectly doable with:
var stringArray = Regex.Split(line, @"\[\[+\]\]")
I'm just too stupid to find the correct Regex for the Regex.Split() call.
Anyone here that can tell me the correct Regular Expression to use in my case?
As you can tell I'm not that experienced with RegEx :)
Why don't you split on "\r"?
You don't need a regex for that; just use the standard string function:
// A regular string literal: "\r" is a carriage return, whereas @"\r" would be a backslash followed by r.
string[] delimiters = { "\r" };
string[] split = line.Split(delimiters, StringSplitOptions.None);
Do matching if you want to get the [[..]] block.
Regex rgx = new Regex(@"\[\[.*?\]\]");
foreach (Match m in rgx.Matches(input))
Console.WriteLine(m.Groups[0].Value);
IDEONE
The regex you are using (\[\[+\]\]) matches two or more literal [ characters followed by two literal ] characters, with nothing in between.
A regex solution is to match one or more non-] characters between the doubled [ and ] (I assume the string inside the brackets should not be empty) and to cast the MatchCollection to a list or array (here is an example with a list):
var str = "[[Sender.Name]]\r[[Sender.AdditionalInfo]]\r[[Sender.Street]]\r[[Sender.ZipCode]] [[Sender.Location]]\r[[Sender.Country]]\r";
var rgx22 = new Regex(@"\[\[[^]]+?\]\]");
var res345 = rgx22.Matches(str).Cast<Match>().ToList();
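Since the question asks for a text array, the match objects can be projected to their values. This is a minimal sketch as C# top-level statements (assuming C# 9 or later; blocks is an illustrative name), with the character class written as [^\]] to make the escaped bracket explicit:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string line = "[[Sender.Name]]\r[[Sender.AdditionalInfo]]\r[[Sender.Street]]\r"
            + "[[Sender.ZipCode]] [[Sender.Location]]\r[[Sender.Country]]\r";

// Match [[, then one or more characters that are not ], then ]].
string[] blocks = Regex.Matches(line, @"\[\[[^\]]+\]\]")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

foreach (string block in blocks)
{
    Console.WriteLine(block);
}
// prints six lines, from [[Sender.Name]] to [[Sender.Country]]
```

Unlike splitting on "\r", this also separates [[Sender.ZipCode]] and [[Sender.Location]], which share a line in the input.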

Grouping reluctant regular expressions with LINQ

I'm using the following LINQ command to extract a list of bracket-delimited parameters from a string using a reluctant regular expression:
var result = Regex.Matches("foo[a][b][cccc]bar", @"(\[.+?])")
.Cast<Match>()
.Select(match => match.ToString())
.ToArray();
This returns the following string array as expected:
- result {string[3]} string[]
[0] "[a]" string
[1] "[b]" string
[2] "[cccc]" string
Is there a way to modify the regular expression itself so that the brackets aren't included in the output? I tried placing the .+ part of the expression inside a named group but it broke the matching. Obviously I could run each result through another regular expression to remove the brackets but I'd like to find out if there's a cleaner/better way to do this.
Yes, you can use look-behind and look-ahead assertion:
(?<=\[)(.+?)(?=])
The code is then:
var result = Regex.Matches("foo[a][b][cccc]bar", @"(?<=\[).+?(?=])")
.Cast<Match>()
.Select(m => m.ToString())
.ToArray();
Please also note that you don't need grouping brackets () in your regex as you're not trying to capture any groups.
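For comparison, the same result can be had with a capturing group instead of lookarounds: the brackets are matched, but only Group 1 is returned. This is a sketch as C# top-level statements (assuming C# 9 or later), not the answerer's code:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The brackets are part of each match, but only the group between them is selected.
string[] result = Regex.Matches("foo[a][b][cccc]bar", @"\[(.+?)\]")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToArray();

Console.WriteLine(string.Join(", ", result));
// prints: a, b, cccc
```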

How can I split a string using regex to return a list of values?

How can I take the string foo[]=1&foo[]=5&foo[]=2 and return a collection with the values 1,5,2 in that order. I am looking for an answer using regex in C#. Thanks
In C# you can use capturing groups
private void RegexTest()
{
String input = "foo[]=1&foo[]=5&foo[]=2";
String pattern = @"foo\[\]=(\d+)";
Regex regex = new Regex(pattern);
foreach (Match match in regex.Matches(input))
{
Console.Out.WriteLine(match.Groups[1]);
}
}
I don't know C#, but...
In Java:
String[] nums = yourString.split("&?foo\\[\\]=");
The argument to the split() method is a regex telling the method where to split the String. Note that the first element of the result is an empty string, because the input starts with a match.
I'd use this particular pattern:
string re = @"foo\[\]=(?<value>\d+)";
So something like (not tested):
Regex reValues = new Regex(re, RegexOptions.Compiled);
List<int> values = new List<int>();
foreach (Match m in reValues.Matches(...putInputStringHere...))
{
// Read the named group and convert its text to an int.
values.Add(int.Parse(m.Groups["value"].Value));
}
Use the Regex.Split() method with an appropriate regex. This will split on parts of the string that match the regular expression and return the results as a string[].
Assuming you want all the values in your querystring without checking if they're numeric, (and without just matching on names like foo[]) you could use this: "&?[^&=]+="
string[] values = Regex.Split("foo[]=1&foo[]=5&foo[]=2", "&?[^&=]+=");
Incidentally, if you're playing with regular expressions the site http://gskinner.com/RegExr/ is fantastic (I'm just a fan).
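One thing to watch with the Regex.Split approach above: the input starts with a match, so the returned array begins with an empty string. Here is a minimal sketch as C# top-level statements (assuming C# 9 or later; names are illustrative) that filters it out and converts the rest to integers:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

string query = "foo[]=1&foo[]=5&foo[]=2";

// Splitting on "name=" (with an optional leading &) leaves the values,
// plus an empty first entry because the string begins with a separator.
string[] parts = Regex.Split(query, "&?[^&=]+=");

// Drop the empty entry and parse the remaining values in order.
List<int> values = parts
    .Where(p => p.Length > 0)
    .Select(p => int.Parse(p))
    .ToList();

Console.WriteLine(string.Join(",", values));
// prints: 1,5,2
```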
Assuming you're dealing with numbers this pattern should match:
/=(\d+)&?/
This should do:
using System.Text.RegularExpressions;
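string digits = Regex.Replace(s, @"[^0-9]", "");
Here s is the string you want to extract the numbers from. Every non-digit character is removed, so the digits come back concatenated in a single string (for example "152") rather than as separate values.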
Just make sure to escape the ampersand like so:
/=(\d+)\&/
Here's an alternative solution using the built-in string.Split function:
string x = "foo[]=1&foo[]=5&foo[]=2";
string[] separator = new string[2] { "foo[]=", "&" };
string[] vals = x.Split(separator, StringSplitOptions.RemoveEmptyEntries);
