I've got a Regex: [-]?\d+[-+*^/\d]* and this line of code:
foreach(Match m in Regex.Matches(s, @"[-]?\d+[-+*^/\d]*")){
...
}
m.Value = (for example) 2+4*50
Is there any way to get a string array in the form of {"2", "+", "4", "*", "50"}?
First of all, you've got the wrong regular expression. You have a regex that matches the entire string, but you want to split the string on token boundaries, so you want to recognize tokens.
Second, don't attempt to solve the problem of whether - is a unary minus or an operator at lex time. That's a parse problem.
Third, you can use ordinary LINQ operators to turn the match collection into an array of strings.
Put it all together:
string s = "10*20+30-40/50";
var matches =
Regex.Matches(s, @"\d+|[-+*/]")
.Cast<Match>()
.Select(m => m.Value)
.ToArray();
This technique only works if your lexical grammar is regular, and many are not. (And even some languages that are technically regular are inconvenient to characterize as a regular expression.) As I noted in a comment: writing a lexer is not hard. Consider just writing a lexer rather than using regular expressions.
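As a rough illustration of the hand-written lexer suggested above, here is a minimal sketch. It assumes the only tokens are runs of ASCII digits and the single-character operators + - * / ^ plus parentheses, it skips whitespace, and it throws on anything else. The names SimpleLexer and Tokenize are made up for this example.

```csharp
using System;
using System.Collections.Generic;

public static class SimpleLexer
{
    // Hypothetical helper: splits an arithmetic expression into number and operator tokens.
    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            if (char.IsWhiteSpace(c))
            {
                i++; // whitespace separates tokens but is not a token itself
            }
            else if (c >= '0' && c <= '9')
            {
                // consume the whole run of digits as one number token
                int start = i;
                while (i < input.Length && input[i] >= '0' && input[i] <= '9') i++;
                tokens.Add(input.Substring(start, i - start));
            }
            else if ("+-*/^()".IndexOf(c) >= 0)
            {
                // '-' is always its own token; whether it is unary is left to the parser
                tokens.Add(c.ToString());
                i++;
            }
            else
            {
                throw new FormatException($"Unexpected character '{c}' at position {i}");
            }
        }
        return tokens;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", Tokenize("2+4*50")));
    }
}
```

Note that "-2+4" comes out as "-", "2", "+", "4": the lexer does not try to decide what the minus sign means.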
You can try splitting on word boundaries:
var st = "2+4*50";
var li = Regex.Split(st, @"\b");
foreach (var i in li)
Console.WriteLine(i);
2
+
4
*
50
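Two things to watch with the word-boundary split, sketched below under the assumption that you want an array without blanks: Regex.Split can return empty strings at the start and end of the array when the pattern matches there, and two adjacent operators have no word boundary between them, so they stay glued together. WordBoundarySplit and Tokens are illustrative names only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordBoundarySplit
{
    // Splits on \b and drops any empty strings Regex.Split returns.
    public static string[] Tokens(string input)
    {
        return Regex.Split(input, @"\b")
                    .Where(t => t.Length > 0)
                    .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" | ", Tokens("2+4*50")));
        // Adjacent operators are not separated: "*-" is returned as one token.
        Console.WriteLine(string.Join(" | ", Tokens("2*-3")));
    }
}
```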
I'm trying to extract values from a string which are between << and >>. There can be several such values in one string.
Can anyone help with the regular expression to match these:
this is a test for <<bob>> who like <<books>>
test 2 <<frank>> likes nothing
test 3 <<what>> <<on>> <<earth>> <<this>> <<is>> <<too>> <<much>>.
I then want to foreach the GroupCollection to get all the values.
Any help greatly received.
Thanks.
Use a positive look ahead and look behind assertion to match the angle brackets, use .*? to match the shortest possible sequence of characters between those brackets. Find all values by iterating the MatchCollection returned by the Matches() method.
Regex regex = new Regex("(?<=<<).*?(?=>>)");
foreach (Match match in regex.Matches(
"this is a test for <<bob>> who like <<books>>"))
{
Console.WriteLine(match.Value);
}
While Peter's answer is a good example of using lookarounds for left and right hand context checking, I'd like to also add a LINQ (lambda) way to access matches/groups and show the use of simple numeric capturing groups that come handy when you want to extract only a part of the pattern:
using System.Linq;
using System.Collections.Generic;
using System.Text.RegularExpressions;
// ...
var results = Regex.Matches(s, @"<<(.*?)>>", RegexOptions.Singleline)
.Cast<Match>()
.Select(x => x.Groups[1].Value);
Same approach with Peter's compiled regex where the whole match value is accessed via Match.Value:
var results = regex.Matches(s).Cast<Match>().Select(x => x.Value);
Note:
<<(.*?)>> is a regex matching <<, then capturing any 0 or more chars as few as possible (due to the non-greedy *? quantifier) into Group 1 and then matching >>
RegexOptions.Singleline makes . match newline (LF) chars, too (it does not match them by default)
Cast<Match>() casts the match collection to an IEnumerable<Match> that you may further access using a lambda
Select(x => x.Groups[1].Value) only returns the Group 1 value from the current x match object
Note you may further create a list or array of the obtained values by adding .ToList() or .ToArray() after Select.
In the demo C# code, string.Join(", ", results) generates a comma-separated string of the Group 1 values:
var strs = new List<string> { "this is a test for <<bob>> who like <<books>>",
"test 2 <<frank>> likes nothing",
"test 3 <<what>> <<on>> <<earth>> <<this>> <<is>> <<too>> <<much>>." };
foreach (var s in strs)
{
var results = Regex.Matches(s, @"<<(.*?)>>", RegexOptions.Singleline)
.Cast<Match>()
.Select(x => x.Groups[1].Value);
Console.WriteLine(string.Join(", ", results));
}
Output:
bob, books
frank
what, on, earth, this, is, too, much
You can try one of these:
(?<=<<)[^>]+(?=>>)
(?<=<<)\w+(?=>>)
However you will have to iterate the returned MatchCollection.
Something like this:
(<<(?<element>[^>]*)>>)*
This program might be useful:
http://sourceforge.net/projects/regulator/
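For the named-group pattern above, a minimal sketch of how the values could be read back. It drops the outer (...)* wrapper, since a pattern that may repeat zero times also matches the empty string wherever no << starts, and uses Regex.Matches to find each occurrence instead. NamedGroupDemo and Extract are names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Returns the text captured by the named group "element" for every <<...>> occurrence.
    public static List<string> Extract(string input)
    {
        return Regex.Matches(input, @"<<(?<element>[^>]*)>>")
                    .Cast<Match>()
                    .Select(m => m.Groups["element"].Value)
                    .ToList();
    }

    public static void Main()
    {
        var values = Extract("this is a test for <<bob>> who like <<books>>");
        Console.WriteLine(string.Join(", ", values));
    }
}
```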
This is very similar to the question here: How do I extract text that lies between parentheses (round brackets)? where I found this Regex code:
var matches = Regex.Matches("User name [[sales]] and [[anotherthing]]", @"\[\[([^)]*)\]\]");
But that doesn't seem to work with multi-character delimiters? This might not even be the correct way to go, but I am sure I am not the first to try this and I am drawing a blank here - anyone?
Your @"\[\[([^)]*)\]\]" pattern matches two consecutive [[, followed by zero or more characters other than a ) and then followed by two ]]. That means, if you have a ) inside [[...]], there won't be a match.
To deal with multicharacter-delimited substrings, you can use 2 things: either lazy dot matching, or unrolled patterns.
Note: to get multiple matches, use Regex.Matches as I wrote in my other answer.
1. Lazy dot solution:
var s = "User name [[sales]] and [[anotherthing]]";
var matches = Regex.Matches(s, @"\[{2}(.*?)]{2}", RegexOptions.Singleline)
.Cast<Match>()
.Select(p => p.Groups[1].Value)
.ToList();
The RegexOptions.Singleline modifier is necessary for the . to match newline symbols.
2. Unrolled regex solution:
var s = "User name [[sales]] and [[anotherthing]]";
var matches = Regex.Matches(s, @"\[{2}([^]]*(?:](?!])[^]]*)*)]{2}")
.Cast<Match>()
.Select(p => p.Groups[1].Value)
.ToList();
With this one, RegexOptions.Singleline is not necessary, and the pattern is usually more efficient because it avoids the character-by-character expansion of the lazy quantifier.
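To see that the two patterns agree when a single ] appears inside the delimiters, here is a small side-by-side sketch. The input string and the names DelimiterDemo, Lazy and Unrolled are made up for illustration; the patterns are the two shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DelimiterDemo
{
    // Lazy dot: expand one character at a time until "]]" follows.
    public static List<string> Lazy(string s) =>
        Regex.Matches(s, @"\[{2}(.*?)]{2}", RegexOptions.Singleline)
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToList();

    // Unrolled: runs of non-']' characters, plus any single ']' not followed by another ']'.
    public static List<string> Unrolled(string s) =>
        Regex.Matches(s, @"\[{2}([^]]*(?:](?!])[^]]*)*)]{2}")
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToList();

    public static void Main()
    {
        var s = "x [[a]b]] y [[c]]";
        Console.WriteLine(string.Join(", ", Lazy(s)));     // a]b, c
        Console.WriteLine(string.Join(", ", Unrolled(s))); // a]b, c
    }
}
```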
Use Regex.Matches:
Searches the specified input string for all occurrences of a specified regular expression.
Sample code:
var matches = Regex.Matches("User name (sales) and (anotherthing)", @"\(([^)]*)\)")
.Cast<Match>()
.Select(p => p.Groups[1].Value)
.ToList();
I need a little help regarding Regular Expressions in C#
I have the following string
"[[Sender.Name]]\r[[Sender.AdditionalInfo]]\r[[Sender.Street]]\r[[Sender.ZipCode]] [[Sender.Location]]\r[[Sender.Country]]\r"
The string could also contain spaces and theoretically any other characters. So I really need to match the [[words]].
What I need is a text array like this
"[[Sender.Name]]",
"[[Sender.AdditionalInfo]]",
"[[Sender.Street]]",
// ... And so on.
I'm pretty sure that this is perfectly doable with:
var stringArray = Regex.Split(line, @"\[\[+\]\]")
I'm just too stupid to find the correct Regex for the Regex.Split() call.
Anyone here that can tell me the correct Regular Expression to use in my case?
As you can tell I'm not that experienced with RegEx :)
Why don't you split on "\r"?
You don't need a regex for that; just use the standard string function:
// @"\r" is a literal backslash followed by r; use "\r" instead if the text contains real carriage returns
string[] delimiters = {@"\r"};
string[] split = line.Split(delimiters,StringSplitOptions.None);
Do matching if you want to get the [[..]] block.
Regex rgx = new Regex(@"\[\[.*?\]\]");
foreach (Match m in rgx.Matches(input))
Console.WriteLine(m.Groups[0].Value);
The regex you are using (\[\[+\]\]) matches two or more literal [s immediately followed by two literal ]s, so it never matches a [[...]] block that has text inside.
A regex solution is to match one or more characters other than ] between the doubled [ and ] (I assume the text inside the brackets should not be empty) and cast the MatchCollection to a list or array (here is an example with a list):
var str = "[[Sender.Name]]\r[[Sender.AdditionalInfo]]\r[[Sender.Street]]\r[[Sender.ZipCode]] [[Sender.Location]]\r[[Sender.Country]]\r";
var rgx22 = new Regex(@"\[\[[^]]+?\]\]");
var res345 = rgx22.Matches(str).Cast<Match>().ToList();
I'm using the following LINQ command to extract a list of bracket-delimited parameters from a string using a reluctant regular expression:
var result = Regex.Matches("foo[a][b][cccc]bar", @"(\[.+?])")
.Cast<Match>()
.Select(match => match.ToString())
.ToArray();
This returns the following string array as expected:
- result {string[3]} string[]
[0] "[a]" string
[1] "[b]" string
[2] "[cccc]" string
Is there a way to modify the regular expression itself so that the brackets aren't included in the output? I tried placing the .+ part of the expression inside a named group but it broke the matching. Obviously I could run each result through another regular expression to remove the brackets but I'd like to find out if there's a cleaner/better way to do this.
Yes, you can use look-behind and look-ahead assertion:
(?<=\[)(.+?)(?=])
The code is then:
var result = Regex.Matches("foo[a][b][cccc]bar", @"(?<=\[).+?(?=])")
.Cast<Match>()
.Select(m => m.ToString())
.ToArray();
Please also note that you don't need grouping brackets () in your regex as you're not trying to capture any groups.
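If you would rather keep the brackets in the pattern itself, a capturing group is another option: match the brackets as usual and read Groups[1] instead of the whole match. The sketch below puts both variants side by side; BracketContents, WithGroup and WithLookarounds are names chosen for this example only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BracketContents
{
    // Brackets are part of the match, but only Group 1 (the content) is returned.
    public static string[] WithGroup(string s) =>
        Regex.Matches(s, @"\[(.+?)]")
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToArray();

    // Brackets are only asserted, so the whole match is already the content.
    public static string[] WithLookarounds(string s) =>
        Regex.Matches(s, @"(?<=\[).+?(?=])")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", WithGroup("foo[a][b][cccc]bar")));
        Console.WriteLine(string.Join(", ", WithLookarounds("foo[a][b][cccc]bar")));
    }
}
```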
I have the following string format:
session=11;reserID=1000001
How can I get a string array of the numbers?
My code:
var value = "session=11;reserID=1000001";
var numbers = Regex.Split(value, @"^\d+");
You probably were on the right track but forgot the character class:
Regex.Split(value, @"[^\d]+");
You can also write it shorter by using \D+ which is equivalent.
However, you'd get an empty element at the start of the returned array, so be careful when consuming the result. Sadly, Regex.Split() doesn't have an option that removes empty elements (String.Split does, however). A not very pretty way of resolving that:
Regex.Replace(value, @"[^\d;]", "").Split(';');
based on the assumption that the semicolon is actually the relevant piece where you want to split.
Quick PowerShell test:
PS> 'session=11;reserID=1000001' -replace '[^\d;]+' -split ';'
11
1000001
Another option would be to just skip the element:
Regex.Split(...).Skip(1).ToArray();
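Skip(1) relies on the empty element always being first. A slightly more defensive sketch, assuming you simply want every non-empty run of digits, filters empty strings wherever they occur (for example when the input also ends with a non-digit). DigitRuns and Extract are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DigitRuns
{
    // Splits on runs of non-digits and drops empty entries from either end.
    public static string[] Extract(string value)
    {
        return Regex.Split(value, @"\D+")
                    .Where(part => part.Length > 0)
                    .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Extract("session=11;reserID=1000001")));
    }
}
```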
var numbers = Regex
.Matches("session=11;reserID=1000001", @"\d+") //match all digit groupings
.Cast<Match>() //promote IEnumerable to IEnumerable<Match> so we can use Linq
.Select(m => m.Value) //for each Match, select its (string) Value
.ToArray(); //convert to array, as per question
.NET has a built-in feature for this that doesn't need Regex. Try System.Web.HttpUtility.ParseQueryString, passing the string with the semicolons replaced by ampersands. You would need to reference the System.Web assembly, but it shouldn't require a web context.
var value = "session=11;reserID=1000001";
NameValueCollection numbers =
System.Web.HttpUtility.ParseQueryString(value.Replace(";","&"));
I will re-use my code from another question:
private void button1_Click(object sender, EventArgs e)
{
string sauce = htm.Text; //htm = textbox
Regex myRegex = new Regex(@"[0-9]+(?:\.[0-9]*)?", RegexOptions.Compiled);
foreach (Match iMatch in myRegex.Matches(sauce))
{
txt.AppendText(Environment.NewLine + iMatch.Value);//txt= textbox
}
}
If you want to play around with regex here is a good site: http://gskinner.com/RegExr/
They also have a desktop app: http://gskinner.com/RegExr/desktop/ - It uses Adobe AIR, so install that first.
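var numbers = Regex.Split(value, @".*?(\d+).*?");
or, to return each digit:
var numbers = Regex.Split(value, @".*?(\d).*?");
Because the pattern contains a capturing group, Regex.Split includes the captured text in the result. The array also contains empty strings around the captured values, so filter those out before use.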