I'm having a bit of trouble with this regex. I have a line that could look like this:
PREF-FA/WV/WB/LO...could continue
or
PREF-FA
and I need to grab all the ratings (FA/WV/WB etc.) for each line, and put them in their own class. Is this something regex could handle, or should I just split the string up?
I have a class called rating, and a List whose length determines how many ratings are in the line above.
Thanks
How about
Regex
    .Matches("PREF-FA/WV/WB/LO", @".+?-(?<rating>.{2})(?:/(?<rating>.{2}))*")
    .Cast<Match>()
    .SelectMany(m => m.Groups["rating"].Captures.Cast<Capture>().Select(c => c.Value))
gives an IEnumerable<string> with values "FA", "WV", "WB", "LO"
To go back to the .NET 2.0 world:
MatchCollection matches = Regex
    .Matches("PREF-FA/WV/WB/LO", @".+?-(?<rating>.{2})(?:/(?<rating>.{2}))*");
List<string> ratings = new List<string>();
foreach (Match m in matches)
{
    CaptureCollection captures = m.Groups["rating"].Captures;
    foreach (Capture c in captures)
    {
        ratings.Add(c.Value);
    }
}
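On the "should I just split the string up?" part of the question: for a format this regular, plain string operations may be enough. Here is a minimal sketch, assuming the ratings always follow the first '-' and are separated by '/'. The variable names are made up for illustration, and each code could then be handed to your own rating class.

```csharp
using System;
using System.Collections.Generic;

// Sample line from the question; the "PREF-" layout is an assumption.
string line = "PREF-FA/WV/WB/LO";

// Everything after the first '-' is treated as the slash-separated rating list.
int dash = line.IndexOf('-');
string[] codes = line.Substring(dash + 1).Split('/');

List<string> ratings = new List<string>(codes);
foreach (string code in ratings)
{
    Console.WriteLine(code); // prints FA, WV, WB, LO on separate lines
}
```

A line with a single rating such as "PREF-FA" goes through the same code and yields a one-element list.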
You could try:
((?:\w{2}/)*\w{2})$
The ?: makes the inner group non-capturing, so the individual 2-letter words and slashes are not captured.
Test it on Rubular if you want. The regex works with many regex engines.
If the line always begins with PREF-, you could use:
^PREF-((?:\w{2}/)*\w{2})$
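A rough sketch of how that anchored pattern could be used from C#, assuming you want the individual ratings rather than the whole "FA/WV/WB/LO" chunk. Group 1 holds the chunk, which is then split on '/'; the variable names are only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string line = "PREF-FA/WV/WB/LO"; // sample input from the question

// Group 1 captures the whole slash-separated rating list after "PREF-".
Match m = Regex.Match(line, @"^PREF-((?:\w{2}/)*\w{2})$");
string[] ratings = m.Success
    ? m.Groups[1].Value.Split('/')
    : new string[0];

Console.WriteLine(string.Join(", ", ratings)); // FA, WV, WB, LO
```

If the line does not have the expected shape, the match fails and the array stays empty, which gives you a cheap validity check.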
You can use this regex: (?<=PREF-).*$
resultString = Regex.Match(subjectString, "(?<=PREF-).*$",
    RegexOptions.Singleline | RegexOptions.Multiline).Value;
It uses a positive lookbehind to require PREF- and then matches the string that follows it.
If you want to loop through all the matches:
Regex ItemRegex = new Regex(@"(?<=PREF-).*$", RegexOptions.Compiled);
foreach (Match ItemMatch in ItemRegex.Matches(subjectString))
{
    Console.WriteLine(ItemMatch);
}
I'm trying to find all instances of the substring EnemyType('XXXX'), where XXXX is an arbitrary string and EnemyType('XXXX') can appear multiple times.
Right now I'm using a combination of IndexOf/Substring calls in C#, but I'd like to know if there's a cleaner way of doing it.
Use regex. Example:
using System.Text.RegularExpressions;
var inputString = " EnemyType('1234')abcdeEnemyType('5678')xyz";
var regex = new Regex(@"EnemyType\('\d{4}'\)");
var matches = regex.Matches(inputString);
foreach (Match i in matches)
{
    Console.WriteLine(i.Value);
}
It will print:
EnemyType('1234')
EnemyType('5678')
The pattern to match is @"EnemyType\('\d{4}'\)", where \d{4} means exactly four digits. The parentheses are escaped with backslashes.
Edit: Since you only want the string inside quotes, not the whole string, you can use named groups instead.
var inputString = " EnemyType('1234')abcdeEnemyType('5678')xyz";
var regex = new Regex(@"EnemyType\('(?<id>[^']+)'\)");
var matches = regex.Matches(inputString);
foreach (Match i in matches)
{
    Console.WriteLine(i.Groups["id"].Value);
}
Now it prints:
1234
5678
Regex is a really nice tool for parsing strings. If you often parse strings, regex can make life so much easier.
I'm trying to get the values between {} and between %% with a single regex.
This is what I have so far. I can get the values for each delimiter individually, but I'd like to learn how to combine both patterns.
var regex = new Regex(@"%(.*?)%|\{([^}]*)\}");
String s = "This is a {test} %String%. %Stack% {Overflow}";
Expected answer for the above string
test
String
Stack
Overflow
Individual regex
@"%(.*?)%" gives me String and Stack
@"\{([^}]*)\}" gives me test and Overflow
Following is my code.
var regex = new Regex(@"%(.*?)%|\{([^}]*)\}");
var matches = regex.Matches(s);
foreach (Match match in matches)
{
    Console.WriteLine(match.Groups[1].Value);
}
This is similar to your regex, but uses named capturing groups:
String s = "This is a {test} %String%. %Stack% {Overflow}";
var list = Regex.Matches(s, @"\{(?<name>.+?)\}|%(?<name>.+?)%")
    .Cast<Match>()
    .Select(m => m.Groups["name"].Value)
    .ToList();
If you want to learn how conditional expressions work, here is a solution using that kind of .NET regex capability:
(?:(?<p>%)|(?<b>{))(?<v>.*?)(?(p)%|})
Here is how it works:
(?:(?<p>%)|(?<b>{)) - match and capture either Group "p" with % (percentage), or Group "b" (brace) with {
(?<v>.*?) - match and capture into Group "v" (value) any character (even a newline since I will be using RegexOptions.Singleline) zero or more times, but as few as possible (lazy matching with *? quantifier)
(?(p)%|}) - a conditional expression meaning: if "p" group was matched, match %, else, match }.
C# demo:
var s = "This is a {test} %String%. %Stack% {Overflow}";
var regex = "(?:(?<p>%)|(?<b>{))(?<v>.*?)(?(p)%|})";
var matches = Regex.Matches(s, regex, RegexOptions.Singleline);
// var matches_list = Regex.Matches(s, regex, RegexOptions.Singleline)
//     .Cast<Match>()
//     .Select(p => p.Groups["v"].Value)
//     .ToList();
// Or just a demo writeline
foreach (Match match in matches)
    Console.WriteLine(match.Groups["v"].Value);
Sometimes the capture is in group 1 and sometimes it's in group 2 because you have two pairs of parentheses.
Your original code will work if you do this instead:
Console.WriteLine(match.Groups[1].Value + match.Groups[2].Value);
because one group will be the empty string and the other will be the value you're interested in.
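If concatenating the two groups feels too implicit, another option is to ask each group whether it took part in the match. A small sketch using the question's pattern and sample string, relying on Group.Success (the list name is just for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string s = "This is a {test} %String%. %Stack% {Overflow}";
var regex = new Regex(@"%(.*?)%|\{([^}]*)\}");

var values = new List<string>();
foreach (Match match in regex.Matches(s))
{
    // Only one of the two groups participates in any given match:
    // group 1 for the %...% branch, group 2 for the {...} branch.
    Group g = match.Groups[1].Success ? match.Groups[1] : match.Groups[2];
    values.Add(g.Value);
}

Console.WriteLine(string.Join(", ", values)); // test, String, Stack, Overflow
```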
@"[\{|%](.*?)[\}|%]"
The idea being:
{ or %
anything
} or %
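One caveat with this character-class approach, sketched below on a made-up input: the opening and closing delimiters are not tied together, so a % can be closed by a }. Also, inside a character class the | is a literal pipe, not "or", so it becomes a third accepted delimiter.

```csharp
using System;
using System.Text.RegularExpressions;

string loose = @"[\{|%](.*?)[\}|%]";

// Hypothetical input where a lone % appears before a {...} pair.
string input = "50% off {today}";
Match m = Regex.Match(input, loose);

// The match opens at '%' and closes at the first '}', '|' or '%' it finds.
Console.WriteLine("[" + m.Groups[1].Value + "]"); // [ off {today]
```

On the question's sample string the delimiters happen to pair up correctly, so this only matters if stray % or { characters can appear in your input.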
I think you should use a combination of alternation and nested groups:
((\{(.*)\})|(%(.*)%))
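Note that .* is greedy, so on the question's sample string this pattern runs from the first { to the last }. A quick sketch of the difference, assuming a lazy .*? is what was intended; with the nested groups the value ends up in group 3 (braces) or group 5 (percent signs):

```csharp
using System;
using System.Text.RegularExpressions;

string s = "This is a {test} %String%. %Stack% {Overflow}";

// Greedy version: one big match from the first '{' to the last '}'.
MatchCollection greedy = Regex.Matches(s, @"((\{(.*)\})|(%(.*)%))");
Console.WriteLine(greedy.Count);    // 1
Console.WriteLine(greedy[0].Value); // {test} %String%. %Stack% {Overflow}

// Lazy version: each delimiter pair is matched on its own.
MatchCollection lazy = Regex.Matches(s, @"((\{(.*?)\})|(%(.*?)%))");
foreach (Match m in lazy)
{
    // Group 3 is the text inside {...}, group 5 the text inside %...%.
    string value = m.Groups[3].Success ? m.Groups[3].Value : m.Groups[5].Value;
    Console.WriteLine(value); // test, String, Stack, Overflow (one per line)
}
```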
I have a string like this.
*>-0.0532*>-0.0534*>-0.0534*>-0.0532*>-0.0534*>-0.0534*>-0.0532*>-0.0532*>-0.0534*>-0.0534*>-0.0534*>-0.0532*>-0.0534*
I want to extract the values between the *> and * characters.
I tried the pattern below, which is wrong:
string pattern = "\\*\\>..\\*";
Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = rgx.Matches(seriGelen);
if (matches.Count > 0)
{
foreach (Match match in matches)
MessageBox.Show("{0}", match.Value);
}
You can use simple regex:
(?<=\*>).*?(?=\*)
Sample code:
string text = "*>-0.0532*>-0.0534*>-0.0534*>-0.0532*>-0.0534*>-0.0534*>-0.0532*>-0.0532*>-0.0534*>-0.0534*>-0.0534*>-0.0532*>-0.0534*";
string[] values = Regex.Matches(text, @"(?<=\*>).*?(?=\*)")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
It looks like the values can vary a lot (UPD: there was a positive integer value), so I won't check the number format. I'll also treat *>, > and * as just different variants of the delimiter.
I'd like to suggest the following solution.
(?<=[>\*])([^>\*]+?)(?=[>\*]+)
(http://regex101.com/r/mM7nK1)
I'm not sure it's ideal. It only works if your input starts and ends with delimiters, but it lets you use matches instead of groups, as your code does.
========
But you know, why wouldn't you use String.Split function?
var toprint = seriGelen.Split(new [] {'>', '*'}, StringSplitOptions.RemoveEmptyEntries);
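If the pieces are meant to be used as numbers (an assumption, the question only shows them as text), the Split result can be parsed in the same step. A short sketch on a shortened copy of the input; CultureInfo.InvariantCulture is used so the '.' is read as the decimal separator regardless of the machine's regional settings:

```csharp
using System;
using System.Globalization;
using System.Linq;

// Shortened sample of the string from the question.
string seriGelen = "*>-0.0532*>-0.0534*>-0.0534*";

double[] values = seriGelen
    .Split(new[] { '>', '*' }, StringSplitOptions.RemoveEmptyEntries)
    .Select(v => double.Parse(v, CultureInfo.InvariantCulture))
    .ToArray();

Console.WriteLine(values.Length); // 3
```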
Is there an error at the beginning of the string? Missing an asterisk after first number? >-0.0532>-0.0534*>
If not try this.
>([-+]?[0-9]*\.?[0-9]+)\*
C# Code
string strRegex = @">([-+]?[0-9]*\.?[0-9]+)\*";
Regex myRegex = new Regex(strRegex, RegexOptions.IgnoreCase | RegexOptions.Singleline);
string strTargetString = @">-0.0532>-0.0534*>-0.0534*>-0.0532*>-0.0534*>-0.0534*>-0.0532*>-0.0532*>-0.0534*>-0.0534*>-0.0534*>-0.0532*>-0.0534*";
foreach (Match myMatch in myRegex.Matches(strTargetString))
{
    if (myMatch.Success)
    {
        // Add your code here
    }
}
I want to create a regular expression to match a word that begins with a period. The word(s) can exist N times in a string. I want to ensure that the word comes up whether it's at the beginning of a line, the end of a line or somewhere in the middle. The latter part is what I'm having difficulty with.
Here is where I am at so far.
const string pattern = @"(^|(.* ))(?<slickText>\.[a-zA-Z0-9]*)( .*|$)";
public static MatchCollection Find(string input)
{
    Regex regex = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Multiline);
    MatchCollection collection = regex.Matches(input);
    return collection;
}
My test pattern finds .lee and .good. My test pattern fails to find .bruce:
static void Main()
{
MatchCollection results = ClassName.Find("a short stump .bruce\r\nand .lee a small tree\r\n.good roots");
foreach (Match item in results)
{
GroupCollection groups = item.Groups;
Console.WriteLine("{0} ", groups["slickText"].Value);
}
System.Diagnostics.Debug.Assert(results.Count > 0);
}
Maybe you're just looking for \.\w+?
Test:
var s = "a short stump .bruce\r\nand .lee a small tree\r\n.good roots";
Regex.Matches(s, @"\.\w+").Dump();
Result: three matches, .bruce, .lee and .good.
Note:
If you don't want to find foo in some.foo (because there's no whitespace between some and .foo), you can use (?<=\W|^)\.\w+ instead.
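To make the difference concrete, here is a small comparison of the two patterns on a made-up input that contains both cases:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input: ".foo" is glued to "some", ".bar" follows a space.
string s = "some.foo .bar";

string[] plain = Regex.Matches(s, @"\.\w+")
    .Cast<Match>().Select(m => m.Value).ToArray();
string[] strict = Regex.Matches(s, @"(?<=\W|^)\.\w+")
    .Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(", ", plain));  // .foo, .bar
Console.WriteLine(string.Join(", ", strict)); // .bar
```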
Bizarrely enough, it seems that with RegexOptions.Multiline, ^ and $ will only additionally match \n, not \r\n.
Thus you get .good because it is preceded by \n which is matched by ^, but you don't get .bruce because it is succeeded by \r which is not matched by $.
You could do a .Replace("\r", "") on the input, or rewrite your expression to take individual lines of input.
Edit: Or replace $ with \r?$ in your pattern to explicitly include the \r; thanks to SvenS for the suggestion.
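For reference, a sketch of the question's pattern with only that one change applied ($ replaced by \r?$), run against the same test input:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "a short stump .bruce\r\nand .lee a small tree\r\n.good roots";

// Same pattern as in the question, except the trailing $ is now \r?$.
string pattern = @"(^|(.* ))(?<slickText>\.[a-zA-Z0-9]*)( .*|\r?$)";

string[] found = Regex
    .Matches(input, pattern, RegexOptions.IgnoreCase | RegexOptions.Multiline)
    .Cast<Match>()
    .Select(m => m.Groups["slickText"].Value)
    .ToArray();

Console.WriteLine(string.Join(", ", found)); // .bruce, .lee, .good
```

Because of the greedy (.* ) prefix, this pattern still reports at most one dotted word per line, so it only suits input like the sample.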
In your RegEx, a word has to be terminated by a space, but bruce is terminated by \r instead.
I would give this regex a go:
(?:.*?(\.[A-Za-z]+(?:\b|.\s)).*?)+
And change the RegexOptions from Multiline to Singleline - in this mode dot matches all characters including newline.
I need to match all the whole words containing a given string.
string s = "ABC.MYTESTING
XYZ.YOUTESTED
ANY.TESTING";
Regex r = new Regex("(?<TM>[!\..]*TEST.*)", ...);
MatchCollection mc = r.Matches(s);
I need the result to be:
MYTESTING
YOUTESTED
TESTING
But I get:
TESTING
TESTED
.TESTING
How do I achieve this with regular expressions?
Edit: Extended sample string.
If you were looking for all words including 'TEST', you should use
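@"(?<TM>\w*TEST\w*)"
\w matches word characters. For ASCII input it behaves like [A-Za-z0-9_]; in .NET it also matches Unicode letters and digits unless RegexOptions.ECMAScript is set.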
Keep it simple: why not just try \w*TEST\w* as the match pattern.
I get the results you are expecting with the following:
string s = @"ABC.MYTESTING
XYZ.YOUTESTED
ANY.TESTING";
var m = Regex.Matches(s, @"(\w*TEST\w*)", RegexOptions.IgnoreCase);
Try using \b. It's the regex anchor for a word boundary. If you wanted to match both words you could use:
/\b[a-z]+\b/i
BTW, .NET doesn't need the surrounding /, and the i is just the case-insensitive match flag.
.NET Alternative:
var re = new Regex(@"\b[a-z]+\b", RegexOptions.IgnoreCase);
Using Groups I think you can achieve it.
string s = @"ABC.TESTING
XYZ.TESTED";
Regex r = new Regex(@"(?<TM>[!\..]*(?<test>TEST.*))", RegexOptions.Multiline);
var mc = r.Matches(s);
foreach (Match match in mc)
{
    Console.WriteLine(match.Groups["test"]);
}
Works exactly like you want.
BTW, your regular expression pattern should be a verbatim string (@"")
Regex r = new Regex(@"(?<TM>[^.]*TEST.*)", RegexOptions.IgnoreCase);
First, as @manojlds said, you should use verbatim strings for regexes whenever possible. Otherwise you'll have to use two backslashes in most of your regex escape sequences, not just one (e.g. [!\\..]*).
Second, if you want to match anything but a dot, that part of the regex should be [^.]*. ^ is the metacharacter that inverts the character class, not !, and . has no special meaning in that context, so it doesn't need to be escaped. But you should probably use \w* instead, or even [A-Z]*, depending on what exactly you mean by "word". [!\..] matches ! or ..
Regex r = new Regex(@"(?<TM>[A-Z]*TEST[A-Z]*)", RegexOptions.IgnoreCase);
That way you don't need to bother with word boundaries, though they don't hurt:
Regex r = new Regex(@"(?<TM>\b[A-Z]*TEST[A-Z]*\b)", RegexOptions.IgnoreCase);
Finally, if you're always taking the whole match anyway, you don't need to use a capturing group:
Regex r = new Regex(@"\b[A-Z]*TEST[A-Z]*\b", RegexOptions.IgnoreCase);
The matched text will be available via Match's Value property.
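As a quick check, here is that last pattern run against the extended sample from the question. The sample is written with \n line breaks here, which is an assumption about the real input:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string s = "ABC.MYTESTING\nXYZ.YOUTESTED\nANY.TESTING";
Regex r = new Regex(@"\b[A-Z]*TEST[A-Z]*\b", RegexOptions.IgnoreCase);

var words = new List<string>();
foreach (Match m in r.Matches(s))
{
    words.Add(m.Value); // the whole match, no capturing group needed
}

Console.WriteLine(string.Join(", ", words)); // MYTESTING, YOUTESTED, TESTING
```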