Regex all the characters in between each /. / could be zero to many - c#

Im having a bit of trouble with this regex. I have a line that could look like this
PREF-FA/WV/WB/LO...could continue
or
PREF-FA
and I need to grab all the ratings(FA/WV/WB etc) for each line, and put them in their own class. Is this something regex could handle? or should I just split the string up?
I have a class called rating, and a List which length determines how many ratings are in that above line.
Thanks

How about
Regex
.Matches("PREF-FA/WV/WB/LO" , #".+?-(?<rating>.{2})(?:/(?<rating>.{2}))*")
.Cast<Match>()
.SelectMany(m => m.Groups["rating"].Captures.Cast<Capture>().Select(c => c.Value))
gives an IEnumerable<string> with values "FA", "WV", "WB", "LO"
To go back to .Net2.0 world:
MatchCollection matches=Regex
.Matches("PREF-FA/WV/WB/LO",#".+?-(?<rating>.{2})(?:/(?<rating>.{2}))*");
List<string> ratings=new List<string>();
foreach(Match m in matches)
{
CaptureCollection captures=m.Groups["rating"].Captures;
foreach(Capture c in captures)
{
ratings.Add(c.Value);
}
}

You could try:
((?:\w{2}/)*\w{2})$
?: to avoid capturing the 2-letter words and the slash.
Test it on Rubular if you want. The regex works with many regex engines.
If the line always begins with PREF-, you could use:
^PREF-((?:\w{2}/)*\w{2})$

You can use this regex (?<=PREF-).*$
resultString = Regex.Match(subjectString, "(?<=PREF-).*$",
RegexOptions.Singleline | RegexOptions.Multiline).Value;
It uses positive look behind to match PREF- and then mathces the succeeding string.
If you want to loop through all the mathces
Regex ItemRegex = new Regex(#"(?<=PREF-).*$", RegexOptions.Compiled);
foreach (Match ItemMatch in ItemRegex.Matches(subjectString))
{
Console.WriteLine(ItemMatch);
}

Related

Looking for patterns in a string how to?

I'm trying to find all instances of the substring EnemyType('XXXX') where XXXX is an arbitrary string and the instasnce of EnemyType('XXXX') can appear multiple times.
Right now I'm using a consortium of index of/substring functions in C# but would like to know if there's a cleaner way of doing it?
Use regex. Example:
using System.Text.RegularExpressions;
var inputString = " EnemyType('1234')abcdeEnemyType('5678')xyz";
var regex = new Regex(#"EnemyType\('\d{4}'\)");
var matches = regex.Matches(inputString);
foreach (Match i in matches)
{
Console.WriteLine(i.Value);
}
It will print:
EnemyType('1234')
EnemyType('5678')
The pattern to match is #"EnemyType\('\d{4}'\)", where \d{4} means 4 numeric characters (0-9). The parentheses are escaped with backslash.
Edit: Since you only want the string inside quotes, not the whole string, you can use named groups instead.
var inputString = " EnemyType('1234')abcdeEnemyType('5678')xyz";
var regex = new Regex(#"EnemyType\('(?<id>[^']+)'\)");
var matches = regex.Matches(inputString);
foreach (Match i in matches)
{
Console.WriteLine(i.Groups["id"].Value);
}
Now it prints:
1234
5678
Regex is a really nice tool for parsing strings. If you often parse strings, regex can make life so much easier.

C# regex. Everything inside curly brackets{} and mod(%) charaters

I'm trying to get the values between {} and %% in a same Regex.
This is what I have till now. I can successfully get values individually for each but I was curious to learn about how can I combine both.
var regex = new Regex(#"%(.*?)%|\{([^}]*)\}");
String s = "This is a {test} %String%. %Stack% {Overflow}";
Expected answer for the above string
test
String
Stack
Overflow
Individual regex
#"%(.*?)%" gives me String and Stack
#"\{([^}]*)\}" gives me test and Overflow
Following is my code.
var regex = new Regex(#"%(.*?)%|\{([^}]*)\}");
var matches = regex.Matches(s);
foreach (Match match in matches)
{
Console.WriteLine(match.Groups[1].Value);
}
Similar to your regex. You can use Named Capturing Groups
String s = "This is a {test} %String%. %Stack% {Overflow}";
var list = Regex.Matches(s, #"\{(?<name>.+?)\}|%(?<name>.+?)%")
.Cast<Match>()
.Select(m => m.Groups["name"].Value)
.ToList();
If you want to learn how conditional expressions work, here is a solution using that kind of .NET regex capability:
(?:(?<p>%)|(?<b>{))(?<v>.*?)(?(p)%|})
See the regex demo
Here is how it works:
(?:(?<p>%)|(?<b>{)) - match and capture either Group "p" with % (percentage), or Group "b" (brace) with {
(?<v>.*?) - match and capture into Group "v" (value) any character (even a newline since I will be using RegexOptions.Singleline) zero or more times, but as few as possible (lazy matching with *? quantifier)
(?(p)%|}) - a conditional expression meaning: if "p" group was matched, match %, else, match }.
C# demo:
var s = "This is a {test} %String%. %Stack% {Overflow}";
var regex = "(?:(?<p>%)|(?<b>{))(?<v>.*?)(?(p)%|})";
var matches = Regex.Matches(s, regex, RegexOptions.Singleline);
// var matches_list = Regex.Matches(s, regex, RegexOptions.Singleline)
// .Cast<Match>()
// .Select(p => p.Groups["v"].Value)
// .ToList();
// Or just a demo writeline
foreach (Match match in matches)
Console.WriteLine(match.Groups["v"].Value);
Sometimes the capture is in group 1 and sometimes it's in group 2 because you have two pairs of parentheses.
Your original code will work if you do this instead:
Console.WriteLine(match.Groups[1].Value + match.Groups[2].Value);
because one group will be the empty string and the other will be the value you're interested in.
#"[\{|%](.*?)[\}|%]"
The idea being:
{ or %
anything
} or %
I think you should use a combination of conditional anda nested groups:
((\{(.*)\})|(%(.*)%))

How do I extract a string of text that lies between *>...* using .NET C# regex or anything else?

I have a string like this.
*>-0.0532*>-0.0534*>-0.0534*>-0.0532*>-0.0534*>-0.0534*>-0.0532*>-0.0532*>-0.0534*>-0.0534*>-0.0534*>-0.0532*>-0.0534*
I wanna extract between *> and * characters.
I tried this pattern which is wrong here below:
string pattern = "\\*\\>..\\*";
Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = rgx.Matches(seriGelen);
if (matches.Count > 0)
{
foreach (Match match in matches)
MessageBox.Show("{0}", match.Value);
}
You can use simple regex:
(?<=\*>).*?(?=\*)
Sample code:
string text = "*>-0.0532*>-0.0534*>-0.0534*>-0.0532*>-0.0534*>-0.0534*>-0.0532*>-0.0532*>-0.0534*>-0.0534*>-0.0534*>-0.0532*>-0.0534*";
string[] values = Regex.Matches(text, #"(?<=\*>).*?(?=\*)")
.Cast<Match>()
.Select(m => m.Value)
.ToArray();
Looks like there are can be very different values (UPD: there was an integer positive value). So, let me to not check numbers format. Also I will consider that *> and >, and also * are just different variants of delimiters.
I'd like to suggest the following solution.
(?<=[>\*])([^>\*]+?)(?=[>\*]+)
(http://regex101.com/r/mM7nK1)
Not sure it is ideal. Will only works if your input starts and ends with delimiters, but will allow to you to use matches instead groups, as your code does.
========
But you know, why wouldn't you use String.Split function?
var toprint = seriGelen.Split(new [] {'>', '*'}, StringSplitOptions.RemoveEmptyEntries);
Is there an error at the beginning of the string? Missing an asterisk after first number? >-0.0532>-0.0534*>
If not try this.
>([-+]?[0-9]*\.?[0-9]+)\*
C# Code
string strRegex = #">([-+]?[0-9]*\.?[0-9]+)\*";
Regex myRegex = new Regex(strRegex, RegexOptions.IgnoreCase | RegexOptions.Singleline);
string strTargetString = #">-0.0532>-0.0534*>-0.0534*>-0.0532*>-0.0534*>-0.0534*>-0.0532*>-0.0532*>-0.0534*>-0.0534*>-0.0534*>-0.0532*>-0.0534*";
foreach (Match myMatch in myRegex.Matches(strTargetString))
{
if (myMatch.Success)
{
// Add your code here
}
}

Dot word pattern matching

I want to create a regular expression to match a word that begins with a period. The word(s) can exist N times in a string. I want to ensure that the word comes up whether it's at the beginning of a line, the end of a line or somewhere in the middle. The latter part is what I'm having difficulty with.
Here is where I am at so far.
const string pattern = #"(^|(.* ))(?<slickText>\.[a-zA-Z0-9]*)( .*|$)";
public static MatchCollection Find(string input)
{
Regex regex = new Regex(pattern,RegexOptions.IgnoreCase | RegexOptions.Multiline);
MatchCollection collection = regex.Matches(input);
return collection;
}
My test pattern finds .lee and .good. My test pattern fails to find .bruce:
static void Main()
{
MatchCollection results = ClassName.Find("a short stump .bruce\r\nand .lee a small tree\r\n.good roots");
foreach (Match item in results)
{
GroupCollection groups = item.Groups;
Console.WriteLine("{0} ", groups["slickText"].Value);
}
System.Diagnostics.Debug.Assert(results.Count > 0);
}
Maybe you're just looking for \.\w+?
Test:
var s = "a short stump .bruce\r\nand .lee a small tree\r\n.good roots";
Regex.Matches(s, #"\.\w+").Dump();
Result:
Note:
If you don't want to find foo in some.foo (because there's no whitespace between some and .foo), you can use (?<=\W|^)\.\w+ instead.
Bizarrely enough, it seems that with RegexOptions.Multiline, ^ and $ will only additionally match \n, not \r\n.
Thus you get .good because it is preceded by \n which is matched by ^, but you don't get .bruce because it is succeeded by \r which is not matched by $.
You could do a .Replace("\r", "") on the input, or rewrite your expression to take individual lines of input.
Edit: Or replace $ with \r?$ in your pattern to explicitly include the \r; thanks to SvenS for the suggestion.
In your RegEx, a word has to be terminated by a space, but bruce is terminated by \r instead.
I would give this regex a go:
(?:.*?(\.[A-Za-z]+(?:\b|.\s)).*?)+
And change the RegexOptions from Multiline to Singleline - in this mode dot matches all characters including newline.

C# - Regex Match whole words

I need to match all the whole words containing a given a string.
string s = "ABC.MYTESTING
XYZ.YOUTESTED
ANY.TESTING";
Regex r = new Regex("(?<TM>[!\..]*TEST.*)", ...);
MatchCollection mc = r.Matches(s);
I need the result to be:
MYTESTING
YOUTESTED
TESTING
But I get:
TESTING
TESTED
.TESTING
How do I achieve this with Regular expressions.
Edit: Extended sample string.
If you were looking for all words including 'TEST', you should use
#"(?<TM>\w*TEST\w*)"
\w includes word characters and is short for [A-Za-z0-9_]
Keep it simple: why not just try \w*TEST\w* as the match pattern.
I get the results you are expecting with the following:
string s = #"ABC.MYTESTING
XYZ.YOUTESTED
ANY.TESTING";
var m = Regex.Matches(s, #"(\w*TEST\w*)", RegexOptions.IgnoreCase);
Try using \b. It's the regex flag for a non-word delimiter. If you wanted to match both words you could use:
/\b[a-z]+\b/i
BTW, .net doesn't need the surrounding /, and the i is just a case-insensitive match flag.
.NET Alternative:
var re = new Regex(#"\b[a-z]+\b", RegexOptions.IgnoreCase);
Using Groups I think you can achieve it.
string s = #"ABC.TESTING
XYZ.TESTED";
Regex r = new Regex(#"(?<TM>[!\..]*(?<test>TEST.*))", RegexOptions.Multiline);
var mc= r.Matches(s);
foreach (Match match in mc)
{
Console.WriteLine(match.Groups["test"]);
}
Works exactly like you want.
BTW, your regular expression pattern should be a verbatim string ( #"")
Regex r = new Regex(#"(?<TM>[^.]*TEST.*)", RegexOptions.IgnoreCase);
First, as #manojlds said, you should use verbatim strings for regexes whenever possible. Otherwise you'll have to use two backslashes in most of your regex escape sequences, not just one (e.g. [!\\..]*).
Second, if you want to match anything but a dot, that part of the regex should be [^.]*. ^ is the metacharacter that inverts the character class, not !, and . has no special meaning in that context, so it doesn't need to be escaped. But you should probably use \w* instead, or even [A-Z]*, depending on what exactly you mean by "word". [!\..] matches ! or ..
Regex r = new Regex(#"(?<TM>[A-Z]*TEST[A-Z]*)", RegexOptions.IgnoreCase);
That way you don't need to bother with word boundaries, though they don't hurt:
Regex r = new Regex(#"(?<TM>\b[A-Z]*TEST[A-Z]*\b)", RegexOptions.IgnoreCase);
Finally, if you're always taking the whole match anyway, you don't need to use a capturing group:
Regex r = new Regex(#"\b[A-Z]*TEST[A-Z]*\b", RegexOptions.IgnoreCase);
The matched text will be available via Match's Value property.

Categories