C# Regex - endless matching

I want to use a regex to extract data from a repeated XML tag:
<A>cat</A><A>dog</A>
So I've created Regex:
<A>(.*?)</A>
and code:
string text = "<A>asdasd</A><A>dsfsd</A>";
string regex = @"<A>(.*?)</A>";
Regex rgx = new Regex(regex);
Match match = rgx.Match(text);
while(match.Success)
{
i++;
Console.WriteLine(match.Groups[1].Value);
match.NextMatch();
}
But when I run my code, the loop is endless and never stops.
Can someone help me find what's wrong with the code, or suggest another solution?
(I don't want to deserialize the XML.)

This:
match.NextMatch();
just returns the next match; it doesn't change the state of match itself. You need to assign the result back to the variable:
match = match.NextMatch();
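For reference, the corrected loop might look like the sketch below. The class name MatchLoopDemo and the helper ExtractAll are made up for illustration; the pattern and input are the ones from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchLoopDemo
{
    // Hypothetical helper: collects the first capture group of every match.
    public static List<string> ExtractAll(string text, string pattern)
    {
        var values = new List<string>();
        Match match = Regex.Match(text, pattern);
        while (match.Success)
        {
            values.Add(match.Groups[1].Value);
            // NextMatch() returns a new Match; reassign it so the loop advances.
            match = match.NextMatch();
        }
        return values;
    }

    public static void Main()
    {
        foreach (string value in ExtractAll("<A>asdasd</A><A>dsfsd</A>", @"<A>(.*?)</A>"))
        {
            Console.WriteLine(value);
        }
        // asdasd
        // dsfsd
    }
}
```

Once no further match is found, NextMatch() returns a Match whose Success is false, which ends the loop.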

While the failure reason is that you did not assign the next match to match, you can actually use Regex.Matches to get all the substrings you need in one go without the need for an explicit loop:
var results = rgx.Matches(text)
.Cast<Match>()
.Select(m => m.Groups[1].Value);
Console.WriteLine(string.Join("\n", results));
See the C# online demo:
var text = "<A>asdasd</A><A>dsfsd</A>";
var regex = @"<A>(.*?)</A>";
var rgx = new Regex(regex);
var results = rgx.Matches(text)
.Cast<Match>()
.Select(m => m.Groups[1].Value);
Console.WriteLine(string.Join("\n", results));
// asdasd
// dsfsd

Just use Regex.Matches to load them all into a collection, and proceed to iterate it.
string text = "<A>asdasd</A><A>dsfsd</A>";
string regex = @"<A>(.*?)</A>";
foreach (Match m in Regex.Matches(text, regex))
{
Console.WriteLine(m.Groups[1].Value);
}
Or as a single line using LINQ:
Regex.Matches(text, regex).Cast<Match>().ToList().ForEach(m => Console.WriteLine(m.Groups[1].Value));

Related

Regular expression to extract based on capital letters

Can someone help with a C# regex that splits a name into just two words, as follows:
"SetTable" ->> ["Set", "Table"]
"GetForeignKey" ->> ["Get", "ForeignKey"] //No split on Key!
This can be solved in different ways. One is to find the index of every capital letter and split at the second one:
string source = "GetForeignKey";
var result = Regex.Matches(source, "[A-Z]").OfType<Match>().Select(x => x.Index).ToArray();
string a, b;
if (result.Length > 1)
{
a = source.Substring(0, result[1]);
b = source.Substring(result[1]);
}
Try the regex below
(?![A-Z][a-z]+Key)[A-Z][a-z]+|[A-Z][a-z]+Key
C# code:
var matches = Regex.Matches(input, @"(?![A-Z][a-z]+Key)[A-Z][a-z]+|[A-Z][a-z]+Key");
foreach (Match match in matches)
match.Groups[0].Value.Dump();
To get the parts as an array:
matches.OfType<Match>().Select(x => x.Value).ToArray().Dump();
Fiddle

Regex match and replace operators in math operation

Given an input string
12/3
12*3/12
(12*54)/(3/4)
I need to find each operand and replace it with a string that contains that operand:
some12text/some3text
some12text*some3text/some12text
(some12text*some54text)/(some3text/some4text)
Practical application: from a C# backend, I have the following string:
34*157
which I need to translate to:
document.getElementById("34").value*document.getElementById("157").value
The result is returned to the page, where it can be run with eval().
So far I have:
var pattern = @"\d+";
var input = "12/3";
Regex r = new Regex(pattern);
var matches = r.Matches(input);
foreach (Match match in matches)
{
// im at a loss what to match and replace here
}
Caution 1: I cannot do a blanket input.Replace() in the foreach loop, because it may replace the wrong text in an input such as (12/123); only the first 12 should be matched.
Caution 2: I could use string.Remove and string.Insert, but that changes the string after the first match, which throws off the positions of the later matches.
Any pointers appreciated.
Here you go
string pattern = @"\d+"; // matches one or more consecutive digits
var input = "(12*54)/(3/4)";
string result = Regex.Replace(input, pattern, "some$0Text");
$0 refers to the whole match (group 0), i.e. the text matched by \d+. You can also use a match evaluator:
string result = Regex.Replace(input, pattern, m => "some"+ m.Groups[0]+ "Text");
Fiddle: https://dotnetfiddle.net/JUknx2
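Applied to the practical case from the question, the same idea could be sketched as follows. The class name ExpressionTranslator and the method ToJavaScript are hypothetical, and the sketch assumes every run of digits in the expression is an element id.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExpressionTranslator
{
    // Hypothetical helper: wraps every run of digits in a getElementById lookup.
    public static string ToJavaScript(string expression)
    {
        return Regex.Replace(
            expression,
            @"\d+",
            m => "document.getElementById(\"" + m.Value + "\").value");
    }

    public static void Main()
    {
        Console.WriteLine(ToJavaScript("34*157"));
        // document.getElementById("34").value*document.getElementById("157").value
    }
}
```

Because Regex.Replace builds a new string from the original match positions, neither of the two cautions in the question applies: 12 and 123 are matched as separate whole numbers, and no offsets shift between matches.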

C# regex. Everything inside curly brackets {} and mod (%) characters

I'm trying to get the values between {} and %% with a single regex.
This is what I have so far. I can get the values with a separate regex for each delimiter, but I'd like to learn how to combine both.
var regex = new Regex(@"%(.*?)%|\{([^}]*)\}");
String s = "This is a {test} %String%. %Stack% {Overflow}";
Expected answer for the above string
test
String
Stack
Overflow
Individual regexes:
@"%(.*?)%" gives me String and Stack
@"\{([^}]*)\}" gives me test and Overflow
Following is my code.
var regex = new Regex(@"%(.*?)%|\{([^}]*)\}");
var matches = regex.Matches(s);
foreach (Match match in matches)
{
Console.WriteLine(match.Groups[1].Value);
}
This is similar to your regex, but uses named capturing groups. .NET lets both alternatives share the same group name:
String s = "This is a {test} %String%. %Stack% {Overflow}";
var list = Regex.Matches(s, @"\{(?<name>.+?)\}|%(?<name>.+?)%")
.Cast<Match>()
.Select(m => m.Groups["name"].Value)
.ToList();
If you want to learn how conditional expressions work, here is a solution using that kind of .NET regex capability:
(?:(?<p>%)|(?<b>{))(?<v>.*?)(?(p)%|})
See the regex demo
Here is how it works:
(?:(?<p>%)|(?<b>{)) - match and capture either Group "p" with % (percentage), or Group "b" (brace) with {
(?<v>.*?) - match and capture into Group "v" (value) any character (even a newline since I will be using RegexOptions.Singleline) zero or more times, but as few as possible (lazy matching with *? quantifier)
(?(p)%|}) - a conditional expression meaning: if "p" group was matched, match %, else, match }.
C# demo:
var s = "This is a {test} %String%. %Stack% {Overflow}";
var regex = "(?:(?<p>%)|(?<b>{))(?<v>.*?)(?(p)%|})";
var matches = Regex.Matches(s, regex, RegexOptions.Singleline);
// var matches_list = Regex.Matches(s, regex, RegexOptions.Singleline)
// .Cast<Match>()
// .Select(p => p.Groups["v"].Value)
// .ToList();
// Or just a demo writeline
foreach (Match match in matches)
Console.WriteLine(match.Groups["v"].Value);
Sometimes the capture is in group 1 and sometimes it's in group 2 because you have two pairs of parentheses.
Your original code will work if you do this instead:
Console.WriteLine(match.Groups[1].Value + match.Groups[2].Value);
because one group will be the empty string and the other will be the value you're interested in.
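An alternative to concatenating the two groups is to check which one actually took part in the match. The sketch below assumes the original pattern from the question; the class name AlternationGroupsDemo and the helper ExtractValues are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AlternationGroupsDemo
{
    // Hypothetical helper: returns whichever of the two groups took part in each match.
    public static List<string> ExtractValues(string input)
    {
        var values = new List<string>();
        foreach (Match match in Regex.Matches(input, @"%(.*?)%|\{([^}]*)\}"))
        {
            // Group 1 belongs to the %...% branch, group 2 to the {...} branch.
            Group group = match.Groups[1].Success ? match.Groups[1] : match.Groups[2];
            values.Add(group.Value);
        }
        return values;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", ExtractValues("This is a {test} %String%. %Stack% {Overflow}")));
        // test, String, Stack, Overflow
    }
}
```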
@"[\{|%](.*?)[\}|%]"
The idea being:
{ or %
anything
} or %
I think you should use a combination of alternation and nested groups:
((\{(.*)\})|(%(.*)%))

How do I extract a string of text that lies between *>...* using .NET C# regex or anything else?

I have a string like this.
*>-0.0532*>-0.0534*>-0.0534*>-0.0532*>-0.0534*>-0.0534*>-0.0532*>-0.0532*>-0.0534*>-0.0534*>-0.0534*>-0.0532*>-0.0534*
I want to extract the values between the *> and * characters.
I tried the pattern below, which does not work:
string pattern = "\\*\\>..\\*";
Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = rgx.Matches(seriGelen);
if (matches.Count > 0)
{
foreach (Match match in matches)
MessageBox.Show("{0}", match.Value);
}
You can use a simple regex with a lookbehind for *> and a lookahead for *:
(?<=\*>).*?(?=\*)
Sample code:
string text = "*>-0.0532*>-0.0534*>-0.0534*>-0.0532*>-0.0534*>-0.0534*>-0.0532*>-0.0532*>-0.0534*>-0.0534*>-0.0534*>-0.0532*>-0.0534*";
string[] values = Regex.Matches(text, @"(?<=\*>).*?(?=\*)")
.Cast<Match>()
.Select(m => m.Value)
.ToArray();
It looks like the values can vary a lot (UPD: there was a positive integer value), so I won't check the number format. I will also treat *>, > and * as different variants of a delimiter.
I'd like to suggest the following solution.
(?<=[>\*])([^>\*]+?)(?=[>\*]+)
(http://regex101.com/r/mM7nK1)
I'm not sure it is ideal: it only works if your input starts and ends with delimiters, but it lets you use matches instead of groups, as your code does.
========
But why not use the String.Split function?
var toprint = seriGelen.Split(new [] {'>', '*'}, StringSplitOptions.RemoveEmptyEntries);
Is there an error at the beginning of the string? Missing an asterisk after first number? >-0.0532>-0.0534*>
If not, try this:
>([-+]?[0-9]*\.?[0-9]+)\*
C# code:
string strRegex = @">([-+]?[0-9]*\.?[0-9]+)\*";
Regex myRegex = new Regex(strRegex, RegexOptions.IgnoreCase | RegexOptions.Singleline);
string strTargetString = @">-0.0532>-0.0534*>-0.0534*>-0.0532*>-0.0534*>-0.0534*>-0.0532*>-0.0532*>-0.0534*>-0.0534*>-0.0534*>-0.0532*>-0.0534*";
foreach (Match myMatch in myRegex.Matches(strTargetString))
{
if (myMatch.Success)
{
// Add your code here
}
}

Regex all the characters in between each /. / could be zero to many

I'm having a bit of trouble with this regex. I have a line that could look like this:
PREF-FA/WV/WB/LO...could continue
or
PREF-FA
and I need to grab all the ratings (FA, WV, WB, etc.) for each line and put each one in its own class instance. Is this something regex could handle, or should I just split the string up?
I have a class called rating, and a List whose length reflects how many ratings are in the line above.
Thanks
How about
Regex
.Matches("PREF-FA/WV/WB/LO", @".+?-(?<rating>.{2})(?:/(?<rating>.{2}))*")
.Cast<Match>()
.SelectMany(m => m.Groups["rating"].Captures.Cast<Capture>().Select(c => c.Value))
gives an IEnumerable<string> with values "FA", "WV", "WB", "LO"
For .NET 2.0, without LINQ:
MatchCollection matches=Regex
.Matches("PREF-FA/WV/WB/LO", @".+?-(?<rating>.{2})(?:/(?<rating>.{2}))*");
List<string> ratings=new List<string>();
foreach(Match m in matches)
{
CaptureCollection captures=m.Groups["rating"].Captures;
foreach(Capture c in captures)
{
ratings.Add(c.Value);
}
}
You could try:
((?:\w{2}/)*\w{2})$
The ?: makes the inner group non-capturing, so the individual 2-letter codes and slashes are not captured separately.
Test it on Rubular if you want. The regex works with many regex engines.
If the line always begins with PREF-, you could use:
^PREF-((?:\w{2}/)*\w{2})$
You can use this regex (?<=PREF-).*$
resultString = Regex.Match(subjectString, "(?<=PREF-).*$",
RegexOptions.Singleline | RegexOptions.Multiline).Value;
It uses a positive lookbehind to match PREF- and then matches the rest of the line.
If you want to loop through all the matches:
Regex ItemRegex = new Regex(@"(?<=PREF-).*$", RegexOptions.Compiled);
foreach (Match ItemMatch in ItemRegex.Matches(subjectString))
{
Console.WriteLine(ItemMatch);
}
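Since the question also asks whether simply splitting the string would do, here is a minimal sketch of that approach, without any regex. The Rating class and the RatingParser.Parse helper are hypothetical stand-ins for the asker's own types, and the sketch assumes the ratings follow the first '-' and are separated by '/'.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the asker's rating class.
public class Rating
{
    public string Code { get; }

    public Rating(string code)
    {
        Code = code;
    }
}

public static class RatingParser
{
    // Assumption: ratings follow the first '-' and are separated by '/'.
    public static List<Rating> Parse(string line)
    {
        var ratings = new List<Rating>();
        int dash = line.IndexOf('-');
        if (dash < 0)
        {
            return ratings; // no prefix separator, so no ratings
        }

        string[] codes = line.Substring(dash + 1)
            .Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string code in codes)
        {
            ratings.Add(new Rating(code));
        }
        return ratings;
    }

    public static void Main()
    {
        foreach (Rating rating in Parse("PREF-FA/WV/WB/LO"))
        {
            Console.WriteLine(rating.Code);
        }
        // FA
        // WV
        // WB
        // LO
    }
}
```

The Count of the returned list then gives the number of ratings on the line, and a line such as PREF-FA yields a single entry.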
