I want to filter some forum entry content out of a forum page. The forum entries are located between two blockquote elements (as seen in the Regex). I want to filter the content out with a Regex. This is the code I'm using:
string pattern = @"(<blockquote class=\"postcontent restore \">)(.*?)(</blockquote>)";
Regex test = new Regex(pattern, RegexOptions.IgnorePatternWhitespace);
MatchCollection m = test.Matches(downloadString);
var arr = m
.Cast<Match>()
.Select(n => n.Value)
.ToArray();
foreach (string match in arr)
{
Console.WriteLine(match);
}
Console.ReadLine();
I have this sample for example:
<blockquote class="postcontent restore ">
<br>
Some Stuff
<br>
Some Stuff #2
<br>
</blockquote>
The problem is that the returned array is empty. Any idea what could be wrong? I guess it's because of the whitespace, but I have no idea how to "ignore" it.
By default, . matches any character except a newline.
You can use this to include line breaks:
(<blockquote class=\"postcontent restore \">)(\n*.*)(<\/blockquote>)
Your pattern also did not escape the double quotes and forward slashes, so here it is:
EDIT: Sorry, the @ is there, so the final version should be:
EDIT 2: Full tested source code. It is your responsibility to check for IsMatch or null references
string pattern = @"(<blockquote class=\""postcontent restore \"">)+((\n*)(.*))+(</blockquote>)";
Regex test = new Regex(pattern);
MatchCollection matches = test.Matches(downloadString);
StringBuilder xmlContentBuilder = new StringBuilder();
foreach (Capture capture in matches[0].Groups[2].Captures)
{
xmlContentBuilder.Append(capture);
}
Console.WriteLine(xmlContentBuilder);
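As an alternative, here is a minimal sketch that assumes only the text between the two tags is needed. RegexOptions.Singleline makes . match newlines as well, so the lazy (.*?) from the question can span several lines. The names BlockquoteExtractor and ExtractPosts are hypothetical, and for real-world pages an HTML parser is likely more robust than a regex.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class BlockquoteExtractor
{
    // Hypothetical helper: returns the inner text of every matching blockquote.
    public static List<string> ExtractPosts(string html)
    {
        // In a verbatim string (@"...") a double quote is written as "".
        const string pattern = @"<blockquote class=""postcontent restore "">(.*?)</blockquote>";

        // Singleline lets '.' match '\n', so the lazy group can span several lines.
        return Regex.Matches(html, pattern, RegexOptions.Singleline)
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value.Trim())
                    .ToList();
    }

    public static void Main()
    {
        string sample =
            "<blockquote class=\"postcontent restore \">\n<br>\nSome Stuff\n<br>\nSome Stuff #2\n<br>\n</blockquote>";

        foreach (string post in ExtractPosts(sample))
        {
            Console.WriteLine(post);
        }
    }
}
```

Because the group is lazy, each blockquote on the page yields its own match instead of one match running from the first opening tag to the last closing tag.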
Related
I have a regex pattern which extracts the URL and link text in order to turn a custom tag into a link tag.
When I try my pattern in an online checker it finds 4 matches, but when I run my C# code it finds only one match.
Regex rgx = new Regex(@"(\[)+(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)( )(.)+(\])");
The Entry is like
[http://facebook.com/ LinkText]
[http://youtube.com/ LinkText]
[http://instagram.com/ LinkText]
[https://stackoverflow.com/users/1131979/cagri-d-kaynar LinkText]
My Code
Regex rgx =
new Regex(@"(\[)+(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)( )(.)+(\])");
foreach (Match match in rgx.Matches(entry))
{
var matchv = match.Value;
/*extract url and Link text from match value*/
var bknz =
String.Format("{1}", cc[0], cc[1]);
entry = rgx.Replace(entry, bknz);
}
What's wrong with my code? Did I miss a flag or a step?
I was replacing \r\n with <br /> before running the regex, and that is what caused only one match to be found.
I now do the replacement after the regex matching, and it works well.
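The ordering matters because . does not match a newline: once the line breaks are replaced, the whole entry is a single line and a greedy .* can run from the first tag to the last. A minimal sketch of the working order follows. It uses a simplified pattern that is an assumption, not the original one, and the LinkTagConverter and ToAnchors names and the anchor output format are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinkTagConverter
{
    // Simplified, assumed pattern: "[" + URL without spaces + one space + link text + "]".
    private static readonly Regex TagRegex =
        new Regex(@"\[(?<url>https?://\S+) (?<text>[^\]\r\n]+)\]");

    public static string ToAnchors(string entry)
    {
        // 1. Convert the tags while the line breaks are still intact.
        //    A single Replace call handles every match, so no loop is needed.
        string converted = TagRegex.Replace(
            entry,
            m => string.Format("<a href=\"{0}\">{1}</a>",
                               m.Groups["url"].Value,
                               m.Groups["text"].Value));

        // 2. Only afterwards turn the line breaks into <br />.
        return converted.Replace("\r\n", "<br />");
    }

    public static void Main()
    {
        string entry = "[http://facebook.com/ LinkText]\r\n[http://youtube.com/ LinkText]";
        Console.WriteLine(ToAnchors(entry));
    }
}
```

Using a match evaluator also avoids calling Replace inside a loop over the matches, which would rewrite every tag on the first iteration.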
I wanted to use Regex to get data from repeated XML tag:
<A>cat</A><A>dog</A>
So I've created Regex:
<A>(.*?)</A>
and code:
string text = "<A>asdasd</A><A>dsfsd</A>";
string regex = @"<A>(.*?)</A>";
Regex rgx = new Regex(regex);
Match match = rgx.Match(text);
while(match.Success)
{
i++;
Console.WriteLine(match.Groups[1].Value);
match.NextMatch();
}
But when I start my code, my loop is endless and never stop.
Can someone help me find what's wrong with code? Or find another solution?
(I don't want to deserialize XML).
This:
match.NextMatch();
just returns the next match, it doesn't change the state of match itself. You need to update the variable:
match = match.NextMatch();
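Put together, the corrected loop might look like the sketch below. The NextMatchDemo class and ReadValues method are hypothetical names used only to make the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NextMatchDemo
{
    // Hypothetical helper: collects the content of every <A>...</A> element.
    public static List<string> ReadValues(string text)
    {
        var values = new List<string>();
        Match match = Regex.Match(text, @"<A>(.*?)</A>");

        while (match.Success)
        {
            values.Add(match.Groups[1].Value);

            // NextMatch returns a new Match object; reassign it so the loop advances.
            match = match.NextMatch();
        }

        return values;
    }

    public static void Main()
    {
        foreach (string value in ReadValues("<A>asdasd</A><A>dsfsd</A>"))
        {
            Console.WriteLine(value);
        }
    }
}
```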
While the failure reason is that you did not assign the next match to match, you can actually use Regex.Matches to get all the substrings you need in one go without the need for an explicit loop:
var results = rgx.Matches(text)
.Cast<Match>()
.Select(m => m.Groups[1].Value);
Console.WriteLine(string.Join("\n", results));
See the C# online demo:
var text = "<A>asdasd</A><A>dsfsd</A>";
var regex = @"<A>(.*?)</A>";
var rgx = new Regex(regex);
var results = rgx.Matches(text)
.Cast<Match>()
.Select(m => m.Groups[1].Value);
Console.WriteLine(string.Join("\n", results));
// asdasd
// dsfsd
Just use Regex.Matches to load them all into a collection, and proceed to iterate it.
string text = "<A>asdasd</A><A>dsfsd</A>";
string regex = @"<A>(.*?)</A>";
foreach (Match m in Regex.Matches(text, regex))
{
Console.WriteLine(m.Groups[1].Value);
}
Or as a single line using LINQ:
Regex.Matches(text, regex).Cast<Match>().ToList().ForEach(m => Console.WriteLine(m.Groups[1].Value));
I want to get the names of the mp3 files.
I'm currently using this code:
string str = "onClick=\"playVideo('upload/honour-3.mp3',this)\"/> onClick=\"playVideo('upload/honor is my honor .mp3',this)\"/> onClick=\"playVideo('upload/honour-6.mp3',this)\"/> ";
string Pattern = @"playVideo\(\'upload\/(?<mp3>\S*).mp3\'\,this\)";
if (Regex.IsMatch(str, Pattern))
{
MatchCollection Matches = Regex.Matches(str, Pattern);
foreach (Match match in Matches)
{
string fn = match.Groups["mp3"].Value;
Debug.Log(match.Groups["mp3"].Value);
}
}
But \S* only matches names like
honour-3
honour-6
I can't get "honor is my honor ".
I tried "\S*\s*", but it does not work.
The number of spaces in a name is not known in advance.
How do I use Regex to get the mp3 names?
If you don't have to match "playVideo" and "upload", your regex is unnecessarily complicated. This one produces the expected results:
@"[\w\s-]+\.mp3"
Results:
"honour-3.mp3",
"honor is my honor .mp3",
"honour-6.mp3"
If you don't want .mp3 at the end of the matches, you can change the regex to @"([\w\s-]+)\.mp3" and select group 1 (group 0 is the whole match).
Regex.Matches(str, @"([\w\s-]+)\.mp3").Cast<Match>().Select(m => m.Groups[1].Value).ToArray();
Results:
"honour-3",
"honor is my honor ",
"honour-6"
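If the playVideo('upload/ context does need to stay in the pattern, one option is to replace \S* with a class that accepts anything except the closing quote. This is a minimal sketch under that assumption; Mp3NameExtractor and GetNames are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class Mp3NameExtractor
{
    // Hypothetical helper: returns the file names without the .mp3 extension.
    public static List<string> GetNames(string input)
    {
        // [^']+ accepts spaces too, and stops at the closing quote of the argument.
        // The dot before "mp3" is escaped so it only matches a literal dot.
        const string pattern = @"playVideo\('upload/(?<mp3>[^']+)\.mp3'";

        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Groups["mp3"].Value)
                    .ToList();
    }

    public static void Main()
    {
        string str = "onClick=\"playVideo('upload/honour-3.mp3',this)\"/> "
                   + "onClick=\"playVideo('upload/honor is my honor .mp3',this)\"/> "
                   + "onClick=\"playVideo('upload/honour-6.mp3',this)\"/> ";

        foreach (string name in GetNames(str))
        {
            Console.WriteLine("[" + name + "]");
        }
    }
}
```

The brackets in the output are only there to make the trailing space in "honor is my honor " visible.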
I'm trying to get the values between {} and %% with a single regex.
This is what I have so far. I can successfully get the values for each delimiter individually, but I would like to learn how to combine both.
var regex = new Regex(@"%(.*?)%|\{([^}]*)\}");
String s = "This is a {test} %String%. %Stack% {Overflow}";
Expected answer for the above string
test
String
Stack
Overflow
Individual regex
@"%(.*?)%" gives me String and Stack
@"\{([^}]*)\}" gives me test and Overflow
Following is my code.
var regex = new Regex(@"%(.*?)%|\{([^}]*)\}");
var matches = regex.Matches(s);
foreach (Match match in matches)
{
Console.WriteLine(match.Groups[1].Value);
}
This is similar to your regex, but uses named capturing groups:
String s = "This is a {test} %String%. %Stack% {Overflow}";
var list = Regex.Matches(s, @"\{(?<name>.+?)\}|%(?<name>.+?)%")
.Cast<Match>()
.Select(m => m.Groups["name"].Value)
.ToList();
If you want to learn how conditional expressions work, here is a solution using that kind of .NET regex capability:
(?:(?<p>%)|(?<b>{))(?<v>.*?)(?(p)%|})
See the regex demo
Here is how it works:
(?:(?<p>%)|(?<b>{)) - match and capture either Group "p" with % (percentage), or Group "b" (brace) with {
(?<v>.*?) - match and capture into Group "v" (value) any character (even a newline since I will be using RegexOptions.Singleline) zero or more times, but as few as possible (lazy matching with *? quantifier)
(?(p)%|}) - a conditional expression meaning: if "p" group was matched, match %, else, match }.
C# demo:
var s = "This is a {test} %String%. %Stack% {Overflow}";
var regex = "(?:(?<p>%)|(?<b>{))(?<v>.*?)(?(p)%|})";
var matches = Regex.Matches(s, regex, RegexOptions.Singleline);
// var matches_list = Regex.Matches(s, regex, RegexOptions.Singleline)
// .Cast<Match>()
// .Select(p => p.Groups["v"].Value)
// .ToList();
// Or just a demo writeline
foreach (Match match in matches)
Console.WriteLine(match.Groups["v"].Value);
Sometimes the capture is in group 1 and sometimes it's in group 2 because you have two pairs of parentheses.
Your original code will work if you do this instead:
Console.WriteLine(match.Groups[1].Value + match.Groups[2].Value);
because one group will be the empty string and the other will be the value you're interested in.
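Instead of concatenating both groups, the group that actually took part in the match can be selected through Group.Success. A minimal sketch, keeping the pattern from the question; DelimitedValueReader and ExtractValues are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DelimitedValueReader
{
    // Hypothetical helper: returns the values found between %...% or {...}.
    public static List<string> ExtractValues(string s)
    {
        var regex = new Regex(@"%(.*?)%|\{([^}]*)\}");

        return regex.Matches(s)
                    .Cast<Match>()
                    // Group 1 belongs to the %...% branch, group 2 to the {...} branch;
                    // Success tells which branch produced this match.
                    .Select(m => m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value)
                    .ToList();
    }

    public static void Main()
    {
        string s = "This is a {test} %String%. %Stack% {Overflow}";
        foreach (string value in ExtractValues(s))
        {
            Console.WriteLine(value);
        }
    }
}
```

Checking Success also distinguishes a group that matched an empty string from one that did not participate at all, which the concatenation approach cannot do.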
@"[{%](.*?)[}%]"
The idea being:
{ or %
anything
} or %
I think you should use a combination of alternation and nested groups:
((\{(.*?)\})|(%(.*?)%))
I'm having a bit of trouble with this regex. I have a line that could look like this
PREF-FA/WV/WB/LO...could continue
or
PREF-FA
and I need to grab all the ratings (FA, WV, WB, etc.) for each line and put each one in its own class instance. Is this something regex could handle, or should I just split the string up?
I have a class called rating, and a List whose length determines how many ratings are in the line above.
Thanks
How about
Regex
.Matches("PREF-FA/WV/WB/LO", @".+?-(?<rating>.{2})(?:/(?<rating>.{2}))*")
.Cast<Match>()
.SelectMany(m => m.Groups["rating"].Captures.Cast<Capture>().Select(c => c.Value))
gives an IEnumerable<string> with values "FA", "WV", "WB", "LO"
To go back to the .NET 2.0 world:
MatchCollection matches=Regex
.Matches("PREF-FA/WV/WB/LO",@".+?-(?<rating>.{2})(?:/(?<rating>.{2}))*");
List<string> ratings=new List<string>();
foreach(Match m in matches)
{
CaptureCollection captures=m.Groups["rating"].Captures;
foreach(Capture c in captures)
{
ratings.Add(c.Value);
}
}
You could try:
((?:\w{2}/)*\w{2})$
?: to avoid capturing the 2-letter words and the slash.
Test it on Rubular if you want. The regex works with many regex engines.
If the line always begins with PREF-, you could use:
^PREF-((?:\w{2}/)*\w{2})$
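Since the pattern above captures the whole rating list in group 1, the individual ratings can then be obtained with a plain Split. The sketch below combines both steps; the Rating class here is a hypothetical stand-in for the asker's own class, and RatingParser and Parse are assumed names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the asker's rating class.
public sealed class Rating
{
    public string Code { get; }

    public Rating(string code)
    {
        Code = code;
    }
}

public static class RatingParser
{
    // Returns one Rating per two-character code, or an empty list if the line does not fit.
    public static List<Rating> Parse(string line)
    {
        // The regex validates the line shape; group 1 holds "FA/WV/WB/LO" or just "FA".
        Match m = Regex.Match(line, @"^PREF-((?:\w{2}/)*\w{2})$");
        if (!m.Success)
        {
            return new List<Rating>();
        }

        return m.Groups[1].Value
                .Split('/')
                .Select(code => new Rating(code))
                .ToList();
    }

    public static void Main()
    {
        foreach (Rating rating in Parse("PREF-FA/WV/WB/LO"))
        {
            Console.WriteLine(rating.Code);
        }
    }
}
```

The regex is only used to check the line's shape here, so malformed lines produce an empty list rather than partial results.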
You can use this regex (?<=PREF-).*$
resultString = Regex.Match(subjectString, "(?<=PREF-).*$",
RegexOptions.Singleline | RegexOptions.Multiline).Value;
It uses a positive lookbehind to require PREF- and then matches the string that follows it.
If you want to loop through all the matches:
Regex ItemRegex = new Regex(@"(?<=PREF-).*$", RegexOptions.Compiled);
foreach (Match ItemMatch in ItemRegex.Matches(subjectString))
{
Console.WriteLine(ItemMatch);
}