I have a regex pattern that extracts the URL and link text so that I can turn a custom tag into a link tag.
When I try the pattern in an online checker it finds 4 matches, but when I run my code, C# finds only one match.
Regex rgx = new Regex(@"(\[)+(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)( )(.)+(\])");
The entry looks like this:
[http://facebook.com/ LinkText]
[http://youtube.com/ LinkText]
[http://instagram.com/ LinkText]
[https://stackoverflow.com/users/1131979/cagri-d-kaynar LinkText]
My code:
Regex rgx =
    new Regex(@"(\[)+(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)( )(.)+(\])");
foreach (Match match in rgx.Matches(entry))
{
    var matchv = match.Value;
    /* extract URL and link text from match value */
    var bknz =
        String.Format("{1}", cc[0], cc[1]);
    entry = rgx.Replace(entry, bknz);
}
What's wrong with my code? Did I miss a flag or something?
I was replacing \r\n with <br /> before running the regex, and that is what caused only one match to be found.
I now do the replacement after the regex matching, and it works well.
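The reason the order matters can be shown with a small sketch. In .NET, `.` does not match `\n`, so while the line breaks are still in place each greedy `.*` stops at the end of its own line. Once they are replaced with `<br />`, the entry is a single line and one match runs from the first `[` to the last `]`. The class name `GreedyMatchDemo` and the `Bounded` pattern are illustrative assumptions, not part of the original code; `Bounded` is just one possible way to make the match independent of line breaks.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyMatchDemo
{
    // The pattern from the question, unchanged.
    public const string Original =
        @"(\[)+(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)( )(.)+(\])";

    // Hypothetical alternative: negated character classes cannot run past
    // the closing bracket, so line breaks no longer matter.
    public const string Bounded =
        @"\[(?<url>https?://[^\s\]]+)\s+(?<text>[^\]]+)\]";

    public static int Count(string pattern, string input) =>
        Regex.Matches(input, pattern).Count;

    public static void Main()
    {
        string entry =
            "[http://facebook.com/ LinkText]\r\n" +
            "[http://youtube.com/ LinkText]\r\n" +
            "[http://instagram.com/ LinkText]\r\n" +
            "[https://stackoverflow.com/users/1131979/cagri-d-kaynar LinkText]";

        // Same text after the line breaks have been turned into <br />.
        string oneLine = entry.Replace("\r\n", "<br />");

        Console.WriteLine(Count(Original, entry));   // 4
        Console.WriteLine(Count(Original, oneLine)); // 1
        Console.WriteLine(Count(Bounded, oneLine));  // 4
    }
}
```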
Related
I want to filter some forum entry content out of a forum page. The forum entries are located between two blockquote elements (as seen in the Regex). I want to filter the content out with a Regex. This is the code I'm using:
string pattern = @"(<blockquote class=""postcontent restore "">)(.*?)(</blockquote>)";
Regex test = new Regex(pattern, RegexOptions.IgnorePatternWhitespace);
MatchCollection m = test.Matches(downloadString);
var arr = m
    .Cast<Match>()
    .Select(n => n.Value)
    .ToArray();
foreach (string match in arr)
{
    Console.WriteLine(match);
}
Console.ReadLine();
I have this sample for example:
<blockquote class="postcontent restore ">
<br>
Some Stuff
<br>
Some Stuff #2
<br>
</blockquote>
The problem I got is that the returned array is empty. Any idea what could be wrong? I guess it's because of the whitespaces but I have no idea how to "ignore" them.
The . metacharacter matches any character except a newline.
You can use this to include line breaks:
(<blockquote class=\"postcontent restore \">)(\n*.*)(<\/blockquote>)
Your pattern also did not escape the double quotes and forward slashes.
EDIT: Sorry, the @ is there, so the final version should be the one below.
EDIT 2: Full tested source code. It is your responsibility to check IsMatch and to guard against null references.
string pattern = @"(<blockquote class=\""postcontent restore \"">)+((\n*)(.*))+(</blockquote>)";
Regex test = new Regex(pattern);
MatchCollection matches = test.Matches(downloadString);
StringBuilder xmlContentBuilder = new StringBuilder();
foreach (Capture capture in matches[0].Groups[2].Captures)
{
    xmlContentBuilder.Append(capture);
}
Console.WriteLine(xmlContentBuilder);
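Another way to let the pattern cross line breaks is RegexOptions.Singleline, which makes `.` match `\n` as well. The sketch below assumes the markup looks exactly like the sample in the question; the class and method names (`BlockquoteDemo`, `ExtractPosts`) are made up for illustration. It also drops RegexOptions.IgnorePatternWhitespace, since that option removes the literal spaces from the pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BlockquoteDemo
{
    // Returns the trimmed inner text of every matching blockquote.
    public static string[] ExtractPosts(string html)
    {
        // "" is how a double quote is written inside a verbatim (@"...") string.
        const string pattern =
            @"<blockquote class=""postcontent restore "">(.*?)</blockquote>";

        // Singleline lets "." match newlines; the lazy ".*?" stops at the
        // first closing tag instead of the last one.
        MatchCollection matches = Regex.Matches(html, pattern, RegexOptions.Singleline);

        var posts = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            posts[i] = matches[i].Groups[1].Value.Trim();
        }
        return posts;
    }

    public static void Main()
    {
        string html =
            "<blockquote class=\"postcontent restore \">\n" +
            "<br>\nSome Stuff\n<br>\nSome Stuff #2\n<br>\n" +
            "</blockquote>";

        foreach (string post in ExtractPosts(html))
        {
            Console.WriteLine(post);
        }
    }
}
```

For anything beyond a fixed, known page layout, an HTML parser is usually more robust than a regex.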
I'm new to regex and having trouble getting a pattern right.
I have a request whose first line looks like this:
GET /someFolder/someSubfolder/someFile.fileExtenstion?param1=abc HTTP/1.1
I would like to check that it follows the correct pattern,
meaning the first word is GET, followed by a valid URL, then HTTP/version.
What I have so far is:
string input = line;
Match match = Regex.Match(input, @"GET /([A-Za-z0-9-.+!*'();:@&=+$,/?%#[]])\ HTTP/1.1",
    RegexOptions.IgnoreCase);
// Check the Match instance.
if (match.Success)
{
    string URL = match.Groups[1].Value;
}
But I get no match.
What am I missing?
You can simplify the regex a lot:
^GET.*HTTP\/1\.1$
^ anchors the regex at the start of the string.
.* matches any sequence of characters (except a newline).
$ anchors the regex at the end of the string, ensuring that nothing follows the matched text.
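As a rough sketch of how the simplified pattern could be used from C#: the slash is left unescaped here because .NET patterns are not wrapped in `/` delimiters. The names `RequestLineCheck` and `IsGetRequestLine` are hypothetical, and the pattern only validates the overall shape of the line, not the URL itself.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RequestLineCheck
{
    // Simplified pattern from above: starts with GET, ends with HTTP/1.1.
    private static readonly Regex GetLine = new Regex(@"^GET.*HTTP/1\.1$");

    public static bool IsGetRequestLine(string line) => GetLine.IsMatch(line);

    public static void Main()
    {
        string line =
            "GET /someFolder/someSubfolder/someFile.fileExtenstion?param1=abc HTTP/1.1";

        Console.WriteLine(IsGetRequestLine(line));               // True
        Console.WriteLine(IsGetRequestLine("POST /x HTTP/1.1")); // False
    }
}
```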
This is an old question, but it deserves a new answer for anyone who wants to match an HTTP start line correctly and extract values from it.
The .* pattern only validates the line; it does not capture the method, URL, or version separately. Escaping the forward slash is also unnecessary in C#, because .NET patterns are not wrapped in / delimiters.
Here is sample code with named capturing groups:
var httpRegex = new Regex(@"^(?<method>[a-zA-Z]+)\s(?<url>.+)\sHTTP/(?<major>\d)\.(?<minor>\d+)$");
var match = httpRegex.Match("GET http://www.google.com HTTP/1.1");
if (match.Success)
{
    Console.WriteLine(
        $"Method: {match.Groups["method"].Value}\r\n" +
        $"Url: {match.Groups["url"].Value}\r\n" +
        $"httpVersion: HTTP/{match.Groups["major"].Value}.{match.Groups["minor"].Value}"
    );
}
Escaping the forward slash is required in languages like PHP and JavaScript, where / can delimit the pattern. Here is the same pattern for PHP, with escaping: https://regex101.com/r/2l7k83/1/
I am trying to write code that returns the matches in a list, but without the match tags. So far I have built the following code in a WP7 application written in C#:
public static MatchCollection MatchTags(string content, string string_start, string string_end)
{
    MatchCollection matches = Regex.Matches(content, string_start + "(.*?)" + string_end, RegexOptions.IgnoreCase);
    return matches;
}
So how can I return the matches without string_start and string_end (the match tags), without calling a replace function after extracting the matches?
Use lookarounds:
String.Format("(?<={0}).*?(?={1})",string_start,string_end);
You can also use groups: in your regex, (.*?) already captures the content in group 1, so there is no need for lookarounds then.
MatchTags(content,start,end).Cast<Match>()
.Select(x=>x.Groups[1].Value);
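A self-contained sketch of the group-based approach follows. It assumes the start and end markers are literal text rather than regex fragments, so it passes them through Regex.Escape; that step is an addition for safety and is not in the original code. The names `TagMatcher` and `InnerMatches` are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TagMatcher
{
    // Returns only the text between the markers, taken from capture group 1.
    public static string[] InnerMatches(string content, string start, string end)
    {
        // Assumption: the markers are literal text, so escape any regex
        // metacharacters they contain (for example the "[" in "[b]").
        string pattern = Regex.Escape(start) + "(.*?)" + Regex.Escape(end);

        return Regex.Matches(content, pattern, RegexOptions.IgnoreCase)
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToArray();
    }

    public static void Main()
    {
        string[] inner = InnerMatches("[b]one[/b] and [b]two[/b]", "[b]", "[/b]");
        Console.WriteLine(string.Join(", ", inner)); // one, two
    }
}
```

Without the escaping, a marker such as "[b]" would be read as a character class matching the single letter b.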
It works when I read the result with the following code:
string my_string_no_tags = matches[number].Groups[1].Value;
Consider the following code...
MatchCollection matches = Regex.Matches(content, string.Format("(?<={0}).*?(?={1})", string_start, string_end), RegexOptions.IgnoreCase);
return matches;
Good Luck!
How do I change an absolute URL within a paragraph:
<p>http://www.google.com</p>
into an HTML link within the paragraph:
<p><a href="http://www.google.com">http://www.google.com</a></p>
There can be a lot of paragraphs. I want the regex to cut the generic URL value out of this: <p>url</p>, and put it into a template like this: <p><a href="url">url</a></p>
Is there a short way to do it? Can it be done using the Regex.Replace() method?
BTW: Regular expression used for absolute urls matching can be like this: ^(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?$ (taken from msdn)
Try to use this regex:
(?<!\")(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?(?!\")
This avoids matching URLs that are enclosed in double quotes, as in <a href="http://www.google.com">.
Sample code:
var inputString = @"<p>http://www.google.com</p><p>my web link</p>";
var pattern = @"(?<url>(?<!"")(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?(?!""))";
var result = Regex.Replace(inputString, pattern, "<a href=\"${url}\">${url}</a>");
Explanation:
(?<!subexpression) Zero-width negative lookbehind assertion.
(?!subexpression) Zero-width negative lookahead assertion.
(?<name>subexpression) Captures the matched subexpression into a named group.
From your regex, remove the leading ^ and the trailing $. Together they mean "match the whole input string from start to end".
string regexPattern = @"(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?";
string input = @"<p>http://www.google.com</p>";
var reg = new Regex(regexPattern, RegexOptions.IgnoreCase);
// $0 - substitution, refers to the text matched by the whole pattern
var output = reg.Replace(input, "<a href=\"$0\">$0</a>");
More about substitutions: http://msdn.microsoft.com/en-us/library/ewy2t5e0.aspx
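Putting the pieces together, a minimal end-to-end sketch might look like the following. It reuses the quote-guarded pattern from the first answer and the `$0` substitution from the second; the anchor-tag template is the one described in the question, and the names `LinkDemo` and `Linkify` are made up for illustration. URLs that already sit inside a quoted attribute are left alone because of the lookbehind.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinkDemo
{
    // Quote-guarded URL pattern; "" is a literal double quote in a verbatim string.
    private const string UrlPattern =
        @"(?<!"")(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?(?!"")";

    // Wraps every bare URL in an anchor tag; $0 is the whole matched URL.
    public static string Linkify(string html) =>
        Regex.Replace(html, UrlPattern, "<a href=\"$0\">$0</a>");

    public static void Main()
    {
        string input =
            "<p>http://www.google.com</p>" +
            "<p><a href=\"http://example.com\">my web link</a></p>";

        // The first paragraph gets a link; the existing href is untouched.
        Console.WriteLine(Linkify(input));
    }
}
```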
I need to match all the whole words containing a given string.
string s = "ABC.MYTESTING
XYZ.YOUTESTED
ANY.TESTING";
Regex r = new Regex("(?<TM>[!\..]*TEST.*)", ...);
MatchCollection mc = r.Matches(s);
I need the result to be:
MYTESTING
YOUTESTED
TESTING
But I get:
TESTING
TESTED
.TESTING
How do I achieve this with regular expressions?
Edit: Extended sample string.
If you were looking for all words including 'TEST', you should use
@"(?<TM>\w*TEST\w*)"
\w matches word characters, roughly [A-Za-z0-9_]. In .NET it also matches Unicode letters and digits unless RegexOptions.ECMAScript is set.
Keep it simple: why not just try \w*TEST\w* as the match pattern.
I get the results you are expecting with the following:
string s = @"ABC.MYTESTING
XYZ.YOUTESTED
ANY.TESTING";
var m = Regex.Matches(s, @"(\w*TEST\w*)", RegexOptions.IgnoreCase);
Try using \b. It's the regex anchor for a word boundary. If you wanted to match whole words you could use:
/\b[a-z]+\b/i
BTW, .NET doesn't need the surrounding /, and the i is just a case-insensitive match flag.
.NET alternative:
var re = new Regex(@"\b[a-z]+\b", RegexOptions.IgnoreCase);
Using groups, I think you can achieve it.
string s = @"ABC.TESTING
XYZ.TESTED";
Regex r = new Regex(@"(?<TM>[!\..]*(?<test>TEST.*))", RegexOptions.Multiline);
var mc = r.Matches(s);
foreach (Match match in mc)
{
    Console.WriteLine(match.Groups["test"]);
}
Works exactly like you want.
BTW, your regular expression pattern should be a verbatim string (@"")
Regex r = new Regex(@"(?<TM>[^.]*TEST.*)", RegexOptions.IgnoreCase);
First, as @manojlds said, you should use verbatim strings for regexes whenever possible. Otherwise you'll have to use two backslashes in most of your regex escape sequences, not just one (e.g. [!\\..]*).
Second, if you want to match anything but a dot, that part of the regex should be [^.]*. ^ is the metacharacter that inverts the character class, not !, and . has no special meaning in that context, so it doesn't need to be escaped. As written, [!\..] matches either ! or a literal dot. But you should probably use \w* instead, or even [A-Z]*, depending on what exactly you mean by "word".
Regex r = new Regex(@"(?<TM>[A-Z]*TEST[A-Z]*)", RegexOptions.IgnoreCase);
That way you don't need to bother with word boundaries, though they don't hurt:
Regex r = new Regex(@"(?<TM>\b[A-Z]*TEST[A-Z]*\b)", RegexOptions.IgnoreCase);
Finally, if you're always taking the whole match anyway, you don't need to use a capturing group:
Regex r = new Regex(@"\b[A-Z]*TEST[A-Z]*\b", RegexOptions.IgnoreCase);
The matched text will be available via Match's Value property.
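For completeness, here is a small runnable sketch of that last pattern applied to the sample from the question. The names `WordFinder` and `WordsContainingTest` are hypothetical, and the sample string uses explicit `\n` line breaks so the result does not depend on the source file's line endings.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordFinder
{
    // Returns every run of letters that contains "TEST", ignoring case.
    public static string[] WordsContainingTest(string text)
    {
        var r = new Regex(@"\b[A-Z]*TEST[A-Z]*\b", RegexOptions.IgnoreCase);
        return r.Matches(text)
                .Cast<Match>()
                .Select(m => m.Value) // the whole match, no group needed
                .ToArray();
    }

    public static void Main()
    {
        string s = "ABC.MYTESTING\nXYZ.YOUTESTED\nANY.TESTING";

        // Prints MYTESTING, YOUTESTED and TESTING on separate lines.
        foreach (string word in WordsContainingTest(s))
        {
            Console.WriteLine(word);
        }
    }
}
```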