I'm trying to write a parser that will create links found in posted text that are formatted like so:
[Site Description](http://www.stackoverflow.com)
to be rendered as a standard HTML link like this:
<a href="http://www.stackoverflow.com">Site Description</a>
So far I have the expression listed below. It works on the example above, but it will not work if the URL has anything after the ".com". Obviously no single regular expression will find every URL, but I would like to match as many as I can.
(\[)([A-Za-z0-9 -_]*)(\])(\()((http|https|ftp)\://[A-Za-z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?)(\))
Any help would be greatly appreciated. Thanks.
Darn. It seems @Jerry and @MikeH beat me to it. My answer is best, however, as the link tags are all uppercase ;)
Find what: \[([^]]+)\]\(([^)]+)\)
Replace with: <A HREF="$2">$1</A>
http://regex101.com/r/cY7lF0
Well, you could try negated character classes, so you don't have to worry about parsing the URL itself:
\[([^]]+)\]\(([^)]+)\)
And replace with:
<a href="$2">$1</a>
regex101 demo
Or maybe use only the scheme at the beginning to identify a URL:
\[([^]]+)\]\(((?:https?|ftp)://[^)]+)\)
The replacement is the same.
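As a rough sketch of the replacement above, here is the second pattern applied with Python's `re` module. The function name `render_links` is made up for illustration, and the HTML escaping of the label and URL is an extra precaution that is not part of the original answers.

```python
import re
from html import escape

# Second pattern from the answer: the label is anything up to "]",
# and the URL must start with http://, https:// or ftp://.
MD_LINK_RE = re.compile(r'\[([^]]+)\]\(((?:https?|ftp)://[^)]+)\)')

def render_links(text):
    """Replace [label](url) with an HTML anchor (hypothetical helper)."""
    def repl(match):
        label, url = match.group(1), match.group(2)
        # Escaping is an assumption added here so stray quotes or
        # angle brackets cannot break the generated markup.
        return '<a href="{}">{}</a>'.format(escape(url, quote=True), escape(label))
    return MD_LINK_RE.sub(repl, text)

print(render_links("[Site Description](http://www.stackoverflow.com)"))
```

Because the URL part is just `[^)]+`, anything after the ".com" (paths, query strings) is accepted, which is the case the original expression missed.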
I want to add nofollow to some URLs, and I also want some URLs to stay dofollow.
I tried to do it with this regex:
(<a\s*(?!.*\brel=)[^>]*)(href="https?://)((?!blogs.cc)[^"]+)"([^>]*)>
With it I can keep one URL as dofollow (in this example, "blogs.cc").
What do I do if I want to keep more than one URL as dofollow?
I tried:
(<a\s*(?!.*\brel=)[^>]*)(href="https?://)(((?!blogs.cc)[^"]+)||((?!wikipedia.org)[^"]+))"([^>]*)>
but it did not give the correct result.
What is the solution?
I resolved it, and I am posting my solution here for anyone who has the same question.
Put all the dofollow hosts into a single alternation inside the negative lookahead:
(<a\s*(?!.*\brel=)[^>]*)(href="https?://)((?!(?:blogs\.cc|wikipedia\.org|moreUrls\.com))[^"]+)"([^>]*)>
C# sample code:
Regex.Replace(str, "(<a\\s*(?!.*\\brel=)[^>]*)(href=\"https?://)((?!(?:blogs\\.cc|wikipedia\\.org))[^\"]+)\"([^>]*)>", "$1$2$3\"$4 rel=\"nofollow\">")
I hope it is useful.
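For anyone who wants to try the idea outside C#, here is a minimal sketch in Python. The names `DOFOLLOW_HOSTS` and `add_nofollow` are invented for this example. The `rel=` check is also narrowed to `[^>]*` so it only looks inside the current tag, which is a small change from the pattern above.

```python
import re

# Hypothetical whitelist: links to these hosts are left untouched.
DOFOLLOW_HOSTS = ["blogs.cc", "wikipedia.org"]

# re.escape turns "blogs.cc" into "blogs\.cc" so the dot is literal.
_allowed = "|".join(re.escape(host) for host in DOFOLLOW_HOSTS)

NOFOLLOW_RE = re.compile(
    r'(<a\s+(?![^>]*\brel=)[^>]*?)'      # 1: "<a " plus attributes before href, no rel= in the tag
    r'(href="https?://)'                 # 2: start of the href attribute
    r'((?!(?:' + _allowed + r'))[^"]+)'  # 3: host and path, not starting with a whitelisted host
    r'"([^>]*)>'                         # 4: attributes after href
)

def add_nofollow(html):
    """Append rel="nofollow" to every matching anchor tag."""
    return NOFOLLOW_RE.sub(r'\1\2\3"\4 rel="nofollow">', html)

print(add_nofollow('<a href="http://example.com/x">e</a>'))
print(add_nofollow('<a href="http://blogs.cc/post">b</a>'))
```

As with the original pattern, the lookahead only checks how the URL starts, so `www.blogs.cc` would not count as whitelisted unless it is added to the list.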
I have a dynamic web app built using DotNetNuke that uses the following url format:
/SeoDummy.aspx?template={VAR1}&keywords={VAR2}
My user friendly url format is like this:
http://domain.com/.{VAR1}/{VAR2}
I am really terrible with regex and need to somehow detect when the user-friendly URL is requested and rewrite it to the dynamic web app URL. I have tried the following, but it is not catching the request on the site; it just returns a 404:
.*/^([^/]+)/([^/]+)/?$
I am sure those of you who know regex will find my attempt silly, but regex is my kryptonite!
Thanks for any help that can be offered.
Since you are using a custom URL format, I guess a regex would be better than using the Uri class.
In your regex you have misplaced the ^. The regex should be:
^https?://domain[.]com/[.]([^/]+)/([^/]+)/?$
I have not tested this, but give it a shot and tell me how it works out:
domain[.]com/\.([^/]+)/([^/]+)/?$
It looks like you had it mostly right except for the first caret, which marks the beginning of the string. That can never match, since you specified .* right in front of it! You also missed the period in front of {VAR1} (unless that is a typo?).
I also wouldn't put .* at the beginning, because then you could end up capturing VAR1 = domain.com and VAR2 = something that is actually VAR1.
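To check the capture groups before wiring the pattern into the site, a quick sketch like the one below can help. It is plain Python, not DotNetNuke configuration, and `rewrite_friendly_url` is a made-up helper name. It also assumes the rule is matched against the full URL; if your rewriter only sees the path, drop the scheme and host part.

```python
import re

# Anchored version of the pattern: scheme, host, then "/.{VAR1}/{VAR2}".
FRIENDLY_RE = re.compile(r'^https?://domain[.]com/[.]([^/]+)/([^/]+)/?$')

def rewrite_friendly_url(url):
    """Return the dynamic URL for a friendly URL, or None if it does not match."""
    match = FRIENDLY_RE.match(url)
    if match is None:
        return None
    template, keywords = match.groups()
    return "/SeoDummy.aspx?template={}&keywords={}".format(template, keywords)

print(rewrite_friendly_url("http://domain.com/.shoes/red-boots"))
```

Because each group is `[^/]+`, VAR1 and VAR2 can never swallow a slash, which avoids the mis-capture described above.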
If you want to become immune to your kryptonite, then this website is really good for looking up stuff:
http://www.regular-expressions.info/reference.html
I want to match every "the act" that is outside the tags in the *.sgm file my professor gave me. I know that we could use an XML parser, but our goal is to learn regex only.
This is my current regex:
(?<![""=<\/])\bthe act\b(?!\>)
Take this example:
<ptext>Test example the act example</ptext>
My regex matches "the act", and that is correct.
But now I try this example:
<ptext tags="Test the act">Example the act</ptext>
The regex matches two occurrences of "the act": the one inside the tag attribute and the one outside. I don't want to match any "the act" inside a tag. How can I do that? Thanks.
Maybe this will work: (?<=>[^<>]*)the act(?=[^<>]*<) It requires a regex engine that allows variable-length lookbehind, which the .NET engine used by C# does.
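If your engine does not allow variable-length lookbehind (Python's built-in `re` is one example), a lookahead-only variant is a possible alternative. This is a different technique from the one above: it rejects a match when the next angle bracket after it is a closing `>`, which would mean the match sits inside a tag. The helper name is hypothetical, and the sketch assumes attribute values contain no literal `<` or `>`.

```python
import re

# "the act" not followed by ">" before any other "<" or ">",
# i.e. not inside an opening tag such as <ptext tags="...">.
OUTSIDE_TAG_RE = re.compile(r'\bthe act\b(?![^<>]*>)')

def find_outside_tags(sgml):
    """Return the start offsets of every "the act" found outside tags."""
    return [match.start() for match in OUTSIDE_TAG_RE.finditer(sgml)]

sample = '<ptext tags="Test the act">Example the act</ptext>'
print(find_outside_tags(sample))
```

On the sample, only the occurrence after "Example" is reported; the one inside the `tags` attribute is skipped.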
I have a list of links, but I need to filter it and extract the correct links from the string.
Each extracted link should start with mywebsite.com and end with a 9-digit number followed by .html.
The links are strings, and the results should be extracted to strings.
Example:
http://blah.com?f=www.mywebsite.com/sdfsf/sdfsdf/sdfsdfsdf/123456789.html&sdfsdf/sf/sdfsd8sdfsdfsdf
and so on...
From this, the regex must extract:
mywebsite.com/sdfsf/sdfsdf/sdfsdfsdf/123456789.html
This should match the number at the end:
@"[0-9]{9}", but I am very new to regex and still learning how to use it properly.
Parsing HTML with regexes is usually a bad idea. For your particular example, you can use:
(mywebsite\.com/(.+?)\d{9}\.html)
but as Andrew said, using a regex for doing what you want is not really necessary.
/mywebsite\.com\/[a-zA-Z0-9\/]*[0-9]{9}\.html/
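Here is a small sketch of that last pattern in use, written in Python with the surrounding `/` delimiters dropped. The function name `extract_links` is just for illustration.

```python
import re

# Starts at "mywebsite.com/", allows letters, digits and slashes in the path,
# and must end with nine digits followed by ".html".
LINK_RE = re.compile(r'mywebsite\.com/[a-zA-Z0-9/]*[0-9]{9}\.html')

def extract_links(text):
    """Return every matching mywebsite.com link found in the text."""
    return LINK_RE.findall(text)

raw = ("http://blah.com?f=www.mywebsite.com/sdfsf/sdfsdf/sdfsdfsdf/"
       "123456789.html&sdfsdf/sf/sdfsd8sdfsdfsdf")
print(extract_links(raw))
```

Note that the path class also allows digits, so a file name with more than nine digits would still match. Tighten the pattern if exactly nine is a hard requirement.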
Quick question: I have been trying to match any word containing a '#' in a list of strings and remove it, but I don't know how to handle it. I have been playing around on http://regexhero.net/tester/, but to no avail.
Essentially, if it comes across #ff or wha#s up, I will just Regex.Replace them.
Any ideas on the regular expression to use?
Thanks.
Don't use regex - just use string.replace - it's a lot faster.
I have a previous answer that covers some hashtag matching approaches.
In summary, if you are pulling statuses containing hashtags from Twitter, you no longer need to find them yourself. You can now specify the include_entities parameter to have Twitter automatically call out mentions, links, and hashtags (if the method you are calling, like statuses/show, supports this parameter).
If you just need the regular expression to locate the hashtags and capture their elements, Twitter provides it in an open source library that contains the following pattern.
(^|[^0-9A-Z&/]+)(#|\uFF03)([0-9A-Z_]*[A-Z_]+[a-z0-9_\u00c0-\u00d6\u00d8-\u00f6\u00f8-\u00ff]*)
More detail and additional links are provided in the original answer.
So you're trying to remove any words containing a #?
If so, give this a try...
\w*#\w*
And replace with nothing, like so...
http://regexhero.net/tester/?id=cda1e713-bdab-4aa2-b63d-a87e9b2c9bce
apple# orange ban#ana becomes orange
But if you're simply trying to remove all instances of #, then String.Replace is the better choice. myString = myString.Replace("#", "");
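The two options can be compared side by side in a short sketch. It is written in Python rather than C#, both function names are made up, and the whitespace clean-up is an extra step that the answer above does not include.

```python
import re

# Any run of word characters that contains a "#".
HASH_WORD_RE = re.compile(r'\w*#\w*')

def remove_hash_words(text):
    """Drop every word containing '#', then tidy the leftover spaces."""
    stripped = HASH_WORD_RE.sub('', text)
    # Collapsing whitespace is an added assumption, not part of the regex answer.
    return ' '.join(stripped.split())

def remove_hashes(text):
    """Drop only the '#' characters, keeping the rest of each word."""
    return text.replace('#', '')

print(remove_hash_words("apple# orange ban#ana"))
print(remove_hashes("wha#s up"))
```

Use the regex when the whole word should go, and the plain string replace when only the character should.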