Using regex to split a formatted string to URL like StackOverFlow - c#

I'm trying to write a parser that will create links found in posted text that are formatted like so:
[Site Description](http://www.stackoverflow.com)
to be rendered as a standard HTML link like this:
<a href="http://www.stackoverflow.com">Site Description</a>
So far what I have is the expression listed below. It works on the example above, but it will not work if the URL has anything after the ".com". Obviously no single regular expression will find every URL, but I would like to match as many as I can.
(\[)([A-Za-z0-9 -_]*)(\])(\()((http|https|ftp)\://[A-Za-z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?)(\))
Any help would be greatly appreciated. Thanks.

Darn. It seems @Jerry and @MikeH beat me to it. My answer is best, however, as the link tags are all uppercase ;)
Find what: \[([^]]+)\]\(([^)]+)\)
Replace with: <A HREF="$2">$1</A>
http://regex101.com/r/cY7lF0

Well, you could try negated classes so you don't have to worry about the parsing of the url itself?
\[([^]]+)\]\(([^)]+)\)
And replace with:
<a href="$2">$1</a>
regex101 demo
Or maybe use only the beginning parts to identify a url?
\[([^]]+)\]\(((?:https?|ftp)://[^)]+)\)
The replace is the same.
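As a rough C# sketch of the second pattern above (the one that requires an http, https, or ftp scheme), the replacement could be wired up as follows. The helper name LinkifyMarkdown and the sample input are made up for illustration, and the bracket class is written as [^\]] so it does not rely on how the engine treats a leading ']'. Note that the sketch does not HTML-encode the captured text, so it is not safe for untrusted input as written.

```csharp
// Top-level program (assumes C# 9 or later).
using System;
using System.Text.RegularExpressions;

// Group 1: link text (anything except ']').
// Group 2: URL, which must start with http://, https:// or ftp://.
const string LinkPattern = @"\[([^\]]+)\]\(((?:https?|ftp)://[^)]+)\)";

// Hypothetical helper: turns [text](url) into an HTML anchor.
string LinkifyMarkdown(string text) =>
    Regex.Replace(text, LinkPattern, "<a href=\"$2\">$1</a>");

string input = "See [Site Description](http://www.stackoverflow.com/questions?tab=newest) for details.";
Console.WriteLine(LinkifyMarkdown(input));
// See <a href="http://www.stackoverflow.com/questions?tab=newest">Site Description</a> for details.
```

Because the URL group is just "everything up to the closing parenthesis", paths and query strings after the ".com" are matched without having to describe URL syntax in the regex.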

Related

How to change rel to nofollow with Regex - c#

I want to add rel="nofollow" to some URLs and leave other URLs as dofollow.
I tried to do it with this regex:
(<a\s*(?!.*\brel=)[^>]*)(href="https?://)((?!blogs.cc)[^"]+)"([^>]*)>
With it I can keep one URL as dofollow (in this example, "blogs.cc").
What do I do if I want to keep more than one URL as dofollow?
I tried:
(<a\s*(?!.*\brel=)[^>]*)(href="https?://)(((?!blogs.cc)[^"]+)||((?!wikipedia.org)[^"]+))"([^>]*)>
but I did not get the correct result.
What is the solution?

I solved it and am posting my solution here for anyone who has the same question.
Just use:
(<a\s*(?!.*\brel=)[^>]*)(href="https?://)((?!(?:blogs.cc|wikipedia.org|moreUrls.com))[^"]+)"([^>]*)>
C# sample code:
Regex.Replace(str, "(<a\\s*(?!.*\\brel=)[^>]*)(href=\"https?://)((?!(?:blogs.cc|wikipedia.org))[^\"]+)\"([^>]*)>", "$1$2$3\"$4 rel=\"nofollow\">")
I hope it is useful.
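One way to keep the list of dofollow sites maintainable is to build the alternation from an array instead of typing it into the pattern. The sketch below is only an illustration under a few assumptions: the array name dofollowHosts and the helper AddNoFollow are invented, the hosts are escaped with Regex.Escape so the dots are literal, and the rel check is narrowed to (?![^>]*\brel=) so it only looks inside the current tag. The lookahead only tests the start of the host, so "www.blogs.cc" would not be treated as dofollow.

```csharp
// Top-level program (assumes C# 9 or later).
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical list of hosts that should stay dofollow.
string[] dofollowHosts = { "blogs.cc", "wikipedia.org" };

// Escape each host so '.' is matched literally, then join into one alternation.
string hostAlternation = string.Join("|", dofollowHosts.Select(h => Regex.Escape(h)));

// Group 1: "<a " plus any attributes before href (tag must not already contain rel=).
// Group 2: href="http:// or href="https://
// Group 3: rest of the URL, rejected if it starts with a dofollow host.
// Group 4: any attributes after href.
string pattern =
    @"(<a\s+(?![^>]*\brel=)[^>]*)(href=""https?://)((?!(?:" + hostAlternation + @"))[^""]+)""([^>]*)>";

string AddNoFollow(string html) =>
    Regex.Replace(html, pattern, "$1$2$3\"$4 rel=\"nofollow\">");

string sample = "<a href=\"http://example.com/page\">x</a> <a href=\"http://blogs.cc/post\">y</a>";
Console.WriteLine(AddNoFollow(sample));
// <a href="http://example.com/page" rel="nofollow">x</a> <a href="http://blogs.cc/post">y</a>
```

Adding another dofollow site is then a matter of adding one entry to the array.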

How can I transform this url with REGEX?

I have a dynamic web app built using DotNetNuke that uses the following url format:
/SeoDummy.aspx?template={VAR1}&keywords={VAR2}
My user friendly url format is like this:
http://domain.com/.{VAR1}/{VAR2}
I am really terrible with REGEX and need to somehow detect when the user friendly URL is requested and rewrite it to the dynamic web app URL. I have tried the following, but it is not catching it on the site; it just returns a 404:
.*/^([^/]+)/([^/]+)/?$
I am sure those of you who know regex will find my attempt silly, but regex is my kryptonite!
Thanks for any help that can be offered.

Since you are using a custom URL format, I guess a regex would be better than using the URI class.
In your regex you have misplaced the ^. The regex should be
^https?://domain[.]com/[.]([^/]+)/([^/]+)/?$

I have not tested this, but give it a shot and tell me how it works out:
domain[.]com/\.([^/]+)/([^/]+)/?$
It looks like you had it mostly right except for the first caret, which marks the beginning of the string... and that can never match since you specified .* right in front of it! Also, you missed the period in front of {VAR1} (unless that is a typo?).
I also wouldn't put .* at the beginning, because then you could end up capturing VAR1 = domain.com and VAR2 = something that is actually VAR1.
If you want to become immune to your kryptonite, then this website is really good for looking up stuff:
http://www.regular-expressions.info/reference.html
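To see how the two capture groups map onto the dynamic URL, here is a small C# sketch that uses the anchored pattern from the first answer. The helper name RewriteFriendlyUrl and the sample URL are hypothetical, and this only shows the regex mechanics: how the rule is actually registered in DotNetNuke is not covered here, and a real rewrite rule may be matched against the path only rather than the full URL.

```csharp
// Top-level program (assumes C# 9 or later).
using System;
using System.Text.RegularExpressions;

// Group 1 = {VAR1} (after the literal "/."), group 2 = {VAR2}.
var friendlyUrl = new Regex(@"^https?://domain[.]com/[.]([^/]+)/([^/]+)/?$");

// Hypothetical helper: returns the dynamic URL, or the input unchanged if it is not a friendly URL.
string RewriteFriendlyUrl(string url)
{
    Match m = friendlyUrl.Match(url);
    if (!m.Success)
    {
        return url;
    }

    return "/SeoDummy.aspx?template=" + Uri.EscapeDataString(m.Groups[1].Value)
         + "&keywords=" + Uri.EscapeDataString(m.Groups[2].Value);
}

Console.WriteLine(RewriteFriendlyUrl("http://domain.com/.products/blue-widgets"));
// /SeoDummy.aspx?template=products&keywords=blue-widgets
```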

Issue regarding Regular Expression

I want to match every "the act" outside the tags in the *.sgm file that my professor gave me. I know that we could use an XML parser, but our goal is to learn regex.
This is my current regex:
(?<![""=<\/])\bthe act\b(?!\>)
The problem is with this example:
<ptext>Test example the act example</ptext>
My regex matches "the act". And that is correct.
But if I now try this example:
<ptext tags="Test the act">Example the act</ptext>
The regex will match two occurrences of "the act": the one inside the tag attribute and the one outside. I don't want to match any "the act" inside a tag. How can I do that? Thanks.

Maybe this will work:
(?<=\>[^>]*)the act(?=[^<]*\<)
It should work if the regex engine allows variable-length lookbehind, which I think C#'s engine does.
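The .NET regex engine does accept variable-length lookbehind, so the idea can be tried directly in C#. The following is a minimal sketch, not a general SGML solution: it uses the pattern from this answer with the unnecessary backslashes before < and > dropped, and the variable names are made up. It can still produce a false match when an attribute value follows an earlier closing tag, so treat it as a learning exercise.

```csharp
// Top-level program (assumes C# 9 or later).
using System;
using System.Text.RegularExpressions;

// Lookbehind: a '>' earlier with no other '>' in between (we are after a tag's end).
// Lookahead: a '<' later with no other '<' in between (we are before the next tag's start).
var outsideTags = new Regex(@"(?<=>[^>]*)the act(?=[^<]*<)");

string sample = "<ptext tags=\"Test the act\">Example the act</ptext>";

foreach (Match m in outsideTags.Matches(sample))
{
    Console.WriteLine($"Matched \"{m.Value}\" at index {m.Index}");
}
```

For the sample line, only the occurrence after "Example" is reported; the one inside the tags attribute has no '>' before it, so the lookbehind fails.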

Best way to get links from strings that contain them

I have a list of links, but I need to filter out and extract the correct links from the string.
An extracted link should start with mywebsite.com and end with a 9-digit number followed by .html.
The links are strings, and each extracted link should also be a string.
Example:
http://blah.com?f=www.mywebsite.com/sdfsf/sdfsdf/sdfsdfsdf/123456789.html&sdfsdf/sf/sdfsd8sdfsdfsdf
and so on...
From this, regex must extract
mywebsite.com/sdfsf/sdfsdf/sdfsdfsdf/123456789.html
This should match the number at the end:
@"[0-9]{9}"
but I am very new to regex and trying to learn how to use it properly.

Parsing HTML with regexes is usually a bad idea. For your particular example, you can use:
(mywebsite.com/(.+?)\d{9})
but as Andrew said, using a regex for doing what you want is not really necessary.

/mywebsite\.com\/[a-zA-Z0-9\/]*[0-9]{9}\.html/
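A short C# sketch of the last pattern, applied to the example string from the question. The surrounding / delimiters are dropped because .NET patterns do not use them, which also means the forward slashes need no escaping; the variable names are illustrative only.

```csharp
// Top-level program (assumes C# 9 or later).
using System;
using System.Text.RegularExpressions;

// "mywebsite.com/", then any path made of letters, digits and slashes,
// then exactly nine digits followed by ".html".
var linkPattern = new Regex(@"mywebsite\.com/[A-Za-z0-9/]*[0-9]{9}\.html");

string input = "http://blah.com?f=www.mywebsite.com/sdfsf/sdfsdf/sdfsdfsdf/123456789.html&sdfsdf/sf/sdfsd8sdfsdfsdf";

Match m = linkPattern.Match(input);
Console.WriteLine(m.Success ? m.Value : "(no match)");
// mywebsite.com/sdfsf/sdfsdf/sdfsdfsdf/123456789.html
```

Use linkPattern.Matches(input) instead of Match if one string can contain several such links.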

Deal with '#' through regex

Quick question: I have been trying to match any word containing a '#' in a list of strings and remove it, but I don't know how to handle it. I have been playing around on http://regexhero.net/tester/ but to no avail.
Essentially, if it comes across #ff or wha#s up, I will just Regex.Replace them.
Any ideas on the regular expression to use?
Thanks.

Don't use regex - just use string.replace - it's a lot faster.

I have a previous answer that covers some hashtag matching approaches.
In summary, if you are pulling statuses containing hashtags from Twitter, you no longer need to find them yourself. You can now specify the include_entities parameter to have Twitter automatically call out mentions, links, and hashtags (if the method you are calling, like statuses/show, supports this parameter).
If you just need the regular expression to locate the hashtags and capture its elements, Twitter provides it in an open source library that contains the following pattern.
(^|[^0-9A-Z&/]+)(#|\uFF03)([0-9A-Z_]*[A-Z_]+[a-z0-9_\\u00c0-\\u00d6\\u00d8-\\u00f6\\u00f8-\\u00ff]*)
More detail and additional links are provided in the original answer.

So you're trying to remove any words containing a #?
If so, give this a try...
\w*#\w*
And replace with nothing, like so...
http://regexhero.net/tester/?id=cda1e713-bdab-4aa2-b63d-a87e9b2c9bce
apple# orange ban#ana becomes orange
But if you're simply trying to remove all instances of #, then String.Replace is the better choice. myString = myString.Replace("#", "");
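The two cases can be compared side by side in C#. This is a minimal sketch: the helper name RemoveHashWords is invented, and the extra whitespace cleanup is an addition of mine, since removing whole words otherwise leaves the surrounding spaces behind.

```csharp
// Top-level program (assumes C# 9 or later).
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: removes every word that contains '#', then tidies the spacing.
string RemoveHashWords(string text)
{
    string stripped = Regex.Replace(text, @"\w*#\w*", "");
    return Regex.Replace(stripped, @"\s+", " ").Trim();
}

string input = "apple# orange ban#ana";

// Remove whole words containing '#'.
Console.WriteLine(RemoveHashWords(input));   // orange

// Remove only the '#' characters themselves; no regex needed.
Console.WriteLine(input.Replace("#", ""));   // apple orange banana
```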
