How to change rel to nofollow with Regex - C#

I want to add rel="nofollow" to some URLs while leaving other URLs as dofollow.
I tried to do it with this regex:
(<a\s*(?!.*\brel=)[^>]*)(href="https?://)((?!blogs.cc)[^"]+)"([^>]*)>
With this I can keep one URL dofollow (in this example, "blogs.cc").
What do I do if I want to keep more than one URL dofollow?
I tried:
(<a\s*(?!.*\brel=)[^>]*)(href="https?://)(((?!blogs.cc)[^"]+)||((?!wikipedia.org)[^"]+))"([^>]*)>
but it did not give the correct result.
What is the solution?

I resolved it and am posting my solution here for anybody who has the same question.
The alternation in the question fails because a URL rejected by one branch is still accepted by the other. Put all the dofollow hosts into a single negative lookahead instead:
(<a\s*(?!.*\brel=)[^>]*)(href="https?://)((?!(?:blogs\.cc|wikipedia\.org|moreUrls\.com))[^"]+)"([^>]*)>
C# sample code:
Regex.Replace(str, "(<a\\s*(?!.*\\brel=)[^>]*)(href=\"https?://)((?!(?:blogs\\.cc|wikipedia\\.org))[^\"]+)\"([^>]*)>", "<a $2$3\" $4 rel=\"nofollow\">")
I hope it is useful.
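For anyone who wants to try the idea end to end, here is a minimal sketch, not the exact code from the answer. The class name NofollowRewriter and the DofollowHosts list are made up for illustration. It also varies the pattern slightly: \s+ after <a so that tags such as <abbr> are not touched, [^>]* inside the rel lookahead so the check stays within one tag, and $1 in the replacement so attributes in front of href survive.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NofollowRewriter
{
    // Hypothetical whitelist: links to these hosts stay "dofollow".
    private static readonly string[] DofollowHosts = { "blogs.cc", "wikipedia.org" };

    // Built once. Regex.Escape keeps the dots in the host names literal.
    private static readonly Regex ExternalLink = new Regex(
        @"(<a\s+(?![^>]*\brel=)[^>]*)(href=""https?://)((?!(?:"
        + string.Join("|", DofollowHosts.Select(Regex.Escape))
        + @"))[^""]+)""([^>]*)>",
        RegexOptions.IgnoreCase);

    public static string AddNofollow(string html)
    {
        // $1 keeps attributes that precede href, $4 keeps those that follow it.
        return ExternalLink.Replace(html, "$1$2$3\"$4 rel=\"nofollow\">");
    }
}
```

Note that the lookahead only tests how the host starts, so a link to www.blogs.cc would still get rel="nofollow" unless that spelling is added to the list. For arbitrary real-world HTML, an HTML parser is likely to be more robust than any regex.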

Related

Using regex to split a formatted string to URL like StackOverFlow

I'm trying to write a parser that will create links found in posted text that are formatted like so:
[Site Description](http://www.stackoverflow.com)
to be rendered as a standard HTML link like this:
<a href="http://www.stackoverflow.com">Site Description</a>
So far what I have is the expression listed below. It works on the example above, but it will not work if the URL has anything after the ".com". Obviously no single regular expression will find every URL, but I would like to match as many as I can.
(\[)([A-Za-z0-9 -_]*)(\])(\()((http|https|ftp)\://[A-Za-z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?)(\))
Any help would be greatly appreciated. Thanks.
Darn. It seems @Jerry and @MikeH beat me to it. My answer is best, however, as the link tags are all uppercase ;)
Find what: \[([^]]+)\]\(([^)]+)\)
Replace with: <A HREF="$2">$1</A>
http://regex101.com/r/cY7lF0
Well, you could try negated classes so you don't have to worry about the parsing of the url itself?
\[([^]]+)\]\(([^)]+)\)
And replace with:
<a href="$2">$1</a>
regex101 demo
Or maybe use only the beginning parts to identify a url?
\[([^]]+)\]\(((?:https?|ftp)://[^)]+)\)
The replace is the same.
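A minimal C# sketch of the second pattern, in case it helps to see it wired into Regex.Replace. The class and method names are made up for illustration, and the closing bracket inside the character class is written as \] purely for readability.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MarkdownLinks
{
    // [text](url): group 1 is the link text, group 2 is the URL.
    // Only the scheme is checked; everything up to the closing ")" is taken as the URL.
    private static readonly Regex Link =
        new Regex(@"\[([^\]]+)\]\(((?:https?|ftp)://[^)]+)\)");

    public static string ToHtml(string text)
    {
        return Link.Replace(text, "<a href=\"$2\">$1</a>");
    }
}
```

This sketch does not HTML-encode the link text or the URL, so for untrusted input that step would still need to be added.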

How can I transform this url with REGEX?

I have a dynamic web app built using DotNetNuke that uses the following url format:
/SeoDummy.aspx?template={VAR1}&keywords={VAR2}
My user friendly url format is like this:
http://domain.com/.{VAR1}/{VAR2}
I am really terrible with regex and need to somehow detect when the user-friendly URL is requested and rewrite it to the dynamic web app URL. I have tried the following, but it is not catching the request on the site; it just returns a 404:
.*/^([^/]+)/([^/]+)/?$
I am sure those of you who know regex will find my attempt silly, but regex is my kryptonite!
Thanks for any help that can be offered.
Since you are using a custom URL format, I guess regex would be better than using the Uri class.
In your regex the ^ is misplaced. The regex should be
^https?://domain[.]com/[.]([^/]+)/([^/]+)/?$
I have not tested this, but give it a shot and tell me how it works out:
domain[.]com/\.([^/]+)/([^/]+)/?$
It looks like you had it mostly right except for the first caret, which marks the beginning of the string. That can never match, since you put .* right in front of it! You also missed the period in front of {VAR1} (unless that is a typo?).
I also wouldn't put .* at the beginning, because then you could end up capturing VAR1 = domain.com and VAR2 = something that is actually VAR1.
If you want to become immune to your kryptonite, then this website is really good for looking up stuff:
http://www.regular-expressions.info/reference.html
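As a rough illustration of the anchored pattern from the first answer, the sketch below maps a friendly URL onto the dynamic one with Regex.Replace. The class name FriendlyUrl is hypothetical, and how the rule is actually registered in DotNetNuke is not covered here; a rewriter that only sees the path would need the pattern adjusted accordingly.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FriendlyUrl
{
    // Group 1 is {VAR1} (after the literal dot), group 2 is {VAR2}.
    private static readonly Regex Friendly = new Regex(
        @"^https?://domain\.com/\.([^/]+)/([^/]+)/?$",
        RegexOptions.IgnoreCase);

    // Returns the dynamic URL, or the input unchanged when it is not a friendly URL.
    public static string Rewrite(string url)
    {
        return Friendly.Replace(url, "/SeoDummy.aspx?template=$1&keywords=$2");
    }
}
```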

Deal with '#' through regex

Quick question: I have been trying to match any word containing a '#' in a list of strings and remove it, but I don't know how to handle it. I have been playing around on http://regexhero.net/tester/, but to no avail.
Essentially, if it comes across #ff or wha#s up, I will just Regex.Replace them.
Any ideas on the regular expression to use?
Thanks.
Don't use regex - just use String.Replace - it's a lot faster.
I have a previous answer that covers some hashtag matching approaches.
In summary, if you are pulling statuses containing hashtags from Twitter, you no longer need to find them yourself. You can now specify the include_entities parameter to have Twitter automatically call out mentions, links, and hashtags (if the method you are calling, like statuses/show, supports this parameter).
If you just need the regular expression to locate the hashtags and capture its elements, Twitter provides it in an open source library that contains the following pattern.
(^|[^0-9A-Z&/]+)(#|\uFF03)([0-9A-Z_]*[A-Z_]+[a-z0-9_\\u00c0-\\u00d6\\u00d8-\\u00f6\\u00f8-\\u00ff]*)
More detail and additional links are provided in the original answer.
So you're trying to remove any words containing a #?
If so, give this a try...
\w*#\w*
And replace with nothing, like so...
http://regexhero.net/tester/?id=cda1e713-bdab-4aa2-b63d-a87e9b2c9bce
apple# orange ban#ana becomes orange
But if you're simply trying to remove all instances of #, then String.Replace is the better choice. myString = myString.Replace("#", "");
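Both options side by side, as a small sketch. The class and method names are made up, and collapsing the leftover whitespace is an extra step that the answers above do not mention.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HashWords
{
    // Removes every word that contains a '#', then tidies the spaces left behind.
    public static string RemoveWordsWithHash(string text)
    {
        string stripped = Regex.Replace(text, @"\w*#\w*", "");
        return Regex.Replace(stripped, @"\s+", " ").Trim();
    }

    // Removes only the '#' characters and keeps the rest of each word.
    public static string RemoveHashCharacters(string text)
    {
        return text.Replace("#", "");
    }
}
```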

How do I fix this regular expression?

I have the string:
CN=Help & Technical,CN=Users,DC=dave,DC=com
I want to pull out everything between each '=' and the following ',' as a set of groups. Basically I'm using this:
=([\w-\s]*)
But it only brings back the following:
=help
=users
=dave
As you can see, I'm not getting Help & Technical in the first group, which is what I want. Is this possible? Can anybody help me with the regex? I just can't work it out.
I haven't tested this, but =([^,]*) should work.
You just need to include the & sign in your regular expression here.
=([\w-\s&]*)
Note that this is pretty restrictive so far: no apostrophes and no other punctuation (\w already covers letters, digits, and the underscore). You may want to consider whether any of that will show up and add it as appropriate.
This should work
=(.+?)(?:,|$)
It should match everything after the = up to the next , or the end of the string
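A short sketch of the =([^,]*) suggestion applied to the sample string. The class name DnValues is hypothetical, and the sketch assumes values never contain an escaped comma, which this pattern would not handle.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DnValues
{
    // Collects the text between each '=' and the next ',' (or the end of the string).
    public static List<string> Extract(string dn)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(dn, @"=([^,]*)"))
        {
            values.Add(m.Groups[1].Value);
        }
        return values;
    }
}
```

Called on CN=Help & Technical,CN=Users,DC=dave,DC=com, this yields four values: Help & Technical, Users, dave, and com.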

Regex group capturing problem

If I had an html string containing this somewhere in the middle of it:
<img src="http://images.domain.com/Images/hello.jpg" alt="Failed to Load" />
What regex would I use in order to just obtain the name of the image file? i.e. hello.jpg
Currently I am using this:
(?<front>.*<img.*src="http://images.domain.com/Images/)(?<imgName>.*)"(?<end>.*)
However the value that it finds for the imgName group is:
hello.jpg" alt="Failed to Load
Does anyone know how to fix that?
The easiest fix is to have the imgName group match anything except for quotes by changing .* to [^"]*:
(?<front>.*<img.*src="http://images.domain.com/Images/)(?<imgName>[^"]*)"(?<end>.*)
Please see why you shouldn't be trying this.
Anyway, try (?<imgName>.*?) instead.
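Here is a minimal sketch of the [^"]* fix. It drops the front and end groups, which are not needed just to read the file name, and escapes the dots in the host name; the class name ImageName is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ImageName
{
    // imgName stops at the first double quote, so it cannot run on into alt="...".
    private static readonly Regex Img = new Regex(
        @"<img[^>]*\bsrc=""http://images\.domain\.com/Images/(?<imgName>[^""]*)""",
        RegexOptions.IgnoreCase);

    // Returns the file name of the first matching image, or "" when none is found.
    public static string Find(string html)
    {
        Match m = Img.Match(html);
        return m.Success ? m.Groups["imgName"].Value : "";
    }
}
```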
