C#: Need to locate web addresses using regex, is that possible?

Basically, I need to parse a string prior to loading it into a WebBrowser:
myString = "this is an example string http://www.google.com , and I need to make the link clickable";
webBrow.DocumentText = myString;
What I want is to replace each web address with a hyperlink, for every address that appears in the string. Each web address would be rewritten as
<a href='web address'>web address</a>
This would make the links clickable.
Any ideas?

new Regex(@"https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?").Match(myString)

It's possible; the right pattern depends on how strict or permissive you want your parsing to be.
As a first cut, you can try @"\bhttp://\S+", which matches any run of non-whitespace characters that starts with "http://" at a word boundary (the position between a word character and a non-word character such as whitespace or punctuation, or the start of the string).
To search using a regex and replace all occurrences with your custom text, you could use the Regex.Replace method.
You may want to read up on Regular Expression Language Elements to learn more.
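As a rough sketch of the Regex.Replace approach, something like the following should work for the example string. The class and method names (LinkifyDemo, Linkify) are made up for illustration, and the pattern is the permissive first cut extended to accept https as well. It does not HTML-encode the URL and will swallow punctuation that is glued to the end of an address, so treat it as a starting point only.

```csharp
using System;
using System.Text.RegularExpressions;

public class LinkifyDemo
{
    // Hypothetical helper: wraps every http:// or https:// URL in an anchor tag.
    // $0 in the replacement string stands for the whole match.
    public static string Linkify(string text)
    {
        return Regex.Replace(text, @"\bhttps?://\S+", "<a href='$0'>$0</a>");
    }

    public static void Main()
    {
        string myString = "this is an example string http://www.google.com , and I need to make the link clickable";
        Console.WriteLine(Linkify(myString));
    }
}
```

The result can then be assigned to webBrow.DocumentText as in the question.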

Related

Regular expressions redirection

I want to set redirection from
www.somesite.com/products/dynamicstring/randomtext1/randomtext2
to www.somesite.com/products/dynamicstring
Is it possible to do that through Regex ?
For example, if my incoming URL is
www.somesite.com/products/myproducts/test1/test2 it should redirect to www.somesite.com/products/myproducts/
Some more detail:
@TomLord I am using HttpContext.Current.Response.RedirectPermanent(matchingDefinition.To). All the redirects are stored in a class object as "From" and "To" values in the form of regex expressions, for example From "/product/*" and To "/products". I read these objects and try to redirect them, but I cannot redirect something like /products/dynamicstring/randomtext1/ to /products/dynamicstring, where dynamicstring is a random string; I can't find any regular expression that can be used to do this. For example, /products/samples/randomtext1 should redirect to /products/samples/
Redirection cannot be done with regex alone. It is worth looking up what a regular expression really is. The short answer: it is a string-like expression that describes a search pattern. On its own it can't redirect, replace one substring with another, or do anything other than match and capture parts of the matched string.
That being said, regex can help us do what you want. I am going to assume you can use JavaScript, because I can't give a solution in every language. I am also going to assume you will go over the code rather than copy, paste and press enter. If that is all you need, hire a programmer. If you use another language, the principle should be the same:
obtain URL
define regex
use capture group to extract the part of your URL that you need
construct a new URL
redirect to it
Matching URLs in general is a fair bit more complex, for example:
^(?:https?://)?(?:[\w]+\.)(?:\.?[\w]{2,})+$
As long as you are sure you will only get URLs in the format you expect, we can keep it far simpler.
Basically, let's say you have:
some text with 2 dots that ends in com
then a /products/dynamicstring/
then text
then /
then text
As a regex that is:
/\w*\.\w*\.com\/products\/dynamicstring\/\w*\/\w*/g
Crude matching is done, but we still need to add a capture group that we will use to extract the part of the string we need. The optional scheme goes inside the group so that the result is an absolute URL:
/((?:https?:\/\/)?\w*\.\w*\.com\/products\/)dynamicstring\/\w*\/\w*/g
OK, now let's leverage this regex to do the rest of the work:
Define the regex (with the capture group, and with the dots escaped so they only match literal dots):
var regex = /((?:https?:\/\/)?\w*\.\w*\.com\/products\/)dynamicstring\/\w*\/\w*/g;
Get the current URL. If you already have the URL, use it.
var currUrl = window.location.href;
Extract the capture group from the string (exec returns null when the URL does not match, so check for that):
var match = regex.exec(currUrl);
Use that to build the new URL from the old one:
var redirectUrl = match[1] + "myproducts/";
Finally, we redirect with:
window.location.replace(redirectUrl);
I wrote all this straight from my head so I recommend you go over each step, look how it works, read some documentation about functions used. You might find an error as well as learn a lot.
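Since the question is about ASP.NET, here is a minimal C# sketch of the same capture-group idea, working on the path only. Everything named here (RedirectDemo, GetRedirectTarget, the exact pattern) is an assumption for illustration, not the asker's redirect class. The pattern keeps "/products/" plus the first segment after it and requires at least one more segment, so a path that is already short produces no redirect and cannot loop.

```csharp
using System;
using System.Text.RegularExpressions;

public class RedirectDemo
{
    // Hypothetical helper: returns the shortened path, or null when no redirect is needed.
    public static string GetRedirectTarget(string path)
    {
        // Group 1 captures "/products/<segment>"; "/[^/].*" demands extra text after it.
        Match m = Regex.Match(path, @"^(/products/[^/]+)/[^/].*$");
        return m.Success ? m.Groups[1].Value + "/" : null;
    }

    public static void Main()
    {
        Console.WriteLine(GetRedirectTarget("/products/myproducts/test1/test2"));
        Console.WriteLine(GetRedirectTarget("/products/samples/randomtext1"));
        Console.WriteLine(GetRedirectTarget("/products/samples/") ?? "(no redirect)");
    }
}
```

A non-null result could then be passed to HttpContext.Current.Response.RedirectPermanent, as in the question.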

Extracts all sub strings between string separators in a string (C#)

I'm trying to parse the content of a string to see whether it includes URLs, and convert the full string to HTML so that the links become clickable.
I'm not sure if there is a smarter way of doing this, but I started by trying to write a parser with the Split method for strings, or Regex.Split in C#. I can't find a good way of doing it.
(It is an ASP.NET MVC application, so perhaps there is some smarter way of doing this.)
For example, I want to convert the string
"Customer office is responsible for this. Contact info can be found {link}{www.customerservice.com}{here!}{link} More info can be found {link}{www.customerservice.com/moreinfo}{here!}{link}"
Into
"Customer office is responsible for this. Contact info can be found <a href=www.customerservice.com>here!</a> More info can be found <a href=www.customerservice.com/moreinfo>here!</a>"
i.e.
{link}{url}{text}{link} --> <a href=url>text</a>
Anyone have a good suggestion? I can also change the way the input string is formatted.
You can use the following to match:
{link}{([^}]*)}{([^}]*)}{link}
And replace with:
<a href=$1>$2</a>
Explanation:
{link} match {link} literally
{([^}]*)} match all characters except } in capturing group 1 (for url)
{([^}]*)} match all characters except } in capturing group 2 (for value)
{link} match {link} literally again
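In C#, the match-and-replace above could look roughly like this. The names LinkMarkupDemo and ConvertLinks are invented for the example; the braces are escaped for safety, and the captured values are HTML-encoded and the href quoted, which goes slightly beyond the plain $1/$2 replacement but is usually what you want before emitting HTML.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public class LinkMarkupDemo
{
    // Hypothetical helper: turns {link}{url}{text}{link} into <a href="url">text</a>.
    public static string ConvertLinks(string input)
    {
        return Regex.Replace(
            input,
            @"\{link\}\{([^}]*)\}\{([^}]*)\}\{link\}",
            // Group 1 is the url, group 2 is the link text; both are HTML-encoded.
            m => "<a href=\"" + WebUtility.HtmlEncode(m.Groups[1].Value) + "\">"
                 + WebUtility.HtmlEncode(m.Groups[2].Value) + "</a>");
    }

    public static void Main()
    {
        string input = "Contact info can be found {link}{www.customerservice.com}{here!}{link} "
                     + "More info can be found {link}{www.customerservice.com/moreinfo}{here!}{link}";
        Console.WriteLine(ConvertLinks(input));
    }
}
```

Note that an href without a scheme, such as www.customerservice.com, is treated by browsers as a relative link, so the stored URLs probably want an http:// or https:// prefix.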
You can use a regex such as
{link}{(.*?)}{(.*?)}{link}
and a substitution such as
<a href=\1>\2</a>
(in .NET, write the group references as $1 and $2).
For your simple link format {link}{url}{text} you can use a simple Regex.Replace:
Regex.Replace(input, @"\{link\}\{([^}]*)\}\{([^}]*)\}", @"<a href=$1>$2</a>");
Also this non-regex idea may help
var input = "Customer office is responsible for this. Contact info can be found {link}{www.customerservice.com}{here!}{link} More info can be found {link}{www.customerservice.com/moreinfo}{here!}{link}";
var output = input.Replace("{link}{", "<a href=")
.Replace("}{link}", "</a>")
.Replace("}{", ">");

Deal with '#' through regex

Quick question: I have been trying to match any word containing a '#' in a list of strings and remove it, but I don't know how to handle it. I've been experimenting on http://regexhero.net/tester/ but to no avail.
Essentially, if it comes across #ff or wha#s up, I will just Regex.Replace them.
Any ideas on the regular expression to use?
Thanks.
Don't use regex - just use string.replace - it's a lot faster.
I have a previous answer that covers some hashtag matching approaches.
In summary, if you are pulling statuses containing hashtags from Twitter, you no longer need to find them yourself. You can now specify the include_entities parameter to have Twitter automatically call out mentions, links, and hashtags (if the method you are calling, like statuses/show, supports this parameter).
If you just need the regular expression to locate the hashtags and capture its elements, Twitter provides it in an open source library that contains the following pattern.
(^|[^0-9A-Z&/]+)(#|\uFF03)([0-9A-Z_]*[A-Z_]+[a-z0-9_\\u00c0-\\u00d6\\u00d8-\\u00f6\\u00f8-\\u00ff]*)
More detail and additional links are provided in the original answer.
So you're trying to remove any words containing a #?
If so, give this a try...
\w*#\w*
And replace with nothing, like so...
http://regexhero.net/tester/?id=cda1e713-bdab-4aa2-b63d-a87e9b2c9bce
apple# orange ban#ana becomes orange
But if you're simply trying to remove all instances of #, then String.Replace is the better choice. myString = myString.Replace("#", "");
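A small C# sketch of the word-removal variant, assuming you also want to tidy up the whitespace left behind (the second Replace and the names HashWordDemo and RemoveHashWords are additions for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

public class HashWordDemo
{
    // Hypothetical helper: drops every word containing '#', then collapses leftover spaces.
    public static string RemoveHashWords(string text)
    {
        string withoutWords = Regex.Replace(text, @"\w*#\w*", "");
        return Regex.Replace(withoutWords, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        Console.WriteLine(RemoveHashWords("apple# orange ban#ana")); // orange
        Console.WriteLine(RemoveHashWords("#ff wha#s up"));          // up
    }
}
```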

Matching a URL Encoded e-mail address in C#

I did some searching and didn't quite figure out why my solution is not working. Basically I need to take a string (which is HTML code) parse it and look for mailto links (which I then want to replace as part of an obfuscation). Here is what I have thus far:
string text = "<p>Some Person<br /> Person's Position<br />p. 123-456-7890<br /> e. <a title=\"Email Some Person\" target=\"_blank\" href=\"mailto:someperson%40domain.com\">someperson@domain.com</a></p>";
text = Server.UrlDecode(text);
string safeEmails = Regex.Replace(text, "()(.*?)()", "<a class=\"mailme\" href=\"$2*$4\">$6</a>");
Response.Write( Server.HtmlDecode(safeEmails));
The text is coming out of a WYSIWYG text editor (Telerik RadEditor for those familiar) and for all intents and purposes I cannot control what comes out of it.
Basically I need to find and replace any:
someone@domain.com
With:
<a class="mailme" href="someone@domain.com">someone@domain.com</a>
Some background: I am attempting to create a mailto link that will avoid detection by harvesters. The problem is that I receive a string with the e-mail as a standard mailto link. I cannot control the incoming string, so the mailto will always be an unprotected mailto. My objective is to find all of them, obfuscate them, then use JavaScript to "fix" the link so that human visitors can easily use the mailto links. I am open to new approaches as well as modifications to the above code.
You could use a regex or the HTML agility pack to find and obfuscate all your mailto. If you want a good obfuscation try reading ten methods to obfuscate e-mail addresses compared
EDIT:
Sorry, from the first version of your question I didn't realize you had a problem making your regex work. Since you're using a WYSIWYG text editor, I think the HTML that comes out of it should be pretty "regular", so you may be fine using a regex.
You can try changing your Replace line like this (the [^\"]* and the lazy .*? keep each match inside a single link when there are several on one line):
string safeEmails = Regex.Replace(text, "href=\"mailto:[^\"]*\">(.*?)</a>", "class=\"mailme\" href=\"$1\">$1</a>");
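If the goal is to hide the @ as well, a MatchEvaluator gives a bit more room than a replacement string. The sketch below is only a guess at the intended output: the "*" stand-in is inferred from the $2*$4 in the question's replacement string, the input is assumed to be URL-decoded already (as the question does with Server.UrlDecode), and MailtoDemo / ObfuscateMailto are made-up names.

```csharp
using System;
using System.Text.RegularExpressions;

public class MailtoDemo
{
    // Hypothetical helper: rewrites mailto links, swapping '@' for '*' (an assumed convention).
    public static string ObfuscateMailto(string html)
    {
        return Regex.Replace(
            html,
            "href=\"mailto:([^\"]*)\">(.*?)</a>",
            // Group 1 is the address from the href, group 2 is the visible link text.
            m => "class=\"mailme\" href=\"" + m.Groups[1].Value.Replace("@", "*") + "\">"
                 + m.Groups[2].Value.Replace("@", "*") + "</a>");
    }

    public static void Main()
    {
        string html = "<p>e. <a title=\"Email Some Person\" href=\"mailto:someperson@domain.com\">someperson@domain.com</a></p>";
        Console.WriteLine(ObfuscateMailto(html));
    }
}
```

The client-side script would then turn the "*" back into "@" and restore the mailto: prefix for links with the mailme class.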

Regular expression to define format of backup filenames

In the application I am currently working on, I have an option to create automatic backups of a certain file on the hard disk. What I would like to do is offer the user the possibility to configure the name of the file and its extension.
For example, the backup filename could be something like : "backup_month_year_username.bak". I had the idea to save the format in the form of a regular expression. For the example above, the regexp would look like :
"^backup_(?<Month>\d{2})_(?<Year>\d{2})_(?<Username>\w).(?<extension>bak)$"
I thought about using regex because I will also have to browse through the directory of backed-up files to delete those older than a certain date. The main trouble I have now is how to create a filename using the regex. In a way I should replace the tags with the information. I could do that using Regex.Replace and another regex, but it feels a bit weird doing that and there might be a better way.
Thanks
[Edit] Maybe I wasn't really clear at first, but the idea is of course that the user (in this case an admin who knows regex syntax) will be able to modify the form of the filename; that's the whole idea behind it.[/Edit]
... and if the regex changes, it is next to impossible to reconstruct a string from a given regex.
Edit:
Create some predefined "place-holders": %u could be the user's name, %y could be the year, etc.:
backup_%m_%y_%u.bak
and then simply replace the %? placeholders with their actual values.
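A minimal sketch of that placeholder idea in C#, assuming %m, %y and %u mean two-digit month, two-digit year and user name (BackupNameDemo and BuildBackupName are hypothetical names). The user name is substituted last so that nothing inside it is treated as a placeholder.

```csharp
using System;
using System.Globalization;

public class BackupNameDemo
{
    // Hypothetical helper: fills the %m, %y and %u placeholders of a file name template.
    public static string BuildBackupName(string template, DateTime when, string userName)
    {
        return template
            .Replace("%m", when.ToString("MM", CultureInfo.InvariantCulture))
            .Replace("%y", when.ToString("yy", CultureInfo.InvariantCulture))
            .Replace("%u", userName); // last, so the name itself is never re-scanned
    }

    public static void Main()
    {
        Console.WriteLine(BuildBackupName("backup_%m_%y_%u.bak", new DateTime(2024, 3, 9), "alice"));
        // backup_03_24_alice.bak
    }
}
```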
It sounds like you're trying to use the regular expression to create the file name from a pattern which the user should be able to specify.
Regular expressions can - AFAIK - not be used to create output, but only to validate input, so you'd have the user specify two things:
a file name production pattern like Bart suggested
a validation pattern in form of a regular expression that helps you split the file names into their parts
EDIT
By the way, your sample regex contains an error: the "." is used for "any character", and \w only matches one word character, so I guess you meant to write
"^backup_(?<Month>\d{2})_(?<Year>\d{2})_(?<Username>\w+)\.(?<extension>bak)$"
If the filename is always in this form, there is no reason for a regex, as it's easier to process with string.Split ...
With Bart's solution it is easy enough to split (using string.Split) the generated file name using underscore as the delimiter, to get back the information.
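Getting the parts back with string.Split might look something like this, assuming the fixed layout backup_month_year_username.bak (the class and method names are invented for the example). The split is capped at four parts so a user name that itself contains underscores stays in one piece.

```csharp
using System;
using System.IO;

public class BackupParseDemo
{
    // Hypothetical helper: splits "backup_MM_yy_username.bak" back into its parts.
    public static bool TryParseBackupName(string fileName, out int month, out int year, out string userName)
    {
        month = 0; year = 0; userName = null;
        string stem = Path.GetFileNameWithoutExtension(fileName); // drops ".bak"
        string[] parts = stem.Split(new[] { '_' }, 4);            // at most 4 pieces
        if (parts.Length != 4 || parts[0] != "backup") return false;
        if (!int.TryParse(parts[1], out month) || !int.TryParse(parts[2], out year)) return false;
        userName = parts[3];
        return true;
    }

    public static void Main()
    {
        int month, year;
        string user;
        if (TryParseBackupName("backup_03_24_alice.bak", out month, out year, out user))
            Console.WriteLine(month + " " + year + " " + user);
    }
}
```

The month and year can then be compared against the cutoff date to decide which backups to delete.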
OK, I think I have found a way to use only the regex. As I am using groups to get the information, I will use another regular expression to match the group inside the user's pattern and replace it with the value:
Regex rgx = new Regex(@"\(\?<Month>.+?\)");
string fileNamePattern = rgx.Replace(@"^backup_(?<Month>\d{2})_(?<Year>\d{2})_(?<Username>\w+)\.(?<extension>bak)$"
, DateTime.Now.Month.ToString());
OK, it's really a hack, but at least it works and I have only one pattern defined by the user. It might not work if the regex is too complex, but I think I can deal with that problem.
What do you think?
