C# Replace URL Regex

I am trying to pull a URL out of a string and use it later to create a Hyperlink. I would like to be able to do the following:
- determine if the input string contains a URL
- remove the URL from the input string
- store the extracted URL in a variable for later use
Can anyone help me with this?

Here is a great solution for recognizing URLs in popular formats such as:
www.google.com
http://www.google.com
mailto:somebody@google.com
somebody@google.com
www.url-with-querystring.com/?url=has-querystring
The regular expression used is:
/((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)/
However, I would recommend you go to http://blog.mattheworiordan.com/post/13174566389/url-regular-expression-for-links-with-or-without-the to see the working example.

Replace input with the string you want to scan:
string input = string.Empty;
// The pattern is passed without the JavaScript-style / delimiters, which .NET would treat as literal slashes.
var matches = Regex.Matches(input,
@"((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[.\!\/\\w]*))?)");
// Requires System.Linq and System.Text.RegularExpressions.
List<string> urlList = matches.Cast<Match>().Select(match => match.Value).ToList();
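The three goals from the question (detect a URL, remove it from the input, keep it for later) could be sketched as below. This is only a sketch: the class name UrlExtractor, the method TryExtractUrl, and the deliberately simplified pattern are assumptions made for illustration, and the longer pattern above can be substituted if it suits your inputs better.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlExtractor
{
    // Simplified, hypothetical pattern: "http://", "https://" or "www."
    // followed by a run of non-whitespace characters.
    private static readonly Regex UrlPattern =
        new Regex(@"(?:https?://|www\.)\S+", RegexOptions.IgnoreCase);

    // Returns true when a URL is found. The first URL is returned in 'url'
    // and the input with that URL cut out is returned in 'remainder'.
    public static bool TryExtractUrl(string input, out string url, out string remainder)
    {
        Match match = UrlPattern.Match(input);
        if (!match.Success)
        {
            url = null;
            remainder = input;
            return false;
        }

        url = match.Value;
        remainder = input.Remove(match.Index, match.Length);
        return true;
    }
}
```

Only the first URL is handled here. For several URLs, Regex.Matches and Regex.Replace can be combined in the same way.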

Related

How to extract a word which is having an extension from a string using regex?

I need to extract a word from a string before an extension. Let's say I've got a string like :
"Hey Stackoverflow.xyz Whats up?"
I need to extract a word with extension .xyz i.e Stackoverflow. How can this be achieved?
You can use a positive lookahead to ensure that the string you want to extract is followed by .xyz:
\S+(?=\.xyz)
Try this C# code:
string str = "Hey Stackoverflow.xyz Whats up?";
var m = Regex.Match(str, @"\S+(?=\.xyz)");
Console.WriteLine(m.Groups[0].Value);
Output:
Stackoverflow
If you want to extract the word together with its extension (Stackoverflow.xyz), replace the lookahead with a plain match:
\S+\.xyz
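The difference between the two patterns can be seen side by side in a small sketch. The class and method names here are hypothetical, chosen only for illustration. Both methods return an empty string when nothing matches.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExtensionMatchDemo
{
    // Word before ".xyz". The lookahead checks for the extension
    // without making it part of the match.
    public static string WordBeforeExtension(string text)
    {
        return Regex.Match(text, @"\S+(?=\.xyz)").Value;
    }

    // Word together with its ".xyz" extension.
    public static string WordWithExtension(string text)
    {
        return Regex.Match(text, @"\S+\.xyz").Value;
    }
}
```

For the sample string "Hey Stackoverflow.xyz Whats up?", the first method returns Stackoverflow and the second returns Stackoverflow.xyz.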
Use the following regex to extract the word you need before the extension
\s(.*)?\.
Here the word is captured by the parentheses (group 1).
string str = "Hey Stackoverflow.xyz Whats up?";
var regexResult = Regex.Match(str, @"\s(.*)?\.");
Console.WriteLine(regexResult.Groups[1].Value);
Another option, with the word captured in group 1:
/(\w+)\.[^\W]+/
Play with it at regex101

How to get a part of a string (url)?

I have the following url
http://example.com/pa/TaskDetails.aspx?Proj=A5AF5C0D-648A-4892-A995-CDA8013F2643&Assn=2A992D9C-C511-E611-80E4-005056A13B51
I need to extract the A5AF5C0D-648A-4892-A995-CDA8013F2643 portion of the url parameter:
Proj=A5AF5C0D-648A-4892-A995-CDA8013F2643
This can be in the middle or at the end of the URL. I cannot guarantee its position, but it always starts with Proj= and ends with &. The string between these is what I want. How can I grab this in C#?
It seems that you are trying to retrieve the IDFA from a url address. I think you can easily do that by applying regular expressions to the url string.
For example, the following:
[0-9a-fA-F]{8}[-][0-9a-fA-F]{4}[-][0-9a-fA-F]{4}[-][0-9a-fA-F]{4}[-][0-9a-fA-F]{12}
picks up every valid IDFA when applied to the URL string. You can add conditions for the head and tail of the IDFA to retrieve exactly what you are looking for:
Proj=[0-9a-fA-F]{8}[-][0-9a-fA-F]{4}[-][0-9a-fA-F]{4}[-][0-9a-fA-F]{4}[-][0-9a-fA-F]{12}&
You can test the above Regex (regular expression) syntax on one of the many free online Regex applets (e.g. https://regex101.com/)
To apply Regex to your code, please see the following thread:
c# regex matches example
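Applied in C#, the pattern above might look like the following sketch. The class name ProjParameter and the method Extract are hypothetical. Two assumptions are added to the pattern: a capture group around the value, and an end-of-string alternative so that Proj may also be the last parameter.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ProjParameter
{
    // "Proj=" must directly follow '?' or '&', so a longer parameter name
    // that merely ends in "Proj" is not matched. The value is captured in
    // group 1 and must be followed by '&' or the end of the string.
    private static readonly Regex ProjPattern = new Regex(
        @"[?&]Proj=([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})(?:&|$)");

    // Returns the Proj value, or null when no well-formed Proj parameter is present.
    public static string Extract(string url)
    {
        Match match = ProjPattern.Match(url);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```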
You may need to create a Uri and pass the value of its Query property to the HttpUtility.ParseQueryString method:
string value = HttpUtility.ParseQueryString(new Uri("http://example.com/pa/TaskDetails.aspx?Proj=A5AF5C0D-648A-4892-A995-CDA8013F2643&Assn=2A992D9C-C511-E611-80E4-005056A13B51").Query)["Proj"];
The method is defined in System.Web.dll by the way so you need to add a reference to this one.
string som = "http://example.com/pa/TaskDetails.aspx?Proj=A5AF5C0D-648A-4892-A995-CDA8013F2643&Assn=2A992D9C-C511-E611-80E4-005056A13B51";
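int startPos = som.LastIndexOf("Proj=") + "Proj=".Length; // first character after "Proj="
int endPos = som.IndexOf('&', startPos); // next '&' after the value, or -1 if Proj is the last parameter
if (endPos < 0) endPos = som.Length;
string sub = som.Substring(startPos, endPos - startPos); //<- This will return your key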
This should do it.
One solution:
string param = HttpUtility
.ParseQueryString(new Uri("http://example.com/pa/TaskDetails.aspx?Proj=A5AF5C0D-648A-4892-A995-CDA8013F2643&Assn=2A992D9C-C511-E611-80E4-005056A13B51").Query) // pass only the query part, not the whole URL
.Get("Proj");

Query String issue when it contains Arabic text

I am trying to get the query string from the URL with this code:
this.site_query = Request.Url.Query;
When I request this URL:
http://localhost:1751/ar/search?q=سيارة
it gives me the output below in code:
http://localhost:1751/ar/Search?q=%D8%B3%D9%8A%D8%A7%D8%B1%D8%A9&Location=%D8%A3%D8%A8%D9%87%D8%A7,Abha
But I need the Arabic text that I sent in the query string. When the query string contains English text, the value in C# is correct.
There is nothing wrong with the second URL you have shown in your question; it is just URL encoded, because only a limited set of characters is allowed in URLs.
If you wish to get parts of the query string in code, you can use code like this:
var query = Request.QueryString["q"];
Additionally, if you are building your URLs in code, you should always URL encode any values that may contain non-standard characters:
var urlEncodedValue = HttpUtility.UrlEncode(someValue);
As others said already: it's an encoded URL. You can decode with
var decodedUrl = HttpUtility.UrlDecode(url);
or
var decodedUrl = Uri.UnescapeDataString(url);
Is that what you need? If not, show us your expected output.
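A minimal round-trip sketch using the Uri methods, which need no reference to System.Web. The class name QueryValueCodec is hypothetical. The percent-encoded string used in the usage note is the q value from the question.

```csharp
using System;

public static class QueryValueCodec
{
    // Percent-encodes a value as UTF-8 so it can be placed in a query string.
    public static string Encode(string value)
    {
        return Uri.EscapeDataString(value);
    }

    // Reverses the percent-encoding and restores the original characters.
    public static string Decode(string encoded)
    {
        return Uri.UnescapeDataString(encoded);
    }
}
```

Decode("%D8%B3%D9%8A%D8%A7%D8%B1%D8%A9") returns سيارة, and Encode("سيارة") produces the encoded form again. Note that Uri.UnescapeDataString leaves a + character unchanged, whereas HttpUtility.UrlDecode turns it into a space.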
To encode the values for the URL, use HttpUtility.UrlEncode:
string name = HttpUtility.UrlEncode(Encrypt(txtName.Text.Trim()));
string technology = HttpUtility.UrlEncode(Encrypt(ddlTechnology.SelectedItem.Value));

Extracts all sub strings between string separators in a string (C#)

I'm trying to parse the content of a string to see whether it includes URLs, and then convert the full string to HTML so that the links become clickable.
I'm not sure if there is a smarter way of doing this, but I started by trying to write a parser with the Split method for strings, or Regex.Split in C#. I can't find a good way of doing it.
(It is an ASP.NET MVC application, so perhaps there is some smarter way of doing this.)
For example, I want to convert the string:
"Customer office is responsible for this. Contact info can be found {link}{www.customerservice.com}{here!}{link} More info can be found {link}{www.customerservice.com/moreinfo}{here!}{link}"
Into
"Customer office is responsible for this. Contact info can be found <a href=www.customerservice.com>here!</a> More info can be found <a href=www.customerservice.com/moreinfo>here!</a>"
i.e.
{link}{url}{text}{link} --> <a href=url>text</a>
Anyone have a good suggestion? I can also change the way the input string is formatted.
You can use the following to match:
{link}{([^}]*)}{([^}]*)}{link}
And replace with:
<a href=$1>$2</a>
Explanation:
{link} match {link} literally
{([^}]*)} match all characters except } in capturing group 1 (for url)
{([^}]*)} match all characters except } in capturing group 2 (for value)
{link} match {link} literally again
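In C#, the pattern explained above could be applied roughly as follows. The class name LinkMarkup and the method ToHtml are hypothetical. As an added assumption beyond the original answer, the sketch HTML-encodes the URL and the link text and puts the href value in quotes, which differs slightly from the unquoted output shown in the question.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class LinkMarkup
{
    // Braces are escaped so they are always read as literal characters.
    private static readonly Regex LinkPattern =
        new Regex(@"\{link\}\{([^}]*)\}\{([^}]*)\}\{link\}");

    // Replaces every {link}{url}{text}{link} with an anchor tag.
    // Text outside the markers is returned unchanged.
    public static string ToHtml(string input)
    {
        return LinkPattern.Replace(input, match =>
        {
            string url = WebUtility.HtmlEncode(match.Groups[1].Value);
            string text = WebUtility.HtmlEncode(match.Groups[2].Value);
            return "<a href=\"" + url + "\">" + text + "</a>";
        });
    }
}
```

Using a match evaluator instead of a replacement string makes it possible to encode each captured group before it is written into the markup.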
You can use this regex:
{link}{(.*?)}{(.*?)}{link}
and this substitution (in .NET, groups in the replacement string are written $1 and $2 rather than \1 and \2):
<a href=$1>$2</a>
For your simple link format {link}{url}{text}{link} you can use a simple Regex.Replace:
Regex.Replace(input, @"\{link\}\{([^}]*)\}\{([^}]*)\}\{link\}", @"<a href=$1>$2</a>");
Also this non-regex idea may help
var input = "Customer office is responsible for this. Contact info can be found {link}{www.customerservice.com}{here!}{link} More info can be found {link}{www.customerservice.com/moreinfo}{here!}{link}";
var output = input.Replace("{link}{", "<a href=")
.Replace("}{link}", "</a>")
.Replace("}{", ">");

need help with regex helicon rule

I need some help with a regex for a new rule I am writing in Helicon.
The sample URL has a file name and a query string parameter, and I want to match on both:
www.testwebsite.com/hello.aspx?filename=/test.asp&employeeid=2100&age=20
In the above URL I want to check that the file is hello.aspx and that the query string contains filename=/test.asp.
filename can be anywhere in the query string.
I then want to rewrite the URL to some other page:
mynewpage.aspx $2$3 etc///
I wrote the following pattern, but it is not working: it matches any file name, such as sample1.aspx.
(.*)(\/hello.aspx\?+)(.*)(filename=\/test\.asp)(.*)
any help will be appreciated
What you need are non-capturing groups:
(?:.*)(\/hello\.aspx\?+)(?:.*)(filename=\/test\.asp)(?:.*)
For the sample URL this gives the full match followed by two groups:
["www.testwebsite.com/hello.aspx?filename=/test.asp&employeeid=2100&age=20", "/hello.aspx?", "filename=/test.asp"]
If you also want the rest of the query string, keep the last group capturing:
(?:.*)(\/hello\.aspx\?+)(?:.*)(filename=\/test\.asp)(.*)
["www.testwebsite.com/hello.aspx?filename=/test.asp&employeeid=2100&age=20", "/hello.aspx?", "filename=/test.asp", "&employeeid=2100&age=20"]
If you want to get all the parameters separately from the query string you can do it like this:
string queryString = (new Uri("...")).Query;
NameValueCollection parameters = HttpUtility.ParseQueryString(queryString);
parameters.Get("filename");
parameters.Get("employeeid");
parameters.Get("age");
