While developing my first MonoTouch iOS application, I tried to use System.Uri instances to download protected resources from a private web service. Because every instance returns an unescaped URL, my requests fail when their signature is checked.
Example of a good request:
http://example.com/folder%2Ftest?signature=000%3D
Example of a bad request:
http://example.com/folder/test?signature=000=
In order to properly download my resources I need to convert a good URL from a simple string to a good Uri instance. I have just started the process but I am not able to conclude it:
string urlStringVersion = "http://example.com/folder%2Ftest?signature=000%3D";
//
// urlStringVersion == http://example.com/folder%2Ftest?signature=000%3D
Uri urlUriVersion = new Uri (urlStringVersion);
//
// urlUriVersion == http://example.com/folder/test?signature=000=
var fi = typeof (Uri).GetField ("host", BindingFlags.NonPublic | BindingFlags.Instance);
fi.SetValue (urlUriVersion, "example.com");
//
// urlUriVersion.AbsoluteUri is now == http://example.com/folder/test?signature=000%3D
<Another C# command here>
//
// urlUriVersion.AbsoluteUri is now == http://example.com/folder%2Ftest?signature=000%3D
Which command can I use in place of <Another C# command here> so that the final urlUriVersion.AbsoluteUri matches the URL in the original urlStringVersion?
I need this conversion working, otherwise I will be forced to make my resources public at my private web service.
I have also tested some alternatives:
Approaches from other questions, such as "GETting a URL with an url-encoded slash", but they throw exceptions:
System.NullReferenceException : Object reference not set to an instance of an object
Using configuration files, but since mobile devices don't technically have an Application Domain, these are not available.
Double-encoding the URL by replacing the percent signs with an encoded percent sign (so '%' becomes '%25').
None of my tries solved my problem.
Thanks in advance,
Piva
The Uri constructor you're using takes an unescaped url string, not an escaped url string.
Try unescaping the url string before creating the Uri instance:
string urlStringVersion = "http://example.com/folder%2Ftest?signature=000%3D";
urlStringVersion = Uri.UnescapeDataString (urlStringVersion);
Uri urlUriVersion = new Uri (urlStringVersion);
If you want to convert a Uri to an (un)escaped string, use the GetComponents method:
// Note: the enum member is UriComponents.HttpRequestUrl (scheme, host, port, path and query)
var unescaped = urlUriVersion.GetComponents (UriComponents.HttpRequestUrl, UriFormat.Unescaped);
var escaped = urlUriVersion.GetComponents (UriComponents.HttpRequestUrl, UriFormat.UriEscaped);
Do not change the values of private fields; that is bound to break your app one day. You are circumventing every single test for the Uri class (you're using the class in a way it was neither designed nor tested for), and besides, the implementation may change at any time.
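How Uri treats escapes such as %2F can differ between runtimes, so it may be worth checking what your own runtime does before relying on it. The sketch below is only a probe, and the class and method names are hypothetical. It returns four views of the same URL so they can be compared side by side. Only OriginalString is certain to match the input character for character; the other three depend on the runtime.

```csharp
using System;

public static class UriEscapeProbe
{
    // Returns { OriginalString, AbsoluteUri, escaped components, unescaped components }
    // for the given absolute URL, so the four views can be compared.
    public static string[] Views(string url)
    {
        Uri uri = new Uri(url);
        return new string[]
        {
            uri.OriginalString, // exactly the string passed to the constructor
            uri.AbsoluteUri,    // the runtime's canonical form
            uri.GetComponents(UriComponents.HttpRequestUrl, UriFormat.UriEscaped),
            uri.GetComponents(UriComponents.HttpRequestUrl, UriFormat.Unescaped)
        };
    }
}
```

If the second entry differs from the first, the runtime has rewritten the URL, and a request signed against the original string will probably fail its signature check.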
Related
I was given some code in which one page shows search results. After listing the items, it lets the user click a record, which should open a page where that record can be edited.
However, when it tries to open that page, IE shows "This page cannot be displayed".
The URL is obviously wrong: first I see something like http://www.Something.org/Search.aspx, then it turns into http://localhost:61123/ProductPage.aspx
I searched the code and found the following line, which I think is the cause. So my question is:
What should I do to avoid the hard-coded URL and make it dynamic, so that it always points to the right domain?
string url = string.Format("http://localhost:61123/ProductPage.aspx?BC={0}&From={1}", barCode, "Search");
Response.Redirect(url);
Thanks.
Use HttpContext.Current.Request.Url in your code-behind or controller to inspect the current URL. Url exposes many parts, including Host. For a redirect you also need the scheme and port, and GetLeftPart(UriPartial.Authority) returns those together with the host.
By the way, if you're using C# 6 (Visual Studio 2015) or later, you can build the string with interpolation:
string url = $"{HttpContext.Current.Request.Url.GetLeftPart(UriPartial.Authority)}/ProductPage.aspx?BC={barCode}&From=Search";
Or you can use string.Format:
string authority = HttpContext.Current.Request.Url.GetLeftPart(UriPartial.Authority);
string url = string.Format("{0}/ProductPage.aspx?BC={1}&From={2}", authority, barCode, "Search");
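A minimal sketch of the same idea as a helper that can be exercised outside ASP.NET. The class and method names are hypothetical, the requestUrl parameter stands in for HttpContext.Current.Request.Url, and the page is assumed to live at the site root.

```csharp
using System;

public static class RedirectUrls
{
    // Builds an absolute ProductPage.aspx URL on the same scheme, host and port
    // as the request that is currently being served.
    public static string BuildProductPageUrl(Uri requestUrl, string barCode)
    {
        // Scheme + host + port of the current request, e.g. "http://localhost:61123"
        Uri siteRoot = new Uri(requestUrl.GetLeftPart(UriPartial.Authority));

        // Escape the bar code so characters such as '&' or '=' cannot break the query string
        string relative = "ProductPage.aspx?BC=" + Uri.EscapeDataString(barCode) + "&From=Search";

        return new Uri(siteRoot, relative).AbsoluteUri;
    }
}
```

In a page this might be called as Response.Redirect(RedirectUrls.BuildProductPageUrl(Request.Url, barCode)); the same code then targets localhost during development and the real domain in production.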
You can store the Host segment in your AppSettings section of your Web.Config file (per config / environment like so)
Debug / Development Web.Config
Production / Release Web.Config (with config override to replace the localhost value with something.org host)
and then use it in your code like so.
// Creates a URI using the HostUrlSegment set in the current web.config
Uri hostUri = new Uri(ConfigurationManager.AppSettings.Get("HostUrlSegment"));
// does something like Path.Combine(..) to construct a proper Url with the hostName
// and the other url segments. The $ is a new C# construct to do string interpolation
// (makes for readable code)
Uri fullUri = new Uri(hostUri, $"ProductPage.aspx?BC={barCode}&From=Search");
// fullUri.AbsoluteUri will contain the proper Url
Response.Redirect(fullUri.AbsoluteUri);
The Uri class has a lot of useful properties and methods to give you Relative Url, AbsoluteUrl, your Url Fragments, Host name etc etc.
This should do it.
string url = string.Format("ProductPage.aspx?BC={0}&From={1}", barCode, "Search");
Response.Redirect(url);
If you are using C# 6 or later, you can also use this string interpolation version:
string url = $"ProductPage.aspx?BC={barCode}&From=Search";
Response.Redirect(url);
You should just be able to omit the hostname to stay on the current domain.
I have got the following log of URL strings. The logs contain millions of records.
www.example.com/p1?q=k
example.com/p1?q=k
http://example.com/p1?q=k
https://example.com/p1?q=k
http://www.example.com/p1?q=k
I used the C# Uri class, but it throws an exception for URLs in the form "example.com/p1?q=K".
I was wondering if there is a generally accepted, standard method for dealing with these different forms of URL to get the website name and the relative URL.
P.S.: I could strip off http:// and https:// using a regex or string comparison, but I am curious to know if there are any more elegant solutions.
This will not work with your existing examples as they are. However, you can build on it by adding code that prepends whatever is missing, which means you will need a few variables to hold http://, https://, and www.
System.Uri uriPre = new Uri ("http://www.example.com/p1?q=k");
string uriString = uriPre.Host + uriPre.PathAndQuery;
uriString = uriString.Replace("www.", "");
yields
"example.com/p1?q=k"
You will have to figure out the rest of the code yourself, because only you know when to use the different protocols in the example I've provided.
To expand on Alexei Levenkov's answer, here is an example you can use to try to create a new Uri.
Uri tempValue;
var uriPre = new Uri(string.Empty, UriKind.Relative);
if (Uri.TryCreate("example.com/p1?q=k", UriKind.Relative, out tempValue))
{
// do something or return tempValue;
}
Uri is the class designed to deal with URIs:
var noSchemaRelativeUri = new Uri("example.com/foo", UriKind.Relative);
Either UriBuilder or Uri(Uri baseUri, Uri relativeUri) can be used to construct an absolute Uri.
To pick between relative and absolute, you can use Uri.TryCreate.
Note: "www.example.com" and "example.com" are, strictly speaking, unrelated domain names, and converting one to the other is not guaranteed to produce a registered domain name (although most sites register both and redirect between them).
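One way to handle the mixed log entries is to add a default scheme where none is present and then let Uri.TryCreate do the parsing. This is a sketch under assumptions: the class and method names are hypothetical, and entries without a scheme are assumed to be plain http.

```csharp
using System;

public static class LogUrlParser
{
    // Splits a log entry into host and path-plus-query.
    // Returns false when the entry cannot be parsed as an absolute URL.
    public static bool TryParse(string raw, out string host, out string pathAndQuery)
    {
        host = null;
        pathAndQuery = null;
        if (string.IsNullOrEmpty(raw)) return false;

        string candidate = raw.Trim();
        bool hasScheme =
            candidate.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
            candidate.StartsWith("https://", StringComparison.OrdinalIgnoreCase);
        if (!hasScheme)
        {
            // Assumption: entries logged without a scheme were http requests
            candidate = "http://" + candidate;
        }

        Uri uri;
        if (!Uri.TryCreate(candidate, UriKind.Absolute, out uri)) return false;

        host = uri.Host;
        pathAndQuery = uri.PathAndQuery;
        return true;
    }
}
```

Because TryCreate reports failure instead of throwing, malformed records among millions of log lines can be counted or skipped without try/catch overhead. The host is returned as logged, so "www.example.com" and "example.com" stay distinct.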
I fetch the domain from the URL as follows:
var uri = new Uri("Http://www.google.com");
var host = uri.Host;
//host ="www.google.com"
But I want only google.com in Host,
host = "google.com"
Given the accepted answer I guess the issue was not knowing how to manipulate strings rather than how to deal with uris... but for anyone else who ends up here:
The Uri class does not have this property so you will have to parse it yourself.
Presumably you do not know the subdomain ahead of time, so a simple replace may not be possible.
This is not trivial since the TLDs are so varied (http://en.wikipedia.org/wiki/List_of_Internet_top-level_domains), and there may be multiple parts to the URL (e.g. http://pre.subdomain.domain.co.uk).
You will have to decide exactly what you want to get and how complex you want the solution to be.
simple - do a string replace, see ekad's answer
medium - regex that works most of the time, see Strip protocol and subdomain from a URL
or complex - refer to a list of suffixes in order to figure out what is subdomain and what is domain, e.g. Get the subdomain from a URL
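To make the "complex" option concrete, here is a rough sketch of a suffix-list lookup. Everything in it is illustrative: the class and method names are hypothetical, and the four-entry suffix set is a stand-in for a real list such as the Public Suffix List, which is far larger and also has wildcard and exception rules that this sketch ignores.

```csharp
using System;
using System.Collections.Generic;

public static class DomainParser
{
    // Tiny illustrative suffix set; a real application would load a full suffix list.
    public static readonly HashSet<string> SampleSuffixes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "com", "org", "uk", "co.uk" };

    // Returns the known suffix plus the one label to its left,
    // or null when no suffix matches or the host is itself a suffix.
    public static string GetRegistrableDomain(string host, ICollection<string> suffixes)
    {
        string[] labels = host.Split('.');

        // Candidates run from longest to shortest, so the longest known suffix wins
        for (int i = 0; i < labels.Length; i++)
        {
            string candidate = string.Join(".", labels, i, labels.Length - i);
            if (suffixes.Contains(candidate))
            {
                if (i == 0) return null; // nothing to the left of the suffix
                return labels[i - 1] + "." + candidate;
            }
        }
        return null;
    }
}
```

With this sample set, "www.google.com" reduces to "google.com" and "pre.subdomain.domain.co.uk" reduces to "domain.co.uk", because "co.uk" is matched before the shorter "uk".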
If host begins with "www.", you can remove that prefix with the String.Substring method like this (String.Replace would also strip any later "www." inside the host):
var uri = new Uri("Http://www.google.com");
var host = uri.Host.ToLower();
if (host.StartsWith("www."))
{
    // Drop only the leading "www." (4 characters)
    host = host.Substring(4);
}
I have a list of 100,000 urls in list(Of string) which can contain urls in the form.
yahoo.com
http://yahoo.com
http://www.yahoo.com
I have tried using a combination of regex and the Uri class, but that didn't help, so I dumped the code. I also tried the following code, but it only removes exact duplicates, since it's not domain specific.
list = new ArrayList<T>(new HashSet<T>(list))
How can I filter these duplicates and keep just one of the URLs when several contain the same name, e.g. yahoo?
thanks
[EDIT]
Please note that
all URL are of different domains, but can usually have duplicates like the example i gave above
also, am using .net 2.0, so i can't use linq
This worked for me
[TestMethod]
public void TestMethod1()
{
var sites = new List<string> {"yahoo.com", "http://yahoo.com", "http://www.yahoo.com"};
var result = sites.Select(
s =>
s.StartsWith("http://www.")
? s
: s.StartsWith("http://")
? "http://www." + s.Substring(7)
: "http://www." + s).Distinct();
Assert.AreEqual(1, result.Count());
}
I think the Uri Class would be able to help in this case. I am not at a VS machine where I can test; however, pass the Uri constructor the string of the Url, and try the Host property for comparison:
List<string> distinctHosts = new List<string>();
foreach (string url in UrlList)
{
    // Throws UriFormatException if url has no scheme (see the note on validation below)
    Uri uri = new Uri(url);
    if (!distinctHosts.Contains(uri.Host))
    {
        distinctHosts.Add(uri.Host);
    }
}
This feels a bit primitive, and could probably be more elegant - possibly without a foreach; but like I said, I'm not at a development machine where I could work with it.
I think this would be able to handle any variation of a valid Url. Building an ArrayList is not a good idea; in my opinion, Regex would require that you maintain some sort of custom 'MatchList' that could get unwieldy.
As #Damokles points out, you should have some form of validation. The Uri class does require a protocol: 'http://' or 'ftp://'. You do not want to assume 'badurl.com' is actually invalid; however:
if (!url.StartsWith("http://")) { /* add protocol */ } // then check Host domain as above
...should be sufficient simply to retrieve a distinct host or domain name. I recommend any option that does not require guessing the index position of any part of the Url as that is tightly bound to specific formats.
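Putting the scheme check and the Host comparison together, a LINQ-free version (the question targets .NET 2.0) might look like the sketch below. The class and method names are hypothetical. It assumes that a leading "www." should be ignored when comparing, as the question asks, and that the first spelling seen for each host is the one to keep.

```csharp
using System;
using System.Collections.Generic;

public static class UrlDeduplicator
{
    // Keeps the first URL seen for each host, ignoring a leading "www." and letter case.
    public static List<string> DistinctByHost(IEnumerable<string> urls)
    {
        // Dictionary is used as a set here because HashSet<T> is not available in .NET 2.0
        Dictionary<string, bool> seen = new Dictionary<string, bool>(StringComparer.OrdinalIgnoreCase);
        List<string> result = new List<string>();

        foreach (string url in urls)
        {
            string candidate = url.Trim();
            if (!candidate.StartsWith("http://", StringComparison.OrdinalIgnoreCase) &&
                !candidate.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
            {
                candidate = "http://" + candidate; // Uri needs a scheme to parse the host
            }

            Uri uri;
            if (!Uri.TryCreate(candidate, UriKind.Absolute, out uri)) continue; // skip unparsable entries

            string key = uri.Host;
            if (key.StartsWith("www.", StringComparison.OrdinalIgnoreCase)) key = key.Substring(4);

            if (!seen.ContainsKey(key))
            {
                seen.Add(key, true);
                result.Add(url); // keep the original spelling of the first occurrence
            }
        }
        return result;
    }
}
```

The dictionary lookup keeps the work per URL roughly constant, which matters for 100,000 entries; List.Contains would rescan the whole list on every iteration.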
You can do this with the Uri class and Linq/extension methods. The trick is to normalize the Url before using it with the Uri class. Also note that the Uri class requires the scheme, so that will have to be added for ones where it's not present. You can use a different property of the Uri class to achieve different results. The example below returns all unique Urls and treats yahoo.com differently than www.yahoo.com.
string[] urls = new[] {
"yahoo.com",
"http://yahoo.com",
"http://www.yahoo.com" };
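var unique = urls.
    Select(url => new System.Uri(
        // Match the full scheme prefix so a host such as "httpbin.org" still gets one added
        url.StartsWith("http://") || url.StartsWith("https://") ? url : "http://" + url).Host).
    Distinct();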
(Edited to clean up formatting and to make the scheme addition part support both "http://" and "https://")
Try a Regex then .*?(\w+\.\w+)$ assuming you don't have anything after the tld.
My goal is to safely open a web page in a user's default browser. The URL for this web page is considered "untrusted" (think of it as a link in a document opened with this software, but the document could be from anywhere and the links in it could be malicious).
I want to avoid someone passing "C:\Windows\malicious_code.exe" off as a URL
My current thought is to do something like this:
Uri url = new Uri(urlString, UriKind.Absolute);
if( url.Scheme == Uri.UriSchemeHttp || url.Scheme == Uri.UriSchemeHttps )
{
Process.Start(url.AbsoluteUri);
}
Am I forgetting about anything else that my 'urlString' might contain that makes this dangerous (e.g. a new line character which would allow someone to sneak a second process to be started in after the URL or a possible execution of a relative executable starting with http)?
I'm pretty sure both of those cases are handled by this (as I don't believe Process.Start allows you to start two processes as you would in a BATCH file and this should only allow strings starting with http: or https: and are valid urls)
Is there a better way to do this in C#?
What you want to check is the scheme of the url (i.e. ftp://, http://, file://, etc.) Here is a list of schemes: http://en.wikipedia.org/wiki/URI_scheme#Official_IANA-registered_schemes
To find the scheme of a URL, use:
Uri u = new Uri("C:\\Windows");
String scheme = (u.GetLeftPart(UriPartial.Scheme).ToString());
For me, the above example gives file://. Just check the scheme, using the code above, and reject the ones you want to filter. Also, surround the parsing with a try-catch block and if an exception is caught, reject the URL; it can't be parsed so you shouldn't trust it.
If you want to be ultra-paranoid-safe, you could always parse the URL using a URL parser and reconstruct it, validating each part as you go along.
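The advice above can be sketched as an allow-list check rather than a reject-list. The class and method names are hypothetical. Uri.TryCreate is used instead of the constructor plus try/catch, so an unparsable string simply yields false.

```csharp
using System;

public static class SafeUrl
{
    // Returns true only for absolute http or https URLs; everything else is rejected.
    public static bool TryGetWebUri(string urlString, out Uri uri)
    {
        if (Uri.TryCreate(urlString, UriKind.Absolute, out uri) &&
            (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps))
        {
            return true;
        }

        uri = null; // never hand a rejected Uri back to the caller
        return false;
    }
}
```

A caller would pass uri.AbsoluteUri to Process.Start only when the method returns true, so the string that reaches the shell is the parsed and re-serialized form rather than the raw input. Allowing only the two known schemes means file paths, javascript: links and any scheme registered by other software are refused by default.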