UrlPathEncode() alternative

The MSDN page for UrlPathEncode states that UrlPathEncode shouldn't be used, and that I should use UrlEncode instead.
Do not use; intended only for browser compatibility. Use UrlEncode.
But UrlEncode does not do the same thing as UrlPathEncode.
My use case is that I want to encode a file system path so that a file can be downloaded. The spaces in a path need to be escaped, but not the forward slashes etc. UrlPathEncode does exactly this.
// given the path
string path = "Directory/Path to escape.exe";
Console.WriteLine(System.Web.HttpUtility.UrlPathEncode(path));
// returns "Directory/Path%20to%20escape.exe" <- This is what I require
Console.WriteLine(System.Web.HttpUtility.UrlEncode(path));
// returns "Directory%2fPath+to+escape.exe"
// none of these return what I require, either
Console.WriteLine(System.Web.HttpUtility.UrlEncode(path, Encoding.ASCII));
Console.WriteLine(System.Web.HttpUtility.UrlEncode(path, Encoding.BigEndianUnicode));
Console.WriteLine(System.Web.HttpUtility.UrlEncode(path, Encoding.Default));
Console.WriteLine(System.Web.HttpUtility.UrlEncode(path, Encoding.UTF32));
Console.WriteLine(System.Web.HttpUtility.UrlEncode(path, Encoding.UTF7));
Console.WriteLine(System.Web.HttpUtility.UrlEncode(path, Encoding.UTF8));
Console.WriteLine(System.Web.HttpUtility.UrlEncode(path, Encoding.Unicode));
Another method I've tried is using Uri.EscapeDataString, but this escapes the slashes.
// returns Directory%2FPath%20to%20escape.exe
Console.WriteLine(Uri.EscapeDataString(path));
Question:
If I'm not supposed to use UrlPathEncode, and UrlEncode doesn't produce the required output, what method is equivalent and recommended?

It's funny that when trying to write a question properly, you find your answer:
Uri.EscapeUriString(path);
Produces the required output.
I do think the MSDN page should reflect this, though.
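As a side note, Uri.EscapeUriString has since been marked obsolete in newer .NET versions (.NET 6 and later), so here is a minimal sketch of another way to get the same result for a relative path: escape each segment on its own with Uri.EscapeDataString and re-join with "/". The class and method names (PathEncoding, EncodePath) are made up for illustration, and the sketch assumes "/" is the only separator in the input.

```csharp
using System;
using System.Linq;

public static class PathEncoding
{
    // Hypothetical helper: escapes every path segment individually,
    // so spaces become %20 while the '/' separators are left alone.
    public static string EncodePath(string path)
    {
        return string.Join("/", path.Split('/').Select(s => Uri.EscapeDataString(s)));
    }
}
```

With the path from the question, PathEncoding.EncodePath("Directory/Path to escape.exe") gives "Directory/Path%20to%20escape.exe". Unlike Uri.EscapeUriString, this also escapes characters such as # and ? inside a segment, which is usually what you want for file names.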
Edit (2020-11-22)
I've recently come across this again, this time needing to encode URLs with special characters (instead of file names with spaces), but it's essentially the same problem. The approach I used this time was to instantiate the Uri class:
var urlWithSpecialChars = "https://www.example.net/something/contàins-spécial-chars?query-has=spécial-chars-as-well";
var uri = new Uri(urlWithSpecialChars);
// outputs "https://www.example.net/something/contàins-spécial-chars?query-has=spécial-chars-as-well"
Debug.WriteLine(uri.OriginalString);
// outputs "https://www.example.net/something/cont%C3%A0ins-sp%C3%A9cial-chars?query-has=sp%C3%A9cial-chars-as-well"
Debug.WriteLine(uri.AbsoluteUri);
// outputs "/something/cont%C3%A0ins-sp%C3%A9cial-chars?query-has=sp%C3%A9cial-chars-as-well"
Debug.WriteLine(uri.PathAndQuery);
This gives you quite a few useful Uri properties that are likely to cover most Uri processing requirements.

Related

How to include number (hash) character # in path segment?

I have to download a file (using existing Flurl-Http endpoints [1]) whose name contains a "#" which of course has to be escaped to %23 to not conflict with uri-fragment detection.
But Flurl escapes everything else except this character, resulting in a non-working URI where half of the path and all query parameters are missing because they get parsed as the URI fragment:
Url url = "http://server/api";
url.AppendPathSegment("item #123.txt");
Console.WriteLine(url.ToString());
Returns: http://server/api/item%20#123.txt
This means an HTTP request (using Flurl.Http) would only try to download the non-existent resource http://server/api/item%20.
Even when I pre-escape the segment, the result still becomes exactly the same:
url.AppendPathSegment("item %23123.txt");
Console.WriteLine(url.ToString());
Again returns: http://server/api/item%20#123.txt.
Is there any way to stop this "magic" from happening?
[1] This means I have delegates/interfaces where input is an existing Flurl.Url instance which I have to modify.

It looks like you've uncovered a bug. Here are the documented encoding rules Flurl follows:
Query string values are fully URL-encoded.
For path segments, reserved characters such as / and % are not encoded.
For path segments, illegal characters such as spaces are encoded.
For path segments, the ? character is encoded, since query strings get special treatment.
According to the 2nd point, it shouldn't encode # in the path, so how it handles AppendPathSegment("item #123.txt") is correct. However, when you encode the # to %23 yourself, Flurl certainly shouldn't unencode it. But I've confirmed that's what's happening. I invite you to create an issue on GitHub and it'll be addressed.
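In the meantime, you could write your own extension method to cover this case. Something like this should work (and you wouldn't even need to pre-encode #):
// Requires: using System.Net; using Flurl;
public static class UrlExtensions {
    public static Url AppendFileName(this Url url, string fileName) {
        // WebUtility.UrlEncode escapes '#' to %23 (and turns spaces into '+')
        url.Path += "/" + WebUtility.UrlEncode(fileName);
        return url;
    }
}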

I ended up using Uri.EscapeDataString(foo) because the suggested WebUtility.UrlEncode replaces spaces with '+', which I didn't want.
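To make the difference mentioned in the comment above concrete, here is a small sketch that runs both encoders over the same file name. It does not touch Flurl at all; the class and method names (FileNameEncoding, FormStyle, SegmentStyle) are hypothetical and only wrap the two framework calls.

```csharp
using System;
using System.Net;

public static class FileNameEncoding
{
    // Form-style encoding: a space becomes '+', '#' becomes %23.
    public static string FormStyle(string fileName)
    {
        return WebUtility.UrlEncode(fileName);
    }

    // Percent-encoding suitable for a single path segment:
    // a space becomes %20, '#' becomes %23.
    public static string SegmentStyle(string fileName)
    {
        return Uri.EscapeDataString(fileName);
    }
}
```

For "item #123.txt", FormStyle returns "item+%23123.txt" and SegmentStyle returns "item%20%23123.txt". Since '+' is only interpreted as a space in form-encoded query strings, the second form is the safer choice inside a path.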

C# Uri.EscapeDataString adds incorrect "%25" in the decoded string

I'm trying to UrlEncode a web address using Uri.EscapeDataString, but the result isn't correct. Here's an example:
string url = "https://mega.co.nz/#!GVZFwAbB!NzdN2jp7A_WmQBLC4RJrCX8SzixFIEo7oZZARaMAmXQ";
string encodedUrl = Uri.EscapeDataString(url);
Expected result would be:
https%3a%2f%2fmega.co.nz%2f%23!GVZFwAbB!NzdN2jp7A_WmQBLC4RJrCX8SzixFIEo7oZZARaMAmXQ
But the actual one is:
https%253a%252f%252fmega.co.nz%252f%2523%21GVZFwAbB%21NzdN2jp7A_WmQBLC4RJrCX8SzixFIEo7oZZARaMAmXQ
As you can see, there's a bunch of extra %25s that don't belong there. Isn't %25 the encode for "%"? There are no %s in my original string... what's going on?
EDIT: I can't use the System.Web assembly for this project, so unfortunately I can't use the HttpUtility.UrlEncode() method for this.

Well, after searching around a bit more, it seems that this does the job without relying on System.Web:
System.Net.WebUtility.UrlEncode(url);
The encoding is the correct one, without %25s.

Uri.EscapeDataString doesn't encode the URL the way you expect. Use HttpUtility.UrlEncode instead.
string url = "https://mega.co.nz/#!GVZFwAbB!NzdN2jp7A_WmQBLC4RJrCX8SzixFIEo7oZZARaMAmXQ";
string encodedUrl = HttpUtility.UrlEncode(url);
Result is:
https%3a%2f%2fmega.co.nz%2f%23!GVZFwAbB!NzdN2jp7A_WmQBLC4RJrCX8SzixFIEo7oZZARaMAmXQ
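Neither answer explains where the %25s come from. One plausible explanation, which the thread does not confirm, is that the string had already been encoded once before it reached Uri.EscapeDataString: the lowercase hex digits inside %253a and %252f look like an earlier encoding pass whose % signs were then escaped again. The sketch below reproduces that effect on a short made-up input; the DoubleEscapeDemo class is purely illustrative.

```csharp
using System;

public static class DoubleEscapeDemo
{
    // Escaping once turns reserved characters into %XX sequences.
    public static string Once(string s)
    {
        return Uri.EscapeDataString(s);
    }

    // Escaping an already escaped string also escapes each '%',
    // so every %XX turns into %25XX.
    public static string Twice(string s)
    {
        return Uri.EscapeDataString(Uri.EscapeDataString(s));
    }
}
```

DoubleEscapeDemo.Once("a/b#c") gives "a%2Fb%23c", while DoubleEscapeDemo.Twice("a/b#c") gives "a%252Fb%2523c". If you see %25 in your output, it is worth checking whether the input was already encoded.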

C# URL QueryString Trouble

I have a WP7 project where I am using the below code. It normally works ok, but I am getting a strange result with some particular strings being passed through.
Service = "3q%23L3t41tGfXQDTaZMbn%23w%3D%3D?f";
NavigationService.Navigate(new Uri("/Details.xaml?service=" + Service, UriKind.Relative));
Next Page:
NavigationContext.QueryString.TryGetValue("service", out Service1);
Service1 now = 3q#L3t41tGfXQDTaZMbn#w==?f
Why has the string changed?

The string hasn't changed, but you're looking at it in two different ways.
The way to encode 3q#L3t41tGfXQDTaZMbn#w==?f as URI content is 3q%23L3t41tGfXQDTaZMbn%23w%3D%3D?f. (Actually, it's 3q%23L3t41tGfXQDTaZMbn%23w%3D%3D%3Ff, but you get away with the ? near the end not being properly escaped to %3F in this context.)
Your means of writing the string expects to receive it escaped.
Your means of reading the string returns it unescaped.
Things are working pretty much perfectly, really.
When you need to write the string again, then just escape it again:
Service = Uri.EscapeDataString(Service1);
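The round trip described above can be sketched outside of the WP7 navigation APIs with just the Uri helpers. The ServiceToken class name is an assumption made for this example; the values are the ones from the question.

```csharp
using System;

public static class ServiceToken
{
    // What to do before putting the value into a query string.
    public static string Escape(string raw)
    {
        return Uri.EscapeDataString(raw);
    }

    // Roughly what reading the value back from the query string does.
    public static string Unescape(string escaped)
    {
        return Uri.UnescapeDataString(escaped);
    }
}
```

ServiceToken.Unescape("3q%23L3t41tGfXQDTaZMbn%23w%3D%3D?f") returns "3q#L3t41tGfXQDTaZMbn#w==?f", and ServiceToken.Escape on that result returns "3q%23L3t41tGfXQDTaZMbn%23w%3D%3D%3Ff", this time with the trailing ? escaped as well.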

In your first code snippet the string is URL Encoded.
In the 2nd code snippet, the string is URL Decoded.
They are essentially the same strings, just with encoding applied/removed.
For example, URL-encoding # gives you %23.
For further reading, check out the Wikipedia article on encoding.
Since HttpUtility isn't part of the WP7 Silverlight stack, I'd recommend using Uri.EscapeUriString to escape any URIs that have not been escaped.

You should probably URL encode the string if you want it to pass through unscathed.

HttpUtility.ParseQueryString without decoding special characters

Uri uri = new Uri(redirectionUrl);
NameValueCollection col = HttpUtility.ParseQueryString(uri.Query);
uri.Query is already decoded - so is there any way I can prevent ParseQueryString decoding it again?
Apart from that - is there another method to retrieve a name value collection from a Uri without modifying any components?

Encoding the uri.Query before passing it to ParseQueryString is the first thing that comes to my head.
UPDATE
Just checked the ParseQueryString method with Reflector: it assumes that the query string is encoded and you can't do anything with it... Bummer. So I think you need to parse it manually (there are plenty of ready-to-use algorithms on the Web).
Alternatively you could encode your query string properly (taking into account variable names and all special characters) before passing it to ParseQueryString method.
-- Pavel

I have faced the same problem. The solution is adding the second parameter, the encoding. It seems that everything works if you set UTF8 encoding.
NameValueCollection col = HttpUtility.ParseQueryString(uri.Query, Encoding.UTF8);
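For the "parse it manually" route suggested in the first answer, a minimal sketch could look like the following. It splits the query on & and on the first = and leaves names and values exactly as they appear, with no decoding. RawQuery and Parse are hypothetical names, and the sketch ignores edge cases such as ; separators.

```csharp
using System;
using System.Collections.Generic;

public static class RawQuery
{
    // Hypothetical helper: returns name/value pairs without decoding them.
    public static List<KeyValuePair<string, string>> Parse(string query)
    {
        var result = new List<KeyValuePair<string, string>>();
        string[] parts = query.TrimStart('?')
            .Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string part in parts)
        {
            // Split on the first '=' only; a part without '=' gets an empty value.
            int eq = part.IndexOf('=');
            string name = eq < 0 ? part : part.Substring(0, eq);
            string value = eq < 0 ? "" : part.Substring(eq + 1);
            result.Add(new KeyValuePair<string, string>(name, value));
        }
        return result;
    }
}
```

RawQuery.Parse("?a=1%2B1&b=x%20y") yields the pairs (a, 1%2B1) and (b, x%20y), so the percent-encoded values survive untouched. A list of pairs is used instead of a dictionary so that repeated names are preserved.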

How can I make this regex match correctly?

Given this regex:
^((https?|ftp):(\/{2}))?(((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))|(((([a-zA-Z0-9]+)(\.)*?))(\.)([a-z]{2}
|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum){1})
Reformatted for readability:
@"^((https?|ftp):(\/{2}))?" + // http://, https://, ftp:// - Protocol Optional
@"(" + // Begin URL payload format section
@"((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)" + // IPv4 Address support
@")|("+ // Delimit supported payload types
@"((([a-zA-Z0-9]+)(\.)*?))(\.)([a-z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum){1}" + // FQDNs
@")"; // End URL payload format section
How can I make it fail (i.e. not match) on this "fail" test case?
http://www.google
As I am specifying {1} on the TLD section, I would think it would fail without the extension. Am I wrong?
Edit: These are my PASS conditions:
"http://www.zi255.com?Req=Post&PID=4",
"http://www.zi255.com?Req=Post&ID=4",
"http://www.zi255.com/?Req=Post&PID=4",
"http://www.zi255.com?Req=Post&PostID=4",
"http://www.zi255.com/?Req=Post&ID=4"
"http://www.zi255.com?Req=Post&Post=4",
"http://www.zi255.com?Req=Post&Entry=4",
"http://www.zi255.com?PID=4"
"http://www.zi255.com/Post.aspx?Req=Post&ID=4",
"http://www.zi255.com/Post.aspx?Req=Post&PID=4",
"http://www.zi255.com/Post.aspx?Req=Post&Post=4",
"http://www.zi255.com/Post.aspx?Req=Post&Title=Random%20Post%20Name"
"http://www.zi255.com/?Req=Post&Title=Random%20Post%20Name",
"http://www.zi255.com?Req=Post&Title=Random%20Post%20Name",
"http://www.zi255.com?Req=Post&PostID=4",
"http://www.zi255.com?Req=Post&Post=4",
"http://www.zi255.com?Req=Post&Entry=4",
"http://www.zi255.com?PID=4"
"http://www.zi255.com",
"http://www.damnednice.com"
These are my FAIL conditions:
"http://.com",
"http://.com/",
"http:/www.google.com",
"http:/www.google.com/",
"http://www.google",
"http://www.googlecom",
"http://www.google.c",
".com",
"https://www..."

I'll throw out an alternative suggestion. You may want to use a combination of the parsing of the built-in System.Uri class and a couple targeted regexes (or simple string checks when appropriate).
Example:
string uriString = "...";
Uri uri;
if (!Uri.TryCreate(uriString, UriKind.Absolute, out uri))
{
    // Uri is totally invalid!
}
else
{
    // validate the scheme
    if (!uri.Scheme.Equals("http", StringComparison.OrdinalIgnoreCase))
    {
        // not http!
    }
    // validate the authority ('www.blah.com:1234' portion)
    if (uri.Authority /* ... */)
    {
    }
    // ...
}
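Filling in the outline above, a compact version might look like this. The accepted schemes (http and https) and the host rule are assumptions chosen for the sketch, not something the answer specifies, and UrlCheck and LooksLikeHttpUrl are made-up names. It does not check the host against a list of known TLDs.

```csharp
using System;

public static class UrlCheck
{
    // Hypothetical helper; scheme list and host rule are assumptions.
    public static bool LooksLikeHttpUrl(string candidate)
    {
        Uri uri;
        if (!Uri.TryCreate(candidate, UriKind.Absolute, out uri))
            return false; // not parseable as an absolute URI

        bool httpScheme = uri.Scheme == Uri.UriSchemeHttp
                       || uri.Scheme == Uri.UriSchemeHttps;
        if (!httpScheme)
            return false; // some other scheme, e.g. ftp

        // Assumed rule: the host needs a dot that is neither first nor last.
        string host = uri.Host;
        return host.IndexOf('.') > 0 && !host.EndsWith(".");
    }
}
```

UrlCheck.LooksLikeHttpUrl("http://www.zi255.com?Req=Post&PID=4") returns true, while "ftp://www.zi255.com" and "not a url" both return false. Stricter rules, such as rejecting unknown top-level domains, would have to be layered on top.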

Sometimes, one catch-all regex is not the best solution, however tempting. While debugging this regex is feasible (see Greg Hewgill's answer), consider doing a couple of tests for different categories of problems, e.g. one test for numerical addresses and one test for named addresses.

You need to force your regex to match up until the end of the string. Add a $ at the very end of it. Otherwise, your regex is probably just matching http://, or something else shorter than your whole string.
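The effect of the trailing $ can be shown with a deliberately simplified pattern. The pattern below is a hypothetical stand-in, not the regex from the question, and AnchorDemo is an illustrative name.

```csharp
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Simplified, hypothetical pattern: optional protocol, dotted labels, then a TLD.
    private const string Unanchored = @"^(https?://)?([a-z0-9]+\.)+([a-z]{2}|com|org|net)";

    // Same pattern, but the match must now extend to the end of the input.
    private const string Anchored = Unanchored + "$";

    public static bool MatchesPrefix(string s)
    {
        return Regex.IsMatch(s, Unanchored);
    }

    public static bool MatchesWhole(string s)
    {
        return Regex.IsMatch(s, Anchored);
    }
}
```

AnchorDemo.MatchesPrefix("http://www.google") is true because [a-z]{2} is satisfied by the "go" in "google", whereas AnchorDemo.MatchesWhole("http://www.google") is false and AnchorDemo.MatchesWhole("http://www.google.com") is true. Note that when a pattern has a top-level |, the anchors need to sit outside a group that wraps all alternatives, otherwise they only apply to one branch.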

The "validate a url" problem has been solved* numerous times. I suggest you use the System.Uri class, it validates more cases than you can shake a stick at.
The code Uri uri = new Uri("http://whatever"); throws a UriFormatException if it fails validation. That is probably what you'd want.
*) Or kind of solved. It's actually pretty tricky to define what is a valid url.

It's all about definitions. A "valid URL" should provide you with an IP address when you do a DNS lookup. You should be able to connect to that IP and, when a request is sent, get a reply in the form of HTML that you can use.
So what we are looking for is a "valid URL format", and that is where System.Uri comes in very handy. But if the URL is hidden in a large piece of text, you first need to find something that could validate as a valid URL format.
The thing that distinguishes a URL from ordinary readable text is a dot not followed by whitespace. "123.com" could validate as a real URL.
Use the regex
[a-z_\.\-0-9]+\.[a-z]+[^ ]*
to find any possibly valid URL in a text, then do a System.Uri check to see whether it is in a valid URL format, and then do a lookup. Only when the lookup gives you a result do you know the URL is valid.
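The first two steps of that approach (find candidates with the regex, then check them with System.Uri) could be sketched as follows; the DNS lookup is left out. UrlFinder and FindCandidates are hypothetical names, and prepending "http://" to each candidate is an assumption made so that Uri.TryCreate can parse scheme-less matches.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlFinder
{
    // Regex taken from the answer above; it only finds candidates.
    private static readonly Regex Candidate =
        new Regex(@"[a-z_\.\-0-9]+\.[a-z]+[^ ]*");

    public static List<Uri> FindCandidates(string text)
    {
        var found = new List<Uri>();
        foreach (Match m in Candidate.Matches(text))
        {
            // Assumption: matches carry no scheme, so "http://" is prepended.
            Uri uri;
            if (Uri.TryCreate("http://" + m.Value, UriKind.Absolute, out uri))
                found.Add(uri);
        }
        return found;
    }
}
```

For the text "visit www.example.com/page today or 123.com" this returns two Uri instances, with hosts www.example.com and 123.com. Because the character class does not include : or /, a URL that already has a scheme is matched from the host onward, which is why the scheme is added back before parsing.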
