UrlEncoding - which encoding should I use? - c#

Using HttpUtility.UrlEncode and passing via the URL the receiving page sees the variables as:
brand new -> brand+new
Airconaire+Ltd -> Airconaire+Ltd
Can you see how the first and the second both have a + in them where they didn't at the start? I'm assuming this is something to do with the encoding (specifically RFC3986 or RFC2396) but how do I solve this?
I think ideally the spaces should be converted to %20 but is this the best way forward?

Try using HttpUtility.UrlPathEncode rather than URLEncode.

The UrlEncode() method can be used to encode the entire URL, including query-string values. If characters such as blanks and punctuation are passed in an HTTP stream, they might be misinterpreted at the receiving end. URL encoding converts characters that are not allowed in a URL into character-entity equivalents; URL decoding reverses the encoding. For example, when the characters < and > are embedded in a block of text to be transmitted in a URL, they are encoded as %3c and %3e.
You can encode a URL using with the UrlEncode() method or the UrlPathEncode() method. However, the methods return different results. The UrlEncode() method converts each space character to a plus character (+). The UrlPathEncode() method converts each space character into the string "%20", which represents a space in hexadecimal notation. Use the UrlPathEncode() method when you encode the path portion of a URL in order to guarantee a consistent decoded URL, regardless of which platform or browser performs the decoding.
http://msdn.microsoft.com/en-us/library/4fkewx0t.aspx

Related

C# - Decoding a string does not return the original encoded one

I have a random generated string that I need to put it in a URL, so I encode it like this:
var encodedToken = System.Web.HttpUtility.UrlEncode(token, System.Text.Encoding.UTF8);
In an ASP.NET action method, I receive this token and decode it:
var token = System.Web.HttpUtility.UrlDecode(encodedToken, System.Text.Encoding.UTF8);
but these tokens are not the same. For example the ab+cd string would encode to ab%2bcd and decoding the result would give me the ab cd string (the plus character changed to whitespace).
So far I have only noticed the + character problem, there may be others.
How can I solve this issue?
In your context, it appears that you don't need to call UrlDecode (since %2b decodes to + and + decodes to a blank space - i.e. you have double decoded).
Given, the framework appears to have already decoded it for you, you may remove your use of UrlDecode.
According to the Microsoft documentation:
You can encode a URL using with the UrlEncode method or the UrlPathEncode method. However, the methods return different results. The UrlEncode method converts each space character to a plus character (+). The UrlPathEncode method converts each space character into the string "%20", which represents a space in hexadecimal notation. Use the UrlPathEncode method when you encode the path portion of a URL in order to guarantee a consistent decoded URL, regardless of which platform or browser performs the decoding.
https://learn.microsoft.com/en-us/dotnet/api/system.web.httputility.urlencode?view=netframework-4.7.2

How to include number (hash) character # in path segment?

I have to download a file (using existing Flurl-Http endpoints [1]) whose name contains a "#" which of course has to be escaped to %23 to not conflict with uri-fragment detection.
But Flurl always escapes the rest but not this character, resulting in a non working uri where half of the path and all query params are missing because they got parsed as uri-fragment:
Url url = "http://server/api";
url.AppendPathSegment("item #123.txt");
Console.WriteLine(url.ToString());
Returns: http://server/api/item%20#123.txt
This means a http request (using Flurl.Http) would only try to download the non-existing resource http://server/api/item%20.
Even when I pre-escape the segment, the result still becomes exactly the same:
url.AppendPathSegment("item %23123.txt");
Console.WriteLine(url.ToString());
Again returns: http://server/api/item%20#123.txt.
Any way to stop this "magic" happen?
[1] This means I have delegates/interfaces where input is an existing Flurl.Url instance which I have to modify.
It looks like you've uncovered a bug. Here are the documented encoding rules Flurl follows:
Query string values are fully URL-encoded.
For path segments, reserved characters such as / and % are not encoded.
For path segments, illegal characters such as spaces are encoded.
For path segments, the ? character is encoded, since query strings get special treatment.
According to the 2nd point, it shouldn't encode # in the path, so how it handles AppendPathSegment("item #123.txt") is correct. However, when you encode the # to %23 yourself, Flurl certainly shouldn't unencode it. But I've confirmed that's what's happening. I invite you to create an issue on GitHub and it'll be addressed.
In the mean time, you could write your own extension method to cover this case. Something like this should work (and you wouldn't even need to pre-encode #):
public static Url AppendFileName(this Url url, string fileName) {
url.Path += "/" + WebUtility.UrlEncode(fileName);
return url;
}
I ended up using Uri.EscapeDataString(foo) because suggested WebUtility.UrlEncode replaces space with '+' which I didn't want to.

Encoding JSON serialized for URL

I have to create a webrequest using HTTP GET and I have to add on the url a JSON serialized and encoded with all parameters.
But I have a problem when I'm encoding the serialized object, they encode also symbols like "{" and ":"
I would like to know what I have to do in order to encode the serialized object like the followed above:
Serialized Object:
{\"Name\":\"Bob\"}
Encoded With HttpUtility.Utility, or another encoder will encode all symbols:like "{" ":"
"%7b%22Name%22%3a%22Bob%22%2
What I'm looking for:
http://tmpserviceURL.test?parameters={%20%22Name%22:%20%22Bob%22}
#Ant P is correct: you want those characters to be encoded. It is a bad idea not to encode them.
HttpUtility.UrlEncode and other similar methods encode {, } and : because they MUST do so per section 2.2 of the Uniform Resource Locators specification (RFC 1738).
From page 2:
Octets [within a URL] must be encoded if they have no corresponding graphic
character within the US-ASCII coded character set, if the use of the
corresponding character is unsafe, or if the corresponding character
is reserved for some other interpretation within the particular URL
scheme.
The spec goes on to define : as being in the set of "reserved" characters (those that have special meaning within a URL), and defines { and } as being in the set of "unsafe" characters (those that are known to be sometimes modified by gateways and other transport agents).
So, in short, if you send these characters unencoded in a URL, then you risk the URL not being interpretted correctly, or the data being corrupted by the time it reaches its destination. It may work sometimes, but you can't rely on it always working.
If you really feel you must ignore the URL spec, then you will have to roll your own URL encoder that does not encode those particular characters. I doubt you're going to find an off-the-shelf encoder that allows you to do this.

Replacing illegal or undesired characters in requested URL before processing it .net

My site throws an exception every time a special kind of character is included in the request, or when the size of the URL exceeds a certain length.
How can I control the URL and transform it before processing it (For example : if the request was http://xwz.com/"ert I want to turn it into http://xwz.com/ert). Something like that.
I am using .net and c#
use this : HttpServerUtility.UrlEncode Method (String)
You can use it like this :
System.Web.HttpUtility.UrlEncode("test t");
You will need this library : UrlEncode usesSystem.Web.HttpUtility.UrlEncodeto encode strings.
Looking for HttpUtility.UrlEncode
The UrlEncode(String) method can be used to encode the entire URL, including query-string values. If characters such as blanks and punctuation are passed in an HTTP stream without encoding, they might be misinterpreted at the receiving end. URL encoding converts characters that are not allowed in a URL into character-entity equivalents; URL decoding reverses the encoding. For example, when the characters < and > are embedded in a block of text to be transmitted in a URL, they are encoded as %3c and %3e.
The code below will replace any invalid characters in your URL by an empty space
string url = System.Text.RegularExpressions.Regex.Replace(url , #"/^[!#$&-;=?-[]_a-z~]+$/", "");
I think this is what you're looking for:
System.Web.HttpUtility.UrlEncode(string url)

QueryString converting %E1 to %ufffd

I have a URL like the following
http://mysite.com/default.aspx?q=%E1
Where %E1 is supposed to be á. When I call Request.QueryString from my C# page I receive
http://mysite.com/default.aspx?q=%ufffd
It does this for any accented character. %E1, %E3, %E9, %ED etc. all get passed as %ufffd. Normal encoded values (%2D, %2E, %27) all get passed correctly.
The config file already has the responseEncoding/requestEncoding in the globalization section set to UTF-8.
How could I read the correct values?
Please note that I'm not the one generating the query string and I have no control over it.
While it's true that á is encoded as U+00E1, the UTF-8 encoding (which is relevant for URL parameters) is 0xC3 0xA1.
You can verify by called a Wikipedia entry on an accented letter, such as http://en.wikipedia.org/wiki/%C3%81
U+FFFD is the Unicode Replacement Character which indicates the a given character value cannot be correctly encoded in Unicode.
Update:
Your question has two points.
First: How do I encode a Unicode string as parameter. Use
"?q=" + HttpUtility.UrlEncode(value)
Second: How do I retrieve a Unicode value? Use:
Request["q"]
If you receive the %E1 from some other source you do not control, maybe the RawUrl can help you. (I have not tried)

Categories