URL encoding issue - HttpWebRequest doesn't work, Firefox does

URL encoding issue - HttpWebRequest doesn't work, Firefox does - c#

I'm trying to call a rest webservice provided by a lims system (basically a chemistry lab database + interface). It was working great until some > ascii characters showed up (specifically characters with circumflexes, umlauts, etc.)
When calling the webservice passing the value àèïõû I have the following argument:
&componentValue=àèïõû
HttpWebRequest, without any pre-escaping OR with Uri.EscapeDataString() called on the value gives:
Ã Ã¨Ã¯ÃµÃ»
Firefox, with the same website as was passed to HttpWebRequest gives the correct value:
àèïõû
Now for the escaping itself:
Uri.EscapeDataString() appears to escape "àèïõû" as:
%C3%A0%C3%A8%C3%AF%C3%B5%C3%BB
Firefox escapes "àèïõû" as:
%E0%E8%EF%F5%FB
As the latter works I would of course prefer to use that as my escape method, but I really don't know where to begin. I've found plenty of information on different methods of handling encodes on the response data, but not on the request.

From MSDN:
Uri.EscapeDataString Method
[...] All Unicode characters are converted to UTF-8 format before being escaped.
So what you're seeing is the UTF-8 encoded version of àèïõû.
Unlike Uri.EscapeDataString, HttpUtility.UrlEncode allows you to specify an encoding explicitly:
HttpUtility.UrlEncode("àèïõû", Encoding.GetEncoding("latin1"));
Alternatively, you could write your own version; for example:
string.Concat(Encoding
.GetEncoding("latin1")
.GetBytes("àèïõû")
.Select(x => "%" + x.ToString("x2"))
.ToArray());
Both result in "%e0%e8%ef%f5%fb".
A better solution would probably be to accept UTF-8 encoded query strings in the webservice.

It appears that Uri.HexEscape() will do what you want, but only one character at a time. I'd roll your own escaping function and hope that your codepage is always the same codepage that the webservice is using, since it appears that the webservice doesn't support Unicode.

Related

How do I correctly parse a URI query string into a name-value collection in C#?

I'm using .NET 4.5 and I'm trying to parse a URI query string into a NameValueCollection. The right way seems to be to use HttpUtility.ParseQueryString(string query) which takes the string obtained from Uri.Queryand returns a NameValueCollection. Uri.Query returns a string that is escaped according to RFC 2396, and HttpUtility.ParseQueryString(string query) expects a string that is URL-encoded. Assuming RFC 2396 and URL-encoding are the same thing, this should work fine.
However, the documentation for ParseQueryString claims that it "uses UTF8 format to parse the query string". There is also an overloaded method which takes a System.Text.Encoding and then uses that instead of UTF8.
My question is: what does it mean to use UTF8 as the encoding? The input is a string, which by definition (in C#) is UTF-16. How is that interpreted as UTF-8? What is the difference between using UTF8 and UTF16 as the encoding in this case? My concern is that since I'm accepting arbitrary user input, there might be some security risk if I botch the encoding (i.e. the user might be able to slip through some script exploit).
There is a previous question on this topic (How to parse a query string into a NameValueCollection in .NET) but it doesn't specifically adress the encoding problem.

When parsing encoded values, it treats those values as UTF-8. Take the character ¢, for example. The UTF-8 encoding is C2 A2. So if it were in a query string, it would be encoded as %C2%A2.
Now, when ParseQueryString is decoding, it needs to know what encoding to use. The default is UTF-8, meaning that the character would be decoded correctly. But perhaps the user was using Microsoft's Cyrillic code page (Windows-1251), where C2 and A2 are two different characters. In that case, interpreting it as UTF-8 would be an error.
If this is a user interface application (i.e. the user is entering data directly), then you probably want to use whatever encoding is defined for the current UI culture. If you're getting this information from Web pages, then you'll want to use whatever encoding the page uses. And if you're writing a Web service then you can tell the users that their input has to be UTF-8 encoded.

UrlEncoding - which encoding should I use?

Using HttpUtility.UrlEncode and passing via the URL the receiving page sees the variables as:
brand new -> brand+new
Airconaire+Ltd -> Airconaire+Ltd
Can you see how the first and the second both have a + in them where they didn't at the start? I'm assuming this is something to do with the encoding (specifically RFC3986 or RFC2396) but how do I solve this?
I think ideally the spaces should be converted to %20 but is this the best way forward?

Try using HttpUtility.UrlPathEncode rather than URLEncode.

The UrlEncode() method can be used to encode the entire URL, including query-string values. If characters such as blanks and punctuation are passed in an HTTP stream, they might be misinterpreted at the receiving end. URL encoding converts characters that are not allowed in a URL into character-entity equivalents; URL decoding reverses the encoding. For example, when the characters < and > are embedded in a block of text to be transmitted in a URL, they are encoded as %3c and %3e.
You can encode a URL using with the UrlEncode() method or the UrlPathEncode() method. However, the methods return different results. The UrlEncode() method converts each space character to a plus character (+). The UrlPathEncode() method converts each space character into the string "%20", which represents a space in hexadecimal notation. Use the UrlPathEncode() method when you encode the path portion of a URL in order to guarantee a consistent decoded URL, regardless of which platform or browser performs the decoding.
http://msdn.microsoft.com/en-us/library/4fkewx0t.aspx

How to use strange characters in a query string

I am using silverlight / ASP .NET and C#. What if I want to do this from silverlight for instance,
// I have left out the quotes to show you literally what the characters
// are that I want to use
string password = vtakyoj#"5
string encodedPassword = HttpUtility.UrlEncode(encryptedPassword, Encoding.UTF8);
// encoded password now = vtakyoj%23%225
URI uri = new URI("http://www.url.com/page.aspx#password=vtakyoj%23%225");
HttpPage.Window.Navigate(uri);
If I debug and look at the value of uri it shows up as this (we are still inside the silverlight app),
http://www.url.com?password=vtakyoj%23"5
So the %22 has become a quote for some reason.
If I then debug inside the page.aspx code (which of course is ASP .NET) the value of Request["password"] is actually this,
vtakyoj#"5
Which is the original value. How does that work? I would have thought that I would have to go,
HttpUtility.UrlDecode(Request["password"], Encoding.UTF8)
To get the original value.
Hope this makes sense?
Thanks.

First lets start with the UTF8 business. Esentially in this case there isn't any. When a string contains characters with in the standard ASCII character range (as your password does) a UTF8 encoding of that string is identical to a single byte ASCII string.
You start with this:-
vtakyoj#"5
The HttpUtility.UrlEncode somewhat aggressively encodes it to:-
vtakyoj%23%225
Its encoded the # and " however only # has special meaning in a URL. Hence when you view string value of the Uri object in Silverlight you see:-
vtakyoj%23"5
Edit (answering supplementary questions)
How does it know to decode it?
All data in a url must be properly encoded thats part of its being valid Url. Hence the webserver can rightly assume that all data in the query string has been appropriately encoded.
What if I had a real string which had %23 in it?
The correct encoding for "%23" would be "%3723" where %37 is %
Is that a documented feature of Request["Password"] that it decodes it?
Well I dunno, you'd have check the documentation I guess. BTW use Request.QueryString["Password"] the presence of this same indexer directly on Request was for the convenience of porting classic ASP to .NET. It doesn't make any real difference but its better for clarity since its easier to make the distinction between QueryString values and Form values.
if I don't use UFT8 the characters are being filtered out.
Aare you sure that non-ASCII characters may be present in the password? Can you provide an example you current example does not need encoding with UTF-8?

If Request["password"] is to work, you need "http://url.com?password=" + HttpUtility.UrlEncode("abc%$^##"). I.e. you need ? to separate the hostname.
Also the # syntax is username:password#hostname, but it has been disabled in IE7 and above IIRC.

Double/incomplete Parameter Url Encoding

In my web app, my parameters can contain all sorts of crazy characters (russian chars, slashes, spaces etc) and can therefor not always be represented as-is in a URL.
Sending them on their merry way will work in about 50% of the cases. Some things like spaces are already encoded somewhere (I'm guessing in the Html.BuildUrlFromExpression does). Other things though (like "/" and "*") are not.
Now I don't know what to do anymore because if I encode them myself, my encoding will get partially encoded again and end up wrong. If I don't encode them, some characters will not get through.
What I did is manually .replace() the characters I had problems with.
This is off course not a good idea.
Ideas?
--Edit--
I know there are a multitude of encoding/decoding libraries at my disposal.
It just looks like the mvc framework is already trying to do it for me, but not completely.
<a href="<%=Html.BuildUrlFromExpression<SearchController>(c=>c.Search("", 1, "a \v/&irdStr*ng"))%>" title="my hat's awesome!">
will render me
<a href="/Search.mvc/en/Search/1/a%20%5Cv/&irdStr*ng" title="my hat's awesome!">
Notice how the forward slash, asterisk and ampersand are not escaped.
Why are some escaped and others not? How can I now escape this properly?
Am I doing something wrong or is it the framework?

Parameters should be escaped using Uri.EscapeDataString:
string url = string.Format("http://www.foo.bar/page?name={0}&address={1}",
Uri.EscapeDataString("adlknad /?? lkm#"),
Uri.EscapeDataString(" qeio103 8182"));
Console.WriteLine(url);
Uri uri = new Uri(url);
string[] options = uri.Query.Split('?','&');
foreach (string option in options)
{
string[] parts = option.Split('=');
if (parts.Length == 2)
{
Console.WriteLine("{0} = {1}",parts[0],
Uri.UnescapeDataString(parts[1]));
}
}

AS others have mentioned, if you encode your string first you aviod the issue.
The MVC Framework is encoding characters that it knows it needs to encode, but leaving those that are valid URL characters (e.g. & % ? * /). This is because these are valid URL characters, although they are special chracters in a URL that might not acheive the result you are after.

Try using the Microsoft Anti-Cross Site Scripting library. It contains several Encode methods, which encode all the characters (including #, and characters in other languages). As for decoding, the browser should handle the encoded Url just fine, however if you need to manually decode the Url, use Uri.UnescapeDataString
Hope that helps.

Escaping of forward slahes and dots in path part of url is prohibited by security reason (althrough, it works in mono).

Html.BuildUrlFromExpression needs to be fixed then, would submit this upstream to the MVC project... alternatively do the encoding to the string before passing to BuildUrlFromExpression, and decode it when it comes back out on the other side.
It may not be readily fixable, as IIS may be handling the decoding of the url string beforehand... may need to do some more advanced encoding/decoding for alternative path characters in the utility methods, and decode on your behalf coming out.

I've seen similar posts on this. Too me, it looks like a flaw in MVC. The function would be more appropriately named "BuildUrlFromEncodedExpression". Whats worse, is that the called function needs to decode its input parameters. Yuk.
If there is any overlap between the characters encoded BuildUrlFromExpression() and the characters encoded by the caller (who, I think might fairly just encode any non-alphanumeric for simplicities sake) then you have potential for nasty bugs.

Server.URLEncode or HttpServerUtility.UrlEncode
I see what you're saying now - I didn't realize the question was specific to MVC. Looks like a limitation of that part of the MVC framework - particularly BuildUrlFromExpression is doing some URL encoding, but it knows that also needs some of those punctation as part of the framework URLs.
And also unfortunately, URLEncoding doesn't produce an invariant, i.e.
URLEncode(x) != URLEncode(URLEncode(x))
Wouldn't that be nice. Then you could pre-encode your variables and they wouldn't be double encoded.
There's probably an ASP.NET MVC framework best practice for this. I guess another thing you could do is encode into base64 or something that is URLEncode-invariant.

Have you tried using the Server.UrlEncode() method to do the encoding, and the Server.UrlDecode() method to decode?
I have not had any issues with using it for passing items.

QueryString malformed after URLDecode

I'm trying to pass in a Base64 string into a C#.Net web application via the QueryString. When the string arrives the "+" (plus) sign is being replaced by a space. It appears that the automatic URLDecode process is doing this. I have no control over what is being passed via the QueryString. Is there any way to handle this server side?
Example:
http://localhost:3399/Base64.aspx?VLTrap=VkxUcmFwIHNldCB0byAiRkRTQT8+PE0iIHBsdXMgb3IgbWludXMgNSBwZXJjZW50Lg==
Produces:
VkxUcmFwIHNldCB0byAiRkRTQT8 PE0iIHBsdXMgb3IgbWludXMgNSBwZXJjZW50Lg==
People have suggested URLEncoding the querystring:
System.Web.HttpUtility.UrlEncode(yourString)
I can't do that as I have no control over the calling routine (which is working fine with other languages).
There was also the suggestion of replacing spaces with a plus sign:
Request.QueryString["VLTrap"].Replace(" ", "+");
I had though of this but my concern with it, and I should have mentioned this to start, is that I don't know what other characters might be malformed in addition to the plus sign.
My main goal is to intercept the QueryString before it is run through the decoder.
To this end I tried looking at Request.QueryString.toString() but this contained the same malformed information. Is there any way to look at the raw QueryString before it is URLDecoded?
After further testing it appears that .Net expects everything coming in from the QuerString to be URL encoded but the browser does not automatically URL encode GET requests.

The suggested solution:
Request.QueryString["VLTrap"].Replace(" ", "+");
Should work just fine. As for your concern:
I had though of this but my concern with it, and I should have mentioned this to start, is that I don't know what other characters might be malformed in addition to the plus sign.
This is easy to alleviate by reading about base64. The only non alphanumeric characters that are legal in modern base64 are "/", "+" and "=" (which is only used for padding).
Of those, "+" is the only one that has special meaning as an escaped representation in URLs. While the other two have special meaning in URLs (path delimiter and query string separator), they shouldn't pose a problem.
So I think you should be OK.

You could manually replace the value (argument.Replace(' ', '+')) or consult the HttpRequest.ServerVariables["QUERY_STRING"] (even better the HttpRequest.Url.Query) and parse it yourself.
You should however try to solve the problem where the URL is given; a plus sign needs to get encoded as "%2B" in the URL because a plus otherwise represents a space.
If you don't control the inbound URLs, the first option would be preferred as you avoid the most errors this way.

I'm having this exact same issue except I have control over my URL. Even with Server.URLDecode and Server.URLEncode it doesn't convert it back to a + sign, even though my query string looks as follows:
http://localhost/childapp/default.aspx?TokenID=0XU%2fKUTLau%2bnSWR7%2b5Z7DbZrhKZMyeqStyTPonw1OdI%3d
When I perform the following.
string tokenID = Server.UrlDecode(Request.QueryString["TokenID"]);
it still does not convert the %2b back into a + sign. Instead I have to do the following:
string tokenID = Server.UrlDecode(Request.QueryString["TokenID"]);
tokenID = tokenID.Replace(" ", "+");
Then it works correctly. Really odd.

I had similar problem with a parameter that contains Base64 value and when it comes with '+'.
Only Request.QueryString["VLTrap"].Replace(" ", "+"); worked fine for me;
no UrlEncode or other encoding helping because even if you show encoded link on page yourself with '+' encoded as a '%2b' then it's browser that changes it to '+' at first when it showen and when you click it then browser changes it to empty space. So no way to control it as original poster says even if you show links yourself. The same thing with such links even in html emails.

If you use System.Uri.UnescapeDataString(yourString) it will ignore the +. This method should only be used in cases like yours where when the string was encoded using some sort of legacy approach either on the client or server.
See this blog post:
http://blogs.msdn.com/b/yangxind/archive/2006/11/09/don-t-use-net-system-uri-unescapedatastring-in-url-decoding.aspx

If you URLEncode the string before adding it to the URL you will not have any of those problems (the automatic URLDecode will return it to the original state).

Well, obviously you should have the Base64 string URLEncoded before sending it to the server.
If you cannot accomplish that, I would suggest simply replacing any embedded spaces back to +; since b64 strings are not suposed to have spaces, its a legitimate tactic...

System.Web.HttpUtility.UrlEncode(yourString) will do the trick.

As a quick hack you could replace space with plus character before base64-decoding.

I am by no means a C# developer but it looks like you need to url ENCODE your Base64 string before sending it as a url.

Can't you just assume a space is a + and replace it?
Request.QueryString["VLTrap"].Replace(" ", "+");
;)

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.