What is the correct encoding for querystrings? - c#

I am trying to send a request to a URL like "http://mysite.dk/tværs?test=æ" from an ASP.NET application, and I am having trouble getting the query string to encode correctly. Or maybe the query string is encoded correctly and the service I am connecting to just doesn't understand it.
I have tried sending the request from different clients and logging how each one encodes it with Wireshark, and I get these results:
Firefox: http://mysite.dk/tv%C3%A6rs?test=%E6
IE8: http://mysite.dk/tv%C3%A6rs?test=\xe6
Curl: http://mysite.dk/tv\xe6rs?test=\xe6
Firefox, IE and Curl all receive the correct results from the service. Note that they encode the Danish special character 'æ' differently in the path, but all of them send it as the single Latin-1 byte E6 in the query string.
When I send the request from my asp.net application using HttpWebRequest, the URL gets encoded this way:
http://mysite.dk/tv%C3%A6rs?test=%C3%A6
It encodes the querystring the same way as the path part of the url. The remote service does not understand this encoding, so I don't get a correct answer.
For the record, 'æ' (U+00E6) is %E6 in ISO-LATIN-1, and %C3%A6 in UTF-8.
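To make the difference concrete, here is a minimal sketch that percent-encodes a string under a chosen encoding. PercentEncoding.Encode is a hypothetical helper, not a framework API, and it assumes an ASCII-compatible encoding such as UTF-8 or ISO-8859-1.

```csharp
using System;
using System.Text;

public static class PercentEncoding
{
    // Hypothetical helper: converts the string to bytes with the given
    // encoding and writes every byte that is not an RFC 3986 unreserved
    // character as %XX. Assumes an ASCII-compatible encoding.
    public static string Encode(string value, Encoding encoding)
    {
        var result = new StringBuilder();
        foreach (byte b in encoding.GetBytes(value))
        {
            char c = (char)b;
            bool unreserved =
                (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                (c >= '0' && c <= '9') ||
                c == '-' || c == '.' || c == '_' || c == '~';

            if (unreserved)
                result.Append(c);
            else
                result.Append('%').Append(b.ToString("X2"));
        }
        return result.ToString();
    }
}
```

With this helper, PercentEncoding.Encode("æ", Encoding.UTF8) gives %C3%A6 and PercentEncoding.Encode("æ", Encoding.GetEncoding("iso-8859-1")) gives %E6, matching the two forms seen in the captures.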
I could change the remote service to accept the UTF-8 encoded querystring, but then the service would stop working in browsers and I am not really interested in that. Is there a way to specify to .NET that it shouldn't encode querystrings with UTF-8?
I am creating the webrequest like this:
var req = WebRequest.Create("http://mysite.dk/tværs?test=æ") as HttpWebRequest;
But the problem seems to originate from System.Uri which is apparently used inside WebRequest.Create:
var uri = new Uri("http://mysite.dk/tværs?test=æ");
// now uri.AbsoluteUri == "http://mysite.dk/tv%C3%A6rs?test=%C3%A6"

It looks like you're applying UrlEncode over the entire URL - this isn't correct, paths and query strings are encoded differently as you've seen. What is doing the encoding of the URI, WebRequest?
You could manually build the various parts using a UriBuilder, or manually encode using UrlPathEncode for the path and UrlEncode for the query string names and values.
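A rough sketch of the manual approach, assuming the service wants a UTF-8 path and a Latin-1 query string as in the browser captures above. MixedEncodingUrl.Build is a hypothetical helper, and whether System.Uri or HttpWebRequest leaves the already-escaped %E6 untouched afterwards is not confirmed here, so it is worth checking the outgoing request with a capture.

```csharp
using System;
using System.Text;
using System.Web;

public static class MixedEncodingUrl
{
    // Hypothetical helper: escapes the path segment as UTF-8 and the query
    // name/value as ISO-8859-1, then joins the parts by hand.
    public static string Build(string host, string pathSegment, string key, string value)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // Uri.EscapeDataString always uses UTF-8.
        string path = Uri.EscapeDataString(pathSegment);

        // This HttpUtility.UrlEncode overload lets the caller pick the encoding.
        string query = HttpUtility.UrlEncode(key, latin1) + "=" +
                       HttpUtility.UrlEncode(value, latin1);

        return "http://" + host + "/" + path + "?" + query;
    }
}
```

Calling MixedEncodingUrl.Build("mysite.dk", "tværs", "test", "æ") should produce the path tv%C3%A6rs and the query test=%e6 (HttpUtility writes lowercase hex digits, which is equivalent).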
Edit:
If the problem lies in the path rather than the query string, you could try turning on IRI support via web.config:
<configuration>
  <uri>
    <iriParsing enabled="true" />
  </uri>
</configuration>
That should then leave international characters alone in the path.

Have you tried the UrlEncode?
http://msdn.microsoft.com/en-us/library/zttxte6w.aspx

I ended up changing my remote webservice to expect the querystring to be UTF-8 encoded. It solves my immediate problem: the webservice can now be correctly called by both PHP and the .NET framework.
However, the behavior is now strange in browsers. Copy-pasting a URL like "http://mysite.dk/tv%C3%A6rs?test=%C3%A6" into the browser and then pressing return works; it even decodes the escaped characters and displays the location as "http://mysite.dk/tværs?test=æ". If I then reload the page (F5), it still works. But if I click on the location bar and press return again, the querystring is encoded as Latin-1 and the request fails.
For anyone interested, here is an old Firefox bug report about the problem: https://bugzilla.mozilla.org/show_bug.cgi?id=284474 (thanks to @dtb)
So, it seems there is no good solution.
Thanks to everyone who helped though!

Related

Setting POST data's encoding on WPF WebBrowser

Backgrounds
I've been using my own site to send a POST request to another site that requires a specific sender URL, which is my site's URL. Now I'm trying to send the same POST request automatically from my WPF application. I can't use HttpClient's SendAsync method, because that way I can't include a valid site URL in the header. So I'm trying to send the request from a WebBrowser control by loading the page, filling in the input values, and then submitting the form. And here's my problem.
Problem
The target server that receives the POST request requires a specific encoding, something other than UTF-8. My site is configured for that encoding and works fine in real web browsers like Chrome or IE. But when I send the exact same request from the WPF WebBrowser by loading the same page, the encoding somehow goes wrong. I'd like my WPF WebBrowser to send the POST request in a specific encoding, but I can't find a way to do so. I'd be grateful for any advice. Thank you in advance.
What I've tried
I've inserted <meta encoding="~"> into my HTML and also set <form charset="~"> on the form, but it still sends the request in the wrong encoding.
I looked at the broken text in my request and ran it through a decoder that restores broken text to the original, which confirmed that the request from WPF was encoded as UTF-8. Then I found the WebBrowser.Document.encoding option. My defaultEncoding was set to UTF-8, and that caused all the problems. Setting document.encoding = "myEncoding"; solved my problem.

ASP.Net Core Returning 404 when image url spaces encoded as +

I have URLs stored in a database where the spaces are encoded as +.
When the browser requests these urls the web server returns a 404 response.
These URLs are all for static images stored on the web server in the wwwroot folder.
If I manually change the + for %20 then the image is returned correctly.
Is this a deliberate change in ASP.Net Core or is this a bug?
If it's deliberate, then it's going to be very painful for me going through the database and re-encoding all the URLs, many of which are embedded in HTML snippets (I know storing HTML in the DB or having spaces in image files aren't a good idea but it was done long before I joined the company and that's the state we're already in).
I'm using ASP.Net Core 2.1, running on .Net Framework.
It's running through IIS Express at the moment (during development) but will be deployed with full IIS.
I have seen this other question but it's specifically to do with API calls and the answer doesn't seem to be applicable to my question as there are no routes to change as I'm requesting static image files.
Edit: Extra detail
The html is output using @Html.Raw(html)
The resulting html output to the browser is of the form
<img src="/BorderThemes/grey+4px+rounded+corners_TL.png" />
The Html was generated on the server and then stored in the DB so we can be confident it's safe to output to the browser and, no, I have no idea why anyone would do that rather than building the HTML when it's needed but it was before my time and it's the situation I'm already in.
Update:
I've looked deeper into this, and if I enter http://localhost:8000/BorderThemes/grey+4px+rounded+corners_TL.png into my web browser, I get an IIS error page reporting HTTP Error 404.11, which says that my URL is double-encoded and links to a page with more information. That page does include instructions on how to allow double-encoding, but with warnings that it can have security consequences.
If I enter the url http://localhost:8000/BorderThemes/grey%204px%20rounded%20corners_TL.png I get an image back.
I was having issues with paths / html stored in the DB but after experimenting, it appears that System.Net.WebUtility.UrlEncode encodes spaces as +. For example WebUtility.UrlEncode("foo bar.png") returns foo+bar.png, which is rejected as double-encoded by IIS.
Am I missing something or is Microsoft's function for encoding URLs encoding the URLs in a way that Microsoft's web server rejects?
If you want %20 instead of +, try using EscapeDataString to encode the URI:
Uri.EscapeDataString(someString);
Refer https://stackoverflow.com/a/50682381/704008
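As a small illustration of the difference between the two methods (SpaceEncodingDemo is just a hypothetical wrapper around the two framework calls):

```csharp
using System;
using System.Net;

public static class SpaceEncodingDemo
{
    // Form-style encoding: spaces become '+'.
    public static string FormStyle(string value)
    {
        return WebUtility.UrlEncode(value);
    }

    // RFC 3986-style escaping: spaces become %20, which is safe in a path.
    public static string PathStyle(string value)
    {
        return Uri.EscapeDataString(value);
    }
}
```

For the input "foo bar.png", FormStyle returns foo+bar.png and PathStyle returns foo%20bar.png.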
But you have already generated the URLs and can't change them now, so try using HtmlDecode, like
System.Net.WebUtility.HtmlDecode(html);
I am not sure whether it is best used together with Raw or whether @Html has its own decode method, but try:
@Html.Raw(System.Net.WebUtility.HtmlDecode(html))
Refer:
https://learn.microsoft.com/en-us/dotnet/api/system.net.webutility?view=netstandard-2.0
https://learn.microsoft.com/en-us/dotnet/api/system.net.webutility.urldecode?view=netstandard-2.0#System_Net_WebUtility_UrlDecode_System_String_
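If re-encoding the database is not practical, another option might be to rewrite the stored snippets at output time. The following is only a sketch under assumptions: StoredHtmlFixup is a hypothetical helper, it only handles double-quoted src attributes, and it assumes no image file name contains a literal + character.

```csharp
using System.Text.RegularExpressions;

public static class StoredHtmlFixup
{
    // Captures src=" and the URL up to the closing quote, query or fragment.
    private static readonly Regex SrcAttribute =
        new Regex("(src=\")([^\"?#]*)", RegexOptions.IgnoreCase);

    // Replaces '+' with %20 in the path part of every src attribute.
    public static string PlusToPercent20InSrc(string html)
    {
        return SrcAttribute.Replace(
            html,
            m => m.Groups[1].Value + m.Groups[2].Value.Replace("+", "%20"));
    }
}
```

The result could then be passed to @Html.Raw in place of the original string, for example @Html.Raw(StoredHtmlFixup.PlusToPercent20InSrc(html)).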

Browser converts encoded slash (%2F) to literal slash (/) in path portion of URL

I'm currently working with an email confirmation after registration using ASP.NET Identity.
This library provides a token generation which is needed to complete the registration. This token is used in our application in the following path:
https://localhost/#/account/{token}/setup
And the token is generated by invoking:
var emailToken = _userManager.GenerateEmailConfirmationToken(newUser.Id);
Once I have my token generated, I add it in the path by doing a string.Format this way:
string.Format("https://localhost/#/account/{0}/setup", HttpUtility.UrlEncode(emailToken));
The result looks like this:
https://localhost/#/account/AQAAANCMnd8BFdERjHoAwE%2fCl%2bsBAAAA6gbQhGTTMUWVHDgOwC9T9AAAAAACAAAAAAAQZgAAAAEAACAAAAAqo%2fiAv8iIn7Zox9pS3MOUMVNisAo7Bnada6%2f9wKEe6wAAAAAOgAAAAAIAACAAAABUu7WkD9vHvN2EDz2%2bqGwvJ4j6gj%2f4PaBTbI861jfEcWAAAADJV74LZjKAXv5v1FqYVuWLyTpPBCnLfopSi3rsEEwMHFKwltHL3moL2h%2fvYVs%2fu3LB%2br5Qytuu%2fZYOUWQTY5KzBqHeZoi7RJ02emDI0NTRhIKxfSGGIdbYxuAjsW14G0BAAAAACsC8L%2bdUDzFMgKUOkxWhKofAz8L0mH5VFEt8Oq%2fKYsxIiu4fiA2sGlPfDhhKQnV2lg%2ba8qHydUjqmyfxNex0Pg%3d%3d/setup
but when I open this url in the browser I get:
...and so on!
What I see is that the URL is encoded correctly in the body of the email, but it is decoded when I open it in the browser: the encoded "%2f" is replaced with "/". This leads to an invalid route in my application, since I expect "/" to be the separator between different resources.
Any thoughts on this behaviour?
References:
Another guy with my problem too
It's probably decoding it because it considers it part of the path.
I would suggest you explicitly treat it as a parameter. That will tell the browser not to decode it. For instance, instead of having this path:
https://localhost/#/account/AQAAANCMnd8BFdERjHoAwE%2fCl%2bsBAAAA6gbQh..........
Use this path:
https://localhost/#/account/?t=AQAAANCMnd8BFdERjHoAwE%2fCl%2bsBAAAA6gbQh......
Notice the addition of ?t= after the end of the account path.
Then consume the t parameter in your application. That will tell the browser that the value at the end is not to be decoded as part of the path but rather preserved in encoded form because it's a parameter.
This would obviously change the path you have (because of the setup part) so adjust accordingly.
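A minimal sketch of building such a link on the server, assuming the parameter is called t as above. ConfirmationLink is a hypothetical helper, and Uri.EscapeDataString is used here in place of HttpUtility.UrlEncode so that both "/" and "+" in the token are percent-escaped.

```csharp
using System;

public static class ConfirmationLink
{
    // Appends the token as an escaped query parameter named "t".
    public static string Build(string accountUrl, string token)
    {
        return accountUrl + "?t=" + Uri.EscapeDataString(token);
    }

    // Reverses the escaping on the receiving side.
    public static string ReadToken(string escapedToken)
    {
        return Uri.UnescapeDataString(escapedToken);
    }
}
```

For example, ConfirmationLink.Build("https://localhost/#/account/", "a/b+c==") yields https://localhost/#/account/?t=a%2Fb%2Bc%3D%3D, and ReadToken("a%2Fb%2Bc%3D%3D") returns the original a/b+c==.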

WebAPI get request parameters being UrlDecoded at the controller

I have an issue where I'm issuing a GET to a WebAPI controller, essentially:
$.getJSON('/api/feefo/getproductfeedback?id='+ encodeURIComponent(skuNum))
I'm using encodeURIComponent to url encode the skuNum parameter, viewing a request in dev tools I get the expected result for a skuNum that needs to be encoded:
The skuNum has gone from 1000EF+ to 1000EF%2B as expected.
However, when I view the id parameter in the WebAPI controller, it's coming through un-encoded:
It's as though the client side url encoding is being undone somehow, can anyone explain what's going on here? Obviously I can work around this by just doing the encoding in the controller, but I'd like to understand why this is happening.
That is by design. The API framework decodes URL-encoded parameters by default. The encoding should only be used for transporting the data; once the request is on the server, the developer shouldn't have to deal with decoding it (it is a cross-cutting concern). Use the value as intended.
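The same rule can be seen outside Web API with HttpUtility.ParseQueryString. This is only an illustration, since Web API's own model binding is a separate code path, and QueryDecodingDemo is a hypothetical helper.

```csharp
using System;
using System.Web;

public static class QueryDecodingDemo
{
    // Parses a raw query string and returns the decoded "id" value,
    // roughly what a controller parameter named id would receive.
    public static string ReadId(string rawQuery)
    {
        return HttpUtility.ParseQueryString(rawQuery)["id"];
    }
}
```

ReadId("id=1000EF%2B") returns 1000EF+, the original value. Without the client-side encoding, ReadId("id=1000EF+") returns "1000EF " with a trailing space, because an unescaped + in a query string is decoded as a space. That is why the encoding is still needed for transport even though the controller sees the decoded value.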

Passing network path in URL

I am creating a WCF Service with a method
[OperationContract]
[WebGet(UriTemplate = "acl/f={fullFileName}")]
string GetACL(string fullFileName);
fullFileName is a full path to a network file, or a file on the host.
The host is a Windows Service with webHttpBinding and behavior configuration.
I want to call this from a browser using something like
http://localhost/webservice/acl/f=[my network path here]
I have tried .../acl/f=file://\server\share\file.ext
.../acl/f=file://c:\file.ext
In the browser I receive "Endpoint not found".
I know this works because I can call .../acl/f=file.txt and I get back the proper response from my service indicating that the file was not found. So the method is getting called correctly when I don't use slashes of any sort in the URI.
Any thoughts on this will be greatly appreciated.
Thanks,
beezlerco at hotmail...
You need to encode the slashes, colons, and technically the periods as well.
\ should be %5C
/ should be %2F
. should be %2E
: should be %3A
for most other special characters see http://www.asciitable.com/ and use '%' plus the hex column on that table.
I believe HttpUtility.UrlEncode is what you are looking for.
(For a detailed description, see Using HttpUtility.UrlEncode to Encode your QueryStrings)
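A short sketch of escaping the file name before appending it to the service URL. AclUrl is a hypothetical helper, and it uses Uri.EscapeDataString, which escapes backslashes, slashes and colons but leaves periods alone because they are unreserved characters. Whether the host accepts escaped slashes inside a path segment is not confirmed here.

```csharp
using System;

public static class AclUrl
{
    // Builds .../acl/f={escaped file name} to match the UriTemplate above.
    public static string ForFile(string serviceBase, string fullFileName)
    {
        return serviceBase.TrimEnd('/') + "/acl/f=" + Uri.EscapeDataString(fullFileName);
    }
}
```

For example, AclUrl.ForFile("http://localhost/webservice", @"c:\file.ext") gives http://localhost/webservice/acl/f=c%3A%5Cfile.ext.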
