QueryString converting %E1 to %ufffd

I have a URL like the following
http://mysite.com/default.aspx?q=%E1
Where %E1 is supposed to be á. When I call Request.QueryString from my C# page I receive
http://mysite.com/default.aspx?q=%ufffd
It does this for any accented character. %E1, %E3, %E9, %ED etc. all get passed as %ufffd. Normal encoded values (%2D, %2E, %27) all get passed correctly.
The config file already has the responseEncoding/requestEncoding in the globalization section set to UTF-8.
How could I read the correct values?
Please note that I'm not the one generating the query string and I have no control over it.

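While it's true that á is the code point U+00E1, its UTF-8 encoding (which is what matters for URL parameters) is the two bytes 0xC3 0xA1, so a UTF-8 URL would carry %C3%A1. A lone %E1 is the ISO-8859-1 (Latin-1) byte for á and is not valid UTF-8 on its own.
You can verify this by opening a Wikipedia entry on an accented letter, such as http://en.wikipedia.org/wiki/%C3%81 (the entry for Á).
U+FFFD is the Unicode Replacement Character, which indicates that a given byte sequence could not be decoded into a valid Unicode character.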
Update:
Your question has two points.
First: How do I encode a Unicode string as a parameter? Use:
"?q=" + HttpUtility.UrlEncode(value)
Second: How do I retrieve a Unicode value? Use:
Request["q"]
If you receive the %E1 from some other source that you do not control, maybe RawUrl can help you. (I have not tried it.)
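If the sender really does emit single-byte values such as %E1, one option is to decode the raw query string yourself with the encoding the sender most likely used. This is only a sketch: it assumes that encoding is ISO-8859-1, the class and method names are made up for illustration, and in a page the raw string would come from something like Request.RawUrl rather than a literal.

```csharp
using System;
using System.Text;
using System.Web;

public static class LegacyQueryDecoder
{
    // Assumption: the sender percent-encoded ISO-8859-1 bytes, not UTF-8.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Hypothetical helper: reads one parameter from a raw (still encoded) query string.
    public static string GetValue(string rawQuery, string key)
    {
        return HttpUtility.ParseQueryString(rawQuery, Latin1)[key];
    }

    public static void Main()
    {
        // Decoded as Latin-1, the byte E1 is the letter á.
        Console.WriteLine(GetValue("?q=%E1", "q"));

        // Decoded as UTF-8, the lone byte E1 is invalid and becomes U+FFFD.
        Console.WriteLine(HttpUtility.UrlDecode("%E1", Encoding.UTF8) == "\uFFFD"); // True
    }
}
```

If the same endpoint also receives proper UTF-8 values, a single fixed encoding will not be enough and you would need some way to tell the two kinds of request apart.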

Related

How do I correctly parse a URI query string into a name-value collection in C#?

I'm using .NET 4.5 and I'm trying to parse a URI query string into a NameValueCollection. The right way seems to be to use HttpUtility.ParseQueryString(string query), which takes the string obtained from Uri.Query and returns a NameValueCollection. Uri.Query returns a string that is escaped according to RFC 2396, and HttpUtility.ParseQueryString(string query) expects a string that is URL-encoded. Assuming RFC 2396 and URL-encoding are the same thing, this should work fine.
However, the documentation for ParseQueryString claims that it "uses UTF8 format to parse the query string". There is also an overloaded method which takes a System.Text.Encoding and then uses that instead of UTF8.
My question is: what does it mean to use UTF8 as the encoding? The input is a string, which by definition (in C#) is UTF-16. How is that interpreted as UTF-8? What is the difference between using UTF8 and UTF16 as the encoding in this case? My concern is that since I'm accepting arbitrary user input, there might be some security risk if I botch the encoding (i.e. the user might be able to slip through some script exploit).
There is a previous question on this topic (How to parse a query string into a NameValueCollection in .NET) but it doesn't specifically address the encoding problem.

When parsing encoded values, it treats those values as UTF-8. Take the character ¢, for example. The UTF-8 encoding is C2 A2. So if it were in a query string, it would be encoded as %C2%A2.
Now, when ParseQueryString is decoding, it needs to know what encoding to use. The default is UTF-8, meaning that the character would be decoded correctly. But perhaps the user was using Microsoft's Cyrillic code page (Windows-1251), where C2 and A2 are two different characters. In that case, interpreting it as UTF-8 would be an error.
If this is a user interface application (i.e. the user is entering data directly), then you probably want to use whatever encoding is defined for the current UI culture. If you're getting this information from Web pages, then you'll want to use whatever encoding the page uses. And if you're writing a Web service then you can tell the users that their input has to be UTF-8 encoded.
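To make the difference concrete, here is a small sketch that parses the same query string twice. The percent-escapes are first turned back into raw bytes, and the chosen encoding only decides how those bytes become characters. ISO-8859-1 stands in for "some other single-byte code page" purely because it needs no extra setup; it is not the Windows-1251 case mentioned above, and the class name is illustrative.

```csharp
using System;
using System.Collections.Specialized;
using System.Text;
using System.Web;

public static class ParseEncodingDemo
{
    // Parses the query with the given encoding and returns the value of "q".
    public static string Parse(string query, Encoding encoding)
    {
        NameValueCollection values = HttpUtility.ParseQueryString(query, encoding);
        return values["q"];
    }

    public static void Main()
    {
        string query = "q=%C2%A2";

        // Bytes C2 A2 read as UTF-8: one character, ¢ (U+00A2).
        Console.WriteLine(Parse(query, Encoding.UTF8).Length); // 1

        // The same bytes read as ISO-8859-1: two characters, Â and ¢.
        Console.WriteLine(Parse(query, Encoding.GetEncoding("iso-8859-1")).Length); // 2
    }
}
```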

Decoding Base64 / Quoted Printable encoded UTF8 string

In my ASP.NET application, I need to work with a string that looks something like this:
=?utf-8?B?SWhyZSBCZXN0ZWxsdW5nIC0gVmVyc2FuZGJlc3TDpHRpZ3VuZyAtIDExMDU4OTEyNDY=?=
How can I decode it to normal human language?
Thanks in advance!
Update:
Convert.FromBase64String() does not work for a string such as
=?UTF-8?Q?Bestellbest=C3=A4tigung?=
I get the exception "The format of s is invalid. s contains a non-base-64 character, more than two padding characters, or a non-white space-character among the padding characters."
Update:
Solution Here
Alternative solution
Update:
What kind of string encoding is that: Nweiß ???

It's actually a base-64 string:
string zz = "SWhyZSBCZXN0ZWxsdW5nIC0gVmVyc2FuZGJlc3TDpHRpZ3VuZyAtIDExMDU4OTEyNDY=";
byte[] dd = Convert.FromBase64String(zz);
// Returns Ihre Bestellung - Versandbestätigung - 1105891246
string yy = System.Text.Encoding.UTF8.GetString(dd);

I've written a library that will decode these sorts of strings. You can find it at http://github.com/jstedfast/MimeKit
Specifically, take a look at MimeKit.Utils.Rfc2047.DecodeText()

This seems to be MIME Header Encoding. The Q in your second example indicates that it is Quoted Printable.
This question seems to cover the variants fairly well. In a quick search I didn't find any .NET libraries to decode this automatically, but it shouldn't be hard to do manually if you need to.

That's not UTF8. That's a Base64 encoded string.
The UTF-8 only indicates that the target string is in UTF8 format.
After decoding the Base64 string:
SWhyZSBCZXN0ZWxsdW5nIC0gVmVyc2FuZGJlc3TDpHRpZ3VuZyAtIDExMDU4OTEyNDY=
You'll get the following result:
Ihre Bestellung - Versandbestätigung - 1105891246
See Base64 online decode/encode

Looks like a base64 string.
Try Convert.FromBase64String
http://msdn.microsoft.com/en-us/library/system.convert.frombase64string.aspx

This is an encoded word, which is used in email headers when there is non-ASCII content. Encoded words are defined in RFC 2047:
https://www.rfc-editor.org/rfc/rfc2047#section-2
The BNF for an encoded word is:
encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
So the correct way to interpret this is:
The data is the stuff between the 3rd and 4th question marks
It has been Base64 encoded (the 'B' stands for Base64; if it were a 'Q' then it would be quoted-printable).
Once you decode the data, it will be in the UTF-8 character set.
The result, as @Shai correctly pointed out, is:
Ihre Bestellung - Versandbestätigung - 1105891246
This is German. The umlaut is obviously the reason for the UTF-8 and thus the need for an encoded word. The translation is:
Your order - Delivery confirmation - 1105891246
Apparently it's a tracking number for an order.
All modern email clients (and Outlook) transparently support encoded words.
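The interpretation above can be sketched in a few lines of C#. This is a rough illustration and not a complete RFC 2047 implementation: the class name is invented, malformed input is not handled gracefully, and whitespace between adjacent encoded words is not collapsed. A library such as MimeKit is the safer choice for real mail.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class EncodedWordSketch
{
    // encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
    private static readonly Regex Pattern =
        new Regex(@"=\?([^?]+)\?([BbQq])\?([^?]*)\?=");

    public static string Decode(string input)
    {
        return Pattern.Replace(input, m =>
        {
            Encoding charset = Encoding.GetEncoding(m.Groups[1].Value);
            string text = m.Groups[3].Value;
            byte[] bytes = m.Groups[2].Value.ToUpperInvariant() == "B"
                ? Convert.FromBase64String(text)
                : DecodeQ(text);
            return charset.GetString(bytes);
        });
    }

    // "Q" encoding: "_" is a space, "=XX" is one byte in hex, anything else is literal ASCII.
    private static byte[] DecodeQ(string text)
    {
        var bytes = new List<byte>();
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '_')
            {
                bytes.Add((byte)' ');
            }
            else if (text[i] == '=' && i + 2 < text.Length)
            {
                string hex = text.Substring(i + 1, 2);
                bytes.Add(byte.Parse(hex, NumberStyles.HexNumber, CultureInfo.InvariantCulture));
                i += 2;
            }
            else
            {
                bytes.Add((byte)text[i]);
            }
        }
        return bytes.ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(Decode("=?UTF-8?Q?Bestellbest=C3=A4tigung?="));
        Console.WriteLine(Decode("=?utf-8?B?SWhyZSBCZXN0ZWxsdW5nIC0gVmVyc2FuZGJlc3TDpHRpZ3VuZyAtIDExMDU4OTEyNDY=?="));
    }
}
```

The bytes are collected first and converted with the declared charset afterwards, because a single character such as ä spans two bytes (=C3=A4) in UTF-8.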

This is a bit of guesswork, but let's try
remove =? from start and ?= from end
keep the start up to the next ? as the character set
Remove the B? - don't know, what it is
Convert the rest to a byte[] via System.Convert.FromBase64String()
Convert this to the final string via Encoding.GetString(), using the character set remembered in the second step

Unescape an escaped url in c#

I have URLs which are escaped in this form:
http://www.someurl.com/profile.php?mode=register&amp;agreed=true
I want to convert it to unescaped form
http://www.someurl.com/profile.php?mode=register&agreed=true
Is this the same thing as escaped HTML?
How do I do this?
Thanks

&amp; is an HTML entity and is used when text is encoded into HTML, because you have to "escape" the &, which has a special meaning in HTML. Apparently this escaping mechanism was applied to the URL, presumably because it is used in some HTML, for instance in a link. I'm not sure why you want to decode it, because the browser will do the proper decoding when the link is clicked. But anyway, to revert it you can use HttpUtility.HtmlDecode in the System.Web namespace:
var encoded = "http://www.someurl.com/profile.php?mode=register&amp;agreed=true";
var decoded = HttpUtility.HtmlDecode(encoded);
The value of decoded is:
http://www.someurl.com/profile.php?mode=register&agreed=true
Another form of encoding/decoding is URL encoding. It makes it possible to include special characters in parts of the URL. For instance, the characters /, ? and & have a special meaning in a URL. If you need to include any of these characters in, say, a query parameter, you will have to URL encode the parameter so as not to mess up the URL. Here is an example of a URL where URL escaping has been used:
http://www.someurl.com/profile.php?company=Barnes+%26+Noble
The company name Barnes & Noble was encoded as Barnes+%26+Noble. If the & hadn't been escaped the URL would have contained not one but two query parameters because & is used as a delimiter between query parameters.
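The two kinds of escaping can be put side by side in a short sketch. The class and method names are invented for illustration; the HttpUtility calls are the ones discussed above, plus ParseQueryString to show the round trip.

```csharp
using System;
using System.Web;

public static class EscapingLayers
{
    // Undo HTML escaping: "&amp;" becomes "&" again.
    public static string FromHtml(string htmlEscapedUrl)
    {
        return HttpUtility.HtmlDecode(htmlEscapedUrl);
    }

    // Build a query string whose value may itself contain "&".
    public static string CompanyQuery(string company)
    {
        return "company=" + HttpUtility.UrlEncode(company);
    }

    public static void Main()
    {
        Console.WriteLine(FromHtml("http://www.someurl.com/profile.php?mode=register&amp;agreed=true"));

        string query = CompanyQuery("Barnes & Noble");
        Console.WriteLine(query); // company=Barnes+%26+Noble

        // Parsing decodes the value again, so the "&" survives as data.
        Console.WriteLine(HttpUtility.ParseQueryString(query)["company"]); // Barnes & Noble
    }
}
```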

Not sure why, but the decoding from @Martin's answer doesn't work in my case (my filename is "%D1%8D%D1%84%D1%84%D0%B5%D0%BA%D1%82%D0%B8%D0%BD%D0%BE%D0%B2%D1%81%D1%82%D1%8C%20%D0%BF%D1%80%D0%BE%D0%B4%D0%B0%D0%B6%202020%20(1)-8.xml").
What works for me is the Uri.Unescape method: https://learn.microsoft.com/en-us/dotnet/api/system.uri.unescape?view=netcore-3.1
Be aware that this method is obsolete.

UrlEncoding - which encoding should I use?

Using HttpUtility.UrlEncode and passing via the URL the receiving page sees the variables as:
brand new -> brand+new
Airconaire Ltd -> Airconaire+Ltd
Can you see how the first and the second both have a + in them where they didn't at the start? I'm assuming this is something to do with the encoding (specifically RFC3986 or RFC2396) but how do I solve this?
I think ideally the spaces should be converted to %20 but is this the best way forward?

Try using HttpUtility.UrlPathEncode rather than UrlEncode.

The UrlEncode() method can be used to encode the entire URL, including query-string values. If characters such as blanks and punctuation are passed in an HTTP stream, they might be misinterpreted at the receiving end. URL encoding converts characters that are not allowed in a URL into character-entity equivalents; URL decoding reverses the encoding. For example, when the characters < and > are embedded in a block of text to be transmitted in a URL, they are encoded as %3c and %3e.
You can encode a URL using either the UrlEncode() method or the UrlPathEncode() method. However, the methods return different results. The UrlEncode() method converts each space character to a plus character (+). The UrlPathEncode() method converts each space character into the string "%20", which represents a space in hexadecimal notation. Use the UrlPathEncode() method when you encode the path portion of a URL in order to guarantee a consistent decoded URL, regardless of which platform or browser performs the decoding.
http://msdn.microsoft.com/en-us/library/4fkewx0t.aspx
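A quick side-by-side of the methods, as a sketch. Uri.EscapeDataString is not mentioned in the quoted documentation; it is included here as an assumption about what you may want for query values, since it also emits %20 and additionally escapes a literal +.

```csharp
using System;
using System.Web;

public static class SpaceEncodingDemo
{
    public static void Main()
    {
        // Form-style encoding: a space becomes "+".
        Console.WriteLine(HttpUtility.UrlEncode("brand new"));      // brand+new

        // Path-style encoding: a space becomes "%20".
        Console.WriteLine(HttpUtility.UrlPathEncode("brand new"));  // brand%20new
        Console.WriteLine(Uri.EscapeDataString("brand new"));       // brand%20new

        // Both forms decode back to a space on the receiving side.
        Console.WriteLine(HttpUtility.UrlDecode("brand+new"));      // brand new
        Console.WriteLine(HttpUtility.UrlDecode("brand%20new"));    // brand new

        // A literal "+" has to be escaped, or it is read back as a space.
        Console.WriteLine(Uri.EscapeDataString("a+b"));             // a%2Bb
        Console.WriteLine(HttpUtility.UrlDecode("a+b"));            // a b
    }
}
```

So a + showing up on the receiving page usually means the value was never decoded, or was encoded twice, rather than that the wrong RFC was followed.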

How to use strange characters in a query string

I am using Silverlight / ASP.NET and C#. What if I want to do this from Silverlight, for instance:
// I have left out the quotes to show you literally what the characters
// are that I want to use
string password = vtakyoj#"5
string encodedPassword = HttpUtility.UrlEncode(password, Encoding.UTF8);
// encoded password now = vtakyoj%23%225
Uri uri = new Uri("http://www.url.com/page.aspx@password=vtakyoj%23%225");
HttpPage.Window.Navigate(uri);
If I debug and look at the value of uri it shows up as this (we are still inside the silverlight app),
http://www.url.com?password=vtakyoj%23"5
So the %22 has become a quote for some reason.
If I then debug inside the page.aspx code (which of course is ASP .NET) the value of Request["password"] is actually this,
vtakyoj#"5
Which is the original value. How does that work? I would have thought that I would have to go,
HttpUtility.UrlDecode(Request["password"], Encoding.UTF8)
To get the original value.
Hope this makes sense?
Thanks.

First, let's start with the UTF8 business. Essentially, in this case there isn't any. When a string contains only characters within the standard ASCII range (as your password does), a UTF8 encoding of that string is identical to a single-byte ASCII string.
You start with this:-
vtakyoj#"5
The HttpUtility.UrlEncode somewhat aggressively encodes it to:-
vtakyoj%23%225
It has encoded both the # and the ", but only # has special meaning in a URL. Hence when you view the string value of the Uri object in Silverlight you see:-
vtakyoj%23"5
Edit (answering supplementary questions)
How does it know to decode it?
All data in a URL must be properly encoded; that's part of its being a valid URL. Hence the web server can rightly assume that all data in the query string has been appropriately encoded.
What if I had a real string which had %23 in it?
The correct encoding for "%23" would be "%2523", where %25 is %
Is that a documented feature of Request["Password"] that it decodes it?
Well, I dunno, you'd have to check the documentation I guess. BTW, use Request.QueryString["Password"]; the presence of this same indexer directly on Request was for the convenience of porting classic ASP to .NET. It doesn't make any real difference, but it's better for clarity since it's easier to make the distinction between QueryString values and Form values.
if I don't use UTF8 the characters are being filtered out.
Are you sure that non-ASCII characters may be present in the password? Can you provide an example? Your current example does not need encoding with UTF-8.
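The encode-once, decode-once round trip can be checked outside of a page with a small sketch. HttpUtility.ParseQueryString stands in here for what Request.QueryString does on the server; that substitution is an assumption made so the example runs on its own, and the class name is invented.

```csharp
using System;
using System.Web;

public static class PercentRoundTrip
{
    public static void Main()
    {
        // Encoding the password escapes both # and ".
        string password = "vtakyoj#\"5";
        string encoded = HttpUtility.UrlEncode(password);
        Console.WriteLine(encoded); // vtakyoj%23%225

        // Parsing the query string decodes exactly once, so no extra UrlDecode is needed.
        string received = HttpUtility.ParseQueryString("password=" + encoded)["password"];
        Console.WriteLine(received == password); // True

        // A value that literally contains "%23": the % itself is escaped as %25.
        string literal = HttpUtility.UrlEncode("%23");
        Console.WriteLine(literal); // %2523
        Console.WriteLine(HttpUtility.ParseQueryString("password=" + literal)["password"]); // %23
    }
}
```

Calling UrlDecode again on the received value would be a bug for the second case, because it would turn the literal %23 into #.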

If Request["password"] is to work, you need "http://url.com?password=" + HttpUtility.UrlEncode("abc%$^##"). I.e. you need ? to separate the hostname.
Also, the @ syntax is username:password@hostname, but it has been disabled in IE7 and above, IIRC.
