Getting U+fffd/65533 instead of special character from Query String - c#

I have a C# .NET web project that has a globalization tag set to:
<globalization requestEncoding="utf-8" responseEncoding="utf-8" culture="nb-no" uiCulture="no"/>
The problem occurs when this URL is requested by a Flash application (you get the same problem when you enter the URL manually in a browser): c_product_search.aspx?search=kjøkken (alternatively: c_product_search.aspx?search=kj%F8kken)
Both return the following character codes:
k U+006b 107
j U+006a 106
� U+fffd 65533
k U+006b 107
k U+006b 107
e U+0065 101
n U+006e 110
I don't know too much about character encoding, but it seems that the ø has been replaced by the Unicode replacement character, right?
I tried to change the globalization tag to:
<globalization requestEncoding="iso-8859-1" responseEncoding="utf-8" culture="nb-no" uiCulture="no"/>
That made the request work. However, now, other searches on my page stopped working.
I also tried the following with similar results:
NameValueCollection qs = HttpUtility.ParseQueryString(Request.QueryString.ToString(), Encoding.GetEncoding("iso-8859-1"));
string search = (string)qs["search"];
What should I do?
Kind Regards,
nitech

The problem comes from the combination of Firefox and ASP.NET. When you manually enter a URL in Firefox's address bar and the URL contains French or Swedish characters, Firefox encodes the URL as ISO-8859-1 by default.
But when ASP.NET receives such a URL, it assumes that it is UTF-8 encoded, and the encoded characters become U+FFFD. I couldn't find a way in ASP.NET to detect that the URL is ISO-8859-1. Request.Encoding is set to utf-8 ... :(
Several solutions exist:
Put <globalization requestEncoding="iso-8859-1" responseEncoding="iso-8859-1"/> in your Web.config. But you may run into other problems, and your application won't be standard anymore (it will not work with languages like Japanese). And anyway, I prefer using UTF-8!
Go to about:config in Firefox and set the value of network.standard-url.encode-query-utf8 to true. It will now work for you (Firefox will encode all your URLs as UTF-8), but not for anybody else.
The least bad solution I could come up with was to handle this in code. If the default decoding didn't work, we reparse the query string as ISO-8859-1:
string query = Request.QueryString["search"];
if (query.Contains("%ufffd"))
    query = HttpUtility.ParseQueryString(Request.Url.Query, Encoding.GetEncoding("iso-8859-1"))["search"];
query = HttpUtility.UrlDecode(query);
It works with hyperlinks and manually entered URLs, in French, English, or Japanese. But I don't know how it will handle other encodings like ISO-8859-5 (Russian).
Does anyone have a better solution?
This only solves the problem of manually entered URLs. In your hyperlinks, don't forget to encode URL parameters with HttpUtility.UrlEncode on the server, or encodeURIComponent in JavaScript code. And use HttpUtility.UrlDecode to decode them.

public string GetEncodedQueryString(string key)
{
    string query = Request.QueryString[key];
    if (query != null)
        if (query.Contains((char)0xfffd))
            query = HttpUtility.ParseQueryString(Request.Url.Query, Encoding.GetEncoding("iso-8859-1"))[key];
    return query;
}
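
A variation on the same fallback idea, written as a rough sketch that does not depend on HttpContext: instead of looking for U+FFFD after the fact, it decodes the raw bytes with a strict UTF-8 decoder and falls back to ISO-8859-1 only when the bytes are not valid UTF-8. The class and method names (QueryValueDecoder, Decode) are made up for illustration, and the input is assumed to be the raw, still percent-encoded value.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QueryValueDecoder
{
    // Turns a percent-encoded query value into raw bytes.
    // '+' is treated as a space, as in form-style query strings.
    private static byte[] PercentDecodeToBytes(string raw)
    {
        var bytes = new List<byte>(raw.Length);
        for (int i = 0; i < raw.Length; i++)
        {
            char c = raw[i];
            if (c == '%' && i + 2 < raw.Length
                && Uri.IsHexDigit(raw[i + 1]) && Uri.IsHexDigit(raw[i + 2]))
            {
                bytes.Add((byte)(Uri.FromHex(raw[i + 1]) * 16 + Uri.FromHex(raw[i + 2])));
                i += 2;
            }
            else if (c == '+')
            {
                bytes.Add(0x20);
            }
            else
            {
                // Assumption: unescaped characters are normally ASCII;
                // anything else is passed through as its UTF-8 bytes.
                bytes.AddRange(Encoding.UTF8.GetBytes(c.ToString()));
            }
        }
        return bytes.ToArray();
    }

    // Tries strict UTF-8 first, then falls back to ISO-8859-1.
    public static string Decode(string raw)
    {
        byte[] bytes = PercentDecodeToBytes(raw);
        try
        {
            // throwOnInvalidBytes: true raises instead of emitting U+FFFD.
            return new UTF8Encoding(false, true).GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            return Encoding.GetEncoding("iso-8859-1").GetString(bytes);
        }
    }
}
```

With this sketch, both "kj%F8kken" and "kj%C3%B8kken" decode to "kjøkken". It shares the limitation mentioned above: a fallback to ISO-8859-1 is still a guess and will be wrong for other single-byte encodings such as ISO-8859-5.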

I think your problem is in the Flash application, not in .NET.
It sends the special character in a weird way.
Try to URL-encode the search string before you send it to the server.

If the app is expecting the URL-encoded request to be based on UTF-8, the character "ø" should be "%C3%B8", not "%F8". Whatever function you're using to escape/encode that request, you probably need to pass it the name of the underlying character encoding, "UTF-8".
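
To make the difference concrete, here is a small sketch that percent-encodes the bytes of a string under a chosen encoding. The helper name PercentEncode is hypothetical, and it deliberately escapes every byte (including ASCII) to keep the illustration short; it is not a replacement for a real URL encoder.

```csharp
using System;
using System.Linq;
using System.Text;

public static class PercentEncodingDemo
{
    // Writes each byte of the encoded text as %XX.
    public static string PercentEncode(string text, Encoding encoding)
    {
        return string.Concat(encoding.GetBytes(text).Select(b => "%" + b.ToString("X2")));
    }
}
```

Calling PercentEncode("ø", Encoding.UTF8) gives "%C3%B8", while PercentEncode("ø", Encoding.GetEncoding("iso-8859-1")) gives "%F8", which is the single byte a UTF-8 decoder rejects.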

It turns out that ActionScript 2.0 sends the URL encoded/escaped as UTF-8, while ActionScript 3.0 uses ISO-8859-1. The way to solve this was to change the Request.ContentEncoding value inside Global.asax if an encoding is specified in the URL:
void Application_BeginRequest(object sender, EventArgs e)
{
    HttpContext ctx = HttpContext.Current;
    // encoding specified?
    if (!String.IsNullOrEmpty(Request["encoding"]))
    {
        ctx.Request.ContentEncoding = System.Text.Encoding.GetEncoding(ctx.Request["encoding"]);
    }
}
Could it be done differently?
Regards,
nitech

Related

C# Percentage Decoding

I am trying to decode a percent-encoded string passed from a PHP script to my C# application. The PHP script encrypts the data, so there are many special characters that I percent-encode.
Here's the string I'm passing in the URL:
%C9%90%04L%EFEA%D1U%AFi%CBc%3A%E5%D0%40Q%D6%1Bn%C9%C3%B5%0FT%FC%E5h%95m%EF%BF%24tB%A6%D1%08%3B%83%A1%CF%1B%99Zo%02
But the application has trouble percent-decoding parts of it when I fetch the query string like so:
var queryString = HttpContext.Current.Request.QueryString;
var token = queryString["token"];
The variable token, though, equals this:
%C9%90%04L%EF%BF%BDEA%EF%BF%BDU%EF%BF%BDI%EF%BF%BDC%3A%EF%BF%BD%EF%BF%BD%40Q%EF%BF%BD%1BN%EF%BF%BD%C3%B5%0FT%EF%BF%BD%EF%BF%BDH%EF%BF%BDM%EF%BF%BD%24TB%EF%BF%BD%EF%BF%BD%08%3B%EF%BF%BD%EF%BF%BD%EF%BF%BD%1B%EF%BF%BDZO%02
This is definitely not what I put into the query string. When looking at it, the first time it messes up is %EF(starts 11th character into the original query string). Instead of %EF it shows: %EF%BF%BD. When I searched a little, I found this webpage which says the "Hex UTF-8 Bytes" are EF BF BD.
EDIT:
Forgot to mention, QueryString looks like this:
token=%c9%90%04L%ef%bf%bdEA%ef%bf%bdU%ef%bf%bdi%ef%bf%bdc%3a%ef%bf%bd%ef%bf%bd%40Q%ef%bf%bd%1bn%ef%bf%bd%c3%b5%0fT%ef%bf%bd%ef%bf%bdh%ef%bf%bdm%ef%bf%bd%24tB%ef%bf%bd%ef%bf%bd%08%3b%ef%bf%bd%ef%bf%bd%ef%bf%bd%1b%ef%bf%bdZo%02&oauth_token_secret=S%23%2bw%ef%bf%bd%ef%bf%bdX%17%ef%bf%bd0%ef%bf%bd%60%ef%bf%bd%ef%bf%bd%ef%bf%bd%ef%bf%bd*%ef%bf%bdi%08%ef%bf%bd%ef%bf%bd%ef%bf%bd%ef%bf%bd%07%ef%bf%bd%12RS07%ef%bf%bdgl%1e%ef%bf%bd%d7%832%d1%a1%ef%bf%bd%275%ef%bf%bdv%ef%bf%bd
You might be looking for the HttpServerUtility.UrlDecode method:
HttpContext.Current.Server.UrlDecode(HttpContext.Current.Request.QueryString["token"]);
Your error is somewhere else...
I created a new web page in a .Net 4.0 project, and put this in the Page_Load:
protected void Page_Load(object sender, EventArgs e)
{
    var queryString = HttpContext.Current.Request.QueryString;
    var token = queryString["token"];
    throw new Exception(token);
}
Then I ran the page by going to this URL which matches the querystring you gave above:
http://localhost:27151/test.aspx?token=%c9%90%04L%ef%bf%bdEA%ef%bf%bdU%ef%bf%bdi%ef%bf%bdc%3a%ef%bf%bd%ef%bf%bd%40Q%ef%bf%bd%1bn%ef%bf%bd%c3%b5%0fT%ef%bf%bd%ef%bf%bdh%ef%bf%bdm%ef%bf%bd%24tB%ef%bf%bd%ef%bf%bd%08%3b%ef%bf%bd%ef%bf%bd%ef%bf%bd%1b%ef%bf%bdZo%02
The page decoded the token and displayed it in the exception message as this:
ɐL�EA�U�i�c:��#Q�n�õT��h�m�$tB��;����Zo
(The URL is encoding a binary string so when the actual string is printed, there are some characters that don't get displayed.)
If you run this and the token actually prints "%C9%90%04L..." then your token has probably been double encoded. All of the percent signs will be replaced with "%25" so your URL would look like this:
http://localhost:27151/test.aspx?token=%25c9%2590%2504L%25ef%25bf%25bdEA%25ef%25bf%25bdU%25ef%25bf%25bdi%25ef%25bf%25bdc%253a%25ef%25bf%25bd%25ef%25bf%25bd%2540Q%25ef%25bf%25bd%251bn%25ef%25bf%25bd%25c3%25b5%250fT%25ef%25bf%25bd%25ef%25bf%25bdh%25ef%25bf%25bdm%25ef%25bf%25bd%2524tB%25ef%25bf%25bd%25ef%25bf%25bd......
Since you didn't mention the "%25", the error is somewhere else in your code. The URL you think you are using is not the one being decoded.
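
As a hedged illustration of the two effects discussed in this thread, the sketch below shows (a) how a byte that is not valid UTF-8, such as a lone 0xEF, becomes U+FFFD when decoded and then reappears as %EF%BF%BD when the string is encoded again, and (b) what double encoding looks like. The class and method names are invented for the example; this only demonstrates the mechanism and does not claim to locate the bug in the asker's code.

```csharp
using System;
using System.Text;

public static class ReplacementCharDemo
{
    // Decodes raw bytes as UTF-8 (invalid bytes become U+FFFD)
    // and percent-encodes the resulting string again.
    public static string RoundTripAsUtf8(byte[] raw)
    {
        string decoded = Encoding.UTF8.GetString(raw);
        return Uri.EscapeDataString(decoded);
    }

    // A double-encoded value has its '%' signs escaped as "%25".
    public static bool LooksDoubleEncoded(string rawQueryValue)
    {
        return rawQueryValue.Contains("%25");
    }
}
```

For the bytes EF 45 41 (the "%EFEA" fragment of the token) the round trip yields "%EF%BF%BDEA", while the valid UTF-8 pair C9 90 survives unchanged as "%C9%90". That matches the pattern in the question: binary data that passes through a UTF-8 string at any point loses every byte that is not valid UTF-8.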

Special phone and e-mail characters

I am parsing web pages with .NET (C#, HtmlAgilityPack). Some values in the page source (phone, email) are in a special format. The target values are "+420 221 513 222" and "revize#secar.cz", for instance, but in the HTML source code the values look like this:
<span class="p none">420%8722%AC1%87513%87%AC222</span>
<a class="e none">rev%DBize%DB%A7se%DBcar%DB%96cz</a>
I think I am missing something. I tried to use the replace function etc., but to no avail. Can somebody help me convert these values to the right string values? (regex?)
Thank you for your help.
You could use:
HttpUtility.HtmlDecode(S)
This can be found in the System.Web namespace.
Sure. You're looking for Uri.UnescapeDataString(url). However, it doesn't quite decode all of it at the same time. So what you need to do is use it in a loop, like this:
public static string DecodeUrlString(this string url)
{
    string newUrl;
    while ((newUrl = Uri.UnescapeDataString(url)) != url)
        url = newUrl;
    return newUrl;
}

301 Redirect with unicode characters - C#

I need to do a 301 redirect on a URL that may have Unicode characters in it.
HttpUtility.UrlEncode isn't doing what I need because if I encode the whole URL it encodes any ':' or '/'
HttpUtility.UrlEncode("http://www.हिन्दी.com") = http%3a%2f%2fwww.%e0%a4%b9%e0%a4%bf%e0%a4%a8%e0%a5%8d%e0%a4%a6%e0%a5%80.com
(also: http://www.%e0%a4%b9%e0%a4%bf%e0%a4%a8%e0%a5%8d%e0%a4%a6%e0%a5%80.com doesn't seem to work in firefox or IE, but it does in Chrome)
Only other thing I can think of is to encode the different parts of the URL so that the protocol doesn't get encoded.
You need to take a look at RFC 3490, which details how to correctly encode international domain names (this is also why, when you encode just the domain portion, it only works in Chrome).
So I figured out an almost 100% solution to this. Thanks to Rowland Shaw and Rup for pointing me in the direction of IDNs.
I tried using an IdnMapping, whose GetAscii function converts Unicode domain names to Punycode, but I didn't have the domain separated from the rest of the URL. I tried putting the URL into a Uri object, but I would get a UriFormatException if the URL had Unicode characters.
That led me to: http://msdn.microsoft.com/en-us/library/system.uri(v=VS.90).aspx
which tells how to enable the Uri class to accept unicode and do the IDN and IRI conversions. It says you have to add something to the .NET 2.0 machine.config file, but you can put the line in web.config and it will work.
After I got the Uri working with unicode, I pieced together the url and did a redirect:
Response.Clear();
Response.Status = "301 Moved Permanently";
Response.AddHeader("Location", uri.Scheme + "://" + uri.DnsSafeHost + uri.PathAndQuery + uri.Fragment);
Response.End();
This works for Chrome and Firefox 3.6, but fails in IE8. I'm still trying to solve that problem and will update here if I find a solution.
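
For reference, a minimal sketch of the Punycode step on its own, assuming the scheme, host, and path have already been separated (which was the sticking point above). It uses System.Globalization.IdnMapping; the class name IdnRedirectDemo and the method BuildLocation are hypothetical, and any non-ASCII characters in the path or query would still need percent-encoding separately.

```csharp
using System;
using System.Globalization;

public static class IdnRedirectDemo
{
    // Builds an absolute URL whose host is converted to its ASCII (Punycode) form.
    public static string BuildLocation(string scheme, string unicodeHost, string pathAndQuery)
    {
        string asciiHost = new IdnMapping().GetAscii(unicodeHost);
        return scheme + "://" + asciiHost + pathAndQuery;
    }
}
```

For example, BuildLocation("http", "www.bücher.example", "/") returns "http://www.xn--bcher-kva.example/", which could then be used as the Location header value.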

Broken encoding after postback

I have a query string with a parameter value that contains the Norwegian character å encoded as %e5. The page contains a form with an action attribute which is automatically filled in by ASP.NET. When the URL is output into said attribute, it is printed with a full two-byte encoding: %u00e5.
When posting back this seems to be ok when debugging the code behind. However the page actually does a redirect to itself (for some other reason) and the redirect location header looks like this: Location: /myFolder/MyPage.aspx?Param1=%C3%A5
So the %e5 has been translated to %C3%A5 which breaks the output somehow.
In HTML text the broken characters look like Ã¥ after having been output via HttpUtility.HtmlEncode.
The entire web application is ISO8859-1 encoded.
PS. When removing the u00 from the output %u00e5 in the action attribute before posting the form, everything is output nicely. But the error seems to be the translation from %e5 to %C3%A5. (And of course the self redirect, but that's another matter.)
Any pointers?
The solution I ended up with was encoding the redirect URL manually.
public void ReloadPage()
{
    UrlBuilder url = new UrlBuilder(Context, Request.Path);
    foreach (string queryParam in Request.QueryString.AllKeys)
    {
        string queryParamValue = Request.QueryString[queryParam];
        url.AddQueryItem(queryParam, queryParamValue);
    }
    Response.Redirect(url.ToString(), true);
}
The url.AddQueryItem basically does HttpContext.Server.UrlDecode(queryParamValue) and the url.ToString builds the query string and for each query item does HttpContext.Server.UrlEncode( queryParamValue).
The UrlBuilder is a class already present in our library, so once I found the problem and realized that C#/.Net didn't provide tools for this, coding the fix was quick :)

Encoding non UTF-8 text in Parameters in ASP.NET MVC

Background
I have a web application that uses ISO-8859-1 encoding. When I pass parameters using Html.ActionLink(), the value is encoded as UTF-8:
Web.config:
<globalization requestEncoding="iso-8859-1" responseEncoding="iso-8859-1"
fileEncoding="iso-8859-1" />
Index.aspx
This is a <%= Html.ActionLink("test", "Read", new { name="Cosméticos" }) %>
generates the following:
This is a test
The problem is the value I receive in my controller is UTF-8, not iso-8859-1:
TestController:
public ActionResult Read(string name) {
    // name is "CosmÃ©ticos" here!
}
Question
Why is the string not decoded to Cosméticos?
Are your aspx files physically saved as iso-8859-1?
Use "File / Save Xyz As" and click the arrow at the right of the Save button to get more encoding options for saving your file.
A guess
public static string ActionLinkNoEncode(this HtmlHelper htmlHelper, string linkText, ActionResult action)
{
    var urlHelper = new UrlHelper(htmlHelper.ViewContext.RequestContext);
    var url = Uri.UnescapeDataString(urlHelper.Action(action)).ToLowerInvariant();
    var linkTagBuilder = new TagBuilder("a");
    linkTagBuilder.MergeAttribute("href", url);
    linkTagBuilder.InnerHtml = linkText;
    return linkTagBuilder.ToString();
}
I found the problem and the workaround: the value I receive is UTF-8, but if I try to use System.Text.Encoding.UTF8.GetBytes(name) it converts the characters "Ã©" to UTF-8 values instead of "é".
The workaround is to copy the string to a byte[] and then use System.Text.Encoding.Convert().
I don't know if this is the best way, but now everything is working for me.
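
One way the workaround could look, as a sketch rather than the poster's actual code: take the bytes of the garbled string under ISO-8859-1 and reinterpret them as UTF-8. It uses GetBytes/GetString directly instead of Encoding.Convert, and the names MojibakeRepair and Latin1ToUtf8 are invented for the example. It assumes the string really is UTF-8 that was mis-decoded as ISO-8859-1; applied to a correctly decoded string it would damage it.

```csharp
using System;
using System.Text;

public static class MojibakeRepair
{
    // Recovers text whose UTF-8 bytes were mistakenly decoded as ISO-8859-1.
    public static string Latin1ToUtf8(string garbled)
    {
        byte[] bytes = Encoding.GetEncoding("iso-8859-1").GetBytes(garbled);
        return Encoding.UTF8.GetString(bytes);
    }
}
```

For instance, Latin1ToUtf8("CosmÃ©ticos") returns "Cosméticos".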
A few things you might want to consider.
First, if you haven't already read it -- I highly recommend reading Joel Spolsky's article 'The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)' It sets the stage for learning about character encoding and programming.
Second, looking at the docs on the globalization element in the web.config it sounds like there are ways to (accidentally?) override the specified encoding scheme. From the docs:
requestEncoding
Specifies the assumed encoding of each incoming request, including
posted data and the query string. If the request comes with a request
header containing an Accept-Charset attribute, it overrides the
requestEncoding in configuration. The default encoding is UTF-8,
specified in the <globalization> tag included in the Machine.config
file created when the .NET Framework is installed. If request encoding
is not specified in a Machine.config or Web.config file, encoding
defaults to the computer's Regional Options locale setting. In
single-server applications, requestEncoding and responseEncoding
should be the same. For the less common case (multiple-server
applications where the default server encodings are different), you
can vary the request and response encoding using local Web.config
files.
Have you tried using something like Fiddler to see what the Accept-Charset attribute is set to?
