Encoding non-UTF-8 text in Parameters in ASP.NET MVC - c#

Background
I have a web application that uses ISO-8859-1 encoding. When I pass parameters using Html.ActionLink(), the value is encoded as UTF-8 in the URL:
Web.config:
<globalization requestEncoding="iso-8859-1" responseEncoding="iso-8859-1"
fileEncoding="iso-8859-1" />
Index.aspx
This is a <%= Html.ActionLink("test", "Read", new { name="Cosméticos" }) %>
generates the following link, with the value percent-encoded as UTF-8 in the href (name=Cosm%C3%A9ticos):
This is a test
The problem is the value I receive in my controller is UTF-8, not iso-8859-1:
TestController:
public ActionResult Read(string name) {
    //name is "CosmÃ©ticos" here, not "Cosméticos"!
}
Question
Why is the string not decoded to "Cosméticos"?

Are your aspx files physically saved in iso-8859-1?
Use "File / Save Xyz As..." and click the arrow at the right of the Save button to get more encoding options for saving your file.
A guess

public static string ActionLinkNoEncode(this HtmlHelper htmlHelper, string linkText, ActionResult action)
{
    var urlHelper = new UrlHelper(htmlHelper.ViewContext.RequestContext);
    var url = Uri.UnescapeDataString(urlHelper.Action(action)).ToLowerInvariant();
    var linkTagBuilder = new TagBuilder("a");
    linkTagBuilder.MergeAttribute("href", url);
    linkTagBuilder.InnerHtml = linkText;
    return linkTagBuilder.ToString();
}

I found the problem and a workaround: the value I receive is UTF-8 text that was decoded with the wrong encoding, and calling System.Text.Encoding.UTF8.GetBytes(name) re-encodes the already-mangled characters as UTF-8 instead of giving back the original bytes for "é".
The workaround is to copy the string to a byte[] and then use System.Text.Encoding.Convert().
I don't know if this is the best way, but now everything is working for me.
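As a hedged sketch of what that byte round-trip might look like (my assumption, not the poster's exact code; it re-decodes the bytes directly instead of calling Encoding.Convert(), assuming the string holds UTF-8 bytes that were decoded as ISO-8859-1):
// Recover the raw bytes the string was decoded from, then re-decode them as UTF-8.
byte[] rawBytes = System.Text.Encoding.GetEncoding("iso-8859-1").GetBytes(name);
string fixedName = System.Text.Encoding.UTF8.GetString(rawBytes); // "Cosméticos"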

A few things you might want to consider.
First, if you haven't already read it -- I highly recommend reading Joel Spolsky's article 'The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)' It sets the stage for learning about character encoding and programming.
Second, looking at the docs on the globalization element in the web.config it sounds like there are ways to (accidentally?) override the specified encoding scheme. From the docs:
requestEncoding
Specifies the assumed encoding of each incoming request, including
posted data and the query string. If the request comes with a request
header containing an Accept-Charset attribute, it overrides the
requestEncoding in configuration. The default encoding is UTF-8,
specified in the <globalization> tag included in the Machine.config
file created when the .NET Framework is installed. If request encoding
is not specified in a Machine.config or Web.config file, encoding
defaults to the computer's Regional Options locale setting. In
single-server applications, requestEncoding and responseEncoding
should be the same. For the less common case (multiple-server
applications where the default server encodings are different), you
can vary the request and response encoding using local Web.config
files.
Have you tried using something like Fiddler to see what the Accept-Charset attribute is set to?
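If you would rather check from code than fire up Fiddler, a quick hedged sketch (standard ASP.NET request APIs, dropped into any page or controller) that dumps what the framework sees:
// What charset, if any, did the client send, and what encoding is ASP.NET assuming?
string acceptCharset = Request.Headers["Accept-Charset"];
System.Diagnostics.Debug.WriteLine("Accept-Charset: " + (acceptCharset ?? "(none)"));
System.Diagnostics.Debug.WriteLine("Request encoding: " + Request.ContentEncoding.WebName);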

Change the way HttpValueCollection encodes the arguments

var original = "АБ";
var query = HttpUtility.ParseQueryString("");
query["Arg"] = original;
var tmp1 = query.ToString();
The code above (which is the recommended way of building query strings) encodes the argument as Arg=%u0410%u0411
However, the target API doesn't accept this argument and demands it to be encoded this way: Arg=%D0%90%D0%91
Is it possible to make HttpValueCollection use this encoding?
There is a comment in the source code of HttpValueCollection that explains your problem:
// DevDiv #762975: <form action> and other similar URLs are mangled since we use non-standard %uXXXX encoding.
// We need to use standard UTF8 encoding for modern browsers to understand the URLs.
https://referencesource.microsoft.com/#System.Web/HttpValueCollection.cs,9938b1dbd553e753,references
It looks like this behavior can be controlled with an appSetting in web.config. To get the behavior you want, add this:
<add key="aspnet:DontUsePercentUUrlEncoding" value="true" />
If you are targeting .NET 4.5.2+, this value should be set to true by default.
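For context, the key sits under appSettings in web.config, roughly like this:
<configuration>
  <appSettings>
    <add key="aspnet:DontUsePercentUUrlEncoding" value="true" />
  </appSettings>
</configuration>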
You could use the FormUrlEncodedContent class in the System.Net.Http namespace instead. Here is an example of how you could do it:
string query;
using (var content = new FormUrlEncodedContent(new KeyValuePair<string, string>[] {
    new KeyValuePair<string, string>("Arg", "АБ")
}))
{
    query = content.ReadAsStringAsync().Result;
}
Console.WriteLine(query);
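That should print the standard UTF-8 percent-encoding the target API expects (Arg=%D0%90%D0%91 for this input).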
You can also search for "querystring builder c#" to see solutions that others have come up with.
MVC uses UTF-8 for encoding, whereas your requirement uses some other codepage.
Just use the overload with the Encoding parameter (choose the encoding you need; for a list of supported encodings see the MSDN link below):
Encoding encoding = Encoding.GetEncoding(1252);
HttpUtility.ParseQueryString(queryString, encoding);
MSDN

Can I allow raw unicode in HTTP headers using NSUrlSession?

I'm constructing an NSUrlSession as follows:
NSUrlSessionConfiguration sessionCfg = NSUrlSessionConfiguration.CreateBackgroundSessionConfiguration("mySpecialSessionName");
NSUrlSessionDelegate sessionDelegate = new MySessionDelegate();
urlSession = NSUrlSession.FromConfiguration(sessionCfg, sessionDelegate, NSOperationQueue.MainQueue);
And invoking background downloads with custom HTTP headers:
NSMutableUrlRequest mutableRequest = new NSMutableUrlRequest();
mutableRequest.HttpMethod = "POST";
mutableRequest.Url = NSUrl.FromString(someEndpoint);
mutableRequest["MyCustomHeader"] = someStringWithUnicodeChars;
mutableRequest.Body = NSData.FromString(somePostBody);
NSUrlSessionDownloadTask downloadTask = urlSession.CreateDownloadTask(mutableRequest);
downloadTask.Resume();
However, the header value string seems to get truncated at the first character above 255. For example, the header value:
SupeЯ Σario Bros
is received by the server as
Supe
When instead using .NET HttpClient on xamarin, unicode header strings successfully make it to the server unmodified. However, I'd like to make use of NSUrlSession's background downloading feature.
(I realize that support of unicode in HTTP headers is hit-and-miss, but since the HTTP server in this case is a particular custom server that doesn't currently support things like base64 encoding, passing the raw string is desired)
I don't know whether you'll be able to make that work, but two things come to mind:
What you have here is equivalent to calling setValue:forKey: on the URL request. I don't think that will do what you're expecting. Try calling the setValue:forHTTPHeaderField: method instead.
Try specifying the encoding before you specify your custom header value, e.g. [theRequest setValue:@"...; charset=UTF-8" forHTTPHeaderField:@"Content-Type"];
If neither of those helps, you'll probably have to encode the data in some way. I would suggest using URL encoding, because that's a lot simpler to implement on the server side than Base64. For the iOS side, see this link for info on how to URL-encode a string:
https://developer.apple.com/library/mac/documentation/Cocoa/Conceptual/URLLoadingSystem/WorkingwithURLEncoding/WorkingwithURLEncoding.html
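In C#/Xamarin terms, a hedged sketch of that URL-encoding approach (reusing the names from the question) would just be:
// Percent-encode the header value so only ASCII travels in the header;
// the server has to URL-decode it again on its side.
mutableRequest["MyCustomHeader"] = Uri.EscapeDataString(someStringWithUnicodeChars);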

Box API v2 creating folder with cyrillic letters in it's name

I'm trying to create folder using new API.
If folder name contains cyrillic letters, I receive HTTP 400 Bad Request.
However it works fine with latin letters.
Is it known issue?
I found the correct answer here: Detecting the character encoding of an HTTP POST request
The default encoding of an HTTP POST is ISO-8859-1.
The only thing I need is to manually set encoding of the request.
By the way, here is the working code:
public static Task<string> Post(string url, string data, string authToken) {
    var client = new WebClient { Encoding = Encoding.UTF8 };
    client.Headers.Add("Content-Type:application/x-www-form-urlencoded");
    client.Headers.Add(AuthHeader(authToken));
    return client.UploadStringTaskAsync(new Uri(url), "POST", data);
}
Usually, complications involving international characters in Box API calls just need minor adjustments to the encoding of the requests. I'm guessing you'll just have to encode the target folder name with a urlencode.
If that doesn't do the trick, we may be able to help more if you send a sample request or code snippet. If you do, keep the api key and auth token to yourself.
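For what it's worth, a minimal C# sketch of that URL-encoding step (the folder name below is made up for illustration):
string folderName = "Новая папка";                           // hypothetical cyrillic folder name
string encodedFolderName = Uri.EscapeDataString(folderName); // UTF-8 percent-encoded: "%D0%9D%D0%BE..."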

Broken encoding after postback

I have a query string with a parameter value that contains the Norwegian character å, encoded as %e5. The page contains a form with an action attribute which is automatically filled by ASP.NET. When the URL is output into that attribute it is printed with the full two-byte %uXXXX encoding: %u00e5.
When posting back this seems to be ok when debugging the code behind. However the page actually does a redirect to itself (for some other reason) and the redirect location header looks like this: Location: /myFolder/MyPage.aspx?Param1=%C3%A5
So the %e5 has been translated to %C3%A5 which breaks the output somehow.
In HTML text the broken characters look like å after having been output via HttpUtility.HtmlEncode.
The entire web application is ISO8859-1 encoded.
PS. When removing the u00 from the output %u00e5 in the action attribute before posting the form, everything is output nicely. But the error seems to be the translation from %e5 to %C3%A5. (And of course the self redirect, but that's another matter.)
Any pointers?
The solution I ended up with was encoding the redirect URL manually.
public void ReloadPage()
{
    UrlBuilder url = new UrlBuilder(Context, Request.Path);
    foreach (string queryParam in Request.QueryString.AllKeys)
    {
        string queryParamValue = Request.QueryString[queryParam];
        url.AddQueryItem(queryParam, queryParamValue);
    }
    Response.Redirect(url.ToString(), true);
}
The url.AddQueryItem basically does HttpContext.Server.UrlDecode(queryParamValue), and url.ToString builds the query string, calling HttpContext.Server.UrlEncode(queryParamValue) for each query item.
The UrlBuilder is a class already present in our library, so once I found the problem and realized that C#/.Net didn't provide tools for this, coding the fix was quick :)
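UrlBuilder is the poster's in-house class, but based on that description a rough sketch might look like this (hypothetical, not the actual library implementation):
using System.Collections.Generic;
using System.Linq;
using System.Web;

public class UrlBuilder
{
    private readonly HttpContext context;
    private readonly string path;
    private readonly List<KeyValuePair<string, string>> items = new List<KeyValuePair<string, string>>();

    public UrlBuilder(HttpContext context, string path)
    {
        this.context = context;
        this.path = path;
    }

    // Decode whatever arrives so nothing is stored double-encoded.
    public void AddQueryItem(string key, string value)
    {
        items.Add(new KeyValuePair<string, string>(key, context.Server.UrlDecode(value)));
    }

    // Re-encode every value consistently when the URL is rebuilt.
    public override string ToString()
    {
        if (items.Count == 0)
            return path;
        return path + "?" + string.Join("&",
            items.Select(i => i.Key + "=" + context.Server.UrlEncode(i.Value)));
    }
}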

Getting U+fffd/65533 instead of special character from Query String

I have a C# .NET web project that has a globalization tag set to:
<globalization requestEncoding="utf-8" responseEncoding="utf-8" culture="nb-no" uiCulture="no"/>
When this URL is requested from a Flash application (you get the same problem when you enter the URL manually in a browser): c_product_search.aspx?search=kjøkken (alternatively: c_product_search.aspx?search=kj%F8kken)
Both return the following character codes:
k U+006b 107
j U+006a 106
� U+fffd 65533
k U+006b 107
k U+006b 107
e U+0065 101
n U+006e 110
I don't know too much about character encoding, but it seems that the ø has been given a unicode replacement character, right?
I tried to change the globalization tag to:
<globalization requestEncoding="iso-8859-1" responseEncoding="utf-8" culture="nb-no" uiCulture="no"/>
That made the request work. However, now, other searches on my page stopped working.
I also tried the following with similar results:
NameValueCollection qs = HttpUtility.ParseQueryString(Request.QueryString.ToString(), Encoding.GetEncoding("iso-8859-1"));
string search = (string)qs["search"];
What should I do?
Kind Regards,
nitech
The problem comes from the combination Firefox/ASP.NET. When you manually enter a URL in Firefox's address bar, if the URL contains French or Swedish characters, Firefox will encode the URL with ISO-8859-1 by default.
But when ASP.NET receives such a URL, it assumes it is UTF-8 encoded... and the mis-decoded characters become U+FFFD. I couldn't find a way in ASP.NET to detect that the URL is ISO-8859-1; Request.Encoding is set to UTF-8... :(
Several solutions exist:
put <globalization requestEncoding="iso-8859-1" responseEncoding="iso-8859-1"/> in your Web.config. But you may run into other problems, and your application won't be standard anymore (it will not work with languages like Japanese)... And anyway, I prefer using UTF-8!
go to about:config in Firefox and set the value of network.standard-url.encode-query-utf8 to true. It will now work for you (Firefox will encode all your URLs with UTF-8), but not for anybody else...
The least bad solution I could come up with was to handle this in code: if the default decoding didn't work, reparse the query string with ISO-8859-1:
string query = Request.QueryString["search"];
if (query.Contains("\ufffd"))
    query = HttpUtility.ParseQueryString(Request.Url.Query, Encoding.GetEncoding("iso-8859-1"))["search"];
query = HttpUtility.UrlDecode(query);
It works with hyperlinks and manually-entered url, in french, english, or japanese. But I don't know how it will handle other encodings like ISO8859-5 (russian) ...
Does anyone have a better solution ?
This solves only the problem of manually-entered URLs. In your hyperlinks, don't forget to encode URL parameters with HttpUtility.UrlEncode on the server, or encodeURIComponent in the JavaScript code. And use HttpUtility.UrlDecode to decode them.
public string GetEncodedQueryString(string key)
{
    string query = Request.QueryString[key];
    if (query != null)
        if (query.Contains((char)0xfffd))
            query = HttpUtility.ParseQueryString(Request.Url.Query, Encoding.GetEncoding("iso-8859-1"))[key];
    return query;
}
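Usage is then a drop-in replacement for reading the query string directly:
string search = GetEncodedQueryString("search"); // "kjøkken" whether the URL carried %F8 or %C3%B8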
I think your problem is in the Flash, not the .NET.
It sends the special character in a weird way.
Try to URL-encode the search string before you send it to the server.
If the app is expecting the URL-encoded request to be based on UTF-8, the character "ø" should be "%C3%B8", not "%F8". Whatever function you're using to escape/encode that request, you probably need to pass it the name of the underlying character encoding, "UTF-8".
It turns out that ActionScript 2.0 sends the URL encoded/escaped with UTF-8 while ActionScript 3.0 uses ISO-8859-1. The way to solve this was to change the Request.Encoding value inside Global.asax if an encoding is specified in the URL:
void Application_BeginRequest(object sender, EventArgs e)
{
    HttpContext ctx = HttpContext.Current;
    // encoding specified?
    if (!String.IsNullOrEmpty(Request["encoding"]))
    {
        ctx.Request.ContentEncoding = System.Text.Encoding.GetEncoding(ctx.Request["encoding"]);
    }
}
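The Flash side (or any other caller) then just appends which encoding it actually used, e.g.:
c_product_search.aspx?search=kj%F8kken&encoding=iso-8859-1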
Could it be done differently?
Regards,
nitech
