HttpRequestHeader Content encoding issue - c#

I am using the code snippet below to download an HTTP response to a local file.
Sometimes the content at the URL is multilingual (Chinese, Japanese, Thai, etc.).
I am using the ContentEncoding header to specify that my content is UTF-8 encoded, but this has no effect on my local output file, which is generated in ASCII. As a result, the multilingual data is corrupted. Any help?
using (var webClient = new WebClient())
{
webClient.Credentials = CredentialCache.DefaultCredentials;
webClient.Headers.Add(HttpRequestHeader.UserAgent, "Mozilla/4.0");
webClient.Headers.Add(HttpRequestHeader.ContentEncoding, "utf-8");
webClient.DownloadFile(url, @"c:\temp\tempfile.htm");
}

The ContentEncoding header does not specify the character set. It describes the encoding (typically compression, such as gzip) that has been applied to a message body; a client says which encodings it supports with the Accept-Encoding header.
The client can't dictate what character set the server sends. The server sends its data along with header fields that say what character set is being used. Typically it's in the Content-Type header and looks like: text/html; charset=UTF-8.
When you're using WebClient, you want to set the Encoding property as a fallback so that if the server doesn't identify the character set, your default will be used. For example:
WebClient client = new WebClient();
client.Encoding = Encoding.UTF8;
string s = client.DownloadString(DownloadUrl);
See http://www.informit.com/guides/content.aspx?g=dotnet&seqNum=800 for a bit more information.
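To make the "charset from the server, otherwise your default" idea concrete, here is a minimal sketch of that lookup done by hand. It only illustrates the rule described above and is not WebClient's actual implementation; the class and method names (CharsetResolver, Resolve) are made up for this example.

```csharp
using System;
using System.Net.Mime;
using System.Text;

public static class CharsetResolver
{
    // Returns the encoding named by the charset parameter of a Content-Type
    // header value, or the fallback when no usable charset is present.
    public static Encoding Resolve(string contentTypeHeader, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(contentTypeHeader))
            return fallback;

        try
        {
            // ContentType parses values such as "text/html; charset=UTF-8".
            string charset = new ContentType(contentTypeHeader).CharSet;
            return string.IsNullOrEmpty(charset)
                ? fallback
                : Encoding.GetEncoding(charset);
        }
        catch (FormatException)
        {
            return fallback; // malformed header value
        }
        catch (ArgumentException)
        {
            return fallback; // charset name not known to this runtime
        }
    }
}
```

For instance, CharsetResolver.Resolve("text/html; charset=UTF-8", Encoding.ASCII) picks UTF-8, while a bare "text/html" falls through to whatever fallback you pass in.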

Related

Is there any issue or risk of data loss if I send URL-encoded data with UTF-8 encoding inside WebClient()

I am working on an ASP.NET MVC 4 web application that integrates with a third-party API. The third-party API expects the JSON object in URL-encoded format, and it allows non-ASCII characters.
To be able to send URL-encoded JSON data and pass non-ASCII characters, I ended up with the following WebClient code:
using (WebClient wc = new WebClient())
{
var data = JsonConvert.SerializeObject(mainresourceinfo);
string url = currentURL + "resources?AUTHTOKEN=" + pmtoken;
Uri uri = new Uri(url);
wc.Encoding = System.Text.Encoding.UTF8;
wc.Headers.Add(HttpRequestHeader.ContentType, "application/x-www-form-urlencoded");
crudoutput = wc.UploadString(uri, "INPUT_DATA=" + HttpUtility.UrlEncode(data));
}
where, as shown above, I have set:
wc.Encoding = System.Text.Encoding.UTF8;
to allow passing non-ASCII characters.
Also, since the third-party API should accept the JSON in URL-encoded format, I ended up calling HttpUtility.UrlEncode(data) inside UploadString and adding wc.Headers.Add(HttpRequestHeader.ContentType, "application/x-www-form-urlencoded");
First question: is combining the encoding setting and the URL encoding like this a valid approach?
Second question: I thought that setting
wc.Encoding = System.Text.Encoding.UTF8;
would ONLY affect how the data is uploaded, mainly to allow non-ASCII characters, but it seems that wc.Encoding = System.Text.Encoding.UTF8; also affects how the output of the UploadString call is handled. In my case, if I remove wc.Encoding = System.Text.Encoding.UTF8; and the API replies "The resource named ££123 has been added successfully", then the crudoutput variable has a value such as "The resource named ÂÂ123 has been added successfully", where the ££ comes through as ÂÂ. So does this mean that wc.Encoding = System.Text.Encoding.UTF8; affects both how the data is uploaded and how the output is handled? Is this correct?
Can anyone advise on my two questions above?
Thanks
Your approach is valid as long as your pmtoken doesn't contain any characters such as & or + which have special meaning in a URL. If it does, I'd recommend using UrlEncode() to encode the pmtoken. As for your second question, yes, it affects both the request and the response. Also make sure that any data you put in the URL itself is not extremely long, because browsers and servers may limit the URL length and cause issues. AFAIK, Microsoft's Internet Explorer has a limit of about 2K characters on a URL.
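A minimal sketch of the encoding advice above, assuming the same URL shape as in the question. The class and method names (ResourceRequest, BuildUrl, BuildBody) are hypothetical, and Uri.EscapeDataString is used here in place of HttpUtility.UrlEncode only so the sample has no System.Web dependency; it writes a space as %20 where HttpUtility.UrlEncode writes +.

```csharp
using System;

public static class ResourceRequest
{
    // Escapes the token so characters such as & or + cannot be mistaken
    // for query-string syntax.
    public static string BuildUrl(string baseUrl, string token)
    {
        return baseUrl + "resources?AUTHTOKEN=" + Uri.EscapeDataString(token);
    }

    // Builds the form body; non-ASCII characters are percent-encoded
    // from their UTF-8 bytes.
    public static string BuildBody(string json)
    {
        return "INPUT_DATA=" + Uri.EscapeDataString(json);
    }
}
```

With a token of a&b+c, BuildUrl produces ...resources?AUTHTOKEN=a%26b%2Bc, so the server sees one parameter rather than two. The body returned by BuildBody would be passed to UploadString exactly as in the question.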

Can I allow raw unicode in HTTP headers using NSUrlSession?

I'm constructing an NSUrlSession as follows:
NSUrlSessionConfiguration sessionCfg = NSUrlSessionConfiguration.CreateBackgroundSessionConfiguration("mySpecialSessionName");
NSUrlSessionDelegate sessionDelegate = new MySessionDelegate();
m_UrlSession = NSUrlSession.FromConfiguration(sessionCfg, sessionDelegate, NSOperationQueue.MainQueue);
And invoking background downloads with custom HTTP headers:
NSMutableUrlRequest mutableRequest = new NSMutableUrlRequest();
mutableRequest.HttpMethod = "POST";
mutableRequest.Url = NSUrl.FromString(someEndpoint);
mutableRequest["MyCustomHeader"] = someStringWithUnicodeChars;
mutableRequest.Body = NSData.FromString(somePostBody);
NSUrlSessionDownloadTask downloadTask = m_UrlSession.CreateDownloadTask(mutableRequest);
downloadTask.Resume();
However, the header value string seems to get truncated at the first character above 255. For example, the header value:
SupeЯ Σario Bros
is received by the server as
Supe
When I use .NET HttpClient on Xamarin instead, Unicode header strings make it to the server unmodified. However, I'd like to make use of NSUrlSession's background downloading feature.
(I realize that support of unicode in HTTP headers is hit-and-miss, but since the HTTP server in this case is a particular custom server that doesn't currently support things like base64 encoding, passing the raw string is desired)
I don't know whether you'll be able to make that work, but two things come to mind:
What you have here is equivalent to calling setValue:forKey: on the URL request. I don't think that will do what you're expecting. Try calling the setValue:forHTTPHeaderField: method instead.
Try specifying the encoding before you specify your custom header value, e.g. [theRequest setValue:@"...; charset=UTF-8" forHTTPHeaderField:@"Content-Type"];
If neither of those helps, you'll probably have to encode the data in some way. I would suggest using URL encoding, because that's a lot simpler to implement on the server side than Base64. For the iOS side, see this link for info on how to URL-encode a string:
https://developer.apple.com/library/mac/documentation/Cocoa/Conceptual/URLLoadingSystem/WorkingwithURLEncoding/WorkingwithURLEncoding.html
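Since the question's code is C# (Xamarin), here is a rough sketch of the URL-encoding fallback on the client side. It assumes the custom server can be changed to percent-decode the header value, which the question does not confirm; HeaderValueCodec is a made-up name.

```csharp
using System;

public static class HeaderValueCodec
{
    // Percent-encodes the value from its UTF-8 bytes, so the header that
    // goes over the wire is plain ASCII.
    public static string Encode(string value)
    {
        return Uri.EscapeDataString(value);
    }

    // Reverses Encode; the server would need an equivalent step.
    public static string Decode(string headerValue)
    {
        return Uri.UnescapeDataString(headerValue);
    }
}
```

The example string from the question becomes Supe%D0%AF%20%CE%A3ario%20Bros, which contains no characters above 127 and so should not be truncated. You would assign it with mutableRequest["MyCustomHeader"] = HeaderValueCodec.Encode(someStringWithUnicodeChars);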

c# WebClient problem using a URL with URL-encoded cyrillic chars

I'm trying to load a file from a web server with a request URL that contains a parameter with Cyrillic characters.
But I can't get this to work in C#, even if I URL-encode the parameter.
When I open the page in IE with
http://translate.google.com/translate_tts?tl=ru&q=ЗДРАВСТВУЙТЕ
the server does not respond.
Using the URL-encoded version
http://translate.google.com/translate_tts?tl=ru&q=%d0%97%d0%94%d0%a0%d0%90%d0%92%d0%a1%d0%a2%d0%92%d0%a3%d0%99%d0%a2%d0%95
the server responds as expected.
Now my problem:
I want to download the MP3 from C# ...
var url = string.Format("http://translate.google.com/translate_tts?tl=ru&q={0}",
Server.UrlEncode("ЗДРАВСТВУЙТЕ"));
System.Net.WebClient client = new WebClient();
var res = client.DownloadData(url);
And this does NOT work with Cyrillic characters. I always get a zero-byte response, just as with the first, non-encoded request.
When I send "normal" characters, the code above works fine.
So ... I'm doing something wrong.
Any hints? Tips? Solutions?
Thanks
Michael
You have to set the user-agent for the WebClient - this works:
string url = "http://translate.google.com/translate_tts?tl=ru&q=ЗДРАВСТВУЙТЕ";
WebClient client = new WebClient();
client.Headers.Add("user-agent",
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
var res = client.DownloadData(url);
From the MSDN documentation:
A WebClient instance does not send optional HTTP headers by default. If your request requires an optional header, you must add the header to the Headers collection. For example, to retain queries in the response, you must add a user-agent header. Also, servers may return 500 (Internal Server Error) if the user agent header is missing.
Try adding
client.Encoding = System.Text.Encoding.UTF8;
I don't use a user-agent header, but this works for me:
WebClient client = new WebClient();
client.Encoding = System.Text.Encoding.UTF8;
string response = client.DownloadString(_url);

Downloading a Cab file using WebClient gives too few bytes

I need to download a Cab file from a Url into a stream.
using (WebClient client = new WebClient())
{
client.Credentials = CredentialCache.DefaultCredentials;
byte[] fileContents = client.DownloadData("http://localhost/sites/hfsc/FormServerTemplates/HfscInspectionForm.xsn");
using (MemoryStream ms = new MemoryStream(fileContents))
{
FormTemplate = formExtractor.ExtractFormTemplateComponent(ms, "template.xml");
}
}
This is fairly straightforward; however, my cab extractor (CabLib) is throwing an exception saying that it's not a valid cabinet.
I was previously using a SharePoint call to get the byte stream and that was returning 30942 bytes. The stream I get through that method worked correctly with CabLib. The stream I get with the WebClient returns only 28087 bytes.
I have noticed that the response header Content-Type is coming back as text/html; charset=utf-8
I'm not too sure why but I think it's what's affecting the data I get back.
I believe the problem is that SharePoint is passing the XSN to the Forms Server to render as an InfoPath form in HTML for you. You need to stop this from happening. You can do this by adding some query string parameters to the URL request.
These can be found at:
http://msdn.microsoft.com/en-us/library/ms772417.aspx
I suggest you use NoRedirect=true
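A quick way to confirm whether you got the real file or a rendered HTML page is to look at the first bytes before handing them to the extractor. Cabinet files start with the ASCII signature MSCF. The sketch below is only a sanity check under that assumption; CabinetCheck and LooksLikeCabinet are names invented for this example.

```csharp
using System;

public static class CabinetCheck
{
    // Cabinet files begin with the four ASCII bytes "MSCF".
    private static readonly byte[] Signature = { 0x4D, 0x53, 0x43, 0x46 };

    // Returns true when the buffer starts with the cabinet signature.
    public static bool LooksLikeCabinet(byte[] data)
    {
        if (data == null || data.Length < Signature.Length)
            return false;

        for (int i = 0; i < Signature.Length; i++)
        {
            if (data[i] != Signature[i])
                return false;
        }
        return true;
    }
}
```

If CabinetCheck.LooksLikeCabinet(fileContents) is false and the response Content-Type is text/html, the server most likely sent the rendered form rather than the .xsn itself.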

Mono WebClient encoding issue

I'm trying to port a .NET application from Windows to Mono, but certain code that was working on Windows is no longer working (as expected) on mono:
WebClient client = new WebClient ();
Console.WriteLine (client.DownloadString("http://www.maxima.fm/51Chart/"));
It seems to detect the encoding correctly as UTF-8 (and manually setting the encoding to UTF-8 or ASCII doesn't work either), but there are still '?' characters in the output.
You are writing to the console. Maybe your console is not configured properly to show certain characters. Make sure by debugging and storing the result into an intermediary variable.
Also, the site you gave as an example is completely messed up. The web server sends a Content-Type: text/html; charset=iso-8859-1 HTTP header, and in the resulting HTML you see <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />, which of course is completely inconsistent. You cannot expect an HTTP client to behave correctly when confronted with a non-standard site; what you get is unexpected behavior.
Try testing on some web site that respects a minimum of web standards.
Remark: WebClient implements IDisposable, so make sure you wrap it in a using statement.
UPDATE:
To make it work with this particular site you may try downloading the response manually and specifying the correct encoding:
// You may try different encodings here (for me it worked with iso-8859-1)
var encoding = Encoding.GetEncoding("iso-8859-1");
using (var client = new WebClient())
{
using (var stream = client.OpenRead("http://www.maxima.fm/51Chart/"))
using (var reader = new StreamReader(stream, encoding))
{
var result = reader.ReadToEnd();
Console.WriteLine(result);
}
}
using (var client = new WebClient())
{
client.Encoding = Encoding.UTF8;
Console.WriteLine (client.DownloadString("http://www.maxima.fm/51Chart/"));
}
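To see why the declared charset matters so much here, it can help to decode the same bytes under different encodings. This is a small illustrative sketch, not tied to that particular site; DecodeDemo is a hypothetical helper.

```csharp
using System;
using System.Text;

public static class DecodeDemo
{
    // Decodes the raw response bytes using the named character set.
    public static string Decode(byte[] body, string charsetName)
    {
        return Encoding.GetEncoding(charsetName).GetString(body);
    }
}
```

For example, the UTF-8 bytes of "é" (0xC3 0xA9) decode back to "é" with "utf-8", turn into the two characters "Ã©" with "iso-8859-1", and a lone byte 0xE9 decoded as "us-ascii" becomes "?". So when the header and the meta tag disagree, the client has to guess, and a wrong guess produces exactly this kind of garbage.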
