Mono WebClient encoding issue - c#

I'm trying to port a .NET application from Windows to Mono, but some code that worked on Windows no longer works as expected on Mono:
WebClient client = new WebClient ();
Console.WriteLine (client.DownloadString("http://www.maxima.fm/51Chart/"));
It seems to detect the encoding correctly as UTF-8, but there are still '?' characters in the output. Manually setting the encoding to UTF-8 or ASCII doesn't help either.

You are writing to the console. Maybe your console is not configured properly to show certain characters. Make sure by debugging and storing the result into an intermediary variable.
Also, the site you gave as an example is inconsistent. The web server sends a Content-Type: text/html; charset=iso-8859-1 HTTP header, while the resulting HTML contains <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />, so the two declarations contradict each other. You cannot expect an HTTP client to behave correctly when confronted with a non-standard site; what you get is unexpected behavior.
Try testing on some web site that respects a minimum of web standards.
Remark: WebClient implements IDisposable, so make sure you wrap it in a using statement.
UPDATE:
To make it work with this particular site you may try downloading the response manually and specifying the correct encoding:
// You may try different encodings here (for me it worked with iso-8859-1)
var encoding = Encoding.GetEncoding("iso-8859-1");
using (var client = new WebClient())
{
using (var stream = client.OpenRead("http://www.maxima.fm/51Chart/"))
using (var reader = new StreamReader(stream, encoding))
{
var result = reader.ReadToEnd();
Console.WriteLine(result);
}
}

using (var client = new WebClient())
{
client.Encoding = Encoding.UTF8;
Console.WriteLine (client.DownloadString("http://www.maxima.fm/51Chart/"));
}
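To illustrate why a charset mismatch like the one described above produces '?' characters, here is a minimal sketch that needs no network access. It assumes the page body is really ISO-8859-1 (as the first answer found) and decodes the same bytes both ways. The class and method names are made up for this example.

```csharp
using System;
using System.Text;

public static class CharsetMismatchDemo
{
    // Decodes the same raw bytes with a caller-chosen charset name.
    public static string Decode(byte[] body, string charsetName)
    {
        return Encoding.GetEncoding(charsetName).GetString(body);
    }

    public static void Main()
    {
        // "café" as an ISO-8859-1 server would send it: 63 61 66 E9.
        byte[] body = { 0x63, 0x61, 0x66, 0xE9 };

        string asLatin1 = Decode(body, "iso-8859-1");
        string asUtf8 = Decode(body, "utf-8");

        // Decoding with the right charset restores the text.
        Console.WriteLine(asLatin1 == "caf\u00E9");   // True

        // The lone byte E9 is not valid UTF-8, so the decoder substitutes
        // U+FFFD (the replacement character), which many consoles show as '?'.
        Console.WriteLine((int)asUtf8[3] == 0xFFFD);  // True
    }
}
```

Comparing code points instead of printing the strings keeps the result independent of how the console itself is configured.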

Related

WebRequest returns unreadable string [duplicate]

I'm trying to download an HTML document from Amazon, but for some reason I get a badly encoded string like "��K��g��g�e".
Here's the code I tried:
using (var webClient = new System.Net.WebClient())
{
var url = "https://www.amazon.com/dp/B07H256MBK/";
webClient.Encoding = Encoding.UTF8;
var result = webClient.DownloadString(url);
}
Same thing happens when using HttpClient:
var url = "https://www.amazon.com/dp/B07H256MBK/";
var httpclient = new HttpClient();
var html = await httpclient.GetStringAsync(url);
I also tried reading the result as bytes and then converting it to UTF-8, but I still get the same result. Also note that this DOES NOT always happen. For example, yesterday I ran this code for about 2 hours and got a correctly encoded HTML document. However, today I always get a badly encoded result. It happens every other day, so it's not a one-time thing.
==================================================================
However, when I use the HtmlAgilityPack's wrapper it works as expected every time:
var url = "https://www.amazon.com/dp/B07H256MBK/";
HtmlWeb htmlWeb = new HtmlWeb();
HtmlDocument doc = htmlWeb.Load(url);
What causes WebClient and HttpClient to return a badly encoded string even when I explicitly set the correct encoding? And how does the HtmlAgilityPack's wrapper work by default?
Thanks for any help!
I fired up Firefox's web dev tools, requested that page, and looked at the response headers:
See that content-encoding: gzip? That means the response is gzip-encoded.
It turns out that Amazon gives you a response compressed with gzip even when you don't send an Accept-Encoding: gzip header (verified with another tool). This is a bit naughty, but not that uncommon, and easy to work around.
This wasn't a problem with character encodings at all. HttpClient is good at figuring out the correct encoding from the Content-Type header.
You can tell HttpClient to un-zip responses with:
HttpClientHandler handler = new HttpClientHandler()
{
AutomaticDecompression = DecompressionMethods.GZip,
};
using (var client = new HttpClient(handler))
{
// your code
}
This will be set automatically if you're using the NuGet package versions 4.1.0 to 4.3.2, otherwise you'll need to do it yourself.
You can do the same with WebClient, but it's harder.
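One common way to do this with WebClient, sketched here as an assumption rather than the answerer's own code, is to subclass it and enable decompression on the underlying HttpWebRequest. GzipWebClient is a hypothetical name chosen for this example, and the URL is the one from the question.

```csharp
using System;
using System.Net;

// Hypothetical helper: a WebClient whose HTTP requests transparently
// decompress gzip and deflate response bodies.
public class GzipWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);

        // Only HTTP(S) requests expose AutomaticDecompression.
        HttpWebRequest httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            httpRequest.AutomaticDecompression =
                DecompressionMethods.GZip | DecompressionMethods.Deflate;
        }
        return request;
    }
}

public static class GzipWebClientDemo
{
    public static void Main()
    {
        using (var client = new GzipWebClient())
        {
            string html = client.DownloadString("https://www.amazon.com/dp/B07H256MBK/");
            Console.WriteLine(html.Length);
        }
    }
}
```

The override is the "harder" part: WebClient has no decompression property of its own, so the setting has to be applied to each request it creates.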

how to pass unicode in asp.net web api to sql server database [duplicate]

I'm trying to send special characters through an HTTP request, using Loopj as my HTTP client. The problem is that when I send special characters such as "áéíóú", the request goes out with the characters "·ÈÌÛ˙", which causes issues on the server side.
I've gone through the Loopj code and couldn't find anything related to re-encoding my string. In the worst case it seems it would be encoded in UTF-8, which does support these characters.
Hope anyone can help.
Best Regards.
I am guessing you mean AsyncHttpClient library, correct?
AHC defaults to encoding all I/O in UTF-8. Since you haven't posted your source code, I would suggest investigating the following:
What is the encoding of the input? Make sure it's in UTF-8.
Are you running the input through a filter/function that might change its encoding? Make sure that the filter/function produces UTF-8 also.
Prior to checking what your backend actually receives, change your client to submit to http://httpbin.org/post and then check the result.
If you receive correct submission in httpbin, and bad submission in your backend, the problem is NOT in AHC but in your backend.
If you receive bad submissions in both httpbin and the backend, then the data being sent was originally bad or in a wrong encoding.
I hope this helps you find the problem quickly.
Why don't you use this approach:
HttpParams httpParameters = new BasicHttpParams();
HttpProtocolParams.setContentCharset(httpParameters, HTTP.UTF_8);
HttpProtocolParams.setHttpElementCharset(httpParameters, HTTP.UTF_8);
HttpClient client = new DefaultHttpClient(httpParameters);
client.getParams().setParameter("http.protocol.version", HttpVersion.HTTP_1_1);
client.getParams().setParameter("http.socket.timeout", new Integer(2000));
client.getParams().setParameter("http.protocol.content-charset", HTTP.UTF_8);
httpParameters.setBooleanParameter("http.protocol.expect-continue", false);
HttpPost request = new HttpPost("http://www.server.com/some_script.php?sid=" + String.valueOf(Math.random()));
request.getParams().setParameter("http.socket.timeout", new Integer(5000));
List<NameValuePair> postParameters = new ArrayList<NameValuePair>();
// you get this later in php with $_POST['value_name']
postParameters.add(new BasicNameValuePair("value_name", "value_val"));
UrlEncodedFormEntity formEntity = new UrlEncodedFormEntity(postParameters, HTTP.UTF_8);
request.setEntity(formEntity);
HttpResponse response = client.execute(request);
// Declare the reader and decode the response explicitly as UTF-8
BufferedReader in = new BufferedReader(new InputStreamReader(response.getEntity().getContent(), HTTP.UTF_8));
StringBuffer sb = new StringBuffer("");
String line = "";
String lineSeparator = System.getProperty("line.separator");
while ((line = in.readLine()) != null) {
sb.append(line);
sb.append(lineSeparator);
}
in.close();
String result = sb.toString();
Users of the code above say it works like a charm. If you keep running into issues with your current approach, it may be worth switching to this one.
See this link, which may be useful to you: Android default charset when sending http post/put - Problems with special characters

Why does WebClient.UploadValues overwrites my html web page?

I'm familiar with WinForms and WPF, but new to web development. One day I came across WebClient.UploadValues and decided to try it.
static void Main(string[] args)
{
using (var client = new WebClient())
{
var values = new NameValueCollection();
values["thing1"] = "hello";
values["thing2"] = "world";
//A single file that contains plain html
var response = client.UploadValues("D:\\page.html", values);
var responseString = Encoding.Default.GetString(response);
Console.WriteLine(responseString);
}
Console.ReadLine();
}
After running it, nothing is printed, and the content of the HTML file becomes this:
thing1=hello&thing2=world
Could anyone explain it, thanks!
The UploadValues method is intended to be used with the HTTP protocol. This means that you need to host your html on a web server and make the request like that:
var response = client.UploadValues("http://some_server/page.html", values);
In this case the method will send the values to the server by using application/x-www-form-urlencoded encoding and it will return the response from the HTTP request.
I have never used the UploadValues with a local file and the documentation doesn't seem to mention anything about it. They only mention HTTP or FTP protocols. So I suppose that this is some side effect when using it with a local file -> it simply overwrites the contents of this file with the payload that is being sent.
You are not using WebClient as it was intended.
The purpose of WebClient.UploadValues is to upload the specified name/value collection to the resource identified by the specified URI.
That resource should not be a local file on your disk; it should be a web service that listens for requests and sends responses.

German characters sending data using POST method from ASP page to PHP page

I have a problem sending data from an ASP page to a PHP page with the POST method.
I would like to send mail with names. Since I live in Austria, the names are in German and contain some special characters. These characters don't arrive correctly.
I'm still pretty new to programming in C#, by the way. The website was previously written in JavaScript, but I had to connect it to a database, so I switched to C#, and now I'm like a "babe in the woods".
this.hdnDaten.Value = "ÄÖÜ|äöü|ß|é|#";
// mit POST versuchen
using (var client = new WebClient())
{
var postData = new System.Collections.Specialized.NameValueCollection();
postData.Add("von", this.hdnVon.Value);
postData.Add("an", this.hdnAn.Value);
postData.Add("betreff", this.hdnBetreff.Value);
postData.Add("daten", this.hdnDaten.Value);
byte[] response = client.UploadValues("http://xxxxxx.php", "POST", postData);
var responsebody = Encoding.UTF8.GetString(response);
}
And this is how the characters (in this.hdnDaten.Value) from above arrive in the mail-body:
ÄÖÜ|äöü|ß|é|#
Does anybody know what I can do to get the same characters in the end?
Edit 20143013: I think I have a clue: I have to encode the postData into ANSI (Codepage 1252). I tried to do this, but it doesn't work. Does anybody have an idea how I could do this?
Edit 20140320: I don't even dare to give you the answer: I was looking all the time in the wrong place (somewhat like MH370): The problem was with the receiving side of the mail (I was using a POP3-Viewer for testing); when I downloaded the mail to Outlook everything was OK. The funny thing was that this didn't happen in the original (Javascript) Version that's why I was looking at the wrong place.
Thanks
Eddie
Try setting client.Encoding to UTF-8 before calling UploadValues. Also ensure that you read the text as UTF-8 on the server.
Try this.hdnDaten.Value = HttpUtility.UrlEncode("ÄÖÜ|äöü|ß|é|#"); on your post parameters.
In PHP you would then need to decode the value with urldecode (html_entity_decode handles HTML entities, not percent-encoding).
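The garbled pattern shown in the question (é arriving as Ã©) is what UTF-8 bytes look like when something downstream reads them with a single-byte charset. This is consistent with the asker's finding that the sending side was fine and the mail viewer was at fault. The following minimal sketch reproduces the effect. It uses ISO-8859-1 as a stand-in for the viewer's charset, which is an assumption, and the names are made up for this example.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Encodes text as UTF-8, then decodes those bytes as ISO-8859-1,
    // imitating a reader that assumes the wrong charset.
    public static string Utf8ReadAsLatin1(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("iso-8859-1").GetString(utf8Bytes);
    }

    public static void Main()
    {
        string garbled = Utf8ReadAsLatin1("\u00E9"); // "é"

        // Print code points rather than glyphs so the console
        // encoding does not affect what is shown.
        foreach (char c in garbled)
        {
            Console.WriteLine("U+{0:X4}", (int)c);
        }
        // Prints U+00C3 (Ã) and U+00A9 (©): one character became two.
    }
}
```

When every non-ASCII character turns into two characters starting with Ã, the bytes on the wire are usually valid UTF-8 and the fix belongs on the reading side.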

HttpRequestHeader Content encoding issue

I am using the code snippet below to download an HTTP response to a local file.
Sometimes the content at the URL is multilingual (Chinese, Japanese, Thai data, etc.).
I am using the ContentEncoding header to specify that my content is UTF-8 encoded, but this has no effect on my local output file, which is generated as ASCII. As a result, the multilingual data is corrupted. Any help?
using (var webClient = new WebClient())
{
webClient.Credentials = CredentialCache.DefaultCredentials;
webClient.Headers.Add(HttpRequestHeader.UserAgent, "Mozilla/4.0");
webClient.Headers.Add(HttpRequestHeader.ContentEncoding, "utf-8");
webClient.DownloadFile(url, @"c:\temp\tempfile.htm");
}
The ContentEncoding (Content-Encoding) header is not used to specify the character set. It describes how the message body has been encoded for transfer, typically a compression such as gzip. A client advertises which of those encodings it supports with the Accept-Encoding header.
The client can't tell the server what character set to send. The server sends its data and some header fields that say what character set is being used. Typically it's in the Content-Type header and looks like: text/html; charset=UTF-8.
When you're using WebClient, you want to set the Encoding property as a fallback so that if the server doesn't identify the character set, your default will be used. For example:
WebClient client = new WebClient();
client.Encoding = Encoding.UTF8;
string s = client.DownloadString(DownloadUrl);
See http://www.informit.com/guides/content.aspx?g=dotnet&seqNum=800 for a bit more information.
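The fallback rule described above can be sketched as follows. This is an illustration of the idea, not WebClient's internal code: the Resolve helper and its class are hypothetical, and the sketch uses System.Net.Mime.ContentType to read the charset parameter from a Content-Type value.

```csharp
using System;
using System.Net.Mime;
using System.Text;

public static class CharsetFromHeader
{
    // Returns the encoding named by the Content-Type header's charset
    // parameter, or the supplied fallback when no charset is declared.
    // Encoding.GetEncoding throws ArgumentException for unknown names.
    public static Encoding Resolve(string contentTypeHeader, Encoding fallback)
    {
        if (string.IsNullOrEmpty(contentTypeHeader))
        {
            return fallback;
        }

        string charset = new ContentType(contentTypeHeader).CharSet;
        return string.IsNullOrEmpty(charset)
            ? fallback
            : Encoding.GetEncoding(charset);
    }

    public static void Main()
    {
        // The server declares a charset: the declaration wins.
        Console.WriteLine(Resolve("text/html; charset=UTF-8", Encoding.ASCII).WebName);

        // No charset declared: the caller's fallback is used.
        Console.WriteLine(Resolve("text/html", Encoding.UTF8).WebName);
    }
}
```

Both calls in Main resolve to UTF-8, the first because the header says so and the second because of the fallback.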
