HttpWebResponse - Encoding problem - c#

I have a problem with encoding. When I fetch a site's source code, the text comes back garbled.
I set the encoding to UTF-8 like this:
StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8);
string sourceCode = reader.ReadToEnd();
Thanks for your help!

Try using the encoding that the response specifies:
Encoding encoding;
try
{
    encoding = Encoding.GetEncoding(response.CharacterSet);
}
catch (ArgumentException)
{
    // Cannot determine encoding, use default
    encoding = Encoding.UTF8;
}
StreamReader reader = new StreamReader(response.GetResponseStream(), encoding);
string sourceCode = reader.ReadToEnd();
If the response is gzip-compressed, this may help (I haven't tried it myself, and admittedly it would not make much sense unless the server actually sends gzip):
request.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip,deflate");
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

I had the same issue. I tried changing the encoding, from the source to the result, and got nowhere. In the end, I came across a thread that led me to the following.
Take a look here:
.NET: Is it possible to get HttpWebRequest to automatically decompress gzip'd responses?
You need to use the following code before retrieving the response from the request:
rqst.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
Once the request sends Accept-Encoding 'gzip' or 'deflate', the server may compress the response body, and the raw bytes are not readable as text until they are decompressed.
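To make the point about compressed data concrete, here is a minimal sketch that compresses a string with GZipStream and then decompresses it again. It uses an in-memory stream as a stand-in for the response stream; the class and method names (GzipDemo, Compress, Decompress) are made up for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class GzipDemo
{
    // Compresses UTF-8 text the way a server might when the client accepts gzip.
    public static byte[] Compress(string text)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                byte[] raw = Encoding.UTF8.GetBytes(text);
                gzip.Write(raw, 0, raw.Length);
            }
            // ToArray still works after the MemoryStream has been closed.
            return output.ToArray();
        }
    }

    // Decompresses first, then decodes the bytes as UTF-8 text.
    public static string Decompress(byte[] compressed)
    {
        using (var gzip = new GZipStream(new MemoryStream(compressed), CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        byte[] compressed = Compress("<html>hello</html>");
        // The first two bytes are the gzip magic number, not text.
        Console.WriteLine(BitConverter.ToString(compressed, 0, 2)); // prints "1F-8B"
        Console.WriteLine(Decompress(compressed));                  // prints "<html>hello</html>"
    }
}
```

Passing the compressed bytes straight to a StreamReader, whatever the encoding, is what produces the gibberish; setting AutomaticDecompression makes the framework do the equivalent of Decompress for you.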

But the response might not be UTF-8. Have you checked the CharacterSet and the ContentType properties of the response object to make sure you're using the right encoding?
In any event, those two characters look like the code page 437 characters for values 03 and 08. It looks like there's some binary data in your data stream.
I would suggest that for debugging, you use Stream.Read to read the first few bytes from the response into a byte array and then examine the values to see what you're getting.
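A rough sketch of that debugging step, assuming a MemoryStream stands in for response.GetResponseStream(). The helper names (ResponseSniffer, ReadPrefix, LooksLikeGzip) are hypothetical. It reads the first few bytes and checks for the gzip magic number 0x1F 0x8B, which is one common cause of binary-looking output.

```csharp
using System;
using System.IO;

static class ResponseSniffer
{
    // Reads up to `count` bytes. Stream.Read may return fewer bytes than requested,
    // so keep reading until the buffer is full or the stream ends.
    public static byte[] ReadPrefix(Stream stream, int count)
    {
        byte[] buffer = new byte[count];
        int total = 0;
        while (total < count)
        {
            int read = stream.Read(buffer, total, count - total);
            if (read == 0) break; // end of stream
            total += read;
        }
        Array.Resize(ref buffer, total);
        return buffer;
    }

    // A gzip stream starts with the bytes 0x1F 0x8B.
    public static bool LooksLikeGzip(byte[] prefix)
    {
        return prefix.Length >= 2 && prefix[0] == 0x1F && prefix[1] == 0x8B;
    }

    static void Main()
    {
        // Stand-in for the response stream: a gzip header followed by padding.
        byte[] sample = { 0x1F, 0x8B, 0x08, 0x00, 0x00, 0x00 };
        using (var stream = new MemoryStream(sample))
        {
            byte[] prefix = ReadPrefix(stream, 4);
            Console.WriteLine(BitConverter.ToString(prefix)); // prints "1F-8B-08-00"
            Console.WriteLine(LooksLikeGzip(prefix));          // prints "True"
        }
    }
}
```

In a gzip header the third byte is the compression method, and 0x08 means deflate, which would be consistent with seeing a value of 08 among the unreadable characters.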

Change this line in your code:
using (StreamReader streamReader = new StreamReader(stream, Encoding.GetEncoding(1251)))
It may help if the page is encoded in Windows-1251 (Cyrillic), which is what code page 1251 selects.

Related

StreamReader fails to detect BOM

I have the following piece of code:
using (StreamReader sr = new StreamReader(path, Encoding.GetEncoding("shift-jis"), true)) {
    mCertainFileIsUTFFormat = !sr.CurrentEncoding.Equals(Encoding.GetEncoding("shift-jis"));
    mCodingFromBOM = sr.CurrentEncoding;
    String line = sr.ReadToEnd();
    return line.Split('\n');
}
Basically reading a file and assuming Shift-Jis if there is no BOM. Alas, this method is always, no matter what, returning Shift-JIS encoding, even if the file in question has a BOM within it. Am I doing something wrong here or perhaps there is a known issue? I could always open the file binary and do the work myself, but this is supposed to do what I want :)
You need to call a Read method of some kind first: StreamReader does not detect the encoding until it reads. That is, get the encoding after your ReadToEnd call:
String line = sr.ReadToEnd();
mCodingFromBOM = sr.CurrentEncoding;
Info: StreamReader.CurrentEncoding
The value can be different after the first call to any Read method of StreamReader, since encoding autodetection is not done until the first call to a Read method.
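The behaviour can be checked with a small in-memory experiment. This is only a sketch: it uses Encoding.ASCII as the fallback instead of Shift-JIS so that it runs without extra code page registration, and the names (BomDetectionDemo, EncodingBeforeAndAfterRead) are invented for the example.

```csharp
using System;
using System.IO;
using System.Text;

static class BomDetectionDemo
{
    // Returns the encoding names reported before and after the first read.
    public static string[] EncodingBeforeAndAfterRead(byte[] data, Encoding fallback)
    {
        // The third argument enables byte order mark detection.
        using (var reader = new StreamReader(new MemoryStream(data), fallback, true))
        {
            string before = reader.CurrentEncoding.WebName;
            reader.ReadToEnd();
            string after = reader.CurrentEncoding.WebName;
            return new[] { before, after };
        }
    }

    static void Main()
    {
        // UTF-8 BOM (EF BB BF) followed by the ASCII text "hi".
        byte[] data = { 0xEF, 0xBB, 0xBF, (byte)'h', (byte)'i' };
        string[] names = EncodingBeforeAndAfterRead(data, Encoding.ASCII);
        Console.WriteLine(names[0]); // prints "us-ascii" (the fallback, nothing read yet)
        Console.WriteLine(names[1]); // prints "utf-8" (detected from the BOM)
    }
}
```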

Getting JSON data from a response stream and reading it as a string?

I am trying to read the response that a server sends back when I send it a POST request. Fiddler shows it as a JSON response. How do I decode it to a normal string using C# WinForms, preferably with no outside APIs? I can provide additional code or Fiddler results if you need them.
The fiddler and gibberish images:
The gibberish came from my attempts to read the stream in the code below:
Stream sw = requirejs.GetRequestStream();
sw.Write(logBytes, 0, logBytes.Length);
sw.Close();
response = (HttpWebResponse)requirejs.GetResponse();
Stream stream = response.GetResponseStream();
StreamReader sr = new StreamReader(stream);
MessageBox.Show(sr.ReadToEnd());
As mentioned in the comments, Newtonsoft.Json is really a good library and worth using -- very lightweight.
If you really want to only use Microsoft's .NET libraries, also consider System.Web.Script.Serialization.JavaScriptSerializer.
var serializer = new System.Web.Script.Serialization.JavaScriptSerializer();
var jsonObject = serializer.DeserializeObject(sr.ReadToEnd());
I'm going to assume (you haven't clarified yet) that you need to actually decode the stream, since A) retrieving a remote stream of text is well documented, and B) you can't do much with a non-decoded JSON stream.
Your best course of action is to use System.Web.Helpers.Json:
using System.Web.Helpers;
...
// Json.Decode takes the JSON text as a string, so read the stream first
var jsonObj = Json.Decode(sr.ReadToEnd());

converting \u0040 to @ in C#

The Facebook Graph API returns the user's email address to me as
foo\u0040bar.com.
in a JSON object. I need to convert it to
foo@bar.com.
There must be a built-in method in .NET that changes the Unicode character expression (\u1234) to the actual Unicode symbol.
Do you know what it is?
Note: I prefer not to use JSON.NET or JavaScriptSerializer for performance reasons.
I think the problem is in my StreamReader:
requestUrl = "https://graph.facebook.com/me?access_token=" + accessToken;
request = WebRequest.Create(requestUrl) as HttpWebRequest;
try
{
    using (HttpWebResponse response2 = request.GetResponse() as HttpWebResponse)
    {
        // Get the response stream
        reader = new StreamReader(response2.GetResponseStream(), System.Text.Encoding.UTF8);
        string json = reader.ReadToEnd();
I tried different encodings for the StreamReader, UTF8, UTF7, Unicode, ... none worked.
Many thanks!
Thanks to L.B for correcting me. The problem was not in the StreamReader.
Yes, there is a built-in method for that, but it would involve something like using a compiler to parse the string as code...
Use a simple replace:
s = s.Replace(@"\u0040", "@");
For a more flexible solution, you can use a regular expression that can handle any Unicode character:
s = Regex.Replace(s, @"\\u([\dA-Fa-f]{4})", v => ((char)Convert.ToInt32(v.Groups[1].Value, 16)).ToString());
JSON responses are not binary data that you convert to a string by picking some encoding. They are strings that your browser, or HttpWebResponse as in your example, has already decoded correctly. You need a second processing step (a regex, a deserializer, etc.) to get the final data.
See what you get with
webClient.DownloadString("https://graph.facebook.com/HavelVaclav?access_token=????") without any encoding
{"id":"100000042150992",
"name":"Havel V\u00e1clav",
"first_name":"Havel",
"last_name":"V\u00e1clav",
"link":"http:\/\/www.facebook.com\/havel.vaclav",
"username":"havel.vaclav",
"gender":"male",
"locale":"cs_CZ"
}
Would your encoding change \/ to /?
So, the problem is not in your StreamReader.

Receiving post data with different encodings

I am trying to integrate with a third-party system, and the documentation mentions that when they send XML data via HTTP POST, they sometimes use "text/xml charset=\"UTF-8\"" for the Content-Type, and in other cases they use "application/x-www.form-urlencoded".
Would there be any differences in parsing the request? Right now I just pull the post data using the following code:
StreamReader reader = new StreamReader(Request.InputStream);
String xmlData = reader.ReadToEnd();
When you open the stream reader, you should pass the encoding specified on the HttpRequest object:
StreamReader reader = new StreamReader(Request.InputStream, Request.ContentEncoding);
string xmlData = reader.ReadToEnd();
This should allow you to get the original contents of the request into a proper .NET string regardless of whatever encoding is used.
Prefer Encoding.UTF8 as the default. In most cases this ensures the data is read with the correct encoding.
StreamReader sr = new StreamReader(Request.InputStream, Encoding.UTF8);
Hope it helps.
You can pass an encoding to your StreamReader at construction like so:
StreamReader s = new StreamReader(new FileStream(FILE, FileMode.Open), Encoding.UTF8);
application/x-www.form-urlencoded is HTTP Form Data, not XML.
Your code would most likely fail if you expect Request.InputStream to contain a parsable XML string when the Content-Type is application/x-www.form-urlencoded.
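One way to handle both cases is to branch on the Content-Type before treating the body as XML. The following is a minimal sketch under stated assumptions: the form field name "xml" is hypothetical (the third party's documentation would say which field carries the XML), the check matches any content type containing "form-urlencoded", and the helper names are invented for the example.

```csharp
using System;
using System.Net;

static class PostBodyParser
{
    // Returns the XML text from a request body, or null if the form field is missing.
    // For form-encoded bodies the XML is assumed to sit in the field `fieldName`.
    public static string ExtractXml(string contentType, string body, string fieldName)
    {
        bool isForm = contentType != null &&
            contentType.IndexOf("form-urlencoded", StringComparison.OrdinalIgnoreCase) >= 0;
        if (!isForm)
        {
            return body; // e.g. text/xml: the body is already the XML document
        }

        // Form data is a list of key=value pairs joined by '&', each part URL-encoded.
        foreach (string pair in body.Split('&'))
        {
            int eq = pair.IndexOf('=');
            if (eq < 0) continue;
            string key = WebUtility.UrlDecode(pair.Substring(0, eq));
            if (key == fieldName)
            {
                return WebUtility.UrlDecode(pair.Substring(eq + 1));
            }
        }
        return null;
    }

    static void Main()
    {
        string form = "xml=%3Croot%2F%3E&id=7";
        Console.WriteLine(ExtractXml("application/x-www-form-urlencoded", form, "xml")); // prints "<root/>"
        Console.WriteLine(ExtractXml("text/xml; charset=\"UTF-8\"", "<root/>", "xml"));  // prints "<root/>"
    }
}
```

In ASP.NET itself, Request.Form already exposes decoded form fields, so a hand-written parser like this is mainly useful for seeing what the two body formats look like.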

Why does HttpWebResponse return a null terminated string?

I was recently using HttpWebResponse to read XML data returned for an HttpWebRequest, and I noticed that the stream returned a null-terminated string to me.
I assume it's because the underlying library has to be compatible with C++, but I wasn't able to find a resource to provide further illumination.
Mostly I'm wondering if there is an easy way to disable this behavior so I don't have to sanitize strings I'm passing into my XML reader.
Edit: here is a sample of the relevant code:
httpResponse.GetResponseStream().Read(serverBuffer, 0, BUFFER_SIZE);
output = processResponse(System.Text.UTF8Encoding.UTF8.GetString(serverBuffer));
where processResponse looks like:
processResponse(string xmlResponse)
{
    var Parser = new XmlDocument();
    xmlResponse = xmlResponse.Replace('\0',' '); //fix for httpwebrequest null terminating strings
    Parser.LoadXml(xmlResponse);
This definitely isn't normal behaviour. Two options:
You made a mistake in the reading code (e.g. creating a buffer and then calling Read on a stream, expecting it to fill the buffer)
The web server actually returned a null-terminated response
You should be able to tell the difference using Wireshark if nothing else.
Could it be that you are setting a size (wrong size) to the buffer you are loading?
You can use a StreamReader to avoid the temp buffer if you don't need it.
using (var stream = new StreamReader(httpResponse.GetResponseStream()))
{
    string output = stream.ReadToEnd();
    //...
}
Hmm... I doubt it returns a null-terminated string, simply because there is no such concept in C#. At best you could have a string with a \0 (U+0000) character at the end, but in that case it would mean that the response from the server contains such a character and HttpWebRequest is simply doing its duty and returning whatever the server returned.
Update
After reading your code, the mistake is pretty obvious: you are Read()-ing from a stream into a byte[] but not taking notice of how much you actually read:
int responseLength = httpResponse.GetResponseStream().Read(
    serverBuffer, 0, BUFFER_SIZE);
output = processResponse(System.Text.UTF8Encoding.UTF8.GetString(
    serverBuffer, 0, responseLength));
This would fix the immediate problem, leaving only the other bugs in your code to deal with, like the fact that you cannot correctly handle a response larger than BUFFER_SIZE... I would suggest you open an XML document reader on the returned stream instead of manipulating the stream via an (unnecessary) byte[] copy operation:
Parser.Load(httpResponse.GetResponseStream());
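If the bytes are needed as a string anyway, the read has to loop and honour the count that each Read call returns. Here is a minimal sketch of that pattern, with a MemoryStream standing in for the response stream and a deliberately tiny buffer to force several reads; the names (StreamText, ReadAllText) are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

static class StreamText
{
    // Reads the whole stream, copying only the bytes each Read call actually returned.
    public static string ReadAllText(Stream stream, Encoding encoding, int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        using (var collected = new MemoryStream())
        {
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                collected.Write(buffer, 0, read);
            }
            return encoding.GetString(collected.ToArray());
        }
    }

    static void Main()
    {
        byte[] payload = Encoding.UTF8.GetBytes("<root>hello</root>");
        using (var stream = new MemoryStream(payload))
        {
            // A 4-byte buffer means the payload arrives over several reads.
            string text = ReadAllText(stream, Encoding.UTF8, 4);
            Console.WriteLine(text);               // prints "<root>hello</root>"
            Console.WriteLine(text.IndexOf('\0')); // prints "-1": no stray null characters
        }
    }
}
```

The stray \0 characters in the question come from decoding the unused tail of the zero-initialised buffer, which this loop never touches.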
