Question marks in string due to character encoding in Request.Form - c#

I have a textarea where I type some unicode characters which become question marks by the time the string reaches the server.
On the input I typed the following:
Don’t “quote” me on that.
On the server I checked Request.Form["fieldID"] in Page_Load() and I saw:
"Don�t �quote� me on that."
I checked my web.config file and it says <globalization requestEncoding="utf-8" responseEncoding="utf-8" />. Anything else I should check to ensure UTF-8 is enabled?

Question marks like that generally show up when the bytes received are not valid in the encoding used to decode them.
You need to HTML encode your strings.

Check the encoding of the Page where the form is, and/or the accept-charset of the form.
I can replicate what you are seeing with ISO-8859-1 - e.g.
<form action="foo" method="post" accept-charset="ISO-8859-1">
....
</form>
In VS watch window:
Inspecting Request.Form (before accessing the key itself):
message=Don%ufffdt+%ufffdquote%ufffd+me+on+that.
Inspecting Request.Form["message"] - accessing the collection keys which means ASP.Net has already automatically urldecoded:
"Don�t �quote� me on that."
It seems something is overriding your web.config settings on that specific page (?)
Hth...
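To see where the � can come from, here is a minimal C# sketch. It assumes, purely for illustration, that the right single quote arrived as the single Windows-1252 byte 0x92 and that the server decoded the body as UTF-8; the actual bytes on the wire would have to be checked with an HTTP debugging proxy. The class and method names are made up for this example.

```csharp
using System.Text;

public static class ReplacementCharDemo
{
    // Decodes raw request bytes as UTF-8, as requestEncoding="utf-8" asks for.
    public static string DecodeAsUtf8(byte[] raw)
    {
        return Encoding.UTF8.GetString(raw);
    }

    // "Don’t" with the apostrophe sent as the single byte 0x92 (hypothetical payload).
    public static byte[] SampleBytes()
    {
        return new byte[] { 0x44, 0x6F, 0x6E, 0x92, 0x74 };
    }
}
```

Calling `ReplacementCharDemo.DecodeAsUtf8(ReplacementCharDemo.SampleBytes())` gives `Don\uFFFDt`: 0x92 is not a valid UTF-8 sequence on its own, so the decoder substitutes U+FFFD, which is displayed as �.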

Once again I solved my own problem. It is quite simple. The short answer is to add the following before sending any response back to the client:
Response.ContentType = "text/html; charset=utf-8";
The long answer is that a "feature" called Cache Mode bypassed all other response handling by writing out a UTF-8 encoded file that is really just a cached response. Adding that line before it writes the file solved my problem.
if (cacheModeEnabled) {
    Response.ContentType = "text/html; charset=utf-8"; // WriteFile doesn't know the file encoding
    Response.WriteFile(Server.MapPath("CacheForm.aspx"), true);
    Response.End();
} else {
    // perform normal response here
}
Thanks for all the answers and comments. They definitely helped me solve this issue. Most notably, Fiddler2 let me see what the heck is really in the request and response.

Related

C# display XML from html POST

Got a problem here... If I put the XML file on the server, I can read it through a StreamReader, convert it to a variable, and get everything working in the MSSQL database.
However, I am required to send it through an HTML POST, and the code below doesn't work:
page.Response.ContentType = "text/xml";
StreamReader reader = new StreamReader(page.Request.InputStream);
inputString = reader.ReadToEnd();
deleteShip(inputString);
It seems to me that the code above doesn't receive the XML that my program POSTs, because the same deleteShip code works fine when I use an XML file on the server.
Is there a way to solve this problem? As long as I can send any string to deleteShip(string s) then I'm happy. The string will be in XML format though
Thanks for the help!
It would be useful to see how the XML is POSTed to your program. Typically, data is sent from an HTML form as name-value pairs in the HTTP request body when using the POST method. It's not clear from your question whether you're using an HTML form to POST the XML to your program and it's hard to tell what might be going wrong without more information.
From your code it looks like you're reading the entire HTTP request body, where you'd usually read the value of a request parameter, for example:
Request["XmlParameterName"]
Where XmlParameterName is the name of an HTML form input field.
Have you inspected the value of the inputString variable? Is it valid XML? Is it encoded correctly? Are any invalid characters like ampersands (&) escaped correctly?
Update your question with a bit more information if none of the things I mentioned are the problem.
OK, I got it fixed.
Here is the code.
System.IO.Stream stream = Request.InputStream;
int stringLength = Convert.ToInt32(stream.Length);
byte[] stringArray = new byte[stringLength];
// Copy the POST body into the buffer; without this call the array stays all zeros.
stream.Read(stringArray, 0, stringLength);
// ASCII turns every non-ASCII byte into '?'; use Encoding.UTF8 if the XML may contain such characters.
string inputString = System.Text.Encoding.ASCII.GetString(stringArray, 0, stringLength);
deleteShip(inputString);
This reads the POST body of my HTML request (which in this case is XML).
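As an alternative, the body can be read with a StreamReader and an explicit encoding. The following is only a sketch: the helper name `RequestBodyReader.ReadAll` is hypothetical, and it assumes the client sends the XML as UTF-8, which the thread does not confirm.

```csharp
using System.IO;
using System.Text;

public static class RequestBodyReader
{
    // Reads the whole stream and decodes it with the given encoding.
    // StreamReader keeps reading until the end of the stream, so a short
    // Read() cannot silently truncate the body.
    public static string ReadAll(Stream body, Encoding encoding)
    {
        using (var reader = new StreamReader(body, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

In a page this could be called as `deleteShip(RequestBodyReader.ReadAll(Request.InputStream, Encoding.UTF8));`. Note that disposing the reader also closes the underlying stream.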

C# - Korean Encoding

This might be different from other Korean encoding questions.
There is a site I have to scrape, and it's in Korean.
An example sentence from their site is this:
"개인정보보호를 위해 뒤로가기 버튼 대신 검색결과 화면 상단과 하단의 이전 버튼을 사용하시기 바랍니다."
I am using HttpWebRequest and HttpWebResponse to scrape the site.
This is how I retrieve the HTML:
-- partial code --
using (Stream data = resp.GetResponseStream())
{
    response.Append(new StreamReader(data, Encoding.GetEncoding(code), true).ReadToEnd());
}
Now my problem is that I am not getting the correct Korean characters. For my "code" variable, I'm going by the code pages listed on MSDN at http://msdn.microsoft.com/en-us/library/system.text.encoding.aspx (let me narrow it down).
Here are the Korean code pages:
51949, 50225, 20949, 20833, 10003, 949
But I am still not getting the correct Korean characters. What do you think the problem is?
It is very likely that the page is not in a specific Korean encoding, but one of the Unicode encodings.
Try Encoding.UTF8 or Encoding.Unicode (UTF-16) instead of the specific code pages. There are also Encoding.UTF7 and Encoding.UTF32, but they are not as common.
To be certain, examine the meta tags and headers for the content-type returned by the server.
Update (gleaned from comments):
Since the content-type header is EUC-KR, the corresponding codepage is 51949 and this is what you need to use to retrieve the page.
It was not clear that you are writing this out to a file - you need to use the same encoding when writing the file out, or convert the byte[] from the original to the output file encoding (using Encoding.Convert).
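A rough sketch of both steps, decoding EUC-KR and converting the bytes to another encoding with Encoding.Convert, might look like this. The class name `EucKrDemo` is made up, and the sketch assumes .NET 5 or later, where code page 51949 only becomes available after the code pages provider is registered (on .NET Framework the registration call should not be needed).

```csharp
using System.Text;

public static class EucKrDemo
{
    // Code page 51949 is EUC-KR. Registering the provider is an assumption
    // tied to .NET Core / .NET 5+, where legacy code pages are opt-in.
    public static Encoding GetEucKr()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(51949);
    }

    // Decodes EUC-KR bytes into a .NET string.
    public static string Decode(byte[] eucKrBytes)
    {
        return GetEucKr().GetString(eucKrBytes);
    }

    // Re-encodes EUC-KR bytes as UTF-8 bytes, e.g. before writing a UTF-8 file.
    public static byte[] ToUtf8(byte[] eucKrBytes)
    {
        return Encoding.Convert(GetEucKr(), Encoding.UTF8, eucKrBytes);
    }
}
```

The important point is that the bytes are decoded with the encoding the server declared, and that whatever encoding is used when writing the file is chosen deliberately rather than left to a default.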
I had the exact same issue and solved it with the code below:
Encoding.UTF8.GetString(DownloadData(URL));
This decodes the output of the WebClient GET request directly as UTF-8.

encoding when get page from net

I need to download a webpage, and I have the following code to determine the encoding:
System.IO.StreamReader sr=null;
try
{
    mFrm.InfoShotcut("Henter webside....");   // Danish for "Fetching web page"
    if(response.ContentEncoding!=null && response.ContentEncoding!="")
    {
        sr=new System.IO.StreamReader(srm,System.Text.Encoding.GetEncoding(response.ContentEncoding));
    }
    else
    {
        //System.Windows.Forms.MessageBox.Show();
        sr=new System.IO.StreamReader(srm,System.Text.Encoding.GetEncoding(response.CharacterSet));
    }
    if(sr!=null)
    {
        result=sr.ReadToEnd();
        if(response.CharacterSet!=GetCharatset(result))
        {
            // The charset found in the page differs from the header: download again with it
            System.Text.Encoding CorrectEncoding=System.Text.Encoding.GetEncoding(GetCharatset(result));
            HttpWebRequest client2=(HttpWebRequest)HttpWebRequest.Create(Helper.value1);
            HttpWebResponse response2=(HttpWebResponse)client2.GetResponse();
            System.IO.Stream srm2=response2.GetResponseStream();
            sr=new System.IO.StreamReader(srm2,CorrectEncoding);
            result=sr.ReadToEnd();
        }
    }
    mFrm.InfoShotcut("Henter webside......");
}
catch (Exception ex)
{
    // handle error
    MessageBox.Show( ex.Message );
}
And it had worked great, but now I have tried it with a site that states it uses
<META http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
but really is in UTF-8. How do I find that out, so I can save the file with the right encoding?
First off, the Content-Encoding header does not describe the character set being used. As the RFC says:
Content codings are primarily used to allow a document to be compressed or otherwise usefully transformed without losing the identity of its underlying media type and without loss of information.
The character set used is described in the Content-Type header. For example:
Content-Type: text/html; charset=UTF-8
Your code above that uses the Content-Encoding header will not correctly identify the character set. You have to look at the Content-Type header, find the semicolon if it's there, and then parse the charset parameter.
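The parsing described above can be sketched roughly as follows. `ContentTypeParser.GetCharset` is a hypothetical helper written for this answer, not a framework API, and it only handles the simple `type/subtype; name=value` shape.

```csharp
using System;

public static class ContentTypeParser
{
    // Returns the charset parameter of a Content-Type value,
    // or null when the header is empty or has no charset parameter.
    public static string GetCharset(string contentType)
    {
        if (string.IsNullOrEmpty(contentType))
        {
            return null;
        }

        // The first segment is the media type; parameters follow after semicolons.
        string[] parts = contentType.Split(';');
        for (int i = 1; i < parts.Length; i++)
        {
            string parameter = parts[i].Trim();
            if (parameter.StartsWith("charset=", StringComparison.OrdinalIgnoreCase))
            {
                // Strip optional quotes around the value.
                return parameter.Substring("charset=".Length).Trim().Trim('"', '\'');
            }
        }
        return null;
    }
}
```

For `text/html; charset=UTF-8` this returns `UTF-8`, and for a bare `text/html` it returns null, at which point the caller has to fall back to a META tag or a default.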
And, as you've discovered, it can also be described in an HTML META tag.
Or, there might not be a character set definition at all, in which case you have to default to something. My experience has been that defaulting to UTF-8 is a good choice. It's not 100% reliable, but it seems that sites that don't include the charset parameter with the Content-Type field usually default to UTF-8. I've also found that META tags, when they exist, are wrong almost half the time.
As L.B mentioned in his comment, it's possible to download the bytes and examine them to determine the encoding. That can be done with a surprising degree of accuracy, but it requires a lot of code.
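A much smaller check than full detection is to test whether the bytes are valid UTF-8 and only otherwise trust the declared charset. This is a sketch of that one heuristic, not a general detector; `EncodingSniffer` is a made-up name, and the fallback encoding is whatever the header or META tag claimed.

```csharp
using System.Text;

public static class EncodingSniffer
{
    // The second argument (throwOnInvalidBytes: true) makes the decoder throw
    // instead of silently emitting U+FFFD for invalid sequences.
    private static readonly Encoding StrictUtf8 = new UTF8Encoding(false, true);

    // Decodes as UTF-8 when the bytes are valid UTF-8; otherwise decodes
    // with the fallback encoding (for example the declared charset).
    public static string Decode(byte[] data, Encoding fallback)
    {
        try
        {
            return StrictUtf8.GetString(data);
        }
        catch (DecoderFallbackException)
        {
            return fallback.GetString(data);
        }
    }
}
```

Text in a single-byte encoding that contains accented characters is rarely valid UTF-8 by accident, which is why this check catches pages like the one in the question that declare iso-8859-1 but are really UTF-8. Pure ASCII content decodes the same way either way.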

c# with SOAP - problem with utf-8 encoding

I'm using automatic conversion from WSDL to C#, and everything works apart from encoding: whenever
I have native characters (like 'ł' or 'ó') I get '??' instead of them in string fields ('G????wny' instead of 'Główny'). How do I deal with it? The server sends the document with the correct encoding, with header .
EDIT: I noticed in Wireshark that packets sent FROM me have a BOM, but packets sent TO me don't have it. Maybe that's the root of the problem?
So maybe the following will help:
What I am sure I did is:
In the webservice PHP file, after connecting to the Mysql Database I call:
mysql_query("SET CHARSET utf8");
mysql_query("SET NAMES utf8 COLLATE utf8_polish_ci");
The second I did:
In the same PHP file,
I added utf8_encode to the service on the $POST_DATA variable:
$server->service(utf8_encode($POST_DATA));
in the class.nusoap_base.php I changed:
//var $soap_defencoding = 'ISO-8859-1';
var $soap_defencoding = 'UTF-8';
and also in the nusoap.php the same as above:
//var $soap_defencoding = 'ISO-8859-1';
var $soap_defencoding = 'UTF-8';
and in the nusoap.php file again:
var $decode_utf8 = true;
Now I can send and receive properly encoded data.
Hope this helps.
Regards,
The problem was on the server side, with the Content-Type parameter sent in the header (it was set to "text/xml"). It turns out that for UTF-8 it HAS TO be "text/xml; charset=utf-8"; other methods such as adding a BOM aren't correct (according to RFC 3023). More info here: http://annevankesteren.nl/2005/03/text-xml
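The symptom from the question can be reproduced in a few lines of C#. This sketch assumes the client fell back to US-ASCII, which is the default RFC 3023 gives for text/xml without a charset parameter; whether the generated proxy really did that is an assumption, and `AsciiFallbackDemo` is a made-up name.

```csharp
using System.Text;

public static class AsciiFallbackDemo
{
    // Encodes the text as UTF-8 (what the server actually sent) and decodes
    // those bytes as US-ASCII (what a client may assume for plain text/xml).
    public static string Utf8ReadAsAscii(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.ASCII.GetString(utf8Bytes);
    }
}
```

`AsciiFallbackDemo.Utf8ReadAsAscii("Główny")` returns `G????wny`: 'ł' and 'ó' each take two bytes in UTF-8, and every byte above 0x7F becomes one '?', which matches the two question marks per character reported above.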

C# Writing to the output stream

This code will always make my aspx page load twice. And this has nothing to do with AutoEventWireup.
Response.Clear();
Response.ContentType = "application/pdf";
Response.AppendHeader("Content-Disposition", "inline;filename=data.pdf");
Response.BufferOutput = true;
byte[] response = GetDocument(doclocation);
Response.AddHeader("Content-Length", response.Length.ToString());
Response.BinaryWrite(response);
Response.End();
This code will only make my page load once (as it should) when I hardcode some dummy values.
Response.Clear();
Response.ContentType = "application/pdf";
Response.AppendHeader("Content-Disposition", "inline;filename=data.pdf");
Response.BufferOutput = true;
byte[] response = new byte[] {10,11,12,13};
Response.AddHeader("Content-Length", response.Length.ToString());
Response.BinaryWrite(response);
Response.End();
I have also increased the request length for good measure in the web.config file.
<httpRuntime executionTimeout="180" maxRequestLength="400000"/>
Still nothing. Anyone see something I don't?
GetDocument(doclocation);
Maybe this method somehow returns a redirect code? Or maybe there is an iframe or img pointing at your dynamic content?
If so:
In general the control could get called twice because of the URL response. First it renders the content. After that your browser tries to download the source of the tag (iframe, img), which is actually dynamically generated content, so it makes another request to the web server. In that case another page object is created, which has a different viewstate because it is a different request.
Have you found a resolution to this yet? I'm having the same issue, and my code is pretty much a mirror of yours. The main difference is that my PDF is hosted in an IFrame.
So interesting clues I have found:
If I stream back a Word.doc it only gets loaded once, if pdf it gets loaded twice. Also, I have seen different behavior from different client desktops. I am thinking that Adobe version may have something to do with it.
Update:
In my case I was setting the HttpCacheability to NoCache. In verifying this, any of the non client cache options would cause the double download of the pdf. Only not setting it at all (defaults to Private) or explicitly setting it to Private or Public would fix the issue, all other settings duplicated the double load of the document.
Quick Guess: Could it be that at this stage in the page life cycle, the class that contains GetDocument() has already gone through garbage collection? The ASP.NET Worker process then needs to reload the page in order to read that method again?
Have you tried it in the Page_Load ? and why is GetDocument a static method?
