Encoding array of bytes from string (Polish fonts) - c#

I can't get text encoding to work for my language (Polish).
When I write żółw it works like a charm, but when I write ślimak there is no ś in my array.
I also tried UTF-8, but with no success.
Here is the encoding with code page 1250. It works with ż, ó, ł, but not with ą, ź....
byte[] buffer = Encoding.GetEncoding(1250).GetBytes(postdata);
The code above is used to communicate with a web server, so I think the problem occurs before the communication.
I also tried:
byte[] buffer = Encoding.GetEncoding(28592).GetBytes(postdata); //iso-8859-2 Central European (ISO)
Solved: iso-8859-2 Central European (ISO) was the correct answer. (I had been running an old exe of the project.)

You should not expect there to be a literal ś in the array; the character has to be encoded, and the encoded value is different. I would advise using UTF-8 here, in which case you should expect 0xC5 0x9B in the output, as that is the UTF-8 encoding of ś.
If you use 28592, then 0xB6 is the encoded form, and it round-trips successfully.
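The byte values mentioned above can be checked with a minimal sketch like the following. The class and helper names (`PolishEncodingDemo`, `Hex`, `HexForCodePage`) are hypothetical, and the `RegisterProvider` call is an assumption about the runtime: it is needed on .NET Core 3.0+ / .NET 5+ (older .NET Core versions need the System.Text.Encoding.CodePages package), while classic .NET Framework exposes these code pages without it.

```csharp
using System;
using System.Text;

static class PolishEncodingDemo
{
    static PolishEncodingDemo()
    {
        // Assumption: running on .NET Core 3.0+ / .NET 5+, where legacy code pages
        // such as 1250 and 28592 must be registered before GetEncoding can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Hypothetical helper: hex dump of the bytes a string produces in an encoding.
    public static string Hex(Encoding encoding, string text) =>
        BitConverter.ToString(encoding.GetBytes(text));

    // Hypothetical helper: same as Hex, but looks the encoding up by code page number.
    public static string HexForCodePage(int codePage, string text) =>
        Hex(Encoding.GetEncoding(codePage), text);

    static void Main()
    {
        const string s = "\u015B"; // the letter ś, escaped so the source file encoding does not matter

        Console.WriteLine(Hex(Encoding.UTF8, s));    // C5-9B
        Console.WriteLine(HexForCodePage(28592, s)); // B6
        Console.WriteLine(HexForCodePage(1250, s));  // 9C

        // Round trip through iso-8859-2: the decoded text equals the original.
        Encoding iso = Encoding.GetEncoding(28592);
        string word = "\u015Blimak"; // ślimak
        Console.WriteLine(iso.GetString(iso.GetBytes(word)) == word); // True
    }
}
```

None of the byte arrays contains a "ś" as such; each encoding simply maps the character to different byte values, and the receiver has to decode with the same encoding.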


Related

SendGrid inbound parse nordic chars

Completely stuck on a problem related to the inbound parse webhook functionality offered by SendGrid: https://sendgrid.com/docs/for-developers/parsing-email/setting-up-the-inbound-parse-webhook/
First off everything is working just fine with retrieving the mail sent to my application endpoint. Using Request.Form I'm able to retrieve the data and work with it.
The problem is that we started noticing question marks instead of letters when receiving some mails (written in Swedish, using Å, Ä and Ö). This occurred both with plain-text mails and with mails that have an HTML body.
However, this only happens every now and then. After a lot of searching I found that when the mail is sent from e.g. Postbox or Outlook (or the like) and the application has its charset set to iso-8859-1, Å, Ä and Ö are replaced by question marks.
To replicate the error and be able to debug it, I set up an HTML page with a form using the iso-8859-1 encoding, sending a payload similar to the one seen in the link above (the default one). Since then I have tested a multitude of things trying to get it to work.
Right now I'm trying to re-encode the input, without success. The code I'm testing:
Encoding wind1252 = Encoding.GetEncoding(1252);
Encoding utf8 = Encoding.UTF8;
byte[] wind1252Bytes = wind1252.GetBytes(Request.Form["html"]);
byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
string utf8String = Encoding.UTF8.GetString(utf8Bytes);
This only results in utf8String containing the same "???" where Å, Ä and Ö should be. My guess is that this is because Request.Form["html"] returns a UTF-16 string of content that was already encoded in the wrong encoding, iso-8859-1.
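Why re-encoding an already decoded string cannot help is easier to see in isolation. The sketch below is not the SendGrid or ASP.NET Core pipeline; it only assumes, as the question guesses, that the bytes were decoded with a charset other than the one they were written in. The class and member names are hypothetical.

```csharp
using System;
using System.Text;

static class LossyDecodeDemo
{
    // The letters Å, Ä, Ö as iso-8859-1 bytes.
    public static readonly byte[] Latin1Bytes = { 0xC5, 0xC4, 0xD6 };

    // Decodes the same bytes with whichever charset name is passed in.
    public static string DecodeAs(string charsetName) =>
        Encoding.GetEncoding(charsetName).GetString(Latin1Bytes);

    static void Main()
    {
        string wrong = DecodeAs("utf-8");      // invalid UTF-8, so replacement characters appear
        string right = DecodeAs("iso-8859-1"); // the intended text

        Console.WriteLine(wrong.Contains("\uFFFD"));      // True
        Console.WriteLine(right == "\u00C5\u00C4\u00D6"); // True

        // Converting the damaged string again only re-encodes the replacement
        // characters; the original byte values are no longer present in it.
        byte[] reencoded = Encoding.UTF8.GetBytes(wrong);
        Console.WriteLine(Encoding.UTF8.GetString(reencoded) == wrong); // True
    }
}
```

If this is what happens here, the conversion has to be applied to the raw request bytes, before anything turns them into a string, using the charset the sender actually declared.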
The method for fetching the POST is as follows:
public async Task<InboundParseModel> FetchMail(IFormCollection form)
{
    // Copy the fields posted by the inbound parse webhook into the model.
    InboundParseModel _em = new InboundParseModel
    {
        To = form["to"].SingleOrDefault(),
        From = form["from"].SingleOrDefault(),
        Subject = form["subject"].SingleOrDefault(),
        Html = form["html"].SingleOrDefault(),
        Text = System.Net.WebUtility.HtmlEncode(form["text"].SingleOrDefault()),
        Envelope = form["envelope"].SingleOrDefault()
    };
    return _em; // the snippet as posted omitted the return statement
}
It is called from the method that receives the POST, as FetchMail(Request.Form);
Project info: ASP.NET Core 2.2, C#
So as stated earlier, I am completely stuck and don't really have any ideas on how to solve this. Any help would be much appreciated!

Converting windows-1252 encoding to UTF-8 in Silverlight

In my Silverlight application I am getting an XML file encoded with windows-1252.
My problem is that it won't display correctly until the windows-1252 string is converted to a UTF-8 string.
In a normal C# environment that wouldn't be much of a problem; there I could do something like this:
Encoding wind1252 = Encoding.GetEncoding(1252);
Encoding utf8 = Encoding.UTF8;
byte[] wind1252Bytes = ReadFile(Server.MapPath(HtmlFile));
byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
string utf8String = Encoding.UTF8.GetString(utf8Bytes);
(Convert a string's character encoding from windows-1252 to utf-8)
But Silverlight doesn't support windows-1252; it is Unicode only.
PS
I stumbled upon "Encoding for Silverlight" http://encoding4silverlight.codeplex.com/ - but it seems there is no support for windows-1252 there either?
EDIT:
I solved my problem on the server side, but the actual problem is still open.
Encoding for Silverlight is a third-party encoding library, but it currently supports only DBCS (double-byte character set) encodings, whereas windows-1252 is an SBCS (single-byte character set) encoding.
However, you can write your own encoder/decoder for Encoding for Silverlight; I think that would be very easy.
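As a rough idea of how small such a single-byte decoder can be, here is a minimal standalone sketch. It does not use the Encoding for Silverlight API (its plug-in interface is not shown in this thread); `Windows1252Decoder` and `Decode` are hypothetical names. The table covers bytes 0x80 to 0x9F, the only range where windows-1252 differs from iso-8859-1, and the five bytes windows-1252 leaves undefined are mapped to the control character with the same code point, which is an assumption of this sketch.

```csharp
using System;
using System.Text;

static class Windows1252Decoder
{
    // Unicode characters for bytes 0x80..0x9F in windows-1252.
    // 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined there; they are passed through unchanged.
    static readonly char[] HighTable =
    {
        '\u20AC', '\u0081', '\u201A', '\u0192', '\u201E', '\u2026', '\u2020', '\u2021',
        '\u02C6', '\u2030', '\u0160', '\u2039', '\u0152', '\u008D', '\u017D', '\u008F',
        '\u0090', '\u2018', '\u2019', '\u201C', '\u201D', '\u2022', '\u2013', '\u2014',
        '\u02DC', '\u2122', '\u0161', '\u203A', '\u0153', '\u009D', '\u017E', '\u0178'
    };

    // Every other byte maps to the Unicode code point with the same value.
    public static string Decode(byte[] bytes)
    {
        var sb = new StringBuilder(bytes.Length);
        foreach (byte b in bytes)
        {
            sb.Append(b >= 0x80 && b <= 0x9F ? HighTable[b - 0x80] : (char)b);
        }
        return sb.ToString();
    }

    static void Main()
    {
        // 0x80 is the euro sign, 0xE5 is å, 0x41 is A.
        string text = Decode(new byte[] { 0x80, 0xE5, 0x41 });
        Console.WriteLine(text == "\u20AC\u00E5A"); // True
    }
}
```

Once the bytes are decoded into a string this way there is no separate "convert to UTF-8" step left; the string is already Unicode.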

Detect Stream or Byte Array Encoding on Windows Phone

I'm trying to read XML files that I download with WebClient.OpenReadAsync() in a Windows Phone application. The problem is that sometimes the XML doesn't come in UTF-8; it might come in another encoding such as "ISO-8859-1", so the accented characters come out garbled.
I was able to load one of the ISO-8859-1 XML files perfectly using this code:
var buff = e.Result.ReadFully(); //Gets byte array from the stream
var resultBuff = Encoding.Convert(Encoding.GetEncoding("ISO-8859-1"), Encoding.UTF8, buff);
var result = Encoding.UTF8.GetString(resultBuff, 0, resultBuff.Length);
It works beautifully with ISO-8859-1 and the text comes out perfect, but it messes up the UTF-8 XML files.
So the idea is to detect the encoding of the byte array or the stream first; then, if it's not UTF-8, convert the data using the method above with the detected encoding.
I have searched the internet for a method that can detect the encoding, but I cannot find one!
Does anybody know how I could do this on Windows Phone?
Thanks!
You can look for the "Content-Type" value in the WebClient.ResponseHeaders property. If you are lucky, the server sets it to indicate the media type plus its encoding (e.g. "text/html; charset=ISO-8859-4").
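Pulling the charset out of such a header value is plain string handling. The sketch below is one way to do it, assuming the header text has already been read from the response; `ContentTypeCharset` and `CharsetOrDefault` are hypothetical names, and falling back to UTF-8 when no charset is present is an assumption, not something the server guarantees.

```csharp
using System;

static class ContentTypeCharset
{
    // Returns the charset parameter of a Content-Type value, or the fallback if there is none.
    public static string CharsetOrDefault(string contentType, string fallback = "utf-8")
    {
        if (string.IsNullOrEmpty(contentType))
        {
            return fallback;
        }

        // Parameters follow the media type, separated by semicolons.
        foreach (string part in contentType.Split(';'))
        {
            string p = part.Trim();
            if (p.StartsWith("charset=", StringComparison.OrdinalIgnoreCase))
            {
                // The value may be quoted, e.g. charset="utf-8".
                string value = p.Substring("charset=".Length).Trim().Trim('"');
                if (value.Length > 0)
                {
                    return value;
                }
            }
        }
        return fallback;
    }

    static void Main()
    {
        Console.WriteLine(CharsetOrDefault("text/html; charset=ISO-8859-4")); // ISO-8859-4
        Console.WriteLine(CharsetOrDefault("text/xml"));                      // utf-8
    }
}
```

The returned name can then be passed to Encoding.GetEncoding, provided the platform supports that encoding. When the header carries no charset, the encoding attribute in the XML declaration is another place to look.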

Encoding problem between C# TCP server and Java TCP Client

I'm facing an encoding issue for which I can't find the correct solution.
I have a C# TCP server, running as a Windows service, which receives and responds with XML. The problem arises when the output contains special characters, such as Spanish characters with accents (á, é, í and others).
The server response is encoded as UTF-8, and the Java client reads it as UTF-8. But when I print the output, the character is totally different.
This problem only happens in the Java client (the C# TCP client works as expected).
Following is a snippet of the server code that shows the encoding issue:
C# Server
byte[] destBytes = System.Text.Encoding.UTF8.GetBytes("á");
try
{
    clientStream.Write(destBytes, 0, destBytes.Length);
    clientStream.Flush();
}
catch (Exception ex)
{
    LogErrorMessage("Error en SendResponseToClient: Detalle::", ex);
}
Java Client:
socket.connect(new InetSocketAddress(param.getServerIp(), param.getPort()), 20000);
InputStream sockInp = socket.getInputStream();
InputStreamReader streamReader = new InputStreamReader(sockInp, Charset.forName("UTF-8"));
sockReader = new BufferedReader(streamReader);
String tmp = null;
while ((tmp = sockReader.readLine()) != null) {
    System.out.println(tmp);
}
For this simple test, the output shown is:
ß
I did some testing by printing the byte[] in each language. In C#, á is output as:
195, 161
In Java, the byte[] that was read prints as:
-61,-95
Does this have to do with the byte type being signed in Java and unsigned in C#?
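The two printouts can be compared by reinterpreting the C# bytes as signed values, which is how Java's byte type displays them. This is a minimal sketch with hypothetical names (`SignedByteDemo`, `AsSigned`); it only shows that the bit patterns are identical and does not establish where the wrong character comes from.

```csharp
using System;
using System.Linq;
using System.Text;

static class SignedByteDemo
{
    // Reinterprets each unsigned byte as a signed 8-bit value without changing any bits.
    public static sbyte[] AsSigned(byte[] bytes) =>
        bytes.Select(b => unchecked((sbyte)b)).ToArray();

    static void Main()
    {
        byte[] utf8 = Encoding.UTF8.GetBytes("\u00E1"); // the letter á

        Console.WriteLine(string.Join(", ", utf8));           // 195, 161
        Console.WriteLine(string.Join(", ", AsSigned(utf8))); // -61, -95
    }
}
```

Since 195 and -61 (and 161 and -95) are the same eight bits, the bytes that arrive on the Java side match what the server sent; only the way they are printed differs.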
Any feedback is greatly appreciated.
To me this seems like an endianness problem. You can check that by reversing the bytes in Java before printing the string.
This would usually be solved by including a BOM; see http://de.wikipedia.org/wiki/Byte_Order_Mark
Are you sure that's not a Unicode character you are attempting to encode to bytes as UTF-8 data?
I found that the question below has a useful way of testing whether the data in that string is correct UTF-8 before you send it.
How to test an application for correct encoding (e.g. UTF-8)

CSV encoding issues (Microsoft Excel)

I am dynamically creating CSV files using C#, and I am encountering some strange encoding issues. I currently use the ASCII encoding, which works fine in Excel 2010, which I use at home and on my work machine. However, the customer uses Excel 2007, and for them there are some strange formatting issues, namely that the '£' sign (UK pound sign) is preceded with an accented 'A' character.
What encoding should I use? The annoying thing is that I can hardly test these fixes as I don't have access to Excel 2007!
I'm using Windows ANSI codepage 1252 without any problems on Excel 2003. I explicitly changed to this because of the same issue you are seeing.
private const int WIN_1252_CP = 1252; // Windows ANSI codepage 1252
this._writer = new StreamWriter(fileName, false, Encoding.GetEncoding(WIN_1252_CP));
I've successfully used UTF8 encoding when writing CSV files intended to work with Excel.
The only problem I had was making sure to use the overload of the StreamWriter constructor that takes an encoding as a parameter. The default encoding of StreamWriter is documented as UTF8, but it is really UTF-8 without a byte order mark, and without a BOM Excel will mess up characters that use multiple bytes.
You need to add the preamble to the file:
var data = Encoding.UTF8.GetBytes(csv);
var result = Encoding.UTF8.GetPreamble().Concat(data).ToArray();
return File(new MemoryStream(result), "application/octet-stream", "file.csv");
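The StreamWriter approach from the earlier answer can be sketched as follows: passing a `UTF8Encoding` constructed with `true` makes the writer emit the BOM itself. `CsvBomDemo` and `WriteCsv` are hypothetical names, and the sketch writes to a MemoryStream only so the bytes can be inspected.

```csharp
using System;
using System.IO;
using System.Text;

static class CsvBomDemo
{
    // Writes the CSV text as UTF-8 with a byte order mark and returns the raw bytes.
    public static byte[] WriteCsv(string csv)
    {
        using (var stream = new MemoryStream())
        {
            // true = emit the UTF-8 identifier (BOM) at the start of the stream.
            using (var writer = new StreamWriter(stream, new UTF8Encoding(true)))
            {
                writer.Write(csv);
            }
            // MemoryStream.ToArray still works after the writer has closed the stream.
            return stream.ToArray();
        }
    }

    static void Main()
    {
        byte[] bytes = WriteCsv("\u00A3"); // the pound sign
        Console.WriteLine(BitConverter.ToString(bytes)); // EF-BB-BF-C2-A3
    }
}
```

The first three bytes are the BOM; the pound sign itself is 0xC2 0xA3. Read as windows-1252, those two bytes display as "Â£", which matches the accented A described in the question.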
