In my Silverlight application I am receiving an XML file encoded with windows-1252.
My problem is that it won't display correctly until the windows-1252 data is converted to a UTF-8 string.
In a normal C# environment that wouldn't be a big problem; there I could do something like this:
Encoding wind1252 = Encoding.GetEncoding(1252);
Encoding utf8 = Encoding.UTF8;
byte[] wind1252Bytes = ReadFile(Server.MapPath(HtmlFile));
byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
string utf8String = Encoding.UTF8.GetString(utf8Bytes);
(Convert a string's character encoding from windows-1252 to utf-8)
But Silverlight doesn't support windows-1252 - it is Unicode only.
PS
I stumbled upon "Encoding for Silverlight" http://encoding4silverlight.codeplex.com/ - but it seems there is no support for windows-1252 there either?
EDIT:
I solved my problem on the "Server Side" - the actual question is still open.
Encoding for Silverlight is a third-party encoding library, but at the moment it only supports DBCS (double-byte character sets), while windows-1252 is an SBCS (single-byte character set).
However, you can write your own encoder/decoder for Encoding for Silverlight; I think that would be very easy.
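For illustration, here is a minimal sketch of such a single-byte decoder (a standalone helper of my own, not part of the Encoding for Silverlight API). A single-byte code page is essentially a 256-entry lookup table: bytes 0x00-0x7F and 0xA0-0xFF map directly to the Unicode code point with the same value, so only the 0x80-0x9F range needs an explicit table.

public static class Windows1252Decoder
{
    // Unicode code points for bytes 0x80-0x9F in windows-1252;
    // U+FFFD marks the five unassigned bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D).
    private static readonly char[] High = new char[]
    {
        '\u20AC', '\uFFFD', '\u201A', '\u0192', '\u201E', '\u2026', '\u2020', '\u2021',
        '\u02C6', '\u2030', '\u0160', '\u2039', '\u0152', '\uFFFD', '\u017D', '\uFFFD',
        '\uFFFD', '\u2018', '\u2019', '\u201C', '\u201D', '\u2022', '\u2013', '\u2014',
        '\u02DC', '\u2122', '\u0161', '\u203A', '\u0153', '\uFFFD', '\u017E', '\u0178'
    };

    public static string GetString(byte[] bytes)
    {
        var chars = new char[bytes.Length];
        for (int i = 0; i < bytes.Length; i++)
        {
            byte b = bytes[i];
            // 0x00-0x7F and 0xA0-0xFF are identical to the corresponding Unicode code points.
            chars[i] = (b < 0x80 || b > 0x9F) ? (char)b : High[b - 0x80];
        }
        return new string(chars);
    }
}

The resulting string is ordinary UTF-16, so it can be handed straight to XDocument.Parse or any other Silverlight API.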
Related
I am trying to encode a string in Windows-1252 with a StreamWriter. The input string (dataString) is encoded in UTF-8.
StreamWriter sw = new StreamWriter(@"C:\Temp\data.txt", true, Encoding.GetEncoding(1252));
sw.Write(dataString);
sw.Close();
When I open the file in Notepad++ it is reported as an ANSI file, but I need a Windows-1252 encoded file.
Does anyone have an idea?
Your file is Windows-1252 encoded. A non-Unicode file contains no data to indicate how it is encoded; in this case ANSI just means "not Unicode". If you were to encode the file as Russian/Windows-1251 and open it in Notepad++, Notepad++ would report it as ANSI as well.
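As a quick sanity check (a rough sketch; the path and sample character are only placeholders), you can skip the editor entirely and inspect the raw bytes: Windows-1252 encodes 'é' as the single byte 0xE9, while UTF-8 would produce 0xC3 0xA9.

string path = @"C:\Temp\data.txt";
File.WriteAllText(path, "é", Encoding.GetEncoding(1252));

byte[] raw = File.ReadAllBytes(path);
// Windows-1252 stores 'é' as the single byte 0xE9 (UTF-8 would give C3 A9).
Console.WriteLine(BitConverter.ToString(raw));   // prints "E9"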
See Unicode, UTF, ASCII, ANSI format differences for more info.
I'm working with the ICQ protocol and I have found a problem with special letters (diacritics, for example). I read that ICQ uses a different encoding (CP-1251, if I remember correctly).
How can I decode the text with the correct encoding?
I've tried using the UTF8Encoding class, but without success.
I'm using the ICQ-sharp library.
private void ParseMessage (string uin, byte[] data)
{
    ushort capabilities_length = LittleEndianBitConverter.Big.ToUInt16 (data, 2);
    ushort msg_tlv_length = LittleEndianBitConverter.Big.ToUInt16 (data, 6 + capabilities_length);
    string message = Encoding.UTF8.GetString (data, 12 + capabilities_length, msg_tlv_length - 4);
    Debug.WriteLine(message);
}
If the contact is using the same client it's OK, but if not, incoming and outgoing messages with diacritics are just unreadable.
I've determined (using this -> https://stackoverflow.com/a/12853721/846232) that the text is in BigEndianUnicode encoding. But if the string contains no diacritics, that gives unreadable output (Chinese letters), whereas UTF-8 works fine on text without diacritics. I don't know how to decode it correctly in all cases.
If UTF-8 kind of works (i.e. it works for "english", or any US-ASCII characters), then you don't have UTF-16. Latin-1 (or Windows-1252, Microsoft's variant), or e.g. Windows-1251 or Windows-1250 are perfectly possible though, since in all of these the first part, containing the Latin letters without diacritics, is the same.
Decode like this:
var encoding = Encoding.GetEncoding("Windows-1250");
string message = encoding.GetString(data, 12 + capabilities_length, msg_tlv_length - 4);
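If you are not sure which of those single-byte code pages the other client uses, one rough way to narrow it down (a sketch; the candidate list is just the set mentioned above) is to decode the same payload with each of them and see which output reads correctly:

string[] candidates = { "Windows-1250", "Windows-1251", "Windows-1252", "ISO-8859-1" };
foreach (string name in candidates)
{
    // Decode the same message bytes with each candidate code page and compare by eye.
    string text = Encoding.GetEncoding(name).GetString(data, 12 + capabilities_length, msg_tlv_length - 4);
    Debug.WriteLine(name + ": " + text);
}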
I can't handle the encoding for my language (Polish).
When I write żółw it works like a charm, but when I write ślimak there is no ś in my array.
I also tried UTF-8, but with no results.
Here is the encoding with code page 1250. It works with ż, ó, ł, but not with ą, ź....
byte[] buffer = Encoding.GetEncoding(1250).GetBytes(postdata);
The code above is used to communicate with a web server, so I think the problem occurs before the communication.
I also tried:
byte[] buffer = Encoding.GetEncoding(28592).GetBytes(postdata); //iso-8859-2 Central European (ISO)
Solved: iso-8859-2 Central European (ISO) was the correct answer. (I was running an old exe of the project.)
You should not expect there to be a ś in the array; it needs to be encoded, and the encoded value is different. I would advise using UTF-8 here, in which case you should expect 0xC5 0x9B in the output, as that is the UTF-8 encoding of ś.
If you use 28592, then 0xB6 is the encoded form, and it round-trips successfully.
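As a quick check (a standalone sketch, not part of your posting code), you can print the encoded bytes of ś under both encodings:

// 'ś' is U+015B; the encoded bytes differ per encoding, but both round-trip.
byte[] utf8 = Encoding.UTF8.GetBytes("ś");                  // C5 9B
byte[] latin2 = Encoding.GetEncoding(28592).GetBytes("ś");  // B6
Console.WriteLine(BitConverter.ToString(utf8));             // "C5-9B"
Console.WriteLine(BitConverter.ToString(latin2));           // "B6"
Console.WriteLine(Encoding.GetEncoding(28592).GetString(latin2)); // "ś"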
I am dynamically creating CSV files using C#, and I am encountering some strange encoding issues. I currently use the ASCII encoding, which works fine in Excel 2010, which I use at home and on my work machine. However, the customer uses Excel 2007, and for them there are some strange formatting issues, namely that the '£' sign (UK pound sign) is preceded by an accented 'A' character.
What encoding should I use? The annoying thing is that I can hardly test these fixes as I don't have access to Excel 2007!
I'm using Windows ANSI codepage 1252 without any problems on Excel 2003. I explicitly changed to this because of the same issue you are seeing.
private const int WIN_1252_CP = 1252; // Windows ANSI codepage 1252
this._writer = new StreamWriter(fileName, false, Encoding.GetEncoding(WIN_1252_CP));
I've successfully used UTF8 encoding when writing CSV files intended to work with Excel.
The only problem I had was making sure to use the overload of the StreamWriter constructor that takes an encoding as a parameter. The default encoding of StreamWriter says it is UTF-8, but it's really UTF-8 without a byte order mark, and without a BOM Excel will mess up characters that use multiple bytes.
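For example (the file name and contents here are just placeholders), constructing the UTF8Encoding with encoderShouldEmitUTF8Identifier set to true makes StreamWriter emit the BOM (EF BB BF), which lets Excel detect the encoding:

// new UTF8Encoding(true) writes the byte order mark at the start of the file.
using (var writer = new StreamWriter("output.csv", false, new UTF8Encoding(true)))
{
    writer.WriteLine("Name,Price");
    writer.WriteLine("Widget,£9.99");
}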
You need to add the preamble (BOM) to the file:
var data = Encoding.UTF8.GetBytes(csv);
var result = Encoding.UTF8.GetPreamble().Concat(data).ToArray();
return File(new MemoryStream(result), "application/octet-stream", "file.csv");
I have a requirement to encode and decode Japanese characters. I tried this in Java and it worked fine with the "Cp939" encoding, but I am unable to find that encoding in .NET. The 932 encoding doesn't encode all the characters, so I need to find a way of implementing the 939 encoding in .NET.
Java Code :
convStr = new String(str8859_1.getBytes("Cp037"), "Cp939");
.NET :
bytesConverted = Encoding.Convert(Encoding.GetEncoding(37),
Encoding.GetEncoding(932), bytesConverted);
// This result is a junk of characters and is totally different
// from the expected output 'ニツポンバ'
convStr = Encoding.GetEncoding(1252).GetString(bytesConverted);
The encoded bytes are in encoding 932, so why are you using encoding 1252 when you convert them to a string?
The following should work:
bytesConverted = Encoding.Convert(Encoding.GetEncoding(37),
    Encoding.GetEncoding(932), bytesConverted);
// Decode with the same code page (932) that the bytes were converted to.
convStr = Encoding.GetEncoding(932).GetString(bytesConverted);
Is this an error or just how you typed it?
bytesConverted = Encoding.Convert(Encoding.GetEncoding(37),
Encoding.GetEncoding(932), bytesConverted);
should be:
bytesConverted = Encoding.Convert(Encoding.GetEncoding(37),
Encoding.GetEncoding(939), bytesConverted);
Surely?