I'm writing an application for Windows Mobile. I use a scanner and get a string encoded in ISO-8859-5. How do I convert the string to UTF-8?
Here is my code:
var str_source = "³¿±2";
Console.WriteLine(str_source);
Encoding iso = Encoding.GetEncoding("iso-8859-5");
Encoding utf8 = Encoding.UTF32;
byte[] utfBytes = utf8.GetBytes(str_source);
byte[] isoBytes = Encoding.Convert(utf8, iso, utfBytes);
var str_result = iso.GetString(isoBytes, 0, isoBytes.Length);
Console.WriteLine(str_result);
When dealing with encoding issues, you should never start your test code from string literals. Always start from bytes.
Encoding iso = Encoding.GetEncoding("iso-8859-5");
Encoding utf = Encoding.UTF8;
var isoBytes = new byte[] { 228, 232 }; // фш
// iso to utf8
var utfBytes = Encoding.Convert(iso, utf, isoBytes);
// utf8 to iso
var isoBytes2 = Encoding.Convert(utf, iso, utfBytes);
// get all strings (with the correct encoding)
// all 3 strings will contain фш
string s1 = iso.GetString(isoBytes);
string s2 = utf.GetString(utfBytes);
string s3 = iso.GetString(isoBytes2);
Edit: If you do want to start from string literals, you can use the code below to convert them from their in-memory encoding (Encoding.Unicode, i.e. UTF-16) to the encoding you expect the incoming text to have:
string stringLiteral = "фш";
Encoding.Convert(Encoding.Unicode, Encoding.GetEncoding("iso-8859-5"),
Encoding.Unicode.GetBytes(stringLiteral)); // { 228, 232 }
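A minimal sketch of the same idea wrapped in a small helper: the scanner output is treated as raw ISO-8859-5 bytes, decoded once into a .NET string, and re-encoded as UTF-8 bytes only where UTF-8 bytes are actually needed. The names Iso88595Demo, DecodeScan and ToUtf8 are hypothetical, and the code assumes .NET Core / .NET 5+, where ISO-8859-5 has to be registered through CodePagesEncodingProvider. On .NET Framework or the Compact Framework used on Windows Mobile, that provider does not exist, so the registration line would be dropped (assuming the platform supports the code page at all).

```csharp
using System.Text;

// Hypothetical helper: raw ISO-8859-5 scanner bytes in, string or UTF-8 bytes out.
public static class Iso88595Demo
{
    private static readonly Encoding Iso;

    static Iso88595Demo()
    {
        // Assumption: .NET Core / .NET 5+, where ISO-8859-5 is not built in.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Iso = Encoding.GetEncoding("iso-8859-5");
    }

    // Decode the raw bytes once; the result is an ordinary (UTF-16) .NET string.
    public static string DecodeScan(byte[] isoBytes) => Iso.GetString(isoBytes);

    // Only needed when the bytes themselves must be UTF-8 (file, socket, ...).
    public static byte[] ToUtf8(byte[] isoBytes) =>
        Encoding.Convert(Iso, Encoding.UTF8, isoBytes);
}
```

Fed the bytes { 228, 232 } from the answer, DecodeScan returns "фш" and ToUtf8 returns the four UTF-8 bytes D1 84 D1 88.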
Example:
"Ð—Ð°Ð¿Ð¾Ð»Ð½Ð¸ Ð¿Ñ€Ð¾Ñ„Ð¸Ð»ÑŒ"
I tried:
var latinString = "Ð—Ð°Ð¿Ð¾Ð»Ð½Ð¸ Ð¿Ñ€Ð¾Ñ„Ð¸Ð»ÑŒ";
Encoding latinEncoding = Encoding.GetEncoding("iso-8859-1");
Encoding utf8Encoding = Encoding.GetEncoding("WINDOWS-1252");
byte[] latinBytes = latinEncoding.GetBytes(latinString);
byte[] utf8Bytes = Encoding.Convert(latinEncoding, utf8Encoding, latinBytes);
var utf8String = Encoding.UTF8.GetString(utf8Bytes);
but it doesn't work:
�?аполни п�?о�?ил�?
This is Russian text. Please help.
It seems latinString is UTF-8 text that was decoded as Windows-1252. Let's convert it back to UTF-8:
// Uncomment in case of .NET Core or .NET 5+
// Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
var latinString = "Ð—Ð°Ð¿Ð¾Ð»Ð½Ð¸ Ð¿Ñ€Ð¾Ñ„Ð¸Ð»ÑŒ";
string result = Encoding.UTF8.GetString(
    Encoding.GetEncoding(1252).GetBytes(latinString));
// Let's have a look
Console.Write(result);
Outcome:
Заполни профиль
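The trick in this answer generalizes: when text has been decoded with the wrong encoding, encoding it again with that same wrong encoding recovers the original bytes, which can then be decoded with the right one. A minimal sketch with hypothetical names (MojibakeRepair, Redecode), assuming .NET Core / .NET 5+ for the provider registration. It only works if the wrong decode was lossless; Windows-1252 leaves a few byte values (such as 0x81 and 0x9D) undefined, so some UTF-8 sequences may not survive the round trip.

```csharp
using System.Text;

// Hypothetical helper for undoing a wrong decode ("mojibake").
public static class MojibakeRepair
{
    static MojibakeRepair()
    {
        // Assumption: .NET Core / .NET 5+, where Windows-1252 is not built in.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Re-encode the garbled string with the encoding it was wrongly decoded with
    // to get the original bytes back, then decode those bytes correctly.
    public static string Redecode(string garbled, Encoding wrong, Encoding right) =>
        right.GetString(wrong.GetBytes(garbled));

    // The case from the question: UTF-8 bytes that were read as Windows-1252.
    public static string FromUtf8ReadAs1252(string garbled) =>
        Redecode(garbled, Encoding.GetEncoding(1252), Encoding.UTF8);
}
```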
I'm having a problem sending Cyrillic (Russian) text instead of English to a server (Java Spring Boot, UTF-8). Frame examples are below. The English one works fine, but with Cyrillic the position of the terminating null octet is calculated incorrectly. I'm using websocket-csharp-net-stomp-client.
I have also tried converting the message string to UTF-8.
The one that works:
The one that does not work:
public static string SendMessage(string messageText, string chatID)
{
    Encoding utf16 = Encoding.GetEncoding("utf-16"); // also tried code page 1251 instead of utf-16
    Encoding utf8 = Encoding.UTF8;
    byte[] utf8Bytes = utf8.GetBytes(messageText);
    byte[] utf16Bytes = Encoding.Convert(utf8, utf16, utf8Bytes);
    string msg = utf16.GetString(utf16Bytes);
    StompMessageSerializer serializer = new StompMessageSerializer();
    var content = new MessageContent() { text = msg };
    var broad = new StompMessage("SEND", JsonConvert.SerializeObject(content));
    broad["token"] = $"{Global.AuthCompTokenFinal}";
    broad["contentType"] = "application/json";
    broad["destination"] = $"/app/send/{chatID}";
    var str = serializer.Serialize(broad);
    Console.WriteLine(str);
    Global.ws.Send(str);
    return str;
}
The content length is set here (library code):
internal StompMessage(string command, string body, Dictionary<string, string> headers)
{
    stompCommand = command;
    Body = body;
    nativeHeaders = headers;
    this["content-length"] = body.Length.ToString();
}
What am I missing here?
Here is an error example:
I just deleted this["content-length"] = body.Length.ToString(); and the conversion to UTF-8, and now it works fine. Wow.
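A likely explanation for why deleting that line helped: STOMP defines content-length as the number of octets in the frame body, while body.Length counts UTF-16 chars. Each Cyrillic letter takes two bytes in UTF-8, so the header under-reports the body and the receiver looks for the null octet too early. A sketch of a corrected calculation, written as a hypothetical helper (not part of the websocket-csharp-net-stomp-client API) and assuming the frame goes over the wire as UTF-8:

```csharp
using System.Globalization;
using System.Text;

// Hypothetical helper, not part of websocket-csharp-net-stomp-client.
public static class StompLength
{
    // STOMP's content-length counts octets of the encoded body, not UTF-16 chars,
    // so measure the UTF-8 encoding instead of using string.Length.
    public static string ContentLength(string body) =>
        Encoding.UTF8.GetByteCount(body).ToString(CultureInfo.InvariantCulture);
}
```

In the library constructor, this["content-length"] = StompLength.ContentLength(body); would replace the body.Length line. Leaving the header out also works because, without content-length, a STOMP receiver reads the body up to the first null octet, which is fine as long as the body itself contains no NUL characters.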
Here is my problem: I'm trying to encode the response of my web service with the following code:
public static string ConvertToUTF8(string Cadena)
{
    string mensajeex = Cadena;
    Encoding utf8 = Encoding.UTF8;
    Encoding unicode = Encoding.Unicode;
    // Convert the string into a byte array.
    byte[] unicodeBytes = unicode.GetBytes(mensajeex);
    // Perform the conversion from one encoding to the other.
    byte[] asciiBytes = Encoding.Convert(unicode, utf8, unicodeBytes);
    // Convert the new byte[] into a char[] and then into a string.
    char[] asciiChars = new char[utf8.GetCharCount(asciiBytes, 0, asciiBytes.Length)];
    utf8.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0);
    string Utf8string = new string(asciiChars);
    // Display the strings created before and after the conversion.
    Console.WriteLine("Original string: {0}", mensajeex);
    Console.WriteLine("Ascii converted string: {0}", Utf8string);
    return Utf8string;
}
And it actually works! But when I encode a string and then pass it through an exception as its Message property, like this:
throw new Exception(XMLHelper.ConvertToUTF8(Message));
the response comes out wrong, like:
El valor 'R' no es válido segú
(Spanish: "The value 'R' is not valid accordi", cut off mid-word.)
Any ideas? Thanks
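One thing worth checking: a .NET string is always UTF-16 in memory, so a string-in, string-out ConvertToUTF8 cannot change anything. Encoding to UTF-8 and decoding again returns the same string. The sketch below (hypothetical names) demonstrates that round trip. It suggests the garbled or truncated message comes from how the response bytes are produced or read, for example a charset mismatch between service and client; that part is an assumption, since the post does not show it.

```csharp
using System.Text;

// Hypothetical helper illustrating why a string-to-string "UTF-8 conversion" is a no-op.
public static class Utf8RoundTrip
{
    // This is the step where UTF-8 really matters: producing bytes for I/O.
    public static byte[] ToUtf8Bytes(string text) => Encoding.UTF8.GetBytes(text);

    // Decoding the bytes again yields the same UTF-16 string (for well-formed
    // text; lone surrogates would become U+FFFD), which is effectively what the
    // question's ConvertToUTF8 does.
    public static string ConvertToUtf8RoundTrip(string text) =>
        Encoding.UTF8.GetString(ToUtf8Bytes(text));
}
```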
I need to convert a .csv file from UTF-8 to Windows-1252 (Western European).
I have tried the example from the MSDN page and the following code, without success:
Encoding utf8 = Encoding.UTF8;
//Encoding utf8 = new UTF8Encoding();
Encoding win1252 = Encoding.GetEncoding(1252);
string src = today.ToString("dd-MM-yyyy") + "-ups.csv";
string source = File.ReadAllText(src);
byte[] input = source.ToUTF8ByteArray();
byte[] output = Encoding.Convert(utf8, win1252, input);
File.WriteAllText(src + "w1252", win1252.GetString(output));
with this extension method:
public static class StringHelper
{
    public static byte[] ToUTF8ByteArray(this string str)
    {
        Encoding encoding = new UTF8Encoding();
        return encoding.GetBytes(str);
    }
}
After this, the file still shows broken characters when opened as Windows-1252 and displays perfectly when opened as UTF-8, which confirms that the conversion didn't work.
Thanks!
Why not read the file in its original encoding (Encoding.UTF8) and write it in the target one (Encoding.GetEncoding(1252))?
string fileName = @"C:\MyFile.csv";
File.WriteAllText(fileName, File
    .ReadAllText(fileName, Encoding.UTF8), Encoding.GetEncoding(1252));
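Why the original attempt fails: win1252.GetString(output) turns the Windows-1252 bytes back into an ordinary .NET string, and File.WriteAllText(path, text) without an encoding argument writes UTF-8, so the file ends up as UTF-8 again. Below, the answer's approach as a small reusable sketch; the class and method names are hypothetical, and the RegisterProvider call assumes .NET Core / .NET 5+.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper: re-encode a UTF-8 text file as Windows-1252.
public static class CsvRecode
{
    static CsvRecode()
    {
        // Assumption: .NET Core / .NET 5+, where Windows-1252 is not built in.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Read with the file's real encoding, write with the target one.
    // Passing the encoding to WriteAllText is the important part; without it
    // the text would be written as UTF-8 again.
    public static void Utf8ToWindows1252(string sourcePath, string targetPath)
    {
        string text = File.ReadAllText(sourcePath, Encoding.UTF8);
        File.WriteAllText(targetPath, text, Encoding.GetEncoding(1252));
    }
}
```

Characters with no Windows-1252 equivalent (Cyrillic, for example) cannot be stored in the output file and will be replaced, so this is only safe for text that fits the code page.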
I have text in a variable, text, encoded in the default (UTF-16) encoding. I would like to change it to Windows-1250. I have:
public static string EncodeToWin1250(string text)
{
    Encoding unicode = Encoding.Unicode;
    Encoding win1250 = Encoding.GetEncoding(1250);
    byte[] unicodeBytes = unicode.GetBytes(text);
    byte[] win1250Bytes = Encoding.Convert(unicode, win1250, unicodeBytes);
    char[] win1250Chars = new char[win1250.GetCharCount(win1250Bytes, 0, win1250Bytes.Length)];
    win1250.GetChars(win1250Bytes, 0, win1250Bytes.Length, win1250Chars, 0);
    text = new string(win1250Chars);
    return text;
}
but so far it doesn't work.
How do I fix this problem?
I am returning the string as a file:
[...]
result = BLL.DataExchange.MoneyS3.MoneyS3Export.EncodeToWin1250(result);
Context.Response.Clear();
Context.Response.AddHeader("Content-Disposition", "attachment; filename=invoicesIssued.xml");
Context.Response.ContentType = "application/octet-stream";
Context.Response.BufferOutput = false;
Context.Response.Write(result);
Context.Response.Flush();
Context.Response.Close();
All strings are stored internally as Unicode in .NET.
You can convert a string to a byte stream using a code page, as your code does. But you can't change the internal representation of the string: it's Unicode (encoded as UTF-16), period.
You may dump your encoded byte stream to a file or wherever you want. But you can't change the internal encoding of .NET string objects.
Your function should return a byte[] (win1250Bytes, actually) instead of a string built from win1250Chars.
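A sketch of the reworked method, assuming .NET Core / .NET 5+ for the provider registration. On .NET Framework, which the System.Web code suggests, the static constructor's RegisterProvider line would be removed, since Windows-1250 is available there out of the box. The class name Win1250Export is hypothetical.

```csharp
using System.Text;

// Hypothetical container for the reworked method.
public static class Win1250Export
{
    private static readonly Encoding Win1250;

    static Win1250Export()
    {
        // Assumption: .NET Core / .NET 5+; remove this line on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Win1250 = Encoding.GetEncoding(1250);
    }

    // Return the encoded bytes themselves. Decoding them back into a string
    // (as the original did via GetChars) just yields a UTF-16 string again.
    public static byte[] EncodeToWin1250(string text) => Win1250.GetBytes(text);
}
```

In the handler, the bytes would then go out unchanged with Context.Response.BinaryWrite(bytes) instead of Context.Response.Write(result). Alternatively, keep the string and set Context.Response.ContentEncoding = Encoding.GetEncoding(1250) before writing, so the response does the encoding itself. Either way, if the XML starts with a declaration, its encoding attribute should say windows-1250.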