C#: converting a .csv file from UTF-8 to Windows-1252

I need to convert a .csv file from UTF-8 to Windows-1252 (Western European).
I have tried the example from the MSDN page and the following code, without success:
Encoding utf8 = Encoding.UTF8;
//Encoding utf8 = new UTF8Encoding();
Encoding win1252 = Encoding.GetEncoding(1252);
string src = today.ToString("dd-MM-yyyy") + "-ups.csv";
string source = File.ReadAllText(src);
byte[] input = source.ToUTF8ByteArray();
byte[] output = Encoding.Convert(utf8, win1252, input);
File.WriteAllText(src + "w1252", win1252.GetString(output));
with this extension method:
public static class StringHelper
{
public static byte[] ToUTF8ByteArray(this string str)
{
Encoding encoding = new UTF8Encoding();
return encoding.GetBytes(str);
}
}
After this, the file still shows broken characters when opened as Windows-1252 and reads perfectly when opened as UTF-8, which confirms that the conversion did not take effect.
Thanks!

Why not read in the initial encoding (Encoding.UTF8), and write in target one (Encoding.GetEncoding(1252)):
string fileName = @"C:\MyFile.csv";
File.WriteAllText(fileName, File
.ReadAllText(fileName, Encoding.UTF8), Encoding.GetEncoding(1252));
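The code in the question stays UTF-8 because File.WriteAllText without an encoding argument writes UTF-8: the Windows-1252 bytes are decoded back into a string and then re-encoded as UTF-8 on the way out. Below is a minimal sketch of the read-then-write approach, assuming a hypothetical helper named CsvTranscoder and separate source and destination paths. Characters that Windows-1252 cannot represent are replaced with '?' by the default encoder fallback.

```csharp
using System.IO;
using System.Text;

public static class CsvTranscoder // hypothetical helper, not from the original post
{
    public static void Utf8ToWin1252(string sourcePath, string destinationPath)
    {
        // Needed on .NET Core / .NET 5+, where code page 1252 is not available
        // by default; on .NET Framework this line can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Decode the file's bytes as UTF-8 into a .NET string.
        string text = File.ReadAllText(sourcePath, Encoding.UTF8);

        // Encode the string as Windows-1252 while writing it out.
        File.WriteAllText(destinationPath, text, Encoding.GetEncoding(1252));
    }
}
```

Writing to a different file than the source makes it easy to compare the two in a hex viewer: 'é' should be the two bytes C3 A9 in the UTF-8 file and the single byte E9 in the Windows-1252 file.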

Related

Websocket having Error: Frame must be terminated with a null octet while using Cyrillic instead of english

I am having a problem sending Cyrillic (Russian) letters instead of English ones to the server (Java Spring Boot, UTF-8). My example frames are below. The one with English text works fine, but for the Cyrillic one the position of the null octet is calculated incorrectly. I am using websocket-csharp-net-stomp-client.
I have also tried changing the encoding of the message string to UTF-8.
The one that works:
The one that does not work:
public static string SendMessage(string messageText, string chatID)
{
Encoding utf16 = Encoding.GetEncoding("utf-16"); //also tried encode by 1251 instead of utf-16
Encoding utf8 = Encoding.UTF8;
byte[] utf8Bytes = utf8.GetBytes(messageText);
byte[] utf16Bytes = Encoding.Convert(utf8, utf16, utf8Bytes);
string msg = utf16.GetString(utf16Bytes);
StompMessageSerializer serializer = new StompMessageSerializer();
var content = new MessageContent() { text = msg };
var broad = new StompMessage("SEND", JsonConvert.SerializeObject(content));
broad["token"] = $"{Global.AuthCompTokenFinal}";
broad["contentType"] = "application/json";
broad["destination"] = $"/app/send/{chatID}";
var str = serializer.Serialize(broad);
Console.WriteLine(str);
Global.ws.Send(str);
return str;
}
The content-length header is set here (library code):
internal StompMessage(string command, string body, Dictionary<string, string> headers)
{
stompCommand = command;
Body = body;
nativeHeaders = headers;
this["content-length"] = body.Length.ToString();
}
What am I missing here?
Here is an error example:
I just removed this["content-length"] = body.Length.ToString(); along with the conversion to UTF-8, and now it works fine.
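A likely explanation for why removing the header helps: STOMP's content-length counts the bytes (octets) of the body, whereas string.Length counts UTF-16 code units. For ASCII text the two numbers match. Cyrillic letters take two bytes each in UTF-8, so the header understates the body size and the server looks for the terminating null octet too early. A minimal sketch of computing the byte count instead, assuming the frame goes over the wire as UTF-8 (the helper name is hypothetical and not part of the library):

```csharp
using System.Text;

public static class StompContentLength // hypothetical helper, not part of the library
{
    // Number of bytes the body occupies once encoded as UTF-8.
    public static int Utf8ByteCount(string body)
    {
        return Encoding.UTF8.GetByteCount(body);
    }
}
```

For "hello" both string.Length and the byte count are 5; for "привет" string.Length is 6 but the UTF-8 byte count is 12.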

C# equivalent to parse cryptojs

I'm trying to write C# code that does the same thing as this CryptoJS code:
var hash = CryptoJS.HmacSHA512(msg, key);
var crypt = CryptoJS.enc.Utf8.parse(hash.toString());
var base64 = CryptoJS.enc.Base64.stringify(crypt);
My question is about the second statement, where the hash variable is converted to a string and then parsed.
Is there an equivalent in C#? Once parsed, how do you encode the result as UTF-8?
Thanks
I'm not 100% sure I understand exactly which piece you are looking for here. There is no such thing as a UTF-8 System.String in C#. However, when you write a string to a stream you can choose UTF-8 as the encoding of the bytes in the stream,
for example by passing that encoding to a StreamWriter:
using (StreamWriter writer = new StreamWriter(stream, Encoding.UTF8)) {
writer.Write(text);
}
My boss found the answer to this. The difference is that in C# you have to convert the hash bytes to a hexadecimal string before producing the Base64 string.
var encoder = new UTF8Encoding();
byte[] keyBytes = encoder.GetBytes(key);
var newlinemsg = action + "\n" + msg;
byte[] messageBytes = encoder.GetBytes(newlinemsg);
byte[] hashBytes = new HMACSHA512(keyBytes).ComputeHash(messageBytes);
var hexString = ToHexString(hashBytes);
var base64 = Convert.ToBase64String(encoder.GetBytes(hexString));
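The ToHexString helper is not shown in the answer. A minimal sketch of what it might look like, assuming lowercase output to match what CryptoJS's hash.toString() produces by default (the class name HexHelper is hypothetical):

```csharp
using System.Text;

public static class HexHelper // hypothetical home for the ToHexString helper used above
{
    public static string ToHexString(byte[] bytes)
    {
        var sb = new StringBuilder(bytes.Length * 2);
        foreach (byte b in bytes)
        {
            // Two lowercase hex digits per byte, e.g. 0x0F becomes "0f".
            sb.Append(b.ToString("x2"));
        }
        return sb.ToString();
    }
}
```

The case matters here because the hex string itself is what gets UTF-8 encoded and then Base64 encoded; uppercase digits would yield a different Base64 result.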

convert a string from ISO-8859-5 to UTF8

I'm writing an application for Windows Mobile. I use a scanner and get a string encoded in ISO-8859-5. How do I convert the string to UTF-8?
Here is my code:
var str_source = "³¿±2";
Console.WriteLine(str_source);
Encoding iso = Encoding.GetEncoding("iso-8859-5");
Encoding utf8 = Encoding.UTF32;
byte[] utfBytes = utf8.GetBytes(str_source);
byte[] isoBytes = Encoding.Convert(utf8, iso, utfBytes);
var str_result = iso.GetString(isoBytes, 0, isoBytes.Length);
Console.WriteLine(str_result);
You should never start off your testing code with using string literals when dealing with encoding issues. Always use bytes to start with.
Encoding iso = Encoding.GetEncoding("iso-8859-5");
Encoding utf = Encoding.UTF8;
var isoBytes = new byte[] { 228, 232 }; // фш
// iso to utf8
var utfBytes = Encoding.Convert(iso, utf, isoBytes);
// utf8 to iso
var isoBytes2 = Encoding.Convert(utf, iso, utfBytes);
// get all strings (with the correct encoding)
// all 3 strings will contain фш
string s1 = iso.GetString(isoBytes);
string s2 = utf.GetString(utfBytes);
string s3 = iso.GetString(isoBytes2);
Edit: If you do want to use string literals to get you started, then you can use the code below to change their encoding (Encoding.Unicode) to the expected 'incoming text' encoding:
string stringLiteral = "фш";
Encoding.Convert(Encoding.Unicode, Encoding.GetEncoding("iso-8859-5"),
Encoding.Unicode.GetBytes(stringLiteral)); // { 228, 232 }

Convert UTF-16 text to another encoding (Windows-1250)

I have a text in a variable, text, encoded in the default (UTF-16) encoding. I would like to change it to Windows-1250. I have:
public static string EncodeToWin1250(string text)
{
Encoding unicode = Encoding.Unicode;
Encoding win1250 = Encoding.GetEncoding(1250);
byte[] unicodeBytes = unicode.GetBytes(text);
byte[] win1250Bytes = Encoding.Convert(unicode, win1250, unicodeBytes);
char[] win1250Chars = new char[win1250.GetCharCount(win1250Bytes, 0, win1250Bytes.Length)];
win1250.GetChars(win1250Bytes, 0, win1250Bytes.Length, win1250Chars, 0);
text = new string(win1250Chars);
return text;
}
but so far it doesn't work.
How do I fix this problem?
I am returning the string as a file:
[...]
result = BLL.DataExchange.MoneyS3.MoneyS3Export.EncodeToWin1250(result);
Context.Response.Clear();
Context.Response.AddHeader("Content-Disposition", "attachment; filename=invoicesIssued.xml");
Context.Response.ContentType = "application/octet-stream";
Context.Response.BufferOutput = false;
Context.Response.Write(result);
Context.Response.Flush();
Context.Response.Close();
All strings are stored internally as Unicode in .NET.
You can convert a string to a byte stream using a code page, as your code does. But you can't change the internal representation of the string: it's Unicode (encoded as UTF-16), period.
You may dump your encoded byte stream to a file or wherever you want. But you can't change the internal encoding of .NET string objects.
Your function should return a byte[] instead of a string (win1250Bytes, actually).
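A minimal sketch of that suggestion, assuming a hypothetical class name Win1250Export: the function returns the encoded bytes and never turns them back into a string.

```csharp
using System.Text;

public static class Win1250Export // hypothetical helper name
{
    public static byte[] EncodeToWin1250(string text)
    {
        // Needed on .NET Core / .NET 5+, where code page 1250 is not available
        // by default; on .NET Framework this line can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Encode the UTF-16 string directly into Windows-1250 bytes.
        return Encoding.GetEncoding(1250).GetBytes(text);
    }
}
```

In the ASP.NET code from the question, the bytes would then presumably be sent with Context.Response.BinaryWrite(bytes) rather than Context.Response.Write(result), since Write re-encodes the string using the response's own encoding.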

Convert a string's character encoding from windows-1252 to utf-8

I converted a Word document (docx) to HTML, and the converted HTML has windows-1252 as its character encoding. In .NET, with this 1252 encoding, all the special characters are displayed as '�'. The HTML is shown in a Rad Editor, which displays it correctly if the HTML is in UTF-8.
I tried the following code, but in vain:
Encoding wind1252 = Encoding.GetEncoding(1252);
Encoding utf8 = Encoding.UTF8;
byte[] wind1252Bytes = wind1252.GetBytes(strHtml);
byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
char[] utf8Chars = new char[utf8.GetCharCount(utf8Bytes, 0, utf8Bytes.Length)];
utf8.GetChars(utf8Bytes, 0, utf8Bytes.Length, utf8Chars, 0);
string utf8String = new string(utf8Chars);
Any suggestions on how to convert the html into UTF-8?
This should do it:
Encoding wind1252 = Encoding.GetEncoding(1252);
Encoding utf8 = Encoding.UTF8;
byte[] wind1252Bytes = wind1252.GetBytes(strHtml);
byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
string utf8String = Encoding.UTF8.GetString(utf8Bytes);
Actually, the problem lies here:
byte[] wind1252Bytes = wind1252.GetBytes(strHtml);
We should not get the bytes from the HTML string; they should come straight from the file. I tried the code below and it worked.
Encoding wind1252 = Encoding.GetEncoding(1252);
Encoding utf8 = Encoding.UTF8;
byte[] wind1252Bytes = ReadFile(Server.MapPath(HtmlFile));
byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
string utf8String = Encoding.UTF8.GetString(utf8Bytes);
public static byte[] ReadFile(string filePath)
{
byte[] buffer;
FileStream fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read);
try
{
int length = (int)fileStream.Length; // get file length
buffer = new byte[length]; // create buffer
int count; // actual number of bytes read
int sum = 0; // total number of bytes read
// read until Read method returns 0 (end of the stream has been reached)
while ((count = fileStream.Read(buffer, sum, length - sum)) > 0)
sum += count; // sum is a buffer offset for next reading
}
finally
{
fileStream.Close();
}
return buffer;
}
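A shorter route to the same result, sketched under the assumption that the HTML file on disk really is Windows-1252: File.ReadAllText can decode the bytes directly when given the encoding. The resulting string is neither "1252" nor "UTF-8"; UTF-8 only comes into play when the string is turned back into bytes, for example when it is written to a response. The helper name HtmlLoader is hypothetical.

```csharp
using System.IO;
using System.Text;

public static class HtmlLoader // hypothetical helper name
{
    public static string ReadWin1252(string path)
    {
        // Needed on .NET Core / .NET 5+, where code page 1252 is not available
        // by default; on .NET Framework this line can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Decode the file's bytes as Windows-1252 into a .NET string.
        return File.ReadAllText(path, Encoding.GetEncoding(1252));
    }
}
```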
How are you planning to use the resulting HTML? In my opinion, the most appropriate way to solve your problem would be to add a meta tag that specifies the encoding. Something like:
<meta http-equiv="content-type" content="text/html;charset=UTF-8" />
Use the Encoding.Convert method. Details are in its MSDN article.
