Good day!
I convert a binary file into a char array:
var bytes = File.ReadAllBytes(@"file.wav");
char[] outArr = new char[(int)(Math.Ceiling((double)bytes.Length / 3) * 4)];
var result = Convert.ToBase64CharArray(bytes, 0, bytes.Length, outArr, 0, Base64FormattingOptions.None);
string resStr = new string(outArr);
So, is it little endian?
And does it convert to UTF-8?
Thank you!
You don't have any UTF-8 here - and UTF-8 doesn't have an endianness anyway, as its code unit size is just a single byte.
Your code would be simpler as:
var bytes = File.ReadAllBytes(@"file.wav");
string base64 = Convert.ToBase64String(bytes);
If you then write the string to a file, that would have an encoding, which could easily be UTF-8 (and will be by default), but again there's no endianness to worry about.
Note that as base64 text is always in ASCII, each character within a base64 string will take up a single byte in UTF-8 anyway. Even if UTF-8 did have different representations for multi-byte values, it wouldn't be an issue here.
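To illustrate the point about base64 text being pure ASCII, here is a minimal sketch (written with C# 9+ top-level statements). The sample bytes are a made-up stand-in for the contents of file.wav, not the real file.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical sample bytes standing in for the contents of file.wav.
byte[] bytes = { 0x52, 0x49, 0x46, 0x46, 0x00, 0xFF };

string base64 = Convert.ToBase64String(bytes);

// Encode the base64 text both ways.
byte[] utf8 = Encoding.UTF8.GetBytes(base64);
byte[] ascii = Encoding.ASCII.GetBytes(base64);

Console.WriteLine(base64);
// One byte per character: the UTF-8 length equals the string length.
Console.WriteLine(utf8.Length == base64.Length);
// The UTF-8 bytes and the ASCII bytes are identical.
Console.WriteLine(utf8.SequenceEqual(ascii));
```

Because every base64 character is in the ASCII range, there is no multi-byte sequence anywhere in the output, so byte order never comes into play.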
A C# char represents a UTF-16 code unit. So there is no UTF-8 here.
On the little-endian platforms where .NET usually runs, and since char is two bytes wide, the char array and the string are both stored in memory in little-endian byte order.
If you want to convert your byte array to base64 and then encode as UTF-8 do it like this:
byte[] base64utf8 = Encoding.UTF8.GetBytes(Convert.ToBase64String(bytes));
If you wish to save the base64 text to a file, encoded as UTF-8, you could do that like so:
File.WriteAllText(filename, Convert.ToBase64String(bytes), Encoding.UTF8);
Since UTF-8 is a byte oriented encoding, endianness is not an issue.
I am converting my string to a byte array with ASCII encoding, using the code below.
String data = "<?xml version=\"1.0\" encoding=\"utf-8\"?><ns0:ReceivedPayment Amount=\"1.01\"/>";
byte[] buffer = Encoding.ASCII.GetBytes(data);
The problem I am facing is that it adds a "?" to my string.
Now if I convert my byte array back to a string
var str = System.Text.Encoding.Default.GetString(buffer);
my string becomes
string str = "?<?xml version=\"1.0\" encoding=\"utf-8\"?><ns0:ReceivedPayment Amount=\"1.01\"/>"
Does anyone know why it adds a "?" to my string, and how to remove it?
It seems that you showed only simplified code. Am I right that you read the data from a file? If so, check for a BOM (byte order mark) at the beginning of the file. It is used with the UTF-8, UTF-16 and UTF-32 encodings.
There are several things wrong here. One is not showing the relevant code.
Nonetheless, if you use valid methods to read text from a UTF-8, UTF-32, etc file, you won't have a BOM in your string because the string will hold the text and the BOM is not part of the text.
On the other hand, if you are reading an XML file, it is not a "text" file. You should use an XML reader. That would take care to use the encoding that is (most likely) indicated in the file.
And, when you write an XML file (which I presume you'll be doing with the byte array), you should use an XML writer. That would take care to use the encoding you specify and write it into the file.
Keep in mind, though, that conversion from Unicode (for which UTF-8 is one encoding) to some other character set can silently corrupt your data with a replacement character (typically '?') for those that are not in the target character set.
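As a rough illustration of how a stray BOM turns into a "?", here is a small sketch (C# 9+ top-level statements). It assumes the BOM character U+FEFF has leaked into the string, which the question does not confirm; the shortened XML literal is only for demonstration.

```csharp
using System;
using System.Text;

// Assumption: the string starts with the BOM character U+FEFF.
string data = "\uFEFF<?xml version=\"1.0\" encoding=\"utf-8\"?>";

// ASCII cannot represent U+FEFF, so the encoder substitutes '?' (0x3F).
byte[] buffer = Encoding.ASCII.GetBytes(data);
Console.WriteLine(buffer[0] == (byte)'?');

// Stripping a leading BOM before converting avoids the stray '?'.
string cleaned = data.TrimStart('\uFEFF');
byte[] cleanBuffer = Encoding.ASCII.GetBytes(cleaned);
Console.WriteLine((char)cleanBuffer[0]);
```

Trimming the character is a workaround. Reading the file with a reader that understands the encoding, as suggested above, keeps the BOM out of the string in the first place.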
Here is my extension method:
public static byte[] ToByteArray(this string str)
{
var bytes = new byte[str.Length * sizeof(char)];
Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
return bytes;
}
I'm trying to convert some strings from UTF 16 LE to UTF 16 BE but it fails to encode the second Chinese character.
Sample string: test馨俞
Code:
byte[] bytes = Encoding.Unicode.GetBytes(sendMsg.Text);
sendMsg.Text = Encoding.BigEndianUnicode.GetString(bytes);
I've also tried
var encode = new UnicodeEncoding(false, true, true);
var messageAsBytes = encode.GetBytes(sendMsg.Text);
var enc = new UnicodeEncoding(true, true, true);
sendMsg.Text = enc.GetString(messageAsBytes);
Which results in the following error: Unable to translate bytes [DE][4F] at index 184 from specified code page to Unicode on the line:
sendMsg.Text = enc.GetString(messageAsBytes);
Thanks.
I think you should process your input string with the BigEndianUnicode class.
I made this code from the one you provided. It works fine, without error:
String input = "馨俞";
var messageAsBytes = Encoding.BigEndianUnicode.GetBytes(input);
input = Encoding.BigEndianUnicode.GetString(messageAsBytes);
If I process "input" with Encoding.Unicode and print out both byte arrays (the one produced with Unicode and the one produced with big endian), it shows the differences.
So, input is converted to the byte order you need.
The result of encoding a string is a byte array, not another string.
Just use
byte[] bytes = Encoding.BigEndianUnicode.GetBytes(sendMsg.Text);
to encode the string to bytes using the UTF 16 BE encoding.
Then send those bytes to the mainframe.
How you send those bytes to the mainframe may be the topic of another question, but it sounds like you somehow need to present those encoded bytes in a variable of type string. That sounds like a bug in the library you are using. We would need to understand the nature of that library and its possible bug to find a workaround. One option you could try, but it's a shot in the dark, is this:
string toSend = Encoding.Default.GetString(bytes);
That will produce a string where each character is the representation of one byte from the encoded string, in UTF 16 BE order. Its length will be double the length of the original string.
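The difference between the two byte orders, and why decoding little-endian bytes as big-endian can fail, can be sketched as follows (C# 9+ top-level statements). The character U+4FDE is used because its little-endian bytes match the [DE][4F] pair in the error message from the question; that it is the character at fault in the original message is an assumption.

```csharp
using System;
using System.Text;

string text = "\u4FDE";

// Same character, two byte orders.
byte[] le = Encoding.Unicode.GetBytes(text);
byte[] be = Encoding.BigEndianUnicode.GetBytes(text);
Console.WriteLine(BitConverter.ToString(le)); // DE-4F
Console.WriteLine(BitConverter.ToString(be)); // 4F-DE

// Reading the little-endian bytes as big-endian yields the code unit 0xDE4F,
// which is a lone low surrogate and therefore invalid UTF-16.
var strictBigEndian = new UnicodeEncoding(
    bigEndian: true, byteOrderMark: true, throwOnInvalidBytes: true);

bool threw = false;
try
{
    strictBigEndian.GetString(le);
}
catch (DecoderFallbackException ex)
{
    threw = true;
    Console.WriteLine(ex.Message);
}
```

Encoding once with Encoding.BigEndianUnicode.GetBytes, as recommended above, avoids the decode step entirely.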
I got it working by setting this property without any conversion.
sendMsg.SetIntProperty(XMSC.JMS_IBM_CHARACTER_SET, 1201);
var test = "sdfsdfsdfasfwerqwer";
var q = UTF8Encoding.UTF8.GetBytes(test);
var sha256 = SHA256.Create();
var hash = sha256.ComputeHash(q);
var z = UTF8Encoding.UTF8.GetString(hash);
var t = UTF8Encoding.UTF8.GetBytes(z);
In the above example, hash and t have different values. Why is this?
hash is not a UTF-8 encoded byte array, just arbitrary bytes. Note: not all byte arrays are valid UTF-8, since UTF-8 has its own rules. Therefore, it cannot necessarily be decoded into a string. (Specifically, invalid bytes are decoded by default into the replacement character U+FFFD in .NET, which is often displayed as a question mark, so the original bytes are lost.)
You can try a regular 8-bit encoding which supports all possible byte arrays, like ISO-8859-1. Of course you will still get garbage when you try to read that as a string, but it should work back and forth.
If you are trying to transfer an arbitrary byte array as a string, I suggest you use Base64 encoding, which converts byte arrays to an ASCII string and should be safe in all circumstances.
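A minimal sketch comparing the three round trips discussed here, using the input string from the question (C# 9+ top-level statements). The variable names are illustrative only.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

byte[] hash;
using (var sha256 = SHA256.Create())
{
    hash = sha256.ComputeHash(Encoding.UTF8.GetBytes("sdfsdfsdfasfwerqwer"));
}

// Lossy: bytes that are not valid UTF-8 are replaced while decoding.
byte[] viaUtf8 = Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(hash));

// Lossless: ISO-8859-1 maps every byte value to exactly one character.
Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
byte[] viaLatin1 = latin1.GetBytes(latin1.GetString(hash));

// Lossless and printable: Base64 produces plain ASCII text.
string base64 = Convert.ToBase64String(hash);
byte[] viaBase64 = Convert.FromBase64String(base64);

// The question reports that the UTF-8 round trip does not match the hash.
Console.WriteLine(hash.SequenceEqual(viaUtf8));
Console.WriteLine(hash.SequenceEqual(viaLatin1));
Console.WriteLine(hash.SequenceEqual(viaBase64));
```

Of the two lossless options, Base64 is usually the better fit for transport because the resulting string contains only printable ASCII characters.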
[return: System.Xml.Serialization.XmlElementAttribute("return", DataType="base64Binary")]
public byte[] get(...)
I am trying to get an XML document (UTF-8) from this web service. I have tried multiple things to get the XML out of the byte array, such as:
stream
encoding
decoder
converter
[Extra info]
When decoding the byte array with Encoding.UTF8.GetString(bytes)
I get a string with strange signs and symbols but also with some text
Starting with: %PDF-1.4
[SOLUTION]
Writing the byte array to a pdf file makes it readable.
I think the web service provides a byte stream that is simply base64-encoded data represented as integers instead of chars. I believe that the base64 chars are a subset of ASCII, so you need to convert the byte array to ASCII (i.e. base64 represented as chars), then convert these chars from base64:
var base64AsAscii = Encoding.ASCII.GetString(bytesFromWebService);
var decodedBytes = Convert.FromBase64String(base64AsAscii);
var text = Encoding.UTF8.GetString(decodedBytes);
You can try Convert.ToBase64String.
My question is very simple, but at the moment I don't know how to do this. I have a string in ISO-8859-1 format and I need to convert it to UTF-8. I need to do it in C# on the Windows Phone 7 SDK. How can I do it? Thanks
The MSDN page for the Encoding class lists the recognized encodings.
28591 iso-8859-1 Western European (ISO)
For your question the correct choice is iso-8859-1 which you can pass to Encoding.GetEncoding.
var inputEncoding = Encoding.GetEncoding("iso-8859-1");
var text = inputEncoding.GetString(input);
var output = Encoding.UTF8.GetBytes(text);
Two clarifications on the previous answers:
There is no Encoding.GetText method (unless it was introduced specifically for the WP7 framework). The method should presumably be Encoding.GetString.
The Encoding.GetString method takes a byte[] parameter, not a string. All strings in .NET are internally represented as UTF-16; there is no way of having a “string in ISO-8859-1 format”. Thus, you must be careful how you read your source (file, network), rather than how you process your string.
For example, to read from a text file encoded in ISO-8859-1, you could use:
string text = File.ReadAllText(path, Encoding.GetEncoding("iso-8859-1"));
To save to a text file encoded in UTF-8, you could use:
File.WriteAllText(path, text, Encoding.UTF8);
Reply to comment:
Yes. You can use Encoding.GetString to decode your byte array (assuming it contains character values for text under a particular encoding) into a string, and Encoding.GetBytes to convert your string back into a byte array (possibly of a different encoding), as demonstrated in the other answers.
The concept of “encoding” relates to how byte sequences (be they a byte[] array in memory or the content of a file on disk) are to be interpreted. The string class is oblivious to the encoding that the text was read from, or should be saved to.
You can use Encoding.Convert, which works pretty well, especially when you have a byte array:
var latinString = "år";
Encoding latinEncoding = Encoding.GetEncoding("iso-8859-1");
Encoding utf8Encoding = Encoding.UTF8;
byte[] latinBytes = latinEncoding.GetBytes(latinString);
byte[] utf8Bytes = Encoding.Convert(latinEncoding, utf8Encoding, latinBytes);
var utf8String = Encoding.UTF8.GetString(utf8Bytes);