How to convert string to "iso-8859-1"? - c#

How can I convert a UTF-8 string into an ISO-8859-1 string?

Try:
System.Text.Encoding iso_8859_1 = System.Text.Encoding.GetEncoding("iso-8859-1");
System.Text.Encoding utf_8 = System.Text.Encoding.UTF8;
// Unicode string.
string s_unicode = "abcéabc";
// Convert to ISO-8859-1 bytes.
byte[] isoBytes = iso_8859_1.GetBytes(s_unicode);
// Convert to UTF-8.
byte[] utf8Bytes = System.Text.Encoding.Convert(iso_8859_1, utf_8, isoBytes);

.NET strings are all UTF-16 internally; there is no UTF-8 or ISO-8859-1 encoded System.String in .NET. To get the binary representation of a string in a particular encoding, use the System.Text.Encoding class:
byte[] bytes = Encoding.GetEncoding("iso-8859-1").GetBytes("hello world");
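It may help to see the lossiness in action. The following is a minimal sketch (the sample strings are made up): characters that exist in ISO-8859-1 round-trip, while characters outside it are silently replaced with '?' by the default encoder fallback.

```csharp
using System;
using System.Text;

class IsoRoundTrip
{
    static void Main()
    {
        Encoding iso = Encoding.GetEncoding("iso-8859-1");

        // 'é' (U+00E9) exists in ISO-8859-1, so it survives the round trip.
        byte[] ok = iso.GetBytes("café");
        Console.WriteLine(iso.GetString(ok)); // café

        // '€' (U+20AC) does not exist in ISO-8859-1; the default fallback
        // replaces it with '?' rather than throwing.
        byte[] lossy = iso.GetBytes("€10");
        Console.WriteLine(iso.GetString(lossy)); // ?10
    }
}
```

If you would rather fail loudly than lose data, an overload of Encoding.GetEncoding accepts an EncoderFallback (e.g. EncoderFallback.ExceptionFallback).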

Related

Convert String to ByteArray in C#

I want to convert a String to a ByteArray in C# to decrypt some data.
When I get the String back from the ByteArray I created, it shows question marks (?).
Example code:
byte[] strTemp = Encoding.ASCII.GetBytes(strData);
MessageBox.Show(strData);
MessageBox.Show(Encoding.ASCII.GetString(strTemp));
The string is "Ê<,,l"x¡" (with the double quotation mark) and the result of converting back to a string is: ???l?x?
I hope this helps you.
To get a byte array from a string:
private byte[] StringToByteArray(string str)
{
System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding();
return enc.GetBytes(str);
}
To get a string back from a byte array:
private string ByteArrayToString(byte[] arr)
{
System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding();
return enc.GetString(arr);
}
For that specific input, the BigEndianUnicode encoding seems to work fine:
byte[] strTemp = Encoding.BigEndianUnicode.GetBytes(strData);
MessageBox.Show(strData);
MessageBox.Show(Encoding.BigEndianUnicode.GetString(strTemp));
You are getting a byte array for the ASCII representation of your string, but your string is Unicode.
C# uses Unicode to encode strings, and Unicode can represent far more symbols than ASCII.
In your example, every symbol that has no ASCII representation is replaced by '?', which is why only 'l' and 'x' survive in the output.
The proper way to do it is to use a Unicode encoding instead:
byte[] strTemp = Encoding.UTF8.GetBytes(strData);
MessageBox.Show(strData);
MessageBox.Show(Encoding.UTF8.GetString(strTemp));
Basically, any Unicode encoding can be used: UTF8, UTF32, Unicode, BigEndianUnicode (https://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings).
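As a quick illustrative sketch (the sample string is arbitrary): the byte sequences differ per encoding, but each one decodes back to the original string as long as you decode with the same encoding you encoded with.

```csharp
using System;
using System.Text;

class EncodingComparison
{
    static void Main()
    {
        string s = "Êx¡";

        foreach (Encoding enc in new[] { Encoding.UTF8, Encoding.Unicode,
                                         Encoding.BigEndianUnicode, Encoding.UTF32 })
        {
            byte[] bytes = enc.GetBytes(s);
            // The byte counts differ, but every round trip is lossless.
            Console.WriteLine($"{enc.EncodingName}: {bytes.Length} bytes, " +
                              $"round-trip ok: {enc.GetString(bytes) == s}");
        }
    }
}
```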

UTF-8 gives extra characters with German umlauts

I have a file named testtäöüßÄÖÜ. I want to convert it to UTF-8 using C#.
string test ="testtäöüß";
var bytes = new List<byte>(test.Length);
foreach (var c in test)
bytes.Add((byte)c);
var retValue = Encoding.UTF8.GetString(bytes.ToArray());
After running this code my output is 'testt mit Umlaute äöü?x', where 'mit Umlaute' is extra text.
Can anybody help me?
Thanks in advance.
You can't do that. You can't cast a UTF-8 character to a byte: in UTF-8, anything other than ASCII requires at least two bytes, and a single byte can't store that.
Instead of creating a list, use
byte[] bytes = System.Text.Encoding.UTF8.GetBytes (test);
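To see why the cast mangles the data, here is a small sketch: casting keeps only the low byte of the UTF-16 code unit, which is not a valid UTF-8 sequence for non-ASCII characters.

```csharp
using System;
using System.Text;

class CastVsGetBytes
{
    static void Main()
    {
        char c = 'ä'; // U+00E4

        // (byte)c keeps only the low byte of the UTF-16 code unit: 0xE4.
        // A lone 0xE4 is not valid UTF-8, so decoding yields U+FFFD ('�').
        byte cast = (byte)c;
        Console.WriteLine(Encoding.UTF8.GetString(new[] { cast })); // �

        // GetBytes produces the correct two-byte UTF-8 sequence.
        byte[] utf8 = Encoding.UTF8.GetBytes(c.ToString());
        Console.WriteLine(BitConverter.ToString(utf8)); // C3-A4
    }
}
```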
I think Tseng means the following.
Taken from: http://www.chilkatsoft.com/p/p_320.asp
System.Text.Encoding utf_8 = System.Text.Encoding.UTF8;
// This is our Unicode string:
string s_unicode = "abcéabc";
// Convert a string to utf-8 bytes.
byte[] utf8Bytes = System.Text.Encoding.UTF8.GetBytes(s_unicode);
// Convert utf-8 bytes to a string.
string s_unicode2 = System.Text.Encoding.UTF8.GetString(utf8Bytes);
MessageBox.Show(s_unicode2);

Decoding a UTF-8 string to Windows-1256

I used this code to encode a UTF-8 string as a Windows-1256 string:
string q = textBox1.Text;
UTF7Encoding utf = new UTF7Encoding();
byte[] winByte = Encoding.GetEncoding(1256).GetBytes(q);
string result = utf.GetString(winByte);
This code runs, but I can't decode the result back to the original string!
How can I decode the encoded string (the result variable) back to what it was before the conversion (the q variable)?
You are converting the strings incorrectly.
Have a look at the commented code below. The comments explain what is wrong, and how to do it correctly, but basically what is happening is:
First you use Encoding.GetEncoding(1256).GetBytes(q) to convert a string (which is UTF-16) to ANSI codepage 1256 bytes.
Then you use a UTF-7 encoding to convert them back. But that's wrong, because you need to use the same ANSI codepage 1256 encoding to convert them back:
string q = "ABئبئ"; // UTF16.
UTF7Encoding utf = new UTF7Encoding(); // Used to convert UTF16 to/from UTF7
// Convert UTF16 to ANSI codepage 1256. winByte[] will be ANSI codepage 1256.
byte[] winByte = Encoding.GetEncoding(1256).GetBytes(q);
// Convert UTF7 to UTF16.
// But this is WRONG because winByte is ANSI codepage 1256, NOT UTF7!
string result = utf.GetString(winByte);
Debug.Assert(result != q); // So result doesn't equal q
// The CORRECT way to convert the ANSI string back:
// Convert ANSI codepage 1256 string to UTF16
result = Encoding.GetEncoding(1256).GetString(winByte);
Debug.Assert(result == q); // Now result DOES equal q

C# Encoding.Convert Vs C++ MultiByteToWideChar

I have a C++ code snippet that uses MultiByteToWideChar to convert a UTF-8 string to UTF-16.
For C++, if the input is "HÃ´tel", the output is "Hôtel", which is correct.
For C#, if the input is "HÃ´tel", the output is "HÃ´tel", which is not correct.
The C# code to convert from UTF8 to UTF16 looks like
Encoding.Unicode.GetString(
    Encoding.Convert(
        Encoding.UTF8,
        Encoding.Unicode,
        Encoding.UTF8.GetBytes(utf8)));
In C++ the conversion code looks like
MultiByteToWideChar(
    CP_UTF8,        // convert from UTF-8
    0,              // default flags
    utf8.data(),    // source UTF-8 string
    utf8.length(),  // length (in chars) of source UTF-8 string
    &utf16[0],      // destination buffer
    utf16.length()  // size of destination buffer, in wchar_t's
)
I want to have the same results in C# that I am getting in C++. Is there anything wrong with the C# code ?
It appears you want to treat the string's characters as Windows-1252 (often mislabeled as "ANSI") code points, and have those code points decoded as if they were UTF-8 byte values, i.e. Windows-1252 code point == UTF-8 byte value.
The reason the accepted answer doesn't work is that it treats the string's characters as Unicode code points rather than Windows-1252 ones. It gets away with most characters because Windows-1252 maps them exactly as Unicode does, but input with characters like –, €, ™, ‘, ’, ”, • will fail, because Windows-1252 maps those differently from Unicode.
So what you want is simply this:
public static string doWeirdMapping(string arg)
{
Encoding w1252 = Encoding.GetEncoding(1252);
return Encoding.UTF8.GetString(w1252.GetBytes(arg));
}
Then:
Console.WriteLine(doWeirdMapping("HÃ´tel")); // prints Hôtel
Console.WriteLine(doWeirdMapping("HVOLSVÃ–LLUR")); // prints HVOLSVÖLLUR
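The mapping difference is easy to verify. Here is a hedged sketch using '€' (note that on .NET Core / .NET 5+, code page 1252 requires registering the code-pages provider from the System.Text.Encoding.CodePages package):

```csharp
using System;
using System.Text;

class W1252VsUnicode
{
    static void Main()
    {
        // Needed on .NET Core / .NET 5+ for legacy code pages:
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding w1252 = Encoding.GetEncoding(1252);

        // Windows-1252 maps '€' to the single byte 0x80 ...
        Console.WriteLine(w1252.GetBytes("€")[0]); // 128

        // ... but its Unicode code point is U+20AC (8364), so casting the
        // char to a byte truncates it to 0xAC, a different character entirely.
        Console.WriteLine((int)'€');  // 8364
        Console.WriteLine((byte)'€'); // 172
    }
}
```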
Maybe this one:
private static string Utf8ToUnicode(string input)
{
return Encoding.UTF8.GetString(input.Select(item => (byte)item).ToArray());
}
Try this:
string str = "abc!";
Encoding unicode = Encoding.Unicode;
Encoding utf8 = Encoding.UTF8;
byte[] unicodeBytes = unicode.GetBytes(str);
byte[] utf8Bytes = Encoding.Convert(unicode, utf8, unicodeBytes);
Console.WriteLine("UTF-8 bytes:");
StringBuilder sb = new StringBuilder();
foreach (byte b in utf8Bytes) {
    sb.Append(b).Append(" : ");
}
Console.WriteLine(sb.ToString());
This link should help you understand encodings and the conversions between them.
Use System.Text.Encoding.UTF8.GetString().
Pass in your UTF-8 encoded text as a byte array. The function returns a standard .NET string, which is encoded in UTF-16.
A sample function would be:
private string ReadData(Stream binary_file) {
    System.Text.Encoding encoding = System.Text.Encoding.UTF8;
    // Read a string from the binary file with UTF-8 encoding.
    byte[] buffer = new byte[30];
    int bytesRead = binary_file.Read(buffer, 0, buffer.Length);
    // Decode only the bytes that were actually read.
    return encoding.GetString(buffer, 0, bytesRead);
}

How to convert â€™ to apostrophe in C#?

I have tried several combinations of ASCII, Latin1, Windows-1252, UTF-8 and Unicode to convert â€™ to an apostrophe in C#, but to no avail.
byte[] uBytes = Encoding.Unicode.GetBytes(questionString);
byte[] utf8Bytes = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, uBytes);
string converted = Encoding.UTF8.GetString(utf8Bytes);
I am using this conversion chart to discover what each code should be: http://www.i18nqa.com/debug/utf8-debug.html
Try the following:
var bytes = Encoding.Default.GetBytes("â€™");
var text = Encoding.UTF8.GetString(bytes);
Console.WriteLine(text);
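One caveat worth adding: Encoding.Default is Windows-1252 only on Western-locale Windows with .NET Framework; on .NET Core / .NET 5+ it is UTF-8, where this trick no longer works. Here is a sketch with the code page spelled out explicitly (the mojibake sample is illustrative):

```csharp
using System;
using System.Text;

class MojibakeRepair
{
    static void Main()
    {
        // Needed on .NET Core / .NET 5+ for legacy code pages:
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // "â€™" is what a UTF-8 encoded ’ (U+2019) looks like when its bytes
        // were wrongly decoded as Windows-1252. Reverse the wrong decode:
        string mojibake = "â€™";
        byte[] originalBytes = Encoding.GetEncoding(1252).GetBytes(mojibake);
        Console.WriteLine(Encoding.UTF8.GetString(originalBytes)); // ’
    }
}
```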
