Converting UTF-16 string to ANSI string - C#

So I have a string:
string test = "Checking";
It's being stored as a UTF-16 string. However, I want to convert the string to ANSI. Is there a way to do that?

If "ANSI" is the default code page:
byte[] result = Encoding.Default.GetBytes(test);
Otherwise you can specify a particular ANSI code page:
int cp = 1250; // e.g. Windows-1250
byte[] result = Encoding.GetEncoding(cp).GetBytes(test);
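One caveat worth noting: on .NET Core and .NET 5+, the legacy ANSI code pages are not available by default; you have to register the CodePagesEncodingProvider from the System.Text.Encoding.CodePages package first. A minimal round-trip sketch, with Windows-1250 as an example code page:
using System;
using System.Text;

class AnsiRoundTrip
{
    static void Main()
    {
        // Needed on .NET Core / .NET 5+; on .NET Framework the code pages are built in.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        string test = "Checking";
        Encoding ansi = Encoding.GetEncoding(1250); // e.g. Windows-1250

        byte[] ansiBytes = ansi.GetBytes(test);       // UTF-16 string -> ANSI bytes
        string roundTrip = ansi.GetString(ansiBytes); // ANSI bytes -> UTF-16 string

        Console.WriteLine(roundTrip); // Checking
    }
}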

Related

How to convert Arabic string to UTF-8 using C#?

I need to convert some Arabic text to UTF-8 and then to hex, but the output comes out garbled (replacement characters, as in the picture I attached).
Code I tried:
string myName = _name.Text;
string myNameLength = _name.TextLength.ToString("X2");
byte[] nameByte = Encoding.Default.GetBytes(myName);
var hexStringName = BitConverter.ToString(nameByte);
hexStringName = hexStringName.Replace("-", "");
Getting the UTF-8 bytes is just:
string name = "عبود";
byte[] utf8 = Encoding.UTF8.GetBytes(name);
var hex = BitConverter.ToString(utf8);
hex = hex.Replace("-", "");
Console.WriteLine(hex); // D8B9D8A8D988D8AF
What you do with those bytes is up to you; there's zero chance that a hex string was rendered with the replacement character (aka �), so you're doing something else that we can't see. Maybe show us what you're doing with the value once you have it.
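If it helps to sanity-check the bytes, here is a small sketch (mine, not part of the answer) that parses the hex string back into bytes and decodes them as UTF-8; it should print the original name:
using System;
using System.Text;

class HexRoundTrip
{
    static void Main()
    {
        string hex = "D8B9D8A8D988D8AF"; // hex output from the answer above

        // Parse each pair of hex digits back into a byte.
        byte[] bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);

        Console.WriteLine(Encoding.UTF8.GetString(bytes)); // عبود
    }
}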

UTF-8 conversion gives extra characters with German umlauts

I have a file named testtäöüßÄÖÜ. I want to convert it to UTF-8 using C#.
string test = "testtäöüß";
var bytes = new List<byte>(test.Length);
foreach (var c in test)
    bytes.Add((byte)c);
var retValue = Encoding.UTF8.GetString(bytes.ToArray());
After running this code my output is: 'testt mit Umlaute äöü?x', where 'mit Umlaute' is extra text.
Can anybody help me?
Thanks in advance.
You can't do that. You can't cast a UTF-8 character to a byte: UTF-8 needs at least two bytes for anything beyond ASCII, and a single byte can't store that.
Instead of creating a list, use
byte[] bytes = System.Text.Encoding.UTF8.GetBytes (test);
I think Tseng means the following.
Taken from: http://www.chilkatsoft.com/p/p_320.asp
System.Text.Encoding utf_8 = System.Text.Encoding.UTF8;
// This is our Unicode string:
string s_unicode = "abcéabc";
// Convert a string to utf-8 bytes.
byte[] utf8Bytes = System.Text.Encoding.UTF8.GetBytes(s_unicode);
// Convert utf-8 bytes to a string.
string s_unicode2 = System.Text.Encoding.UTF8.GetString(utf8Bytes);
MessageBox.Show(s_unicode2);
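To make the failure mode concrete, here is a short sketch (mine, not from the answers) comparing the bytes the char-to-byte cast produces against real UTF-8 bytes; 'ä' is U+00E4, but its UTF-8 form is the two bytes C3 A4:
using System;
using System.Text;

class UmlautDemo
{
    static void Main()
    {
        string test = "testtäöüß";

        // The broken approach: truncate each UTF-16 char to a single byte.
        byte[] wrong = Array.ConvertAll(test.ToCharArray(), c => (byte)c);
        // The correct approach: let the encoder produce real UTF-8.
        byte[] right = Encoding.UTF8.GetBytes(test);

        Console.WriteLine(BitConverter.ToString(wrong));
        // 74-65-73-74-74-E4-F6-FC-DF  (truncated single bytes)
        Console.WriteLine(BitConverter.ToString(right));
        // 74-65-73-74-74-C3-A4-C3-B6-C3-BC-C3-9F  (valid UTF-8 pairs)

        // Decoding the truncated bytes as UTF-8 yields replacement characters:
        Console.WriteLine(Encoding.UTF8.GetString(wrong)); // testt followed by � chars
    }
}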

Decoding a UTF-8 string to Windows-1256

I used this code to encode a UTF-8 string to a Windows-1256 string:
string q = textBox1.Text;
UTF7Encoding utf = new UTF7Encoding();
byte[] winByte = Encoding.GetEncoding(1256).GetBytes(q);
string result = utf.GetString(winByte);
This code works, but I can't decode the result back to the original string!
How can I decode the encoded string (the result variable) back to what it was before conversion (the q variable)?
You are converting the strings incorrectly.
Have a look at the commented code below. The comments explain what is wrong, and how to do it correctly, but basically what is happening is:
First, you use Encoding.GetEncoding(1256).GetBytes(q) to convert a string (which is UTF-16) into ANSI code page 1256 bytes.
Then you use a UTF-7 encoding to convert it back. But that's wrong, because you need an ANSI code page 1256 encoding to convert it back:
string q = "ABئبئ"; // UTF16.
UTF7Encoding utf = new UTF7Encoding(); // Used to convert UTF16 to/from UTF7
// Convert UTF16 to ANSI codepage 1256. winByte[] will be ANSI codepage 1256.
byte[] winByte = Encoding.GetEncoding(1256).GetBytes(q);
// Convert UTF7 to UTF16.
// But this is WRONG because winByte is ANSI codepage 1256, NOT UTF7!
string result = utf.GetString(winByte);
Debug.Assert(result != q); // So result doesn't equal q
// The CORRECT way to convert the ANSI string back:
// Convert ANSI codepage 1256 string to UTF16
result = Encoding.GetEncoding(1256).GetString(winByte);
Debug.Assert(result == q); // Now result DOES equal q
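One caveat the answer implies but does not state: the round trip only works for characters that code page 1256 can actually represent; anything else is silently replaced by '?'. A quick sketch (the Chinese character is just an illustrative non-1256 character):
using System;
using System.Text;

class LossyDemo
{
    static void Main()
    {
        Encoding cp1256 = Encoding.GetEncoding(1256);

        string ok = "ABئبئ";       // representable in code page 1256
        string notOk = "AB\u4E2D"; // 中 has no slot in code page 1256

        Console.WriteLine(cp1256.GetString(cp1256.GetBytes(ok)));    // ABئبئ
        Console.WriteLine(cp1256.GetString(cp1256.GetBytes(notOk))); // AB?
    }
}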

C# Encoding.Convert Vs C++ MultiByteToWideChar

I have a C++ code snippet that uses MultiByteToWideChar to convert a UTF-8 string to UTF-16.
For C++, if the input is "HÃ´tel" (the UTF-8 bytes of "Hôtel" read as ANSI), the output is "Hôtel", which is correct.
For C#, if the input is "HÃ´tel", the output is "HÃ´tel", which is not correct.
The C# code to convert from UTF-8 to UTF-16 looks like:
Encoding.Unicode.GetString(
    Encoding.Convert(
        Encoding.UTF8,
        Encoding.Unicode,
        Encoding.UTF8.GetBytes(utf8)));
In C++, the conversion code looks like:
MultiByteToWideChar(
    CP_UTF8,        // convert from UTF-8
    0,              // default flags
    utf8.data(),    // source UTF-8 string
    utf8.length(),  // length (in chars) of source UTF-8 string
    &utf16[0],      // destination buffer
    utf16.length()  // size of destination buffer, in wchar_t's
)
I want to have the same results in C# that I am getting in C++. Is there anything wrong with the C# code?
It appears you want to treat the string's characters as Windows-1252 (often mislabeled as ANSI) code points, and have those code points decoded as UTF-8 bytes, where Windows-1252 code point == UTF-8 byte value.
The reason the accepted answer doesn't work is that it treats the string characters as Unicode code points rather than Windows-1252 ones. It gets away with most characters because Windows-1252 maps them exactly as Unicode does, but input containing characters like –, €, ™, ‘, ’, ”, • and so on will fail, because Windows-1252 maps those differently from Unicode in this sense.
So what you want is simply this:
public static string doWeirdMapping(string arg)
{
    Encoding w1252 = Encoding.GetEncoding(1252);
    return Encoding.UTF8.GetString(w1252.GetBytes(arg));
}
Then:
Console.WriteLine(doWeirdMapping("HÃ´tel"));       // prints Hôtel
Console.WriteLine(doWeirdMapping("HVOLSVÃ–LLUR")); // prints HVOLSVÖLLUR
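For completeness, running the mapping in the other direction shows where such strings come from in the first place, and doubles as a test; a quick sketch (mine, with doWeirdMapping's body inlined):
using System;
using System.Text;

class MojibakeDemo
{
    static void Main()
    {
        Encoding w1252 = Encoding.GetEncoding(1252);

        string clean = "Hôtel";
        // Encode proper text as UTF-8 bytes, then decode those bytes
        // as Windows-1252: this manufactures the garbled form.
        string garbled = w1252.GetString(Encoding.UTF8.GetBytes(clean));

        Console.WriteLine(garbled); // HÃ´tel
        // Applying the forward mapping recovers the original:
        Console.WriteLine(Encoding.UTF8.GetString(w1252.GetBytes(garbled))); // Hôtel
    }
}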
Maybe this one:
private static string Utf8ToUnicode(string input)
{
    // Needs using System.Linq; note this truncates each char to a byte.
    return Encoding.UTF8.GetString(input.Select(item => (byte)item).ToArray());
}
Try this:
string str = "abc!";
Encoding unicode = Encoding.Unicode;
Encoding utf8 = Encoding.UTF8;
byte[] unicodeBytes = unicode.GetBytes(str);
byte[] utf8Bytes = Encoding.Convert(unicode, utf8, unicodeBytes);
Console.WriteLine("UTF Bytes:");
StringBuilder sb = new StringBuilder();
foreach (byte b in utf8Bytes)
{
    sb.Append(b).Append(" : ");
}
Console.WriteLine(sb.ToString());
This link may help you understand encodings and the conversions between them.
Use System.Text.Encoding.UTF8.GetString().
Pass in your UTF-8 encoded text as a byte array. The function returns a standard .NET string, which is encoded in UTF-16.
A sample function would look like this:
private string ReadData(Stream binary_file)
{
    System.Text.Encoding encoding = System.Text.Encoding.UTF8;
    // Read a string from the binary file with UTF-8 encoding.
    byte[] buffer = new byte[30];
    // Read may return fewer than 30 bytes; only decode what was actually read.
    int bytesRead = binary_file.Read(buffer, 0, 30);
    return encoding.GetString(buffer, 0, bytesRead);
}
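A quick usage sketch with an in-memory stream standing in for the binary file (the MemoryStream setup is mine, not the answer's):
using System;
using System.IO;
using System.Text;

class ReadDataDemo
{
    static void Main()
    {
        // Simulate a binary file containing UTF-8 text.
        byte[] payload = Encoding.UTF8.GetBytes("Hôtel");
        using (var ms = new MemoryStream(payload))
        {
            byte[] buffer = new byte[30];
            int bytesRead = ms.Read(buffer, 0, 30);
            Console.WriteLine(Encoding.UTF8.GetString(buffer, 0, bytesRead)); // Hôtel
        }
    }
}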

C# Encoding: converting Latin to Hebrew

I'm trying to fetch and parse an online Excel document which is written in Hebrew but unfortunately in a non-Hebrew encoding.
As an example, I'm trying to convert the following string, "âìéåï_1", which serves as the first sheet's name, to Hebrew using C# code, but I'm unable to do so.
I know the above is convertible, since when I open it in Notepad++ and select Encoding/Character Sets/Hebrew/Windows-1255, I can see "גליון_1", which is the correct Hebrew representation of the above string.
I'm using the below code
string str = "âìéåï_1";
Encoding windows = Encoding.GetEncoding("Windows-1255");
Encoding ascii = Encoding.GetEncoding("Windows-1252");
byte[] asciiBytes = ascii.GetBytes(str);
byte[] windowsBytes = Encoding.Convert(ascii, windows, asciiBytes);
char[] windowsChars = new char[windows.GetCharCount(windowsBytes, 0, windowsBytes.Length)];
windows.GetChars(windowsBytes, 0, windowsBytes.Length, windowsChars, 0);
string windowsString = new string(windowsChars);
I assumed that the encoding of the original string is Windows-1252, since when I paste it into Notepad++ and change the encoding to Windows-1252 the string remains the same...
I'm probably doing something wrong here; does anyone know how to convert the above correctly?
Thanks,
Mikey
const string Str = "âìéåï_1";
Encoding latinEncoding = Encoding.GetEncoding("Windows-1252");
Encoding hebrewEncoding = Encoding.GetEncoding("Windows-1255");
byte[] latinBytes = latinEncoding.GetBytes(Str);
string hebrewString = hebrewEncoding.GetString(latinBytes);
hebrewString:
גליון_1
In your supplied example, "Windows-1252" is not actually ASCII, it is an extended-ASCII code page, and with these two encodings Encoding.Convert turns every character above 127 into 63 (i.e. '?'). The reason is that Convert first decodes the Windows-1252 bytes into Unicode characters and then re-encodes them as Windows-1255; accented Latin characters such as 'â' have no slot in Windows-1255, so the encoder's default fallback substitutes '?'. When "converting" from one extended-ASCII byte[] to another you would expect the bytes to stay the same; they should only differ once you decode them into a .NET Unicode string.
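A short sketch (mine, not the commenter's) showing that fallback behavior directly:
using System;
using System.Text;

class FallbackDemo
{
    static void Main()
    {
        Encoding latin = Encoding.GetEncoding("Windows-1252");
        Encoding hebrew = Encoding.GetEncoding("Windows-1255");

        byte[] latinBytes = latin.GetBytes("âìéåï_1");
        byte[] converted = Encoding.Convert(latin, hebrew, latinBytes);

        // 'â', 'ì', 'é', 'å', 'ï' have no slots in Windows-1255,
        // so each one becomes 0x3F ('?').
        Console.WriteLine(BitConverter.ToString(converted)); // 3F-3F-3F-3F-3F-5F-31
    }
}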
