How to convert ’ to apostrophe in C#? - c#

I have tried several combinations of ASCII, Latin1, Windows-1252, UTF-8 and Unicode to convert ’ to apostrophe in C#, but to no avail.
byte[] uBytes = Encoding.Unicode.GetBytes(questionString);
byte[] utf8Bytes = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, uBytes);
string converted = Encoding.UTF8.GetString(utf8Bytes);
I am using this conversion chart to discover what each code should be: http://www.i18nqa.com/debug/utf8-debug.html

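Try the following. It assumes Encoding.Default is Windows-1252, as on .NET Framework with a Western European system locale; on .NET Core and later, Encoding.Default is UTF-8, so use Encoding.GetEncoding(1252) there instead: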
var bytes = Encoding.Default.GetBytes("’");
var text = Encoding.UTF8.GetString(bytes);
Console.WriteLine(text);
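The snippet above depends on what Encoding.Default happens to be on the machine. A minimal sketch that names the code page explicitly is shown here; the class and method names (MojibakeRepair, Fix) are hypothetical, and the RegisterProvider call assumes .NET Core 3.0 or later (on .NET Framework that line can be dropped, since code page 1252 is available there by default).

```csharp
using System;
using System.Text;

public static class MojibakeRepair
{
    // Hypothetical helper: re-encodes the garbled text as Windows-1252 to recover
    // the original bytes, then decodes those bytes as UTF-8.
    public static string Fix(string garbled)
    {
        // Assumption: running on .NET Core / .NET 5+, where code page
        // encodings must be registered before GetEncoding(1252) works.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding windows1252 = Encoding.GetEncoding(1252);
        byte[] originalBytes = windows1252.GetBytes(garbled); // E2 80 99 for "’"
        return Encoding.UTF8.GetString(originalBytes);
    }

    public static void Main()
    {
        string repaired = Fix("’");
        // The result is U+2019 (right single quotation mark), not the ASCII apostrophe.
        Console.WriteLine(repaired == "\u2019");
    }
}
```

If a plain ASCII apostrophe is wanted afterwards, a follow-up such as `repaired.Replace('\u2019', '\'')` would do it.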

Related

How to decode some coded text in C# [duplicate]

I have the following string, which is UTF-8, and I want to convert it to Persian Unicode:
ابراز داشت: امام رضا برخال� دیگر ائمه با جنگ نرم
This site converts it correctly, and the result is: ابراز داشت: امام رضا برخالف دیگر ائمه با جنگ نرم
I have tried many methods but cannot resolve this problem. For example, these two attempts did not produce the desired result:
string result = Encoding.GetEncoding("all type").GetString(input);
and
byte[] preambleBytes= Encoding.UTF8.GetPreamble();
byte[] inputBytes= Encoding.UTF8.GetBytes(input);
byte[] resultBytes= preambleBytes.Concat(inputBytes).ToArray();
string result=Encoding.UTF8.GetString(resultBytes.ToArray());
string resultAscii=Encoding.ASCII.GetString(inputBytes);
string resultUnicode=Encoding.Unicode.GetString(inputBytes);
You can use Encoding.Convert.
string source = // Your source
byte[] utfb = Encoding.UTF8.GetBytes(source);
byte[] resb = Encoding.Convert(Encoding.UTF8, Encoding.GetEncoding("ISO-8859-6"), utfb);
string result = Encoding.GetEncoding("ISO-8859-6").GetString(resb);
NOTE: I wasn't sure which standard you wanted so for the example I used ISO-8859-6 (Arabic).
I understood the problem after reading "What is problem and Solution".
When I converted the string to byte[], I forced UTF-8 encoding, but I should have used the default encoding for that step.
Incorrect conversion:
byte[] bytes = Encoding.UTF8.GetBytes(inputString);
resultString = Encoding.UTF8.GetString(bytes);
Correct conversion:
byte[] bytes = Encoding.Default.GetBytes(inputString);
resultString = Encoding.UTF8.GetString(bytes);
Thanks for your comments and answers.
I get the bytes with UTF8 and decode the string with Default, as follows. This worked for me.
byte[] bytes = Encoding.UTF8.GetBytes(inputString);
resultString = Encoding.Default.GetString(bytes);
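The answers above apply opposite conversions, and which one helps depends on how the text was damaged in the first place. A rough sketch of both directions follows. It uses Windows-1252 as a stand-in for Encoding.Default (an assumption; the default code page on the asker's machine is not stated), a sample Persian word that is not taken from the question, and hypothetical names throughout.

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    private static readonly Encoding Legacy = CreateLegacy();

    private static Encoding CreateLegacy()
    {
        // Assumption: .NET Core / .NET 5+, where code pages must be registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252);
    }

    // UTF-8 bytes misread as the legacy code page: this is how mojibake is produced.
    public static string Garble(string text) =>
        Legacy.GetString(Encoding.UTF8.GetBytes(text));

    // The reverse: recover the bytes with the legacy code page, then decode as UTF-8.
    public static string Repair(string garbled) =>
        Encoding.UTF8.GetString(Legacy.GetBytes(garbled));

    // Encoding and decoding with the same encoding changes nothing.
    public static string Utf8RoundTrip(string text) =>
        Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(text));

    public static void Main()
    {
        string original = "\u0633\u0644\u0627\u0645"; // sample word, not from the question
        string garbled = Garble(original);

        Console.WriteLine(Repair(garbled) == original);       // True
        Console.WriteLine(Utf8RoundTrip(garbled) == garbled); // True: no repair happens
    }
}
```

This is why the "UTF8 in, UTF8 out" attempt had no effect: the same encoding on both sides is an identity operation for valid text.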

Convert String to ByteArray in C#

I want to convert a string to a byte array in C# in order to decrypt some data.
When I convert the resulting byte array back to a string, it shows question marks (?).
Example code:
byte[] strTemp = Encoding.ASCII.GetBytes(strData);
MessageBox.Show(strData);
MessageBox.Show(Encoding.ASCII.GetString(strTemp));
The string is "Ê<,,l"x¡" (With the double quotation mark) and the result to convert again to string is: ???l?x?
I hope this helps you:
To get a byte array from a string:
private byte[] StringToByteArray(string str)
{
System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding();
return enc.GetBytes(str);
}
To get a string back from a byte array:
private string ByteArrayToString(byte[] arr)
{
System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding();
return enc.GetString(arr);
}
For that specific input, the BigEndianUnicode encoding seems to work fine:
byte[] strTemp = Encoding.BigEndianUnicode.GetBytes(strData);
MessageBox.Show(strData);
MessageBox.Show(Encoding.BigEndianUnicode.GetString(strTemp));
You are getting a byte array for the ASCII representation of your string, but your string is Unicode.
C# uses Unicode to encode strings, and Unicode can represent far more symbols than ASCII.
In your example, every symbol that has no ASCII representation is replaced by '?'; this is why only 'l' and 'x' appear in the output.
The proper way to do it is to use a Unicode encoding instead:
byte[] strTemp = Encoding.UTF8.GetBytes(strData);
MessageBox.Show(strData);
MessageBox.Show(Encoding.UTF8.GetString(strTemp));
Basically, any Unicode encoding can be used: UTF8, UTF32, Unicode, BigEndianUnicode (https://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings).
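The difference between the two encodings can be seen with a small round trip. This is a sketch with a made-up sample string and a hypothetical helper name, not the asker's actual data:

```csharp
using System;
using System.Text;

public static class AsciiVersusUtf8
{
    // Encodes the text to bytes and decodes it again with the same encoding.
    public static string RoundTrip(Encoding encoding, string text) =>
        encoding.GetString(encoding.GetBytes(text));

    public static void Main()
    {
        string sample = "\u00CAx\u00A1"; // "Êx¡": two non-ASCII characters around an 'x'

        // ASCII has no byte for Ê or ¡, so each becomes '?' (0x3F) while encoding.
        Console.WriteLine(RoundTrip(Encoding.ASCII, sample) == "?x?");   // True

        // UTF-8 can represent every Unicode character, so nothing is lost.
        Console.WriteLine(RoundTrip(Encoding.UTF8, sample) == sample);   // True
    }
}
```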

UTF-8 gives extra string in German characters

I have a file named testtäöüßÄÖÜ, and I want to convert the name to UTF-8 using C#.
string test ="testtäöüß";
var bytes = new List<byte>(test.Length);
foreach (var c in test)
bytes.Add((byte)c);
var retValue = Encoding.UTF8.GetString(bytes.ToArray());
After running this code, my output is: 'testt mit Umlaute äöü?x', where "mit Umlaute" is extra text.
Can anybody help me?
Thanks in advance.
You can't do that. You can't cast a UTF-8 character to a byte. UTF-8 requires at least two bytes for anything other than ASCII, and a single byte can't store that.
Instead of creating a list, use:
byte[] bytes = System.Text.Encoding.UTF8.GetBytes (test);
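To make the difference concrete, here is a small sketch comparing the per-character cast from the question with Encoding.UTF8.GetBytes. The class and method names are hypothetical, and the sample is the string literal from the question's code.

```csharp
using System;
using System.Text;

public static class Utf8BytesDemo
{
    // What the question's loop does: keep only the low 8 bits of each char.
    public static byte[] CastEachChar(string text)
    {
        var bytes = new byte[text.Length];
        for (int i = 0; i < text.Length; i++)
        {
            bytes[i] = unchecked((byte)text[i]);
        }
        return bytes;
    }

    // What the answer suggests: let the encoder produce the UTF-8 byte sequence.
    public static byte[] EncodeUtf8(string text) => Encoding.UTF8.GetBytes(text);

    public static void Main()
    {
        string test = "testtäöüß";

        // One byte per char: 9 bytes, which are not valid UTF-8 for ä, ö, ü, ß.
        Console.WriteLine(CastEachChar(test).Length); // 9

        // Five ASCII letters (1 byte each) plus four letters that need 2 bytes each.
        Console.WriteLine(EncodeUtf8(test).Length);   // 13

        Console.WriteLine(Encoding.UTF8.GetString(EncodeUtf8(test)) == test); // True
    }
}
```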
I think Tseng means the following.
Taken from: http://www.chilkatsoft.com/p/p_320.asp
System.Text.Encoding utf_8 = System.Text.Encoding.UTF8;
// This is our Unicode string:
string s_unicode = "abcéabc";
// Convert a string to utf-8 bytes.
byte[] utf8Bytes = System.Text.Encoding.UTF8.GetBytes(s_unicode);
// Convert utf-8 bytes to a string.
string s_unicode2 = System.Text.Encoding.UTF8.GetString(utf8Bytes);
MessageBox.Show(s_unicode2);

C# Encoding.Convert Vs C++ MultiByteToWideChar

I have a C++ code snippet that uses MultiByteToWideChar to convert a UTF-8 string to UTF-16.
For C++, if the input is "HÃ´tel", the output is "Hôtel", which is correct.
For C#, if the input is "HÃ´tel", the output is "HÃ´tel", which is not correct.
The C# code to convert from UTF8 to UTF16 looks like
Encoding.Unicode.GetString(
Encoding.Convert(
Encoding.UTF8,
Encoding.Unicode,
Encoding.UTF8.GetBytes(utf8)));
In C++ the conversion code looks like
MultiByteToWideChar(
CP_UTF8, // convert from UTF-8
0, // default flags
utf8.data(), // source UTF-8 string
utf8.length(), // length (in chars) of source UTF-8 string
&utf16[0], // destination buffer
utf16.length() // size of destination buffer, in wchar_t's
)
I want to have the same results in C# that I am getting in C++. Is there anything wrong with the C# code ?
It appears you want to treat the string's characters as Windows-1252 (often mislabeled as ANSI) code points and have those code points decoded as UTF-8 bytes, where Windows-1252 code point == UTF-8 byte value.
The reason the accepted answer doesn't work is that it treats the string's characters as Unicode code points rather than Windows-1252. It gets away with this for most characters because Windows-1252 maps them exactly as Unicode does, but input containing characters such as –, €, ™, ‘, ’, ”, • will fail because Windows-1252 maps those differently than Unicode.
So what you want is simply this:
public static string doWeirdMapping(string arg)
{
Encoding w1252 = Encoding.GetEncoding(1252);
return Encoding.UTF8.GetString(w1252.GetBytes(arg));
}
Then:
Console.WriteLine(doWeirdMapping("HÃ´tel")); //prints Hôtel
Console.WriteLine(doWeirdMapping("HVOLSVÃ–LLUR")); //prints HVOLSVÖLLUR
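The point about characters such as € can be checked with a short comparison of the two approaches. This is a sketch with hypothetical method names; the garbled euro sign is written with escapes (U+00E2, U+201A, U+00AC, i.e. the UTF-8 bytes E2 82 AC read as Windows-1252), and the RegisterProvider call assumes .NET Core 3.0 or later.

```csharp
using System;
using System.Linq;
using System.Text;

public static class CastVersusCodePage
{
    // Treats each char as a Unicode code point and keeps only its low byte.
    public static string ViaCast(string garbled) =>
        Encoding.UTF8.GetString(garbled.Select(c => unchecked((byte)c)).ToArray());

    // Treats each char as a Windows-1252 character and recovers its original byte.
    public static string ViaWindows1252(string garbled)
    {
        // Assumption: .NET Core / .NET 5+, where code pages must be registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.UTF8.GetString(Encoding.GetEncoding(1252).GetBytes(garbled));
    }

    public static void Main()
    {
        string hotel = "H\u00C3\u00B4tel";     // garbled "Hôtel"
        string euro = "\u00E2\u201A\u00AC";    // garbled "€"

        // Both approaches agree when every char is below U+0100.
        Console.WriteLine(ViaCast(hotel) == "H\u00F4tel");        // True
        Console.WriteLine(ViaWindows1252(hotel) == "H\u00F4tel"); // True

        // U+201A is byte 0x82 in Windows-1252, but the cast truncates it to 0x1A.
        Console.WriteLine(ViaCast(euro) == "\u20AC");             // False
        Console.WriteLine(ViaWindows1252(euro) == "\u20AC");      // True
    }
}
```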
Maybe this one:
private static string Utf8ToUnicode(string input)
{
return Encoding.UTF8.GetString(input.Select(item => (byte)item).ToArray());
}
Try this:
string str = "abc!";
Encoding unicode = Encoding.Unicode;
Encoding utf8 = Encoding.UTF8;
byte[] unicodeBytes = unicode.GetBytes(str);
byte[] utf8Bytes = Encoding.Convert( unicode,
utf8,
unicodeBytes );
Console.WriteLine( "UTF Bytes:" );
StringBuilder sb = new StringBuilder();
foreach( byte b in utf8Bytes ) {
sb.Append( b ).Append(" : ");
}
Console.WriteLine( sb.ToString() );
This link should help you understand encodings and their conversions.
Use System.Text.Encoding.UTF8.GetString().
Pass in your UTF-8 encoded text as a byte array. The function returns a standard .NET string, which is encoded in UTF-16.
A sample function:
private string ReadData(Stream binary_file) {
    System.Text.Encoding encoding = System.Text.Encoding.UTF8;
    // Read up to 30 bytes from the binary file
    byte[] buffer = new byte[30];
    // Read may return fewer bytes than requested, so decode only what was read
    int bytesRead = binary_file.Read(buffer, 0, buffer.Length);
    return encoding.GetString(buffer, 0, bytesRead);
}
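To mirror what MultiByteToWideChar with CP_UTF8 does, the input has to be the raw UTF-8 bytes rather than a string that has already been decoded. A small sketch under that assumption follows; the byte values are the UTF-8 encoding of "Hôtel" written out by hand, and the class and method names are illustrative only.

```csharp
using System;
using System.Text;

public static class Utf8Decode
{
    // Decodes raw UTF-8 bytes into a .NET string (UTF-16 in memory).
    public static string FromUtf8(byte[] utf8Bytes) =>
        Encoding.UTF8.GetString(utf8Bytes);

    public static void Main()
    {
        // 'H', then C3 B4 (the two-byte UTF-8 form of 'ô'), then 't', 'e', 'l'.
        byte[] raw = { 0x48, 0xC3, 0xB4, 0x74, 0x65, 0x6C };

        string text = FromUtf8(raw);
        Console.WriteLine(text.Length); // 5 UTF-16 code units

        // If explicit UTF-16 bytes are needed, Encoding.Unicode produces them.
        byte[] utf16 = Encoding.Unicode.GetBytes(text);
        Console.WriteLine(utf16.Length); // 10 bytes, two per code unit
    }
}
```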

C# Encoding.Converting Latin to Hebrew

I'm trying to fetch and parse an online Excel document which is written in Hebrew but is unfortunately stored in a non-Hebrew encoding.
As an example I'm trying to convert the following string: "âìéåï_1", which serves as the 1st sheet name to hebrew using C# code, but I'm unable to do so.
I know the above is convertible, since when I open it up in NotePad++ and select Encoding/Character Sets/Hebrew/Windows 1255, I can see: "גליון_1" which is the correct hebrew representation of the above string.
I'm using the below code
string str = "âìéåï_1";
Encoding windows = Encoding.GetEncoding("Windows-1255");
Encoding ascii = Encoding.GetEncoding("Windows-1252");
byte[] asciiBytes = ascii.GetBytes(str);
byte[] windowsBytes = Encoding.Convert(ascii, windows, asciiBytes);
char[] windowsChars = new char[windows.GetCharCount(windowsBytes, 0, windowsBytes.Length)];
windows.GetChars(windowsBytes, 0, windowsBytes.Length, windowsChars, 0);
string windowsString = new string(windowsChars);
I assumed that the encoding of the origin string is Windows-1252 since when I paste it in NotePad++ and change the encoding to Windows-1252 the string remains the same...
I'm probably doing something wrong here, anyone know how to convert the above correctly?
Thanks,
Mikey
const string Str = "âìéåï_1";
Encoding latinEncoding = Encoding.GetEncoding("Windows-1252");
Encoding hebrewEncoding = Encoding.GetEncoding("Windows-1255");
byte[] latinBytes = latinEncoding.GetBytes(Str);
string hebrewString = hebrewEncoding.GetString(latinBytes);
hebrewString:
גליון_1
In your supplied example, "Windows-1252" is not actually ASCII; it is extended ASCII. For some reason, Encoding.Convert with these two encodings cannot convert the extended range, so all characters above 127 come out as 63 (i.e. '?'). When "converting" from one extended ASCII byte[] to another, I would expect the bytes to stay the same; only when they are decoded into a .NET Unicode string would I expect them to differ. I'm not sure why Convert turns the characters above 127 into '?'.
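One explanation for the '?' output, offered as a sketch rather than a definitive account: Encoding.Convert does not copy bytes. It decodes the source bytes into Unicode characters and then re-encodes those characters in the destination encoding. Latin letters such as â have no slot in Windows-1255, so the encoder substitutes a replacement. The example below makes that visible; the class and method names are hypothetical, the explicit '?' fallbacks are an assumption added so the output does not depend on any default best-fit behaviour, and the RegisterProvider call assumes .NET Core 3.0 or later.

```csharp
using System;
using System.Text;

public static class ConvertVersusReinterpret
{
    private static Encoding Get(int codePage)
    {
        // Assumption: .NET Core / .NET 5+, where code pages must be registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        // Explicit '?' fallbacks (an assumption) keep the behaviour deterministic.
        return Encoding.GetEncoding(
            codePage,
            new EncoderReplacementFallback("?"),
            new DecoderReplacementFallback("?"));
    }

    // Convert maps characters: Latin letters missing from 1255 become '?'.
    public static byte[] ViaConvert(string text)
    {
        Encoding latin = Get(1252);
        Encoding hebrew = Get(1255);
        return Encoding.Convert(latin, hebrew, latin.GetBytes(text));
    }

    // Reinterpretation keeps the bytes and only changes how they are read.
    public static string ViaReinterpret(string text) =>
        Get(1255).GetString(Get(1252).GetBytes(text));

    public static void Main()
    {
        const string garbled = "âìéåï_1";

        Console.WriteLine(ViaConvert(garbled)[0]);   // 63, the byte for '?'
        Console.WriteLine(ViaReinterpret(garbled));  // the Hebrew sheet name
    }
}
```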
