How to convert Hex to Chinese ASCII character in c#? [duplicate]

I have the following code to convert from HEX to ASCII.
// Hexadecimal to ASCII conversion
private static string hex2ascii(string hexString)
{
    MessageBox.Show(hexString);
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i <= hexString.Length - 2; i += 2)
    {
        sb.Append(Convert.ToString(Convert.ToChar(Int32.Parse(hexString.Substring(i, 2), System.Globalization.NumberStyles.HexNumber))));
    }
    return sb.ToString();
}
Input: hexString = D3FCC4A7B6FABBB7
Output: ÓüÄ§¶ú»·
The output that I need is 狱魔耳环, but I am getting ÓüÄ§¶ú»· instead.
How would I make it display the correct string?

First, convert the hex string to a byte[], e.g. using the code at How do you convert Byte Array to Hexadecimal String, and vice versa?. Then use System.Text.Encoding.Unicode.GetString(myArray) to convert the bytes to a string, substituting the proper encoding for your data: it might not be Unicode, but judging from your example it is a 16-bit (double-byte) encoding, which, incidentally, is not "ASCII" (ASCII is 7-bit).
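A minimal sketch of that two-step approach, assuming the sample bytes are GBK/GB2312 (code page 936), the Chinese double-byte encoding under which D3FCC4A7B6FABBB7 decodes to 狱魔耳环; swap in whatever encoding your data actually uses:
using System;
using System.Text;

private static string HexToString(string hexString, Encoding encoding)
{
    // Parse each pair of hex digits into one byte.
    byte[] bytes = new byte[hexString.Length / 2];
    for (int i = 0; i < bytes.Length; i++)
        bytes[i] = Convert.ToByte(hexString.Substring(i * 2, 2), 16);
    // Decode the raw bytes with the chosen encoding.
    return encoding.GetString(bytes);
}

// On .NET Core / .NET 5+, register the code-pages provider before requesting code page 936
// (not needed on .NET Framework):
// Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
// string s = HexToString("D3FCC4A7B6FABBB7", Encoding.GetEncoding(936)); // "狱魔耳环"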

Related

C# - converting a stripped UTF encoded string back to UTF

So, I have a string that is actually UTF encoded characters with the ASCII representation codes stripped out:
"537465616d6c696e6564"
This would be represented in ASCII encoded UTF as \x53\x74\x65 [...]
I've tried to Regexp replace in \x at the correct positions, byte encoding it and reading it back as UTF, but to no avail.
What's the most effective way of turning the ASCII string into readable UTF in C#?
So, as I understand it, you have a string "537465616d6c696e6564" that actually means char[] chars = { '\x53', '\x74', ... }.
First, convert this string to an array of bytes; see How can I convert a hex string to a byte array?
For your convenience:
public static byte[] StringToByteArray(string hex) {
    return Enumerable.Range(0, hex.Length)
                     .Where(x => x % 2 == 0)
                     .Select(x => Convert.ToByte(hex.Substring(x, 2), 16))
                     .ToArray();
}
Then, since there are many UTF encodings (UTF-8, UTF-16, ...) and C# strings internally use UTF-16 (strictly, sequences of UTF-16 code units), I assume that you want a UTF-16 string:
string str = System.Text.Encoding.Unicode.GetString(array);
If you get incorrect characters after decoding, you may also try the UTF-8 encoding (Encoding.UTF8), in case you don't know the exact encoding.
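For the sample string, usage would look roughly like this (a sketch; the bytes 53 74 65 ... happen to be plain ASCII, so UTF-8 decodes them to "Steamlined"):
byte[] array = StringToByteArray("537465616d6c696e6564");
string utf8Text = System.Text.Encoding.UTF8.GetString(array);     // "Steamlined" for this particular input
string utf16Text = System.Text.Encoding.Unicode.GetString(array); // use this instead if the bytes really are UTF-16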
I don't know much about string encodings, but assuming that your original string is the hex representation of a series of bytes, you could do something like this:
class Program
{
    private const string encoded = "537465616d6c696e6564";

    static void Main(string[] args)
    {
        byte[] bytes = StringToByteArray(encoded);
        string text = Encoding.ASCII.GetString(bytes);
        Console.WriteLine(text);
        Console.ReadKey();
    }

    // From https://stackoverflow.com/questions/311165/how-do-you-convert-byte-array-to-hexadecimal-string-and-vice-versa
    public static byte[] StringToByteArray(String hex)
    {
        int NumberChars = hex.Length;
        byte[] bytes = new byte[NumberChars / 2];
        for (int i = 0; i < NumberChars; i += 2)
            bytes[i / 2] = Convert.ToByte(hex.Substring(i, 2), 16);
        return bytes;
    }
}
If you later wanted to encode the result as UTF8, you could then use:
Encoding.UTF8.GetBytes(text);
I've taken one implementation of the StringToByteArray conversion but there are many. If performance is important, you may want to choose a more efficient one. See the links below for more info.
On byte to string conversion (some interesting discussions on performance):
How do you convert Byte Array to Hexadecimal String, and vice versa?
How can I convert a hex string to a byte array?
On strings in .NET
Determine a string's encoding in C#
http://csharpindepth.com/Articles/General/Strings.aspx

How to UNHEX() MySQL binary string in C# .NET?

I need to use HEX() in MySQL to get data out of the database and process it in C# WinForms code. The binary string needs to be decoded in C#; is there an equivalent to the UNHEX() function?
From MySQL Doc:
For a string argument str, HEX() returns a hexadecimal string representation of str where each byte of each character in str is converted to two hexadecimal digits. (Multi-byte characters therefore become more than two digits.) The inverse of this operation is performed by the UNHEX() function.
For a numeric argument N, HEX() returns a hexadecimal string representation of the value of N treated as a longlong (BIGINT) number. This is equivalent to CONV(N,10,16). The inverse of this operation is performed by CONV(HEX(N),16,10).
mysql> SELECT 0x616263, HEX('abc'), UNHEX(HEX('abc'));
        -> 'abc', 616263, 'abc'
mysql> SELECT HEX(255), CONV(HEX(255),16,10);
        -> 'FF', 255
You can use the not-widely-known SoapHexBinary class to parse a hex string:
string hex = "616263";
var byteArr = System.Runtime.Remoting.Metadata.W3cXsd2001.SoapHexBinary.Parse(hex).Value;
var str = Encoding.UTF8.GetString(byteArr);
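As an aside not taken from the original answer: on .NET 5 or later the hex-parsing step can also be done without the SOAP types, which are unavailable on .NET Core:
byte[] byteArr2 = Convert.FromHexString("616263"); // .NET 5+, equivalent to SoapHexBinary.Parse(hex).Value
string str2 = Encoding.UTF8.GetString(byteArr2);   // "abc"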
After fetching the binary string from the database, you can "unhex" it this way:
public static string Hex2String(string input)
{
    var builder = new StringBuilder();
    for (int i = 0; i < input.Length; i += 2) // throws an exception if not properly formatted
    {
        string hexdec = input.Substring(i, 2);
        int number = Int32.Parse(hexdec, NumberStyles.HexNumber);
        char charToAdd = (char)number;
        builder.Append(charToAdd);
    }
    return builder.ToString();
}
The method walks the hexadecimal string two digits at a time, parses each pair as a number, and appends that number's char representation to the builder.
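Hypothetical usage (with the caveat that casting each parsed value straight to char only round-trips single-byte character data):
string text = Hex2String("616263"); // "abc"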

"Specified value has invalid Control characters" when converting SHA512 output to string

I am attempting to create a hash for an API.
My input is something like this:
FBN|Web|3QTC0001|RS1|260214133217|000000131127897656
And my expected output is like:
17361DU87HT56F0O9967E34FDFFDFG7UO334665324308667FDGJKD66F9888766DFKKJJR466634HH6566734JHJH34766734NMBBN463499876554234343432456
I tried the code below, but I keep getting
"Specified value has invalid Control characters. Parameter name: value"
I am actually doing this in a REST service.
public static string GetHash(string text)
{
    string hash = "";
    SHA512 alg = SHA512.Create();
    byte[] result = alg.ComputeHash(Encoding.UTF8.GetBytes(text));
    hash = Encoding.UTF8.GetString(result);
    return hash;
}
What am I missing?
The problem is Encoding.UTF8.GetString(result): the data in result is not valid UTF-8 (it's just binary goo!), so trying to convert it to text is invalid, in general and specifically for this input, which is what causes the exception to be thrown.
Instead, convert the byte[] to the hex representation of said byte sequence; don't treat it as UTF-8 encoded text.
See the questions How do you convert Byte Array to Hexadecimal String, and vice versa? and How can I convert a hex string to a byte array?, which discuss several different methods of achieving this task.
In order to make this work, you need to convert the individual byte elements into a hex representation:
var builder = new StringBuilder();
foreach (var b in result)
{
    builder.AppendFormat("{0:X2}", b);
}
return builder.ToString();
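As an aside not in the original answer: if you happen to be on .NET 5 or later, the same conversion is a built-in one-liner.
return Convert.ToHexString(result); // uppercase hex, equivalent to the StringBuilder loop above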
You might want to consider using Base64 encoding instead:
public static string GetHash(string text)
{
    SHA512 alg = SHA512.Create();
    byte[] result = alg.ComputeHash(Encoding.UTF8.GetBytes(text));
    return Convert.ToBase64String(result);
}
For your example string, the result is
OJgzW5JdC1IMdVfC0dH98J8tIIlbUgkNtZLmOZsjg9H0wRmwd02tT0Bh/uTOw/Zs+sgaImQD3hh0MlzVbqWXZg==
It has the advantage of being more compact than encoding each byte as two characters: three bytes take four characters in Base64 versus six characters in hex.
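To make the size difference concrete for a SHA-512 digest (a quick sketch, not from the original answer):
// A SHA-512 hash is 64 bytes: 88 Base64 characters (with padding) versus 128 hex characters.
Console.WriteLine(Convert.ToBase64String(new byte[64]).Length);                 // 88
Console.WriteLine(BitConverter.ToString(new byte[64]).Replace("-", "").Length); // 128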

C# Encoding.Convert Vs C++ MultiByteToWideChar

I have a C++ code snippet that uses MultiByteToWideChar to convert a UTF-8 string to UTF-16.
In C++, if the input is "HÃ´tel" (the raw UTF-8 bytes), the output is "Hôtel", which is correct.
In C#, if the input is "HÃ´tel", the output is still "HÃ´tel", which is not correct.
The C# code to convert from UTF8 to UTF16 looks like
Encoding.Unicode.GetString(
    Encoding.Convert(
        Encoding.UTF8,
        Encoding.Unicode,
        Encoding.UTF8.GetBytes(utf8)));
In C++ the conversion code looks like
MultiByteToWideChar(
    CP_UTF8,        // convert from UTF-8
    0,              // default flags
    utf8.data(),    // source UTF-8 string
    utf8.length(),  // length (in chars) of source UTF-8 string
    &utf16[0],      // destination buffer
    utf16.length()  // size of destination buffer, in wchar_t's
)
I want to have the same results in C# that I am getting in C++. Is there anything wrong with the C# code ?
It appears you want to treat the string characters as Windows-1252 (often mislabeled as ANSI) code points and have those code points decoded as UTF-8 bytes, where Windows-1252 code point == UTF-8 byte value.
The reason the accepted answer doesn't work is that it treats the string characters as Unicode code points rather than Windows-1252. It can get away with most characters because Windows-1252 maps them exactly the same as Unicode, but input containing characters like –, €, ™, ‘, ’, ”, • etc. will fail because Windows-1252 maps those differently from Unicode.
So what you want is simply this:
public static string doWeirdMapping(string arg)
{
    Encoding w1252 = Encoding.GetEncoding(1252);
    return Encoding.UTF8.GetString(w1252.GetBytes(arg));
}
Then:
Console.WriteLine(doWeirdMapping("HÃ´tel"));       // prints Hôtel
Console.WriteLine(doWeirdMapping("HVOLSVÃ–LLUR")); // prints HVOLSVÖLLUR
Maybe this one:
private static string Utf8ToUnicode(string input)
{
    return Encoding.UTF8.GetString(input.Select(item => (byte)item).ToArray());
}
Try this:
string str = "abc!";
Encoding unicode = Encoding.Unicode;
Encoding utf8 = Encoding.UTF8;
byte[] unicodeBytes = unicode.GetBytes(str);
byte[] utf8Bytes = Encoding.Convert( unicode,
utf8,
unicodeBytes );
Console.WriteLine( "UTF Bytes:" );
StringBuilder sb = new StringBuilder();
foreach( byte b in utf8Bytes ) {
sb.Append( b ).Append(" : ");
}
Console.WriteLine( sb.ToString() );
This link is helpful for understanding encodings and their conversions.
Use System.Text.Encoding.UTF8.GetString().
Pass in your UTF-8 encoded text as a byte array. The function returns a standard .NET string, which is encoded in UTF-16.
A sample function would look like this:
private string ReadData(Stream binary_file)
{
    System.Text.Encoding encoding = System.Text.Encoding.UTF8;
    // Read a string from the binary file with UTF-8 encoding
    byte[] buffer = new byte[30];
    binary_file.Read(buffer, 0, 30);
    return encoding.GetString(buffer);
}

String to raw byte array

I have a string which contains binary data (non-text data).
How do I convert this to a raw byte array?
A string in C# - by definition - does not contain binary data. It consists of a sequence of Unicode characters.
If your string contains only Unicode characters in the ASCII (7-bit) character set, you can use Encoding.ASCII to convert the string to bytes:
byte[] result = Encoding.ASCII.GetBytes(input);
If you have a string that contains Unicode characters in the range U+0000 to U+00FF and want to interpret these as bytes, you can cast the characters to bytes:
byte[] result = new byte[input.Length];
for (int i = 0; i < input.Length; i++)
{
    result[i] = (byte)input[i];
}
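An aside not in the original answer: for strings restricted to that U+0000 to U+00FF range, the cast loop is equivalent to encoding with ISO-8859-1 (Latin-1), which maps those code points straight to byte values:
byte[] latin1Bytes = Encoding.GetEncoding(28591).GetBytes(input); // ISO-8859-1: U+0000..U+00FF -> 0x00..0xFF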
It is a very bad idea to store binary data in a string. However, if you absolutely must do so, you can convert a binary string to a byte array using the 1252 code page. Do not use code page 0 (the system default), or you will lose some values on machines with foreign-language locales. It just so happens that code page 1252 properly converts all byte values from 0 to 255 to Unicode and back again.
There have been some poorly written VB6 programs that use binary strings. Unfortunately, some are so large that it is almost impossible to convert them all to Byte() arrays in one go.
You've been warned. Use at your own peril:
Dim bData() As Byte
Dim sData As String

' Convert string binary to byte array
bData = System.Text.Encoding.GetEncoding(1252).GetBytes(sData)

' Convert byte array to string binary
sData = System.Text.Encoding.GetEncoding(1252).GetString(bData)
Here is one way:
public static byte[] StrToByteArray(string str)
{
    System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
    return encoding.GetBytes(str);
}
System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
byte[] theBytes = encoding.GetBytes("Some String");
Note, there are other encoding formats you may want to use.
