C# - converting a stripped UTF encoded string back to UTF - c#

So, I have a string that is actually UTF encoded characters with the ASCII representation codes stripped out:
"537465616d6c696e6564"
This would be represented in ASCII encoded UTF as \x53\x74\x65 [...]
I've tried to Regexp replace in \x at the correct positions, byte encoding it and reading it back as UTF, but to no avail.
What's the most effective way of turning the ASCII string into readable UTF in C#?

So what I understand you have a string "537465616d6c696e6564" that actually means char[] chars = { '\x53', '\x74', ... }.
First convert this string to array of bytes How can I convert a hex string to a byte array?
For your convenience:
public static byte[] StringToByteArray(string hex) {
return Enumerable.Range(0, hex.Length)
.Where(x => x % 2 == 0)
.Select(x => Convert.ToByte(hex.Substring(x, 2), 16))
.ToArray();
}
Then there are many UTF encodings (UTF-8, UTF-16), C# internally uses UTF-16 (actually subset of it), so I assume that you want UTF-16 string:
string str = System.Text.Encoding.Unicode.GetString(array);
If you get incorrect characters after decoding you may also try UTF-8 encoding (just in case if you don't know exact encoding, Encoding.UTF8).

I don't know much about string encodings, but assuming that your original string is the hex representation of a series of bytes, you could do something like this:
class Program
{
private const string encoded = "537465616d6c696e6564";
static void Main(string[] args)
{
byte[] bytes = StringToByteArray(encoded);
string text = Encoding.ASCII.GetString(bytes);
Console.WriteLine(text);
Console.ReadKey();
}
// From https://stackoverflow.com/questions/311165/how-do-you-convert-byte-array-to-hexadecimal-string-and-vice-versa
public static byte[] StringToByteArray(String hex)
{
int NumberChars = hex.Length;
byte[] bytes = new byte[NumberChars / 2];
for (int i = 0; i < NumberChars; i += 2)
bytes[i / 2] = Convert.ToByte(hex.Substring(i, 2), 16);
return bytes;
}
}
If you later wanted to encode the result as UTF8, you could then use:
Encoding.UTF8.GetBytes(text);
I've taken one implementation of the StringToByteArray conversion but there are many. If performance is important, you may want to choose a more efficient one. See the links below for more info.
On byte to string conversion (some interesting discussions on performance):
How do you convert Byte Array to Hexadecimal String, and vice versa?
How can I convert a hex string to a byte array?
On strings in .NET
Determine a string's encoding in C#
http://csharpindepth.com/Articles/General/Strings.aspx

Related

How to convert Hex to Chinese ASCII character in c#? [duplicate]

I have the following code to convert from HEX to ASCII.
//Hexadecimal to ASCII Convertion
private static string hex2ascii(string hexString)
{
MessageBox.Show(hexString);
StringBuilder sb = new StringBuilder();
for (int i = 0; i <= hexString.Length - 2; i += 2)
{
sb.Append(Convert.ToString(Convert.ToChar(Int32.Parse(hexString.Substring(i, 2), System.Globalization.NumberStyles.HexNumber))));
}
return sb.ToString();
}
input hexString = D3FCC4A7B6FABBB7
output return = Óüħ¶ú»·
The output that I need is 狱魔耳环, but I am getting Óüħ¶ú»· instead.
How would I make it display the correct string?
First, convert the hex string to a byte[], e.g. using code at How do you convert Byte Array to Hexadecimal String, and vice versa?. Then use System.Text.Encoding.Unicode.GetString(myArray) (use proper encoding, might not be Unicode, but judging from your example it is a 16-bit encoding, which, incidentally, is not "ASCII", which is 7-bit) to convert it to a string.

"Specified value has invalid Control characters" when converting SHA512 output to string

I am attempting to create an Hash for an API.
my input is something like this:
FBN|Web|3QTC0001|RS1|260214133217|000000131127897656
And my expected output is like :
17361DU87HT56F0O9967E34FDFFDFG7UO334665324308667FDGJKD66F9888766DFKKJJR466634HH6566734JHJH34766734NMBBN463499876554234343432456
I tried the bellow but I keep getting
"Specified value has invalid Control characters. Parameter name: value"
I am actually doing this in a REST service.
public static string GetHash(string text)
{
string hash = "";
SHA512 alg = SHA512.Create();
byte[] result = alg.ComputeHash(Encoding.UTF8.GetBytes(text));
hash = Encoding.UTF8.GetString(result);
return hash;
}
What am I missing?
The problem is Encoding.UTF8.GetString(result) as the data in result is invalid UTF-8 (it's just binary goo!) so trying to convert it to text is invalid - in general, and specifically for this input - which results in the Exception being thrown.
Instead, convert the byte[] to the hex representation of said byte sequence; don't treat it as UTF-8 encoded text.
See the questions How do you convert Byte Array to Hexadecimal String, and vice versa? and How can I convert a hex string to a byte array?, which discuss several different methods of achieving this task.
In order to make this work you need to convert the individual byte elements into a hex representation
var builder = new StringBuilder();
foreach(var b in result) {
builder.AppendFormat("{0:X2}", b);
}
return builder.ToString();
You might want to consider using Base64 encoding (AKA UUEncode):
public static string GetHash(string text)
{
SHA512 alg = SHA512.Create();
byte[] result = alg.ComputeHash(Encoding.UTF8.GetBytes(text));
return Convert.ToBase64String(result);
}
For your example string, the result is
OJgzW5JdC1IMdVfC0dH98J8tIIlbUgkNtZLmOZsjg9H0wRmwd02tT0Bh/uTOw/Zs+sgaImQD3hh0MlzVbqWXZg==
It has an advantage of being more compact than encoding each byte into two characters: three bytes takes four characters with Base64 encoding or six characters the other way.

Error in converting from Encrypted value

I have a encryption method GetDecryptedSSN(). I tested it’s correctness by the following test. It works fine
//////////TEST 2//////////
byte[] encryptedByteWithIBMEncoding2 = DecryptionServiceHelper.GetEncryptedSSN("123456789");
string clearTextSSN2 = DecryptionServiceHelper.GetDecryptedSSN(encryptedByteWithIBMEncoding2);
But when I do a conversion to ASCII String and then back, it is not working correctly. What is the problem in the conversion logic?
//////////TEST 1//////////
//String -- > ASCII Byte --> IBM Byte -- > encryptedByteWithIBMEncoding
byte[] encryptedByteWithIBMEncoding = DecryptionServiceHelper.GetEncryptedSSN("123456789");
//encryptedByteWithIBMEncoding --> Encrypted Byte ASCII
string EncodingFormat = "IBM037";
byte[] encryptedByteWithASCIIEncoding = Encoding.Convert(Encoding.GetEncoding(EncodingFormat), Encoding.ASCII,
encryptedByteWithIBMEncoding);
//Encrypted Byte ASCII - ASCII Encoded string
string encodedEncryptedStringInASCII = System.Text.ASCIIEncoding.ASCII.GetString(encryptedByteWithASCIIEncoding);
//UpdateSSN(encodedEncryptedStringInASCII);
byte[] dataInBytesASCII = System.Text.ASCIIEncoding.ASCII.GetBytes(encodedEncryptedStringInASCII);
byte[] bytesInIBM = Encoding.Convert(Encoding.ASCII, Encoding.GetEncoding(EncodingFormat),
dataInBytesASCII);
string clearTextSSN = DecryptionServiceHelper.GetDecryptedSSN(bytesInIBM);
Helper Class
public static class DecryptionServiceHelper
{
public const string EncodingFormat = "IBM037";
public const string SSNPrefix = "0000000";
public const string Encryption = "E";
public const string Decryption = "D";
public static byte[] GetEncryptedSSN(string clearTextSSN)
{
return GetEncryptedID(SSNPrefix + clearTextSSN);
}
public static string GetDecryptedSSN(byte[] encryptedSSN)
{
return GetDecryptedID(encryptedSSN);
}
private static byte[] GetEncryptedID(string id)
{
ServiceProgram input = new ServiceProgram();
input.RequestText = Encodeto64(id);
input.RequestType = Encryption;
ProgramInterface inputRequest = new ProgramInterface();
inputRequest.Test__Request = input;
using (MY_Service operation = new MY_Service())
{
return ((operation.MY_Operation(inputRequest)).Test__Response.ResponseText);
}
}
private static string GetDecryptedID(byte[] id)
{
ServiceProgram input = new ServiceProgram();
input.RequestText = id;
input.RequestType = Decryption;
ProgramInterface request = new ProgramInterface();
request.Test__Request = input;
using (MY_Service operationD = new MY_Service())
{
ProgramInterface1 response = operationD.MY_Operation(request);
byte[] encodedBytes = Encoding.Convert(Encoding.GetEncoding(EncodingFormat), Encoding.ASCII,
response.Test__Response.ResponseText);
return System.Text.ASCIIEncoding.ASCII.GetString(encodedBytes);
}
}
private static byte[] Encodeto64(string toEncode)
{
byte[] dataInBytes = System.Text.ASCIIEncoding.ASCII.GetBytes(toEncode);
Encoding encoding = Encoding.GetEncoding(EncodingFormat);
return Encoding.Convert(Encoding.ASCII, encoding, dataInBytes);
}
}
REFERENCE:
Getting incorrect decryption value using AesCryptoServiceProvider
This is the problem, I suspect:
string encodedEncryptedStringInASCII =
System.Text.ASCIIEncoding.ASCII.GetString(encryptedByteWithASCIIEncoding);
(It's not entirely clear because of all the messing around with encodings beforehand, which seems pointless to me, but...)
The result of encryption is not "text encoded in ASCII" - so you shouldn't try to treat it that way. (You haven't said what kind of encryption you're using, but it would be very odd for it to produce ASCII text.)
It's just an arbitrary byte array. In order to represent that in text only using the ASCII character set, the most common approach is to use base64. So the above code would become:
string encryptedText = Convert.ToBase64(encryptedByteWithIBMEncoding);
Then later, you'd convert it back to a byte array ready for decryption as:
encryptedByteWithIBMEncoding = Convert.FromBase64String(encryptedText);
I would strongly advise you to avoid messing around with the encodings like this if you can help it though. It's not clear why ASCII needs to get involved at all. If you really want to encode your original text as IBM037 before encryption, you should just use:
Encoding encoding = Encoding.GetEncoding("IBM037");
string unencryptedBinary = encoding.GetBytes(textInput);
Personally I'd usually use UTF-8, as an encoding which can handle any character data rather than just a limited subset, but that's up to you. I think you're making the whole thing much more complicated than it needs to be though.
A typical "encrypt a string, getting a string result" workflow is:
Convert input text to bytes using UTF-8. The result is a byte array.
Encrypt result of step 1. The result is a byte array.
Convert result of step 2 into base64. The result is a string.
To decrypt:
Convert the string from base64. The result is a byte array.
Decrypt the result of step 1. The result is a byte array.
Convert the result of step 2 back to a string using the same encoding as step 1 of the encryption process.
In DecryptionServiceHelper.GetEncryptedSSN you are encoding the text in IBM037 format BEFORE encrypting.
So the following piece of code is not correct as you are converting the encrypted bytes to ASCII assuming that its in the IBM037 format. That's wrong as the encrypted bytes is not in IBM037 format (the text was encoded before encryption)
//encryptedByteWithIBMEncoding --> Encrypted Byte ASCII
string EncodingFormat = "IBM037";
byte[] encryptedByteWithASCIIEncoding = Encoding.Convert(Encoding.GetEncoding(EncodingFormat), Encoding.ASCII,
encryptedByteWithIBMEncoding);
One possible solution is to encode the encrypted text using IBM037 format, that should fix the issue I guess.

C# Encoding.Convert Vs C++ MultiByteToWideChar

I have a C++ code snippet that uses MultiByteToWideChar to convert UTF-8 string to UTF-16
For C++, if input is "Hôtel", the output is "Hôtel" which is correct
For C#, if input is "Hôtel", the output is "Hôtel" which is not correct.
The C# code to convert from UTF8 to UTF16 looks like
Encoding.Unicode.GetString(
Encoding.Convert(
Encoding.UTF8,
Encoding.Unicode,
Encoding.UTF8.GetBytes(utf8)));
In C++ the conversion code looks like
MultiByteToWideChar(
CP_UTF8, // convert from UTF-8
0, // default flags
utf8.data(), // source UTF-8 string
utf8.length(), // length (in chars) of source UTF-8 string
&utf16[0], // destination buffer
utf16.length() // size of destination buffer, in wchar_t's
)
I want to have the same results in C# that I am getting in C++. Is there anything wrong with the C# code ?
It appears you want to treat string characters as Windows-1252 (Often mislabeled as ANSI) code points, and have those code points decoded as UTF-8 bytes, where Windows-1252 code point == UTF-8 byte value.
The reason the accepted answer doesn't work is that it treats the string characters as unicode code points, rather than
Windows-1252. It can get away with most characters because Windows-1252 maps them exactly the same as unicode, but input with characters
like –, €, ™, ‘, ’, ”, • etc.. will fail because Windows-1252 maps those differently than unicode in this sense.
So what you want is simply this:
public static string doWeirdMapping(string arg)
{
Encoding w1252 = Encoding.GetEncoding(1252);
return Encoding.UTF8.GetString(w1252.GetBytes(arg));
}
Then:
Console.WriteLine(doWeirdMapping("Hôtel")); //prints Hôtel
Console.WriteLine(doWeirdMapping("HVOLSVÖLLUR")); //prints HVOLSVÖLLUR
Maybe this one:
private static string Utf8ToUnicode(string input)
{
return Encoding.UTF8.GetString(input.Select(item => (byte)item).ToArray());
}
Try This
string str = "abc!";
Encoding unicode = Encoding.Unicode;
Encoding utf8 = Encoding.UTF8;
byte[] unicodeBytes = unicode.GetBytes(str);
byte[] utf8Bytes = Encoding.Convert( unicode,
utf8,
unicodeBytes );
Console.WriteLine( "UTF Bytes:" );
StringBuilder sb = new StringBuilder();
foreach( byte b in utf8Bytes ) {
sb.Append( b ).Append(" : ");
}
Console.WriteLine( sb.ToString() );
This Link would be helpful for you to understand about encodings and their conversions
Use System.Text.Encoding.UTF8.GetString().
Pass in your UTF-8 encoded text, as a byte array. The function returns a standard .net string which is encoded in UTF-16.
Sample function will be as below:
private string ReadData(Stream binary_file) {
System.Text.Encoding encoding = System.Text.Encoding.UTF8;
// Read string from binary file with UTF8 encoding
byte[] buffer = new byte[30];
binary_file.Read(buffer, 0, 30);
return encoding.GetString(buffer);
}

String to raw byte array

I have a string which contains binary data (non-text data).
How do I convert this to a raw byte array?
A string in C# - by definition - does not contain binary data. It consists of a sequence of Unicode characters.
If your string contains only Unicode characters in the ASCII (7-bit) character set, you can use Encoding.ASCII to convert the string to bytes:
byte[] result = Encoding.ASCII.GetBytes(input);
If you have a string that contains Unicode characters in the range u0000-u00ff and want to interpret these as bytes, you can cast the characters to bytes:
byte[] result = new byte[input.Length];
for (int i = 0; i < input.Length; i++)
{
result[i] = (byte)input[i];
}
It is a very bad idea to store binary data in a string. However, if you absolutely must do so, you can convert a binary string to a byte array using the 1252 code page. Do not use code page 0 or you will lose some values when in foreign languages. It just so happens that code page 1252 properly converts all byte values from 0 to 255 to Unicode and back again.
There have been some poorly written VB6 programs that use binary strings. Unfortunately some are so many lines of code that it is almost impossible to convert it all to byte() arrays in one go.
You've been warned. Use at your own peril:
Dim bData() As Byte
Dim sData As String
'convert string binary to byte array
bData = System.Text.Encoding.GetEncoding(1252).GetBytes(sData)
'convert byte array to string binary
sData = System.Text.Encoding.GetEncoding(1252).GetString(bData)
Here is one way:
public static byte[] StrToByteArray(string str)
{
System.Text.ASCIIEncoding encoding=new System.Text.ASCIIEncoding();
return encoding.GetBytes(str);
}
System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
byte[] theBytes = encoding.GetBytes("Some String");
Note, there are other encoding formats you may want to use.

Categories