I know how to convert from a string to byte[] in C#. In this particular case, I'm working with the string representation of an HMAC-SHA256 key. Let's say the string representation of this key I get from an OpenID OP is:
"81FNybKWfcM539vVGtJrXRmoVMxNmZHY3OgUro8+pZ8="
I convert it to byte[] like this:
byte[] myByteArr = Encoding.UTF8.GetBytes("81FNybKWfcM539vVGtJrXRmoVMxNmZHY3OgUro8+pZ8=");
The problem I have with that is that it seems to be losing the original data. If I take the byte array from the previous step and convert it back to a string, it's different from the original string.
string check = Convert.ToBase64String(myByteArr);
check ends up being:
"ODFGTnliS1dmY001Mzl2Vkd0SnJYUm1vVk14Tm1aSFkzT2dVcm84K3BaOD0="
which is obviously not the same as the original string representation I started with.
With crypto keys, always use Convert.FromBase64String and Convert.ToBase64String. That way you'll be doing it the standard way and will not lose bytes to encoding problems. A Base64 string may not be space-efficient, but it is the preferred method for storing and transporting keys in many schemes.
Here is a quick verification:
byte[] myByteArr = Convert.FromBase64String("81FNybKWfcM539vVGtJrXRmoVMxNmZHY3OgUro8+pZ8=");
string check = Convert.ToBase64String(myByteArr);
Console.WriteLine(check);
// Writes: 81FNybKWfcM539vVGtJrXRmoVMxNmZHY3OgUro8+pZ8=
The first function (Encoding.UTF8.GetBytes) takes a string (of any kind) and returns a byte[] that represents that string in a particular encoding -- in your case, UTF8.
The second function (Convert.ToBase64String) takes a byte array (of any kind) and returns a string in base64 format so that you can store this binary data in any ASCII-compatible field using only printable characters.
These functions are not counterparts. It looks like the string you're getting is a base64-encoded string. If this is the case, use Convert.FromBase64String to get the byte[] that it represents, not Encoding.UTF8.GetBytes.
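To see the asymmetry concretely, here is a quick sketch using the key from the question:

string key = "81FNybKWfcM539vVGtJrXRmoVMxNmZHY3OgUro8+pZ8=";

// Encodes the characters of the string itself: 44 bytes, one per ASCII character.
byte[] utf8Bytes = Encoding.UTF8.GetBytes(key);

// Decodes the base64 text back to the data it represents: 32 bytes,
// exactly the size of an HMAC-SHA256 key.
byte[] keyBytes = Convert.FromBase64String(key);

Console.WriteLine(utf8Bytes.Length); // 44
Console.WriteLine(keyBytes.Length);  // 32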
The bytes you get from Encoding.GetBytes(string) and the bytes you get by decoding a base64 string are not the same thing. The former gives you the bytes that represent the string's characters in a particular encoding. You, however, want to decode a base64 string back to the bytes it represents; in that case you want Convert.FromBase64String().
string encoded = "81FNybKWfcM539vVGtJrXRmoVMxNmZHY3OgUro8+pZ8=";
byte[] decoded = Convert.FromBase64String(encoded); // this gives the bytes that the encoded string represents
The encoding classes have a GetString method to convert a byte array back to a string.
If you used the UTF8 encoding to create the byte array, you should use the same encoding to get it back again.
var original = "81FNybKWfcM539vVGtJrXRmoVMxNmZHY3OgUro8+pZ8=";
var byteArray = Encoding.UTF8.GetBytes(original);
var copy = Encoding.UTF8.GetString(byteArray);
bool match = (copy == original); // This returns true
I'm trying to convert some strings from UTF-16 LE to UTF-16 BE, but it fails to encode the second Chinese character.
Sample string: test馨俞
Code:
byte[] bytes = Encoding.Unicode.GetBytes(sendMsg.Text);
sendMsg.Text = Encoding.BigEndianUnicode.GetString(bytes);
I've also tried
var encode = new UnicodeEncoding(false, true, true);
var messageAsBytes = encode.GetBytes(sendMsg.Text);
var enc = new UnicodeEncoding(true, true, true);
sendMsg.Text = enc.GetString(messageAsBytes);
Which results in the following error: Unable to translate bytes [DE][4F] at index 184 from specified code page to Unicode on the line:
sendMsg.Text = enc.GetString(messageAsBytes);
Thanks.
I think you should process your input string with the BigEndianUnicode class.
I made this code from the one you provided. It works fine, without error:
String input = "馨俞";
var messageAsBytes = Encoding.BigEndianUnicode.GetBytes(input);
input = Encoding.BigEndianUnicode.GetString(messageAsBytes);
If I process "input" with Encoding.Unicode, and print out both byte arrays (the one processed with unicode and the one with big endian), it show the differences:
So, input is converted to the endian you need.
The result of encoding a string is a byte array, not another string.
Just use
byte[] bytes = Encoding.BigEndianUnicode.GetBytes(sendMsg.Text);
to encode the string to bytes using the UTF-16 BE encoding.
Then send those bytes to the mainframe.
How you send those bytes to the mainframe may be the topic of another question, but it sounds like you somehow need to present those encoded bytes in a variable of type string. That sounds like a bug in the library you are using. We would need to understand the nature of that library and its possible bug to find a workaround. One option you could try, but it's a shot in the dark, is this:
string toSend = Encoding.Default.GetString(bytes);
That will produce a string where each character is the representation of one byte from the encoded string, in UTF-16 BE order. Its length will be double the length of the original string.
I got it working by setting this property, without any conversion (code page 1201 corresponds to big-endian UTF-16):
sendMsg.SetIntProperty(XMSC.JMS_IBM_CHARACTER_SET, 1201);
var test = "sdfsdfsdfasfwerqwer";
var q = UTF8Encoding.UTF8.GetBytes(test);
var sha256 = SHA256.Create();
var hash = sha256.ComputeHash(q);
var z = UTF8Encoding.UTF8.GetString(hash);
var t = UTF8Encoding.UTF8.GetBytes(z);
In the above example, hash and t have different values. Why is this?
hash is not a UTF-8 encoded byte array, just arbitrary bytes. Note that not all byte arrays are valid UTF-8; UTF-8 has its own rules. Therefore, it cannot necessarily be decoded into a string. (Specifically, .NET decodes invalid byte sequences into the Unicode replacement character, which then re-encodes to different bytes -- that's why t differs from hash.)
You can try a regular 8-bit encoding which supports all possible byte arrays, like ISO-8859-1. Of course you will still get garbage when you try to read that as a string, but it should work back and forth.
If you are trying to transfer an arbitrary byte array as a string, I suggest you use Base64 encoding, which converts a byte array to an ASCII string and should be safe in all circumstances.
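A minimal sketch of both options, using the hash from the question as the arbitrary binary data:

byte[] hash = SHA256.Create().ComputeHash(Encoding.UTF8.GetBytes("sdfsdfsdfasfwerqwer"));

// ISO-8859-1 maps every byte value 0-255 to a character, so it round-trips.
var latin1 = Encoding.GetEncoding("ISO-8859-1");
string asLatin1 = latin1.GetString(hash);
byte[] backAgain = latin1.GetBytes(asLatin1);           // same bytes as hash

// Base64 round-trips too, and the result is printable ASCII.
string asBase64 = Convert.ToBase64String(hash);
byte[] fromBase64 = Convert.FromBase64String(asBase64); // same bytes as hash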
I'm getting a signed payload from an authentication source that comes in a base64 encoded and URL encoded format. I'm getting confused somewhere while evaluating, and ending up with similar data in different 'formats'.
Here's my code:
//Split the message to payload and signature
string[] split = raw_message.Split('.');
//Payload
string base64_payload = WebUtility.UrlDecode(split[0]);
byte[] payload = Convert.FromBase64String(base64_payload);
//Expected signature
string base64_expected_sig = WebUtility.UrlDecode(split[1]);
byte[] expected_sig = Convert.FromBase64String(base64_expected_sig);
//Signature
byte[] signature = hmacsha256.ComputeHash(payload);
//Output as a string
var foo = System.Text.Encoding.UTF8.GetString(expected_sig);
var bar = BitConverter.ToString(signature);
The expected signature (foo) comes out like so:
76eba09fcb54877299dcbd1e1e35717e3bd42e066e7ecdb131c7d0161dec3418
The computed signature (bar) is as follows:
76-EB-A0-9F-CB-54-87-72-99-DC-BD-1E-1E-35-71-7E-3B-D4-2E-06-6E-7E-CD-B1-31-C7-D0-16-1D-EC-34-18
Obviously, when comparing bytes for bytes, this doesn't work.
I see that I'm having to convert the expected_sig and the signature in different ways to get them to display as a string, but I can't figure out how I need to change the expected signature to get to where I can compare bytes for bytes.
I can obviously work around the issue by simply converting the string bar, but that's dirty and I just don't like it.
Where am I going wrong here? What am I not understanding?
The good news is that the hash computation appears to be working.
The bad news is that you're receiving the hash in a brain-dead fashion. For some reason it seems that the authors decided it was a good idea to:
1. Compute the hash (fine)
2. Convert this binary data to text as hex (fine)
3. Convert the hex back into binary data by applying an ASCII/UTF-8/anything-ASCII-compatible encoding (why?)
4. Convert the result back into text using base64 (what?)
5. URL-encode the result (which wouldn't even be necessary with hex...)
Using either base64 or hex on the original binary makes sense, but applying both is crazy.
Anyway, it's fairly easy for you to do the same thing. For example:
// Step 2: hex-encode the hash (Select requires System.Linq)
string hexSignature = string.Join("", signature.Select(b => b.ToString("x2")));
// Step 3: treat that hex text as bytes
byte[] hexSignatureUtf8 = Encoding.UTF8.GetBytes(hexSignature);
// Step 4: base64-encode those bytes
string finalSignature = Convert.ToBase64String(hexSignatureUtf8);
That should now match WebUtility.UrlDecode(split[1]).
Alternatively, you can work backwards from what's in the result, but I wouldn't go as far as parsing the hex back to bytes - it would be simpler to keep the first line of the above, but use:
string expectedHexBase64 = WebUtility.UrlDecode(split[1]);
byte[] expectedHexUtf8 = Convert.FromBase64String(expectedHexBase64);
string expectedHex = Encoding.UTF8.GetString(expectedHexUtf8);
Then compare it with hexSignature.
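Putting the two snippets together, the final check is then a plain string comparison (for real signature verification, a constant-time comparison would be the more careful choice):

bool signatureMatches = (expectedHex == hexSignature);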
Ideally, you should talk to whoever's providing you with the crazy format and hit them with a cluestick though...
I'm developing an application where at some point I need an encoded stream of bytes based on the given user input.
Something like
Encoding sysEncode = System.Text.Encoding.GetEncoding(850);
byte[] dataToEncrypt = sysEncode.GetBytes(m_oStrActivation);
However, when I extract the string from the byte stream, I get the encrypted string as
W?????e?????W?X????;??2????W???????#
Is there any way (a type of Encoding or equivalent) I can avoid these question marks and allow only plain scrambled alphanumeric characters?
From m_oStrActivation and your mentioning "encryption", I assume you're writing some kind of activation/licensing code. If so, you're doing it wrong. The correct way to do this is to use a hash function over your activation data.
You can then transform the resulting hash bytes into a Base64 string using the Convert.ToBase64String() method.
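For example, a sketch of that approach (SHA256 is just one reasonable choice of hash; m_oStrActivation is the activation data from the question):

byte[] activationBytes = Encoding.UTF8.GetBytes(m_oStrActivation);
byte[] hash = SHA256.Create().ComputeHash(activationBytes);

// Base64 yields only printable ASCII characters -- no question marks.
string activationCode = Convert.ToBase64String(hash);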
Is it possible to simplify this code into a cleaner/faster form?
StringBuilder builder = new StringBuilder();
var encoding = Encoding.GetEncoding(936);
// convert the text into a byte array
byte[] source = Encoding.Unicode.GetBytes(text);
// convert that byte array to the new codepage.
byte[] converted = Encoding.Convert(Encoding.Unicode, encoding, source);
// take multi-byte characters and encode them as separate ascii characters
foreach (byte b in converted)
builder.Append((char)b);
// return the result
string result = builder.ToString();
Simply put, it takes a string with Chinese characters such as 鄆 and converts them to ài.
For example, that Chinese character in decimal is 37126 or 0x9106 in hex.
See http://unicodelookup.com/#0x9106/1
Converted to a byte array, we get [145, 6] (145 * 256 + 6 = 37126). When encoded in CodePage 936 (Simplified Chinese), we get [224, 105]. If we break this byte array down into individual characters, we get 224 = 0xE0 = à and 105 = 0x69 = i in Unicode.
See http://unicodelookup.com/#0x00e0/1
and
http://unicodelookup.com/#0x0069/1
Thus, we're doing an encoding conversion and ensuring that all characters in our output Unicode string can be represented using at most two bytes.
Update: I need this final representation because this is the format my receipt printer is accepting. Took me forever to figure it out! :) Since I'm not an encoding expert, I'm looking for simpler or faster code, but the output must remain the same.
Update (Cleaner version):
return Encoding.GetEncoding("ISO-8859-1").GetString(Encoding.GetEncoding(936).GetBytes(text));
Well, for one, you don't need to convert the "built-in" string representation to a byte array before calling Encoding.Convert.
You could just do:
byte[] converted = Encoding.GetEncoding(936).GetBytes(text);
To then reconstruct a string from that byte array whereby the char values directly map to the bytes, you could do...
static string MangleTextForReceiptPrinter(string text) {
return new string(
Encoding.GetEncoding(936)
.GetBytes(text)
.Select(b => (char) b)
.ToArray());
}
I wouldn't worry too much about efficiency; how many MB/sec are you going to print on a receipt printer anyhow?
Joe pointed out that there's an encoding that directly maps byte values 0-255 to code points, and it's age-old Latin1, which allows us to shorten the function to...
return Encoding.GetEncoding("Latin1").GetString(
Encoding.GetEncoding(936).GetBytes(text)
);
By the way, if this is a buggy Windows-only API (which it is, by the looks of it), you might be dealing with code page 1252 instead (which is almost identical). You might try Reflector to see what it's doing with your System.String before it sends it over the wire.
Almost anything would be cleaner than this - you're really abusing text here, IMO. You're trying to represent effectively opaque binary data (the encoded text) as text data... so you'll potentially get things like bell characters, escapes etc.
The normal way of encoding opaque binary data in text is base64, so you could use:
return Convert.ToBase64String(Encoding.GetEncoding(936).GetBytes(text));
The resulting text will be entirely ASCII, which is much less likely to cause you hassle.
EDIT: If you need that output, I would strongly recommend that you represent it as a byte array instead of as a string... pass it around as a byte array from that point onwards, so you're not tempted to perform string operations on it.
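For instance, a sketch of what that might look like (EncodeForPrinter is a hypothetical helper name):

// Keep the encoded text as opaque bytes; only the printer should ever see them.
static byte[] EncodeForPrinter(string text)
{
    return Encoding.GetEncoding(936).GetBytes(text);
}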
Does your receipt printer have an API that accepts a byte array rather than a string?
If so you may be able to simplify the code to a single conversion, from a Unicode string to a byte array using the encoding used by the receipt printer.
Also, if you want to convert an array of bytes to a string whose character values correspond 1-1 to the values of the bytes, you can use the code page 28591 aka Latin1 aka ISO-8859-1.
I.e., the following
foreach (byte b in converted)
builder.Append((char)b);
string result = builder.ToString();
can be replaced by:
// All three of the following are equivalent
// string result = Encoding.GetEncoding(28591).GetString(converted);
// string result = Encoding.GetEncoding("ISO-8859-1").GetString(converted);
string result = Encoding.GetEncoding("Latin1").GetString(converted);
Latin1 is a useful encoding when you want to encode binary data in a string, e.g. to send through a serial port.
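A quick round-trip sketch of that property (every byte value 0-255 survives):

byte[] data = { 0x00, 0x7F, 0x80, 0xFF }; // arbitrary binary data
string asString = Encoding.GetEncoding(28591).GetString(data);
byte[] roundTripped = Encoding.GetEncoding(28591).GetBytes(asString);
// roundTripped now contains exactly the same bytes as data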