Encode Unicode string as byte array in C++ and C#

I have C++ code which I want to rewrite to C#. This part
case ID_TYPE_UNICODE_STRING:
    if (items[i].GetUString().length() > 0xFFFF)
        throw dppError("error");
    // GetUString returns a std::wstring object
    DataSize = (WORD)(sizeof(WCHAR) * items[i].GetUString().length());
    blob.AppendData((const BYTE *)&DataSize, sizeof(WORD)); // blob is a byte array
    blob.AppendData((const BYTE *)items[i].GetUString().c_str(), DataSize);
    break;
This basically serializes the length in bytes of a Unicode string, followed by the string itself, into a byte array.
Here comes my problem (this code then sends the data to a server): I don't know which encoding is used in the lines above (UTF-16, UTF-8, etc.), so I don't know the best way to reimplement this in C#.
How can I find out which encoding is used in this C++ project?
And if I can't determine the encoding used in the C++ project, then assuming the endianness is the same as stated in the accepted answer of this question, do you think the two methods (GetBytes and GetString) from that accepted answer will work for me (for serializing the Unicode string as the C++ project does, and retrieving it back)? e.g.
these two:
static byte[] GetBytes(string str)
{
    // Copies the raw UTF-16 code units of the string into a byte array.
    byte[] bytes = new byte[str.Length * sizeof(char)];
    System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
    return bytes;
}

static string GetString(byte[] bytes)
{
    // Reinterprets the byte array as UTF-16 code units.
    char[] chars = new char[bytes.Length / sizeof(char)];
    System.Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
    return new string(chars);
}
Or am I better off learning which encoding is used in the C++ project?
I will then need to reconstruct the string from the byte array in the same way too. And if I am better off learning which encoding was used in C++, how do I get the length of the string in bytes in C#? Using System.Text.Encoding.WhateverEncodingWasUsedInCpp.GetByteCount(string)?
PS: Do you think the C++ code works in an encoding-agnostic way? If yes, how can I replicate that in C#?
UPDATE: I am guessing the encoding used is UTF-16, because I saw it mentioned in several variable names, so I will assume UTF-16 and look for alternative solutions if something doesn't work out during testing. In that case, what is the best way to get the number of bytes of a UTF-16 string? Is the following OK: System.Text.Encoding.Unicode.GetByteCount(string)?
Feedback and comments are welcome. Am I wrong somewhere in my reasoning? Thanks.

Change the method like this to get the byte[] equivalent of the input string:
static byte[] GetBytes(string str)
{
    // UnicodeEncoding encodes the string as UTF-16 little-endian bytes.
    UnicodeEncoding uEncoding = new UnicodeEncoding();
    byte[] stringContentBytes = uEncoding.GetBytes(str);
    return stringContentBytes;
}
And for the reverse:
static string GetString(byte[] bytes)
{
    // Decodes UTF-16 little-endian bytes back into a string.
    UnicodeEncoding uEncoding = new UnicodeEncoding();
    string stringContent = uEncoding.GetString(bytes);
    return stringContent;
}

Related

SHA-256 NodeJS vs .NET C#

I'm using SHA-256 with Hex to hash some text. However I am finding that the hashed text in my Node implementation differs from my .NET C# implementation.
In NodeJS I have the following:
return crypto.createHash('sha256').update(text).digest('hex');
and in .NET C# I have:
private static string Hash(string text)
{
    byte[] bytes = Encoding.Unicode.GetBytes(text);
    using (var generator = new SHA256Managed())
    {
        byte[] hash = generator.ComputeHash(bytes);
        return BytesToHex(hash);
    }
}
private static string BytesToHex(byte[] bytes)
{
    var hex = new StringBuilder(bytes.Length * 2);
    foreach (byte b in bytes)
    {
        hex.AppendFormat("{0:x2}", b);
    }
    return hex.ToString();
}
return Hash(text);
What have I done wrong in the NodeJS version?
This means that when I try to use my NodeJS app with hashes created by my .NET app, the hashes don't match up!
Update: Apparently this could be due to charset...
So I tried:
return crypto.createHash('sha256').update(text, 'utf8').digest('hex');
But the hash is the same as before? So using utf8 instead of binary hasn't actually made any difference to the returned hash... And it's still mismatched from the .NET version.
The problem is a mismatch in the encoding used by the two systems: Encoding.Unicode uses UTF-16, not UTF-8. The update you made on the NodeJS side is still required, since omitting the input_encoding parameter means it defaults to binary. However, you also need to update the .NET side to use UTF-8 encoding:
byte[] bytes = Encoding.UTF8.GetBytes(text);
Alternatively, you can update the Node side to use UTF-16 via the undocumented ucs2 encoding:
crypto.createHash('sha256').update(text, 'ucs2')
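For reference, a minimal C# sketch of the UTF-8 variant (the only substantive change from the question's code is the encoding; the method name is just for illustration):
using System.Security.Cryptography;
using System.Text;

private static string HashUtf8(string text)
{
    // Hash the UTF-8 bytes so the result matches
    // crypto.createHash('sha256').update(text, 'utf8').digest('hex') in Node.
    byte[] bytes = Encoding.UTF8.GetBytes(text);
    using (var sha256 = SHA256.Create())
    {
        byte[] hash = sha256.ComputeHash(bytes);
        var hex = new StringBuilder(hash.Length * 2);
        foreach (byte b in hash)
        {
            hex.AppendFormat("{0:x2}", b);
        }
        return hex.ToString();
    }
}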

Get Byte array from very large integer represented as string

I have a very large prime number (for RSA purposes) that needs to be converted to a byte array. The number however is currently stored as a string. I'm OK with storing it as a byte[] but either way the number is a string and I have to get it into a byte array.
Now to be clear I have used the RSA encryption and decryption sample data provided on MSDN and everything works so I have a high degree of confidence that the encryption portion is fine. Further the samples provided by MSDN provide prime numbers that have already been turned into byte[]. Thus I have a high degree of confidence that the breakdown is in MY conversion of the string representation of the number to a byte[].
I currently do this:
private static string _publicKeyExponent = "12345...310 digits......9876";
private static string _publicKeyModulus = "654782....620 digits.....4576";
_rsaPublicKey.Exponent = CoreHelpers.GetBytes(_publicKeyExponent);
And here is my GetBytes method that I suspect is causing the issue as it is getting the bytes of STRING characters NOT digits.
public static byte[] GetBytes(string str)
{
    byte[] bytes = new byte[str.Length * sizeof(char)];
    System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
    return bytes;
}
Now, if I have already identified the problem, fixing it should be straightforward, no? Well, for me, yes and no. I don't know of any built-in type in C# that can hold a number of this size. The best idea I can come up with is to break the string into smaller chunks of, say, 10 characters, which would easily parse to Int32, get the bytes of those, append them to some master byte array, and repeat.
You could use the BigInteger struct.
It contains numerous Parse static methods and the ToByteArray method.
Sample code:
// Requires a reference to System.Numerics and "using System.Numerics;"
public static byte[] GetBytes(string str)
{
    BigInteger number;
    return BigInteger.TryParse(str, out number) ? number.ToByteArray() : null;
}
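One caveat if these bytes are going into RSAParameters: BigInteger.ToByteArray() returns the value in little-endian order, possibly with an extra zero sign byte, while the RSA parameter fields (Modulus, Exponent, etc.) expect big-endian unsigned bytes. A minimal sketch of that extra conversion, assuming the input parses as a non-negative integer (the helper name is just for illustration):
public static byte[] GetBigEndianUnsignedBytes(string str)
{
    // Requires: using System; using System.Numerics;
    BigInteger number = BigInteger.Parse(str);
    byte[] bytes = number.ToByteArray(); // little-endian, two's complement
    Array.Reverse(bytes);                // now big-endian
    // Strip a leading zero sign byte, if present, to get the unsigned form.
    if (bytes.Length > 1 && bytes[0] == 0)
    {
        byte[] trimmed = new byte[bytes.Length - 1];
        Array.Copy(bytes, 1, trimmed, 0, trimmed.Length);
        return trimmed;
    }
    return bytes;
}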

PInvoke char* in C DLL handled as String in C#. Issue with null characters

The function in C DLL looks like this:
int my_Funct(char* input, char* output);
I must call this from C# app. I do this in the following way:
...DllImport stuff...
public static extern int my_Funct(string input, string output);
The input string is perfectly transmitted to the DLL (I have visible proof of that). The output that the function fills out although is wrong. I have hexa data in it, like:
3F-D9-00-01
But unfortunately everything after the two zeros is cut off, and only the first two bytes reach my C# app. It happens (I guess) because the marshaller treats 00 as a null character and takes it as the end of the string.
Any idea how I could get rid of this? I tried to specify it as out IntPtr instead of a string, but I don't know what to do with it afterwards.
Afterwards I tried:
byte[] b1 = new byte[2];
Marshal.Copy(output,b1,0,2);
2 should normally be the length of the byte array. But I get all kinds of errors, like "Requested range extends past the end of the array." or "Attempted to read or write protected memory..."
I appreciate any help.
Your marshalling of the output string is incorrect. Using string in the p/invoke declaration is appropriate when passing data from managed to native. But you cannot use that when the data flows in the other direction. Instead you need to use StringBuilder. Like this:
[DllImport(...)]
public static extern int my_Funct(string input, StringBuilder output);
Then allocate the memory for output:
StringBuilder output = new StringBuilder(256);
//256 is the capacity in characters - only you know how large a buffer is needed
And then you can call the function.
int retval = my_Funct(inputStr, output);
string outputStr = output.ToString();
On the other hand, if these parameters have null characters in them then you cannot marshal as string. That's because the marshaller won't marshal anything past the null. Instead you need to marshal it as a byte array.
public static extern int my_Funct(
[In] byte[] input,
[Out] byte[] output
);
That matches your C declaration.
Then assuming the ANSI encoding you convert the input string to a byte array like this:
byte[] input = Encoding.Default.GetBytes(inputString);
If you want to use a different encoding, it's obvious how to do so.
And for the output you do need to allocate the array. Assuming it's the same length as the input you would do this:
byte[] output = new byte[input.Length];
And somehow your C function has got to know the length of the arrays. I'll leave that bit to you!
Then you can call the function
int retval = my_Funct(input, output);
And then to convert the output array back to a C# string you use the Encoding class again.
string outputString = Encoding.Default.GetString(output);
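Putting the byte-array variant together, here is a minimal end-to-end sketch; the DLL name is hypothetical, and it assumes the output buffer can be the same length as the input (only you know the real buffer-size contract):
using System;
using System.Runtime.InteropServices;
using System.Text;

class NativeCallDemo
{
    // Hypothetical DLL name and calling convention; substitute the real ones.
    [DllImport("mynative.dll", CallingConvention = CallingConvention.Cdecl)]
    private static extern int my_Funct([In] byte[] input, [Out] byte[] output);

    static void Main()
    {
        string inputString = "hello";
        // ANSI bytes on the managed side, matching char* on the native side.
        byte[] input = Encoding.Default.GetBytes(inputString);
        byte[] output = new byte[input.Length]; // assumption: output fits in the same length

        int retval = my_Funct(input, output);

        // Embedded zeros survive because no string marshalling is involved.
        Console.WriteLine("retval = " + retval);
        Console.WriteLine(BitConverter.ToString(output));
    }
}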

How can I convert sbyte[] to base64 string?

How can I convert sbyte[] to base64 string?
I cannot convert that sbyte[] to a byte[], because I need to keep interoperability with Java.
You absolutely can convert the sbyte[] to a byte[] - I can pretty much guarantee you that the Java code will really be treating the byte array as unsigned. (Or to put it another way: base64 is only defined in terms of unsigned bytes...)
Just convert to byte[] and call Convert.ToBase64String. Converting to byte[] is actually really easy - although C# itself doesn't provide a conversion between the two, the CLR is quite happy to perform a reference conversion, so you just need to fool the C# compiler:
sbyte[] x = { -1, 1 };
byte[] y = (byte[]) (object) x;
Console.WriteLine(Convert.ToBase64String(y));
If you want to have a genuine byte[] you can copy:
byte[] y = new byte[x.Length];
Buffer.BlockCopy(x, 0, y, 0, y.Length);
but personally I'd stick with the first form.
class Program
{
    static void Main()
    {
        sbyte[] signedByteArray = { -2, -1, 0, 1, 2 };
        byte[] unsignedByteArray = (byte[])(Array)signedByteArray;
        Console.WriteLine(Convert.ToBase64String(unsignedByteArray));
    }
}
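For the reverse direction, the same reference-conversion trick works on the decoded bytes; a minimal sketch:
string base64 = Convert.ToBase64String((byte[])(Array)new sbyte[] { -2, -1, 0, 1, 2 });

// Decode to byte[], then reinterpret as sbyte[] via the CLR's array reference conversion.
byte[] decoded = Convert.FromBase64String(base64);
sbyte[] signed = (sbyte[])(Array)decoded;
Console.WriteLine(string.Join(", ", signed)); // -2, -1, 0, 1, 2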

C# - RSACryptoServiceProvider Decrypt into a SecureString instead of byte array

I have a method that currently returns a string converted from a byte array:
public static readonly UnicodeEncoding ByteConverter = new UnicodeEncoding();
public static string Decrypt(string textToDecrypt, string privateKeyXml)
{
    if (string.IsNullOrEmpty(textToDecrypt))
    {
        throw new ArgumentException(
            "Cannot decrypt null or blank string"
        );
    }
    if (string.IsNullOrEmpty(privateKeyXml))
    {
        throw new ArgumentException("Invalid private key XML given");
    }

    byte[] bytesToDecrypt = Convert.FromBase64String(textToDecrypt);
    byte[] decryptedBytes;
    using (var rsa = new RSACryptoServiceProvider())
    {
        rsa.FromXmlString(privateKeyXml);
        decryptedBytes = rsa.Decrypt(bytesToDecrypt, FOAEP);
    }
    return ByteConverter.GetString(decryptedBytes);
}
I'm trying to update this method to instead return a SecureString, but I'm having trouble converting the return value of RSACryptoServiceProvider.Decrypt from byte[] to SecureString. I tried the following:
var secStr = new SecureString();
foreach (byte b in decryptedBytes)
{
    char[] chars = ByteConverter.GetChars(new[] { b });
    if (chars.Length != 1)
    {
        throw new Exception(
            "Could not convert a single byte into a single char"
        );
    }
    secStr.AppendChar(chars[0]);
}
return secStr;
However, using this SecureString equality tester, the resulting SecureString was not equal to the SecureString constructed from the original, unencrypted text. My Encrypt and Decrypt methods worked before, when I was just using string everywhere, and I've also tested the SecureString equality code, so I'm pretty sure the problem here is how I'm trying to convert byte[] into SecureString. Is there another route I should take for using RSA encryption that would allow me to get back a SecureString when I decrypt?
Edit: I didn't want to convert the byte array to a regular string and then stuff that string into a SecureString, because that seems to defeat the point of using a SecureString in the first place. However, is it also bad that Decrypt returns byte[] and I'm then trying to stuff that byte array into a SecureString? It's my guess that if Decrypt returns a byte[], then that's a safe way to pass around sensitive information, so converting one secure representation of the data to another secure representation seems okay.
A char and a byte can be used interchangeably with casting, so modify your second chunk of code as such:
var secStr = new SecureString();
foreach (byte b in decryptedBytes)
{
    secStr.AppendChar((char)b);
}
return secStr;
This should work properly, but keep in mind that you're still bringing the unencrypted information into the "clear" in memory, so there's a point at which it could be compromised (which sort of defeats the purpose to a SecureString).
** Update **
A byte[] of your sensitive information is not secure. You can look at it in memory and see the information (especially if it's just a string). The individual bytes will be in the exact order of the string, so 'read'ing it is pretty straightforward.
I was (actually about an hour ago) just struggling with this same issue myself, and as far as I know there is no good way to go straight from the decrypter to the SecureString unless the decrypter is specifically programmed to support this strategy.
I think the problem might be your ByteConverter.GetChars method. I can't find that class or method in the MSDN docs. I'm not sure if that is a typo or a homegrown function. Regardless, it is most likely not interpreting the encoding of the bytes correctly. Instead, use the UTF8Encoding GetChars method. It will properly convert the bytes back into a .NET string, assuming they were encrypted from a .NET string object originally. (If not, you'll want to use the GetChars method on the encoding that matches the original string.)
You're right that using arrays is the most secure approach. Because the decrypted representations of your secret are stored in byte or char arrays, you can easily clear them out when done, so your plaintext secret isn't left in memory. This isn't perfectly secure, but more secure than converting to a string. Strings can't be changed and they stay in memory until they are garbage collected at some indeterminate future time.
var secStr = new SecureString();
var chars = System.Text.Encoding.UTF8.GetChars(decryptedBytes);
for (int idx = 0; idx < chars.Length; ++idx)
{
    secStr.AppendChar(chars[idx]);
    // Clear out the chars as you go.
    chars[idx] = '\0';
}
// Clear the decrypted bytes from memory, too.
Array.Clear(decryptedBytes, 0, decryptedBytes.Length);
return secStr;
Based on Coding Gorilla's answer, I tried the following in my Decrypt method:
string decryptedString1 = string.Empty;
foreach (byte b in decryptedBytes)
{
    decryptedString1 += (char)b;
}
string decryptedString2 = ByteConverter.GetString(decryptedBytes);
When debugging, decryptedString1 and decryptedString2 were not equal:
decryptedString1 "m\0y\0V\0e\0r\0y\0L\0o\0n\0g\0V\03\0r\0y\05\03\0c\0r\03\07\0p\04\0s\0s\0w\00\0r\0d\0!\0!\0!\0"
decryptedString2 "myVeryLongV3ry53cr37p4ssw0rd!!!"
So it looks like I can just go through the byte[] array, do a direct cast to char, and skip \0 characters. Like Coding Gorilla said, though, this does seem to again in part defeat the point of SecureString, because the sensitive data is floating about in memory in little byte-size chunks. Any suggestions for getting RSACryptoServiceProvider.Decrypt to return a SecureString directly?
Edit: yep, this works:
var secStr = new SecureString();
foreach (byte b in decryptedBytes)
{
    var c = (char)b;
    if ('\0' == c)
    {
        continue;
    }
    secStr.AppendChar(c);
}
return secStr;
Edit: correction: this works with plain old English strings. Encrypting and then attempting to decrypt the string "標準語 明治維新 english やった" doesn't work as expected because the resulting decrypted string, using this foreach (byte b in decryptedBytes) technique, does not match the original unencrypted string.
Edit: using the following works for both:
var secStr = new SecureString();
foreach (char c in ByteConverter.GetChars(decryptedBytes))
{
    secStr.AppendChar(c);
}
return secStr;
This still leaves a byte array and a char array of the password in memory, which sucks. Maybe I should find another RSA class that returns a SecureString. :/
What if you stuck to UTF-16?
Internally, .NET (and therefore, SecureString) uses UTF-16 (double byte) to store string contents. You could take advantage of this and translate your protected data two bytes (i.e. 1 char) at a time...
When you encrypt, peel off a Char, use Encoding.Unicode.GetBytes() (UTF-16) to get your two bytes, and push those two bytes into your encryption stream. In reverse, when you are reading from your encrypted stream, read two bytes at a time and use Encoding.Unicode.GetString() to get your char.
It probably sounds awful, but it keeps all the characters of your secret string from being all in one place, AND it gives you the reliability of character "size" (you won't have to guess if the next single byte is a char, or a UTF marker for a double-wide char). There's no way for an observer to know which characters go with which, nor in which order, so guessing the secret should be near impossible.
Honestly, this is just a suggested idea... I'm about to try it myself, and see how viable it is. My goal is to produce extension methods (SecureString.Encrypt and ICrypto.ToSecureString, or something like that).
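As a rough illustration of the decrypt half of that idea, a minimal sketch that turns already-decrypted UTF-16 bytes into a SecureString two bytes (one char) at a time, clearing each temporary buffer as it goes; it assumes the plaintext contains no surrogate pairs, and the method name is just for illustration:
using System;
using System.Security;
using System.Text;

static SecureString ToSecureStringUtf16(byte[] decryptedBytes)
{
    var secStr = new SecureString();
    var pair = new byte[2];
    for (int i = 0; i + 1 < decryptedBytes.Length; i += 2)
    {
        pair[0] = decryptedBytes[i];
        pair[1] = decryptedBytes[i + 1];
        // Encoding.Unicode is UTF-16 little-endian; two bytes decode to one char.
        char[] c = Encoding.Unicode.GetChars(pair);
        secStr.AppendChar(c[0]);
        c[0] = '\0'; // clear the temporary char
    }
    Array.Clear(pair, 0, pair.Length);
    Array.Clear(decryptedBytes, 0, decryptedBytes.Length); // clear the plaintext bytes
    secStr.MakeReadOnly();
    return secStr;
}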
Use System.Text.Encoding.Default.GetString (see the GetString documentation on MSDN).
