I have this bit of C# code that I have translated to VB using http://www.developerfusion.com/tools/convert/csharp-to-vb/
private string DecodeToken(string token, string key)
{
    byte[] buffer = new byte[0];
    string decoded = "";
    int i;

    if (Scramble(Convert.FromBase64String(token), key, ref buffer))
    {
        for (i = 0; i < buffer.Length; i++)
        {
            decoded += Convert.ToString((char)buffer[i]);
        }
    }

    return (decoded);
}
Which, after a little modification, gives this:
Private Function DecodeToken(token As String, key As String) As String
    Dim buffer As Byte()
    Dim decoded As String = ""
    Dim index As Integer

    If Scramble(Convert.FromBase64String(token), key, buffer) Then
        For index = 0 To buffer.Length - 1
            decoded += Convert.ToString(ChrW(buffer(index)))
        Next
        'decoded = UTF8Encoding.ASCII.GetString(pbyBuffer)
        'decoded = UnicodeEncoding.ASCII.GetString(pbyBuffer)
        'decoded = ASCIIEncoding.ASCII.GetString(pbyBuffer)
    End If

    Return decoded
End Function
Scramble just rearranges the array in a specific way, and I've checked the VB and C# outputs against each other, so it can be ignored. Its inputs and outputs are byte arrays, so it shouldn't affect the encoding.
The problem is that the result of this function is fed into a hashing algorithm, which is then compared against a hash signature. The result of the VB version, when hashed, does not match the signature.
You can see from the comments that I've attempted to use different encodings to get the byte buffer out as a string but none of these have worked.
The problem appears to lie in the translation of decoded += Convert.ToString((char)buffer[i]); to decoded += Convert.ToString(ChrW(buffer(index))).
Does ChrW produce the same result as casting to char, and which encoding will correctly duplicate the reading of the byte array?
Edit: I always have Option Strict On, but it's possible that the original C# relies on implicit conversion. What does the compiler do in that situation?
Quick answer
decoded += Convert.ToString((char)buffer[i]);
is equivalent to
decoded &= Convert.ToString(Chr(buffer(i)))
VB.NET stops you from taking the hacky approach used in the C# code: a Char is a UTF-16 code unit, so it occupies two bytes.
This looks like a better implementation of what you have.
Private Function DecodeToken(encodedToken As String, key As String) As String
    Dim scrambled = Convert.FromBase64String(encodedToken)
    Dim buffer As Byte()
    Dim index As Integer

    If Not Scramble(scrambled, key, buffer) Then
        Return Nothing
    End If

    Dim descrambled = New StringBuilder(buffer.Length)
    For index = 0 To buffer.Length - 1
        descrambled.Append(Chr(buffer(index)))
    Next

    Return descrambled.ToString()
End Function
Have you tried the most direct code translation:
decoded += Convert.ToString(CType(buffer(index), Char))
When converting a byte array to a string, you should really make sure you know the encoding first. If this is set by whatever is providing the byte array, then you should use that to decode the string.
For more details on the ChrW (and Chr) functions, see http://msdn.microsoft.com/en-us/library/613dxh46%28v=vs.80%29.aspx . In essence, ChrW assumes that the passed integer is a Unicode code point, which may not be a valid assumption (from 0 to 127 this wouldn't matter, but the upper half of the byte range can map differently). If this is the problem, then it will likely be accented and other such "special" characters that are causing it.
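To make that concrete, here is a small C# sketch of my own (not from the original post) comparing the direct cast, which behaves like ChrW, with an ANSI-code-page decode, which is roughly what Chr does. It assumes .NET Framework, where Encoding.Default is the system ANSI code page (on .NET Core and later it is UTF-8):

using System;
using System.Text;

class ChrVersusChrW
{
    static void Main()
    {
        // 0-127: the cast and the ANSI decode agree.
        byte low = 65;
        Console.WriteLine((char)low);                                  // A
        Console.WriteLine(Encoding.Default.GetString(new[] { low })); // A

        // 128-255: they can differ. The cast maps the byte straight to the
        // Unicode code point (ChrW-style); the ANSI decode goes through
        // the system code page (Chr-style).
        byte high = 0x80;
        Console.WriteLine((int)(char)high);                            // 128 (U+0080), always
        Console.WriteLine((int)Encoding.Default.GetString(new[] { high })[0]); // e.g. 8364 (the euro sign) under Windows-1252
    }
}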
Give the following a go:
decoded += Convert.ToChar(foo)
It will work (unlike my last attempt, which made assumptions about implicit conversions being framework-specific rather than language-specific), but I can't guarantee that it will produce the same result as the C#.
Given that you say in the comments you expected to use Encoding.xxx.GetString, why don't you use that? Do you know what the encoding was in the original string-to-byte-array conversion? If so, just use that. It is the correct way to convert a byte array to a string anyway, since doing it byte by byte will definitely break for any multi-byte characters.
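To illustrate that last point, here is a minimal sketch (mine, assuming the original bytes were produced with UTF-16, i.e. Encoding.Unicode): decoding with the matching encoding round-trips cleanly, while a byte-by-byte cast breaks on anything beyond ASCII:

using System;
using System.Text;

class RoundTrip
{
    static void Main()
    {
        string original = "p4ssw0rd-é";                      // contains one non-ASCII char
        byte[] bytes = Encoding.Unicode.GetBytes(original);  // UTF-16LE: two bytes per char here

        // Correct: decode with the same encoding that produced the bytes.
        Console.WriteLine(Encoding.Unicode.GetString(bytes) == original); // True

        // Wrong: casting byte by byte interleaves '\0' fillers and mangles
        // anything multi-byte.
        string broken = "";
        foreach (byte b in bytes)
            broken += (char)b;
        Console.WriteLine(broken == original);               // False
    }
}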
A small improvement
Private Function DecodeToken(encodedToken As String, key As String) As String
    Dim scrambled = Convert.FromBase64String(encodedToken)
    Dim buffer As Byte()

    If Not Scramble(scrambled, key, buffer) Then
        Return Nothing
    End If

    Dim descrambled = System.Text.Encoding.Unicode.GetString(buffer, 0, buffer.Length)
    Return descrambled
End Function
Note that Encoding.Unicode assumes the descrambled bytes originally came from a UTF-16 string; substitute whichever encoding actually produced them.
Related
Is there any case in which this test could fail? Would it fail on a big-endian machine? Is ByteArrayToHexString little-endian, and why? (Chars seem to be written from left to right, so it must be big-endian.)
[Fact]
public void ToMd5Bytes2_ValidValue_Converted()
{
    // Arrange
    // Act
    var bytes = new byte[] { 16, 171, 205, 239 };
    var direct = ByteArrayToHexString(bytes);
    var bitConverter = BitConverter.ToString(bytes).Replace("-", string.Empty);
    var convert = Convert.ToHexString(bytes);

    // Assert
    Assert.Equal(direct, bitConverter);
    Assert.Equal(bitConverter, convert);
}
public static string ByteArrayToHexString(byte[] Bytes)
{
    StringBuilder Result = new StringBuilder(Bytes.Length * 2);
    string HexAlphabet = "0123456789ABCDEF";

    foreach (byte B in Bytes)
    {
        Result.Append(HexAlphabet[(int)(B >> 4)]);
        Result.Append(HexAlphabet[(int)(B & 0xF)]);
    }

    return Result.ToString();
}
Only multi-byte values are affected by endianness, like Int32.
Arrays of bytes are not - they're already a defined sequence of single bytes. Of course, it matters how you retrieved this byte array: if it is the result of converting a multi-byte value, you must have done that conversion with the appropriate endianness. Otherwise you'd have to reverse each slice of your array that originally represented a multi-byte value.
Also, endianness does not apply at the bit level, a misconception I see every now and then.
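A quick sketch of the distinction (my own illustration, not the poster's code): the byte order of an Int32 depends on the machine, while a byte[] always hex-encodes in array order:

using System;

class EndianDemo
{
    static void Main()
    {
        // Multi-byte value: the byte layout depends on the architecture.
        byte[] fromInt = BitConverter.GetBytes(0x10ABCDEF);
        Console.WriteLine(BitConverter.IsLittleEndian);     // True on x86/x64 and typical ARM
        Console.WriteLine(BitConverter.ToString(fromInt));  // EF-CD-AB-10 on a little-endian machine

        // Byte array: already an ordered sequence of single bytes,
        // so the output is the same on any machine.
        byte[] bytes = { 0x10, 0xAB, 0xCD, 0xEF };
        Console.WriteLine(BitConverter.ToString(bytes));    // 10-AB-CD-EF everywhere
    }
}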
In the comments, you mentioned the remarks sections of the BitConverter.ToString documentation:
All the elements of value are converted. The order of hexadecimal strings returned by the ToString method depends on whether the computer architecture is little-endian or big-endian.
Looking at the reference source, I do not see where endianness has any effect on its operation on byte arrays. This comment is either outdated or misleading, and I've opened an issue on it.
I have strings of sensitive information that I need to collect from my users. I am using a WPF PasswordBox to request this information. For the uninitiated, the PasswordBox control provides a SecurePassword property which is a SecureString object rather than an insecure string object. Within my application, the data from the PasswordBox gets passed as a SecureString to an encryption method.
What I need to be able to do is marshal the data to a byte array that essentially represents a .Net string value without first converting the data to a .Net string. More specifically, given a SecureString with a value such as...
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890`~!@#$%^&*()_-+={[}]|:;"'<,>.?/ ≈篭母
...how can I convert it to a byte array that is the equivalent of a .NET string that's been serialized and written to a stream with a StreamWriter?
By using Marshal.SecureStringToCoTaskMemUnicode(...) I am able to do this with more traditional, Western text. However, when I created the above text string using additional, non-typical characters and a string of Japanese text (see the last few characters), my method of getting a Unicode byte array assigned to the IntPtr position no longer seems to work properly.
How can I emit the data of a SecureString in a secure way such that the returned byte data is structured the same as the byte data of a standard .Net string, serialized to binary output?
NOTE
Please ignore all security concerns at the moment. I am working on making various security upgrades to my application. For now, I need to use a SecureString for getting the sensitive data to the encryptor. The decryptor (for now) will still need to decrypt this data to string values, which is why I need to somehow serialize the data in the SecureString to a binary format similar to the binary format of the string object.
I agree that this approach is a bit unfortunate, however, I'm having to make incremental improvements on an existing application, and the first phase is locking down the data in SecureString objects from the user to the encryptor.
If you need to write a secure string to a stream, I'd suggest creating a method like this:
using System;
using System.IO;
using System.Runtime.InteropServices;
using System.Security;

public static class Extensions
{
    public static void WriteSecure(this StreamWriter writer, SecureString sec)
    {
        int length = sec.Length;
        if (length == 0)
            return;

        // Decrypt the SecureString into an unmanaged BSTR.
        IntPtr ptr = Marshal.SecureStringToBSTR(sec);
        try
        {
            // Each char in that string is 2 bytes, not one (it's a UTF-16 string),
            // so read an Int16 at a time and reinterpret it as a char. The unchecked
            // (ushort) cast avoids the OverflowException that Convert.ToChar(short)
            // would throw for code units >= 0x8000.
            for (int i = 0; i < length * 2; i += 2)
            {
                var ch = (char)(ushort)Marshal.ReadInt16(ptr + i);
                writer.Write(ch);
            }
        }
        finally
        {
            // Don't forget to zero the unmanaged memory.
            Marshal.ZeroFreeBSTR(ptr);
        }
    }
}
If you really need a byte array, you can reuse this method too:
byte[] result;
using (var ms = new MemoryStream())
{
    using (var writer = new StreamWriter(ms))
    {
        writer.WriteSecure(secureString);
    }
    result = ms.ToArray();
}
Though the method from the first comment might be a bit more performant (not sure whether that matters for you).
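One caveat worth adding (my note, not the answerer's): new StreamWriter(ms) defaults to UTF-8, so the bytes you get back will not match the UTF-16 layout of a .NET string in memory, which is what the question asks for. If that is the layout you need, pass the encoding explicitly (same ms/secureString/WriteSecure context as above; UnicodeEncoding lives in System.Text):

byte[] result;
using (var ms = new MemoryStream())
{
    // UnicodeEncoding(false, false) = UTF-16LE with no byte-order mark,
    // matching the in-memory layout of a .NET string.
    using (var writer = new StreamWriter(ms, new UnicodeEncoding(false, false)))
    {
        writer.WriteSecure(secureString);
    }
    result = ms.ToArray();
}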
I have a very large prime number (for RSA purposes) that needs to be converted to a byte array. The number however is currently stored as a string. I'm OK with storing it as a byte[] but either way the number is a string and I have to get it into a byte array.
Now to be clear I have used the RSA encryption and decryption sample data provided on MSDN and everything works so I have a high degree of confidence that the encryption portion is fine. Further the samples provided by MSDN provide prime numbers that have already been turned into byte[]. Thus I have a high degree of confidence that the breakdown is in MY conversion of the string representation of the number to a byte[].
I currently do this:
private static string _publicKeyExponent = "12345...310 digits......9876";
private static string _publicKeyModulus = "654782....620 digits.....4576";
_rsaPublicKey.Exponent = CoreHelpers.GetBytes(_publicKeyExponent);
And here is my GetBytes method that I suspect is causing the issue, as it is getting the bytes of the STRING characters, NOT the digits.
public static byte[] GetBytes(string str)
{
    byte[] bytes = new byte[str.Length * sizeof(char)];
    System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
    return bytes;
}
Now, if I have already identified the problem, fixing it should be straightforward, no? Well, for me, yes and no. I don't know of any strong type in C# that I can parse a number of this size into. The best idea I can come up with is to break the string up into smaller chunks of, say, 10 chars, which would then easily parse to Int32, get the bytes of that, add them to some master byte array, and repeat.
You could use the BigInteger struct, which lives in the System.Numerics namespace (add a reference to the System.Numerics assembly).
It provides static Parse and TryParse methods and a ToByteArray method.
Sample code:
public static byte[] GetBytes(string str)
{
    BigInteger number;
    return BigInteger.TryParse(str, out number) ? number.ToByteArray() : null;
}
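One caveat to note (my addition, not part of the original answer): ToByteArray returns the bytes in little-endian order, possibly with a trailing 0x00 sign byte, whereas RSA key material such as RSAParameters.Modulus is conventionally big-endian. A sketch of the trim-and-reverse under those assumptions:

using System.Linq;
using System.Numerics;

public static class KeyBytes
{
    // Parses a decimal string and returns big-endian, unsigned bytes,
    // the layout RSAParameters fields conventionally expect.
    public static byte[] GetBigEndianBytes(string str)
    {
        BigInteger number;
        if (!BigInteger.TryParse(str, out number))
            return null;

        return number.ToByteArray()           // little-endian, may end in a 0x00 sign byte
                     .Reverse()               // now big-endian
                     .SkipWhile(b => b == 0)  // drop the leading zero sign byte, if present
                     .ToArray();
    }
}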
I have a method that currently returns a string converted from a byte array:
public static readonly UnicodeEncoding ByteConverter = new UnicodeEncoding();

public static string Decrypt(string textToDecrypt, string privateKeyXml)
{
    if (string.IsNullOrEmpty(textToDecrypt))
    {
        throw new ArgumentException(
            "Cannot decrypt null or blank string"
        );
    }
    if (string.IsNullOrEmpty(privateKeyXml))
    {
        throw new ArgumentException("Invalid private key XML given");
    }

    byte[] bytesToDecrypt = Convert.FromBase64String(textToDecrypt);
    byte[] decryptedBytes;
    using (var rsa = new RSACryptoServiceProvider())
    {
        rsa.FromXmlString(privateKeyXml);
        decryptedBytes = rsa.Decrypt(bytesToDecrypt, FOAEP);
    }

    return ByteConverter.GetString(decryptedBytes);
}
I'm trying to update this method to instead return a SecureString, but I'm having trouble converting the return value of RSACryptoServiceProvider.Decrypt from byte[] to SecureString. I tried the following:
var secStr = new SecureString();
foreach (byte b in decryptedBytes)
{
    char[] chars = ByteConverter.GetChars(new[] { b });
    if (chars.Length != 1)
    {
        throw new Exception(
            "Could not convert a single byte into a single char"
        );
    }
    secStr.AppendChar(chars[0]);
}
return secStr;
However, using this SecureString equality tester, the resulting SecureString was not equal to the SecureString constructed from the original, unencrypted text. My Encrypt and Decrypt methods worked before, when I was just using string everywhere, and I've also tested the SecureString equality code, so I'm pretty sure the problem here is how I'm trying to convert byte[] into SecureString. Is there another route I should take for using RSA encryption that would allow me to get back a SecureString when I decrypt?
Edit: I didn't want to convert the byte array to a regular string and then stuff that string into a SecureString, because that seems to defeat the point of using a SecureString in the first place. However, is it also bad that Decrypt returns byte[] and I'm then trying to stuff that byte array into a SecureString? It's my guess that if Decrypt returns a byte[], then that's a safe way to pass around sensitive information, so converting one secure representation of the data to another secure representation seems okay.
A char and a byte can be used interchangeably with casting, so modify your second chunk of code as such:
var secStr = new SecureString();
foreach (byte b in decryptedBytes)
{
    secStr.AppendChar((char)b);
}
return secStr;
This should work properly, but keep in mind that you're still bringing the unencrypted information into the clear in memory, so there's a point at which it could be compromised (which sort of defeats the purpose of a SecureString).
Update
A byte[] of your sensitive information is not secure. You can look at it in memory and see the information (especially if it's just a string). The individual bytes will be in the exact order of the string, so 'read'ing it is pretty straightforward.
I was (actually, about an hour ago) struggling with this same issue myself, and as far as I know there is no good way to go straight from the decrypter to the SecureString unless the decrypter is specifically programmed to support this strategy.
I think the problem might be your ByteConverter.GetChars method. I can't find that class or method in the MSDN docs; I'm not sure if that is a typo or a homegrown function. Regardless, it is most likely not interpreting the encoding of the bytes correctly. Instead, use UTF8Encoding's GetChars method. It will properly convert the bytes back into a .NET string, assuming they were encrypted from a .NET string object originally. (If not, you'll want to use the GetChars method of the encoding that matches the original string.)
You're right that using arrays is the most secure approach. Because the decrypted representations of your secret are stored in byte or char arrays, you can easily clear them out when done, so your plaintext secret isn't left in memory. This isn't perfectly secure, but it's more secure than converting to a string. Strings can't be changed, and they stay in memory until they are garbage collected at some indeterminate future time.
var secStr = new SecureString();
var chars = System.Text.Encoding.UTF8.GetChars(decryptedBytes);
for (int idx = 0; idx < chars.Length; ++idx)
{
    secStr.AppendChar(chars[idx]);
    // Clear out the chars as you go.
    chars[idx] = '\0';
}
// Clear the decrypted bytes from memory, too.
Array.Clear(decryptedBytes, 0, decryptedBytes.Length);
return secStr;
Based on Coding Gorilla's answer, I tried the following in my Decrypt method:
string decryptedString1 = string.Empty;
foreach (byte b in decryptedBytes)
{
    decryptedString1 += (char)b;
}
string decryptedString2 = ByteConverter.GetString(decryptedBytes);
When debugging, decryptedString1 and decryptedString2 were not equal:
decryptedString1 "m\0y\0V\0e\0r\0y\0L\0o\0n\0g\0V\03\0r\0y\05\03\0c\0r\03\07\0p\04\0s\0s\0w\00\0r\0d\0!\0!\0!\0"
decryptedString2 "myVeryLongV3ry53cr37p4ssw0rd!!!"
So it looks like I can just go through the byte[] array, cast directly to char, and skip the '\0' characters. As Coding Gorilla said, though, this does again seem to partly defeat the point of SecureString, because the sensitive data is floating about in memory in byte-sized chunks. Any suggestions for getting RSACryptoServiceProvider.Decrypt to return a SecureString directly?
Edit: yep, this works:
var secStr = new SecureString();
foreach (byte b in decryptedBytes)
{
    var c = (char)b;
    if ('\0' == c)
    {
        continue;
    }
    secStr.AppendChar(c);
}
return secStr;
Edit: correction: this works with plain old English strings. Encrypting and then attempting to decrypt the string "標準語 明治維新 english やった" doesn't work as expected because the resulting decrypted string, using this foreach (byte b in decryptedBytes) technique, does not match the original unencrypted string.
Edit: using the following works for both:
var secStr = new SecureString();
foreach (char c in ByteConverter.GetChars(decryptedBytes))
{
    secStr.AppendChar(c);
}
return secStr;
This still leaves a byte array and a char array of the password in memory, which sucks. Maybe I should find another RSA class that returns a SecureString. :/
What if you stuck to UTF-16?
Internally, .NET (and therefore SecureString) uses UTF-16 (two bytes per code unit) to store string contents. You could take advantage of this and translate your protected data two bytes (i.e. one char) at a time...
When you encrypt, peel off a Char and use Encoding.Unicode.GetBytes() to get your two bytes, and push those two bytes into your encryption stream. In reverse, when you are reading from your encrypted stream, read two bytes at a time and use Encoding.Unicode.GetString() to get your char.
It probably sounds awful, but it keeps all the characters of your secret string from being all in one place, AND it gives you the reliability of character "size" (you won't have to guess if the next single byte is a char, or a UTF marker for a double-wide char). There's no way for an observer to know which characters go with which, nor in which order, so guessing the secret should be near impossible.
Honestly, this is just a suggested idea... I'm about to try it myself, and see how viable it is. My goal is to produce extension methods (SecureString.Encrypt and ICrypto.ToSecureString, or something like that).
Use System.Text.Encoding.Default.GetString
GetString MSDN
I’m writing text to a binary file in C# and see a difference in quantity written between writing a string and a character array. I’m using System.IO.BinaryWriter and watching BinaryWriter.BaseStream.Length as the writes occur. These are my results:
using (BinaryWriter bw = new BinaryWriter(File.Open("data.dat", FileMode.Create), Encoding.ASCII))
{
    string value = "Foo";

    // Writes 4 bytes
    bw.Write(value);

    // Writes 3 bytes
    bw.Write(value.ToCharArray());
}
I don’t understand why the string overload writes 4 bytes when I’m writing only 3 ASCII characters. Can anyone explain this?
The documentation for BinaryWriter.Write(string) states that it writes a length-prefixed string to this stream. The overload for Write(char[]) has no such prefixing.
It would seem to me that the extra data is the length.
EDIT:
Just to be a bit more explicit, use Reflector. You will see that it has this piece of code in there as part of the Write(string) method:
this.Write7BitEncodedInt(byteCount);
It is a way to encode an integer using the fewest possible bytes. For short strings (the kind we'd use day to day, whose encoded form is less than 128 bytes), the length can be represented in a single byte. For longer strings, it starts to use more bytes.
Here is the code for that function just in case you are interested:
protected void Write7BitEncodedInt(int value)
{
    uint num = (uint)value;
    while (num >= 0x80)
    {
        this.Write((byte)(num | 0x80));
        num = num >> 7;
    }
    this.Write((byte)num);
}
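A worked example of my own may help here: a byte count of 300 does not fit in seven bits, so it comes out as two bytes.

// 300 = 0b1_0010_1100 (nine bits, so seven bits are not enough)
// pass 1: 300 >= 0x80, so write (byte)(300 | 0x80) = 0xAC, then 300 >> 7 = 2
// pass 2: 2 < 0x80, so write 0x02 and stop
// The stream contains AC 02; a reader reassembles 0x2C + (0x02 << 7) = 300.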
After prefixing the length using this encoding, it writes the bytes for the characters in the desired encoding.
From the BinaryWriter.Write(string) docs:
Writes a length-prefixed string to this stream in the current encoding of the BinaryWriter, and advances the current position of the stream in accordance with the encoding used and the specific characters being written to the stream.
This behavior is probably so that when reading the file back in using a BinaryReader, the string can be identified. (E.g. 3Foo3Bar6Foobar can be parsed into the strings "Foo", "Bar", and "Foobar", but FooBarFoobar could not be.) In fact, BinaryReader.ReadString uses exactly this information to read a string from a binary file.
From the BinaryWriter.Write(char[]) docs:
Writes a character array to the current stream and advances the current position of the stream in accordance with the Encoding used and the specific characters being written to the stream.
It is hard to overstate how comprehensive and useful the docs on MSDN are. Always check them first.
As already stated, BinaryWriter.Write(String) writes the length of the string to the stream, before writing the string itself.
This allows the BinaryReader.ReadString() to know how long the string is.
using (BinaryReader br = new BinaryReader(File.OpenRead("data.dat")))
{
    string foo1 = br.ReadString();
    char[] foo2 = br.ReadChars(3);
}
Did you look at what was actually written? I'd guess a null terminator.