Storing byte arrays in a SecureString fails sporadically - c#

Before somebody comes up with "why would you do that?", please be content with the fact that I need it. If that's not enough, think of it simply as wanting a secure byte array, i.e. a binary passphrase. Now on to the stuff!
I have the following extension method to get a SecureString instance from data in a byte array:
public static SecureString FromByteArray(this SecureString secure, byte[] data)
{
    secure.Clear(); // throws InvalidOperationException if IsReadOnly
    // Widen each byte to a char (0x00-0xFF becomes U+0000-U+00FF)
    foreach (byte b in data)
    {
        secure.AppendChar((char)b);
    }
    return secure;
}
Then, somewhere on Stack Overflow, I found the following, which dumps a SecureString to a byte[]:
public static byte[] ToInsecureBytes(this SecureString securePassword)
{
byte[] bytes = null;
if (null != securePassword)
{
GCHandle? gc = null;
var handle = IntPtr.Zero;
var length = securePassword.Length;
RuntimeHelpers.PrepareConstrainedRegions();
try
{
handle = Marshal.SecureStringToGlobalAllocAnsi(securePassword);
bytes = new byte[length];
gc = GCHandle.Alloc(bytes, GCHandleType.Pinned);
for (var i = 0; i < length; i++)
{
bytes[i] = Marshal.ReadByte(handle, i);
}
}
finally
{
if (handle != IntPtr.Zero)
{
Marshal.ZeroFreeGlobalAllocAnsi(handle);
}
if (null != gc &&
gc.HasValue)
{
gc.Value.Free();
}
}
}
return bytes;
}
Notice that the code is independent of string encoding, because we treat the data as mere binary while still using the SecureString features.
Then I have this test code in LINQPad to prove the concept:
byte[] byteData = Generate256BitsOfRandomEntropy(); // helper not shown; returns byte[32]
BitConverter.ToString(byteData).Dump();
SecureString pass = new SecureString();
SecureString pwd4 = pass.FromByteArray(byteData);
byte[] byteOut = pwd4.ToInsecureBytes();
BitConverter.ToString(byteOut).Dump();
Console.WriteLine("Conversion successful: {0}", byteData.SequenceEqual(byteOut));
But when I run the round trip, the two arrays are not the same: there is always a mismatch, and it is always a byte with hex value 0x3F that appears mysteriously at seemingly random positions in the byteOut array.
07-F7-90-67-A7-46-1E-F9-CE-44-91-46-44-9D-B9-81-75-7B-27-43-34-C3-F9-38-AC-E7-F9-E6-F1-7F-29-84
07-F7-90-67-A7-46-1E-F9-CE-44-3F-46-44-9D-B9-81-75-7B-27-43-34-C3-F9-38-AC-E7-F9-E6-F1-7F-29-3F
Conversion successful: False
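A likely explanation, offered as an assumption rather than a confirmed answer: Marshal.SecureStringToGlobalAllocAnsi converts each char through the system's ANSI code page. With Windows-1252, for example, the chars U+0091 and U+0084 (the two bytes that came back as 3F above) have no mapping, so the converter substitutes '?', which is 0x3F; this is consistent with 0x81, 0x90 and 0x9D surviving in the output. Reading the chars through the UTF-16 marshaller avoids any code-page conversion. A minimal sketch (ToInsecureBytesUnicode is a hypothetical name), assuming every char was stored from a single byte as FromByteArray does:

```csharp
using System;
using System.Runtime.InteropServices;
using System.Security;

public static class SecureStringBytes
{
    // Hypothetical helper: reverses FromByteArray without any ANSI conversion,
    // so no char can be replaced by '?' (0x3F).
    public static byte[] ToInsecureBytesUnicode(this SecureString secure)
    {
        if (secure == null) throw new ArgumentNullException(nameof(secure));
        int length = secure.Length;
        byte[] bytes = new byte[length];
        IntPtr handle = IntPtr.Zero;
        try
        {
            // UTF-16 copy in unmanaged memory: 2 bytes per char
            handle = Marshal.SecureStringToGlobalAllocUnicode(secure);
            for (int i = 0; i < length; i++)
            {
                short c = Marshal.ReadInt16(handle, i * 2);
                bytes[i] = (byte)c; // chars were stored from bytes, so only the low byte matters
            }
        }
        finally
        {
            // Zero and free the unmanaged copy even if reading fails
            if (handle != IntPtr.Zero)
                Marshal.ZeroFreeGlobalAllocUnicode(handle);
        }
        return bytes;
    }
}
```

Calling pwd4.ToInsecureBytesUnicode() in the LINQPad test should then give a sequence equal to byteData, whatever the machine's code page.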

Related

How to XOR an MD5 hash and return a 32-character string?

How do I further encrypt an MD5 hash by XORing it with a string of variable size (no longer than 32 characters)?
I would like the result of the XOR to be a 32-character string as well.
What I have tried so far:
convert the md5 string to binary
convert second string to binary
pad second binary with 0's (to the left) until both binaries are of equal length
iterate the binary representations and XOR them
convert the XOR'ed result to a string
The approach may be wrong; I'm not sure how to do it. My problem is that when I convert the result of the XOR, it is not a 32-character string, as I would like it to be.
Sample code (equal length strings in this case):
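using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
class Program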
{
static void Main(string[] args)
{
var md51 = ToBinary(ConvertToByteArray(CalculateMD5Hash("Maaa"), Encoding.ASCII));
var md52 = ToBinary(ConvertToByteArray(CalculateMD5Hash("Moo"), Encoding.ASCII));
List<int> xoredResult = new List<int>();
for (int i = 0; i < md51.Length; i++)
{
var string1 = md51[i];
var string2 = md52[i];
var xor = string1 ^ string2;
xoredResult.Add(xor);
}
var resultingString = string.Join("", xoredResult);
Console.WriteLine(resultingString.Length);
var data = GetBytesFromBinaryString(resultingString);
var text = Encoding.ASCII.GetString(data);
}
public static byte[] ConvertToByteArray(string str, Encoding encoding)
{
return encoding.GetBytes(str);
}
public static String ToBinary(Byte[] data)
{
return string.Join("", data.Select(byt => Convert.ToString(byt, 2).PadLeft(8, '0')));
}
public static Byte[] GetBytesFromBinaryString(String binary)
{
var list = new List<Byte>();
for (int i = 0; i < binary.Length; i += 8)
{
String t = binary.Substring(i, 8);
list.Add(Convert.ToByte(t, 2));
}
return list.ToArray();
}
public static string CalculateMD5Hash(string input)
{
// step 1, calculate MD5 hash from input
MD5 md5 = System.Security.Cryptography.MD5.Create();
byte[] inputBytes = System.Text.Encoding.ASCII.GetBytes(input);
byte[] hash = md5.ComputeHash(inputBytes);
// step 2, convert byte array to hex string
StringBuilder sb = new StringBuilder();
for (int i = 0; i < hash.Length; i++)
{
sb.Append(hash[i].ToString("X2"));
}
return sb.ToString();
}
}
XORing a string with what are essentially random bytes is not guaranteed to give you a valid string as output. Your var text = Encoding.ASCII.GetString(data); is likely giving you unusable output because the bytes you pass it do not represent a meaningful string. To represent arbitrary data without losing information, you must use something like var text = Convert.ToBase64String(data);.
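A minimal sketch of that suggestion, assuming both strings are ASCII and the key is left-padded with zero bytes as described in the question (Md5XorDemo, Md5Hex and XorWithKey are hypothetical names). It XORs the bytes directly instead of going through strings of '0' and '1', and Base64-encodes the result so it can later be decoded and XORed again with the same key to recover the hash:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class Md5XorDemo
{
    // MD5 of the input as 32 uppercase hex characters (same format as CalculateMD5Hash).
    public static string Md5Hex(string input)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.ASCII.GetBytes(input));
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }

    // XOR data with key, left-padding the key with zero bytes to data.Length.
    // Applying it twice with the same key returns the original data.
    public static byte[] XorWithKey(byte[] data, byte[] key)
    {
        if (key.Length > data.Length) throw new ArgumentException("key is longer than data");
        int pad = data.Length - key.Length;
        byte[] result = new byte[data.Length];
        for (int i = 0; i < data.Length; i++)
        {
            byte k = i < pad ? (byte)0 : key[i - pad];
            result[i] = (byte)(data[i] ^ k);
        }
        return result;
    }
}
```

Usage would be Convert.ToBase64String(Md5XorDemo.XorWithKey(Encoding.ASCII.GetBytes(Md5XorDemo.Md5Hex("Maaa")), Encoding.ASCII.GetBytes("Moo"))). Note that 32 arbitrary bytes come out as 44 Base64 characters, so this keeps all the information but does not produce the exactly 32-character result the question asks for.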

Getting this byte array back from a string of bytes

In C# I have an array of bytes (holding two numbers, -0.2210166 and 1.41497) that I placed in a string, separated by commas:
String data = "47,82,98,190,188,29,181,63";
I produce it like this:
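protected MemoryStream buffer;  // assumed: the stream that writer wraps (not shown in the question)
protected BinaryWriter writer;  // e.g. writer = new BinaryWriter(buffer);
public void writeFloat(float val) {
    writer.Write(val);  // writes 4 bytes per float
}
public byte[] getBytes() {
    return buffer.ToArray();
}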
I am trying to get this array of bytes back in Java:
String[] array = data.split(",");
List<Byte> listOfBytes = new ArrayList<Byte>();
for (String str : array) {
listOfBytes.add(Byte.valueOf(str));
}
Byte[] B = listOfBytes.toArray(new Byte[listOfBytes.size()]);
byte[] stateData = new byte[B.length];
for (int i = 0; i < B.length; i++) {
stateData[i] = B[i];
}
At this line I get an error saying the value is greater than a byte allows:
listOfBytes.add(Byte.valueOf(str));
for example when str is "190" from data. I understand why I get the error, but I don't understand why the C# buffer writes 4 bytes per number, with values that I cannot read back as bytes.
Any ideas guys? I'm confused...
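The mismatch is most likely signedness: C#'s byte is unsigned (0 to 255) while Java's byte is signed (-128 to 127), so 190 is out of range for Byte.valueOf. BinaryWriter.Write(float) writes each float as 4 little-endian IEEE 754 bytes, which is why two numbers produce eight values. One option on the C# side, sketched below under the assumption that you control how the string is built (FloatBytesDemo and ToSignedByteString are hypothetical names), is to format each byte as a signed value so the Java loop in the question can parse it unchanged:

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;

public static class FloatBytesDemo
{
    // Writes each float with BinaryWriter (4 bytes each, IEEE 754 single,
    // little-endian) and formats every byte as a signed value in -128..127,
    // the range Java's Byte.valueOf accepts.
    public static string ToSignedByteString(params float[] values)
    {
        using (var buffer = new MemoryStream())
        using (var writer = new BinaryWriter(buffer))
        {
            foreach (float v in values)
                writer.Write(v);
            writer.Flush();
            // (sbyte)190 == -66: same bits, reinterpreted as signed
            return string.Join(",", buffer.ToArray()
                .Select(b => ((sbyte)b).ToString(CultureInfo.InvariantCulture)));
        }
    }
}
```

For the two numbers in the question this yields "47,82,98,-66,-68,29,-75,63". The bytes are still little-endian, so on the Java side reading them back as floats would need something like ByteBuffer.wrap(stateData).order(ByteOrder.LITTLE_ENDIAN).getFloat().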

Hashing a SecureString in .NET

In .NET we have the SecureString class, which is all very well until you try to use it: to hash the string, for example, you need the plaintext. I've had a go at writing a function that will hash a SecureString, given a hash function that takes a byte array and outputs a byte array.
private static byte[] HashSecureString(SecureString ss, Func<byte[], byte[]> hash)
{
// Convert the SecureString to a BSTR
IntPtr bstr = Marshal.SecureStringToBSTR(ss);
// BSTR contains the length of the string in bytes in an
// Int32 stored in the 4 bytes prior to the BSTR pointer
int length = Marshal.ReadInt32(bstr, -4);
// Allocate a byte array to copy the string into
byte[] bytes = new byte[length];
// Copy the BSTR to the byte array
Marshal.Copy(bstr, bytes, 0, length);
// Immediately destroy the BSTR as we don't need it any more
Marshal.ZeroFreeBSTR(bstr);
// Hash the byte array
byte[] hashed = hash(bytes);
// Destroy the plaintext copy in the byte array
for (int i = 0; i < length; i++) { bytes[i] = 0; }
// Return the hash
return hashed;
}
I believe this will correctly hash the string, and will correctly scrub any copies of the plaintext from memory by the time the function returns, assuming the provided hash function is well behaved and doesn't make any copies of the input that it doesn't scrub itself. Have I missed anything here?
Yes, you have, and a rather fundamental one at that. You cannot scrub the copy of the array left behind when the garbage collector compacts the heap. Marshal.SecureStringToBSTR(ss) is okay because a BSTR is allocated in unmanaged memory, so it has a reliable pointer that won't change. In other words, there is no problem scrubbing that one.
Your byte[] bytes array, however, contains a copy of the string and is allocated on the GC heap. Allocating the hashed[] array makes a garbage collection likely. That is easily avoided, but of course you have little control over other threads in your process allocating memory and inducing a collection, or, for that matter, a background GC that was already in progress when your code started running.
The point of SecureString is to never have a cleartext copy of the string in garbage-collected memory. Copying it into a managed array violates that guarantee. If you want to make this code secure, you will have to write a hash() method that takes the IntPtr and only reads through that pointer.
Beware that if your hash needs to match a hash computed on another machine then you cannot ignore the Encoding that machine would use to turn the string into bytes.
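To make the encoding point concrete: a BSTR holds UTF-16LE code units, so for well-formed text the bytes copied out of it are the same as Encoding.Unicode.GetBytes(text), and their hash differs from that of the UTF-8 bytes another machine might use. A small sketch comparing the two (EncodingMatters and its methods are hypothetical names):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class EncodingMatters
{
    // Lowercase hex MD5 of the given bytes.
    public static string Md5Hex(byte[] data)
    {
        using (MD5 md5 = MD5.Create())
        {
            return BitConverter.ToString(md5.ComputeHash(data)).Replace("-", "").ToLowerInvariant();
        }
    }

    // What hashing the raw BSTR bytes amounts to: UTF-16LE code units.
    public static string HashAsBstrBytes(string text) => Md5Hex(Encoding.Unicode.GetBytes(text));

    // What a machine hashing the UTF-8 bytes of the same text would compute.
    public static string HashAsUtf8(string text) => Md5Hex(Encoding.UTF8.GetBytes(text));
}
```

For "abc" the UTF-8 variant gives the well-known 900150983cd24fb0d6963f7d28e17f72, while the UTF-16 variant gives a different value, so both sides must agree on the encoding.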
There's always the possibility of using the unmanaged CryptoAPI or CNG functions.
Bear in mind that SecureString was designed with an unmanaged consumer in mind, one that has full control over memory management.
If you want to stick to C#, you should pin the temporary array to prevent the GC from moving it around before you get a chance to scrub it:
private static byte[] HashSecureString(SecureString input, Func<byte[], byte[]> hash)
{
    var bstr = Marshal.SecureStringToBSTR(input);
    try {
        var length = Marshal.ReadInt32(bstr, -4);
        var bytes = new byte[length];
        // Pin the array so the GC cannot relocate it and leave stray copies behind
        var bytesPin = GCHandle.Alloc(bytes, GCHandleType.Pinned);
        try {
            Marshal.Copy(bstr, bytes, 0, length);
            return hash(bytes);
        } finally {
            // Scrub the plaintext before unpinning
            Array.Clear(bytes, 0, bytes.Length);
            bytesPin.Free();
        }
    } finally {
        // Always zero and free the BSTR, even if Copy or hash throws
        Marshal.ZeroFreeBSTR(bstr);
    }
}
As a complement to Hans’ answer, here’s a suggestion for how to implement the hasher. Hans suggests passing the pointer to the unmanaged string to the hash function, but that means the client code (= the hash function) needs to deal with unmanaged memory. That’s not ideal.
Instead, you can replace the callback with an instance of the following interface:
interface Hasher {
void Reinitialize();
void AddByte(byte b);
byte[] Result { get; }
}
That way the hasher (although it becomes slightly more complex) can be implemented wholly in managed land without leaking secure information. Your HashSecureString would then look as follows:
private static byte[] HashSecureString(SecureString ss, Hasher hasher) {
IntPtr bstr = Marshal.SecureStringToBSTR(ss);
try {
int length = Marshal.ReadInt32(bstr, -4);
hasher.Reinitialize();
for (int i = 0; i < length; i++)
hasher.AddByte(Marshal.ReadByte(bstr, i));
return hasher.Result;
}
finally {
Marshal.ZeroFreeBSTR(bstr);
}
}
Note the finally block to make sure that the unmanaged memory is zeroed, no matter what shenanigans the hasher instance does.
Here’s a simple (and not very useful) Hasher implementation to illustrate the interface:
sealed class SingleByteXor : Hasher {
private readonly byte[] data = new byte[1];
public void Reinitialize() {
data[0] = 0;
}
public void AddByte(byte b) {
data[0] ^= b;
}
public byte[] Result {
get { return data; }
}
}
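A possibly more useful Hasher, sketched on top of System.Security.Cryptography.IncrementalHash (available in .NET Core, and in .NET Framework 4.7 or later). IncrementalHasher is a hypothetical name, and the interface is repeated so the block compiles on its own. Keep in mind that the hash algorithm still buffers partial input blocks internally, so this limits, rather than eliminates, plaintext held outside the BSTR:

```csharp
using System;
using System.Security.Cryptography;

// Same shape as the Hasher interface above.
interface Hasher
{
    void Reinitialize();
    void AddByte(byte b);
    byte[] Result { get; }
}

sealed class IncrementalHasher : Hasher, IDisposable
{
    private readonly HashAlgorithmName _name;
    private IncrementalHash _hash;
    private readonly byte[] _one = new byte[1]; // reused single-byte buffer

    public IncrementalHasher(HashAlgorithmName name)
    {
        _name = name;
        _hash = IncrementalHash.CreateHash(name);
    }

    public void Reinitialize()
    {
        // Start over with a fresh hash state
        _hash.Dispose();
        _hash = IncrementalHash.CreateHash(_name);
    }

    public void AddByte(byte b)
    {
        _one[0] = b;
        _hash.AppendData(_one, 0, 1);
        _one[0] = 0; // scrub our copy immediately
    }

    // Finalizes the digest; GetHashAndReset also resets the internal state.
    public byte[] Result => _hash.GetHashAndReset();

    public void Dispose() => _hash.Dispose();
}
```

Passing new IncrementalHasher(HashAlgorithmName.SHA256) to HashSecureString would then hash the BSTR bytes without ever copying them into a managed array of their own.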
As a further complement, could you not wrap the logic @KonradRudolph and @HansPassant supplied into a custom Stream implementation?
This would allow you to use the HashAlgorithm.ComputeHash(Stream) method, which would keep the interface managed (although it would be down to you to dispose the stream in good time).
Of course, you are at the mercy of the HashAlgorithm implementation as to how much data ends up in memory at a time (but, of course, that's what the reference source is for!)
Just an idea...
public class SecureStringStream : Stream
{
    public override bool CanRead { get { return true; } }
    public override bool CanWrite { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override long Position
    {
        get { return _pos; }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { } // nothing to flush on a read-only stream
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }

    private IntPtr _bstr = IntPtr.Zero; // not readonly, so Dispose can clear it
    private readonly int _length;
    private int _pos;

    public SecureStringStream(SecureString str)
    {
        if (str == null) throw new ArgumentNullException("str");
        _bstr = Marshal.SecureStringToBSTR(str);
        try
        {
            // The 4 bytes before the BSTR pointer hold its length in bytes
            _length = Marshal.ReadInt32(_bstr, -4);
            _pos = 0;
        }
        catch
        {
            if (_bstr != IntPtr.Zero) Marshal.ZeroFreeBSTR(_bstr);
            throw;
        }
    }

    public override long Length { get { return _length; } }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (buffer == null) throw new ArgumentNullException("buffer");
        if (offset < 0) throw new ArgumentOutOfRangeException("offset");
        if (count < 0) throw new ArgumentOutOfRangeException("count");
        if (offset + count > buffer.Length) throw new ArgumentException("offset + count > buffer");
        if (_bstr == IntPtr.Zero) throw new ObjectDisposedException(GetType().Name);
        // Hand out one byte per call, advancing _pos exactly once
        if (count > 0 && _pos < _length)
        {
            buffer[offset] = Marshal.ReadByte(_bstr, _pos++);
            return 1;
        }
        return 0;
    }

    protected override void Dispose(bool disposing)
    {
        try
        {
            if (_bstr != IntPtr.Zero)
            {
                Marshal.ZeroFreeBSTR(_bstr);
                _bstr = IntPtr.Zero; // prevents a double free on repeated Dispose
            }
        }
        finally { base.Dispose(disposing); }
    }
}
void RunMe()
{
using (SecureString s = new SecureString())
{
foreach (char c in "jimbobmcgee") s.AppendChar(c);
s.MakeReadOnly();
using (SecureStringStream ss = new SecureStringStream(s))
using (HashAlgorithm h = MD5.Create())
{
Console.WriteLine(Convert.ToBase64String(h.ComputeHash(ss)));
}
}
}

Calculate a MD5 hash from a string

I use the following C# code to calculate a MD5 hash from a string.
It works well and generates a 32-character hex string like this:
900150983cd24fb0d6963f7d28e17f72
string sSourceData;
byte[] tmpSource;
byte[] tmpHash;
sSourceData = "MySourceData";
//Create a byte array from source data.
tmpSource = ASCIIEncoding.ASCII.GetBytes(sSourceData);
tmpHash = new MD5CryptoServiceProvider().ComputeHash(tmpSource);
// and then convert tmpHash to string...
Is there a way to use code like this to generate a 16-character hex string (or 12-character string)? A 32-character hex string is good but I think it'll be boring for the customer to enter the code!
As per MSDN
Create MD5:
public static string CreateMD5(string input)
{
// Use input string to calculate MD5 hash
using (System.Security.Cryptography.MD5 md5 = System.Security.Cryptography.MD5.Create())
{
byte[] inputBytes = System.Text.Encoding.ASCII.GetBytes(input);
byte[] hashBytes = md5.ComputeHash(inputBytes);
return Convert.ToHexString(hashBytes); // .NET 5 +
// Convert the byte array to hexadecimal string prior to .NET 5
// StringBuilder sb = new System.Text.StringBuilder();
// for (int i = 0; i < hashBytes.Length; i++)
// {
// sb.Append(hashBytes[i].ToString("X2"));
// }
// return sb.ToString();
}
}
// given, a password in a string
string password = @"1234abcd";
// byte array representation of that string
byte[] encodedPassword = new UTF8Encoding().GetBytes(password);
// need MD5 to calculate the hash
byte[] hash = ((HashAlgorithm) CryptoConfig.CreateFromName("MD5")).ComputeHash(encodedPassword);
// string representation (similar to UNIX format)
string encoded = BitConverter.ToString(hash)
// without dashes
.Replace("-", string.Empty)
// make lowercase
.ToLower();
// encoded contains the hash you want
I was trying to create a string representation of an MD5 hash using LINQ; however, none of the answers were LINQ solutions, so I'm adding this to the smorgasbord of available solutions.
string result;
using (MD5 hash = MD5.Create())
{
result = String.Join
(
"",
from ba in hash.ComputeHash
(
Encoding.UTF8.GetBytes(observedText)
)
select ba.ToString("x2")
);
}
You can use Convert.ToBase64String to convert the 16-byte output of MD5 to a 24-character string (j9JIbSY8HuT89/pwdC8jlw== for your example). That is a bit shorter, without reducing security.
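A minimal sketch of that idea (Md5Base64 is a hypothetical name). A 16-byte input always encodes to 24 Base64 characters ending in "==", so trimming the padding leaves 22 characters; note that Base64 output can contain '+' and '/', which may be awkward for customers to type:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class Md5Base64
{
    // Base64 of the 16-byte MD5 digest of the ASCII bytes of input,
    // optionally without the two trailing '=' padding characters.
    public static string Create(string input, bool trimPadding = true)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.ASCII.GetBytes(input));
            string b64 = Convert.ToBase64String(hash);
            return trimPadding ? b64.TrimEnd('=') : b64;
        }
    }
}
```

The padding can be restored by appending "==" before calling Convert.FromBase64String if the raw digest is needed again.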
It depends entirely on what you are trying to achieve. Technically, you could just take the first 12 characters of the MD5 hash, but MD5 is specified to produce a 128-bit digest, which is 32 hex characters.
Reducing the size of the hash reduces the security, and increases the chance of collisions and the system being broken.
Perhaps if you let us know more about what you are trying to achieve we may be able to assist more.
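A rough birthday-bound estimate puts a number on the collision risk, assuming the (truncated) MD5 values behave like uniformly random bits. Twelve hex characters keep b = 48 of the 128 bits, and among n issued codes the chance that at least two collide reaches about 50% at:

```latex
n_{1/2} \approx \sqrt{2 \ln 2 \cdot 2^{b}}, \qquad
b = 48:\; n_{1/2} \approx 1.18 \cdot 2^{24} \approx 2.0 \times 10^{7}, \qquad
b = 128:\; n_{1/2} \approx 1.18 \cdot 2^{64} \approx 2.2 \times 10^{19}
```

Whether twenty million codes is "plenty" depends on how many codes the system will ever issue.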
I think it is better to use UTF-8 encoding when computing the MD5 of a string.
public static string MD5(this string s)
{
using var provider = System.Security.Cryptography.MD5.Create();
StringBuilder builder = new StringBuilder();
foreach (byte b in provider.ComputeHash(Encoding.UTF8.GetBytes(s)))
builder.Append(b.ToString("x2")); // "x2" already produces lowercase hex
return builder.ToString();
}
public static string Md5(string input, bool isLowercase = false)
{
using (var md5 = MD5.Create())
{
var byteHash = md5.ComputeHash(Encoding.UTF8.GetBytes(input));
var hash = BitConverter.ToString(byteHash).Replace("-", "");
return (isLowercase) ? hash.ToLower() : hash;
}
}
This supports both strings and file streams.
Examples:
string hashString = EasyMD5.Hash("My String");
string hashFile = EasyMD5.Hash(System.IO.File.OpenRead("myFile.txt"));
class EasyMD5
{
private static string GetMd5Hash(byte[] data)
{
StringBuilder sBuilder = new StringBuilder();
for (int i = 0; i < data.Length; i++)
sBuilder.Append(data[i].ToString("x2"));
return sBuilder.ToString();
}
private static bool VerifyMd5Hash(byte[] data, string hash)
{
return 0 == StringComparer.OrdinalIgnoreCase.Compare(GetMd5Hash(data), hash);
}
public static string Hash(string data)
{
using (var md5 = MD5.Create())
return GetMd5Hash(md5.ComputeHash(Encoding.UTF8.GetBytes(data)));
}
public static string Hash(FileStream data)
{
using (var md5 = MD5.Create())
return GetMd5Hash(md5.ComputeHash(data));
}
public static bool Verify(string data, string hash)
{
using (var md5 = MD5.Create())
return VerifyMd5Hash(md5.ComputeHash(Encoding.UTF8.GetBytes(data)), hash);
}
public static bool Verify(FileStream data, string hash)
{
using (var md5 = MD5.Create())
return VerifyMd5Hash(md5.ComputeHash(data), hash);
}
}
I don't know anything about 16-character hex strings, but here is mine for creating an MD5 hash in one line:
using System;
using System.Security.Cryptography;
using System.Text;
string hash = BitConverter.ToString(MD5.Create().ComputeHash(Encoding.ASCII.GetBytes("THIS STRING TO MD5"))).Replace("-","");
This solution requires C# 8 (plus .NET Core 2.1 or later for the span-based APIs such as TryComputeHash) and takes advantage of Span<T>. Note that you would still need to call .Replace("-", string.Empty).ToLowerInvariant() to format the result, if necessary.
public static string CreateMD5(ReadOnlySpan<char> input)
{
var encoding = System.Text.Encoding.UTF8;
var inputByteCount = encoding.GetByteCount(input);
using var md5 = System.Security.Cryptography.MD5.Create();
Span<byte> bytes = inputByteCount < 1024
? stackalloc byte[inputByteCount]
: new byte[inputByteCount];
Span<byte> destination = stackalloc byte[md5.HashSize / 8];
encoding.GetBytes(input, bytes);
// checking the result is not required because this only returns false if "(destination.Length < HashSizeValue/8)", which is never true in this case
md5.TryComputeHash(bytes, destination, out int _bytesWritten);
return BitConverter.ToString(destination.ToArray());
}
Here is my utility function for UTF8, which can be replaced with ASCII if desired:
public static byte[] MD5Hash(string message)
{
return MD5.Create().ComputeHash(Encoding.UTF8.GetBytes(message));
}
An MD5 hash is 128 bits, so you can't represent it in hex with fewer than 32 characters.
System.Text.StringBuilder hash = new System.Text.StringBuilder();
System.Security.Cryptography.MD5CryptoServiceProvider md5provider = new System.Security.Cryptography.MD5CryptoServiceProvider();
byte[] bytes = md5provider.ComputeHash(new System.Text.UTF8Encoding().GetBytes(YourEntryString));
for (int i = 0; i < bytes.Length; i++)
{
hash.Append(bytes[i].ToString("x2")); //lowerCase; X2 if uppercase desired
}
return hash.ToString();
A faster alternative to the existing answers, for .NET Core 2.1 and higher:
public static string CreateMD5(string s)
{
using (System.Security.Cryptography.MD5 md5 = System.Security.Cryptography.MD5.Create())
{
var encoding = Encoding.ASCII;
var data = encoding.GetBytes(s);
Span<byte> hashBytes = stackalloc byte[16];
md5.TryComputeHash(data, hashBytes, out int written);
if(written != hashBytes.Length)
throw new OverflowException();
Span<char> stringBuffer = stackalloc char[32];
for (int i = 0; i < hashBytes.Length; i++)
{
hashBytes[i].TryFormat(stringBuffer.Slice(2 * i), out _, "x2");
}
return new string(stringBuffer);
}
}
You can optimize it even more, if you are sure your strings are small enough, by replacing encoding.GetBytes with the span-based int GetBytes(ReadOnlySpan<char> chars, Span<byte> bytes) overload writing into a stackalloc buffer.
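A sketch of that idea, assuming .NET 5 or later (MD5.HashData and Convert.ToHexString were added there) and using the public span-based Encoding.GetBytes(ReadOnlySpan<char>, Span<byte>) overload, which needs no unsafe code. SpanMd5 and the 256-byte stackalloc threshold are arbitrary choices for illustration:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class SpanMd5
{
    public static string CreateMD5(string s)
    {
        int byteCount = Encoding.ASCII.GetByteCount(s);
        // Stack-allocate only small inputs to avoid overflowing the stack
        Span<byte> data = byteCount <= 256 ? stackalloc byte[byteCount] : new byte[byteCount];
        Encoding.ASCII.GetBytes(s, data);

        Span<byte> hash = stackalloc byte[16]; // an MD5 digest is 16 bytes
        MD5.HashData(data, hash);
        return Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```

Whether this actually beats the TryComputeHash version above would need measuring; the main saving is avoiding the intermediate byte[] for short inputs.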
Extending Anant Dabhi's answer with a helper method:
using System.Text;
namespace XYZ.Helpers
{
public static class EncryptionHelper
{
public static string ToMD5(this string input)
{
// Use input string to calculate MD5 hash
using (System.Security.Cryptography.MD5 md5 = System.Security.Cryptography.MD5.Create())
{
byte[] inputBytes = System.Text.Encoding.ASCII.GetBytes(input);
byte[] hashBytes = md5.ComputeHash(inputBytes);
// Convert the byte array to hexadecimal string
StringBuilder sb = new StringBuilder();
for (int i = 0; i < hashBytes.Length; i++)
{
sb.Append(hashBytes[i].ToString("X2"));
}
return sb.ToString();
}
}
}
}
I'd like to offer an alternative that appears to perform at least 10% faster than craigdfrench's answer in my tests (.NET 4.7.2):
public static string GetMD5Hash(string text)
{
using ( var md5 = MD5.Create() )
{
byte[] computedHash = md5.ComputeHash( Encoding.UTF8.GetBytes(text) );
return new System.Runtime.Remoting.Metadata.W3cXsd2001.SoapHexBinary(computedHash).ToString();
}
}
If you prefer to have using System.Runtime.Remoting.Metadata.W3cXsd2001; at the top, the method body can be made an easier to read one-liner:
using ( var md5 = MD5.Create() )
{
return new SoapHexBinary( md5.ComputeHash( Encoding.UTF8.GetBytes(text) ) ).ToString();
}
Obvious enough, but for completeness, in the OP's context it would be used as:
sSourceData = "MySourceData";
string hashString = GetMD5Hash(sSourceData); // returns a string, not a byte[] like tmpHash
https://learn.microsoft.com/en-us/dotnet/api/system.security.cryptography.md5?view=netframework-4.7.2
using System;
using System.Security.Cryptography;
using System.Text;
static string GetMd5Hash(string input)
{
using (MD5 md5Hash = MD5.Create())
{
// Convert the input string to a byte array and compute the hash.
byte[] data = md5Hash.ComputeHash(Encoding.UTF8.GetBytes(input));
// Create a new Stringbuilder to collect the bytes
// and create a string.
StringBuilder sBuilder = new StringBuilder();
// Loop through each byte of the hashed data
// and format each one as a hexadecimal string.
for (int i = 0; i < data.Length; i++)
{
sBuilder.Append(data[i].ToString("x2"));
}
// Return the hexadecimal string.
return sBuilder.ToString();
}
}
// Verify a hash against a string.
static bool VerifyMd5Hash(string input, string hash)
{
// Hash the input.
string hashOfInput = GetMd5Hash(input);
// Create a StringComparer and compare the hashes.
StringComparer comparer = StringComparer.OrdinalIgnoreCase;
return 0 == comparer.Compare(hashOfInput, hash);
}
StringBuilder sb= new StringBuilder();
for (int i = 0; i < tmpHash.Length; i++)
{
sb.Append(tmpHash[i].ToString("x2"));
}
public static string GetMD5(string encryptString)
{
var passByteCrypt = new MD5CryptoServiceProvider().ComputeHash(Encoding.UTF8.GetBytes(encryptString));
return ByteArrayToString(passByteCrypt);
}
public static string ByteArrayToString(byte[] bytes)
{
var output = new StringBuilder(bytes.Length);
foreach (var t in bytes)
{
output.Append(t.ToString("X2"));
}
return output.ToString().ToLower();
}
This is a simple MD5 hash helper.
If you are using a version lower than .NET 5, this is a neat way to write it:
string.Concat(yourHashBytes.Select(x => x.ToString("X2")))
Here is a condensed version.
private string CreateMD5(string myText)
{
var hash = System.Security.Cryptography.MD5.Create()
.ComputeHash(System.Text.Encoding.ASCII.GetBytes(myText ?? ""));
return string.Join("", Enumerable.Range(0, hash.Length).Select(i => hash[i].ToString("x2")));
}

Fastest way to convert a possibly-null-terminated ascii byte[] to a string?

I need to convert a (possibly) null-terminated array of ASCII bytes to a string in C#, and the fastest way I've found to do it is my UnsafeAsciiBytesToString method shown below. This method uses the String.String(sbyte*) constructor, which contains a warning in its remarks:
"The value parameter is assumed to point to an array representing a string encoded using the default ANSI code page (that is, the encoding method specified by Encoding.Default).
Note: Because the default ANSI code page is system-dependent, the string created by this constructor from identical signed byte arrays may differ on different systems. ...
If the specified array is not null-terminated, the behavior of this constructor is system dependent. For example, such a situation might cause an access violation."
Now, I'm positive that the way the string is encoded will never change... but the default codepage on the system that my app is running on might change. So, is there any reason that I shouldn't run screaming from using String.String(sbyte*) for this purpose?
using System;
using System.Text;
namespace FastAsciiBytesToString
{
static class StringEx
{
public static string AsciiBytesToString(this byte[] buffer, int offset, int maxLength)
{
int maxIndex = offset + maxLength;
for( int i = offset; i < maxIndex; i++ )
{
/// Skip non-nulls.
if( buffer[i] != 0 ) continue;
/// First null we find, return the string.
return Encoding.ASCII.GetString(buffer, offset, i - offset);
}
/// Terminating null not found. Convert the entire section from offset to maxLength.
return Encoding.ASCII.GetString(buffer, offset, maxLength);
}
public static string UnsafeAsciiBytesToString(this byte[] buffer, int offset)
{
string result = null;
unsafe
{
fixed( byte* pAscii = &buffer[offset] )
{
result = new String((sbyte*)pAscii);
}
}
return result;
}
}
class Program
{
static void Main(string[] args)
{
byte[] asciiBytes = new byte[]{ 0, 0, 0, (byte)'a', (byte)'b', (byte)'c', 0, 0, 0 };
string result = asciiBytes.AsciiBytesToString(3, 6);
Console.WriteLine("AsciiBytesToString Result: \"{0}\"", result);
result = asciiBytes.UnsafeAsciiBytesToString(3);
Console.WriteLine("UnsafeAsciiBytesToString Result: \"{0}\"", result);
/// Non-null terminated test.
asciiBytes = new byte[]{ 0, 0, 0, (byte)'a', (byte)'b', (byte)'c' };
result = asciiBytes.UnsafeAsciiBytesToString(3);
Console.WriteLine("UnsafeAsciiBytesToString Result: \"{0}\"", result);
Console.ReadLine();
}
}
}
One-liner (assuming the buffer actually contains exactly ONE well-formed, null-terminated string):
String MyString = Encoding.ASCII.GetString(MyByteBuffer).TrimEnd((Char)0);
Any reason not to use the String(sbyte*, int, int) constructor? If you've worked out which portion of the buffer you need, the rest should be simple:
public static string UnsafeAsciiBytesToString(byte[] buffer, int offset, int length)
{
unsafe
{
fixed (byte* pAscii = buffer)
{
return new String((sbyte*)pAscii, offset, length);
}
}
}
If you need to look first:
public static string UnsafeAsciiBytesToString(byte[] buffer, int offset)
{
int end = offset;
while (end < buffer.Length && buffer[end] != 0)
{
end++;
}
unsafe
{
fixed (byte* pAscii = buffer)
{
return new String((sbyte*)pAscii, offset, end - offset);
}
}
}
If this truly is an ASCII string (i.e. all bytes are less than 128) then the codepage problem shouldn't be an issue unless you've got a particularly strange default codepage which isn't based on ASCII.
Out of interest, have you actually profiled your application to make sure that this is really the bottleneck? Do you definitely need the absolute fastest conversion, instead of one which is more readable (e.g. using Encoding.GetString for the appropriate encoding)?
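For comparison, a safe variant in the spirit of "Encoding.GetString for the appropriate encoding", assuming .NET Core 2.1 or later for the span overloads (SpanAscii is a hypothetical name). Span.IndexOf finds the terminator and Encoding.ASCII is chosen explicitly, so the result does not depend on the system code page; whether it matches the unsafe version's speed would need profiling:

```csharp
using System;
using System.Text;

public static class SpanAscii
{
    // Decode from offset up to (not including) the first 0 byte,
    // or to the end of the buffer if there is no terminator.
    public static string AsciiBytesToString(byte[] buffer, int offset)
    {
        ReadOnlySpan<byte> span = buffer.AsSpan(offset);
        int nul = span.IndexOf((byte)0);
        return Encoding.ASCII.GetString(nul >= 0 ? span.Slice(0, nul) : span);
    }
}
```

Unlike the String(sbyte*) constructor, a missing terminator simply ends the string at the end of the array instead of reading past it.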
I'm not sure of the speed, but I found it easiest to use LINQ to remove the nulls before encoding:
string s = myEncoding.GetString(bytes.TakeWhile(b => !b.Equals(0)).ToArray());
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace TestProject1
{
class Class1
{
static public string cstr_to_string( byte[] data, int code_page)
{
Encoding Enc = Encoding.GetEncoding(code_page);
int inx = Array.FindIndex(data, 0, (x) => x == 0);//search for 0
if (inx >= 0)
return (Enc.GetString(data, 0, inx));
else
return (Enc.GetString(data));
}
}
}
int nul = s.IndexOf('\0');
if (nul >= 0) s = s.Substring(0, nul); // IndexOf returns -1 when there is no terminator
Just for completeness, you can also use built-in methods of the .NET framework to do this:
var handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
try
{
return Marshal.PtrToStringAnsi(handle.AddrOfPinnedObject());
}
finally
{
handle.Free();
}
Advantages:
It doesn't require unsafe code (i.e., you can also use this method for VB.NET) and
it also works for "wide" (UTF-16) strings, if you use Marshal.PtrToStringUni instead.
One possibility to consider: check that the default code-page is acceptable and use that information to select the conversion mechanism at run-time.
This could also take into account whether the string is in fact null-terminated, but once you've done that, of course, the speed gains may vanish.
An easy, safe and fast way to convert byte[] objects to strings containing their ASCII equivalent, and vice versa, is the .NET class System.Text.Encoding. The class has a static property, Encoding.ASCII, that returns an ASCII encoder:
From string to byte[]:
string s = "Hello World!";
byte[] b = System.Text.Encoding.ASCII.GetBytes(s);
From byte[] to string:
byte[] byteArray = new byte[] {0x41, 0x42, 0x09, 0x00, 0xFF};
string s = System.Text.Encoding.ASCII.GetString(byteArray);
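One caveat worth checking: ASCII decoding is not lossless for bytes 0x80 and above, which are replaced by '?' (0x3F). If the bytes may exceed 0x7F and must survive a round trip, Latin-1 maps every byte 0x00-0xFF to U+0000-U+00FF. A small sketch (AsciiVsLatin1 is a hypothetical name; Encoding.Latin1 assumes .NET 5 or later):

```csharp
using System;
using System.Text;

public static class AsciiVsLatin1
{
    // Decode with ASCII and re-encode: bytes >= 0x80 come back as 0x3F ('?').
    public static byte[] RoundTripAscii(byte[] data) =>
        Encoding.ASCII.GetBytes(Encoding.ASCII.GetString(data));

    // Latin-1 maps each byte to the char with the same value, so nothing is lost.
    public static byte[] RoundTripLatin1(byte[] data) =>
        Encoding.Latin1.GetBytes(Encoding.Latin1.GetString(data));
}
```

With the byteArray above, the ASCII round trip turns the final 0xFF into 0x3F, while the Latin-1 round trip returns the original bytes.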
This is a bit ugly but you don't have to use unsafe code:
string result = "";
for (int i = 0; i < data.Length && data[i] != 0; i++)
result += (char)data[i];
