What is the best solution in C# for computing an "on the fly" MD5-like hash of a stream of unknown length? Specifically, I want to compute a hash from data received over the network. I know I am done receiving data when the sender terminates the connection, so I don't know the length in advance.
[EDIT] - Right now I am using md5 and am doing a second pass over the data after it's been saved and written to disk. I'd rather hash it in place as it comes in from the network.
MD5, like other hash functions, does not require two passes.
To start:
HashAlgorithm hasher = ..;
hasher.Initialize();
As each block of data arrives:
byte[] buffer = ..;
int bytesReceived = ..;
hasher.TransformBlock(buffer, 0, bytesReceived, null, 0);
To finish and retrieve the hash:
hasher.TransformFinalBlock(new byte[0], 0, 0);
byte[] hash = hasher.Hash;
This pattern works for any type derived from HashAlgorithm, including MD5CryptoServiceProvider and SHA1Managed.
HashAlgorithm also defines a method ComputeHash which takes a Stream object; however, this method will block the thread until the stream is consumed. Using the TransformBlock approach allows an "asynchronous hash" that is computed as data arrives without using up a thread.
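Putting the three steps above together, a minimal sketch might look like the following. The class and method names (StreamingHash, HashWhileReceiving) and the 8 KB buffer size are made up for illustration, and a plain Stream stands in for the network connection; it assumes Read returns 0 once the sender has closed the connection.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class StreamingHash
{
    // Hypothetical helper: hashes everything readable from 'source', one chunk at a time.
    // 'source' stands in for a NetworkStream; Read returns 0 once the sender closes.
    public static byte[] HashWhileReceiving(Stream source, HashAlgorithm hasher)
    {
        byte[] buffer = new byte[8192];
        int bytesReceived;
        while ((bytesReceived = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Feed only the bytes actually received; a null output buffer skips the copy.
            hasher.TransformBlock(buffer, 0, bytesReceived, null, 0);
            // The same chunk could be written to disk here, so no second pass is needed.
        }
        // An empty final block finalizes the hash.
        hasher.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
        return hasher.Hash;
    }
}
```

The caller owns the hasher, for example `using (var md5 = MD5.Create()) { var hash = StreamingHash.HashWhileReceiving(stream, md5); }`.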
Further to @peter-mourfield's answer, here is the code that uses ComputeHash():
private static string CalculateMd5(string filePathName) {
using (var stream = File.OpenRead(filePathName))
using (var md5 = MD5.Create()) {
var hash = md5.ComputeHash(stream);
var base64String = Convert.ToBase64String(hash);
return base64String;
}
}
Since both the stream and MD5 implement IDisposable, you should wrap them in using(...){...} blocks.
The method in the code example returns the same string that is used for the MD5 checksum in Azure Blob Storage.
The System.Security.Cryptography.MD5 class contains a ComputeHash method that takes either a byte[] or Stream. Check out the documentation.
This seems like a perfect use case for CryptoStream (docs).
I've used CryptoStream for processing unknown-length streams of database results that need to be gzipped and then transferred across the network along with a hash of the compressed file. Inserting a CryptoStream between the compressor and the file writer allows you to compute the hash on the fly so that it's ready as soon as the file is written.
The basic approach looks like this:
var hasher = MD5.Create();
using (FileStream outFile = File.Create(filePath))
using (CryptoStream crypto = new CryptoStream(outFile, hasher, CryptoStreamMode.Write))
using (GZipStream compress = new GZipStream(crypto, CompressionMode.Compress))
using (StreamWriter writer = new StreamWriter(compress))
{
foreach (string line in GetLines())
writer.WriteLine(line);
}
// at this point the streams are closed so the hash is ready
string hash = BitConverter.ToString(hasher.Hash).Replace("-", "").ToLowerInvariant();
Necromancing.
Two possibilities in C# .NET Core:
private static System.Security.Cryptography.HashAlgorithm GetHashAlgorithm(System.Security.Cryptography.HashAlgorithmName hashAlgorithmName)
{
if (hashAlgorithmName == System.Security.Cryptography.HashAlgorithmName.MD5)
return (System.Security.Cryptography.HashAlgorithm) System.Security.Cryptography.MD5.Create();
if (hashAlgorithmName == System.Security.Cryptography.HashAlgorithmName.SHA1)
return (System.Security.Cryptography.HashAlgorithm) System.Security.Cryptography.SHA1.Create();
if (hashAlgorithmName == System.Security.Cryptography.HashAlgorithmName.SHA256)
return (System.Security.Cryptography.HashAlgorithm) System.Security.Cryptography.SHA256.Create();
if (hashAlgorithmName == System.Security.Cryptography.HashAlgorithmName.SHA384)
return (System.Security.Cryptography.HashAlgorithm) System.Security.Cryptography.SHA384.Create();
if (hashAlgorithmName == System.Security.Cryptography.HashAlgorithmName.SHA512)
return (System.Security.Cryptography.HashAlgorithm) System.Security.Cryptography.SHA512.Create();
throw new System.Security.Cryptography.CryptographicException($"Unknown hash algorithm \"{hashAlgorithmName.Name}\".");
}
protected override byte[] HashData(System.IO.Stream data,
System.Security.Cryptography.HashAlgorithmName hashAlgorithm)
{
using (System.Security.Cryptography.HashAlgorithm hashAlgorithm1 =
GetHashAlgorithm(hashAlgorithm))
return hashAlgorithm1.ComputeHash(data);
}
or with BouncyCastle:
private static Org.BouncyCastle.Crypto.IDigest GetBouncyAlgorithm(
System.Security.Cryptography.HashAlgorithmName hashAlgorithmName)
{
if (hashAlgorithmName == System.Security.Cryptography.HashAlgorithmName.MD5)
return new Org.BouncyCastle.Crypto.Digests.MD5Digest();
if (hashAlgorithmName == System.Security.Cryptography.HashAlgorithmName.SHA1)
return new Org.BouncyCastle.Crypto.Digests.Sha1Digest();
if (hashAlgorithmName == System.Security.Cryptography.HashAlgorithmName.SHA256)
return new Org.BouncyCastle.Crypto.Digests.Sha256Digest();
if (hashAlgorithmName == System.Security.Cryptography.HashAlgorithmName.SHA384)
return new Org.BouncyCastle.Crypto.Digests.Sha384Digest();
if (hashAlgorithmName == System.Security.Cryptography.HashAlgorithmName.SHA512)
return new Org.BouncyCastle.Crypto.Digests.Sha512Digest();
throw new System.Security.Cryptography.CryptographicException(
$"Unknown hash algorithm \"{hashAlgorithmName.Name}\"."
);
} // End Function GetBouncyAlgorithm
protected override byte[] HashData(System.IO.Stream data,
System.Security.Cryptography.HashAlgorithmName hashAlgorithm)
{
Org.BouncyCastle.Crypto.IDigest digest = GetBouncyAlgorithm(hashAlgorithm);
byte[] buffer = new byte[4096];
int cbSize;
while ((cbSize = data.Read(buffer, 0, buffer.Length)) > 0)
digest.BlockUpdate(buffer, 0, cbSize);
byte[] hash = new byte[digest.GetDigestSize()];
digest.DoFinal(hash, 0);
return hash;
}
Another option could be to use the System.Security.Cryptography.IncrementalHash class instead.
byte[] DataBrick;
var IncMD5 = IncrementalHash.CreateHash(HashAlgorithmName.MD5);
Then you can accumulate data in the hasher:
IncMD5.AppendData(DataBrick, 0, DataBrick.Length);
check the hash value for the data accumulated so far and keep appending:
byte[] hash = IncMD5.GetCurrentHash();
bytesReceived = netStream.Read(DataBrick, 0, DataBrick.Length);
IncMD5.AppendData(DataBrick, 0, bytesReceived);
or stop and reset to start accumulating a new hash value:
byte[] finalHash = IncMD5.GetHashAndReset();
Note: IncrementalHash implements IDisposable.
IncMD5.Dispose(); // when done, or using(IncMD5){..} if that makes more sense in your scope
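As a rough, self-contained sketch of the IncrementalHash approach applied to a stream of unknown length: the class name, method name, and buffer size below are illustrative assumptions, and any Stream (for instance a NetworkStream) can be passed in.

```csharp
using System.IO;
using System.Security.Cryptography;

static class IncrementalHashExample
{
    // Hypothetical helper: reads 'netStream' to the end and returns its hash.
    public static byte[] HashStream(Stream netStream, HashAlgorithmName algorithm)
    {
        using (IncrementalHash hasher = IncrementalHash.CreateHash(algorithm))
        {
            byte[] buffer = new byte[8192];
            int bytesReceived;
            // Read returns 0 when the stream ends (the sender closed the connection).
            while ((bytesReceived = netStream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Append only the bytes that were actually read.
                hasher.AppendData(buffer, 0, bytesReceived);
            }
            // Finalizes the hash and resets the instance for reuse.
            return hasher.GetHashAndReset();
        }
    }
}
```

Usage would be along the lines of `byte[] hash = IncrementalHashExample.HashStream(netStream, HashAlgorithmName.MD5);`.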
The Problem
I want to decrypt encrypted data with RijndaelManaged, but the result is always empty (either "" or a byte array with the length of the data, full of zeros).
All parameters, the salt and the data are correct, and CryptoHelper.CreateRijndaelManagedAES gets called in exactly the same way in the encrypt method (which produces good output).
The only thing left I could think of is that I use the streams wrong, but I can't figure out why ...
Code
public static RijndaelManaged CreateRijndaelManagedAES(byte[] passwordHash, byte[] salt)
{
RijndaelManaged aes = new RijndaelManaged
{
KeySize = 256,
BlockSize = 128,
Padding = PaddingMode.PKCS7,
Mode = CipherMode.CBC
};
// Derive a key of the full Argon2 string (contains also meta data)
using Rfc2898DeriveBytes key = new Rfc2898DeriveBytes(passwordHash, salt, 10);
aes.Key = key.GetBytes(aes.KeySize / 8);
aes.IV = key.GetBytes(aes.BlockSize / 8);
return aes;
}
public static async Task<string> EncryptDataAsync(string plainData, byte[] passwordHash, int saltSize)
{
return await Task.Run(async () =>
{
// Generate a random salt
byte[] salt = CryptoHelper.GenerateRandomSalt(saltSize);
// Write the salt unencrypted
using MemoryStream memoryStream = new MemoryStream();
await memoryStream.WriteAsync(salt);
// Encrypt the data and write the result to the stream
using RijndaelManaged aes = CryptoHelper.CreateRijndaelManagedAES(passwordHash, salt);
using CryptoStream cryptoStream = new CryptoStream(memoryStream, aes.CreateEncryptor(), CryptoStreamMode.Write);
using StreamWriter streamWriter = new StreamWriter(cryptoStream);
await streamWriter.WriteAsync(plainData);
cryptoStream.FlushFinalBlock();
return ENCRYPTED_MAGIC + Convert.ToBase64String(memoryStream.ToArray());
});
}
public static async Task<string> DecryptDataAsync(string encryptedData, byte[] passwordHash, int saltSize)
{
if (!HasValidMagicBytes(encryptedData))
{
throw new ArgumentException("The given data isn't encrypted");
}
return await Task.Run(async () =>
{
byte[] saltAndData = Convert.FromBase64String(encryptedData.Substring(ENCRYPTED_MAGIC.Length));
byte[] salt = saltAndData.Take(saltSize).ToArray();
byte[] data = saltAndData.TakeLast(saltAndData.Length - saltSize).ToArray();
// Decrypt the data and return the result
using MemoryStream memoryStream = new MemoryStream(data);
using RijndaelManaged aes = CryptoHelper.CreateRijndaelManagedAES(passwordHash, salt);
using CryptoStream cryptoStream = new CryptoStream(memoryStream, aes.CreateDecryptor(), CryptoStreamMode.Read);
using StreamReader streamReader = new StreamReader(cryptoStream);
return await streamReader.ReadToEndAsync();
});
}
Security Notice
Don't use Rfc2898DeriveBytes with only 10 iterations (as I do) if you pass a password to it. I derive a key from the password beforehand due to performance reasons (Blazor WASM) and pass the result to the CreateRijndaelManagedAES function.
If you want to use Rfc2898DeriveBytes with a password you should use at least 50000 iterations (as of 2020).
Explanation
The iterations parameter is basically a cost parameter. The higher the iteration count, the harder it gets for an attacker to brute force the derived password (as they would need more computing power/time). NIST released a publication in 2017 in which it states that you should use as many iterations as your environment can handle, but at least 10000. I can't find the source anymore, but I remember reading that you should currently use at least 50000 (for future security).
The issue is in the EncryptDataAsync method, i.e. the encryption; the bug only becomes evident during decryption in DecryptDataAsync. The StreamWriter must be flushed before memoryStream.ToArray() is called, otherwise the text still sitting in its buffer never reaches the CryptoStream. The flush must be executed before:
...
cryptoStream.FlushFinalBlock();
...
that is:
...
streamWriter.Flush();
cryptoStream.FlushFinalBlock();
...
or alternatively
...
streamWriter.Close();
...
which flushes/closes both streams, see also StreamWriter.Flush() and StreamWriter.Close().
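The flush order can be illustrated with a small round trip. This is only a sketch under assumptions: it uses Aes.Create() with a caller-supplied key and IV rather than the question's CreateRijndaelManagedAES helper, it is synchronous, and the class and method names are made up.

```csharp
using System.IO;
using System.Security.Cryptography;
using System.Text;

static class FlushOrderExample
{
    public static byte[] Encrypt(string plainText, byte[] key, byte[] iv)
    {
        using (Aes aes = Aes.Create())
        using (MemoryStream memoryStream = new MemoryStream())
        using (ICryptoTransform encryptor = aes.CreateEncryptor(key, iv))
        using (CryptoStream cryptoStream = new CryptoStream(memoryStream, encryptor, CryptoStreamMode.Write))
        using (StreamWriter streamWriter = new StreamWriter(cryptoStream, new UTF8Encoding(false)))
        {
            streamWriter.Write(plainText);
            // 1. Push the writer's buffered characters into the CryptoStream.
            streamWriter.Flush();
            // 2. Only then encrypt the last (padded) block.
            cryptoStream.FlushFinalBlock();
            // 3. Now the MemoryStream holds the complete ciphertext.
            return memoryStream.ToArray();
        }
    }

    public static string Decrypt(byte[] cipherText, byte[] key, byte[] iv)
    {
        using (Aes aes = Aes.Create())
        using (MemoryStream memoryStream = new MemoryStream(cipherText))
        using (ICryptoTransform decryptor = aes.CreateDecryptor(key, iv))
        using (CryptoStream cryptoStream = new CryptoStream(memoryStream, decryptor, CryptoStreamMode.Read))
        using (StreamReader streamReader = new StreamReader(cryptoStream, Encoding.UTF8))
        {
            return streamReader.ReadToEnd();
        }
    }
}
```

Without the Flush call in step 1, short inputs never leave the StreamWriter's buffer before the final block is written, which matches the empty result described in the question.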
I have written a process where a file is encrypted and uploaded to Azure. On download the file has to be decrypted, and that step fails with either a "Padding is invalid and cannot be removed" error or a "Length of the data to decrypt is invalid." error.
I've tried numerous solutions online, including C# Decrypting mp3 file using RijndaelManaged and CryptoStream, but none of them seem to work and I end up just bouncing back and forth between these two errors. The encryption process uses the same key/IV pair that decryption uses, and since it will decrypt a portion of the stream I feel like that's working fine - it just ends up dying with the above errors.
Here is my code, any ideas? Please note that the three variants (cryptoStream.CopyTo(decryptedStream), do {} and while) aren't run together - they are here to show the options I've already tried, all of which fail.
byte[] encryptedBytes = null;
using (var encryptedStream = new MemoryStream())
{
//download from Azure
cloudBlockBlob.DownloadToStream(encryptedStream);
//reset positioning for reading it back out
encryptedStream.Position = 0;
encryptedBytes = encryptedStream.ConvertToByteArray();
}
//used for the blob stream from Azure
using (var encryptedStream = new MemoryStream(encryptedBytes))
{
//stream where decrypted contents will be stored
using (var decryptedStream = new MemoryStream())
{
using (var aes = new RijndaelManaged { KeySize = 256, Key = blobKey.Key, IV = blobKey.IV })
{
using (var decryptor = aes.CreateDecryptor())
{
//decrypt stream and write it to parent stream
using (var cryptoStream = new CryptoStream(encryptedStream, decryptor, CryptoStreamMode.Read))
{
//fails here with "Length of the data to decrypt is invalid." error
cryptoStream.CopyTo(decryptedStream);
int data;
//fails here with "Length of the data to decrypt is invalid." error after it loops a number of times,
//implying it is in fact decrypting part of it, just not everything
do
{
data = cryptoStream.ReadByte();
decryptedStream.WriteByte((byte)cryptoStream.ReadByte());
} while (!cryptoStream.HasFlushedFinalBlock);
//fails here with "Length of the data to decrypt is invalid." error after it loops a number of times,
//implying it is in fact decrypting part of it, just not everything
while ((data = cryptoStream.ReadByte()) != -1)
{
decryptedStream.WriteByte((byte)data);
}
}
}
}
//reset position in prep for reading
decryptedStream.Position = 0;
return decryptedStream.ConvertToByteArray();
}
}
One of the comments mentioned wanting to know what ConvertToByteArray is, and it's just a simple extension method:
/// <summary>
/// Converts a Stream into a byte array.
/// </summary>
/// <param name="stream">The stream to convert.</param>
/// <returns>A byte[] array representing the current stream.</returns>
public static byte[] ConvertToByteArray(this Stream stream)
{
byte[] buffer = new byte[16 * 1024];
using (MemoryStream ms = new MemoryStream())
{
int read;
while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
{
ms.Write(buffer, 0, read);
}
return ms.ToArray();
}
}
The code never reaches this though - it dies before I can ever get it to this point.
After a lot of back and forth from various blogs, I found I actually had a couple of errors in the above code that were nailing me. First, the encryption process was incorrectly writing the array - it was wrapped with a CryptoStream instance, but wasn't actually utilizing that so I was writing the unencrypted data to Azure. Here is the proper route to go with this (fileKey is part of a custom class I created to generate Key/IV pairs, so wherever that is referenced can be changed to the built-in process from RijndaelManaged or anything else you'd utilize for coming up with a key/IV pair):
using (var aes = new RijndaelManaged { KeySize = 256, Key = fileKey.Key, IV = fileKey.IV })
{
using (var encryptedStream = new MemoryStream())
{
using (ICryptoTransform encryptor = aes.CreateEncryptor())
{
using (CryptoStream cryptoStream = new CryptoStream(encryptedStream, encryptor, CryptoStreamMode.Write))
{
using (var originalByteStream = new MemoryStream(file.File.Data))
{
int data;
while ((data = originalByteStream.ReadByte()) != -1)
cryptoStream.WriteByte((byte)data);
}
}
}
var encryptedBytes = encryptedStream.ToArray();
return encryptedBytes;
}
}
Second, since my encryption process involves multiple steps (three total keys per file - container, filename and file itself), when I tried to decrypt, I was using the wrong key (which is seen above when I referenced blobKey to decrypt, which was actually the key used for encrypting the filename and not the file itself). The proper decryption method was:
//used for the blob stream from Azure
using (var encryptedStream = new MemoryStream(encryptedBytes))
{
//stream where decrypted contents will be stored
using (var decryptedStream = new MemoryStream())
{
// use the same key/IV pair the file was encrypted with (fileKey, not blobKey)
using (var aes = new RijndaelManaged { KeySize = 256, Key = fileKey.Key, IV = fileKey.IV })
{
using (var decryptor = aes.CreateDecryptor())
{
//decrypt stream and write it to parent stream
using (var cryptoStream = new CryptoStream(encryptedStream, decryptor, CryptoStreamMode.Read))
{
int data;
while ((data = cryptoStream.ReadByte()) != -1)
decryptedStream.WriteByte((byte)data);
}
}
}
//reset position in prep for reading
decryptedStream.Position = 0;
return decryptedStream.ConvertToByteArray();
}
}
I had looked into the Azure Encryption Extensions (http://www.stefangordon.com/introducing-azure-encryption-extensions/), but it was a little more local file-centric than I was interested in - everything on my end is streams/in-memory only, and retrofitting that utility was going to be more work than it was worth.
Hopefully this helps anyone looking to encrypt Azure blobs with zero reliance on the underlying file system!
Bit late to the party, but in case this is useful to someone who finds this thread:
The following works well for me.
internal static byte[] AesEncryptor(byte[] key, byte[] iv, byte[] payload)
{
using (var aesAlg = Aes.Create())
{
aesAlg.Mode = CipherMode.CBC;
aesAlg.Padding = PaddingMode.PKCS7;
var encryptor = aesAlg.CreateEncryptor(key, iv);
var encrypted = encryptor.TransformFinalBlock(payload, 0, payload.Length);
return iv.Concat(encrypted).ToArray();
}
}
and to decrypt:
internal static byte[] AesDecryptor(byte[] key, byte[] iv, byte[] payload)
{
using (var aesAlg = Aes.Create())
{
aesAlg.Mode = CipherMode.CBC;
aesAlg.Padding = PaddingMode.PKCS7;
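// use the caller's key/iv (aesAlg.Key and aesAlg.IV are freshly generated random values);
// payload must be the ciphertext only, without the IV that AesEncryptor prepends
var decryptor = aesAlg.CreateDecryptor(key, iv);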
return decryptor.TransformFinalBlock(payload, 0, payload.Length);
}
}
This works for encrypting/decrypting both fixed-length hex strings when decoded from hex to byte[] and variable-length UTF-8 strings when converted using Encoding.UTF8.GetBytes().
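Because the encryptor above returns the IV followed by the ciphertext, the receiving side has to split the two apart again before decrypting. A minimal sketch of that convention follows; the class name IvPrefixedAes and the choice to generate a fresh random IV per message are assumptions for illustration, not part of the answer above.

```csharp
using System;
using System.Security.Cryptography;

static class IvPrefixedAes
{
    private const int IvLength = 16; // AES block size in bytes

    // Returns IV || ciphertext, using a fresh random IV for each call.
    public static byte[] Encrypt(byte[] key, byte[] payload)
    {
        using (Aes aes = Aes.Create())
        {
            aes.Key = key;
            aes.GenerateIV();
            byte[] iv = aes.IV;
            using (ICryptoTransform encryptor = aes.CreateEncryptor())
            {
                byte[] cipherText = encryptor.TransformFinalBlock(payload, 0, payload.Length);
                byte[] result = new byte[iv.Length + cipherText.Length];
                Buffer.BlockCopy(iv, 0, result, 0, iv.Length);
                Buffer.BlockCopy(cipherText, 0, result, iv.Length, cipherText.Length);
                return result;
            }
        }
    }

    // Expects IV || ciphertext as produced by Encrypt.
    public static byte[] Decrypt(byte[] key, byte[] ivAndCipherText)
    {
        byte[] iv = new byte[IvLength];
        Buffer.BlockCopy(ivAndCipherText, 0, iv, 0, IvLength);
        using (Aes aes = Aes.Create())
        using (ICryptoTransform decryptor = aes.CreateDecryptor(key, iv))
        {
            // Skip the IV prefix and decrypt only the ciphertext portion.
            return decryptor.TransformFinalBlock(ivAndCipherText, IvLength, ivAndCipherText.Length - IvLength);
        }
    }
}
```

Aes.Create() defaults to CBC with PKCS7 padding, so the explicit Mode and Padding assignments are omitted here.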
Using the following code I always get the same hash regardless of the input. Any ideas why that might be?
private static SHA256 sha256;
internal static byte[] HashForCDCR(this string value)
{
byte[] hash;
using (var myStream = new System.IO.MemoryStream())
{
using (var sw = new System.IO.StreamWriter(myStream))
{
sw.Write(value);
hash = sha256.ComputeHash(myStream);
}
}
return hash;
}
You are computing the hash of the empty portion of the stream (the part immediately after the content you wrote with sw.Write), so it is always the same.
Cheap fix: sw.Flush();myStream.Position = 0;. A better fix is to finish writing and create a new read-only stream for hashing based on the original stream:
using (var myStream = new System.IO.MemoryStream())
{
using (var sw = new System.IO.StreamWriter(myStream))
{
sw.Write(value);
}
using (var readonlyStream = new MemoryStream(myStream.ToArray(), writable:false))
{
hash = sha256.ComputeHash(readonlyStream);
}
}
You may need to flush your stream. For optimal performance, StreamWriter doesn't write to the stream immediately; it waits for its internal buffer to fill. Flushing the writer immediately writes the content of the internal buffer to the underlying stream.
sw.Write(value);
sw.Flush();
myStream.Position = 0;
hash = sha256.ComputeHash(myStream);
I will probably use the solution that Alexei Levenkov called a "cheap fix". However, I did come across one other way to make it work, which I will post for future readers:
var encoding = new System.Text.UTF8Encoding();
var bytes = encoding.GetBytes(value);
var hash = sha256.ComputeHash(bytes);
return hash;
Jacob
How do I use the SHA1CryptoServiceProvider() on a file to create a SHA1 Checksum of the file?
using (FileStream fs = new FileStream(@"C:\file\location", FileMode.Open))
using (BufferedStream bs = new BufferedStream(fs))
{
using (SHA1Managed sha1 = new SHA1Managed())
{
byte[] hash = sha1.ComputeHash(bs);
StringBuilder formatted = new StringBuilder(2 * hash.Length);
foreach (byte b in hash)
{
formatted.AppendFormat("{0:X2}", b);
}
}
}
formatted contains the string representation of the SHA-1 hash. Also, by using a FileStream instead of a byte buffer, ComputeHash computes the hash in chunks, so you don't have to load the entire file in one go, which is helpful for large files.
With the ComputeHash method (see its documentation).
Example snippet:
using(var cryptoProvider = new SHA1CryptoServiceProvider())
{
string hash = BitConverter
.ToString(cryptoProvider.ComputeHash(buffer));
//do something with hash
}
Where buffer is the contents of your file.
If you are already reading the file as a stream, then the following technique calculates the hash as you read it. The only caveat is that you need to consume the whole stream.
class Program
{
static void Main(string[] args)
{
String sourceFileName = "C:\\test.txt";
Byte[] shaHash;
//Use Sha1Managed if you really want sha1
using (var shaForStream = new SHA256Managed())
using (Stream sourceFileStream = File.Open(sourceFileName, FileMode.Open))
using (Stream sourceStream = new CryptoStream(sourceFileStream, shaForStream, CryptoStreamMode.Read))
{
//Do something with the sourceStream
//NOTE You need to read all the bytes, otherwise you'll get an exception ({"Hash must be finalized before the hash value is retrieved."})
while(sourceStream.ReadByte() != -1);
shaHash = shaForStream.Hash;
}
Console.WriteLine(Convert.ToBase64String(shaHash));
}
}
Also you can try:
FileStream fop = File.OpenRead(@"C:\test.bin");
string chksum = BitConverter.ToString(System.Security.Cryptography.SHA1.Create().ComputeHash(fop));
Good morning all,
I'm working on an MD5 file integrity check tool in C#.
How long should it take for a file to be given an MD5 checksum value?
For example, if I try to get a 2gb .mpg file, it is taking around 5 mins+ each time.
This seems overly long.
Am I just being impatient?
Below is the code I'm running
public string getHash(String @fileLocation)
{
FileStream fs = new FileStream(@fileLocation, FileMode.Open);
HashAlgorithm alg = new HMACMD5();
byte[] hashValue = alg.ComputeHash(fs);
string md5Result = "";
foreach (byte x in hashValue)
{
md5Result += x;
}
fs.Close();
return md5Result;
}
Any suggestions will be appreciated.
Regards
See this on how to calculate a file's hash value in the most efficient way. You basically have to wrap the FileStream in a BufferedStream and then feed that into the HMACMD5.ComputeHash(Stream) overload:
HashAlgorithm hmacMd5 = new HMACMD5();
byte[] hash;
using(Stream fileStream = new FileStream(fileLocation, FileMode.Open))
using(Stream bufferedStream = new BufferedStream(fileStream, 1200000))
hash = hmacMd5.ComputeHash(bufferedStream);
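For a file integrity check, one thing to be aware of is that new HMACMD5() uses a randomly generated key, so the result differs between instances. Assuming a plain, repeatable MD5 checksum is what is wanted, a sketch could look like this; the class name, method name, and the 1 MB buffer size are illustrative choices, not measured optimums.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class FileChecksum
{
    // Hypothetical helper: returns the unkeyed MD5 of a file as lowercase hex.
    public static string Md5Hex(string fileLocation)
    {
        using (MD5 md5 = MD5.Create())
        // A larger FileStream buffer reduces the number of read calls on big files.
        using (FileStream fileStream = new FileStream(
            fileLocation, FileMode.Open, FileAccess.Read, FileShare.Read, 1 << 20))
        {
            byte[] hash = md5.ComputeHash(fileStream);
            // Two hex digits per byte, unlike appending each byte as a decimal number.
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

How long a 2 GB file takes will still depend mostly on disk throughput, so timings are best measured on the target machine.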