How do I hash the first N bytes of a file? - C#

Using .NET, I would like to be able to hash the first N bytes of potentially large files, but I can't seem to find a way of doing it.
The ComputeHash function (I'm using SHA1) takes a byte array or a stream. A stream seems like the best option, since I would prefer not to load a potentially large file into memory.
To be clear: I don't want to load a potentially large piece of data into memory if I can help it. If the file is 2GB and I want to hash the first 1GB, that's a lot of RAM!

You can hash large volumes of data using a CryptoStream - something like this should work:
var sha1 = SHA1.Create();
FileStream fs = File.OpenRead(path); // whatever file you want to hash
using (var cs = new CryptoStream(fs, sha1, CryptoStreamMode.Read))
{
    byte[] buf = new byte[4096];
    long totalBytesRead = 0;
    int bytesRead;
    do
    {
        // never request more than maxBytesToHash bytes in total
        int toRead = (int)Math.Min(buf.Length, maxBytesToHash - totalBytesRead);
        bytesRead = cs.Read(buf, 0, toRead);
        totalBytesRead += bytesRead;
    } while (bytesRead > 0 && totalBytesRead < maxBytesToHash);
}
byte[] hash = sha1.Hash; // the hash is finalized when the CryptoStream is disposed

Read the first N bytes into an array and hash that:
fileStream.Read(array, 0, N);
http://msdn.microsoft.com/en-us/library/system.io.filestream.read.aspx
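Note that Read may return fewer than N bytes, so in practice the call belongs in a loop. A sketch (assuming SHA1, with fileStream and N as above):
byte[] array = new byte[N];
int total = 0;
while (total < N)
{
    int read = fileStream.Read(array, total, N - total);
    if (read == 0) break; // the file is shorter than N bytes
    total += read;
}
byte[] hash = SHA1.Create().ComputeHash(array, 0, total);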

Open the file as a FileStream, copy the first n bytes into a MemoryStream, then hash the MemoryStream.
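A minimal sketch of that approach, assuming SHA1 and hypothetical path/n parameters. Note that it buffers all n bytes in memory, which the question is trying to avoid for very large n:
using System;
using System.IO;
using System.Security.Cryptography;

static byte[] HashFirstBytes(string path, int n)
{
    using (var fs = File.OpenRead(path))
    using (var ms = new MemoryStream())
    {
        byte[] buf = new byte[8192];
        int remaining = n;
        while (remaining > 0)
        {
            int read = fs.Read(buf, 0, Math.Min(buf.Length, remaining));
            if (read == 0) break; // file is shorter than n bytes
            ms.Write(buf, 0, read);
            remaining -= read;
        }
        ms.Position = 0;
        using (var sha1 = SHA1.Create())
            return sha1.ComputeHash(ms);
    }
}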

As others have pointed out, you should read the first few bytes into an array.
It should also be noted that you don't want to make a single call to Read and assume that all the bytes have been read.
Rather, you want to make sure that the number of bytes returned is the number of bytes you requested, and make another call to Read in the event that it isn't.
Also, if you have rather large streams, you will want to create a proxy for the Stream class where you pass it the underlying stream (the FileStream in this case) and override the Read method to forward calls to the underlying stream until you have read the number of bytes that you need. Then, once that many bytes have been returned, return 0 from Read to indicate that there are no more bytes to be read (Read signals end-of-stream with 0; it's ReadByte that returns -1).

If you are concerned about keeping too much data in memory, you can create a stream wrapper that throttles the maximum number of bytes read.
Without doing all the work, here's some sample boilerplate you could use to get started.
public class LimitedStream : Stream
{
    private int current = 0;
    private int limit;
    private Stream stream;

    public LimitedStream(Stream stream, int n)
    {
        this.limit = n;
        this.stream = stream;
    }

    public override int ReadByte()
    {
        if (current >= limit)
            return -1;
        var numread = this.stream.ReadByte(); // read from the wrapped stream, not base.ReadByte()
        if (numread >= 0)
            current++;
        return numread;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        count = Math.Min(count, limit - current);
        var numread = this.stream.Read(buffer, offset, count);
        current += numread;
        return numread;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        throw new NotSupportedException();
    }

    public override void SetLength(long value)
    {
        throw new NotSupportedException();
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        throw new NotSupportedException();
    }

    public override bool CanRead
    {
        get { return true; }
    }

    public override bool CanSeek
    {
        get { return false; }
    }

    public override bool CanWrite
    {
        get { return false; }
    }

    public override void Flush()
    {
        // read-only stream; nothing to flush
    }

    public override long Length
    {
        get { throw new NotSupportedException(); }
    }

    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }

    protected override void Dispose(bool disposing)
    {
        base.Dispose(disposing);
        if (this.stream != null)
        {
            this.stream.Dispose();
        }
    }
}
Here is an example of the stream in use, wrapping a file stream, but throttling the number of bytes read to the specified limit:
using (var stream = new LimitedStream(File.OpenRead(@".\test.xml"), 100))
{
    var bytes = new byte[1024];
    int read = stream.Read(bytes, 0, bytes.Length); // reads at most 100 bytes
}
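Tying this back to the original question, the wrapper can feed ComputeHash directly; a sketch (the file name and limit are placeholders):
byte[] hash;
using (var limited = new LimitedStream(File.OpenRead(@".\bigfile.bin"), 1024 * 1024))
using (var sha1 = SHA1.Create())
{
    hash = sha1.ComputeHash(limited); // hashes at most the first 1 MB
}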

Related

How to Merge two memory streams containing PDF file's data into one

I am trying to read two PDF files into two memory streams and then return a stream that will have both streams' data. But I don't seem to understand what's wrong with my code.
Sample Code:
string file1Path = "Sampl1.pdf";
string file2Path = "Sample2.pdf";
MemoryStream stream1 = new MemoryStream(File.ReadAllBytes(file1Path));
MemoryStream stream2 = new MemoryStream(File.ReadAllBytes(file2Path));
stream1.Position = 0;
stream1.CopyTo(stream2);
return stream2; // supposed to contain the data of both stream1 and stream2, but contains stream1's data only
It appears that, in the case of PDF files, merging memory streams is not the same as with .txt files. For PDFs you need a library; I used iTextSharp.dll (available under the AGPL license) and then combined the documents using the library's functions as follows:
MemoryStream finalStream = new MemoryStream();
PdfCopyFields copy = new PdfCopyFields(finalStream);
string file1Path = "Sample1.pdf";
string file2Path = "Sample2.pdf";
var ms1 = new MemoryStream(File.ReadAllBytes(file1Path));
ms1.Position = 0;
copy.AddDocument(new PdfReader(ms1));
ms1.Dispose();
var ms2 = new MemoryStream(File.ReadAllBytes(file2Path));
ms2.Position = 0;
copy.AddDocument(new PdfReader(ms2));
ms2.Dispose();
copy.Close();
finalStream now contains the merged PDF of both ms1 and ms2.
NOTE:
The whole question is based on a false premise, that you can produce a combined PDF file by merging the binaries of two PDF files. This works for plain text files for example (to an extent), but definitely doesn't work for PDFs. The answer only addresses how to merge two binary data streams, not how to merge two PDF files in particular. It answers the OP's question as asked, but doesn't actually solve his problem.
When you use the byte[] constructor for MemoryStream, the memory stream will not expand as you add more data, so it will not be big enough for both stream1 and stream2. Also, the position starts at zero, so you are overwriting stream2's data with the data from stream1.
The fix is rather simple:
var result = new MemoryStream();
using (var file1 = File.OpenRead(file1Path)) file1.CopyTo(result);
using (var file2 = File.OpenRead(file2Path)) file2.CopyTo(result);
Another option would be to create your own stream class that would be a combination of two separate streams - interesting if you're interested in composability, but probably overkill for something as simple as this :)
Just for fun, it could look something like this:
public class DualStream : Stream
{
    private readonly Stream _first;
    private readonly Stream _second;

    public DualStream(Stream first, Stream second)
    {
        _first = first;
        _second = second;
    }

    public override bool CanRead => true;
    public override bool CanSeek => true;
    public override bool CanWrite => false;
    public override long Length => _first.Length + _second.Length;

    public override long Position
    {
        get { return _first.Position + _second.Position; }
        set { Seek(value, SeekOrigin.Begin); }
    }

    public override void Flush() { } // read-only stream; nothing to flush

    public override int Read(byte[] buffer, int offset, int count)
    {
        // Only move on to the second stream once the first is exhausted;
        // a partial read is legal, and the caller will simply call Read again.
        var bytesRead = _first.Read(buffer, offset, count);
        if (bytesRead > 0) return bytesRead;
        return _second.Read(buffer, offset, count);
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        // To simplify, let's assume seek always works as if over one big MemoryStream
        long targetPosition;
        switch (origin)
        {
            case SeekOrigin.Begin: targetPosition = offset; break;
            case SeekOrigin.Current: targetPosition = Position + offset; break;
            case SeekOrigin.End: targetPosition = Length + offset; break; // offset is typically negative here
            default: throw new NotSupportedException();
        }
        targetPosition = Math.Max(0, Math.Min(Length, targetPosition));
        var firstPosition = Math.Min(_first.Length, targetPosition);
        _first.Position = firstPosition;
        _second.Position = Math.Max(0, targetPosition - firstPosition);
        return Position;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            _first.Dispose();
            _second.Dispose();
        }
        base.Dispose(disposing);
    }

    public override void SetLength(long value)
    { throw new NotSupportedException(); }

    public override void Write(byte[] buffer, int offset, int count)
    { throw new NotSupportedException(); }
}
The main benefit is that it means you don't have to allocate unnecessary in-memory buffers just to have a combined stream - it can even be used with the file streams directly, if you dare :D And it's easily composable - you can make dual streams of other dual streams, allowing you to chain as many streams as you want together - pretty much the same as IEnumerable.Concat.
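For example, a hypothetical use of the class to concatenate the raw bytes of two files into one output (which, per the note above, does not produce a valid merged PDF):
using (var combined = new DualStream(File.OpenRead(file1Path), File.OpenRead(file2Path)))
using (var output = File.Create("combined.bin"))
{
    combined.CopyTo(output); // reads straight through both streams, no intermediate buffers
}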

Returning database results as a stream

I have a function that returns database query results. These results have become very large, and I would now like to pass them back as a stream, so that the client can start processing them sooner and memory usage is lower. But I don't really know how to do this. The following function works, but I want to know how to change it so that it starts to stream as soon as the first table has been read.
public Stream GetResults()
{
    IFormatter formatter = new BinaryFormatter();
    Stream stream = new MemoryStream();
    formatter.Serialize(stream, GetItemsFromTable1());
    formatter.Serialize(stream, GetItemsFromTable2());
    formatter.Serialize(stream, GetItemsFromTable3());
    formatter.Serialize(stream, GetItemsFromTable4());
    stream.Position = 0;
    return stream;
}
You could write a custom Stream implementation which functions as a pipe. If you then moved your GetItemsFromTable() method calls into a background task, the client could start reading results from the stream immediately.
In my solution below I'm using a circular buffer as a backing store for the pipe stream. Memory usage will be reduced only if the client consumes data fast enough. But even in the worst-case scenario it shouldn't use more memory than your current solution. If memory usage is a bigger priority for you than execution speed, your stream could potentially block write calls until space is available. My solution below does not block writes; it expands the capacity of the circular buffer so that the background thread can continue filling data without delays.
The GetResults method might look like this:
public Stream GetResults()
{
    // Begin filling the pipe with data on a background thread
    var pipeStream = new CircularBufferPipeStream();
    Task.Run(() => WriteResults(pipeStream));

    // Return pipe stream for immediate usage by client
    // Note: client is responsible for disposing of the stream after reading all data!
    return pipeStream;
}

// Runs on background thread, filling circular buffer with data
void WriteResults(CircularBufferPipeStream stream)
{
    IFormatter formatter = new BinaryFormatter();
    formatter.Serialize(stream, GetItemsFromTable1());
    formatter.Serialize(stream, GetItemsFromTable2());
    formatter.Serialize(stream, GetItemsFromTable3());
    formatter.Serialize(stream, GetItemsFromTable4());

    // Indicate that there's no more data to write
    stream.CloseWritePort();
}
And the circular buffer stream:
/// <summary>
/// Stream that acts as a pipe by supporting reading and writing simultaneously from different threads.
/// Read calls will block until data is available or the CloseWritePort() method has been called.
/// Read calls consume bytes in the circular buffer immediately so that more space is available for writes into the circular buffer.
/// Writes do not block; the capacity of the circular buffer will be expanded as needed to write the entire block of data at once.
/// </summary>
class CircularBufferPipeStream : Stream
{
    const int DefaultCapacity = 1024;

    byte[] _buffer;
    bool _writePortClosed = false;
    object _readWriteSyncRoot = new object();
    int _length;
    ManualResetEvent _dataAddedEvent;
    int _start = 0;

    public CircularBufferPipeStream(int initialCapacity = DefaultCapacity)
    {
        _buffer = new byte[initialCapacity];
        _length = 0;
        _dataAddedEvent = new ManualResetEvent(false);
    }

    public void CloseWritePort()
    {
        lock (_readWriteSyncRoot)
        {
            _writePortClosed = true;
            _dataAddedEvent.Set();
        }
    }

    public override bool CanRead { get { return true; } }
    public override bool CanWrite { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override void Flush() { }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int bytesRead = 0;
        while (bytesRead == 0)
        {
            bool waitForData = false;
            lock (_readWriteSyncRoot)
            {
                if (_length != 0)
                    bytesRead = ReadDirect(buffer, offset, count);
                else if (_writePortClosed)
                    break;
                else
                {
                    _dataAddedEvent.Reset();
                    waitForData = true;
                }
            }
            if (waitForData)
                _dataAddedEvent.WaitOne();
        }
        return bytesRead;
    }

    private int ReadDirect(byte[] buffer, int offset, int count)
    {
        // Read from _start up to the end of the backing array
        int readTailCount = Math.Min(Math.Min(_buffer.Length - _start, count), _length);
        Array.Copy(_buffer, _start, buffer, offset, readTailCount);
        _start += readTailCount;
        _length -= readTailCount;
        if (_start == _buffer.Length)
            _start = 0;

        // Then read any remaining bytes that wrapped around to the front of the array
        int readHeadCount = Math.Min(Math.Min(_buffer.Length - _start, count - readTailCount), _length);
        if (readHeadCount > 0)
        {
            Array.Copy(_buffer, _start, buffer, offset + readTailCount, readHeadCount);
            _start += readHeadCount;
            _length -= readHeadCount;
        }

        return readTailCount + readHeadCount;
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        lock (_readWriteSyncRoot)
        {
            // expand capacity as needed
            if (count + _length > _buffer.Length)
            {
                var expandedBuffer = new byte[Math.Max(_buffer.Length * 2, count + _length)];
                _length = ReadDirect(expandedBuffer, 0, _length);
                _start = 0;
                _buffer = expandedBuffer;
            }

            int startWrite = (_start + _length) % _buffer.Length;
            int writeTailCount = Math.Min(_buffer.Length - startWrite, count);
            Array.Copy(buffer, offset, _buffer, startWrite, writeTailCount);
            startWrite += writeTailCount;
            _length += writeTailCount;
            if (startWrite == _buffer.Length)
                startWrite = 0;

            int writeHeadCount = count - writeTailCount;
            if (writeHeadCount > 0)
            {
                Array.Copy(buffer, offset + writeTailCount, _buffer, startWrite, writeHeadCount);
                _length += writeHeadCount;
            }
        }
        _dataAddedEvent.Set();
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            if (_dataAddedEvent != null)
            {
                _dataAddedEvent.Dispose();
                _dataAddedEvent = null;
            }
        }
        base.Dispose(disposing);
    }
}
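A hypothetical consumer of GetResults, reading while the background task is still filling the pipe:
using (var results = GetResults())
{
    var buffer = new byte[4096];
    int read;
    while ((read = results.Read(buffer, 0, buffer.Length)) > 0)
    {
        // process buffer[0..read); Read blocks until data arrives or CloseWritePort() is called
    }
}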
Try:
public Stream GetResults()
{
    IFormatter formatter = new BinaryFormatter();
    Stream stream = new MemoryStream();
    formatter.Serialize(stream, GetItemsFromTable1());
    formatter.Serialize(stream, GetItemsFromTable2());
    formatter.Serialize(stream, GetItemsFromTable3());
    formatter.Serialize(stream, GetItemsFromTable4());
    stream.Seek(0L, SeekOrigin.Begin);
    return stream;
}
Why the changes?
Remove the using block, because your stream gets disposed once it leaves it; disposing the stream means you cannot use it any more.
Seek to the beginning of the stream: if you start reading without seeking to the beginning, you would start deserializing/reading from the end, and unfortunately there is no content behind the end of the stream.
However, I don't see how using a MemoryStream reduces memory usage. I would suggest chaining in a DeflateStream or a FileStream to reduce RAM usage, as sketched below.
Hope this helps.
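A sketch of that DeflateStream suggestion (System.IO.Compression), so the buffered data is at least compressed; the table methods are the ones from the question, and the caller would read the result back through a DeflateStream in Decompress mode:
public Stream GetCompressedResults()
{
    IFormatter formatter = new BinaryFormatter();
    var stream = new MemoryStream();
    using (var deflate = new DeflateStream(stream, CompressionMode.Compress, leaveOpen: true))
    {
        formatter.Serialize(deflate, GetItemsFromTable1());
        formatter.Serialize(deflate, GetItemsFromTable2());
        formatter.Serialize(deflate, GetItemsFromTable3());
        formatter.Serialize(deflate, GetItemsFromTable4());
    }
    stream.Seek(0L, SeekOrigin.Begin);
    // caller: new DeflateStream(stream, CompressionMode.Decompress) to read it back
    return stream;
}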

Encrypt file but can decrypt from any point

Basically, I need to encrypt a file and then be able to decrypt it from almost any point in the file. The reason I need this is that I would like to use it for files like video and still be able to jump through the file. The file would also be served over the web, so not needing to download the whole file is important. Where I am storing the file supports partial downloads, so I can request any part of the file I need, and this works for an unencrypted file. The question is how I could make this work for an encrypted file. I need to encrypt and decrypt the file in C#, but don't really have any other restrictions than that. Symmetric keys are preferred, but if that won't work it is not a deal breaker.
Another example of where I only want to download part of a file and decrypt it is where I have joined multiple files together but just need to retrieve one of them. This would generally be used for files smaller than 50MB, like pictures and info files.
--- EDIT ---
To be clear, I am looking for a working implementation or library that does not increase the size of the source file. A stream cipher seems ideal, but I have not seen one in C# that works from any point in the stream, rather than only from its start. I would consider a block-based implementation if it works from set blocks in the stream. Basically, I want to pass a raw stream through this and have unencrypted data come out the other side, and I am happy to set the starting offset the stream represents in the whole file. I am looking for something that works, as I am no encryption expert. At the minute I get the parts of the file from a data source in 512KB to 5MB blocks, depending on client config, and I use the stream's CopyTo method to write them out to a file on disk. I don't get these parts in order. I am looking for a stream wrapper that I could pass into CopyTo.
Your best bet is probably to treat the file as a list of chunks (of whatever size is convenient for your application; let's say 50 kB) and encrypt each separately. This would allow you to decrypt each chunk independently of the others.
For each chunk, derive new keys from your master key, generate a new IV, and encrypt-then-MAC the chunk.
This method has higher storage overhead than encrypting the entire file at once and takes a bit more computation as well due to the key regeneration that it requires.
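A minimal sketch of that per-chunk scheme, assuming AES-CBC with HMAC-SHA256 and a PBKDF2-based per-chunk key derivation. The salt construction, iteration count, and chunk layout (IV || ciphertext || MAC) are illustrative choices, not a vetted design:
using System;
using System.Security.Cryptography;

static byte[] EncryptChunk(byte[] masterKey, long chunkIndex, byte[] plaintext)
{
    // Derive separate encryption and MAC keys for this chunk from the master key.
    byte[] salt = BitConverter.GetBytes(chunkIndex); // 8 bytes, the minimum salt size
    using (var kdf = new Rfc2898DeriveBytes(masterKey, salt, 1000))
    using (var aes = Aes.Create())
    {
        aes.Key = kdf.GetBytes(32);
        byte[] macKey = kdf.GetBytes(32);
        aes.GenerateIV();

        byte[] cipher;
        using (var enc = aes.CreateEncryptor())
            cipher = enc.TransformFinalBlock(plaintext, 0, plaintext.Length);

        // Chunk layout: IV || ciphertext || MAC(IV || ciphertext)  (encrypt-then-MAC)
        byte[] ivAndCipher = new byte[aes.IV.Length + cipher.Length];
        Buffer.BlockCopy(aes.IV, 0, ivAndCipher, 0, aes.IV.Length);
        Buffer.BlockCopy(cipher, 0, ivAndCipher, aes.IV.Length, cipher.Length);
        using (var hmac = new HMACSHA256(macKey))
        {
            byte[] mac = hmac.ComputeHash(ivAndCipher);
            byte[] chunk = new byte[ivAndCipher.Length + mac.Length];
            Buffer.BlockCopy(ivAndCipher, 0, chunk, 0, ivAndCipher.Length);
            Buffer.BlockCopy(mac, 0, chunk, ivAndCipher.Length, mac.Length);
            return chunk;
        }
    }
}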
If you use a stream cipher instead of a block cipher, you'd be able to start decrypting at any byte offset, as long as the decryptor was able to get the current IV from somewhere.
For those interested, I managed to work it out based on a number of examples I found, plus some of my own code. It uses BouncyCastle but should also work with the .NET AES classes with a few tweaks. It allows encryption/decryption from any point in the stream.
using System;
using System.IO;
using Org.BouncyCastle.Crypto;
using Org.BouncyCastle.Crypto.Parameters;

namespace StreamHelpers
{
    public class StreamEncryptDecrypt : Stream
    {
        private readonly Stream _streamToWrap;
        private readonly IBlockCipher _cipher;
        private readonly ICipherParameters _key;
        private readonly byte[] _iv;
        private readonly byte[] _counter;
        private readonly byte[] _counterOut;
        private long currentBlockCount;

        public StreamEncryptDecrypt(Stream streamToWrap, IBlockCipher cipher, ParametersWithIV keyAndIv)
        {
            _streamToWrap = streamToWrap;
            _cipher = cipher;
            _key = keyAndIv.Parameters;
            _cipher.Init(true, _key); // counter mode always runs the cipher in the encrypt direction
            _iv = keyAndIv.GetIV();
            _counter = new byte[_cipher.GetBlockSize()];
            _counterOut = new byte[_cipher.GetBlockSize()];
            if (_iv.Length != _cipher.GetBlockSize())
            {
                throw new ArgumentException("IV must be the same size as the cipher block size");
            }
            InitCipher();
        }

        private void InitCipher()
        {
            long position = _streamToWrap.Position;
            Array.Copy(_iv, 0, _counter, 0, _counter.Length);
            currentBlockCount = 0;
            var targetBlock = position / _cipher.GetBlockSize();
            while (currentBlockCount < targetBlock)
            {
                IncrementCounter(false);
            }
            _cipher.ProcessBlock(_counter, 0, _counterOut, 0);
        }

        private void IncrementCounter(bool updateCounterOut = true)
        {
            currentBlockCount++;
            // Increment the counter (with carry)
            int j = _counter.Length;
            while (--j >= 0 && ++_counter[j] == 0)
            {
            }
            // While fast-forwarding (updateCounterOut == false) the keystream block is not
            // needed, so skip the block encryption; InitCipher computes it once at the end.
            if (updateCounterOut)
            {
                _cipher.ProcessBlock(_counter, 0, _counterOut, 0);
            }
        }

        public override long Position
        {
            get { return _streamToWrap.Position; }
            set
            {
                _streamToWrap.Position = value;
                InitCipher();
            }
        }

        public override long Seek(long offset, SeekOrigin origin)
        {
            var result = _streamToWrap.Seek(offset, origin);
            InitCipher();
            return result;
        }

        public void ProcessBlock(byte[] input, int offset, int length, long streamPosition)
        {
            if (input.Length < offset + length)
                throw new ArgumentException("input does not match offset and length");

            var blockSize = _cipher.GetBlockSize();
            var startingBlock = streamPosition / blockSize;
            var blockOffset = (int)(streamPosition - (startingBlock * blockSize));
            while (currentBlockCount < streamPosition / blockSize)
            {
                IncrementCounter();
            }

            // process the leftover from the current (partial) block
            if (blockOffset != 0)
            {
                var blockLength = blockSize - blockOffset;
                blockLength = blockLength > length ? length : blockLength;
                // XOR the counterOut with the plaintext, producing the ciphertext
                for (int i = 0; i < blockLength; i++)
                {
                    input[offset + i] = (byte)(_counterOut[blockOffset + i] ^ input[offset + i]);
                }
                offset += blockLength;
                length -= blockLength;
                blockOffset = 0;
                if (length > 0)
                {
                    IncrementCounter();
                }
            }

            // loop through the rest of the data, incrementing the counter as needed
            while (length > 0)
            {
                var blockLength = blockSize > length ? length : blockSize;
                // XOR the counterOut with the plaintext, producing the ciphertext
                for (int i = 0; i < blockLength; i++)
                {
                    input[offset + i] = (byte)(_counterOut[i] ^ input[offset + i]);
                }
                offset += blockLength;
                length -= blockLength;
                if (length > 0)
                {
                    IncrementCounter();
                }
            }
        }

        public override int Read(byte[] buffer, int offset, int count)
        {
            var pos = _streamToWrap.Position;
            var result = _streamToWrap.Read(buffer, offset, count);
            ProcessBlock(buffer, offset, result, pos);
            return result;
        }

        public override void Write(byte[] buffer, int offset, int count)
        {
            var input = new byte[count];
            Array.Copy(buffer, offset, input, 0, count);
            ProcessBlock(input, 0, count, _streamToWrap.Position);
            _streamToWrap.Write(input, 0, count); // input starts at index 0, not at the caller's offset
        }

        public override void Flush()
        {
            _streamToWrap.Flush();
        }

        public override void SetLength(long value)
        {
            _streamToWrap.SetLength(value);
        }

        public override bool CanRead { get { return _streamToWrap.CanRead; } }
        public override bool CanSeek { get { return true; } }
        public override bool CanWrite { get { return _streamToWrap.CanWrite; } }
        public override long Length { get { return _streamToWrap.Length; } }

        protected override void Dispose(bool disposing)
        {
            if (disposing && _streamToWrap != null)
            {
                _streamToWrap.Dispose();
            }
            base.Dispose(disposing);
        }
    }
}
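A hypothetical use of the wrapper, decrypting a partially downloaded range (the key, iv, and startOffset values are placeholders; AesEngine and KeyParameter come from Org.BouncyCastle.Crypto.Engines and Org.BouncyCastle.Crypto.Parameters):
var keyAndIv = new ParametersWithIV(new KeyParameter(key), iv); // iv must be 16 bytes for AES
using (var encrypted = File.OpenRead("video.enc"))
using (var plain = new StreamEncryptDecrypt(encrypted, new AesEngine(), keyAndIv))
using (var output = File.Create("clip.bin"))
{
    plain.Seek(startOffset, SeekOrigin.Begin); // jump anywhere; the counter is re-derived
    plain.CopyTo(output);
}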

Hashing a SecureString in .NET

In .NET, we have the SecureString class, which is all very well until you come to try and use it: to (for example) hash the string, you need the plaintext. Here I've had a go at writing a function that will hash a SecureString, given a hash function that takes a byte array and outputs a byte array.
private static byte[] HashSecureString(SecureString ss, Func<byte[], byte[]> hash)
{
    // Convert the SecureString to a BSTR
    IntPtr bstr = Marshal.SecureStringToBSTR(ss);

    // BSTR contains the length of the string in bytes in an
    // Int32 stored in the 4 bytes prior to the BSTR pointer
    int length = Marshal.ReadInt32(bstr, -4);

    // Allocate a byte array to copy the string into
    byte[] bytes = new byte[length];

    // Copy the BSTR to the byte array
    Marshal.Copy(bstr, bytes, 0, length);

    // Immediately destroy the BSTR as we don't need it any more
    Marshal.ZeroFreeBSTR(bstr);

    // Hash the byte array
    byte[] hashed = hash(bytes);

    // Destroy the plaintext copy in the byte array
    for (int i = 0; i < length; i++) { bytes[i] = 0; }

    // Return the hash
    return hashed;
}
I believe this will correctly hash the string, and will correctly scrub any copies of the plaintext from memory by the time the function returns, assuming the provided hash function is well behaved and doesn't make any copies of the input that it doesn't scrub itself. Have I missed anything here?
Have I missed anything here?
Yes, you have, a rather fundamental one at that. You cannot scrub the copy of the array left behind when the garbage collector compacts the heap. Marshal.SecureStringToBSTR(ss) is okay because a BSTR is allocated in unmanaged memory so will have a reliable pointer that won't change. In other words, no problem scrubbing that one.
Your byte[] bytes array, however, contains the copy of the string and is allocated on the GC heap. Allocating the hashed[] array makes it likely that a collection happens while that copy is still live. That one is easily avoided, but of course you have little control over other threads in your process allocating memory and inducing a collection. Or, for that matter, a background GC that was already in progress when your code started running.
The point of SecureString is to never have a cleartext copy of the string in garbage-collected memory. Copying it into a managed array violates that guarantee. If you want to make this code secure then you are going to have to write a hash() method that takes the IntPtr and only reads through that pointer.
Beware that if your hash needs to match a hash computed on another machine then you cannot ignore the Encoding that machine would use to turn the string into bytes.
There's always the possibility of using the unmanaged CryptoApi or CNG functions.
Bear in mind that SecureString was designed with an unmanaged consumer which has full control over memory management in mind.
If you want to stick to C#, you should pin the temporary array to prevent the GC from moving it around before you get a chance to scrub it:
private static byte[] HashSecureString(SecureString input, Func<byte[], byte[]> hash)
{
    var bstr = Marshal.SecureStringToBSTR(input);
    var length = Marshal.ReadInt32(bstr, -4);
    var bytes = new byte[length];
    var bytesPin = GCHandle.Alloc(bytes, GCHandleType.Pinned);
    try {
        Marshal.Copy(bstr, bytes, 0, length);
        Marshal.ZeroFreeBSTR(bstr);
        return hash(bytes);
    } finally {
        for (var i = 0; i < bytes.Length; i++) {
            bytes[i] = 0;
        }
        bytesPin.Free();
    }
}
As a complement to Hans' answer, here's a suggestion for how to implement the hasher. Hans suggests passing the pointer to the unmanaged string to the hash function, but that means that client code (= the hash function) needs to deal with unmanaged memory. That's not ideal.
On the other hand, you can replace the callback by an instance of the following interface:
interface Hasher {
    void Reinitialize();
    void AddByte(byte b);
    byte[] Result { get; }
}
That way the hasher (although it becomes slightly more complex) can be implemented wholly in managed land without leaking secure information. Your HashSecureString would then look as follows:
private static byte[] HashSecureString(SecureString ss, Hasher hasher) {
    IntPtr bstr = Marshal.SecureStringToBSTR(ss);
    try {
        int length = Marshal.ReadInt32(bstr, -4);
        hasher.Reinitialize();
        for (int i = 0; i < length; i++)
            hasher.AddByte(Marshal.ReadByte(bstr, i));
        return hasher.Result;
    }
    finally {
        Marshal.ZeroFreeBSTR(bstr);
    }
}
Note the finally block to make sure that the unmanaged memory is zeroed, no matter what shenanigans the hasher instance does.
Here’s a simple (and not very useful) Hasher implementation to illustrate the interface:
sealed class SingleByteXor : Hasher {
    private readonly byte[] data = new byte[1];

    public void Reinitialize() {
        data[0] = 0;
    }

    public void AddByte(byte b) {
        data[0] ^= b;
    }

    public byte[] Result {
        get { return data; }
    }
}
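Hypothetical usage, wiring the two pieces together:
using (var ss = new SecureString())
{
    foreach (char c in "secret") ss.AppendChar(c);
    ss.MakeReadOnly();
    byte[] digest = HashSecureString(ss, new SingleByteXor()); // a one-byte "hash", per the toy hasher above
}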
As a further complement, could you not wrap the logic @KonradRudolph and @HansPassant supplied into a custom Stream implementation?
This would allow you to use the HashAlgorithm.ComputeHash(Stream) method, which would keep the interface managed (although it would be down to you to dispose the stream in good time).
Of course, you are at the mercy of the HashAlgorithm implementation as to how much data ends up in memory at a time (but, of course, that's what the reference source is for!)
Just an idea...
public class SecureStringStream : Stream
{
    public override bool CanRead { get { return true; } }
    public override bool CanWrite { get { return false; } }
    public override bool CanSeek { get { return false; } }

    public override long Position
    {
        get { return _pos; }
        set { throw new NotSupportedException(); }
    }

    public override void Flush() { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }

    private readonly IntPtr _bstr = IntPtr.Zero;
    private readonly int _length;
    private int _pos;

    public SecureStringStream(SecureString str)
    {
        if (str == null) throw new ArgumentNullException("str");
        _bstr = Marshal.SecureStringToBSTR(str);
        try
        {
            _length = Marshal.ReadInt32(_bstr, -4);
            _pos = 0;
        }
        catch
        {
            if (_bstr != IntPtr.Zero) Marshal.ZeroFreeBSTR(_bstr);
            throw;
        }
    }

    public override long Length { get { return _length; } }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (buffer == null) throw new ArgumentNullException("buffer");
        if (offset < 0) throw new ArgumentOutOfRangeException("offset");
        if (count < 0) throw new ArgumentOutOfRangeException("count");
        if (offset + count > buffer.Length) throw new ArgumentException("offset + count > buffer");

        // Serve at most one byte per call, advancing _pos exactly once
        if (count > 0 && _pos < _length)
        {
            buffer[offset] = Marshal.ReadByte(_bstr, _pos++);
            return 1;
        }
        else return 0;
    }

    protected override void Dispose(bool disposing)
    {
        try { if (_bstr != IntPtr.Zero) Marshal.ZeroFreeBSTR(_bstr); }
        finally { base.Dispose(disposing); }
    }
}

void RunMe()
{
    using (SecureString s = new SecureString())
    {
        foreach (char c in "jimbobmcgee") s.AppendChar(c);
        s.MakeReadOnly();

        using (SecureStringStream ss = new SecureStringStream(s))
        using (HashAlgorithm h = MD5.Create())
        {
            Console.WriteLine(Convert.ToBase64String(h.ComputeHash(ss)));
        }
    }
}

How to implement a lazy stream chunk enumerator?

I'm trying to split a byte stream into chunks of increasing size.
The source stream contains an unknown number of bytes and is expensive to read. The output of the enumerator should be byte arrays of increasing size, starting at 8KB up to 1MB.
This is very simple to do by simply reading the whole stream, storing it in an array and taking the relevant pieces out. However, since the stream may be very large, reading it at once is unfeasible. Also, while performance is not the main concern, it is important to keep system load very low.
While implementing this I noticed that it's relatively difficult to keep the code short and maintainable. There are a few stream related issues to keep in mind, too (for instance, Stream.Read might not fill the buffer even though it succeeded).
I did not find any existing classes that help for my case, nor could I find something close on the net. How would you implement such a class?
public IEnumerable<BufferWrapper> getBytes(Stream stream)
{
    List<int> bufferSizes = new List<int>() { 8192, 65536, 220160, 1048576 };

    int count = 0;
    int bufferSizePosition = 0;
    byte[] buffer = new byte[bufferSizes[0]];
    bool done = false;

    while (!done)
    {
        BufferWrapper nextResult = new BufferWrapper();
        nextResult.bytesRead = stream.Read(buffer, 0, buffer.Length);
        nextResult.buffer = buffer; // note: the same array instance is reused between yields
        done = nextResult.bytesRead == 0;

        if (!done)
        {
            yield return nextResult;

            count++;
            // stop stepping up once the largest size is reached
            if (count > 10 && bufferSizePosition < bufferSizes.Count - 1)
            {
                count = 0;
                bufferSizePosition++;
                buffer = new byte[bufferSizes[bufferSizePosition]];
            }
        }
    }
}

public class BufferWrapper
{
    public byte[] buffer { get; set; }
    public int bytesRead { get; set; }
}
Obviously the logic for when to move up in buffer size, and how to choose that size, could be altered.
Someone could also probably find a better way of handling the last buffer to be sent, as this isn't the most efficient way.
For reference, the implementation I currently use, already with improvements as per the answer by @Servy:
private const int InitialBlockSize = 8 * 1024;
private const int MaximumBlockSize = 1024 * 1024;

private Stream _Stream;
private int _Size = InitialBlockSize;

public byte[] Current
{
    get;
    private set;
}

public bool MoveNext()
{
    if (_Size < 0) {
        return false;
    }

    var buf = new byte[_Size];
    int count = 0;
    while (count < _Size) {
        int read = _Stream.Read(buf, count, _Size - count);
        if (read == 0) {
            break;
        }
        count += read;
    }

    if (count == _Size) {
        Current = buf;
        if (_Size <= MaximumBlockSize / 2) {
            _Size *= 2;
        }
    }
    else {
        if (count == 0) {
            // the stream ended exactly on a block boundary; don't yield an empty chunk
            _Size = -1;
            return false;
        }
        Current = new byte[count];
        Array.Copy(buf, Current, count);
        _Size = -1;
    }
    return true;
}
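For comparison, the same logic could be expressed as an iterator method rather than a hand-rolled enumerator; a sketch using the constants above:
public static IEnumerable<byte[]> ReadIncreasingChunks(Stream stream)
{
    int size = InitialBlockSize;
    while (true)
    {
        var buf = new byte[size];
        int count = 0;
        while (count < size)
        {
            int read = stream.Read(buf, count, size - count);
            if (read == 0) break; // end of stream
            count += read;
        }
        if (count == size)
        {
            yield return buf;
            if (size <= MaximumBlockSize / 2) size *= 2;
        }
        else
        {
            if (count > 0)
            {
                var last = new byte[count]; // trim the final partial chunk
                Array.Copy(buf, last, count);
                yield return last;
            }
            yield break;
        }
    }
}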
