Basically I need to encrypt a file and then be able to decrypt it from almost any point in the file. The reason is that I would like to use this for files such as video and still be able to jump through the file or video. The file would also be served over the web, so not needing to download the whole file is important. Where I am storing the file supports partial downloads, so I can request any part of the file I need, and this works for an unencrypted file. The question is how I could make this work for an encrypted file. I need to encrypt and decrypt the file in C# but don't really have any other restrictions than that. Symmetric keys are preferred, but if that won't work it is not a deal breaker.
Another example of where I only want to download and decrypt part of a file is where I have joined multiple files together but just need to retrieve one of them. This would generally be used for files smaller than 50 MB, like pictures and info files.
--- EDIT ---
To be clear, I am looking for a working implementation or library that does not increase the size of the source file. A stream cipher seems ideal, but I have not seen one in C# that works from any point in the stream rather than only from the start. I would consider a block-based implementation if it works from set blocks in the stream. Basically, I want to pass a raw stream through this and have unencrypted data come out the other side. I am happy to set the starting offset it represents in the whole file/stream. I am looking for something that works, as I am not an encryption expert. At the minute I get the parts of the file from a data source in 512 KB to 5 MB blocks, depending on client config, and I use the stream's CopyTo method to write them out to a file on disk. I don't get these parts in order. I am looking for a stream wrapper that I could pass into the CopyTo method on Stream.
Your best bet is probably to treat the file as a list of chunks (of whatever size is convenient for your application; let's say 50 kB) and encrypt each separately. This would allow you to decrypt each chunk independently of the others.
For each chunk, derive new keys from your master key, generate a new IV, and encrypt-then-MAC the chunk.
This method has higher storage overhead than encrypting the entire file at once and takes a bit more computation as well due to the key regeneration that it requires.
If you use a stream cipher instead of a block cipher, you'd be able to start decrypting at any byte offset, as long as the decryptor was able to get the current IV from somewhere.
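To illustrate the stream-cipher idea, here is a minimal sketch of AES in counter (CTR) mode built on the .NET `Aes` class, where the counter block is computed directly from the byte offset so that any range can be processed without touching earlier bytes. The class name `AesCtr` and its `Transform` method are hypothetical, not part of any library. The sketch assumes a 16-byte IV used as the initial counter and incremented as a big-endian integer. CTR mode alone gives no integrity protection, and a key/IV pair must never be reused for different data.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper: AES-CTR that can start at any byte offset of the stream.
public static class AesCtr
{
    private const int BlockSize = 16;

    // XORs data[offset .. offset+count) with the keystream that belongs to
    // position streamOffset. The same call both encrypts and decrypts.
    public static void Transform(byte[] key, byte[] iv, long streamOffset,
                                 byte[] data, int offset, int count)
    {
        if (iv.Length != BlockSize) throw new ArgumentException("IV must be 16 bytes", nameof(iv));
        using (var aes = Aes.Create())
        {
            // ECB on single counter blocks is how the CTR keystream is produced.
            aes.Mode = CipherMode.ECB;
            aes.Padding = PaddingMode.None;
            aes.Key = key;
            using (var encryptor = aes.CreateEncryptor())
            {
                var counter = new byte[BlockSize];
                var keystream = new byte[BlockSize];
                long block = streamOffset / BlockSize;
                int inBlock = (int)(streamOffset % BlockSize);
                while (count > 0)
                {
                    FillCounter(iv, block, counter);
                    encryptor.TransformBlock(counter, 0, BlockSize, keystream, 0);
                    int n = Math.Min(BlockSize - inBlock, count);
                    for (int i = 0; i < n; i++)
                        data[offset + i] ^= keystream[inBlock + i];
                    offset += n;
                    count -= n;
                    inBlock = 0;
                    block++;
                }
            }
        }
    }

    // counter = iv + block, treating both as big-endian integers (no loop over blocks).
    private static void FillCounter(byte[] iv, long block, byte[] counter)
    {
        Array.Copy(iv, counter, BlockSize);
        ulong add = (ulong)block;
        int carry = 0;
        for (int i = BlockSize - 1; i >= 0; i--)
        {
            int sum = counter[i] + (int)(add & 0xFF) + carry;
            counter[i] = (byte)sum;
            carry = sum >> 8;
            add >>= 8;
        }
    }
}
```

A downloaded range that starts at byte 37 of the file would be decrypted in place with `AesCtr.Transform(key, iv, 37, buffer, 0, bytesRead)`. The ciphertext is exactly as long as the plaintext.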
For those interested, I managed to work it out based on a number of examples I found plus some of my own code. It uses Bouncy Castle but should also work with the .NET AES implementation with a few tweaks. It allows decryption/encryption from any point in the stream.
using System;
using System.IO;
using Org.BouncyCastle.Crypto;
using Org.BouncyCastle.Crypto.Parameters;
namespace StreamHelpers
{
public class StreamEncryptDecrypt : Stream
{
private readonly Stream _streamToWrap;
private readonly IBlockCipher _cipher;
private readonly ICipherParameters _key;
private readonly byte[] _iv;
private readonly byte[] _counter;
private readonly byte[] _counterOut;
private readonly byte[] _output;
private long currentBlockCount;
public StreamEncryptDecrypt(Stream streamToWrap, IBlockCipher cipher, ParametersWithIV keyAndIv)
{
_streamToWrap = streamToWrap;
_cipher = cipher;
_key = keyAndIv.Parameters;
_cipher.Init(true, _key);
_iv = keyAndIv.GetIV();
_counter = new byte[_cipher.GetBlockSize()];
_counterOut = new byte[_cipher.GetBlockSize()];
_output = new byte[_cipher.GetBlockSize()];
if (_iv.Length != _cipher.GetBlockSize())
{
throw new Exception("IV must be the same size as the cipher block size");
}
InitCipher();
}
private void InitCipher()
{
long position = _streamToWrap.Position;
Array.Copy(_iv, 0, _counter, 0, _counter.Length);
currentBlockCount = 0;
var targetBlock = position/_cipher.GetBlockSize();
while (currentBlockCount < targetBlock)
{
IncrementCounter(false);
}
_cipher.ProcessBlock(_counter, 0, _counterOut, 0);
}
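private void IncrementCounter(bool updateCounterOut = true)
{
currentBlockCount++;
// Increment the counter as a big-endian integer, carrying from the last byte
int j = _counter.Length;
while (--j >= 0 && ++_counter[j] == 0)
{
}
// Skip the block encryption when the caller is only fast-forwarding the counter
if (updateCounterOut)
{
_cipher.ProcessBlock(_counter, 0, _counterOut, 0);
}
}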
public override long Position
{
get { return _streamToWrap.Position; }
set
{
_streamToWrap.Position = value;
InitCipher();
}
}
public override long Seek(long offset, SeekOrigin origin)
{
var result = _streamToWrap.Seek(offset, origin);
InitCipher();
return result;
}
public void ProcessBlock(
byte[] input,
int offset,
int length, long streamPosition)
{
if (input.Length < offset+length)
throw new ArgumentException("input does not match offset and length");
var blockSize = _cipher.GetBlockSize();
var startingBlock = streamPosition / blockSize;
var blockOffset = (int)(streamPosition - (startingBlock * blockSize));
while (currentBlockCount < streamPosition / blockSize)
{
IncrementCounter();
}
//process the left over from current block
if (blockOffset !=0)
{
var blockLength = blockSize - blockOffset;
blockLength = blockLength > length ? length : blockLength;
//
// XOR the counterOut with the plaintext producing the cipher text
//
for (int i = 0; i < blockLength; i++)
{
input[offset + i] = (byte)(_counterOut[blockOffset + i] ^ input[offset + i]);
}
offset += blockLength;
length -= blockLength;
blockOffset = 0;
if (length > 0)
{
IncrementCounter();
}
}
//need to loop though the rest of the data and increament counter when needed
while (length > 0)
{
var blockLength = blockSize > length ? length : blockSize;
//
// XOR the counterOut with the plaintext producing the cipher text
//
for (int i = 0; i < blockLength; i++)
{
input[offset + i] = (byte)(_counterOut[i] ^ input[offset + i]);
}
offset += blockLength;
length -= blockLength;
if (length > 0)
{
IncrementCounter();
}
}
}
public override int Read(byte[] buffer, int offset, int count)
{
var pos = _streamToWrap.Position;
var result = _streamToWrap.Read(buffer, offset, count);
ProcessBlock(buffer, offset, result, pos);
return result;
}
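public override void Write(byte[] buffer, int offset, int count)
{
// Work on a copy so the caller's buffer is not modified
var input = new byte[count];
Array.Copy(buffer, offset, input, 0, count);
ProcessBlock(input, 0, count, _streamToWrap.Position);
// The copy starts at index 0, so write from 0 rather than from the caller's offset
_streamToWrap.Write(input, 0, count);
}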
public override void Flush()
{
_streamToWrap.Flush();
}
public override void SetLength(long value)
{
_streamToWrap.SetLength(value);
}
public override bool CanRead
{
get { return _streamToWrap.CanRead; }
}
public override bool CanSeek
{
get { return true; }
}
public override bool CanWrite
{
get { return _streamToWrap.CanWrite; }
}
public override long Length
{
get { return _streamToWrap.Length; }
}
protected override void Dispose(bool disposing)
{
if (_streamToWrap != null)
{
_streamToWrap.Dispose();
}
base.Dispose(disposing);
}
}
}
Related
I am trying to read objects from very large files containing padded structs that were written into it by a C++ process. I was using an example to memory map the large file and try to deserialize the data into an object but I now can see that it won't work this way.
How can I extract all the objects from the files to use in C#? I'm probably way off but I've provided the code. The objects have an 8-byte milliseconds member followed by 21 16-bit integers, which needs 6 bytes of padding to align to an 8-byte boundary.
[Serializable]
unsafe public struct DataStruct
{
public UInt64 milliseconds;
[MarshalAs(UnmanagedType.ByValArray, SizeConst = 21)]
public fixed Int16 data[21];
[MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
public fixed Int16 padding[3];
};
[Serializable]
public class DataArray
{
public DataStruct[] samples;
}
public static class Helper
{
public static Int16[] GetData(this DataStruct data)
{
unsafe
{
Int16[] output = new Int16[21];
for (int index = 0; index < 21; ++index)
output[index] = data.data[index];
return output;
}
}
}
class FileThreadSupport
{
struct DataFileInfo
{
public string path;
public UInt64 start;
public UInt64 stop;
public UInt64 elements;
};
// Create our epoch timestamp
private static readonly DateTime epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
// Output TCP client
private Support.AsyncTcpClient output;
// Directory which contains our data
private string replay_directory;
// Files to be read from
private DataFileInfo[] file_infos;
// Current timestamp of when the process was started
UInt64 process_start = 0;
// Object from current file
DataArray current_file_data;
// Offset into current files
UInt64 current_file_index = 0;
// Offset into current files
UInt64 current_file_offset = 0;
// Run flag
bool run = true;
public FileThreadSupport(ref Support.AsyncTcpClient output, ref Engine.A.Information info, ref Support.Configuration configuration)
{
// Set our output directory
replay_directory = configuration.getString("replay_directory");
if (replay_directory.Length == 0)
{
Console.WriteLine("Configuration does not provide a replay directory");
return;
}
// Check the directory for playable files
if(!loadDataDirectory(replay_directory))
{
Console.WriteLine("Replay directory {0} did not have any valid files", replay_directory);
}
// Set the output TCP client
this.output = output;
}
private bool loadDataDirectory(string directory)
{
string[] files = Directory.GetFiles(directory, "*.*", SearchOption.TopDirectoryOnly);
file_infos = new DataFileInfo[files.Length];
int index = 0;
foreach (string file in files)
{
string[] parts = file.Split('\\');
string name = parts.Last();
parts = name.Split('.');
if (parts.Length != 2)
continue;
UInt64 start, stop = 0;
if (!UInt64.TryParse(parts[0], out start) || !UInt64.TryParse(parts[1], out stop))
continue;
long size = new System.IO.FileInfo(file).Length;
// Add to our file info array
file_infos[index] = new DataFileInfo
{
path = file,
start = start,
stop = stop,
elements = (ulong)(new System.IO.FileInfo(file).Length / 56
/*System.Runtime.InteropServices.Marshal.SizeOf(typeof(DataStruct))*/)
};
++index;
}
// Sort the array
Array.Sort(file_infos, delegate (DataFileInfo x, DataFileInfo y) { return x.start.CompareTo(y.start); });
// Return whether or not there were files found
return (files.Length > 0);
}
public void start()
{
process_start = (ulong)DateTime.Now.ToUniversalTime().Subtract(epoch).TotalMilliseconds;
UInt64 num_samples = 0;
while(run)
{
// Get our samples and add it to the sample
DataStruct[] result = getData(100);
Engine.A.A message = new Engine.A.A();
for (int i = 0; i < result.Length; ++i)
{
Engine.A.Data sample = new Engine.A.Data();
sample.Time = process_start + num_samples * 4;
Int16[] signal_data = Helper.GetData(result[i]);
for(int e = 0; e < signal_data.Length; ++e)
sample.Value[e] = signal_data[e];
message.Signal.Add(sample);
++num_samples;
}
// Send out the websocket
this.output.SendAsync(message.ToByteArray());
// Sleep 100 milliseconds
Thread.Sleep(100);
}
}
public void stop()
{
run = false;
}
private DataStruct[] getData(UInt64 milliseconds)
{
if (file_infos.Length == 0)
return new DataStruct[0];
if (current_file_data == null)
{
current_file_data = ReadObjectFromMMF(file_infos[current_file_index].path) as DataArray;
if(current_file_data.samples.Length == 0)
return new DataStruct[0];
}
UInt64 elements_to_read = (UInt64) milliseconds / 4;
DataStruct[] result = new DataStruct[elements_to_read];
Array.Copy(current_file_data.samples, (int)current_file_offset, result, 0, (int) Math.Min(elements_to_read, file_infos[current_file_index].elements - current_file_offset));
while((UInt64) result.Length != elements_to_read)
{
current_file_index = (current_file_index + 1) % (ulong) file_infos.Length;
current_file_data = ReadObjectFromMMF(file_infos[current_file_index].path) as DataArray;
if (current_file_data.samples.Length == 0)
return new DataStruct[0];
current_file_offset = 0;
Array.Copy(current_file_data.samples, (int)current_file_offset, result, result.Length, (int)Math.Min(elements_to_read, file_infos[current_file_index].elements - current_file_offset));
}
return result;
}
private object ByteArrayToObject(byte[] buffer)
{
BinaryFormatter binaryFormatter = new BinaryFormatter(); // Create new BinaryFormatter
MemoryStream memoryStream = new MemoryStream(buffer); // Convert buffer to memorystream
return binaryFormatter.Deserialize(memoryStream); // Deserialize stream to an object
}
private object ReadObjectFromMMF(string file)
{
// Get a handle to an existing memory mapped file
using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(file, FileMode.Open))
{
// Create a view accessor from which to read the data
using (MemoryMappedViewAccessor mmfReader = mmf.CreateViewAccessor())
{
// Create a data buffer and read entire MMF view into buffer
byte[] buffer = new byte[mmfReader.Capacity];
mmfReader.ReadArray<byte>(0, buffer, 0, buffer.Length);
// Convert the buffer to a .NET object
return ByteArrayToObject(buffer);
}
}
}
Well for one thing you're not using that memory mapped file well at all, you're just sequentially reading it all in a buffer, which is both needlessly inefficient and much slower than if you simply opened the file to read normally. The selling point of memory mapped files is repeated random access and random updates backed by the OS's virtual memory paging.
And you definitely don't need to read the entire file in memory, since your data is so strongly structured. You know exactly how many bytes to read for a record: Marshal.SizeOf<DataStruct>().
Then you need to get rid of all that serialization noise. Again your data is strongly typed, just read it. Get rid of those fixed arrays and use regular arrays, you're already instructing the marshaller how to read them with MarshalAs attributes (good). That also gets rid of that helper function that just copies an array for some unknown reason.
Your reading loop is very simple: read the correct number of bytes for one entry, use Marshal.PtrToStructure to convert it to a readable structure and add it to a list to return at the end. Bonus points if you can use .NET Core and Unsafe.As or MemoryMarshal.Cast.
Edit: and don't use object returns, you know exactly what you're returning, write it down.
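The reading loop described above could be sketched roughly as follows. This is an illustration under assumptions, not a drop-in replacement: the struct is redeclared here with regular arrays, the `RecordReader` class and its methods are hypothetical names, and the file is assumed to have been written on a little-endian machine with the 56-byte layout from the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

// Same layout as in the question, but with regular arrays instead of fixed buffers.
[StructLayout(LayoutKind.Sequential)]
public struct DataStruct
{
    public ulong milliseconds;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 21)]
    public short[] data;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
    public short[] padding;
}

public static class RecordReader
{
    // Reads fixed-size records one at a time until the stream is exhausted.
    public static List<DataStruct> ReadAll(Stream stream)
    {
        int size = Marshal.SizeOf<DataStruct>(); // 8 + 21*2 + 3*2 = 56 bytes
        var buffer = new byte[size];
        var result = new List<DataStruct>();
        // Pin the buffer so the marshaller can read it through a stable pointer.
        var handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            while (ReadFully(stream, buffer) == size)
                result.Add(Marshal.PtrToStructure<DataStruct>(handle.AddrOfPinnedObject()));
        }
        finally
        {
            handle.Free();
        }
        return result;
    }

    // Stream.Read may return fewer bytes than requested, so loop until the
    // buffer is full or the stream ends. Returns the number of bytes read.
    private static int ReadFully(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int read = stream.Read(buffer, total, buffer.Length - total);
            if (read == 0) break;
            total += read;
        }
        return total;
    }
}
```

Usage would be along the lines of `using (var fs = File.OpenRead(path)) { var samples = RecordReader.ReadAll(fs); }`, where `path` is one of the replay files. A trailing partial record, if any, is ignored.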
I have a DataWriter that already has a file storage stream attached. However, in one particular case I want to write the data to memory first so that I know its size in bytes, and then store that size together with the data in the writer.
How can I do that without creating two in memory buffers?
DataWriter writer; // writer is parameter passed from somewhere else.
using (var inMemory = new InMemoryRandomAccessStream())
{
// fill inMemory with data.
// ***Here*** How can I avoid this?
var buffer = new byte[checked((int)inMemory.Position)].AsBuffer();
inMemory.Seek(0);
await inMemory.ReadAsync(buffer, buffer.Length, InputStreamOptions.ReadAhead);
writer.WriteUInt32(buffer.Length); // write size
writer.WriteBuffer(buffer); // write data
}
As you can see, I'm using two buffers: one for the memory stream, the other an IBuffer.
I don't know how to write the contents of inMemory directly into the DataWriter, which already has the file storage stream attached.
I had to write my own buffer stream in order to prevent duplicate buffer creation. Internally it works like a list, but avoiding the second copy pays off when the buffer grows large.
internal sealed class BufferStream : IDisposable
{
private byte[] _array = Array.Empty<byte>();
private int _index = -1;
private const int MaxArrayLength = 0X7FEFFFFF;
public int Capacity => _array.Length;
public int Length => _index + 1;
public void WriteIntoDataWriterStream(IDataWriter writer)
{
// AsBuffer won't cause a copy; it's just a wrapper around the array.
// The second argument is a length, so pass Length (_index + 1), not the last index.
if (_index >= 0) writer.WriteBuffer(_array.AsBuffer(0, Length));
}
public void WriteBuffer(IBuffer buffer)
{
EnsureSize(checked((int) buffer.Length));
for (uint i = 0; i < buffer.Length; i++)
{
_array[++_index] = buffer.GetByte(i);
}
}
public void Flush()
{
// Clear every byte written so far (Length == _index + 1)
Array.Clear(_array, 0, Length);
_index = -1;
}
// list like resizing.
private void EnsureSize(int additionSize)
{
var min = additionSize + _index;
if (_array.Length <= min)
{
var newsize = (int) Math.Min((uint) _array.Length * 2, MaxArrayLength);
if (newsize <= min) newsize = min + 1;
Array.Resize(ref _array, newsize);
}
}
public void Dispose()
{
_array = null;
}
}
Then I can easily do this.
using (var buffer = new BufferStream())
{
// fill buffer
writer.WriteInt32(buffer.Length); // write size
buffer.WriteIntoDataWriterStream(writer); // write data
}
In .NET, we have the SecureString class, which is all very well until you come to try and use it, as to (for example) hash the string, you need the plaintext. I've had a go here at writing a function that will hash a SecureString, given a hash function that takes a byte array and outputs a byte array.
private static byte[] HashSecureString(SecureString ss, Func<byte[], byte[]> hash)
{
// Convert the SecureString to a BSTR
IntPtr bstr = Marshal.SecureStringToBSTR(ss);
// BSTR contains the length of the string in bytes in an
// Int32 stored in the 4 bytes prior to the BSTR pointer
int length = Marshal.ReadInt32(bstr, -4);
// Allocate a byte array to copy the string into
byte[] bytes = new byte[length];
// Copy the BSTR to the byte array
Marshal.Copy(bstr, bytes, 0, length);
// Immediately destroy the BSTR as we don't need it any more
Marshal.ZeroFreeBSTR(bstr);
// Hash the byte array
byte[] hashed = hash(bytes);
// Destroy the plaintext copy in the byte array
for (int i = 0; i < length; i++) { bytes[i] = 0; }
// Return the hash
return hashed;
}
I believe this will correctly hash the string, and will correctly scrub any copies of the plaintext from memory by the time the function returns, assuming the provided hash function is well behaved and doesn't make any copies of the input that it doesn't scrub itself. Have I missed anything here?
Have I missed anything here?
Yes, you have, a rather fundamental one at that. You cannot scrub the copy of the array left behind when the garbage collector compacts the heap. Marshal.SecureStringToBSTR(ss) is okay because a BSTR is allocated in unmanaged memory so will have a reliable pointer that won't change. In other words, no problem scrubbing that one.
Your byte[] bytes array however contains the copy of the string and is allocated on the GC heap. You make it likely to induce a garbage collection with the hashed[] array. Easily avoided but of course you have little control over other threads in your process allocating memory and inducing a collection. Or for that matter a background GC that was already in progress when your code started running.
The point of SecureString is to never have a cleartext copy of the string in garbage collected memory. Copying it into a managed array violated that guarantee. If you want to make this code secure then you are going to have to write a hash() method that takes the IntPtr and only reads through that pointer.
Beware that if your hash needs to match a hash computed on another machine then you cannot ignore the Encoding that machine would use to turn the string into bytes.
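To make the encoding point concrete: a BSTR holds UTF-16 code units, so hashing its raw bytes only matches a hash computed elsewhere if the other side also hashed UTF-16LE bytes. The sketch below uses ordinary managed strings purely to show that the encoding changes the hash, so it is not something to do with a real secret. The `EncodingDemo` class and `HashHex` method are hypothetical names.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Illustration only: the same text hashed under two encodings gives different hashes.
public static class EncodingDemo
{
    public static string HashHex(string text, Encoding encoding)
    {
        using (var sha = SHA256.Create())
        {
            byte[] bytes = encoding.GetBytes(text);
            return BitConverter.ToString(sha.ComputeHash(bytes));
        }
    }
}
```

`EncodingDemo.HashHex("abc", Encoding.Unicode)` hashes the six bytes a BSTR would contain, while `EncodingDemo.HashHex("abc", Encoding.UTF8)` hashes three bytes, so the two results differ.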
There's always the possibility of using the unmanaged CryptoApi or CNG functions.
Bear in mind that SecureString was designed with an unmanaged consumer which has full control over memory management in mind.
If you want to stick to C#, you should pin the temporary array to prevent the GC from moving it around before you get a chance to scrub it:
private static byte[] HashSecureString(SecureString input, Func<byte[], byte[]> hash)
{
var bstr = Marshal.SecureStringToBSTR(input);
var length = Marshal.ReadInt32(bstr, -4);
var bytes = new byte[length];
var bytesPin = GCHandle.Alloc(bytes, GCHandleType.Pinned);
try {
Marshal.Copy(bstr, bytes, 0, length);
Marshal.ZeroFreeBSTR(bstr);
return hash(bytes);
} finally {
for (var i = 0; i < bytes.Length; i++) {
bytes[i] = 0;
}
bytesPin.Free();
}
}
As a complement to Hans’ answer here’s a suggestion how to implement the hasher. Hans suggests passing the pointer to the unmanaged string to the hash function but that means that client code (= the hash function) needs to deal with unmanaged memory. That’s not ideal.
On the other hand, you can replace the callback by an instance of the following interface:
interface Hasher {
void Reinitialize();
void AddByte(byte b);
byte[] Result { get; }
}
That way the hasher (although it becomes slightly more complex) can be implemented wholly in managed land without leaking secure information. Your HashSecureString would then look as follows:
private static byte[] HashSecureString(SecureString ss, Hasher hasher) {
IntPtr bstr = Marshal.SecureStringToBSTR(ss);
try {
int length = Marshal.ReadInt32(bstr, -4);
hasher.Reinitialize();
for (int i = 0; i < length; i++)
hasher.AddByte(Marshal.ReadByte(bstr, i));
return hasher.Result;
}
finally {
Marshal.ZeroFreeBSTR(bstr);
}
}
Note the finally block to make sure that the unmanaged memory is zeroed, no matter what shenanigans the hasher instance does.
Here’s a simple (and not very useful) Hasher implementation to illustrate the interface:
sealed class SingleByteXor : Hasher {
private readonly byte[] data = new byte[1];
public void Reinitialize() {
data[0] = 0;
}
public void AddByte(byte b) {
data[0] ^= b;
}
public byte[] Result {
get { return data; }
}
}
As a further complement, could you not wrap the logic @KonradRudolph and @HansPassant supplied into a custom Stream implementation?
This would allow you to use the HashAlgorithm.ComputeHash(Stream) method, which would keep the interface managed (although it would be down to you to dispose the stream in good time).
Of course, you are at the mercy of the HashAlgorithm implementation as to how much data ends up in memory at a time (but, of course, that's what the reference source is for!)
Just an idea...
public class SecureStringStream : Stream
{
public override bool CanRead { get { return true; } }
public override bool CanWrite { get { return false; } }
public override bool CanSeek { get { return false; } }
public override long Position
{
get { return _pos; }
set { throw new NotSupportedException(); }
}
public override void Flush() { throw new NotSupportedException(); }
public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
public override void SetLength(long value) { throw new NotSupportedException(); }
public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
private readonly IntPtr _bstr = IntPtr.Zero;
private readonly int _length;
private int _pos;
public SecureStringStream(SecureString str)
{
if (str == null) throw new ArgumentNullException("str");
_bstr = Marshal.SecureStringToBSTR(str);
try
{
_length = Marshal.ReadInt32(_bstr, -4);
_pos = 0;
}
catch
{
if (_bstr != IntPtr.Zero) Marshal.ZeroFreeBSTR(_bstr);
throw;
}
}
public override long Length { get { return _length; } }
public override int Read(byte[] buffer, int offset, int count)
{
if (buffer == null) throw new ArgumentNullException("buffer");
if (offset < 0) throw new ArgumentOutOfRangeException("offset");
if (count < 0) throw new ArgumentOutOfRangeException("count");
if (offset + count > buffer.Length) throw new ArgumentException("offset + count > buffer");
// Hand out one byte per call; advance _pos only once, after the byte is read
if (count > 0 && _pos < _length)
{
buffer[offset] = Marshal.ReadByte(_bstr, _pos++);
return 1;
}
else return 0;
protected override void Dispose(bool disposing)
{
try { if (_bstr != IntPtr.Zero) Marshal.ZeroFreeBSTR(_bstr); }
finally { base.Dispose(disposing); }
}
}
void RunMe()
{
using (SecureString s = new SecureString())
{
foreach (char c in "jimbobmcgee") s.AppendChar(c);
s.MakeReadOnly();
using (SecureStringStream ss = new SecureStringStream(s))
using (HashAlgorithm h = MD5.Create())
{
Console.WriteLine(Convert.ToBase64String(h.ComputeHash(ss)));
}
}
}
I'm trying to split a byte stream into chunks of increasing size.
The source stream contains an unknown number of bytes and is expensive to read. The output of the enumerator should be byte arrays of increasing size, starting at 8KB up to 1MB.
This is very simple to do by simply reading the whole stream, storing it in an array and taking the relevant pieces out. However, since the stream may be very large, reading it at once is unfeasible. Also, while performance is not the main concern, it is important to keep system load very low.
While implementing this I noticed that it's relatively difficult to keep the code short and maintainable. There are a few stream related issues to keep in mind, too (for instance, Stream.Read might not fill the buffer even though it succeeded).
I did not find any existing classes that help for my case, nor could I find something close on the net. How would you implement such a class?
public IEnumerable<BufferWrapper> getBytes(Stream stream)
{
List<int> bufferSizes = new List<int>() { 8192, 65536, 220160, 1048576 };
int count = 0;
int bufferSizePostion = 0;
byte[] buffer = new byte[bufferSizes[0]];
bool done = false;
while (!done)
{
BufferWrapper nextResult = new BufferWrapper();
nextResult.bytesRead = stream.Read(buffer, 0, buffer.Length);
nextResult.buffer = buffer;
done = nextResult.bytesRead == 0;
if (!done)
{
yield return nextResult;
count++;
if (count > 10 && bufferSizePostion < bufferSizes.Count - 1)
{
count = 0;
bufferSizePostion++;
buffer = new byte[bufferSizes[bufferSizePostion]];
}
}
}
}
public class BufferWrapper
{
public byte[] buffer { get; set; }
public int bytesRead { get; set; }
}
Obviously the logic for when to move up in buffer size, and how to choose what that size is could be altered.
Someone could also probably find a better way of handling the last buffer to be sent, as this isn't the most efficient way.
For reference, the implementation I currently use, already with improvements as per the answer by @Servy:
private const int InitialBlockSize = 8 * 1024;
private const int MaximumBlockSize = 1024 * 1024;
private Stream _Stream;
private int _Size = InitialBlockSize;
public byte[] Current
{
get;
private set;
}
public bool MoveNext ()
{
if (_Size < 0) {
return false;
}
var buf = new byte[_Size];
int count = 0;
while (count < _Size) {
int read = _Stream.Read (buf, count, _Size - count);
if (read == 0) {
break;
}
count += read;
}
if (count == _Size) {
Current = buf;
if (_Size <= MaximumBlockSize / 2) {
_Size *= 2;
}
}
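else {
_Size = -1;
if (count == 0) {
// The stream ended exactly on a chunk boundary, so there is no final chunk
return false;
}
Current = new byte[count];
Array.Copy (buf, Current, count);
}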
return true;
}
Using .net, I would like to be able to hash the first N bytes of potentially large files, but I can't seem to find a way of doing it.
The ComputeHash function (I'm using SHA1) takes a byte array or a stream, but a stream seems like the best way of doing it, since I would prefer not to load a potentially large file into memory.
To be clear: I don't want to load a potentially large piece of data into memory if I can help it. If the file is 2GB and I want to hash the first 1GB, that's a lot of RAM!
You can hash large volumes of data using a CryptoStream - something like this should work:
var sha1 = SHA1Managed.Create();
FileStream fs = /* whatever */;
using (var cs = new CryptoStream(fs, sha1, CryptoStreamMode.Read))
{
byte[] buf = new byte[16];
long totalBytesRead = 0;
while (totalBytesRead < maxBytesToHash)
{
// Never request more than what is left, so at most maxBytesToHash bytes are hashed
int toRead = (int)Math.Min(buf.Length, maxBytesToHash - totalBytesRead);
int bytesRead = cs.Read(buf, 0, toRead);
if (bytesRead == 0) break; // end of file reached first
totalBytesRead += bytesRead;
}
}
byte[] hash = sha1.Hash;
fileStream.Read(array, 0, N);
http://msdn.microsoft.com/en-us/library/system.io.filestream.read.aspx
Open the file as a FileStream, copy the first n bytes into a MemoryStream, then hash the MemoryStream.
As others have pointed out, you should read the first few bytes into an array.
What should also be noted that you don't want to make a direct call to Read and assume that the bytes have been read.
Rather, you want to make sure that the number of bytes that are returned are the number of bytes that you requested, and make another call to Read in the event that the number of bytes returned doesn't equal the initial number requested.
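A minimal sketch of that read loop, combined with incremental hashing so that only a small buffer is ever held in memory. It uses `IncrementalHash` rather than `ComputeHash`. The class name `PrefixHasher`, the method `HashFirstBytes`, and the 80 KB buffer size are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class PrefixHasher
{
    // Hashes at most maxBytes bytes, starting at the stream's current position.
    public static byte[] HashFirstBytes(Stream stream, long maxBytes)
    {
        using (var hasher = IncrementalHash.CreateHash(HashAlgorithmName.SHA1))
        {
            var buffer = new byte[81920];
            long remaining = maxBytes;
            while (remaining > 0)
            {
                // Read may return fewer bytes than requested, so keep looping
                // until the limit is reached or the stream ends (Read returns 0).
                int toRead = (int)Math.Min(buffer.Length, remaining);
                int read = stream.Read(buffer, 0, toRead);
                if (read == 0) break;
                hasher.AppendData(buffer, 0, read);
                remaining -= read;
            }
            return hasher.GetHashAndReset();
        }
    }
}
```

For the 2 GB example this would be called as `PrefixHasher.HashFirstBytes(fileStream, 1L << 30)`, where `fileStream` is the opened file. If the file is shorter than the limit, the whole file is hashed.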
Also, if you have rather large streams, you will want to create a proxy for the Stream class where you pass it the underlying stream (the FileStream in this case) and override the Read method to forward the call to the underlying stream until you have read the number of bytes that you need. Once that number of bytes has been returned, you would return 0 to indicate that there are no more bytes to be read (Stream.Read signals the end of the stream with 0; only ReadByte uses -1).
If you are concerned about keeping too much data in memory, you can create a stream wrapper that throttles the maximum number of bytes read.
Without doing all the work, here's a sample boiler plate you could use to get started.
Edit: Please review comments for recommendations to improve this implementation. End edit
public class LimitedStream : Stream
{
private int current = 0;
private int limit;
private Stream stream;
public LimitedStream(Stream stream, int n)
{
this.limit = n;
this.stream = stream;
}
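public override int ReadByte()
{
if (current >= limit)
return -1;
// Read from the wrapped stream directly: base.ReadByte() calls Read(),
// which would count the byte a second time
var value = this.stream.ReadByte();
if (value >= 0)
current++;
return value;
}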
public override int Read(byte[] buffer, int offset, int count)
{
count = Math.Min(count, limit - current);
var numread = this.stream.Read(buffer, offset, count);
current += numread;
return numread;
}
public override long Seek(long offset, SeekOrigin origin)
{
throw new NotImplementedException();
}
public override void SetLength(long value)
{
throw new NotImplementedException();
}
public override void Write(byte[] buffer, int offset, int count)
{
throw new NotImplementedException();
}
public override bool CanRead
{
get { return true; }
}
public override bool CanSeek
{
get { return false; }
}
public override bool CanWrite
{
get { return false; }
}
public override void Flush()
{
throw new NotImplementedException();
}
public override long Length
{
get { throw new NotImplementedException(); }
}
public override long Position
{
get { throw new NotImplementedException(); }
set { throw new NotImplementedException(); }
}
protected override void Dispose(bool disposing)
{
base.Dispose(disposing);
if (this.stream != null)
{
this.stream.Dispose();
}
}
}
Here is an example of the stream in use, wrapping a file stream, but throttling the number of bytes read to the specified limit:
using (var stream = new LimitedStream(File.OpenRead(@".\test.xml"), 100))
{
var bytes = new byte[1024];
stream.Read(bytes, 0, bytes.Length);
}