XmlReader to read from fixed length buffer - c#

The incoming stream comes in a fixed 1024 bytes buffer, the stream itself is a hug XML file which may take several rounds of read to finish. My goal is to read the buffer and figure out how many times an element occured in the big XML file.
My chanllenge is, since it is really a fixed length buffer, so it cannot gurantee the wellform of XML, if I wrap the stream in the XmlTextReader, I am always getting exception and cannot finish the read. for example, the element could be abcdef, while the 1st buffer could end at abc while second buffer start with def. I am really frustrated about this, anyone could advise a better way to archieve this using streaming fashion? (I do not want to load entire content in memory)
Thanks so much

Are your 1024-byte buffers coming from one of the standard, concrete implementations of System.IO.Stream? If t they are, you can just create your XmlTextReader around the base stream:
XmlTextReader tr = XmlTextReader.Create( myStreamInstance ) ;
If not -- say, for instance, you're "reading" the buffers from some sort of API -- you need to implement your own concrete Stream, something along these lines (all you should need to do is flesh out the ReadNextFrame() method and possibly implement your constructors):
public class MyStream : System.IO.Stream
{
public override bool CanRead { get { return true ; } }
public override bool CanSeek { get { return false ; } }
public override bool CanWrite { get { return false ; } }
public override long Length { get { throw new NotImplementedException(); } }
public override long Position {
get { throw new NotImplementedException(); }
set { throw new NotImplementedException(); }
}
public override int Read( byte[] buffer , int offset , int count )
{
int bytesRead = 0 ;
if ( !initialized )
{
Initialize() ;
}
for ( int bytesRemaining = count ; !atEOF && bytesRemaining > 0 ; )
{
int frameRemaining = frameLength - frameOffset ;
int chunkSize = ( bytesRemaining > frameRemaining ? frameRemaining : bytesRemaining ) ;
Array.Copy( frame , offset , frame , frameOffset , chunkSize ) ;
bytesRemaining -= chunkSize ;
offset += chunkSize ;
bytesRead += chunkSize ;
// read next frame if necessary
if ( frameOffset >= frameLength )
{
ReadNextFrame() ;
}
}
return bytesRead ;
}
public override long Seek( long offset , System.IO.SeekOrigin origin ) { throw new NotImplementedException(); }
public override void SetLength( long value ) { throw new NotImplementedException(); }
public override void Write( byte[] buffer , int offset , int count ) { throw new NotImplementedException(); }
public override void Flush() { throw new NotImplementedException(); }
private byte[] frame = null ;
private int frameLength = 0 ;
private int frameOffset = 0 ;
private bool atEOF = false ;
private bool initialized = false ;
private void Initialize()
{
if ( initialized ) throw new InvalidOperationException() ;
frame = new byte[1024] ;
frameLength = 0 ;
frameOffset = 0 ;
atEOF = false ;
initialized = true ;
ReadNextFrame() ;
return ;
}
private void ReadNextFrame()
{
//TODO: read the next (or first 1024-byte buffer
//TODO: set the frame length to the number of bytes actually returned (might be less than 1024 on the last read, right?
//TODO: set the frame offset to 0
//TODO: set the atEOF flag if we've exhausted the data source ;
return ;
}
}
Then instantiate your XmlReader as above:
System.IO.Stream s = new MyStream() ;
System.Xml.XmlReader xr = XmlTextReader.Create( s ) ;
Cheers!

That is sort of strange goal... Usually it is more like "count elements but not load whole XML to memory" which is trivial - write Stream derived class that represents you buffer as forward only stream (similar to NetworkStream) and read XML (i.e. using LINQ) normally using XmlReader, but do not construct XmlDocument.
If you clarify your goal it may be easier for others to advise.

Related

ByteStream source in SharpDX.MediaFoundation hanging

I have been trying to get a custom audio stream to work with SharpDX.MediaFoundation.
To this end I have wrapped my audio object in a class that implements System.IO.Stream as follows:
public class AudioReaderWaveStream : System.IO.Stream
{
byte[] waveHeader = new byte[44];
AudioCore.IAudioReader reader = null;
ulong readHandle = 0xffffffff;
long readPosition = 0;
public AudioReaderWaveStream(AudioCore.CEditedAudio content)
{
reader = content as AudioCore.IAudioReader;
readHandle = reader.OpenDevice();
int sampleRate = 0;
short channels = 0;
content.GetFormat(out sampleRate, out channels);
System.IO.MemoryStream memStream = new System.IO.MemoryStream(waveHeader);
using (System.IO.BinaryWriter bw = new System.IO.BinaryWriter(memStream))
{
bw.Write("RIFF".ToCharArray());
bw.Write((Int32)Length - 8);
bw.Write("WAVE".ToCharArray());
bw.Write("fmt ".ToCharArray());
bw.Write((Int32)16);
bw.Write((Int16)3);
bw.Write((Int16)1);
bw.Write((Int32)sampleRate);
bw.Write((Int32)sampleRate * 4);
bw.Write((Int16)4);
bw.Write((Int16)32);
bw.Write("data".ToCharArray());
bw.Write((Int32)reader.GetSampleCount() * 4);
}
}
protected override void Dispose(bool disposing)
{
if (readHandle != 0xffffffff)
{
reader.CloseDevice(readHandle);
readHandle = 0xfffffffff;
}
base.Dispose(disposing);
}
~AudioReaderWaveStream()
{
Dispose();
}
public override bool CanRead
{
get
{
return true;
}
}
public override bool CanSeek
{
get
{
return true;
}
}
public override bool CanWrite
{
get
{
return false;
}
}
public override long Length
{
get
{
// Number of float samples + header of 44 bytes.
return (reader.GetSampleCount() * 4) + 44;
}
}
public override long Position
{
get
{
return readPosition;
}
set
{
readPosition = value;
}
}
public override void Flush()
{
//throw new NotImplementedException();
}
public override int Read(byte[] buffer, int offset, int count)
{
if (count <= 0)
return 0;
int retCount = count;
if (Position < 44)
{
int headerCount = count;
if ( Position + count >= 44 )
{
headerCount = 44 - (int)Position;
}
Array.Copy(waveHeader, Position, buffer, offset, headerCount);
offset += headerCount;
Position += headerCount;
count -= headerCount;
}
if (count > 0)
{
float[] readBuffer = new float[count/4];
reader.Seek(readHandle, Position - 44);
reader.ReadAudio(readHandle, readBuffer);
Array.Copy(readBuffer, 0, buffer, offset, count);
}
return retCount;
}
public override long Seek(long offset, System.IO.SeekOrigin origin)
{
if (origin == System.IO.SeekOrigin.Begin)
{
readPosition = offset;
}
else if (origin == System.IO.SeekOrigin.Current)
{
readPosition += offset;
}
else
{
readPosition = Length - offset;
}
return readPosition;
}
public override void SetLength(long value)
{
throw new NotImplementedException();
}
public override void Write(byte[] buffer, int offset, int count)
{
throw new NotImplementedException();
}
}
I then take this object and create a source resolver using it as follows:
// Create a source resolver.
SharpDX.MediaFoundation.ByteStream sdxByteStream = new ByteStream( ARWS );
SharpDX.MediaFoundation.SourceResolver resolver = new SharpDX.MediaFoundation.SourceResolver();
ComObject source = (ComObject)resolver.CreateObjectFromStream( sdxByteStream, "File.wav", SourceResolverFlags.MediaSource );
However every time I'm doing this it hangs on the CreateObjectFromStream call. I've had a look inside SharpDX to see whats going on and it seems the actual hang occurs when it makes the call to the underlying interface through CreateObjectFromByteStream. I've also looked to see what data is actually read from the byte stream. It reads the first 16 bytes which includes the 'RIFF', the RIFF size, the 'WAVE' and the 'fmt '. Then nothing else.
Has anyone got any ideas what I could be doing wrong. I've tried all sorts of combinations of the SourceResolverFlags but nothing seems to make any difference. It just hangs.
It does remind me somewhat of interthread marshalling but all the media foundation calls are made from the same thread so I don't think its that. I'm also fairly sure that MediaFoundation uses free threading so this shouldn't be a problem anyway.
Has anyone any idea what I could possibly be doing wrong?
Thanks!
Ok I have come up with a solution to this. It looks like I may be having a COM threading issue. The read happens in a thread and that thread was calling back to the main thread which the function was called from.
So I used the async version of the call and perform an Application.DoEvents() to hand across control where necessary.
Callback cb = new Callback( resolver );
IUnknown cancel = null;
resolver.BeginCreateObjectFromByteStream( sdxByteStream, "File.wav", (int)(SourceResolverFlags.MediaSource | SourceResolverFlags.ByteStream), null, out cancel, cb, null );
if ( cancel != null )
{
cancel.Dispose();
}
while( cb.MediaSource == null )
{
System.Windows.Forms.Application.DoEvents();
}
SharpDX.MediaFoundation.MediaSource mediaSource = cb.MediaSource;
I really hate COM's threading model ...

WaitFor() - How to wait for a specific buffer to arrive on Steam/SerialPort?

Requested Behaviour: I would like to hear proposed, generic solutions for suspending a calling thread until a specific buffer is received on a Stream/SerialPort. For the time being, I'm not concerned with timeouts etc, however I need something robust.
Attempted method:
Class myClass
{
private SerialPort _port; //Assume configured and connected.
public void WaitFor(byte[] buffer)
{
int bufferLength = buffer.Length;
byte[] comparisonBuffer = new byte[bufferLength];
while(true)
{
if(_port.BytesToRead >= bufferLength)
{
_port.Read(comparisonBuffer, 0, bufferLength);
if (comparisonBuffer.SequenceEqual(buffer)) { return; }
}
}
}
{
I've had a reasonable amount of success with this however it just has a "hacky" feel to it. It has quite often caused me trouble. I believe it's due to the fact that I cannot guarantee that other data isn't received either before or after the expected packet, so naturally this method can end up reading off the stream out of sync. In such a case I would not want to loose the leading/trailing data but the method should release the thread.
I need to implement in a procedural nature so event driven methods won't really work for me. In the generic sense I want to be able to implement as;
Do thing;
WaitFor(mybuffer);
Do other thing;
SerialPort.Read() already blocks until at least one byte has arrived. Therefore you don't need to (and shouldn't) use the BytesToRead the way you are - you've introduced a HORRIBLE busy-wait loop.
Instead, do something like this:
// Reads 'count' bytes from a serial port into the specified
// part of a buffer. This blocks until all the bytes have been read.
public void BlockingRead(SerialPort port, byte[] buffer, int offset, int count)
{
while (count > 0)
{
// SerialPort.Read() blocks until at least one byte has been read, or SerialPort.ReadTimeout milliseconds
// have elapsed. If a timeout occurs a TimeoutException will be thrown.
// Because SerialPort.Read() blocks until some data is available this is not a busy loop,
// and we do NOT need to issue any calls to Thread.Sleep().
int bytesRead = port.Read(buffer, offset, count);
offset += bytesRead;
count -= bytesRead;
}
}
Here's how you would implement your original code in terms of BlockingRead():
public void WaitFor(SerialPort port, byte[] buffer)
{
byte[] comparisonBuffer = new byte[buffer.Length];
while (true)
{
BlockingRead(port, comparisonBuffer, 0, comparisonBuffer.Length);
if (comparisonBuffer.SequenceEqual(buffer))
return;
}
}
Problem
Lets assume you wait for the byte pattern {1,1,1,2,2} and the serial port has buffered {1,1,1,1,2,2,5}.
Your code reads the first 5 bytes {1,1,1,1,2} which will not match the pattern. But after reading from the port the data you read has been removed from the buffer and contains only {2,5} and you will never get a match.
Solution
public void WaitFor( byte[ ] buffer )
{
if ( buffer.Length == 0 )
return;
var q = new List<byte>( buffer.Length );
while ( true )
{
var current = _reader.ReadByte();
q.Add( (byte)current );
// sequence match so far
if ( q.Last == buffer[ q.Count - 1 ] )
{
// check for total match
if ( q.Count == buffer.Length )
return;
}
else
{
// shift the data
while ( q.Any() && !q.SequenceEqual( buffer.Take( q.Count ) ) )
{
q.RemoveAt( 0 );
}
}
}
}
What do you think to this solution?
public override byte[] WaitFor(byte[] buffer, int timeout)
{
// List to stack stream into
List<byte> stack = new List<byte>();
// Index of first comparison byte
int index = 0;
// Index of last comparison byte
int upperBound = buffer.Length - 1;
// Timeout Manager
Stopwatch Sw = new Stopwatch();
Sw.Start();
while (Sw.Elapsed.Seconds <= timeout)
{
// Read off the last byte receievd and add to the stack
stack.Add((byte)_port.ReadByte());
// If my stack contains enough bytes to compare to the buffer
if (stack.Count > upperBound)
{
// If my first comparison byte matches my first buffer byte
if (stack[index] == buffer[0])
{
// Extract the comparison block to array
byte[] compBuffer = stack.GetRange(index,upperBound +1).ToArray();
// If the comparison matches, break and return the redundent bytes should I wish to handle them.
if ((compBuffer.SequenceEqual(buffer) && (index-1 > 0))) { return stack.GetRange(0, index - 1).ToArray(); }
// If there were no redundent bytes, just return zero.
else if (compBuffer.SequenceEqual(buffer)) { return new byte[] { 0}; }
}
// Increments
index += 1;
upperBound += 1;
}
}
throw new TimeoutException("Timeout: Expected buffer was not received prior to timeout");
}

Returning database results as a stream

I have a function that returns database query results. These results have got very large, and I now would like to pass them as a stream, so that the client can start to process them quicker, and memory usage is less. But I don't really know how to do this, the following function works, but what I want to know how to change it so that it starts to stream upon reading from the first table.
public Stream GetResults()
{
IFormatter formatter = new BinaryFormatter();
Stream stream = new MemoryStream();
formatter.Serialize(stream, GetItemsFromTable1());
formatter.Serialize(stream, GetItemsFromTable2());
formatter.Serialize(stream, GetItemsFromTable3());
formatter.Serialize(stream, GetItemsFromTable4());
stream.Position = 0;
return stream;
}
You could write a custom Stream implementation which functions as a pipe. If you then moved your GetItemsFromTable() method calls into a background task, the client could start reading results from the stream immediately.
In my solution below I'm using a circular buffer as a backing store for the pipe stream. Memory usage will be reduced only if the client consumes data fast enough. But even in the worst case scenario it shouldn't use more memory then your current solution. If memory usage is a bigger priority for you than execution speed then your stream could potentially block write calls until space is available. My solution below does not block writes; it expands the capacity of the circular buffer so that the background thread can continue filling data without delays.
The GetResults method might look like this:
public Stream GetResults()
{
// Begin filling the pipe with data on a background thread
var pipeStream = new CircularBufferPipeStream();
Task.Run(() => WriteResults(pipeStream));
// Return pipe stream for immediate usage by client
// Note: client is responsible for disposing of the stream after reading all data!
return pipeStream;
}
// Runs on background thread, filling circular buffer with data
void WriteResults(CircularBufferPipeStream stream)
{
IFormatter formatter = new BinaryFormatter();
formatter.Serialize(stream, GetItemsFromTable1());
formatter.Serialize(stream, GetItemsFromTable2());
formatter.Serialize(stream, GetItemsFromTable3());
formatter.Serialize(stream, GetItemsFromTable4());
// Indicate that there's no more data to write
stream.CloseWritePort();
}
And the circular buffer stream:
/// <summary>
/// Stream that acts as a pipe by supporting reading and writing simultaneously from different threads.
/// Read calls will block until data is available or the CloseWritePort() method has been called.
/// Read calls consume bytes in the circular buffer immediately so that more space is available for writes into the circular buffer.
/// Writes do not block; the capacity of the circular buffer will be expanded as needed to write the entire block of data at once.
/// </summary>
class CircularBufferPipeStream : Stream
{
const int DefaultCapacity = 1024;
byte[] _buffer;
bool _writePortClosed = false;
object _readWriteSyncRoot = new object();
int _length;
ManualResetEvent _dataAddedEvent;
int _start = 0;
public CircularBufferPipeStream(int initialCapacity = DefaultCapacity)
{
_buffer = new byte[initialCapacity];
_length = 0;
_dataAddedEvent = new ManualResetEvent(false);
}
public void CloseWritePort()
{
lock (_readWriteSyncRoot)
{
_writePortClosed = true;
_dataAddedEvent.Set();
}
}
public override bool CanRead { get { return true; } }
public override bool CanWrite { get { return true; } }
public override bool CanSeek { get { return false; } }
public override void Flush() { }
public override long Length { get { throw new NotImplementedException(); } }
public override long Position
{
get { throw new NotImplementedException(); }
set { throw new NotImplementedException(); }
}
public override long Seek(long offset, SeekOrigin origin) { throw new NotImplementedException(); }
public override void SetLength(long value) { throw new NotImplementedException(); }
public override int Read(byte[] buffer, int offset, int count)
{
int bytesRead = 0;
while (bytesRead == 0)
{
bool waitForData = false;
lock (_readWriteSyncRoot)
{
if (_length != 0)
bytesRead = ReadDirect(buffer, offset, count);
else if (_writePortClosed)
break;
else
{
_dataAddedEvent.Reset();
waitForData = true;
}
}
if (waitForData)
_dataAddedEvent.WaitOne();
}
return bytesRead;
}
private int ReadDirect(byte[] buffer, int offset, int count)
{
int readTailCount = Math.Min(Math.Min(_buffer.Length - _start, count), _length);
Array.Copy(_buffer, _start, buffer, offset, readTailCount);
_start += readTailCount;
_length -= readTailCount;
if (_start == _buffer.Length)
_start = 0;
int readHeadCount = Math.Min(Math.Min(_buffer.Length - _start, count - readTailCount), _length);
if (readHeadCount > 0)
{
Array.Copy(_buffer, _start, buffer, offset + readTailCount, readHeadCount);
_start += readHeadCount;
_length -= readHeadCount;
}
return readTailCount + readHeadCount;
}
public override void Write(byte[] buffer, int offset, int count)
{
lock (_readWriteSyncRoot)
{
// expand capacity as needed
if (count + _length > _buffer.Length)
{
var expandedBuffer = new byte[Math.Max(_buffer.Length * 2, count + _length)];
_length = ReadDirect(expandedBuffer, 0, _length);
_start = 0;
_buffer = expandedBuffer;
}
int startWrite = (_start + _length) % _buffer.Length;
int writeTailCount = Math.Min(_buffer.Length - startWrite, count);
Array.Copy(buffer, offset, _buffer, startWrite, writeTailCount);
startWrite += writeTailCount;
_length += writeTailCount;
if (startWrite == _buffer.Length)
startWrite = 0;
int writeHeadCount = count - writeTailCount;
if (writeHeadCount > 0)
{
Array.Copy(buffer, offset + writeTailCount, _buffer, startWrite, writeHeadCount);
_length += writeHeadCount;
}
}
_dataAddedEvent.Set();
}
protected override void Dispose(bool disposing)
{
if (disposing)
{
if (_dataAddedEvent != null)
{
_dataAddedEvent.Dispose();
_dataAddedEvent = null;
}
}
base.Dispose(disposing);
}
}
try
public Stream GetResults()
{
IFormatter formatter = new BinaryFormatter();
Stream stream = new MemoryStream();
formatter.Serialize(stream, GetItemsFromTable1());
formatter.Serialize(stream, GetItemsFromTable2());
formatter.Serialize(stream, GetItemsFromTable3());
formatter.Serialize(stream, GetItemsFromTable4());
stream.Seek(0L, SeekOrigin.Begin);
return stream;
}
why the changes?
remove using, because your stream gets disposed once it leaves the using-block. disposing the stream means you cannot use it anymore
seek to the beginning of the stream. if you start reading from the stream without seeking to its beginning, you would start to deserialize/ read from its end; but unfortunately there is no content behind the end of the stream
however, I don't see how using a MemoryStream reduces memory usage. I would suggest chaining it into a DeflateStream or a FileStream to reduce RAM-usage
hope this helps

Encrypt file but can decrypt from any point

Basically i need to encrypt a file and then be able to decrypt the file from almost any point in the file. The reason i need this is i would like to use this for files like Video etc and still be able to jump though the file or video. Also the file would be served over the web so not needing to download the whole file is important. Where i am storing the file supports partial downloads so i can request any part of the file i need and this works for an un encrypted file. The question is how could i make this work for an encrypted file. I need to encrypt and decrypt the file in C# but don't really have any other restrictions than that. Symmetric keys are preferred but if that wont work it is not a deal breaker.
Another example of where i only want to download part of a file and decrypt is where i have joined multiple files together but just need to retrieve one of them. This would generally be used for files smaller than 50MB like pictures and info files.
--- EDIT ---
To be clear i am looking for a working implementation or library that does not increase the size of the source file. Stream cipher seems ideal but i have not seen one in c# that works for any point in the stream or anything apart from the start of the stream. Would consider block based implementation if it works form set blocks in stream. Basically i want to pass a raw stream though this and have unencrypted come out other side of the stream. Happy to set the starting offset it represents in the whole file/stream. Looking for something than works as i am not encryption expert. At the minute i get the parts of the file from a data source in 512kb to 5mb blocks depending on client config and i use streams CopyTo method to write it out to a file on disk. I don't get these parts in order. I am looking for a stream wrapper that i could use to pass into the CopyTo method on stream.
Your best best is probably to treat the file as a list of chunks (of whatever size is convenient for your application; let's say 50 kB) and encrypt each separately. This would allow you to decrypt each chunk independently of the others.
For each chunk, derive new keys from your master key, generate a new IV, and encrypt-then-MAC the chunk.
This method has higher storage overhead than encrypting the entire file at once and takes a bit more computation as well due to the key regeneration that it requires.
If you use a stream cipher instead of a block cipher, you'd be able to start decrypting at any byte offset, as long as the decryptor was able to get the current IV from somewhere.
For those interested i managed to work it out based on a number of examples i found plus some of my own code. It uses bouncycastle but should also work with dotnet AES with a few tweaks. This allows the decryption/encryption from any point in the stream.
using System;
using System.IO;
using Org.BouncyCastle.Crypto;
using Org.BouncyCastle.Crypto.Parameters;
namespace StreamHelpers
{
public class StreamEncryptDecrypt : Stream
{
private readonly Stream _streamToWrap;
private readonly IBlockCipher _cipher;
private readonly ICipherParameters _key;
private readonly byte[] _iv;
private readonly byte[] _counter;
private readonly byte[] _counterOut;
private readonly byte[] _output;
private long currentBlockCount;
public StreamEncryptDecrypt(Stream streamToWrap, IBlockCipher cipher, ParametersWithIV keyAndIv)
{
_streamToWrap = streamToWrap;
_cipher = cipher;
_key = keyAndIv.Parameters;
_cipher.Init(true, _key);
_iv = keyAndIv.GetIV();
_counter = new byte[_cipher.GetBlockSize()];
_counterOut = new byte[_cipher.GetBlockSize()];
_output = new byte[_cipher.GetBlockSize()];
if (_iv.Length != _cipher.GetBlockSize())
{
throw new Exception("IV must be the same size as the cipher block size");
}
InitCipher();
}
private void InitCipher()
{
long position = _streamToWrap.Position;
Array.Copy(_iv, 0, _counter, 0, _counter.Length);
currentBlockCount = 0;
var targetBlock = position/_cipher.GetBlockSize();
while (currentBlockCount < targetBlock)
{
IncrementCounter(false);
}
_cipher.ProcessBlock(_counter, 0, _counterOut, 0);
}
private void IncrementCounter(bool updateCounterOut = true)
{
currentBlockCount++;
// Increment the counter
int j = _counter.Length;
while (--j >= 0 && ++_counter[j] == 0)
{
}
_cipher.ProcessBlock(_counter, 0, _counterOut, 0);
}
public override long Position
{
get { return _streamToWrap.Position; }
set
{
_streamToWrap.Position = value;
InitCipher();
}
}
public override long Seek(long offset, SeekOrigin origin)
{
var result = _streamToWrap.Seek(offset, origin);
InitCipher();
return result;
}
public void ProcessBlock(
byte[] input,
int offset,
int length, long streamPosition)
{
if (input.Length < offset+length)
throw new ArgumentException("input does not match offset and length");
var blockSize = _cipher.GetBlockSize();
var startingBlock = streamPosition / blockSize;
var blockOffset = (int)(streamPosition - (startingBlock * blockSize));
while (currentBlockCount < streamPosition / blockSize)
{
IncrementCounter();
}
//process the left over from current block
if (blockOffset !=0)
{
var blockLength = blockSize - blockOffset;
blockLength = blockLength > length ? length : blockLength;
//
// XOR the counterOut with the plaintext producing the cipher text
//
for (int i = 0; i < blockLength; i++)
{
input[offset + i] = (byte)(_counterOut[blockOffset + i] ^ input[offset + i]);
}
offset += blockLength;
length -= blockLength;
blockOffset = 0;
if (length > 0)
{
IncrementCounter();
}
}
//need to loop though the rest of the data and increament counter when needed
while (length > 0)
{
var blockLength = blockSize > length ? length : blockSize;
//
// XOR the counterOut with the plaintext producing the cipher text
//
for (int i = 0; i < blockLength; i++)
{
input[offset + i] = (byte)(_counterOut[i] ^ input[offset + i]);
}
offset += blockLength;
length -= blockLength;
if (length > 0)
{
IncrementCounter();
}
}
}
public override int Read(byte[] buffer, int offset, int count)
{
var pos = _streamToWrap.Position;
var result = _streamToWrap.Read(buffer, offset, count);
ProcessBlock(buffer, offset, result, pos);
return result;
}
public override void Write(byte[] buffer, int offset, int count)
{
var input = new byte[count];
Array.Copy(buffer, offset, input, 0, count);
ProcessBlock(input, 0, count, _streamToWrap.Position);
_streamToWrap.Write(input, offset, count);
}
public override void Flush()
{
_streamToWrap.Flush();
}
public override void SetLength(long value)
{
_streamToWrap.SetLength(value);
}
public override bool CanRead
{
get { return _streamToWrap.CanRead; }
}
public override bool CanSeek
{
get { return true; }
}
public override bool CanWrite
{
get { return _streamToWrap.CanWrite; }
}
public override long Length
{
get { return _streamToWrap.Length; }
}
protected override void Dispose(bool disposing)
{
if (_streamToWrap != null)
{
_streamToWrap.Dispose();
}
base.Dispose(disposing);
}
}
}

How do I hash first N bytes of a file?

Using .net, I would like to be able to hash the first N bytes of potentially large files, but I can't seem to find a way of doing it.
The ComputeHash function (I'm using SHA1) takes a byte array or a stream, but a stream seems like the best way of doing it, since I would prefer not to load a potentially large file into memory.
To be clear: I don't want to load a potentially large piece of data into memory if I can help it. If the file is 2GB and I want to hash the first 1GB, that's a lot of RAM!
You can hash large volumes of data using a CryptoStream - something like this should work:
var sha1 = SHA1Managed.Create();
FileStream fs = \\whatever
using (var cs = new CryptoStream(fs, sha1, CryptoStreamMode.Read))
{
byte[] buf = new byte[16];
int bytesRead = cs.Read(buf, 0, buf.Length);
long totalBytesRead = bytesRead;
while (bytesRead > 0 && totalBytesRead <= maxBytesToHash)
{
bytesRead = cs.Read(buf, 0, buf.Length);
totalBytesRead += bytesRead;
}
}
byte[] hash = sha1.Hash;
fileStream.Read(array, 0, N);
http://msdn.microsoft.com/en-us/library/system.io.filestream.read.aspx
Open the file as a FileStream, copy the first n bytes into a MemoryStream, then hash the MemoryStream.
As others have pointed out, you should read the first few bytes into an array.
What should also be noted that you don't want to make a direct call to Read and assume that the bytes have been read.
Rather, you want to make sure that the number of bytes that are returned are the number of bytes that you requested, and make another call to Read in the event that the number of bytes returned doesn't equal the initial number requested.
Also, if you have rather large streams, you will want to create a proxy for the Stream class where you pass it the underlying stream (the FileStream in this case) and override the Read method to forward the call to the underlying stream until you read the number of bytes that you need to read. Then, when that number of bytes is returned, you would return -1 to indicate that there are no more bytes to be read.
If you are concerned about keeping too much data in memory, you can create a stream wrapper that throttles the maximum number of bytes read.
Without doing all the work, here's a sample boiler plate you could use to get started.
Edit: Please review comments for recommendations to improve this implementation. End edit
public class LimitedStream : Stream
{
private int current = 0;
private int limit;
private Stream stream;
public LimitedStream(Stream stream, int n)
{
this.limit = n;
this.stream = stream;
}
public override int ReadByte()
{
if (current >= limit)
return -1;
var numread = base.ReadByte();
if (numread >= 0)
current++;
return numread;
}
public override int Read(byte[] buffer, int offset, int count)
{
count = Math.Min(count, limit - current);
var numread = this.stream.Read(buffer, offset, count);
current += numread;
return numread;
}
public override long Seek(long offset, SeekOrigin origin)
{
throw new NotImplementedException();
}
public override void SetLength(long value)
{
throw new NotImplementedException();
}
public override void Write(byte[] buffer, int offset, int count)
{
throw new NotImplementedException();
}
public override bool CanRead
{
get { return true; }
}
public override bool CanSeek
{
get { return false; }
}
public override bool CanWrite
{
get { return false; }
}
public override void Flush()
{
throw new NotImplementedException();
}
public override long Length
{
get { throw new NotImplementedException(); }
}
public override long Position
{
get { throw new NotImplementedException(); }
set { throw new NotImplementedException(); }
}
protected override void Dispose(bool disposing)
{
base.Dispose(disposing);
if (this.stream != null)
{
this.stream.Dispose();
}
}
}
Here is an example of the stream in use, wrapping a file stream, but throttling the number of bytes read to the specified limit:
using (var stream = new LimitedStream(File.OpenRead(#".\test.xml"), 100))
{
var bytes = new byte[1024];
stream.Read(bytes, 0, bytes.Length);
}

Categories