I have DataWriter that has file storage stream already attached. how ever in particular case I want to first write data in memory so I can know size of bytes, and store size with data in writer.
How can I do that without creating two in memory buffers?
DataWriter writer; // writer is parameter passed from somewhere else.
using (var inMemory = new InMemoryRandomAccessStream())
{
// fill inMemory with data.
// ***Here*** How can I avoid this?
var buffer = new byte[checked((int)inMemory.Position)].AsBuffer();
inMemory.Seek(0);
await inMemory.ReadAsync(buffer, buffer.Length, InputStreamOptions.ReadAhead);
writer.WriteUInt32(buffer.Length); // write size
writer.WriteBuffer(buffer); // write data
}
As you can see I'm using two buffers, one is for memory stream, the other is ibuffer.
I don't know how to directly write inMemory contents into DataWriter which has filestorage stream already attached.
I had to write my own buffer stream in order to prevent duplicate buffer creation. though stream buffer internally works like list but it has benefits when list grows large.
internal sealed class BufferStream : IDisposable
{
private byte[] _array = Array.Empty<byte>();
private int _index = -1;
private const int MaxArrayLength = 0X7FEFFFFF;
public int Capacity => _array.Length;
public int Length => _index + 1;
public void WriteIntoDataWriterStreamAsync(IDataWriter writer)
{
// AsBuffer wont cause copy, its just wrapper around array.
if(_index >= 0) writer.WriteBuffer(_array.AsBuffer(0, _index));
}
public void WriteBuffer(IBuffer buffer)
{
EnsureSize(checked((int) buffer.Length));
for (uint i = 0; i < buffer.Length; i++)
{
_array[++_index] = buffer.GetByte(i);
}
}
public void Flush()
{
Array.Clear(_array, 0, _index);
_index = -1;
}
// list like resizing.
private void EnsureSize(int additionSize)
{
var min = additionSize + _index;
if (_array.Length <= min)
{
var newsize = (int) Math.Min((uint) _array.Length * 2, MaxArrayLength);
if (newsize <= min) newsize = min + 1;
Array.Resize(ref _array, newsize);
}
}
public void Dispose()
{
_array = null;
}
}
Then I can easily do this.
using (var buffer = new BufferStream())
{
// fill buffer
writer.WriteInt32(buffer.Length); // write size
buffer.WriteIntoDataWriterStream(writer); // write data
}
Related
I am trying to read objects from very large files containing padded structs that were written into it by a C++ process. I was using an example to memory map the large file and try to deserialize the data into an object but I now can see that it won't work this way.
How can I extract all the objects from the files to use in C#? I'm probably way off but I've provided the code. The objects have a 8 byte milliseconds member followed by 21 16bit integers, which needs 6bytes of padding to align to a 8byte boundary.
[Serializable]
unsafe public struct DataStruct
{
public UInt64 milliseconds;
[MarshalAs(UnmanagedType.ByValArray, SizeConst = 21)]
public fixed Int16 data[21];
[MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
public fixed Int16 padding[3];
};
[Serializable]
public class DataArray
{
public DataStruct[] samples;
}
public static class Helper
{
public static Int16[] GetData(this DataStruct data)
{
unsafe
{
Int16[] output = new Int16[21];
for (int index = 0; index < 21; ++index)
output[index] = data.data[index];
return output;
}
}
}
class FileThreadSupport
{
struct DataFileInfo
{
public string path;
public UInt64 start;
public UInt64 stop;
public UInt64 elements;
};
// Create our epoch timestamp
private static readonly DateTime epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
// Output TCP client
private Support.AsyncTcpClient output;
// Directory which contains our data
private string replay_directory;
// Files to be read from
private DataFileInfo[] file_infos;
// Current timestamp of when the process was started
UInt64 process_start = 0;
// Object from current file
DataArray current_file_data;
// Offset into current files
UInt64 current_file_index = 0;
// Offset into current files
UInt64 current_file_offset = 0;
// Run flag
bool run = true;
public FileThreadSupport(ref Support.AsyncTcpClient output, ref Engine.A.Information info, ref Support.Configuration configuration)
{
// Set our output directory
replay_directory = configuration.getString("replay_directory");
if (replay_directory.Length == 0)
{
Console.WriteLine("Configuration does not provide a replay directory");
return;
}
// Check the directory for playable files
if(!loadDataDirectory(replay_directory))
{
Console.WriteLine("Replay directory {} did not have any valid files", replay_directory);
}
// Set the output TCP client
this.output = output;
}
private bool loadDataDirectory(string directory)
{
string[] files = Directory.GetFiles(directory, "*.*", SearchOption.TopDirectoryOnly);
file_infos = new DataFileInfo[files.Length];
int index = 0;
foreach (string file in files)
{
string[] parts = file.Split('\\');
string name = parts.Last();
parts = name.Split('.');
if (parts.Length != 2)
continue;
UInt64 start, stop = 0;
if (!UInt64.TryParse(parts[0], out start) || !UInt64.TryParse(parts[1], out stop))
continue;
long size = new System.IO.FileInfo(file).Length;
// Add to our file info array
file_infos[index] = new DataFileInfo
{
path = file,
start = start,
stop = stop,
elements = (ulong)(new System.IO.FileInfo(file).Length / 56
/*System.Runtime.InteropServices.Marshal.SizeOf(typeof(DataStruct))*/)
};
++index;
}
// Sort the array
Array.Sort(file_infos, delegate (DataFileInfo x, DataFileInfo y) { return x.start.CompareTo(y.start); });
// Return whether or not there were files found
return (files.Length > 0);
}
public void start()
{
process_start = (ulong)DateTime.Now.ToUniversalTime().Subtract(epoch).TotalMilliseconds;
UInt64 num_samples = 0;
while(run)
{
// Get our samples and add it to the sample
DataStruct[] result = getData(100);
Engine.A.A message = new Engine.A.A();
for (int i = 0; i < result.Length; ++i)
{
Engine.A.Data sample = new Engine.A.Data();
sample.Time = process_start + num_samples * 4;
Int16[] signal_data = Helper.GetData(result[i]);
for(int e = 0; e < signal_data.Length; ++e)
sample.Value[e] = signal_data[e];
message.Signal.Add(sample);
++num_samples;
}
// Send out the websocket
this.output.SendAsync(message.ToByteArray());
// Sleep 100 milliseconds
Thread.Sleep(100);
}
}
public void stop()
{
run = false;
}
private DataStruct[] getData(UInt64 milliseconds)
{
if (file_infos.Length == 0)
return new DataStruct[0];
if (current_file_data == null)
{
current_file_data = ReadObjectFromMMF(file_infos[current_file_index].path) as DataArray;
if(current_file_data.samples.Length == 0)
return new DataStruct[0];
}
UInt64 elements_to_read = (UInt64) milliseconds / 4;
DataStruct[] result = new DataStruct[elements_to_read];
Array.Copy(current_file_data.samples, (int)current_file_offset, result, 0, (int) Math.Min(elements_to_read, file_infos[current_file_index].elements - current_file_offset));
while((UInt64) result.Length != elements_to_read)
{
current_file_index = (current_file_index + 1) % (ulong) file_infos.Length;
current_file_data = ReadObjectFromMMF(file_infos[current_file_index].path) as DataArray;
if (current_file_data.samples.Length == 0)
return new DataStruct[0];
current_file_offset = 0;
Array.Copy(current_file_data.samples, (int)current_file_offset, result, result.Length, (int)Math.Min(elements_to_read, file_infos[current_file_index].elements - current_file_offset));
}
return result;
}
private object ByteArrayToObject(byte[] buffer)
{
BinaryFormatter binaryFormatter = new BinaryFormatter(); // Create new BinaryFormatter
MemoryStream memoryStream = new MemoryStream(buffer); // Convert buffer to memorystream
return binaryFormatter.Deserialize(memoryStream); // Deserialize stream to an object
}
private object ReadObjectFromMMF(string file)
{
// Get a handle to an existing memory mapped file
using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(file, FileMode.Open))
{
// Create a view accessor from which to read the data
using (MemoryMappedViewAccessor mmfReader = mmf.CreateViewAccessor())
{
// Create a data buffer and read entire MMF view into buffer
byte[] buffer = new byte[mmfReader.Capacity];
mmfReader.ReadArray<byte>(0, buffer, 0, buffer.Length);
// Convert the buffer to a .NET object
return ByteArrayToObject(buffer);
}
}
}
Well for one thing you're not using that memory mapped file well at all, you're just sequentially reading it all in a buffer, which is both needlessly inefficient and much slower than if you simply opened the file to read normally. The selling point of memory mapped files is repeated random access and random updates backed by the OS's virtual memory paging.
And you definitely don't need to read the entire file in memory, since your data is so strongly structured. You know exactly how many bytes to read for a record: Marshal.SizeOf<DataStruct>().
Then you need to get rid of all that serialization noise. Again your data is strongly typed, just read it. Get rid of those fixed arrays and use regular arrays, you're already instructing the marshaller how to read them with MarshalAs attributes (good). That also gets rid of that helper function that just copies an array for some unknown reason.
Your reading loop is very simple: read the correct number of bytes for one entry, use Marshal.PtrToStructure to convert it to a readable structure and add it to a list to return at the end. Bonus points if you can use .Net Core and Unsafe.As or Unsafe.Cast.
Edit: and don't use object returns, you know exactly what you're returning, write it down.
protobuf-net cannot serialize the following class because serializing objects of type Stream is not supported:
[ProtoContract]
class StreamObject
{
[ProtoMember(1)]
public Stream StreamProperty { get; set; }
}
I know I can work around this by using a serialized property of type byte[] and reading the stream into that property, as in this question. But that requires the entire byte[] to be loaded into memory, which, if the stream is long, can quickly exhaust system resources.
Is there a way to serialize a stream as an array of bytes in protobuf-net without loading the entire sequence of bytes into memory?
The basic difficulty here isn't protobuf-net, it's the V2 protocol buffer format. There are two ways a repeated element (e.g. a byte array or stream) can be encoded:
As a packed repeated element. Here all of the elements of the field are packed into a single key-value pair with wire type 2 (length-delimited). Each element is encoded the same way it would be normally, except without a tag preceding it.
protobuf-net automatically encodes byte arrays in this format, however doing so requires knowing the total number of bytes in advance. For a byte stream, this might require loading the entire stream into memory (e.g. when StreamProperty.CanSeek == false), which violates your requirements.
As a repeated element. Here the encoded message has zero or more key-value pairs with the same tag number.
For a byte stream, using this format would cause massive bloat in the encoded message as each byte would require an an additional integer key.
As you can see, neither default representation meets your needs. Instead, it makes sense to encode a large byte stream as a sequence of "fairly large" chunks, where each chunk is packed, but the overall sequence is not.
The following version of StreamObject does this:
[ProtoContract]
class StreamObject
{
public StreamObject() : this(new MemoryStream()) { }
public StreamObject(Stream stream)
{
if (stream == null)
throw new ArgumentNullException();
this.StreamProperty = stream;
}
[ProtoIgnore]
public Stream StreamProperty { get; set; }
internal static event EventHandler OnDataReadBegin;
internal static event EventHandler OnDataReadEnd;
const int ChunkSize = 4096;
[ProtoMember(1, IsPacked = false, OverwriteList = true)]
IEnumerable<ByteBuffer> Data
{
get
{
if (OnDataReadBegin != null)
OnDataReadBegin(this, new EventArgs());
while (true)
{
byte[] buffer = new byte[ChunkSize];
int read = StreamProperty.Read(buffer, 0, buffer.Length);
if (read <= 0)
{
break;
}
else if (read == buffer.Length)
{
yield return new ByteBuffer { Data = buffer };
}
else
{
Array.Resize(ref buffer, read);
yield return new ByteBuffer { Data = buffer };
break;
}
}
if (OnDataReadEnd != null)
OnDataReadEnd(this, new EventArgs());
}
set
{
if (value == null)
return;
foreach (var buffer in value)
StreamProperty.Write(buffer.Data, 0, buffer.Data.Length);
}
}
}
[ProtoContract]
struct ByteBuffer
{
[ProtoMember(1, IsPacked = true)]
public byte[] Data { get; set; }
}
Notice the OnDataReadBegin and OnDataReadEnd events? I added then in for debugging purposes, to enable checking that the input stream is actually getting streamed into the output protobuf stream. The following test class does this:
internal class TestClass
{
public void Test()
{
var writeStream = new MemoryStream();
long beginLength = 0;
long endLength = 0;
EventHandler begin = (o, e) => { beginLength = writeStream.Length; Console.WriteLine(string.Format("Begin serialization of Data, writeStream.Length = {0}", writeStream.Length)); };
EventHandler end = (o, e) => { endLength = writeStream.Length; Console.WriteLine(string.Format("End serialization of Data, writeStream.Length = {0}", writeStream.Length)); };
StreamObject.OnDataReadBegin += begin;
StreamObject.OnDataReadEnd += end;
try
{
int length = 1000000;
var inputStream = new MemoryStream();
for (int i = 0; i < length; i++)
{
inputStream.WriteByte(unchecked((byte)i));
}
inputStream.Position = 0;
var streamObject = new StreamObject(inputStream);
Serializer.Serialize(writeStream, streamObject);
var data = writeStream.ToArray();
StreamObject newStreamObject;
using (var s = new MemoryStream(data))
{
newStreamObject = Serializer.Deserialize<StreamObject>(s);
}
if (beginLength >= endLength)
{
throw new InvalidOperationException("inputStream was completely buffered before writing to writeStream");
}
inputStream.Position = 0;
newStreamObject.StreamProperty.Position = 0;
if (!inputStream.AsEnumerable().SequenceEqual(newStreamObject.StreamProperty.AsEnumerable()))
{
throw new InvalidOperationException("!inputStream.AsEnumerable().SequenceEqual(newStreamObject.StreamProperty.AsEnumerable())");
}
else
{
Console.WriteLine("Streams identical.");
}
}
finally
{
StreamObject.OnDataReadBegin -= begin;
StreamObject.OnDataReadEnd -= end;
}
}
}
public static class StreamExtensions
{
public static IEnumerable<byte> AsEnumerable(this Stream stream)
{
if (stream == null)
throw new ArgumentNullException();
int b;
while ((b = stream.ReadByte()) != -1)
yield return checked((byte)b);
}
}
And the output of the above is:
Begin serialization of Data, writeStream.Length = 0
End serialization of Data, writeStream.Length = 1000888
Streams identical.
Which indicates that the input stream is indeed streamed to the output without being fully loaded into memory at once.
Prototype fiddle.
Is there a mechanism available to write out a packed repeated element incrementally with bytes from a stream, knowing the length in advance?
It appears not. Assuming you have a stream for which CanSeek == true, you could encapsulate it in an IList<byte> that enumerates through the bytes in the stream, provides random access to bytes in the stream, and returns the stream length in IList.Count. There is a sample fiddle here showing such an attempt. Unfortunately, however, ListDecorator.Write() simply enumerates the list and buffers its encoded contents before writing them to the output stream, which causes the input stream to be loaded completely into memory. I think this happens because protobuf-net encodes a List<byte> differently from a byte [], namely as a length-delimited sequence of Base 128 Varints. Since the Varint representation of a byte sometimes requires more than one byte, the length cannot be computed in advance from the list count. See this answer for additional details on the difference in how byte arrays and lists are encoded. It should be possible to implement encoding of an IList<byte> in the same way as a byte [] -- it just isn't currently available.
I have a function that returns database query results. These results have got very large, and I now would like to pass them as a stream, so that the client can start to process them quicker, and memory usage is less. But I don't really know how to do this, the following function works, but what I want to know how to change it so that it starts to stream upon reading from the first table.
public Stream GetResults()
{
IFormatter formatter = new BinaryFormatter();
Stream stream = new MemoryStream();
formatter.Serialize(stream, GetItemsFromTable1());
formatter.Serialize(stream, GetItemsFromTable2());
formatter.Serialize(stream, GetItemsFromTable3());
formatter.Serialize(stream, GetItemsFromTable4());
stream.Position = 0;
return stream;
}
You could write a custom Stream implementation which functions as a pipe. If you then moved your GetItemsFromTable() method calls into a background task, the client could start reading results from the stream immediately.
In my solution below I'm using a circular buffer as a backing store for the pipe stream. Memory usage will be reduced only if the client consumes data fast enough. But even in the worst case scenario it shouldn't use more memory then your current solution. If memory usage is a bigger priority for you than execution speed then your stream could potentially block write calls until space is available. My solution below does not block writes; it expands the capacity of the circular buffer so that the background thread can continue filling data without delays.
The GetResults method might look like this:
public Stream GetResults()
{
// Begin filling the pipe with data on a background thread
var pipeStream = new CircularBufferPipeStream();
Task.Run(() => WriteResults(pipeStream));
// Return pipe stream for immediate usage by client
// Note: client is responsible for disposing of the stream after reading all data!
return pipeStream;
}
// Runs on background thread, filling circular buffer with data
void WriteResults(CircularBufferPipeStream stream)
{
IFormatter formatter = new BinaryFormatter();
formatter.Serialize(stream, GetItemsFromTable1());
formatter.Serialize(stream, GetItemsFromTable2());
formatter.Serialize(stream, GetItemsFromTable3());
formatter.Serialize(stream, GetItemsFromTable4());
// Indicate that there's no more data to write
stream.CloseWritePort();
}
And the circular buffer stream:
/// <summary>
/// Stream that acts as a pipe by supporting reading and writing simultaneously from different threads.
/// Read calls will block until data is available or the CloseWritePort() method has been called.
/// Read calls consume bytes in the circular buffer immediately so that more space is available for writes into the circular buffer.
/// Writes do not block; the capacity of the circular buffer will be expanded as needed to write the entire block of data at once.
/// </summary>
class CircularBufferPipeStream : Stream
{
const int DefaultCapacity = 1024;
byte[] _buffer;
bool _writePortClosed = false;
object _readWriteSyncRoot = new object();
int _length;
ManualResetEvent _dataAddedEvent;
int _start = 0;
public CircularBufferPipeStream(int initialCapacity = DefaultCapacity)
{
_buffer = new byte[initialCapacity];
_length = 0;
_dataAddedEvent = new ManualResetEvent(false);
}
public void CloseWritePort()
{
lock (_readWriteSyncRoot)
{
_writePortClosed = true;
_dataAddedEvent.Set();
}
}
public override bool CanRead { get { return true; } }
public override bool CanWrite { get { return true; } }
public override bool CanSeek { get { return false; } }
public override void Flush() { }
public override long Length { get { throw new NotImplementedException(); } }
public override long Position
{
get { throw new NotImplementedException(); }
set { throw new NotImplementedException(); }
}
public override long Seek(long offset, SeekOrigin origin) { throw new NotImplementedException(); }
public override void SetLength(long value) { throw new NotImplementedException(); }
public override int Read(byte[] buffer, int offset, int count)
{
int bytesRead = 0;
while (bytesRead == 0)
{
bool waitForData = false;
lock (_readWriteSyncRoot)
{
if (_length != 0)
bytesRead = ReadDirect(buffer, offset, count);
else if (_writePortClosed)
break;
else
{
_dataAddedEvent.Reset();
waitForData = true;
}
}
if (waitForData)
_dataAddedEvent.WaitOne();
}
return bytesRead;
}
private int ReadDirect(byte[] buffer, int offset, int count)
{
int readTailCount = Math.Min(Math.Min(_buffer.Length - _start, count), _length);
Array.Copy(_buffer, _start, buffer, offset, readTailCount);
_start += readTailCount;
_length -= readTailCount;
if (_start == _buffer.Length)
_start = 0;
int readHeadCount = Math.Min(Math.Min(_buffer.Length - _start, count - readTailCount), _length);
if (readHeadCount > 0)
{
Array.Copy(_buffer, _start, buffer, offset + readTailCount, readHeadCount);
_start += readHeadCount;
_length -= readHeadCount;
}
return readTailCount + readHeadCount;
}
public override void Write(byte[] buffer, int offset, int count)
{
lock (_readWriteSyncRoot)
{
// expand capacity as needed
if (count + _length > _buffer.Length)
{
var expandedBuffer = new byte[Math.Max(_buffer.Length * 2, count + _length)];
_length = ReadDirect(expandedBuffer, 0, _length);
_start = 0;
_buffer = expandedBuffer;
}
int startWrite = (_start + _length) % _buffer.Length;
int writeTailCount = Math.Min(_buffer.Length - startWrite, count);
Array.Copy(buffer, offset, _buffer, startWrite, writeTailCount);
startWrite += writeTailCount;
_length += writeTailCount;
if (startWrite == _buffer.Length)
startWrite = 0;
int writeHeadCount = count - writeTailCount;
if (writeHeadCount > 0)
{
Array.Copy(buffer, offset + writeTailCount, _buffer, startWrite, writeHeadCount);
_length += writeHeadCount;
}
}
_dataAddedEvent.Set();
}
protected override void Dispose(bool disposing)
{
if (disposing)
{
if (_dataAddedEvent != null)
{
_dataAddedEvent.Dispose();
_dataAddedEvent = null;
}
}
base.Dispose(disposing);
}
}
try
public Stream GetResults()
{
IFormatter formatter = new BinaryFormatter();
Stream stream = new MemoryStream();
formatter.Serialize(stream, GetItemsFromTable1());
formatter.Serialize(stream, GetItemsFromTable2());
formatter.Serialize(stream, GetItemsFromTable3());
formatter.Serialize(stream, GetItemsFromTable4());
stream.Seek(0L, SeekOrigin.Begin);
return stream;
}
why the changes?
remove using, because your stream gets disposed once it leaves the using-block. disposing the stream means you cannot use it anymore
seek to the beginning of the stream. if you start reading from the stream without seeking to its beginning, you would start to deserialize/ read from its end; but unfortunately there is no content behind the end of the stream
however, I don't see how using a MemoryStream reduces memory usage. I would suggest chaining it into a DeflateStream or a FileStream to reduce RAM-usage
hope this helps
Basically i need to encrypt a file and then be able to decrypt the file from almost any point in the file. The reason i need this is i would like to use this for files like Video etc and still be able to jump though the file or video. Also the file would be served over the web so not needing to download the whole file is important. Where i am storing the file supports partial downloads so i can request any part of the file i need and this works for an un encrypted file. The question is how could i make this work for an encrypted file. I need to encrypt and decrypt the file in C# but don't really have any other restrictions than that. Symmetric keys are preferred but if that wont work it is not a deal breaker.
Another example of where i only want to download part of a file and decrypt is where i have joined multiple files together but just need to retrieve one of them. This would generally be used for files smaller than 50MB like pictures and info files.
--- EDIT ---
To be clear i am looking for a working implementation or library that does not increase the size of the source file. Stream cipher seems ideal but i have not seen one in c# that works for any point in the stream or anything apart from the start of the stream. Would consider block based implementation if it works form set blocks in stream. Basically i want to pass a raw stream though this and have unencrypted come out other side of the stream. Happy to set the starting offset it represents in the whole file/stream. Looking for something than works as i am not encryption expert. At the minute i get the parts of the file from a data source in 512kb to 5mb blocks depending on client config and i use streams CopyTo method to write it out to a file on disk. I don't get these parts in order. I am looking for a stream wrapper that i could use to pass into the CopyTo method on stream.
Your best best is probably to treat the file as a list of chunks (of whatever size is convenient for your application; let's say 50 kB) and encrypt each separately. This would allow you to decrypt each chunk independently of the others.
For each chunk, derive new keys from your master key, generate a new IV, and encrypt-then-MAC the chunk.
This method has higher storage overhead than encrypting the entire file at once and takes a bit more computation as well due to the key regeneration that it requires.
If you use a stream cipher instead of a block cipher, you'd be able to start decrypting at any byte offset, as long as the decryptor was able to get the current IV from somewhere.
For those interested i managed to work it out based on a number of examples i found plus some of my own code. It uses bouncycastle but should also work with dotnet AES with a few tweaks. This allows the decryption/encryption from any point in the stream.
using System;
using System.IO;
using Org.BouncyCastle.Crypto;
using Org.BouncyCastle.Crypto.Parameters;
namespace StreamHelpers
{
public class StreamEncryptDecrypt : Stream
{
private readonly Stream _streamToWrap;
private readonly IBlockCipher _cipher;
private readonly ICipherParameters _key;
private readonly byte[] _iv;
private readonly byte[] _counter;
private readonly byte[] _counterOut;
private readonly byte[] _output;
private long currentBlockCount;
public StreamEncryptDecrypt(Stream streamToWrap, IBlockCipher cipher, ParametersWithIV keyAndIv)
{
_streamToWrap = streamToWrap;
_cipher = cipher;
_key = keyAndIv.Parameters;
_cipher.Init(true, _key);
_iv = keyAndIv.GetIV();
_counter = new byte[_cipher.GetBlockSize()];
_counterOut = new byte[_cipher.GetBlockSize()];
_output = new byte[_cipher.GetBlockSize()];
if (_iv.Length != _cipher.GetBlockSize())
{
throw new Exception("IV must be the same size as the cipher block size");
}
InitCipher();
}
private void InitCipher()
{
long position = _streamToWrap.Position;
Array.Copy(_iv, 0, _counter, 0, _counter.Length);
currentBlockCount = 0;
var targetBlock = position/_cipher.GetBlockSize();
while (currentBlockCount < targetBlock)
{
IncrementCounter(false);
}
_cipher.ProcessBlock(_counter, 0, _counterOut, 0);
}
private void IncrementCounter(bool updateCounterOut = true)
{
currentBlockCount++;
// Increment the counter
int j = _counter.Length;
while (--j >= 0 && ++_counter[j] == 0)
{
}
_cipher.ProcessBlock(_counter, 0, _counterOut, 0);
}
public override long Position
{
get { return _streamToWrap.Position; }
set
{
_streamToWrap.Position = value;
InitCipher();
}
}
public override long Seek(long offset, SeekOrigin origin)
{
var result = _streamToWrap.Seek(offset, origin);
InitCipher();
return result;
}
public void ProcessBlock(
byte[] input,
int offset,
int length, long streamPosition)
{
if (input.Length < offset+length)
throw new ArgumentException("input does not match offset and length");
var blockSize = _cipher.GetBlockSize();
var startingBlock = streamPosition / blockSize;
var blockOffset = (int)(streamPosition - (startingBlock * blockSize));
while (currentBlockCount < streamPosition / blockSize)
{
IncrementCounter();
}
//process the left over from current block
if (blockOffset !=0)
{
var blockLength = blockSize - blockOffset;
blockLength = blockLength > length ? length : blockLength;
//
// XOR the counterOut with the plaintext producing the cipher text
//
for (int i = 0; i < blockLength; i++)
{
input[offset + i] = (byte)(_counterOut[blockOffset + i] ^ input[offset + i]);
}
offset += blockLength;
length -= blockLength;
blockOffset = 0;
if (length > 0)
{
IncrementCounter();
}
}
//need to loop though the rest of the data and increament counter when needed
while (length > 0)
{
var blockLength = blockSize > length ? length : blockSize;
//
// XOR the counterOut with the plaintext producing the cipher text
//
for (int i = 0; i < blockLength; i++)
{
input[offset + i] = (byte)(_counterOut[i] ^ input[offset + i]);
}
offset += blockLength;
length -= blockLength;
if (length > 0)
{
IncrementCounter();
}
}
}
public override int Read(byte[] buffer, int offset, int count)
{
var pos = _streamToWrap.Position;
var result = _streamToWrap.Read(buffer, offset, count);
ProcessBlock(buffer, offset, result, pos);
return result;
}
public override void Write(byte[] buffer, int offset, int count)
{
var input = new byte[count];
Array.Copy(buffer, offset, input, 0, count);
ProcessBlock(input, 0, count, _streamToWrap.Position);
_streamToWrap.Write(input, offset, count);
}
public override void Flush()
{
_streamToWrap.Flush();
}
public override void SetLength(long value)
{
_streamToWrap.SetLength(value);
}
public override bool CanRead
{
get { return _streamToWrap.CanRead; }
}
public override bool CanSeek
{
get { return true; }
}
public override bool CanWrite
{
get { return _streamToWrap.CanWrite; }
}
public override long Length
{
get { return _streamToWrap.Length; }
}
protected override void Dispose(bool disposing)
{
if (_streamToWrap != null)
{
_streamToWrap.Dispose();
}
base.Dispose(disposing);
}
}
}
I'm trying to split a byte stream into chunks of increasing size.
The source stream contains an unknown number of bytes and is expensive to read. The output of the enumerator should be byte arrays of increasing size, starting at 8KB up to 1MB.
This is very simple to do by simply reading the whole stream, storing it in an array and taking the relevant pieces out. However, since the stream may be very large, reading it at once is unfeasible. Also, while performance is not the main concern, it is important to keep system load very low.
While implementing this I noticed that it's relatively difficult to keep the code short and maintainable. There are a few stream related issues to keep in mind, too (for instance, Stream.Read might not fill the buffer even though it succeeded).
I did not find any existing classes that help for my case, nor could I find something close on the net. How would you implement such a class?
public IEnumerable<BufferWrapper> getBytes(Stream stream)
{
List<int> bufferSizes = new List<int>() { 8192, 65536, 220160, 1048576 };
int count = 0;
int bufferSizePostion = 0;
byte[] buffer = new byte[bufferSizes[0]];
bool done = false;
while (!done)
{
BufferWrapper nextResult = new BufferWrapper();
nextResult.bytesRead = stream.Read(buffer, 0, buffer.Length);
nextResult.buffer = buffer;
done = nextResult.bytesRead == 0;
if (!done)
{
yield return nextResult;
count++;
if (count > 10 && bufferSizePostion < bufferSizes.Count)
{
count = 0;
bufferSizePostion++;
buffer = new byte[bufferSizes[bufferSizePostion]];
}
}
}
}
public class BufferWrapper
{
public byte[] buffer { get; set; }
public int bytesRead { get; set; }
}
Obviously the logic for when to move up in buffer size, and how to choose what that size is could be altered.
Someone could also probably find a better way of handling the last buffer to be sent, as this isn't the most efficient way.
For reference, the implementation I currently use, already with improvements as per the answer by #Servy
private const int InitialBlockSize = 8 * 1024;
private const int MaximumBlockSize = 1024 * 1024;
private Stream _Stream;
private int _Size = InitialBlockSize;
public byte[] Current
{
get;
private set;
}
public bool MoveNext ()
{
if (_Size < 0) {
return false;
}
var buf = new byte[_Size];
int count = 0;
while (count < _Size) {
int read = _Stream.Read (buf, count, _Size - count);
if (read == 0) {
break;
}
count += read;
}
if (count == _Size) {
Current = buf;
if (_Size <= MaximumBlockSize / 2) {
_Size *= 2;
}
}
else {
Current = new byte[count];
Array.Copy (buf, Current, count);
_Size = -1;
}
return true;
}