Returning database results as a stream - C#

I have a function that returns database query results. These results have become very large, and I would now like to pass them as a stream so that the client can start processing them sooner and memory usage stays lower. The following function works, but what I want to know is how to change it so that it starts streaming as soon as the first table has been read.
public Stream GetResults()
{
IFormatter formatter = new BinaryFormatter();
Stream stream = new MemoryStream();
formatter.Serialize(stream, GetItemsFromTable1());
formatter.Serialize(stream, GetItemsFromTable2());
formatter.Serialize(stream, GetItemsFromTable3());
formatter.Serialize(stream, GetItemsFromTable4());
stream.Position = 0;
return stream;
}

You could write a custom Stream implementation which functions as a pipe. If you then moved your GetItemsFromTable() method calls into a background task, the client could start reading results from the stream immediately.
In my solution below I'm using a circular buffer as the backing store for the pipe stream. Memory usage is reduced only if the client consumes data fast enough, but even in the worst case it shouldn't use more memory than your current solution. If memory usage is a higher priority for you than execution speed, your stream could block write calls until space is available. My solution below does not block writes; instead it expands the capacity of the circular buffer so that the background thread can keep writing data without delay.
The GetResults method might look like this:
public Stream GetResults()
{
// Begin filling the pipe with data on a background thread
var pipeStream = new CircularBufferPipeStream();
Task.Run(() => WriteResults(pipeStream));
// Return pipe stream for immediate usage by client
// Note: client is responsible for disposing of the stream after reading all data!
return pipeStream;
}
// Runs on background thread, filling circular buffer with data
void WriteResults(CircularBufferPipeStream stream)
{
IFormatter formatter = new BinaryFormatter();
formatter.Serialize(stream, GetItemsFromTable1());
formatter.Serialize(stream, GetItemsFromTable2());
formatter.Serialize(stream, GetItemsFromTable3());
formatter.Serialize(stream, GetItemsFromTable4());
// Indicate that there's no more data to write
stream.CloseWritePort();
}
And the circular buffer stream:
/// <summary>
/// Stream that acts as a pipe by supporting reading and writing simultaneously from different threads.
/// Read calls will block until data is available or the CloseWritePort() method has been called.
/// Read calls consume bytes in the circular buffer immediately so that more space is available for writes into the circular buffer.
/// Writes do not block; the capacity of the circular buffer will be expanded as needed to write the entire block of data at once.
/// </summary>
class CircularBufferPipeStream : Stream
{
const int DefaultCapacity = 1024;
byte[] _buffer;
bool _writePortClosed = false;
object _readWriteSyncRoot = new object();
int _length;
ManualResetEvent _dataAddedEvent;
int _start = 0;
public CircularBufferPipeStream(int initialCapacity = DefaultCapacity)
{
_buffer = new byte[initialCapacity];
_length = 0;
_dataAddedEvent = new ManualResetEvent(false);
}
public void CloseWritePort()
{
lock (_readWriteSyncRoot)
{
_writePortClosed = true;
_dataAddedEvent.Set();
}
}
public override bool CanRead { get { return true; } }
public override bool CanWrite { get { return true; } }
public override bool CanSeek { get { return false; } }
public override void Flush() { }
public override long Length { get { throw new NotImplementedException(); } }
public override long Position
{
get { throw new NotImplementedException(); }
set { throw new NotImplementedException(); }
}
public override long Seek(long offset, SeekOrigin origin) { throw new NotImplementedException(); }
public override void SetLength(long value) { throw new NotImplementedException(); }
public override int Read(byte[] buffer, int offset, int count)
{
int bytesRead = 0;
while (bytesRead == 0)
{
bool waitForData = false;
lock (_readWriteSyncRoot)
{
if (_length != 0)
bytesRead = ReadDirect(buffer, offset, count);
else if (_writePortClosed)
break;
else
{
_dataAddedEvent.Reset();
waitForData = true;
}
}
if (waitForData)
_dataAddedEvent.WaitOne();
}
return bytesRead;
}
private int ReadDirect(byte[] buffer, int offset, int count)
{
int readTailCount = Math.Min(Math.Min(_buffer.Length - _start, count), _length);
Array.Copy(_buffer, _start, buffer, offset, readTailCount);
_start += readTailCount;
_length -= readTailCount;
if (_start == _buffer.Length)
_start = 0;
int readHeadCount = Math.Min(Math.Min(_buffer.Length - _start, count - readTailCount), _length);
if (readHeadCount > 0)
{
Array.Copy(_buffer, _start, buffer, offset + readTailCount, readHeadCount);
_start += readHeadCount;
_length -= readHeadCount;
}
return readTailCount + readHeadCount;
}
public override void Write(byte[] buffer, int offset, int count)
{
lock (_readWriteSyncRoot)
{
// expand capacity as needed
if (count + _length > _buffer.Length)
{
var expandedBuffer = new byte[Math.Max(_buffer.Length * 2, count + _length)];
_length = ReadDirect(expandedBuffer, 0, _length);
_start = 0;
_buffer = expandedBuffer;
}
int startWrite = (_start + _length) % _buffer.Length;
int writeTailCount = Math.Min(_buffer.Length - startWrite, count);
Array.Copy(buffer, offset, _buffer, startWrite, writeTailCount);
startWrite += writeTailCount;
_length += writeTailCount;
if (startWrite == _buffer.Length)
startWrite = 0;
int writeHeadCount = count - writeTailCount;
if (writeHeadCount > 0)
{
Array.Copy(buffer, offset + writeTailCount, _buffer, startWrite, writeHeadCount);
_length += writeHeadCount;
}
}
_dataAddedEvent.Set();
}
protected override void Dispose(bool disposing)
{
if (disposing)
{
if (_dataAddedEvent != null)
{
_dataAddedEvent.Dispose();
_dataAddedEvent = null;
}
}
base.Dispose(disposing);
}
}
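For illustration, client-side consumption might look like this sketch (the four Deserialize calls mirror the four Serialize calls above):
using (var stream = GetResults())
{
    IFormatter formatter = new BinaryFormatter();
    // Each Deserialize blocks only until the background thread has written
    // that object, so processing can begin before all queries have finished.
    var table1Items = formatter.Deserialize(stream);
    var table2Items = formatter.Deserialize(stream);
    var table3Items = formatter.Deserialize(stream);
    var table4Items = formatter.Deserialize(stream);
}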

Try this:
public Stream GetResults()
{
IFormatter formatter = new BinaryFormatter();
Stream stream = new MemoryStream();
formatter.Serialize(stream, GetItemsFromTable1());
formatter.Serialize(stream, GetItemsFromTable2());
formatter.Serialize(stream, GetItemsFromTable3());
formatter.Serialize(stream, GetItemsFromTable4());
stream.Seek(0L, SeekOrigin.Begin);
return stream;
}
Why the changes?
Remove the using block, because your stream gets disposed once it leaves the using block, and a disposed stream cannot be used anymore.
Seek to the beginning of the stream. If you start reading without seeking back to the beginning, you would start deserializing/reading at its end, and unfortunately there is no content behind the end of the stream.
However, I don't see how using a MemoryStream reduces memory usage. I would suggest chaining it into a DeflateStream or a FileStream to reduce RAM usage.
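For illustration, a minimal sketch of that chaining, assuming a temp file as the backing store (the GetItemsFromTableN calls are from the question; the temp-file choice is my assumption):
public Stream GetResults()
{
    // Serialize through a DeflateStream into a temp file so the serialized
    // data never has to be held in RAM all at once.
    string tempFile = Path.GetTempFileName();
    using (var file = File.Create(tempFile))
    using (var deflate = new DeflateStream(file, CompressionMode.Compress))
    {
        IFormatter formatter = new BinaryFormatter();
        formatter.Serialize(deflate, GetItemsFromTable1());
        formatter.Serialize(deflate, GetItemsFromTable2());
        formatter.Serialize(deflate, GetItemsFromTable3());
        formatter.Serialize(deflate, GetItemsFromTable4());
    }
    // Hand the caller a decompressing stream over the file (System.IO.Compression).
    return new DeflateStream(File.OpenRead(tempFile), CompressionMode.Decompress);
}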
Hope this helps.

Related

ByteStream source in SharpDX.MediaFoundation hanging

I have been trying to get a custom audio stream to work with SharpDX.MediaFoundation.
To this end I have wrapped my audio object in a class that implements System.IO.Stream as follows:
public class AudioReaderWaveStream : System.IO.Stream
{
byte[] waveHeader = new byte[44];
AudioCore.IAudioReader reader = null;
ulong readHandle = 0xffffffff;
long readPosition = 0;
public AudioReaderWaveStream(AudioCore.CEditedAudio content)
{
reader = content as AudioCore.IAudioReader;
readHandle = reader.OpenDevice();
int sampleRate = 0;
short channels = 0;
content.GetFormat(out sampleRate, out channels);
System.IO.MemoryStream memStream = new System.IO.MemoryStream(waveHeader);
using (System.IO.BinaryWriter bw = new System.IO.BinaryWriter(memStream))
{
bw.Write("RIFF".ToCharArray());
bw.Write((Int32)Length - 8);
bw.Write("WAVE".ToCharArray());
bw.Write("fmt ".ToCharArray());
bw.Write((Int32)16);
bw.Write((Int16)3);
bw.Write((Int16)1);
bw.Write((Int32)sampleRate);
bw.Write((Int32)sampleRate * 4);
bw.Write((Int16)4);
bw.Write((Int16)32);
bw.Write("data".ToCharArray());
bw.Write((Int32)reader.GetSampleCount() * 4);
}
}
protected override void Dispose(bool disposing)
{
if (readHandle != 0xffffffff)
{
reader.CloseDevice(readHandle);
readHandle = 0xffffffff;
}
base.Dispose(disposing);
}
~AudioReaderWaveStream()
{
Dispose();
}
public override bool CanRead
{
get
{
return true;
}
}
public override bool CanSeek
{
get
{
return true;
}
}
public override bool CanWrite
{
get
{
return false;
}
}
public override long Length
{
get
{
// Number of float samples + header of 44 bytes.
return (reader.GetSampleCount() * 4) + 44;
}
}
public override long Position
{
get
{
return readPosition;
}
set
{
readPosition = value;
}
}
public override void Flush()
{
//throw new NotImplementedException();
}
public override int Read(byte[] buffer, int offset, int count)
{
if (count <= 0)
return 0;
int retCount = count;
if (Position < 44)
{
int headerCount = count;
if ( Position + count >= 44 )
{
headerCount = 44 - (int)Position;
}
Array.Copy(waveHeader, Position, buffer, offset, headerCount);
offset += headerCount;
Position += headerCount;
count -= headerCount;
}
if (count > 0)
{
float[] readBuffer = new float[count / 4];
reader.Seek(readHandle, Position - 44);
reader.ReadAudio(readHandle, readBuffer);
// Buffer.BlockCopy is needed here; Array.Copy throws when copying float[] to byte[].
Buffer.BlockCopy(readBuffer, 0, buffer, offset, count);
Position += count; // advance so the next Read doesn't return the same samples
}
return retCount;
}
public override long Seek(long offset, System.IO.SeekOrigin origin)
{
if (origin == System.IO.SeekOrigin.Begin)
{
readPosition = offset;
}
else if (origin == System.IO.SeekOrigin.Current)
{
readPosition += offset;
}
else
{
readPosition = Length - offset;
}
return readPosition;
}
public override void SetLength(long value)
{
throw new NotImplementedException();
}
public override void Write(byte[] buffer, int offset, int count)
{
throw new NotImplementedException();
}
}
I then take this object and create a source resolver using it as follows:
// Create a source resolver.
SharpDX.MediaFoundation.ByteStream sdxByteStream = new ByteStream( ARWS );
SharpDX.MediaFoundation.SourceResolver resolver = new SharpDX.MediaFoundation.SourceResolver();
ComObject source = (ComObject)resolver.CreateObjectFromStream( sdxByteStream, "File.wav", SourceResolverFlags.MediaSource );
However, every time I do this it hangs on the CreateObjectFromStream call. I've had a look inside SharpDX to see what's going on, and it seems the actual hang occurs when it calls the underlying interface through CreateObjectFromByteStream. I've also looked at what data is actually read from the byte stream: it reads the first 16 bytes, which include the 'RIFF', the RIFF size, the 'WAVE' and the 'fmt '. Then nothing else.
Has anyone got any ideas what I could be doing wrong? I've tried all sorts of combinations of the SourceResolverFlags, but nothing seems to make any difference. It just hangs.
It does remind me somewhat of inter-thread marshalling, but all the Media Foundation calls are made from the same thread, so I don't think it's that. I'm also fairly sure that Media Foundation uses free threading, so this shouldn't be a problem anyway.
Has anyone any idea what I could possibly be doing wrong?
Thanks!
OK, I have come up with a solution to this. It looks like I was having a COM threading issue: the read happens on a worker thread, and that thread was calling back to the main thread from which the function was called.
So I used the async version of the call and perform an Application.DoEvents() to hand control across where necessary.
Callback cb = new Callback( resolver );
IUnknown cancel = null;
resolver.BeginCreateObjectFromByteStream( sdxByteStream, "File.wav", (int)(SourceResolverFlags.MediaSource | SourceResolverFlags.ByteStream), null, out cancel, cb, null );
if ( cancel != null )
{
cancel.Dispose();
}
while( cb.MediaSource == null )
{
System.Windows.Forms.Application.DoEvents();
}
SharpDX.MediaFoundation.MediaSource mediaSource = cb.MediaSource;
I really hate COM's threading model ...

How to write from MemoryStream through DataWriter into disk

I have a DataWriter with a file storage stream already attached. However, in a particular case I want to write the data to memory first so I know its size in bytes and can store the size together with the data in the writer.
How can I do that without creating two in-memory buffers?
DataWriter writer; // writer is parameter passed from somewhere else.
using (var inMemory = new InMemoryRandomAccessStream())
{
// fill inMemory with data.
// ***Here*** How can I avoid this?
var buffer = new byte[checked((int)inMemory.Position)].AsBuffer();
inMemory.Seek(0);
await inMemory.ReadAsync(buffer, buffer.Length, InputStreamOptions.ReadAhead);
writer.WriteUInt32(buffer.Length); // write size
writer.WriteBuffer(buffer); // write data
}
As you can see I'm using two buffers: one for the memory stream and the other for the IBuffer.
I don't know how to write the inMemory contents directly into the DataWriter, which already has the file storage stream attached.
I had to write my own buffer stream to prevent creating a duplicate buffer. It internally works much like a list, but it has benefits when the data grows large.
internal sealed class BufferStream : IDisposable
{
private byte[] _array = Array.Empty<byte>();
private int _index = -1;
private const int MaxArrayLength = 0X7FEFFFFF;
public int Capacity => _array.Length;
public int Length => _index + 1;
public void WriteIntoDataWriterStream(IDataWriter writer)
{
// AsBuffer won't cause a copy; it's just a wrapper around the array.
// The buffer must cover Length (_index + 1) bytes, not _index.
if (_index >= 0) writer.WriteBuffer(_array.AsBuffer(0, _index + 1));
}
public void WriteBuffer(IBuffer buffer)
{
EnsureSize(checked((int) buffer.Length));
for (uint i = 0; i < buffer.Length; i++)
{
_array[++_index] = buffer.GetByte(i);
}
}
public void Flush()
{
Array.Clear(_array, 0, _index + 1); // clear all Length bytes (also safe when empty)
_index = -1;
}
// list like resizing.
private void EnsureSize(int additionSize)
{
var min = additionSize + _index;
if (_array.Length <= min)
{
var newsize = (int) Math.Min((uint) _array.Length * 2, MaxArrayLength);
if (newsize <= min) newsize = min + 1;
Array.Resize(ref _array, newsize);
}
}
public void Dispose()
{
_array = null;
}
}
Then I can easily do this.
using (var buffer = new BufferStream())
{
// fill buffer
writer.WriteInt32(buffer.Length); // write size
buffer.WriteIntoDataWriterStream(writer); // write data
}

How to Merge two memory streams containing PDF file's data into one

I am trying to read two PDF files into two memory streams and then return a stream that will have both stream's data. But I don't seem to understand what's wrong with my code.
Sample Code:
string file1Path = "Sampl1.pdf";
string file2Path = "Sample2.pdf";
MemoryStream stream1 = new MemoryStream(File.ReadAllBytes(file1Path));
MemoryStream stream2 = new MemoryStream(File.ReadAllBytes(file2Path));
stream1.Position = 0;
stream1.CopyTo(stream2);
return stream2; /*supposed to be containing data of both stream1 and stream2 but contains data of stream1 only*/
It appears that for PDF files, merging MemoryStreams is not the same as with .txt files. For PDF you need to use a library; I used iTextSharp (available under the AGPL license) and combined the files using its functions as follows:
MemoryStream finalStream = new MemoryStream();
PdfCopyFields copy = new PdfCopyFields(finalStream);
string file1Path = "Sample1.pdf";
string file2Path = "Sample2.pdf";
var ms1 = new MemoryStream(File.ReadAllBytes(file1Path));
ms1.Position = 0;
copy.AddDocument(new PdfReader(ms1));
ms1.Dispose();
var ms2 = new MemoryStream(File.ReadAllBytes(file2Path));
ms2.Position = 0;
copy.AddDocument(new PdfReader(ms2));
ms2.Dispose();
copy.Close();
finalStream contains the merged pdf of both ms1 and ms2.
NOTE:
The whole question is based on a false premise: that you can produce a combined PDF file by merging the binaries of two PDF files. This works for plain text files, for example (to an extent), but definitely doesn't work for PDFs. The answer only addresses how to merge two binary data streams, not how to merge two PDF files in particular. It answers the OP's question as asked, but doesn't actually solve their problem.
When you use the byte[] constructor for MemoryStream, the memory stream will not expand as you add more data. So it will not be big enough for both stream1 and stream2. Also, the position will start at zero, so you're overwriting stream2 with the data in stream1.
The fix is rather simple:
var result = new MemoryStream();
using (var file1 = File.OpenRead(file1Path)) file1.CopyTo(result);
using (var file2 = File.OpenRead(file2Path)) file2.CopyTo(result);
Another option would be to create your own stream class that would be a combination of two separate streams - interesting if you're interested in composability, but probably overkill for something as simple as this :)
Just for fun, it could look something like this:
public class DualStream : Stream
{
private readonly Stream _first;
private readonly Stream _second;
public DualStream(Stream first, Stream second)
{
_first = first;
_second = second;
}
public override bool CanRead => true;
public override bool CanSeek => true;
public override bool CanWrite => false;
public override long Length => _first.Length + _second.Length;
public override long Position
{
get { return _first.Position + _second.Position; }
set { Seek(value, SeekOrigin.Begin); }
}
public override void Flush() { throw new NotImplementedException(); }
public override int Read(byte[] buffer, int offset, int count)
{
var bytesRead = _first.Read(buffer, offset, count);
if (bytesRead == count) return bytesRead;
return bytesRead + _second.Read(buffer, offset + bytesRead, count - bytesRead);
}
public override long Seek(long offset, SeekOrigin origin)
{
// To simplify, let's assume seek always works as if over one big MemoryStream
long targetPosition;
switch (origin)
{
case SeekOrigin.Begin: targetPosition = offset; break;
case SeekOrigin.Current: targetPosition = Position + offset; break;
case SeekOrigin.End: targetPosition = Length + offset; break; // offset from End is usually negative
default: throw new NotSupportedException();
}
targetPosition = Math.Max(0, Math.Min(Length, targetPosition));
var firstPosition = Math.Min(_first.Length, targetPosition);
_first.Position = firstPosition;
_second.Position = Math.Max(0, targetPosition - firstPosition);
return Position;
}
protected override void Dispose(bool disposing)
{
if (disposing)
{
_first.Dispose();
_second.Dispose();
}
base.Dispose(disposing);
}
public override void SetLength(long value)
{ throw new NotImplementedException(); }
public override void Write(byte[] buffer, int offset, int count)
{ throw new NotImplementedException(); }
}
The main benefit is that it means you don't have to allocate unnecessary in-memory buffers just to have a combined stream - it can even be used with the file streams directly, if you dare :D And it's easily composable - you can make dual streams of other dual streams, allowing you to chain as many streams as you want together - pretty much the same as IEnumerable.Concat.
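Just to illustrate the composability, chaining three files might look like this sketch (the file names are placeholders):
// DualStreams nest like IEnumerable.Concat.
using (var combined = new DualStream(
    new DualStream(File.OpenRead("part1.bin"), File.OpenRead("part2.bin")),
    File.OpenRead("part3.bin")))
{
    var buffer = new byte[4096];
    int read;
    while ((read = combined.Read(buffer, 0, buffer.Length)) > 0)
    {
        // consume bytes from all three files as one logical stream
    }
}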

How to implement a lazy stream chunk enumerator?

I'm trying to split a byte stream into chunks of increasing size.
The source stream contains an unknown number of bytes and is expensive to read. The output of the enumerator should be byte arrays of increasing size, starting at 8KB up to 1MB.
This is very simple to do by simply reading the whole stream, storing it in an array and taking the relevant pieces out. However, since the stream may be very large, reading it all at once is infeasible. Also, while performance is not the main concern, it is important to keep system load very low.
While implementing this I noticed that it's relatively difficult to keep the code short and maintainable. There are a few stream related issues to keep in mind, too (for instance, Stream.Read might not fill the buffer even though it succeeded).
I did not find any existing classes that help for my case, nor could I find something close on the net. How would you implement such a class?
public IEnumerable<BufferWrapper> getBytes(Stream stream)
{
List<int> bufferSizes = new List<int>() { 8192, 65536, 220160, 1048576 };
int count = 0;
int bufferSizePosition = 0;
byte[] buffer = new byte[bufferSizes[0]];
bool done = false;
while (!done)
{
BufferWrapper nextResult = new BufferWrapper();
nextResult.bytesRead = stream.Read(buffer, 0, buffer.Length);
nextResult.buffer = buffer;
done = nextResult.bytesRead == 0;
if (!done)
{
yield return nextResult;
count++;
// Count - 1 prevents indexing past the largest buffer size
if (count > 10 && bufferSizePosition < bufferSizes.Count - 1)
{
count = 0;
bufferSizePosition++;
buffer = new byte[bufferSizes[bufferSizePosition]];
}
}
}
}
public class BufferWrapper
{
public byte[] buffer { get; set; }
public int bytesRead { get; set; }
}
Obviously the logic for when to move up in buffer size, and how to choose what that size is could be altered.
Someone could also probably find a better way of handling the last buffer to be sent, as this isn't the most efficient way.
For reference, the implementation I currently use, already with improvements as per the answer by @Servy:
private const int InitialBlockSize = 8 * 1024;
private const int MaximumBlockSize = 1024 * 1024;
private Stream _Stream;
private int _Size = InitialBlockSize;
public byte[] Current
{
get;
private set;
}
public bool MoveNext ()
{
if (_Size < 0) {
return false;
}
var buf = new byte[_Size];
int count = 0;
while (count < _Size) {
int read = _Stream.Read (buf, count, _Size - count);
if (read == 0) {
break;
}
count += read;
}
if (count == _Size) {
Current = buf;
if (_Size <= MaximumBlockSize / 2) {
_Size *= 2;
}
}
else {
// count == 0 means the stream ended exactly on a block boundary
if (count == 0) {
_Size = -1;
return false;
}
Current = new byte[count];
Array.Copy (buf, Current, count);
_Size = -1;
}
return true;
}
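For comparison, here's the same doubling strategy sketched as a compact iterator (my rewrite, not the original code):
public static IEnumerable<byte[]> GetChunks(Stream stream)
{
    const int InitialBlockSize = 8 * 1024;
    const int MaximumBlockSize = 1024 * 1024;
    int size = InitialBlockSize;
    while (true)
    {
        var buf = new byte[size];
        int count = 0, read;
        // fill the block completely unless the stream ends first
        while (count < size && (read = stream.Read(buf, count, size - count)) > 0)
            count += read;
        if (count == 0)
            yield break; // stream ended exactly on a block boundary
        if (count < size)
        {
            Array.Resize(ref buf, count); // trim the final, partial block
            yield return buf;
            yield break;
        }
        yield return buf;
        if (size <= MaximumBlockSize / 2)
            size *= 2;
    }
}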

How do I hash first N bytes of a file?

Using .net, I would like to be able to hash the first N bytes of potentially large files, but I can't seem to find a way of doing it.
The ComputeHash function (I'm using SHA1) takes a byte array or a stream, but a stream seems like the best way of doing it, since I would prefer not to load a potentially large file into memory.
To be clear: I don't want to load a potentially large piece of data into memory if I can help it. If the file is 2GB and I want to hash the first 1GB, that's a lot of RAM!
You can hash large volumes of data using a CryptoStream - something like this should work:
var sha1 = SHA1Managed.Create();
FileStream fs = File.OpenRead(path); // whatever file you want to hash
using (var cs = new CryptoStream(fs, sha1, CryptoStreamMode.Read))
{
byte[] buf = new byte[16];
int bytesRead = cs.Read(buf, 0, buf.Length);
long totalBytesRead = bytesRead;
while (bytesRead > 0 && totalBytesRead <= maxBytesToHash)
{
bytesRead = cs.Read(buf, 0, buf.Length);
totalBytesRead += bytesRead;
}
}
byte[] hash = sha1.Hash;
fileStream.Read(array, 0, N);
http://msdn.microsoft.com/en-us/library/system.io.filestream.read.aspx
Open the file as a FileStream, copy the first n bytes into a MemoryStream, then hash the MemoryStream.
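A minimal sketch of that suggestion (path and n are placeholders; needs System.IO and System.Security.Cryptography):
static byte[] HashFirstBytes(string path, int n)
{
    using (var fs = File.OpenRead(path))
    using (var ms = new MemoryStream())
    using (var sha1 = SHA1.Create())
    {
        var buffer = new byte[8192];
        int remaining = n, read;
        // copy at most n bytes into the MemoryStream
        while (remaining > 0 &&
               (read = fs.Read(buffer, 0, Math.Min(buffer.Length, remaining))) > 0)
        {
            ms.Write(buffer, 0, read);
            remaining -= read;
        }
        ms.Position = 0;
        return sha1.ComputeHash(ms);
    }
}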
As others have pointed out, you should read the first few bytes into an array.
It should also be noted that you don't want to make a single call to Read and assume that all the bytes have been read.
Rather, you want to make sure that the number of bytes returned matches the number you requested, and call Read again in the event that it doesn't.
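The usual pattern is a fill loop like this sketch (buffer and stream stand in for your own variables):
int offset = 0;
while (offset < buffer.Length)
{
    int read = stream.Read(buffer, offset, buffer.Length - offset);
    if (read == 0) break; // end of stream reached early
    offset += read;
}
// offset now holds the number of bytes actually read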
Also, if you have rather large streams, you will want to create a proxy for the Stream class: pass it the underlying stream (the FileStream in this case) and override the Read method to forward calls to the underlying stream until you have read the number of bytes you need. Once that limit is reached, return 0 from Read (or -1 from ReadByte) to indicate that there are no more bytes to be read.
If you are concerned about keeping too much data in memory, you can create a stream wrapper that throttles the maximum number of bytes read.
Without doing all the work, here's some sample boilerplate you could use to get started.
Edit: Please review comments for recommendations to improve this implementation. End edit
public class LimitedStream : Stream
{
private int current = 0;
private int limit;
private Stream stream;
public LimitedStream(Stream stream, int n)
{
this.limit = n;
this.stream = stream;
}
public override int ReadByte()
{
if (current >= limit)
return -1;
// Read from the wrapped stream; base.ReadByte() would route through our
// own Read override and double-count against the limit.
var numread = this.stream.ReadByte();
if (numread >= 0)
current++;
return numread;
}
public override int Read(byte[] buffer, int offset, int count)
{
count = Math.Min(count, limit - current);
var numread = this.stream.Read(buffer, offset, count);
current += numread;
return numread;
}
public override long Seek(long offset, SeekOrigin origin)
{
throw new NotImplementedException();
}
public override void SetLength(long value)
{
throw new NotImplementedException();
}
public override void Write(byte[] buffer, int offset, int count)
{
throw new NotImplementedException();
}
public override bool CanRead
{
get { return true; }
}
public override bool CanSeek
{
get { return false; }
}
public override bool CanWrite
{
get { return false; }
}
public override void Flush()
{
throw new NotImplementedException();
}
public override long Length
{
get { throw new NotImplementedException(); }
}
public override long Position
{
get { throw new NotImplementedException(); }
set { throw new NotImplementedException(); }
}
protected override void Dispose(bool disposing)
{
base.Dispose(disposing);
if (this.stream != null)
{
this.stream.Dispose();
}
}
}
Here is an example of the stream in use, wrapping a file stream, but throttling the number of bytes read to the specified limit:
using (var stream = new LimitedStream(File.OpenRead(@".\test.xml"), 100))
{
var bytes = new byte[1024];
stream.Read(bytes, 0, bytes.Length);
}
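Alternatively, you can skip the wrapper stream and feed the hash directly via TransformBlock/TransformFinalBlock; a sketch (the file name and limit are placeholders):
using (var sha1 = SHA1.Create())
using (var fs = File.OpenRead(@".\test.xml"))
{
    var buffer = new byte[8192];
    long limit = 100, total = 0;
    int read;
    while (total < limit &&
           (read = fs.Read(buffer, 0, (int)Math.Min(buffer.Length, limit - total))) > 0)
    {
        sha1.TransformBlock(buffer, 0, read, null, 0);
        total += read;
    }
    sha1.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
    byte[] hash = sha1.Hash; // SHA-1 of at most the first `limit` bytes
}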
