Back in the WebForms days, I could use Response.OutputStream.Write() and Response.Flush() to chunk file data to the client - because the files we are streaming are huge and will consume too much web server memory. How can I do that now with the new MVC classes like FileStreamResult?
My exact situation is: the DB contains the file data (CSV or XLS) in a VarBinary column. In the WebForms implementation, I send a System.Func down to the DataAccess layer, which iterates through the IDataReader and uses the System.Func to stream the content to the client. The point is that I don't want the web app to have any DB-specific knowledge, including IDataReader.
How can I achieve the same result using MVC?
The Func (which I define in the web layer and send down to the DB layer) is:
Func<byte[], long, bool> partialUpdateFunc = (data, length) =>
{
    if (Response.IsClientConnected)
    {
        // Write the data to the current output stream.
        Response.OutputStream.Write(data, 0, (int)length);

        // Flush the data to the HTML output.
        Response.Flush();
        return true;
    }
    else
    {
        return false;
    }
};
In the DB layer, we get the IDataReader from the DB stored procedure (in a using statement with ExecuteReader):
using (var reader = conn.ExecuteReader())
{
    if (reader.Read())
    {
        byte[] outByte = new byte[BufferSize];
        long startIndex = 0;

        // Read bytes into outByte[] and retain the number of bytes returned.
        long retval = reader.GetBytes(0, startIndex, outByte, 0, BufferSize);

        // Continue while there are bytes beyond the size of the buffer.
        bool stillConnected = true;
        while (retval == BufferSize)
        {
            stillConnected = partialUpdateFunc(outByte, retval);
            if (!stillConnected)
            {
                break;
            }

            // Reposition start index to end of last buffer and fill buffer.
            startIndex += BufferSize;
            retval = reader.GetBytes(0, startIndex, outByte, 0, BufferSize);
        }

        // Write the remaining buffer.
        if (stillConnected)
        {
            partialUpdateFunc(outByte, retval);
        }
    }

    // Close the reader and the connection.
    reader.Close();
}
If you want to reuse FileStreamResult, you need to create a Stream-derived class that reads the data from the DB, and pass that stream to the FileStreamResult (a rough sketch follows the caveats below).
A couple of issues with that approach:
Action results are executed synchronously, so your download will not release the thread while data is read from the DB and sent to the client - this may be acceptable for a small number of parallel downloads. To get around it, you may need to use a handler or write the download directly from an async action (which feels wrong for the MVC approach).
At least older versions of FileStreamResult did not have "streaming" support (discussed here), so make sure you are fine with that.
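For illustration, here is a rough sketch of such a Stream-derived wrapper; the class name, the constructor arguments, and the assumption that the reader has already been advanced to the row are mine, not part of the original answer. The class can live in the data-access layer, so the web app only ever sees a plain Stream:
using System;
using System.Data;
using System.IO;

public class DataReaderStream : Stream
{
    private readonly IDataReader reader;  // assumed: reader.Read() has already been called
    private readonly int columnIndex;     // ordinal of the VarBinary column
    private long position;

    public DataReaderStream(IDataReader reader, int columnIndex = 0)
    {
        this.reader = reader;
        this.columnIndex = columnIndex;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        // GetBytes copies up to <count> bytes of the BLOB, starting at <position>, into <buffer>.
        long read = reader.GetBytes(columnIndex, position, buffer, offset, count);
        position += read;
        return (int)read;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position { get => position; set => throw new NotSupportedException(); }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        // Disposing the reader also closes the connection when it was created
        // with CommandBehavior.CloseConnection.
        if (disposing) reader.Dispose();
        base.Dispose(disposing);
    }
}
A controller action could then return something like new FileStreamResult(dataLayer.OpenExportStream(id), "text/csv") { FileDownloadName = "export.csv" } (OpenExportStream being a hypothetical data-layer method). For the reader to stream rather than buffer the whole column, it should normally be created with CommandBehavior.SequentialAccess.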
Related
I am working on a project where I populate some PDFs on the back-end, convert those PDFs into a List of byte[], merge them into one very large array, and finally send that back via the response body as a MemoryStream.
My issue is that this is a large amount of data, and while building the list of byte arrays to merge I am using a lot of memory.
I am wondering if, instead of converting the final merged byte[] into a MemoryStream and adding that to the response body, I could create several MemoryStream objects and append them to the Response.Body as they are created? Alternatively, is there a way to use one MemoryStream and just keep adding to it as I create each new byte[] for each PDF document?
Edit: This is probably a little long-winded, but I was too vague in my original post. At the core of what I am trying to do, I have several PDF documents, each several pages long. Each of them is represented in the code below as one of the byte[] items in the filesToMerge list. Ideally, I would like to go through these one by one, convert each into a memory stream, and send them to the client one right after the other in a loop. However, when I try to do this I get errors that the response body has already been sent. Is there a way to append to the response body so it is updated each time through the loop?
[HttpGet("template/{formId}/fillforms")]
public void FillForms(string formId/*, [FromBody] IList<IDictionary<string, string>> fieldDictionaries*/)
{
List<byte[]> filesToMerge = new List<byte[]>();
// For testing
var mockData = new MockData();
IList<IDictionary<string, string>> fieldDictionaries = mockData.GetMock1095Dictionaries();
foreach(IDictionary<string, string> dictionary in fieldDictionaries)
{
var populatedForm = this.dataRepo.PopulateForm(formId, dictionary);
// write to rb
filesToMerge.Add(populatedForm);
}
byte[] mergedFilesAsByteArray = this.dataRepo.GetMergedByteArray(filesToMerge);
this.SendResponse(formId + "_filled.pdf", new MemoryStream(mergedFilesAsByteArray));
}
private void SendResponse(string formName, MemoryStream ms, IDictionary<string, string> fieldData = null)
{
Response.Clear();
Response.ContentType = "application/pdf";
Response.Headers.Add("content-disposition", $"attachment;filename={formName}.pdf");
ms.WriteTo(Response.Body);
}
Memory streams are really just byte arrays with a bunch of nice methods on top, so switching to byte arrays won't help much. A problem that a lot of people run into when dealing with byte arrays and memory streams is not releasing the memory when they are done with the data; since these buffers occupy the memory of the machine you are running on, you can easily run out of memory. So you should be disposing of data as soon as you don't need it anymore, with using statements for example. MemoryStream has a Dispose method that releases all resources used by the stream.
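As a minimal illustration of that point, reusing the names from the question above (mergedFilesAsByteArray, Response.Body):
// Dispose each stream as soon as its contents have been written out,
// so the buffer can be reclaimed instead of living for the whole request.
using (var ms = new MemoryStream(mergedFilesAsByteArray))
{
    ms.WriteTo(Response.Body);
} // ms.Dispose() runs here and releases the underlying buffer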
If you want to transfer the data from your application as quickly as possible, the best approach is to cut the stream into smaller parts and re-assemble them in the correct order at the destination. You could cut them into 1 MB or 126 KB pieces, really whatever you want. When you send the data to the destination, you also need to pass along the order number of each part, because this method allows you to POST the data in parallel and there is no guarantee of order.
To split a stream into multiple streams:
private static List<MemoryStream> CreateChunks(Stream stream)
{
    byte[] buffer = new byte[4000000]; // set the size of your buffer (chunk)
    var returnStreams = new List<MemoryStream>();

    while (true) // loop to the end of the file
    {
        var returnStream = new MemoryStream();
        int read = stream.Read(buffer, 0, buffer.Length); // read each chunk
        returnStream.Write(buffer, 0, read);              // write the chunk to its own stream

        if (read <= 0)
        {
            // end of file reached
            return returnStreams;
        }
        else
        {
            returnStream.Position = 0;
            returnStreams.Add(returnStream);
        }
    }
}
I then looped through the streams that were created, creating tasks to post to the service, and each task would post its part to the server. I would await all of the tasks to finish, then call my server again to tell it I had finished uploading and it could combine all of the data in the correct order. My service has the concept of an upload session to keep track of all of the parts and which order they go in. It also saves each part as it comes in; in my case, to Azure Blob storage.
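The upload side of that idea might look roughly like the sketch below; the endpoint URLs, the sessionId/partNumber field names, and the "complete" call are placeholders standing in for the actual upload-session API:
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static async Task UploadChunksAsync(IList<MemoryStream> chunks, string sessionId)
{
    using (var client = new HttpClient())
    {
        // Post every part in parallel, tagging each one with its order number
        // so the server can reassemble them in the right sequence.
        var uploads = chunks.Select((chunk, index) =>
        {
            var content = new MultipartFormDataContent
            {
                { new StringContent(sessionId), "sessionId" },
                { new StringContent(index.ToString()), "partNumber" },
                { new StreamContent(chunk), "file", $"part{index}" }
            };
            return client.PostAsync("https://files.example.com/api/upload/part", content);
        }).ToList();

        await Task.WhenAll(uploads);

        // Tell the server the session is finished so it can combine the parts.
        await client.PostAsync($"https://files.example.com/api/upload/{sessionId}/complete", null);
    }
}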
It's not clear why you would be getting errors copying the contents of multiple MemoryStreams to the Response.Body. You should certainly be able to do this, although you'll need to be sure not to try and change response headers or the status code after you begin writing data (also don't try to call Response.Clear() after you begin writing data).
Here is a simple example of starting a response and then writing data:
[ApiController]
[Route("[controller]")]
public class RandomDataController : ControllerBase {
    private readonly ILogger<RandomDataController> logger;
    private const String CharacterData = "abcdefghijklmnopqrstuvwxyz0123456789 ";

    public RandomDataController(ILogger<RandomDataController> logger) {
        this.logger = logger;
    }

    [HttpGet]
    public async Task Get(CancellationToken cancellationToken) {
        this.Response.ContentType = "text/plain";
        this.Response.ContentLength = 1000;
        await this.Response.StartAsync(cancellationToken);
        logger.LogInformation("Response Started");

        var rand = new Random();
        for (var i = 0; i < 1000; i++) {
            // You should be able to copy the contents of a MemoryStream or other buffer here instead of sending random data like this does.
            await this.Response.Body.WriteAsync(Encoding.UTF8.GetBytes(CharacterData[rand.Next(0, CharacterData.Length)].ToString()), cancellationToken);
            Thread.Sleep(50); // This is just to demonstrate that data is being sent to the client as it is written
            cancellationToken.ThrowIfCancellationRequested();
            if (i % 100 == 0 && i > 0) {
                logger.LogInformation("Response In Flight {PercentComplete}", (Double)i / 1000);
            }
        }
        logger.LogInformation("Response Complete");
    }
}
You can verify that this streams data back to the client using netcat:
% nc -nc 127.0.0.1 5000
GET /randomdata HTTP/1.1
Host: localhost:5000
Connection: Close
(Enter an extra blank line after Connection: Close to begin the request). You should see data appear in netcat as it is written to Response.Body on the server.
One thing to note is that this approach involves calculating the length of the data to be sent up front. If you are unable to calculate the size of the response up front, or prefer not to, you can look into Chunked Transfer Encoding, which ASP.Net should automatically use if you start writing data to the Response.Body without specifying the Content-Length.
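As a rough sketch of that alternative (same controller as above; the route name is arbitrary), omitting ContentLength lets the framework fall back to chunked transfer encoding:
[HttpGet("chunked")]
public async Task GetChunked(CancellationToken cancellationToken) {
    this.Response.ContentType = "text/plain";
    // No Response.ContentLength here, so the response should be sent with
    // Transfer-Encoding: chunked.
    for (var i = 0; i < 10; i++) {
        await this.Response.Body.WriteAsync(Encoding.UTF8.GetBytes($"line {i}\n"), cancellationToken);
        await this.Response.Body.FlushAsync(cancellationToken);
    }
}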
I have created a .NET Standard app as a central place for storing files. I've managed to upload files in chunks, because basically I let the client send the chunks and just append them as a stream in the database. The problem comes when I want to do it the other way around: retrieving the file from the database in chunks (with a couple of SQL queries, perhaps) rather than sending it all at once. It could be done with SqlFileStream, but that's not possible in a .NET Standard application, so I'm looking for a solution with the Dapper reader, maybe.
I found some sample code here - https://stackoverflow.com/a/2101447 - but I'm not sure if I can do it with Dapper. Every suggestion is much appreciated.
I found a solution by passing the Stream in. Basically, I take the Response.Body stream on the ASP.NET MVC side and write to it directly without returning any data, so on a file GET the server starts sending chunks to the client immediately without using the whole server's memory (only the 1 MB buffer specified at a time).
var sql = $#"
SELECT [Data]
FROM {TableName}
WHERE ChunkId = #chunkId";
using (var conn = this.dbConnectionFactory.GetSqlConnection)
using (var reader = await conn.ExecuteReaderAsync(sql, new { chunkId }).ConfigureAwait(false))
{
while (reader.Read())
{
var buffer = new byte[1024 * 1024]; // Read chunks of 1MB
var bytesRead = 0L;
var dataIndex = 0L;
while ((bytesRead = reader.GetBytes(0, dataIndex, buffer, 0, buffer.Length)) > 0)
{
var actual = new byte[bytesRead];
Array.Copy(buffer, 0, actual, 0, bytesRead);
await stream.WriteAsync(actual, 0, (int)bytesRead).ConfigureAwait(false);
dataIndex += bytesRead;
}
}
}
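For context, the calling side might look roughly like this; the route and the repository method name (WriteFileToStreamAsync) are placeholders for whatever wraps the loop above, with Response.Body passed in as the target stream:
[HttpGet("files/{fileId}")]
public async Task DownloadFile(Guid fileId)
{
    Response.ContentType = "application/octet-stream";
    Response.Headers.Add("Content-Disposition", $"attachment; filename={fileId}");

    // The repository writes each 1 MB chunk straight to the response stream,
    // so the whole file is never buffered in server memory.
    await this.fileRepository.WriteFileToStreamAsync(fileId, Response.Body);
}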
I have very poor knowledge of C#, but I need to write code that reads a binary blob into a byte[].
I wrote this code:
byte[] userBlob;

myCommand.CommandText = "SELECT id, userblob FROM USERS";
myCommand.Connection = myFBConnection;
myCommand.Transaction = myTransaction;

FbDataReader reader = myCommand.ExecuteReader();
try
{
    while (reader.Read())
    {
        Console.WriteLine(reader.GetString(0));
        userBlob = // what should I do here??
    }
}
catch (Exception e)
{
    Console.WriteLine(e.Message);
    Console.WriteLine("Can't read data from DB");
}
But what should I place there? As I understand it, I need to use streams, but I can't figure out how to do that.
A little late to the game; I hope this is on point.
I'm assuming you're using the Firebird .NET provider, which is a C# implementation that does not ride on top of the native fbclient.dll. Unfortunately, it does not provide a streaming interface to BLOBs, which would allow reading potentially huge data in chunks without blowing out memory.
Instead, you use the FbDataReader.GetBytes() method to read the data, and it all has to fit in memory. GetBytes takes a user-provided buffer and stuffs the BLOB data in the position referenced, and it returns the number of bytes it actually copied (which could be less than the full size).
Passing a null buffer to GetBytes returns you the full size of the BLOB (but no data!) so you can reallocate as needed.
Here we assume you have an INT for field #0 (not interesting) and the BLOB for #1, and this naive implementation should take care of it:
// temp buffer for all BLOBs, reallocated as needed
byte[] blobbuffer = new byte[512];

while (reader.Read())
{
    int id = reader.GetInt32(0); // read first field

    // get bytes required for this BLOB
    long n = reader.GetBytes(
        i: 1,          // field number
        dataIndex: 0,
        buffer: null,  // no buffer = size check only
        bufferIndex: 0,
        length: 0);

    // extend buffer if needed
    if (n > blobbuffer.Length)
        blobbuffer = new byte[n];

    // read again into nominally "big enough" buffer
    n = reader.GetBytes(1, 0, blobbuffer, 0, blobbuffer.Length);

    // Now: <n> bytes of <blobbuffer> has your data. Go at it.
}
It's possible to optimize this somewhat, but the Firebird .NET provider really needs a streaming BLOB interface like the native fbclient.dll offers.
byte[] toBytes = Encoding.ASCII.GetBytes(someString);
So, in your case (the blob is column 1 in your SELECT):
userBlob = Encoding.ASCII.GetBytes(reader.GetString(1));
However, I am not sure what you are trying to achieve with your code, as you are pulling back all users and then creating the blob over and over.
I am experiencing some strange behaviour from my code which I am using to stream files to my clients.
I have an MSSQL server which acts as a file store, with files that are accessed via a UNC path.
On my web server I have some .NET code running that handles streaming the files (in this case pictures and thumbnails) to my clients.
My code works, but I am experiencing a constant delay of ~12 seconds on the initial file request. Once I have made the initial request, it is as if the server wakes up and suddenly becomes responsive, only to fall back to the same behaviour some time later.
At first I thought it was my code, but from what I can see in the server activity log there is no resource-intensive code running. My theory is that on each call to the server the path must first be mounted, and that is what causes the delay. It then unmounts some time later and has to be remounted.
For reference, I am posting my code (maybe I just cannot see the problem):
public async static Task StreamFileAsync(HttpContext context, FileInfo fileInfo)
{
    // This controls how many bytes to read at a time and send to the client
    int bytesToRead = 512 * 1024; // 512KB

    // Buffer to read bytes in chunk size specified above
    byte[] buffer = new Byte[bytesToRead];

    // Clear the current response content/headers
    context.Response.Clear();
    context.Response.ClearHeaders();

    // Indicate the type of data being sent
    context.Response.ContentType = FileTools.GetMimeType(fileInfo.Extension);

    // Name the file
    context.Response.AddHeader("Content-Disposition", "filename=\"" + fileInfo.Name + "\"");
    context.Response.AddHeader("Content-Length", fileInfo.Length.ToString());

    // Open the file
    using (var stream = fileInfo.OpenRead())
    {
        // The number of bytes read
        int length;
        do
        {
            // Verify that the client is connected
            if (context.Response.IsClientConnected)
            {
                // Read data into the buffer
                length = await stream.ReadAsync(buffer, 0, bytesToRead);

                // and write it out to the response's output stream
                await context.Response.OutputStream.WriteAsync(buffer, 0, length);

                try
                {
                    // Flush the data
                    context.Response.Flush();
                }
                catch (HttpException)
                {
                    // Cancel the download if a HttpException happens
                    // (ie. the client has disconnected but we tried to send some data)
                    length = -1;
                }

                // Clear the buffer
                buffer = new Byte[bytesToRead];
            }
            else
            {
                // Cancel the download if the client has disconnected
                length = -1;
            }
        } while (length > 0); // Repeat until no data is read
    }

    // Tell the response not to send any more content to the client
    context.Response.SuppressContent = true;

    // Tell the application to skip to the EndRequest event in the HTTP pipeline
    context.ApplicationInstance.CompleteRequest();
}
If anyone could shed some light on this problem I would be very grateful!
I am trying to empower users to upload large files. Before I upload a file, I want to chunk it up. Each chunk needs to be a C# object. The reason is for logging purposes. It's a long story, but I need to create actual C# objects that represent each file chunk. Regardless, I'm trying the following approach:
public static List<FileChunk> GetAllForFile(byte[] fileBytes)
{
    List<FileChunk> chunks = new List<FileChunk>();
    if (fileBytes.Length > 0)
    {
        FileChunk chunk = new FileChunk();
        for (int i = 0; i < (fileBytes.Length / 512); i++)
        {
            chunk.Number = (i + 1);
            chunk.Offset = (i * 512);
            chunk.Bytes = fileBytes.Skip(chunk.Offset).Take(512).ToArray();

            chunks.Add(chunk);
            chunk = new FileChunk();
        }
    }
    return chunks;
}
Unfortunately, this approach seems to be incredibly slow. Does anyone know how I can improve the performance while still creating objects for each chunk?
thank you
I suspect this is going to hurt a little:
chunk.Bytes = fileBytes.Skip(chunk.Offset).Take(512).ToArray();
Try this instead:
byte[] buffer = new byte[512];
Buffer.BlockCopy(fileBytes, chunk.Offset, buffer, 0, 512);
chunk.Bytes = buffer;
(Code not tested)
And the reason why this code would likely be slow is because Skip doesn't do anything special for arrays (though it could). This means that every pass through your loop is iterating the first 512*n items in the array, which results in O(n^2) performance, where you should just be seeing O(n).
Try something like this (untested code):
public static List<FileChunk> GetAllForFile(string fileName)
{
    var chunks = new List<FileChunk>();
    using (var stream = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    {
        int i = 0;
        while (stream.Position < stream.Length)
        {
            var chunk = new FileChunk();
            chunk.Number = i;
            chunk.Offset = (i * 512);
            chunk.Bytes = new byte[512];

            int read = stream.Read(chunk.Bytes, 0, 512);
            if (read < 512)
            {
                // The last chunk is smaller than the buffer; trim it.
                Array.Resize(ref chunk.Bytes, read);
            }

            chunks.Add(chunk);
            i++;
        }
    }
    return chunks;
}
The above code skips several steps in your process, preferring to read the bytes from the file directly.
Note that, if the file is not an even multiple of 512, the last chunk will contain less than 512 bytes.
Same as Robert Harvey's answer, but using a BinaryReader so that I don't need to specify an offset. If you use a BinaryWriter on the other end to reassemble the file, you won't need the Offset member of FileChunk.
public static List<FileChunk> GetAllForFile(string fileName) {
    var chunks = new List<FileChunk>();
    using (var stream = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    using (var reader = new BinaryReader(stream)) {
        int i = 0;
        bool eof = false;
        while (!eof) {
            var chunk = new FileChunk();
            chunk.Number = i;
            chunk.Offset = (i * 512);
            chunk.Bytes = reader.ReadBytes(512);
            chunks.Add(chunk);
            i++;
            if (chunk.Bytes.Length < 512) { eof = true; }
        }
    }
    return chunks;
}
Have you thought about what you're going to do to compensate for packet loss and data corruption?
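One possibility (my suggestion, not part of the original comment) is to record a hash per chunk when splitting and verify it on the receiving side after reassembly:
using System;
using System.Security.Cryptography;

public static string ComputeChunkHash(byte[] chunkBytes)
{
    using (var sha = SHA256.Create())
    {
        // A per-chunk SHA-256 lets the receiver detect a corrupted or missing part.
        return Convert.ToBase64String(sha.ComputeHash(chunkBytes));
    }
}

// Sender:   store ComputeChunkHash(chunk.Bytes) alongside the chunk's order number.
// Receiver: recompute the hash for each part and request a re-send on mismatch.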
Since you mentioned that the load is taking a long time then I would use asynchronous file reading in order to speed up the loading process. The hard disk is the slowest component of a computer. Google does asynchronous reads and writes on Google Chrome to improve their load times. I had to do something like this in C# in a previous job.
The idea would be to spawn several asynchronous requests over different parts of the file. Then when a request comes in, take the byte array and create your FileChunk objects taking 512 bytes at a time. There are several benefits to this:
If you have this run in a separate thread, then you won't have the whole program waiting to load the large file you have.
You can process a byte array, creating FileChunk objects, while the hard disk is still trying to fulfill read requests on other parts of the file.
You will save on RAM if you limit the number of pending read requests you can have. This causes fewer page faults to the hard disk and uses the RAM and CPU cache more efficiently, which speeds up processing further.
You would want to use the following methods in the FileStream class.
[HostProtectionAttribute(SecurityAction.LinkDemand, ExternalThreading = true)]
public virtual IAsyncResult BeginRead(
    byte[] buffer,
    int offset,
    int count,
    AsyncCallback callback,
    Object state
)

public virtual int EndRead(
    IAsyncResult asyncResult
)
Also this is what you will get in the asyncResult:
// Extract the FileStream (state) out of the IAsyncResult object
FileStream fs = (FileStream) ar.AsyncState;
// Get the result
Int32 bytesRead = fs.EndRead(ar);
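Putting the pieces together, an untested sketch of one asynchronous read might look like this; ReadSectionAsync is a made-up name, and the 512-byte slicing just reuses the FileChunk shape from the question:
using System;
using System.IO;

private static void ReadSectionAsync(string fileName, long sectionOffset, int sectionLength)
{
    // useAsync: true requests overlapped (asynchronous) I/O from the OS.
    var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read,
                            FileShare.Read, 4096, useAsync: true);
    var buffer = new byte[sectionLength];
    fs.Seek(sectionOffset, SeekOrigin.Begin);

    fs.BeginRead(buffer, 0, buffer.Length, ar =>
    {
        // Extract the FileStream (state) out of the IAsyncResult and finish the read.
        var stream = (FileStream)ar.AsyncState;
        int bytesRead = stream.EndRead(ar);
        stream.Dispose();

        // Slice this section into 512-byte FileChunk objects while other reads continue.
        for (int offset = 0; offset < bytesRead; offset += 512)
        {
            int size = Math.Min(512, bytesRead - offset);
            var chunk = new FileChunk
            {
                Number = (int)((sectionOffset + offset) / 512) + 1,
                Offset = (int)(sectionOffset + offset),
                Bytes = new byte[size]
            };
            Buffer.BlockCopy(buffer, offset, chunk.Bytes, 0, size);
            // ...log or queue the chunk here...
        }
    }, fs);
}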
Here is some reference material for you to read.
This is a code sample of working with Asynchronous File I/O Models.
This is a MS documentation reference for Asynchronous File I/O.