I am working on a project where I populate some PDFs on the back end, convert those PDFs into a List of byte[], merge them into one very large array, and finally send that back in the response body as a MemoryStream.
My issue is that this is a large amount of data, and building the list of byte arrays to merge uses a lot of memory.
I am wondering if, instead of converting the final merged byte[] into a MemoryStream and adding that to the response body, I could create several MemoryStream objects that I can append to Response.Body as they are created? Alternatively, is there a way to use one MemoryStream and just keep adding to it as I create each new byte[] for each PDF document?
Edit: This is probably a little long-winded, but I was too vague in my original post. At the core of what I am trying to do, I have several PDF documents, each several pages long. Each of them is represented in the code below as one of the byte[] items in the filesToMerge list. Ideally, I would like to go through these one by one, convert each into a memory stream, and send them to the client one right after the other in a loop. However, when I try to do this I get errors that the response body has already been sent. Is there a way to append something to the response body so it is updated each time through the loop?
[HttpGet("template/{formId}/fillforms")]
public void FillForms(string formId/*, [FromBody] IList<IDictionary<string, string>> fieldDictionaries*/)
{
List<byte[]> filesToMerge = new List<byte[]>();
// For testing
var mockData = new MockData();
IList<IDictionary<string, string>> fieldDictionaries = mockData.GetMock1095Dictionaries();
foreach(IDictionary<string, string> dictionary in fieldDictionaries)
{
var populatedForm = this.dataRepo.PopulateForm(formId, dictionary);
// write to rb
filesToMerge.Add(populatedForm);
}
byte[] mergedFilesAsByteArray = this.dataRepo.GetMergedByteArray(filesToMerge);
this.SendResponse(formId + "_filled.pdf", new MemoryStream(mergedFilesAsByteArray));
}
private void SendResponse(string formName, MemoryStream ms, IDictionary<string, string> fieldData = null)
{
    Response.Clear();
    Response.ContentType = "application/pdf";
    Response.Headers.Add("content-disposition", $"attachment;filename={formName}.pdf");
    ms.WriteTo(Response.Body);
}
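For what the edit describes (sending each populated form to the client as it is produced), here is a minimal sketch of the streaming mechanics, reusing the names from the code above. It only illustrates writing incrementally to Response.Body; note that concatenating independent PDF files byte-for-byte does not produce one valid merged PDF, so the merge step (GetMergedByteArray) still has to happen somewhere if the client expects a single document:

```csharp
[HttpGet("template/{formId}/fillformsstreamed")]
public async Task FillFormsStreamed(string formId)
{
    var mockData = new MockData();
    IList<IDictionary<string, string>> fieldDictionaries = mockData.GetMock1095Dictionaries();

    // Set headers once, before the first write; they cannot change afterwards.
    Response.ContentType = "application/pdf";
    Response.Headers.Add("content-disposition", $"attachment;filename={formId}_filled.pdf");

    foreach (IDictionary<string, string> dictionary in fieldDictionaries)
    {
        byte[] populatedForm = this.dataRepo.PopulateForm(formId, dictionary);

        // Write this document's bytes, then let the array go out of scope so it can be collected.
        await Response.Body.WriteAsync(populatedForm, 0, populatedForm.Length);
        await Response.Body.FlushAsync();
    }
}
```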
Memory streams are really just byte arrays with a bunch of nice methods on top, so switching to byte arrays won't help much. A problem that a lot of people run into when dealing with byte arrays and memory streams is not releasing the memory when they are done with the data; since both occupy memory on the machine you are running on, you can easily run out of it. So you should dispose of the data as soon as you no longer need it, for example with using statements. MemoryStream has a Dispose method that releases all resources used by the stream.
If you want to transfer the data from your application as quickly as possible, the best approach is to cut the stream into smaller parts and reassemble them in the correct order at the destination. You can cut them to 1 MB or 126 KB, really whatever you want. When you send a part to the destination you also need to pass its order number, because this approach lets you POST the parts in parallel and there is no guarantee of order.
To split a stream into multiple streams
private static List<MemoryStream> CreateChunks(Stream stream)
{
    byte[] buffer = new byte[4000000]; // set the size of your buffer (chunk)
    var returnStreams = new List<MemoryStream>();

    while (true) // loop to the end of the stream
    {
        int read = stream.Read(buffer, 0, buffer.Length); // read the next chunk
        if (read <= 0)
        {
            // end of the stream
            return returnStreams;
        }

        var returnStream = new MemoryStream();
        returnStream.Write(buffer, 0, read); // copy the chunk into its own stream
        returnStream.Position = 0;
        returnStreams.Add(returnStream);
    }
}
I then looped through the streams that were created, creating a task for each one that posted it to the service. I awaited all of the tasks to finish, then called my server again to tell it the upload was done and it could combine all of the data into one in the correct order. My service has the concept of an upload session to keep track of all of the parts and the order they go in. It also saved each part to storage as it came in; in my case, Azure Blob storage.
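A rough sketch of that upload loop, assuming a hypothetical endpoint that accepts a session id and a part number (the routes and the helper name are illustrative, not a real API):

```csharp
private static async Task UploadChunksAsync(HttpClient client, string sessionId, List<MemoryStream> chunks)
{
    // Post every chunk in parallel; the part index travels with the request
    // so the server can reassemble the pieces in the right order.
    var uploads = chunks.Select((chunk, index) =>
    {
        var content = new StreamContent(chunk);
        content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
        // Hypothetical route: the server stores each part under (sessionId, index).
        return client.PostAsync($"api/upload/{sessionId}/parts/{index}", content);
    }).ToList();

    await Task.WhenAll(uploads);

    // Tell the server the upload is complete so it can stitch the parts together.
    await client.PostAsync($"api/upload/{sessionId}/complete", content: null);
}
```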
It's not clear why you would be getting errors copying the contents of multiple MemoryStreams to Response.Body. You should certainly be able to do this, although you'll need to be sure not to change response headers or the status code after you begin writing data (and don't call Response.Clear() after you begin writing data either).
Here is a simple example of starting a response and then writing data:
[ApiController]
[Route("[controller]")]
public class RandomDataController : ControllerBase {
    private readonly ILogger<RandomDataController> logger;
    private const String CharacterData = "abcdefghijklmnopqrstuvwxyz0123456789 ";

    public RandomDataController(ILogger<RandomDataController> logger) {
        this.logger = logger;
    }

    [HttpGet]
    public async Task Get(CancellationToken cancellationToken) {
        this.Response.ContentType = "text/plain";
        this.Response.ContentLength = 1000;
        await this.Response.StartAsync(cancellationToken);
        logger.LogInformation("Response Started");

        var rand = new Random();
        for (var i = 0; i < 1000; i++) {
            // You should be able to copy the contents of a MemoryStream or other buffer here
            // instead of sending random data like this does.
            await this.Response.Body.WriteAsync(Encoding.UTF8.GetBytes(CharacterData[rand.Next(0, CharacterData.Length)].ToString()), cancellationToken);
            Thread.Sleep(50); // This is just to demonstrate that data is being sent to the client as it is written
            cancellationToken.ThrowIfCancellationRequested();
            if (i % 100 == 0 && i > 0) {
                logger.LogInformation("Response In Flight {PercentComplete}", (Double)i / 1000);
            }
        }
        logger.LogInformation("Response Complete");
    }
}
You can verify that this streams data back to the client using netcat:
% nc -nc 127.0.0.1 5000
GET /randomdata HTTP/1.1
Host: localhost:5000
Connection: Close
(Enter an extra blank line after Connection: Close to begin the request). You should see data appear in netcat as it is written to Response.Body on the server.
One thing to note is that this approach involves calculating the length of the data to be sent up front. If you are unable to calculate the size of the response up front, or prefer not to, you can look into chunked transfer encoding, which ASP.NET should use automatically if you start writing data to Response.Body without specifying the Content-Length.
Related
I want to send a file as byte[] to another PC via an HTTP POST. What is the most efficient way to assemble the file from the byte[] on the other side? I am using the File.ReadAllBytes method to get the byte[] from the file.
If you are using TCP, the network protocol will make sure that your stream arrives in the right order and without dropped parts, so the simplest read of the stream will be the most efficient. Parallel routes and playing with datagrams only come into the picture if you drop down to UDP and handle ordering yourself.
If the file is large you will have to transmit and receive it in chunks, but the TCP stream can hide that from you.
For example: https://learn.microsoft.com/en-us/dotnet/api/system.net.sockets.tcpclient.getstream?view=netframework-4.7.2
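A minimal sketch of that idea, assuming the two machines agree on a port and the sender simply streams the file over the socket (the host name, port, and file paths here are illustrative):

```csharp
// Sender: stream the file straight from disk over the TCP connection.
using (var client = new TcpClient("receiver-host", 9000))
using (NetworkStream network = client.GetStream())
using (FileStream file = File.OpenRead(@"C:\data\payload.bin"))
{
    file.CopyTo(network); // CopyTo reads and writes in internal chunks for you
}

// Receiver: write whatever arrives back to disk; TCP preserves the order of the bytes.
var listener = new TcpListener(IPAddress.Any, 9000);
listener.Start();
using (TcpClient incoming = listener.AcceptTcpClient())
using (NetworkStream network = incoming.GetStream())
using (FileStream file = File.Create(@"C:\data\payload.bin"))
{
    network.CopyTo(file);
}
```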
This is what worked for me. I used this method to call the API method and send the file as a byte[]. I tried sending the whole byte[] at once, but the API method wasn't able to receive it.
private static async void SendFiles(string path)
{
    var bytes = File.ReadAllBytes(path);
    var length = bytes.Length;
    // Each byte is sent in its own request; the byte value and the remaining
    // length travel in the route, and the request body is effectively unused.
    foreach (var b in bytes)
    {
        length--;
        string sendFilesUrl = $"http://api/communication/sendF/{b}/{length}";
        StringContent queryString = new StringContent(bytes.ToString(), Encoding.UTF8, "application/x-www-form-urlencoded");
        HttpResponseMessage response = await client.PostAsync(sendFilesUrl, queryString);
        string responseBody = await response.Content.ReadAsStringAsync();
    }
}
My API method is:
[HttpPost]
[Route("sendF/{b}/{length}")]
public HttpResponseMessage SendF([FromUri]byte[] b, [FromUri]int length)
{
    if (length != 0)
    {
        // 'bytes' is a field on the controller that accumulates the received values.
        bytes.AddRange(b);
    }
    else
    {
        // Last byte received: write the accumulated bytes back out as a file.
        File.WriteAllBytes(@"G:\test\test.exe", bytes.ToArray<byte>());
    }
    return CreateResponse(client);
}
This code works for me, but it takes very long to pass all the bytes if the file is large. Currently I'm searching for a more efficient way of sending the bytes. One solution that came to my mind is to send the byte[] in chunks.
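A sketch of that chunked idea, assuming a hypothetical endpoint that accepts a chunk index in the route and the raw bytes in the body (the route and the chunk size are illustrative):

```csharp
private static async Task SendFileInChunksAsync(HttpClient client, string path)
{
    const int chunkSize = 64 * 1024; // 64 KB per request instead of one byte per request
    byte[] bytes = File.ReadAllBytes(path);

    for (int offset = 0, index = 0; offset < bytes.Length; offset += chunkSize, index++)
    {
        int count = Math.Min(chunkSize, bytes.Length - offset);
        var content = new ByteArrayContent(bytes, offset, count);
        content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");

        // Hypothetical route: the server appends chunk 'index' to the file it is rebuilding.
        HttpResponseMessage response = await client.PostAsync($"http://api/communication/sendChunk/{index}", content);
        response.EnsureSuccessStatusCode();
    }
}
```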
I already return a zip file stream to the client using the following MessageContract:
[MessageContract]
public class ExportResult_C
{
    [MessageHeader]
    public PackedStudy_C[] PackedStudy { get; set; }

    [MessageBodyMember]
    public Stream Stream { get; set; }
}
I have decided to split it into zip parts when the file length is more than 500 MB.
Scenario:
1- The user calls the Export method, which returns an ExportResult_C.
2- If the requested file is greater than 500 MB, split it into smaller parts of 200 MB each.
3- If the requested file is smaller than 500 MB, return the MessageContract with one stream.
Desc:
For backward compatibility I have decided to change ExportResult_C to have two properties: one named Stream, which is already designed for when the file is smaller than 500 MB, and another that is an array of streams holding all the split zip parts of 200 MB each.
Question:
1- Can that MessageContract have another array property of Stream?
2- If not, is it possible to change the Stream property to an array of Stream?
3- Or, to implement the mentioned scenario, do I have to change the contract completely, or is there a better idea (in terms of throughput and backward compatibility)?
I want to share the result of my investigation and my solution for passing a big file as a stream to the client consumer:
Question 1:
It is not possible to have another MessageBodyMember, whether a Stream or any other type; after running the code you will get an exception like the following:
In order to use Streams with the MessageContract programming model, the type yourMessageContract must have a single member with MessageBodyMember attribute and the member type must be Stream.
Question 2:
I changed the contract to have a member named Streams, which is an array of Stream, as I wanted:
[MessageBodyMember]
public Stream[] Streams { get; set; }
My code to split the big file into zip parts and put a stream for each part into Streams looks like:
ZipFile zip = new ZipFile();
if (!Directory.Exists(zipRoot))
    Directory.CreateDirectory(zipRoot);
zip.AddDirectory(packageSpec.FolderPath, zipRoot);
zip.MaxOutputSegmentSize = 200 * 1024 * 1024; // 200 MB segments
zip.Save(fileName);

ExportResult_C result = null;
if (zip.NumberOfSegmentsForMostRecentSave > 1)
{
    result = new ExportResult_C()
    {
        PackedStudy = packed.ToArray(),
        Streams = new Stream[zip.NumberOfSegmentsForMostRecentSave]
    };
    string[] zipFiles = Directory.GetFiles(zipRoot);
    foreach (string fileN in zipFiles)
    {
        Stream streamToAdd = new MemoryStream(File.ReadAllBytes(fileN));
        result.Streams[zipFiles.ToList().IndexOf(fileN)] = streamToAdd;
    }
}
else
{
    result = new ExportResult_C()
    {
        PackedStudy = packed.ToArray(),
        Streams = new Stream[1] { new MemoryStream(File.ReadAllBytes(fileName)) }
    };
}
return result;
At compile time there is no error when we have an array of Stream as the MessageBodyMember; everything works fine until the service passes the stream array (result in the code) to the consumer at runtime, at which point I hit an exception like:
The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue. Local socket timeout was '00:29:59.9895560'.
Question 3:
To implement the mentioned scenario, the contract should not change (for backward compatibility), so it keeps a single message body stream as before:
[MessageBodyMember]
public Stream Stream { get; set; }
Instead, I write the stream of each zip part to the end of the single Stream, one after another, and on the client side the stream is read and split back into separate files.
Solution:
4 bytes for the length of each part
each part's content written right after its 4-byte length
at the end the stream will look like this:
Stream = part1 length + part1 content + part2 length + part2 content + ...
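A minimal sketch of that length-prefixed framing, assuming each part length fits in an Int32 (the helper names are mine, not part of the contract):

```csharp
// Writer side: prefix every part with its length as 4 bytes, then the content.
public static void WriteParts(Stream output, IEnumerable<Stream> parts)
{
    foreach (Stream part in parts)
    {
        byte[] lengthPrefix = BitConverter.GetBytes((int)part.Length); // 4 bytes
        output.Write(lengthPrefix, 0, 4);
        part.CopyTo(output);
    }
}

// Reader side: read a 4-byte length, then exactly that many bytes, until the stream ends.
public static IEnumerable<byte[]> ReadParts(Stream input)
{
    var lengthPrefix = new byte[4];
    // Assumes the 4-byte prefix arrives in one Read; a fully robust reader would loop here too.
    while (input.Read(lengthPrefix, 0, 4) == 4)
    {
        int length = BitConverter.ToInt32(lengthPrefix, 0);
        var part = new byte[length];
        int offset = 0;
        while (offset < length)
        {
            int read = input.Read(part, offset, length - offset);
            if (read == 0) throw new EndOfStreamException("Stream ended inside a part.");
            offset += read;
        }
        yield return part;
    }
}
```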
Any comment or help on this answer would be truly appreciated.
Back in the WebForms days, I could use Response.OutputStream.Write() and Response.Flush() to chunk file data to the client, because the files we are streaming are huge and would consume too much web server memory. How can I do that now with the new MVC classes like FileStreamResult?
My exact situation is: the DB contains the file data (CSV or XLS) in a VarBinary column. In the WebForms implementation, I pass a System.Func down to the data access layer, which iterates through the IDataReader and uses the Func to stream the content to the client. The point is that I don't want the web app to need any specific DB knowledge, including IDataReader.
How can I achieve the same result using MVC?
The Func (which I define in the web layer and pass down to the DB layer) is:
Func<byte[], long, bool> partialUpdateFunc = (data, length) =>
{
    if (Response.IsClientConnected)
    {
        // Write the data to the current output stream.
        Response.OutputStream.Write(data, 0, (int) length);
        // Flush the data to the HTML output.
        Response.Flush();
        return true;
    }
    else
    {
        return false;
    }
};
and in the DB layer, we get the IDataReader from the DB stored procedure (a using statement with ExecuteReader):
using (var reader = conn.ExecuteReader())
{
    if (reader.Read())
    {
        byte[] outByte = new byte[BufferSize];
        long startIndex = 0;

        // Read bytes into outByte[] and retain the number of bytes returned.
        long retval = reader.GetBytes(0, startIndex, outByte, 0, BufferSize);

        // Continue while there are bytes beyond the size of the buffer.
        bool stillConnected = true;
        while (retval == BufferSize)
        {
            stillConnected = partialUpdateFunc(outByte, retval);
            if (!stillConnected)
            {
                break;
            }
            // Reposition start index to end of last buffer and fill buffer.
            startIndex += BufferSize;
            retval = reader.GetBytes(0, startIndex, outByte, 0, BufferSize);
        }

        // Write the remaining buffer.
        if (stillConnected)
        {
            partialUpdateFunc(outByte, retval);
        }
    }
    // Close the reader and the connection.
    reader.Close();
}
If you want to reuse FileStreamResult you need to create a Stream-derived class that reads data from the DB, and pass that stream to the FileStreamResult (a sketch follows below).
A couple of issues with that approach:
Action results are executed synchronously, so your download will not release the thread while data is read from the DB and sent; that may be acceptable for a small number of parallel downloads. To get around it you may need to use a handler or download from an async action directly (which feels wrong for the MVC approach).
At least old versions of FileStreamResult did not have true "streaming" support (discussed here), so make sure you are fine with that.
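A minimal sketch of that Stream-derived class, assuming the DB layer hands the web layer a delegate that fills a buffer and returns the number of bytes read (so the web layer still has no IDataReader knowledge). This is a forward-only read stub, not a complete Stream implementation:

```csharp
// Wraps a "fill this buffer" delegate supplied by the data access layer.
public class DbBlobStream : Stream
{
    // (buffer, offset, count) => bytes actually read; returns 0 at the end of the data.
    private readonly Func<byte[], int, int, int> readChunk;

    public DbBlobStream(Func<byte[], int, int, int> readChunk)
    {
        this.readChunk = readChunk;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        // The DB layer tracks its own position in the blob (e.g. via IDataReader.GetBytes).
        return readChunk(buffer, offset, count);
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

The action would then look roughly like `return new FileStreamResult(new DbBlobStream(dbLayerReadChunk), "text/csv");`, with the DB layer keeping the reader and connection open until the stream is disposed; that lifetime management is the part this sketch glosses over.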
I am trying to empower users to upload large files. Before I upload a file, I want to chunk it up. Each chunk needs to be a C# object. The reason is logging; it's a long story, but I need to create actual C# objects that represent each file chunk. Regardless, I'm trying the following approach:
public static List<FileChunk> GetAllForFile(byte[] fileBytes)
{
    List<FileChunk> chunks = new List<FileChunk>();
    if (fileBytes.Length > 0)
    {
        FileChunk chunk = new FileChunk();
        for (int i = 0; i < (fileBytes.Length / 512); i++)
        {
            chunk.Number = (i + 1);
            chunk.Offset = (i * 512);
            chunk.Bytes = fileBytes.Skip(chunk.Offset).Take(512).ToArray();
            chunks.Add(chunk);
            chunk = new FileChunk();
        }
    }
    return chunks;
}
Unfortunately, this approach seems to be incredibly slow. Does anyone know how I can improve the performance while still creating objects for each chunk?
Thank you.
I suspect this is going to hurt a little:
chunk.Bytes = fileBytes.Skip(chunk.Offset).Take(512).ToArray();
Try this instead:
byte[] buffer = new byte[512];
Buffer.BlockCopy(fileBytes, chunk.Offset, buffer, 0, 512);
chunk.Bytes = buffer;
(Code not tested)
And the reason the original code is likely slow is that Skip doesn't do anything special for arrays (though it could). That means every pass through your loop iterates over the first 512 * n items in the array, which results in O(n^2) performance where you should just be seeing O(n).
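Putting that fix into the original loop, here is a sketch (it also keeps the final partial chunk, which the original integer division drops):

```csharp
public static List<FileChunk> GetAllForFile(byte[] fileBytes)
{
    var chunks = new List<FileChunk>();
    for (int offset = 0, number = 1; offset < fileBytes.Length; offset += 512, number++)
    {
        int size = Math.Min(512, fileBytes.Length - offset); // last chunk may be shorter
        var buffer = new byte[size];
        Buffer.BlockCopy(fileBytes, offset, buffer, 0, size); // straight memory copy, O(n) overall
        chunks.Add(new FileChunk { Number = number, Offset = offset, Bytes = buffer });
    }
    return chunks;
}
```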
Try something like this (untested code):
public static List<FileChunk> GetAllForFile(string fileName)
{
    var chunks = new List<FileChunk>();
    using (FileStream stream = new FileStream(fileName, FileMode.Open))
    {
        int i = 0;
        while (stream.Position < stream.Length)
        {
            var chunk = new FileChunk();
            chunk.Number = i;
            chunk.Offset = (i * 512);

            var buffer = new byte[512];
            int read = stream.Read(buffer, 0, 512); // read straight from the file
            if (read < 512)
            {
                Array.Resize(ref buffer, read); // trim the final, short chunk
            }
            chunk.Bytes = buffer;

            chunks.Add(chunk);
            i++;
        }
    }
    return chunks;
}
The above code skips several steps in your process, preferring to read the bytes from the file directly.
Note that, if the file is not an even multiple of 512, the last chunk will contain less than 512 bytes.
Same as Robert Harvey's answer, but using a BinaryReader so I don't need to specify an offset. If you use a BinaryWriter on the other end to reassemble the file, you won't need the Offset member of FileChunk anyway.
public static List<FileChunk> GetAllForFile(string fileName) {
    var chunks = new List<FileChunk>();
    using (FileStream stream = new FileStream(fileName, FileMode.Open)) {
        BinaryReader reader = new BinaryReader(stream);
        int i = 0;
        bool eof = false;
        while (!eof) {
            var chunk = new FileChunk();
            chunk.Number = i;
            chunk.Offset = (i * 512);
            chunk.Bytes = reader.ReadBytes(512); // returns fewer than 512 bytes at the end of the file
            chunks.Add(chunk);
            i++;
            if (chunk.Bytes.Length < 512) { eof = true; }
        }
    }
    return chunks;
}
Have you thought about what you're going to do to compensate for packet loss and data corruption?
Since you mentioned that the load is taking a long time, I would use asynchronous file reading to speed up the loading process. The hard disk is the slowest component of a computer. Google does asynchronous reads and writes in Google Chrome to improve load times, and I had to do something like this in C# in a previous job.
The idea would be to spawn several asynchronous requests over different parts of the file. Then, when a request completes, take the byte array and create your FileChunk objects, taking 512 bytes at a time. There are several benefits to this:
If you run this in a separate thread, you won't have the whole program waiting for the large file to load.
You can process a byte array, creating FileChunk objects, while the hard disk is still trying to fulfill read requests on other parts of the file.
You will save RAM if you limit the number of pending read requests. This causes fewer page faults to the hard disk and uses the RAM and CPU cache more efficiently, which speeds up processing further.
You would want to use the following methods in the FileStream class.
[HostProtectionAttribute(SecurityAction.LinkDemand, ExternalThreading = true)]
public virtual IAsyncResult BeginRead(
    byte[] buffer,
    int offset,
    int count,
    AsyncCallback callback,
    Object state
)

public virtual int EndRead(
    IAsyncResult asyncResult
)
Also this is what you will get in the asyncResult:
// Extract the FileStream (state) out of the IAsyncResult object
FileStream fs = (FileStream) ar.AsyncState;
// Get the result
Int32 bytesRead = fs.EndRead(ar);
Here is some reference material for you to read.
This is a code sample of working with Asynchronous File I/O Models.
This is a MS documentation reference for Asynchronous File I/O.
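A rough sketch of kicking off one of those overlapped reads with the APM pattern (the file path and buffer size are illustrative; on modern .NET you would more likely use FileStream.ReadAsync):

```csharp
// Open the file for asynchronous access and start an overlapped read.
var fs = new FileStream(@"C:\data\bigfile.bin", FileMode.Open, FileAccess.Read,
                        FileShare.Read, bufferSize: 4096, useAsync: true);
var buffer = new byte[512 * 1024]; // half a megabyte per request

fs.BeginRead(buffer, 0, buffer.Length, ar =>
{
    // Extract the FileStream (state) out of the IAsyncResult object.
    var stream = (FileStream)ar.AsyncState;
    int bytesRead = stream.EndRead(ar);

    // Here you would walk 'buffer' in 512-byte slices and build FileChunk objects
    // while the disk services the next outstanding read request.
}, fs);
```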
Are there any good examples of how to attach multiple files from a database to an e-mail in .NET? I've got a method that returns a byte[] containing the Image column contents, which I call in a loop to get each attachment, but I was wondering if there is a "correct"/best-practice way of doing this, especially given the possibility of introducing memory leaks by using MemoryStreams to hold the data. I'm fine creating an e-mail object and attaching the list of attachments to it once I've got them, and I can do this fine with a single attachment, but it seems to get slightly more complex with multiple files. Considering I wouldn't have thought this was an unusual requirement, there seems to be a dearth of articles/posts about it.
Thx - MH
Here's how to proceed. Let's suppose that you have an array of attachments that you have loaded from your database:
IEnumerable<byte[]> attachments = ... fetch from your database
We could also safely assume that along with those attachments you have loaded the filenames and probably their corresponding MIME types (information that you surely must have persisted along with the byte arrays representing your attachments). So you will probably have fetched an IEnumerable<SomeAttachmentType>, but that's not important for the purposes of this post.
So now you could send the mail:
using (var client = new SmtpClient("smtp.foo.com"))
using (var message = new MailMessage("from@foo.com", "to@bar.com"))
{
    message.Subject = "test subject";
    message.Body = "test body";
    message.IsBodyHtml = false;
    foreach (var attachment in attachments)
    {
        var attachmentStream = new MemoryStream(attachment);
        // TODO: Choose a better name for your attachments and adapt the MIME type
        var messageAttachment = new Attachment(attachmentStream, Guid.NewGuid().ToString(), "application/octet-stream");
        message.Attachments.Add(messageAttachment);
    }
    client.Send(message);
}
Here's the deal:
A MailMessage (IDisposable) contains multiple Attachments (IDisposable). Each attachment references a MemoryStream (IDisposable). The MailMessage is wrapped in a using block, which ensures that its Dispose method is called, which in turn calls the Dispose method of each attachment, which in turn calls the Dispose method of the memory streams.
Hi, you can have buffered reads directly from the database; MemoryStream does NOT introduce any memory leak if you dispose of it after use. Example using SqlDataReader:
using (var stream = new MemoryStream())
{
    byte[] buffer = new byte[4096];
    long l, dataOffset = 0;
    while ((l = reader.GetBytes(columnIndex, dataOffset, buffer, 0, buffer.Length)) > 0)
    {
        stream.Write(buffer, 0, (int)l); // write only the bytes actually read
        dataOffset += l;
    }
    // here you have the whole stream and can attach it to the email...
}
A similar question on how to read bytes from a database has already been asked countless times; see here for an example: What is the most efficient way to read many bytes from SQL Server using SqlDataReader (C#)