I'm using this method for asynchronous file copy with notifications and cancellation. It works well for copying files locally, but for cross-drive copies its performance drops, because at any given moment only one drive is doing work. The worst case is copying large files from an SSD to a slow flash drive, or vice versa.
Can anybody advise a better solution? Maybe something based on the producer-consumer pattern, or perhaps there are libraries for this? (I have searched, but without result.)
P.S.: This method is not used directly; it is wrapped by others that prepare the file list and choose the bufferSize.
private static async Task<long> CopyFileAsync(
    [NotNull] string sourcePath,
    [NotNull] string destPath,
    [NotNull] IProgress<FileCopyProgress> progress,
    CancellationToken cancellationToken,
    long bufferSize = 1024 * 1024 * 10
)
{
    if (bufferSize <= 0)
    {
        throw new ArgumentException(nameof(bufferSize));
    }

    long totalRead = 0;
    long fileSize;
    var buffer = new byte[bufferSize];

    using (var reader = File.Open(sourcePath, FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        fileSize = reader.Length;
        using (var writer = File.Create(destPath, Convert.ToInt32(bufferSize), FileOptions.Asynchronous))
        {
            while (totalRead < fileSize)
            {
                var readCount = await reader.ReadAsync(buffer, 0, Convert.ToInt32(bufferSize), cancellationToken).ConfigureAwait(false);
                await writer.WriteAsync(buffer, 0, readCount, cancellationToken).ConfigureAwait(false);
                totalRead += readCount;
                progress.Report(new FileCopyProgress(totalRead, fileSize, null));
                cancellationToken.ThrowIfCancellationRequested();
            }
        }
    }
    progress.Report(new FileCopyProgress(fileSize, fileSize, null));
    return fileSize;
}
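For what it's worth, here is a minimal sketch of the producer-consumer idea using System.Threading.Channels (add using System.Threading.Channels;), so that reads from the source drive and writes to the destination drive overlap instead of alternating. The method name, the channel capacity of 8 and the 1 MB block size are placeholders, and progress reporting is omitted for brevity; treat it as a sketch rather than a drop-in replacement.

private static async Task<long> CopyFileProducerConsumerAsync(
    string sourcePath,
    string destPath,
    CancellationToken cancellationToken,
    int bufferSize = 1024 * 1024)
{
    // Bounded channel so the reader cannot run arbitrarily far ahead of the writer.
    var channel = Channel.CreateBounded<(byte[] Buffer, int Count)>(
        new BoundedChannelOptions(8) { SingleReader = true, SingleWriter = true });

    // Producer: reads blocks from the source drive and pushes them into the channel.
    var producer = Task.Run(async () =>
    {
        try
        {
            using (var reader = new FileStream(sourcePath, FileMode.Open, FileAccess.Read,
                FileShare.Read, bufferSize, FileOptions.Asynchronous | FileOptions.SequentialScan))
            {
                while (true)
                {
                    var block = new byte[bufferSize]; // fresh buffer per block; no pooling in this sketch
                    var read = await reader.ReadAsync(block, 0, bufferSize, cancellationToken).ConfigureAwait(false);
                    if (read == 0) break;
                    await channel.Writer.WriteAsync((block, read), cancellationToken).ConfigureAwait(false);
                }
            }
        }
        finally
        {
            channel.Writer.Complete();
        }
    }, cancellationToken);

    // Consumer: writes blocks to the destination drive while the producer keeps reading.
    long totalWritten = 0;
    using (var writer = new FileStream(destPath, FileMode.Create, FileAccess.Write,
        FileShare.None, bufferSize, FileOptions.Asynchronous))
    {
        while (await channel.Reader.WaitToReadAsync(cancellationToken).ConfigureAwait(false))
        {
            while (channel.Reader.TryRead(out var block))
            {
                await writer.WriteAsync(block.Buffer, 0, block.Count, cancellationToken).ConfigureAwait(false);
                totalWritten += block.Count;
            }
        }
    }

    await producer.ConfigureAwait(false); // surface any exception from the read side
    return totalWritten;
}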
Problem Statement:
I'm trying to iterate over a streamed file upload in an HttpPut request using the Request.Body stream, and I'm having a really hard time; my google-fu has turned up little. The situation is that I expect something like this to work, and it doesn't:
[HttpPut("{accountName}/{subAccount}/{revisionId}/{randomNumber}")]
[ProducesResponseType(StatusCodes.Status200OK)]
[ProducesResponseType(StatusCodes.Status500InternalServerError)]
public async Task<IActionResult> PutTest()
{
    var memStream = new MemoryStream();
    var b = new Memory<byte>();
    int totalBytes = 0;
    int bytesRead = 0;
    byte[] buffer = new byte[1024];

    do
    {
        bytesRead = await Request.Body.ReadAsync(new Memory<byte>(buffer), CancellationToken.None);
        totalBytes += bytesRead;
        await memStream.WriteAsync(buffer, 0, bytesRead);
    } while (bytesRead > 0);

    return Ok(memStream);
}
In the debugger, I can examine Request.Body and look at its internal _buffer. It contains the desired data. When the above code runs, the MemoryStream is full of zeros. During the Read, the buffer is also full of zeros. Request.Body also has a Length of 0.
The Goal:
Use an HttpPut request to upload a file via streaming, iterate over it in chunks, do some processing, and stream those chunks using gRPC to another endpoint. I want to avoid reading the entire file into memory.
What I've tried:
This works:
using (var sr = new StreamReader(Request.Body))
{
    var body = await sr.ReadToEndAsync();
    return Ok(body);
}
That code reads all of the stream into memory as a string, which is quite undesirable, but it proves to me that the Request.Body data can be read in some fashion in the method I'm working on.
In the Configure method of the Startup class, I have included the following to ensure that buffering is enabled:
app.Use(async (context, next) =>
{
    context.Request.EnableBuffering();
    await next();
});
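(As an aside, not part of the original attempt: buffering only pays off if the stream is rewound before it is read again; a minimal sketch, assuming the middleware above has already run:)

// Sketch: with EnableBuffering in place, the body becomes seekable,
// so it can be rewound before being read a second time.
Request.Body.Position = 0;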
I have tried encapsulating Request.Body in another stream like BufferedStream and FileBufferingReadStream, and those don't make a difference.
I've tried:
var reader = new BinaryReader(Request.Body, Encoding.Default);
do
{
    bytesRead = reader.Read(buffer, 0, buffer.Length);
    await memStream.WriteAsync(buffer);
} while (bytesRead > 0);
This, as well, turns up a MemoryStream with all zeros.
I do this kind of request body streaming a lot in my current project.
This works perfectly fine for me:
[HttpPut("{accountName}/{subAccount}/{revisionId}/{randomNumber}")]
[ProducesResponseType(StatusCodes.Status200OK)]
[ProducesResponseType(StatusCodes.Status500InternalServerError)]
public async Task<IActionResult> PutTest(CancellationToken cancel) {
    using (var to = new MemoryStream()) {
        var from = HttpContext.Request.Body;
        var buffer = new byte[8 * 1024];
        long totalBytes = 0;
        int bytesRead;

        while ((bytesRead = await from.ReadAsync(buffer, 0, buffer.Length, cancel)) > 0) {
            await to.WriteAsync(buffer, 0, bytesRead, cancel);
            totalBytes += bytesRead;
        }

        return Ok(to);
    }
}
The only things I am doing differently are:
I am creating the MemoryStream in a scoped context (using).
I am using a slightly bigger buffer (some trial and error led me to this specific size).
I am using a different overload of Stream.ReadAsync, where I pass the byte[] buffer, the start offset of 0, and the number of bytes to read.
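For completeness, on ASP.NET Core 3.0+ there is another option worth sketching (an assumption about the runtime, not something from the original post): consume HttpContext.Request.BodyReader, the System.IO.Pipelines PipeReader over the same body, which hands you chunks as they arrive without buffering the whole upload. The route and method name below are placeholders; it needs using System.Buffers; and using System.IO.Pipelines;.

// Sketch only: iterate the request body in chunks via Request.BodyReader (ASP.NET Core 3.0+).
[HttpPut("pipe-reader-test")] // hypothetical route for illustration
public async Task<IActionResult> PutViaPipeReader(CancellationToken cancel)
{
    long totalBytes = 0;
    var reader = Request.BodyReader;

    while (true)
    {
        ReadResult result = await reader.ReadAsync(cancel);
        ReadOnlySequence<byte> chunk = result.Buffer;

        foreach (ReadOnlyMemory<byte> segment in chunk)
        {
            totalBytes += segment.Length;
            // process / forward the segment here (e.g. push it onto a gRPC stream)
        }

        // Mark everything we looked at as consumed.
        reader.AdvanceTo(chunk.End);

        if (result.IsCompleted)
            break;
    }

    await reader.CompleteAsync();
    return Ok(totalBytes);
}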
I am really struggling with the following piece of code.
I am currently working on a Xamarin Android app.
I am downloading a file from a web server, but in case of a network outage I want to retry the download a few times. However, after the second time an exception is thrown, the download doesn't continue anymore.
The first time it works like a charm.
I have also tried it without Polly, and tried it recursively with some delay built in, but no luck. After the exception has been thrown, the download doesn't continue anymore.
:-( Any idea what is causing this?
Is it something built into Android 9.0?
await Policy
    .Handle<NetworkOnMainThreadException>()
    .Or<Java.Net.UnknownHostException>()
    .Or<SSLException>()
    .WaitAndRetryAsync(new[]
    {
        TimeSpan.FromSeconds(1),
        TimeSpan.FromSeconds(5),
        TimeSpan.FromSeconds(10)
    })
    .ExecuteAsync(async () =>
    {
        totalRead = await DownloadFile(url, progress, totalRead, token);
    });
Here are the DownloadFile and OpenStream functions
private async Task<long> DownloadFile(string url, IProgress<double> progress, long totalRead, CancellationToken token)
{
    // Step 1 : Get call using HttpClient
    var response = await _client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead, token);
    if (!response.IsSuccessStatusCode)
    {
        throw new Exception(string.Format("The request returned with HTTP status code {0}", response.StatusCode));
    }

    // Step 2 : Filename
    var fileName = url.Split('/').Last();
    var buffer = new byte[bufferSize];

    // Step 3 : Get total of data
    var totalData = response.Content.Headers.ContentLength.GetValueOrDefault(-1L);

    // Step 4 : Get the full path
    var filePath = Path.Combine(_fileService.GetStorageFolderPath(), fileName);

    // Step 5 : Download data
    using (var fileStream = OpenStream(filePath, totalRead))
    {
        using (var inputStream = await response.Content.ReadAsStreamAsync())
        {
            int bytesRead;
            while ((bytesRead = inputStream.Read(buffer, 0, buffer.Length)) > 0)
            {
                totalRead += bytesRead;
                // Write data on disk.
                await fileStream.WriteAsync(buffer, 0, bytesRead);
                progress.Report((totalRead * 1d) / (totalData * 1d) * 100);
            }
            progress.Report(0);
        }
    }
    return totalRead;
}
private Stream OpenStream(string path, long totalRead)
{
    if (totalRead > 0)
    {
        return new FileStream(path, FileMode.Append, FileAccess.Write, FileShare.None, bufferSize);
    }
    else
    {
        return new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None, bufferSize);
    }
}
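One thing worth noting about the code as shown (an observation, not a confirmed cause of the problem): every retried DownloadFile call issues a plain GET for the whole file, while OpenStream appends to the partially written file, so even a successful retry would duplicate data. A hedged sketch of a resumable request, assuming the server supports HTTP Range requests (_client is the same HttpClient field used above):

// Sketch only: ask the server for the bytes we do not have yet, so a retried
// download can append where the previous attempt stopped.
private async Task<HttpResponseMessage> GetResumableAsync(string url, long totalRead, CancellationToken token)
{
    var request = new HttpRequestMessage(HttpMethod.Get, url);
    if (totalRead > 0)
    {
        // Request everything from the current offset to the end of the file.
        request.Headers.Range = new System.Net.Http.Headers.RangeHeaderValue(totalRead, null);
    }
    return await _client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead, token);
}

A 206 (PartialContent) status means the range was honoured; a plain 200 means the server is resending the whole file, in which case the existing partial file should be truncated rather than appended to.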
The background of the problem is reading data from a stream (IO-bound), processing the data (CPU-bound), then writing to another stream (IO-bound).
The naive way looks like this:
thread 1: loop { |<--read data block from stream-->|<--process data block-->|<--write to stream-->| }
A naive producer-consumer pattern
thread 1: loop { |<--read data block from stream-->| enqueue data block to blocking queue A }
thread 2: loop { dequeue data block from blocking queue A |<--process data block-->| enqueue data block to blocking queue B }
thread 3: loop { dequeue data block from blocking queue B |<--write to stream-->| }
A stream example is as follows:
var hasher = MD5.Create();
using (FileStream readStream = new FileStream("filePath", FileMode.Open))
using (BufferedStream readBs = new BufferedStream(readStream))
using (CryptoStream md5HashStream = new CryptoStream(readBs, hasher, CryptoStreamMode.Read))
using (FileStream writeStream = File.OpenWrite("destPath"))
using (BufferedStream writeBs = new BufferedStream(writeStream))
{
    md5HashStream.CopyTo(writeBs);
}
How can I use C# async tools such as async streams, channels, or dataflow to transform the above stream sample into a producer-consumer pattern and cut the blocking IO time?
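For reference, here is a minimal sketch of that three-stage pipeline using TPL Dataflow (the System.Threading.Tasks.Dataflow NuGet package); the block size, the bounded capacities and the placeholder transform are assumptions, not part of the original sample. The bounded capacities keep the stages in step so the reader cannot race far ahead of the CPU stage or the writer.

// Sketch: read -> process -> write as a TPL Dataflow pipeline so the IO-bound
// stages and the CPU-bound stage overlap. Requires System.Threading.Tasks.Dataflow.
static async Task CopyWithPipelineAsync(string sourcePath, string destPath, int blockSize = 1024 * 1024)
{
    // CPU-bound stage (placeholder transform; e.g. hashing or compression).
    var process = new TransformBlock<byte[], byte[]>(
        block => block,
        new ExecutionDataflowBlockOptions { BoundedCapacity = 4 });

    using (var writeStream = new FileStream(destPath, FileMode.Create, FileAccess.Write,
        FileShare.None, blockSize, useAsync: true))
    {
        // IO-bound write stage.
        var write = new ActionBlock<byte[]>(
            block => writeStream.WriteAsync(block, 0, block.Length),
            new ExecutionDataflowBlockOptions { BoundedCapacity = 4 });

        process.LinkTo(write, new DataflowLinkOptions { PropagateCompletion = true });

        // IO-bound read stage (the producer).
        using (var readStream = new FileStream(sourcePath, FileMode.Open, FileAccess.Read,
            FileShare.Read, blockSize, useAsync: true))
        {
            while (true)
            {
                var buffer = new byte[blockSize];
                int read = await readStream.ReadAsync(buffer, 0, blockSize);
                if (read == 0) break;
                if (read < blockSize) Array.Resize(ref buffer, read); // trim the final short block
                await process.SendAsync(buffer);                      // respects BoundedCapacity
            }
        }

        process.Complete();
        await write.Completion; // wait until every block has been written
    }
}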
You should use Microsoft's Reactive Framework (aka Rx): install the System.Reactive NuGet package and add using System.Reactive.Linq; then you can do this:
var query =
    Observable.Using(() => new FileStream(@"filePath", FileMode.Open), readStream =>
        Observable.Using(() => new BufferedStream(readStream), readBs =>
            Observable.Using(() => MD5.Create(), hasher =>
                Observable.Using(() => new CryptoStream(readBs, hasher, CryptoStreamMode.Read), md5HashStream =>
                    Observable.Using(() => File.OpenWrite(@"destPath"), writeStream =>
                        Observable.Using(() => new BufferedStream(writeStream), writeBs =>
                            Observable.FromAsync(() => md5HashStream.CopyToAsync(writeBs))))))));

query.Wait(); // or await query;
I consistently get results 2x to 5x faster than your original code.
You can use the ReadAsync and WriteAsync methods of the streams to await the IO operations and read some fixed blockSize number of bytes into a buffer. However, since ReadAsync may read fewer bytes than requested, you need to make sure to read blockSize bytes with a loop.
int blockSize = 1024;
using (FileStream readStream = new FileStream("filePath", FileMode.Open))
using (BufferedStream readBs = new BufferedStream(readStream))
using (FileStream writeStream = File.OpenWrite("destPath"))
using (BufferedStream writeBs = new BufferedStream(writeStream))
{
    int offset;
    var buffer = new byte[blockSize];
    do
    {
        offset = 0;
        while (offset < buffer.Length)
        {
            // make sure to read blockSize bytes
            var bytesRead = await readBs.ReadAsync(buffer, offset, buffer.Length - offset);
            if (bytesRead == 0) break;
            offset += bytesRead;
        }
        if (offset > 0)
        {
            var result = DoSomethingWithData(buffer, offset); // assumption: returns a new byte[] with only the relevant data
            await writeBs.WriteAsync(result, 0, result.Length);
        }
    } while (0 < offset);
}
We're trying to measure the performance difference between reading a series of files using sync methods vs async. We expected the two to take about the same time, but it turns out the async version is about 5.5x slower.
This might be due to the overhead of managing the threads, but we just wanted to get your opinion. Maybe we're measuring the timings wrong.
These are the methods being tested:
static void ReadAllFile(string filename)
{
    var content = File.ReadAllBytes(filename);
}

static async Task ReadAllFileAsync(string filename)
{
    using (var file = File.OpenRead(filename))
    {
        using (var ms = new MemoryStream())
        {
            byte[] buff = new byte[file.Length];
            await file.ReadAsync(buff, 0, (int)file.Length);
        }
    }
}
And this is the method that runs them and starts the stopwatch:
static void Test(string name, Func<string, Task> gettask, int count)
{
    Stopwatch sw = new Stopwatch();
    Task[] tasks = new Task[count];
    sw.Start();
    for (int i = 0; i < count; i++)
    {
        string filename = "file" + i + ".bin";
        tasks[i] = gettask(filename);
    }
    Task.WaitAll(tasks);
    sw.Stop();
    Console.WriteLine(name + " {0} ms", sw.ElapsedMilliseconds);
}
Which is all run from here:
static void Main(string[] args)
{
    int count = 10000;
    for (int i = 0; i < count; i++)
    {
        Write("file" + i + ".bin");
    }
    Console.WriteLine("Testing read...!");

    Test("Read Contents", (filename) => Task.Run(() => ReadAllFile(filename)), count);
    Test("Read Contents Async", (filename) => ReadAllFileAsync(filename), count);

    Console.ReadKey();
}
And the helper write method:
static void Write(string filename)
{
    Data obj = new Data()
    {
        Header = "random string size here"
    };

    int size = 1024 * 20; // 1024 * 256;
    obj.Body = new byte[size];
    for (var i = 0; i < size; i++)
    {
        obj.Body[i] = (byte)(i % 256);
    }

    Stopwatch sw = new Stopwatch();
    sw.Start();

    MemoryStream ms = new MemoryStream();
    Serializer.Serialize(ms, obj);
    ms.Position = 0;
    using (var file = File.Create(filename))
    {
        ms.CopyToAsync(file).Wait();
    }

    sw.Stop();
    //Console.WriteLine("Writing file {0}", sw.ElapsedMilliseconds);
}
The results:
Read Contents: 574 ms
Read Contents Async: 3160 ms
We would really appreciate it if anyone could shed some light on this, as we searched Stack Overflow and the web but couldn't really find a proper explanation.
There are lots of things wrong with the testing code. Most notably, your "async" test does not use async I/O; with file streams, you have to explicitly open them as asynchronous or else you're just doing synchronous operations on a background thread. Also, your file sizes are very small and can be easily cached.
I modified the test code to write out much larger files, to have comparable sync vs async code, and to make the async code asynchronous:
static void Main(string[] args)
{
    Write("0.bin");
    Write("1.bin");
    Write("2.bin");

    ReadAllFile("2.bin"); // warmup
    var sw = new Stopwatch();
    sw.Start();
    ReadAllFile("0.bin");
    ReadAllFile("1.bin");
    ReadAllFile("2.bin");
    sw.Stop();
    Console.WriteLine("Sync: " + sw.Elapsed);

    ReadAllFileAsync("2.bin").Wait(); // warmup
    sw.Restart();
    ReadAllFileAsync("0.bin").Wait();
    ReadAllFileAsync("1.bin").Wait();
    ReadAllFileAsync("2.bin").Wait();
    sw.Stop();
    Console.WriteLine("Async: " + sw.Elapsed);

    Console.ReadKey();
}

static void ReadAllFile(string filename)
{
    using (var file = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, false))
    {
        byte[] buff = new byte[file.Length];
        file.Read(buff, 0, (int)file.Length);
    }
}

static async Task ReadAllFileAsync(string filename)
{
    using (var file = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, true))
    {
        byte[] buff = new byte[file.Length];
        await file.ReadAsync(buff, 0, (int)file.Length);
    }
}

static void Write(string filename)
{
    int size = 1024 * 1024 * 256;
    var data = new byte[size];
    var random = new Random();
    random.NextBytes(data);
    File.WriteAllBytes(filename, data);
}
On my machine, this test (built in Release, run outside the debugger) yields these numbers:
Sync: 00:00:00.4461936
Async: 00:00:00.4429566
All I/O operations are asynchronous at the OS level; the thread just waits (it gets suspended) for the I/O operation to finish. That's why, when you read Jeffrey Richter, he always tells you to do I/O asynchronously, so that your thread is not wasted waiting around.
From Jeffrey Richter:
Also, creating a thread is not cheap. Each thread gets 1 MB of address space reserved for user mode and another 12 KB for kernel mode. After this, the OS has to notify every DLL in the system that a new thread has been spawned. The same happens when you destroy a thread. Also think about the complexities of context switching.
Found a great SO answer here
I'm trying to return large files via a controller ActionResult and have implemented a custom FileResult class like the following.
public class StreamedFileResult : FileResult
{
    private string _FilePath;

    public StreamedFileResult(string filePath, string contentType)
        : base(contentType)
    {
        _FilePath = filePath;
    }

    protected override void WriteFile(System.Web.HttpResponseBase response)
    {
        using (FileStream fs = new FileStream(_FilePath, FileMode.Open, FileAccess.Read))
        {
            int bufferLength = 65536;
            byte[] buffer = new byte[bufferLength];
            int bytesRead = 0;

            while (true)
            {
                bytesRead = fs.Read(buffer, 0, bufferLength);
                if (bytesRead == 0)
                {
                    break;
                }

                response.OutputStream.Write(buffer, 0, bytesRead);
            }
        }
    }
}
However, the problem I am having is that the entire file appears to be buffered into memory. What would I need to do to prevent this?
You need to flush the response in order to prevent buffering. However, if you keep writing without setting Content-Length, the user will not see any progress. So, in order for users to see proper progress, IIS buffers the entire content, calculates Content-Length, applies compression, and then sends the response. We have adopted the following procedure to deliver files to the client with high performance.
FileInfo path = new FileInfo(filePath);

// user will not see progress if content-length is not specified
response.AddHeader("Content-Length", path.Length.ToString());
response.Flush(); // do not add any more headers after this...

byte[] buffer = new byte[4 * 1024]; // 4 KB is a good size for a network chunk

using (FileStream fs = path.OpenRead())
{
    int count = 0;
    while ((count = fs.Read(buffer, 0, buffer.Length)) > 0)
    {
        if (!response.IsClientConnected)
        {
            // network connection broke for some reason..
            break;
        }
        response.OutputStream.Write(buffer, 0, count);
        response.Flush(); // this will prevent buffering...
    }
}
You can change the buffer size, but 4 KB is ideal, as the lower-level file system also reads in chunks of 4 KB.
Akash Kava is partly right and partly wrong. You DO NOT need to add the Content-Length header or flush afterward. But you DO need to periodically flush response.OutputStream and then response. ASP.NET MVC (at least version 5) will automatically convert this into a "Transfer-Encoding: chunked" response.
byte[] buffer = new byte[4 * 1024]; // 4 KB is a good size for a network chunk

using (FileStream fs = path.OpenRead())
{
    int count = 0;
    while ((count = fs.Read(buffer, 0, buffer.Length)) > 0)
    {
        if (!response.IsClientConnected)
        {
            // network connection broke for some reason..
            break;
        }
        response.OutputStream.Write(buffer, 0, count);
        response.OutputStream.Flush();
        response.Flush(); // this will prevent buffering...
    }
}
I tested it and it works.
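As a related aside (not part of the answer above, and assuming the same StreamedFileResult class from the question): classic ASP.NET also exposes HttpResponseBase.BufferOutput, which can be switched off before writing so the runtime streams each chunk instead of buffering it. A minimal sketch:

// Sketch: disable output buffering up front instead of (or in addition to)
// flushing after every chunk.
protected override void WriteFile(System.Web.HttpResponseBase response)
{
    response.BufferOutput = false; // stream chunks to the client as they are written

    byte[] buffer = new byte[64 * 1024];
    using (var fs = new FileStream(_FilePath, FileMode.Open, FileAccess.Read))
    {
        int bytesRead;
        while ((bytesRead = fs.Read(buffer, 0, buffer.Length)) > 0)
        {
            if (!response.IsClientConnected)
                break; // client went away; stop streaming
            response.OutputStream.Write(buffer, 0, bytesRead);
        }
    }
}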