How can I pause a multi-file copy while another one is running? - c#

I started a small project for fun, liked it, and kept expanding it. I needed a simple file explorer (like the Windows one) just to see and open files, but now that I'm expanding it I have a problem with multiple copies. I want to copy some files from directory A and paste them into B, then while that is running start copying some files from directory C into D; while the copy A -> B is in progress, the copy C -> D should be paused, and when A -> B finishes the second copy should start. For now, I can copy files from A to B. Is there anything not too complex I can try?
I'm using a new form to display the progress bar, file name, file count, and size when starting a copy, and I'm using a BackgroundWorker.

I am assuming you are just calling File.Move or File.Copy; those don't give you the ability to pause the actual operation, so you will have to write your own Move/Copy operations.
E.g., to copy a file you could do the following:
public void CopyFile(string sourceFileName, string destFileName, bool overwrite)
{
    var outputFileMode = overwrite ? FileMode.Create : FileMode.CreateNew;
    using (var inputStream = new FileStream(sourceFileName, FileMode.Open, FileAccess.Read, FileShare.Read))
    using (var outputStream = new FileStream(destFileName, outputFileMode, FileAccess.Write, FileShare.None))
    {
        const int bufferSize = 16384; // 16 KB
        var buffer = new byte[bufferSize];
        int bytesRead;
        do
        {
            bytesRead = inputStream.Read(buffer, 0, bufferSize);
            outputStream.Write(buffer, 0, bytesRead);
        } while (bytesRead > 0); // Read may legally return less than bufferSize, so loop until 0
    }
}
Now that you have this code, you can simply add a while loop with a condition to "pause" the copy, e.g.:
while (_pause)
{
    Thread.Sleep(100);
}
This while loop goes inside the do loop of the above code. Here is the complete idea:
public void CopyFile(string sourceFileName, string destFileName, bool overwrite)
{
    var outputFileMode = overwrite ? FileMode.Create : FileMode.CreateNew;
    using (var inputStream = new FileStream(sourceFileName, FileMode.Open, FileAccess.Read, FileShare.Read))
    using (var outputStream = new FileStream(destFileName, outputFileMode, FileAccess.Write, FileShare.None))
    {
        const int bufferSize = 16384; // 16 KB
        var buffer = new byte[bufferSize];
        int bytesRead;
        do
        {
            // spin here until _pause == false
            while (_pause)
            {
                Thread.Sleep(100);
            }
            bytesRead = inputStream.Read(buffer, 0, bufferSize);
            outputStream.Write(buffer, 0, bytesRead);
        } while (bytesRead > 0);
    }
}
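The snippet above assumes a _pause field that the UI can flip. Here is a minimal sketch of that plumbing, assuming a WinForms form with two hypothetical buttons (btnPause, btnResume); marking the field volatile makes sure the worker thread sees the change promptly:
// Field read by the copy loop above; volatile so the background thread
// cannot cache a stale value.
private volatile bool _pause;

private void btnPause_Click(object sender, EventArgs e) => _pause = true;
private void btnResume_Click(object sender, EventArgs e) => _pause = false;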

You could use objects to represent different operations, something like
public interface IFileOperation
{
    // Action<double>, not Func<double>: the callback receives the progress value
    void Execute(Action<double> reportProgress, CancellationToken cancel);
}
You could then create a queue of multiple operations, and create a task on another thread to process each item
private CancellationTokenSource cts = new CancellationTokenSource();
public double CurrentProgress { get; private set; }
public void CancelCurrentOperation() => cts.Cancel();
public BlockingCollection<IFileOperation> Queue = new BlockingCollection<IFileOperation>(new ConcurrentQueue<IFileOperation>());

public void RunOnWorkerThread()
{
    foreach (var op in Queue.GetConsumingEnumerable())
    {
        cts = new CancellationTokenSource();
        CurrentProgress = 0;
        op.Execute(p => CurrentProgress = p, cts.Token);
    }
}
This will run file operations, one at a time, on a background thread, while allowing new operations to be added from the main thread. To report progress you would need a non-modal progress bar, i.e. instead of showing a dialog you should add a progress bar control somewhere in your UI; otherwise you would not be able to add new operations without cancelling the current one. You will also need some way to connect the progress bar to the currently running operation, for example by running a timer on the main thread that updates the property the progress bar is bound to. You can run the method either as a long-running task or on a dedicated thread.
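For example, either of these would start the worker (a sketch):
// as a long-running task...
Task.Factory.StartNew(RunOnWorkerThread, TaskCreationOptions.LongRunning);
// ...or as a dedicated background thread
new Thread(RunOnWorkerThread) { IsBackground = true }.Start();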
You could, if you wish, add pause/resume methods to the file operation. The answer by Rand Random shows how to copy files manually, so I will skip that here. You could also create a UI that shows a list of all queued file operations and allows removing queued tasks. You could even, with some more work, run multiple operations in parallel and show separate progress for each one.
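To make the idea concrete, here is a minimal, hypothetical implementation of one queued operation; the manual copy loop is the same approach shown in the other answer, with progress reporting and cancellation added:
public class CopyFileOperation : IFileOperation
{
    private readonly string _source;
    private readonly string _dest;

    public CopyFileOperation(string source, string dest)
    {
        _source = source;
        _dest = dest;
    }

    public void Execute(Action<double> reportProgress, CancellationToken cancel)
    {
        using (var input = new FileStream(_source, FileMode.Open, FileAccess.Read, FileShare.Read))
        using (var output = new FileStream(_dest, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            var buffer = new byte[81920];
            long copied = 0;
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                if (cancel.IsCancellationRequested)
                    return; // abandon the copy; cleaning up the partial file is up to you
                output.Write(buffer, 0, read);
                copied += read;
                reportProgress((double)copied / input.Length); // 0.0 .. 1.0
            }
        }
    }
}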

Related

Upload blocks in parallel in blob storage

I am trying to convert this to run in parallel to improve the upload times of a file, but what I have tried hasn't changed the times much.
I want to upload the blocks side by side and then commit them. How could I manage to do it in parallel?
public static async Task UploadInBlocks
    (BlobContainerClient blobContainerClient, string localFilePath, int blockSize)
{
    string fileName = Path.GetFileName(localFilePath);
    BlockBlobClient blobClient = blobContainerClient.GetBlockBlobClient(fileName);

    FileStream fileStream = File.OpenRead(localFilePath);
    ArrayList blockIDArrayList = new ArrayList();
    byte[] buffer;

    var bytesLeft = (fileStream.Length - fileStream.Position);
    while (bytesLeft > 0)
    {
        if (bytesLeft >= blockSize)
        {
            buffer = new byte[blockSize];
            await fileStream.ReadAsync(buffer, 0, blockSize);
        }
        else
        {
            buffer = new byte[bytesLeft];
            await fileStream.ReadAsync(buffer, 0, Convert.ToInt32(bytesLeft));
            bytesLeft = (fileStream.Length - fileStream.Position);
        }

        using (var stream = new MemoryStream(buffer))
        {
            string blockID = Convert.ToBase64String
                (Encoding.UTF8.GetBytes(Guid.NewGuid().ToString()));
            blockIDArrayList.Add(blockID);
            await blobClient.StageBlockAsync(blockID, stream);
        }
        bytesLeft = (fileStream.Length - fileStream.Position);
    }

    string[] blockIDArray = (string[])blockIDArrayList.ToArray(typeof(string));
    await blobClient.CommitBlockListAsync(blockIDArray);
}
You shouldn't expect any improvement - quite the opposite. Blob storage doesn't have any simplistic throughput throttling that would benefit from uploading over multiple streams, and you're already doing extremely lightweight I/O that is going to be entirely I/O bound.
Good I/O code gains nothing from parallelization. No matter how many workers you put on the job, the pipe is only so thick and will not let you push more data through it.
All your code does is reimplement the already very efficient mechanisms the blob storage library has... and it does so considerably worse, with pointless allocations, wrong arguments and new opportunities for bugs. Don't do that. The library can deal with streams just fine.
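For example, with the v12 Azure.Storage.Blobs SDK (which the question's code appears to use), you can hand the whole file to UploadAsync and tune StorageTransferOptions if you want chunked, concurrent transfer. A sketch; the numbers are illustrative, not recommendations:
var options = new BlobUploadOptions
{
    TransferOptions = new StorageTransferOptions
    {
        // chunk size and parallelism; tune for your link, these are guesses
        MaximumTransferSize = 8 * 1024 * 1024,
        MaximumConcurrency = 4
    }
};
await blobContainerClient.GetBlobClient(fileName).UploadAsync(localFilePath, options);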

C# async/await progress reporting is not in expected order

I am experimenting with async/await and progress reporting and therefore have written an async file copy method that reports progress after every copied MB:
public async Task CopyFileAsync(string sourceFile, string destFile, CancellationToken ct, IProgress<int> progress)
{
    var bufferSize = 1024 * 1024;
    byte[] bytes = new byte[bufferSize];
    using (var source = new FileStream(sourceFile, FileMode.Open, FileAccess.Read))
    using (var dest = new FileStream(destFile, FileMode.Create, FileAccess.Write))
    {
        var totalBytes = source.Length;
        var copiedBytes = 0L; // long, so copiedBytes * 100 cannot overflow int on large files
        int bytesRead;
        while ((bytesRead = await source.ReadAsync(bytes, 0, bufferSize, ct)) > 0)
        {
            await dest.WriteAsync(bytes, 0, bytesRead, ct);
            copiedBytes += bytesRead;
            progress?.Report((int)(copiedBytes * 100 / totalBytes));
        }
    }
}
In a console application I create a file with 10 MB of random content and then copy it using the method above:
private void MainProgram(string[] args)
{
    Console.WriteLine("Create File...");
    var dir = Path.GetDirectoryName(typeof(MainClass).Assembly.Location);
    var file = Path.Combine(dir, "file.txt");
    var dest = Path.Combine(dir, "fileCopy.txt");
    var rnd = new Random();
    const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";
    var str = new string(Enumerable
        .Range(0, 1024 * 1024 * 10)
        .Select(i => chars[rnd.Next(chars.Length)]) // was letters[rnd.Next(chars.Length - 1)], which doesn't compile and skips the last char
        .ToArray());
    File.WriteAllText(file, str);

    var source = new CancellationTokenSource();
    var token = source.Token;
    var progress = new Progress<int>();
    progress.ProgressChanged += (sender, percent) => Console.WriteLine($"Progress: {percent}%");

    var task = CopyFileAsync(file, dest, token, progress);
    Console.WriteLine("Start Copy...");
    Console.ReadLine();
}
After the application has executed, both files are identical, so the copy process is carried out in the correct order. However, the Console output is something like:
Create File...
Start Copy...
Progress: 10%
Progress: 30%
Progress: 20%
Progress: 60%
Progress: 50%
Progress: 70%
Progress: 80%
Progress: 40%
Progress: 90%
Progress: 100%
The order differs every time I call the application. I don't understand this behaviour. If I put a Breakpoint to the event handler and check each value, they are in the correct order. Can anyone explain this to me?
I want to use this later in a GUI application with a progress bar and don't want to have it jumping back and forward all the time.
Progress<T> captures the current SynchronizationContext when created. If there is no SynchronizationContext (like in a console app), progress callbacks are scheduled to thread pool threads. That means multiple callbacks can even run in parallel, and of course their order is not guaranteed.
In UI applications, posting to the synchronization context is roughly equivalent to:
In WPF: Dispatcher.BeginInvoke()
In WinForms: Control.BeginInvoke()
I'm not working with WinForms, but in WPF multiple BeginInvoke calls with the same priority (and in this case they do have the same priority) are guaranteed to execute in the order they were invoked:
If multiple BeginInvoke calls are made at the same DispatcherPriority, they will be executed in the order the calls were made.
I don't see why WinForms' Control.BeginInvoke might execute out of order either, though I'm not aware of a proof like the one I quoted above for WPF. So I think in both WPF and WinForms you can safely rely on your progress callbacks being executed in order (provided that you created the Progress<T> instance itself on the UI thread, so that the context could be captured).
Side note: don't forget to add ConfigureAwait(false) to your ReadAsync and WriteAsync calls, to avoid returning to the UI thread after every one of those awaits in UI applications.
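If you want ordered reports in the console app as well, one option (a sketch, not a BCL type) is a trivially synchronous IProgress<T> that invokes the handler directly on the reporting thread:
public class SynchronousProgress<T> : IProgress<T>
{
    private readonly Action<T> _handler;
    public SynchronousProgress(Action<T> handler) => _handler = handler;
    // Runs inline on the caller's thread, so reports stay in order
    // (and block the copy loop while the handler runs).
    public void Report(T value) => _handler(value);
}

// usage:
// var progress = new SynchronousProgress<int>(p => Console.WriteLine($"Progress: {p}%"));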

FileStream.ReadAsync very slow compared to Read()

I have the following code to loop through a file and read 1024 bytes at a time. The first pass uses FileStream.Read() and the second pass uses FileStream.ReadAsync().
private async void Button_Click(object sender, RoutedEventArgs e)
{
    await Task.Run(() => Test()).ConfigureAwait(false);
}

private async Task Test()
{
    Stopwatch sw = new Stopwatch();
    sw.Start();
    int readSize;
    int blockSize = 1024;
    byte[] data = new byte[blockSize];
    string theFile = @"C:\test.mp4";
    long totalRead = 0;
    using (FileStream fs = new FileStream(theFile, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    {
        readSize = fs.Read(data, 0, blockSize);
        while (readSize > 0)
        {
            totalRead += readSize;
            readSize = fs.Read(data, 0, blockSize);
        }
    }
    sw.Stop();
    Console.WriteLine($"Read() Took {sw.ElapsedMilliseconds}ms and totalRead: {totalRead}");

    sw.Reset();
    sw.Start();
    totalRead = 0;
    using (FileStream fs = new FileStream(theFile, FileMode.Open, FileAccess.Read, FileShare.ReadWrite, (blockSize * 2), FileOptions.Asynchronous | FileOptions.SequentialScan))
    {
        readSize = await fs.ReadAsync(data, 0, blockSize).ConfigureAwait(false);
        while (readSize > 0)
        {
            totalRead += readSize;
            readSize = await fs.ReadAsync(data, 0, blockSize).ConfigureAwait(false);
        }
    }
    sw.Stop();
    Console.WriteLine($"ReadAsync() Took {sw.ElapsedMilliseconds}ms and totalRead: {totalRead}");
}
And the result:
Read() Took 162ms and totalRead: 162835040
ReadAsync() Took 15597ms and totalRead: 162835040
The ReadAsync() is about 100 times slower. Am I missing anything? The only thing I can think of is the overhead to create and destroy task using ReadAsync(), but is the overhead that much?
UPDATE:
I've changed the above code to reflect the suggestion by @Cory. There is a slight improvement:
Read() Took 142ms and totalRead: 162835040
ReadAsync() Took 12288ms and totalRead: 162835040
When I increase the read block size to 1 MB as suggested by @Alexandru, the results are much more acceptable:
Read() Took 32ms and totalRead: 162835040
ReadAsync() Took 76ms and totalRead: 162835040
So this hints that it is indeed the per-call overhead of the tasks that causes the slowness. But if creating and completing a task takes merely 100 µs, things still don't really add up for the slowness with a small block size.
Stick with big buffers if you're doing async, and make sure to turn on async mode in the FileStream constructor, and you should be okay. Each awaited async call like this traps out of and back into the current thread (and in your case the current thread is the UI thread, which can be lagged by any other async method doing the same in-and-out trapping), so there is some per-call overhead. With a large number of calls that adds up: imagine constructing a new thread and waiting for it to finish about 100K times, especially in a UI app where you have to wait for the UI thread to be free before trapping back into it once the async function completes. To reduce the number of calls, simply read data in larger increments by increasing the buffer size, so the application spends its time reading rather than dispatching. Make sure to test this code in Release mode so all compiler optimizations are available and the debugger does not slow us down:
class Program
{
    static void Main(string[] args)
    {
        DoStuff();
        Console.ReadLine();
    }

    public static async void DoStuff()
    {
        var filename = @"C:\Example.txt";
        var sw = new Stopwatch();
        sw.Start();
        ReadAllFile(filename);
        sw.Stop();
        Console.WriteLine("Sync: " + sw.Elapsed);
        sw.Restart();
        await ReadAllFileAsync(filename);
        sw.Stop();
        Console.WriteLine("Async: " + sw.Elapsed);
    }

    static void ReadAllFile(string filename)
    {
        byte[] buffer = new byte[131072];
        using (var file = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.Read, buffer.Length, false))
            while (true)
                if (file.Read(buffer, 0, buffer.Length) <= 0)
                    break;
    }

    static async Task ReadAllFileAsync(string filename)
    {
        byte[] buffer = new byte[131072];
        using (var file = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.Read, buffer.Length, true))
            while (true)
                if ((await file.ReadAsync(buffer, 0, buffer.Length)) <= 0)
                    break;
    }
}
Results:
Sync: 00:00:00.3092809
Async: 00:00:00.5541262
Pretty negligible...the file is about 1 GB.
Let's say I go even bigger, a 1 MB buffer, AKA new byte[1048576] (come on man, everyone has 1 MB of RAM these days):
Sync: 00:00:00.2925763
Async: 00:00:00.3402034
Then it's just a few hundredths of a second difference. If you blink, you'll miss it.
Your method signature suggests you're doing this from a WPF app. While the blocking code simply ties up the UI thread for the duration, the async code has to go through the UI message queue every time an asynchronous operation completes, slowing it down and competing with any UI messages. You should try moving it off the UI thread like so:
void Button_Click(object sender, RoutedEventArgs e)
{
    Task.Run(() => Button_Click_Impl());
}

async Task Button_Click_Impl()
{
    // put code here.
}
Next, open the file in async mode. If you don't do this, async is emulated and will go much slower:
new FileStream(theFile, FileMode.Open, FileAccess.Read, FileShare.ReadWrite, 4096,
FileOptions.Asynchronous | FileOptions.SequentialScan)
Finally, you may also be able to extract some small performance using ConfigureAwait(false) to avoid moving between threads:
readSize = await fs.ReadAsync(data, 0, 1024).ConfigureAwait(false);
The overhead of a single ReadAsync operation is much higher than that of a single Read operation (especially if you do not use the right mode when opening the file; see the other answers). If you will eventually end up with the whole file in memory anyway, just query the file's size, allocate a large enough buffer and read it all at once. Otherwise, you can still increase the buffer size to e.g. 32 MiB, or even larger if you expect larger files. That should speed everything up considerably.
Only bother launching a new task if there is considerable CPU-bound work for each block. Otherwise the UI should be kept responsive by the ReadAsync operations (with a sufficiently large buffer) taking their time (if a read completes immediately, you may still be blocking the UI; see the remarks at Task.Yield()).
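A sketch of the "read it all at once" variant (inside an async method; assumes the file fits comfortably in memory, and theFile is the path from the question):
using (var fs = new FileStream(theFile, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, FileOptions.Asynchronous | FileOptions.SequentialScan))
{
    var data = new byte[fs.Length];
    int offset = 0;
    // ReadAsync may return fewer bytes than requested, so loop until the buffer is full
    while (offset < data.Length)
        offset += await fs.ReadAsync(data, offset, data.Length - offset);
}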

Reading a lot of files "at the same time"

I'm using FileSystemWatcher to catch every created, changed, deleted and renamed event for any file in a folder.
For each of these changes I need to compute a simple checksum of the file's contents. I simply open a FileStream and pass it to the MD5 class:
private byte[] calculateChecksum(string frl)
{
    using (FileStream stream = File.Open(frl, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    {
        // note: an MD5 instance is not thread-safe, so this.md5 must not be
        // shared across concurrent workers
        return this.md5.ComputeHash(stream);
    }
}
The problem is the number of files I may need to handle. For example, imagine 200 files created over time in a folder, and then I copy all of them and paste them into the same folder. This action will cause 200 events and 200 calculateChecksum() runs.
How could I solve this kind of problem?
In the FileSystemWatcher handler, put tasks into a queue that is processed by some worker. The worker can process the checksum tasks at a controlled speed and/or frequency. One worker is probably better, because many concurrent readers can slow a HDD down with read seeks.
Try reading about BlockingCollection:
https://msdn.microsoft.com/ru-ru/library/dd997371(v=vs.110).aspx
and the Producer-Consumer Dataflow Pattern:
https://msdn.microsoft.com/ru-ru/library/hh228601(v=vs.110).aspx
var workerCount = 2;
BlockingCollection<String>[] filesQueues = new BlockingCollection<String>[workerCount];

for (int i = 0; i < workerCount; i++)
{
    filesQueues[i] = new BlockingCollection<String>(500);
    int index = i; // capture a copy, so each worker reads its own queue

    // Worker
    Task.Run(() =>
    {
        while (!filesQueues[index].IsCompleted)
        {
            string url = null;
            try
            {
                url = filesQueues[index].Take();
            }
            catch (InvalidOperationException) { } // queue was completed while waiting
            if (!string.IsNullOrWhiteSpace(url))
            {
                calculateChecksum(url);
            }
        }
    });
}

// inside of the FileSystemWatcher handler
var queueIndex = hash(fileName) % workerCount; // hash(): any stable non-negative string hash (not shown)
// Warning!!
// Add blocks if the queue already holds BoundedCapacity (here 500) items
filesQueues[queueIndex].Add(fileName);

// when shutting down, so the workers can drain their queues and exit:
foreach (var q in filesQueues) q.CompleteAdding();
You can also have multiple consumers on a single queue; just call Take or TryTake concurrently - each item will only be consumed by a single consumer. But take into account that in that case one file can be processed by any worker, and multiple concurrent HDD readers can slow the HDD down.
UPD: with multiple workers, it is better to make multiple BlockingCollections and route each file to a queue by index, as in the code above.
I've sketched a consumer-producer pattern to solve that, and I've tried to use a thread pool to smooth out the large amount of work, sharing a BlockingCollection:
BlockingCollection & ThreadPool:
private BlockingCollection<Index.ResourceIndexDocument> documents;

this.pool = new SmartThreadPool(SmartThreadPool.DefaultIdleTimeout, 4);
// was new BlockingCollection<string>(), which doesn't match the field's type
this.documents = new BlockingCollection<Index.ResourceIndexDocument>();
As you can see, I've created the thread pool with concurrency set to 4, so only 4 threads will run at the same time regardless of whether there are more than 4 work units waiting in the pool.
Producer:
public void warn(string channel, string frl)
{
    // lambda takes two parameters to match the two type arguments;
    // this.files is another BlockingCollection<string> (not shown)
    this.pool.QueueWorkItem<string, string>(
        (chan, file) => this.files.Add(file),
        channel,
        frl
    );
}
Consumer:
Task.Factory.StartNew(() =>
    {
        Index.ResourceIndexDocument document = null;
        while (this.documents.TryTake(out document, TimeSpan.FromSeconds(1)))
        {
            IEnumerable<Index.ResourceIndexDocument> documents = this.documents.Take(this.documents.Count);
            Index.IndexEngine.Instance.index(documents);
        }
    },
    TaskCreationOptions.LongRunning
);

Method that can be accessed by multiple threads

I want to write a method that opens (or creates, if it does not exist) a file from different threads.
What should the FileAccess and FileShare flags be in this case? I tried both FileAccess.Read/Write and FileShare.Read/Write but don't see any difference. I used the following code to test; it looks fine, but I'm not sure about the last two flags.
Can anybody clarify whether I should use FileAccess.ReadWrite or FileAccess.Read, and FileShare.ReadWrite or FileShare.Read?
class Program
{
    static void Main(string[] args)
    {
        Task first = Task.Run(() => AccessFile());
        Task second = Task.Run(() => AccessFile());
        Task third = Task.Run(() => AccessFile());
        Task fourth = Task.Run(() => AccessFile());
        Task.WaitAll(first, second, third, fourth);

        Task[] tasks = new Task[100];
        for (int i = 0; i < 100; i++)
        {
            tasks[i] = Task.Run(() => AccessFile());
        }
        Task.WaitAll(tasks);
    }

    public static void AccessFile()
    {
        string path = @"c:\temp\test.txt";
        // FileShare.Write gives access violation
        using (FileStream fs = File.Open(path, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.ReadWrite))
        {
            byte[] by = new byte[100];
            fs.Read(by, 0, 100);
            Console.WriteLine(Encoding.ASCII.GetString(by));
        }
    }
}
If multiple threads need to write to the file, you need an exclusive lock on it. If some threads read and others write, you could use a ReaderWriterLockSlim.
The FileShare parameter indicates how other threads/processes can access the file while the current handle is open. ReadWrite means other threads will be able to read and write this file, which obviously means that if you write to the file through the current handle at the same time, you will probably corrupt the contents.
Since you are only reading, you should request just that (FileAccess.Read). And since you are OK with others reading at the same time, say that too (FileShare.Read).
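Applied to the test code above, that would look like this (a sketch; OpenOrCreate keeps the "create if missing" behaviour):
using (FileStream fs = File.Open(path, FileMode.OpenOrCreate, FileAccess.Read, FileShare.Read))
{
    byte[] by = new byte[100];
    int read = fs.Read(by, 0, 100); // Read may return fewer than 100 bytes
    Console.WriteLine(Encoding.ASCII.GetString(by, 0, read));
}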
And here's nicer test code:
ParallelEnumerable.Range(0, 100000).ForAll(_ => AccessFile());
