How to force FileSystemWatcher to wait till the file is downloaded? - C#

I am downloading a file and want to execute the install only after the download is complete. How do I accomplish this? It seems like FileSystemWatcher's Created event would do this, but it fires on a different thread. Is there a simple way to force the waiting part to happen on the same thread?
Code I have so far
FileSystemWatcher w = new FileSystemWatcher(@"C:\downloads");
w.Created += new FileSystemEventHandler(FileDownloaded);
w.EnableRaisingEvents = true;

static void FileDownloaded(object source, FileSystemEventArgs e)
{
    InstallMSI(e.FullPath);
}
I looked at SynchronizingObject and WaitForChangedResult but didn't get a solid working sample.
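For reference, a minimal sketch of the synchronous route hinted at above: FileSystemWatcher.WaitForChanged blocks the calling thread until a matching change occurs, so no event handler or second thread is involved (InstallMSI and the path are taken from the question; note the file may still be locked right after creation, so the lock check from the answer below still applies):

using System.IO;

// Blocks the calling thread until a file is created in the watched folder.
FileSystemWatcher watcher = new FileSystemWatcher(@"C:\downloads");
WaitForChangedResult result = watcher.WaitForChanged(WatcherChangeTypes.Created);

// result.Name is relative to the watched directory.
InstallMSI(Path.Combine(@"C:\downloads", result.Name));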

Try:
FileInfo fInfo = new FileInfo(e.FullPath);
while (IsFileLocked(fInfo)) {
    Thread.Sleep(500);
}
InstallMSI(e.FullPath);

static bool IsFileLocked(FileInfo file)
{
    FileStream stream = null;
    try {
        // Opening with FileShare.None fails while another process still holds the file.
        stream = file.Open(FileMode.Open, FileAccess.ReadWrite, FileShare.None);
    }
    catch (IOException) {
        return true;
    }
    finally {
        if (stream != null)
            stream.Close();
    }
    return false;
}

If you insist on using FileSystemWatcher, you probably have to account for the fact that a file of any size isn't created (uploaded) in one single operation. The file system is likely to produce one Created and several Changed events before the file is ready for use.
You could catch the Created events and start a new dedicated thread for each file (unless you already have one running for that file), in which you loop and periodically try to open the file exclusively. If the open succeeds, the file is ready; a sketch follows.
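A minimal sketch of that idea, assuming the InstallMSI method from the question (the poll interval and the per-file bookkeeping are illustrative choices):

using System.Collections.Concurrent;
using System.IO;
using System.Threading;

private static readonly ConcurrentDictionary<string, bool> inProgress =
    new ConcurrentDictionary<string, bool>();

static void OnCreated(object sender, FileSystemEventArgs e)
{
    // Start only one waiter per file, even if multiple events fire for it.
    if (!inProgress.TryAdd(e.FullPath, true))
        return;

    new Thread(() =>
    {
        while (true)
        {
            try
            {
                // An exclusive open succeeds only once the writer has finished.
                using (File.Open(e.FullPath, FileMode.Open, FileAccess.Read, FileShare.None))
                    break;
            }
            catch (IOException)
            {
                Thread.Sleep(500);
            }
        }
        InstallMSI(e.FullPath);
        inProgress.TryRemove(e.FullPath, out _);
    }).Start();
}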

One technique would be to download to the temporary directory, and then move the file into C:/downloads once it is complete; on the same volume the move is effectively a rename, so the watcher only ever sees a finished file.
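A minimal sketch of that approach (the URL and file names are placeholders):

using System.IO;
using System.Net;

string tempPath = Path.Combine(Path.GetTempPath(), "setup.msi");
string finalPath = @"C:\downloads\setup.msi";

using (var client = new WebClient())
{
    client.DownloadFile("http://example.com/setup.msi", tempPath);
}

// On the same volume a move is a rename, so the watcher on C:\downloads
// only ever sees the finished file appear.
File.Move(tempPath, finalPath);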

If you are using WebClient to download, you can set the client's DownloadFileCompleted event handler.
If you do it this way you can also use client.DownloadFileAsync() to make the download asynchronous.
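A minimal sketch, assuming the InstallMSI method from the question and a placeholder URL:

using System;
using System.Net;

string target = @"C:\downloads\setup.msi";

var client = new WebClient();
client.DownloadFileCompleted += (sender, e) =>
{
    if (e.Error == null && !e.Cancelled)
        InstallMSI(target);   // runs only after the download has fully finished
};
client.DownloadFileAsync(new Uri("http://example.com/setup.msi"), target);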

Related

Wait till all files are downloaded and till processing of those files is finished

I have a listOfFilesToDownload. I want to download all the files in the list in parallel.
.........
Parallel.ForEach(listOfFilesToDownload, (file) =>
{
    SaveFile(file, myModel);
});

private static void SaveFile(string file, MyType myModel)
{
    string filePath = "...";
    try
    {
        using (WebClient webClient = new WebClient())
        {
            // The returned Task is not awaited here, which is the problem
            webClient.DownloadFileTaskAsync(file, filePath);
        }
        //some time consuming process with downloaded file
    }
    catch (Exception ex)
    {
    }
}
In the SaveFile method I download the file, then I want to wait till it is downloaded, then do some processing with the file and wait till that processing is finished. The full iteration has to be: download a file and process it.
So, the questions are:
How to wait till the file is downloaded in the best way, so that nothing is blocked and performance stays maximal? (If I just used DownloadFile it would block the thread while the file downloads, which I think is not so good.)
How to ensure that the file is downloaded and only then start processing it? (If I start to process a file that does not exist or is not fully downloaded, I will get an error or wrong data.)
How to be sure the processing of the file is finished? (I tried to use the webClient.DownloadFileCompleted event and process the file there, but I couldn't ensure that the processing had finished; example below.)
In short, the question is how to wait for a file to download asynchronously AND wait till it's processed.
using (WebClient webClient = new WebClient())
{
    webClient.DownloadFileCompleted += DownloadFileCompleted(filePath, myModel);
    webClient.DownloadFileTaskAsync(file, filePath);
}
DownloadFileCompleted returns an AsyncCompletedEventHandler:
public static AsyncCompletedEventHandler DownloadFileCompleted(string filePath, MyType myModel)
{
    Action<object, AsyncCompletedEventArgs> action = (sender, e) =>
    {
        if (e.Error != null)
            return;
        //some time consuming process with downloaded file
    };
    return new AsyncCompletedEventHandler(action);
}
Many thanks in advance!
Have you considered Task.WhenAll? Something like:
var tasks = listOfFilesToDownload
    .AsParallel()
    .Select(f => SaveFile(f, myModel))
    .ToList();

await Task.WhenAll(tasks);

private static async Task SaveFile(string file, MyType myModel)
{
    string filePath = "...";
    using (WebClient webClient = new WebClient())
    {
        await webClient.DownloadFileTaskAsync(file, filePath);
        // process downloaded file
    }
}
The .AsParallel() call is helpful if you have CPU-bound work you're doing after downloading the file. Otherwise you're better off without it.
As stated in this answer, the whole idea behind Parallel.ForEach() is that you have a set of threads and each thread processes part of the collection, so you can't await the saving part. What you could do instead is use Dataflow, which supports asynchronous tasks well, or simply collect the tasks and await them all.
Like this:
var downloadTasks = listOfFilesToDownload.Select(file => SaveFile(file, myModel));

await Task.WhenAll(downloadTasks);
You await until all the files are saved.
Other answers on the same question might be useful to you.
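For the Dataflow route mentioned above, a minimal sketch using an ActionBlock (the degree of parallelism is an arbitrary choice; SaveFile is the async method from the previous answer):

using System.Threading.Tasks.Dataflow;

var block = new ActionBlock<string>(
    file => SaveFile(file, myModel),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

foreach (var file in listOfFilesToDownload)
    block.Post(file);

block.Complete();        // signal that no more files will be posted
await block.Completion;  // wait until every download and its processing finish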

Reading a lot of files "at the same time"

I'm using FileSystemWatcher in order to catch every created, changed, deleted and renamed change on any file in a folder.
On these changes I need to compute a simple checksum of the contents of each file. Simply, I open a FileStream and pass it to the MD5 class:
private byte[] calculateChecksum(string frl)
{
    using (FileStream stream = File.Open(frl, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    {
        return this.md5.ComputeHash(stream);
    }
}
The problem is the number of files I need to handle. For example, imagine 200 files were created over time in a folder, and then I copy all of them and paste them into the same folder. This action is going to cause 200 events and 200 calculateChecksum() executions.
How could I solve this kind of problem?
In the FileSystemWatcher handler, put tasks into a queue that will be processed by some worker. The worker can process checksum-calculation tasks at a targeted speed and/or frequency. One worker is probably better, because many concurrent readers can slow the HDD down with read seeks.
Try reading about BlockingCollection:
https://msdn.microsoft.com/ru-ru/library/dd997371(v=vs.110).aspx
and the Producer-Consumer Dataflow Pattern:
https://msdn.microsoft.com/ru-ru/library/hh228601(v=vs.110).aspx
var workerCount = 2;
BlockingCollection<String>[] filesQueues = new BlockingCollection<String>[workerCount];

for (int i = 0; i < workerCount; i++)
{
    filesQueues[i] = new BlockingCollection<String>(500);

    int index = i; // capture a copy of the loop variable for the closure
    // Worker
    Task.Run(() =>
    {
        while (!filesQueues[index].IsCompleted)
        {
            string url = null;
            try
            {
                url = filesQueues[index].Take();
            }
            catch (InvalidOperationException) { } // thrown when the collection is marked complete

            if (!string.IsNullOrWhiteSpace(url))
            {
                calculateChecksum(url);
            }
        }
    });
}

// inside of FileSystemWatcher handler
var queueIndex = hash(fileName) % workerCount;
// Warning!!
// Add blocks if the queue already holds BoundedCapacity (500) items
filesQueues[queueIndex].Add(fileName);

// call CompleteAdding only when no more files will ever be added
// (e.g. on shutdown), not after every Add
filesQueues[queueIndex].CompleteAdding();
Also you can make multiple consumers: just call Take or TryTake concurrently and each item will only be consumed by a single consumer. But take into account that in that case one file can be processed by many workers, and multiple HDD readers can slow the HDD down.
UPD: in case of multiple workers it is better to make multiple BlockingCollections and push each file into the queue chosen by index, as the code above does.
I've sketched a consumer-producer pattern to solve this, and I've tried to use a thread pool in order to smooth out the big amount of work, sharing a BlockingCollection:
BlockingCollection & ThreadPool:
private BlockingCollection<Index.ResourceIndexDocument> documents;

this.pool = new SmartThreadPool(SmartThreadPool.DefaultIdleTimeout, 4);
this.documents = new BlockingCollection<Index.ResourceIndexDocument>();
As you can see, I've created a thread pool with concurrency set to 4. So only 4 threads work at the same time, regardless of whether there are x > 4 work units to handle in the pool.
Producer:
public void warn(string channel, string frl)
{
    this.pool.QueueWorkItem<string, string>(
        // the two queued arguments arrive as lambda parameters;
        // ResourceIndexDocument is assumed to be constructible from the path
        (ch, file) => this.documents.Add(new Index.ResourceIndexDocument(file)),
        channel,
        frl
    );
}
Consumer:
Task.Factory.StartNew(() =>
{
    Index.ResourceIndexDocument document = null;
    while (this.documents.TryTake(out document, TimeSpan.FromSeconds(1)))
    {
        // index each document as it is taken (enumerating the collection with
        // LINQ's Take would inspect items without removing them)
        Index.IndexEngine.Instance.index(new[] { document });
    }
},
TaskCreationOptions.LongRunning
);

FileSystemWatcher skips some events

If you google for FileSystemWatcher issues, you will find a lot of articles about FileSystemWatcher skipping some events (not firing for all of them). Basically, if you change a lot of files in the watched folder, some of them may not be processed by FileSystemWatcher.
Why is that so, and how can I avoid missing events?
Cause
FileSystemWatcher watches for changes happening in some folder. When a file is changed (e.g. created), FileSystemWatcher raises the appropriate event. The event handler might unzip the file, read its content to decide how to process it further, write a record of it into a database log table and move the file to another folder. Processing the file might take some time.
During that time another file might be created in the watched folder. Since FileSystemWatcher's event handler is busy processing the first file, it cannot handle the creation event of the second file, so the second file is missed by FileSystemWatcher.
Solution
Since file processing might take some time and the creation of other files might go undetected, file processing should be separated from file change detection, and the detection should be so short that it never misses a single file change. File handling can therefore be divided between two threads: one for file change detection and the other for file processing. When a file change is detected by FileSystemWatcher, the event handler should only read the file's path, forward it to the processing thread, and return immediately, so FileSystemWatcher can detect the next change and reuse the same event handler. The processing thread can take as much time as it needs to process the file. A queue is used to forward file paths from the event-handler thread to the processing thread.
This is classic producer-consumer problem. More about producer-consumer queue can be found here.
Code
using System;
using System.IO;
using System.Threading;
using System.Collections.Generic;

namespace FileSystemWatcherExample {
    class Program {
        // Shared with the static event handler below
        private static FileProcessor fileProcessor;

        static void Main(string[] args) {
            // If a directory and filter are not specified, exit program
            if (args.Length != 2) {
                // Display the proper way to call the program
                Console.WriteLine("Usage: Watcher.exe \"directory\" \"filter\"");
                return;
            }
            fileProcessor = new FileProcessor();
            // Create a new FileSystemWatcher
            FileSystemWatcher fileSystemWatcher1 = new FileSystemWatcher();
            // Set FileSystemWatcher's properties
            fileSystemWatcher1.Path = args[0];
            fileSystemWatcher1.Filter = args[1];
            fileSystemWatcher1.IncludeSubdirectories = false;
            // Add event handlers
            fileSystemWatcher1.Created += new FileSystemEventHandler(fileSystemWatcher1_Created);
            // Start to watch
            fileSystemWatcher1.EnableRaisingEvents = true;
            // Wait for the user to quit the program
            Console.WriteLine("Press 'q' to quit the program.");
            while (Console.Read() != 'q');
            // Turn off FileSystemWatcher
            if (fileSystemWatcher1 != null) {
                fileSystemWatcher1.EnableRaisingEvents = false;
                fileSystemWatcher1.Dispose();
                fileSystemWatcher1 = null;
            }
            // Dispose fileProcessor
            if (fileProcessor != null)
                fileProcessor.Dispose();
        }

        // Define the event handler (static, so it can be attached from Main)
        private static void fileSystemWatcher1_Created(object sender, FileSystemEventArgs e) {
            // If a file is created...
            if (e.ChangeType == WatcherChangeTypes.Created) {
                // ...enqueue its file name so it can be processed...
                fileProcessor.EnqueueFileName(e.FullPath);
            }
            // ...and immediately finish the event handler
        }
}
    // File processor class
    class FileProcessor : IDisposable {
        // Create an AutoResetEvent EventWaitHandle
        private EventWaitHandle eventWaitHandle = new AutoResetEvent(false);
        private Thread worker;
        private readonly object locker = new object();
        private Queue<string> fileNamesQueue = new Queue<string>();

        public FileProcessor() {
            // Create worker thread
            worker = new Thread(Work);
            // Start worker thread
            worker.Start();
        }

        public void EnqueueFileName(string FileName) {
            // Enqueue the file name
            // This statement is secured by a lock to prevent other threads from messing with the queue while enqueuing the file name
            lock (locker) fileNamesQueue.Enqueue(FileName);
            // Signal the worker that a file name is enqueued and can be processed
            eventWaitHandle.Set();
        }

        private void Work() {
            while (true) {
                string fileName = null;
                // Dequeue the file name
                lock (locker)
                    if (fileNamesQueue.Count > 0) {
                        fileName = fileNamesQueue.Dequeue();
                        // If the file name is null then stop the worker thread
                        if (fileName == null) return;
                    }
                if (fileName != null) {
                    // Process file
                    ProcessFile(fileName);
                } else {
                    // No more file names - wait for a signal
                    eventWaitHandle.WaitOne();
                }
            }
        }

        private void ProcessFile(string FileName) {
            // Maybe it has to wait for the file to stop being used by the process that created it before it can continue
            // Unzip file
            // Read its content
            // Log file data to database
            // Move file to archive folder
        }

        #region IDisposable Members
        public void Dispose() {
            // Signal the FileProcessor to exit
            EnqueueFileName(null);
            // Wait for the FileProcessor's thread to finish
            worker.Join();
            // Release any OS resources
            eventWaitHandle.Close();
        }
        #endregion
    }
}

How can I reliably move files (asynchronously) from a directory as they're being created?

In my Windows Service solution I have a FileSystemWatcher monitoring a directory tree for new files, and whenever it fires a Created event I try to move the files asynchronously to another server for further processing. Here's the code:
foreach (string fullFilePath in
         Directory.EnumerateFiles(directoryToWatch, "*.*",
                                  SearchOption.AllDirectories)
                  .Where(filename => fileTypes.Contains(Path.GetExtension(filename))))
{
    string filename = Path.GetFileName(fullFilePath);
    // open by the full path (a bare file name would resolve against the current directory)
    using (FileStream sourceStream = File.Open(fullFilePath, FileMode.Open, FileAccess.Read))
    {
        using (FileStream destStream = File.Create(Path.Combine(destination, filename)))
        {
            await sourceStream.CopyToAsync(destStream);
        }
    }
}
The problem is that as these files are being copied into the folder I'm watching, they're not always unlocked and available to me. I want to retry when I hit a locked file, but I'm not accustomed to thinking asynchronously, so I have no idea how to put the errored file back into the queue.
First of all you need to detect the exceptions thrown in the process of the asynchronous execution. This can be done with something like this:
try
{
    await sourceStream.CopyToAsync(destStream);
}
catch (Exception copyException)
{
}
Once an exception is detected and properly handled, i.e. you decide that one particular exception is a reason for a retry, you will have to maintain your own queue of copy sources (and destinations) that are due for a retry.
Then you will have to organize a new entry point that triggers the retry itself. Such an entry point could be a timer or the next event from the file system monitor you use (which I would not recommend). You will also have to detect overflow of your queue in case of repeated failures. Keep in mind that similar overflow protection exists in the file system monitor itself, which can simply skip a notification if there are too many file system events (e.g. many files copied into the monitored folders at once).
If these matters do not bother you much, I suggest implementing a timer, or more precisely a timeout, to retry the copy task.
If, on the other hand, you need a robust solution, I would implement the file system monitoring myself.
Concerning the timeout, it could look like this:
private Queue<myCopyTask> queue = new Queue<myCopyTask>();
private Timer retryTimeout;

public Program()
{
    retryTimeout = new Timer(QueueProcess, null, Timeout.Infinite, Timeout.Infinite);
}
private void FileSystemMonitorEventhandler()
{
    // New tasks are provided by the file system monitor.
    myCopyTask newTask = new myCopyTask();
    newTask.sourcePath = "...";
    newTask.destinationPath = "...";

    // Keep in mind that the queue is touched from different threads.
    lock (queue)
    {
        queue.Enqueue(newTask);
    }

    // Keep in mind that the Timer is touched from different threads.
    lock (retryTimeout)
    {
        retryTimeout.Change(1000, Timeout.Infinite);
    }
}
// Start this routine only via the Timer.
private void QueueProcess(object iTimeoutState)
{
    myCopyTask task = null;
    do
    {
        // Reset each round, otherwise the last task would be processed forever
        // once the queue runs empty.
        task = null;
        // Keep in mind that the queue is touched from different threads.
        lock (queue)
        {
            if (queue.Count > 0)
            {
                task = queue.Dequeue();
            }
        }
        if (task != null)
        {
            CopyTaskProcess(task);
        }
    } while (task != null);
}
private async void CopyTaskProcess(myCopyTask task)
{
    FileStream sourceStream = null;
    FileStream destStream = null;
    try
    {
        sourceStream = File.OpenRead(task.sourcePath);
        destStream = File.OpenWrite(task.destinationPath);
        await sourceStream.CopyToAsync(destStream);
    }
    catch (Exception copyException)
    {
        task.retryCount++;
        // In order to avoid instant retries on several problematic tasks you probably
        // should involve a mechanism to delay retries. Keep in mind that this approach
        // delays the worker thread that is implicitly involved by the await keyword.
        Thread.Sleep(100);

        // Keep in mind that the queue is touched from different threads.
        lock (queue)
        {
            queue.Enqueue(task);
        }

        // Keep in mind that the Timer is touched from different threads.
        lock (retryTimeout)
        {
            retryTimeout.Change(1000, Timeout.Infinite);
        }
    }
    finally
    {
        if (sourceStream != null)
        {
            sourceStream.Close();
        }
        if (destStream != null)
        {
            destStream.Close();
        }
    }
}
}
internal class myCopyTask
{
    public string sourcePath;
    public string destinationPath;
    public long retryCount;
}

StreamWriter Creates Zero-Byte File

I have a Task that reads strings from a blocking collection and is supposed to write them out to a file. The trouble is that, while the file is created, its size is 0 bytes after the task completes.
While debugging, I see that non-empty lines are retrieved from the blocking collection, and the StreamWriter is wrapped in a using block.
For debugging I threw in a Flush that should not be required, and I write the lines to the console. There are 100 non-empty lines of text read from the blocking collection.
// Stuff is placed in writeQueue from a different task
BlockingCollection<string> writeQueue = new BlockingCollection<string>();

Task writer = Task.Factory.StartNew(() =>
{
    try
    {
        while (true)
        {
            using (FileStream fsOut = new FileStream(destinationPath, FileMode.Create, FileAccess.Write))
            using (BufferedStream bsOut = new BufferedStream(fsOut))
            using (StreamWriter sw = new StreamWriter(bsOut))
            {
                string line = writeQueue.Take();
                Console.WriteLine(line); // Stuff is written to the console
                sw.WriteLine(line);
                sw.Flush(); // Just in case, makes no difference
            }
        }
    }
    catch (InvalidOperationException)
    {
        // We're done.
    }
});
Stepping through in the debugger, I see that the program terminates in an orderly manner. There are no unhandled exceptions.
What might be going wrong here?
You are re-creating the file on every run of the loop. Change FileMode.Create to FileMode.Append (or create the writer once, outside the loop) and it will keep the values you previously wrote to it.
Also, using exceptions to detect that you should stop is really bad practice. If this is a consumer-producer solution, you can easily do better by having the producer set a thread-safe flag variable signaling that it has finished its work and will not produce anything else.
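For what it's worth, a minimal sketch of that consumer using BlockingCollection's built-in completion signal (CompleteAdding plus GetConsumingEnumerable) rather than a hand-rolled flag, which is a different but standard technique; writeQueue and destinationPath are the variables from the question:

Task writer = Task.Factory.StartNew(() =>
{
    // Create the file once, outside the loop, so earlier lines are kept.
    using (StreamWriter sw = new StreamWriter(destinationPath))
    {
        // The loop ends cleanly once the producer calls writeQueue.CompleteAdding().
        foreach (string line in writeQueue.GetConsumingEnumerable())
        {
            sw.WriteLine(line);
        }
    }
});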
