I download GBs of stuff every day, and I get all OCD and organize files and folders so many times during the day that it's driving me nuts.
So I plan on writing an app that detects when a file has finished downloading (to the Windows Downloads folder) and then moves it into the relevant category folder.
E.g.:
I download an app. When my app detects that the file has finished downloading, it moves it into the Applications sub-folder. Or, when I finish downloading a document, the document is moved into the Documents sub-folder of the Downloads folder.
The problem I have here is that I don't want to do this unless there is a definitive way to tell whether a file has finished downloading.
Things I have thought of:
I have thought about putting a FileSystemWatcher on the Downloads folder. When a new file is created there, it gets added to a list, and when the FileSystemWatcher detects that the file's size has changed or that it has been modified, it starts a timer. The purpose of this timer is to decide, after x seconds, whether the download is complete. It does this by assuming (wrongly) that if a file's size has not increased within the specified period, the download is complete.
That's all I can think of. Any ideas on how this kind of thing can be accomplished?
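The timer heuristic from the question can be sketched as a simple polling loop. This is a minimal Python sketch rather than .NET code, and the name `wait_until_size_stable` and its parameters are illustrative assumptions; in the app described, you would start it when the FileSystemWatcher reports a new file. As the poster already suspects, it cannot tell a finished download from a stalled one.

```python
import os
import time


def wait_until_size_stable(path, quiet_seconds=5.0, poll_interval=0.5, timeout=None):
    """Wait until the size of `path` has not changed for `quiet_seconds`.

    Returns True once the size has settled, or False if `timeout` seconds
    pass first. Heuristic only: a stalled or paused download also looks
    "finished" to this check.
    """
    start = time.monotonic()
    last_size = os.path.getsize(path)
    last_change = start
    while True:
        time.sleep(poll_interval)
        now = time.monotonic()
        size = os.path.getsize(path)
        if size != last_size:
            # The file grew (or shrank): restart the quiet period.
            last_size = size
            last_change = now
        elif now - last_change >= quiet_seconds:
            return True
        if timeout is not None and now - start >= timeout:
            return False
```

A caller would run this off the UI thread and move the file into its category folder only when it returns True.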
A file is usually locked while it is being written, though not every application locks its files. But you could check whether the file is open in another application. If it is not open, that should tell you that it has finished downloading.
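A rough way to implement this check is to try opening the file for writing and treat a failure as "still in use". A minimal Python sketch, assuming the watcher runs on Windows; the helper name `looks_unlocked` is hypothetical. As the answer says, not every program locks the file it is writing, so a successful open is only a hint.

```python
def looks_unlocked(path):
    """Return True if `path` can be opened for reading and writing right now.

    On Windows, open() raises PermissionError (a sharing violation) while
    another process holds the file without sharing write access. It also
    raises PermissionError for a file you may not write to, so False is
    ambiguous. POSIX systems have no mandatory locks, so there this usually
    returns True even while a download is still running.
    """
    try:
        with open(path, "r+b"):
            return True
    except PermissionError:
        return False
```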
Related
Context: A team of operators works with large batch files, up to 10 GB in size, in a third-party application. Each file contains thousands of images, and after processing every 50 images they hit the save button. The workplace has unreliable power, and if the power goes out during a save, the entire file becomes corrupt. To overcome this, I am writing a small utility that uses the FileSystemWatcher to detect saves and create a backup, so that the file can be restored without reprocessing the entire batch.
Problem: The FileSystemWatcher does a very good job of reporting events, but there is a problem I can't pinpoint. Since the monitored files are large, the save process takes a few seconds, and I want to be notified once the save operation is complete. I suspect that every time the file buffer is flushed to disk, it triggers an unwanted event. The file remains locked for writing whether or not a save is in progress, so I cannot tell that way.
Creating a backup of the file DURING a save operation defeats the purpose, since the backup copy would itself be corrupt.
Question:
Is there a way to use the FileSystemWatcher to be notified after the save operation is complete?
If not, how else could I reliably check to see if the file is still being written to?
Alternatives: Any alternative suggestions would be welcome as well.
There's really no direct way to do exactly what you're trying to do. The file system itself doesn't know when a save operation is completed. In logical terms, you may think of it as a series of saves simply because the user clicks the Save button multiple times, but that isn't how the file system sees it. As long as the application has the file locked for writing, as far as the file system is concerned it is still in the process of being saved.
If you think about it, it makes sense. If the application holds onto write access to the file, how would the file system know when the file is in a "corrupt" state and when it's not? Only the application writing the file knows that.
If you have access to the application writing the file, you might be able to solve this problem. Failing that, you might be able to get something with the last modified date, creating a backup only if the file isn't modified for a certain period of time, but that is bound to be buggy and unreliable.
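The last-modified fallback might look like the following minimal Python sketch. The function name, the 30-second default, and the backup naming scheme are assumptions; the `now` parameter exists only so the check can be tested. It re-reads the modification time after copying and throws the backup away if the file changed mid-copy, but it remains exactly the kind of heuristic the answer calls unreliable.

```python
import os
import shutil
import time


def backup_if_quiet(path, backup_dir, quiet_seconds=30.0, now=None):
    """Copy `path` into `backup_dir` if it has not been modified for
    `quiet_seconds`. Returns the backup path, or None if the file looked busy.

    Heuristic only: a long pause in the middle of a save is indistinguishable
    from a finished save.
    """
    now = time.time() if now is None else now
    mtime = os.path.getmtime(path)
    if now - mtime < quiet_seconds:
        return None  # modified recently; a save may still be in progress
    os.makedirs(backup_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(mtime))
    dest = os.path.join(backup_dir, f"{os.path.basename(path)}.{stamp}.bak")
    shutil.copy2(path, dest)
    if os.path.getmtime(path) != mtime:
        # The file changed while we were copying it: discard the backup.
        os.remove(dest)
        return None
    return dest
```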
I'm writing an application that runs on a file server and monitors files being dropped in a shared folder.
I'm using a FileSystemWatcher and it's working well with files, but I would like to process folders as well, so that if someone drops a folder into the share, it gets zipped up and treated like the files.
I don't know, however, how to check when all the files in the directory (and its subdirectories) have finished copying.
My best idea so far has been to start a timer and check every 10 seconds whether the contents are different from 10 seconds earlier and whether any of the files are still locked. Then, when no files are locked and the contents are unchanged after 10 seconds, process the folder.
Is there a better way of doing this?
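The poster's 10-second comparison can be made concrete by snapshotting the whole tree, so that new files, growing files, and rewritten files all count as changes. A minimal Python sketch with hypothetical names; it does not check locks, so the poster's lock test would still be layered on top.

```python
import os


def snapshot(root):
    """Map every file under `root` (by relative path) to (size, mtime_ns)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except FileNotFoundError:
                continue  # removed or renamed between listing and stat
            state[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return state


def folder_unchanged(previous, root):
    """Take a fresh snapshot and compare it with `previous`.

    Returns (unchanged, current) so the caller can poll on a timer and
    zip the folder once two consecutive snapshots agree.
    """
    current = snapshot(root)
    return current == previous, current
```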
I have a project that uses the .NET FileSystemWatcher to watch a Samba network share for video files. When it sees a file, it adds it to an encode queue. When files are dequeued, they are moved to a local directory where the process then encodes the file to several different formats and spits them out to an output directory.
The problem arises because the video files are so big that it often takes several minutes for them to copy completely into the network directory, so when a file is dequeued, it may or may not have finished copying to the network share. When the file is being copied from a Windows machine, I am able to work around this because trying to move a file that is still being copied throws an IOException. I simply catch the exception and retry every few seconds until it is done copying.
When a file is dropped into the Samba share from a computer running OS X however, that IOException is not thrown. Instead, a partial file is copied to the working directory which then fails to encode because it is not a valid video file.
So my question is, is there any way to make the FileSystemWatcher wait for files to be completely written before firing its "Created" event (based on this question I think the answer to that question is "no")? Alternatively, is there a way to get files copied from OS X to behave similarly to those in windows? Or do I need to find another solution for watching the Samba share? Thanks for any help.
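For reference, the catch-and-retry workaround used for Windows clients looks roughly like this in Python (the question's code is .NET; the name `move_with_retry` and its defaults are assumptions). It assumes the share and the working directory are on the same volume, so the move is a plain rename. When the rename of a half-written file simply succeeds, as it apparently does for the OS X uploads described, this loop gives no protection.

```python
import os
import time


def move_with_retry(src, dst, attempts=10, delay=3.0):
    """Rename src to dst, retrying while the OS reports the file as in use.

    On Windows a rename fails with PermissionError while another process
    holds the file open without delete sharing. Assumes src and dst are on
    the same volume; otherwise os.replace raises a different OSError, which
    is not retried.
    """
    for attempt in range(1, attempts + 1):
        try:
            os.replace(src, dst)
            return dst
        except PermissionError:
            if attempt == attempts:
                raise
            time.sleep(delay)
```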
Option 3. Your best bet is to have a process that watches the incoming share for files. When it sees a file, note its size and/or modification date.
Then, after some amount of time (like, 1 or 2 seconds), look again. Note any files that were seen before and compare their new sizes/mod dates to the one you saw last time.
Any file that has not changed for some "sufficiently long" period of time (1s? 5s?) is considered "done".
Once you have a "done" file, MOVE/rename that file to another directory. It is from THIS directory that your loading process can run. It "knows" that only files that are complete are in this directory.
With this two-stage process, you can later add other acceptance rules beyond simple file existence (check format, check size, etc.), since all of those rules must pass before the file gets moved to its proper staging area.
Your later process can rely on file existence, both as a start mechanism and as a restart mechanism. When the process restarts after a failure or shutdown, it can assume that any files in the second staging area are either new or incomplete and take appropriate action based on its own internal state. When the processing is done, it can either delete the file or move it to a "finished" area for archiving or whatnot.
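The two-stage approach might be sketched like this in Python (the class name, `stable_polls`, and the use of size plus modification time as the "unchanged" signature are assumptions). Keeping `staging` on the same volume as `incoming` lets `shutil.move` use a single rename rather than a copy.

```python
import os
import shutil


class StagingWatcher:
    """Poll `incoming`; move files whose (size, mtime) stayed the same for
    `stable_polls` consecutive polls into `staging`."""

    def __init__(self, incoming, staging, stable_polls=3):
        self.incoming = incoming
        self.staging = staging
        self.stable_polls = stable_polls
        self._seen = {}  # name -> ((size, mtime_ns), unchanged_count)

    def poll(self):
        """Run one polling pass; return the names moved to staging."""
        moved = []
        with os.scandir(self.incoming) as it:
            entries = [e for e in it if e.is_file()]
        present = {e.name for e in entries}
        # Forget files that have disappeared from the incoming folder.
        for name in list(self._seen):
            if name not in present:
                del self._seen[name]
        for entry in entries:
            try:
                st = entry.stat()
            except FileNotFoundError:
                continue  # vanished since the directory scan
            sig = (st.st_size, st.st_mtime_ns)
            prev_sig, count = self._seen.get(entry.name, (None, 0))
            count = count + 1 if sig == prev_sig else 0
            if count >= self.stable_polls:
                # Extra acceptance rules (format, size, ...) could go here.
                shutil.move(entry.path, os.path.join(self.staging, entry.name))
                self._seen.pop(entry.name, None)
                moved.append(entry.name)
            else:
                self._seen[entry.name] = (sig, count)
        return moved
```

Call `poll()` every second or two from a timer; a restarted watcher simply starts with an empty history and re-derives stability.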
This is what I'm trying to do:
1. Download a file (txt, doc, xls, whatever) from a server
2. Open the file with the appropriate application using System.Diagnostics.Process.Start(path to file)
3. Monitor for file changes using a FileSystemWatcher
4. Each time the file is changed, upload the file back to the server
5. Continue monitoring until the user has finished editing the file
6. Delete the local copy of the file
7. Exit the application
I'm stuck at step 5. How can I know whether the user has finished working on a file?
I cannot rely on the file being locked (Notepad doesn't lock txt files, for example).
I cannot rely on whether the process has exited (an example is Notepad++ for txt files: the file could be open in a tab. When you close the tab, you've finished editing the file, but the process is still running).
Any ideas or pointers on how to do that in C#?
You've excluded the two ways you could go about detecting the file being in use: file locking, and the process you start exiting.
The only alternative I can think of is to display a dialog asking the user to confirm when they've finished editing.
Edit: For what it's worth, FileZilla has this type of behaviour. You can choose to edit a file on the remote server; it downloads the file, launches the default editor, and (in the background) shows an "If you've finished editing - Click OK" button.
This gives me the opportunity to cancel an edit, if I've mucked up the file and saved it.
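The dialog approach can be combined with steps 3 to 5 of the question in a loop like this minimal Python sketch. `upload` and `user_is_done` are hypothetical callbacks: the first would push the file to the server, the second would report whether the user has clicked the "finished editing" button. Polling the modification time stands in for the FileSystemWatcher here, and the loop checks once more after the user confirms so the final save is not lost.

```python
import os
import time


def sync_until_done(path, upload, user_is_done, poll_interval=1.0):
    """Call upload(path) each time the file's modification time changes,
    until user_is_done() returns True.

    upload and user_is_done are caller-supplied (hypothetical) callbacks.
    The done flag is read before the mtime check, so a save made just
    before the user confirms is still uploaded.
    """
    last_mtime = os.stat(path).st_mtime_ns
    while True:
        done = user_is_done()
        mtime = os.stat(path).st_mtime_ns
        if mtime != last_mtime:
            last_mtime = mtime
            upload(path)
        if done:
            return
        time.sleep(poll_interval)
```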
This is really hard to do. We've tried various things but never found anything foolproof. If you know which program you have launched then, in theory, you can find the file handles it uses and see when it stops using the one you're interested in... but if you rely on Windows to resolve the default application to launch, even this becomes tricky.
We copy editable files into a temp folder named with the date and rely on users uploading them back when they have finished their edit session. We then clean up previous days' folders on application startup.
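The date-named folder scheme with startup cleanup might look like this minimal Python sketch. The `YYYY-MM-DD` folder format and the function names are assumptions; folders whose names do not parse as dates are left alone so that unrelated content in the base folder survives.

```python
import datetime
import os
import shutil


def todays_edit_folder(base):
    """Return (creating it if needed) base/YYYY-MM-DD for today's copies."""
    path = os.path.join(base, datetime.date.today().isoformat())
    os.makedirs(path, exist_ok=True)
    return path


def clean_previous_days(base, today=None):
    """On startup, delete date-named folders in `base` older than today.

    Returns the names of the folders removed.
    """
    today = today or datetime.date.today()
    removed = []
    for name in os.listdir(base):
        full = os.path.join(base, name)
        try:
            day = datetime.date.fromisoformat(name)
        except ValueError:
            continue  # not a date-named folder; leave it alone
        if day < today and os.path.isdir(full):
            shutil.rmtree(full)
            removed.append(name)
    return removed
```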
You could check the file's last-modified date, which is set whenever changes to the file are saved. Bear in mind, though, that this field is not very reliable, since it can be set to any value (with appropriate tools).
We have a C# Windows service polling a folder, waiting for an FTP’ed file to be posted to it. To avoid using the file while it is still being written, we attempt to get a lock on the file first. However, there seem to be occasions where we get a lock on the file after the FTP’ed file is created but before it is written to, so we end up opening an empty file.
Is there a reliable way to tell whether the FTP transfer is complete?
You could possibly change the filename before uploading, then rename it after the upload is done. That way it will look like it doesn't exist until it is finished.
A practice I've seen is to transfer two files: first the actual file, then a second one we can call a .done file. The idea is that as soon as you see the .done file, you know the first file should be complete.
Other options include watching the file for modifications and waiting for a certain amount of time with no modifications. Of course, this is not foolproof.
Edit
Kyle makes a good point that adding a checksum to the .done file and/or indicating the size of the first file is a good protection against fringe cases.
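The marker-file handshake, using the size check Kyle suggests, could be sketched in Python as below. The `.done` suffix follows the answer; storing the byte count as plain text and the function names are assumptions. The sender must create the marker only after the data file is fully written, and the receiver treats the pair as complete only when the recorded size matches.

```python
import os


def write_done_marker(data_path):
    """Sender: call only after data_path has been completely written.

    Writes data_path + ".done" containing the expected size in bytes.
    """
    marker = data_path + ".done"
    with open(marker, "w") as f:
        f.write(str(os.path.getsize(data_path)))
    return marker


def is_transfer_complete(data_path):
    """Receiver: True once the marker exists and the data file's size
    matches the size recorded in it."""
    marker = data_path + ".done"
    if not (os.path.exists(marker) and os.path.exists(data_path)):
        return False
    with open(marker) as f:
        text = f.read().strip()
    if not text.isdigit():
        return False  # the marker itself may still be arriving
    return os.path.getsize(data_path) == int(text)
```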
I'm always a big fan of the .filepart protocol, so that no matter what transfer protocol you use (FTP, SSH, rsync, etc.) you have the same understanding.
This isn't a direct answer to your question, but instead of searching for an FTP-only solution, a more generic solution could serve you better in the long run.
(.filepart: while transferring test.txt, name it test.txt.filepart; when the transfer is done, rename it back to test.txt)
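Here is the same rename-on-completion idea for a local or mounted destination, as a minimal Python sketch (the function name is hypothetical). Over FTP the final step would instead be a server-side rename, for example via `ftplib.FTP.rename`. On POSIX systems the rename is atomic as long as the temporary and final names are on the same file system, which holds here because both live in `dest_dir`.

```python
import os
import shutil


def copy_with_filepart(src, dest_dir):
    """Copy src into dest_dir as <name>.filepart, then rename it to <name>.

    A watcher that ignores *.filepart only ever sees the final name once
    the data is complete.
    """
    name = os.path.basename(src)
    final_path = os.path.join(dest_dir, name)
    part_path = final_path + ".filepart"
    with open(src, "rb") as fin, open(part_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()
        os.fsync(fout.fileno())  # push the bytes to disk before renaming
    os.replace(part_path, final_path)
    return final_path
```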
What about using a folder watcher to index the contents? If a file's size does not change within 5 minutes, you can pretty much guarantee the upload has finished.
The timeout could be tied to the timeout of your FTP server, too.
http://www.codeproject.com/KB/files/MonitorFolderActivity.aspx
I've always used a checksum file. Along with the data file, you send a checksum file that records the file size and the checksum. You'll know the file has uploaded correctly when the checksum recorded in the checksum file matches the actual checksum of the file on the file system.
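Verifying against such a checksum file might look like this minimal Python sketch. The answer does not specify a format, so a sidecar file containing `<sha256-hex> <size-in-bytes>` is assumed, as are the function names. The size is compared first because it is cheap and catches an upload that is still growing; the SHA-256 is streamed in chunks so large files need not fit in memory.

```python
import hashlib
import os


def sha256_file(path, chunk_size=1024 * 1024):
    """Stream the file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def upload_verified(data_path, checksum_path):
    """True if data_path matches the size and SHA-256 recorded in
    checksum_path (assumed format: "<sha256-hex> <size-in-bytes>").

    Missing or malformed files count as "not verified yet".
    """
    try:
        with open(checksum_path) as f:
            expected_hash, expected_size = f.read().split()
        if os.path.getsize(data_path) != int(expected_size):
            return False  # still growing, or truncated
        return sha256_file(data_path) == expected_hash.lower()
    except (OSError, ValueError):
        return False
```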
The method I've used in the past is a mix of some of the other replies here.
That is, FTP the file using a different extension from the one expected (e.g. FILENAME.part), then rename it with the proper extension as the last step of the upload.
On the server, use a FileSystemWatcher to look for new files with the correct extension.
The FSW will not see the file until it's renamed, and the renaming operation is atomic so the file will be complete and available the moment it's been renamed.
Renaming or moving files of course relies on you having control over the uploading process.
If you do not have any control over how the files are uploaded, you will be stuck with using the FSW to know that a file is being uploaded, then monitoring its size; when it's unchanged for a long period of time, you may be able to assume it's complete.
Rather than polling, you might want to have a look at System.IO.FileSystemWatcher.