What does one need to take care of when creating a method to move (cut) a batch of files from one directory to another?
Let's say the method signature is Move(filter, sourceFolder, destinationFolder, overwrite). What do I need to take care of to avoid the risk of data loss, especially when overwriting existing files and deleting the source files are taken into account?
Several possible scenarios I am worried about: an error occurs while a move is in progress; a file is moved but somehow ends up corrupted; a namesake file at the destination is deleted to make room for the new file, but then an error occurs while moving the new file; etc.
I'm using .net's System.IO namespace for the move operations.
Without transactions, the safest way is to copy, verify, and then delete. It is up to you whether you move per file (this is how Windows does it; a move operation can fail, leaving you with half of the files moved) or allow only the entire batch to be moved, or none at all.
You would have to decide how to respond to files that have been modified during the move, source files that cannot be deleted afterwards, or destination files that have already been opened when you're performing a rollback.
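A minimal sketch of that copy-verify-delete flow, with per-file semantics; the hashing helper and the ".tmp" naming are illustrative choices, not something from the original question:

using System.IO;
using System.Linq;
using System.Security.Cryptography;

static class SafeMover
{
    // Moves every file matching the filter by copying, verifying, then deleting.
    // Per-file semantics: a failure part-way leaves earlier files already moved.
    public static void Move(string filter, string sourceFolder, string destinationFolder, bool overwrite)
    {
        foreach (string source in Directory.GetFiles(sourceFolder, filter))
        {
            string destination = Path.Combine(destinationFolder, Path.GetFileName(source));
            if (File.Exists(destination) && !overwrite)
                throw new IOException("Destination already exists: " + destination);

            // Copy to a temporary name on the destination volume first, so a
            // crash never leaves a half-written file under the final name.
            string temp = destination + ".tmp";
            File.Copy(source, temp, true);

            if (!ContentsMatch(source, temp))
            {
                File.Delete(temp);
                throw new IOException("Verification failed for " + source);
            }

            // Only now replace the namesake; if this step fails, the verified
            // copy still exists under the temporary name.
            if (File.Exists(destination))
                File.Delete(destination);
            File.Move(temp, destination);

            // Delete the source last: the worst case is a duplicate, never a loss.
            File.Delete(source);
        }
    }

    static bool ContentsMatch(string pathA, string pathB)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hashA, hashB;
            using (FileStream fa = File.OpenRead(pathA)) { hashA = sha.ComputeHash(fa); }
            using (FileStream fb = File.OpenRead(pathB)) { hashB = sha.ComputeHash(fb); }
            return hashA.SequenceEqual(hashB);
        }
    }
}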
Related
I have a service running on a webserver that waits for a zip to be dropped in a folder, extracts it, and then moves the contents to a certain directory. Since we want to replace the directory in question, the service renames the existing folder (a very large folder that takes a couple of minutes to delete), moves the extracted files into its place, and then deletes the old folder. The problem is that when it tries to rename the existing folder, it gets 'Access to the path '<>' is denied.', I believe because the folder is in constant use by the webservice. Is there a way I can force the folder to rename, or take control and wait for it to not be in use? Or is there another way I can accomplish this goal?
You can't "force" a rename while any process holds an underlying operating system handle to the folder(it would be horrible if you were able to do that).
You can:
Implement pause/resume functionality for the webservice so it can be told to pause its work and release the handles, then resume after you are done.
or
Stop the webservice completely, do your work, then start the webservice again (a sketch of this follows below)
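If the webservice is hosted as a Windows service, the stop/work/start sequence can be scripted with ServiceController; for an IIS-hosted application you would recycle the app pool instead. The service name and paths here are placeholders:

using System;
using System.IO;
using System.ServiceProcess; // reference System.ServiceProcess.dll

class Deployer
{
    static void Main()
    {
        // "MyWebService" is a placeholder; use the actual service name.
        using (ServiceController service = new ServiceController("MyWebService"))
        {
            service.Stop();
            service.WaitForStatus(ServiceControllerStatus.Stopped, TimeSpan.FromMinutes(1));

            // Handles are released now, so the renames can proceed.
            Directory.Move(@"C:\site\current", @"C:\site\old");
            Directory.Move(@"C:\site\extracted", @"C:\site\current");

            service.Start();
            service.WaitForStatus(ServiceControllerStatus.Running, TimeSpan.FromMinutes(1));
        }

        // Delete the old folder after the service is back up; this is the slow part.
        Directory.Delete(@"C:\site\old", true);
    }
}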
Context: A team of operators works with large batch files, up to 10GB in size, in a third-party application. Each file contains thousands of images, and after processing every 50 images, they hit the save button. The workplace has unreliable power, and if the power goes out during a save, the entire file becomes corrupt. To overcome this, I am writing a small utility that uses the FileSystemWatcher to detect saves and create a backup, so that the file may be restored without the need to reprocess the entire batch.
Problem: The FileSystemWatcher does a very good job of reporting events, but there is a problem I can't pinpoint. Since the monitored files are large, the save process takes a few seconds, and I want to be notified once the save operation is complete. I suspect that every time the file buffer is flushed to disk, it triggers an unwanted event. The file remains locked for writing whether or not a save is in progress, so I cannot tell that way.
Creating a backup of the file DURING a save operation defeats the purpose, since it corrupts the backed-up file.
Question:
Is there a way to use the FileSystemWatcher to be notified after the save operation is complete?
If not, how else could I reliably check to see if the file is still being written to?
Alternatives: Any alternative suggestions would be welcome as well.
There's really no direct way to do exactly what you're trying to do. The file system itself doesn't know when a save operation is completed. In logical terms, you may think of it as a series of saves simply because the user clicks the Save button multiple times, but that isn't how the file system sees it. As long as the application has the file locked for writing, as far as the file system is concerned it is still in the process of being saved.
If you think about it, it makes sense. If the application holds onto write access to the file, how would the file system know when the file is in a "corrupt" state and when it's not? Only the application writing the file knows that.
If you have access to the application writing the file, you might be able to solve this problem there. Failing that, you might get somewhere with the last modified date, creating a backup only if the file hasn't been modified for a certain period of time, but that approach is bound to be buggy and unreliable.
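If you do go the last-modified-date route, a minimal sketch might look like this; the path and the 30-second quiet period are assumptions you would tune against the actual save times:

using System;
using System.IO;

class BackupHelper
{
    // True if the file has not been written to for the given quiet period.
    static bool LooksIdle(string path, TimeSpan quietPeriod)
    {
        return DateTime.UtcNow - File.GetLastWriteTimeUtc(path) > quietPeriod;
    }

    static void Main()
    {
        string batch = @"C:\batches\current.dat"; // placeholder path
        if (LooksIdle(batch, TimeSpan.FromSeconds(30)))
        {
            File.Copy(batch, batch + ".bak", true);
        }
    }
}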
I have a project that uses the .net FileSystemWatcher to watch a Samba network share for video files. When it sees a file, it adds it to an encode queue. When files are dequeued, they are moved to a local directory where the process then encodes the file to several different formats and spits them out to an output directory.
The problem arises because the video files are so big that it often takes several minutes for them to copy completely into the network directory, so when a file is dequeued, it may or may not have finished being copied to the network share. When the file is being copied from a Windows machine, I am able to work around it, because trying to move a file that is still being copied throws an IOException; I simply catch the exception and retry every few seconds until the copy is done.
When a file is dropped into the Samba share from a computer running OS X, however, that IOException is not thrown. Instead, a partial file is copied to the working directory, which then fails to encode because it is not a valid video file.
So my question is: is there any way to make the FileSystemWatcher wait for files to be completely written before firing its "Created" event (based on this question, I think the answer is "no")? Alternatively, is there a way to get files copied from OS X to behave like those copied from Windows? Or do I need to find another solution for watching the Samba share? Thanks for any help.
Option 3. Your best bet is to have a process that watches the incoming share for files. When it sees a file, note its size and/or modification date.
Then, after some amount of time (like, 1 or 2 seconds), look again. Note any files that were seen before and compare their new sizes/mod dates to the one you saw last time.
Any file that has not changed for some "sufficiently long" period of time (1s? 5s?) is considered "done".
Once you have a "done" file, MOVE/rename that file to another directory. It is from THIS directory that your loading process can run. It "knows" that only files that are complete are in this directory.
By having this two-stage process, you can later add other acceptance rules beyond mere file existence, since all of those rules must pass before the file gets moved to its proper staging area (you can check format, check size, etc.).
Your later process can rely on file existence, both as a start mechanism and as a restart mechanism. When the process restarts after a failure or shutdown, it can assume that any files in the second staging area are either new or incomplete and take appropriate action based on its own internal state. When the processing is done, it can choose to either delete the file or move it to a "finished" area for archiving or the like.
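A sketch of that stability check plus the move into the staging directory; the share path, staging path, and the 5-second interval are placeholder choices:

using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

class IncomingWatcher
{
    static void Main()
    {
        string incoming = @"\\server\share\incoming"; // watched Samba share (placeholder)
        string staging = @"C:\encode\staging";        // only complete files end up here
        var snapshot = new Dictionary<string, Tuple<long, DateTime>>();

        while (true)
        {
            foreach (string path in Directory.GetFiles(incoming))
            {
                FileInfo info = new FileInfo(path);
                var current = Tuple.Create(info.Length, info.LastWriteTimeUtc);

                Tuple<long, DateTime> previous;
                if (snapshot.TryGetValue(path, out previous) && previous.Equals(current))
                {
                    // Unchanged since the last pass: treat as "done" and move it.
                    // Note the move is only atomic within a single volume.
                    File.Move(path, Path.Combine(staging, info.Name));
                    snapshot.Remove(path);
                }
                else
                {
                    snapshot[path] = current;
                }
            }
            Thread.Sleep(TimeSpan.FromSeconds(5)); // the "sufficiently long" window
        }
    }
}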
I'm making a little app in C#/.NET that watches for the creation of a file, and when it is created, it gets the file's content, parses it, and writes the result to another file.
Everything is working fine so far. But the problem is: there's another process that watches for this file as well. My process is only READING the file, while the second one reads it and then DELETES it.
My application does its job, but while it is reading the file, the other process can't read it and crashes outright (it was not made by me and I don't have the sources to fix it).
My application runs very fast and only opens the file for a very short time: it reads the content into a variable so it can close the file quickly, and then parses the content from the variable.
I clearly don't know how, but I'd like to be able to read the file while letting the other process read it at the same time, without any hiccups. Is that possible? I still suspect there will be a problem with the fact that the file is deleted once the other app is done parsing it...
Any suggestions or ideas?
Thanks very much!
You can open the file as follows to ensure you don't lock it from other processes:
// FileShare.ReadWrite lets other processes keep reading and writing
// the file while this stream is open, so they are never locked out.
using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    // do your stuff
}
But if the other process tries to open the file in exclusive mode, this won't help, and it will still crash. There's no way to deal with that other than fixing the code of the other process.
KISS: Can you have the file created in a location that the first program isn't looking at, but your software is, and when you are done processing it, move it to the location where the first program is looking?
Otherwise:
You are going to have contention, since it's a race to see which process "notices" the file first and begins working.
I'm assuming you also don't have any control over the process creating the file?
In that case you might look at PsSuspend or PauseSp: if you can control the other process by suspending it until you are ready for it (that is, done with the file), then that might be viable. I'm not sure how robust this would be.
There's also still the potential race condition between "noticing" the file and performing an action (whatever it is). Keeping the other process paused perpetually until you want it to run (or killing it and restarting it) is the only completely deterministic way to achieve what you want within the constraints.
If you are using an NTFS drive (which is very likely), you can create a hard link to the file. Essentially, this gives the file a second name without actually duplicating its data. You can read the file through the hard link. The other process can delete the file, which will only remove its link to the file, leaving the file in place for you to read. When your program is done reading, it can delete the hard link, and the file system, seeing that both links have been deleted, will delete the file itself.
This can be done from the command line with
fsutil hardlink create <NewFileName> <ExistingFileName>
Or you can P/Invoke the CreateHardLink function in the Windows API.
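A sketch of the P/Invoke route; CreateHardLink lives in kernel32.dll and takes the new link name, the existing file name, and a reserved security-attributes pointer. Both paths must be on the same volume. The file names in the usage comment are hypothetical:

using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

static class HardLink
{
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern bool CreateHardLink(string newFileName, string existingFileName, IntPtr securityAttributes);

    public static void Create(string newLinkPath, string existingFilePath)
    {
        if (!CreateHardLink(newLinkPath, existingFilePath, IntPtr.Zero))
            throw new Win32Exception(); // wraps the last Win32 error
    }
}

// Usage: create the link, read through it, then delete the link. The data
// disappears only after the other process has deleted its link too.
// HardLink.Create(@"C:\drop\incoming.txt.mine", @"C:\drop\incoming.txt");
// string content = File.ReadAllText(@"C:\drop\incoming.txt.mine");
// File.Delete(@"C:\drop\incoming.txt.mine");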
Could you create an empty zero-byte marker file, with the same name as the original but the extension ".reading"? Then, once the first process is done reading the file, rename .reading to .done; the second process can check for .done files and delete the original, since the .done file and the original have the same name but different extensions.
@Prashant's response gave me the inspiration for this; it's very similar, but I believe it will solve your problem.
If the other process must match a certain filename pattern:
Rename the file to something that won't match first (a very cheap/fast operation)
Rename it back when finished
If it matches every file in a given folder:
Move it to another folder (also a very cheap operation in most filesystems)
Move it back when finished
If the other process had already locked your file (even for read), your rename would fail, and you can handle that failure gracefully. If not, you should be safe.
There is still a possible race condition, of course, but this should be much safer than what you are doing now.
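A sketch of the rename-aside approach, assuming the other process matches on extension; the paths and the ".reading" suffix are illustrative:

using System.IO;

class RenameAside
{
    static void Main()
    {
        string original = @"C:\drop\data.txt";        // placeholder
        string hidden = @"C:\drop\data.txt.reading";  // no longer matches *.txt

        // If the other process already has the file locked, this throws,
        // which is the graceful failure mentioned above.
        File.Move(original, hidden);
        try
        {
            string content = File.ReadAllText(hidden);
            // ... parse content ...
        }
        finally
        {
            // Put it back so the other process can consume and delete it.
            File.Move(hidden, original);
        }
    }
}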
We have a C# Windows service polling a folder, waiting for an FTP’ed file to be posted into it. To avoid using the file while it is still being written, we attempt to get a lock on the file first; however, there seem to be occasions where we get a lock on the file after it is created but before it is written to, so we end up opening an empty file.
Is there a reliable way to tell when the FTP transfer is complete?
You could change the filename before the upload, then rename it after it's done. That way it will look like the file doesn't exist until the transfer is finished.
A practice I've seen is to transfer two files: the actual file, and then a second one which we can call a .done file. The idea is that as soon as you see the .done file, you know the first file is complete.
Other options include watching the file for modifications and waiting for a certain amount of time with no modifications. Of course, this is not foolproof.
Edit
Kyle makes a good point that adding a checksum to the .done file, and/or recording the size of the first file in it, is good protection against edge cases.
I'm always a big fan of the .filepart protocol, so that no matter what transfer protocol you use (ftp, ssh, rsync, etc.) you have the same understanding.
This isn't a direct answer to your question, but instead of searching for an FTP-only solution, a more generic solution could serve you better in the long run.
(.filepart: rename the file, test.txt, to test.txt.filepart, then when the transfer is done, rename it back to test.txt)
What about using a folder watcher to index the contents? If a file's size does not change within 5 minutes, you can pretty much guarantee the upload has finished.
The timeout could be tied to the timeout of your FTP server too.
http://www.codeproject.com/KB/files/MonitorFolderActivity.aspx
I've always used a checksum file. You send a checksum file that records the file size and the checksum. You'll know the file was uploaded correctly when the checksum in the companion file matches the actual checksum of the file on the file system.
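A sketch of the verification on the receiving side, assuming the sender uploads data.bin plus a data.bin.done file containing the expected SHA-256 digest in hex; the naming convention is an assumption:

using System;
using System.IO;
using System.Security.Cryptography;

static class ChecksumCheck
{
    // True once the companion .done file exists and its digest matches the data.
    public static bool UploadIsComplete(string dataPath)
    {
        string donePath = dataPath + ".done";
        if (!File.Exists(donePath))
            return false; // the sender is not finished yet

        string expected = File.ReadAllText(donePath).Trim();
        using (SHA256 sha = SHA256.Create())
        using (FileStream stream = File.OpenRead(dataPath))
        {
            string actual = BitConverter.ToString(sha.ComputeHash(stream)).Replace("-", "");
            return string.Equals(expected, actual, StringComparison.OrdinalIgnoreCase);
        }
    }
}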
The method I've used in the past is a mix of some of the other replies here.
i.e. FTP a file using a different extension to the one expected (e.g. FILENAME.part), then rename it with the proper extension as the last step of the upload.
On the server, use a FileSystemWatcher to look for new files with the correct extension.
The FSW will not see the file until it's renamed, and the renaming operation is atomic so the file will be complete and available the moment it's been renamed.
Renaming or moving files of course relies on you having control over the uploading process.
If you do not have any control over how the files are uploaded, you will be stuck with using the FSW to know a file is being uploaded and then monitoring its size; when it's unchanged for a long period of time, you may be able to assume the upload is complete.
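A sketch of the receiving side of the rename approach, assuming uploads arrive as *.part and are renamed to *.zip when complete; the folder and extensions are placeholders. The watcher's filter only matches the final name, so the Renamed event fires at the moment the rename makes the complete file visible:

using System;
using System.IO;

class UploadWatcher
{
    static void Main()
    {
        // Watch only for the final extension; in-progress *.part files stay invisible.
        FileSystemWatcher watcher = new FileSystemWatcher(@"C:\ftp\incoming", "*.zip");
        watcher.Renamed += (s, e) => Console.WriteLine("Upload finished: " + e.FullPath); // .part -> .zip
        watcher.Created += (s, e) => Console.WriteLine("New complete file: " + e.FullPath);
        watcher.EnableRaisingEvents = true;

        Console.ReadLine(); // keep the process alive while watching
    }
}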
Rather than polling, you might want to have a look at System.IO.FileSystemWatcher.