We have a C# Windows service polling a folder, waiting for an FTP’ed file to be posted in. To avoid using a file while it is still being written, we first try to get a lock on it. However, there seem to be occasions where we get the lock after the FTP’ed file is created but before it is written to, so we end up opening an empty file.
Is there a reliable way to tell when the FTP transfer is complete?
You could change the filename before upload, then rename it once the transfer is done. That way the file appears not to exist until it is finished.
A practice I've seen is to transfer two files: the actual file, and then a second one we can call a .done file. The idea is that as soon as you see the .done file, you know the first file is complete.
Other options include watching the file for modifications and waiting for a certain amount of time with no modifications. Of course, this is not foolproof.
Edit
Kyle makes a good point that adding a checksum to the .done file, and/or recording the size of the first file in it, is good protection against fringe cases.
I'm always a big fan of the .filepart protocol, because no matter what transfer protocol you use (FTP, SSH, rsync, etc.) you have the same understanding.
This isn't a direct answer to your question, but instead of searching for an FTP-only solution, a more generic one could serve you better in the long run.
(.filepart: rename the file, e.g. test.txt to test.txt.filepart, then when the transfer is done, rename it back to test.txt.)
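A minimal upload-side sketch of this convention (file names here are placeholders, not part of any standard):

```csharp
using System;
using System.IO;

// Upload-side sketch of the .filepart convention: write under a temporary
// name, then rename once the data is complete. The rename is the "commit"
// step, so a watcher keyed to the final name never sees a partial file.
static void WriteWithFilepart(string finalPath, byte[] data)
{
    string partPath = finalPath + ".filepart";
    File.WriteAllBytes(partPath, data);               // the slow transfer happens here
    File.Move(partPath, finalPath, overwrite: true);  // cheap rename on the same volume
}

string target = Path.Combine(Path.GetTempPath(), "test.txt");
WriteWithFilepart(target, new byte[] { 1, 2, 3 });
Console.WriteLine(File.Exists(target));               // True once the rename has happened
```

The rename only stays cheap and atomic when the temporary name lives on the same volume as the final one.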
What about using a folder watcher to index the contents? If a file's size does not change within 5 minutes, you can pretty much guarantee the upload has finished.
The timeout could be tied to the timeout of your FTP server too.
http://www.codeproject.com/KB/files/MonitorFolderActivity.aspx
I've always used a checksum file. You send a checksum file that records the file size and the checksum; you know the file has uploaded correctly when the checksum it contains matches the actual checksum computed against the file on disk.
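A sketch of that handshake, assuming the sender writes the hash to a sidecar file named `<file>.md5` (the sidecar name and the hash algorithm are illustrative choices, not a fixed convention):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Receiver-side sketch of the checksum-file handshake. The sender uploads
// "data.bin" plus "data.bin.md5" holding the expected hash; the receiver
// treats the upload as complete only when the recomputed hash matches.
static string Md5Of(string path)
{
    using var md5 = MD5.Create();
    using var stream = File.OpenRead(path);
    return Convert.ToHexString(md5.ComputeHash(stream));
}

static bool UploadLooksComplete(string dataPath)
{
    string checksumPath = dataPath + ".md5";
    if (!File.Exists(dataPath) || !File.Exists(checksumPath))
        return false;                                  // one half hasn't arrived yet
    string expected = File.ReadAllText(checksumPath).Trim();
    return string.Equals(expected, Md5Of(dataPath), StringComparison.OrdinalIgnoreCase);
}

string sample = Path.Combine(Path.GetTempPath(), "data.bin");
File.WriteAllText(sample, "payload");
File.WriteAllText(sample + ".md5", Md5Of(sample));
Console.WriteLine(UploadLooksComplete(sample));        // True: hashes agree
```

The sender must upload the checksum file *after* the data file, otherwise the receiver can see a matching sidecar next to a partial upload.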
The method I've used in the past is a mix of some of the other replies here.
i.e. FTP the file using a different extension to the one expected (e.g. FILENAME.part), then rename it with the proper extension as the last step of the upload.
On the server, use a FileSystemWatcher to look for new files with the correct extension.
The FSW will not see the file until it is renamed, and the rename operation is atomic, so the file will be complete and available the moment it has been renamed.
Renaming or moving files of course relies on you having control over the uploading process.
If you have no control over how the files are uploaded, you will be stuck with using the FSW to know a file is being uploaded and then monitoring its size; when it has been unchanged for a long period of time, you may be able to assume the upload is complete.
Rather than polling, you might want to have a look at System.IO.FileSystemWatcher.
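A minimal FileSystemWatcher sketch, paired with the rename-on-completion convention from the other answers (directory, filter, and file names are placeholders):

```csharp
using System;
using System.IO;
using System.Threading;

// Minimal FileSystemWatcher sketch. A rename into the watched pattern
// surfaces as a Renamed event and a brand-new file as Created, so both
// handlers are hooked up; here they just signal an event to wait on.
string watchedDir = Path.Combine(Path.GetTempPath(), "fsw-incoming");
Directory.CreateDirectory(watchedDir);

using var arrived = new ManualResetEventSlim(false);
using var watcher = new FileSystemWatcher(watchedDir, "*.txt");
watcher.NotifyFilter = NotifyFilters.FileName | NotifyFilters.LastWrite;
watcher.Created += (_, e) => arrived.Set();
watcher.Renamed += (_, e) => arrived.Set();
watcher.EnableRaisingEvents = true;

// Simulate an upload finishing via the rename-on-completion convention:
// ".filepart" doesn't match the filter, the final ".txt" name does.
string part = Path.Combine(watchedDir, "report.txt.filepart");
File.WriteAllText(part, "payload");
File.Move(part, Path.Combine(watchedDir, "report.txt"), overwrite: true);
bool sawFile = arrived.Wait(TimeSpan.FromSeconds(5));
Console.WriteLine(sawFile);
```

Because the partial name never matches the filter, the watcher only ever reacts to finished files.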
Related
We have a Windows server, and one day we will receive some text files in a folder via SFTP. I don't have more information, but maybe this is enough. Now I need to write a function that moves these files into another folder. That shouldn't be too hard, I thought... but then I realized I can move a file before it is finished. So I searched for solutions, and now I'm really confused.
My idea would be to check the file and the processes around it. If the file is not finished yet, there is a copy process in progress that I can detect. To make this easy, I just try to lock the file, and if no other process has it open, the file is ready to move?
using (File.Open("myFile", FileMode.Open, FileAccess.Read, FileShare.None))
{ /*rdy!*/ }
But now I see people writing about checksum tests, or checking whether the file size has stopped changing before treating the file as ready. Isn't this all a little complicated? Please tell me my solution could work too... I'm not able to test it with any server-to-server SFTP setup. I just know that it works if I copy a file to another folder (via Explorer). Does it work with an SFTP transfer as well? Any ideas? Thank you.
File-size checks are dangerous: what if the upload is suspended and later resumed? How much time should pass before you accept the current file size as the final one? So this is not a good solution.
I'd go for the locking. However, this only works if the process writing the file also opens it in a way that locks it exclusively. If it doesn't, you're stuck with your problem again.
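The lock probe from the question can be wrapped into a helper like this. Note it only proves anything at the instant of the call, and only if the writer actually holds the file open while writing:

```csharp
using System;
using System.IO;

// Probe from the question, wrapped up: try to open the file with
// FileShare.None; if another process still has it open, the open throws.
static bool CanLockExclusively(string path)
{
    try
    {
        using var fs = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.None);
        return true;    // nobody else has the file open right now
    }
    catch (IOException)
    {
        return false;   // still held elsewhere (e.g. the copy is in progress)
    }
}

string probe = Path.Combine(Path.GetTempPath(), "probe.txt");
File.WriteAllText(probe, "x");
Console.WriteLine(CanLockExclusively(probe));   // True once WriteAllText has closed it
```

In practice you would retry this on a timer rather than check once, since a successful lock can land in the gap between file creation and the first write.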
Another solution would be to upload the files with temporary names, like ".sftptmp". And to have the uploader rename it after it is done. That way you can be sure the file has been uploaded - just ignore all files that end with ".sftptmp". This, however, assumes that you actually have control over the process of uploading files.
Another option is to have the sender put a control file after the data file. For example, put uploadfile-20220714.txt, then put uploadfile-20220714.ctl. The control file can contain file information such as the name and size of the data file. This option requires the sender to modify their process, but it shouldn't require too much effort.
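A receiver-side sketch of that control-file handshake. The assumed .ctl format (a single line holding the data file's byte count) is purely illustrative; real senders often include a name and checksum as well:

```csharp
using System;
using System.IO;

// Receiver-side sketch of the control-file handshake: trust the data file
// only when the matching .ctl exists and the recorded size agrees.
static bool DataFileReady(string dataPath)
{
    string ctlPath = Path.ChangeExtension(dataPath, ".ctl");
    if (!File.Exists(dataPath) || !File.Exists(ctlPath))
        return false;                                   // control file not posted yet
    if (!long.TryParse(File.ReadAllText(ctlPath).Trim(), out long expectedSize))
        return false;
    return new FileInfo(dataPath).Length == expectedSize;
}

string data = Path.Combine(Path.GetTempPath(), "uploadfile-20220714.txt");
File.WriteAllText(data, "12345");
File.WriteAllText(Path.ChangeExtension(data, ".ctl"), "5");
Console.WriteLine(DataFileReady(data));                 // True: sizes agree
```

As with the .done file above, this only works if the sender always puts the control file last.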
Context: A team of operators work with large batch files up to 10GB in size in a third party application. Each file contains thousands of images and after processing every 50 images, they hit the save button. The work place has unreliable power and if the power goes out during a save, the entire file becomes corrupt. To overcome this, I am writing a small utility using the FileSystemWatcher to detect saves and create a backup so that it may be restored without the need to reprocess the entire batch.
Problem: The FileSystemWatcher does a very good job of reporting events, but there is a problem I can't pinpoint. Since the monitored files are large, the save process takes a few seconds, and I want to be notified only once the save operation is complete. I suspect that every time the file buffer is flushed to disk, it triggers an unwanted event. The file remains locked for writing whether or not a save is in progress, so I cannot tell that way.
Creating a backup of the file DURING a save operation defeats the purpose, since it corrupts the backed-up file.
Question:
Is there a way to use the FileSystemWatcher to be notified after the save operation is complete?
If not, how else could I reliably check to see if the file is still being written to?
Alternatives: Any alternative suggestions would be welcome as well.
There's really no direct way to do exactly what you're trying to do. The file system itself doesn't know when a save operation is completed. In logical terms, you may think of it as a series of saves simply because the user clicks the Save button multiple times, but that isn't how the file system sees it. As long as the application has the file locked for writing, as far as the file system is concerned it is still in the process of being saved.
If you think about it, it makes sense. If the application holds onto write access to the file, how would the file system know when the file is in a "corrupt" state and when it's not? Only the application writing the file knows that.
If you have access to the application writing the file, you might be able to solve this problem. Failing that, you might get somewhere with the last-modified date, creating a backup only if the file hasn't been modified for a certain period of time, but that is bound to be buggy and unreliable.
I have a project that uses the .net FileSystemWatcher to watch a Samba network share for video files. When it sees a file, it adds it to an encode queue. When files are dequeued, they are moved to a local directory where the process then encodes the file to several different formats and spits them out to an output directory.
The problem arises because the video files are so big, that it often takes several minutes for them to copy completely into the network directory, so when a file is dequeued, it may or may not have completely finished being copied to the network share. When the file is being copied from a windows machine, I am able to work around it because trying to move a file that is still being copied throws an IOException. I simply catch the exception and retry every few seconds until it is done copying.
When a file is dropped into the Samba share from a computer running OS X however, that IOException is not thrown. Instead, a partial file is copied to the working directory which then fails to encode because it is not a valid video file.
So my question is, is there any way to make the FileSystemWatcher wait for files to be completely written before firing its "Created" event (based on this question I think the answer to that question is "no")? Alternatively, is there a way to get files copied from OS X to behave similarly to those in windows? Or do I need to find another solution for watching the Samba share? Thanks for any help.
Option 3. Your best bet is to have a process that watches the incoming share for files. When it sees a file, note its size and/or modification date.
Then, after some amount of time (like, 1 or 2 seconds), look again. Note any files that were seen before and compare their new sizes/mod dates to the one you saw last time.
Any file that has not changed for some "sufficiently long" period of time (1s? 5s?) is considered "done".
Once you have a "done" file, MOVE/rename that file to another directory. It is from THIS directory that your loading process can run. It "knows" that only files that are complete are in this directory.
By having this two-stage process, you can later add other rules for accepting a file beyond simple existence, since all of those rules must pass before the file gets moved to its proper staging area (you can check format, check size, etc.).
Your later process can rely on file existence, both as a start mechanism and as a restart mechanism. When the process restarts after a failure or shutdown, it can assume that any files in the second staging area are either new or incomplete and take appropriate action based on its own internal state. When processing is done, it can choose either to delete the file or to move it to a "finished" area for archiving or the like.
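The poll-until-stable plus move-to-staging flow described above can be sketched as follows; the quiet period is deliberately short for illustration, and real uploads want seconds rather than milliseconds:

```csharp
using System;
using System.IO;
using System.Threading;

// Sketch of the two-stage flow: wait until the size stops changing for a
// quiet period, then move the file into a staging directory that only
// ever holds complete files.
static bool WaitUntilStable(string path, TimeSpan quietPeriod, TimeSpan timeout)
{
    DateTime deadline = DateTime.UtcNow + timeout;
    long lastSize = -1;
    DateTime lastChange = DateTime.UtcNow;
    while (DateTime.UtcNow < deadline)
    {
        long size = new FileInfo(path).Length;
        if (size != lastSize)
        {
            lastSize = size;                 // still growing: reset the clock
            lastChange = DateTime.UtcNow;
        }
        else if (DateTime.UtcNow - lastChange >= quietPeriod)
        {
            return true;                     // no growth for the whole quiet period
        }
        Thread.Sleep(50);
    }
    return false;                            // never settled before the timeout
}

static void PromoteToStaging(string path, string stagingDir)
{
    Directory.CreateDirectory(stagingDir);
    File.Move(path, Path.Combine(stagingDir, Path.GetFileName(path)), overwrite: true);
}

string incoming = Path.Combine(Path.GetTempPath(), "incoming.dat");
File.WriteAllText(incoming, "all bytes present");
if (WaitUntilStable(incoming, TimeSpan.FromMilliseconds(200), TimeSpan.FromSeconds(5)))
    PromoteToStaging(incoming, Path.Combine(Path.GetTempPath(), "staging"));
```

The loader then only ever reads from the staging directory, so it never has to reason about partial files at all.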
I'm making a little app in C#/.NET that watches for the creation of a file; when it is created, the app reads its content, parses it, and writes the result to another file.
Everything has worked fine so far. But here's the problem: another process watches for this file as well. My process only READS the file, while the second one reads it and then DELETES it.
My application does its job, but when it has the file open for reading, the other process can't read it and crashes completely (it's not made by me, and I don't have the sources to fix it).
My application runs very fast: it opens the file only briefly to pull the content into a variable, so it can close the file quickly and then parse the content from the variable.
I clearly don't know how, but I'd like to be able to read the file while letting the other process read it at the same time, without any hiccups. Is that possible? I still think there will be a problem with the fact that the file is deleted once the other app has finished parsing it...
Any suggestions or ideas?
Thanks very much!
You can open the file as follows to ensure you don't lock it from other processes:
using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
// do your stuff
}
But if the other process is trying to open it in exclusive mode, it won't help and it will still crash. There's no way to deal with that other than fixing the code for the other process.
KISS: Can you have the file created in a location that the first program isn't watching but your software is, and then move it to the current location once you are done processing it?
Otherwise:
You are going to have contention since it's going to be a race to see which process actually "notices" the file first and begins working.
I'm assuming you also don't have any control over the process creating the file?
In that case you might look at PsSuspend or PauseSp - if you can control the other process by suspending it until you are ready for it (done with the file) then that might be viable. Not sure how robust this would be.
There's also still the potential race condition of "noticing" the file and performing an action (whatever it is) - keeping the other process paused perpetually until you want it to run (or killing it and starting it) is the only completely deterministic way to achieve what you want within the constraints.
If you are using an NTFS drive (which is very likely), then you can create a hard-link to the file. Essentially, this duplicates the file without actually creating a duplicate. You can read the file with the hard-link. The other process can delete the file, which will only remove their link to the file. This will leave the file in place for you to read. When your program is done reading the file, it can delete the hard-link, and the file system will see that both links have been deleted, and it will delete the file itself.
This can be done from the command line with
fsutil hardlink create <NewFileName> <ExistingFileName>
Or you can P/Invoke the CreateHardLink function in the Windows API.
Could you create another empty, zero-byte file with the same name but the extension .reading? Then, once the first process is done reading the file, rename .reading to .done, and the second process can watch for .done files and delete the original file, since the .done file and the original have the same name but different extensions.
@Prashant's response gave me the inspiration for this; it's very similar, but I believe it will solve your problem.
If the other process must match a certain filename pattern:
- Rename the file to something that won't match first (a very cheap/fast operation)
- Rename it back when finished

If it matches every file in a given folder:
- Move it to another folder (also a very cheap operation in most filesystems)
- Move it back when finished
If the other process has already locked your file (even for read), then your process will fail, and you can handle that gracefully. If not, you should be safe.
There is still a race condition possibility, of course, but this should be much safer than what you are doing.
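A sketch of the move-aside variant: relocate the file into a private working folder (a cheap rename on the same volume), parse it there, then move it back so the other process can still find and delete it. Folder and file names are placeholders:

```csharp
using System;
using System.IO;

// Move-aside sketch: while the file sits in the working folder, the other
// process cannot see it, so it cannot race you for it.
static string MoveAside(string path, string workDir)
{
    Directory.CreateDirectory(workDir);
    string hidden = Path.Combine(workDir, Path.GetFileName(path));
    File.Move(path, hidden, overwrite: true);   // other process no longer sees it
    return hidden;
}

static void MoveBack(string hiddenPath, string originalDir)
{
    File.Move(hiddenPath, Path.Combine(originalDir, Path.GetFileName(hiddenPath)), overwrite: true);
}

string inbox = Path.Combine(Path.GetTempPath(), "inbox");
Directory.CreateDirectory(inbox);
string msg = Path.Combine(inbox, "msg.txt");
File.WriteAllText(msg, "payload");
string hiddenCopy = MoveAside(msg, Path.Combine(Path.GetTempPath(), "work"));
// ... parse the file here, out of the other process's sight ...
MoveBack(hiddenCopy, inbox);
```

The race window shrinks to the instant of the first rename, which either succeeds atomically or fails because the other process got there first.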
I'm looking for a way to determine whether a file has been executed. I've looked a bit into FileInfo's LastAccessTime, but this doesn't seem to change when a file is executed. I've also looked into FileSystemWatcher, but this doesn't seem to offer a solution either. Is there such a thing as a file-execution listener, or is there another way? If it helps, I'm looking to write a folder listener that renames an .avi file within it after it has been watched/executed.
There is a distinction between file being "executed" (e.g. a portable executable file, like an "exe") and a file being "accessed" (e.g. an AVI file that is "played" by another exe).
It sounds like you are looking in the right place and you will want LastAccessTime, but be aware that the resolution of the access time depends on the file system: on NTFS it's a full date/time; on FAT it's just the date (hence it won't change if the file has already been accessed that day).
Actually, LastAccessTime might be what you want, since AVI files aren't "executed", only opened. I have, in the past, used it for exactly the purpose you describe, but not programmatically.
Just for the sake of completeness: Windows does not keep execution history, at least not publicly.
Edited to add:
According to MSDN, LastAccessTime is your best shot, however:
Note This method may return an inaccurate value, because it uses native functions whose values may not be continuously updated by the operating system.
But this is followed a few lines later by:
To get the latest value, call the Refresh method.
(This refers to FileSystemInfo.Refresh.)
It's all a little obtuse, if it doesn't work exactly as documented I wouldn't be surprised.
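The Refresh-then-read pattern from the MSDN note can be sketched as below. Whether access times update at all depends on volume settings (NTFS can disable last-access updates, and many systems use coarse or relatime-style behaviour), so treat the value as a heuristic, not ground truth:

```csharp
using System;
using System.IO;

// Sketch of polling LastAccessTime via Refresh: FileSystemInfo caches its
// attribute values, so Refresh forces a re-read from the operating system.
static DateTime FreshLastAccess(string path)
{
    var info = new FileInfo(path);
    info.Refresh();                 // pull the latest value from the OS
    return info.LastAccessTime;
}

string video = Path.Combine(Path.GetTempPath(), "movie.avi");
File.WriteAllText(video, "not really a video");
Console.WriteLine(FreshLastAccess(video));
```

A folder listener could poll this per file and rename any .avi whose access time moves past its creation time.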
Hmmm, I'm not too sure about finding out if a file has be run, but what might be a better approach would be to monitor the media player to determine when a video has finished playing.