Determining copy/write progress with FileSystemWatcher in C#

Context: A team of operators works with large batch files, up to 10GB in size, in a third-party application. Each file contains thousands of images, and after processing every 50 images they hit the save button. The workplace has unreliable power, and if the power goes out during a save, the entire file becomes corrupt. To overcome this, I am writing a small utility that uses a FileSystemWatcher to detect saves and create a backup, so the batch can be restored without reprocessing it from scratch.
Problem: The FileSystemWatcher does a very good job of reporting events, but there is a problem I can't pinpoint. Since the monitored files are large, the save process takes a few seconds, and I want to be notified only once the save operation is complete. I suspect that every time the file buffer is flushed to disk, it triggers an unwanted event. The file remains locked for writing whether or not a save is in progress, so I cannot tell that way.
Creating a backup of the file DURING a save operation defeats the purpose, since it corrupts the backed-up file.
Question:
Is there a way to use the FileSystemWatcher to be notified after the save operation is complete?
If not, how else could I reliably check to see if the file is still being written to?
Alternatives: Any alternative suggestions would be welcome as well.

There's really no direct way to do exactly what you're trying to do. The file system itself doesn't know when a save operation is completed. In logical terms, you may think of it as a series of saves simply because the user clicks the Save button multiple times, but that isn't how the file system sees it. As long as the application has the file locked for writing, as far as the file system is concerned it is still in the process of being saved.
If you think about it, it makes sense. If the application holds onto write access to the file, how would the file system know when the file is in a "corrupt" state and when it's not? Only the application writing the file knows that.
If you have access to the application writing the file, you might be able to solve this problem there. Failing that, you might be able to get somewhere with the last-modified date, creating a backup only if the file hasn't been modified for a certain period of time, but that approach is bound to be somewhat buggy and unreliable.
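For what it's worth, here is a minimal sketch of that last-modified approach. The watched path, backup path, and 30-second quiet period are placeholders you would adjust: the watcher restarts a countdown on every Changed event, and the copy only runs once the file has been quiet for the whole period.

using System;
using System.IO;
using System.Threading;

class QuietPeriodBackup
{
    // Placeholder values: adjust the watched path, backup path and quiet period.
    const string WatchedFile = @"C:\Batches\BatchFile.dat";
    const string BackupFile = @"D:\Backups\BatchFile.bak";
    static readonly TimeSpan QuietPeriod = TimeSpan.FromSeconds(30);

    static Timer _timer;

    static void Main()
    {
        // The timer starts "stopped"; every Changed event restarts the countdown.
        _timer = new Timer(_ => TryBackup(), null, Timeout.InfiniteTimeSpan, Timeout.InfiniteTimeSpan);

        var watcher = new FileSystemWatcher(Path.GetDirectoryName(WatchedFile), Path.GetFileName(WatchedFile))
        {
            NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.Size
        };
        watcher.Changed += (s, e) => _timer.Change(QuietPeriod, Timeout.InfiniteTimeSpan);
        watcher.EnableRaisingEvents = true;

        Console.WriteLine("Watching... press Enter to quit.");
        Console.ReadLine();
    }

    static void TryBackup()
    {
        try
        {
            // If the writer still holds an exclusive lock this throws,
            // and we simply wait for the next quiet period.
            File.Copy(WatchedFile, BackupFile, overwrite: true);
        }
        catch (IOException)
        {
            // Mid-save or still locked; skip this round.
        }
    }
}

The quiet period is the knob that makes this unreliable: too short and you risk copying a half-written file, too long and you back up less often than the operators save.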

Related

Delete an "in use" file after no processes reference it

Searched a lot, but without luck - so here goes
My C# winforms application creates temp files which are opened using the default registered application (let's call them viewer apps). Once the user is done viewing those files, I want to delete them.
Currently, I register for an Application.ApplicationExit event, to delete the file. This approach covers most of the situations but not all. Sometimes the user still has the viewing application open while exiting my app, so the success of my File.Delete depends on whether the viewer has opened the file with FileShare.Delete or not - which is out of my control.
This is what I have found so far, but it falls short of what I want:
FileOptions.DeleteOnClose does not help, since my app will already be closed in some cases while the temp file is still needed. Also, when I create the file like this: new FileStream(fn, FileMode.CreateNew, FileAccess.ReadWrite, FileShare.ReadWrite | FileShare.Delete, 4096, FileOptions.DeleteOnClose), viewer apps such as Adobe Reader and Notepad still complain that the file is in use by my application: "The process cannot access the file because it is being used by another process".
MoveFileEx with the MOVEFILE_DELAY_UNTIL_REBOOT flag works, but it waits until a reboot to delete the file. I would rather have it deleted once its use is done, since reboots can be few and far between, and forcing reboots is not the most user-friendly approach IMO. On a side note, does Windows automatically clear the %temp% folder on restart? Or is there any temp folder that Windows automatically clears on restart?
I could write another background process which constantly tries to delete the temp files until it succeeds, but I would like to avoid deploying one more piece of software to accomplish this. It could be done with a Windows service, a scheduled task, or a command-line switch that makes my existing app run in a background "delete mode", but all of these decrease my ease of deployment and use, and increase my footprint on the client's computer.
In a nutshell, I am wondering if there is any Win32 API or .NET Framework API that will delete a file as soon as there are no processes with open handle to that file?
EDIT:
The information in the temp files is reasonably private (think of your downloaded bank account statements), hence the need for immediate deletion after viewing, as opposed to waiting for a reboot or app restart.
Summary of all Answers and Comments
After doing some more experiments with inputs from Scott Chamberlain's answer, and other comments on this question, the best path seems to be to force the end users to close the viewer app before closing my application, if the viewer app disallows deletion (FileShare.Delete) of the temp file. The below factors played a role in the decision
The best option is FileOptions.DeleteOnClose, but this only works if every handle opened before or after this call uses the FileShare.Delete option to open the file.
Viewer apps can frequently open files without FileShare.Delete option.
Some viewers close the handle immediately after reading/displaying the file contents (like Notepad), whereas other apps (like Adobe Reader) retain the handle until the file is closed in the viewer.
Keeping sensitive files on disk for any longer than required is definitely not a good way to proceed. So waiting till reboot should only be used as a fail-safe and not as the main strategy.
The cost of maintaining another process to do the temp-file cleanup far exceeds the slight inconvenience to users when they are forced to "close" the viewer before proceeding.
This answer is based on my comments in the question.
Try writing the file without the delete option, close it, and let the viewer open it; then open a new FileStream on it for reading with DeleteOnClose, with an empty body in the using block.
If that second open does not fail, it behaves exactly as you wanted: the file is deleted as soon as no process has an open handle to it. If the second open does fail, you can use MoveFileEx as a fallback failsafe.
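Roughly, and assuming tempPath is the temp file you wrote earlier (without DeleteOnClose) and the viewer is already open on it, that second open might look like this; whether it succeeds depends entirely on the sharing mode the viewer used:

string tempPath = Path.Combine(Path.GetTempPath(), "statement.pdf"); // placeholder: the temp file written earlier

try
{
    // Ask for a second handle that carries the delete-on-close disposition.
    // FileShare.ReadWrite | FileShare.Delete keeps the viewer's handle valid.
    using (new FileStream(tempPath,
        FileMode.Open, FileAccess.Read,
        FileShare.ReadWrite | FileShare.Delete,
        4096, FileOptions.DeleteOnClose))
    {
        // Intentionally empty: we only want DeleteOnClose applied, so the file
        // disappears as soon as the last handle (ours or the viewer's) closes.
    }
}
catch (IOException)
{
    // The viewer opened the file without FileShare.Delete, so the second open
    // failed; fall back to MoveFileEx with MOVEFILE_DELAY_UNTIL_REBOOT, or ask
    // the user to close the viewer first.
}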

How to programmatically determine which process created a file in .net?

There are several threads on SO that describe how to check which application creates a file with tools like Sysinternals process monitor. Is something like this possible programmatically from .net?
Background: My program has to remote-control a proprietary third party application using its automation interface, and one of the functions I need from this application has a bug where it creates a bunch of temporary files in %TEMP% that are called tmpXXXX.tmp (the same as .net's Path.GetTempFileName() does) but does not delete them. This causes the C drive to become full over time, eventually failing the application. I already filed a bug to the manufacturer, but we need a temporary workaround for the time being, so I thought of putting a FileSystemWatcher on %TEMP% that watches tmp*.tmp, collects these files, and after the operation on the third-party application finishes, deletes them. But this is risky as another application might also write files with the same file name pattern to %TEMP% so I only want to delete those created by NastyBuggyThirdPartyApplication.exe.
Is this anyhow possible?
This kind of thing is possible, but it can be a bit tricky.
To know who created the file, look at the user that owns it. You might therefore need to create a dedicated user and run that application under it. To do so, write a small launcher that starts your buggy app while impersonating that user, so anything done within the app, including creating files, is done as that user.
I don't know how to monitor and get triggered when a file is created, but nothing prevents you from setting a timer that wakes up every five or ten minutes, checks whether any file in the directory is owned by the application's user and no longer open, and deletes it if so.
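Here is a rough sketch of that ownership check, assuming the buggy app has been started under a dedicated account (the MYDOMAIN\BuggyAppUser name and temp path below are placeholders) and that something like this runs on each timer tick:

using System;
using System.IO;
using System.Security.AccessControl;
using System.Security.Principal;

// Placeholders: the dedicated account the buggy app runs under,
// and the temp folder it writes its tmpXXXX.tmp files to.
const string AppAccount = @"MYDOMAIN\BuggyAppUser";
const string TempDir = @"C:\Users\BuggyAppUser\AppData\Local\Temp";

foreach (string file in Directory.GetFiles(TempDir, "tmp*.tmp"))
{
    try
    {
        // Windows only: read the file's owner from its security descriptor.
        FileSecurity security = new FileInfo(file).GetAccessControl();
        string owner = security.GetOwner(typeof(NTAccount)).Value;

        if (string.Equals(owner, AppAccount, StringComparison.OrdinalIgnoreCase))
            File.Delete(file);   // throws if the app still has the file open
    }
    catch (IOException)
    {
        // Still in use; the next timer tick will retry.
    }
    catch (UnauthorizedAccessException)
    {
        // No access to this file; skip it.
    }
}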
If the manufacturer reacts quickly and fixes the bug, you won't need your workaround for very long. So another solution, if possible, might simply be to move the Temp folder to another drive that has lots of space...
One solution is to use a FileSystemWatcher to automatically delete the files, but before deleting you should check that the file is not currently locked or in use by another process. For example, the Sysinternals Suite has a tool called handle.exe that can do this. Use it from the command line:
handle.exe -a
You can invoke this from a C# program (there might be some performance issues, though).
So when a file is created, you verify whether it is in use or locked (for example, you can use the code provided in "Is there a way to check if a file is in use?") and then delete it.
Most of the time when an app is using a temp file it will lock it to prevent just what you fear, that you might delete files from other processes.
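The check from that linked question is essentially "try to open the file exclusively and see if it throws"; a small sketch (the FileSystemWatcher wiring is omitted):

using System.IO;

static bool IsFileLocked(string path)
{
    try
    {
        // Asking for an exclusive handle (FileShare.None) fails while any
        // other process still has the file open.
        using (File.Open(path, FileMode.Open, FileAccess.Read, FileShare.None))
        {
            return false;
        }
    }
    catch (IOException)
    {
        return true;
    }
}

// Inside the watcher's Created handler, for example:
// if (!IsFileLocked(e.FullPath)) File.Delete(e.FullPath);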
As far as I can tell there is no sure way to identify which process created a specific file.

When updating a text file via code, is the entire file re-saved, or just the parts?

Okay so my overall goal is to create a UWP notes app that doesn't require the end-user to manually save each note they write; this would be done automatically for them.
So what I'm looking to do is create a C# class that will detect changes to the document the user is currently writing and constantly update the underlying text file (This will eventually be written to a row within the database, but I hear it is less efficient to constantly update records within a DB than to deal with text files for this matter?).
But yeah, this is pretty much what apps like OneNote do in the background for the user, so the user never has to worry about saving the file or losing data when the computer loses power or the app terminates unexpectedly.
So if I created a class that detected changes to the document and then update the underlying file, is the WHOLE file rewritten or just the particular parts (bytes?) that were changed within (or appended to) the text?
I'm just looking for the most efficient way to constantly update a file because if a user is a fast typist, the system will have to be able to keep up with every single keystroke input.
Last, would the entire file have to be rewritten if the user makes random changes to the text at random locations (rather than append to the end of the file)? Does any of this even make sense. I tend to write a lot to ask a simple question. I have problems....
I would use a timer Tick event and have it automatically save every 3 to 5 seconds; I do this a lot. I understand what you're doing, but automatically saving on every keystroke would put a lot of stress on the program.
I would automatically save every few seconds, but conditionally:
if a change is detected, then it saves. Think about this answer: it would have been saved almost 100 times if that were done on every keystroke.
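A rough WinForms-style sketch of the idea, assuming a TextBox named noteTextBox and a placeholder note path; every keystroke only sets a dirty flag, and the timer writes only when something actually changed:

using System;
using System.IO;
using System.Windows.Forms;

public partial class NotesForm : Form
{
    // Placeholder path for the note being edited.
    private readonly string _notePath = Path.Combine(
        Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments), "note.txt");

    private readonly Timer _autoSaveTimer = new Timer { Interval = 5000 }; // 5 seconds
    private bool _isDirty;

    public NotesForm()
    {
        InitializeComponent();

        // Every keystroke only flips a flag; the actual write happens on the timer.
        noteTextBox.TextChanged += (s, e) => _isDirty = true;

        _autoSaveTimer.Tick += (s, e) =>
        {
            if (!_isDirty) return;            // nothing changed since the last save
            _isDirty = false;
            File.WriteAllText(_notePath, noteTextBox.Text); // rewrites the whole file
        };
        _autoSaveTimer.Start();
    }
}

Note that File.WriteAllText rewrites the whole file on each save; for note-sized text files that is usually cheap enough.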

Why doesn't OS X lock files like windows does when copying to a Samba share?

I have a project that uses the .net FileSystemWatcher to watch a Samba network share for video files. When it sees a file, it adds it to an encode queue. When files are dequeued, they are moved to a local directory where the process then encodes the file to several different formats and spits them out to an output directory.
The problem arises because the video files are so big, that it often takes several minutes for them to copy completely into the network directory, so when a file is dequeued, it may or may not have completely finished being copied to the network share. When the file is being copied from a windows machine, I am able to work around it because trying to move a file that is still being copied throws an IOException. I simply catch the exception and retry every few seconds until it is done copying.
When a file is dropped into the Samba share from a computer running OS X however, that IOException is not thrown. Instead, a partial file is copied to the working directory which then fails to encode because it is not a valid video file.
So my question is, is there any way to make the FileSystemWatcher wait for files to be completely written before firing its "Created" event (based on this question I think the answer to that question is "no")? Alternatively, is there a way to get files copied from OS X to behave similarly to those in windows? Or do I need to find another solution for watching the Samba share? Thanks for any help.
Option 3. Your best bet is to have a process that watches the incoming share for files. When it sees a file, note its size and/or modification date.
Then, after some amount of time (like, 1 or 2 seconds), look again. Note any files that were seen before and compare their new sizes/mod dates to the one you saw last time.
Any file that has not changed for some "sufficiently long" period of time (1s? 5s?) is considered "done".
Once you have a "done" file, MOVE/rename that file to another directory. It is from THIS directory that your loading process can run. It "knows" that only files that are complete are in this directory.
This two-stage process also lets you add other acceptance rules later (check format, check size, etc.) beyond the simple rule of file existence, since all of those rules must pass before the file gets moved to its proper staging area.
Your later process can rely on file existence, both as a start mechanism and as a restart mechanism. When the process restarts after a failure or shutdown, it can assume that any files in the second staging area are either new or were incompletely processed, and take appropriate action based on its own internal state. When processing is done it can choose to either delete the file or move it to a "finished" area for archiving or the like.
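A minimal sketch of that polling loop, with placeholder incoming/ready paths and a 5-second "unchanged" threshold; it just compares each file's size and last-write time against what was seen on earlier polls:

using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

class IncomingFolderScanner
{
    // Placeholder paths and stability threshold.
    const string IncomingDir = @"\\server\share\incoming";
    const string ReadyDir = @"\\server\share\ready";
    static readonly TimeSpan Stable = TimeSpan.FromSeconds(5);

    // Last observed size and timestamp per file, plus when that state was first seen.
    static readonly Dictionary<string, (long Size, DateTime Modified, DateTime Seen)> _seen
        = new Dictionary<string, (long Size, DateTime Modified, DateTime Seen)>();

    static void Main()
    {
        while (true)
        {
            foreach (var file in Directory.GetFiles(IncomingDir))
            {
                var info = new FileInfo(file);
                if (_seen.TryGetValue(file, out var prev)
                    && prev.Size == info.Length
                    && prev.Modified == info.LastWriteTimeUtc)
                {
                    // Size and timestamp unchanged; if that has held for the
                    // whole threshold, treat the copy as done and move it.
                    if (DateTime.UtcNow - prev.Seen >= Stable)
                    {
                        File.Move(file, Path.Combine(ReadyDir, Path.GetFileName(file)));
                        _seen.Remove(file);
                    }
                }
                else
                {
                    // New file, or it grew/changed: restart its clock.
                    _seen[file] = (info.Length, info.LastWriteTimeUtc, DateTime.UtcNow);
                }
            }

            Thread.Sleep(1000); // poll every second
        }
    }
}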

Reading a file without preventing other processes from reading it at the same time

I'm making a little app in C#/.NET that watches for the creation of a file; when it is created, the app reads its content, parses it, and writes it to another file.
Everything is working fine so far. But the problem is: there's another process that watches for this file as well. My process only READS the file, while the second one reads it and then DELETES it.
My application does its job, but while it is reading the file, the other process can't read it and crashes completely (it's not made by me and I don't have the sources to fix it).
My application runs very fast and only opens the file for a very short time: it reads the content into a variable so it can close the file quickly, and then parses the content from the variable.
I clearly don't know how, but I'd like to be able to read the file while letting the other process read it at the same time without any hiccups. Is that possible? I still think there will be a problem with the fact that the file gets deleted once the other app is done parsing it...
Any suggestions or ideas?
Thanks very much!
You can open the file as follows to ensure you don't lock it from other processes:
using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
// do your stuff
}
But if the other process is trying to open it in exclusive mode, it won't help and it will still crash. There's no way to deal with that other than fixing the code for the other process.
KISS: Can you have the file created in a location that the first program isn't watching but your software is, and, when you are done processing it, move it to the location where the first program is looking?
Otherwise:
You are going to have contention since it's going to be a race to see which process actually "notices" the file first and begins working.
I'm assuming you also don't have any control over the process creating the file?
In that case you might look at PsSuspend or PauseSp - if you can control the other process by suspending it until you are ready for it (done with the file) then that might be viable. Not sure how robust this would be.
There's also still the potential race condition of "noticing" the file and performing an action (whatever it is) - keeping the other process paused perpetually until you want it to run (or killing it and starting it) is the only completely deterministic way to achieve what you want within the constraints.
If you are using an NTFS drive (which is very likely), then you can create a hard-link to the file. Essentially, this duplicates the file without actually creating a duplicate. You can read the file with the hard-link. The other process can delete the file, which will only remove their link to the file. This will leave the file in place for you to read. When your program is done reading the file, it can delete the hard-link, and the file system will see that both links have been deleted, and it will delete the file itself.
This can be done from the command line with
fsutil hardlink create <NewFileName> <ExistingFileName>
Or you can P/Invoke the CreateHardLink function in the Windows API.
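If you go the P/Invoke route, the declaration looks roughly like this (the example paths are placeholders):

using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

static class HardLink
{
    // CreateHardLink from kernel32: creates a second directory entry for an
    // existing file; both names refer to the same data on disk.
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern bool CreateHardLink(
        string lpFileName,            // new link to create
        string lpExistingFileName,    // existing file
        IntPtr lpSecurityAttributes); // reserved, must be IntPtr.Zero

    public static void Create(string newLink, string existingFile)
    {
        if (!CreateHardLink(newLink, existingFile, IntPtr.Zero))
            throw new Win32Exception(Marshal.GetLastWin32Error());
    }
}

// Example (paths are placeholders):
// HardLink.Create(@"C:\work\copy-for-me.dat", @"C:\drop\incoming.dat");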
Can you create an empty, zero-byte marker file with the same name but the extension ".reading"? Then, once the first process is done reading the file, rename ".reading" to ".done"; the second process can check for ".done" files and delete the original file, since the ".done" file and the original have the same name but different extensions.
Prashant's response gave me the inspiration for this, and it's very similar, but I believe this will solve your problem.
If the other process must match a certain filename pattern:
- Rename the file to something that won't match first (a very cheap/fast operation).
- Rename it back when finished.
If it matches every file in a given folder:
- Move it to another folder (also a very cheap operation in most filesystems).
- Move it back when finished.
If the other process had already locked your file (even for reading), then your rename/move would fail, and you can handle that gracefully. If not, you should be safe.
There is still a race condition possibility, of course, but this should be much safer than what you are doing.
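As a sketch of the rename variant, assuming this runs in your FileSystemWatcher's Created handler (so e.FullPath is the new file) and that the ".processing" suffix is an arbitrary name the other process won't match:

using System.IO;

// e.FullPath comes from the watcher's Created event.
string hidden = e.FullPath + ".processing";  // a name the other process won't match

try
{
    // Fails if the other process has already grabbed or deleted the file,
    // which you can treat as "not ours this time".
    File.Move(e.FullPath, hidden);

    string content = File.ReadAllText(hidden);
    // ... parse "content" and write your output file here ...
}
finally
{
    // Hand the file back so the other process can read and delete it.
    if (File.Exists(hidden))
        File.Move(hidden, e.FullPath);
}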
