I realise that FileSystemWatcher does not provide a Move event; instead it generates separate Delete and Create events for the same file. (The FileSystemWatcher is watching both the source and destination folders.)
However how do we differentiate between a true file move and some random creation of a file that happens to have the same name as a file that was recently deleted?
Some sort of property of the FileSystemEventArgs class such as "AssociatedDeleteFile" that is assigned the deleted file path if it is the result of a move, or NULL otherwise, would be great. But of course this doesn't exist.
I also understand that the FileSystemWatcher is operating at the basic Filesystem level and so the concept of a "Move" may be only meaningful to higher level applications. But if this is the case, what sort of algorithm would people recommend to handle this situation in my application?
Update based on feedback:
The FileSystemWatcher class seems to see moving a file as simply 2 distinct events, a Delete of the original file, followed by a Create at the new location.
Unfortunately there is no "link" provided between these events, so it is not obvious how to differentiate between a file move and a normal Delete or Create. At the OS level, a move is treated specially: you can move, say, a 1GB file almost instantaneously.
A couple of answers suggested using a hash on files to identify them reliably between events, and I will probably take this approach. But if anyone knows how to detect a move more simply, please leave an answer.
According to the docs:
Common file system operations might raise more than one event. For example, when a file is moved from one directory to another, several OnChanged and some OnCreated and OnDeleted events might be raised. Moving a file is a complex operation that consists of multiple simple operations, therefore raising multiple events.
So if you're trying to be very careful about detecting moves, and having the same path is not good enough, you will have to use some sort of heuristic. For example, create a "fingerprint" using file name, size, last modified time, etc for files in the source folder. When you see any event that may signal a move, check the "fingerprint" against the new file.
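The fingerprint idea can be sketched roughly as follows. This is only an illustration, not a reliable move detector: `FileFingerprint` and `FingerprintIndex` are hypothetical names, the fingerprint uses just name, size, and last write time, and the index has to be built before any delete happens because a deleted file can no longer be inspected. It assumes .NET 6 or later (for `record struct`).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical fingerprint built from cheap attributes. A match is a hint, not proof.
public readonly record struct FileFingerprint(string Name, long Size, DateTime LastWriteUtc)
{
    public static FileFingerprint FromPath(string path)
    {
        var info = new FileInfo(path);
        return new FileFingerprint(info.Name, info.Length, info.LastWriteTimeUtc);
    }
}

// Hypothetical index of fingerprints for the files known to be in the source folder.
public sealed class FingerprintIndex
{
    private readonly Dictionary<string, FileFingerprint> _byPath =
        new(StringComparer.OrdinalIgnoreCase);

    // Record every file currently in the folder (call this before watching starts).
    public void Scan(string folder)
    {
        foreach (var path in Directory.EnumerateFiles(folder))
            _byPath[path] = FileFingerprint.FromPath(path);
    }

    // Call from the Created handler. Returns the old path whose fingerprint matches
    // the new file and which no longer exists on disk, or null if nothing matches.
    public string? FindMoveSource(string createdPath)
    {
        var created = FileFingerprint.FromPath(createdPath);
        foreach (var (oldPath, fingerprint) in _byPath)
        {
            if (fingerprint == created && !File.Exists(oldPath))
                return oldPath;
        }
        return null;
    }
}
```

In a real watcher the Created event can fire while the file is still being written, so the size and timestamp may need to be re-read once the file has settled.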
As far as I understand it, the Renamed event is for files being moved...?
My mistake - the docs specifically say that only files inside a moved folder are considered "renamed" in a cut-and-paste operation:
The operating system and FileSystemWatcher object interpret a cut-and-paste action or a move action as a rename action for a folder and its contents. If you cut and paste a folder with files into a folder being watched, the FileSystemWatcher object reports only the folder as new, but not its contents because they are essentially only renamed.
It also says about moving files:
Common file system operations might raise more than one event. For example, when a file is moved from one directory to another, several OnChanged and some OnCreated and OnDeleted events might be raised. Moving a file is a complex operation that consists of multiple simple operations, therefore raising multiple events.
As you already mentioned, there is no reliable way to do this with the default FileSystemWatcher class provided by .NET. You can apply certain heuristics like filename, hashes, or unique file ids to map created and deleted events together, but none of these approaches will work reliably. In addition, you cannot easily get the hash or file id for the file associated with the deleted event, meaning that you have to maintain these values in some sort of database.
I think the only reliable approach for detecting file movements is to write your own file system watcher. There are different ways to do that. If you are only going to watch changes on NTFS file systems, one solution might be to read out the NTFS change journal as described here. What's nice about this is that it even allows you to track changes that occurred while your app wasn't running.
Another approach is to create a minifilter driver that tracks file system operations and forwards them to your application. Using this you basically get all information about what is happening to your files and you'll be able to get information about moved files. A drawback of this approach is that you have to create a separate driver that needs to be installed on the target system. The good thing however is that you wouldn't need to start from scratch, because I already started to create something like this: https://github.com/CenterDevice/MiniFSWatcher
This allows you to simply track moved files like this:
var eventWatcher = new EventWatcher();
eventWatcher.OnRenameOrMove += (filename, oldFilename, process) =>
{
    Console.WriteLine("File " + oldFilename + " has been moved to " + filename + " by process " + process);
};
eventWatcher.Connect();
eventWatcher.WatchPath("C:\\Users\\MyUser\\*");
However, please be aware that this requires kernel code that needs to be signed in order to run on 64-bit versions of Windows (unless you disable signature checking for testing). At the time of writing, this code is also still in an early stage of development, so I would not use it on production systems yet. But even if you're not going to use this, it should still give you some information about how file system events might be tracked on Windows.
I'll hazard a guess 'move' indeed does not exist, so you're really just going to have to look for a 'delete' and then mark that file as one that could be 'possibly moved', and then if you see a 'create' for it shortly after, I suppose you can assume you're correct.
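A minimal sketch of that "delete, then create shortly after" guess, assuming files are matched by file name only. `MoveGuesser` and its method names are hypothetical, and the time window is whatever the caller chooses. Timestamps are passed in explicitly so the logic can be exercised without a real watcher.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: pairs a Deleted event with a later Created event for the
// same file name if the create arrives within a short time window.
public sealed class MoveGuesser
{
    private readonly TimeSpan _window;
    private readonly Dictionary<string, (string Path, DateTime SeenUtc)> _pendingDeletes =
        new(StringComparer.OrdinalIgnoreCase);

    public MoveGuesser(TimeSpan window) => _window = window;

    // Call from the Deleted handler: mark the file as "possibly moved".
    public void OnDeleted(string fullPath, DateTime nowUtc) =>
        _pendingDeletes[Path.GetFileName(fullPath)] = (fullPath, nowUtc);

    // Call from the Created handler. Returns the probable source path, or null
    // if no recent delete with the same file name was seen.
    public string? OnCreated(string fullPath, DateTime nowUtc)
    {
        var name = Path.GetFileName(fullPath);
        if (!_pendingDeletes.Remove(name, out var pending))
            return null;
        return nowUtc - pending.SeenUtc <= _window ? pending.Path : null;
    }
}
```

FileSystemWatcher raises its events on thread-pool threads, so a real handler would need a lock around the dictionary. The sketch also assumes the delete is reported before the create, which is not guaranteed.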
Do you have a case of random file creations affecting your detection of moves?
Might want to try the OnChanged and/or OnRenamed events mentioned in the documentation.
The StorageLibrary class can track moves. The example from Microsoft:
StorageLibrary videosLib = await StorageLibrary.GetLibraryAsync(KnownLibraryId.Videos);
StorageLibraryChangeTracker videoTracker = videosLib.ChangeTracker;
videoTracker.Enable();
A complete example can be found here.
However, it looks like you can only track changes inside Windows "known libraries".
You can also try to get a StorageLibraryChangeTracker using StorageFolder.TryGetChangeTracker(). But your folder must be under a sync root; you cannot use this method for an arbitrary folder in the file system.
After having read questions dealing with how to tell whether two files are on the same physical volume or not, and seeing that it's (almost) impossible (e.g. here), I'm wondering how the OS knows whether a file move operation should update a master file table (or its equivalent) or whether to copy and delete.
Does Windows delegate that to the drives somehow? (Or perhaps the OS does have information about every file, and it's just not accessible by programs? Unlikely.)
Or - Does Windows know only about certain types of drives (and copies and deletes in other cases)? In which case we could also assume the same. Which means allowing a file move without using a background thread, for example. (Because it will be near instantaneous.)
I'm trying to better understand this subject. If I'm making some basic incorrect assumption - please, correcting that in itself would be an answer.
If needed to limit the scope, let's concentrate on Windows 7 and up, and NTFS and FAT drives.
Of course the operating system knows which drive (and which partition on that drive) contains any particular local file; otherwise, how could it read the data? (For remote files, the operating system doesn't know about the drives, but it does know which server to contact. Moves between different servers are implemented as copy-and-delete; moves on the same server are either copy-and-delete or are delegated to that server, depending on the protocol in use.)
This information is also available to applications. You can use the GetFileInformationByHandle() function to obtain the serial number of the volume containing a particular file.
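From C#, GetFileInformationByHandle() can be reached through P/Invoke. The sketch below is Windows-only, and the wrapper name `FileIdentity` is hypothetical. It returns the volume serial number together with the 64-bit file index, so two paths on the same volume report the same serial number.

```csharp
using System;
using System.ComponentModel;
using System.IO;
using System.Runtime.InteropServices;
using System.Runtime.InteropServices.ComTypes;
using Microsoft.Win32.SafeHandles;

public static class FileIdentity
{
    // Mirrors the Win32 BY_HANDLE_FILE_INFORMATION structure.
    [StructLayout(LayoutKind.Sequential)]
    private struct BY_HANDLE_FILE_INFORMATION
    {
        public uint FileAttributes;
        public FILETIME CreationTime;
        public FILETIME LastAccessTime;
        public FILETIME LastWriteTime;
        public uint VolumeSerialNumber;
        public uint FileSizeHigh;
        public uint FileSizeLow;
        public uint NumberOfLinks;
        public uint FileIndexHigh;
        public uint FileIndexLow;
    }

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool GetFileInformationByHandle(
        SafeFileHandle hFile, out BY_HANDLE_FILE_INFORMATION lpFileInformation);

    // Windows only. Returns the serial number of the containing volume and the file index.
    public static (uint VolumeSerial, ulong FileIndex) Get(string path)
    {
        using var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                          FileShare.ReadWrite | FileShare.Delete);
        if (!GetFileInformationByHandle(stream.SafeFileHandle, out var info))
            throw new Win32Exception(Marshal.GetLastWin32Error());
        ulong index = ((ulong)info.FileIndexHigh << 32) | info.FileIndexLow;
        return (info.VolumeSerialNumber, index);
    }
}
```

Comparing `FileIdentity.Get(a).VolumeSerial` with `FileIdentity.Get(b).VolumeSerial` gives a same-volume check for two existing files.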
The OS does have information about every file, and it's just not as easily accessible to your program. Not in any portable way, that is.
See it this way: Those files are owned by the system. The system allocates the space, manages the volume and indexes. It's not going to copy and delete the file if it ends up in the same physical volume, as it is more efficient to move the file. It will only copy and delete if it needs to.
In C or C++ for Windows I first try to MoveFileEx without MOVEFILE_COPY_ALLOWED set. It will fail if the file can not be moved by renaming. If rename fails I know that it may take some time and show some progress bar or the like.
AFAIK there is no such rename-only move in .NET, and System::IO::File::Move does not fail if you move between different volumes.
First, regarding "Does Windows delegate that to the drives somehow": no. The OS is more like a central nervous system. It keeps track of what's going on centrally, including for its distributed assets (or devices) such as a drive (internal or external).
It follows that the OS has information about every file residing on a drive that it has successfully enumerated. The most relevant part of the OS with respect to file access is the file system. There are several types. Knowledge of the following topics will help to understand issues surrounding file access:
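The same probe can be made from C# by calling MoveFileEx through P/Invoke with no flags set, so MOVEFILE_COPY_ALLOWED is absent. This is a Windows-only sketch, and `RenameOnlyMove` and `TryMoveByRename` are hypothetical names. A `false` result means the move would need a copy, so the caller can switch to a background thread or show progress.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

public static class RenameOnlyMove
{
    // Win32 error returned when source and destination are on different volumes.
    private const int ERROR_NOT_SAME_DEVICE = 17;

    [DllImport("kernel32.dll", EntryPoint = "MoveFileExW",
               SetLastError = true, CharSet = CharSet.Unicode)]
    private static extern bool MoveFileEx(
        string lpExistingFileName, string lpNewFileName, uint dwFlags);

    // Windows only. Returns true if the file was moved by a plain rename,
    // false if the move would require copy-and-delete (different volume).
    // Any other failure is thrown as a Win32Exception.
    public static bool TryMoveByRename(string source, string destination)
    {
        if (MoveFileEx(source, destination, 0)) // 0 = no MOVEFILE_COPY_ALLOWED
            return true;

        int error = Marshal.GetLastWin32Error();
        if (error == ERROR_NOT_SAME_DEVICE)
            return false;
        throw new Win32Exception(error);
    }
}
```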
1) File attribute settings
2) User Access Controls
3) File location (pdf) (related to User Access Controls)
4) Current state of file (i.e. is the file in use currently)
5) Access Control Lists
Regarding "will be near instantaneous": this is obviously only a perception. No matter how fast, or seemingly simultaneous, file handling via standard programming libraries can be done in such a way as to be aware of file-related errors, such as:
ENOMEM - insufficient memory.
EMFILE - FOPEN_MAX files open already.
EINVAL - filename is NULL or contains only whitespace.
EINVAL - invalid mode.
These error codes (listed here in relation to fopen) can be used to mitigate OS/file run-time issues. That said, applications should always be written to comply with good programming methods to avoid bumping into OS-related file access issues, thread safety included.
I have a folder with the following structure
Parent/
    Child1/
        GrandChild1/
            File1.txt
I need to query Parent folder and find out if Child1 has changed.
Changed = a file was added, updated, or deleted.
The Child1 folder's DateModified is not updated. Only GrandChild1's DateModified is updated when changes occur. I am trying to avoid going down to the file level to determine whether the root parent has changed, since there will be many folders and subfolders. I just need to know if Child1 has changed.
I do not want to use FileSystemWatcher, since I am running this as a scheduled job and not watching it LIVE.
Use FileSystemWatcher. Remember to enable raising events, since forgetting this is a common mistake (watchfolder.EnableRaisingEvents = true;).
The FileSystemWatcher may prove not to be optimal from a performance perspective. If that is an issue for you, you might implement a CRC check with a Timer to check for changes of the files and folders you are interested in.
Essentially, I would generate a CRC32 hash for the entire folder being watched (and save it in variable A). When it is time to check for changes, calculate a new CRC32 hash for the same folder (into variable B), then compare A with B: if they don't match, something has changed. Really not that difficult.
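A rough sketch of such a folder digest follows. Two things here are assumptions rather than part of the answer: it uses SHA-256 from the base class library instead of CRC32 (which needs extra code or a package), and it hashes only each file's relative path, size, and last write time instead of reading file contents. `FolderDigest` is a hypothetical name, and the code assumes .NET 5 or later.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class FolderDigest
{
    // Builds one digest over the relative path, size, and last write time of
    // every file under root (subfolders included). File contents are not read.
    public static string Compute(string root)
    {
        var builder = new StringBuilder();
        var files = Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories)
                             .OrderBy(path => path, StringComparer.Ordinal);
        foreach (var path in files)
        {
            var info = new FileInfo(path);
            builder.Append(Path.GetRelativePath(root, path)).Append('|')
                   .Append(info.Length).Append('|')
                   .Append(info.LastWriteTimeUtc.Ticks).Append('\n');
        }
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(builder.ToString()));
        return Convert.ToHexString(hash);
    }
}
```

A scheduled job could store the result of `FolderDigest.Compute` for Child1 after each run and compare it with the value from the previous run. This still enumerates every file, but it only reads metadata.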
Reference:
http://www.codeproject.com/Articles/26528/C-Application-to-Watch-a-File-or-Directory-using-F
http://social.msdn.microsoft.com/Forums/zh/netfxbcl/thread/b7612249-eb32-4005-9d6b-7f291c218326
http://damieng.com/blog/2006/08/08/calculating_crc32_in_c_and_net
http://marknelson.us/1992/05/01/file-verification-using-crc-2/
Have you tried the file system watcher?
You can monitor local drives for changes from a given path, and then if necessary, ignore or process the fact they changed.
You can use the FileSystemWatcher Class for this.
Alternatively, if you would rather schedule a Task to run, Weekly, for example, you might want to have a look at: http://taskscheduler.codeplex.com/ and http://www.emoreau.com/Entries/Articles/2004/08/Interfacing-the-Windows-Task-Scheduler.aspx
And here's a link to the Windows Task Scheduler API
I need to uniquely identify a file on Windows so I can always have a reference for that file even if it's moved or renamed. I did some research and found the question Unique file identifier in windows with a way that uses the method GetFileInformationByHandle with C++, but apparently that only works for NTFS partitions, but not for the FAT ones.
I need to program a behavior like Dropbox's: if you close it on your computer, rename a file, and open it again, it detects that change and syncs correctly. I wonder what the technique is, and maybe how Dropbox does it, if you guys know.
FileSystemWatcher, for example, would work, but if the program using it is closed, no changes can be detected.
I will be using C#.
Thanks,
The next best method (but one that involves reading every file completely, which I'd avoid when it can be helped) would be to compare file size and a hash (e.g. SHA-256) of the file contents. The probability that both collide is fairly slim, especially under normal circumstances.
I'd use the GetFileInformationByHandle way on NTFS and fall back to hashing on FAT volumes.
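The hashing fallback could look roughly like this. `ContentFingerprint` is a hypothetical name, and the sketch assumes .NET 5 or later for `Convert.ToHexString`. It reads the whole file, which is the cost mentioned above.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class ContentFingerprint
{
    // Returns the file size together with the SHA-256 hash of its contents.
    // Two files with equal results are very likely the same file content.
    public static (long Size, string Sha256) Compute(string path)
    {
        using var stream = File.OpenRead(path);
        using var sha = SHA256.Create();
        byte[] hash = sha.ComputeHash(stream);
        return (stream.Length, Convert.ToHexString(hash));
    }
}
```

Because the result depends only on content, it stays the same after a move or rename, but it also matches any other file with identical content.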
In Dropbox's case, though, I think there is a service or process running in the background observing file system changes. It's the most reliable way, even if it ceases to work if you stop said service/process.
What the user was most likely looking for is Windows change journals. These record changes such as file renames persistently, so there is no need to keep a watcher running all the time to observe file system events. Instead, one simply needs to remember the position in the journal that was last read and continue from that point on the next run. At some point a file with an already known ID will have an event of type RENAME, and whoever is interested in that event can apply the same rename to its own version of that file. The important thing, of course, is to keep track of the IDs used for files.
An automatic backup application is one example of a program that must check for changes to the state of a volume to perform its task. The brute force method of checking for changes in directories or files is to scan the entire volume. However, this is often not an acceptable approach because of the decrease in system performance it would cause. Another method is for the application to register a directory notification (by calling the FindFirstChangeNotification or ReadDirectoryChangesW functions) for the directories to be backed up. This is more efficient than the first method, however, it requires that an application be running at all times. Also, if a large number of directories and files must be backed up, the amount of processing and memory overhead for such an application might also cause the operating system's performance to decrease.
To avoid these disadvantages, the NTFS file system maintains an update sequence number (USN) change journal. When any change is made to a file or directory in a volume, the USN change journal for that volume is updated with a description of the change and the name of the file or directory.
https://learn.microsoft.com/en-us/windows/win32/fileio/change-journals
I need to monitor a folder and its subdirectories for any file manipulations (add/remove/rename). I've read about FileSystemWatcher but I'd like to monitor changes between each time the program is run or when the user presses the "check for changes" button (FSW seems more orientated to runtime detection). My first thought was to iterate through all the (sub)directories and hash each file. Then, concatenate all the hashes (which have been ordered) and hash that. When I want to check for changes, I repeat the process and check if the hashes are the same.
Is this an efficient way of doing it?
Also, once I've detected a change, how do I find out what file has been added, removed or renamed as quickly as possible?
As a side note, I don't mind using scripts to do this if they're faster as long as those scripts don't require end users to install anything and the scripts can notify my C# app of the changes.
We handle this by storing all found files in a database along with their last modification time.
On each pass through the files, we check the database for each file: if it doesn't exist in the DB, it is new and if it does exist, but the timestamp is different, it has changed.
There is also an option to handle deleted files by marking all of the files in the database as ToBeDeleted prior to the pass and clearing this flag whenever the file is found. Then, at the end of the process, we can just delete all of the records that are still marked as ToBeDeleted.
Obviously you need to make "snapshots" of the directory tree and compare them as required. What exactly goes into the snapshots would depend on your requirements. Keep in mind that:
You need to store filenames in order to detect "new" and "deleted" files
File sizes and last-modified times are a good and cheap indicator that a file has or has not changed, but do not provide a guarantee
Hashing the contents of files can be prohibitively expensive if the files can be large, but it's the only way to know they have changed with a near-perfect degree of accuracy (remember that hashes can collide as well, so if you want mathematical 100% certainty that's not going to be good enough either)
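A minimal sketch of the snapshot-and-compare approach, using file size and last write time as the cheap change indicator. `FileState` and `DirectorySnapshot` are hypothetical names, and persisting the snapshot between runs (in a file or database) is left out. It assumes .NET 5 or later for records.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// What is remembered per file: cheap attributes only, no content hash.
public sealed record FileState(long Size, DateTime LastWriteUtc);

public static class DirectorySnapshot
{
    // Maps each file's path relative to root to its current state.
    public static Dictionary<string, FileState> Take(string root) =>
        Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories)
                 .ToDictionary(
                     path => Path.GetRelativePath(root, path),
                     path =>
                     {
                         var info = new FileInfo(path);
                         return new FileState(info.Length, info.LastWriteTimeUtc);
                     });

    // Compares two snapshots and lists added, removed, and changed paths.
    public static (List<string> Added, List<string> Removed, List<string> Changed) Diff(
        IReadOnlyDictionary<string, FileState> before,
        IReadOnlyDictionary<string, FileState> after)
    {
        var added = after.Keys.Where(k => !before.ContainsKey(k)).OrderBy(k => k).ToList();
        var removed = before.Keys.Where(k => !after.ContainsKey(k)).OrderBy(k => k).ToList();
        var changed = after
            .Where(kv => before.TryGetValue(kv.Key, out var old) && old != kv.Value)
            .Select(kv => kv.Key).OrderBy(k => k).ToList();
        return (added, removed, changed);
    }
}
```

A rename shows up here as one removed path plus one added path. Pairing those up would need an extra check, such as matching size and timestamp or a content hash.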
Say I want to be informed whenever a file copy is launched on my system and get the file name, the destination where it is being copied or moved and the time of copy.
Is this possible? How would you go about it? Should you hook CopyFile API function?
Is there any software that already accomplishes this?
Windows has the concept of I/O filters which allow you to intercept all I/O operations and choose to perform additional actions as a result. They are primarily used for A/V type scenarios but can be programmed for a wide variety of tasks. The SysInternals Process Monitor, for example, uses an I/O filter to see file-level access.
You can view your current filters using MS Filter Manager, (fltmc.exe from a command prompt)
There is a kit to help you write filters, you can get the drivers and develop your own.
http://www.microsoft.com/whdc/driver/filterdrv/default.mspx is a starting place to get in depth info
As there is a .NET tag on this question, I would simply use System.IO.FileSystemWatcher that's in the .NET Framework. I'm guessing it is implemented using the I/O Filters that Andrew mentions in his answer, but I really do not know (nor care, exactly). Would that fit your needs?
As Andrew says a filter driver is the way to go.
There is no foolproof way of detecting a file copy as different programs copy files in different ways (some may use the CopyFile API, others may just read one file and write out the contents to another themselves). You could try calculating a hash in your filter driver of any file opened for reading, and then do the same after a program finishes writing to a file. If the hashes match you know you have a file copy. However this technique may be slow. If you just hook the CopyFile API you will miss file copies made without that API. Java programs (to name but one) have no access to the CopyFile API.
This is likely impossible as there is no guaranteed central method for performing a copy/move. You could hook into a core API (like CopyFile) but of course that means that you will still miss any copy/move that any application does without using this API.
Maybe you could watch the entire filesystem with I/O filters for open files and then just draw conclusions yourself if two files with the same names and same file sizes are open at the same time. But that is no 100% solution either.
As previously mentioned, a file copy operation can be implemented in various ways and may involve several disk and memory transfers, so it is not possible to simply get notified by the system when such an operation occurs.
Even for the user, there are multiple ways to duplicate content and entire files. Copy commands, "save as", "send to", move, using various tools. Under the hood the copy operation is a succession of read / write, correlated by certain parameters. That is the only way to guarantee successful auditing. Hooking on CopyFile will not give you the copy operations of Total Commander, for example. Nor will it give you "Save as" operations which are in fact file create -> file content moved -> closing of original file -> opening of the new file. Then, things are different when dealing with copy over network, impersonated copy operations where the file handle security context is different than the process security context, and so on. I do not think that there is a straightforward way to achieve all of the above.
However, there is software that can notify you of most of the common copy operations (i.e. when they are performed through Windows Explorer, Total Commander, the command prompt, and other applications). It also gives you the source and destination file name, the timestamp, and other relevant details. It can be found here: http://temasoft.com/products/filemonitor.
Note: I work for the company which develops this product.