I have groups of files that need to be batch renamed from time to time. I don't want to rename any of them unless it can be asserted beforehand that every file in the group can be renamed to its target name. Is there some kind of assertion method for doing this, or will I have to write my own?
Transactional NTFS can do that, and there are .NET wrappers for it.
If you don't want to use that, consider opening all the files in exclusive mode before you start renaming. That gives you assurance that there was a point in time when each of the files was unused. Of course, a file can be opened again right after your check, so it is only a heuristic.
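A minimal sketch of that heuristic in C# (the TryRenameAll helper and its source-to-target dictionary are illustrative, not from the question):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class RenameCheck
{
    // Opens every source file exclusively first; if any open fails, nothing is renamed.
    // The handles are released before renaming, so another process could still grab a
    // file in between -- this is the heuristic caveat mentioned above.
    public static bool TryRenameAll(IDictionary<string, string> renames)
    {
        var handles = new List<FileStream>();
        try
        {
            foreach (var source in renames.Keys)
            {
                // FileShare.None fails if anyone else currently has the file open.
                handles.Add(new FileStream(source, FileMode.Open,
                                           FileAccess.Read, FileShare.None));
            }
        }
        catch (IOException)
        {
            return false; // at least one file is in use or missing
        }
        finally
        {
            foreach (var handle in handles) handle.Dispose();
        }

        foreach (var pair in renames)
            File.Move(pair.Key, pair.Value);

        return true;
    }
}
```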
Related
I want to use some kind of mutex on files, so that no process touches certain files before others have stopped using them. How can I do that in .NET 3.5? Here are some details:
I have a service which periodically checks whether there are any files or directories in a certain folder and, if there are, does something with them.
My other process is responsible for moving files (and directories) into that folder, and everything works just fine.
But I'm worried because there can be a situation where my copying process copies the files into the folder and, at the same time (in the same millisecond), my service checks whether there are any files and does something with them (but not with all of them, because it checked during the copying).
So my idea is to put some mutex in there (maybe one extra file can be used as a mutex?), so the service won't check anything until the copying is done.
How can I achieve something like that in a reasonably easy way?
Thanks for any help.
The canonical way to achieve this is the filename:
Process A copies the files to e.g. "somefile.ext.noprocess" (this is non-atomic)
Process B ignores all files with the ".noprocess" suffix
After Process A has finished copying, it renames the file to "somefile.ext" (the rename itself is atomic)
Next time Process B checks, it sees the file and starts processing.
If you have more than one file that must be processed together (or not at all), you need to adapt this scheme with an additional transaction file containing the file names for the transaction: only if this file exists and has the correct name should Process B read it and process the files mentioned in it.
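A sketch of the scheme in C# (class, method, and suffix names are just for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class NoProcessScheme
{
    const string Suffix = ".noprocess";

    // Process A: copy under a temporary name, then rename. The rename is atomic on the
    // same volume, so Process B never observes a half-written "somefile.ext".
    public static void Publish(string sourcePath, string targetPath)
    {
        string tempPath = targetPath + Suffix;
        File.Copy(sourcePath, tempPath, true);   // non-atomic copy
        File.Move(tempPath, targetPath);         // atomic rename (target must not exist yet)
    }

    // Process B: ignore anything still carrying the suffix.
    public static IEnumerable<string> ReadyFiles(string folder)
    {
        return Directory.GetFiles(folder)
                        .Where(f => !f.EndsWith(Suffix, StringComparison.OrdinalIgnoreCase));
    }
}
```

For the multi-file case, Publish each data file first and write the transaction file (listing the file names) last, so its appearance signals that the whole set is complete.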
Your problem really is not of mutual exclusion, but of atomicity. Copying multiple files is not an atomic operation, and so it is possible to observe the files in a half-copied state which you'd like to prevent.
To solve your problem, you could hinge your entire operation on a single atomic file system operation, for example renaming (or moving) of a folder. That way no one can observe an intermediate state. You can do it as follows:
Copy the files to a folder outside the monitored folder, but on the same drive.
When the copying operation is complete, move the folder inside the monitored folder. To any outside process, all the files would appear at once, and it would have no chance to see only part of the files.
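A hedged sketch of that approach (paths and names are examples only):

```csharp
using System;
using System.IO;

static class AtomicDrop
{
    // Copies files into a staging folder on the same drive, then moves the whole folder
    // into the monitored directory with one atomic rename.
    public static void Drop(string[] filesToCopy, string monitoredFolder)
    {
        string staging = Path.Combine(@"C:\Staging", Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(staging);

        foreach (string file in filesToCopy)                 // this part may take a while
            File.Copy(file, Path.Combine(staging, Path.GetFileName(file)));

        // Same-volume move = a single rename: the watcher sees nothing or everything.
        Directory.Move(staging, Path.Combine(monitoredFolder, "batch-" + DateTime.UtcNow.Ticks));
    }
}
```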
I need to uniquely identify a file on Windows so I can always have a reference for that file even if it's moved or renamed. I did some research and found the question Unique file identifier in windows, which uses the method GetFileInformationByHandle from C++, but apparently that only works for NTFS partitions, not for FAT ones.
I need to program a behavior like the one in Dropbox: if you close it on your computer, rename a file and open it again, it detects that change and syncs correctly. I wonder what the technique is, and how Dropbox does it, if you guys know.
FileSystemWatcher for example would work, but if the program using it is closed, no changes can be detected.
I will be using C#.
Thanks,
The next best method (but one that involves reading every file completely, which I'd avoid when it can be helped) would be to compare file size and a hash (e.g. SHA-256) of the file contents. The probability that both collide is fairly slim, especially under normal circumstances.
I'd use the GetFileInformationByHandle way on NTFS and fall back to hashing on FAT volumes.
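A sketch of the fallback in C# (GetContentId is a hypothetical helper; it identifies the contents, so a renamed but otherwise unchanged file keeps the same ID):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class FileIdentity
{
    // Fallback identity for non-NTFS volumes: file length plus a SHA-256 of the contents.
    // Two files agreeing on both are almost certainly the same content.
    public static string GetContentId(string path)
    {
        using (var stream = File.OpenRead(path))
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(stream);   // reads the whole file
            return stream.Length + ":" + BitConverter.ToString(hash).Replace("-", "");
        }
    }
}
```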
In Dropbox's case, though, I think there is a service or process running in the background observing file system changes. That is the most reliable way, even if it ceases to work when you stop said service/process.
What the user was looking for was most likely Windows Change Journals. They track changes such as file renames persistently, so there is no need to keep a watcher observing file system events running all the time. Instead, you simply record how far you had read the log and continue from that point the next time. At some point a file with an already known ID will have an event of type RENAME, and whoever is interested in that event can apply the same rename to its own version of that file. The important thing, of course, is to keep track of the IDs used for the files.
An automatic backup application is one example of a program that must check for changes to the state of a volume to perform its task. The brute force method of checking for changes in directories or files is to scan the entire volume. However, this is often not an acceptable approach because of the decrease in system performance it would cause. Another method is for the application to register a directory notification (by calling the FindFirstChangeNotification or ReadDirectoryChangesW functions) for the directories to be backed up. This is more efficient than the first method, however, it requires that an application be running at all times. Also, if a large number of directories and files must be backed up, the amount of processing and memory overhead for such an application might also cause the operating system's performance to decrease.
To avoid these disadvantages, the NTFS file system maintains an update sequence number (USN) change journal. When any change is made to a file or directory in a volume, the USN change journal for that volume is updated with a description of the change and the name of the file or directory.
https://learn.microsoft.com/en-us/windows/win32/fileio/change-journals
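Reading the journal from C# requires P/Invoke. Below is a hedged sketch that only queries the journal's current state; opening the volume handle typically requires administrator rights, and reading individual records additionally involves FSCTL_READ_USN_JOURNAL, which is omitted here:

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

static class UsnJournal
{
    const uint GENERIC_READ = 0x80000000;
    const uint FILE_SHARE_READ = 1, FILE_SHARE_WRITE = 2;
    const uint OPEN_EXISTING = 3;
    const uint FSCTL_QUERY_USN_JOURNAL = 0x000900F4;

    [StructLayout(LayoutKind.Sequential)]
    struct USN_JOURNAL_DATA                  // USN_JOURNAL_DATA_V0
    {
        public ulong UsnJournalID;
        public long FirstUsn, NextUsn, LowestValidUsn, MaxUsn;
        public ulong MaximumSize, AllocationDelta;
    }

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    static extern SafeFileHandle CreateFile(string name, uint access, uint share,
        IntPtr security, uint disposition, uint flags, IntPtr template);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool DeviceIoControl(SafeFileHandle device, uint code,
        IntPtr inBuffer, uint inSize, out USN_JOURNAL_DATA outBuffer, uint outSize,
        out uint bytesReturned, IntPtr overlapped);

    // Queries the change journal of a volume (e.g. "C:") and returns its next USN,
    // i.e. the position from which a later scan would continue reading records.
    public static long QueryNextUsn(string driveLetter)
    {
        using (var volume = CreateFile(@"\\.\" + driveLetter, GENERIC_READ,
            FILE_SHARE_READ | FILE_SHARE_WRITE, IntPtr.Zero, OPEN_EXISTING, 0, IntPtr.Zero))
        {
            if (volume.IsInvalid)
                throw new IOException("Cannot open volume", Marshal.GetLastWin32Error());

            USN_JOURNAL_DATA data;
            uint returned;
            if (!DeviceIoControl(volume, FSCTL_QUERY_USN_JOURNAL, IntPtr.Zero, 0,
                                 out data, (uint)Marshal.SizeOf(typeof(USN_JOURNAL_DATA)),
                                 out returned, IntPtr.Zero))
                throw new IOException("FSCTL_QUERY_USN_JOURNAL failed",
                                      Marshal.GetLastWin32Error());

            return data.NextUsn;
        }
    }
}
```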
I need to monitor a folder and its subdirectories for any file manipulations (add/remove/rename). I've read about FileSystemWatcher but I'd like to monitor changes between each time the program is run or when the user presses the "check for changes" button (FSW seems more orientated to runtime detection). My first thought was to iterate through all the (sub)directories and hash each file. Then, concatenate all the hashes (which have been ordered) and hash that. When I want to check for changes, I repeat the process and check if the hashes are the same.
Is this an efficient way of doing it?
Also, once I've detected a change, how do I find out what file has been added, removed or renamed as quickly as possible?
As a side note, I don't mind using scripts to do this if they're faster as long as those scripts don't require end users to install anything and the scripts can notify my C# app of the changes.
We handle this by storing all found files in a database along with their last modification time.
On each pass through the files, we check the database for each file: if it doesn't exist in the DB, it is new; if it does exist but the timestamp is different, it has changed.
There is also an option to handle deleted files by marking all of the files in the database as ToBeDeleted prior to the pass and clearing this flag whenever a file is found. Then, at the end of the process, we can just delete all of the records that are still marked as ToBeDeleted.
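A simplified in-memory version of that pass (a dictionary stands in for the database; all names are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class ChangeScanner
{
    // path -> last known LastWriteTimeUtc (a database table in the real setup)
    readonly Dictionary<string, DateTime> known =
        new Dictionary<string, DateTime>(StringComparer.OrdinalIgnoreCase);

    public void Scan(string root)
    {
        // Mark phase: assume everything was deleted until we see it again.
        var toBeDeleted = new HashSet<string>(known.Keys, StringComparer.OrdinalIgnoreCase);

        foreach (string file in Directory.GetFiles(root, "*", SearchOption.AllDirectories))
        {
            DateTime stamp = File.GetLastWriteTimeUtc(file);
            DateTime previous;
            if (!known.TryGetValue(file, out previous))
                Console.WriteLine("New:     " + file);
            else if (previous != stamp)
                Console.WriteLine("Changed: " + file);

            known[file] = stamp;
            toBeDeleted.Remove(file);   // clear the mark: the file still exists
        }

        // Sweep phase: whatever is still marked was deleted since the last pass.
        foreach (string gone in toBeDeleted)
        {
            Console.WriteLine("Deleted: " + gone);
            known.Remove(gone);
        }
    }
}
```

Note that a rename shows up as a delete plus an add under this scheme; distinguishing the two reliably needs a persistent file ID or a content hash.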
Obviously you need to make "snapshots" of the directory tree and compare them as required. What exactly goes into the snapshots would depend on your requirements. Keep in mind that:
You need to store filenames in order to detect "new" and "deleted" files
File sizes and last-modified times are a good and cheap indicator that a file has or has not changed, but do not provide a guarantee
Hashing the contents of files can be prohibitively expensive if the files can be large, but it's the only way to know they have changed with a near-perfect degree of accuracy (remember that hashes can collide as well, so if you want mathematical 100% certainty that's not going to be good enough either)
What is the difference between
Copying a file and deleting it using File.Copy() and File.Delete()
Moving the file using File.Move()
In terms of permission required to do these operations is there any difference? Any help much appreciated.
The File.Move method can be used to move a file from one path to another. This method works across disk volumes, and it does not throw an exception if the source and destination are the same.
You cannot use the Move method to overwrite an existing file. If you attempt to replace a file by moving a file of the same name into that directory, you get an IOException. To overcome this you can use the combination of the Copy and Delete methods.
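For example (a sketch; newer .NET versions also add a File.Move overload with an overwrite flag, but on older frameworks copy-then-delete is the usual workaround):

```csharp
using System.IO;

static class FileHelpers
{
    // "Move with overwrite": copy over the existing target, then delete the source.
    // Unlike File.Move this is two operations, so it is not atomic, and the source
    // is only removed once the copy has succeeded.
    public static void MoveOverwrite(string source, string destination)
    {
        File.Copy(source, destination, true);   // true = overwrite an existing destination
        File.Delete(source);
    }
}
```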
Performance-wise, if source and destination are on one and the same file system, moving a file is (in simplified terms) just an adjustment of the file system's internal structures (possibly adjusting some nodes in a red/black tree), without actually moving any data.
Imagine you have 180MiB to move, and you can write onto your disk at roughly 30MiB/s. Then with copy/delete, it takes approximately 6 seconds to finish. With a simple move [same file system], it goes so fast you might not even realise it.
(I once wrote some transactional file system helpers that would move or copy multiple files, all or none; in order to make the commit as fast as possible, I moved/copied everything into a temporary sub-folder first, and then the final commit would move the existing data into another folder (to enable rollback) and the new data into the target.)
I don't think there is any difference permission-wise, but I would personally prefer to use File.Move(), since then you have both actions happening in the same "transaction". In other words, if anything in the move fails, the whole operation fails. However, if you break it up into two steps (copy + delete) and the copy worked but the delete failed, you would have to reverse the "transaction" (delete the copy) manually.
Permission in a file transfer is checked at two points: the source and the destination. So if you don't have read permission on the source folder, or you don't have write permission on the destination, both methods throw an UnauthorizedAccessException. In other words, the permission checking is agnostic to the method in use.
Say I want to be informed whenever a file copy is launched on my system and get the file name, the destination where it is being copied or moved and the time of copy.
Is this possible? How would you go about it? Should you hook CopyFile API function?
Is there any software that already accomplishes this?
Windows has the concept of I/O filters (file system filter drivers), which allow you to intercept all I/O operations and perform additional actions as a result. They are primarily used for antivirus-type scenarios but can be programmed for a wide variety of tasks. The SysInternals Process Monitor, for example, uses an I/O filter to see file-level access.
You can view your current filters using the MS Filter Manager (fltmc.exe from a command prompt).
There is a kit to help you write filters; you can get the drivers and develop your own.
http://www.microsoft.com/whdc/driver/filterdrv/default.mspx is a starting place for in-depth info.
As there is a .NET tag on this question, I would simply use System.IO.FileSystemWatcher that's in the .NET Framework. I'm guessing it is implemented using the I/O Filters that Andrew mentions in his answer, but I really do not know (nor care, exactly). Would that fit your needs?
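A small example of that approach (watching a folder while the process runs; it reports file-system events, not "copies" as such, and the watched path is just an example):

```csharp
using System;
using System.IO;

class Watcher
{
    static void Main()
    {
        using (var watcher = new FileSystemWatcher(@"C:\Watched"))
        {
            watcher.IncludeSubdirectories = true;
            watcher.NotifyFilter = NotifyFilters.FileName | NotifyFilters.LastWrite;

            watcher.Created += (s, e) => Console.WriteLine("Created: " + e.FullPath);
            watcher.Changed += (s, e) => Console.WriteLine("Changed: " + e.FullPath);
            watcher.Renamed += (s, e) =>
                Console.WriteLine("Renamed: " + e.OldFullPath + " -> " + e.FullPath);

            watcher.EnableRaisingEvents = true;
            Console.WriteLine("Watching... press Enter to stop.");
            Console.ReadLine();
        }
    }
}
```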
As Andrew says a filter driver is the way to go.
There is no foolproof way of detecting a file copy as different programs copy files in different ways (some may use the CopyFile API, others may just read one file and write out the contents to another themselves). You could try calculating a hash in your filter driver of any file opened for reading, and then do the same after a program finishes writing to a file. If the hashes match you know you have a file copy. However this technique may be slow. If you just hook the CopyFile API you will miss file copies made without that API. Java programs (to name but one) have no access to the CopyFile API.
This is likely impossible as there is no guaranteed central method for performing a copy/move. You could hook into a core API (like CopyFile) but of course that means that you will still miss any copy/move that any application does without using this API.
Maybe you could watch the entire file system with I/O filters for open files and then draw conclusions yourself when two files with the same name and the same file size are open at the same time. But that's not a 100% solution either.
As previously mentioned, a file copy operation can be implemented in various ways and may involve several disk and memory transfers, therefore it is not possible to simply get notified by the system when such an operation occurs.
Even for the user there are multiple ways to duplicate content and entire files: copy commands, "save as", "send to", move, and various tools. Under the hood a copy operation is a succession of reads and writes, correlated by certain parameters; that is the only way to guarantee successful auditing. Hooking CopyFile will not give you the copy operations of Total Commander, for example. Nor will it give you "Save as" operations, which are in fact file create -> file content moved -> closing of the original file -> opening of the new file. Then, things are different when dealing with copies over the network, impersonated copy operations where the file handle security context is different from the process security context, and so on. I do not think there is a straightforward way to achieve all of the above.
However, there is a software that can notify you for most of the common copy operations (i.e. when they are performed through windows explorer, total commander, command prompt and other applications). It also gives you the source and destination file name, the timestamp and other relevant details. It can be found here: http://temasoft.com/products/filemonitor.
Note: I work for the company which develops this product.