From some googling, I came to know that it is 'used to keep track of file system actions'. But I don't understand its utility; the watcher could directly trigger the event(s) without storing them in some intermediate buffer!
Is it there to convert an asynchronous flow of events (copying/modifying files) into synchronous event calls? Also, I am not sure whether FileSystemWatcher triggers the events asynchronously.
Can someone please throw some light on this?
You're missing the point of the buffer in your question, I think.
From MSDN, FileSystemWatcher (emphasis mine):
The Windows operating system notifies your component of file changes in a buffer created by the FileSystemWatcher. If there are many changes in a short time, the buffer can overflow. This causes the component to lose track of changes in the directory, and it will only provide blanket notification.
So it's not a buffer of events that it hasn't told you about yet; it's the buffer it offers for Windows to support the notifications in the first place, without having to poll. If Windows throws a huge pile of operations at this instance, that buffer will overflow and you, the consumer of the FileSystemWatcher, will lose some notifications.
The underlying Windows API that makes FileSystemWatcher work is ReadDirectoryChangesW(). Note the 2nd argument, lpBuffer. That's a one-to-one match with the internal buffer whose size you can set with the InternalBufferSize property.
A buffer is required because Windows cannot easily run user code in response to directory changes. These changes are detected by the respective file system drivers, they run in kernel mode. Running user mode code requires an expensive mode switch and a thread context switch, much too expensive to do so for each individual detected change. The buffer is there to collect changes, waiting for the user mode code to start running and empty the buffer.
There's a well documented failure mode for FSW: there can be too many changes to keep up with. You'd see the Error event in managed code. Increasing the buffer size can help a lot; the default buffer is rather small at 4096 bytes. Making it arbitrarily large is not a good idea though: buffer space is also required in the kernel, and that's taken from the kernel memory pool. That's a limited resource, and gobbling large amounts from the pool affects all programs running on the machine.
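As a minimal sketch of that tuning advice (the watched path and buffer size here are placeholders, not recommendations): enlarge the buffer moderately, narrow the filters so fewer changes land in it, and watch the Error event so you know when notifications were lost.

```csharp
using System;
using System.IO;

// Placeholder path; the buffer size must be a multiple of 4 KB.
var watcher = new FileSystemWatcher(@"C:\watched")
{
    InternalBufferSize = 32 * 1024,   // default is 4096 bytes; keep this modest
    NotifyFilter = NotifyFilters.FileName | NotifyFilters.LastWrite,
    IncludeSubdirectories = false      // fewer matches, fewer buffer entries
};

watcher.Changed += (s, e) => Console.WriteLine($"Changed: {e.FullPath}");

// Raised when the internal buffer overflowed and changes were lost.
watcher.Error += (s, e) =>
    Console.WriteLine("Changes were lost: " + e.GetException().Message);

watcher.EnableRaisingEvents = true;
```

Narrowing NotifyFilter is usually the cheaper fix, since every filtered-out change is one the kernel never has to record at all.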
The FileSystemWatcher will have to buffer requests when it can't handle them all at once, which is mainly caused by the code you wrote to react to the events the FileSystemWatcher raises. As far as I know the FileSystemWatcher events are not asynchronous, but you could spawn a thread in an event handler to make your handling code asynchronous. Of course the file system can change multiple files in one go; think of deleting all files, or copy/paste.
I hope that was clear.
Yes, FileSystemWatcher is used to keep track of changes in the file system. It watches a directory and reports the following changes to any files in the directory:
OnCreated: Called when a file or directory is created
OnChanged: Called when a file or directory is changed
OnRenamed: Called when a file or directory is renamed
OnDeleted: Called when a file or directory is deleted
The "internal buffer" is how the operating system sends information to the FileSystemWatcher. Its size is controlled by the "InternalBufferSize" property.
If too many changes occur at once the internal buffer can fill up. Then instead of getting all the individual changes you get a single change notification:
OnError: Called when individual changes were lost because of a buffer overflow
FileSystemWatcher does trigger events asynchronously: the events are raised on a thread-pool thread as the changes occur, not on the thread that created the watcher (unless you set the SynchronizingObject property to marshal them elsewhere).
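A minimal sketch of wiring the notifications listed above (the watched path is a placeholder):

```csharp
using System;
using System.IO;

var watcher = new FileSystemWatcher(@"C:\watched");  // placeholder path

watcher.Created += (s, e) => Console.WriteLine($"Created: {e.FullPath}");
watcher.Changed += (s, e) => Console.WriteLine($"Changed: {e.FullPath}");
watcher.Deleted += (s, e) => Console.WriteLine($"Deleted: {e.FullPath}");
// Renamed delivers RenamedEventArgs, which carries the old path too.
watcher.Renamed += (s, e) => Console.WriteLine($"Renamed: {e.OldFullPath} -> {e.FullPath}");
// Fired when individual changes were lost to a buffer overflow.
watcher.Error += (s, e) => Console.WriteLine("Changes lost: " + e.GetException());

watcher.EnableRaisingEvents = true;  // nothing is raised until this is set
```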
Related
I've used FileSystemWatcher in the past. However, I am hoping someone can explain how it actually works behind the scenes.
I plan to utilize it in an application I am making and it would monitor about 5 drives and maybe 300,000 files.
Does the FileSystemWatcher actually do "checking" on the drive, as in, will it be causing wear and tear on the drive? Also, does it impact the hard drive's ability to "sleep"?
This is where I do not understand how it works: whether it is scanning the drives on a timer, or waiting for some type of notification from the OS before it does anything.
I just do not want to implement something that is going to cause extra reads on a drive and keep the drive from sleeping.
Nothing like that. The file system driver simply monitors the normal file operations requested by other programs that run on the machine against the filters you've selected. If there's a match then it adds an entry to an internal buffer that records the operation and the filename. Which completes the driver request and gets an event to run in your program. You'll get the details of the operation passed to you from that buffer.
So nothing is actually added to the operations themselves; there is no extra disk activity at all. It is all just software that runs. The overhead is minimal; nothing slows down noticeably.
The short answer is no. The FileSystemWatcher calls the ReadDirectoryChangesW API passing it an asynchronous flag. Basically, Windows will store data in an allocated buffer when changes to a directory occur. This function returns the data in that buffer and the FileSystemWatcher converts it into nice notifications for you.
I need to uniquely identify a file on Windows so I always have a reference to that file even if it's moved or renamed. I did some research and found the question Unique file identifier in windows, with an approach that uses the method GetFileInformationByHandle with C++, but apparently that only works for NTFS partitions, not for FAT ones.
I need to program a behavior like the one in Dropbox: if you close it on your computer, rename a file and open it again, it detects that change and syncs correctly. I wonder what the technique is, and maybe how Dropbox does it, if you guys know.
FileSystemWatcher, for example, would work, but if the program using it is closed, no changes can be detected.
I will be using C#.
Thanks,
The next best method (but one that involves reading every file completely, which I'd avoid when it can be helped) would be to compare file size and a hash (e.g. SHA-256) of the file contents. The probability that both collide is fairly slim, especially under normal circumstances.
I'd use the GetFileInformationByHandle way on NTFS and fall back to hashing on FAT volumes.
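A sketch of that hashing fallback (the helper name is mine, and Convert.ToHexString requires .NET 5 or later; on older runtimes, BitConverter.ToString would do):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Identifies a file by (length, SHA-256 of contents). Reads the whole
// file, so reserve this for volumes where GetFileInformationByHandle
// is unavailable (e.g. FAT).
static (long Length, string Sha256) GetContentIdentity(string path)
{
    using var stream = File.OpenRead(path);
    using var sha = SHA256.Create();
    byte[] hash = sha.ComputeHash(stream);
    return (stream.Length, Convert.ToHexString(hash));
}
```

Comparing the length first lets you skip hashing entirely when two candidate files already differ in size.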
In Dropbox's case, though, I think there is a service or process running in the background observing file system changes. It's the most reliable way, even if it ceases to work if you stop said service/process.
What the user was most likely looking for is Windows Change Journals. Those track changes such as file renames persistently, so there is no need to keep a watcher observing file system events running all the time. Instead, one simply records how far into the journal one last read and continues from that point on the next pass. At some point a file with an already known ID will have an event of type RENAME, and whoever is interested in that event can apply the same rename to its own version of that file. The important thing, of course, is to keep track of the IDs used for the files.
An automatic backup application is one example of a program that must check for changes to the state of a volume to perform its task. The brute force method of checking for changes in directories or files is to scan the entire volume. However, this is often not an acceptable approach because of the decrease in system performance it would cause. Another method is for the application to register a directory notification (by calling the FindFirstChangeNotification or ReadDirectoryChangesW functions) for the directories to be backed up. This is more efficient than the first method, however, it requires that an application be running at all times. Also, if a large number of directories and files must be backed up, the amount of processing and memory overhead for such an application might also cause the operating system's performance to decrease.
To avoid these disadvantages, the NTFS file system maintains an update sequence number (USN) change journal. When any change is made to a file or directory in a volume, the USN change journal for that volume is updated with a description of the change and the name of the file or directory.
https://learn.microsoft.com/en-us/windows/win32/fileio/change-journals
I need to create a service which is basically responsible for the following:
Watch a specific folder for any new files created.
If yes, read that file, process it, and save the data in the DB.
For the above task, I am thinking of creating a multi-threaded service with either of the following approaches:
In the main thread, create an instance of FileSystemWatcher and, as soon as a new file is created, add that file to the thread queue. There will be N consumer threads running, each of which should take a file from the queue and process it (i.e. step 2).
Again in the main thread, create an instance of FileSystemWatcher and, as soon as a new file is created, read that file and add the data to MSMQ using a WCF MSMQ service. When the message is read by the WCF MSMQ service, it will be responsible for further processing.
I am a newbie when it comes to creating a multi-threaded service, so I am not sure which will be the best option. Please guide me.
Thanks,
First off, let me say that you have taken a wise approach to do a single producer - multiple consumer model. This is the best approach in this case.
I would go for option 1, using a ConcurrentQueue data structure, which provides you an easy way to queue tasks in a thread-safe manner. Alternatively, you can simply use the ThreadPool.QueueUserWorkItem method to send work directly to the built-in thread pool, without worrying about managing the workers or the queue explicitly.
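A sketch of option 1 along those lines (the path, worker count, and ProcessFile are placeholders). BlockingCollection<T> is backed by a ConcurrentQueue<T> by default and adds the blocking consume loop for you:

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Stand-in for your "read, process, save to DB" step.
static void ProcessFile(string path) => Console.WriteLine($"Processing {path}");

var queue = new BlockingCollection<string>();

// Single producer: the watcher enqueues each newly created file.
var watcher = new FileSystemWatcher(@"C:\incoming");  // placeholder path
watcher.Created += (s, e) => queue.Add(e.FullPath);
watcher.EnableRaisingEvents = true;

// N consumers: each blocks until a file is available, then processes it.
const int workerCount = 4;
for (int i = 0; i < workerCount; i++)
{
    Task.Run(() =>
    {
        // Ends when queue.CompleteAdding() is called elsewhere.
        foreach (var path in queue.GetConsumingEnumerable())
            ProcessFile(path);
    });
}
```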
Edit: Regarding the reliability of FileSystemWatcher, MSDN says:
The Windows operating system notifies your component of file changes in a buffer created by the FileSystemWatcher. If there are many changes in a short time, the buffer can overflow. This causes the component to lose track of changes in the directory, and it will only provide blanket notification. Increasing the size of the buffer with the InternalBufferSize property is expensive, as it comes from non-paged memory that cannot be swapped out to disk, so keep the buffer as small yet large enough to not miss any file change events. To avoid a buffer overflow, use the NotifyFilter and IncludeSubdirectories properties so you can filter out unwanted change notifications.
So it depends on how often changes will occur and how much buffer you are allocating.
I would also consider your demands for failure handling and sizes of the files you are sending.
Whether you decide for option 1 or 2 will be dependent on specifications.
Option 2 has the advantage that, by using MSMQ, your data is persisted in a recoverable way, even if you need to restart your machine. Option 1 only keeps your data in memory, where it might get lost.
On the other hand, option 2 has the disadvantage that the message size of MSMQ is limited to 4 MB per message (explained in a Microsoft blog here), and therefore only half of that when working with Unicode characters, while in-memory queues are capable of much bigger sizes.
[Edit]
Thinking a bit longer, I would prefer option 2.
In your comment, you mention that you want to move files around in the file system. This can be very expensive in terms of performance, even worse if you move the files between different partitions.
I have used MSMQ in multiple projects at work and am convinced that it would work well for what you want to do. A big advantage here is that MSMQ supports transactional communication. That means that if for some reason a network or power or whatever failure occurs, neither your message nor your files get lost.
If any of those happened while you were moving a file around, it could easily get corrupted.
The only thing giving me grumbles in my stomach is the file sizes. To work around the message size limitation of 4 MB (see the link above), I would not put the file content into a message. Instead, I would only send an ID or a file path with it, so that the consuming service can find the file and read it when needed.
This keeps the message and queue sizes small and avoids using too much bandwidth or memory on your network and server(s).
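A sketch of that path-only message using the System.Messaging API (the queue path and file name are placeholders, and the queue is assumed to already exist as a transactional queue):

```csharp
using System.Messaging;

// Send only the path, not the contents, to stay well under the 4 MB limit.
using var queue = new MessageQueue(@".\Private$\fileQueue");  // placeholder queue
queue.Send(new Message(@"C:\incoming\report.csv"),            // placeholder path
           MessageQueueTransactionType.Single);
```

The consuming service receives the path, then opens and processes the file itself.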
When a big file is moved into the watched folder, the Created event is raised even before the file is fully copied.
Copying such files within the Created event handler causes a 'file being used by another process' error.
I used a thread that tries to copy the file until it is allowed to do so, but I am still not satisfied.
Can we configure FileSystemWatcher such that the Created event is raised only after the file is fully copied? Thanks.
The documentation for the FileSystemWatcher class specifically describes your observed behaviour:
Common file system operations might raise more than one event. For example, when a file is moved from one directory to another, several OnChanged and some OnCreated and OnDeleted events might be raised. Moving a file is a complex operation that consists of multiple simple operations, therefore raising multiple events. Likewise, some applications (for example, antivirus software) might cause additional file system events that are detected by FileSystemWatcher.
You could check that the lock on the file has been released, and then know that the copying is complete.
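One common way to do that check is a retry loop that attempts to open the file exclusively (a sketch; the retry interval is arbitrary, and a production version would want a timeout):

```csharp
using System;
using System.IO;
using System.Threading;

// Keeps trying to open the file with no sharing; once that succeeds,
// whatever was copying it has released it and it is safe to process.
static void WaitUntilReleased(string path)
{
    while (true)
    {
        try
        {
            using var stream = File.Open(path, FileMode.Open,
                                         FileAccess.Read, FileShare.None);
            return;  // opened exclusively: the copy is complete
        }
        catch (IOException)
        {
            Thread.Sleep(500);  // still locked by the copying process
        }
    }
}
```

Call this from the Created handler (or from a worker thread it spawns) before touching the file.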
Why don't you create your own event that is raised when the copy is finished by the thread? You can specify the file and have a method that listens to the event to handle the post-processing.
Details on how to create your own events here: http://msdn.microsoft.com/en-us/library/5z57dxz2.aspx
Because, as plenderj discussed, the FileSystemWatcher may fire multiple events for one operation, you cannot rely on it alone. It is really only good for detecting first-time creation, i.e. the start of a copy; OnChanged may then be called numerous times.
EDIT: There is a VB (only) class that wraps the Windows copy operation. You can use it from C# since it all runs on the CLR. It will provide the Windows dialog showing the progress of the copy.
http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.filesystem.copyfile.aspx
Thanks,
Brad
I am setting out to create an app that will watch a directory for any files created. Pretty straightforward: time to use a FileSystemWatcher. My question relates to how to utilize it. Is it common practice to use a Windows service to ensure the application is always running?
I have been trying to get away from building Windows services if I don't have to, but I really don't see an alternative to doing it that way in this instance. Normally, I would convert my service to a console app and schedule it using the Windows scheduler, but that doesn't really apply in this situation.
Can anyone recommend a better way of implementing the FileSystemWatcher than a Windows service?
Thanks for any thoughts.
EDIT
In response to the comments below: more specifically, I just have to watch a directory on a server, and when a new file is created, I have to move a copy of that file into a different directory on the same server, perhaps renaming it in the process.
The frequency and number of files will be quite small, perhaps 5-10 at most in a day.
I'm not sure yet how the file watcher works, but this is what I'm thinking: the file system fires events; NTFS must be doing something like that. Your file watcher hooks into those events. The file watcher probably suspends the thread it's running in until an event occurs, and the event somehow wakes up the thread. A suspended thread uses essentially no CPU cycles while it is suspended, so waiting for a file event costs nothing. So a polled approach wastes a great many CPU cycles, but the file watcher does not. You could probably look at PerfMon to see if this is likely true.
You should describe more about what you want to do, but typically if you have something that needs to run in the background and does not require direct user interaction, then a service often makes sense.
You can use Remoting to connect a front-end to your service as needed if you'd like.
Yes, use a service for this kind of operation, but don't use FileSystemWatcher. If you poll for files in your service, don't use the Timer class either.
Do check to make sure the file is completed writing and is no longer locked before trying to move it.
It's trivial to poll for file changes, and it eliminates much of the extra programming associated with FileSystemWatcher events:
Dim dirInfo As New DirectoryInfo(yourTargetDirectory)
While True 'or your exit condition'
    For Each _file As FileInfo In dirInfo.GetFiles()
        'only touch files that haven't been written to for at least a minute'
        If _file.LastWriteTime < DateTime.Now.AddMinutes(-1) Then
            'move your files'
        End If
    Next
    Threading.Thread.Sleep(5000) 'pause between scans so the loop doesn't spin the CPU'
End While
Using a Windows service to wrap FileSystemWatcher is perfectly fine for what you need.
FSW is the right tool for the job (native code for filesystem watching is a bear to get right), and a service is the right mechanism to deploy it given you need 'always on' operation.
The service credentials will be independent of logged-in user too, which may be useful to you.
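A minimal sketch of such a service (the class name, paths, and the skipped renaming step are placeholders; installation and the ServiceBase.Run call in Main are omitted):

```csharp
using System.IO;
using System.ServiceProcess;

public class WatcherService : ServiceBase
{
    private FileSystemWatcher _watcher;

    protected override void OnStart(string[] args)
    {
        _watcher = new FileSystemWatcher(@"C:\watched");  // placeholder source
        _watcher.Created += (s, e) =>
            // Copy each new file into the destination directory;
            // insert your renaming logic here if needed.
            File.Copy(e.FullPath,
                      Path.Combine(@"C:\copies", Path.GetFileName(e.FullPath)));
        _watcher.EnableRaisingEvents = true;
    }

    protected override void OnStop()
    {
        _watcher?.Dispose();
    }
}
```

At 5-10 files a day, this handler will be idle almost all the time, which is exactly what a service plus FSW is good at.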