I've used FileSystemWatcher in the past. However, I'm hoping someone can explain how it actually works behind the scenes.
I plan to use it in an application I am making, and it would monitor about 5 drives and maybe 300,000 files.
Does the FileSystemWatcher actually do "checking" on the drive - as in, will it cause wear and tear on the drive? And does it affect the hard drive's ability to "sleep"?
This is where I do not understand how it works - whether it scans the drives on a timer, or whether it waits for some type of notification from the OS before it does anything.
I just do not want to implement something that is going to cause extra reads on a drive and keep the drive from sleeping.
Nothing like that. The file system driver simply monitors the normal file operations requested by other programs running on the machine against the filters you've selected. If there's a match, it adds an entry to an internal buffer that records the operation and the filename, completes the driver request, and raises an event in your program. You'll get the details of the operation passed to you from that buffer.
So nothing extra happens to the operations themselves; there is no extra disk activity at all. It is all just software that runs. The overhead is minimal, and nothing slows down noticeably.
The short answer is no. The FileSystemWatcher calls the ReadDirectoryChangesW API passing it an asynchronous flag. Basically, Windows will store data in an allocated buffer when changes to a directory occur. This function returns the data in that buffer and the FileSystemWatcher converts it into nice notifications for you.
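For illustration, here is a minimal C# sketch of the managed side (the path and filter are made up); nothing in it polls the disk, and the kernel only fills the buffer when other programs touch matching files:

    using System;
    using System.IO;

    var watcher = new FileSystemWatcher(@"D:\", "*.*")
    {
        IncludeSubdirectories = true,
        NotifyFilter = NotifyFilters.FileName | NotifyFilters.LastWrite
    };

    // These handlers run only when the OS reports a matching operation;
    // there is no polling or scanning of the drive.
    watcher.Created += (s, e) => Console.WriteLine($"Created: {e.FullPath}");
    watcher.Changed += (s, e) => Console.WriteLine($"Changed: {e.FullPath}");
    watcher.EnableRaisingEvents = true; // starts the ReadDirectoryChangesW loop

    Console.ReadLine(); // keep the process alive while watching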
Related
I need to uniquely identify a file on Windows so I can always have a reference for that file even if it's moved or renamed. I did some research and found the question Unique file identifier in windows, which suggests using the GetFileInformationByHandle method with C++, but apparently that only works for NTFS partitions, not for FAT ones.
I need to program behavior like Dropbox's: if you close it on your computer, rename a file, and open it again, it detects that change and syncs correctly. I wonder what the technique is, and maybe how Dropbox does it, if you guys know.
FileSystemWatcher, for example, would work, but if the program using it is closed, no changes can be detected.
I will be using C#.
Thanks,
The next best method (but one that involves reading every file completely, which I'd avoid when it can be helped) would be to compare file size and a hash (e.g. SHA-256) of the file contents. The probability that both collide is fairly slim, especially under normal circumstances.
I'd use the GetFileInformationByHandle way on NTFS and fall back to hashing on FAT volumes.
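A rough C# sketch of that two-pronged approach; the P/Invoke declaration mirrors the Win32 BY_HANDLE_FILE_INFORMATION structure, and the method names GetNtfsId and GetContentId are my own:

    using System;
    using System.IO;
    using System.Runtime.InteropServices;
    using System.Security.Cryptography;
    using Microsoft.Win32.SafeHandles;

    static class FileIdentity
    {
        [StructLayout(LayoutKind.Sequential)]
        struct BY_HANDLE_FILE_INFORMATION
        {
            public uint FileAttributes;
            public System.Runtime.InteropServices.ComTypes.FILETIME CreationTime;
            public System.Runtime.InteropServices.ComTypes.FILETIME LastAccessTime;
            public System.Runtime.InteropServices.ComTypes.FILETIME LastWriteTime;
            public uint VolumeSerialNumber;
            public uint FileSizeHigh;
            public uint FileSizeLow;
            public uint NumberOfLinks;
            public uint FileIndexHigh;
            public uint FileIndexLow;
        }

        [DllImport("kernel32.dll", SetLastError = true)]
        static extern bool GetFileInformationByHandle(
            SafeFileHandle hFile, out BY_HANDLE_FILE_INFORMATION info);

        // NTFS: the volume serial number plus the 64-bit file index identify
        // the file, and both survive renames and moves within the same volume.
        public static string GetNtfsId(string path)
        {
            using var fs = File.Open(path, FileMode.Open,
                                     FileAccess.Read, FileShare.ReadWrite);
            if (!GetFileInformationByHandle(fs.SafeFileHandle, out var info))
                throw new IOException("GetFileInformationByHandle failed",
                                      Marshal.GetLastWin32Error());
            ulong index = ((ulong)info.FileIndexHigh << 32) | info.FileIndexLow;
            return $"{info.VolumeSerialNumber:X8}-{index:X16}";
        }

        // FAT fallback: size plus SHA-256 of the contents. Collisions are very
        // unlikely, but this does require reading the whole file.
        public static string GetContentId(string path)
        {
            using var sha = SHA256.Create();
            using var fs = File.OpenRead(path);
            return $"{fs.Length}-{BitConverter.ToString(sha.ComputeHash(fs))}";
        }
    }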
In Dropbox's case I think, though, that there is a service or process running in the background observing file system changes. It's the most reliable way, even if it ceases to work when you stop said service/process.
What the user was looking for was most likely Windows Change Journals. These track changes such as file renames persistently, so there is no need to keep a watcher observing file system events running all the time. Instead, one simply records how far into the journal one last read and resumes from that point the next time. At some point a file with an already known ID will show an event of type RENAME, and whoever is interested in that event can apply the same rename to its own copy of that file. The important thing, of course, is to keep track of the IDs used for the files.
An automatic backup application is one example of a program that must check for changes to the state of a volume to perform its task. The brute force method of checking for changes in directories or files is to scan the entire volume. However, this is often not an acceptable approach because of the decrease in system performance it would cause. Another method is for the application to register a directory notification (by calling the FindFirstChangeNotification or ReadDirectoryChangesW functions) for the directories to be backed up. This is more efficient than the first method, however, it requires that an application be running at all times. Also, if a large number of directories and files must be backed up, the amount of processing and memory overhead for such an application might also cause the operating system's performance to decrease.
To avoid these disadvantages, the NTFS file system maintains an update sequence number (USN) change journal. When any change is made to a file or directory in a volume, the USN change journal for that volume is updated with a description of the change and the name of the file or directory.
https://learn.microsoft.com/en-us/windows/win32/fileio/change-journals
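Reading the journal from C# takes some P/Invoke. As a hedged sketch, querying the journal's current state looks roughly like this; FSCTL_QUERY_USN_JOURNAL and the USN_JOURNAL_DATA_V0 layout come from winioctl.h, opening the volume requires administrator rights, and enumerating the actual records would be a further FSCTL_READ_USN_JOURNAL call:

    using System;
    using System.ComponentModel;
    using System.Runtime.InteropServices;
    using Microsoft.Win32.SafeHandles;

    class UsnJournalInfo
    {
        const uint GENERIC_READ = 0x80000000;
        const uint FILE_SHARE_READ_WRITE = 0x00000003;
        const uint OPEN_EXISTING = 3;
        const uint FSCTL_QUERY_USN_JOURNAL = 0x000900F4; // winioctl.h

        [StructLayout(LayoutKind.Sequential)]
        struct USN_JOURNAL_DATA // USN_JOURNAL_DATA_V0
        {
            public ulong UsnJournalID;
            public long FirstUsn;
            public long NextUsn;
            public long LowestValidUsn;
            public long MaxUsn;
            public ulong MaximumSize;
            public ulong AllocationDelta;
        }

        [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
        static extern SafeFileHandle CreateFile(string name, uint access,
            uint share, IntPtr security, uint disposition, uint flags,
            IntPtr template);

        [DllImport("kernel32.dll", SetLastError = true)]
        static extern bool DeviceIoControl(SafeFileHandle device, uint code,
            IntPtr inBuf, int inSize, out USN_JOURNAL_DATA outBuf, int outSize,
            out int returned, IntPtr overlapped);

        static void Main()
        {
            using var volume = CreateFile(@"\\.\C:", GENERIC_READ,
                FILE_SHARE_READ_WRITE, IntPtr.Zero, OPEN_EXISTING, 0, IntPtr.Zero);
            if (volume.IsInvalid) throw new Win32Exception();

            if (!DeviceIoControl(volume, FSCTL_QUERY_USN_JOURNAL, IntPtr.Zero, 0,
                    out var data, Marshal.SizeOf<USN_JOURNAL_DATA>(),
                    out _, IntPtr.Zero))
                throw new Win32Exception();

            // Persist NextUsn somewhere; on the next run, read records from
            // that USN forward to see every create/rename/delete since then.
            Console.WriteLine($"Journal {data.UsnJournalID:X}, next USN {data.NextUsn}");
        }
    }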
So, I'd like to write a logger in c# for an app I'm working on. However, since I love efficiency, I don't want to be opening and closing a log file over and over again during execution.
I think I'd like to write all events to RAM and then write to the log file once when the app exits. Would this be a good practice? If so, how should I implement it?
If this is not a good practice, what would be?
(And I'm not using Windows' event log at this time.)
However, since I love efficiency, I don't want to be opening and closing a log file over and over again during execution
No. It's because you love premature optimization.
I think I'd like to write all events to RAM and then write to the log file once when the app exits. Would this be a good practice? If so, how should I implement it?
If you love efficiency, why do you want to waste a lot of memory for log entries?
If this is not a good practice, what would be?
It is if you want to lose all logs when your application crashes (since it cannot write the log to disk then). Why did you create the log in the first place?
You'll have to think of some issues you might encounter:
System being shut down while your application runs -> no log files
An application crash might not invoke your write method
If the log grows large (how long does your application run?), you might get memory problems
If the log grows large, and there is not enough space on the drive, not a single log line will be written
You could simply keep the file open while your application runs (with at least FileShare.Read so you can monitor it), or consider writing batches of log lines, invoking the write method after a group of entries has accumulated, or even using a timer.
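A minimal sketch of the keep-it-open option (the file name is made up): the stream stays open for the life of the app, readers are allowed, and AutoFlush means a crash loses at most the line being written:

    using System;
    using System.IO;

    // Opened once at startup, disposed only at shutdown.
    var stream = new FileStream("app.log", FileMode.Append,
                                FileAccess.Write, FileShare.Read);
    var log = new StreamWriter(stream) { AutoFlush = true };

    log.WriteLine($"{DateTime.Now:O} application started");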
Well, if your app crashes, you lose all your logs. Better to flush the logs to disk at an appropriate moment. Lazy write:
Queue the log entries off to a separate logger thread (i.e. store them in some class and queue the class instance to a producer-consumer queue). In the logger thread, wait on the input queue with a timeout. If a log entry comes in, store it in a local cache queue.
If the timeout fires, or some high-water mark of stored logs is reached, write all cached log entries to the file and flush the file buffers.
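A C# sketch of that design, using BlockingCollection as the producer-consumer queue; the class name and the 2-second/100-entry thresholds are illustrative:

    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.IO;
    using System.Threading;

    sealed class LazyLogger : IDisposable
    {
        readonly BlockingCollection<string> _queue = new BlockingCollection<string>();
        readonly Thread _worker;
        readonly string _path;

        public LazyLogger(string path)
        {
            _path = path;
            _worker = new Thread(WriterLoop) { IsBackground = true };
            _worker.Start();
        }

        // Producer side: any thread can log; entries go onto the queue.
        public void Log(string message) =>
            _queue.Add($"{DateTime.Now:O} {message}");

        // Consumer side: wait with a timeout, cache locally, flush in batches.
        void WriterLoop()
        {
            var cache = new List<string>();
            while (!_queue.IsCompleted)
            {
                bool got = _queue.TryTake(out var entry, TimeSpan.FromSeconds(2));
                if (got) cache.Add(entry);

                // Flush on timeout or when the high-water mark is reached.
                if (cache.Count > 0 && (!got || cache.Count >= 100))
                {
                    File.AppendAllLines(_path, cache);
                    cache.Clear();
                }
            }
            if (cache.Count > 0) File.AppendAllLines(_path, cache);
        }

        public void Dispose()
        {
            _queue.CompleteAdding(); // drain remaining entries, then stop
            _worker.Join();
        }
    }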
Rgds,
Martin
I am setting out to create an app that will watch a directory for any files created. Pretty straightforward: time to use a FileSystemWatcher. My question relates to how to utilize it. Is it common practice to use a Windows service to ensure the application is always running?
I have been trying to get away from building Windows services if I don't have to, but I really don't see an alternative in this instance. Normally I would convert my service to a console app and schedule it using the Windows scheduler, but that doesn't really apply in this situation.
Can anyone recommend a better way of implementing the FileSystemWatcher than a Windows service?
Thanks for any thoughts.
EDIT
In response to the comments below: more specifically, I just have to watch a directory on a server, and when a new file is created there, I have to move a copy of that file into a different directory on the same server, perhaps renaming it in the process.
The frequency and number of files will be quite small, perhaps 5-10 at most in a day.
I'm not sure exactly how the file watcher works, but this is what I'm thinking: the file system fires events (NTFS must be doing something like that), and your file watcher hooks into those events. The file watcher probably suspends the thread it's running in until an event occurs, and the event somehow wakes the thread up. A suspended thread uses essentially no CPU cycles while it is suspended, so waiting for a file event costs nothing. A polled approach wastes mucho beaucoup (that's French, it means 'a shit load') of CPU cycles, but the file watcher does not. You could probably look at PerfMon to see if this is likely true.
You should describe more about what you want to do, but typically if you have something that needs to run in the background and does not require direct user interaction, then a service often makes sense.
You can use Remoting to connect a front-end to your service as needed if you'd like.
Yes, use a service for this kind of operation, but don't use FileSystemWatcher. If you poll for files in your service, don't use the Timer class either.
Do check to make sure the file has finished writing and is no longer locked before trying to move it.
It's trivial to poll for file changes (syntax may be off), and it eliminates much of the extra programming associated with FileSystemWatcher events.
While True ' or your exit condition
    Dim _files() As FileInfo = New DirectoryInfo(yourTargetDirectory).GetFiles()
    For Each _file As FileInfo In _files
        ' Only touch files that have sat unchanged for at least a minute,
        ' so we don't grab one that is still being written.
        If _file.LastWriteTime < DateTime.Now.AddMinutes(-1) Then
            ' move your files
        End If
    Next
    Threading.Thread.Sleep(5000) ' poll every few seconds instead of spinning
End While
Using a Windows service to wrap FileSystemWatcher is perfectly fine for what you need.
FSW is the right tool for the job (native code for filesystem watching is a bear to get right), and a service is the right mechanism to deploy it given you need 'always on' operation.
The service credentials will be independent of logged-in user too, which may be useful to you.
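A minimal sketch of that arrangement, assuming the classic ServiceBase model from System.ServiceProcess; the paths, the class name, and the timestamp-rename scheme are placeholders for the poster's actual requirements:

    using System;
    using System.IO;
    using System.ServiceProcess;

    public class WatcherService : ServiceBase
    {
        private FileSystemWatcher _watcher;

        protected override void OnStart(string[] args)
        {
            _watcher = new FileSystemWatcher(@"C:\incoming");
            _watcher.Created += (s, e) =>
            {
                // Real code should first make sure the file is no longer
                // locked by whatever created it before copying (see the
                // other answer above).
                var dest = Path.Combine(@"C:\processed",
                    $"{DateTime.Now:yyyyMMddHHmmss}_{e.Name}");
                File.Copy(e.FullPath, dest);
            };
            _watcher.EnableRaisingEvents = true;
        }

        protected override void OnStop()
        {
            _watcher?.Dispose();
        }

        // The usual service boilerplate goes in Main:
        // ServiceBase.Run(new WatcherService());
    }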
I have a faulty hard drive that works intermittently. After cold booting, I can access it for about 30-60 seconds, then the hard drive fails. I'm willing to write software to back up this drive to a new, bigger disk. I can develop it under GNU/Linux or Windows, I don't care.
The problem is: I can only access the disk for a short time, and some files are big and will take longer than that to copy. For this reason, I'm thinking of backing up the entire hard disk in smaller pieces, something like bit torrenting. I'll read some megabytes and store them before trying to read another set. My main loop would be something like this:
/* Pseudocode; check_harddrive() and friends are still to be written. */
while (1) {
    if (!check_harddrive()) { usleep(100 * 1000); continue; }  /* wait 100 ms */
    read_some_megabytes();
    if (!check_harddrive()) { usleep(100 * 1000); continue; }
    save_data();
    update_reading_pointer();
    if (all_done)
        break;
}
The problem is the check_harddrive() function. I'm willing to write this in C/C++ for maximum API/library compatibility. I'll need some control over my file handles to check whether they are still valid, and I need reads to return (even if with bad data) rather than hang if the drive fails during the copy process.
Maybe C# would give me better results if I abuse "hardcoded" hardware exceptions?
Another approach would be to measure how much time I get after power cycling the hard drive, and write a program that reads during that window only and flags me when to power cycle again.
What would you do in this case? Are there any tools/utilities that already do this?
Oh, there is a GREAT app for reading bad optical media called IsoPuzzle. It's not mine, I just wanted to share something related to my problem.
!EDIT!
Some clarifications. I'm a home user, a student of computer engineering at college, and I'd rather lose the data than spend thousands of dollars recovering it. The hard drive is still covered by Seagate's warranty, but since they gave me 5 years of warranty, I wanna try everything possible until the time runs out.
When I say cold booting, I mean booting after some seconds without power. Hot booting would be rebooting your computer; cold booting would be shutting it down, waiting a few seconds, then booting it up again. Since the hard disk in question is internal but SATA, I can just disconnect the power cable, wait a few seconds and connect it again.
For now I'll go with robocopy; I'm reading up on how to use it. If I can script it rather than code it myself, it'll be even easier.
!EDIT2!
I wasn't clear: my drive is a Seagate 7200.11. It's known to have bad firmware, and that's not always fixable with a simple firmware update (not after this bug appears). Physically the drive is in 100% working condition; just the firmware is screwed, making it enter an infinite busy state after some seconds.
I would work this from the hardware angle first. Is it an external drive - if so, can you try it in a different case?
You mention cold-booting works, then it quits. Is this heat related? Have you tried using the hard drive for an extended period in something like a freezer?
From the software side, I'd have a second thread keep an eye on a progress counter updated by a loop that repeatedly reads small amounts of data; that way it could signal failure via a timeout you define.
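To make that concrete, here is a rough C# version of the idea; the paths, the 4 MB chunk size, and the 5-second timeout are all arbitrary choices:

    using System;
    using System.IO;
    using System.Threading;

    long progress = 0;

    // Watchdog: if the counter hasn't moved for 5 seconds, assume the drive died.
    var monitor = new Thread(() =>
    {
        long last = -1;
        while (true)
        {
            Thread.Sleep(TimeSpan.FromSeconds(5));
            long now = Interlocked.Read(ref progress);
            if (now == last)
                Console.WriteLine("No progress - drive looks dead, power cycle it.");
            last = now;
        }
    }) { IsBackground = true };
    monitor.Start();

    // Reader loop: small reads, bumping the counter the watchdog observes.
    var buffer = new byte[4 * 1024 * 1024];
    using var src = File.OpenRead(@"F:\bigfile.dat");         // failing drive
    using var dst = File.OpenWrite(@"D:\backup\bigfile.dat"); // good drive
    try
    {
        int n;
        while ((n = src.Read(buffer, 0, buffer.Length)) > 0)
        {
            dst.Write(buffer, 0, n);
            Interlocked.Add(ref progress, n);
        }
    }
    catch (IOException ex)
    {
        Console.WriteLine($"Read failed at offset {dst.Length}: {ex.Message}");
    }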
I think the simplest way for you is to copy the entire disk image. Under Linux your disk will appear as a block device, /dev/sdb1 for example.
Start copying the disk image until the read error appears. Then wait for the user to "repair" the disk and resume reading from the last position.
You can easily mount a file disk image and read its contents; see the -o loop option for mount.
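A hedged C# sketch of the resume-from-last-position loop (.NET on Linux can read a block device such as /dev/sdb1 like a file, given root permissions; the paths and the 1 MB chunk size are illustrative):

    using System;
    using System.IO;

    const string device = "/dev/sdb1";
    const string image  = "/mnt/backup/sdb1.img";

    // The partial image's length tells us where to resume reading.
    long resumeAt = File.Exists(image) ? new FileInfo(image).Length : 0;

    var buffer = new byte[1024 * 1024];
    using var src = File.OpenRead(device);
    using var dst = new FileStream(image, FileMode.Append, FileAccess.Write);
    src.Seek(resumeAt, SeekOrigin.Begin);

    try
    {
        int n;
        while ((n = src.Read(buffer, 0, buffer.Length)) > 0)
            dst.Write(buffer, 0, n);
    }
    catch (IOException ex)
    {
        // Drive failed again: report the position, let the user power cycle,
        // then simply rerun the program to continue from dst.Length.
        Console.WriteLine($"Read failed at offset {dst.Length}: {ex.Message}");
    }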
Cool the disk down before use. I've heard that helps.
You might be interested in robocopy("Robust File Copy"). Robocopy is a command line tool and it can tolerate network outages and resume copying where it previously left off (incomplete files are noted with a date stamp corresponding to 1980-01-01 and contain a recovery record so Robocopy knows from where to continue).
You know... I like being "lazy"... Here is what I would do:
I would write 2 simple scripts. One of them would start robocopy (with its persistence features turned off) and begin the copying, while the other would periodically check whether the drive is still working (maybe by trying to list the contents of the root directory; if that takes more than a few seconds, the drive is dead again) and restart the machine if the HDD has stopped responding. Have both scripts start after login and set up auto-login, so that when the machine reboots it automatically continues.
From a "I need to get my data back" perspective, if your data is really valuable to you, I would recommend sending the drive to a data recovery specialist. Depending on how valuable the data is, the cost (probably several hundred dollars) is trivial. Ideally, you would find a data recovery specialist that doesn't just run some software to do the recovery - if the software approach doesn't work, they should be able to do things like replace the circiut board on the drive, and probably other things (I am not a data recover specialist).
If the value of the data on the drive doesn't quite rise to that level, you should consider purchasing one of the many pieces of software for data recovery. For example, I personally have used and would recommend GetDataBack from Runtime Software (http://www.runtime.org). I've used it to recover a failing drive, and it worked for me.
And now on to more general information... The standard process for data recovery off of a failing drive is to do as little as possible on the drive itself. You should unplug the drive, and stop attempting to do anything. The drive is failing, and it is likely to get worse and worse. You don't want to play around with it. You need to maximize your chances of getting the data off.
The way the process works is to use software that reads the drive block-by-block (not file-by-file), and makes an image copy of the drive. The software attempts to read every block, and will retry the reads if they fail, and writes an image file which is an image of the entire hard drive.
Once the hard drive has been imaged, the software then works against the image to identify the various logical parts of the drive - the partitions, directories, and files. And then it enables you to copy the files off of the image.
The software can typically "deduce" structures from the image. For example, if the partition table is damaged or missing, the software will scan through the entire image looking for things that might be partitions, and if they look enough like partitions, it will treat them as such and see if it can find directories and files. So good recovery software is written using a lot of knowledge about the different structures on the drive.
If you want to learn how to write such software, good for you! My recommendation is that you start with books about how various operating systems organize data on hard drives, so that you can start to get an intuitive feel for how software might work with drive images to pull data from them.
By doing some googling, I came to know that it is "used to keep track of file system actions". But I don't understand its utility; the watcher could directly trigger the event(s) without storing them in some intermediate buffer!
Is it there to convert an asynchronous flow of events (copying/modifying files) into synchronous event calls? Also, I am not sure whether FileSystemWatcher triggers its events asynchronously.
Can someone please throw some light on this?
You're missing the point of the buffer in your question, I think.
From MSDN, FileSystemWatcher (emphasis mine):
The Windows operating system notifies your component of file changes in a buffer created by the FileSystemWatcher. If there are many changes in a short time, the buffer can overflow. This causes the component to lose track of changes in the directory, and it will only provide blanket notification.
So it's not a buffer of events that it hasn't told you about yet; it's the buffer it offers for Windows to support the notifications in the first place, without having to poll. If Windows throws a huge pile of operations at this instance, that buffer will overflow and you, the consumer of the FileSystemWatcher, will lose some notifications.
The underlying Windows API that makes FileSystemWatcher work is ReadDirectoryChangesW(). Note the 2nd argument, lpBuffer. That's a one-to-one match with the internal buffer whose size you can set with the InternalBufferSize property.
A buffer is required because Windows cannot easily run user code in response to directory changes. These changes are detected by the respective file system drivers, they run in kernel mode. Running user mode code requires an expensive mode switch and a thread context switch, much too expensive to do so for each individual detected change. The buffer is there to collect changes, waiting for the user mode code to start running and empty the buffer.
There's a well-documented failure mode for FSW: there can be too many changes to keep up with. You'd see the Error event in managed code. Increasing the buffer size can help a lot; the default buffer is rather small at 4096 bytes. Making it arbitrarily large is not a good idea though, because buffer space is also required in the kernel, and that's taken from the kernel memory pool. That's a limited resource; gobbling large amounts from the pool affects all programs running on the machine.
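Both knobs mentioned above look like this in code (a sketch; the 32 KB figure is just an example of "bigger than the default, but not huge"):

    using System;
    using System.IO;

    var watcher = new FileSystemWatcher(@"C:\watched")
    {
        // Grows the buffer handed to ReadDirectoryChangesW. Per the answer
        // above, this comes from the kernel memory pool, so be conservative.
        InternalBufferSize = 32 * 1024
    };

    // Raised when the buffer overflows and notifications were lost;
    // a typical reaction is to rescan the directory from scratch.
    watcher.Error += (s, e) =>
        Console.WriteLine($"Watcher error: {e.GetException().Message}");

    watcher.Changed += (s, e) => Console.WriteLine($"{e.ChangeType}: {e.FullPath}");
    watcher.EnableRaisingEvents = true;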
The file watcher has to buffer requests when it can't handle them all at once, which is mainly caused by the code you wrote to react to the events the FileSystemWatcher raises. As far as I know the FileSystemWatcher events are not asynchronous, but you could spawn a thread in an event handler to make your handling code asynchronous. Of course the file system can change multiple files in one go, like deleting all files, or think of copy and paste.
I hope that was clear.
Yes, FileSystemWatcher is used to keep track of changes in the file system. It watches a directory and reports the following changes to any files in the directory:
OnCreated: Called when a file or directory is created
OnChanged: Called when a file or directory is changed
OnRenamed: Called when a file or directory is renamed
OnDeleted: Called when a file or directory is deleted
The "internal buffer" is how the operating system sends information to the FileSystemWatcher. Its size is controlled by the "InternalBufferSize" property.
If too many changes occur at once the internal buffer can fill up. Then instead of getting all the individual changes you get a single change notification:
OnError: Called when individual changes were lost because of a buffer overflow
FileSystemWatcher does trigger its events asynchronously: they are raised as changes occur, on a thread-pool thread rather than the thread that created the watcher (unless you assign the SynchronizingObject property to marshal them elsewhere).