How can System.IO.FileSystemInfo.Refresh be used - c#

There is an impressive lack of examples of the usage of Refresh.
I'm using the following method, which gets an inaccurate time:
ViewBag.t1 = System.IO.File.GetLastAccessTime(@"C:\BillingExport\BILLING_TABLE_FILE01_1.txt");
I read that it's inaccurate because the OS hasn't performed a check and updated the file's read/write times.
I've tried
System.IO.FileSystemInfo.Refresh(@"C:\BillingExport\BILLING_TABLE_FILE01_1.txt");
But this does not work and I can't locate a resource giving similar examples of its usage.

FileSystemInfo.Refresh is not a static method. What you have shown for your example does not compile. You should create a FileInfo object initialized with the file name and then you can call Refresh on that. You should then be able to use the properties of the FileInfo object to get the last access time and other pertinent file details.
var info = new FileInfo(@"C:\BillingExport\BILLING_TABLE_FILE01_1.txt");
info.Refresh(); // Refresh takes no arguments; it re-reads the cached state for this FileInfo
var lastAccess = info.LastAccessTime;
One last edit based on an answer at the above linked possible duplicate and CodeCaster's answer:
http://blogs.technet.com/b/filecab/archive/2006/11/07/disabling-last-access-time-in-windows-vista-to-improve-ntfs-performance.aspx
Indicates that in Vista this was disabled by default. I just checked the registry on my Win 8.1 box and sure enough, the registry key is there and the Last Access update is disabled by default. So, if you are on Vista or above, the code above won't give you an accurate last access time. If you are on XP then you should be golden!
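If you want to check that setting from code, here is a minimal sketch reading the registry value described in the linked article (a value of 1 means NTFS last-access updates are disabled):
using System;
using Microsoft.Win32;

// Reads the NtfsDisableLastAccessUpdate value described above.
// 1 = NTFS last-access updates are disabled (the Vista+ default).
object value = Registry.GetValue(
    @"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem",
    "NtfsDisableLastAccessUpdate",
    null);
Console.WriteLine(value ?? "value not present");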

FileSystemInfo is the abstract base class for FileInfo and DirectoryInfo, which cache the properties of a file or directory. If you keep, say, a FileInfo object around and keep testing its Exists property, then it becomes important that you call Refresh().
This has nothing to do with File.GetLastAccessTime(). The classes are entirely unrelated; the File class does no caching and always retrieves the last access time from the file system.
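A minimal sketch of that caching behavior (the path is illustrative):
using System;
using System.IO;

var info = new FileInfo(@"C:\Temp\a.txt");
Console.WriteLine(info.Exists);               // False if the file does not exist yet; this populates the cache
File.WriteAllText(@"C:\Temp\a.txt", "hello");
Console.WriteLine(info.Exists);               // still False - FileInfo returns the cached state
info.Refresh();                               // re-reads the state from the file system
Console.WriteLine(info.Exists);               // True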
The last access time itself, however, is unreliable if the file is open in any program. The file system is just not in a hurry to update these attributes while a program is actively accessing the file. That would be far too expensive: it can easily cost many dozens of milliseconds to send the disk drive's write head to the MFT sector that stores these values, and a program can access a file much faster than that. This is documented in this MSDN article:
Not all file systems can record creation and last access times, and not all file systems record them in the same manner. For example, the resolution of create time on FAT is 10 milliseconds, while write time has a resolution of 2 seconds and access time has a resolution of 1 day, so it is really the access date. The NTFS file system delays updates to the last access time for a file by up to 1 hour after the last access.
The most relevant phrase is the last sentence: NTFS delays last-access updates by up to an hour, so what you see is pretty much expected. You'll need to look for a different approach.

Related

C#: GetCreationTime and GetLastWriteTime give the same time, but my file size is increasing over time

I created a performance file (.blg) and started the performance-counter collection.
The file's Modified Date never updates in Windows Explorer; it always shows the Created Date, even though the file size keeps increasing over time.
Now I would like to get the file's created date and last write date, but the code below gives me the same date/time (the creation time) for both. How do I get the last write time of the file?
Console.WriteLine(File.GetCreationTime(@"C:\Temp\BasicPerfCounters.blg"));
Console.WriteLine(File.GetLastWriteTime(@"C:\Temp\BasicPerfCounters.blg"));
The answer from MSDN:
This method may return an inaccurate value, because it uses native functions whose values may not be continuously updated by the operating system. Each operating system manages the last write time according to its own rules. To improve performance, an operating system might not set the last write time value to the exact time of the last write operation, but might set it to a close approximation instead.
https://msdn.microsoft.com/en-us/library/system.io.file.getlastwritetime(v=vs.110).aspx
So it seems the OS does not update the Last Write Time for performance reasons. This is especially likely here, since the process updates the file very frequently and the OS does not try to update the Last Write Time on every write.
Therefore, I suggest saving the last update time in an in-memory variable or even in a database. However, the best way, I think, is to get the last write time from the data inside the performance file (.blg) itself.
Note: if you need to read the data from the .blg programmatically (including the timestamp of the last sample saved), consider converting it to comma-separated values with the tool RELOG.EXE. Here is a Two Minute Drill article: https://blogs.technet.microsoft.com/askperf/2008/05/20/two-minute-drill-relog-exe/.
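For reference, a typical RELOG.EXE invocation to convert a .blg log to CSV looks like this (paths are illustrative):
relog C:\Temp\BasicPerfCounters.blg -f csv -o C:\Temp\BasicPerfCounters.csv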

File move - How does the OS know whether to update a master file table or copy and delete?

After having read questions dealing with how to tell whether two files are on the same physical volume or not, and seeing that it's (almost) impossible (e.g. here), I'm wondering how the OS knows whether a file move operation should update a master file table (or its equivalent) or whether to copy and delete.
Does Windows delegate that to the drives somehow? (Or perhaps the OS does have information about every file, and it's just not accessible by programs? Unlikely.)
Or does Windows know only about certain types of drives (and copy-and-delete in the other cases)? In that case a program could make the same assumption, which would mean, for example, allowing a file move without a background thread, because it would be near instantaneous.
I'm trying to better understand this subject. If I'm making some basic incorrect assumption - please, correcting that in itself would be an answer.
If needed to limit the scope, let's concentrate on Windows 7 and up, and NTFS and FAT drives.
Of course the operating system knows which drive (and which partition on that drive) contains any particular local file; otherwise, how could it read the data? (For remote files, the operating system doesn't know about the drives, but it does know which server to contact. Moves between different servers are implemented as copy-and-delete; moves on the same server are either copy-and-delete or are delegated to that server, depending on the protocol in use.)
This information is also available to applications. You can use the GetFileInformationByHandle() function to obtain the serial number of the volume containing a particular file.
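A minimal C# sketch of that call (the struct layout follows the Win32 BY_HANDLE_FILE_INFORMATION definition; treat it as illustrative rather than production-ready):
using System;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

static class VolumeInfo
{
    [StructLayout(LayoutKind.Sequential)]
    struct BY_HANDLE_FILE_INFORMATION
    {
        public uint FileAttributes;
        public System.Runtime.InteropServices.ComTypes.FILETIME CreationTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME LastAccessTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME LastWriteTime;
        public uint VolumeSerialNumber;
        public uint FileSizeHigh;
        public uint FileSizeLow;
        public uint NumberOfLinks;
        public uint FileIndexHigh;
        public uint FileIndexLow;
    }

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool GetFileInformationByHandle(
        SafeFileHandle hFile, out BY_HANDLE_FILE_INFORMATION info);

    // Returns the serial number of the volume that actually holds the file.
    public static uint GetVolumeSerial(string path)
    {
        using (var fs = File.OpenRead(path))
        {
            BY_HANDLE_FILE_INFORMATION info;
            if (!GetFileInformationByHandle(fs.SafeFileHandle, out info))
                throw new IOException("GetFileInformationByHandle failed");
            return info.VolumeSerialNumber;
        }
    }
}
Two files are on the same volume exactly when GetVolumeSerial returns the same value for both.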
The OS does have information about every file; it's just not as easily accessible to your program, at least not in any portable way.
Look at it this way: those files are owned by the system. The system allocates the space and manages the volume and its indexes. It's not going to copy and delete a file that stays on the same physical volume, because moving (renaming) it is more efficient. It will only copy and delete if it has to.
In C or C++ on Windows, I first try MoveFileEx without MOVEFILE_COPY_ALLOWED set. It will fail if the file cannot be moved by renaming alone. If the rename fails, I know the move may take some time and show a progress bar or the like.
AFAIK there is no such rename-only operation in .NET; System.IO.File.Move does not fail if you move between different volumes, it silently copies and deletes instead.
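If you want the same try-rename-first behavior from C#, here is a hedged P/Invoke sketch (the flag value 0 means no MOVEFILE_COPY_ALLOWED; 17 is ERROR_NOT_SAME_DEVICE):
using System.ComponentModel;
using System.Runtime.InteropServices;

static class FastMove
{
    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    static extern bool MoveFileEx(string existingFileName, string newFileName, uint flags);

    // Attempts a pure rename (no copy allowed). Returns false when the target
    // is on a different volume, so the caller can show progress and fall back
    // to File.Copy + File.Delete.
    public static bool TryRename(string source, string destination)
    {
        if (MoveFileEx(source, destination, 0))
            return true;
        if (Marshal.GetLastWin32Error() == 17)   // ERROR_NOT_SAME_DEVICE
            return false;
        throw new Win32Exception();              // some other failure
    }
}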
First, regarding Does Windows delegate that to the drives somehow: no. The OS is more like a central nervous system. It keeps track of what's going on centrally, for its distributed assets (or devices) such as a drive (internal or external).
It follows that the OS has information about every file residing on any drive it has successfully enumerated. The part of the OS most relevant to file access is the file system, of which there are several types. Knowledge of the following topics will help in understanding issues surrounding file access:
1) File attribute settings
2) User Access Controls
3) File location (related to User Access Controls)
4) Current state of file (i.e. is the file in use currently)
5) Access Control Lists
Regarding will be near instantaneous: that is only a perception. No matter how fast, or seemingly simultaneous, the operation appears, file handling via the standard programming libraries can be done in a way that is aware of file-related errors, such as:
ENOMEM - insufficient memory.
EMFILE - FOPEN_MAX files open already.
EINVAL - filename is NULL or contains only whitespace.
EINVAL - invalid mode.
(these relate to fopen), which can be used to mitigate OS/file run-time issues. That said, applications should always be written to follow good programming practice so as to avoid bumping into OS-related file access issues, thread safety included.

Uniquely identify file on Windows

I need to uniquely identify a file on Windows so I always have a reference to it even if it's moved or renamed. I did some research and found the question Unique file identifier in windows, with an approach that uses the method GetFileInformationByHandle in C++, but apparently that only works for NTFS partitions, not for FAT ones.
I need to implement a behavior like the one in Dropbox: if you close it on your computer, rename a file, and open it again, it detects that change and syncs correctly. I wonder what the technique is, and maybe how Dropbox does it, if you guys know.
FileSystemWatcher, for example, would work, but if the program using it is closed, no changes can be detected.
I will be using C#.
Thanks,
The next best method (but one that involves reading every file completely, which I'd avoid when it can be helped) would be to compare the file size and a hash (e.g. SHA-256) of the file contents. The probability that both collide is fairly slim, especially under normal circumstances.
I'd use the GetFileInformationByHandle way on NTFS and fall back to hashing on FAT volumes.
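A sketch of that size-plus-hash fingerprint (a rough illustration; for large trees you would want to avoid rehashing files that have not changed):
using System;
using System.IO;
using System.Security.Cryptography;

static string GetContentFingerprint(string path)
{
    using (var stream = File.OpenRead(path))
    using (var sha = SHA256.Create())
    {
        byte[] hash = sha.ComputeHash(stream);
        // Combine length and content hash; a collision in both is astronomically unlikely.
        return stream.Length + ":" + BitConverter.ToString(hash).Replace("-", "");
    }
}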
In Dropbox's case, I think there is a service or process running in the background observing file system changes. That is the most reliable way, even though it ceases to work if you stop said service/process.
What the user was looking for was most likely Windows Change Journals. These track changes such as file renames persistently, so there is no need to keep a watcher observing file system events running all the time. Instead, you simply record how far into the log you last read and continue from that point the next time. At some point a file with an already known ID will get an event of type RENAME, and whoever is interested in that event can apply the same rename to its own version of the file. The important thing, of course, is to keep track of the IDs used for the files.
An automatic backup application is one example of a program that must check for changes to the state of a volume to perform its task. The brute force method of checking for changes in directories or files is to scan the entire volume. However, this is often not an acceptable approach because of the decrease in system performance it would cause. Another method is for the application to register a directory notification (by calling the FindFirstChangeNotification or ReadDirectoryChangesW functions) for the directories to be backed up. This is more efficient than the first method, however, it requires that an application be running at all times. Also, if a large number of directories and files must be backed up, the amount of processing and memory overhead for such an application might also cause the operating system's performance to decrease.
To avoid these disadvantages, the NTFS file system maintains an update sequence number (USN) change journal. When any change is made to a file or directory in a volume, the USN change journal for that volume is updated with a description of the change and the name of the file or directory.
https://learn.microsoft.com/en-us/windows/win32/fileio/change-journals
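If you just want to inspect the journal by hand, fsutil can query and dump it from an elevated command prompt (shown for illustration; reading it programmatically goes through DeviceIoControl, as the linked docs describe):
fsutil usn queryjournal C:
fsutil usn readjournal C: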

Queue file operations for later when file is locked

I am trying to implement a file-based auto-increment identity value (an int value stored in a TXT file) and I am trying to come up with the best way to handle concurrency issues. This identity will be used as the unique ID for my content. When saving new content, the file gets opened, the value gets read and incremented, the new content is saved, and the incremented value is written back to the file (whether we store the next available ID or the last issued one doesn't really matter). While this is being done, another process might come along and try to save new content. The first process opens the file with FileShare.None, so no other process will be able to read the file until it is released. While the odds of this happening are minimal, it could still happen.
Now when this does happen we have two options:
wait for the file to become available -
Emulate waiting on File.Open in C# when file is locked
we are talking about milliseconds here, so I guess this wouldn't be an issue; but if something strange happens and the file never becomes available, this solution results in an infinite loop, so it is not ideal
implement some sort of queue and run all file operations through it. My user-experience requirements are such that at the time of saving/modifying files the user should never be informed about exceptions or that something went wrong; he would be informed about them later, through a very friendly user interface, when the operations fail on the queue too.
At the moment of writing this, the solution should work within an ASP.NET MVC application (both synchronously and asynchronously through AJAX) but, if possible, it should use concepts that would also work in a Silverlight, Windows Forms, or WPF application.
Regarding those two options, which one do you think is better, and for the second option, what are possible technologies to implement it?
The ReaderWriterLockSlim class seems like a good solution for synchronizing access to the shared resource.
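A minimal sketch of that idea, assuming all writers live in the same process (ReaderWriterLockSlim does not synchronize across processes; for true multi-process access you would need a named Mutex instead):
using System.IO;
using System.Threading;

static class IdentityFile
{
    static readonly ReaderWriterLockSlim Lock = new ReaderWriterLockSlim();

    // Reads, increments, and writes back the counter under a write lock.
    public static int GetNextId(string path)
    {
        Lock.EnterWriteLock();
        try
        {
            int next = int.Parse(File.ReadAllText(path)) + 1;
            File.WriteAllText(path, next.ToString());
            return next;
        }
        finally
        {
            Lock.ExitWriteLock();
        }
    }
}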

directory monitoring

What is the best way for me to check for new files added to a directory? I don't think FileSystemWatcher would be suitable, as this is not an always-on service but a method that runs when my program starts up.
There are over 20,000 files in the folder structure I am monitoring. At present I am checking each file individually to see whether its path is in my database table, but this is taking around ten minutes and I would like to speed it up if possible.
I can store the date the folder was last checked - is it easy to get all files with a created date greater than the last-checked date?
Anyone got any ideas?
Thanks
Mark
Your approach is the only feasible one (a file system watcher lets you see changes as they happen; it cannot check on start-up).
Find out what takes so long. 20,000 checks should not take 10 minutes - maybe one minute at most. Something in your program is slow. How do you test it?
Hint: do not ask the database once per file. Get a list of all files on disk into memory and a list of all files in the database, then compare in memory. 20,000 SQL statements to the database are too slow; this way you need ONE query to get the list.
10 minutes seems awfully long for 20,000 files. How are you going about doing the comparison? Your suggestion doesn't account for deleted files either. If you want to remove those from the database, you will have to do a full comparison.
Perhaps the problem is the database round trips. You can retrieve a known file list from the database in large chunks (or all at once), sorted alphabetically. Sort the local file list as well and walk the two lists, processing missing or new entries as you go along.
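A rough sketch of that sorted merge walk (LoadKnownPathsFromDatabase is a hypothetical helper standing in for your single bulk query):
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static void SyncWithDatabase(string root)
{
    var onDisk = Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories)
                          .OrderBy(p => p, StringComparer.OrdinalIgnoreCase)
                          .ToList();
    var inDb = LoadKnownPathsFromDatabase()          // hypothetical: ONE query for all known paths
                          .OrderBy(p => p, StringComparer.OrdinalIgnoreCase)
                          .ToList();

    int i = 0, j = 0;
    while (i < onDisk.Count || j < inDb.Count)
    {
        int cmp = i >= onDisk.Count ? 1
                : j >= inDb.Count ? -1
                : string.Compare(onDisk[i], inDb[j], StringComparison.OrdinalIgnoreCase);

        if (cmp < 0) i++;        // on disk only: a new file, insert into the database
        else if (cmp > 0) j++;   // in database only: file was deleted, remove it
        else { i++; j++; }       // in both: nothing to do
    }
}

static IEnumerable<string> LoadKnownPathsFromDatabase()
{
    // Hypothetical: run one SELECT returning every stored file path.
    throw new NotImplementedException();
}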
FileSystemWatcher is not reliable, so even if you could use a service, it would not necessarily work for you.
The two options I can see are:
Keep a list of files you know about and keep comparing to this list. This will allow you to see if files were added, deleted etc. Keep this list in memory, instead of querying the database for each file.
As you suggest, store a timestamp and compare to that (see the sketch below).
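A minimal sketch of the timestamp option, assuming you persist lastChecked between runs (LoadLastCheckedUtc and SaveLastCheckedUtc are hypothetical helpers):
using System;
using System.IO;
using System.Linq;

DateTime lastChecked = LoadLastCheckedUtc();      // hypothetical: read the stored timestamp
var newFiles = new DirectoryInfo(@"C:\Files")     // illustrative path
    .EnumerateFiles("*", SearchOption.AllDirectories)
    .Where(f => f.CreationTimeUtc > lastChecked)
    .ToList();
SaveLastCheckedUtc(DateTime.UtcNow);              // hypothetical: persist for the next run
One caveat: files moved into the tree keep their original creation time, so this approach can miss them.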
You can record somewhere the last timestamp at which a file was created; it is simple and can work for you.
Can you write a service that runs on that machine? The service could then use FileSystemWatcher.
Having a FileSystemWatcher service like Kevin Jones suggests is probably the most pragmatic answer, but there are some other options.
You can watch the directory with inotify if you mount it with Samba on a Linux box. That of course assumes you don't mind fragmenting your platform, but that's what inotify is there for.
More correctly, but with correspondingly less chance of getting the go-ahead: if you're monitoring a directory with 20,000 files in it, it is probably time to evolve your system architecture. Without knowing much more about your application, it sounds like a message queue might be worth looking at.
