I have a program that runs as a Windows Service which is processing files in a specific folder.
Since it's a service, it constantly monitors a folder for new files that have been added. Part of the program's job is to perform comparisons of files in the target folder and flag non-matching files.
What I would like to do is detect a running copy operation and when it has completed, so that a file does not get flagged prematurely if its matching file has not been copied over to the target folder yet.
What I was thinking of doing was using the FileSystemWatcher to watch the target folder and see if a copy operation is occurring. If there is, I put my program's main thread to sleep until the copy operation has completed, then proceed to perform the operation on the folder like normal.
I just wanted to get some insight on this approach and see if it is valid. If anyone else has any other unique approaches to this problem, it would be greatly appreciated.
UPDATE:
I apologize for the confusion, when I say target directory, I mean the source folder containing all the files I want to process. A part of the function of my program is to copy the directory structure of the source directory to a destination directory and copy all valid files to that destination directory, preserving the directory structure of the original source directory, i.e. a user may copy folders containing files to the source directory. I want to prevent errors by ensuring that if a new set of folders containing more subfolders and files is copied to the source directory for processing, my program will not start operating on the target directory until the copy process has completed.
Yup, use a FileSystemWatcher but instead of watching for the created event, watch for the changed event. After every trigger, try to open the file. Something like this:
var watcher = new FileSystemWatcher(path, filter);
watcher.Changed += (sender, e) => {
FileStream file = null;
try {
Thread.Sleep(100); // hack for timing issues
file = File.Open(
e.FullPath,
FileMode.Open,
FileAccess.Read,
FileShare.Read
);
}
catch(IOException) {
// we couldn't open the file
// this is probably because the copy operation is not done
// just swallow the exception
return;
}
// now we have a handle to the file
};
This is about the best that you can do, unfortunately. There is no clean way to know that the file is ready for you to use.
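If you would rather retry a bounded number of times than wait for the next Changed event, the same open-probe can be wrapped in a small helper. This is only a sketch: the name `TryOpenForRead`, the attempt count and the delay are assumptions of mine, not part of the code above, and the sharing behaviour described in the comments is what you get on Windows.

```csharp
using System;
using System.IO;
using System.Threading;

static class FileProbe
{
    // Hypothetical helper: tries to open the file for reading until it succeeds
    // or the attempts run out. Returns null if the file never became available.
    public static FileStream TryOpenForRead(string path, int maxAttempts, TimeSpan delay)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                // FileShare.Read does not allow other writers, so this throws an
                // IOException while the copying process still has the file open for writing.
                return new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
            }
            catch (IOException)
            {
                // Not ready yet (or not there yet): wait and try again.
                if (attempt < maxAttempts) Thread.Sleep(delay);
            }
        }
        return null;
    }
}
```

The caller owns the returned stream and should dispose it; a null result means the file was still busy (or missing) after the last attempt.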
What you are looking for is a typical producer/consumer scenario. What you need to do is outlined in the 'Producer/consumer queue' section on this page. This lets you use multithreading (perhaps spawn a BackgroundWorker) to copy files, so you don't block the main service thread from listening to system events and can perform more meaningful tasks there, like checking for new files and updating the queue. So on the main thread you check for new files, and on background threads you perform the actual copying. From personal experience (I have implemented this kind of task) there is not much performance gain from this approach unless you are running on a multi-CPU machine, but the process is very clean and smooth, and the code is logically separated nicely.
In short, what you have to do is have an object like the following:
public class File
{
public string FullPath {get; internal set;}
public bool CopyInProgress {get; set;} // true while the file is still being copied
// .. other properties if desired
}
Then, following the tutorial posted above, take a lock on the File object and on the queue whenever you update or copy it. With this approach you can rely on the CopyInProgress flag instead of constantly monitoring for file copy completion.
The important point to realize here is that your service has only one instance of the File object per actual physical file. Just make sure you (1) lock your queue when adding and removing items and (2) lock the actual File object when initializing an update.
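As a rough illustration of the queue itself, here is a minimal sketch. Note that it swaps in `BlockingCollection<T>` from System.Collections.Concurrent, which does the locking internally, instead of the hand-written locks described above; the class name `FileWorkQueue` and the `processFile` callback are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical work queue: the watcher thread enqueues paths,
// a single background task consumes them one at a time.
sealed class FileWorkQueue : IDisposable
{
    private readonly BlockingCollection<string> _pending = new BlockingCollection<string>();
    private readonly Task _consumer;

    public FileWorkQueue(Action<string> processFile)
    {
        // LongRunning hints that this task blocks for the lifetime of the service.
        _consumer = Task.Factory.StartNew(() =>
        {
            // GetConsumingEnumerable blocks until an item arrives and ends
            // once CompleteAdding has been called and the queue is empty.
            foreach (string path in _pending.GetConsumingEnumerable())
            {
                processFile(path);
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Producer side: call this from the FileSystemWatcher event handler.
    public void Enqueue(string path) => _pending.Add(path);

    // Stops accepting work, then waits for the consumer to drain the queue.
    public void Dispose()
    {
        _pending.CompleteAdding();
        _consumer.Wait();
        _pending.Dispose();
    }
}
```

With a single consumer the files are processed in the order they were enqueued, and the main service thread never blocks on the copy work.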
EDIT: Above, where I say "there is not too much performance gain from this approach unless", I am referring to running this approach on a single thread. Compared to @Jason's suggestion, this approach should be noticeably faster, because @Jason's solution performs very expensive IO operations that will fail in most cases. I haven't tested this, but I'm quite sure, as my approach only requires the IO operations open (once), stream (once) and close (once). @Jason's approach involves repeated open attempts, all of which fail except the last one.
One approach is to attempt to open the file and see if you get an error. The file will be locked if it is being copied. This will open the file in shared mode so it will conflict with an already open write lock on the file:
using(System.IO.File.Open("file", FileMode.Open,FileAccess.Read, FileShare.Read)) {}
Another is to check the file size. It would change over time if the file is being copied to.
It is also possible to get a list of all applications that have opened a certain file, but I don't know the API for this.
I know this is an old question, but here's an answer I spun up after searching for an answer to just this problem. This had to be tweaked a lot to remove some of the proprietary-ness from what I was working on, so this may not compile directly, but it'll give you an idea. This is working great for me:
void BlockingFileCopySync(FileInfo original, FileInfo copyPath)
{
bool ready = false;
FileSystemWatcher watcher = new FileSystemWatcher();
watcher.NotifyFilter = NotifyFilters.LastWrite;
watcher.Path = copyPath.Directory.FullName;
watcher.Filter = "*" + copyPath.Extension;
watcher.EnableRaisingEvents = true;
bool fileReady = false;
bool firsttime = true;
DateTime previousLastWriteTime = new DateTime();
// modify this as you think you need to...
int waitTimeMs = 100;
watcher.Changed += (sender, e) =>
{
// Get the time the file was modified
// Check it again in 100 ms
// When it has gone a while without modification, it's done.
while (!fileReady)
{
// We need to initialize for the "first time",
// ie. when the file was just created.
// (Really, this could probably be initialized off the
// time of the copy now that I'm thinking of it.)
if (firsttime)
{
previousLastWriteTime = System.IO.File.GetLastWriteTime(copyPath.FullName);
firsttime = false;
System.Threading.Thread.Sleep(waitTimeMs);
continue;
}
DateTime currentLastWriteTime = System.IO.File.GetLastWriteTime(copyPath.FullName);
bool fileModified = (currentLastWriteTime != previousLastWriteTime);
if (fileModified)
{
previousLastWriteTime = currentLastWriteTime;
System.Threading.Thread.Sleep(waitTimeMs);
continue;
}
else
{
fileReady = true;
break;
}
}
};
System.IO.File.Copy(original.FullName, copyPath.FullName, true);
// This guy here chills out until the FileSystemWatcher
// tells him the file isn't being written to anymore.
while (!fileReady)
{
System.Threading.Thread.Sleep(waitTimeMs);
}
}
My code checks inside a loop whether a *.txt file has been created.
If the file is not created after x time, I throw an exception.
Here is my code:
var AnswerFile = @"C:\myFile.txt";
for (int i = 0; i <= 30; i++)
{
if (File.Exists(AnswerFile))
break;
await Task.Delay(100);
}
if (File.Exists(AnswerFile))
{
}
else
{
}
After the loop I check whether my file has been created. The loop expires after roughly 3 seconds (100 ms per iteration, about 30 iterations).
My code is working; I am just concerned about its performance and quality. Is there a better approach than mine? For example, should I use the FileInfo class instead?
var fi1 = new FileInfo(AnswerFile);
if(fi1.Exists)
{
}
Or should I use the FileSystemWatcher class?
You should perhaps use a FileSystemWatcher for this and decouple the process of creating the file from the process of reacting to its presence. If the file must be generated in a certain time because it has some expiry time then you could make the expiry datetime part of the file name so that if it appears after that time you know it's expired. A note of caution with the FileSystemWatcher - it can sometimes miss something (the fine manual says that events can be missed if large numbers are generated in a short time)
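To make that concrete, here is a minimal sketch of waiting for a file with a FileSystemWatcher and a timeout. The helper name `WaitForFileAsync` is hypothetical and `fullPath` is assumed to be an absolute path. Because events can be missed, the watcher is only used to wake up early; the final answer always comes from File.Exists.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

static class FileWaiter
{
    // Hypothetical helper: returns true if the file exists before the timeout expires.
    public static async Task<bool> WaitForFileAsync(string fullPath, TimeSpan timeout)
    {
        string directory = Path.GetDirectoryName(fullPath);
        string fileName = Path.GetFileName(fullPath);
        var signal = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

        using (var watcher = new FileSystemWatcher(directory, fileName))
        {
            watcher.Created += (sender, e) => signal.TrySetResult(true);
            watcher.Renamed += (sender, e) => signal.TrySetResult(true);
            watcher.EnableRaisingEvents = true;

            // Check after subscribing so a file created in between is not missed.
            if (File.Exists(fullPath)) return true;

            // Wake up on the first event or when the timeout elapses, whichever comes first.
            await Task.WhenAny(signal.Task, Task.Delay(timeout));

            // The event is only a hint; the file system has the final word.
            return File.Exists(fullPath);
        }
    }
}
```

In the question's code this would replace the loop with something like `bool found = await FileWaiter.WaitForFileAsync(AnswerFile, TimeSpan.FromSeconds(3));`.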
In the past I've used this for watching for files being uploaded via ftp. As soon as the notification of file created appears I put the file into a list and check it periodically to see if it is still growing - you can either look at the filesystem watcher lastwritetime event for this or directly check the size of the file now vs some time ago etc - in either approach it's probably easiest to use a dictionary to track the file and the previous size/most recent lastwritedate event.
After a minute of no growth I consider the file uploaded completely and I process it. It might be wise for you to implement a similar delay if using a file system watcher and the files are arriving by some slow generating method
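The dictionary-based tracking described above could look roughly like this. It is a sketch, not my original code: the class name `UploadTracker` and its methods are made up for illustration, the current time is passed in so the logic is easy to test, and the class is not thread-safe, so call it from one thread or add a lock.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical tracker: remembers the last observed size of each file and reports
// a file as complete once its size has stayed the same for the quiet period.
sealed class UploadTracker
{
    private readonly Dictionary<string, (long Size, DateTime LastChangeUtc)> _seen =
        new Dictionary<string, (long, DateTime)>();
    private readonly TimeSpan _quietPeriod;

    public UploadTracker(TimeSpan quietPeriod) => _quietPeriod = quietPeriod;

    // Call from the watcher's Created handler. Size -1 means "not measured yet".
    public void Track(string path, DateTime nowUtc) => _seen[path] = (-1, nowUtc);

    // Call periodically (for example from a timer). Returns the files that are
    // considered complete and stops tracking them.
    public List<string> CollectCompleted(DateTime nowUtc)
    {
        var done = new List<string>();
        foreach (string path in new List<string>(_seen.Keys))
        {
            var info = new FileInfo(path);
            if (!info.Exists) { _seen.Remove(path); continue; }

            var previous = _seen[path];
            if (info.Length != previous.Size)
                _seen[path] = (info.Length, nowUtc);   // still growing: restart the clock
            else if (nowUtc - previous.LastChangeUtc >= _quietPeriod)
                done.Add(path);                        // no growth for long enough
        }
        foreach (string path in done) _seen.Remove(path);
        return done;
    }
}
```

A tracker created with `TimeSpan.FromMinutes(1)` matches the one-minute rule above; the quiet period should be tuned to how slowly the files arrive.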
Why don't you retrieve a list of file names, then search in the list? You can use Directory.GetFiles to get the list of files inside a directory and then search in that list.
This would be more flexible for you, since you create the list once and reuse it across the application, instead of calling File.Exists for each file.
Example :
var path = @"C:\folder\"; // set the folder path, which contains all answers files
var ext = "*.txt"; // set the file extension.
// GET filename list (bare name) and make them all lowercase.
var files = Directory.GetFiles(path, ext).Select(x=> x.Substring(path.Length, (x.Length - path.Length) - ext.Length + 1 ).Trim().ToLower()).ToList();
// Search for this filename
var search = "myFile";
// Check
if(files.Contains(search.ToLower()))
{
Console.WriteLine($"File : {search} already exists.");
}
else
{
Console.WriteLine($"File : {search} is not found.");
}
It was clearly stated here that File.Move is an atomic operation: Atomicity of File.Move.
But the following code snippet makes it look as if the same file was moved multiple times.
Does anyone know what is wrong with this code?
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
namespace FileMoveTest
{
class Program
{
static void Main(string[] args)
{
string path = "test/" + Guid.NewGuid().ToString();
CreateFile(path, new string('a', 10 * 1024 * 1024));
var tasks = new List<Task>();
for (int i = 0; i < 10; i++)
{
var task = Task.Factory.StartNew(() =>
{
try
{
string newPath = path + "." + Guid.NewGuid();
File.Move(path, newPath);
// this line does NOT solve the issue
if (File.Exists(newPath))
Console.WriteLine(string.Format("Moved {0} -> {1}", path, newPath));
}
catch (Exception e)
{
Console.WriteLine(string.Format(" {0}: {1}", e.GetType(), e.Message));
}
});
tasks.Add(task);
}
Task.WaitAll(tasks.ToArray());
}
static void CreateFile(string path, string content)
{
string dir = Path.GetDirectoryName(path);
if (!Directory.Exists(dir))
{
Directory.CreateDirectory(dir);
}
using (FileStream f = new FileStream(path, FileMode.OpenOrCreate))
{
using (StreamWriter w = new StreamWriter(f))
{
w.Write(content);
}
}
}
}
}
The paradoxical output is below. It seems that the file was moved multiple times to different locations, yet only one of them is present on disk. Any thoughts?
Moved test/eb85560d-8c13-41c1-926a-6871be030742 -> test/eb85560d-8c13-41c1-926a-6871be030742.0018d317-ed7c-4732-92ac-3bb974d29017
Moved test/eb85560d-8c13-41c1-926a-6871be030742 -> test/eb85560d-8c13-41c1-926a-6871be030742.3965dc15-7ef9-4f36-bdb7-94a5939b17db
Moved test/eb85560d-8c13-41c1-926a-6871be030742 -> test/eb85560d-8c13-41c1-926a-6871be030742.fb66306a-5a13-4f26-ade2-acff3fb896be
Moved test/eb85560d-8c13-41c1-926a-6871be030742 -> test/eb85560d-8c13-41c1-926a-6871be030742.c6de8827-aa46-48c1-b036-ad4bf79eb8a9
System.IO.FileNotFoundException: Could not find file 'C:\file-move-test\test\eb85560d-8c13-41c1-926a-6871be030742'.
System.IO.FileNotFoundException: Could not find file 'C:\file-move-test\test\eb85560d-8c13-41c1-926a-6871be030742'.
System.IO.FileNotFoundException: Could not find file 'C:\file-move-test\test\eb85560d-8c13-41c1-926a-6871be030742'.
System.IO.FileNotFoundException: Could not find file 'C:\file-move-test\test\eb85560d-8c13-41c1-926a-6871be030742'.
System.IO.FileNotFoundException: Could not find file 'C:\file-move-test\test\eb85560d-8c13-41c1-926a-6871be030742'.
System.IO.FileNotFoundException: Could not find file 'C:\file-move-test\test\eb85560d-8c13-41c1-926a-6871be030742'.
The resulting file is here: eb85560d-8c13-41c1-926a-6871be030742.fb66306a-5a13-4f26-ade2-acff3fb896be
UPDATE. I can confirm that checking File.Exists also does NOT solve the issue: it can report that a single file was moved to several different locations.
SOLUTION. The solution I ended up with is the following: before operating on the source file, create a special "lock" file. If that succeeds, we can be sure that only this thread has exclusive access to the file and it is safe to do anything we want. Below is the right set of parameters to create such a "lock" file.
File.Open(lockPath, FileMode.CreateNew, FileAccess.Write);
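Put together, the lock-file guard around the move might look like the sketch below. The helper name `TryMoveExclusively` and the ".lock" suffix are assumptions for illustration. The lock file is deliberately left in place afterwards, so later callers keep failing fast once the source is gone; cleaning it up is left out.

```csharp
using System;
using System.IO;

static class MoveGuard
{
    // Hypothetical helper: returns true only for the single caller that
    // created the lock file and then moved the source file.
    public static bool TryMoveExclusively(string sourcePath, string destinationPath)
    {
        string lockPath = sourcePath + ".lock"; // assumed naming convention
        try
        {
            // FileMode.CreateNew throws an IOException if the lock file already
            // exists, so at most one caller ever gets past this line.
            using (File.Open(lockPath, FileMode.CreateNew, FileAccess.Write))
            {
                if (!File.Exists(sourcePath)) return false;
                File.Move(sourcePath, destinationPath);
                return true;
            }
        }
        catch (IOException)
        {
            // Another caller holds the lock, or the move itself failed.
            return false;
        }
    }
}
```

Run from ten concurrent tasks as in the test program above, exactly one call should report success.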
Does anyone know what is wrong with this code?
I guess that depends on what you mean by "wrong".
The behavior you're seeing is not IMHO unexpected, at least if you're using NTFS (other file systems may or may not behave similarly).
The documentation for the underlying OS API (MoveFile() and MoveFileEx() functions) is not specific, but in general the APIs are thread-safe, in that they guarantee the file system will not be corrupted by concurrent operations (of course, your own data could be corrupted, but it will be done in a file-system-coherent way).
Most likely what is occurring is that as the move-file operation proceeds, it does so by first getting the actual file handle from the given directory link to it (in NTFS, all "file names" that you see are actually hard links to an underlying file object). Having obtained that file handle, the API then creates a new file name for the underlying file object (i.e. as a hard link), and then deletes the previous hard link.
Of course, as this progresses, there is a window during the time between a thread having obtained the underlying file handle but before the original hard link has been deleted. This allows some but not all of the other concurrent move operations to appear to succeed. I.e. eventually the original hard link doesn't exist and further attempts to move it won't succeed.
No doubt the above is an oversimplification. File system behaviors can be complex. In particular, your stated observation is that you only wind up with a single instance of the file when all is said and done. This suggests that the API does also somehow coordinate the various operations, such that only one of the newly-created hard links survives, probably by virtue of the API actually just renaming the associated hard link after retrieving the file object handle, as opposed to creating a new one and deleting the old one (implementation detail).
At the end of the day, what's "wrong" with the code is that it is intentionally attempting to perform concurrent operations on a single file. While the file system itself will ensure that it remains coherent, it's up to your own code to ensure that such operations are coordinated so that the results are predictable and reliable.
I am new to C#, so please forgive my ignorance. I am running a FileSystemWatcher on a text file, and it is working fine: I can do some simple tasks after the file has changed. Everything works except what I actually want to do.
I am trying to read the last line of the changed text file with this code:
public void File_Changed( object source, FileSystemEventArgs e )
{
string MACH1 = File.ReadLines(@"C:\MACHINE_1.txt").Last();
if (MACH1=="SETUP")
{
MACHINE1IND.BackColor = Color.Green;
}
else
{
MACHINE1IND.BackColor = Color.Red;
}
}
It works fine inside a button but not after file watcher.
Says it cannot find file?
One thing to be aware of is that the FSW can issue multiple change notifications during a save operation. You have no way of knowing when the save is complete. As a result, you need to always wrap your code in a try..catch block and support retry after a timeout to allow the file write to be completed. Typically, I will try to move the file to a temp location where I will do my processing. If the move fails, wait a couple seconds and try again.
As Jim Wooley explains in his answer, the file operation might still be in progress, when FSW fires a Created or Changed event. If FSW is used to communicate between two applications and you are in control of the "sending" application as well, you can solve the problem as follows:
Write the information to a temporary file. Close the file. Rename the temporary file and give it a definitive name.
In the other application (the receiver) watch for the Renamed event using the FileSystemWatcher. The renamed file is guaranteed to be complete.
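On the sending side, the write-then-rename steps can be sketched as follows. The helper name `WriteThenRename` and the ".tmp" suffix are assumptions; the temporary file should live in the same folder as the final file so that the move is a plain rename rather than a copy.

```csharp
using System;
using System.IO;

static class AtomicWriter
{
    // Hypothetical sender-side helper: the final name only appears
    // once the content has been completely written and the file closed.
    public static void WriteThenRename(string finalPath, string content)
    {
        string tempPath = finalPath + ".tmp"; // assumed temporary naming convention

        // WriteAllText creates the file, writes the content and closes it.
        File.WriteAllText(tempPath, content);

        // File.Move does not overwrite an existing file, so remove a stale one first.
        if (File.Exists(finalPath)) File.Delete(finalPath);

        // On the same volume this is a rename, which the receiver sees as a Renamed event.
        File.Move(tempPath, finalPath);
    }
}
```

The receiver then subscribes to `watcher.Renamed` and reads `e.FullPath` without needing any retry logic for half-written files.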
You'll have to check if the file exists before accessing it.
public void File_Changed(object source, FileSystemEventArgs e)
{
string filePath = @"C:\MACHINE_1.txt";
if(!File.Exists(filePath)) //Checks if file exists
return;
string MACH1 = File.ReadLines(filePath).Last();
if (MACH1=="SETUP")
{
MACHINE1IND.BackColor = Color.Green;
}
else
{
MACHINE1IND.BackColor = Color.Red;
}
}
I'm writing a back up solution (of sorts). Simply it copies a file from location C:\ and pastes it to location Z:\
To ensure the speed is fast, before copying and pasting it checks to see if the original file exists. If it does, it performs a few 'calculations' to work out if the copy should continue or if the backup file is up to date. It is these calculations I'm finding difficult.
Originally, I compared the file size, but this is not good enough because it is quite possible to change a file and have it stay the same size (for example, saving the character C in Notepad gives the same size as saving the character T).
So, I need to find out if the modified date differs. At the moment, I get the file info using the FileInfo class but after reviewing all the fields there is nothing which appears to be suitable.
How can I check to ensure that I'm copying files which have been modified?
EDIT
I have seen suggestions on SO to use MD5 checksums, but I'm concerned this may be a problem as some of the files I'm comparing will be up to 10GB
Going by modified date will be unreliable - the computer clock can go backwards when it synchronizes, or when manually adjusted. Some programs might not behave well when modifying or copying files in terms of managing the modified date.
Going by the archive bit might work in a controlled environment but what happens if another piece of software is running that uses the archive bit as well?
The Windows archive bit is evil and must be stopped
If you want (almost) complete reliability then what you should do is store a hash value of the last backed up version using a good hashing function like SHA1, and if the hash value changes then you upload the new copy.
Here is the SHA1 class along with a code sample on the bottom:
http://msdn.microsoft.com/en-us/library/system.security.cryptography.sha1.aspx
Just run the file bytes through it and store the hash value. Pass a FileStream to it instead of loading your file into memory with a byte array to reduce memory usage, especially for large files.
You can combine this with modified date in various ways to tweak your program as needed for speed and reliability. For example, you can check modified dates for most backups and periodically run a hash checker that runs while the system is idle to make sure nothing got missed. Sometimes the modified date will change but the file contents are still the same (i.e. got overwritten with the same data), in which case you can avoid resending the whole file after you recompute the hash and realize it is still the same.
Most version control systems use some kind of combined approach with hashes and modified dates.
Your approach will generally involve some kind of risk management with a compromise between performance and reliability if you don't want to do a full backup and send all the data over each time. It's important to do "full backups" once in a while for this reason.
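One way such a combined check might look is sketched below: a cheap size comparison first, and a streamed hash only when the sizes match. The names `BackupCheck`, `ComputeHash` and `NeedsBackup` are hypothetical, `lastHash` is assumed to be the value you stored at the previous backup, and the sketch uses SHA-256 rather than the SHA1 class linked above.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class BackupCheck
{
    // Streams the file through SHA-256, so even very large files
    // are never loaded into memory in one piece.
    public static string ComputeHash(string path)
    {
        using (var sha = SHA256.Create())
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            return BitConverter.ToString(sha.ComputeHash(stream));
        }
    }

    // Hypothetical policy: lastHash is the hash stored when the file
    // was last backed up (null if it has never been backed up).
    public static bool NeedsBackup(string sourcePath, string backupPath, string lastHash)
    {
        var source = new FileInfo(sourcePath);
        var backup = new FileInfo(backupPath);

        if (!backup.Exists || lastHash == null) return true;
        if (source.Length != backup.Length) return true; // cheap check first

        // Same size: only now pay for reading the whole source file.
        return ComputeHash(sourcePath) != lastHash;
    }
}
```

Storing the hash means only the source has to be read, which matters when the backup sits on a slow drive; a modified-date shortcut could be added before the hash step if you accept the risks described above.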
You can compare files by their hashes:
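private byte[] GetFileHash(string fileName)
{
// SHA1.Create() is used because the parameterless HashAlgorithm.Create() is not supported on newer .NET runtimes.
using (HashAlgorithm sha1 = SHA1.Create())
using (FileStream stream = new FileStream(fileName, FileMode.Open, FileAccess.Read))
return sha1.ComputeHash(stream);
}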
If content was changed, hashes will be different.
You may like to check out the FileSystemWatcher class.
"This class lets you monitor a directory for changes and will fire an
event when something is modified."
Your code can then handle the event and process the file.
Code source - MSDN:
// Create a new FileSystemWatcher and set its properties.
FileSystemWatcher watcher = new FileSystemWatcher();
watcher.Path = args[1];
/* Watch for changes in LastAccess and LastWrite times, and
the renaming of files or directories. */
watcher.NotifyFilter = NotifyFilters.LastAccess | NotifyFilters.LastWrite
| NotifyFilters.FileName | NotifyFilters.DirectoryName;
// Only watch text files.
watcher.Filter = "*.txt";
// Add event handlers.
watcher.Changed += new FileSystemEventHandler(OnChanged);
watcher.Created += new FileSystemEventHandler(OnChanged);
watcher.Deleted += new FileSystemEventHandler(OnChanged);
watcher.Renamed += new RenamedEventHandler(OnRenamed);
Generally speaking, you'd let the OS take care of tracking whether a file has changed or not.
If you use:
File.GetAttributes
And check for the archive flag, this will tell you if the file has changed since it was last archived. I believe XCOPY and similar reset this flag once it has done the copy, but you may need to take care of this yourself.
You can easily test the flag in DOS using:
dir /aa yourfilename
Or just add the attributes column in windows explorer.
The file archive flag is normally used by backup programs to check whether a file needs backing up. When Windows modifies or creates a file, it sets the archive flag (see here). Check whether the archive flag is set to decide whether the file needs backing up:
if ((File.GetAttributes(fileName) & FileAttributes.Archive) == FileAttributes.Archive)
{
// Archive file.
}
After backing up the file, clear the archive flag:
File.SetAttributes(fileName, File.GetAttributes(fileName) & ~FileAttributes.Archive);
This assumes no other programs (e.g., system backup software) are clearing the archive flag.
From this article get the Crc32 class
Calculating CRC-32 in C# and .NET
Pass your file path to this function...
It returns a CRC value... compare it to your file that already exists... if the CRC's are different then the file is changed.
internal Int32 GetCRC(string filepath)
{
Int32 ret = 0;
StringBuilder hash = new StringBuilder();
try
{
Crc32 crc32 = new Crc32();
using (System.IO.FileStream fs = File.Open(filepath, FileMode.Open, FileAccess.Read, FileShare.None))
foreach (byte b in crc32.ComputeHash(fs)) hash.Append(b.ToString("x2").ToLower());
ret = Int32.Parse(hash.ToString(), System.Globalization.NumberStyles.HexNumber);
}
catch (Exception ex)
{
string msg = (ex.InnerException == null) ? ex.Message : ex.InnerException.Message;
Console.WriteLine($"FILE ERROR: {msg}");
ret = 0;
}
finally
{
hash.Clear();
hash = null;
}
return ret;
}
My application uses "FileSystemWatcher()" to raise an event when a TXT file is created by an "X" application, and then reads its content.
The "X" application creates the file (my application detects it successfully), but it takes some time to fill it with data, so the txt file cannot be read at creation time. I'm looking for a way to wait until the txt file becomes available for reading: not a static delay, but something tied to that file.
Any help? Thanks.
Create the file like this:
myfile.tmp
Then when it's finished, rename it to
myfile.txt
and have your filewatcher watch for the .txt extension
The only way I have found to do this is to put the attempt to read the file in a loop, and exit the loop when I don't get an exception. Hopefully someone else will come up with a better way...
bool FileRead = false;
while (!FileRead)
{
try
{
// code to read file, which you already know
FileRead = true;
}
catch(Exception)
{
// do nothing or optionally cause the code to sleep for a second or two
}
}
You could track the file's Changed event, and see if it's available for opening on change. If the file is still locked, just watch for the next change event.
You can open and read a locked file like this
using (var stream = new FileStream(@"c:\temp\file.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite)) {
using (var file = new StreamReader(stream)) {
while (!file.EndOfStream) {
var line = file.ReadLine();
Console.WriteLine(line);
}
}
}
However, make sure your file writer flushes otherwise you may not see any changes.
The application X should lock the file until it closes it. Is application X also a .NET application and can you modify it? In that case you can simply use the FileInfo class with the proper value for FileShare (in this case FileShare.Read).
If you have no control over application X, the situation becomes a little more complex. But then you can always attempt to open the file exclusively via the same FileInfo.Open method. Provide FileShare.None in that case. It will attempt to open the file exclusively and will fail if the file is still in use. You can perform this action inside a loop until the file is closed by application X and ready to be read.
We have a virtual printer for creating pdf documents, and I do something like this to access that document after it's sent to the printer:
using (FileSystemWatcher watcher = new FileSystemWatcher(folder))
{
// Wait 1 s, then 2 s, then 3 s (6 s in total), stopping as soon as the document exists.
for (int i = 1; i <= 3 && !File.Exists(docname); i++)
watcher.WaitForChanged(WatcherChangeTypes.Created, i * 1000);
}
So I wait for a total of 6 seconds (some documents can take a while to print but most come very fast, hence the increasing wait time) before deciding that something has gone awry.
After this, I also read in a for loop, in just the same way that I wait for it to be created. I do this just in case the document has been created, but not released by the printer yet, which happens nearly every time.
You can use the same class to be notified when file changes.
The Changed event is raised when changes are made to the size, system attributes, last write time, last access time, or security permissions of a file or directory in the directory being monitored.
So I think you can use that event to check if file is readable and open it if it is.
If you have a DB at your disposal I would recommend using a DB table as a queue with the file names and then monitor that instead. nice and transactional.
You can check whether the file's size has changed, although this requires you to poll its value at some interval.
Also, if you want to get the data faster, you can call .Flush() while writing, and make sure to .Close() the stream as soon as you finish writing to it.