I have one application that reads from a folder, waiting for a file to appear in it. When a file appears, the application reads its content, executes a few functions against external systems with the data from the file and then deletes the file (and in turn waits for the next file).
Now, I want to run this application on two different machines, both listening to the same folder. So it's the exact same application, but two instances. Let's call them instance A and instance B.
When a new file appears, both A and B will find it, and both will try to read it. This leads to a race condition between the two instances. What I want is: if A started reading the file before B, B should simply skip the file and let A process and delete it; likewise, if B finds the file first, A should do nothing.
Now how can I implement this? Setting a lock on the file is not sufficient, I guess, because let's say A starts reading the file; it is then locked by A, but A has to unlock it in order to delete it. During that time B might try to read the file, in which case the file is processed twice, which is not acceptable.
So to summarize: I have two instances of one program and one folder / network share. Whenever a file appears in the folder, I want EITHER instance A OR instance B to process the file, NEVER both. Any ideas of how I can implement such functionality in C#?
The correct way to do this is to open the file with a write lock (e.g., System.IO.FileAccess.Write) and a read share (e.g., System.IO.FileShare.Read). If one of the processes tries to open the file when the other process already has it open, the open call will throw an exception, which you need to catch and handle as you see fit (e.g., log and retry). By opening with a write lock, you guarantee that opening and locking are atomic and therefore synchronised between the two processes, so there is no race condition.
So something like this:
try
{
    using (FileStream fileStream = new FileStream(FileName, FileMode.Open, FileAccess.Write, FileShare.Read))
    {
        // Read from or write to file.
    }
}
catch (IOException ex)
{
    // The file is locked by the other process.
    // Some options here:
    //  - Log exception.
    //  - Ignore exception and carry on.
    //  - Implement a retry mechanism to try opening the file again.
}
You can use FileShare.None if you do not want other processes to be able to access the file at all when your program has it open. I prefer FileShare.Read because it allows me to monitor what is happening in the file (e.g., open it in Notepad).
Deleting the file follows a similar principle: first rename/move the file, catching the IOException that occurs if the other process has already renamed/moved it, then open the renamed/moved file. The rename/move indicates that the file is already being processed and should be ignored by the other process. E.g., rename it with a .pending file extension, or move it to a Pending directory.
try
{
    // This will throw an exception if the other process has already moved the file -
    // either FileName no longer exists, or it is locked.
    File.Move(FileName, PendingFileName);
    // If we get this far we know we have exclusive access to the pending file.
    using (FileStream fileStream = new FileStream(PendingFileName, FileMode.Open, FileAccess.Write, FileShare.Read))
    {
        // Read from or write to file.
    }
    File.Delete(PendingFileName);
}
catch (IOException ex)
{
    // The file is locked by the other process.
    // Some options here:
    //  - Log exception.
    //  - Ignore exception and carry on.
    //  - Implement a retry mechanism to try moving the file again.
}
As with opening files, File.Move is atomic and protected by locks, therefore it is guaranteed that if you have multiple concurrent threads/processes attempting to move the file, only one will succeed and the others will throw an exception. See here for a similar question: Atomicity of File.Move.
I can think of two quick solutions to this:
Distribute the load
Set up your two processes so that each only works on some of the files. How you partition them could be based on the file name or on the date/time. E.g. process 1 reads files whose timestamp ends in an odd number, and process 2 reads the ones ending in an even number.
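A rough sketch of deterministic name-based partitioning along those lines (the IsMine helper and the instance indexes are assumptions; an odd/even timestamp check would work the same way):
using System;
using System.IO;

static bool IsMine(string filePath, int instanceIndex, int instanceCount)
{
    // A stable hash of the file name; string.GetHashCode() is not guaranteed to be
    // identical across processes/runtimes, so compute one from the characters instead.
    int hash = 0;
    foreach (char c in Path.GetFileName(filePath))
        hash = unchecked(hash * 31 + c);
    return Math.Abs(hash % instanceCount) == instanceIndex;
}

// Instance A runs with instanceIndex = 0, instance B with instanceIndex = 1 (instanceCount = 2).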
Database as lock
The other alternative is that you use some kind of database as a lock.
Process 1 reads a file and does an insert into a database table based on the filename (must be unique). If the insert works, then it is responsible for the file and continues processing it, else if the insert fails, then the other process has already inserted it so it is responsible and process 1 ignores the file.
The database has to be accessible to both processes, and this will incur some overhead. But might be a better option if you want to scale this out to more processes.
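A minimal sketch of this insert-as-lock idea, assuming SQL Server and a FileLocks table whose FileName column is the primary key (the table and helper names are assumptions):
using System.Data.SqlClient;

static bool TryClaimFile(string connectionString, string fileName)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("INSERT INTO FileLocks (FileName) VALUES (@name)", conn))
    {
        cmd.Parameters.AddWithValue("@name", fileName);
        conn.Open();
        try
        {
            cmd.ExecuteNonQuery();
            return true;            // insert succeeded: this process owns the file
        }
        catch (SqlException)        // primary key violation: the other process claimed it first
        {
            return false;
        }
    }
}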
So if you are going to apply a lock, you can use the file name itself as the lock object. You can rename the file in a special way (for example by adding a dot in front of the file name),
and the first service that is lucky enough to rename the file continues with it. The second (slower) one gets an exception that the file does not exist.
You also have to add a check to your file-processing logic so that a service will not try to "lock" a file that is already "locked" (i.e. whose name starts with a dot).
Update: it may be better to include a special set of characters (as a marker) and some service identifier (machine name concatenated with the PID),
because I'm not sure how the rename will behave under concurrency.
So if you have file.txt in the shared folder:
first of all, check whether the .lock marker is already in the file name
if not, the service can try to rename it to file.txt.lockDevhost345 (where .lock is the special marker, Devhost is the name of the current computer and 345 is the PID (process identifier))
then the service checks whether the file file.txt.lockDevhost345 exists
if yes, it was locked by the current service instance and can be used
if no, it was "stolen" by the concurrent service and should not be processed
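A rough sketch of this rename-as-lock scheme (the marker format follows the answer; the path, method name and error handling are simplified assumptions):
using System;
using System.Diagnostics;
using System.IO;

static void TryProcess(string filePath)    // e.g. @"\\share\drop\file.txt"
{
    if (Path.GetFileName(filePath).Contains(".lock"))
        return;                            // already claimed by some instance

    string lockedPath = filePath + ".lock" + Environment.MachineName + Process.GetCurrentProcess().Id;
    try
    {
        File.Move(filePath, lockedPath);   // atomic claim; the loser gets an exception
    }
    catch (IOException)
    {
        return;                            // the other instance renamed (or deleted) it first
    }

    if (File.Exists(lockedPath))
    {
        // we own the file: process it, then delete lockedPath
    }
}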
If you do not have write permission, you can use another network share and try to create an additional lock-marker file there: for file.txt, the service can try to create (and hold a write lock on) a new file like file.txt.lock. The first service that creates the lock file takes care of the original file and removes the lock only once the original file has been processed.
Instead of going deep into file-access handling, I would suggest a functionality-server approach. An additional argument for this approach is that the files are used from different computers, and that in particular goes deep into access and permission administration.
My suggestion is to have a single point of file access (a file repository) that implements the following functionality:
Get files list. (gets a list of available files)
Checkout file. (exclusively grab the file so that only the owner of the checkout is authorized to modify it)
Modify file. (update file content or delete it)
Check-in changes to the repository
There are a lot of ways to implement this approach (use the API of a file-versioning system, implement a service, use a database, ...).
An easy one (it requires a database that supports transactions and triggers or stored procedures):
Get files list. (SQL SELECT from an "available files table")
Checkout file. (SQL UPDATE or an update stored procedure. In the trigger or stored procedure, raise an error in case of a double checkout.)
Modify file. (Update the file content or delete it. Keep in mind that it is still better to do this through the functionality "server"; that way you need to implement the security policy only once.)
Check-in changes to the repository. (Release the "Checked Out" field of the particular file entry. Implement the check-in in a transaction.)
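A sketch of the checkout step done as a conditional UPDATE rather than a trigger, assuming a table AvailableFiles(FileName, CheckedOutBy) reachable by all processes (all names here are assumptions):
using System.Data.SqlClient;

static bool TryCheckout(SqlConnection conn, string fileName, string owner)
{
    const string sql =
        "UPDATE AvailableFiles SET CheckedOutBy = @owner " +
        "WHERE FileName = @name AND CheckedOutBy IS NULL";
    using (var cmd = new SqlCommand(sql, conn))
    {
        cmd.Parameters.AddWithValue("@owner", owner);
        cmd.Parameters.AddWithValue("@name", fileName);
        return cmd.ExecuteNonQuery() == 1;   // exactly one row updated means this caller owns the checkout
    }
}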
Related
I want to use some kind of mutex on files, so that no process will touch certain files before another process has stopped using them. How can I do it in .NET 3.5? Here are some details:
I have a service which periodically checks whether there are any files/directories in a certain folder and, if there are, does something with them.
My other process is responsible for moving files (and directories) into that folder, and everything works just fine.
But I'm worried because there can be a situation where my copying process copies the files into the folder and at the same time (in the same millisecond) my service checks whether there are files and starts doing something with them (but not with all of them, because it checked during the copying).
So my idea is to put some mutex in there (maybe one extra file can be used as a mutex?), so the service won't check anything until the copying is done.
How can I achieve something like that in possibly easy way?
Thanks for any help.
The canonical way to achieve this is the filename:
Process A copies the files to e.g. "somefile.ext.noprocess" (this copy is non-atomic)
Process B ignores all files with the ".noprocess" suffix
After Process A has finished copying, it renames the file to "somefile.ext"
Next time Process B checks, it sees the file and starts processing.
If you have more than one file that has to be processed together (or none at all), you need to adapt this scheme with an additional transaction file containing the file names of the transaction: only if this file exists and has the correct name may Process B read it and process the files listed in it.
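A sketch of that transaction-file variant (paths, the batch name and the ProcessFile helper are assumptions): the writer publishes the manifest with a rename as the last step, and the reader only acts once the manifest with the expected name exists.
using System.IO;
using System.Linq;

// Writer (Process A): copy the batch, then publish the transaction file as the last, atomic step.
string[] sources = { @"C:\outbox\a.dat", @"C:\outbox\b.dat" };
string target = @"\\share\drop";
foreach (var src in sources)
    File.Copy(src, Path.Combine(target, Path.GetFileName(src)));
string tmpManifest = Path.Combine(target, "batch-001.txn.tmp");
File.WriteAllLines(tmpManifest, sources.Select(Path.GetFileName).ToArray());
File.Move(tmpManifest, Path.Combine(target, "batch-001.txn"));

// Reader (Process B): only act when the transaction file exists, and only on the files listed in it.
string manifest = Path.Combine(target, "batch-001.txn");
if (File.Exists(manifest))
{
    foreach (var name in File.ReadAllLines(manifest))
        ProcessFile(Path.Combine(target, name));   // ProcessFile is a placeholder
}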
Your problem really is not of mutual exclusion, but of atomicity. Copying multiple files is not an atomic operation, and so it is possible to observe the files in a half-copied state which you'd like to prevent.
To solve your problem, you could hinge your entire operation on a single atomic file system operation, for example renaming (or moving) of a folder. That way no one can observe an intermediate state. You can do it as follows:
Copy the files to a folder outside the monitored folder, but on the same drive.
When the copying operation is complete, move the folder inside the monitored folder. To any outside process, all the files would appear at once, and it would have no chance to see only part of the files.
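A minimal sketch of this, with assumed paths; the staging folder must be on the same drive as the monitored folder so that Directory.Move is a rename rather than a copy:
using System.IO;

string stagingDir = @"D:\staging\batch-001";     // outside the monitored folder, same drive
string monitoredDir = @"D:\monitored";

Directory.CreateDirectory(stagingDir);
// ... copy all the files into stagingDir ...

// One atomic rename: the whole batch appears in the monitored folder at once.
Directory.Move(stagingDir, Path.Combine(monitoredDir, "batch-001"));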
I'm implementing a class in C# that is supposed to monitor a directory, process files as they are dropped, then delete (or move) each processed file as soon as processing is complete. Since there can be multiple threads running this code, the first one that picks up the file locks it exclusively, so no other thread will read the same file and no external process or user can access it in any way. I would like to keep the lock until the file is deleted/moved, so there's no risk of another thread/process/user accessing it.
So far, I tried 2 implementation options, but none of them works as I want.
Option 1
FileStream fs = file.Open(FileMode.Open, FileAccess.Read, FileShare.Delete);
//Read and process
File.Delete(file.FullName); //Or File.Move, based on a flag
fs.Close();
Option 2
FileStream fs = file.Open(FileMode.Open, FileAccess.Read, FileShare.None);
//Read and process
fs.Close();
File.Delete(file.FullName); //Or File.Move, based on a flag
The issue with Option 1 is that other processes can access the file (they can delete, move, rename) while it should be fully locked.
The issue with Option 2 is that the file is unlocked before being deleted, so other processes/threads can lock the file before the delete happens, so the delete will fail.
I was looking for some API that can perform the delete using the file handle to which I already have exclusive access.
Edit
The directory being monitored resides in a public share, so other users and processes have access to it.
The issue is not managing the locks within my own process. The issue I'm trying to solve is how to lock a file exclusively, then move/delete it without releasing the lock.
Two solutions come to mind.
The first and simplest is to have the thread rename the file to something that the other threads won't touch. Something like "filename.dat.<unique number>", where <unique number> is something thread-specific. Then the thread can party on the file all it wants.
If two threads get the file at the same time, only one of them will be able to rename it. You'll have to handle the IOException that occurs in the other threads, but that shouldn't be a problem.
The other way is to have a single thread monitoring the directory and placing file names into a BlockingCollection. Worker threads take items from that queue and process them. Because only one thread can get that particular item from the queue, there is no contention.
The BlockingCollection solution is a little bit (but only a little bit) more complicated to set up, but should perform better than a solution that has multiple threads monitoring the same directory.
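A minimal sketch of the single-monitor / multiple-worker layout (the directory path, worker count and polling interval are assumptions). Only the monitor touches the directory, so workers never contend for the same file:
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

string watchedDir = @"C:\drop";
int workerCount = 4;
var queue = new BlockingCollection<string>();
var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

var monitor = Task.Run(() =>
{
    while (true)
    {
        foreach (var file in Directory.EnumerateFiles(watchedDir))
            if (seen.Add(file))                 // only the monitor thread touches 'seen'
                queue.Add(file);
        Thread.Sleep(1000);                     // simple polling; a FileSystemWatcher also works
    }
});

var workers = Enumerable.Range(0, workerCount)
    .Select(_ => Task.Run(() =>
    {
        foreach (var file in queue.GetConsumingEnumerable())
        {
            // process, then delete/move the file; it was handed to exactly one worker
        }
    }))
    .ToArray();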
Edit
Your edited question changes the problem quite a bit. If you have a file in a publicly accessible directory, it's at risk of being viewed, modified, or deleted at any point between the time it's placed there and the time your thread locks it.
Since you can't move or delete a file while you have it open (not that I'm aware of), your best bet is to have the thread move the file to a directory that's not publicly accessible. Ideally to a directory that's locked down so that only the user under which your application runs has access. So your code becomes:
File.Move(sourceFilename, destFilename);
// the file is now in a presumably safe place.
// Assuming that all of your threads obey the rules,
// you have exclusive access by agreement.
Edit #2
Another possibility would be to open the file exclusively and copy it using your own copy loop, leaving the file open when the copy is done. Then you can rewind the file and do your processing. Something like:
var srcFile = File.Open(/* be sure to specify exclusive access */);
var destFile = File.OpenWrite(/* destination path */);
// copy the file
var buffer = new byte[32768];
int bytesRead = 0;
while ((bytesRead = srcFile.Read(buffer, 0, buffer.Length)) != 0)
{
    destFile.Write(buffer, 0, bytesRead);
}
// close destination
destFile.Close();
// rewind source
srcFile.Seek(0, SeekOrigin.Start);
// now read from source to do your processing.
// for example, to get a StreamReader, just pass the srcFile stream to the constructor.
You can process and then copy, sometimes. It depends on whether the stream stays open when you're finished processing. Typically, code does something like:
using (var strm = new StreamReader(srcStream, ...))
{
    // do stuff here
}
Disposing the reader there ends up closing the underlying srcStream as well. You'd have to write your code like this:
using (var srcStream = new FileStream( /* exclusive access */))
{
    var reader = new StreamReader(srcStream, ...);
    // process the stream, leaving the reader open
    // rewind srcStream
    // copy srcStream to destination
    // close reader
}
Doable, but clumsy.
Oh, and if you want to eliminate the possibility of somebody reading the file before you can delete it, just truncate the file to zero length before you close it. As in:
srcStream.Seek(0, SeekOrigin.Begin);
srcStream.SetLength(0);
That way if somebody does get to it before you get around to deleting it, there's nothing to modify, etc.
Here is the most robust way I know of; it will work correctly even if you have multiple processes on multiple servers working with these files.
Instead of locking the files themselves, create a temporary file for locking. This way you can unlock/move/delete the original file without problems, but can still be sure that no copy of your code running on any server/thread/process will try to work with the file at the same time.
Pseudo code:
try
{
    // get an exclusive cross-server/process/thread lock by opening/creating a temp file with no sharing allowed
    var lockFilePath = $"{file}.lck";
    var lockFile = File.Open(lockFilePath, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None);
    try
    {
        // open file itself with no sharing allowed, in case some process that does not use our locking schema is trying to use it
        var fileHandle = File.Open(file, FileMode.Open, FileAccess.Read, FileShare.None);
        // TODO: add processing -- we have exclusive access to the file, and also the locking file
        fileHandle.Close();
        // at this point it is possible for some other process that does not use our locking schema to lock the file before we
        // move it, causing us to process this file again -- we would always have to handle issues where we failed to move
        // the file anyway (maybe we just lost power, or crashed?) so we had to design around this no matter what
        File.Move(file, archiveDestination);
    }
    finally
    {
        lockFile.Close();
        try
        {
            File.Delete(lockFilePath);
        }
        catch (Exception ex)
        {
            // another process opened locked file after we closed it, before it was deleted -- safely ignore, other process will delete lock file
        }
    }
}
catch (Exception ex)
{
    // another process already has exclusive access to the lock file, we don't need to do anything
    // or we failed while processing, in which case we did not move the file so it will be tried again by this process or another
}
One nice thing about this pattern is that it can also be used when locking is not supported by the file storage itself. For example, if you were trying to process files on an FTP/SFTP server, you could put your temporary locking files on a normal drive (or SMB share), since the locking files do not have to be in the same location as the files themselves.
I can't take credit for the idea, it's been around longer than the PC, and used by plenty of apps like Microsoft Word, Excel, Access, and most older database systems. Read: well tested.
The file system itself is volatile in nature so it's very difficult to try and do what you want. This is a classic race condition in the file system. With option 2, you could alternatively move the file to a "processing" or staging directory that you create before doing your work. YMMV on performance but you could at least benchmark it to see if it could fit your needs.
You may need to implement some form of shared / synchronised List from the spawning thread. If the parent thread keeps track of files by periodically checking the directory, it can then hand them off to child threads and that'll eliminate the locking problem.
This solution, though not 100% watertight, may well get you what you need. (It did for us.)
Use two locks that together give you exclusive access to the file. When you are ready to delete the file, you release one of them and then delete the file. The remaining lock will still prevent most other processes from obtaining a lock.
FileInfo file = ...
// Get read access to the file and only allow other processes write or delete access.
// Keeps others from locking the file for reading.
var readStream = file.Open(FileMode.Open, FileAccess.Read, FileShare.Write | FileShare.Delete);
FileStream preventWriteAndDelete;
try
{
    // Now try to get a lock that only allows others to read the file. We can acquire both
    // locks because they each allow the other. Together, they give us exclusive access to the
    // file.
    preventWriteAndDelete = file.Open(FileMode.Open, FileAccess.Write, FileShare.Read);
}
catch
{
    // We couldn't get the second lock, so release the first.
    readStream.Dispose();
    throw;
}
Now you can read the file (with readStream). If you need to write to it, you'll have to do that with the other stream.
When you are ready to delete the file, you first release the lock that prevents writing and deletion while still holding the lock that prevents reading.
preventWriteAndDelete.Dispose(); // Release lock that prevents deletion.
file.Delete();
// This lock specifically allowed deletion, but with the file gone, we're done with it now.
readStream.Dispose();
The only opportunity for another process (or thread) to get a lock on the file is if it requests a shared write lock, one which gives it write-only access and also allows others to write to the file. This is not very common. Most processes attempt either a shared read lock (read access allowing others to read, but not write or delete) or an exclusive write lock (write or read/write access with no sharing). Both of these common scenarios will fail. A shared read/write lock (requesting read/write access and allowing others the same) will also fail.
In addition, the window of opportunity for a process to request and acquire a shared write lock is very small. If a process is hammering away trying to acquire such a lock, then it may succeed, but few applications do this. So unless you have such an application in your scenario, this strategy should meet your needs.
You can also use the same strategy to move the file.
preventWriteAndDelete.Dispose();
file.MoveTo(destination);
readStream.Dispose();
You could use the MoveFileEx API function to mark the file for deletion upon next reboot. Source
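A sketch of that via P/Invoke (the path is an assumption). Passing null as the new name together with MOVEFILE_DELAY_UNTIL_REBOOT marks the file for deletion at the next restart; this typically requires administrative rights:
using System.Runtime.InteropServices;

// Mark the file for deletion on the next reboot.
NativeMethods.MoveFileEx(@"C:\temp\stuck.dat", null, NativeMethods.MOVEFILE_DELAY_UNTIL_REBOOT);

static class NativeMethods
{
    public const int MOVEFILE_DELAY_UNTIL_REBOOT = 0x4;

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    public static extern bool MoveFileEx(string lpExistingFileName, string lpNewFileName, int dwFlags);
}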
I have a function that always creates a directory and put in it some files (images).
When the code runs the first time, no problem. The second time (always), it gets an error when I have to delete the directory (because I want to recreate it to put the images in it). The error is "The process cannot access the file '...' because it is being used by another process". The only process that accesses these files is this function.
It's like the function "doesn't leave" the files.
How can I resolve this with a clear solution?
Here a part of the code:
String strPath = Environment.CurrentDirectory.ToString() + "\\sessionPDF";
if (Directory.Exists(strPath))
    Directory.Delete(strPath, true); //Here I get the error
Directory.CreateDirectory(strPath);
//Then I put the files in the directory
If your code or another process is serving up the images, they will be locked for an indefinite amount of time. If it's IIS, they're locked for a short time while being served. I'm not sure about this, but if Explorer is creating thumbs for the images, it may lock the files while it does that. It may be for a split second, but if your code and that process collide, it's a race condition.
Be sure you release your locks when you're done. If the class implements IDisposable, wrap a using statement around it if you're not doing extensive work on that object:
using (var Bitmap = ... || var Stream = ... || var File = ...) { ... }
...which will close the object afterwards and the file will not be locked.
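For example (a hypothetical sketch, paths assumed): a Bitmap constructed from a file keeps that file locked until it is disposed, so wrapping it in using releases the handle before Directory.Delete runs.
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

string strPath = Environment.CurrentDirectory + "\\sessionPDF";
using (var image = new Bitmap(@"C:\input\photo.jpg"))      // the source file is locked while 'image' lives
{
    image.Save(Path.Combine(strPath, "photo.png"), ImageFormat.Png);
}   // the file handle is released here, so a later Directory.Delete can succeed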
Just going out on a limb here without seeing the code that dumps the files, but if you're using FileStreams or Bitmap objects, I would double check to ensure you are properly disposing of all of those objects before running the second method.
The only clear solution in this case is to keep track of who is holding access to the directory and fix the bug by releasing that access.
If the object/resource holding the access is third-party, or cannot be changed or reached by any means, it's time to revise the architecture and handle IO access in a different way.
Hope this helps.
Sounds like you are not releasing the file handle when the file is created. Try doing all of your IO within the using statement, that way the file will be released automatically when you are finished with it.
http://msdn.microsoft.com/en-us/library/yh598w02%28v=vs.80%29.aspx
I have seen cases where a virus scanner will scan the new file and prevent the file from being deleted, though that is highly unlikely.
Be sure to .Dispose of all IDisposable objects and make sure that nothing has changed your Environment.CurrentDirectory to the directory you want to delete.
I have a file containing, roughly speaking, the state of the application.
I want to implement the following behaviour:
When the application is started, lock the file so that no other applications (or user itself) will be able to modify it;
Read the previous application state from the file;
... do work ...
Update the file with a new state (which, given the format of the file, involves rewriting the entire file; the length of the file may decrease after the operation);
... do work ...
Update the file again
... do work ...
If the work fails (the application crashes), the lock is released, and the content of the file is left as it was after the previous unit of work.
It seems that, to rewrite the file, one should open it with the Truncate option; that means opening a new FileStream each time one wants to rewrite the file. So it seems the behavior I want can only be achieved in this rather dirty way:
When the application is started, read the file, then open the FileStream with the FileShare.Read;
When some work is done, close the handle opened previously, open another FileStream with the FileMode.Truncate and FileShare.Read, write the data and flush the FileStream.
When some work is done, close the handle opened previously, open another FileStream with the FileMode.Truncate and FileShare.Read, write the data and flush the FileStream.
On the Dispose, close the handle opened previously.
This approach has some disadvantages: extra FileStreams are opened; file integrity is not guaranteed between the FileStream close and the FileStream open; and the code is much more complicated.
Is there any other way, lacking these disadvantages?
Don't close and reopen the file. Instead, use FileStream.SetLength(0) to truncate the file to zero length when you want to rewrite it.
You might (or might not) also need to set FileStream.Position to zero. The documentation doesn't make it clear whether SetLength moves the file pointer or not.
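A minimal sketch of that, assuming stream is the FileStream opened (and locked) at application start and serializedState holds the new state to persist:
using System.Text;

byte[] newState = Encoding.UTF8.GetBytes(serializedState);

stream.Position = 0;      // rewind explicitly, in case SetLength does not move the file pointer
stream.SetLength(0);      // truncate in place; the handle, and therefore the lock, is never released
stream.Write(newState, 0, newState.Length);
stream.Flush();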
Why don't you take exclusive access to the file when the application starts, and create an in-memory cache of the file that can be shared across all threads in the process, while the actual file remains locked as far as the OS is concerned? You can use lock(memoryStream) to avoid concurrency issues. When you are done updating the local in-memory version of the file, just update the file on disk and release the lock on it.
Regards.
I am working on server software that periodically needs to save data to disk. I need to make sure that the old file is overwritten, and that the file cannot get corrupted (e.g. only partially overwritten) in case of unexpected circumstances.
I've adopted the following pattern:
string tempFileName = Path.GetTempFileName();
// ...write out the data to temporary file...
MoveOrReplaceFile(tempFileName, fileName);
...where MoveOrReplaceFile is:
public static void MoveOrReplaceFile(string source, string destination) {
    if (source == null) throw new ArgumentNullException("source");
    if (destination == null) throw new ArgumentNullException("destination");
    if (File.Exists(destination)) {
        // File.Replace does not work across volumes
        if (Path.GetPathRoot(Path.GetFullPath(source)) == Path.GetPathRoot(Path.GetFullPath(destination))) {
            File.Replace(source, destination, null, true);
        } else {
            File.Copy(source, destination, true);
        }
    } else {
        File.Move(source, destination);
    }
}
This works well as long as the server has exclusive access to files. However, File.Replace appears to be very sensitive to external access to files. Any time my software runs on a system with an antivirus or a real-time backup system, random File.Replace errors start popping up:
System.IO.IOException: Unable to remove the file to be replaced.
Here are some possible causes that I've eliminated:
Unreleased file handles: using() ensures that all file handles are released as soon as possible.
Threading issues: lock() guards all access to each file.
Different disk volumes: File.Replace() fails when used across disk volumes. My method checks this already, and falls back to File.Copy().
And here are some suggestions that I've come across, and why I'd rather not use them:
Volume Shadow Copy Service: This only works as long as the problematic third-party software (backup and antivirus monitors, etc) also use VSS. Using VSS requires tons of P/Invoke, and has platform-specific issues.
Locking files: In C#, locking a file requires maintaining a FileStream open. It would keep third-party software out, but 1) I still won't be able to replace the file using File.Replace, and 2) Like I mentioned above, I'd rather write to a temporary file first, to avoid accidental corruption.
I'd appreciate any input on either getting File.Replace to work every time or, more generally, saving/overwriting files on disk reliably.
You really want to use the 3rd parameter, the backup file name. That allows Windows to simply rename the original file without having to delete it. Deleting will fail if any other process has the file opened without delete sharing; renaming is never a problem. You can then delete the backup yourself after the Replace() call and ignore any error. Also delete it before the Replace() call so the rename won't fail and you'll clean up failed earlier attempts. So, roughly:
string backup = destination + ".bak";
File.Delete(backup);
File.Replace(source, destination, backup, true);
try {
    File.Delete(backup);
}
catch {
    // optional:
    filesToDeleteLater.Add(backup);
}
There are several possible approaches, here some of them:
Use a "lock" file - a temporary file that is created before the operation and indicates other writers (or readers) that the file is being modified and thus exclusively locked. After the operation complete - remove the lock file. This method assumes that the file-creation command is atomic.
Use NTFS transactional API (if appropriate).
Create a link to the file, write the changed file under a random name (for example Guid.NewGuid()), and then remap the link to the new file. All readers will access the file through the link (whose name is known).
Of course, all three approaches have their own drawbacks and advantages.
If the software is writing to an NTFS partition, try using Transactional NTFS. You can use AlphaFS for a .NET wrapper to the API. That is probably the most reliable way to write files and prevent corruption.