I am working on server software that periodically needs to save data to disk. I need to make sure that the old file is overwritten, and that the file cannot get corrupted (e.g. only partially overwritten) in case of unexpected circumstances.
I've adopted the following pattern:
string tempFileName = Path.GetTempFileName();
// ...write out the data to temporary file...
MoveOrReplaceFile(tempFileName, fileName);
...where MoveOrReplaceFile is:
public static void MoveOrReplaceFile( string source, string destination ) {
    if (source == null) throw new ArgumentNullException("source");
    if (destination == null) throw new ArgumentNullException("destination");
    if (File.Exists(destination)) {
        // File.Replace does not work across volumes
        if (Path.GetPathRoot(Path.GetFullPath(source)) == Path.GetPathRoot(Path.GetFullPath(destination))) {
            File.Replace(source, destination, null, true);
        } else {
            File.Copy(source, destination, true);
        }
    } else {
        File.Move(source, destination);
    }
}
This works well as long as the server has exclusive access to files. However, File.Replace appears to be very sensitive to external access to files. Any time my software runs on a system with an antivirus or a real-time backup system, random File.Replace errors start popping up:
System.IO.IOException: Unable to remove the file to be replaced.
Here are some possible causes that I've eliminated:
Unreleased file handles: using() ensures that all file handles are released as soon as possible.
Threading issues: lock() guards all access to each file.
Different disk volumes: File.Replace() fails when used across disk volumes. My method checks this already, and falls back to File.Copy().
And here are some suggestions that I've come across, and why I'd rather not use them:
Volume Shadow Copy Service: This only works as long as the problematic third-party software (backup and antivirus monitors, etc) also use VSS. Using VSS requires tons of P/Invoke, and has platform-specific issues.
Locking files: In C#, locking a file requires maintaining a FileStream open. It would keep third-party software out, but 1) I still won't be able to replace the file using File.Replace, and 2) Like I mentioned above, I'd rather write to a temporary file first, to avoid accidental corruption.
I'd appreciate any input on either getting File.Replace to work every time or, more generally, saving/overwriting files on disk reliably.
You really want to use the 3rd parameter, the backup file name. That allows Windows to simply rename the original file without having to delete it. Deleting will fail if any other process has the file opened without delete sharing; renaming is never a problem. You can then delete the backup yourself after the Replace() call and ignore any error. Also delete it before the Replace() call so the rename won't fail, and so that you clean up earlier failed attempts. So roughly:
string backup = destination + ".bak";
File.Delete(backup);
File.Replace(source, destination, backup, true);
try {
    File.Delete(backup);
}
catch {
    // optional:
    filesToDeleteLater.Add(backup);
}
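Folding that suggestion back into the original helper, a possible revision might look like the following. This is a sketch along the lines above, not a battle-tested implementation; the ".bak" naming is taken from the snippet above.
public static void MoveOrReplaceFile( string source, string destination ) {
    if (source == null) throw new ArgumentNullException("source");
    if (destination == null) throw new ArgumentNullException("destination");
    if (File.Exists(destination)) {
        // File.Replace does not work across volumes.
        if (Path.GetPathRoot(Path.GetFullPath(source)) == Path.GetPathRoot(Path.GetFullPath(destination))) {
            // Passing a backup name lets Windows rename the old file instead of deleting it.
            string backup = destination + ".bak";
            File.Delete(backup);                      // clean up any earlier failed attempt
            File.Replace(source, destination, backup, true);
            try { File.Delete(backup); }
            catch { /* optionally remember it and delete it later */ }
        } else {
            File.Copy(source, destination, true);
        }
    } else {
        File.Move(source, destination);
    }
}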
There are several possible approaches; here are some of them:
Use a "lock" file - a temporary file that is created before the operation and indicates to other writers (or readers) that the file is being modified and thus exclusively locked. After the operation completes, remove the lock file. This method assumes that the file-creation command is atomic (a minimal sketch of this one follows the list).
Use NTFS transactional API (if appropriate).
Create a link to the file, write the changed file under a random name (for example Guid.NewGuid()) - and then remap the link to the new file. All readers will access the file through the link (whose name is known).
Of course, all three approaches have their own advantages and drawbacks.
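For illustration, here is a minimal sketch of the lock-file approach. It assumes every writer cooperates and uses the same ".lock" naming convention; FileMode.CreateNew fails if the file already exists, which gives you the atomic creation the approach relies on.
string lockPath = fileName + ".lock";
FileStream lockStream = null;
try
{
    // FileMode.CreateNew throws if the lock file already exists,
    // so creating it acts as an atomic test-and-set.
    lockStream = new FileStream(lockPath, FileMode.CreateNew, FileAccess.Write, FileShare.None);
}
catch (IOException)
{
    // Another writer holds the lock: retry later or skip this round.
    return;
}
try
{
    // ...write out the data file here...
}
finally
{
    lockStream.Dispose();
    File.Delete(lockPath);
}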
If the software is writing to an NTFS partition then try using Transactional NTFS. You can use AlphaFS for a .NET wrapper to the API. That is probably the most reliable way to write files and prevent corruption.
Related
Problem:
I have a web API which exposes a method UploadFile, which uploads a file from a client to a specific directory on the server. The piece of code that handles the request and does the upload is the following:
var boundary = MultipartRequestHelper.GetBoundary(MediaTypeHeaderValue.Parse(Request.ContentType), _defaultFormOptions.MultipartBoundaryLengthLimit);
var reader = new MultipartReader(boundary, HttpContext.Request.Body);
try
{
    // Read the form data.
    var section = await reader.ReadNextSectionAsync();
    // This illustrates how to get the file names.
    while (section != null)
    {
        var hasContentDispositionHeader = ContentDispositionHeaderValue.TryParse(section.ContentDisposition, out ContentDispositionHeaderValue contentDisposition);
        if (hasContentDispositionHeader)
        {
            if (MultipartRequestHelper.HasFileContentDisposition(contentDisposition))
            {
                targetFilePath = Path.Combine(root, contentDisposition.FileName.ToString());
                using (var targetStream = System.IO.File.Create(targetFilePath))
                {
                    await section.Body.CopyToAsync(targetStream);
                    //_logger.LogInformation($"Copied the uploaded file '{targetFilePath}'");
                }
            }
        }
        // Read the next multipart section; the loop ends when there are none left.
        section = await reader.ReadNextSectionAsync();
    }
    // ... (the rest of the handler, including the catch block, was omitted in the question)
I always called this method using the following statement:
bool res = await importClient.UploadFileAsync(filePath);
where UploadFileAsync (which is on the client) builds the request in this way:
var requestContent = new MultipartFormDataContent();
var array = File.ReadAllBytes(filePath);
var fileContent = new ByteArrayContent(array);
fileContent.Headers.ContentType = MediaTypeHeaderValue.Parse("application/octet-stream");
requestContent.Add(fileContent, "file", Path.GetFileName(filePath));
As you can see, this method expects a file name/path to work, which means that the file must "exist" somewhere on the client machine. I've used this method without any problem until now, but I have a very specific case in which I need to upload something needed on the server that the user doesn't want to save on his client.
Possible solutions:
The first thing I thought of was to manually create a file on the client and delete it after the upload. However, I'm not very happy with this solution because I need to handle everything manually.
I can use the System.IO.Path.GetTempFileName() method, which will create a file in the temporary directory, but I'm not quite sure how the deletion of those files is handled.
I can use TempFileCollection, but it seems more or less a mix of the previous points. I could technically create this collection in a using statement to get rid of it when the upload is done.
I'm inexperienced in these topics, so I'm not sure which solution would best fit this scenario.
My requirements are that I need to be 100% sure that the file is deleted after the upload is done, and I would like the solution to be "async friendly", i.e. I need the whole process to keep going without problems.
EDIT: I see a little bit of confusion. My problem is not how to handle the files on the server. That part is not a problem. I need to handle "temporary" files on the client.
Once you write something to disk, you can't be 100% sure that you will be able to delete it. Moreover, even if you delete the file, you can't be sure that the file can't be recovered.
So you have to ask why you need to delete the file. If it contains some secret, keep it in memory. If you can't fit the file into memory, write it encrypted to disk and keep only the key in memory.
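For example, a rough sketch of the encrypt-to-disk idea could look like the following; the path, payload and Aes instance are placeholders you would supply, and the key and IV never leave memory.
// Sketch only: writes "payload" to "path" encrypted with the supplied Aes instance.
// Keep the Aes object (key + IV) in memory; without it the file on disk is unreadable.
static void WriteEncrypted(string path, byte[] payload, System.Security.Cryptography.Aes aes)
{
    using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None))
    using (var cs = new System.Security.Cryptography.CryptoStream(fs, aes.CreateEncryptor(), System.Security.Cryptography.CryptoStreamMode.Write))
    {
        cs.Write(payload, 0, payload.Length);
    }
}
// Usage: using (var aes = System.Security.Cryptography.Aes.Create()) { WriteEncrypted(tempPath, data, aes); }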
If you relax 100% to 99%, I would go for creating a file with Path.GetTempFileName and deleting it in a finally block.
If 99% is not enough but 99.98% is, I would store the names of the created temporary files in persistent storage and regularly check whether they have been deleted.
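A minimal sketch of the GetTempFileName-plus-finally option; importClient and payload stand in for the upload client and data from the question.
string tempFile = Path.GetTempFileName();       // creates an empty file in %TEMP%
try
{
    File.WriteAllBytes(tempFile, payload);      // materialise the data only for the upload
    bool res = await importClient.UploadFileAsync(tempFile);
}
finally
{
    File.Delete(tempFile);                      // best effort: see the 99% vs 99.98% note above
}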
For completeness, I'm writing the solution I used based on the suggestions I received here. Also, writing the file name as I did guarantees that, statistically, you won't have two temporary files with the same name:
string tempFile = null;
try
{
    // Path.Combine avoids manual separator handling; the GUID makes name collisions statistically negligible.
    tempFile = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString() + ".xml");
    using (FileStream fs = new FileStream(tempFile, FileMode.Create, FileAccess.Write, FileShare.None))
    {
        XmlSerializer serializer = new XmlSerializer(typeof(FileTemplate));
        serializer.Serialize(fs, w.Template);
    }
}
catch (Exception ex)
{
    logger.Error(ex.Message);
    //...
}
finally
{
    //.... do stuff
    if (tempFile != null)
        File.Delete(tempFile);   // delete by full path, not just the file name
}
You clearly shouldn't be using a file; in fact, you don't want your data to ever leave RAM. You need to use "secure" memory storage so that the data is "guaranteed" to be pinned to physical RAM, untouched by the garbage collector, "never" paged out to swap. I use the quotes because all those terms are somewhat misleading: the implementation isn't secure in an absolute sense, it's just more secure than writing stuff to a disk file. Absolute security is impossible.
There are no common mechanisms that guarantee deletion of anything: the machine could "die" at any point between the writing of the data to the file and whatever deletion operation you'd use to wipe the file "clean". Then you have no guarantee that e.g. the SSD or the hard drive won't duplicate the data should e.g. a sector become bad and need to be reallocated. When you talk about files, you're also relying on several underdocumented and complex (and often subtly buggy) layers of software:
The firmware in the storage device controller.
The device driver for the storage device.
The virtual memory system.
The filesystem driver.
The virtual filesystem layer (present in most OSes).
The .net runtime (and possibly the C runtime, depending on implementation).
By using a file you're making a bet that all those layers will do exactly what you want them to do. That won't usually be the case unless you tightly control all of these layers (e.g. you deploy a purpose-made Linux distribution that you audit, and you use your own flash storage firmware or a Linux memory technology driver that you'd audit too).
Instead, you can limit your exposure to just the VM system and the runtime. See e.g. this answer; it's easy to use:
using (var secret = new SecureArray<byte>(secretLength))
{
DoSomethingSecret(secret.Buffer);
}
SecureArray makes it likely that secret.Buffer stays in RAM - but you should audit that code as well, since, after all, you need it to do what it does, with your reputation possibly at stake, or legal liability, etc.
A simple test that can give you some peace of mind would involve a small test application that writes a short pseudorandom sequence to secret.Buffer, and then sleeps. Let this run in the background for a few days as you use your computer, then forcibly power it down (on a desktop: turn the on-off switch on the power supply to "off" position). Then boot up from a linux live CD, and run a search for some chunk of the pseudorandom sequence on the raw disk device. The expected outcome is that no identifiable part of the sequence has leaked to disk (say nothing larger than 48-64 bits). Even then you can't be totally sure, but this will thwart the majority of attempts at recovering the information...
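A rough version of that test harness, reusing the SecureArray shown above (its constructor signature is assumed from that snippet):
// Fill the secure buffer with a reproducible pseudorandom pattern, then keep the
// process alive so the data stays resident while the machine is in normal use.
using (var secret = new SecureArray<byte>(64))
{
    new Random(12345).NextBytes(secret.Buffer);   // fixed seed: regenerate the same bytes when searching the raw disk later
    System.Threading.Thread.Sleep(System.Threading.Timeout.Infinite);
}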
...until someone takes the customer's system, dumps liquid nitrogen on the RAM sticks, shuts down the power, then transfers the RAM to a readout device...
...or until they get malware on the system where the software runs, and it helpfully streams out RAM contents over internet, because why not.
...or until someone injects their certificate into the trust root on just one client machine, and MITM-s all the data elsewhere on the client's network.
And so on. It's all a tradeoff: how sure do you wish to be that the data doesn't leak? I suggest getting the exact requirements from the customer in writing, and they must agree that they understand that it's not possible to be completely sure.
I have a webservice that is writing files that are being read by a different program.
To keep the reader program from reading them before they're done writing, I'm writing them with a .tmp extension, then using File.Move to rename them to a .xml extension.
My problem is when we are running at volume - thousands of files in just a couple of minutes.
I've successfully written file "12345.tmp", but when I try to rename it, File.Move() throws an IOException:
File.Move("12345.tmp", "12345.xml")
Exception: The process cannot access the file because it is being used
by another process.
For my situation, I don't really care what the filenames are, so I retry:
File.Move("12345.tmp", "12346.xml")
Exception: Could not find file '12345.tmp'.
Is File.Move() deleting the source file, if it encounters an error in renaming the file?
Why?
Is there someway to ensure that the file either renames successfully or is left unchanged?
The answer is that it depends greatly on how the file system itself is implemented. Also, if the Move() is between two file systems (possibly even between two machines, if the paths are network shares), then it also depends greatly on the O/S implementation of Move(). Therefore, the guarantees depend less on what System.IO.File does, and more on the underlying mechanisms: the O/S code, file-system drivers, file-system structure, etc.
Generally, in the vast majority of cases Move() will behave the way you expect it to: either the file is moved or it remains as it was. This is because a Move within a single file system is an act of removing the file reference from one directory (an on-disk data structure), and adding it to another. If something bad happens, then the operation is rolled back: the removal from the source directory is undone by an opposite insert operation. Most modern file systems have a built-in journaling mechanism, which ensures that the move operation is either carried out completely or rolled back completely, even in case the machine loses power in the midst of the operation.
All that being said, it still depends, and not all file systems provide these guarantees. See this study
If you are running over Windows, and the file system is local (not a network share), then you can use the Transactional NTFS (TxF) feature of Windows to ensure the atomicity of your move operation.
I have one application that reads from a folder and waits for a file to appear in this folder. When this file appears, the application shall read the content, execute a few functions against external systems with the data from the file, and then delete the file (and in turn wait for the next file).
Now, I want to run this application on two different machines, both listening to the same folder. So it's the exact same application, but two instances. Let's call them instance A and instance B.
So when a new file appear, both A and B will find the file, and both will try to read it. This will lead to some sort of race condition between the two instances. I want that if A started read the file before B, B shall simply skip the file and let A process and delete it. Same thing if B finds the file first, A shall do nothing.
Now, how can I implement this? Setting a lock on the file is not sufficient, I guess, because let's say A starts reading the file: it is then locked by A, but A has to unlock it in order to delete it. During that time B might try to read the file. In that case the file is processed twice, which is not acceptable.
So to summarize: I have two instances of one program and one folder / network share. Whenever a file appears in the folder, I want EITHER instance A or instance B to process the file, NEVER both. Any ideas on how I can implement such functionality in C#?
The correct way to do this is to open the file with a write lock (e.g., System.IO.FileAccess.Write) and a read share (e.g., System.IO.FileShare.Read). If one of the processes tries to open the file when the other process already has it open, then the open command will throw an exception, which you need to catch and handle as you see fit (e.g., log and retry). By using a write lock for the file open, you guarantee that the opening and locking are atomic and therefore synchronised between the two processes, and there is no race condition.
So something like this:
try
{
    using (FileStream fileStream = new FileStream(FileName, FileMode.Open, FileAccess.Write, FileShare.Read))
    {
        // Read from or write to file.
    }
}
catch (IOException ex)
{
    // The file is locked by the other process.
    // Some options here:
    // Log exception.
    // Ignore exception and carry on.
    // Implement a retry mechanism to try opening the file again.
}
You can use FileShare.None if you do not want other processes to be able to access the file at all when your program has it open. I prefer FileShare.Read because it allows me to monitor what is happening in the file (e.g., open it in Notepad).
Catering for deleting the file follows a similar principle: first rename/move the file and catch the IOException that occurs if the other process has already renamed/moved it, then open the renamed/moved file. You rename/move the file to indicate that it is already being processed and should be ignored by the other process. E.g., rename it with a .pending file extension, or move it to a Pending directory.
try
{
    // This will throw an exception if the other process has already moved the file -
    // either FileName no longer exists, or it is locked.
    File.Move(FileName, PendingFileName);
    // If we get this far we know we have exclusive access to the pending file.
    using (FileStream fileStream = new FileStream(PendingFileName, FileMode.Open, FileAccess.Write, FileShare.Read))
    {
        // Read from or write to file.
    }
    File.Delete(PendingFileName);
}
catch (IOException ex)
{
    // The file is locked by the other process.
    // Some options here:
    // Log exception.
    // Ignore exception and carry on.
    // Implement a retry mechanism to try moving the file again.
}
As with opening files, File.Move is atomic and protected by locks, therefore it is guaranteed that if you have multiple concurrent threads/processes attempting to move the file, only one will succeed and the others will throw an exception. See here for a similar question: Atomicity of File.Move.
I can think of two quick solutions to this:
Distribute the load
Set up your two processes so that each only works on some of the files. How you partition could be based on the file name or on the date/time. E.g. process 1 reads files whose timestamp ends in an odd number, and process 2 reads the ones ending in an even number.
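A sketch of that kind of static partitioning based on the file name; instanceIndex and instanceCount are assumed to come from each instance's configuration.
// Deterministic partitioning: a character sum rather than string.GetHashCode,
// which is randomized per process and would differ between the two machines.
static bool IsMine(string fileName, int instanceIndex, int instanceCount)
{
    int sum = 0;
    foreach (char c in Path.GetFileName(fileName).ToUpperInvariant())
        sum = (sum + c) % instanceCount;
    return sum == instanceIndex;
}
Instance A would run with instanceIndex 0 and instance B with 1; each simply ignores files that are not its own.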
Database as lock
The other alternative is that you use some kind of database as a lock.
Process 1 reads a file and does an insert into a database table based on the file name (which must be unique). If the insert works, then it is responsible for the file and continues processing it; if the insert fails, then the other process has already inserted it, so that process is responsible and process 1 ignores the file.
The database has to be accessible to both processes, and this will incur some overhead. But might be a better option if you want to scale this out to more processes.
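A sketch of the insert-as-lock idea with SqlClient; the ProcessedFiles table and its unique key on FileName are assumptions, and any database with unique constraints works the same way.
// Returns true if this instance won the right to process the file.
static bool TryClaimFile(System.Data.SqlClient.SqlConnection conn, string fileName)
{
    using (var cmd = new System.Data.SqlClient.SqlCommand(
        "INSERT INTO ProcessedFiles (FileName) VALUES (@name)", conn))
    {
        cmd.Parameters.AddWithValue("@name", fileName);
        try
        {
            cmd.ExecuteNonQuery();
            return true;                                   // insert succeeded: we own the file
        }
        catch (System.Data.SqlClient.SqlException ex) when (ex.Number == 2627 || ex.Number == 2601)
        {
            return false;                                  // unique-key violation: the other instance got there first
        }
    }
}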
So if you are going to apply a lock, you can try to use the file name itself as the lock object. You can try to rename the file in a special way (like adding a dot in front of the file name),
and the first service that is lucky enough to rename the file will continue with it. The second (slower) one will get an exception that the file does not exist.
You also have to add a check to your file-processing logic so that a service will not try to "lock" a file that is "locked" already (i.e. has a name starting with a dot).
UPD: maybe it is better to include a special set of characters (as a marker) and some service identifier (machine name concatenated with PID),
because I'm not sure how the file rename will behave under concurrency.
So if you have file.txt in the shared folder:
first of all, you have to check whether the .lock marker is already in the file name;
if not, the service can try to rename it to file.txt.lockDevhost345 (where .lock is the special marker, Devhost is the name of the current computer, and 345 is the PID (process identifier));
then the service has to check whether the file file.txt.lockDevhost345 is available;
if yes, it was locked by the current service instance and can be used;
if no, it was "stolen" by the concurrent service, so it should not be processed.
If you do not have write permission, you can use another network share and try to create an additional lock-marker file there: for example, for file.txt the service can try to create (and hold a write lock on) a new file like file.txt.lock. The first service that creates the lock file takes care of the original file and removes the lock only when the original file has been processed.
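A sketch of the rename-as-lock idea described above; ProcessFile is a hypothetical placeholder for the actual work.
static void TryProcess(string filePath)
{
    // The marker embeds machine name and PID so each instance can recognise its own claim.
    string lockedName = filePath + ".lock" + Environment.MachineName + System.Diagnostics.Process.GetCurrentProcess().Id;
    try
    {
        File.Move(filePath, lockedName);    // only one instance can win this rename
    }
    catch (IOException)
    {
        return;                             // the file is gone or in use: the other instance got it first
    }
    ProcessFile(lockedName);                // we now own the renamed file
    File.Delete(lockedName);
}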
Instead of getting deep into file-access tweaking, I would suggest using a functionality-server approach. An additional argument for this approach is that the files are used from different computers, which otherwise takes you deep into access and permission administration.
My suggestion is to have a single point of file access (a file repository) that implements the following functionality:
Get files list. (gets a list of available files)
Check out file. (Grab exclusive access to the file so that only the owner of the checkout is authorized to modify it.)
Modify file. (update file content or delete it)
Check-in changes to the repository
There are a lot of ways to implement the approach (use the API of a file-versioning system; implement a service; use a database, ...).
An easy one (requires a database that supports transactions and triggers or stored procedures):
Get files list. (SQL SELECT from an "available files" table.)
Check out file. (SQL UPDATE or an update stored procedure; in the trigger or in the stored procedure, raise an error in case of multiple checkouts.)
Modify file. (Update the file content or delete it. Please keep in mind that it is still better to do this through the functionality "server"; that way you need to implement the security policy only once.)
Check in changes to the repository. (Release the "Checked Out" field of the particular file entry. Implement the check-in in a transaction.)
I have a function that always creates a directory and puts some files (images) in it.
When the code runs the first time, no problem. The second time (always), it gets an error when I have to delete the directory (because I want to recreate it to put the images in it). The error is "The process cannot access the file '...' because it is being used by another process". The only process that accesses these files is this function.
It's as if the function "doesn't let go of" the files.
How can I resolve this with a clear solution?
Here a part of the code:
String strPath = Environment.CurrentDirectory.ToString() + "\\sessionPDF";
if (Directory.Exists(strPath))
Directory.Delete(strPath, true); //Here I get the error
Directory.CreateDirectory(strPath);
//Then I put the files in the directory
If your code or another process is serving up the images, they will be locked for an indefinite amount of time. If it's IIS, they're locked for a short time while being served. I'm not sure about this, but if Explorer is creating thumbs for the images, it may lock the files while it does that. It may be for a split second, but if your code and that process collide, it's a race condition.
Be sure you release your locks when you're done. If the class implements IDisposable, wrap a using statement around it if you're not doing extensive work on that object:
using (var Bitmap = ... || var Stream = ... || var File = ...) { ... }
...which will close the object afterwards and the file will not be locked.
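For instance, a concrete version of that pattern; the file name here is just a placeholder for whatever your function writes into the directory.
string imagePath = Path.Combine(strPath, "image1.png");   // hypothetical file inside sessionPDF
using (var fs = new FileStream(imagePath, FileMode.Create, FileAccess.Write))
{
    // ...write the image bytes...
}   // the handle is released here, so a later Directory.Delete(strPath, true) won't find this file locked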
Just going out on a limb here without seeing the code that dumps the files, but if you're using FileStreams or Bitmap objects, I would double check to ensure you are properly disposing of all of those objects before running the second method.
The only clean solution in this case is to keep track of who is holding access to the directory and fix the bug by releasing that access.
If the object/resource holding that access is third-party, or cannot be changed or accessed by any means, it's time to revise the architecture and handle I/O access in a different way.
Hope this helps.
Sounds like you are not releasing the file handle when the file is created. Try doing all of your I/O within a using statement; that way the file will be released automatically when you are finished with it.
http://msdn.microsoft.com/en-us/library/yh598w02%28v=vs.80%29.aspx
I have seen cases where a virus scanner will scan the new file and prevent the file from being deleted, though that is highly unlikely.
Be sure to .Dispose of all IDisposable objects and make sure that nothing has changed your Environment.CurrentDirectory to the directory you want to delete.
This pertains to some simple file-copy code. My requirement is that only new files be copied from the source folder to the destination folder, so before I copy the file, I check that:
it exists in the source folder
it does not exist in the destination folder
After this I proceed with the copy operation.
However, I randomly get an IOException stating that "The file <filename> already exists."
Now, I have this code running (as part of a win service) on 2 servers so I'm willing to concede that maybe, just maybe, within that short interval where Server1 checked the conditions and proceeded to copy the file, Server2 copied it to destination, resulting in the IOException on Server1.
But I have several thousand files being copied, and I get this error thousands of times. How is this possible? What am I missing? Here's the code:
try
{
    if (File.Exists(String.Format("{0}\\{1}", pstrSourcePath, strFileName)) && !File.Exists(String.Format("{0}\\{1}", pstrDestPath, strFileName)))
        File.Copy(String.Format("{0}\\{1}", pstrSourcePath, strFileName), String.Format("{0}\\{1}", pstrDestPath, strFileName));
}
catch (IOException ioEx)
{
    txtDesc.Value = ioEx.Message;
}
I imagine it's a permissions issue. From the docs for File.Exists:
If the caller does not have sufficient permissions to read the specified file, no exception is thrown and the method returns false regardless of the existence of path.
Perhaps the file does exist, but your code doesn't have permission to check it?
Note that your code would be clearer if you used string.Format once for each file and saved the results to temporary variables. It would also be better to use Path.Combine instead of string.Format, like this:
string sourcePath = Path.Combine(pstrSourcePath, strFileName);
string targetPath = Path.Combine(pstrDestPath, strFileName);
if (File.Exists(sourcePath) && !File.Exists(targetPath))
{
    File.Copy(sourcePath, targetPath);
}
(I'd also ditch the str and pstr prefixes, but hey...)
The two server scenario is sufficient to explain the problem. Beware that they'll have a knack for automatically synchronizing to each other's copy operation. Whatever server is behind will quickly catch up because the file is already present in the target machine's file system cache.
You have to give up on the File.Exists test; it just cannot work reliably on a multi-tasking operating system. The race condition is unsolvable, which is the reason that neither Windows nor .NET has, for example, an IsFileLocked() method. Just call File.Copy(). You'll of course get an IOException if the file already exists. Filter the exceptions by their Win32 error code (e.g. via Marshal.GetLastWin32Error()); the ERROR_FILE_EXISTS error code is 80.
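A sketch of that filtering, reusing the sourcePath/targetPath variables from the earlier snippet. Here I read the Win32 code out of the exception's HResult rather than calling Marshal.GetLastWin32Error(), which is my own assumption about a convenient place to get it.
const int ERROR_FILE_EXISTS = 80;
try
{
    // No Exists() pre-check: just attempt the copy and let the race resolve itself.
    File.Copy(sourcePath, targetPath);
}
catch (IOException ex) when ((ex.HResult & 0xFFFF) == ERROR_FILE_EXISTS)
{
    // The other server won the race; the file is already there, so there is nothing to do.
    // Any other IOException propagates as a real error.
}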
The same thing is happening to me and I cannot figure it out. In my case, I am always writing to a new location, and yet sometimes I receive the same error. When I look at that location, there is a file with that name, zero bytes in size. I can guarantee that the file did not exist beforehand and that some other process is not also writing to that location. This is copying across to a network share; I'm not sure if that is significant, but I thought I would mention it. It is almost as if the File.Copy operation is writing the files, then erring because the file exists (but not always). I am logging my copy operation as I recurse the directory structure and see no duplicate copy operations that might overlap.