Concurrent File.Move of the same file - c#

It is clearly stated here that File.Move is an atomic operation: Atomicity of File.Move.
But the following code snippet shows the same file apparently being moved multiple times.
Does anyone know what is wrong with this code?
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

namespace FileMoveTest
{
    class Program
    {
        static void Main(string[] args)
        {
            string path = "test/" + Guid.NewGuid().ToString();
            CreateFile(path, new string('a', 10 * 1024 * 1024));

            var tasks = new List<Task>();
            for (int i = 0; i < 10; i++)
            {
                var task = Task.Factory.StartNew(() =>
                {
                    try
                    {
                        string newPath = path + "." + Guid.NewGuid();
                        File.Move(path, newPath);
                        // this line does NOT solve the issue
                        if (File.Exists(newPath))
                            Console.WriteLine(string.Format("Moved {0} -> {1}", path, newPath));
                    }
                    catch (Exception e)
                    {
                        Console.WriteLine(string.Format(" {0}: {1}", e.GetType(), e.Message));
                    }
                });
                tasks.Add(task);
            }
            Task.WaitAll(tasks.ToArray());
        }

        static void CreateFile(string path, string content)
        {
            string dir = Path.GetDirectoryName(path);
            if (!Directory.Exists(dir))
            {
                Directory.CreateDirectory(dir);
            }
            using (FileStream f = new FileStream(path, FileMode.OpenOrCreate))
            using (StreamWriter w = new StreamWriter(f))
            {
                w.Write(content);
            }
        }
    }
}
The paradoxical output is below. It seems the file was moved multiple times to different locations, yet only one of them is present on disk. Any thoughts?
Moved test/eb85560d-8c13-41c1-926a-6871be030742 -> test/eb85560d-8c13-41c1-926a-6871be030742.0018d317-ed7c-4732-92ac-3bb974d29017
Moved test/eb85560d-8c13-41c1-926a-6871be030742 -> test/eb85560d-8c13-41c1-926a-6871be030742.3965dc15-7ef9-4f36-bdb7-94a5939b17db
Moved test/eb85560d-8c13-41c1-926a-6871be030742 -> test/eb85560d-8c13-41c1-926a-6871be030742.fb66306a-5a13-4f26-ade2-acff3fb896be
Moved test/eb85560d-8c13-41c1-926a-6871be030742 -> test/eb85560d-8c13-41c1-926a-6871be030742.c6de8827-aa46-48c1-b036-ad4bf79eb8a9
System.IO.FileNotFoundException: Could not find file 'C:\file-move-test\test\eb85560d-8c13-41c1-926a-6871be030742'.
System.IO.FileNotFoundException: Could not find file 'C:\file-move-test\test\eb85560d-8c13-41c1-926a-6871be030742'.
System.IO.FileNotFoundException: Could not find file 'C:\file-move-test\test\eb85560d-8c13-41c1-926a-6871be030742'.
System.IO.FileNotFoundException: Could not find file 'C:\file-move-test\test\eb85560d-8c13-41c1-926a-6871be030742'.
System.IO.FileNotFoundException: Could not find file 'C:\file-move-test\test\eb85560d-8c13-41c1-926a-6871be030742'.
System.IO.FileNotFoundException: Could not find file 'C:\file-move-test\test\eb85560d-8c13-41c1-926a-6871be030742'.
The resulting file is here: eb85560d-8c13-41c1-926a-6871be030742.fb66306a-5a13-4f26-ade2-acff3fb896be
UPDATE. I can confirm that checking File.Exists also does NOT solve the issue - it can report that a single file was really moved to several different locations.
SOLUTION. The solution I ended up with is the following: prior to any operation on the source file, create a special "lock" file. If that succeeds, we can be sure that only this thread has exclusive access to the file and we are safe to do anything we want. Below is the right set of parameters to create such a "lock" file.
File.Open(lockPath, FileMode.CreateNew, FileAccess.Write);
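For illustration, a minimal sketch of that lock-file pattern (the TryMoveExclusively name and the ".lock" suffix are mine, not from the original post):

static bool TryMoveExclusively(string path, string newPath)
{
    string lockPath = path + ".lock"; // hypothetical naming convention
    FileStream lockFile;
    try
    {
        // FileMode.CreateNew fails atomically if the lock file already
        // exists, so only one thread or process can win this race.
        lockFile = File.Open(lockPath, FileMode.CreateNew, FileAccess.Write);
    }
    catch (IOException)
    {
        return false; // someone else already owns the lock
    }

    using (lockFile)
    {
        File.Move(path, newPath);
    }
    File.Delete(lockPath); // by now the source is gone, so late arrivals fail cleanly
    return true;
}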

Does anyone know what is wrong with this code?
I guess that depends on what you mean by "wrong".
The behavior you're seeing is not IMHO unexpected, at least if you're using NTFS (other file systems may or may not behave similarly).
The documentation for the underlying OS API (MoveFile() and MoveFileEx() functions) is not specific, but in general the APIs are thread-safe, in that they guarantee the file system will not be corrupted by concurrent operations (of course, your own data could be corrupted, but it will be done in a file-system-coherent way).
Most likely what is occurring is that as the move-file operation proceeds, it does so by first getting the actual file handle from the given directory link to it (in NTFS, all "file names" that you see are actually hard links to an underlying file object). Having obtained that file handle, the API then creates a new file name for the underlying file object (i.e. as a hard link), and then deletes the previous hard link.
Of course, as this progresses, there is a window during the time between a thread having obtained the underlying file handle but before the original hard link has been deleted. This allows some but not all of the other concurrent move operations to appear to succeed. I.e. eventually the original hard link doesn't exist and further attempts to move it won't succeed.
No doubt the above is an oversimplification. File system behaviors can be complex. In particular, your stated observation is that you only wind up with a single instance of the file when all is said and done. This suggests that the API does also somehow coordinate the various operations, such that only one of the newly-created hard links survives, probably by virtue of the API actually just renaming the associated hard link after retrieving the file object handle, as opposed to creating a new one and deleting the old one (implementation detail).
At the end of the day, what's "wrong" with the code is that it is intentionally attempting to perform concurrent operations on a single file. While the file system itself will ensure that it remains coherent, it's up to your own code to ensure that such operations are coordinated so that the results are predictable and reliable.
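For the single-process case in the snippet above, a minimal sketch of such coordination, using a plain lock and a flag (names are mine); cross-process callers would need a named Mutex or the lock file from the question's update:

static readonly object moveLock = new object();
static bool moved; // becomes true once the file has been moved

static void MoveOnce(string path, string newPath)
{
    lock (moveLock)
    {
        if (moved) return;        // another task already won
        File.Move(path, newPath); // only one task ever executes this
        moved = true;
    }
}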

Related

Files disappear after they fail to be moved

We have a process where people scan documents with photocopiers and drop them in a certain directory on our file server. We then have an hourly service within a .NET Core app that scans the directory, grabs the files and moves them, according to their file names, to a certain directory. Here come the problems.
The code looks something like that:
private string MoveFile(string file, string commNumber)
{
    var fileName = Path.GetFileName(file);
    var baseFileName = Path.GetFileNameWithoutExtension(fileName).Split("-v")[0];

    // 1. Check if the file already exists at destination
    var existingFileList = luxWebSamContext.Documents.Where(x => EF.Functions.Like(x.DocumentName, "%" + Path.GetFileNameWithoutExtension(baseFileName) + "%")).ToList();

    // If the file exists, check for the current version of file
    if (existingFileList.Count > 0)
    {
        var nextVersion = existingFileList.Max(x => x.UploadVersion) + 1;
        var extension = Path.GetExtension(fileName);
        fileName = baseFileName + "-v" + nextVersion.ToString() + extension;
    }

    var from = file;
    var to = Path.Combine(destinationPath, commNumber, fileName);

    try
    {
        log.Info($"------ Moving File! ------ {fileName}");
        Directory.CreateDirectory(Path.Combine(destinationPath, commNumber));
        File.Move(from, to, true);
        return to;
    }
    catch (Exception ex)
    {
        log.Error($"----- Couldn't MOVE FILE: {file} ----- commission number: {commNumber}", ex);
        return null;
    }
}
The interesting part is in the try-block, where the file move takes place. Sometimes we have the problem that the program throws the following exception:
2021-11-23 17:15:37,960 [60] ERROR App ----- Couldn't MOVE FILE:
\PATH\PATH\PATH\Filename_423489120.pdf ----- commission number:
05847894
System.IO.IOException: The process cannot access the file because it is being used by another process.
at System.IO.FileSystem.MoveFile(String sourceFullPath, String destFullPath, Boolean overwrite)
at System.IO.File.Move(String sourceFileName, String destFileName, Boolean overwrite)
So far so good. I would expect that if the file cannot be moved, it remains in the directory from which it was supposed to be moved. But that's not the case. We had this issue yesterday afternoon, and when I looked for the file afterwards, it was gone from the directory.
Is this the normal behaviour of the File.Move() method?
First to your question:
Is this the normal behaviour of the File.Move() method?
No, that's not the expected behaviour. The documentation says:
Moving the file across disk volumes is equivalent to copying the file
and deleting it from the source if the copying was successful.
If you try to move a file across disk volumes and that file is in use,
the file is copied to the destination, but it is not deleted from the
source.
Your exception says that another process is using the file at that very moment. So you should check whether other parts of your application may perform a delete, or whether someone (if this scenario is valid) is deleting files manually from the file system.
Typically, File.Move() only removes the source file once the destination file has been successfully transferred into place. So the answer to your question is no, it cannot be purely the File.Move(). The interesting part is: why is this file locked? Probably because some file stream is still open and blocking access to the file. Also, do you have multiple instances of the copy process service running? That may cause several services to try to access the file simultaneously, causing the exception you posted.
There must be a different cause making the files disappear because the File.Move() will certainly not remove the file when the copy process did not succeed.
For debugging purposes, you may try and open the file with a lock on it. This will fail when a different process locks the file providing you a little bit more information.
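A minimal sketch of such a debugging probe (the IsFileLocked name is mine): try to open the file exclusively and treat an IOException as "locked by someone else":

static bool IsFileLocked(string path)
{
    try
    {
        // FileShare.None requests exclusive access, so the open fails
        // if any other handle is currently open on the file.
        using (File.Open(path, FileMode.Open, FileAccess.Read, FileShare.None))
        {
            return false;
        }
    }
    catch (IOException)
    {
        return true; // another process (or a forgotten stream) holds the file
    }
}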

Best practice for writing big files

I need to write a big file in my project.
What I learned:
I should NOT write the big file directly to the destination path, because this may leave an incomplete file behind if the app crashes while writing it.
Instead, I should write to a temporary file and then move (rename) it. (This is called an atomic file operation.)
My code snippet:
[NotNull]
public static async Task WriteAllTextAsync([NotNull] string path, [NotNull] string content)
{
    string temporaryFilePath = null;
    try
    {
        temporaryFilePath = Path.GetTempFileName();
        using (var stream = new StreamWriter(temporaryFilePath, true))
        {
            await stream.WriteAsync(content).ConfigureAwait(false);
        }
        File.Delete(path);
        File.Move(temporaryFilePath, path);
    }
    finally
    {
        if (temporaryFilePath != null) File.Delete(temporaryFilePath);
    }
}
My Question:
The file will be missing if the app crashes between File.Delete and File.Move. Can I avoid this?
Is there any other best practice for writing big files?
Is there any suggestion on my code?
The file will be missing if the app crashes between File.Delete and File.Move. Can I avoid this?
Not that I'm aware of, but you can detect it - and if you use a more predictable filename, you can recover from that. It helps if you tweak the process somewhat to use three file names: the target, a "new" file and an "old" file. The process becomes:
Write to "new" file (e.g. foo.txt.new)
Rename the target file to the "old" file (e.g. foo.txt.old)
Rename the "new" file to the target file
Delete the "old" file
You then have three files, each of which may be present or absent. That can help you to detect the situation when you come to read the new file (a code sketch follows the list):
No files: Nothing's written data yet
Just target: All is well
Target and new: App crashed while writing new file
Target and old: App failed to delete old file
New and old: App failed after the first rename, but before the second
All three, or just old, or just new: Something very odd is going on! User may have interfered
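A minimal sketch of that scheme, assuming a crash-recovery pass (driven by the state table above) has already cleaned up any stray files; the helper name is mine:

public static void SafeWriteAllText(string path, string content)
{
    string newPath = path + ".new";
    string oldPath = path + ".old";

    File.WriteAllText(newPath, content); // 1. write the "new" file

    if (File.Exists(path))
        File.Move(path, oldPath);        // 2. rename target -> "old"

    File.Move(newPath, path);            // 3. rename "new" -> target

    if (File.Exists(oldPath))
        File.Delete(oldPath);            // 4. delete the "old" file
}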
Note: I was unaware of File.Replace before, but I suspect it's effectively just a simpler and possibly more efficient way of doing the code you're already doing. (That's great - use it!) The recovery process would still be the same though.
You can use File.Replace instead of deleting and moving files. In case of a hard fault (power cut or something like that) you can always lose data; you have to reckon with that.
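For reference, a hedged sketch of the File.Replace variant; the ".tmp"/".bak" names are illustrative, and File.Replace requires both files to be on the same volume:

string tmp = path + ".tmp";
File.WriteAllText(tmp, content);
if (File.Exists(path))
    File.Replace(tmp, path, path + ".bak"); // swap; old content kept as .bak
else
    File.Move(tmp, path); // first write: nothing to replace yet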

What would cause files being added to a zip file to not be included in the zip?

I work with a program that takes large amounts of data, turns the data into xml files, then takes those xml files and zips them for use in another program. Occasionally, during the zipping process, one or two xml files gets left out. It is fairly rare, once or twice a month, but when it does happen it's a big mess. I am looking for help figuring out why the files don't get zipped and how to prevent it. This code is straightforward:
public string AddToZip(string outfile, string toCompress)
{
    if (!File.Exists(toCompress)) throw new FileNotFoundException("Could not find the file to compress", toCompress);

    string dir = Path.GetDirectoryName(outfile);
    if (!Directory.Exists(dir))
    {
        Directory.CreateDirectory(dir);
    }

    // The program that gets this data can't handle files over
    // 20 MB, so it splits it up into two or more files if it hits the
    // limit.
    if (File.Exists(outfile))
    {
        FileInfo tooBig = new FileInfo(outfile);
        int converter = 1024;
        float fileSize = tooBig.Length / converter; // bytes to KB
        fileSize = fileSize / converter;            // KB to MB
        int limit = CommonTypes.Helpers.ConfigHelper.GetConfigEntryInt("zipLimit", "19");
        if (fileSize >= limit)
        {
            outfile = MakeNewName(outfile);
        }
    }

    using (ZipFile zf = new ZipFile(outfile))
    {
        zf.AddFile(toCompress, "");
        zf.Save();
    }
    return outfile;
}
Ultimately, what I want to do is have a check that sees whether any xml files weren't added to the zip after the zip file is created, but stopping the problem in its tracks would be best overall. Thanks for the help.
Make sure you have that code inside a try...catch statement, and make sure that you then do something with the exception. It would not be the first codebase with this type of exception handling:
try
{
    //...
}
catch { }
Given the code above, if any exception occurs in your process, you will never notice it.
It's hard to judge from this function alone; here's a list of things that can go wrong:
- The toCompress file can be gone by the time zf.AddFile is called (but after the Exists test). Test the return value or add exception handling to detect this.
- The zip outFile can be just below the size limit; adding a new file can make it go over the limit.
- AddToZip() may be called concurrently, which may cause adding to fail.
How is the removal of the toCompress file handled? I think adding function-scope locking to AddToZip() might also be a good idea; see the sketch below.
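A minimal sketch of that function-scope locking (the wrapper name is mine); it serializes all access to the zip within one process:

private static readonly object zipLock = new object();

public string AddToZipSafe(string outfile, string toCompress)
{
    lock (zipLock)
    {
        // Only one thread at a time opens, modifies and saves the zip,
        // so concurrent calls can no longer interleave and lose entries.
        return AddToZip(outfile, toCompress);
    }
}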
This could be a timing issue. You are checking to see if outfile is too big before trying to add the toCompress file. What you should be doing is:
Add toCompress to outfile
Check to see if adding the file made outfile too big
If outfile is now too big, remove toCompress, create new outfile, add toCompress to new outfile.
I suspect that you occasionally have an outfile that is just under the limit, but adding toCompress puts it over. Then the receiving program does not process outfile because it is too big.
I could be completely off base, but it is something to check; a sketch of that ordering follows.
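A rough sketch of the add-first-check-after ordering, assuming the DotNetZip API used in the question (ZipFile.Read and RemoveEntry); treat it as pseudocode to adapt rather than a drop-in fix:

using (ZipFile zf = new ZipFile(outfile))
{
    zf.AddFile(toCompress, "");
    zf.Save();
}

long limitBytes = 19L * 1024 * 1024; // mirror the configured limit
if (new FileInfo(outfile).Length >= limitBytes)
{
    // Too big with the new entry: take it back out...
    using (ZipFile zf = ZipFile.Read(outfile))
    {
        zf.RemoveEntry(Path.GetFileName(toCompress));
        zf.Save();
    }
    // ...and put it into a fresh archive instead.
    outfile = MakeNewName(outfile);
    using (ZipFile zf = new ZipFile(outfile))
    {
        zf.AddFile(toCompress, "");
        zf.Save();
    }
}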

C# - Waiting for a copy operation to complete

I have a program that runs as a Windows Service which is processing files in a specific folder.
Since it's a service, it constantly monitors a folder for new files that have been added. Part of the program's job is to perform comparisons of files in the target folder and flag non-matching files.
What I would like to do is detect a running copy operation and when it has completed, so that a file does not get prematurely flagged if its matching file has not been copied over to the target folder yet.
What I was thinking of doing was using the FileSystemWatcher to watch the target folder and see if a copy operation is occurring. If there is, I put my program's main thread to sleep until the copy operation has completed, then proceed to perform the operation on the folder like normal.
I just wanted to get some insight on this approach and see if it is valid. If anyone else has any other unique approaches to this problem, it would be greatly appreciated.
UPDATE:
I apologize for the confusion, when I say target directory, I mean the source folder containing all the files I want to process. A part of the function of my program is to copy the directory structure of the source directory to a destination directory and copy all valid files to that destination directory, preserving the directory structure of the original source directory, i.e. a user may copy folders containing files to the source directory. I want to prevent errors by ensuring that if a new set of folders containing more subfolders and files is copied to the source directory for processing, my program will not start operating on the target directory until the copy process has completed.
Yup, use a FileSystemWatcher but instead of watching for the created event, watch for the changed event. After every trigger, try to open the file. Something like this:
var watcher = new FileSystemWatcher(path, filter);
watcher.Changed += (sender, e) =>
{
    FileStream file = null;
    try
    {
        Thread.Sleep(100); // hack for timing issues
        file = File.Open(
            e.FullPath,
            FileMode.Open,
            FileAccess.Read,
            FileShare.Read
        );
    }
    catch (IOException)
    {
        // we couldn't open the file
        // this is probably because the copy operation is not done
        // just swallow the exception
        return;
    }
    // now we have a handle to the file
};
This is about the best that you can do, unfortunately. There is no clean way to know that the file is ready for you to use.
What you are looking for is a typical producer/consumer scenario. What you need to do is outlined in the 'Producer/consumer queue' section on this page. This will allow you to use multithreading (maybe spawn a BackgroundWorker) to copy files, so you don't block the main service thread from listening to system events, and you can perform more meaningful tasks there, like checking for new files and updating the queue. So on the main thread, check for new files; on background threads, perform the actual copying. From personal experience (having implemented this task), there is not too much performance gain from this approach unless you are running on a multi-CPU machine, but the process is very clean and smooth, and the code is logically separated nicely.
In short, what you have to do is have an object like the following:
public class File
{
    public string FullPath { get; internal set; }
    public bool CopyInProgress { get; set; } // property to make sure
    // .. other properties if desired
}
Then, following the tutorial posted above, take a lock on the File object and on the queue to update it and copy it. Using this approach, you avoid constantly monitoring for file copy completion.
The important point to realize here is that your service has only one instance of the File object per actual physical file - just make sure you (1) lock your queue when adding and removing, and (2) lock the actual File object when initializing an update.
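A minimal sketch of such a queue using BlockingCollection, a newer alternative to the hand-rolled queue in the linked tutorial (the watched path is made up):

using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

var queue = new BlockingCollection<string>();

// Producer: the watcher callback only enqueues paths, so the event
// thread is never blocked by the copy work.
var watcher = new FileSystemWatcher(@"C:\source") { EnableRaisingEvents = true };
watcher.Created += (s, e) => queue.Add(e.FullPath);

// Consumer: one background task drains the queue and does the copying.
var consumer = Task.Run(() =>
{
    foreach (string path in queue.GetConsumingEnumerable())
    {
        // copy/compare the file here; catch and log exceptions
        // so the loop keeps running
    }
});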
EDIT: Above, where I say "there is not too much performance gain from this approach unless", I am referring to doing this approach on a single thread. Compared to @Jason's suggestion, this approach should be noticeably faster, because @Jason's solution performs very expensive IO operations that will fail in most cases. I haven't tested this, but I'm quite sure, since my approach needs to open, stream, and close the file only once, whereas @Jason's approach implies repeated open attempts that will all fail except the last one.
One approach is to attempt to open the file and see if you get an error. The file will be locked if it is being copied. This will open the file in shared mode so it will conflict with an already open write lock on the file:
using (System.IO.File.Open("file", FileMode.Open, FileAccess.Read, FileShare.Read)) { }
Another is to check the file size. It would change over time if the file is being copied to.
It is also possible to get a list of all applications that have opened a certain file, but I don't know the API for that.
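A tiny sketch of that size-polling idea (the method name is mine): wait until the length stops changing between polls:

static void WaitForStableSize(string path, int pollMs = 500)
{
    long last = -1;
    while (true)
    {
        long current = new FileInfo(path).Length;
        if (current == last) return; // unchanged for one interval: assume done
        last = current;
        System.Threading.Thread.Sleep(pollMs);
    }
}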
I know this is an old question, but here's an answer I spun up after searching for an answer to just this problem. This had to be tweaked a lot to remove some of the proprietary-ness from what I was working on, so this may not compile directly, but it'll give you an idea. This is working great for me:
void BlockingFileCopySync(FileInfo original, FileInfo copyPath)
{
    FileSystemWatcher watcher = new FileSystemWatcher();
    watcher.NotifyFilter = NotifyFilters.LastWrite;
    watcher.Path = copyPath.Directory.FullName;
    watcher.Filter = "*" + copyPath.Extension;
    watcher.EnableRaisingEvents = true;

    bool fileReady = false;
    bool firsttime = true;
    DateTime previousLastWriteTime = new DateTime();

    // modify this as you think you need to...
    int waitTimeMs = 100;

    watcher.Changed += (sender, e) =>
    {
        // Get the time the file was modified
        // Check it again in 100 ms
        // When it has gone a while without modification, it's done.
        while (!fileReady)
        {
            // We need to initialize for the "first time",
            // ie. when the file was just created.
            // (Really, this could probably be initialized off the
            // time of the copy now that I'm thinking of it.)
            if (firsttime)
            {
                previousLastWriteTime = System.IO.File.GetLastWriteTime(copyPath.FullName);
                firsttime = false;
                System.Threading.Thread.Sleep(waitTimeMs);
                continue;
            }

            DateTime currentLastWriteTime = System.IO.File.GetLastWriteTime(copyPath.FullName);
            bool fileModified = (currentLastWriteTime != previousLastWriteTime);
            if (fileModified)
            {
                previousLastWriteTime = currentLastWriteTime;
                System.Threading.Thread.Sleep(waitTimeMs);
                continue;
            }
            else
            {
                fileReady = true;
                break;
            }
        }
    };

    System.IO.File.Copy(original.FullName, copyPath.FullName, true);

    // This guy here chills out until the filesystemwatcher
    // tells him the file isn't being written to anymore.
    while (!fileReady)
    {
        System.Threading.Thread.Sleep(waitTimeMs);
    }
}

How To Write To An Embedded Resource?

I keep getting the error "Stream was not writable" whenever I try to execute the following code. I understand that there's still a reference to the stream in memory, but I don't know how to solve the problem. The two blocks of code are called in sequential order. I think the second one might be a function call or two deeper in the call stack, but I don't think this should matter, since I have "using" statements in the first block that should clean up the streams automatically. I'm sure this is a common task in C#; I just have no idea how to do it...
string s = "";
using (Stream manifestResourceStream =
    Assembly.GetExecutingAssembly().GetManifestResourceStream("Datafile.txt"))
{
    using (StreamReader sr = new StreamReader(manifestResourceStream))
    {
        s = sr.ReadToEnd();
    }
}
...
string s2 = "some text";
using (Stream manifestResourceStream =
    Assembly.GetExecutingAssembly().GetManifestResourceStream("Datafile.txt"))
{
    using (StreamWriter sw = new StreamWriter(manifestResourceStream))
    {
        sw.Write(s2);
    }
}
Any help will be very much appreciated. Thanks!
Andrew
Embedded resources are compiled into your assembly, you can't edit them.
As stated above, embedded resources are read-only. My recommendation, should this be applicable (say, for example, your embedded resource is a database file, XML, CSV, etc.), would be to extract a blank resource to the same location as the program, and read from / write to the extracted resource.
Example Pseudo Code:
if (!Exists(new PhysicalResource())) // Check to see if a physical resource exists.
{
    PhysicalResource.Create(); // Extract embedded resource to disk.
}

PhysicalResource pr = new PhysicalResource(); // Create physical resource instance.
pr.Read();  // Read from physical resource.
pr.Write(); // Write to physical resource.
Hope this helps.
Additional:
Your embedded resource may be entirely blank, or contain data structures and/or default values.
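A concrete sketch of that extract-then-use approach (not the original poster's code; "MyApp.Datafile.txt" is a hypothetical full resource name, which normally includes the project's default namespace):

using System;
using System.IO;
using System.Reflection;

// Extract the embedded template to disk on first run, then work on the copy.
string physicalPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Datafile.txt");
if (!File.Exists(physicalPath))
{
    using (Stream res = Assembly.GetExecutingAssembly()
        .GetManifestResourceStream("MyApp.Datafile.txt")) // hypothetical full resource name
    using (FileStream target = File.Create(physicalPath))
    {
        res.CopyTo(target);
    }
}

// Read/write the physical copy instead of the read-only embedded resource.
string text = File.ReadAllText(physicalPath);
File.WriteAllText(physicalPath, text + Environment.NewLine + "appended line");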
A bit late, but for descendants =)
About embedded .txt files:
Yes, at runtime you can't edit an embedded resource, because it's embedded. You could play a bit with a disassembler, but only with external assemblies that you load into the current context.
There is a hack if you want to write some actual information into a resource before the program starts, without keeping the data in a separate file.
I used to work a bit with WinCE and the Compact .NET Framework, where you can't store strings at runtime with ResourceManager. I needed some dynamic information in order to catch a DllNotFoundException before it was actually thrown on start.
So I made an embedded txt file, which I fill in the pre-build event, like this:
cd $(ProjectDir)
dir ..\bin\Debug /a-d /b > assemblylist.txt
Here I get the list of files in the debug folder.
And the reading:
using (var f = new StreamReader(Assembly.GetExecutingAssembly().GetManifestResourceStream("Market_invent.assemblylist.txt")))
{
    str = f.ReadToEnd();
}
So you can do all your preparation in the pre-build event and run some exes there.
Enjoy! It's very useful for storing some important information, and it helps avoid redundant actions.
