I have an application that stores data in an XML file every 500 ms using the XElement object's .Save("path") method.
The problem is: when a sudden shutdown occurs, the content of the file is deleted, so on the next run of the application the file cannot be used.
How can I prevent that / make sure the data will not be lost?
P.S.: I'm using C# (Visual Studio 2010) under Windows 7.
I've made an experiment: instead of writing to the same data.xml file, I created a new file each time (by copying from the original file), and when the power went off while copying from data.xml, it corrupted all of the previously created files as well?!
Let's assume your file is data.xml. Instead of writing to data.xml all the time, write to a temporary file data.xml.tmp, and when finished, rename it to data.xml. But renaming will not work if you already have a data.xml file, so you will need to delete it first and then rename the temporary file.
That way, data.xml will contain the last safe data. If you have a sudden shutdown, the incomplete file will be the temporary data.xml.tmp. If your program tries to read the file later on and there is no data.xml file, that means the shutdown happened between the delete and rename operations, so you will have to read the temporary file instead. We know it is safe because otherwise there would be a data.xml file.
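For illustration, here is a minimal sketch of that pattern, assuming the document is held in an XElement as in the question; the method name and the ".tmp" suffix are placeholders. On .NET you could also let File.Replace perform the swap in a single call.

using System.IO;
using System.Xml.Linq;

// Minimal sketch of the write-temp-then-swap pattern described above.
// "doc" stands in for whatever XElement the application is saving.
static void SafeSave(XElement doc, string path)
{
    string tempPath = path + ".tmp";

    doc.Save(tempPath);          // write the complete new version first

    if (File.Exists(path))
        File.Delete(path);       // remove the previous safe copy

    File.Move(tempPath, path);   // publish the new version under the real name
}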
You can use a 2-phase commit:
Write the new XML to a file with a different name
Delete the old file
Rename the new file to the old name
This way, there will always be at least one complete file.
If you restart, and the standard name doesn't exist, check for the different name.
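A small sketch of that startup check, assuming the two names are data.xml and data.xml.new (both placeholders):

using System.IO;

// Recovery sketch for the scheme above: prefer the standard name, and fall back
// to the differently named file if the shutdown hit between delete and rename.
static string ResolveDataFile(string standardPath, string newPath)
{
    if (File.Exists(standardPath))
        return standardPath;          // normal case: the last complete file

    if (File.Exists(newPath))
    {
        // The new file was written completely before the old one was deleted,
        // so it is safe to promote it to the standard name.
        File.Move(newPath, standardPath);
        return standardPath;
    }

    return null;                      // first run, or nothing to recover
}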
This one could be a life saver, but it takes a little more effort. There should be a separate process which does the following:
Takes a backup into its stash automatically whenever the file gets updated.
Internally maintains two versions in a linked list.
If the file gets updated, the latest version becomes HEAD via linkedList.AddFirst() and the oldest version, pointed to by TAIL, is removed by linkedList.RemoveLast().
And of course, during startup it should scan the stash and load the latest version available in it.
In the hard-shutdown scenario, when the system starts up the next time, this process should check whether the file is valid or corrupted. If corrupted, it restores the latest version from HEAD and subscribes for FileChanged notifications using a simple FileSystemWatcher.
This approach is well tested.
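As a rough illustration only (the class name, the copy-based backup, the stash depth of two and the missing validity check are my assumptions, not the tested implementation), the stash process could look something like this:

using System;
using System.Collections.Generic;
using System.IO;

// Illustrative sketch of the stash process described above.
class FileStash
{
    private readonly LinkedList<string> versions = new LinkedList<string>();
    private readonly string stashDir;
    private FileSystemWatcher watcher;

    public FileStash(string stashDir)
    {
        this.stashDir = stashDir;
        Directory.CreateDirectory(stashDir);
    }

    public void Watch(string file)
    {
        watcher = new FileSystemWatcher(Path.GetDirectoryName(file), Path.GetFileName(file));
        watcher.Changed += (s, e) => Backup(file);
        watcher.EnableRaisingEvents = true;
    }

    private void Backup(string file)
    {
        string copy = Path.Combine(stashDir, DateTime.UtcNow.Ticks + ".bak");
        File.Copy(file, copy);
        versions.AddFirst(copy);               // latest version becomes HEAD

        if (versions.Count > 2)                // keep only two versions
        {
            File.Delete(versions.Last.Value);  // drop the version at TAIL
            versions.RemoveLast();
        }
    }

    public void RestoreLatest(string file)
    {
        if (versions.First != null)            // HEAD holds the newest good copy
            File.Copy(versions.First.Value, file, true);
    }
}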
Problems seen
What if the hard shutdown happens while updating HEAD?
-- Well, there is another version in the stash right next to HEAD.
What if the hard shutdown happens while updating HEAD when the stash is empty? -- We know that the file was valid while HEAD was being updated. The process shall simply try copying it again at the next startup, since it is not corrupted.
What if the stash is empty and the file has been corrupted? -- This is the death pit and no solution is available for it. But this scenario occurs only when you deploy this recovery process after the file corruption has already happened.
Related
Step 1 - I am copying a file manually by reading from the source and writing to the target file in chunks. I keep the file handle open until the whole copy is over; the handle is safely closed once the copy is done.
Step 2 - After the copy is over, I set the timestamp, attributes, ACL and many more things.
Sometimes in step 2 I get the issue that the file is being used by some other process. This arises mostly for exe files. I learned which process was using the file from File locked by other process. As per that answer, the OS locks the file for a very short time to set the icon or some other information on the file.
But if I go on to perform step 2 without any delay after finishing step 1, I get an access error. How can I ensure that the OS will not lock the file?
Looping to check for file access is not a solution, because the locking may happen at any point during step 2. Step 2 is not atomic; I need to open and close the same file multiple times there.
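To make the two steps concrete, here is a rough sketch of what they could look like; the buffer size and the particular metadata calls are my assumptions about the question, not the original code.

using System.IO;

// Rough sketch of the two steps described above.
static void CopyWithMetadata(string source, string target)
{
    // Step 1: chunked copy, keeping the target handle open until the copy is over.
    using (FileStream src = File.OpenRead(source))
    using (FileStream dst = File.Create(target))
    {
        byte[] buffer = new byte[64 * 1024];
        int read;
        while ((read = src.Read(buffer, 0, buffer.Length)) > 0)
            dst.Write(buffer, 0, read);
    }   // handle safely closed here

    // Step 2: set timestamps and attributes; this is where "file in use"
    // errors can appear if the OS briefly opens the freshly written file.
    File.SetCreationTimeUtc(target, File.GetCreationTimeUtc(source));
    File.SetLastWriteTimeUtc(target, File.GetLastWriteTimeUtc(source));
    File.SetAttributes(target, File.GetAttributes(source));
}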
I have a project that uses the .net FileSystemWatcher to watch a Samba network share for video files. When it sees a file, it adds it to an encode queue. When files are dequeued, they are moved to a local directory where the process then encodes the file to several different formats and spits them out to an output directory.
The problem arises because the video files are so big, that it often takes several minutes for them to copy completely into the network directory, so when a file is dequeued, it may or may not have completely finished being copied to the network share. When the file is being copied from a windows machine, I am able to work around it because trying to move a file that is still being copied throws an IOException. I simply catch the exception and retry every few seconds until it is done copying.
When a file is dropped into the Samba share from a computer running OS X however, that IOException is not thrown. Instead, a partial file is copied to the working directory which then fails to encode because it is not a valid video file.
So my question is: is there any way to make the FileSystemWatcher wait for files to be completely written before firing its "Created" event (based on this question, I think the answer is "no")? Alternatively, is there a way to get files copied from OS X to behave like those copied from Windows? Or do I need to find another solution for watching the Samba share? Thanks for any help.
Option 3. Your best bet is to have a process that watches the incoming share for files. When it sees a file, note its size and/or modification date.
Then, after some amount of time (like, 1 or 2 seconds), look again. Note any files that were seen before and compare their new sizes/mod dates to the one you saw last time.
Any file that has not changed for some "sufficiently long" period of time (1s? 5s?) is considered "done".
Once you have a "done" file, MOVE/rename that file to another directory. It is from THIS directory that your loading process can run. It "knows" that only files that are complete are in this directory.
By having this two-stage process, you can later add other rules for acceptance of a file, since all of those rules must pass before the file gets moved to its proper staging area (you can check format, check size, etc.), beyond the simple rule of mere file existence.
Your later process can rely on file existence, both as a start mechanism and a restart mechanism. When the process restarts after a failure or shutdown, it can assume that any files in the second staging area are either new or incomplete and take appropriate action based on its own internal state. When the processing is done, it can choose to either delete the file or move it to a "finished" area for archiving or whatnot.
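A minimal sketch of that polling check; the 5-second quiet period and the directory names are arbitrary placeholders.

using System;
using System.Collections.Generic;
using System.IO;

// Watches the incoming share and moves files whose size has stopped changing.
class IncomingWatcher
{
    private readonly Dictionary<string, long> lastSize = new Dictionary<string, long>();
    private readonly Dictionary<string, DateTime> unchangedSince = new Dictionary<string, DateTime>();

    // Call this every second or two, e.g. from a timer.
    public void Poll(string incomingDir, string stagingDir)
    {
        foreach (string file in Directory.GetFiles(incomingDir))
        {
            long size = new FileInfo(file).Length;

            long previous;
            if (!lastSize.TryGetValue(file, out previous) || previous != size)
            {
                // New file, or still growing: remember the size and reset the clock.
                lastSize[file] = size;
                unchangedSince[file] = DateTime.UtcNow;
            }
            else if (DateTime.UtcNow - unchangedSince[file] > TimeSpan.FromSeconds(5))
            {
                // Unchanged for "sufficiently long": treat it as done and move it.
                File.Move(file, Path.Combine(stagingDir, Path.GetFileName(file)));
                lastSize.Remove(file);
                unchangedSince.Remove(file);
            }
        }
    }
}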
I have a program that overwrites a certain set of files required for my website. However, the traffic on my website has increased so much that I now get a "file in use" error, which results in the program being unable to update the file.
This program runs every 5 minutes to update the specified files.
The reason I let this program handle the writing of the file, and not the website itself, is that I also need to upload the file to a different webserver (through FTP). This way I also ensure that the file gets updated every 5 minutes, instead of only when a user takes a look at the page.
My question therefore is: can I tell IIS 7.5 to cache the file (for, say, 5 seconds to 1 minute) after it has been updated? This should ensure that the next time the program runs to update the file, it won't encounter any problems.
The simplest solution would be to change the program that refreshes the file so that it stores the new information in a database, not in the filesystem.
But if you can't use a database, I would take a different approach: store the file contents in System.Web.Caching.Cache together with the time it was last modified, then check whether the file has changed. If not, use the cached version; if it has changed, store the new contents and time in the same cache variable.
Of course you will have to check whether you can read the file, and only then refresh the cache contents; if you cannot read the file, you simply return the last version from the cache.
The initial reading of the file would have to happen in Application_Start to ensure that the cache has been initialized, and there you will have to wait until you can read the file to store it in the cache for the first time.
The best way to check that you can read from the file is to catch the exception, because a lock can happen after your check; see this post: How to check for file lock?
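A rough sketch of that idea for an ASP.NET app; the cache keys, the ReadAllText call and the fallback behaviour are my assumptions, not a drop-in implementation (the first successful read would still belong in Application_Start, as described above).

using System;
using System.IO;
using System.Web;
using System.Web.Caching;

public static class CachedFile
{
    private const string Key = "myFileContents";
    private const string TimeKey = "myFileLastWrite";

    public static string GetContents(string path)
    {
        Cache cache = HttpRuntime.Cache;
        DateTime lastWrite = File.GetLastWriteTimeUtc(path);
        object cachedTime = cache[TimeKey];

        if (cachedTime == null || (DateTime)cachedTime < lastWrite)
        {
            try
            {
                // File changed (or first read): try to refresh the cache.
                string contents = File.ReadAllText(path);
                cache[Key] = contents;
                cache[TimeKey] = lastWrite;
                return contents;
            }
            catch (IOException)
            {
                // File is locked by the updater right now: fall back to the cache.
            }
        }

        return (string)cache[Key];
    }
}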
I'm making a little app in C#/.NET that watches for the creation of a file, and when it is created it gets its content, parses it and writes it to another file.
Everything is working fine so far. But the problem is: there's another process that watches for this file as well. My process is only READING the file, while the second one reads it and then DELETES it.
My application does its job, but when it reads the file, the other process can't read it and totally crashes (it's not made by me and I don't have the sources to fix it).
My application runs very fast and only opens the file for a very short time to get the content into a variable, so it can close the file sooner, and then it parses the content of the file from the variable.
I clearly don't know how, but I'd like to be able to read the file and let the other process read it at the same time without any hiccups. Is it possible? I still think there will be a problem with the fact that the file is being deleted after the other app is done parsing it...
Any suggestions or ideas?
Thanks very much!
You can open the file as follows to ensure you don't lock it from other processes:
using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    // do your stuff
}
But if the other process is trying to open it in exclusive mode, it won't help and it will still crash. There's no way to deal with that other than fixing the code for the other process.
KISS: can you have the file created in a location which the first program isn't looking at, but your software is - and when you are done processing it, you then move it to the current location where the first program is looking?
Otherwise:
You are going to have contention since it's going to be a race to see which process actually "notices" the file first and begins working.
I'm assuming you also don't have any control over the process creating the file?
In that case you might look at PsSuspend or PauseSp - if you can control the other process by suspending it until you are ready for it (done with the file) then that might be viable. Not sure how robust this would be.
There's also still the potential race condition of "noticing" the file and performing an action (whatever it is) - keeping the other process paused perpetually until you want it to run (or killing it and starting it) is the only completely deterministic way to achieve what you want within the constraints.
If you are using an NTFS drive (which is very likely), then you can create a hard-link to the file. Essentially, this duplicates the file without actually creating a duplicate. You can read the file with the hard-link. The other process can delete the file, which will only remove their link to the file. This will leave the file in place for you to read. When your program is done reading the file, it can delete the hard-link, and the file system will see that both links have been deleted, and it will delete the file itself.
This can be done from the command line with
fsutil hardlink create <NewFileName> <ExistingFileName>
Or you can P/Invoke the CreateHardLink function in the Windows API.
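For reference, a minimal P/Invoke sketch; the wrapper class and method names are just placeholders.

using System;
using System.Runtime.InteropServices;

static class HardLink
{
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern bool CreateHardLink(
        string lpFileName,            // the new link to create
        string lpExistingFileName,    // the existing file
        IntPtr lpSecurityAttributes); // reserved, must be IntPtr.Zero

    public static void Create(string newLink, string existingFile)
    {
        if (!CreateHardLink(newLink, existingFile, IntPtr.Zero))
            throw new System.ComponentModel.Win32Exception(Marshal.GetLastWin32Error());
    }
}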
Can you create another empty, zero-byte file with the same name but the extension ".reading"? Then, once the first process is done reading the file, rename .reading to .done, and the second process can check for .done files and delete the original file, since both the .done file and the original file have the same name but different extensions.
@Prashant's response gave me the inspiration for this, and it's very similar, but I believe it will solve your problem.
If the other process must match a certain filename pattern:
Rename the file to something that won't match first, a very cheap/fast operation
Rename it back when finished
If it matches every file in a given folder:
Move it to another folder (also a very cheap operation in most filesystems)
Move it back when finished.
If the other process had already locked your file (even for read) then your process would fail, and you can make that graceful. If not you should be safe.
There is still a race condition possibility, of course, but this should be much safer than what you are doing.
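A tiny sketch of the rename variant; the ".processing" suffix is just an illustration of a name the other process would not match.

using System.IO;

static void Process(string file)
{
    string hidden = file + ".processing";

    File.Move(file, hidden);      // the other process no longer sees a matching name
    try
    {
        // read and parse the file here
    }
    finally
    {
        File.Move(hidden, file);  // give it back when finished
    }
}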
A Winforms program needs to save some run-time information to an XML file. The file can sometimes be a couple of hundred kilobytes in size. During beta testing we found that some users would not hesitate to terminate processes seemingly at random, occasionally causing the file to be half written and therefore corrupted.
As such, we changed the algorithm to save to a temp file and then to delete the real file and do a move.
Our code currently looks like this:
private void Save()
{
    XmlTextWriter streamWriter = null;
    try
    {
        streamWriter = new XmlTextWriter(xmlTempFilePath, System.Text.Encoding.UTF8);
        XmlSerializer xmlSerializer = new XmlSerializer(typeof(MyCollection));
        xmlSerializer.Serialize(streamWriter, myCollection);
        if (streamWriter != null)
            streamWriter.Close();

        // Delete the original file
        System.IO.File.Delete(xmlFilePath);

        // Do a move over the top of the original file
        System.IO.File.Move(xmlTempFilePath, xmlFilePath);
    }
    catch (System.Exception ex)
    {
        throw new InvalidOperationException("Could not save the xml file.", ex);
    }
    finally
    {
        if (streamWriter != null)
            streamWriter.Close();
    }
}
This works in the lab and in production almost all of the time. The program is running on 12 computers and this code is called on average once every 5 min. About once or twice a day we get this exception:
System.InvalidOperationException:
Could not save the xml file.
---> System.IO.IOException: Cannot create a file when that file already exists.
at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
at System.IO.__Error.WinIOError()
at System.IO.File.Move(String sourceFileName, String destFileName)
at MyApp.MyNamespace.InternalSave()
It is as if the Delete is not actually issued to the hard drive before the Move is issued.
This is happening on Win7 machines.
A couple of questions: Is there some concept of a Flush() one can do for the entire disk operating system? Is this a bug with my code, .net, the OS or something else? Should I be putting in some Thread.Sleep(x)? Maybe I should do a File.Copy(src, dest, true)? Should I write the following code? (But it looks pretty silly.)
while (System.IO.File.Exists(xmlFilePath))
{
    System.IO.File.Delete(xmlFilePath);
}

// Do a move over the top of the main file
bool done = false;
while (!done)
{
    try
    {
        System.IO.File.Move(xmlTempFilePath, xmlFilePath);
        done = true;
    }
    catch (System.IO.IOException)
    {
        // let it loop
    }
}
Has anyone seen this before?
You can never assume that deleting a file actually removes it right away on a multi-user, multi-tasking operating system. Apart from another app or the user herself having an interest in the file, you've also got services running that are interested in files. A virus scanner and a search indexer are classic troublemakers.
Such programs open a file and try to minimize the impact that has by specifying delete share access. That's available in .NET as well, it is the FileShare.Delete option. With that option in place, Windows allows a process to delete the file, even though it is opened. It gets internally marked as "delete pending". The file does not actually get removed from the file system, it is still there after the File.Delete call. Anybody that tries to open the file after that gets an access denied error. The file doesn't actually get removed until the last handle to the file object gets closed.
You can probably see where this is heading: this explains why File.Delete succeeded but File.Move failed. What you need to do is File.Move the file first so it has a different name, then rename the new file, then delete the original. The very first thing you do is delete a possible stray copy with the renamed name; it might have been left behind by a power failure.
Summarizing:
Create file.new
Delete file.tmp
Rename file.xml to file.tmp
Rename file.new to file.xml
Delete file.tmp
Failure of step 5 is not critical.
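Translated into a sketch (the .new and .tmp names mirror the step list; this is an outline of the sequence above, not a hardened implementation):

using System.IO;

static void SafeReplace(string xmlPath, string newPath, string tmpPath)
{
    // Step 1 happened earlier: the new contents were already written to newPath (file.new).

    if (File.Exists(tmpPath))
        File.Delete(tmpPath);            // step 2: remove a stray copy from a previous failure

    if (File.Exists(xmlPath))
        File.Move(xmlPath, tmpPath);     // step 3: the old file gets out of the way, but survives

    File.Move(newPath, xmlPath);         // step 4: publish the new file under the real name

    try
    {
        File.Delete(tmpPath);            // step 5: best effort; failure here is not critical
    }
    catch (IOException)
    {
        // a scanner/indexer may still hold the old copy; it gets cleaned up next time
    }
}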
How about using File.Copy with overwrite set to true, so that it overwrites your app state, and then you can delete your temp state?
You could also attach to the App_Exit event and try to perform a clean shutdown.
If multiple threads in this application can call Save, or if multiple instances of the program are attempting to update the same file (e.g. on a network share), you can get a race condition: the file doesn't exist when both threads/processes attempt the delete, then one succeeds in performing its Move (or the Move is in progress) just as the second attempts to use the same filename, and that second Move will fail.
As anvarbek raupov says, you can use File.Copy(String, String, Boolean) to allow overwriting to occur (so no longer need the delete), but this means that the last updater wins - you need to consider if this is what you want (especially in multi-threaded scenarios, where the last updater may, if you're unlucky, have been working with older state).
Safer would be to force each thread/process to use separate files, or implement some form of file system based locking (e.g. create another file to announce "I'm working on this file", update the main file, then delete this lock file).
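A very small sketch of the lock-file idea mentioned above; the ".lock" name is an assumption, and a real implementation would also need a stale-lock/timeout policy.

using System.IO;

static bool TryUpdateWithLockFile(string xmlPath, string newContents)
{
    string lockPath = xmlPath + ".lock";
    FileStream lockStream;
    try
    {
        // Creating the lock file atomically announces "I'm working on this file".
        lockStream = new FileStream(lockPath, FileMode.CreateNew);
    }
    catch (IOException)
    {
        return false;                    // someone else holds the lock; try again later
    }

    try
    {
        File.WriteAllText(xmlPath, newContents);   // update the main file
        return true;
    }
    finally
    {
        lockStream.Dispose();
        File.Delete(lockPath);           // release the lock
    }
}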
As Hans said, a workaround is to move the file first and THEN delete it.
That said, with Transactional NTFS's delete, I haven't been able to reproduce the error described. Check out github.com/haf/Castle.Transactions and the corresponding NuGet packages... 2.5 is well tested and documented for file transactions.
When testing with non-transacted file systems, this project's unit-test code always does moves before deletes.
3.0 is currently in a pre-alpha state, but will integrate regular transactions and transactional file I/O at a much higher level.