File containing null values instead of JSON object after computer restart - C#

I am working on a C# project that has several configuration files. Each file contains a JSON object. Throughout the lifecycle of the program, these files can be read or written at various times.
The program manages an industrial machine which, for various reasons, can be switched off at any moment. Switching off the machine instantly powers off the computer on which my program is running. The computer runs Windows 10 Pro x64 with an NTFS-formatted SSD.
When the machine is turned back on, and thus my program restarts, it throws an exception while reading a configuration file, saying the file does not contain any JSON object. When I open the file with Notepad, the file really is "empty".
For example, instead of having a JSON Object:
{
    "key": "value"
}
I have the following content:
NULNULNULNULNULNULNULNUL etc.
The file properties show the same file size whether it contains a JSON object or is "empty"; the same goes for the size on disk.
I have other configuration files that are read and written as plain text; they are not affected.
The issue does not occur on every power cycle and does not affect every configuration file. It mostly affects the same file, but not always.
I've checked that the configuration files are correctly closed whenever I read or write them:
Read file:
JObject jsondata = JObject.Parse(File.ReadAllText(Path));
Write file:
File.WriteAllText(Path, jsondata.ToString());
Both methods (ReadAllText and WriteAllText) are documented to open the file, read or write it, and close it.
These calls are wrapped in try/catch blocks, and I have never had an issue with a malformed JSON structure or a null object. If I'm correct, even an empty JSON object would still write at least the braces {} to the file.
I've tried to programmatically back up my configuration files into another folder. The backups are made without reading the files (using the File.Copy() method):
Every 10 minutes, update the backup files with the latest configuration files.
If a configuration file is "empty" (all of its bytes equal 0), replace it with the corresponding backup file.
// Check if any file has been modified since the last check
string[] filesToBackup = Directory.GetFiles(_FolderToBackup);
for (int file = 0; file < filesToBackup.Length; ++file)
{
    // Get file to check
    string FilePath = filesToBackup[file];
    string FileName = Path.GetFileName(FilePath);

    // Check if a backup file with the same name exists in the backup folder
    if (BackupFileExist(FileName))
    {
        // File path to the backup file
        string BackupFilePath = Path.Combine(_BackupFolder, FileName);

        // If the backup file is empty
        if (isFileEmpty(BackupFilePath))
        {
            Log.Write("File " + BackupFilePath + " is empty");
            // Copy the file to the backup folder; no need to check whether the file
            // to back up is empty, because the destination is already empty!
            File.Copy(FilePath, BackupFilePath, true);
        }

        // If the file to back up is empty
        if (isFileEmpty(FilePath))
        {
            Log.Write("File " + FilePath + " is empty");
            // Copy the backup file back to the folder being backed up
            File.Copy(BackupFilePath, FilePath, true);
        }

        // If no file is empty, only update files modified since the last check
        if (new FileInfo(FilePath).LastWriteTime > new FileInfo(BackupFilePath).LastWriteTime)
        {
            File.Copy(FilePath, BackupFilePath, true);
        }
    }
    // If the backup file does not exist
    else
    {
        string BackupFilePath = Path.Combine(_BackupFolder, FileName);
        File.Copy(FilePath, BackupFilePath);
    }
}
This workaround works well whenever a configuration file turns up "empty".
However, sometimes when I power-cycle the machine, both the configuration file and its backup file are empty.
I also once got an empty configuration file on machine restart even though the power-off happened while my code wasn't running.
At this point, I don't know whether my issue is related to the power off/on or to the way I read/write my files:
Why does it happen when the computer is shut down / turned on?
Why does it affect only my JSON configuration files?
Why does it empty the files rather than corrupt them?
Why does it happen even if the file is not open in my program?
Thank you very much for your time.

Looking at the source for File.WriteAllText(), it seems that your data could be the victim of buffering (the buffer size appears to be 1 KB). If you want to guarantee immediate writing to disk, you'll need your own method:
using (Stream stream = File.Create(yourPath, 64 * 1024, FileOptions.WriteThrough))
using (TextWriter textWriter = new StreamWriter(stream))
{
    textWriter.Write(jsonData);
}
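A possible follow-up (my addition, not part of the original answer): even with WriteThrough, the StreamWriter still buffers in managed memory until it is flushed or disposed, and FileStream.Flush(true) additionally asks the OS to flush its intermediate buffers. A minimal sketch of a save helper combining the two, assuming the Newtonsoft.Json JObject from the question; the method name SaveJsonDurable is made up:
// Sketch of a durable save helper; not the original poster's code.
// Requires: using System.IO; using Newtonsoft.Json.Linq;
public static void SaveJsonDurable(string path, JObject jsonData)
{
    using (var stream = new FileStream(path, FileMode.Create, FileAccess.Write,
                                       FileShare.None, 64 * 1024, FileOptions.WriteThrough))
    using (var writer = new StreamWriter(stream))
    {
        writer.Write(jsonData.ToString());
        writer.Flush();     // push the StreamWriter's buffer into the FileStream
        stream.Flush(true); // also flush intermediate OS buffers
    }
}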

Non-authoritative answer, but googling "non-atomic writes windows" I stumbled across a really interesting article suggesting that what you're experiencing is reasonably normal, even on NTFS: https://blogs.msdn.microsoft.com/adioltean/2005/12/28/how-to-do-atomic-writes-in-a-file/
If I've understood correctly, what it recommends for your use case is:
Do your writes (your JSON config file write) to a temporary file
(if power fails here, you've just lost this round of changes, the original file is fine)
"Flush the writes" (not sure what the right way to do that is, in your environment, but this question explores exactly that: How to ensure all data has been physically written to disk? ), or do the write with FileOptions.WriteThrough as outlined by #JesseC.Slicer
(if power fails here, you've just lost this round of changes, the original file is fine)
Rename the original file to an "I know I'm doing something dangerous" naming scheme, e.g. with a specific suffix
(if power fails here, you don't have a main config file, you've lost this round of changes, but you can still find the backup)
Rename the temporary file to the final/original name
(if power fails here, you have a main updated config file AND a redundant outdated "temporarily renamed" file)
Delete the temporarily renamed file
All this of course assumes you're able to ensure the temp file is fully written before you start renaming things (a rough sketch of the whole sequence follows below). If you've managed that, then at startup your process would be something like:
If a "temporarily renamed" file is found, either delete it (if there is also a "main file") or rename it to the main file name
Load the main file (which should never be corrupted)
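Here is a rough, non-authoritative sketch of that sequence in C#, assuming a single writer; the ".tmp"/".old" suffixes and the method names are made up for illustration:
// Sketch of the write-temp-then-rename sequence described above (illustrative names).
public static void AtomicSave(string path, string contents)
{
    string tempPath = path + ".tmp";
    string oldPath = path + ".old";

    // 1. Write the new contents to a temporary file and flush them to disk.
    using (var stream = new FileStream(tempPath, FileMode.Create, FileAccess.Write,
                                       FileShare.None, 64 * 1024, FileOptions.WriteThrough))
    using (var writer = new StreamWriter(stream))
    {
        writer.Write(contents);
        writer.Flush();
        stream.Flush(true);
    }

    // 2. Move the original out of the way, then move the temp file into place.
    if (File.Exists(oldPath))
        File.Delete(oldPath);
    if (File.Exists(path))
        File.Move(path, oldPath);
    File.Move(tempPath, path);

    // 3. The renamed original is now redundant.
    File.Delete(oldPath);
}

// At startup: recover if a crash happened between the two renames.
public static void RecoverAtStartup(string path)
{
    string tempPath = path + ".tmp";
    string oldPath = path + ".old";

    if (!File.Exists(path) && File.Exists(oldPath))
        File.Move(oldPath, path);   // the main file was renamed away but never replaced

    if (File.Exists(oldPath)) File.Delete(oldPath);   // leftover backup
    if (File.Exists(tempPath)) File.Delete(tempPath); // leftover half-written temp
}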

Related

Create and open as read-only a temporary copy of an existing file and delete after use

I've scoured for information, but I fear I may be getting in over my head here, as I am not proficient in multi-threading. I have a desktop app that needs to create a read-only, temporary copy of an existing file, open the file in its default application, and then delete the file once the user is done viewing it.
It must open read-only, as the user may try to save it thinking it's the original file.
To do this I create a new thread which copies the file to a temp path, sets the file's attributes, attaches a Process handler to it, and then "waits" and deletes the file on exit. The advantage is that the thread keeps running even after the program has exited (so it seems, anyway). This way the file will still be deleted even if the user keeps it open longer than the program.
Here is my code. The att object holds my file information.
new Thread(() =>
{
    // Create the temp file name
    string temp = System.IO.Path.GetTempPath() + att.FileNameWithExtension;

    // Determine if this file already exists (in case it didn't get deleted).
    // This is important, as leaving the read-only attribute in place would cause
    // User Access Control (UAC) issues when overwriting.
    if (File.Exists(temp)) { File.SetAttributes(temp, FileAttributes.Temporary); }

    // Copy the original file to the temp location. Overwrite if it already exists
    // due to a previous deletion failure.
    File.Copy(att.FullFileName, temp, true);

    // Set temp file attributes
    File.SetAttributes(temp, FileAttributes.Temporary | FileAttributes.ReadOnly);

    // Start the process and monitor it
    var p = Process.Start(temp); // Open attachment in its default program
    if (p != null) { p.WaitForExit(); }

    // After the process ends, remove the read-only attribute to allow deletion
    // without causing UAC issues.
    File.SetAttributes(temp, FileAttributes.Temporary);
    File.Delete(temp);
}
).Start();
I've tested it and so far it seems to be doing the job, but it all feels messy. I honestly feel like there should be an easier way to handle this that doesn't involve creating new threads. I've looked into copying the files into memory first, but I can't figure out how to open them in their default application from a MemoryStream.
So my questions are:
Is there a better way to open a read-only, temporary copy of a file that doesn't write to disk first?
If not, what implications could I face from taking the multithreaded approach?
Any info is appreciated.
Instead of removing the temporary file(s) on shutdown, remove the leftover files at startup.
This is often easier to implement than trying to ensure that cleanup code runs at process termination, and it handles the 'forced' cases like power failure, 'kill -9', 'End process', etc.
I like to create a 'temp' folder for such files: all of my apps scan and delete any files in that folder at startup, and the code can be added to any new project without change.
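For example, a startup sweep over such a folder might look like this (the folder name is just an example, not from the answer):
// Sketch: delete anything left over in the app's own temp folder at startup.
string tempDir = Path.Combine(Path.GetTempPath(), "MyAppTempCopies"); // illustrative name
Directory.CreateDirectory(tempDir);

foreach (string leftover in Directory.GetFiles(tempDir))
{
    try
    {
        // Clear read-only first so the delete doesn't fail on files marked earlier.
        File.SetAttributes(leftover, FileAttributes.Normal);
        File.Delete(leftover);
    }
    catch (IOException)
    {
        // Probably still open in a viewer from a previous session; try again next startup.
    }
}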

Best practice for using Directory.GetFiles() or EnumerateFiles with a target directory that contains locked files?

I'm currently trying to improve the design of two Windows services (C#).
Service A produces data exports (CSV files) and writes them to a temporary directory.
This temporary directory is a subdirectory of the main output directory.
The file is then moved (via File.Move) to the output directory after a successful write.
This export may be performed by multiple threads.
Another service, B, tries to fetch the files from this output directory at a defined interval.
How can I ensure that Directory.GetFiles() excludes locked files?
Should I check every file by creating a new FileStream (using (Stream stream = new FileStream("MyFilename.txt", FileMode.Open))), as described here?
Or should the producer service (A) use temporary file names (*.csv.tmp) that are automatically excluded by the consumer service (B) with an appropriate search pattern, and rename each file once the move has finished?
Are there better ways to handle such file listing operations?
Don't bother checking!
Huh? How can that be?
If the files are on the same drive, a Move operation is atomic! The operation is effectively a rename: it erases the directory entry from the previous directory and inserts it into the new one, pointing at the same sectors (or whatever) where the data really is, without rewriting it. The file system's internal locking mechanism has to lock and block directory reads during this process to prevent a directory scan from returning corrupt results.
That means that by the time the file ever shows up in the target directory, it won't be locked; in fact, it won't have been opened or modified since the close operation that wrote it into the previous directory.
Caveats: (1) this definitely won't work between drives, partitions, or other media mounted as a subdirectory; the OS does a copy + delete behind the scenes instead of a directory-entry edit. (2) This behaviour is a convention, not a rule. Though I've never seen it, file systems are free to break it, and even to break it inconsistently!
So this will probably work. If it doesn't, I'd recommend your own idea of temp extensions. I've done exactly that before (between a client and a server that could only talk via a shared drive); it's not that hard and it worked flawlessly.
If your own idea is too low-tech, and you're on the same machine (it sounds like you are), you can create a named mutex (google that) with the filename embedded, which lives while the file is being written, in the writer process; then do a blocking test on it when you open each file from the reading process. If you want the second process to respond ASAP, combine this with a FileSystemWatcher. Then pat yourself on the back for spending ten times the effort of the temp-filename idea, with no extra gain >:-}
Good luck!
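If you do go the temp-extension route, a minimal sketch might look like this (outputDir and the file names are illustrative, not from the question); the consumer's "*.csv" pattern won't match the "*.csv.tmp" files still being written:
// Writer (service A): write to a .csv.tmp file, then rename it once complete.
// Because the rename happens on the same drive, the consumer never sees a half-written .csv.
string finalPath = Path.Combine(outputDir, "export_001.csv"); // illustrative name
string tempPath = finalPath + ".tmp";

using (var writer = new StreamWriter(tempPath))
{
    writer.WriteLine("col1,col2");
    // ... write the rest of the export ...
}
File.Move(tempPath, finalPath);

// Consumer (service B): the search pattern skips the in-progress *.csv.tmp files.
foreach (string file in Directory.EnumerateFiles(outputDir, "*.csv"))
{
    // process the finished file
}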
One way would be to mark the files as temporary from the writing app whilst they're in use, and only clear that flag once they have been written and closed, e.g.:
FileStream f = File.Create (filename);
FileAttributes attr = File.GetAttributes (filename);
File.SetAttributes (filename, attr | FileAttributes.Temporary);
//write to file.
f.Close ();
File.SetAttributes (filename, attr);
From the consuming app, you just want to skip any temporary files.
foreach (var file in Directory.GetFiles (Path.GetDirectoryName (filename))) {
    if ((File.GetAttributes (file) & FileAttributes.Temporary) != 0) continue;
    // do normal stuff.
}

System.IO.File.Delete() / System.IO.File.Move() sometimes does not work

A WinForms program needs to save some run-time information to an XML file. The file can sometimes be a couple of hundred kilobytes in size. During beta testing we found that some users would not hesitate to terminate processes seemingly at random, occasionally causing the file to be half-written and therefore corrupted.
So we changed the algorithm to save to a temp file and then delete the real file and do a move.
Our code currently looks like this..
private void Save()
{
    XmlTextWriter streamWriter = null;
    try
    {
        streamWriter = new XmlTextWriter(xmlTempFilePath, System.Text.Encoding.UTF8);
        XmlSerializer xmlSerializer = new XmlSerializer(typeof(MyCollection));
        xmlSerializer.Serialize(streamWriter, myCollection);
        if (streamWriter != null)
            streamWriter.Close();

        // Delete the original file
        System.IO.File.Delete(xmlFilePath);

        // Do a move over the top of the original file
        System.IO.File.Move(xmlTempFilePath, xmlFilePath);
    }
    catch (System.Exception ex)
    {
        throw new InvalidOperationException("Could not save the xml file.", ex);
    }
    finally
    {
        if (streamWriter != null)
            streamWriter.Close();
    }
}
This works in the lab and in production almost all of the time. The program runs on 12 computers and this code is called, on average, once every 5 minutes. About once or twice a day we get this exception:
System.InvalidOperationException:
Could not save the xml file.
---> System.IO.IOException: Cannot create a file when that file already exists.
at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
at System.IO.__Error.WinIOError()
at System.IO.File.Move(String sourceFileName, String destFileName)
at MyApp.MyNamespace.InternalSave()
It is as if the Delete is not actually applied to the disk before the Move is issued.
This is happening on Windows 7 machines.
A couple of questions: Is there some concept of a Flush() one can do for the entire disk / operating system? Is this a bug in my code, in .NET, in the OS, or something else? Should I be putting in a Thread.Sleep(x)? Maybe I should do a File.Copy(src, dest, true)? Should I write the following code? (It looks pretty silly.)
while (System.IO.File.Exists(xmlFilePath))
{
    System.IO.File.Delete(xmlFilePath);
}

// Do a move over the top of the main file
bool done = false;
while (!done)
{
    try
    {
        System.IO.File.Move(xmlTempFilePath, xmlFilePath);
        done = true;
    }
    catch (System.IO.IOException)
    {
        // let it loop
    }
}
Has anyone seen this before?
You can never assume that you can delete a file and have it actually disappear on a multi-user, multi-tasking operating system. Apart from another app or the user herself having an interest in the file, you've also got services running that are interested in files. A virus scanner and a search indexer are classic troublemakers.
Such programs open a file and try to minimize the impact by specifying delete share access. That's available in .NET as well; it is the FileShare.Delete option. With that option in place, Windows allows a process to delete the file even though it is open. The file gets internally marked as "delete pending". It does not actually get removed from the file system and is still there after the File.Delete call. Anybody that tries to open the file after that gets an access-denied error. The file doesn't actually get removed until the last handle to the file object is closed.
You can probably see where this is heading: this explains why File.Delete succeeded but File.Move failed. What you need to do is File.Move the file first so it has a different name, then rename the new file, then delete the original. The very first thing you should do is delete a possible stray copy with the renamed name; it might have been left behind by a power failure.
Summarizing (a rough sketch in code follows the list):
Create file.new
Delete file.tmp
Rename file.xml to file.tmp
Rename file.new to file.xml
Delete file.tmp
Failure of step 5 is not critical.
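A possible translation of those steps into C#, reusing the names from the question (xmlFilePath, MyCollection, myCollection); the .new/.tmp suffixes are illustrative, and this is a sketch rather than production code:
// Sketch of the rename-before-delete sequence described above.
string newPath = xmlFilePath + ".new";
string tmpPath = xmlFilePath + ".tmp";

// 1. Create file.new with the fresh contents.
using (var streamWriter = new XmlTextWriter(newPath, System.Text.Encoding.UTF8))
{
    new XmlSerializer(typeof(MyCollection)).Serialize(streamWriter, myCollection);
}

// 2. Delete a stray file.tmp left behind by an earlier failure.
if (File.Exists(tmpPath))
    File.Delete(tmpPath);

// 3. Rename file.xml to file.tmp; any "delete pending" state now affects file.tmp, not file.xml.
if (File.Exists(xmlFilePath))
    File.Move(xmlFilePath, tmpPath);

// 4. Rename file.new to file.xml.
File.Move(newPath, xmlFilePath);

// 5. Delete file.tmp; failure of this step is not critical.
try { File.Delete(tmpPath); } catch (IOException) { }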
How about using File.Copy with overwrite set to true, so that it overwrites your app state, and then you can delete your temp file?
You could also attach to the application exit event and try to perform a clean shutdown.
If multiple threads in this application can call Save, or if multiple instances of the program are attempting to update the same file (e.g. on a network share), you can get a race condition: the file doesn't exist when both threads/processes attempt the delete, then one succeeds in performing its Move (or the Move is in progress) when the second attempts to use the same filename, and that second Move will fail.
As anvarbek raupov says, you can use File.Copy(String, String, Boolean) to allow overwriting (so the delete is no longer needed), but this means that the last updater wins. You need to consider whether that is what you want (especially in multi-threaded scenarios, where the last updater may, if you're unlucky, have been working with older state).
Safer would be to force each thread/process to use separate files, or to implement some form of file-system-based locking (e.g. create another file to announce "I'm working on this file", update the main file, then delete this lock file).
As Hans said, a workaround is to move the file first and THEN delete it.
That said, with Transactional NTFS's delete I haven't been able to reproduce the error described. Check out github.com/haf/Castle.Transactions and the corresponding NuGet packages; 2.5 is well tested and documented for file transactions.
When testing with non-transacted file systems, that project's unit-test code always does moves before deletes.
3.0 is currently in a pre-alpha state but will integrate regular transactions with transactional file I/O at a much higher level.

Can I tell if another process is in the process of creating a file?

I'm writing a Windows service to process files created by another process over which I have no control. These files could potentially be very large (hundreds of megabytes).
I need to process and then delete the files after they've been created.
All the files will be written to a particular directory (by just a straight file copy, as far as I'm aware), so I can periodically iterate over the files in that directory, process them, and then delete them.
What I'm worried about is what happens if my service queries the directory while a large file is being written. Will the file show up to my service? Will it be locked so that I can't get read access? Do I need to do anything special to check whether the file has finished copying, or can I just query File.Exists() or try to open it with FileAccess.Read? How does Windows mark a file that is in the process of being copied?
If this were plain Win32, you would try to open the file with CreateFile() and a share mode that denies write access to others. That open would have to fail if the other program was still writing the file, since you can't deny write access when the file is already open with write access. If it succeeds, you know that the other process has finished.
In .NET you could, for example, create a FileStream using one of the constructors that takes a FileShare parameter. This ultimately maps down to the underlying CreateFile() API.
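For example, a minimal check along those lines (the method name is made up) might be:
// Returns true if the file could be opened while denying write access to others,
// which implies the copying process has finished and closed it.
static bool IsFileFinished(string path)
{
    try
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            return true;
        }
    }
    catch (IOException)
    {
        // Sharing violation: the file is still open for writing (or otherwise locked).
        return false;
    }
}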
AFAIK there is no special mark on a file to indicate that it is being copied, other than the write lock it holds. It is standard practice in this situation to try to open the file yourself while denying write sharing (e.g. FileShare.Read) and to catch any IOException that occurs because the file is already locked; in that case, pause for a bit (Thread.Sleep) before retrying the open. You may want to limit the number of retries (to prevent an infinite loop in case the existing file lock is never released).
You say you want to process the files and then delete them? To avoid a race with another process/thread writing to the same file while you are processing or deleting it, you should treat the processing/deleting as an atomic operation, e.g. something like this:
string sourcePath = @"C:\temp1\temp.txt";
string targetPath = @"C:\temp2\temp.txt";
int attempt = 0;
const int maxAttempts = 3;
bool moved = false;
do
{
    try
    {
        File.Move(sourcePath, targetPath);
        moved = true;
    }
    catch (IOException)
    {
        if (attempt < maxAttempts)
        {
            System.Threading.Thread.Sleep(1000);
            attempt++;
        }
    }
} while (!moved && attempt < maxAttempts);

if (moved)
{
    ProcessFile(targetPath);
    File.Delete(targetPath);
}
else
{
    throw new InvalidOperationException("Unable to process '" + sourcePath + "'.");
}
Edit: I see you say the files could be very large, so you shouldn't use File.ReadAllText. You could instead try to move the files to another directory; this will throw an exception if the file is still locked by the other process. You only process the file if you successfully move it. This also has the benefit of removing the file from the input directory.
Write the file with a temporary filename and then rename the file.
The rename is an atomic operation, so your service that processes the files should be fine. Just make sure that the service skips the temporary filenames.

Reading file right after it has been written I get all zeros (.net)

I have a program that needs to load files from a directory as soon as they are written. I use a FileSystemWatcher to be notified of changes to the directory. Rather than checking the event to see what changed, I just list the files and start processing all that I find.
To prevent trying to read a file which is still being written I have code like:
FileStream fs = null;
try
{
    fs = fi.Open(FileMode.Open, FileAccess.ReadWrite, FileShare.None);
    message = new byte[fs.Length];
    int br = fs.Read(message, 0, (int)fi.Length);
}
catch (Exception e)
{
    // I'll get it next time around
    return;
}
finally
{
    if (fs != null)
        fs.Close();
}
The problem is that for some files, about 1 in 200, the program reads all zeros. The file length is correct but the content appears to be all zero bytes. When I check the file later, I find it does contain the correct data. I thought the way I was opening the file would prevent premature access to it.
I'm testing this by copying files into the directory with the DOS command 'copy InFile_0* dropdir' (about 100 files per execution). Possibly this command does the copy in two steps, 1) allocate space and 2) fill the space, and my program occasionally jumps in between the two.
Any ideas on how to code this to be reliable?
Update: I don't have control over the writing program - it could be anything. Looks like I have to code defensively.
You're hitting a race condition. It's only going to get worse from here (with networked file systems etc) unless you fix it definitively.
Try having the program writing the files write each one using a "whatever.tmp" name, then close it, then rename it. When reading, ignore .tmp files.
Or, keep a zero-length file named "sentinel" or some such in the directory. Get the program writing the files to rewrite the sentinel file AFTER each successful write of another file. Then, don't attempt to read files whose modification date/times are >= the modification date/time of the sentinel file.
Or, if you have no control over the writer of the files, check each file's modification date/time against the current system date/time. Let the files age an appropriate amount (a few seconds if they're small, longer if they're larger) before attempting to read them.
Good luck. This is a notorious pain in the neck.
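If you go with the last option (letting files age), a minimal sketch might be as follows; dropDir, ProcessFile, and the 10-second threshold are all illustrative:
// Only pick up files that haven't been written to for a little while.
TimeSpan minAge = TimeSpan.FromSeconds(10); // tune to your file sizes

foreach (string path in Directory.GetFiles(dropDir))
{
    DateTime lastWrite = File.GetLastWriteTimeUtc(path);
    if (DateTime.UtcNow - lastWrite < minAge)
        continue; // probably still being copied; try again on the next pass

    ProcessFile(path); // hypothetical processing method
}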
Well, I disagree with the previous post by @Ollie Jones.
You already established exclusive access to the file, so there is no race condition problem.
I think you should examine the writer's behavior more carefully. And try to reduce interference with the file access by opening it read-only with full sharing:
fi.Open(FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
This could cause your read to fail, but it will reduce write errors. To decide when it is safe to read, you could check the file time or file size, or whatever works for you. If many files are written one after another, you could start reading the first file after the second file has been created.
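One simple way to "check on file size", as suggested here, is to wait until the size stops changing between two polls; the method name and the 500 ms interval are made up:
// Sketch: treat the file as ready once its size is unchanged across two polls.
static bool HasStoppedGrowing(string path)
{
    long sizeBefore = new FileInfo(path).Length;
    System.Threading.Thread.Sleep(500);
    long sizeAfter = new FileInfo(path).Length;
    return sizeBefore == sizeAfter;
}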
