Best practice for writing big files

I need to write a big file in my project.
What I learned:
I should NOT write the big file directly to the destination path,
because this may leave an incomplete file if the app crashes while writing it.
Instead, I should write to a temporary file and then move (rename) it to the destination (a so-called atomic file operation).
My code snippet:
[NotNull]
public static async Task WriteAllTextAsync([NotNull] string path, [NotNull] string content)
{
    string temporaryFilePath = null;
    try {
        temporaryFilePath = Path.GetTempFileName();
        using (var stream = new StreamWriter(temporaryFilePath, true)) {
            await stream.WriteAsync(content).ConfigureAwait(false);
        }
        File.Delete(path);
        File.Move(temporaryFilePath, path);
    }
    finally {
        if (temporaryFilePath != null) File.Delete(temporaryFilePath);
    }
}
My Question:
The file will be missing if the app crashes between File.Delete and File.Move. Can I avoid this?
Is there any other best practice for writing big files?
Is there any suggestion on my code?

The file will be missing if the app crashes between File.Delete and File.Move. Can I avoid this?
Not that I'm aware of, but you can detect it - and if you use a more predictable filename, you can recover from that. It helps if you tweak the process somewhat to use three file names: the target, a "new" file and an "old" file. The process becomes:
Write to "new" file (e.g. foo.txt.new)
Rename the target file to the "old" file (e.g. foo.txt.old)
Rename the "new" file to the target file
Delete the "old" file
You then have three files, each of which may be present or absent. That can help you to detect the situation when you come to read the file:
No files: Nothing has written data yet
Just target: All is well
Target and new: App crashed while writing new file
Target and old: App failed to delete old file
New and old: App failed after the first rename, but before the second
All three, or just old, or just new: Something very odd is going on! User may have interfered
Note: I was unaware of File.Replace before, but I suspect it's effectively just a simpler and possibly more efficient way of doing the code you're already doing. (That's great - use it!) The recovery process would still be the same though.
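The state table above can be sketched in code. This is only an illustration: the names WriteState, ThreeFileRecovery, Inspect and Recover are made up for this example, and the recovery actions (discard a partial "new" file, finish an interrupted rename) are my assumptions rather than something the answer prescribes.

```csharp
using System;
using System.IO;

// Hypothetical names; each value maps to one row of the table above.
public enum WriteState
{
    NothingWritten,        // no files
    Ok,                    // just target
    CrashedWhileWriting,   // target and new
    OldNotDeleted,         // target and old
    CrashedBetweenRenames, // new and old
    Unexpected             // all three, just old, or just new
}

public static class ThreeFileRecovery
{
    // Classifies the situation from which of the three files exist.
    public static WriteState Inspect(string target)
    {
        bool t = File.Exists(target);
        bool n = File.Exists(target + ".new");
        bool o = File.Exists(target + ".old");

        if (!t && !n && !o) return WriteState.NothingWritten;
        if (t && !n && !o) return WriteState.Ok;
        if (t && n && !o) return WriteState.CrashedWhileWriting;
        if (t && !n && o) return WriteState.OldNotDeleted;
        if (!t && n && o) return WriteState.CrashedBetweenRenames;
        return WriteState.Unexpected;
    }

    // Assumed recovery policy; returns the state that was found before recovery.
    public static WriteState Recover(string target)
    {
        WriteState state = Inspect(target);
        switch (state)
        {
            case WriteState.CrashedWhileWriting:
                // The "new" file may be incomplete, so discard it and keep the target.
                File.Delete(target + ".new");
                break;
            case WriteState.OldNotDeleted:
                File.Delete(target + ".old");
                break;
            case WriteState.CrashedBetweenRenames:
                // The "new" file was fully written before the first rename, so finish the job.
                File.Move(target + ".new", target);
                File.Delete(target + ".old");
                break;
        }
        return state;
    }
}
```

You would call Recover once before reading the target, and treat Unexpected as something to report rather than fix automatically.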

You can use File.Replace instead of deleting and moving files. In case of a hard fault (a power cut or something like that) you can always lose data; you have to account for that.
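A minimal sketch of the question's method rewritten around File.Replace, under a few assumptions of mine: the class name AtomicFile and the ".new" suffix are invented for the example, the temporary file is placed next to the target so both are on the same volume, and no backup file is kept (null is passed as the third argument). File.Replace needs an existing destination, so the first write falls back to File.Move. Whether the swap is truly atomic depends on the OS and file system.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class AtomicFile
{
    public static async Task WriteAllTextAsync(string path, string content)
    {
        // Assumption: the temporary file lives beside the target (same volume).
        string tempPath = path + ".new";
        try
        {
            using (var writer = new StreamWriter(tempPath, false))
            {
                await writer.WriteAsync(content).ConfigureAwait(false);
            }

            if (File.Exists(path))
            {
                // Swaps the new content in without a separate delete step; no backup is kept.
                File.Replace(tempPath, path, null);
            }
            else
            {
                // File.Replace requires an existing destination, so just rename.
                File.Move(tempPath, path);
            }
        }
        finally
        {
            // No-op if the temporary file has already been moved or replaced.
            File.Delete(tempPath);
        }
    }
}
```

If you want the recovery scheme from the other answer, pass a backup name such as path + ".old" as the third argument instead of null and delete it afterwards.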

Related

Include source code for library in that library

I've a plugin based web app that allows an administrator to assign various small pieces of functionality (the actual functionality is unimportant here) to users.
This functionality is configurable and an administrator having an understanding of the plugins is important. The administrator is technical enough to be able to read the very simple source code for these plugins. (Mostly just arithmetic).
I'm aware there are a couple of questions already about accessing source code from within a DLL built from that source:
How to include source code in dll? for example.
I've played with getting the .cs files into a /Resources folder. However, doing this with a pre-build event obviously doesn't include these files in the project, so VS never copies them and I'm unable to access them on the Resources object.
Is there a way to reference the source for a particular class... from the same assembly that I'm missing? Or alternatively a way to extract COMPLETE source from the pdb for that assembly? I'm quite happy to deploy detailed PDB files. There's no security risk for this portion of the solution.
As I've access to the source code, I don't want to go about decompiling it to display it. This seems wasteful and overly complicated.

The source code isn't included in the DLLs, and it isn't in the PDBs either (PDBs only contain a link between the addresses in the DLL and the corresponding lines of code in the sources, as well as other trivia like variable names).
A pre-build event is a possible solution - just make sure that it produces a single file that's included in the project. A simple zip archive should work well enough, and it's easy to decompress when you need to access the source. Text compresses very well, so it might make sense to compress it anyway. If you don't want to use zip, anything else will do fine as well - an XML file, for example. It might even give you the benefit of using something like Roslyn to provide syntax highlighting with all the necessary context.
Decompilation isn't necessarily a terrible approach. You're trading memory for CPU, basically. You'll lose comments, but that shouldn't be a problem if your code is very simple. Method arguments keep their names, but locals don't - you'd need the PDBs for that, which is a massive overkill. It actually does depend a lot on the kind of calculations you're doing. For most cases, it probably isn't the best solution, though.
A bit roundabout way of handling this would be a multi-file T4 template. Basically, you'd produce as many files as there are source code files, and have them be embedded resources. I'm not sure how simple this is, and I'm sure not going to include the code :D
Another (a bit weird) option is to use file links. Simply have the source code files in the project as usual, but also make a separate folder where the same files will be added using "Add as link". The content will be shared with the actual source code, but you can specify a different build action - namely, Embedded Resource. This requires a (tiny) bit of manual work when adding or moving files, but it's relatively simple. If needed, this could also be automated, though that sounds like an overkill.
The cleanest option I can think of is adding a new build action - Compile + Embed. This requires you to add a build target file to your project, but that's actually quite simple. The target file is just an XML file, and then you just manually edit your SLN/CSPROJ file to include that target in the build, and you're good to go. The tricky part is that you'll also need to force Microsoft.CSharp.Core.targets to treat files with your Compile + Embed action as both source code and embedded resources. This is of course easily done by manually changing that targets file, but that's a terrible way of handling that. I'm not sure what the best way of doing that is, though. Maybe there's a way to redefine @(Compile) to mean @(Compile;MyCompileAndEmbed)? I know it's possible with the usual property groups, but I'm not sure if something like this can be done with the "lists".
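Whichever of the embedding options you pick (file links, a T4 template or a custom build action), reading the source back at runtime looks roughly the same. A small sketch, assuming the .cs files end up as embedded resources; the class name EmbeddedSource and the suffix-matching rule are mine, not part of any of the options above.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Reflection;

public static class EmbeddedSource
{
    // Returns the text of the first embedded resource whose name ends with
    // the given suffix (e.g. "MyPlugin.cs"), or null if nothing matches.
    public static string Read(Assembly assembly, string fileNameSuffix)
    {
        string resourceName = assembly.GetManifestResourceNames()
            .FirstOrDefault(n => n.EndsWith(fileNameSuffix, StringComparison.OrdinalIgnoreCase));
        if (resourceName == null)
        {
            return null;
        }

        using (Stream stream = assembly.GetManifestResourceStream(resourceName))
        using (var reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Usage would be something like EmbeddedSource.Read(typeof(MyPlugin).Assembly, "MyPlugin.cs"), where MyPlugin is a placeholder for one of your plugin classes. Resource names normally include the default namespace and folder path, which is why the sketch matches on the end of the name.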

Taking from @Luaan's suggestion of using a pre-build step to create a single zipped folder, I created a basic console app to package the source files into a zip file at a specific location.
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Takes a folder of .cs files and flattens and compacts them into a .zip.\n" +
            "Arg 1 : Source folder to be recursively searched\n" +
            "Arg 2 : Destination zip file\n" +
            "Arg 3 : Semicolon-separated list of folders to ignore");
        // args never contains null entries, so check the count instead of comparing to null.
        if (args.Length < 2)
        {
            Console.Write("Args 1 or 2 missing");
            return;
        }
        string SourcePath = args[0];
        string ZipDestination = args[1];
        List<String> ignoreFolders = new List<string>();
        if (args.Length > 2)
        {
            ignoreFolders = args[2].Split(';').ToList();
        }
        var files = DirSearch(SourcePath, "*.cs", ignoreFolders);
        Console.WriteLine($"{files.Count} files found to zip");
        if (File.Exists(ZipDestination))
        {
            Console.WriteLine("Destination exists. Deleting zip file first");
            File.Delete(ZipDestination);
        }
        int zippedCount = 0;
        using (FileStream zipToOpen = new FileStream(ZipDestination, FileMode.OpenOrCreate))
        {
            using (ZipArchive archive = new ZipArchive(zipToOpen, ZipArchiveMode.Create))
            {
                foreach (var filePath in files)
                {
                    Console.WriteLine($"Writing {Path.GetFileName(filePath)} to zip {Path.GetFileName(ZipDestination)}");
                    // Entries are flattened: only the file name is kept, not the folder structure.
                    archive.CreateEntryFromFile(filePath, Path.GetFileName(filePath));
                    zippedCount++;
                }
            }
        }
        Console.WriteLine($"Zipped {zippedCount} files;");
    }

    static List<String> DirSearch(string sDir, string filePattern, List<String> excludeDirectories)
    {
        List<String> filePaths = new List<string>();
        // Include files in this folder itself, then recurse into subfolders that are not excluded.
        filePaths.AddRange(Directory.GetFiles(sDir, filePattern));
        foreach (string d in Directory.GetDirectories(sDir))
        {
            if (excludeDirectories.Any(ed => string.Equals(ed, d, StringComparison.OrdinalIgnoreCase)))
            {
                continue;
            }
            filePaths.AddRange(DirSearch(d, filePattern, excludeDirectories));
        }
        return filePaths;
    }
}
Takes 3 parameters: the source dir, the output zip file and a ";"-separated list of paths to exclude. I've just built this as a binary, committed it to source control for simplicity and included it in the pre-build step for the projects I want the source for.
There's no real error checking beyond the argument count. But if anyone wants it, here it is! Again, thanks to @Luaan for clarifying that PDBs aren't all that useful!

If no text to write via StreamWriter, then discard (do not create) the file

I am using the StreamWriter to create a file and to write some text to that file. In some cases I have no text to write via StreamWriter, but the file was already created when StreamWriter was initialized.
using (StreamWriter sw = new StreamWriter(@"C:\FileCreated.txt"))
{
}
Currently, once the StreamWriter is closed, I use the following code to check whether FileCreated.txt is empty and delete it if so. I am wondering if there is a more elegant approach than this (an option within StreamWriter perhaps)?
if (File.Exists(@"C:\FileCreated.txt"))
{
    if (new FileInfo(@"C:\FileCreated.txt").Length == 0)
    {
        File.Delete(@"C:\FileCreated.txt");
    }
}
By the way, I must open a stream to write before I can check if there is any text because of some other logic in the code.

If you want to take input from the user bit by bit, you can make your source a StringBuilder, and then just commit to disk when you're done:
StringBuilder SB = new StringBuilder();
...
SB.AppendLine("text");
...
// Only touch the disk if something was actually appended.
if (SB.Length > 0)
    File.WriteAllText(@"C:\FileCreated.txt", SB.ToString());

Delaying opening the file until the first output would solve this problem, but it might create a new one (if there's a permission error creating the file, you won't find out until later, maybe when the operator is no longer at the computer).
Your current approach is decent. I don't see the need to test File.Exists, though, if you just closed a stream to it. Also consider the race condition:
You find that the file is zero-length
Another process writes to the file
You delete the file
Also consider that you might have permission to create a file, and not to delete it afterwards!
Doing this correctly requires using the raw Win32 API, as I described in a previous answer. Do note that a .NET stream could be used for the first file handle, as long as you specify the equivalent of FILE_SHARE_WRITE.
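For the first option mentioned above (delaying opening the file until the first output), here is a rough sketch. LazyFileWriter is a made-up name, not a framework class, and it carries the trade-off already described: a permission problem only surfaces at the first write.

```csharp
using System;
using System.IO;

// Hypothetical helper: the file is only created when something is actually written.
public sealed class LazyFileWriter : IDisposable
{
    private readonly string _path;
    private StreamWriter _writer;

    public LazyFileWriter(string path)
    {
        _path = path;
    }

    public void WriteLine(string text)
    {
        if (_writer == null)
        {
            // First output: this is the point where the file gets created
            // (and where a permission error would show up).
            _writer = new StreamWriter(_path);
        }
        _writer.WriteLine(text);
    }

    public void Dispose()
    {
        // Nothing to close (and no file on disk) if nothing was written.
        if (_writer != null)
        {
            _writer.Dispose();
        }
    }
}
```

Used inside a using block just like a StreamWriter, this leaves no empty file behind when WriteLine is never called, so there is nothing to delete afterwards.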

Revisit your assumptions, i.e. that you must open the stream before checking for content. Simply reorganize your logic.

UnauthorizedAccessException StreamWriter

I have the following code:
public void WriteToFile(string path, List<string> text)
{
    File.Delete(path);
    using (TextWriter writer = new StreamWriter(path, true))
    {
        foreach (string t in text)
        {
            writer.WriteLine(t);
        }
    }
}
Most of the time it works fine, the file is deleted and then created again with the text inside. However every so often the using statement throws an UnauthorizedAccessException. Any idea why? I have admin rights and the program is run as admin.

This is normal; it became undiagnosable because you used File.Delete(), which is unnecessary. Just use the StreamWriter(string) constructor.
This goes wrong because deleting a file doesn't guarantee that the file will actually be gone. It may be opened by another process that has opened the file with delete sharing; programs like virus scanners and file indexers commonly do this. That makes the Delete() call succeed, but the file doesn't disappear until all handles on the file are closed. You got the UnauthorizedAccessException because the file hadn't been deleted yet.
Get ahead by removing the File.Delete() call. You still need to assume that the StreamWriter() constructor can fail. It will happen less often, but it is bound to happen sooner or later, and you'll get a better exception message. Such are the vagaries of a multi-tasking operating system.
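A minimal sketch of the method without the File.Delete() call. The wrapper class name TextFile is just for the example; passing false for the append parameter behaves like the StreamWriter(string) constructor, which truncates an existing file or creates a new one.

```csharp
using System.Collections.Generic;
using System.IO;

public static class TextFile
{
    public static void WriteToFile(string path, List<string> text)
    {
        // append: false overwrites any existing content, so no prior delete is needed.
        using (TextWriter writer = new StreamWriter(path, false))
        {
            foreach (string t in text)
            {
                writer.WriteLine(t);
            }
        }
    }
}
```

The constructor can still throw if another process holds the file, so callers should be prepared to catch IOException and UnauthorizedAccessException.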

Why isn't this working?

I am trying to just write an array of strings to a file, which SHOULD normally be an easy thing to do. However the following trivial code is throwing an IOException saying that the file is in use by another process. The problem is, the file doesn't even exist until this code is run. And I can guarantee you that there is no other process using the file. So how do I convince the stupid .NET framework that the file is not in use by another process and that it is okay to continue? Because this really shouldn't be that hard.
StreamWriter writer = new StreamWriter(ListFileName);
foreach (string s in InfoLineList)
{
    writer.WriteLine(s);
}

This might be because you're not closing the stream when you're done with it, so some handle is getting stuck open somewhere. Perhaps the code is part of a web app, and the web server process keeps that lock around, or the code is being run multiple times. I'd recommend using the stream in a using block:
using(StreamWriter writer = new StreamWriter(ListFileName))
{
    foreach (string s in InfoLineList)
    {
        writer.WriteLine(s);
    }
}
This will make sure the StreamWriter is disposed of properly.
If you really want to know what has the file open, use the Sysinternals Handle tool to check. I'd be willing to bet it's your own program.
Finally, as I said in my comments, the File.WriteAllLines() method can write an enumerable list of strings to a file all at once:
File.WriteAllLines(ListFileName, InfoLineList);

How to check if a file is being accessed by any other process and release it?

I am trying to delete/open/edit some files in my C# .NET application. Sometimes I get an exception stating that the file/directory is being accessed by another process. Is there a way to check whether a file/directory is being accessed by a process, and to release the file from that process?

No. The only way to do this is to try to access the file, and handle the IOException.
Realistically this is the only safe way anyway. Suppose there was a IsFileInUse() method, and you called it, and it returned "nope, nobody's using that file," and you went ahead and accessed the file. The problem is that in the meantime some other process might have locked or deleted the file. So you'd need to put exception handling around your attempt to access the file anyway. The "test by acquiring" model is the only one that is 100% reliable.
If a file is in use by another process, .NET doesn't provide a way of determining which other process that might be. I believe this would require some pretty low-level unmanaged code though I could be wrong. It is a very low-level operation, if it is possible at all, to "release the file from that process" because that would violate the other process' expectations -- e.g. it thinks it is allowed to write to the file but you have deleted the file and garbaged the handle. I believe you would need to terminate the other process if it's not willing to give up its lock voluntarily.
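The "test by acquiring" model can be sketched as a small retry loop. The helper name OpenWithRetry, the attempt count and the delay are all assumptions of mine; the only point is that the open itself is the test, and the IOException is handled rather than predicted.

```csharp
using System;
using System.IO;
using System.Threading;

public static class FileAccessHelper
{
    // Tries to open the file exclusively; retries on IOException (e.g. a sharing
    // violation) and rethrows once maxAttempts has been reached.
    public static FileStream OpenWithRetry(string path, int maxAttempts, TimeSpan delay)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None);
            }
            catch (IOException ex) when (attempt < maxAttempts
                                         && !(ex is FileNotFoundException)
                                         && !(ex is DirectoryNotFoundException))
            {
                // Someone else probably holds the file; wait and try again.
                Thread.Sleep(delay);
            }
        }
    }
}
```

A caller would write using (var fs = FileAccessHelper.OpenWithRetry(path, 5, TimeSpan.FromMilliseconds(200))) { ... } and still wrap that in its own exception handling, since the final attempt can fail too.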

First, I suppose there are 2 things that may help you:
consider using FileAccess and FileShare flags when opening files
if data from the file is needed only within the scope of the function, use the construction
using(FileStream stream = File.Open(...)) { <file operations> }
this will ensure that the file is closed immediately after exiting the 'using' block, and not when the FileStream object is collected by the GC.
Second, there is an unsafe way to get the processes that use the file. It is based on debugging features provided by Windows. The main idea is to get all system handles and iterate through them to find which ones are file handles, along with additional information. This is done using functions that I'm not sure are documented. If you are interested, use Google to find more information, but I do not think it is a good way.
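To illustrate the first point (FileAccess and FileShare flags combined with a using block), a short sketch; the class and method names are made up. It asks only for read access and allows other handles to keep reading and writing, which avoids some sharing violations when all you need is to read.

```csharp
using System.IO;

public static class SharedRead
{
    public static string ReadAllTextShared(string path)
    {
        // FileAccess.Read: request only what is needed.
        // FileShare.ReadWrite: do not block other readers or writers of the same file.
        using (FileStream stream = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (var reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
        // Both the reader and the stream are closed here, not when the GC runs.
    }
}
```

This only works if the other process opened the file with a share mode that permits reading; if it asked for exclusive access, the open still throws an IOException.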

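// Returns true if the file could be opened, i.e. it is NOT in use.
public bool IsFileFree(string path)
{
    bool isFree = true;
    try
    {
        // Just opening the file as open/create (note: this creates the file if it does not exist)
        using (FileStream fs = new FileStream(path, FileMode.OpenOrCreate))
        {
            // Opening succeeded; we can additionally check fs.CanRead or fs.CanWrite
            isFree = fs.CanRead || fs.CanWrite;
        }
    }
    catch (IOException)
    {
        // Another process holds the file in a way that blocks us
        isFree = false;
    }
    return isFree;
}
string path = "D:\\test.doc";
bool isFileFree = IsFileFree(path);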
