C# - How to copy a file being written

I have a class implementing a Log file writer.
Logs must be written for the application to "work correctly", so it is of the utmost importance that the writes to disk succeed.
The log file is kept open for the whole life of the application, and write operations are accordingly very fast:
var logFile = new FileInfo(filepath);
_outputStream = logFile.Open(FileMode.Append, FileAccess.Write, FileShare.Read);
Now, I need to synchronize this file to a network path, during application lifetime.
This network copy can be slightly delayed without problems. The important bit is that I have to guarantee that it doesn't interfere with log writing.
Given this network copy must be eventually consistent, I need to make sure that all file contents are copied, not only the last message(s).
A previous implementation used heavy locking and a simple System.IO.File.Copy(filepath, networkPath, true), but I would like to lock as little as possible.
How could I approach this problem? I'm out of ideas.
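One possible direction, sketched below under the assumption that the writer keeps using the FileShare.Read open shown above (the copy interval and networkPath are placeholders): since the writer permits concurrent readers, a background loop can periodically re-copy the whole file to the network path without ever touching the write stream.
async Task SyncLogAsync(string filepath, string networkPath, CancellationToken ct)
{
    while (!ct.IsCancellationRequested)
    {
        // FileShare.ReadWrite lets us read while the writer keeps its write handle open.
        using (var src = new FileStream(filepath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (var dst = new FileStream(networkPath, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            await src.CopyToAsync(dst, 81920, ct);
        }
        await Task.Delay(TimeSpan.FromSeconds(30), ct); // arbitrary delay; the copy only needs to be eventually consistent
    }
}
Copying the whole file on each pass is what makes the network copy eventually consistent rather than tail-only, at the cost of re-reading data already copied.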

Related

How to properly handle temporary files?

Problem:
I have a web API which exposes a method UploadFile, which will upload a file from a client to a specific directory of the server. The piece of code that handles the request and does the upload is the following:
var boundary = MultipartRequestHelper.GetBoundary(MediaTypeHeaderValue.Parse(Request.ContentType), _defaultFormOptions.MultipartBoundaryLengthLimit);
var reader = new MultipartReader(boundary, HttpContext.Request.Body);
try
{
    // Read the form data.
    var section = await reader.ReadNextSectionAsync();
    // This illustrates how to get the file names.
    while (section != null)
    {
        var hasContentDispositionHeader = ContentDispositionHeaderValue.TryParse(section.ContentDisposition, out ContentDispositionHeaderValue contentDisposition);
        if (hasContentDispositionHeader)
        {
            if (MultipartRequestHelper.HasFileContentDisposition(contentDisposition))
            {
                targetFilePath = Path.Combine(root, contentDisposition.FileName.ToString());
                using (var targetStream = System.IO.File.Create(targetFilePath))
                {
                    await section.Body.CopyToAsync(targetStream);
                    //_logger.LogInformation($"Copied the uploaded file '{targetFilePath}'");
                }
            }
        }
        // Advance to the next section; without this the loop never terminates.
        section = await reader.ReadNextSectionAsync();
    }
}
catch (Exception)
{
    // (error handling elided in the original snippet)
    throw;
}
I always called this method using the following statement:
bool res = await importClient.UploadFileAsync(filePath);
where UploadFileAsync (which is on the client) builds the request in this way:
var requestContent = new MultipartFormDataContent();
var array = File.ReadAllBytes(filePath);
var fileContent = new ByteArrayContent(array);
fileContent.Headers.ContentType = MediaTypeHeaderValue.Parse("application/octet-stream");
requestContent.Add(fileContent, "file", Path.GetFileName(filePath));
As you can see, this method expects a file name/path to work, which means the file must "exist" somewhere on the client machine. I've used this method without any problem until now. I have a very specific case in which I need to upload something needed on the server that the user doesn't want to save on his client.
Possible solutions:
The first thing I thought of was to manually create a file on the client and delete it after the upload. However, I'm not very happy with this solution because I need to handle everything manually.
I can use the System.IO.Path.GetTempFileName() method, which will create a file in the temporary directory, but I'm not quite sure how the deletion of these files is handled.
I can use the TempFileCollection, but it seems more or less a mix of the previous point. I can technically create this collection in a using statement to get rid of it when the upload is done.
I'm inexperienced in these topics, so I'm not sure which solution fits this scenario best.
My requirements are that I need to be 100% sure that the file is deleted after the upload is done, and I would like the solution to be "async friendly", i.e. I need the whole process to keep going without problems.
EDIT: I see a little bit of confusion. My problem is not how to handle the files on the server. That part is not a problem. I need to handle "temporary" files on the client.
Once you write something to the disk you can't be 100% sure that you will be able to delete it. Moreover, even if you delete the file, you can't be sure that the file can't be recovered.
So you have to ask why you need to delete the file. If it contains some secret, keep it in memory. If you can't fit the file into memory, write it encrypted on the disk and keep only the key in memory.
If you relax 100% to 99%, I would go for creating a file with Path.GetTempFileName and deleting it in a finally block.
If 99% is not enough but 99.98% is, I would store the names of created temporary files in persistent storage and regularly check whether they have been deleted.
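For the "keep it in memory" route, here is a minimal sketch that reuses the question's client-side upload code but serializes straight to a MemoryStream, so nothing ever touches the client's disk (FileTemplate and w.Template are the types from the question; the file name passed to Add is arbitrary, since no file exists):
byte[] payload;
using (var ms = new MemoryStream())
{
    new XmlSerializer(typeof(FileTemplate)).Serialize(ms, w.Template);
    payload = ms.ToArray();
}

var requestContent = new MultipartFormDataContent();
var fileContent = new ByteArrayContent(payload);
fileContent.Headers.ContentType = MediaTypeHeaderValue.Parse("application/octet-stream");
requestContent.Add(fileContent, "file", "template.xml"); // arbitrary name; nothing is written to disk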
For completeness, I'm writing the solution I used based on the suggestions I received here. Also, naming the file with a GUID as I did guarantees that, statistically, you won't get two temporary files with the same name.
string tempFile = null;
try
{
    string file = Path.Combine(System.IO.Path.GetTempPath(), Guid.NewGuid().ToString() + ".xml");
    tempFile = file; // keep the full path; Path.GetFileName(file) alone would make File.Delete miss the temp directory
    using (FileStream fs = new FileStream(file, FileMode.Create, FileAccess.Write, FileShare.None))
    {
        XmlSerializer serializer = new XmlSerializer(typeof(FileTemplate));
        serializer.Serialize(fs, w.Template);
    }
}
catch (Exception ex)
{
    logger.Error(ex.Message);
    //...
}
finally
{
    //.... do stuff
    if (tempFile != null)
        File.Delete(tempFile);
}
You clearly shouldn't be using a file; in fact, you don't want your data to ever leave RAM. You need to use "secure" memory storage so that the data is "guaranteed" to be pinned to physical RAM, untouched by the garbage collector, and "never" paged out to swap. I use the quotes because all those terms are somewhat misleading: the implementation isn't secure in an absolute sense, it's just more secure than writing stuff to a disk file. Absolute security is impossible.
There are no common mechanisms that guarantee deletion of anything: the machine could "die" at any point between the writing of the data to the file and whatever deletion operation you'd use to wipe the file "clean". Then you have no guarantee that e.g. the SSD or the hard drive won't duplicate the data should e.g. a sector become bad and need to be reallocated. When you talk about files, you're dealing with several underdocumented, complex (and often subtly buggy) layers of software:
The firmware in the storage device controller.
The device driver for the storage device.
The virtual memory system.
The filesystem driver.
The virtual filesystem layer (present in most OSes).
The .net runtime (and possibly the C runtime, depending on implementation).
By using a file you're making a bet that all those layers will do exactly what you want them to do. That won't usually be the case unless you tightly control all of these layers (e.g. you deploy a purpose-made Linux distribution that you audit, and you use your own flash storage firmware, or use the Linux memory technology driver that you'd audit too).
Instead, you can limit your exposure to just the VM system and the runtime. See e.g. this answer; it's easy to use:
using (var secret = new SecureArray<byte>(secretLength))
{
DoSomethingSecret(secret.Buffer);
}
SecureArray makes it likely that secret.Buffer stays in RAM - but you should audit that code as well, since, after all, you need it to do what it does, with your reputation possibly at stake, or legal liability, etc.
A simple test that can give you some peace of mind would involve a small test application that writes a short pseudorandom sequence to secret.Buffer, and then sleeps. Let this run in the background for a few days as you use your computer, then forcibly power it down (on a desktop: turn the on-off switch on the power supply to the "off" position). Then boot up from a Linux live CD, and run a search for some chunk of the pseudorandom sequence on the raw disk device. The expected outcome is that no identifiable part of the sequence has leaked to disk (say, nothing larger than 48-64 bits). Even then you can't be totally sure, but this will thwart the majority of attempts at recovering the information...
...until someone takes the customer's system, dumps liquid nitrogen on the RAM sticks, shuts down the power, then transfers the RAM to a readout device...
...or until they get malware on the system where the software runs, and it helpfully streams out RAM contents over internet, because why not.
...or until someone injects their certificate into the trust root on just one client machine, and MITM-s all the data elsewhere on the client's network.
And so on. It's all a tradeoff: how sure do you wish to be that the data doesn't leak? I suggest getting the exact requirements from the customer in writing, and they must agree that they understand it's not possible to be completely sure.

Access file being written

We are planning to migrate a portal from one platform (A) to another (B), and for this a utility is provided by the vendor to generate XML for A, which will be used by us to migrate to B.
Now this utility has a bug that after generating the relevant XML, it doesn't terminate, rather it keeps on appending static junk nodes to it.
For this purpose, I am writing a C# utility to terminate the application when the XML starts getting junk nodes.
Can I access the file which is already being written to, as below, and be assured that it won't error out?
var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
Also, just to confirm: when a file is being written, the new content is appended to the already existing file; it does not first flush out everything and then write the updated content. (I'm almost sure that I am right on this.)
It depends on whether the utility keeps the file open in write mode the whole time. It may process the data in-memory in chunks and then write to the file in chunks, but more likely it will just keep the file open for writing, keeping it locked. In that case, you cannot read from it until the utility releases the lock.
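A small probe, sketched here, makes both outcomes explicit (path is the file the utility is writing):
try
{
    // Succeeds only if the utility opened the file with sharing that permits readers.
    using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    using (var reader = new StreamReader(fs))
    {
        // ... scan the XML for the junk nodes here ...
    }
}
catch (IOException)
{
    // The utility holds the file exclusively; reading has to wait until it lets go.
}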
What about a user-mode file system? For example, this driver supports .NET extensions. The idea is to make a proxy for the file system driver that will intercept all write/flush operations from the app.

Lock file exclusively then delete/move it

I'm implementing a class in C# that is supposed to monitor a directory, process the files as they are dropped, then delete (or move) each processed file as soon as processing is complete. Since there can be multiple threads running this code, the first one that picks up a file locks it exclusively, so no other thread will read the same file and no external process or user can access it in any way. I would like to keep the lock until the file is deleted/moved, so there's no risk of another thread/process/user accessing it.
So far, I tried 2 implementation options, but none of them works as I want.
Option 1
FileStream fs = file.Open(FileMode.Open, FileAccess.Read, FileShare.Delete);
//Read and process
File.Delete(file.FullName); //Or File.Move, based on a flag
fs.Close();
Option 2
FileStream fs = file.Open(FileMode.Open, FileAccess.Read, FileShare.None);
//Read and process
fs.Close();
File.Delete(file.FullName); //Or File.Move, based on a flag
The issue with Option 1 is that other processes can access the file (they can delete, move, or rename it) while it should be fully locked.
The issue with Option 2 is that the file is unlocked before being deleted, so other processes/threads can lock the file before the delete happens, so the delete will fail.
I was looking for some API that can perform the delete using the file handle to which I already have exclusive access.
Edit
The directory being monitored resides in a public share, so other users and processes have access to it.
The issue is not managing the locks within my own process. The issue I'm trying to solve is how to lock a file exclusively, then move/delete it without releasing the lock.
Two solutions come to mind.
The first and simplest is to have the thread rename the file to something that the other threads won't touch. Something like "filename.dat.<unique number>", where <unique number> is something thread-specific. Then the thread can party on the file all it wants.
If two threads get the file at the same time, only one of them will be able to rename it. You'll have to handle the IOException that occurs in the other threads, but that shouldn't be a problem.
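A sketch of that claim-by-rename race (the suffix scheme here is just one option):
string claimedName = file + "." + Thread.CurrentThread.ManagedThreadId; // thread-specific suffix
try
{
    File.Move(file, claimedName); // only one thread can win this rename
    // ... process claimedName, then delete or archive it ...
}
catch (IOException)
{
    // Another thread claimed the file first; skip it and move on.
}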
The other way is to have a single thread monitoring the directory and placing file names into a BlockingCollection. Worker threads take items from that queue and process them. Because only one thread can get that particular item from the queue, there is no contention.
The BlockingCollection solution is a little bit (but only a little bit) more complicated to set up, but should perform better than a solution that has multiple threads monitoring the same directory.
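A minimal sketch of the single-watcher variant (watchPath, workerCount and ProcessFile are placeholders):
var queue = new BlockingCollection<string>();

// One monitoring thread: the only place that discovers files.
Task.Run(() =>
{
    // In practice a FileSystemWatcher event handler would call queue.Add(...).
    foreach (var file in Directory.EnumerateFiles(watchPath))
        queue.Add(file);
});

// Worker threads: each queued item is handed to exactly one worker, so there is no contention.
for (int i = 0; i < workerCount; i++)
{
    Task.Run(() =>
    {
        foreach (var file in queue.GetConsumingEnumerable())
            ProcessFile(file); // hypothetical processing routine
    });
}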
Edit
Your edited question changes the problem quite a bit. If you have a file in a publicly accessible directory, it's at risk of being viewed, modified, or deleted at any point between the time it's placed there and the time your thread locks it.
Since you can't move or delete a file while you have it open (not that I'm aware of), your best bet is to have the thread move the file to a directory that's not publicly accessible. Ideally to a directory that's locked down so that only the user under which your application runs has access. So your code becomes:
File.Move(sourceFilename, destFilename);
// the file is now in a presumably safe place.
// Assuming that all of your threads obey the rules,
// you have exclusive access by agreement.
Edit #2
Another possibility would be to open the file exclusively and copy it using your own copy loop, leaving the file open when the copy is done. Then you can rewind the file and do your processing. Something like:
var srcFile = File.Open(srcPath, FileMode.Open, FileAccess.ReadWrite, FileShare.None); // srcPath is a placeholder; ReadWrite + FileShare.None gives exclusive access (write access also permits the truncation trick below)
var destFile = File.OpenWrite(destPath); // destPath is a placeholder
// copy the file
var buffer = new byte[32768];
int bytesRead = 0;
while ((bytesRead = srcFile.Read(buffer, 0, buffer.Length)) != 0)
{
destFile.Write(buffer, 0, bytesRead);
}
// close destination
destFile.Close();
// rewind source
srcFile.Seek(0, SeekOrigin.Begin);
// now read from source to do your processing.
// for example, to get a StreamReader, just pass the srcFile stream to the constructor.
You can process and then copy, sometimes. It depends on whether the stream stays open when you're finished processing. Typically, code does something like:
using (var strm = new StreamReader(srcStream, ...))
{
// do stuff here
}
That ends up closing the reader and, with it, srcStream. You'd have to write your code like this:
using (var srcStream = new FileStream(srcPath, FileMode.Open, FileAccess.ReadWrite, FileShare.None)) // exclusive access
{
    // leaveOpen: true keeps srcStream usable after the reader is disposed
    var reader = new StreamReader(srcStream, Encoding.UTF8, true, 4096, leaveOpen: true);
    // process the stream
    // rewind srcStream
    // copy srcStream to destination
    // dispose the reader
}
Doable, but clumsy.
Oh, and if you want to eliminate the potential for somebody to read the file before you can delete it, just truncate the file to zero length before you close it. As in:
srcStream.Seek(0, SeekOrigin.Begin);
srcStream.SetLength(0);
That way if somebody does get to it before you get around to deleting it, there's nothing to modify, etc.
Here is the most robust way I know of that will even work correctly if you have multiple processes on multiple servers working with these files.
Instead of locking the files themselves, create a temporary file for locking, this way you can unlock/move/delete the original file without problems, but still be sure that at least any copies of your code running on any server/thread/process will not try to work with the file at the same time.
Pseudocode:
try
{
    // get an exclusive cross-server/process/thread lock by opening/creating a temp file with no sharing allowed
    var lockFilePath = $"{file}.lck";
    var lockFile = File.Open(lockFilePath, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None);
    try
    {
        // open the file itself with no sharing allowed, in case some process that does not use our locking schema is trying to use it
        var fileHandle = File.Open(file, FileMode.Open, FileAccess.Read, FileShare.None);
        // TODO: add processing -- we have exclusive access to the file, and also the locking file
        fileHandle.Close();
        // at this point it is possible for some other process that does not use our locking schema to lock the file before we
        // move it, causing us to process this file again -- we would always have to handle issues where we failed to move
        // the file anyway (maybe we just lost power, or crashed?) so we had to design around this no matter what
        File.Move(file, archiveDestination);
    }
    finally
    {
        lockFile.Close();
        try
        {
            File.Delete(lockFilePath);
        }
        catch (Exception)
        {
            // another process opened the lock file after we closed it, before it was deleted -- safely ignore; the other process will delete the lock file
        }
    }
}
catch (Exception)
{
    // another process already has exclusive access to the lock file, so we don't need to do anything
    // or we failed while processing, in which case we did not move the file, so it will be tried again by this process or another
}
One nice thing about this pattern is that it can also be used when locking is not supported by the file storage itself. For example, if you were trying to process files on an FTP/SFTP server, you could make your temporary locking files use a normal drive (or SMB share) -- since the locking files do not have to be in the same location as the files themselves.
I can't take credit for the idea, it's been around longer than the PC, and used by plenty of apps like Microsoft Word, Excel, Access, and most older database systems. Read: well tested.
The file system itself is volatile in nature so it's very difficult to try and do what you want. This is a classic race condition in the file system. With option 2, you could alternatively move the file to a "processing" or staging directory that you create before doing your work. YMMV on performance but you could at least benchmark it to see if it could fit your needs.
You may need to implement some form of shared/synchronised list owned by the spawning thread. If the parent thread keeps track of files by periodically checking the directory, it can then hand them off to child threads, and that will eliminate the locking problem.
This solution, though not 100% watertight, may well get you what you need. (It did for us.)
Use two locks that together give you exclusive access to the file. When you are ready to delete the file, you release one of them and then delete the file. The remaining lock will still prevent most other processes from obtaining a lock.
FileInfo file = ...
// Get read access to the file and only allow other processes write or delete access.
// Keeps others from locking the file for reading.
var readStream = file.Open(FileMode.Open, FileAccess.Read, FileShare.Write | FileShare.Delete);
FileStream preventWriteAndDelete;
try
{
// Now try to get a lock that only allows others to read the file. We can acquire both
// locks because they each allow the other. Together, they give us exclusive access to the
// file.
preventWriteAndDelete = file.Open(FileMode.Open, FileAccess.Write, FileShare.Read);
}
catch
{
// We couldn't get the second lock, so release the first.
readStream.Dispose();
throw;
}
Now you can read the file (with readStream). If you need to write to it, you'll have to do that with the other stream.
When you are ready to delete the file, you first release the lock that prevents writing and deletion while still holding the lock that prevents reading.
preventWriteAndDelete.Dispose(); // Release lock that prevents deletion.
file.Delete();
// This lock specifically allowed deletion, but with the file gone, we're done with it now.
readStream.Dispose();
The only opportunity for another process (or thread) to get a lock on the file is if it requests a shared write lock, one which gives it write-only access and also allows others to write to the file. This is not very common. Most processes attempt either a shared read lock (read access allowing others to read, but not write or delete) or an exclusive write lock (write or read/write access with no sharing). Both of these common scenarios will fail. A shared read/write lock (requesting read/write access and allowing others the same) will also fail.
In addition, the window of opportunity for a process to request and acquire a shared write lock is very small. If a process is hammering away trying to acquire such a lock, then it may succeed, but few applications do this. So unless you have such an application in your scenario, this strategy should meet your needs.
You can also use the same strategy to move the file.
preventWriteAndDelete.Dispose();
file.MoveTo(destination);
readStream.Dispose();
You could use the MoveFileEx API function to mark the file for deletion upon next reboot.
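A hedged P/Invoke sketch of that approach; the flag value comes from the Win32 headers, and passing null as the destination is what requests the deferred deletion:
using System.Runtime.InteropServices;

static class PendingDelete
{
    private const int MOVEFILE_DELAY_UNTIL_REBOOT = 0x4;

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    private static extern bool MoveFileEx(string lpExistingFileName, string lpNewFileName, int dwFlags);

    // Marks the file for deletion when Windows next boots (typically requires administrative rights).
    public static bool DeleteOnReboot(string path) =>
        MoveFileEx(path, null, MOVEFILE_DELAY_UNTIL_REBOOT);
}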

Lock a file while retaining the ability to read/append/write/truncate in the same thread?

I have a file containing, roughly speaking, the state of the application.
I want to implement the following behaviour:
When the application is started, lock the file so that no other application (or the user) will be able to modify it;
Read the previous application state from the file;
... do work ...
Update the file with a new state (which, given the format of the file, involves rewriting the entire file; the length of the file may decrease after the operation);
... do work ...
Update the file again
... do work ...
If the work failed (application crashed), the lock is taken off, and the content of the file is left as it was after the previous unit of work executed.
It seems that, to rewrite the file, one should open it with the Truncate option; that means one should open a new FileStream each time they want to rewrite the file. So it seems the behavior I want can only be achieved in such a dirty way:
When the application is started, read the file, then open a FileStream with FileShare.Read;
When some work is done, close the handle opened previously, open another FileStream with FileMode.Truncate and FileShare.Read, write the data and flush the FileStream;
When further work is done, again close the handle opened previously, open another FileStream with FileMode.Truncate and FileShare.Read, write the data and flush the FileStream;
On Dispose, close the handle opened previously.
Such a way has some disadvantages: extra FileStreams are opened; file integrity is not guaranteed between the FileStream close and the FileStream open; the code is much more complicated.
Is there any other way, lacking these disadvantages?
Don't close and reopen the file. Instead, use FileStream.SetLength(0) to truncate the file to zero length when you want to rewrite it.
You might (or might not) also need to set FileStream.Position to zero. The documentation doesn't make it clear whether SetLength moves the file pointer or not.
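A minimal sketch of that rewrite-in-place pattern, with statePath and BuildState() as placeholders:
using (var fs = new FileStream(statePath, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.Read))
{
    // ... read the previous state from fs on startup ...

    // Each time the state must be saved:
    fs.SetLength(0);  // truncate without reopening the file
    fs.Position = 0;  // explicit rewind, per the caveat above
    byte[] state = Encoding.UTF8.GetBytes(BuildState());
    fs.Write(state, 0, state.Length);
    fs.Flush(true);   // flush to disk, not just to the OS buffer
}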
Why don't you take exclusive access to the file when the application starts, and create an in-memory cache of the file that can be shared across all threads in the process while your actual file remains locked for the OS? You can use lock(memoryStream) to avoid concurrency issues. When you are done updating the local in-memory version of the file, just update the file on disk and release the lock on it.

Read from a growing file in C#?

In C#/.NET (on Windows), is there a way to read a "growing" file using a file stream? The length of the file will be very small when the filestream is opened, but the file will be written to by another thread. If/when the filestream "catches up" to the other thread (i.e. when Read() returns 0 bytes read), I want to pause to allow the file to buffer a bit, then continue reading.
I don't really want to use a FileSystemWatcher and keep creating new file streams (as was suggested for log files), since this isn't a log file (it's a video file being encoded on the fly) and performance is an issue.
You can do this, but you need to keep careful track of the file read and write positions using Stream.Seek, with appropriate synchronization between the threads. Typically you would use an EventWaitHandle or a subclass thereof to synchronize on data availability, and you would also need to synchronize access to the FileStream object itself (probably via a lock statement).
Update: In answering this question I implemented something similar - a situation where a file was being downloaded in the background and also being uploaded at the same time. I used memory buffers, and posted a gist which has working code. (It's GPL but that might not matter for you - in any case you can use the principles to do your own thing.)
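A rough sketch of the coordination described above, assuming both threads share one FileStream called stream, plus a lock object and an event (all names hypothetical):
var streamLock = new object();
var dataAvailable = new AutoResetEvent(false);
long writePos = 0, readPos = 0;

// Writer thread: append at its own position, then signal the reader.
void AppendChunk(FileStream stream, byte[] data)
{
    lock (streamLock)
    {
        stream.Seek(writePos, SeekOrigin.Begin);
        stream.Write(data, 0, data.Length);
        writePos = stream.Position;
    }
    dataAvailable.Set();
}

// Reader thread: read only up to what has been written so far, or wait.
int ReadChunk(FileStream stream, byte[] buffer)
{
    while (true)
    {
        lock (streamLock)
        {
            if (readPos < writePos)
            {
                stream.Seek(readPos, SeekOrigin.Begin);
                int n = stream.Read(buffer, 0, (int)Math.Min(buffer.Length, writePos - readPos));
                readPos = stream.Position;
                return n;
            }
        }
        dataAvailable.WaitOne(); // nothing new yet; wait for the writer's signal
    }
}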
This worked with a StreamReader around a file, with the following steps:
In the program that writes to the file, open it with read sharing, like this:
var output = new StreamWriter(File.Open("logFile.txt", FileMode.OpenOrCreate, FileAccess.Write, FileShare.Read)); // renamed from "out", which is a reserved keyword in C#
In the program that reads the file, open it with read-write sharing, like this:
using (FileStream fileStream = File.Open("logFile.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using ( var file = new StreamReader(fileStream))
Before accessing the input stream, check whether the end has been reached, and if so, wait around a while.
while (file.EndOfStream)
{
Thread.Sleep(5);
}
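Putting the three steps together, a tail-style read loop might look like this (keepReading is a placeholder stop flag):
using (var fileStream = File.Open("logFile.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (var file = new StreamReader(fileStream))
{
    while (keepReading)
    {
        while (file.EndOfStream)
        {
            Thread.Sleep(5); // the writer hasn't appended anything new yet
        }
        string line = file.ReadLine();
        // ... consume the line ...
    }
}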
The way I solved this is using the FileSystemWatcher class: when it triggers on the file you want, you open a FileStream and read it to the end. When I'm done reading, I save the position of the reader, so the next time the FileSystemWatcher triggers, I open a stream and set the position to where I was last time.
Calling FileStream.Length is actually very slow; I have had no performance issues with my solution (I was reading a "log" ranging from 10 MB to 50 MB or so).
To me the solution I describe is very simple and easy to maintain; I would try it and profile it. I don't think you're going to get any performance issues based on it. I do this while people are playing a multi-threaded game, taking their entire CPU, and nobody has complained that my parser is more demanding than the competing parsers.
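Sketched out, that saved-position approach looks roughly like this (the directory and file names are placeholders):
long lastPosition = 0;

var watcher = new FileSystemWatcher(logDirectory, "app.log");
watcher.Changed += (sender, e) =>
{
    using (var fs = new FileStream(e.FullPath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    using (var reader = new StreamReader(fs))
    {
        fs.Seek(lastPosition, SeekOrigin.Begin); // resume where the last read stopped
        string appended = reader.ReadToEnd();    // everything written since then
        lastPosition = fs.Position;              // remember for the next event
        // ... parse 'appended' ...
    }
};
watcher.EnableRaisingEvents = true;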
One other thing that might be useful is that the FileStream class has a property on it called ReadTimeout, which is defined as:
Gets or sets a value, in milliseconds, that determines how long the stream will attempt to read before timing out. (Inherited from Stream.)
This could be useful in that when your reads catch up to your writes, the thread performing the reads may pause while the write buffer gets flushed. It would certainly be worth writing a small test to see if this property would help your cause in any way.
Are the read and write operations happening on the same object? If so, you could write your own abstractions over the file and then write cross-thread communication code such that the thread performing the writes can notify the thread performing the reads when it is done, so that the reading thread knows when to stop reading when it reaches EOF.
