Access a file being written to - C#

We are planning to migrate a portal from one platform (A) to another (B). For this, the vendor provides a utility that generates XML from A, which we will then use to migrate to B.
This utility has a bug: after generating the relevant XML, it doesn't terminate, but keeps appending static junk nodes to the file.
To work around this, I am writing a C# utility that terminates the application once the XML starts getting junk nodes.
Can I access a file which is already being written to, as below, and be assured that it won't error out?
var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
Also, just to confirm: when a file is being written to, the new content is appended to the already existing file; it does not first flush out everything and then write the updated content. (I'm almost sure that I am right on this.)

It depends on whether the utility keeps the file open in write mode the whole time. It may process the data in memory and write it to the file in chunks, but more likely it will simply keep the file open for writing, keeping it locked. In that case, you cannot read from it until the utility releases the lock.
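If the utility does allow read sharing, the watcher could look something like the sketch below. This is only an illustration of the idea: the junk node name ("JunkNode"), the file path, and the vendor process name are all placeholders you would have to replace with the real values.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading;

class XmlTailWatcher
{
    static void Main()
    {
        const string path = @"C:\export\output.xml";   // placeholder path

        // ReadWrite sharing tolerates the utility's open write handle.
        using var fs = new FileStream(path, FileMode.Open,
            FileAccess.Read, FileShare.ReadWrite);
        using var reader = new StreamReader(fs);

        while (true)
        {
            string? line = reader.ReadLine();
            if (line == null)
            {
                Thread.Sleep(500);           // writer hasn't flushed more yet
                continue;
            }
            if (line.Contains("JunkNode"))   // placeholder junk marker
            {
                foreach (var p in Process.GetProcessesByName("VendorUtility"))
                    p.Kill();                // terminate the buggy utility
                break;
            }
        }
    }
}
```

If the utility opened the file without read sharing, the `FileStream` constructor here would throw an IOException instead.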

What about a user-mode file system? For example, this driver supports .NET extensions. The idea is to make a proxy for the file system driver that intercepts all write/flush operations from the app.

Related

C# - How to copy a file being written

I have a class implementing a Log file writer.
Logs must be written for the application to "work correctly", so it is of the utmost importance that the writes to disk succeed.
The log file is kept open for the whole life of the application, and write operations are accordingly very fast:
var logFile = new FileInfo(filepath);
_outputStream = logFile.Open(FileMode.Append, FileAccess.Write, FileShare.Read);
Now, I need to synchronize this file to a network path, during application lifetime.
This network copy can be slightly delayed without problems. The important bit is that I have to guarantee that it doesn't interfere with log writing.
Given that this network copy must be eventually consistent, I need to make sure that all file contents are written, not just the last message(s).
A previous implementation used heavy locking and a simple System.IO.File.Copy(filepath, networkPath, true), but I would like to lock as little as possible.
How could I approach this problem? I'm out of ideas.
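One possible direction (a sketch under assumptions, not a vetted answer): since the log is opened with FileShare.Read, a background timer can open it with ReadWrite sharing, so it never blocks or locks out the writer, and periodically copy the bytes written so far to the network path. Paths and the interval are placeholders.

```csharp
using System.IO;
using System.Threading;

class LogMirror
{
    private readonly Timer _timer;

    public LogMirror(string localPath, string networkPath)
    {
        _timer = new Timer(_ =>
        {
            // ReadWrite sharing means this copy never interferes with
            // the writer's own handle on the log file.
            using var src = new FileStream(localPath, FileMode.Open,
                FileAccess.Read, FileShare.ReadWrite);
            using var dst = new FileStream(networkPath, FileMode.Create,
                FileAccess.Write, FileShare.None);
            src.CopyTo(dst);
        }, null, dueTime: 0, period: 30_000);  // every 30 s; tune as needed
    }
}
```

The trade-off versus File.Copy is that the reader, not the writer, pays the cost, and a copy taken mid-write may end on a partial message; since only eventual consistency is required, the next timer tick repairs that.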

Cannot open a FileStream for reading, but I can still copy the file?

This line:
using (FileStream fs = File.Open(src, FileMode.Open, FileAccess.Read, FileShare.Read))
throws:
System.IO.IOException: The process cannot access the file 'X' because
it is being used by another process.
When I replace the line with:
File.Copy(src, dst, true);
using (FileStream fs = File.Open(dst, FileMode.Open, FileAccess.Read, FileShare.Read))
it works.
But why can I copy the file, which surely reads its whole content, while being restricted from reading it directly? Is there a workaround?
When you open a file there is a check between access modes and sharing modes. The access mode requested by any process must be compatible with the sharing modes granted by the others. So if A wants Read access, the others must have allowed reading in their sharing mode; the same goes for writing.
If process A has opened a file for writing and you specify FileShare.Read, your call will fail: you are saying "others may only read from the file, not write", but A is already writing.
If you specify FileShare.ReadWrite, you are saying "others can read or write, I don't care", and as long as every other process opened the file with a sharing mode that permits reading, you are allowed to read from the file.
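A small self-contained demonstration of these compatibility rules:

```csharp
using System;
using System.IO;

class ShareModeDemo
{
    static void Main()
    {
        var path = Path.GetTempFileName();

        // Writer holds the file open and allows others to read only.
        using var writer = new FileStream(path, FileMode.Open,
            FileAccess.Write, FileShare.Read);

        // OK: we request Read access (allowed by the writer's FileShare.Read)
        // and our ReadWrite sharing tolerates the writer's Write access.
        using var ok = new FileStream(path, FileMode.Open,
            FileAccess.Read, FileShare.ReadWrite);

        // Fails: FileShare.Read here says "nobody else may write",
        // but the writer already has Write access.
        try
        {
            using var fails = new FileStream(path, FileMode.Open,
                FileAccess.Read, FileShare.Read);
        }
        catch (IOException ex)
        {
            Console.WriteLine(ex.Message); // "being used by another process"
        }
    }
}
```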
But why I can copy, which surely reads the whole content of file
Well, conceptually it reads the whole file, though that can happen at a lower level than copying streams. On Windows it's a call to the CopyFileEx system function, passing in the paths. On *nix systems it also uses a system call, but it opens the source file with FileAccess.Read, FileShare.Read for that call, so you would have the same issue there.
while being restricted from directly reading the file?
If a file may be written to by another process, then you cannot open it with FileShare.Read: at some point between the various operations you are doing, the file could be changed, and your operations would give the wrong results.
CopyFileEx can succeed because it prevents any writes that happen during the short period it is operating from affecting the result. There would be no way to offer a more general form of this, because there's no way to know that you are going to close the stream again quickly.
Is there a workaround?
A workaround for what? That you can't open a stream, or that you can copy the file? For the former, the latter provides just such a workaround: copy the file to get a snapshot of how it was, though note that even that snapshot isn't guaranteed to be consistent.
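In code, the copy-then-read workaround looks like this (the paths are placeholders):

```csharp
using System.IO;

// Copy first, then read the snapshot at leisure; the original writer can
// keep going. The snapshot is best-effort, not a guaranteed consistent
// point-in-time image.
string src = "data.bin";                                    // placeholder path
string snapshot = Path.Combine(Path.GetTempPath(), "data.snapshot");
File.Copy(src, snapshot, overwrite: true);                  // CopyFileEx on Windows
using var fs = File.Open(snapshot, FileMode.Open,
    FileAccess.Read, FileShare.Read);
// ... read from fs ...
```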

Lock a file while retaining the ability to read/append/write/truncate in the same thread?

I have a file containing, roughly speaking, the state of the application.
I want to implement the following behaviour:
When the application is started, lock the file so that no other applications (or user itself) will be able to modify it;
Read the previous application state from the file;
... do work ...
Update the file with a new state (which, given the format of the file, involves rewriting the entire file; the length of the file may decrease after the operation);
... do work ...
Update the file again
... do work ...
If the work failed (application crashed), the lock is taken off, and the content of the file is left as it was after the previous unit of work executed.
It seems that, to rewrite the file, one should open it with the Truncate option; that means opening a new FileStream each time one wants to rewrite the file. So it seems the behaviour I want could only be achieved in a dirty way like this:
When the application is started, read the file, then open the FileStream with the FileShare.Read;
When some work is done, close the handle opened previously, open another FileStream with the FileMode.Truncate and FileShare.Read, write the data and flush the FileStream.
When some work is done, close the handle opened previously, open another FileStream with the FileMode.Truncate and FileShare.Read, write the data and flush the FileStream.
On the Dispose, close the handle opened previously.
Such an approach has some disadvantages: extra FileStreams are opened; file integrity is not guaranteed between one FileStream's close and the next one's open; and the code is much more complicated.
Is there any other way, lacking these disadvantages?
Don't close and reopen the file. Instead, use FileStream.SetLength(0) to truncate the file to zero length when you want to rewrite it.
You might (or might not) also need to set FileStream.Position to zero. The documentation doesn't make it clear whether SetLength moves the file pointer or not.
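A sketch of this single-stream rewrite, with Position reset explicitly since the documentation is unclear about whether SetLength moves the file pointer:

```csharp
using System;
using System.IO;
using System.Text;

class StateFile : IDisposable
{
    private readonly FileStream _fs;

    public StateFile(string path) =>
        _fs = new FileStream(path, FileMode.OpenOrCreate,
            FileAccess.ReadWrite, FileShare.Read);   // held for the app's lifetime

    public void Rewrite(string newState)
    {
        _fs.SetLength(0);          // truncate in place, no reopen needed
        _fs.Position = 0;          // be explicit; don't rely on SetLength
        var bytes = Encoding.UTF8.GetBytes(newState);
        _fs.Write(bytes, 0, bytes.Length);
        _fs.Flush(flushToDisk: true);   // push the new state to disk
    }

    public void Dispose() => _fs.Dispose();
}
```

Because the stream is never closed between rewrites, the file stays locked against other writers for the whole application lifetime, which is exactly the behaviour the question asks for.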
Why don't you take exclusive access to the file when the application starts, and create an in-memory cache of the file that can be shared across all threads in the process, while the actual file remains locked at the OS level? You can use lock(memoryStream) to avoid concurrency issues. When you are done updating the local in-memory version of the file, just write it back to disk and release the lock.

Multiple Threads reading from the same file

I have an XML file that needs to be read from many, many times. I am trying to use Parallel.ForEach to speed this process up, since the order in which the data is read is irrelevant; it is just being used to populate objects. My problem is that even though each thread opens the file as read-only, I get complaints that it is open by another program. (I don't have it open in a text editor or anything. :))
How can I accomplish multi reads from the same file?
EDIT: The file is ~18 KB, pretty small. It is read about 1,800 times.
If you want multiple threads to read from the same file, you need to specify FileShare.Read:
using (var stream = File.Open("theFile.xml", FileMode.Open, FileAccess.Read, FileShare.Read))
{
...
}
However, you will not achieve any speedup from this, for multiple reasons:
Your hard disk can only read one thing at a time. Although you have multiple threads running at the same time, these threads will all end up waiting for each other.
You cannot easily parse a part of an XML file. You will usually have to parse the entire XML file every time. Since you have multiple threads reading it all the time, it seems that you are not expecting the file to change. If that is the case, then why do you need to read it multiple times?
Depending on the size of the file and the type of reads you are doing it might be faster to load the file into memory first, and then provide access to it directly to your threads.
You didn't provide any specifics on the file, the reads, etc., so I can't say for sure whether it would address your specific needs.
The general premise would be to load the file once in a single thread, and then either directly (via the Xml structure) or indirectly (via XmlNodes, etc) provide access to the file to each of your threads. I envision something similar to:
Load the file
For each XPath query, dispatch the matching nodes to your threads.
If the threads don't modify the XML directly, this might be a viable alternative.
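The load-once idea above could be sketched like this, assuming a hypothetical element name "item" for the nodes being dispatched:

```csharp
using System.Linq;
using System.Threading.Tasks;
using System.Xml.Linq;

class LoadOnce
{
    static void Main()
    {
        var doc = XDocument.Load("theFile.xml");     // single read from disk
        var nodes = doc.Descendants("item").ToList(); // "item" is a placeholder

        Parallel.ForEach(nodes, node =>
        {
            // Populate objects from each node. Reads only, no mutation,
            // so no locking of the shared document is needed.
            var obj = new { Id = (string?)node.Attribute("id") };
        });
    }
}
```

This removes the repeated disk reads entirely, which for an ~18 KB file read 1,800 times is likely the bigger win than parallelising the file opens.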
When you open the file, you need to specify FileShare.Read:
using (var stream = new FileStream("theFile.xml", FileMode.Open, FileAccess.Read, FileShare.Read))
{
...
}
That way the file can be opened multiple times for reading.
While this is an old post, it seems to be a popular one, so I thought I would add a solution that I have used to good effect in multi-threaded environments that need read access to a file. The file must, however, be small enough to hold in memory at least for the duration of your processing, and it must only be read, not written to, during the period of shared access.
string FileName = "TextFile.txt";
string[] FileContents = File.ReadAllLines(FileName);
foreach (string strOneLine in FileContents)
{
// Do work on each line of the file here
}
So long as the file is only being read, multiple threads or programs can access and process it at the same time without treading on one another's toes.

Read from a growing file in C#?

In C#/.NET (on Windows), is there a way to read a "growing" file using a file stream? The file will be very short when the stream is opened, but it will be written to by another thread. If/when the filestream "catches up" to the writer (i.e. when Read() returns 0 bytes read), I want to pause to allow the file to buffer a bit, then continue reading.
I don't really want to use a FileSystemWatcher and keep creating new file streams (as has been suggested for log files), since this isn't a log file (it's a video file being encoded on the fly) and performance is an issue.
You can do this, but you need to keep careful track of the file read and write positions using Stream.Seek and with appropriate synchronization between the threads. Typically you would use an EventWaitHandle or subclass thereof to do the synchronization for data, and you would also need to consider synchronization for the access to the FileStream object itself (probably via a lock statement).
Update: In answering this question I implemented something similar: a situation where a file was being downloaded in the background while also being uploaded at the same time. I used memory buffers and posted a gist with working code. (It's GPL, but that might not matter for you; in any case, you can use the principles to do your own thing.)
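A minimal sketch of the track-your-own-position approach described above; the buffer size and sleep interval are arbitrary choices to tune for a video-encoding workload:

```csharp
using System;
using System.IO;
using System.Threading;

class GrowingFileReader
{
    // Reads a file that another thread/process is still appending to,
    // handing each chunk to onData as it arrives.
    public static void Tail(string path, Action<byte[], int> onData,
                            CancellationToken token)
    {
        using var fs = new FileStream(path, FileMode.Open,
            FileAccess.Read, FileShare.ReadWrite);  // tolerate the writer
        var buffer = new byte[64 * 1024];

        while (!token.IsCancellationRequested)
        {
            int read = fs.Read(buffer, 0, buffer.Length);
            if (read == 0)
                Thread.Sleep(100);   // caught up with the writer; let it buffer
            else
                onData(buffer, read);
        }
    }
}
```

The FileStream keeps its own position, so there is no need to Seek unless a second thread shares the same stream, in which case the synchronization mentioned above becomes necessary.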
This worked for me with a StreamReader around the file, using the following steps:
In the program that writes to the file, open it with read sharing, like this:
var writer = new StreamWriter(File.Open("logFile.txt", FileMode.OpenOrCreate, FileAccess.Write, FileShare.Read));
In the program that reads the file, open it with read-write sharing, like this:
using (FileStream fileStream = File.Open("logFile.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (var file = new StreamReader(fileStream))
Before accessing the input stream, check whether the end has been reached, and if so, wait around a while.
while (file.EndOfStream)
{
Thread.Sleep(5);
}
The way I solved this is by using the FileSystemWatcher class: when it triggers on the file you want, you open a FileStream and read it to the end. When I'm done reading, I save the position of the reader, so the next time the FileSystemWatcher triggers, I open a stream and set the position to where I was last time.
Calling FileStream.Length is actually very slow; even so, I have had no performance issues with my solution (I was reading a "log" ranging from 10 MB to 50-ish).
To me the solution I describe is very simple and easy to maintain; I would try it and profile it. I don't think you're going to get any performance issues from it. I do this while people are playing a multi-threaded game that takes their entire CPU, and nobody has complained that my parser is more demanding than the competing parsers.
One other thing that might be useful: the Stream class has a ReadTimeout property, defined as:
"Gets or sets a value, in milliseconds, that determines how long the stream will attempt to read before timing out." (inherited from Stream)
This could be useful in that, when your reads catch up to your writes, the reading thread could pause while the write buffer gets flushed. (Note, though, that FileStream reports CanTimeout as false, so check that before relying on the property.) It would certainly be worth writing a small test to see whether this helps your cause in any way.
Are the read and write operations happening in the same process? If so, you could write your own abstraction over the file and then add cross-thread communication, so that the thread performing the writes can notify the thread performing the reads when it is done. That way the reading thread knows when to stop reading once it reaches EOF.
