Read from a growing file in C#?

In C#/.NET (on Windows) is there a way to read a "growing" file using a file stream? The length of the file will be very small when the filestream is opened, but another thread will be writing to the file. If/when the filestream "catches up" to the other thread (i.e. when Read() returns 0 bytes read), I want to pause to allow the file to buffer a bit, then continue reading.
I don't really want to use a FileSystemWatcher and keep creating new file streams (as was suggested for log files), since this isn't a log file (it's a video file being encoded on the fly) and performance is an issue.
Thanks,
Robert

You can do this, but you need to keep careful track of the file read and write positions using Stream.Seek and with appropriate synchronization between the threads. Typically you would use an EventWaitHandle or subclass thereof to do the synchronization for data, and you would also need to consider synchronization for the access to the FileStream object itself (probably via a lock statement).
Update: In answering this question I implemented something similar - a situation where a file was being downloaded in the background and also being uploaded at the same time. I used memory buffers, and posted a gist which has working code. (It's GPL but that might not matter for you - in any case you can use the principles to do your own thing.)
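As a rough illustration of the approach (this is not the code from the gist): the names dataAvailable, fileLock and writerFinished below are made up for the sketch, and the writer thread is assumed to call dataAvailable.Set() after each flush and set writerFinished once it has finished writing.

using System.IO;
using System.Threading;

class GrowingFileReader
{
    // Signalled by the writer thread after each chunk has been written and flushed.
    static readonly AutoResetEvent dataAvailable = new AutoResetEvent(false);
    static readonly object fileLock = new object();
    static volatile bool writerFinished;

    static void ReadLoop(string path)
    {
        long readPosition = 0;
        var buffer = new byte[64 * 1024];

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            while (true)
            {
                int bytesRead;
                lock (fileLock)                        // guard the stream if other code touches it
                {
                    fs.Seek(readPosition, SeekOrigin.Begin);
                    bytesRead = fs.Read(buffer, 0, buffer.Length);
                    readPosition += bytesRead;
                }

                if (bytesRead == 0)
                {
                    if (writerFinished) break;         // caught up and nothing more is coming
                    dataAvailable.WaitOne(100);        // pause until the writer signals new data
                    continue;
                }

                // ... hand buffer[0..bytesRead) to the consumer here ...
            }
        }
    }
}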

This worked with a StreamReader around a file, with the following steps:
In the program that writes to the file, open it with read sharing, like this:
var output = new StreamWriter(File.Open("logFile.txt", FileMode.OpenOrCreate, FileAccess.Write, FileShare.Read));
In the program that reads the file, open it with read-write sharing, like this:
using (FileStream fileStream = File.Open("logFile.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (var file = new StreamReader(fileStream))
Before accessing the input stream, check whether the end has been reached, and if so, wait around a while.
while (file.EndOfStream)
{
    Thread.Sleep(5);
}
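
Putting the three steps together, the reading side might look roughly like this (the endless tail loop and the console output are just for illustration):

using System;
using System.IO;
using System.Threading;

class LogTail
{
    static void Main()
    {
        // Reader side: open with ReadWrite sharing so the writer can keep appending.
        using (var fileStream = File.Open("logFile.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (var file = new StreamReader(fileStream))
        {
            while (true)
            {
                // Wait for the writer to get ahead of us again.
                while (file.EndOfStream)
                {
                    Thread.Sleep(5);
                }

                Console.WriteLine(file.ReadLine());
            }
        }
    }
}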

The way I solved this is using the DirectoryWatcher / FileSystemWatcher class: when it triggers on the file you want, you open a FileStream and read it to the end. When I'm done reading I save the position of the reader, so the next time the DirectoryWatcher / FileSystemWatcher triggers I open a stream and set the position to where I was last time.
Calling FileStream.Length is actually very slow; I have had no performance issues with my solution (I was reading a "log" ranging from 10 MB to 50-ish).
To me the solution I describe is very simple and easy to maintain; I would try it and profile it. I don't think you're going to get any performance issues from it. I do this while people are playing a multi-threaded game that takes their entire CPU, and nobody has complained that my parser is more demanding than the competing parsers.
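A minimal sketch of that idea, assuming FileSystemWatcher and a remembered position (class and member names are made up):

using System.IO;

class WatcherBasedReader
{
    FileSystemWatcher _watcher;
    long _lastPosition;   // remembered between Changed events

    public void Start(string directory, string fileName)
    {
        _watcher = new FileSystemWatcher(directory, fileName)
        {
            NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.Size,
            EnableRaisingEvents = true
        };
        _watcher.Changed += (s, e) => ReadNewData(e.FullPath);
    }

    void ReadNewData(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (var reader = new StreamReader(fs))
        {
            fs.Seek(_lastPosition, SeekOrigin.Begin);   // resume where the last event left off
            string chunk = reader.ReadToEnd();
            // ... parse the new data in 'chunk' ...
            _lastPosition = fs.Position;                // save the position for the next event
        }
    }
}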

One other thing that might be useful is that the FileStream class has a property on it called ReadTimeout, which is defined as:
Gets or sets a value, in milliseconds, that determines how long the stream will attempt to read before timing out. (Inherited from Stream)
This could be useful in that when your reads catch up to your writes, the thread performing the reads may pause while the write buffer gets flushed. It would certainly be worth writing a small test to see if this property would help your cause in any way.
Are the read and write operations happening on the same object? If so, you could write your own abstraction over the file and then write cross-thread communication code such that the thread performing the writes can notify the thread performing the reads when it is done, so that the reading thread knows when to stop reading once it reaches EOF.
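If both sides do live in the same process, one hedged sketch of that kind of notification (using a ManualResetEventSlim that the writer sets when it has finished; all names are illustrative) could be:

using System.IO;
using System.Threading;

class ReaderWriterHandoff
{
    // Set by the writer thread once the last byte has been written and flushed.
    static readonly ManualResetEventSlim writeComplete = new ManualResetEventSlim(false);

    static void Drain(FileStream input)
    {
        var buffer = new byte[8192];
        while (true)
        {
            int read = input.Read(buffer, 0, buffer.Length);
            if (read > 0)
            {
                // ... consume buffer[0..read) ...
                continue;
            }
            if (writeComplete.IsSet) break;   // real EOF: the writer told us it is finished
            Thread.Sleep(10);                 // writer is still going; wait for more data
        }
    }
}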

Related

C# - How to copy a file being written

I have a class implementing a Log file writer.
Logs must be written for the application to "work correctly", so it is of the utmost importance that the writes to disk are OK.
The log file is kept open for the whole life of the application, and write operations are accordingly very fast:
var logFile = new FileInfo(filepath);
_outputStream = logFile.Open(FileMode.Append, FileAccess.Write, FileShare.Read);
Now, I need to synchronize this file to a network path, during application lifetime.
This network copy can be slightly delayed without problems. The important bit is that I have to guarantee that it doesn't interfere with log writing.
Given this network copy must be eventually consistent, I need to make sure that all file contents are written, not only the last message(s).
A previous implementation used heavy locking and a simple System.IO.File.Copy(filepath, networkPath, true), but I would like to lock as little as possible.
How could I approach this problem? I'm out of ideas.

c# FileStream.Lock what happens?

I've recently read in a book about FileStream.Lock, which locks part or all of a file. I understand this part; however, I'm confused about what happens next.
If you create a FileStream object and then lock the file:
fileStream.Lock(0, fileStream.Length);
Can you only use that FileStream object to write to the file? Even if only part of the file is locked:
fileStream.Lock(0, 2);
The impression I got from the book is that you can create another FileStream object and buffer text to be flushed, but flushing before releasing the lock causes an Exception.
Whilst the file is locked: can you only use the original FileStream to write to the file, and is this writing restricted to the locked portion?
If there are different answers for both partial locking and full file content locking then I'd like to hear both.
I'm curious whether you can create multiple FileStreams on a file, each with partial locks. Any and all information to help make FileStream.Lock clear is appreciated.
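Not an authoritative answer, but a small experiment along the lines described can settle most of these questions empirically; the file name and byte values below are arbitrary, and the expectations in the comments are assumptions, not guarantees:

using System;
using System.IO;

class LockExperiment
{
    static void Main()
    {
        using (var writer = new FileStream("shared.dat", FileMode.Create,
                                           FileAccess.ReadWrite, FileShare.ReadWrite))
        using (var other = new FileStream("shared.dat", FileMode.Open,
                                          FileAccess.ReadWrite, FileShare.ReadWrite))
        {
            writer.Write(new byte[] { 1, 2, 3, 4 }, 0, 4);
            writer.Flush();

            writer.Lock(0, 2);                      // partial lock on bytes 0..1
            try
            {
                other.Seek(2, SeekOrigin.Begin);    // outside the locked range
                other.WriteByte(9);
                other.Flush();                      // expected to succeed

                other.Seek(0, SeekOrigin.Begin);    // inside the locked range
                other.WriteByte(9);
                other.Flush();                      // expected to fail with IOException
            }
            catch (IOException ex)
            {
                Console.WriteLine("Locked region: " + ex.Message);
            }
            finally
            {
                writer.Unlock(0, 2);
            }
        }
    }
}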

Lock a file while retaining the ability to read/append/write/truncate in the same thread?

I have a file containing, roughly speaking, the state of the application.
I want to implement the following behaviour:
When the application is started, lock the file so that no other applications (or user itself) will be able to modify it;
Read the previous application state from the file;
... do work ...
Update the file with a new state (which, given the format of the file, involves rewriting the entire file; the length of the file may decrease after the operation);
... do work ...
Update the file again
... do work ...
If the work failed (application crashed), the lock is taken off, and the content of the file is left as it was after the previous unit of work executed.
It seems that, to rewrite the file, one should open it with the Truncate option; that means one should open a new FileStream each time they want to rewrite the file. So it seems the behaviour I want can only be achieved in such a dirty way:
When the application is started, read the file, then open the FileStream with the FileShare.Read;
When some work is done, close the handle opened previously, open another FileStream with the FileMode.Truncate and FileShare.Read, write the data and flush the FileStream.
When some work is done, close the handle opened previously, open another FileStream with the FileMode.Truncate and FileShare.Read, write the data and flush the FileStream.
On the Dispose, close the handle opened previously.
Such a way has some disadvantages: extra FileStreams are opened; file integrity is not guaranteed between FileStream close and FileStream open; the code is much more complicated.
Is there any other way, lacking these disadvantages?
Don't close and reopen the file. Instead, use FileStream.SetLength(0) to truncate the file to zero length when you want to rewrite it.
You might (or might not) also need to set FileStream.Position to zero. The documentation doesn't make it clear whether SetLength moves the file pointer or not.
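A sketch of that rewrite-in-place pattern (the helper name and the string payload are placeholders):

using System.IO;
using System.Text;

static class StateFileWriter
{
    public static void Rewrite(FileStream stateFile, string newState)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(newState);

        stateFile.SetLength(0);      // truncate in place; the handle and its lock stay intact
        stateFile.Position = 0;      // set explicitly, in case SetLength leaves the pointer elsewhere
        stateFile.Write(bytes, 0, bytes.Length);
        stateFile.Flush(true);       // flush to disk so a crash preserves the last completed state
    }
}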
Why don't you take exclusive access to the file when the application starts, and create an in-memory cache of the file that can be shared across all threads in the process while the actual file remains locked at the OS level? You can use lock(memoryStream) to avoid concurrency issues. When you are done updating the local in-memory version of the file, just update the file on disk and release the lock on it.
Regards.
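A rough sketch of that in-memory cache idea (class and member names are made up, and serializing the state into a byte array is left to the caller):

using System.IO;

class CachedStateFile
{
    readonly FileStream _file;                         // held exclusively for the application lifetime
    readonly MemoryStream _memoryStream = new MemoryStream();

    public CachedStateFile(string path)
    {
        _file = new FileStream(path, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None);
        _file.CopyTo(_memoryStream);                   // read the previous state once
    }

    public void Update(byte[] newState)
    {
        lock (_memoryStream)                           // serialise writers inside the process
        {
            _memoryStream.SetLength(0);
            _memoryStream.Write(newState, 0, newState.Length);

            // Push the in-memory version back to the locked file.
            _file.SetLength(0);
            _file.Position = 0;
            _file.Write(newState, 0, newState.Length);
            _file.Flush(true);
        }
    }
}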

Multiple Threads reading from the same file

I have an XML file that needs to be read from many, many times. I am trying to use Parallel.ForEach to speed this process up, since the order in which the data is read doesn't matter. The data is just being used to populate objects. My problem is that even though each thread opens the file as read-only, it complains that the file is open by another program. (I don't have it open in a text editor or anything :))
How can I accomplish multi reads from the same file?
EDIT: The file is ~18 KB, so pretty small. It is read about 1,800 times.
Thanks
If you want multiple threads to read from the same file, you need to specify FileShare.Read:
using (var stream = File.Open("theFile.xml", FileMode.Open, FileAccess.Read, FileShare.Read))
{
...
}
However, you will not achieve any speedup from this, for multiple reasons:
Your hard disk can only read one thing at a time. Although you have multiple threads running at the same time, these threads will all end up waiting for each other.
You cannot easily parse a part of an XML file. You will usually have to parse the entire XML file every time. Since you have multiple threads reading it all the time, it seems that you are not expecting the file to change. If that is the case, then why do you need to read it multiple times?
Depending on the size of the file and the type of reads you are doing it might be faster to load the file into memory first, and then provide access to it directly to your threads.
You didn't provide any specifics on the file, the reads, etc., so I can't say for sure whether it would address your specific needs.
The general premise would be to load the file once in a single thread, and then either directly (via the Xml structure) or indirectly (via XmlNodes, etc) provide access to the file to each of your threads. I envision something similar to:
Load the file
For each Xpath query dispatch the matching nodes to your threads.
If the threads don't modify the XML directly, this might be a viable alternative.
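A rough sketch of that load-once approach (the XPath query, the attribute name and the object population are placeholders):

using System.Linq;
using System.Threading.Tasks;
using System.Xml;

class XmlFanOut
{
    static void Main()
    {
        var doc = new XmlDocument();
        doc.Load("theFile.xml");                                    // one read from disk

        // Hypothetical XPath; each matching node is handed to a worker.
        var nodes = doc.SelectNodes("//item").Cast<XmlNode>().ToList();

        Parallel.ForEach(nodes, node =>
        {
            // The workers only read from the node; nothing modifies the document.
            string name = node.Attributes?["name"]?.Value;
            // ... populate your objects from 'name' and the node's children ...
        });
    }
}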
When you open the file, you need to specify FileShare.Read:
using (var stream = new FileStream("theFile.xml", FileMode.Open, FileAccess.Read, FileShare.Read))
{
...
}
That way the file can be opened multiple times for reading.
While an old post, it seems to be a popular one so I thought I would add a solution that I have used to good effect for multi-threaded environments that need read access to a file. The file must however be small enough to hold in memory at least for the duration of your processing, and the file must only be read and not written to during the period of shared access.
string FileName = "TextFile.txt";
string[] FileContents = File.ReadAllLines(FileName);
foreach (string strOneLine in FileContents)
{
    // Do work on each line of the file here
}
So long as the file is only being read, multiple threads or programs can access and process it at the same time without treading on one another's toes.

How do I avoid excessive Network File I/O when appending to a large file with .NET?

I have a program that opens a large binary file, appends a small amount of data to it, and closes the file.
FileStream fs = File.Open( "\\\\s1\\temp\\test.tmp", FileMode.Append, FileAccess.Write, FileShare.None );
fs.Write( data, 0, data.Length );
fs.Close();
If test.tmp is 5MB before this program is run and the data array is 100 bytes, this program will cause over 5MB of data to be transmitted across the network. I would have expected that the data already in the file would not be transmitted across the network since I'm not reading it or writing it. Is there any way to avoid this behavior? This makes it agonizingly slow to append to very large files.
0xA3 provided the answer in a comment above. The poor performance was due to an on-access virus scan. Each time my program opened the file, the virus scanner read the entire contents of the file to check for viruses, even though my program didn't read any of the existing content. Disabling the on-access virus scan eliminated the excessive network I/O and the poor performance.
Thanks to everyone for your suggestions.
I found this on MSDN (CreateFile is called internally):
When an application creates a file across a network, it is better to use GENERIC_READ | GENERIC_WRITE for dwDesiredAccess than to use GENERIC_WRITE alone. The resulting code is faster, because the redirector can use the cache manager and send fewer SMBs with more data. This combination also avoids an issue where writing to a file across a network can occasionally return ERROR_ACCESS_DENIED.
Using Reflector, FileAccess maps to dwDesiredAccess, so it would seem to suggest using FileAccess.ReadWrite instead of just FileAccess.Write.
I have no idea if this will help :)
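Applied to the snippet from the question, that suggestion would look something like the following (untested; note that FileMode.Append cannot be combined with FileAccess.ReadWrite, so the seek to the end is done by hand):

// Same append, but opened ReadWrite so the redirector can use the cache manager.
using (var fs = new FileStream("\\\\s1\\temp\\test.tmp", FileMode.Open,
                               FileAccess.ReadWrite, FileShare.None))
{
    fs.Seek(0, SeekOrigin.End);        // position at the end, like FileMode.Append would
    fs.Write(data, 0, data.Length);    // 'data' is the array from the original snippet
}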
You could cache your data into a local buffer and periodically (much less often than now) append to the large file. This would save on a bunch of network transfers but... This would also increase the risk of losing that cache (and your data) in case your app crashes.
Logging (if that's what it is) of this type is often stored in a db. Using a decent RDBMS would allow you to post that 100 bytes of data very frequently with minimal overhead. The caveat there is the maintenance of an RDBMS.
If you have system access or perhaps a friendly admin for the machine actually hosting the file you could make a small listener program that sits on the other end.
You make a call to it passing just the data to be written and it does the write locally, avoiding the extra network traffic.
The File object in .NET has quite a few static methods to handle this type of thing. I would suggest trying:
File.AppendAllText("FilePath", "What to append", Encoding.UTF8);
When you reflect this method it turns out that it's using:
using (StreamWriter writer = new StreamWriter(path, true, encoding))
{
    writer.Write(contents);
}
This StreamWriter method should allow you to simply append something to the end (at least this is the method I've seen used in every instance of logging that I've encountered so far).
Write the data to separate files, then join them (do it on the hosting machine if possible) only when necessary.
I did some googling and was looking more at how to read excessively large files quickly and found this link https://web.archive.org/web/20190906152821/http://www.4guysfromrolla.com/webtech/010401-1.shtml
The most interesting part there would be the part about byte reading:
Besides the more commonly used ReadAll and ReadLine methods, the TextStream object also supports a Read(n) method, where n is the number of bytes in the file/textstream in question. By instantiating an additional object (a file object), we can obtain the size of the file to be read, and then use the Read(n) method to race through our file. As it turns out, the "read bytes" method is extremely fast by comparison:
const ForReading = 1
const TristateFalse = 0
dim strSearchThis
dim objFS
dim objFile
dim objTS
set objFS = Server.CreateObject("Scripting.FileSystemObject")
set objFile = objFS.GetFile(Server.MapPath("myfile.txt"))
set objTS = objFile.OpenAsTextStream(ForReading, TristateFalse)
strSearchThis = objTS.Read(objFile.Size)
if instr(strSearchThis, "keyword") > 0 then
    Response.Write "Found it!"
end if
You could then use this method to seek to the end of the file and manually append to it, instead of loading the entire file in append mode with a FileStream.
