Multiple threads reading from the same file - C#

I have an XML file that needs to be read from many, many times. I am trying to use Parallel.ForEach to speed this process up, since the order in which the data is read does not matter; it is just being used to populate objects. My problem is that even though each thread opens the file as read-only, it complains that the file is open by another program. (I don't have it open in a text editor or anything :))
How can I accomplish multi reads from the same file?
EDIT: The file is ~18KB, so it's pretty small. It is read about 1,800 times.
Thanks

If you want multiple threads to read from the same file, you need to specify FileShare.Read:
using (var stream = File.Open("theFile.xml", FileMode.Open, FileAccess.Read, FileShare.Read))
{
...
}
However, you will not achieve any speedup from this, for multiple reasons:
Your hard disk can only read one thing at a time. Although you have multiple threads running at the same time, these threads will all end up waiting for each other.
You cannot easily parse a part of an XML file. You will usually have to parse the entire XML file every time. Since you have multiple threads reading it all the time, it seems that you are not expecting the file to change. If that is the case, then why do you need to read it multiple times?

Depending on the size of the file and the type of reads you are doing, it might be faster to load the file into memory first and then provide access to it directly to your threads.
You didn't provide any specifics on the file, the reads, etc., so I can't say for sure whether it would address your specific needs.
The general premise would be to load the file once in a single thread, and then either directly (via the Xml structure) or indirectly (via XmlNodes, etc) provide access to the file to each of your threads. I envision something similar to:
Load the file
For each XPath query, dispatch the matching nodes to your threads.
If the threads don't modify the XML directly, this might be a viable alternative.
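A rough sketch of that idea, using LINQ to XML (the element name "item" and the anonymous type are placeholders for your real schema and objects):
using System.Linq;
using System.Threading.Tasks;
using System.Xml.Linq;

XDocument doc = XDocument.Load("theFile.xml");        // single read from disk

var items = doc.Descendants("item").ToList();         // "item" is a placeholder element name

Parallel.ForEach(items, element =>
{
    // Each thread only reads the in-memory tree; no file handle is needed here.
    var obj = new { Id = (string)element.Attribute("id"), Value = element.Value };
    // ... populate your real objects instead of the anonymous type above
});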

When you open the file, you need to specify FileShare.Read:
using (var stream = new FileStream("theFile.xml", FileMode.Open, FileAccess.Read, FileShare.Read))
{
...
}
That way the file can be opened multiple times for reading.

While this is an old post, it seems to be a popular one, so I thought I would add a solution that I have used to good effect in multi-threaded environments that need read access to a file. The file must, however, be small enough to hold in memory, at least for the duration of your processing, and the file must only be read, not written to, during the period of shared access.
string FileName = "TextFile.txt";
string[] FileContents = File.ReadAllLines(FileName);
foreach (string strOneLine in FileContents)
{
// Do work on each line of the file here
}
So long as the file is only being read, multiple threads or programs can access and process it at the same time without treading on one another's toes.
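If the per-line work is heavy enough to be worth parallelising, that same in-memory copy can be handed to multiple threads directly; a rough sketch:
using System.IO;
using System.Threading.Tasks;

string[] fileContents = File.ReadAllLines("TextFile.txt");   // one read from disk, then memory only

Parallel.ForEach(fileContents, strOneLine =>
{
    // Do work on each line of the file here; nothing touches the file itself.
});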

Related

Is it thread-safe to copy a file to different destinations with one thread each

I need to load the content of a file on different computers at the same time. Because a StreamReader will occupy the file, I want to copy it to a temporary folder before opening it. (The title is more general as there should be no difference between two threads running on one computer and two computers running one thread each.)
Question: will two threads copying a file at the same time affect each other even when the copy destinations are separated?
It's safe to read a file from multiple threads/processes/machines, as long as there is no one writing to the file at the same time.
Don't - it's much easier to just use the correct arguments when creating your file stream.
The key is the FileShare setting - it says what kinds of operations are allowed on the file you have opened. Specify FileShare.Read, and any number of concurrent read operations (that also have a FileShare.Read) on the same file will work just fine.
It's as simple as
File.Open(path, FileMode.Open, FileAccess.Read, FileShare.Read)
Your operation is reading. It is like two people reading the same book at the same time, so it is thread safe.
However, if two threads write notes to the file, it is not thread safe. It is like two people writing notes in the same notebook: they will just disturb each other, and the result will be garbled.
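To tie that back to the question, a minimal sketch of concurrent copies to separate destinations (the paths are illustrative); File.Copy only reads the source, so the copies do not conflict with each other:
using System.IO;
using System.Threading.Tasks;

string source = @"C:\data\source.xml";
string[] destinations = { @"C:\temp\copy1.xml", @"C:\temp\copy2.xml" };

Parallel.ForEach(destinations, destination =>
{
    File.Copy(source, destination, overwrite: true);   // each task writes only to its own target
});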

Most efficient way of reading file

I have a file which contains a certain number of fixed length rows having some numbers. I need to read each row in order to get that number and process them and write to a file.
Since I need to read each row, as the number of rows increases it becomes time consuming.
Is there an efficient way of reading each row of the file? I'm using C#.
File.ReadLines (.NET 4.0+) is probably the most memory efficient way to do this.
It returns an IEnumerable<string> meaning that lines will get read lazily in a streaming fashion.
Previous versions do not have the streaming option available in this manner, but using StreamReader to read line by line would achieve the same.
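A minimal sketch of the lazy approach (the file name is illustrative); each line is pulled from disk only as the loop asks for it:
using System;
using System.IO;

foreach (string line in File.ReadLines("numbers.txt"))
{
    int value = int.Parse(line);   // the fixed-length rows contain numbers
    // ... process value and write the result out as needed
}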
Reading all rows from a file is always at least O(n). When file size starts becoming an issue, it's probably a good time to look at creating a database for the information instead of flat files.
Not sure this is the most efficient, but it works well for me:
http://msdn.microsoft.com/en-us/library/system.io.fileinfo.aspx
//Declare a new file and give it the path to your file
FileInfo fi1 = new FileInfo(path);
//Open the file and read the text
using (StreamReader sr = fi1.OpenText())
{
string s = "";
// Loop through each line
while ((s = sr.ReadLine()) != null)
{
//Here is where you handle your row in the file
Console.WriteLine(s);
}
}
No matter which operating system you're using, there will be several layers between your code and the actual storage mechanism. Hard drives and tape drives store files in blocks, which these days are usually around 4K each. If you want to read one byte, the device will still read the entire block into memory -- it's just faster that way. The device and the OS also may each keep a cache of blocks. So there's not much you can do to change the standard (highly optimized) file reading behavior; just read the file as you need it and let the system take care of the rest.
If the time to process the file is becoming a problem, two options that might help are:
Try to arrange to use shorter files. It sounds like you're processing log files or something -- running your program more frequently might help to at least give the appearance of better performance.
Change the way the data is stored. Again, I understand that the file comes from some external source, but perhaps you can arrange for a job to run that periodically converts the raw file to something that you can read more quickly.
Good luck.

How do I avoid excessive Network File I/O when appending to a large file with .NET?

I have a program that opens a large binary file, appends a small amount of data to it, and closes the file.
FileStream fs = File.Open( "\\\\s1\\temp\\test.tmp", FileMode.Append, FileAccess.Write, FileShare.None );
fs.Write( data, 0, data.Length );
fs.Close();
If test.tmp is 5MB before this program is run and the data array is 100 bytes, this program will cause over 5MB of data to be transmitted across the network. I would have expected that the data already in the file would not be transmitted across the network since I'm not reading it or writing it. Is there any way to avoid this behavior? This makes it agonizingly slow to append to very large files.
0xA3 provided the answer in a comment above. The poor performance was due to an on-access virus scan. Each time my program opened the file, the virus scanner read the entire contents of the file to check for viruses, even though my program didn't read any of the existing content. Disabling the on-access virus scan eliminated the excessive network I/O and the poor performance.
Thanks to everyone for your suggestions.
I found this on MSDN (CreateFile is called internally):
When an application creates a file across a network, it is better to use GENERIC_READ | GENERIC_WRITE for dwDesiredAccess than to use GENERIC_WRITE alone. The resulting code is faster, because the redirector can use the cache manager and send fewer SMBs with more data. This combination also avoids an issue where writing to a file across a network can occasionally return ERROR_ACCESS_DENIED.
Using Reflector, FileAccess maps to dwDesiredAccess, so it would seem to suggest using FileAccess.ReadWrite instead of just FileAccess.Write.
I have no idea if this will help :)
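If you want to experiment with it, a sketch might look like this; note that FileMode.Append only allows FileAccess.Write, so you would open the file normally and seek to the end yourself:
using (FileStream fs = new FileStream( "\\\\s1\\temp\\test.tmp", FileMode.Open, FileAccess.ReadWrite, FileShare.None ))
{
    fs.Seek( 0, SeekOrigin.End );       // position at the end, as Append would
    fs.Write( data, 0, data.Length );   // data is the byte[] from the question
}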
You could cache your data into a local buffer and periodically (much less often than now) append to the large file. This would save on a bunch of network transfers but... This would also increase the risk of losing that cache (and your data) in case your app crashes.
Logging (if that's what it is) of this type is often stored in a db. Using a decent RDBMS would allow you to post that 100 bytes of data very frequently with minimal overhead. The caveat there is the maintenance of an RDBMS.
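A rough sketch of the buffering idea (the class name, threshold, and flush-by-size policy are all made up; a timer would work just as well):
using System.IO;

static class BufferedAppender
{
    static readonly MemoryStream pending = new MemoryStream();
    const int FlushThreshold = 64 * 1024;   // e.g. flush roughly every 64 KB

    public static void Append(string path, byte[] data)
    {
        pending.Write(data, 0, data.Length);
        if (pending.Length >= FlushThreshold)
        {
            using (var fs = new FileStream(path, FileMode.Append, FileAccess.Write, FileShare.None))
            {
                pending.WriteTo(fs);        // one larger transfer instead of many tiny ones
            }
            pending.SetLength(0);           // reset the in-memory buffer
        }
    }
}
As noted above, anything still sitting in the buffer is lost if the app crashes before the next flush.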
If you have system access or perhaps a friendly admin for the machine actually hosting the file you could make a small listener program that sits on the other end.
You make a call to it passing just the data to be written and it does the write locally, avoiding the extra network traffic.
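A very rough sketch of such a listener (the port, path, and names are invented): it accepts raw bytes over TCP and appends them to the file locally, so only the new data crosses the network.
using System.IO;
using System.Net;
using System.Net.Sockets;

class AppendListener
{
    static void Main()
    {
        var listener = new TcpListener(IPAddress.Any, 9000);
        listener.Start();
        while (true)
        {
            using (TcpClient client = listener.AcceptTcpClient())
            using (NetworkStream incoming = client.GetStream())
            using (var file = new FileStream(@"C:\temp\test.tmp", FileMode.Append, FileAccess.Write, FileShare.Read))
            {
                incoming.CopyTo(file);   // append whatever the sender pushed
            }
        }
    }
}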
The File object in .NET has quite a few static methods to handle this type of thing. I would suggest trying:
File.AppendAllText("FilePath", "What to append", Encoding.UTF8);
When you reflect this method it turns out that it's using:
using (StreamWriter writer = new StreamWriter(path, true, encoding))
{
writer.Write(contents);
}
This StreamWriter method should allow you to simply append something to the end (at least this is the method I've seen used in every instance of logging that I've encountered so far).
Write the data to separate files, then join them (do it on the hosting machine if possible) only when necessary.
I did some googling and was looking more at how to read excessively large files quickly and found this link https://web.archive.org/web/20190906152821/http://www.4guysfromrolla.com/webtech/010401-1.shtml
The most interesting part there would be the part about byte reading:
Besides the more commonly used ReadAll and ReadLine methods, the TextStream object also supports a Read(n) method, where n is the number of bytes in the file/textstream in question. By instantiating an additional object (a file object), we can obtain the size of the file to be read, and then use the Read(n) method to race through our file. As it turns out, the "read bytes" method is extremely fast by comparison:
const ForReading = 1
const TristateFalse = 0
dim strSearchThis
dim objFS
dim objFile
dim objTS
set objFS = Server.CreateObject("Scripting.FileSystemObject")
set objFile = objFS.GetFile(Server.MapPath("myfile.txt"))
set objTS = objFile.OpenAsTextStream(ForReading, TristateFalse)
strSearchThis = objTS.Read(objFile.Size)
if instr(strSearchThis, "keyword") > 0 then
Response.Write "Found it!"
end if
You could then use this method to go to the end of the file and manually append to it, instead of loading the entire file in append mode with a FileStream.

Read from a growing file in C#?

In C#/.NET (on Windows) is there a way to read a "growing" file using a file stream? The length of the file will be very small when the filestream is opened, but the file will be being written to by another thread. If/when the filestream "catches up" to the other thread (i.e. when Read() returns 0 bytes read), I want to pause to allow the file to buffer a bit, then continue reading.
I don't really want to use a FileSystemWatcher and keep creating new file streams (as was suggested for log files), since this isn't a log file (it's a video file being encoded on the fly) and performance is an issue.
Thanks,
Robert
You can do this, but you need to keep careful track of the file read and write positions using Stream.Seek and with appropriate synchronization between the threads. Typically you would use an EventWaitHandle or subclass thereof to do the synchronization for data, and you would also need to consider synchronization for the access to the FileStream object itself (probably via a lock statement).
Update: In answering this question I implemented something similar - a situation where a file was being downloaded in the background and also being uploaded at the same time. I used memory buffers, and posted a gist which has working code. (It's GPL but that might not matter for you - in any case you can use the principles to do your own thing.)
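Not the gist's actual code, but a very rough sketch of the scheme described above: the writer signals an AutoResetEvent after each append, and the reader resumes from its last known position.
using System.IO;
using System.Threading;

class GrowingFileReader
{
    readonly AutoResetEvent dataWritten = new AutoResetEvent(false);
    readonly object sync = new object();
    long readPosition;

    // The writing thread calls this after each append.
    public void NotifyDataWritten() => dataWritten.Set();

    // The reading thread calls this in a loop; it blocks until new data is signalled.
    public int ReadNext(string path, byte[] buffer)
    {
        dataWritten.WaitOne();
        lock (sync)
        {
            using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
            {
                fs.Seek(readPosition, SeekOrigin.Begin);   // continue where we left off
                int read = fs.Read(buffer, 0, buffer.Length);
                readPosition += read;
                return read;
            }
        }
    }
}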
This worked with a StreamReader around a file, with the following steps:
In the program that writes to the file, open it with read sharing, like this:
var writer = new StreamWriter(File.Open("logFile.txt", FileMode.OpenOrCreate, FileAccess.Write, FileShare.Read));
In the program that reads the file, open it with read-write sharing, like this:
using (FileStream fileStream = File.Open("logFile.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using ( var file = new StreamReader(fileStream))
Before accessing the input stream, check whether the end has been reached, and if so, wait around a while.
while (file.EndOfStream)
{
Thread.Sleep(5);
}
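Putting the three steps together, the reading side ends up looking roughly like this (how you decide to stop is up to you):
using System.IO;
using System.Threading;

using (FileStream fileStream = File.Open("logFile.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (var file = new StreamReader(fileStream))
{
    while (true)
    {
        while (file.EndOfStream)
        {
            Thread.Sleep(5);            // let the writer get ahead again
        }
        string line = file.ReadLine();
        // ... process the line that just became available
    }
}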
The way I solved this is using the DirectoryWatcher / FileSystemWatcher class: when it triggers on the file you want, you open a FileStream and read it to the end. When I'm done reading, I save the position of the reader, so the next time the DirectoryWatcher / FileSystemWatcher triggers, I open a stream and set the position to where I was last time.
Calling FileStream.Length is actually very slow; I have had no performance issues with my solution (I was reading a "log" ranging from 10 MB to 50 or so).
To me, the solution I describe is very simple and easy to maintain; I would try it and profile it. I don't think you're going to get any performance issues from it. I do this while people are playing a multi-threaded game that takes their entire CPU, and nobody has complained that my parser is more demanding than the competing parsers.
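In case it helps, a loose sketch of that approach (paths and filter are illustrative):
using System;
using System.IO;

long lastPosition = 0;

var watcher = new FileSystemWatcher(@"C:\logs", "game.log");
watcher.NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.Size;
watcher.Changed += (sender, e) =>
{
    using (var fs = new FileStream(e.FullPath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    using (var reader = new StreamReader(fs))
    {
        fs.Seek(lastPosition, SeekOrigin.Begin);   // skip what was already parsed
        string newText = reader.ReadToEnd();
        lastPosition = fs.Position;                // remember where we stopped
        // ... parse newText here
    }
};
watcher.EnableRaisingEvents = true;

Console.ReadLine();   // keep the watcher alive for this sketch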
One other thing that might be useful is that the FileStream class has a property called ReadTimeout, which is defined as:
Gets or sets a value, in milliseconds, that determines how long the stream will attempt to read before timing out. (Inherited from Stream.)
This could be useful in that when your reads catch up to your writes, the thread performing the reads may pause while the write buffer gets flushed. It would certainly be worth writing a small test to see if this property would help your cause in any way.
Are the read and write operations happening on the same object? If so, you could write your own abstractions over the file and then write cross-thread communication code such that the thread performing the writes notifies the thread performing the reads when it is done, so that the reading thread knows when to stop when it reaches EOF.

How to avoid File Blocking

We are monitoring the progress of a customized app (whose source is not under our control) which writes to an XML manifest. At times, the application gets stuck because it is unable to write to the manifest file. Although we are covering our tracks by explicitly closing the file handle with Close and also creating the file variables in using blocks, somehow it keeps happening. (Our application is multithreaded, and at most three threads might be accessing the file.)
Another interesting thing is that their app updates this manifest on three different events (adding items, deleting items, completion of items), but we only have trouble with one event (completion of items). My code is listed here:
using (var st = new FileStream(MenifestPath, FileMode.Open, FileAccess.Read))
{
using (TextReader r = new StreamReader(st))
{
var xml = r.ReadToEnd();
r.Close();
st.Close();
//................ Rest of our operations
}
}
If you are only reading from the file, then you should be able to pass a flag to specify the sharing mode. I don't know how you specify this in .NET, but in WinAPI you'd pass FILE_SHARE_READ | FILE_SHARE_WRITE to CreateFile().
I suggest you check your file API documentation to see where it mentions sharing modes.
Two things:
You should do the rest of your operations outside the scopes of the using statements. This way, you won't risk using the closed stream and reader. Also, you needn't use the Close methods, because when you exit the scope of the using statement, Dispose is called, which is equivalent.
You should use the overload that has the FileShare enumeration. Locking is paranoid in nature, so the file may be locked automatically to protect you from yourself. :)
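For example, something along these lines (keeping the question's variable names), so the other application can still open the manifest for writing while you read:
string xml;
using (var st = new FileStream(MenifestPath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (TextReader r = new StreamReader(st))
{
    xml = r.ReadToEnd();
}
//................ Rest of our operations, using xml, outside the using blocks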
HTH.
The problem is different because that person has full control over file access for all processes, while, as I mentioned, ONE PROCESS IS THIRD PARTY WITH NO SOURCE ACCESS. Our applications are working fine; however, their application seems to get stuck if it can't get hold of the file. So I am looking for a method of file access that does not disturb their application.
This could happen if one thread was attempting to read from the file while another was writing. To avoid this type of situation, where you want multiple readers but only one writer at a time, make use of the ReaderWriterLock class, or the ReaderWriterLockSlim class (added in .NET 3.5), in the System.Threading namespace.
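A small sketch of that idea (the class and member names are made up; this only coordinates your own threads, not the third-party writer): many threads may read at once, while writes take the lock exclusively.
using System.IO;
using System.Threading;

static class ManifestAccess
{
    static readonly ReaderWriterLockSlim gate = new ReaderWriterLockSlim();

    public static string Read(string path)
    {
        gate.EnterReadLock();
        try { return File.ReadAllText(path); }
        finally { gate.ExitReadLock(); }
    }

    public static void Write(string path, string contents)
    {
        gate.EnterWriteLock();
        try { File.WriteAllText(path, contents); }
        finally { gate.ExitWriteLock(); }
    }
}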
Also, if you're using .NET 2.0+, you can simplify your code to just:
string xmlText = File.ReadAllText(ManifestFile);
See also: File.ReadAllText on MSDN.
