C# Sharing file handles across threads

I have a piece of code that is executed by n threads. The code contains:
for (;;) // repeat about 10,000 times
{
    lock (padlock)
    {
        File.AppendAllText(fileName, text);
    }
}
Basically, all threads write to the same set of 10,000 files, so the files are the shared resource. The issue is that the 10,000 open, write, close cycles performed by each thread slow down my program considerably. If I could share the file handles across threads, I'd be able to keep them open and write from different threads. Can someone shed some light on how I can proceed?

Let all threads write to a synchronized list.
Let one thread 'eat' the items on the list and write them to the file with a single writer (a StreamWriter, for example).
Presto, problem solved in exchange for some extra memory usage.

I suggest you open the file using a FileStream and share that instead of the fileName.
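The suggestion above, keeping the files open and sharing them, might look roughly like the following. This is a minimal sketch, not a definitive implementation: the class name SharedFileWriters and its Append method are hypothetical, and it assumes the process is allowed to keep all of the files open at once. It uses a StreamWriter per file rather than a raw FileStream so that text can be written directly.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;

// Hypothetical helper: keeps one open StreamWriter per file and shares it across threads.
sealed class SharedFileWriters : IDisposable
{
    // Lazy<T> ensures each file is opened only once, even if several threads race on the same name.
    private readonly ConcurrentDictionary<string, Lazy<StreamWriter>> _writers =
        new ConcurrentDictionary<string, Lazy<StreamWriter>>();

    public void Append(string fileName, string text)
    {
        StreamWriter writer = _writers.GetOrAdd(
            fileName,
            name => new Lazy<StreamWriter>(() => new StreamWriter(name, append: true))).Value;

        // Lock per writer, so threads writing to different files do not block each other.
        lock (writer)
        {
            writer.Write(text);
        }
    }

    public void Dispose()
    {
        // Disposing flushes any buffered text and closes the underlying file handles.
        foreach (Lazy<StreamWriter> lazy in _writers.Values)
        {
            if (lazy.IsValueCreated)
            {
                lazy.Value.Dispose();
            }
        }
    }
}
```

Each thread would call Append(fileName, text) instead of File.AppendAllText. Note that buffered text is only guaranteed to reach the disk once the writers are disposed (or flushed explicitly).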

Related

How can I lock files to prevent from multithreading [duplicate]

A system puts files in a folder on disk.
I'm writing an executable (C#) that takes these files and sends them to a database.
My executable can be started multiple times at the same time (in parallel), and I have a concurrency problem with processing the files.
Example:
There are 50 files in the folder.
Executable 1 takes 10 files to process.
Executable 2 takes 10 files to process.
My questions are:
How can I be sure that executable 2 doesn't take executable 1's files?
How can I lock the 10 files taken by executable 1?
How can I make this process thread safe?
Instead of using multiple processes, a more elegant solution would be to use one process with multiple threads.
You could, for example, have one thread that lists the files on the disk and adds them to a ConcurrentQueue, with multiple processing threads taking file names from the queue and doing the processing. Or use something like TPL Dataflow to do essentially the same thing. The absolute simplest option might be to create a list of all files and use PLINQ's AsParallel to parallelize the processing.
Note that I/O typically does not parallelize very well, so if the "processing" mostly involves reading from and writing to disk, your gains might be smaller than expected.
If you insist on using multiple processes you could follow the same pattern by using an actor model, or otherwise having one management process handing out work to multiple worker processes.
I would probably not recommend relying only on file-locks, since it would be difficult to ensure that files are not processed twice.
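The single-process queue pattern described above could be sketched as follows. The names FolderProcessor, ProcessAll, and processFile are hypothetical, and the sketch assumes the folder contents are listed once up front rather than watched continuously.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper: one shared queue of file names, several workers draining it.
static class FolderProcessor
{
    public static void ProcessAll(string folder, int workerCount, Action<string> processFile)
    {
        // Fill the queue once; each file name can be dequeued by exactly one worker.
        var queue = new ConcurrentQueue<string>(Directory.EnumerateFiles(folder));

        var workers = new Task[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            workers[i] = Task.Run(() =>
            {
                // TryDequeue returns false once the queue is empty, which ends the worker.
                while (queue.TryDequeue(out string path))
                {
                    processFile(path);
                }
            });
        }

        Task.WaitAll(workers);
    }
}
```

Because every file name leaves the queue exactly once, no file is handed to two workers, and no file locking is needed inside the process.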
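It is better to read the list of all the files first, then divide them between the threads. Otherwise, you can rely on an exception:
You can check whether the file is in use or not.
try
{
    // FileShare.None requests exclusive access, so opening fails if anyone else has the file open.
    using (Stream stream = new FileStream("File.txt", FileMode.Open, FileAccess.Read, FileShare.None))
    {
        // The file is not in use; process it here.
    }
}
catch (IOException)
{
    // The file is in use (or could not be opened for another I/O reason).
}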
Hope this helps!

Is there a better approach for dealing with multiple processes reading/writing to a file

I have 3 processes, each of which listens to a data feed. All 3 processes need to read and update the same file after receiving data. The 3 processes keep running the whole day. Obviously, each process needs to have an exclusive read/write lock on the file.
I could use a named mutex to protect reads and writes of the file, or I could open the file using FileShare.None.
Would these 2 approaches work? Which one is better?
The program is written in C# and runs on Windows.
Use a named mutex for this. If you open the file with FileShare.None in one process, the other processes will get an exception thrown when they attempt to open the file, which means you have to deal with waiting and retrying etc. in these processes.
I agree with the named mutex. Waiting / retrying on file access is very tedious and exception-prone, whereas the named mutex solution is very clean and straightforward.
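A minimal sketch of the named mutex approach follows. The mutex name, the class SharedFile, and the Update method are hypothetical; every cooperating process would have to use the same mutex name.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical helper: guards read-modify-write of one file with a system-wide named mutex.
static class SharedFile
{
    // Assumed name; all processes that touch the file must agree on it.
    private const string MutexName = @"Global\MyApp.SharedDataFile";

    public static void Update(string path, Func<string, string> transform)
    {
        using (var mutex = new Mutex(false, MutexName))
        {
            bool owned = false;
            try
            {
                try
                {
                    // Blocks until no other thread or process holds the mutex.
                    owned = mutex.WaitOne();
                }
                catch (AbandonedMutexException)
                {
                    // A previous owner exited without releasing; this thread now owns the mutex.
                    owned = true;
                }

                string current = File.Exists(path) ? File.ReadAllText(path) : string.Empty;
                File.WriteAllText(path, transform(current));
            }
            finally
            {
                if (owned)
                {
                    mutex.ReleaseMutex();
                }
            }
        }
    }
}
```

Compared with FileShare.None, the waiting happens inside WaitOne, so there is no retry loop around a failed file open.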
"Monitor" is another option, since it is a solution to the synchronization problem, though note that it only synchronizes threads within a single process.

Synchronize writing to a file at file-system level

I have a text file and multiple threads/processes will write to it (it's a log file).
The file sometimes gets corrupted because of concurrent writes.
I want all threads to open the file in a write mode that serializes writes at the file-system level itself.
I know it's possible to use locks (a mutex for multiple processes) to synchronize writing to this file, but I would prefer to open the file in the correct mode and leave the task to System.IO.
Is it possible? What's the best practice for this scenario?
Your best bet is just to use locks/mutexes. It's a simple approach, it works, and you can easily understand it and reason about it.
When it comes to synchronization it often pays to start with the simplest solution that could work and only try to refine if you hit problems.
To my knowledge, Windows doesn't have what you're looking for. There is no file handle object that does automatic synchronization by blocking all other users while one is writing to the file.
If your logging involves the three steps, open file, write, close file, then you can have your threads try to open the file in exclusive mode (FileShare.None), catch the exception if unable to open, and then try again until success. I've found that tedious at best.
In my programs that log from multiple threads, I created a TextWriter descendant that is essentially a queue. Threads call the Write or WriteLine methods on that object, which formats the output and places it into a queue (using a BlockingCollection). A separate logging thread services that queue--pulling things from it and writing them to the log file. This has a few benefits:
Threads don't have to wait on each other in order to log
Only one thread is writing to the file
It's trivial to rotate logs (i.e. start a new log file every hour, etc.)
There's zero chance of an error because I forgot to do the locking on some thread
Doing this across processes would be a lot more difficult. I've never even considered trying to share a log file across processes. Were I to need that, I would create a separate application (a logging service). That application would do the actual writes, with the other applications passing the strings to be written. Again, that ensures that I can't screw things up, and my code remains simple (i.e. no explicit locking code in the clients).
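The queue-based logger described in this answer could be sketched like this. It is a simplified illustration, not the answerer's actual code: it is a plain class rather than a TextWriter descendant, it omits log rotation, and the name QueuedLogger is hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Hypothetical logger: callers enqueue lines, a single background task writes them to the file.
sealed class QueuedLogger : IDisposable
{
    private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();
    private readonly Task _writerTask;

    public QueuedLogger(string path)
    {
        _writerTask = Task.Factory.StartNew(() =>
        {
            using (var writer = new StreamWriter(path, append: true))
            {
                // Blocks while the queue is empty; ends after CompleteAdding once the queue is drained.
                foreach (string line in _queue.GetConsumingEnumerable())
                {
                    writer.WriteLine(line);
                }
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Safe to call from any thread; callers only wait for the enqueue, not for the disk write.
    public void WriteLine(string line) => _queue.Add(line);

    public void Dispose()
    {
        _queue.CompleteAdding(); // No more lines will be accepted.
        _writerTask.Wait();      // Wait until everything queued has been written.
        _queue.Dispose();
    }
}
```

Since only the background task touches the file, the logging threads need no locking code of their own.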
You might be able to use File.Open() with a FileShare value set to None, and make each thread wait if it can't get access to the file.

Multiple Threads reading from the same file

I have an XML file that needs to be read many, many times. I am trying to use Parallel.ForEach to speed this process up, since the order in which the data is read does not matter. The data is just being used to populate objects. My problem is that even though I open the file as read-only each time in the thread, it complains that the file is open in another program. (I don't have it opened in a text editor or anything :))
How can I accomplish multi reads from the same file?
EDIT: The file is ~18KB pretty small. It is read from about 1,800 times.
Thanks
If you want multiple threads to read from the same file, you need to specify FileShare.Read:
using (var stream = File.Open("theFile.xml", FileMode.Open, FileAccess.Read, FileShare.Read))
{
    ...
}
However, you will not achieve any speedup from this, for multiple reasons:
Your hard disk can only read one thing at a time. Although you have multiple threads running at the same time, these threads will all end up waiting for each other.
You cannot easily parse a part of an XML file. You will usually have to parse the entire XML file every time. Since you have multiple threads reading it all the time, it seems that you are not expecting the file to change. If that is the case, then why do you need to read it multiple times?
Depending on the size of the file and the type of reads you are doing it might be faster to load the file into memory first, and then provide access to it directly to your threads.
You didn't provide any specifics on the file, the reads, etc., so I can't say for sure whether it would address your specific needs.
The general premise would be to load the file once in a single thread, and then either directly (via the Xml structure) or indirectly (via XmlNodes, etc) provide access to the file to each of your threads. I envision something similar to:
Load the file
For each Xpath query dispatch the matching nodes to your threads.
If the threads don't modify the XML directly, this might be a viable alternative.
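A conservative variant of the load-once idea is sketched below: the file is read from disk a single time, and each parallel iteration parses its own copy from the shared in-memory bytes. That sidesteps the question of whether a shared XML object model is safe for concurrent reads. The names SharedXml and CountElements are hypothetical, and counting elements merely stands in for whatever the real code does with the data.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using System.Xml.Linq;

// Hypothetical helper: one disk read, then independent in-memory parses on each thread.
static class SharedXml
{
    public static int[] CountElements(string path, string[] elementNames)
    {
        // The only disk access; the byte array is never modified afterwards.
        byte[] bytes = File.ReadAllBytes(path);
        var counts = new int[elementNames.Length];

        Parallel.For(0, elementNames.Length, i =>
        {
            // Each iteration gets its own read-only stream and its own XDocument.
            using (var stream = new MemoryStream(bytes, writable: false))
            {
                XDocument doc = XDocument.Load(stream);
                counts[i] = doc.Descendants(elementNames[i]).Count();
            }
        });

        return counts;
    }
}
```

With a file of about 18 KB, re-parsing from memory is cheap, and the file on disk is opened only once, so the sharing violation cannot occur.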
When you open the file, you need to specify FileShare.Read:
using (var stream = new FileStream("theFile.xml", FileMode.Open, FileAccess.Read, FileShare.Read))
{
    ...
}
That way the file can be opened multiple times for reading.
While an old post, it seems to be a popular one so I thought I would add a solution that I have used to good effect for multi-threaded environments that need read access to a file. The file must however be small enough to hold in memory at least for the duration of your processing, and the file must only be read and not written to during the period of shared access.
string FileName = "TextFile.txt";
string[] FileContents = File.ReadAllLines(FileName);
foreach (string strOneLine in FileContents)
{
    // Do work on each line of the file here
}
So long as the file is only being read, multiple threads or programs can access and process it at the same time without treading on one another's toes.

Concurrent file write

How can I write to a text file that can be accessed by multiple sources (possibly concurrently), ensuring that no write operation gets lost?
For example, if two different processes write to the file at the same moment, this can lead to problems. The simplest solution (not very fast and not very elegant) would be to lock the file when writing begins (create a .lock file or similar) and release it (delete the lock file) when the writing is done.
Before writing, I would check whether the .lock file exists and delay the write until the file is released.
What is the recommended pattern to follow for this kind of situation?
Thanks
EDIT
I mean processes, like different programs from different clients, different users and so on, not threads within the same program
Consider using a simple database. You will get all this built-in safety relatively easily.
The fastest way of synchronizing access between processes is to use mutexes or semaphores. This thread explains how to use them to simulate the reader-writer lock pattern:
Is there a global named reader/writer lock?
I suggest using the ReaderWriterLock. According to MSDN, it's designed for multiple readers but ensures that only a single writer can write data at any one time. Note that it only synchronizes threads within a single process.
I would look at something like the Command Pattern for doing this. Concurrent writing would be a nightmare for data integrity.
Essentially you use this pattern to queue your write commands so that they are done in order of request.
You should also use the ReaderWriterLock to ensure that you can read from the file while writing occurs. This would be a second line of defense behind the command pattern so that only one thread could write to the file at a given time.
You can try lock too. It's easy - "lock ensures that one thread does not enter a critical section while another thread is in the critical section of code. If another thread attempts to enter a locked code, it will wait (block) until the object is released."
http://msdn.microsoft.com/en-us/library/c5kehkcz%28VS.71%29.aspx
I would also recommend looking for examples of having multiple readers and only one writer in a critical section. Here's a short paper with a good solution: http://arxiv.org/PS_cache/cs/pdf/0303/0303005v1.pdf
Alternatively you could look at creating copies of the file each time it is requested and when it comes time to write any changes to the file you merge with the original file.
