How do I write to a text file that can be accessed by multiple sources (possibly concurrently) while ensuring that no write operation gets lost?
For example, if two different processes write to the file at the same moment, this can lead to problems. The simplest solution (not very fast and not very elegant) would be to lock the file when writing begins (create a .lock file or similar) and release it (delete the lock) when the writing is done.
Before writing, I would check whether the .lock file exists and delay the write until the file is released.
What is the recommended pattern to follow for this kind of situation?
Thanks
EDIT
I mean processes, such as different programs from different clients, different users and so on, not threads within the same program.
Consider using a simple database. You will get all this built-in safety relatively easily.
The fastest way of synchronizing access between processes is to use mutexes or semaphores. This thread explains how to use them to simulate a reader-writer lock pattern:
Is there a global named reader/writer lock?
I suggest using the ReaderWriterLock. It's designed for multiple readers but ensures that only a single writer can write data at any one time (see MSDN).
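A minimal sketch of that idea, assuming the newer ReaderWriterLockSlim is acceptable in place of ReaderWriterLock (that swap, the class name GuardedTextFile and its methods are my own choices for illustration). Note that this kind of lock only coordinates threads inside one process; it does nothing for separate processes.

```csharp
using System.IO;
using System.Threading;

// Hypothetical helper: guards one file with a reader/writer lock.
// Only effective between threads of a single process.
public sealed class GuardedTextFile
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
    private readonly string _path;

    public GuardedTextFile(string path)
    {
        _path = path;
    }

    public void AppendLine(string line)
    {
        _lock.EnterWriteLock();   // exclusive: blocks readers and other writers
        try
        {
            File.AppendAllText(_path, line + "\n");
        }
        finally
        {
            _lock.ExitWriteLock();
        }
    }

    public string ReadAll()
    {
        _lock.EnterReadLock();    // shared: several readers may hold this at once
        try
        {
            return File.Exists(_path) ? File.ReadAllText(_path) : "";
        }
        finally
        {
            _lock.ExitReadLock();
        }
    }
}
```

Every thread has to go through the same GuardedTextFile instance for the lock to mean anything.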
I would look at something like the Command Pattern for doing this. Concurrent writing would be a nightmare for data integrity.
Essentially you use this pattern to queue your write commands so that they are done in order of request.
You should also use the ReaderWriterLock to ensure that you can read from the file while writing occurs. This would be a second line of defense behind the command pattern so that only one thread could write to the file at a given time.
You can try lock too. It's easy - "lock ensures that one thread does not enter a critical section while another thread is in the critical section of code. If another thread attempts to enter a locked code, it will wait (block) until the object is released."
http://msdn.microsoft.com/en-us/library/c5kehkcz%28VS.71%29.aspx
I would also recommend looking for examples of having multiple readers and only one writer in a critical section. Here's a short paper with a good solution: http://arxiv.org/PS_cache/cs/pdf/0303/0303005v1.pdf
Alternatively you could look at creating copies of the file each time it is requested and when it comes time to write any changes to the file you merge with the original file.
Could I write (with IndexWriter) new documents into index while it is opened for reading (with IndexReader)? Or must I close reading before writing?
Could I read/search documents (with IndexReader) in index while it is opened for writing (with IndexWriter)? Or must I close writing before reading?
Is Lucene.Net thread-safe or not? Or must I write my own synchronization?
You may have any amount of readers/searchers opened at any time, but only one writer. This is enforced by a directory specific lock, usually involving a file named "write.lock".
Readers open snapshots, and writers add more data to the index. Readers need to be opened or reopened (IndexReader.Reopen) after your writer has committed (IndexWriter.Commit) the data for it to be seen, unless you're working with near-realtime searches. Those involve a special reader returned from IndexWriter.GetReader, which can see content up to the time the call to GetReader was executed. It also means that the reader may see data that will never be committed, because application logic later calls IndexWriter.Rollback.
Searchers use readers, so identical limitations apply to them. (There can be an unlimited number of them, and they can only see what's already committed, unless based on a near-realtime reader.)
Lucene is thread-safe, and the best practice is to share readers and searchers between several threads, while checking that IndexReader.IsCurrent() == true. You could have a background thread that reopens the reader once it detects changes, creates a new searcher, and then lets the main threads use it. This would also allow you to prewarm any FieldCache you use, to increase search speed once the new searcher is in place.
As I found on this mailing list:
Lucene.NET is thread-safe. So you can share the same instance of IndexWriter or IndexSearcher among threads. Using a write-lock, it also hinders a second IndexWriter instance to be opened with the same index.
As I understand it, I can write and read separately; I'll check it ;)
I have a piece of code that is executed by n threads. The code contains:
for (int i = 0; i < 10000; i++) // repeat about 10,000 times
{
    lock (padlock)
    {
        // open, append, close on every iteration
        File.AppendAllText(fileName, text);
    }
}
Basically, all threads write to the same set of 10,000 files, so the files are the shared resource. The issue is that the 10,000 open/write/close cycles performed by each thread slow down my program considerably. If I could share the file handles across threads, I'd be able to keep them open and write from different threads. Can someone shed some light on how I can proceed?
Let all threads write to a synchronized list.
Let one thread 'eat' the items on the list and write them to the file with a single writer (for example a StreamWriter).
Presto: problem solved, in exchange for some extra memory usage.
I suggest you open the file using a FileStream and share that instead of the fileName.
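A rough sketch of sharing one open stream instead of the file name. The class name SharedFileAppender is hypothetical, and I'm assuming a plain lock around each write is acceptable for your throughput.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical wrapper: one open handle shared by all threads,
// so the file is opened once instead of once per write.
public sealed class SharedFileAppender : IDisposable
{
    private readonly object _padlock = new object();
    private readonly StreamWriter _writer;

    public SharedFileAppender(string fileName)
    {
        var stream = new FileStream(fileName, FileMode.Append, FileAccess.Write, FileShare.Read);
        _writer = new StreamWriter(stream, new UTF8Encoding(false));
    }

    public void Append(string text)
    {
        lock (_padlock)           // StreamWriter itself is not thread-safe
        {
            _writer.Write(text);
        }
    }

    public void Dispose()
    {
        lock (_padlock)
        {
            _writer.Dispose();    // flushes and closes the underlying FileStream
        }
    }
}
```

With 10,000 files you would keep one such appender per file (in a dictionary keyed by file name, say), keeping in mind that each one holds an open handle until it is disposed.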
I have 3 processes, each of which listens to a data feed. All 3 processes need to read and update the same file after receiving data. The 3 processes keep running all day. Obviously, each process needs to have an exclusive read/write lock on the file.
I could use "named mutex" to protect the read/write of the file or I can open the file using FileShare.None.
Would these 2 approaches work? Which one is better?
The program is written in C# and runs on Windows.
Use a named mutex for this. If you open the file with FileShare.None in one process, the other processes will get an exception thrown when they attempt to open the file, which means you have to deal with waiting and retrying etc. in these processes.
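Something along these lines, as a sketch only. The mutex name "MyApp.SharedFile.Mutex" and the CrossProcessFile helper are made up for illustration; all three processes would have to agree on the same name. On Windows, prefixing the name with Global\ makes it visible across sessions.

```csharp
using System;
using System.IO;
using System.Threading;

public static class CrossProcessFile
{
    // Hypothetical name; every cooperating process must use the same one.
    private const string MutexName = "MyApp.SharedFile.Mutex";

    // Reads the file, applies 'update' to its content and writes it back,
    // all while holding the named mutex.
    public static void Update(string path, Func<string, string> update)
    {
        using (var mutex = new Mutex(false, MutexName))
        {
            bool owned = false;
            try
            {
                try
                {
                    mutex.WaitOne();          // blocks until no other process holds it
                    owned = true;
                }
                catch (AbandonedMutexException)
                {
                    owned = true;             // previous owner exited without releasing; we hold it now
                }

                string current = File.Exists(path) ? File.ReadAllText(path) : "";
                File.WriteAllText(path, update(current));
            }
            finally
            {
                if (owned) mutex.ReleaseMutex();
            }
        }
    }
}
```

A caller would do something like CrossProcessFile.Update(path, old => old + newData). The mutex has to be released on the same thread that acquired it, so avoid awaiting inside the protected section.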
I agree with the named mutex. Waiting / retrying on file access is very tedious and exception-prone, whereas the named mutex solution is very clean and straightforward.
"Monitor" is another option, since it is a standard solution to the synchronization problem, although it only synchronizes threads within a single process.
I have a text file and multiple threads/processes will write to it (it's a log file).
The file gets corrupted sometimes because of concurrent writings.
I want all threads to use a file-writing mode that serializes the writes at the file-system level itself.
I know it's possible to use locks (a mutex for multiple processes) to synchronize writing to this file, but I would prefer to open the file in the correct mode and leave the task to System.IO.
Is that possible? What's the best practice for this scenario?
Your best bet is just to use locks/mutexes. It's a simple approach, it works, and you can easily understand it and reason about it.
When it comes to synchronization it often pays to start with the simplest solution that could work and only try to refine if you hit problems.
To my knowledge, Windows doesn't have what you're looking for. There is no file handle object that does automatic synchronization by blocking all other users while one is writing to the file.
If your logging involves the three steps, open file, write, close file, then you can have your threads try to open the file in exclusive mode (FileShare.None), catch the exception if unable to open, and then try again until success. I've found that tedious at best.
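For reference, the open-exclusive-and-retry approach looks roughly like this. ExclusiveAppender and the maxAttempts / delayMs parameters are hypothetical names and values chosen for the sketch, not anything prescribed.

```csharp
using System;
using System.IO;
using System.Threading;

public static class ExclusiveAppender
{
    // Tries to open the file exclusively; if another thread or process has it
    // open, waits a little and tries again, up to maxAttempts times.
    public static void AppendWithRetry(string path, string text, int maxAttempts = 50, int delayMs = 20)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                using (var stream = new FileStream(path, FileMode.Append, FileAccess.Write, FileShare.None))
                using (var writer = new StreamWriter(stream))
                {
                    writer.Write(text);
                }
                return;
            }
            catch (IOException) when (attempt < maxAttempts)
            {
                // Most likely a sharing violation: back off and retry.
                // Other IOExceptions are retried too and rethrown on the last attempt.
                Thread.Sleep(delayMs);
            }
        }
    }
}
```

You can see where the tedium comes from: you have to pick a retry count and a delay, and you can't easily tell a sharing violation from any other I/O failure.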
In my programs that log from multiple threads, I created a TextWriter descendant that is essentially a queue. Threads call the Write or WriteLine methods on that object, which formats the output and places it into a queue (using a BlockingCollection). A separate logging thread services that queue--pulling things from it and writing them to the log file. This has a few benefits:
Threads don't have to wait on each other in order to log
Only one thread is writing to the file
It's trivial to rotate logs (i.e. start a new log file every hour, etc.)
There's zero chance of an error because I forgot to do the locking on some thread
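A stripped-down sketch of that queue-based design. For brevity it is a plain class rather than a TextWriter descendant, and the name QueuedLogger is made up for this example; log rotation is left out.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

public sealed class QueuedLogger : IDisposable
{
    private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();
    private readonly Thread _worker;

    public QueuedLogger(string path)
    {
        // The only thread that ever touches the file.
        _worker = new Thread(() =>
        {
            using (var writer = new StreamWriter(path, append: true))
            {
                // Blocks while the queue is empty; the loop ends once
                // CompleteAdding has been called and the queue has drained.
                foreach (string line in _queue.GetConsumingEnumerable())
                {
                    writer.WriteLine(line);
                }
            }
        });
        _worker.IsBackground = true;
        _worker.Start();
    }

    // Called from any thread; just enqueues and returns.
    public void WriteLine(string line)
    {
        _queue.Add(line);
    }

    public void Dispose()
    {
        _queue.CompleteAdding();  // no more items; lets the worker finish
        _worker.Join();           // wait until everything queued is on disk
        _queue.Dispose();
    }
}
```

Disposing the logger at shutdown is what guarantees that queued lines actually reach the file.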
Doing this across processes would be a lot more difficult. I've never even considered trying to share a log file across processes. Were I to need that, I would create a separate application (a logging service). That application would do the actual writes, with the other applications passing the strings to be written. Again, that ensures that I can't screw things up, and my code remains simple (i.e. no explicit locking code in the clients).
You might be able to use File.Open() with a FileShare value set to None, and make each thread wait if it can't get access to the file.
I want to run three threads: (i) one should append strings to a file, (ii) another should remove special characters from the written stream, and (iii) the third should sort the words in ascending order. How could I do this in a thread-safe (synchronized) manner?
I mean
Thread 1
sample.txt
Apple
Ma#22ngo
G#ra&&pes
Thread 2
(after removing special characters sample.txt)
Apple
Mango
Grapes
Thread 3 (sample.txt)
Apple
Grapes
Mango
Why do you want to do this using several threads? Perhaps your example has been oversimplified, but if you can avoid threading then do so - and it would appear that this problem doesn't need it.
Do you have lots of files, or one really large file, or something else? Why can't you simply perform the three actions one after another?
UPDATE
I felt I should at least try and help you solve the underlying problem you're facing. I think you need to consider the task you're looking at as a pipeline. You start with strings, you remove special characters (clean up), you write them to file. When you've finally written all the strings you need to sort them.
Everything up to the sorting stage can, and should, be done by a single thread. Read string, clean it, write it to file, move to next string. The final task of sorting can't easily happen until all the strings are written cleanly to the file.
If you have many files to write and sort, then each of these can be dealt with by a separate thread, but I would avoid involving multiple threads in the processing of any one given file.
I would perform operation 1 and 2 in the same thread by removing special characters before writing to the file. Operation 3 cannot be run in parallel with others because while the file is being written you cannot read it and sort. So basically these operations are sequential and it makes no sense to put them into separate threads.
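In code, the sequential version could look something like this. It is only a sketch: the WordFile helper is hypothetical, and I'm assuming "special characters" means anything that is not an ASCII letter, since the sample input strips digits as well.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordFile
{
    // Assumption: everything except A-Z and a-z counts as a special character.
    public static string Clean(string word)
    {
        return Regex.Replace(word, "[^A-Za-z]", "");
    }

    // Operations 1 and 2 together: clean each word, then append it to the file.
    public static void AppendClean(string path, IEnumerable<string> words)
    {
        File.AppendAllLines(path, words.Select(Clean));
    }

    // Operation 3: runs only after all writing is finished.
    public static void SortInPlace(string path)
    {
        string[] sorted = File.ReadAllLines(path)
                              .OrderBy(w => w, StringComparer.Ordinal)
                              .ToArray();
        File.WriteAllLines(path, sorted);
    }
}
```

Calling AppendClean with Apple, Ma#22ngo and G#ra&&pes and then SortInPlace leaves Apple, Grapes, Mango in the file, one per line.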
You should implement a thread-safe queue (or use the one that comes with the Parallel Extensions).
Threads have the problem of sharing information. Even if your scenario looks perfectly parallel in theory, in practice you can't let the threads access the shared data freely, because bad things happen when you do.
Instead you can use a synchronization mechanism, in this case a ConcurrentQueue. That way you'll have this:
ConcurrentQueue<string> queueStrings;
ConcurrentQueue<string> queueFile;
Thread1 inserts strings into the queueStrings queue.
Thread2 reads strings from the queueStrings queue, processes them, and then inserts them into the queueFile queue.
Finally, Thread3 reads the processed strings from queueFile and writes them to the file.
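Here is one way that pipeline could be sketched. I've swapped in BlockingCollection<string> (which wraps a ConcurrentQueue<string> by default) so that consumers can block and know when the previous stage has finished, and I use tasks rather than raw threads; both are my substitutions. The Pipeline class and the letters-only cleaning rule are assumptions for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

public static class Pipeline
{
    public static void Run(IEnumerable<string> input, string path)
    {
        // BlockingCollection<T> uses a ConcurrentQueue<T> internally by default.
        using (var queueStrings = new BlockingCollection<string>())
        using (var queueFile = new BlockingCollection<string>())
        {
            // Stage 1: feed raw strings into the first queue.
            Task producer = Task.Run(() =>
            {
                try { foreach (string s in input) queueStrings.Add(s); }
                finally { queueStrings.CompleteAdding(); }
            });

            // Stage 2: strip special characters (assumed: non-letters).
            Task cleaner = Task.Run(() =>
            {
                try
                {
                    foreach (string s in queueStrings.GetConsumingEnumerable())
                        queueFile.Add(Regex.Replace(s, "[^A-Za-z]", ""));
                }
                finally { queueFile.CompleteAdding(); }
            });

            // Stage 3: the only stage that touches the file.
            Task writer = Task.Run(() =>
            {
                using (var w = new StreamWriter(path))
                {
                    foreach (string s in queueFile.GetConsumingEnumerable())
                        w.WriteLine(s);
                }
            });

            Task.WaitAll(producer, cleaner, writer);
        }
    }
}
```

Sorting is not part of this pipeline, because it can only happen once every string has been written.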