I have a program that runs multiple threads, but all of them need to save results to the same text file. I get an access violation error. How can I avoid that?
Wrap file IO into a lock statement:
private static readonly object _syncRoot = new object();
and then:
lock(_syncRoot)
{
// do whatever you have to do with this file
}
Take a look at the lock statement: http://msdn.microsoft.com/en-us/library/c5kehkcz.aspx
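Putting the pieces together, here is a minimal sketch of that pattern: a static lock object guarding every write to the shared file (the class and method names are just for illustration).

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

static class SafeLogger
{
    private static readonly object _syncRoot = new object();

    public static void Append(string path, string line)
    {
        // Only one thread at a time may open and write the file; without
        // this lock, concurrent opens of the same file throw IOException.
        lock (_syncRoot)
        {
            File.AppendAllText(path, line + Environment.NewLine);
        }
    }
}
```

Any number of threads can then call `SafeLogger.Append(path, "...")` concurrently, e.g. from a `Parallel.For`, and the writes will be serialized.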
The simplest approach is to make sure you have some locking construct (mutex, monitor, etc.) guarding access to the file; then each thread can access it in isolation. This could mean either sharing the same underlying Stream/TextWriter/etc., or opening/closing the file inside the locked region.
A more complex approach would be to have a dedicated writer thread and a synchronized work queue. All threads can then add to the queue, and a single thread dequeues items and writes them to the file. This means your main threads are only blocked while adding to the queue (very brief), rather than blocked on IO (slower). However, note that if the process exits abnormally, data still in the queue may be lost.
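The dedicated-writer-thread approach can be sketched with a `BlockingCollection<string>` (the class name `QueuedFileWriter` is invented for this example). Producers only block while enqueueing; one background task owns the file exclusively.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

class QueuedFileWriter : IDisposable
{
    private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();
    private readonly Task _writerTask;

    public QueuedFileWriter(string path)
    {
        // Single consumer: only this task ever touches the file.
        _writerTask = Task.Run(() =>
        {
            using (var writer = new StreamWriter(path, append: true))
                foreach (var line in _queue.GetConsumingEnumerable())
                    writer.WriteLine(line);
        });
    }

    // Producers block only while enqueueing, never on disk I/O.
    public void Enqueue(string line) => _queue.Add(line);

    public void Dispose()
    {
        _queue.CompleteAdding();   // signal "no more items"
        _writerTask.Wait();        // drain and flush remaining queued lines
    }
}
```

As noted above, lines still sitting in the queue are lost if the process dies before `Dispose` runs, so call it on shutdown.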
I would recommend reading up on the ReaderWriterLock class, or the ReaderWriterLockSlim class, which is faster but has some gotchas. I believe it would suit your needs perfectly.
ReaderWriterLock
ReaderWriterLockSlim
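A minimal sketch of ReaderWriterLockSlim guarding a shared collection (the class name `SharedResults` is invented here): many readers may hold the lock at once, but a writer gets exclusive access.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class SharedResults
{
    private readonly ReaderWriterLockSlim _rwLock = new ReaderWriterLockSlim();
    private readonly List<string> _lines = new List<string>();

    public void Add(string line)
    {
        _rwLock.EnterWriteLock();   // exclusive: blocks all readers and writers
        try { _lines.Add(line); }
        finally { _rwLock.ExitWriteLock(); }
    }

    public int Count()
    {
        _rwLock.EnterReadLock();    // shared: many readers may hold this at once
        try { return _lines.Count; }
        finally { _rwLock.ExitReadLock(); }
    }
}
```

One of the gotchas mentioned above: by default ReaderWriterLockSlim is non-recursive, so re-entering the lock from the same thread throws; it should also be disposed when no longer needed.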
Related
I have multiple BackgroundWorkers that all want to write to log.txt, which results in the exception The process cannot access the file 'C:\...\log.txt' because it is being used by another process. I know it's a long shot, but would it help if I used WriteAsync() instead, or would it have no effect at all?
(If that's not a simple solution, I guess I have to implement the mutex object I've seen before.)
public static void WriteToLog(string text, bool append = true)
{
try
{
using (var writer = new StreamWriter("log.txt", append))
{
writer.Write(text);
// writer.WriteAsync(text); // Would this 'queue up' instead of trying to access the same process at the same time?
}
}
catch (Exception ex)
{
Console.WriteLine($"ERROR! Fejl i loggen! {ex.Message}. {ex.StackTrace}");
}
}
To actually answer your question: no, it won't save you from the locking issue. async is not a magic keyword that will synchronize all threads; on the contrary, it might even start its own thread, depending on the synchronization context.
If you were on a single-threaded model, then yes, this would queue up, since the synchronization context only has one thread to work with; it would then have to queue up all async calls with context switches. However, if you were on a single-threaded model you wouldn't have this problem in the first place.
You can solve the problem in multiple ways.
Use a locking mechanism to synchronize access to the shared resource. One good option for this scenario is
ReaderWriterLockSlim
Use a logging framework (there are a lot of good, very reliable libraries).
Personally, I would prefer going with a logging framework, as they come with many features you will find useful (rolling file appenders, database loggers, etc.) and offer a clean solution for logging with zero hacks and maintenance.
While using a logging framework is the best solution, to specifically address the issue...
The append mode requires the file to be locked, and when a lock can't be obtained you get the error you're receiving. You could synchronize all threads but then you'd be blocking them for a time. Using WriteAsync does not alleviate the problem.
A better solution is to enqueue your messages and then have a dedicated thread dequeue them and write to the log. Thus, you need no synchronization because all writes are done by a single thread.
I will warn again: use a logging framework.
I am using TvdbLib in a program. This library can use a cache for loading TV series quicker. To further improve the speed of the program, I do all my loading of TV series on separate threads. When two threads run simultaneously and try to read/write from the cache simultaneously, I will get the following error:
The process cannot access the file
'C:\BinaryCache\79349\series_79349.ser' because it is being used by
another process.
Does anyone know how to avoid this and still have the program running smoothly?
CacheProvider is not built for multi-threaded scenarios... either use it from one thread only, lock on a shared object around every access, or supply every thread with its own CacheProvider and its own distinct _root directory (in the constructor).
You can use the lock statement to ensure only one thread is accessing the cache at the same time:
http://msdn.microsoft.com/en-us/library/c5kehkcz(v=vs.71).aspx
From the error I assume that TvdbLib does not support multiple concurrent threads accessing the same cache. As it is an open source project, you could get the source code and implement your own protection around the cache access, e.g. using the lock statement. Of course, you could lock within your own code before it calls TvdbLib, but because this operates at a higher level, the lock will be held for longer and you may not get the fine-grained concurrency that you want.
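The "lock before calling TvdbLib" option can be sketched as below. Note that `FakeCacheProvider` and its `LoadSeries` method are stand-ins invented for this example; the real TvdbLib API differs.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for TvdbLib's CacheProvider; the real API differs.
class FakeCacheProvider
{
    private readonly Dictionary<int, string> _store = new Dictionary<int, string>();

    public string LoadSeries(int id)
    {
        // Dictionary is not thread safe: unsynchronized concurrent calls can corrupt it.
        if (!_store.ContainsKey(id)) _store[id] = "series " + id;
        return _store[id];
    }
}

static class CacheDemo
{
    private static readonly object _cacheLock = new object();
    private static readonly FakeCacheProvider _cache = new FakeCacheProvider();

    public static string LoadSeriesSafe(int id)
    {
        lock (_cacheLock)          // serialize every access to the shared cache
            return _cache.LoadSeries(id);
    }
}
```

As the answer notes, this single lock serializes all cache access, so per-thread CacheProviders with distinct root directories may give better concurrency.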
In my program I need to process files. My program could use several threads to process files and therefore I need some sort of locking, because each file should not be processed by more than one thread at a time.
private object lockObj = new object();
public void processFile(string file)
{
lock(lockObj)
{
//... actual processing
}
}
With the above code only one file can be processed at a time, but two threads should be able to process two different files at a time, but not the same file.
My first idea was to create a Dictionary with the key being the file and the value being the lock object.
But I was wondering if it would also be possible to lock on the string file? Any thoughts on this?
PS: sorry for not being able to find a better title
My first idea was to create a Dictionary with the key being the file and the value being the lock object. But I was wondering if it would also be possible to lock on the string file? Any thoughts on this?
If the strings will be created at runtime, locking on the string will not be safe. It would be better to make a dictionary of objects, and use those to lock.
That being said, you should consider using a ConcurrentDictionary<string, object> for this dictionary, as it will prevent race conditions in the dictionary itself. By using GetOrAdd you can safely get the appropriate object to use for locking.
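The GetOrAdd pattern can be sketched like this (the `FileProcessor` class and its `Action` parameter are invented for illustration): one lock object per file path, so different files proceed in parallel while the same file is serialized.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class FileProcessor
{
    // One lock object per file path. GetOrAdd is atomic, so two threads
    // asking for the same path always receive the same lock object.
    private static readonly ConcurrentDictionary<string, object> _locks =
        new ConcurrentDictionary<string, object>();

    public static void ProcessFile(string file, Action work)
    {
        lock (_locks.GetOrAdd(file, _ => new object()))
        {
            work();  // only one thread per file runs this at a time
        }
    }
}
```

One caveat of this sketch: lock objects are never removed, so with an unbounded stream of distinct paths the dictionary grows without limit.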
That said, a different approach may be appropriate here. You could use a ConcurrentQueue or BlockingCollection to provide a list of items to process, and then have a fixed number of threads process them. This prevents the synchronization problems from occurring in the first place.
I think you are approaching this the wrong way. You can simply use a producer-consumer pattern with a BlockingCollection. A thread keeps reading files and putting them in the queue (using Add) and a bunch of worker threads keep taking from the queue (using Take) and processing files. The way the queue is implemented it's guaranteed that two threads cannot retrieve the same file, so no explicit locking is needed.
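The producer-consumer pattern described above can be sketched as follows (the class name `WorkQueueDemo` and the counter standing in for real file processing are invented for this example):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

static class WorkQueueDemo
{
    public static int Run(int fileCount, int workerCount)
    {
        var queue = new BlockingCollection<string>();
        int processed = 0;

        // Fixed pool of workers; GetConsumingEnumerable guarantees that
        // no two workers ever receive the same item.
        var workers = new Task[workerCount];
        for (int w = 0; w < workers.Length; w++)
            workers[w] = Task.Run(() =>
            {
                foreach (var file in queue.GetConsumingEnumerable())
                    Interlocked.Increment(ref processed);  // stand-in for real work
            });

        for (int i = 0; i < fileCount; i++)
            queue.Add($"file{i}.txt");       // producer side

        queue.CompleteAdding();              // no more files will arrive
        Task.WaitAll(workers);
        return processed;
    }
}
```

Each file is handed to exactly one worker, so no per-file locking is needed, just as the answer says.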
If you're working with threads in C#, you owe it to yourself to check out the Task Parallel Library (TPL). There's a learning curve, but once you get the hang of it your multi-threaded code will be simpler and more maintainable.
Here's an example that does exactly what you're asking using the TPL.
I am making use of the C# code located at the following links to implement a RAM-disk project.
Link to description of source code
Link to source code
As a summary, the code indicated above makes use of a simple tree structure to store the directories, sub-directories and files. At the root is a MemoryFolder object which stores zero or more MemoryFolder objects and/or MemoryFile objects. Each MemoryFolder object in turn stores zero or more MemoryFolder objects and/or MemoryFile objects, and so forth up to an unlimited depth.
However, the code is not thread safe. What is the most elegant way of implementing thread safety? In addition, how should the following non-exhaustive list of multithreading requirements for a typical file system be enforced by using the appropriate locking strategy?
The creation of two different folders (each by a different thread) simultaneously under the same parent folder can occur concurrently if the thread-safe implementation allows it. Otherwise, some locking strategy should be implemented to allow only sequential creation.

None of the direct or indirect parent folders of the folder containing a specific file (that is currently being read by another thread), propagating all the way up to the root folder, can be moved or deleted by another thread until the ReadFile thread completes its execution.

With regard to each unique file, allow concurrent access for multiple ReadFile threads but restrict access to a single WriteFile thread.

If two separate ReadFile threads (fired almost simultaneously), each from a different application, attempt to create a folder with the same name (assuming that the folder does not already exist before both threads are fired), the first thread that enters the RAM disk always succeeds while the second one always fails. In other words, the order of thread execution is deterministic.

The total disk space calculation method GetDiskFreeSpace, running under a separate thread, should not complete its execution until all WriteFile threads that are already in progress complete their execution. All subsequent WriteFile threads that have not begun executing are blocked until the GetDiskFreeSpace thread completes its execution.
The easiest way to do this would be to protect the entire tree with a ReaderWriterLockSlim. That allows concurrent access by multiple readers or exclusive access by a single writer. Any method that will modify the structure in any way will have to acquire the write lock, and no other threads will be allowed to read or write to the structure until that thread releases the write lock.
Any thread that wants to read the structure has to acquire the read lock. Multiple readers can acquire the read lock concurrently, but a thread that wants to acquire the write lock has to wait until all existing read locks are released.
There might be a way to make that data structure lock-free. Doing so, however, could be quite difficult. The reader/writer lock will give you the functionality you want, and I suspect it would be fast enough.
If you want to share this across processes, that's another story. The ReaderWriterLockSlim doesn't work across processes. You could, however, implement something similar using a combination of the synchronization primitives, or create a device driver (or service) that serves the requests, thereby keeping it all in the same process.
Maybe the question sounds silly, but I don't understand something about threads and locking and I would like to get a confirmation (here's why I ask).
So, if I have 10 servers and 10 requests arrive at each server at the same time, that's 100 requests across the farm. Without locking, that's 100 requests to the database.
If I do something like this:
private static readonly object myLockHolder = new object();
if (Cache[key] == null)
{
lock(myLockHolder)
{
if (Cache[key] == null)
{
Cache[key] = LengthyDatabaseCall();
}
}
}
How many database requests will I do? 10? 100? Or as many as I have threads?
You have a hierarchy of objects:
You have servers (10)
On each server you have processes (probably only 1 - your service/app pool)
In each process you have threads (probably many)
Your code will only prohibit threads within the same process on the same server access to modify the Cache object simultaneously. You can create locks across processes and even across servers, but the cost increases a lot as you move up the hierarchy.
Using the lock statement does not actually lock any threads. However, if one thread is executing code inside the lock (that is in the block of code following the lock statement) any other thread that wants to take the lock and execute the same code has to wait until the first thread holding the lock leaves the block of code and releases the lock.
The C# lock statement uses a Windows critical section, which is a lightweight locking mechanism. If you want to lock across processes you can use a mutex instead. To lock across servers you can use a database or a shared file.
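The cross-process mutex option can be sketched as follows. A named mutex is visible to every process on the machine; the name "MyAppLogMutex" and the class name are invented for this example (on Windows you can prefix the name with Global\ to share it across sessions).

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

static class CrossProcessLog
{
    public static void Append(string path, string line)
    {
        // A named mutex is shared machine-wide, so two instances of the
        // application cannot hold it (and write the file) at the same time.
        using (var mutex = new Mutex(initiallyOwned: false, name: "MyAppLogMutex"))
        {
            mutex.WaitOne();
            try { File.AppendAllText(path, line + Environment.NewLine); }
            finally { mutex.ReleaseMutex(); }
        }
    }
}
```

A named mutex is considerably slower than an in-process lock, which is the cost increase "as you move up the hierarchy" mentioned above.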
As dkackman has pointed out .NET has the concept of an AppDomain that is a kind of lightweight process. You can have multiple AppDomains per process. The C# lock statement only locks a resource within a single AppDomain, and a proper description of the hierarchy would include the AppDomain below the process and above the threads. However, quite often you only have a single AppDomain in a process making the distinction somewhat irrelevant.
The C# lock statement locks on a particular instance of an object (the object you created with new object()). Objects are (in most cases) not shared across AppDomains, so if you have 10 servers, 10 threads (one per server) can concurrently access your database with that piece of code.
Lock does not block all threads.
It locks on a particular object instance, and only the threads that try to take the same lock are blocked.
So in your case, only the threads that try to lock on myLockHolder will wait, not all of them.
In other words, the lock statement is syntactic sugar for using a critical section.
As you can see on MSDN:
lock (expression) statement_block

where:

expression
Specifies the object that you want to lock on. expression must be a reference type. Typically, expression will either be this, if you want to protect an instance variable, or typeof(class), if you want to protect a static variable (or if the critical section occurs in a static method in the given class).

statement_block
The statements of the critical section.
lock will block all threads in that application that try to take the lock on the myLockHolder object.
So if you have 10 instances of the application running, you'll get 10 requests to the server while the object is locked in each. The moment you exit the lock statement, the next request in that application will be processed, but as long as Cache[key] is not null, it won't access the database.
The number of actual requests you get depends on what happens here:
if (Cache[key] == null)
{
Cache[key] = LengthyDatabaseCall();
}
If LengthyDatabaseCall() fails, the next request will try to access the database server and retrieve the information as well, so really your best-case scenario is that there will be only 10 requests to the server.
Only the threads that need access to your shared variable at the moment another thread is using it will go into a wait state.
How many that is at any given time is hard to determine.
Your DB will get 10 requests, with odds being good that requests 2-10 run much faster than request 1.