C# Synchronizing Threads depending on string

In my program I need to process files. My program could use several threads to process files and therefore I need some sort of locking, because each file should not be processed by more than one thread at a time.
private object lockObj = new object();

public void processFile(string file)
{
    lock (lockObj)
    {
        // ... actual processing
    }
}
With the above code only one file can be processed at a time. Two threads should be able to process two different files concurrently, just not the same file.
My first idea was to create a Dictionary with the key being the file and the value being the lock object.
But I was wondering if it would also be possible to lock the string file? Any thoughts on this?
PS: sorry for not being able to find a better title

My first idea was to create a Dictionary with the key being the file and the value being the lock object. But I was wondering if it would also be possible to lock the string file? Any thoughts on this?
If the strings are created at runtime, locking on the string is not safe: two strings with the same contents can be different objects, so two threads could each take a different lock for the same file. It would be better to make a dictionary of objects, and use those to lock.
That being said, you should consider using a ConcurrentDictionary<string,object> for this dictionary, as it avoids a race condition on (or a separate lock around) the dictionary itself. By using GetOrAdd you can safely get the appropriate object to use for locking.
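A minimal sketch of the GetOrAdd approach, under some assumptions: FileLocks and the process delegate are hypothetical names, and the case-insensitive comparer is a guess that suits file systems such as NTFS.

```csharp
using System;
using System.Collections.Concurrent;

public static class FileLocks
{
    // One lock object per file path. OrdinalIgnoreCase is an assumption
    // that fits case-insensitive file systems.
    private static readonly ConcurrentDictionary<string, object> Locks =
        new ConcurrentDictionary<string, object>(StringComparer.OrdinalIgnoreCase);

    public static void ProcessFile(string file, Action<string> process)
    {
        // Every thread asking for the same key gets the same object back,
        // even if the factory delegate ran more than once under contention.
        object fileLock = Locks.GetOrAdd(file, _ => new object());

        lock (fileLock)
        {
            process(file); // at most one thread per file is in here at a time
        }
    }
}
```

Threads working on different files take different locks and do not block each other. Note that this sketch never removes entries, so the dictionary grows with the number of distinct paths seen.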
Alternatively, a different approach may be appropriate here. You could use a ConcurrentQueue or BlockingCollection to hold the items to process, and then have a fixed number of threads process them. This prevents the synchronization problem from occurring in the first place.

I think you are approaching this the wrong way. You can simply use a producer-consumer pattern with a BlockingCollection. A thread keeps reading files and putting them in the queue (using Add) and a bunch of worker threads keep taking from the queue (using Take) and processing files. The way the queue is implemented it's guaranteed that two threads cannot retrieve the same file, so no explicit locking is needed.
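The producer-consumer idea could look roughly like this. FilePipeline, workerCount and processFile are hypothetical names, and the sketch consumes with GetConsumingEnumerable, which takes items one at a time much like calling Take in a loop.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class FilePipeline
{
    public static void Run(IEnumerable<string> files, int workerCount, Action<string> processFile)
    {
        using (var queue = new BlockingCollection<string>())
        {
            var workers = new Task[workerCount];
            for (int i = 0; i < workerCount; i++)
            {
                workers[i] = Task.Run(() =>
                {
                    // Blocks while the queue is empty; the loop ends once
                    // CompleteAdding has been called and the queue is drained.
                    foreach (string file in queue.GetConsumingEnumerable())
                    {
                        processFile(file);
                    }
                });
            }

            // Producer: each queued item is handed to exactly one worker.
            foreach (string file in files)
            {
                queue.Add(file);
            }
            queue.CompleteAdding();

            Task.WaitAll(workers);
        }
    }
}
```

The queue only guarantees that one queued item goes to one worker. If the same path can be added twice, the producer would still need to filter duplicates.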

If you're working with threads in C#, you owe it to yourself to check out the Task Parallel Library (TPL). There's a learning curve, but once you get the hang of it your multi-threaded code will be simpler and more maintainable.
Here's an example that does exactly what you're asking using TPL.
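One possible shape for a TPL-based version is sketched here. It is an assumption, not necessarily the example this answer refers to; ParallelFiles, processFile and maxParallel are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelFiles
{
    public static void ProcessAll(IEnumerable<string> files, Action<string> processFile, int maxParallel)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        // Distinct removes repeated paths up front, so no two iterations
        // ever work on the same file and no per-file lock is needed.
        Parallel.ForEach(files.Distinct(StringComparer.OrdinalIgnoreCase), options, processFile);
    }
}
```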

If I use WriteAsync, will I still have problems when multiple BackgroundWorkers try to write to the same .txt file?

I have multiple BackgroundWorkers that all want to write to log.txt, which results in the exception "The process cannot access the file 'C:\...\log.txt' because it is being used by another process." I know it's a long shot, but would it help if I used WriteAsync() instead, or would it have no effect at all?
(If that's not a simple solution, I guess I have to implement the mutex object I've seen before.)
public static void WriteToLog(string text, bool append = true)
{
    try
    {
        using (var writer = new StreamWriter("log.txt", append))
        {
            writer.Write(text);
            // writer.WriteAsync(text); // Would this 'queue up' instead of trying
            // to access the same process at the same time?
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine($"ERROR! Error in the log! {ex.Message}. {ex.StackTrace}");
    }
}
To actually answer your question: no, it won't save you from the locking issue. async is not a magic keyword that synchronizes all threads. On the contrary, the work might even run on another thread, depending on the synchronization context.
The exception is a single-threaded model: there the calls do queue up, because the synchronization context has only one thread to work with and has to run all async calls on it in turn. However, in a single-threaded model you wouldn't have this problem in the first place.
You can solve the problem in several ways:
Use a locking mechanism to synchronize access to the shared resource. One good option for this scenario is ReaderWriterLockSlim.
Use a logging framework (there are many good and very reliable libraries).
Personally, I would prefer going with a logging framework, as there are many features you will find useful (rolling file appender, DB logger, etc.) that give you a clean solution for logging with no hacks and no maintenance.
While using a logging framework is the best solution, to specifically address the issue...
The append mode requires the file to be locked, and when a lock can't be obtained you get the error you're receiving. You could synchronize all threads but then you'd be blocking them for a time. Using WriteAsync does not alleviate the problem.
A better solution is to enqueue your messages and then have a dedicated thread dequeue them and write to the log. Thus, you need no synchronization because all writes are done by a single thread.
I will warn again: use a logging framework.
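For reference, the dedicated-writer idea from this answer might be sketched as follows. QueuedLog and its members are hypothetical names; a real logging framework does considerably more (flushing policy, error handling, rolling files).

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public sealed class QueuedLog : IDisposable
{
    private readonly BlockingCollection<string> _messages = new BlockingCollection<string>();
    private readonly Task _writer;

    public QueuedLog(string path)
    {
        // The file is only ever opened and written by this one task,
        // so the file itself needs no lock.
        _writer = Task.Factory.StartNew(() =>
        {
            using (var writer = new StreamWriter(path, append: true))
            {
                foreach (string message in _messages.GetConsumingEnumerable())
                {
                    writer.WriteLine(message);
                }
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Safe to call from any thread; returns as soon as the message is queued.
    public void Write(string message) => _messages.Add(message);

    // Stops accepting messages, then waits until the queue has been written out.
    public void Dispose()
    {
        _messages.CompleteAdding();
        _writer.Wait();
        _messages.Dispose();
    }
}
```

Messages still in the queue are lost if the process dies before Dispose runs, which is one more reason to prefer a framework.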

ConcurrentDictionary Object - Reading and writing via different threads

I want to use a ConcurrentDictionary in my app, but first I need to make sure I understand correctly how it works. In my app, I'll have one or more threads that write to, or delete from, the dictionary. And, I'll have one or more threads that read from the dictionary. Potentially, all at the same time.
Am I correct that the implementation of ConcurrentDictionary takes care of all the required locking for this to happen, and I don't need to provide my own locking? In other words, if one thread is writing to, or deleting from, the dictionary, a reading thread (or another write thread) will be blocked until the update or delete is finished?
Thanks very much.
The current implementation uses a mixture of striped locks (the technique I suggested in an answer to someone yesterday at https://stackoverflow.com/a/11950835/400547) and thinking very, very hard about the situations in which an operation cannot possibly cause problems for, or have problems caused by, a concurrent operation (there are quite a lot of these, but you have to be very sure if you make use of them).
As such, if you have several operations happening on the concurrent dictionary at once, each of the following is possible:
1. No threads even lock, but everything happens correctly.
2. Some threads lock, but they lock on separate things, and there is no lock contention.
3. One or two threads have lock contention with each other, and are slowed down, but the effect upon performance is less than if there were a single lock.
4. One or two threads need to lock the entire thing for a while (generally for internal resizing), which blocks all the threads that could possibly be blocked in case 3 above, though some can keep going (those that read).
None of this involves dirty reads, which is a matter only vaguely related to locking (my own form of concurrent dictionary uses no locks at all, and it doesn't have dirty reads either).
This thread-safety doesn't apply to batches done by your code (if you read a value and then write a value, the value read may have changed before you finished the write), but note that some common cases which would require a couple of calls on Dictionary are catered for by single methods on ConcurrentDictionary (GetOrAdd and AddOrUpdate do things that would be two calls with a Dictionary so they can be done atomically - though note that the Func involved in some overloads may be called more than once).
Due to this, there's no added danger with ConcurrentDictionary, so you should pick as follows:
If you're going to have to lock over some batches of operations that don't match what ConcurrentDictionary offers, for example:
lock (lockObj)
{
    var test = dict[key1];
    var test2 = dict[key2];
    if (test < test2 && test2 < dict[key3] && SomeOtherBooleanProducer())
        dict[key4] = SomeFactoryCall(key4);
}
Then you would have to lock even with ConcurrentDictionary, and while there may be a way to combine that with what it offers in the way of support for concurrency, there probably isn't, so just use Dictionary with a lock.
Otherwise it comes down to how many concurrent hits there will probably be. If you're mostly only going to have one thread hitting the dictionary, but you need to guard against the possibility of concurrent access, then you should definitely go for Dictionary with a lock. If you're going to have periods where half a dozen or more threads are hitting the dictionary, then you should definitely go for ConcurrentDictionary (if they're likely to be hitting the same small number of keys then take a look at my version because that's the one situation where I have better performance).
Just where the middle point between "few" and "many" threads lies is hard to say. I'd say that if there are more than two threads on a regular basis then go with ConcurrentDictionary. If nothing else, demands from concurrency tend to increase throughout the lifetime of a project more often than they decrease.
Edit: To answer about the particular case you give, of one writer and one reader, there won't be any blocking at all, as that is safe for roughly the same reason why multiple readers and one writer is safe on Hashtable, though ConcurrentDictionary goes beyond that in several ways.
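To illustrate the single-call atomic methods mentioned in this answer, here is a small sketch using AddOrUpdate as a thread-safe counter. HitCounter and the "page" key are hypothetical.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class HitCounter
{
    public static int CountHits(int callers)
    {
        var hits = new ConcurrentDictionary<string, int>();

        Parallel.For(0, callers, _ =>
        {
            // One atomic read-modify-write: store 1 for a new key, otherwise
            // increment. The update delegate may run more than once under
            // contention, so it must not have side effects.
            hits.AddOrUpdate("page", 1, (key, current) => current + 1);
        });

        return hits["page"];
    }
}
```

Writing hits["page"] = hits["page"] + 1 instead would be two separate operations, and concurrent callers could overwrite each other's increments even though neither call corrupts the dictionary.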
In other words, if one thread is writing to, or deleting from, the dictionary, a reading thread (or another write thread) will be blocked until the update or delete is finished?
I don't believe it will block - it will just be safe. There won't be any corruption - you'll just have a race in terms of whether the read sees the write.
From a FAQ about the lock-free-ness of the concurrent collections:
ConcurrentDictionary<TKey,TValue> uses fine-grained locking when adding to or updating data in the dictionary, but it is entirely lock-free for read operations. In this way, it’s optimized for scenarios where reading from the dictionary is the most frequent operation.

How to write to a file in a multithreaded environment

I have a program that runs on multiple threads, but all of them need to save results to the same text file.
I get an access violation error.
How can I avoid that?
Wrap file IO into a lock statement:
private static object _syncRoot = new object();
and then:
lock (_syncRoot)
{
    // do whatever you have to do with this file
}
Take a look at the lock statement: http://msdn.microsoft.com/en-us/library/c5kehkcz.aspx
The simplest is to simply make sure you have some locking construct (mutex, monitor, etc) against access to the file, then each thread can access it in isolation. This could either be accessing the same underlying Stream/TextWriter/etc, or could be opening/closing the file inside the locked region.
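A minimal sketch of the "open and close inside the locked region" variant, assuming all writers live in the same process. ResultFile and AppendResult are hypothetical names.

```csharp
using System;
using System.IO;

public static class ResultFile
{
    // Shared by every thread in the process. This does not protect
    // against other processes writing to the same file.
    private static readonly object SyncRoot = new object();

    public static void AppendResult(string path, string line)
    {
        lock (SyncRoot)
        {
            // Opens, appends and closes the file while holding the lock,
            // so only one thread touches the file at a time.
            File.AppendAllText(path, line + Environment.NewLine);
        }
    }
}
```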
A more complex approach would be to have a dedicated writer thread, and a synchronised work queue. Then all threads can add to the queue and a single thread dequeues and writes to the file. This means your main threads are only blocked while adding to a queue (very brief), rather than blocked on IO (slower). However, note that if the process exits abnormally, data in the queue may be lost.
I would recommend reading up on the ReaderWriterLock class, or the ReaderWriterLockSlim class, which is faster but has some gotchas. I believe it would suit your needs perfectly.

How to implement thread safety for the tree structure used for the following scenario?

I am making use of the C# code located at the following links to implement a Ram-disk project.
Link to description of source code
Link to source code
As a summary, the code indicated above makes use of a simple tree structure to store the directories, sub-directories and files. At the root is a MemoryFolder object which stores zero or more 'MemoryFolder' objects and/or MemoryFile objects. Each MemoryFolder object in turn stores zero or more MemoryFolder objects and/or MemoryFile objects and so forth up to an unlimited depth.
However, the code is not thread safe. What is the most elegant way of implementing thread safety? In addition, how should the following non-exhaustive list of multithreading requirements for a typical file system be enforced by using the appropriate locking strategy?
1. The creation of two different folders (each by a different thread) under the same parent folder can occur concurrently if the thread-safe implementation allows it. Otherwise, some locking strategy should be implemented to only allow sequential creation.
2. None of the direct or indirect parent folders of the folder containing a specific file (that is currently being read by another thread), all the way up to the root folder, can be moved or deleted by another thread until the ReadFile thread completes its execution.
3. For each unique file, allow concurrent access for multiple ReadFile threads but restrict access to a single WriteFile thread.
4. If two separate ReadFile threads (fired almost simultaneously), each from a different application, attempt to create a folder with the same name (assuming that the folder does not already exist before both threads are fired), the first thread that enters the Ram-Disk always succeeds while the second one always fails. In other words, the order of thread execution is deterministic.
5. The total disk space calculation method GetDiskFreeSpace, running under a separate thread, should not complete its execution until all WriteFile threads that are already in progress complete their execution. All subsequent WriteFile threads that have not begun executing are blocked until the GetDiskFreeSpace thread completes its execution.
The easiest way to do this would be to protect the entire tree with a ReaderWriterLockSlim. That allows concurrent access by multiple readers or exclusive access by a single writer. Any method that will modify the structure in any way will have to acquire the write lock, and no other threads will be allowed to read or write to the structure until that thread releases the write lock.
Any thread that wants to read the structure has to acquire the read lock. Multiple readers can acquire the read lock concurrently, but a thread that wants the write lock has to wait until all existing read locks are released.
There might be a way to make that data structure lock-free. Doing so, however, could be quite difficult. The reader/writer lock will give you the functionality you want, and I suspect it would be fast enough.
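A rough sketch of the single reader/writer lock approach. GuardedTree is a hypothetical stand-in: a flat set of folder paths replaces the real MemoryFolder tree so the locking pattern stays visible.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class GuardedTree
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    // Stand-in for the real tree: the full paths of all existing folders.
    private readonly HashSet<string> _folders =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    // Many threads may hold the read lock at the same time.
    public bool FolderExists(string path)
    {
        _lock.EnterReadLock();
        try { return _folders.Contains(path); }
        finally { _lock.ExitReadLock(); }
    }

    // The write lock is exclusive. Returns false if the folder already
    // exists, so of two racing creators exactly one succeeds.
    public bool CreateFolder(string path)
    {
        _lock.EnterWriteLock();
        try { return _folders.Add(path); }
        finally { _lock.ExitWriteLock(); }
    }
}
```

With one lock for the whole tree, a method such as GetDiskFreeSpace could take the write lock to keep writers out while it runs, at the cost of also blocking readers.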
If you want to share this across processes, that's another story. The ReaderWriterLockSlim doesn't work across processes. You could, however, implement something similar using a combination of the synchronization primitives, or create a device driver (or service) that serves the requests, thereby keeping it all in the same process.

What "thread safe" really means, in practical terms

Please bear with my newbie questions.
I was trying to convert PDF to PNG using ghostscript, with ASP.NET and C#. However, I also read that ghostscript is not thread safe. So my questions are:
1. What exactly does "ghostscript is not thread safe" mean in practical terms? What impact does it have if I use it in a live ASP.NET (aspx) web application with many concurrent users accessing it at the same time?
2. I also read from another site that the major feature of ghostscript ver. 8.63 is multithreaded rendering. Does this mean our thread safe issue is now resolved? Is ghostscript thread safe now?
3. I am also evaluating PDF2Image from PDFTron, which is supposed to be thread safe. But the per CPU license doesn't come cheap. Is it worth paying the extra money for "thread safe" vs "not safe"?
A precise technical definition that everyone agrees on is difficult to come up with.
Informally, "thread safe" simply means "is reasonably well-behaved when called from multiple threads". The object will not crash or produce crazy results when called from multiple threads.
The question you actually need to get answered if you intend to do multi-threaded programming involving a particular object is "what is the threading model expected by the object?"
There are a bunch of different threading models. For example, the "free threaded" model is "do whatever you want from any thread; the object will deal with it." That's the easiest model for you to deal with, and the hardest for the object provider to provide.
On the other end of the spectrum is the "single threaded" model -- all instances of all objects must be accessed from a single thread, period.
And then there's a bunch of stuff in the middle. The "apartment threaded" model is "you can create two instances on two different threads, but whatever thread you use to create an instance is the thread you must always use to call methods on that instance".
The "rental threaded" model is "you can call one instance on two different threads, but you are responsible for ensuring that no two threads are ever doing so at the same time".
And so on. Find out what threading model your object expects before you attempt to write threading code against it.
Given that a collection, for instance, is not thread safe:
var myDic = new Dictionary<string, string>();
In a multithreaded environment, this may throw:
string s = null;
if (!myDic.TryGetValue("keyName", out s)) {
    s = new string('#', 10);
    myDic.Add("keyName", s);
}
While one thread is adding the KeyValuePair to the dictionary myDic, another one may call TryGetValue(). As this collection can't safely be read and written at the same time, an exception may occur. Two threads can also both miss the key and both call Add, and the second Add throws.
However, on the other hand, if you try this:
// Other threads will wait here until the thread that locked myDic releases it.
lock (myDic) {
    string s = null;
    if (!myDic.TryGetValue("keyName", out s)) {
        s = new string('#', 10);
        myDic.Add("keyName", s);
    }
} // The thread that locked myDic releases the lock here, so other threads can work with it.
Then suddenly, the second thread trying to get the same "keyName" key value will not have to add it to the dictionary as the first thread already added it.
So in short, threadsafe means that an object supports being used by multiple threads at the same time, or will lock the threads appropriately for you, without you having to worry about threadsafety.
2. I don't think Ghostscript is thread safe now. It mainly uses multiple threads internally to perform its tasks, which gives it better performance, that's all.
3. Depending on your budget and your requirements, it may be worth it. But if you build a wrapper around the library, you could perhaps lock() only where you need to, and if you do not use multithreading yourself, it is definitely not worth paying for thread safety. A thread-safe library only means that if YOUR application uses multithreading, you will not suffer the consequences of a library not being thread safe. Unless you really multithread, it is not worth paying for a thread-safe library.
I am a Ghostscript developer, and won't repeat the general theory about thread safety. We have been working on getting GS to be thread safe so that multiple 'instances' can be created using gsapi_new_instance from within a single process, but we have not yet completed this to our satisfaction (which includes our QA testing of this). The graphics library is, however, thread safe and the multi-threaded rendering relies on this to allow us to spawn multiple threads to render bands from a display list in parallel. The multi-threaded rendering has been subjected to a lot of QA testing and is used by many commercial licensees to improve performance on multi-core CPUs.
You can bet we will announce when we finally support multiple instances of GS. Most people that want to use current GS from applications that need multiple instances spawn separate processes for each instance so that GS doesn't need to be thread safe. The GS can run a job as determined by the argument list options or I/O can be piped to/from the process to provide data and collect output.
1) It means if you share the same Ghostscript objects or fields among multiple threads, it will crash. For example:
private GhostScript someGSObject = new GhostScript();
...
// Uh oh, 2 threads using shared memory. This can crash!
thread1.Use(someGSObject);
thread2.Use(someGSObject);
2) I don't think so - multithreaded rendering suggests GS is internally using multiple threads to render. It doesn't address the problem of GS being unsafe for use from multiple threads.
3) Is there a question in there?
To make GhostScript thread safe, make sure only 1 thread at a time is accessing it. You can do this via locks:
lock (someObject)
{
    thread1.Use(someGSObject);
}
lock (someObject)
{
    thread2.Use(someGSObject);
}
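One way to make sure every caller goes through the same lock is to hide the shared object behind a small wrapper. This is a generic sketch, not Ghostscript-specific: Serialized<T> is a hypothetical helper, and T stands for any component that is not thread safe.

```csharp
using System;

// Hypothetical wrapper: the inner object is reachable only through Use,
// so it can never be touched by two threads at the same time.
public sealed class Serialized<T>
{
    private readonly T _inner;
    private readonly object _gate = new object();

    public Serialized(T inner)
    {
        _inner = inner;
    }

    public TResult Use<TResult>(Func<T, TResult> work)
    {
        lock (_gate)
        {
            return work(_inner);
        }
    }
}
```

This only helps if the component keeps its state inside the wrapped object. Requests are processed one after another, so throughput under many concurrent users is limited to what a single thread can do.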
If you are using Ghostscript from a shell object (i.e. running a command line to process the file), you will not be caught by threading problems because every running instance will be in a different process on the server. Where you need to be careful is when you use a DLL from C# to process the PDF: that code would need to be synchronized to keep two threads from executing the same code at the same time.
Thread safe basically means that a piece of code will function correctly even when accessed by multiple threads. Multiple problems can occur if you use non-thread safe code in a threaded application. The most common problem is deadlocking. However, there are much more nefarious problems (race conditions) which can be more of a problem because thread issues are notoriously difficult to debug.
No. Multithreaded rendering just means that GS will be able to render faster because it is using threads to render (in theory, anyway - not always true in practice).
That really depends on what you want to use your renderer for. If you are going to be accessing your application with multiple threads, then, yes, you'll need to worry about it being thread safe. Otherwise, it's not a big deal.
In general it is an ambiguous term.
Thread safety could be at the conceptual level, where you have correct synchronization of your shared data. This is usually what is meant by library writers.
Sometimes it means concurrency is defined at the language level, i.e. the memory model of the language supports concurrency. This is tricky, because without it a library writer can't produce concurrent libraries: the language gives no guarantees for many of the essential primitives they need. This concerns compiler writers more than library users. C# is thread-safe in that sense.
I know I didn't answer your question directly, but hope that helps.
