Good morning,
Say I have some 6 different threads, and I want to share the same data with each of them at the same time. Can I make a class variable with the data I want to share and have each thread access that memory concurrently without a performance penalty, or is it preferable to pass a true copy of the data to each thread?
Thank you very much.
It depends entirely on the data:
if the data is immutable (or mutable but you don't actually mutate it), then chuck all the threads at it - great
if you need to mutate it, but no two threads will ever depend on the data mutated by another - great
if you need to mutate it, and there are conflicts but you can sensibly synchronize access to the data such that there is no risk of two threads deadlocking etc - great, but not always trivial
if it is not safe to make any assumptions, then a true clone of the data is the safest approach, but has the most overhead in terms of data duplication; if the data is cheap to copy, this may be fine - and indeed may outperform synchronization
if the threads do co-depend on each other, then you have no option other than to figure out some kind of sensible locking strategy; again - to stress: deadlocks are a problem here - some ideas:
always provide a timeout when obtaining a lock
if you need to lock two items, it may help to try locking both eagerly (rather than locking one at the start, and the other after you've done lots of changes) - then you can simply release and re-take the locks, without having to either undo changes, or put the changes back into a particular state
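The two ideas above (a timeout on every lock, and taking both locks eagerly before changing anything) can be sketched roughly as follows. This is only an illustration: TwoLockHelper, TryWithBoth and the timeout policy are hypothetical names, not part of any library; the only framework call relied on is Monitor.TryEnter with a timeout.

```csharp
using System;
using System.Threading;

// Hypothetical helper: the names and the timeout policy are illustrative only.
public static class TwoLockHelper
{
    // Tries to take both locks up front, each with a timeout.
    // Returns false, holding neither lock, if either could not be obtained in time.
    public static bool TryWithBoth(object first, object second, TimeSpan timeout, Action work)
    {
        bool gotFirst = false, gotSecond = false;
        try
        {
            Monitor.TryEnter(first, timeout, ref gotFirst);
            if (!gotFirst) return false;

            Monitor.TryEnter(second, timeout, ref gotSecond);
            if (!gotSecond) return false;

            // Both locks are held before any change is made, so giving up
            // above never required undoing partial work.
            work();
            return true;
        }
        finally
        {
            // Release in reverse order, and only what was actually taken.
            if (gotSecond) Monitor.Exit(second);
            if (gotFirst) Monitor.Exit(first);
        }
    }
}
```

A caller that gets false back can simply retry later; because nothing was modified, there is no state to roll back.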
I searched for "Why is .NET String immutable?" and got this answer:
Instances of immutable types are inherently thread-safe, since no
thread can modify it, the risk of a thread modifying it in a way that
interferes with another is removed (the reference itself is a different
matter).
So I want to know: how are instances of immutable types inherently thread-safe?
Why are instances of immutable types inherently thread-safe?
Because an instance of a string type can't be mutated, from any thread. This effectively means that one thread "changing" the string won't result in that same string being changed for another thread, since a new string is allocated wherever the mutation takes place.
Generally, everything becomes easier when you create an object once, and then only observe it. Once you need to modify it, a new local copy gets created.
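A minimal sketch of that behaviour, using a hypothetical demo class: "modifying" a string produces a new instance and leaves the one other code may be holding untouched.

```csharp
using System;

// Hypothetical demo class; only the string behaviour itself is the point.
public static class StringImmutabilityDemo
{
    // Returns true when "modifying" a string left the original instance untouched.
    public static bool OriginalIsUnchanged()
    {
        string shared = "hello";
        string heldElsewhere = shared;   // a second reference to the same instance

        // ToUpperInvariant does not alter the instance; it returns a new string.
        string changed = shared.ToUpperInvariant();

        return heldElsewhere == "hello"
            && changed == "HELLO"
            && !object.ReferenceEquals(shared, changed);
    }
}
```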
Wikipedia:
Immutable objects can be useful in multi-threaded applications.
Multiple threads can act on data represented by immutable objects
without concern of the data being changed by other threads. Immutable
objects are therefore considered to be more thread-safe than mutable
objects.
@xanatos (and Wikipedia) point out that immutable isn't always thread-safe. We like to make that correlation because we say "any type which has persistent non-changing state is safe across thread boundaries", but that may not always be the case. Assume a type is immutable from the "outside", but internally needs to modify its state in a way that may not be safe when done in parallel from multiple threads, and may cause nondeterministic behavior. This means that although immutable, it is not thread-safe.
To conclude, immutable != thread-safe. But immutability does take you one step closer, when done right, towards being able to do multi-threaded work correctly.
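As an illustration of a type that is immutable from the outside but mutates internally, consider a lazily cached value. The sketch below is an assumption-laden example (Polygon and its members are hypothetical); it uses Lazy<T> so that the one internal write is itself thread-safe. Replacing the Lazy<T> with a plain nullable field filled in on first access would keep the type immutable to callers while letting several threads race on that field.

```csharp
using System;
using System.Linq;
using System.Threading;

// Hypothetical type: immutable to callers, but caches a computed value internally.
public sealed class Polygon
{
    private readonly double[] _sides;
    private readonly Lazy<double> _perimeter;
    private int _computeCount;   // only here to make the single computation observable

    public Polygon(params double[] sides)
    {
        _sides = (double[])sides.Clone();   // defensive copy, so callers cannot mutate our state
        // ExecutionAndPublication: the factory runs at most once, even under contention.
        _perimeter = new Lazy<double>(ComputePerimeter, LazyThreadSafetyMode.ExecutionAndPublication);
    }

    public double Perimeter { get { return _perimeter.Value; } }

    public int ComputeCount { get { return _computeCount; } }

    private double ComputePerimeter()
    {
        Interlocked.Increment(ref _computeCount);
        return _sides.Sum();
    }
}
```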
The short answer:
Because you only write the data in one thread, and multiple threads only ever read it after that write has completed. Since no read/write conflict is possible, it's thread-safe.
The long answer:
A string is essentially a pointer to a buffer of memory. Basically what happens is that you create a buffer, fill it with characters and then expose the pointer to the outside world.
Note that you cannot access the contents of the string before the string object itself is constructed, which enforces this ordering of 'write data', then 'expose pointer'. If you did it the other way around (I guess that's theoretically possible), problems might arise.
If another thread (let's say: CPU) reads the pointer, it is a 'new pointer' for the CPU, which therefore requires the CPU to go to the 'real' memory and then read the data. If it took the pointer contents from cache instead, we would have a problem.
The last piece of the puzzle has to do with memory management: we have to know it's a 'new' pointer. In .NET we know this is the case: memory on the heap is basically never re-used until a GC occurs. The garbage collector then does a mark, sweep and compact.
Now, you might argue that the 'compact' phase reuses pointers, therefore changing the contents of the pointers. While this is true, the GC also has to stop the threads and force a full memory fence, which in simple terms, flushes the CPU cache. After that, all memory access is guaranteed, which ensures you always have to go to memory after the GC phase completes.
As you can see, there is no way to read the data other than reading it directly from memory (the way it was written). Since it's immutable, the contents remain the same for all threads until it's eventually collected. As such, it's thread-safe.
I've seen some discussion about immutable here, that suggests you can change an internal state. Of course, the moment you start changing things, you can potentially introduce read/write conflicts.
The definition of immutable that I'm using here is to keep the contents constant after creation. That is: write once, read many, don't change (any) state after exposing the pointer. You get the picture.
One of the biggest problems in multithreaded code is two threads accessing the same memory cell at the same time with at least one of them modifying this memory cell.
If none of the threads can modify a memory cell, the problem does not exist any longer.
Because an immutable variable is not modifiable, it can be used from several threads without any further measures (for example locks).
If I can guarantee myself that only one method in my entire app will ever write to a certain variable, then may I allow other methods in my app to safely read that value?
If so, can I get away with that stunt without locking the variable?
In this context, what I'm doing (or, trying to do, or want to do) is for one method in one thread to put a value into the variable, and then other methods in other threads will read that value and make decisions.
A very nice option would be to lock against writes, while allowing reads.
I looked at the MSDN page on lock and didn't see a way to do that.
As always, it depends a lot on the context.
a variable read in a tight loop may be stored in a register or local cache, so no change will be noticed unless you have a "fence"; volatile will fix this, but as a side-effect rather than by explicit intention; most people (including me) can't properly define what volatile means - so be very careful of using it as a "fix".
an oversize type (large struct) will not be atomic (for either read or write) - and cannot be handled safely without risk of tearing
an object or value might involve multiple sub-values; if they aren't changed atomically, it could cause problems
You might, however, find that Interlocked solves most of your problems without needing a lock. At the same time, an uncontested lock is insanely fast, and even a contested lock is still alarmingly fast. Frankly, I'm not sure that it is worth the thought you are giving it: a flat lock is almost certainly fast-enough, as long as you do the thinking first outside the lock, and only lock it when you know the changes you want to make.
There is also ReaderWriterLockSlim, but the number of cases where that actually improves performance is slim - in my experience, the simplest approach possible is usually the fastest, meaning either lock or Interlocked. ReaderWriterLockSlim is a more complex beast, designed for more complex scenarios, and has a little overhead because of it. Not massive amounts, but enough to make it worth looking carefully.
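For the single-writer, many-readers case in the question, a lock-free version built on Interlocked might look like the sketch below. SharedState and its members are hypothetical names; the Interlocked calls are the real API, and the CompareExchange(ref x, 0, 0) trick is just one common way to get a fenced read.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical example type; the names are not from the question.
public static class SharedState
{
    private static long _hits;    // incremented by many threads
    private static int _latest;   // written by one thread, read by many

    // Atomic increment: no lock, no lost updates.
    public static void Hit() { Interlocked.Increment(ref _hits); }

    // Interlocked.Read gives an atomic 64-bit read, even in a 32-bit process.
    public static long Hits { get { return Interlocked.Read(ref _hits); } }

    // The single writer publishes a value with a full fence.
    public static void Publish(int value) { Interlocked.Exchange(ref _latest, value); }

    // CompareExchange(ref x, 0, 0) only ever swaps 0 for 0, so it never changes
    // the value; it returns the current contents with a full fence.
    public static int Latest { get { return Interlocked.CompareExchange(ref _latest, 0, 0); } }

    // Resets the counter, hammers it from several threads, and returns the total.
    public static long HammerFromManyThreads(int threads, int hitsPerThread)
    {
        Interlocked.Exchange(ref _hits, 0);
        Parallel.For(0, threads, t =>
        {
            for (int i = 0; i < hitsPerThread; i++) Hit();
        });
        return Hits;
    }
}
```

This only covers a single primitive value. As noted above, an oversize struct or a group of related values cannot be published this way without tearing, and a plain lock is then the simpler choice.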
Is it necessary to lock LINQ statements as follows? If I omit the lock, will any exceptions be encountered when multiple threads execute it concurrently?
lock (syncKey)
{
    return (from keyValue in dictionary
            where keyValue.Key > versionNumber
            select keyValue.Value).ToList();
}
PS: Writer threads do exist to mutate the dictionary.
Most types are thread-safe to read, but not thread-safe during mutation.
If none of the threads is changing the dictionary, then you don't need to do anything - just read away.
If, however, one of the threads is changing it then you have problems and need to synchronize. The simplest approach is a lock, however this prevents concurrent readers even when there is no writer. If there is a good chance you will have more readers than writers, consider using a ReaderWriterLockSlim to synchronize - this will allow any number of readers (with no writer), or: one writer.
In .NET 4.0 you might also consider a ConcurrentDictionary<,>.
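A sketch of the simplest approach, a plain lock shared by readers and writers, might look like this. VersionedStore is a hypothetical wrapper, and the int key and string value types are assumptions, since the question does not state them.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper; int keys and string values are assumed.
public sealed class VersionedStore
{
    private readonly object _syncKey = new object();
    private readonly Dictionary<int, string> _dictionary = new Dictionary<int, string>();

    // Writers must take the same lock as readers, otherwise the reader's lock protects nothing.
    public void Set(int version, string value)
    {
        lock (_syncKey) { _dictionary[version] = value; }
    }

    public List<string> NewerThan(int versionNumber)
    {
        lock (_syncKey)
        {
            // ToList() forces the query to run while the lock is held. Returning the
            // lazy query itself would enumerate the dictionary after the lock is released.
            return (from keyValue in _dictionary
                    where keyValue.Key > versionNumber
                    select keyValue.Value).ToList();
        }
    }
}
```

The design point is that the lock has to cover the enumeration, not just the construction of the query, which is why the ToList() call sits inside the lock block.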
So long as the query has no side-effects (such as any of the expressions calling code that makes changes) then there is no need to lock a LINQ statement.
Basically, if you don't modify the data (and nothing else is modifying the data you are using) then you don't need locks.
If you are using .NET 4.0, there is a ConcurrentDictionary that is thread-safe. Here is an example of using a concurrent dictionary (admittedly not in a LINQ statement).
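A minimal sketch of ConcurrentDictionary in use, assuming a hypothetical word-counting task (WordCounts is not from the original answer): many threads update the same keys without any explicit lock.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical example: counting words from several threads at once.
public static class WordCounts
{
    public static ConcurrentDictionary<string, int> Count(string[] words)
    {
        var counts = new ConcurrentDictionary<string, int>();

        Parallel.ForEach(words, word =>
        {
            // AddOrUpdate inserts 1 for a new key, or applies the update function
            // to the existing value. The update function may be invoked more than
            // once under contention, so it should stay free of side effects.
            counts.AddOrUpdate(word, 1, (key, current) => current + 1);
        });

        return counts;
    }
}
```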
UPDATE
If you are modifying data then you need to use locks. If two or more threads attempt to access a locked section of code there will be a small performance loss as one or more of the threads waits for the lock to be released. NOTE: If you over-lock then you may end up with worse performance than you would if you had just built the code using a sequential algorithm from the start.
If you are only ever reading data then you don't need locks as there is no mutable shared state to protect.
If you do not use locks then you may end up with intermittent bugs where the data is not quite right or exceptions are thrown when collisions occur between readers and writers. In my experience, most of the time you may never get an exception, you just get corrupt data (except you don't necessarily know it is corrupt). Here is another example showing how data can be corrupted if you don't use locks or redesign your algorithm to cope.
You often get the best out of a system if you consider the constraints of developing in a parallel system from the outset. Sometimes you can rewrite your code so it uses no shared data. Sometimes you can split the data up into chunks and have each thread/task work on its own chunk, then have some process at the end stitch it all back together again.
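The chunking idea can be sketched as follows, using summation as a stand-in for the real work. ChunkedSum is a hypothetical name; the point is that each task reads its own slice and writes only its own result slot, so no locks are needed.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical example: each task works on its own chunk, with no shared mutable state.
public static class ChunkedSum
{
    // chunkCount is assumed to be at least 1.
    public static long Sum(int[] data, int chunkCount)
    {
        var partials = new long[chunkCount];
        int chunkSize = (data.Length + chunkCount - 1) / chunkCount;   // round up

        Parallel.For(0, chunkCount, c =>
        {
            int start = c * chunkSize;
            int end = Math.Min(start + chunkSize, data.Length);
            long local = 0;
            for (int i = start; i < end; i++) local += data[i];
            partials[c] = local;   // each task writes only its own slot
        });

        // The "stitch it back together" step runs on one thread after all tasks finish.
        return partials.Sum();
    }
}
```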
If the dictionary can be accessed concurrently (for example, it is static while the method that runs the query is not) and it can be modified from another thread, then yes, a lock is required; otherwise it is not.
Yes, you need to lock your shared resources when using LINQ in multi-threaded scenarios (EDIT: that is, if your source collection is being modified, as Marc said; if you are only reading it, you don't need to worry about it). If you are using .NET 4 or the parallel extensions for 3.5 you could look at replacing your Dictionary with a ConcurrentDictionary (or use some other custom implementation anyway).
I have found a possible slowdown in my app, so I have two questions:
What is the real difference between simple locking on an object and reader/writer locks?
E.g. I have a collection of clients that changes quickly. For iteration, should I use a reader lock, or is a simple lock enough?
In order to decrease load, I have left the iteration (read-only) of one collection without any locks. This collection changes often and quickly, but items are added and removed under writer locks. Is it safe to leave this reading unprotected by a lock (I don't mind an occasionally skipped item; this method runs in a loop and it's not critical)? I just don't want to have random exceptions.
No, your current scenario is not safe.
In particular, if a collection changes while you're iterating over it, you'll get an InvalidOperationException in the iterating thread. You should obtain a reader lock for the whole duration of your iterator:
Obtain reader lock
Iterate over collection
Release reader lock
Note this is not the same as obtaining a reader lock for each step of the iteration - that won't help.
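The obtain/iterate/release sequence above might look like this with ReaderWriterLockSlim. ClientRegistry and its members are hypothetical, and clients are represented as plain strings purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical registry; clients are modelled as strings for brevity.
public sealed class ClientRegistry
{
    private readonly List<string> _clients = new List<string>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    public void Add(string client)
    {
        _lock.EnterWriteLock();
        try { _clients.Add(client); }
        finally { _lock.ExitWriteLock(); }
    }

    public bool Remove(string client)
    {
        _lock.EnterWriteLock();
        try { return _clients.Remove(client); }
        finally { _lock.ExitWriteLock(); }
    }

    // The read lock is held for the whole iteration, not per element,
    // so no writer can change the list halfway through the loop.
    public int CountWhere(Func<string, bool> predicate)
    {
        _lock.EnterReadLock();
        try
        {
            int count = 0;
            foreach (string client in _clients)
            {
                if (predicate(client)) count++;
            }
            return count;
        }
        finally { _lock.ExitReadLock(); }
    }
}
```

One caveat with this sketch: the predicate runs while the read lock is held, so it should be quick and should not call back into the registry, since the default ReaderWriterLockSlim does not allow recursive acquisition.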
As for the difference between reader/writer locks and "normal" locks - the idea of a reader/writer lock is that multiple threads can read at the same time, but only one thread can write (and only when no-one is reading). In some cases this can improve performance - but it increases the complexity of the solution too (in terms of getting it right). I'd also advise you to use ReaderWriterLockSlim from .NET 3.5 if you possibly can - it's much more efficient than the original ReaderWriterLock, and there are some inherent problems with ReaderWriterLock IIRC.
Personally I normally use simple locks until I've proved that lock contention is a performance bottleneck. Have you profiled your application yet to find out where the bottleneck is?
Ok first about the reading iteration without locks thing. It's not safe, and you shouldn't do it. Just to illustrate the point in the most simple way - you're iterating through a collection but you never know how many items are in that collection and have no way to find out. Where do you stop? Checking the count every iteration doesn't help because it can change after you check it but before you get the element.
ReaderWriterLock is designed for a situation where you allow multiple threads to have concurrent read access, but force exclusive write access. From the sounds of your application you don't have multiple concurrent readers, and writes are just as common as reads, so the ReaderWriterLock provides no benefit. You'd be better served by classic locking in this case.
In general whatever tiny performance benefits you squeeze out of not locking access to shared objects with multithreading are dramatically offset by random weirdness and unexplainable behavior. Lock everything that is shared, test the application, and then when everything works you can run a profiler on it, check just how much time the app is waiting on locks and then implement some dangerous trickery if needed. But chances are the impact is going to be small.
“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified” - Donald Knuth
From the MSDN documentation:
"Synchronized supports multiple writing threads, provided that no threads are reading the Hashtable. The synchronized wrapper does not provide thread-safe access in the case of one or more readers and one or more writers."
Source:
http://msdn.microsoft.com/en-us/library/system.collections.hashtable.synchronized.aspx
It sounds like I still have to use locks anyways, so my question is why would we use Hashtable.Synchronized at all?
For the same reason there are different levels of DB transaction. You may care that writes are guaranteed, but not mind reading stale/possibly bad data.
EDIT I note that their specific example is an Enumerator. They can't handle this case in their wrapper, because if you break from the enumeration early, the wrapper class would have no way to know that it can release its lock.
Think instead of the case of a counter. Multiple threads can increase a value in the table, and you want to display the value of the count. It doesn't matter if you display 1,200,453 and the count is actually 1,200,454 - you just need it close. However, you don't want the data to be corrupt. This is a case where thread-safety is important for writes, but not reads.
For the case where you can guarantee that no reader will access the data structure while it is being written to (or when you don't care about reading wrong data). For example, where the structure is not continually being modified, but is the result of a one-time calculation that you'll access later, and is huge enough to warrant many threads writing to it.
You would need it when you are for-eaching over a hashtable on one thread (reads) while other threads may add/remove items to/from it (writes), although, per the documentation quoted above, the wrapper alone does not make that enumeration safe: you still have to lock the table's SyncRoot for the duration of the loop.
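To make the division of labour concrete, here is a small sketch under the assumptions of the MSDN quote: concurrent writers go through the synchronized wrapper, while enumeration is guarded separately by locking SyncRoot for the whole loop. SynchronizedHashtableDemo is a hypothetical name.

```csharp
using System.Collections;
using System.Threading.Tasks;

// Hypothetical demo of Hashtable.Synchronized plus an explicit lock for enumeration.
public static class SynchronizedHashtableDemo
{
    public static int FillAndCount(int writers, int itemsPerWriter)
    {
        Hashtable table = Hashtable.Synchronized(new Hashtable());

        // Multiple writing threads are supported by the synchronized wrapper.
        Parallel.For(0, writers, w =>
        {
            for (int i = 0; i < itemsPerWriter; i++)
            {
                table[w * itemsPerWriter + i] = i;   // distinct key per (writer, item)
            }
        });

        // Enumeration is not protected by the wrapper, so hold SyncRoot
        // for the entire loop to keep writers out while iterating.
        int count = 0;
        lock (table.SyncRoot)
        {
            foreach (DictionaryEntry entry in table)
            {
                count++;
            }
        }
        return count;
    }
}
```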