Why use Hashtable.Synchronized?

From the MSDN documentation:
"Synchronized supports multiple writing threads, provided that no threads are reading the Hashtable. The synchronized wrapper does not provide thread-safe access in the case of one or more readers and one or more writers."
Source:
http://msdn.microsoft.com/en-us/library/system.collections.hashtable.synchronized.aspx
It sounds like I still have to use locks anyway, so my question is: why would we use Hashtable.Synchronized at all?

For the same reason there are different DB transaction isolation levels. You may care that writes are guaranteed, but not mind reading stale or possibly bad data.
EDIT: I note that their specific example is an enumerator. They can't handle this case in their wrapper, because if you break from the enumeration early, the wrapper class has no way to know that it can release its lock.
Think instead of the case of a counter. Multiple threads can increase a value in the table, and you want to display the value of the count. It doesn't matter if you display 1,200,453 and the count is actually 1,200,454 - you just need it close. However, you don't want the data to be corrupt. This is a case where thread-safety is important for writes, but not reads.
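A minimal sketch of that idea, assuming each writer uses its own keys (note that a read-modify-write of one shared slot, like a true counter increment, would still need an explicit lock even with the wrapper):

using System;
using System.Collections;
using System.Threading;

class Program
{
    static void Main()
    {
        // Synchronized wrapper: each individual write is thread-safe.
        Hashtable table = Hashtable.Synchronized(new Hashtable());

        var writers = new Thread[4];
        for (int i = 0; i < writers.Length; i++)
        {
            int id = i;
            writers[i] = new Thread(() =>
            {
                for (int n = 0; n < 100000; n++)
                    table[id * 1000000 + n] = n;   // distinct keys per thread
            });
            writers[i].Start();
        }

        // A reader here would see a Count that may lag the writers
        // slightly - stale, but not corrupt.
        Console.WriteLine("approximate count: " + table.Count);

        foreach (var t in writers) t.Join();
        Console.WriteLine("final count: " + table.Count);   // 400000
    }
}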

For cases where you can guarantee that no reader will access the data structure while it is being written (or where you don't care about reading wrong data). For example, when the structure is not continually modified, but is filled by a one-time calculation that you'll only access later, yet is huge enough to warrant many threads writing to it.

You would need it when you are foreach-ing over a Hashtable on one thread (reads) while other threads may add or remove items (writes)...

How should I share a large read-only List<T> with each Task.Factory.StartNew() method

Consider that I have a custom class called Terms and that class contains a number of string properties. Then I create a fairly large (say 50,000 items) List<Terms> object. This List<Terms> only needs to be read from, but it needs to be read by multiple instances of Task.Factory.StartNew (the number of instances could vary from 1 to 100s).
How would I best pass that list into the long running task? Memory isn't too much of a concern as this is a custom application for a specific use on a specific server with plenty of memory. Should I reference it or should I just pass it off as a normal argument into the method doing the work?
Since you're passing a reference, it doesn't really matter how you pass it; the list itself won't be copied. As Ket Smith said, I would pass it as a parameter to the method you are executing.
The issue is List<T> is not entirely thread-safe. Reads by multiple threads are safe but a write can cause some issues:
It is safe to perform multiple read operations on a List, but issues can occur if the collection is modified while it’s being read. To ensure thread safety, lock the collection during a read or write operation.
From List<T>
You say your list is read-only so that may be a non-issue, but a single unpredictable change could lead to unexpected behavior and so it's bug-prone.
I recommend using ImmutableList<T> which is inherently thread-safe since it's immutable.
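A sketch of that approach, assuming the System.Collections.Immutable package is available and using a stand-in Terms class:

using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;
using System.Threading.Tasks;

public class Terms { public string Name { get; set; } }   // stand-in for the asker's class

public static class Demo
{
    public static void Run(List<Terms> source)
    {
        // One-time snapshot; after this, no thread can mutate it.
        ImmutableList<Terms> terms = source.ToImmutableList();

        Task[] tasks = Enumerable.Range(0, 8)
            .Select(_ => Task.Run(() => Process(terms)))   // passes the reference, not a copy
            .ToArray();
        Task.WaitAll(tasks);
    }

    static void Process(ImmutableList<Terms> terms)
    {
        foreach (Terms t in terms) { /* read-only work */ }
    }
}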
So long as you don't try to copy it into each separate task, it shouldn't make much difference: more a matter of coding style than anything else. Each task will still be working with the same list in memory: just a different reference to the same underlying list.
That said, sheerly as a matter of coding style and maintainability, I'd probably try to pass it in as a parameter to whatever method you're executing in your Task.Factory.StartNew() (or better yet, Task.Run() - see here). That way, you've clearly called out your task's dependencies, and if you decide that you need to get the list from some other place, it's more clear what you've got to change. (But you could probably find 20 places in my own code where I haven't followed that rule: sometimes I go with what's easier for me now than with what's likely to be easier for the me six months from now.)

Multiple writers, one reader, which collection

My situation is this:
Multiple threads must write concurrently to the same collection (Add and AddRange). Order of items is not an issue.
When all threads have completed (Join) and I'm back on my main thread, I need to read all the collected data quickly, foreach-style, with no actual locking needed since all threads are done.
In the "old days" I would probably have used a reader-writer lock on a List for this, but with the new concurrent collections I wonder if there is a better alternative. I just can't figure out which, as most concurrent collections seem to assume that the reader is also on a concurrent thread.
I don't believe you want to use any of the collections in System.Collections.Concurrent. These generally have extra overhead to allow for concurrent reading.
Unless you have a lot of contention, you are probably better off taking a lock on a simple List<T> and adding to it. You will incur a small amount of overhead as the List resizes, but that will be fairly infrequent.
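For reference, a sketch of that locked-List approach (Parallel.For stands in for the asker's threads):

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        var results = new List<int>();
        var gate = new object();

        // Writers: serialize each Add with a simple lock.
        Parallel.For(0, 1000000, i =>
        {
            lock (gate) { results.Add(i); }
        });

        // Parallel.For has returned, so reading needs no lock.
        Console.WriteLine(results.Count);
    }
}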
However, what I would probably do in this case is simply add to a List<T> per thread rather than a shared one, and either merge them at the end of processing or simply iterate over all elements of each collection.
You could possibly use a ConcurrentBag and then call .ToArray() or GetEnumerator() on it when ready to read (bypassing a per-read penalty), but you may find that the speed of insertions is a bit slower than your manual write lock on a simple List. It really depends on the amount of contention. The ConcurrentBag is pretty good about partitioning, but as you noted, it is geared to concurrent reads and writes.
As always, benchmark your particular situation! Multithreading performance is highly dependent on many things in actual usage, and things like type of data, number of insertions, and such will change the results dramatically - a handful of reality is worth a gallon of theory.
Order of items is not an issue. When all threads have completed (join) and I'm back on my main thread, then I need to read all the collected data
You have not stated a requirement for a thread-safe collection at all. There's no point in sharing a single collection since you never read at the same time you write. Nor does it matter that all writing happens to the same collection since order doesn't matter. Nor should it matter since order would be random anyway.
So just give each thread its own collection to fill, no locking required. And iterate them one by one afterwards, no locking required.
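A sketch of that per-thread-collection approach (task counts and payloads are made up):

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        const int threadCount = 4;
        var lists = new List<int>[threadCount];

        var tasks = new Task[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            int id = t;
            tasks[t] = Task.Run(() =>
            {
                // Each task fills its own list: no sharing, no locking.
                var local = new List<int>();
                for (int i = 0; i < 250000; i++) local.Add(id);
                lists[id] = local;   // each task writes a distinct slot
            });
        }
        Task.WaitAll(tasks);   // the join: all writers are done

        // Back on the main thread: iterate them one by one, no locking.
        long total = 0;
        foreach (var list in lists)
            foreach (int item in list) total += item;
        Console.WriteLine(total);
    }
}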
Try the System.Collections.Concurrent.ConcurrentBag.
From the collection's description:
Represents a thread-safe, unordered collection of objects.
I believe this meets your criteria: it handles multiple threads, the order of items is not important, and later, when you are back on the main thread, you can quickly foreach over the collection and act on each item.
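A minimal sketch of the ConcurrentBag approach under those assumptions:

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        var bag = new ConcurrentBag<int>();

        // Multiple concurrent writers; item order does not matter.
        Parallel.For(0, 1000000, i => bag.Add(i));

        // Back on the main thread, all writers done: plain foreach is fine.
        long sum = 0;
        foreach (int item in bag) sum += item;
        Console.WriteLine(sum);
    }
}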

C# lock with LINQ query

Is it necessary to lock LINQ statements as follows? If the lock is omitted, will any exceptions be encountered when multiple threads execute it concurrently?
lock (syncKey)
{
    return (from keyValue in dictionary
            where keyValue.Key > versionNumber
            select keyValue.Value).ToList();
}
PS: Writer threads do exist to mutate the dictionary.
Most types are thread-safe to read, but not thread-safe during mutation.
If none of the threads is changing the dictionary, then you don't need to do anything - just read away.
If, however, one of the threads is changing it, then you have problems and need to synchronize. The simplest approach is a lock; however, this prevents concurrent readers even when there is no writer. If there is a good chance you will have more readers than writers, consider using a ReaderWriterLockSlim to synchronize - this will allow any number of readers (with no writer), or one writer.
In 4.0 you might also consider a ConcurrentDictionary<,>
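A sketch of the ReaderWriterLockSlim approach applied to the query above (class and member names are made up):

using System.Collections.Generic;
using System.Linq;
using System.Threading;

class VersionedStore
{
    private readonly Dictionary<int, string> dictionary = new Dictionary<int, string>();
    private readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();

    // Any number of readers may hold the read lock at once.
    public List<string> GetAfter(int versionNumber)
    {
        rwLock.EnterReadLock();
        try
        {
            return (from keyValue in dictionary
                    where keyValue.Key > versionNumber
                    select keyValue.Value).ToList();
        }
        finally { rwLock.ExitReadLock(); }
    }

    // Writers get exclusive access.
    public void Put(int version, string value)
    {
        rwLock.EnterWriteLock();
        try { dictionary[version] = value; }
        finally { rwLock.ExitWriteLock(); }
    }
}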
So long as the query has no side effects (such as any of the expressions calling code that makes changes), there is no need to lock a LINQ statement.
Basically, if you don't modify the data (and nothing else is modifying the data you are using) then you don't need locks.
If you are using .NET 4.0, there is a ConcurrentDictionary that is thread-safe.
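For illustration, a sketch (names are hypothetical) of the same kind of LINQ query running against a ConcurrentDictionary, which tolerates concurrent writers:

using System;
using System.Collections.Concurrent;
using System.Linq;

class Program
{
    static void Main()
    {
        var dictionary = new ConcurrentDictionary<int, string>();
        dictionary.TryAdd(1, "v1");   // writers may call TryAdd/TryRemove
        dictionary.TryAdd(2, "v2");   // from other threads at any time

        // Enumeration is safe alongside writers, though it may reflect
        // a mixture of before- and after-write states.
        int versionNumber = 1;
        var values = (from keyValue in dictionary
                      where keyValue.Key > versionNumber
                      select keyValue.Value).ToList();

        Console.WriteLine(string.Join(", ", values));
    }
}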
UPDATE
If you are modifying data then you need to use locks. If two or more threads attempt to access a locked section of code, there will be a small performance loss as one or more of the threads wait for the lock to be released. NOTE: If you over-lock, you may end up with worse performance than you would have had if you had just built the code using a sequential algorithm from the start.
If you are only ever reading data then you don't need locks as there is no mutable shared state to protect.
If you do not use locks then you may end up with intermittent bugs where the data is not quite right, or exceptions are thrown when collisions occur between readers and writers. In my experience, most of the time you never get an exception; you just get corrupt data (except you don't necessarily know it is corrupt). Here is another example showing how data can be corrupted if you don't use locks or redesign your algorithm to cope.
You often get the best out of a system if you consider the constraints of developing in a parallel system from the outset. Sometimes you can rewrite your code so it uses no shared data. Sometimes you can split the data up into chunks, have each thread/task work on its own chunk, and then have some process at the end stitch it all back together again.
If your dictionary is static and the method where you run the query is not (or in other concurrent-access scenarios), and the dictionary can be modified from another thread, then yes, a lock is required; otherwise it is not.
Yes, you need to lock your shared resources when using LINQ in multi-threaded scenarios (EDIT: of course, if your source collection is being modified as Marc said, if you are only reading it, you don't need to worry about it). If you are using .Net 4 or the parallel extensions for 3.5 you could look at replacing your Dictionary with a ConcurrentDictionary (or use some other custom implementation anyway).

How to correctly read an Interlocked.Increment'ed int field?

Suppose I have a non-volatile int field, and a thread which Interlocked.Increments it. Can another thread safely read this directly, or does the read also need to be interlocked?
I previously thought that I had to use an interlocked read to guarantee that I'm seeing the current value, since, after all, the field isn't volatile. I've been using Interlocked.CompareExchange(ref field, 0, 0) to achieve that.
However, I've stumbled across this answer which suggests that plain reads will actually always see the current version of an Interlocked.Increment'ed value, and since reading an int is already atomic, there's no need to do anything special. I've also found a feature request for Interlocked.Read(ref int) that Microsoft rejected, further suggesting that this is completely redundant.
So can I really safely read the most current value of such an int field without Interlocked?
If you want to guarantee that the other thread will read the latest value, you must use Thread.VolatileRead(). (*)
The read operation itself is atomic, so that will not cause any problems, but without a volatile read you may get an old value from a cache, or the compiler may optimize your code and eliminate the read operation altogether. From the compiler's point of view it is enough that the code works in a single-threaded environment. Volatile operations and memory barriers are used to limit the compiler's ability to optimize and reorder the code.
There are several participants that can alter the code: compiler, JIT-compiler and CPU. It does not really matter which one of them shows that your code is broken. The only important thing is the .NET memory model as it specifies the rules that must be obeyed by all participants.
(*) Thread.VolatileRead() does not really get the latest value. It reads the value and adds a memory barrier after the read. The first volatile read may get a cached value, but the second will get an updated value, because the memory barrier of the first volatile read forced a cache update if one was necessary. In practice this detail has little importance when writing code.
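A minimal sketch of the pattern described above (on newer frameworks, Volatile.Read(ref _value) is the equivalent call):

using System.Threading;

class Counter
{
    private int _value;   // deliberately non-volatile

    public void Increment()
    {
        Interlocked.Increment(ref _value);
    }

    public int Read()
    {
        // Volatile read: prevents the JIT from caching the field in a
        // register and adds the memory barrier described above.
        return Thread.VolatileRead(ref _value);
    }
}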
A bit of a meta issue, but a good aspect of using Interlocked.CompareExchange(ref value, 0, 0) (ignoring the obvious disadvantage that it's harder to understand when used for reading) is that it works regardless of whether the field is an int or a long. It's true that int reads are always atomic, but long reads are not, or may not be, depending on the architecture. Unfortunately, Interlocked.Read(ref value) only works if value is of type long.
Consider the case where you start with an int field, which makes it impossible to use Interlocked.Read(), so you read the value directly instead, since that's atomic anyway. However, later in development you or somebody else decides that a long is required - the compiler won't warn you, but now you may have a subtle bug: the read access is no longer guaranteed to be atomic. I found using Interlocked.CompareExchange() the best alternative here; it may be slower depending on the underlying processor instructions, but it is safer in the long run. I don't know enough about the internals of Thread.VolatileRead() though; it might be "better" regarding this use case since it provides even more signatures.
I would not try to read the value directly (i.e. without any of the above mechanisms) within a loop or any tight method call though, since even if the writes are volatile and/or behind memory barriers, nothing tells the compiler that the value of the field can actually change between two reads. So the field should either be volatile, or one of the given constructs should be used.
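A sketch of the CompareExchange-as-read idea; the same call shape keeps compiling if the field is later widened from int to long:

using System.Threading;

class Stats
{
    private long _total;   // plain reads of a long can tear on 32-bit platforms

    public void Add(long amount)
    {
        Interlocked.Add(ref _total, amount);
    }

    public long Total
    {
        get
        {
            // With an equal comparand and value, CompareExchange never
            // modifies the field; it just returns the current value atomically.
            return Interlocked.CompareExchange(ref _total, 0, 0);
        }
    }
}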
My two cents.
You're correct that you do not need a special instruction to atomically read a 32-bit integer; however, what that means is you will get the "whole" value (i.e. you won't get part of one write and part of another). You have no guarantee that the value won't have changed once you have read it.
It is at this point where you need to decide if you need to use some other synchronization method to control access, say if you're using this value to read a member from an array, etc.
In a nutshell, atomicity ensures an operation happens completely and indivisibly. Given some operation A that contained N steps, if you made it to the operation right after A you can be assured that all N steps happened in isolation from concurrent operations.
If you had two threads which executed the atomic operation A you are guaranteed you will see only the complete result of one of the two threads. If you want to coordinate the threads, atomic operations could be used to create the required synchronization. But atomic operations in and of themselves do not provide higher level synchronization. The Interlocked family of methods are made available to provide some fundamental atomic operations.
Synchronization is a broader kind of concurrency control, often built around atomic operations. Most processors include memory barriers which allow you to ensure all cache lines are flushed and you have a consistent view of memory. Volatile reads are a way to ensure consistent access to a given memory location.
While not immediately applicable to your problem, reading up on ACID (atomicity, consistency, isolation, and durability) with respect to databases may help you with the terminology.
Me, being paranoid, I do Interlocked.Add(ref incrementedField, 0) for int values.
Yes, everything you've read is correct. Interlocked.Increment is designed so that normal reads will not see a torn or invalid value while the field is being changed. Reading the field is not dangerous; writing it is.

Multithreaded access to memory

Good morning,
Say I have some 6 different threads, and I want to share the same data with each of them at the same time. Can I make a class variable with the data I want to share and have each thread access that memory concurrently without a performance penalty, or is it preferable to pass a true copy of the data to each thread?
Thank you very much.
It depends entirely on the data;
if the data is immutable (or mutable but you don't actually mutate it), then chuck all the threads at it - great
if you need to mutate it, but no two threads will ever depend on the data mutated by another - great
if you need to mutate it, and there are conflicts but you can sensibly synchronize access to the data such that there is no risk of two threads deadlocking etc - great, but not always trivial
if it is not safe to make any assumptions, then a true clone of the data is the safest approach, but has the most overhead in terms of data duplication; if the data is cheap to copy, this may be fine - and indeed may outperform synchronization
if the threads do co-depend on each other, then you have no option other than to figure out some kind of sensible locking strategy; again - to stress: deadlocks are a problem here - some ideas:
always provide a timeout when obtaining a lock
if you need to lock two items, it may help to try locking both eagerly (rather than locking one at the start and the other after you've made lots of changes) - then you can simply release and re-take the locks without having to undo changes or put the data back into a particular state; a sketch combining both ideas follows below
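A sketch combining both ideas - eager acquisition of both locks, each with a timeout (helper name and shape are made up):

using System;
using System.Threading;

static class LockHelper
{
    // Take both locks up front, each with a timeout, so a thread never
    // blocks forever on one lock while already holding the other.
    public static bool TryRunLocked(object lockA, object lockB, Action work, int timeoutMs)
    {
        bool gotA = false, gotB = false;
        try
        {
            Monitor.TryEnter(lockA, timeoutMs, ref gotA);
            if (gotA) Monitor.TryEnter(lockB, timeoutMs, ref gotB);

            if (gotA && gotB)
            {
                work();
                return true;
            }
            return false;   // caller can back off and retry; nothing to undo
        }
        finally
        {
            if (gotB) Monitor.Exit(lockB);
            if (gotA) Monitor.Exit(lockA);
        }
    }
}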
