lock keyword on a LINQ Parallel.ForEach<> loop


This is more of a conceptual question. I was wondering whether using a lock inside a Parallel.ForEach<> loop would take away the benefits of parallelizing a foreach loop.
Here is some sample code where I have seen it done.
Parallel.ForEach<KeyValuePair<string, XElement>>(binReferences.KeyValuePairs, reference =>
{
lock (fileLockObject)
{
if (fileLocks.ContainsKey(reference.Key) == false)
{
fileLocks.Add(reference.Key, new object());
}
}
RecursiveBinUpdate(reference.Value, testPath, reference.Key, maxRecursionCount, ref recursionCount);
lock (fileLocks[reference.Key])
{
reference.Value.Document.Save(reference.Key);
}
});
Where fileLockObject and fileLocks are as follows.
private static object fileLockObject = new object();
private static Dictionary<string, object> fileLocks = new Dictionary<string, object>();
Does this technique completely make the loop not parallel?
I would like to see your thoughts on this.

It means all of the work inside of the lock can't be done in parallel. This greatly harms the performance here, yes. Since the entire body is not all locked (and locked on the same object) there is still some parallelization here though. Whether the parallelization that you do get adds enough benefit to surpass the overhead that comes with managing the threads and synchronizing around the locks is something you really just need to test yourself with your specific data.
That said, it looks like what you're doing (at least in the first locked block, which is the one I'd be more concerned with, as every thread is locking on the same object) is locking access to a Dictionary. You can instead use a ConcurrentDictionary, which is specifically designed to be used from multiple threads and will minimize the amount of synchronization that needs to be done.
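A minimal sketch of that suggestion, assuming the dictionary only has to hand out one lock object per file path. The names FileLockRegistry and GetLockFor are hypothetical; GetOrAdd is the actual ConcurrentDictionary method.

```csharp
using System.Collections.Concurrent;

// Hypothetical helper: hands out one lock object per file path.
public static class FileLockRegistry
{
    private static readonly ConcurrentDictionary<string, object> fileLocks =
        new ConcurrentDictionary<string, object>();

    // GetOrAdd returns the lock object already stored for the key, or stores a new one.
    // Under a race the factory may run more than once, but every caller
    // receives the single object that ended up in the dictionary.
    public static object GetLockFor(string path)
    {
        return fileLocks.GetOrAdd(path, _ => new object());
    }
}
```

Inside the loop body, lock (FileLockRegistry.GetLockFor(reference.Key)) { ... } would then replace both the global lock and the unprotected fileLocks[reference.Key] lookup.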

if I used a lock ... if that would take away the benefits of Paralleling a foreachloop.
Proportionally. When RecursiveBinUpdate() is a big chunk of work (and independent), it will still pay off. The locking part could be less than 1% of the total, or 99%. Look up Amdahl's law, which applies here.
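For reference, Amdahl's law in its usual form. Here p is the fraction of the work that can run in parallel (everything outside the locks), 1 - p is the serialized fraction (time spent holding the lock), and n is the number of threads; the symbols are the conventional ones, not taken from the question.

```latex
S(n) = \frac{1}{(1 - p) + \frac{p}{n}}
```

With n = 8, a loop that is 99% parallel (p = 0.99) can speed up by at most about 7.5x, while a loop that spends 99% of its time inside the lock (p = 0.01) gains only about 1%.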
But worse, your code is not thread-safe. From your 2 operations on fileLocks, only the first is actually inside a lock.
lock (fileLockObject)
{
if (fileLocks.ContainsKey(reference.Key) == false)
{
...
}
}
and
lock (fileLocks[reference.Key]) // this access to fileLocks[] is not protected
change the 2nd part to:
lock (fileLockObject)
{
reference.Value.Document.Save(reference.Key);
}
The use of ref recursionCount as a parameter looks suspicious too. It might work with Interlocked.Increment, though.
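A small sketch of the Interlocked.Increment idea, assuming the shared value is a plain int counter. SharedCounterDemo and CountInParallel are hypothetical names, and the loop is a stand-in for the real work, not the RecursiveBinUpdate code from the question.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class SharedCounterDemo
{
    // Counts iterations across parallel workers without taking a lock.
    public static int CountInParallel(int items)
    {
        int counter = 0;
        Parallel.For(0, items, i =>
        {
            // Atomic read-modify-write; a plain counter++ here could lose updates.
            Interlocked.Increment(ref counter);
        });
        return counter;
    }
}
```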

The "locked" portion of the loop will end up running serially. If the RecursiveBinUpdate function is the bulk of the work, there may be some gain, but it would be better if you could figure out how to handle the lock generation in advance.

When it comes to locks, there's no difference in the way PLINQ/TPL threads have to wait to gain access. So, in your case, it only makes the loop not parallel in those areas that you're locking and any work outside those locks is still going to execute in parallel (i.e. all the work in RecursiveBinUpdate).
Bottom line, I see nothing substantially wrong with what you're doing here.

Related

Best Practices Concerning Method Locking

I have a method whose access must be synchronized, allowing only one thread at a time to go through it. Here is my current implementation:
private Boolean m_NoNeedToProceed;
private Object m_SynchronizationObject = new Object();
public void MyMethod()
{
lock (m_SynchronizationObject)
{
if (m_NoNeedToProceed)
return;
Now I was thinking about changing it a little bit like so:
private Boolean m_NoNeedToProceed;
private Object m_SynchronizationObject = new Object();
public void MyMethod()
{
if (m_NoNeedToProceed)
return;
lock (m_SynchronizationObject)
{
Shouldn't it be better to do a quick return before locking it so that calling threads can proceed without waiting for the previous one to complete the method call?

Shouldn't it be better to do a quick return before locking it...
No. A lock is not just a mutual-exclusion mechanism, it's also a memory barrier [1]. Without a lock, you could introduce a data race if any of the concurrent threads tries to modify the variable [2].
BTW, locks perform well when there is no contention, so you wouldn't be gaining much performance anyway. As always, refrain from making assumptions about performance, especially this "close to the metal". If in doubt, measure!
...so that calling threads can proceed without waiting for the previous one to complete the method call?
This just means you are holding the lock for longer than necessary. Release the lock as soon as the shared memory no longer needs protection (which might be sooner than the method exit), and you won't need to try to artificially circumvent it.
[1] I.e. triggers a cache coherency mechanism so all CPU cores see the "same" memory.
[2] For example, one thread writes to the variable, but that change lingers in one core's write buffer for some time, so other threads on other cores don't see it immediately.

Yes, as long as m_NoNeedToProceed doesn't have any race conditions associated with it.
If the method takes a long time to run, and some threads do not actually need to access the critical sections of the method, then it would be best to let them return early without taking the lock.

Yes, it's better to do that before you lock.
Make m_NoNeedToProceed volatile
Just a disclaimer: volatile doesn't make it thread safe. It just causes a barrier to check if the value has changed in another processor.
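One way the answers above could be combined, as a sketch only: the flag is volatile so the unlocked read sees recent writes, and it is checked again under the lock because another thread may have set it in the meantime. This assumes the method's job is one-time work after which the flag is set; the Worker class and the RunCount property are hypothetical and exist only to make the effect observable.

```csharp
public class Worker
{
    private volatile bool m_NoNeedToProceed;
    private readonly object m_SynchronizationObject = new object();
    private int m_RunCount;

    // Hypothetical: how many times the protected work actually ran.
    public int RunCount
    {
        get { lock (m_SynchronizationObject) { return m_RunCount; } }
    }

    public void MyMethod()
    {
        // Fast path: return without touching the lock.
        if (m_NoNeedToProceed)
            return;

        lock (m_SynchronizationObject)
        {
            // Re-check: another thread may have finished while we waited for the lock.
            if (m_NoNeedToProceed)
                return;

            m_RunCount++;               // stand-in for the real work
            m_NoNeedToProceed = true;   // later callers take the fast path
        }
    }
}
```

Whether the fast path is worth the extra care is exactly what the first answer says to measure, since an uncontended lock is already cheap.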

When would you ever use nested locking?

I was reading Albahari's excellent eBook on threading and came across a scenario where he mentions that "a thread can repeatedly lock the same object in a nested (reentrant) fashion":
lock (locker)
lock (locker)
lock (locker)
{
// Do something...
}
as well as
static readonly object _locker = new object();
static void Main()
{
lock (_locker)
{
AnotherMethod();
// We still have the lock - because locks are reentrant.
}
}
static void AnotherMethod()
{
lock (_locker) { Console.WriteLine ("Another method"); }
}
From the explanation, any other thread will block on the first (outermost) lock, and the object is unlocked only after the outermost lock statement has exited.
He states "nested locking is useful when one method calls another within a lock"
Why is this useful? When would you NEED to do this and what problem does it solve?

Let's say you have two public methods, A() and B(), which both need the same lock.
Furthermore, let's say that A() calls B()
Since the client can also call B() directly, you need to lock in both methods.
Therefore, when A() is called, B() will take the lock a second time.
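The A()/B() situation could look like this. The Account class and its members are hypothetical: Deposit plays the role of B(), and DepositTwice plays the role of A().

```csharp
public class Account
{
    private readonly object _locker = new object();
    private int _balance;

    public int Balance
    {
        get { lock (_locker) { return _balance; } }
    }

    // B(): clients can call this directly, so it has to take the lock itself.
    public void Deposit(int amount)
    {
        lock (_locker)
        {
            _balance += amount;
        }
    }

    // A(): needs the lock for its whole body and calls B() while holding it.
    public void DepositTwice(int amount)
    {
        lock (_locker)
        {
            Deposit(amount); // re-enters _locker on the same thread, no deadlock
            Deposit(amount);
        }
    }
}
```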

It's not so much that it's useful to do so, as it's useful to be allowed to. Consider how you may often have public methods that call other public methods. If the public method called into locks, and the public method calling into it needs to lock on the wider scope of what it does, then being able to use recursive locks means you can do so.
There are some cases where you might feel like using two lock objects, but you're going to be using them together and hence if you make a mistake, there's a big risk of deadlock. If you can deal with the wider scope being given to the lock, then using the same object for both cases - and recursing in those cases where you'd be using both objects - will remove those particular deadlocks.
However...
This usefulness is debatable.
On the first case, I'll quote from Joe Duffy:
Recursion typically indicates an over-simplification in your synchronization design that often leads to less reliable code. Some designs use lock recursion as a way to avoid splitting functions into those that take locks and those that assume locks are already taken. This can admittedly lead to a reduction in code size and therefore a shorter time-to-write, but results in a more brittle design in the end.
It is always a better idea to factor code into public entry-points that take non-recursive locks, and internal worker functions that assert a lock is held. Recursive lock calls are redundant work that contributes to raw performance overhead. But worse, depending on recursion can make it more difficult to understand the synchronization behavior of your program, in particular at what boundaries invariants are supposed to hold. Usually we’d like to say that the first line after a lock acquisition represents an invariant “safe point” for an object, but as soon as recursion is introduced this statement can no longer be made confidently. This in turn makes it more difficult to ensure correct and reliable behavior when dynamically composed.
(Joe has more to say on the topic elsewhere in his blog, and in his book on concurrent programming).
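The factoring described in the quote might be sketched like this: public entry points take the lock once, and a private worker only asserts that it is held. The Counter class is hypothetical; Monitor.IsEntered is available from .NET Framework 4.5 onwards, and Debug.Assert is only checked in debug builds.

```csharp
using System.Diagnostics;
using System.Threading;

public class Counter
{
    private readonly object _locker = new object();
    private int _value;

    // Public entry points: each takes the lock exactly once.
    public int Increment()
    {
        lock (_locker)
        {
            return IncrementCore();
        }
    }

    public int IncrementTwice()
    {
        lock (_locker)
        {
            IncrementCore();
            return IncrementCore();
        }
    }

    // Internal worker: never locks, only asserts that the caller already holds the lock.
    private int IncrementCore()
    {
        Debug.Assert(Monitor.IsEntered(_locker), "caller must hold _locker");
        return ++_value;
    }
}
```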
The second case is balanced by the cases where recursive lock entry just makes different types of deadlock happen, or push up the rate of contention so high that there might as well be deadlocks (This guy says he'd prefer it just to hit a deadlock the first time you recursed, I disagree - I'd much prefer it just to throw a big exception that brought my app down with a nice stack-trace).
One of the worst things is that it simplifies at the wrong time: when you're writing code it can be simpler to use lock recursion than to split things out more and think more deeply about just what should be locking when. However, when you're debugging code, the fact that leaving a lock does not mean leaving that lock complicates things. What a bad way around - it's when we think we know what we're doing that complicated code is a temptation to be enjoyed in your off-time so you don't indulge while on the clock, and when we realise we've messed up that we most want things to be nice and simple.
You really don't want to mix them with condition variables.
Hey, POSIX-threads only has them because of a dare!
At least the lock keyword means we avoid the possibility of not having matching Monitor.Exit()s for every Monitor.Enter()s which makes some of the risks less likely. Up until the time you need to do something outside of that model.
With more recent locking classes, .NET does its bit to help people avoid using lock recursion, without blocking those who use older coding patterns. ReaderWriterLockSlim has a constructor overload that lets you use it recursively, but the default is LockRecursionPolicy.NoRecursion.
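A small sketch of what that default means in practice. RecursionPolicyDemo and RecursiveReadThrows are hypothetical names; ReaderWriterLockSlim, LockRecursionPolicy and LockRecursionException are the real types.

```csharp
using System.Threading;

public static class RecursionPolicyDemo
{
    // Returns true if entering the read lock a second time on the same thread throws.
    public static bool RecursiveReadThrows(LockRecursionPolicy policy)
    {
        using (var rw = new ReaderWriterLockSlim(policy))
        {
            rw.EnterReadLock();
            try
            {
                rw.EnterReadLock();   // second entry on the same thread
                rw.ExitReadLock();    // only reached when recursion is allowed
                return false;
            }
            catch (LockRecursionException)
            {
                return true;
            }
            finally
            {
                rw.ExitReadLock();    // release the first entry
            }
        }
    }
}
```

With LockRecursionPolicy.NoRecursion the second entry throws, which surfaces accidental recursion immediately; with SupportsRecursion it is allowed.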
Often in dealing with issues of concurrency we have to make a decision between a more fraught technique that could potentially give us better concurrency but which requires much more care to be sure of correctness vs a simpler technique that could potentially give worse concurrency but where it is easier to be sure of the correctness. Using locks recursively gives us a technique where we will hold locks longer and have less good concurrency, and also be less sure of correctness and have harder debugging.

Suppose you have a resource that you want exclusive control over, but many methods act upon this resource. A method might not be able to assume that the resource is already locked, so it will take the lock itself. If it's locked in both the outer method AND the inner method, you get a situation similar to the example in the book. I cannot see a time where I would want to lock twice in the same code block.

When to use the lock thread in C#?

I have a server which handles multiple incoming socket connections and creates 2 different threads which store the data in XML format.
I was using the lock statement for thread safety in almost every event handler called asynchronously and in the 2 threads in different parts of the code. Sadly, with this approach my application slows down significantly.
I tried to not use lock at all and the server is very fast in execution, even the file storage seems to boost; but the program crashes for reasons I don't understand after 30sec - 1min. of work.
So. I thought that the best way is to use less locks or to use it only there where's strictly necessary. As such, I have 2 questions:
Is the lock needed when I write to the public accessed variables (C# lists) only or even when I read from them ?
Is the lock needed only in the asynchronous threads created by the socket handler or in other places too ?
Could someone give me some practical guidelines about how to proceed? I won't post the whole code this time. It makes no sense to post about 2500 lines of code.

You ever sit in your car or on the bus at a red light when there's no cross traffic? Big waste of time, right? A lock is like a perfect traffic light. It is always green except when there is traffic in the intersection.
Your question is "I spend too much time in traffic waiting at red lights. Should I just run the red light? Or even better, should I remove the lights entirely and just let everyone drive through the intersection at highway speeds without any intersection controls?"
If you're having a performance problem with locks then removing locks is the last thing you should do. You are waiting at that red light precisely because there is cross traffic in the intersection. Locks are extraordinarily fast if they are not contended.
You can't eliminate the light without eliminating the cross traffic first. The best solution is therefore to eliminate the cross traffic. If the lock is never contended then you'll never wait at it. Figure out why the cross traffic is spending so much time in the intersection; don't remove the light and hope there are no collisions. There will be.
If you can't do that, then adding more finely-grained locks sometimes helps. That is, maybe you have every road in town converging on the same intersection. Maybe you can split that up into two intersections, so that code can be moving through two different intersections at the same time.
Note that making the cars faster (getting a faster processor) or making the roads shorter (eliminating code path length) often makes the problem worse in multithreaded scenarios. Just as it does in real life; if the problem is gridlock then buying faster cars and driving them on shorter roads gets them to the traffic jam faster, but not out of it faster.

Is the lock needed when I write to the public accessed variables (C# lists) only or even when I read from them ?
Yes (even when you read).
Is the lock needed only in the asynchronous threads created by the socket handler or in other places too ?
Yes. Wherever code accesses a section of code which is shared, always lock.
This sounds like you may not be locking individual objects, but locking one thing for all lock situations.
If so put in smart discrete locks by creating individual unique objects which relate and lock only certain sections at a time, which don't interfere with other threads in other sections.
Here is an example:
// This class simulates the use of two different thread safe resources and how to lock them
// for thread safety but not block other threads getting different resources.
public class SmartLocking
{
// Fields protected by _Lock1 and _Lock2 respectively (names match the assignments below).
private string _Resource1;
private string _Resource2;
private readonly object _Lock1 = new object();
private readonly object _Lock2 = new object();
public void DoWorkOn1( string change )
{
lock (_Lock1)
{
_Resource1 = change;
}
}
public void DoWorkOn2( string change2 )
{
lock (_Lock2)
{
_Resource2 = change2;
}
}
}

Always use a lock when you access members (either read or write). If you are iterating over a collection, and from another thread you're removing items, things can go wrong quickly.
A suggestion is when you want to iterate a collection, copy all the items to a new collection and then iterate the copy. I.e.
List<Item> newcollection; // Item is a placeholder for your element type. Initialize etc.
lock(mycollection)
{
// Copy from mycollection to newcollection
}
foreach(var item in newcollection)
{
// Do stuff
}
Likewise, only use the lock the moment you are actually writing to the list.

The reason that you need to lock while reading is:
Let's say one thread is changing a property inside a lock while another thread reads it twice without locking: once right before the change and once right after. The reader then sees inconsistent results.
I hope that helps,

Basically, this can be answered pretty simply:
You need to lock everything that is accessed by different threads. It doesn't really matter whether it's a read or a write. If you are reading while another thread is overwriting the data at the same time, the data you read may be invalid and you may end up performing invalid operations.

locking only when modifying vs entire method

When should locks be used? Only when modifying data or when accessing it as well?
public class Test {
static Dictionary<string, object> someList = new Dictionary<string, object>();
static object syncLock = new object();
public static object GetValue(string name) {
if (someList.ContainsKey(name)) {
return someList[name];
} else {
lock(syncLock) {
object someValue = GetValueFromSomeWhere(name);
someList.Add(name, someValue);
return someValue;
}
}
}
}
Should there be a lock around the entire block, or is it OK to just add it to the actual modification? My understanding is that there could still be a race condition where one call does not find the value and starts to add it while another call right after runs into the same situation - but I'm not sure. Locking is still so confusing. I haven't run into any issues with code similar to the above, but I could just be lucky so far. Any help would be appreciated, as well as any good resources on how/when to lock objects.

You have to lock when reading too, or you can get unreliable data, or even an exception if a concurrent modification physically changes the target data structure.
In the case above, you need to make sure that multiple threads don't try to add the value at the same time, so you need at least a read lock while checking whether it is already present. Otherwise multiple threads could decide to add, find the value is not present (since this check is not locked), and then all try to add in turn (after getting the lock)
You could use a ReaderWriterLockSlim if you have many reads and only a few writes. In the code above you would acquire the read lock to do the check and upgrade to a write lock once you decide you need to add it. In most cases, only a read lock (which allows your reader threads to still run in parallel) would be needed.
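A sketch of the read-mostly pattern, under a few assumptions. ReaderWriterLockSlim cannot turn a plain read lock into a write lock in place; it offers EnterUpgradeableReadLock for that, but only one thread at a time can be in upgradeable mode. This version therefore releases the read lock, takes the write lock, and checks again. ReadMostlyCache is a hypothetical class, and the load delegate stands in for GetValueFromSomeWhere.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class ReadMostlyCache
{
    private readonly Dictionary<string, object> _items = new Dictionary<string, object>();
    private readonly ReaderWriterLockSlim _rw = new ReaderWriterLockSlim();

    public object GetValue(string name, Func<string, object> load)
    {
        // Common case: many readers can hold the read lock at the same time.
        _rw.EnterReadLock();
        try
        {
            object found;
            if (_items.TryGetValue(name, out found))
                return found;
        }
        finally
        {
            _rw.ExitReadLock();
        }

        // Rare case: take the exclusive write lock and check again, because
        // another thread may have added the value after we released the read lock.
        _rw.EnterWriteLock();
        try
        {
            object value;
            if (!_items.TryGetValue(name, out value))
            {
                value = load(name);
                _items.Add(name, value);
            }
            return value;
        }
        finally
        {
            _rw.ExitWriteLock();
        }
    }
}
```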
There is a summary of the available .Net 4 locking primitives here. Definitely you should understand this before you get too deep into multithreaded code. Picking the correct locking mechanism can make a huge performance difference.
You are correct that you have been lucky so far - that's a frequent feature of concurrency bugs. They are often hard to reproduce without targeted load testing, meaning correct design (and exhaustive testing, of course) is vital to avoid embarrassing and confusing production bugs.

Lock the whole block before you check for the existence of name. Otherwise, in theory, another thread could add it between the check, and your code that adds it.
Actually locking just when you perform the Add really doesn't do anything at all. All that would do is prevent another thread from adding something simultaneously. But since that other thread would have already decided it was going to do the add, it would just try to do it anyway as soon as the lock was released.
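Locking the whole block might look like the sketch below. LockedLookup is a hypothetical class name, and the delegate parameter stands in for the question's GetValueFromSomeWhere so that the example is self-contained.

```csharp
using System;
using System.Collections.Generic;

public static class LockedLookup
{
    private static readonly Dictionary<string, object> someList = new Dictionary<string, object>();
    private static readonly object syncLock = new object();

    public static object GetValue(string name, Func<string, object> getValueFromSomeWhere)
    {
        // The existence check and the Add happen under the same lock, so no other
        // thread can add the same key between the two steps.
        lock (syncLock)
        {
            object value;
            if (!someList.TryGetValue(name, out value))
            {
                value = getValueFromSomeWhere(name);
                someList.Add(name, value);
            }
            return value;
        }
    }
}
```

The cost is that a slow lookup blocks every other caller while it runs, which is the trade-off the other answers address.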

If a resource can only be read (never modified) by multiple threads, you do not need any locks.
If a resource can be accessed by multiple threads and can be modified, then all accesses/modifications need to be synchronized. In your example, if GetValueFromSomeWhere takes a long time to return, it is possible for a second call to be made with the same value in name, but the value has not been stored in the Dictionary.

Use ReaderWriterLock, or the slim version if you are under 4.0.
You acquire the reader lock for reads (which allows concurrent reads) and upgrade to the writer lock when something needs to be written (which allows only one writer at a time and blocks all reads, as well as other writing threads, until it is done).
Make sure to release your locks with the pattern to avoid deadlocking:
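void Write(object[] args)
{
// ReaderWriterLock is assumed to be a member of type System.Threading.ReaderWriterLock.
this.ReaderWriterLock.AcquireWriterLock(Timeout.Infinite);
try
{
this.myData.Write(args);
}
catch(Exception ex)
{
// Handle or log the error here; an empty catch silently hides failures.
}
finally
{
// Always runs, so the writer lock is released even if Write throws.
this.ReaderWriterLock.ReleaseWriterLock();
}
}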

Is the "lock" statement in C# time-consuming?

I have a method which has been called many times by other methods to hash data. Inside the method, some lock statements are used. Could you please let me know whether the lock statement is time-consuming and what is the best way to improve it.
P.S.: I have been looking for a way to avoid using the lock statement in this method.

Your question is not answerable. It depends entirely on whether the lock is contended or not.
Let me put it this way: you're asking "does it take a long time to enter the bathroom?" without telling us how many people are already in line to use it. If there is never anyone in line, not long at all. If there are usually twenty people waiting to get in, perhaps very long indeed.

The lock statement itself is not very time-consuming, but it may cause contention over shared data.
Followup: if you need to protect shared data, use a lock. Lock-free code is wicked difficult to do correctly, as this article illustrates.

You might find this article on threads interesting and relevant. One of the claims that it makes is "Locking itself is very fast: a lock is typically obtained in tens of nanoseconds assuming no blocking."

The lock statement itself is syntactic sugar for calls to Monitor.Enter and Monitor.Exit wrapped in a try/finally block.
This in itself is usually not overly resource intensive, but it can become a problem if you have multiple reads but no writes to your variable across multiple threads. Each read will have to wait for the other to finish before it can complete. In scenarios where you might be getting the variable from multiple threads, but not setting it, you might want to look at using a ReaderWriterLockSlim object to allow concurrent reads while still serializing writes.
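For illustration, the two methods below behave the same way; the second is roughly what the compiler generates for the first (C# 4 and later). LockExpansion and its members are hypothetical names.

```csharp
using System.Threading;

public class LockExpansion
{
    private readonly object _sync = new object();
    private int _count;

    // What you write.
    public int IncrementWithLock()
    {
        lock (_sync)
        {
            return ++_count;
        }
    }

    // Roughly what it compiles to.
    public int IncrementWithMonitor()
    {
        bool lockTaken = false;
        try
        {
            Monitor.Enter(_sync, ref lockTaken);
            return ++_count;
        }
        finally
        {
            // Release only if the lock was actually acquired.
            if (lockTaken)
                Monitor.Exit(_sync);
        }
    }
}
```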

I landed here for a slightly different question.
In a piece of code that can run either single-threaded or multi-threaded, should I refactor the code to remove the lock statement (i.e. is the lock statement free when there is no parallelism)?
This is my test
class Program
{
static void Main(string[] args)
{
var startingTime = DateTime.Now;
(new Program()).LockMethod();
Console.WriteLine("Elapsed {0}", (DateTime.Now - startingTime).TotalMilliseconds);
Console.ReadLine();
}
private void LockMethod()
{
int a = 0;
for (int i = 0; i < 10000000; i++)
{
lock (this)
{
a++; // costless operation
}
}
}
}
To be sure that this code is not optimized I decompiled it. No optimizations at all (a++ changed to ++a).
RESULT: 10 million uncontended lock acquisitions take about 160 ms, which is about 16 ns to acquire an uncontended lock.
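A variant of the same measurement, as a sketch: it uses Stopwatch, which has better resolution than DateTime.Now, and a private lock object instead of lock (this). LockTiming and NanosecondsPerLock are hypothetical names, and the numbers will vary by machine, runtime and build configuration.

```csharp
using System;
using System.Diagnostics;

public static class LockTiming
{
    // Returns the average cost in nanoseconds of one uncontended lock/unlock pair.
    public static double NanosecondsPerLock(int iterations)
    {
        object sync = new object();
        int a = 0;

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            lock (sync)
            {
                a++; // trivial work, so the loop mostly measures the lock itself
            }
        }
        sw.Stop();

        // Use the counter so the loop body has an observable result.
        if (a != iterations)
            throw new InvalidOperationException("counter mismatch");

        return sw.Elapsed.TotalMilliseconds * 1000000.0 / iterations;
    }
}
```

Calling LockTiming.NanosecondsPerLock(10000000) repeats the experiment above with the same iteration count.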
