Best Practices Concerning Method Locking - c#

I have a method whose access must be synchronized, allowing only one thread at a time to go through it. Here is my current implementation:
private Boolean m_NoNeedToProceed;
private Object m_SynchronizationObject = new Object();

public void MyMethod()
{
    lock (m_SynchronizationObject)
    {
        if (m_NoNeedToProceed)
            return;

        // ... rest of the method ...
    }
}
Now I was thinking about changing it a little bit like so:
private Boolean m_NoNeedToProceed;
private Object m_SynchronizationObject = new Object();

public void MyMethod()
{
    if (m_NoNeedToProceed)
        return;

    lock (m_SynchronizationObject)
    {
        // ... rest of the method ...
    }
}
Shouldn't it be better to do a quick return before locking it so that calling threads can proceed without waiting for the previous one to complete the method call?

Shouldn't it be better to do a quick return before locking it...
No. A lock is not just a mutual-exclusion mechanism, it's also a memory barrier1. Without a lock, you could introduce a data race if any of the concurrent threads tries to modify the variable2.
BTW, locks have good performance when there is no contention, so you wouldn't be gaining much performance anyway. As always, refrain from making assumptions about performance, especially this "close to the metal". If in doubt, measure!
...so that calling threads can proceed without waiting for the previous one to complete the method call?
This just means you are holding the lock for longer than necessary. Release the lock as soon as the shared memory no longer needs protection (which might be sooner than the method exit), and you won't need to try to artificially circumvent it.
1 I.e. triggers a cache coherency mechanism so all CPU cores see the "same" memory.
2 For example, one thread writes to the variable, but that change lingers in one core's write buffer for some time, so other threads on other cores don't see it immediately.

Yes, as long as m_NoNeedToProceed doesn't have any race conditions associated with it.
If the method takes a long time to run and some threads do not need to access its critical sections, then it is best to let them return early without taking the lock.

Yes, it's better to do that before you lock.
Make m_NoNeedToProceed volatile
Just a disclaimer: volatile doesn't make it thread safe. It just inserts a memory barrier so the check sees the value most recently written by another processor.
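Putting those two answers together, a minimal sketch of the combined pattern (a volatile early exit plus a re-check under the lock; the method body is illustrative):

private volatile bool m_NoNeedToProceed;
private readonly object m_SynchronizationObject = new object();

public void MyMethod()
{
    // Cheap early exit: a volatile read, so a value published by
    // another thread is visible here.
    if (m_NoNeedToProceed)
        return;

    lock (m_SynchronizationObject)
    {
        // Re-check under the lock: the flag may have been set between
        // the first check and acquiring the lock.
        if (m_NoNeedToProceed)
            return;

        // ... protected work ...
    }
}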

Related

How can I make an interrupt in C#? [duplicate]

I understand Thread.Abort() is evil from the multitude of articles I've read on the topic, so I'm currently in the process of ripping out all of my aborts in order to replace them with a cleaner way. After comparing user strategies from people here on stackoverflow and then reading "How to: Create and Terminate Threads (C# Programming Guide)" from MSDN, both of which describe much the same approach -- a volatile bool checking strategy -- which is nice, but I still have a few questions....
Immediately, what stands out to me here is: what if you do not have a simple worker process that just runs a loop of crunching code? For instance, my process is a background file uploader; I do in fact loop through each file, so that's something, and sure I could add my while (!_shouldStop) at the top, which covers me every loop iteration, but I have many more business processes that occur before it hits its next loop iteration. I want this cancel procedure to be snappy; don't tell me I need to sprinkle these checks every 4-5 lines throughout my entire worker function?!
I really hope there is a better way, so could somebody please advise me on whether this is in fact the correct [and only?] approach, or share strategies they have used in the past to achieve what I am after.
Thanks gang.
Further reading: all these SO responses assume the worker thread will loop. That doesn't sit comfortably with me. What if it is a linear, but time-consuming, background operation?
Unfortunately there may not be a better option. It really depends on your specific scenario. The idea is to stop the thread gracefully at safe points. That is the crux of why Thread.Abort is bad: it is not guaranteed to abort at a safe point. By sprinkling the code with a stopping mechanism you are, in effect, manually defining the safe points. This is called cooperative cancellation. There are basically five broad mechanisms for doing this. You can choose the one that best fits your situation.
Poll a stopping flag
You have already mentioned this method, and it is a pretty common one. Make periodic checks of the flag at safe points in your algorithm and bail out when it gets signalled. The standard approach is to mark the variable volatile. If that is not possible or inconvenient then you can use a lock. Remember, you cannot mark a local variable as volatile, so if a lambda expression captures it through a closure, for example, then you would have to resort to a different method for creating the required memory barrier. There is not a whole lot else that needs to be said about this method.
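A minimal sketch of the pattern (the Worker class and the _shouldStop and DoUnitOfWork names are illustrative, not from the question):

class Worker
{
    private volatile bool _shouldStop;

    // Runs on the worker thread, e.g. new Thread(worker.Run).Start();
    public void Run()
    {
        while (!_shouldStop)
        {
            DoUnitOfWork();   // keep units small so the flag is checked often
        }
        // Flag observed: clean up and let the thread exit at a safe point.
    }

    // Called from any other thread.
    public void RequestStop()
    {
        _shouldStop = true;
    }

    private void DoUnitOfWork() { /* ... */ }
}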
Use the new cancellation mechanisms in the TPL
This is similar to polling a stopping flag except that it uses the new cancellation data structures in the TPL. It is still based on cooperative cancellation patterns. You need to get a CancellationToken and then periodically check IsCancellationRequested. To request cancellation you would call Cancel on the CancellationTokenSource that originally provided the token. There is a lot you can do with the new cancellation mechanisms; you can read more about them here.
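A hedged sketch of the TPL route (CancellationTokenSource is .NET 4+; Task.Run shown here needs 4.5, but the same loop works on a plain Thread):

using System.Threading;
using System.Threading.Tasks;

class CancellationSketch
{
    static void Main()
    {
        var cts = new CancellationTokenSource();

        var worker = Task.Run(() =>
        {
            while (!cts.Token.IsCancellationRequested)
            {
                Thread.Sleep(100);   // stand-in for one unit of work
            }
            // Observed the request: clean up and return normally.
        }, cts.Token);

        cts.Cancel();    // request cooperative cancellation
        worker.Wait();   // the loop notices the token and finishes
    }
}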
Use wait handles
This method can be useful if your worker thread waits on a specific interval or for a signal during its normal operation. You can Set a ManualResetEvent, for example, to let the thread know it is time to stop. You can test the event with the WaitOne function, which returns a bool indicating whether the event was signalled; WaitOne also accepts a timeout parameter, and the call returns false if the event was not signalled within that time. You can use this technique in place of Thread.Sleep and get the stopping indication at the same time. It is also useful if there are other WaitHandle instances the thread may have to wait on: you can call WaitHandle.WaitAny to wait on any event (including the stop event) in one call. Using an event can be better than calling Thread.Interrupt since you have more control over the flow of the program (Thread.Interrupt throws an exception, so you would have to strategically place try-catch blocks to perform any necessary cleanup).
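A sketch of the wait-handle approach; the event doubles as both the sleep timer and the stop signal (class and method names are illustrative):

using System.Threading;

class StoppableWorker
{
    private readonly ManualResetEvent _stopEvent = new ManualResetEvent(false);

    public void Run()
    {
        // Waits up to 5 s between work items, but returns true the moment
        // the event is signalled - so this replaces Thread.Sleep(5000)
        // while also acting as the stop check.
        while (!_stopEvent.WaitOne(5000))
        {
            DoUnitOfWork();
        }
        // Signalled: clean up and exit.
    }

    public void RequestStop()
    {
        _stopEvent.Set();
    }

    private void DoUnitOfWork() { /* ... */ }
}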
Specialized scenarios
There are several one-off scenarios that have very specialized stopping mechanisms. It is definitely outside the scope of this answer to enumerate them all (never mind that it would be nearly impossible). A good example of what I mean here is the Socket class. If the thread is blocked on a call to Send or Receive then calling Close will interrupt the socket on whatever blocking call it was in, effectively unblocking it. I am sure there are several other areas in the BCL where similar techniques can be used to unblock a thread.
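For instance, a hedged sketch of the Socket case: the Close issued by another thread surfaces in the blocked call as an exception, which the worker treats as its stop signal:

using System;
using System.Net.Sockets;

static class SocketStop
{
    // Returns the byte count, or -1 if the socket was closed out from
    // under us as a stop request.
    public static int ReceiveOrStop(Socket socket, byte[] buffer)
    {
        try
        {
            return socket.Receive(buffer);   // blocks until data arrives
        }
        catch (SocketException)
        {
            // Another thread called socket.Close() to unblock us.
            return -1;
        }
        catch (ObjectDisposedException)
        {
            // The socket was disposed while we were blocked.
            return -1;
        }
    }
}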
Interrupt the thread via Thread.Interrupt
The advantage here is that it is simple and you do not have to focus on sprinkling your code with anything, really. The disadvantage is that you have little control over where the safe points are in your algorithm. The reason is that Thread.Interrupt works by injecting an exception into one of the canned BCL blocking calls. These include Thread.Sleep, WaitHandle.WaitOne, Thread.Join, etc. So you have to be wise about where you place them. However, most of the time the algorithm dictates where they go, and that is usually fine anyway, especially if your algorithm spends most of its time in one of these blocking calls. If your algorithm does not use one of the blocking calls in the BCL then this method will not work for you. The theory here is that the ThreadInterruptedException is only generated from a .NET waiting call, so it is likely at a safe point. At the very least you know that the thread cannot be in unmanaged code or bail out of a critical section leaving a dangling lock in an acquired state. Despite this being less invasive than Thread.Abort, I still discourage its use because it is not obvious which calls respond to it and many developers will be unfamiliar with its nuances.
Well, unfortunately in multithreading you often have to compromise "snappiness" for cleanliness... you can exit a thread immediately if you Interrupt it, but it won't be very clean. So no, you don't have to sprinkle the _shouldStop checks every 4-5 lines, but if you do interrupt your thread then you should handle the exception and exit out of the loop in a clean manner.
Update
Even if it's not a looping thread (i.e. perhaps it's a thread that performs some long-running asynchronous operation or some kind of blocking input operation), you can Interrupt it, but you should still catch the ThreadInterruptedException and exit the thread cleanly. I think the examples you've been reading are very appropriate.
Update 2.0
Yes I have an example... I'll just show you an example based on the link you referenced:
using System.Threading;

public class InterruptExample
{
    private Thread t;
    private volatile bool alive;

    public InterruptExample()
    {
        alive = false;
        t = new Thread(() =>
        {
            try
            {
                while (alive)
                {
                    /* Do work. */
                }
            }
            catch (ThreadInterruptedException)
            {
                /* Clean up. */
            }
        });
        t.IsBackground = true;
    }

    public void Start()
    {
        alive = true;
        t.Start();
    }

    public void Kill(int timeout = 0)
    {
        // Somebody tells the thread to stop.
        t.Interrupt();

        // Optionally block the caller until the thread exits;
        // with the default timeout of 0, Join does not wait at all.
        t.Join(timeout);
    }
}
If cancellation is a requirement of the thing you're building, then it should be treated with as much respect as the rest of your code--it may be something you have to design for.
Let's assume that your thread is doing one of two things at all times:
Something CPU bound
Waiting for the kernel
If you're CPU bound in the thread in question, you probably have a good spot to insert the bail-out check. If you're calling into someone else's code to do some long-running CPU-bound task, then you might need to fix the external code, move it out of process (aborting threads is evil, but aborting processes is well-defined and safe), etc.
If you're waiting for the kernel, then there's probably a handle (or fd, or mach port, ...) involved in the wait. Usually if you destroy the relevant handle, the kernel will return with some failure code immediately. If you're in .net/java/etc. you'll likely end up with an exception. In C, whatever code you already have in place to handle system call failures will propagate the error up to a meaningful part of your app. Either way, you break out of the low-level place fairly cleanly and in a very timely manner without needing new code sprinkled everywhere.
A tactic I often use with this kind of code is to keep track of a list of handles that need to be closed and then have my abort function set a "cancelled" flag and then close them. When the function fails it can check the flag and report failure due to cancellation rather than due to whatever the specific exception/errno was.
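A rough sketch of that tactic (all names are illustrative; Socket stands in for any closable handle):

using System.Collections.Generic;
using System.Net.Sockets;

class CancellableIo
{
    private readonly object _sync = new object();
    private readonly List<Socket> _openHandles = new List<Socket>();
    private volatile bool _cancelled;

    public bool Cancelled { get { return _cancelled; } }

    public void Register(Socket s)
    {
        lock (_sync) { _openHandles.Add(s); }
    }

    // Called by the foreground thread: set the flag first, then close the
    // handles so any blocked call fails immediately.
    public void Cancel()
    {
        _cancelled = true;
        lock (_sync)
        {
            foreach (var s in _openHandles) s.Close();
            _openHandles.Clear();
        }
    }
}

When a blocking call then fails, the worker checks Cancelled and reports "cancelled" rather than the raw exception/errno.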
You seem to be implying that an acceptable granularity for cancellation is at the level of a service call. This is probably not good thinking -- you are much better off cancelling the background work synchronously and joining the old background thread from the foreground thread. It's way cleaner because:
It avoids a class of race conditions where old background-work threads come back to life after unexpected delays.
It avoids potential hidden thread/memory leaks caused by hanging background work, by making it impossible for the effects of a hanging background thread to hide.
There are two reasons to be scared of this approach:
You don't think you can abort your own code in a timely fashion. If cancellation is a requirement of your app, the decision you really need to make is a resource/business decision: do a hack, or fix your problem cleanly.
You don't trust some code you're calling because it's out of your control. If you really don't trust it, consider moving it out-of-process. You get much better isolation from many kinds of risks, including this one, that way.
The best answer largely depends on what you're doing in the thread.
Like you said, most answers revolve around polling a shared boolean every couple of lines. Even though you may not like it, this is often the simplest scheme. If you want to make your life easier, you can write a method like ThrowIfCancelled(), which throws some kind of exception if you're done. The purists will say this is (gasp) using exceptions for control flow, but then again cancelling is exceptional, imo.
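A sketch of such a helper (names are illustrative; OperationCanceledException is one reasonable choice of exception):

using System;

class Cancellable
{
    private volatile bool _cancelled;

    public void Cancel() { _cancelled = true; }

    // Call this between steps; it turns "check and bail" into one line.
    private void ThrowIfCancelled()
    {
        if (_cancelled)
            throw new OperationCanceledException();
    }

    public void Run()
    {
        try
        {
            StepOne();
            ThrowIfCancelled();
            StepTwo();
            ThrowIfCancelled();
            StepThree();
        }
        catch (OperationCanceledException)
        {
            // Clean up and exit the thread quietly.
        }
    }

    private void StepOne() { }
    private void StepTwo() { }
    private void StepThree() { }
}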
If you're doing IO operations (like network stuff), you may want to consider doing everything using async operations.
If you're doing a sequence of steps, you could use the IEnumerable trick to make a state machine. Example:
using System;
using System.Collections.Generic;
using System.Threading;

abstract class StateMachine : IDisposable
{
    public abstract IEnumerable<object> Main();

    public virtual void Dispose()
    {
        // ... override with freeing code ...
    }

    bool wasCancelled;

    public bool Cancel()
    {
        // ... set wasCancelled using the locking scheme of your choice ...
        wasCancelled = true;
        return wasCancelled;
    }

    public Thread Run()
    {
        var thread = new Thread(() =>
        {
            try
            {
                if (wasCancelled) return;
                foreach (var x in Main())
                {
                    if (wasCancelled) return;
                }
            }
            finally { Dispose(); }
        });
        thread.Start();
        return thread;
    }
}

class MyStateMachine : StateMachine
{
    public override IEnumerable<object> Main()
    {
        DoSomething();
        yield return null;
        DoSomethingElse();
        yield return null;
    }
}

// then call new MyStateMachine().Run() to run.
Overengineering? It depends on how many state machines you use. If you just have one, yes. If you have 100, then maybe not. Too tricky? Well, it depends. Another bonus of this approach is that it lets you (with minor modifications) move your operation into a Timer.Tick callback and avoid threading altogether if it makes sense.
and do everything that blucz says too.
Perhaps part of the problem is that you have such a long method / while loop. Whether or not you are having threading issues, you should break it down into smaller processing steps. Let's suppose those steps are Alpha(), Bravo(), Charlie() and Delta().
You could then do something like this:
public void MyBigBackgroundTask()
{
    Action[] tasks = new Action[] { Alpha, Bravo, Charlie, Delta };
    int workStepSize = 0;

    while (!_shouldStop)
    {
        tasks[workStepSize++]();
        workStepSize %= tasks.Length;
    }
}
So yes it loops endlessly, but checks if it is time to stop between each business step.
You don't have to sprinkle while loops everywhere. The outer while loop just checks if it's been told to stop and if so doesn't make another iteration...
If you have a straight "go do something and close out" thread (no loops in it) then you just check the _shouldStop boolean either before or after each major spot inside the thread. That way you know whether it should continue on or bail out.
for example:
public void DoWork()
{
    RunSomeBigMethod();
    if (_shouldStop) { return; }

    RunSomeOtherBigMethod();
    if (_shouldStop) { return; }

    //....
}
Instead of adding a while loop where a loop doesn't otherwise belong, add something like if (_shouldStop) CleanupAndExit(); wherever it makes sense to do so. There's no need to check after every single operation or sprinkle the code all over with them. Instead, think of each check as a chance to exit the thread at that point and add them strategically with this in mind.
All these SO responses assume the worker thread will loop. That doesn't sit comfortably with me
There are not a lot of ways to make code take a long time. Looping is a pretty essential programming construct. Making code take a long time without looping takes a huge amount of statements. Hundreds of thousands.
Or by calling some other code that is doing the looping for you. And yes, it's hard to make that code stop on demand; that just doesn't work.

lock keyword on a LINQ Parallel.ForEach<> loop

This is more a conceptual question. I was wondering whether using a lock inside a Parallel.ForEach<> loop would take away the benefits of parallelizing a foreach loop.
Here is some sample code where I have seen it done.
Parallel.ForEach<KeyValuePair<string, XElement>>(binReferences.KeyValuePairs, reference =>
{
    lock (fileLockObject)
    {
        if (fileLocks.ContainsKey(reference.Key) == false)
        {
            fileLocks.Add(reference.Key, new object());
        }
    }

    RecursiveBinUpdate(reference.Value, testPath, reference.Key, maxRecursionCount, ref recursionCount);

    lock (fileLocks[reference.Key])
    {
        reference.Value.Document.Save(reference.Key);
    }
});
Where fileLockObject and fileLocks are as follows.
private static object fileLockObject = new object();
private static Dictionary<string, object> fileLocks = new Dictionary<string, object>();
Does this technique completely make the loop not parallel?
I would like to see your thoughts on this.
It means all of the work inside the lock can't be done in parallel. This greatly harms the performance here, yes. Since the entire body is not locked (and not all on the same object), there is still some parallelization here, though. Whether the parallelization you do get adds enough benefit to surpass the overhead of managing the threads and synchronizing around the locks is something you really just need to test yourself with your specific data.
That said, it looks like what you're doing (at least in the first locked block, which is the one I'd be more concerned with, as every thread locks on the same object) is locking access to a Dictionary. You can instead use a ConcurrentDictionary, which is specifically designed to be used from multiple threads and will minimize the amount of synchronization that needs to be done.
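A hedged sketch of that substitution, keeping the question's variable names; GetOrAdd is atomic per key, so the first lock block disappears entirely:

using System.Collections.Concurrent;

private static ConcurrentDictionary<string, object> fileLocks =
    new ConcurrentDictionary<string, object>();

// ... inside the Parallel.ForEach body:
RecursiveBinUpdate(reference.Value, testPath, reference.Key, maxRecursionCount, ref recursionCount);

// Atomically fetches (or creates) the per-file lock object.
lock (fileLocks.GetOrAdd(reference.Key, _ => new object()))
{
    reference.Value.Document.Save(reference.Key);
}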
if I used a lock ... would that take away the benefits of parallelizing a foreach loop?
Proportionally. When RecursiveBinUpdate() is a big chunk of work (and independent), then it will still pay off. The locking part could be less than 1%, or 99%. Look up Amdahl's law; it applies here.
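For reference, Amdahl's law, with p the fraction of the work that can run in parallel and n the number of workers:

S(n) = \frac{1}{(1 - p) + p/n}

So if locking serializes half of each iteration (p = 0.5), the overall speedup can never exceed 2, no matter how many cores you add.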
But worse, your code is not thread-safe. Of your two operations on fileLocks, only the first is actually inside a lock.
lock (fileLockObject)
{
    if (fileLocks.ContainsKey(reference.Key) == false)
    {
        ...
    }
}
and
lock (fileLocks[reference.Key]) // this access to fileLocks[] is not protected
change the 2nd part to:
lock (fileLockObject)
{
    reference.Value.Document.Save(reference.Key);
}
And the use of ref recursionCount as a parameter looks suspicious too. It might work with Interlocked.Increment though.
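For the counter, a sketch of the Interlocked route (assuming recursionCount only ever needs to be incremented):

using System.Threading;

private static int recursionCount;

// ... inside the parallel body, instead of mutating a ref parameter
// without protection:
Interlocked.Increment(ref recursionCount);   // atomic read-modify-write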
The "locked" portion of the loop will end up running serially. If the RecursiveBinUpdate function is the bulk of the work, there may be some gain, but it would be better if you could figure out how to handle the lock generation in advance.
When it comes to locks, there's no difference in the way PLINQ/TPL threads have to wait to gain access. So, in your case, it only makes the loop non-parallel in the areas you're locking, and any work outside those locks still executes in parallel (i.e. all the work in RecursiveBinUpdate).
Bottom line, I see nothing substantially wrong with what you're doing here.

Is Interlocked.CompareExchange really faster than a simple lock?

I came across a ConcurrentDictionary implementation for .NET 3.5 (I'm sorry, I couldn't find the link right now) that uses this approach for locking:
var current = Thread.CurrentThread.ManagedThreadId;
while (Interlocked.CompareExchange(ref owner, current, 0) != current) { }

// PROCESS SOMETHING HERE

if (current != Interlocked.Exchange(ref owner, 0))
    throw new UnauthorizedAccessException("Thread had access to cache even though it shouldn't have.");
Instead of the traditional lock:
lock (lockObject)
{
    // PROCESS SOMETHING HERE
}
The question is: Is there any real reason for doing this? Is it faster or have some hidden benefit?
PS: I know there's a ConcurrentDictionary in more recent versions of .NET, but I can't use it in a legacy project.
Edit:
In my specific case, what I'm doing is just manipulating an internal Dictionary class in such a way that it's thread safe.
Example:
public bool RemoveItem(TKey key)
{
    // acquire the lock
    var current = Thread.CurrentThread.ManagedThreadId;
    while (Interlocked.CompareExchange(ref owner, current, 0) != current) { }

    // real processing starts here (entries is a regular Dictionary)
    var found = entries.Remove(key);

    // release the lock, verifying ownership
    if (current != Interlocked.Exchange(ref owner, 0))
        throw new UnauthorizedAccessException("Thread had access to cache even though it shouldn't have.");

    return found;
}
As #doctorlove suggested, this is the code: https://github.com/miensol/SimpleConfigSections/blob/master/SimpleConfigSections/Cache.cs
There is no definitive answer to your question. I would answer: it depends.
What the code you've provided is doing is:
wait for an object to be in a known state (threadId == 0 == no current work)
do work
set the object back to the known state
now another thread can do work too, because it can go from step 1 to step 2
As you've noted, you have a loop in the code that actually does the "wait" step. You don't block the thread until you can access your critical section; you just burn CPU instead. Try replacing your processing (in your case, a call to Remove) with Thread.Sleep(2000), and you'll see the other "waiting" thread consuming all of one of your CPUs for 2 s in the loop.
Which one is better therefore depends on several factors. For example: how many concurrent accesses are there? How long does the operation take to complete? How many CPUs do you have?
I would use lock instead of Interlocked because it is way easier to read and maintain. The exception would be a piece of code called millions of times, in a particular use case where you are sure Interlocked is faster.
So you'll have to measure both approaches yourself. If you don't have time for this, then you probably don't need to worry about performance, and you should use lock.
Your CompareExchange sample code doesn't release the lock if an exception is thrown by "PROCESS SOMETHING HERE".
For this reason as well as the simpler, more readable code, I would prefer the lock statement.
You could rectify the problem with a try/finally, but this makes the code even uglier.
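A sketch of that try/finally fix, reusing the owner field from the question's code:

var current = Thread.CurrentThread.ManagedThreadId;
while (Interlocked.CompareExchange(ref owner, current, 0) != current) { }
try
{
    // PROCESS SOMETHING HERE
}
finally
{
    // Always release ownership, even if the processing throws.
    Interlocked.Exchange(ref owner, 0);
}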
The linked ConcurrentDictionary implementation has a bug: it will fail to release the lock if the caller passes a null key, potentially leaving other threads spinning indefinitely.
As for efficiency, your CompareExchange version is essentially a spinlock, which can be efficient if threads are only likely to be blocked for short periods. But inserting into a managed dictionary can take a relatively long time, since it may be necessary to resize the dictionary. Therefore, IMHO, this isn't a good candidate for a spinlock - spinning can be wasteful, especially on a single-processor system.
A little bit late... I have read your sample but in short:
Fastest to slowest MT sync:
Interlocked.* => A CPU atomic instruction. Can't be beaten if it is sufficient for your need.
SpinLock => Uses Interlocked underneath and is really fast, but burns CPU while waiting. Do not use it for code that waits a long time (it is usually used to avoid a thread switch around locks that protect very quick actions). If you often have to wait for more than one thread quantum, I would suggest going with lock.
Lock => The slowest, but easier to use and read than SpinLock. The instruction itself is very fast, but if it can't acquire the lock it relinquishes the CPU. Behind the scenes it waits on a kernel object, and Windows gives CPU time to the thread only once the lock has been freed by the thread that acquired it.
Have fun with MT!
The docs for the Interlocked class tell us it
"Provides atomic operations for variables that are shared by multiple threads. "
The theory is an atomic operation can be faster than locks. Albahari gives some further details on interlocked operations stating they are faster.
Note that Interlocked provides a "smaller" interface than Lock - see previous question here
Yes.
The Interlocked class offers atomic operations, which means they do not block other code like a lock does, because they don't really need to.
When you lock a block of code you want to make sure no two threads are in it at the same time; that means that while a thread is inside, all other threads wait to get in, which uses resources (CPU time and idle threads).
The atomic operations, on the other hand, do not need to block other atomic operations because they are atomic. Conceptually it is a single CPU operation; the next ones just go in after the previous, and you're not wasting threads on waiting. (By the way, that's why it's limited to very basic operations like Increment, Exchange, etc.)
I think a lock (which is a Monitor underneath) uses Interlocked to check whether the lock is already taken, but it can't make the actions inside it atomic.
In most cases, though, the difference is not critical. But you need to verify that for your specific case.
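To make the contrast concrete, a sketch with a shared counter; both methods are safe, the Interlocked one just never blocks:

using System.Threading;

class Counters
{
    private int _count;
    private readonly object _sync = new object();

    // Lock-based: other threads block while one is inside.
    public void IncrementWithLock()
    {
        lock (_sync) { _count++; }
    }

    // Atomic: a single hardware-level operation, no blocking.
    public void IncrementAtomically()
    {
        Interlocked.Increment(ref _count);
    }
}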
Interlocked is faster - as already explained in other answers - and you can also define how the wait is implemented once the first attempt fails, e.g. SpinWait.SpinOnce, SpinWait.SpinUntil, Thread.Sleep, etc. Also note that a lock statement compiles to a try/finally, so the lock is released even if the protected code throws - just as using in C# guarantees the object is disposed when execution leaves the block. If the code inside your lock is expected to run without any possibility of crashing (no custom code/delegates/resource resolution or allocation/events/unexpected code executed during the lock), and you are not going to catch the exception to let your software continue, then Interlocked skips that try/finally, so there is a little extra speed there too.
One important difference between lock and Interlocked.CompareExchange is how they can be used in async environments.
Async operations cannot be awaited inside a lock: they can easily lead to deadlocks if the thread that continues execution after the await is not the same one that originally acquired the lock (and the C# compiler rejects await inside a lock statement for this reason).
This is not a problem with Interlocked, however, because nothing is "acquired" by a thread.
Another solution for asynchronous code, which may read better than Interlocked, is a semaphore, as described in this blog post:
https://blog.cdemi.io/async-waiting-inside-c-sharp-locks/
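A sketch of that approach with SemaphoreSlim (available since .NET 4), which, unlike lock, permits await inside the guarded region:

using System.Threading;
using System.Threading.Tasks;

class AsyncGate
{
    // Initial count 1, max 1: behaves like an async-compatible lock.
    private static readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);

    public static async Task DoWorkAsync()
    {
        await _gate.WaitAsync();
        try
        {
            // PROCESS SOMETHING HERE; awaiting is allowed,
            // unlike inside a lock statement.
            await Task.Delay(100);
        }
        finally
        {
            _gate.Release();
        }
    }
}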

locking only when modifying vs entire method

When should locks be used? Only when modifying data or when accessing it as well?
public class Test
{
    static Dictionary<string, object> someList = new Dictionary<string, object>();
    static object syncLock = new object();

    public static object GetValue(string name)
    {
        if (someList.ContainsKey(name))
        {
            return someList[name];
        }
        else
        {
            lock (syncLock)
            {
                object someValue = GetValueFromSomeWhere(name);
                someList.Add(name, someValue);
                return someValue;
            }
        }
    }
}
Should there be a lock around the entire block, or is it OK to just add it to the actual modification? My understanding is that there could still be a race condition where one call might not have found it and started to add it, while another call right after might have run into the same situation - but I'm not sure. Locking is still so confusing. I haven't run into any issues with the above similar code, but I could just be lucky so far. Any help would be appreciated, as well as any good resources on how/when to lock objects.
You have to lock when reading too, or you can get unreliable data, or even an exception if a concurrent modification physically changes the target data structure.
In the case above, you need to make sure that multiple threads don't try to add the value at the same time, so you need at least a read lock while checking whether it is already present. Otherwise multiple threads could decide to add, find the value is not present (since this check is not locked), and then all try to add in turn (after getting the lock).
You could use a ReaderWriterLockSlim if you have many reads and only a few writes. In the code above you would acquire the read lock to do the check and upgrade to a write lock once you decide you need to add it. In most cases, only a read lock (which allows your reader threads to still run in parallel) would be needed.
There is a summary of the available .Net 4 locking primitives here. Definitely you should understand this before you get too deep into multithreaded code. Picking the correct locking mechanism can make a huge performance difference.
You are correct that you have been lucky so far - that's a frequent feature of concurrency bugs. They are often hard to reproduce without targeted load testing, meaning correct design (and exhaustive testing, of course) is vital to avoid embarrassing and confusing production bugs.
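A sketch of that read/upgrade pattern with ReaderWriterLockSlim, reusing the question's names (GetValueFromSomeWhere is stubbed):

using System.Collections.Generic;
using System.Threading;

public class Cache
{
    static readonly Dictionary<string, object> someList = new Dictionary<string, object>();
    static readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();

    public static object GetValue(string name)
    {
        rwLock.EnterReadLock();
        try
        {
            object value;
            if (someList.TryGetValue(name, out value))
                return value;              // fast path: readers run concurrently
        }
        finally
        {
            rwLock.ExitReadLock();
        }

        rwLock.EnterUpgradeableReadLock(); // only one holder at a time
        try
        {
            object value;
            if (!someList.TryGetValue(name, out value))  // re-check: someone may have added it
            {
                rwLock.EnterWriteLock();   // briefly blocks out all readers
                try
                {
                    value = GetValueFromSomeWhere(name);
                    someList.Add(name, value);
                }
                finally
                {
                    rwLock.ExitWriteLock();
                }
            }
            return value;
        }
        finally
        {
            rwLock.ExitUpgradeableReadLock();
        }
    }

    static object GetValueFromSomeWhere(string name) { return new object(); }
}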
Lock the whole block before you check for the existence of name. Otherwise, in theory, another thread could add it between the check and your code that adds it.
Actually, locking just when you perform the Add really doesn't do anything at all. All it would do is prevent another thread from adding something simultaneously. But since that other thread would already have decided it was going to do the add, it would just do it anyway as soon as the lock was released.
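A hedged rewrite of the question's method along those lines, with the whole check-and-add under the lock:

public static object GetValue(string name)
{
    lock (syncLock)
    {
        object value;
        if (!someList.TryGetValue(name, out value))
        {
            value = GetValueFromSomeWhere(name);
            someList.Add(name, value);
        }
        return value;
    }
}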
If a resource is only ever read by multiple threads, you do not need any locks.
If a resource can be accessed by multiple threads and can be modified, then all accesses/modifications need to be synchronized. In your example, if GetValueFromSomeWhere takes a long time to return, it is possible for a second call to be made with the same name before the value has been stored in the Dictionary, so it would be fetched twice.
Use ReaderWriterLock, or the slim version (ReaderWriterLockSlim) if you are on .NET 3.5 or later.
You would acquire the reader lock for the reads (allowing concurrent reads) and upgrade the lock to the writer lock when something needs to be written (allowing only one write at a time and blocking all the reads until it is done, as well as any concurrent writer threads).
Make sure to release your locks in a try/finally pattern to avoid deadlocking:
void Write(object[] args)
{
    this.ReaderWriterLock.AcquireWriterLock(Timeout.Infinite);
    try
    {
        this.myData.Write(args);
    }
    finally
    {
        this.ReaderWriterLock.ReleaseWriterLock();
    }
}
