Why do volatile and MemoryBarrier not prevent operation reordering?

If I understand the meaning of volatile and MemoryBarrier correctly, then the program below should never be able to show any result.
Yet it catches a reordering of write operations every time I run it. It does not matter whether I run it in Debug or Release, or as a 32-bit or 64-bit application.
Why does this happen?
using System;
using System.Threading;
using System.Threading.Tasks;
namespace FlipFlop
{
class Program
{
//Declaring these variables as volatile should instruct compiler to
//flush all caches from registers into the memory.
static volatile int a;
static volatile int b;
//Track a number of iteration that it took to detect operation reordering.
static long iterations = 0;
static object locker = new object();
//Indicates that operation reordering is not found yet.
static volatile bool continueTrying = true;
//Indicates that Check method should continue.
static volatile bool continueChecking = true;
static void Main(string[] args)
{
//Restarting test until able to catch reordering.
while (continueTrying)
{
iterations++;
var checker = new Task(Check);
var writter = new Task(Write);
lock (locker)
{
continueChecking = true;
checker.Start();
}
writter.Start();
checker.Wait();
writter.Wait();
}
Console.ReadKey();
}
static void Write()
{
//Writing is locked until Main will start Check() method.
lock (locker)
{
//Using a memory barrier should prevent operation reordering.
a = 1;
Thread.MemoryBarrier();
b = 10;
Thread.MemoryBarrier();
b = 20;
Thread.MemoryBarrier();
a = 2;
//Stops spinning in the Check method.
continueChecking = false;
}
}
static void Check()
{
//Spins until finds operation reordering or stopped by Write method.
while (continueChecking)
{
int tempA = a;
int tempB = b;
if (tempB == 10 && tempA == 2)
{
continueTrying = false;
Console.WriteLine("Caught when a = {0} and b = {1}", tempA, tempB);
Console.WriteLine("In " + iterations + " iterations.");
break;
}
}
}
}
}

You aren't cleaning the variables between tests, so (for all but the first) initially a is 2 and b is 20 - before Write has done anything.
Check can get that initial value of a (so tempA is 2), and then Write can get in, get as far as changing b to 10.
Now Check reads the b (so tempB is 10).
Et voila. No re-order necessary to repro.
Reset a and b to 0 between runs and I expect it will go away.
edit: confirmed; "as is" I get the issue almost immediately (<2000 iterations); but by adding:
while (continueTrying)
{
a = b = 0; // reset <======= added this
it then loops for any amount of time without any issue.
Or as a flow:
Write                     A=   B=   Check
(except first run)        2    20
                                    int tempA = a;
a = 1;                    1    20
Thread.MemoryBarrier();
b = 10;                   1    10
                                    int tempB = b;
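The flow above can be replayed on a single thread, which makes it clear that no reordering is involved. This is only a sketch: the class name InterleavingReplay and the method Replay are made up for illustration, and the statements are simply the question's reads and writes laid out in the order the flow describes.

```csharp
using System;

static class InterleavingReplay
{
    // Stand-ins for the question's shared fields (hypothetical copies).
    static int a;
    static int b;

    // Replays the interleaving step by step; every write stays in program order.
    public static (int tempA, int tempB) Replay()
    {
        a = 2; b = 20;      // state left behind by the previous Write()
        int tempA = a;      // Check: reads the stale a == 2
        a = 1;              // Write: first store
        b = 10;             // Write: second store
        int tempB = b;      // Check: reads the fresh b == 10
        b = 20;             // Write: remaining stores
        a = 2;
        return (tempA, tempB);
    }

    static void Main()
    {
        var (tempA, tempB) = Replay();
        // Same pair the question treats as proof of reordering.
        Console.WriteLine($"tempA = {tempA}, tempB = {tempB}"); // prints "tempA = 2, tempB = 10"
    }
}
```

Resetting a and b before each run removes the stale value that the first read picks up, which is why the "caught" condition disappears.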

I don't think this is re-ordering.
This piece of code is simply not thread-safe:
while (continueChecking)
{
int tempA = a;
int tempB = b;
...
I think this scenario is possible:
int tempA = a; executes with the values of the last loop (a == 2)
There is a context switch to the Write thread
b = 10 and the loop stops
There is a context switch to the Check thread
int tempB = b; executes with b == 10
I notice that the calls to MemoryBarrier() enhance the chances of this scenario. Probably because they cause more context-switching.

The result has nothing to do with reordering, with memory barriers, or with volatile. All of these constructs exist to avoid the effects of the compiler or CPU reordering instructions.
But this program would produce the same result even assuming fully consistent single-CPU memory model and no compiler optimization.
First of all, notice that multiple Write() tasks will be started over the life of the program. They run sequentially due to the lock() inside Write(), but a single Check() method can read values of a and b produced by different instances of the Write() task.
Because Check() function has no synchronization with Write function - it can read a and b at two arbitrary and different moments. There is nothing in your code that prevents Check() from reading a produced by previous Write() at one moment and then reading b produced by following Write() at another moment. First of all you need synchronization (lock) in Check() and then you might (but probably not in this case) need memory barriers and volatile to fight with memory model problems.
This is all you need:
int tempA, tempB;
lock (locker)
{
    tempA = a;
    tempB = b;
}

If you use MemoryBarrier in writer, why don't you do that in checker? Put Thread.MemoryBarrier(); before int tempA = a;.
Calling Thread.MemoryBarrier(); so many times blocks all of the advantages of the method. Call it only once before or after a = 1;.

Related

Join statement in multiple thread

I have a program that starts 2 threads and uses Join. My understanding is that Join blocks the calling thread until the joined thread has finished executing. So the program below should give 2 million as the answer, since the main thread blocks until both threads have completed, but I always get a different value. This might be because the first thread has completed but the second thread has not run to completion.
Can someone please explain the output.
Reference -Multithreading: When would I use a Join?
namespace ThreadSample
{
class Program
{
static int Total = 0;
public static void Main()
{
Thread thread1 = new Thread(Program.AddOneMillion);
Thread thread2 = new Thread(Program.AddOneMillion);
thread1.Start();
thread2.Start();
thread1.Join();
thread2.Join();
Console.WriteLine("Total = " + Total);
Console.ReadLine();
}
public static void AddOneMillion()
{
for (int i = 1; i <= 1000000; i++)
{
Total++;
}
}
}
}

When you call the Start method of a thread, it starts immediately. Hence, by the time you call Join on thread1, thread2 will also have started. As a result, the variable 'Total' will be accessed by both threads simultaneously. You will not get the correct result, because one thread's operation overwrites the value of 'Total' written by the other, causing data loss.
public static void Main()
{
Thread thread1 = new Thread(Program.AddOneMillion);
Thread thread2 = new Thread(Program.AddOneMillion);
thread1.Start(); //starts immediately
thread2.Start();//starts immediately
thread1.Join(); //By the time this line executes, both threads have accessed the Total variable, causing data loss or corruption.
thread2.Join();
Console.WriteLine("Total = " + Total);
Console.ReadLine();
}
To get correct results, you can either lock around the update of the Total variable, as follows:
static object _l = new object();
public static void AddOneMillion()
{
    for (int i = 0; i < 1000000; i++)
    {
        lock (_l)
        {
            Total++;
        }
    }
}
Or you can use Interlocked.Increment, which updates the variable atomically.
Please refer to the link posted by @Emanuel Vintilă in the comments for more insight.
public static void AddOneMillion()
{
for (int i = 0; i < 1000000; i++)
{
Interlocked.Increment(ref Total);
}
}

It's because the increment operation is not done atomically. That means that each thread may hold a copy of Total and increment it. To avoid that, you can use a lock or Interlocked.Increment, which is designed specifically for incrementing a variable.
Clarification:
thread 1: read copy of Total
thread 2: read copy of Total
thread 1: increment and store Total
thread 2: increment and store Total (overwriting previous value)
I leave it to you to work out the other scenarios where things could go wrong.
I would suggest avoiding explicit threading when possible and use map reduce operations that are less error prone.
You need to read about multi-threading programming and functional programming constructs available in mainstream languages. Most languages have added libraries to leverage the multicore capabilities of modern CPUs.
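The lost-update sequence listed in the clarification above can be replayed deterministically, next to a two-thread run that uses Interlocked.Increment. This is a sketch for illustration: the class CounterDemo and its method names are hypothetical, and the replay performs the four steps on one thread purely to show the arithmetic.

```csharp
using System;
using System.Threading;

static class CounterDemo
{
    static int total;

    // Replays the four steps from the clarification on one thread:
    // both "threads" read the same value, so one increment is lost.
    public static int LostUpdateReplay()
    {
        int shared = 0;
        int copy1 = shared;   // thread 1: read copy of Total
        int copy2 = shared;   // thread 2: read copy of Total
        shared = copy1 + 1;   // thread 1: increment and store
        shared = copy2 + 1;   // thread 2: increment and store (overwrites)
        return shared;        // 1, although two increments were performed
    }

    // Two real threads, each incrementing atomically.
    public static int RunWithInterlocked(int perThread)
    {
        total = 0;
        ThreadStart work = () =>
        {
            for (int i = 0; i < perThread; i++)
                Interlocked.Increment(ref total);
        };
        var t1 = new Thread(work);
        var t2 = new Thread(work);
        t1.Start();
        t2.Start();
        t1.Join();
        t2.Join();
        return total;
    }

    static void Main()
    {
        Console.WriteLine("Replay = " + LostUpdateReplay());          // prints "Replay = 1"
        Console.WriteLine("Total = " + RunWithInterlocked(1000000));  // prints "Total = 2000000"
    }
}
```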

Concurrent Tasks and different cursor positions for each of them

I am starting to practice with Tasks and I tried the following code:
static void Main()
{
Task.Factory.StartNew(() =>
{
Write('a', 0);
});
var t = new Task(() =>
{
Write('b', 10);
});
t.Start();
Write('c', 20);
Console.ReadLine();
}
static void Write(char c, int x)
{
int yCounter = 0;
for (int i = 0; i < 1000; i++)
{
Console.WriteLine(c);
Console.SetCursorPosition(x, yCounter);
yCounter++;
Thread.Sleep(100);
}
}
My idea was to see how the console will go between the three different columns to output the different characters. It does swap the columns, but it does not output the correct characters. For example, in the first column it needs to output only 'a', but it also outputs 'b' and 'c', same goes for the other 2 columns.

This might be a particularly bad example of using tasks - or an example of how to use tasks badly.
Within your tasks you are setting a global state (SetCursorPosition), which will of course affect the other tasks (Console is static after all). It's possible that
Console.WriteLine('b')
is called after the cursor has been set to 0, to 10 or to 20, and vice versa for the other values. Tasks should not rely on any global (or class-level) state that might have changed (unless it's okay for the task that the value might have changed). With regard to your example, you would have to ensure that none of the other tasks call SetCursorPosition before you have written your output. The easiest way to achieve this is to lock inside the task:
private static object lockObject = new object(); // you need an object of a reference type for locking
static void Write(char c, int x)
{
int yCounter = 0;
for (int i = 0; i < 1000; i++)
{
lock(lockObject)
{
Console.SetCursorPosition(x, yCounter);
Console.Write(c);
}
yCounter++;
Thread.Sleep(100);
}
}
The lock ensures that no two tasks enter the block at the same time (given that the lock object is the very same), hence each task can set the cursor to the position it wants to write at and write its char without any other task setting the cursor to any other position. (Plus, I've swapped Write and SetCursorPosition, since we have to call SetCursorPosition before writing to the output. The lock would be useless without swapping those two lines, anyway.)

In addition to Paul's answer.
If you're dealing with tasks and async/await, don't mix Task and Thread in any way.
Executing your Write method using Task.Run/Task.Start is called "async-over-sync". This is a bad practice, and should be avoided.
Here's your code, rewritten in async manner, with async synchronization:
class Program
{
static void Main(string[] args)
{
var asyncLock = new AsyncLock();
// we need ToList here, since IEnumerable is lazy,
// and must be enumerated to produce values (tasks in this case);
// WriteAsync call inside Select produces a "hot" task - task, that is already scheduled;
// there's no need to start hot tasks explicitly - they are already started
new[] { ('a', 0), ('b', 10), ('c', 20) }
.Select(_ => WriteAsync(_.Item1, _.Item2, asyncLock))
.ToList();
Console.ReadLine();
}
static async Task WriteAsync(char c, int x, AsyncLock asyncLock)
{
for (var i = 0; i < 1000; i++)
{
using (await asyncLock.LockAsync())
{
Console.SetCursorPosition(x, i);
Console.Write(c);
}
await Task.Delay(100);
}
}
}
AsyncLock lives in Nito.AsyncEx package.

Control threads using AutoResetEvent in C#

Say I have a class A and a class B representing tasks.
I want to perform an experiment, and for the experiment to start I need to finish at least 5 B tasks and only 1 A task.
I have the following classes
abstract class Task
{
public int Id;
public void StartTask(object resetEvent)
{
EventWaitHandle ewh = (EventWaitHandle)resetEvent;
Thread.Sleep(new Random(DateTime.Now.Ticks.GetHashCode()).Next(5000, 14000));
Console.WriteLine("{0} {1} starts",this.GetType().Name, Id);
ewh.Set();
}
}
class A : Task
{
static int ID = 1;
public A(EventWaitHandle resetEvent)
{
Id = ID++;
new Thread(StartTask).Start(resetEvent);
}
}
class B : Task
{
static int ID = 1;
public B(EventWaitHandle resetEvent)
{
Id = ID++;
new Thread(StartTask).Start(resetEvent);
}
}
and the following main
static void Main()
{
A a;
B[] bs = new B[20];
int numberOfBs = 0;
EventWaitHandle aResetEvent = new AutoResetEvent(false);
EventWaitHandle bResetEvent = new AutoResetEvent(false);
a = new A(aResetEvent);
for (int i = 0; i < bs.Length; i++)
bs[i] = new B(bResetEvent);
while (numberOfBs < 5)
{
bResetEvent.WaitOne();
numberOfBs++;
}
aResetEvent.WaitOne();
Console.WriteLine("Experiment started with {0} B's!", numberOfBs);
Thread.Sleep(3000); // check how many B's got in the middle
Console.WriteLine("Experiment ended with {0} B's!", numberOfBs);
}
now I have few problems/questions:
1. How can I wait for only N signals out of a possible M?
2. Can I achieve the result I'm looking for with only 1 AutoResetEvent?
3. I don't understand why all the tasks are printed together. I expected each task to be printed when it is done, not when everything is done.
4. Is the following code thread safe?
while (numberOfBs < 5)
{
bResetEvent.WaitOne();
numberOfBs++;
}
could it be that couple of threads signal together? if so, can I fix that using lock on bResetEvent?

1. How can I wait for only N signals out of possible M?
Just as you do here (sort of…see answer to #4).
2. Can I achieve the result I'm looking for with only 1 AutoResetEvent?
Yes. But you will need two counters in that case (one for the A type and one for the B type), and they will need to be accessed in a thread-safe way, e.g. with the Interlocked class, or using a lock statement. All threads, A and B types, will share the same AutoResetEvent, but increment their own type's counter. The main thread can monitor each counter and process once both counters are at their desired value (1 for the A counter, 5 for the B counter).
I'd recommend using the lock statement approach, as it's simpler and would allow you to avoid using AutoResetEvent altogether (the lock statement uses the Monitor class, which provides some functionality similar to AutoResetEvent, while also providing the synchronization needed to ensure coherent use of the counters).
Except that you've written in the comments you have to use AutoResetEvent (why?), so I guess you're stuck with Interlocked (no point in using lock if you're not going to take full advantage).
3. I don't understand why all the tasks are printed together, I expected each task to be printed when it is done and not when everything is done.
Because you have a bug. You should be creating a single Random instance and using it to determine the duration of every task. You can either compute the durations in the thread that creates each task, or you can synchronize access (e.g. with lock) and use the same Random object in multiple threads.
What you can't do is create a whole new Random object using the same seed value for every thread, because then each thread (or at least large blocks of them, depending on timing) is going to wind up getting the exact same "random" number to use as its duration.
You see all the output coming out together, because that's when it happens: all together.
(And yes, if you create multiple Random objects in quick succession, they will all get the same seed, whether you use DateTime.Now yourself explicitly, or just let the Random class do it. The tick counter used for the seed is not updated frequently enough for concurrently running threads to see different values.)
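The seeding problem described above is easy to reproduce in isolation. The sketch below uses a hypothetical SeedDemo class: SameSeed shows that two Random objects built from one seed produce the same "random" duration, and SharedDurations shows one of the suggested fixes, a single Random instance guarded by a lock. The range 5000 to 14000 is taken from the question's code.

```csharp
using System;

static class SeedDemo
{
    // Two separate Random objects with an identical seed yield identical values.
    public static (int first, int second) SameSeed(int seed)
    {
        int first = new Random(seed).Next(5000, 14000);
        int second = new Random(seed).Next(5000, 14000);
        return (first, second);
    }

    // One shared Random instance; the lock makes it safe to call from several threads.
    static readonly Random Shared = new Random();
    static readonly object Gate = new object();

    public static int NextDuration()
    {
        lock (Gate)
        {
            return Shared.Next(5000, 14000);
        }
    }

    public static int[] SharedDurations(int count)
    {
        var durations = new int[count];
        for (int i = 0; i < count; i++)
            durations[i] = NextDuration();
        return durations;
    }

    static void Main()
    {
        var (first, second) = SameSeed(12345);
        Console.WriteLine("Same seed gives equal values: " + (first == second)); // prints "... True"
        Console.WriteLine(string.Join(", ", SharedDurations(5)));
    }
}
```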
4. Is the following code thread safe?
The code in question:
while (numberOfBs < 5)
{
bResetEvent.WaitOne();
numberOfBs++;
}
…is thread safe, because the only data shared between the thread executing that loop and any other thread is the AutoResetEvent object, and that object is itself thread-safe.
That is, for the usual understanding of "thread safe". I highly recommend you read Eric Lippert's article What is this thing you call "thread safe"? Asking if something is thread-safe is a much more complicated question than you probably realize.
In particular, while the code is thread-safe in the usual way (i.e. data remains coherent), as you note it is possible for more than one thread to reach the Set() call before the main thread can react to the first. Thus you may miss some notifications.
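The missed-notification effect can be shown without any timing dependence by signalling several times before anyone waits. In this sketch the class MissedSignalDemo and its methods are hypothetical. The SemaphoreSlim variant is not part of the answer above; it is included only as a contrast, on the assumption that a counting primitive is acceptable, since it remembers every signal.

```csharp
using System;
using System.Threading;

static class MissedSignalDemo
{
    // An AutoResetEvent is a single flag: Set() calls that arrive
    // while it is already signalled collapse into one wake-up.
    public static int WakeUpsWithAutoResetEvent(int signals)
    {
        using (var evt = new AutoResetEvent(false))
        {
            for (int i = 0; i < signals; i++)
                evt.Set();                 // all signals arrive before the waiter reacts

            int wakeUps = 0;
            while (evt.WaitOne(0))         // zero timeout: poll without blocking
                wakeUps++;
            return wakeUps;
        }
    }

    // A semaphore keeps a count, so every Release() is observed by a Wait().
    public static int WakeUpsWithSemaphore(int signals)
    {
        using (var sem = new SemaphoreSlim(0))
        {
            for (int i = 0; i < signals; i++)
                sem.Release();

            int wakeUps = 0;
            while (sem.Wait(0))
                wakeUps++;
            return wakeUps;
        }
    }

    static void Main()
    {
        Console.WriteLine(WakeUpsWithAutoResetEvent(3)); // prints 1
        Console.WriteLine(WakeUpsWithSemaphore(3));      // prints 3
    }
}
```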

The task that requires tasks A and B to reach certain stages could be notified each time a stage is completed. When it gets notified, it could check whether the conditions are met and proceed only then.
Output:
Task 3 still waiting: A0, B0
B reached 1
Task 3 still waiting: A0, B1
A reached 1
Task 3 still waiting: A1, B1
B reached 2
Task 3 still waiting: A1, B2
B reached 3
Task 3 still waiting: A1, B3
A reached 2
Task 3 still waiting: A2, B3
B reached 4
Task 3 still waiting: A2, B4
B reached 5
Task 3 done: A2, B5
A reached 3
B reached 6
B reached 7
B reached 8
B reached 9
B reached 10
All done
Program:
class Program
{
static int stageOfA = 0;
static int stageOfB = 0;
private static readonly AutoResetEvent _signalStageCompleted = new AutoResetEvent(false);
static void DoA()
{
for (int i = 0; i < 3; i++) {
Thread.Sleep(100);
Interlocked.Increment(ref stageOfA);
Console.WriteLine($"A reached {stageOfA}");
_signalStageCompleted.Set();
}
}
static void DoB()
{
for (int i = 0; i < 10; i++)
{
Thread.Sleep(50);
Interlocked.Increment(ref stageOfB);
Console.WriteLine($"B reached {stageOfB}");
_signalStageCompleted.Set();
}
}
static void DoAfterB5andA1()
{
while( (stageOfA < 1) || (stageOfB < 5))
{
Console.WriteLine($"Task 3 still waiting: A{stageOfA}, B{stageOfB}");
_signalStageCompleted.WaitOne();
}
Console.WriteLine($"Task 3 done: A{stageOfA}, B{stageOfB}");
}
static void Main(string[] args)
{
Task[] taskArray = { Task.Factory.StartNew(() => DoA()),
Task.Factory.StartNew(() => DoB()),
Task.Factory.StartNew(() => DoAfterB5andA1()) };
Task.WaitAll(taskArray);
Console.WriteLine("All done");
Console.ReadLine();
}
}

Counting stuff in multiple threads

In my .NET program, I want to count the number of times a piece of code will be hit. To make it a bit more challenging, my code is usually executed in multiple threads and I cannot control the creation / destruction of threads (and don't know when they are created)... they can even be pooled. Say:
class Program
{
static int counter = 0;
static void Main(string[] args)
{
Stopwatch sw = Stopwatch.StartNew();
Parallel.For(0, 100000000, (a) =>
{
Interlocked.Increment(ref counter);
});
Console.WriteLine(sw.Elapsed.ToString());
}
}
As the performance counter and method are hit quite a few times, I'd like to use a 'normal' variable in contrast to an atomic / interlocked integer. My second attempt was therefore to use threadlocal storage in combination with IDisposable to speed things up. Because I cannot control creation/destruction, I have to keep track of the storage variables:
class Program
{
static int counter = 0;
// I don't know when threads are created / joined, which is why I need this:
static List<WeakReference<ThreadLocalValue>> allStorage =
new List<WeakReference<ThreadLocalValue>>();
// The performance counter
[ThreadStatic]
static ThreadLocalValue local;
class ThreadLocalValue : IDisposable
{
public ThreadLocalValue()
{
lock (allStorage)
{
allStorage.Add(new WeakReference<ThreadLocalValue>(this));
}
}
public int ctr = 0;
public void Dispose()
{
// Atomic add and exchange
int tmp = Interlocked.Exchange(ref ctr, 0); // atomic set to 0-with-read
Interlocked.Add(ref Program.counter, tmp); // atomic add
}
~ThreadLocalValue()
{
// Make sure it's merged.
Dispose();
}
}
// Create-or-increment
static void LocalInc()
{
if (local == null) { local = new ThreadLocalValue(); }
++local.ctr;
}
static void Main(string[] args)
{
Stopwatch sw = Stopwatch.StartNew();
Parallel.For(0, 100000000, (a) =>
{
LocalInc();
});
lock (allStorage)
{
foreach (var item in allStorage)
{
ThreadLocalValue target;
if (item.TryGetTarget(out target))
{
target.Dispose();
}
}
}
Console.WriteLine(sw.Elapsed.ToString());
Console.WriteLine(counter);
Console.ReadLine();
}
}
My question is: can we do this faster and/or prettier?

What you need is a thread-safe, nonblocking, volatile, static variable to perform the counting for you.
Thank goodness, the .NET framework provides managed ways to do what you want.
For starters, you need a volatile, static variable to be used as a counter. Declare it like (where all your threads can access it):
public static volatile int volatileCounter;
Where static means this is a class and not an instance member, and volatile prevents caching errors from happening.
Next, you will need a code that increments it in a thread-safe and nonblocking way. If you don't expect your counter to exceed the limits of the int variable (which is very likely), you can use the Interlocked class for that like:
Interlocked.Increment(ref YourClass.volatileCounter); // YourClass stands for whichever class declares the static counter
The Interlocked class guarantees that your increment operation is atomic, so no race condition can cause false results. It is also non-blocking, in the sense that no heavyweight synchronization objects or thread blocking are involved.
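For the question's Parallel.For benchmark specifically, a different technique from the one in this answer is the Parallel.For overload that carries a per-task local value, so the shared counter is touched only once per worker. This is a sketch under the assumption that the counting happens inside a loop you start yourself, which the asker says is not always the case. The class LocalCountDemo and the method Count are hypothetical names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class LocalCountDemo
{
    public static int Count(int iterations)
    {
        int counter = 0;
        Parallel.For<int>(0, iterations,
            () => 0,                                         // localInit: each worker starts at 0
            (i, loopState, local) => local + 1,              // body: plain, uncontended increment
            local => Interlocked.Add(ref counter, local));   // localFinally: merge once per worker
        return counter;
    }

    static void Main()
    {
        Console.WriteLine(Count(1000000)); // prints 1000000
    }
}
```

The body works on a plain local value, and Interlocked.Add runs only when a worker finishes, so the total is still exact while the contended operation is executed far less often.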

How to cleanup hanging tasks on C# Task API?

I have a simple function as the following:
static Task<A> Peirce<A, B>(Func<Func<A, Task<B>>, Task<A>> a)
{
var aa = new TaskCompletionSource<A>();
var tt = new Task<A>(() =>
a(b =>
{
aa.SetResult(b);
return new TaskCompletionSource<B>().Task;
}).Result
);
tt.Start();
return Task.WhenAny(aa.Task, tt).Result;
}
The idea is simple: for any implementation of a, it must return a Task<A> to me. For this purpose, it may or may not use the parameter (of type Func<A, Task<B>>). If it does, our callback will be called and set the result of aa, and then aa.Task will complete. Otherwise, the result of a does not depend on its parameter, so we simply return its value. In either situation, aa.Task or the task returned by a will complete, so it should never block unless a does not use its parameter and blocks, or the task returned by a blocks.
The above code works, for example
static void Main(string[] args)
{
Func<Func<int, Task<int>>, Task<int>> t = a =>
{
return Task.FromResult(a(20).Result + 10);
};
Console.WriteLine(Peirce(t).Result); // output 20
t = a => Task.FromResult(10);
Console.WriteLine(Peirce(t).Result); // output 10
}
The problem here is, the two tasks aa.Task and tt must be cleaned up once the result of WhenAny has been determined, otherwise I am afraid there will be a leak of hanging tasks. I do not know how to do this, can any one suggest something? Or this is actually not a problem and C# will do it for me?
P.S. The name Peirce came from the famous "Peirce's Law"(((A->B)->A)->A) in propositional logic.
UPDATE: the point of matter is not "dispose" the tasks but rather stop them from running. I have tested, when I put the "main" logic in a 1000 loop it runs slowly (about 1 loop/second), and creates a lot of threads so it is a problem to solve.

A Task is a managed object. Unless you are introducing unmanaged resources, you shouldn't worry about a Task leaking resources. Let the GC clean it up and let the finalizer take care of the WaitHandle.
EDIT:
If you want to cancel tasks, consider using cooperative cancellation in the form of a CancellationTokenSource. You can pass this token to any tasks via the overload, and inside of each task, you may have some code as follows:
while (someCondition)
{
if (cancelToken.IsCancellationRequested)
break;
}
That way your tasks can gracefully clean up without throwing an exception. However, you can propagate an OperationCanceledException if you call cancelToken.ThrowIfCancellationRequested(). So the idea in your case would be that whatever finishes first can issue the cancellation to the other tasks so that they aren't hung up doing work.
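A minimal, self-contained version of the polling pattern above might look like the following. The names CancellationDemo, SpinUntilCancelled and RunAndCancel are hypothetical, and the Thread.Sleep(1) stands in for real work; only CancellationTokenSource, CancellationToken and Task.Run are actual framework APIs.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class CancellationDemo
{
    // Worker loop that polls the token and exits cleanly, without throwing.
    public static int SpinUntilCancelled(CancellationToken token)
    {
        int iterations = 0;
        while (!token.IsCancellationRequested)
        {
            iterations++;
            Thread.Sleep(1);   // placeholder for a unit of real work
        }
        return iterations;
    }

    // Starts the worker, requests cancellation, and reports whether it stopped normally.
    public static bool RunAndCancel()
    {
        using (var cts = new CancellationTokenSource())
        {
            CancellationToken token = cts.Token;   // capture before starting the task
            Task<int> worker = Task.Run(() => SpinUntilCancelled(token));
            cts.Cancel();                          // whoever finishes first would call this
            bool finished = worker.Wait(TimeSpan.FromSeconds(5));
            return finished && worker.Status == TaskStatus.RanToCompletion;
        }
    }

    static void Main()
    {
        Console.WriteLine("Worker stopped cooperatively: " + RunAndCancel());
    }
}
```

Because the loop returns instead of throwing, the task ends in the RanToCompletion state. Calling ThrowIfCancellationRequested() instead would surface the cancellation as an exception to whoever waits on the task.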

Thanks to @Bryan Crosby's answer, I can now implement the function as follows:
private class CanceledTaskCache<A>
{
public static Task<A> Instance;
}
private static Task<A> GetCanceledTask<A>()
{
if (CanceledTaskCache<A>.Instance == null)
{
var aa = new TaskCompletionSource<A>();
aa.SetCanceled();
CanceledTaskCache<A>.Instance = aa.Task;
}
return CanceledTaskCache<A>.Instance;
}
static Task<A> Peirce<A, B>(Func<Func<A, Task<B>>, Task<A>> a)
{
var aa = new TaskCompletionSource<A>();
Func<A, Task<B>> cb = b =>
{
aa.SetResult(b);
return GetCanceledTask<B>();
};
return Task.WhenAny(aa.Task, a(cb)).Unwrap();
}
and it works pretty well:
static void Main(string[] args)
{
for (int i = 0; i < 1000; ++i)
{
Func<Func<int, Task<String>>, Task<int>> t =
async a => (await a(20)).Length + 10;
Console.WriteLine(Peirce(t).Result); // output 20
t = async a => 10;
Console.WriteLine(Peirce(t).Result); // output 10
}
}
Now it is fast and does not consume too many resources. It can be even faster (about 70 times on my machine) if you do not use the async/await keywords:
static void Main(string[] args)
{
for (int i = 0; i < 10000; ++i)
{
Func<Func<int, Task<String>>, Task<int>> t =
a => a(20).ContinueWith(ta =>
ta.IsCanceled ? GetCanceledTask<int>() :
Task.FromResult(ta.Result.Length + 10)).Unwrap();
Console.WriteLine(Peirce(t).Result); // output 20
t = a => Task.FromResult(10);
Console.WriteLine(Peirce(t).Result); // output 10
}
}
The issue here is that even if you can detect the return value of a(20), there is no way to cancel the async block other than throwing an OperationCanceledException, and that prevents WhenAny from being optimized.
UPDATE: optimised code and compared async/await and native Task API.
UPDATE: If I can write the following code it will be ideal:
static Task<A> Peirce<A, B>(Func<Func<A, Task<B>>, Task<A>> a)
{
var aa = new TaskCompletionSource<A>();
return await? a(async b => {
aa.SetResult(b);
await break;
}) : await aa.Task;
}
Here, await? a : b has the value of a's result if a succeeds, and the value b if a is cancelled (as with a ? b : c, a's result should have the same type as b).
await break will cancel the current async block.

As Stephen Toub of MS Parallel Programming Team says: "No. Don't bother disposing of your tasks."
tldr: In most cases, disposing of a task does nothing, and when the task actually has allocated unmanaged resources, its finalizer will release them when the task object is collected.
