Parallel.For does not wait all iterations - c#

I am building an optimization program using genetic algorithms. I used Parallel.For in order to decrease the running time, but it caused a problem, which the code below reproduces:
class Program
{
    static void Main(string[] args)
    {
        int j = 0;
        Parallel.For(0, 10000000, i =>
        {
            j++;
        });
        Console.WriteLine(j);
        Console.ReadKey();
    }
}
Every time I run the program above, it writes a different value of j between 0 and 10000000. I guess it doesn't wait for all iterations to finish and passes on to the next line.
How am I supposed to solve this problem? Any help will be appreciated. Thanks.
Edit:
Interlocked.Increment(ref j); fixes the unexpected results, but this operation takes about 10 times as long as a normal for loop.

You could use the Interlocked.Increment(Int32) method, which would probably be easiest.
Using Parallel.For will create multiple threads which all execute the same lambda expression; in this case all it does is j++.
j++ compiles to something like j = j + 1, which is a separate read operation and write operation. This can cause unwanted behavior.
Say that j = 50.
Thread 1 executes the read for j++, gets 50, and adds 1 to it. Before that thread can finish the write operation to j, another thread also reads 50 from j. The first thread then finishes its write operation, making j 51, but the second thread still has 50 in memory as the value of j, adds 1 to that, and again writes 51 back to j. One increment has been lost.
Using the Interlocked class makes sure that every operation happens atomically.
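As a sketch of that fix (the loop from the question, with the increment made atomic):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    // Same loop as in the question, but with an atomic increment.
    public static int CountAtomically(int n)
    {
        int j = 0;
        Parallel.For(0, n, i =>
        {
            // Atomic read-modify-write: competing threads cannot interleave here.
            Interlocked.Increment(ref j);
        });
        return j; // Parallel.For has already waited for all iterations
    }

    static void Main()
    {
        Console.WriteLine(CountAtomically(10000000)); // always 10000000
    }
}
```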

Your access to j is not synchronized. Please read a basic book or tutorial on multi-threading and synchronization.
Parallel.For does wait for all iterations.
Using synchronization (and thereby defeating the purpose of the parallel for):
class Program
{
    static void Main(string[] args)
    {
        object sync = new object();
        int j = 0;
        Parallel.For(0, 10000000, i =>
        {
            lock (sync)
            {
                j++;
            }
        });
        Console.WriteLine(j);
        Console.ReadKey();
    }
}

Parallel.For does wait for all iterations to finish. The reason you're seeing unexpected values in your variable is different - and it is expected.
Basically, Parallel.For dispatches the iterations to multiple threads (as you would expect). However, multiple threads can't share the same writable memory without some kind of guarding mechanism - if they do, you have a data race and the result is nondeterministic. This applies in every programming language; it is the fundamental caveat of multithreading.
There are many kinds of guards you can put in place, depending on your use case. The fundamental way they work is through atomic operations, which are accessible to you through the Interlocked helper class. Higher-level guards include the Monitor class, the related lock language construct and classes like ReaderWriterLock (and its siblings).
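The choice of guard matters for throughput, which is what the question's edit complains about. One option (a sketch, not the only approach) is the Parallel.For overload with per-thread local state: each worker counts privately, so the atomic operation runs once per worker instead of once per iteration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    public static int CountWithLocals(int n)
    {
        int total = 0;
        Parallel.For(0, n,
            () => 0,                                      // localInit: private counter per worker
            (i, state, local) => local + 1,               // body: no shared state touched
            local => Interlocked.Add(ref total, local));  // localFinally: one atomic merge per worker
        return total;
    }

    static void Main()
    {
        Console.WriteLine(CountWithLocals(10000000)); // always 10000000
    }
}
```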

Related

Simultaneously running Threads giving different numbers

I am wondering a bit at the moment. I was just reading a bit about threads and landed on Task vs Thread differences [duplicate] here on Stack Overflow, from Jacek (sorry, I can't create a link because I can only make 2 with reputation < 10),
and the first comment from MoonKnight led me there: albahari.com/threading
I have taken the code and changed it a little to make it easier to read what is happening. Here is my changed code:
static void Main()
{
    Thread t = new Thread(WriteY); // Kick off a new thread
    t.Start();                     // running WriteY()

    // Simultaneously, do something on the main thread.
    for (int i = 0; i < 10; i++) { System.Threading.Thread.Sleep(1); Console.Write(i); }
    Console.ReadLine();
}

static void WriteY()
{
    for (int y = 0; y < 10; y++) { System.Threading.Thread.Sleep(1); Console.Write(y); }
    Console.ReadLine();
}
What I expected to happen (and what happens most of the time) was this:
Good thread:
But here is the thing I am wondering about (it's absolutely random, and I promise it is the same code):
Miracle thread:
My questions:
1. How can it happen that the numbers come out differently? The threads should always run at the same time, shouldn't they?
2. All of this gets crazier the lower the sleep time gets; if you remove it completely, it feels absolutely random.
When you execute the first loop on the main thread and start WriteY() on a separate thread, there is absolutely no way to predict the sequence in which events in one thread will happen relative to events in the other thread.
I've written a few tests to demonstrate this. Here's one. And here's another.
What characterizes both of these examples is that very often they will run in the "expected" sequence, but once in a while they won't.
That tells us a few things about multithreaded operations:
Concurrent or parallel execution is beneficial when we want to distribute work across threads, but not when events must occur in a predictable sequence.
It requires extra caution, because if we do it wrong it might seem to work anyway - and then once in a while it won't. Those occasions when it doesn't work will be extremely difficult to debug, one reason being that you won't be able to make the behavior repeat when you want to.
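To make that concrete: the only way to force one thread's events to come before another's is explicit coordination, e.g. Thread.Join. A minimal sketch (simplified from the question's code, writing into a StringBuilder instead of the console so the result can be inspected):

```csharp
using System;
using System.Text;
using System.Threading;

class Program
{
    public static string RunJoined()
    {
        var sb = new StringBuilder();
        Thread t = new Thread(() =>
        {
            for (int y = 0; y < 10; y++) sb.Append(y);
        });
        t.Start();
        t.Join(); // wait: the worker's writes complete before the loop below starts

        for (int i = 0; i < 10; i++) sb.Append(i);
        return sb.ToString();
    }

    static void Main()
    {
        // With Join the sequence is deterministic; without it,
        // the two loops interleave unpredictably.
        Console.WriteLine(RunJoined()); // 01234567890123456789
    }
}
```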

Parallel.For does not loop expected times

I have this test:
public void Run()
{
    var result = new List<int>();
    int i = 0;
    Parallel.For(0, 100000, counter =>
    {
        i++;
        if (counter == 99999)
        {
            Trace.WriteLine("i is " + i);
        }
    });
}
Now why does the output print seemingly random numbers in the range of 50000 to 99999?
I expected the output to always be 99999. Have I misunderstood the parallel for loop implementation?
If I run the loop only 100 times, the program outputs 100 as expected. FYI, I have an 8-core CPU.
UPDATE:
Of course! I missed the thread-safety aspect of it :) thanks! Now let's see which one is faster: using lock, declaring the variable volatile, or using Interlocked.
Your problem is probably due to i++ not being thread-safe, which puts your tasks in a race condition.
Further explanation of why i++ is not thread-safe can be found here: Are incrementers / decrementers (var++, var--) etc thread safe?
A quote of the answer given by Michael Burr in the aforementioned linked thread (upvote it there):
You can use something like InterlockedIncrement() depending on your platform. On .NET you can use the Interlocked class methods (Interlocked.Increment() for example).
As Rob Kennedy mentioned, even if the operation is implemented in terms of a single INC instruction, as far as the memory is concerned a read/increment/write set of steps is performed. There is the opportunity on a multi-processor system for corruption.
There's also the volatile issue, which would be a necessary part of making the operation thread-safe - however, marking the variable volatile is not sufficient to make it thread-safe. Use the interlocked support the platform provides.
This is true in general, and on x86/x64 platforms certainly.
The race to Trace.WriteLine()
Between the time you do i++ and output i, other parallel tasks might have changed/incremented i several times.
Imagine your first task, which increments i so that it becomes 1. Depending on your runtime environment and the weather of the day, i might be incremented twenty more times by other parallel tasks before the first task outputs the variable - which would now be 21 (and not 1 anymore). To prevent this from happening, use a local variable which remembers the value of the incremented i for that particular task for later processing/output:
int remember = Interlocked.Increment(ref i);
...
Trace.WriteLine("i of this task is: " + remember);
Because your code is not thread-safe: i++ is "read-modify-write", which is not thread-safe.
Use Interlocked.Increment instead, or take a lock around it.
Is the ++ operator thread safe?
Parallel.For runs the iterations in parallel, so for your code it does not mean the loop runs from 0 to 100000 in order; it can start by running the delegate with counter == 99999 first, which is why you get an arbitrary value of i.
When a parallel loop runs, the TPL partitions the data source so that the loop can operate on multiple parts concurrently. Behind the scenes, the Task Scheduler partitions the task based on system resources and workload. When possible, the scheduler redistributes work among multiple threads and processors if the workload becomes unbalanced.
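A small sketch of that partitioning in action: every iteration runs exactly once, but the order in which the workers reach them is arbitrary (a ConcurrentQueue is used here only to record the order safely):

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

class Program
{
    public static int[] RecordOrder(int n)
    {
        var order = new ConcurrentQueue<int>();
        Parallel.For(0, n, i => order.Enqueue(i));
        return order.ToArray();
    }

    static void Main()
    {
        int[] seen = RecordOrder(100000);
        Console.WriteLine(seen.Length); // always 100000: no iteration is skipped
        // The recorded order is almost never 0, 1, 2, ... because the range
        // is partitioned across worker threads.
        Console.WriteLine(seen.SequenceEqual(Enumerable.Range(0, seen.Length)));
    }
}
```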

Is ++ operation atomic in C#?

I have multiple threads accessing a single Int32 variable with "++" or "--" operation.
Do we need to lock before accessing it as below?
lock (idleAgentsLock)
{
    idleAgents--; // we should place it here.
}
Or what consequences will there be if I don't do the locking?
It is not "atomic" in the multi-threaded sense. However, your lock does protect the operation, theoretically at least. When in doubt you can of course use Interlocked.Decrement instead.
No, ++/-- is not an atomic operation. However, reads and writes of properly aligned 32-bit (and smaller) types such as int are themselves atomic; it is the combined read-modify-write that is not.
See these links:
Is accessing a variable in C# an atomic operation?
Is the ++ operator thread safe?
What operations are atomic in C#?
This blog post may also be of some interest to you.
I would suggest Interlocked.Increment / Interlocked.Decrement from the Interlocked class for an atomic implementation of the ++/-- operators.
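As a sketch of the Interlocked alternative to the lock above, note that Interlocked.Decrement also returns the decremented value, so a "did we just hit zero?" check cannot race with other threads (the counter here is a hypothetical stand-in for the question's idleAgents):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    // Hypothetical stand-in for the question's idleAgents counter.
    public static int DrainAgents(int n)
    {
        int idleAgents = n;
        int sawZero = 0;
        Parallel.For(0, n, _ =>
        {
            // Interlocked.Decrement atomically decrements AND returns the
            // new value, so this test cannot race with other threads.
            int remaining = Interlocked.Decrement(ref idleAgents);
            if (remaining == 0)
                Interlocked.Increment(ref sawZero);
        });
        return sawZero;
    }

    static void Main()
    {
        // Each returned value n-1 ... 0 is observed exactly once,
        // so exactly one worker sees the counter hit zero.
        Console.WriteLine(DrainAgents(1000)); // 1
    }
}
```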
It depends on the machine architecture. In general, no, the compiler may generate a load instruction, increment/decrement the value and then store it. So, other threads may indeed read the value between those operations.
Most CPU instructions sets have a special atomic test & set instruction for this purpose. Assuming you don't want to embed assembly instructions into your C# code, the next best approach is to use mutual exclusion, similar to what you've shown. The implementation of that mechanism ultimately uses an instruction that is atomic to implement the mutex (or whatever it uses).
In short: yes, you should ensure mutual exclusion.
Beyond the scope of this answer, there are other techniques for managing shared data that may be appropriate or not depending on the domain logic of your situation.
An unprotected increment/decrement is not thread-safe - and thus not atomic between threads. (Although it might be "atomic" with respect to the actual IL/machine-code transform1.)
This LINQPad example code shows unpredictable results:
void Main()
{
    int nWorkers = 10;
    int nLoad = 200000;
    int counter = nWorkers * nLoad;
    List<Thread> threads = new List<Thread>();
    for (var i = 0; i < nWorkers; i++)
    {
        var th = new Thread((_) =>
        {
            for (var j = 0; j < nLoad; j++)
            {
                counter--; // bad
            }
        });
        th.Start();
        threads.Add(th);
    }
    foreach (var w in threads)
    {
        w.Join();
    }
    counter.Dump();
}
Note that the visibility between threads is of importance. Synchronization guarantees this visibility in addition to atomicity.
This code is easily fixed, at least in the limited context presented. Switch out the decrement and observe the results:
counter--; // bad
Interlocked.Decrement(ref counter); // good
lock (threads) { counter--; } // good
1 Even when using a volatile variable, the results are still unpredictable. This seems to indicate that (at least here, when I just ran it) it is also not an atomic operator as read/op/write of competing threads were interleaved. To see that the behavior is still incorrect when visibility issues are removed (are they?), add
class x
{
    public static volatile int counter;
}

Interlocked.Increment still not solving missing-value problems

I'm studying C# right now and currently learning threading.
Here is a simple example of adding 1 to a variable multiple times from different threads.
The book suggested I can use Interlocked.Increment(ref number) to replace number += 1 within the AddOne method, so that the value will be locked until it is updated within the thread and the output will be 1000, 2000, ..., 10000 as expected. But my output is still 999, 1999, 2999, ..., 9999.
Only after I uncomment the Thread.Sleep(1000) line is the output correct, and then it is correct even without Interlocked being used.
Can anyone explain what's happening here?
static void Main(string[] args)
{
    myNum n = new myNum();
    for (int i = 0; i < 10; Interlocked.Increment(ref i))
    {
        for (int a = 1; a <= 1000; Interlocked.Increment(ref a))
        {
            Thread t = new Thread(new ThreadStart(n.AddOne));
            t.Start();
        }
        //Thread.Sleep(1000);
        Console.WriteLine(n.number);
    }
}

class myNum
{
    public int number = 0;
    public void AddOne()
    {
        //number += 1;
        Interlocked.Increment(ref number);
    }
}
You are printing out the value before all of the threads have finished executing. You need to join all of the threads before printing.
for (int a = 0; a < 1000; a++)
{
    t[a].Join();
}
You'll need to store the threads in an array or list. Also, you don't need the Interlocked calls in any of the for loops; they all run on only one thread (the main thread). Only the code in AddOne runs on multiple threads and hence needs to be synchronized.
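Putting those points together - threads kept in a list, joined before printing, and the Interlocked calls dropped from the single-threaded loops - a minimal reworking of the question's code might look like this:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class myNum
{
    public int number = 0;
    public void AddOne() => Interlocked.Increment(ref number);
}

class Program
{
    // Starts `count` threads, waits for all of them, then returns the counter.
    public static int RunBatch(myNum n, int count)
    {
        var threads = new List<Thread>();
        for (int a = 0; a < count; a++)
        {
            var t = new Thread(n.AddOne);
            t.Start();
            threads.Add(t);
        }
        foreach (var t in threads)
            t.Join(); // every AddOne has finished before we read the counter

        return n.number;
    }

    static void Main()
    {
        var n = new myNum();
        for (int i = 0; i < 10; i++)
            Console.WriteLine(RunBatch(n, 1000)); // 1000, 2000, ..., 10000
    }
}
```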
It is a bit strange to me what you are trying to achieve with this code. You are using Interlocked.Increment everywhere without an explicit need for it.
Interlocked.Increment is required for values which can be accessed from different threads. In your code that is only number, so you don't need it for i and a; just use the usual i++ and a++.
The problem is that you simply don't wait for all the threads you started to complete their job. Take a look at the Thread.Join() method: you have to wait until all of the threads you started have completed their work.
In this simple test, Thread.Sleep(1000); performs a similar wait, but it is not correct to assume that all threads complete within 1000 ms, so use Thread.Join() instead.
If you modify your AddOne() method so that it executes longer (e.g. add Thread.Sleep(1000) to it), you'll notice that Thread.Sleep(1000); doesn't help any more.
I'd suggest reading more about ThreadPool vs. threads. Also take a look at Patterns for Parallel Programming: Understanding and Applying Parallel Patterns with the .NET Framework 4.

Thread usage for optimization

Here is a piece of C# code which applies an operation to each row of a matrix of doubles (say 200x200).
for (int i = 0; i < 200; i++)
{
    result = process(row[i]);
    DoSomething(result);
}
process is a static method; I have a Core i5 CPU, Windows XP, and I'm using .NET Framework 3.5. To gain performance, I tried to process each row on a separate thread (using asynchronous delegates), so I rewrote the code as follows:
List<Func<double[], double>> myMethodList = new List<Func<double[], double>>();
List<IAsyncResult> myCookieList = new List<IAsyncResult>();
for (int i = 0; i < 200; i++)
{
    Func<double[], double> myMethod = process;
    IAsyncResult myCookie = myMethod.BeginInvoke(row[i], null, null);
    myMethodList.Add(myMethod);
    myCookieList.Add(myCookie);
}
for (int j = 0; j < 200; j++)
{
    result = myMethodList[j].EndInvoke(myCookieList[j]);
    DoSomething(result);
}
This code is called for 1000 matrices in one run. When I tested it, surprisingly, I didn't get any performance improvement! So this raised the question: in which cases does multi-threading benefit performance, and is my code logical?
At first glance, your code looks OK. Maybe the CPU isn't the bottleneck.
Can you confirm that process() and DoSomething() are independent and don't do any I/O or locking for shared resources?
The point here is that you'll have to start measuring.
And of course .NET 4 with the TPL makes this kind of thing easier to write and usually more efficient.
You could achieve more parallelism (in the result processing, specifically) by calling BeginInvoke with an AsyncCallback - this will do the result processing in a ThreadPool thread, instead of inline as you have it currently.
See the last section of the async programming docs here.
Before you do anything to modify the code, you should profile it to find out where the program is spending its time.
Your code is going a little overboard. Look at the loops: for each of 200 iterations, you are creating a new thread to make an asynchronous call. That will result in your process having 201 active threads. There is a law of diminishing returns: at about double the number of threads as the number of "execution units" the processor has (the number of CPUs, times the number of cores on each CPU, times 2 if the cores can be hyper-threaded), your computer will start spending more time scheduling threads than running them. A state-of-the-art server has 4 quad-core HT CPUs, for about 32 EUs. 200 actively executing threads will make such a server break down and cry.
If the order of processing doesn't matter, I would implement a MergeSort-like algorithm; break the array in half, process the left hand, process the right hand. Each "left hand" can be processed by a new thread, but process the "right hand" in the current thread. Then, implement some thread-safe means to limit the thread count to about 1.25 times the number of "execution units"; If the limit has been reached, continue processing linearly without creating a new thread.
It looks like you aren't gaining any performance because of the way you are handling the EndInvoke calls. Since you call process using BeginInvoke, those calls return immediately, so the first loop probably finishes in no time at all. However, EndInvoke blocks until the call it corresponds to has finished processing, and you are still consuming the results sequentially. As Steve said, you should use an AsyncCallback so that each completion event is handled on its own thread.
You are not seeing much gain because you are not parallelizing the code. Yes, you are doing async, but that just means your loop does not wait for one calculation before going on to the next step. Use Parallel.For instead of the for loop and see if you see any gain on your multi-core box.
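A sketch of that suggestion (it requires .NET 4+, and Process here is a placeholder that sums a row, since the question's real process isn't shown):

```csharp
using System;
using System.Threading.Tasks;

class Program
{
    // Placeholder for the question's static process(): sums one row.
    public static double Process(double[] row)
    {
        double sum = 0;
        foreach (var v in row) sum += v;
        return sum;
    }

    public static double[] ProcessRows(double[][] rows)
    {
        var results = new double[rows.Length];
        // The TPL partitions the iterations over the available cores;
        // each iteration writes its own slot, so no locking is needed.
        Parallel.For(0, rows.Length, i => results[i] = Process(rows[i]));
        return results;
    }

    static void Main()
    {
        var rows = new double[200][];
        for (int i = 0; i < 200; i++)
            rows[i] = new double[] { i, i + 1 };

        double[] results = ProcessRows(rows);
        // DoSomething(results[i]) can then run sequentially, in row order.
        Console.WriteLine(results[0]);   // 1
        Console.WriteLine(results[199]); // 399
    }
}
```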
If you are going to use async delegates, this would be the way to do it to ensure the callbacks happen on a thread-pool thread:
internal static void Do()
{
    AsyncCallback cb = Complete;
    List<double[]> row = CreateList();
    for (int i = 0; i < 200; i++)
    {
        Func<double[], double> myMethod = Process;
        myMethod.BeginInvoke(row[i], cb, null);
    }
}

static double Process(double[] vals)
{
    // your implementation
    return randy.NextDouble();
}

static void Complete(IAsyncResult token)
{
    Func<double[], double> callBack = (Func<double[], double>)((AsyncResult)token).AsyncDelegate;
    double res = callBack.EndInvoke(token);
    Console.WriteLine("complete res {0}", res);
    DoSomething(res);
}
