c# multiple tasks do not run in parallel

c# multiple tasks do not run in parallel - c#

I tried to create a multithreaded dice rolling simulation - just for curiosity, the joy of multithreaded progarmming and to show others the effects of "random results" (many people can't understand that if you roll a laplace dice six times in a row and you already had 1, 2, 3, 4, 5 that the next roll is NOT a 6.). To show them the distribution of n rolls with m dice I created this code.
Well, the result is fine BUT even though I create a new task for each dice the program runs single threaded.
Multithreaded would be reasonable to simulate "millions" of rerolls with 6 or more dice as the time to finish will grow rapidly.
I read several examples from msdn that all indicate that there should be several tasks running simultanously.
Can someone please give me a hint, why this code does not utilize many threads / cores? (Not even, when I try to run it for 400 dice at once and 10 Million rerolls)
At first I initialize the jagged Array that stores the results. 1st dimension: 1 Entry per dice, the second dimension will be the distribution of eyes rolled with each dice.
Next I create an array of tasks that each return an array of results (the second dimension, as described above)
Each of these arrays has 6 entries that represent eachs side of a laplace W6 dice. If the dice roll results in 1 eye the first entry [0] is increased by +1. So you can visualize how often each value has been rolled.
Then I use a plain for-loop to start all threads. There is no indication to wait for a thread until all are started.
At the end I wait for all to finish and sum up the results. It does not make any difference if change
Task.WaitAll(tasks); to
Task.WhenAll(tasks);
Again my quation: Why doesn't that code utilize more than one core of my CPU? What do I have to change?
Thanks in advance!
Here's the code:
private void buttonStart_Click(object sender, RoutedEventArgs e)
{
int tries = 1000000;
int numberofdice = 20 ;
int numberofsides = 6; // W6 = 6
var rnd = new Random();
int[][] StoreResult = new int[numberofdice][];
for (int i = 0; i < numberofdice; i++)
{
StoreResult[i] = new int[numberofsides];
}
Task<int[]>[] tasks = new Task<int[]>[numberofdice];
for (int ctr = 0; ctr < numberofdice; ctr++)
{
tasks[ctr] = Task.Run(() =>
{
int newValue = 0;
int[] StoreTemp = new int[numberofsides]; // Array that represents how often each value appeared
for (int i = 1; i <= tries; i++) // how often to roll the dice
{
newValue = rnd.Next(1, numberofsides + 1); // Roll the dice; UpperLimit for random int EXCLUDED from range
StoreTemp[newValue-1] = StoreTemp[newValue-1] + 1; //increases value corresponding to eyes on dice by +1
}
return StoreTemp;
});
StoreResult[ctr] = tasks[ctr].Result; // Summing up the individual results for each dice in an array
}
Task.WaitAll(tasks);
// do something to visualize the results - not important for the question
}
}

The issue here is tasks[ctr].Result. The .Result portion itself waits for the function to complete before storing the resulting int array into StoreResult. Instead, make a new loop after Task.WaitAll to get your results.

You may consider doing a Parallel.Foreach loop instead of manually creating separate tasks for this.
As others have indicated, when you try to aggregate this you just end up waiting for each individual task to finish, so this isn't actually multi-threaded.
Very important note: The C# random number generator is not thread safe (see also this MSDN blog post for discussion on the topic). Don't share the same instance between multiple threads. From the documentation:
...Random objects are not thread safe. If your app calls Random
methods from multiple threads, you must use a synchronization object
to ensure that only one thread can access the random number generator
at a time. If you don't ensure that the Random object is accessed in a
thread-safe way, calls to methods that return random numbers return 0.
Also, just to be nit-picky, using a Task is not really the same thing as doing multithreading; while you are, in fact, doing multithreading here, it's also possible to do purely asynchronous, non-multithreaded code with async/await. This is used mostly for I/O-bound operations where it's largely pointless to create a separate thread just to wait for a result (but it's desirable to allow the calling thread to do other work while it's waiting for the result).
I don't think you should have to worry about thread safety while assigning to the main array (assuming that each thread is assigning only to a specific index in the array and that no one else is assigning to the same memory location); you only have to worry about locking when multiple threads are accessing/modifying shared mutable state at the same time. If I'm reading this correctly, this is mutable state (but it's not shared mutable state).

Related

Reliable way to prove that int++ is not atomic

I can already see it's not by the incorrect increments, but there's just one small piece of the puzzle I can't quite seem to catch.
We have the following code:
internal class StupidObject
{
static public SemaphoreSlim semaphore = new SemaphoreSlim(0, 100);
private int counter;
public bool MethodCall() => counter++ == 0;
public int GetCounter() => counter;
}
And the following test code to try and see if it's an atomic operation:
var sharedObj = new StupidObject();
var resultTasks = new Task[100];
for (int i = 0; i < 100; i++)
{
resultTasks[i] = Task.Run(async () =>
{
await StupidObject.semaphore.WaitAsync();
if (sharedObj.MethodCall())
{
Console.WriteLine("True");
};
});
}
Console.WriteLine("Done");
Console.ReadLine();
StupidObject.semaphore.Release(100);
Console.ReadLine();
Console.WriteLine(sharedObj.GetCounter());
Console.ReadLine();
I expect to see multiple True's written to the console, but I ever see a single one.
Why is that? By my understanding, a ++ operation reads the value, increments the read value, and then stores that value to the variable.
Those are 3 operations. If we had a race condition, where thread A did the following:
Reads value to be 0.
Increments read value by 1.
And another thread B did the same things, but beat thread A to the third operation as following:
Writes read value to variable.
When A finishes writing the incremented read value, it should print back 0, same with thread B after it has done its write operation.
Am I missing something at the design aspect of things, or is my test not good enough to make this exact situation come to fruition?
Example without the Task Parallel Library (still yields a single True to the console):
var sharedObj = new StupidObject();
var resultTasks = new Thread[10000];
for (int i = 0; i < 10000; i++)
{
resultTasks[i] = new Thread(() =>
{
StupidObject.semaphore.Wait();
if (sharedObj.MethodCall())
{
Console.WriteLine("True");
};
});
resultTasks[i].IsBackground = false;
resultTasks[i].Start();
}
Console.WriteLine("Done");
Console.ReadLine();
StupidObject.semaphore.Release(10000);

What Liam said about Console.WriteLine is possible, but also there's another thing.
Starting Tasks doesn't equal starting threads, and even starting threads doesn't guarantee that all threads will begin immediatelly. Starting 100 short tasks probably won't even fill .Net's thread pool significantly, because those tasks end quickly and thread pool's manager probably won't start more than 3-5 threads. That's not the "immediate" and "parallel" you'd like to see when you want to start parallel 100 increments to race with each other, right? Remember that Tasks are queued first, then assigned to threads.
Note that the StupidObject's counter starts with zero and that's the ONLY MOMENT EVER that the value is zero. If ANY thread wins the race and successfully writes an update to that integer, you'll get FALSE in all future tasks, because it's already 1.
And if there are many tasks on the thread pool's queue, something first has to notice that fact. At program's start, thread pool lacks threads. They are not started in dozens right at program start. They are started on demand. Most probably you fill up the queue with 100 tasks, threadpool's thread is created, picks first task, bumps counter to 1, then maybe thread pool starts new threads to consume tasks faster.
To get a bit better image what's happening, instead of printing out 'true', collect values observed by return counter++: let each task run, finish, store its value in Task's .Result, then run threads/tasks, then wait for all of then to stop, then collect .Results and write a histogram of those values. Even if you don't see 5 zeros, maybe you will see 3 ones, 7 twos, 2 threes and so on.

Parallel.For: Is it safe to lock the value?

Im trying to figure out the output of this code:
Dictionary<int, MyRequest> request = new Dictionary<int, MyRequest>();
for (int i = 0; i < 1000; i++ )
{
request.Add(i, new MyRequest() { Name = i.ToString() });
}
var ids = request.Keys.ToList();
Parallel.For(0, ids.Count, (t) =>
{
var id = ids[t];
var b = request[id];
lock (b)
{
if (b.Name == 4.ToString())
{
Thread.Sleep(10000);
}
Console.WriteLine(b.Name);
}
});
Console.WriteLine("done");
Console.Read();
output:
789
800
875
.
.
.
4
5
6
7
done
MyRequest is just a dummy class used for demonstration (it is not doing anything but holding values). Is my lock blocking the execution or are the last 4 being put on their own thread?
This is a .NET 4.0 demo.
UPDATE
Ok I did figure out they were on teh same thread, but i would still like to know if the lock does anything to block execution. I cant imagine it does.

If ids does not contain duplicates, that lock won't block anything. But if there are duplicates in ids, then yes, there might be contention at the lock, as different threads fight for access to the same request.

Your lock will only be blocking execution if the ids line up such that you retrieve the same request more than once. Since different names are being printed each time, that shouldn't be a concern.

Parallel.For uses a thread pool to process your loop. As soon as one of its threads is free, it assigns it to the next element. This is non-deterministic, because you don't know how many threads there are in the pool, and you don't control the CPU time given to each thread. This means that some threads may finish sooner or later than you would "naturally" expect.
Your lock isn't doing anything. A lock blocks delimits sections of code that attempt to use the same object. In your case, you're not ever using the same object twice in the loop. The fact that the last IDs processed seem consistent is probably purely coincidental.

Multi-part question about multi-threading, locks and multi-core processors (multi ^ 3)

I have a program with two methods. The first method takes two arrays as parameters, and performs an operation in which values from one array are conditionally written into the other, like so:
void Blend(int[] dest, int[] src, int offset)
{
for (int i = 0; i < src.Length; i++)
{
int rdr = dest[i + offset];
dest[i + offset] = src[i] > rdr? src[i] : rdr;
}
}
The second method creates two separate sets of int arrays and iterates through them such that each array of one set is Blended with each array from the other set, like so:
void CrossBlend()
{
int[][] set1 = new int[150][75000]; // we'll pretend this actually compiles
int[][] set2 = new int[25][10000]; // we'll pretend this actually compiles
for (int i1 = 0; i1 < set1.Length; i1++)
{
for (int i2 = 0; i2 < set2.Length; i2++)
{
Blend(set1[i1], set2[i2], 0); // or any offset, doesn't matter
}
}
}
First question: Since this apporoach is an obvious candidate for parallelization, is it intrinsically thread-safe? It seems like no, since I can conceive a scenario (unlikely, I think) where one thread's changes are lost because a different threads ~simultaneous operation.
If no, would this:
void Blend(int[] dest, int[] src, int offset)
{
lock (dest)
{
for (int i = 0; i < src.Length; i++)
{
int rdr = dest[i + offset];
dest[i + offset] = src[i] > rdr? src[i] : rdr;
}
}
}
be an effective fix?
Second question: If so, what would be the likely performance cost of using locks like this? I assume that with something like this, if a thread attempts to lock a destination array that is currently locked by another thread, the first thread would block until the lock was released instead of continuing to process something.
Also, how much time does it actually take to acquire a lock? Nanosecond scale, or worse than that? Would this be a major issue in something like this?
Third question: How would I best approach this problem in a multi-threaded way that would take advantage of multi-core processors (and this is based on the potentially wrong assumption that a multi-threaded solution would not speed up this operation on a single core processor)? I'm guessing that I would want to have one thread running per core, but I don't know if that's true.

The potential contention with CrossBlend is set1 - the destination of the blend. Rather than using a lock, which is going to be comparatively expensive compared to the amount of work you are doing, arrange for each thread to work on it's own destination. That is a given destination (array at some index in set1) is owned by a given task. This is possible since the outcome is independent of the order that CrossBlend processes the arrays in.
Each task should then run just the inner loop in CrossBlend, and the task is parameterized with the index of the dest array (set1) to use (or range of indices.)
You can also parallelize the Blend method, since each index is computed independently of the others, so no contention there. But on todays machines, with <40 cores you will get sufficient parallism just threading the CrossBlend method.
To run effectively on multi-core you can either
for N cores, divide the problem into N parts. Given that set1 is reasonably large compared to the number of cores, you could just divide set1 into N ranges, and pass each range of indices into N threads running the inner CrossBlend loop. That will give you fairly good parallelism, but it's not optimal. (Some threads will finish sooner and end up with no work to do.)
A more involved scheme is to make each iteration of the CrossBlend inner loop a separate task. Have N queues (for N cores), and distribute the tasks amongst the queues. Start N threads, with each thread reading it's tasks from a queue. If a threads queue becomes empty, it takes a task from some other thread's queue.
The second approach is best suited to irregularly sized tasks, or where the system is being used for other tasks, so some cores may be time switching between other processes, so you cannot expect that equal amounts of work complete in the roughly same time on different cores.
The first approach is much simpler to code, and will give you a good level of parallelism.

Random number clashes with same .Net code in different processes

Before I start, I want to point out that I'm pretty sure this actually happened. All my logs suggest that it did.
I'd like to know whether I'm wrong and this is impossible, whether it's just incredibly unlikely (which I suspect), or if it's not that unlikely and I'm doing something fundamentally wrong.
I have 4 instances of the same code running as Windows Services on the same server. This server has a multicore (4) processor.
Here's a summary of the code:
public class MyProcess
{
private System.Timers.Timer timer;
// execution starts here
public void EntryPoint()
{
timer = new System.Timers.Timer(15000); // 15 seconds
timer.Elapsed += new System.Timers.ElapsedEventHandler(Timer_Elapsed);
timer.AutoReset = false;
Timer_Elapsed(this, null);
}
private void Timer_Elapsed(object sender, System.Timers.ElapsedEventArgs e)
{
string uid = GetUID();
// this bit of code sends a message to an external process.
// It uses the uid as an identifier - these shouldn't clash!
CommunicationClass.SendMessage(uid);
timer.Start();
}
// returns an 18 digit number as a string
private string GetUID()
{
string rndString = "";
Random rnd = new Random((int)DateTime.Now.Ticks);
for (int i = 0; i < 18; i++)
{
rndString += rnd.Next(0, 10);
}
return rndString;
}
The external process that receives these messages got confused - I think because the same uid came from two separate processes. Based on that, it appears that the GetUID() method returned the same "random" 18 digit string for two separate processes.
I've seeded the Random class using DateTime.Now.Ticks which I thought would provide protection between threads - one tick is 100 nanoseconds, surely two threads couldn't get the same seed value.
What I didn't account for obviously is that we're not talking about threads, we're talking about processes on a multicore processor. That means that this code can literally run twice at the same time. I think that's what's caused the clash.
Two processes running the same code at approximate 15 second intervals managed to hit the same code inside 100 nanoseconds. Is this possible? Am I on the right track here?
I'd be grateful for your thoughts or suggestions.
To clarify, I can't really use a GUID - the external process I'm communicating with needs an 18 digit number. It's old, and I can't change it unfortunately.

Unless there is some reason you can't, you should look at using a GUID for this purpose. You will eliminate collisions that way.
Per comment: You could use a GUID and a 64-bit FNV hash and use XOR-folding to get your result to within the 59-bits that you have. Not as collision proof as a GUID, but better than what you have.

You don't want random numbers for this purpose, you want unique numbers. I'm with #JP. I think you should look at using GUIDs for your message ids.
EDIT: if you can't use a GUID, then think of a way to get a 64-bit number that is unique and use successive 3-bit chunks of it as an index into a 8-character alphabet (tossing the unused upper bits). One way to do this would be to have a database in which you create an entry for each new message with an auto-incremented 64-bit integer as the key. Use the key and translate it into your 18-character message id.
If you don't want to rely on a database you can get something that works under certain conditions. For example, if the messages only need to be unique during the life of the processes, then you could use the process id as 32 bits of the value and get the remaining 22 required bits from a random number generator. Since no two processes running at the same time can have the same id, they should be guaranteed to have unique message ids.
There are undoubtedly many other ways that you could do this if your situation doesn't fit into one of the above scenarios.

Try using this function for the seed, in place of DateTime.Now.Ticks:
public static int GetSeed()
{
byte[] raw = Guid.NewGuid().ToByteArray();
int i1 = BitConverter.ToInt32(raw, 0);
int i2 = BitConverter.ToInt32(raw, 4);
int i3 = BitConverter.ToInt32(raw, 8);
int i4 = BitConverter.ToInt32(raw, 12);
long val = i1 + i2 + i3 + i4;
while (val > int.MaxValue)
{
val -= int.MaxValue;
}
return (int)val;
}
This basically turns a Guid into an int. You could theoretically get duplicates, but it's cosmically unlikely.
Edit: or even just use:
Guid.NewGuid().GetHashCode();
Using DateTime.Now.Ticks, on the other hand, almost guarantees a collision at some point. It's very common in Windows programming to have a timer's resolution specified in units that are far beyond the timer's actual accuracy (I first ran into this with Visual Basic 3.0's timer control, which was set in milliseconds but only actually went off 18 times a second). I don't know this for sure, but I bet if you just ran a loop and printed out DateTime.Now.Ticks, you would see the values quantizing around 15ms intervals or so. So with 4 processes going it's actually really likely that two of them would end up using the exact same seed for the Random function.
Since the Guid-based GetSeed function has a super-tiny chance of producing duplicates, ideally you'd want to create some sort of bank of pre-calculated unique numbers. However, since you're talking about separate processes here, you'd have to come up with some way of caching the values where all the processes could read them, which is a bother.
If you want to worry about cosmically unlikely events, buy lottery tickets.

Another way to accomplish this is to not use the Random class as it is fraught with issues like this. You can accomplish the same functionality (a random 18 digit number) using the cryptographic quality random number generator available in System.Security.Cryptography.
I have modified your code to use the RNGCryptoServiceProvider class to generate the id.
// returns an 18 digit number as a string
private string GetUID()
{
string rndString = "";
var rnd = new RNGCryptoServiceProvider();
var data = new byte[18];
rnd.GetBytes(data);
foreach(byte item in data)
{
rndString += Convert.ToString((int)item % 10);
}
return rndString;
}

Yip it can happen, and therefore it did.
You should initialise Random only once, at start-up. If you have many threads starting at the same time, get a copy of DateTime.Now.Ticks and pass it to each thread with a known offset to prevent then initialising on the same time.

I also concur with the GUID idea.
As for your original problem, since Ticks is a long, this statement:
(int)DateTime.Now.Ticks
will cause an overflow. Not sure what kinda nastiness will happen then...

One thread can change the reference to the array while another thread is iterating through it

I have a static array variable which is being shared by two threads.
I'd like to understand what happens if I assign the array variable to another array object in Thread1 while Thread2 is iterating over the array.
I.e
In thread 1
MyStaticClass.MyArray = SomeOtherArray
While in Thread 2:
for (int i = 0; i < MyStaticClass.MyArray.length; i++)
{
//do something with the i'th element
}

Assuming MyStaticClass.MyArray is just a field or simple property.
Nothing good or predictable will happen that's for sure.
I'd say most likely are:
The loop may read one half of old array and rest from new array
The new array may be shorter than the last giving a exception when you access [i]
Or Thread 2 may actually completely ignore the change to the array! And what's worse, this behaviour may be different in Release build to Debug.
The last situation is due to compiler optimisations and/or the way the memory model works in .net (and other languages like Java BTW). There's a whole keyword to get around this issue (volatile) http://igoro.com/archive/volatile-keyword-in-c-memory-model-explained/

To prevent this, you want to use a lock to lock the critical section. Basically wrapping your itteration in a lock will prevent the other thread from overwriting the array while you are processing it
The lock keyword ensures that one thread does not enter a critical section of code while another thread is in the critical section. If another thread tries to enter a locked code, it will wait, block, until the object is released.
https://msdn.microsoft.com/en-us/library/c5kehkcz.aspx

The condition in your for loop is evaluated for every iteration. So if another thread changes the reference in MyStaticClass.MyArray, you use this new reference in the next iteration.
For example, this code:
int[] a = {1,2,3,4,5};
int[] b = {10, 20, 30, 40, 50, 60, 70};
for (int i = 0; i < a.Length; i++)
{
Console.WriteLine(a[i]);
a = b;
}
gives this output:
1
20
30
40
50
60
70
To avoid this, you could use foreach:
foreach(int c in a)
{
Console.WriteLine(c);
a = b;
}
gives:
1
2
3
4
5
because foreach is translated so that it calls a.GetEnumerator() once and then uses this enumerator (MoveNext() and Current) for all iterations, not accessing a again.

There will be no consequences other than thread 2 at some point starting to work on the new array instead of the old. When exactly this happens is more or less random, and as weston writes in his comment, it may never even happen at all.
If the new array has fewer elements than the old one this can (but doesn't have to) cause a runtime exception.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.