Parallel programming in C#: how to separate data - c#

At school we started multithreading last week and we are already on multiprocessing now. I am getting a little lost on it, so I am going to explain the problem to you.
For an exercise we must make a casino game simulator which simulates 10,000 games, so that we can know how often the casino wins the game.
So I coded the simulator, and I've got 5 methods to run a game:
static void game(Croupier croupier)
{
    croupier.createNewCardDeck();
    croupier.shuffleCards();
    croupier.giveCardsToPlayers();
    croupier.countPlayerPoints();
    croupier.displayResults();
}
If I call game in a classic for loop of 10,000 iterations it runs OK, takes approximately 2 seconds, and the bank wins 50% of the time.
If I use Parallel.For, it crashes in shuffleCards, because (I think) multiple threads are trying to edit the same pack of cards at the same time.
My first idea was to put a Mutex around shuffleCards, but that would slow down the simulation, when the whole point of using parallel programming was to increase speed. So I thought of separating the data across the different threads (so that instead of 10,000 iterations on one thread, I do 2,500 on each of 4 threads, with every loop having its own croupier, players, cards, etc.).
What do you think would be the best way to solve this problem? Do you have a simple tutorial that explains how to deal with parallel work that uses the same data? Which solution would you choose?
Thanks
Edit: the ShuffleCards method
List<Card> randomList = new List<Card>();
Random r = new Random();
while (_cards.Count > 0)
{
    int randomIndex = r.Next(0, _cards.Count); // choose a random card in the list
    randomList.Add(_cards[randomIndex]);       // add it to the new, shuffled list
    _cards.RemoveAt(randomIndex);              // remove it to avoid duplicates
}
return randomList;
So yes, _cards is a private field of Croupier (which calls this._cards = shuffleCards()), so every thread sees the same card list.
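For reference, the shuffle above fails under Parallel.For for two reasons: the shared `_cards` list is mutated from several threads at once, and `Random` itself is not thread-safe. The usual fix is to give each croupier its own deck and its own `Random`, and to shuffle in place with Fisher-Yates. A minimal sketch, with a hypothetical `Card` class standing in for the real one:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal Card class for illustration.
class Card
{
    public int Value { get; }
    public Card(int value) { Value = value; }
}

class Croupier
{
    private readonly List<Card> _cards = new List<Card>();
    // Each croupier owns its own Random, so nothing is shared across threads.
    private readonly Random _rng;

    public Croupier(int seed) { _rng = new Random(seed); }

    public IReadOnlyList<Card> Cards => _cards;

    public void CreateNewCardDeck()
    {
        _cards.Clear();
        for (int i = 0; i < 52; i++) _cards.Add(new Card(i));
    }

    // In-place Fisher-Yates shuffle: O(n) with no temporary list,
    // versus O(n^2) for the remove-at-a-random-index approach.
    public void ShuffleCards()
    {
        for (int i = _cards.Count - 1; i > 0; i--)
        {
            int j = _rng.Next(i + 1);
            (_cards[i], _cards[j]) = (_cards[j], _cards[i]);
        }
    }
}
```

Because each Croupier instance is confined to one thread, no locking is needed at all during the shuffle.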

Your idea is the way to go: give each "processing unit" (i.e. thread, task) its own game table (croupier, players, cards). Just like in a real casino, you can have as many game tables as you want, all playing at the same time, independently from each other, because they do not share any data. Whenever a game is finished, the results are transferred to the bank (of which you have only one). So the only thing that must be synchronized (with a critical section) is the aggregation of the results into the bank.
This is a perfect introductory example for parallel programming, because the real world can be modeled rather intuitively into the corresponding classes and algorithms.
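A sketch of this idea using the thread-local overload of `Parallel.For`: each worker gets its own table state via `localInit`, and only the final merge into the shared bank takes a lock. The `PlayOneGame` coin flip is a stand-in for a real game:

```csharp
using System;
using System.Threading.Tasks;

class BankSimulation
{
    // Hypothetical simplified "game": returns true when the bank wins.
    static bool PlayOneGame(Random rng) => rng.Next(2) == 0;

    public static int Run(int totalGames)
    {
        int bankWins = 0;               // the single shared "bank"
        object bankLock = new object(); // guards only the final aggregation

        Parallel.For(0, totalGames,
            // localInit: each worker thread gets its own table state.
            () => (rng: new Random(Guid.NewGuid().GetHashCode()), wins: 0),
            (i, state, local) =>
            {
                if (PlayOneGame(local.rng)) local.wins++;
                return local;
            },
            // localFinally: merge each worker's tally into the bank once.
            local => { lock (bankLock) bankWins += local.wins; });

        return bankWins;
    }
}
```

The lock is taken only once per worker rather than once per game, so the critical section costs almost nothing.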

Either give each thread its own set of collections, or use a concurrent collection (e.g. ConcurrentBag&lt;T&gt;) that implements its own locking.

so that instead of 10000 iterations, I do 2500 on 4 processes
You can use task and data parallelism together. For example:
int noOfProcess = 4;
Task[] t = new Task[noOfProcess];
for (int i = 0; i < noOfProcess; i++)
{
    t[i] = Task.Factory.StartNew(() =>
    {
        Parallel.For(0, 2500, v => game(..));
    });
}
Task.WaitAll(t); // if you need to wait for all partitions to finish
Avoid writing to shared memory locations. Check the MSDN documentation on potential pitfalls when using parallel programming.

The earlier answers are indeed right: the goal is to split up the work in order to run it as quickly as possible. In short: design your code to be concurrent, so it can run in parallel.
In each run, we have:
A croupier
A game being played
Only one game per Croupier at a time.
The results of the game
The two main decisions regarding concurrency we need to make are:
How do we share or distribute Croupiers amongst games?
How do results get shared?
Here are the different options for the Croupier:
If we have only one Croupier, we end up with no concurrency: only one game can be played at a time.
We could have one Croupier per game. That way we could theoretically run each game simultaneously.
We could have one Croupier per processing unit. This would allow as many games to run as possible, but it may not balance as well as one Croupier per game, as there might be factors beyond raw processing. Imagine if writing the results were a lengthy IO operation, but not CPU intensive: we could be running more games, but instead the Croupier is waiting for results to finish.
For the results, we could:
Output results to some stream as we received them. If this is a console, output could easily become garbled.
Output results to some consumer that processes the results in turn. This is the better option, as it allows the result state to be returned without affecting games.
So the overall decision making should always be around: how can I make my code as concurrency friendly as possible, to allow the hosting system to run things as best it can.
For the example below, I have opted for a Croupier per processing unit, not because it is better, but because it was not illustrated in the other answers.
Here is some sample code illustrating some of these ideas:
void Main()
{
    const int NUMBER_OF_GAMES = 10000;

    // This is how we get a Croupier per thread.
    var threadLocalCroupier = new ThreadLocal<Croupier>(() => new Croupier());

    var results = from gameNumber in Enumerable.Range(0, NUMBER_OF_GAMES).AsParallel()
                  let croupier = threadLocalCroupier.Value
                  select game(croupier, gameNumber);

    foreach (var result in results)
    {
        Console.WriteLine("Game done {0}", result.GameNumber);
        // display or analyse results.
    }
}
static ResultOfGame game(Croupier croupier, int gameNumber)
{
    croupier.createNewCardDeck(gameNumber);
    croupier.shuffleCards();
    croupier.giveCardsToPlayers();
    croupier.countPlayerPoints();
    var results = croupier.getResults();
    return results;
}

class ResultOfGame
{
    public int GameNumber { get; private set; }

    public ResultOfGame(int gameNumber)
    {
        this.GameNumber = gameNumber;
    }
}

// Define other methods and classes here
class Croupier
{
    private int currentGame;

    public void createNewCardDeck(int gameNumber) { this.currentGame = gameNumber; }
    public void shuffleCards() { }
    public void giveCardsToPlayers() { }
    public void countPlayerPoints() { }

    public ResultOfGame getResults()
    {
        return new ResultOfGame(this.currentGame);
    }
}

Related

Make segment of code in c# thread un-interruptible

I'm using a C# thread in which I want to take a snapshot of 28 dictionaries (stored in an array for easy access) that are being populated by event-handler methods. In the thread I loop over the array of dictionaries and store the snapshots, using the ToList() method, in another array of lists, so that I can do some work in the thread afterwards even though the collections keep changing in real time. The problem is that I want the snapshots of all the dictionaries to be as close in time as possible, so I don't want my thread to be interrupted while I'm looping over the array. The code looks something like this:
public void ThreadProc_Scoring()
{
    List<KeyValuePair<KeyValuePair<DateTime, double>, double>>[] SnapBOUGHT = new List<KeyValuePair<KeyValuePair<DateTime, double>, double>>[27];
    List<KeyValuePair<KeyValuePair<DateTime, double>, double>>[] SnapSOLD = new List<KeyValuePair<KeyValuePair<DateTime, double>, double>>[27];
    int index;
    int NormIdx;
    DateTime actual = new DateTime();
    double score = 0;
    double totAScore = 0;
    double totBScore = 0;
    double AskNormalization = 1;
    double BidNormalization = 1;

    while (true)
    {
        Thread.Sleep(15000);
        // Get a snapshot of the BOUGHT/SOLD collections
        // Enter un-interruptible section
        foreach (var symbol in MySymbols)
        {
            index = symbol.Value;
            SnapBOUGHT[index] = BOUGHT[index].ToList();
            SnapSOLD[index] = SOLD[index].ToList();
        }
        // Exit un-interruptible section
        // Do some work on the snapshots
    }
}
If you have any other suggestions on how to approach the situation, I'd be grateful for the help. Thank you!
You are looking for a critical section, i.e. a lock, used like:
lock (myLockObject)
{
    // do uninterruptible work
}
where myLockObject is an object shared by all threads that use the resources. Any thread that uses the dictionaries will need to take the lock on that same object. You might be able to use a ReaderWriterLockSlim to reduce contention, but that would require more details about your specific use case. If you are using regular dictionaries you will need some kind of locking anyway, since Dictionary is not thread-safe; this can be avoided by using ConcurrentDictionary.
If you do not depend on the dictionary snapshots being taken at the same time, you can simply use ConcurrentDictionary, run your loop without locking, and deal with the consequences.
The idea of "uninterruptible" does not really exist in C#; it is the OS that controls all scheduling of threads. Processors have a feature to disable interrupts that has roughly that effect, but it is used for system-level programming, not regular client software.
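A sketch of the ConcurrentDictionary route: `ToArray()` on a `ConcurrentDictionary` produces an internally consistent snapshot (plain enumeration does not), so each dictionary can be captured without explicit locks. The two-element array and the `Record` helper below are illustrative stand-ins for the question's `BOUGHT`/`SOLD` collections:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

static class SnapshotDemo
{
    // Writers (e.g. event handlers) can keep mutating these concurrently.
    static readonly ConcurrentDictionary<DateTime, double>[] Bought =
        Enumerable.Range(0, 2)
                  .Select(_ => new ConcurrentDictionary<DateTime, double>())
                  .ToArray();

    public static void Record(int index, DateTime time, double value) =>
        Bought[index][time] = value;

    public static KeyValuePair<DateTime, double>[][] TakeSnapshots()
    {
        var snaps = new KeyValuePair<DateTime, double>[Bought.Length][];
        for (int i = 0; i < Bought.Length; i++)
            // ToArray() on a ConcurrentDictionary takes an internally
            // consistent snapshot of that one dictionary.
            snaps[i] = Bought[i].ToArray();
        return snaps;
    }
}
```

Note this only makes each individual snapshot consistent; the snapshots of different dictionaries can still be taken a few microseconds apart, as the answer above explains.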
Assuming that you are targeting a multitasking operating system (like Windows, Linux, Android, macOS, iOS, Solaris etc), the closest you can get to an un-interruptible section is to run this section on a thread configured with the maximum priority:
var thread = new Thread(GetTheSnapshots);
thread.Priority = ThreadPriority.Highest;
thread.Start();
You should not expect it to be consistently and deterministically un-interruptible though, since the operating system is not obligated to schedule the thread on a time-slice of infinite duration.

Threading and Thread Safety - How to manage data in and out of a thread [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I’m in the process of trying to teach myself threading in C# and I’ve been reading a number of tutorials, questions and examples. I’ve successfully (it seems to work, at least) implemented threading in a much bigger application, but there are some areas which still feel very gray to me.
I’ve tried to put together a small console application as a point of discussion, and to try to answer the questions posed. I’m not an experienced programmer, so if I’ve committed some mortal sins here I sincerely apologise; feel free to point them out as well, in a bid to improve my programming skills. Hopefully the questions I raise here will help me and others trying to understand threading.
1. The first question is: if I called randomNums.GenrateRandomNumbers() inside ThreadStart(), would that be considered unsafe? I’m concluding it would be, as PrintRandomcNumbers() is being called from the other threads, and it would mean the object would be in a very much undetermined state.
2. If I wanted to call randomNums.GenrateRandomNumbers, what would be the thread-safe way to call it? How and where would I implement the locks? Would I use a single-write, multiple-read lock?
3. When I run this application, each thread correctly outputs the contents of randomNums. Is there a scenario (multiple processors or cores) where, given this implementation, the information wouldn’t be present to output, but the copy of the object reference would still be in scope, i.e. randomNums becomes null?
4. If there is no such hardware scenario, how would I manipulate this example to generate a scenario like this? I.e. ThreadManager has the RandomNumbers object reference but it ends up pointing at an uninitialized object, yet attempts to use the object. (I had a similar problem to this in my bigger application.)
5. What is the best design practice for getting data into and out of a thread?
6. When designing a thread to start, is it good practice to manage the start and stop of the thread inside the object, or outside it?
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
// Using System.Threading for threads
using System.Threading;

namespace ThreadingExamples
{
    class Program
    {
        static void Main(string[] args)
        {
            MyApplication app = new MyApplication();
            app.Start();
        }

        public class MyApplication
        {
            private RandomNumbers randomNums;

            public MyApplication()
            {
            }

            public void Start()
            {
                randomNums = new RandomNumbers();
                randomNums.GenrateRandomNumbers();
                randomNums.PrintRandomcNumbers();
                ThreadManager newThreadMan = new ThreadManager(randomNums);
                Console.ReadLine();
            }
        }

        public class ThreadManager
        {
            private RandomNumbers randomNums;
            private Thread[] newThreads;
            private int threadCount;

            public ThreadManager(RandomNumbers newRandomNums)
            {
                threadCount = 3;
                randomNums = newRandomNums;
                newThreads = new Thread[threadCount];
                for (int i = 0; i < threadCount; i++)
                {
                    newThreads[i] = new Thread(ThreadStart);
                    newThreads[i].Start();
                }
            }

            public void ThreadStart()
            {
                randomNums.PrintRandomcNumbers();
            }
        }

        public class RandomNumbers
        {
            private Random rnd = new Random();
            private int numberToStore;
            private int[] randomNumbers;

            public RandomNumbers()
            {
                numberToStore = 12;
                randomNumbers = new int[numberToStore];
            }

            public void GenrateRandomNumbers()
            {
                for (int i = 0; i < numberToStore; i++)
                {
                    randomNumbers[i] = rnd.Next(1, 13);
                }
            }

            public void PrintRandomcNumbers()
            {
                StringBuilder outputString;
                for (int i = 0; i < numberToStore; i++)
                {
                    outputString = new StringBuilder("The Random Numbers in position ");
                    outputString.Append(i.ToString());
                    outputString.Append(" is the number: ");
                    outputString.Append(randomNumbers[i].ToString());
                    Console.WriteLine(outputString);
                }
            }
        }
    }
}
As written, calling the random-number generation is actually safe enough, though bad practice, since the random numbers will change based on which thread is executing at the time. Something to keep in mind here: if numberToStore changed per thread, say one thread had 6 and another had 12, then both threads would share an array of 12 numbers, and the one with 6 would only change the first 6 elements, not resize the array itself.
The simple explanation for how to safely change data from an internal thread is: don't. I would find a different way to do this. If you find yourself needing to display different random numbers in each thread, for instance, generate a different list and pass it to the thread.
Since you aren't instantiating a new object, as written this program will work fine; the data might not be exactly what you would expect, but there is no situation where the data will not be there.
Actually, this would be simple to force to fail: simply instantiate a new array every time GenrateRandomNumbers is called. This can cause runtime issues when one thread is at the point in its cycle where it creates the new array, while another thread is looking for item 7 in that array, which may not have been filled in yet because the new array was just initialized.
public void GenrateRandomNumbers()
{
    randomNumbers = new int[numberToStore]; // re-created on every call
    for (int i = 0; i < numberToStore; i++)
    {
        randomNumbers[i] = rnd.Next(1, 13);
    }
}
One of the things I have found when using threading is to make all your objects immutable. This eliminates the need to be careful about reassigning values in a thread because, to put it simply, it cannot be done. It does force you to think about how to accomplish what you are trying to do without changing a value, but that's actually a good way to program, and it will make your threading experience much better.
A thread is essentially a fire-and-forget program: it goes and does its own thing regardless of where or when it was called, and it has the ability to interact with all running code. Since this is true, it doesn't really matter where you start the thread; what matters is how that thread interacts with whatever else you are running. I have always found the best practice is to only use a thread when you absolutely have to. To build a multi-threaded application you should plan for multithreading in the design stages: think very carefully about what each thread should do, and always be aware of how it will interact with the other running threads.
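The immutability advice above can be sketched briefly (the `NumberBatch` class is hypothetical): all state is set once in the constructor and exposed read-only, so a worker thread can read the data freely with nothing to lock:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Immutable data holder: the list is copied in and exposed read-only.
sealed class NumberBatch
{
    public IReadOnlyList<int> Numbers { get; }

    public NumberBatch(IEnumerable<int> numbers)
    {
        Numbers = new List<int>(numbers).AsReadOnly();
    }
}

static class ImmutableDemo
{
    public static int SumOnThread(NumberBatch batch)
    {
        int sum = 0;
        // The thread can only read the batch, so no lock is needed.
        var t = new Thread(() => { foreach (var n in batch.Numbers) sum += n; });
        t.Start();
        t.Join();
        return sum;
    }
}
```

Each thread can be handed its own `NumberBatch`, which matches the "generate a different list and pass it to the thread" suggestion earlier in this answer.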

Worker Role process - Configuration value polling

I have a Worker Role which processes items off a queue. It is basically an infinite loop which pops items off of the queue and asynchronously processes them.
I have two configuration settings (PollingInterval and MessageGetLimit) which I want the worker role to pick up when changed (so with no restart required).
private TimeSpan PollingInterval
{
    get
    {
        return TimeSpan.FromSeconds(Convert.ToInt32(RoleEnvironment.GetConfigurationSettingValue("PollingIntervalSeconds")));
    }
}

private int MessageGetLimit
{
    get
    {
        return Convert.ToInt32(RoleEnvironment.GetConfigurationSettingValue("MessageGetLimit"));
    }
}
public override void Run()
{
    while (true)
    {
        var messages = queue.GetMessages(MessageGetLimit);
        if (messages.Count() > 0)
        {
            ProcessQueueMessages(messages);
        }
        else
        {
            Task.Delay(PollingInterval).Wait(); // without Wait(), the delay task is discarded and the loop spins
        }
    }
}
Problem:
During peak hours, the while loop could be running a couple of times per second. This means that it would be querying the config items up to 100,000 times per day.
Is this detrimental or inefficient?
John's answer is a good one, using the RoleEnvironment Changing/Changed events to modify your settings without restarts, but I think perhaps a better method is to use an exponential back-off policy to make your polling more efficient. By having the code behave smarter on its own, you reduce how often you are in there tweaking it. Remember that each time you update these environment settings, the change has to be rolled out to all of the instances, which can take a little time depending on how many instances you have running. Also, you are putting in a step that requires a human to be involved.
You are using Windows Azure Storage Queues, which means each time your GetMessages call executes it makes a call to the service and retrieves 0 or more messages (up to your MessageGetLimit). Each time it asks, you get charged a transaction. Now, understand that transactions are really cheap; even 100,000 transactions a day is $0.01/day. However, don't underestimate the speed of a loop. :) You may get more throughput than that, and if you have multiple worker role instances this adds up (though it will still be a really small amount of money compared to actually running the instances themselves).
A more efficient path would be to put in an exponential back-off approach to reading your messages off the queue. Check out this post by Maarten for a simple example: http://www.developerfusion.com/article/120619/advanced-scenarios-with-windows-azure-queues/. Couple a back-off approach with auto-scaling of the worker roles based on queue depth, and you'll have a solution that relies less on a human adjusting settings. Put in minimum and maximum values for instance counts, adjust the number of messages to pull based on how often messages have been present on recent polls, etc. There are a lot of options here that will reduce your involvement and give you an efficient system.
Also, you might look at Windows Azure Service Bus Queues, in that they implement long polling, which results in far fewer transactions while waiting for work to hit the queue.
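The back-off idea can be sketched independently of the Azure APIs; `tryGetMessages` below is a hypothetical stand-in for the queue call, returning how many messages were handled. The delay doubles while the queue is empty, up to a cap, and resets as soon as work appears:

```csharp
using System;
using System.Threading;

static class BackoffPoller
{
    // tryGetMessages stands in for queue.GetMessages(...).Count().
    // Returns the delay in effect after maxPolls polls, for inspection.
    public static TimeSpan Poll(Func<int> tryGetMessages, int maxPolls,
                                TimeSpan initialDelay, TimeSpan maxDelay)
    {
        var delay = initialDelay;
        for (int i = 0; i < maxPolls; i++)
        {
            if (tryGetMessages() > 0)
            {
                delay = initialDelay;   // work found: poll eagerly again
            }
            else
            {
                Thread.Sleep(delay);    // idle: wait before the next poll...
                var doubled = TimeSpan.FromTicks(delay.Ticks * 2);
                delay = doubled < maxDelay ? doubled : maxDelay; // ...then back off
            }
        }
        return delay;
    }
}
```

With, say, a 1-second initial delay and a 30-second cap, an idle worker quickly settles at two polls a minute instead of several per second, without anyone editing configuration.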
Upfront disclaimer: I haven't used RoleEnvironment.
The MSDN documentation for GetConfigurationSettingValue states that the configuration is read from disk: http://msdn.microsoft.com/en-us/library/microsoft.windowsazure.serviceruntime.roleenvironment.getconfigurationsettingvalue.aspx. So it is sure to be slow when called often.
The MSDN documentation also shows that there is an event fired when a setting changes: http://msdn.microsoft.com/en-us/library/microsoft.windowsazure.serviceruntime.roleenvironment.changed.aspx. You can use this event to reload the settings only when they have actually changed.
Here is one (untested, not compiled) approach.
private TimeSpan mPollingInterval;
private int mMessageGetLimit;

public override void Run()
{
    // Refresh the configuration members only when they change.
    RoleEnvironment.Changed += RoleEnvironmentChanged;

    // Initialize them for the first time.
    RefreshRoleEnvironmentSettings();

    while (true)
    {
        var messages = queue.GetMessages(mMessageGetLimit);
        if (messages.Count() > 0)
        {
            ProcessQueueMessages(messages);
        }
        else
        {
            Task.Delay(mPollingInterval).Wait(); // Wait() so the delay actually pauses the loop
        }
    }
}

private void RoleEnvironmentChanged(object sender, RoleEnvironmentChangedEventArgs e)
{
    RefreshRoleEnvironmentSettings();
}

private void RefreshRoleEnvironmentSettings()
{
    mPollingInterval = TimeSpan.FromSeconds(Convert.ToInt32(RoleEnvironment.GetConfigurationSettingValue("PollingIntervalSeconds")));
    mMessageGetLimit = Convert.ToInt32(RoleEnvironment.GetConfigurationSettingValue("MessageGetLimit"));
}

Unexpected thread contention in Parallel.Foreach

I have tried to implement the following algorithm using Parallel.ForEach. I thought it would be trivial to parallelize, since it has no synchronization issues. It is basically a Monte Carlo tree search, where I explore every child in parallel. The Monte Carlo part is not really important; all you have to know is that I have a method which works on a tree, and which I call with Parallel.ForEach on the root's children. Here is the snippet where the parallel call is made.
public void ExpandParallel(int time, Func<TGame, TGame> gameFactory)
{
    int start = Environment.TickCount;

    // Create all of the root's children
    while (root.AvailablePlays.Count > 0)
        Expand(root, gameInstance);

    // Create the child games
    var games = root.Children.Select(c =>
    {
        var g = gameFactory(gameInstance);
        c.Play.Apply(g.Board);
        return g;
    }).ToArray();

    // Create a task to expand each child
    Parallel.ForEach(root.Children, (tree, state, i) =>
    {
        var game = games[i];
        // Make sure we don't waste time
        while (Environment.TickCount - start < time && !tree.Completed)
            Expand(tree, game);
    });

    // Update (reset) the root data
    root.Wins = root.Children.Sum(c => c.Wins);
    root.Plays = root.Children.Sum(c => c.Plays);
    root.TotalPayoff = root.Children.Sum(c => c.TotalPayoff);
}
The Func<TGame, TGame> delegate is a cloning factory, so that each child has its own clone of the game state. I can explain the internals of the Expand method if required, but I can assure you that it only accesses the state of the current sub-tree and game instances, and there are no static members in any of those types. I thought Environment.TickCount might be causing the contention, but I ran an experiment just calling Environment.TickCount inside a Parallel.ForEach loop and got nearly 100% processor usage.
I only get 45% to 50% usage on a Core i5.
This is a common symptom of GC thrashing. Without knowing more about what you're doing inside the Expand method, my best guess is that this is your root cause. It's also possible that some shared data access is the culprit, either by calling a remote system or by locking access to shared resources.
Before you do anything, you need to determine the exact cause with a profiler or another tool. Don't guess, as that will just waste your time, and don't wait for an answer here: without your complete program it cannot be answered. As you already know from experimentation, there is nothing in Parallel.ForEach itself that would cause this.
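One cheap first check for the GC-thrashing hypothesis, before reaching for a full profiler, is to count gen-0 collections around the parallel call with `GC.CollectionCount`:

```csharp
using System;

static class GcPressureCheck
{
    // Returns how many gen-0 collections occurred while `work` ran.
    public static int Gen0CollectionsDuring(Action work)
    {
        int before = GC.CollectionCount(0);
        work();
        // A large delta suggests the loop body allocates heavily and
        // the worker threads are spending their time in collections.
        return GC.CollectionCount(0) - before;
    }
}
```

For example, wrapping the `Parallel.ForEach` call in `Gen0CollectionsDuring` and seeing a count that is large relative to the number of iterations points at allocations inside `Expand` (temporary lists, closures) that could be hoisted or pooled.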

Array of structs, multithreading, can I write at the same time to different indexes?

I have a huge array which contains a struct Tile. The program I'm writing is a 2D game, and I don't want different players (handled by different threads) to write their position to the same tile at the same time. I wondered two things: can two threads safely write to two different places in the array at the same time, and is there some effective way to lock only one index of this array?
Yes, you can write to different positions simultaneously from different threads.
To do the locking, you should create an array of locks, and use some simple hashing technique to choose a lock, based on the position being written. For instance:
class TileArray
{
    private static readonly int numLocks = 16;
    private object[] locks =
        (from i in Enumerable.Range(0, numLocks) select new object()).ToArray();
    private Tile[] tiles = hugeTileArray();
    ...

    public Tile this[int i]
    {
        get { return tiles[i]; }
        set
        {
            lock (locks[i % numLocks])
                tiles[i] = value;
        }
    }
}
This avoids the need to create zillions of locks, but still keeps lock contention to a minimum. You can tune numLocks up or down based on profiling. Keep it a power of two, though, for an efficient modulo computation.
One final minutia: beware of aliasing effects. For instance, multiple-of-16 positions might happen to be very popular with your threads for some odd reason, in which case contention will go through the roof. If this is the case, you'll need a stronger hash; Int32.GetHashCode just returns the value itself, so it won't help here. You can steal a hash function from Bob Jenkins instead.
You can use the Interlocked.CompareExchange method to do the read and write safely without the explicit use of locks.
public class Example
{
    private Tile[] m_Array;

    public Tile this[int index]
    {
        get { return Interlocked.CompareExchange(ref m_Array[index], null, null); }
        set { Interlocked.CompareExchange(ref m_Array[index], value, m_Array[index]); }
    }
}
Of course, you will have to convert your Tile struct to a class to be able to do this.
There is no built-in support for what you want to do. Multiple threads can access the same array at the same time, but you'd need to take care of data consistency yourself, by means of synchronization or similar. Therefore, while it is possible to implement per-index locking (similar to what a database does in a transaction), I'm not sure it is the right approach for what you're trying to do. Why are you using an array to start with?
I believe you can use the lock statement to make the array-accessing code that is shared between the threads thread-safe.
