I'm using a C# thread in which I want to take a snapshot of 28 dictionaries (stored in an array for easy access) that are being populated by event handler methods. In the thread I loop over the array of dictionaries and store the snapshots, using the ToList() method, in another array of lists, so that I can then do some work on them even though the collections keep changing in real time. The problem is that I want the snapshots of all the dictionaries to be taken as close together in time as possible, so I don't want my thread to be interrupted while I'm looping over the array. The code looks something like this:
public void ThreadProc_Scoring()
{
    List<KeyValuePair<KeyValuePair<DateTime, double>, double>>[] SnapBOUGHT = new List<KeyValuePair<KeyValuePair<DateTime, double>, double>>[28];
    List<KeyValuePair<KeyValuePair<DateTime, double>, double>>[] SnapSOLD = new List<KeyValuePair<KeyValuePair<DateTime, double>, double>>[28];
    int index;
    int NormIdx;
    DateTime actual = new DateTime();
    double score = 0;
    double totAScore = 0;
    double totBScore = 0;
    double AskNormalization = 1;
    double BidNormalization = 1;
    while (true)
    {
        Thread.Sleep(15000);

        // Get a snapshot of the BOUGHT/SOLD collections
        // Enter un-interruptible section
        foreach (var symbol in MySymbols)
        {
            index = symbol.Value;
            SnapBOUGHT[index] = BOUGHT[index].ToList();
            SnapSOLD[index] = SOLD[index].ToList();
        }
        // Exit un-interruptible section

        // Do some work on the snapshots
    }
}
If you have any other suggestions on how to approach the situation, I'd be grateful for the help. Thank you!
You are looking for a critical section, i.e. a lock, used like:
lock (myLockObject)
{
    // do un-interruptible work
}
where myLockObject is an object that is shared by all threads that use the resources. Any thread that uses the dictionaries will need to take the lock on that same object. You might be able to use a ReaderWriterLockSlim to reduce contention, but that would require more details about your specific use case. If you are using regular dictionaries you will need some kind of locking anyway, since Dictionary<TKey, TValue> is not thread-safe; this can be avoided by using ConcurrentDictionary<TKey, TValue>.
If you do not depend on the dictionary snapshots being taken at the same time, you can simply use ConcurrentDictionary<TKey, TValue>, run your loop without locking, and deal with the consequences.
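To make that concrete, here is a minimal sketch of the locking approach. The collection names mirror the question, but `snapshotLock`, the dictionary shapes, and the two-element array are illustrative assumptions, not the asker's actual code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Shared lock object: every event handler that mutates a dictionary
// must take this same lock.
object snapshotLock = new object();

var BOUGHT = new Dictionary<DateTime, double>[2];
for (int i = 0; i < BOUGHT.Length; i++)
    BOUGHT[i] = new Dictionary<DateTime, double> { [new DateTime(2024, 1, 1 + i)] = i };

// Writer side (what an event handler would do):
lock (snapshotLock)
{
    BOUGHT[0][new DateTime(2024, 2, 1)] = 42.0;
}

// Reader side: holding the lock for the whole pass means all
// snapshots are taken at one consistent point in time.
var SnapBOUGHT = new List<KeyValuePair<DateTime, double>>[BOUGHT.Length];
lock (snapshotLock)
{
    for (int i = 0; i < BOUGHT.Length; i++)
        SnapBOUGHT[i] = BOUGHT[i].ToList();
}

Console.WriteLine(SnapBOUGHT[0].Count); // 2
```

Note that the writers pay for this too: every event handler blocks while the snapshot loop holds the lock, which is exactly the trade-off a ReaderWriterLockSlim could soften.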
The idea of an "un-interruptible" section does not really exist in C#; it is the OS that controls all scheduling of threads. Processors do have a feature to disable interrupts that has roughly that effect, but it is used for system-level programming, not for regular client software.
Assuming that you are targeting a multitasking operating system (like Windows, Linux, Android, macOS, iOS, Solaris etc), the closest you can get to an un-interruptible section is to run this section on a thread configured with the maximum priority:
var thread = new Thread(GetTheSnapshots);
thread.Priority = ThreadPriority.Highest;
thread.Start();
You should not expect it to be consistently and deterministically un-interruptible though, since the operating system is not obligated to schedule the thread on a time-slice of infinite duration.
I am working on a project where individual regions of a map are either generated dynamically, or loaded from a file if they have already been generated and saved. Regions are only loaded/generated as needed, and saved and discarded when they are no longer needed.
There are several different tasks that will be using one or more regions of this map for various purposes. For instance, one of these tasks will be to draw all currently visible regions (about 9 at any given time). Another is to get information about, or even modify, regions.
The problem is that these tasks may or may not be working with the same regions as other tasks.
Since these regions are rather large, and are costly to generate, it would be problematic (for these and other reasons) to use different copies for each task.
Rather, I think it would be a good idea to create and manage a pool of currently loaded regions. New tasks will first check the pool for their required region. They can then use it if it exists, or else create a new one and add it to the pool.
Provided that works, how would I manage this pool? How would I determine if a region is no longer needed by any tasks and can be safely discarded? Am I being silly and overcomplicating this?
I am using C#, if that matters to anyone.
Edit:
Now that I'm more awake: would it be as simple as incrementing a counter in each region for each place it's used, then discarding it when the counter reaches 0?
Provided that works, how would I manage this pool? How would I determine if a region is no longer needed by any tasks and can be safely discarded?
A simple way of doing this can be to use weak references:
public class RegionStore
{
    // I'm using int as the identifier for a region.
    // Obviously this must be some type that can serve as
    // an ID according to your application's logic.
    private Dictionary<int, WeakReference<Region>> _store = new Dictionary<int, WeakReference<Region>>();
    private const int TrimThreshold = 1000; // Profile to find a good value here.
    private int _addCount = 0;

    public bool TryGetRegion(int id, out Region region)
    {
        WeakReference<Region> wr;
        if(!_store.TryGetValue(id, out wr))
        {
            region = null;
            return false;
        }
        if(wr.TryGetTarget(out region))
            return true;
        // Clean up space in the dictionary.
        _store.Remove(id);
        return false;
    }

    public void AddRegion(int id, Region region)
    {
        if(++_addCount >= TrimThreshold)
            Trim();
        _store[id] = new WeakReference<Region>(region);
    }

    public void Remove(int id)
    {
        _store.Remove(id);
    }

    private void Trim()
    {
        // Remove dead keys.
        // Profile to test whether this is really necessary.
        // If you were fully implementing this, rather than delegating to Dictionary,
        // you'd likely see if this helped prior to an internal resize.
        _addCount = 0;
        var keys = _store.Keys.ToList();
        Region region;
        foreach(int key in keys)
            if(!_store[key].TryGetTarget(out region))
                _store.Remove(key);
    }
}
Now you have a store of your Region objects, but one that doesn't prevent them from being garbage collected once no other references to them exist.
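One caveat: a plain Dictionary is not safe for concurrent use, and the question describes several tasks hitting the store at once. The same weak-reference idea can sit on a ConcurrentDictionary instead; here is a minimal sketch, using `List<int>` purely as a stand-in for the real Region type:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Weak-reference pool on a ConcurrentDictionary, so several tasks can
// hit the store without external locking.
var store = new ConcurrentDictionary<int, WeakReference<List<int>>>();

var region = new List<int> { 1, 2, 3 };   // strongly referenced by this variable
store[42] = new WeakReference<List<int>>(region);

// TryGetRegion equivalent: resolve the weak reference, pruning dead entries.
bool found = store.TryGetValue(42, out var wr) && wr.TryGetTarget(out var target);
if (!found)
    store.TryRemove(42, out _);

Console.WriteLine(found); // True: the region is still strongly referenced

// Keep the strong reference alive past the lookup, so the GC cannot
// collect the region before the check above.
GC.KeepAlive(region);
```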
Certain tasks will be modifying regions. In this case I will likely raise an "update" flag in the region object, and from there update all other tasks using it.
Do note that this will be a definite potential source of bugs in the application as a whole. Mutability complicates any sort of caching. If you can move to an immutable model, it will likely simplify things, but then the use of outdated objects brings its own complications.
OK, I don't know how you have your app designed, but I suggest you have a look at this.
You can also use a static field to share your variable with other tasks, but then you may want to use lock blocks to prevent writing to or reading from that variable while other threads are using it. (here)
I want to limit the size of the BlockingCollection. If I want to add another item and the collection is full, the oldest must be removed. Is there some class specific to this task, or is my solution OK?
BlockingCollection<string> collection = new BlockingCollection<string>(10);
string newString = "";
//Not an elegant solution?
if (collection.Count == collection.BoundedCapacity)
{
    string dummy;
    collection.TryTake(out dummy);
}
collection.Add(newString);
EDIT1: Similar question here: ThreadSafe FIFO List with Automatic Size Limit Management
What you are describing is a drop-oldest circular buffer (closely related to an LRU cache). There is no implementation that I know of in the standard libraries, but it would not be hard to create. Look at this C++ implementation for some clues.
Edit
Try this one from CodeProject.
Your solution will function correctly, but it is not thread-safe: BlockingCollection<T> does not provide a mechanism to handle this directly.
Your solution may still block (if another thread calls Add() after your TryTake()) or remove an extra item (if another thread removes one while you're also removing).
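One way to make the drop-oldest behaviour atomic is to skip BlockingCollection<T> entirely and guard a plain Queue<T> with a single lock, so the check-then-remove-then-add sequence cannot interleave with other threads. A minimal sketch (the names are illustrative, not a standard API):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Drop-oldest bounded buffer: a plain Queue<T> guarded by one lock,
// so the check-then-dequeue-then-enqueue sequence is atomic.
var queue = new Queue<int>();
var sync = new object();
const int capacity = 10;

void Add(int item)
{
    lock (sync)
    {
        if (queue.Count == capacity)
            queue.Dequeue();          // evict the oldest item
        queue.Enqueue(item);
    }
}

// Hammer it from several threads; the count never exceeds the capacity.
Parallel.For(1, 1001, i => Add(i));

int count;
lock (sync) { count = queue.Count; }
Console.WriteLine(count); // 10
```

The price is that readers also have to take the lock; if consumers need to block waiting for items, BlockingCollection<T> plus an agreed-upon eviction protocol is still the simpler route.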
I have tried to implement the following algorithm using Parallel.ForEach. I thought it would be trivial to parallelize, since it has no synchronization issues. It is basically a Monte-Carlo tree search, where I explore every child in parallel. The Monte-Carlo part is not really important; all you have to know is that I have a method which works on a tree, and which I call with Parallel.ForEach on the root's children. Here is the snippet where the parallel call is made.
public void ExpandParallel(int time, Func<TGame, TGame> gameFactory)
{
    int start = Environment.TickCount;

    // Create all of the root's children
    while (root.AvailablePlays.Count > 0)
        Expand(root, gameInstance);

    // Create the children's games
    var games = root.Children.Select(c =>
    {
        var g = gameFactory(gameInstance);
        c.Play.Apply(g.Board);
        return g;
    }).ToArray();

    // Create a task to expand each child
    Parallel.ForEach(root.Children, (tree, state, i) =>
    {
        var game = games[i];
        // Make sure we don't waste time
        while (Environment.TickCount - start < time && !tree.Completed)
            Expand(tree, game);
    });

    // Update (reset) the root data
    root.Wins = root.Children.Sum(c => c.Wins);
    root.Plays = root.Children.Sum(c => c.Plays);
    root.TotalPayoff = root.Children.Sum(c => c.TotalPayoff);
}
The Func<TGame, TGame> delegate is a cloning factory, so that each child has its own clone of the game state. I can explain the internals of the Expand method if required, but I can assure you that it only accesses the state of the current sub-tree and game instance, and there are no static members in any of those types. I thought it might be Environment.TickCount causing the contention, but I ran an experiment just calling Environment.TickCount inside a Parallel.ForEach loop and got nearly 100% processor usage.
I only get 45% to 50% usage on a Core i5.
This is a common symptom of GC thrashing. Without knowing more about what you're doing inside the Expand method, my best guess is that this is your root cause. It's also possible that some shared data access is the culprit, either by calling a remote system or by locking access to shared resources.
Before you do anything, you need to determine the exact cause with a profiler or another tool. Don't guess, as this will just waste your time, and don't wait for an answer here, as without your complete program it cannot be answered. As you already know from experimentation, there is nothing in Parallel.ForEach itself that would cause this.
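As a cheap first check before reaching for a full profiler, you can compare GC.CollectionCount before and after the work: a very high Gen 0 count relative to the amount of work done points at allocation pressure in the hot loop. This is a rough sketch with a stand-in allocation loop, not the actual Expand code:

```csharp
using System;

// Quick check for GC thrashing: compare collection counts before and
// after the work.
int gen0Before = GC.CollectionCount(0);

long sum = 0;
// Stand-in for the Parallel.ForEach expansion work, allocating on
// every iteration the way a per-node state object might.
for (int i = 0; i < 100_000; i++)
{
    var temp = new int[128];
    temp[0] = i;
    sum += temp[0];
}

Console.WriteLine("Gen0 collections during loop: {0}", GC.CollectionCount(0) - gen0Before);
```

If the delta is large, that is a hint to pool or reuse the per-iteration objects; a real profiler is still needed to confirm where the allocations come from.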
At school we started multithreading last week and we are already on multiprocessing now. I am getting a little lost with it, so I am going to explain the problem to you.
For an exercise we must make a casino game simulator which simulates 10000 games, so that we can find out how often the casino wins.
So I coded the simulator, and I've got 5 methods to run a game:
static void game(Croupier croupier)
{
    croupier.createNewCardDeck();
    croupier.shuffleCards();
    croupier.giveCardsToPlayers();
    croupier.countPlayerPoints();
    croupier.displayResults();
}
If I call game in a classic for loop of 10000 iterations it runs OK: it takes approx. 2 seconds, and the bank wins 50% of the time.
If I use Parallel.For, it crashes in shuffleCards, because (I think) multiple threads are trying to edit the same pack of cards at the same time.
My first idea was to put a mutex on shuffleCards, but that would slow down the simulation, when the whole point of using parallel programming was to increase speed. So I thought of separating the data across the threads (so that instead of 10000 iterations, I do 2500 on each of 4 threads, every loop having its own croupier, players, cards, etc.).
What do you think would be the best way of solving this problem? Do you have a simple tutorial that explains how to deal with parallel work on shared data? Which solution would you choose?
Thanks.
Edit: the ShuffleCards method:
private List<Card> shuffleCards()
{
    List<Card> randomList = new List<Card>();
    Random r = new Random();
    int randomIndex = 0;
    while (_cards.Count > 0)
    {
        randomIndex = r.Next(0, _cards.Count); // choose a random card in the list
        randomList.Add(_cards[randomIndex]);   // add it to the new, random list
        _cards.RemoveAt(randomIndex);          // remove it to avoid duplicates
    }
    return randomList;
}
So yes, since _cards is a private field of the croupier (which calls this._cards = shuffleCards()), every thread shares the same card list.
Your idea is the way to go: give each processing unit (i.e. thread or task) its own game table (croupier, players, cards). Just like in a real casino, you can have as many game tables as you want, all playing at the same time, independently from each other, because they do not share any data. Whenever a game is finished, the result is transferred to the bank (of which you have only one). So the only thing that must be synchronized (with a critical section) is the aggregation of the results into the bank.
This is the perfect trivial example for parallel programming, because the real world can be modeled rather intuitively into the corresponding classes and algorithms.
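The per-table idea maps directly onto the thread-local overload of Parallel.For, where localInit and localFinally play the role of setting up a table and settling with the bank. In this sketch, PlayOneGame is a stand-in for the real croupier logic and the per-partition Random stands in for a table's own deck:

```csharp
using System;
using System.Threading.Tasks;

// Each partition gets its own "table" state, and only the final tally
// is merged into the shared bank under a lock.
int bankWins = 0;
object bankLock = new object();
const int totalGames = 10000;

// Stand-in game: the bank wins roughly half the time.
static bool PlayOneGame(Random rng) => rng.Next(2) == 0;

Parallel.For(0, totalGames,
    // localInit: per-partition state, created once per worker
    () => (rng: new Random(Guid.NewGuid().GetHashCode()), wins: 0),
    (game, loopState, local) =>
    {
        if (PlayOneGame(local.rng))
            local.wins++;
        return local;
    },
    // localFinally: merge each partition's tally into the bank once
    local => { lock (bankLock) bankWins += local.wins; });

Console.WriteLine("Bank won {0} of {1} games", bankWins, totalGames);
```

Because the lock is taken only once per worker rather than once per game, the synchronization cost is negligible compared to locking inside shuffleCards.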
Either give each thread its own set of collections, or use a concurrent collection which implements its own thread locking.
so that instead of 10000 iterations, I do 2500 on 4 processes
You can use task and data parallelism together, e.g.:
int noOfProcess = 4;
Task[] t = new Task[noOfProcess];
for (int i = 0; i < noOfProcess; i++)
{
    t[i] = Task.Factory.StartNew(() =>
    {
        Parallel.For(0, 2500, (v) => game(..));
    });
}
Task.WaitAll(t); // if synchronous completion is needed
Avoid writing to shared memory locations. Check this MSDN article for some pitfalls of parallel programming.
Earlier answers are indeed right: the goal is to split up the work in order to run it as quickly as possible. In short: design your code to be concurrent, so it can run in parallel.
In each run, we have:
A croupier
A game being played
Only one game per Croupier at one time.
The results of the game
The two main decisions regarding concurrency we need to make is
How do we share or distribute Croupiers amongst games?
How do results get shared?
Here are the different options for the Croupier:
If we have only one Croupier, we will end up with no concurrency. Only one game can be played at one time.
We could have one Croupier per game. That way we could theoretically run each game simultaneously.
We could have one Croupier per processing unit. This would allow as many games to run as possible, but it may not balance as well as one Croupier per game, as there might be other factors beyond raw processing. Imagine if writing the results was a lengthy I/O operation, but not CPU-intensive: we could be running more games, but instead the Croupier is waiting for results to finish.
For the results, we could:
Output results to some stream as we receive them. If this is a console, the output could easily become garbled.
Output results to some consumer that processes the results in turn. This is the better option, as it allows the result state to be returned without affecting games.
So the overall decision making should always be: how can I make my code as concurrency-friendly as possible, to allow the hosting system to run things as well as it can?
For the example below, I have opted for a Croupier per processing unit, not because it is better, but because it was not illustrated in the other answers.
Here is some sample code illustrating some of these ideas:
void Main()
{
    const int NUMBER_OF_GAMES = 10000;

    // This is how we have a Croupier per thread.
    var threadLocalCroupier = new ThreadLocal<Croupier>(() => new Croupier());

    var results = from gameNumber in Enumerable.Range(0, NUMBER_OF_GAMES).AsParallel()
                  let croupier = threadLocalCroupier.Value
                  select game(croupier, gameNumber);

    foreach (var result in results)
    {
        Console.WriteLine("Game done {0}", result.GameNumber);
        // Display or analyse results.
    }
}

static ResultOfGame game(Croupier croupier, int gameNumber)
{
    croupier.createNewCardDeck(gameNumber);
    croupier.shuffleCards();
    croupier.giveCardsToPlayers();
    croupier.countPlayerPoints();
    var results = croupier.getResults();
    return results;
}

class ResultOfGame
{
    public int GameNumber { get; private set; }

    public ResultOfGame(int gameNumber)
    {
        this.GameNumber = gameNumber;
    }
}

// Define other methods and classes here
class Croupier
{
    private int currentGame;
    public void createNewCardDeck(int gameNumber) { this.currentGame = gameNumber; }
    public void shuffleCards() {}
    public void giveCardsToPlayers() {}
    public void countPlayerPoints() {}
    public ResultOfGame getResults()
    {
        return new ResultOfGame(this.currentGame);
    }
}
foreach (int tranQuote in transactionIds)
{
    CompassIntegration compass = new CompassIntegration();
    Chatham.Business.Objects.Transaction tran = compass.GetTransaction(tranQuote);

    // Then we want to send each trade through the PandaIntegration
    // class with either buildSchedule, fillRates, calcPayments or
    // a subset of the three.
    PandaIntegrationOperationsWrapper wrapper = new PandaIntegrationOperationsWrapper { buildSchedule = false, calcPayments = true, fillRates = true };
    new PandaIntegration().RecalculateSchedule(tran, wrapper);

    // Then we call to save the transaction through the BO.
    compass.SaveTransaction(tran);
}
Two lines here are taking a very long time. There are about 18k records in transactionIds that I do this for.
GetTransaction and SaveTransaction are the two lines that take the most time, but I'd honestly just like to thread out what happens inside the loop to improve performance.
What's the best way to thread this out without running into any issues with the CPU or anything like that? I'm not really sure how many threads are safe, or how to manage threads, or things like that.
Thanks guys.
The TPL will provide the necessary throttling and management.
//foreach (int tranQuote in transactionIds) { ... }
Parallel.ForEach(transactionIds, tranQuote => { ... } );
It does require .NET 4 or later, and all the code inside the loop has to be thread-safe.
It's not clear if your GetTransaction and SaveTransaction are safe to be called concurrently.
What do GetTransaction and SaveTransaction actually do that makes them slow? If it's work on the CPU then threading could certainly help. If the slowness comes from making database queries or doing file I/O, threading isn't going to do anything to make your disk or database faster. It might actually slow things down because now you have multiple simultaneous requests going to a resource that is already constrained.
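If the bottleneck does turn out to be a constrained resource like the database, one hedge is to cap the concurrency explicitly with ParallelOptions.MaxDegreeOfParallelism, so 18k items cannot flood it. In this sketch, ProcessTransaction is a stand-in for the GetTransaction / RecalculateSchedule / SaveTransaction sequence:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Cap the number of concurrent workers; tune the cap against the
// real resource rather than guessing.
var transactionIds = Enumerable.Range(1, 100).ToArray();
var processed = new ConcurrentBag<int>();

var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(transactionIds, options, tranQuote =>
{
    ProcessTransaction(tranQuote);
    processed.Add(tranQuote);
});

Console.WriteLine(processed.Count); // 100

// Stand-in for the slow per-transaction work (simulated I/O wait).
static void ProcessTransaction(int id) => Thread.Sleep(1);
```

Starting with a low cap and measuring throughput as you raise it is a reasonable way to find the point where the database stops benefiting from extra concurrent requests.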
If ordering doesn't matter, you can use Parallel.ForEach to improve the overall performance.