Unexpected thread contention in Parallel.ForEach - C#

I have tried to implement the following algorithm using Parallel.ForEach. I thought it would be trivial to parallelize, since it has no synchronization issues. It is basically a Monte-Carlo tree search, where I explore every child in parallel. The Monte-Carlo details are not really important; all you have to know is that I have a method which works on a sub-tree, and which I call with Parallel.ForEach on the root's children. Here is the snippet where the parallel call is made.
public void ExpandParallel(int time, Func<TGame, TGame> gameFactory)
{
    int start = Environment.TickCount;

    // Creating all of root's children
    while (root.AvailablePlays.Count > 0)
        Expand(root, gameInstance);

    // Create the children games
    var games = root.Children.Select(c =>
    {
        var g = gameFactory(gameInstance);
        c.Play.Apply(g.Board);
        return g;
    }).ToArray();

    // Create a task to expand each child
    Parallel.ForEach(root.Children, (tree, state, i) =>
    {
        var game = games[i];
        // Make sure we don't waste time
        while (Environment.TickCount - start < time && !tree.Completed)
            Expand(tree, game);
    });

    // Update (reset) the root data
    root.Wins = root.Children.Sum(c => c.Wins);
    root.Plays = root.Children.Sum(c => c.Plays);
    root.TotalPayoff = root.Children.Sum(c => c.TotalPayoff);
}
The Func<TGame, TGame> delegate is a cloning factory, so that each child has its own clone of the game state. I can explain the internals of the Expand method if required, but I can assure you that it only accesses the state of the current sub-tree and game instance, and there are no static members in any of those types. I thought Environment.TickCount might be causing the contention, but I ran an experiment just calling Environment.TickCount inside a Parallel.ForEach loop and got nearly 100% processor usage.
I only get 45% to 50% usage on a Core i5.

This is a common symptom of GC thrashing. Without knowing more about what you're doing inside the Expand method, my best guess is that this is your root cause. It's also possible that some shared data access is the culprit, either by calling out to a remote system or by locking access to shared resources.
Before you do anything, you need to determine the exact cause with a profiler or other tool. Don't guess, as this will just waste your time, and don't wait for an answer here, as without your complete program it cannot be answered. As you already know from experimentation, there is nothing in the Parallel.ForEach itself that would cause this.
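As a cheap first check before reaching for a profiler, you can compare GC collection counts across the run; a disproportionately high Gen0 count points at heavy short-lived allocation, i.e. the GC-thrashing scenario above. A minimal sketch (the call site mirrors the question's ExpandParallel signature; the Clone-based factory is an assumption):
int gen0 = GC.CollectionCount(0);
int gen1 = GC.CollectionCount(1);
int gen2 = GC.CollectionCount(2);

tree.ExpandParallel(5000, game => game.Clone()); // hypothetical call site

// Many Gen0 collections relative to the work done = allocation pressure.
Console.WriteLine("Gen0: " + (GC.CollectionCount(0) - gen0) +
                  ", Gen1: " + (GC.CollectionCount(1) - gen1) +
                  ", Gen2: " + (GC.CollectionCount(2) - gen2));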

How to use Roslyn C# scripting in batch processing with several scripts?

I am writing a multi-threaded solution that will be used for transferring data from different sources to a central database. The solution, in general, has two parts:
A single-threaded Import engine
A multi-threaded client that invokes the Import engine in threads.
In order to minimize custom development I am using Roslyn scripting. This feature is enabled with the NuGet Package Manager in the Import engine project.
Every import is defined as a transformation of an input table – which has a collection of input fields – into a destination table – again with a collection of destination fields.
The scripting engine is used here to allow custom transformations between input and output. For every input/output pair there is a text field with a custom script. Here is simplified code used for script initialization:
//Instance of class passed to script engine
_ScriptHost = new ScriptHost_Import();
if (Script != "") //Here we have the script fetched from the DB as text
{
    try
    {
        //We are creating the script object …
        ScriptObject = CSharpScript.Create<string>(Script, globalsType: typeof(ScriptHost_Import));
        //… and we are compiling it upfront to save time, since this might be invoked multiple times.
        ScriptObject.Compile();
        IsScriptCompiled = true;
    }
    catch
    {
        IsScriptCompiled = false;
    }
}
Later we will invoke this script with:
async Task<string> RunScript()
{
    return (await ScriptObject.RunAsync(_ScriptHost)).ReturnValue.ToString();
}
So, after import definition initialization, where we might have any number of input/output pair descriptions along with their script objects, the memory footprint increases by approximately 50 MB per pair where scripting is defined.
A similar usage pattern is applied to the validation of destination rows before storing them in the DB (every field might have several scripts that check the validity of the data).
All in all, the typical memory footprint with modest transformation/validation scripting is 200 MB per thread. If we need to invoke several threads, memory usage will be very high, and 99% of it will be used for scripting.
If the Import engine is enclosed in a WCF-based middle layer (which I did), we quickly stumble upon an "Insufficient memory" problem.
The obvious solution would be to have one scripting instance that would somehow dispatch code execution to a specific function inside the script depending on the need (input/output transformation, validation or something else). That is, instead of script text for every field we would have a SCRIPT_ID passed as a global parameter to the script engine, and somewhere in the script we would switch to the specific portion of code that executes and returns the appropriate value.
The benefit of such a solution should be considerably better memory usage. The drawback is that script maintenance is removed from the specific point where it is used.
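For illustration, a minimal sketch of that dispatch idea (the SCRIPT_ID and InputValue globals and the case bodies are assumptions, not the actual import code):
// Hypothetical globals type: one extra field selects the transformation.
public class ScriptHost_Import
{
    public int SCRIPT_ID;
    public string InputValue;
}

// One script text, compiled once and shared by every input/output pair:
string code = @"
    switch (SCRIPT_ID)
    {
        case 1:  return InputValue.Trim().ToUpper();   // pair 1
        case 2:  return InputValue.Replace(',', '.');  // pair 2
        default: return InputValue;
    }";
var sharedScript = CSharpScript.Create<string>(code, globalsType: typeof(ScriptHost_Import));
sharedScript.Compile();

// Per-field invocation then just varies the ID:
// (await sharedScript.RunAsync(new ScriptHost_Import { SCRIPT_ID = 1, InputValue = raw })).ReturnValue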
Before implementing this change, I would like to hear opinions about this solution and suggestions for different approaches.
As it seems, using scripting for this mission might be wasteful overkill - you use many application layers and the memory fills up.
Other solutions:
How do you interface with the DB? You can manipulate the query itself according to your needs instead of writing a whole script for that.
How about using generics, with enough T's to fit your needs:
public class ImportEngine<T1, T2, T3, T4, T5>
Using tuples (which is pretty much like using generics).
But if you still think scripts are the right tool for the job, I found that the memory usage of scripts can be lowered by running the script's work inside your application (and not with RunAsync). You can do this by getting the logic back from RunAsync and re-using it, instead of doing the work inside the heavy and memory-wasteful RunAsync. Here is an example:
Instead of simply (the script string):
DoSomeWork();
You can do this (IHaveWork is an interface defined in your app, with a single method, Work):
public class ScriptWork : IHaveWork
{
    public void Work()
    {
        DoSomeWork();
    }
}
return new ScriptWork();
This way you call the heavy RunAsync only once, for a short period, and it returns a worker that you can re-use inside your application (and you can of course extend this by adding parameters to the Work method, inheriting logic from your application, and so on...).
This pattern also breaks the isolation between your app and the script, so you can easily pass data to and from the script.
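For completeness, the host side of this pattern might look roughly like the following - a sketch that assumes the script is created as Script<object> and that its last statement is the return new ScriptWork(); above:
var script = CSharpScript.Create<object>(scriptText,
    ScriptOptions.Default.WithReferences(typeof(IHaveWork).Assembly));
script.Compile();

// One heavy Roslyn call to obtain the worker...
var worker = (IHaveWork)(await script.RunAsync()).ReturnValue;

// ...then plain, cheap method calls from here on.
for (int i = 0; i < 1000; i++)
    worker.Work();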
EDIT
Some quick benchmark:
This code:
static void Main(string[] args)
{
    Console.WriteLine("Compiling");
    string code = "System.Threading.Thread.SpinWait(100000000); System.Console.WriteLine(\" Script end\");";
    List<Script<object>> scripts = Enumerable.Range(0, 50).Select(num =>
        CSharpScript.Create(code, ScriptOptions.Default.WithReferences(typeof(Control).Assembly))).ToList();
    GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced); // for fair play
    for (int i = 0; i < 10; i++)
        Task.WaitAll(scripts.Select(script => script.RunAsync()).ToArray());
}
This consumes about ~600MB in my environment (I just referenced the System.Windows.Forms assembly in the ScriptOptions to give the scripts some size).
It reuses the Script<object> - it does not consume more memory on the second call to RunAsync.
But we can do better:
static void Main(string[] args)
{
    Console.WriteLine("Compiling");
    string code = "return () => { System.Threading.Thread.SpinWait(100000000); System.Console.WriteLine(\" Script end\"); };";
    List<Action> scripts = Enumerable.Range(0, 50).Select(async num =>
        await CSharpScript.EvaluateAsync<Action>(code, ScriptOptions.Default.WithReferences(typeof(Control).Assembly))).Select(t => t.Result).ToList();
    GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
    for (int i = 0; i < 10; i++)
        Task.WaitAll(scripts.Select(script => Task.Run(script)).ToArray());
}
In this snippet, I'm simplifying the solution I proposed a bit by returning an Action object, but I think the performance impact is small (though in real implementations I really think you should use your own interface to keep it flexible).
When the scripts are running, you can see a steep rise in memory to ~240MB, but after I call the garbage collector (for demonstration purposes; I did the same in the previous code) the memory usage drops back to ~30MB. It is also faster.
I am not sure whether this existed when the question was asked, but there is something very similar and, let's say, official: a way to run scripts multiple times without increasing the program's memory. You need to use the CreateDelegate method, which does exactly what is expected.
I will post it here just for convenience:
var script = CSharpScript.Create<int>("X*Y", globalsType: typeof(Globals));
ScriptRunner<int> runner = script.CreateDelegate();
for (int i = 0; i < 10; i++)
{
    Console.WriteLine(await runner(new Globals { X = i, Y = i }));
}
It takes some memory initially, but keep the runner in a global list and later invocations will be quick.

Are private properties of a class called within a Parallel.ForEach body thread-safe?

I am tasked with writing a system to process result files created by a different process (which I have no control over) and am trying to modify my code to make use of Parallel.ForEach. The code works fine when just calling a foreach, but I have some concerns about thread safety when using the parallel version. The base question I need answered here is "Is the way I am doing this going to guarantee thread safety?", or is this going to cause everything to go sideways on me?
I have tried to make sure all calls are to instances, and I have removed every static anything except the initial static void Main. It is my current understanding that this will do a lot towards ensuring thread safety.
I have basically the following, edited for brevity
static void Main(string[] args)
{
    MyProcess process = new MyProcess();
    process.DoThings();
}
And then in the actual process to do stuff I have
public class MyProcess
{
    public void DoThings()
    {
        // Get some list of things
        List<Thing> things = getThings();
        Parallel.ForEach(things, item =>
        {
            // based on some criteria, take actions from MyActionClass
            MyActionClass myAct = new MyActionClass(item);
            string tempstring = myAct.DoOneThing();
            if (somecondition)
            {
                myAct.DoOtherThing();
            }
            // ...other similar calls to myAct below here
        });
    }
}
And over in the MyActionClass I have something like the following:
public class MyActionClass
{
    private Thing _thing;
    public MyActionClass(Thing item)
    {
        _thing = item;
    }
    public string DoOneThing()
    {
        return _thing.GetSubThings().FirstOrDefault();
    }
    public void DoOtherThing()
    {
        _thing.property1 = "Somenewvalue";
    }
}
If I can explain this any better I'll try, but I think those are the basics of my needs.
EDIT:
Something else I just noticed: if I change the value of a property of the item I'm working with while inside the Parallel.ForEach (in this case, a string value that gets written to a database inside the loop), will that have any effect on the rest of the loop iterations, or just the one I'm on? Would it be better to create a new instance of Thing inside the loop to store the item I'm working with in this case?
There is no shared mutable state between actions in the Parallel.ForEach that I can see, so it should be thread-safe, because at most one thread touches any given object at a time.
But as has been mentioned, there is nothing shared that can be seen here. That doesn't mean that in the actual code you use, everything is as good as it seems in this excerpt.
Nor does it mean that nothing will be changed by you or a coworker that makes some state both shared and mutable (in Thing, for example), at which point you start getting difficult-to-reproduce crashes at best, or just plain wrong behaviour that can go undetected for a long time at worst.
So, perhaps you should try to go fully immutable near the threading code?
Perhaps.
Immutability is good, but it is not a silver bullet. It is not always easy to use and implement, and not every task can be reasonably expressed through immutable objects. Even the accidental "make it shared and mutable" change may happen to an immutable design as well, though it is much less likely.
It should at least be considered as a possible option/alternative.
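As an illustration, a minimal sketch of what a fully immutable Thing could look like (the members are assumptions based on the snippets in the question):
public sealed class ImmutableThing
{
    public string Property1 { get; }
    public IReadOnlyList<string> SubThings { get; }

    public ImmutableThing(string property1, IReadOnlyList<string> subThings)
    {
        Property1 = property1;
        SubThings = subThings;
    }

    // "Mutation" produces a new instance, so existing references never
    // change under another thread's feet.
    public ImmutableThing WithProperty1(string newValue) =>
        new ImmutableThing(newValue, SubThings);
}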
About the EDIT:
"If I change the value of a property of the item I'm working with while inside the Parallel.ForEach (in this case, a string value that gets written to a database inside the loop), will that have any effect on the rest of the loop iterations or just the one I'm on?"
If you change a property and that object is not used anywhere else, and it doesn't rely on some global mutable state (for example, a public static Int32 ChangesCount that increments with each state change), then you should be safe.
"a string value that gets written to a database inside the loop" - depending on the data access technology used and how you use it, you may be in trouble, because most of them are not designed for multithreaded environments (EF's DbContext, for example). And obviously do not forget that dealing with concurrent access in a database is not always easy, though that is a bit away from our original theme.
"Would it be better to create a new instance of Thing inside the loop to store the item I'm working with" - if there is no risk of external concurrent changes, then it is just unnecessary work. And if there is a chance of other threads (not the Parallel.ForEach) making changes to those objects while they are being persisted, then you already have bigger problems than Parallel.ForEach.
Objects should always have an observably consistent state (unlike when half of the properties are set by one thread and half by another while you try to persist that who-knows-what), and if they are used by many threads, then they should already be thread-safe - there should be no way to put them into an inconsistent state.
And if they are to be persisted by external code, such objects should probably provide:
Either a SyncRoot property to synchronize the property-reading code.
Or some current-state snapshot DTO that is created internally by a thread-safe method like ThingSnapshot Thing.GetCurrentData() { lock() {} }.
Or something more exotic.
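A minimal sketch of the snapshot option (the ThingSnapshot type and the backing field are hypothetical):
public class Thing
{
    private readonly object _sync = new object();
    private string _property1;

    public void SetProperty1(string value)
    {
        lock (_sync) { _property1 = value; }
    }

    // Readers get a consistent copy instead of racing on live fields.
    public ThingSnapshot GetCurrentData()
    {
        lock (_sync)
        {
            return new ThingSnapshot(_property1);
        }
    }
}

public sealed class ThingSnapshot
{
    public string Property1 { get; }
    public ThingSnapshot(string property1) { Property1 = property1; }
}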

Is it safe for a property to update its elements every time it's called?

I have a property, IList<IMyPlayer> Players {}, which syncs with the game server every time it is accessed. I need to know whether it will update on every iteration when used as the limiting count in a for loop. The reason I ask is that I'm worried a player may leave the game during this loop.
Edit: this is a single-threaded application.
public static IList<IMyPlayer> Players
{
    get
    {
        playersField.Clear(); // GetPlayers() just adds without overwriting, so the list must be cleared every time.
        if (Debugging == false)
        {
            MyAPIGateway.Multiplayer.Players.GetPlayers(playersField); // every time the project needs to see all players, this will update. A little heavier on performance, but it's polymorphic.
        }
        return playersField.AsReadOnly();
    }
}
for (int i = 0; i < AttendanceManager.Players.Count; i++)
{
    if (AttendanceManager.Players[i].SteamUserId == MyAPIGateway.Multiplayer.MyId)
    {
        //do stuff
    }
}
I can see several potential problems with your approach:
You add items while you are looping over the list, but only loop until the original count is reached, so any added items are not accessed by the for loop.
Your getter is doing more than a "normal" getter, which could mean performance problems if clients are not aware of that.
Using foreach would only call the getter once, which would behave differently than your for loop.
If you want to do this, I would instead make it a GetPlayers() function, which makes it clearer that you are creating something as part of the method and not just getting the current value of a property. If a client wants to reload the list each time, they are still free to do so, but it would be more obvious looking at the code.
For example:
for (int i = 0; i < AttendanceManager.GetPlayers().Count; i++)
{
    if (AttendanceManager.GetPlayers()[i].SteamUserId == MyAPIGateway.Multiplayer.MyId)
looks much more dodgy than a standard property getter.
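If you do keep a loop like this, a safer call pattern under that refactor is to fetch once and iterate the local copy (this assumes the hypothetical GetPlayers() returns a fresh list rather than a read-only wrapper over the shared field):
IList<IMyPlayer> players = AttendanceManager.GetPlayers();
for (int i = 0; i < players.Count; i++)
{
    if (players[i].SteamUserId == MyAPIGateway.Multiplayer.MyId)
    {
        // do stuff against a stable snapshot, even if a player leaves mid-loop
    }
}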
I would most certainly not do this.
Every time you use your property, it is going to call an API. That is going to be terrible for performance. It is also quite easy for your property to be called more times than you expect, even if you don't think it will be; one example I can think of is serialization, or using the object as an argument to an MVC or Web API controller method.
This is what is commonly referred to as a side effect, which is something you want to avoid in a getter at all costs.

Deallocating memory for objects I haven't set to null

EDIT: Problem wasn't related to the question. It was indeed something wrong with my code, and actually, it was so simple that I don't want to put it on the internet. Thanks anyway.
I read in roughly 550k Active Directory records and store them in a List, the class being a simple wrapper for an AD user. I then split the list of ADRecords into four lists, each containing a quarter of the total. After I do this, I read in about 400k records from a database, known as EDR records, into a DataTable. I take the four quarters of my list and spawn four threads, passing each one of the four quarters. I have to match the AD records to the EDR records using email right now, but we plan to add more things to match on later.
I have a foreach on the list of AD records, and inside of that, I have to run a for loop on the EDR records to check each one, because if an AD record matches more than one EDR record, then that isn't a direct match and should not be treated as one.
My problem is that by the time I get to this foreach on the list, my ADRecords list only has about 130 records in it, but right after I pull them all in, I Console.WriteLine the count and it's 544k.
I am starting to think that even though I haven't set the list to null to be collected later, C# or Windows or something is actually taking my list away to make room for the EDR records because I haven't used the list in a while. The database I have to use to read EDR records is a linked server, so it takes about 10 minutes to read them all in, meaning my list sits idle for 10 minutes, but it's never set to null.
Any ideas?
//splitting the list and passing the values to threads
List<ADRecord> adRecords = GetAllADRecords();
// quarter lists that will be handed to the worker threads
var firstQuarter = new List<ADRecord>();
var secondQuarter = new List<ADRecord>();
var thirdQuarter = new List<ADRecord>();
var fourthQuarter = new List<ADRecord>();
for (int i = 0; i < adRecords.Count/4; i++)
{
    firstQuarter.Add(adRecords[i]);
}
for (int i = adRecords.Count/4; i < adRecords.Count/2; i++)
{
    secondQuarter.Add(adRecords[i]);
}
for (int i = adRecords.Count/2; i < (adRecords.Count/4)*3; i++)
{
    thirdQuarter.Add(adRecords[i]);
}
for (int i = (adRecords.Count/4)*3; i < adRecords.Count; i++)
{
    fourthQuarter.Add(adRecords[i]);
}
DataTable edrRecordsTable = GetAllEDRRecords();
DataRow[] edrRecords = edrRecordsTable.Select("Email_Address is not null and Email_Address <> ''", "Email_Address");
Dictionary<string, int> letterPlaces = FindLetterPlaces(edrRecords);
Thread one = new Thread(delegate() { ProcessMatches(firstQuarter, edrRecords, letterPlaces); });
Thread two = new Thread(delegate() { ProcessMatches(secondQuarter, edrRecords, letterPlaces); });
Thread three = new Thread(delegate() { ProcessMatches(thirdQuarter, edrRecords, letterPlaces); });
Thread four = new Thread(delegate() { ProcessMatches(fourthQuarter, edrRecords, letterPlaces); });
one.Start();
two.Start();
three.Start();
four.Start();
In ProcessMatches, there is a foreach on the List of ADRecords passed in. The first line in the foreach is AdRecordsProcessed++;, which increments a global static int, and the program finishes with it at 130 instead of 544k.
The variable is never set to null and is still in scope? If so, it shouldn't be collected, and idle time isn't your problem.
First issue I see is:
AdRecordsProcessed++;
Are you locking that global variable before updating it? If not, and depending on how fast the records are processed, it's going to be lower than you expect.
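A minimal fix for that counter (assuming AdRecordsProcessed is a static int field): use Interlocked so concurrent increments are not lost.
using System.Threading;

// inside ProcessMatches, instead of AdRecordsProcessed++;
Interlocked.Increment(ref AdRecordsProcessed);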
Try running it from a single thread (i.e. pass in adRecords instead of firstQuarter and don't start the other threads.) Does it work as expected with 1 thread?
Firstly, you don't set a list to null. What you might do is set every reference to a list to null (or to another list), or all such references might simply fall out of scope. This may seem like a nitpicking point, but if you are having to examine what is happening to your data, it's time to be nitpicky on such things.
Secondly, getting the GC to deallocate something that has a live reference is pretty hard to do. You can fake it with a WeakReference<>, or think you've found it when you hit a bug in a finaliser (because the reference isn't actually live, and even then it's a matter of the finaliser trying to deal with a finalised rather than deallocated object). Bugs can happen everywhere, but the odds that you've found a way to make the GC deallocate something that is live are extremely low.
The GC is likely to do two things with your list:
It is quite likely to compact the memory used by it, which will move its component items around.
It is quite likely to promote it to a higher generation.
Neither of these will cause any change you can detect unless you actually look for it (obviously you'll notice a change in generation if you keep calling GC.GetGeneration(), but aside from that you aren't really going to).
The memory used could also be paged out, but it will be paged back in when you go to use the objects. Again, no effect you will notice.
Finally, if the GC did deallocate something, you wouldn't have a reduced number of items, you'd have a crash, because if objects just got deallocated the system would still try to use the supposedly live references to them.
So, while the GC or the OS may do something to make room for your other object, it isn't something observable in code, and it does not stop the object from being available and in the same programmatic state.
Something else is the problem.
Is there a reason you have to get all the data at once? If you break the data up into chunks, it should be more manageable. All I know is that having to get into GC stuff is a little smelly; it's best to look at refactoring your code.
The garbage collector will not collect:
A global variable
Objects managed by static objects
A local variable
A variable referencable by any method on the call stack
So if you can reference it from your code, there is no possibility that the garbage collector collected it. No way, no how.
In order for the collector to collect it, all references to it must have gone away. And if you can see it, that's most definitely not the case.
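A tiny self-contained demonstration of that guarantee (nothing here beyond the BCL):
using System;
using System.Collections.Generic;

class LiveReferenceDemo
{
    static void Main()
    {
        var list = new List<int> { 1, 2, 3 };

        // Force the most aggressive collection we can ask for.
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
        GC.WaitForPendingFinalizers();

        // The list is still reachable from this stack frame, so it is untouched.
        Console.WriteLine(list.Count); // prints 3
    }
}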

Threaded code execution time rises slowly. How to determine the culprit?

I have some code in a thread. The code's main function is to call other methods, which write stuff to a SQL database like this:
private void doWriteToDb()
{
    while (true)
    {
        try
        {
            if (q.Count == 0) qFlag.WaitOne();
            PFDbItem dbItem = null;
            lock (qLock)
            {
                dbItem = q.Dequeue();
            }
            if (dbItem != null)
            {
                System.Diagnostics.Stopwatch sw = System.Diagnostics.Stopwatch.StartNew();
                //write it off
                PFResult result = dbItem.Result;
                double frequency = dbItem.Frequency;
                int i = dbItem.InChannel;
                int j = dbItem.OutChannel;
                long detail, param, reading, res;
                detail = PFCoreMethods.AddNewTestDetail(DbTables, _dicTestHeaders[result.Name.ToString().ToLower()]);
                param = PFCoreMethods.AddNewTestParameter(DbTables, detail, "Frequency");
                PFCoreMethods.AddNewTestParameterValue(DbTables, param, frequency.ToString());
                param = PFCoreMethods.AddNewTestParameter(DbTables, detail, "In channel");
                PFCoreMethods.AddNewTestParameterValue(DbTables, param, i.ToString());
                param = PFCoreMethods.AddNewTestParameter(DbTables, detail, "Out channel");
                PFCoreMethods.AddNewTestParameterValue(DbTables, param, j.ToString());
                param = PFCoreMethods.AddNewTestParameter(DbTables, detail, "Spec");
                PFCoreMethods.AddNewTestParameterValue(DbTables, param, result.Spec);
                dbItem.Dispose();
                dqcnt++;
                sw.Stop();
            }
        }
        catch (Exception ex)
        {
        }
    }
}
The AddNewTestParameter method uses a 3rd-party class which contains the SQL code. Currently I have no access to its internals.
DbTables is a collection object whose properties are table objects created by the 3rd-party program. There is only one DbTables object, which the program uses throughout.
The problem is that as time passes (a couple of hours), the AddNewTestParameter call takes longer and longer, going from about 10ms to about 1sec.
q is a queue of objects that contain the necessary information to write into the database. Items are added to this queue by the main thread; the writer thread simply takes them out, writes them, and disposes of them. q.Count is normally no more than 1, although over time, as the database writes become slower, it rises because dequeueing cannot keep up. At its worst, q.Count was over 30,000. I write over 150,000 entries to the database in total.
On the SQL end, I ran some traces on the server, and they show that internally SQL always takes about 10ms, even while the C# call itself takes 1sec.
So, currently, I have two suspicions:
My code is the problem. The thread is low-priority; perhaps this might affect performance. Also, after watching the memory usage for 20 minutes, I see that it rises at about 100K/min, while CPU usage stays constant at around 2-5%. How can I figure out where the memory leak happens? Can I pinpoint it to a specific part of the code?
The 3rd-party code is the problem. How could I go about proving this? What methods are there to watch and confirm that the problem lies in the 3rd-party code?
Anyway, if I had to make a suggestion, I would look at DbTables... if that's a collection, maybe you're forgetting to reset it, so every time you call it, it has one more element. After a while, a 3rd-party routine that's O(n^2), or something like that, starts to degrade because it's expecting a worst-case scenario of 20 tables and you're providing 1000.
Edit: OK, I would rule out the problem being in the queue, as dequeueing should be a really fast operation (you can measure it anyway). That still points to the DbTables collection growing bigger and bigger; have you checked its size after the first x iterations?
Edit 2: OK, another approach. Let's say AddNewTestParameter does exactly what it says it does: ADDs a new parameter that then gets added to an internal collection. If that's the case, there are two options: either you're supposed to clear that collection by calling a "ClearParameters" function after each iteration, and then it's your fault, or there is no such functionality, and then it's the 3rd-party code's fault. That would also explain your memory growth (although that can also be related to the growing queue).
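One way to collect evidence for either suspicion without access to the 3rd-party internals is to log the per-call latency and the managed heap size together; if both climb in step, something is accumulating. A minimal sketch that could sit right after sw.Stop() in the loop above (dqcnt and sw come from the question's code; the interval of 1000 is arbitrary):
// Every 1000 items, print how long the last write took and how big the
// managed heap is, so growth in the two can be correlated over time.
if (dqcnt % 1000 == 0)
{
    Console.WriteLine("items={0}, lastWriteMs={1}, heapBytes={2}",
        dqcnt, sw.ElapsedMilliseconds, GC.GetTotalMemory(false));
}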
