Parallel.For becomes single threaded near the end

Parallel.For becomes single threaded near the end - c#

There is a problem I'm currently experiencing with Parallel.For.
I'm trying to run a batch of ~1000 of pretty long running tasks( each running for like 120 seconds), and my problem seems to be, that on the last 10-20 iterations, the processing becomes single-threaded.
Pseudocode:
Parallel.For(0, 1000,
(i, loopState)=>
{
//Do lots of work
Thread.Sleep(120_000);
});
Do you guys know why this is happening, and how can this issue be fixed?

I think I figured out the problem, or at least found a solution that makes it go away.
Sadly the example code in my question was not 100% representative of my code, in my code I had used the overload with localInit and localFinally. I thought it didn't matter but It seems it does.
My guess is that in doing so, the internal implementation partitioned up my loop iterations and statically assigned each partition to a separate thread. Since each element took wildly different amount of time to process, it resulted one partition taking a lot longer than the others, hence the single-threading.
The fix was to use the overload that is referenced in my question, which doesn't seem to suffer from this problem.

Related

C# - Is the amount of objects in memory affecting performance of local processing?

I am very confused by what I am seeing in my program.
Let's say we have a list of two large objects (loaded from 2 external files).
Then I iterate over each object and for each one I call a method, which performs a bunch of processing.
Just to illustrate:
foreach (var object in objects)
{
object.DoSomething();
}
In first case, objects contains 2 items. It completes very fast, I track the progress of each object individually and the processing for each one is very fast.
Then I run the program again, this time adding some more input files, so instead of 2, I'd have let's say 6 objects.
So the code runs again, and the 2 objects from before are still there, along with some more, but for some odd reason, now each processing (each call to object.DoSomething()) takes much longer than before.
Let's say scenario 1 with 2 objects, objectA.Dosomething() takes 1
minute to complete.
Let's say scenario 2, with 6 objects, same objectA.Dosomething()
as in scenario 1 now takes 5 minutes to complete.
The more objects I have in my list, the longer each processing for each individual object takes.
How is that possible? How can the performance of an individual processing for a specific, independent object, be affected so much by objects in the memory? How can, in scenario 1 and 2 above, the exact same processing on the exact same data take a significantly different amount of time to complete?
Also, please note that processing is slower from the start, it does not start fast on first object and then slows down progressively, it's just consistently slowed down proportionally to the amount of objects to process. I have some multi-threading in there, and I can see the rate at which threads complete drops dramatically when I start adding more objects. The multi-threading happens inside of "DoSomething()" and it will not leave untill all threads have completed. However, I don't think this issue is related to multi-threading. Actually, I added multi-threading because of the slowness.
Also please note that initially I was merging all input files into one huge object and one single call to DoSomething(), and I broke it down thinking it would help performance.
Is this a "normal" behavior and if so, what are the ways around this? I can think of other ways to process the data, but I still don't get this behavior and there has to be something I can do to get the intended result here.
Edit 1:
Each object in the "objects" list above also contains a list (queue) of smaller objects, around 5000 of those each. I am starting to believe my issue might be that, and that I should use structs or something similar instead of having so many nested objects. Would that explain the type of behavior I am describing above?

As stated in the comments, my question was too abstract for any precise answer to be given. I mostly wanted some pointers and to know if somehow I might have hit some internal limit.
It turned out I was overlooking a separate mechanism I have for logging results internally and producing reports. I built that part of the system really quickly and it was ridiculously inefficient and growing way too fast. Limiting the size of the internal structures, limiting the amount of retrievals from big collections and breaking down the processing in smaller chunks did the trick.
Just to illustrate, something that was taking over 6 hours is now taking 1 minute. Shame on me. Cleaner solution would be to use a database, but at least it seems I will be getting away with this one for now.

Inefficient Parallel.For?

I'm using a parallel for loop in my code to run a long running process on a large number of entities (12,000).
The process parses a string, goes through a number of input files (I've read that given the number of IO based things the benefits of threading could be questionable, but it seems to have sped things up elsewhere) and outputs a matched result.
Initially, the process goes quite quickly - however it ends up slowing to a crawl. It's possible that it's just hit a number of particularly tricky input data, but this seems unlikely looking closer at things.
Within the loop, I added some debug code that prints "Started Processing: " and "Finished Processing: " when it begins/ends an iteration and then wrote a program that pairs a start and a finish, initially in order to find which ID was causing a crash.
However, looking at the number of unmatched ID's, it looks like the program is processing in excess of 400 different entities at once. This seems like, with the large number of IO, it could be the source of the issue.
So my question(s) is(are) this(these):
Am I interpreting the unmatched ID's properly, or is there some clever stuff going behind the scenes I'm missing, or even something obvious?
If you'd agree what I've spotted is correct, how can I limit the number it spins off and does at once?
I realise this is perhaps a somewhat unorthodox question and may be tricky to answer given there is no code, but any help is appreciated and if there's any more info you'd like, let me know in the comments.

Without seeing some code, I can guess at the answers to your questions:
Unmatched IDs indicate to me that the thread that is processing that data is being de-prioritized. This could be due to IO or the thread pool trying to optimize, however it seems like if you are strongly IO bound then that is most likely your issue.
I would take a look at Parallel.For, specifically using ParallelOptions.MaxDegreesOfParallelism to limit the maximum number of tasks to a reasonable number. I would suggest trial and error to determine the optimum number of degrees, starting around the number of processor cores you have.
Good luck!

Let me start by confirming that is indeed a very bad idea to read 2 files at the same time from a hard drive (at least until the majority of HDs out there are SSDs), let alone whichever number your whole thing is using.
The use of parallelism serves to optimize processing using an actually paralellizable resource, which is the CPU power. If you paralellized process reads from a hard drive then you're losing most of the benefit.
And even then, even the CPU power is not prone to infinite paralellization. A normal desktop CPU has the capacity to run up to 10 threads at the same time (depends of the model obviously, but that's the order of magnitude).
So two things
first, I am going to make the assumption that your entities use all your files, but your files are not too big to be loaded into memory. If it's the case, you should read your files into objects (i.e. into memory), then paralellize the processing of your entities using those objects. If not, you're basically relying on your hard drive's cache to not reread your files every time you need them, and your hard drive's cache is far smaller than your memory (1000-fold).
second, you shouldn't be running Parallel.For on 12.000 items. Parallel.For will actually (try to) create 12.000 threads, and that is actually worse than 10 threads, because of the big overhead that paralellizing will create, and the fact your CPU will not benefit from it at all since it cannot run more than 10 threads at a time.
You should probably use a more efficient method, which is the IEnumerable<T>.AsParallel() extension (comes with .net 4.0). This one will, at runtime, determine what is the optimal thread number to run, then divide your enumerable into as many batches. Basically, it does the job for you - but it creates a big overhead too, so it's only useful if the processing of one element is actually costly for the CPU.
From my experience, using anything parallel should always be evaluated against not using it in real-life, i.e. by actually profiling your application. Don't assume it's going to work better.

Garbage collection worst case performance on Mono

Hi i want to find out the longest time needed to call a method or to create an object.
I thought of something like calling GC.Collect() before creating the object or during the method or calling some destructors.
Has anyone some hints or ideas for finding out some (or the) worst case scenarios?
best regards

See this thread. Put the answer code in a loop and record lowest & highest time that come in. But that may not be actually that intersting. Here is better performance meassaring:
Run the code in you actual application to get a feel of a real life scenario
or
Run a test on say 100000 calls to whatever method you want to
test then you can take an average of the call times which should give you a better indication if your method is slow or not

Variation in execution time

I've been profiling a method using the stopwatch class, which is sub-millisecond accurate. The method runs thousands of times, on multiple threads.
I've discovered that most calls (90%+) take 0.1ms, which is acceptable. Occasionally, however, I find that the method takes several orders of magnitude longer, so that the average time for the call is actually more like 3-4ms.
What could be causing this?
The method itself is run from a delegate, and is essentially an event handler.
There are not many possible execution paths, and I've not yet discovered a path that would be conspicuously complicated.
I'm suspecting garbage collection, but I don't know how to detect whether it has occurred.
Finally, I am also considering whether the logging method itself is causing the problem. (The logger is basically a call to a static class + event listener that writes to the console.)

Just because Stopwatch has a high accuracy doesn't mean that other things can't get in the way - like the OS interrupting that thread to do something else. Garbage collection is another possibility. Writing to the console could easily cause delays like that.
Are you actually interested in individual call times, or is it overall performance which is important? It's generally more useful to run a method thousands of times and look at the total time - that's much more indicative of overall performance than individual calls which can be affected by any number of things on the computer.

As I commented, you really should at least describe what your method does, if you're not willing to post some code (which would be best).
That said, one way you can tell if garbage collection has occurred (from Windows):
Run perfmon (Start->Run->perfmon)
Right-click on the graph; select "Add Counters..."
Under "Performance object", select ".NET CLR Memory"
From there you can select # Gen 0, 1, and 2 collections and click "Add"
Now on the graph you will see a graph of all .NET CLR garbage collections
Just keep this graph open while you run your application
EDIT: If you want to know if a collection occurred during a specific execution, why not do this?
int initialGen0Collections = GC.CollectionCount(0);
int initialGen1Collections = GC.CollectionCount(1);
int initialGen2Collections = GC.CollectionCount(2);
// run your method
if (GC.CollectionCount(0) > initialGen0Collections)
// gen 0 collection occurred
if (GC.CollectionCount(1) > initialGen1Collections)
// gen 1 collection occurred
if (GC.CollectionCount(2) > initialGen2Collections)
// gen 2 collection occurred
SECOND EDIT: A couple of points on how to reduce garbage collections within your method:
You mentioned in a comment that your method adds the object passed in to "a big collection." Depending on the type you use for said big collection, it may be possible to reduce garbage collections. For instance, if you use a List<T>, then there are two possibilities:
a. If you know in advance how many objects you'll be processing, you should set the list's capacity upon construction:
List<T> bigCollection = new List<T>(numObjects);
b. If you don't know how many objects you'll be processing, consider using something like a LinkedList<T> instead of a List<T>. The reason for this is that a List<T> automatically resizes itself whenever a new item is added beyond its current capacity; this results in a leftover array that (eventually) will need to be garbage collected. A LinkedList<T> does not use an array internally (it uses LinkedListNode<T> objects), so it will not result in this garbage collection.
If you are creating objects within your method (i.e., somewhere in your method you have one or more lines like Thing myThing = new Thing();), consider using a resource pool to eliminate the need for constantly constructing objects and thereby allocating more heap memory. If you need to know more about resource pooling, check out the Wikipedia article on Object Pools and the MSDN documentation on the ConcurrentBag<T> class, which includes a sample implementation of an ObjectPool<T>.

That can depend on many things and you really have to figure out which one you are delaing with.
I'm not terribly familiar with what triggers garbage collection and what thread it runs on, but that sounds like a possibility.
My first thought around this is with paging. If this is the first time the method runs and the application needs to page in some code to run the method, it would be waiting on that. Or, it could be the data that you're using within the method that triggered a cache miss and now you have to wait for that.
Maybe you're doing an allocation and the allocator did some extra reshuffling in order to get you the allocation you requested.
Not sure how thread time is calculated with Stopwatch, but a context switch might be what you're seeing.
Or...it could be something completely different...
Basically, it could be one of several things and you really have to look at the code itself to see what is causing your occasional slow-down.

It could well be GC. If you use a profiler application such as Redgate's ANTS profiler you can profile % time in GC along side your application's performance to see what's going on.
In addition, you can use the CLRProfiler...
https://github.com/MicrosoftArchive/clrprofiler
Finally, Windows Performance Monitor will show the % time in GC for a given running applicaiton too.
These tools will help you get a holistic view of what's going on in your app as well as the OS in general.
I'm sure you know this stuff already but microbenchmarking such as this is sometimes useful for determining how fast one line of code might be compared to another than you might write, but you generally want to profile your application under typical load too.
Knowing that a given line of code is 10 times faster than another is useful, but if that line of code is easier to read and not part of a tight loop then the 10x performance hit might not be a problem.

What you need is a performance profile to tell you exactly what causes a slow down. Here is a quick list And of course here is the ANTS profiler.
Without knowing what your operation is doing, it sounds like it could be the garbage collection. However that might not be the only reason. If you are reading or writing to the disc it is possible your application might have to wait while something else is using the disk.
Timing issues may occur if you have a multi-threaded application and another thread could be taking some processor time that is only running 10 % of the time. This is why a profiler would help.

If you're only running the code "thousands" of times on a pretty quick function, the occasional longer time could easily be due to transient events on the system (maybe Windows decided it was time to cache something).
That being said, I would suggest the following:
Run the function many many more times, and take an average.
In the code that uses the function, determine if the function in question actually is a bottleneck. Use a profiler for this.

It can be dependent on your OS, environment, page reads, CPU ticks per second and so on.
The most realistic way is to run an execution path several thousand times and take the average.
However, if that logging class is only called occasionally and it logs to disk, that is quite likely to be a slow-down factor if it has to seek on the drive first.
A read of http://en.wikipedia.org/wiki/Profiling_%28computer_programming%29 may give you an insight into more techniques for determining slowdowns in your applications, while a list of profiling tools that you may find useful is at:
http://en.wikipedia.org/wiki/Visual_Studio_Team_System_Profiler
specifically http://en.wikipedia.org/wiki/Visual_Studio_Team_System_Profiler if you're doing c# stuff.
Hope that helps!

Windows Service Increasing CPU Consumption

At my job, I have a clutch of six Windows services that I am responsible for, written in C# 2003. Each of these services contain a timer that fires every minute or so, where the majority of their work happens.
My problem is that, as these services run, they start to consume more and more CPU time through each iteration of the loop, even if there is no meaningful work for them to do (ie, they're just idling, looking through the database for something to do). When they start up, each service uses an average of (about) 2-3% of 4 CPUs, which is fine. After 24 hours, each service will be consuming an entire processor for the duration of its loop's run.
Can anyone help? I'm at a loss as to what could be causing this. Our current solution is to restart the services once a day (they shut themselves down, then a script sees that they're offline and restarts them at about 3AM). But this is not a long term solution; my concern is that as the services get busier, restarting them once a day may not be sufficient... but as there's a significant startup penalty (they all use NHibernate for data access), as they get busier, exactly what we don't want to be doing is restarting them more frequently.
#akmad: True, it is very difficult.
Yes, a service run in isolation will show the same symptom over time.
No, it doesn't. We've looked at that. This can happen at 10AM or 6PM or in the middle of the night. There's no consistency.
We do; and they are. The services are doing exactly what they should be, and nothing else.
Unfortunately, that requires foreknowledge of exactly when the services are going to be maxing out CPUs, which happens on an unpredictable schedule, and never very quickly... which makes things doubly difficult, because my boss will run and restart them when they start having problems without thinking of debug issues.
No, they're using a fairly consistent amount of RAM (approx. 60-80MB each, out of 4GB on the machine).
Good suggestions, but rest assured, we have tried all of the usual troubleshooting. What I'm hoping is that this is a .NET issue that someone might know about, that we can work on solving. My boss' solution (which I emphatically don't want to implement) is to put a field in the database which holds multiple times for the services to restart during the day, so that he can make the problem go away and not think about it. I'm desperately seeking the cause of the real problem so that I can fix it, because that solution will become a disaster in about six months.
#Yaakov Ellis: They each have a different function. One reads records out of an Oracle database somewhere offsite; another one processes those records and transfers files belonging to those records over to our system; a third checks those files to make sure they're what we expect them to be; another is a maintenance service that constantly checks things like disk space (that we have enough) and polls other servers to make sure they're alive; one is running only to make sure all of these other ones are running and doing their jobs, monitors and reports errors, and restarts anything that's failed to keep the whole system going 24 hours a day.
So, if you're asking what I think you're asking, no, there isn't one common thing that all these services do (other than database access via NHibernate) that I can point to as a potential problem. Unfortunately, if that turns out to be the actual issue (which wouldn't surprise me greatly), the whole thing might be screwed -- and I'll end up rewriting all of them in simple SQL. I'm hoping it's a garbage collector problem or something easier to deal with than NHibernate.
#Joshdan: No secret. As I said, we've tried all the usual troubleshooting. Profiling was unhelpful: the profiler we use was unable to point to any code that was actually executing when the CPU usage was high. These services were torn apart about a month ago looking for this problem. Every section of code was analyzed to attempt to figure out if our code was the issue; I'm not here asking because I haven't done my homework. Were this a simple case of the services doing more work than anticipated, that's something that would have been caught.
The problem here is that, most of the time, the services are not doing anything at all, yet still manage to consume 25% or more of four CPU cores: they're finding no work to do, and exiting their loop and waiting for the next iteration. This should, quite literally, take almost no CPU time at all.
Here's a example of behaviour we're seeing, on a service with no work to do for two days (in an unchanging environment). This was captured last week:
Day 1, 8AM: Avg. CPU usage approx 3%
Day 1, 6PM: Avg. CPU usage approx 8%
Day 2, 7AM: Avg. CPU usage approx 20%
Day 2, 11AM: Avg. CPU usage approx 30%
Having looked at all of the possible mundane reasons for this, I've asked this question here because I figured (rightly, as it turns out) that I'd get more innovative answers (like Ubiguchi's), or pointers to things I hadn't thought of (like Ian's suggestion).
So does the CPU spike happen
immediately preceding the timer
callback, within the timer callback,
or immediately following the timer
callback?
You misunderstand. This is not a spike. If it were, there would be no problem; I can deal with spikes. But it's not... the CPU usage is going up generally. Even when the service is doing nothing, waiting for the next timer hit. When the service starts up, things are nice and calm, and the graph looks like what you'd expect... generally, 0% usage, with spikes to 10% as NHibernate hits the database or the service does some trivial amount of work. But this increases to an across-the-board 25% (more if I let it go too far) usage at all times while the process is running.
That made Ian's suggestion the logical silver bullet (NHibernate does a lot of stuff when you're not looking). Alas, I've implemented his solution, but it hasn't had an effect (I have no proof of this, but I actually think it's made things worse... average usage is seeming to go up much faster now). Note that stripping out the NHibernate "sections" (as you recommend) is not feasible, since that would strip out about 90% of the code in the service, which would let me rule out the timer as a problem (which I absolutely intend to try), but can't help me rule out NHibernate as the issue, because if NHibernate is causing this, then the dodgy fix that's implemented (see below) is just going to have to become The Way The System Works; we are so dependent on NHibernate for this project that the PM simply won't accept that it's causing an unresolvable structural problem.
I just noted a sense of desperation in
the question -- that your problems
would continue barring a small miracle
Don't mean for it to come off that way. At the moment, the services are being restarted daily (with an option to input any number of hours of the day for them to shutdown and restart), which patches the problem but cannot be a long-term solution once they go onto the production machine and start to become busy. The problems will not continue, whether I fix them or the PM maintains this constraint on them. Obviously, I would prefer to implement a real fix, but since the initial testing revealed no reason for this, and the services have already been extensively reviewed, the PM would rather just have them restart multiple times than spend any more time trying to fix them. That's entirely out of my control and makes the miracle you were talking about more important than it would otherwise be.
That is extremely intriguing (insofar
as you trust your profiler).
I don't. But then, these are Windows services written in .NET 1.1 running on a Windows 2000 machine, deployed by a dodgy Nant script, using an old version of NHibernate for database access. There's little on that machine I would actually say I trust.

You mentioned that you're using NHibernate - are you closing your NHibernate sessions at appropriate points (such as the end of each iteration?)
If not, then the size of the object map loaded into memory will be gradually increasing over time, and each session flush will take increasingly more CPU time.

Here's where I'd start:
Get Process Explorer and show %Time in JIT, %Time in GC, CPU Cycles Delta, CPU Time, CPU %, and Threads.
You'll also want kernel and user time, and a couple of representative stack traces but I think you have to hit Properties to get snapshots.
Compare before and after shots.
A couple of thoughts on possibilities:
excessive GC (% Time in GC going up. Also, Perfmon GC and CPU counters would correspond)
excessive threads and associated context switches (# of threads going up)
polling (stack traces are consistently caught in a single function)
excessive kernel time (kernel times are high - Task Manager shows large kernel time numbers when CPU is high)
exceptions (PE .NET tab Exceptions thrown is high and getting higher. There's also a Perfmon counter)
virus/rootkit (OK, this is a last ditch scenario - but it is possible to construct a rootkit that hides from TaskManager. I'd suspect that you could then allocate your inevitable CPU usage to another process if you were cunning enough. Besides, if you've ruled out all of the above, I'm out of ideas right now)

It's obviously pretty difficult to remotely debug you're unknown application... but here are some things I'd look at:
What happens when you only run one of the services at a time? Do you still see the slow-down? This may indicate that there is some contention between the services.
Does the problem always occur around the same time, regardless of how long the service has been running? This may indicate that something else (a backup, virus scan, etc) is causing the machine (or db) as a whole to slow down.
Do you have logging or some other mechanism to be sure that the service is only doing work as often as you think it should?
If you can see the performance degradation over a short time period, try running the service for a while and then attach a profiler to see exactly what is pegging the CPU.
You don't mention anything about memory usage. Do you have any of this information for the services? It's possible that your using up most of the RAM and causing the disk the trash, or some similar problem.
Best of luck!

I suggest to hack the problem into pieces.
First, find a way to reproduce the problem 100% of the times and quickly. Lower the timer so that the services fire up more frequently (for example, 10 times quicker than normal). If the problem arises 10 times quicker, then it's related to the number of iterations and not to real time or to real work done by the services). And you will be able to do the next steps quicker than once a day.
Second, comment out all the real work code, and let only the services, the timers and the synchronization mechanism. If the problem still shows up, than it will be in that part of the code.
If it doesn't, then start adding back the code you commented out, one piece at a time. Eventually, you should find out what part of the code is causing the problem.

'Fraid this answer is only going to suggest some directions for you to look in, but having seen similar problems in .NET Windows Services I have a couple of thoughts you might find helpful.
My first suggestion is your services might have some bugs in either the way they handle memory, or perhaps in the way they handle unmanaged memory. The last time I tracked down a similar issue it turned out a 3rd party OSS libray we were using stored handles to unmanaged objects in static memory. The longer the service ran the more handles the service picked up which caused the process' CPU performance to nose-dive very quickly. The way to try and resolve this sort of issue to ensure your services store nothing in memory inbetween the timer invocations, although if your 3rd party libraries use static memory you might have to do something clever like create an app domain for the timer invocation and ditch the app doamin (and its static memory) once processing is complete.
The other issue I've seen in similar circumstances was with the timer synchronization code being suspect, which in effect allowed more than one thread to be running the processing code at once. When we debugged the code we found the 1st thread was blocking the 2nd, and by the time the 2nd kicked off there was a 3rd being blocked. Over time the blocking was lasting longer and longer and the CPU usage was therefore heading to the top. The solution we used to fix the issue was to implement proper synchronization code so the timer only kicked off another thread if it wouldn't be blocked.
Hope this helps, but apologies up front if both my thoughts are red herrings.

Sounds like a threading issue with the timer. You might have one unit of work blocking another running on different worker threads, causing them to stack up every time the timer fires. Or you might have instances living and working longer than you expect.
I'd suggest refactoring out the timer. Replace it with a single thread that queues up work on the ThreadPool. You can Sleep() the thread to control how often it looks for new work. Make sure this is the only place where your code is multithreaded. All other objects should be instantiated as work is readied for processing and destroyed after that work is completed. STATE IS THE ENEMY in multithreaded code.
Another area where the design is lacking appears to be that you have multiple services that are polling resources to do something. I'd suggest unifying them under a single service. They might do seperate things, but they're working in unison; you're just using the filesystem, database, etc as a substitution for method calls. Also, 2003? I feel bad for you.

Good suggestions, but rest assured, we have tried all of the usual troubleshooting. What I'm hoping is that this is a .NET issue that someone might know about, that we can work on solving.
My feeling is that no matter how bizarre the underlying cause, the usual troubleshooting steps are your best bet for locating the issue.
Since this is a performance issue, good measurements are invaluable. The overall process CPU usage is far too broad a measurement. Where is your service spending its time? You could use a profiler to measure this, or just log various section start and stops. If you aren't able to do even that, then use Andrea Bertani's suggestion -- isolate sections by removing others.
Once you've located the general area, then you can make even finer-grained measurements, until you sort out the source of the CPU usage. If it's not obvious how to fix it at that point, you at least have ammunition for a much more specific question.
If you have in fact already done all this usual troubleshooting, please do let us in on the secret.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.