OVERVIEW
I am facing a performance slowdown while iterating MANY times through a calculator class.
Iterations take about 3 minutes each at the beginning, then take longer and longer as the iteration count grows (30+ minutes per process). I have to stop the program and restart the execution where I left off to get back to normal conditions (3 minutes per process).
WHAT I DO
I have a scientific application that tests a set of parameters over a process.
For example, I have N scenarios (i.e. parameter combinations) tested over an experimentation set, which consists of a calculator class that takes the parameters as input, processes them against T possible XP conditions, and stores the output in ORM objects that are fired to the DB after each iteration. In other words, each of the N parameter combinations is passed T times through the calculator.
Parameter combination : Params Set 1, Params Set 2, ...., Params Set N
Experimental Set : XP Set 1 , XP Set 2 , ...., XP Set T
So I have N x T combinations, N and T being around 256 each, which gives 65,000+ iterations.
HOW I DO IT
I have a GUI to set up the parameter sets and launch background workers (one per parameter combination). Each background worker loads the first of the T XP sets, executes the current parameter set against it, moves to the next XP set, and so on. A report is generated by the calculator after each single iteration (i.e. after each N/T pair), and an event is fired to populate .NET LINQ to SQL ORM objects (AgileFX) and store them in a SQL Server database.
THE PROBLEM
The process runs fine for the first 30 minutes and then slowly begins to drift, each iteration taking longer and longer (sounds like a memory overflow or similar...).
HINT
Oddly enough, an experimenter very pertinently noticed that the processing time grows linearly: each run takes 3 minutes longer than the previous one, which comes down to an arithmetic progression (T(n+1) = T(n) + 3 min).
I have a 12-core Intel machine with 24 GB of RAM.
A quick suggestion: could you solve your problem through memoization, avoiding re-calculating results that should already be known?
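For illustration, here is a minimal memoization sketch, assuming the calculator's result depends only on the (parameter set, XP set) pair; Compute, RunCalculator and Result are hypothetical names:

private readonly Dictionary<Tuple<int, int>, Result> _cache =
    new Dictionary<Tuple<int, int>, Result>();

public Result Compute(int paramSetId, int xpSetId)
{
    var key = Tuple.Create(paramSetId, xpSetId);
    Result result;
    if (!_cache.TryGetValue(key, out result))
    {
        result = RunCalculator(paramSetId, xpSetId); // the expensive call
        _cache[key] = result;                        // remember it for next time
    }
    return result;
}

Keep in mind the cache itself holds references, so it needs a bounded size or a periodic flush, or it becomes yet another source of memory growth.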
Also, remember that the garbage collector will not be able to collect an object if it can still find a reference to it in some way!
I think I have found one part of the problem, but it did not fix the issue completely :
Objects were sent to the ORM via delegates registered by a listener, so each calculation thread was still "alive" in memory even after it had ended.
As a colleague put it: "Even if you move away, if I still have your address in my registers, for me you still live in the neighborhood."
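A sketch of the kind of fix this implies (the event and method names are hypothetical): unsubscribe the handler once the worker finishes, so the listener no longer roots the finished thread's data.

calculator.ReportReady += OnReportReady;      // the subscription keeps the target alive
try
{
    calculator.Run();
}
finally
{
    calculator.ReportReady -= OnReportReady;  // release the reference when done
}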
BTW, performance wizard in VS2010 works a treat. Extremely insightful and useful for monitoring overall memory performance with precision and accuracy.
EDIT : PROBLEM SOLVED
The class responsible for firing the background workers was keeping track of some data in a tracker object that kept growing and growing and was never flushed. I noticed this by closely tracking memory usage per object in the VS 2010 Performance Wizard.
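For illustration, a hypothetical sketch of that kind of fix (tracker, FlushThreshold and PersistToDatabase are made-up names): flush the tracker periodically instead of letting it accumulate for the whole run.

tracker.Record(result);
if (tracker.Count >= FlushThreshold)
{
    PersistToDatabase(tracker.Items);  // write out what has accumulated so far
    tracker.Clear();                   // drop the references so the GC can reclaim them
}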
I advise keeping a clear view of object lifecycles and memory usage, although this can get tough when the application is big and complex.
Related
I am very confused by what I am seeing in my program.
Let's say we have a list of two large objects (loaded from 2 external files).
Then I iterate over each object and for each one I call a method, which performs a bunch of processing.
Just to illustrate:
foreach (var obj in objects)
{
    obj.DoSomething();
}
In the first case, objects contains 2 items. It completes very fast; I track the progress of each object individually and the processing for each one is very fast.
Then I run the program again, this time adding some more input files, so instead of 2 I'd have, let's say, 6 objects.
So the code runs again, and the 2 objects from before are still there, along with some more, but for some odd reason each processing step (each call to obj.DoSomething()) now takes much longer than before.
Let's say that in scenario 1, with 2 objects, objectA.DoSomething() takes 1 minute to complete.
In scenario 2, with 6 objects, the same objectA.DoSomething() call as in scenario 1 now takes 5 minutes to complete.
The more objects I have in my list, the longer each processing for each individual object takes.
How is that possible? How can the performance of an individual processing step for a specific, independent object be affected so much by other objects in memory? How can, in scenarios 1 and 2 above, the exact same processing on the exact same data take significantly different amounts of time to complete?
Also, please note that processing is slower from the start; it does not start fast on the first object and then slow down progressively, it is just consistently slowed down in proportion to the number of objects to process. I have some multi-threading in there, and I can see the rate at which threads complete drop dramatically when I start adding more objects. The multi-threading happens inside DoSomething(), which does not return until all threads have completed. However, I don't think this issue is related to multi-threading. Actually, I added multi-threading because of the slowness.
Also please note that initially I was merging all input files into one huge object and one single call to DoSomething(), and I broke it down thinking it would help performance.
Is this a "normal" behavior and if so, what are the ways around this? I can think of other ways to process the data, but I still don't get this behavior and there has to be something I can do to get the intended result here.
Edit 1:
Each object in the "objects" list above also contains a list (queue) of smaller objects, around 5,000 of them each. I am starting to believe my issue might lie there, and that I should use structs or something similar instead of having so many nested objects. Would that explain the kind of behavior I am describing above?
As stated in the comments, my question was too abstract for any precise answer to be given. I mostly wanted some pointers and to know if somehow I might have hit some internal limit.
It turned out I was overlooking a separate mechanism I have for logging results internally and producing reports. I built that part of the system really quickly and it was ridiculously inefficient and growing way too fast. Limiting the size of the internal structures, limiting the amount of retrievals from big collections and breaking down the processing in smaller chunks did the trick.
Just to illustrate, something that was taking over 6 hours now takes 1 minute. Shame on me. A cleaner solution would be to use a database, but at least it seems I will be getting away with this one for now.
I am working on a large Windows desktop application that stores a large amount of data in the form of a project file. We have our own custom ORM and serialization to efficiently load the object data from CSV format. This task is performed by multiple threads running in parallel, processing multiple files. Our large projects can contain a million objects, and likely more, with many relationships between them.
Recently I got tasked to improve the project open performance which deteriorated for very large projects. Upon profiling it turned out that most of the time spent can be attributed to garbage collection (GC).
My theory is that due to the large number of very fast allocations, the GC is starved: postponed for a very long time, and when it finally kicks in it takes a very long time to do its job. That idea was further supported by two seemingly contradictory facts:
Optimizing deserialization code to work faster only made things worse
Inserting Thread.Sleep calls at strategic places made the load go faster
An example of a slow load, with 7 generation 2 collections and a huge % of time in GC, is below.
An example of a fast load, with sleep periods in the code to allow the GC some time, is below. In this case we have 19 generation 2 collections and also more than double the number of generation 0 and generation 1 collections.
So, my question is: how do I prevent this GC starvation? Adding Thread.Sleep looks silly, and it is very difficult to guess the right number of milliseconds in the right place. My other idea would be to use GC.Collect, but that also poses the difficulty of how many calls to make and where to put them. Any other ideas?
Based on the comments, I'd guess that you are doing a ton of String.Substring() operations as part of CSV parsing. Each of these creates a new string instance, which I'd bet you then throw away after further parsing it into an integer or date or whatever you need. You almost certainly need to start thinking about using a different persistence mechanism (CSV has a lot of shortcomings that you are undoubtedly aware of), but in the meantime you are going to want to look into versions of parsers that do not allocate substrings. If you dig into the code for Int32.TryParse, you'll find that it does some character iteration to avoid allocating more strings. I'd bet that you could spend an hour writing a version that takes a start and end parameter, then you can pass them the whole line with offsets and avoid doing a substring call to get the individual field values. Doing that will save you millions of allocations.
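To illustrate the idea, here is a hedged sketch of such a parser, assuming well-formed integer fields and omitting error handling; ParseInt32Field is a made-up name:

// Parses an integer field directly out of the full line, avoiding the
// intermediate string that line.Substring(start, length) would allocate.
static int ParseInt32Field(string line, int start, int length)
{
    int end = start + length;
    bool negative = line[start] == '-';
    int i = negative ? start + 1 : start;
    int value = 0;
    for (; i < end; i++)
    {
        value = value * 10 + (line[i] - '0');  // assumes the characters are digits
    }
    return negative ? -value : value;
}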
So, it appears that this is a .NET bug rather than GC starvation. The workarounds and answers described in the question Garbage Collection and Parallel.ForEach Issue After VS2015 Upgrade apply perfectly. I got the best results by switching to GC server mode.
Note, however, that I am experiencing this issue in .NET 4.5.2. I will add a hotfix link if there is one.
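For reference, server GC is switched on in the application's app.config:

<configuration>
  <runtime>
    <gcServer enabled="true" />
  </runtime>
</configuration>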
I have an event driven app that I was tasked with maintaining.
About 100 events run every 30 seconds, on separate timers. Over time the events alias into a constant stream of about 1-3 events per second.
Memory usage does not appear dependent on the number of events firing in any given second.
Each event polls data from a Webservice, checks the data using a LINQ2SQL DataContext against the previously polled data (I do not dispose or null out the DataContext when done), and if the data is different, updates the database and pushes the new data as an XML message to receiver service via TCP.
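Roughly, each handler looks like the following sketch (all names are hypothetical; note that the DataContext is never disposed before the method returns):

void OnPollTimerElapsed(object sender, ElapsedEventArgs e)
{
    var context = new PollDataContext();             // not wrapped in a using block
    var previous = context.Readings
                          .OrderByDescending(r => r.Timestamp)
                          .FirstOrDefault();
    var current = PollWebService();
    if (previous == null || current != previous.Value)
    {
        context.Readings.InsertOnSubmit(new Reading { Value = current });
        context.SubmitChanges();
        PushXmlToReceiver(current);                  // TCP push to the receiver service
    }
}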
This app appears to have a memory leak which
only manifests after 30+ minutes of running (either debug or release)
won't manifest when profiling [I'm using .NET Memory Profiler 4.5]
Characteristics:
On startup the program uses ~30 MB. As time progresses, the memory usage shown in Task Manager begins pogoing, at first only slightly, between 50 and 150 MB, and eventually gets worse, oscillating between 200 MB and 1 GB+. When this happens, it happens a few times within a second or two, then settles down at ~150 MB for the next 10-20 seconds or so.
I've been trying to catch this behavior in action using memory profiling. So far I've been unsuccessful: I can't get the app to pogo or oscillate in memory usage anywhere near like it does when the profiler isn't watching.
However, I've been noticing a square-wave sort of pattern in the memory usage as garbage collector generations 1 and 2 run that looks very similar to what I see in Task Manager, except the memory usage oscillations in the square wave are 10 MB wide instead of 800 MB+ (200 MB to 1 GB+). Now, according to Google Images, garbage collection in a properly functioning app looks more like a sawtooth wave than a square one.
I frankly don't see any way that my app could be pogoing between 200MB and 1GB+ of memory usage within a second and NOT be spiking the CPU to 100%.
I have read about some problems that can manifest between garbage collection + event handling, but I have several paths I could go investigate and am trying to narrow down which one to spend time on. I'm still pretty slow at .NET and haven't developed the "intuition" I have for embedded devices running C that generally helps me filter what I should investigate first.
What it FEELS like is that perhaps some event handlers are losing and re-gaining references to [massive amounts of data] (I don't know how this could even happen?), seeing as memory usage appears to spike back up to 1 GB+ soon after the garbage collector runs and drops memory usage back to 200 MB.
Previous versions of this app did not have these problems. Two changes I have made since then include
utilizing LINQ2SQL instead of our own data manager (which had an ADORecordSetHelper object we utilized to execute hardcoded SQL statements)
changing the piece of software we use to send the TCP XML messages to a receiver.
Due to the simplicity of what we're doing in #2, it COULD be the source of the problem, but this memory usage behavior makes me think otherwise.
I guess my main questions at this point are
Should I be calling dispose on my LINQ2SQL DataContexts before I return from the method I create them in?
Should I null them out instead?
if an exception were occurring somewhere in a method after creation of a DataContext, could it cause the DataContext to be kept in memory indefinitely?
if I store a result from a LINQ query in a value-type variable (i.e. int, not var), is the query executed at that point, or deferred until the variable is used?
how possible is it for event-driven frameworks to hypothetically lose and regain references?
edit: the events have instance-based subscriptions like discussed here and are never unsubscribed for the life of the app.
edit2: finally managed to catch it in the profiler, appears to be a 200MB system.string that's being created somehow. Thanks everyone for ruling out GC behavior.
Most of the time, memory leaks are caused by unexpected references between objects (events and delegates are included here too).
What I think you could try is the following:
1. Run the application and reproduce the issue. When the private working set of memory hits a very high value, right-click the process in Task Manager and select "Create Dump File". This is a lot less intrusive than profiling the application live.
2. Download WinDBG and run it.
3. Open the memory dump by going to the File menu and selecting "Open Crash Dump" (I cannot remember exactly what the menu option is called... it should be easy to spot though).
4. Run the following commands:
.symfix
.loadby sos clr
!dumpheap -type [YourAssemblyNameSpacePrefix] -stat
The last command will give you all the instances in memory which are your types, not CLR types. Look at the types which have a very high number of instances and try to see if anything doesn't look right.
5. If you see a very high number of objects of the same type, run the following command, which will show you all the instances' addresses:
!dumpheap -type [TheFullObjectTypeName]
6. Select one single instance address, then run the following command to see the references to that instance:
!gcroot [InstanceAddress]
7. Repeat step 6 a few times for different instances so that you can confirm the leak is coming from the same place, or to help you identify what is causing those instances not to be collected (i.e. still being referenced by other objects).
If you don't see anything weird with your own types, change the !dumpheap command in step 4 to !dumpheap -stat. This way you are not filtering by type, and you will also see CLR types and types from third-party libraries.
This is a little bit complex, but hopefully it gives you a method for tracking down memory leaks.
I'm using a parallel for loop in my code to run a long running process on a large number of entities (12,000).
The process parses a string, goes through a number of input files (I've read that, given the amount of IO involved, the benefits of threading could be questionable, but it seems to have sped things up elsewhere) and outputs a matched result.
Initially, the process goes quite quickly; however, it ends up slowing to a crawl. It's possible that it has just hit a run of particularly tricky input data, but this seems unlikely on closer inspection.
Within the loop, I added some debug code that prints "Started Processing: " and "Finished Processing: " when it begins/ends an iteration and then wrote a program that pairs a start and a finish, initially in order to find which ID was causing a crash.
However, looking at the number of unmatched IDs, it looks like the program is processing in excess of 400 different entities at once. This makes it seem like, with the large amount of IO, this could be the source of the issue.
So my question(s) is(are) this(these):
Am I interpreting the unmatched IDs properly, or is there some clever stuff going on behind the scenes that I'm missing, or even something obvious?
If you agree that what I've spotted is correct, how can I limit the number of entities it spins up and processes at once?
I realise this is perhaps a somewhat unorthodox question and may be tricky to answer given there is no code, but any help is appreciated and if there's any more info you'd like, let me know in the comments.
Without seeing some code, I can guess at the answers to your questions:
Unmatched IDs indicate to me that the thread processing that data is being de-prioritized. This could be due to IO or to the thread pool trying to optimize; however, if you are strongly IO-bound then that is most likely your issue.
I would take a look at Parallel.For, specifically using ParallelOptions.MaxDegreeOfParallelism to limit the maximum number of tasks to a reasonable number. I would suggest trial and error to determine the optimum degree of parallelism, starting around the number of processor cores you have.
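A minimal sketch of that, where entities and ProcessEntity stand in for your own types:

var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };
Parallel.ForEach(entities, options, entity =>
{
    ProcessEntity(entity);   // the long-running parse/match work
});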
Good luck!
Let me start by confirming that it is indeed a very bad idea to read 2 files at the same time from a hard drive (at least until the majority of HDs out there are SSDs), let alone however many your whole thing is using.
The use of parallelism serves to optimize processing using a resource that is actually parallelizable, which is CPU power. If your parallelized process reads from a hard drive, then you're losing most of the benefit.
And even then, even CPU power is not open to infinite parallelization. A normal desktop CPU has the capacity to run around 10 threads at the same time (it depends on the model, obviously, but that's the order of magnitude).
So, two things:
First, I am going to assume that your entities use all your files, but that your files are not too big to be loaded into memory. If that's the case, you should read your files into objects (i.e. into memory), then parallelize the processing of your entities using those objects. If not, you're basically relying on your hard drive's cache not to re-read your files every time you need them, and your hard drive's cache is far smaller than your memory (roughly 1000-fold).
Second, you shouldn't be running Parallel.For on 12,000 items. Parallel.For will actually (try to) create 12,000 tasks, and that is actually worse than 10 threads, because of the big overhead that parallelizing creates, and because your CPU will not benefit from it at all since it cannot run more than about 10 threads at a time.
You should probably use a more efficient method: the IEnumerable<T>.AsParallel() extension (which comes with .NET 4.0). At runtime it will determine the optimal number of threads to run, then divide your enumerable into that many batches. Basically, it does the job for you, but it creates significant overhead too, so it's only useful if the processing of one element is actually costly for the CPU.
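For illustration, a PLINQ sketch along those lines (entities and ProcessEntity are hypothetical names):

var results = entities
    .AsParallel()
    .Select(entity => ProcessEntity(entity))   // CPU-bound work per element
    .ToList();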
From my experience, using anything parallel should always be evaluated against not using it in real life, i.e. by actually profiling your application. Don't assume it's going to work better.
I've been profiling a method using the stopwatch class, which is sub-millisecond accurate. The method runs thousands of times, on multiple threads.
I've discovered that most calls (90%+) take 0.1ms, which is acceptable. Occasionally, however, I find that the method takes several orders of magnitude longer, so that the average time for the call is actually more like 3-4ms.
What could be causing this?
The method itself is run from a delegate, and is essentially an event handler.
There are not many possible execution paths, and I've not yet discovered a path that would be conspicuously complicated.
I'm suspecting garbage collection, but I don't know how to detect whether it has occurred.
Finally, I am also considering whether the logging method itself is causing the problem. (The logger is basically a call to a static class + event listener that writes to the console.)
Just because Stopwatch has a high accuracy doesn't mean that other things can't get in the way - like the OS interrupting that thread to do something else. Garbage collection is another possibility. Writing to the console could easily cause delays like that.
Are you actually interested in individual call times, or is it overall performance which is important? It's generally more useful to run a method thousands of times and look at the total time - that's much more indicative of overall performance than individual calls which can be affected by any number of things on the computer.
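A sketch of that approach (the iteration count is arbitrary and MethodUnderTest is a placeholder):

const int iterations = 100000;
var sw = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
    MethodUnderTest();   // the method being measured
}
sw.Stop();
Console.WriteLine("Total: {0} ms, average: {1:F4} ms",
    sw.ElapsedMilliseconds,
    sw.Elapsed.TotalMilliseconds / iterations);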
As I commented, you really should at least describe what your method does, if you're not willing to post some code (which would be best).
That said, one way you can tell if garbage collection has occurred (from Windows):
Run perfmon (Start->Run->perfmon)
Right-click on the graph; select "Add Counters..."
Under "Performance object", select ".NET CLR Memory"
From there you can select # Gen 0, 1, and 2 collections and click "Add"
Now on the graph you will see a graph of all .NET CLR garbage collections
Just keep this graph open while you run your application
EDIT: If you want to know if a collection occurred during a specific execution, why not do this?
int initialGen0Collections = GC.CollectionCount(0);
int initialGen1Collections = GC.CollectionCount(1);
int initialGen2Collections = GC.CollectionCount(2);
// run your method
if (GC.CollectionCount(0) > initialGen0Collections)
// gen 0 collection occurred
if (GC.CollectionCount(1) > initialGen1Collections)
// gen 1 collection occurred
if (GC.CollectionCount(2) > initialGen2Collections)
// gen 2 collection occurred
SECOND EDIT: A couple of points on how to reduce garbage collections within your method:
You mentioned in a comment that your method adds the object passed in to "a big collection." Depending on the type you use for said big collection, it may be possible to reduce garbage collections. For instance, if you use a List<T>, then there are two possibilities:
a. If you know in advance how many objects you'll be processing, you should set the list's capacity upon construction:
List<T> bigCollection = new List<T>(numObjects);
b. If you don't know how many objects you'll be processing, consider using something like a LinkedList<T> instead of a List<T>. The reason for this is that a List<T> automatically resizes itself whenever a new item is added beyond its current capacity; this results in a leftover array that (eventually) will need to be garbage collected. A LinkedList<T> does not use an array internally (it uses LinkedListNode<T> objects), so it will not result in this garbage collection.
If you are creating objects within your method (i.e., somewhere in your method you have one or more lines like Thing myThing = new Thing();), consider using a resource pool to eliminate the need for constantly constructing objects and thereby allocating more heap memory. If you need to know more about resource pooling, check out the Wikipedia article on Object Pools and the MSDN documentation on the ConcurrentBag<T> class, which includes a sample implementation of an ObjectPool<T>.
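A minimal object-pool sketch along those lines, built on ConcurrentBag<T> (this assumes the pooled type has a parameterless constructor and can safely be reused between calls):

using System.Collections.Concurrent;

class ObjectPool<T> where T : new()
{
    private readonly ConcurrentBag<T> _items = new ConcurrentBag<T>();

    public T Get()
    {
        T item;
        return _items.TryTake(out item) ? item : new T();  // reuse one if available
    }

    public void Return(T item)
    {
        _items.Add(item);  // make the instance available for reuse
    }
}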
That can depend on many things and you really have to figure out which one you are dealing with.
I'm not terribly familiar with what triggers garbage collection and what thread it runs on, but that sounds like a possibility.
My first thought here is paging. If this is the first time the method runs and the application needs to page in some code to run it, it would be waiting on that. Or it could be that the data you're using within the method triggered a cache miss and now you have to wait on that.
Maybe you're doing an allocation and the allocator did some extra reshuffling in order to get you the allocation you requested.
Not sure how thread time is calculated with Stopwatch, but a context switch might be what you're seeing.
Or...it could be something completely different...
Basically, it could be one of several things and you really have to look at the code itself to see what is causing your occasional slow-down.
It could well be GC. If you use a profiler application such as Red Gate's ANTS Profiler, you can profile the % time in GC alongside your application's performance to see what's going on.
In addition, you can use the CLRProfiler...
https://github.com/MicrosoftArchive/clrprofiler
Finally, Windows Performance Monitor will show the % time in GC for a given running application too.
These tools will help you get a holistic view of what's going on in your app as well as the OS in general.
I'm sure you know this already, but microbenchmarking such as this is sometimes useful for determining how fast one line of code is compared to another you might write, though you generally want to profile your application under typical load too.
Knowing that a given line of code is 10 times faster than another is useful, but if that line of code is easier to read and not part of a tight loop, then the 10x performance hit might not be a problem.
What you need is a performance profiler to tell you exactly what causes the slow-down. Here is a quick list, and of course there is the ANTS profiler.
Without knowing what your operation is doing, it sounds like it could be garbage collection. However, that might not be the only reason. If you are reading from or writing to disk, it is possible your application has to wait while something else is using the disk.
Timing issues may also occur in a multi-threaded application, where another thread that only runs 10% of the time could still be taking processor time away from you. This is why a profiler would help.
If you're only running the code "thousands" of times on a pretty quick function, the occasional longer time could easily be due to transient events on the system (maybe Windows decided it was time to cache something).
That being said, I would suggest the following:
Run the function many many more times, and take an average.
In the code that uses the function, determine if the function in question actually is a bottleneck. Use a profiler for this.
It can be dependent on your OS, environment, page reads, CPU ticks per second and so on.
The most realistic way is to run an execution path several thousand times and take the average.
However, if that logging class is only called occasionally and it logs to disk, that is quite likely to be a slow-down factor if it has to seek on the drive first.
A read of http://en.wikipedia.org/wiki/Profiling_%28computer_programming%29 may give you insight into more techniques for determining slowdowns in your applications, while a profiling tool you may find useful, specifically if you're doing C# work, is the Visual Studio Team System Profiler: http://en.wikipedia.org/wiki/Visual_Studio_Team_System_Profiler
Hope that helps!