I've been profiling a method using the Stopwatch class, which is accurate to below a millisecond. The method runs thousands of times, on multiple threads.
I've discovered that most calls (90%+) take 0.1ms, which is acceptable. Occasionally, however, I find that the method takes several orders of magnitude longer, so that the average time for the call is actually more like 3-4ms.
What could be causing this?
The method itself is run from a delegate, and is essentially an event handler.
There are not many possible execution paths, and I've not yet discovered a path that would be conspicuously complicated.
I'm suspecting garbage collection, but I don't know how to detect whether it has occurred.
Finally, I am also considering whether the logging method itself is causing the problem. (The logger is basically a call to a static class + event listener that writes to the console.)
Just because Stopwatch has a high accuracy doesn't mean that other things can't get in the way - like the OS interrupting that thread to do something else. Garbage collection is another possibility. Writing to the console could easily cause delays like that.
Are you actually interested in individual call times, or is it overall performance which is important? It's generally more useful to run a method thousands of times and look at the total time - that's much more indicative of overall performance than individual calls which can be affected by any number of things on the computer.
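As a rough illustration of measuring total time rather than individual calls, here is a minimal sketch. The BatchTimer class and its AverageMilliseconds method are hypothetical names introduced for this example, and the single warm-up call is an assumption about what you want excluded from the measurement.

```csharp
using System;
using System.Diagnostics;

public static class BatchTimer
{
    // Runs the action `iterations` times and returns the mean time per call in milliseconds.
    public static double AverageMilliseconds(Action action, int iterations)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));

        // One warm-up call so JIT compilation is not part of the measurement.
        action();

        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();

        // Total elapsed time divided by the number of timed calls.
        return stopwatch.Elapsed.TotalMilliseconds / iterations;
    }
}
```

Calling something like BatchTimer.AverageMilliseconds(() => handler(args), 100000) gives one number that already has the occasional slow call averaged in, which is usually what matters for throughput.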
As I commented, you really should at least describe what your method does, if you're not willing to post some code (which would be best).
That said, one way you can tell if garbage collection has occurred (from Windows):
Run perfmon (Start->Run->perfmon)
Right-click on the graph; select "Add Counters..."
Under "Performance object", select ".NET CLR Memory"
From there you can select # Gen 0, 1, and 2 collections and click "Add"
Now on the graph you will see a graph of all .NET CLR garbage collections
Just keep this graph open while you run your application
EDIT: If you want to know if a collection occurred during a specific execution, why not do this?
int initialGen0Collections = GC.CollectionCount(0);
int initialGen1Collections = GC.CollectionCount(1);
int initialGen2Collections = GC.CollectionCount(2);

// run your method

if (GC.CollectionCount(0) > initialGen0Collections)
{
    // gen 0 collection occurred
}
if (GC.CollectionCount(1) > initialGen1Collections)
{
    // gen 1 collection occurred
}
if (GC.CollectionCount(2) > initialGen2Collections)
{
    // gen 2 collection occurred
}
SECOND EDIT: A couple of points on how to reduce garbage collections within your method:
You mentioned in a comment that your method adds the object passed in to "a big collection." Depending on the type you use for said big collection, it may be possible to reduce garbage collections. For instance, if you use a List<T>, then there are two possibilities:
a. If you know in advance how many objects you'll be processing, you should set the list's capacity upon construction:
List<T> bigCollection = new List<T>(numObjects);
b. If you don't know how many objects you'll be processing, consider using something like a LinkedList<T> instead of a List<T>. The reason for this is that a List<T> automatically resizes itself whenever a new item is added beyond its current capacity; this results in a leftover array that (eventually) will need to be garbage collected. A LinkedList<T> does not use an array internally (it uses LinkedListNode<T> objects), so it will not result in this garbage collection.
If you are creating objects within your method (i.e., somewhere in your method you have one or more lines like Thing myThing = new Thing();), consider using a resource pool to eliminate the need for constantly constructing objects and thereby allocating more heap memory. If you need to know more about resource pooling, check out the Wikipedia article on Object Pools and the MSDN documentation on the ConcurrentBag<T> class, which includes a sample implementation of an ObjectPool<T>.
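To make the pooling idea concrete, here is a minimal sketch built on ConcurrentBag<T>. It is written from scratch for illustration, not copied from the MSDN sample; the method names Rent and Return are choices made here, and the pool is unbounded, which may not suit every workload.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class ObjectPool<T>
{
    private readonly ConcurrentBag<T> _items = new ConcurrentBag<T>();
    private readonly Func<T> _factory;

    public ObjectPool(Func<T> factory)
    {
        if (factory == null) throw new ArgumentNullException(nameof(factory));
        _factory = factory;
    }

    // Hands out a pooled instance if one is available, otherwise creates a new one.
    public T Rent()
    {
        T item;
        return _items.TryTake(out item) ? item : _factory();
    }

    // Puts an instance back so that a later Rent call can reuse it.
    // The caller is responsible for resetting the object's state first.
    public void Return(T item)
    {
        _items.Add(item);
    }
}
```

Inside the handler you would call Rent at the start and Return when finished, so that steady-state calls allocate nothing for that object type.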
That can depend on many things and you really have to figure out which one you are dealing with.
I'm not terribly familiar with what triggers garbage collection and what thread it runs on, but that sounds like a possibility.
My first thought around this is with paging. If this is the first time the method runs and the application needs to page in some code to run the method, it would be waiting on that. Or, it could be the data that you're using within the method that triggered a cache miss and now you have to wait for that.
Maybe you're doing an allocation and the allocator did some extra reshuffling in order to get you the allocation you requested.
Not sure how thread time is calculated with Stopwatch, but a context switch might be what you're seeing.
Or...it could be something completely different...
Basically, it could be one of several things and you really have to look at the code itself to see what is causing your occasional slow-down.
It could well be GC. If you use a profiler application such as Redgate's ANTS profiler, you can profile % time in GC alongside your application's performance to see what's going on.
In addition, you can use the CLRProfiler...
https://github.com/MicrosoftArchive/clrprofiler
Finally, Windows Performance Monitor will show the % time in GC for a given running application too.
These tools will help you get a holistic view of what's going on in your app as well as the OS in general.
I'm sure you know this stuff already, but microbenchmarking such as this is sometimes useful for determining how fast one line of code might be compared to another that you might write. You generally want to profile your application under typical load too.
Knowing that a given line of code is 10 times faster than another is useful, but if the slower line is easier to read and not part of a tight loop then the 10x performance hit might not be a problem.
What you need is a performance profiler to tell you exactly what causes the slowdown. Here is a quick list. And of course, here is the ANTS profiler.
Without knowing what your operation is doing, it sounds like it could be garbage collection. However, that might not be the only reason. If you are reading from or writing to the disk, your application might have to wait while something else is using the disk.
Timing issues can also occur in a multi-threaded application if another thread that only runs 10% of the time is taking some processor time. This is why a profiler would help.
If you're only running the code "thousands" of times on a pretty quick function, the occasional longer time could easily be due to transient events on the system (maybe Windows decided it was time to cache something).
That being said, I would suggest the following:
Run the function many many more times, and take an average.
In the code that uses the function, determine if the function in question actually is a bottleneck. Use a profiler for this.
It can be dependent on your OS, environment, page reads, CPU ticks per second and so on.
The most realistic way is to run an execution path several thousand times and take the average.
However, if that logging class is only called occasionally and it logs to disk, that is quite likely to be a slow-down factor if it has to seek on the drive first.
A read of http://en.wikipedia.org/wiki/Profiling_%28computer_programming%29 may give you an insight into more techniques for determining slowdowns in your applications, while a list of profiling tools that you may find useful is at:
http://en.wikipedia.org/wiki/Visual_Studio_Team_System_Profiler
specifically http://en.wikipedia.org/wiki/Visual_Studio_Team_System_Profiler if you're doing C# stuff.
Hope that helps!
Related
I am very confused by what I am seeing in my program.
Let's say we have a list of two large objects (loaded from 2 external files).
Then I iterate over each object and for each one I call a method, which performs a bunch of processing.
Just to illustrate:
foreach (var obj in objects)
{
    obj.DoSomething();
}
In the first case, objects contains 2 items. It completes very fast; I track the progress of each object individually, and the processing for each one is very fast.
Then I run the program again, this time adding some more input files, so instead of 2, I'd have let's say 6 objects.
So the code runs again, and the 2 objects from before are still there, along with some more, but for some odd reason, now each processing (each call to object.DoSomething()) takes much longer than before.
Let's say in scenario 1, with 2 objects, objectA.DoSomething() takes 1 minute to complete.
Let's say in scenario 2, with 6 objects, the same objectA.DoSomething() as in scenario 1 now takes 5 minutes to complete.
The more objects I have in my list, the longer each processing for each individual object takes.
How is that possible? How can the performance of an individual processing for a specific, independent object, be affected so much by objects in the memory? How can, in scenario 1 and 2 above, the exact same processing on the exact same data take a significantly different amount of time to complete?
Also, please note that processing is slower from the start. It does not start fast on the first object and then slow down progressively; it's just consistently slowed down in proportion to the number of objects to process. I have some multi-threading in there, and I can see the rate at which threads complete drop dramatically when I start adding more objects. The multi-threading happens inside of "DoSomething()", which will not return until all threads have completed. However, I don't think this issue is related to multi-threading. Actually, I added multi-threading because of the slowness.
Also please note that initially I was merging all input files into one huge object and one single call to DoSomething(), and I broke it down thinking it would help performance.
Is this a "normal" behavior and if so, what are the ways around this? I can think of other ways to process the data, but I still don't get this behavior and there has to be something I can do to get the intended result here.
Edit 1:
Each object in the "objects" list above also contains a list (queue) of smaller objects, around 5000 of those each. I am starting to believe my issue might be that, and that I should use structs or something similar instead of having so many nested objects. Would that explain the type of behavior I am describing above?
As stated in the comments, my question was too abstract for any precise answer to be given. I mostly wanted some pointers and to know if somehow I might have hit some internal limit.
It turned out I was overlooking a separate mechanism I have for logging results internally and producing reports. I built that part of the system really quickly and it was ridiculously inefficient and growing way too fast. Limiting the size of the internal structures, limiting the amount of retrievals from big collections and breaking down the processing in smaller chunks did the trick.
Just to illustrate, something that was taking over 6 hours is now taking 1 minute. Shame on me. Cleaner solution would be to use a database, but at least it seems I will be getting away with this one for now.
I am working on a large Windows desktop application that stores a large amount of data in the form of a project file. We have our own custom ORM and serialization to efficiently load the object data from CSV format. This task is performed by multiple threads running in parallel, processing multiple files. A large project can contain a million or more objects with many relationships between them.
Recently I got tasked to improve the project open performance which deteriorated for very large projects. Upon profiling it turned out that most of the time spent can be attributed to garbage collection (GC).
My theory is that, due to the large number of very fast allocations, the GC is starved and postponed for a very long time, and when it finally kicks in it takes a very long time to do the job. That idea was further supported by two counterintuitive facts:
Optimizing deserialization code to work faster only made things worse
Inserting Thread.Sleep calls at strategic places made load go faster
Example of slow load with 7 generation 2 collections and huge % of time in GC is below.
Example of fast load with sleep periods in the code to allow GC some time is below. In this case we have 19 generation 2 collections and also more than double the number of generation 0 and generation 1 collections.
So, my question is how to prevent this GC starvation? Adding Thread.Sleep looks silly and it is very difficult to guess the right amount of milliseconds in the right place. My other idea would be to use GC.Collect, but that also poses the difficulty of how many and where to put them. Any other ideas?
Based on the comments, I'd guess that you are doing a ton of String.Substring() operations as part of CSV parsing. Each of these creates a new string instance, which I'd bet you then throw away after further parsing it into an integer or date or whatever you need. You almost certainly need to start thinking about using a different persistence mechanism (CSV has a lot of shortcomings that you are undoubtedly aware of), but in the meantime you are going to want to look into versions of parsers that do not allocate substrings. If you dig into the code for Int32.TryParse, you'll find that it does some character iteration to avoid allocating more strings. I'd bet that you could spend an hour writing a version that takes a start and end parameter, then you can pass them the whole line with offsets and avoid doing a substring call to get the individual field values. Doing that will save you millions of allocations.
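A minimal sketch of such a parser is shown below. It is not the framework's Int32.TryParse implementation; FieldParser and its TryParseInt32 method are hypothetical names, and it assumes plain base-10 fields with an optional sign and no whitespace or culture-specific formatting.

```csharp
using System;

public static class FieldParser
{
    // Parses a base-10 integer from line[start..end) without allocating a substring.
    public static bool TryParseInt32(string line, int start, int end, out int value)
    {
        value = 0;
        if (line == null || start < 0 || end > line.Length || start >= end) return false;

        int i = start;
        bool negative = line[i] == '-';
        if (negative || line[i] == '+') i++;
        if (i == end) return false; // a sign with no digits

        long acc = 0;
        for (; i < end; i++)
        {
            int digit = line[i] - '0';
            if (digit < 0 || digit > 9) return false;
            acc = acc * 10 + digit;
            // Stop early once the magnitude can no longer fit in an Int32.
            if (acc > (long)int.MaxValue + 1) return false;
        }

        if (negative) acc = -acc;
        if (acc > int.MaxValue) return false;

        value = (int)acc;
        return true;
    }
}
```

The caller scans the line once for delimiters and passes the offsets of each field, so no intermediate string is created per field.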
So, it appears that this is a .NET bug rather than GC starvation. The workarounds and answers described in the question "Garbage Collection and Parallel.ForEach Issue After VS2015 Upgrade" apply perfectly. I got the best results by switching to GC server mode.
Note however, that I am experiencing this issue in .NET 4.5.2. Will add hotfix link if there is one.
I have a program that processes high volumes of data, and can cache much of it for reuse with subsequent records in memory. The more I cache, the faster it works. But if I cache too much, boom, start over, and that takes a lot longer!
I haven't been too successful trying to do anything after the exception occurs - I can't get enough memory to do anything.
Also I've tried allocating a huge object, then de-allocating it right away, with inconsistent results. Maybe I'm doing something wrong?
Anyway, what I'm stuck with is just setting a hardcoded limit on the # of cached objects that, from experience, seems to be low enough. Any better Ideas? thanks.
edit after answer
The following code seems to be doing exactly what I want:
Do
    Try
        ''// mysize = size in MB of the several objects I'm about to add to the cache
        Using memFailPoint As New MemoryFailPoint(mysize)
        End Using
    Catch ex As InsufficientMemoryException
        ''// dump the oldest items here
    End Try
    ''// do work
Loop
I need to test if it is slowing things down in this arrangement or not, but I can see the yellow line in Task Manager looking like a very healthy sawtooth pattern with a consistent top - yay!!
You can use MemoryFailPoint to check for available memory before allocating.
You may need to think about your release strategy for the cached objects. There is no possible way you can hold all of them forever so you need to come up with an expiration timeframe and have older cached objects removed from memory. It should be possible to find out how much memory is left and use that as part of your strategy but one thing is certain, old objects must go.
If you implement your cache with WeakReferences (http://msdn.microsoft.com/en-us/library/system.weakreference.aspx), the cached objects remain eligible for garbage collection in situations where you might otherwise throw an OutOfMemoryException.
This is an alternative to a fixed-size cache, but it can be overly aggressive in clearing out the cache when a GC does occur.
You might consider taking a hybrid approach, where there is a (tunable) fixed number of non-weak references in the cache but you let it grow additionally with weak references. Or this may be overkill.
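One way the hybrid idea could look is sketched below. HybridCache is a hypothetical class written for this illustration; it uses the generic WeakReference<T> (available from .NET 4.5) rather than the non-generic class linked above, it is not thread-safe, and it never prunes dead entries except when they are looked up.

```csharp
using System;
using System.Collections.Generic;

public sealed class HybridCache<TKey, TValue> where TValue : class
{
    private readonly int _strongCapacity;
    // The most recently added values are kept alive by strong references.
    private readonly Queue<TValue> _strong = new Queue<TValue>();
    // Every entry is reachable through a weak reference only.
    private readonly Dictionary<TKey, WeakReference<TValue>> _entries =
        new Dictionary<TKey, WeakReference<TValue>>();

    public HybridCache(int strongCapacity)
    {
        if (strongCapacity < 0) throw new ArgumentOutOfRangeException(nameof(strongCapacity));
        _strongCapacity = strongCapacity;
    }

    public void Add(TKey key, TValue value)
    {
        _entries[key] = new WeakReference<TValue>(value);
        _strong.Enqueue(value);
        // Beyond the tunable limit, the oldest value becomes collectable.
        if (_strong.Count > _strongCapacity) _strong.Dequeue();
    }

    public bool TryGet(TKey key, out TValue value)
    {
        WeakReference<TValue> weak;
        if (_entries.TryGetValue(key, out weak) && weak.TryGetTarget(out value)) return true;

        // Either the key was never added or the GC has reclaimed the value.
        _entries.Remove(key);
        value = null;
        return false;
    }
}
```

The strongCapacity argument is the tunable part: the newest entries up to that count are guaranteed to survive a collection, while older ones survive only until the GC decides otherwise.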
There are a number of metrics you can use to keep track of how much memory your process is using:
GC.GetTotalMemory
Environment.WorkingSet (This one isn't useful, my bad)
The native GlobalMemoryStatusEx function
There are also various properties on the Process class
The trouble is that there isn't really a reliable way of telling from these values alone whether or not a given memory allocation will fail. Although there may be sufficient room in the address space for a given allocation, memory fragmentation means that the free space may not be contiguous, so the allocation may still fail.
You can however use these values as an indication of how much memory the process is using and therefore whether or not you should think about removing objects from your cache.
Update: It's also important to make sure that you understand the distinction between virtual memory and physical memory. Unless your page file is disabled (very unlikely), the OutOfMemoryException will be caused by a lack or fragmentation of the virtual address space.
If you're only using managed resources you can use the GC.GetTotalMemory method and compare the results with the maximum allowed memory for a process on your architecture.
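A minimal sketch of that comparison follows. MemoryBudget and IsOverBudget are hypothetical names, and the budget value is something you would have to choose yourself; as noted earlier, staying under it does not guarantee that an allocation will succeed.

```csharp
using System;

public static class MemoryBudget
{
    // Returns true when the managed heap currently holds more than budgetBytes.
    public static bool IsOverBudget(long budgetBytes)
    {
        // false = do not force a collection first; the value is an estimate.
        long managedBytes = GC.GetTotalMemory(false);
        return managedBytes > budgetBytes;
    }
}
```

A cache could call this before adding items and evict its oldest entries while it returns true.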
A more advanced solution (I think this is how SQL Server manages to actually adapt to the available memory) is to use the CLR Hosting APIs:
the interface allows the CLR to inform the host of the consequences of
failing a particular allocation
which will mean actually removing some objects from the cache and trying again.
Anyway, I think this is probably overkill for almost all applications unless you really need exceptional performance.
The simple answer... By knowing what your memory limit is.
The closer you get to that limit, the more likely you ARE to get an OutOfMemoryException.
The more elaborate answer... Unless you yourself write a mechanism to do that kind of thing, programming languages/systems do not work that way. As far as I know, they cannot inform you in advance that you are about to exceed a limit, BUT they gladly inform you when the problem has occurred, and that usually happens through exceptions which you are supposed to write code to handle.
Memory is a resource that you can use; it has limits and it also has some conventions and rules for you to follow to make good use of that resource.
I believe what you are doing of setting a good limit, hard coded or configurable, seems to be your best bet.
I wish (I don't know if it's possible) to build profiling support into my code instead of using some external profiler. I have heard that there is some profiler API that is used by most profiler writers. Can that API be used to profile from within the code that is being executed? Are there any other considerations?
If you don't want to use a regular profiler, you could have your application output performance counters.
You may find this blog entry useful to get started: Link
The EQATEC Profiler builds an instrumented version of your app that will run and collect profiling statistics entirely by itself - you don't need to attach the profiler. By default your app will simply dump the statistics into plaintext xml-files.
This means that you can build a profiled version of your app, deploy it at your customer's site, and have them run it and send back the statistics-reports to you. No need for them to install anything special or run a profiler or anything.
Also, if you can reach your deployed app's machine via a network-connection and it allows incoming connections then you can even take snapshots of the running profiled app yourself, sitting at home with the profiler. All you need is a socket-connection - you decide the port-number yourself and the control-protocol itself is plain http, so it's pretty likely to make it past even content-filtering gateways.
The .NET framework profiler API is a COM object that intercepts calls before .NET handles them. My understanding is that it cannot be hosted in managed (C#) code.
Depending on what you want to do, you can insert Stopwatch timers to measure length of calls, or add Performance Counters to your application so that you can monitor the performance of the application from the Performance Monitor.
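As an illustration of the Stopwatch approach, here is a small sketch that accumulates call counts and total time per named section. CallTimings, Measure, GetCallCount and GetTotalTime are all hypothetical names introduced for this example, and the timing scope itself adds a small allocation and some overhead to every measured call.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;

public static class CallTimings
{
    private static readonly ConcurrentDictionary<string, long> TotalTicks =
        new ConcurrentDictionary<string, long>();
    private static readonly ConcurrentDictionary<string, long> CallCounts =
        new ConcurrentDictionary<string, long>();

    // Starts timing; disposing the returned scope records the elapsed time under `name`.
    public static IDisposable Measure(string name)
    {
        return new Scope(name);
    }

    public static long GetCallCount(string name)
    {
        long count;
        return CallCounts.TryGetValue(name, out count) ? count : 0;
    }

    public static TimeSpan GetTotalTime(string name)
    {
        long ticks;
        return TotalTicks.TryGetValue(name, out ticks) ? TimeSpan.FromTicks(ticks) : TimeSpan.Zero;
    }

    private sealed class Scope : IDisposable
    {
        private readonly string _name;
        private readonly Stopwatch _stopwatch = Stopwatch.StartNew();

        public Scope(string name) { _name = name; }

        public void Dispose()
        {
            _stopwatch.Stop();
            // Elapsed.Ticks is in TimeSpan ticks (100 ns), not raw Stopwatch ticks.
            long elapsed = _stopwatch.Elapsed.Ticks;
            TotalTicks.AddOrUpdate(_name, elapsed, (key, total) => total + elapsed);
            CallCounts.AddOrUpdate(_name, 1, (key, count) => count + 1);
        }
    }
}
```

Wrapping a method body in using (CallTimings.Measure("HandleEvent")) { ... } records one sample per call, and the totals can be dumped to a log on shutdown.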
There's a GameDev article that discusses how to build profiling infrastructure in a C++ program. You may be able to adapt this approach to work with C# provided objects created on the stack are freed on exit instead of left for the garbage collector
http://www.gamedev.net/reference/programming/features/enginuity3/
Even if you can't take the whole technique, there may be some useful ideas.
What I've done when I can't use my favorite technique is this. It's clumsy and gives low-resolution information, but it works. First, have a global stack of strings. This is in C, but you can adapt it to C#:
int nStack = 0;
char* stack[10000];
Then, on entry and exit to each routine you have source code for, push/pop the name of the routine:
void EveryFunction(){
    int iStack = nStack++; stack[iStack] = "EveryFunction";
    /* ... code inside function ... */
    nStack = iStack; stack[iStack] = NULL;
}
So now stack[0..nStack] keeps a running call stack (minus the line numbers of where functions are called from), so it's not as good as a real call stack, but better than nothing.
Now you need a way to take snapshots of it at random or pseudo-random times. Have another global variable and a routine to look at it:
#include <time.h>

time_t timeToSnap;
void CheckForSnap(){
    time_t now = time(NULL);
    if (now >= timeToSnap){
        if (now - timeToSnap > 10000) timeToSnap = now; // don't take snaps since 1970
        timeToSnap += 1; // setup time for next snapshot
        // print stack to snapshot file
    }
}
Now, sprinkle calls to CheckForSnap throughout your code, especially in the low-level routines. When the run is finished, you have a file of stack samples. You can look at those for unexpected behavior. For example, any function showing up on a significant fraction of samples has inclusive time roughly equal to that fraction.
Like I said, this is better than nothing. It does have shortcomings:
It does not capture line-numbers where calls come from, so if you find a function with suspiciously large time, you need to rummage within it for the time-consuming code.
It adds significant overhead of its own, namely all the calls to time(NULL), so when you have removed all your big problems, it will be harder to find the small ones.
If your program spends significant time waiting for I/O or for user input, you will see a bunch of samples piled up after that I/O. If it's file I/O, that's useful information, but if it's user input, you will have to discard those samples, because all they say is that you take time.
It is important to understand a few things:
Contrary to popular accepted wisdom, accuracy of time measurement (and thus a large number of samples) is not important. What is important is that samples occur during the time when you are waiting for the program to do its work.
Also contrary to accepted wisdom, you are not looking for a call graph, you don't need to care about recursion, you don't need to care about how many milliseconds any routine takes or how many times it is called, and you don't need to care about the distinction between inclusive and exclusive time, or the distinction between CPU and wall-clock time. What you do need to care about is, for any routine, what percent of time it is on the stack, because that is how much time it is responsible for, in the sense that if you could somehow make that routine take no time, that is how much your total time would decrease.
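To show what "percent of time on the stack" means in practice, here is a small C# sketch that post-processes the snapshot file. It assumes each sample has already been read into an array of routine names; StackSampleReport and FractionOnStack are hypothetical names used only for this example.

```csharp
using System;
using System.Collections.Generic;

public static class StackSampleReport
{
    // Returns, for each routine, the fraction of samples in which it appears at least once.
    public static Dictionary<string, double> FractionOnStack(IList<string[]> samples)
    {
        var counts = new Dictionary<string, int>();
        foreach (string[] sample in samples)
        {
            // The HashSet ensures a recursive routine is counted once per sample.
            foreach (string routine in new HashSet<string>(sample))
            {
                int seen;
                counts.TryGetValue(routine, out seen);
                counts[routine] = seen + 1;
            }
        }

        var fractions = new Dictionary<string, double>();
        foreach (KeyValuePair<string, int> pair in counts)
        {
            fractions[pair.Key] = (double)pair.Value / samples.Count;
        }
        return fractions;
    }
}
```

A routine that shows up in 3 of 4 samples is on the stack roughly 75% of the time, so making it take no time would cut the total run time by about that much.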
I am working on a web app using C# and ASP.NET, and I have been receiving an out-of-memory exception. The app reads a bunch of records (products) from a data source, possibly hundreds or thousands, processes those records through settings in a wizard, and then updates a different data source with the processed product information. Although there are multiple DB classes, right now all the logic is in one big class. The only reason for this is that all the information has to do with one thing, a product. Would it help the memory if I divided my app into different classes? I don't think it would, because if I divided the business logic into two classes, both of the classes would remain alive the entire time sending messages to each other, so I don't know how this would help. I guess my other solution would be to find out what's sucking up all the memory. Is there a good tool you could recommend?
Thanks
Are you using datareaders to stream through your data? (to avoid loading too much into memory)
My gut is telling me this is a trivial issue to fix: don't pump datatables with 1 million records; work through tables one row at a time, or in small batches. Release and dispose objects when you are done with them. (Example: don't have static List<Customer> allCustomers = AllCustomers())
Have a development rule that ensures no one reads tables into memory if there are more than X amount of rows involved.
If you need a tool to debug this, look at .NET Memory Profiler or WinDbg with the SOS extension; both will allow you to sniff through your managed heaps.
Another note is, if you care about maintainability and would like to reduce your defect count, get rid of the SuperDuperDoEverything class and model information correctly in a way that is better aligned with your domain. The SuperDuperDoEverything class is a bomb waiting to explode.
Also note that you may not actually be running out of memory. What happens is that .NET goes to look for contiguous blocks of memory, and if it doesn't find any, it throws an OOM - even if you have plenty of total memory to cover the request.
Someone referenced both Perfmon and WinDBG. You could also setup adplus to capture a memory dump on crash - I believe the syntax is adplus -crash -iis. Once you have the memory dump, you can do something like:
.symfix C:\symbols
.reload
.loadby sos mscorwks
!dumpheap -stat
And that will give you an idea for what your high-memory objects are.
And of course, check out Tess Fernandez's excellent blog, for example this article on Memory Leaks with XML Serializers and how to troubleshoot them.
If you are able to repro this in your dev environment, and you have VS Team Edition for Developers, there are memory profilers built right in. Just launch a new performance session, and run your app. It will spit out a nice report of what's hanging around.
Finally, make sure your objects don't define a destructor. This isn't C++, and there's nothing deterministic about it, other than it guarantees your object will survive a round of Garbage Collection since it has to be placed in the finalizer queue, and then cleaned up the next round.
A very basic thing you might want to try is to restart Visual Studio (assuming you are using it) and see if the same thing happens. And yes, releasing objects without waiting for the garbage collector is always good practice.
To sum it up:
release objects
close connections
And you can always try this:
http://msdn.microsoft.com/en-us/magazine/cc337887.aspx
I found the problem. While doing my loop I had a collection that wasn't being cleared, so data just kept being added to it.
Start with Perfmon; there are a number of counters for GC-related info. More than likely you are leaking memory (otherwise the GC would be deleting objects), meaning you are still referencing data structures that are no longer needed.
You should split into multiple classes anyways, just for the sake of a sane design.
Are you closing your DB connections? If you are reading into files, are you closing/releasing them once you are done reading/writing? Same goes for other objects.
You could cycle your class objects routinely just to release memory.