.Net Memory Management Issue : Objects Stuck in Generation 2 - c#

I have profiled my App w/ VS2010 profiler, with object lifetime collection enabled.
I was heavilly surprised to see that most instances of a particular struct named "Record" are collected by the GC as Gen 2 instances. I am very upset, as instances of "Record" struct should live less than 500ms each (theoretically).
These structs are simple time series data of 6xInt32 or so, that are read on the flow, Queued/Dequeued in a Queue having a size of 1000, passed to a processor that fires some logic depending on those few millions "Records" sequentially. I do not need to keep more than 50 records at a time.
So my question is : Why could these object live long enough to mainly end up as Second Generation references, and what could I do to ensure they REALLY get dumped off after each computation.
EDIT :
I am asking this because I have noticed a drastic performance dropdown for bigger sample sizes (i.e Records Numbers) : if N take T minutes, 2N takes 2,5T minutes or so, and so on. So there is obviously a leak somewhere.
EDIT 2 :
My Bad : Creating a struct instance cannot cause a garbage collection
I've changed it to classes and did not notice any significant improvement so far.
I'll run the profiler again with classes this time not structs) and see what it gives
EDIT 3 :
Many answers suspect Boxing/Unboxing to take place somewhere.
I DO use typed generic collections and typed Queues. And "Records" are never attached to any class as members. They are individually handled by events.
The ex-Struct (Now Class) implemented an interface and was casted by it when called (this is rather common usage) and I dropped off that interface. No improvement.
EDIT 4 :
I have run again the profiler, replacing struct by class. I have the same results : most instances of CLASS "Record" still end up being collected as Gen2 instances
EDIT 5 :
Producers of the Record classes are many parallel BackGroundWorkers (Byte Readers), and there is one Consumer Thread that dispatches the Records to other methods after performing a few checks. Besides I use Events and Delegates to communicate between the different parts. I do not unregister those events because they are useful all along the process (I may be wrong on that point)

If you stored them just as local variables (which may or may not be possible depending on your scenario) they will never end up on the heap at all.
If that's possible, I recommend trying that. You might get a more in depth response, if you post a code sample.
As a sanity check, are you forgetting to unregister static events, or using some other class that may be doing this (some classes fix this via Dispose)?
Also have you looked into the possibility of using the Flyweight pattern?
Edit- Since you now say you are doing something with events, this is highly likely to be a cause of your issues. Are you forgetting to unregister the events?

If you are "heavy" on memory use, and you are using C# 4.0, you could try the "server" GC. Merge your app.config (or web.config) with:
<configuration>
<runtime>
<gcServer enabled="true"/>
</runtime>
</configuration>
(with merge I mean that if you already have some of these sections, use them, otherwhise create them. "configuration" is the first level element of the app.config). This is better for some apps (apps that don't need heavy interaction with the user)

They end up as Second Generation because the GC just decided not to garbage collect them. I don't think it is really a problem to worry about. If you really wanted to you could just reuse the 1000 objects instead of continuously creating new ones. This would make it so that these objects wouldn't need to get garbage collected. You could force Garbage collection to ensure the objects "REALLY" get dumped but that would degrade performance as GC is an expensive operation or so I am told.

Related

How do I read a Visual Studio 2019 memory snapshot of my C# service usefully?

I've read the other questions that come up when I ask about memory snapshots, but I might be too thick to really grasp it. I have a windows service that I can produce a memory leak in by doing a pretty straightforward data operation repeatedly. I've taken memory snapshots along the way, and I see that the number of roots is going up (from 2,100 after a successful start to 7,100 after 100 or so data operations). The snapshots were taken at the blue arrow marks:
Before the multiple data operations, the memory snapshot looks like this:
Afterwards, it looks like this:
We're using WCF for data transport and it would appear that Serialization is playing a part in this memory growth, but I don't know where to go from here. If I look at instances of RuntimeType+RuntimeTypeCache, the vast majority of instances look like this:
If anyone can help me figure out the next step to take, I would appreciate it immensely. We have a static instance that has a concurrent dictionary of ServiceHosts that I'm suspicious of, but I don't know how to confirm it.
EDIT:
This also seems significant and is in reference to ServiceHosts. Could we be enabling some unwise proxy generation and instance retention via this static relationship?
Sort your items by Size, and inside that list, watch out for your own class types. Which one is piling up. Have at least a total of a few Megabytes of objects, to be sure to see a real 'Pile' not just some parts of the infrastructure.
The 12.000 existing runtime types, might indicate dynamically created types, maybe the serialization DLL is created for each new call.
You can also try to do GC.Collect() after your critical function call, to enforce Garbage collection.

Producer/Consumer with C# structs?

I have a singleton object that process requests. Each request takes around one millisecond to be completed, usually less. This object is not thread-safe and it expects requests in a particular format, encapsulated in the Request class, and returns the result as Response. This processor has another producer/consumer that sends/receives through a socket.
I implemented the producer/consumer approach to work fast:
Client prepares a RequestCommand command object, that contains a TaskCompletionSource<Response> and the intended Request.
Client add the command to the "request queue" (Queue<>) and awaits command.Completion.Task.
A different thread (and actual background Thread) pulls the command from the "request queue", process the command.Request, generates Response and signals the command as done using command.Completion.SetResult(response).
Client continues working.
But when doing a small memory benchmark I see LOTS of these objects being created and topping the list of most common object in memory. Note that there is no memory leak, the GC can clean everything up nicely each time triggers, but obviously so many objects being created fast, makes Gen 0 very big. I wonder if a better memory usage may yield better performance.
I was considering convert some of these objects to structs to avoid allocations, specially now that there are some new features to work with them C# 7.1. But I do not see a way of doing it.
Value types can be instantiated in the stack, but if they pass from thread to thread, they must be copied to the stackA->heap and heap->stackB I guess. Also when enqueuing in the queue, it goes from stack to heap.
The singleton object is truly asynchronous. There is some in-memory processing, but 90% of the time it needs to call outside and going through the internal producer/consumer.
ValueTask<> does not seem to fit here, because things are asynchronous.
TaskCompletionSource<> has a state, but it is object, so it would be boxed.
The command also jumps from thread to thread.
Reciclying objects only works for the command itself, its content cannot be recycled (TaskCompletionSource<> and a string).
Is there any way I could leverage structs to reduce the memory usage or/and improve the performance? Any other option?
Value types can be instantiated in the stack, but if they pass from thread to thread, they must be copied to the stackA->heap and heap->stackB I guess.
No, that's not at all true. But you have a deeper problem in your thinking here:
Immediately stop thinking of structs as living on the stack. When you make an int array with a million ints, you think those four million bytes of ints live on your one-million-byte stack? Of course not.
The truth is that stack vs heap has nothing whatsoever to do with value types. Instead of "stack and heap", start saying "short term allocation pool" and "long term allocation pool". Variables that have short lifetimes are allocated from the short term allocation pool, regardless of whether that variable contains an int or a reference to an object. Once you start thinking about variable lifetime correctly then your reasoning becomes entirely straightforward. Short-lived things live in the short term pool, obviously.
So: when you pass a struct from one thread to another, does it ever live "on the heap"? The question is nonsensical because values are not things that live on the heap. Variables are things that are storage; variables store value.
So: Is it the case that turning classes into structs will improve performance because "those structs can live on the stack"? No, of course not. The relevant difference between reference types and value types is not where they live but how they are copied. Value types are copied by value, reference types are copied by reference, and reference copies are the fastest copies.
I see LOTS of these objects being created and topping the list of most common object in memory. Note that there is no memory leak, the GC can clean everything up nicely each time triggers, but obviously so many objects being created fast, makes Gen 0 very big. I wonder if a better memory usage may yield better performance.
OK, now we come to the sensible part of your question. This is an excellent observation and it is one which is testable with science. The first thing you should do is to use a profiler to determine what is the actual burden of gen 0 collections on the performance of your application.
It may be that this burden is not the slowest thing in your program and in fact it is irrelevant. In that case, you will now know to concentrate your efforts on the real problem, rather than chasing down memory allocation problems that aren't real problems.
Suppose you discover that gen 0 collections really are killing your performance; what can you do? Is the answer to make more things structs? That can work, but you have to be very careful:
If the structs themselves contain references, you've just pushed the problem off one level, you haven't solved it.
If the structs are larger than reference size -- and of course they almost always are -- then now you are copying them by copying the entire struct rather than copying a reference, and you've traded a GC time problem for a copy time problem. That might be a win, or a loss; use science to find out which it is.
When we were faced with this problem in Roslyn, we thought about it very carefully and did a lot of experiments. The strategy we went with was in general not to move things onto the stack. Rather, we identified how many small, short-lived objects there were active in memory at any one time, of each type -- using a profiler -- and then implemented a pooling strategy on those objects. You need a small object, you take it out of the pool. When you're done, you put it back in the pool. What happens is, you end up with O(number of objects active at any one time) in the pool, which quickly gets moved into the gen 2 heap; you then greatly lower your collection pressure on the gen 0 heap while increasing the cost of comparatively rare gen 2 collections.
I'm not saying that's the best choice for you. I'm saying that we had this same problem in Roslyn, and we solved it with science. You can do the same.

Optimizing collection of long lived objects

Background:
I have a service whose purpose in life is to provide objects to requestors - it basically gets complicated data from a database and transforms it once (a bit like a view over data) to produce a simplified record. This then services requests from other services by providing up to 100k records (depending on the nature of the request) on demand.
The idea is that the complicated transformation is done once and is cached by the service - it works out quicker than letting the database work it out each time a view is accessed and for my purposes works just fine. (I believe this is called SSOS by some)
The way data is being cached is in a list of objects which are property bags for standard .Net types. These objects have no references to anything else.
Periodically a record will change, and the cache must be updated which means that the original record must be located, thrown away and replaced.
Now the record in the cache will have been in there for a long time and will have been marked for a Gen 2 collection; pretty much all the collections will happen in the Gen2 phase as these objects are hanging around for ages (on purpose).
So my understanding of Gen2 collections is that they are slow, and if the collections are mainly working on Gen2 then the optimizer is going to do this more often.
I would like to be able to de-reference an object in the list in a way that doesn't end up triggering a full Gen2 collection... I was thinking that maybe there is a way of marking it as Gen0 and then de-referencing it before replacing it - but I don't think that is possible.
I am constrained to using .Net 4 for this and the application is a service which serves data to up to 100 clients who request full lists or changes to the list over a period of time.
Question: Can anyone suggest a way to de-reference long lived objects in a GC friendly way or perhaps another way to approach this problem?
There is no simple answer to this. If you have lots of long-lived objects, then full collections really can hurt, as I discussed here. Since a picture tells a thousand words:
Those vertical spikes are where garbage collection happens and slaughters the response times.
The way we reduced the impact of this was: don't have a gazillion long-lived objects. What we did was to change the classes to structs, which meant that the only object was the array that contained them. We were fortunate here is that the data was simple and didn't involve strings, which would of course themselves be objects. We also did some crazy fixed-size buffer work to reduce things that were previously collections, and changed what were references to indices (into the array). If you do have to use string data, perhaps try to ensure you don't have 20,000 different string instancs with the same value - some kind of manual interner (a Dictionary<string,string> would suffice) can be really useful there.
Note that this needn't impact your public API, since you can always create the old class data from the struct storage - the difference is that this class will only exist briefly as a DTO - so will be collected cheaply in the next gen-0 sweep.
YMMV, but this worked enough well for us.
The problem is: you need to be really careful when working with structs; I strongly advise making them immutable.

C# best practice for performance: object creation vs reuse

I am currently working on performance tuning of an existing C# website. There is a class say.. MyUtil.cs. This class has been extensively used across all web pages. On some pages around 10/12 instances are created (MyUtil). I ran the "Redgate" performance profiler. The object creation is a costly operation according to RedGate.
Please note that each instance sets specific properties and performs specific operation. So I can not reuse the object as it is. I have to reset all the member variables.
I want to optimize this code. I have thought about following options. Kindly help me evaluate which is the better approach here :
(1) Create a "Reset" method in "MyUtil.cs" class which will reset all the member variables (there are 167 of those :(..) so that I can reuse one single object in an page class.
(2) Continue with the multiple object creation (I do have Dispose() method in "MyUtil")
(3) I thought of "object pooling" but again I will have to reset the members. I think its better to pool objects at page level and release them instead of keeping them live at the project level.
Any reply on this will be appreciated. Thanks in advance..!
Every app has multiple opportunities for speedup of different sizes, like kinds of food on your plate.
Creating and initializing objects can typically be one of these, and can typically be a large one.
Whenever I see that object creation/initialization is taking a large fraction of time, I recycle used objects.
It's a basic technique, and it can make a big difference.
But I only do it if I know that it will save a healthy fraction of time.
Don't just do it on general principles.
I would recommend that you always create new objects instead of resetting the objects. This is so for the following reasons
GC is smart enough to classify objects and assign them a generation. It depends on the usage of objects in your code. Profiling is done based on the execution pattern of code as well as architecture of code
You will be able to get optimum result from hardware if you use the GC and let it manage the process as it also decides the garbage collection thresh hold based on hardware configuration available and available system resources.
Apart from that, your code will be much easier and manageable. - (Though it is not a direct benefit, but should also give at least some weight to it.)
Creating object pool at page level is also not a good idea because to re-use the object, you will have to do two things, fetch it from pool and reset its properties, i.e. you will also have to manage the pool which is additional burden
Creating a single instance and re-using it with resetting properties might also not be a good idea cause when you need more then one instance of object, it will not work.
so the conclusion is, you should keep creating objects in the page and let the Garbage collector do its job.

Variation in execution time

I've been profiling a method using the stopwatch class, which is sub-millisecond accurate. The method runs thousands of times, on multiple threads.
I've discovered that most calls (90%+) take 0.1ms, which is acceptable. Occasionally, however, I find that the method takes several orders of magnitude longer, so that the average time for the call is actually more like 3-4ms.
What could be causing this?
The method itself is run from a delegate, and is essentially an event handler.
There are not many possible execution paths, and I've not yet discovered a path that would be conspicuously complicated.
I'm suspecting garbage collection, but I don't know how to detect whether it has occurred.
Finally, I am also considering whether the logging method itself is causing the problem. (The logger is basically a call to a static class + event listener that writes to the console.)
Just because Stopwatch has a high accuracy doesn't mean that other things can't get in the way - like the OS interrupting that thread to do something else. Garbage collection is another possibility. Writing to the console could easily cause delays like that.
Are you actually interested in individual call times, or is it overall performance which is important? It's generally more useful to run a method thousands of times and look at the total time - that's much more indicative of overall performance than individual calls which can be affected by any number of things on the computer.
As I commented, you really should at least describe what your method does, if you're not willing to post some code (which would be best).
That said, one way you can tell if garbage collection has occurred (from Windows):
Run perfmon (Start->Run->perfmon)
Right-click on the graph; select "Add Counters..."
Under "Performance object", select ".NET CLR Memory"
From there you can select # Gen 0, 1, and 2 collections and click "Add"
Now on the graph you will see a graph of all .NET CLR garbage collections
Just keep this graph open while you run your application
EDIT: If you want to know if a collection occurred during a specific execution, why not do this?
int initialGen0Collections = GC.CollectionCount(0);
int initialGen1Collections = GC.CollectionCount(1);
int initialGen2Collections = GC.CollectionCount(2);
// run your method
if (GC.CollectionCount(0) > initialGen0Collections)
// gen 0 collection occurred
if (GC.CollectionCount(1) > initialGen1Collections)
// gen 1 collection occurred
if (GC.CollectionCount(2) > initialGen2Collections)
// gen 2 collection occurred
SECOND EDIT: A couple of points on how to reduce garbage collections within your method:
You mentioned in a comment that your method adds the object passed in to "a big collection." Depending on the type you use for said big collection, it may be possible to reduce garbage collections. For instance, if you use a List<T>, then there are two possibilities:
a. If you know in advance how many objects you'll be processing, you should set the list's capacity upon construction:
List<T> bigCollection = new List<T>(numObjects);
b. If you don't know how many objects you'll be processing, consider using something like a LinkedList<T> instead of a List<T>. The reason for this is that a List<T> automatically resizes itself whenever a new item is added beyond its current capacity; this results in a leftover array that (eventually) will need to be garbage collected. A LinkedList<T> does not use an array internally (it uses LinkedListNode<T> objects), so it will not result in this garbage collection.
If you are creating objects within your method (i.e., somewhere in your method you have one or more lines like Thing myThing = new Thing();), consider using a resource pool to eliminate the need for constantly constructing objects and thereby allocating more heap memory. If you need to know more about resource pooling, check out the Wikipedia article on Object Pools and the MSDN documentation on the ConcurrentBag<T> class, which includes a sample implementation of an ObjectPool<T>.
That can depend on many things and you really have to figure out which one you are delaing with.
I'm not terribly familiar with what triggers garbage collection and what thread it runs on, but that sounds like a possibility.
My first thought around this is with paging. If this is the first time the method runs and the application needs to page in some code to run the method, it would be waiting on that. Or, it could be the data that you're using within the method that triggered a cache miss and now you have to wait for that.
Maybe you're doing an allocation and the allocator did some extra reshuffling in order to get you the allocation you requested.
Not sure how thread time is calculated with Stopwatch, but a context switch might be what you're seeing.
Or...it could be something completely different...
Basically, it could be one of several things and you really have to look at the code itself to see what is causing your occasional slow-down.
It could well be GC. If you use a profiler application such as Redgate's ANTS profiler you can profile % time in GC along side your application's performance to see what's going on.
In addition, you can use the CLRProfiler...
https://github.com/MicrosoftArchive/clrprofiler
Finally, Windows Performance Monitor will show the % time in GC for a given running applicaiton too.
These tools will help you get a holistic view of what's going on in your app as well as the OS in general.
I'm sure you know this stuff already but microbenchmarking such as this is sometimes useful for determining how fast one line of code might be compared to another than you might write, but you generally want to profile your application under typical load too.
Knowing that a given line of code is 10 times faster than another is useful, but if that line of code is easier to read and not part of a tight loop then the 10x performance hit might not be a problem.
What you need is a performance profile to tell you exactly what causes a slow down. Here is a quick list And of course here is the ANTS profiler.
Without knowing what your operation is doing, it sounds like it could be the garbage collection. However that might not be the only reason. If you are reading or writing to the disc it is possible your application might have to wait while something else is using the disk.
Timing issues may occur if you have a multi-threaded application and another thread could be taking some processor time that is only running 10 % of the time. This is why a profiler would help.
If you're only running the code "thousands" of times on a pretty quick function, the occasional longer time could easily be due to transient events on the system (maybe Windows decided it was time to cache something).
That being said, I would suggest the following:
Run the function many many more times, and take an average.
In the code that uses the function, determine if the function in question actually is a bottleneck. Use a profiler for this.
It can be dependent on your OS, environment, page reads, CPU ticks per second and so on.
The most realistic way is to run an execution path several thousand times and take the average.
However, if that logging class is only called occasionally and it logs to disk, that is quite likely to be a slow-down factor if it has to seek on the drive first.
A read of http://en.wikipedia.org/wiki/Profiling_%28computer_programming%29 may give you an insight into more techniques for determining slowdowns in your applications, while a list of profiling tools that you may find useful is at:
http://en.wikipedia.org/wiki/Visual_Studio_Team_System_Profiler
specifically http://en.wikipedia.org/wiki/Visual_Studio_Team_System_Profiler if you're doing c# stuff.
Hope that helps!

Categories