I have a C# program that runs on an Ubuntu VM as a server using mono 5.10. Every now and then, the program starts using lots of memory until it eventually crashes. It can often take days before such a memory leak occurs, and I haven't been able to reproduce the issue locally.
For the past few days, I've used the mono log profiler so I can collect heapshots on demand. This has given me a 15 GB+ file. I managed to get a heapshot from when the memory leak occurs, which displays as follows in the Xamarin Profiler:
Running mprof-report gives a similarly unhelpful report.
Neither really helps debug the problem. Obviously there's something going on with the threadpool, but it seems like there's no way of figuring out what.
Being able to see where the objects are allocated might help, but that requires enabling allocation tracking in the profiler, which isn't an option because of the file size it would produce.
What is causing the massive amount of IThreadPoolWorkItems? The app is using 30-40% CPU, so it doesn't seem like tasks are being scheduled more quickly than the CPU can handle.
Is there a way to list the objects referenced by the ThreadPool jobs? That would at least allow me to identify what piece of code is being run so much. mprof-report only shows inverse references as far as I can see, which isn't very helpful as the IThreadPoolWorkItems are obviously owned by the ThreadPool itself.
Update: The tasks aren't (disk) IO bound. While leaking, reads are in the tens of KB per second, which shouldn't be saturating the disk.
On second thought: if there were a lot of tasks scheduled, shouldn't I be seeing lots of individual IThreadPoolWorkItem objects too? Since IThreadPoolWorkItem isn't a struct, there should be a separate entry for the actual object and IThreadPoolWorkItem[] would just be an array of pointers. So this would suggest that those IThreadPoolWorkItem[]s are just arrays of (mostly) null pointers.
Update: ThreadPool.GetAvailableThreads shows 2 worker threads and 0 completion port threads are active, which seems to indicate that the ThreadPool is doing just fine. Custom logging also indicates that once again, I'm not scheduling too many tasks and the tasks that are scheduled are finished quickly.
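For reference, the custom logging boils down to something like this (a minimal sketch; the active count is estimated as max minus available):

using System;
using System.Threading;

static void LogThreadPoolUsage()
{
    int maxWorkers, maxIo, freeWorkers, freeIo;
    ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);
    ThreadPool.GetAvailableThreads(out freeWorkers, out freeIo);

    // Active threads = max - available; this is what reports 2 workers / 0 IO.
    Console.WriteLine("worker threads in use: {0}, completion port threads in use: {1}",
        maxWorkers - freeWorkers, maxIo - freeIo);
}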
Related
I have a webjob running on Azure on their S3 standard platform, meaning it has 7 GB of RAM available to run my app.
Three jobs run on the machine: one does all the processing and the other two handle small tasks. My problem is that on certain memory-intensive large tasks I get a memory exception, which results in the given job crashing.
The job I'm trying to run is very memory-intensive and requires around 1.5 GB of RAM, but based on the graph below I don't understand how this should be a problem, since the app service never goes above 2.2 GB of used RAM. I should add that I run 3 instances, so it might be that one instance is using far more memory than the others, but I can't find anywhere to view that information.
Memory consumption on server
When I look at the process explorer in Kudu, I see I'm currently using around 1.3 GB of RAM, which is still well below the memory needed for the job.
Kudu screenshot
The job ran without any problems no more than 2 days ago on the same server setup, so I am completely lost as to where to look.
Update: The code works fine in Visual Studio with the same data, running the exact same task.
Does anyone have ideas as to how to approach this problem?
Per my understanding, you could capture a memory dump of your Azure App Service and analyze it to narrow down the issue. You could refer to this tutorial about how to get a full memory dump in Azure App Services. Also, you could leverage the Crash Diagnoser extension to monitor CPU and memory; for more details, refer to this blog.
First of all, how do you deal with the garbage collector? I mean, do you dispose disposable objects after they have done their work? You said the app had been running for 2 days; it seems it crashed once it hit more memory-intensive work. Since you say your app is "very memory intensive", I'd go through the source code and make sure you are managing objects correctly, because the garbage collector can't take care of all of your "source code mess". Good luck.
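To illustrate the disposing point, a minimal sketch: anything that implements IDisposable should be released deterministically with a using block rather than left to the garbage collector.

using System.IO;

class Example
{
    static string ReadFirstLine(string path)
    {
        // Disposed at the end of the block, not whenever the GC/finalizer
        // eventually gets around to it.
        using (var reader = new StreamReader(path))
        {
            return reader.ReadLine();
        }
    }
}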
I have a problem with an application that I wrote in .NET/C#. It consists of a server which manages a few other machines and runs tests on them. It is a Windows Forms application. In order to run tests with proper error handling, for each machine I have two threads: one for running tests and one that pings it continuously. Each machine has a running queue, in which tasks are stored, tasks that will be run on that particular machine.
The issue is that after some time, when more than a few tasks are present in the queue, the memory it consumes (Process Explorer, Task Manager) gradually increases from about 50-100 MB to 1.6-1.8 GB. At about this limit, almost every transaction (file copy to a share, remote WMI access) with the remote machines fails with either "Not enough storage" or "Out of memory". I tried some tools in order to localize the leak, and the closest I got was .NET Memory Profiler. That wasn't of great help, because the largest amount of memory resided in "Private Data - Unidentified". I'm guessing this is unmanaged data, because I can evaluate every other piece of data (from my program) down to each string and int, and every instance of it.
Can anyone suggest a tool I can use to properly localize the leak and maybe fix it? It would help me a lot to know which DLL (from my app) or thread is using that memory, or at least to somehow view what is in that memory.
Note: A lot of posts are out there about the two exceptions, Not enough storage and Out of memory. Most of them suggest increasing the IRPStackSize on the 'server' machine (in my case, the clients). I have an IRPStackSize of 50 (0x32) on all of the machines, including the server.
EDIT
Regarding the comments: yes, I do maintain a log, but nothing strange shows up in it. Using a memory profiler I discovered that the .NET side of my application uses about 20 MB of memory, while the unmanaged part is well over 1 GB. With the help of WinDbg I found out what resides in (most of) that extra memory. To access the machines and run different tests on them I use WMI, for which I have a wrapper. Everything I use is being disposed (using statements, and for some objects I actually call the Dispose method). Strangely though, the memory is filled with clones of this wrapper class. Does anyone know why a class would clone itself in memory?
Note: the memory usage increases at a rate of about 5 MB/s, so it's not really over a long period of time. I also wonder why it is not being freed by the garbage collector. I am using C# classes to work with WMI, not COM or unmanaged code. Also, among the objects on the heap I see a lot of data belonging to wmiutils and CWbemError. Oddly enough, Google doesn't even know the word (no results for CWbemError).
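For reference, the disposal in my wrapper looks roughly like this (an illustrative sketch using System.Management; the query and all names are placeholders):

using System.Management;

class WmiExample
{
    static void QueryRemoteMachine(string machine)
    {
        var scope = new ManagementScope(@"\\" + machine + @"\root\cimv2");
        scope.Connect();

        // Placeholder query; the real tests read other classes.
        var query = new ObjectQuery("SELECT * FROM Win32_OperatingSystem");
        using (var searcher = new ManagementObjectSearcher(scope, query))
        using (ManagementObjectCollection results = searcher.Get())
        {
            foreach (ManagementObject obj in results)
            {
                using (obj) // on .NET 4+ each ManagementObject is disposable too
                {
                    // read the properties the test needs
                }
            }
        }
    }
}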
I have an event driven app that I was tasked with maintaining.
About 100 events run every 30 seconds, on separate timers. Over time the events alias into a constant stream of about 1-3 events per second.
Memory usage does not appear dependent on the number of events firing in any given second.
Each event polls data from a web service, checks the data against the previously polled data using a LINQ2SQL DataContext (I do not dispose or null out the DataContext when done), and, if the data is different, updates the database and pushes the new data as an XML message to a receiver service via TCP.
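For context, each event handler does roughly the following (a sketch; all names are placeholders). Note the DataContext is created per poll and never disposed:

// Placeholder names throughout; this mirrors the per-event flow described above.
void HandleEvent(PolledData fresh)
{
    var ctx = new MyDataContext(connectionString); // never disposed or nulled out

    var previous = ctx.Readings
        .Where(r => r.SensorId == fresh.SensorId)
        .OrderByDescending(r => r.Timestamp)
        .FirstOrDefault();

    if (previous == null || previous.Value != fresh.Value)
    {
        ctx.Readings.InsertOnSubmit(new Reading { SensorId = fresh.SensorId, Value = fresh.Value });
        ctx.SubmitChanges();
        SendXmlMessage(fresh); // push the new data to the receiver via TCP
    }
    // wrapping ctx in a using block would dispose it here
}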
This app appears to have a memory leak which
only manifests after 30+ minutes of running (either debug or release)
won't manifest when profiling [I'm using .NET Memory Profiler 4.5]
Characteristics:
On startup the program uses ~30 MB. As time progresses, the memory usage shown in Task Manager begins pogoing, first only slightly, between 50 and 150 MB, and eventually gets worse, oscillating between 200 MB and 1 GB+. When this happens, it happens a few times within a second or two, then settles down at ~150 MB for the next 10-20 seconds.
I've been trying to catch this behavior in action using memory profiling. So far I've been unsuccessful: I can't get the app to pogo or oscillate in memory usage anywhere near the way it does when the profiler isn't watching.
However, I've noticed a square-wave sort of pattern in the memory usage as generation 1 and 2 garbage collections run, which looks very similar to what I see in Task Manager, except the memory oscillations in the square wave are 10 MB wide instead of 800 MB+ (200 MB to 1 GB+). Now, according to Google Images, garbage collection in a properly functioning app looks more like a sawtooth wave than a square one.
I frankly don't see any way that my app could be pogoing between 200MB and 1GB+ of memory usage within a second and NOT be spiking the CPU to 100%.
I have read about some problems that can manifest between garbage collection + event handling, but I have several paths I could go investigate and am trying to narrow down which one to spend time on. I'm still pretty slow at .NET and haven't developed the "intuition" I have for embedded devices running C that generally helps me filter what I should investigate first.
What it FEELS like is that some event handlers are losing and re-gaining references to [massive amounts of data] (I don't know how this could even happen?), seeing as memory usage appears to spike back up to 1 GB soon after the garbage collector runs and drops memory usage back to 200 MB.
Previous versions of this app did not have these problems. Two changes I have made since then include
utilizing LINQ2SQL instead of our own data manager (which had an ADORecordSetHelper object we utilized to execute hardcoded SQL statements)
changing the piece of software we use to send the TCP XML messages to a receiver.
Due to the simplicity of what we're doing in #2, it COULD be the source of the problem, but this memory usage behavior makes me think otherwise.
I guess my main questions at this point are
Should I be calling Dispose on my LINQ2SQL DataContexts before I return from the method I create them in?
Should I null them out instead?
if an exception were occurring somewhere in a method after creation of a DataContext, could it cause the DataContext to be kept in memory indefinitely?
if I store the result of a LINQ query in a value type (i.e. int, not var), is it lazy-loaded then, or lazy-loaded when the variable is used? (see the sketch after this list)
how possible is it for event-driven frameworks to hypothetically lose and regain references?
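To make question 4 concrete, a minimal sketch (the context and table names are hypothetical); as I understand the documentation, the query itself is deferred while a scalar aggregate executes immediately:

using (var db = new MyDataContext(connectionString)) // hypothetical context
{
    // Deferred: building the query does not touch the database yet.
    IQueryable<Reading> recent = db.Readings.Where(r => r.Timestamp > cutoff);

    // Immediate: Count() has to produce a value, so the SQL runs right here,
    // regardless of whether the variable is declared as int or var.
    int count = recent.Count();

    // Also immediate: ToList() materializes the rows now.
    List<Reading> rows = recent.ToList();
}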
edit: the events use instance-based subscriptions, as discussed here, and are never unsubscribed for the life of the app.
edit2: I finally managed to catch it in the profiler; it appears to be a 200 MB System.String that's being created somehow. Thanks everyone for ruling out GC behavior.
Most of the time, memory leaks are caused by weird references between objects (events and delegates are also included here).
What I think you could try is the following:
1. Run the application and reproduce the issue. When the private working set of memory hits a very high value, right-click the process in Task Manager and select "Create Dump File". This is a lot less intrusive than profiling the application live.
2. Download WinDBG and run it.
3. Open the memory dump by going to the File menu and selecting Open dump file (I cannot remember exactly what the menu option is called... should be easy to spot though).
4. Run the following commands:
.symfix
.loadby sos clr
!dumpheap -type [YourAssemblyNameSpacePrefix] -stat
The last command will give you all the instances in memory which are not CLR types, only your types. Look at the types which have a very high number of instances and try to see if anything doesn't look right.
5. If you see a very high number of objects of the same type, run the following command, which will show you all the instances' addresses:
!dumpheap -type [TheFullObjectTypeName]
6. Select one single instance address. Now run the following command to see the references to that instance:
!gcroot [InstanceAddress]
7. Repeat step 6 a few times for different instances so that you can confirm the leak is coming from the same place, or to help you identify what is causing those instances to not be collected (still being referenced by other objects).
8. If you don't see anything weird with your own types, change the !dumpheap command in step 4 to: !dumpheap -stat. This way you are not filtering by type and you will also see CLR types and third-party library types.
This is a little bit complex, but hopefully it gives you a method for tracking down memory leaks.
I have a Windows service written in C#/.NET. When the service is started, I spawn a new thread as shown below
new Thread(new ThreadStart(Function1)).Start();
This thread loops infinitely and performs the duties expected of my service. Once a day, I need to simultaneously perform a different operation, for which my thread spawns a second thread as shown below
new Thread(new ThreadStart(Function2)).Start();
This second thread performs a very simple function. It reads all the lines of a text file using File.ReadAllLines, quickly processes this information, and exits.
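For illustration, the second thread's work amounts to something like this (a sketch; the path and the processing step are hypothetical):

using System.IO;

class DailyTask
{
    static void Function2()
    {
        // Read everything, process quickly, exit; no references are kept.
        string[] lines = File.ReadAllLines(@"C:\data\daily.txt"); // hypothetical path
        foreach (string line in lines)
        {
            ProcessLine(line); // hypothetical quick processing
        }
        // 'lines' is unreachable after this method returns; when (and whether)
        // the GC reclaims it, and whether the freed pages go back to the OS,
        // is up to the runtime.
    }

    static void ProcessLine(string line) { /* ... */ }
}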
My problem is that the memory used by the second thread, which reads the file, is not getting collected. I let my service run for 3 hours hoping that the GC would kick in, but nothing happened and Task Manager still shows my service using 150 MB of memory. The function that reads and processes the text file is a very simple one, and I am sure there are no hidden references to the string array containing the text. Could someone shed some light on why this is happening? Could it be that a thread spawned by another spawned thread cannot clean up after itself?
Thanks
Trust the garbage collector and stop worrying. 150 megs is nothing. You aren't even measuring the size of the file in that; most of that will be code.
If you are concerned about where memory is going, start by understanding how memory works in a modern operating system. You need to understand the difference between virtual and physical memory, the difference between committed and allocated memory, and all of that, before you start throwing around numbers like "150 megs of allocated memory". Remember, you have 2000 megs of virtual address space in a 32 bit process; I would not think that a 150 meg process is large by any means.
Like Jon says, what you want to be concerned about is a slow steady rise in private bytes. If that's not happening, then you don't have a memory leak. Let the garbage collector do its job and don't worry about it.
If you are still worried about it, good heavens, do not use Task Manager. Get a memory profiler and learn how to use it. Task Manager is for inspecting processes by looking down on them from 30000 feet. You need to be using a microscope, not a telescope, to analyze how the process is freeing the bytes of a single file.
If you're using Windows task manager to try to work out the memory used, it's likely to be deceiving you. Memory used by the CLR isn't generally returned to the operating system as far as I'm aware... so you'll potentially still see a high working set, even though most of that memory is then still available to be reused within the process.
If you let the service run for a week, do you see the memory usage climb steadily through the week, or does it just increase in the first day and then plateau? If it plateaus, do you definitely view this as a problem? If so, you may need to put your second task in a separate process.
Recently I recorded several hours of .NET memory counters for a WCF service. The service is hosted in IIS on Win2k8, 8 cores, x64, with 20 GB of RAM.
I can see the GC being pretty healthy, performing a full collection only approx. every 2hrs!
I noticed that over the very same time period, the number of physical and logical threads increases. When a full collection occurs, the number of physical and logical threads drops back, then continues to rise again to the same level.
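For reference, the counters I was watching can also be sampled in code; a minimal sketch (category and counter names as they appear in PerfMon; the instance name "w3wp" is an assumption):

using System;
using System.Diagnostics;
using System.Threading;

class ThreadCounterSampler
{
    static void Main()
    {
        var logical = new PerformanceCounter(".NET CLR LocksAndThreads",
            "# of current logical Threads", "w3wp"); // instance name assumed
        var physical = new PerformanceCounter(".NET CLR LocksAndThreads",
            "# of current physical Threads", "w3wp");

        while (true)
        {
            Console.WriteLine("{0:T} logical={1} physical={2}",
                DateTime.Now, logical.NextValue(), physical.NextValue());
            Thread.Sleep(5000); // sample every 5 seconds
        }
    }
}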
Why does the GC full collection cycle and the drop in threads correlate?
Why is the number of threads continuously increasing?
This is pure ASP.NET/WCF threading model. No custom threads being spawned etc.
Thanks,
Alex
It looks like the answer was indeed pretty simple: threads get created and die as the thread pool is grown and shrunk over time. Normally, those threads have a relatively long lifetime, which means their Thread objects end up in generation 2. This is why they are naturally only cleaned up on a full GC cycle.
The interesting bit is that even a very well memory-managed application can end up with loads of dead threads, which I was able to observe in WinDbg. This is because a full GC might only happen after several hours or even days! Some MS blog said that those dead threads are not really a nice thing to have, as the corresponding unmanaged C++ thread object seems to be quite resource-intensive (unfortunately I cannot find the damn blog anymore...). I am wondering if this is one of the rare cases where an induced full GC at a certain time interval would make sense.
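If anyone wants to experiment with that, a minimal sketch (the 6-hour interval is an arbitrary assumption):

using System;
using System.Threading;

class PeriodicFullGc
{
    static readonly Timer GcTimer = new Timer(_ =>
    {
        // Full (gen 2) collection, then a finalization pass; the second
        // collect reclaims whatever the finalizers released.
        GC.Collect(2, GCCollectionMode.Forced);
        GC.WaitForPendingFinalizers();
        GC.Collect(2, GCCollectionMode.Forced);
    }, null, TimeSpan.FromHours(6), TimeSpan.FromHours(6));

    static void Main()
    {
        // Keep the process (and the timer reference) alive.
        Thread.Sleep(Timeout.Infinite);
    }
}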