Targetting memory leak in a .NET production service

Targetting memory leak in a .NET production service - c#

I have a C#.NET service running in production. The service functions as a TCP server to which clients register and make requests against. In looking at the Task Manager, it appears to be leaking about 10MB/day. I don't seem to notice these in dev (perhaps because of far less traffic and client activity). In searching around I've read that the Task Manager can be seriously wrong, but I'm not sure how accurate this is or in what circumstances the TM would display incorrect information.
To solve this problem I need to more closely monitor memory consumption. The problem is that the leak only seems to appear in production, where the deployed service was built for Release. Also since it's a service that can't be run directly be VS with an attached profiler/debugging, I'm not sure how to best pinpoint the problem with something more precise than TM.
Any group wisdom would be much appreciated, thanks.
EDIT:
I've added perfmon counters for the privates bytes of the service (7MB to start out) as well as CLR mem in all heaps (30MB to start out)
Task manager says the total memory to be ~37MB so this seems to make sense
The first part of this is to let the service go for a day and check out my counters again.
If my private bytes get huge but CLR mem is roughly static this would indicate an unmanaged leak. If both get huge then it's a managed leak.
Thanks guys.

Your first task is figuring out if the process is leaking memory. You can do this with perfmon measuring the Private Bytes
http://www.goldstarsoftware.com/papers/CapturingVirtualBytesToALogFile.pdf
If the graph is consistently rising (for say half an hour ) you have a memory leak. You can then use other counters to figure out if this is a .NET leak (.NET memory) though this is unlikely. I find that in most of these cases, there is a COM component that is being invoked but not released.
If you truly have a memory leak (and this isn't just variable memory usage)- the process will shutdown with an out of memory exception after running for a while.

You need one of the below MemoryProfilers in order to monitor it;
http://www.jetbrains.com/profiler/
http://www.red-gate.com/products/dotnet-development/ants-memory-profiler/
There are other choices but these are very capable and you can profile remote application's memory with them (at least JetBrains's solution handles that)

Follow this guide: http://blogs.msdn.com/b/tess/archive/2008/03/25/net-debugging-demos-lab-7-memory-leak.aspx
It goes over exactly what you're describing, a memory leak in production. As was mentioned you have to first determine whether it's unmanaged code or managed code that's leaking using perfmon and Private Bytes.
In general make sure for networking objects you're wrapping them in using statements so that they're properly disposed.
A workflow I often use for managed memory leaks is to start the server on a test machine, hit it with a known amount of connections (say 123,456 connections). Then take a memory snapshot by going to task manager and right clicking on the process name and selecting 'create dump'. Open this dump with WinDBG and SOS and run the command !dumpheap -stat. Look for objects that have a multiple of 123,456 instances. Should these objects still be in memory? If not run a !gcroot on an instance of those objects to find why it's still in memory.

Get a dump of the memory when its in a leak state using the Task Manager right click on the process and select create dump file. You can also use ProcDump which gives you more options.
Use SOS Extensions in either WinDebug or Visual Studio to inspect the memory.

Related

When I use Socket.IO, why I got an error An unhandled exception of type 'System.OutOfMemoryException'

I coded a program to get the screen shot and send to the server. Every time, I got a screenshot and turned into base64 then sent it using Socket.IO. (using SocketIOClient.dll)
Dictionary<string, string> image = new Dictionary<string, string>();
image.add("image", "");
private void windowMonitorTimer_Tick(object sender, EventArgs e)
{
image["image"] = windowMonitorManager.MonitorScreen();
client.getSocket().Emit("Shot", image);
}
windowMonitorManager.MonitorScreen() is for return a base64 string. If I do not use client.getSocket().Emit("Shot", image), the program could run correct, but if I add this line, the program stop like 2 seconds(send nearly 80 times) and give me the error :
An unhandled exception of type 'System.OutOfMemoryException' occurred in mscorlib.dll
If I do not send the string as long as this, just a short string "hello", it sends 1600 times then occurs the same problem.
Somebody knows how to debug this problem?
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
I try to test socket.Emit(), and find it has its limit.
For example, I send a string of 10000000, after 88 times, it occurs the out of memory problem.
If I send a string of 5000000, after 170 times, it occurs the same problem.

Out of memory is mostly an exception thrown when the process consumes much larger memory than what is allowed by default system (like 2 GB on a 32 bit system), on a 64 bit system, it is higher, but is still bound by the certain practical limit, it's not the theory value of 2^64, it differs from OS to OS and is also dependent on underlying RAM, but is large enough for a single process, now this situation can happens due to multiple reasons:
Memory leak (most prominent), mostly associated with the unmanaged code calls, if there's a handle or memory allocation that is not de-allocated or freed, over a period of time it leads to huge memory allocation for a process and thus the exception, when system cannot map any more.
Managed code can too leak and I have done that, when objects gets continuously created and they are not de-referenced, i.e they are still reachable in GC context, so you can lead to this scenario, I have done this in my code :)
This is not a null reference or a corruption, so taking direct stack trace will be of little use in this scenario, simply because you may get a different stack every time, it will be like the stack of the process threads, when exception happens and it would be mostly misleading, so do not try that way. Executing method of a thread doesn't mean it caused memory leak and it will be different for different threads.
How to debug:
Number of simple steps can be taken to narrow down, but before anything ensure that you have the debug version with valid pdb files for all the loaded binaries of the process.
To know whether is it a leak, monitor process "working set", "virtual bytes" counters either via task manager or preferably via perfmon, since it is much more accurate, also it provides visual graph.
Now a leak is a leak, so steps like increasing address space in a 32 bit system to 3 GB in place of default 2 GB can only help for sometime, but perfmon will tell you if there's a stabilization point, like in few cases, process needs 2.2 GB of memory, so 2 GB by default is not enough but 3 GB in boot.config and UserVA setting for fine tuning 3 GB will help avoiding exception.
If you are using 64 bit system then that's not a worry point, but ensure that your binary is compiled for X64 or any CPU, a 32 bit binary will run as a WOW process and will have limitation on 64 bit system too.
Also try a small utility handles from sysinternals, running it mutliple times in a batch for a process will provide details of a leaking handle, in terms of number of handles like file, mutex etc allocated
Once you have confirmed a genuine leak, not a settings or configuration or system issue, then comes the memory profilers. In free ones, you get lot of information through free tools like windbg, umdh and leakdiag, they infact point you to exact stack trace which is leaking. umdh and leakdiag are both very good tools, they let you know the leaking function. Leakdiag is more exhaustive than UMDH, but for runtime heap UMDH is good enough
Professional memory profilers like:
Dot Memory - http://www.jetbrains.com/dotmemory/
Ants - Red Gate - http://www.red-gate.com/products/dotnet-development/ants-memory-profiler/
are also very good, i personally find Dot memory much more helpful and it helps in quickly pointing to the leaking function or type, with little effort, provided you have correct symbol files. Both have free download version available
Mostly resolving out of memory exception is a gradual and iterative process, because this one could be hidden deep inside, leaking a chunk memory with every execution and bringing the complete process to knees. Let me know if you need help in using a specific tool, then we can see what more can be done to debug the issue further. Happy Debugging

Sounds to me like there's a bug in the SocketIOClient DLL. Without the DLL I cannot reproduce the problem, but tracking it down sounds easy.
Since C# is a garbage collected language, the only way to get an Out Of Memory (OOM) is if you have too much memory allocated that cannot be tracked down to a 'root object'. There are several kinds of root objects:
Static variables (or threadstatic)
Methods variables (locals / arguments) in your stack trace
All objects that you reference from these two (direct / indirect) will contribute to your memory pressure. If you allocate memory that's not available, the GC will first attempt to free memory before throwing an OOM; if not enough memory is free after the GC completed, an OOM will be thrown.
One obvious reason why this might happen is because you're running a 32-bit process, which is the default in Visual Studio nowadays. This can be fixed in the project properties. However, most processes don't need more than 2 GB of memory, so it's more likely that you 'leaked' memory somewhere. So let's break it down:
Locals or arguments that leak
Ways to solve these kind of OOM:
Open visual studio, ctrl d,e (or debug -> exceptions)
Click OutOfMemoryException -> check the 'throw' box
Run the program.
When the out of memory (OOM) hits, you browse through the nodes in the stack trace (or 'parallel stacks') and check the size of the variables. In most cases it's a single buffer or collection that causes the problem. F.ex. in your case a buffer might fill up with socket data, which is never emptied.
Static variables that leak
Other cases of OOM are usually buffers that granually fill up and have a reference to the main tree. The easiest way to find these is to use a memory profiler, like Red Gate / ANTS memory profiler. Run your program in the profiler, take some snapshots and check for 'large instances'.
In general, I usually try to avoid static variables alltogether, which solves a whole world of problems.
Oh and in this case...
Perhaps it's worth noting that there are a lot of good socket libraries out there... even though I don't know the specifics of SocketIOClient, you might want to consider using a widely supported, proven socket library like WCF/SOAP or Protobuf. There's a lot of material online on how to use these in just about any scenario, so if the problem is in SocketIOClient you might want to consider that...

I would guess you are running your timer too frequently and eating memory to an unsustainable point. Did you try lowering its frequency?
If lowering the frequency doesn't help, your code or the SocketIOClient.dll library may be leaking memory. I suggest that first you review the usage of that library to verify that you are not leaving resources open.

.NET 4.5 Memory Leak

I have a problem with an application that I wrote in .NET/C#. It consists of a server which manages a few other machines, and runs tests on them. It is a windows forms application. In order to run tests with proper error handling, for each machine I have two threads: one for running tests and one that pings it continuously. Each machine has a running queue, in which tasks are stored, tasks that will be run on that particular machine.
The issue is that after some time, when more than a few tasks are present in the queue, the memory it consumes(process explorer, task manager) gradually increases from about 50-100MB to 1.6-1.8 GB. At about this limit almost every transaction(file copy on share, remote WMI access) with the remote machines fails with either "Not enough storage" or "Out of memory". I tried some tools in order to localize the string and the closest I got was .Net Memory Profiler. That wasn't of great help, because the largest amount of memory was residing in "Private Data - Unidentified". This I'm guessing it's unmanaged data, because I can evaluate every other data(from my program) down to each string and int, and every instance of it.
Can anyone suggest a tool I can use in order to properly localize the leak and maybe fix it. It would help me a lot if I would know the DLL(from my app)/Thread that uses that memory, or at least if I can view somehow what is in that memory.
Note: A lot of posts are out there about the two exceptions: Not enough storage, and Out of memory. Most of them suggest increasing the IRPStackSize on the 'server' machine(in my case, clients). I have IRPStackSize of 50(0x32) on all of the machines, including the server.
EDIT
Regarding the comments: yes, I do maintain a log, but nothing strange happens. Using a memory profiler I discovered that my application, the .NET side uses about 20MB of memory when the unmanaged part is well over 1GB. With the help of WinDbg I found out what resides in that extra memory(in most of it). In order to access the machines and run different tests on them I use WMI, for which I have a wrapper. Everything I use is being disposed(using statements, and for some actually calling the Dispose method. Strangely though, the memory is filled with clones of this class. Does anyone know why a class would clone itself in memory.
Note: the rate at which the memory usage increases in about 5MB/s, so it's not really over a long period of time. I also wonder why it is not being freed by the garbage collector. I am using C# classes to work with WMI, not COM, nor unmanaged code. Also, among the objects on the heap I see a lot of data belonging to wmiutils, CWbemError. Oddly enough, google doesn't even know the word(no results for CWbemError)

C# .NET memory leak: sawtooth memory usage when GC stage#1 and stage#2 run

I have an event driven app that I was tasked with maintaining.
About 100 events run every 30 seconds, on separate timers. Over time the events alias into a constant stream of about 1-3 events per second.
Memory usage does not appear dependent on the number of events firing in any given second.
Each event polls data from a Webservice, checks the data using a LINQ2SQL DataContext against the previously polled data (I do not dispose or null out the DataContext when done), and if the data is different, updates the database and pushes the new data as an XML message to receiver service via TCP.
This app appears to have a memory leak which
only manifests after 30m+ of running (either debug or release)
won't manifest when profiling [I'm using .NET Memory Profiler 4.5]
Characteristics:
On startup the program uses ~30MB. As time progresses this Memory usage in Task Manager will begin pogoing, first only slightly, between 50 and 150MB, and eventually gets worse, oscillating between 200MB and 1GB+. When this happens, it happens a few times within a second or two, then settles down at ~150MB for the next 10-20 or so seconds.
I've been trying to catch this behavior in action using memory profiling. So far I've been unsuccessful, I can't get the app to pogo or oscillate in memory usage anywhere near like I can when the profiler isn't watching.
However, I've been noticing a square-wave sort of pattern on the memory usage as the Garbage Collector stages 1 and 2 run that looks very similar to what I see in Task Manager, except the memory usage oscillations in the square-wave are 10MB wide, instead of 800MB+ (200MB to 1GB+). Now, according to Google Images, Garbage collection in a properly functioning app looks more like a sawtooth wave than square.
I frankly don't see any way that my app could be pogoing between 200MB and 1GB+ of memory usage within a second and NOT be spiking the CPU to 100%.
I have read about some problems that can manifest between garbage collection + event handling, but I have several paths I could go investigate and am trying to narrow down which one to spend time on. I'm still pretty slow at .NET and haven't developed the "intuition" I have for embedded devices running C that generally helps me filter what I should investigate first.
What if FEELS like is perhaps some event handlers are losing and re-gaining references to [massive amounts of data] (I don't know how this could even happen?) seeing as memory usage appears to spike back up to 1GB soon after the garbage collector runs and drops memory usage back to 200MB.
Previous versions of this app did not have these problems. Two changes I have made since then include
utilizing LINQ2SQL instead of our own data manager (which had an ADORecordSetHelper object we utilized to execute hardcoded SQL statements)
changing the piece of software we use to send the TCP XML messages to a receiver.
Due to the simplicity of the what we're doing in #2, it COULD be the source of the problem but this memory usage behavior makes me think otherwise.
I guess my main questions at this point are
Should I be calling dispose on my LINQ2SQL DataContexts before I return from the method I create them in?
Should I null them out instead?
if an exception were occuring somewhere in a method after creation of a DataContext, could it cause the DataContext to be kept in memory indefinitely?
if I store a result from a LINQ query to a value-type (ie int not var), is it lazy-loaded then, or lazy'd when the variable is used?
how possible is it for event-driven frameworks to hypothetically lose and regain references?
edit: the events have instance-based subscriptions like discussed here and are never unsubscribed for the life of the app.
edit2: finally managed to catch it in the profiler, appears to be a 200MB system.string that's being created somehow. Thanks everyone for ruling out GC behavior.

Most of the times, memory leaks are caused by weird references between objects (events and delegates are also included here).
What I think you could try is the following:
Run the application and reproduce the issue. When the private working set of memory hits a very high value, right click the process on task manager and select "Create Dump File". This will be a lot less intrusive than profiling the application live.
Download WinDBG and run it.
Open the memory dump by going to the File menu and selecting Open dump file (I cannot remember exactly what the name of the menu options is... should be easy to spot though).
Run the following commands:
.symfix
.loadby sos clr
!dumpheap -type [YourAssemblyNameSpacePrefix] -stat
The last command will give you all the instances in memory which are not CLR types, only your types. Look at the types which have a very high number of instances and try to see if anything doesn't look right.
If you see a very high number of objects of the same type run the following command which will show you all instances' addresses:
!dumheap -type [TheFullObjectTypeName]
You will need to select one single instance address. Now run the following command to see the references to that instance:
!gcroot [InstanceAddress]
Repeat step 6 a few times for different instances so that you can confirm the leak is coming from the same place or to help you identify what is causing those instances to not be collected (still being referenced by other objects).
If you don't see anything weird with your own types, change the !dumpheap command in step 4 to: !dumpheap -stat. This way you are not filtering by type and you will also see CLR types and third party libraries types.
This is a little bit complex but hopefully I was able to give you a method to help you figure out how to find memory leaks.

.net memory measuring and profiling

I understand there are many questions related to this, so I'll be very specific.
I create Console application with two instructions. Create a List with some large capacity and fill it with sample data, and then clear that List or make it equal to null.
What I want to know is if there is a way for me to know/measure/profile while debugging or not, if the actual memory used by the application after the list was cleared and null-ed is about the same as before the list was created and populated. I know for sure that the application has disposed of the information and the GC has finished collecting, but can I know for sure how much memory my application would consume after this?
I understand that during the process of filling the list, a lot of memory is allocated and after it's been cleared that memory may become available to other process if it needs it, but is it possible to measure the real memory consumed by the application at the end?
Thanks
Edit: OK, here is my real scenario and objective. I work on a WPF application that works with large amounts of data read through USB device. At some point, the application allocates about 700+ MB of memory to store all the List data, which it parses, analyzes and then writes to the filesystem. When I write the data to the filesystem, I clear all the Lists and dispose all collections that previously held the large data, so I can do another data processing. I want to know that I won't run into performance issues or eventually use up all memory. I'm fine with my program using a lot of memory, but I'm not fine with it using it all after few USB processings.
How can I go around controlling this? Are memory or process profilers used in case like this? Simply using Task Manager, I see my application taking up 800 MB of memory, but after I clear the collections, the memory stays the same. I understand this won't go down unless windows needs it, so I was wondering if I can know for sure that the memory is cleared and free to be used (by my application or windows)

It is very hard to measure "real memory" usage on Windows if you mean physical memory. Most likley you want something else like:
Amount of memory allocated for the process (see Zooba's answer)
Amount of Managed memory allocated - CLR Profiler, or any other profiler listed in this one - Best .NET memory and performance profiler?
What Task Manager reports for your application
Note that it is not necessary that after garbage collection is finished amount of memory allocated for your process (1) changes - GC may keep allocated memory for future managed allocations (this behavior is not specific to CLR for memory allcation - most memory allocators keep free blocks for later usage unless forced to release it by some means). The http://blogs.msdn.com/b/maoni/ blog is excelent source for details on GC/memory.

Process Explorer will give you all the information you need. Specifically, you will probably be most interested in the "private bytes history" graph for your process.
Alternatively, it is possible to use Window's Performance Monitor to track your specific application. This should give identical information to Process Explorer, though it will let you write the actual numbers out to a separate file.
(A picture because I can...)

I personaly use SciTech Memory Profiler
It has a real time option that you can use to see your memory usage. It has help me find a number of problems with leaking memory.

Try ANTS Profiler. Its not free but you can try the trial version.
http://www.red-gate.com/products/dotnet-development/ants-performance-profiler/

System.OutOfMemory being thrown. How to find the culprit?

I am using Visual C# Express 2008 and I have an application that starts up on a form, but uses a thread with a delegated display function to take care of essentially all the processing. That way my form doesn't lock up while tasks are being processed.
Semi-recently, after going through a repeated process a number of times (the program processes incoming data, so when data comes in, the process repeats) my app will crash with a System.OutOfMemory error.
The stack trace in the error message is useless because it only directs me to the the line where I call the delegated form control function.
I've heard people say they use ProcMon from SysInternals to see why errors like this happen. But I, for the life of me, can't figure it out. The amount of memory I am using doesn't change as the program runs, if it goes up, it comes back down. Plus, even if it was going up, how do I figure out which part of my program is the problem?
How can I go about investigating this problem?
EDIT:
So, after delving further into this issue, I looked through anything that I was ever re-declaring. There were a few instances where I had hugematrix = new uint[gigantic], so I got rid of about 3 of those.
Instead of getting rid of the error, it is now far more obscured and confusing.
My application takes the incoming data, and renders it using OpenGL. Now, instead of throwing "System.OutOfMemory" it simply does not render anything with OpenGL.
The only difference in my code is that I do not make new matrices for holding the data I plot. That way, I hope, my array stays in the same place in memory and doesn't do anything suicidal to my LOH.
Unfortunately, this twists the beast far beyond my meager means. With zero errors popping up, and all my data structures apparently still properly filled, how can I find my problem? Does OpenGL use memory in an obscure way so as to not throw exceptions when it fails? Is memory still a problem? How do I find out? All the memory profilers in the world seem to tell me very little.
EDIT:
With the boatloads of support from this community (with extra kudos to Amissico) the error has finally been rooted out. Apparently I was adding items to an OpenGL list, and never taking them off the list.
The app that finally clued me in was .Net Memory Profiler. At the time of crash it showed 1.5GB of data in the <unknown> category. Through process of elimination (everything else in the list that was named), the last thing to be checked off the list was the OpenGL rendering pipleline. The rest is history.

Based on the description in your comments, I would suspect that you are either not disposing of your images correctly or that you have severe Large Object Heap fragmentation and, when trying to allocate for a new image, don't have enough contiguous space available. See this question for more info - Large Object Heap Fragmentation

You need to use a memory profiler, such as the ants memory profiler to find out what causes this error.
Are you re-registering an event handler on every loop and not un-registering it?

CLR Profiler for the .NET Framework 2.0 at https://github.com/MicrosoftArchive/clrprofiler
The most common cause of memory fragmentation is excessive string creation.

Following considerations:
Make sure that threads you spawn are destroyed (aborted or function return). Too much threads can fail application, although in Task Manager used memory is not too high
Memory leaks. Yes, yes, you can cause them in .net pretty well without setting reference to nulls. This can be solved by using memory profilers like dotTrace or ANTS Memory Profiler

I had an OutOfMemoryException-problem as well:
Microsoft Visual C# 2008 Reducing number of loaded dlls
The reason was fragmentation of 2GB GB virtual address space and poster nobugz suggested Sysinternal's Vmmap utility which has been very helpful for diagnostics. You can use it to check if your free memory areas become more fragmented over time. (First sort by size then by type -> refresh repeat sorting and you can see if contiguous free memory blocks become smaller)

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.