I have a webjob running on Azure on their S3 standard platform meaning it has 7 gb of ram available to run my app.
On the machine 3 jobs are running of which one is the one doing all the processing and the other two handles small tasks. My problem lies in the fact that I on certain memoryintensive large tasks gets a memory exception meaning that results in the given job crashing.
The job I try to run is a very memory intensive job, and requires around 1,5 gb of ram, but based on the graph below I do not understand how this should be a problem since I never am above 2.2gb of used ram for the app service. I do have to add, that I run 3 instances, so it might be that one instance is using way more memory, but I can not find anywhere to view that information.
Memory consumption on server
When I look in the process explorer in the Kudo I see I use around 1.3gb of ram currently, which is still way below the needed memory for the job.
Kudo screenshot
The job has run without any problems no more than 2 days ago on the same server setup, so I am completely lost as to where to look.
Update: Code works fine in visual studio with same data running the same exact task.
Do anyone have ideas as to how to approach this problem
Per my understanding, you could capture the memory dump in your azure app service and analyze the dump to narrow this issue. You could refer to this tutorial about how to get a full memory dump in Azure App Services. Also, you could leverage the Crash Diagnoser extension to monitor the CPU and Memory, for more details you could refer to this blog.
Well well well first of all how do you deal with Garbage collector ?! i mean do you dispose disposable objects after they will do their tasks. You said app is being run for 2 days , seems app has clashed with more intensive memory loading stuff. As you said your app is "very memory intensive" , i guess you should fetch it out (source code) and make sure you are managing objects correctly, because garbage collector couldn't care all of your "source code mess". Good luck.
Related
I have a C# program that runs on an Ubuntu VM as a server using mono 5.10. Every now and then, the program starts using lots of memory until it eventually crashes. It can often take days before such a memory leak occurs, and I haven't been able to reproduce the issue locally.
For the past few days, I've used the mono log profiler to be able to collect heapshots on demand. This has given me a 15Gb+ file. I managed to get a heapshot from when the memory leak occurs, which displays as follows in the Xamarin profiler:
Running mprof-report gives a similarly unhelpful report.
Neither really help debug the problem. Obviously there's something going on with the threadpool, but it seems like there is no way of figuring out what.
Being able to see where the objects are allocated might help, but that requires enabling alloc in the profiler, which is not possible because of the file size it will produce.
What is causing the massive amount of IThreadPoolWorkItems? The app is using 30-40% CPU, so it doesn't seem like tasks are being scheduled more quickly than the CPU can handle.
Is there a way to list the objects referenced by the ThreadPool jobs? That would at least allow me to identify what piece of code is being run so much. mprof-report only shows inverse references as far as I can see, which isn't very helpful as the IThreadPoolWorkItems are obviously owned by the ThreadPool itself.
Update: The tasks aren't (disk) IO bound. While leaking, the reads are in the 10s of kbs per second, which shouldn't be saturating the disk.
On second thought: if there were a lot of tasks scheduled, shouldn't I be seeing lots of individual IThreadPoolWorkItem objects too? Since IThreadPoolWorkItem isn't a struct, there should be a separate entry for the actual object and IThreadPoolWorkItem[] would just be an array of pointers. So this would suggest that those IThreadPoolWorkItem[]s are just arrays of (mostly) null pointers.
Update: ThreadPool.GetAvailableThreads shows 2 worker threads and 0 completion port threads are active, which seems to indicate that the ThreadPool is doing just fine. Custom logging also indicates that once again, I'm not scheduling too many tasks and the tasks that are scheduled are finished quickly.
I am working to diagnose a series of OutOfMemoryException problems within an application of ours. This is an internal 32-bit (x86) OWIN-hosted WebAPI that runs within a console application and talks to a series of hardware components in parallel. For that reason it's creating around 20 instances of a library, and the sharp increase in "virtual size" memory matches when those instances are created.
From the output of Process Explorer, and dotMemory, it does not appear that we're allocating that much actual memory within this application:
From reading many, many SO answers I think I understand that our problem is either from fragmentation within the G0, G1, G2 & LOH heaps, or we're possibly bumping into the 2GB addressable memory limit for a 32-bit process running on Windows 7. This application works in batches where it collects a bunch of data from hardware devices, creates collections in memory to aggregate that data into a single object, and then saves it to be retrieved by a client app. This activity is the cause of the spikes in the dotMemory visual, but these data structures are not enormous, which I think the dotMemory chart shows.
Looking at the heaps has shown they rarely grow beyond 10-15MB in size, and I don't see much evidence that the LOH is growing too large or being severely fragmented. I'm really struggling with how to proceed to better understand what's happening here.
So my question is two-fold:
Is it conceivable that we could be hitting that 2GB limit for virtual memory, and that's a cause for these memory exceptions?
If that is a possible cause then am I right in thinking a 64-bit build would get around that?
We are exploring moving to a 64-bit build, but that would require updating some low-level libraries we use to also be 64-bit. It's certainly an option we will explore eventually (if not sooner), but we're trying to understand this situation better before investing the time required.
Update after setting the LARGEADDRESSFLAG
Based a recommendation I set that flag on the binary and interestingly saw the virtual size jump immediately to nearly 3GB. I don't know if I should be alarmed by that?!
I will monitor the application with this configuration for the next several hours.
In my case the advice provided by #ThomasWeller was indeed correct and enabling the "large address aware" flag has allowed this application to run for several days without throwing memory exceptions.
As the load on our Azure website has increased (along with the complexity of the work that it's doing), we've noticed that we're running into CPU utilization issues. CPU utilization gradually creeps upward over the course of several hours, even as traffic levels remain fairly steady. Over time, if the Azure stats are correct, we'll somehow manage to get > 60 seconds of CPU per instance (not quite clear how that works), and response times will start increasing dramatically.
If I restart the web server, CPU drops immediately, and then begins a slow creep back up. For instance, in the image below, you can see CPU creeping up, followed by the restart (with the red circle) and then a recovery of the CPU.
I'm strongly inclined to suspect that this is a problem somewhere in my own code, but I'm scratching my head as to how to figure it out. So far, any attempts to reproduce this on my dev or testing environments have proven ineffectual. Nearly all the suggestions for profiling IIS/C# performance seem to presume either direct access to the machine in question or at least a "Cloud Service" instance rather than an Azure Website.
I know this is a bit of a long shot, but... any suggestions, either for what it might be, or how to troubleshoot it?
(We're using C# 5.0, .NET 4.5.1, ASP.NET MVC 5.2.0, WebAPI 2.2, EF 6.1.1, Azure System Bus, Azure SQL Database, Azure redis cache, and async for every significant code path.)
Edit 8/5/14 - I've tried some of the suggestions below. But when the website is truly busy, i.e., ~100% CPU utilization, any attempt to download a mini-dump or GC dump results in a 500 error, with the message, "Not enough storage." The various times that I have been able to download a mini-dump or GC dump, they haven't shown anything particularly interesting, at least, so far as I could figure out. (For instance, the most interesting thing in the GC dump was a half dozen or so >100KB string instances - those seem to be associated with the bundling subsystem in some fashion, so I suspect that they're just cached ScriptBundle or StyleBundle instances.)
Try remote debugging to your site from visual studio.
Try https://{sitename}.scm.azurewebsites.net/ProcessExplorer/ there you can take memory dumps an GC dumps of your w3wp process.
Then you can compare 2 GC dumps to find memory leaks and open the memory dump with windbg/VS for further "offline" debugging.
I have a problem with an application that I wrote in .NET/C#. It consists of a server which manages a few other machines, and runs tests on them. It is a windows forms application. In order to run tests with proper error handling, for each machine I have two threads: one for running tests and one that pings it continuously. Each machine has a running queue, in which tasks are stored, tasks that will be run on that particular machine.
The issue is that after some time, when more than a few tasks are present in the queue, the memory it consumes(process explorer, task manager) gradually increases from about 50-100MB to 1.6-1.8 GB. At about this limit almost every transaction(file copy on share, remote WMI access) with the remote machines fails with either "Not enough storage" or "Out of memory". I tried some tools in order to localize the string and the closest I got was .Net Memory Profiler. That wasn't of great help, because the largest amount of memory was residing in "Private Data - Unidentified". This I'm guessing it's unmanaged data, because I can evaluate every other data(from my program) down to each string and int, and every instance of it.
Can anyone suggest a tool I can use in order to properly localize the leak and maybe fix it. It would help me a lot if I would know the DLL(from my app)/Thread that uses that memory, or at least if I can view somehow what is in that memory.
Note: A lot of posts are out there about the two exceptions: Not enough storage, and Out of memory. Most of them suggest increasing the IRPStackSize on the 'server' machine(in my case, clients). I have IRPStackSize of 50(0x32) on all of the machines, including the server.
EDIT
Regarding the comments: yes, I do maintain a log, but nothing strange happens. Using a memory profiler I discovered that my application, the .NET side uses about 20MB of memory when the unmanaged part is well over 1GB. With the help of WinDbg I found out what resides in that extra memory(in most of it). In order to access the machines and run different tests on them I use WMI, for which I have a wrapper. Everything I use is being disposed(using statements, and for some actually calling the Dispose method. Strangely though, the memory is filled with clones of this class. Does anyone know why a class would clone itself in memory.
Note: the rate at which the memory usage increases in about 5MB/s, so it's not really over a long period of time. I also wonder why it is not being freed by the garbage collector. I am using C# classes to work with WMI, not COM, nor unmanaged code. Also, among the objects on the heap I see a lot of data belonging to wmiutils, CWbemError. Oddly enough, google doesn't even know the word(no results for CWbemError)
I have a C#.NET service running in production. The service functions as a TCP server to which clients register and make requests against. In looking at the Task Manager, it appears to be leaking about 10MB/day. I don't seem to notice these in dev (perhaps because of far less traffic and client activity). In searching around I've read that the Task Manager can be seriously wrong, but I'm not sure how accurate this is or in what circumstances the TM would display incorrect information.
To solve this problem I need to more closely monitor memory consumption. The problem is that the leak only seems to appear in production, where the deployed service was built for Release. Also since it's a service that can't be run directly be VS with an attached profiler/debugging, I'm not sure how to best pinpoint the problem with something more precise than TM.
Any group wisdom would be much appreciated, thanks.
EDIT:
I've added perfmon counters for the privates bytes of the service (7MB to start out) as well as CLR mem in all heaps (30MB to start out)
Task manager says the total memory to be ~37MB so this seems to make sense
The first part of this is to let the service go for a day and check out my counters again.
If my private bytes get huge but CLR mem is roughly static this would indicate an unmanaged leak. If both get huge then it's a managed leak.
Thanks guys.
Your first task is figuring out if the process is leaking memory. You can do this with perfmon measuring the Private Bytes
http://www.goldstarsoftware.com/papers/CapturingVirtualBytesToALogFile.pdf
If the graph is consistently rising (for say half an hour ) you have a memory leak. You can then use other counters to figure out if this is a .NET leak (.NET memory) though this is unlikely. I find that in most of these cases, there is a COM component that is being invoked but not released.
If you truly have a memory leak (and this isn't just variable memory usage)- the process will shutdown with an out of memory exception after running for a while.
You need one of the below MemoryProfilers in order to monitor it;
http://www.jetbrains.com/profiler/
http://www.red-gate.com/products/dotnet-development/ants-memory-profiler/
There are other choices but these are very capable and you can profile remote application's memory with them (at least JetBrains's solution handles that)
Follow this guide: http://blogs.msdn.com/b/tess/archive/2008/03/25/net-debugging-demos-lab-7-memory-leak.aspx
It goes over exactly what you're describing, a memory leak in production. As was mentioned you have to first determine whether it's unmanaged code or managed code that's leaking using perfmon and Private Bytes.
In general make sure for networking objects you're wrapping them in using statements so that they're properly disposed.
A workflow I often use for managed memory leaks is to start the server on a test machine, hit it with a known amount of connections (say 123,456 connections). Then take a memory snapshot by going to task manager and right clicking on the process name and selecting 'create dump'. Open this dump with WinDBG and SOS and run the command !dumpheap -stat. Look for objects that have a multiple of 123,456 instances. Should these objects still be in memory? If not run a !gcroot on an instance of those objects to find why it's still in memory.
Get a dump of the memory when its in a leak state using the Task Manager right click on the process and select create dump file. You can also use ProcDump which gives you more options.
Use SOS Extensions in either WinDebug or Visual Studio to inspect the memory.