My small stress test, which allocates arrays of random length (100..200 MB each) in a loop, shows different behaviour on a 64-bit Win7 machine and on 32-bit XP (in a VM). Both systems first allocate as many arrays as will fit into the LOH. The LOH then grows until the available virtual address space is filled up. Expected behaviour so far. But then, on further requests, the two behave differently:
While on Win7 an OutOfMemoryException (OOM) is thrown, on XP the heap apparently keeps growing and is even swapped to disk - at least no OOM is thrown. (I don't know whether this has to do with XP running in a virtual machine.)
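For reference, a minimal sketch of the kind of stress test described above might look like this (illustrative only, not the original code; the 100..200 MB range is taken from the description):

    using System;
    using System.Collections.Generic;

    class LohStressTest
    {
        static void Main()
        {
            var rnd = new Random();
            var blocks = new List<byte[]>();
            try
            {
                while (true)
                {
                    int size = (100 + rnd.Next(101)) * 1024 * 1024; // 100..200 MB
                    blocks.Add(new byte[size]);                     // large arrays land on the LOH
                    Console.WriteLine("Allocated {0} MB, blocks held: {1}",
                                      size / (1024 * 1024), blocks.Count);
                }
            }
            catch (OutOfMemoryException)
            {
                Console.WriteLine("OOM after {0} blocks", blocks.Count);
            }
        }
    }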
Question:
How does the runtime (or the OS?) decide, for a managed allocation request that is too large to be satisfied, whether an OOM is thrown or the large object heap keeps growing - possibly even being swapped to disk?
If it is swapped, when does an OOM occur then?
IMO this question matters for any production environment that potentially deals with larger datasets. Somehow it feels more "safe" to know that the system would rather slow down dramatically in such situations (by swapping) than simply throw an OOM. At least it should be deterministic, right?
Edit: the app is a 32-bit application, therefore also running in 32-bit mode on Win7.
The normal rules apply; a managed process is not treated differently by the Windows memory manager. The ultimate source for chunks of memory is the Windows memory manager. If it cannot find a hole in the virtual address space to fit the requested allocation, it fails the VirtualAlloc() call and the CLR generates an OOM.
Same for swapping behaviour: if pages in RAM are needed to map pages of other processes, or even other pages of the same process, then they'll get swapped out. This is not otherwise associated with OOM.
You cannot assume it will work exactly the same on XP as it does on Win7 x64. Getting OOM on x64 when you build your program targeting AnyCPU is quite unusual; a 64-bit operating system has a very large virtual address space, and the upper limit is set by the maximum size of the paging file. A 32-bit program will run in the WOW emulation layer; it can have a 4 GB address space if you set the LARGEADDRESSAWARE option bit with Editbin.exe.
You can use Sysinternals' VMMap utility to see how the address space of your process is carved up.
Related
I am working to diagnose a series of OutOfMemoryException problems within an application of ours. This is an internal 32-bit (x86) OWIN-hosted WebAPI that runs within a console application and talks to a series of hardware components in parallel. For that reason it creates around 20 instances of a library, and the sharp increase in "virtual size" memory matches the point when those instances are created.
From the output of Process Explorer, and dotMemory, it does not appear that we're allocating that much actual memory within this application:
From reading many, many SO answers I think I understand that our problem is either from fragmentation within the G0, G1, G2 & LOH heaps, or we're possibly bumping into the 2GB addressable memory limit for a 32-bit process running on Windows 7. This application works in batches where it collects a bunch of data from hardware devices, creates collections in memory to aggregate that data into a single object, and then saves it to be retrieved by a client app. This activity is the cause of the spikes in the dotMemory visual, but these data structures are not enormous, which I think the dotMemory chart shows.
Looking at the heaps has shown they rarely grow beyond 10-15MB in size, and I don't see much evidence that the LOH is growing too large or being severely fragmented. I'm really struggling with how to proceed to better understand what's happening here.
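One way to narrow this down is to log the managed heap size next to the process-level numbers; if the virtual size climbs toward the 32-bit ceiling while the managed heap stays small, address-space exhaustion or fragmentation is the likelier culprit. A minimal sketch of such a snapshot (illustrative only, using standard APIs):

    using System;
    using System.Diagnostics;

    class MemorySnapshot
    {
        static void Main()
        {
            var p = Process.GetCurrentProcess();
            long managed = GC.GetTotalMemory(forceFullCollection: false);
            Console.WriteLine("Managed heap:   {0:N0} MB", managed / (1024 * 1024));
            Console.WriteLine("Private bytes:  {0:N0} MB", p.PrivateMemorySize64 / (1024 * 1024));
            Console.WriteLine("Virtual size:   {0:N0} MB", p.VirtualMemorySize64 / (1024 * 1024));
            Console.WriteLine("64-bit process: {0}", Environment.Is64BitProcess);
        }
    }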
So my question is two-fold:
Is it conceivable that we could be hitting that 2 GB virtual memory limit, and that this is the cause of these memory exceptions?
If that is a possible cause then am I right in thinking a 64-bit build would get around that?
We are exploring moving to a 64-bit build, but that would require updating some low-level libraries we use to also be 64-bit. It's certainly an option we will explore eventually (if not sooner), but we're trying to understand this situation better before investing the time required.
Update after setting the LARGEADDRESSAWARE flag
Based on a recommendation I set that flag on the binary and, interestingly, saw the virtual size jump immediately to nearly 3 GB. I don't know if I should be alarmed by that?!
I will monitor the application with this configuration for the next several hours.
In my case the advice provided by @ThomasWeller was indeed correct, and enabling the "large address aware" flag has allowed this application to run for several days without throwing memory exceptions.
Today's PCs have a large amount of physical RAM, but the default stack size in C# is still only 1 MB for 32-bit processes and 4 MB for 64-bit processes (Stack capacity in C#).
Why is the stack size in the CLR still so limited?
And why is it exactly 1 MB (4 MB), and not 2 MB or 512 KB? Why were these amounts chosen?
I am interested in the considerations and reasons behind that decision.
You are looking at the guy who made that choice. David Cutler and his team selected one megabyte as the default stack size. Nothing to do with .NET or C#, this was nailed down when they created Windows NT. One megabyte is what gets picked when the EXE header of a program or the CreateThread() winapi call doesn't specify the stack size explicitly. Which is the normal way, almost any programmer leaves it up to the OS to pick the size.
That choice probably pre-dates the Windows NT design, history is way too murky about this. Would be nice if Cutler would write a book about it, but he's never been a writer. He's been extraordinarily influential on the way computers work. His first OS design was RSX-11M, a 16-bit operating system for DEC computers (Digital Equipment Corporation). It heavily influenced Gary Kildall's CP/M, the first decent OS for 8-bit microprocessors. Which heavily influenced MS-DOS.
His next design was VMS, an operating system for 32-bit processors with virtual memory support. Very successful. His next one was cancelled by DEC around the time the company started disintegrating, not being able to compete with cheap PC hardware. Cue Microsoft, they made him an offer he could not refuse. Many of his co-workers joined too. They worked on VMS v2, better known as Windows NT. DEC got upset about it, money changed hands to settle it. Whether VMS already picked one megabyte is something I don't know, I only know RSX-11 well enough. It isn't unlikely.
Enough history. One megabyte is a lot, a real thread rarely consumes more than a couple of handfuls of kilobytes. So a megabyte is actually rather wasteful. It is however the kind of waste you can afford on a demand-paged virtual memory operating system, that megabyte is just virtual memory. Just numbers to the processor, one each for every 4096 bytes. You never actually use the physical memory, the RAM in the machine, until you actually address it.
It is extra excessive in a .NET program because the one megabyte size was originally picked to accommodate native programs. Which tend to create large stack frames, storing strings and buffers (arrays) on the stack as well. Infamous for being a malware attack vector, a buffer overflow can manipulate the program with data. Not the way .NET programs work, strings and arrays are allocated on the GC heap and indexing is checked. The only way to allocate space on the stack with C# is with the unsafe stackalloc keyword.
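As a small illustration of that last point (not from the answer above, and it requires compiling with /unsafe), this is roughly what explicit stack allocation looks like in C#:

    class StackAllocDemo
    {
        static unsafe void Fill()
        {
            // 256 bytes carved directly out of the current thread's stack frame;
            // the space is reclaimed automatically when the method returns.
            byte* buffer = stackalloc byte[256];
            for (int i = 0; i < 256; i++)
                buffer[i] = (byte)i;
        }
    }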
The only non-trivial usage of the stack in .NET is by the jitter. It uses the stack of your thread to just-in-time compile MSIL to machine code. I've never seen or checked how much space it requires, it rather depends on the nature of the code and whether or not the optimizer is enabled, but a couple of tens of kilobytes is a rough guess. Which is otherwise how this website got its name, a stack overflow in a .NET program is quite fatal. There isn't enough space left (less than 3 kilobytes) to still reliably JIT any code that tries to catch the exception. Kaboom to desktop is the only option.
Last but not least, a .NET program does something pretty unproductive with the stack. The CLR will commit the stack of a thread. That's an expensive word that means that it doesn't just reserve the size of the stack, it also makes sure that space is reserved in the operating system's paging file so the stack can always be swapped out when necessary. Failing to commit is a fatal error and terminates a program unconditionally. That only happens on a machine with very little RAM that runs entirely too many processes, such a machine will have turned to molasses before programs start dying. A possible problem 15+ years ago, not today. Programmers that tune their program to act like an F1 race car use the <disableCommitThreadStack> element in their .config file.
Fwiw, Cutler didn't stop designing operating systems. That photo was taken while he worked on Azure.
Update, I noticed that .NET no longer commits the stack. Not exactly sure when or why this happened, it's been too long since I checked. I'm guessing this design change happened somewhere around .NET 4.5. Pretty sensible change.
The default reserved stack size is specified by the linker, and it can be overridden by developers by changing the PE value at link time, or for an individual thread by specifying the dwStackSize parameter of the CreateThread WinAPI function.
If you create a thread with an initial stack size larger than or equal to the default stack size, it is rounded up to the nearest multiple of 1 MB.
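On the .NET side this per-thread override surfaces as the Thread constructor overload that takes a maxStackSize argument; a minimal sketch (illustrative only):

    using System;
    using System.Threading;

    class StackSizeDemo
    {
        static void Main()
        {
            // Request a 256 KB stack instead of the default taken from the EXE header.
            var worker = new Thread(() => Console.WriteLine("running"), 256 * 1024);
            worker.Start();
            worker.Join();
        }
    }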
Why does the value equal 1 MB for 32-bit processes and 4 MB for 64-bit ones? I think you should ask the developers who designed Windows, or wait until one of them answers your question.
Probably Mark Russinovich knows, and you could contact him. Maybe you can find this information in his Windows Internals books earlier than the sixth edition, which describes less about stacks than his article. Or maybe Raymond Chen knows the reasons, since he writes interesting things about Windows internals and its history. He could answer your question too, but you would have to post a suggestion to the Suggestion Box.
But for now I'll try to explain some probable reasons why Microsoft chose these values, using MSDN and Mark's and Raymond's blogs.
The defaults probably have these values because in early times PCs were slow and allocating memory on the stack was much faster than allocating memory on the heap. And since stack allocations were much cheaper, they were used heavily, which required a larger stack.
So these values were the optimal reserved stack size for most applications: optimal because they allow plenty of nested calls and enough room to allocate structures on the stack to pass to called functions, while at the same time allowing a lot of threads to be created.
Nowadays these values are mostly kept for backward compatibility, because structures that are passed as parameters to WinAPI functions are still allocated on the stack. But if you're not using stack allocations, a thread's stack usage will be significantly less than the default 1 MB, and that is wasteful, as Hans Passant mentioned. To prevent this, the OS commits only the first page of the stack (4 KB), unless something else is specified in the PE header of the application. The other pages are allocated on demand.
Some applications override the reserved address space and the initially committed size to optimize memory usage. As an example, the maximum stack size of a thread in an IIS native process is 256 KB (KB932909). And this decreasing of the default values is recommended by Microsoft:
It is best to choose as small a stack size as possible and commit the stack that is needed for the thread or fiber to run reliably. Every page that is reserved for the stack cannot be used for any other purpose.
Sources:
Thread Stack Size (Microsoft Docs)
Pushing the Limits of Windows: Processes and Threads (Mark Russinovich)
By default, the maximum stack size of a thread that is created in a native IIS process is 256 KB (KB932909)
I have a backend application (a Windows service) built on top of .NET Framework 4.5 (C#). The application runs on a Windows Server 2008 R2 server with 64 GB of memory.
Due to dependencies I had, I used to compile and run this application as a 32-bit process (compiled as x86) and use the /LARGEADDRESSAWARE flag to let the application use more than 2 GB of memory in user space. With this configuration, the average memory consumption (according to the "memory (private working set)" column in Task Manager) was about 300-400 MB.
The reason I needed the LARGEADDRESSAWARE flag, and the reason I changed it to 64-bit, is that although 300-400 MB is the average, once in a while this app does stuff that involves loading a lot of data into memory (and it's much easier to develop and manage this kind of thing when you're not very limited memory-wise).
Recently (after removing those x86 native dependencies), I changed the application's compilation to "Any CPU", so now, on the production server, it runs as a 64-bit process. Since I made this change, the average memory consumption (according to Task Manager) has reached new levels: 3-4 GB, although there is no other change that might explain this change in behavior.
Here are some additional facts about the current state:
According to the "# Bytes in all Heaps" counter, the total amount of managed memory is about 600 MB.
When debugging the process with WinDbg+SOS, !dumpheap -stat showed that there are about 250-300 MB free, and that all the other objects account for much less than the total amount of memory the process uses.
According to the GC performance counters, there are Gen0 collections on a regular basis. In fact, the "% Time in GC" counter indicates that 10-20% of the time, on average, is spent on GC (which makes sense given the nature of the application - a lot of allocations of information and data structures that are in use for a short time).
I'm using Server GC in this app.
There is no memory problem on the server. It uses about 50-60% of the available memory (64GB).
My questions:
Why is there such a great difference between the memory allocated to the process (according to Task Manager) and the actual size of the CLR heaps (there is no unmanaged code in the process that could explain this)?
Why does the 64-bit process take more memory than the same process running as a 32-bit process? Even considering that pointers take twice the size, there's a big difference.
Can I do something to lower the memory consumption, or at least get a better understanding of the issue?
Thanks!
There are a few things to consider:
1) You mentioned you're using Server GC mode. In Server GC mode, the CLR creates one heap for every CPU core on the machine, which is more efficient for multi-threaded processing in server processes, e.g. ASP.NET processes. Each heap has two segments: one for small objects, one for large objects. Each segment starts with 4 GB of reserved memory. Basically, Server GC mode tries to use more memory on the system to trade for overall system performance. (A quick way to confirm the GC mode at runtime is sketched at the end of this answer.)
2) Pointers are bigger on 64-bit, of course.
3) Foreground Gen2 GC becomes super expensive in Server GC mode because the heap is much larger. So the CLR tries very hard to reduce the number of foreground Gen2 collections, sometimes using background Gen2 GC instead.
4) Depending on usage, fragmentation can become a real issue. I've seen heaps with 98% fragmentation (98% of the heap being free blocks).
To really solve your problem, you need to get an ETW trace plus a memory dump, and then use tools like PerfView for detailed analysis.
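Regarding point 1), it can also be worth confirming at runtime which GC flavour the process actually ended up with, independent of what the .config file requests. A small sketch (GCSettings.IsServerGC is available from .NET 4.5 on):

    using System;
    using System.Runtime;

    class GcModeCheck
    {
        static void Main()
        {
            Console.WriteLine("Server GC:    {0}", GCSettings.IsServerGC);
            Console.WriteLine("Latency mode: {0}", GCSettings.LatencyMode);
            Console.WriteLine("64-bit:       {0}", Environment.Is64BitProcess);
        }
    }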
A 64-bit process will naturally use 64-bit pointers, effectively doubling the memory usage of every reference. Certain platform-dependent variables such as IntPtr will also take up double the space.
The first and best thing you can do is to run a memory profiler to see where exactly the extra memory footprint is coming from. Anything else is speculative!
I have a C# application which I used to run on an XP machine.
I recently switched to a Windows 7 machine.
I get the following error message in the debugger: "System.StackOverflowException". I still have the XP machine, and I don't have the problem on that one.
It's overflowing in the middle of a recursive algorithm.
Is anyone familiar with this problem? Is it the OS that has to do with this, or the machine itself?
Many thanks for your help,
Michael
It would be helpful to know just how deep the recursion goes in XP before reaching the base case, and where it errors in Win7.
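One quick-and-dirty way to get that number is to run a throwaway recursion on both machines and read the last depth it managed to print before the process died (illustrative only; the per-frame size of a dummy method differs from your real algorithm, so treat the result as a relative comparison, not an absolute limit):

    using System;

    class DepthProbe
    {
        static void Main()
        {
            Recurse(0);
        }

        static int Recurse(int depth)
        {
            if (depth % 1000 == 0)
                Console.WriteLine(depth);      // the last value printed approximates the maximum depth
            return Recurse(depth + 1) + 1;     // the "+ 1" keeps the JIT from turning this into a loop
        }
    }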
Theoretically, a Windows 7 process should have more available stack space than a WinXP process; at the very least, they should be the same. However, there are other factors at play here. Check out this blog post: http://blogs.technet.com/b/markrussinovich/archive/2009/07/08/3261309.aspx
In short, the limiting factor is usually "resident available memory"; this is physical RAM (not page file space) that is available for data that must be kept there and can't be swapped to the page file. A lot of things must be kept "resident" on the average computer and cannot be swapped out to the page file; most important is that anything that must be run in "kernel mode" (requiring direct access to the core system) must be kept in RAM to avoid page faults, even when there are no active threads for that process at the time.
Windows 7 has more of these "kernel-mode" processes. For instance, Windows Aero (which wasn't part of WinXP) uses your graphics card to accelerate rendering of the desktop, and so it must run in kernel mode. The Windows 7 kernel itself is larger, because it includes additional security and additional built-in hardware support. Windows 7 also has additional background processes etc that run in kernel mode that weren't in WinXP.
So, all other things being equal (including RAM), a Windows 7 machine will actually have less resident memory available to commit to your recursive algorithm, meaning that the algorithm will not be able to recurse deeply enough to reach the base case before a call triggers a StackOverflowException due to Windows not having enough resident memory to meet the "commit" required for the new call.
In addition, Windows 7 arranges things in memory differently. Older Windows versions (XP and earlier) reserved memory space for each new process in roughly sequential fashion: the N+1th process (or thread) was given a memory address one block after the last one reserved for the Nth process/thread. Beginning with Windows Vista, memory is allocated in a more "random" fashion: Windows chooses a location in memory that may or may not be adjacent to any other reserved block (it's only guaranteed not to be part of any other reserved block). This is a security feature designed to confuse malware and prevent it from successfully snooping around in other processes' memory. However, the less space-efficient allocation scheme means that the OS will more quickly run out of 1 MB blocks of contiguous address space to allocate to each new thread, at which point it begins filling in the gaps. So, depending on your Windows 7 machine's specific memory-usage footprint, the thread for your recursive function may request the usual 1 MB of stack space and be given a pointer by the OS that actually only has 128 KB of contiguous space behind it. Your program won't be able to tell the difference until it can't actually commit all the space it thought it had reserved. This can produce Heisenbugs, where it works one time but fails the next, because of non-deterministic differences in the exact memory space Windows reserves for the thread each time.
The answer to all of this is "more RAM". The amount needed by the core kernel-mode processes is relatively static, so every GB of additional RAM you can add is a GB that is available solely for user program processes and threads.
How recursive is recursive?
Anything deeper than about ten or so could be risky.
If you're exhausting the stack and you're sure it's not a bug, you could manage your own stack...
For instance:
void Process(SomeType foo)
{
    DoWork(foo); // work on foo
    foreach (var child in foo.Children)
    {
        Process(child);
    }
}
could become
void Process(SomeType foo)
{
    Stack<SomeType> bar = new Stack<SomeType>();
    bar.Push(foo);
    while (bar.Any())
    {
        var item = bar.Pop();
        DoWork(item); // work on item
        foreach (var child in item.Children)
        {
            bar.Push(child);
        }
    }
}
thus eliminating any CLR call-stack problems.
Of course, this won't fix an unbounded recursion.
I don't believe this has anything to do with the physical RAM in your PC. I suspect the reason you didn't happen to see it on XP is simply that Windows 7 probably has a (slightly?) different version of .NET.
Clearly, you need to somehow limit the depth of your recursion (or substitute a non-recursive loop).
But you can potentially configure your .Net stack(s). Please look at these links:
http://www.atalasoft.com/cs/blogs/rickm/archive/2008/04/22/increasing-the-size-of-your-stack-net-memory-management-part-3.aspx
http://msdn.microsoft.com/en-us/library/5cykbwz4.aspx
How does the .NET IL .maxstack directive work?
I have created a volume class (called VoxelVolume) with self-organizing memory management, since the GC in C# didn't provide a good mechanism for managing the contents of the volume for mapping, unmapping and remapping. Although I could have used the mechanisms of virtual memory, the problem is that the files are often too large to fit into the page file, and I don't want to force users to increase their page-file size.
Currently this system is working quite well, and there is no problem with lacking resources or OutOfMemoryExceptions, since the InsufficientMemoryException raised via the MemoryFailPoint works quite well. This was all tested on a 32-bit WinXP system with 2 GB of main memory.
Running the same mechanism on a 64-bit system with 32 GB of main memory also works well, but when the application runs, the MemoryFailPoint suddenly throws an exception although 24 GB of main memory are still free. Another point: once the MemoryFailPoint has fired, it fires every time, and there is no way to get rid of it.
From what I have read so far, there is a small object heap and a large object heap (SOH and LOH), but only the SOH is really taken care of by the GC, and I can free the SOH of unused objects by calling GC.Collect() and GC.WaitForPendingFinalizers(). The MemoryFailPoint is obviously the only way to get a little bit of control over the LOH, but since there is enough memory left on the system, I see no reason why the MemoryFailPoint should fire.
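For context, the MemoryFailPoint gate described above typically looks roughly like this (method and variable names are made up for illustration, not taken from the original VoxelVolume code):

    using System;
    using System.Runtime;

    class VolumeLoader
    {
        static void LoadSlab(int sizeInMegabytes)
        {
            try
            {
                // Gate the allocation: this throws InsufficientMemoryException up front
                // instead of risking an OutOfMemoryException halfway through the load.
                using (new MemoryFailPoint(sizeInMegabytes))
                {
                    var slab = new byte[sizeInMegabytes * 1024L * 1024L];
                    // ... fill the slab from the volume file ...
                }
            }
            catch (InsufficientMemoryException)
            {
                Console.WriteLine("Not enough memory for a {0} MB slab; unmap something first.", sizeInMegabytes);
            }
        }
    }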
Does anyone here have experience with using MemoryFailPoint?
Thank you for your help
Martin
I suppose the MemoryFailPoint fires due to memory fragmentation.
On a 64-bit system you still cannot allocate a single chunk bigger than 2 GB, as far as I know.