Why is stack size in C# exactly 1 MB? - c#

Today's PCs have a large amount of physical RAM but still, the stack size of C# is only 1 MB for 32-bit processes and 4 MB for 64-bit processes (Stack capacity in C#).
Why the stack size in CLR is still so limited?
And why is it exactly 1 MB (4 MB) (and not 2 MB or 512 KB)? Why was it decided to use these amounts?
I am interested in considerations and reasons behind that decision.

You are looking at the guy that made that choice. David Cutler and his team selected one megabyte as the default stack size. Nothing to do with .NET or C#, this was nailed down when they created Windows NT. One megabyte is what it picks when the EXE header of a program or the CreateThread() winapi call doesn't specify the stack size explicitly. Which is the normal way, almost any programmer leaves it up the OS to pick the size.
That choice probably pre-dates the Windows NT design, history is way too murky about this. Would be nice if Cutler would write a book about it, but he's never been a writer. He's been extraordinarily influential on the way computers work. His first OS design was RSX-11M, a 16-bit operating system for DEC computers (Digital Equipment Corporation). It heavily influenced Gary Kildall's CP/M, the first decent OS for 8-bit microprocessors. Which heavily influenced MS-DOS.
His next design was VMS, an operating system for 32-bit processors with virtual memory support. Very successful. His next one was cancelled by DEC around the time the company started disintegrating, not being able to compete with cheap PC hardware. Cue Microsoft, they made him a offer he could not refuse. Many of his co-workers joined too. They worked on VMS v2, better known as Windows NT. DEC got upset about it, money changed hands to settle it. Whether VMS already picked one megabyte is something I don't know, I only know RSX-11 well enough. It isn't unlikely.
Enough history. One megabyte is a lot, a real thread rarely consumes more than a couple of handfuls of kilobytes. So a megabyte is actually rather wasteful. It is however the kind of waste you can afford on a demand-paged virtual memory operating system, that megabyte is just virtual memory. Just numbers to the processor, one each for every 4096 bytes. You never actually use the physical memory, the RAM in the machine, until you actually address it.
It is extra excessive in a .NET program because the one megabyte size was originally picked to accommodate native programs. Which tend to create large stack frames, storing strings and buffers (arrays) on the stack as well. Infamous for being a malware attack vector, a buffer overflow can manipulate the program with data. Not the way .NET programs work, strings and arrays are allocated on the GC heap and indexing is checked. The only way to allocate space on the stack with C# is with the unsafe stackalloc keyword.
The only non-trivial usage of the stack in .NET is by the jitter. It uses the stack of your thread to just-in-time compile MSIL to machine code. I've never seen or checked how much space it requires, it rather depends on the nature of the code and whether or not the optimizer is enabled, but a couple of tens of kilobytes is a rough guess. Which is otherwise how this website got its name, a stack overflow in a .NET program is quite fatal. There isn't enough space left (less than 3 kilobytes) to still reliably JIT any code that tries to catch the exception. Kaboom to desktop is the only option.
Last but not least, a .NET program does something pretty unproductive with the stack. The CLR will commit the stack of a thread. That's an expensive word that means that it doesn't just reserve the size of the stack, it also makes sure that space is reserved in the operating system's paging file so the stack can always be swapped out when necessary. Failing to commit is a fatal error and terminates a program unconditionally. That only happens on machine with very little RAM that runs entirely too many processes, such a machine will have turned to molasses before programs start dying. A possible problem 15+ years ago, not today. Programmers that tune their program to act like an F1 race-car use the <disableCommitThreadStack> element in their .config file.
Fwiw, Cutler didn't stop designing operating systems. That photo was made while he worked on Azure.
Update, I noticed that .NET no longer commits the stack. Not exactly sure when or why this happened, it's been too long since I checked. I'm guessing this design change happened somewhere around .NET 4.5. Pretty sensible change.

The default reserved stack size is specified by the linker and it can be overridden by developers via changing the PE value at the link time or for an individual thread by specifying the dwStackSize parameter for the CreateThread WinAPI function.
If you create a thread with the initial stack size larger than or equal to the default stack size then it rounded up to the nearest multiple of 1 MB.
Why the value equals to 1 MB for 32-bit processes and 4 MB for 64-bit? I think you should ask developers, who designed Windows, or wait until someone of them answers your question.
Probably Mark Russinovich knows that and you can contact him. Maybe you can find this information in his Windows Internals books earlier than sixth edition which describes less info about stacks rather than his article. Or maybe Raymond Chen knows reasons since he writes interesting things about Windows internals and its history. He can answer your question too, but you should post a suggestion to the Suggestion Box.
But at this time I'll try to explain some probable reasons why Microsoft have choose these values using MSDN, Mark's and Raymond's blogs.
The defaults have these values probably because in early times PCs were slow and allocating memory on the stack was much faster than allocating memory in the heap. And since stack allocations were much cheaper they were used, but it required a larger stack size.
So the value were the optimal reserved stack size for most of applications. It's optimal because allows to make a lot of nested calls and allocate memory on the stack to pass structures to calling functions. At the same time it allows to create a lot threads.
Nowadays these values are mostly used for backward compatibility, because structures which are passed as parameters to WinAPI functions are still allocated on the stack. But if you're not using stack allocations then a thread's stack usage will be significantly less than the default 1 MB and it is wasteful as Hans Passant mentioned. And to prevent this the OS commits only the first page of the stack (4 KB), if other isn't specified in the PE header of the application. Other pages are allocated on demand.
Some applications override reserved address space and initially committed to optimize memory usage. As an example, the maximum stack size of an IIS native process's thread is 256 KB (KB932909). And this decreasing of the default values is recommended by Microsoft:
It is best to choose as small a stack size as possible and commit the stack that is needed for the thread or fiber to run reliably. Every page that is reserved for the stack cannot be used for any other purpose.
Sources:
Thread Stack Size (Microsoft Docs)
Pushing the Limits of Windows: Processes and Threads (Mark Russinovich)
By default, the maximum stack size of a thread that is created in a native IIS process is 256 KB (KB932909)

Related

Is it conceivable that the Virtual Size reported by Process Explorer could cause OutOfMemory exceptions?

I am working to diagnose a series of OutOfMemoryException problems within an application of ours. This is an internal 32-bit (x86) OWIN-hosted WebAPI that runs within a console application and talks to a series of hardware components in parallel. For that reason it's creating around 20 instances of a library, and the sharp increase in "virtual size" memory matches when those instances are created.
From the output of Process Explorer, and dotMemory, it does not appear that we're allocating that much actual memory within this application:
From reading many, many SO answers I think I understand that our problem is either from fragmentation within the G0, G1, G2 & LOH heaps, or we're possibly bumping into the 2GB addressable memory limit for a 32-bit process running on Windows 7. This application works in batches where it collects a bunch of data from hardware devices, creates collections in memory to aggregate that data into a single object, and then saves it to be retrieved by a client app. This activity is the cause of the spikes in the dotMemory visual, but these data structures are not enormous, which I think the dotMemory chart shows.
Looking at the heaps has shown they rarely grow beyond 10-15MB in size, and I don't see much evidence that the LOH is growing too large or being severely fragmented. I'm really struggling with how to proceed to better understand what's happening here.
So my question is two-fold:
Is it conceivable that we could be hitting that 2GB limit for virtual memory, and that's a cause for these memory exceptions?
If that is a possible cause then am I right in thinking a 64-bit build would get around that?
We are exploring moving to a 64-bit build, but that would require updating some low-level libraries we use to also be 64-bit. It's certainly an option we will explore eventually (if not sooner), but we're trying to understand this situation better before investing the time required.
Update after setting the LARGEADDRESSFLAG
Based a recommendation I set that flag on the binary and interestingly saw the virtual size jump immediately to nearly 3GB. I don't know if I should be alarmed by that?!
I will monitor the application with this configuration for the next several hours.
In my case the advice provided by #ThomasWeller was indeed correct and enabling the "large address aware" flag has allowed this application to run for several days without throwing memory exceptions.

When I use Socket.IO, why I got an error An unhandled exception of type 'System.OutOfMemoryException'

I coded a program to get the screen shot and send to the server. Every time, I got a screenshot and turned into base64 then sent it using Socket.IO. (using SocketIOClient.dll)
Dictionary<string, string> image = new Dictionary<string, string>();
image.add("image", "");
private void windowMonitorTimer_Tick(object sender, EventArgs e)
{
image["image"] = windowMonitorManager.MonitorScreen();
client.getSocket().Emit("Shot", image);
}
windowMonitorManager.MonitorScreen() is for return a base64 string. If I do not use client.getSocket().Emit("Shot", image), the program could run correct, but if I add this line, the program stop like 2 seconds(send nearly 80 times) and give me the error :
An unhandled exception of type 'System.OutOfMemoryException' occurred in mscorlib.dll
If I do not send the string as long as this, just a short string "hello", it sends 1600 times then occurs the same problem.
Somebody knows how to debug this problem?
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
I try to test socket.Emit(), and find it has its limit.
For example, I send a string of 10000000, after 88 times, it occurs the out of memory problem.
If I send a string of 5000000, after 170 times, it occurs the same problem.
Out of memory is mostly an exception thrown when the process consumes much larger memory than what is allowed by default system (like 2 GB on a 32 bit system), on a 64 bit system, it is higher, but is still bound by the certain practical limit, it's not the theory value of 2^64, it differs from OS to OS and is also dependent on underlying RAM, but is large enough for a single process, now this situation can happens due to multiple reasons:
Memory leak (most prominent), mostly associated with the unmanaged code calls, if there's a handle or memory allocation that is not de-allocated or freed, over a period of time it leads to huge memory allocation for a process and thus the exception, when system cannot map any more.
Managed code can too leak and I have done that, when objects gets continuously created and they are not de-referenced, i.e they are still reachable in GC context, so you can lead to this scenario, I have done this in my code :)
This is not a null reference or a corruption, so taking direct stack trace will be of little use in this scenario, simply because you may get a different stack every time, it will be like the stack of the process threads, when exception happens and it would be mostly misleading, so do not try that way. Executing method of a thread doesn't mean it caused memory leak and it will be different for different threads.
How to debug:
Number of simple steps can be taken to narrow down, but before anything ensure that you have the debug version with valid pdb files for all the loaded binaries of the process.
To know whether is it a leak, monitor process "working set", "virtual bytes" counters either via task manager or preferably via perfmon, since it is much more accurate, also it provides visual graph.
Now a leak is a leak, so steps like increasing address space in a 32 bit system to 3 GB in place of default 2 GB can only help for sometime, but perfmon will tell you if there's a stabilization point, like in few cases, process needs 2.2 GB of memory, so 2 GB by default is not enough but 3 GB in boot.config and UserVA setting for fine tuning 3 GB will help avoiding exception.
If you are using 64 bit system then that's not a worry point, but ensure that your binary is compiled for X64 or any CPU, a 32 bit binary will run as a WOW process and will have limitation on 64 bit system too.
Also try a small utility handles from sysinternals, running it mutliple times in a batch for a process will provide details of a leaking handle, in terms of number of handles like file, mutex etc allocated
Once you have confirmed a genuine leak, not a settings or configuration or system issue, then comes the memory profilers. In free ones, you get lot of information through free tools like windbg, umdh and leakdiag, they infact point you to exact stack trace which is leaking. umdh and leakdiag are both very good tools, they let you know the leaking function. Leakdiag is more exhaustive than UMDH, but for runtime heap UMDH is good enough
Professional memory profilers like:
Dot Memory - http://www.jetbrains.com/dotmemory/
Ants - Red Gate - http://www.red-gate.com/products/dotnet-development/ants-memory-profiler/
are also very good, i personally find Dot memory much more helpful and it helps in quickly pointing to the leaking function or type, with little effort, provided you have correct symbol files. Both have free download version available
Mostly resolving out of memory exception is a gradual and iterative process, because this one could be hidden deep inside, leaking a chunk memory with every execution and bringing the complete process to knees. Let me know if you need help in using a specific tool, then we can see what more can be done to debug the issue further. Happy Debugging
Sounds to me like there's a bug in the SocketIOClient DLL. Without the DLL I cannot reproduce the problem, but tracking it down sounds easy.
Since C# is a garbage collected language, the only way to get an Out Of Memory (OOM) is if you have too much memory allocated that cannot be tracked down to a 'root object'. There are several kinds of root objects:
Static variables (or threadstatic)
Methods variables (locals / arguments) in your stack trace
All objects that you reference from these two (direct / indirect) will contribute to your memory pressure. If you allocate memory that's not available, the GC will first attempt to free memory before throwing an OOM; if not enough memory is free after the GC completed, an OOM will be thrown.
One obvious reason why this might happen is because you're running a 32-bit process, which is the default in Visual Studio nowadays. This can be fixed in the project properties. However, most processes don't need more than 2 GB of memory, so it's more likely that you 'leaked' memory somewhere. So let's break it down:
Locals or arguments that leak
Ways to solve these kind of OOM:
Open visual studio, ctrl d,e (or debug -> exceptions)
Click OutOfMemoryException -> check the 'throw' box
Run the program.
When the out of memory (OOM) hits, you browse through the nodes in the stack trace (or 'parallel stacks') and check the size of the variables. In most cases it's a single buffer or collection that causes the problem. F.ex. in your case a buffer might fill up with socket data, which is never emptied.
Static variables that leak
Other cases of OOM are usually buffers that granually fill up and have a reference to the main tree. The easiest way to find these is to use a memory profiler, like Red Gate / ANTS memory profiler. Run your program in the profiler, take some snapshots and check for 'large instances'.
In general, I usually try to avoid static variables alltogether, which solves a whole world of problems.
Oh and in this case...
Perhaps it's worth noting that there are a lot of good socket libraries out there... even though I don't know the specifics of SocketIOClient, you might want to consider using a widely supported, proven socket library like WCF/SOAP or Protobuf. There's a lot of material online on how to use these in just about any scenario, so if the problem is in SocketIOClient you might want to consider that...
I would guess you are running your timer too frequently and eating memory to an unsustainable point. Did you try lowering its frequency?
If lowering the frequency doesn't help, your code or the SocketIOClient.dll library may be leaking memory. I suggest that first you review the usage of that library to verify that you are not leaving resources open.

System.StackOverflowException from XP machine to 7.0 machine

I have a c# application which I used to run on a XP machine.
I switched recently to a Windows 7.0 machine.
I have the following error message when being in debugger: "System.StackOverflowException". Still have the XP machine, don't have the problem with this one.
It's overflowing in the middle of a recursive algorithm.
Anyone familiar with this problem? Is that the OS which has to do with this or is that the machine itself?
Many thanks for your help,
Michael
It would be helpful to know just how deep the recursion goes in XP before reaching the base case, and where it errors in Win7.
Theoretically, a Windows 7 process should have more available stack space than a WinXP process; at the very least, they should be the same. However, there are other factors at play here. Check out this blog post: http://blogs.technet.com/b/markrussinovich/archive/2009/07/08/3261309.aspx
In short, the limiting factor is usually "resident available memory"; this is physical RAM (not page file space) that is available for data that must be kept there and can't be swapped to the page file. A lot of things must be kept "resident" on the average computer and cannot be swapped out to the page file; most important is that anything that must be run in "kernel mode" (requiring direct access to the core system) must be kept in RAM to avoid page faults, even when there are no active threads for that process at the time.
Windows 7 has more of these "kernel-mode" processes. For instance, Windows Aero (which wasn't part of WinXP) uses your graphics card to accelerate rendering of the desktop, and so it must run in kernel mode. The Windows 7 kernel itself is larger, because it includes additional security and additional built-in hardware support. Windows 7 also has additional background processes etc that run in kernel mode that weren't in WinXP.
So, all other things being equal (including RAM), a Windows 7 machine will actually have less resident memory available to commit to your recursive algorithm, meaning that the algorithm will not be able to recurse deeply enough to reach the base case before a call triggers a StackOverflowException due to Windows not having enough resident memory to meet the "commit" required for the new call.
In addition, Windows 7 arranges things in memory differently. Older Windows versions (XP and older) reserved a memory space for each new process in roughly sequential fashion; the N+1th process (or thread) is given a memory address one block after the last one reserved for the Nth process/thread. Beginning with Windows Vista, memory was allocated in a more "random" fashion; Windows will choose a location in memory that may or may not be adjacent to any other reserved block (it's only guaranteed not to be a part of any other reserved block). This is a security feature designed to confuse malware and prevent it from successfully snooping around in other processes' memory. However, the less space-efficient allocation scheme means that the OS will more quickly run out of 1MB blocks of contiguous RAM to allocate to each new thread. At that point, it begins allocating the gaps. So, depending on your Windows 7 machine's specific memory usage footprint, the thread for your recursive function may request the usual 1MB of stack space, and be given a pointer by the OS which actually only has 128K of contiguous space. Your program won't be able to tell the difference, until it can't actually commit all the space it thought it had reserved. This can produce Heisenbugs where it'll work one time but fail the next because of non-deterministic differences in the exact memory space Windows reserves for the thread each time.
The answer to all of this is "more RAM". The amount needed by the core kernel-mode processes is relatively static, so every GB of additional RAM you can add is a GB that is available solely for user program processes and threads.
How recursive is recursive?
Anything deeper than about ten or so could be risky.
If you're exhausting the stack and you're sure it's not a bug, you could manage your own stack...
For instance:
void Process(SomeType foo)
{
DoWork(foo); //work on foo
foreach(var child in foo.Children)
{
Process(child);
}
}
could become
void Process(SomeType foo)
{
Stack<SomeType> bar=new Stack<SomeType>();
bar.Push(foo);
while(bar.Any())
{
var item=bar.Pop();
DoWork(item);//work on item
foreach(var child in item.Children)
{
bar.Push(child);
}
}
}
thus eliminating any CLR call-stack problems.
Of course, this won't fix an unbounded recursion.
I don't believe this has anything to do with the physical RAM on your PC. I suspect the reason you didn't happen to see it on XP is simply that Windows 7 probably has a (slightly?) different version of .Net.
Clearly, you need to somehow limit the depth of your recursion (or substitute a non-recursive loop).
But you can potentially configure your .Net stack(s). Please look at these links:
http://www.atalasoft.com/cs/blogs/rickm/archive/2008/04/22/increasing-the-size-of-your-stack-net-memory-management-part-3.aspx
http://msdn.microsoft.com/en-us/library/5cykbwz4.aspx
How does the .NET IL .maxstack directive work?

When and how is the .NET managed heap getting swapped?

My small stress test, which allocates random length arrays (100..200MB each) in a loop, shows different behaviour on a 64 bit Win7 machine and on a 32 bit XP (in a VM). Both systems first normally allocate as much arrays as will fit into the LOH. Then the LOH gets bigger and bigger until the virtual address space available is filled up. Expected behaviour so far. But than - on further requests - both behave differently:
While on Win7 an OutOfMemoryException (OOM) is thrown, on XP it seems, the heap gets increased and even swapped to disk - at least no OOM is thrown. (Dont know, if this may have to do with XP running in a virtual box.)
Question:
How does the runtime (or the OS?) decide, whether for managed memory allocation requests, if it is too large to get allocated, a OOM is generated or the large object heap is getting increased - eventually even swapped to disk?
If it is swapped, when does an OOM occour than?
IMO this question is important to all production environments, potentially dealing with larger datasets. Somehow it feels more "safe" to know, the system would rather slow down dramatically in such situations (by swapping) than simply throwing an OOM. At least, it should somehow be deterministically, right?
#Edit: the app is a 32 bit application, therefore running in 32 bit mode on Win 7.
The normal rules apply, a managed process is not treated differently by the Windows memory manager. The ultimate source for chunks of memory is the Windows memory manager. If it cannot find a hole in the virtual memory address space to fit the requested memory allocation then it fails the VirtualAlloc() call and the CLR generates OOM.
Same for swapping behavior, if pages in RAM are needed to map pages of other processes or even pages of the same process then they'll get swapped out. This is not otherwise associated with OOM.
You cannot assume it will work exactly the same on XP as it does on Win7 x64. Getting OOM on x64 when you build your program targeting AnyCPU is quite unusual, a 64-bit operating system has a very large virtual memory address space. The upper limit is set by the maximum size of the paging file. A 32-bit program will run in the WOW emulation layer, it can have a 4 GB address space if you set the LARGEADDRESSAWARE option bit with Editbin.exe.
You can use SysInteral's VMMap utility to see how the address space of your process is carved up.

Random RAM usage amounts

I was hoping someone could explain why my application when loaded uses varying amounts of RAM. I'm speaking about a compiled version that uses the exe directly. It's a pretty basic applications and there are no conditional branches in the startup of the application. Yet every time I start it up the RAM amount varies from 6MB-16MB.
I know it's on the small end of usage anyways but I'm curious of why this happens.
Edit: to give a bit more clarification on what the app actually does.
It is a WinForm project.
It connects to a database using sqlclient to retrieve a list of servers.
Based on that list a series of buttons are created to start and stop a service on those servers.
Using the System.Timers class to audit the status of the services on those servers every 20 seconds.
The applications at this point sits there and waits for user input via one of the button clicks to start/stop the service.
The trick here is that the amount of RAM reported by the task schedule is not the amount of RAM used by your application. Rather, it is the amount of RAM reserved for use by your application.
Remember that with managed frameworks like .Net, you don't request or release memory directly. Rather, a garbage collector manages the memory for you. The amount of memory reserved for your application at a given time can vary and depends on a lot of different factors, including memory pressure created at the time by other programs.
Think of it this way: if you need 10 MBs of RAM for your app, is it faster to request and return it to the operating system 1 MB at a time over 10 requests/releases or reserve the block at once with one request/release? Now extend that to a scenario where you don't know exactly how much RAM you'll need, only that it's somewhere in the neighborhood of 10 MB. Additionally, your computer has 1 GB sitting there unused. Of course the best thing to do is take a good-sized chunk of that available RAM. Even 20 or 30 MB wouldn't be unreasonable relative to the ram that's sitting there unused, because unused RAM is wasted performance.
If your system later starts to feel some memory pressure then .Net can easily return some RAM to the system. This is one of the ways managed languages can sometimes give better performance than languages like C++ with traditional memory management: a garbage collector that can more easily take the entire system health into account when allocating memory.
What are you using to determine how much memory is being "used". Even with regular applications Windows will aggressively allocate unused memory in advance, with .NET applications it's even more complicated as to how much memory is actually being used, and how much Windows is just tacking on so that it will be available instantly when needed. If another application actually asks for memory this reserved memory will be repurposed.
One way to check is to minimize the application (at least on XP). If you are looking at the memory use in something like task manager you'll notice it drops off right away, eliminating the seemly "random" amount allocated.
It may be related to the jitter, after the first load the jitter already created a compiled version and it doesn't need to run. Other than that you would have to give us some more details about the app and which kind of memory you are referring to.

Categories