We have very large data files; for example, assume a 3D volume with a 2048x2048 matrix size and a slice depth of 20000.
Originally, I had my own central memory management where each dataset is backed by a "page file" on disk. During processing I track the process memory, and if memory is low I check which slices haven't been touched for a while and page them out to my own hand-made paging file. This of course produces a zig-zag pattern when you look at the process memory, but the method works even if the total size of my files is much larger than the available RAM. Of course putting more RAM into the machine improves the situation, but the system is able to adapt either way. So if you pay more you get more speed ;-)
But the method is of course hand-made: I need a separate thread that watches over the allocated files and data, I need special methods that handle the paging, etc.
So I decided to have a look at memory mapped files...
When I now create a memory-mapped file (read & write) and run through the data slice by slice, accessing each slice with a ReadOnlySpan<> in a for loop, the whole memory is consumed: you can watch the process memory grow linearly until all RAM is used. After that the system goes into swapping and never comes out again. It even gets to the point where the system freezes and only a restart helps to recover.
So obviously the memory-mapped files are not balancing themselves, and when we reach the RAM maximum the system is exhausted and apparently helpless.
The question is: can I control when the system swaps, and advise which regions can be swapped out? Or do you have suggestions for better approaches?
I have created an example that does the following:
It creates a Volume as input
It creates one or more processors; each processor simply adds 10 to each slice.
Since everything is created only on access, a Parallel.For iterates over the slices and requests the result slice. Depending on the number N of processors in the chain, each slice of the result should have a value of (N+1)*10.
See here: Example Code
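The linked example isn't reproduced here, so below is a minimal sketch of the access pattern being described (the file name, float element type, and per-slice views are my assumptions, not the actual example code):

```csharp
// Sketch only: map one slice at a time from a large memory-mapped volume file.
// Requires <AllowUnsafeBlocks>true</AllowUnsafeBlocks>; the backing file will be
// SliceBytes * Depth in size, so shrink the constants to experiment.
using System;
using System.IO.MemoryMappedFiles;

class VolumeSketch
{
    const int Width = 2048, Height = 2048, Depth = 20000;
    const long SliceBytes = (long)Width * Height * sizeof(float);

    static void Main()
    {
        using var mmf = MemoryMappedFile.CreateFromFile(
            "volume.dat", System.IO.FileMode.OpenOrCreate, null, SliceBytes * Depth);

        for (int z = 0; z < Depth; z++)
        {
            // Map only the current slice instead of the whole volume.
            using var view = mmf.CreateViewAccessor(z * SliceBytes, SliceBytes);
            unsafe
            {
                byte* ptr = null;
                view.SafeMemoryMappedViewHandle.AcquirePointer(ref ptr);
                try
                {
                    var slice = new ReadOnlySpan<float>(ptr + view.PointerOffset, Width * Height);
                    // ... process the slice ...
                }
                finally
                {
                    view.SafeMemoryMappedViewHandle.ReleasePointer();
                }
            }
        }
    }
}
```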
Update [26.09.2021]: My latest investigations have shown that there are two independent processes at work. One process (my process) is writing or reading data in the memory-mapped file(s). Another system process is trying to manage the memory. While the memory is growing, the system is apparently very lazy about actually flushing data to the backing memory-mapped file or the paging file.
This leads to a bad condition. When the maximum RAM is reached, the memory manager starts to empty the working set and flush the data to disk. While the system flushes, I'm still producing data, which fills the memory again. So in the end I always sit at the RAM maximum and never get out of it.
I tried a few things:
VirtualUnlock lets me mark regions as pageable for the memory manager.
You can call a Flush on the ViewAccessor.
Since the working set still shows high memory usage, I call EmptyWorkingSet to empty the working set, which returns immediately.
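For reference, here is a rough sketch of how those three attempts fit together (the P/Invoke signatures are the standard kernel32/psapi ones; whether this actually relieves the pressure is exactly what is still open):

```csharp
// Sketch: flush a view, hint its pages out of the working set, and trim the working set.
// Requires <AllowUnsafeBlocks>true</AllowUnsafeBlocks>.
using System;
using System.Diagnostics;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

static class PagingHints
{
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool VirtualUnlock(IntPtr lpAddress, UIntPtr dwSize);

    [DllImport("psapi.dll", SetLastError = true)]
    static extern bool EmptyWorkingSet(IntPtr hProcess);

    public static void ReleaseSlice(MemoryMappedViewAccessor view, long sliceBytes)
    {
        // 1. Write dirty pages of this view back to the backing file.
        view.Flush();

        // 2. VirtualUnlock on a range that is not locked returns FALSE, but per the
        //    documentation it still releases those pages from the working set.
        unsafe
        {
            byte* ptr = null;
            view.SafeMemoryMappedViewHandle.AcquirePointer(ref ptr);
            try { VirtualUnlock((IntPtr)ptr, (UIntPtr)(ulong)sliceBytes); }
            finally { view.SafeMemoryMappedViewHandle.ReleasePointer(); }
        }
    }

    public static void TrimWorkingSet()
    {
        // 3. Ask the OS to empty the whole working set; the call returns immediately.
        EmptyWorkingSet(Process.GetCurrentProcess().Handle);
    }
}
```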
What I need in the end is something to see if the memory manager is currently unmapping so that I can slow down the process.
I have to search more...
Related
I am building a winrt metro app and always seem to run into this (non?) issue.
I find myself maintaining a lot of files of cached data that I can serialize back and forth. Data retrieved from services, user selected items and so on.
The question I always seem to have when I write these calls is: is it opening (and releasing) the actual file that takes time/is expensive, or the amount of data that needs to be serialized from it?
How much should I worry about, for example, combining a couple of files that store the same object types into one file and then picking out the ones I need once I have the objects 'out'?
Did you ever get an insufficient-memory or out-of-memory exception?
WinRT lets you use RAM and cached files up to around 70-80% of the device's memory. Anything beyond that will crash the app. Once you navigate away from your page your resources are garbage collected, so that's not an issue. Wrapping your memory streams in 'using' blocks is also fine, but saving large data and continuously fetching files from the database affects system memory. And since Surface tablets have a limited amount of memory, you should take a bit of care with large numbers of files :) I faced this while rendering bitmaps: loading around 100 bitmaps simultaneously into memory threw an insufficient memory exception.
I understand there are many questions related to this, so I'll be very specific.
I create a Console application with two instructions: create a List with some large capacity and fill it with sample data, and then clear that List or set it to null.
What I want to know is whether there is a way for me to know/measure/profile, while debugging or not, whether the actual memory used by the application after the list was cleared and nulled is about the same as before the list was created and populated. I know for sure that the application has disposed of the information and the GC has finished collecting, but can I know for sure how much memory my application would consume after this?
I understand that during the process of filling the list, a lot of memory is allocated and after it's been cleared that memory may become available to other process if it needs it, but is it possible to measure the real memory consumed by the application at the end?
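One way to take before/after readings is to force a full collection and then compare the managed heap size with what the OS reports for the process; here is a minimal sketch (the sample data and sizes are made up):

```csharp
// Sketch: compare managed heap size with OS-level process counters
// before filling, after filling, and after clearing a large List.
using System;
using System.Collections.Generic;
using System.Diagnostics;

class MemoryProbe
{
    static void Main()
    {
        Report("baseline");

        var list = new List<byte[]>(100_000);
        for (int i = 0; i < 100_000; i++)
            list.Add(new byte[1024]);          // ~100 MB of sample data
        Report("after fill");

        list.Clear();
        list = null;
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        Report("after clear + collect");
    }

    static void Report(string label)
    {
        var p = Process.GetCurrentProcess();
        p.Refresh();
        Console.WriteLine(
            $"{label}: managed = {GC.GetTotalMemory(forceFullCollection: true) / 1024 / 1024} MB, " +
            $"private bytes = {p.PrivateMemorySize64 / 1024 / 1024} MB, " +
            $"working set = {p.WorkingSet64 / 1024 / 1024} MB");
    }
}
```

Note that the private bytes and working set figures may stay high even when the managed number drops, which is the behavior discussed in the answers below.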
Thanks
Edit: OK, here is my real scenario and objective. I work on a WPF application that works with large amounts of data read through a USB device. At some point, the application allocates about 700+ MB of memory to store all the List data, which it parses, analyzes and then writes to the filesystem. When I write the data to the filesystem, I clear all the Lists and dispose of all collections that previously held the large data, so I can process the next batch. I want to know that I won't run into performance issues or eventually use up all memory. I'm fine with my program using a lot of memory, but I'm not fine with it using all of it after a few USB processing runs.
How can I go about controlling this? Are memory or process profilers used in cases like this? Simply using Task Manager, I see my application taking up 800 MB of memory, but after I clear the collections the memory stays the same. I understand this won't go down unless Windows needs it, so I was wondering if I can know for sure that the memory is cleared and free to be used (by my application or Windows).
It is very hard to measure "real memory" usage on Windows if you mean physical memory. Most likely you want something else, such as:
Amount of memory allocated for the process (see Zooba's answer)
Amount of Managed memory allocated - CLR Profiler, or any other profiler listed in this one - Best .NET memory and performance profiler?
What Task Manager reports for your application
Note that it is not guaranteed that the amount of memory allocated to your process (1) changes after garbage collection finishes - the GC may keep allocated memory for future managed allocations (this behavior is not specific to the CLR - most memory allocators keep free blocks for later use unless forced to release them by some means). The http://blogs.msdn.com/b/maoni/ blog is an excellent source for details on the GC and memory.
Process Explorer will give you all the information you need. Specifically, you will probably be most interested in the "private bytes history" graph for your process.
Alternatively, it is possible to use Windows' Performance Monitor to track your specific application. This should give the same information as Process Explorer, though it will also let you write the actual numbers out to a separate file.
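If you want the same number programmatically rather than in the Performance Monitor UI, the classic "Process" counters can also be read from code; a small sketch (assumes those counters are available, e.g. on the full framework or with the System.Diagnostics.PerformanceCounter package):

```csharp
// Sketch: poll the "Private Bytes" counter for the current process once a second.
using System;
using System.Diagnostics;

class PrivateBytesWatcher
{
    static void Main()
    {
        string instance = Process.GetCurrentProcess().ProcessName;
        using var counter = new PerformanceCounter("Process", "Private Bytes", instance);

        for (int i = 0; i < 10; i++)
        {
            Console.WriteLine($"Private bytes: {counter.NextValue() / (1024 * 1024):F1} MB");
            System.Threading.Thread.Sleep(1000);
        }
    }
}
```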
I personally use SciTech Memory Profiler.
It has a real-time option that you can use to watch your memory usage, and it has helped me find a number of memory-leak problems.
Try ANTS Profiler. It's not free, but you can try the trial version.
http://www.red-gate.com/products/dotnet-development/ants-performance-profiler/
I am working on a 64-bit .NET Windows Service application that essentially loads up a bunch of data for processing. While performing data-volume testing, we were able to overwhelm the process and it threw an OutOfMemoryException (I do not have any performance statistics on the process from when it failed). I have a hard time believing that the process requested a chunk of memory that would have exceeded the allowable address space for the process, since it's running on a 64-bit machine. I do know that the process is running on a machine that is consistently in the neighborhood of 80-90% physical memory usage. My question is: can the CLR throw an OutOfMemoryException if the machine is critically low on available physical memory, even though the process wouldn't exceed its allowable amount of virtual memory?
Thanks for your help!
There are still some reachable limits in place in a 64-bit environment. Check this page for some of the most common ones. In short, yes, you can still run out of memory if your program loads a whopping 128 GB of data into virtual memory. You could also still be hit by the 2 GB per-process limit if the executable does not have the IMAGE_FILE_LARGE_ADDRESS_AWARE flag set.
Another possibility is that the program tried to allocate a single block of memory larger than 2 gigabytes, which is a .NET limitation. This can happen when adding things to a collection (most often a Dictionary or HashSet, but also a List or any other collection that grows automatically.)
Dictionary and HashSet do this often if you're trying to put more than about 47 million items into the collection. Although the collection can hold about 89.5 million, the algorithm that grows the collection does so by doubling. If you start from an empty Dictionary and begin adding items, the collection doubles a number of times until it reaches about 47 million. Then it tries to double again and throws OutOfMemoryException.
The way to avoid the exception is to pre-allocate the collection so that its capacity is large enough to hold however many items you expect to put into it.
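For illustration, a minimal sketch of that pre-allocation (the element types and count here are hypothetical):

```csharp
// Sketch: size the collection up front so the grow-by-doubling step
// (and its oversized internal array) never happens.
using System.Collections.Generic;

class Preallocation
{
    static Dictionary<long, decimal> LoadBalances(int expectedCount)
    {
        // Passing the expected count lets the Dictionary pick a large enough
        // internal array once, instead of doubling its way up to it.
        var balances = new Dictionary<long, decimal>(expectedCount);
        for (long id = 0; id < expectedCount; id++)
            balances[id] = 0m;
        return balances;
    }
}
```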
There's what you can theoretically address.
There's what physical memory you have, and when you exceed that you start using swap, which is often limited to the size of some chosen disk partitions.
As a rule of thumb one tends to have a small number (like one or two) of multiples of physical memory as swap.
So yes it's quite likely that you are out of available, as opposed to addressable memory.
I was hoping someone could explain why my application uses varying amounts of RAM when loaded. I'm speaking about a compiled version that runs the exe directly. It's a pretty basic application and there are no conditional branches in the startup of the application, yet every time I start it up the RAM usage varies from 6 MB to 16 MB.
I know it's on the small end of usage anyways but I'm curious of why this happens.
Edit: to give a bit more clarification on what the app actually does.
It is a WinForm project.
It connects to a database using sqlclient to retrieve a list of servers.
Based on that list a series of buttons are created to start and stop a service on those servers.
It uses a timer from System.Timers to audit the status of the services on those servers every 20 seconds.
The application at this point sits there and waits for user input via one of the button clicks to start/stop the service.
The trick here is that the amount of RAM reported by Task Manager is not the amount of RAM used by your application. Rather, it is the amount of RAM reserved for use by your application.
Remember that with managed frameworks like .Net, you don't request or release memory directly. Rather, a garbage collector manages the memory for you. The amount of memory reserved for your application at a given time can vary and depends on a lot of different factors, including memory pressure created at the time by other programs.
Think of it this way: if you need 10 MB of RAM for your app, is it faster to request and return it to the operating system 1 MB at a time over 10 requests/releases, or to reserve the block at once with one request/release? Now extend that to a scenario where you don't know exactly how much RAM you'll need, only that it's somewhere in the neighborhood of 10 MB. Additionally, your computer has 1 GB sitting there unused. Of course the best thing to do is take a good-sized chunk of that available RAM. Even 20 or 30 MB wouldn't be unreasonable relative to the RAM that's sitting there unused, because unused RAM is wasted performance.
If your system later starts to feel some memory pressure then .Net can easily return some RAM to the system. This is one of the ways managed languages can sometimes give better performance than languages like C++ with traditional memory management: a garbage collector that can more easily take the entire system health into account when allocating memory.
What are you using to determine how much memory is being "used"? Even with regular applications, Windows will aggressively allocate unused memory in advance; with .NET applications it's even more complicated to tell how much memory is actually being used and how much Windows has just tacked on so that it will be available instantly when needed. If another application actually asks for memory, this reserved memory will be repurposed.
One way to check is to minimize the application (at least on XP). If you are looking at the memory use in something like Task Manager, you'll notice it drops off right away, eliminating the seemingly "random" amount allocated.
It may be related to the JIT compiler: after the first load the JIT has already created a compiled version, so it doesn't need to run again. Other than that, you would have to give us some more details about the app and which kind of memory you are referring to.
I have written a program which analyzes a project's source code and reports various issues and metrics based on the code.
To analyze the source code, I load the code files that exist in the project's directory structure and analyze the code from memory. The code goes through extensive processing before it is passed to other methods to be analyzed further.
The code is passed around to several classes when it is processed.
The other day I was running it on one of the larger projects my group has, and my program crapped out on me because there was too much source code loaded into memory. This is a corner case at this point, but I want to be able to handle this issue in the future.
What would be the best way to avoid memory issues?
I'm thinking about loading the code, doing the initial processing of the file, then serializing the results to disk, so that when I need to access them again I do not have to go through the process of manipulating the raw code again. Does this make sense? Or is the serialization/deserialization more expensive than processing the code again?
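As a sketch of that idea, the processed results could be written out per source file and re-read on demand. System.Text.Json is used here purely as an example serializer, and the AnalysisResult shape is invented:

```csharp
// Sketch: cache per-file analysis results on disk instead of keeping them in memory.
using System.IO;
using System.Text.Json;

public record AnalysisResult(string FilePath, int LineCount, int IssueCount);

public static class ResultCache
{
    public static void Save(AnalysisResult result, string cacheDir)
    {
        // One small JSON file per analyzed source file (naming scheme is arbitrary).
        string path = Path.Combine(cacheDir, Path.GetFileName(result.FilePath) + ".json");
        File.WriteAllText(path, JsonSerializer.Serialize(result));
    }

    public static AnalysisResult Load(string cachePath) =>
        JsonSerializer.Deserialize<AnalysisResult>(File.ReadAllText(cachePath));
}
```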
I want to keep a reasonable level of performance while addressing this problem. Most of the time, the source code will fit into memory without issue, so is there a way to only "page" my information when I am low on memory? Is there a way to tell when my application is running low on memory?
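One possible way to answer the "am I running low on memory?" question before loading the next file is MemoryFailPoint, which probes whether an allocation of a given size is likely to succeed; here is a sketch (the size estimate is a rough guess):

```csharp
// Sketch: check for enough memory before loading a file, and signal the caller
// to page something out (or skip) when the check fails.
using System;
using System.IO;
using System.Runtime;

static class LowMemoryGuard
{
    public static bool TryLoad(string path, out string contents)
    {
        // Rough estimate: ~2 bytes per char once the file is a .NET string.
        int estimatedMb = (int)Math.Max(1, new FileInfo(path).Length * 2 / (1024 * 1024));
        try
        {
            using (new MemoryFailPoint(estimatedMb))
            {
                contents = File.ReadAllText(path);
                return true;
            }
        }
        catch (InsufficientMemoryException)
        {
            contents = null;   // caller can flush caches or page other data to disk first
            return false;
        }
    }
}
```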
Update:
The problem is not that a single file fills memory; it's that all of the files held in memory at once fill it. My current idea is to rotate files out to the disk as I process them.
1.6GB is still manageable and by itself should not cause memory problems. Inefficient string operations might do it.
As you parse the source code you probably split it into substrings - tokens or whatever you call them. If your tokens combined account for the entire source code, that doubles memory consumption right there. Depending on the complexity of the processing you do, the multiplier can be even bigger.
My first move here would be to take a closer look at how you use your strings and find a way to optimize them - i.e. discard the original after the first pass, compress the whitespace, or use indexes (pointers) into the original strings rather than actual substrings - there are a number of techniques which can be useful here.
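A small sketch of the "indexes instead of substrings" idea: each token is just a slice over the single original string, so tokenizing does not copy the text. ReadOnlyMemory<char> is one way to express this; a plain (start, length) pair works on older frameworks:

```csharp
// Sketch: whitespace tokenizer that returns slices over the original string
// instead of allocating a substring per token.
using System;
using System.Collections.Generic;

static class Tokenizer
{
    public static List<ReadOnlyMemory<char>> Split(string source)
    {
        var tokens = new List<ReadOnlyMemory<char>>();
        int start = 0;
        for (int i = 0; i <= source.Length; i++)
        {
            if (i == source.Length || char.IsWhiteSpace(source[i]))
            {
                if (i > start)
                    tokens.Add(source.AsMemory(start, i - start));  // no string copy
                start = i + 1;
            }
        }
        return tokens;
    }
}
```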
If none of this helps, then I would resort to swapping them to and from the disk.
If the problem is that a single copy of your code causes you to fill the available memory, then there are at least two options.
serialize to disk
compress files in memory. If you have a lot of CPU headroom, it can be faster to zip and unzip information in memory instead of caching to disk.
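As a sketch of that second option using GZipStream from System.IO.Compression (the string-in, byte[]-out shape is just an assumption about how the cached file contents are held):

```csharp
// Sketch: keep rarely-used file contents compressed in memory and inflate on demand.
using System.IO;
using System.IO.Compression;
using System.Text;

static class InMemoryCompression
{
    public static byte[] Compress(string text)
    {
        using var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionLevel.Fastest, leaveOpen: true))
        {
            var bytes = Encoding.UTF8.GetBytes(text);
            gzip.Write(bytes, 0, bytes.Length);
        }
        return output.ToArray();
    }

    public static string Decompress(byte[] compressed)
    {
        using var input = new GZipStream(new MemoryStream(compressed), CompressionMode.Decompress);
        using var reader = new StreamReader(input, Encoding.UTF8);
        return reader.ReadToEnd();
    }
}
```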
You should also check if you are disposing of objects properly. Do you have memory problems due to old copies of objects being in memory?
Use WinDbg with SOS to see what is holding on to the string references (or whatever is causing the extreme memory usage).
Serializing/deserializing sounds like a good strategy. I've done a fair amount of this and it is very fast. In fact I have an app that instantiates objects from a DB and then serializes them to the hard drives of my web nodes. It has been a while since I benchmarked it, but it was serializing several hundred a second and maybe over 1k back when I was load testing.
Of course it will depend on the size of your code files. My files were fairly small.