I am going to be doing a project soon for my degree that requires brute force text crunching and analysis. This will obviously mean a lot of reading and writing to RAM.
What is the most efficient method of memory management in C#? Last semester I was introduced to the MemoryMarshal class and found it to be a very efficient way of reading and writing large amounts of data in RAM, but maybe that was just my experience. I'm hoping that someone can give me some advice or suggestions on alternatives or best practices for memory management in C#.
Thanks
The most efficient memory management system varies wildly with what you are trying to do in practice.
As a rule of thumb, try to steer clear of unmanaged code in C#: managed memory is more than enough for the vast majority of problems, and unless you know exactly what you're doing you're very unlikely to be more efficient than managed memory.
So my advice would be the following. Try a fully managed implementation, with a few good practices to prevent using too much memory:
always dispose your disposable objects
try pooling heavy assets such as byte buffers: instead of creating a new buffer every time you need one, rent from and return to a buffer pool (see the sketch after this list)
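In modern .NET, for instance, the built-in ArrayPool<T> from System.Buffers already implements such a pool; a minimal sketch of the rent/return pattern (the buffer size is just an example):

    using System;
    using System.Buffers;

    class BufferPoolingSketch
    {
        static void Main()
        {
            // Rent a buffer from the shared pool instead of newing one up each time.
            // Note: the returned array may be larger than the requested size.
            byte[] buffer = ArrayPool<byte>.Shared.Rent(64 * 1024);
            try
            {
                // ... fill and process the buffer here ...
            }
            finally
            {
                // Return it so the next caller can reuse the same memory.
                ArrayPool<byte>.Shared.Return(buffer);
            }
        }
    }

The try/finally matters: a rented buffer that is never returned just degrades back into an ordinary allocation.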
If you gain empirical evidence that you need to do manual marshalling, then learn about it and use it. But not before.
Remember that a lot of people have worked on C# memory management, and that most C# developers don't need more (to the point that many of them don't even know how memory management works behind the scenes, because they just don't need to). Managed memory in C# is pretty good; give it a shot first.
I've just started work on a typical application that makes heavy use of buffers. I was surprised that I couldn't find a good, clear guide on this topic.
I have a couple of questions.
1) When should I prefer a buffer in unmanaged heap memory over managed memory?
I know that object allocation is faster on .NET than on the unmanaged heap, and that object destruction is much more expensive on .NET because of GC overhead, so I think it will be a little faster to use unmanaged memory. When should I use fixed{} and when Marshal.AllocHGlobal()?
2) As I understand it, it is more effective to use a weak reference for both managed and unmanaged buffers in .NET if the buffer can possibly be reused after some time (based on user actions), isn't it?
Trying to manually manage your memory allocations using native "buffers" is going to be difficult at best with .NET. You can't allocate managed types into the unmanaged buffers, so they'll only be usable for structured data, in which case there's little advantage over a simple managed array (which will stay contiguous in memory anyway, etc.).
In general, it's typically a better approach to try to manage how you're allocating and letting go of objects, and try to manually reuse them as appropriate (if, and only if, memory pressure is an issue for you).
As for some of your specific points:
I know that object allocation is faster on .NET than on the unmanaged heap, and that object destruction is much more expensive on .NET because of GC overhead, so I think it will be a little faster to use unmanaged memory.
I think your assumptions here are a bit flawed. Object allocation, at the point of allocation, is typically faster in .NET, as the CLR will potentially have preallocated memory it can already use. Object "destruction" is also faster on .NET, though there is a deferred cost due to GC that can be a bit higher (though not always). There are a lot of factors here, mainly focused around object lifecycles - if you allow your objects to be promoted to Gen1 or especially Gen2, then things can get difficult to track and measure, as GC compaction costs can be higher.
When should i use fixed{} and when Marshal.AllocHGlobal()?
In general, you would (very) rarely use either in C#. You're typically better off leaving your memory unpinned, and allowing the GC to do its work properly, which in turn tends to lead to better GC heuristics overall.
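For completeness, here is roughly what the two look like side by side; a minimal sketch only (the fixed part needs the /unsafe compiler switch), not a recommendation to reach for either:

    using System;
    using System.Runtime.InteropServices;

    class PinningSketch
    {
        static unsafe void Main()
        {
            // fixed: temporarily pins a managed array so the GC cannot move it.
            byte[] managed = new byte[1024];
            fixed (byte* p = managed)
            {
                p[0] = 42; // p is only safe to hand to native code inside this block
            }

            // Marshal.AllocHGlobal: unmanaged memory that you must free yourself.
            IntPtr unmanaged = Marshal.AllocHGlobal(1024);
            try
            {
                Marshal.WriteByte(unmanaged, 0, 42);
            }
            finally
            {
                Marshal.FreeHGlobal(unmanaged); // the GC never touches this; forgetting it leaks
            }
        }
    }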
2) As I understand it, it is more effective to use a weak reference for both managed and unmanaged buffers in .NET if the buffer can possibly be reused after some time (based on user actions), isn't it?
Not necessarily. Reusing objects and keeping them alive longer than necessary has some serious drawbacks, as well. This will probably guarantee that the memory will get promoted into Gen2, which will potentially make life worse, not better.
Typically, my advice would be to trust the system, but measure as you go. If, and only if, you find a real problem, there are almost always ways to address those specific issues (without resorting to unmanaged or manually managing memory buffers). Working with raw memory should be an absolute last resort when dealing with a managed code base.
I have an app which consumes a lot of real-time data, and because it's doing so much it's quite slow under the VS 2010 profiler, and this causes it to fail in various ways.
So I was wondering if there's any way, other than this profiler, that I can find out how much memory (in bytes, say) is allocated to each type, and dump this out periodically?
It's quite a large application so adding my own counters isn't really feasible...
You need to use a memory profiler.
There are many around, some free and some commercial.
MemProfiler
ANTS Memory Profiler
dotTrace
CLR Profiler
Also see What Are Some Good .NET Profilers?
There is no easy, general-purpose way of saying GetBytesUsedForInstance(object); what works instead depends on what you need the data for (unless all your types are value types, in which case it should be relatively simple).
We have an in-memory cache for part of our application. We care most about relative amounts of memory used - i.e. the total cache size is twice what it was yesterday. For this, we serialize our object graphs to a stream and take the stream length (and then discard the stream). This is not an accurate measurement of "how much memory a type uses up" per se, but it is useful for these relative comparisons.
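A minimal sketch of that relative-size trick, assuming the cached graphs are serializable; System.Text.Json is used purely for illustration, any serializer your types support works the same way:

    using System.Text.Json;

    static class CacheSizeEstimator
    {
        // Rough, relative size estimate for an object graph: serialize it and
        // measure the payload. Only meaningful for comparisons over time,
        // not as an absolute "bytes in RAM" figure.
        public static long EstimateSize<T>(T graph) =>
            JsonSerializer.SerializeToUtf8Bytes(graph).Length;
    }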
Other than that - I think you are stuck using a profiler. I can highly recommend SciTech Memory Profiler - I use it a lot. It integrates well into Visual Studio, is fast (the latest version is anyway), and gives tremendously useful detail.
For getting general information, I would suggest making heavy use of Process Explorer.
Once you figure out that you need to understand things more deeply (what kinds of objects are on the heap, for example), the best tool I have used for profiling is the JetBrains memory and performance profiler. This one is paid only.
If you only need a performance profiler, there is a really good free option, the EQATEC Performance Profiler.
Good luck.
I have a program that processes high volumes of data, and can cache much of it for reuse with subsequent records in memory. The more I cache, the faster it works. But if I cache too much, boom, start over, and that takes a lot longer!
I haven't been too successful trying to do anything after the exception occurs - I can't get enough memory to do anything.
Also, I've tried allocating a huge object and then de-allocating it right away, with inconsistent results. Maybe I'm doing something wrong?
Anyway, what I'm stuck with is just setting a hardcoded limit on the number of cached objects that, from experience, seems to be low enough. Any better ideas? Thanks.
Edit after answer:
The following code seems to be doing exactly what I want:
    Do
        Dim memFailPoint As MemoryFailPoint = Nothing
        Try
            ' size in MB of the several objects I'm about to add to the cache
            memFailPoint = New MemoryFailPoint(mysize)
            memFailPoint.Dispose()
        Catch ex As InsufficientMemoryException
            ' dump the oldest cached items here
        End Try
        ' do work
    Loop
I need to test if it is slowing things down in this arrangement or not, but I can see the yellow line in Task Manager looking like a very healthy sawtooth pattern with a consistent top - yay!!
You can use MemoryFailPoint to check for available memory before allocating.
You may need to think about your release strategy for the cached objects. There is no possible way you can hold all of them forever, so you need to come up with an expiration timeframe and have older cached objects removed from memory. It should be possible to find out how much memory is left and use that as part of your strategy, but one thing is certain: old objects must go.
If you implement your cache with WeakReferences (http://msdn.microsoft.com/en-us/library/system.weakreference.aspx), the cached objects will still be eligible for garbage collection in situations where you might otherwise throw an OutOfMemory exception.
This is an alternative to a fixed-size cache, but it potentially has the problem of being overly aggressive in clearing out the cache when a GC does occur.
You might consider taking a hybrid approach, where there is a (tunable) fixed number of non-weak references in the cache but you let it grow additionally with weak references. Or this may be overkill.
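A minimal sketch of what such a weak-reference cache could look like; the class and member names are illustrative, not taken from any library:

    using System;
    using System.Collections.Generic;

    // Entries stay retrievable only while the GC has not reclaimed them, so the
    // cache itself never stands in the way of freeing memory under pressure.
    class WeakCache<TKey, TValue> where TValue : class
    {
        private readonly Dictionary<TKey, WeakReference<TValue>> _entries = new();

        public void Add(TKey key, TValue value) =>
            _entries[key] = new WeakReference<TValue>(value);

        public bool TryGet(TKey key, out TValue value)
        {
            value = null;
            return _entries.TryGetValue(key, out var weak)
                && weak.TryGetTarget(out value);
        }
    }

The hybrid mentioned above would simply keep the N most recently used values in an ordinary, strongly referenced collection alongside this.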
There are a number of metrics you can use to keep track of how much memory your process is using:
GC.GetTotalMemory
Environment.WorkingSet (This one isn't useful, my bad)
The native GlobalMemoryStatusEx function
There are also various properties on the Process class
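For reference, reading most of those numbers from your own process looks roughly like this (GlobalMemoryStatusEx needs a P/Invoke and is omitted); a sketch:

    using System;
    using System.Diagnostics;

    class MemoryMetrics
    {
        static void Main()
        {
            // Managed heap size as the GC currently sees it (no forced collection).
            Console.WriteLine($"GC heap:       {GC.GetTotalMemory(false):N0} bytes");

            // Process-wide numbers from the OS point of view.
            using var proc = Process.GetCurrentProcess();
            Console.WriteLine($"Working set:   {proc.WorkingSet64:N0} bytes");
            Console.WriteLine($"Private bytes: {proc.PrivateMemorySize64:N0} bytes");
            Console.WriteLine($"Virtual size:  {proc.VirtualMemorySize64:N0} bytes");
        }
    }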
The trouble is that there isn't really a reliable way of telling from these values alone whether or not a given memory allocation will fail: although there may be sufficient space in the address space for a given allocation, memory fragmentation means that the space may not be contiguous, and so the allocation may still fail.
You can however use these values as an indication of how much memory the process is using and therefore whether or not you should think about removing objects from your cache.
Update: It's also important to make sure that you understand the distinction between virtual memory and physical memory - unless your page file is disabled (very unlikely), the OutOfMemoryException will be caused by a lack, or fragmentation, of virtual address space.
If you're only using managed resources you can use the GC.GetTotalMemory method and compare the results with the maximum allowed memory for a process on your architecture.
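A rough sketch of that idea; the budget figure is an assumed, hand-picked threshold, not something the framework provides:

    using System;

    static class CacheMemoryGuard
    {
        // Assumed managed-heap budget; tune it for your architecture and app.
        private const long ManagedHeapBudgetBytes = 1_200_000_000; // ~1.2 GB, illustrative

        // Check before adding to the cache; trim the oldest entries when over budget.
        public static bool IsOverBudget() =>
            GC.GetTotalMemory(forceFullCollection: false) > ManagedHeapBudgetBytes;
    }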
A more advanced solution (I think this is how SQL Server manages to actually adapt to the available memory) is to use the CLR Hosting APIs:
the interface allows the CLR to inform the host of the consequences of failing a particular allocation
which will mean actually removing some objects from the cache and trying again.
Anyway, I think this is probably overkill for almost all applications unless you really need exceptional performance.
The simple answer... By knowing what your memory limit is.
The closer you are to reaching that limit, the closer you are to getting an OutOfMemoryException.
The more elaborate answer... Unless you write a mechanism to do that kind of thing yourself, programming languages/systems do not work that way; as far as I know they cannot inform you in advance that you are exceeding limits, BUT they gladly inform you when the problem has occurred, and that usually happens through exceptions which you are supposed to write code to handle.
Memory is a resource that you can use; it has limits, and it also has some conventions and rules for you to follow to make good use of that resource.
I believe that what you are doing, setting a sensible limit (hard-coded or configurable), is your best bet.
I understand there are many questions related to this, so I'll be very specific.
I create a console application with two instructions: create a List with some large capacity and fill it with sample data, and then clear that List or set it to null.
What I want to know is whether there is a way for me to know/measure/profile, while debugging or not, if the actual memory used by the application after the list was cleared and nulled is about the same as before the list was created and populated. I know for sure that the application has disposed of the information and the GC has finished collecting, but can I know for sure how much memory my application consumes after this?
I understand that during the process of filling the list, a lot of memory is allocated, and after it's been cleared that memory may become available to other processes if they need it, but is it possible to measure the real memory consumed by the application at the end?
Thanks
Edit: OK, here is my real scenario and objective. I work on a WPF application that works with large amounts of data read through a USB device. At some point, the application allocates about 700+ MB of memory to store all the List data, which it parses, analyzes and then writes to the filesystem. When I write the data to the filesystem, I clear all the Lists and dispose of all collections that previously held the large data, so I can do another round of data processing. I want to know that I won't run into performance issues or eventually use up all the memory. I'm fine with my program using a lot of memory, but I'm not fine with it using all of it after a few USB processing runs.
How can I go about controlling this? Are memory or process profilers used in cases like this? Simply using Task Manager, I see my application taking up 800 MB of memory, but after I clear the collections, the memory stays the same. I understand this won't go down unless Windows needs it, so I was wondering if I can know for sure that the memory is cleared and free to be used (by my application or Windows).
It is very hard to measure "real memory" usage on Windows if you mean physical memory. Most likely you want something else, like:
Amount of memory allocated for the process (see Zooba's answer)
Amount of Managed memory allocated - CLR Profiler, or any other profiler listed in this one - Best .NET memory and performance profiler?
What Task Manager reports for your application
Note that it is not necessarily the case that the amount of memory allocated to your process (1) changes after garbage collection finishes - the GC may keep the allocated memory for future managed allocations (this behaviour is not specific to the CLR - most memory allocators keep free blocks for later use unless forced to release them by some means). The http://blogs.msdn.com/b/maoni/ blog is an excellent source for details on GC/memory.
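To put the concrete question into code: for the managed heap only, a before/after comparison around forced collections is about as close as you can get. A sketch of the console-application scenario described above:

    using System;
    using System.Collections.Generic;

    class ListMemoryCheck
    {
        static void Main()
        {
            long before = GC.GetTotalMemory(forceFullCollection: true);

            var list = new List<int>(10_000_000);
            for (int i = 0; i < 10_000_000; i++) list.Add(i);

            list.Clear();
            list = null;

            // Forcing a full collection here only measures the managed heap;
            // the working set reported by Task Manager may well stay higher.
            long after = GC.GetTotalMemory(forceFullCollection: true);
            Console.WriteLine($"Managed heap delta: {after - before:N0} bytes");
        }
    }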
Process Explorer will give you all the information you need. Specifically, you will probably be most interested in the "private bytes history" graph for your process.
Alternatively, it is possible to use Windows' Performance Monitor to track your specific application. This should give the same information as Process Explorer, and it will also let you write the actual numbers out to a separate file.
(A picture because I can...)
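If you want the same numbers inside your own code rather than in the PerfMon UI, the classic performance-counter API can read them too; a Windows-only sketch (on .NET Core this needs the System.Diagnostics.PerformanceCounter package):

    using System;
    using System.Diagnostics;

    class PrivateBytesSampler
    {
        static void Main()
        {
            // Reads the same "Private Bytes" counter that Process Explorer graphs.
            string instance = Process.GetCurrentProcess().ProcessName;
            using var counter = new PerformanceCounter("Process", "Private Bytes", instance, readOnly: true);
            Console.WriteLine($"Private bytes: {counter.NextValue():N0}");
        }
    }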
I personally use the SciTech Memory Profiler.
It has a real-time option that you can use to watch your memory usage. It has helped me find a number of memory leak problems.
Try ANTS Profiler. It's not free, but you can try the trial version.
http://www.red-gate.com/products/dotnet-development/ants-performance-profiler/
I have been writing applications lately in C# that use a ton of memory or overflow the stack because they process extremely large amounts of data in fun ways. Is there a language better suited to this type of thing? Would I benefit from learning a different language (other than C++) to do this?
C# isn't the problem. You may need to reconsider the "fun ways" you're handling memory and data. Provide specific scenarios and questions here to get specific answers and alternatives to potentially-problematic methods and strategies you may be using in your application(s).
If you are running on a 32-bit system, .NET will start giving you out-of-memory exceptions when you consume around 800 MB. This is because it needs to allocate contiguous blocks of memory. If you have an array or list which needs to be expanded, it will copy the old contents to a new block, so two instances are allocated at the same time.
If you can run 64-bit, then you will hit these exceptions at anything from ~2 GB and above, all depending on how your application works and what else is running.
For data larger than your physical memory, I would recommend either memory mapped files, or doing some disk/memory swapping.
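A minimal sketch of the memory-mapped file route, which lets you work through a window of a file far larger than physical RAM; the file name is just a placeholder:

    using System;
    using System.IO.MemoryMappedFiles;

    class MappedFileSketch
    {
        static void Main()
        {
            // Map an existing large data file; only the pages you touch get paged in.
            using var mmf = MemoryMappedFile.CreateFromFile("huge-data.bin");

            // View a 1 MB window at offset 0; re-create the view to move the window.
            using var view = mmf.CreateViewAccessor(offset: 0, size: 1024 * 1024);
            byte first = view.ReadByte(0);
            Console.WriteLine($"First byte: {first}");
        }
    }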
If you are working with large data sets and doing functional manipulation, you might consider looking into a functional language like F# or Haskell.
They will not suffer as readily from recursion issues.
However, these languages won't substitute for good design and attention to how you are doing your operations. It's possible that C# is perfectly well suited to your problem; you might just need to refactor how you are handling the problem space.
IDL (Interactive Data Language) is especially suited to large, matrix-like sets of data. You must, however, pay attention to using matrix or vector operations rather than sequential loops.
If licensing is a problem you can try the free clone GDL, although it may not be as fast as IDL.
How large is your data?