Why does reusing arrays increase performance so significantly in C#?

In my code, I perform a large number of tasks, each requiring a large array of memory to temporarily store data. I have about 500 tasks. At the beginning of each task, I allocate memory for an array:
double[] tempDoubleArray = new double[M];
M is a large number depending on the precise task, typically around 2000000. Now, I do some complex calculations to fill the array, and in the end I use the array to determine the result of this task. After that, the tempDoubleArray goes out of scope.
Profiling reveals that the calls to construct the arrays are time-consuming. So I decided to reuse the array by making it static and sharing it across tasks. This requires some additional juggling to figure out the minimum size of the array (an extra pass through all tasks), but it works. The program is now much faster (from 80 sec to 22 sec for execution of all tasks).
double[] tempDoubleArray = staticDoubleArray;
However, I'm a bit in the dark about why precisely this works so well. I'd say that in the original code, when tempDoubleArray goes out of scope it can be collected, so allocating a new array should not be that hard, right?
I ask this because understanding why it works might help me figure out other ways to achieve the same effect, and because I would like to know in which cases allocation causes performance issues.

Just because something can be collected doesn't mean that it will. In fact, were the garbage collector as aggressive as that in its collection, your performance would be significantly worse.
Bear in mind that creating an array is not just creating one variable, it's creating N variables (N being the number of elements in the array). Reusing arrays is a good bang-for-your-buck way of increasing performance, though you have to do so carefully.
To clarify, what I mean by "creating variables" specifically is allocating the space for them and performing whatever steps the runtime has to in order to make them usable (i.e. initializing the values to zero/null). Because arrays are reference types, they are stored on the heap, which makes life a little more complicated when it comes to memory allocation. Depending on the size of the array (whether or not it's over 85KB in total storage space), it will either be stored in the normal heap or the Large Object Heap. An array stored on the ordinary heap, as with all other heap objects, can trigger garbage collection and compaction of the heap (which involves shuffling around currently in-use memory in order to maximize contiguous available space). An array stored on the Large Object Heap would not trigger compaction (as the LOH is never compacted), but it could trigger premature collection by taking up another large contiguous block of memory.
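For illustration, here is a minimal sketch of the reuse pattern the question describes (the names are hypothetical, not from the original code); the point is that the allocation and its zero-initialization happen once rather than once per task:
using System;

static class TaskBuffers
{
    // Allocated once and grown only when a task needs more space.
    static double[] sharedBuffer;

    public static double[] Get(int m)
    {
        if (sharedBuffer == null || sharedBuffer.Length < m)
            sharedBuffer = new double[m];        // new arrays are zero-initialized by the runtime
        else
            Array.Clear(sharedBuffer, 0, m);     // clear only the portion this task will use,
                                                 // and only if the task assumes a zeroed buffer
        return sharedBuffer;
    }
}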

One answer could be the large object heap - objects greater than 85KB are allocated on a separate heap, the LOH, which is less frequently collected and not compacted.
See the section on performance implications:
there is an allocation cost (primarily clearing out the allocated memory), and
a collection cost (the LOH and Gen2 are collected together, so freeing large objects requires an expensive Gen2 collection).

It's not always easy to allocate large blocks of memory in the presence of fragmentation. I can't say for sure, but my guess is that the runtime has to do some rearranging to get enough contiguous memory for such a big block. As for why allocating subsequent arrays isn't faster, my guess is either that the big block gets fragmented between GC time and the next allocation, or that the original block was never GC'd to start with.

Related

.NET - Reducing GC contention on array creation

I have some code that deals with a lot of copying of arrays. Basically my class is a collection that uses arrays as backing fields, and since I don't want to run the risk of anyone modifying an existing collection, most operations involve creating copies of the collection before modifying it, hence also copying the backing arrays.
I have noticed that the copying can be slow sometimes, within acceptable limits but I am worried that it might be a problem when the application is scaled up and starts using more data.
Some performance analysis testing suggests that while barely consuming CPU resources at all, my array copy code spends a lot of time blocked. There are few contentions, but a lot of time blocked. Since the testing application is single threaded, I assume there is some GC contention magic going on. I'm not confident enough in how the GC works in these scenarios, so I'm asking here.
My question - is there a way to create new arrays that reduces the strain on the GC? Or is there some other way I can speed this up (simplified for testing and readability purposes):
public MyCollection(MyCollection copyFrom)
{
    _items = new KeyValuePair<T, double>[copyFrom._items.Length]; // this line is reported to have a lot of contention time
    Array.Copy(copyFrom._items, _items, copyFrom._items.Length);
    _numItems = copyFrom._numItems;
}
Not so sure what's going on here, but contention is a threading problem, not an array copying problem. And yes, a concurrency analyzer is liable to point at a new statement since memory allocation requires acquiring a lock that protects the heap.
That lock is held for a very short time when allocations come from the gen #0 heap. So having threads fight over the lock and lose a great deal of time being locked out is a very unlikely mishap. It is not so fast when the allocation comes from the Large Object Heap, which happens when the allocation is 85,000 bytes or more. But then a thread would of course be pretty busy copying the array elements as well.
Do watch out for what the tool tells you: a very large number of total contentions does not automatically mean you have a problem. It only gets ugly when threads end up blocked for a substantial amount of time. If that is a real problem, then you next need to look at how much time is spent on garbage collection. There is a basic perf counter for that; you can see it in Perfmon.exe: category ".NET CLR Memory", counter "% Time in GC", instance = your app. It is liable to be high, considering the amount of copying you do. A knob you can tweak, if that is the real problem, is to enable server GC.
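If server GC is the knob you end up turning, note that (for a .NET Framework application) it is enabled via configuration rather than code; the sketch below is a minimal illustration, with the standard app.config element shown in a comment and a runtime check confirming which mode the process is actually using:
// App.config (not code) is where server GC is enabled:
//   <configuration>
//     <runtime>
//       <gcServer enabled="true" />
//     </runtime>
//   </configuration>
using System;
using System.Runtime;

static class GcModeCheck
{
    public static void Print()
    {
        // Confirms at runtime which collector and latency mode the process is using.
        Console.WriteLine("Server GC:    {0}", GCSettings.IsServerGC);
        Console.WriteLine("Latency mode: {0}", GCSettings.LatencyMode);
    }
}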
There's a concept called a persistent immutable data structure. This is one possible solution that basically lets you "modify" immutable objects in a memory-efficient way.
For example,
Roslyn has a SyntaxTree object that is immutable. You can "modify" the immutable object and get back a modified immutable object. Note that the modified object may involve almost no new memory allocation, because it can build on the first immutable object.
The same concept is also used in the Visual Studio text editor itself. The TextBuffer is an immutable object, and each time you press a key a new immutable TextBuffer is created; however, it does not copy the whole underlying buffer (as that would be slow).
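As a concrete, small illustration of the same structural-sharing idea (using ImmutableList<T> from the System.Collections.Immutable package rather than Roslyn's internal types, which are not shown in this thread):
using System;
using System.Collections.Immutable;   // NuGet package: System.Collections.Immutable

class StructuralSharingDemo
{
    static void Main()
    {
        ImmutableList<int> original = ImmutableList.Create(1, 2, 3);

        // "Modifying" the list returns a new list; the original is untouched,
        // and the two instances share most of their internal tree nodes,
        // so very little new memory is allocated.
        ImmutableList<int> modified = original.Add(4);

        Console.WriteLine(original.Count);   // 3
        Console.WriteLine(modified.Count);   // 4
    }
}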
Also, if it's true that you're facing LOH issues, it can sometimes help to allocate the big memory block yourself and use it as a reusable memory pool, thus avoiding the GC entirely for those buffers. It's worth considering.
No. You can wait for the new runtime in 2015, though, which will use SIMD instructions for the Array.Copy operation. That will be quite a lot faster; the current implementation is very sub-optimal.
In the end, the trick is in avoiding memory operations - which sometimes just is not possible.

Does allocating objects of the same size improve GC or "new" performance?

Suppose we have to create many small objects of byte array type. The size varies, but it is always below 1024 bytes - say 780, 256, 953, and so on.
Will it improve operator new or GC efficiency over time if we always allocate byte[1024] and use only the space needed?
UPD: These are short-lived objects, created for parsing binary protocol messages.
UPD: The number of objects is the same in both cases; it is just the size of the allocation that changes (varying vs. always 1024).
In C++ it would matter because of fragmentation and C++ new performance. But in C#....
Will it improve operator new or GC efficiency over time if we always allocate only bytes[1024], and use only space needed?
Maybe. You're going to have to profile it and see.
The way we allocate syntax tree nodes inside the Roslyn compiler is quite interesting, and I'm eventually going to do a blog post about it. Until then, the relevant bit to your question is this interesting bit of trivia. Our allocation pattern typically involves allocating an "underlying" immutable node (which we call the "green" node) and a "facade" mutable node that wraps it (which we call the "red" node). As you might imagine, it is frequently the case that we end up allocating these in pairs: green, red, green, red, green, red.
The green nodes are persistent and therefore long-lived; the facades are short-lived, because they are discarded on every edit. Therefore it is frequently the case that the garbage collector has green / hole / green / hole / green / hole, and then the green nodes move up a generation.
Our assumption had always been that making data structures smaller will always improve GC performance. Smaller structures equals less memory allocated, equals less collection pressure, equals fewer collections, equals more performance, right? But we discovered through profiling that making the red nodes smaller in this scenario actually decreases GC performance. Something about the particular size of the holes affects the GC in some odd way; not being an expert on the internals of the garbage collector, it is opaque to me why that should be.
So is it possible that changing the size of your allocations can affect the GC in some unforeseen way? Yes, it is possible. But, first off, it is unlikely, and second, it is impossible to know whether you are in that situation until you actually try it in real-world scenarios and carefully measure GC performance.
And of course, you might not be gated on GC performance. Roslyn does so many small allocations that it is crucial that we tune our GC-impacting behaviour, but we do an insane number of small allocations. The vast majority of .NET programs do not stress the GC the way we do. If you are in the minority of programs that stress the GC in interesting ways then there is no way around it; you're going to have to profile and gather empirical data, just like we do on the Roslyn team.
If you are not in that minority, then don't worry about GC performance; you probably have a bigger problem somewhere else that you should be dealing with first.
new is fast; it is the GC that causes problems. So it depends on how long your arrays live.
If they only live a short time, I don't think there will be any improvement from allocating 1024 byte arrays. In fact this will put more pressure on the GC because of the wasted space and will probably degrade performance.
If they live for the life of your application, I would consider allocating one large array and using chunks of it for each small array. You would need to profile this to see if it helps.
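A rough sketch of that "one large array, many chunks" idea, assuming the small buffers have a known maximum size and that their lifetimes allow chunks to be recycled; ArraySegment<byte> is one simple way to hand out views into the big array (the class and its names are purely illustrative):
using System;

// Purely illustrative: one large backing array carved into fixed-size views.
// Assumes every message fits in ChunkSize bytes and that old chunks are no
// longer in use by the time the writer wraps around.
class ChunkedBuffer
{
    const int ChunkSize = 1024;
    readonly byte[] backing;
    int next;

    public ChunkedBuffer(int chunkCount)
    {
        backing = new byte[chunkCount * ChunkSize];
    }

    public ArraySegment<byte> NextChunk()
    {
        if (next >= backing.Length)
            next = 0;                                  // recycle from the start
        var chunk = new ArraySegment<byte>(backing, next, ChunkSize);
        next += ChunkSize;
        return chunk;
    }
}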
Not really. Allocating or clearing a byte array is a single, cheap operation regardless of its size (I'm speaking about your case; there are exceptions).
You shouldn't worry about the performance aspect of garbage collection unless you are sure that it's a bottleneck for your application (i.e. you create a lot of references with complex relationships and throw them away shortly afterward, and the garbage collection is noticeable).
To read an excellent story about a well-known (and quite useful) site having performance issues with the .NET GC (in an impressive use case), see this blog post: http://samsaffron.com/archive/2011/10/28/in-managed-code-we-trust-our-recent-battles-with-the-net-garbage-collector ;)
But the most important thing about the GC is: never, ever optimise before being sure that you have a problem, because if you do, you will probably create one. Applications are complex, and the GC interacts with every part of them at runtime. Apart from simple cases, predicting its behavior and bottlenecks beforehand seems (in my opinion) difficult.
I also don't think that only allocating 1024-byte arrays will improve the GC.
Since the GC is not deterministic, I also think it will not be your problem.
You can influence the GC by putting using {} statements around your disposable objects to free memory (maybe) sooner.
I do not think the GC will be the problem and/or bottleneck.
Allocating objects of different sizes and freeing them in a different sequence can lead to fragmented heap memory. So this is where allocating objects of the same size might come in handy.
If you really do allocate/free a lot and really do see this as your bottleneck, try to re-use the objects with a local object cache. This can lead to a performance increase, especially if they are data-only objects that do not implement a lot of logic. If the objects do implement a lot of logic and require more complex initialisation (RAII pattern), I would forego the performance increase for the robustness of the program...
hth
Mario

Collections and memory

I have an application that reads 3-4 GB of data, builds entities out of each line, and then stores them in Lists.
The problem I have is that memory usage grows insanely, to something like 13 to 15 GB. Why the heck does storing these entities take so much memory?
So I built a tree and did something similar to Huffman encoding, and the overall memory size became around 200-300 MB.
I understand that I compacted the data. But I wasn't expecting that storing objects in the list would increase the memory so much. Why did that happen?
What about other data structures like dictionary, stack, queue, array, etc.?
Where can I find more information about the internals and memory allocations of data structures?
Or am I doing something wrong?
In .NET, large objects go on the large object heap, which is not compacted. Large is everything above 85,000 bytes. When you grow your lists, they will probably become larger than that and have to be reallocated once you cross the current capacity. Reallocation means that they are very likely put at the end of the heap. So you end up with a very fragmented LOH and lots of memory usage.
Update: If you initialize your lists with the required capacity (which you can determine from the DB I guess) then your memory consumption should go down a bit.
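A minimal sketch of that suggestion (the Entity type and the source of expectedCount are hypothetical):
using System.Collections.Generic;

class Entity { /* fields omitted */ }

static class Loader
{
    // expectedCount would come from the data source, e.g. a row count query (hypothetical).
    public static List<Entity> Load(int expectedCount)
    {
        // Pre-sizing means the list's backing array is allocated once,
        // instead of being doubled and copied repeatedly as items are added.
        var entities = new List<Entity>(expectedCount);
        // ... read lines and entities.Add(...) ...
        return entities;
    }
}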
Regardless of the data structure you're going to use, your memory consumption is never going to drop below the memory required to store all your data.
Have you calculated how much memory is required to store one instance of your class?
Your Huffman encoding is a space-saving optimization, which means that you are eliminating a lot of duplicated data within your class objects yourself. This has nothing to do with the data structure you use to hold your data. It depends on how your data itself is structured so that you can take advantage of different space-saving strategies (of which Huffman encoding is one of many possibilities, suitable for eliminating common prefixes, with a tree as the data structure used to store it).
Now, back to your question. Without optimizing your data (i.e. your objects), there are things you can watch out for to improve memory usage efficiency.
Are all your objects of similar size?
Did you simply run a loop, allocate memory on-the-fly, then insert them into a list, like this:
foreach (var obj in collection) { myList.Add(new myObject(obj)); }
In that case, your list is constantly being expanded. Whenever it outgrows its current capacity, .NET allocates a new, larger piece of memory and copies the original array to the new memory. Essentially you end up with two pieces of memory -- the original one, and the new expanded one (now holding the list). Do this many, many times (as you obviously need to for GBs of data), and you are looking at a LOT of fragmented memory space.
You'll be better off just allocating enough memory for the entire list at one go.
As an afternote, I can't help but wonder: how in the world are you going to search this HUGE list to find something you need? Shouldn't you be using something like a binary tree or a hash table to aid your searching? Maybe you are just reading in all the data, performing some processing on it, then writing it back out...
If you are using classes, read the responses to this: Understanding CLR object size between 32 bit vs 64 bit
On 64 bits (you are using 64 bits, right?) the object overhead is 16 bytes, PLUS the reference to the object (someone is referencing it, right?), so another 8 bytes. So an empty object will "eat" at least 24 bytes.
If you are using Lists, remember that Lists grow by doubling, so you could be wasting much space. Other .NET collections grow in the same way.
I'll add that the "pure" overhead of millions of Lists could bring memory to its knees. Besides the 16 + 8 bytes of space "eaten" by the List object itself, it is composed (in the .NET implementation) of 2 ints (8 bytes), a SyncRoot reference (8 bytes, normally null), and a reference to the internal array (so 8 + 16 bytes + the array).
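A small sketch showing the doubling effect: after adding items one at a time, Capacity typically sits well above Count, and List<T>.TrimExcess() can release the slack once the list is fully populated (at the cost of one more allocation and copy):
using System;
using System.Collections.Generic;

class ListGrowthDemo
{
    static void Main()
    {
        var list = new List<int>();
        for (int i = 0; i < 1000; i++)
            list.Add(i);

        // Capacity grows by doubling, so it is typically 1024 here while Count is 1000.
        Console.WriteLine("Count: {0}, Capacity: {1}", list.Count, list.Capacity);

        list.TrimExcess();   // reallocates the backing array to fit Count, dropping the slack
        Console.WriteLine("Capacity after TrimExcess: {0}", list.Capacity);
    }
}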

C#: managing large memory buffers

I am maintaining a video application written in C#.
I need as much control as possible over memory allocation/deallocation for large memory buffers (hundreds of megabytes).
As it is written, when pixel data needs to be freed, the pixel buffer is set to null. Is there a better way of freeing up memory?
Is there a large cost to garbage collecting large objects?
Thanks!
Don't throw big buffers like that away; you are lucky to have them. Video gives lots of opportunity for re-use. Don't lose a buffer until you are sure you won't need it anymore. At that point it doesn't matter when it gets collected.
The cost of garbage collecting large objects is very high, from what I remember. From what I have read, they automatically become generation 2 on allocation (they are allocated in the large object heap). And since they are large, they force frequent generation 2 collections.
So I'd rather implement manual pooling for the bitmap arrays, or even use unmanaged memory. Have some pool class and return the array back to it in the Dispose of your pixels/bitmap class.
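A minimal sketch of such a pool (the class and method names are illustrative, not an existing framework API); the Dispose of the pixel/bitmap wrapper would call Return instead of simply dropping the reference:
using System.Collections.Generic;

// Illustrative hand-rolled pool for equally sized pixel buffers; not thread-safe
// (add locking or a concurrent collection if buffers cross threads).
class PixelBufferPool
{
    readonly Stack<byte[]> free = new Stack<byte[]>();
    readonly int bufferSize;

    public PixelBufferPool(int bufferSize)
    {
        this.bufferSize = bufferSize;
    }

    public byte[] Rent()
    {
        // Reuse a previously returned buffer when possible; allocate only on a miss.
        return free.Count > 0 ? free.Pop() : new byte[bufferSize];
    }

    public void Return(byte[] buffer)
    {
        free.Push(buffer);   // called from the Dispose of the pixel/bitmap wrapper
    }
}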
With memory blocks that large ("hundreds of megabytes") it should be relatively easy to know precisely who uses them and where (you can fit only 10-20 such blocks in memory anyway). As you plan to use such amounts of memory, you need to budget memory usage carefully - e.g. a simple copy of a whole buffer takes non-trivial time.
When you are done with a particular block, you can force a GC yourself. It sounds like a reasonable use of the GC.Collect API - you are done using a huge portion of all available memory.
You may also consider switching to allocating smaller (64K) blocks and linking them together, if that works for your application. This aligns better with garbage collection and may give your application more flexibility.

Understanding Memory Performance Counters

[Update - Sep 30, 2010]
Since I have studied a lot on this and related topics, I'll write down the tips I gathered from my experience and from the suggestions provided in the answers here:
1) Use a memory profiler (try CLR Profiler to start with) and find the routines which consume the most memory, then fine-tune them: reuse big arrays, keep references to objects to a minimum.
2) If possible, allocate small objects (less than 85KB for .NET 2.0) and use memory pools if you can, to avoid high CPU usage by the garbage collector.
3) If you increase the number of references to objects, you're responsible for de-referencing them the same number of times. You'll have peace of mind and the code will probably work better.
4) If nothing works and you are still clueless, use the elimination method (comment out / skip code) to find out what is consuming the most memory.
Using memory performance counters inside your code might also help you.
Hope these help!
[Original question]
Hi!
I'm working in C#, and my issue is an out-of-memory exception.
I read an excellent article on LOH here ->
http://www.simple-talk.com/dotnet/.net-framework/the-dangers-of-the-large-object-heap/
Awesome read!
And,
http://dotnetdebug.net/2005/06/30/perfmon-your-debugging-buddy/
My issue:
I am facing an out-of-memory issue in an enterprise-level desktop application. I have tried to read and understand material about memory profiling and performance counters (I tried WinDbg too! - a little bit), but I am still clueless about the basics.
I tried CLR profiler to analyze the memory usage. It was helpful in:
Showing me who allocated huge chunks of memory
What data types used the most memory
But both CLR Profiler and the performance counters (since they share the same data) failed to explain:
The numbers collected after each run of the app - how do I tell whether there is any improvement?
How do I compare the performance data after each run - is a lower/higher value of a particular counter good or bad?
What I need:
I am looking for the tips on:
How to free (yes, right) managed objects (like arrays and big strings) - but not by making GC.Collect calls, if possible. I have to handle byte arrays of around 500KB (an unavoidable size :-( ) every now and then.
If fragmentation occurs, how to compact memory - it seems that the .NET GC is not doing this very effectively, causing OOM.
Also, what exactly is the 85KB limit for the LOH? Is it the size of the object or the overall size of the array? This is not very clear to me.
Which memory counters can tell whether code changes are actually reducing the chances of OOM?
Tips I already know
Set managed objects to null - mark them as garbage - so that the garbage collector can collect them. This is strange - after setting a string[] object to null, the # Bytes in all Heaps shot up!
Avoid creating objects/arrays > 85KB - this is not in my control. So, there could be lots of LOH usage.
Memory Leak Indicators:
# bytes in all Heaps increasing
Gen 2 Heap Size increasing
# GC handles increasing
# of Pinned Objects increasing
# total committed Bytes increasing
# total reserved Bytes increasing
Large Object Heap increasing
My situation:
I have got a 4 GB, 32-bit machine with Win 2K3 Server SP2 on it.
I understand that an application can use <= 2 GB of physical RAM
Increasing the Virtual Memory (pagefile) size has no effect in this scenario.
As it is an OOM issue, I am focusing only on memory-related counters.
Please advise! I really need some help, as I'm stuck because of the lack of good documentation!
Nayan, here are the answers to your questions, and a couple of additional pieces of advice.
You cannot free them; you can only make them easier for the GC to collect. It seems you already know the way: the key is reducing the number of references to the object.
Fragmentation is one more thing which you cannot control. But there are several factors which can influence this:
LOH external fragmentation is less dangerous than Gen2 external fragmentation, 'cause LOH is not compacted. The free slots of LOH can be reused instead.
If the 500KB byte arrays you are referring to are used as IO buffers (e.g. passed to some socket-based API or unmanaged code), there is a high chance that they will get pinned. A pinned object cannot be moved by the GC, and pinned objects are one of the most frequent causes of heap fragmentation.
85K is a limit on object size. But remember, a System.Array instance is an object too, so all your 500K byte[] arrays are in the LOH.
All the counters in your post can give a hint about changes in memory consumption, but in your case I would select BIAH (Bytes in all Heaps) and LOH size as the primary indicators. BIAH shows the total size of all managed heaps (Gen1 + Gen2 + LOH, to be precise - no Gen0, but who cares about Gen0, right? :) ), and the LOH is the heap where all large byte[] arrays are placed.
Advices:
Something that has already been proposed: pre-allocate and pool your buffers.
A different approach, which can be effective if you can use any collection instead of a contiguous array of bytes (this is not the case if the buffers are used in IO): implement a custom collection which is internally composed of many smaller arrays. This is something similar to std::deque from the C++ STL. Since each individual array is smaller than 85K, the whole collection won't end up in the LOH. The advantage you can get with this approach is the following: the LOH is only collected when a full GC happens. If the byte[] arrays in your application are not long-lived and (if they were smaller) would be collected in Gen0 or Gen1, this makes memory management much easier for the GC, since Gen2 collection is much more heavyweight.
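A rough, illustrative sketch of that chunked-collection idea; each chunk stays below the 85,000-byte threshold, so none of them land on the LOH:
using System.Collections.Generic;

// Illustrative chunked byte collection: each chunk stays off the LOH.
class ChunkedBytes
{
    const int ChunkSize = 64 * 1024;          // 64K, safely under the 85,000-byte threshold
    readonly List<byte[]> chunks = new List<byte[]>();

    public long Length { get; private set; }

    public void Add(byte value)
    {
        int offset = (int)(Length % ChunkSize);
        if (offset == 0)
            chunks.Add(new byte[ChunkSize]);  // grow by one small array at a time
        chunks[chunks.Count - 1][offset] = value;
        Length++;
    }

    public byte this[long index]
    {
        get { return chunks[(int)(index / ChunkSize)][(int)(index % ChunkSize)]; }
    }
}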
An advice on the testing & monitoring approach: in my experience, the GC behavior, memory footprint and other memory-related stuff need to be monitored for quite a long time to get some valid and stable data. So each time you change something in the code, have a long enough test with monitoring the memory performance counters to see the impact of the change.
I would also recommend taking a look at the % Time in GC counter, as it can be a good indicator of the effectiveness of memory management. The larger this value is, the more time your application spends on GC routines instead of processing requests from users or doing other 'useful' operations. I cannot give advice on what absolute values of this counter indicate an issue, but I can share my experience for your reference: for the application I am working on, we usually treat % Time in GC higher than 20% as an issue.
Also, it would be useful if you shared some values of memory-related perf counters of your application: Private bytes and Working set of the process, BIAH, Total committed bytes, LOH size, Gen0, Gen1, Gen2 size, # of Gen0, Gen1, Gen2 collections, % Time in GC. This would help better understand your issue.
You could try pooling and managing the large objects yourself. For example, if you often need <500k arrays and the number of arrays alive at once is well understood, you could avoid deallocating them ever--that way if you only need, say, 10 of them at a time, you could suffer a fixed 5mb memory overhead instead of troublesome long-term fragmentation.
As for your three questions:
1. This is just not possible. Only the garbage collector decides when to finalize managed objects and release their memory. That's part of what makes them managed objects.
2. This is possible if you manage your own heap in unsafe code and bypass the large object heap entirely. You will end up doing a lot of work and suffering a lot of inconvenience if you go down this road. I doubt that it's worth it for you.
3. It's the size of the object, not the number of elements in the array.
Remember, fragmentation only happens when objects are freed, not when they're allocated. If fragmentation is indeed your problem, reusing the large objects will help. Focus on creating less garbage (especially large garbage) over the lifetime of the app instead of trying to deal with the nuts and bolts of the gc implementation directly.
Another indicator is watching Private Bytes vs. Bytes in all Heaps. If Private Bytes increases faster than Bytes in all Heaps, you have an unmanaged memory leak. If "Bytes in all Heaps" increases faster than "Private Bytes", it is a managed leak.
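A small sketch of reading those two counters for the current process with System.Diagnostics.PerformanceCounter (the category and counter names are the standard ones; the interpretation in the comment mirrors the rule above, and the instance-name lookup is simplified):
using System;
using System.Diagnostics;

class LeakIndicator
{
    static void Main()
    {
        // Simplified: assumes a single instance of the process is running.
        string instance = Process.GetCurrentProcess().ProcessName;

        using (var privateBytes = new PerformanceCounter("Process", "Private Bytes", instance))
        using (var managedHeaps = new PerformanceCounter(".NET CLR Memory", "# Bytes in all Heaps", instance))
        {
            // Sample both over time: unmanaged growth shows up in Private Bytes only,
            // while managed growth shows up in # Bytes in all Heaps as well.
            Console.WriteLine("Private Bytes:        {0:N0}", privateBytes.NextValue());
            Console.WriteLine("# Bytes in all Heaps: {0:N0}", managedHeaps.NextValue());
        }
    }
}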
To correct something that @Alexey Nedilko said:
"LOH external fragmentation is less dangerous than Gen2 external
fragmentation, 'cause LOH is not compacted. The free slots of LOH can
be reused instead."
is absolutely incorrect. Gen2 is compacted, which means there is never free space left after a collection. The LOH is NOT compacted (as he correctly mentions), and yes, free slots are reused. BUT if the free space is not contiguous enough to fit the requested allocation, then the segment size is increased - and it can continue to grow and grow. So you can end up with gaps in the LOH that are never filled. This is a common cause of OOMs, and I've seen it in many memory dumps I've analyzed.
Though there are now methods in the GC API (as of .NET 4.5.1) that can be called to programmatically compact the LOH, I strongly recommend avoiding this if app performance is a concern. Performing this operation at runtime is extremely expensive and can hurt your app performance significantly. The default implementation of the GC was designed to be performant, which is why this step was omitted in the first place. IMO, if you find that you have to call this because of LOH fragmentation, you are doing something wrong in your app - and it can be improved with pooling techniques, splitting arrays, and other memory allocation tricks instead. If this app is an offline app or some batch process where performance isn't a big deal, maybe it's not so bad, but I'd use it sparingly at best.
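For reference, a minimal sketch of the API being referred to (available from .NET 4.5.1); as stressed above, treat it as a last resort:
using System;
using System.Runtime;

static class LohCompaction
{
    public static void CompactOnce()
    {
        // Requests that the next blocking full collection also compacts the LOH.
        // Expensive: the entire LOH is moved, so use sparingly, if at all.
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect();
    }
}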
A good visual example of how this can happen is here: The Dangers of the Large Object Heap, and here: Large Object Heap Uncovered, by Maoni (GC team lead on the CLR).
