Collections and memory - C#

I have an application that reads 3-4 GB of data, builds entities out of each line and then stores them in Lists.
The problem I have is that memory grows insanely, to something like 13 to 15 GB. Why does storing these entities take so much memory?
So I built a tree and did something similar to Huffman encoding, and the overall memory size became around 200-300 MB.
I understand that I compacted the data. But I wasn't expecting that storing objects in the list would increase the memory so much. Why did that happen?
How about other data structures like dictionary, stack, queue, array, etc.?
Where can I find more information about the internals and memory allocations of data structures?
Or am I doing something wrong?

In .NET, large objects go on the large object heap (LOH), which is not compacted. "Large" is everything above 85,000 bytes. When you grow your lists, their backing arrays will probably become larger than that and have to be reallocated once you cross the current capacity. Reallocation means that they are very likely put at the end of the heap, so you end up with a very fragmented LOH and lots of memory usage.
Update: If you initialize your lists with the required capacity (which you can determine from the DB, I guess), then your memory consumption should go down a bit.
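A minimal sketch of what that looks like (GetExpectedLineCount, MyEntity, ParseEntity and path are placeholders, not from the question):
int expectedCount = GetExpectedLineCount();          // hypothetical helper: a COUNT(*) query or a cheap pre-scan
var entities = new List<MyEntity>(expectedCount);    // backing array allocated once at the right size
foreach (string line in File.ReadLines(path))        // 'path' is a placeholder
    entities.Add(ParseEntity(line));                 // hypothetical parser; the list never re-grows while capacity holds
With the capacity supplied up front, the list never has to double and copy its internal array, so the LOH is not littered with abandoned intermediate arrays.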

Regardless of the data structure you're going to use, your memory consumption is never going to drop below the memory required to store all your data.
Have you calculated how much memory is required to store one instance of your class?
Your Huffman encoding is a space-saving optimization, which means that you are eliminating a lot of duplicated data within your class objects yourself. This has nothing to do with the data structure you use to hold your data. It depends on how your data itself is structured, so that you can take advantage of different space-saving strategies (of which Huffman encoding is one out of many possibilities, suitable for eliminating common prefixes; the data structure used to store it is a tree).
Now, back to your question. Without optimizing your data (i.e. your objects), there are things you can watch out for to improve memory usage efficiency.
Are all your objects of similar size?
Did you simply run a loop, allocating memory on-the-fly, then insert the objects into a list, like this:
foreach (var obj in collection) { myList.Add(new myObject(obj)); }
In that case, your list object is constantly being expanded. And if there is not enough free memory at the end to expand the list, .NET will allocate a new, larger piece of memory and copy the original array into the new memory. Essentially you end up with two pieces of memory -- the original one, and the new expanded one (now holding the list). Do this many, many times (as you obviously need to for GBs of data), and you are looking at a LOT of fragmented memory spaces.
You'll be better off just allocating enough memory for the entire list at one go.
As an afternote, I can't help but wonder: how in the world are you going to search this HUGE list to find something you need? Shouldn't you be using something like a binary tree or a hash table to aid in your searching? Maybe you are just reading in all the data, performing some processing on all of it, then writing it back out...
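If the entities do need to be looked up by some key later, a dictionary is the usual answer. A hedged sketch, reusing the hypothetical names from above (the Id property and key type are assumptions):
// index the entities by an Id property for O(1) average-case lookup instead of scanning the list
var byId = new Dictionary<string, MyEntity>(entities.Count);
foreach (var e in entities)
    byId[e.Id] = e;
MyEntity found;
if (byId.TryGetValue("some-id", out found))          // no scan of the whole list
    Console.WriteLine(found);
Note that a Dictionary costs extra memory per entry (its bucket and entry arrays), so it only pays off if lookups are actually needed.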

If you are using classes, read the answers to this: Understanding CLR object size between 32 bit vs 64 bit
On 64 bits (you are using 64 bits, right?) the object overhead is 16 bytes PLUS the reference to the object (someone is referencing it, right?), so another 8 bytes. So an empty object will "eat" at least 24 bytes.
If you are using Lists, remember that Lists grow by doubling, so you could be wasting a lot of space. Other .NET collections grow in the same way.
I'll add that the "pure" overhead of millions of Lists could bring the memory to its knees. Other than the 16 + 8 bytes of space "eaten" by the List object, it is composed (in the .NET implementation) of 2 ints (8 bytes), a SyncLock reference (8 bytes, normally null) and a reference to the internal array (so 8 bytes + 16 bytes + the array itself).
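If you want to see that overhead yourself, a rough measurement sketch (GC.GetTotalMemory is only an approximation, and exact numbers vary with runtime and bitness):
// crude per-object measurement: allocate a million empty lists and see how much memory they cost
const int count = 1000000;
var keep = new object[count];                        // keeps the lists alive so the GC cannot collect them
long before = GC.GetTotalMemory(true);
for (int i = 0; i < count; i++)
    keep[i] = new List<int>();                       // object header + fields; the backing array is created lazily on first Add
long after = GC.GetTotalMemory(true);
Console.WriteLine((after - before) / (double)count + " bytes per empty List<int> (approx.)");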

Related

Buffer growing requires triple memory - is there a more efficient way (in general)?

Suppose I have a byte[] (buffer) and I need to keep it in memory. It also needs an x2 growing strategy, so each time I need it to grow I have to create another byte[] twice as large and copy the elements there.
During this operation the app's memory usage is original_size * 3, because I have both the original buffer and the new buffer in memory.
When working with buffers of size near 300 MB, this operation takes 900 MB of memory and can easily lead to an OutOfMemoryException in a 32-bit app.
The only way I know to make this better is to choose another growing strategy: after some constant size, make the buffer grow linearly (e.g. +5 MB each time). But it still requires 600 MB for my 300 MB of data!
So can I do anything about it?
I'm thinking about a structure which maintains a list of buffers inside, so it doesn't need to copy anything when growing - it just adds a new buffer to the list. It should provide methods to perform operations on those buffers as if they were one big buffer. Is there something like this in .NET? Or what is it usually called?
Added:
I don't know the capacity in advance.
The buffers are used in serialization process.
Sorry for misleading, the question is about the buffer-growing problem in general - not about solving a particular app issue.
In this scenario I think you should use an ArrayList or a List instead of an array.
List is better than ArrayList. For more information visit:
Which is better? array, ArrayList or List (in terms of performance and speed)
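The question itself already describes the usual fix: a segmented (chunked) buffer that grows by adding fixed-size blocks, so nothing is ever copied on growth. A minimal sketch of the idea (class name and chunk size are arbitrary, not an existing .NET type):
// grows by appending fixed-size chunks; nothing is ever copied when it grows
class ChunkedBuffer
{
    private const int ChunkSize = 64 * 1024;
    private readonly List<byte[]> chunks = new List<byte[]>();
    private long length;

    public long Length { get { return length; } }

    public void Append(byte[] data, int offset, int count)
    {
        while (count > 0)
        {
            int posInChunk = (int)(length % ChunkSize);
            if (posInChunk == 0)
                chunks.Add(new byte[ChunkSize]);                 // allocate only the next chunk
            int toCopy = Math.Min(count, ChunkSize - posInChunk);
            Buffer.BlockCopy(data, offset, chunks[chunks.Count - 1], posInChunk, toCopy);
            offset += toCopy;
            count -= toCopy;
            length += toCopy;
        }
    }

    public byte this[long index]
    {
        get { return chunks[(int)(index / ChunkSize)][(int)(index % ChunkSize)]; }
    }
}
A Stream facade over such a chunk list is what libraries such as Microsoft's RecyclableMemoryStream provide, if hand-rolling this is undesirable.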

Hash Array Mapped Tries Memory Allocation

I'm reading this http://lampwww.epfl.ch/papers/idealhashtrees.pdf and at the end of page 5 into page 6 he talks about having an array of 32 linked lists of free tables.
In the algorithm, when you add a node to a table, you resize by 1 and then append it.
I don't understand the benefit of having this linked list of free tables. What's the difference between pre-allocating them and allocating them on the fly? If you pre-allocate then you're wasting memory so the main benefit of a HAMT is defeated, and also you'll rapidly run out of pre-allocated tables to use so it was a waste from the start.
I just don't really understand that section of the paper at all. Either way you're allocating these arrays so how is this more complex version any better than simply re-allocating on demand? I'm coding this in C# by the way if it makes any difference regarding memory optimizations.

Looking to optimize Redis memory usage for caching many JSON API results

I'm brand new to Redis, and am just experimenting with caching some data and seeing how memory usage/performance compares to other options like Memcached. I'm using the ServiceStack.Redis client library via IRedisClient.
I have been testing Redis, and 25k key/value objects are pushing around 250 MB of memory, with a 100 MB dump.rdb file. I need to cache a lot more than this, and am looking to reduce the memory consumption if possible. My best guess is that each cache item's text (a JSON blob) is around 4 KB in size, but if my basic math is correct, each item is consuming around 10 KB in Redis, from a memory footprint point of view at least. The vast difference between the dump size and the in-memory size is a bit alarming to me.
I'm also running on a 64-bit VM right now, which I understand wastes a lot of extra space compared to 32-bit, so I'll look into that as well. It looks like Redis needs 2x the memory for each pointer (per key/value cached?). Could this be where the 2.5x disk:memory ratio is coming from?
I understand I can write code on my side to deal with the compression/decompression of data on the way in/out of Redis, but just curious if there is some way to configure the client library to do something similar with say StreamExtensions.
The usage pattern is read heavy, with infrequent writes and/or batch cache-refresh writes.
Anyway, looking for any suggestions on how to get more cache items for a given amount of memory.
There are multiple points you need to consider. In the following, I suppose your data are stored in strings, each of them containing a JSON object.
The first point is that 4 KB JSON objects are being stored. The overhead of Redis due to dynamic data structures and pointers is absolutely negligible compared to the size of the useful data. This overhead would be high if you had plenty of very small objects (it is about 80 bytes per key), but with 4 KB objects it should not be a problem.
So using a 32-bit version (reducing the size of pointers) will not help.
The second point is that the difference between the memory footprint and the dump file size is easily explained by the fact that strings in the dump file are compressed using the LZF algorithm (and JSON compresses quite well). The memory footprint is generally much larger than the dump file size for non-compressed data.
Now, the difference you see between the real size of your data and the memory footprint is probably due to the allocator's internal fragmentation. Generally, people only consider external fragmentation (i.e. the one which is commonly referred to as memory fragmentation by most people), but in some situations internal fragmentation can also represent a major overhead. See the definitions here.
In your situation, the 4 KB objects are actually one of the worst cases. Redis uses the jemalloc allocator, featuring well-defined allocation classes. You can see that 4 KB is an allocation class and the next one is 8 KB. It means that if a number of your objects weigh a bit more than 4 KB (including the Redis string overhead of 8 bytes), 8 KB will be allocated instead of 4 KB, and half of the memory will be wasted.
You can easily check this point by only storing objects a bit smaller than 4 KB, and calculate the ratio between the memory footprint and the expected size of the useful data. Repeat the same operation with objects a bit larger than 4 KB and compare the results.
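As a rough, illustrative calculation with the numbers already mentioned (the 4 KB payload is the question's own estimate, so treat this as a sanity check rather than a diagnosis):
~4.1 KB of JSON + ~8 bytes of string overhead   →  too big for the 4 KB class, so jemalloc serves an 8 KB block
25,000 values × 8 KB                            ≈  200 MB for the values alone
25,000 keys × ~80 bytes of per-key overhead     ≈  2 MB
The remaining gap to the observed 250 MB is plausibly key strings, dictionary buckets and external fragmentation, which fits the roughly 10 KB-per-item figure from the question.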
Possible solutions to reduce the overhead:
Client-side compression. Use any lightweight compression algorithm (LZF, LZO, QuickLZ, Snappy). It will work well if you can keep the size of most of your objects below 4 KB; see the sketch after this list.
Change the memory allocator. The Redis makefile also supports tcmalloc (Google's allocator) as a memory allocator. It could reduce the memory overhead for these 4 KB objects, since its allocation classes are different.
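A minimal sketch of the client-side compression idea using the BCL's GZipStream (not as light as LZF or Snappy, but built into .NET; how the resulting byte[] is handed to the Redis client is left out, since that depends on the client API):
using System.IO;
using System.IO.Compression;
using System.Text;

static byte[] CompressJson(string json)
{
    var raw = Encoding.UTF8.GetBytes(json);
    using (var output = new MemoryStream())
    {
        using (var gzip = new GZipStream(output, CompressionMode.Compress))
            gzip.Write(raw, 0, raw.Length);          // closing the GZipStream flushes the footer
        return output.ToArray();                     // store this byte[] in Redis instead of the raw JSON string
    }
}

static string DecompressJson(byte[] compressed)
{
    using (var input = new MemoryStream(compressed))
    using (var gzip = new GZipStream(input, CompressionMode.Decompress))
    using (var output = new MemoryStream())
    {
        gzip.CopyTo(output);
        return Encoding.UTF8.GetString(output.ToArray());
    }
}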
Please note with other in-memory stores, you will also get the same kind of overhead. For instance with memcached, it is the job of the slab allocator to optimize memory consumption, and minimize internal and external fragmentation.
I myself had a hard time understanding how to use Redis efficiently, especially when you come from Memcached (get/set) vs. Redis (strings, hashes, lists, sets & sorted sets).
You should read this article about Redis memory usage: http://nosql.mypopescu.com/post/1010844204/redis-memory-usage. It's an old article (2010), but still interesting.
I see two solutions here:
Compile and use 32-bit instances. Dump files are compatible between 32-bit and 64-bit, and you can switch later if you need to.
Using hashes looks better to me: http://redis.io/topics/memory-optimization. Read the section "Using hashes to abstract a very memory efficient plain key-value store on top of Redis". ServiceStack.Redis provides a RedisClientHash. It should be easy to use!
Hope it helps!

String vs byte array, Performance

(This post is regarding high-frequency-trading-style programming.)
I recently saw on a forum (I think they were discussing Java) that if you have to parse a lot of string data, it's better to use a byte array than a string with a split(). The exact post was:
One performance trick to working with any language, C++, Java, C#, is to avoid object creation. It's not the cost of allocation or GC, it's the cost of accessing large memory arrays that don't fit in the CPU cache.
Modern CPUs are much faster than their memory. They stall for many, many cycles on each cache miss. Most of the CPU transistor budget is allocated to reducing this with large caches and lots of tricks.
GPUs solve the problem differently by having lots of threads ready to execute to hide memory access latency, and have little or no cache and spend the transistors on more cores.
So, for example, rather than using Strings and split to parse a message, use byte arrays that can be updated in place. You really want to avoid random memory access over large data structures, at least in the inner loops.
Is he just saying "don't use strings because they're objects and creating objects is costly"? Or is he saying something else?
Does using a byte array ensure the data remains in the cache for as long as possible?
When you use a string, is it too large to be held in the CPU cache?
Generally, is using primitive data types the best method for writing faster code?
He's saying that if you break a chunk of text up into separate string objects, those string objects have worse locality than the large array of text. Each string, and the array of characters it contains, is going to be somewhere else in memory; they can be spread all over the place. It is likely that the memory cache will have to thrash in and out to access the various strings as you process the data. In contrast, the one large array has the best possible locality, as all the data is in one area of memory, and cache-thrashing will be kept to a minimum.
There are limits to this, of course: if the text is very, very large, and you only need to parse out part of it, then those few small strings might fit better in the cache than the large chunk of text.
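As a hedged illustration of the "update in place" idea in C# (a toy scanner for comma-separated integers; the format and the ASCII assumption are mine, not the quoted poster's):
// scans comma-separated integer fields directly out of a byte[] without allocating substrings
static int SumFields(byte[] data)
{
    int sum = 0, value = 0;
    for (int i = 0; i < data.Length; i++)
    {
        byte b = data[i];
        if (b >= (byte)'0' && b <= (byte)'9')
        {
            value = value * 10 + (b - '0');   // accumulate digits in place
        }
        else if (b == (byte)',' || b == (byte)'\n')
        {
            sum += value;                     // field boundary: consume the parsed value
            value = 0;
        }
    }
    return sum + value;                       // flush the last field
}
The equivalent Split-and-Parse version allocates one string per field plus the array that holds them, which is exactly the scattered-objects pattern described above.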
There are lots of other reasons to use byte[] or char* instead of Strings for HFT. Strings consist of 16-bit chars in Java and are immutable. byte[] or ByteBuffer is easily recycled, has good cache locality, can be off the heap (direct) saving a copy, and avoids character encoders. This all assumes you are using ASCII data.
char* or ByteBuffers can also be mapped to network adapters to save another copy. (With some fiddling for ByteBuffers)
In HFT you are rarely dealing with large amounts of data at once. Ideally you want to be processing data as soon as it comes down the Socket. i.e. one packet at a time. (about 1.5 KB)

Why does reusing arrays increase performance so significantly in c#?

In my code, I perform a large number of tasks, each requiring a large array of memory to temporarily store data. I have about 500 tasks. At the beginning of each task, I allocate memory for an array:
double[] tempDoubleArray = new double[M];
M is a large number depending on the precise task, typically around 2000000. Now, I do some complex calculations to fill the array, and in the end I use the array to determine the result of this task. After that, the tempDoubleArray goes out of scope.
Profiling reveals that the calls to construct the arrays are time consuming. So, I decided to try to reuse the array by making it static and reusing it. It requires some additional juggling to figure out the minimum size of the array, requiring an extra pass through all tasks, but it works. Now, the program is much faster (from 80 sec to 22 sec for execution of all tasks).
double[] tempDoubleArray = staticDoubleArray;
However, I'm a bit in the dark about why precisely this works so well. I'd say that in the original code, when tempDoubleArray goes out of scope, it can be collected, so allocating a new array should not be that hard, right?
I ask this because understanding why it works might help me figure out other ways to achieve the same effect, and because I would like to know in what cases allocation gives performance issues.
Just because something can be collected doesn't mean that it will. In fact, were the garbage collector as aggressive as that in its collection, your performance would be significantly worse.
Bear in mind that creating an array is not just creating one variable, it's creating N variables (N being the number of elements in the array). Reusing arrays is a good bang-for-your-buck way of increasing performance, though you have to do so carefully.
To clarify, what I mean by "creating variables" specifically is allocating the space for them and performing whatever steps the runtime has to in order to make them usable (i.e. initializing the values to zero/null). Because arrays are reference types, they are stored on the heap, which makes life a little more complicated when it comes to memory allocation. Depending on the size of the array (whether or not it's over 85KB in total storage space), it will either be stored in the normal heap or the Large Object Heap. An array stored on the ordinary heap, as with all other heap objects, can trigger garbage collection and compaction of the heap (which involves shuffling around currently in-use memory in order to maximize contiguous available space). An array stored on the Large Object Heap would not trigger compaction (as the LOH is never compacted), but it could trigger premature collection by taking up another large contiguous block of memory.
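A hedged sketch of the reuse pattern (single-threaded; it grows lazily instead of pre-computing the maximum size, which avoids the extra pass mentioned in the question):
// one buffer, reused across tasks; re-allocated only when a bigger task shows up
static double[] sharedBuffer;

static double[] GetBuffer(int minSize)
{
    if (sharedBuffer == null || sharedBuffer.Length < minSize)
        sharedBuffer = new double[minSize];          // new allocation only on growth
    else
        Array.Clear(sharedBuffer, 0, minSize);       // only needed if the task expects zeroed memory
    return sharedBuffer;
}

// inside each task:
// double[] tempDoubleArray = GetBuffer(M);
On newer runtimes, System.Buffers.ArrayPool<double>.Shared.Rent and Return give a similar effect (and also work for parallel tasks) without a hand-rolled static field.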
One answer could be the large object heap - objects greater than 85 KB are allocated on a separate heap, the LOH, which is collected less frequently and never compacted.
See the section on performance implications:
there is an allocation cost (primarily clearing out the allocated memory);
there is a collection cost (the LOH and Gen2 are collected together, so collecting large objects triggers a full Gen2 collection, including compaction of Gen2).
It's not always easy to allocate large blocks of memory in the presence of fragmentation. I can't say for sure, but my guess is that it's having to do some rearranging to get enough contiguous memory for such a big block of memory. As for why allocating subsequent arrays isn't faster, my guess is either that the big block gets fragmented between GC time and the next allocation OR the original block was never GCd to start with.
