I'm reading this http://lampwww.epfl.ch/papers/idealhashtrees.pdf and at the end of page 5 into page 6 he talks about having an array of 32 linked lists of free tables.
In the algorithm, when you add a node to a table, you resize by 1 and then append it.
I don't understand the benefit of having this linked list of free tables. What's the difference between pre-allocating them and allocating them on the fly? If you pre-allocate, you're wasting memory (defeating the main benefit of a HAMT), and you'll also quickly run out of pre-allocated tables, so the pre-allocation was a waste from the start.
I just don't really understand that section of the paper at all. Either way you're allocating these arrays, so how is this more complex version any better than simply allocating on demand? I'm coding this in C#, by the way, if that makes any difference regarding memory optimizations.
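To make the question concrete, here is a minimal sketch of how I'm reading that free-list idea, in C# (TablePool, Rent, Return and GrowByOne are my names, not the paper's). My question is why something like this beats just newing up the bigger array each time:

    // One free list per table size 1..32, so "resize by one and append" can reuse
    // a recycled array of the right length instead of hitting the allocator each time.
    using System;
    using System.Collections.Generic;

    class TablePool<T>
    {
        // freeLists[n - 1] holds discarded tables of length n.
        private readonly Stack<T[]>[] freeLists = new Stack<T[]>[32];

        public TablePool()
        {
            for (int i = 0; i < 32; i++) freeLists[i] = new Stack<T[]>();
        }

        public T[] Rent(int size)
        {
            if (size < 1 || size > 32) return new T[size];
            var list = freeLists[size - 1];
            return list.Count > 0 ? list.Pop() : new T[size];
        }

        public void Return(T[] table)
        {
            if (table.Length < 1 || table.Length > 32) return;
            Array.Clear(table, 0, table.Length); // drop references so the GC can reclaim them
            freeLists[table.Length - 1].Push(table);
        }

        // Grow a node's table by one slot, append the new entry, recycle the old table.
        public T[] GrowByOne(T[] oldTable, T newItem)
        {
            var bigger = Rent(oldTable.Length + 1);
            Array.Copy(oldTable, bigger, oldTable.Length);
            bigger[oldTable.Length] = newItem;
            Return(oldTable);
            return bigger;
        }
    }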
I'm brand new to Redis and am just experimenting with caching some data to see how memory usage/performance compares to other options like Memcached. I'm using the ServiceStack.Redis client library via IRedisClient.
I have been testing Redis, and 25k key/value objects are using around 250 MB of memory, with a 100 MB dump.rdb file. I need to cache a lot more than this and am looking to reduce the memory consumption if possible. My best guess is that each cached item's text (a JSON blob) is around 4k in size, but if my basic math is correct, each item is consuming around 10k in Redis, from a memory footprint point of view at least. The vast difference between the dump size and the in-memory size is a bit alarming to me.
I'm also running on a 64-bit VM right now, which I understand wastes a lot of extra space compared to 32-bit, so I'll look into that as well. It looks like Redis needs 2x the memory for each pointer (per cached key/value?). Could this be where the 2.5x disk:memory ratio is coming from?
I understand I can write code on my side to deal with the compression/decompression of data on the way in/out of Redis, but just curious if there is some way to configure the client library to do something similar with say StreamExtensions.
Usage pattern is read heavy, with infrequent writes and/or batch cache-refresh writes.
Anyway, looking for any suggestions on how to get more cache items for a given amount of memory.
There are multiple points you need to consider. In the following, I assume your data are stored in strings, each containing a JSON object.
The first point is that you are storing 4 KB JSON objects. The overhead Redis adds for its dynamic data structures and pointers is absolutely negligible compared to the size of the useful data. This overhead would be significant if you had plenty of very small objects (it is about 80 bytes per key), but with 4 KB objects it should not be a problem.
So using a 32 bit version (reducing the size of pointers) will not help.
The second point is that the difference between the memory footprint and the dump file size is easily explained by the fact that strings in the dump file are compressed using the LZF algorithm (and JSON compresses quite well). The memory footprint is generally much larger than the dump file size for non-compressed data.
Now, the difference you see between the real size of your data and the memory footprint is probably due to the allocator's internal fragmentation. Generally, people only consider external fragmentation (i.e. what is commonly referred to as memory fragmentation), but in some situations internal fragmentation can also represent a major overhead. See the definitions here.
In your situation, 4 KB objects are actually one of the worst cases. Redis uses the jemalloc allocator, which features well-defined allocation classes. You can see that 4 KB is an allocation class, and the next one is 8 KB. This means that if a number of your objects weigh a bit more than 4 KB (including the Redis string overhead of 8 bytes), 8 KB will be allocated instead of 4 KB, and half of that memory will be wasted.
You can easily check this point by storing only objects a bit smaller than 4 KB and calculating the ratio between the memory footprint and the expected size of the useful data. Repeat the same operation with objects a bit larger than 4 KB and compare the results.
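If you want to script that check from C#, here is a rough sketch using ServiceStack.Redis (assuming your IRedisClient version exposes SetValue; older versions call it SetEntry). Compare used_memory from redis-cli INFO memory after each run:

    // Store N payloads just under vs. just over 4 KB and compare used_memory
    // (redis-cli INFO memory) after each run, with a FLUSHDB in between.
    using ServiceStack.Redis;

    class FragmentationProbe
    {
        static void Fill(int payloadBytes, string prefix)
        {
            using (var redis = new RedisClient("localhost", 6379))
            {
                var payload = new string('x', payloadBytes); // stand-in for a JSON blob
                for (int i = 0; i < 25000; i++)
                    redis.SetValue(prefix + i, payload);
            }
        }

        static void Main()
        {
            Fill(3900, "under4k:"); // check used_memory, then FLUSHDB
            Fill(4300, "over4k:");  // check used_memory again and compare the ratios
        }
    }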
Possible solutions to reduce the overhead:
client side compression. Use any lightweight compression algorithm (LZF, LZO, quicklz, snappy). It will work well if you can keep the size of most of your objects below 4 KB; see the sketch after this list.
change the memory allocator. The Redis makefile also supports tcmalloc (Google's allocator) as the memory allocator. It could reduce the memory overhead for these 4 KB objects, since its allocation classes are different.
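For the compression option, here is a minimal sketch using the BCL's GZipStream as a stand-in for LZF/LZO/quicklz/snappy (those need third-party libraries). You would store the resulting byte[] (or a base64 string) through your Redis client as usual:

    using System.IO;
    using System.IO.Compression;
    using System.Text;

    static class JsonCompressor
    {
        // Compress a JSON string before sending it to Redis.
        public static byte[] Compress(string json)
        {
            var raw = Encoding.UTF8.GetBytes(json);
            using (var output = new MemoryStream())
            {
                using (var gzip = new GZipStream(output, CompressionMode.Compress))
                    gzip.Write(raw, 0, raw.Length);
                return output.ToArray();
            }
        }

        // Decompress a value read back from Redis.
        public static string Decompress(byte[] compressed)
        {
            using (var input = new MemoryStream(compressed))
            using (var gzip = new GZipStream(input, CompressionMode.Decompress))
            using (var output = new MemoryStream())
            {
                gzip.CopyTo(output);
                return Encoding.UTF8.GetString(output.ToArray());
            }
        }
    }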
Please note with other in-memory stores, you will also get the same kind of overhead. For instance with memcached, it is the job of the slab allocator to optimize memory consumption, and minimize internal and external fragmentation.
I myself had a hard time understanding how to use Redis efficiently, especially coming from Memcached (get/set) to Redis (strings, hashes, lists, sets and sorted sets).
You should read this article about Redis memory usage: http://nosql.mypopescu.com/post/1010844204/redis-memory-usage. It is an old article (2010), but still interesting.
I see two solutions here:
Compile and use 32-bit instances. Dump files are compatible between 32-bit and 64-bit, so you can switch later if you need to.
Using hashes looks better to me: http://redis.io/topics/memory-optimization. Read the section "Using hashes to abstract a very memory efficient plain key-value store on top of Redis". ServiceStack.Redis provides a RedisClientHash, so it should be easy to use; see the sketch below.
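A rough sketch of the bucketing idea from that page, assuming your IRedisClient version exposes SetEntryInHash / GetValueFromHash (check your client; the names may differ):

    // Spread keys over many small hashes instead of one top-level key per item,
    // so Redis can use its more compact hash encoding where possible.
    using ServiceStack.Redis;

    static class BucketedCache
    {
        const int BucketCount = 1024;

        static string BucketFor(string key)
        {
            return "cache:" + ((key.GetHashCode() & 0x7fffffff) % BucketCount);
        }

        public static void Set(IRedisClient redis, string key, string json)
        {
            redis.SetEntryInHash(BucketFor(key), key, json);
        }

        public static string Get(IRedisClient redis, string key)
        {
            return redis.GetValueFromHash(BucketFor(key), key);
        }
    }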
Hope it can help you!
My understanding is that the CPU does its math operations in conjunction with the CPU caches (L1 etc.) and that if a value needed for an operation is not already in the cache, a page will need to be fetched from RAM before the calculation can be performed. It seems reasonable to think, therefore, that managed heap RAM is a better place to be keeping your vector data than any old hole the OS managed to find somewhere in the great expanse of unmanaged stack RAM. I say this because I assume managed memory is held together more tightly than unmanaged memory, and therefore there is more likelihood that vectors (x, y, z) for math operations will be stored in the same pages loaded into the cache, whereas vectors as structs on the stack might be pages apart. Could anyone explain the pros and cons of class-based rather than struct-based vector classes in this light?
The CPU cache is managed entirely by the CPU. Recently accessed memory is cached in relatively large chunks (cache lines, typically 64-128 bytes around the accessed position).
OSes manage paging to/from physical memory. If your application hits that path often enough (i.e. the size of your data is way bigger than physical RAM), then you have other issues to worry about beyond CPU cache line hits and misses.
There is essentially no difference between stack and heap from that point of view. The only meaningful difference is how close the next piece of data to be used is to the recently used data.
In most cases math types (vectors/matrices/points) are stored in sequential blocks of memory for both managed and native implementations. So caching behavior is likely to be comparable, unless one explicitly does some strange allocations that place individual elements far apart in memory.
Summary: make sure to profile your code and keep data compact if performance is of huge concern.
Try measuring different iteration orders across arrays: if iteration crosses cache lines every time, it can be slower. Walking a 2D array by row first versus by column first can show a measurable difference for large enough data sets, when caches have to be repopulated on most array accesses; see the sketch below.
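A rough way to see that effect (a benchmark sketch; the array size is arbitrary and timings will vary by machine):

    // Time row-major vs. column-major traversal of a 2D array. Row-major matches
    // the memory layout of a [,] array in .NET, so it uses cache lines far better.
    using System;
    using System.Diagnostics;

    class IterationOrder
    {
        static void Main()
        {
            const int n = 4000;
            var data = new double[n, n];
            double sum = 0;

            var sw = Stopwatch.StartNew();
            for (int row = 0; row < n; row++)        // sequential memory access
                for (int col = 0; col < n; col++)
                    sum += data[row, col];
            Console.WriteLine("row-major:    " + sw.ElapsedMilliseconds + " ms");

            sw.Restart();
            for (int col = 0; col < n; col++)        // strided access, new cache line almost every step
                for (int row = 0; row < n; row++)
                    sum += data[row, col];
            Console.WriteLine("column-major: " + sw.ElapsedMilliseconds + " ms");

            Console.WriteLine(sum); // keep the loops from being optimized away
        }
    }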
The stack is faster; here is a site that covers it in more detail.
I am attempting to ascertain the maximum sizes (in RAM) of a List and a Dictionary. I am also curious as to the maximum number of elements / entries each can hold, and their memory footprint per entry.
My reasons are simple: I, like most programmers, am somewhat lazy (this is a virtue). When I write a program, I like to write it once and try to future-proof it as much as possible. I am currently writing a program that uses Lists, but noticed that the indexer wants an integer. Since the capabilities of my program are only limited by available memory / coding style, I'd like to write it so I can use a List with Int64s or possibly BigInts (as the indices). I've seen IEnumerable as a possibility here, but would like to find out if I can just stuff an Int64 into a Dictionary object as the key instead of rewriting everything. If I can, I'd like to know what the cost of that might be compared to rewriting it.
My hope is that should my program prove useful, I need only hit recompile in 5 years time to take advantage of the increase in memory.
Is it specified in the documentation for the class? No? Then it's unspecified.
In terms of current implementations, there's no maximum size in RAM in the classes themselves. If you create a value type that's 2 MB in size, push a few thousand into a list, and receive an out-of-memory exception, that has nothing to do with List<T>.
Internally, List<T>'s workings would prevent it from ever having more than about 2 billion items (its backing array is indexed with a signed 32-bit integer). It's harder to come to a quick answer with Dictionary<TKey, TValue>, since the way things are positioned within it is more complicated, but really, if I were looking at dealing with a billion items (of a 32-bit value, for example, then 4 GB), I'd be looking to store them in a database and retrieve them using data-access code.
At the very least, once you're dealing with a single data structure that's 4GB in size, rolling your own custom collection class no longer counts as reinventing the wheel.
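To the narrow "can I just stuff an Int64 in as the key" part of the question: yes, Dictionary takes any key type, and long works with no rewrite beyond the type arguments. A trivial sketch:

    // Dictionary happily takes long (Int64) keys; only the type arguments change.
    // The number of entries is still bounded by int and available memory, even
    // though the key values can range over the full Int64 space.
    using System;
    using System.Collections.Generic;

    class Int64Keys
    {
        static void Main()
        {
            var byId = new Dictionary<long, string>();
            byId[3000000000L] = "a key bigger than int.MaxValue";
            byId[long.MaxValue] = "largest possible key";
            Console.WriteLine(byId[3000000000L]);
            Console.WriteLine(byId.Count); // Count itself is an int
        }
    }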
I am using a ConcurrentDictionary to rank 3x3 patterns in half a million games of Go. Obviously there are a lot of possible patterns. With C# 4.0 the ConcurrentDictionary runs out of memory at around 120 million objects. It is using 8 GB at that point (on a 32 GB machine) but wants to grow far too much, I think (table growths happen in large chunks with ConcurrentDictionary). Using a database would slow me down at least a hundredfold, I think, and the process is already taking 10 hours.
My solution was to use a multi-phase approach, actually doing multiple passes, one for each subset of patterns: for example, one pass for odd patterns and one for even patterns. When using more objects no longer fails, I can reduce the number of passes.
.NET 4.5 adds support for larger arrays in 64-bit processes (the mentioned limit goes from about 2 billion to about 4 billion elements). See also http://msdn.microsoft.com/en-us/library/hh285054(v=vs.110).aspx. Not sure which objects will benefit from this; List<> might.
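If it helps, the switch is (as far as I know) gcAllowVeryLargeObjects in the app.config; it is off by default and only applies to 64-bit processes:

    <!-- Opt-in for objects larger than 2 GB on 64-bit .NET 4.5+. Individual array
         dimensions are still capped at roughly 2 billion elements. -->
    <configuration>
      <runtime>
        <gcAllowVeryLargeObjects enabled="true" />
      </runtime>
    </configuration>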
I think you have bigger issues to solve before even wondering if a Dictionary with an int64 key will be useful in 5 or 10 years.
Having a List or Dictionary of 2e+9 elements in memory (Int32) doesn't seem like a good idea, never mind 9e+18 elements (Int64). Anyhow, the framework won't currently allow you to create a monster that size (not even close), and probably never will. (Keep in mind that a simple int[int.MaxValue] array already far exceeds the framework's limit for the memory allocation of any single object.)
And the question remains: why would you ever want your application to hold in memory a list of so many items? You are better off using a specialized data storage backend (a database) if you have to manage that amount of information.
I have an application that reads 3-4 GB of data, builds entities out of each line, and then stores them in Lists.
The problem I have is that memory usage grows insanely, to something like 13-15 GB. Why the heck does storing these entities take so much memory?
So I built a tree and did something similar to Huffman encoding, and the overall memory size came down to around 200-300 MB.
I understand that I compacted the data, but I wasn't expecting that storing objects in a list would increase the memory so much. Why did that happen?
How about other data structures like Dictionary, Stack, Queue, arrays, etc.?
Where can I find more information about the internals and memory allocations of data structures?
Or am I doing something wrong?
In .NET, large objects go on the large object heap, which is not compacted. Large is everything above 85,000 bytes. When you grow your lists, their backing arrays will probably become larger than that and will have to be reallocated once you exceed the current capacity. Reallocation means the new array is very likely put at the end of the heap. So you end up with a very fragmented LOH and lots of memory usage.
Update: If you initialize your lists with the required capacity (which you can determine from the DB I guess) then your memory consumption should go down a bit.
Regardless of the data structure you're going to use, your memory consumption is never going to drop below the memory required to store all your data.
Have you calculated how much memory is required to store one instance of your class?
Your Huffman encoding is a space-saving optimization, which means that you are eliminating a lot of duplicated data within your class objects yourself. This has nothing to do with the data structure you use to hold your data. It depends on how your data itself is structured, so that you can take advantage of different space-saving strategies (of which Huffman encoding is one of many possibilities, suitable for eliminating common prefixes; the data structure used to store it is a tree).
Now, back to your question. Without optimizing your data (i.e. your objects), there are things you can watch out for to improve memory usage efficiency.
Are all your objects of similar size?
Did you simply run a loop, allocate memory on-the-fly, then insert them into a list, like this:
foreach (var obj in collection) { myList.Add(new myObject(obj)); }
In that case, your list is constantly being expanded. Whenever it runs out of capacity, .NET allocates a new, larger array and copies the original array into it. Essentially you end up with two pieces of memory: the original one and the new, expanded one (now holding the list). Do this many, many times (as you obviously need to for GBs of data), and you are looking at a LOT of fragmented memory space.
You'll be better off just allocating enough memory for the entire list at one go.
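Something like the following sketch, reusing the hypothetical names from the loop above (this assumes the source exposes a Count, e.g. an ICollection; otherwise estimate it):

    // Same hypothetical loop, but with the list's backing array allocated once up front.
    var myList = new List<myObject>(collection.Count);
    foreach (var obj in collection)
        myList.Add(new myObject(obj));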
As an afternote, I can't help but wonder: how in the world are you going to search this HUGE list to find something you need? Shouldn't you be using something like a binary tree or a hash table to aid your searching? Maybe you are just reading in all the data, performing some processing on all of it, and then writing it back out...
If you are using classes, read the response of this: Understanding CLR object size between 32 bit vs 64 bit
On 64 bits (you are using 64 bits, right?) object overhead is 16 bytes, PLUS the reference to the object (someone is referencing it, right?), so another 8 bytes. So an empty object will "eat" at least 24 bytes.
If you are using Lists, remember that Lists grow by doubling, so you could be wasting much space. Other .NET collections grow in the same way.
I'll add that the "pure" overhead of millions of Lists could bring the memory to its knees. Apart from the 16 + 8 bytes of space "eaten" by the List object itself, it is composed (in the .NET implementation) of 2 ints (8 bytes), a sync-root reference (8 bytes, normally null) and a reference to the internal array (so another 8 bytes + 16 bytes of array object overhead + the array itself).
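A crude way to see that overhead for yourself (a sketch; GC.GetTotalMemory is approximate, but the order of magnitude is what matters here):

    // Compare the memory cost of a million tiny List<int> instances against a
    // million plain int[] of the same length.
    using System;
    using System.Collections.Generic;

    class ListOverhead
    {
        static void Main()
        {
            const int count = 1000000;

            long before = GC.GetTotalMemory(true);
            var lists = new List<int>[count];
            for (int i = 0; i < count; i++)
                lists[i] = new List<int> { 1, 2, 3 };
            long withLists = GC.GetTotalMemory(true) - before;

            before = GC.GetTotalMemory(true);
            var arrays = new int[count][];
            for (int i = 0; i < count; i++)
                arrays[i] = new[] { 1, 2, 3 };
            long withArrays = GC.GetTotalMemory(true) - before;

            Console.WriteLine("List<int> x 1M: {0:N0} bytes", withLists);
            Console.WriteLine("int[]     x 1M: {0:N0} bytes", withArrays);
            GC.KeepAlive(lists);
            GC.KeepAlive(arrays);
        }
    }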
I get an out-of-memory exception in C# when reading in a massive file.
I need to change the code, but for the time being, can I increase the heap size (like I would in Java) as a short-term fix?
.Net does that automatically.
It looks like you have reached the limit of the memory one .NET process can use for its objects (on a 32-bit machine this is 2 GB standard, or 3 GB by using the /3GB boot switch; credits to Leppie and Eric Lippert for the info).
Rethink your algorithm, or perhaps a change to a 64 bit machine might help.
No, this is not possible. This problem might occur because you're running on a 32-bit OS and memory is too fragmented. Try not to load the whole file into memory (for instance, by processing it line by line), or, when you really need to load it completely, load it in multiple smaller parts.
No, you can't; see my answer here: Is there any way to pre-allocate the heap in the .NET runtime, like -Xmx/-Xms in Java?
For reading large files it is usually preferable to stream them from disk, reading them in chunks and dealing with them a piece at a time instead of loading the whole thing up front.
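For example, something along these lines (a sketch; the 64 KB buffer size is arbitrary):

    // Process a huge file in fixed-size chunks instead of reading it all
    // into one giant array.
    using System.IO;

    class ChunkedReader
    {
        static void Process(string path)
        {
            var buffer = new byte[64 * 1024];
            using (var stream = File.OpenRead(path))
            {
                int read;
                while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
                {
                    // handle buffer[0..read) here: hash it, write it elsewhere, etc.
                }
            }
        }
    }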
As others have already pointed out, this is not possible. The .NET runtime handles heap allocations on behalf of the application.
In my experience .NET applications commonly suffer from OOM when there should be plenty of memory available (or at least, so it appears). The reason for this is usually the use of huge collections such as arrays, List (which uses an array to store its data) or similar.
The problem is that these types will sometimes create peaks in memory use. If these peak requests cannot be honored, an OOM exception is thrown. E.g. when a List needs to increase its capacity, it does so by allocating a new array of double the current size and then copying all the references/values from the old array to the new one. Similarly, operations such as ToArray make a new copy of the array. I've also seen similar problems with big LINQ operations.
Each array is stored as contiguous memory, so to avoid OOM the runtime must be able to obtain one big chunk of memory. As the address space of the process may be fragmented due to both DLL loading and general use for the heap, this is not always possible in which case an OOM exception is thrown.
What sort of file are you dealing with?
You might be better off using a StreamReader and yield returning the ReadLine result, if it's textual.
Sure, you'll be keeping a file-pointer around, but the worst case scenario is massively reduced.
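Roughly like this (a sketch; on .NET 4 and later, File.ReadLines already gives you the same lazy behaviour):

    // Stream a text file line by line with yield return, so only one line
    // is held in memory at a time.
    using System.Collections.Generic;
    using System.IO;

    static class LineStreaming
    {
        public static IEnumerable<string> ReadLines(string path)
        {
            using (var reader = new StreamReader(path))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                    yield return line;
            }
        }
    }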
There are similar methods for binary files. If you're uploading a file to SQL, for example, you can read a byte[] and use the SQL pointer mechanics to write the buffer to the end of a blob.