String vs byte array, Performance - c#

(This post is about high-frequency-trading style programming.)
I recently saw on a forum (I think they were discussing Java) that if you have to parse a lot of string data, it's better to use a byte array than a string with split(). The exact post was:
One performance trick to working with any language, C++, Java, C#, is
to avoid object creation. It's not the cost of allocation or GC, it's
the cost of accessing large memory arrays that don't fit in the CPU cache.
Modern CPUs are much faster than their memory. They stall for many,
many cycles on each cache miss. Most of the CPU transistor budget is
allocated to reducing this with large caches and lots of tricks.
GPUs solve the problem differently, by having lots of threads ready to
execute to hide memory access latency; they have little or no cache and
spend the transistors on more cores.
So, for example, rather than using Strings and split to parse a
message, use byte arrays that can be updated in place. You really want
to avoid random memory access over large data structures, at least in
the inner loops.
Is he just saying "don't use strings because they're objects and creating objects is costly"? Or is he saying something else?
Does using a byte array ensure the data remains in the cache for as long as possible?
When you use a string is it too large to be held in the CPU cache?
Generally, is using primitive data types the best method for writing faster code?

He's saying that if you break a chunk of text up into separate string objects, those string objects have worse locality than the large array of text. Each string, and the array of characters it contains, will be somewhere else in memory; they can be spread all over the place. It is likely that the memory cache will have to thrash in and out to access the various strings as you process the data. In contrast, the one large array has the best possible locality, as all the data is in one area of memory, and cache thrashing will be kept to a minimum.
There are limits to this, of course: if the text is very, very large, and you only need to parse out part of it, then those few small strings might fit better in the cache than the large chunk of text.
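To make the contrast concrete, here is a minimal C# sketch (the sample message and delimiter are made up for illustration): Split materializes one new string object per field, scattered across the heap, while scanning the original text in place touches only one contiguous block and allocates nothing.

string message = "8=FIX.4.2,35=D,55=MSFT,44=27.15";   // example payload

// Split allocates a string object per field (plus the array holding them):
string[] fields = message.Split(',');

// Scanning the buffer in place creates no new objects:
int fieldCount = 1;
for (int i = 0; i < message.Length; i++)
    if (message[i] == ',') fieldCount++;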

There are lots of other reasons to use byte[] or char* instead of Strings for HFT. Strings consist of 16-bit chars in Java and are immutable. byte[] or ByteBuffer are easily recycled, have good cache locality, and can be off the heap (direct), saving a copy and avoiding character encoders. This all assumes you are using ASCII data.
char* or ByteBuffers can also be mapped to network adapters to save another copy. (With some fiddling for ByteBuffers)
In HFT you are rarely dealing with large amounts of data at once. Ideally you want to be processing data as soon as it comes down the socket, i.e. one packet (about 1.5 KB) at a time.
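A rough sketch of that pattern in C#, with made-up names and field layout: one pre-allocated buffer is reused for every packet, and an ASCII numeric field is parsed straight out of the bytes so no per-message strings are created.

using System.Net.Sockets;

class FeedReader
{
    private readonly byte[] _buffer = new byte[2048];   // roughly one packet (~1.5 KB)

    public void Pump(Socket socket)
    {
        while (true)
        {
            int read = socket.Receive(_buffer, 0, _buffer.Length, SocketFlags.None);
            if (read == 0) break;              // connection closed
            ParseInPlace(_buffer, read);       // same buffer every time, no garbage
        }
    }

    private static void ParseInPlace(byte[] buf, int length)
    {
        // Illustrative only: read an ASCII integer terminated by '|' without
        // ever creating a string.
        int value = 0;
        for (int i = 0; i < length && buf[i] != (byte)'|'; i++)
            value = value * 10 + (buf[i] - (byte)'0');
        // ... act on value ...
    }
}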

Related

Looking to optimize Redis memory usage for caching many JSON API results

I'm brand new to Redis, and am just experimenting with caching some data and seeing how memory usage/performance compares to other options like Memcached. I'm using the ServiceStack.Redis client library via IRedisClient.
I have been testing Redis, and 25k key/value objects use around 250 MB of memory, with a 100 MB dump.rdb file. I need to cache a lot more than this and am looking to reduce memory consumption if possible. My best guess is that each cached item's text (JSON blob) is around 4 KB, but if my basic math is correct, each item consumes around 10 KB in Redis from a memory-footprint point of view. The vast difference between the dump size and the in-memory size is a bit alarming to me.
I'm also running on a 64-bit VM right now, which I understand wastes a lot of extra space compared to 32-bit, so I'll look into that as well. It looks like Redis needs 2x the memory for each pointer (per cached key/value?). Could this be where the 2.5x disk-to-memory ratio is coming from?
I understand I can write code on my side to deal with the compression/decompression of data on the way in/out of Redis, but just curious if there is some way to configure the client library to do something similar with say StreamExtensions.
The usage pattern is read-heavy, with infrequent writes and/or batch cache-refresh writes.
Anyway, looking for any suggestions on how to get more cache items for a given amount of memory.
There are multiple points you need to consider. In the following, I assume your data are stored in strings, each containing a JSON object.
The first point: you are storing 4 KB JSON objects. The overhead Redis adds for its dynamic data structures and pointers is negligible compared to the size of the useful data. That overhead would be significant if you had plenty of very small objects (it is about 80 bytes per key), but with 4 KB objects it should not be a problem.
So using a 32-bit version (reducing the size of the pointers) will not help.
The second point: the difference between the memory footprint and the dump file size is easily explained by the fact that strings in the dump file are compressed with the LZF algorithm (and JSON compresses quite well). The memory footprint is generally much larger than the dump file size for uncompressed data.
Now, the difference you see between the real size of your data and the memory footprint is probably due to the allocator's internal fragmentation. Generally, people only consider external fragmentation (i.e. what most people commonly refer to as memory fragmentation), but in some situations internal fragmentation can also represent a major overhead. See the definitions here.
In your situation, the 4 KB objects are actually one of the worst cases. Redis uses the jemalloc allocator, which has well-defined allocation classes. 4 KB is an allocation class, and the next one is 8 KB. This means that if a number of your objects weigh a bit more than 4 KB (including the Redis string overhead of 8 bytes), 8 KB will be allocated instead of 4 KB, and half of the memory will be wasted.
You can easily check this point by only storing objects a bit smaller than 4 KB, and calculate the ratio between the memory footprint and the expected size of the useful data. Repeat the same operation with objects a bit larger than 4 KB and compare the results.
Possible solutions to reduce the overhead:
Client-side compression: use any lightweight compression algorithm (LZF, LZO, quicklz, snappy). It will work well if it keeps the size of most of your objects below 4 KB; a rough sketch follows this list.
Change the memory allocator: the Redis makefile also supports tcmalloc (Google's allocator). It could reduce the memory overhead for these 4 KB objects, since its allocation classes are different.
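Here is a rough sketch of the client-side compression option. GZip is used only because it ships with the .NET framework; the lighter algorithms mentioned above (LZF, LZO, quicklz, snappy) need third-party libraries but follow the same shape. The resulting byte[] is what you would store under the Redis key.

using System.IO;
using System.IO.Compression;
using System.Text;

static class JsonCacheCompression
{
    public static byte[] Compress(string json)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                byte[] raw = Encoding.UTF8.GetBytes(json);
                gzip.Write(raw, 0, raw.Length);
            }
            return output.ToArray();            // store this under the cache key
        }
    }

    public static string Decompress(byte[] stored)
    {
        using (var input = new GZipStream(new MemoryStream(stored), CompressionMode.Decompress))
        using (var reader = new StreamReader(input, Encoding.UTF8))
        {
            return reader.ReadToEnd();          // back to the original JSON blob
        }
    }
}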
Please note that you will get the same kind of overhead with other in-memory stores. With memcached, for instance, it is the job of the slab allocator to optimize memory consumption and to minimize internal and external fragmentation.
I myself had a hard time understanding how to use Redis efficiently, especially coming from Memcached (get/set) to Redis (strings, hashes, lists, sets & sorted sets).
You should read this article about Redis memory usage: http://nosql.mypopescu.com/post/1010844204/redis-memory-usage. It is an old article (2010), but still interesting.
I see two solutions here:
Compile and use 32-bit instances. Dump files are compatible between 32-bit and 64-bit, so you can switch later if you need to.
Using hashes looks better to me: http://redis.io/topics/memory-optimization. Read the section "Using hashes to abstract a very memory efficient plain key-value store on top of Redis". ServiceStack.Redis provides a RedisClientHash, so it should be easy to use; a short sketch follows.
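A minimal sketch of the hash approach using ServiceStack.Redis; the bucketing scheme is invented for illustration, and the SetEntryInHash/GetValueFromHash method names should be double-checked against your version of IRedisClient.

using ServiceStack.Redis;

using (IRedisClient redis = new RedisClient("localhost"))
{
    string jsonBlob = "{\"id\":12345,\"name\":\"example\"}";

    // Group many small values into one hash, e.g. all items whose id starts with "123".
    redis.SetEntryInHash("api-results:123", "12345", jsonBlob);

    string cached = redis.GetValueFromHash("api-results:123", "12345");
}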
Hope it helps!

What's the most efficient way to manage large amounts of data (height data) and replace this huge array?

I need to be able to look up this data quickly and need access to all of it. Unfortunately, I also need to conserve memory (several of these arrays will cause OutOfMemoryExceptions).
short[,,] data = new short[8000,8000,2];
I have attempted the following:
tried a jagged array - same memory problems
tried breaking it into smaller arrays - still get memory issues
Is the only resolution to map this data efficiently using a memory-mapped file, or is there some other way to do it?
How about a database? After all, they are made for this.
I'd suggest you take a look at some NoSQL database. Depending on your needs, there are also in-memory databases [which obviously could suffer from the same out-of-memory problem] and databases that can be copy deployed or linked to your application.
I wouldn't want to mess with the storage details manually, and memory-mapping files is what some databases (at least MongoDB) are doing internally. So essentially, you'd be rolling your own DB, and writing a database is not trivial -- even if you narrow down the use case.
Redis or Membase sound like suitable alternatives for your problem. As far as I can see, both are able to manage the RAM utilization for you, that is, read data from the disk as needed and cache data in RAM for fast access. Of course, your access patterns will play a role here.
Keep in mind that a lot of effort went into building these DBs. According to Wikipedia, Zynga is using Membase and Redis is sponsored by VMWare.
Are you sure you need access to all of it all of the time? ...or could you load a portion of it, do your processing then move onto the next?
Could you get away with using mip-mapping or LoD representations if it's just height data? Both of those could allow you to hold lower resolutions until you need to load up specific chunks of the higher resolution data.
How much free memory do you have on your machine? What operating system are you using? Is it 64 bit?
If you're doing memory / processing intensive operations, have you considered implementing those parts in C++ where you have greater control over such things?
It's difficult to help you much further without knowing some more specifics of your system and what you're actually doing with your data.
I wouldn't recommend a traditional relational database if you're doing numeric calculations with this data. I suspect what you're running into here isn't the size of the data itself, but rather a known problem with .NET called Large Object Heap fragmentation. If you run into the problem after allocating these buffers frequently (even though they should be garbage collected), this is likely your culprit. Your best solution is to keep as many buffers as you need pre-allocated and re-use them, to prevent the reallocation and subsequent fragmentation.
How are you interacting with this large multidimensional array? Are you using recursion? If so, make sure your recursive methods pass parameters by reference rather than by value.
On a side note, do you need 100% of this data accessible at the same time? The best way to deal with large volumes of data is usually via a stream, or some kind of reader object. Try to deal with the data in segments. I've got a few processes that deal with gigabytes' worth of data, and they can process it in a small amount of memory because of how I stream it in via a SqlDataReader.
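Roughly what that streaming shape looks like; the connection string and query are placeholders. The reader hands you one row at a time instead of materializing the whole result set, and SequentialAccess additionally avoids buffering large column values.

using System.Data;
using System.Data.SqlClient;

static void StreamRows(string connectionString)
{
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("SELECT Payload FROM dbo.BigTable", connection))
    {
        connection.Open();
        using (var reader = command.ExecuteReader(CommandBehavior.SequentialAccess))
        {
            while (reader.Read())
            {
                string payload = reader.GetString(0);   // only the current row in memory
                // ... process the payload, then let it go out of scope ...
            }
        }
    }
}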
TL;DR: look at how you pass data between your function calls (by ref where appropriate) and maybe use streaming patterns to deal with the data in smaller chunks.
hope that helps!
Note that although standalone short fields and locals may be widened to 32 bits, the elements of a short[] array are packed as 16-bit values, so the array above already uses two bytes per entry; packing two shorts into an int with bit operations won't shrink it further.
In other words, you already have pretty much the most efficient in-memory layout for such an array. What you can do then is:
Use a 64-bit machine. Then you can allocate a lot of memory and the OS will take care of paging the data to disk for you if you run out of RAM (make sure you have a large enough swap file). Then you can use 8 TERAbytes of data (if you have a large enough disk).
Read parts of the data from disk as you need them, either manually with file IO or via memory mapping (sketched below).
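A hedged sketch of the memory-mapped option: the 8000 x 8000 x 2 shorts live in a file and the OS pages in only the regions you touch. The file path, coordinates and element ordering are illustrative; in real code you would keep the mapping open rather than re-create it per lookup.

using System.IO;
using System.IO.MemoryMappedFiles;

static short ReadHeight(string path, int x, int y, int layer)
{
    const int Width = 8000, Height = 8000, Layers = 2;
    long byteLength = (long)Width * Height * Layers * sizeof(short);

    using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.OpenOrCreate, null, byteLength))
    using (var view = mmf.CreateViewAccessor(0, byteLength))
    {
        long index = (((long)x * Height) + y) * Layers + layer;   // same ordering as [x, y, layer]
        return view.ReadInt16(index * sizeof(short));             // only the touched pages get loaded
    }
}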

Collections and memory

I have an application that reads 3-4 GB of data, builds an entity out of each line and then stores the entities in Lists.
The problem I had is that memory usage grows insanely, to something like 13-15 GB. Why the heck does storing these entities take so much memory?
So I built a tree and did something similar to Huffman encoding, and the overall memory size came down to around 200-300 MB.
I understand that I compacted the data, but I wasn't expecting that storing objects in a list would increase memory so much. Why did that happen?
How about other data structures like dictionary, stack, queue, array, etc.?
Where can I find more information about the internals and memory allocations of data structures?
Or am I doing something wrong?
In .NET, large objects go on the large object heap, which is not compacted. "Large" is everything above 85,000 bytes. When you grow your lists, they will probably become larger than that and will have to be reallocated each time you cross the current capacity. Reallocation means they are very likely put at the end of the heap, so you end up with a very fragmented LOH and lots of memory usage.
Update: If you initialize your lists with the required capacity (which you can determine from the DB I guess) then your memory consumption should go down a bit.
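A minimal sketch of that suggestion; MyEntity and the row count are placeholders for whatever you load from the DB. Sizing the list up front means its backing array is allocated once instead of being repeatedly doubled (and repeatedly landing on the LOH).

using System.Collections.Generic;

int expectedRows = 5000000;                         // e.g. from a SELECT COUNT(*) query
var entities = new List<MyEntity>(expectedRows);    // backing array allocated once, up front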
Regardless of the data structure you're going to use, your memory consumption is never going to drop below the memory required to store all your data.
Have you calculated how much memory is required to store one instance of your class?
Your Huffman encoding is a space-saving optimization: you are eliminating a lot of duplicated data within your class objects yourself. That has nothing to do with the data structure you use to hold your data; it depends on how the data itself is structured, so that you can take advantage of different space-saving strategies (of which Huffman encoding is one of many, suitable for eliminating common prefixes; the data structure used to store it happens to be a tree).
Now, back to your question. Without optimizing your data (i.e. your objects), there are things you can watch out for to improve memory usage efficiency.
Are all your objects of similar size?
Did you simply run a loop, allocate memory on-the-fly, then insert them into a list, like this:
foreach (var obj in collection) { myList.Add(new myObject(obj)); }
In that case, your list is constantly being expanded. Whenever the current capacity is exceeded, .NET allocates a new, larger array and copies the original array into it. Essentially you end up with two pieces of memory -- the original array, and the new expanded one (now holding the list). Do this many, many times (as you obviously need to for GBs of data), and you are looking at a LOT of fragmented memory space.
You'll be better off just allocating enough memory for the entire list at one go.
As an aside, I can't help wondering: how in the world are you going to search this HUGE list to find something you need? Shouldn't you be using something like a binary tree or a hash table to aid your searching? Maybe you are just reading in all the data, performing some processing on all of it, then writing it back out...
If you are using classes, read the response of this: Understanding CLR object size between 32 bit vs 64 bit
On 64 bits (you are using 64 bits, right?) the object overhead is 16 bytes PLUS the reference to the object (someone is referencing it, right?), so another 8 bytes. An empty object will therefore "eat" at least 24 bytes.
If you are using Lists, remember that Lists grow by doubling, so you could be wasting a lot of space. Other .NET collections grow in the same way.
I'll add that the "pure" overhead of millions of Lists could bring the memory to its knees. Besides the 16 + 8 bytes of space "eaten" by the List object itself, it is composed (in the .NET implementation) of 2 ints (8 bytes), a SyncLock reference (8 bytes, normally null) and a reference to the internal array (so 8 + 16 bytes + the array).

C# Increase Heap Size - Is It Possible

I get an out-of-memory exception in C# when reading in a massive file.
I need to change the code, but for the time being can I increase the heap size (like I would in Java) as a short-term fix?
.Net does that automatically.
It looks like you have reached the limit of the memory one .NET process can use for its objects (on a 32-bit machine this is 2 GB as standard, or 3 GB by using the /3GB boot switch; credits to Leppie & Eric Lippert for the info).
Rethink your algorithm, or perhaps a change to a 64 bit machine might help.
No, this is not possible. This problem might occur because you're running on a 32-bit OS and memory is too fragmented. Try not to load the whole file into memory (for instance, by processing line by line) or, when you really need to load it completely, by loading it in multiple, smaller parts.
No, you can't; see my answer here: Is there any way to pre-allocate the heap in the .NET runtime, like -Xmx/-Xms in Java?
For reading large files it is usually preferable to stream them from disk, reading them in chunks and dealing with them a piece at a time instead of loading the whole thing up front.
As others have already pointed out, this is not possible. The .NET runtime handles heap allocations on behalf of the application.
In my experience, .NET applications commonly suffer from OOM when there should be plenty of memory available (or at least, so it appears). The reason is usually the use of huge collections such as arrays, List (which uses an array to store its data) or similar.
The problem is that these types sometimes create peaks in memory use. If those peak requests cannot be honored, an OOM exception is thrown. E.g. when a List needs to increase its capacity, it does so by allocating a new array of double the current size and then copying all the references/values from one array to the other. Similarly, operations such as ToArray make a new copy of the array. I've also seen similar problems with big LINQ operations.
Each array is stored as contiguous memory, so to avoid OOM the runtime must be able to obtain one big chunk of memory. As the address space of the process may be fragmented due to both DLL loading and general heap use, this is not always possible, in which case an OOM exception is thrown.
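A small demonstration of the growth behaviour described above, watching List<T>.Capacity double as items are added; the exact sequence of capacities may vary between runtime versions.

using System;
using System.Collections.Generic;

var list = new List<int>();
int lastCapacity = -1;
for (int i = 0; i < 1000; i++)
{
    list.Add(i);
    if (list.Capacity != lastCapacity)
    {
        Console.WriteLine("Count={0}, Capacity={1}", list.Count, list.Capacity);
        lastCapacity = list.Capacity;       // typically prints 4, 8, 16, 32, ...
    }
}
int[] copy = list.ToArray();                // allocates a second, full-sized array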
What sort of file are you dealing with?
You might be better off using a StreamReader and yield returning the ReadLine result, if it's textual.
Sure, you'll be keeping a file-pointer around, but the worst case scenario is massively reduced.
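A sketch of the yield-return approach, so only one line is held in memory at a time; on .NET 4 the built-in File.ReadLines does essentially the same thing.

using System.Collections.Generic;
using System.IO;

static IEnumerable<string> ReadLines(string path)
{
    using (var reader = new StreamReader(path))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
            yield return line;      // caller processes each line, then it can be collected
    }
}

// foreach (string line in ReadLines(@"C:\data\massive-file.txt")) { ... }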
There are similar methods for binary files: if you're uploading a file to SQL, for example, you can read one byte[] buffer at a time and use the SQL pointer mechanics to write each buffer onto the end of a blob.

Is there any scenario where the Rope data structure is more efficient than a string builder

Related to this question, based on a comment by user Eric Lippert.
Is there any scenario where the Rope data structure is more efficient than a string builder? It is some people's opinion that rope data structures are almost never better in terms of speed than the native string or string builder operations in typical cases, so I am curious to see realistic scenarios where indeed ropes are better.
The documentation for the SGI C++ implementation goes into some detail on the big-O behaviours versus the constant factors, which is instructive.
Their documentation assumes very long strings are involved; the examples posited for reference talk about 10 MB strings. Very few programs deal with such things, and for many classes of problems with such requirements, reworking them to be stream-based rather than requiring the full string to be available will, where possible, lead to significantly superior results. As such, ropes are for non-streaming manipulation of multi-megabyte character sequences, where you are able to treat the rope as sections (themselves ropes) rather than just a sequence of characters.
Significant Pros:
Concatenation/Insertion become nearly constant time operations
Certain operations may reuse the previous rope sections to allow sharing in memory.
Note that .NET strings, unlike Java strings, do not share the character buffer on substrings - a choice with pros and cons in terms of memory footprint. Ropes tend to avoid this sort of issue.
Ropes allow deferred loading of substrings until required
Note that this is hard to get right, very easy to render pointless due to excessive eagerness of access and requires consuming code to treat it as a rope, not as a sequence of characters.
Significant Cons:
Random read access becomes O(log n)
The constant factors on sequential read access seem to be between 5 and 10
efficient use of the API requires treating it as a rope, not just dropping in a rope as a backing implementation on the 'normal' string api.
This leads to a few 'obvious' uses (the first mentioned explicitly by SGI).
Edit buffers on large files allowing easy undo/redo
Note that, at some point you may need to write the changes to disk, involving streaming through the entire string, so this is only useful if most edits will primarily reside in memory rather than requiring frequent persistence (say through an autosave function)
Manipulation of DNA segments where significant manipulation occurs, but very little output actually happens
Multi-threaded algorithms that mutate local subsections of a string. In theory such cases can be parcelled off to separate threads and cores without needing to take local copies of the subsections and then recombine them, saving considerable memory as well as avoiding a costly serial combining operation at the end.
There are cases where domain specific behaviour in the string can be coupled with relatively simple augmentations to the Rope implementation to allow:
Read only strings with significant numbers of common substrings are amenable to simple interning for significant memory savings.
Strings with sparse structures, or significant local repetition are amenable to run length encoding while still allowing reasonable levels of random access.
Where the substring boundaries are themselves 'nodes' where information may be stored, though such structures are quite possibly better done as a radix trie if they are rarely modified but often read.
As you can see from the examples listed, all fall well into the 'niche' category. Further, several may well have superior alternatives if you are willing/able to rewrite the algorithm as a stream processing operation instead.
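To make the "nearly constant-time concatenation, O(log n) reads" trade-off concrete, here is a bare-bones (non-balancing, read-only) rope sketch in C#; a real implementation would also rebalance and expose substring and insert operations.

abstract class Rope
{
    public abstract int Length { get; }
    public abstract char CharAt(int index);

    // O(1): joining two ropes allocates one node and copies no characters.
    public Rope Concat(Rope right) { return new ConcatNode(this, right); }
}

sealed class LeafNode : Rope
{
    private readonly string _text;
    public LeafNode(string text) { _text = text; }
    public override int Length { get { return _text.Length; } }
    public override char CharAt(int index) { return _text[index]; }
}

sealed class ConcatNode : Rope
{
    private readonly Rope _left, _right;
    private readonly int _length;                 // cached so Length stays O(1)

    public ConcatNode(Rope left, Rope right)
    {
        _left = left;
        _right = right;
        _length = left.Length + right.Length;
    }

    public override int Length { get { return _length; } }

    // O(tree depth) per lookup, i.e. O(log n) for a balanced rope.
    public override char CharAt(int index)
    {
        return index < _left.Length ? _left.CharAt(index)
                                    : _right.CharAt(index - _left.Length);
    }
}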
The short answer to this question is yes, and that requires little explanation. Of course there are situations where the rope data structure is more efficient than a string builder; they work differently, so they are suited to different purposes.
(From a C# perspective)
The rope data structure, as a binary tree, is better in certain situations. When you're looking at extremely large string values (think 100+ MB of XML coming in from SQL), the rope data structure could keep the entire process off the large object heap, whereas a string object lands there once it passes 85,000 bytes.
If you're looking at strings of 5-1000 characters, it probably doesn't improve performance enough to be worth it. This is another case of a data structure designed for the 5% of people who have an extreme situation.
The 10th ICFP Programming Contest relied, basically, on people using the rope data structure for efficient solving. That was the big trick to get a VM that ran in reasonable time.
Rope is excellent if there is a lot of prefixing (apparently the word "prepending" was made up by IT folks and isn't a proper word!) and potentially better for insertions; StringBuilders use contiguous memory, so they only work efficiently for appending.
Therefore, StringBuilder is great for building strings by appending fragments - a very normal use-case. As developers need to do this a lot, StringBuilders are a very mainstream technology.
Ropes are great for edit buffers, e.g. the data structure behind, say, an enterprise-strength TextArea. So a relaxation of ropes (e.g. a linked list of lines rather than a binary tree) is very common in the UI-controls world, but it's not often exposed to the developers and users of those controls.
You need really, really big amounts of data and churn to make the rope pay off - processors are very good at stream operations, and if you have the RAM then simply reallocating for prefixing works acceptably for normal use cases. The competition mentioned above was the only time I've seen it needed.
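A rough way to see the asymmetry described above: appending to a StringBuilder is amortized O(1) per call, while repeatedly inserting at the front forces the buffer contents to shift each time, so the second loop is quadratic overall (the counts are arbitrary).

using System.Text;

var appender = new StringBuilder();
for (int i = 0; i < 100000; i++)
    appender.Append("chunk");       // cheap: writes at the end of the buffer

var prefixer = new StringBuilder();
for (int i = 0; i < 100000; i++)
    prefixer.Insert(0, "chunk");    // expensive: shifts everything already written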
Most advanced text editors represent the text body as a "kind of rope" (though in implementation the leaves usually aren't individual characters, but text runs), mainly to speed up the frequent inserts and deletes on large texts.
Generally, StringBuilder is optimized for appending and tries to minimize the total number of reallocations without overallocating too much. The typical guarantee is log2(N) allocations and less than 2.5x the memory. Normally the string is built once and may then be used for quite a while without being modified.
Rope is optimized for frequent inserts and removals, and tries to minimize the amount of data copied (at the cost of a larger number of allocations). In a linear buffer implementation, each insert and delete becomes O(N), and you usually have to handle single-character inserts.
JavaScript VMs often use ropes for strings.
Maxime Chevalier-Boisvert, developer of the Higgs JavaScript VM, says:
In JavaScript, you can use arrays of strings and eventually Array.prototype.join to make string concatenation reasonably fast, O(n), but the "natural" way JS programmers tend to build strings is to just append using the += operator to incrementally build them. JS strings are immutable, so if this isn't optimized internally, incremental appending is O(n²). I think it's probable that ropes were implemented in JS engines specifically because of the SunSpider benchmarks which do string appending. JS engine implementers used ropes to gain an edge over others by making something that was previously slow faster. If it wasn't for those benchmarks, I think that cries from the community about string appending performing poorly may have been met with "use Array.prototype.join, dummy!".
