Extensive use of LOH causes significant performance issues - C#

We have a web service using Web API 2 and .NET 4.5 on Server 2012. We were seeing occasional latency increases of 10-30 ms with no good reason. We were able to track the problematic piece of code down to the LOH and GC.
There is some text which we convert to its UTF8 byte representation (actually, the serialization library we use does that). As long as the text is shorter than 85000 bytes, latency is stable and short: ~0.2 ms on average and at the 99th percentile. As soon as the 85000 boundary is crossed, average latency increases to ~1 ms while the 99th percentile jumps to 16-20 ms. The profiler shows that most of the time is spent in GC. To confirm, if I put a GC.Collect between iterations, the measured latency goes back to 0.2 ms.
I have two questions:
1. Where does the latency come from? As far as I understand, the LOH isn't compacted. The SOH is compacted, but it doesn't show this latency.
2. Is there a practical way to work around this? Note that I can't control the size of the data and make it smaller.
--
public void PerfTestMeasureGetBytes()
{
    var text = File.ReadAllText(@"C:\Temp\ContactsModelsInferences.txt");
    var smallText = text.Substring(0, 85000 + 100);
    int count = 1000;
    List<double> latencies = new List<double>(count);
    for (int i = 0; i < count; i++)
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        var bytes = Encoding.UTF8.GetBytes(smallText);
        sw.Stop();
        latencies.Add(sw.Elapsed.TotalMilliseconds);
        //GC.Collect(2, GCCollectionMode.Default, true);
    }
    latencies.Sort();
    Console.WriteLine("Average: {0}", latencies.Average());
    Console.WriteLine("99%: {0}", latencies[(int)(latencies.Count * 0.99)]);
}

The performance problems usually come from two areas: allocation and fragmentation.
Allocation
The runtime guarantees clean memory, so it spends cycles zeroing it. When you allocate a large object, that's a lot of memory to clean, and it starts to add milliseconds to a single allocation (when, let's be honest, simple allocation in .NET is actually very fast, so we usually never care about this).
Fragmentation
Fragmentation occurs when LOH objects are allocated and then reclaimed. Until recently, the GC could not reorganise the memory to remove these old object "gaps", and thus could only fit the next object into a gap if it was the same size or smaller. More recently, the GC has gained the ability to compact the LOH, which removes this issue, but costs time during compaction.
My guess in your case is that you are suffering from both issues and triggering GC runs, but it depends on how often your code is attempting to allocate items in the LOH. If you are doing lots of allocations, try the object pooling route. If you cannot control a pool effectively (lumpy object lifetimes or disparate usage patterns), try chunking the data you are working against to avoid it completely.
Your Options
I've encountered two approaches to the LOH:
Avoid it.
Use it, but realise you are using it and manage it explicitly.
Avoid it
This involves chunking your large object (usually an array of some sort) into, well, chunks that each fall under the LOH barrier. We do this when serialising large object streams. It works well, but any implementation is specific to your environment, so treat the sketch below as illustrative only.
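A minimal sketch of the chunking idea, assuming the goal is to stream a large string as UTF-8 without ever allocating a buffer at or above the 85,000-byte threshold (the chunk size and the Stream destination are assumptions for illustration):
using System;
using System.IO;
using System.Text;

static class ChunkedUtf8Writer
{
    // Streams the UTF-8 bytes of a large string in chunks that each stay under the LOH
    // threshold. The Encoder carries state across chunk boundaries, so multi-byte
    // characters that straddle a chunk are still encoded correctly.
    public static void Write(string text, Stream destination)
    {
        const int chunkChars = 16 * 1024;                                      // 32 KB of chars, well below the limit
        Encoder encoder = Encoding.UTF8.GetEncoder();
        var charBuffer = new char[chunkChars];
        var byteBuffer = new byte[Encoding.UTF8.GetMaxByteCount(chunkChars)];  // ~48 KB, also below the limit

        for (int offset = 0; offset < text.Length; offset += chunkChars)
        {
            int charCount = Math.Min(chunkChars, text.Length - offset);
            text.CopyTo(offset, charBuffer, 0, charCount);
            bool isLastChunk = offset + charCount >= text.Length;
            int byteCount = encoder.GetBytes(charBuffer, 0, charCount, byteBuffer, 0, isLastChunk);
            destination.Write(byteBuffer, 0, byteCount);
        }
    }
}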
Use it
A simple way to tackle both allocation and fragmentation is long-lived objects. Explicitly make an empty array (or arrays) large enough to accommodate your large object, and don't get rid of it (or them). Leave it around and re-use it like an object pool. You pay for this allocation once, and can do so either on first use or during application idle time; after that you pay nothing for re-allocation (because you aren't re-allocating) and you lessen fragmentation, because you aren't constantly asking to allocate stuff and you aren't reclaiming items (which is what causes the gaps in the first place).
That said, a halfway house may be in order. Reserve a section of memory up-front for an object pool. Done early, these allocations should be contiguous in memory so you won't get any gaps, and leave the tail end of the available memory for uncontrolled items. Do beware though that this obviously has an impact on the working set of your application - an object pool takes space regardless of whether it is being used or not.
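As a rough sketch of that halfway house (the pool depth, buffer size, and class name below are assumptions, not anything from a real library):
using System.Collections.Concurrent;

class LargeBufferPool
{
    private readonly ConcurrentBag<byte[]> _buffers = new ConcurrentBag<byte[]>();
    private readonly int _bufferSize;

    public LargeBufferPool(int bufferCount, int bufferSize)
    {
        _bufferSize = bufferSize;
        // Allocate everything up-front (at startup or during idle time) so the LOH
        // allocations are contiguous and never reclaimed, which avoids the gaps.
        for (int i = 0; i < bufferCount; i++)
            _buffers.Add(new byte[bufferSize]);
    }

    public byte[] Rent()
    {
        byte[] buffer;
        // Fall back to a fresh allocation if the pool is exhausted; tune bufferCount so this is rare.
        return _buffers.TryTake(out buffer) ? buffer : new byte[_bufferSize];
    }

    public void Return(byte[] buffer)
    {
        _buffers.Add(buffer);
    }
}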
Resources
The LOH is covered a lot on the web, but pay attention to the date of the resource. In the latest .NET versions the LOH has received some love and has improved. That said, if you are on an older version I think the resources on the net are fairly accurate, as the LOH never received any serious updates between its inception and .NET 4.5 (ish).
For example, there is this article from 2008: http://msdn.microsoft.com/en-us/magazine/cc534993.aspx
And a summary of improvements in .NET 4.5: http://blogs.msdn.com/b/dotnet/archive/2011/10/04/large-object-heap-improvements-in-net-4-5.aspx

In addition to the suggestions below, make sure that you're using the server garbage collector. That doesn't affect how the LOH is used, but my experience is that it does significantly reduce the amount of time spent in GC.
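For reference, server GC is switched on in configuration rather than code: for a self-hosted process it is the <gcServer enabled="true"/> element under <runtime> in app.config, while ASP.NET/IIS hosts typically run server GC already. A quick runtime check, just to confirm which mode the process actually ended up in:
using System;
using System.Runtime;

static class GcMode
{
    static void Main()
    {
        // GCSettings.IsServerGC is available from .NET 4.5
        Console.WriteLine("Server GC enabled: {0}", GCSettings.IsServerGC);
        Console.WriteLine("Latency mode: {0}", GCSettings.LatencyMode);
    }
}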
The best workaround I've found for avoiding large object heap problems is to create a persistent buffer and re-use it. So rather than allocating a new byte array with every call to Encoding.GetBytes, pass the byte array to the method.
In this case, use the GetBytes overload that takes a byte array. Allocate an array that's large enough to hold the bytes for your longest expected string, and keep it around. For example:
// allocate buffer at class scope
private byte[] _theBuffer = new byte[1024 * 1024];

public void PerfTestMeasureGetBytes()
{
    // ...
    for (...)
    {
        var sw = Stopwatch.StartNew();
        var numberOfBytes = Encoding.UTF8.GetBytes(smallText, 0, smallText.Length, _theBuffer, 0);
        sw.Stop();
        // ...
    }
}
The only problem here is that you have to make sure your buffer is large enough to hold the largest string. What I've done in the past is to allocate the buffer to the largest size I expect, but then check to make sure it's large enough whenever I go to use it. If it's not large enough, then re-allocate it. How you do that depends on how rigorous you want to be. When working with primarily Western European text, I'd just double the string length. For example:
string textToConvert = ...
if (_theBuffer.Length < 2 * textToConvert.Length)
{
    // reallocate the buffer
    _theBuffer = new byte[2 * textToConvert.Length];
}
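If you'd rather not rely on a heuristic multiplier, Encoding.UTF8.GetMaxByteCount gives a guaranteed upper bound (roughly three bytes per char, plus a little slack), at the price of a larger buffer. A sketch of the same check using it:
string textToConvert = ...
int worstCaseBytes = Encoding.UTF8.GetMaxByteCount(textToConvert.Length);
if (_theBuffer.Length < worstCaseBytes)
{
    // reallocate so the buffer can never be too small for this string
    _theBuffer = new byte[worstCaseBytes];
}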
Another way to do it is to just try the GetBytes call and reallocate on failure, then retry. For example:
bool good = false;
while (!good)
{
    try
    {
        numberOfBytes = Encoding.UTF8.GetBytes(theString, 0, theString.Length, _theBuffer, 0);
        good = true;
    }
    catch (ArgumentException)
    {
        // buffer isn't big enough. Find out how much I really need
        var bytesNeeded = Encoding.UTF8.GetByteCount(theString);
        // and reallocate the buffer
        _theBuffer = new byte[bytesNeeded];
    }
}
If you make the buffer's initial size large enough to accommodate the largest string you expect, then you probably won't get that exception very often, which means that the number of times you have to reallocate the buffer will be very small. You could, of course, add some padding to bytesNeeded so that you allocate more, in case you have some other outliers.

Related

element size influencing C# collection performance?

Given the task of improving the performance of a piece of code, I came across the following phenomenon. I have a large collection of reference types in a generic Queue, and I'm removing and processing the elements one by one, then adding them to another generic collection.
It seems the larger the elements are, the more time it takes to add them to the collection.
Trying to narrow down the problem to the relevant part of the code, I've written a test (omitting the processing of elements, just doing the insert):
class Small
{
    public Small()
    {
        this.s001 = "001";
        this.s002 = "002";
    }

    string s001;
    string s002;
}

class Large
{
    public Large()
    {
        this.s001 = "001";
        this.s002 = "002";
        ...
        this.s050 = "050";
    }

    string s001;
    string s002;
    ...
    string s050;
}
static void Main(string[] args)
{
    const int N = 1000000;
    var storage = new List<object>(N);
    for (int i = 0; i < N; ++i)
    {
        //storage.Add(new Small());
        storage.Add(new Large());
    }

    List<object> outCollection = new List<object>();
    Stopwatch sw = new Stopwatch();
    sw.Start();
    for (int i = N - 1; i > 0; --i)
    {
        outCollection.Add(storage[i]);
    }
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds);
}
On the test machine, using the Small class, it takes about 25-30 ms to run, while it takes 40-45 ms with Large.
I know that the outCollection has to grow from time to time to be able to store all the items, so there is some dynamic memory allocation. But giving it an initial collection size makes the difference even more obvious: 11-12 ms with Small and 35-38 ms with Large objects.
I am somewhat surprised, as these are reference types, so I was expecting the collections to work only with references to the Small/Large instances. I have read Eric Lippert's relevant article and know that references should not be treated as pointers. At the same time, AFAIK they are currently implemented as pointers, so their size and the collection's performance should be independent of element size.
I've decided to put up a question here hoping that someone could explain, or help me understand, what's happening. Aside from the performance improvement, I'm really curious what is happening behind the scenes.
Update:
Profiling data using the diagnostic tools didn't help me much, although I have to admit I'm not an expert at using the profiler. I'll collect more data later today to find where the bottleneck is.
The pressure on the GC is quite high of course, especially with the Large instances. But once the instances are created and stored in the storage collection and the program enters the loop, no collection is triggered any more, and memory usage doesn't increase significantly (outCollection is already pre-allocated).
Most of the CPU time is of course spent on memory allocation (JIT_New), around 62%; the only other significant entry in the profiler's function table is System.Collections.Generic.List`1[System.__Canon].Add, with about 7% of samples.
With 1 million items the preallocated outCollection size is 8 million bytes (the same as the size of storage); one can suspect 64 bit addresses being stored in the collections.
Probably I'm not using the tools properly or don't have the experience to interpret the results correctly, but the profiler didn't help me to get closer to the cause.
If the loop is not triggering collections and it only copies pointers between 2 pre-allocated collections, how could the item size cause any difference? The cache hit/miss ratio is supposed to be more or less the same in both cases, as the loop is iterating over a list of "addresses" in both cases.
Thanks for all the help so far, I will collect more data, and put an update here if anything found.
I suspect that at least one action in the above (maybe some type checks) will require a dereference. Then the fact that many Smalls are probably sitting close together on the heap and thus sharing cache lines could account for some amount of difference (certainly many more of them could share a single cache line than Larges).
Added to which, you are also accessing them in the reverse order to that in which they were allocated, which maximises such a benefit.

Struct vs class memory overhead

I'm writing an app that will create thousands of small objects and store them recursively in arrays. By "recursively" I mean that each instance of K will have an array of K instances, which will have an array of K instances, and so on; this array plus one int field are the only properties, plus some methods. I found that memory usage grows very fast even for a small amount of data (about 1 MB), and when the data I'm processing is about 10 MB I get an "OutOfMemoryException", not to mention when it's bigger (I have 4 GB of RAM) :). So what do you suggest I do? I figured that if I created a separate class V to process those objects, so that instances of K would have only an array of K's plus one integer field, and made K a struct, not a class, it should optimize things a bit - no garbage collection and stuff... But it's a bit of a challenge, so I'd rather ask you whether it's a good idea before I start a total rewrite :).
EDIT:
Ok, some abstract code
public void Add(string word) {
    int i;
    string shorterWord;
    if (word.Length > 0) {
        i = //something, it's really irrelevant
        if (t[i] == null) {
            t[i] = new MyClass();
        }
        shorterWord = word.Substring(1);
        //end of word
        if (shorterWord.Length == 0) {
            t[i].WordEnd = END;
        }
        //saving the word letter by letter
        t[i].Add(shorterWord);
    }
}
When researching deeper into this, I already had the following assumptions (they may be inexact; I'm getting old for a programmer). A class has extra memory consumption because a reference is required to address it. Store the reference, and an Int32-sized pointer is needed in a 32-bit build. Classes are always allocated on the heap (I can't remember if C++ has other possibilities; I would venture yes).
The short answer, found in the article below: an object has a 12-byte basic footprint plus 4 possibly unused bytes depending on your class (no doubt something to do with padding).
http://www.codeproject.com/Articles/231120/Reducing-memory-footprint-and-object-instance-size
Another issue you'll run into is that arrays also have overhead. A possibility would be to manage your own offset into a larger array or arrays, which in turn is getting closer to something a more memory-efficient language would be better suited for.
I'm not sure if there are libraries that may provide Storage for small objects in an efficient manner. Probably are.
My take on it: use structs, manage your own offset in a large array, and use proper packing instructions if it serves you (although I suspect this comes at a runtime cost of a few extra instructions each time you address unevenly packed data):
[StructLayout(LayoutKind.Sequential, Pack = 1)]
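To make that concrete, here is a rough sketch of combining packed structs with manual offsets into one big array. The node layout and the fixed 26-letter branching factor are assumptions for illustration, not the asker's actual design:
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct Node
{
    public int ChildBlock;   // index of this node's block of 26 children; 0 means no children yet
    public byte WordEnd;     // 1 if a word terminates at this node
}

class NodeStore
{
    // One large array of structs instead of thousands of small heap objects;
    // integer indices replace object references.
    private Node[] _nodes = new Node[1 << 20];
    private int _next = 26;  // slots 0..25 are reserved for the root's children

    public Node[] Nodes { get { return _nodes; } }

    // Hands out a block of 26 consecutive slots rather than allocating 26 objects.
    public int AllocateChildBlock()
    {
        if (_next + 26 > _nodes.Length)
            Array.Resize(ref _nodes, _nodes.Length * 2);
        int block = _next;
        _next += 26;
        return block;
    }
}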
Your stack is blowing up.
Do it iteratively instead of recursively.
You're not blowing the system heap up, you're blowing the call stack up; 10K function calls will blow it out of the water.
You need proper tail recursion, which is just an iterative hack.
Make sure you have enough memory in your system - over 100 MB+, etc.; it really depends on your system. A linked list of recursive objects is what you are looking at. If you keep recursing, it is going to hit the memory limit and an OutOfMemoryException will be thrown. Make sure you keep track of the memory usage in any program. Nothing is unlimited, especially memory. If memory is limited, save the data to disk.
It looks like there is infinite recursion in your code, so out of memory is thrown. Check the code: there should be a start and an end condition in recursive code. Otherwise it will go past 10 terabytes of memory at some point.
You can use a better data structure,
i.e. each letter can be a byte (a-0, b-1, ...). Each word fragment can be indexed as well, especially substrings - you should get away with significantly less memory (though at a performance penalty).
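A tiny sketch of that letter-to-byte mapping, assuming plain ASCII letters:
// 'a' -> 0, 'b' -> 1, ..., 'z' -> 25; one byte per letter instead of a full char or string
static byte LetterIndex(char c)
{
    return (byte)(char.ToLowerInvariant(c) - 'a');
}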
Just post your recursive algorithm and sanitize the variable names. If you are doing a BFS-type traversal and keeping all objects in memory, you will run out of memory. In that case, for example, replace it with DFS.
Edit 1:
You can speed up the algorithm by estimating how many items you will generate and then allocating that much memory at once. As the algorithm progresses, fill up the allocated memory. This reduces fragmentation as well as the reallocation and copy-on-full-array operations.
Nonetheless, after you are done operating on these generated words, you should remove them from your data structure so they can be garbage collected and you don't run out of memory.
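For illustration, the pre-allocation idea in its simplest form (the estimate below is an assumed value; derive yours from the input size):
int estimatedWordCount = 500000;                            // assumed estimate
var generatedWords = new List<string>(estimatedWordCount);  // never has to grow and copy mid-run
// ... generate and process ...
generatedWords.Clear();                                     // let the contents be collected afterwards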

Danger of C# Substring method?

Recently I have been reading up on some of the flaws of the Java substring method - specifically relating to memory, and how Java keeps a reference to the original string. Ironically, I am also developing a server application that uses C# .NET's implementation of Substring many tens of times a second. That got me thinking...
Are there memory issues with the C# (.Net) string.Substring?
What is the performance like on string.Substring? Is there a faster way to split a string based on start/end position?
Looking at .NET's implementation of String.Substring, a substring does not share memory with the original.
private unsafe string InternalSubString(int startIndex, int length, bool fAlwaysCopy)
{
    if (((startIndex == 0) && (length == this.Length)) && !fAlwaysCopy)
    {
        return this;
    }

    // Allocate new (separate) string
    string str = FastAllocateString(length);

    // Copy chars from old string to new string
    fixed (char* chRef = &str.m_firstChar)
    {
        fixed (char* chRef2 = &this.m_firstChar)
        {
            wstrcpy(chRef, chRef2 + startIndex, length);
        }
    }
    return str;
}
Every time you use Substring you create a new string instance - it has to copy the characters from the old string to the new one, along with the associated new memory allocation - and don't forget that these are Unicode characters. This may or may not be a bad thing - at some point you want to use those characters somewhere anyway. Depending on what you're doing, you might want your own method that merely finds the proper indexes within the string, which you can then use later.
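As a minimal sketch of that "just track the indexes" idea (the method name and separator handling are mine, purely for illustration):
using System.Collections.Generic;

// Yields (startIndex, length) pairs for each segment; no substrings are allocated
// until the caller actually needs the text.
static IEnumerable<KeyValuePair<int, int>> FindSegments(string text, char separator)
{
    int start = 0;
    for (int i = 0; i <= text.Length; i++)
    {
        if (i == text.Length || text[i] == separator)
        {
            yield return new KeyValuePair<int, int>(start, i - start);
            start = i + 1;
        }
    }
}

// Materialize a piece only when you actually need it:
// string piece = text.Substring(segment.Key, segment.Value);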
Just to add another perspective on this.
Out of memory (most times) does not mean you've used up all the memory. It means that your memory has been fragmented and the next time you want to allocate a chunk the system is unable to find a contiguous chunk of memory to fit your needs.
Frequent allocations/deallocations will cause memory fragmentation. The GC may not be in a position to de-fragment in time due to the kinds of operations you do. I know the server GC in .NET is pretty good about de-fragmenting memory, but you could always starve the system (preventing the GC from doing a collect) by writing bad code.
It is always good to try it out and measure the elapsed milliseconds:
Stopwatch watch = new Stopwatch();
watch.Start();
// run the string.Substring code
watch.Stop();
Console.WriteLine(watch.ElapsedMilliseconds); // ElapsedMilliseconds is a property, not a method
In the case of the Java memory leak one may experience when using subString, it's easily fixed by instantiating a new String object with the copy constructor (that is a call of the form "new String(String)"). By using that you can discard all references to the original (and in the case that this is actually an issue, rather large) String, and maintain only the parts of it you need in memory.
Not ideal, in theory the JVM could be more clever and compress the String object (as was suggested above), but this gets the job done with what we have now.
As for C#, as has been said, this problem doesn't exist.
The CLR (hence C#'s) implementation of Substring does not retain a reference to the source string, so it does not have the "memory leak" problem of Java strings.
Most of these types of string issues arise because String is immutable. The StringBuilder class is intended for when you are doing a lot of string manipulation:
http://msdn.microsoft.com/en-us/library/2839d5h5(VS.71).aspx
Note that the real issue is memory allocation rather than CPU, although excessive memory alloc does take CPU...
I seem to recall that the strings in Java were stored as the actual characters along with a start and length.
This means that a substring can share the same characters (since they're immutable) and only has to maintain a separate start and length.
So I'm not entirely certain what your memory issues are with the Java strings.
Regarding that article posted in your edit, it seems a bit of a non-issue to me.
Unless you're in the habit of making huge strings, then taking a small substring of them and leaving those lying around, this will have near-zero impact on memory.
Even if you had a 10M string and you made 400 substrings, you're only using that 10M for the underlying char array - it's not making 400 copies of that substring. The only memory impact is the start/length bit of each substring object.
The author seems to be complaining that they read a huge string into memory then only wanted a bit of it, but the entire thing was kept - my suggestion would be that they might want to rethink how they process their data :-)
To call this a Java bug is a huge stretch as well. A bug is something that doesn't work to specification. This was a deliberate design decision to improve performance; running out of memory because you don't understand how things work is not a bug, IMNSHO. And it's definitely not a memory leak.
There was one possible good suggestion in the comments to that article, that the GC could more aggressively recover bits of unused strings by compressing them.
This is not something you'd want to do on a first pass GC since it would be relatively expensive. However, where every other GC operation had failed to reclaim enough space, you could do it.
Unfortunately it would almost certainly mean that the underlying char array would need to keep a record of all the string objects that referenced it, so it could both figure out what bits were unused and modify all the string object start and length fields.
This in itself may introduce unacceptable performance impacts and, on top of that, if your memory is so short that this is a problem, you may not even be able to allocate enough space for a smaller version of the string.
I think, if memory is running out, I'd probably prefer not to be maintaining this char-array-to-string mapping to make this level of GC possible; instead I would prefer that memory to be used for my strings.
Since there is a perfectly acceptable workaround, and good coders should know about the foibles of their language of choice, I suspect the author is right - it won't be fixed.
Not because the Java developers are too lazy, but because it's not a problem.
You're free to implement your own string methods which match the C# ones (which don't share the underlying data except in certain limited scenarios). This will fix your memory problems but at the cost of a performance hit, since you have to copy the data every time you call substring. As with most things in IT (and life), it's a trade-off.
For profiling memory while developing you can use this code:
bool forceFullCollection = false;
Int64 valTotalMemoryBefore = System.GC.GetTotalMemory(forceFullCollection);
//call String.Substring
Int64 valTotalMemoryAfter = System.GC.GetTotalMemory(forceFullCollection);
Int64 valDifferenceMemorySize = valTotalMemoryAfter - valTotalMemoryBefore;
About parameter forceFullCollection: "If the forceFullCollection parameter is true, this method waits a short interval before returning while the system collects garbage and finalizes objects. The duration of the interval is an internally specified limit determined by the number of garbage collection cycles completed and the change in the amount of memory recovered between cycles. The garbage collector does not guarantee that all inaccessible memory is collected." GC.GetTotalMemory Method
Good luck!;)

How to get unused memory back from the large object heap (LOH) from multiple managed apps?

While talking to a colleague about a particular group of apps using up nearly 1.5 GB of memory on startup... he pointed me to a very good link on .NET production debugging.
The part that has me puzzled is ...
For example, if you allocate 1 MB of memory to a single block, the large object heap expands to 1 MB in size. When you free this object, the large object heap does not decommit the virtual memory, so the heap stays at 1 MB in size. If you allocate another 500-KB block later, the new block is allocated within the 1 MB block of memory belonging to the large object heap. During the process lifetime, the large object heap always grows to hold all the large block allocations currently referenced, but never shrinks when objects are released, even if a garbage collection occurs. Figure 2.4 on the next page shows an example of a large object heap.
Now let's say we have a fictional app that creates a flurry of large objects (> 85 KB), so the large object heap grows, let's say, to 200 MB. Now let's say we have 10 such app instances running, so that's 2000 MB allocated. Is this memory never given back to the OS until the process shuts down? (That is what I understood.)
Are there any gaps in my understanding? How do we get back unused memory in the various LOH heaps, so that we don't create the perfect storm of OutOfMemoryExceptions?
Update: From Marc's response, I wanted to clarify that the LOH objects are not referenced - the large objects are use-and-throw - however, the heap doesn't shrink even though it is relatively empty after the initial surge.
Update #2: Just including a code snippet (exaggerated, but it gets the point across, I think). I see an OutOfMemoryException around the time virtual memory hits the 1.5 GB mark on my machine (1.7 GB on another). From Eric L.'s blog post, 'process memory can be visualized as a massive file on disk' - this result is thus unexpected. The machines in this instance had GBs of free space on the HDD. Does the PageFile.sys OS file (or related settings) impose any restrictions?
static float _megaBytes;
static readonly int BYTES_IN_MB = 1024 * 1024;

static void BigBite()
{
    try
    {
        var list = new List<byte[]>();
        int i = 1;
        for (int x = 0; x < 1500; x++)
        {
            var memory = new byte[BYTES_IN_MB + i];
            _megaBytes += memory.Length / BYTES_IN_MB;
            list.Add(memory);
            Console.WriteLine("Allocation #{0} : {1}MB now", i++, _megaBytes);
        }
    }
    catch (Exception e)
    {
        Console.WriteLine("Boom! {0}", e); // I put a breakpoint here to check the console
        throw;
    }
}

static void Main(string[] args)
{
    BigBite();
    Console.WriteLine("Check VM now!");
    Console.ReadLine();

    _megaBytes = 0;
    ThreadPool.QueueUserWorkItem(delegate { BigBite(); });
    ThreadPool.QueueUserWorkItem(delegate { BigBite(); });
    Console.ReadLine(); // will blow before it reaches here
}
There is a clarification I would like to make first: assuming you are running the app as a 32-bit process, the VA space available to your process is only 2 GB (3 GB if you enable the large-address-aware switch), so even if you have a HUGE page file it doesn't matter for a 32-bit process; it only matters if you run 64-bit, where you have a huge address space.
Objects with size > 85000 bytes are allocated on the LOH; note that it is 85000 bytes, not 85 KB, and it is also an implementation detail that could change.
Now, back to your question.
The GC will decommit the LOH segments that are not used in two situations:
1. When the memory pressure on the machine is high (~95-98%)
2. When it fails to satisfy new allocation requests, it will decommit the unused pages in the LOH
So you will get back the memory in one of these cases.
The fact that you are hitting an OOM before reaching the 2 GB limit could mean you have VA fragmentation. VA fragmentation occurs when you don't have contiguous VA address space to satisfy a new allocation; for example, you ask for an 8 KB segment and you don't have 2 consecutive free pages in your VA (assuming a page size of 4 KB).
You can use the !vamap debugger extension in Debugging Tools for Windows to validate this.
Hope this helps.
Thanks
If the LOH wants to keep memory, that is up to the LOH - however, don't forget that OutOfMemoryException is per-process, since really the hard disk is the limiting factor for virtual memory. Eric Lippert blogged about this recently. Of course, that doesn't prevent it getting poor performance from all the paging...
Well, if you really have this kind of allocation pattern, you could move your large objects into another appdomain - when you decide to free all of the large objects, unload the appdomain and the heap for that appdomain will be released.
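A minimal sketch of that idea on .NET Framework (LargeObjectWorker is a hypothetical type; it must derive from MarshalByRefObject so calls cross the domain boundary):
using System;

class LargeObjectWorker : MarshalByRefObject
{
    public void DoWork()
    {
        var scratch = new byte[10 * 1024 * 1024]; // large allocations land on this domain's LOH
        // ... use scratch ...
    }
}

class Program
{
    static void Main()
    {
        AppDomain domain = AppDomain.CreateDomain("LargeObjectWork");
        try
        {
            var worker = (LargeObjectWorker)domain.CreateInstanceAndUnwrap(
                typeof(LargeObjectWorker).Assembly.FullName,
                typeof(LargeObjectWorker).FullName);
            worker.DoWork();
        }
        finally
        {
            // Unloading the domain releases its heaps, including its LOH segments.
            AppDomain.Unload(domain);
        }
    }
}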

Interesting OutOfMemoryException with StringBuilder

I need to continuously build large strings in a loop and save them to a database, which currently occasionally yields an OutOfMemoryException.
What is basically going on here is that I create a string using XmlWriter with StringBuilder, based on some data. Then I call a method from an external library that converts this XML string to some other string. After that, the converted string is saved to the database. This whole thing is done repeatedly in a loop, about 100 times, for different data.
The strings by themselves are not too big (below 500 kB each) and the process memory is not increasing during this loop. But still, occasionally I get an OutOfMemoryException within StringBuilder.Append. Interestingly, this exception does not result in a crash; I can catch it and continue the loop.
What is going on here? Why would I get an OutOfMemoryException although there is still enough free memory available in the system? Is this some GC heap problem?
Given that I can't avoid converting all these strings, what could I do to make this work reliably? Should I force a GC collection? Should I put a Thread.Sleep into the loop? Should I stop using StringBuilder? Should I simply retry when confronted with an OutOfMemoryException?
There is memory, but no contiguous segment that can handle the size of your StringBuilder. You have to know that each time the StringBuilder's buffer is too short, its size is doubled. If you can define the size of your builder (in the constructor), that's better.
You MAY call GC.Collect() when you are done with a large collection of objects.
Actually, when you get an OutOfMemory, it generally indicates a bad design: you could use the hard drive (temp files) instead of memory, and you shouldn't allocate memory again and again (try to reuse objects/buffers/...).
I STRONGLY advise you to read this post, "Out Of Memory" Does Not Refer to Physical Memory, from Eric Lippert.
Try to reuse the StringBuilder object when you do data generation.
After or before use, just reset the length of the StringBuilder to 0 and start appending. This will decrease the number of allocations and possibly make the OutOfMemory situation very rare.
To illustrate my point:
void MainProgram()
{
    StringBuilder builder = new StringBuilder(2 * 1024); // 2 KB
    PerformOperation(builder);
    PerformOperation(builder);
    PerformOperation(builder);
    PerformOperation(builder);
}

void PerformOperation(StringBuilder builder)
{
    builder.Length = 0;
    //
    // do the work here: builder.Append(...);
    //
}
With the sizes you mention you are probably running into Large Object Heap (LOH) fragmentation.
Reusing StringBuilder objects is not a direct solution; you need to get a grip on the underlying buffers.
If possible, calculate or estimate the size beforehand and pre-allocate.
And it could help if you round up allocations, let's say to multiples of 20k or so. That could improve reuse.
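A small sketch of that rounding idea (the helper and the example capacity are mine; the roughly 20k bucket is the suggestion above):
// Round a requested capacity up to the next multiple of 'bucket' so repeated runs
// ask for the same few sizes and freed LOH blocks are more likely to be reused.
static int RoundUpCapacity(int neededChars, int bucket = 20 * 1024)
{
    return ((neededChars + bucket - 1) / bucket) * bucket;
}

// e.g. var builder = new StringBuilder(RoundUpCapacity(450 * 1024)); // assumed size estimate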
