Recently I have been reading up on some of the flaws of the Java substring method - specifically how it relates to memory, and how Java keeps a reference to the original string. Coincidentally, I am also developing a server application that uses C#/.NET's implementation of Substring many tens of times a second. That got me thinking...
Are there memory issues with the C# (.Net) string.Substring?
What is the performance like on string.Substring? Is there a faster way to split a string based on start/end position?
Looking at .NET's implementation of String.Substring, a substring does not share memory with the original.
private unsafe string InternalSubString(int startIndex, int length, bool fAlwaysCopy)
{
if (((startIndex == 0) && (length == this.Length)) && !fAlwaysCopy)
{
return this;
}
// Allocate new (separate) string
string str = FastAllocateString(length);
// Copy chars from old string to new string
fixed (char* chRef = &str.m_firstChar)
{
fixed (char* chRef2 = &this.m_firstChar)
{
wstrcpy(chRef, chRef2 + startIndex, length);
}
}
return str;
}
Every time you use Substring you create a new string instance - it has to copy the characters from the old string to the new, along with the associated new memory allocation - and don't forget that these are Unicode (two-byte) characters. This may or may not be a bad thing - at some point you want to use those characters somewhere anyway. Depending on what you're doing, you might want your own method that merely finds the proper indexes within the string that you can then use later.
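If all you need for a while is where the pieces are, a minimal sketch of that index-based approach might look like this (the StringSlice type is my own invention, not a framework type; note that it deliberately keeps the original string alive, which is exactly the trade-off the Java implementation makes):

// Hypothetical sketch: record where a piece of the original string lives,
// and defer the copy until the text is actually needed.
public struct StringSlice
{
    private readonly string _source; // keeps the original string reachable
    private readonly int _start;
    private readonly int _length;

    public StringSlice(string source, int start, int length)
    {
        _source = source;
        _start = start;
        _length = length;
    }

    // The copy that Substring would have made happens here, on demand.
    public override string ToString()
    {
        return _source.Substring(_start, _length);
    }
}

On newer frameworks, ReadOnlyMemory<char> (via string.AsMemory) gives you much the same thing without writing it yourself.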
Just to add another perspective on this.
Out of memory (most times) does not mean you've used up all the memory. It means that your memory has been fragmented and the next time you want to allocate a chunk the system is unable to find a contiguous chunk of memory to fit your needs.
Frequent allocations/deallocations will cause memory fragmentation. The GC may not be in a position to de-fragment in time due to the kinds of operations you do. I know the Server GC in .NET is pretty good about de-fragmenting memory, but you could still starve the system (preventing the GC from doing a collect) by writing bad code.
It is always good to try it out and measure the elapsed milliseconds.
Stopwatch watch = new Stopwatch();
watch.Start();
// run string.Substring code here
watch.Stop();
long elapsedMs = watch.ElapsedMilliseconds; // ElapsedMilliseconds is a property, not a method
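Wrapped into something runnable, a measurement could look like this (the input size and iteration count are placeholders, not a claim about your workload):

using System;
using System.Diagnostics;

class SubstringTiming
{
    static void Main()
    {
        // Arbitrary test data; substitute the strings your server actually handles.
        string input = new string('x', 10000);

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < 1000000; i++)
        {
            string part = input.Substring(100, 50); // the call under test
        }
        watch.Stop();

        Console.WriteLine("Elapsed: {0} ms", watch.ElapsedMilliseconds);
    }
}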
In the case of the Java memory leak one may experience when using substring, it's easily fixed by instantiating a new String object with the copy constructor (that is, a call of the form new String(String)). By using that you can discard all references to the original (and, in the case that this is actually an issue, rather large) String, and keep only the parts of it you need in memory.
Not ideal, in theory the JVM could be more clever and compress the String object (as was suggested above), but this gets the job done with what we have now.
As for C#, as has been said, this problem doesn't exist.
The CLR (hence C#'s) implementation of Substring does not retain a reference to the source string, so it does not have the "memory leak" problem of Java strings.
Most of these types of string issues arise because String is immutable. The StringBuilder class is intended for when you are doing a lot of string manipulation:
http://msdn.microsoft.com/en-us/library/2839d5h5(VS.71).aspx
Note that the real issue is memory allocation rather than CPU, although excessive memory alloc does take CPU...
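As a rough illustration of the difference (a sketch, not a benchmark): repeated concatenation allocates a new, progressively larger string on every iteration, while StringBuilder appends into one growable buffer and allocates a single string at the end.

using System.Text;

// Repeated concatenation: each += allocates a brand-new, larger string.
string slow = "";
for (int i = 0; i < 10000; i++)
{
    slow += i;
}

// StringBuilder: appends go into one growable buffer; one final allocation
// happens in ToString().
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 10000; i++)
{
    sb.Append(i);
}
string fast = sb.ToString();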
I seem to recall that the strings in Java were stored as the actual characters along with a start and length.
This means that a substring string can share the same characters (since they're immutable) and only have to maintain a separate start and length.
So I'm not entirely certain what your memory issues are with the Java strings.
Regarding that article posted in your edit, it seems a bit of a non-issue to me.
Unless you're in the habit of making huge strings, then taking a small substring of them and leaving those lying around, this will have near-zero impact on memory.
Even if you had a 10M string and you made 400 substrings, you're only using that 10M for the underlying char array - it's not making 400 copies of that substring. The only memory impact is the start/length bit of each substring object.
The author seems to be complaining that they read a huge string into memory then only wanted a bit of it, but the entire thing was kept - my suggestion would be that they might want to rethink how they process their data :-)
To call this a Java bug is a huge stretch as well. A bug is something that doesn't work to specification. This was a deliberate design decision to improve performance, running out of memory because you don't understand how things work is not a bug, IMNSHO. And it's definitely not a memory leak.
There was one possible good suggestion in the comments to that article, that the GC could more aggressively recover bits of unused strings by compressing them.
This is not something you'd want to do on a first pass GC since it would be relatively expensive. However, where every other GC operation had failed to reclaim enough space, you could do it.
Unfortunately it would almost certainly mean that the underlying char array would need to keep a record of all the string objects that referenced it, so it could both figure out what bits were unused and modify all the string object start and length fields.
This in itself may introduce unacceptable performance impacts and, on top of that, if your memory is so short for this to be a problem, you may not even be able to allocate enough space for a smaller version of the string.
I think, if the memory's running out, I'd probably prefer not to be maintaining this char-array-to-string mapping to make this level of GC possible, instead I would prefer that memory to be used for my strings.
Since there is a perfectly acceptable workaround, and good coders should know about the foibles of their language of choice, I suspect the author is right - it won't be fixed.
Not because the Java developers are too lazy, but because it's not a problem.
You're free to implement your own string methods which match the C# ones (which don't share the underlying data except in certain limited scenarios). This will fix your memory problems but at the cost of a performance hit, since you have to copy the data every time you call substring. As with most things in IT (and life), it's a trade-off.
For profiling memory while developing you can use this code:
bool forceFullCollection = false;
Int64 valTotalMemoryBefore = System.GC.GetTotalMemory(forceFullCollection);
//call String.Substring
Int64 valTotalMemoryAfter = System.GC.GetTotalMemory(forceFullCollection);
Int64 valDifferenceMemorySize = valTotalMemoryAfter - valTotalMemoryBefore;
About parameter forceFullCollection: "If the forceFullCollection parameter is true, this method waits a short interval before returning while the system collects garbage and finalizes objects. The duration of the interval is an internally specified limit determined by the number of garbage collection cycles completed and the change in the amount of memory recovered between cycles. The garbage collector does not guarantee that all inaccessible memory is collected." GC.GetTotalMemory Method
Good luck!;)
We have a Web Service using WebApi 2, .NET 4.5 on Server 2012. We were seeing occasional latency increases of 10-30 ms for no good reason. We were able to track down the problematic piece of code to the LOH and GC.
There is some text which we convert to its UTF8 byte representation (actually, the serialization library we use does that). As long as the text is shorter than 85000 bytes, latency is stable and short: ~0.2 ms on average and at 99%. As soon as the 85000 boundary is crossed, average latency increases to ~1ms while the 99% jumps to 16-20ms. Profiler shows that most of the time is spent in GC. To be certain, if I put GC.Collect between iterations, the measured latency goes back to 0.2ms.
I have two questions:
1. Where does the latency come from? As far as I understand the LOH isn't compacted. SOH is being compacted, but doesn't show the latency.
2. Is there a practical way to work around this? Note that I can't control the size of the data and make it smaller.
--
public void PerfTestMeasureGetBytes()
{
var text = File.ReadAllText(@"C:\Temp\ContactsModelsInferences.txt");
var smallText = text.Substring(0, 85000 + 100);
int count = 1000;
List<double> latencies = new List<double>(count);
for (int i = 0; i < count; i++)
{
Stopwatch sw = new Stopwatch();
sw.Start();
var bytes = Encoding.UTF8.GetBytes(smallText);
sw.Stop();
latencies.Add(sw.Elapsed.TotalMilliseconds);
//GC.Collect(2, GCCollectionMode.Default, true);
}
latencies.Sort();
Console.WriteLine("Average: {0}", latencies.Average());
Console.WriteLine("99%: {0}", latencies[(int)(latencies.Count * 0.99)]);
}
The performance problems usually come from two areas: allocation and fragmentation.
Allocation
The runtime guarantees zeroed memory, so it spends cycles cleaning it. When you allocate a large object, that's a lot of memory to clean, and it starts to add milliseconds to a single allocation (when, let's be honest, simple allocation in .NET is actually very fast, so we usually never care about this).
Fragmentation
Fragmentation occurs when LOH objects are allocated then reclaimed. Until recently, the GC could not reorganise the memory to remove these old object "gaps", and thus could only fit the next object in that gap if it was the same size or smaller. Recently, the GC has been given the ability to compact the LOH, which removes this issue, but costs time during compaction.
My guess in your case is you are suffering from both issues and triggering GC runs, but it depends on how often your code is attempting to allocate items in the LOH. If you are doing lots of allocations, try the object pooling route. If you cannot control a pool effectively (lumpy object lifetimes or disparate usage patterns), try chunking the data you are working against to avoid it completely.
Your Options
I've encountered two approaches to the LOH:
Avoid it.
Use it, but realise you are using it and manage it explicitly.
Avoid it
This involves chunking your large object (usually an array of some sort) into, well, chunks that each fall under the LOH barrier. We do this when serialising large object streams. Works well, but an implementation would be specific to your environment so I'm hesitant to provide a coded example.
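Purely as an illustrative sketch (the chunk size and the stream-based shape are mine, not the serialisation code referred to above):

using System;
using System.Collections.Generic;
using System.IO;

// Read a stream in buffers that each stay below the ~85,000-byte LOH
// threshold, so no single allocation lands on the LOH.
static List<byte[]> ReadInChunks(Stream input)
{
    const int ChunkSize = 80000; // comfortably under 85,000 bytes

    var chunks = new List<byte[]>();
    while (true)
    {
        var buffer = new byte[ChunkSize];
        int read = input.Read(buffer, 0, buffer.Length);
        if (read == 0)
            break;

        if (read < buffer.Length)
            Array.Resize(ref buffer, read); // trim the final partial chunk

        chunks.Add(buffer);
    }
    return chunks;
}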
Use it
A simple way to tackle both allocation and fragmentation is long-lived objects. Explicitly make an empty array (or arrays) of a large size to accommodate your large object, and don't get rid of it (or them). Leave it around and re-use it like an object pool. You pay for this allocation once (either on first use or during application idle time), but you pay less for re-allocation (because you aren't re-allocating) and you lessen fragmentation issues because you aren't constantly asking to allocate stuff and you aren't reclaiming items (which causes the gaps in the first place).
That said, a halfway house may be in order. Reserve a section of memory up-front for an object pool. Done early, these allocations should be contiguous in memory so you won't get any gaps, and leave the tail end of the available memory for uncontrolled items. Do beware though that this obviously has an impact on the working set of your application - an object pool takes space regardless of it being used or not.
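As a hedged sketch of that halfway house (the buffer size and pool policy are placeholders): allocate a handful of big buffers once, then rent and return them instead of allocating fresh ones.

using System.Collections.Concurrent;

// Trivial long-lived buffer pool: buffers are allocated once, handed out,
// and handed back, so the LOH sees far fewer allocations and reclaims.
public static class LargeBufferPool
{
    private const int BufferSize = 1024 * 1024; // 1 MB, lives on the LOH
    private static readonly ConcurrentBag<byte[]> Buffers = new ConcurrentBag<byte[]>();

    public static byte[] Rent()
    {
        byte[] buffer;
        return Buffers.TryTake(out buffer) ? buffer : new byte[BufferSize];
    }

    public static void Return(byte[] buffer)
    {
        Buffers.Add(buffer); // keep it alive for re-use instead of letting it die
    }
}

On recent frameworks, System.Buffers.ArrayPool<byte>.Shared provides a ready-made pool along these lines.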
Resources
The LOH is covered a lot out on the web, but pay attention to the date of the resource. In the latest .NET versions the LOH has received some love and has improved. That said, if you are on an older version I think the resources on the net are fairly accurate, as the LOH never really received any serious updates between its inception and .NET 4.5 (ish).
For example, there is this article from 2008 http://msdn.microsoft.com/en-us/magazine/cc534993.aspx
And a summary of improvements in .NET 4.5: http://blogs.msdn.com/b/dotnet/archive/2011/10/04/large-object-heap-improvements-in-net-4-5.aspx
In addition to the following, make sure that you're using the server garbage collector. That doesn't affect how the LOH is used, but my experience is that it does significantly reduce the amount of time spent in GC.
The best workaround I found for avoiding large object heap problems is to create a persistent buffer and re-use it. So rather than allocating a new byte array with every call to Encoding.GetBytes, pass the byte array to the method.
In this case, use the GetBytes overload that takes a byte array. Allocate an array that's large enough to hold the bytes for your longest expected string, and keep it around. For example:
// allocate buffer at class scope
private byte[] _theBuffer = new byte[1024*1024];
public void PerfTestMeasureGetBytes()
{
// ...
for (...)
{
var sw = Stopwatch.StartNew();
var numberOfBytes = Encoding.UTF8.GetBytes(smallText, 0, smallText.Length, _theBuffer, 0);
sw.Stop();
// ...
}
}
The only problem here is that you have to make sure your buffer is large enough to hold the largest string. What I've done in the past is to allocate the buffer to the largest size I expect, but then check to make sure it's large enough whenever I go to use it. If it's not large enough, then re-allocate it. How you do that depends on how rigorous you want to be. When working with primarily Western European text, I'd just double the string length. For example:
string textToConvert = ...
if (_theBuffer.Length < 2*textToConvert.Length)
{
// reallocate the buffer
_theBuffer = new byte[2*textToConvert.Length];
}
Another way to do it is to just try the GetBytes call, and reallocate on failure. Then retry. For example:
bool good = false;
while (!good)
{
try
{
numberOfBytes = Encoding.UTF8.GetBytes(theString, 0, theString.Length, _theBuffer, 0);
good = true;
}
catch (ArgumentException)
{
// buffer isn't big enough. Find out how much I really need
var bytesNeeded = Encoding.UTF8.GetByteCount(theString);
// and reallocate the buffer
_theBuffer = new byte[bytesNeeded];
}
}
If you make the buffer's initial size large enough to accommodate the largest string you expect, then you probably won't get that exception very often. Which means that the number of times you have to reallocate the buffer will be very small. You could, of course, add some padding to the bytesNeeded so that you allocate more, in case you have some other outliers.
I'm writing an app that will create thousands of small objects and store them recursively in arrays. By "recursively" I mean that each instance of K will have an array of K instances, which will each have an array of K instances, and so on. This array plus one int field are the only properties, plus some methods. I found that memory usage grows very fast even for a small amount of data (about 1 MB), and when the data I'm processing is about 10 MB I get an "OutOfMemoryException", not to mention when it's bigger (I have 4GB of RAM) :). So what do you suggest I do? I figured that if I created a separate class V to process those objects, so that instances of K would have only an array of K's plus one integer field, and made K a struct, not a class, it should optimize things a bit - no garbage collection and stuff... But it's a bit of a challenge, so I'd rather ask you whether it's a good idea before I start a total rewrite :).
EDIT:
Ok, some abstract code
public void Add(string word) {
int i;
string shorterWord;
if (word.Length > 0) {
i = //something, it's really irrelevant
if (t[i] == null) {
t[i] = new MyClass();
}
shorterWord = word.Substring(1);
//end of word
if(shorterWord.Length == 0) {
t[i].WordEnd = END;
}
//saving the word letter by letter
t[i].Add(shorterWord);
}
}
When researching deeper into this, I worked from the following assumptions (they may be inexact; I'm getting old for a programmer). A class has extra memory consumption because a reference is required to address it: store the reference and an Int32-sized pointer is needed on a 32-bit build. Class instances are always allocated on the heap (I can't remember if C++ has other possibilities; I would venture yes).
The short answer, found in this article, is that an object has a 12-byte basic footprint plus 4 possibly unused bytes, depending on your class (which no doubt has something to do with padding).
http://www.codeproject.com/Articles/231120/Reducing-memory-footprint-and-object-instance-size
Another issue you'll run into is that arrays also have overhead. A possibility would be to manage your own offsets into a larger array or arrays, which in turn is getting closer to something a more memory-efficient language would be better suited for.
I'm not sure if there are libraries that may provide Storage for small objects in an efficient manner. Probably are.
My take on it: use structs, manage your own offsets in a large array, and use proper packing instructions if it serves you (although I suspect this comes at a runtime cost of a few extra instructions each time you address unevenly packed data):
[StructLayout(LayoutKind.Sequential, Pack = 1)]
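To make that concrete, here is a hedged sketch combining the two ideas: a packed struct plus int indices into one flat array instead of object references (the Node layout and field names are invented for illustration, not taken from the question's code):

using System;
using System.Runtime.InteropServices;

// Nodes stored by value in one big array and linked by int indices,
// so there is no per-node object header or reference overhead.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct Node
{
    public int Value;            // the single int field from the question
    public int FirstChildIndex;  // -1 when there are no children
    public int NextSiblingIndex; // -1 when there are no further siblings
}

class NodeStore
{
    private Node[] _nodes = new Node[1024];
    private int _count;

    public int Allocate(int value)
    {
        if (_count == _nodes.Length)
            Array.Resize(ref _nodes, _nodes.Length * 2);

        _nodes[_count] = new Node { Value = value, FirstChildIndex = -1, NextSiblingIndex = -1 };
        return _count++;
    }

    public Node this[int index]
    {
        get { return _nodes[index]; }
        set { _nodes[index] = value; }
    }
}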
Your stack is blowing up.
Do it iteratively instead of recursively.
You're not blowing the system stack up, you're blowing the call stack up; 10K function calls will blow it out of the water.
You need proper tail recursion, which is just an iterative hack.
Make sure you have enough memory in your system (over 100 MB+, etc.; it really depends on your system). A linked list of recursive objects is what you are looking at. If you keep recursing, it is going to hit the memory limit and an OutOfMemoryException will be thrown. Make sure you keep track of the memory usage in any program; nothing is unlimited, especially memory. If memory is limited, save it to disk.
It looks like there is infinite recursion in your code, and that is why out of memory is thrown. Check the code: there should be a start and an end (a base case) in recursive code, otherwise it will go over 10 terabytes of memory at some point.
You can use a better data structure, i.e. each letter can be a byte (a -> 0, b -> 1, ...). Each word fragment can be indexed as well, especially substrings; you should get away with significantly less memory (though at a performance penalty).
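For the letter-to-byte part, a minimal sketch (assuming the input really is lower-case a-z; anything else would need validation):

// Map lower-case ASCII letters to byte codes (a -> 0, b -> 1, ...),
// so each letter costs one byte instead of a two-byte char.
static byte[] EncodeWord(string word)
{
    byte[] encoded = new byte[word.Length];
    for (int i = 0; i < word.Length; i++)
    {
        encoded[i] = (byte)(word[i] - 'a');
    }
    return encoded;
}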
Just post your recursive algorithm and sanitize the variable names. If you are doing a BFS-type traversal and keeping all objects in memory, you will run out of memory. For example, in this case, replace it with DFS.
Edit 1:
You can speed up the algo by estimating how many items you will generate then allocate that much memory at once. As the algo progresses, fill up the allocated memory. This reduces fragmentation and reallocation & copy-on-full-array operations.
Nonetheless, after you are done operating on these generated words, you should delete them from your data structure so they can be GC-ed and you don't run out of memory.
I'm working on a high performance code in which this construct is part of the performance critical section.
This is what happens in some section:
A string is 'scanned' and metadata is stored efficiently.
Based upon this metadata chunks of the main string are separated into a char[][].
That char[][] should be transferred into a string[].
Now, I know you can just call new string(char[]) but then the result would have to be copied.
To avoid this extra copy step, I guess it must be possible to write directly to the string's internal buffer, even though this would be an unsafe operation (and I know this brings lots of implications, like overflow and forward compatibility).
I've seen several ways of achieving this, but none I'm really satisfied with.
Does anyone have true suggestions as to how to achieve this?
Extra information:
The actual process doesn't necessarily include converting to char[]; it's practically a 'multi-substring' operation, like 3 indexes and their lengths appended.
The StringBuilder has too much overhead for the small number of concats.
EDIT:
Due to some vague aspects of what it is exactly that I'm asking, let me reformulate it.
This is what happens:
1. Main string is indexed.
2. Parts of the main string are copied to a char[].
3. The char[] is converted to a string.
What I'd like to do is merge steps 2 and 3, resulting in:
1. Main string is indexed.
2. Parts of the main string are copied to a string (and the GC can keep its hands off of it during the process by proper use of the fixed keyword?).
And a note is that I cannot change the output type from string[], since this is an external library, and projects depend on it (backward compatibility).
I think that what you are asking to do is to 'carve up' an existing string in-place into multiple smaller strings without re-allocating character arrays for the smaller strings. This won't work in the managed world.
For one reason why, consider what happens when the garbage collector comes by and collects or moves the original string during a compaction- all of those other strings 'inside' of it are now pointing at some arbitrary other memory, not the original string you carved them out of.
EDIT: In contrast to the character-poking involved in Ben's answer (which is clever but IMHO a bit scary), you can allocate a StringBuilder with a pre-defined capacity, which eliminates the need to re-allocate the internal arrays. See http://msdn.microsoft.com/en-us/library/h1h0a5sy.aspx.
What happens if you do:
string s = GetBuffer();
fixed (char* pch = s) {
pch[0] = 'R';
pch[1] = 'e';
pch[2] = 's';
pch[3] = 'u';
pch[4] = 'l';
pch[5] = 't';
}
I think the world will come to an end (Or at least the .NET managed portion of it), but that's very close to what StringBuilder does.
Do you have profiler data to show that StringBuilder isn't fast enough for your purposes, or is that an assumption?
Just create your own addressing system instead of trying to use unsafe code to map to an internal data structure.
Mapping a string (which is also readable as a char[]) to an array of smaller strings is no different from building a list of address information (index & length of each substring). So make a new List<Tuple<int,int>> instead of a string[] and use that data to return the correct string from your original, unaltered data structure. This could easily be encapsulated into something that exposed string[].
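A minimal sketch of that encapsulation (the type and member names are mine): each "substring" is just a (start, length) pair over the original, unaltered string, and real string objects are only built when a caller asks for them.

using System;
using System.Collections.Generic;

public class SubstringIndex
{
    private readonly string _source;
    private readonly List<Tuple<int, int>> _ranges = new List<Tuple<int, int>>();

    public SubstringIndex(string source)
    {
        _source = source;
    }

    public void Add(int start, int length)
    {
        _ranges.Add(Tuple.Create(start, length));
    }

    // The copy happens here, and only for the parts that are requested.
    public string this[int i]
    {
        get { return _source.Substring(_ranges[i].Item1, _ranges[i].Item2); }
    }

    public string[] ToArray()
    {
        var result = new string[_ranges.Count];
        for (int i = 0; i < _ranges.Count; i++)
            result[i] = this[i];
        return result;
    }
}

The ToArray call is where you would still pay for copies if the external string[] contract forces you to materialise everything at once.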
In .NET, there is no way to create an instance of String which shares data with another string. Some discussion on why that is appears in this comment from Eric Lippert.
Today I noticed that C#'s String class returns the length of a string as an int. Since an int is always 32 bits, no matter what the architecture, does this mean that a string can only be 2GB or less in length?
A 2GB string would be very unusual, and would present many problems along with it. However, most .NET APIs seem to use int to convey values such as length and count. Does this mean we are forever limited to collection sizes that fit in 32 bits?
Seems like a fundamental problem with the .NET API's. I would have expected things like count and length to be returned via the equivalent of 'size_t'.
"Seems like a fundamental problem with the .NET API..."
I don't know if I'd go that far.
Consider almost any collection class in .NET. Chances are it has a Count property that returns an int. So this suggests the class is bounded at a size of int.MaxValue (2147483647). That's not really a problem; it's a limitation -- and a perfectly reasonable one, in the vast majority of scenarios.
Anyway, what would the alternative be? There's uint -- but that's not CLS-compliant. Then there's long...
What if Length returned a long?
An additional 32 bits of memory would be required anywhere you wanted to know the length of a string.
The benefit would be: we could have strings taking up billions of gigabytes of RAM. Hooray.
Try to imagine the mind-boggling cost of some code like this:
// Lord knows how many characters
string ulysses = GetUlyssesText();
// allocate an entirely new string of roughly equivalent size
string schmulysses = ulysses.Replace("Ulysses", "Schmulysses");
Basically, if you're thinking of string as a data structure meant to store an unlimited quantity of text, you've got unrealistic expectations. When it comes to objects of this size, it becomes questionable whether you have any need to hold them in memory at all (as opposed to hard disk).
Correct, the maximum length would be the size of Int32, however you'll likely run into other memory issues if you're dealing with strings larger than that anyway.
At some value of String.Length, probably about 5 MB, it's not really practical to use String anymore. String is optimised for short bits of text.
Think about what happens when you do
myString += " more chars"
Something like:
System calculates length of myString plus length of " more chars"
System allocates that amount of memory
System copies myString to new memory location
System copies " more chars" to new memory location after last copied myString char
The original myString is left to the mercy of the garbage collector.
While this is nice and neat for small bits of text, it's a nightmare for large strings; just finding 2GB of contiguous memory is probably a showstopper.
So if you know you are handling more than a very few MB of characters use one of the *Buffer classes.
It's pretty unlikely that you'll need to store more than two billion objects in a single collection. You're going to incur some pretty serious performance penalties when doing enumerations and lookups, which are the two primary purposes of collections. If you're dealing with a data set that large, there is almost assuredly some other route you can take, such as splitting up your single collection into many smaller collections that contain portions of the entire set of data you're working with.
Heeeey, wait a sec.... we already have this concept -- it's called a dictionary!
If you need to store, say, 5 billion English strings, use this type:
Dictionary<string, List<string>> bigStringContainer;
Let's make the key string represent, say, the first two characters of the string. Then write an extension method like this:
public static string BigStringIndex(this string s)
{
return String.Concat(s[0], s[1]);
}
and then add items to bigStringContainer like this:
bigStringContainer[item.BigStringIndex()].Add(item);
and call it a day. (There are obviously more efficient ways you could do that, but this is just an example)
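One practical detail: indexing into the dictionary with a two-character key that hasn't been seen yet throws a KeyNotFoundException, so in practice you'd create the inner list on first use, something like this (the helper name is mine):

using System.Collections.Generic;

// Make sure the bucket exists before adding to it.
static void AddToIndex(Dictionary<string, List<string>> index, string item)
{
    string key = item.BigStringIndex(); // the extension method from above
    List<string> bucket;
    if (!index.TryGetValue(key, out bucket))
    {
        bucket = new List<string>();
        index[key] = bucket;
    }
    bucket.Add(item);
}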
Oh, and if you really really really do need to be able to look up any arbitrary object by absolute index, use an Array instead of a collection. Okay yeah, you lose some type safety, but you can index array elements with a long.
The fact that the framework uses Int32 for Count/Length properties, indexers etc is a bit of a red herring. The real problem is that the CLR currently has a max object size restriction of 2GB.
So a string -- or any other single object -- can never be larger than 2GB.
Changing the Length property of the string type to return long, ulong or even BigInteger would be pointless since you could never have more than approx 2^30 characters anyway (2GB max size and 2 bytes per character.)
Similarly, because of the 2GB limit, the only arrays that could even approach having 2^31 elements would be bool[] or byte[] arrays that only use 1 byte per element.
Of course, there's nothing to stop you creating your own composite types to workaround the 2GB restriction.
(Note that the above observations apply to Microsoft's current implementation, and could very well change in future releases. I'm not sure whether Mono has similar limits.)
In versions of .NET prior to 4.5, the maximum object size is 2GB. From 4.5 onwards you can allocate larger objects if gcAllowVeryLargeObjects is enabled. Note that the limit for string is not affected, but "arrays" should cover "lists" too, since lists are backed by arrays.
Even in x64 versions of Windows I got hit by .Net limiting each object to 2GB.
2GB is pretty small for a medical image. 2GB is even small for a Visual Studio download image.
If you are working with a file that is 2GB, that means you're likely going to be using a lot of RAM, and you're seeing very slow performance.
Instead, for very large files, consider using a MemoryMappedFile (see: http://msdn.microsoft.com/en-us/library/system.io.memorymappedfiles.memorymappedfile.aspx). Using this method, you can work with a file of nearly unlimited size, without having to load the whole thing in memory.
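A minimal sketch of that (the file name, offset and window size are placeholders):

using System.IO.MemoryMappedFiles;

// Map a large file and read a small window at an arbitrary offset
// without ever loading the whole file into memory.
using (var mmf = MemoryMappedFile.CreateFromFile(@"C:\Temp\huge-file.bin"))
using (var view = mmf.CreateViewAccessor(1024 * 1024, 4096))
{
    byte[] window = new byte[4096];
    view.ReadArray(0, window, 0, window.Length);
    // process 'window' here
}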
I have the need to continuously build large strings in a loop and save them to database which currently occasionally yields an OutOfMemoryException.
What is basically going on here is I create a string using XmlWriter with StringBuilder based on some data. Then I call a method from an external library that converts this xml string to some other string. After that the converted string is saved to the database. This whole thing is done repeatedly in a loop about a 100 times for different data.
The strings by themselves are not too big (below 500 kB each) and the process memory is not increasing during this loop. But still, occasionally I get an OutOfMemoryException within StringBuilder.Append. Interestingly, this exception does not result in a crash; I can catch it and continue the loop.
What is going on here? Why would I get an OutOfMemoryException although there is still enough free memory available in the system? Is this some GC heap problem?
Given that I can't circumvent converting all these strings, what could I do to make this work reliably? Should I force a GC collection? Should I put a Thread.Sleep into the loop? Should I stop using StringBuilder? Should I simply retry when confronted with an OutOfMemoryException?
There is memory, but no contiguous segment that can handle the size of your string builder. You have to know that each time the StringBuilder's buffer is too short, its size is doubled. If you can define the size of your builder (in the constructor), that's better.
You MAY call GC.Collect() when you are done with a large collection of objects.
Actually, when you get an OutOfMemoryException it generally shows a bad design; you might use the hard drive (temp files) instead of memory, and you shouldn't allocate memory again and again (try to reuse objects/buffers/...).
I STRONGLY advise you to read the post “Out Of Memory” Does Not Refer to Physical Memory by Eric Lippert.
Try to reuse StringBuilder object when you do data generation.
After or before use, just reset the length of the StringBuilder to 0 and start appending. This will decrease the number of allocations and possibly make the OutOfMemory situation very rare.
To illustrate my point:
void MainProgram()
{
StringBuilder builder = new StringBuilder(2 * 1024); //2 Kb
PerformOperation(builder);
PerformOperation(builder);
PerformOperation(builder);
PerformOperation(builder);
}
void PerformOperation(StringBuilder builder)
{
builder.Length = 0;
//
// do the work here builder.Append(...);
//
}
With the sizes you mention you are probably running into Large Object Heap (LOH) fragmentation.
Reusing StringBuilder objects is not a direct solution; you need to get a grip on the underlying buffers.
If possible, calculate or estimate the size beforehand and pre-allocate.
And it could help if you round up allocations, let's say to multiples of 20k or so. That could improve reuse.
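For instance, a hedged sketch of the round-up idea (the 20k granularity is simply the figure suggested above):

using System.Text;

// Round the estimated size up to a multiple of 20,000 characters so that
// builders of similar size keep asking for identically sized buffers,
// which are easier to reuse.
static StringBuilder CreateBuilder(int estimatedLength)
{
    const int Granularity = 20000;
    int capacity = ((estimatedLength / Granularity) + 1) * Granularity;
    return new StringBuilder(capacity);
}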