When I call MemoryStream.SetLength() with a length smaller than the current length, is it possible that MemoryStream would reallocate its buffer to reduce the capacity?
In other words, I know that when a MemoryStream (created without a backing byte[]) is written to, it dynamically adjusts its capacity, reallocating its buffer at twice the size each time the capacity is reached. My question is, does something like this happen in reverse? If I reduce the stream length by calling SetLength(), would it at some point reallocate to a smaller buffer? Or would it always keep the same buffer in this case and just change the length variable?
I looked at the source code and it doesn't do it, but since the documentation doesn't exactly spell it out, I'm not sure if this could be subject to change. I'm worried that if I have code that often calls SetLength() to reduce the length, it could theoretically incur a performance penalty because of internal resizing.
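For anyone who wants to check this on their own runtime, here is a minimal sketch that measures Capacity before and after shrinking. SetLengthProbe and ShrinkAndMeasure are names made up for this example, and what it shows is observed behaviour of the current implementation, not a documented guarantee:

```csharp
using System;
using System.IO;

public static class SetLengthProbe
{
    // Hypothetical helper: writes `written` bytes to an expandable MemoryStream,
    // shrinks it with SetLength, and returns the capacity before and after.
    public static (int Before, int After) ShrinkAndMeasure(int written, int newLength)
    {
        using (var stream = new MemoryStream())   // expandable: no backing byte[] supplied
        {
            stream.Write(new byte[written], 0, written);

            int before = stream.Capacity;
            stream.SetLength(newLength);          // shrinks the logical length
            int after = stream.Capacity;

            return (before, after);
        }
    }
}
```

On the runtimes I have tried, both values come back the same. If you actually want the memory back, assigning a smaller value to the Capacity property (it must not be less than Length) is the explicit way to ask for a new, smaller buffer.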
Related
I have seen the following line several times in TCP communication code here:
byte[] bytesFrom = new byte[10025];
So I was wondering whether this 10025 value has a special reason or was just chosen arbitrarily.
Thanks
As far as I can tell, 10025 doesn't have any specific meaning. It's probably a result of random tweaking by someone who doesn't understand how to use buffers (hey, I received a 10000 B packet, I didn't expect that, let me increase the buffer size...).
Less arbitrary values would be:
Powers of two are often used, because they're quite handy in computing (which is based on binary numbers). So you'll often see buffer sizes like 256 or 4096.
65536 - Apart from being a power of two, it's also (give or take a byte) the largest TCP receive window without window scaling (which can increase the window to a crazy value of 1 GiB - that's a lot of data in flight).
Actual known maximum size of the payload. This can be useful if the payload size is significantly smaller than the usual buffer sizes. For example, if you know that the largest payload you can receive is 100 B, you could use a byte array of 100 B, and you can even reuse it without issues (provided you don't reference the buffer anywhere, but you shouldn't really be doing that anyway).
1460 - This is usually the TCP maximum segment size (MSS) on Ethernet: a 1500 B MTU minus 40 B of IP and TCP headers. With Nagle's algorithm enabled (the default), data that doesn't fill a whole segment can be held back for some time (say 200ms) before the "incomplete" segment is sent; this allows TCP to work relatively well if you're writing e.g. individual bytes to the network stream without buffering them first. So sending 4 KiB at once would mean that the first 2920 B go out immediately, while the remaining 1176 B may wait for the say 200ms "timeout". Not taking this into account can cause significant delays even though the network is actually not busy at all.
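The arithmetic in the 1460 item is easy to check. A minimal sketch, where SegmentMath and Split are names made up for illustration and 1460 is assumed as the segment size (it will differ on links with another MTU):

```csharp
public static class SegmentMath
{
    // Assumption: 1500-byte Ethernet MTU minus 40 bytes of IPv4 and TCP headers.
    public const int TypicalSegmentSize = 1460;

    // Splits a write of `payloadBytes` into full segments and a trailing partial one.
    public static (int FullSegments, int FullBytes, int TrailingBytes) Split(
        int payloadBytes, int segmentSize = TypicalSegmentSize)
    {
        int fullSegments = payloadBytes / segmentSize;
        int fullBytes = fullSegments * segmentSize;
        return (fullSegments, fullBytes, payloadBytes - fullBytes);
    }
}
```

SegmentMath.Split(4096) gives 2 full segments (2920 B) and 1176 B left over, which is the part that can end up waiting.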
There are also some extra possible reasons in more specific environments. For example, on .NET, you may want to force the byte array to end up on the large object heap. While not entirely reliable, the runtime should store objects larger than 85,000 B on the LOH (it would be nice if this could be enforced in some way). This can be handy if you really know what you're doing, especially if you need to keep pinned handles on the array (or part of it), which is often the case in e.g. asynchronous networking - pinned handles can cause significant issues on the main managed heap, because it relies on compaction to work (it always allocates at the end, while the LOH keeps a table of free spaces).
On an even lower level, you might want to for example restrict the array size to fit well into CPU cache, or a single memory page to improve performance (this is also significant with .NET arrays, which store the array size at the beginning of the array - this means that when accessing the array's items, bounds checking will need to load the start of the array, instead of just the requested item). However, by the time you start with optimizations like this, you're probably a bit of a specialist :)
In other words, a well-chosen buffer size can simply be good practice that leads to fewer issues. In the end, though, it's all about profiling - if you find a performance problem in your networking code, a badly chosen buffer size is one of the possible culprits.
Also: WOW. So many google results on the new byte[10025] snippet. I wonder where that value originated, because it's obvious that a lot of people just blindly copied it without understanding it at all, best evidenced by snippets like this:
byte[] inStream = new byte[10025];
bufferSize = clientSocket.ReceiveBufferSize;
serverStream.Read(inStream, 0, bufferSize);
Why the hell would you allocate a 10 025 byte buffer and then only ever read ReceiveBufferSize bytes into it? Not to mention that if ReceiveBufferSize (which has nothing to do with the data being sent) is bigger than 10 025 B, you're possibly going to get an out of bounds error. If you care about ReceiveBufferSize at all (and you probably shouldn't), why not create a new byte[clientSocket.ReceiveBufferSize] buffer in the first place?
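For comparison, a minimal sketch of a read loop that avoids both problems: it never asks for more than the buffer can hold, and it only uses the bytes Read actually returned. StreamReading and ReadToEnd are hypothetical names, and 4096 is just an assumed default:

```csharp
using System;
using System.IO;

public static class StreamReading
{
    // Reads until the stream ends, copying only the bytes each Read call returned.
    public static byte[] ReadToEnd(Stream source, int bufferSize = 4096)
    {
        var buffer = new byte[bufferSize];
        using (var collected = new MemoryStream())
        {
            int read;
            // Read may return fewer bytes than requested; 0 means end of stream.
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                collected.Write(buffer, 0, read);
            }
            return collected.ToArray();
        }
    }
}
```

On a NetworkStream, Read only returns 0 once the other side closes the connection, so a real protocol would normally need its own message framing instead of reading to the end.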
Today I noticed something strange with the MemoryStream class. The .Length property is a long, but the .Capacity property, which should presumably always be >= .Length, is only an int.
I know that it would take a stream of over 2 GB for Length to exceed the possible Capacity, but this seems very strange to me. Length can't be changed, because it's inherited from Stream, but why not make Capacity a long as well? What happens to the capacity if you do have a MemoryStream that exceeds int.MaxValue in length?
No, MemoryStream.Capacity can't exceed int.MaxValue, because a memory stream is backed by a byte[] and an array's maximum length is int.MaxValue.
However, Stream.Length is a long, and that makes sense because a stream can be anything. For example, FileStream.Length can undoubtedly be greater than int.MaxValue.
A fundamental limitation in .NET, unfortunately, is that objects cannot exceed 2GB in size. The Stream class needs the long for its Length property, because a Stream can represent a resource outside of .NET (e.g. a file), but since MemoryStream is known to always be an in-memory, managed object, it is guaranteed to always be able to fit its Capacity in an int.
The Length property is inherited from Stream, while the Capacity property is declared for MemoryStream. Streams in general may be larger than 2GB, but this particular kind of stream never will be -- hence, the Capacity that is specific to MemoryStream is just an int.
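To see what happens if you try anyway, here is a minimal sketch (MemoryStreamLimits is a made-up name). It relies on the documented behaviour that MemoryStream.SetLength throws ArgumentOutOfRangeException for a value beyond the stream's maximum length:

```csharp
using System;
using System.IO;

public static class MemoryStreamLimits
{
    // Returns true if the stream refuses a length just past int.MaxValue.
    public static bool RejectsLengthBeyondInt()
    {
        using (var stream = new MemoryStream())
        {
            try
            {
                stream.SetLength((long)int.MaxValue + 1);
                return false;   // not expected: the length was accepted
            }
            catch (ArgumentOutOfRangeException)
            {
                return true;    // rejected before any allocation is attempted
            }
        }
    }
}
```

So the long Length never gets a chance to exceed the int Capacity; the request is refused up front.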
All, I have the following Append, which I perform when producing a single line for a fixed-width text file:
formattedLine.Append(this.reversePadding ?
strData.PadLeft(this.maximumLength) :
strData.PadRight(this.maximumLength));
This particular exception happens on the PadLeft() where this.maximumLength = 1,073,741,823 [a field length of an NVARCHAR(MAX) gathered from SQL Server]. formattedLine = "101102AA-1" at the time of the exception, so why is this happening? Shouldn't I have a maximum allowed length of 2,147,483,647?
I am wondering if https://stackoverflow.com/a/1769472/626442 could be the answer here - however, I am managing memory with the appropriate Dispose() calls on any disposable objects and with using blocks where possible.
Note. This fixed text export is being done on a background thread.
Thanks for your time.
This particular exception happens on the PadLeft() where this.maximumLength = 1,073,741,823
Right. So you're trying to create a string with over a billion characters in.
That's not going to work, and I very much doubt that it's what you really want to do.
Note that each char in .NET is two bytes, and also strings in .NET are null-terminated... and have some other fields beyond the data (the length, for one). That means you'd need at least 2147483652 bytes + object overhead, which pushes you over the 2GB-per-object limit.
If you're running on a 64-bit version of Windows, in .NET 4.5, there's a special app.config setting of <gcAllowVeryLargeObjects> that allows arrays bigger than 2GB. However, I don't believe that will change your particular use case:
Using this element in your application configuration file enables arrays that are larger than 2 GB in size, but does not change other limits on object size or array size:
The maximum number of elements in an array is UInt32.MaxValue.
The maximum index in any single dimension is 2,147,483,591 (0x7FFFFFC7) for byte arrays and arrays of single-byte structures, and 2,146,435,071 (0X7FEFFFFF) for other types.
The maximum size for strings and other non-array objects is unchanged.
What would you want to do with such a string after creating it, anyway?
In order to allocate memory for this operation, the OS must find contiguous memory that is large enough to perform the operation.
Memory fragmentation can cause that to be impossible, especially when using a 32-bit .NET implementation.
I think there might be a better approach to what you are trying to accomplish. Presumably, this StringBuilder is going to be written to a file (that's what it sounds like from your description), and apparently, you are also potentially dealing with large (huge) database records.
You might consider a streaming approach that won't require allocating such a huge block of memory.
To accomplish this you might investigate the following:
The SqlDataReader class exposes a GetChars() method, that allows you to read a chunk of a single large record.
Then, instead of using a StringBuilder, consider using a StreamWriter (or some other TextWriter-derived class) to write each chunk to the output.
This will only require having one buffer-full of the record in your application's memory space at one time. Good luck!
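The same idea can be applied to the padding itself. A minimal sketch, assuming the output goes to some TextWriter; PaddedFieldWriter, WritePadded and the 8192-character chunk size are all made up for illustration:

```csharp
using System;
using System.IO;

public static class PaddedFieldWriter
{
    // Writes `data` padded with spaces to `width` characters without ever
    // building the padded string in memory. padLeft mirrors string.PadLeft:
    // the spaces go before the data.
    public static void WritePadded(TextWriter writer, string data, long width,
                                   bool padLeft, int chunkSize = 8192)
    {
        long padding = Math.Max(0, width - data.Length);

        if (!padLeft) writer.Write(data);

        // One small reusable block of spaces instead of a billion-character string.
        var spaces = new char[(int)Math.Min(chunkSize, padding)];
        for (int i = 0; i < spaces.Length; i++) spaces[i] = ' ';

        while (padding > 0)
        {
            int count = (int)Math.Min(spaces.Length, padding);
            writer.Write(spaces, 0, count);
            padding -= count;
        }

        if (padLeft) writer.Write(data);
    }
}
```

With a StreamWriter over the export file this keeps memory use at one chunk, though whether a fixed-width field of a billion characters is what you really want in the file is a separate question.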
Is it worthwhile to initialize the collection size of a List<T> if it's reasonably known?
Edit: To take this further after reading the first answers, the question really boils down to: what is the default capacity, and how is the growth operation performed? Does it double the capacity, etc.?
Yes, it gets to be important when your List<T> gets large. The exact numbers depend on the element type and the machine architecture, so let's pick a List of reference types on a 32-bit machine. Each element will then take 4 bytes inside an internal array. The list will start out with a Capacity of 0 and an empty array. The first Add() call grows the Capacity to 4, reallocating the internal array to 16 bytes. Once four elements have been added the array is full, and the next Add() has to reallocate it again. It doubles the size: Capacity grows to 8, array size to 32 bytes. The previous array is garbage.
This repeats as necessary, so several copies of the internal array will become garbage.
Something special happens when the array has grown to 65,536 bytes (16,384 elements). The next Add() doubles the size again to 131,072 bytes. That's a memory allocation that exceeds the threshold for "large objects" (85,000 bytes). The allocation is now no longer made on the generation 0 heap, it is taken from the Large Object Heap.
Objects on the LOH are treated specially. They are only garbage collected during a generation 2 collection. And the heap doesn't get compacted, it takes too much time to move such large chunks.
This repeats as necessary, several LOH objects will become garbage. They can take up memory for quite a while, generation 2 collections do not happen very often. Another problem is that these large blocks tend to fragment the virtual memory address space.
This doesn't repeat endlessly, sooner or later the List class needs to re-allocate the array and it has grown so large that there isn't a hole left in the virtual memory address space to fit the array. Your program will bomb with an OutOfMemoryException. Usually well before all available virtual memory has been consumed.
Long story short, by setting the Capacity early, before you start filling the List, you can reserve that large internal array up front. You won't get all those awkward released blocks in the Large Object Heap, and you avoid fragmentation. In effect, you'll be able to store many more objects in the list, and your program runs leaner since there's so little garbage. Do this only if you have a good idea how large the list will be; a large Capacity that you'll never fill is wasteful.
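You can watch the growth steps yourself. A minimal sketch (ListGrowth and CapacitySteps are made-up names) that records every distinct Capacity value seen while adding items; the exact step sizes are an implementation detail of the runtime, so treat them as observations rather than a contract:

```csharp
using System.Collections.Generic;

public static class ListGrowth
{
    // Adds `count` items and returns each distinct Capacity observed along the way.
    public static List<int> CapacitySteps(int count, int initialCapacity = 0)
    {
        var list = new List<int>(initialCapacity);
        var steps = new List<int> { list.Capacity };

        for (int i = 0; i < count; i++)
        {
            list.Add(i);
            // A changed Capacity means the internal array was reallocated and copied.
            if (list.Capacity != steps[steps.Count - 1])
            {
                steps.Add(list.Capacity);
            }
        }
        return steps;
    }
}
```

Calling CapacitySteps(20) reports a reallocation at each step, while CapacitySteps(20, 20) reports a single capacity of 20 because the array was reserved up front.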
It is, as per the documentation:
If the size of the collection can be estimated, specifying the initial capacity eliminates the need to perform a number of resizing operations while adding elements to the List(T).
Well, it will stop the values in the list (which will be references if the element type is a reference type) from having to be copied occasionally as the list grows.
If it's going to be a particularly large list and you've got a pretty good idea of the size, it won't hurt. However, if estimating the size involves extra calculations or any significant amount of code, I wouldn't worry about it unless you find it becomes a problem - it could distract from the main focus of the code, and the resizing is unlikely to cause performance issues unless it's a really big list or you're doing it a lot.
I have the need to continuously build large strings in a loop and save them to database which currently occasionally yields an OutOfMemoryException.
What is basically going on here is that I create a string using an XmlWriter with a StringBuilder, based on some data. Then I call a method from an external library that converts this XML string to some other string. After that the converted string is saved to the database. This whole thing is done repeatedly in a loop, about 100 times, for different data.
The strings by themselves are not too big (below 500 kB each) and the process memory is not increasing during this loop. But still, occasionally I get an OutOfMemoryException within StringBuilder.Append. Interestingly, this exception does not result in a crash. I can catch the exception and continue the loop.
What is going on here? Why would I get an OutOfMemoryException although there is still enough free memory available in the system? Is this some GC heap problem?
Given that I can't circumvent converting all these strings, what could I do to make this work reliably? Should I force a GC collection? Should I put a Thread.Sleep into the loop? Should I stop using StringBuilder? Should I simply retry when confronted with an OutOfMemoryException?
There is memory but no contiguous segment that can handle the size of your string builder. You have to know that each time the buffer of the string builder is too short, its size is doubled. If you can define (in the ctor) the size of your builder, it's better.
You MAY call GC.Collect() when you are done with a large collection of objects.
Actually, an OutOfMemoryException generally points to a bad design. You could use the hard drive (temp files) instead of memory, and you shouldn't allocate memory again and again (try to reuse objects, buffers, ...).
I STRONGLY advise you to read the post "'Out Of Memory' Does Not Refer to Physical Memory" by Eric Lippert.
Try to reuse the StringBuilder object when you do the data generation.
Before or after each use, just reset the length of the StringBuilder to 0 and start appending. This will decrease the number of allocations and should make the OutOfMemory situation much rarer.
To illustrate my point:
void MainProgram()
{
    // Capacity is counted in chars, so this reserves 2048 chars (about 4 KB).
    StringBuilder builder = new StringBuilder(2 * 1024);
    PerformOperation(builder);
    PerformOperation(builder);
    PerformOperation(builder);
    PerformOperation(builder);
}

void PerformOperation(StringBuilder builder)
{
    // Reset the builder so the same instance can be reused.
    builder.Length = 0;
    //
    // do the work here: builder.Append(...);
    //
}
With the sizes you mention you are probably running into Large Object Heap (LOH) fragmentation.
Reusing StringBuilder objects is not a direct solution, you need to get a grip on the underlying buffers.
If possible, calculate or estimate the size beforehand and pre-allocate.
And it could help if you round up allocations, let's say to multiples of 20k or so. That could improve reuse.
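A minimal sketch of the pre-allocate-and-round-up idea. BuilderSizing, RoundUp and CreateFor are hypothetical names, the 20k multiple is taken from the suggestion above, and sizes are assumed to be small enough that the addition can't overflow an int:

```csharp
using System.Text;

public static class BuilderSizing
{
    // Rounds `size` up to the next multiple of `multiple` (both assumed positive).
    public static int RoundUp(int size, int multiple)
    {
        return ((size + multiple - 1) / multiple) * multiple;
    }

    // Creates a StringBuilder whose initial capacity is the rounded-up estimate.
    public static StringBuilder CreateFor(int estimatedChars, int multiple = 20 * 1024)
    {
        return new StringBuilder(RoundUp(estimatedChars, multiple));
    }
}
```

Whether the rounding actually improves reuse depends on the runtime and on how StringBuilder manages its buffers internally, so it is worth measuring before relying on it.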