Objects added to the Large Object Heap - c#

I'm trying to debug the reason for high CPU usage in our legacy website, and from looking at some analysis in DebugDiag, I suspect that the number of objects on the LOH, and the subsequent GC collections, could be a reason. In one .dbg file we have ~3.5 GB on the LOH, with the majority of those objects being strings.
I'm aware that for an object to go on the LOH, it must be 85,000 bytes or larger.
What I'm not sure of is whether this refers to a single object, such as an array, or whether it can refer to a large object graph.
What I mean is: if I have an object Foo that contains n other objects, each of which contains n objects themselves, and each of those objects contains strings, and the total size of Foo (including all child objects) is greater than 85,000 bytes, would Foo be placed on the LOH? Or, if somewhere in the Foo object graph there were a single array greater than 85,000 bytes, would only that array be placed on the LOH?
Thanks.

You are right: if an array is larger than 85,000 bytes, then only that array is allocated on the LOH, not the entire object. To explain, I created an example:
class Program
{
    static void Main(string[] args)
    {
        Obj o = new Obj();
        o.Allocate(85000);
        Console.WriteLine(System.GC.GetGeneration(o));        // 0: o itself is a small object
        Console.WriteLine(System.GC.GetGeneration(o.items));  // 2: the 85,000-byte array is on the LOH
        Console.WriteLine(System.GC.GetGeneration(o.items2)); // 0: the 10-byte array is on the small object heap
        Console.WriteLine(System.GC.GetGeneration(o.Data));   // 2: the large string is on the LOH
        Console.ReadLine();
    }

    class Obj
    {
        public byte[] items = null;
        public byte[] items2 = null;
        public string Data = string.Empty;

        public void Allocate(int i)
        {
            items = new byte[i];
            items2 = new byte[10];
            Data = System.Text.Encoding.UTF8.GetString(items);
        }
    }
}
Notice the string Data: it also ends up on the LOH, because a string is backed by a character array (decoding 85,000 bytes of UTF-8 produces an 85,000-character string, roughly 170,000 bytes). items2 is not on the LOH, items is on the LOH, but the actual object o is not. (GC.GetGeneration reports 2 for LOH objects, since the LOH is collected together with generation 2.)
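To see the 85,000-byte threshold directly, here is a minimal check of my own (assuming a standard .NET runtime, where LOH allocations are reported as generation 2; the variable names are mine):

byte[] small = new byte[84000];   // below the threshold: allocated on the small object heap
byte[] large = new byte[85000];   // at the threshold: allocated on the LOH
Console.WriteLine(GC.GetGeneration(small)); // typically 0 for a fresh allocation
Console.WriteLine(GC.GetGeneration(large)); // 2, because the LOH is collected with gen 2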

Only individual objects greater than 85,000 bytes go into the LOH. An object graph can total more than that but not be in the LOH so long as no individual object crosses the limit. Byte arrays and strings are the most typical culprits.

I agree with #daspek and #dotnetstep.
For instance, in #dotnetstep's example, o should have a field size of 12 bytes on a 32-bit machine and 24 bytes on a 64-bit machine (counting just its three reference fields and ignoring the object header).
All the fields defined in the Obj class are reference types, so all Obj stores are pointers to the heap locations where its child objects (in this case, arrays) are created.
Each of those objects is an array (a string being backed by a char array), so each of them can be placed on the LOH if it exceeds the 85,000-byte limit.
The only way a regular object can be placed on the LOH every time it is created is if it has 21,251 or more fields on a 32-bit machine, or 10,626 or more fields on a 64-bit machine, assuming all the fields are reference types. For all intents and purposes, that is not a practical kind of class to create; it would be extremely rare to come across one.
In the case where the contained fields are structs, they are considered part of the class's layout. So rather than holding a 4-byte heap address, the contents of the struct are inlined into the class's layout. A good number of fields with large structs will not quickly get one to the 85,000-byte limit, but it can still be breached.
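A hedged illustration of that last point (the type names are my own, not from the question): struct fields are laid out inline in the object, while reference fields only contribute a pointer:

struct Big                 // 8 longs = 64 bytes, laid out inline wherever the struct is used
{
    public long A, B, C, D, E, F, G, H;
}

class WithStructs          // instance holds 4 * 64 = 256 bytes of inline data (plus header)
{
    public Big S1, S2, S3, S4;
}

class WithRefs             // instance holds only 4 references (16 bytes on x86, 32 bytes on x64)
{
    public byte[] R1, R2, R3, R4;
}

// Enough inline struct fields could, in principle, push a single instance past the
// 85,000-byte LOH threshold, though that would be a very unusual class.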

Related

c# Is it possible to recreate an array?

I have a class, CircularBuffer, which has a method CreateBuffer. The class does a number of things, but occasionally I need to change the size of an array that is used in the class. I do not need the old data any longer. Here is the class:
static class CircularBuffer
{
    static Array[,] buffer;
    static int columns, rows;

    public static void CreateBuffer(int columns, int rows)
    {
        buffer = new Array[rows, columns];
    }

    // other methods that use the buffer
}
Now the size of the buffer is up to 100 x 2048 floats. Is this going to cause any memory issues, or will it be automatically replaced with no issues?
Thanks
You are, technically speaking, not recreating anything. You are simply creating a new array and overwriting the variable's value (the address, so to speak, of the array it's referencing).
It's important, therefore, to distinguish what you are really replacing: you are not replacing the array, only the reference to it.
Problems? None. With your code, the old array will no longer be reachable and will therefore be eligible for collection by the GC. When (or whether) the collection actually happens is up to the GC, but it's not something you should worry about.
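If you want to convince yourself, here is a small sketch of mine using WeakReference to observe the old array becoming collectible once the reference is overwritten (the exact timing depends on the GC and the build configuration):

var buffer = new float[100 * 2048];
var weak = new WeakReference(buffer);

buffer = new float[50 * 2048];    // overwrite the only strong reference to the old array

GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
Console.WriteLine(weak.IsAlive);  // typically False: the old array has been collected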

Memory consumption when initializing object

I am trying to build some objects and insert them into a database. The number of records that have to be inserted is large, in the millions.
The insert is done in batches.
The problem I am having is that I need to initialize new objects to add them to a list, and at the end I do a bulk insert of the list into the database. Because I am initializing a huge number of objects, my computer's memory (RAM) fills up and everything more or less freezes.
The question is:
From a memory point of view, should I initialize new objects or set them to null?
Also, I am trying to work with the same object reference. Am I doing it right?
Code:
QACompleted completed = new QACompleted();
QAUncompleted uncompleted = new QAUncompleted();
QAText replaced = new QAText();

foreach (QAText question in questions)
{
    MatchCollection matchesQ = rgx.Matches(question.Question);
    MatchCollection matchesA = rgx.Matches(question.Answer);

    foreach (GetKeyValues_Result item in values)
    {
        hasNull = false;
        replaced = new QAText(); // <- this object
        if (matchesQ.Count > 0)
        {
            SetQuestion(matchesQ, replaced, question, item);
        }
        else
        {
            replaced.Question = question.Question;
        }
        if (matchesA.Count > 0)
        {
            SetAnswer(matchesA, replaced, question, item);
        }
        else
        {
            replaced.Answer = question.Answer;
        }
        if (!hasNull)
        {
            if (matchesA.Count == 0 && matchesQ.Count == 0)
            {
                completed = new QACompleted(); // <- this object
                MapEmpty(replaced, completed, question.Id);
            }
            else
            {
                completed = new QACompleted(); // <- this object
                MapCompleted(replaced, completed, question.Id, item);
            }
            goodResults.Add(completed);
        }
        else
        {
            uncompleted = new QAUncompleted(); // <- this object
            MapUncompleted(replaced, uncompleted, item, question.Id);
            badResults.Add(uncompleted);
        }
    }

    var success = InsertIntoDataBase(goodResults, "QACompleted");
    var success1 = InsertIntoDataBase(badResults, "QAUncompleted");
}
I have marked the objects. Should I just set them to null, as in replaced = null, or should I use the constructor?
What would be the difference between replaced = new QAText() and replaced = null?
The memory cost of creating objects
Creating objects in C# always has a memory cost. This relates to the memory layout of an object. Assuming you are using a 64-bit OS, the runtime has to allocate an extra 8 bytes for the sync block and 8 bytes for the method table pointer; after those come your own data fields. Besides this inevitable 16-byte header, objects are always aligned to an 8-byte boundary and can therefore incur extra padding overhead.
You can roughly estimate the memory overhead if you know exactly how many objects you create. However, I suggest being careful about assuming that your memory pressure comes from object layout overhead, which is also why I suggest estimating the overhead as a first step. You might realize that even if the layout overhead could magically be removed entirely, it would not make a huge difference to your memory performance. After all, for a million objects, the object headers only amount to 16 MB.
The difference between replaced = new QAText() and replaced = null
I suppose that after you set replaced to null you still have to create another QAText? If so, memory-wise there is no real difference as far as the garbage collector is concerned. The old QAText instance will be collected either way, provided you hold no other reference to it. When to collect the instance, however, is the garbage collector's call; doing replaced = null will not make the collection happen any earlier.
You could try to reuse the same QAText instance instead of creating a new one every time, but creating a new one each time will not by itself cause high memory pressure. It will keep the GC a little busier and therefore cost some extra CPU.
Identify the real cause for high memory usage
If your application is really using a lot of memory, you have to look at the design of your QACompleted and QAUncompleted objects. Those are the objects added to the lists, and they occupy memory until you submit them to the database. If those objects are designed well (that is, they only take the memory they have to take), then, as Peter pointed out, you should use a smaller batch size so you don't have to keep too many of them in memory.
There are other factors in your program that can cause unexpected memory usage. What is the data structure for goodResults and badResults: List or LinkedList? A List is internally nothing but a dynamic array. It uses a growth policy that doubles the backing array's size whenever it fills up. That always-double policy can eat up memory quickly, especially when you have a lot of entries.
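You can watch the doubling policy at work with a quick sketch like this (my own illustration, not part of the original code):

var list = new List<int>();
int lastCapacity = -1;

for (int i = 0; i < 1000000; i++)
{
    list.Add(i);
    if (list.Capacity != lastCapacity)
    {
        lastCapacity = list.Capacity;
        Console.WriteLine("Count=" + list.Count + ", Capacity=" + lastCapacity);
    }
}
// Prints capacities 4, 8, 16, 32, ... - the backing array doubles every time it fills.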
A LinkedList, on the other hand, does not suffer from that problem, but every single node costs roughly 40 extra bytes.
It is also worth checking what the MapCompleted and MapUncompleted methods are doing. Are they keeping long-lived references to the replaced object? If so, that is a memory leak.
In summary, when dealing with memory problems, focus on macro-scope issues such as the choice of data structures or memory leaks, or optimize your algorithms so that you don't have to keep all the data in memory at once.
Instantiating a new (albeit empty) object always takes some memory, as space has to be allocated for the object's fields. If you aren't going to access or set any data on the instance, I see no point in creating it.
It's unfortunate that the code example is not written better. There seem to be lots of declarations left out, and undocumented side-effects in the code. This makes it very hard to offer specific advice.
That said…
Your replaced object does not appear to be retained beyond one iteration of the loop, so it's not part of the problem. The completed and uncompleted objects are added to lists, so they do add to your memory consumption, as do the goodResults and badResults lists themselves (where are the declarations for those?).
If you are using a computer with too little RAM, then yes...you'll run into performance issues as Windows uses the disk to make up for the lack of RAM. And even with enough RAM, at some point you could run into .NET's limitations on object size (i.e. you can only put so many elements into a list). So one way or the other, you need to reduce your peak memory usage.
You stated that when the data in the lists is inserted into the database, the lists are cleared. So presumably that means there are so many elements in the values list (one of the undeclared, undocumented variables in your code example) that the lists and their objects grow too large before the inner loop finishes and the data is inserted into the database.
In that case, the simplest way to address the issue is probably to submit the updates in batches inside the inner foreach loop. E.g. at the end of that loop, add something like this:
if (goodResults.Count >= 100000)
{
    var success = InsertIntoDataBase(goodResults, "QACompleted");
}
if (badResults.Count >= 100000)
{
    // note: badResults go to the QAUncompleted table
    var success = InsertIntoDataBase(badResults, "QAUncompleted");
}
(Declaring the actual cut-off as a named constant, of course, handling the database insert result as appropriate, and clearing each list after a successful insert if InsertIntoDataBase does not already do so.)
Of course, you would still do the insert at the end of the outer loop too.

.NET small class with many instances optimization scenario?

I have millions of instances of the class Data, and I am looking for optimization advice.
Is there a way to optimize it in any way? Could I save memory, for example, by serializing it somehow, even though that would hurt retrieval speed, which is important too? Maybe by turning the class into a struct, although the class seems rather large for a struct.
Queries over these objects can touch hundreds of millions of them at a time. They sit in a list and are queried by DateTime. The results are aggregated in different ways, and many calculations can be applied.
[Serializable]
[DataContract]
public abstract class BaseData { }

[Serializable]
public class Data : BaseData
{
    public byte member1;
    public int member2;
    public long member3;
    public double member4;
    public DateTime member5;
}
Unfortunately, while you did specify that you want to "optimize", you did not specify what exact problem you mean to tackle. So I cannot give you more than general advice.
Serialization will not help you. Your Data objects are already stored as bytes in memory. Nor will turning it into a struct help much on its own: the difference between a struct and a class lies in their addressing and referencing behaviour, and the fields themselves take the same space either way.
The only way I can think of to reduce the memory footprint of a collection with "hundreds of millions" of these objects would be to serialize and compress the entire thing. But that is not feasible: you would always have to decompress the entire thing before accessing it, which would shoot your performance to hell AND nearly double the memory consumption on access (compressed and decompressed data both sitting in memory at that point).
The best general advice I can give you is not to try to optimize this scenario yourself, but to use specialized software for it: an (in-memory) database. One example of a database you can use in-memory, and which is readily available for .NET, is SQLite.
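As a hedged sketch of that idea, using the Microsoft.Data.Sqlite package (one of several SQLite providers for .NET; the table layout below is my own invention, not from the question):

using Microsoft.Data.Sqlite;

using (var conn = new SqliteConnection("Data Source=:memory:"))
{
    conn.Open();

    var create = conn.CreateCommand();
    create.CommandText = "CREATE TABLE Data (Member3 INTEGER, Member5 TEXT)";
    create.ExecuteNonQuery();

    var insert = conn.CreateCommand();
    insert.CommandText = "INSERT INTO Data VALUES ($m3, $m5)";
    insert.Parameters.AddWithValue("$m3", 42L);
    insert.Parameters.AddWithValue("$m5", "2015-01-01T00:00:00");
    insert.ExecuteNonQuery();

    // Date-range queries and aggregations can then be pushed down to SQL
    // instead of being computed over an in-memory List<Data>.
}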
I assume, as you seem to imply, that you have a class with many members and a large number of instances, and that you need to keep them all in memory at the same time to perform calculations.
I ran a few tests to see if you could actually get different sizes for the classes you described.
I used this simple method for finding the in-memory size of an object:
private static void MeasureMemory()
{
    int size = 10000000;
    object[] array = new object[size];

    long before = GC.GetTotalMemory(true);
    for (int i = 0; i < size; i++)
    {
        array[i] = new Data();
    }
    long after = GC.GetTotalMemory(true);

    double diff = after - before;
    Console.WriteLine("Total bytes: " + diff);
    Console.WriteLine("Bytes per object: " + diff / size);
}
It may be primitive, but I find that it works fine for situations like this.
As expected, almost nothing you can do to that class (turning it into a struct, removing the inheritance, or the attributes) influences the memory used by a single instance; as far as memory usage goes, they are all equivalent. One caveat: storing instances in an object[] boxes structs, so a test like the one above cannot show any struct savings; a strongly typed Data[] of structs would avoid the per-instance header and reference. In any case, do try to fiddle with your actual classes and run them through the given code.
The only way to actually reduce the memory footprint of an instance is to use smaller types for your data (int instead of long, for example). If you have a large number of booleans, you could pack them into a byte or an integer and expose simple property wrappers to work with them (a standalone boolean field takes a whole byte of memory). These may be insignificant changes in most situations, but for a hundred million objects, removing one boolean can make a difference of a hundred MB of memory. Also be aware that the platform target you choose for your application can affect the memory footprint of an object (x64 builds take up more memory than x86 ones).
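For instance, here is a hedged sketch of the boolean-packing idea (the names are invented for illustration):

class Flags
{
    private byte _bits;   // up to eight booleans packed into one byte

    public bool IsActive
    {
        get { return (_bits & 1) != 0; }
        set { _bits = value ? (byte)(_bits | 1) : (byte)(_bits & ~1); }
    }

    public bool IsDirty
    {
        get { return (_bits & 2) != 0; }
        set { _bits = value ? (byte)(_bits | 2) : (byte)(_bits & ~2); }
    }
}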
Serializing the data is very unlikely to help. An in-memory database has its upsides, especially if you are doing complex queries, but it is unlikely to actually reduce the memory usage of your data. Unfortunately, there just aren't many ways to reduce the footprint of basic data types. At some point you will have to move to a file-based database.
However, here are some ideas. Please be aware that they are hacky and highly conditional, that they decrease computation performance, and that they will make the code harder to maintain.
It is often the case in large data structures that objects in different states have only some of their properties filled, with the others set to null or a default value. If you can identify such groups of properties, perhaps you can move them to a sub-class and hold one reference that can be null, instead of having several properties take up space; you then instantiate the sub-class only once it is needed, as sketched below. You can write property wrappers to hide this from the rest of the code. Bear in mind that the worst-case scenario here has you keeping all the properties in memory anyway, plus extra object headers and pointers.
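A sketch of that idea, with all type and member names invented for illustration:

class DataExtras                 // the rarely-populated fields live here
{
    public string Comment;
    public double[] History;
}

class SlimData
{
    public int Id;
    private DataExtras _extras;  // stays null for most instances

    public string Comment
    {
        get { return _extras == null ? null : _extras.Comment; }
        set
        {
            if (_extras == null) { _extras = new DataExtras(); }
            _extras.Comment = value;
        }
    }
}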
You could perhaps turn members that are likely to hold a default value into binary representations and pack them into a byte array. You would know which byte positions represent which data member, and could write properties that read them. Position the members most likely to hold a default value at the end of the byte array (a few longs that are often 0, for example). Then, when creating the object, size the byte array to exclude the trailing members that hold default values, stopping at the first member with a non-default value. When outside code requests a property, you check whether the byte array is large enough to hold that member and, if not, return the default value. This way you can potentially save some space: best case, you have a null pointer to a byte array instead of several data members; worst case, you have full byte arrays taking as much space as the original data, plus some overhead for the array. The usefulness depends on the actual data, and it assumes you do relatively few writes, since recomputing the array is expensive. A read-only sketch follows below.
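Here is a read-only sketch of that packing scheme (invented names; writes would require reallocating the array):

class PackedData
{
    private byte[] _bytes;   // null means "every member holds its default value"

    // member3 lives at byte offsets 0..7; it is the member most likely to be
    // non-default, so it sits at the front of the array
    public long Member3
    {
        get
        {
            return (_bytes != null && _bytes.Length >= 8)
                ? BitConverter.ToInt64(_bytes, 0)
                : 0L;
        }
    }

    // member2 lives at offsets 8..11; if the array was truncated before this
    // offset, the member holds its default value
    public int Member2
    {
        get
        {
            return (_bytes != null && _bytes.Length >= 12)
                ? BitConverter.ToInt32(_bytes, 8)
                : 0;
        }
    }
}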
Hope any of this helps :)

What data structure is used to implement the arraylist

What data structure is used in building an ArrayList, given that we are able to add and delete values on it dynamically?
I was assuming that it uses a linked list, but after some googling I found that it uses a vector, with no further details.
On modern processors, the memory cache is king. Using the cache efficiently makes an enormous difference; the processor can easily be stalled for hundreds of cycles when the program accesses an address whose content isn't cached, waiting for the very slow memory bus to supply the data.
Accessing memory is most efficient when you access it sequentially. The odds that a byte will be available in the cache are then the greatest; it is very likely to be present in the same cache line. That makes arrays by far the most efficient collection type, assuming you index the elements sequentially.
Accordingly, all .NET collection classes except LinkedList use arrays to store their data, including the hashed collections (Hashtable, Dictionary, HashSet), which use arrays internally, and including ArrayList. LinkedList should be avoided due to its very poor cache locality, except when cheap inserts and deletes at random known locations are the primary concern.
A problem with arrays is that their size is fixed, which makes it difficult to implement auto-sizing collections like ArrayList. This is solved by intentionally wasting address space: whenever the array fills up to capacity, it is reallocated at double the previous size and the elements are copied over; you can observe this through the Capacity property. While this sounds expensive, the algorithm is amortized O(1), and the virtual memory subsystem of the operating system ensures that you don't actually pay for memory you don't use.
You can avoid the not-so-cheap copying by guessing the Capacity up front. More details about that in this answer.
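For example (a quick sketch of mine, using List<T>, which behaves the same way as ArrayList here):

// Growing from empty: the backing array is reallocated and copied
// at capacities 4, 8, 16, 32, ... on the way to a million items.
var grown = new List<int>();
for (int i = 0; i < 1000000; i++) grown.Add(i);

// Guessing the capacity up front: one allocation, no growth copies.
var presized = new List<int>(1000000);
for (int i = 0; i < 1000000; i++) presized.Add(i);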
ArrayList internally uses an array to store the data, and resizes that array whenever needed.
The Java implementation of ArrayList likewise creates an array with an initial size and resizes it as it grows.
You can see the implementation here:
http://www.docjar.com/html/api/java/util/ArrayList.java.html
This is for Java, but the concepts are the same for .NET.
From the MSDN page:
Implements the IList interface using an array whose size is dynamically increased as required.
Some of the benefits of using the class instead of an array directly are:
it can be used anywhere an IList is expected
it handles the resizing and copying for you when adding/removing items from the middle of the array
it keeps track of the 'last' item in the array
it provides methods for binary searching for items in the array
See here: ArrayList source
As already mentioned, it is an array underneath.
private object[] _items;
Here is the Add() method:
public virtual int Add(object value)
{
    if (this._size == this._items.Length)
    {
        this.EnsureCapacity(this._size + 1);
    }
    this._items[this._size] = value;

    // The decompiler temporaries (expr_2D, arg_2E_0, ...) collapse to:
    this._version++;
    return this._size++;
}
As you can see, EnsureCapacity is called...which ends up calling set_Capacity:
public virtual void set_Capacity(int value)
{
    if (value < this._size)
    {
        throw new ArgumentOutOfRangeException("value",
            Environment.GetResourceString("ArgumentOutOfRange_SmallCapacity"));
    }
    if (value == this._items.Length)
    {
        return; // capacity unchanged, nothing to do (the decompiler's "goto IL_65")
    }
    if (value <= 0)
    {
        this._items = new object[4];
        return;
    }

    // Allocate the larger array and copy the existing elements across.
    object[] array = new object[value];
    if (this._size > 0)
    {
        Array.Copy(this._items, 0, array, 0, this._size);
    }
    this._items = array;
}
Where the entire array is copied to a larger array if the capacity needs to be increased.
The ArrayList stores values internally as an array of objects and exposes public helper methods to make working with that array easier (via the IList interface).
When items are inserted, all the elements to the right of the insertion point are shifted to the right, making inserts rather inefficient. Appends, on the other hand, are quick, because there is no need to shift elements (unless the internal array has reached capacity, in which case it is replaced with a larger array).
Because the values are stored internally as an array, the usual advantages of arrays apply (such as efficient binary searches if the values are sorted).
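To illustrate the insert-versus-append difference (a small sketch, not from the original answer):

var list = new ArrayList();
list.Add("b");        // ["b"]            append: amortized O(1)
list.Add("c");        // ["b", "c"]       append: amortized O(1)
list.Insert(0, "a");  // ["a", "b", "c"]  insert at 0: "b" and "c" are shifted right, O(n)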

Understanding Memory

In C#, does the following save any memory?
private List<byte[]> _stream;

public object Stream
{
    get
    {
        if (_stream == null)
        {
            _stream = new List<byte[]>();
        }
        return _stream;
    }
}
Edit: sorry, I guess I should have been more specific.
Specifically using "object" instead of List... I thought that would kinda clue itself in because it's a weird thing to do.
It saves a very small amount of memory. The amount of memory an empty List<byte[]> takes up is tiny, on the order of a few dozen bytes.
The reason is that your reference variable _stream only needs enough memory to hold a reference to an object. Once the object is allocated, it takes up a certain amount of memory which may grow or shrink over time, such as when new byte[]s are added to the List, but the memory taken up by the reference to that object always remains the same size.
This is simpler and less prone to corner cases that cause you headaches:
private List<byte[]> _stream = new List<byte[]>();

public object Stream
{
    get
    {
        return _stream;
    }
}
Although, in most cases it's not really optimal to return references to private members when they are collections/arrays, etc. Better to return _stream.AsReadOnly().
Save memory compared to what?
byte[][] _stream;
maybe? Then no: a List<T> will take up more memory, since it has an array at its heart (which isn't necessarily exactly the size of its contents, and is usually larger) and some bookkeeping needs to be done too.
That is lazy loading. You create the stream only when someone requests it; it is never created unless it is required.
One might say it saves some memory, because nothing is allocated until it is actually used: before the first access to the stream, no memory is allocated for it.
If your edit indicates that you are asking whether the use of the object keyword instead of List<byte[]> as the type of the property saves memory: no, it doesn't. And the if block only saves a negligible amount of memory (and CPU at instantiation) until the first time the property is called, while making that first call slightly slower. Consider returning null instead, if that makes sense for the property. And, as another answerer suggested, it may be better to keep the property read-only unless you want other classes to alter it. In general, I'd say attempts at optimization like this are mostly misguided and make your code less maintainable.
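As an aside, a hedged sketch: on .NET 4 and later, Lazy<T> gives you the same on-demand creation without the hand-rolled null check (and is thread-safe by default):

private readonly Lazy<List<byte[]>> _stream =
    new Lazy<List<byte[]>>(() => new List<byte[]>());

public List<byte[]> Stream
{
    get { return _stream.Value; }   // the list is created on first access
}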
Are you sure a Stream wouldn't be just a byte[] or a List of bytes? Or even better, a MemoryStream? :) I think you are somewhat confused, so a bigger example and some scenario details would help a lot.
What are objects really
I'd suggest thinking of objects as structs, and of object references as pointers to those structs.
If you instantiate an object, you are reserving memory for a "struct" with all its fields (and a reference to the class it is an instance of), plus all memory reserved by the constructor (other objects, arrays, etc.).
With a List you are reserving memory for its state keeping (I don't know exactly how it's implemented in C#) and the initial internal array, perhaps of ten references. So if you count it up, it's something like this (assuming a 32-bit runtime; I'm not a .NET specialist):
pointer to class: 4 bytes
pointer to array: 4 bytes
array of initialCapacity references: 40 bytes
So my estimate is about 48 bytes. But it depends on the implementation.
As SoloBold says: most of the time it's not worth it.
