I have a class, CircularBuffer, which has a method CreateBuffer. The class does a number of things but occasionally I need to change the size of an array that is used in the class. I do not need the data any longer. Here is the class:
static class CircularBuffer
{
    static float[,] buffer;
    static int columns, rows;

    public static void CreateBuffer(int columns, int rows)
    {
        CircularBuffer.columns = columns;
        CircularBuffer.rows = rows;
        buffer = new float[rows, columns];
    }

    //other methods that use the buffer
}
Now the size of the buffer is up to 100 x 2048 floats. Is this going to cause any memory issues, or will it be automatically replaced with no issues?
Thanks
You are, technically speaking, not recreating anything. You are simply creating a new array and overwriting the variable's value (the address, so to speak, of the array it references).
It's important therefore to distinguish what you are really replacing: you are not replacing the array, only the reference to it.
Problems? None. With your code, the old array will no longer be reachable and will therefore be eligible for collection by the GC. Whether that collection ever happens is up to the GC, but it's not something you should worry about.
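To make that concrete, here is a small sketch (the names are made up for illustration) that uses a WeakReference to show the old array becoming collectible once the field is overwritten:

using System;

class Program
{
    static float[,] buffer;

    static void Main()
    {
        buffer = new float[100, 2048];
        WeakReference weak = new WeakReference(buffer); // tracks the old array without keeping it alive

        buffer = new float[100, 2048];                  // "recreate": only the reference is replaced

        GC.Collect();
        GC.WaitForPendingFinalizers();

        // Typically prints False: the old, unreachable array has been collected.
        Console.WriteLine(weak.IsAlive);
    }
}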
Related
I have millions of instances of the class Data, and I am looking for optimization advice.
Is there any way to optimize it - for example, save memory by serializing it somehow, even though that would hurt retrieval speed, which is also important? Maybe by turning the class into a struct - although the class seems pretty large for a struct.
Queries can touch hundreds of millions of these objects at a time. They sit in a list and are queried by DateTime. The results are aggregated in different ways, and many calculations can be applied.
[Serializable]
[DataContract]
public abstract class BaseData {}

[Serializable]
public class Data : BaseData {
    public byte member1;
    public int member2;
    public long member3;
    public double member4;
    public DateTime member5;
}
Unfortunately, while you did specify that you want to "optimize", you did not specify what the exact problem is you mean to tackle. So I cannot really give you more than general advice.
Serialization will not help you. Your Data objects are already stored as bytes in memory. Nor will turning it into a struct help. The difference between a struct and a class lies in their addressing and referencing behaviour, not in their memory footprint.
The only way I can think of to reduce the memory footprint of a collection with "hundreds-millions" of these objects would be to serialize and compress the entire thing. But that is not feasible. You would always have to decompress the entire thing before accessing it, which would shoot your performance to hell AND actually almost double the memory consumption on access (compressed and decompressed data both lying in memory at that point).
The best general advice I can give you is not to try to optimize this scenario yourself, but to use specialized software for it. By specialized software, I mean an (in-memory) database. One example of a database you can use in-memory, and for which you already have everything you need on-board in the .NET framework, is SQLite.
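As a rough illustration of the in-memory database idea, here is a minimal sketch; it assumes the Microsoft.Data.Sqlite package (any ADO.NET SQLite provider works much the same way), and the table and column names are invented for the example:

using Microsoft.Data.Sqlite;

class InMemoryDbSketch
{
    static void Main()
    {
        // "Data Source=:memory:" keeps the entire database in RAM.
        using (var connection = new SqliteConnection("Data Source=:memory:"))
        {
            connection.Open();

            var create = connection.CreateCommand();
            create.CommandText = "CREATE TABLE Data (Member1 INTEGER, Member2 INTEGER, Member3 INTEGER, Member4 REAL, Member5 TEXT)";
            create.ExecuteNonQuery();

            // Let the database do the DateTime filtering and aggregation.
            var query = connection.CreateCommand();
            query.CommandText = "SELECT AVG(Member4) FROM Data WHERE Member5 >= $from AND Member5 < $to";
            query.Parameters.AddWithValue("$from", "2020-01-01");
            query.Parameters.AddWithValue("$to", "2020-02-01");
            object average = query.ExecuteScalar();
        }
    }
}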
I assume, as you seem to imply, that you have a class with many members, have a large number of instances, and need to keep them all in memory at the same time to perform calculations.
I ran a few tests to see if you could actually get different sizes for the classes you described.
I used this simple method for finding the in-memory size of an object:
private static void MeasureMemory()
{
    int size = 10000000;
    object[] array = new object[size];

    long before = GC.GetTotalMemory(true);
    for (int i = 0; i < size; i++)
    {
        array[i] = new Data();
    }
    long after = GC.GetTotalMemory(true);

    double diff = after - before;
    Console.WriteLine("Total bytes: " + diff);
    Console.WriteLine("Bytes per object: " + diff / size);
}
It may be primitive, but I find that it works fine for situations like this.
As expected, almost nothing you can do to that class (turning it into a struct, removing the inheritance, or removing the attributes) influences the memory used by a single instance. As far as memory usage goes, they are all equivalent. However, do try to fiddle with your actual classes and run them through the given code.
The only way you could actually reduce the memory footprint of an instance is to use smaller types for your data (int instead of long, for example). If you have a large number of booleans, you could group them into a byte or an integer and write simple property wrappers to work with them (a boolean on its own takes up a whole byte of memory). These may be insignificant savings in most situations, but for a hundred million objects, removing a single boolean can make a difference of a hundred MB of memory. Also, be aware that the platform target you choose for your application can affect the memory footprint of an object (x64 builds take up more memory than x86 ones).
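For the boolean-packing idea, a minimal sketch (the flag names are invented for illustration) might look like this:

public class PackedData
{
    // Eight boolean flags stored in a single byte instead of eight separate fields.
    private byte _flags;

    private bool GetFlag(int bit)
    {
        return (_flags & (1 << bit)) != 0;
    }

    private void SetFlag(int bit, bool value)
    {
        if (value)
            _flags |= (byte)(1 << bit);
        else
            _flags &= (byte)~(1 << bit);
    }

    // Property wrappers hide the packing from the rest of the code.
    public bool IsActive   { get { return GetFlag(0); } set { SetFlag(0, value); } }
    public bool IsDeleted  { get { return GetFlag(1); } set { SetFlag(1, value); } }
    public bool IsReadOnly { get { return GetFlag(2); } set { SetFlag(2, value); } }
}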
Serializing the data is very unlikely to help. An in-memory database has its upsides, especially if you are doing complex queries, but it is unlikely to actually reduce the memory usage of your data. Unfortunately, there just aren't many ways to reduce the footprint of basic data types. At some point you will simply have to move to a file-based database.
However, here are some ideas. Please be aware that they are hacky, highly situational, reduce computation performance, and will make the code harder to maintain.
It is often the case in large data structures that objects in different states have only some of their properties filled, while the others are set to null or a default value. If you can identify such groups of properties, you could move them into a sub-class and keep a single reference that can stay null, instead of having several properties take up space. You then instantiate the sub-class only once it is needed, and property wrappers can hide this from the rest of the code. Keep in mind that the worst-case scenario here has you keeping all the properties in memory anyway, plus several object headers and pointers.
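A sketch of that idea (the member and class names are made up for illustration):

public class SlimData
{
    public int Member1;
    public long Member2;

    // Rarely populated members live in a separate object that stays null most of the time.
    private RareData _rare;

    public double RareValue
    {
        get { return _rare == null ? 0.0 : _rare.RareValue; }
        set { (_rare ?? (_rare = new RareData())).RareValue = value; }
    }

    public string RareText
    {
        get { return _rare == null ? null : _rare.RareText; }
        set { (_rare ?? (_rare = new RareData())).RareText = value; }
    }

    private class RareData
    {
        public double RareValue;
        public string RareText;
    }
}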
You could perhaps turn members that are likely to hold a default value into a binary representation and pack them into a byte array. You would know which byte positions represent which data member, and could write properties that read them. Position the members that are most likely to have a default value at the end of the byte array (a few longs that are often 0, for example). Then, when creating the object, trim the byte array to exclude the members that have the default value, starting from the end of the layout, until you hit the first member with a non-default value. When outside code requests a property, you check whether the byte array is large enough to hold that member and, if not, return the default value. This way you could potentially save some space. In the best case you will have a null pointer to a byte array instead of several data members; in the worst case you will have full byte arrays taking as much space as the original data, plus some overhead for the array. The usefulness depends on the actual data, and the approach assumes that you do relatively few writes, as rebuilding the array is expensive.
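A very rough sketch of that packing scheme, for a hypothetical object with two longs where the second is usually 0 (the layout and names are invented for the example):

using System;

public class PackedRecord
{
    // Layout: bytes 0-7 = Value1, bytes 8-15 = Value2 (most often 0).
    // The array is trimmed from the end, so it may be 8 bytes long, or even null.
    private byte[] _data;

    public PackedRecord(long value1, long value2)
    {
        if (value2 != 0)
        {
            _data = new byte[16];
            BitConverter.GetBytes(value1).CopyTo(_data, 0);
            BitConverter.GetBytes(value2).CopyTo(_data, 8);
        }
        else if (value1 != 0)
        {
            _data = new byte[8];
            BitConverter.GetBytes(value1).CopyTo(_data, 0);
        }
        // If both values are 0, _data stays null and no array is allocated at all.
    }

    public long Value1
    {
        get { return _data != null && _data.Length >= 8 ? BitConverter.ToInt64(_data, 0) : 0; }
    }

    public long Value2
    {
        get { return _data != null && _data.Length >= 16 ? BitConverter.ToInt64(_data, 8) : 0; }
    }
}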
Hope any of this helps :)
What data structure is used to build an ArrayList, given that we are able to add/delete values on it dynamically?
I was assuming that it uses a linked list, but after doing some googling I found that it uses a vector... though with no more details than that.
On modern processors, the memory cache is king. Using the cache efficiently makes an enormous difference; the processor can easily be stalled for hundreds of cycles when the program accesses an address whose content isn't cached, waiting for the very slow memory bus to supply the data.
Accessing memory is most efficient when you access it sequentially. The odds that a byte is already available in the cache are then the greatest, since it is very likely to sit in the same cache line as the previous access. That makes arrays by far the most efficient collection type, assuming you index the array elements sequentially.
Accordingly, all .NET collection classes except LinkedList use arrays to store data. That includes hashed collections (Hashtable, Dictionary, HashSet), which use an array of arrays, and it includes ArrayList. LinkedList should be avoided because of its very poor cache locality, except when cheap inserts and deletes at random known locations are the primary concern.
A problem with arrays is that their size is fixed, which makes it difficult to implement auto-sizing collections like ArrayList. This is solved by intentionally wasting address space: whenever the array fills up to capacity, it is reallocated at double the previous size and the elements are copied over. You can observe this from the Capacity property. While this sounds expensive, the algorithm is amortized O(1), and the virtual memory sub-system in the operating system ensures that you don't actually pay for memory that you don't use.
You can avoid the not-so-cheap copying by guessing the Capacity up front. More details about that in this answer.
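A quick way to observe the doubling is a small sketch like this one, using List<int> (ArrayList behaves the same way through its Capacity property):

using System;
using System.Collections.Generic;

class CapacityDemo
{
    static void Main()
    {
        var list = new List<int>();
        int lastCapacity = -1;

        for (int i = 0; i < 1000; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                // Capacity typically grows 4, 8, 16, 32, ... - doubling each time the array fills up.
                Console.WriteLine("Count = " + list.Count + ", Capacity = " + list.Capacity);
                lastCapacity = list.Capacity;
            }
        }
    }
}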
ArrayList internally uses an array to store the data and resizes that array whenever needed.
The Java implementation of ArrayList likewise creates an array with an initial size and resizes it internally.
You can see the implementation here:
http://www.docjar.com/html/api/java/util/ArrayList.java.html
This is for Java, but the concepts are the same for .NET.
From the MSDN page:
Implements the IList interface using an array whose size is dynamically increased as required.
Some of the benefits of using the class instead of an array directly are:
- it can be used anywhere an IList is expected
- it handles the resizing and copying for you when adding/removing items from the middle of the array
- it keeps track of the 'last' item in the array
- it provides methods for binary searching for items in the array
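For illustration, a small sketch of those conveniences in use (note that BinarySearch requires the list to be sorted first):

using System;
using System.Collections;

class ArrayListDemo
{
    static void Main()
    {
        ArrayList list = new ArrayList();    // starts small, grows as needed

        list.Add(30);
        list.Add(10);
        list.Add(20);

        list.Sort();                         // BinarySearch only works on sorted data
        int index = list.BinarySearch(20);   // index of 20 in the sorted list

        list.RemoveAt(index);                // elements after 'index' are shifted left
        Console.WriteLine("Count = " + list.Count + ", Capacity = " + list.Capacity);
    }
}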
See here: ArrayList source
As already mentioned, it is an array underneath.
private object[] _items;
Here is the Add() method:
public virtual int Add(object value)
{
    if (this._size == this._items.Length)
    {
        this.EnsureCapacity(this._size + 1);
    }
    this._items[this._size] = value;
    this._version++;
    return this._size++;
}
As you can see, EnsureCapacity is called...which ends up calling set_Capacity:
public virtual void set_Capacity(int value)
{
    if (value < this._size)
    {
        throw new ArgumentOutOfRangeException("value", Environment.GetResourceString("ArgumentOutOfRange_SmallCapacity"));
    }
    if (value != this._items.Length)
    {
        if (value <= 0)
        {
            this._items = new object[4];
            return;
        }
        object[] array = new object[value];
        if (this._size > 0)
        {
            Array.Copy(this._items, 0, array, 0, this._size);
        }
        this._items = array;
    }
}
Where the entire array is copied to a larger array if the capacity needs to be increased.
The ArrayList stores values internally as an array of objects and exposes some public helper methods to make working with the array easier (exposed via the IList interface).
When items are inserted, all the elements to the right of the insertion point are moved to the right, making inserts rather inefficient. Appends, on the other hand, are quick because there is no need to shift elements (unless the internal array has reached capacity, in which case it is replaced with a larger array).
Because the values are stored internally as an array, it provides the advantages of arrays (such as efficient searches if the values are sorted).
Is this going to break? It compiles fine, but based on my reading I'm unsure whether it's guaranteed that _ptRef will always point to the struct referenced in the constructor.
I guess by 'break' I mean...will the GC move the struct pointed to by the pointer (_ptRef)?
public unsafe class CPointType0
{
    private PointType0* _ptRef = null;

    public CPointType0(ref PointType0 spt)
    {
        fixed (PointType0* pt = &spt)
        {
            _ptRef = pt;
        }
    }

    // ...a bunch of property accessors to fields of _ptRef (accessed as return _ptRef->Thing)
}
The scenario is
-PointType0 is a struct.
-millions of PointType0's in memory in a data structure. These used to be reference types, but that carried far too much memory overhead.
-A List is returned only when a search operation finds a relevant PointType0, and this List is passed around and operated on a lot.
It's not safe.
After the code leaves the fixed block, the garbage collector is free to move things around again. What are you trying to accomplish here? Do you maybe want to use the index of an item in the list, instead of a pointer?
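A sketch of that index-based alternative (the PointType0 fields and member names are invented for illustration): instead of storing a pointer into the array, store the owning array and an index into it:

public struct PointType0
{
    public double X;
    public double Y;
}

// Refers to a struct by (array, index) instead of by raw pointer,
// so the GC is free to move the underlying array around.
public class CPointType0
{
    private readonly PointType0[] _points;
    private readonly int _index;

    public CPointType0(PointType0[] points, int index)
    {
        _points = points;
        _index = index;
    }

    public double X { get { return _points[_index].X; } }
    public double Y { get { return _points[_index].Y; } }
}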
In C#, does the following save any memory?
private List<byte[]> _stream;

public object Stream
{
    get
    {
        if (_stream == null)
        {
            _stream = new List<byte[]>();
        }
        return _stream;
    }
}
Edit: sorry, I guess I should have been more specific.
Specifically, I mean using "object" instead of List<byte[]> as the property's type... I thought that would kind of clue itself in because it's a weird thing to do.
It saves a very small amount of memory. The amount of memory an empty List<byte[]> takes up is tiny - on the order of a few dozen bytes.
The reason why is that your reference variable _stream only needs to allocate enough memory to hold a reference to an object. Once an object is allocated, it will take up a certain amount of memory which may grow or shrink over time, such as when new byte[]s are added to the List. However the memory taken up by the reference to that object will remain the same size.
This is simpler and less prone to corner cases that cause you headaches:
private List<byte[]> _stream = new List<byte[]>();

public object Stream
{
    get
    {
        return _stream;
    }
}
Although, in most cases it's not really optimal to return references to private members when they are collections/arrays, etc. Better to return _stream.AsReadOnly().
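For example, a read-only view could be exposed like this (a sketch; the property type is narrowed from object here for clarity):

using System.Collections.Generic;
using System.Collections.ObjectModel;

class StreamHolder
{
    private readonly List<byte[]> _stream = new List<byte[]>();

    // Callers can read the list but cannot add or remove items through this property.
    public ReadOnlyCollection<byte[]> Stream
    {
        get { return _stream.AsReadOnly(); }
    }
}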
Save memory compared to what?
byte[][] _stream;
maybe? Then no, a List<T> will take up more memory since it is an array at its heart (which isn't necessarily exactly the size of its contents, but usually larger) and some statekeeping needs to be done too.
That is lazy loading: you create the stream only when someone requests it. The stream (in your case a list) is not created unless it is required.
One might say that it saves some memory because nothing is used unless it is required; before the stream is first used, no memory is allocated for it.
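For what it's worth, the same lazy-initialization pattern can be expressed with the framework's Lazy<T> type (a sketch; the class name is invented):

using System;
using System.Collections.Generic;

class LazyHolder
{
    // The List<byte[]> is only allocated the first time Stream is read.
    private readonly Lazy<List<byte[]>> _stream = new Lazy<List<byte[]>>(() => new List<byte[]>());

    public List<byte[]> Stream
    {
        get { return _stream.Value; }
    }
}

Note that the Lazy<T> wrapper itself costs a small allocation, so this is about convenience and thread safety rather than saving memory.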
If your edit indicates that you are asking whether using the object keyword instead of List<byte[]> as the type of the property saves memory: no, it doesn't. And your if block only saves a negligible amount of memory (and CPU at instantiation) until the first time the property is called, while making that first call slightly slower. Consider returning null instead if that makes sense for the property. And, like another answerer suggested, it may be better to keep the property read-only unless you'd like other classes to be altering it. In general, I'd say attempts at optimization like this are mostly misguided and make your code less maintainable.
Are you sure a Stream wouldn't be just a byte[] or a List of byte? Or even better, a MemoryStream? :) I think you are somewhat confused, so a bigger example and some scenario details will help a lot.
What are objects, really?
I'd suggest thinking of objects as structs, and of object references as pointers to that structure.
If you instantiate an object, you are reserving memory for a "struct" with all its fields (plus a reference to the class it is implementing), plus all memory reserved by the constructor (other objects, arrays, etc.).
For a List you are reserving memory for state keeping (I don't know exactly how it's implemented in C#) and the initial internal array, maybe of ten references. So... if you count, it's something like this (assuming a 32-bit runtime; I'm not a .NET specialist):
- pointer to class: 4 bytes
- pointer to array: 4 bytes
- array of initialCapacity references: 40 bytes
So in my estimation it's about 48 bytes. But it depends on the implementation.
As SoloBold says: most of the time it's not worth it.
In C#, it's possible to declare a struct (or class) that has a pointer type member, like this:
unsafe struct Node
{
    public Node* NextNode;
}
Is it ever safe (err.. ignore for a moment that ironic little unsafe flag..) to use this construction? I mean for long-term storage on the heap. From what I understand, the GC is free to move things around, and while it updates the references to something that's been moved, does it update pointers too? I'm guessing no, which would make this construction very unsafe, right?
I'm sure there are way superior alternatives to doing this, but call it morbid curiosity.
EDIT: There appears to be some confusion. I know that this isn't a great construction, I purely want to know if this is ever a safe construction, ie: is the pointer guaranteed to keep pointing to whatever you originally pointed it to?
The original C code was used to traverse a tree (depth first) without recursion, where the tree is stored in an array. The array is traversed by incrementing a pointer until a certain condition is met, at which point the pointer is set to NextNode and traversal continues from there. Of course, the same can be accomplished in C# with:
struct Node
{
    public int NextNode;
    ... // other fields
}
Where the int is the index in the array of the next node. But for performance reasons, I'd end up fiddling with pointers and fixed arrays to avoid bounds checks anyway, and the original C-code seemed more natural.
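For reference, here is a sketch of the index-based traversal in safe code; the payload field, the jump condition, and the class name are invented for illustration, and it assumes the index chain eventually runs off the end of the array (or to -1):

struct Node
{
    public int NextNode;   // index of the node to jump to
    public int Value;      // hypothetical payload
}

static class TreeWalker
{
    // Walks the array sequentially; jumps to NextNode when the condition is met.
    public static long Sum(Node[] nodes)
    {
        long total = 0;
        int i = 0;
        while (i >= 0 && i < nodes.Length)
        {
            total += nodes[i].Value;

            if (nodes[i].Value < 0)        // "certain condition" - invented for the sketch
                i = nodes[i].NextNode;     // jump
            else
                i++;                       // keep walking sequentially
        }
        return total;
    }
}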
Is it ever safe to use this construction? I mean for long term storage on the heap.
Yes. Doing so is usually foolish, painful and unnecessary, but it is possible.
From what I understand, the GC is free to move things around, and while it updates the references to something that's been moved, does it update pointers too?
No. That's why we make you mark it as unsafe.
I'm guessing no, which would make this construction very unsafe, right?
Correct.
I'm sure there are way superior alternatives to doing this, but call it morbid curiosity.
There certainly are.
is the pointer guaranteed to keep pointing to whatever you originally pointed it to?
Not unless you ensure that happens. There are two ways to do that.
Way one: Tell the garbage collector to not move the memory. There are two ways to do that:
Fix a variable in place with the "fixed" statement.
Use interop services to create a gc handle to the structures you wish to keep alive and in one place.
Doing either of these things will with high likelihood wreck the performance of the garbage collector.
Way two: Don't take references to memory that the garbage collector can possibly move. There are two ways to do that:
Only take addresses of local variables, value parameters, or stack-allocated blocks. Of course, in doing so you are then required to ensure that the pointers do not survive longer than the relevant stack frame, otherwise, you're referencing garbage.
Allocate a block out of the unmanaged heap and then use pointers inside that block. In essence, implement your own memory manager. You are required to correctly implement your new custom memory manager. Be careful.
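As an illustration of the pinning option from way one, here is a sketch using a pinned GCHandle (the Node layout here is just an example); keep in mind, as noted above, that long-lived pinned objects can hurt garbage collector performance:

using System;
using System.Runtime.InteropServices;

unsafe struct Node
{
    public int Value;
    public Node* NextNode;
}

class PinnedExample
{
    static unsafe void Main()
    {
        Node[] nodes = new Node[3];

        // Pin the array so the GC will not move it while the handle is allocated.
        GCHandle handle = GCHandle.Alloc(nodes, GCHandleType.Pinned);
        try
        {
            Node* first = (Node*)handle.AddrOfPinnedObject();

            // Raw pointers into the array remain valid until the handle is freed.
            first[0].NextNode = &first[1];
            first[1].NextNode = &first[2];
            first[2].NextNode = null;
        }
        finally
        {
            handle.Free();   // after this, the pointers must not be used anymore
        }
    }
}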
Some obvious integrity checks have been excluded. The main problem with this approach is that you have to allocate more than you will ever need, because you cannot reallocate the buffer, as the keyword fixed implies.
public unsafe class NodeList
{
    fixed Node _Nodes[1024];
    Node* _Current;

    public NodeList(params String[] data)
    {
        for (int i = 0; i < data.Length; i++)
        {
            _Nodes[i].Data = data[i];
            _Nodes[i].Next = (i < data.Length - 1 ? &_Nodes[i + 1] : null);
        }
        _Current = &_Nodes[0];
    }

    public Node* Current()
    {
        return _Current++;
    }
}
public unsafe struct Node
{
    public String Data;
    public Node* Next;
}
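Note that, as written, the above will not compile: C# fixed-size buffers only allow primitive element types (not a user-defined Node struct), and a struct containing a String cannot be used with a pointer type. Here is a sketch of the same idea that does compile, built on a block from the unmanaged heap (the names are invented, and the node payload is reduced to an int so the struct stays unmanaged):

using System;
using System.Runtime.InteropServices;

public unsafe struct Node
{
    public int Data;        // must be an unmanaged type; a String field would not be allowed here
    public Node* Next;
}

public unsafe class NodeList : IDisposable
{
    private readonly Node* _nodes;   // start of the unmanaged block
    private Node* _current;

    public NodeList(params int[] data)
    {
        // Allocate the whole node array outside the GC heap, so it never moves.
        _nodes = (Node*)Marshal.AllocHGlobal(sizeof(Node) * data.Length);

        for (int i = 0; i < data.Length; i++)
        {
            _nodes[i].Data = data[i];
            _nodes[i].Next = (i < data.Length - 1 ? &_nodes[i + 1] : null);
        }
        _current = &_nodes[0];
    }

    public Node* Current()
    {
        return _current++;
    }

    public void Dispose()
    {
        // The unmanaged block must be freed by hand.
        Marshal.FreeHGlobal((IntPtr)_nodes);
    }
}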
Why not:

struct Node
{
    public Node NextNode;
}

or at least:

struct Node
{
    public IntPtr NextNode;
}
You could use the fixed statement to prevent the GC from moving the memory you point to.
Yes, the garbage collector can move the objects around and, no, it will not update your pointers. You need to fix the objects you point to. More information can be found on this memory management explanation.
You can fix objects like this:

unsafe
{
    fixed (byte* pPtr = myArray)   // myArray stands in for the object being pinned, e.g. a byte[]
    {
        // While inside the fixed block, the object is pinned and the GC will not move it.
    }
}
The advantages of pointers are usually performance and interaction with other unsafe code. There will be no out-of-bounds checks etc, speeding up your code. But just as if you were programming in e.g. C you have to be very careful of what you are doing.
A dangerous idea, but it may work:
When your array of structs exceeds a certain size (85000 bytes) it will be allocated on the Large Object Heap where blocks are scanned and collected but not moved...
The linked article points out the danger that a newer CLR version might move stuff on the LOH...