I have an array inside a class:
class MatchNode
{
public short X;
public short Y;
public NodeVal[] ControlPoints;
private MatchNode()
{
ControlPoints = new NodeVal[4];
}
}
The NodeVal is:
struct NodeVal
{
public readonly short X;
public readonly short Y;
public NodeVal(short x, short y)
{
X = x;
Y = y;
}
}
Now, what if we wanted to take performance to the next level and avoid having a separate object for the array? It doesn't actually have to be an array; the only restriction is that client code should be able to access a NodeVal by index, like:
matchNode.ControlPoints[i]
OR
matchNode[i]
and of course the solution should be faster or as fast as array access since it's supposed to be an optimization.
EDIT: As Ryan suggested it seems I should explain more about the motivation:
The MatchNode class is used heavily in the project: millions of instances are created and each is accessed hundreds of times, so making them as compact as possible can lead to fewer cache misses and better overall performance.
Let's consider a 64-bit machine. In the current implementation the class spends 8 bytes on the ControlPoints reference, and the array object itself carries at least 16 bytes of object overhead (method table pointer and sync block) plus 16 bytes for the actual data. So we have at least 24 bytes of overhead beside 16 bytes of actual data.
These objects are used in bottlenecks of the project so it matters if we could optimize them more.
Of course we could just have one big array of NodeVal and store an index in MatchNode that locates the actual data, but that would change every piece of client code that uses MatchNode, not to mention being a dirty, non-object-oriented solution.
It is okay to have a messy MatchNode that uses every kind of nasty trick like unsafe or static cache code. It is not okay to leak these optimizations out to the client code.
You're looking for indexers:
class MatchNode
{
public short X;
public short Y;
private NodeVal[] myField;
public NodeVal this[int i] { get { return myField[i]; } set { myField[i] = value; } }
public MatchNode(int size) { this.myField = new NodeVal[size]; }
}
Now you can simply use this:
var m = new MatchNode(10);
m[0] = new NodeVal();
However, I doubt this will affect performance (at least in terms of speed) in any way; you should identify the actual problems using a profiling tool (dotTrace, for instance). Furthermore, this approach still uses a private backing array, so it produces the same memory footprint.
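If the goal really is to avoid the separate array object, one direction (a sketch of my own, not something the question or the answer above proposes) is to store the four control points as plain NodeVal fields and expose them through an indexer, so client code can still write matchNode[i]:
class MatchNode
{
    public short X;
    public short Y;
    // Four inline slots instead of a NodeVal[4]; NodeVal is the struct from the question.
    private NodeVal _p0, _p1, _p2, _p3;
    public NodeVal this[int i]
    {
        get
        {
            switch (i)
            {
                case 0: return _p0;
                case 1: return _p1;
                case 2: return _p2;
                case 3: return _p3;
                default: throw new ArgumentOutOfRangeException(nameof(i));
            }
        }
        set
        {
            switch (i)
            {
                case 0: _p0 = value; break;
                case 1: _p1 = value; break;
                case 2: _p2 = value; break;
                case 3: _p3 = value; break;
                default: throw new ArgumentOutOfRangeException(nameof(i));
            }
        }
    }
}
On a 64-bit runtime this keeps the 16 bytes of control-point data inside the MatchNode instance itself, removing both the 8-byte array reference and the array object's own header, at the cost of a hard-coded capacity of four.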
Considering the next struct...
struct Cell
{
int Value;
}
and the next matrix definitions
var MatrixOfInts = new int[1000,1000];
var MatrixOfCells = new Cell[1000,1000];
which one of the matrices will use less memory space? or are they equivalent (byte per byte)?
Both are the same size, because structs are treated like any other value type and are allocated in place inside the array on the heap.
long startMemorySize2 = GC.GetTotalMemory(true);
var MatrixOfCells = new Cell[1000, 1000];
long matrixOfCellSize = GC.GetTotalMemory(true);
long startMemorySize = GC.GetTotalMemory(true);
var MatrixOfInts = new int[1000, 1000];
long matrixOfIntSize = GC.GetTotalMemory(true);
Console.WriteLine("Int Matrix Size:{0}. Cell Matrix Size:{1}",
matrixOfIntSize - startMemorySize, matrixOfCellSize - startMemorySize2);
Here's some fun reading from Jeffrey Richter on how arrays are allocated: http://msdn.microsoft.com/en-us/magazine/cc301755.aspx
By using the sizeof operator in C# and executing the following code (under Mono 3.10.0) I get the following results:
struct Cell
{
int Value;
}
public static void Main(string[] args)
{
unsafe
{
// result is: 4
var intSize = sizeof(int);
// result is: 4
var structSize = sizeof(Cell);
}
}
So it looks like an integer and a struct storing an integer consume the same amount of memory; I would therefore assume that arrays of them also require an equal amount of memory.
In an array with value-type elements, all of the elements are required to be of the exact same type. The object holding the array needs to store information about the type of elements contained therein, but that information is only stored once per array, rather than once per element.
Note that because arrays receive special handling in the .NET Framework (compared to other collection types) arrays of a structure type will allow elements of the structures contained therein to be acted upon "in-place". As a consequence, if one can limit oneself to storing a structure within an array (rather than some other collection type) and can minimize unnecessary copying of struct instances, it is possible to operate efficiently with structures of almost any size. If one needs to hold a collection of things, each of which will have associated with it four Int64 values and four Int32 values (a total of 48 bytes), using an array of eight-element exposed-field structures may be more efficient and semantically cleaner than representing each thing using four elements from an Int64[] and four elements from an Int32[], or using an array of references to unshared mutable class objects.
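To make the in-place point concrete, here is a small sketch of my own (Big is a hypothetical 48-byte struct matching the example above; needs using System.Collections.Generic):
var arr = new Big[10];
arr[3].E = 42;              // legal: mutates the element stored inside the array itself
var list = new List<Big> { new Big() };
// list[0].E = 42;          // error CS1612: the List indexer returns a copy, not the stored element
struct Big { public long A, B, C, D; public int E, F, G, H; }   // 48 bytes of fields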
I'm running up against the 2gb object limit in c# (this applies even in 64 bit for some annoying reason) with a large collection of structs (est. size of 4.2 gig in total).
Now obviously using List is going to give me a list of size 4.2gb give or take, but would using a list made up of smaller lists, which in turn contain a portion of the structs, allow me to jump this limit?
My reasoning here is that it's only a hard-coded limit in the CLR that stops me instantiating a 9gig object on my 64bit platform, and it's entirely unrelated to system resources. Also Lists and Arrays are reference types, and so a List containing lists would only actually contain the references to each list. No one object therefore exceeds the size limit.
Is there any reason why this wouldn't work? I'd try this myself right now but I don't have a memory profiler on hand to verify.
Now obviously using List is going to give me a list of size 4.2gb give or take, but would using a list made up of smaller lists, which in turn contain a portion of the structs, allow me to jump this limit?
Yes - though, if you're trying to work around this limit, I'd consider using arrays yourself instead of letting the List<T> class manage the array.
The 2gb single object limit in the CLR is exactly that, a single object instance. When you make an array of a struct (which List<T> uses internally), the entire array is "one object instance" in the CLR. However, by using a List<List<T>> or a jagged array, each internal list/array is a separate object, which allows you to effectively have any size object you wish.
The CLR team actually blogged about this, and provided a sample BigArray<T> implementation that acts like a single List<T>, but does the "block" management internally for you. This is another option for getting >2gb lists.
Note that .NET 4.5 will have the option to provide larger than 2gb objects on x64, but it will be something you have to explicitly opt in to having.
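For reference (my addition, not part of the answer above), the .NET 4.5 opt-in is the gcAllowVeryLargeObjects switch in the application config file:
<configuration>
  <runtime>
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>
Even with it enabled, each dimension of an array is still limited to roughly 2^31 elements, so chunked structures like the ones discussed here can still be useful.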
The List holds references, which are 4 or 8 bytes each depending on whether you're running in 32-bit or 64-bit mode, so referencing a 2GB object does not increase the actual List size by 2GB; it only increases it by the number of bytes needed to reference that object.
This will allow you to reference millions of objects and each object could be 2GB. If you have 4 objects in the List and each is 2 GB, then you would have 8 GB worth of objects referenced by the List, but the List object would have only used up an extra 4*8=32 bytes.
The number of references you can hold on a 32-bit machine before the List hits the 2GB limit is 536.87 million, on a 64-bit machine it's 268.43 million.
536 million references * 2 GB = A LOT OF DATA!
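(Those figures are just the 2GB backing-array budget divided by the reference size: 2,147,483,648 B ÷ 4 B ≈ 536.87 million references in 32-bit mode, and ÷ 8 B ≈ 268.43 million in 64-bit mode.)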
P.S. As Reed pointed out, the above is true for reference types but not for value types. So if you're holding value types, your workaround is valid. Please see the comment below for more info.
There's an interesting post around this subject here:
http://blogs.msdn.com/b/joshwil/archive/2005/08/10/450202.aspx
Which talks about writing your own 'BigArray' object.
In versions of .NET prior to 4.5, the maximum object size is 2GB. From 4.5 onwards you can allocate larger objects if gcAllowVeryLargeObjects is enabled. Note that the limit for string is not affected, but "arrays" should cover "lists" too, since lists are backed by arrays.
class HugeList<T>
{
private const int PAGE_SIZE = 102400;
private const int ALLOC_STEP = 1024;
private T[][] _rowIndexes;
private int _currentPage = -1;
private int _nextItemIndex = PAGE_SIZE;
private int _pageCount = 0;
private int _itemCount = 0;
#region Internals
private void AddPage()
{
if (++_currentPage == _pageCount)
ExtendPages();
_rowIndexes[_currentPage] = new T[PAGE_SIZE];
_nextItemIndex = 0;
}
private void ExtendPages()
{
if (_rowIndexes == null)
{
_rowIndexes = new T[ALLOC_STEP][];
}
else
{
T[][] rowIndexes = new T[_rowIndexes.Length + ALLOC_STEP][];
Array.Copy(_rowIndexes, rowIndexes, _rowIndexes.Length);
_rowIndexes = rowIndexes;
}
_pageCount = _rowIndexes.Length;
}
#endregion Internals
#region Public
public int Count
{
get { return _itemCount; }
}
public void Add(T item)
{
if (_nextItemIndex == PAGE_SIZE)
AddPage();
_itemCount++;
_rowIndexes[_currentPage][_nextItemIndex++] = item;
}
public T this[int index]
{
get { return _rowIndexes[index / PAGE_SIZE][index % PAGE_SIZE]; }
set { _rowIndexes[index / PAGE_SIZE][index % PAGE_SIZE] = value; }
}
#endregion Public
}
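A quick usage sketch (my addition; needs a 64-bit process with enough RAM): client code treats it like one flat list while every internal page stays far below the 2GB single-object cap.
var huge = new HugeList<long>();
for (int i = 0; i < 300_000_000; i++)   // ~2.4 GB of payload spread over ~2,930 pages of 800 KB each
    huge.Add(i);
Console.WriteLine(huge[299_999_999]);   // the indexer maps the flat index to page + offset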
How would I do this? I am trying to count the positions where both arrays have the value TRUE/1 at the same index. As you can see, my code has multiple BitArrays and loops through each one, comparing it with a comparisonArray in an inner loop. It doesn't seem very efficient, and I need it to be.
foreach (var bitArrayTuple in bitArrayList) {
for (int i = 0; i < arrayLength; i++)
if (bitArrayTuple.Item2[i] && comparisonArray[i])
bitArrayTuple.Item1++;
}
where Item1 is the count and Item2 is a bitarray.
bool equals = ba1.Xor(ba2).OfType<bool>().All(e => !e);
There's not much of a way to do this, because BitArray doesn't let its internal array leak, and because .NET doesn't have the C++ equivalent of const to prevent external modification. You might want to just create your own class from scratch, or, if you feel like hacking, use reflection to get the private field inside the BitArray.
Would this work?
http://msdn.microsoft.com/en-us/library/system.collections.bitarray.and%28v=VS.90%29.aspx
It's like the single & operator in C.
Depending on the number of elements, BitVector32 may be usable. That would simply be an Int32 comparison.
If not possible, you will need to get hold of the int[] located on the m_array private field of each BitArray. Then compare the int[] of each (which is a comparison of 32 bits at a time).
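Building on that idea, here is a rough sketch of my own (not part of the answer above) that answers the original counting question without touching private fields: AND a copy of each row with the comparison array, copy the packed bits into an int[], and count set bits 32 at a time. BitOperations.PopCount needs .NET Core 3.0 or later; on older runtimes you would count the bits manually.
// needs: using System.Collections; using System.Numerics;
static int CountCommonSetBits(BitArray row, BitArray comparison)
{
    // And() mutates the instance it is called on, so work on a copy of the row.
    BitArray anded = new BitArray(row).And(comparison);
    // Pack the bits into ints, 32 bits per element (assumes the unused padding
    // bits in the last element are zero, which holds for BitArrays built from bools).
    int[] words = new int[(anded.Length + 31) / 32];
    anded.CopyTo(words, 0);
    int count = 0;
    foreach (int word in words)
        count += BitOperations.PopCount((uint)word);
    return count;
}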
I realize this is an old thread, but I've recently run into a need for this myself and have performed some benchmarks in order to determine which method is fastest:
Firstly, at the moment we can't use BitArray.Clone() because of a known bug in Microsoft's code that will not allow cloning of arrays that are larger than int.MaxValue / 32. We will need to avoid this method until they have fixed the bug.
With that in mind I have run benchmarks against 5 different implementations, all using the largest BitArray I could construct (size of int.MaxValue) with alternating bits. I have run the tests with both equal and unequal arrays, and the resulting speed rankings are the same. Here are the implementations:
Implementation 1: Convert each BitArray into a byte[] and compare the arrays using the CompareTo() method.
Implementation 2: Convert each BitArray into a byte[] and compare each pair of bytes using the XOR operator (^).
Implementation 3: Convert each BitArray into an int[] and compare the arrays using the CompareTo() method.
Implementation 4: Convert each BitArray into an int[] and compare each pair of ints using the XOR operator (^).
Implementation 5: Use a for loop to iterate over each pair of bool values and compare them.
The winner surprised me: Implementation 3.
I would have expected Implementation 4 to be the fastest, but as it turns out 3 is significantly faster.
In terms of speed, here are the implementations ranked fastest first:
Implementation 3
Implementation 4
Implementation 2
Implementation 1
Implementation 5
Here's my code for implementation 3:
public static bool Equals(this BitArray first, BitArray second)
{
// Short-circuit if the arrays are not equal in size
if (first.Length != second.Length)
return false;
// Convert the arrays to int[]s
int[] firstInts = new int[(int)Math.Ceiling((decimal)first.Count / 32)];
first.CopyTo(firstInts, 0);
int[] secondInts = new int[(int)Math.Ceiling((decimal)second.Count / 32)];
second.CopyTo(secondInts, 0);
// Look for differences
bool areDifferent = false;
for (int i = 0; i < firstInts.Length && !areDifferent; i++)
areDifferent = firstInts[i] != secondInts[i];
return !areDifferent;
}
For any arbitrary instance (collections of different objects, compositions, single objects, etc)
How can I determine its size in bytes?
(I've currently got a collection of various objects and i'm trying to determine the aggregated size of it)
EDIT: Has someone written an extension method for Object that could do this? That'd be pretty neat imo.
First of all, a warning: what follows is strictly in the realm of ugly, undocumented hacks. Do not rely on this working - even if it works for you now, it may stop working tomorrow, with any minor or major .NET update.
You can use the information in this article on CLR internals MSDN Magazine Issue 2005 May - Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects - last I checked, it was still applicable. Here's how this is done (it retrieves the internal "Basic Instance Size" field via TypeHandle of the type).
object obj = new List<int>(); // whatever you want to get the size of
RuntimeTypeHandle th = obj.GetType().TypeHandle;
int size = *(*(int**)&th + 1);
Console.WriteLine(size);
This works on 3.5 SP1 32-bit. I'm not sure if field sizes are the same on 64-bit - you might have to adjust the types and/or offsets if they are not.
This will work for all "normal" types, for which all instances have the same, well-defined size. The types for which this isn't true are arrays and strings for sure, and I believe also StringBuilder. For them you'll have to add the size of all contained elements to their base instance size.
You may be able to approximate the size by pretending to serialize it with a binary serializer (but routing the output to oblivion), if you're working with serializable objects.
class Program
{
static void Main(string[] args)
{
A parent;
parent = new A(1, "Mike");
parent.AddChild("Greg");
parent.AddChild("Peter");
parent.AddChild("Bobby");
System.Runtime.Serialization.Formatters.Binary.BinaryFormatter bf =
new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
SerializationSizer ss = new SerializationSizer();
bf.Serialize(ss, parent);
Console.WriteLine("Size of serialized object is {0}", ss.Length);
}
}
[Serializable()]
class A
{
int id;
string name;
List<B> children;
public A(int id, string name)
{
this.id = id;
this.name = name;
children = new List<B>();
}
public B AddChild(string name)
{
B newItem = new B(this, name);
children.Add(newItem);
return newItem;
}
}
[Serializable()]
class B
{
A parent;
string name;
public B(A parent, string name)
{
this.parent = parent;
this.name = name;
}
}
class SerializationSizer : System.IO.Stream
{
private int totalSize;
public override void Write(byte[] buffer, int offset, int count)
{
this.totalSize += count;
}
public override bool CanRead
{
get { return false; }
}
public override bool CanSeek
{
get { return false; }
}
public override bool CanWrite
{
get { return true; }
}
public override void Flush()
{
// Nothing to do
}
public override long Length
{
get { return totalSize; }
}
public override long Position
{
get
{
throw new NotImplementedException();
}
set
{
throw new NotImplementedException();
}
}
public override int Read(byte[] buffer, int offset, int count)
{
throw new NotImplementedException();
}
public override long Seek(long offset, System.IO.SeekOrigin origin)
{
throw new NotImplementedException();
}
public override void SetLength(long value)
{
throw new NotImplementedException();
}
}
This doesn't directly answer the question, but for those who are interested in investigating object sizes while debugging:
Start debugging in VS, make sure the Diagnostics Tools window is shown (Debug > Windows > Show Diagnostic Tools)
Set a breakpoint (optional)
Click Take Snapshot in the Memory Usage tab while paused
Explore the snapshot (optionally sort the object list alphabetically to find the type you're interested in)
For unmanaged types, a.k.a. value types and structs:
Marshal.SizeOf(object);
For managed objects the closest I got is an approximation.
long start_mem = GC.GetTotalMemory(true);
aclass[] array = new aclass[1000000];
for (int n = 0; n < 1000000; n++)
array[n] = new aclass();
double used_mem_median = (GC.GetTotalMemory(false) - start_mem)/1000000D;
Do not use serialization. A binary formatter adds headers so that you can change your class and still load an old serialized file into the modified class.
Also, it won't tell you the real size in memory, nor will it take memory alignment into account.
[Edit]
By using BitConverter.GetBytes(propValue) recursively on every property of your class you would get the contents in bytes. That doesn't count the overhead of the class itself or of references, but it is much closer to reality.
If size matters, I would recommend using a byte array for the data and an unmanaged proxy class that accesses the values via pointer casting. Note that this means unaligned memory, so on old machines it will be slow, but for huge datasets in modern RAM it can be considerably faster, since minimizing the amount of data read from RAM has a bigger impact than the unaligned access.
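A very rough sketch of that recursive BitConverter idea (my own illustration, not the poster's code; it only handles a few primitive property types plus strings, skips indexers, and ignores fields, cycles and object headers):
// needs: using System; using System.Linq; using System.Text;
static long ApproximateContentSize(object obj)
{
    long total = 0;
    foreach (var prop in obj.GetType().GetProperties().Where(p => p.GetIndexParameters().Length == 0))
    {
        object value = prop.GetValue(obj);
        switch (value)
        {
            case null: break;
            case bool b: total += BitConverter.GetBytes(b).Length; break;     // 1 byte
            case int i: total += BitConverter.GetBytes(i).Length; break;      // 4 bytes
            case long l: total += BitConverter.GetBytes(l).Length; break;     // 8 bytes
            case double d: total += BitConverter.GetBytes(d).Length; break;   // 8 bytes
            case string s: total += Encoding.Unicode.GetByteCount(s); break;  // 2 bytes per char
            default: total += ApproximateContentSize(value); break;           // recurse into nested objects
        }
    }
    return total;
}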
A safe solution with some optimizations: the CyberSaving/MemoryUsage code. Some example cases:
/* test nullable type */
TestSize<int?>.SizeOf(null); //-> 4 B
/* test StringBuilder */
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 100; i++) sb.Append("わたしわたしわたしわ");
TestSize<StringBuilder>.SizeOf(sb); //-> 3132 B
/* test Simple array */
TestSize<int[]>.SizeOf(new int[100]); //-> 400 B
/* test Empty List<int>*/
var list = new List<int>();
TestSize<List<int>>.SizeOf(list); //-> 205 B
/* test List<int> with 100 items*/
for (int i = 0; i < 100; i++) list.Add(i);
TestSize<List<int>>.SizeOf(list); //-> 717 B
It works also with classes:
class twostring
{
public string a { get; set; }
public string b { get; set; }
}
TestSize<twostring>.SizeOf(new twostring() { a = "0123456789", b = "0123456789" }); //-> 28 B
This doesn't apply to the current .NET implementation, but one thing to keep in mind with garbage collected/managed runtimes is the allocated size of an object can change throughout the lifetime of the program. For example, some generational garbage collectors (such as the Generational/Ulterior Reference Counting Hybrid collector) only need to store certain information after an object is moved from the nursery to the mature space.
This makes it impossible to create a reliable, generic API to expose the object size.
This is impossible to do at runtime.
There are various memory profilers that display object size, though.
EDIT: You could write a second program that profiles the first one using the CLR Profiling API and communicates with it through remoting or something.
For anyone looking for a solution that doesn't require [Serializable] classes and where the result is an approximation instead of an exact science: the best method I could find is JSON serialization into a MemoryStream using UTF-32 encoding.
private static long? GetSizeOfObjectInBytes(object item)
{
if (item == null) return 0;
try
{
// hackish solution to get an approximation of the size
var jsonSerializerSettings = new JsonSerializerSettings
{
DateFormatHandling = DateFormatHandling.IsoDateFormat,
DateTimeZoneHandling = DateTimeZoneHandling.Utc,
MaxDepth = 10,
ReferenceLoopHandling = ReferenceLoopHandling.Ignore
};
var formatter = new JsonMediaTypeFormatter { SerializerSettings = jsonSerializerSettings };
using (var stream = new MemoryStream()) {
formatter.WriteToStream(item.GetType(), item, stream, Encoding.UTF32);
return stream.Length / 4; // 32 bits per character = 4 bytes per character
}
}
catch (Exception)
{
return null;
}
}
No, this won't give you the exact size that would be used in memory. As previously mentioned, that is not possible. But it'll give you a rough estimation.
Note that this is also pretty slow.
Use Son of Strike (SOS), which has an ObjSize command.
Note that the actual memory consumed is always larger than ObjSize reports, due to the syncblk that resides directly before the object data.
Read more about both here MSDN Magazine Issue 2005 May - Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects.
AFAIK, you cannot, without actually deep-counting the size of each member in bytes. But again, does the size of a member (like elements inside a collection) count towards the size of the object, or does only a pointer to that member count towards the size of the object? It depends on how you define it.
I have run into this situation before where I wanted to limit the objects in my cache based on the memory they consumed.
Well, if there is some trick to do that, I'd be delighted to know about it!
For value types, you can use Marshal.SizeOf. Of course, it returns the number of bytes required to marshal the structure in unmanaged memory, which is not necessarily what the CLR uses.
I have created benchmark test for different collections in .NET: https://github.com/scholtz/TestDotNetCollectionsMemoryAllocation
Results are as follows for .NET Core 2.2 with 1,000,000 objects with 3 properties allocated:
Testing with string: 1234567
Hashtable<TestObject>: 184 672 704 B
Hashtable<TestObjectRef>: 136 668 560 B
Dictionary<int, TestObject>: 171 448 160 B
Dictionary<int, TestObjectRef>: 123 445 472 B
ConcurrentDictionary<int, TestObject>: 200 020 440 B
ConcurrentDictionary<int, TestObjectRef>: 152 026 208 B
HashSet<TestObject>: 149 893 216 B
HashSet<TestObjectRef>: 101 894 384 B
ConcurrentBag<TestObject>: 112 783 256 B
ConcurrentBag<TestObjectRef>: 64 777 632 B
Queue<TestObject>: 112 777 736 B
Queue<TestObjectRef>: 64 780 680 B
ConcurrentQueue<TestObject>: 112 784 136 B
ConcurrentQueue<TestObjectRef>: 64 783 536 B
ConcurrentStack<TestObject>: 128 005 072 B
ConcurrentStack<TestObjectRef>: 80 004 632 B
For the memory test, I found it best to use:
GC.GetAllocatedBytesForCurrentThread()
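A minimal measurement sketch (my addition; GC.GetAllocatedBytesForCurrentThread reports every allocation made on the current thread, not just objects that survive collection, and TestObject stands for the 3-property test class from the benchmark):
// needs: using System; using System.Collections.Generic;
long before = GC.GetAllocatedBytesForCurrentThread();
var dict = new Dictionary<int, TestObject>();
for (int i = 0; i < 1_000_000; i++)
    dict.Add(i, new TestObject());
long after = GC.GetAllocatedBytesForCurrentThread();
Console.WriteLine($"Approximate allocation: {after - before:N0} B");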
For arrays of structs/values, I have different results with:
first = Marshal.UnsafeAddrOfPinnedArrayElement(array, 0).ToInt64();
second = Marshal.UnsafeAddrOfPinnedArrayElement(array, 1).ToInt64();
arrayElementSize = second - first;
(oversimplified example)
Whatever the approach, you really need to understand how .Net works to correctly interpret the results.
For instance, the returned element size is the "aligned" element size, with some padding.
The overhead and thus the size is different depending on the usage of a type: "boxed" on the GC heap, on the stack, as a field, as an array element.
(I wanted to know what the memory impact would be of using "dummy" empty structs (without any fields) to mimic "optional" arguments of generics; running tests with different layouts involving empty structs, I can see that an empty struct uses (at least) 1 byte per element; I vaguely remember this is because .NET needs a distinct address for each field, which wouldn't work if a field really were empty/0-sized.)
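A slightly fuller version of the snippet above (my sketch): pin the array explicitly so the element addresses stay valid while we subtract them. NodeVal is the two-short struct from the first question, so this prints its aligned element size.
// needs: using System; using System.Runtime.InteropServices;
var array = new NodeVal[2];
GCHandle handle = GCHandle.Alloc(array, GCHandleType.Pinned);
try
{
    long first = Marshal.UnsafeAddrOfPinnedArrayElement(array, 0).ToInt64();
    long second = Marshal.UnsafeAddrOfPinnedArrayElement(array, 1).ToInt64();
    Console.WriteLine($"Aligned element size: {second - first} B");   // padding included
}
finally
{
    handle.Free();
}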
You can use reflection to gather all the public member or property information (given the object's type). There is no way to determine the size without walking through each individual piece of data on the object, though.
From Pavel and jnm2:
private int DumpApproximateObjectSize(object toWeight)
{
return Marshal.ReadInt32(toWeight.GetType().TypeHandle.Value, 4);
}
On a side note, be careful: it only works with contiguous-memory objects.
Simplest way is: int size = *((int*)type.TypeHandle.Value + 1)
I know this is an implementation detail, but the GC relies on it and it needs to be close to the start of the method table for efficiency; plus, considering how complex the GC code is, nobody will dare to change it in the future. In fact it works for every minor/major version of .NET Framework and .NET Core. (I'm currently unable to test 1.0.)
If you want a more reliable way, emit a struct in a dynamic assembly with [StructLayout(LayoutKind.Auto)] and the exact same fields in the same order, and take its size with the sizeof IL instruction. You may want to emit a static method within the struct which simply returns this value. Then add 2 * IntPtr.Size for the object header. This should give you the exact value.
But if your class derives from another class, you need to find the size of each base class separately and add them, plus 2 * IntPtr.Size again for the header. You can do this by getting the fields with the BindingFlags.DeclaredOnly flag.
For arrays and strings, just add length * element size to that.
For the cumulative size of aggregate objects you need to implement a more sophisticated solution, which involves visiting every field and inspecting its contents.
For anyone looking for a rough approximation comparing the sizes of disparate object graphs/collections, just serialize to JSON - e.g.:
Console.WriteLine($"Size1:\t{(JsonConvert.SerializeObject(someBusyObject)).Length}")); Console.WriteLine($"Size2:\t{(JsonConvert.SerializeObject(someOtherObject)).Length}"));
In my case I have a bunch of IEnumerable's being pulled during a login I'm benchmarking, and I just wanted to roughly size them to see their relative weight.
They're expensive operations and won't give you direct heap allocation size or anything like that, but it was good enough for my use case and was readily available.