Related
Let's say we have one structure :
[StructLayout(LayoutKind.Explicit, Size=8)] // using System.Runtime.InteropServices;
public struct AirportHeader {
[FieldOffset(0)]
[MarshalAs(UnmanagedType.I4)]
public int Ident; // a 4 bytes ASCII : "FIMP" { 0x46, 0x49, 0x4D, 0x50 }
[FieldOffset(4)]
[MarshalAs(UnmanagedType.I4)]
public int Offset;
}
What I want to have : Both direct access to type string and int values, for the field Ident in this structure, without breaking the 8 bytes size of the structure, nor having to compute a string value each time from the int value.
The field Ident in that structure as int is interesting because I can fast compare with other idents if they match, other idents may come from datas that are unrelated to this structure, but are in the same int format.
Question : Is there a way to define a field that is not part of the struture layout ? Like :
[StructLayout(LayoutKind.Explicit, Size=8)]
public struct AirportHeader {
[FieldOffset(0)]
[MarshalAs(UnmanagedType.I4)]
public int Ident; // a 4 bytes ASCII : "FIMP" { 0x46, 0x49, 0x4D, 0x50 }
[FieldOffset(4)]
[MarshalAs(UnmanagedType.I4)]
public int Offset;
[NoOffset()] // <- is there something I can do the like of this
string _identStr;
public string IdentStr {
get { // EDIT ! missed the getter on this property
if (string.IsNullOrEmpty(_identStr)) _identStr =
System.Text.Encoding.ASCII.GetString(Ident.GetBytes());
// do the above only once. May use an extra private bool field to go faster.
return _identStr;
}
}
}
PS : I use pointers ('*' and '&', unsafe) because I need to deal with endianness (Local system, binary files/file format, network) and fast type conversions, fast arrays filling. I also use many flavours of Marshal methods (fixing structures on byte arrays), and a little of PInvoke and COM interop. Too bad some assemblies I'm dealing with doesn't have their dotNet counterpart yet.
TL;DR; For details only
The question is all it is about, I just don't know the answer. The following should answer most questions like "other approaches", or "why not do this instead", but could be ignored as the answer would be straightforward. Anyway, I preemptively put everything so it's clear from the start what am I trying to do. :)
Options/Workaround I'm currently using (or thinking of using) :
Create a getter (not a field) that computes the string value each time :
public string IdentStr {
get { return System.Text.Encoding.ASCII.GetString(Ident.GetBytes()); }
// where GetBytes() is an extension method that converts an int to byte[]
}
This approach, while doing the job, performs poorly : The GUI displays aircraft from a database of default flights, and injects other flights from the network with a refresh rate of one second (I should increase that to 5 seconds). I have around 1200 flights within a area, relating to 2400 airports (departure and arrival), meaning I have 2400 calls to the above code each second to display the ident in a DataGrid.
Create another struct (or class), which only purpose is to manage
data on GUI side, when not reading/writing to a stream or file. That means, read
the data with the explicit layout struct. Create another struct with
the string version of the field. Work with GUI. That will perform
better on an overall point of view, but, in the process of defining
structures for the game binaries, I'm already at 143 structures of
the kind (just with older versions of the game datas; there are a bunch I didn't write yet, and I plan to add structures for the newest datas types). ATM, more than half of them require one or more extra
fields to be of meaningful use. It's okay if I were the only one to use the assembly, but
other users will probably get lost with AirportHeader,
AirportHeaderEx, AirportEntry, AirportEntryEx,
AirportCoords, AirportCoordsEx.... I would avoid doing that.
Optimize option 1 to make computations perform faster (thanks to SO,
there are a bunch of ideas to look for - currently working on the idea). For the Ident field, I
guess I could use pointers (and I will). Already doing it for fields I must display in little endian and read/write in big
endian. There are other values, like 4x4 grid informations that are
packed in a single Int64 (ulong), that needs bit shifting to
expose the actual values. Same for GUIDs or objects pitch/bank/yaw.
Try to take advantage of overlapping fields (on study). That would work for GUIDs. Perhaps it may work for the Ident example, if MarshalAs can constrain the
value to an ASCII string. Then I just need to specify the same
FieldOffset, '0' in this case. But I'm unsure setting the field
value (entry.FieldStr = "FMEP";) actually uses the Marshal constrain on the managed code side. My undestanding is it will store the string in Unicode on managed side (?).
Furthermore, that wouldn't work for packed bits (bytes that contains
several values, or consecutive bytes hosting values that have to be
bit shifted). I believe it is impossible to specify value position, length and format
at bit level.
Why bother ? context :
I'm defining a bunch of structures to parse binary datas from array of bytes (IO.File.ReadAllBytes) or streams, and write them back, datas related to a game. Application logic should use the structures to quickly access and manipulate the datas on demand. Assembly expected capabilities is read, validate, edit, create and write, outside the scope of the game (addon building, control) and inside the scope of the game (API, live modding or monitoring). Other purpose is to understand the content of binaries (hex) and make use of that understanding to build what's missing in the game.
The purpose of the assembly is to provide a ready to use basis components for a c# addon contributor (I don't plan to make the code portable). Creating applications for the game or processing addon from source to compilation into game binaries. It's nice to have a class that loads the entire content of a file in memory, but some context require you to not do that, and only retrieve from the file what is necessary, hence the choice of the struct pattern.
I need to figure out the trust and legal issues (copyrighted data) but that's outside the scope of the main concern. If that matter, Microsoft did provide over the years public freely accessible SDKs exposing binaries structures on previous versions of the game, for the purpose of what I'm doing (I'm not the first and probably not the last to do so). Though, I wouldn't dare to expose undocumented binaries (for the latest game datas for instance), nor facilitate a copyright breach on copyrighted materials/binaries.
I'm just asking confirmation if there is a way or not to have private fields not being part of the structure layout. Naive belief ATM is "that's impossible, but there are workarounds". It's just that my c# experience is pretty sparce, so maybe I'm wrong, why I ask. Thanks !
As suggested, there are several ways to get the job done. Here are the getters/setters I came up with within the structure. I'll measure how each code performs on various scenarios later. The dict approach is very seducing as on many scenarios, I would need a directly accessible global database of (59000) airports with runways and parking spots (not just the Ident), but a fast check between struct fields is also interesting.
public string IdentStr_Marshal {
get {
var output = "";
GCHandle pinnedHandle; // CS0165 for me (-> c# v5)
try { // Fast if no exception, (very) slow if exception thrown
pinnedHandle = GCHandle.Alloc(this, GCHandleType.Pinned);
IntPtr structPtr = pinnedHandle.AddrOfPinnedObject();
output = Marshal.PtrToStringAnsi(structPtr, 4);
// Cannot use UTF8 because the assembly should work in Framework v4.5
} finally { if (pinnedHandle.IsAllocated) pinnedHandle.Free(); }
return output;
}
set {
value.PadRight(4); // Must fill the blanks - initial while loop replaced (Charlieface's)
IntPtr intValuePtr = IntPtr.Zero;
// Cannot use UTF8 because some users are on Win7 with FlightSim 2004
try { // Put a try as a matter of habit, but not convinced it's gonna throw.
intValuePtr = Marshal.StringToHGlobalAnsi(value);
Ident = Marshal.ReadInt32(intValuePtr, 0).BinaryConvertToUInt32(); // Extension method to convert type.
} finally { Marshal.FreeHGlobal(intValuePtr); // freeing the right pointer }
}
}
public unsafe string IdentStr_Pointer {
get {
string output = "";
fixed (UInt32* ident = &Ident) { // Fixing the field
sbyte* bytes = (sbyte*)ident;
output = new string(bytes, 0, 4, System.Text.Encoding.ASCII); // Encoding added (#Charlieface)
}
return output;
}
set {
// value must not exceed a length of 4 and must be in Ansi [A-Z,0-9,whitespace 0x20].
// value validation at this point occurs outside the structure.
fixed (UInt32* ident = &Ident) { // Fixing the field
byte* bytes = (byte*)ident;
byte[] asciiArr = System.Text.Encoding.ASCII.GetBytes(value);
if (asciiArr.Length >= 4) // (asciiArr.Length == 4) would also work
for (Int32 i = 0; i < 4; i++) bytes[i] = asciiArr[i];
else {
for (Int32 i = 0; i < asciiArr.Length; i++) bytes[i] = asciiArr[i];
for (Int32 i = asciiArr.Length; i < 4; i++) bytes[i] = 0x20;
}
}
}
}
static Dictionary<UInt32, string> ps_dict = new Dictionary<UInt32, string>();
public string IdentStr_StaticDict {
get {
string output; // logic update with TryGetValue (#Charlieface)
if (ps_dict.TryGetValue(Ident, out output)) return output;
output = System.Text.Encoding.ASCII.GetString(Ident.ToBytes(EndiannessType.LittleEndian));
ps_dict.Add(Ident, output);
return output;
}
set { // input can be "FMEE", "DME" or "DK". length of 2 characters is the minimum.
var bytes = new byte[4]; // Need to convert value to a 4 byte array
byte[] asciiArr = System.Text.Encoding.ASCII.GetBytes(value); // should be 4 bytes or less
// Put the valid ASCII codes in the array.
if (asciiArr.Length >= 4) // (asciiArr.Length == 4) would also work
for (Int32 i = 0; i < 4; i++) bytes[i] = asciiArr[i];
else {
for (Int32 i = 0; i < asciiArr.Length; i++) bytes[i] = asciiArr[i];
for (Int32 i = asciiArr.Length; i < 4; i++) bytes[i] = 0x20;
}
Ident = BitConverter.ToUInt32(bytes, 0); // Set structure int value
if (!ps_dict.ContainsKey(Ident)) // Add if missing
ps_dict.Add(Ident, System.Text.Encoding.ASCII.GetString(bytes));
}
}
As mentioned by others, it is not possible to exclude a field from a struct for marshalling.
You also cannot use a pointer as a string in most places.
If the number of different possible strings is relatively small (and it probably will be, given it's only 4 characters), then you could use a static Dictionary<int, string> as a kind of string-interning mechanism.
Then you write a property to add/retrieve the real string.
Note that dictionary access is O(1), and hashing an int just returns itself, so it will be very, very fast, but will take up some memory.
[StructLayout(LayoutKind.Explicit, Size=8)]
public struct AirportHeader
{
[FieldOffset(0)]
[MarshalAs(UnmanagedType.I4)]
public int Ident; // a 4 bytes ASCII : "FIMP" { 0x46, 0x49, 0x4D, 0x50 }
[FieldOffset(4)]
[MarshalAs(UnmanagedType.I4)]
public int Offset;
static Dictionary<int, string> _identStrings = new Dictionary<int, string>();
public string IdentStr =>
_identStrings.TryGetValue(Ident, out var ret) ? ret :
(_identStrings[Ident] = Encoding.ASCII.GetString(Ident.GetBytes());
}
This is not possible because a structure must contain all of its values in a specific order. Usually this order is controlled by the CLR itself. If you want to change the order of the data order, you can use the StructLayout. However, you cannot exclude a field or that data would simply not exist in memory.
Instead of a string (which is a reference type) you can use a pointer to point directly to that string and use that in your structure in combination with the StructLayout. To get this string value, you can use a get-only property that reads directly from unmanaged memory.
I have an array inside a class:
class MatchNode
{
public short X;
public short Y;
public NodeVal[] ControlPoints;
private MatchNode()
{
ControlPoints = new NodeVal[4];
}
}
The NodeVal is:
struct NodeVal
{
public readonly short X;
public readonly short Y;
public NodeVal(short x, short y)
{
X = x;
Y = y;
}
}
Now what if we wanted to take performance to next level and avoid having a separate object for the array. Actually it doesn't have to have an array. The only restriction is that the client code should be able to access NodeVal by index like:
matchNode.ControlPoints[i]
OR
matchNode[i]
and of course the solution should be faster or as fast as array access since it's supposed to be an optimization.
EDIT: As Ryan suggested it seems I should explain more about the motivation:
The MatchNode class is used heavily in the project. Millions of them are used in the project and each are accessed hundreds of times so having them as compact and concise as possible can lead to less cache misses and overall performance.
Let's consider a 64bit machine. In the current implementation the class the array takes 8 bytes for the ControlPoints reference and the size of the array object would be at least 16 bytes of object overhead (for method table and sync block) and 16 byte for the actual byte. So we have at least 24 overhead bytes beside 16 bytes of actual data.
These objects are used in bottlenecks of the project so it matters if we could optimize them more.
Of course we could just have a super big array of NodeVal and just save an index in MatchNode that would locate the actual data but again it will change every client codes that uses the MatchNodes, let alone be a dirty non-object oriented solution.
It is okay to have a messy MatchNode that uses every kind of nasty trick like unsafe or static cache code. It is not okay to leak these optimizations out to the client code.
You´re looking for indexers:
class MatchNode
{
public short X;
public short Y;
private NodeVal[] myField;
public NodeVal this[int i] { get { return myField[i]; } set { myField[i] = value; } }
public MatchNode(int size) { this.myField = new NodeVal[size]; }
}
Now you can simply use this:
var m = new MatchNode(10);
m[0] = new NodeVal();
However I doubt this will affect performance (at least in means of speed) in any way and you should consider the actual problems using a profiling tool (dotTrace for instance). Furthermore this approach will also create a private backing-field which will produce the same memory-footprint.
Background
One of the most used data-structures in our application is a custom Point struct. Recently we have been running into memory issues, mostly caused by an excessive number of instances of this struct.
Many of these instances contain the same data. Sharing a single instance would significantly help to reduce memory usage. However, since we are using structs, instances cannot be shared. It is also not possible to change it to a class, because the struct semantics are important.
Our workaround for this is to have a struct containing a single reference to a backing class, which contains the actual data. These flyweight dataclasses are stored in and retrieved from a factory to ensure no duplicates exist.
A narrowed down version of the code looks something like this:
public struct PointD
{
//Factory
private static class PointDatabase
{
private static readonly Dictionary<PointData, PointData> _data = new Dictionary<PointData, PointData>();
public static PointData Get(double x, double y)
{
var key = new PointData(x, y);
if (!_data.ContainsKey(key))
_data.Add(key, key);
return _data[key];
}
}
//Flyweight data
private class PointData
{
private double pX;
private double pY;
public PointData(double x, double y)
{
pX = x;
pY = y;
}
public double X
{
get { return pX; }
}
public double Y
{
get { return pY; }
}
public override bool Equals(object obj)
{
var other = obj as PointData;
if (other == null)
return false;
return other.X == this.X && other.Y == this.Y;
}
public override int GetHashCode()
{
return X.GetHashCode() * Y.GetHashCode();
}
}
//Public struct
public Point(double x, double y)
{
_data = Point3DDatabase.Get(x, y);
}
public double X
{
get { return _data == null ? 0 : _data.X; }
set { _data = PointDatabase.Get(value, Y); }
}
public double Y
{
get { return _data == null ? 0 : _data.Y; }
set { _data = PointDatabase.Get(X, value); }
}
}
This implementation ensures that the struct semantics are maintained, while ensuring only one instance of the same data is kept around.
(Please don't mention memory leaks or such, this is simplified example code)
The Problem
Although above approach works to lower our memory usage, the performance is horrendous. A project in our application can easily contain a million different points or more. As a result, the lookup of a PointData instance is very costly. And this lookup has to be done whenever a Point is manipulated, which, as you can probably guess, is what our application is all about. As a result, this approach is not suitable for us.
As an alternative, we could make two versions of the Point class: one with backing flyweight as above, and one containing its own data (with possible duplicates). All (short-lived) calculations could be done in the second class, while when storing the Point for longer durations they could be converted to the first, memory-efficient class. However, this means that all the users of the Point class have to be inspected and adjusted to this scheme, something which is not feasible for us.
What we are looking for is an approach which meets below criteria:
When there are multiple Points with the same data, the memory usage should be lower than having a different struct instance for each of these.
Performance should not be much worse than working directly on primitive data in the struct.
Struct semantics should be maintained.
The 'Point' interface should remain the same (i.e. classes that use 'Point' should not have to be changed).
Is there any way we can improve our approach towards these criteria? Or can anyone suggest a different approach we can attempt?
Rather than re-work an entire data structure and programming model, my go-to solution for performance and memory issues is to cache, pre-fetch and most importantly cull you data when it is not needed.
Think of it this way. On a graph, you cannot display few millions of points at once because you run out of pixels (you should occlusion-cull these points). Similarly, in a table, there isn't enough vertical space on screen (you need data set truncation). Consider streaming data from your source file as you need it. If your source data structure is not appropriate for dynamic retrieval, consider an intermediate, temporary file format. This is one of the ways .Net's JITer works so quickly!
Back in the old days of C, one could use array subscripting to address storage in very useful ways. For example, one could declare an array as such.
This array represents an EEPROM image with 8 bit words.
BYTE eepromImage[1024] = { ... };
And later refer to that array as if it were really multi-dimensional storage
BYTE mpuImage[2][512] = eepromImage;
I'm sure I have the syntax wrong, but I hope you get the idea.
Anyway, this projected a two dimension image of what is really single dimensional storage.
The two dimensional projection represents the EEPROM image when loaded into the memory of an MPU with 16 bit words.
In C one could reference the storage multi-dimensionaly and change values and the changed values would show up in the real (single dimension) storage almost as if by magic.
Is it possible to do this same thing using C#?
Our current solution uses multiple arrays and event handlers to keep things synchronized. This kind of works but it is additional complexity that we would like to avoid if there is a better way.
You could wrap the array in a class and write 1-dimensional and 2-dimensional Indexer properties.
Without validations etc it looks like this for a 10x10 array:
class ArrayData
{
byte[] _data = new byte[100];
public byte this[int x]
{
get { return _data[x]; }
set { _data[x] = value; }
}
public byte this[int x, int y]
{
get { return _data[x*10+ y]; }
set { _data[x*10 + y] = value; }
}
}
Yes, but no.
You can allocate a multidimensional array off the bat if you like. You could also create a custom class that allows you to access its underlying data in either a single-dimension or multi-dimensional way, keeping the underlying data in sync.
You cannot, however, access a single-dimensional array with a multi-dimensional index directly.
For any arbitrary instance (collections of different objects, compositions, single objects, etc)
How can I determine its size in bytes?
(I've currently got a collection of various objects and i'm trying to determine the aggregated size of it)
EDIT: Has someone written an extension method for Object that could do this? That'd be pretty neat imo.
First of all, a warning: what follows is strictly in the realm of ugly, undocumented hacks. Do not rely on this working - even if it works for you now, it may stop working tomorrow, with any minor or major .NET update.
You can use the information in this article on CLR internals MSDN Magazine Issue 2005 May - Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects - last I checked, it was still applicable. Here's how this is done (it retrieves the internal "Basic Instance Size" field via TypeHandle of the type).
object obj = new List<int>(); // whatever you want to get the size of
RuntimeTypeHandle th = obj.GetType().TypeHandle;
int size = *(*(int**)&th + 1);
Console.WriteLine(size);
This works on 3.5 SP1 32-bit. I'm not sure if field sizes are the same on 64-bit - you might have to adjust the types and/or offsets if they are not.
This will work for all "normal" types, for which all instances have the same, well-defined types. Those for which this isn't true are arrays and strings for sure, and I believe also StringBuilder. For them you'll have add the size of all contained elements to their base instance size.
You may be able to approximate the size by pretending to serializing it with a binary serializer (but routing the output to oblivion) if you're working with serializable objects.
class Program
{
static void Main(string[] args)
{
A parent;
parent = new A(1, "Mike");
parent.AddChild("Greg");
parent.AddChild("Peter");
parent.AddChild("Bobby");
System.Runtime.Serialization.Formatters.Binary.BinaryFormatter bf =
new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
SerializationSizer ss = new SerializationSizer();
bf.Serialize(ss, parent);
Console.WriteLine("Size of serialized object is {0}", ss.Length);
}
}
[Serializable()]
class A
{
int id;
string name;
List<B> children;
public A(int id, string name)
{
this.id = id;
this.name = name;
children = new List<B>();
}
public B AddChild(string name)
{
B newItem = new B(this, name);
children.Add(newItem);
return newItem;
}
}
[Serializable()]
class B
{
A parent;
string name;
public B(A parent, string name)
{
this.parent = parent;
this.name = name;
}
}
class SerializationSizer : System.IO.Stream
{
private int totalSize;
public override void Write(byte[] buffer, int offset, int count)
{
this.totalSize += count;
}
public override bool CanRead
{
get { return false; }
}
public override bool CanSeek
{
get { return false; }
}
public override bool CanWrite
{
get { return true; }
}
public override void Flush()
{
// Nothing to do
}
public override long Length
{
get { return totalSize; }
}
public override long Position
{
get
{
throw new NotImplementedException();
}
set
{
throw new NotImplementedException();
}
}
public override int Read(byte[] buffer, int offset, int count)
{
throw new NotImplementedException();
}
public override long Seek(long offset, System.IO.SeekOrigin origin)
{
throw new NotImplementedException();
}
public override void SetLength(long value)
{
throw new NotImplementedException();
}
}
Not directly answers the question, but for those who are interested to investigate object sizes while debugging:
Start debugging in VS, make sure the Diagnostics Tools window is shown (Debug > Windows > Show Diagnostic Tools)
Set a breakpoint (optional)
Click Take Snapshot in the Memory Usage while paused
Explore the snapshot (optionally sort the object list alphabetically to find the type you're interested in)
For unmanaged types aka value types, structs:
Marshal.SizeOf(object);
For managed objects the closer i got is an approximation.
long start_mem = GC.GetTotalMemory(true);
aclass[] array = new aclass[1000000];
for (int n = 0; n < 1000000; n++)
array[n] = new aclass();
double used_mem_median = (GC.GetTotalMemory(false) - start_mem)/1000000D;
Do not use serialization.A binary formatter adds headers, so you can change your class and load an old serialized file into the modified class.
Also it won't tell you the real size in memory nor will take into account memory alignment.
[Edit]
By using BiteConverter.GetBytes(prop-value) recursivelly on every property of your class you would get the contents in bytes, that doesn't count the weight of the class or references but is much closer to reality.
I would recommend to use a byte array for data and an unmanaged proxy class to access values using pointer casting if size matters, note that would be non-aligned memory so on old computers is gonna be slow but HUGE datasets on MODERN RAM is gonna be considerably faster, as minimizing the size to read from RAM is gonna be a bigger impact than unaligned.
safe solution with some optimizations
CyberSaving/MemoryUsage code.
some case:
/* test nullable type */
TestSize<int?>.SizeOf(null) //-> 4 B
/* test StringBuilder */
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 100; i++) sb.Append("わたしわたしわたしわ");
TestSize<StringBuilder>.SizeOf(sb ) //-> 3132 B
/* test Simple array */
TestSize<int[]>.SizeOf(new int[100]); //-> 400 B
/* test Empty List<int>*/
var list = new List<int>();
TestSize<List<int>>.SizeOf(list); //-> 205 B
/* test List<int> with 100 items*/
for (int i = 0; i < 100; i++) list.Add(i);
TestSize<List<int>>.SizeOf(list); //-> 717 B
It works also with classes:
class twostring
{
public string a { get; set; }
public string b { get; set; }
}
TestSize<twostring>.SizeOf(new twostring() { a="0123456789", b="0123456789" } //-> 28 B
This doesn't apply to the current .NET implementation, but one thing to keep in mind with garbage collected/managed runtimes is the allocated size of an object can change throughout the lifetime of the program. For example, some generational garbage collectors (such as the Generational/Ulterior Reference Counting Hybrid collector) only need to store certain information after an object is moved from the nursery to the mature space.
This makes it impossible to create a reliable, generic API to expose the object size.
This is impossible to do at runtime.
There are various memory profilers that display object size, though.
EDIT: You could write a second program that profiles the first one using the CLR Profiling API and communicates with it through remoting or something.
For anyone looking for a solution that doesn't require [Serializable] classes and where the result is an approximation instead of exact science.
The best method I could find is json serialization into a memorystream using UTF32 encoding.
private static long? GetSizeOfObjectInBytes(object item)
{
if (item == null) return 0;
try
{
// hackish solution to get an approximation of the size
var jsonSerializerSettings = new JsonSerializerSettings
{
DateFormatHandling = DateFormatHandling.IsoDateFormat,
DateTimeZoneHandling = DateTimeZoneHandling.Utc,
MaxDepth = 10,
ReferenceLoopHandling = ReferenceLoopHandling.Ignore
};
var formatter = new JsonMediaTypeFormatter { SerializerSettings = jsonSerializerSettings };
using (var stream = new MemoryStream()) {
formatter.WriteToStream(item.GetType(), item, stream, Encoding.UTF32);
return stream.Length / 4; // 32 bits per character = 4 bytes per character
}
}
catch (Exception)
{
return null;
}
}
No, this won't give you the exact size that would be used in memory. As previously mentioned, that is not possible. But it'll give you a rough estimation.
Note that this is also pretty slow.
Use Son Of Strike which has a command ObjSize.
Note that actual memory consumed is always larger than ObjSize reports due to a synkblk which resides directly before the object data.
Read more about both here MSDN Magazine Issue 2005 May - Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects.
AFAIK, you cannot, without actually deep-counting the size of each member in bytes. But again, does the size of a member (like elements inside a collection) count towards the size of the object, or a pointer to that member count towards the size of the object? Depends on how you define it.
I have run into this situation before where I wanted to limit the objects in my cache based on the memory they consumed.
Well, if there is some trick to do that, I'd be delighted to know about it!
For value types, you can use Marshal.SizeOf. Of course, it returns the number of bytes required to marshal the structure in unmanaged memory, which is not necessarily what the CLR uses.
I have created benchmark test for different collections in .NET: https://github.com/scholtz/TestDotNetCollectionsMemoryAllocation
Results are as follows for .NET Core 2.2 with 1,000,000 of objects with 3 properties allocated:
Testing with string: 1234567
Hashtable<TestObject>: 184 672 704 B
Hashtable<TestObjectRef>: 136 668 560 B
Dictionary<int, TestObject>: 171 448 160 B
Dictionary<int, TestObjectRef>: 123 445 472 B
ConcurrentDictionary<int, TestObject>: 200 020 440 B
ConcurrentDictionary<int, TestObjectRef>: 152 026 208 B
HashSet<TestObject>: 149 893 216 B
HashSet<TestObjectRef>: 101 894 384 B
ConcurrentBag<TestObject>: 112 783 256 B
ConcurrentBag<TestObjectRef>: 64 777 632 B
Queue<TestObject>: 112 777 736 B
Queue<TestObjectRef>: 64 780 680 B
ConcurrentQueue<TestObject>: 112 784 136 B
ConcurrentQueue<TestObjectRef>: 64 783 536 B
ConcurrentStack<TestObject>: 128 005 072 B
ConcurrentStack<TestObjectRef>: 80 004 632 B
For memory test I found the best to be used
GC.GetAllocatedBytesForCurrentThread()
For arrays of structs/values, I have different results with:
first = Marshal.UnsafeAddrOfPinnedArrayElement(array, 0).ToInt64();
second = Marshal.UnsafeAddrOfPinnedArrayElement(array, 1).ToInt64();
arrayElementSize = second - first;
(oversimplified example)
Whatever the approach, you really need to understand how .Net works to correctly interpret the results.
For instance, the returned element size is the "aligned" element size, with some padding.
The overhead and thus the size is different depending on the usage of a type: "boxed" on the GC heap, on the stack, as a field, as an array element.
(I wanted to know what would be the memory impact of using "dummy" empty structs (without any field) to mimic "optional" arguments of generics; making tests with different layouts involving empty structs, I can see that an empty struct uses (at least) 1 byte per element; I vaguely remember it is because .Net needs a different address for each field, which wouldn't work if a field really was empty/0-sized).
You can use reflection to gather all the public member or property information (given the object's type). There is no way to determine the size without walking through each individual piece of data on the object, though.
From Pavel and jnm2:
private int DumpApproximateObjectSize(object toWeight)
{
return Marshal.ReadInt32(toWeight.GetType().TypeHandle.Value, 4);
}
On a side note be careful because it only work with contiguous memory objects
Simplest way is: int size = *((int*)type.TypeHandle.Value + 1)
I know this is implementation detail but GC relies on it and it needs to be as close to start of the methodtable for efficiency plus taking into consideration how GC code complex is nobody will dare to change it in future. In fact it works for every minor/major versions of .net framework+.net core. (Currently unable to test for 1.0)
If you want more reliable way, emit a struct in a dynamic assembly with [StructLayout(LayoutKind.Auto)] with exact same fields in same order, take its size with sizeof IL instruction. You may want to emit a static method within struct which simply returns this value. Then add 2*IntPtr.Size for object header. This should give you exact value.
But if your class derives from another class, you need to find each size of base class seperatly and add them + 2*Inptr.Size again for header. You can do this by getting fields with BindingFlags.DeclaredOnly flag.
Arrays and strings just adds that size its length * element size.
For cumulative size of aggreagate objects you need to implement more sophisticated solution which involves visiting every field and inspect its contents.
For anyone looking for a rough approximation comparing the sizes of disparate object graphs/collections, just serialize to JSON - e.g.:
Console.WriteLine($"Size1:\t{(JsonConvert.SerializeObject(someBusyObject)).Length}")); Console.WriteLine($"Size2:\t{(JsonConvert.SerializeObject(someOtherObject)).Length}"));
In my case I have a bunch of IEnumerable's being pulled during a login I'm benchmarking, and I just wanted to roughly size them to see their relative weight.
They're expensive operations and won't give you direct heap allocation size or anything like that, but it was good enough for my use case and was readily available.