Let's say we have one structure :
[StructLayout(LayoutKind.Explicit, Size=8)] // using System.Runtime.InteropServices;
public struct AirportHeader {
[FieldOffset(0)]
[MarshalAs(UnmanagedType.I4)]
public int Ident; // a 4 bytes ASCII : "FIMP" { 0x46, 0x49, 0x4D, 0x50 }
[FieldOffset(4)]
[MarshalAs(UnmanagedType.I4)]
public int Offset;
}
What I want: direct access to both the string and the int value of the Ident field, without breaking the 8-byte size of the structure and without having to compute the string from the int on every access.
Keeping Ident as an int is useful because I can quickly compare it with other idents; those may come from data unrelated to this structure, but they are in the same int format.
Question: is there a way to define a field that is not part of the structure layout? Something like:
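To illustrate the int comparison mentioned above, here is a minimal sketch. IdentCodec and ToInt are hypothetical names (not part of the original code), and it assumes the first character sits in the lowest byte, as in the "FIMP" comment on the struct.

```csharp
using System;
using System.Text;

static class IdentCodec
{
    // Hypothetical helper: packs up to 4 ASCII characters into an int,
    // padding with spaces, first character in the lowest byte.
    public static int ToInt(string ident)
    {
        byte[] b = Encoding.ASCII.GetBytes(ident.PadRight(4).Substring(0, 4));
        return b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24);
    }

    static void Main()
    {
        // 'F' = 0x46, 'I' = 0x49, 'M' = 0x4D, 'P' = 0x50
        Console.WriteLine(ToInt("FIMP") == 0x504D4946);   // True
        // Short idents compare equal to their space-padded form.
        Console.WriteLine(ToInt("DK") == ToInt("DK  "));  // True
    }
}
```

Once both sides are ints, matching two idents is a single integer comparison with no string allocation.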
[StructLayout(LayoutKind.Explicit, Size=8)]
public struct AirportHeader {
[FieldOffset(0)]
[MarshalAs(UnmanagedType.I4)]
public int Ident; // a 4 bytes ASCII : "FIMP" { 0x46, 0x49, 0x4D, 0x50 }
[FieldOffset(4)]
[MarshalAs(UnmanagedType.I4)]
public int Offset;
[NoOffset()] // <- is there something I can do the like of this
string _identStr;
public string IdentStr {
get { // EDIT ! missed the getter on this property
if (string.IsNullOrEmpty(_identStr)) _identStr =
System.Text.Encoding.ASCII.GetString(Ident.GetBytes());
// do the above only once. May use an extra private bool field to go faster.
return _identStr;
}
}
}
PS: I use pointers ('*' and '&', unsafe) because I need to deal with endianness (local system, binary files/file formats, network), fast type conversions and fast array filling. I also use many flavours of Marshal methods (fixing structures on byte arrays), and a little P/Invoke and COM interop. Too bad some of the assemblies I'm dealing with don't have a .NET counterpart yet.
TL;DR; For details only
The question above is all there is to it; I just don't know the answer. The following should pre-empt most "other approaches" or "why not do this instead" comments, and can be ignored if the answer is straightforward. I'm putting everything here up front so it's clear what I am trying to do. :)
Options/Workaround I'm currently using (or thinking of using) :
Create a getter (not a field) that computes the string value each time :
public string IdentStr {
get { return System.Text.Encoding.ASCII.GetString(Ident.GetBytes()); }
// where GetBytes() is an extension method that converts an int to byte[]
}
This approach does the job but performs poorly: the GUI displays aircraft from a database of default flights and injects other flights from the network with a refresh rate of one second (I should increase that to 5 seconds). I have around 1200 flights within an area, relating to 2400 airports (departure and arrival), meaning 2400 calls to the above code each second to display the ident in a DataGrid.
Create another struct (or class) whose only purpose is to manage data on the GUI side, when not reading from or writing to a stream or file. That means: read the data with the explicit-layout struct, create another struct with the string version of the field, and work with that in the GUI. That would perform better overall, but in the process of defining structures for the game binaries I'm already at 143 structures of this kind (just for older versions of the game data; there are a bunch I haven't written yet, and I plan to add structures for the newest data types). At the moment, more than half of them require one or more extra fields to be of meaningful use. That would be fine if I were the only one using the assembly, but other users will probably get lost among AirportHeader, AirportHeaderEx, AirportEntry, AirportEntryEx, AirportCoords, AirportCoordsEx... I would rather avoid that.
Optimize option 1 to make the computations faster (thanks to SO, there are a bunch of ideas to look at; I'm currently working on this). For the Ident field, I guess I could use pointers (and I will). I'm already doing that for fields I must display in little endian and read/write in big endian. There are other values, like 4x4 grid information packed into a single 64-bit integer (ulong), that need bit shifting to expose the actual values. The same goes for GUIDs or object pitch/bank/yaw.
Try to take advantage of overlapping fields (under study). That would work for GUIDs. Perhaps it could work for the Ident example, if MarshalAs can constrain the value to an ASCII string. Then I would just need to specify the same FieldOffset, '0' in this case. But I'm unsure whether setting the field value (entry.FieldStr = "FMEP";) actually applies the marshalling constraint on the managed side. My understanding is that the string will be stored as Unicode on the managed side (?). Furthermore, that wouldn't work for packed bits (bytes that contain several values, or consecutive bytes hosting values that have to be bit shifted). I believe it is impossible to specify a value's position, length and format at the bit level.
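For the packed values mentioned in the options above, the usual workaround is to keep the single packed field in the layout and expose the pieces through methods or properties that shift and mask. A minimal sketch follows; PackedGrid and the 4-bits-per-cell layout (cell 0 in the lowest nibble) are assumptions for illustration, not the game's actual format.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit, Size = 8)]
struct PackedGrid
{
    [FieldOffset(0)]
    public ulong Packed; // the only stored field: 8 bytes

    // Assumed layout: 16 cells of 4 bits each, row-major, cell 0 in the lowest nibble.
    public int GetCell(int row, int col)
    {
        int shift = (row * 4 + col) * 4;
        return (int)((Packed >> shift) & 0xFUL);
    }

    public void SetCell(int row, int col, int value)
    {
        int shift = (row * 4 + col) * 4;
        // Clear the target nibble, then write the low 4 bits of value into it.
        Packed = (Packed & ~(0xFUL << shift)) | (((ulong)value & 0xFUL) << shift);
    }
}

static class PackedGridDemo
{
    static void Main()
    {
        var g = new PackedGrid();
        g.SetCell(3, 3, 9);
        Console.WriteLine(g.GetCell(3, 3));                     // 9
        Console.WriteLine(g.Packed == 0x9000000000000000UL);    // True
    }
}
```

The accessors add no storage, so the struct stays at 8 bytes.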
Why bother ? context :
I'm defining a bunch of structures to parse binary data related to a game from byte arrays (IO.File.ReadAllBytes) or streams, and to write them back. Application logic should use the structures to quickly access and manipulate the data on demand. The assembly is expected to read, validate, edit, create and write, both outside the scope of the game (addon building, control) and inside it (API, live modding or monitoring). Another purpose is to understand the content of the binaries (hex) and use that understanding to build what's missing in the game.
The purpose of the assembly is to provide ready-to-use basic components for a C# addon contributor (I don't plan to make the code portable), for creating applications for the game or processing addons from source to compilation into game binaries. It's nice to have a class that loads the entire content of a file into memory, but some contexts require you not to do that and to retrieve from the file only what is necessary, hence the choice of the struct pattern.
I need to figure out the trust and legal issues (copyrighted data), but that's outside the scope of the main concern. If it matters, Microsoft has over the years provided public, freely accessible SDKs exposing the binary structures of previous versions of the game, for the very purpose of what I'm doing (I'm not the first and probably not the last to do so). Still, I wouldn't dare expose undocumented binaries (for the latest game data, for instance), nor facilitate a copyright breach on copyrighted materials/binaries.
I'm just asking for confirmation of whether or not there is a way to have private fields that are not part of the structure layout. My naive belief at the moment is "that's impossible, but there are workarounds". My C# experience is pretty sparse, so maybe I'm wrong, which is why I ask. Thanks!
As suggested, there are several ways to get the job done. Here are the getters/setters I came up with within the structure. I'll measure how each performs in various scenarios later. The dictionary approach is very appealing because in many scenarios I would need a directly accessible global database of (59000) airports with runways and parking spots (not just the Ident), but a fast check between struct fields is also interesting.
public string IdentStr_Marshal {
get {
var output = "";
GCHandle pinnedHandle = default(GCHandle); // initialised to avoid CS0165 (use of unassigned local)
try { // Fast if no exception, (very) slow if exception thrown
pinnedHandle = GCHandle.Alloc(this, GCHandleType.Pinned); // pins a boxed copy of the struct
IntPtr structPtr = pinnedHandle.AddrOfPinnedObject();
output = Marshal.PtrToStringAnsi(structPtr, 4);
// Cannot use UTF8 because the assembly should work in Framework v4.5
} finally { if (pinnedHandle.IsAllocated) pinnedHandle.Free(); }
return output;
}
set {
value = value.PadRight(4); // Must fill the blanks; PadRight returns a new string (Charlieface's suggestion)
IntPtr intValuePtr = IntPtr.Zero;
// Cannot use UTF8 because some users are on Win7 with FlightSim 2004
try { // Put a try as a matter of habit, but not convinced it's gonna throw.
intValuePtr = Marshal.StringToHGlobalAnsi(value);
Ident = Marshal.ReadInt32(intValuePtr, 0).BinaryConvertToUInt32(); // Extension method to convert type.
} finally { Marshal.FreeHGlobal(intValuePtr); } // freeing the right pointer
}
}
public unsafe string IdentStr_Pointer {
get {
string output = "";
fixed (UInt32* ident = &Ident) { // Fixing the field
sbyte* bytes = (sbyte*)ident;
output = new string(bytes, 0, 4, System.Text.Encoding.ASCII); // Encoding added (#Charlieface)
}
return output;
}
set {
// value must not exceed a length of 4 and must be in Ansi [A-Z,0-9,whitespace 0x20].
// value validation at this point occurs outside the structure.
fixed (UInt32* ident = &Ident) { // Fixing the field
byte* bytes = (byte*)ident;
byte[] asciiArr = System.Text.Encoding.ASCII.GetBytes(value);
if (asciiArr.Length >= 4) // (asciiArr.Length == 4) would also work
for (Int32 i = 0; i < 4; i++) bytes[i] = asciiArr[i];
else {
for (Int32 i = 0; i < asciiArr.Length; i++) bytes[i] = asciiArr[i];
for (Int32 i = asciiArr.Length; i < 4; i++) bytes[i] = 0x20;
}
}
}
}
static Dictionary<UInt32, string> ps_dict = new Dictionary<UInt32, string>();
public string IdentStr_StaticDict {
get {
string output; // logic update with TryGetValue (#Charlieface)
if (ps_dict.TryGetValue(Ident, out output)) return output;
output = System.Text.Encoding.ASCII.GetString(Ident.ToBytes(EndiannessType.LittleEndian));
ps_dict.Add(Ident, output);
return output;
}
set { // input can be "FMEE", "DME" or "DK". length of 2 characters is the minimum.
var bytes = new byte[4]; // Need to convert value to a 4 byte array
byte[] asciiArr = System.Text.Encoding.ASCII.GetBytes(value); // should be 4 bytes or less
// Put the valid ASCII codes in the array.
if (asciiArr.Length >= 4) // (asciiArr.Length == 4) would also work
for (Int32 i = 0; i < 4; i++) bytes[i] = asciiArr[i];
else {
for (Int32 i = 0; i < asciiArr.Length; i++) bytes[i] = asciiArr[i];
for (Int32 i = asciiArr.Length; i < 4; i++) bytes[i] = 0x20;
}
Ident = BitConverter.ToUInt32(bytes, 0); // Set structure int value
if (!ps_dict.ContainsKey(Ident)) // Add if missing
ps_dict.Add(Ident, System.Text.Encoding.ASCII.GetString(bytes));
}
}
As mentioned by others, it is not possible to exclude a field from a struct for marshalling.
You also cannot use a pointer as a string in most places.
If the number of different possible strings is relatively small (and it probably will be, given it's only 4 characters), then you could use a static Dictionary<int, string> as a kind of string-interning mechanism.
Then you write a property to add/retrieve the real string.
Note that dictionary access is O(1), and hashing an int just returns itself, so it will be very, very fast, but will take up some memory.
[StructLayout(LayoutKind.Explicit, Size=8)]
public struct AirportHeader
{
[FieldOffset(0)]
[MarshalAs(UnmanagedType.I4)]
public int Ident; // a 4 bytes ASCII : "FIMP" { 0x46, 0x49, 0x4D, 0x50 }
[FieldOffset(4)]
[MarshalAs(UnmanagedType.I4)]
public int Offset;
static Dictionary<int, string> _identStrings = new Dictionary<int, string>();
public string IdentStr =>
_identStrings.TryGetValue(Ident, out var ret) ? ret :
(_identStrings[Ident] = Encoding.ASCII.GetString(Ident.GetBytes()));
}
This is not possible because a structure must contain all of its values in a specific order. Usually this order is controlled by the CLR itself. If you want to control the order of the data, you can use StructLayout. However, you cannot exclude a field, or that data would simply not exist in memory.
Instead of a string (which is a reference type) you can use a pointer to point directly to that string and use that in your structure in combination with the StructLayout. To get this string value, you can use a get-only property that reads directly from unmanaged memory.
I am trying to read a file into structure, but failed as there was a compilation error. See what I tried:
struct file_row_struct
{
datetime file_time;
string file_range_green;
string file_range_red;
double file_dist_green_red;
double file_slope_green;
double file_slope_red;
string file_prev_color;
string file_current_color;
}filerow[];
int size = 1;
FileReader = FileOpen(file_read_path,FILE_READ|FILE_CSV,',');
if(FileReader != INVALID_HANDLE)
{
//while(!FileIsEnding(FileReader))
// linecount++;
while(!FileIsEnding(FileReader))
{
FileReadStruct(FileReader,filerow,size);
size++;
}
Print("File Opened successfully");
//PrintFormat("File path: %s\\Files\\",TerminalInfoString(TERMINAL_DATA_PATH));
FileClose(FileReader);
}
else PrintFormat("Not successful in opening file, error code: %d", GetLastError());
The gist of sample file is available at: Sample data
The compilation error that I encountered is as follows:
'filerow' - structures containing objects are not allowed NeuralExpert.mq5 108 36
Please suggest what I have done wrong. My guess is that the string members in the structure are what is not allowed.
Structures are simple types in MQL. That means you can have integer and floating values of all kinds in it (anything that casts to ulong and double) and some others. That also means you cannot have strings and other structures in it. If you have strings in the structure - you cannot pass by reference and many other problems (so it is better to say complex types are not supported in structures, you still may have them but it is your responsibility to do everything correctly).
Since you cannot pass structures by reference, you cannot use FileReadStruct().
What to do - I would suggest use of a CObject-based class and CArrayObj to hold them instead of filerow[].
class CFileRow : public CObject
{
//8 fields
public:
CFileRow(const string line)
{
//convert string line that you are to read from file into class
}
~CFileRow(){}
};
CArrayObj* fileRowArray = new CArrayObj();
while(!FileIsEnding(FileReader))
{
string line=FileReadString(FileReader);
fileRowArray.Add(new CFileRow(line));
}
Overall aim: To skip a very long field when deserializing, and when the field is accessed to read elements from it directly from the stream without loading the whole field.
Example classes: the object being serialized/deserialized is FatPropertyClass.
[ProtoContract]
public class FatPropertyClass
{
[ProtoMember(1)]
private int smallProperty;
[ProtoMember(2)]
private FatArray2<int> fatProperty;
[ProtoMember(3)]
private int[] array;
public FatPropertyClass()
{
}
public FatPropertyClass(int sp, int[] fp)
{
smallProperty = sp;
fatProperty = new FatArray2<int>(fp);
}
public int SmallProperty
{
get { return smallProperty; }
set { smallProperty = value; }
}
public FatArray2<int> FatProperty
{
get { return fatProperty; }
set { fatProperty = value; }
}
public int[] Array
{
get { return array; }
set { array = value; }
}
}
[ProtoContract]
public class FatArray2<T>
{
[ProtoMember(1, DataFormat = DataFormat.FixedSize)]
private T[] array;
private Stream sourceStream;
private long position;
public FatArray2()
{
}
public FatArray2(T[] array)
{
this.array = new T[array.Length];
Array.Copy(array, this.array, array.Length);
}
[ProtoBeforeDeserialization]
private void BeforeDeserialize(SerializationContext context)
{
position = ((Stream)context.Context).Position;
}
public T this[int index]
{
get
{
// logic to get the relevant index from the stream.
return default(T);
}
set
{
// only relevant when full array is available for example.
}
}
}
I can deserialize like so: FatPropertyClass d = model.Deserialize(fileStream, null, typeof(FatPropertyClass), new SerializationContext() {Context = fileStream}) as FatPropertyClass; where the model can be for example:
RuntimeTypeModel model = RuntimeTypeModel.Create();
MetaType mt = model.Add(typeof(FatPropertyClass), false);
mt.AddField(1, "smallProperty");
mt.AddField(2, "fatProperty");
mt.AddField(3, "array");
MetaType mtFat = model.Add(typeof(FatArray2<int>), false);
This will skip the deserialization of array in FatArray2<T>. However, I then need to read random elements from that array at a later time. One thing I tried is to remember the stream position before deserialization in the BeforeDeserialize(SerializationContext context) method of FatArray2<T>, as in the above code: position = ((Stream)context.Context).Position;. However, this always seems to be the end of the stream.
How can I remember the stream position where FatProperty begins, and how can I read from it at a random index?
Note: The parameter T in FatArray2<T> can be of other types marked with [ProtoContract], not just primitives. Also, there could be multiple properties of type FatArray2<T> at various depths in the object graph.
Method 2: Serialize the FatArray2<T> field after the serialization of the containing object. So, serialize FatPropertyClass with a length prefix, then serialize with a length prefix all the fat arrays it contains. Mark all of these fat array properties with an attribute, and at deserialization we can remember the stream position for each of them.
Then the question is how do we read primitives out of it? This works OK for classes using T item = Serializer.DeserializeItems<T>(sourceStream, PrefixStyle.Base128, Serializer.ListItemTag).Skip(index).Take(1).ToArray(); to get the item at index index. But how does this work for primitives? An array of primitives does not seem to be able to be deserialized using DeserializeItems.
Is DeserializeItems with LINQ used like that even OK? Does it do what I assume it does (internally skip through the stream to the correct element - at worst reading each length prefix and skipping it)?
Regards,
Iulian
This question depends an awful lot on the actual model - it isn't a scenario that the library specifically targets to make convenient. I suspect that your best bet here would be to write the reader manually using ProtoReader. Note that there are some tricks when it comes to reading selected items if the outermost object is a List<SomeType> or similar, but internal objects are typically either simply read or skipped.
By starting again from the root of the document via ProtoReader, you could seek fairly efficiently to the nth item. I can do a concrete example later if you like (I haven't leapt in yet, in case it wouldn't actually be useful). For reference, the reason the stream's position isn't useful here is that the library aggressively over-reads and buffers data, unless you specifically tell it to limit its length. This is because data like "varint" is hard to read efficiently without lots of buffering, as it would end up being a lot of individual calls to ReadByte(), rather than just working with a local buffer.
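To make the buffering point concrete, here is a rough sketch of decoding one base-128 varint straight from a Stream. This is not protobuf-net's code; Varint.TryRead is a hypothetical helper written only to show that an unbuffered reader needs one ReadByte() call per encoded byte.

```csharp
using System;
using System.IO;

static class Varint
{
    // Decodes one unsigned base-128 varint.
    // Returns false if the stream ends before the first byte.
    public static bool TryRead(Stream s, out ulong value)
    {
        value = 0;
        int shift = 0;
        while (true)
        {
            int b = s.ReadByte(); // one call per byte when reading unbuffered
            if (b < 0)
            {
                if (shift == 0) return false;
                throw new EndOfStreamException();
            }
            value |= (ulong)(b & 0x7F) << shift;  // low 7 bits carry data
            if ((b & 0x80) == 0) return true;     // high bit clear: last byte
            shift += 7;
            if (shift > 63) throw new InvalidDataException("varint too long");
        }
    }

    static void Main()
    {
        ulong v;
        TryRead(new MemoryStream(new byte[] { 0xAC, 0x02 }), out v);
        Console.WriteLine(v); // 300
    }
}
```

Because the length of each value is only known after reading it, a reader that pulls large chunks into a local buffer will normally have consumed more of the stream than it has decoded, which is why Position is not a reliable marker.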
This is a completely untested version of reading the n-th array item of the sub-property directly from a reader; note that it would be inefficient to call this lots of times one after the other, but it should be obvious how to change it to read a range of consecutive values, etc:
static int? ReadNthArrayItem(Stream source, int index, int maxLen)
{
using (var reader = new ProtoReader(source, null, null, maxLen))
{
int field, count = 0;
while ((field = reader.ReadFieldHeader()) > 0)
{
switch (field)
{
case 2: // fat property; a sub object
var tok = ProtoReader.StartSubItem(reader);
while ((field = reader.ReadFieldHeader()) > 0)
{
switch (field)
{
case 1: // the array field
if(count++ == index)
return reader.ReadInt32();
reader.SkipField();
break;
default:
reader.SkipField();
break;
}
}
ProtoReader.EndSubItem(tok, reader);
break;
default:
reader.SkipField();
break;
}
}
}
return null;
}
Finally, note that if this is a large array, you might want to use "packed" arrays (see the protobuf documentation, but this basically stores them without the header per-item). This would be a lot more efficient, but note that it requires slightly different reading code. You enable packed arrays by adding IsPacked = true onto the [ProtoMember(...)] for that array.
BinaryFormatter is behaving in a weird way in my code. I have code like the following:
[Serializable]
public class LogEntry
{
private int id;
private List<object> data = new List<object>();
public int Id
{
get { return id; }
}
public IList<object> Data
{
get { return data.AsReadOnly(); }
}
...
}
....
....
private static readonly BinaryFormatter logSerializer = new BinaryFormatter();
....
....
public void SerializeLog(IList<LogEntry> logEntries)
{
using (MemoryStream serializationStream = new MemoryStream())
{
logSerializer.Serialize(serializationStream, logEntries);
this.binarySerializedLog = serializationStream.GetBuffer();
}
}
On some machines (32- or 64-bit) it serializes in binary format, which is expected. But on other machines (all of them 64-bit, and only with non-debug builds) it is not serializing: binarySerializedLog shows the ToString() value of each individual Data item, the class name (...LogEntry) and the id value. Is there a specific reason for this behavior, or am I making a mistake? Thanks in advance.
Your question isn't very clear (can you define "not serializing"?), but some thoughts:
You should really use ToArray() to capture the buffer, not GetBuffer() (which is cheaper, but returns the oversized array, and should only be used in conjunction with Length).
Where are you seeing this .ToString()? BinaryFormatter writes the object's type, then either uses reflection to write the fields (for [Serializable]) or uses custom serialization (for ISerializable). It never calls .ToString() (unless that is what your ISerializable implementation does). However, strings etc. will be in the output "as is".
Note that BinaryFormatter can be brittle between versions, so be careful if you are keeping this data for any length of time (it is generally fine for transport, though, assuming you update both ends at the same time). If you know in advance what your .Data objects are, there are a range of contract-based serializers that might provide more stability. I can provide more specific help if you think this would be worth investigating.
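On the ToArray() versus GetBuffer() point above, a small sketch of the difference (BufferDemo is just an illustrative name):

```csharp
using System;
using System.IO;

static class BufferDemo
{
    // Returns a copy containing exactly the bytes written so far.
    public static byte[] Exact(MemoryStream ms) { return ms.ToArray(); }

    static void Main()
    {
        using (var ms = new MemoryStream())
        {
            ms.Write(new byte[] { 1, 2, 3 }, 0, 3);
            Console.WriteLine(Exact(ms).Length);                    // 3
            // The internal buffer is at least as long as the data, usually longer,
            // and everything past ms.Length is unused padding.
            Console.WriteLine(ms.GetBuffer().Length >= ms.Length);  // True
        }
    }
}
```

If the GetBuffer() array is stored or sent as is, the trailing unused bytes go with it, which can make the payload look different from what was expected.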
For any arbitrary instance (collections of different objects, compositions, single objects, etc)
How can I determine its size in bytes?
(I've currently got a collection of various objects and I'm trying to determine its aggregated size.)
EDIT: Has someone written an extension method for Object that could do this? That'd be pretty neat imo.
First of all, a warning: what follows is strictly in the realm of ugly, undocumented hacks. Do not rely on this working - even if it works for you now, it may stop working tomorrow, with any minor or major .NET update.
You can use the information in this article on CLR internals MSDN Magazine Issue 2005 May - Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects - last I checked, it was still applicable. Here's how this is done (it retrieves the internal "Basic Instance Size" field via TypeHandle of the type).
object obj = new List<int>(); // whatever you want to get the size of
RuntimeTypeHandle th = obj.GetType().TypeHandle;
int size = *(*(int**)&th + 1);
Console.WriteLine(size);
This works on 3.5 SP1 32-bit. I'm not sure if field sizes are the same on 64-bit - you might have to adjust the types and/or offsets if they are not.
This will work for all "normal" types, for which all instances have the same, well-defined size. Those for which this isn't true are arrays and strings for sure, and I believe also StringBuilder. For them you'll have to add the size of all contained elements to their base instance size.
You may be able to approximate the size by pretending to serialize it with a binary serializer (but routing the output to oblivion) if you're working with serializable objects.
class Program
{
static void Main(string[] args)
{
A parent;
parent = new A(1, "Mike");
parent.AddChild("Greg");
parent.AddChild("Peter");
parent.AddChild("Bobby");
System.Runtime.Serialization.Formatters.Binary.BinaryFormatter bf =
new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
SerializationSizer ss = new SerializationSizer();
bf.Serialize(ss, parent);
Console.WriteLine("Size of serialized object is {0}", ss.Length);
}
}
[Serializable()]
class A
{
int id;
string name;
List<B> children;
public A(int id, string name)
{
this.id = id;
this.name = name;
children = new List<B>();
}
public B AddChild(string name)
{
B newItem = new B(this, name);
children.Add(newItem);
return newItem;
}
}
[Serializable()]
class B
{
A parent;
string name;
public B(A parent, string name)
{
this.parent = parent;
this.name = name;
}
}
class SerializationSizer : System.IO.Stream
{
private int totalSize;
public override void Write(byte[] buffer, int offset, int count)
{
this.totalSize += count;
}
public override bool CanRead
{
get { return false; }
}
public override bool CanSeek
{
get { return false; }
}
public override bool CanWrite
{
get { return true; }
}
public override void Flush()
{
// Nothing to do
}
public override long Length
{
get { return totalSize; }
}
public override long Position
{
get
{
throw new NotImplementedException();
}
set
{
throw new NotImplementedException();
}
}
public override int Read(byte[] buffer, int offset, int count)
{
throw new NotImplementedException();
}
public override long Seek(long offset, System.IO.SeekOrigin origin)
{
throw new NotImplementedException();
}
public override void SetLength(long value)
{
throw new NotImplementedException();
}
}
This doesn't directly answer the question, but for those who are interested in investigating object sizes while debugging:
Start debugging in VS, make sure the Diagnostics Tools window is shown (Debug > Windows > Show Diagnostic Tools)
Set a breakpoint (optional)
Click Take Snapshot in the Memory Usage while paused
Explore the snapshot (optionally sort the object list alphabetically to find the type you're interested in)
For unmanaged types aka value types, structs:
Marshal.SizeOf(object);
For managed objects, the closest I got is an approximation.
long start_mem = GC.GetTotalMemory(true);
aclass[] array = new aclass[1000000];
for (int n = 0; n < 1000000; n++)
array[n] = new aclass();
double used_mem_median = (GC.GetTotalMemory(false) - start_mem)/1000000D;
Do not use serialization. A binary formatter adds headers so that you can change your class and still load an old serialized file into the modified class.
It also won't tell you the real size in memory, nor will it take memory alignment into account.
[Edit]
By calling BitConverter.GetBytes(prop-value) recursively on every property of your class you would get the contents in bytes. That doesn't count the overhead of the class or of references, but it is much closer to reality.
If size matters, I would recommend using a byte array for the data and an unmanaged proxy class that accesses the values through pointer casting. Note that this memory would be unaligned, so it will be slow on old computers, but for huge datasets on modern RAM it will be considerably faster, because minimizing the amount read from RAM has a bigger impact than the alignment penalty.
A safe solution with some optimizations:
CyberSaving/MemoryUsage code.
Some cases:
/* test nullable type */
TestSize<int?>.SizeOf(null) //-> 4 B
/* test StringBuilder */
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 100; i++) sb.Append("わたしわたしわたしわ");
TestSize<StringBuilder>.SizeOf(sb ) //-> 3132 B
/* test Simple array */
TestSize<int[]>.SizeOf(new int[100]); //-> 400 B
/* test Empty List<int>*/
var list = new List<int>();
TestSize<List<int>>.SizeOf(list); //-> 205 B
/* test List<int> with 100 items*/
for (int i = 0; i < 100; i++) list.Add(i);
TestSize<List<int>>.SizeOf(list); //-> 717 B
It works also with classes:
class twostring
{
public string a { get; set; }
public string b { get; set; }
}
TestSize<twostring>.SizeOf(new twostring() { a="0123456789", b="0123456789" }); //-> 28 B
This doesn't apply to the current .NET implementation, but one thing to keep in mind with garbage collected/managed runtimes is the allocated size of an object can change throughout the lifetime of the program. For example, some generational garbage collectors (such as the Generational/Ulterior Reference Counting Hybrid collector) only need to store certain information after an object is moved from the nursery to the mature space.
This makes it impossible to create a reliable, generic API to expose the object size.
This is impossible to do at runtime.
There are various memory profilers that display object size, though.
EDIT: You could write a second program that profiles the first one using the CLR Profiling API and communicates with it through remoting or something.
For anyone looking for a solution that doesn't require [Serializable] classes and where the result is an approximation instead of exact science.
The best method I could find is json serialization into a memorystream using UTF32 encoding.
private static long? GetSizeOfObjectInBytes(object item)
{
if (item == null) return 0;
try
{
// hackish solution to get an approximation of the size
var jsonSerializerSettings = new JsonSerializerSettings
{
DateFormatHandling = DateFormatHandling.IsoDateFormat,
DateTimeZoneHandling = DateTimeZoneHandling.Utc,
MaxDepth = 10,
ReferenceLoopHandling = ReferenceLoopHandling.Ignore
};
var formatter = new JsonMediaTypeFormatter { SerializerSettings = jsonSerializerSettings };
using (var stream = new MemoryStream()) {
formatter.WriteToStream(item.GetType(), item, stream, Encoding.UTF32);
return stream.Length / 4; // 32 bits per character = 4 bytes per character
}
}
catch (Exception)
{
return null;
}
}
No, this won't give you the exact size that would be used in memory. As previously mentioned, that is not possible. But it'll give you a rough estimation.
Note that this is also pretty slow.
Use Son of Strike (SOS), which has an ObjSize command.
Note that the actual memory consumed is always larger than what ObjSize reports, due to a syncblk which resides directly before the object data.
Read more about both here MSDN Magazine Issue 2005 May - Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects.
AFAIK, you cannot, without actually deep-counting the size of each member in bytes. But again, does the size of a member (like elements inside a collection) count towards the size of the object, or a pointer to that member count towards the size of the object? Depends on how you define it.
I have run into this situation before where I wanted to limit the objects in my cache based on the memory they consumed.
Well, if there is some trick to do that, I'd be delighted to know about it!
For value types, you can use Marshal.SizeOf. Of course, it returns the number of bytes required to marshal the structure in unmanaged memory, which is not necessarily what the CLR uses.
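A quick way to see that the marshalled size and the managed size can differ is bool. This is only a sketch; SizeDemo and MarshaledBoolSize are illustrative names.

```csharp
using System;
using System.Runtime.InteropServices;

static class SizeDemo
{
    // Size the marshaller would use for a bool in unmanaged memory.
    public static int MarshaledBoolSize() { return Marshal.SizeOf(typeof(bool)); }

    static void Main()
    {
        Console.WriteLine(sizeof(bool));        // 1: size of a managed bool
        Console.WriteLine(MarshaledBoolSize()); // 4: marshalled as a Win32 BOOL by default
    }
}
```

So a struct containing bool fields can report a larger size from Marshal.SizeOf than it actually occupies on the managed side.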
I have created benchmark test for different collections in .NET: https://github.com/scholtz/TestDotNetCollectionsMemoryAllocation
Results are as follows for .NET Core 2.2 with 1,000,000 of objects with 3 properties allocated:
Testing with string: 1234567
Hashtable<TestObject>: 184 672 704 B
Hashtable<TestObjectRef>: 136 668 560 B
Dictionary<int, TestObject>: 171 448 160 B
Dictionary<int, TestObjectRef>: 123 445 472 B
ConcurrentDictionary<int, TestObject>: 200 020 440 B
ConcurrentDictionary<int, TestObjectRef>: 152 026 208 B
HashSet<TestObject>: 149 893 216 B
HashSet<TestObjectRef>: 101 894 384 B
ConcurrentBag<TestObject>: 112 783 256 B
ConcurrentBag<TestObjectRef>: 64 777 632 B
Queue<TestObject>: 112 777 736 B
Queue<TestObjectRef>: 64 780 680 B
ConcurrentQueue<TestObject>: 112 784 136 B
ConcurrentQueue<TestObjectRef>: 64 783 536 B
ConcurrentStack<TestObject>: 128 005 072 B
ConcurrentStack<TestObjectRef>: 80 004 632 B
For memory tests, I found the best method to use is:
GC.GetAllocatedBytesForCurrentThread()
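A minimal sketch of how that call can be used to measure one piece of code, assuming a runtime that has the method (.NET Core 3.0+ or .NET Framework 4.8). AllocDemo.Measure is a hypothetical helper, not something from the benchmark repository.

```csharp
using System;

static class AllocDemo
{
    // Hypothetical helper: bytes allocated on the current thread while 'action' runs.
    public static long Measure(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    static void Main()
    {
        byte[] keep = null;
        long bytes = Measure(() => keep = new byte[1000]);
        // At least the 1000 payload bytes plus the array header should be counted.
        Console.WriteLine(bytes);
        GC.KeepAlive(keep);
    }
}
```

The counter only ever increases and is not affected by collections, so the difference reflects what was allocated, not what is still alive afterwards.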
For arrays of structs/values, I have different results with:
first = Marshal.UnsafeAddrOfPinnedArrayElement(array, 0).ToInt64();
second = Marshal.UnsafeAddrOfPinnedArrayElement(array, 1).ToInt64();
arrayElementSize = second - first;
(oversimplified example)
Whatever the approach, you really need to understand how .Net works to correctly interpret the results.
For instance, the returned element size is the "aligned" element size, with some padding.
The overhead and thus the size is different depending on the usage of a type: "boxed" on the GC heap, on the stack, as a field, as an array element.
(I wanted to know what would be the memory impact of using "dummy" empty structs (without any field) to mimic "optional" arguments of generics; making tests with different layouts involving empty structs, I can see that an empty struct uses (at least) 1 byte per element; I vaguely remember it is because .Net needs a different address for each field, which wouldn't work if a field really was empty/0-sized).
You can use reflection to gather all the public member or property information (given the object's type). There is no way to determine the size without walking through each individual piece of data on the object, though.
From Pavel and jnm2:
private int DumpApproximateObjectSize(object toWeight)
{
return Marshal.ReadInt32(toWeight.GetType().TypeHandle.Value, 4);
}
On a side note, be careful, because it only works with objects that occupy contiguous memory.
Simplest way is: int size = *((int*)type.TypeHandle.Value + 1)
I know this is an implementation detail, but the GC relies on it, and it needs to be as close to the start of the method table as possible for efficiency. Considering how complex the GC code is, nobody will dare to change it in the future. In fact it works for every minor/major version of .NET Framework and .NET Core. (Currently unable to test for 1.0.)
If you want a more reliable way, emit a struct in a dynamic assembly with [StructLayout(LayoutKind.Auto)] and exactly the same fields in the same order, then take its size with the sizeof IL instruction. You may want to emit a static method within the struct that simply returns this value. Then add 2*IntPtr.Size for the object header. This should give you the exact value.
But if your class derives from another class, you need to find the size of each base class separately and add them up, plus 2*IntPtr.Size again for the header. You can do this by getting the fields with the BindingFlags.DeclaredOnly flag.
Arrays and strings just add length * element size to that size.
For the cumulative size of aggregate objects you need a more sophisticated solution that visits every field and inspects its contents.
For anyone looking for a rough approximation comparing the sizes of disparate object graphs/collections, just serialize to JSON - e.g.:
Console.WriteLine($"Size1:\t{(JsonConvert.SerializeObject(someBusyObject)).Length}");
Console.WriteLine($"Size2:\t{(JsonConvert.SerializeObject(someOtherObject)).Length}");
In my case I have a bunch of IEnumerable's being pulled during a login I'm benchmarking, and I just wanted to roughly size them to see their relative weight.
They're expensive operations and won't give you direct heap allocation size or anything like that, but it was good enough for my use case and was readily available.