How does BinaryFormatter serialize objects? - C#

BinaryFormatter is behaving in a weird way in my code. I have code like the following:
[Serializable]
public class LogEntry
{
private int id;
private List<object> data = new List<object>();
public int Id
{
get { return id; }
}
public IList<object> Data
{
get { return data.AsReadOnly(); }
}
...
}
....
....
private static readonly BinaryFormatter logSerializer = new BinaryFormatter();
....
....
public void SerializeLog(IList<LogEntry> logEntries)
{
using (MemoryStream serializationStream = new MemoryStream())
{
logSerializer.Serialize(serializationStream, logEntries);
this.binarySerializedLog = serializationStream.GetBuffer();
}
}
On some machines (32- or 64-bit), it serializes in binary format, which is expected. But on some machines (all of them 64-bit, and only with non-debug builds) it is not serializing: binarySerializedLog contains the ToString() value of each individual Data item, the class name (...LogEntry), and the id value. My question is: is there a specific reason for this behavior, or am I making some mistake? Thanks in advance.

Your question isn't very clear (can you define "not serializing"?), but some thoughts:
You should really use ToArray() to capture the buffer, not GetBuffer() (which is cheaper, but returns the oversized array, and should only be used in conjunction with Length) - see the sketch after these notes.
Where are you seeing this .ToString()? BinaryFormatter writes the object's type, then either uses reflection to write the fields (for [Serializable]) or uses custom serialization (for ISerializable). It never calls .ToString() (unless that is what your ISerializable does). However, strings etc. will be in the output "as is".
Note that BinaryFormatter can be brittle between versions, so be careful if you are keeping this data for any length of time (it is generally fine for transport, though, assuming you update both ends at the same time). If you know in advance what your .Data objects are, there are a range of contract-based serializers that might provide more stability. I can provide more specific help if you think this would be worth investigating.
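A minimal sketch of the SerializeLog method using ToArray() instead of GetBuffer(), assuming binarySerializedLog is a byte[] field as in the question:
public void SerializeLog(IList<LogEntry> logEntries)
{
    using (MemoryStream serializationStream = new MemoryStream())
    {
        logSerializer.Serialize(serializationStream, logEntries);
        // ToArray() returns a copy trimmed to the actual length, whereas
        // GetBuffer() exposes the oversized internal buffer.
        this.binarySerializedLog = serializationStream.ToArray();
    }
}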

Related

Serializing an Array in WCF with protobuf-net

I am trying to serialize an array of my dataobjects through WCF with protobuf-net.
If I serialize the Array of my dataobjects manually, it works successfully:
var proto = Serializer.CreateFormatter<DataType[]>();
which is way faster and smaller than the ordinary binary XML DataContractSerializer - that's why I want to use it!
The 'DataType' class is just an example - I have many of those. When the response of my service is just a single object, everything works just fine.
But when my service returns an Array of objects it seems it does not know what to do and uses the ordinary DataContractSerializer.
The ProtoBehavior is applied:
endpoint.Behaviors.Add(new ProtoBuf.ServiceModel.ProtoEndpointBehavior());
My dataobject is more or less like that:
[Serializable]
[DataContract]
[ProtoContract]
public class DataType
{
[DataMember(EmitDefaultValue = false, Name = "K")]
[ProtoMember(1)]
public string Key { get; set; }
// many more to come
}
and that's basically my service:
[ServiceContract(CallbackContract = typeof(IBaseDataObjectUpdate), SessionMode = SessionMode.Required)]
[ServiceKnownType("GetKnownTypes", typeof(KnownTypesProvider))]
public interface IDataTypeService
{
[OperationContract]
DataType[] Load(Filter[] filter, Guid clientGuid);
// some more
}
I could track it down to the TryCreate in the XmlProtoSerializer. The call:
int key = GetKey(model, ref type, out isList);
does not return a valid key, therefore no XmlProtoSerializer is created.
That explains the behavior, but what are my options here?
I found an old answer from Marc Gravell where he suggests creating an object that wraps the Array. But as it is from 2011, it might be outdated:
https://stackoverflow.com/a/6270267/2243584
Or can I add the model to protobuf-net somehow manually? As mentioned above the manual serialization is working.
Any comment is appreciated!
OK, so far I have come up with 2 solutions.
1. Do not use Arrays! It works with any other collection - which caused me to investigate and led to solution 2.
2. Support Arrays in protobuf-net.
I have adapted the method internal static Type GetListItemType(TypeModel model, Type listType) in the TypeModel class as follows:
if (listType.IsArray) // NEW
{
if (listType.GetElementType() == typeof(byte))
return null;
}
if (listType == model.MapType(typeof(string)) // || listType.IsArray // CHANGED!!!
|| !model.MapType(typeof(IEnumerable)).IsAssignableFrom(listType)) return null;
I think I figured out why arrays are excluded: if you support byte[], you get some problems when finally sending the data to the wire. At least I got some Asserts and an exception in the encode factory when dealing with byte[].
As I have no idea about the side effects of solution Nr. 2, I am sticking with solution Nr. 1.
Nevertheless I am quite keen on a comment from Marc - of course, everybody is welcome!
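For reference, a rough sketch of what solution Nr. 1 (or the wrapper idea from the linked 2011 answer) can look like in code; DataTypeBatch is a hypothetical name and not part of the original service:
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;
using System.ServiceModel;
using ProtoBuf;

// Option A: return a non-array collection from the operation.
[ServiceContract]
public interface IDataTypeServiceAlt
{
    [OperationContract]
    List<DataType> Load(Filter[] filter, Guid clientGuid);
}

// Option B: wrap the array in a contract type of its own, so protobuf-net's
// XmlProtoSerializer can resolve a model key for the return type.
[Serializable]
[DataContract]
[ProtoContract]
public class DataTypeBatch
{
    [DataMember(EmitDefaultValue = false, Name = "I")]
    [ProtoMember(1)]
    public DataType[] Items { get; set; }
}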

Objects with many value checks in C#

I want to see your ideas on an efficient way to check the values of a newly deserialized object.
For example, I have an XML document I have deserialized into an object, and now I want to do value checks. The first and most basic idea I can think of is to use nested if statements and check each property - anything from checking that one value has the correct URL format, to checking that another property's value is a date in the correct range, etc.
So my question is: how would people do checks on all values in an object? Type checks are not important, as those are already taken care of; it is more about the value itself. It needs to work for quite large objects, which is why I did not really want to use nested if statements.
Edit:
I want to achieve complete value validation on all properties in a given object.
I want to check the value itself, not just that it is not null. I want to check the value for specific things. Say I have an object with many properties, one of which is of type string and named Homepage.
I want to be able to check that the string is in the correct URL format, and fail if not. This is just one example; in the same object I could check that a date is in a given range, and if any check fails I will return false or some other form of failure.
I am using C# and .NET 4.
Try fluent validation: it separates concerns and keeps the validation configuration out of your object.
public class Validator<T>
{
List<Func<T,bool>> _verifiers = new List<Func<T, bool>>();
public void AddPropertyValidator(Func<T, bool> propValidator)
{
_verifiers.Add(propValidator);
}
public bool IsValid(T objectToValidate)
{
try {
return _verifiers.All(pv => pv(objectToValidate));
} catch(Exception) {
return false;
}
}
}
class ExampleObject {
public string Name {get; set;}
public int BirthYear { get;set;}
}
public static void Main(string[] args)
{
var validator = new Validator<ExampleObject>();
validator.AddPropertyValidator(o => !string.IsNullOrEmpty(o.Name));
validator.AddPropertyValidator(o => o.BirthYear > 1900 && o.BirthYear < DateTime.Now.Year );
validator.AddPropertyValidator(o => o.Name.Length > 3);
bool isValid = validator.IsValid(new ExampleObject());
}
I suggest using AutoMapper with a ValueResolver. You can deserialize the XML into an object in a very elegant way using AutoMapper, and check whether the values you get are valid with a ValueResolver.
You can use a base ValueResolver that checks for nulls or invalid casts, and some custom resolvers that check whether the values you get are correct.
It might not be exactly what you are looking for, but I think it's an elegant way to do it.
Check this out here: http://dannydouglass.com/2010/11/06/simplify-using-xml-data-with-automapper-and-linqtoxml
In functional languages, such as Haskell, your problem could be solved with the Maybe-monad:
The Maybe monad embodies the strategy of combining a chain of
computations that may each return Nothing by ending the chain early if
any step produces Nothing as output. It is useful when a computation
entails a sequence of steps that depend on one another, and in which
some steps may fail to return a value.
Replace Nothing with null, and the same thing applies for C#.
There are several ways to try and solve the problem, none of them are particularly pretty. If you want a runtime-validation that something is not null, you could use an AOP framework to inject null-checking code into your type. Otherwise you would really have to end up doing nested if checks for null, which is not only ugly, it will probably violate the Law of Demeter.
As a compromise, you could use a Maybe-monad like set of extension methods, which would allow you to query the object, and choose what to do in case one of the properties is null.
Have a look at this article by Dmitri Nesteruk: http://www.codeproject.com/Articles/109026/Chained-null-checks-and-the-Maybe-monad
Hope that helps.
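As a rough illustration of that style (a hypothetical helper, not code from the linked article):
using System;

public static class MaybeExtensions
{
    // Continue the chain only if the input is non-null; otherwise propagate null.
    public static TResult With<TInput, TResult>(this TInput input, Func<TInput, TResult> evaluator)
        where TInput : class
        where TResult : class
    {
        return input == null ? null : evaluator(input);
    }
}

// Usage: no NullReferenceException even if Customer or HomePage is null.
// string homepage = order.With(o => o.Customer).With(c => c.HomePage);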
I assume your question is: How do I efficiently check whether my object is valid?
If so, it does not matter that your object was just deserialized from some text source. If your question regards checking the object while deserializing to quickly stop deserializing if an error is found, that is another issue and you should update your question.
Validating an object efficiently is not often discussed when it comes to C# and administrative tools. The reason is that it is very quick no matter how you do it. It is more common to discuss how to do the checks in a manner that is easy to read and easily maintained.
Since your question is about efficiency, here are some ideas:
If you have a huge number of objects to be checked and performance is of key importance, you might want to change your objects into arrays of data so that they can be checked in a consistent manner. Example:
Instead of having MyObject[] MyObjects where MyObject has a lot of properties, break out each property and put them into an array like this:
int[] MyFirstProperties
float[] MySecondProperties
This way, the loop that traverses the list and checks the values, can be as quick as possible and you will not have many cache misses in the CPU cache, since you loop forward in the memory. Just be sure to use regular arrays or lists that are not implemented as linked lists, since that is likely to generate a lot of cache misses.
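A minimal sketch of what the checks over those parallel arrays could look like (the data and the specific checks are purely illustrative):
int[] myFirstProperties = { 3, 7, -1, 12 };                // illustrative data
float[] mySecondProperties = { 1.5f, 2.0f, 8.25f, 3.0f };  // illustrative data

bool allOk = true;
for (int i = 0; i < myFirstProperties.Length && allOk; i++)
{
    // Forward, sequential reads over flat arrays keep CPU cache misses low.
    if (myFirstProperties[i] < 0) allOk = false;
    else if (mySecondProperties[i] > 100f) allOk = false;
}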
If you do not want to break up your objects into arrays of properties, it seems that top speed is not of interest but almost top speed. Then, your best bet is to keep your objects in a serial array and do:
bool wasOk = true;
foreach (MyObject obj in MyObjects)
{
if (obj.MyFirstProperty == someBadValue)
{
wasOk = false;
break;
}
if (obj.MySecondProperty == someOtherBadValue)
{
wasOk = false;
break;
}
}
This checks whether all your objects' properties are ok. I am not sure what your case really is but I think you get the point. Speed is already great when it comes to just checking properties of an object.
If you do string compares, make sure that you use x == y where possible, instead of more sophisticated string compares, since == has a few quick opt-outs: if either side is null it returns immediately, if the references are the same the strings are equal, and a few more clever things if I remember correctly. For any Java guy reading this, do not do this in Java! It will work sometimes but not always.
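For example (the exact shortcuts taken by == are a runtime implementation detail):
using System;

string x = "abc", y = "abd";
bool equal = x == y;    // fast paths first: same reference, or either side null
// When culture must not influence the result, an explicit ordinal compare
// also avoids any culture-sensitive work:
bool ordinal = string.Equals(x, y, StringComparison.Ordinal);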
If I did not answer your question, you need to improve your question.
I'm not certain I understand the depth of your question, but wouldn't you just do something like this?
public class SomeClass
{
private const string UrlValidatorRegex = "http://...
private static readonly DateTime MinValidSomeDate = ...
private static readonly DateTime MaxValidSomeDate = ...
public string SomeUrl { get; set; }
public DateTime SomeDate { get; set; }
...
private ValidationResult ValidateProperties()
{
var urlValidator = new Regex(UrlValidatorRegex);
if (!urlValidator.IsMatch(this.SomeUrl))
{
return new ValidationResult
{
IsValid = false,
Message = "SomeUrl format invalid."
};
}
if (this.SomeDate < MinValidSomeDate
|| this.SomeDate > MaxValidSomeDate)
{
return new ValidationResult
{
IsValid = false,
Message = "SomeDate outside permitted bounds."
};
}
...
// Check other fields and properties here, return false on failure.
...
return new ValidationResult
{
IsValid = true,
};
}
...
private struct ValidationResult
{
public bool IsValid;
public string Message;
}
}
The exact validation code would vary depending on how you would like your class to work, no? Consider a property of a familiar type:
public string SomeString { get; set; }
What are the valid values for this property? Both null and string.Empty may or may not be valid depending on the class declaring the property. There may be a maximum length that should be allowed, but these details would vary by implementation.
If any suggested answer is more complicated than the code above without offering an increase in performance or functionality, can it really be more efficient?
Is your question actually, how can I check the values on an object without having to write much code?

Protobuf-net r282 having problems deserializing object serialized with r249

I've just updated from r249 to r282. Other than replacing the dll I've made no changes. Unfortunately, now deserializing the objects created before the update takes significantly longer. What used to take two seconds now takes five minutes.
Were there syntax changes between versions? Is there anything it no longer supports?
My classes are all using ProtoContract, ProtoMember, and ProtoInclude. I am running VS2010. As far as I was concerned there were no problems with my protocol buffer code. I'm only trying to upgrade because I figured it's good to have the most recent version.
Edit - 2010.09.09
One of the properties of my object is an array of ushorts. I've just noticed that this property did not serialize/deserialize properly with r282. The resulting values of the array are all zeros. The array had values before being serialized (r282) but not after deserialization (r282).
It turns out that despite my efforts, yes there was a breaking change in data format in one of the earlier builds. This only impacts ushort data, which was omitted from the handling at one point. This is regrettable, but the good news is that no data is lost - it is simply a bit inconvenient to access (it is essentially written via a string at the moment).
Here's my suggested workaround; for a member like:
[ProtoBuf.ProtoMember(1)]
public ushort[] Data {get;set;}
Replace that with:
[ProtoBuf.ProtoMember(1)]
private string[] LegacyData {get;set;}
private bool LegacyDataSpecified { get { return false; } set { } }
/* where 42 is just an unused new field number */
[ProtoBuf.ProtoMember(42, Options = MemberSerializationOptions.Packed)]
public ushort[] Data { get; set; }
[ProtoBuf.ProtoAfterDeserialization]
private void SerializationCallback()
{
if (LegacyData != null && LegacyData.Length > 0)
{
ushort[] parsed = Array.ConvertAll<string, ushort>(
LegacyData, ushort.Parse);
if (Data != null && Data.Length > 0)
{
int oldLen = parsed.Length;
Array.Resize(ref parsed, parsed.Length + Data.Length);
Array.Copy(Data, 0, parsed, oldLen, Data.Length);
}
Data = parsed;
}
LegacyData = null;
}
This imports old-style data into LegacyData and merges it during (after) deserialization, or writes new-style data from Data. Faster, smaller, and supports both old and new data.

Enum and performance

My app has a lot of different lookup values, these values don't ever change, e.g. US States. Rather than putting them into database tables, I'd like to use enums.
But I do realize doing it this way involves having a few enums and a lot of casting between "int"/"string" and my enums.
Alternatively, I see someone mentioned using a Dictionary<> as a lookup table, but the enum implementation seems cleaner.
So I'd like to ask: would keeping and passing around a lot of enums and casting them be a problem for performance, or should I use the lookup table approach - which performs better?
Edit: The casting is needed because the values are stored as IDs in other database tables.
Casting from int to an enum is extremely cheap... it'll be faster than a dictionary lookup. Basically it's a no-op, just copying the bits into a location with a different notional type.
Parsing a string into an enum value will be somewhat slower.
I doubt that this is going to be a bottleneck for you however you do it though, to be honest... without knowing more about what you're doing, it's somewhat hard to recommend anything beyond the normal "write the simplest, most readable and maintainable code which will work, then check that it performs well enough."
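As a small illustration (the States enum here is just a stand-in defined for the example):
using System;

enum States { AL = 1, AK = 2, AZ = 3 /* illustrative subset */ }

class EnumCastDemo
{
    static void Main()
    {
        // Casting to/from the underlying int is essentially a no-op:
        States s = (States)2;        // States.AK
        int id = (int)States.AZ;     // 3

        // Parsing from a string does real work and is noticeably slower:
        States parsed = (States)Enum.Parse(typeof(States), "AK");
        Console.WriteLine($"{s} {id} {parsed}");
    }
}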
You're not going to notice a big difference in performance between the two, but I'd still recommend using a Dictionary because it will give you a little more flexibility in the future.
For one thing, an Enum in C# can't automatically have a class associated with it like in Java, so if you want to associate additional information with a state (Full Name, Capital City, Postal abbreviation, etc.), creating a UnitedState class will make it easier to package all of that information into one collection.
Also, even though you think this value will never change, it's not perfectly immutable. You could conceivably have a new requirement to include Territories, for example. Or maybe you'll need to allow Canadian users to see the names of Canadian Provinces instead. If you treat this collection like any other collection of data (using a repository to retrieve values from it), you will later have the option to change your repository implementation to pull values from a different source (Database, Web Service, Session, etc.). Enums are much less versatile.
Edit
Regarding the performance argument: Keep in mind that you're not just casting an Enum to an int: you're also running ToString() on that enum, which adds considerable processing time. Consider the following test:
const int C = 10000;
int[] ids = new int[C];
string[] names = new string[C];
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i< C; i++)
{
var id = (i % 50) + 1;
names[i] = ((States)id).ToString();
}
sw.Stop();
Console.WriteLine("Enum: " + sw.Elapsed.TotalMilliseconds);
var namesById = Enum.GetValues(typeof(States)).Cast<States>()
.ToDictionary(s => (int) s, s => s.ToString());
sw.Restart();
for (int i = 0; i< C; i++)
{
var id = (i % 50) + 1;
names[i] = namesById[id];
}
sw.Stop();
Console.WriteLine("Dictionary: " + sw.Elapsed.TotalMilliseconds);
Results:
Enum: 26.4875
Dictionary: 0.7684
So if performance really is your primary concern, a Dictionary is definitely the way to go. However, we're talking about such fast times here that there are half a dozen other concerns I'd address before I would even care about the speed issue.
Enums in C# were not designed to provide mappings between values and strings. They were designed to provide strongly-typed constant values that you can pass around in code. The two main advantages of this are:
You have an extra compiler-checked clue to help you avoid passing arguments in the wrong order, etc.
Rather than putting "magical" number values (e.g. "42") in your code, you can say "States.Oklahoma", which renders your code more readable.
Unlike Java, C# does not automatically check cast values to ensure that they are valid (myState = (States)321), so you don't get any runtime data checks on inputs without doing them manually. If you don't have code that refers to the states explicitly ("States.Oklahoma"), then you don't get any value from #2 above. That leaves us with #1 as the only real reason to use enums. If this is a good enough reason for you, then I would suggest using enums instead of ints as your key values. Then, when you need a string or some other value related to the state, perform a Dictionary lookup.
Here's how I'd do it:
public enum StateKey{
AL = 1,AK,AS,AZ,AR,CA,CO,CT,DE,DC,FM,FL,GA,GU,
HI,ID,IL,IN,IA,KS,KY,LA,ME,MH,MD,MA,MI,MN,MS,
MO,MT,NE,NV,NH,NJ,NM,NY,NC,ND,MP,OH,OK,OR,PW,
PA,PR,RI,SC,SD,TN,TX,UT,VT,VI,VA,WA,WV,WI,WY,
}
public class State
{
public StateKey Key {get;set;}
public int IntKey {get {return (int)Key;}}
public string PostalAbbreviation {get;set;}
}
public interface IStateRepository
{
State GetByKey(StateKey key);
}
public class StateRepository : IStateRepository
{
private static Dictionary<StateKey, State> _statesByKey;
static StateRepository()
{
_statesByKey = Enum.GetValues(typeof(StateKey))
.Cast<StateKey>()
.ToDictionary(k => k, k => new State {Key = k, PostalAbbreviation = k.ToString()});
}
public State GetByKey(StateKey key)
{
return _statesByKey[key];
}
}
public class Foo
{
IStateRepository _repository;
// Dependency Injection makes this class unit-testable
public Foo(IStateRepository repository)
{
_repository = repository;
}
// If you haven't learned the wonders of DI, do this:
public Foo()
{
_repository = new StateRepository();
}
public void DoSomethingWithAState(StateKey key)
{
Console.WriteLine(_repository.GetByKey(key).PostalAbbreviation);
}
}
This way:
you get to pass around strongly-typed values that represent a state,
your lookup gets fail-fast behavior if it is given invalid input,
you can easily change where the actual state data resides in the future,
you can easily add state-related data to the State class in the future,
you can easily add new states, territories, districts, provinces, or whatever else in the future.
getting a name from an int is still about 15 times faster than when using Enum.ToString().
[grunt]
You could use TypeSafeEnums.
Here's a base class
Public MustInherit Class AbstractTypeSafeEnum
Private Shared ReadOnly syncroot As New Object
Private Shared masterValue As Integer = 0
Protected ReadOnly _name As String
Protected ReadOnly _value As Integer
Protected Sub New(ByVal name As String)
Me._name = name
SyncLock syncroot
masterValue += 1
Me._value = masterValue
End SyncLock
End Sub
Public ReadOnly Property value() As Integer
Get
Return _value
End Get
End Property
Public Overrides Function ToString() As String
Return _name
End Function
Public Shared Operator =(ByVal ats1 As AbstractTypeSafeEnum, ByVal ats2 As AbstractTypeSafeEnum) As Boolean
Return (ats1._value = ats2._value) And Type.Equals(ats1.GetType, ats2.GetType)
End Operator
Public Shared Operator <>(ByVal ats1 As AbstractTypeSafeEnum, ByVal ats2 As AbstractTypeSafeEnum) As Boolean
Return Not (ats1 = ats2)
End Operator
End Class
And here's an Enum :
Public NotInheritable Class EnumProcType
Inherits AbstractTypeSafeEnum
Public Shared ReadOnly CREATE As New EnumProcType("Création")
Public Shared ReadOnly MODIF As New EnumProcType("Modification")
Public Shared ReadOnly DELETE As New EnumProcType("Suppression")
Private Sub New(ByVal name As String)
MyBase.New(name)
End Sub
End Class
And it gets easier to add Internationalization.
Sorry about the fact that it's in VB and French, though.
Cheers !
Alternatively you can use constants
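For example, a plain constants class (purely illustrative) gives named values without an enum:
public static class StateId
{
    public const int Alabama = 1;
    public const int Alaska = 2;
    // add further states as needed
}

// Usage: int id = StateId.Alaska;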
If the question was "is casting enum faster than accessing a dictionary item?" then the other answers addressing the various aspects of the performance would make sense.
But here the question seems to be "is casting enum when I need to store their value to a database table going to negatively affect the application performance?".
If that is the case, I don't need to run any test to say that storing data in a database table is always going to be orders of magnitude slower than casting an enum or executing its ToString().
In this case I would say the important thing is readability and maintainability of the code. In simple cases enums will do the job cleanly, but I agree with other answers that dictionaries are more flexible in the long term.
Enums will greatly outperform almost anything, especially dictionaries. An enum is just a small integral value. But why would you be casting? It seems like you should be using the enums everywhere.
Avoid enums if you can: they should be replaced by singletons deriving from a base class or implementing an interface.
The practice of using enums comes from an old style of programming in C.
You start by using an enum for the US states, then you will need the number of inhabitants, the capital..., and you will need a lot of big switches to get all of this info.

Find size of object instance in bytes in C#

For any arbitrary instance (collections of different objects, compositions, single objects, etc)
How can I determine its size in bytes?
(I've currently got a collection of various objects and I'm trying to determine the aggregated size of it.)
EDIT: Has someone written an extension method for Object that could do this? That'd be pretty neat imo.
First of all, a warning: what follows is strictly in the realm of ugly, undocumented hacks. Do not rely on this working - even if it works for you now, it may stop working tomorrow, with any minor or major .NET update.
You can use the information in this article on CLR internals MSDN Magazine Issue 2005 May - Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects - last I checked, it was still applicable. Here's how this is done (it retrieves the internal "Basic Instance Size" field via TypeHandle of the type).
object obj = new List<int>(); // whatever you want to get the size of
RuntimeTypeHandle th = obj.GetType().TypeHandle;
int size = *(*(int**)&th + 1);
Console.WriteLine(size);
This works on 3.5 SP1 32-bit. I'm not sure if field sizes are the same on 64-bit - you might have to adjust the types and/or offsets if they are not.
This will work for all "normal" types, for which all instances have the same, well-defined size. Types for which this isn't true are arrays and strings for sure, and I believe also StringBuilder. For them you'll have to add the size of all contained elements to their base instance size.
You may be able to approximate the size by pretending to serializing it with a binary serializer (but routing the output to oblivion) if you're working with serializable objects.
class Program
{
static void Main(string[] args)
{
A parent;
parent = new A(1, "Mike");
parent.AddChild("Greg");
parent.AddChild("Peter");
parent.AddChild("Bobby");
System.Runtime.Serialization.Formatters.Binary.BinaryFormatter bf =
new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
SerializationSizer ss = new SerializationSizer();
bf.Serialize(ss, parent);
Console.WriteLine("Size of serialized object is {0}", ss.Length);
}
}
[Serializable()]
class A
{
int id;
string name;
List<B> children;
public A(int id, string name)
{
this.id = id;
this.name = name;
children = new List<B>();
}
public B AddChild(string name)
{
B newItem = new B(this, name);
children.Add(newItem);
return newItem;
}
}
[Serializable()]
class B
{
A parent;
string name;
public B(A parent, string name)
{
this.parent = parent;
this.name = name;
}
}
class SerializationSizer : System.IO.Stream
{
private int totalSize;
public override void Write(byte[] buffer, int offset, int count)
{
this.totalSize += count;
}
public override bool CanRead
{
get { return false; }
}
public override bool CanSeek
{
get { return false; }
}
public override bool CanWrite
{
get { return true; }
}
public override void Flush()
{
// Nothing to do
}
public override long Length
{
get { return totalSize; }
}
public override long Position
{
get
{
throw new NotImplementedException();
}
set
{
throw new NotImplementedException();
}
}
public override int Read(byte[] buffer, int offset, int count)
{
throw new NotImplementedException();
}
public override long Seek(long offset, System.IO.SeekOrigin origin)
{
throw new NotImplementedException();
}
public override void SetLength(long value)
{
throw new NotImplementedException();
}
}
This doesn't directly answer the question, but for those who are interested in investigating object sizes while debugging:
Start debugging in VS, make sure the Diagnostics Tools window is shown (Debug > Windows > Show Diagnostic Tools)
Set a breakpoint (optional)
Click Take Snapshot in the Memory Usage while paused
Explore the snapshot (optionally sort the object list alphabetically to find the type you're interested in)
For unmanaged types aka value types, structs:
Marshal.SizeOf(object);
For managed objects the closest I got is an approximation.
long start_mem = GC.GetTotalMemory(true);
aclass[] array = new aclass[1000000];
for (int n = 0; n < 1000000; n++)
array[n] = new aclass();
double used_mem_per_object = (GC.GetTotalMemory(false) - start_mem)/1000000D;
Do not use serialization. A binary formatter adds headers so that you can change your class and still load an old serialized file into the modified class.
Also it won't tell you the real size in memory, nor will it take memory alignment into account.
[Edit]
By using BitConverter.GetBytes(propertyValue) recursively on every property of your class you would get the contents in bytes; that doesn't count the overhead of the class or its references, but it is much closer to reality.
If size matters, I would recommend using a byte array for the data and an unmanaged proxy class to access values using pointer casting. Note that this would be non-aligned memory, so on old computers it is going to be slow, but HUGE datasets in MODERN RAM are going to be considerably faster, as minimizing the amount of data read from RAM has a bigger impact than the unaligned access.
A safe solution with some optimizations is the CyberSaving/MemoryUsage code.
Some cases:
/* test nullable type */
TestSize<int?>.SizeOf(null) //-> 4 B
/* test StringBuilder */
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 100; i++) sb.Append("わたしわたしわたしわ");
TestSize<StringBuilder>.SizeOf(sb ) //-> 3132 B
/* test Simple array */
TestSize<int[]>.SizeOf(new int[100]); //-> 400 B
/* test Empty List<int>*/
var list = new List<int>();
TestSize<List<int>>.SizeOf(list); //-> 205 B
/* test List<int> with 100 items*/
for (int i = 0; i < 100; i++) list.Add(i);
TestSize<List<int>>.SizeOf(list); //-> 717 B
It works also with classes:
class twostring
{
public string a { get; set; }
public string b { get; set; }
}
TestSize<twostring>.SizeOf(new twostring() { a = "0123456789", b = "0123456789" }); //-> 28 B
This doesn't apply to the current .NET implementation, but one thing to keep in mind with garbage collected/managed runtimes is the allocated size of an object can change throughout the lifetime of the program. For example, some generational garbage collectors (such as the Generational/Ulterior Reference Counting Hybrid collector) only need to store certain information after an object is moved from the nursery to the mature space.
This makes it impossible to create a reliable, generic API to expose the object size.
This is impossible to do at runtime.
There are various memory profilers that display object size, though.
EDIT: You could write a second program that profiles the first one using the CLR Profiling API and communicates with it through remoting or something.
For anyone looking for a solution that doesn't require [Serializable] classes and where the result is an approximation instead of exact science: the best method I could find is JSON serialization into a MemoryStream using UTF32 encoding.
private static long? GetSizeOfObjectInBytes(object item)
{
if (item == null) return 0;
try
{
// hackish solution to get an approximation of the size
var jsonSerializerSettings = new JsonSerializerSettings
{
DateFormatHandling = DateFormatHandling.IsoDateFormat,
DateTimeZoneHandling = DateTimeZoneHandling.Utc,
MaxDepth = 10,
ReferenceLoopHandling = ReferenceLoopHandling.Ignore
};
var formatter = new JsonMediaTypeFormatter { SerializerSettings = jsonSerializerSettings };
using (var stream = new MemoryStream()) {
formatter.WriteToStream(item.GetType(), item, stream, Encoding.UTF32);
return stream.Length / 4; // 32 bits per character = 4 bytes per character
}
}
catch (Exception)
{
return null;
}
}
No, this won't give you the exact size that would be used in memory. As previously mentioned, that is not possible. But it'll give you a rough estimation.
Note that this is also pretty slow.
Use Son of Strike, which has an ObjSize command.
Note that actual memory consumed is always larger than ObjSize reports, due to a syncblk which resides directly before the object data.
Read more about both here MSDN Magazine Issue 2005 May - Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects.
AFAIK, you cannot, without actually deep-counting the size of each member in bytes. But again, does the size of a member (like elements inside a collection) count towards the size of the object, or does only a pointer to that member count towards the size of the object? It depends on how you define it.
I have run into this situation before where I wanted to limit the objects in my cache based on the memory they consumed.
Well, if there is some trick to do that, I'd be delighted to know about it!
For value types, you can use Marshal.SizeOf. Of course, it returns the number of bytes required to marshal the structure in unmanaged memory, which is not necessarily what the CLR uses.
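For instance, a small hedged example of that difference (the struct and its fields are illustrative):
using System;
using System.Runtime.InteropServices;

struct Sample
{
    public byte A;
    public int B;
}

class MarshalSizeDemo
{
    static void Main()
    {
        // Marshaled (unmanaged) size, following marshaling alignment rules:
        Console.WriteLine(Marshal.SizeOf(typeof(Sample))); // typically 8
        // The CLR may lay the struct out differently in managed memory,
        // so treat this as a guide rather than the exact in-memory size.
    }
}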
I have created a benchmark test for different collections in .NET: https://github.com/scholtz/TestDotNetCollectionsMemoryAllocation
Results are as follows for .NET Core 2.2 with 1,000,000 objects with 3 properties allocated:
Testing with string: 1234567
Hashtable<TestObject>: 184 672 704 B
Hashtable<TestObjectRef>: 136 668 560 B
Dictionary<int, TestObject>: 171 448 160 B
Dictionary<int, TestObjectRef>: 123 445 472 B
ConcurrentDictionary<int, TestObject>: 200 020 440 B
ConcurrentDictionary<int, TestObjectRef>: 152 026 208 B
HashSet<TestObject>: 149 893 216 B
HashSet<TestObjectRef>: 101 894 384 B
ConcurrentBag<TestObject>: 112 783 256 B
ConcurrentBag<TestObjectRef>: 64 777 632 B
Queue<TestObject>: 112 777 736 B
Queue<TestObjectRef>: 64 780 680 B
ConcurrentQueue<TestObject>: 112 784 136 B
ConcurrentQueue<TestObjectRef>: 64 783 536 B
ConcurrentStack<TestObject>: 128 005 072 B
ConcurrentStack<TestObjectRef>: 80 004 632 B
For memory tests I found it best to use
GC.GetAllocatedBytesForCurrentThread()
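A small sketch of bracketing an allocation with it (available on newer runtimes; it counts every allocation made by the current thread in between, so keep the measured section minimal):
using System;

long before = GC.GetAllocatedBytesForCurrentThread();

var payload = new byte[10000];   // whatever you want to measure

long after = GC.GetAllocatedBytesForCurrentThread();
Console.WriteLine($"~{after - before} bytes allocated");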
For arrays of structs/values, I have different results with:
first = Marshal.UnsafeAddrOfPinnedArrayElement(array, 0).ToInt64();
second = Marshal.UnsafeAddrOfPinnedArrayElement(array, 1).ToInt64();
arrayElementSize = second - first;
(oversimplified example)
Whatever the approach, you really need to understand how .Net works to correctly interpret the results.
For instance, the returned element size is the "aligned" element size, with some padding.
The overhead and thus the size is different depending on the usage of a type: "boxed" on the GC heap, on the stack, as a field, as an array element.
(I wanted to know what would be the memory impact of using "dummy" empty structs (without any field) to mimic "optional" arguments of generics; making tests with different layouts involving empty structs, I can see that an empty struct uses (at least) 1 byte per element; I vaguely remember it is because .Net needs a different address for each field, which wouldn't work if a field really was empty/0-sized).
You can use reflection to gather all the public member or property information (given the object's type). There is no way to determine the size without walking through each individual piece of data on the object, though.
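Purely as an illustration of that kind of reflection walk, a hedged sketch (ApproximateSizer is a hypothetical helper, and the result is an approximation rather than an exact measurement):
using System;
using System.Collections;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.InteropServices;

// Very rough approximation: it ignores object headers, padding, string interning
// and CLR layout, skips inherited private fields, and de-duplicates references
// only by the objects' own equality.
static class ApproximateSizer
{
    public static long Estimate(object obj)
    {
        return Estimate(obj, new HashSet<object>());
    }

    private static long Estimate(object obj, HashSet<object> visited)
    {
        if (obj == null) return 0;
        Type t = obj.GetType();

        if (obj is char) return sizeof(char);
        if (t.IsPrimitive) return Marshal.SizeOf(t);                    // int, double, byte, ...
        if (t.IsEnum) return Marshal.SizeOf(Enum.GetUnderlyingType(t));
        if (obj is string s) return (long)s.Length * sizeof(char);
        if (!t.IsValueType && !visited.Add(obj)) return 0;              // already counted

        if (obj is IEnumerable items)
        {
            long total = 0;
            foreach (object item in items) total += Estimate(item, visited);
            return total;
        }

        long size = 0;
        // Walk all instance fields (public and private) and recurse into them.
        foreach (FieldInfo f in t.GetFields(
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic))
        {
            size += Estimate(f.GetValue(obj), visited);
        }
        return size;
    }
}

// Usage: long approx = ApproximateSizer.Estimate(myObject);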
From Pavel and jnm2:
private int DumpApproximateObjectSize(object toWeight)
{
return Marshal.ReadInt32(toWeight.GetType().TypeHandle.Value, 4);
}
On a side note, be careful, because it only works with contiguous memory objects.
Simplest way is: int size = *((int*)type.TypeHandle.Value + 1)
I know this is an implementation detail, but the GC relies on it, and it needs to be as close to the start of the method table as possible for efficiency; plus, considering how complex the GC code is, nobody will dare to change it in the future. In fact it works for every minor/major version of .NET Framework + .NET Core. (Currently unable to test for 1.0.)
If you want a more reliable way, emit a struct in a dynamic assembly with [StructLayout(LayoutKind.Auto)] with exactly the same fields in the same order, and take its size with the sizeof IL instruction. You may want to emit a static method within the struct which simply returns this value. Then add 2 * IntPtr.Size for the object header. This should give you the exact value.
But if your class derives from another class, you need to find the size of each base class separately and add them, plus 2 * IntPtr.Size again for the header. You can do this by getting fields with the BindingFlags.DeclaredOnly flag.
Arrays and strings just add their length * element size to that.
For the cumulative size of aggregate objects you need to implement a more sophisticated solution which involves visiting every field and inspecting its contents.
For anyone looking for a rough approximation comparing the sizes of disparate object graphs/collections, just serialize to JSON - e.g.:
Console.WriteLine($"Size1:\t{(JsonConvert.SerializeObject(someBusyObject)).Length}")); Console.WriteLine($"Size2:\t{(JsonConvert.SerializeObject(someOtherObject)).Length}"));
In my case I have a bunch of IEnumerable's being pulled during a login I'm benchmarking, and I just wanted to roughly size them to see their relative weight.
They're expensive operations and won't give you direct heap allocation size or anything like that, but it was good enough for my use case and was readily available.
