Struct vs Class for long-lived objects - C#

When you need very small objects, say ones that contain two float properties, and you will have millions of them that aren't going to be "destroyed" right away, are structs or classes the better choice?
In XNA, for example, point and vector types (Vector3, etc.) are structs, but if you need to hold onto those values for a long time, would that pose a performance problem?

Contrary to most questions about structs, this actually seems to be a good use of a struct. If the data it contains are value types, and you are going to use a lot of these, a structure would work well.
Some tips:
:: Keep the struct at or below roughly 16 bytes; larger structs are more expensive to copy, which erodes the performance advantage.
:: Make the struct immutable. That makes the usage clearer.
Example:
public struct Point3D {
    public float X { get; private set; }
    public float Y { get; private set; }
    public float Z { get; private set; }
    public Point3D(float x, float y, float z) {
        X = x;
        Y = y;
        Z = z;
    }
    public Point3D Invert() {
        return new Point3D(-X, -Y, -Z);
    }
}

The answer depends on where the objects/values will eventually be stored. If they are stored in an untyped collection like ArrayList, you end up boxing them. Boxing wraps each struct in a heap object, so the footprint is about the same as with a class instance. On the other hand, if you use a typed array like T[] or a List<T>, the structs are stored inline: the collection holds only the data for each element, and the per-object overhead applies to the collection as a whole rather than to each element.
So structs are more efficient for use in T[] arrays.
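A minimal sketch of that difference, assuming a modern .NET runtime (GC.GetAllocatedBytesForCurrentThread is available on .NET Core 3.0 and later); Vec2 and StorageDemo are hypothetical names. ArrayList.Add takes an object, so each struct is boxed into its own heap object, while List<Vec2> keeps the values inline in one backing array.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical 8-byte value type (two floats), like the one in the question.
public struct Vec2
{
    public float X, Y;
    public Vec2(float x, float y) { X = x; Y = y; }
}

public static class StorageDemo
{
    // ArrayList stores object references, so each Add boxes the struct
    // into a separate heap object.
    public static ArrayList Untyped(int count)
    {
        var list = new ArrayList(count);
        for (int i = 0; i < count; i++) list.Add(new Vec2(i, i));
        return list;
    }

    // List<Vec2> stores the values inline in a single Vec2[] backing array.
    public static List<Vec2> Typed(int count)
    {
        var list = new List<Vec2>(count);
        for (int i = 0; i < count; i++) list.Add(new Vec2(i, i));
        return list;
    }

    // Bytes allocated on the managed heap by the current thread while running the action.
    public static long AllocatedBytes(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

Calling StorageDemo.AllocatedBytes(() => StorageDemo.Untyped(10_000)) should report noticeably more than the same call with Typed, because on a 64-bit runtime each boxed Vec2 is typically a separate 24-byte object plus an 8-byte reference in the ArrayList; exact numbers vary by runtime.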

The big concern is whether the memory is allocated on the stack or the heap. Structs used as local variables go on the stack, and the stack is generally much more limited in space, so creating a whole bunch of structs as locals could be a problem.
In practice, though, I don't really think it's that big of a deal. If you have that many of them they're likely part of a class instance (on the heap) somewhere.

Struct seems right for this application.
Bear in mind that the "need to hold onto those values" implies their storage on the heap somewhere, probably in an array field of a class instance.
One thing to watch out for is that a large array of them (85,000 bytes or more) is allocated on the large object heap. The large object heap is not compacted by default, but for very long-lived objects that perhaps isn't an issue.
Using a class for millions of these values would likely be expensive because of the sheer volume of dereferencing needed for operations on this type.

As a rule, large arrays of non-aliased (i.e. unshared) data of the same type are best stored as structs for performance, since you reduce the number of indirections. (See also the question "When are structs the answer?".) The exact performance difference between class and struct depends on your usage. (E.g., in operations, do you only access parts of the struct? Do you do a lot of temporary copying? If the struct is small, it's probably always the better choice, but if it's large, creating temporary copies may slow you down. If you make it immutable, you will always have to copy the whole thing to change a value.)
When in doubt, measure.
Since you are interested in possible long-term effects that may not be apparent from such a measurement, be aware that such arrays are likely stored on the large-object heap and should be re-used instead of destroyed and re-allocated. (See CLR Inside Out: Large Object Heap Uncovered.)
When passing larger structs to methods, you might want to pass them by ref to avoid copying.
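For example, System.Numerics.Matrix4x4 holds 16 floats (64 bytes). The sketch below (PassLargeStructs and its methods are hypothetical names) shows the two options: in (C# 7.2 and later) passes a read-only reference, and ref passes a writable one, so neither copies the 64 bytes.

```csharp
using System.Numerics;

public static class PassLargeStructs
{
    // 'in' passes a read-only reference: no 64-byte copy, and the callee cannot modify it.
    public static float SumDiagonal(in Matrix4x4 m) => m.M11 + m.M22 + m.M33 + m.M44;

    // 'ref' passes a writable reference: the callee updates the caller's variable in place.
    public static void ScaleInPlace(ref Matrix4x4 m, float factor) => m = m * factor;
}
```

One caveat with in: calling a method or property on an in parameter of a non-readonly struct can make the compiler take a defensive copy; reading fields directly, as SumDiagonal does, avoids that.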

Value types (structs) are good for types that are not often allocated on the heap on their own, that is, they are mostly contained in another reference or value type.
The Vector3 example you gave is a perfect example. You will rarely have a standalone Vector3 on the heap; most of the time it will be contained in a type that is itself on the heap, or used as a local variable, in which case it will be allocated on the stack.

Related

Where should I store different types of value?

After Python and JavaScript I started using C# and can't understand some basic concepts.
In Python and JavaScript I used to store everything on a heap without thinking about the type of object. But in C# I can't create a Dictionary or List that holds different types of objects.
I want to store some mouse and keyboard events. For that, I use instances of class, like this:
class UserActionEvent
{
    public MacroEventType Type;
    public int[] MouseCoordinate = new int[2];
    public string MouseKey;
    public string KeyBoardKey;
    public int TimeSinceLastEvent;
}
All instances are saved in a Queue. But I worry: is it normal to store several thousand objects like this? Or is there a more universal way to store data of different types?
Storage in C# is not much different from Python and JavaScript in that it uses a garbage-collected heap (of course every runtime has its own way of implementing the GC). So for "normal" classes you can just go ahead and treat them as you would in JS.
C#, however, also has the concept of value types, which are typically allocated on the stack when used as local variables. The stack has much more limited space than the heap, so this is where you need to be a bit more careful, but it is unlikely that you will accidentally allocate a large amount of stack space, since collection types are all reference types (with the exception of the more exotic stackalloc arrays, which you should stay away from unless you are sure of what you are doing). When passed between methods, value types are copied, but it is also possible to pass them by reference with the ref (or in/out) keyword. Casting a value type to object is different: it wraps a copy of the value in a reference type, a process called boxing (the opposite process is called unboxing).
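A small sketch of the three behaviors described above, using a hypothetical mutable Counter struct (mutable only so the copies are visible).

```csharp
// Hypothetical mutable struct used only to show copy semantics.
public struct Counter
{
    public int Value;
}

public static class PassingDemo
{
    // By value: the method increments its own copy; the caller's variable is unchanged.
    public static void IncrementCopy(Counter c) => c.Value++;

    // By reference: the method increments the caller's variable itself.
    public static void IncrementRef(ref Counter c) => c.Value++;

    // Boxing: converting to object copies the struct into a new heap object,
    // so later changes to the original variable do not affect the boxed copy.
    public static object Box(Counter c) => c;
}
```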
To create a value type, use struct instead of class. In your example above, using a value type for the mouse coordinate, e.g.
struct Point {
    public int X, Y;
}
instead of an int array would likely save memory (and GC CPU time), since in your example you would have to allocate a reference object (the array) to hold only eight bytes (the two ints). But this only matters in more exotic cases, maybe in the render loop of a game engine, or if you have huge data sets. For most types of programs this is likely to be premature optimization (though one could argue that creating the struct makes the code more readable, which would likely then be the main benefit).
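A sketch of the question's class with the struct in place of the array. MacroEventType is not shown in the question, so the enum here is a hypothetical stand-in, and the Point constructor is an addition for convenience.

```csharp
// Hypothetical stand-in for the question's MacroEventType, which the post does not show.
public enum MacroEventType { MouseMove, MouseClick, KeyPress }

// Two ints stored inline in the containing object: no separate heap allocation,
// unlike new int[2].
public struct Point
{
    public int X, Y;
    public Point(int x, int y) { X = x; Y = y; }
}

public class UserActionEvent
{
    public MacroEventType Type;
    public Point MouseCoordinate;   // was: int[] MouseCoordinate = new int[2];
    public string MouseKey;
    public string KeyBoardKey;
    public int TimeSinceLastEvent;
}
```

Instances can still be kept in a Queue<UserActionEvent> exactly as before; only the coordinate storage changes.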
Some useful reads:
https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/builtin-types/value-types
https://medium.com/fhinkel/confused-about-stack-and-heap-2cf3e6adb771
https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/operators/stackalloc
If you want to store different types of objects in C#, I recommend using ArrayList.
With ArrayList you can store any type of object since it is a dynamic collection of objects.
ArrayList myAL = new ArrayList();
myAL.Add("Hello");
myAL.Add("World");
myAL.Add("!");
You will need a
using System.Collections;
to be able to use this collection.
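Because ArrayList stores everything as object, reading values back requires a cast or a type test. A minimal sketch using C# 7 pattern matching (ArrayListDemo and Describe are hypothetical names); note that value types such as int are boxed when they are added.

```csharp
using System.Collections;
using System.Text;

public static class ArrayListDemo
{
    // ArrayList hands elements back as object, so the caller has to test
    // (or cast) each element's runtime type before using it.
    public static string Describe(ArrayList items)
    {
        var sb = new StringBuilder();
        foreach (object item in items)
        {
            switch (item)
            {
                case string s: sb.Append("string:").Append(s).Append(';'); break;
                case int n: sb.Append("int:").Append(n).Append(';'); break; // unboxed here
                default: sb.Append("other;"); break;
            }
        }
        return sb.ToString();
    }
}
```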

Generic base class implementation for both struct and object [duplicate]

I know that structs in .NET do not support inheritance, but it's not exactly clear why they are limited in this way.
What technical reason prevents structs from inheriting from other structs?
The reason value types can't support inheritance is because of arrays.
The problem is that, for performance and GC reasons, arrays of value types are stored "inline". For example, given new FooType[10] {...}, if FooType is a reference type, 11 objects will be created on the managed heap (one for the array, and 10 for each type instance). If FooType is instead a value type, only one instance will be created on the managed heap -- for the array itself (as each array value will be stored "inline" with the array).
Now, suppose we had inheritance with value types. When combined with the above "inline storage" behavior of arrays, Bad Things happen, as can be seen in C++.
Consider this pseudo-C# code:
struct Base
{
    public int A;
}
struct Derived : Base
{
    public int B;
}
void Square(Base[] values)
{
    for (int i = 0; i < values.Length; ++i)
        values[i].A *= 2;
}
Derived[] v = new Derived[2];
Square(v);
By normal conversion rules, a Derived[] is convertible to a Base[] (for better or worse), so if you s/struct/class/g for the above example, it'll compile and run as expected, with no problems. But if Base and Derived are value types, and arrays store values inline, then we have a problem.
We have a problem because Square() doesn't know anything about Derived; it'll use only pointer arithmetic to access each element of the array, incrementing by a constant amount (sizeof(Base)). The assembly would be vaguely like:
for (int i = 0; i < values.Length; ++i)
{
    Base* value = (Base*) (((char*) values) + i * sizeof(Base));
    value->A *= 2;
}
(Yes, that's abominable assembly, but the point is that we'll increment through the array at known compile-time constants, without any knowledge that a derived type is being used.)
So, if this actually happened, we'd have memory corruption issues. Specifically, within Square(), values[1].A*=2 would actually be modifying values[0].B!
Try to debug THAT!
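C# will not compile struct inheritance, so the sketch below simulates the memory layout with a plain int[] (StrideDemo and its methods are hypothetical). Two Derived elements of {A, B} are laid out as [A0, B0, A1, B1], and a loop that assumes a one-int Base stride doubles B0 instead of A1.

```csharp
public static class StrideDemo
{
    // Simulates "Derived[] v" holding two elements stored inline.
    // Each Derived is {A, B} (two ints), so memory looks like: [A0, B0, A1, B1].
    public static int[] MakeDerivedArray() => new[] { 1, 10, 2, 20 };

    // What Square(Base[]) would do if handed that memory: it believes each element
    // is sizeof(Base) == one int wide, so it steps through memory one int at a time.
    public static void SquareAssumingBaseLayout(int[] memory, int elementCount)
    {
        const int baseSizeInInts = 1; // sizeof(Base) / sizeof(int)
        for (int i = 0; i < elementCount; ++i)
            memory[i * baseSizeInInts] *= 2; // "values[i].A *= 2"
    }
}
```

After SquareAssumingBaseLayout(mem, 2), the array {1, 10, 2, 20} becomes {2, 20, 2, 20}: the write intended for values[1].A landed on values[0].B, and A1 was never touched.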
Imagine structs supported inheritance. Then declaring:
BaseStruct a;
InheritedStruct b; //inherits from BaseStruct, added fields, etc.
a = b; //?? expand size during assignment?
would mean struct variables don't have fixed size, and that is why we have reference types.
Even better, consider this:
BaseStruct[] baseArray = new BaseStruct[1000];
baseArray[500] = new InheritedStruct(); //?? morph/resize the array?
Structs do not use references (unless they are boxed, but you should try to avoid that) thus polymorphism isn't meaningful since there is no indirection via a reference pointer. Objects normally live on the heap and are referenced via reference pointers, but structs are allocated on the stack (unless they are boxed) or are allocated "inside" the memory occupied by a reference type on the heap.
Class-like inheritance is not possible, as a struct is laid out directly on the stack. An inheriting struct would be bigger than its parent, but the JIT doesn't know that, and would try to fit too much data into too little space. That sounds a little unclear, so let's write an example:
struct A {
    int property;
} // sizeof A == sizeof int
struct B : A {
    int childproperty;
} // sizeof B == sizeof int * 2
If this were possible, it would crash on the following snippet:
void DoSomething(A arg) { }
...
B b;
DoSomething(b);
Space is allocated for the sizeof A, not for the sizeof B.
Here's what the docs say:
Structs are particularly useful for small data structures that have value semantics. Complex numbers, points in a coordinate system, or key-value pairs in a dictionary are all good examples of structs. Key to these data structures is that they have few data members, that they do not require use of inheritance or referential identity, and that they can be conveniently implemented using value semantics where assignment copies the value instead of the reference.
Basically, they're supposed to hold simple data and therefore do not have "extra features" such as inheritance. It would probably be technically possible for them to support some limited kind of inheritance (not polymorphism, due to them being on the stack), but I believe it is also a design choice to not support inheritance (as many other things in the .NET languages are.)
On the other hand, I agree with the benefits of inheritance, and I think we all have hit the point where we want our struct to inherit from another, and realize that it's not possible. But at that point, the data structure is probably so advanced that it should be a class anyway.
Structs are allocated on the stack. This means the value semantics are pretty much free, and accessing struct members is very cheap. This doesn't prevent polymorphism.
You could have each struct start with a pointer to its virtual function table. This would be a performance issue (every struct would be at least the size of a pointer), but it's doable. This would allow virtual functions.
What about adding fields?
Well, when you allocate a struct on the stack, you allocate a certain amount of space. The required space is determined at compile time (whether ahead of time or when JITting). If you add fields and then assign to a base type:
struct A
{
    public int Integer1;
}
struct B : A
{
    public int Integer2;
}
A a = new B();
This will overwrite some unknown part of the stack.
The alternative is for the runtime to prevent this by only writing sizeof(A) bytes to any A variable.
What happens if B overrides a method in A and references its Integer2 field? Either the runtime throws a MemberAccessException, or the method instead accesses some random data on the stack. Neither of these is permissible.
It's perfectly safe to have struct inheritance, so long as you don't use structs polymorphically, or so long as you don't add fields when inheriting. But these aren't terribly useful.
There is a point I would like to correct. Even though "because they live on the stack" is the right reason structs cannot be inherited, it is at the same time only half of the explanation. Structs, like any other value type, can live on the stack, but they don't always. Where they live depends on where the variable is declared: local variables live on the stack, and instance fields (of a class) live on the heap.
In saying that, Cecil Has a Name nailed it correctly.
I would like to emphasize this, value types can live on the stack. This doesn't mean they always do so. Local variables, including method parameters, will. All others will not. Nevertheless, it still remains the reason they can't be inherited. :-)
This seems like a very frequent question. I feel like adding that value types are stored "in place" where you declare the variable; apart from implementation details, this means that there is no object header that says something about the object, only the variable knows what kind of data resides there.
Structs do support interfaces, so you can do some polymorphic things that way.
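A sketch of that, with hypothetical IShape, Rect, and Circle types. Converting a struct to an interface type boxes it; a generic method constrained to the interface avoids the boxing, because the JIT generates specialized code for each value-type argument.

```csharp
using System;

public interface IShape
{
    double Area();
}

public struct Rect : IShape
{
    public double Width, Height;
    public Rect(double width, double height) { Width = width; Height = height; }
    public double Area() => Width * Height;
}

public struct Circle : IShape
{
    public double Radius;
    public Circle(double radius) { Radius = radius; }
    public double Area() => Math.PI * Radius * Radius;
}

public static class ShapeDemo
{
    // Storing structs in an IShape[] boxes each one; calls go through the interface.
    public static double TotalArea(IShape[] shapes)
    {
        double total = 0;
        foreach (IShape s in shapes) total += s.Area();
        return total;
    }

    // Constrained generic: for a value-type T the JIT compiles a version
    // specialized for that T, so the call does not box the argument.
    public static double AreaOf<T>(T shape) where T : IShape => shape.Area();
}
```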
IL is a stack-based language, so calling a method with an argument goes something like this:
Push the argument onto the stack
Call the method.
When the method runs, it pops some bytes off the stack to get its argument. It knows exactly how many bytes to pop off because the argument is either a reference type pointer (always 4 bytes on 32-bit) or it is a value type for which the size is always known exactly.
If it is a reference type pointer then the method looks up the object in the heap and gets its type handle, which points to a method table which handles that particular method for that exact type. If it is a value type, then no lookup to a method table is necessary because value types do not support inheritance, so there is only one possible method/type combination.
If value types supported inheritance, there would be extra overhead, in that the particular type of the struct would have to be placed on the stack as well as its value, which would mean some sort of method table lookup for the particular concrete instance of the type. This would eliminate the speed and efficiency advantages of value types.

C#: Should I use out or ref to get this struct back?

I'm not sure of the best way to ask this question, so I'll start with an example:
public static void ConvertPoint(ref Point point, View fromView, View toView) {
    //Convert Point
}
You pass in a point, and the method converts it from being relative to fromView to being relative to toView (as long as one is an ancestor of the other).
The call is recursive, converting the point one level at a time. I know, mutable structs are bad, but the reason I'm using a mutable point is so that I only need to create a single point and pass it along the recursive call, which is why I'm using ref. Is this the right way to go, or would it be better to use an out parameter, or simply declare the method to return a point instead?
I'm not very familiar with how structs are handled as opposed to classes in these circumstances. This is code being ported from Java, where point obviously had to be a class, so it made sense to use the same temporary point over and over rather than create a new point which had to be garbage collected at every call.
This may be a confusing question, and to heap on some more confusion, while I'm at it, should I keep a temporary static Point instance around for quick conversions or would it be just as simple to create a new point whenever this method is called (it's called a lot)?
Fretting over the garbage collector is always a mistake when dealing with short-lived objects such as Point, assuming it is actually a class. Given that this is C#, it is likely to be a struct no bigger than 16 bytes, in which case you should always write the method to return a Point. The JIT can optimize this at runtime; a struct that small can be kept in CPU registers.
Only ever consider passing structs by ref when they are large.
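A sketch of the return-a-value version, under stated assumptions: Pt, View, and Coordinates are hypothetical types, and a view's Origin is taken to be its position in its parent's coordinates, since the question does not show how the conversion itself works. Each recursive step returns a new small struct instead of mutating one through ref.

```csharp
using System;

// Hypothetical small immutable point (16 bytes).
public readonly struct Pt
{
    public readonly double X, Y;
    public Pt(double x, double y) { X = x; Y = y; }
    public Pt Offset(double dx, double dy) => new Pt(X + dx, Y + dy);
}

// Hypothetical view: Origin is this view's top-left corner in its parent's coordinates.
public sealed class View
{
    public readonly View Parent;
    public readonly Pt Origin;
    public View(View parent, Pt origin) { Parent = parent; Origin = origin; }
}

public static class Coordinates
{
    // Returns the converted point instead of mutating a ref parameter.
    public static Pt ConvertPoint(Pt point, View fromView, View toView)
    {
        if (ReferenceEquals(fromView, toView)) return point;

        if (IsAncestor(toView, fromView))
            // Step up one level: fromView's coordinates -> its parent's coordinates.
            return ConvertPoint(point.Offset(fromView.Origin.X, fromView.Origin.Y), fromView.Parent, toView);

        if (IsAncestor(fromView, toView))
            // Step down one level: convert into toView's parent, then into toView.
            return ConvertPoint(point, fromView, toView.Parent).Offset(-toView.Origin.X, -toView.Origin.Y);

        throw new ArgumentException("One view must be an ancestor of the other.");
    }

    // True if 'candidate' appears somewhere above 'view' in the parent chain.
    private static bool IsAncestor(View candidate, View view)
    {
        for (View v = view.Parent; v != null; v = v.Parent)
            if (ReferenceEquals(v, candidate)) return true;
        return false;
    }
}
```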
I tend to make everything a class to simplify things. If you find that this creates unwanted memory pressure in your application only then should you investigate a solution that involves a mutable struct.
The reason I say this is that the GC tends to be very good at optimizing collections around the needs of your app and mutable structs are a notorious headache. Don't introduce that headache if the GC can handle the memory pressure of multiple objects.
Structs should be immutable, so I would vote for returning a different Point from the method. If you need mutability, then you should make it a class and create a method on the class to convert it. A by-product of the latter could be changing the recursive descent into an iterative application of the transformations.
Depends on what you're doing, and why you're doing it.
Structs in C# are not like classes in Java at all. They're much more like primitive types in terms of speed and semantics. (In fact, the primitive types are structs in a sense.) For small structs there's very little cost to passing them around, so if this is optimizing for something that's not a problem yet, I suggest you just make the method return a Point.
However, it's possible that passing around a copy of the struct will be slower in some cases, if the struct is large or if this method is the bottleneck, possibly because you're calling it many times per second (which I doubt is the case here); in those cases, it makes sense to pass them by reference, but first you'd need to figure out if it's actually the bottleneck in the first place.
Notice, for example, that in Microsoft's XNA library, the Matrix struct has two versions of the same method:
static void CreateLookAt(
ref Vector3 cameraPosition, ref Vector3 cameraTarget, ref Vector3 cameraUpVector,
out Matrix result)
static Matrix CreateLookAt(
Vector3 cameraPosition, Vector3 cameraTarget, Vector3 cameraUpVector)
This is purely for performance reasons, and only because this method may be called many times per second, and three Vector3 arguments plus a Matrix result (16 floats) add up to a fair amount of data to copy.
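The same dual-overload pattern, sketched with a hypothetical Vec3 and a cross product rather than XNA's actual types: the by-value overload simply forwards to the ref/out one.

```csharp
// Hypothetical 12-byte vector, standing in for XNA's Vector3.
public struct Vec3
{
    public float X, Y, Z;
    public Vec3(float x, float y, float z) { X = x; Y = y; Z = z; }

    // Convenient overload: arguments and result are copied.
    public static Vec3 Cross(Vec3 a, Vec3 b)
    {
        Cross(ref a, ref b, out Vec3 result);
        return result;
    }

    // Copy-avoiding overload in the XNA style: inputs by ref, result via out.
    // Components are computed into locals first, so it still works if the caller
    // passes the same variable as both an input and the result.
    public static void Cross(ref Vec3 a, ref Vec3 b, out Vec3 result)
    {
        float x = a.Y * b.Z - a.Z * b.Y;
        float y = a.Z * b.X - a.X * b.Z;
        float z = a.X * b.Y - a.Y * b.X;
        result = new Vec3(x, y, z);
    }
}
```

Callers in a hot loop can use the ref/out form; everyone else gets the more readable by-value form for free.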
So why exactly were you doing this? The answer depends on the reason why you wrote it like this in the first place, but for typical code, it shouldn't matter.
As the point is a struct, I see no advantage in using ref here. It will be copied multiple times on the stack anyway, so why not just return it? I think it will make the code more readable if you declare this method with Point as the return type.

Structs - real life examples? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
There are any number of questions here on SO dealing with the differences between Structs and Classes in C#, and when to use one or the other. (The one sentence answer: use structs if you need value semantics.) There are plenty of guidelines out there about how to choose one or the other, most of which boil down to: use a class unless you meet these specific requirements, then use a struct.
This all makes sense to me.
However, I can't seem to find any real-life examples of people using structs in a system. I'm (semi-)new to C#, and I'm having trouble imagining a concrete situation where structs are really the right choice (at least, I haven't run into one yet.)
So, I turn to the SO world-brain. What are some cases where you actually used a struct in a system where a class wouldn't have worked?
Well a class would still work for it, but an example I could think of is something like a Point. Assuming it is an x and y value, you could use a struct.
struct Point {
    public int x;
    public int y;
}
In my mind, I would rather have a simpler representation of a pair of integers than define and use a class with instantiations when the actual entity does not really have much (or any) behavior.
I used a struct to represent a Geolocation
struct LatLng
{
    public decimal Latitude { get; set; }
    public decimal Longitude { get; set; }
}
This represents a single entity; for instance, I can add two LatLngs together or perform other operations on this single entity.
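A sketch of that idea, with the properties made read-only and an addition operator added; the operator and the readonly struct modifier (C# 7.2+) are additions, not part of the original answer, and no range checks or wrapping are done.

```csharp
public readonly struct LatLng
{
    public decimal Latitude { get; }
    public decimal Longitude { get; }

    public LatLng(decimal latitude, decimal longitude)
    {
        Latitude = latitude;
        Longitude = longitude;
    }

    // Component-wise addition; the result is a new value, the operands are untouched.
    public static LatLng operator +(LatLng a, LatLng b) =>
        new LatLng(a.Latitude + b.Latitude, a.Longitude + b.Longitude);
}
```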
MSDN (struct):
The struct type is suitable for representing lightweight objects such as Point, Rectangle, and Color. Although it is possible to represent a point as a class, a struct is more efficient in some scenarios. For example, if you declare an array of 1000 Point objects, you will allocate additional memory for referencing each object. In this case, the struct is less expensive.
Also, if you look at primitive types such as Int32, Decimal, and Double, you will notice they are all structs, which allows them to be value types while still implementing certain crucial interfaces.
Structs are also typically used in graphics/rendering systems. There are many benefits to making points/vectors structs.
Rico Mariani posted an excellent quiz on value based programming. He discussed many reasons to prefer structs in specific situations, and explained it in detail in his quiz results post.
A Money struct is probably one of the most common; PhoneNumber or Address are also common.
public struct Money
{
    public string Currency { get; set; }
    public decimal Amount { get; set; } // decimal avoids binary rounding errors with currency
}
public struct PhoneNumber
{
    public int Extension { get; set; }
    public int RegionCode { get; set; }
    //... etc.
}
public struct FullName
{
    public string FirstName { get; set; }
    public string MiddleName { get; set; }
    public string LastName { get; set; }
}
Keep in mind, though, that the usual .NET guideline is to keep structs to about 16 bytes or less, because larger structs are more expensive to copy each time they are assigned or passed by value.
Also, because struct values 'live' on the stack or inline inside their containing object (rather than as separate heap objects, as reference types do), you might consider using structs if you need to instantiate a lot of objects of the same type.
The quintessential example is the framework's nullable types, such as int?. These use structs so they retain the value semantics of an int, while providing a way to make them null without boxing them into reference types.
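A short sketch of how int? (Nullable<int>) behaves; NullableDemo is a hypothetical name. Boxing a nullable boxes the underlying value, or produces a null reference when there is no value.

```csharp
public static class NullableDemo
{
    // int? is shorthand for the struct Nullable<int>: an int plus a HasValue flag.
    public static string Describe(int? value) =>
        value.HasValue ? "value " + value.Value : "null";

    // Boxing a Nullable<int> boxes the underlying int, or yields a null reference
    // when HasValue is false.
    public static object Box(int? value) => value;
}
```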
You would use a struct when you don't want to pass something by reference. Suppose you have a collection of data, or an object that you wish to pass by value (i.e., anything you pass it to works with its own unique copy, not a reference to the original version); then a struct is the right type to use.
Structs inherit value-based implementations of Equals() and GetHashCode() from System.ValueType, so you might want to use a struct instead of a class when the object is a simple collection of non-reference types that you want to use as a dictionary key.
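The inherited ValueType implementations work but can be slow, so for dictionary keys it is common to implement IEquatable<T> and GetHashCode yourself. A sketch with a hypothetical GridCell key; HashCode.Combine requires .NET Core 2.1 or later.

```csharp
using System;

// Hypothetical key type: two ints identifying a cell in a grid.
public readonly struct GridCell : IEquatable<GridCell>
{
    public readonly int Row, Column;
    public GridCell(int row, int column) { Row = row; Column = column; }

    // Dictionary<GridCell, T> uses this overload via EqualityComparer<GridCell>.Default,
    // so lookups compare fields directly without boxing.
    public bool Equals(GridCell other) => Row == other.Row && Column == other.Column;
    public override bool Equals(object obj) => obj is GridCell other && Equals(other);
    public override int GetHashCode() => HashCode.Combine(Row, Column);
}
```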
They are also useful for PInvoke/interop or low-level networking scenarios where you want precise control over the binary layout of a data structure. (go to www.pinvoke.net for lots of interop code that requires structs)
But really, I never use them myself. Don't sweat not using them.
Basically I try to NOT use them. I find they confuse other developers on the team and thus are not worth the effort. I have only found one case to use it, a custom Enum-like type we use a code generator to produce from XML.
The key for me is deciding whether I want to keep a reference to the same object.
A struct makes sense when it is part of another entity, but not an entity itself.
In the example above with LatLng, for instance, that makes perfect sense: you need to copy values from one object to another, not keep referencing the same object.
I often use structs to represent a domain model value type that might be represented as an enum, but needs an arbitrary unlimited number of discrete values, or I want it to have additional behavior (methods) that you cannot add to an enum... For example, in a recent project many data elements were associated with a specific calendar Month rather than with a date. So I created a CalendarMonth struct that had methods:
static CalendarMonth Parse(DateTime inValue);
static CalendarMonth Parse(string inValue);
and a TryParse() method:
static bool TryParse(string inValue, out CalendarMonth outVal);
and properties:
int Month { get; set; }
int Year { get; set; }
DateTime StartMonthLocal { get; set; }
DateTime StartMonthUTC { get; set; }
DateTime EndMonthLocal { get; set; }
DateTime EndMonthUTC { get; set; }
etc.
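A minimal sketch along those lines. Several details are assumptions, since the post lists only signatures: the "yyyy-MM" text format, the read-only properties (the post shows setters), and the EndMonth properties meaning the first instant of the following month.

```csharp
using System;
using System.Globalization;

public readonly struct CalendarMonth
{
    public int Year { get; }
    public int Month { get; }

    public CalendarMonth(int year, int month)
    {
        if (month < 1 || month > 12) throw new ArgumentOutOfRangeException(nameof(month));
        Year = year;
        Month = month;
    }

    public DateTime StartMonthLocal => new DateTime(Year, Month, 1, 0, 0, 0, DateTimeKind.Local);
    public DateTime StartMonthUtc => StartMonthLocal.ToUniversalTime();
    // Assumed meaning: first instant of the following month (exclusive end).
    public DateTime EndMonthLocal => StartMonthLocal.AddMonths(1);
    public DateTime EndMonthUtc => EndMonthLocal.ToUniversalTime();

    public static CalendarMonth Parse(DateTime value) => new CalendarMonth(value.Year, value.Month);

    public static CalendarMonth Parse(string value) =>
        TryParse(value, out CalendarMonth result)
            ? result
            : throw new FormatException($"'{value}' is not in yyyy-MM format.");

    // Assumed text format: "yyyy-MM", parsed culture-independently.
    public static bool TryParse(string value, out CalendarMonth result)
    {
        if (DateTime.TryParseExact(value, "yyyy-MM", CultureInfo.InvariantCulture,
                                   DateTimeStyles.None, out DateTime parsed))
        {
            result = Parse(parsed);
            return true;
        }
        result = default;
        return false;
    }

    public override string ToString() => $"{Year:D4}-{Month:D2}";
}
```

One general caveat for structs with invariants: default(CalendarMonth) bypasses the constructor and has Month == 0.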
I'm not usually concerned with 'data density' in my business apps. I will typically use a class unless I specifically want value semantics.
This means I am foreseeing a situation where I want to compare two of these things and have them show up as equal if they have the same value. With classes this is actually more work, because I need to override ==, !=, Equals, and GetHashCode, which, even if ReSharper does it for me, is extra needless code. (Structs get value-based Equals and GetHashCode from System.ValueType, although == and != still have to be declared if you want to use them.)
So in my mind, always use classes unless you know that you want these things to be compared by value (in this case, by component value).
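A quick illustration with hypothetical PointClass and PointStruct types holding the same data.

```csharp
public class PointClass
{
    public int X, Y;
    public PointClass(int x, int y) { X = x; Y = y; }
}

public struct PointStruct
{
    public int X, Y;
    public PointStruct(int x, int y) { X = x; Y = y; }
}

public static class EqualityDemo
{
    // Classes use reference equality unless Equals is overridden:
    // two separate instances with the same values are not equal.
    public static bool ClassesEqual() => new PointClass(1, 2).Equals(new PointClass(1, 2));

    // Structs inherit ValueType.Equals, which compares field values.
    // Note: == is still not defined for PointStruct unless you declare the operator.
    public static bool StructsEqual() => new PointStruct(1, 2).Equals(new PointStruct(1, 2));
}
```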
So I take it you've never used DateTime (a struct).
I can't believe no one has mentioned XNA: in XNA, almost everything is a struct. So when you do
Matrix rotation = Matrix.CreateRotationZ(MathHelper.PiOver2);
You are really creating a value-type.
This is because, unlike in applications programming, a stall of a few milliseconds while the garbage collector runs is not acceptable (we only get 16.6 ms to render the entire frame!), so we have to avoid allocations as much as possible so the GC doesn't have to run as much.
This is especially true on the Xbox 360, where the GC is nowhere near the quality it is on the PC - even an average of one allocation per frame can kill performance!
I have been working at financial institutions where large-scale caching and latency requirements were achieved using structs. Basically, structs can spare the garbage collector A LOT of work.
See these examples:
http://00sharp.wordpress.com/2013/07/03/a-case-for-the-struct/
http://00sharp.wordpress.com/2013/07/04/a-case-for-the-structpart-2/
Basically, I use Structs for modeling geometric and mathematical data, or when I want a Value-based data-structure.
The only time I've ever used a struct was when I was building a Fraction struct:
public struct Fraction
{
    public int Numerator { get; set; }
    public int Denominator { get; set; }
    //it then had a bunch of Fraction methods like Reduce, Add, Subtract etc...
}
I felt that it represents a value, just like the built in value types, and therefore coding against it would feel more natural if it behaved like a value type.
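A sketch of what such a Fraction might look like, written as a readonly struct (C# 7.2+); Reduce and Add are illustrative implementations, not the poster's code, and integer overflow is not handled.

```csharp
using System;

public readonly struct Fraction
{
    public int Numerator { get; }
    public int Denominator { get; }

    public Fraction(int numerator, int denominator)
    {
        if (denominator == 0) throw new DivideByZeroException("Denominator cannot be zero.");
        Numerator = numerator;
        Denominator = denominator;
    }

    // Divide by the greatest common divisor and keep the sign on the numerator.
    public Fraction Reduce()
    {
        int gcd = Gcd(Math.Abs(Numerator), Math.Abs(Denominator));
        int sign = Denominator < 0 ? -1 : 1;
        return new Fraction(sign * Numerator / gcd, sign * Denominator / gcd);
    }

    // a/b + c/d = (a*d + c*b) / (b*d), then reduced.
    public Fraction Add(Fraction other) =>
        new Fraction(Numerator * other.Denominator + other.Numerator * Denominator,
                     Denominator * other.Denominator).Reduce();

    public override string ToString() => $"{Numerator}/{Denominator}";

    // Euclid's algorithm on non-negative inputs.
    private static int Gcd(int a, int b)
    {
        while (b != 0) (a, b) = (b, a % b);
        return a;
    }
}
```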
I think the .NET Framework is quite real life. See the list under "Structures":
System Namespace
In some performance-critical situations, a struct (a value type and thus allocated from the stack) can be better than a class (a reference type and thus allocated from the heap). Joe Duffy's blog post "A single-word reader/writer spin lock" shows a real-life application of this.
One I've created in the past is StorageCapacity. It represented 0 bytes to N exabytes (could have gone higher to the yottabyte, but exa seemed enough at the time). The struct made sense since I worked for a storage management company. You would think it was fairly simple: a struct with a StorageUnit (enum) and a Quantity (I used decimal). But when you add in conversions, operators, and classes to support formatting, parsing, etc. it adds up.
The abstraction was useful to enable you to take any StorageCapacity and represent it as bytes, kilobytes, etc. without having to multiply or divide by 1024 many times.
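A sketch of the core of such a type, reconstructed from the description above; the enum members and the 1024-based factors are assumptions, and formatting and parsing are left out.

```csharp
public enum StorageUnit { Byte, Kilobyte, Megabyte, Gigabyte, Terabyte, Petabyte, Exabyte }

public readonly struct StorageCapacity
{
    public decimal Quantity { get; }
    public StorageUnit Unit { get; }

    public StorageCapacity(decimal quantity, StorageUnit unit)
    {
        Quantity = quantity;
        Unit = unit;
    }

    // 1024^n bytes per unit, where n is the unit's position in the enum.
    private static decimal BytesPer(StorageUnit unit)
    {
        decimal factor = 1m;
        for (int i = 0; i < (int)unit; i++) factor *= 1024m;
        return factor;
    }

    public decimal ToBytes() => Quantity * BytesPer(Unit);

    // Re-express the same capacity in another unit, with no manual multiplying by 1024.
    public StorageCapacity ConvertTo(StorageUnit target) =>
        new StorageCapacity(ToBytes() / BytesPer(target), target);

    // The result is expressed in the left operand's unit.
    public static StorageCapacity operator +(StorageCapacity a, StorageCapacity b) =>
        new StorageCapacity(a.Quantity + b.ConvertTo(a.Unit).Quantity, a.Unit);
}
```

For example, adding 512 Megabyte to 1 Gigabyte gives a StorageCapacity of 1.5 Gigabyte, with the unit conversion handled inside the operator.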
I have given my reasons for using structs already elsewhere (When to use struct in C#), and I have used structs for these reasons in real-life projects:
I would choose to use structs for performance reasons if I needed to store a large number of the same item type in an array, which may happen in image processing.
One needs to use structs for passing structured data between C# and C++.
Unless I have a very good reason to use them I try to avoid them.
I know that some people like to use them for implementing value semantics, but I find that this behavior is so different from the "normal" assignment behavior of classes (in C#) that one finds oneself running into difficult-to-trace bugs because one did not remember that the object one was assigning from or to had this behavior because it was implemented as a struct instead of a class. (It has happened to me more than once, so I give this warning since I actually have been burned by the injudicious use of C# structs.)
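I'm not sure how much use this is, but I discovered today that while you cannot have instance field initializers in structs, you can in classes. (This applies to C# versions before 10; C# 10 and later allow field initializers in structs, subject to some rules about constructors.)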
Hence the following code will give compilation errors, but if you change the "struct" to "class" it compiles.
public struct ServiceType
{
    public bool backEnd { get; set; }
    public bool frontEnd { get; set; }
    public string[] backEndServices = { "Service1", "Service2" };
    public string[] frontEndServices = { "Service3", "Service4" };
}
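If the service lists are the same for every instance, as they are in the example, one workaround is to make them static: static field initializers have always been allowed in structs. A sketch; exposing the arrays as IReadOnlyList<string> is an added choice to discourage callers from modifying the shared data.

```csharp
using System.Collections.Generic;

public struct ServiceType
{
    public bool backEnd { get; set; }
    public bool frontEnd { get; set; }

    // Static initializers compile in a struct; the lists are shared by all instances.
    public static readonly IReadOnlyList<string> backEndServices = new[] { "Service1", "Service2" };
    public static readonly IReadOnlyList<string> frontEndServices = new[] { "Service3", "Service4" };
}
```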
A struct in C# is at its heart nothing more nor less than a bunch of variables stuck together with duct tape. If one wants each variable of a particular type to represent a bunch of independent but related variables (such as the coordinates of a point) stuck together with duct tape, it's often better to use an exposed-field struct than a class, regardless of whether "bunch" means two or twenty. Note that although Microsoft's struct-versus-class advice is fine for data types which encapsulate a single value, it should be considered inapplicable for types whose purpose is to encapsulate independent but related values. The greater the extent to which the variables are independent, the greater the advantages of using an exposed-field struct.
If one wishes to use a class to encapsulate a bunch of independent variables, there are two ways one can do it, neither of which is terribly convenient. One may use an immutable class, in which case any non-null storage location of that class type will encapsulate the values held by the instance identified thereby, and one storage location may be copied to another to make the new one encapsulate those same values. Unfortunately, changing one of the values encapsulated by a storage location will generally require constructing a new instance which is just like the old one except with that value changed. For example, if one has a variable pt of type Immutable3dPoint and wishes to increase pt.X by one, one would have to do something like: pt = new Immutable3dPoint(pt.X+1, pt.Y, pt.Z); Perhaps tolerable if the type only encapsulates three values, but pretty annoying if there are very many.
The other class-based approach is to use a mutable class; this generally requires that one ensure that every storage location of the class type holds the only reference anywhere in the universe to an instance of that class. When a storage location is created, one must construct a new instance and store a reference there. If one wishes to copy the values from storage location P to storage location Q, one must copy all the fields or properties from one instance to the other (perhaps by having the type implement a CopyFrom method, and saying Q.CopyFrom(P);). Note that if one instead says Q=P; that may seem to work, but future attempts to modify P will also modify Q and vice versa. Mutable classes may work, and they can at times be efficient, but it's very easy to mess things up.
Exposed-field structures combine the convenient value-copy semantics of immutable classes with the convenient piecewise modifications allowed by mutable classes. Large structures are slower to copy than are references to immutable objects, but the cost of modifying part of an exposed-field structure depends only upon the extent of the modification, rather than upon the overall structure size. By contrast, the cost of changing one piece of data encapsulated in an immutable class type will be proportional to the total class size.
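A minimal sketch of an exposed-field struct, assuming three float coordinates (the name Point3 is hypothetical). Assigning it copies all the values, like the immutable class, while a single field can still be changed in place, like the mutable class.

```csharp
// Hypothetical exposed-field struct: three public fields and nothing else.
public struct Point3
{
    public float X, Y, Z;

    public Point3(float x, float y, float z)
    {
        X = x;
        Y = y;
        Z = z;
    }
}
```

For example, var b = a; copies X, Y and Z, and a.X += 1; then touches only a's X. The in-place update also works on array elements (arr[0].X = 4f;), but not through a List<Point3> indexer, which returns a copy; the compiler rejects list[0].X = 4f.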

Choosing between immutable objects and structs for value objects

How do you choose between implementing a value object (the canonical example being an address) as an immutable object or a struct?
Are there performance, semantic or any other benefits of choosing one over the other?
There are a few things to consider:
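A struct is usually allocated on the stack (when it is a local variable; a struct stored in a field of a class or in an array lives inline inside that object on the heap). It is a value type, so passing it between methods copies all of its data, which can be costly if it is large.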
A class is allocated on the heap. It is a reference type, so passing the object around through methods is not as costly.
Generally, I use structs for immutable objects that are not very large. I only use them when there is a limited amount of data being held in them or I want immutability. An example is the DateTime struct. I like to think that if my object is not as lightweight as something like a DateTime, it is probably not worth being used as a struct. Also, if my object makes no sense being passed around as a value (the way a DateTime does), then it may not be useful as a struct. Immutability is key here, though. Also, I want to stress that structs are not immutable by default. You have to make them immutable by design.
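One way to make a struct immutable by design, sketched with a hypothetical Temperature type: declaring it as a readonly struct (C# 7.2 and later) makes the compiler reject any field assignment outside the constructor, and "changes" return new values, the same pattern DateTime uses with methods like AddDays.

```csharp
// Hypothetical immutable struct. 'readonly struct' makes the compiler enforce
// that no field can be assigned after construction.
public readonly struct Temperature
{
    public readonly double Celsius;

    public Temperature(double celsius)
    {
        Celsius = celsius;
    }

    // "Modifying" produces a new value; the original stays unchanged.
    public Temperature Add(double delta)
    {
        return new Temperature(Celsius + delta);
    }
}
```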
In 99% of situations I encounter, a class is the proper thing to use. I find myself not needing immutable classes very often. It's more natural for me to think of classes as mutable in most cases.
How do you choose between implementing a value object (the canonical example being an address) as an immutable object or a struct?
I think your options are wrong. Immutable object and struct are not opposites, nor are they the only options. Rather, you've got four options:
:: mutable class
:: immutable class
:: mutable struct
:: immutable struct
I argue that in .NET, the default choice should be a mutable class to represent logic and an immutable class to represent an entity. I actually tend to choose immutable classes even for logic implementations, if at all feasible. Structs should be reserved for small types that emulate value semantics, e.g. a custom Date type, a Complex number type, or similar entities. The emphasis here is on small, since you don't want to copy large blobs of data, and indirection through references is actually cheap (so we don't gain much by using structs). I always make structs immutable (I can't think of a single exception at the moment). Since this best fits the semantics of the intrinsic value types, I find it a good rule to follow.
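A minimal sketch of the kind of small, immutable, value-semantics struct described here, using a hypothetical ComplexNumber type (.NET already ships System.Numerics.Complex; this is only an illustration of the pattern).

```csharp
// Hypothetical small immutable struct emulating value semantics.
public readonly struct ComplexNumber : System.IEquatable<ComplexNumber>
{
    public double Re { get; }
    public double Im { get; }

    public ComplexNumber(double re, double im)
    {
        Re = re;
        Im = im;
    }

    public static ComplexNumber operator +(ComplexNumber a, ComplexNumber b)
        => new ComplexNumber(a.Re + b.Re, a.Im + b.Im);

    // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    public static ComplexNumber operator *(ComplexNumber a, ComplexNumber b)
        => new ComplexNumber(a.Re * b.Re - a.Im * b.Im, a.Re * b.Im + a.Im * b.Re);

    // Equality compares values, as for the built-in numeric types.
    public bool Equals(ComplexNumber other) => Re == other.Re && Im == other.Im;
    public override bool Equals(object obj) => obj is ComplexNumber other && Equals(other);
    public override int GetHashCode() => (Re, Im).GetHashCode();
    public static bool operator ==(ComplexNumber a, ComplexNumber b) => a.Equals(b);
    public static bool operator !=(ComplexNumber a, ComplexNumber b) => !a.Equals(b);
}
```

Two 8-byte doubles keep it at 16 bytes, so copying it around costs about as much as copying two references on a 64-bit machine.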
I like to use a thought experiment:
Does this object make sense when only an empty constructor is called?
Edit at Richard E's request
A good use of struct is to wrap primitives and scope them to valid ranges.
For example, probability has a valid range of 0-1. Using a decimal to represent this everywhere is prone to error and requires validation at every point of usage.
Instead, you can wrap a primitive with validation and other useful operations. This passes the thought experiment because most primitives have a natural 0 state.
Here is an example usage of struct to represent probability:
using System;

public struct Probability : IEquatable<Probability>, IComparable<Probability>
{
    public static bool operator ==(Probability x, Probability y)
    {
        return x.Equals(y);
    }
    public static bool operator !=(Probability x, Probability y)
    {
        return !(x == y);
    }
    public static bool operator >(Probability x, Probability y)
    {
        return x.CompareTo(y) > 0;
    }
    public static bool operator <(Probability x, Probability y)
    {
        return x.CompareTo(y) < 0;
    }
    // The constructor re-validates, so a sum above 1 or a difference below 0 throws.
    public static Probability operator +(Probability x, Probability y)
    {
        return new Probability(x._value + y._value);
    }
    public static Probability operator -(Probability x, Probability y)
    {
        return new Probability(x._value - y._value);
    }

    private readonly decimal _value;

    public Probability(decimal value) : this()
    {
        // Only values in the closed range [0, 1] are valid probabilities.
        if (value < 0 || value > 1)
        {
            throw new ArgumentOutOfRangeException(nameof(value));
        }
        _value = value;
    }
    public override bool Equals(object obj)
    {
        return obj is Probability && Equals((Probability) obj);
    }
    public override int GetHashCode()
    {
        return _value.GetHashCode();
    }
    public override string ToString()
    {
        return (_value * 100).ToString() + "%";
    }
    public bool Equals(Probability other)
    {
        return other._value.Equals(_value);
    }
    public int CompareTo(Probability other)
    {
        return _value.CompareTo(other._value);
    }
    // Converts the stored decimal to a double.
    public double ToDouble()
    {
        return (double)_value;
    }
    // Weights an outcome by this probability; the explicit cast is needed
    // because C# has no decimal * double operator.
    public double WeightOutcome(double outcome)
    {
        return (double)_value * outcome;
    }
}
Factors: construction, memory requirements, boxing.
Normally, the constructor restrictions for structs (no explicit parameterless constructor, at least before C# 10, and no base-class construction) decide whether a struct should be used at all. E.g. if the parameterless constructor should not initialize members to default values, use an immutable object.
If you still have the choice between the two, decide on memory requirements. Small items should be stored in structs, especially if you expect many instances.
That benefit is lost when the instances get boxed (e.g. captured by an anonymous function or stored in a non-generic container); you even start to pay extra for the boxing.
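A small sketch of the boxing point, using a hypothetical two-float Vec2 struct like the one in the original question: a non-generic ArrayList boxes every element into its own heap object, while List<Vec2> keeps the values inline in its backing array.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical two-float struct, as in the original question.
public struct Vec2
{
    public float A, B;
}

public static class BoxingDemo
{
    // ArrayList stores object references, so each Add boxes the struct:
    // a separate heap object wrapping a *copy* of the value.
    public static ArrayList Boxed(int count)
    {
        var list = new ArrayList();
        for (int i = 0; i < count; i++)
            list.Add(new Vec2 { A = i, B = i }); // boxing allocation here
        return list;
    }

    // List<Vec2> stores the values inline in its backing Vec2[] array,
    // so adding an element allocates no per-element object.
    public static List<Vec2> Unboxed(int count)
    {
        var list = new List<Vec2>(count);
        for (int i = 0; i < count; i++)
            list.Add(new Vec2 { A = i, B = i });
        return list;
    }
}
```

Because a box holds a copy, changing the original variable after boxing does not change what is stored in the ArrayList.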
What is "small", what is "many"?
The overhead for an object is (IIRC) 8 bytes on a 32-bit system (16 bytes on a 64-bit system), plus the reference that points to it. Note that with a few hundred instances, this may already decide whether or not an inner loop runs fully in cache or triggers garbage collections. If you expect tens of thousands of instances, this may be the difference between run and crawl.
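As a back-of-the-envelope check, assuming the 32-bit figures above (4-byte references, an 8-byte object header, no extra padding) and the two-float type from the original question (8 bytes of data), one million instances cost roughly (taking 1 MB = 10^6 bytes):

$$\text{array of structs: } 10^6 \times 8\ \text{B} = 8\ \text{MB}$$
$$\text{array of class references: } 10^6 \times (\underbrace{4}_{\text{reference}} + \underbrace{8}_{\text{header}} + \underbrace{8}_{\text{data}})\ \text{B} = 20\ \text{MB}$$

On a 64-bit runtime the class version grows to about 10^6 × (8 + 16 + 8) B = 32 MB while the struct array stays at 8 MB, and the class version also hands the GC a million separate objects to track.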
From that point of view, using structs is NOT a premature optimization.
So, as rules of thumb:
If most instances would get boxed, use immutable objects.
Otherwise, for small objects, use an immutable object only if struct construction would lead to an awkward interface and you expect no more than a few thousand instances.
I actually don't recommend using .NET structs for Value Object implementation. There are two reasons:
Structs don't support inheritance
ORMs don't handle mapping to structs well
Here I describe this topic in detail: Value Objects explained
In today's world (I'm thinking C# 3.0 / .NET 3.5) I do not see a need for structs (EDIT: Apart from in some niche scenarios).
The pro-struct arguments appear to be mostly based around perceived performance benefits. I would like to see some benchmarks (that replicate a real-world scenario) that illustrate this.
The notion of using a struct for "lightweight" data structures seems way too subjective for my liking. When does data cease to be lightweight? Also, when adding functionality to code that uses a struct, when would you decide to change that type to a class?
Personally, I cannot recall the last time I used a struct in C#.
Edit
I suggest that the use of a struct in C# for performance reasons is a clear case of Premature Optimization*
* unless the application has been performance profiled and the use of a class has been identified as a performance bottleneck
Edit 2
MSDN states:
The struct type is suitable for representing lightweight objects such as Point, Rectangle, and Color. Although it is possible to represent a point as a class, a struct is more efficient in some scenarios. For example, if you declare an array of 1000 Point objects, you will allocate additional memory for referencing each object. In this case, the struct is less expensive.
Unless you need reference type semantics, a class that is smaller than 16 bytes may be more efficiently handled by the system as a struct.
In general, I would not recommend structs for business objects. While you MIGHT gain a small amount of performance by heading in this direction, since you are running on the stack, you end up limiting yourself in some ways, and the default constructor can be a problem in some cases.
I would say this is even more important when you have software that is released to the public.
Structs are fine for simple types, which is why you see Microsoft using structs for most of the data types. In like manner, structs are fine for objects that make sense on the stack. The Point struct, mentioned in one of the answers, is a fine example.
How do I decide? I generally default to a class, and if it seems to be something that would benefit from being a struct (which as a rule would be a rather simple object that only contains simple types implemented as structs), then I will think it through and determine if it makes sense.
You mention an address as your example. Let's examine one, as a class.
public class Address
{
    public string AddressLine1 { get; set; }
    public string AddressLine2 { get; set; }
    public string City { get; set; }
    public string State { get; set; }
    public string PostalCode { get; set; }
}
Think through this object as a struct. In doing so, consider the types included inside this address "struct", if you coded it that way. Do you see anything that might not work out the way you want? Consider the potential performance benefit (i.e., is there one?).
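To make the exercise concrete, here is a hypothetical struct version (AddressStruct is an illustrative name). Two things show up quickly: default(AddressStruct) is an "address" whose fields are all null, and the struct holds five string references (20 bytes on 32-bit, 40 on 64-bit), already over the 16-byte rule of thumb, while the string data itself still lives on the heap.

```csharp
// Hypothetical struct version of the Address class above.
// Each field is a reference to a heap-allocated string.
public struct AddressStruct
{
    public string AddressLine1;
    public string AddressLine2;
    public string City;
    public string State;
    public string PostalCode;
}
```

Copying an AddressStruct copies the five references (not the strings), so each copy can then be changed independently.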
What is the cost of copying instances if passed by value?
If high, then use an immutable reference (class) type; otherwise, use a value (struct) type.
As a rule of thumb, a struct should not exceed 16 bytes; otherwise passing it between methods may become more expensive than passing object references, which are just 4 bytes long on a 32-bit machine (8 bytes on a 64-bit machine).
Another concern is the default constructor. A struct always has a default (parameterless and public) constructor; otherwise, statements like
T[] array = new T[10]; // array with 10 values
would not work.
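A small sketch, using a hypothetical Counter struct, of what new T[10] actually does for a struct type: every element starts as the all-zero default value, and no constructor runs for any of them. (C# 10 and later allow declaring a parameterless struct constructor, but array creation and default(T) still bypass it.)

```csharp
// Hypothetical struct used to show default initialization of array elements.
public struct Counter
{
    public int Count;     // defaults to 0
    public string Label;  // defaults to null
}
```

This is why the "does it make sense with only the empty constructor?" question matters: every slot of a freshly created Counter[10] is a Counter with Count 0 and Label null, whether or not that is a meaningful value.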
Additionally, it's courteous for structs to overload the == and != operators and to implement the IEquatable<T> interface.
From an object modeling perspective, I appreciate structs because they let me use the compiler to declare certain parameters and fields as non-nullable. Of course, without special constructor semantics (like in Spec#), this is only applicable to types that have a natural 'zero' value. (Hence Bryan Watt's 'thought experiment' answer.)
Structs are strictly for advanced users (along with out and ref).
Yes, structs can give great performance when using ref, but you have to see what memory they are using, who controls the memory, etc.
If you're not using ref and out with structs, they are not worth it; if you are, expect some nasty bugs :-)
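A short sketch of the by-value versus by-ref difference, using a hypothetical mutable Accumulator struct: passing by value hands the method a copy, so its changes silently vanish, while ref lets the method modify the caller's variable.

```csharp
// Hypothetical mutable struct used to contrast by-value and by-ref passing.
public struct Accumulator
{
    public long Total;
}

public static class RefDemo
{
    // Receives a copy: the caller's struct is NOT changed.
    public static void AddByValue(Accumulator acc, long amount)
    {
        acc.Total += amount;
    }

    // Receives a reference to the caller's variable: the change is visible.
    public static void AddByRef(ref Accumulator acc, long amount)
    {
        acc.Total += amount;
    }
}
```

The silent no-op in AddByValue compiles without complaint, which is one example of the kind of bug mutable structs invite.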
