Why is this implemented as a struct? - c#

In System.Data.Linq, EntitySet<T> uses a couple of ItemList<T> structs which look like this:
internal struct ItemList<T> where T : class
{
private T[] items;
private int count;
...(methods)...
}
(Took me longer than it should have to discover this - couldn't understand why the entities field in EntitySet<T> was not throwing null reference exceptions!)
My question is what are the benefits of implementing this as a struct over a class?

Let's assume that you want to store ItemList<T> in an array.
Allocating an array of value types (structs) will store the data inside the array. If, on the other hand, ItemList<T> were a reference type (class), only references to ItemList<T> objects would be stored inside the array. The actual ItemList<T> objects would be allocated separately on the heap. An extra level of indirection would be required to reach an ItemList<T> instance, and as it is simply an array combined with a length, it is more efficient to use a value type.
After inspecting the code for EntitySet<T> I can see that no array is involved. However, an EntitySet<T> still contains two ItemList<T> instances. As ItemList<T> is a struct, the storage for these instances is allocated inside the EntitySet<T> object. If a class were used instead, the EntitySet<T> would contain references pointing to ItemList<T> objects allocated separately.
The performance difference between using one or the other may not be noticeable in most cases, but perhaps the developer decided that he wanted to treat the array and the tightly coupled count as a single value simply because it seemed like the best thing to do.
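To illustrate why the entities field never throws a null reference exception, here is a minimal sketch. TinyList and Holder are hypothetical stand-ins (the real ItemList<T> methods are not shown in the question, so Add is an assumption); Holder plays the role of EntitySet<T>.

```csharp
using System;

// Hypothetical stand-in for ItemList<T>: an array plus a count.
struct TinyList
{
    private string[] items;   // null in a default (zero-filled) instance
    private int count;        // 0 in a default instance

    public int Count => count;

    public void Add(string item)
    {
        if (items == null) items = new string[4];
        else if (count == items.Length) Array.Resize(ref items, count * 2);
        items[count++] = item;
    }
}

// Plays the role of EntitySet<T>: the struct field is never assigned.
class Holder
{
    private TinyList entities;   // stored inline in the Holder object, so it cannot be null

    public void Add(string item) => entities.Add(item);
    public int Count => entities.Count;
}

static class InlineStructDemo
{
    static void Main()
    {
        var holder = new Holder();
        holder.Add("first");              // works without ever writing "new TinyList()"
        Console.WriteLine(holder.Count);
    }
}
```

Had TinyList been a class, the same call would fail with a NullReferenceException until the field was explicitly initialised.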

For small critical internal data structures like ItemList<T>, we often have the choice of using either a reference type or a value type. If the code is written well, switching from one to the other is a trivial change.
We can speculate that a value type avoids heap allocation and a reference type avoids struct copying so it's not immediately clear either way because it depends so much on how it is used.
The best way to find out which one is better is to measure it. Whichever is faster is the clear winner. I'm sure they did their benchmarking and struct was faster. After you've done this a few times your intuition is pretty good and the benchmark just confirms that your choice was correct.

Maybe it's important that... (quote about structs from here):
The new variable and the original variable therefore contain two separate copies of the same data. Changes made to one copy do not affect the other copy.
Just thinking, don't judge me too hard :)

There are really only two reasons to ever use a struct, and that is either to get value type semantics, or for better performance.
As the struct contains an array, value type semantics doesn't work well. When you copy the struct you get a copy of the count, but you only get a copy of the reference to the array, not a copy of the items in the array. Therefore you would have to use special care whenever the struct is copied so that you don't get inconsistent instances of it.
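A minimal sketch of that hazard, assuming a hypothetical SimpleList<T> with the same two fields as ItemList<T> and an invented Add method (the real methods are not shown in the question):

```csharp
using System;

// Hypothetical stand-in for ItemList<T>.
struct SimpleList<T> where T : class
{
    private T[] items;
    private int count;

    public int Count => count;
    public T this[int index] => items[index];

    public void Add(T item)
    {
        if (items == null) items = new T[4];
        else if (count == items.Length) Array.Resize(ref items, count * 2);
        items[count++] = item;
    }
}

static class CopyHazardDemo
{
    public static (int originalCount, int copyCount, string slotOne) Run()
    {
        var original = new SimpleList<string>();
        original.Add("a");

        var copy = original;   // copies the count and the array *reference*
        copy.Add("b");         // writes into the shared array, bumps only copy's count

        // original still reports one item, although slot 1 of the array
        // it shares with copy now holds "b".
        return (original.Count, copy.Count, copy[1]);
    }

    static void Main() => Console.WriteLine(Run());
}
```

The two instances now disagree about how many items the shared array holds, which is the inconsistency described above.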
So, the only remaining valid reason would be performance. There is a small overhead for each reference type instance, so if you have a lot of them there may be a noticeable performance gain.
One nifty feature of such a structure is that you can create an array of them, and you get an array of empty lists without having to initialise each list:
ItemList<string>[] lists = new ItemList<string>[42];
As the items in the array are zero-filled, the count member will be zero and the items member will be null.

Purely speculating here:
Since the object is fairly small (only has two member variables), it is a good candidate for making it a struct to allow it to be passed as a ValueType.
Also, as @Martin Liversage points out, by being a ValueType it can be stored more efficiently in larger data structures (e.g. as an item in an array), without the overhead of having an individual object and a reference to it.

Related

Where should I store different types of value?

After Python and JavaScript I started using C# and can't understand some basic concepts.
In Python and JavaScript I used to store everything on the heap without thinking about the type of the object. But in C# I can't create a Dictionary or List holding different types of objects.
I want to store some mouse and keyboard events. For that, I use instances of class, like this:
class UserActionEvent
{
public MacroEventType Type;
public int[] MouseCoordinate = new int[2];
public string MouseKey;
public string KeyBoardKey;
public int TimeSinceLastEvent;
}
And all instances are saved in a Queue. But I worry whether it is normal to store several thousand objects like this? Maybe there is a more universal way to store data of different types?
Storage in C# is not much different from Python or JavaScript in that it uses a garbage collected heap (of course every runtime has its own way of implementing the GC). So for "normal" classes you can just go ahead and treat them as you would in JS.
C#, however, also has the concept of value types, which are typically allocated on the stack when used as local variables (or inline inside the object that contains them). The stack has much more limited space than the heap, so this is where you need to be a bit more careful, but it is unlikely that you will accidentally allocate a large amount of stack space, since collection types are all reference types (with the exception of the more exotic stackalloc arrays that you should stay away from unless you are sure what you are doing). When value types are passed between methods they are copied, but it is also possible to pass them by reference with the ref, in, or out keywords. Separately, casting a value type to object wraps a copy of it in a reference type, a process called boxing (the opposite process is called unboxing).
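A small sketch of copying, passing by reference, and boxing, using only built-in types (the method names are made up for illustration):

```csharp
using System;

static class BoxingDemo
{
    static void AddOneByValue(int x) { x++; }      // works on a copy
    static void AddOneByRef(ref int x) { x++; }    // works on the caller's variable

    public static (int afterByValue, int afterByRef, int unboxed) Run()
    {
        int n = 42;

        AddOneByValue(n);
        int afterByValue = n;       // still 42: only the copy was incremented

        AddOneByRef(ref n);
        int afterByRef = n;         // 43: the original was incremented

        object boxed = n;           // boxing: the value is copied into a heap object
        n = 0;                      // does not touch the boxed copy
        int unboxed = (int)boxed;   // unboxing: copy the value back out (43)

        return (afterByValue, afterByRef, unboxed);
    }

    static void Main() => Console.WriteLine(Run());
}
```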
To create a value type, use struct instead of class. In your example above, using a value type for the mouse coordinate, e.g.
struct Point {
public int X, Y;
}
instead of an int array would likely save memory (and GC CPU time) since in your example you would have to allocate a reference object (the array) to hold only eight bytes (the two ints). But this only matters in more exotic cases, maybe in the render loop of a game engine, or if you have huge data sets. For most types of programs this is likely to be premature optimization (though one could argue creating the struct would make the code more readable, which would likely then be the main benefit).
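Put together, the class from the question might look like the sketch below. The members of MacroEventType are an assumption (the question does not list them), and the loop just fills a queue with a few thousand events to show that this is an ordinary amount of data.

```csharp
using System;
using System.Collections.Generic;

enum MacroEventType { MouseClick, KeyPress }   // assumed members, not from the question

struct Point
{
    public int X, Y;
}

class UserActionEvent
{
    public MacroEventType Type;
    public Point MouseCoordinate;   // stored inline, no separate array object
    public string MouseKey;
    public string KeyBoardKey;
    public int TimeSinceLastEvent;
}

static class EventQueueDemo
{
    public static int Run()
    {
        var events = new Queue<UserActionEvent>();
        for (int i = 0; i < 5000; i++)
        {
            events.Enqueue(new UserActionEvent
            {
                Type = MacroEventType.MouseClick,
                MouseCoordinate = new Point { X = i, Y = i },
                MouseKey = "Left",
                TimeSinceLastEvent = 10
            });
        }
        return events.Count;
    }

    static void Main() => Console.WriteLine(Run());
}
```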
Some useful reads:
https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/builtin-types/value-types
https://medium.com/fhinkel/confused-about-stack-and-heap-2cf3e6adb771
https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/operators/stackalloc
If you want to store different types of objects in C#, one option is ArrayList.
An ArrayList can hold any type of object, since it is a dynamic collection of objects:
ArrayList myAL = new ArrayList();
myAL.Add("Hello");
myAL.Add("World");
myAL.Add("!");
You will need
using System.Collections;
to be able to use this collection.

Benefit of Value Types over Reference Types?

Seeing as new instances of value types are created every time they are passed as arguments, I started thinking about scenarios where using the ref or out keywords can show a substantial performance improvement.
After a while it hit me that while I see the deficit of using value types I didn't know of any advantages.
So my question is rather straight forward - what is the purpose of having value types? what do we gain by copying a structure instead of just creating a new reference to it?
It seems to me that it would be a lot easier to only have reference types like in Java.
Edit: Just to clear this up, I am not referring to value types smaller than 8 bytes (max size of a reference), but rather value types that are 8 bytes or more.
For example - the Rectangle struct that contains four int values.
An instance of a one-byte value type takes up one byte. A reference type takes up the space for the reference plus the sync block and the virtual function table and ...
To copy a reference, you copy a four (or eight) byte reference. To copy a four-byte integer, you copy a four byte integer. Copying small value types is no more expensive than copying references.
Value types that contain no references need not be examined by the garbage collector at all. Every reference must be tracked by the garbage collector.
Value types are usually more performant than reference types:
A reference type costs extra memory for the reference and performance when dereferencing
A value type does not need extra garbage collection. It gets garbage collected together with the instance it lives in. Local variables in methods get cleaned up upon method leave.
Value type arrays are efficient in combination with caches. (Think of an array of ints compared with an array of instances of type Integer)
"Creating a reference" is not the problem. This is just a copy of 32/64 bits. Creating the object is what is costly. Actually creating the object is cheap but collecting it isn't.
Value types are good for performance when they are small and discarded often. They can be used in huge arrays very efficiently. A struct has no object header. There are a lot of other performance differences.
Edit: Eric Lippert posed a great example in the comments: "How many bytes does an array of one million bytes take up if they are value types? How many does it take up if they are reference types?"
I will answer: If struct packing is set to 1 such an array will take 1 million and 16 bytes (on 32 bit system). Using reference types it will take:
array, object header: 12
array, length: 4
array, data: 4*(1 million) = 4m
1 million objects, headers = 12 * (1 million)
1 million objects, data padded to 4 bytes: 4 * (1 million)
And that is why using value types in large arrays can be a good idea.
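Adding up the figures above gives the totals. This is only the arithmetic from this answer written out as code; the per-object numbers are the 32-bit estimates quoted above, not measurements.

```csharp
using System;

static class ArraySizeEstimate
{
    const long N = 1_000_000;

    // Array header (12) + length (4) + one byte per element.
    public static long ValueTypeBytes() => 12 + 4 + N * 1;

    // Array header + length + N references (4 bytes each)
    // + N object headers (12 bytes each) + N payloads padded to 4 bytes.
    public static long ReferenceTypeBytes() => 12 + 4 + 4 * N + 12 * N + 4 * N;

    static void Main()
    {
        Console.WriteLine(ValueTypeBytes());       // 1000016
        Console.WriteLine(ReferenceTypeBytes());   // 20000016
    }
}
```

So under these assumptions the reference-type version needs roughly twenty times the memory.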
The gain is visible if your data is small (<16 bytes), you have lots of instances and/or you manipulate them a lot, especially passing to functions. This is because creating an object is relatively expensive compared to creating a small value type instance. And as someone else pointed out, objects need to be collected and that is even more expensive. Plus, very small value types take less memory than their reference type equivalents.
An example of a non-primitive value type in .NET is the Point structure (System.Drawing).
Every variable has a lifecycle, but not every variable needs the flexibility of being managed on the heap; some just need to be fast.
Value types (structs) contain their data directly, allocated on the stack or in-line inside the structure that contains them. Reference types (classes) store a reference to the value's memory address, and the value itself is allocated on the heap.
what is the purpose of having value types?
Value types are quite efficient for handling simple data. (They should be used to represent immutable types that stand for a value.)
A value type is not allocated as a separate object on the garbage-collected heap (unless it is boxed), and the variable representing the object does not contain a pointer to an object; the variable contains the object itself.
what do we gain by copying a structure instead of just creating a new reference to it?
If you copy a struct, C# creates a new copy of the object and assigns that copy to a separate struct variable. However, if you copy a class, C# creates a new copy of the reference to the object and assigns that copy of the reference to the separate variable, so both variables refer to the same instance. Structs can't have destructors, but classes can have destructors.
One major advantage of value types like Rectangle is that if one has n storage locations of type Rectangle, one can be certain that one has n distinct instances of type Rectangle. If one has an array MyArray of type Rectangle, of length at least two, a statement like MyArray[0] = MyArray[1] will copy the fields of MyArray[1] into those of MyArray[0], but they will continue to refer to distinct Rectangle instances. If one then performs a statement like MyArray[0].X += 4, that will modify field X of one instance, without modifying the X value of any other array slot or Rectangle instance. Note, by the way, that creating the array instantly populates it with writable Rectangle instances.
Imagine if Rectangle were a mutable class type. Creating an array of mutable Rectangle instances would require that one first dimension the array, and then assign to each element in the array a new Rectangle instance. If one wanted to copy the value of one rectangle instance to another, one would have to say something like MyArray[0].CopyValuesFrom(MyArray[1]) (which would, of course, fail if MyArray[0] had not been populated with a reference to a new instance). If one were to accidentally say MyArray[0] = MyArray[1], then writing to MyArray[0].X would also affect MyArray[1].X. Nasty stuff.
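The difference can be sketched with two hypothetical types, RectStruct and RectClass (stand-ins for Rectangle, which in System.Drawing is a struct):

```csharp
using System;

struct RectStruct { public int X, Y, Width, Height; }
class RectClass { public int X, Y, Width, Height; }

static class ArraySemanticsDemo
{
    public static (int first, int second) StructArray()
    {
        var a = new RectStruct[2];   // both slots are usable immediately
        a[1].X = 10;
        a[0] = a[1];                 // copies the fields
        a[0].X += 4;                 // changes slot 0 only
        return (a[0].X, a[1].X);     // (14, 10)
    }

    public static (int first, int second) ClassArray()
    {
        var a = new RectClass[2];    // both slots start out null
        a[1] = new RectClass { X = 10 };
        a[0] = a[1];                 // copies the reference: both slots alias one object
        a[0].X += 4;                 // visible through both slots
        return (a[0].X, a[1].X);     // (14, 14)
    }

    static void Main()
    {
        Console.WriteLine(StructArray());
        Console.WriteLine(ClassArray());
    }
}
```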
It's important to note that there are a few places in C# and vb.net where the compiler will implicitly copy a value type and then act upon a copy as though it was the original. This is a really unfortunate language design, and has prompted some people to put forth the proposition that value types should be immutable (since most situations involving implicit copying only cause problems with mutable value types). Back when compilers were very bad at warning of cases where semantically-dubious copies would yield broken behavior, such a notion might have been reasonable. It should be considered obsolete today, though, given that any decent modern compiler will flag errors in most scenarios where implicit copying would yield broken semantics, including all scenarios where structs are only mutated via constructors, property setters, or external assignments to public mutable fields. A statement like MyArray[0].X += 5 is far more readable than MyArray[0] = new Rectangle(MyArray[0].X + 5, MyArray[0].Y, MyArray[0].Width, MyArray[0].Height).

Why are C# structs immutable?

I was just curious to know why structs, strings etc are immutable? What is the reason for making them immutable and rest of the objects as mutable. What are the things that are considered to make an object immutable?
Is there any difference on the way how memory is allocated and deallocated for mutable and immutable objects?
If this subject interests you, I have a number of articles about immutable programming at https://ericlippert.com/2011/05/26/atomicity-volatility-and-immutability-are-different-part-one/
I was just curious to know why structs, strings etc are immutable?
Structs and classes are not immutable by default, though it is a best practice to make structs immutable. I like immutable classes too.
Strings are immutable.
What is the reason for making them immutable and rest of the objects as mutable.
Reasons to make all types immutable:
It is easier to reason about objects that do not change. If I have a queue with three items in it, I know it is not empty now, it was not empty five minutes ago, it will not be empty in the future. It's immutable! Once I know a fact about it, I can use that fact forever. Facts about immutable objects do not go stale.
A special case of the first point: immutable objects are much easier to make threadsafe. Most thread safety problems are due to writes on one thread and reads on another; immutable objects don't have writes.
Immutable objects can be taken apart and re-used. For example, if you have an immutable binary tree then you can use its left and right subtrees as subtrees of a different tree without worrying about it. In a mutable structure you typically end up making copies of data to re-use it because you don't want changes to one logical object affecting another. This can save lots of time and memory.
Reasons to make structs immutable
There are lots of reasons to make structs immutable. Here's just one.
Structs are copied by value, not by reference. It is easy to accidentally treat a struct as being copied by reference. For example:
void M()
{
S s = whatever;
... lots of code ...
s.Mutate();
... lots more code ...
Console.WriteLine(s.Foo);
...
}
Now you want to refactor some of that code into a helper method:
void Helper(S s)
{
... lots of code ...
s.Mutate();
... lots more code ...
}
WRONG! That should be (ref S s) -- if you don't do that then the mutation will happen on a copy of s. If you don't allow mutations in the first place then all these sorts of problems go away.
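A compilable version of the same mistake, with a made-up struct S whose Mutate method just increments Foo:

```csharp
using System;

struct S
{
    public int Foo;
    public void Mutate() => Foo++;
}

static class RefactorDemo
{
    static void HelperByValue(S s) { s.Mutate(); }      // mutates a copy
    static void HelperByRef(ref S s) { s.Mutate(); }    // mutates the caller's variable

    public static (int afterByValue, int afterByRef) Run()
    {
        var s = new S();

        HelperByValue(s);
        int afterByValue = s.Foo;   // 0: the mutation was lost

        HelperByRef(ref s);
        int afterByRef = s.Foo;     // 1

        return (afterByValue, afterByRef);
    }

    static void Main() => Console.WriteLine(Run());
}
```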
Reasons to make strings immutable
Remember my first point about facts about immutable structures staying facts?
Suppose string were mutable:
public static File OpenFile(string filename)
{
if (!HasPermission(filename)) throw new SecurityException();
return InternalOpenFile(filename);
}
What if the hostile caller mutates filename after the security check and before the file is opened? The code just opened a file that they might not have permission to!
Again, mutable data is hard to reason about. You want the fact "this caller is authorized to see the file described by this string" to be true forever, not until a mutation happens. With mutable strings, to write secure code we'd constantly have to be making copies of data that we know do not change.
What are the things that are considered to make an object immutable?
Does the type logically represent something that is an "eternal" value? The number 12 is the number 12; it doesn't change. Integers should be immutable. The point (10, 30) is the point (10, 30); it doesn't change. Points should be immutable. The string "abc" is the string "abc"; it doesn't change. Strings should be immutable. The list (10, 20, 30) doesn't change. And so on.
Sometimes the type represents things that do change. Mary Smith's last name is Smith, but tomorrow she might be Mary Jones. Or Miss Smith today might be Doctor Smith tomorrow. The alien has fifty health points now but has ten after being hit by the laser beam. Some things are best represented as mutations.
Is there any difference on the way how memory is allocated and deallocated for mutable and immutable objects?
Not as such. As I mentioned before though, one of the nice things about immutable values is that sometimes you can re-use parts of them without making copies. So in that sense, memory allocation can be very different.
Structs are not necessarily immutable, but mutable structs are evil.
Creating mutable structs can lead to all kinds of strange behavior in your application and, therefore, they are considered a very bad idea (stemming from the fact that they look like a reference type but are actually a value type and will be copied whenever you pass them around).
Strings, on the other hand, are immutable. This makes them inherently thread-safe as well as allowing for optimizations via string interning. If you need to construct a complicated string on the fly, you can use StringBuilder.
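For example (a small sketch using only System.String and System.Text.StringBuilder):

```csharp
using System;
using System.Text;

static class StringDemo
{
    public static (string original, string upper, string built) Run()
    {
        string original = "abc";
        string upper = original.ToUpper();   // returns a new string; original is untouched

        // StringBuilder is the mutable companion for building strings piece by piece.
        var sb = new StringBuilder();
        for (int i = 0; i < 3; i++)
        {
            sb.Append(i).Append(',');
        }

        return (original, upper, sb.ToString());   // ("abc", "ABC", "0,1,2,")
    }

    static void Main() => Console.WriteLine(Run());
}
```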
The concepts of mutability and immutability have different meanings when applied to structs and classes. A key aspect (oftentimes, the key weakness) of mutable classes is that if Foo has a field Bar of type List<int>, which holds a reference to a list containing (1,2,3), other code which has a reference to that same list could modify it, such that Bar holds a reference to a list containing (4,5,6), even if that other code has no access whatsoever to Bar. By contrast, if Foo had a field Biz of type System.Drawing.Point, the only way anything could modify any aspect of Biz would be to have write access to that field.
The fields (public and private) of a struct can be mutated by any code which can mutate the storage location in which the struct is stored, and cannot be mutated by any code which cannot mutate the storage location where it is stored. If all of the information encapsulated within a struct is held in its fields, such a struct can effectively combine the control of an immutable type with the convenience of a mutable type, unless the struct is coded in such a way as to remove such convenience (a habit which, unfortunately, some Microsoft programmers recommend).
The "problem" with structs is that when a method (including a property implementation) is invoked on a struct in a read-only context (or immutable location), the system copies the struct, performs the method on the temporary copy, and silently discards the result. This behavior has led programmers to put forth the unfortunate notion that the way to avoid problems with mutating methods is to have many structs disallow piecewise updates, when the problems could have been better avoided by simply replacing properties with exposed fields.
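The silent copy is easy to reproduce with a readonly field. Counter and Owner below are made-up names for illustration:

```csharp
using System;

struct Counter
{
    public int Count;
    public void Increment() => Count++;
}

class Owner
{
    private readonly Counter frozen = new Counter();
    private Counter writable = new Counter();

    public (int frozenCount, int writableCount) Run()
    {
        frozen.Increment();     // compiles, but runs on a temporary copy that is then discarded
        writable.Increment();   // runs on the field itself

        // Writing "frozen.Count++" here would be rejected at compile time,
        // which is the point about exposed fields made above.
        return (frozen.Count, writable.Count);   // (0, 1)
    }
}

static class ReadOnlyCopyDemo
{
    static void Main() => Console.WriteLine(new Owner().Run());
}
```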
Incidentally, some people complain that when a class property returns a conveniently-mutable struct, changes to the struct don't affect the class from which it came. I would posit that's a good thing--the fact that the returned item is a struct makes the behavior clear (especially if it's an exposed-field struct). Compare a snippet using a hypothetical struct and property on Drawing.Matrix with one using an actual property on that class as implemented by Microsoft:
// Hypothetical struct
public struct Transform2d {
public float xx, xy, yx, yy, dx, dy;
}
// Hypothetical property of "System.Drawing.Drawing2D.Matrix"
public Transform2d Transform { get; }
// Actual property of "System.Drawing.Drawing2D.Matrix"
public float[] Elements { get; }
// Code using hypothetical struct
Transform2d myTransform = myMatrix.Transform;
myTransform.dx += 20;
... other code using myTransform
// Code using actual Microsoft property
float[] myArray = myMatrix.Elements;
myArray[4] += 20;
... other code using myArray
Looking at the actual Microsoft property, is there any way to tell whether the write to myArray[4] will affect myMatrix? Even looking at the page http://msdn.microsoft.com/en-us/library/system.drawing.drawing2d.matrix.elements.aspx is there any way to tell? If the property had been written using the struct-based equivalent, there would be no confusion; the property that returns the struct would return nothing more nor less than the present value of six numbers. Changing myTransform.dx would be nothing more nor less than a write to a floating-point variable which was unattached to anything else. Anyone who doesn't like the fact that changing myTransform.dx doesn't affect myMatrix should be equally annoyed that writing myArray[4] doesn't affect myMatrix either, except that the independence of myMatrix and myTransform is apparent, while the independence of myMatrix and myArray is not.
A struct type is not immutable. Yes, strings are. Making your own type immutable is easy, simply don't provide a default constructor, make all fields private and define no methods or properties that change a field value. Have a method that should mutate the object return a new object instead. There is a memory management angle, you tend to create a lot of copies and garbage.
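A minimal sketch of that recipe, using a made-up Money class: the state is set once in the constructor, nothing exposes a setter, and the "mutating" method returns a new object.

```csharp
using System;

public sealed class Money   // hypothetical example type
{
    public decimal Amount { get; }     // get-only: assignable only in the constructor
    public string Currency { get; }

    public Money(decimal amount, string currency)
    {
        Amount = amount;
        Currency = currency;
    }

    // Instead of changing this instance, hand back a new one.
    public Money Add(decimal delta) => new Money(Amount + delta, Currency);
}

static class ImmutableDemo
{
    public static (decimal before, decimal after) Run()
    {
        var price = new Money(10m, "EUR");
        var raised = price.Add(5m);
        return (price.Amount, raised.Amount);   // (10, 15): price itself never changes
    }

    static void Main() => Console.WriteLine(Run());
}
```

Each call to Add allocates a new object, which is the extra garbage mentioned above.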
Structs can be mutable, but it's a bad idea because they have copy-semantics. If you make a change to a struct, you might actually be modifying a copy. Keeping track of exactly what has been changed is very tricky.
Mutable structs breed mistakes.

Are stack based arrays possible in C#?

Let's say, hypothetically (read: I don't think I actually need this, but I am curious as the idea popped into my head), one wanted an array of memory set aside locally on the stack, not on the heap. For instance, something like this:
private void someFunction()
{
int[20] stackArray; //C style; I know the size and it's set in stone
}
I'm guessing the answer is no. All I've been able to find is heap based arrays. If someone were to need this, would there be any workarounds? Is there any way to set aside a certain amount of sequential memory in a "value type" way? Or are structs with named parameters the only way (like the way the Matrix struct in XNA has 16 named parameters (M11-M44))?
What you want is stackalloc. Originally you could only use this in unsafe code, which means it won't run in a limited permissions context; since C# 7.2 the result can also be assigned to a Span<T> in safe code.
You could also create a struct with the necessary number of variables in it for each element type, but you would need a new type for each size of 'array' you wanted to use
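For reference, newer compilers (C# 7.2 and later, with System.Span<T> available) accept stackalloc without an unsafe block when the result is assigned to a Span<T>. A minimal sketch; the method name is made up:

```csharp
using System;

static class StackArrayDemo
{
    public static int SumOfSquares()
    {
        // Twenty ints in this method's stack frame; no heap array is allocated.
        Span<int> stackArray = stackalloc int[20];

        for (int i = 0; i < stackArray.Length; i++)
        {
            stackArray[i] = i * i;
        }

        int sum = 0;
        foreach (int value in stackArray)
        {
            sum += value;
        }
        return sum;   // 0^2 + 1^2 + ... + 19^2 = 2470
    }

    static void Main() => Console.WriteLine(SumOfSquares());
}
```

The memory is released when the method returns, so the span must not be stored anywhere that outlives the call.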
The closest thing I can think of to a stack-based array would be a manually-nested structure; for an array of size N^M, the code size would be O(MN) and the access time O(M); one could scale M and N as convenient (e.g. one could handle a 4096-element array as six-deep nested 4-element structures, four-deep nested 8-element structures or three-deep nested 16-element structures, two-deep nested 64-element structures, etc.) If one wanted to do three-deep nesting of 16-element arrays (probably the most practical trade-off) one would define a 16-element structure with fields f0 through f15, and an access method using switch/case to select an element. One could then define a 16-element structure of those, a 16-element structure of those, etc.
In general, using a standard Array is apt to be better than using value-type structures to mimic arrays, but there are times when having an array-ish thing as a value type would be advantageous. The advantages of value type arrays would tend to be limited in .net, however, by some limitations in its handling of manipulating value types by reference. While it would be nice if one could simply access element 0x123 from an array described as above by writing "MyArrayishThing[1][2][3]", that would be inefficient for reading and ineffective for writing (since the subexpression MyArrayishThing[1] would make a copy of structures holding 256 elements of the array). Instead, what's necessary is to pass MyArrayishThing[1] by reference to a routine that can access element 2 of that and pass it by reference to a routine to access element 3 of that. It's possible to do that efficiently, but the code ends up looking rather nasty.
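A scaled-down sketch of the nested-structure idea, using 4-element blocks nested two deep (16 elements) rather than 16-element blocks nested three deep. Block4 and Block16 are made-up names, and the indexers are just one way to write the switch/case access method.

```csharp
using System;

struct Block4
{
    public int f0, f1, f2, f3;

    public int this[int i]
    {
        get
        {
            switch (i)
            {
                case 0: return f0;
                case 1: return f1;
                case 2: return f2;
                case 3: return f3;
                default: throw new IndexOutOfRangeException();
            }
        }
        set
        {
            switch (i)
            {
                case 0: f0 = value; break;
                case 1: f1 = value; break;
                case 2: f2 = value; break;
                case 3: f3 = value; break;
                default: throw new IndexOutOfRangeException();
            }
        }
    }
}

struct Block16
{
    public Block4 b0, b1, b2, b3;

    public int this[int i]
    {
        get
        {
            switch (i >> 2)   // high bits pick the inner block
            {
                case 0: return b0[i & 3];
                case 1: return b1[i & 3];
                case 2: return b2[i & 3];
                case 3: return b3[i & 3];
                default: throw new IndexOutOfRangeException();
            }
        }
        set
        {
            switch (i >> 2)   // write through the field itself, not through a copy
            {
                case 0: b0[i & 3] = value; break;
                case 1: b1[i & 3] = value; break;
                case 2: b2[i & 3] = value; break;
                case 3: b3[i & 3] = value; break;
                default: throw new IndexOutOfRangeException();
            }
        }
    }
}

static class NestedStructDemo
{
    public static (int original, int copy) Run()
    {
        var a = new Block16();
        a[5] = 50;
        var b = a;      // value semantics: all 16 ints are copied
        b[5] = 99;
        return (a[5], b[5]);   // (50, 99)
    }

    static void Main() => Console.WriteLine(Run());
}
```

Note that the setter switches on the outer index and writes into the field directly; going through a getter that returns the inner block would only modify a copy, which is the writing problem described above.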

Questions about Structs

MSDN says that a class that would be 16 bytes or less would be better handled as a struct [citation].
Why is that?
Does that mean that if a struct is over 16 bytes it's less efficient than a class or is it the same?
How do you determine if your class is under 16 bytes?
What restricts a struct from acting like a class? (besides disallowing parameterless constructors)
There are a couple different answers to this question, and it is a bit subjective, but some reasons I can think of are:
structs are value-type, classes are reference type. If you're using 16 bytes for total storage, it's probably not worth it to create memory references (4 to 8 bytes) for each one.
When you have really small objects, they can often be pushed onto the IL stack, instead of references to the objects. This can really speed up some code, as you're eliminating a memory dereference on the callee side.
There is a bit of extra "fluff" associated with classes in IL, and if your data structure is very small, none of this fluff would be used anyway, so it's just extra junk you don't need.
The most important difference between a struct and a class, though, is that structs are value type and classes are reference type.
By "efficient", they're probably talking about the amount of memory it takes to represent the class or struct.
On the 32-bit platform, allocating an object requires a minimum of 16 bytes. On a 64-bit platform, the minimum object size is 24 bytes. So, if you're looking at it purely from the amount of memory used, a struct that contains less than 16 bytes of data will be "better" than the corresponding class.
But the amount of memory used is not the whole story. Value types (structs) are fundamentally different than reference types (classes). Structs can be inconvenient to work with, and can actually cause performance problems if you're not careful.
The real answer, of course, is to use whichever works best in your situation. In most cases, you'll be much better off using classes.
Check this link, I found it on one of the answers in SO today: .NET Type Internals. You can also try searching SO and Googling for "reference types vs value types" for differences between structs and classes.
What restricts a struct from acting like a class?
There are many differences. You cannot inherit from a struct, for example.
You can't declare virtual methods in a struct, although a struct can still implement an interface. Instance methods in structs can access the struct's private fields, but apart from that they behave a lot like auxiliary "helper" functions (for immutable structs, they sometimes don't even need to access private data). So I find them to be not nearly as "valuable" as class methods.
Structs are different from classes because they are value types: a struct local typically lives on the stack rather than as a separate object on the heap. That means that every time you call a method with the struct as a parameter, a copy is created and passed to the method. That is why large structs can be very inefficient.
I would nevertheless actively discourage using structs, because they can cause some subtle bugs: e.g. when you change a field of a struct, it's not going to be reflected for the caller (because you only changed the copy), which is completely different behavior from classes.
So the 16 bytes I think is a reasonable maximum size of a struct, but still in most cases it is better to have a class. If you still want to create a struct, try to make it immutable at least.
This is due to the different way that the CLR handles structs and classes. Structs are value types, which means they are stored directly where they are declared (typically on the stack for local variables) rather than as separate objects in the managed heap. It is a good rule of thumb to keep structs small, because once you start passing them as method arguments you will incur overhead as structs are copied in their entirety when passed to a method.
Since classes pass a copy of their reference to methods they incur much less overhead when used as method arguments.
The best way to determine the size of your class is to total the number of bytes required by all the members of your class plus an extra 8 bytes for CLR overhead stuff (the sync block index and the reference to the type of the object).
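If you would rather not add the bytes up by hand, one rough check is Marshal.SizeOf<T>(). Treat this as a sketch: it reports the marshalled (unmanaged) size, which for a simple struct of ints matches the field total, and the "+ 8" below is just the 32-bit overhead rule from this answer, not something the runtime reports. Rect4 is a made-up type.

```csharp
using System;
using System.Runtime.InteropServices;

struct Rect4   // hypothetical struct with four int fields
{
    public int X, Y, Width, Height;
}

static class SizeDemo
{
    // Four 4-byte ints: 16 bytes.
    public static int StructBytes() => Marshal.SizeOf<Rect4>();

    // Estimate for an equivalent class on 32-bit, using the rule of thumb above.
    public static int EstimatedClassBytes32Bit() => StructBytes() + 8;

    static void Main()
    {
        Console.WriteLine(StructBytes());                // 16
        Console.WriteLine(EstimatedClassBytes32Bit());   // 24
    }
}
```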
In memory, the struct will hold the data directly, while a class will behave more like a pointer. That alone makes an important difference, since passing the struct as a parameter to a method will pass its values (copy them on the stack), while the class will pass the reference to the values. If the struct is big, you will be copying a lot of values on each method call. When it is really small copying the values and using them directly will be probably faster than copying the pointer and having to grab them from another place.
About restrictions: you can't assign it to null (although you can use Nullable<>) and you have to initialize it right away.
Copying an instance of a struct takes less time than creating a new instance of a class and copying data from an old one, but class instances can be shared and struct instances cannot. Thus, "structvar1 = structvar2" requires copying into a new struct instance, whereas "classvar1 = classvar2" allows classvar1 and classvar2 to refer to the same class instance (without having to create a new one).
The code to handle the creation of new struct instances is optimized for sizes up to 16 bytes. Larger structs are handled less efficiently. Structs are a win in cases where every variable that holds a struct will hold an independent instance (i.e. there's no reason to expect that any particular two variables will hold identical instances); they are not much of a win (if they're a win at all) in cases where many variables could hold the same instance.
