Optimal solution for struct with more than 16 bytes - c#

I have a type that I am considering making a struct:
It represents a single value.
It is immutable.
The problem is that it has 6 int fields.
So which solution should I use for this type?
1. Keep using a struct?
2. Change it to a class?
3. Or pack the 6 integers into an int array, so the type has only one field?
EDIT
The size of a struct with 6 int fields is 24 bytes, which is a lot to pass around.
The recommended size for a struct is no more than 16 bytes.

It depends on how you are going to use it.
Are you going to allocate a lot of them, or pass them around a lot?
Is it going to be consumed by 3rd party code? In that case, classes typically give you more flexibility.
Do you want struct or class semantics? For example, non-nullable?
Would you benefit from having a couple of pre-created instances that can be reused to represent special cases, similar to String.Empty? In that case you would benefit from a class.
It is hard to answer just from the information you provided in your question.

Be careful of boxing. If your struct is going to be consumed by a method which expects an Object, it will be coerced into an object and you'll have a potential performance hit.
Here's a reference that explains it in more detail.
I'd make it a class (#2) and then you wouldn't have to worry about it.
Using an array of six integers (#3) would probably make your code harder to read. A class with descriptive identifiers for each int would be much better.
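To make the boxing point concrete, here is a minimal sketch. SixInts, BoxingDemo and the method names are made up for illustration and are not from the question; the struct is left mutable only so the copy made by boxing is visible.

```csharp
using System;

// Hypothetical six-int value type standing in for the type in the question.
public struct SixInts
{
    public int A, B, C, D, E, F;
}

public static class BoxingDemo
{
    // Takes object: passing a struct here boxes it (heap allocation plus a copy).
    public static string DescribeBoxed(object value)
    {
        return value.GetType().Name;
    }

    // Generic version: T is the struct type itself, so the argument is not boxed.
    public static string DescribeGeneric<T>(T value) where T : struct
    {
        return typeof(T).Name;
    }

    // Shows that the boxed copy is independent of the original variable.
    public static bool BoxedCopyIsIndependent()
    {
        var s = new SixInts { A = 1 };
        object boxed = s;   // box: copies s to the heap
        s.A = 99;           // changes only the local variable
        return ((SixInts)boxed).A == 1;
    }
}
```

A generic method or a strongly typed overload is the usual way to avoid the box when the caller controls the signature.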

Without seeing your struct, it's difficult to say anything definitively. But I suspect you should leave this as a struct.

How about a WriteOnce<int[]> ?

I would suggest writing a little benchmark to measure the performance of the different options, this is the only way to know for sure. You may be surprised at the results (I often am).
(I'm assuming that your concern here is performance.)
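A rough sketch of such a benchmark, using System.Diagnostics.Stopwatch. The types SixIntStruct and SixIntClass and the workload are assumptions for illustration. A loop this simple is easily distorted by JIT inlining and warm-up, so treat the timings as a starting point only.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-ins for the six-int type in the question.
public struct SixIntStruct { public int A, B, C, D, E, F; }
public sealed class SixIntClass { public int A, B, C, D, E, F; }

public static class StructVsClassTiming
{
    // The struct is passed by value, so each call copies 24 bytes.
    static int SumStruct(SixIntStruct v) { return v.A + v.B + v.C + v.D + v.E + v.F; }

    // The class is passed as a reference, so each call copies one pointer.
    static int SumClass(SixIntClass v) { return v.A + v.B + v.C + v.D + v.E + v.F; }

    public static long RunStruct(int iterations, out TimeSpan elapsed)
    {
        var value = new SixIntStruct { A = 1, B = 2, C = 3, D = 4, E = 5, F = 6 };
        long total = 0;
        var timer = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) total += SumStruct(value);
        timer.Stop();
        elapsed = timer.Elapsed;
        return total; // returned so the loop cannot be discarded as dead code
    }

    public static long RunClass(int iterations, out TimeSpan elapsed)
    {
        var value = new SixIntClass { A = 1, B = 2, C = 3, D = 4, E = 5, F = 6 };
        long total = 0;
        var timer = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) total += SumClass(value);
        timer.Stop();
        elapsed = timer.Elapsed;
        return total;
    }
}
```

Run each method several times with a large iteration count and compare the elapsed values on the actual target runtime; the workload should resemble how the type is really used.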

If the data holder is going to be immutable, the struct-versus-class question will most likely depend upon the average number of references that would exist to each instance. If one has an array of TwentyFourByteStruct[1000], that array will take 24,000 bytes, regardless of whether every element holds a different value, all elements hold the same value, or somewhere in-between. If one has an array of TwentyFourByteClass[1000], that array will take 4,000 or 8,000 bytes (for 32/64-bit systems), and each distinct instance of TwentyFourByteClass which is created will take about 48 bytes. If all of the array elements happen to hold a reference to the same TwentyFourByteClass object, the total will be 4,048 or 8,048 bytes. If all of the array elements hold references to different TwentyFourByteClass objects, the total will be 52,000 or 56,000 bytes.
As for run-time performance, the best performance you can get will generally come from passing structures by reference. Passing structures by value requires copying them, which can get expensive for structures larger than 16 bytes (.NET includes optimizations for structures of 16 bytes or smaller), but the cost of passing a value type by reference is the same whether it is 1 byte or 16,000 bytes.
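Passing by reference can be sketched as follows. This assumes C# 7.2 or later for readonly struct and the in modifier (on older compilers, ref serves the same purpose); Big24 and PassingDemo are hypothetical names.

```csharp
using System;

// Hypothetical immutable 24-byte value type.
public readonly struct Big24
{
    public readonly int A, B, C, D, E, F;

    public Big24(int a, int b, int c, int d, int e, int f)
    {
        A = a; B = b; C = c; D = d; E = e; F = f;
    }
}

public static class PassingDemo
{
    // By value: the caller's 24 bytes are copied into the parameter.
    public static int SumByValue(Big24 v)
    {
        return v.A + v.B + v.C + v.D + v.E + v.F;
    }

    // By read-only reference: only an address is passed, whatever the struct's size.
    public static int SumByReference(in Big24 v)
    {
        return v.A + v.B + v.C + v.D + v.E + v.F;
    }
}
```

Declaring the struct readonly matters here: it lets the compiler skip the defensive copies it would otherwise make when members are called on an in parameter.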

In general, when storing more than two pieces of related data I like to make a class that binds them together. Especially if I will be passing them around as a unit.

Related

Why is this implemented as a struct?

In System.Data.Linq, EntitySet<T> uses a couple of ItemList<T> structs which look like this:
internal struct ItemList<T> where T : class
{
    private T[] items;
    private int count;
    ...(methods)...
}
(Took me longer than it should to discover this - couldn't understand why the entities field in EntitySet<T> was not throwing null reference exceptions!)
My question is what are the benefits of implementing this as a struct over a class?
Let's assume that you want to store ItemList<T> in an array.
Allocating an array of value types (structs) stores the data inside the array itself. If, on the other hand, ItemList<T> were a reference type (class), only references to ItemList<T> objects would be stored inside the array. The actual ItemList<T> objects would be allocated separately on the heap. An extra level of indirection would be required to reach an ItemList<T> instance, and as it is simply an array combined with a length, it is more efficient to use a value type.
After inspecting the code for EntitySet<T>, I can see that no array is involved. However, an EntitySet<T> still contains two ItemList<T> instances. As ItemList<T> is a struct, the storage for these instances is allocated inside the EntitySet<T> object. If a class had been used instead, the EntitySet<T> would have contained references pointing to ItemList<T> objects allocated separately.
The performance difference between using one or the other may not be noticeable in most cases, but perhaps the developer decided to treat the array and the tightly coupled count as a single value simply because it seemed like the best thing to do.
For small, critical internal data structures like ItemList<T>, we often have the choice of using either a reference type or a value type. If the code is written well, switching from one to the other is a trivial change.
We can speculate that a value type avoids heap allocation and a reference type avoids struct copying so it's not immediately clear either way because it depends so much on how it is used.
The best way to find out which one is better is to measure it. Whichever is faster is the clear winner. I'm sure they did their benchmarking and struct was faster. After you've done this a few times your intuition is pretty good and the benchmark just confirms that your choice was correct.
Maybe it's important that... (a quote about structs from here):
"The new variable and the original variable therefore contain two separate copies of the same data. Changes made to one copy do not affect the other copy."
Just thinking, don't judge me too hard :)
There are really only two reasons to ever use a struct, and that is either to get value type semantics, or for better performance.
As the struct contains an array, value type semantics doesn't work well. When you copy the struct you get a copy of the count, but you only get a copy of the reference to the array, not a copy of the items in the array. Therefore you would have to use special care whenever the struct is copied so that you don't get inconsistent instances of it.
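The inconsistency described above can be reproduced with a small sketch. TinyList is a made-up stand-in with the same shape as ItemList<T> (an array plus a count); it is not the real System.Data.Linq type.

```csharp
using System;

// Hypothetical struct shaped like ItemList<T>: an array reference plus a count.
public struct TinyList
{
    private int[] items;
    private int count;

    public int Count { get { return count; } }

    public int this[int index] { get { return items[index]; } }

    public void Add(int value)
    {
        if (items == null) items = new int[4];
        else if (count == items.Length) Array.Resize(ref items, count * 2);
        items[count++] = value;
    }
}

public static class CopyPitfall
{
    public static string Run()
    {
        var a = new TinyList();
        a.Add(1);
        var b = a;   // copies count (1) and the array reference, not the items
        b.Add(2);    // writes slot 1 of the shared array; only b's count becomes 2
        a.Add(3);    // a's count is still 1, so slot 1 of the shared array is overwritten
        return a.Count + "," + b.Count + "," + b[1];
    }
}
```

Run() returns "2,2,3": b added the value 2, yet reads back 3, because both copies still point at the same array while tracking their counts separately.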
So, the only remaining valid reason would be performance. There is a small overhead for each reference type instance, so if you have a lot of them there may be a noticeable performance gain.
One nifty feature of such a structure is that you can create an array of them, and you get an array of empty lists without having to initialise each list:
ItemList<string>[] lists = new ItemList<string>[42];
As the items in the array are zero-filled, the count member will be zero and the items member will be null.
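A sketch of that behaviour, using a hypothetical LazyList<T> in place of the internal ItemList<T>. The only assumption is that the struct treats a null array as an empty list.

```csharp
using System;

// Hypothetical struct whose all-zero default state is a valid empty list.
public struct LazyList<T> where T : class
{
    private T[] items;   // null until the first Add
    private int count;   // zero by default

    public int Count { get { return count; } }
    public bool IsAllocated { get { return items != null; } }

    public void Add(T item)
    {
        if (items == null) items = new T[4];
        else if (count == items.Length) Array.Resize(ref items, count * 2);
        items[count++] = item;
    }
}

public static class DefaultArrayDemo
{
    // Sums the counts of every list in the array.
    public static int TotalCount(LazyList<string>[] lists)
    {
        int total = 0;
        for (int i = 0; i < lists.Length; i++) total += lists[i].Count;
        return total;
    }
}
```

After var lists = new LazyList<string>[42]; every element is already a usable empty list, and lists[3].Add("x") modifies the element in place because array elements are accessed by reference.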
Purely speculating here:
Since the object is fairly small (only has two member variables), it is a good candidate for making it a struct to allow it to be passed as a ValueType.
Also, as @Martin Liversage points out, by being a ValueType it can be stored more efficiently in larger data structures (e.g. as an item in an array), without the overhead of having an individual object and a reference to it.

Are stack based arrays possible in C#?

Let's say, hypothetically (read: I don't think I actually need this, but I am curious as the idea popped into my head), one wanted an array of memory set aside locally on the stack, not on the heap. For instance, something like this:
private void someFunction()
{
int[20] stackArray; //C style; I know the size and it's set in stone
}
I'm guessing the answer is no. All I've been able to find is heap based arrays. If someone were to need this, would there be any workarounds? Is there any way to set aside a certain amount of sequential memory in a "value type" way? Or are structs with named parameters the only way (like the way the Matrix struct in XNA has 16 named parameters (M11-M44))?
What you want is stackalloc; unfortunately, you can only use this in unsafe code, which means it won't run in a limited permissions context.
You could also create a struct with the necessary number of fields in it for each element type, but you would need a new type for each size of 'array' you wanted to use.
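With C# 7.2 or later, the result of stackalloc can also be assigned to a Span<int> without an unsafe context. A minimal sketch, assuming such a compiler and a runtime that provides Span<T>; the class and method names are made up for illustration.

```csharp
using System;

public static class StackArrayDemo
{
    // Fills a 20-element stack buffer with squares and returns their sum.
    public static int SumOfSquares()
    {
        // The memory lives in this method's stack frame and is gone on return.
        Span<int> buffer = stackalloc int[20];

        for (int i = 0; i < buffer.Length; i++)
        {
            buffer[i] = i * i;
        }

        int sum = 0;
        foreach (int value in buffer)
        {
            sum += value;
        }
        return sum;
    }
}
```

The size should stay small and bounded, since stack space is limited and the span cannot be stored in a field of a class or returned from the method.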
The closest thing I can think of to a stack-based array would be a manually-nested structure; for an array of size N^M, the code size would be O(MN) and the access time O(M); one could scale M and N as convenient (e.g. one could handle a 4096-element array as six-deep nested 4-element structures, four-deep nested 8-element structures or three-deep nested 16-element structures, two-deep nested 64-element structures, etc.) If one wanted to do three-deep nesting of 16-element arrays (probably the most practical trade-off) one would define a 16-element structure with fields f0 through f15, and an access method using switch/case to select an element. One could then define a 16-element structure of those, a 16-element structure of those, etc.
In general, using a standard Array is apt to be better than using value-type structures to mimic arrays, but there are times when having an array-ish thing as a value type would be advantageous. The advantages of value-type arrays tend to be limited in .NET, however, by some limitations in how it handles manipulating value types by reference. While it would be nice if one could simply access element 0x123 of an array described as above by writing "MyArrayishThing[1][2][3]", that would be inefficient for reading and ineffective for writing (since the subexpression MyArrayishThing[1] would make a copy of a structure holding 256 elements of the array). Instead, what's necessary is to pass MyArrayishThing[1] by reference to a routine that can access element 2 of that and pass it by reference to a routine to access element 3 of that. It's possible to do that efficiently, but the code ends up looking rather nasty.
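A scaled-down sketch of the nested-structure idea, using 4-element structures instead of 16 to keep it short. It relies on ref returns (C# 7.0 or later), which postdate the answer above; Four, Sixteen and At are hypothetical names.

```csharp
using System;

// Innermost level: four named fields selected with switch/case.
public struct Four
{
    public int F0, F1, F2, F3;

    // Returns a reference to the selected field so callers can read or write in place.
    public static ref int At(ref Four a, int index)
    {
        switch (index)
        {
            case 0: return ref a.F0;
            case 1: return ref a.F1;
            case 2: return ref a.F2;
            case 3: return ref a.F3;
            default: throw new ArgumentOutOfRangeException(nameof(index));
        }
    }
}

// Second level: four Four structures, giving 16 elements in total.
public struct Sixteen
{
    public Four G0, G1, G2, G3;

    public static ref int At(ref Sixteen a, int index)
    {
        int inner = index & 3;   // position inside the selected group
        switch (index >> 2)      // which group of four
        {
            case 0: return ref Four.At(ref a.G0, inner);
            case 1: return ref Four.At(ref a.G1, inner);
            case 2: return ref Four.At(ref a.G2, inner);
            case 3: return ref Four.At(ref a.G3, inner);
            default: throw new ArgumentOutOfRangeException(nameof(index));
        }
    }
}
```

Because every level passes its sub-structure by reference, Sixteen.At(ref s, 9) = 42; writes directly into s.G2.F1 without copying any intermediate structure.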

When structures are better than classes? [duplicate]

Duplicate of: When to use struct in C#?
Are there practical reasons to use structures instead of classes in Microsoft .NET 2.0/3.5?
"What is the difference between structures and classes?" is probably the most popular question in interviews for ".NET developer" vacancies. The only answer the interviewer considers right is "structures are allocated on the stack and classes are allocated on the heap", and no further questions are asked about it.
Some Google searching showed that:
a) structures have numerous limitations and no additional abilities in comparison to classes, and
b) the stack (and as such structures) can be faster under very specialized conditions, including:
the size of the data chunk is less than 16 bytes
no extensive boxing/unboxing
the structure's members are nearly immutable
the whole set of data is not big (otherwise we get a stack overflow)
(please correct or add to this list if it is wrong or incomplete)
As far as I know, most typical commercial projects (ERM, accounting, solutions for banks, etc.) do not define even a single structure; all custom data types are defined as classes instead. Is there something wrong or at least imperfect in this approach?
NOTE: question is about run-of-the-mill business apps, please don't list "unusual" cases like game development, real-time animation, backward compatibility (COM/Interop), unmanaged code and so on - these answers are already under this similar question:
When to use struct?
As far as I know, most typical commercial projects (ERM, accounting, solutions for banks, etc.) do not define even a single structure; all custom data types are defined as classes instead. Is there something wrong or at least imperfect in this approach?
No! Everything is perfectly right with that. Your general rule should be to always use objects by default. After all, we are talking about object-oriented programming for a reason, and not structure-oriented programming (structs themselves are missing some OO principles like inheritance and abstraction).
However structures are sometimes better if:
You need precise control over the amount of memory used (depending on their size, structures use anywhere from a little less to FAR less memory than objects).
You need precise control of memory layout. This is especially important for interop with Win32 or other native APIs
You need the fastest possible speed. (In lots of scenarios with larger sets of data you can get a decent speedup when correctly using structs).
You need to waste less memory and have large amounts of structured data in arrays. Especially in conjunction with Arrays you could get huge amount of memory savings with structures.
You are working extensively with pointers. Then structures offer lots of interesting characteristics.
IMO the most important use case is large arrays of small composite entities. Imagine an array containing 10^6 complex numbers, or a 2D array containing 1000x1000 24-bit RGB values. Using structs instead of classes can make a huge difference in cases like these.
EDIT:
To clarify: Assume you have a struct
struct RGB
{
    public byte R, G, B;
}
If you declare an array of 1000x1000 RGB values, this array will take exactly 3 MB of memory, because the value types are stored inline.
If you used a class instead of a struct, the array would contain 1,000,000 references. That alone would take 4 MB (or 8 MB on a 64-bit machine) of memory. If you initialized all items with separate objects, so you can modify the values separately, you'd have 1,000,000 objects swirling around on the managed heap to keep the GC busy. Each object has an overhead (IIRC) of 2 references, i.e. the objects would use 11/19 MB of memory. In total that's at least 5 times as much memory as the simple struct version.
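The arithmetic above can be reproduced with a short sketch. Marshal.SizeOf<T>() reports the marshaled size, which for a simple struct of three bytes like this one is 3. The class-side figure uses the answer's own estimate of two references of per-object overhead, so it is an assumption rather than a measurement; real runtimes round object sizes up, which makes it a lower bound. Rgb and RgbMemory are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;

// Same shape as the RGB struct in the answer.
public struct Rgb
{
    public byte R, G, B;
}

public static class RgbMemory
{
    // Bytes used by the element data of an Rgb[] with the given number of pixels.
    public static long InlineArrayBytes(int pixels)
    {
        return (long)pixels * Marshal.SizeOf<Rgb>();
    }

    // Estimate for the class version: one reference per array slot plus one object
    // per pixel, using the answer's figure of two references of overhead per object.
    public static long ClassVersionBytes(int pixels, int referenceSize)
    {
        long perObject = 2L * referenceSize + 3;   // overhead plus 3 data bytes
        return (long)pixels * (referenceSize + perObject);
    }
}
```

For 1,000,000 pixels this gives 3,000,000 bytes for the struct array, against an estimated 15,000,000 bytes with 4-byte references and 27,000,000 bytes with 8-byte references for the class version.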
One advantage of stack allocated value types is that they are local to the thread. That means that they are inherently thread safe. That cannot be said for objects on the heap.
This of course assumes we're talking about safe, managed code.
Another difference from classes is that when you assign a structure instance to a variable, you are not just copying a reference but copying the whole structure. So if you modify one of the instances (you shouldn't anyway, since structure instances are intended to be immutable), the other one is not modified.
All good answers thus far...I only have to add that by definition value types are not nullable and hence are a good candidate for use in scenarios where you do not want to be bothered with creating a new instance of a class and assigning it to fields, for example...
struct Aggregate1
{
    int A;
}
struct Aggregate2
{
    Aggregate1 A;
    Aggregate1 B;
}
Note if Aggregate1 were a class then you would have had to initialize the fields in Aggregate2 manually...
Aggregate2 ag2 = new Aggregate2();
ag2.A = new Aggregate1();
ag2.B = new Aggregate1();
This is obviously not required as long as Aggregate1 is a struct. This may prove useful when you are creating a class/struct hierarchy for the express purpose of serialization/deserialization with the XmlSerializer. Many seemingly mysterious exceptions will disappear just by using structs in this case.
If the purpose of a type is to bind a small fixed collection of independent values together with duct tape (e.g. the coordinates of a point, a key and associated value of an enumerated dictionary entry, a six-item 2d transformation matrix, etc.), the best representation, from the standpoint of both efficiency and semantics, is likely to be a mutable exposed-field structure. Note that this represents a very different usage scenario from the case where a struct represents a single unified concept (e.g. a Decimal or DateTime), and Microsoft's advice for when to use structures is only applicable to the latter. The style of "immutable" structure Microsoft describes is only really suitable for representing a single unified concept; if one needs to represent a small fixed collection of independent values, the proper alternative is not an immutable class (which offers inferior performance), nor a mutable class (which will in many cases offer incorrect semantics), but rather an exposed-field struct (which--when used properly--offers superior semantics and performance). For example, if one has a struct Transform2d which holds a 2d transformation matrix, a method like:
static void Offset(ref Transform2d it, double x, double y)
{
    it.dx += x;
    it.dy += y;
}
is both faster and clearer than
static void Offset(ref Transform2d it, double x, double y)
{
    it = new Transform2d(it.xx, it.xy, it.yx, it.yy, it.dx + x, it.dy + y);
}
or
Transform2d Offset(double x, double y)
{
    return new Transform2d(xx, xy, yx, yy, dx + x, dy + y);
}
Knowing that dx and dy are fields of Transform2d is sufficient to know that the first method modifies those fields and has no other side-effect. By contrast, to know what the other methods do, one would have to examine the code for the constructor.
There have been some excellent answers that touch on the practicality of using structs vs. classes and vice versa, but I think your original comment about structs being immutable is a pretty good argument for why classes are used more often in the high-level design of LOB applications. In Domain Driven Design http://www.infoq.com/minibooks/domain-driven-design-quickly there is somewhat of a parallel between Entities/Classes and Value Objects/Structs. Entities in DDD are items within the business domain whose identity we need to track with an identifier, e.g. CustomerId, ProductId, etc. Value Objects are items whose values we might be interested in, but whose identity we don't track with an identifier, e.g. Price or OrderDate. Entities are mutable in DDD except for their identity field, while Value Objects do not have an identity. So when modeling a typical business entity, a class is usually designed along with an identity attribute, which tracks the identity of the business object on its round trip to the persistence store and back again. Although at runtime we might change all the property values on a business object instance, the entity's identity is retained as long as the identifier is immutable. With business concepts that correspond to Money or Time, a struct is a natural fit: even though a new instance is created whenever we perform a computation, that's OK because we aren't tracking an identity, only storing a value.
Sometimes you just want to transfer data between components, and then a struct is better than a class, e.g. a Data Transfer Object (DTO), which only carries data.

What is the best way to determine the initial capacity for collection objects?

When using objects that have a capacity, what guidelines can you follow to ensure the best efficiency when adding to collections? It also seems like the .NET Framework has set some of these capacities low. For example, I think StringBuilder has an initial capacity of 16. Does this mean that after 16 strings are inserted into the StringBuilder, the StringBuilder object is reallocated and doubled in size?
If you know how large a collection or StringBuilder will be up front, it is good practice to pass that as the capacity to the constructor. That way, only one allocation will take place. If you don't know the precise number, even an approximation can be helpful.
With StringBuilder, it isn't the number of strings, but the number of characters. In general, if you can predict the length, go ahead and tell it; but since it uses doubling, there isn't a huge overhead in reallocating occasionally if you just need to use Add etc.
In most cases, the difference will be trivial and a micro-optimisation. The biggest problem with not telling it the size is that unless the collection has a "trim" method, you might have nearly double the size you really needed (if you are very unlucky).
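A brief sketch of both points, passing a known size up front and trimming afterwards. CapacityDemo and its methods are made up for illustration; List<T>(int), StringBuilder(int), Capacity and TrimExcess are the standard BCL members.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CapacityDemo
{
    // Known size: one backing array is allocated and never resized.
    public static List<int> BuildList(int expectedCount)
    {
        var list = new List<int>(expectedCount);
        for (int i = 0; i < expectedCount; i++) list.Add(i);
        return list;
    }

    // For StringBuilder the capacity is measured in characters, not in appended strings.
    public static StringBuilder BuildText(int expectedChars)
    {
        var builder = new StringBuilder(expectedChars);
        for (int i = 0; i < expectedChars; i++) builder.Append('x');
        return builder;
    }

    // Unknown size: let the list grow, then release the unused tail.
    public static int CapacityAfterTrim(int count)
    {
        var list = new List<int>();
        for (int i = 0; i < count; i++) list.Add(i);
        list.TrimExcess();   // may do nothing if little capacity is unused
        return list.Capacity;
    }
}
```

TrimExcess itself reallocates and copies, so it is mainly worth calling on collections that will live for a long time after they stop growing.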
There are only two circumstances where I ever explicitly set the capacity of a collection
I know the exact number of items that will appear in the collection and I'm using an Array or List<T>.
I am P/Invoking into a function that writes to a char[] and I'm using a StringBuilder to interop with that parameter. In this case you must set a capacity so that the CLR can marshal it to native code.
Interestingly, for #1 it is almost always done when I am copying data returned from a COM interface into a BCL collection class. So I guess you could say I only ever do this in interop scenarios :).
Speaking of StringBuilder, I'd dare to use the worst-case size. StringBuilder requires a contiguous memory block, which is hard to allocate on a highly fragmented heap.
I'd go with an estimation for other collections, though.

Questions about Structs

MSDN says that a class that would be 16 bytes or less would be better handled as a struct [citation].
Why is that?
Does that mean that if a struct is over 16 bytes it's less efficient than a class or is it the same?
How do you determine if your class is under 16 bytes?
What restricts a struct from acting like a class? (besides disallowing parameterless constructors)
There are a couple different answers to this question, and it is a bit subjective, but some reasons I can think of are:
structs are value-type, classes are reference type. If you're using 16 bytes for total storage, it's probably not worth it to create memory references (4 to 8 bytes) for each one.
When you have really small objects, they can often be pushed onto the IL stack, instead of references to the objects. This can really speed up some code, as you're eliminating a memory dereference on the callee side.
There is a bit of extra "fluff" associated with classes in IL, and if your data structure is very small, none of this fluff would be used anyway, so it's just extra junk you don't need.
The most important difference between a struct and a class, though, is that structs are value type and classes are reference type.
By "efficient", they're probably talking about the amount of memory it takes to represent the class or struct.
On the 32-bit platform, allocating an object requires a minimum of 16 bytes. On a 64-bit platform, the minimum object size is 24 bytes. So, if you're looking at it purely from the amount of memory used, a struct that contains less than 16 bytes of data will be "better" than the corresponding class.
But the amount of memory used is not the whole story. Value types (structs) are fundamentally different than reference types (classes). Structs can be inconvenient to work with, and can actually cause performance problems if you're not careful.
The real answer, of course, is to use whichever works best in your situation. In most cases, you'll be much better off using classes.
Check this link, I found it on one of the answers in SO today: .NET Type Internals. You can also try searching SO and Googling for "reference types vs value types" for differences between structs and classes.
What restricts a struct from acting like a class?
There are many differences. You cannot inherit from a struct, for example.
You can't declare virtual methods in a struct. A struct can still implement an interface, but accessing it through an interface reference boxes it. Instance methods in structs can access the struct's private fields, but apart from that they behave a lot like auxiliary "helper" functions (for immutable structs, they sometimes don't even need to access private data). So I find them to be not nearly as "valuable" as class methods.
Structs are different from classes because they are stored on the stack, and not on the heap. That means that every time you call a method with the struct as a parameter, a copy is created and passed to the method. That is why large structs are extremely inefficient.
I would nevertheless actively discourage using structs, because they can cause some subtle bugs: e.g. when you change a field of a struct inside a method, it's not going to be reflected for the caller (because you only changed the copy), which is completely different behavior from classes.
So the 16 bytes I think is a reasonable maximum size of a struct, but still in most cases it is better to have a class. If you still want to create a struct, try to make it immutable at least.
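The subtle bug mentioned above looks like this in practice. CounterStruct, CounterClass and the method names are hypothetical.

```csharp
using System;

public struct CounterStruct { public int Value; }
public sealed class CounterClass { public int Value; }

public static class CopySemanticsDemo
{
    // Receives a copy of the struct: the caller's variable is not changed.
    public static void IncrementCopy(CounterStruct c)
    {
        c.Value++;
    }

    // Receives a reference to the caller's variable: the change is visible.
    public static void IncrementByRef(ref CounterStruct c)
    {
        c.Value++;
    }

    // Receives a copy of the reference: both refer to the same object.
    public static void IncrementClass(CounterClass c)
    {
        c.Value++;
    }
}
```

Starting from zero, IncrementCopy leaves the caller's struct at 0, while IncrementByRef and IncrementClass both leave the caller seeing 1. Making the struct immutable removes the trap, because there is nothing to modify through a copy.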
This is due to the different way that the CLR handles structs and classes. Structs are value types which means they live on the stack rather than in the managed heap. It is a good rule of thumb to keep structs small because once you start passing them as method arguments you will incur overhead as structs are copied in their entirety when passed to a method.
Since classes pass a copy of their reference to methods they incur much less overhead when used as method arguments.
The best way to determine the size of your class is to total the number of bytes required by all the members of your class plus an extra 8 bytes for CLR overhead stuff (the sync block index and the reference to the type of the object).
In memory, the struct will hold the data directly, while a class will behave more like a pointer. That alone makes an important difference, since passing the struct as a parameter to a method will pass its values (copy them on the stack), while the class will pass the reference to the values. If the struct is big, you will be copying a lot of values on each method call. When it is really small copying the values and using them directly will be probably faster than copying the pointer and having to grab them from another place.
About restrictions: you can't assign it to null (although you can use Nullable<>) and you have to initialize it right away.
Copying an instance of a struct takes less time than creating a new instance of a class and copying data from an old one, but class instances can be shared and struct instances cannot. Thus, "structvar1 = structvar2" requires copying the struct's contents into structvar1, whereas "classvar1 = classvar2" lets classvar1 and classvar2 refer to the same class instance (without having to create a new one).
The code to handle the creation of new struct instances is optimized for sizes up to 16 bytes. Larger structs are handled less efficiently. Structs are a win in cases where every variable that holds a struct will hold an independent instance (i.e. there's no reason to expect that any particular two variables will hold identical instances); they are not much of a win (if they're a win at all) in cases where many variables could hold the same instance.
