Let's say, hypothetically (read: I don't think I actually need this, but I am curious as the idea popped into my head), one wanted an array of memory set aside locally on the stack, not on the heap. For instance, something like this:
private void someFunction()
{
int[20] stackArray; //C style; I know the size and it's set in stone
}
I'm guessing the answer is no. All I've been able to find is heap based arrays. If someone were to need this, would there be any workarounds? Is there any way to set aside a certain amount of sequential memory in a "value type" way? Or are structs with named parameters the only way (like the way the Matrix struct in XNA has 16 named parameters (M11-M44))?
What you want is stackalloc; unfortunately, you can only use this in unsafe code, which means it won't run in a limited permissions context.
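For illustration, a minimal sketch of what that looks like (my own example; the project must allow unsafe code):

using System;

class StackAllocDemo
{
    // Requires compiling with /unsafe (or <AllowUnsafeBlocks>true</AllowUnsafeBlocks>).
    private static unsafe void SomeFunction()
    {
        int* stackArray = stackalloc int[20]; // 20 ints on the stack; gone when the method returns

        for (int i = 0; i < 20; i++)
            stackArray[i] = i * i;

        Console.WriteLine(stackArray[5]); // 25
    }
}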
You could also create a struct with the necessary number of variables in it for each element type, but you would need a new type for each size of 'array' you wanted to use.
The closest thing I can think of to a stack-based array would be a manually nested structure: for an array of size N^M, the code size would be O(MN) and the access time O(M). One could scale M and N as convenient; e.g. one could handle a 4096-element array as six-deep nested 4-element structures, four-deep nested 8-element structures, three-deep nested 16-element structures, two-deep nested 64-element structures, etc. If one wanted to do three-deep nesting of 16-element structures (probably the most practical trade-off), one would define a 16-element structure with fields f0 through f15 and an access method using switch/case to select an element. One could then define a 16-element structure of those, a 16-element structure of those, etc.
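To make the idea concrete, here is a compressed sketch (the names Block4/Block16 are mine, and I've used 4 elements per level instead of 16 to keep it short; the 16-element version follows exactly the same pattern with fields f0 through f15):

using System;

// Illustrative sketch only: one level of the nested value-type "array".
struct Block4<T>
{
    public T F0, F1, F2, F3;

    public T Get(int i)
    {
        switch (i)
        {
            case 0: return F0;
            case 1: return F1;
            case 2: return F2;
            case 3: return F3;
            default: throw new IndexOutOfRangeException();
        }
    }

    public void Set(int i, T value)
    {
        switch (i)
        {
            case 0: F0 = value; break;
            case 1: F1 = value; break;
            case 2: F2 = value; break;
            case 3: F3 = value; break;
            default: throw new IndexOutOfRangeException();
        }
    }
}

// Two levels give a 16-element value-type "array"; deeper nesting scales it up.
struct Block16<T>
{
    public Block4<Block4<T>> Blocks;

    public T Get(int i)
    {
        return Blocks.Get(i / 4).Get(i % 4);
    }

    public void Set(int i, T value)
    {
        Block4<T> inner = Blocks.Get(i / 4); // copy the inner block out...
        inner.Set(i % 4, value);             // ...modify the copy...
        Blocks.Set(i / 4, inner);            // ...and copy it back (the copying cost discussed below)
    }
}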
In general, using a standard Array is apt to be better than using value-type structures to mimic arrays, but there are times when having an array-ish thing as a value type would be advantageous. The advantages of value type arrays would tend to be limited in .net, however, by some limitations in its handling of manipulating value types by reference. While it would be nice if one could simply access element 0x123 from an array described as above by writing "MyArrayishThing[1][2][3]", that would be inefficient for reading and ineffective for writing (since the subexpression MyArrayishThing[1] would make a copy of structures holding 256 elements of the array). Instead, what's necessary is to pass MyArrayishThing[1] by reference to a routine that can access element 2 of that and pass it by reference to a routine to access element 3 of that. It's possible to do that efficiently, but the code ends up looking rather nasty.
I'm learning C# and I basically know the difference between arrays and Lists (the latter is generic and can grow dynamically), but I'm wondering:
are List elements located sequentially in the heap like an array's, or is each element located "randomly" in a different location?
and if that is true, does that affect the speed of access & data retrieval from memory?
and if that is true, is this what makes arrays a little faster than Lists?
Let's see the second and the third questions first:
and if that is true, does that affect the speed of access & data retrieval from memory?
and if that is true, is this what makes arrays a little faster than Lists?
There is only a single type of "native" collection in .NET (by .NET I mean the CLR, i.e. the runtime): the array (technically, if you consider a string a type of collection, then there are two native types of collections :-) ) (technically, part 2: not all the arrays you think of as arrays are "native" arrays... only one-dimensional, zero-based arrays are "native" arrays. Arrays of type T[,] aren't, and arrays whose first element doesn't have an index of 0 aren't). Every other collection (other than the LinkedList<>) is built atop it. If you look at List<T> with ILSpy you'll see that at its base there is a T[] plus an added int for the Count (the T[].Length is the Capacity). Clearly an array is a little faster than a List<T>, because to use it you have one less indirection (you access the array directly, instead of accessing the list that accesses the array).
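To picture the shape ILSpy shows, here is a rough sketch of the idea (this is not the real BCL source, just an array plus a count):

using System;

// Illustrative only: the "array + count" shape that List<T> is built around.
class MiniList<T>
{
    private T[] _items = new T[4]; // Capacity == _items.Length
    private int _size;             // Count

    public int Count { get { return _size; } }
    public int Capacity { get { return _items.Length; } }

    public T this[int index]
    {
        get { return _items[index]; }   // one extra hop compared to a raw array
        set { _items[index] = value; }
    }

    public void Add(T item)
    {
        if (_size == _items.Length)
        {
            T[] bigger = new T[_items.Length * 2]; // grow-and-copy when full
            Array.Copy(_items, bigger, _size);
            _items = bigger;
        }
        _items[_size++] = item;
    }
}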
Let's see the first question:
are List elements located sequentially in the heap like an array's, or is each element located randomly in a different location?
Being based on an array internally, the List<> clearly stores its elements like an array, i.e. in a contiguous block of memory (but be aware that with a List<SomeObject>, where SomeObject is a reference type, the list is a list of references, not of objects, so it is the references that are put in a contiguous block of memory; we will also ignore that, with the virtual memory management of modern computers, "contiguous block of memory" isn't exact, and it would be better to say "a contiguous block of addresses").
(yes, even Dictionary<> and HashSet<> are built atop arrays. Conversely a tree-like collection could be built without using an array, because it's more similar to a LinkedList)
Some additional details: there are four groups of instructions in the CIL language (the intermediate language used in compiled .NET programs) that are used with "native" arrays:
Newarr
Ldelem and the Ldelem_* family
Stelem and the Stelem_* family
ReadOnly (don't ask me its use, I don't know, and the documentation isn't clear)
if you look at OpCodes.Newarr you'll see this comment in the XML documentation:
// Summary:
// Pushes an object reference to a new zero-based, one-dimensional array whose
// elements are of a specific type onto the evaluation stack.
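As a quick illustration (my own snippet), compiling a method like the following and opening it in ILSpy or ildasm should show those opcodes being emitted:

static int Demo()
{
    int[] a = new int[10]; // newarr   int32
    a[3] = 42;             // stelem.i4
    return a[3];           // ldelem.i4
}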
Yes, elements in a List are stored contiguously, just like an array. A List actually uses arrays internally, but that is an implementation detail that you shouldn't really need to be concerned with.
Of course, in order to get the correct impression from that statement, you also have to understand a bit about memory management in .NET. Namely, the difference between value types and reference types, and how objects of those types are stored. Value types will be stored in contiguous memory. With reference types, the references will be stored in contiguous memory, but not the instances themselves.
The advantage of using a List is that the logic inside of the class handles allocating and managing the items for you. You can add elements anywhere, remove elements from anywhere, and grow the entire size of the collection without having to do any extra work. This is, of course, also what makes a List slightly slower than an array. If any reallocation has to happen in order to comply with your request, there'll be a performance hit as a new, larger-sized array is allocated and the elements are copied to it. But it won't be any slower than if you wrote the code to do it manually with a raw array.
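You can watch the grow-and-copy happen with a few lines like these (the exact capacities are an implementation detail of the runtime and may differ):

using System;
using System.Collections.Generic;

class ListGrowthDemo
{
    static void Main()
    {
        var list = new List<int>();
        Console.WriteLine(list.Capacity);   // 0 -- no internal array allocated yet on current runtimes

        for (int i = 0; i < 9; i++)
            list.Add(i);                    // internal array reallocated and copied as it fills

        Console.WriteLine(list.Capacity);   // typically 16 (grew 4 -> 8 -> 16)
        Console.WriteLine(list.Count);      // 9
    }
}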
If your length requirement is fixed (i.e., you never need to grow/expand the total capacity of the array), you can go ahead and use a raw array. It might even be marginally faster than a List because it avoids the extra overhead and indirection (although that is subject to being optimized out by the JIT compiler).
If you need to be able to dynamically resize the collection, or you need any of the other features provided by the List class, just use a List. The performance difference will be virtually imperceptible.
In System.Data.Linq, EntitySet<T> uses a couple of ItemList<T> structs which look like this:
internal struct ItemList<T> where T : class
{
private T[] items;
private int count;
...(methods)...
}
(It took me longer than it should have to discover this; I couldn't understand why the entities field in EntitySet<T> was not throwing null reference exceptions!)
My question is what are the benefits of implementing this as a struct over a class?
Let's assume that you want to store ItemList<T> in an array.
Allocating an array of value types (structs) will store the data inside the array. If, on the other hand, ItemList<T> were a reference type (class), only references to ItemList<T> objects would be stored inside the array. The actual ItemList<T> objects would be allocated on the heap. An extra level of indirection is required to reach an ItemList<T> instance, and as it is simply an array combined with a length, it is more efficient to use a value type.
After inspecting the code for EntitySet<T> I can see that no array is involved. However, an EntitySet<T> still contains two ItemList<T> instances. As ItemList<T> is a struct, the storage for these instances is allocated inside the EntitySet<T> object. If a class was used instead, the EntitySet<T> would have contained references pointing to ItemList<T> objects allocated separately.
The performance difference between using one or the other may not be noticeable in most cases, but perhaps the developer decided that he wanted to treat the array and the tightly coupled count as a single value simply because it seemed like the best thing to do.
For small, critical, internal data structures like ItemList<T>, we often have the choice of using either a reference type or a value type. If the code is written well, switching from one to the other is a trivial change.
We can speculate that a value type avoids heap allocation and a reference type avoids struct copying so it's not immediately clear either way because it depends so much on how it is used.
The best way to find out which one is better is to measure it. Whichever is faster is the clear winner. I'm sure they did their benchmarking and struct was faster. After you've done this a few times your intuition is pretty good and the benchmark just confirms that your choice was correct.
Maybe it's important that... to quote the documentation on structs:
The new variable and the original variable therefore contain two separate copies of the same data. Changes made to one copy do not affect the other copy.
Just thinking, don't judge me hard :)
There are really only two reasons to ever use a struct, and that is either to get value type semantics, or for better performance.
As the struct contains an array, value type semantics don't work well. When you copy the struct you get a copy of the count, but you only get a copy of the reference to the array, not a copy of the items in the array. Therefore you would have to take special care whenever the struct is copied so that you don't end up with inconsistent instances of it.
So, the only remaining valid reason would be performance. There is a small overhead for each reference type instance, so if you have a lot of them there may be a noticeable performance gain.
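A small self-contained illustration of that pitfall, using a stand-in struct with the same shape (since ItemList<T> itself is internal):

using System;

// Stand-in for the array-plus-count shape; the name is mine.
struct ArrayAndCount
{
    public string[] Items;
    public int Count;
}

class CopyDemo
{
    static void Main()
    {
        var a = new ArrayAndCount { Items = new string[4], Count = 0 };

        ArrayAndCount b = a;     // struct copy: Count is copied, Items reference is shared

        b.Items[0] = "hello";    // visible through 'a' too -- same array object
        b.Count = 1;             // NOT visible through 'a' -- its Count is still 0

        Console.WriteLine(a.Items[0]); // "hello"
        Console.WriteLine(a.Count);    // 0  (the two copies are now inconsistent)
    }
}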
One nifty feature of such a structure is that you can create an array of them, and you get an array of empty lists without having to initialise each list:
ItemList<string>[] lists = new ItemList<string>[42];
As the items in the array are zero-filled, the count member will be zero and the items member will be null.
Purely speculating here:
Since the object is fairly small (it only has two member variables), it is a good candidate for being a struct so it can be passed around as a value type.
Also, as #Martin Liversage points out, by being a ValueType it can be stored more efficiently in larger data structures (e.g. as an item in an array), without the overhead of having an individual object and a reference to it.
When designing an API, I may want to persist details (e.g. of a running process) into my own custom struct. However, if I am going to do this for more than one process, meaning I need several structs, should I have an array of structs, or one struct with an array for each of its properties (e.g. startTime, processName, and the other process properties I am interested in)?
Which way is better for performance and better for an api/class library?
Thanks
IMHO you should use an array of structs despite the performance hit you take for instantiating all of the structs. The organizational sense of one process's state being stored in one struct far outweighs the loss of performance, and using one struct with a bunch of arrays and simply assigning each process an index into a number of arrays is very messy and can be a huge pain to debug.
You might consider using a class rather than a struct and I would use a list of classes.
Eric Lippert has a few arguments against using arrays in an API. One of the more compelling to me is why you'd want to keep the collection size fixed, but allow consumers to modify the contents. You can see more here.
Ultimately, you may want to store them internally using arrays, but I would avoid exposing this through the API. If people need to enumerate, use IEnumerable<T> instead.
From a data-storage standpoint, if one will frequently be accessing all the parts of an item more often than one would be accessing some particular part from a group of consecutive items, caching behavior will be better with an array of structs.
A more interesting question is how to expose the data. If you expose the struct as an indexer, anyone wanting to change a field of the struct will have to read out the struct, change the field in their temporary copy, and write it back. You could expose methods to read/write individual properties, but "foo.setBar(100, 23)" seems rather less natural than "foo(100).Bar=23". To allow the latter syntax, I'd suggest perhaps having the indexer return a struct with two private fields, "root" and "index", and properties for each field of the struct, so that e.g. the setter for the Bar property of the accessor would perform root.setBar(index, value). The indexer should also have an "asWhateverStructType" property to get/set the struct as a whole.
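Here is a rough sketch of that accessor-struct idea (all the names here, Foo, FooCollection, FooAccessor, are mine):

struct Foo
{
    public int Bar;
    public int Baz;
}

class FooCollection
{
    private Foo[] items = new Foo[256];

    // Hands back a small accessor struct that remembers the collection and the index.
    public FooAccessor this[int index]
    {
        get { return new FooAccessor(this, index); }
    }

    internal int GetBar(int index) { return items[index].Bar; }
    internal void SetBar(int index, int value) { items[index].Bar = value; }
    internal Foo GetWhole(int index) { return items[index]; }
    internal void SetWhole(int index, Foo value) { items[index] = value; }
}

struct FooAccessor
{
    private readonly FooCollection root;
    private readonly int index;

    internal FooAccessor(FooCollection root, int index)
    {
        this.root = root;
        this.index = index;
    }

    public int Bar
    {
        get { return root.GetBar(index); }
        set { root.SetBar(index, value); }   // writes through to the underlying array slot
    }

    public Foo AsFoo   // the "asWhateverStructType" accessor from the text
    {
        get { return root.GetWhole(index); }
        set { root.SetWhole(index, value); }
    }
}

Usage then looks like var cell = foos[100]; cell.Bar = 23; which writes straight through to the collection's array. (The accessor has to be stored in a local first, because the compiler will not let you assign to a property on a struct returned directly from an indexer.)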
Are there practical reasons to use structures instead of some classes in Microsoft .NET 2.0/3.5?
"What is the difference between structures and classes?" - this is probably the most popular question on interviews for ".NET developer" vacancies. The only answer that the interviewer considers to be right is "structures are allocated on the stack and classes are allocated on the heap", and no further questions are asked about that.
Some Google searching showed that:
a) structures have numerous limitations and no additional capabilities in comparison to classes, and
b) the stack (and as such structures) can be faster under very specialized conditions, including:
the size of the data chunk is less than 16 bytes
no extensive boxing/unboxing
the structure's members are nearly immutable
the whole set of data is not big (otherwise we get a stack overflow)
(please correct/add to this list if it is wrong or incomplete)
As far as I know, most typical commercial projects (ERM, accounting, solutions for banks, etc.) do not define even a single structure; all custom data types are defined as classes instead. Is there something wrong or at least imperfect in this approach?
NOTE: question is about run-of-the-mill business apps, please don't list "unusual" cases like game development, real-time animation, backward compatibility (COM/Interop), unmanaged code and so on - these answers are already under this similar question:
When to use struct?
As far as I know, most typical commercial projects (ERM, accounting, solutions for banks, etc.) do not define even a single structure; all custom data types are defined as classes instead. Is there something wrong or at least imperfect in this approach?
No! Everything is perfectly right with that. Your general rule should be to always use objects by default. After all, we are talking about object-oriented programming for a reason, and not structure-oriented programming (structs themselves are missing some OO principles like inheritance and abstraction).
However structures are sometimes better if:
You need precise control over the amount of memory used (structures use, depending on their size, a little to FAR less memory than objects).
You need precise control of memory layout. This is especially important for interop with Win32 or other native APIs
You need the fastest possible speed. (In lots of scenarios with larger sets of data you can get a decent speedup when correctly using structs).
You need to waste less memory and have large amounts of structured data in arrays. Especially in conjunction with arrays, you can get huge memory savings with structures.
You are working extensively with pointers. Then structures offer lots of interesting characteristics.
IMO the most important use case is large arrays of small composite entities. Imagine an array containing 10^6 complex numbers. Or a 2d array containing 1000x1000 24-bit RGB values. Using structs instead of classes can make a huge difference in cases like these.
EDIT:
To clarify: Assume you have a struct
struct RGB
{
public byte R,G,B;
}
If you declare an array of 1000x1000 RGB values, this array will take exactly 3 MB of memory, because the values are stored inline.
If you used a class instead of a struct, the array would contain 1,000,000 references. That alone would take 4 or 8 MB of memory (on a 32-bit or 64-bit machine, respectively). If you initialized all items with separate objects, so you could modify the values separately, you'd have 1,000,000 objects swirling around on the managed heap to keep the GC busy. Each object has an overhead (IIRC) of 2 references, i.e. the objects would use 11/19 MB of memory. In total that's 5 times as much memory as the simple struct version (in the 32-bit case).
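If you want to see the difference on your own machine, something like this rough measurement works (the type names are mine, it is not a rigorous benchmark, and the exact numbers depend on runtime and platform):

using System;

struct RgbStruct { public byte R, G, B; }
class RgbClass { public byte R, G, B; }

class MemoryDemo
{
    static void Main()
    {
        long before = GC.GetTotalMemory(true);

        var structs = new RgbStruct[1000 * 1000];   // pixels stored inline in the array
        long afterStructs = GC.GetTotalMemory(true);

        var classes = new RgbClass[1000 * 1000];    // this array only holds references
        for (int i = 0; i < classes.Length; i++)
            classes[i] = new RgbClass();            // one heap object per pixel
        long afterClasses = GC.GetTotalMemory(true);

        Console.WriteLine("struct array:          ~{0} bytes", afterStructs - before);
        Console.WriteLine("class array + objects: ~{0} bytes", afterClasses - afterStructs);

        GC.KeepAlive(structs);
        GC.KeepAlive(classes);
    }
}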
One advantage of stack allocated value types is that they are local to the thread. That means that they are inherently thread safe. That cannot be said for objects on the heap.
This of course assumes we're talking about safe, managed code.
Another difference with classes is that when you assign a structure instance to a variable, you are not just copying a reference but indeed copying the whole structure. So if you modify one of the instances (which you shouldn't anyway, since structure instances are intended to be immutable), the other one is not modified.
All good answers thus far... I'll only add that, by definition, value types are not nullable and hence are a good candidate for use in scenarios where you do not want to be bothered with creating a new instance of a class and assigning it to fields, for example...
struct Aggregate1
{
public int A;
}
struct Aggregate2
{
public Aggregate1 A;
public Aggregate1 B;
}
Note if Aggregate1 were a class then you would have had to initialize the fields in Aggregate2 manually...
Aggregate2 ag2 = new Aggregate2();
ag2.A = new Aggregate1();
ag2.B = new Aggregate1();
This is obviously not required as long as Aggregate1 is a struct... this may prove to be useful when you are creating a class/struct hierarchy for the express purpose of serialization/deserialization with the XmlSerializer. Many seemingly mysterious exceptions will disappear just by using structs in this case.
If the purpose of a type is to bind a small fixed collection of independent values together with duct tape (e.g. the coordinates of a point, a key and associated value of an enumerated dictionary entry, a six-item 2d transformation matrix, etc.), the best representation, from the standpoint of both efficiency and semantics, is likely to be a mutable exposed-field structure. Note that this represents a very different usage scenario from the case where a struct represents a single unified concept (e.g. a Decimal or a DateTime), and Microsoft's advice on when to use structures is only applicable to the latter case. The style of "immutable" structure Microsoft describes is only really suitable for representing a single unified concept; if one needs to represent a small fixed collection of independent values, the proper alternative is not an immutable class (which offers inferior performance), nor a mutable class (which will in many cases offer incorrect semantics), but rather an exposed-field struct (which, when used properly, offers superior semantics and performance).
For example, if one has a struct Transform2d which holds a 2d transformation matrix, a method like:
static void Offset(ref Transform2d it, double x, double y)
{
it.dx += x;
it.dy += y;
}
is both faster and clearer than
static void Offset(ref Transform2d it, double x, double y)
{
it = new Transform2d(it.xx, it.xy, it.yx, it.yy, it.dx + x, it.dy + y);
}
or
Transform2d Offset(double x, double y)
{
return new Transform2d(xx, xy, yx, yy, dx + x, dy + y);
}
Knowing that dx and dy are fields of Transform2d is sufficient to know that the first method modifies those fields and has no other side-effect. By contrast, to know what the other methods do, one would have to examine the code for the constructor.
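For completeness, the exposed-field struct those snippets assume might look something like this (the field names are inferred from the method bodies above):

struct Transform2d
{
    public double xx, xy, yx, yy;  // rotation/scale/shear terms
    public double dx, dy;          // translation terms

    public Transform2d(double xx, double xy, double yx, double yy, double dx, double dy)
    {
        this.xx = xx; this.xy = xy;
        this.yx = yx; this.yy = yy;
        this.dx = dx; this.dy = dy;
    }
}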
There have been some excellent answers that touch on the practicality of using structs vs. classes and vice versa, but I think your original comment about structs being immutable is a pretty good argument for why classes are used more often in the high-level design of LOB applications.
In Domain Driven Design (http://www.infoq.com/minibooks/domain-driven-design-quickly) there is somewhat of a parallel between Entities/Classes and Value Objects/Structs. Entities in DDD are items within the business domain whose identity we need to track with an identifier, e.g. CustomerId, ProductId, etc. Value Objects are items whose values we might be interested in, but whose identity we don't track with an identifier, e.g. Price or OrderDate. Entities are mutable in DDD except for their identity field, while Value Objects do not have an identity.
So when modeling a typical business entity, a class is usually designed along with an identity attribute, which tracks the identity of the business object round trip to the persistence store and back again. Although at runtime we might change all the property values on a business object instance, the entity's identity is retained as long as the identifier is immutable. With business concepts that correspond to Money or Time, a struct is sort of a natural fit, because even though a new instance is created whenever we perform a computation, that's fine because we aren't tracking an identity, only storing a value.
Sometimes you just want to transfer data between components; then a struct is better than a class, e.g. a Data Transfer Object (DTO), which only carries data.
I have a type which I'm considering using as a struct.
It represents single value
It is immutable
But the problem is, it has 6 fields of int.
So which solution I should use for this type?
keep using struct?
change to class?
or pack 6 integers into an array of int, so it has only one field
EDIT
A struct with 6 integer fields is 24 bytes, which is a lot to pass around by value.
The recommended size for a struct is no more than 16 bytes.
It depends on how you are going to use it:
Are you going to allocate a lot of it vs. pass it around a lot?
Is it going to be consumed by 3rd party code? In this case, classes typically give you more flexibility.
Do you want struct vs. class semantics? For example, non-nullable?
Would you benefit from having a couple of pre-created instances of this class that can be re-used to represent special cases? Similar to String.Empty. In that case you would benefit from a class.
It is hard to answer just from the information you provided in your question.
Be careful of boxing. If your struct is going to be consumed by a method which expects an Object, it will be coerced into an object and you'll have a potential performance hit.
Here's a reference that explains it in more detail.
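To make that concrete, a tiny illustration of the boxing in question (the type and method names are mine):

using System;

struct SixInts { public int A, B, C, D, E, F; }

class BoxingDemo
{
    static void Consume(object o) { /* stands in for any API that takes object */ }

    static void Main()
    {
        var value = new SixInts();

        Consume(value);             // boxed: the 24-byte struct is copied into a new heap object

        Console.WriteLine(value);   // also boxes, since this binds to WriteLine(object)
    }
}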
I'd make it a class (#2) and then you wouldn't have to worry about it.
Using an array of six integers (#3) would probably make your code harder to read. A class with descriptive identifiers for each int would be much better.
Without seeing your struct, it's difficult to say anything definitively. But I suspect you should leave this as a struct.
How about a WriteOnce<int[]>?
I would suggest writing a little benchmark to measure the performance of the different options, this is the only way to know for sure. You may be surprised at the results (I often am).
(I'm assuming that your concern here is performance.)
If the data holder is going to be immutable, the struct-versus-class question will most likely depend upon the average number of references that would exist to each instance. If one has an array of TwentyFourByteStruct[1000], that array will take 24,000 bytes, regardless of whether every element holds a different value, all elements hold the same value, or somewhere in-between. If one has an array of TwentyFourByteClass[1000], that array will take 4,000 or 8,000 bytes (for 32/64-bit systems), and each distinct instance of TwentyFourByteClass which is created will take about 48 bytes. If all of the array elements happen to hold a reference to the same TwentyFourByteClass object, the total will be 4,048 or 8,048 bytes. If all of the array elements hold references to different TwentyFourByteClass objects, the total will be 52,000 or 56,000 bytes.
As for run-time performance, the best performance you can get will generally come from passing structures by reference. Passing structures by value requires copying them, which can get expensive for structures larger than 16 bytes (.net includes optimizations for structures 16 bytes or smaller), but the cost of passing a value type by reference is the same whether it is 1 byte or 16,000 bytes.
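For example (a sketch; the type and method names are mine):

struct TwentyFourByteStruct { public long A, B, C; }   // 3 x 8 bytes = 24 bytes

class PassingDemo
{
    // Copies all 24 bytes into the callee's frame on every call.
    static long SumByValue(TwentyFourByteStruct s)
    {
        return s.A + s.B + s.C;
    }

    // Passes only a reference (4 or 8 bytes), regardless of the struct's size.
    static long SumByRef(ref TwentyFourByteStruct s)
    {
        return s.A + s.B + s.C;
    }
}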
In general, when storing more than two pieces of related data I like to make a class that binds them together. Especially if I will be passing them around as a unit.