In our application we have a Queue which is defined as the following:
private static Queue RawQ = new Queue();
Two different kinds of objects are then put on the Queue: instances of a class (class A) and instances of a struct (struct B).
When we process the data from the Queue, we use typeof to check which type each item belongs to (class A or struct B).
My questions:
1. For objects of class A, only their references are copied to the Queue, and for objects of struct B, their values are copied to the Queue. Am I right?
2. In the Queue, some items are references, which are small, and some items are values, which are much bigger (about 408 bytes). Will this waste a lot of memory if the Queue is not small?
3. Do you have a better way to do the same thing?
thanks,
for objects from class A, only their references are copied to the Queue and for object from struct B, their values are copied to the Queue, am I right?
Correct. Actually, when you add a struct B to the queue, it is boxed first. In other words, your B instance is copied onto the managed heap, and a reference to the copy is put on the queue.
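A minimal sketch of the boxing behaviour described above. The types A and B here are hypothetical single-field stand-ins, since the real types from the question are not shown.

```csharp
using System;
using System.Collections;

class A { public int Value; }     // hypothetical stand-in for class A
struct B { public int Value; }    // hypothetical stand-in for struct B

static class BoxingDemo
{
    // Returns the values seen when the items come back off the queue.
    public static (int FromStruct, int FromClass) Run()
    {
        var queue = new Queue();
        var a = new A { Value = 1 };
        var b = new B { Value = 1 };

        queue.Enqueue(a);   // stores a reference to the existing object
        queue.Enqueue(b);   // boxes: copies b to the heap and stores a reference to the copy

        a.Value = 2;        // visible through the queue (same object)
        b.Value = 2;        // not visible through the queue (the box is a separate copy)

        var fromClass = (A)queue.Dequeue();
        var fromStruct = (B)queue.Dequeue();   // unboxing copies the value out again
        return (fromStruct.Value, fromClass.Value);
    }

    static void Main()
    {
        var (fromStruct, fromClass) = Run();
        Console.WriteLine($"struct: {fromStruct}, class: {fromClass}");   // struct: 1, class: 2
    }
}
```

The struct comes back with its old value because the queue only ever held a reference to the boxed copy.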
for a Queue, some items are references which is small and some items are values which are much bigger (about 408 Bytes). This will waste many memory space if the Queue is not small?
Possibly - boxing the B instance takes a copy, which uses more memory than not taking a copy. It depends what happens to the original.
408 bytes is very large for a .NET struct; the general rule of thumb is that structs shouldn't be bigger than 16 bytes. The reason is that large structs introduce overhead due to copying and boxing.
do you have a better way to do the same thing?
I'd question whether B needs to be a struct in the first place. Another rule of thumb (mine, this time): you probably don't ever need a struct in .NET code.
1. For objects from class A, only their references are copied to the Queue and for object from struct B, their values are copied to the Queue, am I right?
That is correct. Except that value types would be boxed.
2. For a Queue, some items are references which is small and some items are values which are much bigger (about 408 Bytes). This will waste many memory space if the Queue is not small?
That is mostly correct. On a 32-bit runtime the boxing will add another 8 bytes (4 for the syncblock and 4 for the type information), so for large structs that is insignificant, but for smaller structs it would represent a larger ratio.
3. Do you have a better way to do the same thing?
The best thing to do is convert that large struct into a class. There is no hard rule for knowing when to choose a struct or class based on size, but 32 bytes seems to be a common threshold. Of course, you could easily justify larger structs based on whether you really wanted value-type semantics, but 408 bytes is probably way beyond that threshold. If the type really needs value semantics you could make it an immutable class.
Another change you could make is to use the generic Queue<T> class instead. Value types are not boxed as they would be with the non-generic Queue. However, you would still be copying that large struct even with the generic version.
From the C# spec:
Since structs are not reference types, these operations are implemented differently for struct types. When a value of a struct type is converted to type object or to an interface type that is implemented by the struct, a boxing operation takes place.
So, to answer 1) the queue contains boxed structs, not the actual struct values.
The answer to 2) falls out of that, a boxed struct and a reference have the same size in a queue's actual allocation.
For 3), I'd need more information. It would be preferable to have the same type in a queue and have polymorphic operations that are handled both by classes and structs in the appropriate ways. Excessive case statements and typeof() calls suggest that your program is more procedural than object-oriented. Maybe that's what you want, but C# is optimized for an OO approach.
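One way the polymorphic approach could look, as a sketch only. The interface IRawItem and its Process method are hypothetical names, and B is written as a class here, following the advice in the other answers.

```csharp
using System;
using System.Collections.Generic;

interface IRawItem { string Process(); }   // hypothetical common interface

sealed class A : IRawItem
{
    public string Process() => "handled A";
}

sealed class B : IRawItem   // B as a class rather than a 408-byte struct
{
    public string Process() => "handled B";
}

static class QueueDemo
{
    // Empties the queue, letting each item decide how it is processed.
    public static List<string> Drain(Queue<IRawItem> queue)
    {
        var results = new List<string>();
        while (queue.Count > 0)
            results.Add(queue.Dequeue().Process());   // no typeof checks needed
        return results;
    }

    static void Main()
    {
        var rawQ = new Queue<IRawItem>();
        rawQ.Enqueue(new A());
        rawQ.Enqueue(new B());
        foreach (var line in Drain(rawQ))
            Console.WriteLine(line);
    }
}
```

Note that if B stayed a struct and implemented the interface, it would still be boxed when stored as an IRawItem, exactly as the spec quote says.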
I was trying to double-check this, but here's what I believe happens:
The System.Collections.Queue class holds a collection of type Object, which is a reference type. Therefore, when you pass an instance of a struct to your queue, it gets boxed as an object. This creates a copy on the heap and provides a reference (which is what the Queue sees). So the Queue itself does not get too big, but if you're doing a lot of these operations, you'll end up (according to Microsoft) with a memory and performance hit from the boxing/unboxing.
See the C# Language Specification for more.
A String is a reference type even though it has most of the characteristics of a value type such as being immutable and having == overloaded to compare the text rather than making sure they reference the same object.
Why isn't string just a value type then?
Strings aren't value types since they can be huge and need to be stored on the heap. Value types are (in all implementations of the CLR as of yet) stored on the stack when they are local variables. Stack-allocating strings would break all sorts of things: the stack is only 1MB for 32-bit and 4MB for 64-bit, you'd have to box each string, incurring a copy penalty, you couldn't intern strings, and memory usage would balloon.
(Edit: Added clarification about value type storage being an implementation detail, which leads to this situation where we have a type with value semantics not inheriting from System.ValueType. Thanks Ben.)
It is not a value type because performance (space and time!) would be terrible if it were a value type and its value had to be copied every time it was passed to or returned from a method.
It has value semantics to keep the world sane. Can you imagine how difficult it would be to code if
string s = "hello";
string t = "hello";
bool b = (s == t);
set b to false? Imagine how difficult coding just about any application would be.
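To make the difference between the two kinds of comparison concrete, here is a small sketch. The second string is deliberately built at run time so that it is a separate object from the literal.

```csharp
using System;

static class StringEqualityDemo
{
    public static (bool ValueEqual, bool SameObject) Compare()
    {
        string s = "hello";
        // Built at run time, so it is a distinct object from the literal.
        string t = new string(new[] { 'h', 'e', 'l', 'l', 'o' });

        bool valueEqual = s == t;                        // compares characters
        bool sameObject = object.ReferenceEquals(s, t);  // compares references
        return (valueEqual, sameObject);
    }

    static void Main()
    {
        Console.WriteLine(Compare());   // (True, False)
    }
}
```

Without the overloaded == operator, the first comparison would behave like the second.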
A string is a reference type with value semantics. This design is a tradeoff which allows certain performance optimizations.
The distinction between reference types and value types is basically a performance tradeoff in the design of the language. Reference types have some overhead on construction and destruction and garbage collection, because they are created on the heap. Value types on the other hand have overhead on assignments and method calls (if the data size is larger than a pointer), because the whole object is copied in memory rather than just a pointer. Because strings can be (and typically are) much larger than the size of a pointer, they are designed as reference types. Furthermore the size of a value type must be known at compile time, which is not always the case for strings.
But strings have value semantics which means they are immutable and compared by value (i.e. character by character for a string), not by comparing references. This allows certain optimizations:
Interning means that if multiple strings are known to be equal, the compiler can just use a single string, thereby saving memory. This optimization only works if strings are immutable, otherwise changing one string would have unpredictable results on other strings.
String literals (which are known at compile time) can be interned and stored in a special static area of memory by the compiler. This saves time at runtime since they don't need to be allocated and garbage collected.
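Interning can also be requested explicitly at run time through string.Intern. A small sketch, using two equal strings that are built at run time and therefore start out as separate objects:

```csharp
using System;

static class InternDemo
{
    public static (bool Before, bool After) Run()
    {
        // Two equal strings built at run time are separate objects.
        string a = new string('x', 3);
        string b = new string('x', 3);
        bool before = object.ReferenceEquals(a, b);

        // string.Intern returns the single pooled instance for that text.
        bool after = object.ReferenceEquals(string.Intern(a), string.Intern(b));
        return (before, after);
    }

    static void Main()
    {
        Console.WriteLine(Run());   // (False, True)
    }
}
```

Sharing one instance like this is only safe because nobody can modify it.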
Immutability does increase the cost of certain operations. For example, you can't replace a single character in place; you have to allocate a new string for any change. But this is a small cost compared to the benefit of the optimizations.
Value semantics effectively hides the distinction between reference type and value types for the user. If a type has value semantics, it doesn't matter for the user if the type is a value type or reference type - it can be considered an implementation detail.
This is a late answer to an old question, but all other answers are missing the point, which is that .NET did not have generics until .NET 2.0 in 2005.
String is a reference type instead of a value type because it was of crucial importance for Microsoft to ensure that strings could be stored in the most efficient way in non-generic collections, such as System.Collections.ArrayList.
Storing a value-type in a non-generic collection requires a special conversion to the type object which is called boxing. When the CLR boxes a value type, it wraps the value inside a System.Object and stores it on the managed heap.
Reading the value from the collection requires the inverse operation which is called unboxing.
Both boxing and unboxing have non-negligible cost: boxing requires an additional allocation, unboxing requires type checking.
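A short sketch of both operations with a non-generic ArrayList. The failed cast shows the type check that unboxing performs: a boxed int can only be unboxed as an int.

```csharp
using System;
using System.Collections;

static class UnboxDemo
{
    public static bool UnboxToWrongTypeThrows()
    {
        var list = new ArrayList();
        list.Add(42);              // boxing: the int is copied into a new heap object

        int ok = (int)list[0];     // unboxing: type check, then the value is copied out

        try
        {
            long bad = (long)list[0];   // the box holds an int, not a long
            return false;
        }
        catch (InvalidCastException)
        {
            return ok == 42;
        }
    }

    static void Main()
    {
        Console.WriteLine(UnboxToWrongTypeThrows());   // True
    }
}
```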
Some answers claim incorrectly that string could never have been implemented as a value type because its size is variable. Actually it is easy to implement string as a fixed-length data structure containing two fields: an integer for the length of the string, and a pointer to a char array. You can also use a Small String Optimization strategy on top of that.
If generics had existed from day one I guess having string as a value type would probably have been a better solution, with simpler semantics, better memory usage and better cache locality. A List<string> containing only small strings could have been a single contiguous block of memory.
Strings are not the only immutable reference types.
Multicast delegates are too.
That is why it is safe to write
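protected void OnMyEventHandler()
{
    // Copy the delegate reference first. Delegates are immutable, so the copy
    // cannot change even if another thread unsubscribes afterwards.
    // (Assumes MyEventHandler is declared as an EventHandler event.)
    EventHandler handler = this.MyEventHandler;
    if (null != handler)
    {
        handler(this, new EventArgs());
    }
}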
I suppose strings are immutable because this is the safest way to work with them and to allocate memory.
Why are they not value types? Previous authors are right about stack size etc. I would also add that making strings reference types saves on assembly size when you use the same constant string more than once in a program. If you define
string s1 = "my string";
//some code here
string s2 = "my string";
Chances are that both instances of the "my string" constant will be allocated in your assembly only once.
If you would like to manage strings like a usual reference type, put the string inside a new StringBuilder(string s), or use MemoryStreams.
If you are creating a library where you expect huge strings to be passed into your functions, define the parameter either as a StringBuilder or as a Stream.
In very simple words, any value which has a definite size can be treated as a value type.
There is also the way strings are implemented (different for each platform) and what happens when you start stitching them together, for example with a StringBuilder. It allocates a buffer for you to copy into; once you reach the end, it allocates even more memory for you, in the hope that performance won't be hindered if you do a large concatenation.
Maybe Jon Skeet can help us out here?
It is mainly a performance issue.
Having strings behave LIKE value type helps when writing code, but having it BE a value type would make a huge performance hit.
For an in-depth look, take a peek at a nice article on strings in the .net framework.
How can you tell string is a reference type? I'm not sure that it matters how it is implemented. Strings in C# are immutable precisely so that you don't have to worry about this issue.
Actually strings have very few resemblances to value types. For starters, not all value types are immutable: you can change the value of an Int32 all you want and it would still be at the same address on the stack.
Strings are immutable for a very good reason, it has nothing to do with it being a reference type, but has a lot to do with memory management. It's just more efficient to create a new object when string size changes than to shift things around on the managed heap. I think you're mixing together value/reference types and immutable objects concepts.
As far as "==": like you said, "==" is an operator overload, and again it was implemented for a very good reason: to make the framework more useful when working with strings.
The fact that many mention the stack and memory with respect to value types and primitive types is because they must fit into a register in the microprocessor. You cannot push or pop something to/from the stack if it takes more bits than a register has....the instructions are, for example "pop eax" -- because eax is 32 bits wide on a 32-bit system.
Floating-point primitive types are handled by the FPU, which is 80 bits wide.
This was all decided long before there was an OOP language to obfuscate the definition of primitive type and I assume that value type is a term that has been created specifically for OOP languages.
Isn't it just as simple as strings being made up of character arrays? I look at strings as character arrays. Therefore they are on the heap: the reference is stored on the stack and points to the beginning of the array's memory location on the heap. The string size is not known before it is allocated... perfect for the heap.
That is why a string is really immutable: when you change it, even if the result is the same size, the compiler doesn't know that and has to allocate a new array and assign characters to the positions in the array. It makes sense if you think of strings as a way for languages to protect you from having to allocate memory on the fly (read: C-like programming).
Seeing as new instances of value types are created every time they are passed as arguments, I started thinking about scenarios where using the ref or out keywords can show a substantial performance improvement.
After a while it hit me that while I see the deficit of using value types I didn't know of any advantages.
So my question is rather straightforward: what is the purpose of having value types? What do we gain by copying a structure instead of just creating a new reference to it?
It seems to me that it would be a lot easier to only have reference types like in Java.
Edit: Just to clear this up, I am not referring to value types smaller than 8 bytes (max size of a reference), but rather value types that are 8 bytes or more.
For example - the Rectangle struct that contains four int values.
An instance of a one-byte value type takes up one byte. A reference type takes up the space for the reference plus the sync block and the virtual function table and ...
To copy a reference, you copy a four (or eight) byte reference. To copy a four-byte integer, you copy a four byte integer. Copying small value types is no more expensive than copying references.
Value types that contain no references need not be examined by the garbage collector at all. Every reference must be tracked by the garbage collector.
Value types are usually more performant than reference types:
A reference type costs extra memory for the reference and performance when dereferencing
A value type does not need extra garbage collection. It gets garbage collected together with the instance it lives in. Local variables in methods get cleaned up upon method leave.
Value type arrays are efficient in combination with caches. (Think of an array of ints compared with an array of instances of type Integer)
"Creating a reference" is not the problem. This is just a copy of 32/64 bits. Creating the object is what is costly. Actually creating the object is cheap but collecting it isn't.
Value types are good for performance when they are small and discarded often. They can be used in huge arrays very efficiently. A struct has no object header. There are a lot of other performance differences.
Edit: Eric Lippert posed a great example in the comments: "How many bytes does an array of one million bytes take up if they are value types? How many does it take up if they are reference types?"
I will answer: If struct packing is set to 1, such an array will take 1 million and 16 bytes (on a 32-bit system). Using reference types it will take:
array, object header: 12
array, length: 4
array, data (references): 4 * (1 million) = 4m
1 million objects, headers: 12 * (1 million) = 12m
1 million objects, data padded to 4 bytes: 4 * (1 million) = 4m
That is about 20 million bytes in total.
And that is why using value types in large arrays can be a good idea.
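The same arithmetic written out as code, using only the per-item figures quoted above for a 32-bit runtime. This adds up the estimates; it does not measure real memory.

```csharp
using System;

static class ArraySizeEstimate
{
    const long Count = 1_000_000;

    // Array header (12) + length (4) + one byte per element.
    public static long ValueTypeArrayBytes() => 12 + 4 + Count * 1;

    public static long ReferenceTypeArrayBytes() =>
        12 + 4              // array header and length
        + 4 * Count         // one 4-byte reference per element
        + 12 * Count        // one object header per element
        + 4 * Count;        // one byte of data per object, padded to 4 bytes

    static void Main()
    {
        Console.WriteLine(ValueTypeArrayBytes());       // 1000016
        Console.WriteLine(ReferenceTypeArrayBytes());   // 20000016
    }
}
```

So the reference-type version needs roughly twenty times the memory for the same million bytes of data.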
The gain is visible if your data is small (<16 bytes), you have lots of instances and/or you manipulate them a lot, especially passing to functions. This is because creating an object is relatively expensive compared to creating a small value type instance. And as someone else pointed out, objects need to be collected and that is even more expensive. Plus, very small value types take less memory than their reference type equivalents.
An example of a non-primitive value type in .NET is the Point structure (System.Drawing).
Every variable has a lifecycle, but not every variable needs the flexibility of being managed on the heap; some just need to perform well.
Value types (structs) contain their data, allocated on the stack or in-line inside a containing structure. Reference types (classes) store a reference to the value's memory address, and the value is allocated on the heap.
what is the purpose of having value types?
Value types are quite efficient for handling simple data. (They should be used for immutable types that represent a value.)
A value type instance is not allocated as a separate object on the garbage-collected heap (unless it is boxed), and the variable representing it does not contain a pointer to an object; the variable contains the value itself.
what do we gain by copying a structure instead of just creating a new reference to it?
If you copy a struct, C# creates a new copy of the object and assigns the copy of the object to a separate struct instance. However, if you copy a class, C# creates a new copy of the reference to the object and assigns the copy of the reference to the separate class instance. Structs can't have destructors, but classes can have destructors.
One major advantage of value types like Rectangle is that if one has n storage locations of type Rectangle, one can be certain that one has n distinct instances of type Rectangle. If one has an array MyArray of type Rectangle, of length at least two, a statement like MyArray[0] = MyArray[1] will copy the fields of MyArray[1] into those of MyArray[0], but they will continue to refer to distinct Rectangle instances. If one then performs a statement like MyArray[0].X += 4, that will modify field X of one instance, without modifying the X value of any other array slot or Rectangle instance. Note, by the way, that creating the array instantly populates it with writable Rectangle instances.
Imagine if Rectangle were a mutable class type. Creating an array of mutable Rectangle instances would require that one first dimension the array, and then assign to each element in the array a new Rectangle instance. If one wanted to copy the value of one rectangle instance to another, one would have to say something like MyArray[0].CopyValuesFrom(MyArray[1]) (which would, of course, fail if MyArray[0] had not been populated with a reference to a new instance). If one were to accidentally say MyArray[0] = MyArray[1], then writing to MyArray[0].X would also affect MyArray[1].X. Nasty stuff.
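The two cases side by side, as a sketch. RectStruct and RectClass are hypothetical one-field types standing in for a Rectangle struct and an imagined Rectangle class.

```csharp
using System;

struct RectStruct { public int X; }   // hypothetical mutable struct
class RectClass { public int X; }     // hypothetical mutable class

static class CopyDemo
{
    // Returns the X of element 1 in each array after element 0 was modified.
    public static (int StructOther, int ClassOther) Run()
    {
        var structs = new RectStruct[2];   // both elements are usable immediately
        structs[0] = structs[1];           // copies the fields
        structs[0].X += 4;                 // changes element 0 only

        var classes = new RectClass[2];    // elements start out as null
        classes[1] = new RectClass();
        classes[0] = classes[1];           // copies the reference
        classes[0].X += 4;                 // changes the one shared object

        return (structs[1].X, classes[1].X);
    }

    static void Main()
    {
        Console.WriteLine(Run());   // (0, 4)
    }
}
```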
It's important to note that there are a few places in C# and vb.net where the compiler will implicitly copy a value type and then act upon a copy as though it was the original. This is a really unfortunate language design, and has prompted some people to put forth the proposition that value types should be immutable (since most situations involving implicit copying only cause problems with mutable value types). Back when compilers were very bad at warning of cases where semantically-dubious copies would yield broken behavior, such a notion might have been reasonable. It should be considered obsolete today, though, given that any decent modern compiler will flag errors in most scenarios where implicit copying would yield broken semantics, including all scenarios where structs are only mutated via constructors, property setters, or external assignments to public mutable fields. A statement like MyArray[0].X += 5 is far more readable than MyArray[0] = new Rectangle(MyArray[0].X + 5, MyArray[0].Y, MyArray[0].Width, MyArray[0].Height).
In System.Data.Linq, EntitySet<T> uses a couple of ItemList<T> structs which look like this:
internal struct ItemList<T> where T : class
{
private T[] items;
private int count;
...(methods)...
}
(Took me longer than it should to discover this - couldn't understand why the entities field in EntitySet<T> was not throwing null reference exceptions!)
My question is what are the benefits of implementing this as a struct over a class?
Let's assume that you want to store ItemList<T> in an array.
Allocating an array of value types (struct) will store the data inside the array. If on the other hand ItemList<T> was a reference type (class), only references to ItemList<T> objects would be stored inside the array. The actual ItemList<T> objects would be allocated on the heap. An extra level of indirection is required to reach an ItemList<T> instance, and as it simply is an array combined with a length, it is more efficient to use a value type.
After inspecting the code for EntitySet<T> I can see that no array is involved. However, an EntitySet<T> still contains two ItemList<T> instances. As ItemList<T> is a struct, the storage for these instances is allocated inside the EntitySet<T> object. If a class was used instead, the EntitySet<T> would have contained references pointing to ItemList<T> objects allocated separately.
The performance difference between using one or the other may not be noticeable in most cases, but perhaps the developer decided that he wanted to treat the array and the tightly coupled count as a single value simply because it seemed like the best thing to do.
For small critical internal data structures like ItemList<T>, we often have the choice of using either a reference type or a value type. If the code is written well, switching from one to the other is a trivial change.
We can speculate that a value type avoids heap allocation and a reference type avoids struct copying so it's not immediately clear either way because it depends so much on how it is used.
The best way to find out which one is better is to measure it. Whichever is faster is the clear winner. I'm sure they did their benchmarking and struct was faster. After you've done this a few times your intuition is pretty good and the benchmark just confirms that your choice was correct.
Maybe it's important that... (quote about structs from here):
The new variable and the original variable therefore contain two separate copies of the same data. Changes made to one copy do not affect the other copy.
Just thinking, don't judge me too hard :)
There are really only two reasons to ever use a struct, and that is either to get value type semantics, or for better performance.
As the struct contains an array, value type semantics doesn't work well. When you copy the struct you get a copy of the count, but you only get a copy of the reference to the array, not a copy of the items in the array. Therefore you would have to use special care whenever the struct is copied so that you don't get inconsistent instances of it.
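A sketch of how such a copy can go wrong. IntList is a hypothetical struct shaped like ItemList<T>, and RawSlot is an illustrative helper that reads the backing array directly so the shared storage is visible.

```csharp
using System;

// Hypothetical struct shaped like ItemList<T>; not the real System.Data.Linq type.
struct IntList
{
    private int[] items;
    private int count;

    public int Count => count;

    public void Add(int value)
    {
        if (items == null) items = new int[4];
        else if (count == items.Length) Array.Resize(ref items, count * 2);
        items[count++] = value;
    }

    // Reads the backing array directly, ignoring count (for illustration only).
    public int RawSlot(int index) => items[index];
}

static class SharedArrayDemo
{
    public static (int CountA, int CountB, int SlotSeenByA) Run()
    {
        var a = new IntList();
        a.Add(1);

        var b = a;   // copies count and the array reference, not the items
        b.Add(2);    // writes into the shared array, but only b's count goes up

        return (a.Count, b.Count, a.RawSlot(1));
    }

    static void Main()
    {
        Console.WriteLine(Run());   // (1, 2, 2)
    }
}
```

After the copy, a still reports one item even though its backing array now contains the value added through b.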
So, the only remaining valid reason would be performance. There is a small overhead for each reference type instance, so if you have a lot of them there may be a noticeable performance gain.
One nifty feature of such a structure is that you can create an array of them, and you get an array of empty lists without having to initialise each list:
ItemList<string>[] lists = new ItemList<string>[42];
As the items in the array are zero-filled, the count member will be zero and the items member will be null.
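Since ItemList<T> is internal, here is a sketch with a hypothetical look-alike, SimpleList<T>, whose Add method is an assumption about how such a struct could cope with starting from the zero-filled state.

```csharp
using System;

// Hypothetical struct shaped like ItemList<T>; not the real System.Data.Linq type.
struct SimpleList<T> where T : class
{
    private T[] items;
    private int count;

    public int Count => count;

    public void Add(T item)
    {
        if (items == null) items = new T[4];   // first use of a zero-filled instance
        else if (count == items.Length) Array.Resize(ref items, count * 2);
        items[count++] = item;
    }
}

static class DefaultArrayDemo
{
    public static (int Touched, int Untouched) Run()
    {
        var lists = new SimpleList<string>[42];   // 42 empty lists, no per-element allocation
        lists[3].Add("hello");                    // mutates the array element in place
        return (lists[3].Count, lists[0].Count);
    }

    static void Main()
    {
        Console.WriteLine(Run());   // (1, 0)
    }
}
```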
Purely speculating here:
Since the object is fairly small (only has two member variables), it is a good candidate for making it a struct to allow it to be passed as a ValueType.
Also, as @Martin Liversage points out, by being a ValueType it can be stored more efficiently in larger data structures (e.g. as an item in an array), without the overhead of having an individual object and a reference to it.
MSDN says that a class that would be 16 bytes or less would be better handled as a struct [citation].
Why is that?
Does that mean that if a struct is over 16 bytes it's less efficient than a class or is it the same?
How do you determine if your class is under 16 bytes?
What restricts a struct from acting like a class? (besides disallowing parameterless constructors)
There are a couple different answers to this question, and it is a bit subjective, but some reasons I can think of are:
structs are value-type, classes are reference type. If you're using 16 bytes for total storage, it's probably not worth it to create memory references (4 to 8 bytes) for each one.
When you have really small objects, they can often be pushed onto the IL stack, instead of references to the objects. This can really speed up some code, as you're eliminating a memory dereference on the callee side.
There is a bit of extra "fluff" associated with classes in IL, and if your data structure is very small, none of this fluff would be used anyway, so it's just extra junk you don't need.
The most important difference between a struct and a class, though, is that structs are value type and classes are reference type.
By "efficient", they're probably talking about the amount of memory it takes to represent the class or struct.
On the 32-bit platform, allocating an object requires a minimum of 16 bytes. On a 64-bit platform, the minimum object size is 24 bytes. So, if you're looking at it purely from the amount of memory used, a struct that contains less than 16 bytes of data will be "better" than the corresponding class.
But the amount of memory used is not the whole story. Value types (structs) are fundamentally different than reference types (classes). Structs can be inconvenient to work with, and can actually cause performance problems if you're not careful.
The real answer, of course, is to use whichever works best in your situation. In most cases, you'll be much better off using classes.
Check this link, I found it on one of the answers in SO today: .NET Type Internals. You can also try searching SO and Googling for "reference types vs value types" for differences between structs and classes.
What restricts a struct from acting like a class?
There are many differences. You cannot inherit from a struct, for example.
You can't declare virtual methods. (A struct can still implement an interface, but using it through the interface boxes the value.) Instance methods in structs can access the struct's private fields, but apart from that they behave a lot like auxiliary "helper" functions (for immutable structs, they sometimes don't even need to access private data). So I find them not nearly as "valuable" as class methods.
Structs are different from classes because they are value types: a struct local is typically stored on the stack, not on the heap, and every time you call a method with the struct as a parameter, a copy is created and passed to the method. That is why large structs are extremely inefficient.
I would actively discourage the use of structs nevertheless, because they can cause some subtle bugs: e.g. when a method changes a field of a struct it was passed, the change is not going to be reflected for the caller (because only the copy was changed), which is completely different behavior from classes.
So I think 16 bytes is a reasonable maximum size for a struct, but in most cases it is still better to have a class. If you still want to create a struct, try to make it immutable at least.
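A small sketch of that kind of bug, with a hypothetical mutable Counter struct. Passing with ref is shown only for contrast: it makes the method work on the caller's instance instead of a copy.

```csharp
using System;

struct Counter { public int Value; }   // hypothetical mutable struct

static class PassDemo
{
    static void Bump(Counter c) { c.Value++; }          // modifies a copy
    static void BumpRef(ref Counter c) { c.Value++; }   // modifies the caller's variable

    public static (int AfterByValue, int AfterByRef) Run()
    {
        var c = new Counter();

        Bump(c);
        int afterByValue = c.Value;   // still 0: only the copy changed

        BumpRef(ref c);
        int afterByRef = c.Value;     // now 1

        return (afterByValue, afterByRef);
    }

    static void Main()
    {
        Console.WriteLine(Run());   // (0, 1)
    }
}
```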
This is due to the different way that the CLR handles structs and classes. Structs are value types, which means they typically live on the stack (or inline in their containing object) rather than as separate objects in the managed heap. It is a good rule of thumb to keep structs small because once you start passing them as method arguments you will incur overhead as structs are copied in their entirety when passed to a method.
Since classes pass a copy of their reference to methods they incur much less overhead when used as method arguments.
The best way to determine the size of your class is to total the number of bytes required by all the members of your class plus an extra 8 bytes (on a 32-bit runtime) for CLR overhead stuff (the sync block index and the reference to the type of the object).
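For a struct made only of primitive fields, the field total can be checked instead of added up by hand. This sketch uses Marshal.SizeOf<T>(), which reports the marshaled size; for a simple struct of ints like the hypothetical FourInts below, that matches the sum of the fields. It does not apply to classes or include the object overhead mentioned above.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct FourInts { public int A, B, C, D; }   // hypothetical struct: 4 fields of 4 bytes each

static class SizeDemo
{
    public static int StructBytes() => Marshal.SizeOf<FourInts>();

    static void Main()
    {
        Console.WriteLine(StructBytes());   // 16
    }
}
```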
In memory, the struct will hold the data directly, while a class will behave more like a pointer. That alone makes an important difference, since passing the struct as a parameter to a method will pass its values (copy them on the stack), while the class will pass the reference to the values. If the struct is big, you will be copying a lot of values on each method call. When it is really small copying the values and using them directly will be probably faster than copying the pointer and having to grab them from another place.
About restrictions: you can't assign it to null (although you can use Nullable<>) and you have to initialize it right away.
Copying an instance of a struct takes less time than creating a new instance of a class and copying data from an old one, but class instances can be shared and struct instances cannot. Thus, "structvar1 = structvar2" requires copying the struct instance, whereas "classvar1 = classvar2" allows classvar1 and classvar2 to refer to the same class instance (without having to create a new one).
The code to handle the creation of new struct instances is optimized for sizes up to 16 bytes. Larger structs are handled less efficiently. Structs are a win in cases where every variable that holds a struct will hold an independent instance (i.e. there's no reason to expect that any particular two variables will hold identical instances); they are not much of a win (if they're a win at all) in cases where many variables could hold the same instance.
I have a type which I am considering making a struct.
It represents a single value
It is immutable
But the problem is, it has 6 fields of type int.
So which solution should I use for this type?
1. keep using a struct?
2. change to a class?
3. or pack the 6 integers into an array of int, so it has only one field
EDIT
The size of a struct with 6 integer fields is 24 bytes, which is huge to pass around.
The recommended size for a struct is not more than 16 bytes.
It depends on how you are going to use it.
Are you going to allocate a lot of it vs. pass it around a lot?
Is it going to be consumed by 3rd party code? In this case, classes typically give you more flexibility.
Do you want struct vs. class semantics? For example, non-nullable?
Would you benefit from having a couple of pre-created instances of this class that can be reused to represent special cases? Similar to String.Empty. In this case you would benefit from a class.
It is hard to answer just from the information you provided in your question.
Be careful of boxing. If your struct is going to be consumed by a method which expects an Object, it will be coerced into an object and you'll have a potential performance hit.
Here's a reference that explains it in more detail.
I'd make it a class (#2) and then you wouldn't have to worry about it.
Using an array of six integers (#3) would probably make your code harder to read. A class with descriptive identifiers for each int would be much better.
Without seeing your struct, it's difficult to say anything definitively. But I suspect you should leave this as a struct.
How about a WriteOnce<int[]> ?
I would suggest writing a little benchmark to measure the performance of the different options, this is the only way to know for sure. You may be surprised at the results (I often am).
(I'm assuming that your concern here is performance.)
If the data holder is going to be immutable, the struct-versus-class question will most likely depend upon the average number of references that would exist to each instance. If one has an array of TwentyFourByteStruct[1000], that array will take 24,000 bytes, regardless of whether every element holds a different value, all elements hold the same value, or somewhere in-between. If one has an array of TwentyFourByteClass[1000], that array will take 4,000 or 8,000 bytes (for 32/64-bit systems), and each distinct instance of TwentyFourByteClass which is created will take about 48 bytes. If all of the array elements happen to hold a reference to the same TwentyFourByteClass object, the total will be 4,048 or 8,048 bytes. If all of the array elements hold references to different TwentyFourByteClass objects, the total will be 52,000 or 56,000 bytes.
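The totals above can be reproduced with a few lines of arithmetic. This only adds up the estimates from the answer (24 bytes per struct, about 48 bytes per object, 4- or 8-byte references) and, like the answer, ignores the array's own header.

```csharp
using System;

static class ArrayFootprint
{
    const int Length = 1000;
    const int StructSize = 24;   // six int fields
    const int ObjectSize = 48;   // "about 48 bytes" per class instance, as estimated above

    public static int StructArray() => Length * StructSize;

    // referenceSize is 4 on 32-bit and 8 on 64-bit systems.
    public static int ClassArray(int referenceSize, int distinctObjects) =>
        Length * referenceSize + distinctObjects * ObjectSize;

    static void Main()
    {
        Console.WriteLine(StructArray());          // 24000
        Console.WriteLine(ClassArray(4, 1));       // 4048
        Console.WriteLine(ClassArray(8, 1));       // 8048
        Console.WriteLine(ClassArray(4, 1000));    // 52000
        Console.WriteLine(ClassArray(8, 1000));    // 56000
    }
}
```

The class version wins when many elements share a few instances and loses when nearly every element is distinct.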
As for run-time performance, the best performance you can get will generally come from passing structures by reference. Passing structures by value will require copying them, which can get expensive for structures larger than 16 bytes (.net includes optimizations for structures 16 bytes or smaller), but the cost of passing a value type by reference is the same whether it is 1 byte or 16,000 bytes.
In general, when storing more than two pieces of related data I like to make a class that binds them together. Especially if I will be passing them around as a unit.