Generic base class implementation for both struct and object [duplicate] - c#

I know that structs in .NET do not support inheritance, but it's not exactly clear why they are limited in this way.
What technical reason prevents structs from inheriting from other structs?

The reason value types can't support inheritance is because of arrays.
The problem is that, for performance and GC reasons, arrays of value types are stored "inline". For example, given new FooType[10] {...}, if FooType is a reference type, 11 objects will be created on the managed heap (one for the array, and one for each of the 10 instances). If FooType is instead a value type, only one instance will be created on the managed heap -- for the array itself (as each array value will be stored "inline" with the array).
Now, suppose we had inheritance with value types. When combined with the above "inline storage" behavior of arrays, Bad Things happen, as can be seen in C++.
Consider this pseudo-C# code:
struct Base
{
public int A;
}
struct Derived : Base
{
public int B;
}
void Square(Base[] values)
{
for (int i = 0; i < values.Length; ++i)
values [i].A *= 2;
}
Derived[] v = new Derived[2];
Square (v);
By normal conversion rules, a Derived[] is convertible to a Base[] (for better or worse), so if you s/struct/class/g for the above example, it'll compile and run as expected, with no problems. But if Base and Derived are value types, and arrays store values inline, then we have a problem.
We have a problem because Square() doesn't know anything about Derived; it'll use only pointer arithmetic to access each element of the array, incrementing by a constant amount (sizeof(Base)). The assembly would be vaguely like:
for (int i = 0; i < values.Length; ++i)
{
Base* value = (Base*) (((char*) values) + i * sizeof(Base));
value->A *= 2;
}
(Yes, that's abominable assembly, but the point is that we'll increment through the array at known compile-time constants, without any knowledge that a derived type is being used.)
So, if this actually happened, we'd have memory corruption issues. Specifically, within Square(), values[1].A*=2 would actually be modifying values[0].B!
Try to debug THAT!
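For comparison, here is a minimal sketch of the class-based version mentioned above (the s/struct/class/g variant), which compiles and runs today. The wrapper class CovarianceDemo and the initial field values are assumptions added for illustration. Because the array holds references of uniform size, passing a Derived[] where a Base[] is expected is harmless.

```csharp
using System;

class Base { public int A; }
class Derived : Base { public int B; }

static class CovarianceDemo
{
    // Same loop as Square() above: doubles A in every element.
    public static void Square(Base[] values)
    {
        for (int i = 0; i < values.Length; ++i)
            values[i].A *= 2;
    }

    public static void Main()
    {
        Derived[] v =
        {
            new Derived { A = 1, B = 10 },
            new Derived { A = 2, B = 20 },
        };
        Square(v); // Derived[] converts to Base[] (array covariance)
        Console.WriteLine($"{v[0].A} {v[0].B} {v[1].A} {v[1].B}"); // 2 10 4 20
    }
}
```

Each element is a reference to a separate heap object, so the loop never has to know how large a Derived instance is, and B is left untouched.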

Imagine structs supported inheritance. Then declaring:
BaseStruct a;
InheritedStruct b; //inherits from BaseStruct, added fields, etc.
a = b; //?? expand size during assignment?
would mean struct variables don't have fixed size, and that is why we have reference types.
Even better, consider this:
BaseStruct[] baseArray = new BaseStruct[1000];
baseArray[500] = new InheritedStruct(); //?? morph/resize the array?

Structs do not use references (unless they are boxed, but you should try to avoid that) thus polymorphism isn't meaningful since there is no indirection via a reference pointer. Objects normally live on the heap and are referenced via reference pointers, but structs are allocated on the stack (unless they are boxed) or are allocated "inside" the memory occupied by a reference type on the heap.

Class-like inheritance is not possible, as a struct is laid out directly on the stack. An inheriting struct would be bigger than its parent, but the JIT doesn't know that, and tries to put too much into too little space. That sounds a little unclear, so let's write an example:
struct A {
int property;
} // sizeof A == sizeof int
struct B : A {
int childproperty;
} // sizeof B == sizeof int * 2
If this were possible, it would crash on the following snippet:
void DoSomething(A arg){};
...
B b;
DoSomething(b);
Space is allocated for the sizeof A, not for the sizeof B.

Here's what the docs say:
Structs are particularly useful for small data structures that have value semantics. Complex numbers, points in a coordinate system, or key-value pairs in a dictionary are all good examples of structs. Key to these data structures is that they have few data members, that they do not require use of inheritance or referential identity, and that they can be conveniently implemented using value semantics where assignment copies the value instead of the reference.
Basically, they're supposed to hold simple data and therefore do not have "extra features" such as inheritance. It would probably be technically possible for them to support some limited kind of inheritance (not polymorphism, due to them being on the stack), but I believe it is also a design choice to not support inheritance (as many other things in the .NET languages are.)
On the other hand, I agree with the benefits of inheritance, and I think we all have hit the point where we want our struct to inherit from another, and realize that it's not possible. But at that point, the data structure is probably so advanced that it should be a class anyway.

Structs are allocated on the stack. This means the value semantics are pretty much free, and accessing struct members is very cheap. This doesn't prevent polymorphism.
You could have each struct start with a pointer to its virtual function table. This would be a performance issue (every struct would be at least the size of a pointer), but it's doable. This would allow virtual functions.
What about adding fields?
Well, when you allocate a struct on the stack, you allocate a certain amount of space. The required space is determined at compile time (whether ahead of time or when JITting). If you add fields and then assign to a base type:
struct A
{
public int Integer1;
}
struct B : A
{
public int Integer2;
}
A a = new B();
This will overwrite some unknown part of the stack.
The alternative is for the runtime to prevent this by only writing sizeof(A) bytes to any A variable.
What happens if B overrides a method in A and references its Integer2 field? Either the runtime throws a MemberAccessException, or the method instead accesses some random data on the stack. Neither of these is permissible.
It's perfectly safe to have struct inheritance, so long as you don't use structs polymorphically, or so long as you don't add fields when inheriting. But these aren't terribly useful.

There is a point I would like to correct. Even though "structs cannot be inherited because they live on the stack" is the right reason, it is at the same time only a half-correct explanation. Structs, like any other value type, can live on the stack. Depending on where the variable is declared, they will live either on the stack or on the heap: on the stack when they are local variables, and on the heap when they are instance fields.
In saying that, Cecil Has a Name nailed it correctly.
I would like to emphasize this: value types can live on the stack. This doesn't mean they always do so. Local variables, including method parameters, will. All others will not. Nevertheless, it still remains the reason they can't be inherited. :-)

This seems like a very frequent question. I feel like adding that value types are stored "in place" where you declare the variable; apart from implementation details, this means that there is no object header that says something about the object, only the variable knows what kind of data resides there.

Structs do support interfaces, so you can do some polymorphic things that way.
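As a rough sketch of what that looks like in practice: the interface IShape, the structs SquareShape and Circle, and the helper methods below are all hypothetical names chosen for illustration. The generic version is constrained to structs so the call is resolved per type without boxing, while the interface-typed version boxes the struct argument.

```csharp
using System;

interface IShape { double Area(); }

struct SquareShape : IShape
{
    public double Side;
    public double Area() => Side * Side;
}

struct Circle : IShape
{
    public double Radius;
    public double Area() => Math.PI * Radius * Radius;
}

static class InterfaceDemo
{
    // Generic constraint: T is known at each call site, so the struct is not boxed.
    public static double AreaOf<T>(T shape) where T : struct, IShape => shape.Area();

    // Interface-typed parameter: a struct argument is boxed on the way in.
    public static double AreaOfBoxed(IShape shape) => shape.Area();

    public static void Main()
    {
        Console.WriteLine(AreaOf(new SquareShape { Side = 3 }));      // 9
        Console.WriteLine(AreaOfBoxed(new SquareShape { Side = 3 })); // 9
    }
}
```

Both calls give the same result; the difference is only in whether a heap allocation (the box) happens along the way.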

IL is a stack-based language, so calling a method with an argument goes something like this:
Push the argument onto the stack
Call the method.
When the method runs, it pops some bytes off the stack to get its argument. It knows exactly how many bytes to pop off because the argument is either a reference type pointer (always 4 bytes on 32-bit) or it is a value type for which the size is always known exactly.
If it is a reference type pointer then the method looks up the object in the heap and gets its type handle, which points to a method table which handles that particular method for that exact type. If it is a value type, then no lookup to a method table is necessary because value types do not support inheritance, so there is only one possible method/type combination.
If value types supported inheritance then there would be extra overhead in that the particular type of the struct would have to be placed on the stack as well as its value, which would mean some sort of method table lookup for the particular concrete instance of the type. This would eliminate the speed and efficiency advantages of value types.

Related

Why adding a value to a byte (through an extention method in C#) returns zero? [duplicate]

Some guy asked me this question a couple of months ago and I couldn't explain it in detail. What is the difference between a reference type and a value type in C#?
I know that value types are int, bool, float, etc and reference types are delegate, interface, etc. Or is this wrong, too?
Can you explain it to me in a professional way?
Your examples are a little odd because while int, bool and float are specific types, interfaces and delegates are kinds of type - just like struct and enum are kinds of value types.
I've written an explanation of reference types and value types in this article. I'd be happy to expand on any bits which you find confusing.
The "TL;DR" version is to think of what the value of a variable/expression of a particular type is. For a value type, the value is the information itself. For a reference type, the value is a reference which may be null or may be a way of navigating to an object containing the information.
For example, think of a variable as like a piece of paper. It could have the value "5" or "false" written on it, but it couldn't have my house... it would have to have directions to my house. Those directions are the equivalent of a reference. In particular, two people could have different pieces of paper containing the same directions to my house - and if one person followed those directions and painted my house red, then the second person would see that change too. If they both just had separate pictures of my house on the paper, then one person colouring their paper wouldn't change the other person's paper at all.
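The paper analogy can be sketched in code. The types HousePicture and House and the class PaperDemo are hypothetical, invented only to mirror the analogy: the struct plays the role of a picture drawn on the paper, and the class plays the role of the house that the directions lead to.

```csharp
using System;

struct HousePicture { public string Color; } // value type: the paper holds the picture itself
class House { public string Color = ""; }    // reference type: the paper holds directions

static class PaperDemo
{
    public static string CopyPicture()
    {
        var picture1 = new HousePicture { Color = "white" };
        var picture2 = picture1;   // copies the picture
        picture2.Color = "red";    // colours only the second paper
        return picture1.Color;
    }

    public static string CopyDirections()
    {
        var directions1 = new House { Color = "white" };
        var directions2 = directions1; // copies only the directions
        directions2.Color = "red";     // paints the one real house
        return directions1.Color;
    }

    public static void Main()
    {
        Console.WriteLine(CopyPicture());    // white
        Console.WriteLine(CopyDirections()); // red
    }
}
```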
Value type:
Holds the value itself, not a memory address
Example:
Struct
Storage:
TL;DR: A variable's value is stored wherever it is declared. Local variables live on the stack, for example, but a variable declared inside a class as a member lives on the heap, tightly coupled with the class instance it is declared in.
Longer: Thus value types are stored wherever they are declared.
E.g.: an int's value inside a function as a local variable would be stored on the stack, whilst an int's value declared as a member of a class would be stored on the heap with the class it is declared in. A value type in a class has a lifetime that is exactly the same as that of the class instance it is declared in, requiring almost no work by the garbage collector. It's more complicated than that, though; I'd refer to Jon Skeet's book "C# In Depth" or his article "Memory in .NET" for a more precise explanation.
Advantages:
A value type does not need extra garbage collection. It gets garbage collected together with the instance it lives in. Local variables in methods get cleaned up upon method leave.
Drawbacks:
When a large set of values is passed to a method, the receiving variable gets a full copy, so there are two redundant values in memory.
As classes are left out, it loses the OOP benefits that come with them.
Reference type:
Holds the memory address of a value, not the value itself
Example:
Class
Storage:
Stored on heap
Advantages:
When you pass a reference variable to a method and the method changes it, the original value is changed, whereas with value types a copy of the given variable is taken and that copy's value is changed.
When the size of the variable is large, a reference type is a good choice
As classes are reference types, they give reusability, thus benefitting object-oriented programming
Drawbacks:
More work: referencing when allocating and dereferencing when reading the value. Extra load on the garbage collector
I found it easier to understand the difference between the two if you know how a computer allocates things in memory and know what a pointer is.
A reference is usually associated with a pointer, meaning the memory address where your variable resides actually holds another memory address: that of the actual object in a different memory location.
The example I am about to give is grossly oversimplified, so take it with a grain of salt.
Imagine computer memory is a bunch of PO boxes in a row (starting with PO Box 0001 and going to PO Box n) that can hold something inside them. If PO boxes don't do it for you, try a hashtable or dictionary or an array or something similar.
Thus, when you do something like:
var a = "Hello";
the computer will do the following:
allocate memory (say starting at memory location 1000 for 5 bytes) and put H (at 1000), e (at 1001), l (at 1002), l (at 1003) and o (at 1004).
allocate a location somewhere in memory (say at location 0500) and assign it to the variable a.
So it's kind of like an alias (0500 is a).
assign the value at that memory location (0500) to 1000 (which is where the string Hello start in memory). Thus the variable a is holding a reference to the actual starting memory location of the "Hello" string.
Value type will hold the actual thing in its memory location.
Thus, when you do something like:
var a = 1;
the computer will do the following:
allocate a memory location say at 0500 and assign it to variable a (the same alias thing)
put the value 1 in it (at memory location 0500).
Notice that we are not allocating extra memory to hold the actual value (1).
Thus a is actually holding the actual value and that's why it's called value type.
This is from a post of mine from a different forum, about two years ago. While the language is vb.net (as opposed to C#), the Value Type vs. Reference type concepts are uniform throughout .net, and the examples still hold.
It is also important to remember that within .net, ALL types technically derive from the base type Object. The value types are designed to behave as such, but in the end they also inherit the functionality of base type Object.
A. Value Types are just that- they represent a distinct area in memory where a discrete VALUE is stored. Value types are of fixed memory size and are stored in the stack, which is a collection of addresses of fixed size.
When you make a statement like such:
Dim A As Integer
Dim B As Integer
A = 3
B = A
You have done the following:
Created 2 spaces in memory sufficient to hold 32 bit integer values.
Placed a value of 3 in the memory allocation assigned to A
Placed a value of 3 in the memory allocation assigned to B by assigning it the same value as that held in A.
The Value of each variable exists discretely in each memory location.
B. Reference Types can be of various sizes. Therefore, they can't be stored in the "Stack" (remember, the stack is a collection of memory allocations of fixed size?). They are stored in the "Managed Heap". Pointers (or "references") to each item on the managed heap are maintained in the stack (Like an Address). Your code uses these pointers in the stack to access objects stored in the managed heap. So when your code uses a reference variable, it is actually using a pointer (or "address" to an memory location in the managed heap).
Say you have created a Class named clsPerson, with a String property Name.
In this case, when you make a statement such as this:
Dim p1 As clsPerson
p1 = New clsPerson
p1.Name = "Jim Morrison"
Dim p2 As clsPerson
p2 = p1
In the case above, the p1.Name property will return "Jim Morrison", as you would expect. The p2.Name property will ALSO return "Jim Morrison", as you would intuitively expect. I believe that both p1 and p2 represent distinct addresses on the Stack. However, now that you have assigned p2 the value of p1, both p1 and p2 point to the SAME LOCATION on the managed heap.
Now consider THIS situation:
Dim p1 As clsPerson
Dim p2 As clsPerson
p1 = New clsPerson
p1.Name = "Jim Morrison"
p2 = p1
p2.Name = "Janis Joplin"
In this situation, you have created one new instance of the person Class on the Managed Heap with a pointer p1 on the Stack which references the object, and assigned the Name property of the object instance a value of "Jim Morrison" again. Next, you created another pointer p2 on the Stack, and pointed it at the same address on the managed heap as that referenced by p1 (when you made the assignment p2 = p1).
Here comes the twist. When you assign the Name property of p2 the value "Janis Joplin", you are changing the Name property for the object REFERENCED by both p1 and p2, such that, if you ran the following code:
MsgBox(p1.Name)
'Will return "Janis Joplin"
MsgBox(p2.Name)
'Will ALSO return "Janis Joplin"
Because both variables (pointers on the Stack) reference the SAME OBJECT in memory (an address on the Managed Heap).
Did that make sense?
Last. If you do THIS:
Dim p1 As New clsPerson
Dim p2 As New clsPerson
p1.Name = "Jim Morrison"
p2.Name = "Janis Joplin"
You now have two distinct Person Objects. However, the minute you do THIS again:
p2 = p1
You have now pointed both back to "Jim Morrison". (I am not exactly sure what happened to the object on the heap referenced by p2 . . . I THINK it has now gone out of scope. This is one of those areas where hopefully someone can set me straight . . .). -EDIT: I BELIEVE this is why you would Set p2 = Nothing OR p2 = New clsPerson before making the new assignment.
Once again, if you now do THIS:
p2.Name = "Jimi Hendrix"
MsgBox(p1.Name)
MsgBox(p2.Name)
Both msgBoxes will now return "Jimi Hendrix"
This can be pretty confusing for a bit, and I will say one last time, I may have some of the details wrong.
Good Luck, and hopefully others who know better than me will come along to help clarify some of this . . .
Value data types and reference data types:
1) A value type contains the data directly, but a reference type refers to the data.
2) With value types, every variable has its own copy, but with reference types, more than one variable can refer to the same object.
3) With value types, an operation on one variable can't affect another variable, but with reference types, it can.
4) Value types include int, bool, and float, but reference types include arrays, class objects, and string.
Value Type:
Fixed memory size.
Stored in Stack memory.
Holds actual value.
Ex. int, char, bool, etc...
Reference Type:
Not fixed memory.
Stored in Heap memory.
Holds memory address of actual value.
Ex. string, array, class, etc...
"Variables that are based on value types directly contain values. Assigning one value type variable to another copies the contained value. This differs from the assignment of reference type variables, which copies a reference to the object but not the object itself." from Microsoft's library.
You can find a more complete answer here and here.
Sometimes explanations won't help especially for the beginners. You can imagine value type as data file and reference type as a shortcut to a file.
So if you copy a reference variable you only copy the link/pointer to a real data somewhere in memory. If you copy a value type, you really clone the data in memory.
This is probably wrong in esoteric ways, but, to make it simple:
Value types are values that are passed normally "by value" (so copying them). Reference types are passed "by reference" (so giving a pointer to the original value). There isn't any guarantee by the .NET ECMA standard of where these "things" are saved. You could build an implementation of .NET that is stackless, or one that is heapless (the second would be very complex, but you probably could, using fibers and many stacks)
Structs are value types (int, bool... are structs, or at least are simulated as such), and classes are reference types.
Value types descend from System.ValueType. Reference types descend from System.Object.
Now... in the end you have value types, "referenced objects" and references (in C++ they would be called pointers to objects. In .NET they are opaque. We don't know what they are. From our point of view they are "handles" to the object). These last ones are similar to value types (they are passed by copy). So an object is composed of the object itself (a reference type) and zero or more references to it (which are similar to value types). When there are zero references, the GC will probably collect it.
In general (in the "default" implementation of .NET), value types can go on the stack (if they are local variables) or on the heap (if they are fields of a class, if they are variables in an iterator function, if they are variables referenced by a closure, if they are variables in an async function (using the newer Async CTP)...). Referenced objects can only go on the heap. References use the same rules as value types.
In the cases of Value Type that go on the heap because they are in an iterator function, an async function, or are referenced by a closure, if you watch the compiled file you'll see that the compiler created a class to put these variables, and the class is built when you call the function.
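A small sketch of the closure case described above; the names ClosureDemo and MakeCounter are hypothetical. The local count is an int, but because the lambda captures it, it has to outlive the call to MakeCounter, so it cannot simply live in that call's stack frame.

```csharp
using System;

static class ClosureDemo
{
    public static Func<int> MakeCounter()
    {
        int count = 0;        // a local of a value type...
        return () => ++count; // ...captured by the lambda, so it must survive this call
    }

    public static void Main()
    {
        Func<int> next = MakeCounter();
        Console.WriteLine(next()); // 1
        Console.WriteLine(next()); // 2
    }
}
```

The second call returns 2 because both calls see the same captured variable, not a fresh copy.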
Now, I don't know how to write long things, and I have better things to do in my life. If you want a "precise" "academic" "correct" version, read THIS:
http://blogs.msdn.com/b/ericlippert/archive/2010/09/30/the-truth-about-value-types.aspx
I spent 15 minutes looking for it! It's better than the msdn versions, because it's a condensed "ready to use" article.
The simplest way to think of reference types is to consider them as being "object-IDs"; the only things one can do with an object ID are create one, copy one, inquire or manipulate the type of one, or compare two for equality. An attempt to do anything else with an object-ID will be regarded as shorthand for doing the indicated action with the object referred to by that id.
Suppose I have two variables X and Y of type Car--a reference type. Y happens to hold "object ID #19531". If I say "X=Y", that will cause X to hold "object ID #19531". Note that neither X nor Y holds a car. The car, otherwise known as "object ID #19531", is stored elsewhere. When I copied Y into X, all I did was copy the ID number. Now suppose I say X.Color=Colors.Blue. Such a statement will be regarded as an instruction to go find "object ID#19531" and paint it blue. Note that even though X and Y now refer to a blue car rather than a yellow one, the statement doesn't actually affect X or Y, because both still refer to "object ID #19531", which is still the same car as it always has been.
Value-type and reference-type variables are easy to apply, fit the domain model well, and facilitate the development process.
To dispel any myths around value types, I will comment on how they are handled on the .NET platform, specifically in C#, when we call APIs and pass parameters by value or by reference to our methods and functions, and how to handle the passing of these values correctly.
Read this article: Variable Type Value and Reference in C#
Suppose v is a value-type expression/variable, and r is a reference-type expression/variable
x = v
update(v) //x will not change value. x stores the old value of v
x = r
update(r) //x now refers to the updated r. x only stored a link to r,
//and r can change but the link to it doesn't .
So, a value-type variable stores the actual value (5, or "h"). A reference-type variable only stores a link to a metaphorical box where the value is.
Before explaining the different data types available in C#, it's important to mention that C# is a strongly-typed language. This means that each variable, constant, input parameter, return type and in general every expression that evaluates to a value, has a type.
Each type contains information that will be embedded by the compiler into the executable file as metadata which will be used by the common language runtime (CLR) to guarantee type safety when it allocates and reclaims memory.
If you want to know how much memory a specific type occupies, you can use the sizeof operator as follows:
static void Main()
{
var size = sizeof(int);
Console.WriteLine($"int size:{size}");
size = sizeof(bool);
Console.WriteLine($"bool size:{size}");
size = sizeof(double);
Console.WriteLine($"double size:{size}");
size = sizeof(char);
Console.WriteLine($"char size:{size}");
}
The output will show the number of bytes occupied by each type.
int size:4
bool size:1
double size:8
char size:2
The information associated with each type includes:
The required storage space.
The maximum and minimum values. For example, the type Int32 accepts values between -2147483648 and 2147483647.
The base type it inherits from.
The location where the memory for variables will be allocated at run time.
The kinds of operations that are permitted.
The members (methods, fields, events, etc.) contained by the type. For example, if we check the definition of type int, we will find the following struct and members:
namespace System
{
[ComVisible(true)]
public struct Int32 : IComparable, IFormattable, IConvertible, IComparable<Int32>, IEquatable<Int32>
{
public const Int32 MaxValue = 2147483647;
public const Int32 MinValue = -2147483648;
public static Int32 Parse(string s, NumberStyles style, IFormatProvider provider);
...
}
}
Memory management
When multiple processes are running on an operating system and the amount of RAM isn't enough to hold them all, the operating system maps parts of the hard disk to RAM and starts storing data on the hard disk. The operating system then uses specific tables, in which virtual addresses are mapped to their corresponding physical addresses, to perform each request. This capability to manage memory is called virtual memory.
In each process, the available virtual memory is organized into several sections, but for the purposes of this topic we will focus only on the stack and the heap.
Stack
The stack is a LIFO (last in, first out) data structure with a size that depends on the operating system (by default, on ARM, x86 and x64 machines, Windows reserves 1MB, while Linux reserves from 2MB to 8MB depending on the version).
This section of memory is automatically managed by the CPU. Every time a function declares a new variable, the compiler allocates a new memory block as big as its size on the stack, and when the function is over, the memory block for the variable is deallocated.
Heap
This region of memory isn't managed automatically by the CPU, and it is bigger than the stack. In a native C program, an allocation request looks for a free memory block that fits the requested size, marks it as reserved by using the C library function malloc(), and returns a pointer to that location. It's also possible to deallocate a block of memory by using the C library function free(). In .NET, the new keyword instead allocates from the managed heap, and the garbage collector reclaims the memory. Heap allocation can cause memory fragmentation, and because a pointer or reference has to be followed to reach the right block of memory, the heap is slower than the stack for read/write operations.
Custom and Built-in types
While C# provides a standard set of built-in types representing integers, booleans, text characters, and so on, you can use constructs like struct, class, interface, and enum to create your own types.
An example of custom type using the struct construct is:
struct Point
{
public int X;
public int Y;
};
Value and reference types
We can categorize the C# type into the following categories:
Value types
Reference types
Value types
Value types derive from the System.ValueType class and variables of this type contain their values within their memory allocation in the stack. The two categories of value types are struct and enum.
The following example shows the members of the type Boolean. As you can see, there is no explicit reference to the System.ValueType class; this is because every struct inherits from it implicitly.
namespace System
{
[ComVisible(true)]
public struct Boolean : IComparable, IConvertible, IComparable<Boolean>, IEquatable<Boolean>
{
public static readonly string TrueString;
public static readonly string FalseString;
public static Boolean Parse(string value);
...
}
}
Reference types
On the other hand, the reference types do not contain the actual data stored in a variable, but the memory address of the heap where the value is stored. The categories of reference types are classes, delegates, arrays, and interfaces.
At run time, when a reference type variable is declared, it contains the value null until an object that has been created using the keyword new is assigned to it.
The following example shows the members of the generic type List.
namespace System.Collections.Generic
{
[DebuggerDisplay("Count = {Count}")]
[DebuggerTypeProxy(typeof(Generic.Mscorlib_CollectionDebugView<>))]
[DefaultMember("Item")]
public class List<T> : IList<T>, ICollection<T>, IEnumerable<T>, IEnumerable, IList, ICollection, IReadOnlyList<T>, IReadOnlyCollection<T>
{
...
public T this[int index] { get; set; }
public int Count { get; }
public int Capacity { get; set; }
public void Add(T item);
public void AddRange(IEnumerable<T> collection);
...
}
}
In case you want to find out the memory address of a specific object, the namespace System.Runtime.InteropServices provides ways to access managed objects from unmanaged memory. In the following example, we use the static method GCHandle.Alloc() to allocate a pinned handle to a string and then the method AddrOfPinnedObject() to retrieve its address.
using System.Runtime.InteropServices;
string s1 = "Hello World";
// Pin the string so the GC cannot move it while we read its address.
GCHandle gch = GCHandle.Alloc(s1, GCHandleType.Pinned);
IntPtr pObj = gch.AddrOfPinnedObject();
Console.WriteLine($"Memory address:{pObj.ToString()}");
gch.Free(); // release the handle so the string can be collected again
The output will be similar to the following (the address varies between runs):
Memory address:39723832
References
Official documentation: https://learn.microsoft.com/en-us/cpp/build/reference/stack-stack-allocations?view=vs-2019
I think these two pictures describe it the best. This is the case in languages like C#, Java, JavaScript, and Python. In C++, references mean something different, and the equivalent of reference types are pointer types (that's why you see in various documents of different languages that the terms are used interchangeably). One of the important things is the meaning of "Pass by Value" and "Pass by Reference". I think there are other questions about them on StackOverflow that you can look for.
There are many little details of the differences between value types and reference types that are stated explicitly by the standard and some of them are not easy to understand, especially for beginners.
See the ECMA-335 standard, Common Language Infrastructure (CLI). The CLI is also standardized by the ISO. I would provide a reference but for ECMA we must download a PDF and that link depends on the version number. ISO standards cost money.
One difference is that value types can be boxed but reference types generally cannot be. There are exceptions but they are quite technical.
Value types cannot have parameter-less instance constructors or finalizers and they cannot refer to themselves. Referring to themselves means for example that if there is a value type Node then a member of Node cannot be a Node. I think there are other requirements/limitations in the specifications but if so then they are not gathered together in one place.
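To make the boxing point above concrete, here is a minimal sketch; the class name BoxingDemo and its method are hypothetical. Boxing copies the value into a new object on the heap, so later changes to the original variable do not affect the boxed copy.

```csharp
using System;

static class BoxingDemo
{
    public static int UnboxAfterChange()
    {
        int i = 5;
        object boxed = i; // boxing: copies the value into a new heap object
        i = 6;            // changing the original does not touch the box
        return (int)boxed; // unboxing: copies the value back out
    }

    public static void Main()
    {
        Console.WriteLine(UnboxAfterChange()); // 5
    }
}
```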

clearing doubts about value and reference types

value type variables contain the actual data directly and reference type variables contain the reference to the actual data.
I think of this as:
l.h.s is value type and r.h.s is reference type
on the left hand side, if I copy i into j, a new memory location is filled with same original data (45).
on the right hand side, if I copy k to l, a new memory location is filled with the reference to the object; and this reference points to the actual object in memory.
now, I am confused about this reference type copying. here's a slightly different thing:
Here, the copy on the r.h.s makes l points to same location as k.
My question is 1. "Which one is truer?" or is there more to it than I imagined?
Also, value types may be allocated on the heap, depending upon how the jitter sees fit, then 2. Can we force a reference type to be allocated on stack?
Sorry for sloppy image editing.
The first picture is better, l and k are different variables, occupying different places in memory.
value types may be allocated on the heap, depending upon how the jitter sees fit
Actually it depends more on the context and the way a value is used. A value-type field would always be allocated on the heap, boxing and closures are other reasons.
However, the 2nd picture applies when l is a ref parameter:
MyClass k = new ...;
M(ref k);
void M(ref MyClass l) { /* Here l is an alias for k */ }
then 2. Can we force a reference type to be allocated on stack?
There is something like stackalloc but it's an optimization that is 'invisible' to a C# programmer.
The simple and most useful answer is: No.
Neither, and both. The problem is that you are talking about an implementation detail that is not specified in the C# language itself.
In fact you could be programming against a machine that only has a stack, or you might have registers available. At the end of the day, it is all just an implementation detail. The model that most resembles reality is dependent on the machine architecture you are running on.
Can we force a reference type to be allocated on stack?
We cannot force anything in that manner. The reference is on the stack when, for example, you initialize a local variable inside a method. The object created with the new keyword, including the value types inside it, is allocated on the heap.
Although it is a topic that one could write a book about, it all comes down to this:
We have two kinds of type behavior in .NET: value behavior and reference behavior. The distinction between the two is conceptual. Value types represent the value itself, the actual data, whereas references are memory locations: addresses of the actual object instances created on the heap. They represent a sort of link to the actual object in the virtual address space.
I have written a blog post that goes into some detail and tries to explain conceptually how it all works at a lower level. But my explanation there is based primarily on the x86 architecture and is not exactly how everything is implemented. How C# and the .NET Framework work together with the JIT can change, but I hope it helps make things a little clearer.
Nice drawing! I think the first image is "truer". They reference the same object, but to store the references they also need variables. References or pointers are also variables, which means they have their own memory location. 2. I don't think so. (I am not sure)
The first is the most accurate. A reference is a number, and therefore when you copy it you copy the address. In C you could actually store a pointer in an integer; C# looks after you a bit more.
If you declare a struct instead of a class, it will be on the stack (as far as I remember).
The first picture is most correct because you consider
l = k;
If instead you have this situation
class MyClass
{
internal string k;
internal void Test()
{
OtherClass.Method(ref k);
}
}
class OtherClass
{
internal static void Method(ref string l)
{
// do a lot of stuff using l
}
}
then in this case I would say the second picture is more correct because this parameter has the ref keyword with it. That means that if someone changes the k reference to point to another string object, while the OtherClass.Method is running, then all of a sudden the l variable will point to the new object too.
But this is only true because of the ref keyword.
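The difference the ref keyword makes can be sketched as follows. This is only an illustration; `RefAliasDemo` and its method names are made up for the example and do not come from the answers above.

```csharp
using System;

static class RefAliasDemo
{
    // Without ref: l starts as a copy of the caller's reference.
    static void ReassignByValue(string l) { l = "new"; }

    // With ref: l is an alias for the caller's variable.
    static void ReassignByRef(ref string l) { l = "new"; }

    public static string AfterByValue()
    {
        string k = "old";
        ReassignByValue(k);   // only the callee's copy is re-pointed
        return k;
    }

    public static string AfterByRef()
    {
        string k = "old";
        ReassignByRef(ref k); // the caller's variable itself is re-pointed
        return k;
    }

    static void Main()
    {
        Console.WriteLine(AfterByValue()); // prints old
        Console.WriteLine(AfterByRef());   // prints new
    }
}
```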
First picture definitely... Otherwise an invocation with a "by-ref" argument would affect both references, i.e. Xxx(ref k); possibly "redirecting" both k and l.
Regarding the stack et al., you might want to read Eric Lippert (The Stack Is An Implementation Detail)

Passing immutable value types by reference by default

Usually I choose between struct and class not because of memory issues but because of the semantics of the type. Some of my value types have quite a large memory footprint, sometimes too large to copy this data all the time. So I wonder if it is a good idea to always pass immutable value objects by reference? Since the objects are immutable, they cannot be modified by methods that accept them by reference. Are there other issues when passing by reference?
Some of my value types have quite a large memory footprint
That suggests they shouldn't be value types, from an implementation point of view. From "Design Guidelines for Developing Class Libraries", section "Choosing Between Classes And Structures":
Do not define a structure unless the type has all of the following characteristics:
It logically represents a single value, similar to primitive types (integer, double, and so on).
It has an instance size smaller than 16 bytes.
It is immutable.
It will not have to be boxed frequently.
It sounds like you should be creating immutable reference types instead. In many ways they end up "feeling" like value objects anyway (think strings) but you won't need to worry about the efficiency of passing them around.
"Immutability" for value types is a slightly fluid concept - and it certainly doesn't mean that using ref is safe:
// int is immutable, right?
int x = 5;
Foo(ref x);
Console.WriteLine(x); // Eek, prints 6...
...
void Foo(ref int y)
{
y = 6;
}
We're not changing one part of the value - we're replacing the whole of the value of x with an entirely different value.
Immutability is somewhat easier to think about when it comes to reference types - although even then you can have an object which in itself won't change, but can refer to mutable objects...
Jon's answer is of course correct; I would add this to it: value types are already passed by reference when you call a method on the value type. For example:
struct S
{
int x;
public S(int x) { this.x = x; }
public void M() { Console.WriteLine(this.x); }
}
Method M() is logically the same thing as:
public static void M(ref S _this) { Console.WriteLine(_this.x); }
Whenever you call an instance method on a struct, we pass a ref to the variable that was the receiver of the call.
So what if the receiver is not a variable? Then the value is copied into a temporary variable which is used as the receiver. And if the value is big, that's potentially an expensive copy!
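One common case where the receiver is not a variable is a readonly field accessed outside a constructor. The sketch below assumes two hypothetical types, `Counter` and `Holder`, invented for this illustration; it shows the method running on a temporary copy, so the field itself is unchanged.

```csharp
using System;

struct Counter
{
    public int Count;
    public void Increment() { Count++; }
}

class Holder
{
    public readonly Counter ReadOnlyCounter = new Counter();
    public Counter MutableCounter = new Counter();
}

static class ReceiverCopyDemo
{
    public static int IncrementReadOnlyField()
    {
        var h = new Holder();
        h.ReadOnlyCounter.Increment();  // receiver is a value: runs on a temporary copy
        return h.ReadOnlyCounter.Count; // the field is untouched
    }

    public static int IncrementMutableField()
    {
        var h = new Holder();
        h.MutableCounter.Increment();   // receiver is a variable: passed by ref
        return h.MutableCounter.Count;
    }

    static void Main()
    {
        Console.WriteLine(IncrementReadOnlyField()); // prints 0
        Console.WriteLine(IncrementMutableField());  // prints 1
    }
}
```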
Value types are copied by value; that's why they're called value types. Unless you are planning on being extremely careful about finding all the possible expensive copies and eliminating them, I would follow the framework design guideline's advice: keep structs under 16 bytes, and pass them by value.
I would also emphasize that Jon is right: passing a struct by ref means passing a reference to a variable, and variables can change. That's why they're called "variables". There is no "const ref" in C# the way there is in C++; even if the value type itself seems to be "immutable" that doesn't mean that the variable holding it is immutable. You can see an extreme example of that in this contrived but educational example:
struct S
{
readonly int x;
public S(int x) { this.x = x; }
public void M(ref S s)
{
Console.WriteLine(this.x);
s = new S(this.x + 1);
Console.WriteLine(this.x);
}
}
Is it possible for M to write out two different numbers? You would naively think that the struct is immutable, and therefore x cannot change. But both s and this are variables, and variables can change:
S q = new S(1);
q.M(ref q);
That prints 1, 2 because this and s are both references to q, and nothing is stopping q from changing; it is not readonly.
In short: if I had a lot of data that I wanted to be passing around and have strong guarantees that it was immutable, I'd be using a class, not a struct. Only use a struct in that scenario if you have a demonstrated performance problem that is actually solved by making it a struct, keeping in mind that large structs are potentially very expensive to copy.
So I wonder if it is a good idea to always pass immutable value objects by reference? Since the objects are immutable, they cannot be modified by methods that accept them by reference. Are there other issues when passing by reference?
It's not clear exactly what you mean. Assuming that you mean passing it as a ref or out parameter, then the method could merely assign a new instance to the storage location. This would modify what the caller sees because the storage location in the callee is an alias for the storage location passed by the caller.
If you're dealing with memory issues because of copying instances of struct around, you should consider making an immutable reference type, much like string.
I think the bad idea is actually using structs when everything you do points to using classes.
Related answer: https://stackoverflow.com/questions/2440029/is-it-a-bad-practice-to-pass-structs-by-reference

struct - what is it for?

I know something about the struct type. But I can't understand: what is it for? When should I use it? Classes, simple value types and enums - that's all that I need.
Any suggestions?
UPD: PLEASE! Don't tell me that a struct is on the stack (I know this :). What is a struct for?
You choose a struct if you want value-type semantics. You choose a class if you want reference-type semantics. All other concerns are secondary to this one.
MSDN provides a guide: Choosing Between Classes and Structures:
Consider defining a structure instead of a class if instances of the type are small and commonly short-lived or are commonly embedded in other objects.
Do not define a structure unless the type has all of the following characteristics:
It logically represents a single value, similar to primitive types (integer, double, and so on).
It has an instance size smaller than 16 bytes.
It is immutable.
It will not have to be boxed frequently.
Things that should be a struct (because they are values):
struct Color
struct Point
struct Rectangle
struct GLVertex (contains location, color, normal and texcoord)
struct DateTime
Things that should be a class (because they are things to which you refer):
class RandomGenerator
class Socket
class Thread
class Window
Why?
Consider the following code.
class Button
{
public Point Location { get; set; }
}
class Program
{
public static void Main()
{
var button = Util.GetButtonFromSomewhere();
var location = button.Location;
Util.DrawText("one", location);
location.Y += 50;
Util.DrawText("two", location);
location.Y += 50;
Util.DrawText("three", location);
}
}
This will draw 3 text labels, vertically aligned. But if Point is a class, this will also move the button, which is really unexpected: var location = button.Location feels like it should copy a value, and not a reference! In other words, we think of Point as a value type and not a reference type. "value" here is used in the mathematical sense of value. Consider the number 5, it's an abstract object "somewhere out there", you just "refer" to it. Similarly, a Point simply is. It doesn't reside anywhere, we can't change it. Therefore we choose to make it a struct, so it has the semantics users expect.
On the other hand, we could have class Button { public Window Parent { get; set; } }. Here, Parent is an entity, so we represent it with a reference type - Window. It may make sense to use code like myButton.Parent.Redraw();. So Window should be a class.
So far so good. But all this probably sounds too vague to you. How do you really decide if something "feels" like a reference or a value? My rule of thumb is simple:
What should Foo a = b; a.Mutate(); do?
If it seems like it should leave b unchanged, make Foo a struct.
Otherwise make it a class.
Use the principle of least surprise here.
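The rule of thumb above can be tried out directly. The following is a small sketch with two stand-in types, `PointStruct` and `PointClass` (both hypothetical, defined here only to contrast the two behaviors), where "mutate" is simply adding 50 to Y.

```csharp
using System;

struct PointStruct { public int X, Y; }
class PointClass { public int X, Y; }

static class CopySemanticsDemo
{
    public static int StructOriginalAfterMutatingCopy()
    {
        var b = new PointStruct { X = 0, Y = 10 };
        var a = b;   // copies the whole value
        a.Y += 50;   // mutates the copy only
        return b.Y;
    }

    public static int ClassOriginalAfterMutatingCopy()
    {
        var b = new PointClass { X = 0, Y = 10 };
        var a = b;   // copies the reference; a and b share one object
        a.Y += 50;   // mutates the shared object
        return b.Y;
    }

    static void Main()
    {
        Console.WriteLine(StructOriginalAfterMutatingCopy()); // prints 10
        Console.WriteLine(ClassOriginalAfterMutatingCopy());  // prints 60
    }
}
```

If leaving b unchanged is what readers of the code would expect, that points toward a struct.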
Simple value types are best implemented via a struct.
Struct Usage Guidelines
It is recommended that you use a struct for types that meet any of the following criteria:
* Act like primitive types.
* Have an instance size under 16 bytes.
* Are immutable.
* Value semantics are desirable.
You must also understand that a class instance is allocated on the heap.
A struct is a value type and, as a local variable, is typically allocated on the stack.
First you must understand the difference between value-type and reference type. I will assume since you said to skip that part that you understand what it is.
Struct is a value-type and you get all of the privileges that you would get working with a value-type.
Structs are passed around by value. When you do something like
DateTime time = new DateTime();
DateTime newTime = time;
// you are not referencing time
// instead you have created a new instance
Structs are NOT lightweight classes; they may have many methods. Just look at the DateTime struct.
Structs may be lighter in performance, but not all the time. Consider passing a large struct to a method. Because structs are value types, each time you pass one into a method a new instance of the struct is created, hence the struct is copied each time. If you have a fairly large struct, this will be a much larger performance hit.
You may have to occasionally box and unbox structs, since they are value types.
In short, use a struct to represent an atomic value in memory.
You can use structs when you want a "class" with value (rather than reference) semantics.
Structs are for objects that represent something whose identity is defined by the values stored in its properties rather than by an Id or key. These are called "Value Types", as opposed to objects called "Entity Types", whose identity persists over time and is not dependent on the values of the properties of the object.
As an example, a Person class (an Entity) has an identity that persists from one session to another, even from year to year, in spite of how the Person's address, phone number, employer, etc might change from one instance to another. If you inadvertently have two instances of a Person class in memory at the same time, which represent the same individual/entity, then it is important that they have the same values for their properties.
A CalendarMonth object, on the other hand (a value type), only has identity defined by the value which specifies which calendar month it is. No matter how many instances of "March 2009" you might be using, they are all interchangeable and equivalent. Another example might be an object representing a FiscalYear designation in a tax program. A great example is an address object. (I am not talking here about the association of an address with a person or a business, or any other entity, but just the address itself.) Changing just about any property of an address makes it a different address. Once you create an address object in memory, it is equivalent to and interchangeable with every other address object that has the same properties. This is why these value types should generally be immutable. When you need one, create a new one with the property value(s) you need, and throw it away when you're done using it.
If you don't know why you need it, you probably don't.
struct is a value type rather than a reference type. If you don't know what that means, then you probably aren't going to need it.
Example: Say you want a data type to represent Coordinates at X Y Z. You don't really need any functionality, only three variables. A struct would be good for this, a class may be overkill.
In reality, I think struct is a legacy from C. I do not think we MUST use it in any situation. Perhaps sometimes you feel that leaving something on the stack rather than on the heap is more efficient; but as Java/C# never put efficiency first, just ignore it :) That's my opinion.

Generic code is shared for reference type why and efficiency implications

Allegedly, the native code is shared for instantiated generic types when it is instantiated with a reference type, but not for value types.
Why is that? would someone explain the in-depth details?
To make more concrete:
class MyGenericType<T>{
}
MyGenericType<string> and MyGenericType<Shape>
will have only one code generated, whereas
MyGenericType<int> and MyGenericType<long>
will NOT, hence it begs the question if using reference types is more efficient --
MyGenericType<int> vs. MyGenericType<SomeIntegerWrapper>
Thanks
First, to correct a fallacy in the question, int and System.Int32 are synonymous. MyGenericType<int> and MyGenericType<Int32> are exactly the same type.
Secondly, to address the question (and slightly expand on Mehrdad's answer): consider what the CLR needs to know about a type. It includes:
The size of a value of that type (i.e. if you have a variable of some type, how much space will that memory need?)
How to treat the value in terms of garbage collection: is it a reference to an object, or a value which may in turn contain other references?
For all reference types, the answers to these questions are the same. The size is just the size of a pointer, and the value is always just a reference (so if the variable is considered a root, the GC needs to recursively descend into it).
For value types, the answers can vary significantly. For instance, consider:
struct First
{
int x;
object y;
}
struct Second
{
object a;
int b;
}
When the GC looks at some memory, it needs to know the difference between First and Second so it can recurse into y and a but not x and b. I believe this information is generated by the JIT. Now consider the information for List<First> and List<Second> - it differs, so the JIT needs to treat the two differently.
Apologies if this isn't as clear as it might be - this is somewhat deep stuff, and I'm not as hot on CLR details as I might be.
Technically, at a lower level, all reference types are pointers and therefore, have the same size and characteristics. There is no need for the runtime to build separate native code for reference types. Value types can have different sizes, so a single native code cannot deal with all of them.
