After reading the MSDN article on the ref keyword, I am confused as to what C# does when you pass a value type using the ref keyword. The documentation states that value types are not boxed. My question is: how does C# handle passing a value type as a reference? Is it passing some copy to the data that is allocated on the Stack? Thanks.
Is it passing some copy to the data that is allocated on the Stack?
No, it does not make a copy. The ref and out keywords can be compared to passing by pointer in C or passing by reference in C++, where the memory location (i.e. the address) of the variable is passed to the target method. The method that takes the reference can then modify the value directly in place using the memory location passed in.
Knowing that the variable is passed by reference, the compiler emits instructions that treat the ref parameter as an address, allowing in-place modifications.
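To make the in-place modification concrete, here is a minimal sketch. The names RefDemo, Increment and IncrementCopy are made up for illustration and do not come from the documentation:

```csharp
using System;

static class RefDemo
{
    // Hypothetical helper: 'value' is an alias for the caller's variable, not a copy.
    public static void Increment(ref int value)
    {
        value += 1;
    }

    // Same body without ref: only the method's private copy changes.
    public static void IncrementCopy(int value)
    {
        value += 1;
    }

    static void Main()
    {
        int counter = 41;

        IncrementCopy(counter);
        Console.WriteLine(counter); // 41, the caller's variable is untouched

        Increment(ref counter);
        Console.WriteLine(counter); // 42, modified in place through the address
    }
}
```

No heap object is created in either call; the ref version simply hands over the location of counter.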
tl;dr: Boxing isn't "how you create a reference"; it's "how you package a primitive value type for consumers who don't expect that exact type".
In .NET, reference types are class instances on the heap. Value types like int or double are just the bytes: A 32-bit int is just four bytes worth of zeroes and ones. When you put it in, say, a System.Collections.ArrayList (the old-timey pre-generic kind of list, that Granpaw whittled out down at the General Store), then take it back out, how will the compiler know what to do if you call GetType() on it? It would just have four bytes of... what? Who knows? If it stored a pointer in the list, it would have a pointer to four bytes of... who knows?
In your own method, the generated code knows what your variable is. Regular strong type-checking. But that doesn't work when you send your variable's value to somebody else who only knows he's expecting Object.
So when you add an int to such a list, or pass it to a function that takes Object as an argument, the compiler has to add some information to it so everybody else knows what he's getting.
So "Boxing" means packaging a non-reference value into an object that can be treated as an instance of Object. For ordinary ref parameters, that's not necessary, because the type is known the whole way: The code generated for the guts of the function doesn't have to be prepared to deal with any arbitrary reference type. It knows it's getting (for example) a pointer to an integer, and that's all it's going to get. Boxing provides capability that's not required in this case, and so the compiler doesn't waste your users' cycles on it.
Boxing isn't the only way to have a reference (in the broadest sense of the term) to, for example, a double. Rather, boxing is the only way to treat a double as an object that can be stored in a pre-generic list such as System.Collections.ArrayList: It has to be on the heap, it has to be castable to Object, has to have run-time type information, etc. etc.
For the following, all that either the caller or the callee needs is the address of 64 zeroes and ones somewhere:
void f(ref double d) { d *= 2; }
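A small sketch of the difference, reusing f from above. The wrapper class BoxVsRef and the helper methods are assumptions added only so the snippet compiles on its own:

```csharp
using System;

static class BoxVsRef
{
    public static void f(ref double d) { d *= 2; }

    // Returns the value held by a box that was created BEFORE f ran.
    public static double BoxKeepsOwnCopy()
    {
        double d = 1.5;
        object boxed = d;   // boxing: the 8 bytes are copied into a new heap object
        f(ref d);           // ref: the address of d itself is passed, no heap object
        return (double)boxed;
    }

    // Returns the variable after f modified it through its address.
    public static double RefChangesOriginal()
    {
        double d = 1.5;
        f(ref d);
        return d;
    }

    static void Main()
    {
        Console.WriteLine(RefChangesOriginal() == 3.0); // True
        Console.WriteLine(BoxKeepsOwnCopy() == 1.5);    // True: the box never saw the doubling
        object boxed = 1.5;
        Console.WriteLine(boxed.GetType());             // System.Double: the box carries type information
    }
}
```

The box is a separate, self-describing object; the ref parameter is just the location of the original eight bytes.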
There are many examples; let's take the array copy method as one. The signature of the Array.Copy method is as below:
public static void Copy (Array sourceArray, long sourceIndex, Array destinationArray, long destinationIndex, long length);
Judging only from signature, one can not tell that the sourceArray will not be changed while the destinationArray will be altered, even if it is something as simple as an array of int. The guarantee that the keyword "ref" gives programmers is lost here.
It seems to me that the destinationArray parameter would be better marked as "ref Array". Had it been done this way, the syntax would be more consistent with the usage of the keyword "ref", indicating that the passed-in object might be modified by the callee and that the change is visible to the caller. The only benefit I can think of in omitting the keyword "ref" is that it saves a few keystrokes, or perhaps it is just mimicking the C/C++ style without much thought.
My question is: what are the reasons behind this design decision?
Update: For the record, I am advocating that an array be of the same value/reference category as its elements, thus making a clear distinction between Fun(array) and Fun(ref array), that is the same guarantee programmers get with Fun(int) and Fun(ref int). Optimization for efficiency can be left to the implementation level.
Array is a reference type. You can pass references by value and the instances they reference will still be the same ones that get modified. The callee is modifying the same instance using its own reference to it and has no reason to change it into a completely different instance entirely (which is where ref would actually come into use).
There isn't any convention that states to use ref when passing reference types — you generally don't need to most of the time, except as mentioned if your method actually intends to change the instance entirely like so:
class Foo { public int Value; }
public static void ReplaceFoo(ref Foo foo)
{
foo = new Foo { Value = 2 };
}
var foo = new Foo { Value = 1 };
Console.WriteLine(foo.Value); // 1
ReplaceFoo(ref foo);          // foo now refers to a different instance
Console.WriteLine(foo.Value); // 2
Judging only from signature, one can not tell that the sourceArray will not be changed while the destinationArray will be altered
Why is this a problem? No one reads APIs only paying attention to method signatures and ignoring parameter names. Signatures are there for the compiler to distinguish overloads. Anyone reading the API for Array.Copy() would understand that sourceArray is going to be unchanged, being where the method is getting the values from, and destinationArray is going to be modified, being the one receiving the values — unless they don't speak English (which is fine, but most APIs are written in English).
The only other scenario I can think of where a reader would be confused is if they didn't have the prior knowledge that arrays are reference types in .NET. But misusing ref in a situation where it's not needed at best and inappropriate at worst doesn't solve that problem.
C# (and .NET) include both reference types and value types.
Normally (absent ref or out keywords), parameters are passed to methods by value. So, if you pass an integer to a function, the value of the integer is passed. If you put a variable referring to an array in a function call (remembering that all arrays are instances of the reference type System.Array), the value of that variable, i.e., the reference to the array, is passed to the function.
So, within the function, the code gets to play on that array. When the function returns, that variable (in the scope of the caller) still refers to that same object. However, the function may have mutated that array, so the variable (in the caller scope) may be referring to a changed object.
If you pass a value type by reference (with the ref keyword), the function can change the value of the parameter, and when the function returns, the variable (in the caller scope) will receive the new value.
But, if you use ref (or out) on a parameter of reference type, you are passing a reference by reference. So, for example, you could pass in an array of five integers and the function could assign to that parameter an array of ten integers (they are of the same type, but definitely different objects). In the caller, when the function returns, the variable associated with that parameter will see what it refers to completely change during the call.
In your example, the caller will instantiate two arrays of the same type and compatible lengths (usually the same length if the source and destination indexes are 0 and the length is sourceArray.Length). The function does not change what object the destination array parameter refers to, it just fills the destination from the source.
In fact, if the destination was by ref, it wouldn't be as flexible. Consider a case where the destination is 30 entries long, and your intention is to fill the middle ten array entries with the source. It just works. It wouldn't with a ref destination parameter (without a lot more work).
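The "fill the middle ten entries" case can be sketched like this. The class and method names are made up for the example; Array.Copy is used with its int-index overload:

```csharp
using System;

static class CopyDemo
{
    public static int[] FillMiddle()
    {
        int[] source = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        int[] destination = new int[30];   // all zeroes initially

        // Copy 10 elements from source[0..9] into destination[10..19].
        // destination is passed by value: the method fills the same object.
        Array.Copy(source, 0, destination, 10, source.Length);
        return destination;
    }

    static void Main()
    {
        int[] result = FillMiddle();
        Console.WriteLine(result[9]);   // 0  (untouched)
        Console.WriteLine(result[10]);  // 1  (first copied element)
        Console.WriteLine(result[19]);  // 10 (last copied element)
        Console.WriteLine(result[20]);  // 0  (untouched)
    }
}
```

The caller keeps ownership of the 30-element array the whole time; the method never needs to replace it.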
The reason for omitting the ref keyword is that in most cases including it would make no difference, so it's superfluous. It does make a difference in some cases, however. An array is a reference type, so what gets passed is a copy of the reference. Changes the method makes to the array's elements through that copy are visible to the caller, because both references point to the same object. BUT if the method creates a NEW array and assigns it to the parameter, only the method's own copy of the reference changes and the caller still sees the original array, whereas with the ref keyword the caller's variable is updated to refer to the new array.
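Here is a short sketch of the three cases. The method names Mutate, ReplaceByValue and ReplaceByRef are invented for the example:

```csharp
using System;

static class ReassignDemo
{
    // The parameter is a copy of the caller's reference; both point at one array.
    public static void Mutate(int[] arr) { arr[0] = 99; }

    // Reassigning the copy does not affect the caller's variable.
    public static void ReplaceByValue(int[] arr) { arr = new int[10]; }

    // With ref, the parameter aliases the caller's variable itself.
    public static void ReplaceByRef(ref int[] arr) { arr = new int[10]; }

    static void Main()
    {
        int[] data = new int[5];

        Mutate(data);
        Console.WriteLine(data[0]);      // 99: same object, mutation is visible

        ReplaceByValue(data);
        Console.WriteLine(data.Length);  // 5: caller still has the original array

        ReplaceByRef(ref data);
        Console.WriteLine(data.Length);  // 10: caller's variable now refers to the new array
    }
}
```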
I am stuck on programming terminology here, which is confusing me, and I cannot gather my thoughts on how to correctly express (write) in code these few MSDN theory sentences from the Common Type System page.
Would anyone help me with this one? I want to understand this!
And if someone would be so kind as to write some code and comment on this issue,
it would be awesome and praiseworthy of you!
//This is the text(it is taken from the "Structures" paragraph):
https://msdn.microsoft.com/en-us/library/zcx1eb1e(v=vs.110).aspx#
"For each value type, the common language runtime supplies a corresponding boxed type, which is a class that has the same state and behavior as the value type.
An instance of a value type is boxed when it is passed to a method that accepts a parameter of type System.Object.
It is unboxed (that is, converted from an instance of a class back to an instance of a value type) when control returns from a method call that accepts a value type as a by-reference parameter.
Some languages require that you use special syntax when the boxed type is required; others automatically use the boxed type when it is needed.
When you define a value type, you are defining both the boxed and the unboxed type."
Thank You in advance, best regards!
A reference type and a value type are stored differently. A variable of a reference type holds a pointer to memory on the heap, which contains the binary representation of the object. The stack is memory used to store those pointers and local value types. So when you pass an integer or a bool to a function, the function doesn't receive a pointer to it. It is passed a copy of the actual value.
But if you have a method like this:
string GetString(object o)
{
return o.ToString();
}
That method expects an object, a pointer to a location in memory, even if you pass a value type to it. So in order to do that, the framework has to create an object stored on the heap containing that int so that it can pass a reference (pointer) to that value to the function. That's boxing.
Boxing is implicit. You don't have to call some conversion function to convert an int to an object.
Unboxing occurs when you take that object and cast it as a value type. For example,
object x = 5; //Boxes the value to create an object with a pointer
var y = (int)x; //Unboxes the value, creating an int on the stack.
When you unbox, the object stored on the heap and referenced by x is inspected and its value retrieved. Unboxing is explicit: when you convert anything from object to a value type, you must specify the type to which you are converting it.
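One consequence worth seeing in code: the type named in the unboxing cast has to be the exact type that was boxed. A minimal sketch, with class and method names made up for the example:

```csharp
using System;

static class UnboxDemo
{
    // Unboxing a boxed int directly as long fails at run time.
    public static bool UnboxAsLongThrows()
    {
        object x = 5;                 // boxed Int32
        try
        {
            long y = (long)x;         // unboxing to a different value type
            Console.WriteLine(y);
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    // Unbox to the exact type first, then let the normal int-to-long conversion run.
    public static long UnboxThenWiden()
    {
        object x = 5;
        return (int)x;
    }

    static void Main()
    {
        Console.WriteLine(UnboxAsLongThrows()); // True
        Console.WriteLine(UnboxThenWiden());    // 5
    }
}
```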
.NET is language agnostic which allows programmers to write code in different languages (which can be compiled to IL), and that code can interact with other code written in different languages.
This feature is provided by the CTS (Common Type System), a standard that specifies how type definitions are represented in the memory and how types are declared, used, and managed in the CLR (Common Language Runtime).
Example
C# has an int data type and VB.NET has an Integer data type. After the compilation, both instances of int and Integer will use the same structure Int32 from CTS.
Since Int32 is a struct, which means it is a System.ValueType (which inherits from System.Object), why should the CLR box it when I pass an Integer to a function that expects Object?
Does the CLR assume that Object is always a reference type?
It is a bit confusing to think that ValueType "is" an Object but when you have to pass it "as" object, you need to box it...
Am I the only one who is wondering about this?
It's not that a type derived from Object is always a reference type, but rather that a variable of type Object always contains a reference. Suppose you wanted to store the actual value in the Object; how then would you decide how big the Object value would need to be?
A variable of a compile-time-known value type has a known size for which space can be allocated, but an Object, being able to 'contain' any value type, cannot be sized in advance. One logical solution then is to have the Object variable contain a special type of reference to a boxed object, whereby the size of the 'box' is allocated dynamically depending on what type is being boxed.
Some slightly more technical notes:
Another solution to the above problem would be to treat the Object as a reference to an arbitrary location in memory, which would prevent having to create a boxed copy. This is how it's done in C, where you can create a pointer to a value on the stack, for instance, then pass that to another function for use. This can be quite dangerous though: what happens, for instance, if the function decides to keep that pointer around and use it at some undefined later time? Since the call stack has changed, that pointer is now pointing to something entirely different than was originally intended and writing to it will almost certainly have disastrous side effects.
Part of the goal of .NET, as a managed runtime, is to provide a 'safe' environment where these particular kinds of failures can't happen. Part of that trade-off is disallowing persisted direct references to stack memory, necessitating boxing when you want to 'persist' the contents of a value type in a variable containing a reference. This used to be a performance problem with collections in .NET 1.1, but the addition of Generics in .NET 2.0 meant that boxing was far less common an occurrence.
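As a rough illustration of the points above (the inheritance chain from the question, and boxing in pre-generic versus generic collections), here is a sketch. The class and method names are made up; ArrayList and List<int> are the real framework types:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class ObjectVsValue
{
    // Pre-generic collection: elements are stored as Object, so the int is boxed.
    public static int ViaArrayList(int value)
    {
        var list = new ArrayList();
        list.Add(value);          // boxes the int
        return (int)list[0];      // explicit cast unboxes it
    }

    // Generic collection: elements are stored as int, no box and no cast.
    public static int ViaGenericList(int value)
    {
        var list = new List<int>();
        list.Add(value);
        return list[0];
    }

    static void Main()
    {
        // Int32 derives from ValueType, which derives from Object...
        Console.WriteLine(typeof(int).BaseType == typeof(ValueType));     // True
        Console.WriteLine(typeof(ValueType).BaseType == typeof(object));  // True
        // ...yet a variable of type int holds the value itself, not a reference.
        Console.WriteLine(typeof(int).IsValueType);                       // True

        Console.WriteLine(ViaArrayList(42) == ViaGenericList(42));        // True
    }
}
```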
While reading the chapter "Value and reference types" in a book, a question came to my mind: "When are value types stored in stack?" Because a programmer cannot declare a variable of a value type outside a class, and when we initialise a variable of a value type in a class, that variable is stored on the heap.
My question is: when are value types stored in stack?
Well, firstly it is very rare that you would need to know, but basically, value types are stored wherever they are owned.
They are stored on the stack when they are part of the execution flow of a thread, which can mean:
in a "local" (a method variable) - excluding some cases (below)
as a floating value in part of a method, i.e. the return value from one method that is about to be passed as a value to another method - no "local" is involved, but the value is still on the stack
value-type parameters that are passed by-value (i.e. without ref or out) are simply a special-case of this
in an instance "field" (a type variable) on another value-type that is itself on the stack (for the above reasons)
They are stored on the heap (as part of an object) when:
in an instance "field" on a class
in an instance "field" on a value-type that is itself on the heap
in a static "field"
in an array
in a "local" (a method variable) that is part of an iterator block, an async method, or which is a "captured" variable in a lambda or anonymous method (all of which cause the local to be hoisted onto a field on a class that is generated by the compiler)
when "boxed" - i.e. cast into a reference-type (object, dynamic, Enum, ValueType (yes: ValueType is a reference-type; fun, eh?), ISomeInterface, etc)
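The captured-variable case from the list above is easy to observe. In this sketch (names invented for illustration), the local n must outlive the call to MakeCounter, so it cannot live in that method's stack frame:

```csharp
using System;

static class CaptureDemo
{
    public static Func<int> MakeCounter()
    {
        int n = 0;            // looks like an ordinary local...
        return () => ++n;     // ...but it is captured, so the compiler hoists it
                              // into a field of a generated class on the heap
    }

    static void Main()
    {
        Func<int> next = MakeCounter();   // MakeCounter has already returned here
        Console.WriteLine(next());        // 1
        Console.WriteLine(next());        // 2: the same n is still alive
        Console.WriteLine(MakeCounter()()); // 1: a fresh counter gets its own n
    }
}
```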
My question is: when are value types stored in stack?
From The Truth About Value Types:
[I]n the Microsoft implementation of C# on the desktop CLR, value types are stored on the stack when the value is a local variable or temporary that is not a closed-over local variable of a lambda or anonymous method, and the method body is not an iterator block, and the jitter chooses to not enregister the value
The first web search hit on your question gives you Eric Lippert's The Truth About Value Types, which starts with the most important part: it is almost always irrelevant. So, why do you want to know? Will you program differently?
Anyway:
The truth is: the choice of allocation mechanism has to do only with the known required lifetime of the storage.
To be precise, the stack and the heap are (or should be) irrelevant in managed environments.
In practice, local variables of value types (structs in C#) tend to be allocated on the stack. However, there are cases when they are allocated on the heap instead.
One such case is when they are boxed. Boxing means using an Int32 as an Object, for example by passing it to a method that takes an object parameter. One reason for this is polymorphism: Structs don't carry a vTable pointer and thus cannot do dynamic virtual method resolution (for such methods as ToString(), for example) - but they are sealed, so they can do the resolution statically. On the other hand, if a struct is forced to be stored in an object reference, it needs to be transformed to a heap-allocated vTable-enabled object.
A value type may also be allocated in the heap when it's part of a heap-allocated object - for example, when it's a data member (field) of a class.
Another source of confusion appears to be the assumption that reference types and value types are two kinds of classes. That is not true:
keyword class -> Reference type
keyword struct -> Value type
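A quick way to see the difference is assignment: a struct is copied, a class instance is shared. The two point types below are made up for the example:

```csharp
using System;

struct PointStruct { public int X; }   // value type
class PointClass { public int X; }     // reference type

static class KindDemo
{
    public static int CopyThenMutateStruct()
    {
        var a = new PointStruct { X = 1 };
        var b = a;      // copies the whole value
        b.X = 2;        // changes only the copy
        return a.X;
    }

    public static int CopyThenMutateClass()
    {
        var a = new PointClass { X = 1 };
        var b = a;      // copies only the reference
        b.X = 2;        // changes the one shared object
        return a.X;
    }

    static void Main()
    {
        Console.WriteLine(CopyThenMutateStruct());          // 1
        Console.WriteLine(CopyThenMutateClass());           // 2
        Console.WriteLine(typeof(PointStruct).IsValueType); // True
        Console.WriteLine(typeof(PointClass).IsValueType);  // False
    }
}
```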
Why is string a reference type, even though it's normally treated as a primitive data type like int, float, or double?
In addition to the reasons posted by Dan:
Value types are, by definition those types which store their values in themselves, rather than referring to a value somewhere else. That's why value types are called "value types" and reference types are called "reference types". So your question is really "why does a string refer to its contents rather than simply containing its contents?"
It's because value types have the nice property that every instance of a given value type is of the same size in memory.
So what? Why is this a nice property? Well, suppose strings were value types that could be of any size and consider the following:
string[] mystrings = new string[3];
What are the initial contents of that array of three strings? There is no "null" for value types, so the only sensible thing to do is to create an array of three empty strings. How would that be laid out in memory? Think about that for a bit. How would you do it?
Now suppose you say
string[] mystrings = new string[3];
mystrings[1] = "hello";
Now we have "", "hello" and "" in the array. Where in memory does the "hello" go? How large is the slot that was allocated for mystrings[1] anyway? The memory for the array and its elements has to go somewhere.
This leaves the CLR with the following choices:
resize the array every time you change one of its elements, copying the entire thing, which could be megabytes in size
disallow creating arrays of value types of unknown size
disallow creating value types of unknown size
The CLR team chose the last one. Making strings into reference types means that you can create arrays of them efficiently.
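With strings as reference types, the array question above has a simple answer, sketched here (class and method names are made up): every slot is a fixed-size reference that starts out null, and assigning an element only swaps one reference.

```csharp
using System;

static class ArrayDefaults
{
    public static bool FreshStringSlotIsNull()
    {
        string[] mystrings = new string[3];
        return mystrings[0] == null;      // slots hold references, initially null
    }

    public static int FreshIntSlot()
    {
        int[] numbers = new int[3];
        return numbers[0];                // slots hold the values themselves, initially 0
    }

    static void Main()
    {
        Console.WriteLine(FreshStringSlotIsNull()); // True
        Console.WriteLine(FreshIntSlot());          // 0

        string[] mystrings = new string[3];
        mystrings[1] = "hello";   // only the reference in slot 1 changes; the array is not resized
        Console.WriteLine(mystrings.Length);        // 3
    }
}
```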
Yikes, this answer got accepted and then I changed it. I should probably include the original answer at the bottom since that's what was accepted by the OP.
New Answer
Update: Here's the thing. string absolutely needs to behave like a reference type. The reasons for this have been touched on by all answers so far: the string type does not have a constant size, it makes no sense to copy the entire contents of a string from one method to another, string[] arrays would otherwise have to resize themselves -- just to name a few.
But you could still define string as a struct that internally points to a char[] array or even a char* pointer and an int for its length, make it immutable, and voila!, you'd have a type that behaves like a reference type but is technically a value type.
This would seem quite silly, honestly. As Eric Lippert has pointed out in a few of the comments to other answers, defining a value type like this is basically the same as defining a reference type. In nearly every sense, it would be indistinguishable from a reference type defined the same way.
So the answer to the question "Why is string a reference type?" is, basically: "To make it a value type would just be silly." But if that's the only reason, then really, the logical conclusion is that string could actually have been defined as a struct as described above and there would be no particularly good argument against that choice.
However, there are reasons that it's better to make string a class than a struct that are more than purely intellectual. Here are a couple I was able to think of:
To prevent boxing
If string were a value type, then every time you passed it to some method expecting an object it would have to be boxed, which would create a new object, which would bloat the heap and cause pointless GC pressure. Since strings are basically everywhere, having them cause boxing all the time would be a big problem.
For intuitive equality comparison
Yes, string could override Equals regardless of whether it's a reference type or value type. But if it were a value type, then ReferenceEquals("a", "a") would return false! This is because both arguments would get boxed, and boxed arguments never have equal references (as far as I know).
So, even though it's true that you could define a value type to act just like a reference type by having it consist of a single reference type field, it would still not be exactly the same. So I maintain this as the more complete reason why string is a reference type: you could make it a value type, but this would only burden it with unnecessary weaknesses.
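To see both weaknesses at once, here is a sketch of such a struct. TextValue is a hypothetical type invented for this example, not anything in the framework:

```csharp
using System;

// Hypothetical "string as a struct": a value type wrapping a single reference.
readonly struct TextValue
{
    private readonly char[] _chars;
    public TextValue(string s) { _chars = s.ToCharArray(); }
    public int Length => _chars?.Length ?? 0;
    public override string ToString() => _chars == null ? "" : new string(_chars);
}

static class BoxingCost
{
    // Both parameters are Object, so a struct argument is boxed at each position.
    public static bool SameReference(object a, object b) => object.ReferenceEquals(a, b);

    static void Main()
    {
        var t = new TextValue("a");
        Console.WriteLine(SameReference(t, t));      // False: t is boxed twice, into two objects

        string s = "a";
        Console.WriteLine(SameReference(s, s));      // True: one object, no boxing
        Console.WriteLine(SameReference("a", "a"));  // True: identical literals share one instance
    }
}
```

Each call that passes t as Object allocates a new box, which is exactly the GC pressure described above.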
Original Answer
It's a reference type because only references to it are passed around.
If it were a value type then every time you passed a string from one method to another the entire string would be copied*.
Since it is a reference type, instead of string values like "Hello world!" being passed around -- "Hello world!" is 12 characters, by the way, which means it requires (at least) 24 bytes of storage -- only references to those strings are passed around. Passing around a reference is much cheaper than passing every single character in a string.
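The arithmetic can be checked quickly. This is only a sketch of the character data size; it ignores any per-object overhead, and the class and method names are made up:

```csharp
using System;
using System.Text;

static class StringSize
{
    public static int CharCount() => "Hello world!".Length;

    // .NET strings are sequences of UTF-16 code units, 2 bytes each.
    public static int Utf16ByteCount() => Encoding.Unicode.GetByteCount("Hello world!");

    static void Main()
    {
        Console.WriteLine(CharCount());       // 12
        Console.WriteLine(sizeof(char));      // 2
        Console.WriteLine(Utf16ByteCount());  // 24
        Console.WriteLine(IntPtr.Size);       // size of a reference in bytes on this platform
    }
}
```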
Also, it's really not a normal primitive data type. Who told you that?
*Actually, this isn't strictly true. If the string internally held a char[] array, then as long as the array type is a reference type, the contents of the string would actually not be passed by value -- only the reference to the array would be. I still think this is basically the right answer, though.
String is a reference type, not a value type. In many cases you know the length and the content of the string, and in such cases it is easy to allocate the memory for it. But consider something like this:
string s = Console.ReadLine();
It is not possible to know the allocation details for "s" at compile time. The user enters the value, and the whole line that was entered is stored in s. So strings are stored on the heap, where memory can be allocated to fit the content of s, and a reference to the string is stored on the stack.
To learn more, please read .NET Book Zero by Charles Petzold.
Also read the garbage collection chapter of CLR via C# for allocation details on the stack.
Edit: Console.WriteLine(); to Console.ReadLine();