So strings are reference types right? My understanding is a reference to the string in the heap is passed even when you pass the string ByVal to a method.
Sooo.....
String myTestValue = "NotModified";
TestMethod(myTestValue);
System.Diagnostics.Debug.Write(myTestValue); /* myTestValue = "NotModified" WTF? */
private void TestMethod(String Value)
{
Value = "test1";
}
Alternatively
Dim myTestValue As String = "NotModified"
TestMethod(myTestValue)
Debug.Print(myTestValue) ' myTestValue = "NotModified" WTF?
Private Sub TestMethod(ByVal Value As String)
Value = "test1"
End Sub
What am I missing? And what is going on under the hood? I would have bet my life that the value would have changed....
Reference types are passed "reference by value" in .NET. This means that assigning a different value to the actual parameter does not actually change original value (unless you use ByRef/ref). However, anything you do to change the actual object that gets passed in will change the object that the calling method refers to. For example, consider the following program:
void Main()
{
var a = new A{I=1};
Console.WriteLine(a.I);
DoSomething(a);
Console.WriteLine(a.I);
DoSomethingElse(a);
Console.WriteLine(a.I);
}
public void DoSomething(A a)
{
a = new A{I=2};
}
public void DoSomethingElse(A a)
{
a.I = 2;
}
public class A
{
public int I;
}
Output:
1
1
2
The DoSomething method assigned its a parameter to have a different value, but that parameter is just a local pointer to the location of the original a from the calling method. Changing the pointer's value did nothing to change the calling method's a value. However, DoSomethingElse actually made a change to one of the values on the referenced object.
Regardless of what the other answerers say, string is not exceptional in this way. All objects behave this way.
Where string differs from many objects is that it is immutable: there aren't any methods or properties or fields on string that you can call to actually change the string. Once a string is created in .NET, it is read-only.
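To see the immutability directly, here is a small sketch. It assumes C# 9 or later for top-level statements, and the variable names are only illustrative. Every string method that looks like it "modifies" the string actually returns a new string and leaves the original unchanged:

```csharp
using System;

string original = "hello";

// Each call builds and returns a new string object; 'original' is never touched.
string upper = original.ToUpperInvariant();
string replaced = original.Replace('h', 'j');

Console.WriteLine(original);  // hello
Console.WriteLine(upper);     // HELLO
Console.WriteLine(replaced);  // jello
```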
When you do something like this:
var s = "hello";
s += " world";
... the compiler turns this into something like this:
// this is compiled into the assembly, and doesn't need to be set at runtime.
const string S1 = "hello";
const string S2 = " world"; // likewise
string s = S1;
s = string.Concat(s, S2); // builds a brand-new string; S1 itself is untouched
This last line generates a new string, but S1 and S2 are still hanging around. If they are constant strings built into the assembly, they'll stay there. If they were created dynamically and have no more references to them, the garbage collector can reclaim them to free up memory. But the key is to realize that S1 never actually changed. The variable pointing to it just changed to point to a different string.
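A quick way to check that `+=` produces a new object rather than changing the existing one is to keep a second reference to the original string and compare identities afterwards. This is a sketch assuming C# 9 or later (top-level statements); `before` is just an illustrative name:

```csharp
using System;

string s = "hello";
string before = s;    // a second reference to the same "hello" object

s += " world";        // builds a new string and points s at it

Console.WriteLine(before);                              // hello
Console.WriteLine(s);                                   // hello world
Console.WriteLine(object.ReferenceEquals(before, s));   // False
```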
Everything is passed by value unless you specify otherwise. When you're passing a String, you're actually passing a reference by value.
For Strings, this doesn't make much difference, as Strings are immutable. This means you never get to modify the string you receive. For other classes, though, you can modify an object passed by value (unless, like String, it is immutable). What you can't do, and what passing by reference allows you to do, is modify the variable you're passing.
Example:
Public Class Example
Private Shared Sub ExampleByValue(ByVal arg as String)
arg = "ByVal args can be modifiable, but can't be replaced."
End Sub
Private Shared Sub ExampleByRef(ByRef arg as String)
arg = "ByRef args can be set to a whole other object, if you want."
End Sub
Public Shared Sub Main()
Dim s as String = ""
ExampleByValue(s)
Console.WriteLine(s) ' This will print an empty line
ExampleByRef(s)
Console.WriteLine(s) ' This will print our lesson for today
End Sub
End Class
Now, this should be used very sparingly, because by-value is the default and expected. Particularly in VB, which doesn't always make it clear when you're passing by reference, it can cause a lot of problems when some method starts unexpectedly mucking around with your variables.
All types, including reference types, are passed by value by default, as in your example, which means that a copy of the reference is passed. So, no matter what, re-assigning an object like that has no effect when you pass it by value. You're just changing what the copy of the reference points to. You must explicitly pass by reference to achieve what you're trying to do.
Only when you modify an object that's passed by value can the effect be seen outside the method. Of course, strings are immutable, so this doesn't really apply here.
You are passing a copy not the actual reference.
Read this article from Microsoft:
http://msdn.microsoft.com/en-us/library/s6938f28.aspx
1. When you pass the string to the method, a copy of the reference is taken. Thus, Value is a whole new variable that just happens to still refer to the same string in memory.
2. The "test1" string literal is also created as a real reference type object. It's not just a value in your source code.
3. When you assign "test1" to Value, the reference in your Value variable is updated to refer to "test1" instead of the original string. Since this reference is just a copy (as we saw in step 1), the myTestValue variable outside of the function remains unchanged and still refers to the original string.
You can get a better understanding of this by testing on a type with a property you can update. If you make a change to just the property, that change is visible outside the function. If you try to replace the entire object (as you are doing with this string), that is not visible outside the function.
I have a variable that I want to change inside a function and have the change reflected in the original variable. I am trying to change the original variable's value to Scott inside the function and then see that new value outside the function:
public ActionResult HomePage()
{
string name = "John";
ChangeName(name);
string newName = name; // This still says John
}
public static void ChangeName(string myname)
{
myname = "Scott";
}
You can do that by passing the string by reference -
public ActionResult HomePage()
{
string name = "John";
ChangeName(ref name);
string newName = name; // This is now Scott.
}
public static void ChangeName(ref string myname)
{
myname = "Scott";
}
However, as stated by TheSoftwareJedi in the comments, it is usually best to avoid passing parameters by reference. Instead, you should have your method return the new string, especially considering the fact that strings are immutable, so you can't really change them, you can only change the reference to point to another string.
So a better method would be something like this:
public static string GetAnotherName()
{
return "Scott";
}
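For completeness, here is a sketch of how the caller might use a method like GetAnotherName instead of a ref parameter. It assumes C# 9 or later (top-level statements) and writes GetAnotherName as a local function purely so the snippet is self-contained:

```csharp
using System;

// Same shape as GetAnotherName above: the method just returns the new value.
string GetAnotherName() => "Scott";

string name = "John";
name = GetAnotherName();   // the caller reassigns its own variable, no ref needed
Console.WriteLine(name);   // Scott
```

The caller stays in control of its own variable, which is usually easier to follow than a method that silently reseats it through ref.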
A little more in depth: there are basically two kinds of types in C# (relevant to this point, at least). There are value types like enums and structs (including all primitive types such as int, bool, etc.), and there are reference types (basically, everything else).
Whenever you pass an argument to a method, it gets passed by value unless you specify the ref (or out) keyword, even if it's a reference type (in that case, the reference gets passed by value). This means that whenever you assign a new value to the parameter inside the method, you will only see it outside the method if the argument was explicitly passed by reference (using the ref or out keyword).
The main difference between reference types and value types here is that when you change the properties of a reference type inside a method, you will see the new values outside the method as well; however, when you change the properties of a value type (passed by value) inside a method, that change will not be reflected in the variable outside that method.
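A minimal sketch of that difference, assuming C# 9 or later (top-level statements). It uses a ValueTuple as the value type (a mutable struct with public fields) and an int array as the reference type, so no custom types are needed; the method names are hypothetical:

```csharp
using System;

// ValueTuple is a struct: the method receives a copy, so changing its field is invisible outside.
void MoveTuple((int X, int Y) p) { p.X = 10; }

// An array is a reference type: the method receives a copy of the reference to the same array.
void MoveArray(int[] p) { p[0] = 10; }

var point = (X: 1, Y: 2);
int[] coords = { 1, 2 };

MoveTuple(point);
MoveArray(coords);

Console.WriteLine(point.X);    // 1
Console.WriteLine(coords[0]);  // 10
```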
Jon Skeet has written a fairly extensive article about this subject, and he is way better than me at explaining things, so you should probably read it as well.
To start with, I would recommend reading about references, values and parameter passing. There is a nice summary on this topic by Jon Skeet, Parameter passing in C#, and a good explanation of the reference concept by Eric Lippert, References are not addresses.
You should know that by default parameters are passed by value in C#. This means the parameter contains a copy of the argument's value (for a string, a copy of the reference), so assignments only change the parameter itself and won't be observable at the call site.
That's why
myname = "Scott";
Only changes value of the method parameter myname and not the outer name variable.
At the same time, we are able to pass our variable by reference using the ref, in or out keywords. The in and out keywords add extra guarantees that are outside the scope of this discussion, so I'll continue with ref.
You need to change both the declaration of your method and the call site to use it.
public static void ChangeName(ref string myname)
{
myname = "Scott";
}
And it should be invoked now as
ChangeName(ref name);
This time there is no copying: the myname parameter and the name variable refer to one and the same storage location, so changes to myname inside the ChangeName method will be visible to the invoking code.
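As a sketch of how ref compares with out (one of the keywords mentioned above), assuming C# 9 or later for top-level statements; CreateName is a hypothetical helper added only for comparison. Both let the method write to the caller's variable; the difference is that ref requires the caller's variable to be assigned beforehand, while out requires the method to assign it before returning:

```csharp
using System;

// ref: the caller's variable must already be assigned; the parameter is an alias for it.
void ChangeName(ref string myname) { myname = "Scott"; }

// out: the caller need not initialize the variable, but the method must assign it.
void CreateName(out string myname) { myname = "Scott"; }

string name = "John";
ChangeName(ref name);
Console.WriteLine(name);   // Scott

CreateName(out string other);
Console.WriteLine(other);  // Scott
```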
To continue, I'd like to point you to a separate but related topic: expressions and statements. Scott Wlaschin has written a good article about them, Expressions vs statements (there is a bit of F# inside, but that's not critical).
Generally, there is nothing wrong with the approach you've chosen, but it's imperative, statement based and a bit too low level. You are forced to deal with references and their values, while what you really want is just to get the value "Scott" from your method. This looks more straightforward and obvious if implemented as an expression.
public static string GetName() => "Scott";
This way code is declarative and thus more simple (and short), it directly illustrates your goals.
At work we were encountering a problem where the original object was changed after we sent a copy through a method. We did find a workaround by implementing ICloneable in the original class, but we couldn't find out why it happened in the first place.
We wrote this example code to reproduce the problem (which resembles our original code), and hope someone is able to explain why it happens.
public partial class ClassRefTest : System.Web.UI.Page
{
protected void Page_Load(object sender, EventArgs e)
{
var myclass = new MyClass();
var copy = myclass;
myclass.Mystring = "jadajadajada";
Dal.DoSomeThing(copy);
lit.Text = myclass.Mystring; // Text is expected to be jadajadajada, but ends up as "referenced"
}
}
public class MyClass
{
public string Mystring { get; set; }
}
public static class Dal
{
public static int? DoSomeThing(MyClass daclass)
{
daclass.Mystring = "referenced";
return null;
}
}
As you can see, in the DoSomeThing() method we're not using any ref argument, but lit.Text still ends up as "referenced".
Why does this happen?
It is always interesting to explain how this works. Of course my explanation cannot be on par with the magnificence of Jon Skeet's or Joseph Albahari's, but I will try nevertheless.
In the old days of C programming, grasping the concept of pointers was fundamental to working with that language. Many years have passed, and now we call them references, but they are still... glorified pointers, and if you understand how they work, you are halfway to becoming a programmer (just kidding).
What is a reference? In short, it is a number stored in a variable, and this number represents an address in memory where your data lies.
Why do we need references? Because it is much simpler to handle a single number with which we can reach the memory area of our data than to move a whole object, with all its fields, around in our code.
So, what happens when we write
var myclass = new MyClass();
We all know that this is a call to the constructor of the class MyClass, but for the framework it is also a request to provide a memory area where the values of the instance (properties, fields and other internal housekeeping info) live at a specific point in time. Suppose that MyClass needs 100 bytes to store everything it needs. The framework searches the computer memory in some way, and let's suppose that it finds a place identified by the address 4200. This value (4200) is the value that is assigned to the variable myclass. It is a pointer to the memory (oops, it is a reference to the object instance).
Now what happens when you call?
var copy = myclass;
Nothing particular. The copy variable gets the same value of myclass (4200). But the two variables are referencing the same memory area so using one or the other doesn't make any difference. The memory area (the instance of MyClass) is still located at our fictional memory address 4200.
myclass.Mystring = "jadajadajada";
This uses the reference value as a base value to find the area of memory occupied by the property, and stores there a reference to the interned literal string. If I could make an analogy with pointers, it is as if you take the base address (4200) and add an offset to find the point, inside the 100 bytes occupied by our object instance, where the reference representing the property MyString is kept. Let's say that the MyString reference is 42 bytes past the beginning of the memory area. Adding 42 to 4200 yields 4242, and this is the point at which the reference to the literal "jadajadajada" will be stored.
Dal.DoSomeThing(copy);
Here is the problem (well, the point where you see the problem). When you pass the copy variable, don't think that the framework repeats the search for a memory area and copies everything from the original area into a new one. No, that would be practically impossible (think about MyClass containing a property that is an instance of another class, and so on... it could never stop). So the value passed to the DoSomeThing method is again the reference value 4200. This value is automatically assigned to the local variable daclass, declared as the input parameter of DoSomeThing (just as you did explicitly before with var copy = myclass;).
At this point it is clear that any operation using daclass acts on the same memory area occupied by the original instance, and you see the results when the code returns to your starting point.
I beg pardon from the more technically expert users here, particularly for my casual and imprecise use of the term 'memory address'.
That's normal: your MyClass is a reference type, so you are passing a reference to the original data, not the data itself. That's why it's the expected behavior.
Here is an explanation of what a reference type is, from Parameter passing in C#:
A reference type is a type which has as its value a reference to the appropriate data rather than the data itself
I see two issues here...
Making a Copy of an object
var copy = myClass; does not make a copy - what it really does is create a second reference ("pointer") to myClass (naming the variable "copy" is misleading). So you have myClass and copy pointing to the same exact object.
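To make a copy you have to create a new object, for example with a copy constructor (this assumes you add one to MyClass; the class in the question doesn't have it):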
var copy = new MyClass(myClass);
Notice that I created a new object.
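The same idea can be sketched with a built-in type that already has a copy constructor, so nothing needs to be added to MyClass. This assumes C# 9 or later (top-level statements); List&lt;string&gt; simply stands in for MyClass here:

```csharp
using System;
using System.Collections.Generic;

var original = new List<string> { "jadajadajada" };
var alias = original;                       // a second reference to the same list
var realCopy = new List<string>(original);  // a new list holding the same elements

original[0] = "referenced";

Console.WriteLine(alias[0]);     // referenced
Console.WriteLine(realCopy[0]);  // jadajadajada
```

Note that this is a shallow copy: the new list is a separate object, but its elements are the same element objects. That is harmless for immutable strings, but a class holding mutable members would need those copied as well.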
Passing By Reference
When passing value type variables without ref, the variable cannot be changed by the receiving method.
Example: DoSomething(int foo) - DoSomething cannot affect the value of foo outside of itself.
When passing value type variables with ref, the variable can be changed
Example: DoSomething(ref int foo) - if DoSomething changes foo, it will remain changed.
When passing an object without ref, the object's data can be changed, but the reference to the object cannot be changed.
void DoSomething(MyClass myClass)
{
myClass.Mystring = "ABC"; // the caller's object now holds "ABC"
myClass = new MyClass(); // allowed, but only changes the local copy of the reference; the caller is unaffected
}
When passing an object with ref, the object's data can be changed, and the reference to the object can be changed.
void DoSomething(ref MyClass myClass)
{
myClass.Mystring = "ABC"; // the original object now holds "ABC"
myClass = new MyClass(); // the caller's variable now refers to the new object, whose Mystring is null
}
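The docs at MSDN (Methods in C#) say it pretty clearly: by default, value types are passed as a copy of the value, and for objects a copy of the reference is passed, so the method works on the same object but cannot reseat the caller's variable.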
So I was reading Jon Skeet's C# in Depth and came across some myths, like "reference types are always passed by ref", so I decided to do a little experiment myself.
As you can see in the following code I have a simple Car class with one property which is initialized to 500 when the constructor is called. I also have the NullIt function which assigns null to the parameter value and a SpeedUp method which just changes the Speed property value.
Examining the main method you can see I instantiate a Car object, then I pass the object to the static SpeedUp method and the Speed value changes to 1000 but when I'm passing it to the also static NullIt method the object remains intact. From this the only thing I can assume is that the object is passed by value and the fields / properties are passed by reference. Is this right?
I know that if I pass it using the ref keyword, c will be set to null.
class Program
{
static void Main(string[] args)
{
Car c = new Car();
Car.SpeedUP(c);
Car.NullIt(c);
}
class Car
{
public int Speed { get; set; }
public Car() { Speed = 500; }
public static void SpeedUP(Car c)
{
c.Speed = 1000;
}
public static void NullIt(Car c)
{
c = null;
}
}
}
From this the only thing I can assume is that the object is passed by value and the fields / properties are passed by reference. Is this right?
Not really. The object's address (its reference) is passed by value.
So when you do:
Car.SpeedUP(c);
Now the parameter of the SpeedUP method and the variable c in the caller both point to the same location in memory. That's why changing the property works.
But for your call:
Car.NullIt(c);
Your method parameter c and the caller's c both point to the same location. But since you assign null to your parameter c, it no longer points to any memory location, while the original (caller's) c still points to the same memory location.
Consider the following: when you pass a parameter to your method, two references in memory point to the same address.
But when you assign null to one of them, the other reference doesn't change.
The first reference (in the caller) still points to the same location; only the method parameter now points to null.
When you call NullIt() you are passing the value of a reference to your instance of Car. You then change this value to null. However, the original copy of that value remains intact. Exactly the same way passing the value of an int works - you can modify your local copy without affecting the "original".
If you were to change that to NullIt(ref Car c), you would be passing a reference to the reference, and hence setting it to null would set the original value of the reference to null. That last part can be a bit of a mind bender, but it's so rarely necessary (if ever) that you don't need to worry too much about it.
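Here is a sketch of both versions side by side, assuming C# 9 or later (top-level statements). To keep it self-contained it uses StringBuilder as a stand-in for Car; NullItByRef is a hypothetical name for the ref variant:

```csharp
using System;
using System.Text;

// By value: clears only the method's copy of the reference.
void NullIt(StringBuilder c) { c = null; }

// By ref: the parameter aliases the caller's variable, so the caller's variable is cleared.
void NullItByRef(ref StringBuilder c) { c = null; }

var car = new StringBuilder("500");

NullIt(car);
Console.WriteLine(car == null);  // False

NullItByRef(ref car);
Console.WriteLine(car == null);  // True
```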
From this the only thing I can assume is that the object is passed by value and the fields / properties are passed by reference. Is this right?
While the observed effect may look like this, the way you describe it has some incorrect implications: The fields/properties are not "passed by reference" in a way that there is a reference to each field/property. It is rather that the reference to the object is passed by value.
That is why by accessing any member of the object, or the object itself, you are accessing the very same instance you passed into the method, not a copy thereof.
However, the variable c itself is a reference, and that reference is passed by value. That is why, in your methods, you cannot change that reference by assigning a new value and expect that the variable c itself now has a new value (in your case null).
It IS a bit confusing, because the word reference is overloaded (has two subtly different meanings).
When describing types as either value types or reference types, it means one thing:
Whether the data for the state of an object of that type is stored directly in the variable (for a local variable, typically on the stack, which is a section of memory that methods have access to), or whether it is stored in another section of memory called the heap, with only the address of that heap data stored in the variable, so that code can access the object only indirectly.
When describing whether parameter values are passed to a method by value or by reference, on the other hand, it means something different: whether the actual value of the parameter is [copied and] passed to the method, or whether the address of the parameter value's memory slot is passed.
So you can actually have four combinations here:
1. Pass a value type by value. The value is copied and passed to the method. The method cannot change the source value.
2. Pass a reference type by value. The address of the reference type's data (which is on the heap) is copied and passed. The method cannot change the address in the source variable, but it CAN change the data on the heap that the address points to.
3. Pass a value type by reference. The method gets the address of the source variable (for a local, typically on the stack) and can change that source value.
4. Pass a reference type by reference. The method gets the address of the variable (typically on the stack) that contains the address of the object itself (on the heap). The method can change the data in the source object, AND CAN ALSO CHANGE WHICH OBJECT (ON THE HEAP) THE SOURCE VARIABLE POINTS TO.
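The four combinations can be sketched in a few lines, assuming C# 9 or later (top-level statements), with int as the value type and int[] as the reference type; the method names are hypothetical:

```csharp
using System;

void ValueByValue(int n) { n = 2; }                          // 1: changes only the copy
void ValueByRef(ref int n) { n = 2; }                         // 3: changes the caller's variable
void RefByValue(int[] a) { a[0] = 2; a = new[] { 99 }; }      // 2: data change visible, reassignment not
void RefByRef(ref int[] a) { a[0] = 2; a = new[] { 99 }; }    // 4: caller's variable now points to the new array

int x = 1;
int y = 1;
int[] arr1 = { 1 };
int[] arr2 = { 1 };

ValueByValue(x);
ValueByRef(ref y);
RefByValue(arr1);
RefByRef(ref arr2);

Console.WriteLine($"{x} {y} {arr1[0]} {arr2[0]}");  // 1 2 2 99
```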
Take the following example:
string me = "Ibraheem";
string copy = me;
me = "Empty";
Console.WriteLine(me);
Console.WriteLine(copy);
The output is:
Empty
Ibraheem
Since string is a class type (i.e. not a struct), shouldn't copy also contain Empty, because the = operator in C# assigns the reference of an object rather than the object itself (as in C++)?
While the accepted answer addresses this (as do some others), I wanted to give an answer dedicated to what it seems like you're actually asking, which is about the semantics of variable assignment.
Variables in C# are simply pieces of memory that are set aside to hold a single value. It's important to note that there's no such thing as a "value variable" and a "reference variable", because variables only hold values.
The distinction between "value" and "reference" comes with the type. A Value Type (VT) means that the entire piece of data is stored within the variable.
If I have an integer variable named abc that holds the value 100, then that means that I have a four-byte block of memory within my application that stores the literal value 100 inside it. This is because int is a value type, and thus all of the data is stored within the variable.
On the other hand, if I have a string variable named foo that holds the value "Adam", then there are two actual memory locations involved. The first is the piece of memory that stores the actual characters "Adam", as well as other information about my string (its length, etc.). A reference to this location is then stored within my variable. References are very similar to pointers in C/C++; while they are not the same, the analogy is sufficient for this explanation.
So, to sum it up, the value for a reference type is a reference to another location in memory, where the value for a value type is the data itself.
When you assign something to a variable, all you're changing is that variable's value. If I have this:
string str1 = "foo";
string str2 = str1;
Then I have two string variables that hold the same value (in this case, they each hold a reference to the same string, "foo"). If I then do this:
str1 = "bar";
Then I have changed the value of str1 to a reference to the string "bar". This doesn't change str2 at all, since its value is still a reference to the string "foo".
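One way to watch this happen is to compare the references themselves before and after the assignment. A sketch assuming C# 9 or later (top-level statements):

```csharp
using System;

string str1 = "foo";
string str2 = str1;
Console.WriteLine(object.ReferenceEquals(str1, str2));  // True: both hold a reference to the same object

str1 = "bar";  // only str1's value (the reference it holds) changes
Console.WriteLine(object.ReferenceEquals(str1, str2));  // False
Console.WriteLine(str2);                                 // foo
```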
System.String is not a value type. It exhibits some behaviors that are similar to value types, but the behavior you have come across is not one of them. Consider the following code.
class Foo
{
public string SomeProperty { get; private set; }
public Foo(string bar) { SomeProperty = bar; }
}
Foo someOtherFoo = new Foo("B");
Foo foo = someOtherFoo;
someOtherFoo = new Foo("C");
If you checked the output of foo.SomeProperty, do you expect it to be the same as someOtherFoo.SomeProperty? If so, you have a flawed understanding of the language.
In your example, you have assigned a string a value. That's it. It has nothing to do with value types, reference types, classes or structs. It's simple assignment, and it's true whether you're talking about strings, longs, or Foos. Your variables temporarily contained the same value (a reference to the string "Ibraheem"), but then you reassigned one of them. Those variables were not inextricably linked for all time, they just held something temporarily in common.
It isn't a value type. When you use a string literal, it's actually a reference to a string object that is stored when the code is compiled. So when you assign a string, you are basically changing the pointer, like in C++.
Strings behave the same as any other class. Consider:
class Test {
public int SomeValue { get; set; }
public Test(int someValue) { this.SomeValue = someValue; }
}
Test x = new Test(42);
Test y = x;
x = new Test(23);
Console.WriteLine(x.SomeValue + " " + y.SomeValue);
Output:
23 42
– exactly the same behaviour as in your string example.
What your example shows is the classic behavior of a reference type, which string is.
string copy = me; means that the copy reference will point to the same memory location that me is pointing to.
Later me can point to other memory location but it won't affect copy.
Your code would do the same if you used value types as well. Consider using integers:
int me = 1;
int copy = me;
me = 2;
Console.WriteLine(me);
Console.WriteLine(copy);
This will print out the following:
2
1
While the other answers said exactly what the solution to your question was, to get a better fundamental understanding of why, you will want to read up on heap and stack memory allocation and on when data is removed from memory by the garbage collector.
Here is a good page that describes the stack and heap memory and the garbage collector. At the bottom of the article there are links to the other parts of the explanation:
http://www.c-sharpcorner.com/UploadFile/rmcochran/csharp_memory01122006130034PM/csharp_memory.aspx?ArticleID=9adb0e3c-b3f6-40b5-98b5-413b6d348b91
Hopefully this gives you a better understanding of why this happens.
Answering the original question:
Strings in C# are a reference type with value-type semantics.
They are being stored on the heap because storing them on the stack might be unsafe due to the limited size of the stack.
I have the following piece of code:
class Foo
{
public Foo()
{
Bar bar;
if (null == bar)
{
}
}
}
class Bar { }
Code gurus will already see that this gives an error: bar might not be initialized before the if statement.
What is the value of bar? Shouldn't it be null? Aren't they set to null? (null pointer?)
No, local variables don't have a default value [1]. They have to be definitely assigned before you read them. This reduces the chance of you using a variable you think you've given a sensible value to, when actually it's got some default value. This can't be done for instance or static variables because you don't know in what order methods will be called.
See section 5.3 of the C# 3.0 spec for more details of definite assignment.
Note that this has nothing to do with this being a reference type variable. This will fail to compile in the same way:
int i;
if (i == 0) // Nope, i isn't definitely assigned
{
}
[1] As far as the language is concerned, anyway... clearly the storage location in memory has something in it, but it's irrelevant and implementation-specific. There is one way you can find out what that value is, by creating a method with an out parameter but then using IL to look at the value of that parameter within the method, without having given it another value. The CLR doesn't mind that at all. You can then call that method passing in a not-definitely-assigned variable, and lo and behold you can detect the value - which is likely to be the "all zeroes" value basically.
I suspect that the CLI specification does enforce local variables having a default value - but I'd have to check. Unless you're doing evil things like the above, it shouldn't matter to you in C#.
Fields (variables on classes / structs) are initialized to null/zero/etc. Local variables... well - since (by "definite assignment") you can't access them without assigning there is no sensible way of answering; simply, it isn't defined since it is impossible. I believe they happen to be null/zero/etc (provable by hacking some out code via dynamic IL generation), but that is an implementation detail.
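A small sketch of the contrast, assuming C# 9 or later (top-level statements). Array elements behave like fields here: they always start at their type's default value. A local, by contrast, must be definitely assigned before it is read; the commented-out line shows the read the compiler rejects:

```csharp
using System;

// Array elements, like fields, are always initialized to their type's default value.
var names = new string[2];
var counts = new int[2];
Console.WriteLine(names[0] == null);  // True
Console.WriteLine(counts[0]);         // 0

// A local variable has no usable default: it must be definitely assigned before it is read.
string bar;
// Console.WriteLine(bar);            // error CS0165: Use of unassigned local variable 'bar'
bar = default;                        // explicit assignment (null for string)
Console.WriteLine(bar == null);       // True
```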
For info, here's some crafty code that shows the value of a formally uninitialised variable:
using System;
using System.Reflection.Emit;
static class Program
{
delegate void Evil<T>(out T value);
static void Main()
{
MakeTheStackFilthy();
Test();
}
static void Test()
{
int i;
DynamicMethod mthd = new DynamicMethod("Evil", null, new Type[] { typeof(int).MakeByRefType()});
mthd.GetILGenerator().Emit(OpCodes.Ret); // just return; no assignments
Evil<int> evil = (Evil<int>)mthd.CreateDelegate(typeof(Evil<int>));
evil(out i);
Console.WriteLine(i);
}
static void MakeTheStackFilthy()
{
DateTime foo = new DateTime();
Bar(ref foo);
Console.WriteLine(foo);
}
static void Bar(ref DateTime foo)
{
foo = foo.AddDays(1);
}
}
The IL just does a "ret" - it never assigns anything.
Local variables do not get assigned a default value. You have to initialize them before you use them. You can explicitly initialize to null though:
public Foo()
{
Bar bar = null;
if (null == bar)
{
}
}
Local variables are not assigned a default value, not even a null.
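The value of bar is undefined as far as the language is concerned. There's space allocated for it on the stack, but the language doesn't guarantee what that space contains before you assign to it (in current implementations it is typically zeroed, but you can't rely on that from C#).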
(The local variable might however be optimised to use a register instead of stack space, but it's still undefined.)
The compiler won't let you use the undefined value, it has to be able to determine that the variable is initialised before you can use it.
As a comparison, VB does initialise local variables. While this can be practical sometimes, it can also mean that you unintentionally use a variable before you have given it a meaningful value, and the compiler can't determine if it's what you intended to do or not.
It doesn't matter because no such code should be compilable by any compiler that implements C#.
If there was a default value, then it would be compilable. But there is none for local variables.
Besides "correctness", local variable initialization is also related to the CLR's verification process.
For more details, see my answer to this similar question: Why must local variables have initial values?