Are all objects Immutable inside Heap? - c#

Given,
public class SomeClass {
public string SomeName { get; set; }
public List<string> RelatedNames { get; set; }
}
public class Program{
public void Main(){
var someClassInstance = new SomeClass(){ SomeName = "A", RelatedNames = new List<string>(1){ "a" }};
// So now someClassInstance has been allocated some memory on the heap: 1 string object and a list with 1 string object.
// Since SomeClass is mutable, it can be modified as below
someClassInstance.SomeName = "Now This is much more than a name";
someClassInstance.RelatedNames.AddRange(new List<string>(100) { "N","o","w".....});
//Now what happens inside the heap?
//1.someClassInstance.SomeName will move its pointer to another string inside the heap
//2.someClassInstance.RelatedNames will move its pointer to another List<>(101) inside the heap.
//Is that correct? Then where is the 'mutability'?
}
}
As mentioned in the comments above, "AFAIK" on modifying a mutable object the internal pointers of that object will just point to another memory location inside heap. If that is correct, then does that mean that all objects inside heap (reference type) are immutable?
Thanks for your interest.

Where's mutability? Right there:
someClassInstance.SomeName = "Now This is much more than a name";
someClassInstance.RelatedNames = new List<string>(100) { "N","o","w".....};
You just mutated the object pointed to by someClassInstance.
Also, your example is a bit contrived. Strings are indeed immutable, but Lists are not, so you could have done this:
someClassInstance.RelatedNames.Add("HELLO!");
And then you just mutated the object pointed to by someClassInstance.RelatedNames.
EDIT: I see you changed your question. Well, then:
1. someClassInstance.SomeName will move its pointer to another string inside the heap
2. someClassInstance.RelatedNames will move its pointer to another List<>(101) inside the heap.
1 is true because String was designed to be immutable. That's why there's the StringBuilder class in case you need a mutable string.
2 is false, because that's not how List is implemented. Perhaps that's where your confusion comes from. When you invoke AddRange, someClassInstance.RelatedNames still points to the same instance, but that instance's internal state has changed (most likely, its backing array field now points to a different, larger array object, and its count is now 101). In fact, a reference cannot magically change based on the operations that are invoked on the object it refers to.
And none of that changes the fact that someClassInstance's internal state was mutated anyway.
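The difference between the two cases can be checked with a small sketch. It assumes a variant of SomeClass with settable properties; listBefore and nameBefore are names introduced here purely for illustration.

```csharp
using System;
using System.Collections.Generic;

public class SomeClass
{
    public string SomeName { get; set; }
    public List<string> RelatedNames { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var instance = new SomeClass { SomeName = "A", RelatedNames = new List<string>(1) { "a" } };

        // Remember which objects the properties refer to right now.
        List<string> listBefore = instance.RelatedNames;
        string nameBefore = instance.SomeName;

        // Mutates the existing list object in place.
        instance.RelatedNames.AddRange(new[] { "N", "o", "w" });
        // Makes the property refer to a different string object.
        instance.SomeName = "Now This is much more than a name";

        Console.WriteLine(object.ReferenceEquals(listBefore, instance.RelatedNames)); // True
        Console.WriteLine(listBefore.Count);                                          // 4
        Console.WriteLine(object.ReferenceEquals(nameBefore, instance.SomeName));     // False
        Console.WriteLine(nameBefore);                                                // A
    }
}
```

The list reference is unchanged while the list's contents grew; the old string is untouched, and only the property of the SomeClass instance was changed to refer to a new string.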

Objects in the CLR are definitely not immutable by default. There is a little bit of confusion here because you've used string in your example, which is a type that's implemented as an immutable type. This is certainly not the default in .NET though, and mutability is far more common than immutability.
Take this line as an example
someClassInstance.SomeName = "Now This is much more than a name";
There are 3 objects of interest here in this statement.
The object referenced by someClassInstance.SomeName
The string which has the value "Now this is much more than a name"
The object referenced by someClassInstance
All 3 of these objects live on the heap. Executing this statement mutates the contents of the object referenced by someClassInstance. This is a prime example of mutability in action. If everything in this scenario were immutable, then setting SomeName would need to produce a copy of the object referenced by someClassInstance and give the copy the new value. That doesn't happen here, as the following demonstrates:
var obj = someClassInstance; // Both reference the same object
someClassInstance.SomeName = "hello";
Console.WriteLine(obj.SomeName); // Prints "hello"

Yes because they are put on the heap using new or malloc and are pointers. As such you can only add or remove pointer references. So technically the objects themselves are not immutable, since they are not on the heap to begin with, but the pointer allocations on the heap are immutable.

Related

I don't understand why we need the 'new' keyword

I am new to C#, from a C++ background. In C++ you can do this:
class MyClass{
....
};
int main()
{
MyClass object; // this will create object in memory
MyClass* object = new MyClass(); // this does same thing
}
Whereas, in C#:
class Program
{
static void Main(string[] args)
{
Car x;
x.i = 2;
x.j = 3;
Console.WriteLine(x.i);
Console.ReadLine();
}
}
class Car
{
public int i;
public int j;
}
you can't do this. I wonder why Car x won't do its work.
There are a lot of misconceptions here, both in the question itself and in the several answers.
Let me begin by examining the premise of the question. The question is "why do we need the new keyword in C#?" The motivation for the question is this fragment of C++:
MyClass object; // this will create object in memory
MyClass* object = new MyClass(); // this does same thing
I criticize this question on two grounds.
First, these do not do the same thing in C++, so the question is based on a faulty understanding of the C++ language. It is very important to understand the difference between these two things in C++, so if you do not understand very clearly what the difference is, find a mentor who can teach you how to know what the difference is, and when to use each.
Second, the question presupposes -- incorrectly -- that those two syntaxes do the same thing in C++, and then, oddly, asks "why do we need new in C#?" Surely the right question to ask given this -- again, false -- presupposition is "why do we need new in C++?" If those two syntaxes do the same thing -- which they do not -- then why have two syntaxes in the first place?
So the question is both based on a false premise, and the question about C# does not actually follow from the -- misunderstood -- design of C++.
This is a mess. Let's throw out this question and ask some better questions. And let's ask the question about C# qua C#, and not in the context of the design decisions of C++.
What does the new X operator do in C#, where X is a class or struct type? (Let's ignore delegates and arrays for the purposes of this discussion.)
The new operator:
Causes a new instance of the given type to be allocated; new instances have all their fields initialized to default values.
Causes a constructor of the given type to be executed.
Produces a reference to the allocated object, if the object is a reference type, or the value itself if the object is a value type.
All right, I can already hear the objections from C# programmers out there, so let's dismiss them.
Objection: no new storage is allocated if the type is a value type, I hear you say. Well, the C# specification disagrees with you. When you say
S s = new S(123);
for some struct type S, the spec says that new temporary storage is allocated on the short-term pool, initialized to its default values, the constructor runs with this set to refer to the temp storage, and then the resulting object is copied to s. However, the compiler is permitted to use a copy-elision optimization provided that it can prove that it is impossible for the optimization to become observed in a safe program. (Exercise: work out under what circumstances a copy elision cannot be performed; give an example of a program that would have different behaviours if elision was or was not used.)
Objection: a valid instance of a value type can be produced using default(S); no constructor is called, I hear you say. That's correct. I didn't say that new is the only way to create an instance of a value type.
In fact, for a value type new S() and default(S) are the same thing.
Objection: Is a constructor really executed for situations like new S(), if not present in the source code in C# 6, I hear you say. This is an "if a tree falls in the forest and no one hears it, does it make a sound?" question. Is there a difference between a call to a constructor that does nothing, and no call at all? This is not an interesting question. The compiler is free to elide calls that it knows do nothing.
Suppose we have a variable of value type. Must we initialize the variable with an instance produced by new?
No. Variables which are automatically initialized, such as fields and array elements, will be initialized to the default value -- that is, the value of the struct where all the fields are themselves their default values.
Formal parameters will be initialized with the argument, obviously.
Local variables of value type are required to be definitely assigned with something before the fields are read, but it need not be a new expression.
So effectively, variables of value type are automatically initialized with the equivalent of default(S), unless they are locals?
Yes.
Why not do the same for locals?
Use of an uninitialized local is strongly associated with buggy code. The C# language disallows this because doing so finds bugs.
Suppose we have a variable of reference type. Must we initialize it with an instance produced by new?
No. Automatic-initialization variables will be initialized with null. Locals can be initialized with any reference, including null, and must be definitely assigned before being read.
So effectively, variables of reference type are automatically initialized with null, unless they are locals?
Yes.
Why not do the same for locals?
Same reason. A likely bug.
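The initialization rules above can be sketched as follows. The types S and Holder and all member names are hypothetical, chosen only for illustration; S declares no constructor of its own, so new S() and default(S) yield the same value here.

```csharp
using System;

struct S
{
    public int Number;
    public string Text;
}

class Holder
{
    public S Value;       // field of value type: starts as default(S)
    public Holder Next;   // field of reference type: starts as null
}

static class Program
{
    static void Main()
    {
        var holder = new Holder();
        Console.WriteLine(holder.Value.Number);       // 0
        Console.WriteLine(holder.Value.Text == null); // True
        Console.WriteLine(holder.Next == null);       // True

        var array = new S[3];                         // array elements are auto-initialized too
        Console.WriteLine(array[2].Number);           // 0

        S viaNew = new S();
        S viaDefault = default(S);
        Console.WriteLine(viaNew.Number == viaDefault.Number); // True

        S local;
        // Console.WriteLine(local.Number); // compile-time error: local is not definitely assigned
        local = viaDefault;                 // any value will do; it need not come from new
        Console.WriteLine(local.Number);    // 0
    }
}
```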
Why not automatically initialize variables of reference type by calling the default constructor automatically? That is, why not make R r; the same as R r = new R();?
Well, first of all, many types do not have a default constructor, or for that matter, any accessible constructor at all. Second, it seems weird to have one rule for an uninitialized local or field, another rule for a formal, and yet another rule for an array element. Third, the existing rule is very simple: a variable must be initialized to a value; that value can be anything you like; why is the assumption that a new instance is desired warranted? It would be bizarre if this
R r;
if (x) r = M(); else r = N();
caused a constructor to run to initialize r.
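A rough way to observe this, assuming a hypothetical class R that counts its own constructor calls (ConstructorCalls, M and N are illustrative names, not part of any library):

```csharp
using System;

class R
{
    public static int ConstructorCalls;   // counts how often the constructor runs
    public R() { ConstructorCalls++; }
}

static class Program
{
    static R M(R existing) => existing;   // hands back an object that already exists
    static R N() => null;                 // null is a perfectly valid value as well

    static void Main(string[] args)
    {
        var existing = new R();           // the only explicit construction in this program
        bool x = args.Length == 0;

        R r;                              // declaration only: no object is created here
        if (x) r = M(existing); else r = N();

        Console.WriteLine(R.ConstructorCalls);                           // 1
        Console.WriteLine(r == null || object.ReferenceEquals(r, existing)); // True
    }
}
```

Whichever branch runs, the count stays at 1: declaring r never constructs anything, and the variable simply receives whatever value is assigned to it.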
Leaving aside the semantics of the new operator, why is it necessary syntactically to have such an operator?
It's not. There are any number of alternative syntaxes that could be grammatical. The most obvious would be to simply eliminate the new entirely. If we have a class C with a constructor C(int) then we could simply say C(123) instead of new C(123). Or we could use a syntax like C.construct(123) or some such thing. There are any number of ways to do this without the new operator.
So why have it?
First, C# was designed to be immediately familiar to users of C++, Java, JavaScript, and other languages that use new to indicate new storage is being initialized for an object.
Second, the right level of syntactic redundancy is highly desirable. Object creation is special; we wish to call out when it happens with its own operator.
In C# you can do a similar thing:
// please notice "struct"
struct MyStruct {
....
}
MyStruct sample1; // this will create object on stack
MyStruct sample2 = new MyStruct(); // this does the same thing
Recall that primitives like int, double, and bool are also of type struct, so even though it's conventional to write
int i;
we may also write
int i = new int();
Unlike C++, C# doesn't use pointers to instances (in safe code); however, C# has class and struct declarations:
class: you have a reference to an instance, memory is allocated on the heap, new is mandatory; similar to MyClass* in C++
struct: you have a value, memory is (usually) allocated on the stack, new is optional; similar to MyClass in C++
In your particular case you can just turn Car into struct
struct Car
{
public int i;
public int j;
}
and so the fragment
Car x; // since Car is struct, new is optional now
x.i = 2;
x.j = 3;
will be correct
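Put together as a complete program, the struct version might look like this minimal sketch (the second variable y is added only to show that assigning a struct copies it):

```csharp
using System;

struct Car
{
    public int i;
    public int j;
}

static class Program
{
    static void Main()
    {
        Car x;          // no new: the local itself is the storage for the struct
        x.i = 2;
        x.j = 3;        // every field is assigned, so x is now definitely assigned
        Console.WriteLine(x.i);   // 2

        Car y = x;      // copies both fields into y
        y.i = 10;
        Console.WriteLine(x.i);   // 2
        Console.WriteLine(y.i);   // 10
    }
}
```

Note that reading x before all of its fields are assigned would still be rejected by the compiler.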
In C#, class type objects are always allocated on the heap, i.e. variables of such types are always references ("pointers"). Just declaring a variable of such a type does not cause the allocation of an object. Allocating a class object on the stack like it's common to do in C++ isn't (in general) an option in C#.
Local variables of any type that have not been assigned to are considered uninitialized, and they cannot be read until they have been assigned to. This is a design choice (another way would have been to assign default(T) to every variable at declaration time) which seems like a good idea because it should protect you from some programming errors.
It's similar to how in C++ it wouldn't make sense to say SomeClass *object; and never assign anything to it.
Because in C# all class type variables are pointers, allocating an empty object when the variable is declared would lead to inefficient code when you actually only want to assign a value to the variable later, for instance in situations like this:
// Needs to be declared here to be available outside of `try`
Foo f;
try { f = GetFoo(); }
catch (SomeException) { return null; }
f.Bar();
Or
Foo f;
if (bar)
f = GetFoo();
else
f = GetDifferentFoo();
ignoring the stack vs heap side of things:
because C# made the bad decision to copy C++ when they should have just made the syntax
Car car = Car()
(or something similar). Having 'new' is superfluous.
When you use reference types, this statement
Car c = new Car();
creates two entities: a reference named c (on the stack, if c is a local variable) and the Car object itself on the heap.
If you just write
Car c;
then you create an uninitialized reference (provided that c is a local variable) that points nowhere.
In fact, this is equivalent to C++ code that uses pointers instead of references.
For example
Car *c = new Car();
or just
Car *c;
The difference between C++ and C# is that C++ can create instances of classes in the stack like
Car c;
In C# this means creating a reference of type Car that as I said points nowhere.
From the Microsoft programming guide:
At run time, when you declare a variable of a reference type, the variable contains the value null until you explicitly create an instance of the object by using the new operator, or assign it an object that has been created elsewhere by using new
A class is a reference type. When an object of the class is created, the variable to which the object is assigned holds only a reference to that memory. When the object reference is assigned to a new variable, the new variable refers to the original object. Changes made through one variable are reflected in the other variable because they both refer to the same data.
A struct is a value type. When a struct is created, the variable to which the struct is assigned holds the struct's actual data. When the struct is assigned to a new variable, it is copied. The new variable and the original variable therefore contain two separate copies of the same data. Changes made to one copy do not affect the other copy.
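The two paragraphs quoted above can be illustrated with a short sketch; CarClass and CarStruct are hypothetical types introduced only for this comparison.

```csharp
using System;

class CarClass { public int Speed; }
struct CarStruct { public int Speed; }

static class Program
{
    static void Main()
    {
        var c1 = new CarClass { Speed = 1 };
        var c2 = c1;                 // copies the reference: one object, two variables
        c2.Speed = 50;
        Console.WriteLine(c1.Speed); // 50

        var s1 = new CarStruct { Speed = 1 };
        var s2 = s1;                 // copies the data: two separate values
        s2.Speed = 50;
        Console.WriteLine(s1.Speed); // 1
    }
}
```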
I think that in your C# example you're effectively trying to assign values through a null pointer. Translated to C++, it would look like this:
Car* x = nullptr;
x->i = 2;
x->j = 3;
This would compile in C++ but crash at run time (dereferencing a null pointer is undefined behavior).

C# use of new to instantiate an object

From my experience of C++, I know that in C++, objects declared as ClassName ObjectName; are stored on the stack, and objects created with ClassName* ObjectName = new ClassName; are stored on the heap.
In C#, I seem to be being told from everywhere that the new keyword must be used, i.e. you cannot initialize an object like ClassName ObjectName; i.e.
Product P;
P.someMethod();
Why is this?
In C#, class objects and any values inside them are always stored on the heap. The new keyword allocates memory on the heap for the object and its values, and returns a reference to its location. Until this is done, there is no object whose methods you could call.
So in this example:
Product P = new Product();
P is actually a reference to the allocated object. Several references can refer to the same object.
Product C = P;
In this case, C does not get a copy of the object that P refers to; only the reference is copied.
Structs work differently from classes because they are value types (and are typically allocated on the stack when used as locals). The same operation as above actually copies the struct into separate storage.
I'll answer my own question for the sake of pulling the info together for clarity.
A combination of mohits00691 and Jon Skeet's answers clears this up. Even though P is declared as a type of Product, it has no default value and is not instantiated until it is set with "= new Product".
This differs from C++, where Product P would instantiate an object of class Product.
As far as I know, code like this:
Product p;
p.someFunction();
will fail at compile time with the error "Use of unassigned local variable". So in C# you need to give every local variable a value, be it of reference type or value type, before using it.

Why is the original object changed after a copy, without using ref arguments?

At work we encountered a problem where the original object was changed after we sent a copy through a method. We found a workaround by implementing ICloneable in the original class, but we couldn't find out why it happened in the first place.
We wrote this example code to reproduce the problem (which resembles our original code), and hope someone is able to explain why it happens.
public partial class ClassRefTest : System.Web.UI.Page
{
protected void Page_Load(object sender, EventArgs e)
{
var myclass = new MyClass();
var copy = myclass;
myclass.Mystring = "jadajadajada";
Dal.DoSomeThing(copy);
lit.Text = myclass.Mystring; // Text is expected to be jadajadajada, but ends up being referenced
}
}
public class MyClass
{
public string Mystring { get; set; }
}
public static class Dal
{
public static int? DoSomeThing(MyClass daclass)
{
daclass.Mystring = "referenced";
return null;
}
}
As you can see, in the DoSomeThing() method we're not using any ref argument, but lit.Text still ends up being "referenced".
Why does this happen?
It is always interesting to explain how this works. Of course my explanation cannot be on par with the magnificence of one from Jon Skeet or Joseph Albahari, but I will try nevertheless.
In the old days of C programming, grasping the concept of pointers was fundamental to working with the language. Many years have passed and now we call them references, but they are still ... glorified pointers and, if you understand how they work, you are halfway to becoming a programmer (just kidding).
What is a reference? In very short: it is a number stored in a variable, and this number represents an address in memory where your data lies.
Why do we need references? Because it is very simple to handle a single number with which we can reach the memory area of our data, instead of moving a whole object with all its fields along with our code.
So, what happens when we write
var myclass = new MyClass();
We all know that this is a call to the constructor of the class MyClass, but for the framework it is also a request to provide a memory area where the values of the instance (properties, fields and other internal housekeeping info) live at a specific point in time. Suppose that MyClass needs 100 bytes to store everything it needs. The framework searches the computer's memory in some way and, let's suppose, finds a place identified by the address 4200. This value (4200) is what gets assigned to the variable myclass. It is a pointer to the memory (oops, it is a reference to the object instance).
Now what happens when you call?
var copy = myclass;
Nothing particular. The copy variable gets the same value of myclass (4200). But the two variables are referencing the same memory area so using one or the other doesn't make any difference. The memory area (the instance of MyClass) is still located at our fictional memory address 4200.
myclass.Mystring = "jadajadajada";
This uses the reference value as a base to find the area of memory occupied by the property, and stores there a reference to the intern area where literal strings are kept. To make an analogy with pointers: you take the base address (4200) and add an offset to find the spot, inside the 100 bytes occupied by our object instance, where the reference representing the property Mystring is kept. Let's say that the Mystring reference is 42 bytes past the beginning of the memory area. Adding 42 to 4200 yields 4242, and this is where the reference to the literal "jadajadajada" will be stored.
Dal.DoSomeThing(copy);
Here is the problem (well, the point where you have the problem). When you pass the copy variable, don't think that the framework repeats the search for a memory area and copies everything from the original area into a new one. No, that would be practically impossible (think about what would happen if MyClass contained a property that is an instance of another class, and so on; it might never stop). So the value passed to the DoSomeThing method is again the reference value 4200. This value is automatically assigned to the local variable daclass, declared as the input parameter of DoSomeThing (just as you did explicitly before with var copy = myclass;).
At this point it is clear that any operation using daclass acts on the same memory area occupied by the original instance, and you see the results when the code returns to your starting point.
I beg the pardon from the more technically expert users here. Particularly for my casual and imprecise use of the term 'memory address'.
That's normal: MyClass is a reference type, so you are passing a reference to the original data, not the data itself. This is the expected behavior.
Here is an explanation of what a reference type is, from Parameter passing in C#:
A reference type is a type which has as its value a reference to the appropriate data rather than the data itself
I see two issues here...
Making a Copy of an object
var copy = myClass; does not make a copy - what it really does is create a second reference ("pointer") to myClass (naming the variable "copy" is misleading). So you have myClass and copy pointing to the same exact object.
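To make a copy you have to do something like the following (this assumes MyClass defines a constructor that copies its members from another instance):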
var copy = new MyClass(myClass);
Notice that I created a new object.
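One possible shape for such a copy, as a sketch only: the copy constructor below is hypothetical (the question's MyClass does not define one), and DoSomeThing is simplified to return void.

```csharp
using System;

public class MyClass
{
    public string Mystring { get; set; }

    public MyClass() { }

    // Hypothetical copy constructor: copies each member from the source instance.
    public MyClass(MyClass other)
    {
        Mystring = other.Mystring;
    }
}

public static class Dal
{
    public static void DoSomeThing(MyClass daclass)
    {
        daclass.Mystring = "referenced";   // changes whichever object was passed in
    }
}

public static class Program
{
    public static void Main()
    {
        var myclass = new MyClass { Mystring = "jadajadajada" };
        var copy = new MyClass(myclass);   // a second, independent object
        Dal.DoSomeThing(copy);

        Console.WriteLine(myclass.Mystring); // jadajadajada
        Console.WriteLine(copy.Mystring);    // referenced
    }
}
```

For a class whose members are themselves mutable reference types, a member-by-member copy like this still shares those inner objects, so a deeper copy may be needed.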
Passing By Reference
When passing value type variables without ref, the variable cannot be changed by the receiving method.
Example: DoSomething(int foo) - DoSomething cannot affect the value of foo outside of itself.
When passing value type variables with ref, the variable can be changed
Example: DoSomething(ref int foo) - if DoSomething changes foo, it will remain changed.
When passing an object without ref, the object's data can be changed, but the reference to the object cannot be changed.
void DoSomething(MyClass myClass)
{
myClass.myString = "ABC"; // the string is set to ABC
myClass = new MyClass(); // allowed, but has no effect on the caller's variable
}
When passing an object with ref, the object's data can be changed, and the reference to the object can be changed.
void DoSomething(ref MyClass myClass)
{
myClass.myString = "ABC"; // the string is set to ABC on the original object
myClass = new MyClass(); // the caller's variable now refers to a new object, so its string has whatever value a fresh MyClass starts with
}
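The four cases above can be checked with a sketch along these lines. MyClass here is a hypothetical stand-in whose field starts as "initial", and WithoutRef and WithRef are illustrative names.

```csharp
using System;

class MyClass { public string MyString = "initial"; }

static class Program
{
    static void WithoutRef(MyClass myClass)
    {
        myClass.MyString = "ABC";   // mutates the caller's object
        myClass = new MyClass();    // rebinds only the local parameter
        myClass.MyString = "local only";
    }

    static void WithRef(ref MyClass myClass)
    {
        myClass.MyString = "ABC";   // mutates the object the caller passed in
        myClass = new MyClass();    // rebinds the caller's variable to a new object
    }

    static void Main()
    {
        var a = new MyClass();
        var originalA = a;
        WithoutRef(a);
        Console.WriteLine(a.MyString);                           // ABC
        Console.WriteLine(object.ReferenceEquals(a, originalA)); // True

        var b = new MyClass();
        var originalB = b;
        WithRef(ref b);
        Console.WriteLine(b.MyString);                           // initial
        Console.WriteLine(originalB.MyString);                   // ABC
        Console.WriteLine(object.ReferenceEquals(b, originalB)); // False
    }
}
```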
The docs at MSDN say it pretty clearly: value types are passed as a copy by default, while for objects it is the reference that is passed (by value) by default. See Methods in C#.

Assign this keyword in C#

The main question is: what are the implications of allowing the this keyword to be modified, with regard to usefulness and memory, and why is this allowed in the C# language specification?
The other questions/sub-parts can be answered or not, as you choose. I thought answers to them would help clarify the answer to the main question.
I ran across this as an answer to What's the strangest corner case you've seen in C# or .NET?
public struct Teaser
{
public void Foo()
{
this = new Teaser();
}
}
I've been trying to wrap my head around why the C# language specification would even allow this. Sub-part 1. Is there anything that would justify having this be modifiable? Is it ever useful?
One of the comments to that answer was
From CLR via C#: The reason they made this is because you
can call the parameterless constructor of a struct in another
constructor. If you only want to initialize one value of a struct and
want the other values to be zero/null (default), you can write public
Foo(int bar){this = new Foo(); specialVar = bar;}. This is not
efficient and not really justified (specialVar is assigned twice), but
just FYI. (That's the reason given in the book, I don't know why we
shouldn't just do public Foo(int bar) : this())
Sub-part 2. I'm not sure I follow that reasoning. Can someone clarify what he meant? Maybe a concrete example of how it would be used?
EDIT: Disregard the stack-versus-heap aspect; the main point concerns memory release or garbage collection. Instead of the int[] you could substitute 262144 public int fields.
Also, from my understanding, structs are created on the stack as opposed to the heap. Suppose this struct had a 1 MB array field initialized like so:
public int[] Mb = new int[262144];
Sub-part 3. does this ever get removed from the stack when Foo is called? To me it seems since the struct never went out of scope it would not be removed from the stack. Don't have time tonight to create a test case but maybe I will for this one tomorrow.
In the below code
Teaser t1 = new Teaser();
Teaser tPlaceHolder = t1;
t1.Foo();
Sub-part 4. Are t1 and tPlaceHolder occupying the same or different address space?
Sorry to bring up a 3 year old post but this one has really got my head scratching.
FYI first question on stackoverflow so if I got something wrong with the question kindly post a comment and I will edit.
After 2 days I'll put a bounty of 50 on this question even if I have a winner chosen in my mind already as I think the answer will require a reasonable amount of work to explain the questions.
First of all, I think you should start by examining if you're even asking the right question. Perhaps we should be asking, "Why would C# not allow assignment to this in a struct?"
Assigning to the this keyword in a reference type is potentially dangerous: you are overwriting a reference to the object whose method you are running; you could even be doing so within the constructor that is initializing that reference. It's not clear what the behavior of that ought to be. To avoid having to figure that out, since it is not generally useful, it's not allowed by the spec (or compiler).
Assigning to the this keyword in a value type, however, is well defined. Assignment of value types is a copy operation. The value of each field is recursively copied over from right to left side of the assignment. This is a perfectly safe operation on a structure, even in a constructor, because the original copy of the structure is still present, you are just changing its data. It is exactly equivalent to manually setting each field in the struct. Why should the spec or compiler forbid an operation that is well-defined and safe?
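This, by the way, answers one of your sub-questions. Value type assignment copies the value itself (every field), not a reference to it; a field of reference type is copied as a reference, so both copies then refer to the same object. Given this code: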
Teaser t1 = new Teaser();
Teaser tPlaceHolder = t1;
t1.Foo();
You have allocated two copies of your Teaser structure, and copied the values of the fields in the first into the fields in the second. This is the nature of value types: two values that have identical fields are identical, just like two int variables that both contain 10 are identical, regardless of where they are "in memory".
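A small sketch makes the independence of the two copies visible. The Value field is an assumption added here so that there is something to observe; the Teaser in the question has no fields.

```csharp
using System;

public struct Teaser
{
    public int Value;   // hypothetical field, added for illustration

    public void Foo()
    {
        this = new Teaser();   // overwrites every field of this instance with defaults
    }
}

public static class Program
{
    public static void Main()
    {
        Teaser t1 = new Teaser();
        t1.Value = 42;
        Teaser tPlaceHolder = t1;   // independent copy of t1's fields
        t1.Foo();

        Console.WriteLine(t1.Value);           // 0
        Console.WriteLine(tPlaceHolder.Value); // 42
    }
}
```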
Also, this is important and worth repeating: be careful about making assumptions about what goes on "the stack" vs "the heap". Value types end up on the heap all the time, depending on the context in which they are used. Short-lived (locally scoped) structs that are not closed over or otherwise lifted out of their scope are quite likely to get allocated onto the stack. But that is an unimportant implementation detail that you should neither care about nor rely on. The key is that they are value types, and behave as such.
As far as how useful assignment to this really is: not very. Specific use cases have been mentioned already. You can use it to mostly-initialize a structure with default values but specify a small number. Since you are required to set all fields before your constructor returns, this can save a lot of redundant code:
public struct Foo
{
// Fields etc here.
public Foo(int a)
{
this = new Foo();
this.a = a;
}
}
It can also be used to perform a quick swap operation:
public void SwapValues(ref MyStruct other) // ref is needed, otherwise only a local copy of other is changed
{
var temp = other;
other = this;
this = temp;
}
Beyond that, it's just an interesting side-effect of the language and the way that structures and value types are implemented that you will most likely never need to know about.
Having this assignable allows for 'advanced' corner cases with structs. One example I found was a swap method:
struct Foo
{
void Swap(ref Foo other)
{
Foo temp = this;
this = other;
other = temp;
}
}
I would strongly argue against this use since it violates the default 'desired' nature of a struct which is immutability. The reason for having this option around is arguably unclear.
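For reference, this is how the Swap method above behaves when run; the Value field and the Program class are added purely for illustration.

```csharp
using System;

struct Foo
{
    public int Value;   // hypothetical field so the swap is observable

    public void Swap(ref Foo other)
    {
        Foo temp = this;   // copy of the current instance
        this = other;      // overwrite this instance with the other value
        other = temp;      // write the saved copy into the caller's variable
    }
}

static class Program
{
    static void Main()
    {
        var a = new Foo { Value = 1 };
        var b = new Foo { Value = 2 };
        a.Swap(ref b);

        Console.WriteLine(a.Value); // 2
        Console.WriteLine(b.Value); // 1
    }
}
```

The ref on the parameter matters: without it, other would be a copy and the caller's b would keep its old value.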
Now, when it comes to structs themselves: they differ from classes in a few ways:
They can live on the stack rather than the managed heap.
They can be marshaled back to unmanaged code.
They cannot be assigned null.
For a complete overview, see: http://www.jaggersoft.com/pubs/StructsVsClasses.htm
Relevant to your question is whether your struct lives on the stack or the heap. This is determined by where the struct is allocated. If the struct is a member of a class, it will be allocated on the heap. Otherwise, if a struct is allocated directly (for example as a local variable), it will be allocated on the stack. (Actually this is only part of the picture. The whole thing gets pretty complex once you start talking about closures, introduced in C# 2.0, but for now it's sufficient to answer your question.)
An array in .NET is by default allocated on the heap (this does not hold when using unsafe code and the stackalloc keyword). Going back to the explanation above, that means the array's contents are allocated on the heap even when the struct holding the reference to it lives on the stack. In fact, an easy way of proving this is to allocate an array of 1 MB in size and observe that no StackOverflowException is thrown.
The lifetime of an instance on the stack is determined by its scope. This is different from an instance on the managed heap, whose lifetime is determined by the garbage collector (and by whether there are still references to that instance). You can be sure that anything on the stack lives as long as it's within scope. Allocating an instance on the stack and calling a method won't deallocate that instance until it goes out of scope (by default, when the method in which the instance was declared ends).
A struct can't have managed references pointing to it (pointers are possible in unsafe code). When working with structs on the stack in C#, you basically have a label for an instance rather than a reference. Assigning one struct to another simply copies the underlying data. You can think of references as structs, too. Naively put, a reference is nothing more than a struct containing a pointer to a certain part of memory. When assigning one reference to another, the pointer data gets copied.
// declare 2 references to instances on the managed heap
var c1 = new MyClass();
var c2 = new MyClass();
// declare 2 labels to instances on the stack
var s1 = new MyStruct();
var s2 = new MyStruct();
c1 = c2; // copies the reference data which is the pointer internally, c1 and c2 both point to the same instance
s1 = s2; // copies the data, which is the struct itself; s1 and s2 each hold their own instance with the same data
You can take advantage of this and mutate an immutable structure
public struct ImmutableData
{
private readonly int data;
private readonly string name;
public ImmutableData(int data, string name)
{
this.data = data;
this.name = name;
}
public int Data { get => data; }
public string Name { get => name; }
public void SetName(string newName)
{
// this won't work, because the field is readonly
// this.name = newName;
// but this will
this = new ImmutableData(this.data, newName);
}
public override string ToString() => $"Data={data}, Name={name}";
}
class Program
{
static void Main(string[] args)
{
var X = new ImmutableData(100, "Jane");
X.SetName("Anne");
Debug.WriteLine(X);
// "Data=100, Name=Anne"
}
}
This is advantageous because you can implement IXmlSerializable and keep the robustness of immutable structures while still allowing serialization (which happens one property at a time).
Besides GetSchema, which can simply return null, just two methods are needed in the above example to achieve this:
public void ReadXml(XmlReader reader)
{
var data = int.Parse(reader.GetAttribute("Data"));
var name = reader.GetAttribute("Name");
// ReadXml is expected to consume the whole element before returning
reader.Skip();
this = new ImmutableData(data, name);
}
public void WriteXml(XmlWriter writer)
{
writer.WriteAttributeString("Data", data.ToString());
writer.WriteAttributeString("Name", name);
}
which creates the following XML file
<?xml version="1.0" encoding="utf-8"?>
<ImmutableData Data="100" Name="Anne" />
and can be read with
var xs = new XmlSerializer(typeof(ImmutableData));
var fs = File.OpenText("Store.xml");
var Y = (ImmutableData)xs.Deserialize(fs);
fs.Close();
I came across this when I was looking up how System.Guid was implemented, because I had a similar scenario.
Basically, it does this (simplified):
struct Guid
{
Guid(string value)
{
this = Parse(value);
}
}
Which I think is a pretty neat solution.

Why is String a value type although it is a class, not a struct?

Take the following example:
string me = "Ibraheem";
string copy = me;
me = "Empty";
Console.WriteLine(me);
Console.WriteLine(copy);
The output is:
Empty
Ibraheem
Since it is a class type (i.e. not a struct), shouldn't copy also contain Empty, because the = operator in C# assigns a reference to the object rather than the object itself (as in C++)?
While the accepted answer addresses this (as do some others), I wanted to give an answer dedicated to what it seems like you're actually asking, which is about the semantics of variable assignment.
Variables in C# are simply pieces of memory that are set aside to hold a single value. It's important to note that there's no such thing as a "value variable" and a "reference variable", because variables only hold values.
The distinction between "value" and "reference" comes with the type. A Value Type (VT) means that the entire piece of data is stored within the variable.
If I have an integer variable named abc that holds the value 100, then that means that I have a four-byte block of memory within my application that stores the literal value 100 inside it. This is because int is a value type, and thus all of the data is stored within the variable.
On the other hand, if I have a string variable named foo that holds the value "Adam", then there are two actual memory locations involved. The first is the piece of memory that stores the actual characters "Adam", as well as other information about my string (its length, etc.). A reference to this location is then stored within my variable. References are very similar to pointers in C/C++; while they are not the same, the analogy is sufficient for this explanation.
So, to sum it up, the value for a reference type is a reference to another location in memory, where the value for a value type is the data itself.
When you assign something to a variable, all you're changing is that variable's value. If I have this:
string str1 = "foo";
string str2 = str1;
Then I have two string variables that hold the same value (in this case, they each hold a reference to the same string, "foo"). If I then do this:
str1 = "bar";
Then I have changed the value of str1 to a reference to the string "bar". This doesn't change str2 at all, since its value is still a reference to the string "foo".
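The two steps above can be checked directly. This sketch uses object.ReferenceEquals to confirm that both variables hold the same reference after the first assignment; the class and method names are only illustrative.

```csharp
using System;

public static class AssignmentDemo
{
    public static bool SameReferenceAfterCopy()
    {
        string str1 = "foo";
        string str2 = str1;                        // copies the reference, not the characters
        return object.ReferenceEquals(str1, str2);
    }

    public static string AfterReassignment()
    {
        string str1 = "foo";
        string str2 = str1;
        str1 = "bar";                              // str1 now holds a different reference; str2 is untouched
        return str1 + "/" + str2;
    }

    public static void Main()
    {
        Console.WriteLine(SameReferenceAfterCopy()); // prints "True"
        Console.WriteLine(AfterReassignment());      // prints "bar/foo"
    }
}
```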
System.String is not a value type. It exhibits some behaviors that are similar to value types, but the behavior you have come across is not one of them. Consider the following code.
class Foo
{
public string SomeProperty { get; private set; }
public Foo(string bar) { SomeProperty = bar; }
}
Foo someOtherFoo = new Foo("B");
Foo foo = someOtherFoo;
someOtherFoo = new Foo("C");
If you checked the output of foo.SomeProperty, do you expect it to be the same as someOtherFoo.SomeProperty? If so, you have a flawed understanding of the language.
In your example, you have assigned a string a value. That's it. It has nothing to do with value types, reference types, classes or structs. It's simple assignment, and it's true whether you're talking about strings, longs, or Foos. Your variables temporarily contained the same value (a reference to the string "Ibraheem"), but then you reassigned one of them. Those variables were not inextricably linked for all time, they just held something temporarily in common.
It isn't a value type. When you use a string literal, it's actually a reference to a string object that was stored when the code was compiled. So when you assign a string, you are basically changing the pointer, like in C++.
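One way to see that a literal is a reference to a stored string object: the C# specification says that equivalent string literals in the same assembly refer to the same instance. A small sketch, with illustrative names:

```csharp
using System;

public static class LiteralDemo
{
    public static bool LiteralsShareInstance()
    {
        string a = "Ibraheem";
        string b = "Ibraheem"; // same literal text, written a second time
        // Both variables hold a reference to one stored string object.
        return object.ReferenceEquals(a, b);
    }

    public static void Main()
    {
        Console.WriteLine(LiteralsShareInstance()); // prints "True"
    }
}
```

This applies to literals only; strings built at run time are separate objects unless they are explicitly interned.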
Strings behave the same as any other class. Consider:
class Test {
public int SomeValue { get; set; }
public Test(int someValue) { this.SomeValue = someValue; }
}
Test x = new Test(42);
Test y = x;
x = new Test(23);
Console.WriteLine(x.SomeValue + " " + y.SomeValue);
Output:
23 42
– exactly the same behaviour as in your string example.
What your example shows is the classic behavior of a reference type, which string is.
string copy = me; means that the copy reference will point to the same memory location that me points to.
Later, me can point to another memory location, but that won't affect copy.
Your code would do the same if you used value types as well. Consider using integers:
int me = 1;
int copy = me;
me = 2;
Console.WriteLine(me);
Console.WriteLine(copy);
This will print out the following:
2
1
While the other answers said exactly what the answer to your question was, to get a better fundamental understanding of why, you will want to read up on heap and stack memory allocation and on when data is removed from memory by the garbage collector.
Here is a good page that describes the stack and heap memory and the garbage collector. At the bottom of the article there are links to the other parts of the explanation:
http://www.c-sharpcorner.com/UploadFile/rmcochran/csharp_memory01122006130034PM/csharp_memory.aspx?ArticleID=9adb0e3c-b3f6-40b5-98b5-413b6d348b91
Hopefully this gives you a better understanding of why.
Answering the original question:
Strings in C# are a reference type with value-type semantics.
They are stored on the heap because storing them on the stack could be unsafe given the limited size of the stack.
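A short sketch of what "value-type semantics" means in practice: the == operator on strings compares contents, even when the two variables refer to different objects on the heap. The names here are illustrative.

```csharp
using System;

public static class EqualityDemo
{
    public static (bool, bool) Compare()
    {
        string a = "aaa";
        string b = new string('a', 3); // built at run time, so it is a distinct object

        bool sameContents = a == b;                      // string == compares characters
        bool sameObject = object.ReferenceEquals(a, b);  // compares the references themselves

        return (sameContents, sameObject);
    }

    public static void Main()
    {
        Console.WriteLine(Compare()); // prints "(True, False)"
    }
}
```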
