I am new to C#, from a C++ background. In C++ you can do this:
class MyClass{
....
};
int main()
{
MyClass object; // this will create object in memory
MyClass* object = new MyClass(); // this does same thing
}
Whereas, in C#:
class Program
{
static void Main(string[] args)
{
Car x;
x.i = 2;
x.j = 3;
Console.WriteLine(x.i);
Console.ReadLine();
}
}
class Car
{
public int i;
public int j;
}
you can't do this. I wonder why Car x won't do its work.
There are a lot of misconceptions here, both in the question itself and in the several answers.
Let me begin by examining the premise of the question. The question is "why do we need the new keyword in C#?" The motivation for the question is this fragment of C++:
MyClass object; // this will create object in memory
MyClass* object = new MyClass(); // this does same thing
I criticize this question on two grounds.
First, these do not do the same thing in C++, so the question is based on a faulty understanding of the C++ language. It is very important to understand the difference between these two things in C++, so if you do not understand very clearly what the difference is, find a mentor who can teach you how to know what the difference is, and when to use each.
Second, the question presupposes -- incorrectly -- that those two syntaxes do the same thing in C++, and then, oddly, asks "why do we need new in C#?" Surely the right question to ask given this -- again, false -- presupposition is "why do we need new in C++?" If those two syntaxes do the same thing -- which they do not -- then why have two syntaxes in the first place?
So the question is both based on a false premise, and the question about C# does not actually follow from the -- misunderstood -- design of C++.
This is a mess. Let's throw out this question and ask some better questions. And let's ask the question about C# qua C#, and not in the context of the design decisions of C++.
What does the new X operator do in C#, where X is a class or struct type? (Let's ignore delegates and arrays for the purposes of this discussion.)
The new operator:
Causes a new instance of the given type to be allocated; new instances have all their fields initialized to default values.
Causes a constructor of the given type to be executed.
Produces a reference to the allocated object, if the object is a reference type, or the value itself if the object is a value type.
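As a rough illustration of those three steps (the Point class and its members are made up for this sketch and do not come from the question), the constructor below can observe that its fields already hold their default values by the time it starts running:

```csharp
using System;

// Hypothetical class used only for illustration.
class Point
{
    public int X;            // zeroed when the instance is allocated
    public string Label;     // null when the instance is allocated
    public bool SawDefaults;

    public Point(int x)
    {
        // Step 2: the constructor runs on storage that step 1 already defaulted.
        SawDefaults = (X == 0 && Label == null);
        X = x;
    }
}

static class NewDemo
{
    // Step 3: the new expression produces a reference to the object.
    public static Point Create() => new Point(5);

    static void Main()
    {
        Point p = Create();
        Console.WriteLine(p.SawDefaults); // True
        Console.WriteLine(p.X);           // 5
    }
}
```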
All right, I can already hear the objections from C# programmers out there, so let's dismiss them.
Objection: no new storage is allocated if the type is a value type, I hear you say. Well, the C# specification disagrees with you. When you say
S s = new S(123);
for some struct type S, the spec says that new temporary storage is allocated on the short-term pool, initialized to its default values, the constructor runs with this set to refer to the temp storage, and then the resulting object is copied to s. However, the compiler is permitted to use a copy-elision optimization provided that it can prove that it is impossible for the optimization to become observed in a safe program. (Exercise: work out under what circumstances a copy elision cannot be performed; give an example of a program that would have different behaviours if elision was or was not used.)
Objection: a valid instance of a value type can be produced using default(S); no constructor is called, I hear you say. That's correct. I didn't say that new is the only way to create an instance of a value type.
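In fact, for a value type that does not declare its own parameterless constructor (something that only became possible in C# 10), new S() and default(S) are the same thing.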
Objection: Is a constructor really executed for situations like new S(), if not present in the source code in C# 6, I hear you say. This is an "if a tree falls in the forest and no one hears it, does it make a sound?" question. Is there a difference between a call to a constructor that does nothing, and no call at all? This is not an interesting question. The compiler is free to elide calls that it knows do nothing.
Suppose we have a variable of value type. Must we initialize the variable with an instance produced by new?
No. Variables which are automatically initialized, such as fields and array elements, will be initialized to the default value -- that is, the value of the struct where all the fields are themselves their default values.
Formal parameters will be initialized with the argument, obviously.
Local variables of value type are required to be definitely assigned with something before the fields are read, but it need not be a new expression.
So effectively, variables of value type are automatically initialized with the equivalent of default(S), unless they are locals?
Yes.
Why not do the same for locals?
Use of an uninitialized local is strongly associated with buggy code. The C# language disallows this because doing so finds bugs.
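A small sketch of the value-type rules just described, assuming a made-up Pair struct and Holder class (neither is from the question): fields and array elements start out as default(Pair) on their own, while a local has to be given some value first, and that value need not come from a new expression.

```csharp
using System;

// Hypothetical types used only for illustration.
struct Pair
{
    public int A;
    public int B;
}

class Holder
{
    public Pair Field;   // a field: automatically initialized to default(Pair)
}

static class ValueInitDemo
{
    public static int FieldDefault() => new Holder().Field.A;

    public static int ArrayElementDefault()
    {
        Pair[] pairs = new Pair[3];   // every element starts as default(Pair)
        return pairs[1].B;
    }

    public static int LocalFromDefault()
    {
        Pair local;              // a local: unassigned until we give it a value
        local = default(Pair);   // any value will do; no new expression needed
        local.B = 7;
        return local.A + local.B;
    }

    static void Main()
    {
        Console.WriteLine(FieldDefault());        // 0
        Console.WriteLine(ArrayElementDefault()); // 0
        Console.WriteLine(LocalFromDefault());    // 7
    }
}
```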
Suppose we have a variable of reference type. Must we initialize the variable with an instance produced by new?
No. Automatic-initialization variables will be initialized with null. Locals can be initialized with any reference, including null, and must be definitely assigned before being read.
So effectively, variables of reference type are automatically initialized with null, unless they are locals?
Yes.
Why not do the same for locals?
Same reason. A likely bug.
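The same rules for reference types, sketched with a hypothetical Node class: fields and array elements are null without any help, and a local only needs to be assigned some reference (possibly an existing one) before it is read.

```csharp
using System;

// Hypothetical class used only for illustration.
class Node
{
    public Node Next;   // a field: automatically null
}

static class RefInitDemo
{
    public static bool FieldIsNull() => new Node().Next == null;

    public static bool ElementIsNull()
    {
        Node[] nodes = new Node[2];   // every element starts as null
        return nodes[0] == null;
    }

    public static Node Existing(Node other)
    {
        Node local;       // unassigned; reading it here would not compile
        local = other;    // any reference (or null) will do, no new required
        return local;
    }

    static void Main()
    {
        Node n = new Node();
        Console.WriteLine(FieldIsNull());                      // True
        Console.WriteLine(ElementIsNull());                    // True
        Console.WriteLine(ReferenceEquals(Existing(n), n));    // True
    }
}
```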
Why not automatically initialize variables of reference type by calling the default constructor automatically? That is, why not make R r; the same as R r = new R();?
Well, first of all, many types do not have a default constructor, or for that matter, any accessible constructor at all. Second, it seems weird to have one rule for an uninitialized local or field, another rule for a formal, and yet another rule for an array element. Third, the existing rule is very simple: a variable must be initialized to a value; that value can be anything you like; why is the assumption that a new instance is desired warranted? It would be bizarre if this
R r;
if (x) r = M(); else r = N();
caused a constructor to run to initialize r.
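To make that concrete, here is a sketch that counts constructor runs. The counter and the bodies of M and N are assumptions added for the demonstration; the point is only that the declaration R r; constructs nothing by itself.

```csharp
using System;

class R
{
    public static int Constructed;   // how many times a constructor has run
    public R() { Constructed++; }
}

static class DeclarationDemo
{
    // Hypothetical factories standing in for M() and N() in the text.
    static R M() => new R();
    static R N() => new R();

    public static int ConstructionsFor(bool x)
    {
        R.Constructed = 0;
        R r;                            // nothing is constructed here
        if (x) r = M(); else r = N();   // exactly one of M or N constructs
        return r != null ? R.Constructed : -1;
    }

    static void Main()
    {
        Console.WriteLine(ConstructionsFor(true));   // 1
        Console.WriteLine(ConstructionsFor(false));  // 1
    }
}
```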
Leaving aside the semantics of the new operator, why is it necessary syntactically to have such an operator?
It's not. There are any number of alternative syntaxes that could be grammatical. The most obvious would be to simply eliminate the new entirely. If we have a class C with a constructor C(int) then we could simply say C(123) instead of new C(123). Or we could use a syntax like C.construct(123) or some such thing. There are any number of ways to do this without the new operator.
So why have it?
First, C# was designed to be immediately familiar to users of C++, Java, JavaScript, and other languages that use new to indicate new storage is being initialized for an object.
Second, the right level of syntactic redundancy is highly desirable. Object creation is special; we wish to call out when it happens with its own operator.
In C# you can do a similar thing:
// please notice "struct"
struct MyStruct {
....
}
MyStruct sample1; // this will create object on stack
MyStruct sample2 = new MyStruct(); // this does the same thing
Recall that primitives like int, double, and bool are also structs, so even though it's conventional to write
int i;
we may also write
int i = new int();
Unlike C++, C# doesn't use pointers to instances (in safe code); however, C# has both class and struct declarations:
class: you have a reference to an instance, memory is allocated on the heap, and new is needed to create the instance; similar to MyClass* in C++
struct: you have a value, memory is (usually) allocated on the stack, and new is optional; similar to MyClass in C++
In your particular case you can just turn Car into a struct
struct Car
{
public int i;
public int j;
}
and so the fragment
Car x; // since Car is struct, new is optional now
x.i = 2;
x.j = 3;
will be correct
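Put together as a complete program, this could look like the sketch below. The Build helper is an assumption added so the fragment can be exercised; everything else follows the question's code.

```csharp
using System;

struct Car
{
    public int i;
    public int j;
}

static class Program
{
    // Hypothetical helper wrapping the fragment from the question.
    public static Car Build()
    {
        Car x;      // no new needed for a struct local
        x.i = 2;
        x.j = 3;    // every field is assigned, so x is now definitely assigned
        return x;
    }

    static void Main()
    {
        Car x = Build();
        Console.WriteLine(x.i); // 2
    }
}
```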
In C#, class type objects are always allocated on the heap, i.e. variables of such types are always references ("pointers"). Just declaring a variable of such a type does not cause the allocation of an object. Allocating a class object on the stack like it's common to do in C++ isn't (in general) an option in C#.
Local variables of any type that have not been assigned to are considered uninitialized, and they cannot be read until they have been assigned to. This is a design choice (another way would have been to assign default(T) to every variable at declaration time) which seems like a good idea because it should protect you from some programming errors.
It's similar to how in C++ it wouldn't make sense to say SomeClass *object; and never assign anything to it.
Because in C# all class type variables are pointers, allocating an empty object when the variable is declared would lead to inefficient code when you actually only want to assign a value to the variable later, for instance in situations like this:
// Needs to be declared here to be available outside of `try`
Foo f;
try { f = GetFoo(); }
catch (SomeException) { return null; }
f.Bar();
Or
Foo f;
if (bar)
f = GetFoo();
else
f = GetDifferentFoo();
Ignoring the stack vs heap side of things:
because C# made the bad decision to copy C++ when it could have just made the syntax
Car car = Car()
(or something similar). Having 'new' is superfluous.
When you use reference types, then in this statement
Car c = new Car();
two entities are created: a reference named c to an object of type Car on the stack, and the object of type Car itself on the heap.
If you just write
Car c;
then you create an uninitialized reference (provided that c is a local variable) that points nowhere.
In fact, it is equivalent to C++ code that uses pointers instead of references.
For example
Car *c = new Car();
or just
Car *c;
The difference between C++ and C# is that C++ can create instances of classes on the stack, like
Car c;
In C# this means creating a reference of type Car that as I said points nowhere.
From the Microsoft programming guide:
At run time, when you declare a variable of a reference type, the variable contains the value null until you explicitly create an instance of the object by using the new operator, or assign it an object that has been created elsewhere by using new
A class is a reference type. When an object of the class is created, the variable to which the object is assigned holds only a reference to that memory. When the object reference is assigned to a new variable, the new variable refers to the original object. Changes made through one variable are reflected in the other variable because they both refer to the same data.
A struct is a value type. When a struct is created, the variable to which the struct is assigned holds the struct's actual data. When the struct is assigned to a new variable, it is copied. The new variable and the original variable therefore contain two separate copies of the same data. Changes made to one copy do not affect the other copy.
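The two quoted paragraphs can be checked with a short sketch. PointClass and PointStruct are hypothetical names chosen for this example only.

```csharp
using System;

// Hypothetical types: the same shape, once as a class and once as a struct.
class PointClass { public int X; }
struct PointStruct { public int X; }

static class CopyDemo
{
    public static int ThroughClass()
    {
        var a = new PointClass();
        var b = a;     // copies the reference: one object, two variables
        b.X = 42;
        return a.X;    // the change is visible through a
    }

    public static int ThroughStruct()
    {
        var a = new PointStruct();
        var b = a;     // copies the data: two separate values
        b.X = 42;
        return a.X;    // a is unaffected
    }

    static void Main()
    {
        Console.WriteLine(ThroughClass());   // 42
        Console.WriteLine(ThroughStruct());  // 0
    }
}
```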
I think in your C# example you're effectively trying to assign values through a null pointer. Translated to C++, this would look like:
Car* x = nullptr;
x->i = 2;
x->j = 3;
This would compile, but it would typically crash at run time (dereferencing a null pointer is undefined behavior).
I found it difficult to come up with a descriptive enough title for this scenario so I'll let the code do most of the talking.
Consider covariance where you can substitute a derived type for a base class.
class Base
{
}
class Derived : Base
{
}
Passing in typeof(Base) to this method and setting that variable to the derived type is possible.
private void TryChangeType(Base instance)
{
var d = new Derived();
instance = d;
Console.WriteLine(instance.GetType().ToString());
}
However, when checking the type from the caller of the above function, the instance will still be of type Base
private void CallChangeType()
{
var b = new Base();
TryChangeType(b);
Console.WriteLine(b.GetType().ToString());
}
I would assume since objects are inherently reference by nature that the caller variable would now be of type Derived. The only way to get the caller to be type Derived is to pass a reference object by ref like so
private void CallChangeTypeByReference()
{
var b = new Base();
TryChangeTypeByReference(ref b);
Console.WriteLine(b.GetType().ToString());
}
private void TryChangeTypeByReference(ref Base instance)
{
var d = new Derived();
instance = d;
}
Further more, I feel like it's common knowledge that passing in an object to a method, editing props, and passing that object down the stack will keep the changes made down the stack. This makes sense as the object is a reference object.
What causes an object to permanently change type down the stack, only if it's passed in by reference?
You have a great many confused and false beliefs. Let's fix that.
Consider covariance where you can substitute a derived type for a base class.
That is not covariance. That is assignment compatibility. An Apple is assignment compatible with a variable of type Fruit because you can assign an Apple to such a variable. Again, that is not covariance. Covariance is the fact that a transformation on a type preserves the assignment compatibility relationship. A sequence of apples can be used somewhere that a sequence of fruit is needed because apples are a kind of fruit. That is covariance. The mapping "apple --> sequence of apples, fruit --> sequence of fruit" is a covariant mapping.
Moving on.
Passing in typeof(Base) to this method and setting that variable to the derived type is possible.
You are confusing types with instances. You do not pass typeof(Base) to this method; you pass a reference to an instance of Base to this method. typeof(Base) is of type System.Type.
As you correctly note, formal parameters are variables. A formal parameter is a new variable, and it is initialized to the actual parameter aka argument.
However, when checking the type from the caller of the above function, the instance will still be of type Base
Correct. The argument is of type Base. You copy that to a variable, and then you reassign the variable. This is no different than saying:
Base x = new Base();
Base y = x;
y = new Derived();
And now x is still Base and y is Derived. You assigned the same variable twice; the second assignment wins. This is no different than if you said a = 1; b = a; b = 2; -- you would not expect a to be 2 afterwards just because you said b = a in the past.
I would assume since objects are inherently reference by nature that the caller variable would now be of type Derived.
That assumption is wrong. Again, you have made two assignments to the same variable, and you have two variables, one in the caller, and one in the callee. Variables contain values; references to objects are values.
The only way to get the caller to be type Derived is to pass a reference object by ref like so
Now we're getting to the crux of the problem.
The correct way to think about this is that ref makes an alias to a variable. A normal formal parameter is a new variable. A ref formal parameter makes the variable in the formal parameter an alias to the variable at the call site. So now you have one variable but it has two names, because the name of the formal parameter is an alias for the variable at the call. This is the same as:
Base x = new Base();
ref Base y = ref x; // x and y are now two names for the same variable
y = new Derived(); // this assigns both x and y because there is only one variable, with two names
Further more, I feel like it's common knowledge that passing in an object to a method, editing props, and passing that object down the stack will keep the changes made down the stack. This makes sense as the object is a reference object.
Correct.
The mistake you are making here is very common. It was a bad idea for the C# design team to name the variable aliasing feature "ref" because this causes confusion. A reference to a variable makes an alias; it gives another name to a variable. A reference to an object is a token that represents a specific object with a specific identity. When you mix the two it gets confusing.
The normal thing to do is to not pass variables by ref particularly if they contain references.
What causes an object to permanently change type down the stack, only if it's passed in by reference?
Now we have the most fundamental confusion. You have confused objects with variables. An object never changes its type, ever! An apple is an object, and an apple is now and forever an apple. An apple never becomes any other kind of fruit.
Stop thinking that variables are objects, right now. Your life will get so much better. Internalize these rules:
variables are storage locations that store values
references to objects are values
objects have a type that never changes
ref gives a new name to an existing variable
assigning to a variable changes its value
Now if we ask your question again using correct terminology, the confusion disappears immediately:
What causes the value of a variable to change its type down the stack, only if it's passed in by ref?
The answer is now very clear:
A variable passed by ref is an alias to another variable, so changing the value of the parameter is the same as changing the value of the variable at the call site
Assigning an object reference to a variable changes the value of that variable
An object has a particular type
If we don't pass by ref but instead pass normally:
A value passed normally is copied to a new variable, the formal parameter
We now have two variables with no connection; changing one of them does not change the other.
If that's still not clear, start drawing boxes, circles and arrows on a whiteboard, where objects are circles, variables are boxes, and object references are arrows from variables to objects. Making an alias via ref gives a new name to an existing box; calling without ref makes a second box and copies the arrow. It'll all make sense then.
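Here is a compact, runnable version of the two cases. The method names Reassign and ReassignByRef are placeholders for the question's TryChangeType and TryChangeTypeByReference; this is a sketch of the rules above rather than a copy of the original code.

```csharp
using System;

class Base { }
class Derived : Base { }

static class AliasDemo
{
    // instance is a new variable holding a copy of the caller's reference.
    static void Reassign(Base instance) { instance = new Derived(); }

    // instance is another name for the caller's variable.
    static void ReassignByRef(ref Base instance) { instance = new Derived(); }

    public static string ByValue()
    {
        Base b = new Base();
        Reassign(b);               // only the callee's variable was reassigned
        return b.GetType().Name;
    }

    public static string ByRef()
    {
        Base b = new Base();
        ReassignByRef(ref b);      // b itself was reassigned through the alias
        return b.GetType().Name;
    }

    static void Main()
    {
        Console.WriteLine(ByValue()); // Base
        Console.WriteLine(ByRef());   // Derived
    }
}
```

In neither case does any object change its type; the only thing that differs is which object the caller's variable refers to afterwards.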
This is not an issue with inheritance and polymorphism, what you're seeing is the difference between pass-by-value and pass-by-reference.
private void TryChangeType(Base instance)
The preceding method's instance parameter will be a copy of the caller's Base reference. You can change the object that is referenced and those changes will be visible to the caller, because the caller and the callee both reference the same object. But any changes to the reference itself (such as pointing it to a new object) will not affect the caller's reference. This is why it works as expected when you pass by reference.
When you call TryChangeType() you are passing a copy of the reference held in "b" into "instance". Any changes to members of "instance" are made in the same memory still referenced by "b" in your calling method. However, the statement "instance = d" reassigns "instance" itself, so "b" and "instance" no longer point to the same memory. When you return to CallChangeType, "b" still references the original object and hence the original type.
TryChangeTypeByReference passes a reference to where "b"'s pointer value is actually stored. Reassigning "instance" now changes the address that "b" is actually pointing to.
We know that classes are reference types, so in general when we pass an object we are passing a reference. But there is a difference between passing just b and passing ref b, which can be understood as follows:
In the first case the reference is passed by value, which means a separate pointer to the same memory location is created internally. When the Derived object is assigned to that pointer, it starts pointing to another object in memory. When the method returns, only the original pointer remains, which still refers to the Base instance, and the newly created Derived object becomes eligible for garbage collection.
However, when the object is passed with ref, a reference to the reference is passed, which is like a pointer to a pointer (a double pointer in C or C++). Assigning through it changes what the caller's variable refers to, and thus you see the difference.
For the first case to show the same result, the value has to be returned from the method and assigned back, so that the caller's variable starts referring to the new Derived object.
Following is a modification of your program that gets the expected result in case 1:
private Base TryChangeType(Base instance)
{
var d = new Derived();
instance = d;
Console.WriteLine(instance.GetType().ToString());
return instance;
}
private void CallChangeType()
{
var b = new Base();
b = TryChangeType(b);
Console.WriteLine(b.GetType().ToString());
}
Following is the pictorial reference of both the cases:
When you do not pass by reference, a copy of the reference to the base class object is passed into the function, and only this copy is changed inside the TryChangeType function. When you print the type of the instance in the caller it is still of type "Base", because only the copy was changed to refer to a "Derived" object.
When you pass by reference, the caller's variable itself is passed to the function, so any reassignment made inside the function is visible to the caller.
What happens behind the scenes when you make a struct without using the new keyword?
Let's say we have this struct:
struct Person
{
public int Age;
public string Name;
}
And in the Main() method I decide to make an instance of it without the new keyword, like this:
Person p;
Now if I try to access p.Age I will get a compile-time error saying "Use of possibly unassigned field 'Age'". However, if I make an instance of the struct like this:
Person p = new Person();
and then try to access p.Age, I will get the value 0. Now what exactly happens behind the scenes? Does the runtime initialize these variables for me, or does the compiler place code that initializes them in the IL?
Edit:
Can anybody also explain this behavior:
Code:
struct Person
{
public string Name { get; set; }
}
If I make an instance of the struct like this:
Person p;
and I initialize the name manually
p.Name = "SomeRandomName";
I won't be able to use it. The compiler gives an error "Use of an unassigned local variable p" but If I make instance of the struct with the default (parameterless) constructor there isn't such an error.
Members don't have the same rules as locals.
Locals must be explicitly initialised before use. Members are initialised by the runtime to their respective default values.
If you want some more relevant information:
In the internal implementation details (non-contractual!), up to the current MS .NET runtime for Windows, objects are allocated in pre-zeroed memory on the heap (when they're on the heap at all, of course). All the default values are "physical" zeroes, so all you need is e.g. "200 consecutive bytes with value 0". In many cases, this is as simple as asking the OS for a pre-zeroed memory page. It's a performance compromise to keep memory safety: you can easily allocate an array of 2000 Person instances by just doing new Person[2000], which just requests 2000 * size of Person bytes with value zero; extremely cheap, while still keeping safe default values. No need to initialise 2000 Person instances, and 2000 int instances and 2000 string instances; they're all zero by default. At the same time, there's no chance you'd get a random value for the string reference that would point to some random place in memory (a very common error in unmanaged code).
The main reason for requiring explicit initialisation of locals is that it prevents stupid programming errors. You should never access an uninitialised value in the first place, and if you need a default value, you should be explicit about it - the default value then gets a meaning, and meanings should be explicit. You'll find that cases where you could use an uninitialised local meaningfully in the first place are pretty rare - you usually either declare the local right where it gets a value, or you need all possible branches to update a pre-declared local anyway. Both make it easier to understand code and avoid silly mistakes.
The struct documentation says:
A struct type is a value type that is typically used to encapsulate small groups of related variables, such as the coordinates of a rectangle or the characteristics of an item in an inventory.
Normally, when you declare fields of value types such as:
int i; // as a field, defaults to 0
bool b; // as a field, defaults to false
Or of a reference type such as:
string s; // as a field, defaults to null
they get their default values automatically. The struct you have created is a value type too, but as a local variable it isn't initialized by default, and you can't access its members until it is. Therefore, you can't declare it as:
Person p;
and then use it directly.
Hence the error you got:
"Use of possibly unassigned field 'Age'"
Because p is still not initialized.
This also explains your second part of the question:
I won't be able to use it. The compiler gives an error "Use of an unassigned local variable p" but If I make instance of the struct with the default (parameterless) constructor there isn't such an error.
You couldn't directly assign p.Name = "something" for the same reason: p is still not initialized.
You must create a new instance of the struct as
Person p = new Person(); // or Person p = default(Person);
Now, what happens when you create a new instance of your struct without giving values to the struct's properties? Each one of them will hold its default value, such as Age = 0 because it's an int.
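A minimal sketch of that, combining the question's two Person variants into a single struct for brevity (that merge, and the Make helper, are assumptions of this example):

```csharp
using System;

struct Person
{
    public int Age;
    public string Name { get; set; }
}

static class PersonDemo
{
    // Hypothetical helper so the result can be inspected.
    public static Person Make()
    {
        Person p = new Person();     // or: Person p = default(Person);
        p.Name = "SomeRandomName";   // fine now that p is initialized
        return p;
    }

    static void Main()
    {
        Person fresh = new Person();
        Console.WriteLine(fresh.Age);            // 0
        Console.WriteLine(fresh.Name == null);   // True
        Console.WriteLine(Make().Name);          // SomeRandomName
    }
}
```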
Every datatype in .NET has a default value. For all reference types it's null. For the special type string it is also null. For all value types it is something akin to zero. For bool it's false because this is the equivalent to zero.
You can observe the same behaviour when you write a class with member fields. After construction all these fields will have a default value even when none was assigned during construction.
The same is also true when you use a struct as a member. Since a struct cannot be null, it will also be initialized and all its members (again) use their default values.
The difference in compiler output is that the compiler cannot determine if you have initialized the member field through any means. But it can determine if you have set the method variable value before reading it. Technically this wouldn't be necessary but since it reduces programming errors (why would you read a variable you have not written?), the compiler error appears.
From my experience of C++, I know that objects declared as ClassName ObjectName; are stored on the stack, and objects created with ClassName* ObjectName = new ClassName; are stored on the heap.
In C#, I seem to be being told from everywhere that the new keyword must be used, i.e. you cannot initialize an object like ClassName ObjectName; i.e.
Product P;
P.someMethod();
Why is this?
In C#, class objects and any values in them are always stored on the heap. The new keyword allocates memory on the heap for the object and any values it has, and returns a reference to its location. Until this is done you cannot call the object's methods.
so in example:
Product P = new Product();
P is actually a reference to the allocated object. Multiple references can refer to the same object.
Product C = P;
In this case C is not a copy of the object; the assignment copies only the reference to the object.
Structs work differently from classes: they are value types (and are typically allocated on the stack), so the same operation as above will actually copy the struct into separate storage.
I'll answer my own question for the sake of pulling the info together for clarity.
A combination of mohits00691 and Jon Skeet's answers clears this up. Even though P is declared as a type of Product, it has no default value and is not instantiated until it is set with "= new Product".
This differs from C++, where Product P would instantiate an object of class Product.
As far as I know, code like:
Product p;
p.someFunction();
will fail at compile time with an "unassigned local variable" error. So you need to give a value to every variable, be it of reference type or value type, before using it in C#.
Main question is what are the implications of allowing the this keyword to be modified in regards to usefulness and memory; and why is this allowed in the C# language specifications?
The other questions/subparts can be answered or not if choose to do so. I thought answers to them would help clarify the answer to the main question.
I ran across this as an answer to What's the strangest corner case you've seen in C# or .NET?
public struct Teaser
{
public void Foo()
{
this = new Teaser();
}
}
I've been trying to wrap my head around why the C# language specifications would even allow this. Sub-part 1. Is there anything that would justify having this be modifiable? Is it ever useful?
One of the comments to that answer was
From CLR via C#: The reason they made this is because you
can call the parameterless constructor of a struct in another
constructor. If you only want to initialize one value of a struct and
want the other values to be zero/null (default), you can write public
Foo(int bar){this = new Foo(); specialVar = bar;}. This is not
efficient and not really justified (specialVar is assigned twice), but
just FYI. (That's the reason given in the book, I don't know why we
shouldn't just do public Foo(int bar) : this())
Sub-part 2. I'm not sure I follow that reasoning. Can someone clarify what he meant? Maybe a concrete example of how it would be used?
EDIT (Disregard stack or heap main point is in regards to memory release or garbage collection. Instead of the int[] you could replace that with 262144 public int fields)
Also, from my understanding structs are created on the stack as opposed to the heap. Suppose this struct were to have a 1 MB array field initialized like so:
public int[] Mb = new int[262144];
Sub-part 3. does this ever get removed from the stack when Foo is called? To me it seems since the struct never went out of scope it would not be removed from the stack. Don't have time tonight to create a test case but maybe I will for this one tomorrow.
In the below code
Teaser t1 = new Teaser();
Teaser tPlaceHolder = t1;
t1.Foo();
Sub-part 4. Are t1 and tPlaceHolder occupying the same or different address space?
Sorry to bring up a 3 year old post but this one has really got my head scratching.
FYI first question on stackoverflow so if I got something wrong with the question kindly post a comment and I will edit.
After 2 days I'll put a bounty of 50 on this question even if I have a winner chosen in my mind already as I think the answer will require a reasonable amount of work to explain the questions.
First of all, I think you should start by examining if you're even asking the right question. Perhaps we should be asking, "Why would C# not allow assignment to this in a struct?"
Assigning to the this keyword in a reference type is potentially dangerous: you are overwriting a reference to the object whose method you are running; you could even be doing so within the constructor that is initializing that reference. It's not clear what the behavior of that ought to be. To avoid having to figure that out, since it is not generally useful, it's not allowed by the spec (or compiler).
Assigning to the this keyword in a value type, however, is well defined. Assignment of value types is a copy operation. The value of each field is recursively copied over from right to left side of the assignment. This is a perfectly safe operation on a structure, even in a constructor, because the original copy of the structure is still present, you are just changing its data. It is exactly equivalent to manually setting each field in the struct. Why should the spec or compiler forbid an operation that is well-defined and safe?
This, by the way, answers one of your sub-questions. Value type assignment copies the struct's data, not a reference to the struct (fields of reference type are themselves copied as references). Given this code:
Teaser t1 = new Teaser();
Teaser tPlaceHolder = t1;
t1.Foo();
You have allocated two copies of your Teaser structure, and copied the values of the fields in the first into the fields in the second. This is the nature of value types: two values that have identical fields are identical, just like two int variables that both contain 10 are identical, regardless of where they are "in memory".
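To see that the two variables really are independent, here is a sketch in which Teaser is given a Value field. That field is an assumption added purely so there is something to observe; the struct in the question has none.

```csharp
using System;

public struct Teaser
{
    public int Value;   // hypothetical field, added only for observation

    public void Foo()
    {
        this = new Teaser();   // overwrites the fields of this copy only
    }
}

static class TeaserDemo
{
    public static int[] Run()
    {
        Teaser t1 = new Teaser();
        t1.Value = 7;
        Teaser tPlaceHolder = t1;   // an independent copy of t1's data
        t1.Foo();                   // resets t1, leaves tPlaceHolder alone
        return new[] { t1.Value, tPlaceHolder.Value };
    }

    static void Main()
    {
        int[] values = Run();
        Console.WriteLine(values[0]); // 0
        Console.WriteLine(values[1]); // 7
    }
}
```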
Also, this is important and worth repeating: be careful making assumptions about what goes on "the stack" vs "the heap". Value types end up on the heap all the time, depending on the context in which they are used. Short-lived (locally scoped) structs that are not closed over or otherwise lifted out of their scope are quite likely to get allocated onto the stack. But that is an unimportant implementation detail that you should neither care about nor rely on. The key is that they are value types, and behave as such.
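To illustrate how ordinary it is for value types to live on the heap, here is a small sketch with three common cases: a struct field of a class, elements of an array, and a boxed value. The types Point and Holder and the class HeapDemo are made up for this example.

```csharp
using System;

public struct Point
{
    public int X;
    public int Y;
}

// A class instance lives on the managed heap, and so do its struct fields.
public class Holder
{
    public Point Location;
}

public static class HeapDemo
{
    public static bool Run()
    {
        var holder = new Holder();          // Location is stored inside the heap object
        holder.Location.X = 5;

        Point[] points = new Point[3];      // elements are stored inside the heap-allocated array
        points[0].X = 7;

        object boxed = new Point { X = 9 }; // boxing copies the value into a heap object
        Point unboxed = (Point)boxed;       // unboxing copies it back out

        return holder.Location.X == 5 && points[0].X == 7 && unboxed.X == 9;
    }
}
```

In all three cases the struct still behaves as a value: assigning it elsewhere copies its fields.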
As far as how useful assignment to this really is: not very. Specific use cases have been mentioned already. You can use it to initialize most of a structure with default values while explicitly setting only a few fields. Since you are required to set all fields before your constructor returns, this can save a lot of redundant code:
public struct Foo
{
// Fields etc here.
public Foo(int a)
{
this = new Foo();
this.a = a;
}
}
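For a version of that constructor pattern that compiles on its own, here is a sketch with made-up names (Options, Retries, TimeoutSeconds, Name are all assumptions for illustration):

```csharp
using System;

public struct Options
{
    public int Retries;
    public int TimeoutSeconds;
    public string Name;

    public Options(int retries)
    {
        this = new Options(); // every field now holds its default value
        Retries = retries;    // override only the field of interest
    }
}

public static class OptionsDemo
{
    public static bool Run()
    {
        var options = new Options(3);
        return options.Retries == 3
            && options.TimeoutSeconds == 0
            && options.Name == null;
    }
}
```

Without the assignment to this, the constructor would have to assign TimeoutSeconds and Name explicitly (at least in the language versions that require all fields to be definitely assigned).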
It can also be used to perform a quick swap operation:
public void SwapValues(ref MyStruct other) // ref is required, otherwise only a local copy of other is changed
{
var temp = other;
other = this;
this = temp;
}
Beyond that, it's just an interesting side-effect of the language and the way that structures and value types are implemented that you will most likely never need to know about.
Having this assignable allows for 'advanced' corner cases with structs. One example I found was a swap method:
struct Foo
{
void Swap(ref Foo other)
{
Foo temp = this;
this = other;
other = temp;
}
}
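A usage sketch for such a swap method, with a hypothetical struct named Pair so that the result can be observed:

```csharp
using System;

public struct Pair
{
    public int Value;

    public Pair(int value)
    {
        Value = value;
    }

    // Exchanges the contents of this instance with 'other'.
    public void Swap(ref Pair other)
    {
        Pair temp = this;
        this = other;
        other = temp;
    }
}

public static class SwapDemo
{
    public static (int A, int B) Run()
    {
        var a = new Pair(1);
        var b = new Pair(2);
        a.Swap(ref b);             // both a and b are modified in place
        return (a.Value, b.Value); // (2, 1)
    }
}
```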
I would strongly argue against this use since it violates the default 'desired' nature of a struct which is immutability. The reason for having this option around is arguably unclear.
Now when it comes to structs themselves: they differ from classes in a few ways:
They can live on the stack rather than the managed heap.
They can be marshaled back to unmanaged code.
They cannot be assigned null.
For a complete overview, see: http://www.jaggersoft.com/pubs/StructsVsClasses.htm
Relevant to your question is whether your struct lives on the stack or the heap. This is determined by where the struct is allocated. If the struct is a member of a class, it will be allocated on the heap. Otherwise, if a struct is allocated directly (for example as a local variable), it will be allocated on the stack. (Actually this is only part of the picture. The whole thing gets pretty complex once you start to talk about closures, introduced in C# 2.0, but for now it's sufficient to answer your question.)
An array in .NET is by default allocated on the heap (this does not hold when using unsafe code and the stackalloc keyword). Going back to the explanation above, that indicates that the struct instances in the array also get allocated on the heap. In fact, an easy way of proving this is to allocate an array 1 MB in size and observe that no StackOverflowException is thrown.
The lifetime of an instance on the stack is determined by its scope. This is different from an instance on the managed heap, whose lifetime is determined by the garbage collector (and by whether there are still references to that instance). You can be sure that anything on the stack lives as long as it's within scope. Allocating an instance on the stack and calling a method won't deallocate that instance until it goes out of scope (by default, when the method in which that instance was declared ends).
A struct can't have managed references towards it (pointers are possible in unmanaged code). When working with structs on the stack in C#, you basically have a label for an instance rather than a reference to it. Assigning one struct to another simply copies the underlying data. You can think of references as structs: naively put, a reference is nothing more than a struct containing a pointer to a certain part of memory. When assigning one reference to another, the pointer data gets copied.
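A rough sketch of that experiment. The struct Sample and the class LargeArrayDemo are invented for illustration, and the array is deliberately much larger than a thread stack usually is (the default is commonly around 1 MB, but that depends on platform and configuration).

```csharp
using System;

// Two 8-byte fields, so each element occupies 16 bytes.
public struct Sample
{
    public long A;
    public long B;
}

public static class LargeArrayDemo
{
    public static long Run()
    {
        // Roughly 16 MB of struct data. This would not fit on a typical
        // thread stack, yet the allocation succeeds because the array
        // (and the structs inside it) live on the managed heap.
        var items = new Sample[1_000_000];
        items[items.Length - 1].A = 42;
        return items[items.Length - 1].A;
    }
}
```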
// declare 2 references to instances on the managed heap
var c1 = new MyClass();
var c2 = new MyClass();
// declare 2 labels to instances on the stack
var s1 = new MyStruct();
var s2 = new MyStruct();
c1 = c2; // copies the reference data which is the pointer internally, c1 and c2 both point to the same instance
s1 = s2; // copies the data which is the struct internally, s1 and s2 each hold their own instance with the same data
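To make the difference observable, here is a self-contained sketch. RefBox and ValBox are hypothetical stand-ins for MyClass and MyStruct, each given a single Value field.

```csharp
using System;

public class RefBox
{
    public int Value;
}

public struct ValBox
{
    public int Value;
}

public static class AssignmentDemo
{
    public static (int C1, int S1) Run()
    {
        var c1 = new RefBox { Value = 1 };
        var c2 = new RefBox { Value = 2 };
        var s1 = new ValBox { Value = 1 };
        var s2 = new ValBox { Value = 2 };

        c1 = c2; // c1 and c2 now refer to the same object
        s1 = s2; // s1 now holds its own copy of s2's data

        c2.Value = 20; // visible through c1 as well
        s2.Value = 20; // does not affect s1

        return (c1.Value, s1.Value); // (20, 2)
    }
}
```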
You can take advantage of this and mutate an immutable structure:
public struct ImmutableData
{
private readonly int data;
private readonly string name;
public ImmutableData(int data, string name)
{
this.data = data;
this.name = name;
}
public int Data { get => data; }
public string Name { get => name; }
public void SetName(string newName)
{
// this won't work, because name is readonly
// this.name = newName;
// but this will
this = new ImmutableData(this.data, newName);
}
public override string ToString() => $"Data={data}, Name={name}";
}
class Program
{
static void Main(string[] args)
{
var X = new ImmutableData(100, "Jane");
X.SetName("Anne");
Debug.WriteLine(X);
// "Data=100, Name=Anne"
}
}
This is advantageous as you can implement IXmlSerializable and maintain the robustness of immutable structures, while allowing serialization (that happens one property at a time).
Only two methods need real implementations in the above example to achieve this (the interface's third method, GetSchema, can simply return null):
public void ReadXml(XmlReader reader)
{
var data = int.Parse(reader.GetAttribute("Data"));
var name = reader.GetAttribute("Name");
this = new ImmutableData(data, name);
}
public void WriteXml(XmlWriter writer)
{
writer.WriteAttributeString("Data", data.ToString());
writer.WriteAttributeString("Name", name);
}
which creates the following xml file
<?xml version="1.0" encoding="utf-8"?>
<ImmutableData Data="100" Name="Anne" />
and can be read with
var xs = new XmlSerializer(typeof(ImmutableData));
var fs = File.OpenText("Store.xml");
var Y = (ImmutableData)xs.Deserialize(fs);
fs.Close();
I came across this when I was looking up how System.Guid was implemented, because I had a similar scenario.
Basically, it does this (simplified):
struct Guid
{
Guid(string value)
{
this = Parse(value);
}
}
Which I think is a pretty neat solution.
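The same pattern in a form that compiles on its own. This is not the actual System.Guid source; Celsius is a made-up struct used only to show a constructor delegating to a static Parse method through assignment to this.

```csharp
using System;
using System.Globalization;

public struct Celsius
{
    public readonly double Degrees;

    public Celsius(double degrees)
    {
        Degrees = degrees;
    }

    // Reuses the parsing logic instead of duplicating it in the constructor.
    public Celsius(string text)
    {
        this = Parse(text);
    }

    public static Celsius Parse(string text)
    {
        return new Celsius(double.Parse(text, CultureInfo.InvariantCulture));
    }
}
```

With this in place, new Celsius("21.5") and Celsius.Parse("21.5") produce equal values.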
From Pro C#
Referring to "New-ing" Intrinsic data types...
All intrinsic data types support what is known as a default constructor. It allows you to create a variable using the new keyword.
[...] Object references (including strings) are set to null.
In C#, strings do not have a public default constructor. My guess is that due to string's immutability, they have a private default constructor. But the context here is talking about object references and strings as a whole while using new.
Because one cannot do
String myString = new String();
So,
String a;
referencing string doesn't result in a "default value". Instead, it's a compiler error to access a.
Though
public class StringContainer
{
public static string myString { get; set; }
}
Results in a legally accessible string (defaulted to null). This doesn't use new. It performs some kind of magical construction.
What is occurring in the StringContainer scenario? Because there appears to be no new-able default constructor in string, is this an error in the C# book?
All intrinsic data types support what is known as a default constructor. It allows you to create a variable using the new keyword.
There are an impressive number of subtle errors in that statement.
First off, there is no such thing as an "intrinsic" data type; perhaps this term is defined somewhere else in the book?
Second, it would be more accurate to say that all struct types have a public parameterless constructor called the "default" constructor. Some class types also have a public parameterless ctor; if you do not provide any ctor then the C# compiler will automatically generate a public parameterless ctor for you. If you do provide a ctor then the C# compiler will not do this for you.
Third, constructors do not create variables. The author is conflating a bunch of related but different things: the "new" operator, the memory manager, the constructor and the variable, and the created object. Variables are storage locations and are managed by the CLR; they are not created by the "new" operator.
The correct statement is that the "new" operator on a struct causes a variable to be created by the CLR on the temporary storage pool; that variable is then initialized by the memory manager, and then passed to the constructor for more initialization. The value thus created is then copied somewhere else. The "new" operator on a class causes the CLR to create an object on the long-term storage pool, and then pass a reference to that object to the constructor. No "variable" need be involved.
Confusing variables with objects is a very common error; it is valuable to understand the difference.
In C#, strings do not have a public default constructor.
Correct.
My guess is that due to string's immutability, they have a private default constructor.
Good guess, but wrong. There is no private parameterless constructor on string.
[an auto-property or field of type string] results in a legally accessible string (defaulted to null). This doesn't use new. It performs some kind of magical construction.
It does no such thing. A null reference is not a constructed object at all. It's the absence of a constructed object!
You're basically saying that my empty garage contains a "magically constructed" non-existing car. That is an exceedingly weird way to look at an empty garage; an empty garage contains no car at all, not a magically constructed non-existing car.
What is occurring in the StringContainer scenario?
The containing type contains a compiler-generated field -- a variable -- of type string. Let's suppose the containing type is a struct or class. When the storage for the struct or class is initialized by the memory manager, the memory manager writes a null reference into the storage location associated with the variable.
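A small sketch of that behavior, using a hypothetical StringHolder class with one static and one instance auto-property:

```csharp
using System;

public class StringHolder
{
    // Each auto-property is backed by a compiler-generated field of type string.
    public static string StaticText { get; set; }
    public string InstanceText { get; set; }
}

public static class NullDefaultDemo
{
    public static bool Run()
    {
        var holder = new StringHolder();

        // No string object was ever constructed; both storage locations
        // simply contain a null reference.
        return StringHolder.StaticText == null && holder.InstanceText == null;
    }
}
```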
Finally: I suspect your confusion is because you've gotten the "default constructor" and the "default value of a type" confused. For a struct, they are the same thing:
int x = new int();
and
int x = default(int);
both make an int initialized to zero.
For a class, they do not do the same thing:
Fruit f = new Fruit();
makes a new Fruit object and assigns a reference to it to variable f, whereas:
Fruit f = default(Fruit);
is the same as
Fruit f = null;
No constructor is called.
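Putting the four cases side by side, as a sketch with an empty, hypothetical Fruit class:

```csharp
using System;

public class Fruit
{
}

public static class DefaultDemo
{
    public static bool Run()
    {
        int a = new int();         // zero
        int b = default(int);      // also zero

        Fruit f1 = new Fruit();    // a real object; f1 refers to it
        Fruit f2 = default(Fruit); // null; no object exists

        return a == 0 && b == 0 && f1 != null && f2 == null;
    }
}
```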
All intrinsic data types support what is known as a default constructor. It allows you to create a variable using the new keyword.
I'm not sure what the author means by "intrinsic data types". My best guess is that he actually means "value types" (i.e. types declared with C#'s struct keyword), because value types always have a default constructor, and reference types may not.
So if you have a field whose type is a struct type (e.g. Int32, CancellationToken), then that field will be initialized as if the type's default constructor was called.
In actual implementation, there's probably not an actual call to the type's default constructor -- the memory is just initialized to all zeroes, which is the same thing that would happen if you did call the default constructor. (That's why you can't provide your own parameterless constructor for a value type, at least in versions of C# before 10 -- the parameterless constructor always initializes the memory to all zeroes. This greatly simplifies things like new int[10000] -- the runtime doesn't actually have to call new Int32() 10,000 times; it just zeroes out the memory.)
Your question about a string field in a class isn't really related to the author's discussion of "intrinsic data types", because both string and your enclosing class are reference types, not value types. So your class won't have a parameterless-constructor-that-you-can't-override; it will just have normal constructors. But the zeroing-out behavior is still there: when you call a constructor, the new memory block is zeroed out before the constructor code starts to run. Your string field is a reference type, and a zero reference is null.
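The zeroing behavior can be observed directly. The sketch below uses made-up types (Reading, Station) and never assigns any field:

```csharp
using System;
using System.Linq;

public struct Reading
{
    public int Count;
    public string Label;
}

public class Station
{
    public Reading Latest; // struct field: all of its fields start zeroed
    public string Name;    // reference field: a zero reference is null
}

public static class ZeroingDemo
{
    public static bool Run()
    {
        int[] numbers = new int[10000]; // every element starts at 0
        var station = new Station();

        return numbers.All(n => n == 0)
            && station.Latest.Count == 0
            && station.Latest.Label == null
            && station.Name == null;
    }
}
```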
I imagine it is using default(string) which returns null because string is a reference type.
Also, remember that constructors can't return null.