What exactly is new [] doing in a class vs struct? - c#

MyClass[] CLASS = new MyClass[5];
int[] STRUCT = new int[5];
What exactly is new [] doing for a class vs a struct. Apparently the struct has some overloaded static index that causes it to run the default constructor of the struct. However the new [] for a class appears to do nothing but make space to initialize an instance of a class. How do I overload the static behavior of the class to also run the default constructor. I know how to use for loops and other methods of accomplishing this. My question is very specific to what is going on underneath the new []. I understand that a struct needs a default value. But doesn't a non-nullable class also need a default value which is why it gives an error when you try to use it? Or is this telling me that all classes are actually nullable?

No, the default constructor of struct isn't run. What happens instead is that in the struct case, the memory is initialized to zero. What this means depends on the data type. E.g., a reference field (like a class or a string) becomes null, numeric fields become 0, boolean fields become false, etc.
This differs very much from how classes work. This is why with the initialization of the class array, you see null values. Basically, it comes down to this. An "empty" class variable (or array in your case) becomes null. However, when you have an "empty" struct, you already have something valid. However, it is initialized as empty.
The easiest way to see this is when you e.g. have an int field in a class. This works basically the same way. When you add an int field to a class, you are not required to initialize it. When you don't initialize it, it gets the value 0 by default. struct's work the same way in this regard.
See http://msdn.microsoft.com/en-us/library/aa664471.aspx for a bit more information.
A few more things to note (answer to the question in the comments):
Classes are always nullable. This means that the only way to initialize the array is to create a loop and initialize a new instance for every item in the array;
The default constructor doesn't apply to structs, because they aren't allowed to have a default constructor. If you try to compile the following code snippet, you will get a compilation error telling you this:
struct MyStruct
{
public MyStruct()
{
}
}

Related

Why doesn't a struct in an array have to be initialized?

I researched this subject but I couldn't find any duplicate. I am wondering why you can use a struct in an array without creating an instance of it.
For example, I have a class and a struct:
public class ClassAPI
{
public Mesh mesh { get; set; }
}
public struct StructAPI
{
public Mesh mesh { get; set; }
}
When ClassAPI is used in an array, it has to be initialized with the new keyword before being able to use its properties and methods:
ClassAPI[] cAPI = new ClassAPI[1];
cAPI[0] = new ClassAPI(); //MUST DO THIS!
cAPI[0].mesh = new Mesh();
But this is not the case with StructAPI. It looks like StructAPI doesn't have to be initialized in the array:
StructAPI[] sAPI = new StructAPI[1];
sAPI[0].mesh = new Mesh();
If you try the same thing with ClassAPI, you would get a NullReferenceException.
Why is it different with structs when using them in an array?
I understand the difference between class and struct with struct being a value type but that still doesn't make sense. To me, without the array being involved in this, it would look like I am doing this:
StructAPI sp;
sp.mesh = new Mesh();
Notice that the sp variable is not initialized and it should result in a compile-time error that says:
Error CS0165 Use of unassigned local variable 'sp'
but that's a different story when the struct is put in an array.
Is the array initializing the struct in it? I would like to know what's going on.
As per the specification link provided by PetSerAl in the comments:
Array elements
The elements of an array come into existence when an array instance is created, and cease to exist when there are no references to that array instance.
The initial value of each of the elements of an array is the default value (Default values) of the type of the array elements.
For the purpose of definite assignment checking, an array element is considered initially assigned.
(emphasis mine).
This means that when you declare an array of T, each "cell" in the array is being initialized using default(T). For reference types default(T) returns null, but for value types default(T) returns the type default value – 0 for numbers, false for bool, and so on.
As per the Using Structs (C# Programming Guide) page:
If you instantiate a struct object using the default, parameterless constructor, all members are assigned according to their default values.
Since structs are value types, default(T) where T is a struct initializes the struct to its default value, meaning all its members will be initialized to their default values – null for reference types and whatever default value for value types.
So this line of code StructAPI[] sAPI = new StructAPI[1]; basically creates a new array of StructAPI containing a single StructAPI instance, where its mesh property is default(Mesh).
This is likely because the fact that classes, while they do have a default constructor, structs actually don't have one.
Why class arrays need to be intialized
Classes create the object and then return the reference. The actual variable is a reference to that value. Default values are always zeros, so the default value to a reference is null, because null is represented by an address of all zeros, meaning its not pointing to anything.
Because of this, the default value for the array of a class is all null references.
Why struct arrays don't need to be
Structs, on the other hand, are all by value. They also don't have a default parameterless constructor and C# doesn't let you create one (though the CLR does). Because of the fact that there isn't a constructor, the CLR is able to very efficiently create the struct by zeroing out all of the values, without having to call the constructor.
You can view more about why this is from this StackOverflow question.
When you initialize an array, default values are assigned to its elements:
null for reference types,
for value types, the default value varies: zero for types representing numbers, for struct it is little bit different, its default value is struct with all fields set to their default values. Again, for reference types it's null and for value types it depends (as mentioned above).
So, basically, when you initialize array you have your structs initialized (set to default value), that's why you can access their properties.

What does happen when you make a struct without using the new keyword?

What does happen behind the scenes when you make a struct without using the new keyword?
Let's say we have this struct:
struct Person
{
public int Age;
public string Name;
}
And In the Main() method I decide to make an instance of it without the new keyword like that:
Person p;
now if I try to access p.Age I will get a compile-time error saying "Use of possibly unassigned field 'Age'" however if I make an instance of the struct like that:
Person p = new Person();
and then I try to access p.Age I will get the value of 0. Now what exactly happens behind the scenes? Does the Runtime initialize these variables for me or the compiler places code that initializes them in the IL after compilation?
Edit:
Can anybody also explain this behavior:
Code:
struct Person
{
public string Name { get; set; }
}
If I make instance of struct like that:
Person p;
and I initialize the name manually
p.Name = "SomeRandomName"';
I won't be able to use it. The compiler gives an error "Use of an unassigned local variable p" but If I make instance of the struct with the default (parameterless) constructor there isn't such an error.
Members don't have the same rules as locals.
Locals must be explicitly initialised before use. Members are initialised by the runtime to their respective default values.
If you want some more relevant information:
In the internal implementation details (non-contractual!), up to the current MS .NET runtime for Windows objects are allocated in pre-zeroed memory on the heap (when they're on the heap at all, of course). All the default values are "physical" zeroes, so all you need is e.g. "200 consecutive bytes with value 0". In many cases, this is as simple as asking the OS for a pre-zeroed memory page. It's a performance compromise to keep memory safety - you can easily allocate an array of 2000 Person instances by just doing new Person[2000], which just requests 2000 * size of Person bytes with value zero; extremely cheap, while still keeping safe default values. No need to initialise 2000 Person instances, and 2000 int instances and 2000 string instances - they're all zero by default. At the same time, there's no chance you'd get a random value for the string reference that would point to some random place in memory (a very common error in unmanaged code).
The main reason for requiring explicit initialisation of locals is that it prevents stupid programming errors. You should never access an uninitialised value in the first place, and if you need a default value, you should be explicit about it - the default value then gets a meaning, and meanings should be explicit. You'll find that cases where you could use an uninitialised local meaningfully in the first place are pretty rare - you usually either declare the local right where it gets a value, or you need all possible branches to update a pre-declared local anyway. Both make it easier to understand code and avoid silly mistakes.
If you go through the small struct documentation, you can quote:
A struct type is a value type that is typically used to encapsulate small groups of related variables, such as the coordinates of a rectangle or the characteristics of an item in an inventory.
Normally, when you declare in your code these value type like:
int i; // By default it's equal to 0
bool b; // by default it's equal to false.
Or a reference type as:
string s; //By default it's null
The struct you have created is a value type, which by default isn't initialized and you can't access its properties. Therefore, you can't declare it as:
Person p;
Then use it directly.
Hence the error you got:
"Use of possibly unassigned field 'Age'"
Because p is still not initialized.
This also explains your second part of the question:
I won't be able to use it. The compiler gives an error "Use of an unassigned local variable p" but If I make instance of the struct with the default (parameterless) constructor there isn't such an error.
The same reason you couldn't directly assign p.Name = "something" is because p is still not initialized.
You must create a new instance of the struct as
Person p = New Person(); //or Person p = default(Person);
Now, what happens when you create a new instance of your struct without giving values to the struct properties? Each one of them will hold it's default value. Such as the Age = 0 because it's an int type.
Every datatype in .NET has a default value. For all reference types it's null. For the special type string it is also null. For all value types it is something akin to zero. For bool it's false because this is the equivalent to zero.
You can observe the same behaviour when you write a class with member fields. After construction all these fields will have a default value even when none was assigned during construction.
The same is also true when you use a struct as a member. Since a struct cannot be null, it will also be initialized and all its members (again) use their default values.
The difference in compiler output is that the compiler cannot determine if you have initialized the member field through any means. But it can determine if you have set the method variable value before reading it. Technically this wouldn't be necessary but since it reduces programming errors (why would you read a variable you have not written?), the compiler error appears.

Is there a way I can prevent struct from being insantiated or can I have a class that will be copied?

Ok this is more curiosity than practical requirement.
Let's say I have this class:
public sealed class Entity
{
int value;
Entity()
{
}
public static implicit operator Entity(int x)
{
return new Entity { value = x };
}
}
I don't want this class to be instantiated outside the class and this works. But here Entitys will be referenced and not copied like value types. This class being so small I want it to behave likes value types. Also if I compare two instances of Entitys like e1 == e2 it's going to give reference equality (ok I can overload == but that's more work). So I would make it a struct:
public struct Entity
{
int value;
public static implicit operator Entity(int x)
{
return new Entity { value = x };
}
}
But now someone can do new Entity() easily.
My question is there a way I can have a type (struct/class) that
1) will be copied every time the value is accessed, so that
Entity e1 = 1;
Entity e2 = e1;
ReferenceEquals(e1, e2); // prints false, just like any value type
2) has value semantics when comparing for equality
Entity e1 = 1;
Entity e2 = 1;
e1.Equals(e2); // prints true
// this is easy though by overriding default equality.
3) is not instantiable outside the scope of the type:
Entity e = new Entity(); // should not compile
Basically close to an enum's behaviour. I can certainly live without it, just learning.
Its not possible to do re-define or try to hide the default constructor for a struct in C# because the C# compiler routinely comples code based on the assumption that the parameterless constructor does nothing. See Why can't I define a default constructor for a struct in .NET?.
Your options are either
Use a class
Ensure that the default value for your struct has some meaning (e.g. in your case the int will be initialised to 0) - this is what enums do (recall that enums are initialised to 0).
My recommendation would be to use a class until you identify that there is a performance problem.
A type has no say over what can happen to storage locations of that type. Storage locations of any type can come into existence, or have their contents overwritten with the content of other locations of the same type, without the type itself having any say in the matter. A storage location of a reference type holds a class reference; when it comes into existence, it will be a null reference. The only way a storage location of a reference type will hold anything other than a null reference is if an instance of that type is created and--outside some scenarios involving Full Trust Code and reflection--a reference type will have full control over the circumstances where that can occur.
By contrast, a storage location of a value type is an instance of that type. Consequently, there is no way a structure can control how one comes into existence. What a structure can control--to a limited extent--is what values its fields can contain. In limited trust scenarios, a private struct field can only assume a non-default value if the code for the struct writes that value to some instance of the struct (if the value gets written to any instance of the struct, even limited-trust code with access to that instance may be able to use multi-threaded overlapped reads and writes to produce a struct with a combination of field values which the struct itself would never create itself).
I would suggest creating a struct with a single private field of a class type. You cannot prevent code from creating a default(Entity), but that doesn't mean you have to allow code to do anything useful with one. Its Equals method should work well enough to return true when comparing to another default instance, and false when compared with anything else, and GetHashCode should return the same value for all default instances, but any other methods could throw an exception.
Make all of the constructors (besides the static constructor) private for the class Entity. protected would work too, but since its sealed, that doesn't make a lot of sense.

New-ing string types

From Pro C#
Referring to "New-ing" Intrinsic data types...
All intrinsic data types support what is known as a default constructor. It allows you to create a variable using the new keyword.
[...] Object references (including strings) are set to null.
In C#, strings do not have a public default constructor. My guess is that due to string's immutability, they have a private default constructor. But, the context here is talking about Object references and string's as a whole while using new.
Because one cannot do
String myString = new String();
So,
String a;
referencing string doesn't result in a "default value". Instead, it's a compiler error to access a.
Though
public class StringContainer
{
public static string myString { get; set; }
}
Results in a legally accessible string (defaulted to null). This doesn't use new. It performs some kind of magical construction.
What is occurring in the StringContainer scenerio? Because there appears to be no new-able default constructor in string, is this an error in the C# book?
All intrinsic data types support what is known as a default constructor. It allows you to create a variable using the new keyword.
There are an impressive number of subtle errors in that statement.
First off, there is no such thing as an "intrinsic" data type; perhaps this term is defined somewhere else in the book?
Second, it would be more accurate to say that all struct types have a public parameterless constructor called the "default" constructor. Some class types also have a public parameterless ctor; if you do not provide any ctor then the C# compiler will automatically generate a public parameterless ctor for you. If you do provide a ctor then the C# compiler will not do this for you.
Third, constructors do not create variables. The author is conflating a bunch of related but different things: the "new" operator, the memory manager, the constructor and the variable, and the created object. Variables are storage locations and are managed by the CLR; they are not created by the "new" operator.
The correct statement is that the "new" operator on a struct causes a variable to be created by the CLR on the temporary storage pool; that variable is then initialized by the memory manager, and then passed to the constructor for more initialization. The value thus created is then copied somewhere else. The "new" operator on a class causes the CLR to create an object on the long-term storage pool, and then pass a reference to that object to the CLR. No "variable" need be involved.
Confusing variables with objects is a very common error; it is valuable to understand the difference.
In C#, strings do not have a public default constructor.
Correct.
My guess is that due to string's immutability, they have a private default constructor.
Good guess, but wrong. There is no private parameterless constructor on string.
[an auto-property or field of type string] results in a legally accessible string (defaulted to null). This doesn't use new. It performs some kind of magical construction.
It does no such thing. A null reference is not a constructed object at all. It's the absence of a constructed object!
You're basically saying that my empty garage contains a "magically constructed" non-existing car. That is an exceedingly weird way to look at an empty garage; an empty garage contains no car at all, not a magically constructed non-existing car.
What is occurring in the StringContainer scenerio?
The containing type contains a compiler-generated field -- a variable -- of type string. Let's suppose the containing type is a struct or class. When the storage for the struct or class is initialized by the memory manager, the memory manager writes a null reference into the storage location associated with the variable.
Finally: I suspect your confusion is because you've gotten the "default constructor" and the "default value of a type" confused. For a struct, they are the same thing:
int x = new int();
and
int x = default(int);
both make an int initialized to zero.
For a class, they do not do the same thing:
Fruit f = new Fruit();
makes a new fruit reference and assigns the reference to variable f, whereas:
Fruit f = default(Fruit);
is the same as
Fruit f = null;
No constructor is called.
All intrinsic data types support what is known as a default constructor. It allows you to create a variable using the new keyword.
I'm not sure what the author means by "intrinsic data types". My best guess is that he actually means "value types" (i.e. types declared with C#'s struct keyword), because value types always have a default constructor, and reference types may not.
So if you have a field whose type is a struct type (e.g. Int32, CancellationToken), then that field will be initialized as if the type's default constructor was called.
In actual implementation, there's probably not an actual call to the type's default constructor -- the memory is just initialized to all zeroes, which is the same thing that would happen if you did call the default constructor. (That's why you can't provide your own parameterless constructor for a value type -- the parameterless constructor always initializes the memory to all zeroes. This greatly simplifies things like new int[10000] -- the compiler doesn't actually have to call new Int32() 10,000 times; it just zeroes out the memory.)
Your question about a string field in a class isn't really related to the author's discussion of "intrinsic data types", because both string and your enclosing class are reference types, not value types. So your class won't have a parameterless-constructor-that-you-can't-override; it will just have normal constructors. But the zeroing-out behavior is still there: when you call a constructor, the new memory block is zeroed out before the constructor code starts to run. Your string field is a reference type, and a zero reference is null.
I imagine it is using default(string) which returns null because string is a reference type.
Also, remember that constructors can't return null.

Why can't non-static fields be initialized inside structs?

Consider this code block:
struct Animal
{
public string name = ""; // Error
public static int weight = 20; // OK
// initialize the non-static field here
public void FuncToInitializeName()
{
name = ""; // Now correct
}
}
Why can we initialize a static field inside a struct but not a non-static field?
Why do we have to initialize non-static in methods bodies?
Have a look at Why Can't Value Types have Default Constructors?
The CLI expects to be able to allocate and create new instances of any value type that would require 'n' bytes of memory, by simply allocating 'n' bytes and filling them with zero. There's no reason the CLI "couldn't" provide a means of specifying either that before any entity containing structs is made available to outside code, a constructor must be run on every struct therein, or that a whenever an instance of a particular n-byte struct is created, the compiler should copy a 'template instance'. As it is, however, the CLI doesn't allow such a thing. Consequently, there's no reason for a compiler to pretend it has a means of assuring that structs will be initialized to anything other than the memory-filled-with-zeroes default.
You cannot write a custom default constructor in a structure. The instance field initializers will eventually need to get moved to the constructor which you can't define.
Static field initializers are moved to a static constructor. You can write a custom static constructor in a struct.
You can do exactly what you're trying. All you're missing is a custom constructor that calls the default constructor:
struct Animal
{
public string name = "";
public static int weight = 20;
public Animal(bool someArg) : this() { }
}
The constructor has to take at least one parameter, and then it has to forward to this() to get the members initialised.
The reason this works is that the compiler now has a way to discover the times when the code should run to initialise the name field: whenever you write new Animal(someBool).
With any struct you can say new Animal(), but "blank" animals can be created implicitly in many circumstances in the workings of the CLR, and there isn't a way to ensure custom code gets run every time that happens.

Categories