In C#, we have this non-static method on the type string:
"abc".ToUpper()
but for char, we need to use a static method:
char.ToUpper('a')
When I introduce C# to beginners, they always expect to be able to write the following:
'a'.ToUpper()
Does anyone have insight as to why it was designed like this?
The only thing I can think of is performance but then I would have expected a static ToUpper() for the type string too.
The difference lies in the fact that string is a reference type, while char is a keyword that represents the .NET Framework's Char structure. When you call Char.ToUpper('a') you are actually using the Char structure. Structures are value types, and a Char value, like most well-designed value types, is immutable.
Since a Char is immutable, a method cannot change the character in place (see Why are Mutable Structs Evil), so ToUpper has to hand back a new value, and the framework exposes it as a static method. When calling Char.ToUpper(aChar) you are not actually changing aChar; instead, you are creating a new character that is the uppercase representation of the character you passed in, and returning it. The example below demonstrates this.
Char aChar = 'a';
Char.ToUpper(aChar);
//aChar still equals 'a'
Char bChar = 'b';
bChar = Char.ToUpper(bChar);
//bChar now equals 'B'
The reason char has other methods that allow you to do things like 'a'.Equals('a'); is that value types and reference types both inherit from Object, which defines those methods (technically, value types derive from System.ValueType, which inherits from System.Object). These methods do not make any changes to the object itself.
Edit - Why this question is actually speculation
As I'm very curious to see if there's an actual answer to "why do chars not have a .ToUpper() method", I decided to check the C# 5 Language Specification document and found the following:
char is an Integral Type (pg 80), which is a subset of Simple Types. Simple Types themselves are just predefined Struct Types. Struct types are Value Types that "can declare constants, fields, methods, properties, indexers, operators, instance constructors, static constructors, and nested types" (pg 79).
string is a Class Type, which is a Reference Type (pg 85). Class Types define "a data structure that contains data members (constants and fields), function members (methods, properties, events, indexers, operators, instance constructors, destructors and static constructors), and nested types" (pg 84).
At this point, it is obvious that chars can support a .ToUpper() method (which is why the extension method works). However, as the question states, they do not support one. At this point I'm convinced any reasoning as to why this is true is pure speculation (unless you're on the C# team, of course).
Hans Passant mentioned that it is possible to achieve this syntax easily via extension methods. I'll provide the code here in case anyone is deeply attached to using that syntax.
public static class MyExtensionMethods
{
public static char ToUpper( this char c )
{
return char.ToUpper( c );
}
}
Then you can do:
'a'.ToUpper()
(sorry, not enough space in a comment -- I know this isn't a complete answer.)
This seems to be a pattern across all the primitive types; int, double, and bool, for example, also have very few instance methods (mostly ToString(), Equals() and CompareTo() variants). So it's not just char -- it's a property of all the primitive types that C# defines.
I would guess (and it is a guess) that any time you access data, you're either directly accessing bits of RAM -- the primitive values like int and char and byte -- or you're accessing a .NET construct like an object or struct. A char is always 2 bytes at a particular memory address. So the framework can treat it like a raw memory location.
If we try to treat the raw RAM as objects, you'll either have to 'box' everything to do any work, or it's just not possible. My guess is that you can't do some core feature like virtual method dispatch on primitives, and that the world of objects and the world of primitives has to be kept separate.
Anyway, hope that advances the conversation on some level...
I know that structs in .NET do not support inheritance, but it's not exactly clear why they are limited in this way.
What technical reason prevents structs from inheriting from other structs?
The reason value types can't support inheritance is because of arrays.
The problem is that, for performance and GC reasons, arrays of value types are stored "inline". For example, given new FooType[10] {...}, if FooType is a reference type, 11 objects will be created on the managed heap (one for the array, and one for each of the 10 instances). If FooType is instead a value type, only one object will be created on the managed heap -- the array itself (as each array value will be stored "inline" within the array).
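The difference between the two kinds of array can be observed directly. The following is only a minimal sketch; PointValue, PointRef and InlineStorageDemo are hypothetical names invented for illustration. A freshly created array of a struct type already contains usable, zeroed elements, while an array of a class type contains only null references until each object is allocated separately.

```csharp
using System;

struct PointValue { public int X; }   // value type: stored inline in the array
class PointRef { public int X; }      // reference type: the array holds references

static class InlineStorageDemo
{
    public static int StructElement()
    {
        var values = new PointValue[3];   // one heap object: the array itself
        values[1].X = 5;                  // writes straight into the array's storage
        return values[1].X;
    }

    public static bool ClassElementIsNull()
    {
        var refs = new PointRef[3];       // three null references, no PointRef objects yet
        return refs[1] == null;
    }

    static void Main()
    {
        Console.WriteLine(StructElement());       // 5
        Console.WriteLine(ClassElementIsNull());  // True
    }
}
```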
Now, suppose we had inheritance with value types. When combined with the above "inline storage" behavior of arrays, Bad Things happen, as can be seen in C++.
Consider this pseudo-C# code:
struct Base
{
public int A;
}
struct Derived : Base
{
public int B;
}
void Square(Base[] values)
{
for (int i = 0; i < values.Length; ++i)
values [i].A *= 2;
}
Derived[] v = new Derived[2];
Square (v);
By normal conversion rules, a Derived[] is convertible to a Base[] (for better or worse), so if you s/struct/class/g for the above example, it'll compile and run as expected, with no problems. But if Base and Derived are value types, and arrays store values inline, then we have a problem.
We have a problem because Square() doesn't know anything about Derived: it uses only pointer arithmetic to access each element of the array, incrementing by a constant amount (sizeof(Base)). The assembly would be vaguely like:
for (int i = 0; i < values.Length; ++i)
{
Base* value = (Base*) (((char*) values) + i * sizeof(Base));
value->A *= 2;
}
(Yes, that's abominable assembly, but the point is that we'll increment through the array at known compile-time constants, without any knowledge that a derived type is being used.)
So, if this actually happened, we'd have memory corruption issues. Specifically, within Square(), values[1].A*=2 would actually be modifying values[0].B!
Try to debug THAT!
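For contrast, here is a sketch of the class-based version, which real C# accepts. CovarianceDemo and the sample field values are assumptions made for illustration. Note that with classes the array elements have to be constructed explicitly, since new Derived[2] would hold two null references.

```csharp
using System;

class Base { public int A; }
class Derived : Base { public int B; }

static class CovarianceDemo
{
    // Same loop as Square() in the pseudo-code: doubles A in every element.
    public static void Square(Base[] values)
    {
        for (int i = 0; i < values.Length; ++i)
            values[i].A *= 2;
    }

    public static int[] Run()
    {
        Derived[] v = { new Derived { A = 1, B = 10 }, new Derived { A = 2, B = 20 } };
        Square(v);  // Derived[] converts to Base[] (array covariance)
        return new[] { v[0].A, v[0].B, v[1].A, v[1].B };
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Run()));  // 2, 10, 4, 20
    }
}
```

Because each element is a reference of fixed size, the loop steps through the array correctly and the B fields are left untouched.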
Imagine structs supported inheritance. Then declaring:
BaseStruct a;
InheritedStruct b; //inherits from BaseStruct, added fields, etc.
a = b; //?? expand size during assignment?
would mean struct variables don't have fixed size, and that is why we have reference types.
Even better, consider this:
BaseStruct[] baseArray = new BaseStruct[1000];
baseArray[500] = new InheritedStruct(); //?? morph/resize the array?
Structs do not use references (unless they are boxed, but you should try to avoid that) thus polymorphism isn't meaningful since there is no indirection via a reference pointer. Objects normally live on the heap and are referenced via reference pointers, but structs are allocated on the stack (unless they are boxed) or are allocated "inside" the memory occupied by a reference type on the heap.
Class-like inheritance is not possible, as a struct is laid out directly on the stack. An inheriting struct would be bigger than its parent, but the JIT doesn't know that, and would try to put too much into too little space. That sounds a little unclear, so let's write an example:
struct A {
int property;
} // sizeof A == sizeof int
struct B : A {
int childproperty;
} // sizeof B == sizeof int * 2
If this would be possible, it would crash on the following snippet:
void DoSomething(A arg){};
...
B b;
DoSomething(b);
Space is allocated for the sizeof A, not for the sizeof B.
Here's what the docs say:
Structs are particularly useful for small data structures that have value semantics. Complex numbers, points in a coordinate system, or key-value pairs in a dictionary are all good examples of structs. Key to these data structures is that they have few data members, that they do not require use of inheritance or referential identity, and that they can be conveniently implemented using value semantics where assignment copies the value instead of the reference.
Basically, they're supposed to hold simple data and therefore do not have "extra features" such as inheritance. It would probably be technically possible for them to support some limited kind of inheritance (not polymorphism, due to them being on the stack), but I believe it is also a design choice to not support inheritance (as many other things in the .NET languages are.)
On the other hand, I agree with the benefits of inheritance, and I think we all have hit the point where we want our struct to inherit from another, and realize that it's not possible. But at that point, the data structure is probably so advanced that it should be a class anyway.
Structs are allocated on the stack. This means the value semantics are pretty much free, and accessing struct members is very cheap. This doesn't prevent polymorphism.
You could have each struct start with a pointer to its virtual function table. This would be a performance issue (every struct would be at least the size of a pointer), but it's doable. This would allow virtual functions.
What about adding fields?
Well, when you allocate a struct on the stack, you allocate a certain amount of space. The required space is determined at compile time (whether ahead of time or when JITting). If you add fields and then assign to a base type:
struct A
{
public int Integer1;
}
struct B : A
{
public int Integer2;
}
A a = new B();
This will overwrite some unknown part of the stack.
The alternative is for the runtime to prevent this by only writing sizeof(A) bytes to any A variable.
What happens if B overrides a method in A and references its Integer2 field? Either the runtime throws a MemberAccessException, or the method instead accesses some random data on the stack. Neither of these is permissible.
It's perfectly safe to have struct inheritance, so long as you don't use structs polymorphically, or so long as you don't add fields when inheriting. But these aren't terribly useful.
There is a point I would like to correct. The explanation that structs cannot be inherited because they live on the stack is right, but only half correct. Structs, like any other value type, can live on the stack. Depending on where the variable is declared, they live either on the stack or in the heap: on the stack when they are local variables, in the heap when they are instance fields.
In saying that, Cecil Has a Name nailed it correctly.
I would like to emphasize this, value types can live on the stack. This doesn't mean they always do so. Local variables, including method parameters, will. All others will not. Nevertheless, it still remains the reason they can't be inherited. :-)
This seems like a very frequent question. I feel like adding that value types are stored "in place" where you declare the variable; apart from implementation details, this means that there is no object header that says something about the object, only the variable knows what kind of data resides there.
Structs do support interfaces, so you can do some polymorphic things that way.
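A minimal sketch of that interface-based polymorphism follows. IShape, Tile and InterfaceDemo are hypothetical names. Passing the struct as the interface type boxes it, while a generic method constrained to the interface can make the same call on the struct without boxing.

```csharp
using System;

interface IShape { double Area(); }

struct Tile : IShape
{
    public double Side;
    public Tile(double side) { Side = side; }
    public double Area() { return Side * Side; }
}

static class InterfaceDemo
{
    // The argument is converted to IShape, which boxes the struct.
    public static double AreaViaInterface(IShape shape) { return shape.Area(); }

    // The generic constraint lets the call be made on the struct itself, without boxing.
    public static double AreaGeneric<T>(T shape) where T : struct, IShape { return shape.Area(); }

    static void Main()
    {
        var tile = new Tile(3);
        Console.WriteLine(AreaViaInterface(tile));  // 9
        Console.WriteLine(AreaGeneric(tile));       // 9
    }
}
```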
IL is a stack-based language, so calling a method with an argument goes something like this:
Push the argument onto the stack
Call the method.
When the method runs, it pops some bytes off the stack to get its argument. It knows exactly how many bytes to pop off because the argument is either a reference type pointer (always 4 bytes on 32-bit) or it is a value type for which the size is always known exactly.
If it is a reference type pointer then the method looks up the object in the heap and gets its type handle, which points to a method table which handles that particular method for that exact type. If it is a value type, then no lookup to a method table is necessary because value types do not support inheritance, so there is only one possible method/type combination.
If value types supported inheritance then there would be extra overhead, in that the particular type of the struct would have to be placed on the stack as well as its value, which would mean some sort of method table lookup for the particular concrete instance of the type. This would eliminate the speed and efficiency advantages of value types.
I thought that I knew how to handle structures, since I have programmed in C for years. However, I have come across this struct definition in a C# program that I am attempting to understand. It is populated with booleans and each instance of the struct is going to be a cell in an array (not shown here). I expect that the override in line 3 is used to override a method "ToString()" in a base class.
public struct Cell
{
public bool occupied;
public Cell(bool occupied) { this.occupied = occupied; }
public override string ToString() { return occupied ? "x" : "."; }
}
I understand the first line above. I believe that I am confused about the use of methods in structures, as I am assuming that the second and third lines in the above struct definition are methods. The second line is very confusing to me.
Thank You
Tom
The second line in the struct is the constructor of the struct (so yeah, it's basically a method) which takes a boolean as a parameter and assigns the value passed in to the "occupied" field.
The third line is an override of the ToString method, which is inherited by everything because it's a built-in method of the Object class, which is a superclass of every other object that exists in C#. By default, it simply outputs the fully-qualified class/struct name.
The struct of C# has little to do with the struct from C. In .NET, all (for practical purposes) entities inherit from Object.
It does not matter if they are classes (reference types) or structs (value types); both can have methods, constructors, properties, attributes etc. The only limitation is that you cannot extend a concrete value type (that is, you cannot inherit from a struct), since their memory footprint and type is predefined when unboxed. Therefore, you can think of all value types being "final".
Also, you can have constructors on structs (which is what you're seeing in the middle of your example code). Note, however, that a struct always also has an implicit "default constructor" with no arguments, which initializes the data to all binary 0.
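The implicit default constructor can be seen with the Cell struct from the question. This is a small sketch; DefaultValueDemo is a made-up name. All three ways of obtaining a Cell without calling the bool constructor yield the zeroed value, where occupied is false.

```csharp
using System;

struct Cell
{
    public bool occupied;
    public Cell(bool occupied) { this.occupied = occupied; }
    public override string ToString() { return occupied ? "x" : "."; }
}

static class DefaultValueDemo
{
    static void Main()
    {
        Cell a = new Cell();         // implicit parameterless constructor
        Cell b = default(Cell);      // the same all-zero value
        Cell[] grid = new Cell[2];   // array elements start zeroed as well

        Console.WriteLine(a.occupied);  // False
        Console.WriteLine(b);           // .
        Console.WriteLine(grid[1]);     // .
    }
}
```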
What exactly is your confusion? You have two good guesses about lines #2 and #3, which can be easily verified with a simple test case.
Yes, the second line is a constructor, which receives a boolean and initializes a field's value.
The third line, as you guessed, is also a method which overrides the base ToString. In this case, since there's no explicit base class, the type extends the methods found in System.Object, known colloquially in C# as object. object's implementation would simply print out the type name ("MyNamespace.Cell"), but this implementation overrides it with the contents of the boolean field.
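The effect of the override is easy to check with a quick test. In this sketch PlainCell is a hypothetical struct that is identical to Cell except that it does not override ToString, so it falls back to the inherited behaviour and prints its type name.

```csharp
using System;

namespace MyNamespace
{
    // No override: ToString() falls back to the inherited implementation.
    struct PlainCell { public bool occupied; }

    struct Cell
    {
        public bool occupied;
        public override string ToString() { return occupied ? "x" : "."; }
    }

    static class ToStringDemo
    {
        static void Main()
        {
            Console.WriteLine(new PlainCell());               // MyNamespace.PlainCell
            Console.WriteLine(new Cell { occupied = true });  // x
        }
    }
}
```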
Structs and classes are very similar to use in C#. So your struct can have methods and a constructor. But there are a few differences. For example: a struct is passed by value (the callee gets its own copy), while for a class only a reference to the object is passed.
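As a rough illustration of that difference (CounterStruct, CounterClass and PassingDemo are names invented for this sketch): a method that modifies a struct parameter only changes its own copy, while a method that modifies a class parameter changes the object the caller also sees.

```csharp
using System;

struct CounterStruct { public int Count; }
class CounterClass { public int Count; }

static class PassingDemo
{
    static void Increment(CounterStruct c) { c.Count++; }  // modifies a local copy
    static void Increment(CounterClass c) { c.Count++; }   // modifies the shared object

    public static int StructAfterCall()
    {
        var s = new CounterStruct();
        Increment(s);
        return s.Count;
    }

    public static int ClassAfterCall()
    {
        var c = new CounterClass();
        Increment(c);
        return c.Count;
    }

    static void Main()
    {
        Console.WriteLine(StructAfterCall());  // 0
        Console.WriteLine(ClassAfterCall());   // 1
    }
}
```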
To choose the right one of these options look here:
https://msdn.microsoft.com/en-us/library/ms229017%28v=vs.110%29.aspx
Here the differences are explained:
Structs versus classes
The main difference between struct and class in C# is that class instances are reference types, while struct instances are value types (typically stored inline, on the stack or inside their containing object, rather than as separate heap objects).
The second line in your code is just a simple constructor. Structs may have constructors as long as they take parameters; explicit parameterless ("empty") constructors were not allowed in the C# versions that page describes. (See https://msdn.microsoft.com/en-us/library/aa288208(v=vs.71).aspx)
Third line is overriding the base Object class method ToString().
Structs may define methods, there's nothing wrong with it.
For additional information about structs, make sure to check out MSDN docs about structs
Why can you do things like
int i = 10;
i.ToString();
'c'.Equals('d');
1.ToString();
true.GetType();
in C#? Those things right there are either primitive, literal, unboxed, or any combination of those things; so why do they have methods? They are not objects and so should not have methods. Is this syntax sugar for something else? If so, what? I can understand having functions that do these things, for example:
string ToString(int number)
{
// Do mad code
return newString;
}
but in that case you would call it as a function, not a method:
string ranch = ToString(1);
What's going on here?
edit:
Just realised C# isn't a java clone anymore and the rules are totally different. oops :P
They act like that because the spec says so (and it's pretty nice) :
1.28 Value types
A value type is either a struct type or an enumeration type. C# provides a set of predefined struct types called the simple types.
The simple types are identified through reserved words.
...
1.28.4 Simple types
C# provides a set of predefined struct types called the simple types.
The simple types are identified through reserved words, but these
reserved words are simply aliases for predefined struct types in the
System namespace, as described in the table below.
...
Because a simple type aliases a struct type, every simple type has
members. For example, int has the members declared in System.Int32 and
the members inherited from System.Object, and the following statements
are permitted:
int i = int.MaxValue; // System.Int32.MaxValue constant
string s = i.ToString(); // System.Int32.ToString() instance method
string t = 123.ToString(); // System.Int32.ToString() instance method
The simple types differ from other struct types in that they permit
certain additional operations:
Most simple types permit values to be created by writing literals
(§1.16.4). For example, 123 is a literal of type int and 'a' is a
literal of type char. C# makes no provision for literals of struct
types in general, and nondefault values of other struct types are
ultimately always created through instance constructors of those
struct types.
As the spec explains, simple types have some super powers, like the ability to be const, a special literal syntax that can be used instead of new, and the capacity to be computed at compilation time (2+2 is actually written as 4 in the final MSIL stream).
But methods (as well as operators) aren't a special super power; all structs can have them.
The specification (for C# 4.0; my copy-paste is from an earlier version) can be downloaded from the Microsoft website: C# Language Specification 4.0
Eric Lippert's recent article Inheritance and Representation explains. (Spoiler: You are confusing inheritance and representation.)
Not sure why you claim that the integer i, the character 'c' and the integer 1 are not objects. They are.
In C# all primitive types are actually structures.
So that you can use them!
It's convenient to be able to do so, so you can.
Now, in order to do so, primitives can be treated as structs. E.g. a 32-bit integer can be processed as a 32-bit integer, but it can also be processed as public struct Int32 : IComparable, IFormattable, IConvertible, IComparable<int>, IEquatable<int>. We mostly get the best of both worlds.
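A short sketch makes this visible at run time (AliasDemo is a made-up name): int and System.Int32 are the same type, it is a value type, and a boxed int satisfies the interfaces listed on the Int32 struct.

```csharp
using System;

static class AliasDemo
{
    static void Main()
    {
        Console.WriteLine(typeof(int) == typeof(Int32));  // True
        Console.WriteLine(typeof(int).IsValueType);       // True

        object boxed = 42;                                // boxing turns the int into an object
        Console.WriteLine(boxed is IComparable<int>);     // True
        Console.WriteLine(boxed.GetType().FullName);      // System.Int32
    }
}
```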
By Venkat K in the C# .NET forum.
29-Apr-09 07:38 AM
The only difference between "class" and "struct" is that a struct defaults to having public members (both data members and function members) and a struct defaults to public inheritance, whereas a class defaults to private members and private inheritance. That is the only difference. This difference can be circumvented by explicitly specifying "public", "private", or "protected", so a struct can be made to act like a class in every way and vice versa. By convention, most programmers use "struct" for data types that have no member functions and that do not use inheritance. They use "class" for data types with member functions and inheritance. However, this is not necessary or even universally accepted.
*(Can this be true)
That statement is entirely false when it comes to C#. I believe it may well be true for C++, but in C# the difference is between a value type (structs) and a reference type (classes).
It's not the only difference. Structs are value types and classes are reference types. Look it up on Google to find out more about the differences, which are too many to list in the minute I have right now.
Unless, as @BrokenGlass points out, it's C++.
The statement you quote would be correct for C++, but not for C#. Here's an excerpt of what the C# Language Specification has to say about structs:
Like classes, structs are data structures that can contain data members and function members, but unlike classes, structs are value types and do not require heap allocation. A variable of a struct type directly stores the data of the struct, whereas a variable of a class type stores a reference to a dynamically allocated object. Struct types do not support user-specified inheritance, and all struct types implicitly inherit from type object.
Structs are particularly useful for small data structures that have value semantics. Complex numbers, points in a coordinate system, or key-value pairs in a dictionary are all good examples of structs. The use of structs rather than classes for small data structures can make a large difference in the number of memory allocations an application performs.
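The "directly stores the data" versus "stores a reference" distinction from the excerpt shows up on plain assignment. In this sketch PointS, PointC and AssignmentDemo are hypothetical names chosen for illustration.

```csharp
using System;

struct PointS { public int X; }
class PointC { public int X; }

static class AssignmentDemo
{
    public static int StructOriginalAfterCopy()
    {
        var a = new PointS { X = 1 };
        var b = a;     // copies the data
        b.X = 2;       // changes only the copy
        return a.X;
    }

    public static int ClassOriginalAfterCopy()
    {
        var c = new PointC { X = 1 };
        var d = c;     // copies the reference; both variables name one object
        d.X = 2;
        return c.X;
    }

    static void Main()
    {
        Console.WriteLine(StructOriginalAfterCopy());  // 1
        Console.WriteLine(ClassOriginalAfterCopy());   // 2
    }
}
```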
Since a struct in C# consists of the bits of its members, you cannot have a value type T which includes any T fields:
// Struct member 'T.m_field' of type 'T' causes a cycle in the struct layout
struct T { T m_field; }
My understanding is that an instance of the above type could never be instantiated*—any attempt to do so would result in an infinite loop of instantiation/allocation (which I guess would cause a stack overflow?**)—or, alternately, another way of looking at it might be that the definition itself just doesn't make sense; perhaps it's a self-defeating entity, sort of like "This statement is false."
Curiously, though, if you run this code:
BindingFlags privateInstance = BindingFlags.NonPublic | BindingFlags.Instance;
// Give me all the private instance fields of the int type.
FieldInfo[] int32Fields = typeof(int).GetFields(privateInstance);
foreach (FieldInfo field in int32Fields)
{
Console.WriteLine("{0} ({1})", field.Name, field.FieldType);
}
...you will get the following output:
m_value (System.Int32)
It seems we are being "lied" to here***. Obviously I understand that the primitive types like int, double, etc. must be defined in some special way deep down in the bowels of C# (you cannot define every possible unit within a system in terms of that system... can you?—different topic, regardless!); I'm just interested to know what's going on here.
How does the System.Int32 type (for example) actually account for the storage of a 32-bit integer? More generally, how can a value type (as a definition of a kind of value) include a field whose type is itself? It just seems like turtles all the way down.
Black magic?
*On a separate note: is this the right word for a value type ("instantiated")? I feel like it carries "reference-like" connotations; but maybe that's just me. Also, I feel like I may have asked this question before—if so, I forget what people answered.
**Both Martin v. Löwis and Eric Lippert have pointed out that this is neither entirely accurate nor an appropriate perspective on the issue. See their answers for more info.
***OK, I realize nobody's actually lying. I didn't mean to imply that I thought this was false; my suspicion had been that it was somehow an oversimplification. After coming to understand (I think) thecoop's answer, it makes a lot more sense to me.
As far as I know, within a field signature that is stored in an assembly, there are certain hardcoded byte patterns representing the 'core' primitive types - the signed/unsigned integers, and floats (as well as strings, which are reference types and a special case). The CLR knows natively how to deal with those. Check out Partition II, section 23.2.12 of the CLR spec for the bit patterns of the signatures.
Within each primitive struct ([mscorlib]System.Int32, [mscorlib]System.Single etc) in the BCL is a single field of that native type, and because a struct is exactly the same size as its constituent fields, each primitive struct is the same bit pattern as its native type in memory, and so can be interpreted as either, by the CLR, C# compiler, or libraries using those types.
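The size claim can be checked for a user-defined struct that wraps a single int. This is only a sketch: Wrapper and SizeDemo are hypothetical names, and Marshal.SizeOf reports the marshalled (unmanaged) size, which is assumed here to match the managed layout for such a simple struct.

```csharp
using System;
using System.Runtime.InteropServices;

// A struct whose only field is an int, mirroring how Int32 wraps its m_value field.
struct Wrapper { public int Value; }

static class SizeDemo
{
    static void Main()
    {
        Console.WriteLine(sizeof(int));                // 4
        Console.WriteLine(Marshal.SizeOf<Wrapper>());  // 4
    }
}
```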
From C#, int, double etc are synonyms of the mscorlib structs, which each have their primitive field of a type that is natively recognised by the CLR.
(There's an extra complication here, in that the CLR spec specifies that any types that have a 'short form' (the native CLR types) always have to be encoded as that short form (int32), rather than valuetype [mscorlib]System.Int32. So the C# compiler knows about the primitive types as well, but I'm not sure of the exact semantics and special-casing that goes on in the C# compiler and CLR for, say, method calls on primitive structs)
So, due to Gödel's Incompleteness Theorem, there has to be something 'outside' the system by which it can be defined. This is the Magic that lets the CLR interpret 4 bytes as a native int32 or an instance of [mscorlib]System.Int32, which is aliased from C#.
My understanding is that an instance of the above type could never be instantiated; any attempt to do so would result in an infinite loop of instantiation/allocation (which I guess would cause a stack overflow?). Or, alternately, another way of looking at it might be that the definition itself just doesn't make sense;
That's not the best way of characterizing the situation. A better way to look at it is that the size of every struct must be well-defined. An attempt to determine the size of T goes into an infinite loop, and therefore the size of T is not well-defined. Therefore, it's not a legal struct because every struct must have a well-defined size.
It seems we are being lied to here
There's no lie. An int is a struct that contains a field of type int. An int is of known size; it is by definition four bytes. Therefore it is a legal struct, because the size of all its fields is known.
How does the System.Int32 type (for example) actually store a 32-bit integer value
The type doesn't do anything. The type is just an abstract concept. The thing that does the storage is the CLR, and it does so by allocating four bytes of space on the heap, on the stack, or in registers. How else do you suppose a four-byte integer would be stored, if not in four bytes of memory?
how does the System.Type object referenced with typeof(int) present itself as though this value is itself an everyday instance field typed as System.Int32?
That's just an object, written in code like any other object. There's nothing special about it. You call methods on it, it returns more objects, just like every other object in the world. Why do you think there's something special about it?
Three remarks, in addition to thecoop's answer:
Your assertion that recursive structs inherently couldn't work is not entirely correct. It's more like a statement "this statement is true": which is true if it is. It's plausible to have a type T whose only member is of type T: such an instance might consume 0 bytes, for example (since its only member consumes 0 bytes). Recursive value types only stop working if you have a second member (which is why they are disallowed).
Take a look at Mono's definition of Int32. As you can see, it actually is a type containing itself (since int is just an alias for Int32 in C#). There is certainly "black magic" involved (i.e. special-casing), as the comments explain: the runtime will look up the field by name, and just expect that it's there. I also assume that the C# compiler will special-case the presence of int here.
In PE assemblies, type information is represented through "type signature blobs". These are sequences of type declarations, e.g. for method signatures, but also for fields. The list of available primitive types in such a signature is defined in section 22.1.15 of the CLR specification; a copy of the allowed values is in the CorElementType enumeration. Apparently, the reflection API maps these primitive types to their corresponding System.XYZ valuetypes.