I've been browsing through the .NET Framework Reference Source, just for the fun of it, and found something I don't understand.
There is an Int32.cs file with the C# code for the Int32 type, and that seems strange to me. How does the C# compiler compile the code for the Int32 type?
public struct Int32: IComparable, IFormattable, IConvertible {
internal int m_value;
// ...
}
But isn't this illegal in C#? If int is only an alias for Int32, it should fail to compile with Error CS0523:
Struct member 'struct2 field' of type 'struct1' causes a cycle in the struct layout.
Is there some magic in the compiler, or am I completely off track?
isn't this illegal in C#? If "int" is only alias for "Int32" it should fail to compile with error CS0523. Is there some magic in the compiler?
Yes; the error is deliberately suppressed in the compiler. The cycle checker is skipped entirely if the type in question is a built-in type.
Normally this sort of thing is illegal:
struct S { S s; int i; }
In that case the size of S is undefined because whatever the size of S is, it must be equal to itself plus the size of an int. There is no such size.
struct S { S s; }
In that case we have no information from which to deduce the size of S.
struct Int32 { Int32 i; }
But in this case the compiler knows ahead of time that System.Int32 is four bytes because it is a very special type.
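The size argument above can be sketched with an ordinary struct. This is only an illustration: `Wrapper`, `Bad`, and `SizeDemo` are hypothetical names, and `Marshal.SizeOf<T>()` reports the marshaled size, which for a struct holding a single int should match `sizeof(int)`.

```csharp
using System;
using System.Runtime.InteropServices;

// Legal: the size of the field's type (int) is known independently of Wrapper.
struct Wrapper
{
    public int Value;
}

// Illegal if uncommented: the size of Bad would depend on itself (error CS0523).
// struct Bad { Bad b; int i; }

static class SizeDemo
{
    public static int SizeOfInt() => sizeof(int);
    public static int SizeOfWrapper() => Marshal.SizeOf<Wrapper>();

    static void Main()
    {
        Console.WriteLine(SizeOfInt());      // 4
        Console.WriteLine(SizeOfWrapper());  // 4
    }
}
```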
Incidentally, the details of how the C# compiler (and, for that matter, the CLR) determines when a set of struct types is cyclic is extremely interesting. I'll try to write a blog article about that at some point.
int is an alias for Int32, but the Int32 struct you are looking at is simply metadata; it is not a real object. The int m_value declaration is possibly there only to give the struct the appropriate size, because it is never actually referenced anywhere else (which is why it is allowed to be there).
So, in other words, the compiler kind of saves this from being a problem. There is a discussion on the topic in the MSDN Forums.
From the discussion, here is a quote from the chosen answer that helps explain how the declaration is possible:
while it is true that the type contains an integer m_value field - the
field is never referenced. In every supporting method (CompareTo,
ToString, etc), "this" is used instead. It is possible that the
m_value fields only exist to force the structures to have the
appropriate size.
I suspect that when the compiler sees "int", it translates it into "a
reference to System.Int32 in mscorlib.dll, to be resolved later", and
since it's building mscorlib.dll, it does end up with a cyclical
reference (but not one that can ever cause problems, because m_value
is never used). If this assumption is correct, then this trick would
only work for special compiler types.
Reading further, it can be determined that the struct is simply metadata, and not a real object, so it is not bound by the same recursive definition constraints.
I'm aware that Pinnable<T> is an internal class used by the methods in the new Unsafe class, and it's not meant to be used anywhere else other than in that class. This question is not about something practical, but it's just to understand why it's been designed like this and to learn a bit more about the language and its various "tricks" like this one.
As a recap, the Pinnable<T> class is defined here, and it looks like this:
[StructLayout(LayoutKind.Sequential)]
internal sealed class Pinnable<T>
{
public T Data;
}
And it's mainly used in the Span<T>.DangerousCreate method, here:
public static Span<T> DangerousCreate(object obj, ref T objectData, int length)
{
Pinnable<T> pinnable = Unsafe.As<Pinnable<T>>(obj);
IntPtr byteOffset = Unsafe.ByteOffset<T>(ref pinnable.Data, ref objectData);
return new Span<T>(pinnable, byteOffset, length);
}
The reason for Pinnable<T> is that it keeps track of the original object, in case the Span<T> instance was created from one (instead of from a native pointer).
Given that the reference type doesn't matter when pinning a reference (fixing both a ref T and Unsafe.As<T, byte>(ref T) works the same), is there a specific reason why the Pinnable<T> class was made generic? The original design in DotNetCross here in fact had a Pinnable class with just a single byte field, and it worked just the same. Is there any reason why using a generic class in this case would be an advantage, other than avoiding having to cast the reference type when writing/reading/returning it?
Is there any other way, other than this unsafe-cast done with Unsafe.As, to get a reference to an object (I mean a reference to the object contents, otherwise it'd be the same as any variable of a class type)? I mean, any way to get a reference (which should basically have the same address of the actual object variable in the first place, right?) to an object without having to pass through some custom defined secondary class.
First of all, the Struct in [StructLayout(LayoutKind.Sequential)] doesn't mean that it is only valid for structs, it means the layout of the actual structure of the fields in memory, be it in a class or in a value type. This controls the actual runtime layout of the data, not just how the type would marshal to unmanaged code. The Sequential is important because without it, the runtime is pretty much free to store the memory however it sees fit, which means that Data may have some padding before it.
From what I understand about the implementation, the reason for Pinnable is to allow creating an instance of Span to a memory that may be moved by the GC, without having to pin the object first. If you don't use actual pointers and just references, nothing at all will need to be pinned.
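The point about references not needing pinning can be sketched with a plain ref local (C# 7.0 or later). This is only an illustration, not how Span<T> is implemented; `Box` and `RefDemo` are hypothetical names.

```csharp
using System;

class Box
{
    public int A;
}

static class RefDemo
{
    public static int WriteThroughRef()
    {
        var box = new Box();
        // A managed (interior) reference to the field. The GC tracks it and
        // updates it if the object moves, so no pinning is required.
        ref int field = ref box.A;
        field = 16;
        return box.A;
    }

    static void Main()
    {
        Console.WriteLine(WriteThroughRef()); // 16
    }
}
```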
I have noticed that it was introduced in a commit with a description saying it made Span more "portable" (a bold word for something that does a lot of unsafe things). I can't think of any other reason than something related to alignment for why it is generic. I suppose representing a T in terms of an offset from another T is better than as an offset from a byte. It may happen that the type of the first field may play a role in its actual address, even if the type was marked with LayoutKind.Sequential.
A reference to an object is different from an interior reference to an object (a reference to its data). It is implementation defined, but in .NET Framework, an instance of any class (or a boxed value type) starts with a header consisting of a sync block (for lock) and a pointer to the method table, a.k.a. the type of the object. On 32-bit, the header is 8 bytes, but the actual pointer points to the pointer to the method table (for performance reasons, getting the type happens more often than locking an object).
One way of getting the pointer to the start of the data, though not a portable one, is therefore casting the object reference to a pointer and adding 4 bytes to it (on 32-bit). That is where the first field should start.
Another way I can think of is utilising GCHandle.AddrOfPinnedObject. It is commonly used for accessing array or string data, but it works for other objects:
[StructLayout(LayoutKind.Sequential)]
class Obj
{
public int A;
}
var obj = new Obj();
var gc = GCHandle.Alloc(obj, GCHandleType.Pinned);
IntPtr interior = gc.AddrOfPinnedObject();
Marshal.WriteInt32(interior, 0, 16);
Console.WriteLine(obj.A);
gc.Free(); // release the pinned handle once the address is no longer needed
I think this actually is quite portable, but still needs to pin the object (there is InternalAddrOfPinnedObject defined in GCHandle, but even if that doesn't check whether the handle is actually pinned, the returned value may not be valid if it was used on a non-pinned object).
Still, the technique Span uses seems like the most portable way of doing that, since a lot of the underlying work is done in pure CIL (like reference arithmetic).
When using the unsafe or fixed keyword in C#, you can define pointers to unmanaged types, like byte*, int*, etc. You can also define a pointer to any struct that only contains unmanaged types, for example:
namespace a
{
struct MyStruct
{
int value1;
int value2;
}
class b<T>
{
unsafe void SomeMethod()
{
MyStruct* ptr;
}
}
}
However, if the struct is defined within a generic class definition, I get error CS0208: Cannot take the address of, get the size of, or declare a pointer to a managed type. What is the reason for this restriction?
UPDATE: This error only occurs if the containing class is a generic. I still see no reason for the error - the compiler can see that the struct will always contain unmanaged types, as it doesn't reference the generic type T.
namespace a
{
class b<T>
{
struct MyStruct
{
int value1;
int value2;
}
unsafe void SomeMethod()
{
MyStruct* ptr; // gives a compiler error
}
}
}
NOTE: It seems like this feature is being added to C# in a future version: see this issue on GitHub.
I've edited your code example so that it can actually reproduce the error.
The issue here is that, while the struct appears to be a legal unmanaged type, by nesting it in a generic type, it becomes a "constructed type", which is considered to be a managed type. This is because the full type of your struct actually includes the type parameter and generic types are always managed types. I.e. the type isn't just MyStruct, but rather a.b<T>.MyStruct where T is some type.
From the C# 5 language specification, "10.3.8.6 Nested types in generic classes":
Every type declaration contained within a generic class declaration is implicitly a generic type declaration.
"4.4 Constructed types" reads:
A type-name might identify a constructed type even though it doesn’t specify type parameters directly. This can occur where a type is nested within a generic class declaration, and the instance type of the containing declaration is implicitly used for name lookup…In unsafe code, a constructed type cannot be used as an unmanaged-type.
And from "18.2 Pointer types":
…the referent type of a pointer must be an unmanaged-type.
An unmanaged-type is any type that isn’t a reference-type or constructed type, and doesn’t contain reference-type or constructed type fields at any level of nesting.
In other words, the language specification makes it clear both that MyStruct is a "constructed type", and that you aren't allowed to have pointers to constructed types.
As for why the specification makes these restrictions, I'm not the language designer and so I can't provide a definitive answer on that. However, to me it seems safe to assume that the main issue here is that for a constructed type, it is theoretically possible for the type to not be verifiable at compile time as being safe for unsafe code.
In your example, the type parameter T is not used in MyStruct. But it could be, and that would be obviously bad in the unsafe pointer context.
I intuitively would guess that it's theoretically possible for the compiler to do additional analysis to verify MyStruct can be treated as a strictly unmanaged type, but a) I could easily be wrong about that (language designers and compiler writers know a lot more about what could go wrong in situations like this than I would), and b) even if it's theoretically possible, it would be an additional and significant complication in the language specification and the writing of any C# compiler.
That latter point is IMHO justification enough for the language designers to just rule it out. After all, many if not most types nested in a generic type would be using the generic type parameter anyway, so the usefulness of such additional analysis and leniency is probably limited.
Is int (aka Int32) an object or a primitive in .NET? (I'm not asking about int?.)
I hit F12 on the reserved word int and got:
public struct Int32 : IComparable, IFormattable, IConvertible, IComparable<int>, IEquatable<int>
{ ... }
It doesn't inherit from Object. Does that mean that int is a primitive?
Everything in C# inherits from object, including int.
From msdn:
Int32 is an immutable value type that represents signed integers
and
Both reference and value types are derived from the ultimate base
class Object.
Int32 is a struct, which is like a type (compile time) and not an object (run time). So you can't say "Int32 is an object", but you could say "Int32 inherits from object".
A struct is a ValueType and a ValueType derives from object.
int and Int32 are synonyms, where Int32 is better suited to operations where the reader cares about the length in bits (bit-fiddling operations, overflow situations, etc.).
According to this MSDN site, there are 15 built-in types, of which 2 are classes (object and string) and the rest are primitives:
bool - System.Boolean
byte - System.Byte
sbyte - System.SByte
char - System.Char
decimal - System.Decimal
double - System.Double
float - System.Single
int - System.Int32
uint - System.UInt32
long - System.Int64
ulong - System.UInt64
object - System.Object
short - System.Int16
ushort - System.UInt16
string - System.String
The primitive types are the ones identified through keywords, so yes, int is a primitive type.
The primitive types can also be written as literals.
However, the underlying type that the keyword identifies is System.Int32, which is a regular struct declared in the System namespace.
This is a value type, not a reference type (or object).
MSDN - "The primitive types are identified through keywords, which are aliases for predefined types in the System namespace. A primitive type is completely indistinguishable from the type it aliases: writing the reserved word int is exactly the same as writing System.Int32.
Because a primitive type aliases a regular type, every primitive type has members. For example, Integer has the members declared in System.Int32. Literals can be treated as instances of their corresponding types."
So basically int and Int32 are synonymous. You would be inclined to use int where you just need 'an integer', whereas Int32 shows the size explicitly, so future maintainers will know it's safe to enlarge an int if appropriate but should take care when changing Int32 variables in the same way. The resulting code is identical either way; the only difference is in the readability, or presentation, of the code.
You can read Applied .NET Framework Programming - the author Jeffrey Richter makes a good example of using the full type names. Here are the main things that I remembered:
Type names can vary between .NET languages. For example, in C#, long maps to System.Int64 while in C++ with managed extensions, long maps to Int32. Since languages can be mixed-and-matched while using .NET, you can be sure that using the explicit class name will always be clearer, no matter the reader's preferred language.
int in C# is an alias for Int32, and they behave exactly the same. One would usually use Int32, instead of int, for readability and explicitness. Unfortunately, your question can't really be answered with a simple yes or no: int implicitly derives from the ValueType type, which itself derives from the Object type. And so, in C#, all types do in fact derive from the Object type.
But since user code cannot explicitly derive from ValueType (it is a special abstract class), the compiler needs to, and does, inherently know that int implicitly inherits from ValueType. So int does derive from the Object type, just not explicitly.
Though int is not a reference type, it can still be treated like one (like a type that derives explicitly from the Object type) through a process called boxing.
It is a bit confusing at first. Everywhere online, and in books, people say that with regards to C#: All Types derive from the Object Type, which is true, but in the case of certain ValueType Types they just don’t inherit explicitly.
Furthermore, ints, as well as other ValueTypes, are constructed using Structs - not Classes in the way Reference types are.
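The inheritance chain and boxing described above can be checked with reflection. A small sketch (`TypeDemo` is a hypothetical name); note that the runtime's own `Type.IsPrimitive` property is a separate notion from the "keyword" definition of primitive used in some of the answers above, so the two do not agree for every type:

```csharp
using System;

static class TypeDemo
{
    static void Main()
    {
        // int is just another name for System.Int32.
        Console.WriteLine(typeof(int) == typeof(Int32));                  // True

        // Int32 is a value type: Int32 -> ValueType -> Object.
        Console.WriteLine(typeof(int).IsValueType);                       // True
        Console.WriteLine(typeof(int).BaseType == typeof(ValueType));     // True
        Console.WriteLine(typeof(ValueType).BaseType == typeof(object));  // True

        // Boxing lets an int be treated as an object; unboxing copies it back out.
        object boxed = 42;
        Console.WriteLine(boxed is int);                                  // True
        int unboxed = (int)boxed;
        Console.WriteLine(unboxed);                                       // 42

        // The CLR's definition of "primitive" covers int but not decimal or string.
        Console.WriteLine(typeof(int).IsPrimitive);                       // True
        Console.WriteLine(typeof(decimal).IsPrimitive);                   // False
        Console.WriteLine(typeof(string).IsPrimitive);                    // False
    }
}
```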
The delegates in C# offer similar functionality as function pointers in C. I heard someone saying "C# delegates are actually better than function pointers in C". How come? Please explain with an example.
"Better" is subjective -- but the main differences are:
Type safety. A delegate is not only guaranteed to refer to a valid method, it is guaranteed to refer to a method with the correct signature.
It's a bound method pointer -- that is, the delegate can point to a specific object on which to call the delegate. Thus, a Func<string> delegate could refer to alice.GetName or bob.GetName rather than just Person.GetName. This might be similar to C++ "pointer to member" -- I'm not sure.
In addition, the C# language supports closures through delegates to anonymous methods and lambda expressions -- i.e. capturing local variables of the declaring procedure, which delegate can reference when it later gets executed. This isn't strictly speaking a feature of delegates -- it's enabled by the C# compiler doing some magic on anonymous methods and lambda expressions -- but it's still worth mentioning because it enables a lot of the functional idioms in C#.
EDIT: As CWF notes in comments, another possible advantage of C# delegates is that the delegate type declarations are easier for many people to read. This may be a matter of familiarity and experience, of course.
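The points above (signature checking, bound targets, and closures) can be sketched as follows. The `Person` class and its `GetName` method are assumptions made for illustration; `Func<string>` is used because `GetName` is assumed to take no arguments and return a string.

```csharp
using System;

class Person
{
    private readonly string name;
    public Person(string name) { this.name = name; }
    public string GetName() => name;
}

static class DelegateDemo
{
    static void Main()
    {
        var alice = new Person("Alice");
        var bob = new Person("Bob");

        // A delegate carries both the method and the object to call it on.
        Func<string> getName = alice.GetName;
        Console.WriteLine(getName()); // Alice
        getName = bob.GetName;
        Console.WriteLine(getName()); // Bob

        // Type safety: a mismatched signature is rejected at compile time.
        // Func<int> wrong = alice.GetName; // does not compile

        // Closure: the lambda captures the local variable 'counter'.
        int counter = 0;
        Action increment = () => counter++;
        increment();
        increment();
        Console.WriteLine(counter); // 2
    }
}
```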
Pointers can always point to the wrong place :) I.e., a function pointer can point to a non-function or to an arbitrary place in memory.
But in terms of functionality, function pointers can do anything that delegates can do.
One thing that a delegate provides that a C/C++ function pointer doesn't is type safety. That is, in C/C++, you can shove a function pointer into a function pointer variable declared with the wrong function signature (or even an int, a double, or worse, with appropriate coaxing), and the compiler will be happy to produce code that calls the function completely incorrectly. In C#, the type signature of the function must match the type signature of the delegate and also the way the delegate is ultimately called.
Many people refer to C# delegates as more "type-safe" than C++ function pointers, and I find that misleading. In reality they are no more type-safe than C++'s function pointers are. Here is some example C++ code (compiled by MSVS 2005 SP1):
typedef int (*pfn) (int);
int f (int) {
return 0;
}
double d (int) {
return 1;
}
int main()
{
pfn p=f; // OK
p=d; // error C2440: '=' : cannot convert from 'double (__cdecl *)(int)' to 'pfn'
p=(pfn)d;
}
So as is seen from the example above unless one uses "dirty hacks" to "shut up" the compiler the type mismatch is easily detected and the compiler's message is easy to understand. So that is type-safety as I understand it.
Regarding the "boundness" of member function pointers: indeed, in C++ a pointer-to-member is not bound; the member function pointer has to be applied to a variable of a type that matches the member pointer's signature. An example:
class A {
public:
int f (int) {
return 2;
}
};
typedef int (A::*pmfn) (int);
int main()
{
pmfn p=&A::f; // standard C++ requires &A::f without parentheses
// Now call it.
A *a=new A;
(a->*p)(0); // Calls A::f
}
Again, everything is perfectly type safe.
Is this function declaration in C#:
void foo(string mystring)
the same as this one in C:
void foo(char *)
i.e. In C#, does the called function receive a pointer behind the scenes?
In this specific instance, it is more like:
void foo(const char *);
.NET strings are immutable, and what the function receives is a reference to the string object. In general, C# receives a pointer or reference to an object behind the scenes.
There are pointers behind the scenes in C#, though they are more like C++'s smart pointers, so the raw pointers are encapsulated. A char* isn't really the same as System.String since a pointer to a char usually means the start of a character array, and a C# string is an object with a length field and a character array. The pointer points to the outer structure which points into something like a wchar_t array, so there's some indirection with a C# string and wider characters for Unicode support.
No. In C# (and all other .NET languages) the String is a first-class data type. It is not simply an array of characters. You can convert back and forth between them, but they do not behave the same. There are a number of string manipulation methods (like "Substring()" and "StartsWith") that are available to the String class, which don't apply to arrays in general, which an array of characters is simply an instance of.
Essentially, yes. In C#, string (actually System.String) is a reference type, so when foo() is called, it receives a pointer to the string in the heap.
For value types (int, double, etc.), the function receives a copy of the value. For other objects, it's a reference pointing to the original object.
Strings are special because they are immutable. Technically it means it will pass the reference, but in practice it will behave pretty much like a value type.
You can force value types to pass a reference by using the ref keyword:
public void Foo(ref int value) { value = 12; }
public void Bar()
{
int val = 3;
Foo(ref val);
// val == 12
}
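The same distinction applies to strings: without ref, the method gets its own copy of the reference, so reassigning the parameter does not affect the caller. A minimal sketch (`StringDemo` and its methods are hypothetical names):

```csharp
using System;

static class StringDemo
{
    // The parameter is a copy of the caller's reference; reassigning it is local.
    public static void Reassign(string s) { s = "changed"; }

    // With ref, the method operates on the caller's variable itself.
    public static void ReassignByRef(ref string s) { s = "changed"; }

    static void Main()
    {
        string text = "original";

        Reassign(text);
        Console.WriteLine(text); // original

        ReassignByRef(ref text);
        Console.WriteLine(text); // changed
    }
}
```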
No; in C#, a string is Unicode.
In C# it is not called a pointer, but a reference.
If you mean whether the method will be allowed to access the character data, the answer is yes.
Yes, because a string is of dynamic size, so there must be heap memory behind the scenes.
However, they are NOT the same.
In C the pointer points to a string that may also be used elsewhere, so changing it will affect those other places.
Anything that is not a "value type", which essentially covers enums, booleans, and built-in numeric types, will be passed "by reference", which is arguably the same as the C/C++ mechanism of passing by reference or pointer. Syntactically and semantically it is essentially identical to C/C++ passing by reference.
Note, however, that in C# strings are immutable, so even though it is passed by reference you can't edit the string without creating a new one.
Also note that you can't pass an argument as "const" in C#, regardless whether it is a value type or a reference type.
While those are indeed equivalent in a semantic sense (i.e. the code is doing something with a string), C#, like Java, keeps pointers completely out of its everyday use, relegating them to areas such as transitions to native OS functions - even then, there are framework classes which wrap those up nicely, such as SafeFileHandle.
Long story short, don't go out of your way thinking of pointers in C#.
As far as I know, all classes in C# (not sure about the others) are reference types.