When using the unsafe or fixed keyword in C#, you can define pointers to unmanaged types, such as byte*, int*, etc. You can also define a pointer to any struct that contains only unmanaged types, for example:
namespace a
{
struct MyStruct
{
int value1;
int value2;
}
class b<T>
{
unsafe void SomeMethod()
{
MyStruct* ptr;
}
}
}
However, if the struct is defined within a generic class definition, I get error CS0208: Cannot take the address of, get the size of, or declare a pointer to a managed type. What is the reason for this restriction?
UPDATE: This error only occurs if the containing class is a generic. I still see no reason for the error - the compiler can see that the struct will always contain unmanaged types, as it doesn't reference the generic type T.
namespace a
{
class b<T>
{
struct MyStruct
{
int value1;
int value2;
}
unsafe void SomeMethod()
{
MyStruct* ptr; // gives a compiler error
}
}
}
NOTE: It seems this feature is planned for a future version of C#: see this issue on GitHub.
I've edited your code example so that it can actually reproduce the error.
The issue here is that, while the struct appears to be a legal unmanaged type, by nesting it in a generic type, it becomes a "constructed type", which is considered to be a managed type. This is because the full type of your struct actually includes the type parameter and generic types are always managed types. I.e. the type isn't just MyStruct, but rather a.b<T>.MyStruct where T is some type.
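As a loose analogy only (C++ templates are not C# generics, and the names below simply mirror the question), here is a small C++ sketch showing that a type nested in a generic type is really a family of types, one per type argument, even when the nested type never mentions T:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical names mirroring the question: b<T> with a nested MyStruct.
template <typename T>
struct b {
    struct MyStruct {
        int value1;
        int value2;
    };
};

// MyStruct never mentions T, yet each instantiation of b yields a distinct nested type.
static_assert(!std::is_same<b<int>::MyStruct, b<double>::MyStruct>::value,
              "nested types of different instantiations are different types");

// Helper: true when b<A>::MyStruct and b<B>::MyStruct are the same type.
template <typename A, typename B>
constexpr bool same_nested() {
    return std::is_same<typename b<A>::MyStruct, typename b<B>::MyStruct>::value;
}

void demo_nested() {
    // Extra parentheses keep the template comma out of the assert macro's argument list.
    assert((same_nested<int, int>()));
    assert((!same_nested<int, double>()));
}
```

So the "full name" of the nested type always carries the outer type argument along, which is the point the answer makes about a.b<T>.MyStruct.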
From the C# 5 language specification, "10.3.8.6 Nested types in generic classes":
Every type declaration contained within a generic class declaration is implicitly a generic type declaration.
"4.4 Constructed types" reads:
A type-name might identify a constructed type even though it doesn’t specify type parameters directly. This can occur where a type is nested within a generic class declaration, and the instance type of the containing declaration is implicitly used for name lookup…In unsafe code, a constructed type cannot be used as an unmanaged-type.
And from "18.2 Pointer types":
…the referent type of a pointer must be an unmanaged-type.
An unmanaged-type is any type that isn’t a reference-type or constructed type, and doesn’t contain reference-type or constructed type fields at any level of nesting.
In other words, the language specification makes it clear both that MyStruct is a "constructed type", and that you aren't allowed to have pointers to constructed types.
As for why the specification makes these restrictions, I'm not the language designer and so I can't provide a definitive answer on that. However, to me it seems safe to assume that the main issue here is that for a constructed type, it is theoretically possible for the type to not be verifiable at compile time as being safe for unsafe code.
In your example, the type parameter T is not used in MyStruct. But it could be, and that would be obviously bad in the unsafe pointer context.
I intuitively would guess that it's theoretically possible for the compiler to do additional analysis to verify MyStruct can be treated as a strictly unmanaged type, but a) I could easily be wrong about that (language designers and compiler writers know a lot more about what could go wrong in situations like this than I would), and b) even if it's theoretically possible, it would be an additional and significant complication in the language specification and the writing of any C# compiler.
That latter point is IMHO justification enough for the language designers to just rule it out. After all, many if not most types nested in a generic type would be using the generic type parameter anyway, so the usefulness of such additional analysis and leniency is probably limited.
I've been browsing the .NET Framework Reference Source, just for the fun of it, and found something I don't understand.
There is a Int32.cs file with C# code for Int32 type. And somehow that seems strange to me. How does the C# compiler compile code for Int32 type?
public struct Int32: IComparable, IFormattable, IConvertible {
internal int m_value;
// ...
}
But isn't this illegal in C#? If int is only an alias for Int32, it should fail to compile with Error CS0523:
Struct member 'struct2 field' of type 'struct1' causes a cycle in the struct layout.
Is there some magic in the compiler, or am I completely off track?
isn't this illegal in C#? If "int" is only alias for "Int32" it should fail to compile with error CS0523. Is there some magic in the compiler?
Yes; the error is deliberately suppressed in the compiler. The cycle checker is skipped entirely if the type in question is a built-in type.
Normally this sort of thing is illegal:
struct S { S s; int i; }
In that case the size of S is undefined because whatever the size of S is, it must be equal to itself plus the size of an int. There is no such size.
struct S { S s; }
In that case we have no information from which to deduce the size of S.
struct Int32 { Int32 i; }
But in this case the compiler knows ahead of time that System.Int32 is four bytes because it is a very special type.
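The same size argument can be seen in C++, which is used here only as an illustration (the Node and Broken names are made up). A struct cannot contain itself by value, but it can contain a pointer to itself, because a pointer has a known size regardless of what it points to:

```cpp
#include <cassert>
#include <cstddef>

// A struct cannot contain itself by value: the line below would not compile,
// because Broken is still an incomplete type at that point.
// struct Broken { Broken inner; int i; };

// Holding a pointer instead breaks the cycle; a pointer has a fixed size.
struct Node {
    Node* next;
    int value;
};

// Size of Node as computed by the compiler.
constexpr std::size_t node_size() { return sizeof(Node); }

static_assert(sizeof(Node) >= sizeof(Node*) + sizeof(int),
              "Node holds at least a pointer and an int");

void demo_node_size() {
    assert(node_size() == sizeof(Node));
}
```

The exact size depends on the platform's pointer width and padding, so the sketch only asserts a lower bound.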
Incidentally, the details of how the C# compiler (and, for that matter, the CLR) determines when a set of struct types is cyclic is extremely interesting. I'll try to write a blog article about that at some point.
int is an alias for Int32, but the Int32 struct you are looking at is simply metadata, it is not a real object. The int m_value declaration is possibly there only to give the struct the appropriate size, because it is never actually referenced anywhere else (which is why it is allowed to be there).
So, in other words, the compiler kind of saves this from being a problem. There is a discussion on the topic in the MSDN Forums.
From the discussion, here is a quote from the chosen answer that helps to try to determine how the declaration is possible:
while it is true that the type contains an integer m_value field - the
field is never referenced. In every supporting method (CompareTo,
ToString, etc), "this" is used instead. It is possible that the
m_value fields only exist to force the structures to have the
appropriate size.
I suspect that when the compiler sees "int", it translates it into "a
reference to System.Int32 in mscorlib.dll, to be resolved later", and
since it's building mscorlib.dll, it does end up with a cyclical
reference (but not one that can ever cause problems, because m_value
is never used). If this assumption is correct, then this trick would
only work for special compiler types.
Reading further, it can be determined that the struct is simply metadata, and not a real object, so it is not bound by the same recursive definition constraints.
I am learning the basics of C++, coming from the .NET world (C#).
One topic I found interesting was the const keyword and its usage with pointers (const pointer/pointer to const).
I'd like to know if there's any C# language equivalent of the const pointer/pointer to const that C++ has?
(I know C# doesn't have pointers; I am considering references to be the pointer-like types in C#.)
Also, out of interest, if there's no such equivalent, what were the decisions behind not including such a feature?
There is no direct equivalent to passing references as 'const' in C#, but there are alternative ways to accomplish its purpose. The most common way to do this is to make your reference class either completely immutable (once constructed, its state should never change) or pass it as an immutable public interface. The latter is the closest to the intention of the 'const' parameter contract (I'm giving you a reference to something so you can use it, but I'm asking you not to change it.) A poorly-behaved client could 'cast away' the public interface to a mutable form, of course, but it still makes the intention clear. You can 'cast away' const in C++ as well, though this is rarely a good idea.
One other thing in C++ is that you would often prefer to pass as const when you knew that the lifetime of the reference you were passing was limited in scope. C++ often follows the pattern where objects are created and destroyed on the stack within method scope, so any references to those objects should not be persisted outside that scope (since using them after they fall out of scope could cause really nasty stack corruption crashes.) A const reference should not be mutated, so it's a strong hint that storing it somewhere to reference later would be a bad idea. A method with const parameters is promising that it's safe to pass these scoped references. Since C# never allows storing references to objects on the stack (outside of parameters), this is less of a concern.
The concept of constant objects (i.e. readonly) in C# (or Java for that matter) corresponds approximately to object *const in C++, i.e. a constant pointer to a non-constant object.
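For readers who want the two C++ forms side by side, here is a minimal sketch (Widget and the function names are invented for illustration). The second function is the "object *const" case that the paragraph above compares to readonly:

```cpp
#include <cassert>

struct Widget { int x; };   // hypothetical type, for illustration only

// Pointer to const: the pointee is read-only through p, but p can be reseated.
int read_through_pointer_to_const(const Widget* p, const Widget* other) {
    // p->x = 1;       // would not compile: the pointee is const
    p = other;          // fine: the pointer itself is not const
    return p->x;
}

// Const pointer: p cannot be reseated, but the pointee can be modified.
void write_through_const_pointer(Widget* const p) {
    // p = nullptr;    // would not compile: the pointer is const
    p->x = 42;          // fine: the pointee is not const
}

void demo_const_forms() {
    Widget a{1};
    Widget b{2};
    assert(read_through_pointer_to_const(&a, &b) == 2);
    write_through_const_pointer(&a);
    assert(a.x == 42);
}
```

Combining both (const Widget* const) gives a pointer that can neither be reseated nor used to modify the object.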
There are several reasons for it - for one, specifying const correctly and making it useful in the language is quite hard. Taking C++ as an example, you have to define lots of methods twice with only small changes to the signature, there's const_cast, the fact that const is only applied shallowly, etc.
So C# went for the easy solution to make the language simpler - D went the other way with transitive const correctness, etc. as I understand it (never written a single line in D, so take that with a grain of salt).
The usual solution in C#/Java is to have immutable classes, possibly using a builder pattern, or a simple wrapper that prohibits changes (e.g. all the unmodifiable collections that wrap another collection and throw exceptions for the mutating methods like add).
8,000,000 years later, but C# 7.2 uses the "in" keyword which is sort of like const. It tells the compiler to pass a struct or primitive variable by reference (like ref or out) but the method WILL NOT modify the variable.
public void DoSomething(in int variable){
//Whatever
}
is roughly equivalent to C++'s
void Foo::DoSomething(const int& variable){
//Whatever
}
or arguably even
void Foo::DoSomething(int const * const variable){
//Whatever
}
The main reason for doing this in C#, according to MSDN, is to tell the compiler that it can pass the variable by reference since it won't be modified. This allows for potentially better performance when passing large structs.
Regarding constant pointers, see this answer: Difference between const. pointer and reference?
In other words, a reference in C++ is very similar to a const pointer for most applications. However, a reference in C# is closer to a pointer in C++ regarding how they can be used.
Reference-type objects passed as arguments to methods in C# can be reassigned inside the method (say, from object instance A to object instance B); however, the reassignment is visible only within the method, because the reference itself is passed by value. After the method returns, the caller's reference still points to A, and B becomes eligible for garbage collection if nothing else refers to it. In this sense, you can freely pass around references in C# and know that they cannot be made to point to different objects (unless the ref/out keywords are used).
C# example -
using System;
class Foo
{
//Stateful field
public int x;
//Constructor
public Foo()
{
x = 6;
}
}
public class Program
{
public static void Main()
{
var foo = new Foo();
foo.x = 8;
VarTestField(foo);
Console.WriteLine(foo.x);
RefTestField(ref foo);
Console.WriteLine(foo.x);
}
//Object passed by reference, pointer passed by value
static void VarTestField(Foo whatever){
whatever = new Foo();
}
//Object passed by reference, pointer passed by reference
static void RefTestField(ref Foo whatever){
whatever = new Foo();
}
}
Output:
8
6
So no, you cannot declare a constant pointer in C#. Still, using proven design patterns, algorithms, and OOP fundamentals wisely, along with the built in syntax of the language, you can achieve the desired behavior.
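For comparison, the same experiment can be sketched in C++, where the "pointer passed by value" and "pointer passed by reference" cases are spelled out in the parameter types. This is only an illustration; Foo mirrors the C# class above and the function names are made up:

```cpp
#include <cassert>

struct Foo { int x = 6; };   // mirrors the C# Foo above

// Pointer passed by value: only the function's local copy is reseated.
void reseat_by_value(Foo* p, Foo* replacement) {
    p = replacement;
    (void)p;   // silence "set but not used" warnings
}

// Pointer passed by reference: the caller's pointer is reseated.
void reseat_by_reference(Foo*& p, Foo* replacement) {
    p = replacement;
}

void demo_reseat() {
    Foo a;
    a.x = 8;
    Foo b;                 // b.x is 6
    Foo* p = &a;

    reseat_by_value(p, &b);
    assert(p->x == 8);     // caller still sees a

    reseat_by_reference(p, &b);
    assert(p->x == 6);     // caller now sees b
}
```

The asserted values match the 8 and 6 printed by the C# program.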
In the C# language specifications it explicitly states:
Delegates are similar to the concept
of function pointers found in some
other languages, but unlike function
pointers, delegates are
object-oriented and type-safe.
I understand delegates need to be a little more flexible than pointers because .NET moves memory around. That's the only difference I'm aware of, but I am not sure how this would turn a delegate into an OO concept...?
What makes a function pointer not object oriented? Are pointers and function pointers equivalent?
Well, Wikipedia says that "object oriented" means using "features such as data abstraction, encapsulation, messaging, modularity, polymorphism, and inheritance." Lacking a better definition, let's go with that.
Function pointers don't contain data, they don't encapsulate implementation details, they neither send nor receive messages, they are not modular, they are not typically used in a polymorphic manner (though I suppose they could in theory be covariant and contravariant in their return and formal parameter types, as delegates now are in C# 4) and they do not participate in an inheritance hierarchy. They are not self-describing; you can't ask a function pointer for its type because it doesn't have one.
By contrast, delegates capture data -- they hold on to the receiver. They support messaging in the sense that you can "message" a delegate by calling its ToString or GetType or Invoke or BeginInvoke methods to tell it to do something, and it "messages" you back with the result. Delegate types can be restricted to certain accessibility domains if you choose to do so. They are self-describing objects that have metadata and at runtime know their own type. They can be combined with other delegates. They can be used polymorphically as System.MulticastDelegate or System.Delegate, the types from which they inherit. And they can be used polymorphically in the sense that in C# 4 delegate types may be covariant and contravariant in their return and parameter types.
I believe it is because, when you hold a delegate to a member method, the OO framework "knows" you are holding a reference to the holding object, whereas with function pointers, first of all the function isn't necessarily a member method, and second of all, if the function is a member method, the OO framework doesn't know it has to prevent the owning object from being freed.
Function pointers are just memory addresses.
Delegates are objects that have methods and properties:
-BeginInvoke
-DynamicInvoke
-Invoke
-Method
-Target
etc.
I'll explain with C++ examples because it's a language where this problem is present (and solved another way).
A mere function pointer just holds the address of a function, nothing else.
Consider the function
void f(int x) { return; }
Now, a simple function pointer is declared and assigned like this:
void (*fptr)(int) = &f;
And you can use it simply:
fptr(5); // calls f(5)
However, in an object oriented language we usually deal with member functions, not free functions. And this is where things get nasty. Consider the following class:
class C { public: void g(int x) { return; } };
Declaring a function pointer to C::g is done like this:
void (C::*gptr)(int) = &C::g;
The reason why we need a different syntax is that member functions have a hidden this parameter, thus their signature is different.
For the same reason, calling them is problematic. That this parameter needs a value, which means you need to have an instance. Calling a pointer to a member function is done like this:
C c;
(c.*gptr)(5); // calls c.g(5);
Aside from the weird syntax, the real problem with this is that you need to pass the object together with your function pointer when you really just want to pass around one thing.
The obvious idea is to encapsulate the two, and that's what a delegate is. This is why a delegate is considered more OOP. I have no idea why it is considered more type-safe (maybe because you can cast function pointers to void*).
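To make "encapsulate the two" concrete, here is a minimal hand-rolled sketch in C++. It is not how .NET implements delegates; Counter and Delegate are names invented for this example, and the wrapper only supports void(int) member functions:

```cpp
#include <cassert>

struct Counter {                 // hypothetical receiver type
    int total = 0;
    void add(int x) { total += x; }
};

// Minimal bound-method wrapper: stores the receiver and the member pointer together,
// so callers pass around one thing instead of two.
template <typename T>
struct Delegate {
    T* target;
    void (T::*method)(int);
    void operator()(int x) const { (target->*method)(x); }
};

int demo_delegate() {
    Counter c;
    Delegate<Counter> d{&c, &Counter::add};
    d(5);              // calls c.add(5)
    d(7);              // calls c.add(7)
    return c.total;
}

void check_delegate() {
    assert(demo_delegate() == 12);
}
```

Unlike a real delegate, this sketch does nothing to keep the target alive, so the caller has to make sure the object outlives the wrapper.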
BTW, the C++ solution in C++0x is adopted from Boost. It combines std::function and std::bind and works like this (using the instance c from above):
std::function<void (int)> d = std::bind(&C::g, &c, std::placeholders::_1);
d(5); // calls c.g(5);
A function pointer can have no knowledge of the instance it belongs to unless you pass it in explicitly - all function pointers are to static members. A delegate, on the other hand, can be a regular member of the class, and the correct instance of the object will be used when the delegate is invoked.
Suppose one wants to design a general-purpose anyprintf method (called vsanyprintf below) which can behave as fprintf, sprintf, or cprintf [console printf with color support]. One approach would be to have it accept a function that takes a void* and a char, along with a void*, a format string, and a va_list; for each character of output it should then call the passed-in function, passing it the supplied pointer and the character to be output.
Given such a function, one could implement vfprintf, vsprintf, and vcprintf [ignoring their return values for simplicity] via:
void fprint_function(void* data, char ch) { fputc(ch, (FILE*)data); }
void sprint_function(void* data, char ch) { char** p = (char**)data; *((*p)++) = ch; }
void cprint_function(void* data, char ch) { cputchar(ch); }
void vfprintf(FILE *f, const char *fmt, va_list vp)
{
vsanyprintf(fprint_function, (void*)f, fmt, vp);
}
void vsprintf(char *st, const char *fmt, va_list vp)
{
vsanyprintf(sprint_function, (void*)&st, fmt, vp);
}
void vcprintf(const char *fmt, va_list vp)
{
vsanyprintf(cprint_function, (void*)0, fmt, vp);
}
Effectively, the combination of the function pointer and void* behaves as a method. Unfortunately, there's no way for the compiler to ensure that the data which is passed in the void* will be of the form expected by the supplied function. C++ and other object-oriented languages add in compile-time validation of such type consistency.
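A stripped-down, runnable version of the idea might look like the following. It is only a sketch: any_puts is a hypothetical stand-in that does no format parsing and simply forwards each character to the sink, and count_function is an extra sink added for illustration.

```cpp
#include <cassert>
#include <cstring>

// Signature of a character sink: opaque state plus the character to emit.
typedef void (*put_fn)(void* data, char ch);

// Hypothetical stand-in for the formatter: forwards each character of s to the sink.
void any_puts(put_fn put, void* data, const char* s) {
    for (; *s != '\0'; ++s) {
        put(data, *s);
    }
}

// Sink that appends to a caller-provided buffer; data must point to a char*.
void sprint_function(void* data, char ch) {
    char** p = static_cast<char**>(data);
    *((*p)++) = ch;
}

// Sink that counts characters; data must point to an int.
void count_function(void* data, char) {
    ++*static_cast<int*>(data);
}

void demo_sinks() {
    char buf[16];
    char* cursor = buf;
    any_puts(sprint_function, &cursor, "hello");
    *cursor = '\0';
    assert(std::strcmp(buf, "hello") == 0);

    int n = 0;
    any_puts(count_function, &n, "hello");
    assert(n == 5);
    // Nothing stops a caller from passing &n to sprint_function instead:
    // the compiler cannot check that data matches what the sink expects.
}
```

The function pointer plays the role of the method and the void* plays the role of "this", with the pairing left entirely to the caller.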
When I develop in COM, I always see (void**) type conversion as below.
QueryInterface(/* [in] */ REFIID riid,/* [out] */ void** ppInterface)
What's exact meaning of it?
IMHO, it tells the compiler not to enforce type validation, since the type which is pointed by the ppInterface is not known to the client code at compile time.
Thanks~~~
Update 1
I understand it this way:
void* p implies AnyType* p
void ** pp implies pointer to AnyType*
Update 2
If void**pp means "pointer to void*", then what checks does the compiler do when it sees it?
A void ** is a pointer to a void *. This can be used to pass the address of a void * variable that will be used as an output parameter - eg:
void alloc_two(int n, void **a, void **b)
{
*a = malloc(n * 100);
*b = malloc(n * 200);
}
/* ... */
void *x;
void *y;
alloc_two(10, &x, &y);
The reasons why COM uses void** with QueryInterface are somewhat special. (See below.)
Generally, void** simply means a pointer to void*, and it can be used for out parameters, ie. parameters that indicate a place where a function can return a value to. Your comment /* [out] */ indicates that the location pointed to by ppvInterface will be written to.
"Why can parameters with a pointer type be used as out parameters?", you ask? Remember that you can change two things with a pointer variable:
You can change the pointer itself, such that it points to another object. (ptr = ...)
You can modify the pointed-to object. (*ptr = ...)
Pointers are passed to a function by value, ie. the function gets its own local copy of the original pointer that was passed to it. This means you can change the pointer parameter inside the function (1) without affecting the original pointer, since only the local copy is modified. However, you can change the pointed-to object (2) and this will be visible outside of the function, because the copy has the same value as the original pointer and thus references the same object.
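Both mechanisms, and the way an out parameter builds on mechanism (2), can be sketched in a few lines of C++ (all function names here are invented for illustration):

```cpp
#include <cassert>

// (1) Reassigning the parameter changes only the function's local copy.
void reseat_local_copy(int* p, int* other) {
    p = other;
    (void)p;   // silence "set but not used" warnings
}

// (2) Writing through the parameter changes the object the caller sees.
void write_pointee(int* p) {
    *p = 99;
}

// Out parameter: mechanism (2) applied to a pointer, so the caller's void* is filled in.
void get_buffer(void** out) {
    static int storage = 7;   // hypothetical object handed back to the caller
    *out = &storage;
}

void demo_mechanisms() {
    int a = 1;
    int b = 2;
    int* p = &a;

    reseat_local_copy(p, &b);
    assert(p == &a);          // (1) had no effect on the caller's pointer

    write_pointee(p);
    assert(a == 99);          // (2) is visible to the caller

    void* v = nullptr;
    get_buffer(&v);
    assert(*static_cast<int*>(v) == 7);
}
```

In get_buffer the "pointed-to object" is itself a pointer, which is why the parameter needs two levels of indirection.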
Now, about COM specifically:
A pointer to an interface (specified by riid) will be returned in the variable referenced by ppvInterface. QueryInterface achieves this via mechanism (2) mentioned above.
With void**, one * is required to allow mechanism (2); the other * reflects the fact that QueryInterface does not return a newly created object (IUnknown), but an already existing one: In order to avoid duplication of that object, a pointer to that object (IUnknown*) is returned.
If you're asking why ppvInterface has type void** and not IUnknown**, which would seem more reasonable type-safety-wise (since all interfaces must derive from IUnknown), then read the following argument taken from the book Essential COM by Don Box, p. 60 (chapter Type Coercion and IUnknown):
One additional subtlety related to QueryInterface concerns its second parameter, which is of type void **. It is very ironic that QueryInterface, the underpinning of the COM type system, has a fairly type-unsafe prototype in C++ [...]
IPug *pPug = 0;
hr = punk->QueryInterface(IID_IPug, (void**)&pPug);
Unfortunately, the following looks equally correct to the C++ compiler:
IPug *pPug = 0;
hr = punk->QueryInterface(IID_ICat, (void**)&pPug);
This more subtle variation also compiles correctly:
IPug *pPug = 0;
hr = punk->QueryInterface(IID_ICat, (void**)pPug);
Given that the rules of inheritance do not apply to pointers, this alternative definition of QueryInterface does not alleviate the problem:
HRESULT QueryInterface(REFIID riid, IUnknown** ppv);
The same limitation applies to references as to pointers as well. The following alternative definition is arguably more convenient for clients to use:
HRESULT QueryInterface(const IID& riid, void* ppv);
[...] Unfortunately, this solution does not reduce the number of errors [...] and, by eliminating the need for a cast, removes a visual indicator that C++ type safety might be in jeopardy. Given the desired semantics of QueryInterface, the argument types Microsoft chose are reasonable, if not type safe or elegant. [...]
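To see the shape of the problem without real COM, here is a toy sketch. Everything in it is hypothetical: InterfaceId, IPug, ICat, PugCat, and query only imitate the pattern, and real COM uses IUnknown, IIDs, and HRESULT instead. The sketch receives the result in a plain void* and casts afterwards, rather than casting an IPug** to void** as the quoted code does:

```cpp
#include <cassert>

// Hypothetical stand-ins for interface identifiers.
enum InterfaceId { ID_PUG, ID_CAT };

struct IPug { virtual int bark() = 0; virtual ~IPug() {} };
struct ICat { virtual int meow() = 0; virtual ~ICat() {} };

struct PugCat : IPug, ICat {
    int bark() override { return 1; }
    int meow() override { return 2; }

    // Stores the requested interface pointer in *out and returns true,
    // or stores a null pointer and returns false.
    bool query(InterfaceId id, void** out) {
        if (id == ID_PUG) { *out = static_cast<IPug*>(this); return true; }
        if (id == ID_CAT) { *out = static_cast<ICat*>(this); return true; }
        *out = nullptr;
        return false;
    }
};

int demo_query() {
    PugCat obj;
    void* raw = nullptr;
    if (!obj.query(ID_CAT, &raw)) {
        return -1;
    }
    // The caller must pair the id with the matching cast; the compiler cannot check it.
    ICat* cat = static_cast<ICat*>(raw);
    return cat->meow();
}

void check_query() {
    assert(demo_query() == 2);
}
```

Asking for ID_CAT and then casting the result to IPug* would compile just as happily, which is the type-safety hole the quoted passage describes.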
It is just a pointer to void*.
Eg:
Something* foo;
Bar((void**)&foo);
// now foo points to something meaningful
Edit: A possible implementation in C#.
struct Foo { }
static Foo foo = new Foo();
unsafe static void Main(string[] args)
{
Foo* foo;
Bar((void**)&foo);
}
static unsafe void Bar(void** v)
{
fixed (Foo* f = &foo)
{
*v = f;
}
}
Passing by void * also ensures that the pointed-to object cannot be deleted or tampered with (accidentally).
"This implies that an object cannot be deleted using a pointer of type void* because there are no objects of type void."
It's a pointer to the interface pointer you request using this call. Obviously you can request all sorts of interfaces, so it has to be a void pointer. If the interface doesn't exist, the pointer is set to NULL.
edit: Detailed information to be found here: http://msdn.microsoft.com/en-us/library/ms682521(VS.85).aspx
It allows the API to specify that a pointer may be used as an [in-out] parameter in future, but for now, the pointer is unused. (NULL is usually the required value.)
When returning one of many possible types, with no common supertype (such as with QueryInterface), returning a void* is really the only option, and as this needs to be passed as an [out] parameter a pointer to that type (void**) is needed.
not to enforce type validation
Indeed, void* or void** are there to allow the use of different types of pointers, which can be converted to void* to fit the function's parameter types.
Pointer to pointer of unknown interface that can be provided.
Instead of using pointers to pointers, try using a reference to a pointer. It's a bit more C++ than using **.
e.g.
void Initialise(MyType *&pType)
{
pType = new MyType();
}
The delegates in C# offer similar functionality as function pointers in C. I heard someone saying "C# delegates are actually better than function pointers in C". How come? Please explain with an example.
"Better" is subjective -- but the main differences are:
Type safety. A delegate is not only guaranteed to refer to a valid method, it is guaranteed to refer to a method with the correct signature.
It's a bound method pointer -- that is, the delegate can point to a specific object on which to call the method. Thus, a Func<string> delegate could refer to alice.GetName or bob.GetName rather than just Person.GetName. This might be similar to C++ "pointer to member" -- I'm not sure.
In addition, the C# language supports closures through delegates to anonymous methods and lambda expressions -- i.e. capturing local variables of the declaring procedure, which delegate can reference when it later gets executed. This isn't strictly speaking a feature of delegates -- it's enabled by the C# compiler doing some magic on anonymous methods and lambda expressions -- but it's still worth mentioning because it enables a lot of the functional idioms in C#.
EDIT: As CWF notes in comments, another possible advantage of C# delegates is that the delegate type declarations are easier for many people to read. This may be a matter of familiarity and experience, of course.
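For readers coming from C or C++, the closure point can be illustrated with a C++ lambda. This is only an analogy to what C# lambdas do, and make_counter is a made-up name: a plain function pointer has nowhere to keep the captured variable, whereas a callable object can carry it along.

```cpp
#include <cassert>
#include <functional>

// Returns a callable that remembers its own running count.
// The lambda captures count by value; mutable lets it update its private copy.
std::function<int()> make_counter(int start) {
    int count = start;
    return [count]() mutable { return ++count; };
}

void demo_counter() {
    std::function<int()> next = make_counter(10);
    assert(next() == 11);
    assert(next() == 12);

    // Each counter owns independent state.
    std::function<int()> other = make_counter(0);
    assert(other() == 1);
    assert(next() == 13);
}
```

Note one difference from C#: this lambda copies the captured variable, while a C# closure captures the variable itself.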
Pointers can always point to the wrong place :) i.e., a pointer can point to a non-function or to an arbitrary place in memory.
But in terms of functionality, function pointers can do anything that delegates can do.
One thing that a delegate provides that a C/C++ function pointer doesn't is type safety. That is, in C/C++, you can shove a function pointer into a function pointer variable declared with the wrong function signature (or even an int, a double, or worse, with appropriate coaxing), and the compiler will be happy to produce code that calls the function completely incorrectly. In C#, the type signature of the function must match the type signature of the delegate and also the way the delegate is ultimately called.
Many people refer to C# delegates as more "type-safe" than C++ function pointers, and I really find that misleading. In reality they are no more type-safe than C++'s function pointers are. Some example C++ code (compiled by MSVS 2005 SP1):
typedef int (*pfn) (int);
int f (int) {
return 0;
}
double d (int) {
return 1;
}
int main()
{
pfn p=f; // OK
p=d; // error C2440: '=' : cannot convert from 'double (__cdecl *)(int)' to 'pfn'
p=(pfn)d;
}
So as is seen from the example above unless one uses "dirty hacks" to "shut up" the compiler the type mismatch is easily detected and the compiler's message is easy to understand. So that is type-safety as I understand it.
Regarding the "boundness" of the member function pointers. Indeed, in C++ pointer-to-member is not bound, the member function pointer has to be applied to a type variable that matches the member pointer's signature. An example:
class A {
public:
int f (int) {
return 2;
}
};
typedef int (A::*pmfn) (int);
int main()
{
pmfn p=&A::f; // standard C++ requires &A::f without parentheses
// Now call it.
A *a=new A;
(a->*p)(0); // Calls A::f
}
Again, everything is perfectly type safe.