Before I ask my question, please take a look at this example function:
DateTime.TryParse("01/01/2000", out oDate)
Why do I need to specify the out keyword? Shouldn't the compiler know this from the function's definition?
I'm asking this out of pure curiosity in the hope that I will learn something new about the compiler.
I should also clarify that I'm asking about the C# .NET 3.5 compiler in particular.
The out keyword could be implied by the compiler but my understanding is that the C# team decided to make the out keyword explicitly required by the caller of the function to increase visibility as to the nature of the parameter.
The compiler does know; you may not. It's a way of letting you know that the parameter you are passing can change in the function you are passing it to.
It's not about what the compiler knows, it's all about making sure the developer realizes this call can and will change the value of variable X.
A lot of this has its roots in C++, where a reference parameter needs no call-site marker. It's impossible to look at a C++ call and know exactly what it will do. Parameters passed by reference and by value in C++ have huge differences in semantics.
Yeah the compiler could figure it out, but this way you know that it is going to modify the variable you are passing in.
The C# language has a lot of what I would call safety nets that explicitly tell the programmer what is going on. A couple of examples are:
No fall through in switch statements.
You can't accidentally assign a value in an if statement: if(x = 5) is a compile-time error instead of evaluating to true.
http://msdn.microsoft.com/en-us/library/t3c3bfhx(VS.80).aspx
"The out keyword causes arguments to be passed by reference. This is similar to the ref keyword, except that ref requires that the variable be initialized before being passed. To use an out parameter, both the method definition and the calling method must explicitly use the out keyword."
Since DateTime.TryParse does not require oDate to be initialized, you must pass it with the out keyword.
OK, I'm not a C# expert, so if I mess up will somebody please correct me?
There are two ways to pass a parameter to a C# function: by value and by reference. The big difference here is whether modifying the parameter inside the function modifies the variable used to call it. This is not something I'd trust the compiler to decide for itself.
Since you want oDate to be a variable passed in from the caller, and changed, you want it passed by reference.
The other question is whether it should be initialized or not. C# likes to catch when variables are used while uninitialized, since that's almost always an error. In this case, you might well just declare what you're passing in, and use TryParse() to give it its first value. This is a perfectly legitimate technique, so the compiler should allow it. This is another thing I wouldn't trust the compiler to get right. (I assume the compiler also checks to make sure an out parameter is initialized before use in TryParse().)
So, "out" serves two purposes. It establishes that the parameter is passed in by reference, and that it is expected to be initialized inside the function. Neither of these can be determined by the compiler.
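To make the call-site difference concrete, here is a minimal sketch. The helper names (OutDemo, ParseOrDefault, TryHalve, Increment) are made up for illustration; only DateTime.TryParse comes from the question.

```csharp
using System;
using System.Globalization;

public static class OutDemo
{
    // Hypothetical helper: an 'out' parameter must be assigned by the method
    // before it returns, so the caller may pass an unassigned variable.
    public static bool TryHalve(int input, out int half)
    {
        half = input / 2;
        return input % 2 == 0;
    }

    // Hypothetical helper: a 'ref' parameter must already be assigned by the caller.
    public static void Increment(ref int counter)
    {
        counter++;
    }

    // Mirrors the call from the question, using the invariant culture so the
    // result does not depend on the machine's regional settings.
    public static DateTime ParseOrDefault(string text)
    {
        DateTime oDate; // deliberately unassigned: 'out' permits this
        bool ok = DateTime.TryParse(text, CultureInfo.InvariantCulture,
                                    DateTimeStyles.None, out oDate);
        return ok ? oDate : DateTime.MinValue;
    }
}
```

Leaving out the keyword at either call site (for example TryHalve(10, half)) is rejected by the compiler, which is the visibility the answers above describe.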
How does C# Compiler and Virtual Machine (.NET Common-Language-Runtime) handle lambda functions in C#? Are they generally translated into normal functions but marked as "anonymous" in a sense in the compiler and then treated as normal functions or is there a different approach?
It probably is too broad. That said, some basic information can be easily provided:
First, "anonymous" simply means that there is no name by which you can refer to the method in your code. Any anonymous method, whether declared using the older delegate syntax or the newer lambda syntax, still winds up being compiled as a method, with an actual name, into some class (either your own or a new one the compiler creates for the purpose…see below). You simply don't have access to the name in your own code. Other than that, it is fundamentally just like any other method.
The main exception to the above is how it will deal with variable capturing. If the anonymous method captures a local variable, then a new class (also hidden from your code) is declared for the purpose and that variable winds up being stored in the class instead of as an actual local variable, and the anonymous method of course winds up being in that class too.
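As a rough illustration of the capturing case, the sketch below pairs a lambda with a hand-written approximation of what the compiler might produce. The names DisplayClass and Lambda are invented here; the real generated names are not valid C# identifiers and the exact shape varies by compiler version.

```csharp
using System;

public static class ClosureSketch
{
    // What you write: the lambda captures the local variable 'count'.
    public static Func<int> MakeCounter()
    {
        int count = 0;
        return () => ++count;
    }

    // Hand-written stand-in for the hidden class the compiler declares.
    private sealed class DisplayClass
    {
        public int count;                        // the captured local is stored as a field
        public int Lambda() { return ++count; }  // the lambda body becomes an ordinary method
    }

    // Approximately what MakeCounter turns into after the rewrite.
    public static Func<int> MakeCounterLowered()
    {
        var closure = new DisplayClass();
        closure.count = 0;
        return new Func<int>(closure.Lambda);
    }
}
```

Both versions return a delegate that yields 1, then 2, and so on, because each call to the factory creates its own storage for count.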
Finally, note that the lambda syntax is used not just for anonymous methods, but also to declare instances of the Expression class. In that scenario, some of the handling is similar, but of course you don't get a type compiled into your code the way you would for an anonymous method.
As others have suggested, if you need more detail than that, then StackOverflow probably is not the best place for the answer (or, if it is the best place for the answer, surely it's already been asked and answered and you just need to search for the answer). You can use ildasm.exe, dotPeek, etc. to inspect the actual generated code, you can read the C# specification, or you can search the web for other details.
I found this question on Stack Overflow while googling, but it had been deleted, so I am posting it again.
As I can't find the LcidAttribute or RetvalAttribute in BCL, I guess C# hasn't provided the support for locale identifier parameter and return value parameter.
Is that it?
Thanks all.
They are associated with the ParameterAttributes enumeration, which is used in metadata for the parameters of a method. Only a compiler can emit the [modopt].
I do not know of a compiler that actually does this. I have a decent guess at the background, though: these attributes are also used in IDL, the interface description language used in COM and RPC. Having this option ensures that .NET metadata can also describe the kind of declarations that are written in IDL and can appear in type libraries.
The [lcid] attribute is described here. It doesn't actually describe usage and I've never used it myself. No real idea why you'd use it.
The [retval] attribute is described here. It is very important in COM automation method declarations, where it marks the parameter that returns the method value. It is also used by tools like Tlbimp.exe, which rewrites the method to make that parameter the return value.
I'm working with a GIS based math library that wraps lower C/C++ code in C#. Many of the parameters are pass by reference for the sake of receiving multiple outputs. If I only want some of the outputs, how can I ignore the other parameters? Is the best solution to create a dummy variable and pass it by reference and ignore its output?
Is the best solution to create a dummy variable and pass it by reference and ignore its output?
Yes, that's what I do.
I usually just create an object in my code like
object NotNeeded = null;
or something similar that says that it's effectively an unnecessary parameter, and then use that repeatedly. I'm not sure whether or not that'll work, though, because I'm not sure what the GIS library is doing on the other side. If it needs an actual non-null value for each one, that might be problematic.
You have a few choices:
dummies
wrapper methods
change the Interop Imports. Your ref parameters are most likely pointers in C++, and if they allow null then you could change the import to use pointers (IntPtr) and pass null / IntPtr.Zero.
But a few dummies are probably the best (easiest to read) option unless you have a great many calls.
The "best" is the "only" compile-time method that I am aware of: foo(bar, ref dummy) -- but feel free to wrap away these dummy variables if it makes sense.
If there are instance methods, creating appropriate Extension Methods wrappers can help hide the "useless" dummy variables in a relatively seamless fashion.
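A wrapper of the kind suggested above might look like the following sketch. ProjectPoint is a hypothetical stand-in for an imported library call, not a real GIS API; the wrapper name ProjectPointXY is also made up.

```csharp
using System;

public static class GisWrapperSketch
{
    // Hypothetical stand-in for a wrapped native call that reports
    // several results through ref parameters.
    public static void ProjectPoint(double lat, double lon,
                                    ref double x, ref double y, ref double scale)
    {
        x = lon * 2.0;
        y = lat * 2.0;
        scale = 1.0;
    }

    // Wrapper that keeps the dummy variable local, so callers who only
    // want x and y never have to declare it.
    public static void ProjectPointXY(double lat, double lon, ref double x, ref double y)
    {
        double ignoredScale = 0.0; // dummy; its value is discarded after the call
        ProjectPoint(lat, lon, ref x, ref y, ref ignoredScale);
    }
}
```

The dummy still exists, but it is declared once inside the wrapper instead of at every call site.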
Happy coding.
I might say I'm getting quite familiar with Code Contracts: I've read and understood most of the user manual and have been using them for quite a while now, but I still have questions. When I search SO for 'code contracts unproven' there are quite a few hits, all asking why their specific statement couldn't be statically proven. Although I could do the same and post my specific scenario (which is btw:
),
I'd much rather understand why any Code Contract condition can or can't be proven. Sometimes I'm impressed with what it can prove, and sometimes I'm... well... to say it politely: definitely not impressed. If I want to understand this, I'd like to know the mechanisms the static checker uses. I'm sure I'll learn by experience, but I'm spraying Contract.Assume statements all over the place to make the warnings go away, and I feel like that's not what Code Contracts are meant for. Googling didn't help me, so I want to ask you guys for your experiences: what (unobvious) patterns have you seen? And what made you see the light?
The contract in your construction is not satisfied. Since you are referencing an object's field (this.data), other threads may have access to the field and may change its value between the Assume and the first parameter resolution and the third parameter resolution. (I.e., they could be three completely different arrays.)
You should assign the array to a local variable, then use that variable throughout the method. Then the analyzer will know that the constraints are being satisfied, because no other threads will have the ability to change the reference.
var localData = this.data;
if (localData == null) return;
byte[] newData = new byte[localData.Length]; // Or whatever the datatype is.
Array.Copy(localData, newData, localData.Length); // Now, this cannot fail.
This has the added benefit of not only satisfying the constraint, but, in reality, making the code more robust in many cases.
I hope this leads you to the answer to your question. I could not actually answer your question directly, because I do not have access to a version of Visual Studio that includes the static checker. (I'm on VS2008 Pro.) My answer is based on what my own visual inspection of the code would conclude, and it appears that the static contract checker uses similar techniques. I am intrigued! I need to get me one of them. :-D
UPDATE: (Lots of speculation to follow)
Upon reflection, I think I can do a pretty good guess of what can or can't be proven (even without access to the static checker). As stated in the other answer, the static checker does not do interprocedural analysis. Therefore, with the looming possibility of multi-threaded variable accesses (as in the OP), the static checker can only deal effectively with local variables (as defined below).
By "local variables" I mean a variable that cannot be accessed by any other thread. This would include any variables declared in the method or passed as a parameter, unless the parameter is decorated with ref or out or the variable is captured in an anonymous method.
If a local variable is a value-type, then its fields are also local variables (and so on recursively).
If a local variable is a reference-type, then only the reference itself—not its fields—can be considered a local variable. This is true even of an object constructed within the method, since a constructor itself may leak a reference to the constructed object (say to a static collection for caching, for example).
So long as the static checker does not do any interprocedural analysis, any assumptions made about variables that are not local as defined above can be invalidated at any time, and, therefore, are ignored in the static analysis.
Exception 1: since strings are immutable and an array's length is fixed once it is created, their properties (such as Length) are subject to analysis, so long as the string or array variable itself is local. This does not include the contents of an array, which are mutable by other threads.
Exception 2: The array constructor may be known by the runtime not to leak any references to the constructed array. Therefore, an array that is constructed within the method body and not leaked outside of the method (passed as a parameter to another method, assigned to a non-local variable, etc.) has elements that may also be considered local variables.
These restrictions seem rather onerous, and I can imagine several ways this could be improved, but I don't know what has been done. Here are some other things that could, in theory, be done with the static checker. Someone who has it handy should check to see what has been done and what hasn't:
It could determine if a constructor does not leak any references to the object or its fields and consider the fields of any object so constructed to be local variables.
A no-leaks analysis could be done on other methods to determine whether a reference type passed to a method can still be considered local after that method invocation.
Variables decorated with ThreadStatic or ThreadLocal may be considered local variables.
Options could be given to ignore the possibility of using reflection to modify values. This would allow private readonly fields on reference types or static private readonly fields to be considered immutable. Also, when this option is enabled, a private or internal variable X that is only ever accessed inside a lock(X){ /**/ } construction and which is not leaked could be considered a local variable. However, these things would, in effect, reduce the reliability of the static checker, so that's kinda iffy.
Another possibility that could open up a lot of new analysis would be declaratively assigning variables and the methods that use them (and so on recursively) to a particular unique thread. This would be a major addition to the language, but it might be worth it.
The short answer is that the static code analyzer appears to be very limited. For instance, it does not detect
readonly string name = "I'm never null";
as being an invariant. From what I can gather on MSDN forums, it analyzes every method by itself (for performance reasons, not that one should think it could get much slower), which limits its knowledge when verifying the code.
To strike a balance between the academically lofty goal of proving correctness and being able to get work done, I've resorted to decorating individual methods (or even classes, as needed) with
[ContractVerification(false)]
rather than sprinkle the logic with lots of Assumes. This may not be best practice for using CC, but it does provide a way to get rid of warnings without unchecking any of the static checker options. In order not to lose pre/post-condition checks for such methods I generally add a stub with the desired conditions and then invoke the excluded method to perform the actual work.
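The stub pattern described above could be sketched roughly as follows. NameFormatter and its methods are hypothetical; Contract.Requires, Contract.Ensures and ContractVerificationAttribute are from System.Diagnostics.Contracts. This assumes the usual Code Contracts tooling is installed; without it (and without the CONTRACTS_FULL symbol) the Requires/Ensures calls are simply compiled away.

```csharp
using System;
using System.Diagnostics.Contracts;

public class NameFormatter
{
    // Public stub: carries the pre- and postconditions that callers
    // and the static checker should still see.
    public string Format(string name)
    {
        Contract.Requires(name != null);
        Contract.Ensures(Contract.Result<string>() != null);
        return FormatCore(name);
    }

    // The actual work, excluded from static verification so that it
    // produces no "unproven" warnings.
    [ContractVerification(false)]
    private string FormatCore(string name)
    {
        return name.Trim().ToUpperInvariant();
    }
}
```

The trade-off is that nothing inside FormatCore is checked statically, so the stub's contracts are only as trustworthy as the unchecked body behind them.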
My own assessment of Code Contracts is that it's great if you're only using the official framework libraries and do not have a lot of legacy code (e.g. when starting a new project). Anything else and it's a mixed bag of pleasure and pain.
I've spent my professional life as a C# developer. As a student I occasionally used C but did not deeply study its compilation model. Recently I jumped on the bandwagon and have begun studying Objective-C. My first steps have only made me aware of holes in my pre-existing knowledge.
From my research, C/C++/ObjC compilation requires all encountered symbols to be pre-declared. I also understand that building is a two-step process. First you compile each individual source file into individual object files. These object files might have undefined "symbols" (which generally correspond to the identifiers declared in the header files). Second you link the object files together to form your final output. This is a pretty high-level explanation but it satisfies my curiosity enough. But I'd also like to have a similar high-level understanding of the C# build process.
Q: How does the C# build process get around the need for header files? I'd imagine perhaps the compilation step does two-passes?
(Edit: Follow up question here How do C/C++/Objective-C compare with C# when it comes to using libraries?)
UPDATE: This question was the subject of my blog for February 4th 2010. Thanks for the great question!
Let me lay it out for you. In the most basic sense the compiler is a "two pass compiler" because the phases that the compiler goes through are:
Generation of metadata.
Generation of IL.
Metadata is all the "top level" stuff that describes the structure of the code. Namespaces, classes, structs, enums, interfaces, delegates, methods, type parameters, formal parameters, constructors, events, attributes, and so on. Basically, everything except method bodies.
IL is all the stuff that goes in a method body -- the actual imperative code, rather than metadata about how the code is structured.
The first phase is actually implemented via a great many passes over the sources. It's way more than two.
The first thing we do is take the text of the sources and break it up into a stream of tokens. That is, we do lexical analysis to determine that
class c : b { }
is class, identifier, colon, identifier, left curly, right curly.
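As a toy illustration of that lexing step (not the real compiler's lexer), the sketch below handles only the handful of token shapes in the example. ToyLexer and Tokenize are made-up names.

```csharp
using System;
using System.Collections.Generic;

public static class ToyLexer
{
    // Splits source text into coarse token kinds. Only the 'class' keyword,
    // identifiers, ':' and curly braces are recognized.
    public static List<string> Tokenize(string source)
    {
        var tokens = new List<string>();
        int i = 0;
        while (i < source.Length)
        {
            char c = source[i];
            if (char.IsWhiteSpace(c)) { i++; continue; }

            if (char.IsLetter(c) || c == '_')
            {
                // Consume a whole word, then decide keyword vs. identifier.
                int start = i;
                while (i < source.Length &&
                       (char.IsLetterOrDigit(source[i]) || source[i] == '_'))
                {
                    i++;
                }
                string word = source.Substring(start, i - start);
                tokens.Add(word == "class" ? "class" : "identifier");
                continue;
            }

            switch (c)
            {
                case ':': tokens.Add("colon"); break;
                case '{': tokens.Add("left curly"); break;
                case '}': tokens.Add("right curly"); break;
                default: throw new FormatException("Unexpected character: " + c);
            }
            i++;
        }
        return tokens;
    }
}
```

Calling ToyLexer.Tokenize("class c : b { }") yields the six token kinds listed above, in the same order.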
We then do a "top level parse" where we verify that the token streams define a grammatically correct C# program. However, we skip parsing method bodies. When we hit a method body, we just blaze through the tokens until we get to the matching close curly. We'll come back to it later; we only care about getting enough information to generate metadata at this point.
We then do a "declaration" pass where we make notes about the location of every namespace and type declaration in the program.
We then do a pass where we verify that all the types declared have no cycles in their base types. We need to do this first because in every subsequent pass we need to be able to walk up type hierarchies without having to deal with cycles.
We then do a pass where we verify that all generic parameter constraints on generic types are also acyclic.
We then do a pass where we check whether every member of every type -- methods of classes, fields of structs, enum values, and so on -- is consistent. No cycles in enums, every overriding method overrides something that is actually virtual, and so on. At this point we can compute the "vtable" layouts of all interfaces, classes with virtual methods, and so on.
We then do a pass where we work out the values of all "const" fields.
At this point we have enough information to emit almost all the metadata for this assembly. We still do not have information about the metadata for iterator/anonymous function closures or anonymous types; we do those late.
We can now start generating IL. For each method body (and properties, indexers, constructors, and so on), we rewind the lexer to the point where the method body began and parse the method body.
Once the method body is parsed, we do an initial "binding" pass, where we attempt to determine the types of every expression in every statement. We then do a whole pile of passes over each method body.
We first run a pass to transform loops into gotos and labels.
(The next few passes look for bad stuff.)
Then we run a pass to look for use of deprecated types, for warnings.
Then we run a pass that searches for uses of anonymous types that we haven't emitted metadata for yet, and emit those.
Then we run a pass that searches for bad uses of expression trees. For example, using a ++ operator in an expression tree.
Then we run a pass that looks for all local variables in the body that are defined, but not used, to report warnings.
Then we run a pass that looks for illegal patterns inside iterator blocks.
Then we run the reachability checker, to give warnings about unreachable code, and tell you when you've done something like forgotten the return at the end of a non-void method.
Then we run a pass that verifies that every goto targets a sensible label, and that every label is targeted by a reachable goto.
Then we run a pass that checks that all locals are definitely assigned before use, notes which local variables are closed-over outer variables of an anonymous function or iterator, and which anonymous functions are in reachable code. (This pass does too much. I have been meaning to refactor it for some time now.)
At this point we're done looking for bad stuff, but we still have way more passes to go before we sleep.
Next we run a pass that detects missing ref arguments to calls on COM objects and fixes them. (This is a new feature in C# 4.)
Then we run a pass that looks for stuff of the form "new MyDelegate(Foo)" and rewrites it into a call to CreateDelegate.
Then we run a pass that transforms expression trees into the sequence of factory method calls necessary to create the expression trees at runtime.
Then we run a pass that rewrites all nullable arithmetic into code that tests for HasValue, and so on.
Then we run a pass that finds all references of the form base.Blah() and rewrites them into code which does the non-virtual call to the base class method.
Then we run a pass which looks for object and collection initializers and turns them into the appropriate property sets, and so on.
Then we run a pass which looks for dynamic calls (in C# 4) and rewrites them into dynamic call sites that use the DLR.
Then we run a pass that looks for calls to removed methods. (That is, partial methods with no actual implementation, or conditional methods that don't have their conditional compilation symbol defined.) Those are turned into no-ops.
Then we look for unreachable code and remove it from the tree. No point in codegenning IL for it.
Then we run an optimization pass that rewrites trivial "is" and "as" operators.
Then we run an optimization pass that looks for switch(constant) and rewrites it as a branch directly to the correct case.
Then we run a pass which turns string concatenations into calls to the correct overload of String.Concat.
(Ah, memories. These last two passes were the first things I worked on when I joined the compiler team.)
Then we run a pass which rewrites uses of named and optional parameters into calls where the side effects all happen in the correct order.
Then we run a pass which optimizes arithmetic; for example, if we know that M() returns an int, and we have 1 * M(), then we just turn it into M().
Then we do generation of the code for anonymous types first used by this method.
Then we transform anonymous functions in this body into methods of closure classes.
Finally, we transform iterator blocks into switch-based state machines.
Then we emit the IL for the transformed tree that we've just computed.
Easy as pie!
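To picture one of the rewriting passes listed above, the one that transforms loops into gotos and labels, here is a hand-written approximation in C# source form. The compiler does this on its internal tree, not on source text, and the names LoweringSketch, SumTo and SumToLowered are made up.

```csharp
using System;

public static class LoweringSketch
{
    // What you write.
    public static int SumTo(int n)
    {
        int sum = 0;
        int i = 1;
        while (i <= n)
        {
            sum += i;
            i++;
        }
        return sum;
    }

    // Roughly the same loop expressed with only labels and gotos.
    public static int SumToLowered(int n)
    {
        int sum = 0;
        int i = 1;
        goto check;      // test the condition before the first iteration
    body:
        sum += i;
        i++;
    check:
        if (i <= n) goto body;
        return sum;
    }
}
```

Once every loop has this shape, later passes such as the reachability checker only have to reason about labels and branches rather than about each loop construct separately.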
I see that there are multiple interpretations of the question. I answered the intra-solution interpretation, but let me fill it out with all the information I know.
The "header file metadata" is present in the compiled assemblies, so any assembly you add a reference to will allow the compiler to pull in the metadata from those.
As for things not yet compiled that are part of the current solution, it will do a two-pass compilation, first reading namespaces, type names, and member names, i.e., everything but the code. Then, when this checks out, it will read the code and compile that.
This allows the compiler to know what exists and what doesn't exist (in its universe).
To see the two-pass compiler in effect, test the following code that has 3 problems, two declaration-related problems, and one code problem:
using System;
namespace ConsoleApplication11
{
    class Program
    {
        public static Stringg ReturnsTheWrongType()
        {
            return null;
        }
        static void Main(string[] args)
        {
            CallSomeMethodThatDoesntExist();
        }
        public static Stringg AlsoReturnsTheWrongType()
        {
            return null;
        }
    }
}
Note that the compiler will only complain about the two Stringg types that it cannot find. If you fix those, then it complains about the method-name called in the Main method, that it cannot find.
It uses the metadata from the reference assemblies. That contains a full type declaration, same thing as you'd find in a header file.
It being a two-pass compiler accomplishes something else: you can use a type in one source file before it is declared in another source code file.
It's a 2-pass compiler. http://en.wikipedia.org/wiki/Multi-pass_compiler
All the necessary information can be obtained from the referenced assemblies.
So there are no header files, but the compiler does need access to the DLLs being used.
And yes, it is a 2-pass compiler but that doesn't explain how it gets information about library types.