Global vs. member functions - C#

I've been playing around with C++ lately and wondered why there were so many global functions. Then I started thinking about programming in C# and how member functions are stored, so I guess my question is: if I have a
public class Foo {
    public void Bar() { ... }
}
and then I do something silly like adding 1,000,000 Foos to a list; does this mean I have 1,000,000 Foo objects sitting in memory, each with their own Bar() function? Or does something much more clever happen?
Thanks.

Nope, there is only one instance of each method. All instances of a class share a single method table that contains all the instance methods, each of which takes an implicit first parameter affectionately called this. When you invoke an instance method on an instance, the this pointer for that instance is passed as the first parameter to that method. That is how the method knows all the instance fields and properties for that instance.
For details see CLR via C#.
This is, of course, complicated by virtual methods. CLR via C# will spell out the distinction for you and is highly recommended if you are interested in this subject. Either way, there is still only one instance of each instance method. The issue is just how these methods are resolved.

An instance method is simply a static method with a hidden this parameter.
(virtual methods are a little more complicated)
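To make that concrete, here is a rough sketch. The FooMethods class is purely illustrative - the compiler does not literally generate such a class - but the calling convention is equivalent:

public class Foo
{
    public void Bar() { /* reads instance state via the implicit 'this' */ }
}

// Conceptually, the runtime treats Foo.Bar like a static method that
// receives the instance as an explicit first parameter:
public static class FooMethods
{
    public static void Bar(Foo self) { /* 'self' plays the role of 'this' */ }
}

// foo.Bar() is then morally equivalent to FooMethods.Bar(foo): exactly one
// method body exists, no matter how many Foo instances you create.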

In C++ a member function doesn't normally require any per-object storage (an exception - virtual functions - is discussed in the next paragraph). Normally, at each point where the function is used, the compiler generates CPU-specific machine code to directly call that function, and for inline functions the call may be avoided entirely and the function's effect optimally integrated into the caller's code (which can be ~10x faster for small functions such as "getters and setters" that simply read or write one member variable).
For those classes that have one or more virtual functions, each object will have one extra pointer to a per-class table of function pointers and other information. Thus, each object grows by the size of a pointer - typically 4 or 8 bytes.
Addressing your original observation: C++ has more non-member functions (usually in the std namespace), but a namespace serves this purpose better than a class anyway. Indeed, namespaces are effectively logical interfaces for "static" functions and data that can span many "physical" header files. Why should the logical API of a program be compromised by considerations related to physical files and their implications for build times, file-modification-timestamp-triggered make tools, etc.? In trivial cases where the namespace lives in one header, C++ could use a class or struct to scope the same declarations, but that is less convenient: it prevents the use of namespace aliases, using-directives, and Koenig lookup (which implicitly searches the namespaces of a function's arguments), forcing very explicit prefixing at every point of use. It also gives the false impression that the user is meant to instantiate an object from the content.

Why is it possible to assign partial methods to delegates in spite of other constraints?

I know the title might not be totally clear, but I didn't want to make it too long.
One thing boggles me when thinking about restrictions placed on partial methods. It seems to me that the rules are inconsistent. As you probably know:
Partial methods must always have a return type of void, and they cannot have any parameters marked with the out modifier. These restrictions are in place because at run time, the method may not exist and so you can’t initialize a variable to what the method might return because the method might not exist. Similarly, you can’t have an out parameter because the method would have to initialize it and the method might not exist. [1]
It sounds sensible to me. But at the same time:
If there is no implementing partial method declaration, then you cannot have any code that attempts to create a delegate that refers to the partial method. Again, the reason is that the method doesn’t exist at run time. [1]
At first, all these rules seem to follow the same compiler logic. There is a difference, though. As stated in the second quotation, the compiler issues an error only when there is no implementing declaration of the partial method. Why can't it also check for the implementation at compile time in the other scenarios? This would allow much more flexibility when using partial methods, and the logic behind all the rules would be identical.
I am afraid that the only answer I can get is "Because that's how it was implemented", but maybe there is something more to it?
[1] CLR via C#, Fourth Edition
The whole point of partial methods is to facilitate extensibility in generated code scenarios without runtime performance impact. The code generator emits a partial method signature, and then emits code that calls this partial method.
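A minimal sketch of that pattern (Widget and OnCreated are illustrative names, not tied to any particular code generator):

// Generated file:
public partial class Widget
{
    // Declaration only: must return void and have no out parameters.
    partial void OnCreated();

    public Widget()
    {
        OnCreated(); // stripped by the compiler if never implemented
    }
}

// Optional hand-written file:
public partial class Widget
{
    partial void OnCreated()
    {
        System.Console.WriteLine("Widget created");
    }
}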
Now, at compile time, if the method isn't implemented, these call sites get entirely removed, and the remaining code has to be valid. The quotes are confusing in this regard, because they're talking about "runtime existence" of the method. That's nonsense: everything is resolved at compile time.
This is the reason for the difference: the rules you quoted first impose restrictions on the method signature, whereas the delegate rule imposes a restriction on method usage.
The rules about the signature make sure you can call the method in a manner such that the code remains valid if the method call is removed. And the intended use case for partial methods is for them to be absent 99% of the time. If you require them to be implemented, then that's not the feature you should be using in the first place; use an abstract method or something similar.
Building a delegate to a method is like taking the pointer to the method, and you can't do that if the method doesn't exist (well, I suppose you could argue the compiler could just replace the delegate with null at this point, but you can't write, say, new Action(null)), yet there is no reason to disallow it if the method does exist, for the implementer's convenience. But the code generator shouldn't emit code that creates delegates to the method.
Keep in mind that the reason for partial methods is so they can be used in designer-generated code, i.e. the designer can generate calls to the methods and leave their (optional) implementation to a human developer (who will later compile the code). If, as you suggest, the calls gave compile-time errors when the method is not implemented, then there are cases where the generated code would not compile until the developer either changed the generated code (usually a bad idea) or until he implemented all the methods.
C# already has interfaces and abstract methods to force method implementation, partial methods try to do something else.

Does C# have any concept of copy control similar to C++?

In C++, we have the copy constructor, the destructor, and the overloaded = operator, which together are called copy control.
My questions are:
Is a copy constructor used in C# for initializing objects when they are passed to a function as arguments, when initializing (not assigning), or when returning an object from a function, as in C++?
Does an implicitly overloaded = operator function get called when we assign (not initialize) an object to another object of the same type?
No: for reference types, the reference itself is copied, while value types are copied byte-for-byte. You can make your own "copy constructors", but you would be responsible for calling them explicitly.
You cannot overload the assignment operator
In fairness to C#, none of the intricacies of the C++ copy control are necessary because of the underlying garbage-collected memory model. For example, you do not need to control the ownership of dynamically allocated objects when copying objects, because you are free to have as many references to dynamic objects as you wish.
In C# you cannot overload the assignment operator.
Destructors don't really make sense in a managed memory world (most of the time). There is actually an equivalent (finalizers) but you should very rarely need to utilize them.
Copy constructors are made for certain classes, but it's not done nearly as often as in C++. It will never be called implicitly by the language, it will only be used when you manually call it.
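A sketch of such a hand-written "copy constructor" (Person is an illustrative type; the language never invokes this implicitly):

public class Person
{
    public string Name { get; set; }

    public Person() { }

    // Must be called explicitly: var copy = new Person(original);
    public Person(Person other)
    {
        Name = other.Name;
    }
}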
No. If you want to provide a copying mechanism, you do this by implementing the ICloneable interface: http://msdn.microsoft.com/en-us/library/system.icloneable.aspx
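A minimal sketch of that approach (Point is an illustrative type; whether Clone performs a shallow or a deep copy is entirely up to the implementer):

using System;

public class Point : ICloneable
{
    public int X { get; set; }
    public int Y { get; set; }

    public object Clone()
    {
        // MemberwiseClone copies fields one by one; for a type with only
        // value-type fields, that shallow copy is also a deep copy.
        return MemberwiseClone();
    }
}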
C# has no concept of a copy constructor, nor of overloading operator=. In C# objects just exist and you use handles to them in your code. Handles can be copied "by value" to be used throughout your code, but that's a so-called "shallow" copy, the object itself is still the same, they all point to the same memory (indirectly).
Copy constructors are akin to deep copying in the managed world, and you can achieve that through ICloneable, but it's completely manual and up to the object's implementation to write it. You can also achieve it through serialization (the boost way) or through any manner of indirect ways.
As a final point, since object lifetimes are non-deterministic (they die when the GC arbitrarily decides they should die), there's no such thing as destructors either. The closest things are finalizers (called non-deterministically when your object gets collected) and the IDisposable pattern (along with using), which give you a semblance of control over finalization. Needless to say, they are rarely used.
Edit: I should point out that, while copy constructors have no equivalent, you do have "type casting" behavior through user-defined conversion operators declared with the implicit keyword.
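A sketch of such a user-defined conversion (Meters is an illustrative type):

public struct Meters
{
    public double Value;

    public Meters(double value) { Value = value; }

    // Lets you write: Meters m = 5.0;
    public static implicit operator Meters(double value)
    {
        return new Meters(value);
    }
}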

Do extension methods benefit in any practical way from being a part of a static class vs. [theoretically] a part of a namespace?

When it comes to extension methods, class names seem to do nothing but provide a grouping, which is what namespaces do. As soon as I include the namespace, I get all the extension methods in the namespace. So my question comes down to this: is there some value I can get from the extension methods being in the static class?
I realize it is a compiler requirement for them to be put into a static class, but it seems like, from an organizational perspective, it would be reasonable for it to be legal to allow extension methods to be defined in namespaces without classes surrounding them. Rephrasing the above question another way: is there any practical benefit, or help in some scenario, that I get as a developer from having extension methods attached to the class vs. attached to the namespace?
I'm basically just looking to gain some intuition, confirmation, or insight - I suspect it may be that it was easiest to implement extension methods that way and it wasn't worth the time to allow extension methods to exist on their own in namespaces.
Perhaps you will find a satisfactory answer in Eric Lippert's blog post Why Doesn't C# Implement "Top Level" Methods? (in turn prompted by SO question Why C# is not allowing non-member functions like C++), whence (my emphasis):
I am asked "why doesn't C# implement feature X?" all the time. The
answer is always the same: because no one ever designed, specified,
implemented, tested, documented and shipped that feature. All six of
those things are necessary to make a feature happen. All of them cost
huge amounts of time, effort and money. Features are not cheap, and we
try very hard to make sure that we are only shipping those features
which give the best possible benefits to our users given our
constrained time, effort and money budgets.
I understand that such a general answer probably does not address the
specific question.
In this particular case, the clear user benefit was in the past not
large enough to justify the complications to the language which would
ensue. By restricting how different language entities nest inside each
other we (1) restrict legal programs to be in a common, easily
understood style, and (2) make it possible to define "identifier
lookup" rules which are comprehensible, specifiable, implementable,
testable and documentable.
By restricting method bodies to always be inside a struct or class, we make it easier to reason about the meaning of an unqualified
identifier used in an invocation context; such a thing is always an
invocable member of the current type (or a base type).
To me putting them in the class is all about grouping related functions inside a class. You may have a number of extension methods in the same namespace. If I wanted to write some extension methods for the DirectoryInfo and FileInfo classes I would create two classes in an IO namespace called DirectoryInfoExtensions and FileInfoExtensions.
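A sketch of that layout (the namespace and method are illustrative):

using System.IO;

namespace MyProject.IO
{
    public static class FileInfoExtensions
    {
        // Callable as file.IsHidden() once MyProject.IO is imported.
        public static bool IsHidden(this FileInfo file)
        {
            return (file.Attributes & FileAttributes.Hidden) != 0;
        }
    }
}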
You can still call the extension methods like you would any other static method. I don't know exactly how the compiler works, but perhaps the output assembly, if compiled for .NET 2.0, can still be used by legacy .NET Frameworks. It also means the existing reflection library can work and be used to run extension methods without any changes. Again, I am no compiler expert, but I think the "this" keyword in the context of an extension method is syntactic sugar that allows us to use the methods as though they belong to the object.
The .NET Framework requires that every method exist in a class which is within an assembly. A language could allow methods or fields to be declared without an explicitly-specified enclosing class, place all such methods in assembly Fnord into a class called Fnord_TopLevelDefault, and then search the Fnord_TopLevelDefault class of all assemblies when performing method lookup; the CLS specification would have to be extended for this feature to work smoothly for mixed-language projects, however. As with extension methods, such behavior could be CLS compliant if the CLS didn't acknowledge it, since code in a language which didn't use such a feature could use a "free-floating" method Foo in assembly Fnord by spelling it Fnord_TopLevelDefault.Foo, but that would be a bit ugly.
A more interesting question is the extent to which allowing an extension method Foo to be invoked from an arbitrary class without requiring a clearly visible reference to that class is less evil than allowing non-extension static methods to be likewise invoked. I don't think Math.Sqrt(x) is really more readable than Sqrt(x); even if one didn't want to import Math everywhere, being able to do so at least locally could in some cases improve code legibility considerably.
They can reference other static class members internally.
You should not only consider the consumer side aspect, but also the code maintenance aspect.
Even though IntelliSense doesn't distinguish with respect to the owner class, the information is still there through tooltips and whatever productivity tools you have added to your IDE. This can easily be used to provide some context for the method in what would otherwise be a flat (and sometimes very long) list.
Consumer wise, bottom line, I do not think it matters much.

How does C# compilation get around needing header files?

I've spent my professional life as a C# developer. As a student I occasionally used C but did not deeply study its compilation model. Recently I jumped on the bandwagon and have begun studying Objective-C. My first steps have only made me aware of holes in my pre-existing knowledge.
From my research, C/C++/ObjC compilation requires all encountered symbols to be pre-declared. I also understand that building is a two-step process. First you compile each individual source file into individual object files. These object files might have undefined "symbols" (which generally correspond to the identifiers declared in the header files). Second you link the object files together to form your final output. This is a pretty high-level explanation but it satisfies my curiosity enough. But I'd also like to have a similar high-level understanding of the C# build process.
Q: How does the C# build process get around the need for header files? I'd imagine perhaps the compilation step does two passes?
(Edit: Follow up question here How do C/C++/Objective-C compare with C# when it comes to using libraries?)
UPDATE: This question was the subject of my blog for February 4th 2010. Thanks for the great question!
Let me lay it out for you. In the most basic sense the compiler is a "two pass compiler" because the phases that the compiler goes through are:
Generation of metadata.
Generation of IL.
Metadata is all the "top level" stuff that describes the structure of the code. Namespaces, classes, structs, enums, interfaces, delegates, methods, type parameters, formal parameters, constructors, events, attributes, and so on. Basically, everything except method bodies.
IL is all the stuff that goes in a method body -- the actual imperative code, rather than metadata about how the code is structured.
The first phase is actually implemented via a great many passes over the sources. It's way more than two.
The first thing we do is take the text of the sources and break it up into a stream of tokens. That is, we do lexical analysis to determine that
class c : b { }
is class, identifier, colon, identifier, left curly, right curly.
We then do a "top level parse" where we verify that the token streams define a grammaticaly-correct C# program. However, we skip parsing method bodies. When we hit a method body, we just blaze through the tokens until we get to the matching close curly. We'll come back to it later; we only care about getting enough information to generate metadata at this point.
We then do a "declaration" pass where we make notes about the location of every namespace and type declaration in the program.
We then do a pass where we verify that all the types declared have no cycles in their base types. We need to do this first because in every subsequent pass we need to be able to walk up type hierarchies without having to deal with cycles.
We then do a pass where we verify that all generic parameter constraints on generic types are also acyclic.
We then do a pass where we check whether every member of every type -- methods of classes, fields of structs, enum values, and so on -- is consistent. No cycles in enums, every overriding method overrides something that is actually virtual, and so on. At this point we can compute the "vtable" layouts of all interfaces, classes with virtual methods, and so on.
We then do a pass where we work out the values of all "const" fields.
At this point we have enough information to emit almost all the metadata for this assembly. We still do not have information about the metadata for iterator/anonymous function closures or anonymous types; we do those late.
We can now start generating IL. For each method body (and properties, indexers, constructors, and so on), we rewind the lexer to the point where the method body began and parse the method body.
Once the method body is parsed, we do an initial "binding" pass, where we attempt to determine the types of every expression in every statement. We then do a whole pile of passes over each method body.
We first run a pass to transform loops into gotos and labels.
(The next few passes look for bad stuff.)
Then we run a pass to look for use of deprecated types, for warnings.
Then we run a pass that searches for uses of anonymous types that we haven't emitted metadata for yet, and emit those.
Then we run a pass that searches for bad uses of expression trees. For example, using a ++ operator in an expression tree.
Then we run a pass that looks for all local variables in the body that are defined, but not used, to report warnings.
Then we run a pass that looks for illegal patterns inside iterator blocks.
Then we run the reachability checker, to give warnings about unreachable code, and tell you when you've done something like forgotten the return at the end of a non-void method.
Then we run a pass that verifies that every goto targets a sensible label, and that every label is targeted by a reachable goto.
Then we run a pass that checks that all locals are definitely assigned before use, notes which local variables are closed-over outer variables of an anonymous function or iterator, and which anonymous functions are in reachable code. (This pass does too much. I have been meaning to refactor it for some time now.)
At this point we're done looking for bad stuff, but we still have way more passes to go before we sleep.
Next we run a pass that detects missing ref arguments to calls on COM objects and fixes them. (This is a new feature in C# 4.)
Then we run a pass that looks for stuff of the form "new MyDelegate(Foo)" and rewrites it into a call to CreateDelegate.
Then we run a pass that transforms expression trees into the sequence of factory method calls necessary to create the expression trees at runtime.
Then we run a pass that rewrites all nullable arithmetic into code that tests for HasValue, and so on.
Then we run a pass that finds all references of the form base.Blah() and rewrites them into code which does the non-virtual call to the base class method.
Then we run a pass which looks for object and collection initializers and turns them into the appropriate property sets, and so on.
Then we run a pass which looks for dynamic calls (in C# 4) and rewrites them into dynamic call sites that use the DLR.
Then we run a pass that looks for calls to removed methods. (That is, partial methods with no actual implementation, or conditional methods that don't have their conditional compilation symbol defined.) Those are turned into no-ops.
Then we look for unreachable code and remove it from the tree. No point in codegenning IL for it.
Then we run an optimization pass that rewrites trivial "is" and "as" operators.
Then we run an optimization pass that looks for switch(constant) and rewrites it as a branch directly to the correct case.
Then we run a pass which turns string concatenations into calls to the correct overload of String.Concat. (A sketch of this rewrite appears after this list.)
(Ah, memories. These last two passes were the first things I worked on when I joined the compiler team.)
Then we run a pass which rewrites uses of named and optional parameters into calls where the side effects all happen in the correct order.
Then we run a pass which optimizes arithmetic; for example, if we know that M() returns an int, and we have 1 * M(), then we just turn it into M().
Then we do generation of the code for anonymous types first used by this method.
Then we transform anonymous functions in this body into methods of closure classes.
Finally, we transform iterator blocks into switch-based state machines.
Then we emit the IL for the transformed tree that we've just computed.
Easy as pie!
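To make one of those rewrites concrete, here is roughly what the String.Concat pass flagged above does, expressed as a source-level equivalence (the compiler of course operates on its internal tree, not on C# text):

string name = "world";

// What you write:
string greeting = "Hello, " + name + "!";

// Roughly what gets emitted:
string greeting2 = string.Concat("Hello, ", name, "!");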
I see that there are multiple interpretations of the question. I answered the intra-solution interpretation, but let me fill it out with all the information I know.
The "header file metadata" is present in the compiled assemblies, so any assembly you add a reference to will allow the compiler to pull in the metadata from those.
As for things not yet compiled that are part of the current solution, it will do a two-pass compilation: first reading namespaces, type names, and member names, i.e., everything but the code. Then, when this checks out, it will read the code and compile that.
This allows the compiler to know what exists and what doesn't exist (in its universe).
To see the two-pass compiler in effect, test the following code, which has three problems: two declaration-related problems and one code problem:
using System;

namespace ConsoleApplication11
{
    class Program
    {
        public static Stringg ReturnsTheWrongType()
        {
            return null;
        }

        static void Main(string[] args)
        {
            CallSomeMethodThatDoesntExist();
        }

        public static Stringg AlsoReturnsTheWrongType()
        {
            return null;
        }
    }
}
Note that the compiler will only complain about the two Stringg types that it cannot find. If you fix those, then it complains about the method name called in the Main method, which it cannot find.
It uses the metadata from the referenced assemblies. That metadata contains full type declarations, the same thing you'd find in a header file.
It being a two-pass compiler accomplishes something else: you can use a type in one source file before it is declared in another source code file.
It's a 2-pass compiler. http://en.wikipedia.org/wiki/Multi-pass_compiler
All the necessary information can be obtained from the referenced assemblies.
So there are no header files, but the compiler does need access to the DLLs being used.
And yes, it is a 2-pass compiler but that doesn't explain how it gets information about library types.

Method can be made static, but should it?

ReSharper likes to point out multiple functions per ASP.NET page that could be made static. Does it help me if I do make them static? Should I make them static and move them to a utility class?
Performance, namespace pollution etc are all secondary in my view. Ask yourself what is logical. Is the method logically operating on an instance of the type, or is it related to the type itself? If it's the latter, make it a static method. Only move it into a utility class if it's related to a type which isn't under your control.
Sometimes there are methods which logically act on an instance but don't happen to use any of the instance's state yet. For instance, if you were building a file system and you'd got the concept of a directory, but you hadn't implemented it yet, you could write a property returning the kind of the file system object, and it would always be just "file" - but it's logically related to the instance, and so should be an instance method. This is also important if you want to make the method virtual - your particular implementation may need no state, but derived classes might. (For instance, asking a collection whether or not it's read-only - you may not have implemented a read-only form of that collection yet, but it's clearly a property of the collection itself, not the type.)
Static methods versus Instance methods
The "Static and instance members" section of the C# Language Specification explains the difference. Generally, static methods can provide a very small performance enhancement over instance methods, but only in somewhat extreme situations (see this answer for some more details on that).
Rule CA1822 in FxCop or Code Analysis states:
"After [marking members as static], the compiler will emit non-virtual call sites to these members which will prevent a check at
runtime for each call that ensures the current object pointer is
non-null. This can result in a measurable performance gain for
performance-sensitive code. In some cases, the failure to access the
current object instance represents a correctness issue."
Utility Class
You shouldn't move them to a utility class unless it makes sense in your design. If the static method relates to a particular type, like a ToRadians(double degrees) method relates to a class representing angles, it makes sense for that method to exist as a static member of that type (note, this is a convoluted example for the purposes of demonstration).
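A sketch of that convoluted example (Angle is an illustrative type):

using System;

public class Angle
{
    public double Degrees { get; private set; }

    public Angle(double degrees) { Degrees = degrees; }

    // Operates on the type's domain but needs no instance state, so a
    // static member of Angle is its natural home.
    public static double ToRadians(double degrees)
    {
        return degrees * Math.PI / 180.0;
    }
}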
Marking a method as static within a class makes it obvious that it doesn't use any instance members, which can be helpful to know when skimming through the code.
You don't necessarily have to move it to another class unless it's meant to be shared by another class that's just as closely associated, concept-wise.
I'm sure this isn't happening in your case, but one "bad smell" I've seen in some code I've had to suffer through maintaining was a heck of a lot of static methods.
Unfortunately, they were static methods that assumed a particular application state. (why sure, we'll only have one user per application! Why not have the User class keep track of that in static variables?) They were glorified ways of accessing global variables. They also had static constructors (!), which are almost always a bad idea. (I know there are a couple of reasonable exceptions).
However, static methods are quite useful when they factor out domain-logic that doesn't actually depend on the state of an instance of the object. They can make your code a lot more readable.
Just be sure you're putting them in the right place. Are the static methods intrusively manipulating the internal state of other objects? Can a good case be made that their behavior belongs to one of those classes instead? If you're not separating concerns properly, you may be in for headaches later.
This is an interesting read:
http://thecuttingledge.com/?p=57
ReSharper isn’t actually suggesting you make your method static.
You should ask yourself why that method is in that class as opposed to, say, one of the classes that shows up in its signature...
but here is what the ReSharper documentation says:
http://confluence.jetbrains.net/display/ReSharper/Member+can+be+made+static
Just to add to Jason True's answer, it is important to realise that just putting 'static' on a method doesn't guarantee that the method will be 'pure'. It will be stateless with regard to the class in which it is declared, but it may well access other 'static' objects which have state (application configuration, etc.). That may not always be a bad thing, but one of the reasons I personally tend to prefer static methods when I can is that, if they are pure, you can test and reason about them in isolation, without having to worry about the surrounding state.
For complex logic within a class, I have found private static methods useful in creating isolated logic, in which the instance inputs are clearly defined in the method signature and no instance side-effects can occur. All outputs must be via return value or out/ref parameters. Breaking down complex logic into side-effect-free code blocks can improve the code's readability and the development team's confidence in it.
On the other hand it can lead to a class polluted by a proliferation of utility methods. As usual, logical naming, documentation, and consistent application of team coding conventions can alleviate this.
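A sketch of that pattern (InvoiceCalculator is an illustrative class; the point is that ApplyTax can be tested and reasoned about in isolation):

public class InvoiceCalculator
{
    private readonly decimal _taxRate;

    public InvoiceCalculator(decimal taxRate) { _taxRate = taxRate; }

    public decimal Total(decimal subtotal)
    {
        return ApplyTax(subtotal, _taxRate);
    }

    // Static: the compiler guarantees no instance state is read or written,
    // so every input and output is visible in the signature.
    private static decimal ApplyTax(decimal amount, decimal rate)
    {
        return amount * (1 + rate);
    }
}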
You should do what is most readable and intuitive in a given scenario.
The performance argument is not a good one except in the most extreme situations as the only thing that is actually happening is that one extra parameter (this) is getting pushed onto the stack for instance methods.
ReSharper does not check the logic. It only checks whether the method uses instance members.
If the method is private and only called by (maybe just one) instance methods, this is a sign to leave it as an instance method.
I hope you have already understood the difference between static and instance methods. Also, there can be a long answer and a short one. Long answers are already provided by others.
My short answer: Yes, you can convert them to static methods as ReSharper suggests. There is no harm in doing so. Rather, by making the method static, you are actually guarding it so that you do not unnecessarily slip any instance members into it. In that way, you can achieve the OOP principle "minimize the accessibility of classes and members".
When ReSharper suggests that an instance method can be converted to a static one, it is actually telling you, "Why the .. is this method sitting in this class when it is not actually using any of its state?" So, it gives you food for thought. Then, it is you who can realize whether that method needs to move to a static utility class or not. According to the SOLID principles, a class should have only one core responsibility, so you can do a better cleanup of your classes in that way. Sometimes, you do need some helper methods even in your instance class. If that is the case, you may keep them within a helpers #region.
If the functions are shared across many pages, you could also put them in a base page class and then have all ASP.NET pages using that functionality inherit from it (and the functions could still be static as well).
Making a method static means you can call the method from outside the class without first creating an instance of that class. This is helpful when working with third-party vendor objects or add-ons. Imagine if you had to first create a Console object "con" before calling con.WriteLine();
It helps to control namespace pollution.
Just my tuppence: Adding all of the shared static methods to a utility class allows you to add
using static className;
to your using statements, which makes the code faster to type and easier to read. For example, I have a large number of what would be called "global variables" in some code I inherited. Rather than make them global variables in an instance class, I set them all as static properties of a global class. It does the job, if messily, and I can just reference the properties by name because I have the static class already referenced.
I have no idea if this is good practice or not. I have so much to learn about C# 4/5 and so much legacy code to refactor that I am just trying to let the Roslyn tips guide me.
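A minimal sketch of that technique, assuming C# 6 or later (GlobalSettings is an illustrative class name):

// Utilities.cs
namespace MyApp
{
    public static class GlobalSettings
    {
        public static string OutputPath { get; set; } = "out";
        public static int RetryCount { get; set; } = 3;
    }
}

// Consumer.cs
using static MyApp.GlobalSettings;

class Consumer
{
    void Run()
    {
        // No class prefix needed thanks to 'using static'.
        System.Console.WriteLine(OutputPath + " " + RetryCount);
    }
}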