C# Code Contracts: What can be statically proven and what can't? - c#

I might say I'm getting quite familiar with Code Contracts: I've read and understood most of the user manual and have been using them for quite a while now, but I still have questions. When I search SO for 'code contracts unproven' there are quite a few hits, all asking why their specific statement couldn't be statically proven. Although I could do the same and post my specific scenario (which is btw:
),
I'd much rather understand why any Code Contract condition can or can't be proven. Sometimes I'm impressed with what it can prove, and sometimes I'm... well... to say it politely: definitely not impressed. If I want to understand this, I'd like to know the mechanisms the static checker uses. I'm sure I'll learn by experience, but I'm spraying Contract.Assume statements all over the place to make the warnings go away, and I feel like that's not what Code Contracts are meant for. Googling didn't help me, so I want to ask you guys for your experiences: what (unobvious) patterns have you seen? And what made you see the light?

The contract in your construction is not satisfied. Since you are referencing an object’s field (this.data), other threads may have access to the field and may change its value between the Assume, the first parameter resolution, and the third parameter resolution. (I.e., they could be three completely different arrays.)
You should assign the array to a local variable, then use that variable throughout the method. Then the analyzer will know that the constraints are being satisfied, because no other threads will have the ability to change the reference.
var localData = this.data;
if (localData == null) return;
byte[] newData = new byte[localData.Length]; // Or whatever the datatype is.
Array.Copy(localData, newData, localData.Length); // Now, this cannot fail.
This has the added benefit of not only satisfying the constraint, but, in reality, making the code more robust in many cases.
I hope this leads you to the answer to your question. I could not actually answer your question directly, because I do not have access to a version of Visual Studio that includes the static checker. (I'm on VS2008 Pro.) My answer is based on what my own visual inspection of the code would conclude, and it appears that the static contract checker uses similar techniques. I am intrigued! I need to get me one of them. :-D
UPDATE: (Lots of speculation to follow)
Upon reflection, I think I can make a pretty good guess at what can or can't be proven (even without access to the static checker). As stated in the other answer, the static checker does not do interprocedural analysis. Therefore, with the looming possibility of multi-threaded variable accesses (as in the OP), the static checker can only deal effectively with local variables (as defined below).
By "local variables" I mean a variable that cannot be accessed by any other thread. This would include any variables declared in the method or passed as a parameter, unless the parameter is decorated with ref or out or the variable is captured in an anonymous method.
If a local variable is a value-type, then its fields are also local variables (and so on recursively).
If a local variable is a reference-type, then only the reference itself—not its fields—can be considered a local variable. This is true even of an object constructed within the method, since a constructor itself may leak a reference to the constructed object (say to a static collection for caching, for example).
So long as the static checker does not do any interprocedural analysis, any assumptions made about variables that are not local as defined above can be invalidated at any time, and, therefore, are ignored in the static analysis.
Exception 1: since strings are immutable and the length of an array is fixed once it has been created, their properties (such as Length) are subject to analysis, so long as the string or array variable itself is local. This does not include the contents of an array, which are mutable by other threads.
Exception 2: The array constructor may be known by the runtime not to leak any references to the constructed array. Therefore, an array that is constructed within the method body and not leaked outside of the method (passed as a parameter to another method, assigned to a non-local variable, etc.) has elements that may also be considered local variables.
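To make the "local variable" rule above concrete, here is a minimal sketch. It does not use the static checker at all; it only shows why a fact established about a field can stop being true before the next read, while the same fact about a local copy stays true. The DataHolder class, its Replace method, and the interference callback are hypothetical names invented for this illustration, and the callback stands in for another thread or any unanalyzed method call.

```csharp
using System;

public sealed class DataHolder
{
    private byte[] data = new byte[4];

    // Hypothetical hook: stands in for any code that can reach this.data.
    public void Replace(byte[] replacement)
    {
        data = replacement;
    }

    // Reads the field twice; the callback may swap the array in between,
    // so a fact learned from the first read need not hold for the second.
    public bool FieldLengthIsStable(Action<DataHolder> interference)
    {
        int before = data.Length;
        interference(this);
        return data.Length == before;
    }

    // Copies the reference into a local first; nothing else can reassign
    // the local, and an array's length never changes after creation.
    public bool LocalLengthIsStable(Action<DataHolder> interference)
    {
        byte[] local = data;
        int before = local.Length;
        interference(this);
        return local.Length == before;
    }
}
```

Calling FieldLengthIsStable with a callback that invokes Replace with an array of a different length returns false, while LocalLengthIsStable with the same callback returns true.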
These restrictions seem rather onerous, and I can imagine several ways this could be improved, but I don't know what has been done. Here are some other things that could, in theory, be done with the static checker. Someone who has it handy should check to see what has been done and what hasn't:
It could determine if a constructor does not leak any references to the object or its fields and consider the fields of any object so constructed to be local variables.
A no-leaks analysis could be done on other methods to determine whether a reference type passed to a method can still be considered local after that method invocation.
Variables decorated with ThreadStatic or ThreadLocal may be considered local variables.
Options could be given to ignore the possibility of using reflection to modify values. This would allow private readonly fields on reference types or static private readonly fields to be considered immutable. Also, when this option is enabled, a private or internal variable X that is only ever accessed inside a lock(X){ /**/ } construction and which is not leaked could be considered a local variable. However, these things would, in effect, reduce the reliability of the static checker, so that's kinda iffy.
Another possibility that could open up a lot of new analysis would be declaratively assigning variables and the methods that use them (and so on recursively) to a particular unique thread. This would be a major addition to the language, but it might be worth it.

The short answer is that the static code analyzer appears to be very limited. For instance, it does not detect
readonly string name = "I'm never null";
as being an invariant. From what I can gather on MSDN forums, it analyzes every method by itself (for performance reasons, not that one should think it could get much slower), which limits its knowledge when verifying the code.
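For reference, this is a sketch of how such a fact can be stated explicitly as an object invariant, using the ContractInvariantMethod attribute and Contract.Invariant from System.Diagnostics.Contracts. The Person class and NameLength property are hypothetical, and whether the checker then stops warning depends on the tooling version and settings, which this sketch does not verify.

```csharp
using System.Diagnostics.Contracts;

public sealed class Person
{
    private readonly string name = "I'm never null";

    // Declares the fact the checker did not infer on its own.
    // Without the Code Contracts tooling these calls compile away.
    [ContractInvariantMethod]
    private void ObjectInvariant()
    {
        Contract.Invariant(name != null);
    }

    public int NameLength
    {
        get { return name.Length; }
    }
}
```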
To strike a balance between the academically lofty goal of proving correctness and being able to get work done, I've resorted to decorating individual methods (or even classes, as needed) with
[ContractVerification(false)]
rather than sprinkle the logic with lots of Assumes. This may not be best practice for using CC, but it does provide a way to get rid of warnings without unchecking any of the static checker options. In order not to lose pre/post-condition checks for such methods I generally add a stub with the desired conditions and then invoke the excluded method to perform the actual work.
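A minimal sketch of that stub pattern might look like the following. The Parser class and both method names are hypothetical; only Contract.Requires, Contract.Ensures, Contract.Result and the ContractVerification attribute come from System.Diagnostics.Contracts. Whether the checker accepts the stub's postcondition without further hints is an assumption this sketch does not test.

```csharp
using System;
using System.Diagnostics.Contracts;

public static class Parser
{
    // Stub: carries the pre/post-conditions and is still verified.
    public static int ParsePositive(string text)
    {
        Contract.Requires(text != null);
        Contract.Ensures(Contract.Result<int>() > 0);
        return ParsePositiveCore(text);
    }

    // Worker: excluded from static verification, so its body produces
    // no warnings and needs no Contract.Assume calls.
    [ContractVerification(false)]
    private static int ParsePositiveCore(string text)
    {
        int value = int.Parse(text);
        if (value <= 0)
        {
            throw new ArgumentOutOfRangeException("text");
        }
        return value;
    }
}
```

Callers only ever see ParsePositive, so the contract stays visible at every call site even though the worker is skipped by the checker.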
My own assessment of Code Contracts is that it's great if you're only using the official framework libraries and do not have a lot of legacy code (e.g. when starting a new project). Anything else and it's a mixed bag of pleasure and pain.

Related

Why can I call this static method using the class name but not using a class instance?

class Program
{
static void Main(string[] args)
{
var p = new Program();
p.Main(args); // error CS0176: cannot be accessed with an instance reference; qualify it with a type name instead
Program.Main(args); // the error disappears when the type name is used
}
}
I think I understand that statics are not associated with object instances, but what I'm having trouble with is aren't classes synonymous with objects? Or aren't classes used in creating objects? So why does the error disappear when I use the class name if classes are essentially objects?
I get that I haven't yet created an instance of Main and won't be. Is that the only thing that makes the difference? Maybe it's just not being explained properly in this class I'm taking.
Your confusion is a very natural one, and it is exacerbated by the design of C# in this respect. I'll try to explain as we go, and I'll reformulate your questions to be easier to answer:
Is class synonymous with object?
No. Let's be very, very clear on this point. "Object" has a specific meaning in C#. An object is always an instance of a type. There are two broad kinds of object in C#: instances of reference types, which are copied by reference, like string, and instances of value types, which are copied by value, like int.
Later you will learn about boxing, which is the mechanism by which an instance of value type may be used in a context that expects a reference, but don't worry about that for now.
In C# a class defines a reference type. Instances of that class are objects. The class itself is not an object.
The justification for this comes from the real world. The class "all objects which are newspapers" is not itself a newspaper. The class "all people who speak French" is not itself a French speaker. It is a category error to confuse a description of a set of things with a specific example of the thing!
(You may wish to examine closely the design of prototype inheritance languages such as JavaScript. In JS we make a specific object that is the prototypical example of a kind of thing, and we make a constructor object that represents the factory for new examples of that kind of thing; the prototype and the constructor work together to make new instances, and both are genuinely objects. But again, your question is about C#, so let's stick to that for now.)
are classes used in creating objects?
Yes. We instantiate a class with new; since all classes are reference types, new produces a reference to a novel object.
So why does the error disappear when i use the class name, if classes are essentially objects?
Classes are not objects, but I understand your confusion. It certainly looks like the class name is being used in a context where you would expect an object. (You might be interested to examine closely the design of languages like Python where classes are objects, but your question is about C# so let's stick to that.)
To resolve this confusion you need to understand that the member access operator, also called the "dot operator", in C# is one of the most flexible and sophisticated operators. This makes it easy to use but hard to understand!
The key thing to understand is that the member access operator always has this form:
On the left of the dot is an expression that evaluates to a thing that has members
On the right of the dot is a simple name.
Though it is possible, and common, for thing to be an object and thing.name to be an object, it is also possible for either or both to not be an object.
When you say p.Main the compiler says "I know that p is an instance of Program, and I know that Main is a name. Does that make sense?"
The first thing the compiler does is verify that Program -- p's type -- has an accessible member Main, which it does. At this point overload resolution takes over, and we discover that the only possible meaning of Main is a static method. This is likely a mistake because p is an instance, and we're attempting to invoke a static through the instance. C#'s designers could have allowed this -- it is allowed in other languages. But since this is a likely mistake, they disallowed it.
When you type Program.Main, Program is not an object. The compiler verifies that Program refers to a type and types have members. Once again, overload resolution takes over and it determines that the only possible meaning is that Main is being invoked. Since Main is static and the receiver -- the thing on the left of the dot -- refers to a type, this is allowed.
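The two kinds of receiver can be seen side by side in a small sketch. The Temperature class is a hypothetical example, not taken from the question: FromFahrenheit is static and is reached through the type name, while ToFahrenheit is an instance method and needs an object on the left of the dot.

```csharp
public sealed class Temperature
{
    private readonly double celsius;

    public Temperature(double celsius)
    {
        this.celsius = celsius;
    }

    // Instance member: the receiver must be an expression that yields an object.
    public double ToFahrenheit()
    {
        return celsius * 9.0 / 5.0 + 32.0;
    }

    // Static member: the receiver must be the type name, e.g. Temperature.FromFahrenheit(...).
    public static Temperature FromFahrenheit(double fahrenheit)
    {
        return new Temperature((fahrenheit - 32.0) * 5.0 / 9.0);
    }
}
```

Writing someTemperature.FromFahrenheit(50.0) would be rejected by the compiler for the same reason p.Main(args) is.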
Maybe it's just not being explained properly in this class I'm taking.
I edit technical books and other course materials and a great many of them explain these concepts very poorly. Also a great many instructors have vague and confused notions about the relationships between classes, objects, variables, and so on. I encourage you to question your instructor closely on these matters until you are satisfied with their explanations.
That said, once you have a solid grasp on these matters then you can start to take shortcuts. As expert C# programmers we say "p is an object which..." because we all know that we mean "p is a variable, the value of which is a reference to an object which..."
I think it is helpful for the beginner to spell it out, but you will very quickly become more relaxed about it.
One other thing that you did not ask but is important:
What about reflection?
.NET has a reflection system which allows you to take things that are not objects, like classes, structs, interfaces, methods, properties, events, and so on, and obtain an object which describes them. (The analogy being that a mirror image is not reality but it sure looks like it enough to understand reality.)
It is important to remember that the reflection object is not the class. It is an object which describes the class. If you use reflection in your program like this:
Type t = typeof(Program);
then the value of t is a reference to the Type object that describes the characteristics of class Program. You could inspect that object and determine that there was a MethodInfo for method Main, and so on. But the object is not the class. You cannot say
t.Main();
for example. There are ways to invoke methods via reflection, but it is a mistake to think of the Type object as being the class. It reflects the class.
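As a sketch of one of those ways: the Type object can be asked for a MethodInfo, and the MethodInfo can then be invoked. The Greeter class and the InvokeGreet helper are hypothetical names for this illustration; Type.GetMethod and MethodInfo.Invoke are the standard System.Reflection calls.

```csharp
using System;
using System.Reflection;

public static class Greeter
{
    public static string Greet(string name)
    {
        return "Hello, " + name;
    }
}

public static class ReflectionDemo
{
    public static string InvokeGreet(string name)
    {
        // t describes Greeter; it is not Greeter itself, so t.Greet(...) would not compile.
        Type t = typeof(Greeter);
        MethodInfo method = t.GetMethod("Greet", BindingFlags.Public | BindingFlags.Static);

        // The first argument is the receiver; it is null because Greet is static.
        return (string)method.Invoke(null, new object[] { name });
    }
}
```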
Another question you did not ask but is germane to your education:
What you're saying here is that values are instances of objects, but certain programming language constructs such as classes are not objects that can be manipulated like values. Why is it that some programming language constructs in C# are "first class" -- they can be treated as data manipulated by the program -- and some are "second class", and cannot be so manipulated?
That question gets to the crux of language design itself. All language design is a process of examining past languages, observing their strengths and weaknesses, coming up with principles that attempt to build on strengths while mitigating weaknesses, and then resolving the countless contradictions entailed when principles come into conflict with each other.
We all want a camera that is lightweight, inexpensive, and takes great pictures, but as the saying goes, you can only have two. The designers of C# were in a similar position:
We want languages to have a small number of concepts that must be understood by the novice. Moreover, we achieve this by unifying disparate concepts into hierarchies; structs, classes, interfaces, delegates and enums are all types. if, while, foreach are all statements. And so on.
We want to be able to build programs that manipulate values that are important to the developer in powerful ways. Making functions "first class" for example opens up powerful new ways to program.
We want programs to have little enough redundancy that developers do not feel like they are forced into unnecessary ceremony, but sufficient redundancy that humans can understand the program as written.
And now the big ones:
We want the language to be general enough to allow line-of-business developers to write programs that represent any concept in their business domain
We want the language to be understandable by machines to the extent that the machines can find likely problems before the program even runs. That is, the language must be statically analyzable.
But like the camera, you can't have all of these at once. C# is a general-purpose programming language designed for programming in the large with a strong static type checker for finding real-world mistakes before they make it into production. It is precisely because we want to find the sort of error you're encountering that we do not allow types to be treated as first-class values. If you treat types as first class then huge amounts of static analysis capability goes out the window!
Other language designers made completely different choices; the consequences of Python saying "classes are a kind of function, and functions are a kind of object" dramatically moves the language towards our desired goal of hierarchical simplicity and first-class treatment of language concepts, and dramatically away from ability to statically determine correctness. That's the right choice for Python, but not for C#.
Static methods in classes are meant to be connected to the class. Other methods are meant to be connected to objects. This way, you can access static methods without having to create an object. In C++, this is the syntax you would use:
className::staticMethod();

How do I implement caching in an immutable way?

I've read and heard a lot of good things about immutability, so I decided to try it out in one of my hobby projects. I declared all of my fields as readonly, and made all methods that would usually mutate an object to return a new, modified version.
It worked great until I ran into a situation where a method should, by external protocol, return a certain information about an object without modifying it, but at the same time could be optimized by modifying the internal structure. In particular, this happens with tree path compression in a union find algorithm.
When the user calls int find(int n), the object appears unmodified to the outsider. It represents the same entity conceptually, but its internal fields are mutated to optimize the running time.
How can I implement this in an immutable way?
Short answer: you have to ensure the thread-safety by yourself.
The readonly keyword on a field gives you the assurance that the field cannot be modified after the object containing this field has been constructed.
So the only write you can have for this field is contained in the constructor (or in the field initialization), and a read through a method call cannot occur before the object is constructed, hence the thread-safety of readonly.
If you want to implement caching, you break the assumption that only one write occurs (since "caching writes" can and will occur during your reads), and thus there can be threading problems in bad cases (think you're reading lines from a file, two threads can call the find method with the same parameter but read two different lines and therefore get different results).
What you want to implement is observational immutability. This related question about memoization may help you with an elegant answer.
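A minimal sketch of observational immutability for the union-find case, under the assumption that a simple lock is an acceptable way to provide the thread-safety. The DisjointSet class and its members are hypothetical: Union returns a new instance and never changes which elements belong together in the receiver, while Find compresses paths internally without changing any answer a caller can observe.

```csharp
using System;

public sealed class DisjointSet
{
    private readonly int[] parent;                  // contents are rewritten by path compression
    private readonly object gate = new object();    // guards the internal rewrites

    public DisjointSet(int size)
    {
        parent = new int[size];
        for (int i = 0; i < size; i++)
        {
            parent[i] = i;
        }
    }

    private DisjointSet(int[] parent)
    {
        this.parent = parent;
    }

    // Looks the same from outside on every call; only the internal tree gets flatter.
    public int Find(int n)
    {
        lock (gate)
        {
            int root = n;
            while (parent[root] != root)
            {
                root = parent[root];
            }
            while (parent[n] != root)
            {
                int next = parent[n];
                parent[n] = root;
                n = next;
            }
            return root;
        }
    }

    // Returns a new set; the partition represented by this instance is unchanged.
    public DisjointSet Union(int a, int b)
    {
        int rootA = Find(a);
        int rootB = Find(b);
        int[] copy;
        lock (gate)
        {
            copy = (int[])parent.Clone();
        }
        copy[rootA] = rootB;
        return new DisjointSet(copy);
    }
}
```

Copying the whole array on every Union is the simplest choice for a sketch; a persistent data structure would avoid the copy at the cost of more code.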

How does C# compilation get around needing header files?

I've spent my professional life as a C# developer. As a student I occasionally used C but did not deeply study its compilation model. Recently I jumped on the bandwagon and have begun studying Objective-C. My first steps have only made me aware of holes in my pre-existing knowledge.
From my research, C/C++/ObjC compilation requires all encountered symbols to be pre-declared. I also understand that building is a two-step process. First you compile each individual source file into individual object files. These object files might have undefined "symbols" (which generally correspond to the identifiers declared in the header files). Second you link the object files together to form your final output. This is a pretty high-level explanation but it satisfies my curiosity enough. But I'd also like to have a similar high-level understanding of the C# build process.
Q: How does the C# build process get around the need for header files? I'd imagine perhaps the compilation step does two-passes?
(Edit: Follow up question here How do C/C++/Objective-C compare with C# when it comes to using libraries?)
UPDATE: This question was the subject of my blog for February 4th 2010. Thanks for the great question!
Let me lay it out for you. In the most basic sense the compiler is a "two pass compiler" because the phases that the compiler goes through are:
Generation of metadata.
Generation of IL.
Metadata is all the "top level" stuff that describes the structure of the code. Namespaces, classes, structs, enums, interfaces, delegates, methods, type parameters, formal parameters, constructors, events, attributes, and so on. Basically, everything except method bodies.
IL is all the stuff that goes in a method body -- the actual imperative code, rather than metadata about how the code is structured.
The first phase is actually implemented via a great many passes over the sources. It's way more than two.
The first thing we do is take the text of the sources and break it up into a stream of tokens. That is, we do lexical analysis to determine that
class c : b { }
is class, identifier, colon, identifier, left curly, right curly.
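As a toy illustration of that lexical step (not the real compiler's lexer, and covering only the handful of token kinds in the example above), a sketch could look like this. ToyLexer and Tokenize are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class ToyLexer
{
    // Turns source text into a list of token kind names.
    public static List<string> Tokenize(string source)
    {
        var tokens = new List<string>();
        int i = 0;
        while (i < source.Length)
        {
            char ch = source[i];
            if (char.IsWhiteSpace(ch))
            {
                i++;
                continue;
            }
            if (char.IsLetter(ch) || ch == '_')
            {
                // Consume a whole word, then decide whether it is the keyword or a name.
                int start = i;
                while (i < source.Length && (char.IsLetterOrDigit(source[i]) || source[i] == '_'))
                {
                    i++;
                }
                string word = source.Substring(start, i - start);
                tokens.Add(word == "class" ? "class" : "identifier");
                continue;
            }
            switch (ch)
            {
                case ':': tokens.Add("colon"); break;
                case '{': tokens.Add("left curly"); break;
                case '}': tokens.Add("right curly"); break;
                default: throw new ArgumentException("Unexpected character: " + ch);
            }
            i++;
        }
        return tokens;
    }
}
```

Joining the result of ToyLexer.Tokenize("class c : b { }") with ", " gives "class, identifier, colon, identifier, left curly, right curly".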
We then do a "top level parse" where we verify that the token streams define a grammatically correct C# program. However, we skip parsing method bodies. When we hit a method body, we just blaze through the tokens until we get to the matching close curly. We'll come back to it later; we only care about getting enough information to generate metadata at this point.
We then do a "declaration" pass where we make notes about the location of every namespace and type declaration in the program.
We then do a pass where we verify that all the types declared have no cycles in their base types. We need to do this first because in every subsequent pass we need to be able to walk up type hierarchies without having to deal with cycles.
We then do a pass where we verify that all generic parameter constraints on generic types are also acyclic.
We then do a pass where we check whether every member of every type -- methods of classes, fields of structs, enum values, and so on -- is consistent. No cycles in enums, every overriding method overrides something that is actually virtual, and so on. At this point we can compute the "vtable" layouts of all interfaces, classes with virtual methods, and so on.
We then do a pass where we work out the values of all "const" fields.
At this point we have enough information to emit almost all the metadata for this assembly. We still do not have information about the metadata for iterator/anonymous function closures or anonymous types; we do those late.
We can now start generating IL. For each method body (and properties, indexers, constructors, and so on), we rewind the lexer to the point where the method body began and parse the method body.
Once the method body is parsed, we do an initial "binding" pass, where we attempt to determine the types of every expression in every statement. We then do a whole pile of passes over each method body.
We first run a pass to transform loops into gotos and labels.
(The next few passes look for bad stuff.)
Then we run a pass to look for use of deprecated types, for warnings.
Then we run a pass that searches for uses of anonymous types that we haven't emitted metadata for yet, and emit those.
Then we run a pass that searches for bad uses of expression trees. For example, using a ++ operator in an expression tree.
Then we run a pass that looks for all local variables in the body that are defined, but not used, to report warnings.
Then we run a pass that looks for illegal patterns inside iterator blocks.
Then we run the reachability checker, to give warnings about unreachable code, and tell you when you've done something like forgotten the return at the end of a non-void method.
Then we run a pass that verifies that every goto targets a sensible label, and that every label is targeted by a reachable goto.
Then we run a pass that checks that all locals are definitely assigned before use, notes which local variables are closed-over outer variables of an anonymous function or iterator, and which anonymous functions are in reachable code. (This pass does too much. I have been meaning to refactor it for some time now.)
At this point we're done looking for bad stuff, but we still have way more passes to go before we sleep.
Next we run a pass that detects missing ref arguments to calls on COM objects and fixes them. (This is a new feature in C# 4.)
Then we run a pass that looks for stuff of the form "new MyDelegate(Foo)" and rewrites it into a call to CreateDelegate.
Then we run a pass that transforms expression trees into the sequence of factory method calls necessary to create the expression trees at runtime.
Then we run a pass that rewrites all nullable arithmetic into code that tests for HasValue, and so on.
Then we run a pass that finds all references of the form base.Blah() and rewrites them into code which does the non-virtual call to the base class method.
Then we run a pass which looks for object and collection initializers and turns them into the appropriate property sets, and so on.
Then we run a pass which looks for dynamic calls (in C# 4) and rewrites them into dynamic call sites that use the DLR.
Then we run a pass that looks for calls to removed methods. (That is, partial methods with no actual implementation, or conditional methods that don't have their conditional compilation symbol defined.) Those are turned into no-ops.
Then we look for unreachable code and remove it from the tree. No point in codegenning IL for it.
Then we run an optimization pass that rewrites trivial "is" and "as" operators.
Then we run an optimization pass that looks for switch(constant) and rewrites it as a branch directly to the correct case.
Then we run a pass which turns string concatenations into calls to the correct overload of String.Concat.
(Ah, memories. These last two passes were the first things I worked on when I joined the compiler team.)
Then we run a pass which rewrites uses of named and optional parameters into calls where the side effects all happen in the correct order.
Then we run a pass which optimizes arithmetic; for example, if we know that M() returns an int, and we have 1 * M(), then we just turn it into M().
Then we do generation of the code for anonymous types first used by this method.
Then we transform anonymous functions in this body into methods of closure classes.
Finally, we transform iterator blocks into switch-based state machines.
Then we emit the IL for the transformed tree that we've just computed.
Easy as pie!
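To give a feel for what a couple of those rewriting passes do, here are two of them written out by hand: the source form next to a roughly equivalent lowered form. This is a sketch of the idea only; the exact code the compiler generates is not claimed here, and LoweringSketch and its method names are hypothetical.

```csharp
public static class LoweringSketch
{
    // Nullable arithmetic as written in source.
    public static int? AddSugar(int? x, int? y)
    {
        return x + y;
    }

    // Roughly what "rewrite nullable arithmetic into code that tests for HasValue" means.
    public static int? AddLowered(int? x, int? y)
    {
        return (x.HasValue && y.HasValue)
            ? new int?(x.GetValueOrDefault() + y.GetValueOrDefault())
            : null;
    }

    // String concatenation as written in source.
    public static string ConcatSugar(string a, string b, string c)
    {
        return a + b + c;
    }

    // Roughly what "turn string concatenations into calls to String.Concat" means.
    public static string ConcatLowered(string a, string b, string c)
    {
        return string.Concat(a, b, c);
    }
}
```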
I see that there are multiple interpretations of the question. I answered the intra-solution interpretation, but let me fill it out with all the information I know.
The "header file metadata" is present in the compiled assemblies, so any assembly you add a reference to will allow the compiler to pull in the metadata from those.
As for things not yet compiled, part of the current solution, it will do a two-pass compilation, first reading namespaces, type names, member names, i.e. everything but the code. Then when this checks out, it will read the code and compile that.
This allows the compiler to know what exists and what doesn't exist (in its universe).
To see the two-pass compiler in effect, test the following code that has 3 problems, two declaration-related problems, and one code problem:
using System;
namespace ConsoleApplication11
{
class Program
{
public static Stringg ReturnsTheWrongType()
{
return null;
}
static void Main(string[] args)
{
CallSomeMethodThatDoesntExist();
}
public static Stringg AlsoReturnsTheWrongType()
{
return null;
}
}
}
Note that the compiler will only complain about the two Stringg types that it cannot find. If you fix those, then it complains about the method-name called in the Main method, that it cannot find.
It uses the metadata from the reference assemblies. That contains a full type declaration, same thing as you'd find in a header file.
It being a two-pass compiler accomplishes something else: you can use a type in one source file before it is declared in another source code file.
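The same holds within a single file, which makes for a compact sketch: a type can be used textually before its declaration, with no forward declaration or header. Early and Late are hypothetical names.

```csharp
public static class Early
{
    // Late is declared further down, yet this compiles: declarations are
    // collected for the whole compilation before method bodies are bound.
    public static int Use()
    {
        return new Late().Value;
    }
}

public sealed class Late
{
    public int Value
    {
        get { return 42; }
    }
}
```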
It's a 2-pass compiler. http://en.wikipedia.org/wiki/Multi-pass_compiler
All the necessary information can be obtained from the referenced assemblies.
So there are no header files but the compiler does need access to the DLLs being used.
And yes, it is a 2-pass compiler but that doesn't explain how it gets information about library types.

Is there any run-time overhead to readonly?

For some reason, I've always assumed that readonly fields have overhead associated with them, which I thought of as the CLR keeping track of whether or not a readonly field has been initialized or not. The overhead here would be some extra memory usage to keep track of the state and a check when assigning a value.
Perhaps I assumed this because I didn't know a readonly field could only be initialized inside a constructor or within the field declaration itself and without a run-time check, you wouldn't be able to guarantee it's not being assigned to multiple times in various methods. But now I know this, it could easily be statically checked by the C# compiler, right? So is that the case?
Another reason is that I've read that the usage of readonly has a 'slight' performance impact, but they never went into this claim and I can't find information on this subject, hence my question. I don't know what other performance impact there might be aside from run-time checks.
A third reason is that I saw that readonly is preserved in the compiled IL as initonly, so what is the reason for this information to be in the IL if readonly is nothing more than a guarantee by the C# compiler that the field is never assigned to outside of a constructor or declaration?
On the other hand, I've found out you can set the value of a readonly int through reflection without the CLR throwing an exception, which shouldn't be possible if readonly was a run-time check.
So my guess is: the 'readonlyness' is only a compile time feature, can anyone confirm/deny this? And if it is, what is the reason for this information to be included in the IL?
You have to look at it from the same point of view as the access modifiers. The access modifiers exist in IL, but are they really a run-time check? (1) I can't directly assign private fields at compile-time, (2) I can assign them using reflection. So far it seems no run-time check, like readonly.
But let's examine access modifiers. Do the following:
Create Assembly A.dll with public class C
Create an Assembly B.exe that references A.dll. B.exe uses class C.
Build the two assemblies. Running B.exe works just fine.
Rebuild A.dll but set class C to internal. Replace A.dll in B.exe's directory.
Now, running B.exe throws a runtime exception.
Access modifiers exist in IL as well, right? So what's their purpose? The purpose is that other assemblies that reference a .Net assembly need to know what they are allowed to access and what they are not allowed to access, both compile-time AND run-time.
Readonly seems to have a similar purpose in IL. It tells other assemblies whether they can write to a field on a particular type. However, readonly does not seem to have that same run-time check that access modifiers exhibit in my sample above. It seems that readonly is a compile-time check and does not occur in run-time. Take a look at a sample of performance here: Read-only performance vs const.
Again, this doesn't mean the IL is useless. The IL makes sure that a compile-time error occurs in the first place. Remember, when you build you don't build against code, but assemblies.
If you're using a standard, instance variable, readonly will perform nearly identically to a normal variable. The IL added becomes a compile time check, but is pretty much ignored at runtime.
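The reflection behaviour mentioned in the question can be sketched as follows for an instance field. The Settings class and the ReadonlyReflection helpers are hypothetical; FieldInfo.IsInitOnly and FieldInfo.SetValue are the standard reflection members. Static readonly fields may behave differently on newer runtimes, so this sketch deliberately sticks to an instance field and does not claim a particular result for the overwrite.

```csharp
using System;
using System.Reflection;

public sealed class Settings
{
    private readonly int limit = 10;

    public int Limit
    {
        get { return limit; }
    }
}

public static class ReadonlyReflection
{
    private static FieldInfo LimitField()
    {
        return typeof(Settings).GetField("limit", BindingFlags.Instance | BindingFlags.NonPublic);
    }

    // The readonly modifier is visible in metadata as the initonly flag.
    public static bool IsInitOnly()
    {
        return LimitField().IsInitOnly;
    }

    // Attempts the write the C# compiler would refuse to compile as "settings.limit = newValue".
    public static int OverwriteLimit(Settings settings, int newValue)
    {
        LimitField().SetValue(settings, newValue);
        return settings.Limit;
    }
}
```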
If you're using a static readonly member, things are a bit different...
Since the static readonly member is set during the static constructor, the JIT "knows" that a value exists. There is no extra memory - readonly just prevents other methods from setting this, but that's a compile time check.
Since the JIT knows this member can never change, it gets "hard-coded" at runtime, so the final effect is just like having a const value. The difference is that it will take longer during the JIT time itself, since the JIT compiler needs to do extra work to hard-wire the readonly's value into place. (This is going to be very fast, though.)
Expert C++/CLI by Marcus Heege has a reasonably good explanation of this.
One important point not yet mentioned by any other answers is that when a readonly field is accessed, or when any property is accessed, the request is satisfied using a copy of the data. If the data in question is a value type with more than 4-8 bytes of data, the cost of this extra copying can sometimes be significant. Note that while there is a big jump in cost when structs grow from 16 bytes to 17, structs can be quite a bit larger and still be faster than classes in many applications, if they're not copied too often. For example, suppose one has a type which represents the vertices of a triangle in three-dimensional space. A straightforward implementation would be a struct containing three point structs, each with three float fields; probably 36 bytes total. If the points, and the coordinates within each point, are mutable public fields, one can access someTriangle.P1.X quickly and easily, without having to copy any data except the X coordinate of vertex 1. On the other hand, if P1 were a property or a readonly field, the compiler would have to copy P1 to a temporary structure, and then read X from that.
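The copying behaviour can be observed directly. The following is a minimal sketch using hypothetical types (Counter and Holder are not from the original answers): a mutating method called through a readonly struct field runs on a temporary copy, so the field itself does not change, whereas the same call through a mutable field changes it in place.

```csharp
using System;

// Hypothetical mutable struct, for illustration only.
struct Counter
{
    public int Value;

    public void Increment()
    {
        Value++;
    }
}

// Hypothetical holder with one readonly and one mutable field of the same type.
class Holder
{
    public readonly Counter ReadOnlyCounter = new Counter();
    public Counter MutableCounter = new Counter();
}

static class Program
{
    static void Main()
    {
        var holder = new Holder();

        // Outside a constructor, a readonly struct field is treated as a value,
        // so the compiler copies it to a temporary and increments the copy.
        holder.ReadOnlyCounter.Increment();

        // A mutable field is a variable, so the method runs on the field itself.
        holder.MutableCounter.Increment();

        Console.WriteLine(holder.ReadOnlyCounter.Value); // prints 0
        Console.WriteLine(holder.MutableCounter.Value);  // prints 1
    }
}
```

The same temporary copy is what makes reading a single member of a large struct through a readonly field or property more expensive than reading it through a mutable field.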
Even if readonly only had effect at compile time, it would still be necessary to store the data in the assembly (i.e. the IL). The CLR is a Common Language Runtime - classes written in one language can both be used and extended by other languages.
Since every compiler for the CLR isn't going to know how to read and compile every other language, in order to preserve the semantics of readonly fields, that data needs to be stored in the assembly so that compilers for other languages will respect it.
Of course, the fact that the field is marked readonly means the JIT can do other things, like optimization (e.g. inline uses of the value), etc. Irrespective of the fact that you used reflection to change the field's value, creating IL which modifies an initonly field outside of the corresponding constructor (instance or static, depending on field type), will result in an unverifiable assembly.

Method can be made static, but should it?

ReSharper likes to point out multiple functions per ASP.NET page that could be made static. Does it help me if I do make them static? Should I make them static and move them to a utility class?
Performance, namespace pollution etc are all secondary in my view. Ask yourself what is logical. Is the method logically operating on an instance of the type, or is it related to the type itself? If it's the latter, make it a static method. Only move it into a utility class if it's related to a type which isn't under your control.
Sometimes there are methods which logically act on an instance but don't happen to use any of the instance's state yet. For instance, if you were building a file system and you'd got the concept of a directory, but you hadn't implemented it yet, you could write a property returning the kind of the file system object, and it would always be just "file" - but it's logically related to the instance, and so should be an instance method. This is also important if you want to make the method virtual - your particular implementation may need no state, but derived classes might. (For instance, asking a collection whether or not it's read-only - you may not have implemented a read-only form of that collection yet, but it's clearly a property of the collection itself, not the type.)
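As a rough sketch of the file system example, assuming hypothetical FileSystemObject and DirectoryObject types that are not part of the original answer: the base property uses no instance state, yet keeping it a virtual instance member lets a later subclass give a different answer.

```csharp
using System;

// Hypothetical types, for illustration only.
class FileSystemObject
{
    // Uses no instance state today, but it is logically a property of the
    // instance. Being virtual, it can be overridden by derived classes.
    public virtual string Kind
    {
        get { return "file"; }
    }
}

// Added later, once directories are implemented.
class DirectoryObject : FileSystemObject
{
    public override string Kind
    {
        get { return "directory"; }
    }
}

static class Program
{
    static void Main()
    {
        FileSystemObject[] items = { new FileSystemObject(), new DirectoryObject() };

        foreach (FileSystemObject item in items)
        {
            // Prints "file", then "directory".
            Console.WriteLine(item.Kind);
        }
    }
}
```

Had Kind been made static because it touched no state, the DirectoryObject override would not have been possible.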
Static methods versus Instance methods
Static and instance members of the C# Language Specification explains the difference. Generally, static methods can provide a very small performance enhancement over instance methods, but only in somewhat extreme situations (see this answer for some more details on that).
Rule CA1822 in FxCop or Code Analysis states:
"After [marking members as static], the compiler will emit non-virtual call sites to these members which will prevent a check at runtime for each call that ensures the current object pointer is non-null. This can result in a measurable performance gain for performance-sensitive code. In some cases, the failure to access the current object instance represents a correctness issue."
Utility Class
You shouldn't move them to a utility class unless it makes sense in your design. If the static method relates to a particular type, like a ToRadians(double degrees) method relates to a class representing angles, it makes sense for that method to exist as a static member of that type (note, this is a contrived example for the purposes of demonstration).
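A possible shape for the angle example, as a sketch only: ToRadians(double degrees) is the method named above, while the Angle struct, its Radians field, and FromDegrees are assumptions made for illustration.

```csharp
using System;

// Hypothetical Angle type; only ToRadians(double degrees) comes from the text.
struct Angle
{
    public readonly double Radians;

    private Angle(double radians)
    {
        Radians = radians;
    }

    // Static: it converts a raw number and needs no existing Angle instance,
    // but it clearly belongs with the type that represents angles.
    public static double ToRadians(double degrees)
    {
        return degrees * Math.PI / 180.0;
    }

    // Hypothetical factory that reuses the static helper.
    public static Angle FromDegrees(double degrees)
    {
        return new Angle(ToRadians(degrees));
    }
}

static class Program
{
    static void Main()
    {
        // Called on the type, not on an instance.
        Console.WriteLine(Angle.ToRadians(90.0));
        Console.WriteLine(Angle.FromDegrees(180.0).Radians);
    }
}
```

Callers find the conversion where they would look for it, on Angle, rather than in an unrelated utility class.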
Marking a method as static within a class makes it obvious that it doesn't use any instance members, which can be helpful to know when skimming through the code.
You don't necessarily have to move it to another class unless it's meant to be shared by another class that's just as closely associated, concept-wise.
I'm sure this isn't happening in your case, but one "bad smell" I've seen in code I've had to suffer through maintaining was a heavy reliance on static methods.
Unfortunately, they were static methods that assumed a particular application state. (Why sure, we'll only have one user per application! Why not have the User class keep track of that in static variables?) They were glorified ways of accessing global variables. They also had static constructors (!), which are almost always a bad idea. (I know there are a couple of reasonable exceptions.)
However, static methods are quite useful when they factor out domain-logic that doesn't actually depend on the state of an instance of the object. They can make your code a lot more readable.
Just be sure you're putting them in the right place. Are the static methods intrusively manipulating the internal state of other objects? Can a good case be made that their behavior belongs to one of those classes instead? If you're not separating concerns properly, you may be in for headaches later.
This is an interesting read:
http://thecuttingledge.com/?p=57
ReSharper isn’t actually suggesting you make your method static.
You should ask yourself why that method is in that class as opposed to, say, one of the classes that shows up in its signature...
but here is what the ReSharper documentation says:
http://confluence.jetbrains.net/display/ReSharper/Member+can+be+made+static
Just to add to @Jason True's answer, it is important to realise that just putting 'static' on a method doesn't guarantee that the method will be 'pure'. It will be stateless with regard to the class in which it is declared, but it may well access other 'static' objects which have state (application configuration etc.). This may not always be a bad thing, but one of the reasons that I personally tend to prefer static methods when I can is that if they are pure, you can test and reason about them in isolation, without having to worry about the surrounding state.
For complex logic within a class, I have found private static methods useful in creating isolated logic, in which the instance inputs are clearly defined in the method signature and no instance side-effects can occur. All outputs must be via return value or out/ref parameters. Breaking down complex logic into side-effect-free code blocks can improve the code's readability and the development team's confidence in it.
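To make the pattern concrete, here is a small sketch with a hypothetical Order class (none of these names come from the original answer). The instance method hands its state to a private static helper, so everything the calculation depends on is visible in the helper's signature.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class, for illustration only.
class Order
{
    private readonly List<decimal> _linePrices = new List<decimal>();
    private readonly decimal _taxRate;

    public Order(decimal taxRate)
    {
        _taxRate = taxRate;
    }

    public void AddLine(decimal price)
    {
        _linePrices.Add(price);
    }

    public decimal Total()
    {
        // Instance state is passed in explicitly; the helper cannot reach
        // any other field of this object.
        return CalculateTotal(_linePrices, _taxRate);
    }

    // Private static: no access to instance members, and the only output
    // is the return value.
    private static decimal CalculateTotal(IEnumerable<decimal> prices, decimal taxRate)
    {
        decimal subtotal = 0m;
        foreach (decimal price in prices)
        {
            subtotal += price;
        }
        return subtotal * (1m + taxRate);
    }
}

static class Program
{
    static void Main()
    {
        var order = new Order(0.10m);
        order.AddLine(10m);
        order.AddLine(20m);
        Console.WriteLine(order.Total());
    }
}
```

If a later change tries to read or write another field inside CalculateTotal, the compiler rejects it, which is the guard the answer describes.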
On the other hand it can lead to a class polluted by a proliferation of utility methods. As usual, logical naming, documentation, and consistent application of team coding conventions can alleviate this.
You should do what is most readable and intuitive in a given scenario.
The performance argument is not a good one except in the most extreme situations as the only thing that is actually happening is that one extra parameter (this) is getting pushed onto the stack for instance methods.
ReSharper does not check the logic. It only checks whether the method uses instance members.
If the method is private and only called by instance methods (maybe just one), this is a sign to leave it as an instance method.
I hope you have already understood the difference between static and instance methods. Also, there can be a long answer and a short one. Long answers are already provided by others.
My short answer: Yes, you can convert them to static methods as ReSharper suggests. There is no harm in doing so. Rather, by making the method static, you are actually guarding the method so that you do not unnecessarily slip any instance members into that method. In that way, you can achieve an OOP principle "Minimize the accessibility of classes and members".
When ReSharper is suggesting that an instance method can be converted to a static one, it is actually telling you, "Why the .. is this method sitting in this class when it is not actually using any of its state?" So, it gives you food for thought. Then, it is you who can decide whether or not to move that method to a static utility class. According to the SOLID principles, a class should have only one core responsibility. So, you can do a better cleanup of your classes in that way. Sometimes, you do need some helper methods even in your instance class. If that is the case, you may keep them within a #region helper.
If the functions are shared across many pages, you could also put them in a base page class, and then have all asp.net pages using that functionality inherit from it (and the functions could still be static as well).
Making a method static means you can call the method from outside the class without first creating an instance of that class. This is helpful when working with third-party vendor objects or add-ons. Imagine if you had to first create a Console object "con" before calling con.WriteLine();
It helps to control namespace pollution.
Just my tuppence: Adding all of the shared static methods to a utility class allows you to add
using static className;
to your using statements, which makes the code faster to type and easier to read. For example, I have a large number of what would be called "global variables" in some code I inherited. Rather than make global variables in a class that was an instance class, I set them all as static properties of a global class. It does the job, if messily, and I can just reference the properties by name because I have the static namespace already referenced.
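A minimal sketch of that approach, assuming a hypothetical Demo.AppSettings class standing in for the "global class" (the names are illustrative, not from the original code). The using static directive requires C# 6 or later.

```csharp
using System;
using static System.Math;       // imports the static members of System.Math
using static Demo.AppSettings;  // imports the hypothetical settings class below

namespace Demo
{
    // Hypothetical holder for former "global variables".
    static class AppSettings
    {
        public static int MaxRetries { get; set; } = 3;
    }

    static class Program
    {
        static void Main()
        {
            // Neither "AppSettings." nor "Math." is needed as a prefix.
            MaxRetries = Max(MaxRetries, 5);
            Console.WriteLine(MaxRetries); // prints 5
        }
    }
}
```

The shorter call sites come at a cost: a reader can no longer see at a glance which class a name such as MaxRetries belongs to.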
I have no idea if this is good practice or not. I have so much to learn about C# 4/5 and so much legacy code to refactor that I am just trying to let the Roslyn tips guide me.
Joey
