When calling a delegate you always have to check if it is not null. This is an often cause for errors.
Since delegates are more or less just a list of functions, I would assume, that this could have been easily checked by the delegate itself.
Does anybody know, why it has been implemented as it is?
This may be stating the obvious, but you can declare your event to point to a no-op handler and then you don't need to check for null when invoking.
public event EventHandler MyEvent = delegate { };
Then you can call the handlers pointed to by MyEvent without checking for null.
The delegate itself can't check it. Since it is null you can't call methods on it.
The C# compiler on the other hand could automatically insert the null-check. But I don't know what behavior to use if the delegate is a function with a return-value.
One might argue that the C# compiler should insert that check for void-functions to avoid boilerplate code in that common case. But that decision is now in the past, you can only fix it without breaking existing programs if you get a time-machine.
You can use extension methods(since they can be called on a null object) to automate the null-check for events:
http://blogs.microsoft.co.il/blogs/shayf/archive/2009/05/25/a-handy-extension-method-raise-events-safely.aspx
And since a parameter is a temp variable you don't need to manually assign the event to a temp variable for thread-safety.
Internally, the compiler will generate 2 methods add_MyEvent and remove_MyEvent for each event declared in a class. When you write, MyEvent += ..., the compiler will actually generate a call to add_MyEvent which in turn will call System.Delegate.Combine.
Combine takes 2 delegates in parameter and creates a new (multicast) delegate from the 2 original delegates, handling the case when one of them is null (which is the case the first time you call +=).
I guess the compiler could have been a bit smarter and also handled the event call so that when you call
MyEvent(), it would actually generate a null check and actually invoke the delegate only if not null. (It would be nice to have Eric Lippert's view on that one).
The reason is performance (actually, that's my best guess). Events and delegates are made with a lot of compiler magic, stuff that cannot be replicated by mere C# code. But underneath it goes something like this:
A delegate is a compiler-generated class that inherits from MulticastDelegate class, which itself derives from the Delegate class. Note these two classes are magic and you can't inherit from them yourself (well, maybe you can, but you won't be able to use them very well).
An event however is implemented something like this:
private MyEventDelegateClass __cgbe_MyEvent; // Compiler generated backing field
public event MyEventDelegateClass MyEvent
{
add
{
this.__cgbe_MyEvent = (MyEventDelegateClass)Delegate.Combine(this.__cgbe_MyEvent, value);
}
remove
{
this.__cgbe_MyEvent = (MyEventDelegateClass)Delegate.Remove(this.__cgbe_MyEvent, value);
}
get
{
return this.__cgbe_MyEvent;
}
}
OK, so this isn't real code (nor is it exactly the way it is in real life), but it should give you an idea of what's going on.
The point is that the backing field initially really is null. That way there is no overhead for creating an instance of MyEventDelegateClass when creating your object. And that's why you have to check for null before invoking. Later on, when handlers get added/removed, an instance of MyEventDelegateClass is created and assigned to the field. And when the last handler is removed, this instance is also lost and the backing field is reset to null again.
This is the principle of "you don't pay for what you don't use". As long as you don't use the event, there will be no overhead for it. No extra memory, no extra CPU cycles.
I'd guess the reasons are history and consistency.
It is consistent with the way other types are handled; e.g. there's no language or platform assistence in dealing with nulls - it's the programmers responsibility to ensure the null placeholder is never actually used - there's no means of only optionally calling a method if the object reference you wish to call it on is non-null either, after all.
It's history, since null references happen to be included by default in most types, even though this isn't necessary. That is, rather than treating reference types like value types and require an additional "nullability" annotation to permit nullability, reference types are always nullable. I'm sure there were reasons for this back in the day when Java and .NET were designed, but it introduces many unnecessary bugs and complexities which are easy to avoid in a strongly typed language (e.g., .NET value types). But given the historical inclusion of null as a type-system wide "invalid" value, so to speak, it's only natural to do so for delegates as well.
If a delegate instance is null, how can it "check itself"? It's null... that means it does not exist, basically. There is nothing there to check itself in the first place. That's why your code which attempts to call the delegate must do that checking, first.
"Since delegates are more or less just
a list of functions, I would assume,
that this could have been easily
checked by the delegate itself".
Your assumption is wrong. Plain and simple.
Related
In the talk "Lowering in C#: What's really going on in your code? - David Wengier" https://youtu.be/gc1AxbNybvw?t=1606 at the 26:39 mark, the presenter says that if the following code:
was lowered by the compiler to a method:
this could be a potential problem for memory usage, I quote:
Because the compiler doesn't know what the body of the lambda does,
this could cause the whole class C to have to live on in memory
forever. If the class would be Windows.Form, it could have a lot of
resources
And then he says, that for this reason, the compiler actually generates a class.
I fail to understand the reasoning how could the memory leak appear.
For example, the C# 7 local functions actually turn into methods, so why wouldn't it work for non-capturing lambdas too?
If the compiler generates a non-static method and binds it to an Action<string>, the this pointer has to be bound along with it whether it is used in the lambda or not. You cannot call a non-static method without a this pointer; this is a VM-level restriction that the compiler cannot circumvent. So the this pointer is captured and thus keeps the class instance alive as long as the Action is alive, which, if it is bound to a long-lived event handler, for example, could be a long time.
But IMO the whole argument starts from the wrong place. He says, "You kind think it conceptually gets lowered to [a method in the same class]." And then he goes on to explain why that's a bad idea. But I think this is a strawman argument. Why would the compiler generate a non-static method on the same class in the first place? There is no reason to do that. It is certainly not the line of thinking the C# designers used when coming up with a lowering for lambdas.
The argument should instead go like this: a lambda can capture things it uses (it's a closure). In order to capture them, a place to store them is needed. So the compiler generates a class, which has fields for the captured things.
If the class happens to capture nothing, there are no fields, but there's still a class. Now, the compiler could, if the lambda captures nothing, generate a static method in the containing class, or a non-static method if the lambda only catches this. But there are two problems with this:
Those methods could be discovered by reflection.
It increases the complexity of the compiler (completely separate ways of lowering the lambda for different capture patterns) for no gain (this other way of lowering has no real advantage).
So the compiler simply doesn't do that. The only thing it does, as an optimization, is reuse the same instance of the lambda class every time if the lambda is stateless (i.e. captures nothing). This is just a tiny branch in the existing lowering code and thus has far less complexity.
If I create a bool within my class, just something like bool check, it defaults to false.
When I create the same bool within my method, bool check(instead of within the class), i get an error "use of unassigned local variable check". Why?
Yuval and David's answers are basically correct; summing up:
Use of an unassigned local variable is a likely bug, and this can be detected by the compiler at low cost.
Use of an unassigned field or array element is less likely a bug, and it is harder to detect the condition in the compiler. Therefore the compiler makes no attempt to detect the use of an uninitialized variable for fields, and instead relies upon the initialization to the default value in order to make the program behavior deterministic.
A commenter to David's answer asks why it is impossible to detect the use of an unassigned field via static analysis; this is the point I want to expand upon in this answer.
First off, for any variable, local or otherwise, it is in practice impossible to determine exactly whether a variable is assigned or unassigned. Consider:
bool x;
if (M()) x = true;
Console.WriteLine(x);
The question "is x assigned?" is equivalent to "does M() return true?" Now, suppose M() returns true if Fermat's Last Theorem is true for all integers less than eleventy gajillion, and false otherwise. In order to determine whether x is definitely assigned, the compiler must essentially produce a proof of Fermat's Last Theorem. The compiler is not that smart.
So what the compiler does instead for locals is implements an algorithm which is fast, and overestimates when a local is not definitely assigned. That is, it has some false positives, where it says "I can't prove that this local is assigned" even though you and I know it is. For example:
bool x;
if (N() * 0 == 0) x = true;
Console.WriteLine(x);
Suppose N() returns an integer. You and I know that N() * 0 will be 0, but the compiler does not know that. (Note: the C# 2.0 compiler did know that, but I removed that optimization, as the specification does not say that the compiler knows that.)
All right, so what do we know so far? It is impractical for locals to get an exact answer, but we can overestimate not-assigned-ness cheaply and get a pretty good result that errs on the side of "make you fix your unclear program". That's good. Why not do the same thing for fields? That is, make a definite assignment checker that overestimates cheaply?
Well, how many ways are there for a local to be initialized? It can be assigned within the text of the method. It can be assigned within a lambda in the text of the method; that lambda might never be invoked, so those assignments are not relevant. Or it can be passed as "out" to anothe method, at which point we can assume it is assigned when the method returns normally. Those are very clear points at which the local is assigned, and they are right there in the same method that the local is declared. Determining definite assignment for locals requires only local analysis. Methods tend to be short -- far less than a million lines of code in a method -- and so analyzing the entire method is quite quick.
Now what about fields? Fields can be initialized in a constructor of course. Or a field initializer. Or the constructor can call an instance method that initializes the fields. Or the constructor can call a virtual method that initailizes the fields. Or the constructor can call a method in another class, which might be in a library, that initializes the fields. Static fields can be initialized in static constructors. Static fields can be initialized by other static constructors.
Essentially the initializer for a field could be anywhere in the entire program, including inside virtual methods that will be declared in libraries that haven't been written yet:
// Library written by BarCorp
public abstract class Bar
{
// Derived class is responsible for initializing x.
protected int x;
protected abstract void InitializeX();
public void M()
{
InitializeX();
Console.WriteLine(x);
}
}
Is it an error to compile this library? If yes, how is BarCorp supposed to fix the bug? By assigning a default value to x? But that's what the compiler does already.
Suppose this library is legal. If FooCorp writes
public class Foo : Bar
{
protected override void InitializeX() { }
}
is that an error? How is the compiler supposed to figure that out? The only way is to do a whole program analysis that tracks the initialization static of every field on every possible path through the program, including paths that involve choice of virtual methods at runtime. This problem can be arbitrarily hard; it can involve simulated execution of millions of control paths. Analyzing local control flows takes microseconds and depends on the size of the method. Analyzing global control flows can take hours because it depends on the complexity of every method in the program and all the libraries.
So why not do a cheaper analysis that doesn't have to analyze the whole program, and just overestimates even more severely? Well, propose an algorithm that works that doesn't make it too hard to write a correct program that actually compiles, and the design team can consider it. I don't know of any such algorithm.
Now, the commenter suggests "require that a constructor initialize all fields". That's not a bad idea. In fact, it is such a not-bad idea that C# already has that feature for structs. A struct constructor is required to definitely-assign all fields by the time the ctor returns normally; the default constructor initializes all the fields to their default values.
What about classes? Well, how do you know that a constructor has initialized a field? The ctor could call a virtual method to initialize the fields, and now we are back in the same position we were in before. Structs don't have derived classes; classes might. Is a library containing an abstract class required to contain a constructor that initializes all its fields? How does the abstract class know what values the fields should be initialized to?
John suggests simply prohibiting calling methods in a ctor before the fields are initialized. So, summing up, our options are:
Make common, safe, frequently used programming idioms illegal.
Do an expensive whole-program analysis that makes the compilation take hours in order to look for bugs that probably aren't there.
Rely upon automatic initialization to default values.
The design team chose the third option.
When I create the same bool within my method, bool check(instead of
within the class), i get an error "use of unassigned local variable
check". Why?
Because the compiler is trying to prevent you from making a mistake.
Does initializing your variable to false change anything in this particular path of execution? Probably not, considering default(bool) is false anyway, but it is forcing you to be aware that this is happening. The .NET environment prevents you from accessing "garbage memory", since it will initialize any value to their default. But still, imagine this was a reference type, and you'd pass an uninitialized (null) value to a method expecting a non-null, and get a NRE at runtime. The compiler is simply trying to prevent that, accepting the fact that this may sometimes result in bool b = false statements.
Eric Lippert talks about this in a blog post:
The reason why we want to make this illegal is not, as many people
believe, because the local variable is going to be initialized to
garbage and we want to protect you from garbage. We do in fact
automatically initialize locals to their default values. (Though the C
and C++ programming languages do not, and will cheerfully allow you to
read garbage from an uninitialized local.) Rather, it is because the
existence of such a code path is probably a bug, and we want to throw
you in the pit of quality; you should have to work hard to write that
bug.
Why doesn't this apply to a class field? Well, I assume the line had to be drawn somewhere, and local variables initialization are a lot easier to diagnose and get right, as opposed to class fields. The compiler could do this, but think of all the possible checks it would need to be making (where some of them are independent of the class code itself) in order to evaluate if each field in a class is initialized. I am no compiler designer, but I am sure it would be definitely harder as there are plenty of cases that are taken into account, and has to be done in a timely fashion as well. For every feature you have to design, write, test and deploy and the value of implementing this as opposed to the effort put in would be non-worthy and complicated.
Why do local variables require initialization, but fields do not?
The short answer is that code accessing uninitialised local variables can be detected by the compiler in a reliable way, using static analysis. Whereas this isn't the case of fields. So the compiler enforces the first case, but not the second.
Why do local variables require initialization?
This is no more than a design decision of the C# language, as explained by Eric Lippert. The CLR and the .NET environment do not require it. VB.NET, for example, will compile just fine with uninitialised local variables, and in reality the CLR initialises all uninitialised variables to default values.
The same could occur with C#, but the language designers chose not to. The reason is that initialised variables are a huge source of bugs and so, by mandating initialisation, the compiler helps to cut down on accidental mistakes.
Why don't fields require initialization?
So why doesn't this compulsory explicit initialisation happen with fields within a class? Simply because that explicit initialisation could occur during construction, through a property being called by an object initializer, or even by a method being called long after the event. The compiler cannot use static analysis to determine if every possible path through the code leads to the variable being explicitly initialised before us. Getting it wrong would be annoying, as the developer could be left with valid code that won't compile. So C# doesn't enforce it at all and the CLR is left to automatically initialise fields to a default value if not explicitly set.
What about collection types?
C#'s enforcement of local variable initialisation is limited, which often catches developers out. Consider the following four lines of code:
string str;
var len1 = str.Length;
var array = new string[10];
var len2 = array[0].Length;
The second line of code won't compile, as it's trying to read an uninitialised string variable. The fourth line of code compiles just fine though, as array has been initialised, but only with default values. Since the default value of a string is null, we get an exception at run-time. Anyone who's spent time here on Stack Overflow will know that this explicit/implicit initialisation inconsistency leads to a great many "Why am I getting a “Object reference not set to an instance of an object” error?" questions.
Good answers above, but I thought I'd post a much simpler/shorter answer for people to lazy to read a long one (like me).
Class
class Foo {
private string Boo;
public Foo() { /** bla bla bla **/ }
public string DoSomething() { return Boo; }
}
Property Boo may or may not have been initialized in the constructor. So when it finds return Boo; it doesn't assume that it's been initialized. It simply suppresses the error.
Function
public string Foo() {
string Boo;
return Boo; // triggers error
}
The { } characters define the scope of a block of code. The compiler walks the branches of these { } blocks keeping track of stuff. It can easily tell that Boo was not initialized. The error is then triggered.
Why does the error exist?
The error was introduced to reduce the number of lines of code required to make source code safe. Without the error the above would look like this.
public string Foo() {
string Boo;
/* bla bla bla */
if(Boo == null) {
return "";
}
return Boo;
}
From the manual:
The C# compiler does not allow the use of uninitialized variables. If the compiler detects the use of a variable that might not have been initialized, it generates compiler error CS0165. For more information, see Fields (C# Programming Guide). Note that this error is generated when the compiler encounters a construct that might result in the use of an unassigned variable, even if your particular code does not. This avoids the necessity of overly-complex rules for definite assignment.
Reference: https://msdn.microsoft.com/en-us/library/4y7h161d.aspx
Quick note on the accepted answer: I disagree with a small part of Jeffrey's answer, namely the point that since Delegate had to be a reference type, it follows that all delegates are reference types. (It simply isn't true that a multi-level inheritance chain rules out value types; all enum types, for example, inherit from System.Enum, which in turn inherits from System.ValueType, which inherits from System.Object, all reference types.) However I think the fact that, fundamentally, all delegates in fact inherit not just from Delegate but from MulticastDelegate is the critical realization here. As Raymond points out in a comment to his answer, once you've committed to supporting multiple subscribers, there's really no point in not using a reference type for the delegate itself, given the need for an array somewhere.
See update at bottom.
It has always seemed strange to me that if I do this:
Action foo = obj.Foo;
I am creating a new Action object, every time. I'm sure the cost is minimal, but it involves allocation of memory to later be garbage collected.
Given that delegates are inherently themselves immutable, I wonder why they couldn't be value types? Then a line of code like the one above would incur nothing more than a simple assignment to a memory address on the stack*.
Even considering anonymous functions, it seems (to me) this would work. Consider the following simple example.
Action foo = () => { obj.Foo(); };
In this case foo does constitute a closure, yes. And in many cases, I imagine this does require an actual reference type (such as when local variables are closed over and are modified within the closure). But in some cases, it shouldn't. For instance in the above case, it seems that a type to support the closure could look like this: I take back my original point about this. The below really does need to be a reference type (or: it doesn't need to be, but if it's a struct it's just going to get boxed anyway). So, disregard the below code example. I leave it only to provide context for answers the specfically mention it.
struct CompilerGenerated
{
Obj obj;
public CompilerGenerated(Obj obj)
{
this.obj = obj;
}
public void CallFoo()
{
obj.Foo();
}
}
// ...elsewhere...
// This would not require any long-term memory allocation
// if Action were a value type, since CompilerGenerated
// is also a value type.
Action foo = new CompilerGenerated(obj).CallFoo;
Does this question make sense? As I see it, there are two possible explanations:
Implementing delegates properly as value types would have required additional work/complexity, since support for things like closures that do modify values of local variables would have required compiler-generated reference types anyway.
There are some other reasons why, under the hood, delegates simply can't be implemented as value types.
In the end, I'm not losing any sleep over this; it's just something I've been curious about for a little while.
Update: In response to Ani's comment, I see why the CompilerGenerated type in my above example might as well be a reference type, since if a delegate is going to comprise a function pointer and an object pointer it'll need a reference type anyway (at least for anonymous functions using closures, since even if you introduced an additional generic type parameter—e.g., Action<TCaller>—this wouldn't cover types that can't be named!). However, all this does is kind of make me regret bringing the question of compiler-generated types for closures into the discussion at all! My main question is about delegates, i.e., the thing with the function pointer and the object pointer. It still seems to me that could be a value type.
In other words, even if this...
Action foo = () => { obj.Foo(); };
...requires the creation of one reference type object (to support the closure, and give the delegate something to reference), why does it require the creation of two (the closure-supporting object plus the Action delegate)?
*Yes, yes, implementation detail, I know! All I really mean is short-term memory storage.
The question boils down to this: the CLI (Common Language Infrastructure) specification says that delegates are reference types. Why is this so?
One reason is clearly visible in the .NET Framework today. In the original design, there were two kinds of delegates: normal delegates and "multicast" delegates, which could have more than one target in their invocation list. The MulticastDelegate class inherits from Delegate. Since you can't inherit from a value type, Delegate had to be a reference type.
In the end, all actual delegates ended up being multicast delegates, but at that stage in the process, it was too late to merge the two classes. See this blog post about this exact topic:
We abandoned the distinction between Delegate and MulticastDelegate
towards the end of V1. At that time, it would have been a massive
change to merge the two classes so we didn’t do so. You should
pretend that they are merged and that only MulticastDelegate exists.
In addition, delegates currently have 4-6 fields, all pointers. 16 bytes is usually considered the upper bound where saving memory still wins out over extra copying. A 64-bit MulticastDelegate takes up 48 bytes. Given this, and the fact that they were using inheritance suggests that a class was the natural choice.
There is only one reason that Delegate needs to be a class, but it's a big one: while a delegate could be small enough to allow efficient storage as a value type (8 bytes on 32-bit systems, or 16 bytes on 64-bit systems), there's no way it could be small enough to efficiently guarantee if one thread attempts to write a delegate while another thread attempts to execute it, the latter thread wouldn't end up either invoking the old method on the new target, or the new method on the old target. Allowing such a thing to occur would be a major security hole. Having delegates be reference types avoids this risk.
Actually, even better than having delegates be structure types would be having them be interfaces. Creating a closure requires creating two heap objects: a compiler-generated object to hold any closed-over variables, and a delegate to invoke the proper method on that object. If delegates were interfaces, the object which held the closed-over variables could itself be used as the delegate, with no other object required.
Imagine if delegates were value types.
public delegate void Notify();
void SignalTwice(Notify notify) { notify(); notify(); }
int counter = 0;
Notify handler = () => { counter++; }
SignalTwice(handler);
System.Console.WriteLine(counter); // what should this print?
Per your proposal, this would internally be converted to
struct CompilerGenerated
{
int counter = 0;
public Execute() { ++counter; }
};
Notify handler = new CompilerGenerated();
SignalTwice(handler);
System.Console.WriteLine(counter); // what should this print?
If delegate were a value type, then SignalEvent would get a copy of handler, which means that a brand new CompilerGenerated would be created (a copy of handler) and passed to SignalEvent. SignalTwice would execute the delegate twice, which increments the counter twice in the copy. And then SignalTwice returns, and the function prints 0, because the original was not modified.
Here's an uninformed guess:
If delegates were implemented as value-types, instances would be very expensive to copy around since a delegate-instance is relatively heavy. Perhaps MS felt it would be safer to design them as immutable reference types - copying machine-word sized references to instances are relatively cheap.
A delegate instance needs, at the very least:
An object reference (the "this" reference for the wrapped method if it is an instance method).
A pointer to the wrapped function.
A reference to the object containing the multicast invocation list. Note that a delegate-type should support, by design, multicast using the same delegate type.
Let's assume that value-type delegates were implemented in a similar manner to the current reference-type implementation (this is perhaps somewhat unreasonable; a different design may well have been chosen to keep the size down) to illustrate. Using Reflector, here are the fields required in a delegate instance:
System.Delegate: _methodBase, _methodPtr, _methodPtrAux, _target
System.MulticastDelegate: _invocationCount, _invocationList
If implemented as a struct (no object header), these would add up to 24 bytes on x86 and 48 bytes on x64, which is massive for a struct.
On another note, I want to ask how, in your proposed design, making the CompilerGenerated closure-type a struct helps in any way. Where would the created delegate's object pointer point to? Leaving the closure type instance on the stack without proper escape analysis would be extremely risky business.
I can tell that making delegates as reference types is definitely a bad design choice. They could be value types and still support multi-cast delegates.
Imagine that Delegate is a struct composed of, let's say:
object target;
pointer to the method
It can be a struct, right?
The boxing will only occur if the target is a struct (but the delegate itself will not be boxed).
You may think it will not support MultiCastDelegate, but then we can:
Create a new object that will hold the array of normal delegates.
Return a Delegate (as struct) to that new object, which will implement Invoke iterating over all its values and calling Invoke on them.
So, for normal delegates, that are never going to call two or more handlers, it could work as a struct.
Unfortunately, that is not going to change in .Net.
As a side note, variance does not requires the Delegate to be reference types. The parameters of the delegate should be reference types. After all, if you pass a string were an object is required (for input, not ref or out), then no cast is needed, as string is already an object.
I saw this interesting conversation on the Internet:
Immutable doesn't mean it has to be a value type. And something that
is a value type is not required to be immutable. The two often go
hand-in-hand, but they are not actually the same thing, and there are
in fact counter-examples of each in the .NET Framework (the String
class, for example).
And the answer:
The difference being that while immutable reference types are
reasonably common and perfectly reasonable, making value types mutable
is almost always a bad idea, and can result in some very confusing
behaviour!
Taken from here
So, in my opinion the decision was made by language usability aspects, and not by compiler technological difficulties. I love nullable delegates.
I guess one reason is support for multi cast delegates Multi cast delegates are more complex than simply a few fields indicating target and method.
Another thing that's only possible in this form is delegate variance. This kind of variance requires a reference conversion between the two types.
Interestingly F# defines it's own function pointer type that's similar to delegates, but more lightweight. But I'm not sure if it's a value or reference type.
In his excellent book, CLR Via C#, Jeffrey Richter said that he doesn't like properties, and recommends not to use them. He gave some reason, but I don't really understand. Can anyone explain to me why I should or should not use properties?
In C# 3.0, with automatic properties, does this change?
As a reference, I added Jeffrey Richter's opinions:
• A property may be read-only or write-only; field access is always readable and writable.
If you define a property, it is best to offer both get and set accessor methods.
• A property method may throw an exception; field access never throws an exception.
• A property cannot be passed as an out or ref parameter to a method; a field can. For
example, the following code will not compile:
using System;
public sealed class SomeType
{
private static String Name
{
get { return null; }
set {}
}
static void MethodWithOutParam(out String n) { n = null; }
public static void Main()
{
// For the line of code below, the C# compiler emits the following:
// error CS0206: A property or indexer may not
// be passed as an out or ref parameter
MethodWithOutParam(out Name);
}
}
• A property method can take a long time to execute; field access always completes immediately.
A common reason to use properties is to perform thread synchronization, which
can stop the thread forever, and therefore, a property should not be used if thread synchronization
is required. In that situation, a method is preferred. Also, if your class can be
accessed remotely (for example, your class is derived from System.MashalByRefObject),
calling the property method will be very slow, and therefore, a method is preferred to a
property. In my opinion, classes derived from MarshalByRefObject should never use
properties.
• If called multiple times in a row, a property method may return a different value each
time; a field returns the same value each time. The System.DateTime class has a readonly
Now property that returns the current date and time. Each time you query this
property, it will return a different value. This is a mistake, and Microsoft wishes that they
could fix the class by making Now a method instead of a property.
• A property method may cause observable side effects; field access never does. In other
words, a user of a type should be able to set various properties defined by a type in any
order he or she chooses without noticing any different behavior in the type.
• A property method may require additional memory or return a reference to something
that is not actually part of the object's state, so modifying the returned object has no
effect on the original object; querying a field always returns a reference to an object that
is guaranteed to be part of the original object's state. Working with a property that
returns a copy can be very confusing to developers, and this characteristic is frequently
not documented.
Jeff's reason for disliking properties is because they look like fields - so developers who don't understand the difference will treat them as if they're fields, assuming that they'll be cheap to execute etc.
Personally I disagree with him on this particular point - I find properties make the client code much simpler to read than the equivalent method calls. I agree that developers need to know that properties are basically methods in disguise - but I think that educating developers about that is better than making code harder to read using methods. (In particular, having seen Java code with several getters and setters being called in the same statement, I know that the equivalent C# code would be a lot simpler to read. The Law of Demeter is all very well in theory, but sometimes foo.Name.Length really is the right thing to use...)
(And no, automatically implemented properties don't really change any of this.)
This is slightly like the arguments against using extension methods - I can understand the reasoning, but the practical benefit (when used sparingly) outweighs the downside in my view.
Well, lets take his arguments one by one:
A property may be read-only or
write-only; field access is always
readable and writable.
This is a win for properties, since you have more fine-grained control of access.
A property method may throw an
exception; field access never throws
an exception.
While this is mostly true, you can very well call a method on a not initialized object field, and have an exception thrown.
• A property cannot be passed as an
out or ref parameter to a method; a
field can.
Fair.
• A property method can take a long
time to execute; field access always
completes immediately.
It can also take very little time.
• If called multiple times in a row, a
property method may return a different
value each time; a field returns the
same value each time.
Not true. How do you know the field's value has not changed (possibly by another thread)?
The System.DateTime class has a
readonly Now property that returns the
current date and time. Each time you
query this property, it will return a
different value. This is a mistake,
and Microsoft wishes that they could
fix the class by making Now a method
instead of a property.
If it is a mistake it's a minor one.
• A property method may cause
observable side effects; field access
never does. In other words, a user of
a type should be able to set various
properties defined by a type in any
order he or she chooses without
noticing any different behavior in the
type.
Fair.
• A property method may require
additional memory or return a
reference to something that is not
actually part of the object's state,
so modifying the returned object has
no effect on the original object;
querying a field always returns a
reference to an object that is
guaranteed to be part of the original
object's state. Working with a
property that returns a copy can be
very confusing to developers, and this
characteristic is frequently not
documented.
Most of the protestations could be said for Java's getters and setters too --and we had them for quite a while without such problems in practice.
I think most of the problems could be solved by better syntax highlighting (i.e differentiating properties from fields) so the programmer knows what to expect.
I haven't read the book, and you haven't quoted the part of it you don't understand, so I'll have to guess.
Some people dislike properties because they can make your code do surprising things.
If I type Foo.Bar, people reading it will normally expect that this is simply accessing a member field of the Foo class. It's a cheap, almost free, operation, and it's deterministic. I can call it over and over, and get the same result every time.
Instead, with properties, it might actually be a function call. It might be an infinite loop. It might open a database connection. It might return different values every time I access it.
It is a similar argument to why Linus hates C++. Your code can act surprising to the reader. He hates operator overloading: a + b doesn't necessarily mean simple addition. It may mean some hugely complicated operation, just like C# properties. It may have side effects. It may do anything.
Honestly, I think this is a weak argument. Both languages are full of things like this. (Should we avoid operator overloading in C# as well? After all, the same argument can be used there)
Properties allow abstraction. We can pretend that something is a regular field, and use it as if it was one, and not have to worry about what goes on behind the scenes.
That's usually considered a good thing, but it obviously relies on the programmer writing meaningful abstractions. Your properties should behave like fields. They shouldn't have side effects, they shouldn't perform expensive or unsafe operations. We want to be able to think of them as fields.
However, I have another reason to find them less than perfect. They can not be passed by reference to other functions.
Fields can be passed as ref, allowing a called function to access it directly. Functions can be passed as delegates, allowing a called function to access it directly.
Properties... can't.
That sucks.
But that doesn't mean properties are evil or shouldn't be used. For many purposes, they're great.
Back in 2009, this advice merely seemed like bellyaching of the Who Moved My Cheese variety. Today, it's almost laughably obsolete.
One very important point that many answers seem to tiptoe around but don't quite address head on is that these purported "dangers" of properties are an intentional part of the framework design!
Yes, properties can:
Specify different access modifiers for the getter and setter. This is an advantage over fields. A common pattern is to have a public getter and a protected or internal setter, a very useful inheritance technique which isn't achievable by fields alone.
Throw an exception. To date, this remains one of the most effective methods of validation, especially when working with UI frameworks that involve data-binding concepts. It's much more difficult to ensure that an object remains in a valid state when working with fields.
Take a long time to execute. The valid comparison here is with methods, which take equally long - not fields. No basis is given for the statement "a method is preferred" other than one author's personal preference.
Return different values from its getter on subsequent executions. This almost seems like a joke in such close proximity to the point extolling the virtues of ref/out parameters with fields, whose value of a field after a ref/out call is pretty much guaranteed to be different from its previous value, and unpredictably so.
If we're talking about the specific (and practically academic) case of single-threaded access with no afferent couplings, it's fairly well understood that it's just bad property design to have visible-state-changing side-effects, and maybe my memory is fading, but I just can't seem to recall any examples of folks using DateTime.Now and expecting the same value to come out every time. At least not any instances where they wouldn't have screwed it up just as badly with a hypothetical DateTime.Now().
Cause observable side effects - which is of course precisely the reason properties were invented as a language feature in the first place. Microsoft's own Property Design guidelines indicate that setter order shouldn't matter, as to do otherwise would imply temporal coupling. Certainly, you can't achieve temporal coupling with fields alone, but that's only because you can't cause any meaningful behaviour at all to happen with fields alone, until some method is executed.
Property accessors can actually help prevent certain types of temporal coupling by forcing the object into a valid state before any action is taken - for example, if a class has a StartDate and an EndDate, then setting the EndDate before the StartDate could force the StartDate back as well. This is true even in multi-threaded or asynchronous environments, including the obvious example of an event-driven user interface.
Other things that properties can do which fields can't include:
Lazy loading, one of the most effective ways of preventing initialization-order errors.
Change Notifications, which are pretty much the entire basis for the MVVM architecture.
Inheritance, for example defining an abstract Type or Name so derived classes can provide interesting but nevertheless constant metadata about themselves.
Interception, thanks to the above.
Indexers, which everyone who has ever had to work with COM interop and the inevitable spew of Item(i) calls will recognize as a wonderful thing.
Work with PropertyDescriptor which is essential for creating designers and for XAML frameworks in general.
Richter is clearly a prolific author and knows a lot about the CLR and C#, but I have to say, it seems like when he originally wrote this advice (I'm not sure if it's in his more recent revisions - I sincerely hope not) that he just didn't want to let go of old habits and was having trouble accepting the conventions of C# (vs. C++, for example).
What I mean by this is, his "properties considered harmful" argument essentially boils down to a single statement: Properties look like fields, but they might not act like fields. And the problem with the statement is, it isn't true, or at best it's highly misleading. Properties don't look like fields - at least, they aren't supposed to look like fields.
There are two very strong coding conventions in C# with similar conventions shared by other CLR languages, and FXCop will scream at you if you don't follow them:
Fields should always be private, never public.
Fields should be declared in camelCase. Properties are PascalCase.
Thus, there is no ambiguity over whether Foo.Bar = 42 is a property accessor or a field accessor. It's a property accessor and should be treated like any other method - it might be slow, it might throw an exception, etc. That's the nature of Abstraction - it's entirely up to the discretion of the declaring class how to react. Class designers should apply the principle of least surprise but callers should not assume anything about a property except that it does what it says on the tin. That's on purpose.
The alternative to properties is getter/setter methods everywhere. That's the Java approach, and it's been controversial since the beginning. It's fine if that's your bag, but it's just not how we roll in the .NET camp. We try, at least within the confines of a statically-typed system, to avoid what Fowler calls Syntactic Noise. We don't want extra parentheses, extra get/set warts, or extra method signatures - not if we can avoid them without any loss of clarity.
Say whatever you like, but foo.Bar.Baz = quux.Answers[42] is always going to be a lot easier to read than foo.getBar().setBaz(quux.getAnswers().getItem(42)). And when you're reading thousands of lines of this a day, it makes a difference.
(And if your natural response to the above paragraph is to say, "sure it's hard to read, but it would be easier if you split it up in multiple lines", then I'm sorry to say that you have completely missed the point.)
I don't see any reasons why you shouldn't use Properties in general.
Automatic properties in C# 3+ only simplify syntax a bit (a la syntatic sugar).
It's just one person's opinion. I've read quite a few c# books and I've yet to see anyone else saying "don't use properties".
I personally think properties are one of the best things about c#. They allow you to expose state via whatever mechanism you like. You can lazily instantiate the first time something is used and you can do validation on setting a value etc. When using and writing them, I just think of properties as setters and getters which a much nicer syntax.
As for the caveats with properties, there are a couple. One is probably a misuse of properties, the other can be subtle.
Firstly, properties are types of methods. It can be surprising if you place complicated logic in a property because most users of a class will expect the property to be fairly lightweight.
E.g.
public class WorkerUsingMethod
{
// Explicitly obvious that calculation is being done here
public int CalculateResult()
{
return ExpensiveLongRunningCalculation();
}
}
public class WorkerUsingProperty
{
// Not at all obvious. Looks like it may just be returning a cached result.
public int Result
{
get { return ExpensiveLongRunningCalculation(); }
}
}
I find that using methods for these cases helps to make a distinction.
Secondly, and more importantly, properties can have side-effects if you evaluate them while debugging.
Say you have some property like this:
public int Result
{
get
{
m_numberQueries++;
return m_result;
}
}
Now suppose you have an exception that occurs when too many queries are made. Guess what happens when you start debugging and rollover the property in the debugger? Bad things. Avoid doing this! Looking at the property changes the state of the program.
These are the only caveats I have. I think the benefits of properties far outweigh the problems.
That reason must have been given within a very specific context. It's usually the other way round - it is recomended to use properties as they give you a level of abstraction enabling you to change behaviour of a class without affecting its clients...
I can't help picking on the details of Jeffrey Richter's opinions:
A property may be read-only or write-only; field access is always readable and writable.
Wrong: Fields can marked read-only so only the object's constructor can write to them.
A property method may throw an exception; field access never throws an exception.
Wrong: The implementation of a class can change the access modifier of a field from public to private. Attempts to read private fields at runtime will always result in an exception.
I don't agree with Jeffrey Richter, but I can guess why he doesn't like properties (I haven't read his book).
Even though, properties are just like methods (implementation-wise), as a user of a class, I expect that its properties behave "more or less" like a public field, e.g:
there's no time-consuming operation going on inside the property getter/setter
the property getter has no side effects (calling it multiple times, does not change the result)
Unfortunately, I have seen properties which did not behave that way. But the problem are not the properties themselves, but the people who implemented them. So it just requires some education.
The argument assumes that properties are bad because they look like fields, but can do surprising actions. This assumption is invalidated by .NET programmers' expectactions:
Properties don't look like fields. Fields look like properties.
• A property method may throw an exception; field access never throws an exception.
So, a field is like a property that is guaranteed to never throw an exception.
• A property cannot be passed as an out or ref parameter to a method; a field can.
So, a field is like a property, but it has additional capabilities: passing to a ref/out accepting methods.
• A property method can take a long time to execute; field access always completes immediately. [...]
So, a field is like a fast property.
• If called multiple times in a row, a property method may return a different value each time; a field returns the same value each time. The System.DateTime class has a readonly Now property that returns the current date and time.
So, a field is like a property that is guaranteed to return the same value unless the field was set to a different value.
• A property method may cause observable side effects; field access never does.
Again, a field is a property that is guaranteed to not do that.
• A property method may require additional memory or return a reference to something that is not actually part of the object's state, so modifying the returned object has no effect on the original object; querying a field always returns a reference to an object that is guaranteed to be part of the original object's state. Working with a property that returns a copy can be very confusing to developers, and this characteristic is frequently not documented.
This one may be surprising, but not because it's a property that does this. Rather, it's that barely anyone returns mutable copies in properties, so that 0.1% case is surprising.
There is a time when I consider not using properties, and that is in writing .Net Compact Framework code. The CF JIT compiler does not perform the same optimisation as the desktop JIT compiler and does not optimise simple property accessors, so in this case adding a simple property causes a small amount of code bloat over using a public field. Usually this wouldn't be an issue, but almost always in the Compact Framework world you are up against tight memory constraints, so even tiny savings like this do count.
You shouldn't avoid using them but you should use them with qualification and care, for the reasons given by other contributors.
I once saw a property called something like Customers that internally opened an out-of-process call to a database and read the customer list. The client code had a 'for (int i to Customers.Count)' which was causing a separate call to the database on each iteration and for the access of the selected Customer. That's an egregious example that demonstrates the principle of keeping the property very light - rarely more than a internal field access.
One argument FOR using properties is that they allow you to validate the value being set. Another is that the value of of the property may be a derived value, not a single field, like TotalValue = amount * quantity.
Personally I only use properties when creating simple get / set methods. I stray away from it when coming to complicated data structures.
I haven't read his book but I agree with him.
It's quite a contradiction on the usual paradigm.
They look like a field but had many sideeffects. Exceptions, performance, sideeffects, read/write access difference, etc.
If you care about readability and reliability i think you'd care about this as well.
In my experience i never gained anything from properties and only felt like using them when i was lazy.
But many times i encountered people using properties as if they were fields inadvertently creating all sort of issues.
Ive worked professionally on many languages for quite a time and never needed them, not even once.
Of course other people might find them useful and that's fine too.
Depends on your user case and what you value.
I personally believe they are just 'features ' that you can choose to use or not.
Invoking methods instead of properties greatly reduces the readability of the invoking code. In J#, for example, using ADO.NET was a nightmare because Java doesn't support properties and indexers (which are essentially properties with arguments). The resulting code was extremely ugly, with empty parentheses method calls all over the place.
The support for properties and indexers is one of the basic advantages of C# over Java.
One thing I am concerned with is that I discovered two ways of registering delegates to events.
OnStuff += this.Handle;
OnStuff += new StuffEventHandler(this.Handle);
The first one is clean, and it makes sense doing "OnStuff -= this.Handle;" to unregister from the event... But with the latter case, should I do "OnStuff -= new StuffEventHandler(this.Handle);"? It feels like I am not removing anything at all, since I'm throwing in another StuffEventHandler reference. Does the event compare the delegate by reference? I am concerned I could start a nasty memory pool here. Get me? I don't have the reference to the "new StuffEventHandler" I previously registered.
What is the downside of doing #1?
What is benefit of doing #2?
Number one is just shorthand that will generate the same MSIL as the 2nd one, at compile type it'll look at this.Handle and infer the delegate to instantiate. But you should never unsubscribe by using new.
So there is no difference between the 2, just some syntactic sugar to make our code cleaner.
You don't need to worry about keeping a reference to the originally registered delegate, and you will not start a "nasty memory pool".
When you call "OnStuff -= new StuffEventHandler(this.Handle);" the removal code does not compare the delegate you are removing by reference: it checks for equality by comparing references to the target method(s) that the delegate will call, and removes the matching delegates from "OnStuff".
By the way, "OnStuff" itself is a delegate object (the event keyword that I assume you have in your declaration simply restricts the accessibility of the delegate).
I was under the impression that 2 is just syntax sugar. They should be exactly the same thing.
If I remember correctly, the first alternative is merely syntactic sugar for the second.