Is there a way to constrain anonymous function parameter's scope? - c#

I have a method that takes an anonymous function parameter. This function's parameter is provided by a local variable.
public void DoSomething<T>(Action<T> method) where T : new()
{
    T instance = new T();
    method.Invoke(instance);
}
I want to prevent creating a closure. Local variable should go out of scope when DoSomething<T> is finished. Is there a way to constrain it at compile time?
Here's the situation I want to avoid:
Foo capturedInstance = null;
DoSomething<Foo>(item => capturedInstance = item);
capturedInstance.Call();

Unfortunately*, that's not possible. You have little to no control over what a method does with its arguments. It would be possible to work around if you weren't using generic types, but you are, so just don't worry about that kind of situation. (I hope you don't have to.)
* Actually, I consider it "fortunately". This isn't C++ we're talking about here.

If T were a struct, it would be possible for code which held a field of type T or an array of type T[] to pass the field or an array element as a ref parameter to an outside method. That method would be able to operate directly and efficiently on that field or array slot without having to make temporary copies of the struct, but once the method returned, the type holding the field or array could be confident that outside code could no longer access that slot. Outside code could have made a copy of the field's or element's contents, of course, but it won't be able to change the original unless or until the type holding the struct again exposes it to outside code.
Unfortunately, if the T is a mutable class, exposing a reference will allow outside code to promiscuously copy and pass around that reference forevermore. For that reason, mutable classes are much worse data holders than mutable structs. If one wants to allow outside code to make use of a class without exposing a direct reference, it's necessary to create a wrapper class and interface. Unfortunately, there's no way of doing this without either using a truly horrible amount of duplicated code, or else using Reflection to generate wrappers at run-time.
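The struct case described above can be sketched roughly as follows. This is only an illustration under assumptions: the names RefAction, Slot, and Owner are hypothetical and do not come from the question, and a custom delegate is declared because Action<T> cannot take a ref parameter.

```csharp
using System;

// Hypothetical delegate type: Action<T> cannot declare a ref parameter.
public delegate void RefAction<T>(ref T item);

public struct Slot
{
    public int Value;
}

public class Owner
{
    private Slot slot; // the storage location being lent out

    public int CurrentValue { get { return slot.Value; } }

    // The callback can modify the field in place, but only while this call runs.
    public void DoSomething(RefAction<Slot> method)
    {
        method(ref slot);
    }
}

public static class RefLendingDemo
{
    public static int[] Run()
    {
        var owner = new Owner();
        Slot captured = default(Slot);

        owner.DoSomething((ref Slot s) =>
        {
            s.Value = 10;  // changes the owner's field directly
            captured = s;  // keeps a copy of the contents, not the storage location
        });

        captured.Value = 99; // affects only the outside copy
        return new[] { owner.CurrentValue, captured.Value };
    }
}
```

After Run() returns, the owner's field still holds 10 while the captured copy holds 99, which is the guarantee the answer describes: outside code may keep a copy, but it cannot keep access to the original slot.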

I think your code already does what you want.
The delegate representing the lambda item => capturedInstance = item is passed only to DoSomething<Foo>, and that reference is not handed out to anybody else. So, the delegate would go out of scope when the containing method finishes. And so will the local variable that is captured by the lambda expression.
You would only fail to get the behavior you want if you passed a reference to the delegate around.

Related

Force function input parameters to be immutable?

I've just spent the best part of 2 days trying to track down a bug. It turns out I was accidentally mutating the values that were provided as input to a function.
IEnumerable<DataLog> FilterIIR(IEnumerable<DataLog> buffer)
{
    double notFilter = 1.0 - FilterStrength;
    var filteredVal = buffer.FirstOrDefault()?.oilTemp ?? 0.0;
    foreach (var item in buffer)
    {
        filteredVal = (item.oilTemp * notFilter) + (filteredVal * FilterStrength);
        /* Mistake here!
        item.oilTemp = (float)filteredVal;
        yield return item;
        */
        // Correct version!
        yield return new DataLog()
        {
            oilTemp = (float)filteredVal,
            ambTemp = item.ambTemp,
            oilCond = item.oilCond,
            logTime = item.logTime
        };
    }
}
My programming language of preference is usually C# or C++ depending on what I think suits the requirements better (this is part of a larger program that suits C# better)...
Now in C++ I would have been able to guard against such a mistake by accepting constant iterators, which prevent you from modifying the values as you retrieve them (though I might need to build a new container for the return value). I've done a little searching and can't find any simple way to do this in C#. Does anyone know of one?
I was thinking I could make an IReadOnlyEnumerable<T> class which takes an IEnumerable as a constructor, but then I realized that unless it makes a copy of the values as you retrieve them it won't actually have any effect, because the underlying value can still be modified.
Is there any way I might be able to protect against such errors in future? Some wrapper class, or even if it's a small code snippet at the top of each function I want to protect, anything would be fine really.
The only sort of reasonable approach I can think of at the moment that'll work is to define a ReadOnly version of every class I need, then have a non-readonly version that inherits and overloads the properties and adds functions to provide a mutable version of the same class.

The problem here isn't really about the IEnumerable. An IEnumerable is effectively read-only: you can't add or remove things through it. What's mutable is your DataLog class.
Because DataLog is a reference type, item holds a reference to the original object, instead of a copy of the object. This, plus the fact that DataLog is mutable, allows you to mutate the parameters passed in.
So on a high level, you can either:
make a copy of DataLog, or;
make DataLog immutable
or both...
What you are doing now is "making a copy of DataLog". Another way of doing this is changing DataLog from a class to a struct. This way, you'll always create a copy of it when passing it to methods (unless you mark the parameter with ref). So be careful when using this method because it might silently break existing methods that assume a pass-by-reference semantic.
You can also make DataLog immutable. This means removing all the setters. Optionally, you can add methods named WithXXX that return a copy of the object with only one property different. If you choose to do this, your FilterIIR would look like:
yield return item.WithOilTemp(filteredVal);
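A minimal sketch of what such an immutable DataLog might look like is shown here. The property types are assumptions (the question only shows that oilTemp is assigned from a float cast), and WithOilTemp is the hypothetical helper named in this answer.

```csharp
using System;

// Hypothetical immutable version of DataLog; the property types are assumed.
public sealed class DataLog
{
    public float oilTemp { get; }
    public float ambTemp { get; }
    public float oilCond { get; }
    public DateTime logTime { get; }

    public DataLog(float oilTemp, float ambTemp, float oilCond, DateTime logTime)
    {
        this.oilTemp = oilTemp;
        this.ambTemp = ambTemp;
        this.oilCond = oilCond;
        this.logTime = logTime;
    }

    // Returns a new object with one property changed; the original is left untouched.
    public DataLog WithOilTemp(double newOilTemp)
    {
        return new DataLog((float)newOilTemp, ambTemp, oilCond, logTime);
    }
}
```

Because there are no setters, the mistaken line from the question (assigning to item.oilTemp) would no longer compile.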
"The only sort of reasonable approach I can think of at the moment that'll work is to define a ReadOnly version of every class I need, then have a non-readonly version that inherits and overloads the properties and adds functions to provide a mutable version of the same class."
You don't actually need to do this. Notice how List<T> implements IReadOnlyList<T>, even though List<T> is clearly mutable. You could write an interface called IReadOnlyDataLog. This interface would only have the getters of DataLog. Then, have FilterIIR accept an IEnumerable<IReadOnlyDataLog> and have DataLog implement IReadOnlyDataLog. This way, you will not accidentally mutate the DataLog objects in FilterIIR.
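The interface approach could look roughly like this. It is a sketch under assumptions: the property types are guessed, FilterStrength is passed as a parameter named filterStrength to keep the example self-contained, and the Filters class is a hypothetical container.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Read-only view: getters only.
public interface IReadOnlyDataLog
{
    float oilTemp { get; }
    float ambTemp { get; }
    float oilCond { get; }
    DateTime logTime { get; }
}

// The mutable class keeps its setters and also implements the read-only view.
public class DataLog : IReadOnlyDataLog
{
    public float oilTemp { get; set; }
    public float ambTemp { get; set; }
    public float oilCond { get; set; }
    public DateTime logTime { get; set; }
}

public static class Filters
{
    public static IEnumerable<DataLog> FilterIIR(
        IEnumerable<IReadOnlyDataLog> buffer, double filterStrength)
    {
        double notFilter = 1.0 - filterStrength;
        double filteredVal = buffer.FirstOrDefault()?.oilTemp ?? 0.0;
        foreach (IReadOnlyDataLog item in buffer)
        {
            filteredVal = (item.oilTemp * notFilter) + (filteredVal * filterStrength);
            // item.oilTemp = (float)filteredVal; // would not compile: no setter on the interface
            yield return new DataLog
            {
                oilTemp = (float)filteredVal,
                ambTemp = item.ambTemp,
                oilCond = item.oilCond,
                logTime = item.logTime
            };
        }
    }
}
```

Callers can still pass a List<DataLog> directly, because IEnumerable<T> is covariant for reference types. Note that this guards against accidents only: code that deliberately casts an item back to DataLog can still mutate it.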

Why does a field in a struct lose its value when a field in an identical class does not?

I have a Struct with a field in it that loses its value. I can declare the field static and that solves the problem. I can also just change struct to class (changing nothing else) and that also solves the problem. I was just wondering why this is?
Structs are passed by value. In other words, when you pass a struct, you're passing a copy of its value. So if you take a copy of the value and change it, then the original will appear unchanged. You changed the copy, not the original.
Without seeing your code I cannot be sure, but I figure this is what's happening.
This doesn't happen for classes, as they're reference types: the caller and the callee both refer to the same object.
It's worth mentioning that this is why structs should be immutable -- that is, that once they're created, they do not change their value. Operations that provide modified versions return new structs.
EDIT: In the comments below, @supercat suggests that mutable properties can be more convenient. However, property setters on structs can cause weird failures too. Here's an example that can catch you by surprise unless you deeply understand how structs work. For me, it's reason enough to avoid mutable structs altogether.
Consider the following types:
struct Rectangle {
    public double Left { get; set; }
}
class Shape {
    public Rectangle Bounds { get; private set; }
}
Ok, now imagine this code:
myShape.Bounds.Left = 100;
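Perhaps surprisingly, this cannot change the shape's bounds at all. (The C# compiler actually rejects the statement with error CS1612, because it would only modify a temporary copy.) Why? Let's rewrite the code in a longer, roughly equivalent form: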
var bounds = myShape.Bounds;
bounds.Left = 100;
It's easier to see here how the value of Bounds is copied to a local variable, and then its value is changed. However at no point is the original value in Shape updated.
This is a pretty compelling reason to make all public structs immutable. If you know what you're doing, mutable structs can be handy, but personally I only really use them in that form as private nested types.
As @supercat points out, the alternative is a little unsightly:
myShape.Bounds = new Rectangle(100, myShape.Bounds.Top,
    myShape.Bounds.Width, myShape.Bounds.Height);
Sometimes it's more convenient to add helper methods:
myShape.Bounds = myShape.Bounds.WithLeft(100);
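A sketch of an immutable Rectangle with such a helper might look like this. The Top, Width, and Height properties and the four-argument constructor are assumed from the call shown above, and Shape.Bounds is given a public setter here (unlike the earlier snippet) so that outside code can assign it.

```csharp
public struct Rectangle
{
    public double Left { get; }
    public double Top { get; }
    public double Width { get; }
    public double Height { get; }

    public Rectangle(double left, double top, double width, double height)
    {
        Left = left;
        Top = top;
        Width = width;
        Height = height;
    }

    // Returns a modified copy; the original value is unchanged.
    public Rectangle WithLeft(double left)
    {
        return new Rectangle(left, Top, Width, Height);
    }
}

public class Shape
{
    // Public setter assumed so callers can replace the whole value.
    public Rectangle Bounds { get; set; }
}
```

With these types, myShape.Bounds = myShape.Bounds.WithLeft(100); replaces the stored value as a whole, so there is no temporary copy to be silently modified.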
When a struct is passed by value, the system will make a copy of the struct for the callee, so it can see its contents, and perhaps modify its own copy, but cannot affect the fields in the caller's copy. It's also possible to pass structs by ref, in which case the callee will be able to work with the caller's copy of the struct, modifying it if desired, and even pass it by ref to other functions which could do likewise. Note that the only way the called function can make the caller's copy of the struct available to other functions, though, is to pass it by ref, and the called function can't return until all functions to which it has passed the struct by ref have also returned. Thus, the caller can be assured that any changes which might occur to the structure as a consequence of the function call will have occurred by the time it returns.
This behavior is different from class objects; if a function passes a mutable class object to another function, it has no way of knowing if or when that other function will cause that object to be mutated immediately or at any future time, even after the function has finished running. The only way one can ever be sure that any mutable object won't be mutated by outside code is to be the sole holder of that object from the moment of its creation until its abandonment.
While one who is not used to value semantics may initially be "surprised" at the fact passing a struct by value simply gives the called function a copy of it, and assigning one struct storage location to another simply copies the contents of the struct, the guarantees that value types offer can be very useful. Since Point is a structure, one can know that a statement like MyPoints[5].X += 1; (assuming MyPoints is an array) will affect MyPoints[5].X but will not affect any other Point. One can further be assured that the only way MyPoints[5].X will change is if either MyPoints gets replaced with another array, or something writes to MyPoints[5]. By contrast, if Point were a class and MyPoints[5] had ever been exposed to the outside world, the only way of knowing whether the aforementioned statement would affect field/property X of any other storage locations of type Point would be to examine every single storage location of type Point or Object that existed anywhere within the code to see if it pointed to the same instance as MyPoints[5]. Since there's no way for code to examine all of the storage locations of a particular type, such assurance would be impossible if MyPoints[5] had ever been exposed to the outside world.
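The difference in aliasing can be seen in a small sketch. PointStruct and PointClass are hypothetical stand-ins for the two possible definitions of Point discussed above.

```csharp
public struct PointStruct { public int X; }
public class PointClass { public int X; }

public static class AliasDemo
{
    public static int[] Run()
    {
        var structs = new PointStruct[6];
        PointStruct copy = structs[5];   // copies the contents of the slot
        structs[5].X += 1;               // modifies the array slot in place

        var classes = new PointClass[6];
        classes[5] = new PointClass();
        PointClass alias = classes[5];   // copies the reference, not the object
        classes[5].X += 1;               // visible through every alias

        return new[] { structs[5].X, copy.X, classes[5].X, alias.X };
    }
}
```

In the struct case the earlier copy is unaffected by the later write; in the class case the change shows up through the other reference as well.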
There is one annoying wrinkle with structs, though: generally, the system will only allow structures to be passed by ref if the called code is allowed to write to the structure in question. Struct method calls and property getters, however, receive this as a ref parameter but do not have the above restriction. Instead, when invoking a struct method or property getter on a read-only structure, the system will make a copy of the structure, pass that copy by ref to the method or property getter, and then discard it. Since the system has no way of knowing whether a method or property getter will try to mutate this, it won't complain in such cases--it will just generate silly code. If one avoids mutating this in anything other than property setters (the system won't allow the use of property setters on read-only structures), however, one can avoid problems.
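The wrinkle with read-only structures can be reproduced with a readonly field. This is a minimal sketch with hypothetical names (Counter, Holder):

```csharp
public struct Counter
{
    public int Value;

    // A method that mutates `this`, which the answer advises against.
    public void Increment() { Value++; }
}

public class Holder
{
    public readonly Counter ReadOnlyCounter = new Counter();
    public Counter WritableCounter = new Counter();
}

public static class ReadOnlyStructDemo
{
    public static int[] Run()
    {
        var holder = new Holder();
        holder.ReadOnlyCounter.Increment(); // compiles, but runs on a temporary copy
        holder.WritableCounter.Increment(); // runs on the field itself
        return new[] { holder.ReadOnlyCounter.Value, holder.WritableCounter.Value };
    }
}
```

The call on the readonly field compiles without complaint, yet the field keeps its original value, while the same call on the writable field takes effect.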

c# IEnumerable property. Does adding an item to the collection call the mutator or just the accessor?

Suppose I have
class A
{
    private List<B> liProperty = new List<B>();
    public List<B> LiProperty
    {
        get { return liProperty; }
        set { liProperty = value; /* will I get called when someone calls A::LiProperty.Add()? */ }
    }
}
Then
A a = new A();
a.LiProperty.Add(new B());
Will the mutator ever be called?
My instincts say that get is returning a pointer to the list so the add method is being called directly on the object, but then again C# does some funky stuff sometimes with immutable types. Anybody know the answer for certain?
Your instinct is correct. In your code snippet above, the get accessor is called, and the Add method is called on the List returned.
Changing the property means making it point to a different List<B>. Adding an object to the list does not change which List<B> the property points to. The code will not call the setter unless you write a.LiProperty = new List<B>();
No.
Like any other property, the setter will only run if you write LiProperty = something.
That will replace the list with a new list instance (unless you do something funky in your setter).
In general, collection properties should be readonly.
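One way to check this is to count accessor calls. The counters below (GetCalls, SetCalls) are hypothetical additions for illustration only:

```csharp
using System.Collections.Generic;

public class B { }

public class A
{
    private List<B> liProperty = new List<B>();

    public int GetCalls { get; private set; }
    public int SetCalls { get; private set; }

    public List<B> LiProperty
    {
        get { GetCalls++; return liProperty; }
        set { SetCalls++; liProperty = value; }
    }
}

public static class AccessorDemo
{
    public static int[] Run()
    {
        var a = new A();
        a.LiProperty.Add(new B());     // calls the getter, then List<B>.Add
        int setCallsAfterAdd = a.SetCalls;
        a.LiProperty = new List<B>();  // only an assignment calls the setter
        return new[] { a.GetCalls, setCallsAfterAdd, a.SetCalls };
    }
}
```

After the Add call the setter count is still zero; it only becomes one after the explicit assignment.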
In .net, a property of type T is nothing more than a link to a pair of methods, one of which has the signature: T get_method( [optional arguments, in case of indexed property] ); and one of which has the signature: void set_method(T value, [optional arguments, in case of indexed property] );. In C#, the set_method is only called when the property is on the left half of an assignment operator; otherwise, the get_method is called(*). Note that in .net, a property generally has no way of knowing what is done with the result it returns, nor any way of getting notified when the caller is done with the returned object.
(*) In vb.net, it is permissible in some cases to write code which would appear to pass a property by reference. The actual compiler behavior in such cases is to 'get' the property to a temp, pass that temp by reference, call the function, and then 'set' the property to the value the function left in temp.
A more useful pattern from a code perspective would be to have a method which calls a specified delegate with the field backing the property as a 'ref' parameter. This would make it possible to do something like:
MyShape.ActOnBounds((ref Rectangle bounds) => {bounds.x -= bounds.width/2; bounds.width *= 2;});
and have MyShape act upon the changed value of "bounds" without having to copy or create any superfluous instances of Rectangle. Unfortunately, while such a transformation could be reasonably performant (especially if ActOnBounds could be a static method accepting one or more generic type parameters, and call the supplied routine with ref parameters of those types), the client-side code is a bit ugly.
Incidentally, even if .net didn't have struct types, the above pattern could still be useful (better than get-manipulate-set) with primitive types or reference properties where the reference itself may have to be changed, since it allows the use of things like Interlocked.Increment and Interlocked.CompareExchange.
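A rough sketch of the pattern follows. The delegate type RefAction, the hitCount field, and the ActOnHitCount method are hypothetical names introduced for illustration; the Rectangle fields match the ones used in the call above.

```csharp
using System.Threading;

// Hypothetical delegate type: the built-in Action<T> cannot take a ref parameter.
public delegate void RefAction<T>(ref T value);

public struct Rectangle
{
    public double x, y, width, height;
}

public class Shape
{
    private Rectangle bounds;
    private int hitCount;

    public Shape(Rectangle initial) { bounds = initial; }

    public Rectangle Bounds { get { return bounds; } }
    public int HitCount { get { return hitCount; } }

    // The caller's code works directly on the backing field; no copy is made.
    public void ActOnBounds(RefAction<Rectangle> action) { action(ref bounds); }

    // Same idea for a primitive field, which makes Interlocked usable by callers.
    public void ActOnHitCount(RefAction<int> action) { action(ref hitCount); }
}

public static class RefPropertyDemo
{
    public static Shape Run()
    {
        var shape = new Shape(new Rectangle { x = 10, width = 4 });
        shape.ActOnBounds((ref Rectangle b) => { b.x -= b.width / 2; b.width *= 2; });
        shape.ActOnHitCount((ref int count) => Interlocked.Increment(ref count));
        return shape;
    }
}
```

Since the owner is the one invoking the delegate, it could also run its own logic after the callback returns, for example validation or change notification.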
If you want to expose a list in such a way that you can find out when it is updated, one way to do that is to define an immutable struct type which holds a reference to your object and implements IList by redirecting all IList methods to private methods of your object (which will in turn perform them on the list and act upon them as appropriate). Note that despite being an immutable struct, the new type would act with reference semantics (meaning that making a copy of the struct and calling the "Add" method on it would cause the new item to be visible in both the original and copied structs).
Doing things in this way would avoid boxing in the scenario where methods are performed directly on the returned list property (e.g. myThing.LiProperty.Add("George");) and in the scenario where the returned type is used to create a variable (e.g. var myList = myThing.LiProperty;). Boxing would occur if and when the returned object was assigned to a variable, or passed to a parameter, of an interface type (e.g. IList).

What's the method representation in memory?

While thinking a little bit about programming in Java/C#, I wondered how methods that belong to objects are represented in memory and how this affects multithreading.
1. Is a method instantiated for each object in memory separately, or do all objects of the same type share one instance of the method?
2. If the latter, how does the executing thread know which object's attributes to use?
3. Is it possible to modify the code of a method in C# with reflection for one, and only one, object of many objects of the same type?
4. Is a static method which does not use class attributes always thread safe?
I tried to make up my mind about these questions, but I'm very unsure about their answers.
Each method in your source code (in Java, C#, C++, Pascal, I think every OO and procedural language...) has only one copy in binaries and in memory.
Multiple instances of one object have separate fields but all share the same method code. Technically there is a procedure that takes a hidden this parameter to provide an illusion of executing a method on an object. In reality you are calling a procedure and passing structure (a bag of fields) to it along with other parameters. Here is a simple Java object and more-or-less equivalent pseudo-C code:
class Foo {
    private int x;
    int mulBy(int y) {
        return x * y;
    }
}
Foo foo = new Foo();
foo.mulBy(3);
is translated to this pseudo-C code (the encapsulation is enforced by the compiler and runtime/VM):
struct Foo {
    int x = 0;
}
int Foo_mulBy(Foo *this, int y) {
    return this->x * y;
}
Foo* foo = new Foo();
Foo_mulBy(foo, 3);
You have to distinguish between code and the local variables and parameters it operates on (the data). Data is stored on the call stack, local to each thread. Code can be executed by multiple threads; each thread has its own instruction pointer (the place in the method it is currently executing). Also, because this is a parameter, it is thread-local, so each thread can operate on a different object concurrently, even though it runs the same code.
That being said you cannot modify a method of only one instance because the method code is shared among all instances.
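In C#, the hidden this parameter can be made visible with an "open instance" delegate. This is only an illustrative sketch; the C# Foo class below mirrors the Java example, and HiddenThisDemo is a hypothetical name.

```csharp
using System;

public class Foo
{
    private readonly int x;

    public Foo(int x) { this.x = x; }

    public int MulBy(int y) { return x * y; }
}

public static class HiddenThisDemo
{
    public static int[] Run()
    {
        // One delegate bound to the single shared MulBy routine; the target
        // object is supplied explicitly as the first argument on each call.
        var mulBy = (Func<Foo, int, int>)Delegate.CreateDelegate(
            typeof(Func<Foo, int, int>), typeof(Foo).GetMethod("MulBy"));

        var a = new Foo(2);
        var b = new Foo(5);

        // Same code, different instance data.
        return new[] { mulBy(a, 3), mulBy(b, 3), a.MulBy(3) };
    }
}
```

The delegate calls on a and b run the same method body and differ only in which object is passed as the first argument, much like Foo_mulBy(foo, 3) in the pseudo-C version.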
The Java specifications don't dictate how to do memory layout, and different implementations can do whatever they like, providing it meets the spec where it matters.
Having said that, the mainstream Oracle JVM (HotSpot) works off of things called oops - Ordinary Object Pointers. These consist of two words of header followed by the data which comprises the instance member fields (stored inline for primitive types, and as pointers for reference member fields).
One of the two header words - the class word - is a pointer to a klassOop. This is a special type of oop which holds pointers to the instance methods of the class (basically, the Java equivalent of a C++ vtable). The klassOop is kind-of a VM-level representation of the Class object corresponding to the Java type.
If you're curious about the low-level detail, you can find out a lot more by looking in the OpenJDK source for the definition of some of the oop types (klassOop is a good place to start).
tl;dr Java holds one blob of code for each method of each type. The blobs of code are shared among each instance of the type, and hidden this pointers are used to know which instance's members to use.
I am going to try to answer this in the context of C#. There are basically three different kinds of methods:
virtual
non-virtual
static
When your code is executed, you basically have two kinds of objects that are formed on the heap.
The object corresponding to the type of the object. This is called Type Object. This holds the type object pointer, the sync block index, the static fields and the method table.
The object corresponding to the object itself, which contains all the non static fields.
In response to your questions,
1. Is a method instantiated for each object in memory separately or do all objects of the same type share one instance of the method?
This is a wrong way of understanding objects. All methods are per type only. Look at it this way. A method is just a set of instructions. The first time you call a particular method, the IL code is JITed into native instructions and saved in memory. The next time this is called, the address is picked up from the method table and the same instructions are executed again.
2. If the latter, how does the executing thread know which object's attributes to use?
Each static method call on a Type results in looking up the method table from the corresponding Type Object and finding the address of the JITed instruction. In the case of methods that are not static, the relevant object on which the method is called is maintained on the thread's local stack. Basically, you get the nearest object on the stack. That is always the object on which we want the method to be called.
3. Is it possible to modify the code of a method in C# with reflection for one, and only one object of many objects of the same type?
No, it is not possible now. (And I am thankful for that.) The reason is that reflection only allows code inspection. If you figure out what some method actually means, there is no way you are going to be able to change the code in the same assembly.

Immutable objects that reference each other?

Today I was trying to wrap my head around immutable objects that reference each other. I came to the conclusion that you can't possibly do that without using lazy evaluation but in the process I wrote this (in my opinion) interesting code.
public class A
{
    public string Name { get; private set; }
    public B B { get; private set; }
    public A()
    {
        B = new B(this);
        Name = "test";
    }
}
public class B
{
    public A A { get; private set; }
    public B(A a)
    {
        //a.Name is null
        A = a;
    }
}
What I find interesting is that I cannot think of another way to observe object of type A in a state that is not yet fully constructed and that includes threads. Why is this even valid? Are there any other ways to observe the state of an object that is not fully constructed?
Why is this even valid?
Why do you expect it to be invalid?
Because a constructor is supposed to guarantee that the code it contains is executed before outside code can observe the state of the object.
Correct. But the compiler is not responsible for maintaining that invariant. You are. If you write code that breaks that invariant, and it hurts when you do that, then stop doing that.
Are there any other ways to observe the state of an object that is not fully constructed?
Sure. For reference types, all of them involve somehow passing "this" out of the constructor, obviously, since the only user code that holds the reference to the storage is the constructor. Some ways the constructor can leak "this" are:
Put "this" in a static field and reference it from another thread
make a method call or constructor call and pass "this" as an argument
make a virtual call -- particularly nasty if the virtual method is overridden by a derived class, because then it runs before the derived class ctor body runs.
I said that the only user code that holds a reference is the ctor, but of course the garbage collector also holds a reference. Therefore, another interesting way in which an object can be observed to be in a half-constructed state is if the object has a destructor, and the constructor throws an exception (or gets an asynchronous exception like a thread abort; more on that later.) In that case, the object is about to be dead and therefore needs to be finalized, but the finalizer thread can see the half-initialized state of the object. And now we are back in user code that can see the half-constructed object!
Destructors are required to be robust in the face of this scenario. A destructor must not depend on any invariant of the object set up by the constructor being maintained, because the object being destroyed might never have been fully constructed.
Another crazy way that a half-constructed object could be observed by outside code is of course if the destructor sees the half-initialized object in the scenario above, and then copies a reference to that object to a static field, thereby ensuring that the half-constructed, half-finalized object is rescued from death. Please do not do that. Like I said, if it hurts, don't do it.
If you're in the constructor of a value type then things are basically the same, but there are some small differences in the mechanism. The language requires that a constructor call on a value type create a temporary variable that only the ctor has access to, mutate that variable, and then do a struct copy of the mutated value to the actual storage. That ensures that if the constructor throws, then the final storage is not in a half-mutated state.
Note that since struct copies are not guaranteed to be atomic, it is possible for another thread to see the storage in a half-mutated state; use locks correctly if you are in that situation. Also, it is possible for an asynchronous exception like a thread abort to be thrown halfway through a struct copy. These non-atomicity problems arise regardless of whether the copy is from a ctor temporary or a "regular" copy. And in general, very few invariants are maintained if there are asynchronous exceptions.
In practice, the C# compiler will optimize away the temporary allocation and copy if it can determine that there is no way for that scenario to arise. For example, if the new value is initializing a local that is not closed over by a lambda and not in an iterator block, then S s = new S(123); just mutates s directly.
For more information on how value type constructors work, see:
Debunking another myth about value types
And for more information on how C# language semantics try to save you from yourself, see:
Why Do Initializers Run In The Opposite Order As Constructors? Part One
Why Do Initializers Run In The Opposite Order As Constructors? Part Two
I seem to have strayed from the topic at hand. In a struct you can of course observe an object to be half-constructed in the same ways -- copy the half-constructed object to a static field, call a method with "this" as an argument, and so on. (Obviously calling a virtual method on a more derived type is not a problem with structs.) And, as I said, the copy from the temporary to the final storage is not atomic and therefore another thread can observe the half-copied struct.
Now let's consider the root cause of your question: how do you make immutable objects that reference each other?
Typically, as you've discovered, you don't. If you have two immutable objects that reference each other then logically they form a directed cyclic graph. You might consider simply building an immutable directed graph! Doing so is quite easy. An immutable directed graph consists of:
An immutable list of immutable nodes, each of which contains a value.
An immutable list of immutable node pairs, each of which has the start and end point of a graph edge.
Now the way you make nodes A and B "reference" each other is:
A = new Node("A");
B = new Node("B");
G = Graph.Empty.AddNode(A).AddNode(B).AddEdge(A, B).AddEdge(B, A);
And you're done, you've got a graph where A and B "reference" each other.
The problem, of course, is that you cannot get to B from A without having G in hand. Having that extra level of indirection might be unacceptable.
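A minimal sketch of such an immutable graph is given here, assuming the System.Collections.Immutable package is available. The Node and Graph members (AddNode, AddEdge, Graph.Empty) follow the names used above; the Neighbors method is a hypothetical addition that shows why G is needed to get from A to B.

```csharp
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;

public sealed class Node
{
    public string Value { get; }
    public Node(string value) { Value = value; }
}

public sealed class Graph
{
    public static readonly Graph Empty = new Graph(
        ImmutableList<Node>.Empty,
        ImmutableList<(Node From, Node To)>.Empty);

    public ImmutableList<Node> Nodes { get; }
    public ImmutableList<(Node From, Node To)> Edges { get; }

    private Graph(ImmutableList<Node> nodes, ImmutableList<(Node From, Node To)> edges)
    {
        Nodes = nodes;
        Edges = edges;
    }

    // Each "mutation" returns a new graph; existing graphs never change.
    public Graph AddNode(Node node) { return new Graph(Nodes.Add(node), Edges); }
    public Graph AddEdge(Node from, Node to) { return new Graph(Nodes, Edges.Add((from, to))); }

    // Hypothetical helper: following an edge requires the graph, not just the node.
    public IEnumerable<Node> Neighbors(Node node)
    {
        return Edges.Where(e => ReferenceEquals(e.From, node)).Select(e => e.To);
    }
}
```

With var A = new Node("A"); and var B = new Node("B");, the graph built by the AddNode/AddEdge chain above lets you reach B via G.Neighbors(A), while neither node ever holds a reference to the other.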
Yes, this is the only way for two immutable objects to refer to each other - at least one of them must see the other in a not-fully-constructed way.
It's generally a bad idea to let this escape from your constructor but in cases where you're confident of what both constructors do, and it's the only alternative to mutability, I don't think it's too bad.
"Fully constructed" is defined by your code, not by the language.
This is a variation on calling a virtual method from the constructor; the general guideline is: don't do that.
To correctly implement the notion of "fully constructed", don't pass this out of your constructor.
Indeed, leaking the this reference out during the constructor will allow you to do this; it may cause problems if methods get invoked on the incomplete object, obviously. As for "other ways to observe the state of an object that is not fully constructed":
invoke a virtual method in a constructor; the subclass constructor will not have been called yet, so an override may try to access incomplete state (fields declared or initialized in the subclass, etc)
reflection, perhaps using FormatterServices.GetUninitializedObject (which creates an object without calling the constructor at all)
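The virtual-call case can be sketched as follows. The class names Base and Derived and the Describe method are hypothetical and chosen only for illustration.

```csharp
public class Base
{
    public string Observed { get; }

    public Base()
    {
        // Virtual call from a constructor: dispatches to the override
        // before the derived constructor body has run.
        Observed = Describe();
    }

    protected virtual string Describe() { return "base"; }
}

public class Derived : Base
{
    private readonly string name;

    public Derived()
    {
        name = "derived"; // runs only after Base() has finished
    }

    protected override string Describe()
    {
        return name ?? "(name not set yet)";
    }
}
```

Constructing a Derived and reading Observed yields "(name not set yet)": the override ran while the name field still had its default value.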
If you consider the initialization order:
Derived static fields
Derived static constructor
Derived instance fields
Base static fields
Base static constructor
Base instance fields
Base instance constructor
Derived instance constructor
it is clear that, through up-casting, you can access the class BEFORE the derived instance constructor is called. This is the reason you shouldn't call virtual methods from constructors: they could easily access derived fields that the constructor has not yet initialized, because the derived class's constructor may not yet have brought the object into a consistent state.
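The order listed above can be observed by logging from each initializer and constructor. This is a sketch with hypothetical names (InitLog, BaseType, DerivedType); both classes declare explicit static constructors so that the timing of static initialization is precise.

```csharp
using System.Collections.Generic;

public static class InitLog
{
    public static readonly List<string> Entries = new List<string>();

    // Returns a dummy value so it can be used in field initializers.
    public static int Add(string entry) { Entries.Add(entry); return 0; }
}

public class BaseType
{
    private static int baseStatic = InitLog.Add("Base static fields");
    private int baseInstance = InitLog.Add("Base instance fields");

    static BaseType() { InitLog.Add("Base static constructor"); }
    public BaseType() { InitLog.Add("Base instance constructor"); }
}

public class DerivedType : BaseType
{
    private static int derivedStatic = InitLog.Add("Derived static fields");
    private int derivedInstance = InitLog.Add("Derived instance fields");

    static DerivedType() { InitLog.Add("Derived static constructor"); }
    public DerivedType() { InitLog.Add("Derived instance constructor"); }
}
```

Creating a single DerivedType and reading InitLog.Entries should reproduce the eight steps in the order given in the list. The static entries appear only once per process, so the log is only meaningful for the first instance created.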
You can avoid the problem by instantiating B last in your constructor:
public A()
{
    Name = "test";
    B = new B(this);
}
If what you suggest was not possible, then A would not be immutable.
Edit: fixed, thanks to leppie.
The principle is: don't let your this object escape from the constructor body.
Another way to observe such a problem is by calling virtual methods inside the constructor.
As noted, the compiler has no means of knowing at what point an object has been constructed well enough to be useful; it therefore assumes that a programmer who passes this from a constructor will know whether an object has been constructed well enough to satisfy his needs.
I would add, however, that for objects which are intended to be truly immutable, one must avoid passing this to any code which will examine the state of a field before it has been assigned its final value. This implies that this not be passed to arbitrary outside code, but does not imply that there is anything wrong with having an object under construction pass itself to another object for the purpose of storing a back-reference which will not actually be used until after the first constructor has completed.
If one were designing a language to facilitate the construction and use of immutable objects, it may be helpful for it to declare methods as being usable only during construction, only after construction, or either; fields could be declared as being non-dereferenceable during construction and read-only afterward; parameters could likewise be tagged to indicate that they should be non-dereferenceable. Under such a system, it would be possible for a compiler to allow the construction of data structures which referred to each other, but where no property could ever change after it was observed. As to whether the benefits of such static checking would outweigh the cost, I'm not sure, but it might be interesting.
Incidentally, a related feature which would be helpful would be the ability to declare parameters and function returns as ephemeral, returnable, or (the default) persistable. If a parameter or function return were declared ephemeral, it could not be copied to any field nor passed as a persistable parameter to any method. Additionally, passing an ephemeral or returnable value as a returnable parameter to a method would cause the return value of the function to inherit the restrictions of that value (if a function has two returnable parameters, its return value would inherit the more restrictive constraint from its parameters). A major weakness with Java and .net is that all object references are promiscuous; once outside code gets its hands on one, there's no telling who may end up with it. If parameters could be declared ephemeral, it would more often be possible for code which held the only reference to something to know it held the only reference, and thus avoid needless defensive copy operations. Additionally, things like closures could be recycled if the compiler could know that no references to them existed after they returned.