Reflection.Emit.ILGenerator Exception Handling "Leave" instruction - c#

First, some background info:
I am making a compiler for a school project. It is already working, and I'm expending a lot of effort to fix bugs and optimize it. I recently discovered that the ILGenerator object generates an extra leave instruction when you call any of the following member methods:
BeginCatchBlock()
BeginExceptFilterBlock()
BeginFaultBlock()
BeginFinallyBlock()
EndExceptionBlock()
So, you start a try statement with a call to BeginExceptionBlock(), add a couple of catch clauses with BeginCatchBlock(), possibly add a finally clause with BeginFinallyBlock(), and then end the protected code region with EndExceptionBlock().
The methods I listed automatically generate a leave instruction branching to the first instruction after the try statement. I don't want these, for two reasons. One, because it always generates an unoptimized leave instruction, rather than a leave.s instruction, even when it's branching just two bytes away. And two, because you can't control where the leave instruction goes.
So, if you wanted to branch to some other location in your code, you have to add a compiler-generated local variable, set it depending on where you want to go inside of the try statement, let EndExceptionBlock() auto-generate the leave instruction, and then generate a switch statement below the try block. OR, you could just emit a leave or leave.s instruction yourself, before calling one of the previous methods, resulting in an ugly and unreachable extra 5 bytes, like so:
L_00ca: leave.s L_00e5
L_00cc: leave L_00d1
Both of these options are unacceptable to me. Is there any way to either prevent the automatic generation of leave instructions, or else any other way to specify protected regions rather than using these methods (which are extremely annoying and practically undocumented)?
EDIT
Note: the C# compiler itself does this, so it's not as if there is a good reason to force it on us. For example, if you have .NET 4.5 beta, disassemble the following code and check their implementation: (exception block added internally)
public static async Task<bool> TestAsync(int ms)
{
var local = ms / 1000;
Console.WriteLine("In async call, before await " + local.ToString() + "-second delay.");
await System.Threading.Tasks.Task.Delay(ms);
Console.WriteLine("In async call, after await " + local.ToString() + "-second delay.");
Console.WriteLine();
Console.WriteLine("Press any key to continue.");
Console.ReadKey(false);
return true;
}

As far as I can tell, you cannot do this in .NET 4.0. The only way to create a method body without using ILGenerator is by using MethodBuilder.CreateMethodBody, but that does not allow you to set exception handling info. And ILGenerator forces the leave instruction you're asking about.
However, if .NET 4.5 is an option for you (it seems to be), take a look at MethodBuilder.SetMethodBody. This allows you to create the IL yourself, but still pass through exception handling information. You can wrap this in a custom ILGenerator-like class of your own, with Emit methods taking an OpCode argument, and reading OpCode.Size and OpCode.Value to get the corresponding bytes.
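As a rough illustration of the "ILGenerator-like class of your own" idea, here is a minimal sketch of a byte writer driven by OpCode.Size and OpCode.Value. The class name RawIlWriter and its method names are hypothetical, not part of the framework, and the sketch only covers opcodes with no operand, a one-byte operand, or a four-byte operand. It assumes two-byte opcodes are written prefix byte first, which matches how OpCode.Value packs them.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection.Emit;

// Hypothetical helper (not a framework type): collects raw IL bytes.
public sealed class RawIlWriter
{
    private readonly List<byte> _bytes = new List<byte>();

    // Writes the opcode itself: one byte, or two bytes with the prefix byte first.
    public void Emit(OpCode op)
    {
        ushort value = unchecked((ushort)op.Value);
        if (op.Size == 2)
        {
            _bytes.Add((byte)(value >> 8));
        }
        _bytes.Add((byte)(value & 0xFF));
    }

    // Opcode followed by a one-byte operand, e.g. leave.s with a short offset.
    public void EmitWithInt8(OpCode op, sbyte operand)
    {
        Emit(op);
        _bytes.Add(unchecked((byte)operand));
    }

    // Opcode followed by a four-byte operand, written little-endian, e.g. leave.
    public void EmitWithInt32(OpCode op, int operand)
    {
        Emit(op);
        _bytes.Add((byte)operand);
        _bytes.Add((byte)(operand >> 8));
        _bytes.Add((byte)(operand >> 16));
        _bytes.Add((byte)(operand >> 24));
    }

    public byte[] ToArray()
    {
        return _bytes.ToArray();
    }
}
```

With something like this you decide yourself whether a branch becomes leave.s or leave, and the resulting array is what you would hand to MethodBuilder.SetMethodBody together with the exception handling information. Computing branch offsets and the max stack size is left out here and would be your responsibility.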
And of course there's always Mono.Cecil, but that probably requires more extensive changes to code you've already written.
Edit: you appear to have already figured this out yourself, but you left this question open. You can post answers to your own questions and accept them, if you've figured it out on your own. This would have let me know I shouldn't have wasted time searching, and would have let other people with the same question know what to do.

Related

Object instance valid only for the current method

Is it possible to create an object that can register whether the current thread leaves the method where it was created, or to check whether this has happened when a method on the instance gets called?
ScopeValid obj;
void Method1()
{
obj = new ScopeValid();
obj.Something();
}
void Method2()
{
Method1();
obj.Something(); //Exception
}
Can this technique be accomplished? I would like to develop a mechanism similar to TypedReference and ArgIterator, which can't "escape" the current method. These types are handled specially by the compiler, so I can't mimic this behavior exactly, but I hope it is possible to create at least a similar rule with the same results - disallow accessing the object if it has escaped the method where it was created.
Note that I can't use StackFrame and compare methods, because the object might escape and return to the same method.
Changing method behavior based upon the source of the call is a bad design choice.
Some example problems to consider with such a method include:
Testability - how would you test such a method?
Refactoring the calling code - What if the user of your code just does an end run around your error message that says you can't do that in a different method than it was created? "Okay, fine! I'll just do my bad thing in the same method," says the programmer.
If the user of your code breaks it, and it's their fault, let it break. Better to just document your code with something like:
IInvalidatable - Types which implement this member should be invalidated with Invalidate() when you are done working with this.
Ignoring the obvious point that this almost seems like re-inventing IDisposable and using { } blocks (which have language support), if the user of your code doesn't use it right, it's not really your concern.
This is likely technically possible with AOP (I'm thinking PostSharp here), but it still depends on the user using your code correctly - they would have to have it in the build process, and failing to function if they aren't using a tool just because you're trying to make it easy on them is evil.
Another point - If you are just attempting to create an object which cannot be used outside of one method, and any attempted operation outside of the method would fail, just declare it a local inside the method.
Years later, it seems this feature was finally added in C# 7.2: ref struct.
Another related language feature is the ability to declare a value type that must be stack allocated. In other words, these types can never be created on the heap as a member of another class. The primary motivation for this feature was Span<T> and related structures. Span<T> may contain a managed pointer as one of its members, the other being the length of the span. It's actually implemented a bit differently because C# doesn't support pointers to managed memory outside of an unsafe context. Any write that changes the pointer and the length is not atomic. That means a Span<T> would be subject to out of range errors or other type safety violations were it not constrained to a single stack frame. In addition, putting a managed pointer on the GC heap typically crashes at JIT time.
This prevents the code from moving the value to the heap, which partly solves my original problem. I am not sure how returning a ref struct is constrained, though.
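For readers who have not used the feature, here is a minimal sketch of what a ref struct looks like in practice. ScopedBuffer and ScopedBufferDemo are made-up names for illustration only; the commented-out lines show the kind of "escape" the compiler rejects.

```csharp
using System;

// Hypothetical type for illustration: a ref struct can only live on the stack.
public ref struct ScopedBuffer
{
    private readonly Span<int> _data;

    public ScopedBuffer(Span<int> data)
    {
        _data = data;
    }

    public int Sum()
    {
        int total = 0;
        foreach (int value in _data)
        {
            total += value;
        }
        return total;
    }
}

public static class ScopedBufferDemo
{
    public static int SumOnStack()
    {
        // The backing storage is stack memory owned by this method.
        Span<int> storage = stackalloc int[3];
        storage[0] = 1;
        storage[1] = 2;
        storage[2] = 3;

        var buffer = new ScopedBuffer(storage);

        // object boxed = buffer;   // does not compile: a ref struct cannot be boxed
        // A class also cannot declare a field of type ScopedBuffer.

        return buffer.Sum();
    }
}
```

This does not detect an escape at run time as the question asked; it makes the escaping code fail to compile instead.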

Boolean vs memory

We had a discussion at work about code design and one of the issues was when handling responses from a call to a boolean method like this:
bool ok = IsEverythingOK();
if(ok)
{
//do something
}
One of my colleagues insists that we skip the extra variable ok and write
if(IsEverythingOK())
{
//do something
}
He says that using the "bool ok" statement is bad memory-wise.
Which one is the one we should use?
Paraphrasing your question:
Is there a cost to using a local variable?
Because C# and .NET are well engineered, my expectation is that using a local variable as you describe has no cost, or a negligible one, but let me try to back this expectation with some facts.
The following C# code
if (IsEverythingOk()) {
...
}
will compile into this (simplified) IL (with optimizations turned on)
call IsEverythingOk
brfalse.s AfterIfBody
... if body
Using a local variable
var ok = IsEverythingOk();
if (ok) {
...
}
you get this optimized (and simplified) IL:
call IsEverythingOk
stloc.0
ldloc.0
brfalse.s AfterIfBody
... if body
On the surface this seems slightly less efficient because the return value is stored on the stack and then retrieved, but the JIT compiler will also perform some optimizations.
You can see the actual machine code generated by debugging your application with native code debugging enabled. You have to do this using the release build and you also have to turn off the debugger option that suppresses JIT optimizations on module load. Now you can put breakpoints in the code you want to inspect and then view the disassembly. Note that the JIT is like a black box and the behavior I see on my computer may differ from what other people see on their computers. With that disclaimer in mind, the assembly code I get for both versions of the code is (with a slight difference in how the call is performed):
call IsEverythingOk
test eax,eax
je AfterIfBody
So the JIT will optimize the extra unnecessary IL away. Actually, in my initial experiments the method IsEverythingOk returned true and the JIT was able to completely optimize the branch away. When I then switched to returning a field in the method the JIT would inline the call and access the field directly.
The bottom line: You should expect the JIT to optimize at least simple things like transient local variables even if the code generates some extra IL that seems unnecessary.
It all depends on whether you are doing anything with ok inside the if block.
i.e.
bool ok = IsEverythingOK();
if(ok)
{
//do something
ok = IsEverythingOK();
}
Assuming you don't do anything with ok inside the if block, however, you will probably find that the JIT compiler will turn:
bool ok = IsEverythingOK();
if(ok)
{
//do something
}
...into what is essentially:
if(IsEverythingOK())
{
//do something
}
...anyway.
The compiler of course generates some additional IL if you use the first solution, as it needs at least an additional stloc and ldloc instruction, but if it's just for performance reasons, forget these microseconds (or nanoseconds).
If there is no other reason for the ok variable, I would nevertheless prefer the second solution as it is easier to read.
I'd say it's a matter of preference, unless you have a unified coding standard. Either one can provide a benefit.
This is great if you're expecting or assuming modification other than the if clause. Though it does create a stack entry upon creation of the variable, it'll probably be released when the method scope ends.
bool ok = IsEverythingOK();
if(ok)
{
//do something
}
This is great if you only want to use it as a validation. Though it's only good if your method name is short. But let's say you call it through an instance, like _myLongNameInstance.IsEverythingOK(): this reduces readability and I'd go with the first one, but under other conditions, I'd choose the direct if.
if(IsEverythingOK())
{
//do something
}

How to enforce the use of a method's return value in C#?

I have a piece of software written with fluent syntax. The method chain has a definitive "ending", before which nothing useful is actually done in the code (think NBuilder, or Linq-to-SQL's query generation not actually hitting the database until we iterate over our objects with, say, ToList()).
The problem I am having is there is confusion among other developers about proper usage of the code. They are neglecting to call the "ending" method (thus never actually "doing anything")!
I am interested in enforcing the usage of the return value of some of my methods so that we can never "end the chain" without calling that "Finalize()" or "Save()" method that actually does the work.
Consider the following code:
//The "factory" class the user will be dealing with
public class FluentClass
{
//The entry point for this software
public IntermediateClass<T> Init<T>()
{
return new IntermediateClass<T>();
}
}
//The class that actually does the work
public class IntermediateClass<T>
{
private List<T> _values;
//The user cannot call this constructor
internal IntermediateClass()
{
_values = new List<T>();
}
//Once generated, they can call "setup" methods such as this
public IntermediateClass<T> With(T value)
{
var instance = new IntermediateClass<T>() { _values = _values };
instance._values.Add(value);
return instance;
}
//Picture "lazy loading" - you have to call this method to
//actually do anything worthwhile
public void Save()
{
var itemCount = _values.Count();
. . . //save to database, write a log, do some real work
}
}
As you can see, proper usage of this code would be something like:
new FluentClass().Init<int>().With(-1).With(300).With(42).Save();
The problem is that people are using it this way (thinking it achieves the same as the above):
new FluentClass().Init<int>().With(-1).With(300).With(42);
So pervasive is this problem that, with entirely good intentions, another developer once actually changed the name of the "Init" method to indicate that THAT method was doing the "real work" of the software.
Logic errors like these are very difficult to spot, and, of course, it compiles, because it is perfectly acceptable to call a method with a return value and just "pretend" it returns void. Visual Studio doesn't care if you do this; your software will still compile and run (although in some cases I believe it throws a warning). This is a great feature to have, of course. Imagine a simple "InsertToDatabase" method that returns the ID of the new row as an integer - it is easy to see that there are some cases where we need that ID, and some cases where we could do without it.
In the case of this piece of software, there is definitively never any reason to eschew that "Save" function at the end of the method chain. It is a very specialized utility, and the only gain comes from the final step.
I want somebody's software to fail at the compiler level if they call "With()" and not "Save()".
It seems like an impossible task by traditional means - but that's why I come to you guys. Is there an Attribute I can use to prevent a method from being "cast to void" or some such?
Note: The alternate way of achieving this goal that has already been suggested to me is writing a suite of unit tests to enforce this rule, and using something like http://www.testdriven.net to bind them to the compiler. This is an acceptable solution, but I am hoping for something more elegant.
I don't know of a way to enforce this at a compiler level. It's often requested for objects which implement IDisposable as well, but isn't really enforceable.
One potential option which can help, however, is to set up your class, in DEBUG only, to have a finalizer that logs/throws/etc. if Save() was never called. This can help you discover these runtime problems while debugging instead of relying on searching the code, etc.
However, make sure that, in release mode, this is not used, as it will incur a performance overhead since the addition of an unnecessary finalizer is very bad on GC performance.
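A minimal sketch of that idea follows, assuming a simplified builder (the hypothetical TrackedBuilder<T>, which mutates itself rather than copying like the question's IntermediateClass<T>). The finalizer only exists in DEBUG builds and merely logs, since when, or whether, it runs is up to the garbage collector.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical builder for illustration; not the question's actual class.
public class TrackedBuilder<T>
{
    private readonly List<T> _values = new List<T>();

    public bool Saved { get; private set; }

    public TrackedBuilder<T> With(T value)
    {
        _values.Add(value);
        return this;
    }

    // Stand-in for the real work; returns the number of items "saved".
    public int Save()
    {
        Saved = true;
        // Nothing left to report, so skip the finalizer (harmless if none exists).
        GC.SuppressFinalize(this);
        return _values.Count;
    }

#if DEBUG
    // DEBUG-only: runs at some unspecified time after the object becomes garbage.
    ~TrackedBuilder()
    {
        if (!Saved)
        {
            Console.Error.WriteLine("TrackedBuilder was discarded without calling Save().");
        }
    }
#endif
}
```

Because the message appears only when the collector gets around to the object, this is a debugging aid, not an enforcement mechanism.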
You could require specific methods to use a callback like so:
new FluentClass().Init<int>(x =>
{
x.Save(y =>
{
y.With(-1);
y.With(300);
});
});
The with method returns some specific object, and the only way to get that object is by calling x.Save(), which itself has a callback that lets you set up your indeterminate number of with statements. So the init takes something like this:
public T Init<T>(Func<MyInitInputType, MySaveResultType> initSetup)
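To make the callback idea concrete, here is one possible shape for it, as a sketch. ChainSetup<T> and CallbackFluent are invented names, and the "save" step is reduced to returning a count. The point is only that the caller supplies the With calls and the library, not the caller, decides that the final step runs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical setup object handed to the caller's callback.
public sealed class ChainSetup<T>
{
    internal readonly List<T> Values = new List<T>();

    public ChainSetup<T> With(T value)
    {
        Values.Add(value);
        return this;
    }
}

public static class CallbackFluent
{
    // Runs the caller's setup, then always performs the final step itself.
    public static int Save<T>(Action<ChainSetup<T>> setup)
    {
        var chain = new ChainSetup<T>();
        setup(chain);
        // Stand-in for "save to database, write a log, do some real work".
        return chain.Values.Count;
    }
}
```

Usage would look like CallbackFluent.Save<int>(c => c.With(-1).With(300).With(42)); there is no way to run the With calls without also running the final step, at the cost of giving up the plain method chain.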
I can think of a few solutions, none ideal.
AIUI what you want is a function which is called when the temporary variable goes out of scope (as in, when it becomes available for garbage collection, but will probably not be garbage collected for some time yet). (See: The difference between a destructor and a finalizer?) This hypothetical function would say "if you've constructed a query in this object but not called save, produce an error". C++/CLI calls this RAII, and in C++/CLI there is a concept of a "destructor" when the object isn't used any more, and a "finaliser" which is called when it's finally garbage collected. Very confusingly, C# has only a so-called destructor, but this is only called by the garbage collector (it would be valid for the framework to call it earlier, as if it were partially cleaning the object immediately, but AFAIK it doesn't do anything like that). So what you would like is a C++/CLI destructor. Unfortunately, AIUI this maps onto the concept of IDisposable, which exposes a dispose() method which can be called when a C++/CLI destructor would be called, or when the C# destructor is called -- but AIUI you still have to call "dispose" manually, which defeats the point?
Refactor the interface slightly to convey the concept more accurately. Call the init function something like "prepareQuery" or "AAA" or "initRememberToCallSaveOrThisWontDoAnything". (The last is an exaggeration, but it might be necessary to make the point).
This is more of a social problem than a technical problem. The interface should make it easy to do the right thing, but programmers do have to know how to use code! Get all the programmers together. Explain simply once-and-for-all this simple fact. If necessary have them all sign a piece of paper saying they understand, and if they wilfully continue to write code which doesn't do anything they're worse than useless to the company and will be fired.
Fiddle with the way the operators are chained, e.g. have each of the intermediateClass functions assemble an aggregate intermediateClass object containing all of the parameters (you mostly do it this way already (?)) but require an init-like function of the original class to take that as an argument, rather than have them chained after it, and then you can have save and the other functions return two different class types (with essentially the same contents), and have init only accept a class of the correct type.
The fact that it's still a problem suggests that either your coworkers need a helpful reminder, or they're rather sub-par, or the interface wasn't very clear (perhaps it's perfectly good, but the author didn't realise it wouldn't be clear if you only used it in passing rather than getting to know it), or you yourself have misunderstood the situation. A technical solution would be good, but you should probably think about why the problem occurred and how to communicate more clearly, probably asking for someone senior's input.
After great deliberation and trial and error, it turns out that throwing an exception from the Finalize() method was not going to work for me. Apparently, you simply can't do that; the exception gets eaten up, because garbage collection operates non-deterministically. I was unable to get the software to call Dispose() automatically from the destructor either. Jack V.'s comment explains this well; here was the link he posted, for redundancy/emphasis:
The difference between a destructor and a finalizer?
Changing the syntax to use a callback was a clever way to make the behavior foolproof, but the agreed-upon syntax was fixed, and I had to work with it. Our company is all about fluent method chains. I was also a fan of the "out parameter" solution to be honest, but again, the bottom line is the method signatures simply could not change.
Helpful information about my particular problem includes the fact that my software is only ever to be run as part of a suite of unit tests - so efficiency is not a problem.
What I ended up doing was use Mono.Cecil to Reflect upon the Calling Assembly (the code calling into my software). Note that System.Reflection was insufficient for my purposes, because it cannot pinpoint method references, but I still needed(?) to use it to get the "calling assembly" itself (Mono.Cecil remains underdocumented, so it's possible I just need to get more familiar with it in order to do away with System.Reflection altogether; that remains to be seen....)
I placed the Mono.Cecil code in the Init() method, and the structure now looks something like:
public IntermediateClass<T> Init<T>()
{
ValidateUsage(Assembly.GetCallingAssembly());
return new IntermediateClass<T>();
}
void ValidateUsage(Assembly assembly)
{
// 1) Use Mono.Cecil to inspect the codebase inside the assembly
var assemblyLocation = assembly.CodeBase.Replace("file:///", "");
var monoCecilAssembly = AssemblyFactory.GetAssembly(assemblyLocation);
// 2) Retrieve the list of Instructions in the calling method
var methods = monoCecilAssembly.Modules...Types...Methods...Instructions
// (It's a little more complicated than that...
// if anybody would like more specific information on how I got this,
// let me know... I just didn't want to clutter up this post)
// 3) Those instructions refer to OpCodes and Operands....
// Defining "invalid method" as a method that calls "Init" but not "Save"
var methodCallingInit = method.Body.Instructions.Any
(instruction => instruction.OpCode.Name.Equals("callvirt")
&& instruction.Operand is IMethodReference
&& instruction.Operand.ToString().Equals(INITMETHODSIGNATURE));
var methodNotCallingSave = !method.Body.Instructions.Any
(instruction => instruction.OpCode.Name.Equals("callvirt")
&& instruction.Operand is IMethodReference
&& instruction.Operand.ToString().Equals(SAVEMETHODSIGNATURE));
var methodInvalid = methodCallingInit && methodNotCallingSave;
// Note: this is partially pseudocode;
// It doesn't 100% faithfully represent either Mono.Cecil's syntax or my own
// There are actually a lot of annoying casts involved, omitted for sanity
// 4) Obviously, if the method is invalid, throw
if (methodInvalid)
{
throw new Exception(String.Format("Bad developer! BAD! {0}", method.Name));
}
}
Trust me, the actual code is even uglier looking than my pseudocode.... :-)
But Mono.Cecil just might be my new favorite toy.
I now have a method that refuses to run its main body unless the calling code "promises" to also call a second method afterwards. It's like a strange kind of code contract. I'm actually thinking about making this generic and reusable. Would any of you have a use for such a thing? Say, if it were an attribute?
What if you made it so Init and With don't return objects of type FluentClass? Have them return, e.g., UninitializedFluentClass which wraps a FluentClass object. Then calling .Save() on the UninitializedFluentClass object calls it on the wrapped FluentClass object and returns it. If they don't call Save they don't get a FluentClass object.
In Debug mode, besides implementing IDisposable, you can set up a timer that will throw an exception after 1 second if the result method has not been called.
Use an out parameter! All the outs must be used.
Edit: I am not sure if it will help, though...
It would break the fluent syntax.

When does a param that is passed by reference get updated?

Suppose I have a method like this:
public void MyCoolMethod(ref bool scannerEnabled)
{
try
{
CallDangerousMethod();
}
catch (FormatException exp)
{
try
{
//Disable scanner before validation.
scannerEnabled = false;
if (exp.Message == "FormatException")
{
MessageBox.Show(exp.Message);
}
}
finally
{
//Enable scanner after validation.
scannerEnabled = true;
}
}
}
And it is used like this:
MyCoolMethod(ref MyScannerEnabledVar);
The scanner can fire at any time on a separate thread. The idea is to not let it if we are handling an exception.
The question I have is, does the call to MyCoolMethod update MyScannerEnabledVar when scannerEnabled is set or does it update it when the method exits?
Note: I did not write this code, I am just trying to refactor it safely.
You can think of a ref as making an alias to a variable. It's not that the variable you pass is "passed by reference", it's that the parameter and the argument are the same variable, just with two different names. So updating one immediately updates the other, because there aren't actually two things here in the first place.
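A small single-threaded sketch makes the aliasing visible. RefAliasDemo and its members are hypothetical names; the method reads the original field by its own name while the ref parameter is still in use.

```csharp
using System;

public static class RefAliasDemo
{
    private static bool _scannerEnabled = true;

    // Returns the value the field held while the called method was still running.
    public static bool ObserveDuringCall()
    {
        _scannerEnabled = true;
        return SetFalseThenRead(ref _scannerEnabled);
    }

    private static bool SetFalseThenRead(ref bool flag)
    {
        flag = false;                       // writes straight through to _scannerEnabled
        bool seenInside = _scannerEnabled;  // read the field by its own name
        flag = true;                        // restore, as the finally block in the question does
        return seenInside;
    }
}
```

ObserveDuringCall() returns false: the field already holds the new value before the method exits, so there is no copy-back step at the end of the call.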
As SLaks notes, there are situations in VB that use copy-in-copy-out semantics. There are also, if I recall correctly, rare and obscure situations in which expression trees may be compiled into code that does copy-in-copy-out, but I do not recall the details.
If this code is intended to update the variable for reading on another thread, the fact that the variable is "immediately" updated is misleading. Remember, on multiple threads, reads and writes can be observed to move forwards and backwards in time with respect to each other if the reads and writes are not volatile. If the intention is to use the variable as a cross-thread communications mechanism, then use an object actually designed for that purpose which is safe for that purpose. Use some sort of wait handle or mutex or whatever.
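As one possible reading of "some sort of wait handle", here is a sketch built on ManualResetEventSlim. ScannerGate and its members are hypothetical, the sketch does not handle nested or concurrent calls to RunWithScannerDisabled, and whether this fits the real scanner code depends on requirements the question does not state.

```csharp
using System;
using System.Threading;

// Hypothetical replacement for the shared bool: the event is "set" while scanning is allowed.
public sealed class ScannerGate
{
    private readonly ManualResetEventSlim _enabled = new ManualResetEventSlim(true);

    public bool IsEnabled
    {
        get { return _enabled.IsSet; }
    }

    // Disables the scanner for the duration of the action, then re-enables it.
    public void RunWithScannerDisabled(Action action)
    {
        _enabled.Reset();
        try
        {
            action();
        }
        finally
        {
            _enabled.Set();
        }
    }

    // Intended for the scanner thread: blocks until scanning is allowed or the timeout elapses.
    public bool WaitUntilEnabled(TimeSpan timeout)
    {
        return _enabled.Wait(timeout);
    }
}
```

The scanner thread would call WaitUntilEnabled before processing a scan, and the exception handler would wrap its validation in RunWithScannerDisabled instead of toggling a ref bool.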
It gets updated live, as it is assigned inside the method.
When you pass a parameter by reference, the runtime passes (an equivalent to) a pointer to the field or variable that you referenced. When the method assigns to the parameter, it assigns directly to whatever the reference is pointing to.
Note, by the way, that this is not always true in VB.
Yes, it will be set when the variable is set within the method. Perhaps it would be best to return true or false whether the scanner is enabled rather than pass it in as a ref arg
The situation calls for more than a simple refactor. The code you posted will be subject to race conditions. The easy solution is to lock the unsafe method, thereby forcing threads to hop in line. The way it is, there's bound to be some bug(s) in the application due to this code, but it's impossible to say what exactly they are without knowing a lot more about your requirements and implementation. I recommend you proceed with caution: a mutex/lock is an easy fix, but may have a great impact on performance. If this is a concern for you, then you should look into a better thread-safe solution.

How do I write (test) code that will not be optimized by the compiler/JIT?

I don't really know much about the internals of compiler and JIT optimizations, but I usually try to use "common sense" to guess what could be optimized and what couldn't. So there I was writing a simple unit test method today:
@Test // [Test] in C#
public void testDefaultConstructor() {
new MyObject();
}
This method is actually all I need. It checks that the default constructor exists and runs without exceptions.
But then I started to think about the effect of compiler/JIT optimizations. Could the compiler/JIT optimize this method by eliminating the new MyObject(); statement completely? Of course, it would need to determine that the call graph does not have side effects to other objects, which is the typical case for a normal constructor that simply initializes the internal state of the object.
I presume that only the JIT would be allowed to perform such an optimization. This probably means that it's not something I should worry about, because the test method is being performed only once. Are my assumptions correct?
Nevertheless, I'm trying to think about the general subject. When I thought about how to prevent this method from being optimized, I thought I may assertTrue(new MyObject().toString() != null), but this is very dependent on the actual implementation of the toString() method, and even then, the JIT can determine that toString() method always returns a non-null string (e.g. if actually Object.toString() is being called), and thus optimize the whole branch. So this way wouldn't work.
I know that in C# I can use [MethodImpl(MethodImplOptions.NoOptimization)], but this is not what I'm actually looking for. I'm hoping to find a (language-independent) way of making sure that some specific part(s) of my code will actually run as I expect, without the JIT interfering in this process.
Additionally, are there any typical optimization cases I should be aware of when creating my unit tests?
Thanks a lot!
Don't worry about it. It's not allowed to ever optimize anything that can make a difference to your system (except for speed). If you new an object, code gets called, memory gets allocated, it HAS to work.
If you had it protected by an if(false), where false is a final, it could be optimized out of the system completely, then it could detect that the method doesn't do anything and optimize IT out (in theory).
Edit: by the way, it can also be smart enough to determine that this method:
newIfTrue(boolean b) {
if(b)
new ThisClass();
}
will always do nothing if b is false, and eventually figure out that at one point in your code b is always false and compile this routine out of that code completely.
This is where the JIT can do stuff that's virtually impossible in any non-managed language.
I think if you are worried about it getting optimized away, you may be doing a bit of testing overkill.
In a static language, I tend to think of the compiler as a test. If it passes compilation, that means that certain things are there (like methods). If you don't have another test that exercises your default constructor (which will prove it won't throw exceptions), you may want to think about why you are writing that default constructor in the first place (YAGNI and all that).
I know there are people that don't agree with me, but I feel like this sort of thing is just something that will bloat out your number of tests for no useful reason, even looking at it through TDD goggles.
Think about it this way:
Let's assume that the compiler can determine that the call graph doesn't have any side effects (I don't think it is possible; I vaguely remember something about P=NP from my CS courses). It will optimize any method that doesn't have side effects. Since most tests don't have and shouldn't have any side effects, the compiler can then optimize them all away.
The JIT is only allowed to perform operations that do not affect the guaranteed semantics of the language. Theoretically, it could remove the allocation and call to the MyObject constructor if it can guarantee that the call has no side effects and can never throw an exception (not counting OutOfMemoryError).
In other words, if the JIT optimizes the call out of your test, then your test would have passed anyway.
PS: Note that this applies because you are doing functionality testing as opposed to performance testing. In performance testing, it's important to make sure the JIT does not optimize away the operation you are measuring, else your results become useless.
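For the performance-testing case, a common precaution is to make the measured work feed a value that the caller actually uses. The sketch below (the hypothetical TimingSketch, with a trivial Square as the operation under test) accumulates a checksum and hands it back. This keeps the result live but does not guarantee the JIT leaves the loop untouched, so treat it as a starting point, not a benchmark harness.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // The operation being measured; a stand-in for real work.
    private static long Square(int n)
    {
        return (long)n * n;
    }

    // Every result is folded into the checksum, so the work cannot be treated as unused.
    public static long SumOfSquares(int iterations)
    {
        long checksum = 0;
        for (int i = 0; i < iterations; i++)
        {
            checksum += Square(i);
        }
        return checksum;
    }

    // Times the loop and hands the checksum back so the caller can print or assert on it.
    public static TimeSpan Measure(int iterations, out long checksum)
    {
        Stopwatch watch = Stopwatch.StartNew();
        checksum = SumOfSquares(iterations);
        watch.Stop();
        return watch.Elapsed;
    }
}
```

Printing or asserting on the checksum after Measure is what ties the measured work to an observable effect.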
It seems that in C# I could do this:
[Test]
public void testDefaultConstructor() {
GC.KeepAlive(new MyObject());
}
AFAIU, the GC.KeepAlive method will not be inlined by the JIT, so the code will be guaranteed to work as expected. However, I don't know a similar construct in Java.
Every I/O is a side effect, so you can just put
Object obj = new MyObject();
System.out.println(obj.toString());
and you're fine.
Why should it matter? If the compiler/JIT can statically determine no asserts are going to be hit (which could cause side effects), then you're fine.
