C# Instrumentation

C# Instrumentation - c#

What would be the easiest way to do instrumentation of C# code? By instrumentation I mean inserting my own pieces of the code to gather some dynamic information during the execution.
For example (star represents some unimportant piece of code):
for (int i=0; i<s.Length-2; ++i) {
if (*)
s = s.Substring(1, s.Length-2);
}
I would like to catch 0 being assigned to i, i incremented and assignment and call to a Substring. By catching a method call, I mean that I have information what method it is and values of the arguments or similar.
I tried to do it with the Roslyn by wrapping method calls with my own wrappers which could a) intercept values and store them (for example), b) call actual method and c) return this result. Only problem is that this approach is really error-prone and difficult (because there are many different cases to cover).
I wonder if there is already some library for this purpose or someone knows easier way of doing it. Thank you!

You could use tools like :
- Cecil
- ReFrameworker
- MBEL
- RAIL

Related

Is it possible to always eliminate goto's?

While making everthing with goto's is easy (as evidenced by f.ex. IL), I was wondering if it is also possible to eliminate all goto statements with higher level expressions and statements - say - using everything that's supported in Java.
Or if you like: what I'm looking for is 'rewrite rules' that will always work, regardless of the way the goto is created.
It's mostly intended as a theoretical question, purely as interest; not as a good/bad practices.
The obvious solution that I've thought about is to use something like this:
while (true)
{
switch (state) {
case [label]: // here's where all your goto's will be
state = [label];
continue;
default:
// here's the rest of the program.
}
}
While this will probably work and does fits my 'formal' question, I don't like my solution a single bit. For one, it's dead ugly and for two, it basically wraps the goto into a switch that does exactly the same as a goto would do.
So, is there a better solution?
Update 1
Since a lot of people seem to think the question is 'too broad', I'm going to elaborate a bit more... the reason I mentioned Java is because Java doesn't have a 'goto' statement. As one of my hobby projects, I was attempting to transform C# code into Java, which is proving to be quite challenging (partly because of this limitation in Java).
That got me thinking. If you have f.ex. the implementation of the 'remove' method in Open addressing (see: http://en.wikipedia.org/wiki/Open_addressing - note 1), it's quite convenient to have the 'goto' in the exceptional case, although in this particular case you could rewrite it by introducing a 'state' variable. Note that this is just one example, I've implemented code generators for continuations, which produce tons and tons of goto's when you're attempting to decompile them.
I'm also not sure if rewriting in this matter will always eliminate the 'goto' statement and if it is allowed in every case. While I'm not looking for a formal 'proof', some evidence that elimination is possible in this matter would be great.
So about the 'broadness', I challenge all the people that think there are 'too many answers' or 'many ways to rewrite a goto' to provide an algorithm or an approach to rewriting the general case please, since the only answer I found so far is the one I've posted.

This 1994 paper: Taming Control Flow: A Structured Approach to Eliminating Goto
Statements proposes an algorithm to eradicate all goto statements in a C program. The method is applicable to any program written in C# or any language that uses common constructs like if/switch/loop/break/continue (AFAIK, but I don't see why it wouldn't).
It begins with the two simplest transformations:
Case 1
Stuff1();
if (cond) goto Label;
Stuff2();
Label:
Stuff3();
becomes:
Stuff1();
if (!cond)
{
Stuff2();
}
Stuff3();
Case 2
Stuff1();
Label:
Stuff2();
Stuff3();
if (cond) goto Label;
becomes:
Stuff1();
do
{
Stuff2();
Stuff3();
} while (cond);
and builds on that to examine each complex case and apply iterative transformations that lead to those trivial cases. It then rounds off with the ultimate gotos/labels eradication algorithm.
This is a very interesting read.
UPDATE: Some other interesting papers on the subject (not easy to get your hands on, so I copy direct links here for reference):
A Formal Basis for Removing Goto Statements
A Goto-Elimination Method And Its Implementation For The McCat C Compiler

A situation where a goto can be avoided, but I think it is better to use it:
When I need to exit a inner loop and the loop:
for(int i = 0; i < list.Count; i++)
{
// some code that initializes inner
for(int j = 0; j < inner.Count; j++)
{
// Some code.
if (condition) goto Finished;
}
}
Finished:
// Some more code.
To avoid the goto you should do something like this:
for(int i = 0; i < list.Count; i++)
{
// some code that initializes inner
bool conditon = false;
for(int j = 0; j < inner.Count; j++)
{
// Some code that might change condition
if (condition) break;
}
if (condition) break;
}
// Some more code.
I think it looks much nicer with the goto statement.
The second situation is okay if the inner loop was in a different method.
void OuterLoop(list)
{
for(int i = 0; i < list.Count; i++)
{
// some code that initializes inner
if (InnerLoop(inner)) break;
}
}
bool InnerLoop(inner)
{
for(int j = 0; j < inner.Count; j++)
{
// Some code that might change condition
if (condition) return true;
}
return false;
}

I have some practical experience of attempting to take an unstructured program (in COBOL, no less) and render it as structured by removing every instance of GOTO. The original programmer was a reformed Assembly programmer, and though he may have known about the PERFORM statement, he didn't use it. It was GOTO GOTO GOTO. And it was serious spaghetti code -- several hundred lines worth of spaghetti code. I spent a couple of weeks worth of spare time trying to rewrite this monstrous construct, and eventually I had to give up. It was a huge steaming pile of insanity. It worked, though! Its job was to parse user instructions sent in to the mainframe in a textual format, and it did it well.
So, NO, it is not always to possible to completely eliminate GOTO -- if you are using manual methods. This is an edge case, however -- existing code that was written by a man with an apparently twisted programming mind. In modern times, there are tools available which can solve formerly intractable structural problems.
Since that day I have coded in Modula-2, C, Revelation Basic, three flavors of VB, and C# and have never found a situation that required or even suggested GOTO as a solution. For the original BASIC, however, GOTO was unavoidable.

Since you said it's a theoretical question, here is the theoretical answer.
Java is Turing complete so of course, yes. You can express any C# program in Java. You can also express it in Minecraft Redstone or Minesweeper. Compared to these alternatives expressing it in Java should be easy.
An obviously more practical answer though, namely an intelligible algorithm to do the transformation, has been given by Patrice.

I don't like my solution a single bit. For one, it's dead ugly and for two, it basically wraps the goto into a switch that does exactly the same as a goto would do.
That's what you're always going to get when looking for a general pattern which can replace any use of goto, something which does exactly the same as a goto would do. What else could it be? Different uses of goto should be replaced with best matching language construct. That's why constructs as switch statements and for loops exist in the first place, to make it easier and less error prone to create such a program flow. The compiler will still generate goto's (or jumps), but it will do so consistently where we will screw up. On top of that we don't have to read what the compiler generates, but we get to read (and write) something that's easier to understand.
You'll find that most compiler constructs exist to generalize a specific use of goto, those constructs where create based on the common patterns of goto usage which existed before it. The paper Patrice Gahide mentions sort of shows that process in reverse. If you are finding goto's in your code you are either looking at one of those patterns, in which case you should replace it with the matching language construct. Or you are looking at essentially unstructured code in which case you should actually structure the code (or leave it alone). Changing it to something unstructured but without goto can only make matters worse. (The same generalization process is still going on by the way, consider how foreach is being added to compilers to generalize a very common usage of for...)

How to enforce the use of a method's return value in C#?

I have a piece of software written with fluent syntax. The method chain has a definitive "ending", before which nothing useful is actually done in the code (think NBuilder, or Linq-to-SQL's query generation not actually hitting the database until we iterate over our objects with, say, ToList()).
The problem I am having is there is confusion among other developers about proper usage of the code. They are neglecting to call the "ending" method (thus never actually "doing anything")!
I am interested in enforcing the usage of the return value of some of my methods so that we can never "end the chain" without calling that "Finalize()" or "Save()" method that actually does the work.
Consider the following code:
//The "factory" class the user will be dealing with
public class FluentClass
{
//The entry point for this software
public IntermediateClass<T> Init<T>()
{
return new IntermediateClass<T>();
}
}
//The class that actually does the work
public class IntermediateClass<T>
{
private List<T> _values;
//The user cannot call this constructor
internal IntermediateClass<T>()
{
_values = new List<T>();
}
//Once generated, they can call "setup" methods such as this
public IntermediateClass<T> With(T value)
{
var instance = new IntermediateClass<T>() { _values = _values };
instance._values.Add(value);
return instance;
}
//Picture "lazy loading" - you have to call this method to
//actually do anything worthwhile
public void Save()
{
var itemCount = _values.Count();
. . . //save to database, write a log, do some real work
}
}
As you can see, proper usage of this code would be something like:
new FluentClass().Init<int>().With(-1).With(300).With(42).Save();
The problem is that people are using it this way (thinking it achieves the same as the above):
new FluentClass().Init<int>().With(-1).With(300).With(42);
So pervasive is this problem that, with entirely good intentions, another developer once actually changed the name of the "Init" method to indicate that THAT method was doing the "real work" of the software.
Logic errors like these are very difficult to spot, and, of course, it compiles, because it is perfectly acceptable to call a method with a return value and just "pretend" it returns void. Visual Studio doesn't care if you do this; your software will still compile and run (although in some cases I believe it throws a warning). This is a great feature to have, of course. Imagine a simple "InsertToDatabase" method that returns the ID of the new row as an integer - it is easy to see that there are some cases where we need that ID, and some cases where we could do without it.
In the case of this piece of software, there is definitively never any reason to eschew that "Save" function at the end of the method chain. It is a very specialized utility, and the only gain comes from the final step.
I want somebody's software to fail at the compiler level if they call "With()" and not "Save()".
It seems like an impossible task by traditional means - but that's why I come to you guys. Is there an Attribute I can use to prevent a method from being "cast to void" or some such?
Note: The alternate way of achieving this goal that has already been suggested to me is writing a suite of unit tests to enforce this rule, and using something like http://www.testdriven.net to bind them to the compiler. This is an acceptable solution, but I am hoping for something more elegant.

I don't know of a way to enforce this at a compiler level. It's often requested for objects which implement IDisposable as well, but isn't really enforceable.
One potential option which can help, however, is to set up your class, in DEBUG only, to have a finalizer that logs/throws/etc. if Save() was never called. This can help you discover these runtime problems while debugging instead of relying on searching the code, etc.
However, make sure that, in release mode, this is not used, as it will incur a performance overhead since the addition of an unnecessary finalizer is very bad on GC performance.

You could require specific methods to use a callback like so:
new FluentClass().Init<int>(x =>
{
x.Save(y =>
{
y.With(-1),
y.With(300)
});
});
The with method returns some specific object, and the only way to get that object is by calling x.Save(), which itself has a callback that lets you set up your indeterminate number of with statements. So the init takes something like this:
public T Init<T>(Func<MyInitInputType, MySaveResultType> initSetup)

I can think of three a few solutions, not ideal.
AIUI what you want is a function which is called when the temporary variable goes out of scope (as in, when it becomes available for garbage collection, but will probably not be garbage collected for some time yet). (See: The difference between a destructor and a finalizer?) This hypothetical function would say "if you've constructed a query in this object but not called save, produce an error". C++/CLI calls this RAII, and in C++/CLI there is a concept of a "destructor" when the object isn't used any more, and a "finaliser" which is called when it's finally garbage collected. Very confusingly, C# has only a so-called destructor, but this is only called by the garbage collector (it would be valid for the framework to call it earlier, as if it were partially cleaning the object immediately, but AFAIK it doesn't do anything like that). So what you would like is a C++/CLI destructor. Unfortunately, AIUI this maps onto the concept of IDisposable, which exposes a dispose() method which can be called when a C++/CLI destructor would be called, or when the C# destructor is called -- but AIUI you still have to call "dispose" manually, which defeats the point?
Refactor the interface slightly to convey the concept more accurately. Call the init function something like "prepareQuery" or "AAA" or "initRememberToCallSaveOrThisWontDoAnything". (The last is an exaggeration, but it might be necessary to make the point).
This is more of a social problem than a technical problem. The interface should make it easy to do the right thing, but programmers do have to know how to use code! Get all the programmers together. Explain simply once-and-for-all this simple fact. If necessary have them all sign a piece of paper saying they understand, and if they wilfully continue to write code which doesn't do anythign they're worse than useless to the company and will be fired.
Fiddle with the way the operators are chained, eg. have each of the intermediateClass functions assemble an aggregate intermediateclass object containing all of the parameters (you mostly do it this was already (?)) but require an init-like function of the original class to take that as an argument, rather than have them chained after it, and then you can have save and the other functions return two different class types (with essentially the same contents), and have init only accept a class of the correct type.
The fact that it's still a problem suggests that either your coworkers need a helpful reminder, or they're rather sub-par, or the interface wasn't very clear (perhaps its perfectly good, but the author didn't realise it wouldn't be clear if you only used it in passing rather than getting to know it), or you yourself have misunderstood the situation. A technical solution would be good, but you should probably think about why the problem occurred and how to communicate more clearly, probably asking someone senior's input.

After great deliberation and trial and error, it turns out that throwing an exception from the Finalize() method was not going to work for me. Apparently, you simply can't do that; the exception gets eaten up, because garbage collection operates non-deterministically. I was unable to get the software to call Dispose() automatically from the destructor either. Jack V.'s comment explains this well; here was the link he posted, for redundancy/emphasis:
The difference between a destructor and a finalizer?
Changing the syntax to use a callback was a clever way to make the behavior foolproof, but the agreed-upon syntax was fixed, and I had to work with it. Our company is all about fluent method chains. I was also a fan of the "out parameter" solution to be honest, but again, the bottom line is the method signatures simply could not change.
Helpful information about my particular problem includes the fact that my software is only ever to be run as part of a suite of unit tests - so efficiency is not a problem.
What I ended up doing was use Mono.Cecil to Reflect upon the Calling Assembly (the code calling into my software). Note that System.Reflection was insufficient for my purposes, because it cannot pinpoint method references, but I still needed(?) to use it to get the "calling assembly" itself (Mono.Cecil remains underdocumented, so it's possible I just need to get more familiar with it in order to do away with System.Reflection altogether; that remains to be seen....)
I placed the Mono.Cecil code in the Init() method, and the structure now looks something like:
public IntermediateClass<T> Init<T>()
{
ValidateUsage(Assembly.GetCallingAssembly());
return new IntermediateClass<T>();
}
void ValidateUsage(Assembly assembly)
{
// 1) Use Mono.Cecil to inspect the codebase inside the assembly
var assemblyLocation = assembly.CodeBase.Replace("file:///", "");
var monoCecilAssembly = AssemblyFactory.GetAssembly(assemblyLocation);
// 2) Retrieve the list of Instructions in the calling method
var methods = monoCecilAssembly.Modules...Types...Methods...Instructions
// (It's a little more complicated than that...
// if anybody would like more specific information on how I got this,
// let me know... I just didn't want to clutter up this post)
// 3) Those instructions refer to OpCodes and Operands....
// Defining "invalid method" as a method that calls "Init" but not "Save"
var methodCallingInit = method.Body.Instructions.Any
(instruction => instruction.OpCode.Name.Equals("callvirt")
&& instruction.Operand is IMethodReference
&& instruction.Operand.ToString.Equals(INITMETHODSIGNATURE);
var methodNotCallingSave = !method.Body.Instructions.Any
(instruction => instruction.OpCode.Name.Equals("callvirt")
&& instruction.Operand is IMethodReference
&& instruction.Operand.ToString.Equals(SAVEMETHODSIGNATURE);
var methodInvalid = methodCallingInit && methodNotCallingSave;
// Note: this is partially pseudocode;
// It doesn't 100% faithfully represent either Mono.Cecil's syntax or my own
// There are actually a lot of annoying casts involved, omitted for sanity
// 4) Obviously, if the method is invalid, throw
if (methodInvalid)
{
throw new Exception(String.Format("Bad developer! BAD! {0}", method.Name));
}
}
Trust me, the actual code is even uglier looking than my pseudocode.... :-)
But Mono.Cecil just might be my new favorite toy.
I now have a method that refuses to be run its main body unless the calling code "promises" to also call a second method afterwards. It's like a strange kind of code contract. I'm actually thinking about making this generic and reusable. Would any of you have a use for such a thing? Say, if it were an attribute?

What if you made it so Init and With don't return objects of type FluentClass? Have them return, e.g., UninitializedFluentClass which wraps a FluentClass object. Then calling .Save(0 on the UnitializedFluentClass object calls it on the wrapped FluentClass object and returns it. If they don't call Save they don't get a FluentClass object.

In Debug mode beside implementing IDisposable you can setup a timer that will throw a exception after 1 second if the resultmethod has not been called.

Use an out parameter! All the outs must be used.
Edit: I am not sure of it will help, tho...
It would break the fluent syntax.

The best way to conditionally execute a function?

Yes, I know the wording is hard to understand, but this is something that bugs me a lot. On a recent project, I have a function that recurses, and there are a number of conditions that would cause it to stop recursing (at the moment three). Which of the situations would be optional? (I.E. best performance or easiest maintenance).
1) Conditional return:
void myRecursingFunction (int i, int j){
if (conditionThatWouldStopRecursing) return;
if (anotherConditionThatWouldStopRecursing) return;
if (thirdConditionThatWouldStopRecursing) return;
doSomeCodeHere();
myRecursingFunction(i + 1, j);
myRecursingFunction(i, j + 1);
}
2) Wrap the whole thing in an if statement
void myRecursingFunction (int i, int j){
if (
!conditionThatWouldStopRecursing &&
!anotherConditionThatWouldStopRecursing &&
!thirdConditionThatWouldStopRecursing
){
doSomeCodeHere();
myRecursingFunction(i + 1, j);
myRecursingFunction(i, j + 1);
}
}
3) You're doing it wrong noob, no sane algorithm would ever use recursion.

Both of those approaches should result in the same IL code behind the scenes since they are equivalent boolean expressions. Note that each termination condition will be evaluated in the order you write it (since the compiler can't tell which is most likely), so you will want to put the most common termination condition first.
Even though structured programming dictates the second approach is better, personally I prefer to code return conditions as a separate block at the top of the recursive method. I find that easier to read and follow (though I am not a fan of returns in random areas of the method body).

I would opt for the first solution, since this makes it perfectly clear what the conditions are to stop the recursion.
It is more readable and more maintainable imho.

If it's something that needs to be fast, I'd recommend hitting the most common case as quickly as possible (i.e. put the base case at the end because you'll only hit it once). Also think about putting a base case-1 before the recursing clause (i.e. perform the test before you call the function again rather than check it on entry to the subsequent call) if that will make a difference.
And, with all things, don't optimise unless it's a problem. I'd go for clarity first.

I like the variant 1 better...
It is much easier to read then variant 2. Here you have to understand 3 negations and chain them all together with and. I know its not "hard", but it takes much longer then glancing on variant 1

My vote is for option #1 as well. It looks clearer to me.

C#: Why can't we have inner methods / local functions?

Very often it happens that I have private methods which become very big and contain repeating tasks but these tasks are so specific that it doesn't make sense to make them available to any other code part.
So it would be really great to be able to create 'inner methods' in this case.
Is there any technical (or even philosophical?) limitation that prevents C# from giving us this? Or did I miss something?
Update from 2016: This is coming and it's called a 'local function'. See marked answer.

Well, we can have "anonymous methods" defined inside a function (I don't suggest using them to organize a large method):
void test() {
Action t = () => Console.WriteLine("hello world"); // C# 3.0+
// Action t = delegate { Console.WriteLine("hello world"); }; // C# 2.0+
t();
}

If something is long and complicated than usually its good practise to refactor it to a separate class (either normal or static - depending on context) - there you can have private methods which will be specific for this functionality only.

I know a lot of people dont like regions but this is a case where they could prove useful by grouping your specific methods into a region.

Could you give a more concrete example? After reading your post I have the following impression, which is of course only a guess, due to limited informations:
Private methods are not available outside your class, so they are hidden from any other code anyway.
If you want to hide private methods from other code in the same class, your class might be to big and might violate the single responsibility rule.
Have a look at anonymous delegates an lambda expressions. It's not exactly what you asked for, but they might solve most of your problems.
Achim

If your method becomes too big, consider putting it in a separate class, or to create private helper methods. Generally I create a new method whenever I would normally have written a comment.

The better solution is to refactor this method to separate class. Create instance of this class as private field in your initial class. Make the big method public and refactor big method into several private methods, so it will be much clear what it does.

Seems like we're going to get exactly what I wanted with Local Functions in C# 7 / Visual Studio 15:
https://github.com/dotnet/roslyn/issues/2930
private int SomeMethodExposedToObjectMembers(int input)
{
int InnerMethod(bool b)
{
// TODO: Change return based on parameter b
return 0;
}
var calculation = 0;
// TODO: Some calculations based on input, store result in calculation
if (calculation > 0) return InnerMethod(true);
return InnerMethod(false);
}
Too bad I had to wait more than 7 years for this :-)
See also other answers for earlier versions of C#.

Capturing method state using Reflection

Is there a way to use .NET reflection to capture the values of all parameters/local variables?

You could get at this information using the CLR debugging API though it won't be a simple couple of lines to extract it.

Reflection is not used to capture information from the stack. It reads the Assembly.
You might want to take a look at StackTrace
http://msdn.microsoft.com/en-us/library/system.diagnostics.stacktrace.aspx
Good article here:
http://www.codeproject.com/KB/trace/customtracelistener.aspx

Reflection will tell you the type of parameters that a method has but it won't help discover their values during any particular invocation. Reflection doesn't tell you anything about local variables at all.
You need the sort of APIs that the debugger uses to access this sort of info.

I dont think this is possible, you can get the method and its parameters by looking at the StackTrace.
System.Diagnostics.StackTrace sTrace = new System.Diagnostics.StackTrace(true);
for (Int32 frameCount = 0; frameCount < sTrace.FrameCount; frameCount++){
System.Diagnostics.StackFrame sFrame = sTrace.GetFrame(frameCount);
System.Reflection.MethodBase thisMethod = sFrame.GetMethod();
if (thisMethod == currentMethod){
if (frameCount + 1 <= sTrace.FrameCount){
System.Diagnostics.StackFrame prevFrame = sTrace.GetFrame(frameCount + 1);
System.Reflection.MethodBase prevMethod = prevFrame.GetMethod();
}
}
}

I don't know how it's possible using reflection, but look at using weaving. SpringFramework.Net allows you to define pointcuts that can intercept method calls. Others probably do it as well.
Here's a link to the "BeforeAdvice" interceptor
http://www.springframework.net/docs/1.2.0-M1/reference/html/aop.html#d0e8139

The folks at secondlife suspend scripts and move them between servers. That implies that they have to capture the state of a running script, including the values of variables on the call stack.
Their scripting language runs on mono, an open source implementation of the .NET runtime. I doubt that their solution applies to the regular .NET runtime, but the video of the presentation on how they did it (skip to second half) might still be interesting.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.