Are there any ways to find all code that invokes IO operations like File.WriteAllText, Request.Files["filename"].SaveAs("out"), etc?
For now I can just grep for the common ways to read/write files, with something like this for example:
grep 'SaveAs' -I -r . -l | grep "\.cs"
This is not satisfactory because I can't think of all possible ways files can be read and written. Maybe it could be done via reflection somehow or through analysis of system calls in compiled binaries? Any ideas?
EDIT:
If method A calls method B, and method B calls method C, and method C does a file operation, it would be good to have that code identified as well. However, just to simplify the problem, finding direct calls to IO would be sufficient.
If you're planning to develop this tool, you can start here.
The article describes ways to detect assembly and method dependencies, so you could find all methods that call IO primitives, such as FileStream.Write.
So on a whim I decided to try this using Mono.Cecil.
It's pretty simple to get a list of all the methods that a method calls directly:
static IEnumerable<MethodDefinition> GetMethodsCalledInMethod(MethodDefinition md)
{
    // Abstract/extern methods have no body to inspect.
    if (md.Body == null)
    {
        return Enumerable.Empty<MethodDefinition>();
    }

    // Call instructions carry a MethodReference as their operand;
    // Resolve() turns that into a MethodDefinition (or null if it can't be resolved).
    return md.Body.Instructions
        .Select(i => i.Operand)
        .OfType<MethodReference>()
        .Select(mr => mr.Resolve())
        .Where(def => def != null);
}
You can get all the methods in an assembly you want to inspect:
var ad = AssemblyDefinition.ReadAssembly(typeof(ATypeInTheAssembly).Assembly.Location);
var allMethodsInAssembly = ad.Modules
    .SelectMany(m => m.Types)
    .SelectMany(t => t.Methods);
You can then recurse through this tree of method calls until you find a method call which looks like an IO call.
// A crude heuristic: treat any call on a FileStream member as a file operation.
Func<MethodDefinition, bool> isFileOperation = md =>
    md.DeclaringType.Name == "FileStream";
Is this sufficient? I don't know. File.WriteAllText uses a FileStream. The SaveAs example you gave does too. But does every file access go through a FileStream? I can't say.
This approach also has issues:
You need to look out for recursive calls, because otherwise your recursion will turn into an infinite loop and you'll blow your stack.
This is sloooow. When I analysed a simple console application that called File.WriteAllLines I got a result immediately, but when I tried to analyse the analyser itself it got lost in the tree.
If you run into an interface - or even a virtual method - you can't know for sure what the implementation is going to be, so you can't know whether it'll perform an IO operation or not!
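Putting the pieces together, a minimal sketch of the recursive search might look like this. It reuses GetMethodsCalledInMethod, allMethodsInAssembly and isFileOperation from the snippets above; the visited set guarding against the infinite-loop problem mentioned above is my own addition:

static bool CallsFileOperation(MethodDefinition md,
                               Func<MethodDefinition, bool> isFileOperation,
                               HashSet<string> visited)
{
    // Remember methods we've already walked so recursive call graphs
    // don't send us into an infinite loop.
    if (!visited.Add(md.FullName))
        return false;

    foreach (var called in GetMethodsCalledInMethod(md))
    {
        if (isFileOperation(called) ||
            CallsFileOperation(called, isFileOperation, visited))
            return true;
    }
    return false;
}

// Usage: report every method in the assembly that (directly or indirectly)
// ends up calling into FileStream.
var suspects = allMethodsInAssembly
    .Where(m => CallsFileOperation(m, isFileOperation, new HashSet<string>()));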
It depends on how deep you want the analysis to go. If you're really serious about catching all 'System.IO' method calls, I would suggest using NRefactory. It's a front-end C# parser which can parse C# code and produce a syntax tree and a code resolver.
There's a good tutorial about using it on CodeProject that can help you get started. The CodeProject tutorial's sample also contains auxiliary classes that let you load the whole solution and provide you with a code resolver.
If you need more samples, you can find some on my blog.
PS : I'll try to extend this answer with code sample in a few hours.
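As a first rough cut, a purely syntactic scan with the parser could look like the sketch below. The flagged method names are just examples; to reliably confirm that a call actually resolves into System.IO you need the code resolver described in the tutorial.

using System;
using System.IO;
using System.Linq;
using ICSharpCode.NRefactory.CSharp;

class FindIoCalls
{
    static void Main()
    {
        // Purely illustrative list of method names to flag.
        var suspiciousNames = new[] { "SaveAs", "WriteAllText", "WriteAllLines", "OpenWrite" };

        foreach (var file in Directory.EnumerateFiles(".", "*.cs", SearchOption.AllDirectories))
        {
            var syntaxTree = new CSharpParser().Parse(File.ReadAllText(file), file);

            // Walk every invocation and keep those whose target member name looks like IO.
            var hits = syntaxTree.Descendants
                .OfType<InvocationExpression>()
                .Select(inv => inv.Target as MemberReferenceExpression)
                .Where(target => target != null && suspiciousNames.Contains(target.MemberName));

            foreach (var hit in hits)
                Console.WriteLine("{0}({1}): {2}", file, hit.StartLocation.Line, hit.MemberName);
        }
    }
}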
Problem Description:
As a single developer I've stumbled across this several times:
Sometimes on a project, instead of choosing a cleaner approach, it makes sense for efficiency reasons to just add some quick and dirty test-code within production code.
While it may sound like a dirty solution, please keep in mind that I'm only talking about test-code that should be thrown away anyway, when I just want to quickly check something.
To be 100% sure not to forget to take this code out again, I'd like to guard it with a compile-time timebomb:
What I mean is some piece of code, preprocessor directive, or anything else that allows compilation for a certain timespan, say half an hour, and then automatically results in a compiler error once that time is over. A compiler error would be great because it could directly mark the place of the test-code (I could put it on the first line(s) of such a throwaway region, for instance).
So anything like "throw an error if system time is bigger than this specific DateTime" would be great.
I looked into the preprocessor (not too familiar with it) but the directives #if and #error didn't seem like an option since #if expects a symbol instead of an expression.
Question:
Is such a timebomb possible? Any idea on how to do that?
Or any idea on how to get the efficiency of quick and dirty test-code and be absolutely sure that taking it out again can't be forgotten?
(A run time error would be easy but if a compile time error isn't possible I would need something of a similar quality to it.)
I personally think that timebombing is the wrong approach. Use build targets to distinguish the different purposes of your code.
// In DEBUG builds [Obsolete("error", false)] only produces a compiler warning,
// so the constructor can still be called; in any other build the second argument
// is true, which turns every call site into a compiler error.
public class Protect : IDisposable
{
#if DEBUG
    [Obsolete("error", false)]
#else
    [Obsolete("error", true)]
#endif
    public Protect()
    {
    }

    public void Dispose()
    {
    }
}
Usage
using (new Protect())
{
    // do some testcode
    // will only compile in DEBUG mode
}
One option is to generate a file that has a "current time" variable at build time and then simply add a check that the code should stop working after a particular time.
Possible approach to generate file at build time - how to put computed value in RESX at build time C#
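A minimal sketch of how that check could look, assuming a pre-build step generates a BuildInfo class with the build timestamp (both the class and its member are hypothetical names here). Note that this gives a runtime failure, not a compile-time one:

// Hypothetical file generated by a pre-build step:
// public static class BuildInfo
// {
//     public static readonly DateTime BuildTimeUtc =
//         new DateTime(2013, 5, 1, 12, 0, 0, DateTimeKind.Utc);
// }

public static class TimeBomb
{
    // Call this on the first line of any throwaway test-code region.
    public static void Check(TimeSpan allowedLifetime)
    {
        if (DateTime.UtcNow > BuildInfo.BuildTimeUtc + allowedLifetime)
            throw new InvalidOperationException(
                "Throwaway test code has outlived its allowed lifetime - remove it.");
    }
}

// Usage inside the quick and dirty block:
// TimeBomb.Check(TimeSpan.FromMinutes(30));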
I know there are quite a few static analysis tools for C# or .NET around. See this question for a good list of available tools. I have used some of those in the past and they have a good way of detecting problems.
I am currently looking for a way to automatically enforce some locking rules we have in our teams. For example I would like to enforce the following rules:
"Every public method that uses member foo must acquire a lock on bar"
Or
"Every call to foobar event must be outside lock to bar"
Writing custom FxCop rules, if feasible, seems rather complex. Is there any simpler way of doing it?
Multithreading is hard. Using locks is not the only way to make operations thread-safe. A developer may use non-blocking synchronization with a loop and Interlocked.CompareExchange, or some other mechanism instead. A rule cannot determine whether something is thread-safe.
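For example, this kind of lock-free increment (my own sketch, using System.Threading) is thread-safe, yet a "must lock bar before touching foo" rule would flag it:

// A lock-free increment of foo: thread-safe without any lock on bar,
// so a purely lock-based rule would report a false positive here.
private static int foo;

static void IncrementFoo()
{
    int original, updated;
    do
    {
        original = foo;
        updated = original + 1;
    }
    while (Interlocked.CompareExchange(ref foo, updated, original) != original);
}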
If the purpose of the rules is to ensure high-quality code, I think the best way to go about this is to create a thread-safe version of your class which is simple to consume. Put checks in place so that the more complex synchronization code is only modified under code review by developers who understand multithreading.
With NDepend you could write a code rule over a LINQ query (CQLinq) that could look like:
warnif count > 0 from m in Methods where
m.IsUsing ("YourNamespace.YourClass.foo") && (
! m.IsUsing ("YourNamespace.YourClass.bar") ||
! m.IsUsing ("System.Threading.Monitor.Enter(Object)".AllowNoMatch()) ||
! m.IsUsing ("System.Threading.Monitor.Exit(Object)".AllowNoMatch()) )
select new { m, m.NbLinesOfCode }
Basically it matches methods that use the field foo without using the field bar, or without calling Monitor.Enter or Monitor.Exit. This is not exactly what you are asking for, since you want the lock to be taken explicitly on bar, but it is simple and quite close.
Note that you can also write...
m.AssignField("YourNamespace.YourClass.foo")
... to restrict a specific write/assign field usage on foo.
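For context, here is my own illustration of the kind of method such a rule would flag versus accept (field names matching the rules above). The C# lock statement compiles down to Monitor.Enter/Monitor.Exit, which is why the query checks for those calls:

public class Worker
{
    private readonly object bar = new object();
    private int foo;

    // Would be flagged: uses foo without ever locking bar.
    public void Unsafe()
    {
        foo++;
    }

    // Would pass: foo is only touched inside a lock on bar
    // (the lock statement emits Monitor.Enter/Exit under the hood).
    public void Safe()
    {
        lock (bar)
        {
            foo++;
        }
    }
}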
One possible solution could be to implement Code Contracts. You define rules, run them at compile time (so they can also be integrated into your CI environment, if any) and get results.
For an example of using Code Contracts as a tool for static code analysis, see:
Static Code Analysis and Code Contracts
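As a rough sketch (my own example, not from the article), a contract can at least turn the "caller must hold the lock on bar" rule into a checked precondition. The contract rewriter enforces it at run time, though the static checker won't prove lock ownership for you; Monitor.IsEntered requires .NET 4.5:

using System.Diagnostics.Contracts;
using System.Threading;

public class Account
{
    private readonly object bar = new object();
    private int foo;

    public void UpdateFoo(int value)
    {
        // Precondition: the caller must already hold the lock on bar.
        Contract.Requires(Monitor.IsEntered(bar));
        foo = value;
    }
}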
I have a piece of software written with fluent syntax. The method chain has a definitive "ending", before which nothing useful is actually done in the code (think NBuilder, or Linq-to-SQL's query generation not actually hitting the database until we iterate over our objects with, say, ToList()).
The problem I am having is there is confusion among other developers about proper usage of the code. They are neglecting to call the "ending" method (thus never actually "doing anything")!
I am interested in enforcing the usage of the return value of some of my methods so that we can never "end the chain" without calling that "Finalize()" or "Save()" method that actually does the work.
Consider the following code:
//The "factory" class the user will be dealing with
public class FluentClass
{
//The entry point for this software
public IntermediateClass<T> Init<T>()
{
return new IntermediateClass<T>();
}
}
//The class that actually does the work
public class IntermediateClass<T>
{
private List<T> _values;
//The user cannot call this constructor
internal IntermediateClass<T>()
{
_values = new List<T>();
}
//Once generated, they can call "setup" methods such as this
public IntermediateClass<T> With(T value)
{
var instance = new IntermediateClass<T>() { _values = _values };
instance._values.Add(value);
return instance;
}
//Picture "lazy loading" - you have to call this method to
//actually do anything worthwhile
public void Save()
{
var itemCount = _values.Count();
. . . //save to database, write a log, do some real work
}
}
As you can see, proper usage of this code would be something like:
new FluentClass().Init<int>().With(-1).With(300).With(42).Save();
The problem is that people are using it this way (thinking it achieves the same as the above):
new FluentClass().Init<int>().With(-1).With(300).With(42);
So pervasive is this problem that, with entirely good intentions, another developer once actually changed the name of the "Init" method to indicate that THAT method was doing the "real work" of the software.
Logic errors like these are very difficult to spot, and, of course, it compiles, because it is perfectly acceptable to call a method with a return value and just "pretend" it returns void. Visual Studio doesn't care if you do this; your software will still compile and run (although in some cases I believe it throws a warning). This is a great feature to have, of course. Imagine a simple "InsertToDatabase" method that returns the ID of the new row as an integer - it is easy to see that there are some cases where we need that ID, and some cases where we could do without it.
In the case of this piece of software, there is definitively never any reason to eschew that "Save" function at the end of the method chain. It is a very specialized utility, and the only gain comes from the final step.
I want somebody's software to fail at the compiler level if they call "With()" and not "Save()".
It seems like an impossible task by traditional means - but that's why I come to you guys. Is there an Attribute I can use to prevent a method from being "cast to void" or some such?
Note: The alternate way of achieving this goal that has already been suggested to me is writing a suite of unit tests to enforce this rule, and using something like http://www.testdriven.net to bind them to the compiler. This is an acceptable solution, but I am hoping for something more elegant.
I don't know of a way to enforce this at a compiler level. It's often requested for objects which implement IDisposable as well, but isn't really enforceable.
One potential option which can help, however, is to set up your class, in DEBUG only, to have a finalizer that logs/throws/etc. if Save() was never called. This can help you discover these runtime problems while debugging instead of relying on searching the code, etc.
However, make sure that, in release mode, this is not used, as it will incur a performance overhead since the addition of an unnecessary finalizer is very bad on GC performance.
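A minimal sketch of that idea, using the type names from the question (the With chaining is simplified to keep the example short):

public class IntermediateClass<T>
{
    private readonly List<T> _values = new List<T>();
    private bool _saved;

    public IntermediateClass<T> With(T value)
    {
        _values.Add(value);
        return this;
    }

    public void Save()
    {
        _saved = true;
        // ... do the real work ...
#if DEBUG
        GC.SuppressFinalize(this);   // the finalizer is only needed when Save() is missed
#endif
    }

#if DEBUG
    ~IntermediateClass()
    {
        if (!_saved)
            System.Diagnostics.Debug.Fail(
                "IntermediateClass was discarded without calling Save().");
    }
#endif
}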
You could require specific methods to use a callback like so:
new FluentClass().Init<int>(x =>
{
    x.Save(y =>
    {
        y.With(-1);
        y.With(300);
    });
});
The With method returns some specific object, and the only way to get that object is by calling x.Save(), which itself takes a callback that lets you set up your indeterminate number of With statements. So Init takes something like this:
public T Init<T>(Func<MyInitInputType, MySaveResultType> initSetup)
I can think of a few solutions, none of them ideal.
AIUI what you want is a function which is called when the temporary variable goes out of scope (as in, when it becomes available for garbage collection, but will probably not be garbage collected for some time yet). (See: The difference between a destructor and a finalizer?) This hypothetical function would say "if you've constructed a query in this object but not called Save, produce an error". C++/CLI calls this RAII, and in C++/CLI there is a concept of a "destructor" that runs when the object isn't used any more, and a "finaliser" which is called when it's finally garbage collected. Very confusingly, C# has only a so-called destructor, but this is only called by the garbage collector (it would be valid for the framework to call it earlier, as if it were partially cleaning the object immediately, but AFAIK it doesn't do anything like that). So what you would like is a C++/CLI destructor. Unfortunately, AIUI this maps onto the concept of IDisposable, which exposes a Dispose() method which can be called when a C++/CLI destructor would be called, or when the C# destructor is called -- but AIUI you still have to call Dispose manually, which defeats the point?
Refactor the interface slightly to convey the concept more accurately. Call the init function something like "prepareQuery" or "AAA" or "initRememberToCallSaveOrThisWontDoAnything". (The last is an exaggeration, but it might be necessary to make the point).
This is more of a social problem than a technical problem. The interface should make it easy to do the right thing, but programmers do have to know how to use code! Get all the programmers together. Explain simply once-and-for-all this simple fact. If necessary have them all sign a piece of paper saying they understand, and if they wilfully continue to write code which doesn't do anything they're worse than useless to the company and will be fired.
Fiddle with the way the operators are chained, e.g. have each of the IntermediateClass functions assemble an aggregate IntermediateClass object containing all of the parameters (you mostly do it this way already (?)), but require an Init-like function of the original class to take that as an argument rather than have them chained after it. Then you can have Save and the other functions return two different class types (with essentially the same contents), and have Init only accept a class of the correct type.
The fact that it's still a problem suggests that either your coworkers need a helpful reminder, or they're rather sub-par, or the interface wasn't very clear (perhaps its perfectly good, but the author didn't realise it wouldn't be clear if you only used it in passing rather than getting to know it), or you yourself have misunderstood the situation. A technical solution would be good, but you should probably think about why the problem occurred and how to communicate more clearly, probably asking someone senior's input.
After great deliberation and trial and error, it turns out that throwing an exception from the Finalize() method was not going to work for me. Apparently, you simply can't do that; the exception gets eaten up, because garbage collection operates non-deterministically. I was unable to get the software to call Dispose() automatically from the destructor either. Jack V.'s comment explains this well; here was the link he posted, for redundancy/emphasis:
The difference between a destructor and a finalizer?
Changing the syntax to use a callback was a clever way to make the behavior foolproof, but the agreed-upon syntax was fixed, and I had to work with it. Our company is all about fluent method chains. I was also a fan of the "out parameter" solution to be honest, but again, the bottom line is the method signatures simply could not change.
Helpful information about my particular problem includes the fact that my software is only ever to be run as part of a suite of unit tests - so efficiency is not a problem.
What I ended up doing was use Mono.Cecil to Reflect upon the Calling Assembly (the code calling into my software). Note that System.Reflection was insufficient for my purposes, because it cannot pinpoint method references, but I still needed(?) to use it to get the "calling assembly" itself (Mono.Cecil remains underdocumented, so it's possible I just need to get more familiar with it in order to do away with System.Reflection altogether; that remains to be seen....)
I placed the Mono.Cecil code in the Init() method, and the structure now looks something like:
public IntermediateClass<T> Init<T>()
{
    ValidateUsage(Assembly.GetCallingAssembly());
    return new IntermediateClass<T>();
}
void ValidateUsage(Assembly assembly)
{
    // 1) Use Mono.Cecil to inspect the codebase inside the assembly
    var assemblyLocation = assembly.CodeBase.Replace("file:///", "");
    var monoCecilAssembly = AssemblyFactory.GetAssembly(assemblyLocation);

    // 2) Retrieve the list of Instructions in the calling method
    var methods = monoCecilAssembly.Modules...Types...Methods...Instructions
    // (It's a little more complicated than that...
    // if anybody would like more specific information on how I got this,
    // let me know... I just didn't want to clutter up this post)

    // 3) Those instructions refer to OpCodes and Operands....
    // Defining "invalid method" as a method that calls "Init" but not "Save"
    var methodCallingInit = method.Body.Instructions.Any
        (instruction => instruction.OpCode.Name.Equals("callvirt")
            && instruction.Operand is IMethodReference
            && instruction.Operand.ToString().Equals(INITMETHODSIGNATURE));

    var methodNotCallingSave = !method.Body.Instructions.Any
        (instruction => instruction.OpCode.Name.Equals("callvirt")
            && instruction.Operand is IMethodReference
            && instruction.Operand.ToString().Equals(SAVEMETHODSIGNATURE));

    var methodInvalid = methodCallingInit && methodNotCallingSave;

    // Note: this is partially pseudocode;
    // It doesn't 100% faithfully represent either Mono.Cecil's syntax or my own
    // There are actually a lot of annoying casts involved, omitted for sanity

    // 4) Obviously, if the method is invalid, throw
    if (methodInvalid)
    {
        throw new Exception(String.Format("Bad developer! BAD! {0}", method.Name));
    }
}
Trust me, the actual code is even uglier looking than my pseudocode.... :-)
But Mono.Cecil just might be my new favorite toy.
I now have a method that refuses to run its main body unless the calling code "promises" to also call a second method afterwards. It's like a strange kind of code contract. I'm actually thinking about making this generic and reusable. Would any of you have a use for such a thing? Say, if it were an attribute?
What if you made it so Init and With don't return objects of type FluentClass? Have them return, e.g., UninitializedFluentClass, which wraps a FluentClass object. Then calling .Save() on the UninitializedFluentClass object calls it on the wrapped FluentClass object and returns it. If they don't call Save, they don't get a FluentClass object.
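A rough sketch of that wrapper idea (type names taken from this answer; the plumbing back to the original FluentClass is simplified):

public class UninitializedFluentClass<T>
{
    private readonly FluentClass _inner = new FluentClass();
    private readonly List<T> _values = new List<T>();

    internal UninitializedFluentClass() { }

    public UninitializedFluentClass<T> With(T value)
    {
        _values.Add(value);
        return this;
    }

    // Only Save() hands the caller a real FluentClass, so code that forgets
    // to call it never gets an object it can meaningfully use.
    public FluentClass Save()
    {
        // ... persist _values via _inner ...
        return _inner;
    }
}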
In Debug mode, besides implementing IDisposable, you can set up a timer that will throw an exception after 1 second if the result method has not been called.
Use an out parameter! All the outs must be used.
Edit: I am not sure if it will help, though...
It would break the fluent syntax.
Is there a way to use .NET reflection to capture the values of all parameters/local variables?
You could get at this information using the CLR debugging API though it won't be a simple couple of lines to extract it.
Reflection is not used to capture information from the stack. It reads the Assembly.
You might want to take a look at StackTrace
http://msdn.microsoft.com/en-us/library/system.diagnostics.stacktrace.aspx
Good article here:
http://www.codeproject.com/KB/trace/customtracelistener.aspx
Reflection will tell you the type of parameters that a method has but it won't help discover their values during any particular invocation. Reflection doesn't tell you anything about local variables at all.
You need the sort of APIs that the debugger uses to access this sort of info.
I don't think this is possible; you can get the method and its parameters by looking at the StackTrace:
// Walk the current stack and find the frame for the method we're interested in.
var currentMethod = System.Reflection.MethodBase.GetCurrentMethod();
System.Diagnostics.StackTrace sTrace = new System.Diagnostics.StackTrace(true);
for (Int32 frameCount = 0; frameCount < sTrace.FrameCount; frameCount++)
{
    System.Diagnostics.StackFrame sFrame = sTrace.GetFrame(frameCount);
    System.Reflection.MethodBase thisMethod = sFrame.GetMethod();
    if (thisMethod == currentMethod)
    {
        if (frameCount + 1 < sTrace.FrameCount)
        {
            // The next frame up is the caller.
            System.Diagnostics.StackFrame prevFrame = sTrace.GetFrame(frameCount + 1);
            System.Reflection.MethodBase prevMethod = prevFrame.GetMethod();
        }
    }
}
I don't know how it's possible using reflection, but look at using weaving. SpringFramework.Net allows you to define pointcuts that can intercept method calls. Others probably do it as well.
Here's a link to the "BeforeAdvice" interceptor
http://www.springframework.net/docs/1.2.0-M1/reference/html/aop.html#d0e8139
The folks at secondlife suspend scripts and move them between servers. That implies that they have to capture the state of a running script, including the values of variables on the call stack.
Their scripting language runs on mono, an open source implementation of the .NET runtime. I doubt that their solution applies to the regular .NET runtime, but the video of the presentation on how they did it (skip to second half) might still be interesting.