Find all File IO operations in C# code (maybe through static analysis)

Are there any ways to find all code that invokes IO operations like File.WriteAllText, Request.Files["filename"].SaveAs("out"), etc?
For now I can only grep for the common ways to read and write files, for example:
grep 'SaveAs' -I -r . -l | grep "\.cs"
This is not satisfactory because I can't think of all possible ways files can be read and written. Maybe it could be done via reflection somehow or through analysis of system calls in compiled binaries? Any ideas?
EDIT:
If method A calls method B and method B calls method C, and method C does a file operation it would be good to have that code identified as well. However, just to simplify the problem finding direct calls to IO would be sufficient.

If you're planning to develop this tool, you can start here.
The article describes ways to detect assembly and method dependencies, so you could find all methods that call IO primitives, such as FileStream.Write.

So on a whim I decided to try this using Mono.Cecil.
It's pretty simple to get a list of all the methods that a method calls directly:
static IEnumerable<MethodDefinition> GetMethodsCalledInMethod(MethodDefinition md)
{
if (md.Body == null)
{
return Enumerable.Empty<MethodDefinition>();
}
return md.Body.Instructions
.Select(i => i.Operand)
.OfType<MethodReference>()
.Select(mr => mr.Resolve())
.Where(mr => mr != null);
}
You can get all the methods in an assembly you want to inspect:
var ad = AssemblyDefinition.ReadAssembly(typeof(ATypeInTheAssembly).Assembly.Location);
var allMethodsInAssembly = ad.Modules
.SelectMany(m => m.Types)
.SelectMany(t => t.Methods);
You can then recurse through this tree of method calls until you find a method call which looks like an IO call.
Func<MethodDefinition, bool> isFileOperation = md =>
md.DeclaringType.Name == "FileStream";
Is this sufficient? I don't know. File.WriteAllText uses a FileStream. The SaveAs example you gave does too. But does every file access go through a FileStream? I can't say.
This approach also has issues:
You need to look out for recursive calls, because otherwise your recursion will turn into an infinite loop and you'll blow your stack.
This is sloooow. When I analysed a simple console application that called File.WriteAllLines I got a result immediately, but when I tried to analyse the analyser itself it got lost in the tree.
If you run into an interface - or even a virtual method - you can't know for sure what the implementation is going to be, so you can't know whether it'll perform an IO operation or not!
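The first caveat (recursive calls) is usually handled by remembering which methods have already been expanded. Below is a minimal sketch of that idea on a hypothetical call graph stored as a dictionary of method names. CallGraphSearch, ReachesIo and the string-based graph are assumptions made for illustration; with Mono.Cecil the nodes would be MethodDefinition objects and the edges would come from GetMethodsCalledInMethod.

```csharp
using System;
using System.Collections.Generic;

public static class CallGraphSearch
{
    // Returns true if 'start' can reach any method accepted by 'isIo'
    // by following the call edges in 'calls'. The visited set ensures
    // each method is expanded at most once, so cycles terminate.
    public static bool ReachesIo(
        string start,
        IDictionary<string, string[]> calls,
        Func<string, bool> isIo)
    {
        var visited = new HashSet<string>();
        var pending = new Stack<string>(); // explicit stack: no deep recursion
        pending.Push(start);

        while (pending.Count > 0)
        {
            var current = pending.Pop();
            if (!visited.Add(current))
                continue; // already expanded (recursion or shared callee)

            if (isIo(current))
                return true;

            string[] callees;
            if (calls.TryGetValue(current, out callees))
            {
                foreach (var callee in callees)
                    pending.Push(callee);
            }
        }
        return false;
    }
}
```

Because every method is expanded only once, this may also help with the slowness described above, although it does nothing for the interface and virtual method problem.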

It depends on how deep you want the analysis to go. If you're really serious about catching all 'System.IO' method calls, I would suggest using NRefactory. It's a C# front-end that can parse C# code and gives you a syntax tree and a code resolver.
There's a good tutorial about using it on Code Project that can help you get started. The tutorial's sample also contains auxiliary classes that let you load a whole solution and provide you with a code resolver.
If you need more samples, you can find some on my blog.
PS: I'll try to extend this answer with a code sample in a few hours.

Related

Iterating through diff changes in LibGit2Sharp

What could be the best (as in performant, simple) way to iterate over TreeChanges in LibGit2Sharp?
If I access the .Patch property, I retrieve the full text of the changes. This is not quite enough for me... ideally I would like to be able to iterate over the diff lines, and per each line retrieve the status of the line (modified, added, deleted) and build my own output out of it.
Update:
Let's say I want to build my own diff output. What I'd like to do is to iterate over the changed lines, and during iteration I would check for the type of change (added, removed), and construct my output.
For example:
var diff = "";
foreach (LineChange line in changes) // Bogus class "LineChange"
{
if (line.Type == LineChange.TYPE_ADDED)
diff += "+";
else
diff += "-";
diff += line.Content;
diff += "\n";
}
The above is just a simple example what kind of flexibility I'm looking for. To be able to go through the changes, and run some logic along with it depending on the line change types. The Patch property is already "built", one way would be to parse it, but it seems silly that the library first builds the output, and then I parse it... I'd rather use the building ingredients directly.
I need this kind of functionality so that I can display a visual diff of changes which involves far more code and logic than the simple example I gave above.

As far as I can see, this information is not exposed by libgit2sharp, but it's provided by libgit2 in the case of blob diffs (but not for tree diffs). The relevant code is in ContentChanges.cs, specifically in the constructor and in the LineCallback() method (the code for tree diffs is in TreeChanges.cs).
Because of this, I think you have two options:
Invoke the method git_diff_blobs(), that's used internally by ContentChanges, yourself, either using reflection (it's an internal method in NativeMethods), or by copying the PInvoke signature to your project. You will most likely also need Utf8Marshaler.
Modify the code of ContentChanges, so that it fits your needs. If you do this, it might make sense to create a pull request for that change, so that others could use it too.

@svick is right. It's not exposed.
It might be useful to open an issue/feature request to further discuss this topic. Indeed, exposing a full blown line based diffgram might not fit the current "grain" of the library. However, provided you can come up with a scenario/use case that would benefit most of the users, some research may be invested in order to widen the API.
Beside this option, there might be other solutions: post-process the current produced patch against the previous version of the file
See this SO question for potential leads
Neil Fraser's "Diff Strategies" paper is also a great source of strategies and potential caveats regarding what a diff tool might aim at
DiffPlex, as a working visualization tool, might be inspirational as well
With some more work, one might even achieve something similar to the following kind of visualization (from Perforce 4 viewer)
(source: macworld.com)
Note: In order to ease this, it might be useful to expose in C# the libgit2 diffing options.
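The post-processing option could look roughly like the sketch below, which splits unified diff text (such as the string read from the .Patch property) into typed lines. PatchLineParser, LineChange and LineChangeType are hypothetical names, not LibGit2Sharp types, and the code assumes the patch is in the standard unified diff format.

```csharp
using System;
using System.Collections.Generic;

public enum LineChangeType { Context, Added, Removed }

public sealed class LineChange
{
    public LineChangeType Type { get; private set; }
    public string Content { get; private set; }

    public LineChange(LineChangeType type, string content)
    {
        Type = type;
        Content = content;
    }
}

public static class PatchLineParser
{
    public static List<LineChange> Parse(string patch)
    {
        var result = new List<LineChange>();
        var inHunk = false;

        foreach (var raw in patch.Split('\n'))
        {
            var line = raw.TrimEnd('\r');

            if (line.StartsWith("@@", StringComparison.Ordinal))
            {
                inHunk = true; // hunk header: body lines follow
                continue;
            }
            if (line.StartsWith("diff --git", StringComparison.Ordinal))
            {
                inHunk = false; // next file: skip its header lines
                continue;
            }
            // Skip file headers, empty trailing pieces and the
            // "\ No newline at end of file" marker.
            if (!inHunk || line.Length == 0 || line[0] == '\\')
                continue;

            var content = line.Substring(1);
            switch (line[0])
            {
                case '+': result.Add(new LineChange(LineChangeType.Added, content)); break;
                case '-': result.Add(new LineChange(LineChangeType.Removed, content)); break;
                case ' ': result.Add(new LineChange(LineChangeType.Context, content)); break;
            }
        }
        return result;
    }
}
```

The loop from the question could then iterate over the returned list and switch on Type instead of on the bogus LineChange.TYPE_ADDED constant.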

Static analysis tool to check locking before access to variable

I know there are quite a few static analysis tools for C# or .NET around. See this question for a good list of available tools. I have used some of those in the past and they are good at detecting problems.
I am currently looking for a way to automatically enforce some locking rules we have in our teams. For example I would like to enforce the following rules:
"Every public method that uses member foo must acquire a lock on bar"
Or
"Every call to foobar event must be outside lock to bar"
Writing custom FxCop rules, if feasible, seems rather complex. Is there any simpler way of doing it?

Multithreading is hard. Using locks is not the only way to make operations thread-safe. A developer may use non-blocking synchronization with a loop and Interlocked.CompareExchange, or some other mechanism instead. A rule can not determine if something is thread-safe.
If the purpose of rules is to ensure high quality code, I think the best way to go about this is to create a thread-safe version of your class which is simple to consume. Put checks in place that the more-complex synchronization code is only modified under code review by developers that understand multithreading.
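To illustrate why a rule that only looks for locks can misjudge code, here is a small sketch of the non-blocking pattern mentioned above: a retry loop around Interlocked.CompareExchange. LockFreeMax is a hypothetical class invented for this example. It never takes a lock, yet concurrent callers of Update cannot lose an update.

```csharp
using System;
using System.Threading;

public sealed class LockFreeMax
{
    private int _max = int.MinValue;

    public int Value
    {
        get { return Volatile.Read(ref _max); }
    }

    // Raises the stored maximum to 'candidate' if it is larger.
    public void Update(int candidate)
    {
        while (true)
        {
            int seen = Volatile.Read(ref _max);
            if (candidate <= seen)
                return; // nothing to do

            // Write only if no other thread changed _max since we read it.
            if (Interlocked.CompareExchange(ref _max, candidate, seen) == seen)
                return;

            // Lost the race: loop and re-read the current value.
        }
    }
}
```

A rule of the form "every method that uses _max must hold a lock" would flag Update, even though the method is safe to call from several threads.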

With NDepend you could write a code rule over a LINQ query (CQLinq) that could look like:
warnif count > 0 from m in Methods where
m.IsUsing ("YourNamespace.YourClass.foo") && (
! m.IsUsing ("YourNamespace.YourClass.bar") ||
! m.IsUsing ("System.Threading.Monitor.Enter(Object)".AllowNoMatch()) ||
! m.IsUsing ("System.Threading.Monitor.Exit(Object)".AllowNoMatch()) )
select new { m, m.NbLinesOfCode }
Basically it will match methods that use the field foo without using the field bar, or without calling Monitor.Enter or Monitor.Exit. This is not exactly what you are asking for, since you want an explicit lock on bar, but it is simple and quite close.
Note that you can also write...
m.AssignField("YourNamespace.YourClass.foo")
... to restrict a specific write/assign field usage on foo.

One possible solution could be to use Code Contracts. You define rules, run them at compile time (so they can also be integrated in your CI environment, if any) and get results.
For an example of using Code Contracts as a static code analysis tool, see:
Static Code Analysis and Code Contracts

How to enforce the use of a method's return value in C#?

I have a piece of software written with fluent syntax. The method chain has a definitive "ending", before which nothing useful is actually done in the code (think NBuilder, or Linq-to-SQL's query generation not actually hitting the database until we iterate over our objects with, say, ToList()).
The problem I am having is there is confusion among other developers about proper usage of the code. They are neglecting to call the "ending" method (thus never actually "doing anything")!
I am interested in enforcing the usage of the return value of some of my methods so that we can never "end the chain" without calling that "Finalize()" or "Save()" method that actually does the work.
Consider the following code:
//The "factory" class the user will be dealing with
public class FluentClass
{
//The entry point for this software
public IntermediateClass<T> Init<T>()
{
return new IntermediateClass<T>();
}
}
//The class that actually does the work
public class IntermediateClass<T>
{
private List<T> _values;
//The user cannot call this constructor
internal IntermediateClass()
{
_values = new List<T>();
}
//Once generated, they can call "setup" methods such as this
public IntermediateClass<T> With(T value)
{
var instance = new IntermediateClass<T>() { _values = _values };
instance._values.Add(value);
return instance;
}
//Picture "lazy loading" - you have to call this method to
//actually do anything worthwhile
public void Save()
{
var itemCount = _values.Count();
. . . //save to database, write a log, do some real work
}
}
As you can see, proper usage of this code would be something like:
new FluentClass().Init<int>().With(-1).With(300).With(42).Save();
The problem is that people are using it this way (thinking it achieves the same as the above):
new FluentClass().Init<int>().With(-1).With(300).With(42);
So pervasive is this problem that, with entirely good intentions, another developer once actually changed the name of the "Init" method to indicate that THAT method was doing the "real work" of the software.
Logic errors like these are very difficult to spot, and, of course, it compiles, because it is perfectly acceptable to call a method with a return value and just "pretend" it returns void. Visual Studio doesn't care if you do this; your software will still compile and run (although in some cases I believe it throws a warning). This is a great feature to have, of course. Imagine a simple "InsertToDatabase" method that returns the ID of the new row as an integer - it is easy to see that there are some cases where we need that ID, and some cases where we could do without it.
In the case of this piece of software, there is definitively never any reason to eschew that "Save" function at the end of the method chain. It is a very specialized utility, and the only gain comes from the final step.
I want somebody's software to fail at the compiler level if they call "With()" and not "Save()".
It seems like an impossible task by traditional means - but that's why I come to you guys. Is there an Attribute I can use to prevent a method from being "cast to void" or some such?
Note: The alternate way of achieving this goal that has already been suggested to me is writing a suite of unit tests to enforce this rule, and using something like http://www.testdriven.net to bind them to the compiler. This is an acceptable solution, but I am hoping for something more elegant.

I don't know of a way to enforce this at a compiler level. It's often requested for objects which implement IDisposable as well, but isn't really enforceable.
One potential option which can help, however, is to set up your class, in DEBUG only, to have a finalizer that logs/throws/etc. if Save() was never called. This can help you discover these runtime problems while debugging instead of relying on searching the code, etc.
However, make sure that, in release mode, this is not used, as it will incur a performance overhead since the addition of an unnecessary finalizer is very bad on GC performance.
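A minimal sketch of the debug-only finalizer idea, assuming a hypothetical TrackedBuilder class that stands in for the fluent class from the question. The finalizer is compiled only when the DEBUG symbol is defined, and it logs instead of throwing because it runs on the finalizer thread at an unpredictable time.

```csharp
using System;
using System.Diagnostics;

public sealed class TrackedBuilder
{
    public bool IsSaved { get; private set; }

    public void Save()
    {
        IsSaved = true;
        // The real work (database write, logging, ...) would go here.
    }

#if DEBUG
    // Debug builds only: report builders that were collected without Save().
    ~TrackedBuilder()
    {
        if (!IsSaved)
            Debug.WriteLine("TrackedBuilder was discarded without calling Save().");
    }
#endif
}
```

The message only appears once the garbage collector gets around to finalizing the object, so this is a debugging aid rather than a guarantee.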

You could require specific methods to use a callback like so:
new FluentClass().Init<int>(x =>
{
x.Save(y =>
{
y.With(-1);
y.With(300);
});
});
The with method returns some specific object, and the only way to get that object is by calling x.Save(), which itself has a callback that lets you set up your indeterminate number of with statements. So the init takes something like this:
public T Init<T>(Func<MyInitInputType, MySaveResultType> initSetup)
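One way to read the callback idea, simplified so that the save step cannot be skipped: the entry point takes the setup as a delegate and performs the save itself. ValueCollector, CallbackFluent and Persist are hypothetical names, and Persist only counts the values as a stand-in for the real work. This is a variation on the answer above, not its exact shape.

```csharp
using System;
using System.Collections.Generic;

public sealed class ValueCollector<T>
{
    private readonly List<T> _values = new List<T>();

    // Callers cannot create a collector on their own.
    internal ValueCollector() { }

    public ValueCollector<T> With(T value)
    {
        _values.Add(value);
        return this;
    }

    internal List<T> Values
    {
        get { return _values; }
    }
}

public static class CallbackFluent
{
    // Runs the caller's setup, then always performs the save step.
    // Returns the number of values that were "saved".
    public static int Save<T>(Action<ValueCollector<T>> setup)
    {
        var collector = new ValueCollector<T>();
        setup(collector);
        return Persist(collector.Values);
    }

    // Stand-in for saving to a database or writing a log.
    private static int Persist<T>(List<T> values)
    {
        return values.Count;
    }
}
```

A call such as CallbackFluent.Save<int>(c => c.With(-1).With(300).With(42)) has no way to forget the final step, at the cost of giving up the plain method chain.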

I can think of a few solutions, none of them ideal.
AIUI what you want is a function which is called when the temporary variable goes out of scope (as in, when it becomes available for garbage collection, but will probably not be garbage collected for some time yet). (See: The difference between a destructor and a finalizer?) This hypothetical function would say "if you've constructed a query in this object but not called save, produce an error". C++/CLI calls this RAII, and in C++/CLI there is a concept of a "destructor" when the object isn't used any more, and a "finaliser" which is called when it's finally garbage collected. Very confusingly, C# has only a so-called destructor, but this is only called by the garbage collector (it would be valid for the framework to call it earlier, as if it were partially cleaning the object immediately, but AFAIK it doesn't do anything like that). So what you would like is a C++/CLI destructor. Unfortunately, AIUI this maps onto the concept of IDisposable, which exposes a dispose() method which can be called when a C++/CLI destructor would be called, or when the C# destructor is called -- but AIUI you still have to call "dispose" manually, which defeats the point?
Refactor the interface slightly to convey the concept more accurately. Call the init function something like "prepareQuery" or "AAA" or "initRememberToCallSaveOrThisWontDoAnything". (The last is an exaggeration, but it might be necessary to make the point).
This is more of a social problem than a technical problem. The interface should make it easy to do the right thing, but programmers do have to know how to use code! Get all the programmers together. Explain this simple fact once and for all. If necessary have them all sign a piece of paper saying they understand, and if they wilfully continue to write code which doesn't do anything they're worse than useless to the company and will be fired.
Fiddle with the way the operators are chained, e.g. have each of the IntermediateClass functions assemble an aggregate IntermediateClass object containing all of the parameters (you mostly do it this way already (?)) but require an init-like function of the original class to take that as an argument, rather than have them chained after it, and then you can have save and the other functions return two different class types (with essentially the same contents), and have init only accept a class of the correct type.
The fact that it's still a problem suggests that either your coworkers need a helpful reminder, or they're rather sub-par, or the interface wasn't very clear (perhaps it's perfectly good, but the author didn't realise it wouldn't be clear if you only used it in passing rather than getting to know it), or you yourself have misunderstood the situation. A technical solution would be good, but you should probably think about why the problem occurred and how to communicate more clearly, probably asking for input from someone senior.

After great deliberation and trial and error, it turns out that throwing an exception from the Finalize() method was not going to work for me. Apparently, you simply can't do that; the exception gets eaten up, because garbage collection operates non-deterministically. I was unable to get the software to call Dispose() automatically from the destructor either. Jack V.'s comment explains this well; here was the link he posted, for redundancy/emphasis:
The difference between a destructor and a finalizer?
Changing the syntax to use a callback was a clever way to make the behavior foolproof, but the agreed-upon syntax was fixed, and I had to work with it. Our company is all about fluent method chains. I was also a fan of the "out parameter" solution to be honest, but again, the bottom line is the method signatures simply could not change.
Helpful information about my particular problem includes the fact that my software is only ever to be run as part of a suite of unit tests - so efficiency is not a problem.
What I ended up doing was use Mono.Cecil to Reflect upon the Calling Assembly (the code calling into my software). Note that System.Reflection was insufficient for my purposes, because it cannot pinpoint method references, but I still needed(?) to use it to get the "calling assembly" itself (Mono.Cecil remains underdocumented, so it's possible I just need to get more familiar with it in order to do away with System.Reflection altogether; that remains to be seen....)
I placed the Mono.Cecil code in the Init() method, and the structure now looks something like:
public IntermediateClass<T> Init<T>()
{
ValidateUsage(Assembly.GetCallingAssembly());
return new IntermediateClass<T>();
}
void ValidateUsage(Assembly assembly)
{
// 1) Use Mono.Cecil to inspect the codebase inside the assembly
var assemblyLocation = assembly.CodeBase.Replace("file:///", "");
var monoCecilAssembly = AssemblyFactory.GetAssembly(assemblyLocation);
// 2) Retrieve the list of Instructions in the calling method
var methods = monoCecilAssembly.Modules...Types...Methods...Instructions
// (It's a little more complicated than that...
// if anybody would like more specific information on how I got this,
// let me know... I just didn't want to clutter up this post)
// 3) Those instructions refer to OpCodes and Operands....
// Defining "invalid method" as a method that calls "Init" but not "Save"
var methodCallingInit = method.Body.Instructions.Any
(instruction => instruction.OpCode.Name.Equals("callvirt")
&& instruction.Operand is IMethodReference
&& instruction.Operand.ToString().Equals(INITMETHODSIGNATURE));
var methodNotCallingSave = !method.Body.Instructions.Any
(instruction => instruction.OpCode.Name.Equals("callvirt")
&& instruction.Operand is IMethodReference
&& instruction.Operand.ToString().Equals(SAVEMETHODSIGNATURE));
var methodInvalid = methodCallingInit && methodNotCallingSave;
// Note: this is partially pseudocode;
// It doesn't 100% faithfully represent either Mono.Cecil's syntax or my own
// There are actually a lot of annoying casts involved, omitted for sanity
// 4) Obviously, if the method is invalid, throw
if (methodInvalid)
{
throw new Exception(String.Format("Bad developer! BAD! {0}", method.Name));
}
}
Trust me, the actual code is even uglier looking than my pseudocode.... :-)
But Mono.Cecil just might be my new favorite toy.
I now have a method that refuses to run its main body unless the calling code "promises" to also call a second method afterwards. It's like a strange kind of code contract. I'm actually thinking about making this generic and reusable. Would any of you have a use for such a thing? Say, if it were an attribute?

What if you made it so Init and With don't return objects of type FluentClass? Have them return, e.g., UninitializedFluentClass, which wraps a FluentClass object. Then calling .Save() on the UninitializedFluentClass object calls it on the wrapped FluentClass object and returns it. If they don't call Save they don't get a FluentClass object.

In Debug mode, besides implementing IDisposable, you can set up a timer that will throw an exception after 1 second if the result method has not been called.

Use an out parameter! All the outs must be used.
Edit: I am not sure if it will help, though...
It would break the fluent syntax.

what does object.start() mean?

Sorry, I am new to C#. I have a program with a class CatchFS. The Main function in that class has this code:
CatchFS fs = new CatchFS(args);
fs.Start();
Can someone tell me what it means? I have heard of thread.start(), but object.start() is new to me. Am I even thinking about this the right way?
Thanks a lot. Yes, it is derived from a class defined in FileSystem.cs. Start does this:
public void Start ()
{
Console.WriteLine("start");
Create ();
if (MultiThreaded) {
mfh_fuse_loop_mt (fusep);
}
else {
mfh_fuse_loop (fusep);
}
}
Now I'm trying to do a FUSE mount. The program starts and then hangs. Some call never returns and I couldn't figure out which one. I tried the debug option of MonoDevelop, but it was no use: it only steps through my Main function, I get "thread started", and that's it!
I think the file FileSystem.cs is from the library Mono.fuse.dll. Thanks for all your time. I have been looking at this for two whole days, and I can't figure out why the code won't proceed. I'm expecting my Azure cloud storage to be mounted at this FUSE mount point. My aim is that after running this code I can do an ls on the mount point to get a list of the contents of the cloud storage. I also suspect the mount point itself. Thanks a lot for all your input.

There is no object.Start method. Start must be a method of the CatchFS class or some base class from which CatchFS derives.
If possible, consult the documentation for the library CatchFS comes from. That should hopefully explain what CatchFS.Start does.
If the documentation is sparse or nonexistent but you do have the source code, you can also simply take a look at the CatchFS.Start method yourself and try to figure out what its intended behavior is.
If there's no documentation and you have no source code, you're dealing with a black box. If you can contact the developer who wrote CatchFS, ask him/her what Start does.
One final option would be to download .NET Reflector and use that to disassemble the compiled assembly from which CatchFS is loaded. Treat this as a last resort, as code revealed by Reflector is typically less readable than the original source.

Start is a method on the CatchFS class (or one of its parent classes) - you'll have to read the documentation or source for that class to find out what it actually means.

According to the MSDN docs for Object, there is no Start method. This must be a method of either CatchFS or one of its base classes.
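As a small illustration of what the answers describe, the sketch below shows a derived class inheriting Start from its base class, plus a reflection call that reports which class actually declares the method. FileSystemBase and CatchFsExample are hypothetical stand-ins, not the real Mono.Fuse types.

```csharp
using System;

// Hypothetical stand-in for the library's base class.
public class FileSystemBase
{
    public bool Started { get; private set; }

    public void Start()
    {
        Started = true;
    }
}

// Declares no Start method of its own; it inherits the one above.
public class CatchFsExample : FileSystemBase
{
}

public static class StartLookup
{
    // Returns the name of the class that declares the Start method
    // that a CatchFsExample instance would run.
    public static string DeclaringTypeOfStart()
    {
        return typeof(CatchFsExample).GetMethod("Start").DeclaringType.Name;
    }
}
```

The same GetMethod(...).DeclaringType lookup can be pointed at the real class when the documentation does not say where Start comes from.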

"Magic" constants in C# like PHP has?

I am building a logging control for a C# project and would like to be able to call it with the name of the current source code File, Line, Class, Function, etc. PHP uses "magic constants" that have all of this info: http://php.net/manual/en/language.constants.predefined.php but I don't see anything like that in the C# compiler language.
Am I looking for something that doesn't exist?

Using the StackTrace/StackFrame classes, you can have your control find out where it's been called from, rather than passing it that information:
private static StringBuilder ListStack(out string sType)
{
StringBuilder sb = new StringBuilder();
sType = "";
StackTrace st = new StackTrace(true);
foreach (StackFrame f in st.GetFrames())
{
MethodBase m = f.GetMethod();
if (f.GetFileName() != null)
{
sb.AppendLine(string.Format("{0}:{1} {2}.{3}",
f.GetFileName(), f.GetFileLineNumber(),
m.DeclaringType.FullName, m.Name));
if (!string.IsNullOrEmpty(m.DeclaringType.Name))
sType = m.DeclaringType.Name;
}
}
return sb;
}
(I used this code to get the call stack of the currently executed method, so it does more than you asked for)

The StackTrace/StackFrame classes will give you quite a bit of this, though they can be quite expensive to construct.

You can ask the system for a stack trace, and you can use reflection. Details are coming.
__LINE__
__FILE__
__DIR__
__FUNCTION__ (does not really exist in C#)
__CLASS__
__METHOD__
__NAMESPACE__
This is a start:
http://www.csharp-examples.net/reflection-callstack/
http://www.csharp-examples.net/reflection-calling-method-name/
Assembly.GetExecutingAssembly().FullName
System.Reflection.MethodBase.GetCurrentMethod().Name
You will get better information in a Debug (non-optimized) build. PHP might always have access to all that stuff, but it ain't the fastest gun on this planet. Play with it and let me know what is missing.

There are methods to get this type of data. It depends on what data you want.
__CLASS__ : If you want the current classname you'll need to use reflection.
__LINE__ : I'm not sure what "The current line number of the file" means, I'll take a guess and say it's how many lines in the file. That can be done by opening the file and doing a line count. This can be done via the File class, the FileInfo class may also work.
__DIR__ : Getting the directory of the file is done by using the DirectoryInfo class.
__FUNCTION__ and __METHOD__ : The function name (method name) can be retrieved via reflection.
__NAMESPACE__ : The namespace can be retrieved via reflection.

Using Type, the best you can really do is get information about the current class. There is no means to get the file (though you should generally stick to one class per file), nor line number, nor function using Type.
Getting a type is simple, for example, this.GetType(), or typeof(MyClass).
You can get the more specific details by generating a StackTrace object and retrieving a StackFrame from it, but doing so repeatedly is a bad idea.
I think a more important question is perhaps: why do you need them? For trace debugging, your output is supposedly temporary, so whether it reflects an accurate line number or not shouldn't matter (in fact, I rarely ever include a line number in trace debugging). Visual Studio is also very useful as a true step debugger. What do you really need File, Class, Function, and Line Number for?
Edit: For error checking, use exceptions like they're meant to be used: for exceptional (wrong) cases. The exception will generate a stack trace pointing you right at the problem.

Many of the previous responders have provided excellent information; however, I just wanted to point out that accessing the StackFrame is exorbitantly expensive and probably shouldn't be done except for special cases. Those cases being an extremely chatty verbose mode for debugging corner cases or error logging and for an error you probably already have an Exception instance which provides the StackTrace. Your best performance will be as Bring S suggested by using Type. Also as another design consideration logging to the console can slow your application down by several orders of magnitude depending on the volume of data to display. So if there is a console sink having the writer operating on a worker thread helps tremendously.
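For the logging case specifically, C# 5 and later also have caller info attributes, which the compiler fills in at the call site without building a StackTrace at run time. The sketch below assumes a hypothetical CallSite helper; only the three attributes from System.Runtime.CompilerServices are part of the framework.

```csharp
using System.IO;
using System.Runtime.CompilerServices;

public static class CallSite
{
    // The optional parameters are filled in by the compiler with the
    // caller's member name, source file path and line number when the
    // caller does not pass them explicitly.
    public static string Describe(
        string message,
        [CallerMemberName] string member = "",
        [CallerFilePath] string file = "",
        [CallerLineNumber] int line = 0)
    {
        return string.Format("{0}:{1} {2}: {3}",
            Path.GetFileName(file), line, member, message);
    }
}
```

A logging control could call CallSite.Describe("something happened") and get the rough equivalent of __FILE__, __LINE__ and __FUNCTION__; class and namespace names would still have to come from reflection.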
