Finding property usage counts with reflection - c#

I have a generated file with around 10,000 public static properties and methods. I suspect that a nontrivial number of them are entirely unused, but there are around 50 assemblies and millions of lines of code to check to be sure.
What I would like to do is run some kind of utility that can look into all of the compiled assemblies we have and tell me exactly which members of this class are being called, and give me counts for each one.
Is such a thing possible with reflection, or do I need to revert to actual code analysis tools? Are there any libraries that can analyze assemblies to find their dependencies within another assembly?

The ReSharper "Find Usages Advanced" feature has an option to find references in Libraries as well as in the current Solution. I haven't used this particular feature, so I'm not sure how well it works (the Find Usages within a solution works quite nicely), but you can get a trial version of ReSharper and try it. My guess is that you'll need to run it from a Solution that has a Project with references to the various Libraries you're interested in.

I don't think this can be done with "regular" reflection, since usages cannot be detected by looking only at the structure of the classes. I guess you'll need to disassemble the IL and analyze it, looking for call, calli, and callvirt instructions (property lookups are also method calls). You can get the IL for a method with typeof(SomeType).GetMethod("Method").GetMethodBody().GetILAsByteArray(), but it might be hard to analyze it when it's in the form of a byte array. You might want to look into Cecil, which might help you analyze the bytecode.
Of course, there might well exist tools for this already.
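For what it's worth, the byte array is workable without extra libraries. Below is a minimal sketch, not a finished tool, that walks the raw IL of every method in a "caller" assembly and counts instructions whose operand is a method token resolving to a member of a given target type. The names CallCounter and CountCalls are made up for illustration. It assumes a little-endian machine, keys the counts by method name (so overloads are merged, and property accessors show up as get_X/set_X), and silently skips tokens that fail to resolve.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Reflection.Emit;

public static class CallCounter
{
    // Opcode lookup tables built from the public fields of OpCodes.
    // Two-byte opcodes all start with the 0xFE prefix byte.
    static readonly OpCode[] OneByte = new OpCode[256];
    static readonly OpCode[] TwoByte = new OpCode[256];

    static CallCounter()
    {
        foreach (FieldInfo f in typeof(OpCodes).GetFields(BindingFlags.Public | BindingFlags.Static))
        {
            if (f.GetValue(null) is OpCode op)
            {
                if (op.Size == 1) OneByte[op.Value & 0xFF] = op;
                else TwoByte[op.Value & 0xFF] = op;
            }
        }
    }

    // Number of operand bytes that follow an opcode; pos is the operand's offset.
    static int OperandSize(OpCode op, byte[] il, int pos)
    {
        switch (op.OperandType)
        {
            case OperandType.InlineNone: return 0;
            case OperandType.ShortInlineBrTarget:
            case OperandType.ShortInlineI:
            case OperandType.ShortInlineVar: return 1;
            case OperandType.InlineVar: return 2;
            case OperandType.InlineI8:
            case OperandType.InlineR: return 8;
            case OperandType.InlineSwitch: return 4 + 4 * BitConverter.ToInt32(il, pos);
            default: return 4; // metadata tokens, int32, float32, long branch targets
        }
    }

    // Counts call/callvirt/newobj/ldftn-style references from 'caller' to members of 'target'.
    public static Dictionary<string, int> CountCalls(Assembly caller, Type target)
    {
        const BindingFlags All = BindingFlags.Public | BindingFlags.NonPublic |
            BindingFlags.Static | BindingFlags.Instance | BindingFlags.DeclaredOnly;
        var counts = new Dictionary<string, int>();

        foreach (Type t in caller.GetTypes())
        foreach (MethodBase m in t.GetMethods(All).Cast<MethodBase>().Concat(t.GetConstructors(All)))
        {
            byte[] il = m.GetMethodBody()?.GetILAsByteArray();
            if (il == null) continue; // abstract, extern, or otherwise body-less

            Type[] typeArgs = t.IsGenericType ? t.GetGenericArguments() : null;
            Type[] methodArgs = m.IsGenericMethod ? m.GetGenericArguments() : null;

            for (int i = 0; i < il.Length; )
            {
                OpCode op = il[i] == 0xFE ? TwoByte[il[++i]] : OneByte[il[i]];
                i++; // i now points at the operand (if any)

                if (op.OperandType == OperandType.InlineMethod)
                {
                    MethodBase callee;
                    try { callee = m.Module.ResolveMethod(BitConverter.ToInt32(il, i), typeArgs, methodArgs); }
                    catch (Exception) { callee = null; } // unresolved reference: skip it

                    if (callee != null && callee.DeclaringType == target)
                    {
                        counts.TryGetValue(callee.Name, out int n);
                        counts[callee.Name] = n + 1;
                    }
                }
                i += OperandSize(op, il, i);
            }
        }
        return counts;
    }
}
```

To cover many assemblies you would call CountCalls once per loaded assembly (for example via Assembly.LoadFrom on each DLL path) and sum the dictionaries; any member of the target type that never appears in the totals is a candidate for removal. Calls made purely through reflection will not be seen by this approach.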

Related

How to determine that an assembly was compiled from an F# project?

What is the difference between C# and F# assemblies? Is there some flag, maybe? I want to determine this using only the reflection API.
There's no single value to check that would tell you what you need, but there's a good amount of circumstantial evidence that you could look at. ILSpy is your friend if you want to explore it.
I would suggest you check for the presence of the following two indicators. Either of them being present would mean you're likely looking at an F# assembly, unless someone is really dedicated to messing things up for you.
FSharpInterfaceDataVersionAttribute on the assembly. This was my initial suggestion, however there are compiler flags that, when set, would prevent this attribute from being emitted: --standalone and --nointerfacedata. I find it highly doubtful either of them would be commonly used in the field, but the fact remains there are openly available ways of opting out from the attribute being emitted right now.
asm.GetCustomAttribute(typeof(FSharpInterfaceDataVersionAttribute))
Presence of StartupCode types. They're an artifact of how the F# compiler compiles certain constructs, and it seems they're present even when empty, so they should be highly reliable.
asm.GetTypes().Where(fun x -> x.FullName.StartsWith("<StartupCode$"))
In particular looking for a reference to FSharp.Core is not a great idea, as it would be commonly referenced from C# projects as well if you're working with mixed solutions (and there's nothing stopping anyone from just getting it off nuget).
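Since the question asks for a reflection-only check, here is a hedged C# sketch that combines the two indicators above. FSharpDetector and LooksLikeFSharp are illustrative names, not an existing API. The attribute is matched by its simple type name so the checking code needs no reference to FSharp.Core, and types that fail to load are ignored rather than treated as evidence either way.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class FSharpDetector
{
    // Returns true if either indicator is present; this is a heuristic, not a proof.
    public static bool LooksLikeFSharp(Assembly asm)
    {
        // Indicator 1: the assembly-level FSharpInterfaceDataVersionAttribute.
        bool hasAttribute = asm.GetCustomAttributesData()
            .Any(a => a.AttributeType.Name == "FSharpInterfaceDataVersionAttribute");

        // Indicator 2: compiler-generated types whose full name starts with "<StartupCode$".
        bool hasStartupCode = GetLoadableTypes(asm)
            .Any(t => t.FullName != null &&
                      t.FullName.StartsWith("<StartupCode$", StringComparison.Ordinal));

        return hasAttribute || hasStartupCode;
    }

    // GetTypes() throws if any type cannot be loaded; fall back to the ones that did load.
    static Type[] GetLoadableTypes(Assembly asm)
    {
        try { return asm.GetTypes(); }
        catch (ReflectionTypeLoadException e) { return e.Types.Where(t => t != null).ToArray(); }
    }
}
```

A call such as FSharpDetector.LooksLikeFSharp(Assembly.LoadFrom(path)) would then give a yes/no guess per file, subject to the compiler-flag caveats mentioned above.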

Partially reference a DLL

I have a library DLL full of sorting algorithms, parsers, validators, converters etc. The DLL is about 40 MB (that is not much, I know, but still). Now I would like to reference just the parsers of that DLL. The point is to get those parsers out without shipping 40 MB to the customer.
Is there a way, every time I make a release build, to take just those up-to-date parsers from my library, store them in some kind of .partialDll file and deliver only them to the customer? The result would be that I keep all my helper classes in one big library which keeps growing, and the customers get just what they ordered.
I guess I would need to deal with a lot of reflection to achieve something like this, right? Any ideas?
Let me start with a quote from MSDN:
"Assemblies are the building blocks of .NET Framework applications; they form the fundamental unit of deployment […]."
Note that the quote is about assemblies, not about DLLs. There's a difference!
Although most .NET assemblies consist of exactly one DLL file, that is not a strict requirement: An assembly can in fact consist of more than one file; such a "multi-file assembly" can, for instance, consist of several DLLs, which in turn are called "netmodules". (A netmodule might have a .netmodule file extension by convention, but it's really a DLL containing .NET metadata and bytecode.) Each multi-file assembly has exactly one "main" module which carries the metadata that references all the other assembly files and so ties them together into a logical whole.
While an assembly has to be deployed in full (as per the above quote), the .NET runtime is able to load only those netmodules that are actually required for JIT code compilation and execution, and leave the rest untouched.
So you can split up an assembly into several parts, and have the runtime load only what is actually needed; but you cannot do the same to a netmodule / DLL file. A DLL file can only be deployed and loaded in its entirety.
Note also that Visual Studio's support for netmodules is non-existent for all practical purposes, so most people don't use them, which is why you see so few multi-file assemblies in the real world.
The bottom line is this: In practice, if you or your clients are interested in only a part of an assembly ("DLL"), then it's usually easier to split a large assembly (that is, one large Visual Studio project) into several inter-dependent assemblies (several smaller Visual Studio projects).
In general, no, there is no way to achieve that. Once you pack "everything" into a module and compile it, you can't split that module later into smaller ones. (well, ok, you can analyze the bytecode and rewrite the assembly, see the end of this post).
To me, your premise seems wrong. You don't need to work with "one huge library that keeps all your helper classes", and really, you don't want to, or at some point you will no longer want to. If you don't feel that way now, I assure you that in time, years maybe, you will hate such a one-to-have-it-all approach.
This is exactly what you want to escape from and this is why .Net and many other languages/environments support concept of "libraries" or "modules" and allow you to use multiple of them, and that's why most of the projects you see everywhere aren't created as "one huge EXE". It's much easier to reuse, analyze and even hunt bugs when you have it in smaller chunks.
--
However, if you insist, there are (ugly) ways to achieve something like what you have in mind. I assume that the "huge DLL" is in C# and is controlled by you.
The first, somewhat naive but working way is to use "file links". In Visual Studio you can have a project that contains tons of files and produces a BigDLL "all.dll", and right beside it you can create another project that will not contain any files at all, but that will contain links to the first project's files. Use the typical "Add a file.." option on a project and note that near the final "Add" button there's a down arrow that expands to "Add as link..".
This will cause the file to stay in HugeProject, but the SmallProject will see the file too and when SmallProject is compiled, it will pull the code from that file too.
Note that this way you will actually build two separate assemblies, a big one and a small one, and your final product will need to reference the small one.
This way is naive and ugly. It is just as if you had manually copied/split the huge project into smaller ones, with the tiny advantage that you don't need to copy the code files around.
--
intermission for side-thoughts:
you can use #if to conditionally turn off some currently-unused code, however setting the flags that drive those IFs will be cumbersome
you can edit .csproj files and use MSBuild conditional clauses to automatically exclude unused code files from your HugeProject during final builds, however setting the flags that drive those IFs will be cumbersome too
--
The second way is to keep everything in the HugeProject, and to have your application(s) reference it directly, and then after building and testing everything, just before packing that and sending to customer - use some kind of trimming utility that will check what parts of code are referenced and that will remove all dead code from the assemblies. I can't give you any name for such utility, but many obfuscators come with such feature.
They will run through your compiled code, cross-reference everything, change/remove/trash class/method/property names, and, as a bonus, they may also remove unused bits. Then they'll write the mangled assemblies back to disk, ensuring that they reference each other and not the original ones from before mangling.
example: See a question related to that
example: See an example of such utility; also consider ILMerge for better results.
Cons: the utility may leave some trash behind when it cannot decide whether something is used or not; finding/testing/buying it may take some time and resources; you can have signing problems, since the stripped assembly will be a brand new assembly; etc. Also, such utilities have problems if you invoke some code only by reflection, and they may require you to provide extra hints or to make sure the code "seems to be used" (example: a whole namespace of "plugins" that implement "IPlugin", where your app searches that namespace for types and uses Activator.CreateInstance to instantiate them; there are no hard-linked usages, so the trimmer may decide to remove all plugins as "unused"; you'll need to configure the trimmer carefully or be surprised).
Probably a few other ways could be found too, but seriously, most of the time you don't want to waste your time on that, especially manually. So just tidy up your code and split it into small libs, or start looking for an automatic obfuscator & trimmer.
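To make the reflection caveat concrete, here is a small sketch of the plugin pattern mentioned in the cons. IPlugin, HelloPlugin and PluginLoader are hypothetical names, and the sketch scans a whole assembly rather than one namespace. Note that nothing in the code refers to HelloPlugin by name, which is exactly why a dead-code trimmer has no static evidence that the class is used.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public interface IPlugin
{
    string Name { get; }
}

// Never referenced directly anywhere: only discovered through reflection.
public sealed class HelloPlugin : IPlugin
{
    public string Name => "hello";
}

public static class PluginLoader
{
    // Finds every concrete IPlugin with a public parameterless constructor
    // in the given assembly and instantiates it.
    public static List<IPlugin> LoadAll(Assembly asm)
    {
        return asm.GetTypes()
            .Where(t => t.IsClass && !t.IsAbstract
                        && typeof(IPlugin).IsAssignableFrom(t)
                        && t.GetConstructor(Type.EmptyTypes) != null)
            .Select(t => (IPlugin)Activator.CreateInstance(t))
            .ToList();
    }
}
```

If a trimmer strips HelloPlugin, PluginLoader.LoadAll still runs without error and simply returns fewer plugins, so the breakage tends to show up as missing functionality rather than as a crash.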

Does an ILSplit.exe exist, equivalent to ILMerge.exe, or how could this be made?

Does a utility for splitting a single .NET assembly into a subset of the full assembly exist? I.e. the "functional inverse" of ILMerge.exe?
This tool, of course, would be difficult to produce if it had to track dependencies etc. between classes, functions and such.
However, what I am looking is for a case where I have a very big (hundreds of MB) mixed mode assembly with mostly static classes and static methods, basically just a function library. Although, with some DLLMain initialization and similar.
What I would like is to be able to specify a list of static methods upon specific static classes that I want to keep in the subset assembly. Technically, this should be possible as an assembly is just binary information with a standardized format.
So does this exist or how could this be made? Or why would this be impractical?
No, the odds are very high that this tool doesn't exist, although the nonexistence of a tool can never be proven conclusively.
These IL rewriting tricks don't work on mixed-mode assemblies anyway; ILMerge doesn't support them either. Such assemblies don't just contain IL; they also have machine code and a relocation table. There is no simple way to pick machine code apart, mostly because it isn't pure code but also contains data, such as jump tables for a switch statement. That is also the reason programmers who write code in a native language don't bother with obfuscators: decompiling machine code is a major time sink and always imperfect.
So, for one, it is likely that your assembly is large because it has a lot of native code. You will need to tackle this at the project level and split up this mongo project into smaller ones, distributing the source code between them. That's work, and not always easy: linker errors are a common scourge when you do this, and you're likely to have to change code declarations so they can be exported. There is only one way to do this: start at the beginning and split off a sub-section first, so that you have a not-so-big assembly and a small one. Rinse and repeat. Do beware the cost: many assemblies make the cold start of a program slower.

What is the best method to find a class/property in C# through reflection after obfuscation has been done?

Here's an example of the code which will be used for the reflection:
var i = typeof(Program).Assembly.CreateInstance("test.Program");
After the software is obfuscated, the code will obviously stop working.
I'm trying to find a way around it by searching for properties of a class, which do not change after obfuscation has been done. I've tried that with type.GUID, but when I run the debug version, I get one GUID, and in the release after the obfuscation is completed, the guid is changed.
I'm using Eazfuscator.NET for obfuscation.
I would like to avoid using attributes to mark class/method if possible.
Any ideas on what would work?
I'm sure there are ways to iterate over all types and find the one you're looking for, but the things that come to mind would all produce the least maintainable code ever.
Some obfuscators (we use DeepSea, I don't know Eazfuscator) allow preventing obfuscation of specific classes, allowing reflection on those. In DeepSea's case, this is indicated by attributes but those won't/shouldn't (I never checked :o) make it to the final assembly.
If you regard reflection as "an outside process looking at your assembly" and obfuscating "preventing outside processes from looking at your assembly" you're really stopping yourself from doing what you want to do.
I don't want the obfuscator to defeat the attackers, just to make the job of understanding the code more difficult. And I want this as a part of advanced piracy protection.
After obfuscation; zip, encrypt and do whatever you want with your assembly. Then create another wrapper project and add your assembly as a resource into that project. Attach to AppDomain.CurrentDomain.AssemblyResolve event (in your new project) and whenever an unresolved assembly event occurs, read your resource(decrypt,unzip etc.) and return the actual assembly.
You may also try to obfuscate your final wrapper application.
How secure? At least you can make life harder for attackers.
I don't have an exact answer, but ILSpy's source might help you.
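A rough sketch of the wrapper idea described above, under stated assumptions: the resource naming scheme "Wrapper.Payload.<name>.dll" and the Decode placeholder are invented for illustration, and the real decrypt/unzip step is left out entirely.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class EmbeddedAssemblyLoader
{
    // Call once at startup, before any type from the hidden assembly is touched.
    public static void Register()
    {
        AppDomain.CurrentDomain.AssemblyResolve += Resolve;
    }

    // Invoked by the runtime when an assembly cannot be found by normal probing.
    public static Assembly Resolve(object sender, ResolveEventArgs args)
    {
        string shortName = new AssemblyName(args.Name).Name;

        // Hypothetical naming convention for the embedded payload.
        string resourceName = "Wrapper.Payload." + shortName + ".dll";

        using (Stream s = Assembly.GetExecutingAssembly().GetManifestResourceStream(resourceName))
        {
            if (s == null) return null; // not ours: let other handlers or the runtime deal with it

            using (var buffer = new MemoryStream())
            {
                s.CopyTo(buffer);
                return Assembly.Load(Decode(buffer.ToArray()));
            }
        }
    }

    // Placeholder: a real wrapper would decrypt and/or decompress here.
    static byte[] Decode(byte[] raw)
    {
        return raw;
    }
}
```

Returning null for names the handler does not recognize matters, because the event also fires for satellite resource assemblies and other lookups that have nothing to do with the embedded payload.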

How do you programmatically identify the number of references to a method with C#

I've recently inherited a C# console application that is in need of some pruning and clean up. Long story short, the app consists of a single class containing over 110,000 lines of code. Yup, over 110,000 lines in a single class. And, of course, the app is core to our business, running 'round the clock updating data used on a dynamic website. Although I'm told my predecessor was "a really good programmer", it's obvious he was not at all into OOP (or version control).
Anyway... while familiarizing myself with the code I've found plenty of methods that are declared, but never referenced. It looks as if copy/paste was used to version the code. For example, say I have a method called getSomethingImportant(); chances are there is another method called getSomethingImportant_July2007() (the pattern is functionName_[datestamp] in most cases). It looks like when the programmer was asked to make a change to getSomethingImportant() he would copy/paste it, rename the copy to getSomethingImportant_Date, make changes to getSomethingImportant_Date, then change any method calls in the code to the new method name, leaving the old method in the code but never referenced.
I'd like to write a simple console app that crawls through the one huge class and returns a list of all methods with the number of times each method was referenced. By my estimates there are well over 1000 methods, so doing this by hand would take a while.
Are there classes within the .NET framework that I can use to examine this code? Or any other useful tools that may help identify methods that are declared but never referenced?
(Side question: Has anyone else ever seen a C# app like this, one reeeealy big class? It's more or less one huge procedural process, I know this is the first I've seen, at least of this size.)
You could try to use NDepend if you just need to extract some stats about your class. Note that this tool relies on Mono.Cecil internally to inspect assemblies.
To complete Romain Verdier's answer, let's dig a bit into what NDepend can bring you here. (Disclaimer: I am a developer on the NDepend team.)
NDepend lets you query your .NET code with LINQ queries. Finding out which methods each method calls, and which methods call it, is as simple as writing the following LINQ query:
from m in Application.Methods
select new { m, m.MethodsCalled, m.MethodsCallingMe }
The result of this query is presented in a way that makes it easy to browse callers and callees (and it's 100% integrated into Visual Studio).
There are many other NDepend capabilities that can help you. For example you can right click a method in Visual Studio > NDepend > Select methods... > that are using me (directly or indirectly) ...
The following code query is generated...
from m in Methods
let depth0 = m.DepthOfIsUsing("NUnit.Framework.Constraints.ConstraintExpression.Property(String)")
where depth0 >= 0 orderby depth0
select new { m, depth0 }
... which matches direct and indirect callers, with the depth of calls (1 means direct caller, 2 means caller of direct callers and so on).
And then by clicking the button Export to Graph, you get a call graph of your pivot method (of course it could be the other way around, i.e method called directly or indirectly by a particular pivot method).
Download the free trial of ReSharper. Use ReSharper->Search->Find Usages in File (Ctrl-Shift-F7) to have all usages highlighted. Also, a count will appear in the status bar. If you want to search across multiple files, you can do that too using Ctrl-Alt-F7.
If you don't like that, do text search for the function name in Visual Studio (Ctrl-Shift-F), this should tell you how many occurrences were found in the solution, and where they are.
I don't think you want to write this yourself. Just buy NDepend and use its Code Query Language.
There is no easy tool to do that in the .NET framework itself. However, I don't think you really need a list of all unused methods at once. As I see it, you'll just go through the code, and for each method you'll check whether it's unused and delete it if so. I'd use the Visual Studio "Find References" command to do that. Alternatively, you can use ReSharper with its "Analyze" window. Or you can just use the Visual Studio code analysis tool to find all unused private methods.
FXCop has a rule that will identify unused private methods. So you could mark all the methods private and have it generate a list.
FXCop also has a language if you wanted to get fancier
http://www.binarycoder.net/fxcop/
If you don't want to shell out for NDepend, since it sounds like there is just a single class in a single assembly - comment out the methods and compile. If it compiles, delete them - you aren't going to have any inheritance issues, virtual methods or anything like that. I know it sounds primitive, but sometimes refactoring is just grunt work like this. This is kind of assuming you have unit tests you run after each build until you've got the code cleaned up (Red/Green/Refactor).
The Analyzer window in Reflector can show you where a method is called (Used By).
Sounds like it would take a very long time to get the information that way though.
You might look at the API that Reflector provides for writing add-ins and see if you can get the grunt work of the analysis that way. I would expect that the source code for the code metrics add-in could tell you a bit about how to get information about methods from the reflector API.
Edit: Also the code model viewer add-in for Reflector could help too. It's a good way to explore the Reflector API.
I don't know of anything that's built to handle this specific case, but you could use Mono.Cecil. Reflect the assemblies and then count references in the IL. Shouldn't be too tough.
Try having the compiler emit assembler files, as in x86 instructions, not .NET assemblies.
Why? Because it's much easier to parse assembler code than it is C# code or .NET assemblies.
For instance, a function/method declaration looks something like this:
.string "w+"
.text
.type create_secure_tmpfile, #function
create_secure_tmpfile:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl $-1, -8(%ebp)
subl $4, %esp
and function/method references will look something like this:
subl $12, %esp
pushl 24(%ebp)
call create_secure_tmpfile
addl $16, %esp
movl 20(%ebp), %edx
movl %eax, (%edx)
When you see "create_secure_tmpfile:" you know you have a function/method declaration, and when you see "call create_secure_tmpfile" you know you have a function/method reference. This may be good enough for your purposes, but if not it's just a few more steps before you can generate a very cute call-tree for your entire application.
