How to determine whether an assembly was compiled from an F# project? - c#

What is the difference between C# and F# assemblies? Is there some flag, maybe? I want to determine it using the reflection API only.

There's no single value to check that would tell you what you need, but there's a good amount of circumstantial evidence that you could look at - ILSpy is your friend if you want to explore it.
I would suggest you check for the presence of these two indicators; either of them being present would mean you're likely looking at an F# assembly, unless someone is really dedicated to messing things up for you.
FSharpInterfaceDataVersionAttribute on the assembly. This was my initial suggestion, however there are compiler flags that, when set, would prevent this attribute from being emitted: --standalone and --nointerfacedata. I find it highly doubtful either of them would be commonly used in the field, but the fact remains there are openly available ways of opting out from the attribute being emitted right now.
asm.GetCustomAttribute(typeof(FSharpInterfaceDataVersionAttribute))
Presence of StartupCode types. They're an artifact of how the F# compiler compiles certain constructs, and it seems they're present even when empty, so they should be highly reliable.
asm.GetTypes().Where(x => x.FullName.StartsWith("<StartupCode$"))
In particular, looking for a reference to FSharp.Core is not a great idea, as it would commonly be referenced from C# projects as well if you're working with mixed solutions (and there's nothing stopping anyone from just getting it off NuGet).
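The two checks above can be combined into one helper. This is a minimal sketch, not a definitive implementation: the class and method names are made up for illustration, and the attribute is matched by its full name (Microsoft.FSharp.Core.FSharpInterfaceDataVersionAttribute) so that the calling project does not need a compile-time reference to FSharp.Core. It assumes the inspected assembly and its dependencies can be loaded.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical helper; the names are illustrative only.
static class FSharpDetector
{
    const string InterfaceDataAttribute =
        "Microsoft.FSharp.Core.FSharpInterfaceDataVersionAttribute";

    public static bool LooksLikeFSharp(Assembly asm)
    {
        // Indicator 1: the assembly-level attribute, compared by name.
        bool hasAttribute = asm.GetCustomAttributesData()
            .Any(a => a.AttributeType.FullName == InterfaceDataAttribute);

        // Indicator 2: compiler-generated <StartupCode$...> types.
        Type[] types;
        try
        {
            types = asm.GetTypes();
        }
        catch (ReflectionTypeLoadException ex)
        {
            // Keep whatever types could be loaded.
            types = ex.Types.Where(t => t != null).ToArray();
        }

        bool hasStartupCode = types.Any(t =>
            t.FullName != null &&
            t.FullName.StartsWith("<StartupCode$", StringComparison.Ordinal));

        return hasAttribute || hasStartupCode;
    }
}
```

A call such as FSharpDetector.LooksLikeFSharp(Assembly.LoadFrom(path)) would then give a "probably F#" answer, with the caveats already mentioned about the compiler flags.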

Related

Is there a way to find what Types are referenced by a c# assembly?

The Assembly class has a GetReferencedAssemblies method that returns the referenced assemblies. Is there a way to find what Types are referenced?
The CLR won't be able to tell you at runtime. You would have to do some serious static analysis of the source files - similar to the static analysis done by ReSharper or Visual Studio.
Static analysis is a fairly major undertaking. You basically need a C# parser, a symbol table and plenty of time to work through all the cases that come up in abstract syntax trees.
Why can't the CLR tell you at run time? It is just-in-time compiled, which means that CLR bytecode is converted into machine code just before execution. Reflection only tells you stuff known statically at runtime about your types, and the CLR would only know if a type is referenced when the code is run. The CLR only knows when a type is loaded at execution time - at the point of just-in-time compilation.
Use System.Reflection.Assembly.GetTypes().
Types are not referenced separately from assemblies. If an assembly references another assembly, it automatically references (at least in the technical context) all the types within that assembly, as well. In order to get all the types defined (not referenced) in an assembly, you can use the Assembly.GetTypes method.
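To make the distinction concrete, here is a small sketch that lists what reflection does expose: the assemblies an assembly references, and the types it defines. The class and method names are illustrative, not part of any library.

```csharp
using System;
using System.Reflection;

// Hypothetical helper showing the two reflection calls discussed above.
static class AssemblyInspector
{
    public static void Describe(Assembly asm)
    {
        // Referenced assemblies: names only, no per-type information.
        foreach (AssemblyName reference in asm.GetReferencedAssemblies())
        {
            Console.WriteLine("references assembly: " + reference.FullName);
        }

        // Types defined in this assembly (not the types it uses).
        foreach (Type type in asm.GetTypes())
        {
            Console.WriteLine("defines type: " + type.FullName);
        }
    }
}
```

For example, AssemblyInspector.Describe(typeof(string).Assembly) prints the core library's references followed by its own types.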
It may be possible, but it sounds like a rather arduous task, to scan an assembly for which types it actually references (i.e. which types it actually invokes or otherwise mentions). This will probably involve working with IL. Something like this is best avoided.
Edit: Actually, when I think about it, this is not possible at all. Whatsoever. On a quite basic level. The thing is, types can be instantiated and referenced willy-nilly. It's not even uncommon for this to happen. Not to mention late binding. All this means trying to analyze an assembly for all the types it references is something like predicting the future.
Edit 2: Comments
While the question, as stated, isn't possible due to all sorts of dynamic references, it is possible to greatly shrink all sorts of binary files using difference encoding. This basically allows you to get a file containing the differences between two binary files, which in the case of executables/libraries tends to be vastly smaller than either of the actual files. Here are some applications that perform this operation. Note that bsdiff doesn't run on Windows, but there is a link to a port there, and you can find many more ports (including to .NET) with the aid of Google.
XDelta
bsdiff
If you look, you'll find many more such applications. One of the best parts is that they are totally self-contained and involve very little work on your part.

Is there any C# decompiler that can show the coding almost identically to how it was written?

I've been using Reflector to decompile a couple of simple C# apps, but I notice that although the code is being decompiled, I still can't see things as they were written in VS. I think this is the way it is because the compiler replaces human instructions with machine code. However, I thought I would give it a try and ask here. Maybe there is a decompiler that can decompile and show the code almost identically to the original.
That is impossible, since there are lots of ways to get the same IL from different code. For example, there is no way to know if an extension method was called fluent-style vs explicit on the declaring type. There is no way to know if LINQ vs regular code was used. All manner of implicit operations may or may not be there. Removed code may or may not have been there. Many primitives (including enums) up-to-and-including 4 bytes are indistinguishable once they are IL.
If you want the actual code, legally obtain the original code.
Existing .NET decompilers generally decompile to the best of their ability.
You appear to be asking for variable names and line formatting, which for obvious reasons are not compiled to IL.
There are several. I currently use JustDecompile found here http://www.telerik.com/products/decompiler.aspx?utm_source=twitter&utm_medium=sm&utm_campaign=ad
[Edit]
An alternative is .NET Reflector found here: http://www.reflector.net/
I believe there is a free version of it, but didn't take time to look.
Basically, no. There are often many ways to arrive at the same IL code, and there's no way at all for a decompiler to know which was used.
No, nor should there ever be. Things like comments and unreachable code would just add bloat with absolutely zero benefit. The very best you can ever do is approximate the compiled code.

What is the best method to find a class/property in c# through reflection after obfuscation has been done?

Here's an example of the code which will be used for the reflection:
var i = typeof(Program).Assembly.CreateInstance("test.Program");
After the software is obfuscated, the code will obviously stop working.
I'm trying to find a way around it by searching for properties of a class, which do not change after obfuscation has been done. I've tried that with type.GUID, but when I run the debug version, I get one GUID, and in the release after the obfuscation is completed, the guid is changed.
I'm using Eazfuscator.NET for obfuscation.
I would like to avoid using attributes to mark class/method if possible.
Any ideas on what would work?
I'm sure there are ways to iterate over all types and find the one you're looking for, but the things that come to mind would all produce the least maintainable code ever.
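When the target type is known at compile time, as in the question's typeof(Program) example, one alternative to searching is to avoid the hard-coded name string altogether. The sketch below assumes the obfuscator renames the type consistently across the assembly (which renaming obfuscators generally must do to keep the program working); typeof compiles to a metadata token, so it follows the renamed type, and FullName is read at run time.

```csharp
using System;

namespace test
{
    class Program
    {
        static void Main()
        {
            // typeof(Program) is compiled to a metadata token, not a string,
            // so it still points at this type after it has been renamed.
            Type type = typeof(Program);

            // Same call as in the question, but the name is read at run time
            // instead of being hard-coded as "test.Program".
            object viaName = type.Assembly.CreateInstance(type.FullName);

            // Equivalent, without going through the name at all.
            object direct = Activator.CreateInstance(type);

            Console.WriteLine(viaName.GetType() == direct.GetType());
        }
    }
}
```

This does not help when the type name really does come from outside the assembly (configuration files, plugins); in that case excluding the type from renaming, as described below, is the usual route.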
Some obfuscators (we use DeepSea, I don't know Eazfuscator) allow preventing obfuscation of specific classes, allowing reflection on those. In DeepSea's case, this is indicated by attributes but those won't/shouldn't (I never checked :o) make it to the final assembly.
If you regard reflection as "an outside process looking at your assembly" and obfuscating "preventing outside processes from looking at your assembly" you're really stopping yourself from doing what you want to do.
I don't want the obfuscator to defeat the attackers, just to make the job of understanding the code more difficult. And I want this as a part of advanced piracy protection.
After obfuscation, zip, encrypt and do whatever you want with your assembly. Then create another wrapper project and add your assembly as a resource to that project. Attach to the AppDomain.CurrentDomain.AssemblyResolve event (in your new project) and, whenever an unresolved assembly event occurs, read your resource (decrypt, unzip, etc.) and return the actual assembly.
You may also try to obfuscate your final wrapper application.
How secure? At least you can make life harder for attackers.
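The wrapper approach described above might look roughly like this. It is a sketch under stated assumptions: the resource naming scheme ("Wrapper.Payload.<name>.dll") and the Decode step are hypothetical placeholders for whatever packing you actually choose.

```csharp
using System;
using System.IO;
using System.Reflection;

// Hypothetical loader for assemblies embedded as resources in the wrapper.
static class EmbeddedAssemblyLoader
{
    // Assumed resource naming scheme: "Wrapper.Payload.<SimpleName>.dll".
    const string ResourcePrefix = "Wrapper.Payload.";

    public static void Install()
    {
        AppDomain.CurrentDomain.AssemblyResolve += Resolve;
    }

    static Assembly Resolve(object sender, ResolveEventArgs args)
    {
        // args.Name is a full display name; keep only the simple name.
        string simpleName = new AssemblyName(args.Name).Name;
        string resourceName = ResourcePrefix + simpleName + ".dll";

        Assembly wrapper = typeof(EmbeddedAssemblyLoader).Assembly;
        using (Stream stream = wrapper.GetManifestResourceStream(resourceName))
        {
            if (stream == null)
            {
                return null; // not ours; let resolution continue or fail
            }

            using (var buffer = new MemoryStream())
            {
                stream.CopyTo(buffer);
                byte[] raw = Decode(buffer.ToArray());
                return Assembly.Load(raw);
            }
        }
    }

    // Placeholder: decrypt and/or unzip here. Returns the input unchanged.
    static byte[] Decode(byte[] data)
    {
        return data;
    }
}
```

Install() has to run before any code that touches types from the embedded assembly, for instance as the first statement of the wrapper's Main.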
I don't have an exact answer, but ILSpy's source might help you.

Finding property usage counts with reflection

I have a generated file with around 10,000 public static properties and methods. I suspect that a nontrivial number of them are entirely unused, but there are around 50 assemblies and millions of lines of code to check to be sure.
What I would like to do is run some kind of utility that can look into all of the compiled assemblies we have and tell me exactly which members of this class are being called, and give me counts for each one.
Is such a thing possible with reflection, or do I need to revert to actual code analysis tools? Are there any libraries that can analyze assemblies to find their dependencies within another assembly?
The ReSharper "Find Usages Advanced" feature has an option to find references in Libraries as well as in the current Solution. I haven't used this particular feature, so I'm not sure how well it works (the Find Usages within a solution works quite nicely), but you can get a trial version of ReSharper and try it. My guess is that you'll need to run it from a Solution that has a Project with references to the various Libraries you're interested in.
I don't think this can be done with "regular" reflection, since usages cannot be detected by looking only at the structure of the classes. I guess you'll need to disassemble the IL and analyze it, looking for call, calli, and callvirt instructions (property lookups are also method calls). You can get the IL for a method with typeof(SomeType).GetMethod("Method").GetMethodBody().GetILAsByteArray(), but it might be hard to analyze it when it's in the form of a byte array. You might want to look into Cecil, which might help you analyze the bytecode.
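A rough sketch of that idea using Cecil follows. It assumes the Mono.Cecil NuGet package (version 0.10 or later, where AssemblyDefinition is disposable); the class name, method name and parameters are illustrative. It counts call and callvirt instructions whose target is declared on a given type, so property accesses show up under their accessor names (get_X, set_X), and direct field accesses are not counted.

```csharp
using System;
using System.Collections.Generic;
using Mono.Cecil;
using Mono.Cecil.Cil;

// Hypothetical usage counter built on Mono.Cecil.
static class UsageCounter
{
    public static Dictionary<string, int> CountCalls(
        string assemblyPath, string targetTypeFullName)
    {
        var counts = new Dictionary<string, int>();

        using (AssemblyDefinition asm = AssemblyDefinition.ReadAssembly(assemblyPath))
        {
            // GetTypes() walks nested types as well as top-level ones.
            foreach (TypeDefinition type in asm.MainModule.GetTypes())
            {
                foreach (MethodDefinition method in type.Methods)
                {
                    if (!method.HasBody)
                    {
                        continue; // abstract or extern methods have no IL
                    }

                    foreach (Instruction ins in method.Body.Instructions)
                    {
                        Code code = ins.OpCode.Code;
                        if (code != Code.Call && code != Code.Callvirt)
                        {
                            continue;
                        }

                        var callee = ins.Operand as MethodReference;
                        if (callee == null ||
                            callee.DeclaringType.FullName != targetTypeFullName)
                        {
                            continue;
                        }

                        int seen;
                        counts.TryGetValue(callee.Name, out seen);
                        counts[callee.Name] = seen + 1;
                    }
                }
            }
        }

        return counts;
    }
}
```

Running this over each of the 50 assemblies and summing the dictionaries would give per-member counts; members of the generated class that never appear as a key are candidates for removal, subject to reflection-based or dynamic callers that static IL scanning cannot see.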
Of course, there might well exist tools for this already.

Do method names get compiled into the EXE?

Do class, method and variable names get included in the MSIL after compiling a Windows App project into an EXE?
For obfuscation - fewer names, harder to reverse engineer.
And for performance - shorter names, faster access.
e.g. So if methods ARE called via name:
Keep names short, better performance for named-lookup.
Keep names cryptic, harder to decompile.
Yes, they're in the IL - fire up Reflector and you'll see them. If they didn't end up in the IL, you couldn't build against them as libraries. (And yes, you can reference .exe files as if they were class libraries.)
However, this is all resolved once in JIT.
Keep names readable so that you'll be able to maintain the code in the future. The performance issue is unlikely to make any measurable difference, and if you want to obfuscate your code, don't do it at the source code level (where you're the one to read the code) - do it with a purpose-built obfuscator.
EDIT: As for what's included - why not just launch Reflector or ildasm and find out? From memory, you lose local variable names (which are in the pdb file if you build it) but that's about it. Private method names and private variable names are still there.
Yes, they do. I do not think that there will be notable performance gain by using shorter names. There is no way that gain overcomes the loss of readability.
Local variables are not included in MSIL. Fields, methods, classes etc are.
Variables are index based.
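This is easy to confirm with reflection alone. In the sketch below (the Sample class is made up for the demonstration), private member names are readable from metadata, while LocalVariableInfo exposes only an index and a type for each local, with no name.

```csharp
using System;
using System.Reflection;

// Illustrative type with private members and a local variable.
class Sample
{
    private int hiddenField;

    private int HiddenMethod(int input)
    {
        int localValue = input + hiddenField;
        return localValue * 2;
    }
}

static class NameDump
{
    static void Main()
    {
        const BindingFlags all =
            BindingFlags.Instance | BindingFlags.Static |
            BindingFlags.Public | BindingFlags.NonPublic |
            BindingFlags.DeclaredOnly;

        // Private field and method names are present in metadata.
        foreach (MemberInfo member in typeof(Sample).GetMembers(all))
        {
            Console.WriteLine(member.MemberType + " " + member.Name);
        }

        // Locals carry an index and a type, but no name.
        MethodInfo method = typeof(Sample).GetMethod(
            "HiddenMethod", BindingFlags.Instance | BindingFlags.NonPublic);
        foreach (LocalVariableInfo local in method.GetMethodBody().LocalVariables)
        {
            Console.WriteLine("local #" + local.LocalIndex + " : " + local.LocalType);
        }
    }
}
```

Depending on compiler optimizations, the local may be eliminated entirely in a release build, in which case the second loop prints nothing.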
Member names do get included in the IL whether they are private or public. In fact all of your code gets included too, and if you'd use Reflector, you can practically read all the source code of the application. What's left is debugging the app, and I think there might be tools for that.
You must ABSOLUTELY (and I can't emphasize it more) obfuscate your code if you're making packaged applications that have a number of clients and competition. Luckily there are a number of obfuscators available.
This is a major gripe that I have with .NET. Since MS is doing so much hard work on this, why not develop (or acquire) a professional obfuscator and make that a part of VS? Dotfuscator just doesn't cut it, at least not the Community version.
Keep names short, better performance for named-lookup.
How could this make any difference? I'm not sure how identifiers are looked up by the VM, but I'm pretty sure it's not doing a straight string comparison lookup. This would be the worst possible way to do it.
Keep names cryptic, harder to decompile.
To be honest, I don't think code obfuscation helps that much. Most competent developers out there have already developed a "sixth sense" to figure out things quickly even if identifiers like method names are totally unhelpful since very often the source code they need to maintain or improve already has these problems (I am talking about method names like "DoAllStuff()").
Anyway, security through obscurity is usually a bad idea.
If you are concerned about obfuscation check out .NET Reactor. I tested 8 different obfuscators and Reactor was not only the cheapest commercial one, it was the second best of the bunch (the best was the most expensive one, Dotfuscator Gold).
[EDIT]
Actually now that I think of it, if all you care about is obfuscating method names then the one that comes with VS.NET, Dotfuscator Community Edition, should work fine.
I think they're added, but the length of the name isn't going to affect anything, because of the way the function names are looked up. As for obfuscation, I think there are tools (Dotfuscator or something like that) that basically do exactly what you're saying.
