Viewing MSIL as expression tree - c#

I'm currently building a compiler for my language into MSIL, and use Reflector to inspect the IL.
Is there a way to visualise the IL as an Expression Tree that could be used to generate the IL instead?

You could use FxCop for this, with a custom rule that writes its output to a text file or similar.
Note: FxCop works on compiled managed code (DLLs/EXEs); I'm not sure about starting from IL. I suggested this answer because you say you're using Reflector to get the IL, which implies you're starting from compiled managed code.
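To make the idea of "IL as an expression tree" concrete, here is a minimal sketch in Python (chosen only for brevity; it is not FxCop, Reflector, or any .NET API). It assumes the IL has already been dumped as a list of mnemonic strings, and it handles only straight-line arithmetic: loads become leaves, and each binary opcode pops two subtrees off a simulated evaluation stack and pushes a combined node. The names build_tree and render are hypothetical.

```python
# Toy reconstruction of an expression tree from stack-based IL mnemonics.
# Assumption: straight-line code with only loads, binary arithmetic and ret.

BINARY_OPS = {"add": "+", "sub": "-", "mul": "*", "div": "/"}


def build_tree(instructions):
    """Simulate the evaluation stack, pushing subtrees instead of values."""
    stack = []
    for ins in instructions:
        if ins == "ret":
            break
        if ins in BINARY_OPS:
            if len(stack) < 2:
                raise ValueError(f"stack underflow at {ins!r}")
            right = stack.pop()  # the operand pushed last is the right-hand side
            left = stack.pop()
            stack.append((ins, left, right))
        elif ins.startswith(("ldarg", "ldloc", "ldc")):
            stack.append(ins)  # loads become leaf nodes
        else:
            raise ValueError(f"instruction {ins!r} is not handled by this sketch")
    if len(stack) != 1:
        raise ValueError("expected exactly one value left on the stack")
    return stack[0]


def render(node):
    """Pretty-print a tree as a fully parenthesised infix expression."""
    if isinstance(node, str):
        return node
    op, left, right = node
    return f"({render(left)} {BINARY_OPS[op]} {render(right)})"


tree = build_tree(["ldarg.0", "ldarg.1", "add", "ldarg.0", "mul", "ret"])
print(render(tree))
```

Branches, calls, and local stores would need control-flow analysis on top of this, which is the hard part that real decompilers deal with.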

Related

IL code vs IL assembly: is there a difference?

If I run a .NET compiler, it produces intermediate language (IL) code and puts it into a file, for instance an .exe.
If I then use a tool like ildasm, it shows me the IL code again.
However, if I write IL code directly into a file, I can use ilasm to produce an .exe file.
What does it contain? IL code again? Is IL code different to IL assembly code?
Is there a difference between IL code and IL assembly?
Yes, there is a big difference between them:
IL, also known as Microsoft Intermediate Language (MSIL) or Common Intermediate Language (CIL), is very similar to the bytecode generated by a Java compiler, and is what I think you are referring to as "IL code" in your question.
ILAsm is the textual assembly language for IL: it plays the same role for IL that a native assembly language plays for machine code. You can write ILAsm in any text editor, such as Notepad, and then compile it with the command-line compiler (ILAsm.exe) provided with the .NET Framework.
I think IL assembly can be considered a fully fledged .NET language (albeit an intermediate one), so when you compile ILAsm with ILAsm.exe you are producing IL in much the same way (with fewer steps) as the C# compiler does from C# code.
As someone stated in the comments, IL assembly is basically a human-readable version of the .NET bytecode.
A .NET assembly does not contain MSIL as text; it contains metadata and bytes that represent IL opcodes. Pure binary data, not text. Any .NET disassembler, such as ildasm.exe, knows how to convert the bytes back to text. It is pretty straightforward.
The C# compiler generates the binary data directly; there is no intermediate text format. When you write your own IL code in a text editor, you need ilasm.exe to convert it to binary. That is pretty straightforward as well.
The most difficult part of generating the binary data is the metadata, by the way. It is excessively micro-optimized to make it as small as possible, and its structure is quite convoluted. No compiler generates those bytes directly; they all use a pre-built component to get that job done. Notably, Roslyn had to rewrite this from scratch, which was a big job.
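As a rough illustration of the point that a method body is just opcode bytes which a disassembler maps back to text, here is a small sketch in Python (used only for brevity; it is not how ildasm is implemented). The opcode values are the standard single-byte CIL encodings, but the table is deliberately tiny, the function name disassemble is hypothetical, and metadata, two-byte opcodes, and token operands are ignored entirely.

```python
# Toy CIL disassembler: maps a few single-byte opcodes back to ILAsm-style text.

# opcode byte -> (mnemonic, operand size in bytes)
OPCODES = {
    0x00: ("nop", 0),
    0x02: ("ldarg.0", 0),
    0x03: ("ldarg.1", 0),
    0x16: ("ldc.i4.0", 0),
    0x17: ("ldc.i4.1", 0),
    0x1F: ("ldc.i4.s", 1),  # followed by one signed byte
    0x20: ("ldc.i4", 4),    # followed by a little-endian int32
    0x2A: ("ret", 0),
    0x58: ("add", 0),
    0x59: ("sub", 0),
    0x5A: ("mul", 0),
}


def disassemble(code: bytes) -> list[str]:
    """Return one text line per instruction, prefixed with its byte offset."""
    lines = []
    offset = 0
    while offset < len(code):
        op = code[offset]
        if op not in OPCODES:
            raise ValueError(f"opcode 0x{op:02X} is not in this toy table")
        name, size = OPCODES[op]
        start = offset
        offset += 1
        if size:
            raw = code[offset:offset + size]
            if len(raw) != size:
                raise ValueError(f"truncated operand for {name}")
            operand = int.from_bytes(raw, "little", signed=True)
            offset += size
            lines.append(f"IL_{start:04x}: {name} {operand}")
        else:
            lines.append(f"IL_{start:04x}: {name}")
    return lines


# The four bytes below encode: load both arguments, add them, return.
for line in disassemble(bytes([0x02, 0x03, 0x58, 0x2A])):
    print(line)
```

Going the other way (text to bytes) is the same table read in reverse, which is essentially the instruction-encoding part of what ilasm.exe does; the metadata is where the real work lies.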

Decompile C# vs C++

I have read many posts about decompiling (though I have no experience with it), but I do not understand why they generally say it is easier to decompile a C# executable than a C++ one. Could anyone explain the difference?
C# compiles into CIL, not directly into native code as a C++ compiler normally would.
It produces a .NET assembly, which contains much more metadata than a C++ executable does (via the embedded manifest). This is metadata about the types contained in the assembly, what it references, and more, which makes it much easier to decompile than a "normal" executable.
As noted in the comments, CIL in and of itself is a higher-level language than assembly and is object-oriented, making it easier to understand and decompile correctly.
It's simple.
A compiled C# assembly retains the information needed to restore the source code; a compiled C/C++ binary does not.

Are we looking at the original code in a Disassembler or reverse engineered CIL?

Take ILSpy. When I view my assembly, am I looking at my original C#? Or is this code reconstructed from CIL using some type of reverse engineering process?
My understanding is that release assemblies do not include any original code, just CIL. So, does it make a difference if I build my assembly in release mode?
Neither release nor debug assemblies contain original source code.
ILSpy & friends analyze the compiled CIL to extract a reasonable C# equivalent.
Release vs Debug still makes a huge difference. The compiler does optimize. See Scott Hanselman's post about Release vs Debug.
In terms of what ILSpy does: yes, it reads the CIL and then reverse engineers it into a reasonable C#/VB representation. I'll admit ILSpy does a very good job of it! I've reversed others' assemblies with it and can make perfect sense of their code. The only time I've had it break down was with WPF and GUI code, but I'm sure there are ways to handle that as well.
In terms of preventing reversal of your assembly and protecting your intellectual property, use Dotfuscator or another obfuscation tool.
You are seeing code reconstructed from the IL. This reconstruction process can be performed on any .NET assembly regardless of whether it is built in debug or release mode.
You can't prevent your source code from being reconstructed from your assembly in this way, but if you want to make the code less useful/understandable you can use various .NET obfuscation tools.
Actually, you could use ILMerge or .net FuZe to wrap your EXE and DLLs into a single EXE or DLL container, making it more difficult to disassemble.

Does compiling to native code in .Net remove the MSIL completely?

I'm wondering whether, in the context of disassembling .NET code (Redgate .NET Reflector, etc.), it is more secure to compile your code to native using Ngen. That is, would someone then need IDA and ASM skills to disassemble (and make sense of) your code, versus the relatively trivial decompiling of MSIL?
Yes, I'm aware that MS provides an obfuscator for exactly this purpose, but I'm curious whether compiling to native is a better solution, with some tradeoffs (no JIT).
Thanks.
ngen doesn't remove the MSIL (or rather, the native binary produced by ngen is unusable without also having the MSIL file). MSIL is still used by the verifier to determine whether to load assemblies in partial-trust scenarios, and for reflection.
There's a lot of good information here.

.Net Assemblies Decompiler that Convert Assemblies into C# Source Code

I am well aware that one can use Reflector to browse the contents of an assembly, and that one can use FileDisassembler to convert those contents into C# source code with .csproj project files. But the source code produced by FileDisassembler may not compile if it contains an interface with a property.
Are there other similar applications that do what FileDisassembler does?
I would not trust Reflector's decompiler.
Many times I have seen it simply ignore an instruction it did not understand, or optimize certain sequences away, changing the meaning in the process.
The only trustworthy way is to read the IL.
Regarding more tools, look at the CCI. IIRC, they had a C# source emitter at some stage, but it was removed for some reason.
dotPeek from JetBrains is a good decompiler for C#. http://confluence.jetbrains.net/display/NETPEEK/dotPeek+Early+Access+Program
