I have read many posts about decompiling (though I have no experience with it), and I don't understand why they generally say that a C# executable is easier to decompile than a C++ one. Could anyone explain the difference?
C# compiles into CIL, not directly into native code as a C++ compiler normally does.
It produces a .NET assembly, which contains much more metadata than a C++ executable does: metadata describing the types and members in the assembly, the assemblies it references (listed in its manifest), and more. This makes it much easier to decompile than a "normal" executable.
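A .NET assembly is still a PE file, just like a native DLL, but entry 14 of its optional header's data directory (the CLI header, called IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR in the Windows SDK) points at the runtime header that locates the metadata. A purely native C++ binary leaves that entry zero. The following is a minimal Python sketch that checks for it, assuming a well-formed PE image. The function name `is_managed_pe` is illustrative and not part of any tool.

```python
import struct

CLI_HEADER_INDEX = 14  # IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR


def is_managed_pe(data: bytes) -> bool:
    """Return True if `data` is a PE image whose CLI header directory is non-empty."""
    try:
        if data[:2] != b"MZ":
            return False
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew in the DOS header
        if data[pe_offset:pe_offset + 4] != b"PE\0\0":
            return False
        opt = pe_offset + 4 + 20  # skip the signature and the 20-byte COFF file header
        (magic,) = struct.unpack_from("<H", data, opt)
        if magic == 0x10B:    # PE32
            count_off, dirs_off = 92, 96
        elif magic == 0x20B:  # PE32+ (64-bit)
            count_off, dirs_off = 108, 112
        else:
            return False
        (num_dirs,) = struct.unpack_from("<I", data, opt + count_off)
        if num_dirs <= CLI_HEADER_INDEX:
            return False
        # Each data directory entry is an (RVA, size) pair of 32-bit values.
        rva, size = struct.unpack_from("<II", data, opt + dirs_off + 8 * CLI_HEADER_INDEX)
        return rva != 0 and size != 0
    except struct.error:  # truncated file
        return False
```

To use it, read a DLL's bytes (for example with `open(path, "rb").read()`) and pass them in. Mixed-mode C++/CLI binaries also report True, because they carry IL and metadata as well.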
As noted in the comments, CIL itself is a higher-level language than assembly and is object-oriented, which makes it easier to understand and decompile correctly.
It's simple.
Compiled C# code keeps the information needed to restore the source code; compiled C/C++ code does not.
Related
First, my goal is to convert MSIL into native x86 code. I am fine with my assemblies still needing the .NET Framework installed. NGEN is not what I want, as you still need the original assemblies.
I came across ilasm, and I am wondering whether this is what I want: will it produce pure assembly code?
I have looked at other projects like Mono (which does not support some of the key features my app uses) and .NET linkers, but they simply produce a single EXE bundled with the .NET Framework, which is not what I am looking for.
So far, all my research has come up with is... you can't do it. I am really not sure why, since the JIT does exactly this when it loads the MSIL assembly. I have my own reasons for wanting this, so I guess my questions come down to these:
Is the link I posted helpful in any way?
Is there anything out there that can turn MSIL into x86 assembly?
There are various third-party code-protection packages available that hide the IL by encrypting it and packing it with a special bootloader that only unpacks it at runtime. This might be an option if you're concerned about disassembly of your code, though most of these third-party packages have also already been cracked (which is, unfortunately, somewhat unavoidable). Simple obfuscation may ultimately be just as effective, assuming this is your underlying goal.
One of the major challenges associated with 'pre-jitting' the IL is that you end up including fixed address references in the native code. These in turn need to be 're-based' when the native code is loaded for execution under the CLR. This means you need more than just the logic that gets compiled; you also need all of the reference context information necessary to rebase the fixed references when the code is loaded. It's a lot more than just caching code.
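Rebasing can be illustrated with a short Python sketch. It assumes a simplified image in which the compiler has recorded the offset of every 32-bit absolute address in a flat list. Real PE files store these offsets in a .reloc section organized as per-page blocks. The function name `rebase` and the flat fixup list are illustrative, not any real loader's API.

```python
import struct


def rebase(code: bytes, fixups: list[int], preferred_base: int, actual_base: int) -> bytes:
    """Patch every 32-bit absolute address at the given offsets for a new load address."""
    delta = actual_base - preferred_base
    patched = bytearray(code)
    for offset in fixups:
        (addr,) = struct.unpack_from("<I", patched, offset)
        # Wrap to 32 bits, as the loader would for a 32-bit image.
        struct.pack_into("<I", patched, offset, (addr + delta) & 0xFFFFFFFF)
    return bytes(patched)
```

Consider the x86 instruction `mov eax, [0x00401000]`, encoded as `A1 00 10 40 00`, followed by `ret` (`C3`). It was compiled for a preferred base of 0x00400000. If it is loaded at 0x10000000 instead, `rebase(code, [1], 0x00400000, 0x10000000)` rewrites the operand to 0x10001000. Without the fixup list, the loader would have no way to know which four bytes are an address and which are just data.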
As with most things, the first question should be why rather than how. If you want to generate native code yourself, I assume you have a specific goal in mind (also, why x86? Why not x64 too?). Generating native code is the job of the JIT compiler: it compiles an optimized instruction sequence for a particular platform only when needed, and executes it later.
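The "compile only when needed" behavior can be modeled in a few lines of Python. In this toy version, each method slot starts out as a stub. The first call "compiles" the method and back-patches the slot, so later calls go straight to the compiled code. This is only an analogy for the idea, not how the CLR is implemented. The class `MethodTable` and its members are hypothetical names, and the "compilers" here are plain Python callables.

```python
from typing import Callable, Dict


class MethodTable:
    """Toy model of compile-on-first-call: every slot starts out as a stub."""

    def __init__(self, compilers: Dict[str, Callable[[], Callable]]):
        self._compilers = compilers  # name -> function that "compiles" and returns native code
        self._slots: Dict[str, Callable] = {name: self._make_stub(name) for name in compilers}
        self.compile_count = 0

    def _make_stub(self, name: str) -> Callable:
        def stub(*args):
            self.compile_count += 1
            native = self._compilers[name]()  # compile once, on first call
            self._slots[name] = native        # back-patch the slot so later calls skip the stub
            return native(*args)
        return stub

    def call(self, name: str, *args):
        return self._slots[name](*args)
```

Here is an example. With `table = MethodTable({"square": lambda: (lambda x: x * x)})`, the first `table.call("square", 3)` compiles the method and returns 9. A later `table.call("square", 4)` returns 16 without compiling again, so `table.compile_count` stays at 1. Methods that are never called are never compiled, which is the trade-off a JIT makes against compiling everything ahead of time.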
The best source I can recommend for understanding how the CLR and the JIT work is SSCLI, an implementation of the CLR based on the ECMA-335 spec.
Have you considered not using C#? Given that the output of the C# compiler is MSIL, it would make sense to develop on a different platform if that is not what you want.
Alternatively, it sounds like NGEN does the operation you want; it just doesn't put the result into a standalone executable. You could analyze the resulting NGEN image to determine what needs to be done to accomplish that (note that, per the documentation, NGEN images are PE files).
The NGEN documentation describes where the images are stored: C:\Windows\assembly\NativeImages_<CLR version>_<bitness>, for instance C:\Windows\assembly\NativeImages_v2.0.50727_32. Note that .NET 3.0 and 3.5 both run on the 2.0 CLR.
Take ILSPy. When I view my assembly am I looking at my original C#? Or, is this code reconstructed from CIL using some type of reverse engineering process?
My understanding is that release assemblies do not include any original code, just CIL. So, does it make a difference if I build my assembly in release mode?
Neither release nor debug assemblies contain original source code.
ILSpy & friends analyze the compiled CIL to extract a reasonable C# equivalent.
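The core idea behind that reconstruction can be sketched in Python. CIL is stack-based, so a decompiler can walk the instructions while keeping source-level expressions on its stack instead of values. This sketch handles only straight-line code with a few arithmetic opcodes and assumes a static method, where `ldarg.0` is the first parameter. Real decompilers such as ILSpy also have to recover locals, types, loops, and branches. The function `decompile` is illustrative.

```python
def decompile(il: list[str], params: list[str]) -> str:
    """Turn straight-line, stack-based CIL into a C#-like return statement."""
    binary_ops = {"add": "+", "sub": "-", "mul": "*", "div": "/"}
    stack: list[str] = []  # holds source-level expressions instead of runtime values
    for instr in il:
        op, _, operand = instr.partition(" ")
        if op in ("ldarg.0", "ldarg.1", "ldarg.2", "ldarg.3"):
            stack.append(params[int(op[-1])])  # static method assumed: arg 0 is the first parameter
        elif op in ("ldc.i4", "ldc.i4.s"):     # constant given as an operand
            stack.append(operand)
        elif op == "ldc.i4.m1":                # short form for -1
            stack.append("-1")
        elif op.startswith("ldc.i4.") and op[-1].isdigit():  # ldc.i4.0 .. ldc.i4.8
            stack.append(op[-1])
        elif op in binary_ops:
            right = stack.pop()
            left = stack.pop()
            stack.append(f"({left} {binary_ops[op]} {right})")
        elif op == "ret":
            return f"return {stack.pop()};"
        else:
            raise ValueError(f"unsupported instruction: {instr}")
    raise ValueError("method body has no ret")
```

For a method like `static int F(int a, int b) => (a + b) * 2;`, a release build typically contains `ldarg.0, ldarg.1, add, ldc.i4.2, mul, ret`. Calling `decompile` on that with `["a", "b"]` yields `return ((a + b) * 2);`. A real decompiler would also use operator precedence to drop the redundant parentheses. Note that the parameter names come from metadata; the IL itself only refers to argument slots.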
Release vs. Debug still makes a huge difference: in release mode the compiler does optimize. See Scott Hanselman's post about Release vs Debug.
In terms of what ILSpy does: yes, it displays the CIL and then reverse-engineers it into a reasonable C#/VB representation. I'll admit ILSpy does a very good job of it! I've reversed other people's assemblies with it and can make perfect sense of their code. The only time I've seen it break down was with WPF and GUI code, but I'm sure there are ways to work around that as well.
In terms of preventing reversal of your assembly and protecting your intellectual property, use Dotfuscator or another obfuscation tool.
You are seeing code reconstructed from the IL. This reconstruction process can be performed on any .NET assembly regardless of whether it is built in debug or release mode.
You can't prevent your source code from being reconstructed from your assembly in this way, but if you want to make the code less useful/understandable you can use various .NET obfuscation tools.
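Renaming is one of the techniques obfuscation tools commonly apply. Identifiers that are not part of the public surface are replaced with meaningless names, so the logic still decompiles but the names that made it readable are gone. The following Python sketch covers only the name-mapping step. It assumes a hypothetical symbol list where each name is flagged as public or not. A real obfuscator rewrites the assembly's metadata tables and also has to account for things like reflection that look members up by name.

```python
import itertools
import string


def short_names():
    """Yield a, b, ..., z, aa, ab, ... indefinitely."""
    for length in itertools.count(1):
        for chars in itertools.product(string.ascii_lowercase, repeat=length):
            yield "".join(chars)


def rename_symbols(symbols: dict[str, bool]) -> dict[str, str]:
    """Map each non-public symbol to a meaningless short name; public names are kept."""
    fresh = short_names()
    mapping = {}
    for name, is_public in symbols.items():
        mapping[name] = name if is_public else next(fresh)
    return mapping
```

For example, `rename_symbols({"LicenseChecker.Validate": True, "ComputeSerialHash": False, "_secretKey": False})` keeps the public method name and maps the other two to `a` and `b`. Public names are left alone because other assemblies bind to them by name, which is also why obfuscation alone cannot hide an assembly's public API.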
Actually, you could use ILMerge or .NET FuZe to wrap your EXE and DLLs into a single EXE or DLL container, making it more difficult to disassemble.
I just looked at the source of Mono for the first time, and I thought I would find a bunch of C or C++ code; instead, I found 26,192 .cs files and 7 .cpp files.
I am not totally shocked, but it made me think of a question I've always had in the back of my mind:
How does a project end up being written in "itself" like this?
Was an older version of Mono written more in C/C++? Or was there an initial effort to create some kind of compiler written in machine code...?
What's the "trick" here?
Mono's compiler is written in C#. You may want to read about compiler bootstrapping.
You should be looking for .c files, instead of .cpp files: the mono runtime is written in C, not C++.
I think it is also important to remember that Mono is both a virtual machine runtime (the JIT compiler, garbage collector, etc.) and a collection of class libraries that run on top of it (the System.Linq namespace, the XML parsers, etc.).
The majority of the .cs files you see are part of the class libraries. These are ordinary C# code that runs like your own C# code (with some exceptions; it simply doesn't make sense for everyone to reinvent and redistribute the wheel over and over, so these are the C# "base" class libraries). This is why complex Mono programs can be such small downloads when Mono is already installed on the machine.
For Mono, the JIT, runtime and garbage collector are largely written in C/C++, as you would expect. If you ever hit a low-level error, you will often see GNU debugger (GDB) dumps just as you would in C, only with much more useful information. The Mono framework is very good at taking any C# code and converting it to CIL that can run anywhere, and its developers use whatever toolset is best suited to making sure the code does run anywhere (which in this case meant a C compiler runtime on Linux).
I'm wondering whether, in the context of disassembling .NET code (Red Gate .NET Reflector, etc.), it is more secure to compile your code to native using Ngen. That is, does that mean someone would now need IDA and ASM skills to disassemble (and make sense of) your code, compared with the relatively trivial decompiling of MSIL?
Yes, I'm aware that MS provides an obfuscator for exactly this purpose, but I'm curious whether compiling to native is a better solution, with some trade-offs (no JIT).
Thanks.
ngen doesn't remove the MSIL (or rather, the native binary produced by ngen is unusable without also having the MSIL file). MSIL is still used by the verifier to determine whether to load assemblies in partial-trust scenarios, and for reflection.
There's a lot of good information here.
What is the relationship (if any) between MASM assembly language and ILASM? Is there a one-to-one conversion? I'm trying to incorporate Quantum GIS into a program I'm writing as I go along. I have Quantum GIS on my computer, and I have Red Gate Reflector, but neither Reflector nor the Visual Studio 2008 Object Browser could open one of the Quantum GIS DLLs (there are several, and I don't have a strong idea of how they behave). I used the MASM assembly editor to "open" the same DLL, and it spewed out something I didn't necessarily expect to understand in the first place. Can I convert that same "code" into something I can interact with in ILASM, and consequently (I assume) in C#? Thanks a lot for reading, and for all the responses to my earlier questions. Please bear in mind that I'm relatively new to programming in C#, and even newer to MASM and ILASM.
MASM deals with x86 instructions and is platform/processor-dependent, while ILASM refers to .NET CIL (Common Intermediate Language) instructions, which are platform/processor-independent. Going from something specific to something more general is hard, which is why, AFAIK, there is no converter from MASM to ILASM (the inverse direction does exist!).
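The following deliberately naive Python sketch shows why the IL-to-native direction is the easy one. It lowers a few CIL opcodes into 32-bit x86 instructions written in MASM-style Intel syntax. Each stack operation maps to a fixed template, so the translation is mechanical. Going the other way would mean recovering the types, variables, and structure that this lowering throws away. The sketch assumes a static method with 4-byte int arguments passed on the stack and a standard EBP frame. A real JIT allocates registers and optimizes instead of routing everything through the machine stack. The function `lower_to_x86` is illustrative.

```python
def lower_to_x86(il: list[str]) -> list[str]:
    """Naively lower straight-line CIL into 32-bit x86 (MASM-style Intel syntax)."""
    out = ["push ebp", "mov ebp, esp"]  # standard frame: first argument at [ebp+8]
    for instr in il:
        op, _, operand = instr.partition(" ")
        if op in ("ldarg.0", "ldarg.1", "ldarg.2", "ldarg.3"):
            out.append(f"push dword ptr [ebp+{8 + 4 * int(op[-1])}]")  # 4-byte int args assumed
        elif op == "ldc.i4":
            out.append(f"push {operand}")
        elif op.startswith("ldc.i4.") and op[-1].isdigit():  # ldc.i4.0 .. ldc.i4.8
            out.append(f"push {op[-1]}")
        elif op in ("add", "sub"):
            out += ["pop ecx", "pop eax", f"{op} eax, ecx", "push eax"]  # eax = left op right
        elif op == "mul":
            out += ["pop ecx", "pop eax", "imul eax, ecx", "push eax"]
        elif op == "ret":
            out += ["pop eax", "mov esp, ebp", "pop ebp", "ret"]  # result returned in EAX
        else:
            raise ValueError(f"unsupported instruction: {instr}")
    return out
```

For instance, `lower_to_x86(["ldarg.0", "ldarg.1", "add", "ret"])` produces a prologue, two `push dword ptr [ebp+...]` loads, a pop/pop/add/push sequence, and an epilogue that returns the sum in EAX. Nothing in that output still says that the two values were parameters named `a` and `b` of type int.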
IL is a platform-independent layer of abstraction over native code. Code written on the .NET platform in C#, VB.NET, or any other .NET language compiles down to an assembly (.EXE/.DLL) containing IL. Typically, the first time a method is called, the .NET runtime's JIT compiler compiles its IL down to native code, keeps that code in memory, and executes it from there for the rest of the process's lifetime (Ngen.exe can do this compilation ahead of time instead). This allows .NET platform code to be deployed to any platform supporting that .NET Framework, regardless of the processor or architecture of the system.
As you've seen, Reflector is great for viewing the code in an assembly, because IL can easily be previewed in C# or VB.NET form. This is because IL instructions are somewhat higher-level, and the assembly also contains a lot of metadata that native code wouldn't normally have, such as class, method, field, and parameter names.
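Setting the platform target of a Visual Studio project (x86 or x64) only restricts which architecture the assembly will run on; the output is still IL. Calling Ngen.exe on the assembly does produce native code, and once that is done the native code is really difficult to make sense of, although the original IL assembly is still required alongside it.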
There is no relationship between MASM assembly language and ILASM, and I don't see any way to convert native code to IL code. IL can be understood only by the CLR, while MASM assembly language is about native machine code. The CLR turns IL into native code at runtime.