I just looked at the source of Mono for the first time. I thought I would find a bunch of C or C++ code; instead I found 26,192 .cs files and 7 .cpp files.
I am not totally shocked, but it made me think of a question I've always had in the back of my mind:
How does a project end up being written in "itself" like this?
Was an older version of Mono written more in C/C++? Or was there an initial effort to create some kind of machine-coded compiler...
What's the "trick" here?
Mono's compiler is written in C#. You may want to read about compiler bootstrapping.
You should be looking for .c files instead of .cpp files: the Mono runtime is written in C, not C++.
I think it is also important to remember that Mono is both a virtual machine runtime (the JIT compiler, garbage collector, etc.) and a collection of class libraries that run on this framework (the System.Linq namespace, the XML parsers, etc.).
The majority of the .cs files you see are part of the class libraries. These are basically C# code that runs like your own C# code (with some exceptions). It doesn't make sense for everyone to reinvent and redistribute the wheel over and over, so these are the C# "base" class libraries. This is why complex Mono programs can be downloaded as such small files if Mono is already installed on the machine.
For Mono, the JIT, runtime and garbage collector are largely written in C/C++, as you would expect. If you ever get a low-level error, you will often see a GDB-style native stack trace as you would in C, just with a lot more useful information. The Mono toolchain is very good at taking C# code and compiling it to CIL that can run anywhere, and the project uses whatever toolset is best suited to ensure that the code really does run anywhere (which in this case meant C and a C compiler toolchain on Linux).
Related
I have read many posts about decompiling (though I have no experience with it), but I did not understand why all of them mention that it is easier to decompile a C# executable than a C++ one. Could anyone explain the difference?
C# compiles into CIL, not directly into native code like a C++ compiler would normally do.
It produces a .NET assembly, which contains much more metadata than a C++ executable does (type metadata plus the assembly manifest): information about the types contained in the assembly, their members, what it references and more. That makes it much easier to decompile than a "normal" executable.
As noted in the comments, CIL in and of itself is a higher-level language than assembly and is object-oriented, making it easier to understand and decompile correctly.
It's simple.
Compiled C# code keeps the information necessary to restore the source code, but compiled C/C++ doesn't.
I'm in the process of wrapping a pure unmanaged VC++ 9 project in C++/CLI in order to use it plainly from a .NET app. I know how to write the wrappers, and that unmanaged code can be executed from .NET, but what I can't quite wrap my head around:
The unmanaged lib is a very complex C++ library and uses a lot of inlining and other features, so I cannot compile it into the /clr-marked managed DLL. I need to compile it into a separate DLL using the normal VC++ compiler.
How do I export symbols from this unmanaged code so that it can be used from the C++/CLI project? Do I mark every class I need visible as extern? Is it that simple or are there some more complexities?
How do I access the exported symbols from the C++/CLI project? Do I simply include the header files of the unmanaged source code, and will the C++ linker take the actual code from the unmanaged DLL? Or do I have to hand-write a separate set of "extern" classes in a new header file that points to the classes in the DLL?
When my C++/CLI project creates the unmanaged classes, will the unmanaged code run perfectly fine in the normal VC9 runtime, or will it be forced to run within .NET, causing more compatibility issues?
The C++ project creates lots of instances and has its own custom-implemented garbage collector, all written in plain C++. It is a DirectX sound renderer and manages lots of DirectX objects. Will all this work normally, or would such Win32 functionality be affected in any way?
You can start with an ordinary native C++ project (imported from, say, Visual Studio 6.0 from well over a decade ago) and when you build it today, it will link to the current version of the VC runtime.
Then you can add a single new foo.cpp file to it, but configure that file so it has the /CLR flag enabled. This will cause the compiler to generate IL from that one file, and also link in some extra support that causes the .NET framework to be loaded into the process as it starts up, so it can JIT compile and then execute the IL.
The remainder of the application is still compiled natively as before, and is totally unaffected.
The truth is that even a "pure" CLR application is really a hybrid, because the CLR itself is (obviously) native code. A mixed C++/CLI application just extends this by allowing you to add more native code that shares the process with some CLR-hosted code. They co-exist for the lifetime of the process.
If you make a header foo.h with a declaration:
void bar(int a, int b);
You can freely implement or call this either in your native code or in the foo.cpp CLR code. The compiler/linker combination takes care of everything. There should be no need to do anything special to call into native code from within your CLR code.
You may get compile errors about incompatible switches:
/ZI - Program database for edit and continue, change it to just Program database
/Gm - you need to disable Minimal rebuild
/EHsc - C++ exceptions, change it to Yes with SEH Exceptions (/EHa)
/RTC - Runtime checks, change it to Default
Precompiled headers - change it to Not Using Precompiled Headers
/GR- - Runtime Type Information - change it to On (/GR)
All these changes only need to be made on your specific /CLR enabled files.
As Daniel mentioned, you can fine-tune your settings at the file level. You can also play with '#pragma managed' inside files, but I wouldn't do that without good reason.
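A minimal sketch of the per-function variant, assuming MSVC with /clr enabled on this file: `#pragma unmanaged` and `#pragma managed` switch code generation for the functions that follow. The `_MANAGED` guard (MSVC defines it when /clr is on) keeps the same file compiling as plain native code elsewhere. `scale` and `callScale` are hypothetical names in the spirit of `bar` from foo.h.

```cpp
// Normally this declaration lives in foo.h and is #included by both native and /clr files.
int scale(int a, int b);

#if defined(_MANAGED)
#pragma unmanaged  // under /clr: compile the following functions to native code
#endif

// Native implementation (hypothetical stand-in for the real library code).
int scale(int a, int b) { return a * b; }

#if defined(_MANAGED)
#pragma managed    // under /clr: compile the following functions to IL again
#endif

// Under /clr this body is IL, yet it calls the native function directly through the
// shared declaration; the compiler inserts the managed-to-native transition itself.
int callScale(int a, int b) { return scale(a, b) + 1; }
```

Nothing special is needed at the call site, which is the same point made about `bar` above: the shared declaration is all the managed code needs.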
Keep in mind that you can create a complete mixed-mode assembly. That means you can compile your native code unchanged into this file PLUS some C++/CLI wrappers around it. In the end you will have a single file that is both a native DLL with all your exported native symbols AND a full-fledged .NET assembly (exposing C++/CLI objects)!
That also means you only have to care about exports for native client code outside your file. Your C++/CLI code inside the mixed DLL/assembly can access the native data structures under the usual access rules, simply by including the header.
Since you mentioned it: I did this for a non-trivial native C++ class hierarchy including a fair amount of DirectX code, so there is no fundamental problem here.
I would advise against using P/Invoke in a .NET-driven environment. True, it works. But for anything non-trivial (say, more than 10 functions) you are certainly better off with an OO approach as provided by C++/CLI. Your C# client developers will be thankful. You have all the .NET features like delegates, properties, managed threading and much more at your fingertips in C++/CLI. Starting with VS 2012, you also get somewhat usable IntelliSense.
You can use P/Invoke to call exported functions from unmanaged DLLs. This is how the unmanaged Windows API is accessed from .NET. However, you may run into problems if your exported functions use C++ objects rather than just plain C data structures.
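A common way around the C++-object problem is a flat C layer that hands out an opaque pointer, so only plain types cross the boundary. A minimal sketch, assuming a hypothetical native class `Mixer`; the hypothetical `MIXER_API` macro uses `__declspec(dllexport)` on Windows and default symbol visibility elsewhere.

```cpp
#include <new>

#if defined(_WIN32)
#define MIXER_API extern "C" __declspec(dllexport)
#else
#define MIXER_API extern "C" __attribute__((visibility("default")))
#endif

// Hypothetical native class that P/Invoke cannot use directly.
class Mixer {
public:
    explicit Mixer(int channels) : channels_(channels) {}
    int channels() const { return channels_; }
    void setVolume(int v) { volume_ = v; }
    int volume() const { return volume_; }
private:
    int channels_;
    int volume_ = 100;
};

// Flat C functions: only plain values and an opaque pointer cross the boundary.
MIXER_API void* Mixer_Create(int channels) {
    return new (std::nothrow) Mixer(channels);  // nullptr on failure instead of throwing into C#
}
MIXER_API void Mixer_Destroy(void* handle) { delete static_cast<Mixer*>(handle); }
MIXER_API int Mixer_GetChannels(void* handle) { return static_cast<Mixer*>(handle)->channels(); }
MIXER_API void Mixer_SetVolume(void* handle, int volume) {
    static_cast<Mixer*>(handle)->setVolume(volume);
}
MIXER_API int Mixer_GetVolume(void* handle) { return static_cast<Mixer*>(handle)->volume(); }
```

On the C# side each function would be declared with `[DllImport]` and `CallingConvention.Cdecl` (the default for these functions on 32-bit x86), with the handle held in an `IntPtr`. The `extern "C"` linkage also avoids C++ name mangling in the exported names.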
There also seems to be C++ interop technology that can be of use to you: http://msdn.microsoft.com/en-us/library/2x8kf7zx(v=vs.80).aspx
Hey, I have done some decompiling in .NET as I am learning C#; seeing the code helps me a lot. But lately I have come across a few programs that I know are .NET but show up in Reflector as non-.NET assemblies. Here is an example: a program named Proxy Multiply.
I am not trying to do anything illegal. I'm just trying to learn. I have tried to google this but was not able to find any good results.
Thanks
Here is the link to the image.
There are many .NET code protection products that obfuscate the IL code so that it is not as exposed to IL disassembler applications.
.Net Reactor
Themida
SmartAssembly
the list is huge . . .
Many of these protectors modify the EXE (its PE header info); a .NET EXE contains some extra metadata that helps disassemblers identify it.
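That extra marker is the CLR runtime header, which the PE optional header points to through data directory entry 14 (IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR). A protector that wraps the assembly in a native loader can leave the outer file without that entry, which would explain a "not a .NET assembly" message. A minimal sketch of the check, assuming the whole file is already in memory; the function names are hypothetical and it validates only the fields it reads.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads an n-byte little-endian value at `off`; returns false if it would run past the buffer.
static bool readLE(const std::vector<std::uint8_t>& buf, std::size_t off, std::size_t n,
                   std::uint32_t& out) {
    if (off > buf.size() || n > buf.size() - off) return false;
    out = 0;
    for (std::size_t i = 0; i < n; ++i)
        out |= static_cast<std::uint32_t>(buf[off + i]) << (8 * i);
    return true;
}

// True if the PE image has a non-empty CLR runtime header (data directory 14),
// which is what marks the file as a .NET assembly.
bool hasClrHeader(const std::vector<std::uint8_t>& buf) {
    std::uint32_t v = 0;
    if (!readLE(buf, 0, 2, v) || v != 0x5A4D) return false;           // "MZ" DOS signature
    std::uint32_t peOff = 0;
    if (!readLE(buf, 0x3C, 4, peOff)) return false;                    // e_lfanew
    if (!readLE(buf, peOff, 4, v) || v != 0x00004550) return false;    // "PE\0\0"
    const std::size_t opt = static_cast<std::size_t>(peOff) + 4 + 20; // skip signature + COFF header

    std::uint32_t magic = 0;
    if (!readLE(buf, opt, 2, magic)) return false;
    std::size_t countOff = 0, dirOff = 0;
    if (magic == 0x10B) { countOff = 92; dirOff = 96; }               // PE32
    else if (magic == 0x20B) { countOff = 108; dirOff = 112; }        // PE32+ (64-bit)
    else return false;

    std::uint32_t dirCount = 0;
    if (!readLE(buf, opt + countOff, 4, dirCount) || dirCount <= 14) return false;
    const std::size_t clrDir = opt + dirOff + 14 * 8;                 // each entry: RVA + size
    std::uint32_t rva = 0, size = 0;
    if (!readLE(buf, clrDir, 4, rva) || !readLE(buf, clrDir + 4, 4, size)) return false;
    return rva != 0 && size != 0;
}
```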
Download this little application; it may tell you a little more about the EXE.
Download PEiD 0.95
PEiD is an intuitive application that relies on its user-friendly interface to detect packers, cryptors and compilers found in PE executable files – its detection rate is higher than that of other similar tools since the app packs more than 600 different signatures in PE files.
PEiD comes with three different scanning methods, each suitable for a distinct purpose. The Normal one scans the user-specified PE file at its Entry Point for all its included signatures. The so-called Deep Mode comes with increased detection ratio since it scans the file's Entry Point containing section, whereas the Hardcore mode scans the entire file for all the documented signatures.
My best guess is that the assembly you are looking at is protected by .NET Reactor or Themida.
I had the same problem with .NET Reflector before.
Try JetBrains dotPeek 1.0 for decompiling (this application will show code even when it's obfuscated):
Decompiling .NET 1.0-4.5 assemblies to C#
Support for .dll, .exe, .zip, .vsix, .nupkg, and .winmd files
Quick jump to a type, assembly, symbol, or type member
Effortless navigation to symbol declarations, implementations, derived and base symbols, and more
Accurate search for symbol usages with advanced presentation of search results
Overview of inheritance chains
Support for downloading code from source servers
Syntax highlighting
Complete keyboard support
dotPeek is free!
Just because it is .NET doesn't mean that you can just decompile it like that. They probably used ILMerge. That's not to say it's impossible but it will require more work.
See Is it possible to “decompile” a Windows .exe? Or at least view the Assembly?
I would first like to say that my goal is to convert MSIL into native x86 code. I am fine with my assemblies still needing the .NET framework installed. NGEN is not what I want, as you still need the original assemblies.
I came across ilasm, and I am wondering: is this what I want? Will it produce pure assembly code?
I have looked at other projects like Mono (which does not support some of the key features my app uses) and .NET linkers, but they simply make a single EXE bundled with the .NET framework, which is not what I am looking for.
So far all my research has come up with... you can't do it. I am really not sure why, as the JIT does it when it loads the MSIL assembly. I have my own reasons for wanting this, so I guess my question(s) come down to this:
Is the link I posted helpful in any way?
Is there anything out there that can turn MSIL into x86 assembly?
There are various third-party code-protection packages available that hide the IL by encrypting it and packing it with a special bootloader that only unpacks it at runtime. This might be an option if you're concerned about disassembly of your code, though most of these third-party packages have already been cracked (somewhat unavoidable, unfortunately). Simple obfuscation may ultimately be just as effective, assuming this is your underlying goal.
One of the major challenges associated with 'pre-jitting' the IL is that you end up including fixed address references in the native code. These in turn need to be 're-based' when the native code is loaded for execution under the CLR. This means you need more than just the logic that gets compiled; you also need all of the reference context information necessary to rebase the fixed references when the code is loaded. It's a lot more than just caching code.
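The rebasing step can be illustrated with a toy loader: every absolute address embedded in pre-compiled code gets adjusted by the difference between the base address it was generated for and the address it is actually loaded at. This sketch assumes 32-bit absolute addresses stored in host byte order and a list of their offsets (similar in spirit to a PE .reloc table); `rebase` is a hypothetical name, not a CLR API.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Adds (actualBase - preferredBase) to every 32-bit absolute address listed in `fixups`.
// Offsets that would run past the end of `code` are skipped.
void rebase(std::vector<std::uint8_t>& code, const std::vector<std::size_t>& fixups,
            std::uint32_t preferredBase, std::uint32_t actualBase) {
    const std::uint32_t delta = actualBase - preferredBase;  // unsigned, so it wraps modulo 2^32
    for (std::size_t off : fixups) {
        if (off > code.size() || code.size() - off < sizeof(std::uint32_t)) continue;
        std::uint32_t addr = 0;
        std::memcpy(&addr, code.data() + off, sizeof addr);  // host byte order
        addr += delta;
        std::memcpy(code.data() + off, &addr, sizeof addr);
    }
}
```

Note that the function is useless without `fixups`: that list is exactly the extra "reference context information" that has to travel with the compiled code.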
As with most things, the first question should be why rather than how. I assume you have a specific goal in mind if you want to generate native code yourself (also, why x86? Why not x64 too?). Generating native code is the job of the JIT compiler: it compiles an optimized instruction stream for a particular platform only when needed, and then executes it.
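The "compile only when needed" idea can be pictured as a call slot that initially points at a stub: the first call compiles the method and back-patches the slot, so later calls go straight to native code. This is a simplified toy model in plain C++, not the CLR's actual internals; `compiledSquare` stands in for generated machine code and all names are hypothetical.

```cpp
// Signature of the method being "JIT-compiled" in this toy model.
using MethodPtr = int (*)(int);

// Stands in for the native code a JIT would generate for the method.
int compiledSquare(int x) { return x * x; }

int compileCount = 0;  // how many times the "JIT" ran

int jitStub(int x);               // forward declaration so the slot can start at the stub
MethodPtr methodSlot = &jitStub;

// First call lands here: "compile", back-patch the slot, then run the compiled code.
int jitStub(int x) {
    ++compileCount;               // a real JIT would emit machine code at this point
    methodSlot = &compiledSquare; // later calls bypass the stub entirely
    return methodSlot(x);
}

// Callers always go through the slot, like an indirect call through a method table.
int callMethod(int x) { return methodSlot(x); }
```

Methods that are never called never reach the stub, so they are never compiled; that is the work a JIT avoids compared with compiling everything up front.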
The best source I can recommend for understanding how the CLR and the JIT work is SSCLI (the Shared Source CLI), an implementation of the CLR based on the ECMA-335 spec.
Have you considered not using C#? Given that the output of the C# compiler is MSIL, it would make sense to develop on a different platform if that is not what you want.
Alternatively, it sounds like NGEN does the operation you want; it just doesn't put the entire thing into an executable. You could analyze the resulting NGEN image to determine what needs to be done to accomplish that (note that NGENed images are PE files, per the documentation).
Here is a link on NGEN that contains information on where the images are stored: C:\windows\assembly\NativeImages_<CLR version>_<bitness>, for instance C:\windows\assembly\NativeImages_v2.0.50727_32. Note that .NET 3.0 and 3.5 both run on the 2.0 CLR, so their images end up in the same folder.
What's the relation (if any) between MASM assembly language and ILASM? Is there a one-to-one conversion? I'm trying to incorporate Quantum GIS into a program I'm kind of writing as I go along! I have GIS on my computer and I have RedGate Reflector, but neither it nor the Object Browser of Visual Studio 2008 could open one of the .dlls in Quantum (there are several, and I don't have a strong clue how they behave). I used the MASM assembly editor to "open" the same DLL and it spewed out something I didn't necessarily expect to understand in the first place. Can I convert that same "code" into something I can interact with in ILASM and, I'm assuming, consequently in C#? If so, how? Thanks a ton for reading and for all the responses to earlier questions... please bear in mind I'm relatively new to programming in C#, and even fresher to MASM and ILASM.
MASM deals with x86 instructions and is platform/processor dependent, while ILASM refers to .NET CIL (Common Intermediate Language) instructions, which are platform/processor independent. Going from something specific to something more general is hard, which is why, AFAIK, there is no converter from MASM to ILASM (the other way around, there is!).
IL is a platform-independent layer of abstraction over native code. Code written for the .NET platform in C#, VB.NET, or another .NET language all compiles down to an assembly (.EXE/.DLL) containing IL. Typically, the first time a method's IL is executed, the .NET runtime's just-in-time (JIT) compiler compiles it down to native code in memory, and that native code is what actually runs; Ngen.exe can instead do this ahead of time and store the result in the native image cache. This allows .NET platform code to be deployed to any platform supporting that .NET framework, regardless of the processor or architecture of the system.
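To make "platform-independent layer of abstraction" concrete: IL is a stack-based instruction set that says nothing about CPU registers, and the runtime decides how to execute it on the local processor. The sketch below is a toy interpreter for a few opcodes loosely named after CIL's `ldc.i4`, `add`, `mul` and `ret`; it uses its own made-up encoding (an enum plus an operand), not real CIL bytes, and a JIT would translate such instructions into machine code rather than interpret them.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Toy opcodes loosely modeled on CIL; the real CIL byte encoding is different.
enum class Op { LdcI4, Add, Mul, Ret };

struct Instr {
    Op op;
    std::int32_t operand;  // only used by LdcI4; omitted operands are zero-initialized
};

// Evaluates the instructions on an evaluation stack, independent of the host CPU.
std::int32_t run(const std::vector<Instr>& code) {
    std::vector<std::int32_t> stack;
    auto pop = [&stack]() -> std::int32_t {
        if (stack.empty()) throw std::runtime_error("stack underflow");
        std::int32_t v = stack.back();
        stack.pop_back();
        return v;
    };
    for (const Instr& in : code) {
        switch (in.op) {
            case Op::LdcI4: stack.push_back(in.operand); break;
            case Op::Add: { std::int32_t b = pop(), a = pop(); stack.push_back(a + b); break; }
            case Op::Mul: { std::int32_t b = pop(), a = pop(); stack.push_back(a * b); break; }
            case Op::Ret: return pop();
        }
    }
    throw std::runtime_error("missing ret");
}
```

For example, `(2 + 3) * 4` becomes `{LdcI4 2, LdcI4 3, Add, LdcI4 4, Mul, Ret}`; the same list would run unchanged on x86, x64 or ARM, which is the property MASM code lacks.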
As you've seen, Reflector is great for viewing the code in an assembly, because IL can easily be rendered back into C# or VB.NET form. This is because IL instructions are generally a little higher level, and assemblies also contain a lot of metadata that native code wouldn't normally have, such as class, method, field, and parameter names (local variable names only live in the separate .pdb debug file).
It's also possible to compile a .NET assembly ahead of time to native code by calling Ngen.exe on it. Setting the Visual Studio project's platform target (x86/x64) only restricts which runtime the IL can be loaded into; the output is still IL. Once code has been compiled to native form, it's really difficult to make sense of.
There is no relationship between MASM assembly language and ILASM. I don't see any way for you to convert native code to IL code. IL is understood only by the CLR, while MASM assembly language is about native machine code. The CLR turns the IL into native code at runtime.