Q1) Why is C# first compiled to IL and then JIT compiled at runtime? Does it run on top of a virtual machine, or is it JIT compiled to native machine code?
Q2) If the latter is true (JIT compiled to native machine code), then where is the .NET sandbox the code runs under?
Q3) Also, why is the code compiled to IL in the first place? Why not simply compile to native machine code all the time? There is a tool from MS for this called NGen, but why is it optional?
The IL is JIT (Just-In-Time) compiled to native machine code as the process runs.
The use of a virtual machine layer allows .NET to behave in a consistent manner across platforms (e.g. an int is always 32 bits regardless of whether you're running on a 32- or 64-bit machine; this is not the case with C++).
JIT compiling allows optimisations to be tailored dynamically to the code as it runs (e.g. applying more aggressive optimisations to bits of code that are called frequently, or making use of hardware instructions available on the specific machine, such as SSE2), which you can't do with a static compiler.
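For instance, here is a rough C# sketch of the kind of hardware-specific path the JIT can take (System.Numerics.Vector is a real API, though on older .NET Framework versions it needs the System.Numerics.Vectors package; the array contents here are made up):

    using System;
    using System.Numerics;

    class JitSimdDemo
    {
        static void Main()
        {
            // The JIT decides at run time whether Vector<T> maps onto the
            // SIMD registers (SSE2/AVX) of the CPU it is running on.
            Console.WriteLine("Hardware accelerated: " + Vector.IsHardwareAccelerated);
            Console.WriteLine("Floats per vector:    " + Vector<float>.Count);

            float[] a = { 1, 2, 3, 4, 5, 6, 7, 8 };
            float[] b = { 8, 7, 6, 5, 4, 3, 2, 1 };
            float[] sum = new float[a.Length];

            // Process whole vectors first; the JIT emits SIMD instructions here
            // when the hardware supports them.
            int i = 0;
            int lastBlock = a.Length - a.Length % Vector<float>.Count;
            for (; i < lastBlock; i += Vector<float>.Count)
            {
                var va = new Vector<float>(a, i);
                var vb = new Vector<float>(b, i);
                (va + vb).CopyTo(sum, i);
            }

            // Scalar fall-back for any remaining elements.
            for (; i < a.Length; i++)
                sum[i] = a[i] + b[i];

            Console.WriteLine(string.Join(", ", sum));
        }
    }

On a CPU with SSE/AVX support the vectorised loop is JIT compiled to SIMD instructions; on older hardware the same IL simply runs as scalar code.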
A1) The JIT compiles the IL to native machine code.
A2) In .NET there is no such term as "sandbox"; there are AppDomains instead, and they run as part of the CLR (i.e. within the executing process).
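A minimal sketch of what that looks like on the .NET Framework (AppDomain.CreateDomain is not supported on .NET Core; the domain name below is arbitrary):

    using System;

    class AppDomainDemo
    {
        static void ReportDomain()
        {
            Console.WriteLine("Now in: " + AppDomain.CurrentDomain.FriendlyName);
        }

        static void Main()
        {
            // Every process starts with a default AppDomain.
            Console.WriteLine("Running in: " + AppDomain.CurrentDomain.FriendlyName);

            // Additional domains live in the same process but isolate loaded
            // assemblies and static state from one another.
            AppDomain second = AppDomain.CreateDomain("SecondDomain");
            second.DoCallBack(ReportDomain);

            AppDomain.Unload(second);
        }
    }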
A3) NGen drawbacks, from Jeffrey Richter:

NGen'd files can get out of sync. When the CLR loads an NGen'd file, it compares a number of characteristics about the previously compiled code and the current execution environment. If any of the characteristics don't match, the NGen'd file cannot be used, and the normal JIT compiler process is used instead.

Inferior Load-Time Performance (Rebasing/Binding). Assembly files are standard Windows PE files, and, as such, each contains a preferred base address. Many Windows developers are familiar with the issues surrounding base addresses and rebasing. When JIT compiling code, these issues aren't a concern because correct memory address references are calculated at run time.

Inferior Execution-Time Performance. When compiling code, NGen can't make as many assumptions about the execution environment as the JIT compiler can. This causes NGen.exe to produce inferior code. For example, NGen won't optimize the use of certain CPU instructions; it adds indirections for static field access because the actual address of the static fields isn't known until run time. NGen inserts code to call class constructors everywhere because it doesn't know the order in which the code will execute and if a class constructor has already been called.
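For reference, a typical NGen session looks roughly like this, run from an elevated prompt in the framework directory (MyApp.exe is just a placeholder):

    rem Pre-compile an assembly (and its dependencies) to native images
    ngen install MyApp.exe

    rem Show which native images are installed for it
    ngen display MyApp

    rem Remove the native images again
    ngen uninstall MyApp.exe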
You can use NGEN to create native versions of your .NET assemblies. Doing this means that the JIT does not have to compile them at runtime.
.NET is compiled to IL first and then to native since the JIT was designed to optimize IL code for the current CPU the code is running under.
.NET code is compiled to IL for compatibility. Since you can create code using C#, VB.NET, etc., the JIT needs a common instruction set (IL) in order to compile to native code. If the JIT had to be aware of languages, it would need to be updated every time a new .NET language was released.
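To make that concrete, here is a trivial C# method and, roughly, the IL it compiles to in a Release build; the VB.NET equivalent produces essentially the same IL (exact output varies by compiler version):

    // C# source
    public static int Add(int a, int b)
    {
        return a + b;
    }

    // Approximate IL, as ildasm would show it:
    //   .method public hidebysig static int32 Add(int32 a, int32 b) cil managed
    //   {
    //       .maxstack 2
    //       ldarg.0   // push argument 'a' onto the evaluation stack
    //       ldarg.1   // push argument 'b'
    //       add       // add the top two stack values
    //       ret       // return the result
    //   }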
I'm not sure about the sandbox question, my best guess is that a .NET app runs with 3 application domains. One domain contains the .NET runtimes (mscorlib, system.dll, etc), another domain contains your .NET code, and I can't recall what the other domain's for.
Check out http://my.safaribooksonline.com/9780321584090
1. C# is compiled into CIL (or IL) because it shares a platform with the rest of the .NET languages (which is why you can write a DLL in C# and use it in VB.NET or F# without hassle). The CLR will then JIT compile the code into native machine code.
.NET can also be run on multiple platforms (Mono on *NIX and OS X). If C# compiled to native code, this wouldn't be nearly as easy.
2. There is no sandbox.
3. Covered in the answer to #1
A1) This way it's platform agnostic (Windows, Linux, Mac) and it can also use specific optimizations for your current hardware. When it gets JIT compiled, it is compiled to machine code.
A2) The .NET Framework itself acts as the sandbox: any call your app makes goes through the framework, which enforces the runtime's checks.
A3) As in answer 1, it allows the .NET binary to work on different platforms and to apply optimizations specific to the client machine on the fly.
Compiled .NET code becomes IL, which is an intermediate language in much the same way as Java's bytecode. Yes, it is possible to generate native machine code using the NGen tool. NGen binds the resulting native image to the machine, so copying an NGen'd binary to a different system would not produce the expected results. Compiling to intermediate code allows runtime decisions to be made that can't (easily) be made with a statically compiled language like C++. It also lets the code function on different hardware architectures, because the code becomes descriptive in the sense that it describes the intent of what should happen in a bitness-agnostic (e.g. 32 or 64 bit) way, as opposed to machine-specific code that only works on 32-bit systems or 64-bit systems but not both.
Also, NGen is optional because, as I said, it binds the binary to the system. It can be useful when you need the performance of compiled machine code together with the flexibility of a managed language, and you know that the binary won't be moving to a system it's not bound to.
Related
In .NET we have several platforms, each of which is composed of its own runtime, its own base libraries and its own supporting software for booting the runtime and so forth.
Given those different platforms, we can target a specific one when we compile our code. This means that we compile for a specific platform.
In the new .NET Core project model this is even clearer: in the frameworks section of the project.json file we specify the platforms we want to compile for by listing their TFMs.
My problem here is that, as I understand it, the main difference between developing for one platform or another is the set of base libraries available (for the full .NET we have the whole BCL, for instance). But this seems to be a "run time" issue rather than a "compile time" issue.
The reason being that the code is deployed as IL to the specific platform, and only when it is about to run will it be determined whether the necessary assemblies from the required base libraries are available, right?
In that case, why is there this idea of "compiling for a specific platform"? Is the compilation process different for each platform? Is the generated IL different for each platform?
The IL is different, but generally only slightly: for example, the assembly flags may differ, to indicate the target platform specified when compiled.
Of course, you may have conditionally-compiled code in your assembly, protected by #if directives. I assume you are not referring to that sort of difference. But just because the main part of the IL is the same from platform to platform, that doesn't necessarily mean you can run any IL on any platform.
Often, the target platform specified during compilation will be a critical choice, because the managed code engages in some kind of interop with native code that's available only for a specific architecture. Another reason is if the program for some reason requires the use of x64 architecture for virtual address space reasons (i.e. the process expects to need to allocate more than the nominal 3GB maximum available to x86 processes).
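As a small illustration (nothing project-specific assumed), the same IL can observe at run time which flavour of process it ended up in, which is exactly what the platform target influences:

    using System;

    class BitnessDemo
    {
        static void Main()
        {
            // Compiled as AnyCPU, the same IL reports different answers depending
            // on the process that loads it; compiled as x86 or x64, the answer is
            // fixed by the platform target chosen at compile time.
            Console.WriteLine("64-bit process: " + Environment.Is64BitProcess);
            Console.WriteLine("64-bit OS:      " + Environment.Is64BitOperatingSystem);
            Console.WriteLine("IntPtr.Size:    " + IntPtr.Size + " bytes");
        }
    }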
I would first like to say my goal is to convert MSIL into native x86 code. I am fine with my assemblies still needing the .NET Framework installed. NGEN is not what I want, as you still need the original assemblies.
I came across ilasm, and what I am wondering is: is this what I want? Will it produce pure assembly code?
I have looked at other projects like Mono (which does not support some of the key features my app uses) and .NET linkers, but they simply produce a single EXE bundled with the .NET Framework, which is not what I am looking for.
So far any research has come up with...you can't do it. I am really not sure why, as the JIT does it when it loads the MSIL assembly. I have my own reasons for wanting this, so I guess my question(s) come down to this:
Is the link I posted helpful in any way?
Is there anything out there that can turn MSIL into x86 assembly?
There are various third-party code-protection packages available that hide the IL by encrypting it and packing it with a special bootloader that only unpacks it during runtime. This might be an option if you're concerned about disassembly of your code, though most of these third-party packages are also already cracked (somewhat unavoidable, unfortunately.) Simple obfuscation may ultimately be just as effective, assuming this is your underlying goal.
One of the major challenges associated with 'pre-jitting' the IL is that you end up including fixed address references in the native code. These in turn will need to be 're-based' when the native code is loaded for execution under the CLR. This means you need more than just the logic that gets compiled; you also need all of the reference context information necessary to rebase the fixed references when the code is loaded. It's a lot more than just caching code.
As with most things, the first question should be why instead of how. I assume you have a specific goal in mind, if you want to generate native code yourself (also, why x86? Why not x64 too?). This is the job of the JIT compiler - to compile an optimized instruction set on a particular platform only when needed, and execute it later.
The best source I can recommend to try and understand how the CLR works and how JIT works is taking a look at SSCLI - an implementation of the CLR based on the ECMA-335 spec.
Have you considered not using C#? Given that the output of the C# compiler is MSIL, it would make sense to develop on a different platform if that is not what you want.
Alternatively it sounds like NGEN does the operation you are wanting, it just doesn't handle putting the entire thing into an executable. You could analyze the resultant NGEN image to determine what needs to be done to accomplish that (note that NGENed images are PE files per the documentation)
Here is some information on where NGEN stores its images: C:\windows\assembly\NativeImages_<CLR version>_<bitness>, for instance C:\windows\assembly\NativeImages_v2.0.50727_86. Note that .NET 3.0 and 3.5 both run on the 2.0 CLR.
I just looked at the source of Mono for the first time and I thought I would find a bunch of C or C++ code; instead I found 26,192 .cs files and 7 .cpp files.
I am not totally shocked, but it made me think of a question I've always had in the back of my mind:
How does a project end up being written in "itself" like this?
Was an older version of Mono more C/C++? Or was there an initial effort to create some kind of machine-code compiler...
What's the "trick" here?
Mono's compiler is written in C#. You may want to read about compiler bootstrapping.
You should be looking for .c files, instead of .cpp files: the mono runtime is written in C, not C++.
I think it is also important to remember that mono is both a virtual machine runtime (the JIT compiler, garbage collector, etc.) as well as a collection of class libraries that run on this framework (the System.Linq namespace, the XML parsers, etc.).
The majority of the .cs files you see are part of the class libraries. These are basically C# code that run like your own C# code (with some exceptions, but basically it doesn't make sense for everyone to reinvent and re-distribute the wheel over and over, so these are the C# "base" class libraries). This is why you can download complex mono programs as such small file sizes if mono is already installed on the machine.
For Mono, the JIT, runtime and garbage collector are largely written in C, as you would expect. If you ever get a low-level error, you will often see GNU debug tool dumps just as you would in C, only with a lot more useful information. The Mono framework is very good at taking any C# code and converting it to CIL code that can run anywhere, and the project uses whatever toolset is best suited to ensure the code really does run anywhere (which in this case meant a runtime written in C that builds on Linux).
What's the relation (if any) between MASM assembly language and ILASM? Is there a one-to-one conversion? I'm trying to incorporate Quantum GIS into a program I'm kind of writing as I go along! I have Quantum GIS on my computer, and neither RedGate Reflector nor the Object Browser of Visual Studio 2008 could open one of the DLLs in Quantum (one of several whose behaviour I don't have a strong clue about). I used the MASM assembly editor and "opened" the same DLL, and it spewed something I didn't expect to necessarily understand in the first place. How can I (can I?) convert that same "code" to something I can interact with in ILASM, and I'm assuming consequently in C#? Thanks a ton for reading and for all the responses to earlier questions... please bear in mind I'm relatively new to programming in C#, and even newer to MASM and ILASM.
MASM deals with x86 instructions and is platform/processor dependent, while ILASM deals with .NET CIL (Common Intermediate Language) instructions, which are platform/processor independent. Converting from something specific to something more general is hard to achieve; that's why, AFAIK, there is no converter from MASM to ILASM (the reverse direction does exist, though!).
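To illustrate the direction that does work, the SDK tools can round-trip a managed assembly through IL text (SomeLibrary.dll is just a placeholder name):

    rem Disassemble a managed assembly into IL source text
    ildasm /out=SomeLibrary.il SomeLibrary.dll

    rem Reassemble the IL text back into a managed DLL
    ilasm SomeLibrary.il /dll

Note that this only works on managed (.NET) assemblies; a native DLL such as the one you opened in the MASM editor will simply be rejected by ildasm because it has no CLR header.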
IL is a platform-independent layer of abstraction over native code. Code written for the .NET platform in C#, VB.NET, or any other .NET language compiles down to an assembly (.EXE/.DLL) containing IL. Typically, the first time a piece of IL code is executed, the .NET runtime JIT compiles it down to native code for the current machine and executes that; NGen can instead do this compilation ahead of time and cache the native images. This allows .NET code to be deployed to any platform supporting that .NET framework, regardless of the processor or architecture of the system.
As you've seen, Reflector is great for viewing the code in an assembly, because IL can easily be previewed in C# or VB.NET form. This is because IL instructions are generally a little higher level and the assembly also contains a lot of metadata that native code wouldn't normally have, such as class, method, and variable names.
It's also possible to pre-compile a .NET assembly to native images by calling Ngen.exe on it directly. Once done, it's really difficult to make sense of the native code.
There is no relationship between MASM assembly language and ILASM. I don't see any way for you to convert native code to IL code. IL can be understood only by the CLR, while MASM assembly language is native machine code. The CLR turns IL into native code at runtime.
I am running performance profile for a C# application on a virtual machine.
The results show a huge load from the "JIT Compiler". When I dig further, it shows something called "Class Loader" as the only method getting called by the JIT compiler.
What should I do to bring "JIT compiler" load down?
JIT is the 'Just In Time' compiler; this essentially compiles your C# into executable code that can run on the current processor.
.NET comes with a utility called NGEN; this creates a native image of your C# code that doesn't need to be JITted. There are downsides to this, however; have a read of this:
http://codeidol.com/csharp/net-framework/Assemblies,-Loading,-and-Deployment/Native-Image-Generation-%28NGen%29/
And finally here's a link to the MS info about NGEN:
http://msdn.microsoft.com/en-us/library/6t9t5wcf%28VS.80%29.aspx
You could try using NGEN to pre-JIT your assemblies to native images. This will lessen Jitting overhead on application load:
http://msdn.microsoft.com/en-us/library/6t9t5wcf(VS.80).aspx
You should run this tool on the machine where your assemblies are deployed, i.e. your virtual machine.