I am developing a compiler that emits IL code. It is important that the resulting IL is JIT'ted to the fastest possible machine code by the Mono and Microsoft .NET JIT compilers.
My questions are:
Does it make sense to optimize patterns like:
'stloc.0; ldloc.0; ret' => 'ret'
'ldc.i4.0; conv.r8' => 'ldc.r8 0.0'
and such, or are the JIT's smart enough to take care of these?
Is there a specification with the list of optimizations performed by Microsoft/Mono JIT compilers?
Is there any good read with practical recommendations / best practices to optimize IL so that JIT compilers can in turn generate the most optimal machine code (performance-wise)?
The two patterns you described are the easy stuff that the JIT actually gets right (except for non-primitive structs). In SSA form, constant propagation and elimination of dead values are very easy.
No, you have to test what the JIT can do. Look into compiler literature to see what standard optimizations to expect. Then, test for them. The two JITs that we have right now optimize very little and sometimes do not get the most basic stuff right. For example, MyStruct s; s.x = 1; s.x = 1; is not optimized by RyuJIT. s = s; isn't either. s.x + s.x loads x twice from memory. Expect little.
You need to understand what machine code the basic operations map to. This is not too complicated. Try a few things and look at the disassembly listing. You'll quickly get a feel for what the output is going to look like.
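For example, a minimal probe for the struct cases mentioned above (a sketch; compile in Release mode, run without a debugger attached, and inspect the machine code in the Visual Studio Disassembly window or a similar tool):

    using System;

    struct MyStruct { public int x; }

    static class JitProbe
    {
        static int Test()
        {
            MyStruct s;
            s.x = 1;
            s.x = 1;          // redundant store: does the JIT eliminate it?
            return s.x + s.x; // is x loaded from memory once or twice?
        }

        static void Main()
        {
            Console.WriteLine(Test()); // prints 2; the interesting part is the disassembly
        }
    }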
Redundant conversions and loads/stores like that are a pretty inevitable side-effect of a recursive descent parser. You can technically get rid of them with a peephole optimizer, as sketched below. But it is nothing to worry about; the C# and VB.NET compilers generate them as well.
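A minimal sketch of such a peephole pass, operating over plain mnemonic strings rather than a real IL instruction stream:

    using System.Collections.Generic;

    static class Peephole
    {
        // Rewrites 'stloc.0; ldloc.0; ret' to 'ret' in a linear instruction
        // stream. Only valid if local 0 is not read anywhere else.
        public static List<string> Optimize(IReadOnlyList<string> code)
        {
            var result = new List<string>();
            for (int i = 0; i < code.Count; i++)
            {
                if (i + 2 < code.Count &&
                    code[i] == "stloc.0" && code[i + 1] == "ldloc.0" && code[i + 2] == "ret")
                {
                    result.Add("ret"); // the stored value is only reloaded to be returned
                    i += 2;
                    continue;
                }
                result.Add(code[i]);
            }
            return result;
        }
    }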
The existing .NET/Mono jitters are very good at optimizing them away. They focus on optimizing the code that really matters for execution speed, the machine code. With the very nice advantage that anybody that writes a compiler that generates IL automatically benefits from these optimizations without having to do anything special.
Jitter optimizations are covered in this post.
There is a lot of contradictory information about this. While some say C# is compiled (as it is compiled into IL and then to native code when run), others say it's interpreted because it needs .NET. EN Wiki says:
Many interpreted languages are first compiled to some form of virtual machine code, which is then either interpreted or compiled at runtime to native code.
So I'm quite confused. Could anyone explain this clearly?
C# is compiled into IL by the C# compiler.
This IL is then compiled just-in-time (JIT) as it's needed, into the native assembly language of the host machine. It would be possible to write a .NET runtime that interpreted the IL instead, though. Even if this were done, I'd still argue that C# is a compiled language.
A purely compiled language has some advantages. Speed, as a rule, and often working set size.
A purely interpreted language has some advantages. Flexibility of not needing an explicit compilation stage that allows us to edit in place, and often easier portability.
A jitted language fits in a middle ground in this case.
That alone is a reason why we might think of a jitted language as either compiled or interpreted, depending on which metric we care most about attaining and on our prejudices for and against one or the other.
C# can also be compiled on first run, as happens in ASP.NET, which makes it close to interpreted in that case (though it's still compiled to IL and then jitted). Certainly, it has pretty much all the advantages of interpreted in this case (compare with VBScript or JScript used in classic ASP), along with many of the advantages of compiled.
Strictly, no language is jitted, interpreted, or compiled qua language. We can NGen C# to native code (though if it does something like dynamically loading an assembly, it will still use IL and jitting). We could write an interpreter for C or C++ (several people have done so). In its most common use case, though, C# is compiled to IL which is then jitted, which is not quite the classic definition of interpreted nor of compiled.
Too many semantics and statements based on opinion.
First off: C# isn't an interpreted language; the CLR and JVM are considered "runtimes" or "middleware", but the same name applies to things like Perl. This creates a lot of confusion among people concerned with names.
The term "Interpreter" referencing a runtime generally means existing code interprets some non-native code. There are two large paradigms: Parsing reads the raw source code and takes logical actions; bytecode execution first compiles the code to a non-native binary representation, which requires much fewer CPU cycles to interpret.
Java originally compiled to bytecode, then went through an interpreter; now, the JVM reads the bytecode and just-in-time compiles it to native code. CIL does the same: The CLR uses just-in-time compilation to native code.
Consider all the combinations of running source code, running bytecode, compiling to native, just-in-time compilation, running source code through a compiler to just-in-time native, and so forth. The semantics of whether a language is compiled or interpreted become meaningless.
As an example: many interpreted languages use just-in-time bytecode compilation. C# compiles to CIL, which JIT compiles to native; by contrast, Perl immediately compiles a script to a bytecode, and then runs this bytecode through an interpreter. You can only run a C# assembly in CIL bytecode format; you can only run a Perl script in raw source code format.
Just-in-time compilers also run a lot of external and internal instrumentation. The runtime tracks the execution of various functions, and then adjusts the code layout to optimize branches and code organization for its particular execution flow. That means JIT code can run faster than native-compiled code (like C++ typically is, or like C# run through IL2CPP), because the JIT adjusts its optimization strategy to the actual execution case of the code as it runs.
Welcome to the world of computer programming. We decided to make it extremely complicated, then attach non-descriptive names to everything. The purpose is to create flamewars over the definition of words which have no practical meaning.
If you feel, or learned, or are old school enough to believe, that "compiled" means an EXE going straight from source to machine code, then C# is interpreted.
If you think compiled means converting source code into any other code, such as bytecode, then yes, it's compiled. For me, anything that takes run-time processing to work in the OS it was built for is interpreted.
Look here: http://msdn.microsoft.com/library/z1zx9t92
Source code written in C# is compiled into an intermediate language (IL) that conforms to the CLI specification.
(...)
When the C# program is executed, the assembly is loaded into the CLR, which might take various actions based on the information in the manifest. Then, if the security requirements are met, the CLR performs just in time (JIT) compilation to convert the IL code to native machine instructions.
First off let's understand the definitions of interpreted and compiled.
"Compile" (when referring to code) means to translate code from one language to another. Typically from human readable source code into machine code that the target processer can... process.
"Interpret" (when referring to code) ALSO means to translate code from one language to another. But this time it's typically used to go from human readable source code into an intermediate code which is taken by a virtual machine which interprets it into machine code.
Just to be clear
Source code -> Compiler -> Machine code
Source code -> Compiler -> Byte Code -> Interpreter -> Machine code
Any language can, in theory, be interpreted or compiled. Typically Java is compiled into bytecode, which is interpreted by the Java virtual machine into machine code. C# is typically compiled into bytecode, which is then compiled by the CLR, the Common Language Runtime, another virtual machine.
By and large, the whole thing is a marketing gimmick. The term "interpreted" was added (or at least increased in usage) to help showcase how neat just-in-time compiling was. But they could have just used "compiled". The distinction is more a study of the English language and business trends than anything of a technical nature.
C# is both interpreted and compiled in its lifetime. C# is compiled to a virtual language which is interpreted by a VM.
The confusion stems from the fuzzy concept of a "Compiled Language".
"Compiled Language" is a misnomer, in a sense, because compiled or interpreted is not a property of the language but of the runtime.
e.g. you could write a C interpreter, but people still call C a "Compiled Language", because C implementations compile to machine code and the language was designed with compilation in mind.
Most languages, if not all, require an interpreter that translates their scripts into machine code so that the CPU can understand and execute them!
Each language handles the translation process differently!
For example, "AutoIt" is what we can describe as being a 100% interpreted language!
why?
Because "AutoIt" interpreter is constantly needed while its script is being executed! See example below:
Loop, 1000
Any-Code
"AutoIt" interpreter would have to translate "Any-Code" 1000 times to machine code, which automatically makes "AutoIt" a slow language!
On the other hand, C# handles the translation process differently: C#'s interpreter is required only once, before script execution; after that it is not required anymore during script execution!
C#'s interpreter would have to translate "Any-Code" only once to machine code, which automatically makes "C#" a fast language!
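To make the contrast concrete, here is the same idea as a C# sketch (AnyCode is a hypothetical stand-in for "Any-Code"):

    class Example
    {
        static void AnyCode() { /* stand-in for "Any-Code" */ }

        static void Main()
        {
            for (int i = 0; i < 1000; i++)
            {
                // The JIT translates AnyCode to machine code once, on the
                // first call; the remaining 999 iterations reuse that code.
                AnyCode();
            }
        }
    }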
So basically,
A language that requires its interpreter during script execution is an "Interpreted Language"!
A language that requires its interpreter only once (before script execution) is a "Compiled Language"!
Finally,
"AutoIt" is an "Interpreted Language"!
"C#" is a "Compiled Language"!
I believe this is a pretty old topic.
From my point of view, interpreted code goes through an interpreter, which translates and executes it line by line at the same time. JavaScript, for example, is interpreted code: when a line of JavaScript runs into an error, the script just breaks at that point.
Compiled code, on the other hand, goes through a compiler, which translates all the code into another form at once, without executing it first. The execution happens in another context.
If we agree with the definition of an interpreter («In computer science, an interpreter is a computer program that directly executes, i.e. performs, instructions written in a programming or scripting language, without requiring them previously to have been compiled into a machine language program»), there is no doubt: C# is not an interpreted language.
Interpreter on Wikipedia
C#, like Java, has a hybrid language processor. Hybrid processors perform the jobs of both interpretation and compilation.
Since a computer can only execute binary code, any language will lead to the production of binary code at one point or another.
The question is: does the language let you produce a program in binary code?
If yes, then it is a compiled language: by definition, "compiled" in "compiled language" refers to compilation into binary code, not transformation into some intermediary code.
If the language leads to the production of such intermediary code for a program, an additional piece of software is needed to perform the binary compilation from this code: it is then an interpreted language.
Is a program "compiled" by C# directly executable on a machine with no other software at all installed on that machine? If not, then it is an interpreted language.
For an interpreted language, it is an interpreter that generates the underlying binary code, most of the time in a dynamic way, since this mechanism is the basis of the flexibility of such languages.
Note: sometimes this does not look obvious, because the interpreter is bundled into the OS.
C# is a compiled language.
The opinion that there is an interpreter for the C# language (I have met that kind of opinion too) is probably due to projects like
C# Interpreter Console
or, for example, the famous
LINQPad
where you can write just a few lines of code and execute them, which leads people to think that it's a Python-like language. That is not true: these tools compile those lines and execute them, like an ordinary compiled programming language (from a workflow point of view).
A great debate is going on here. I have read all the answers and want to offer some conclusions, based on my research and on the concept of programming language implementation.
There is a concept of Programming language implementation.
Programming language implementation: In computer programming, a programming language implementation is a system for executing computer programs. There are two general approaches to programming language implementation:
Compilation:
The program is read by a compiler, which translates it into some other language, such as bytecode or machine code. The translated code may either be directly executed by hardware, or serve as input to another interpreter or another compiler.
Interpretation:
An interpreter is a computer program that directly executes instructions written in a programming or scripting language, without requiring them previously to have been compiled into a machine language program. An interpreter generally uses one of the following strategies:
Parse the source code and perform its behavior directly.
Translate source code into some efficient intermediate representation or object code and immediately execute that.
Explicitly execute stored precompiled bytecode made by a compiler and matched with the interpreter Virtual Machine.
Conclusions:
So any language that converts its code to intermediate bytecode or machine code is a compiled language.
There are multiple types of interpreters, such as bytecode interpreters, interpreters with just-in-time compilation, etc.
Famous Compiled Languages:
Java, C#, C, C++, Go, Kotlin, Rust
Famous Interpreted Languages:
JavaScript, PHP, Python
Directly compiled languages vs. languages that are compiled to bytecode first:
Bytecode interpreters (virtual machines) are generally slower than direct execution of machine code. Any interpreter has some overhead when converting bytecode to actual machine instructions.
Interpreters do line-by-line execution.
So, directly compiled languages like C++/C are faster than Java/C# (the sketch below shows where the interpreter overhead comes from).
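A toy switch-dispatch bytecode interpreter in C# makes the overhead visible (an illustrative sketch with a made-up two-instruction bytecode, not any real VM):

    static class ToyVm
    {
        // Every operation pays for the fetch/decode/dispatch loop below;
        // compiled (or JIT-compiled) machine code has no such loop.
        public static int Run(byte[] code)
        {
            int acc = 0, pc = 0;
            while (pc < code.Length)
            {
                switch (code[pc++])
                {
                    case 0: acc += code[pc++]; break; // ADD immediate
                    case 1: acc *= code[pc++]; break; // MUL immediate
                    case 2: return acc;               // RET
                }
            }
            return acc;
        }
    }

For example, ToyVm.Run(new byte[] { 0, 5, 1, 3, 2 }) computes (0 + 5) * 3 = 15, paying the dispatch cost on every instruction.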
There is an implementation of C# that is a compiled language.
It is RemObjects C# (the Island platform), which compiles directly to binary machine code and runs without a VM and without a runtime, instead using platform APIs directly (Win32 on Microsoft, Cocoa on Apple, and POSIX on Linux).
RemObjects C# also interoperates directly with compiled C/C++ libraries, because its calling convention maps to the C calling convention.
My professor asked us this question: what are the differences between C# (.NET) compiler and Java compiler technologies?
Both the Java and C# compilers compile to a "machine code" for an intermediate virtual machine that is independent of the ultimate execution platform; the JVM and CLR respectively.
JVM was originally designed solely to support Java. While it is possible to compile languages other than Java to run on a JVM, there are aspects of its design that are not entirely suited to certain classes of language. By contrast, the CLR and its instruction set were designed from day one to support a range of languages.
Another difference is in the way that JIT compilation works. According to Wikipedia, CLR is designed to run fully compiled code, so (presumably) the CLR's JIT compiler must eagerly compile the entire application before starting. (I also gather that you can compile the bytecodes to native code ahead of time.) By contrast, the Hotspot JVMs use true "just in time" compilation. Bytecode methods are initially executed by the JVM using a bytecode interpreter, which also gathers trace information about execution paths taken within the method. Those methods that are executed a number of times then get compiled to native code by the JIT compiler, using the captured trace information to help in the code optimization. This allows the native code to be optimized for the actual execution platform and even for the behaviour of the current execution of the application.
Of course, the C# and Java languages have many significant differences, and the corresponding compilers are different because of the need to handle these linguistic differences. For example, some C# compilers do more type inferencing ... because the corresponding C# language version relies more on inferred types. (And note that both the Java and C# languages have evolved over time.)
In terms of the compiler, the largest difference I can think of (except the obvious "inputs" and "outputs") is the generics implementation, since both have generics, but implemented very differently (type erasure vs. runtime-assisted). The boxing model is obviously different, but I'm not sure that is huge for the compiler.
There are obvious differences in features in terms of anonymous methods, anonymous inner classes, lambdas, delegates, etc., but that is hard to compare 1:1. Ultimately, though, only your professor knows the answer he is looking for (and, with all due respect to professors, don't be surprised if his answer is a year or more out of date with respect to the bleeding edge).
One difference is that the C# compiler has some type-inference capabilities that a Java compiler wouldn't have (although Java 7 may change this). As a simple example, in Java you have to type Map<String, List<String>> anagrams = new HashMap<String, List<String>>(); while in C# you can use var anagrams = new Dictionary<string, List<string>>(); (and you can create very large, complex expressions in C# without ever having to name a type).
Another difference is that the C# compiler can create expression trees, enabling you to pass descriptions of a function to another function. For example, (Func<int,int>) x => x * 2 is a function that takes an int and doubles it, while (Expression<Func<int,int>>) x => x * 2 is a data structure that describes a function that takes an int and doubles it. You can take this description and compile it into a function (to run locally) or translate it into SQL (to run as part of a database query).
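A short sketch of the difference, using only the standard System.Linq.Expressions APIs:

    using System;
    using System.Linq.Expressions;

    class ExpressionDemo
    {
        static void Main()
        {
            Func<int, int> doubler = x => x * 2;                 // compiled code
            Expression<Func<int, int>> description = x => x * 2; // a description of the code

            Console.WriteLine(description.Body); // prints "(x * 2)": the tree can be inspected
            Func<int, int> compiled = description.Compile();
            Console.WriteLine(compiled(21));     // prints 42: or compiled and run locally
        }
    }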
http://www.scribd.com/doc/6256795/Comparison-Between-CLR-and-JVM
I think this will give you a basic idea.
When I read F# material, it talks about inlining methods, but I thought .NET didn't expose this functionality to programmers. If it's exposed, then it has to be in the IL? And if so, can C# make use of it as well?
Just wondering if this thing is the same as C++ inline functionality.
It is actually more complicated than C++ inlining, because F# works on top of .NET, which has IL as an intermediate language, so there are actually two layers where inlining can be done:
At the F# -> IL level - The inline keyword allows you to specify that an F# function should be inlined when generating .NET IL code. In this case, the IL instructions of the function will be placed in place of an IL instruction representing a method call.
At the IL -> assembly level - This is fully controlled by the JITter (the .NET just-in-time compiler), which compiles the IL (intermediate language) to actual executable assembly code. This is done automatically, so you cannot directly specify that something should be inlined at this level. However, the JITter also inlines some simple calls (such as calls to property getters and setters).
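As a related aside, the CLR does accept per-method hints for this second level through MethodImplAttribute; here is a minimal C# sketch (C# is used because it has no inline keyword; AggressiveInlining requires .NET 4.5 or later, and these attributes are hints to or constraints on the JIT, not guarantees):

    using System.Runtime.CompilerServices;

    class Point
    {
        public int X { get; set; } // simple getters/setters are typical JIT inlining candidates

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public int DoubledX() => X * 2; // hint: inline if possible

        [MethodImpl(MethodImplOptions.NoInlining)]
        public int DoubledXNoInline() => X * 2; // constraint: always emit a real call
    }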
To answer some of your specific questions, inline is an F#-specific construct that interacts with both the type system (e.g. static member constraints) and code generation (inlining code for optimization purposes). The F# compiler deals with these things, and the information regarding inlining is stored in F#-specific metadata in the assembly, which enables functions to be inlined across F# assembly boundaries.
Guess I'll post as an answer... didn't really want to, because I don't know anything about F# beyond the basics. :p
http://msdn.microsoft.com/en-us/library/dd548047%28VS.100%29.aspx
What is the difference between the JIT compiler and CLR? If you compile your code to il and CLR runs that code then what is the JIT doing? How has JIT compilation changed with the addition of generics to the CLR?
You compile your code to IL, which gets compiled to machine code and executed at runtime; this is what's called JIT.
Edit, to flesh out the answer some more (still overly simplified):
When you compile your C# code in Visual Studio, it gets turned into IL that the CLR understands. The IL is the same for all languages running on top of the CLR (which is what enables the .NET runtime to host several languages and interop between them easily).
At runtime the IL is compiled into machine code (which is specific to the architecture you're on) and then executed. This process is called Just-In-Time compilation, or JIT for short. Only the IL that is needed is transformed into machine code (and only once; it's "cached" once it's been compiled into machine code), just in time before it's executed, hence the name JIT.
This is what it would look like for C#
C# Code > C# Compiler > IL > .NET Runtime > JIT Compiler > Machine code > Execution
And this is what it would look like for VB
VB Code > VB Compiler > IL > .NET Runtime > JIT Compiler > Machine code > Execution
And as you can see, only the first two steps are unique to each language; everything after the code has been turned into IL is the same, which is, as I said before, the reason you can run several different languages on top of .NET.
The JIT is one aspect of the CLR.
Specifically, it is the part responsible for changing CIL (hereafter called IL) produced by the original language's compiler (csc.exe for Microsoft C#, for example) into machine code native to the current processor (and the architecture it exposes in the current process, for example 32/64-bit). If the assembly in question was ngen'd, then the JIT process is completely unnecessary and the CLR will run this code just fine without it.
Before a method that has not yet been converted from the intermediate representation is used, it is the JIT's responsibility to convert it.
Exactly when the JIT kicks in is implementation-specific, and subject to change. However, the CLR design mandates that the JIT happen before the relevant code executes; JVMs, in contrast, are free to interpret the code for a time while a separate thread creates a machine code representation.
The 'normal' CLR uses a pre-JIT stub approach, whereby methods are JIT compiled only as they are used. This involves having the initial native method stub be an indirection that instructs the JIT to compile the method, then modifies the original call to skip past the initial stub. The current Compact edition instead compiles all methods on a type when it is loaded.
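A toy analogy of that stub-and-patch mechanism in C# (this is not how the CLR is actually implemented; real stubs patch native call sites, not delegates):

    using System;

    class LazyMethod
    {
        private Func<int, int> _target;

        public LazyMethod()
        {
            _target = CompileThenInvoke; // initial "stub"
        }

        private int CompileThenInvoke(int x)
        {
            Func<int, int> compiled = y => y * 2; // stand-in for the JIT's output
            _target = compiled;                   // patch: later calls bypass the stub
            return compiled(x);
        }

        public int Invoke(int x) => _target(x);  // first call "JITs", later calls don't
    }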
To address the addition of Generics.
This was the last major change to the IL specification and JIT in terms of its semantics as opposed to its internal implementation details.
Several new IL instructions were added, and more metadata options were provided for instrumenting types and members.
Constraints were added at the IL level as well.
When the JIT compiles a method which has generic arguments (either explicitly or implicitly through the containing class) it may set up different code paths (machine code instructions) for each type used. In practice the JIT uses a shared implementation for all reference types since variables for these will exhibit the same semantics and occupy the same space (IntPtr.Size).
Each value type will get specific code generated for it; dealing with the reduced/increased size of the variables on the stack/heap is a major reason for this. Also, by emitting the constrained opcode before method calls, many invocations on non-reference types need not box the value to call the method (this optimization is used in non-generic cases as well). This also allows the default(T) behaviour to be handled correctly, and comparisons to null to be stripped out as no-ops (always false) when a non-nullable value type is used.
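An illustrative sketch of that sharing, assuming nothing beyond plain generic methods:

    static class GenericsDemo
    {
        static T Identity<T>(T value) => value;
        // Identity<string> and Identity<object> share one native body:
        // all reference types are pointer-sized and behave identically here.
        // Identity<int> and Identity<double> each get a specialized body,
        // sized for the value type.

        static bool IsNull<T>(T value) => value == null;
        // For a non-nullable value type T, 'value == null' is constant
        // false, so the JIT can strip the comparison out entirely.
    }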
If an attempt is made at runtime to create an instance of a generic type via reflection then the type parameters will be validated by the runtime to ensure they pass any constraints. This does not directly affect the JIT unless this is used within the type system (unlikely though possible).
As Jon Skeet says, JIT is part of the CLR. Basically this is what is happening under the hood:
Your source code is compiled into a bytecode known as the Common Intermediate Language (CIL).
Metadata from every class and every method (and every other thing :O) is included in the PE header of the resulting executable (be it a dll or an exe).
If you're producing an executable, the PE header also includes a conventional bootstrapper, which is in charge of loading the CLR (Common Language Runtime) when you execute your executable.
Now, when you execute:
1. The bootstrapper initializes the CLR (mainly by loading the mscorlib assembly) and instructs it to execute your assembly.
2. The CLR executes your main entry point.
3. Now, classes have a vector table which holds the addresses of their method functions, so that when you call MyMethod, this table is searched and a call to the corresponding address is made. At startup, ALL entries, for all tables, hold the address of the JIT compiler.
4. When a call to one of these methods is made, the JIT is invoked instead of the actual method and takes control. The JIT then compiles the CIL code into actual assembly code for the appropriate architecture.
5. Once the code is compiled, the JIT goes into the method vector table and replaces the address with that of the compiled code, so that every subsequent call no longer invokes the JIT.
6. Finally, the JIT hands execution over to the compiled code.
7. If you call another method which hasn't yet been compiled, go back to 4... and so on.
The JIT is basically part of the CLR. The garbage collector is another. Quite where you put interop responsibilities etc is another matter, and one where I'm hugely underqualified to comment :)
I know the thread is pretty old, but I thought I might put in the picture that made me understand JIT. It's from the excellent book CLR via C# by Jeffrey Richter. In the picture, the metadata he is talking about is the metadata emitted into the assembly header, where all information about the types in the assembly is stored.
1) While compiling a .NET program, the program code is converted into Intermediate Language (IL) code.
2) Upon executing the program, the IL code is converted into the operating system's native code as and when each method is called; this is called JIT (Just-in-Time) compilation.
1. The Common Language Runtime (CLR) acts as the interpreter, while the Just-In-Time (JIT) compiler is the compiler in the .NET Framework.
2. The JIT is the internal compiler of .NET: it takes Microsoft Intermediate Language (MSIL) code from the CLR and compiles it into machine-specific instructions, whereas the CLR works as an engine whose main task is to provide MSIL code to the JIT, ensuring that code is fully compiled as per the machine's specification.