Detecting JIT Optimizations or missed Optimizations

Detecting JIT Optimizations or missed Optimizations - c#

The .NET CLR JIT will; to my understanding; try to optimize code using patterns such as Method Inlining, Loop Unrolling, etc... In the case of Method Inlining this would not be performed for reasons such as the following:
Methods that are greater than 32 bytes of IL will not be inlined.
Virtual functions are not inlined.
Methods that have complex flow control will not be in-lined. Complex flow control is any flow control other than if/then/else; in this case, switch or while.
Methods that contain exception-handling blocks are not inlined, though methods that throw exceptions are still candidates for inlining.
If any of the method's formal arguments are structs, the method will not be inlined.
Etc...
My question is... Is there any way to detect what the JIT Optimization process is deciding to skip for these or other reason?
My thinking is that, I want to know what areas of code may need to be restructured to ensure I can take advantange of JIT optimizations.

Nowadays you can run your application on your own build of CoreCLR and gather all statistics you want. You can examine clrconfigvalues.h and enable any flag you want to get any related information (for example JitDump, using set COMPLUS_JitDump command in command prompt)
It's not quite easy, but it's possible.

Related

.NET JIT difference while using Factory Pattern and If-else or switch

I was just wandering, if I was to have a one method that handles 100 different cases based upon an enum let's say (assuming each case has on average 5 lines of code), would that actually impact performance ? How about instead of having all the code in one method, the Factory or Strategy pattern would to be used ?
The JIT only compiles the code which is actually needed at that point. So I guess it will compile the hole method of 100 cases right? It wouldn't actually know what part of that method is needed correct ? but if I was to split that method it will actually compile what it is needed right?. For example having an actions drop-down (list of 100 Car Brands)
How would that compare to each other in terms of performance?
Thanks.

I am afraid you are missing the essential thing that Jit does. It compiles cil which is CPU and platform independent into machine and platform specific machine code.
The JIT only compiles the code which is actually needed at that point.
Yes, the very first time your method is called then the Jit compiles it, hence the name Just-In-Time compiler.
So I guess it will compile the hole method of 100 cases right?
Yes.
It wouldn't actually know what part of that method is needed correct ? but if I was to split that method it will actually compile what it is needed right?
Now here the confusion comes, yes it will compile the whole method, but notice this will be only once per application startup. Since, Jit compiles at runtime we can say that there will be some negligible performance impact first time method was called, one can argue that splitting this method into multiple methods will require multiple Jit compilations.
With that being said, if you are working on some very high demand for performance application and you are worrying about Jit being drawback you can always use ngen and compile native images before application start and then CLR can use those images to spin up the process for you and this will remove Jit compilation as it was basically done before application was started. I can see how this might be useful for applications caring about cold startups such as applications hosted on function as a service platforms but thats as far as I can go with the real world examples.
And in the end,
How would that compare to each other in terms of performance?
to be honest I wouldn't be worried ever about Jit compilation impact, I'd rather focus on the data structures I am using or the domain specific logic I have in order to improve my performance.

The 32 bytes limit for inline function ... not too small?

I have a very small c# code marked as inline, but dont work.
I have seen that the longest function generates more than 32 bytes of IL code. Does the limit of 32 bytes too short ?
// inlined
[MethodImpl(MethodImplOptions.AggressiveInlining)]
static public bool INL_IsInRange (this byte pValue, byte pMin) {
return(pValue>=pMin);
}
// NOT inlined
[MethodImpl(MethodImplOptions.AggressiveInlining)]
static public bool INL_IsInRange (this byte pValue, byte pMin, byte pMax) {
return(pValue>=pMin&&pValue<=pMax);
}
Is it possible to change that limit?

I am looking for inline function criteria also. In your case, I believe that JIT optimization timed out before it could reach the decision to inline your second function. For JIT, it's not a priority to inline a function, so it was busy analyzing your long code. However, if you place your calls inside tight loops, JIT will probably inline them, as inner calls gain priority to inline. If you really care about this type of micro-optimization, it's time to switch to C++. It's a whole new brave world out there for you to explore and exploit!
I noticed that the question had been edited right after this answer had been posted, meaning a high level of interactivity. Well, I don't know why there is a limit of 32 bytes, but that seems to be exactly the size of a CPU cache block, conservatively speaking. What a coincidence! In any case, code optimization must be done with a particular hardware configuration, better saved in an extra file side by side with its assembly. The timeout policy is stupid, because optimization is not supposed to be done at run-time, competing against the precious code execution time. Optimization is supposed to be done at application load-time, only the first time it's run on the machine, once for all. It can be triggered again when hardware configuration change is detected. Again, if you really need performance, just go with C/C++. C# is not designed for performance and will never make performance its top priority. Like Java, C# is designed for safety, with a much stronger caution against possible negative performance impacts.

Up to the "32-bytes of IL" limit, there are a number of other factors which affect whether a method would be inlined or not. There are at least a couple of articles that describe these factors.
One article explains that a scoring heuristic is used to adjust an initial guess about the relative size of the code when inlined vs not (i.e. whether the call site is larger or smaller than the inlined code itself):
If inlining makes code smaller then the call it replaces, it is ALWAYS good. Note that we are talking about the NATIVE code size, not the IL code size (which can be quite different).
The more a particular call site is executed, the more it will benefit from inlning. Thus code in loops deserves to be inlined more than code that is not in loops.
If inlining exposes important optimizations, then inlining is more desirable. In particular methods with value types arguments benefit more than normal because of optimizations like this and thus having a bias to inline these methods is good.
Thus the heuristic the X86 JIT compiler uses is, given an inline candidate.
Estimate the size of the call site if the method were not inlined.
Estimate the size of the call site if it were inlined (this is an estimate based on the IL, we employ a simple state machine (Markov Model), created using lots of real data to form this estimator logic)
Compute a multiplier. By default it is 1
Increase the multiplier if the code is in a loop (the current heuristic bumps it to 5 in a loop)
Increase the multiplier if it looks like struct optimizations will kick in.
If InlineSize <= NonInlineSize * Multiplier do the inlining.
Another article explains several conditions that will prevent a method from being inlined based on their mere existence (including the "32-bytes of IL" limit):
These are some of the reasons for which we won't inline a method:
Method is marked as not inline with the CompilerServices.MethodImpl attribute.
Size of inlinee is limited to 32 bytes of IL: This is a heuristic, the rationale behind it is that usually, when you have methods bigger than that, the overhead of the call will not be as significative compared to the work the method does. Of course, as a heuristic, it fails in some situations. There have been suggestions for us adding an attribute to control these threshold. For Whidbey, that attribute has not been added (it has some very bad properties: it's x86 JIT specific and it's longterm value, as compilers get smarter, is dubious).
Virtual calls: We don't inline across virtual calls. The reason for not doing this is that we don't know the final target of the call. We could potentially do better here (for example, if 99% of calls end up in the same target, you can generate code that does a check on the method table of the object the virtual call is going to execute on, if it's not the 99% case, you do a call, else you just execute the inlined code), but unlike the J language, most of the calls in the primary languages we support, are not virtual, so we're not forced to be so aggressive about optimizing this case.
Valuetypes: We have several limitations regarding value types an inlining. We take the blame here, this is a limitation of our JIT, we could do better and we know it. Unfortunately, when stack ranked against other features of Whidbey, getting some statistics on how frequently methods cannot be inlined due to this reason and considering the cost of making this area of the JIT significantly better, we decided that it made more sense for our customers to spend our time working in other optimizations or CLR features. Whidbey is better than previous versions in one case: value types that only have a pointer size int as a member, this was (relatively) not expensive to make better, and helped a lot in common value types such as pointer wrappers (IntPtr, etc).
MarshalByRef: Call targets that are in MarshalByRef classes won't be inlined (call has to be intercepted and dispatched). We've got better in Whidbey for this scenario
VM restrictions: These are mostly security, the JIT must ask the VM for permission to inline a method (see CEEInfo::canInline in Rotor source to get an idea of what kind of things the VM checks for).
Complicated flowgraph: We don't inline loops, methods with exception handling regions, etc...
If basic block that has the call is deemed as it won't execute frequently (for example, a basic block that has a throw, or a static class constructor), inlining is much less aggressive (as the only real win we can make is code size)
Other: Exotic IL instructions, security checks that need a method frame, etc...

What is /optimize C# compiler key intended for?

Is there a full list of optimizations done by the /optimize C# compiler key available anywhere?
EDIT:
Why is it disabled by default?
Is it worth using in a real-world app? -- it is disabled by default only in Debug configuration and Enabled in Release.

Scott Hanselman has a blog post that shows a few examples of what /optimize (which is enabled in Release Builds) does.
As a summary: /optimize does many things with no exact number or definition given, but one of the more visible are method inlining (If you have a Method A() which calls B() which calls C() which calls D(), the compiler may "skip" B and C and go from A to D directly), which may cause a "weird" callstack in the Release build.

It is disabled by default for debug builds. For Release builds it is enabled.
It is definitely worth enabling this switch as the compiler makes lots of tweaks and optimizations depending on the kind of code you have.
For eg: Skipping redundant initializations, comparisons that never change etc.
Note: You might have some difficulty debugging if your turn on optimization as the code you have and the IL code that is generated may not match. This is the reason it is turned on only for Release builds.

Quoted from the MSDN page:
The /optimize option enables or
disables optimizations performed by
the compiler to make your output file
smaller, faster, and more efficient.
In other words, it does exactly what you think it would - optimises the compiled CIL (Common Intermediate Language) code that gets executed by the .NET VM. I wouldn't worry about what the specific optimisations are - suffice to say that they are many, and probably quite complex in some cases. If you are really interested in what sort of things it does, you could probably investigate the Mono C# Compiler (I doubt the details about the MS C# one are public).
The reason optimisation is disabled by default for Debug configurations is that it makes certain debugging features impossible. A few notable ones:
Perhaps most crucially, the Edit and Continue feature is disabled - i.e. no modifying code during execution.
Breaking execution often means the wrong line of code is highlighted (usually the one after the expected one).
Unused local variables aren't actually assigned or even declared.
Really, the default options for optimisation never ought to be changed. Having the option off for debugging is highly useful, while having it on for Release mode is equally wise.

Can I check if the C# compiler inlined a method call?

I'm writing an XNA game where I do per-pixel collision checks. The loop which checks this does so by shifting an int and bitwise ORing and is generally difficult to read and understand.
I would like to add private methods such as private bool IsTransparent(int pixelColorValue) to make the loop more readable, but I don't want the overhead of method calls since this is very performance sensitive code.
Is there a way to force the compiler to inline this call or will I do I just hope that the compiler will do this optimization?
If there isn't a way to force this, is there a way to check if the method was inlined, short of reading the disassembly? Will the method show up in reflection if it was inlined and no other callers exist?
Edit: I can't force it, so can I detect it?

No you can't. Even more, the one who decides on inlining isn't VS compiler that takes you code and converts it into IL, but JIT compiler that takes IL and converts it to machine code. This is because only the JIT compiler knows enough about the processor architecture to decide if putting a method inline is appropriate as it’s a tradeoff between instruction pipelining and cache size.
So even looking in .NET Reflector will not help you.

"You can check
System.Reflection.MethodBase.GetCurrentMethod().Name.
If the method is inlined, it will
return the name of the caller
instead."
--Joel Coehoorn

There is a new way to encourage more agressive inlining in .net 4.5 that is described here: http://blogs.microsoft.co.il/blogs/sasha/archive/2012/01/20/aggressive-inlining-in-the-clr-4-5-jit.aspx
Basically it is just a flag to tell the compiler to inline if possible. Unfortunatly, it's not available in the current version of XNA (Game Studio 4.0) but should be available when XNA catches up to VS 2012 this year some time. It is already available if you are somehow running on Mono.
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int LargeMethod(int i, int j)
{
if (i + 14 > j)
{
return i + j;
}
else if (j * 12 < i)
{
return 42 + i - j * 7;
}
else
{
return i % 14 - j;
}
}

Be aware that the XBox works different.
A google turned up this:
"The inline method which mitigates the overhead of a call of a method.
JIT forms into an inline what fulfills the following conditions.
The IL code size is 16 bytes or less.
The branch command is not used (if
sentence etc.).
The local variable is not used.
Exception handling has not been
carried out (try, catch, etc.).
float is not used as the argument or
return value of a method (probably by
the Xbox 360, not applied).
When two or more arguments are in a
method, it uses for the turn
declared.
However, a virtual function is not formed into an inline."
http://xnafever.blogspot.com/2008/07/inline-method-by-xna-on-xbox360.html
I have no idea if he is correct. Anyone?

Nope, you can't.
Basically, you can't do that in most modern C++ compilers either. inline is just an offer to the compiler. It's free to take it or not.
The C# compiler does not do any special inlining at the IL level. JIT optimizer is the one that will do it.

why not use unsafe code (inline c as its known) and make use of c/c++ style pointers, this is safe from the GC (ie not affected by collection) but comes with its own security implications (cant use for internet zone apps) but is excellent for the kind of thing it appears you are trying to achieve especially with performance and even more so with arrays and bitwise operations?
to summarise, you want performance for a small part of your app? use unsafe code and make use of pointers etc seems the best option to me
EDIT: a bit of a starter ?
http://msdn.microsoft.com/en-us/library/aa288474(VS.71).aspx

The only way to check this is to get or write a profiler, and hook into the JIT events, you must also make sure Inlining is not turned off as it is by default when profiling.

You can detect it at runtime with the aforementioned GetCurrentMethod call. But, that'd seem to be a bit of a waste[1]. The easiest thing to do would to just ILDASM the MSIL and check there.
Note that this is specifically for the compiler inlining the call, and is covered in the various Reflection docs on MSDN.
If the method that calls the GetCallingAssembly method is expanded inline by the compiler (that is, if the compiler inserts the function body into the emitted Microsoft intermediate language (MSIL), rather than emitting a function call), then the assembly returned by the GetCallingAssembly method is the assembly containing the inline code. This might be different from the assembly that contains the original method. To ensure that a method that calls the GetCallingAssembly method is not inlined by the compiler, you can apply the MethodImplAttribute attribute with MethodImplOptions.NoInlining.
However, the JITter is also free to inline calls - but I think a disassembler would be the only way to verify what is and isn't done at that level.
Edit: Just to clear up some confusion in this thread, csc.exe will inline MSIL calls - though the JITter will (probably) be more aggressive in it.
[1] And, by waste - I mean that (a) that it defeats the purpose of the inlining (better performance) because of the Reflection lookup. And (b), it'd probably change the inlining behavior so that it's no longer inlined anyway. And, before you think you can just turn it on Debug builds with an Assert or something - realize that it will not be inlined during Debug, but may be in Release.

Is there a way to force the compiler to inline this call or will I do I just hope that the compiler will do this optimization?
If it is cheaper to inline the function, it will. So don't worry about it unless your profiler says that it actually is a problem.
For more information
JIT Enhancements in .NET 3.5 SP1

For simple code, you can try to get asm even online: https://sharplab.io/
For more complex cases, try https://github.com/szehetner/InliningAnalyzer (I've not tried it yet).

Turning off JIT, and controlling codeflow in MSIL (own VM)

I'm writing my own scripting language in C#, with some features I like, and I chose to use MSIL as output's bytecode (Reflection.Emit is quite useful, and I dont have to think up another bytecode). It works, emits executable, which can be run ( even decompiled with Reflector :) )and is quite fast.
But - I want to run multiple 'processes' in one process+one thread, and control their assigned CPU time manually (also implement much more robust IPC that is offered by .NET framework) Is there any way to entirely disable JIT and create own VM, stepping instruction-after-instruction using .NET framework (and control memory usage, etc.), without need to write anything on my own, or to achieve this I must write entire MSIL interpret?
EDIT 1): I know that interpreting IL isn't the fastest thing in the universe :)
EDIT 2): To clarify - I want my VM to be some kind of 'operating system' - it gets some CPU time and divides it between processes, controls memory allocation for them, and so on. It doesnt have to be fast, nor effective, but just a proof of concept for some of my experiments. I dont need to implement it on the level of processing every instruction - if this should be done by .NET, I wont mind, i just want to say : step one instruction, and wait till I told you to step next.
EDIT 3): I realized, that ICorDebug can maybe accomplish my needs, now looking at implementation of Mono's runtime.

You could use Mono - I believe that allows an option to interpret the IL instead of JITting it. The fact that it's open source means (subject to licensing) that you should be able to modify it according to your needs, too.
Mono doesn't have all of .NET's functionality, admittedly - but it may do all you need.

Beware that MSIL was designed to be parsed by a JIT compiler. It is not very suitable for an interpreter. A good example is perhaps the ADD instruction. It is used to add a wide variety of value type values: byte, short, int32, int64, ushort, uint32, uint64. Your compiler knows what kind of add is required but you'll lose that type info when generating the MSIL.
Now you need to find it back at runtime and that requires checking the types of the values on the evaluation stack. Very slow.
An easily interpreted IL has dedicated ADD instructions like ADD8, ADD16, etc.

Microsofts implementation of the Common Language Runtime has only one execution system, the JIT. Mono, on the other hand comes with both, a JIT and an interpreter.
I, however, do not fully understand what exactly you want to do yourself and what you would like to leave to Microsofts implementation:
Is there any way to entirely disable JIT and create own VM?
and
... without need to write anything on my own, or to achieve this I must write entire MSIL interpret?
is sort of contradicting.
If you think, you can write a better execution system than microsofts JIT, you will have to write it from scratch. Bear in mind, however, that both microsofts and monos JIT are highly optimized compilers. (Programming language shootout)
Being able to schedule CPU time for operating system processes exactly is not possible from user mode. That's the operating systems task.
Some implementation of green threads might be an idea, but that is definitely a topic for unmanaged code. If that's what you want, have a look at the CLR hosting API.
I would suggest, you try to implement your language in CIL. After all, it gets compiled down to raw x86. If you don't care about verifiability, you can use pointers where necessary.

One thing you could consider doing is generating code in a state-machine style. Let me explain what I mean by this.
When you write generator methods in C# with yield return, the method is compiled into an inner IEnumerator class that implements a state machine. The method's code is compiled into logical blocks that are terminated with a yield return or yield break statement, and each block corresponds to a numbered state. Because each yield return must provide a value, each block ends by storing a value in a local field. The enumerator object, in order to generate its next value, calls a method that consists of a giant switch statement on the current state number in order to run the current block, then advances the state and returns the value of the local field.
Your scripting language could generate its methods in a similar style, where a method corresponds to a state machine object, and the VM allocates time by advancing the state machine during the time allotted. A few tricky parts to this method: implementing things like method calls and try/finally blocks are harder than generating straight-up MSIL.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.