Turning off the JIT and controlling code flow in MSIL (own VM) - C#

I'm writing my own scripting language in C#, with some features I like, and I chose to use MSIL as the output bytecode (Reflection.Emit is quite useful, and I don't have to think up another bytecode). It works, emits an executable that can be run (and even decompiled with Reflector :) ), and is quite fast.
But I want to run multiple 'processes' in one process and one thread, and control their assigned CPU time manually (and also implement much more robust IPC than is offered by the .NET framework). Is there any way to entirely disable the JIT and create my own VM, stepping instruction after instruction using the .NET framework (and controlling memory usage, etc.), without having to write everything myself? Or, to achieve this, must I write an entire MSIL interpreter?
EDIT 1: I know that interpreting IL isn't the fastest thing in the universe :)
EDIT 2: To clarify - I want my VM to be a kind of 'operating system': it gets some CPU time and divides it between processes, controls memory allocation for them, and so on. It doesn't have to be fast or efficient, just a proof of concept for some of my experiments. I don't need to implement it at the level of processing every instruction - if that part is handled by .NET, I won't mind; I just want to be able to say: step one instruction, then wait until I tell you to step the next.
EDIT 3: I've realized that ICorDebug may accomplish what I need; I'm now looking at the implementation of Mono's runtime.

You could use Mono - I believe it has an option to interpret the IL instead of JITting it. The fact that it's open source means (subject to licensing) that you should be able to modify it to suit your needs, too.
Mono doesn't have all of .NET's functionality, admittedly - but it may do all you need.

Beware that MSIL was designed to be parsed by a JIT compiler. It is not very suitable for an interpreter. A good example is perhaps the ADD instruction. It is used to add a wide variety of value types: byte, short, int32, int64, ushort, uint32, uint64. Your compiler knows what kind of add is required, but you lose that type information when generating the MSIL.
Now you need to recover it at runtime, and that requires checking the types of the values on the evaluation stack. Very slow.
An easily interpreted IL has dedicated ADD instructions like ADD8, ADD16, etc.
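To make that concrete, here is a small hypothetical Reflection.Emit sketch showing that the same untyped Add opcode is emitted whether the operands are int32 or float64 - the type information lives only on the evaluation stack:

using System;
using System.Reflection.Emit;

static class AddDemo
{
    static void Main()
    {
        // add two ints
        var addInts = new DynamicMethod("AddInts", typeof(int),
                                        new[] { typeof(int), typeof(int) });
        var il = addInts.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Add);          // same opcode...
        il.Emit(OpCodes.Ret);

        // add two doubles
        var addDoubles = new DynamicMethod("AddDoubles", typeof(double),
                                           new[] { typeof(double), typeof(double) });
        il = addDoubles.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Add);          // ...for a completely different machine operation
        il.Emit(OpCodes.Ret);

        var f = (Func<int, int, int>)addInts.CreateDelegate(typeof(Func<int, int, int>));
        var g = (Func<double, double, double>)addDoubles.CreateDelegate(typeof(Func<double, double, double>));
        Console.WriteLine(f(2, 3));        // 5
        Console.WriteLine(g(2.5, 3.25));   // 5.75
    }
}

An interpreter looking at the Add opcode alone cannot tell which of those two operations to perform; a JIT recovers the operand types by tracking the evaluation stack, which is exactly the work an interpreter would have to redo on every execution.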

Microsoft's implementation of the Common Language Runtime has only one execution system, the JIT. Mono, on the other hand, comes with both a JIT and an interpreter.
I, however, do not fully understand what exactly you want to do yourself and what you would like to leave to Microsoft's implementation:
Is there any way to entirely disable JIT and create own VM?
and
... without need to write anything on my own, or to achieve this I must write entire MSIL interpret?
are somewhat contradictory.
If you think you can write a better execution system than Microsoft's JIT, you will have to write it from scratch. Bear in mind, however, that both Microsoft's and Mono's JITs are highly optimized compilers. (Programming language shootout)
Scheduling CPU time for operating system processes exactly is not possible from user mode; that is the operating system's task.
Some implementation of green threads might be an idea, but that is definitely a topic for unmanaged code. If that's what you want, have a look at the CLR hosting API.
I would suggest you try to implement your language in CIL. After all, it gets compiled down to raw x86. If you don't care about verifiability, you can use pointers where necessary.

One thing you could consider doing is generating code in a state-machine style. Let me explain what I mean by this.
When you write generator methods in C# with yield return, the method is compiled into an inner IEnumerator class that implements a state machine. The method's code is compiled into logical blocks that are terminated with a yield return or yield break statement, and each block corresponds to a numbered state. Because each yield return must provide a value, each block ends by storing a value in a local field. The enumerator object, in order to generate its next value, calls a method that consists of a giant switch statement on the current state number in order to run the current block, then advances the state and returns the value of the local field.
Your scripting language could generate its methods in a similar style, where each method corresponds to a state machine object, and the VM allocates time by advancing the state machine during the time allotted, as in the sketch below. A few tricky parts to this approach: implementing things like method calls and try/finally blocks is harder than generating straight-up MSIL.
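To illustrate, here is a rough hand-written sketch (not the compiler's actual output) of the state-machine shape the C# compiler produces for an iterator like IEnumerable<int> Numbers() { yield return 1; yield return 2; }:

using System.Collections;
using System.Collections.Generic;

class NumbersStateMachine : IEnumerator<int>
{
    private int _state;       // which block runs next
    private int _current;     // the field holding the last yielded value

    public int Current => _current;
    object IEnumerator.Current => _current;

    public bool MoveNext()
    {
        switch (_state)       // the "giant switch" on the state number
        {
            case 0:
                _current = 1; // block ending in the first yield return
                _state = 1;
                return true;
            case 1:
                _current = 2; // block ending in the second yield return
                _state = 2;
                return true;
            default:
                return false; // yield break / end of method
        }
    }

    public void Reset() => _state = 0;
    public void Dispose() { }
}

A VM built this way would hold one such object per running script method and call MoveNext on each during its time slice, giving the "step, then wait until I tell you to step" behaviour described in the question.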

Related

.NET JIT difference while using Factory Pattern and If-else or switch

I was just wondering: if I were to have one method that handles 100 different cases based on an enum, say (assuming each case has on average 5 lines of code), would that actually impact performance? What if, instead of having all the code in one method, the Factory or Strategy pattern were used?
The JIT only compiles the code which is actually needed at that point. So I guess it will compile the whole method of 100 cases, right? It wouldn't actually know what part of that method is needed, correct? But if I were to split that method, it would actually compile only what is needed, right? For example, having an actions drop-down (a list of 100 car brands).
How would that compare to each other in terms of performance?
Thanks.
I am afraid you are missing the essential thing the JIT does: it compiles CIL, which is CPU- and platform-independent, into machine- and platform-specific machine code.
The JIT only compiles the code which is actually needed at that point.
Yes, the very first time your method is called, the JIT compiles it - hence the name Just-In-Time compiler.
So I guess it will compile the whole method of 100 cases, right?
Yes.
It wouldn't actually know what part of that method is needed, correct? But if I were to split that method, it would actually compile only what is needed, right?
Here is where the confusion comes in: yes, it will compile the whole method, but note that this happens only once, the first time the method is called. Since the JIT compiles at runtime, there will be some negligible performance impact the first time the method is called; one could argue that splitting it into multiple methods will instead require multiple JIT compilations.
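For illustration, here is a hypothetical sketch of the two shapes being compared (CarBrand and the handler names are made up). Either way, a method body is jitted once, on its first call; with the strategy-style dictionary, each lambda is a separate method that is only jitted when that particular brand is first handled:

using System;
using System.Collections.Generic;

enum CarBrand { Audi, Bmw, Citroen /* ... ~100 values ... */ }

static class Handlers
{
    // One method, one switch: the whole body is jitted the first time
    // HandleWithSwitch is called, regardless of which case runs.
    public static void HandleWithSwitch(CarBrand brand)
    {
        switch (brand)
        {
            case CarBrand.Audi:    Console.WriteLine("Audi logic");    break;
            case CarBrand.Bmw:     Console.WriteLine("BMW logic");     break;
            case CarBrand.Citroen: Console.WriteLine("Citroen logic"); break;
            // ... remaining cases ...
        }
    }

    // Strategy style: each delegate's body is a separate method, jitted
    // only when that brand is first handled.
    private static readonly Dictionary<CarBrand, Action> Strategies = new Dictionary<CarBrand, Action>
    {
        [CarBrand.Audi]    = () => Console.WriteLine("Audi logic"),
        [CarBrand.Bmw]     = () => Console.WriteLine("BMW logic"),
        [CarBrand.Citroen] = () => Console.WriteLine("Citroen logic"),
        // ... remaining entries ...
    };

    public static void HandleWithStrategy(CarBrand brand) => Strategies[brand]();
}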
With that being said, if you are working on an application with very high performance demands and you are worried about the JIT being a drawback, you can always use NGen to compile native images before the application starts; the CLR can then use those images to spin up the process, which removes JIT compilation since it was essentially done before the application started. I can see how this might be useful for applications that care about cold startup, such as applications hosted on function-as-a-service platforms, but that's as far as I can go with real-world examples.
And in the end,
How would that compare to each other in terms of performance?
to be honest, I would never worry about the impact of JIT compilation; I'd rather focus on the data structures I am using or on the domain-specific logic in order to improve performance.

The JIT compiler and its benefits for speeding up the execution of .NET programs compared with C++

As we know, .NET uses CIL and a JIT to execute programs, but these two stages may result in lower speed and performance compared with C++, which compiles all code in one stage. I want to know how .NET languages overcome this disadvantage and deal with it.
Having worked on both C++ compilers and now having spent the past few years working on the .Net JIT, I think there are a few things worth considering:
As many others have pointed out, the JIT is running in process with your app, and it tries to carefully balance quick JIT times versus the quality of jitted code. The more elaborate optimizations seen in C++ often come with very high compile time price tags, and there are some pretty sharp knees in the compile-time-vs-code-quality graph.
Prejitting seemingly can change this equation somewhat as the jit runs beforehand and could take more time, but prejitting's ability to enlarge optimization scope is quite limited (for instance we try and avoid introducing fragile cross-assembly dependencies, and so for example won't inline across assembly boundaries). So prejitted code tends to run somewhat more slowly than jitted code, and mainly helps application startup times.
.Net's default execution model precludes many interprocedural optimizations, because of dynamic class loading, reflection, and the ability of a profiler to update method bodies in a running process. We think, by and large, that the productivity and app architecture gains from these features are worth the trouble. But for cases where these features are not needed we are looking for ways to ensure that if your app doesn't need it, your app won't pay for it.
For example we have some "pure" AOT work going on over in CoreRT but as a consequence reflection is limited.
.Net Core 2.1 includes a preview of Tiered jitting, which will allow us to ease some of the constraints on jit time -- we'll be able to invest more time jitting methods that we know are frequently executed. So I would expect to see more sophisticated optimizations get added to the JIT over time.
.Net Core 2.1 also includes a preview of Hardware Intrinsics so you can take full advantage of the rich instruction sets available on modern hardware.
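For example, here is a minimal sketch using the shape the intrinsics API eventually shipped with (System.Runtime.Intrinsics in .NET Core 3.0 and later, rather than the 2.1 preview); hardware support must be checked at runtime:

using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static class IntrinsicsDemo
{
    static void Main()
    {
        if (Sse2.IsSupported)
        {
            Vector128<int> a = Vector128.Create(1, 2, 3, 4);
            Vector128<int> b = Vector128.Create(10, 20, 30, 40);
            Vector128<int> sum = Sse2.Add(a, b);   // compiles down to a single PADDD
            Console.WriteLine(sum);                // <11, 22, 33, 44>
        }
        else
        {
            Console.WriteLine("SSE2 not available; fall back to scalar code.");
        }
    }
}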
.Net's JIT does not yet get much benefit from profile feedback. This is something we are actively working on changing, though it will take time, and will likely be tied into tiering.
The .Net execution model fundamentally alters the way one needs to think about certain compiler optimizations. For instance, from the compiler's standpoint, many operations -- including low level things like field access -- can raise semantically meaningful exceptions (in C++ only calls/throws can cause exceptions). And .Net's GC is precise and relocating which imposes constraints on optimizations in other ways.

Repeated access to properties and speed in C# [duplicate]

Please ignore code readability in this question.
In terms of performance, should the following code be written like this:
int maxResults = criteria.MaxResults;
if (maxResults > 0)
{
while (accounts.Count > maxResults)
accounts.RemoveAt(maxResults);
}
or like this:
if (criteria.MaxResults > 0)
{
while (accounts.Count > criteria.MaxResults)
accounts.RemoveAt(criteria.MaxResults);
}
?
Edit: criteria is a class, and MaxResults is a simple integer property (i.e., public int MaxResults { get { return _maxResults; } }).
Does the C# compiler treat MaxResults as a black box and evaluate it every time? Or is it smart enough to figure out that I've got 3 calls to the same property with no modification of that property between the calls? What if MaxResults was a field?
One of the laws of optimization is precalculation, so I instinctively wrote this code like the first listing, but I'm curious if this kind of thing is being done for me automatically (again, ignore code readability).
(Note: I'm not interested in hearing the 'micro-optimization' argument, which may be valid in the specific case I've posted. I'd just like some theory behind what's going on or not going on.)
First off, the only way to actually answer performance questions is to actually try it both ways and test the results in realistic conditions.
That said, the other answers which say that "the compiler" does not do this optimization because the property might have side effects are both right and wrong. The problem with the question (aside from the fundamental problem that it simply cannot be answered without actually trying it and measuring the result) is that "the compiler" is actually two compilers: the C# compiler, which compiles to MSIL, and the JIT compiler, which compiles IL to machine code.
The C# compiler never ever does this sort of optimization; as noted, doing so would require that the compiler peer into the code being called and verify that the result it computes does not change over the lifetime of the callee's code. The C# compiler does not do so.
The JIT compiler might. No reason why it couldn't. It has all the code sitting right there. It is completely free to inline the property getter, and if the jitter determines that the inlined property getter returns a value that can be cached in a register and re-used, then it is free to do so. (If you don't want it to do so because the value could be modified on another thread then you already have a race condition bug; fix the bug before you worry about performance.)
Whether the jitter actually does inline the property fetch and then enregister the value, I have no idea. I know practically nothing about the jitter. But it is allowed to do so if it sees fit. If you are curious about whether it does so or not, you can either (1) ask someone who is on the team that wrote the jitter, or (2) examine the jitted code in the debugger.
And finally, let me take this opportunity to note that computing results once, storing the result and re-using it is not always an optimization. This is a surprisingly complicated question. There are all kinds of things to optimize for:
execution time
executable code size -- this has a major effect on executable time because big code takes longer to load, increases the working set size, puts pressure on processor caches, RAM and the page file. Small slow code is often in the long run faster than big fast code in important metrics like startup time and cache locality.
register allocation -- this also has a major effect on execution time, particularly in architectures like x86 which have a small number of available registers. Enregistering a value for fast re-use can mean that there are fewer registers available for other operations that need optimization; perhaps optimizing those operations instead would be a net win.
and so on. It gets real complicated real fast.
In short, you cannot possibly know whether writing the code to cache the result rather than recomputing it is actually (1) faster, or (2) better performing. Better performance does not always mean making execution of a particular routine faster. Better performance is about figuring out what resources are important to the user -- execution time, memory, working set, startup time, and so on -- and optimizing for those things. You cannot do that without (1) talking to your customers to find out what they care about, and (2) actually measuring to see if your changes are having a measurable effect in the desired direction.
If MaxResults is a property then no, it will not optimize it, because the getter may contain complex logic, say:
private int _maxResults;

public int MaxResults {
    get { return _maxResults++; }
    set { _maxResults = value; }
}
See how the behavior would change if the compiler inlined your code?
If there's no logic... either method you wrote is fine. It's a very minor difference, and it's all about how readable it is TO YOU (or your team) - you're the one looking at it.
Your two code samples are only guaranteed to have the same result in single-threaded environments, which .Net isn't, and if MaxResults is a field (not a property). The compiler can't assume, unless you use the synchronization features, that criteria.MaxResults won't change during the course of your loop. If it's a property, it can't assume that using the property doesn't have side effects.
Eric Lippert points out quite correctly that it depends a lot on what you mean by "the compiler". The C# -> IL compiler? Or the IL -> machine code (JIT) compiler? And he's right to point out that the JIT may well be able to optimize the property getter, since it has all of the information (whereas the C# -> IL compiler doesn't, necessarily). It won't change the situation with multiple threads, but it's a good point nonetheless.
It will be called and evaluated every time. The compiler has no way of determining if a method (or getter) is deterministic and pure (no side effects).
Note that actual evaluation of the property may be inlined by the JIT compiler, making it effectively as fast as a simple field.
It's good practise to make property evaluation an inexpensive operation. If you do some heavy calculation in the getter, consider caching the result manually, or changing it to a method.
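A minimal sketch of that suggestion (Criteria and ComputeMaxResults are illustrative names, not from the question), using Lazy<T> so the expensive work runs at most once:

using System;

public class Criteria
{
    private readonly Lazy<int> _maxResults;

    public Criteria()
    {
        _maxResults = new Lazy<int>(ComputeMaxResults);
    }

    // First access runs ComputeMaxResults; later accesses return the cached value.
    public int MaxResults => _maxResults.Value;

    private int ComputeMaxResults()
    {
        // stand-in for an expensive calculation
        return 42;
    }
}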
Why not test it?
Just set up two console apps, make each loop 10 million times, and compare the results... remember to run them as properly released builds that have been installed properly, or else you cannot guarantee that you are not just running the MSIL.
Really, you are probably going to get about 5 answers saying 'you shouldn't worry about optimisation'. They clearly do not write routines that need to be as fast as possible before being readable (e.g. games).
If this piece of code is part of a loop that is executed billions of times, then this optimisation could be worthwhile. For instance, MaxResults could be an overridden method, and so you may need to consider virtual method calls.
Really, the ONLY way to answer any of these questions is to figure out whether this is a piece of code that will benefit from optimisation. Then you need to know what kinds of things are increasing the execution time. Really, us mere mortals cannot do this a priori, and so we have to simply try 2-3 different versions of the code and then test them.
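A rough sketch of that kind of quick test (build in Release and run outside the debugger; the Criteria class here is a stand-in for the one in the question):

using System;
using System.Diagnostics;

class Criteria
{
    public int MaxResults { get; set; }
}

class Program
{
    static void Main()
    {
        var criteria = new Criteria { MaxResults = 3 };
        const int iterations = 100_000_000;
        long sum = 0;

        var sw = Stopwatch.StartNew();
        int cached = criteria.MaxResults;              // read the property once
        for (int i = 0; i < iterations; i++)
            sum += cached;
        sw.Stop();
        Console.WriteLine($"cached copy:        {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        for (int i = 0; i < iterations; i++)
            sum += criteria.MaxResults;                // read the property every iteration
        sw.Stop();
        Console.WriteLine($"property each time: {sw.ElapsedMilliseconds} ms");

        Console.WriteLine(sum);                        // keeps the loops from being optimized away
    }
}

The two numbers will often come out nearly identical once the JIT inlines the trivial getter, which is rather the point made in the answers above.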
If criteria is a class type, I doubt it would be optimized, because another thread could always change that value in the meantime. For structs I'm not sure, but my gut feeling is that it won't be optimized, but I think it wouldn't make much difference in performance in that case anyhow.

C# or C++ sandboxed assembly

I'm thinking of writing a program that involves including super-fast assembly or, since it doesn't have to be human readable, machine code in C++ or C#. However, I also have other, possibly more troublesome requirements.
I would need to be able to:
Store machine code programs in normal variables / object instances, for example strings "40 9B 7F 5F ..." to edit and run them.
Have the programs able to output data. I saw an example where one had a pointer to an int that it could use.
Have the programs not able to output data anywhere else. For example, they should not be able to delete files, view the system spec, or change the state of the memory of the C++ or C# program they are contained within.
For example, it could be something like this:
machine n;
n = "40 9B 7F";
n[1] = "5F";
// 'n' is now "40 5F 7F"
unsigned short s = 2;
n.run(&s);
// while 'n' was running it may have changed 's' but would not have been able to
// change anything else anywhere on the system including in this C++ / C# program
According to the wiki link Michael Dorgan posted, "asm(std::string);" runs the string as assembler, and it's also easy to reference variables from the C++ part of the program. Editing a std::string is easy, and Alex has noted that I can ensure the code is safe by not allowing unsafe commands.
Sandboxing native machine code is non-trivial. If you really want that, take a look at NaCl from Google, which implements a machine-code sandbox for browsers.
What is more practical is to use .NET IL instead of machine code and run it in a sandboxed (or hosted) AppDomain. This comes much closer and is still fast thanks to dynamic JIT compilation to machine code.
An alternative is to use Windows' built-in rights management and spawn a new process with restricted rights. I've never done that, so I don't know whether you can reduce the target process's rights as much as you want. In any case, that would be a pure Win32 process just running machine code, so you lose the ability to use .NET in the sandboxed process.
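A rough sketch of the sandboxed AppDomain idea (this applies to the .NET Framework only, since AppDomain sandboxing is not available on .NET Core; "UntrustedCode.exe" is a made-up name):

using System;
using System.Security;
using System.Security.Permissions;

class Host
{
    static void Main()
    {
        // Grant only permission to execute code - no file, network or registry access.
        var permissions = new PermissionSet(PermissionState.None);
        permissions.AddPermission(new SecurityPermission(SecurityPermissionFlag.Execution));

        var setup = new AppDomainSetup
        {
            ApplicationBase = AppDomain.CurrentDomain.BaseDirectory
        };

        AppDomain sandbox = AppDomain.CreateDomain("Sandbox", null, setup, permissions);

        // The untrusted assembly runs with only the permissions granted above.
        sandbox.ExecuteAssembly("UntrustedCode.exe");

        AppDomain.Unload(sandbox);
    }
}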
If you want to include assembler in your C/C++ code, consider either inline assembly routines or compiling separate full-on assembler files and linking them back in. Inline assembler syntax is kind of weird, but I believe it is probably the best choice for you from what I've read.
Wikipedia to the rescue for some samples:
Inline assembler examples
Update based on comments:
This is far from a trivial task. You have to implement a linker, assembler (to scan and sandbox) and loader.
I wonder what the use case is - for my example I'll assume you want to have an assembly contest where people submit solutions to problems and you "test" them.
This is the best solution I can think of:
Have a hosting program that takes as input assembly language.
Invoke the assembler to compile and link the assembly program.
Create a protected virtual environment for the program to run in (how you do this depends on the platform) which runs as a user that has no rights to the system.
Capture the results
This solution allows you to leverage existing assemblers, loaders and security without having to re-implement them.
The best example code of dynamically loading, running and sandboxing C# code I know of is the terrarium game at http://terrarium2.codeplex.com/
However, you might consider something better suited to this job, like a scripting system. Lua comes to mind as a popular one. Using Lua users will only be able to perform the actions you allow. http://www.lua.org/
If you restrict the subset of supported instructions, you can do what you want more or less easily.
First, you have to parse and decode an input instruction to see if it's in the supported subset (most of parsing/decoding can be done just once). Then you need to execute it.
But before executing, there's one important thing to take care of. Based on the decoded details of the instruction and the state of the CPU registers, you have to calculate the memory addresses that the instruction is going to access as data (including on-stack locations) or transfer control to. If any of those are outside the established limits, fire an alarm.
Otherwise, if it's a control-transferring instruction (e.g. jmp, jz), you must additionally ensure that the address it passes control to is not only within the memory where all these instructions lie, but is also the address of one of those instructions and not an address inside one of them (e.g. 1 or 2 bytes from the beginning of a 3+ byte instruction). Passing control anywhere else is a no-no. You do not want these instructions to pass control to any standard library functions either, because you won't be able to control execution there, and they're not always safe when supplied with bogus/malicious inputs. Also, these instructions must not be able to modify themselves.
If all is clear, you can either emulate the instruction or more or less directly execute it (control-passing instructions will likely always have to be emulated, because you want to stop execution after every instruction). For the latter, you can create a modifiable function containing these things:
Code to save CPU registers of the caller and load them with the state for the instruction being executed.
The instruction.
The reverse of step 1: code to save post-execution register state and restore the caller's register state.
You can try this approach.
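As a toy illustration (not a real x86 decoder) of the control-transfer check described above: a branch may only target the first byte of a previously decoded instruction inside the sandboxed region.

using System.Collections.Generic;

class TransferCheck
{
    // Offsets of the first byte of every decoded instruction in the region.
    private readonly HashSet<int> _instructionStarts;
    private readonly int _regionLength;

    public TransferCheck(IEnumerable<int> instructionStarts, int regionLength)
    {
        _instructionStarts = new HashSet<int>(instructionStarts);
        _regionLength = regionLength;
    }

    public bool IsLegalBranchTarget(int targetOffset)
    {
        // Must be inside the region AND exactly on an instruction boundary;
        // anything else (the middle of an instruction, or a library function
        // outside the region) is rejected.
        return targetOffset >= 0
            && targetOffset < _regionLength
            && _instructionStarts.Contains(targetOffset);
    }
}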

Would there be any point in designing a CPU that could handle IL directly?

If I understand this correctly:
Current CPU-developing companies like AMD and Intel have their own API codes (the assembly language), which they see as the 2G language on top of the machine code (the 1G language).
Would it be possible or desirable (for performance or otherwise) to have a CPU that performs IL handling at its core instead of the current API calls?
A similar technology does exist for Java - ARM make a range of CPUs that can do this; they call it their "Jazelle" technology.
However, the operations represented by .net IL opcodes are only well-defined in combination with the type information held on the stack, not on their own. This is a major difference from Java bytecode, and would make it much more difficult to create sensible hardware to execute IL.
Moreover, IL is intended for compilation to a final target. Most back ends that spit out IL do very little optimisation, aiming instead to preserve semantic content for verification and optimisation in the final compilation step. Even if the hardware problems could be overcome, the result will almost certainly still be slower than a decent optimising JIT.
So, to sum up: while it is not impossible, it would be disproportionately hard compared to other architectures, and would achieve little.
You seem a bit confused about how CPUs work. Assembly is not a separate language from machine code; it is simply a different (textual) representation of it.
Assembly code is simply a sequential listing of instructions to be executed. And machine code is exactly the same thing. Every instruction supported by the CPU has a certain bit-pattern that cause it to be executed, and it also has a textual name you can use in assembly code.
If I write add $10, $9, $8 and run it through an assembler, I get the machine code for the add instruction, taking the values in registers 9 and 8, adding them and storing the result in register 10.
There is a 1 to 1 mapping between assembler and machine code.
There also are no "API calls". The CPU simply reads from address X, and matches the subsequent bits against all the instructions it understands. Once it finds an instruction that matches this bit pattern, it executes the instruction, and moves on to read the next one.
What you're asking is in a sense impossible or a contradiction. IL stands for Intermediate Language, that is, a kind of pseudocode that is emitted by the compiler, but has not yet been translated into machine code. But if the CPU could execute that directly, then it would no longer be intermediate, it would be machine code.
So the question becomes "is your IL code a better, more efficient representation of a program, than the machine code the CPU supports now?"
And the answer is most likely no. MSIL (I assume that's what you mean by IL, which is a much more general term) is designed to be portable, simple and consistent. Every .NET language compiles to MSIL, and every MSIL program must be able to be translated into machine code for any CPU anywhere. That means MSIL must be general and abstract and not make assumptions about the CPU. For this reason, as far as I know, it is a purely stack-based architecture. Instead of keeping data in registers, each instruction processes the data on top of the stack. That's a nice, clean and generic system, but it's not very efficient, and it doesn't translate well to the rigid structure of a CPU. (In your wonderful little high-level world, you can pretend that the stack can grow freely. For the CPU to get fast access to it, it must be stored in some small, fast on-chip memory of finite size. So what happens if your program pushes too much data onto the stack?)
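To make the stack-based model concrete, here is a trivial C# method together with, roughly, the CIL it compiles to in a Release build - note that no registers are named; everything flows through the evaluation stack:

static int Add(int a, int b)
{
    return a + b;
    // Corresponding CIL (approximately, as shown by ildasm):
    //   ldarg.0   // push a onto the evaluation stack
    //   ldarg.1   // push b
    //   add       // pop both, push their sum
    //   ret       // return whatever is on top of the stack
}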
Yes, you could make a CPU to execute MSIL directly, but what would you gain?
You'd no longer need to JIT code before execution, so the first time you start a program, it would launch a bit faster. Apart from that, though? Once your MSIL program has been JIT'ed, it has been translated to machine code and runs as efficiently as if it had been written in machine code originally. MSIL bytecode no longer exists, just a series of instructions understood by the CPU.
In fact, you'd be back where you were before .NET. Non-managed languages are compiled straight to machine code, just like this would be in your suggestion. The only difference is that non-managed code targets machine code that is designed by CPU designers to be suitable for execution on a CPU, while in your case, it'd target machine code that's designed by software designers to be easy to translate to and from.
This is not a new idea - the same thing was predicted for Java, and Lisp machines were even actually implemented.
But experience with those shows that it's not really useful - by designing special-purpose CPUs, you can't benefit from the advances of "traditional" CPUs, and you very likely can't beat Intel at their own game. A quote from the Wikipedia article illustrates this nicely:
cheaper desktop PCs soon were able to run Lisp programs even faster than Lisp machines, without the use of special purpose hardware.
Translating from one kind of machine code to another on the fly is a well-understood problem and so common (modern CISC CPUs even do something like that internally because they're really RISC) that I believe we can assume it is being done efficiently enough that avoiding it does not yield significant benefits - not when it means you have to decouple yourself from the state of the art in traditional CPUs.
I would say no.
The actual machine language instructions that need to run on a computer are lower level than IL. IL, for example, doesn't really describe how methods calls should be made, how registers should be managed, how the stack should be accessed, or any other of the details that are needed at the machine code level.
Getting the machine to recognize IL directly would, therefore, simply move all the JIT compilation logic from software into hardware.
That would make the whole process very rigid and unchangeable.
By having the machine language based on the capabilities of the machine, and an intermediate language based on capturing programmer intent, you get a much better system. The folks defining the machine can concentrate on defining an efficient computer architecture, and the folks defining the IL system can focus on things like expressiveness and safety.
If both the tool vendors and the hardware vendors had to use the exact same representation for everything, then innovation in either the hardware space or the tool space would be hampered. So, I say they should be separate from one another.
I wouldn't have thought so, for two reasons:
If you had hardware processing IL, that hardware would not be able to run a newer version of IL. With a JIT, you just need a new JIT and then existing hardware can run the newer IL.
IL is simple and designed to be hardware-agnostic. The IL would have to become much more complex to describe operations in the most efficient manner in order to get anywhere close to the performance of existing machine code, but that would make the IL much harder to run on hardware that isn't IL-specific.
