Questions about pre-JIT-ing using RuntimeHelpers.PrepareMethod()

Questions about pre-JIT-ing using RuntimeHelpers.PrepareMethod() - c#

I'm exploring the use of RuntimeHelpers.PrepareMethod() to reduce start-up time on thin client applications with heavy UI libraries
I created a JIT-helper class to run on a background thread and iterates through methods of a type or assembly and calls PrepareMethod on them
First of all, is there any drawback to doing this? (and I don't mean JIT-ing the entire application, I mean just heavy libraries eg Infragistics, DevExpress and classes representing window classes in WPF)
Secondly, is there anyway to determine whether or not a method has been JIT-ed already? (although I didn't notice any delay or problems from accidentally calling it multiple times)
Lastly, what happens if I do the JIT-ing on a background thread and another thread calls a method that is currently being JIT-ed?

Since what you ask is highly implementation-dependent there is no definitive answer... I would expect there to be some sort of locking on the method while it is being JITted... but other than digging deep into the specific .NET version etc. this remains speculation...
BTW: there is a (non-public) field called IsJitted on the respective MethodDesc which the JIT compiler sets to true after jitting... for some more information see here...

Related

AccessViolationException when accessing variable solution value

We've been utilizing the OR tools to solve linear optimizations in a real-time, .NET application. That is, solving linear optimizations regularly using different inputs as time progresses.
Recently we ran into an issue that we haven't seen before while running our application on a server for extended periods of time, in which seemingly random attempts to solve the optimizations were causing AccessViolationExceptions. Specifically,
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.AccessViolationException
at Google.OrTools.LinearSolver.operations_research_linear_solverPINVOKE.Variable_SolutionValue(System.Runtime.InteropServices.HandleRef)
...
I'm trying to find out more specifically where this is happening in the pipeline, but given the output there I believe it is a section in which we are trying to retrieve the individual variable solution values out of the solver after solving the optimization.
We are using a wide variety of constraints over a decent sized number of variables.
Has anyone seen this before?

Reference github issue link
After some testing we found that what appears to have been happening is that the garbage collector was collecting some of the Variables we were using during the P/Invoke, as per this.
Unfortunately, this seems to be a side effect of the way that SWIG creates its .NET wrappers and their IDisposable implementations, using HandleRefs instead of something like SafeHandles, which 'handle' this as per the documentation:
Platform invoke operations automatically increment the reference count of handles encapsulated by a SafeHandle and decrement them upon completion. This ensures that the handle will not be recycled or closed unexpectedly.
More information here.
Without wanting to get into the business of creating our own SWIG typemap or compiling a new version of SWIG, .NET provides a way of keeping objects 'alive' with regard to the Garbage Collector. That is, calling GC.KeepAlive on all of the objects which we will be accessing values from via P/Invoke (in our case the Solver and our Variables) at the end of the optimization procedure, prevents the garbage collector from thinking that they are collectible until the end of the scope of the KeepAlive method without side effects (as per their documentation).
Preliminary testing has shown this to work, though given that it was already intermittently occurring before, we'll be watching for this happening going forward.
Going forward, I think either making a request of SWIG to use SafeHandles is probably the best idea (it has been discussed before and is still an open issue) or changing the typemap to use SafeHandles directly, is likely the best option. I may try investigating the later option myself, but because this fix ended up only adding 3 lines of code (plus a host of comments) to our code base for what seems like a full fix, it's going to be low priority for me. That said, a fix for this would be nice for an upcoming version.

C# first class continuation via C++ interop or some other way?

We have a very high performance multitasking, near real-time C# application. This performance was achieved primarily by implementing cooperative multitasking in-house with a home grown scheduler. This is often called micro-threads. In this system all the tasks communicate with other tasks via queues.
The specific problem that we have seems to only be solvable via first class continuations which C# does not support.
Specifically the problem arises in 2 cases dealing with queues. Whenever any particular task performs some work before placing an item on a queue. What if the queue is full?
Conversely, a different task may do some work and then need to take an item off of a queue. What if that queue is empty?
We have solved this in 90% of the cases by linking queues to tasks to avoid tasks getting invoked if any of their outbound queues are full or inbound queue is empty.
Furthermore certain tasks were converted into state machines so they can handle if a queue is full/empty and continue without waiting.
The real problem arises in a few edge cases where it is impractical to do either of those solutions. The idea in that scenario would be to save the stack state at the point and switch to a different task so that it can do the work and subsequently retry the waiting task whenever it is able to continue.
In the past, we attempted to have the waiting task call back into the schedule (recursively) to allow the other tasks to and later retry the waiting task. However, that led to too many "deadlock" situations.
There was an example somewhere of a custom CLR host to make the .NET threads actually operate as "fibers" which essentially allows switching stack state between threads. But now I can't seem to find any sample code for that. Plus it seems that will take some significant complexity to get it right.
Does anyone have any other creative ideas how to switch between tasks efficiently and avoid the above problems?
Are there any other CLR hosts that offer this, commercial or otherwise? Is there any add-on native library that can offer some form of continuations for C#?

There is the C# 5 CTP, which performs a continuation-passing-style transformation over methods declared with the new async keyword, and continuation-passing based calls when using the await keyword.
This is not actually a new CLR feature but rather a set of directives for the compiler to perform the CPS transformation over your code and a handful of library routines for manipulating and scheduling continuations. Activation records for async methods are placed on the heap instead of the stack, so they're not tied to a specific thread.

Nope, not going to work. C# (and even IL) is too complex language to perform such transformations (CPS) in a general way. The best you can get is what C# 5 will offer. That said, you will probably not be able to break/resume with higher order loops/iterations, which is really want you want from general purpose reifiable continuations.

Fiber mode was removed from v2 of the CLR because of issues under stress, see:
Fiber mode is gone...
Fibers and the CLR
Question to the CLR experts : fiber mode support in hosting
To my knowledge fiber support has not yet bee re-added, although from reading the above articles it may be added again (however the fact that nothing has mentioned for 6-7 years on the topic makes me believe that its unlikely).
FYI fiber support was intended to be a way for existing applications that use fibers (such as SQL Server) to host the CLR in a way that allows them to maximise performance, not as a method to allow .Net applications to create hundereds of threads - in short fibers are not a magic bullet solution to your problem, however if you have an application that uses fibers an wishes to host the CLR then the managed hosting APIs do provide the means for the CLR to "work nicely" with your application. A good source of information on this would be the managed hosting API documentation, or to look into how SQL Server hosts the CLR, of which there are several highly informative articles around.
Also take a quick read of Threads, fibers, stacks and address space.

Actually, we decided on a direction to go with this. We're using the Observer pattern with Message Passsing. We built a home grown library to handle all communication between "Agents" which are similar to an Erlang process. Later we will consider using AppDomains to even better separate Agents from each other. Design ideas were borrowed from the Erlang programming language which has extremely reliable mult-core and distributed processing.

The solution to your problem is to use lock-free algorithms allowing for system wide progress of at least one task. You need to use inline assembler that is CPU dependent to make sure that you atomic CAS (compare-and-swap). Wikipedia has an article as well as patterns described the the book by Douglas Schmidt called "Pattern-Oriented Software Architecture, Patterns for Concurrent and Networked Objects". It is not immediately clear to me how you will do that under the dotnet framework.
Other way of solving your problem is using the publish-subscriber pattern or possible thread pools.
Hope this was helpful?

What is C#'s version of the GIL?

In the current implementation of CPython, there is an object known as the "GIL" or "Global Interpreter Lock". It is essentially a mutex that prevents two Python threads from executing Python code at the same time. This prevents two threads from being able to corrupt the state of the Python interpreter, but also prevents multiple threads from really executing together. Essentially, if I do this:
# Thread A
some_list.append(3)
# Thread B
some_list.append(4)
I can't corrupt the list, because at any given time, only one of those threads are executing, since they must hold the GIL to do so. Now, the items in the list might be added in some indeterminate order, but the point is that the list isn't corrupted, and two things will always get added.
So, now to C#. C# essentially faces the same problem as Python, so, how does C# prevent this? I'd also be interested in hearing Java's story, if anyone knows it.
Clarification: I'm interested in what happens without explicit locking statements, especially to the VM. I am aware that locking primitives exist for both Java & C# - they exist in Python as well: The GIL is not used for multi-threaded code, other than to keep the interpreter sane. I am interested in the direct equivalent of the above, so, in C#, if I can remember enough... :-)
List<String> s;
// Reference to s is shared by two threads, which both execute this:
s.Add("hello");
// State of s?
// State of the VM? (And if sane, how so?)
Here's another example:
class A
{
public String s;
}
// Thread A & B
some_A.s = some_other_value;
// some_A's state must change: how does it change?
// Is the VM still in good shape afterwards?
I'm not looking to write bad C# code, I understand the lock statements. Even in Python, the GIL doesn't give you magic-multi-threaded code: you must still lock shared resources. But the GIL prevents Python's "VM" from being corrupted - it is this behavior that I'm interested in.

Most other languages that support threading don't have an equivalent of the Python GIL; they require you to use mutexes, either implicitly or explicitly.

Using lock, you would do this:
lock(some_list)
{
some_list.Add(3);
}
and in thread 2:
lock(some_list)
{
some_list.Add(4);
}
The lock statement ensures that the object inside the lock statement, some_list in this case, can only be accessed by a single thread at a time. See http://msdn.microsoft.com/en-us/library/c5kehkcz(VS.80).aspx for more information.

C# does not have an equivalent of GIL to Python.
Though they face the same issue, their design goals make them
different.
With GIL, CPython ensures that suche operations as appending a list
from two threads is simple. Which also
means that it would allow only one
thread to run at any time. This
makes lists and dictionaries thread safe. Though this makes the job
simpler and intuitive, it makes it
harder to exploit the multithreading
advantage on multicores.
With no GIL, C# does the opposite. It ensures that the burden of integrity is on the developer of the
program but allows you to take
advantage of running multiple threads
simultaneously.
As per one of the discussion -
The GIL in CPython is purely a design choice of having
a big lock vs a lock per object
and synchronisation to make sure that objects are kept in a coherent state.
This consist of a trade off - Giving up the full power of
multithreading.
It has been that most problems do not suffer from this disadvantage
and there are libraries which help you exclusively solve this issue when
required.
That means for a certain class of problems, the burden to utilize the
multicore is
passed to developer so that rest can enjoy the more simpler, intuitive
approach.
Note: Other implementation like IronPython do not have GIL.

It may be instructive to look at the documentation for the Java equivalent of the class you're discussing:
Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more elements, or explicitly resizes the backing array; merely setting the value of an element is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the list. If no such object exists, the list should be "wrapped" using the Collections.synchronizedList method. This is best done at creation time, to prevent accidental unsynchronized access to the list:
List list = Collections.synchronizedList(new ArrayList(...));
The iterators returned by this class's iterator and listIterator methods are fail-fast: if the list is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove or add methods, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.

Most complex datastructures(for example lists) can be corrupted when used without locking in multiple threads.
Since changes of references are atomic, a reference always stays a valid reference.
But there is a problem when interacting with security critical code. So any datastructures used by critical code most be one of the following:
Inaccessible from untrusted code, and locked/used correctly by trusted code
Immutable (String class)
Copied before use (valuetype parameters)
Written in trusted code and uses internal locking to guarantee a safe state
For example critical code cannot trust a list accessible from untrusted code. If it gets passed in a List, it has to create a private copy, do it's precondition checks on the copy, and then operate on the copy.

I'm going to take a wild guess at what the question really means...
In Python data structures in the interpreter get corrupted because Python is using a form of reference counting.
Both C# and Java use garbage collection and in fact they do use a global lock when doing a full heap collection.
Data can be marked and moved between "generations" without a lock. But to actually clean it up everything must come to a stop. Hopefully a very short stop, but a full stop.
Here is an interesting link on CLR garbage collection as of 2007:
http://vineetgupta.spaces.live.com/blog/cns!8DE4BDC896BEE1AD!1104.entry

If reflection is inefficient, when is it most appropriate?

I find a lot of cases where I think to myself that I could use relfection to solve a problem, but I usually don't because I hear a lot along the lines of "don't use reflection, it's too inefficient".
Now I'm in a position where I have a problem where I can't find any other solution than to use reflection with new T(), as outlined in this question & answer.
So I'm wondering if somebody can tell me reflection's specific intended usage, and if there's a set of guidelines to indicate when it's appropriate and when it isn't?

It is often "fast enough", and if you need faster (for tight loops etc) you can do meta-programming with Expression or ILGenerator (perhaps via DynamicMethod), to make extremely fast code (including some tricks you can't do in C#).
Reflection is more commonly used for framework/library scenarios, where the library by definition knows nothing about the caller, and must work based on configuration, attributes or patterns.

If there's one thing that I hate hearing it's "don't use reflection, it's too inefficient".
Too inefficient for what? If you're writing a console application that's run once a month and isn't time critical, does it really matter if it takes 30 seconds instead of 28, because of you using reflection?
Guidelines for when it's inappropriate to use are ones that only you can really put together as they're heavily dependent on what you're doing and how efficient/performant alternatives are.

A useful abstraction for code efficiency is to partition it in three categories of time, each about 3 orders of magnitude apart.
First is human-time. There's a lot you can do when you only need to keep a person happy with the performance of your code. Humans cannot perceive the difference between code that needs 10 milliseconds or 20 milliseconds, both look instant. And a human is forgiving when a program needs 6 seconds instead of 5, roughly 3 billion machine instructions more. Common examples of programs that run at human-time are compilers and point-and-click designers. Using reflection is never a problem.
Then there is I/O-time. When your program needs to hit the disk or the network. I/O is slow, restricted by mechanical motion in the case of the disk, bandwidth and latency in the case of a network. You can always tell when I/O is the bottleneck, your program is running but it isn't driving up the CPU load much. The operating system is constantly blocking the thread, making it wait until the I/O request is complete.
Reflection operates at I/O-time. To retrieve type data, the CLR must read the assembly metadata. And when that wasn't done before, your program will cause a page-fault, requiring the operating system to read the data from disk. What follows is that, roughly, reflection can make I/O bound code only twice as slow. Usually better because after the first perf hit, the metadata is cached and can be retrieved a lot quicker. Reflection is thus often an acceptable trade-off. The canonical examples are serialization and dbase ORMs.
Then there's machine-time. The raw performance of a CPU core is stupendous. A property getter can execute in somewhere between 0 and 1/2 a nanosecond. This does not compare favorably with, say, PropertyInfo.GetValue(). Both will keep the CPU busy, you'll see the CPU load for the core at 100%. But GetValue() costs hundreds if not thousands of machine code instructions. Not counting the time needed to page in the metadata. While not much an incremental time, it builds up fast when you loop.
If you cannot classify your reflection code in the human-time or I/O-time categories then reflection is unlikely to be an appropriate substitute for regular code.

The key to keeping reflection from slowing down your program is to not use it inside a loop. If you want to read a property from an object during startup (happens once), use reflection. You want to read a property from a list of 10,000 objects of unknown type, use reflection to get the property getter delegate once (search term: PropertyInfo.GetGetMethod), then call the delegate 10,000 types. There are plenty of examples of this on StackOverflow.

Reflection is not inefficient. It is less efficient than direct calls. So personnaly I use reflection when there's no equivalent compile time safe method. IMHO the problem with reflection is not so much the efficiency but the fragility of the code as it uses magic strings which are very refactor unfriendly.

I use it for plugin architecture - looking through assemblies in the plugin folder for methods marked with a custom attribute indicating info about the plugin - and in a logging framework. The framework detects a custom attribute on the assembly itself which holds information about the author of the assembly, the project, version information, and other tags that are logged along with everything in the stack trace.
Going to give away a 'trade secret', but it's a good one. The framework allows you to tag each method or class with a 'Story ref', e.g.
[StoryRef(Ref="ImportCSV1")]
...and the idea is it would integrate into our agile project management framework: if there were any exceptions thrown within that class/method, the logging method would use reflection to check for a StoryRef attribute in the stack trace, and if so that would be logged as an exception against that story. In the PM software you could see exceptions by Story (a story is like an extreme/agile use case).
I think that's a valid use, at least! Basically, when it just seems the most neat, and appropriate way to do it, I use reflection. Nothing else really comes into it - I can't think of an occasion you'd be using reflection to make that many calls that efficiency would come into it.

So I'm wondering if somebody can tell
me reflection's specific intended
usage, and if there's a set of
guidelines to indicate when it's
appropriate and when it isn't?
A bad example of reflection is this one from Wikipedia:
//Without reflection
Foo foo = new Foo();
foo.Hello();
//With reflection
Type t = Type.GetType("FooNamespace.Foo");
object foo = Activator.CreateInstance(t);
t.InvokeMember("Hello", BindingFlags.InvokeMethod, null, foo, null);
Here, there is no advantage to using reflection: The non-reflection-using code is not only more efficient, but easier to understand.
Good uses of reflection are things like serialization and object-relational mapping, which are easy to implement if you have a list of a class's properties, but otherwise require a custom-written function for each class.

How can you do Co-routines using C#?

In python the yield keyword can be used in both push and pull contexts, I know how to do the pull context in c# but how would I achieve the push. I post the code I am trying to replicate in c# from python:
def coroutine(func):
def start(*args,**kwargs):
cr = func(*args,**kwargs)
cr.next()
return cr
return start
#coroutine
def grep(pattern):
print "Looking for %s" % pattern
try:
while True:
line = (yield)
if pattern in line:
print line,
except GeneratorExit:
print "Going away. Goodbye"

If what you want is an "observable collection" -- that is, a collection which pushes results at you rather than letting the consumer pull them -- then you probably want to look into the Reactive Framework extensions. Here's an article on it:
http://www.infoq.com/news/2009/07/Reactive-Framework-LINQ-Events
Now, as you note, you can build both "push" and "pull" style iterators easily if you have coroutines available. (Or, as Thomas points out, you can build them with continuations as well.) In the current version of C# we do not have true coroutines (or continuations). However, we are very concerned about the pain users feel around asynchronous programming.
Implementing fiber-based coroutines as a first-class language feature is one technique that could possibly be used to make asynchronous programming easier, but that is just one possible idea of many that we are at present researching. If you have a really solid awesome scenario where coroutines do a better job than anything else -- including the reactive framework -- then I'd love to hear more about it. The more realistic data we have about what real problems people are facing in asynchronous programming, the more likely we are to come up with a good solution. Thanks!
UPDATE: We have recently announced that we are adding coroutine-like asynchronous control flows to the next version of C# and VB. You can try it yourself with our Community Technology Preview edition, which you can download here.

C# does not have general co-routines. A general co-routine is where the co-routine has its own stack, i.e. it can invoke other methods and those methods can "yield" values. Implementation of general co-routines requires making some smart things with stacks, possibly up to and including allocating stack frames (the hidden structures which contain local variables) on the heap. This can be done, some languages do that (e.g. Scheme), but it is somewhat tricky to do it right. Also, many programmers find the feature difficult to understand.
General co-routines can be emulated with threads. Each thread has its own stack. In a co-routine setup, both threads (the initial caller, and the thread for the co-routine) will alternate control, they will never actually run simultaneously. The "yield" mechanism is then an exchange between the two threads, and as such it is expensive (synchronization, a roundtrip through the OS kernel and scheduler...). Also, there is much room for memory leaks (the co-routine must be explicitly "stopped", otherwise the waiting thread will stick forever). Thus, this is rarely done.
C# provides a bastardized-down co-routine feature called iterators. The C# compiler automatically converts the iterator code into a specific state class, with local variables becoming class fields. Yielding is then, at the VM level, a plain return. Such a thing is doable as long as the "yield" is performed from the iterator code itself, not from a method which the iterator code invokes. C# iterators already cover many use cases and the C# designers were unwilling to go further down the road to continuations. Some sarcastic people are keen to state that implementing full-featured continuations would have prevented C# from being as efficient as its arch-enemy Java (efficient continuations are feasible, but this requires quite some work with the GC and the JIT compiler).

Maybe this will help.
http://blogs.msdn.com/ericlippert/archive/2009/07/23/iterator-blocks-part-five-push-vs-pull.aspx

thanks #NickLarsen, you helped me remember the new stuff that MS have introduced, the IObservable interface.
link http://msdn.microsoft.com/en-us/library/dd783449(VS.100).aspx

Actually .NET does not make "incorrect assumptions" about thread affinity, in fact it totally decouples the notion of a .NET level thread from the OS level thread.
What you have to do is associate a logical .NET thread state with your fiber ( for that you need the CLR Hosting API's but you don'T need to write a host yourself you can use those needed from your own application directly ) and everything, lock tracking, exception handling works normally again.
An example can be found here: http://msdn.microsoft.com/en-us/magazine/cc164086.aspx
Btw Mono 2.6 contains low level Coroutine support and can be used to implement all higher level primitives easily.

I would love to see a fiber-based API for .Net.
I attempted to use the native fiber API in C# through p/invoke a while back, but because the runtime's exception handling (incorrectly) makes thread-based assumptions, things broke (badly) when exceptions happened.
One "killer app" for a fiber-based coroutine API is game programming; certain types of AI require a "lightweight" thread that you can time-slice at will. For example, game behavior trees require the ability to "pulse" the decision code every frame, allowing the AI code to cooperatively yield back to the caller when the decision slice is up. This is possible to implement with hard threads, but much, much more complicated.
So while true fiber use-cases are not mainstream, they definitely exist, and a small niche of us .Net coders would cheer mightily if the existing bugs in the fiber subsystem were worked out.

Well, i gave a try developing a full library to manage coroutines with only a single thread. The hard part was to call coroutines inside coroutines...and to return parameters, but finally i reached a pretty good result here. The only warning are that blocking I/O operations must be made through tasks and alll "return" must be replaced with "yield return".
With the application server based on this library i was able to nearly double the requests made with a standard async/await based on IIS. (Seek for Node.Cs and Node.Cs.Musicstore on github to try it at home)

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.