I have been searching lately for information on how to construct a lock-free priority queue in C#. I have yet to find an implementation in any language, or a decent paper on the matter. I have found several papers which appear to be copies of, or at least reference, one particular paper which is not actually a paper on lock-free priority queues, despite its name; it is in fact a paper on a priority queue which uses fine-grained locks.
The responses I have been receiving from elsewhere include "use a single thread" and "you do not need it to be lock free" and "it is impossible". All three of these responses are incorrect.
If someone has some information on this, I would greatly appreciate it.
Generally, it's a bad idea to write this kind of code yourself.
However, if you really want to write this kind of code, take a page from Eric Lippert's book (or blog, as it were): implement the queue so that the methods which modify it do not change the instance they are called on, but instead return a completely new instance of the queue.
This is semantically similar to the pattern that System.String uses to maintain immutability; all operations return a new System.String, the original is not modified.
The result of this is that you are forced to reassign the reference returned on every call. Because the assignments of references are atomic operations, there is no concern about thread-safety; you are guaranteed that the reads/writes will be atomic.
However, this will result in a last-in-wins situation; it's possible that multiple modifications are being made to the queue, but only the last assignment will hold, losing the other insertions into the queue.
This might be acceptable; if not, you have to use synchronization around the assignment and reading of the reference. You will still have a lock-free priority queue, but if you have concerns about thread-safety and maintaining the integrity of the operations, you have done nothing but move the concern about synchronization outside of the data structure (which, in almost all cases, is a good thing, as it gives you fine-grained explicit control).
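To make the "new instance on every modification" idea concrete, here is a minimal sketch, written in Java because that is what the book mentioned in this thread uses. It is not taken from the answer above or from any paper: it swaps the plain assignment for a compare-and-set retry loop, so a lost update is detected and retried instead of silently overwritten. `Heap` and `CasPriorityQueue` are hypothetical names, and the immutable skew heap is just one convenient persistent structure. In C#, `Interlocked.CompareExchange` would play the role of `compareAndSet`.

```java
import java.util.concurrent.atomic.AtomicReference;

// Immutable skew heap node; merge() builds new nodes and never mutates old ones.
final class Heap {
    final int value;
    final Heap left, right;

    Heap(int value, Heap left, Heap right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }

    static Heap merge(Heap a, Heap b) {
        if (a == null) return b;
        if (b == null) return a;
        if (a.value > b.value) { Heap t = a; a = b; b = t; }
        // Skew-heap step: merge into the right child, then swap the children.
        return new Heap(a.value, merge(b, a.right), a.left);
    }
}

// Hypothetical wrapper: one shared root reference, replaced atomically.
final class CasPriorityQueue {
    private final AtomicReference<Heap> root = new AtomicReference<>();

    void add(int v) {
        Heap single = new Heap(v, null, null);
        while (true) {
            Heap old = root.get();
            // Succeeds only if nobody replaced the root since we read it.
            if (root.compareAndSet(old, Heap.merge(old, single))) return;
        }
    }

    // Returns the smallest value, or null if the queue is empty.
    Integer pollMin() {
        while (true) {
            Heap old = root.get();
            if (old == null) return null;
            if (root.compareAndSet(old, Heap.merge(old.left, old.right))) return old.value;
        }
    }
}
```

No thread ever blocks here, but every operation contends on the single root reference, so this is unlikely to scale as well as the skip-list based designs in the literature. It also relies on garbage collection to keep old heap versions alive while other threads are still reading them.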
The Art of Multiprocessor Programming. Look at Chapter 15 - Priority Queues. Book is in Java, but can be easily translated to C# since they both have GC (which is important for most implementations in the book).
I have built a complex application using a multi-tiered producer-consumer pattern, with multiple consumers performing specialized tasks before enqueuing data to the next group of consumers. The ultimate job of the application is to break down a raw data file into normalized test records for individual units.
The base of the P-C pattern uses Dustin Hyun's pattern from http://dustin-hyun.blogspot.com/2013_07_01_archive.html. I have made numerous modifications because of the multiple tiered approach and others. The code is too complex to post here- perhaps I could post snippets upon request to help clarify and answer questions.
I have employed two tools to speed up how a file gets processed. First is multiple instances of any of the tiers of consumer: there could be eight "index" consumers running whose job is to convert the test data from unit IDs and test names to unit indices and test name indices, normalizing the results to load into the DB. Second is the bundling of units into merged DataTables at two points in the operation.
I have identified that data is lost intermittently, but in a fairly predictable pattern. It appears to be the last, incomplete bundle where the data was expected to have been. After the standard loop pattern, I have a check for a boolean that I use to flag if there is an incomplete bundle, and it works:
if (dataToSend) // Check if there is an incomplete bundle to process and send prior to ending the consumer operation.
{
    UpdateLimitsIndices(bundleNlu);
    Enqueue(StdfQType.Func, new BundledNamedTables((N_ParamRes)bundlePR.Copy(), (N_FuncRes)bundleFR.Copy(), numUnitsInCurrBundle));
}
I have also put locks everywhere I can see that any of the P-C entities read or write any of the shared queue members. With just the locks, there appeared to be no real impact. On a whim, I started to play with the sleep time before the loop re-spins. So far, test conditions that caused data loss with a 1 ms sleep did not cause data loss with a 100 ms sleep, or even a 10 ms sleep, during limited testing. Could it be that the longer sleep is allowing the last piece/bundle of data to be properly processed?
I recognize that this question is vague and has few specifics because the application is too complex to post. I do hope I gave enough information for a dialog to start, however. I look forward to hearing your thoughts.
Jeff
I would suggest that, because you are not using thread-safe collections (and neither does the author whose code you are basing yours on), you may be losing data to a concurrent write operation that fails silently.
Luckily, along with the Task Parallel Library (TPL) .NET 4.0 gives us a whole bunch of concurrent collections which ARE thread-safe for multi-threaded environments.
Have a look at the collections in System.Collections.Concurrent as they are all thread-safe and their locking mechanisms are a lot faster than traditional lock-based objects.
Threading is very difficult to get right, and it appears that you have not gotten it right. Also, why are you (and the author of that blog post) using sleep intervals rather than Monitor.Pulse()?
Rather than trying to implement this yourself, why not use a library that will give you a slightly higher level of abstraction above the underlying thread coordination mechanism?
TPL Dataflow
Reactive Extensions
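As a rough illustration of the blocking hand-off suggested in the answers above (instead of a sleep interval), here is a minimal sketch in Java. It is not the poster's code: `BundlingDemo`, `consume` and `run` are hypothetical names, and the empty `Optional` used as an end-of-stream marker is just one convention. The point is that the consumer only flushes the last, incomplete bundle after it has seen the marker, which the producer enqueues after its final item, so there is no timing window in which the tail can be lost. In C#, `BlockingCollection<T>` with `CompleteAdding()` and `GetConsumingEnumerable()` gives the same shape.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class BundlingDemo {
    // Consumes until the end-of-stream marker (an empty Optional) arrives,
    // then flushes the last, possibly incomplete, bundle.
    static List<List<Integer>> consume(BlockingQueue<Optional<Integer>> in, int bundleSize) {
        List<List<Integer>> bundles = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        try {
            while (true) {
                Optional<Integer> item = in.take(); // blocks until data arrives; no sleep loop
                if (!item.isPresent()) break;       // producer signalled completion
                current.add(item.get());
                if (current.size() == bundleSize) {
                    bundles.add(current);
                    current = new ArrayList<>();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();     // preserve the interrupt and stop
        }
        if (!current.isEmpty()) bundles.add(current); // the incomplete tail bundle
        return bundles;
    }

    // Starts one producer thread and consumes on the calling thread.
    static List<List<Integer>> run(int count, int bundleSize) {
        BlockingQueue<Optional<Integer>> queue = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            for (int i = 0; i < count; i++) queue.add(Optional.of(i));
            queue.add(Optional.empty()); // marker goes in after the last real item
        });
        producer.start();
        return consume(queue, bundleSize);
    }
}
```

With several consumers per tier, each consumer would need its own marker (or the marker would have to be re-enqueued), which this sketch does not show.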
Does a thread's context refer to a thread's personal memory? If so, how is memory shared between multiple threads?
I'm not looking for code examples- I understand synchronization on a high level, I'm just confused about this term, and looking to gain some insight on what's actually happening behind scenes.
The reason I thought/think each thread has some kind of private memory was because of the volatile keyword in Java and .NET, and how different threads can have different values for the same primitive if it's not used. That always implied private memory to me.
As I didn't realize the term was more general, I guess I'm asking how context-switching works in Java and C# specifically.
The reason I thought/think each thread has some kind of private memory was because of the volatile keyword in Java and .NET, and how different threads can have different values for the same primitive if it's not used. That always implied private memory to me.
OK, now we're getting to the source of your confusion. This is one of the most confusing parts about modern programming. You have to wrap your head around this contradiction:
All threads in a process share the same virtual memory address space, but
Any two threads can disagree at any time on the contents of that space
How can that be? Because
processors make local copies of memory pages for performance reasons, and only infrequently compare notes to make sure that all their copies say the same thing. If two threads are on two different processors then they can have completely inconsistent views of "the same" memory.
memory in single-threaded scenarios is typically thought of as "still" unless something causes it to change. This intuition serves you poorly in multithreaded processes. If there are multiple threads accessing memory, you are best to treat all memory as constantly in a state of flux unless something is forcing it to remain still. Once you start thinking of all memory as changing all the time, it becomes clear that two threads can have an inconsistent view. No two movies of the ocean during a storm are alike, even if it's the same storm.
compilers are free to make any optimization to code that would be invisible on a single threaded system. On a multi-threaded system, those optimizations can suddenly become visible, which can lead to inconsistent views of data.
If any of that is not clear, then start by reading my article explaining what "volatile" means in C#:
http://blogs.msdn.com/b/ericlippert/archive/2011/06/16/atomicity-volatility-and-immutability-are-different-part-three.aspx
And then read the section "The Need For Memory Models" in Vance's article here:
http://msdn.microsoft.com/en-us/magazine/cc163715.aspx
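To make the visibility point above concrete, here is a small Java sketch (the question mentions Java as well as .NET). `StopFlag` and its members are hypothetical names. Because `stop` is declared `volatile`, the worker is guaranteed to eventually observe the write made by the other thread. If the keyword is removed, the compiler or JIT is permitted to read the field once and reuse that value, in which case the loop may never end; whether that actually happens depends on the runtime and hardware.

```java
final class StopFlag {
    // volatile: a write by one thread is guaranteed to become visible to reads by others.
    private volatile boolean stop = false;

    void requestStop() {
        stop = true;
    }

    // Spins until another thread sets the flag; returns how many times it looped.
    long spinUntilStopped() {
        long n = 0;
        while (!stop) n++;
        return n;
    }

    // Starts a worker, asks it to stop, and reports whether it ended within the timeout.
    static boolean demo(long timeoutMillis) {
        StopFlag flag = new StopFlag();
        Thread worker = new Thread(() -> flag.spinUntilStopped());
        worker.start();
        flag.requestStop();
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }
}
```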
Now, as for the specific question as to whether a thread has its own block of memory, the answer is yes, in two ways. First, since a thread is a point of control, and since the stack is the reification of control flow, every thread has its own million-byte stack. That's why threads are so expensive. In .NET, those million bytes are actually committed to the page file every time you create a thread, so be careful about creating unnecessary threads.
Second, threads have the aptly named "thread local storage", which is a small section of memory associated with each thread that the thread can use to store interesting information. In C# you use the ThreadStatic attribute to mark a field as being local to a thread.
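For comparison, a minimal Java sketch of the same idea: `java.lang.ThreadLocal` is the rough counterpart of a `[ThreadStatic]` field. `PerThreadCounter` is a hypothetical name. Each thread that touches `COUNTER` gets its own independent cell, so a freshly started thread sees a count of 1 no matter how often the calling thread has already incremented its own copy.

```java
final class PerThreadCounter {
    // Every thread gets its own int[1], created lazily and starting at 0.
    private static final ThreadLocal<int[]> COUNTER = ThreadLocal.withInitial(() -> new int[1]);

    // Increments and returns the calling thread's private counter.
    static int increment() {
        int[] cell = COUNTER.get();
        cell[0]++;
        return cell[0];
    }

    // Runs one increment on a brand-new thread and returns the value that thread saw.
    static int countSeenByNewThread() {
        int[] seen = new int[1];
        Thread t = new Thread(() -> seen[0] = increment());
        t.start();
        try {
            t.join(); // join() also makes the write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }
}
```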
The actual makeup of a "thread context" is implementation specific, but generally I have always understood a thread's context to refer to the current state of the thread and how it views memory at a specific time. This is what "context switching" is: saving and restoring the state of a thread (its context).
Memory is shared between the contexts; they are part of the same process.
I don't consider myself a huge expert on the topic, but this is what I have always understood that specific term to mean.
There are a lot of articles and discussions explaining why it is good to build thread-safe classes. It is said that if multiple threads access e.g. a field at the same time, there can only be some bad consequences. So, what is the point of keeping non thread-safe code? I'm focusing mostly on .NET, but I believe the main reasons are not language-dependent.
E.g. .NET static fields are not thread-safe. What would be the result if they were thread-safe by default? (without a need to perform "manual" locking). What are the benefits of using (actually defaulting to) non-thread-safety?
One thing that comes to my mind is performance (more of a guess, though). It's rather intuitive that, when a function or field doesn't need to be thread-safe, it shouldn't be. However, the question is: what for? Is thread-safety just an additional amount of code you always need to implement? In what scenarios can I be 100% sure that e.g. a field won't be used by two threads at once?
Writing thread-safe code:
Requires more skilled developers
Is harder and takes more coding effort
Is harder to test and debug
Usually has bigger performance cost
But! Thread-safe code is not always needed. If you can be sure that some piece of code will be accessed by only one thread, the list above becomes a huge and unnecessary overhead. It is like renting a van to go to a neighboring city when there are two of you and not much luggage.
Thread safety comes with costs - you need to lock fields that might cause problems if accessed simultaneously.
In applications that make no use of threads but need high performance, where every CPU cycle counts, there is no reason to have thread-safe classes.
So, what is the point of keeping non thread-safe code?
Cost. Like you assumed, there usually is a penalty in performance.
Also, writing thread-safe code is more difficult and time consuming.
Thread safety is not a "yes" or "no" proposition. The meaning of "thread safety" depends upon context; does it mean "concurrent-read safe, concurrent write unsafe"? Does it mean that the application just might return stale data instead of crashing? There are many things that it can mean.
The main reason not to make a class "thread safe" is the cost. If the type won't be accessed by multiple threads, there's no advantage to putting in the work and increasing the maintenance cost.
Writing threadsafe code is painfully difficult at times. For example, simple lazy loading requires two checks for '== null' and a lock. It's really easy to screw up.
[EDIT]
I didn't mean to suggest that threaded lazy loading was particularly difficult, it's the "Oh and I didn't remember to lock that first!" moments that come fast and hard once you think you're done with the locking that are really the challenge.
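For reference, the "two checks for null and a lock" pattern mentioned above looks roughly like this in Java. This is a sketch under assumptions: `Lazy` is a hypothetical helper (not `System.Lazy<T>` from .NET, which already does this for you), and the factory is assumed never to return null. The `volatile` on `value` is the part that is easy to forget; without it the first, unlocked check is not safe.

```java
import java.util.function.Supplier;

// Hypothetical helper illustrating double-checked locking.
final class Lazy<T> {
    private final Supplier<T> factory;
    private final Object lock = new Object();
    private volatile T value; // volatile is required for the unlocked read below

    Lazy(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T local = value;              // first check, without the lock
        if (local == null) {
            synchronized (lock) {
                local = value;        // second check, under the lock
                if (local == null) {
                    local = factory.get();
                    value = local;    // publish only the fully constructed object
                }
            }
        }
        return local;
    }
}
```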
There are situations where "thread-safe" doesn't make sense. This consideration is in addition to the higher developer skill and increased time (development, testing, and runtime all take hits).
For example, List<T> is a commonly-used non-thread-safe class. If we were to create a thread-safe equivalent, how would we implement GetEnumerator? Hint: there is no good solution.
Turn this question on its head.
In the early days of programming there was no Thread-Safe code because there was no concept of threads. A program started, then proceeded step by step to the end. Events? What's that? Threads? Huh?
As hardware became more powerful, concepts of what types of problems could be solved with software became more imaginative and developers more ambitious, the software infrastructure became more sophisticated. It also became much more top-heavy. And here we are today, with a sophisticated, powerful, and in some cases unnecessarily top-heavy software ecosystem which includes threads and "thread-safety".
I realize the question is aimed more at application developers than, say, firmware developers, but looking at the whole forest does offer insights into how that one tree evolved.
So, what is the point of keeping non thread-safe code?
By allowing for code that isn't thread safe you're leaving it up to the programmer to decide what the correct level of isolation is.
As others have mentioned this allows for complexity reduction and improved performance.
Rico Mariani wrote two articles, "Putting your synchronization at the correct level" and "Putting your synchronization at the correct level -- solution", that have a nice example of this in action.
In the article he has a method called DoWork(). In it he calls other classes' Read twice, Write twice, and then LogToSteam.
Read, Write, and LogToSteam all shared a lock and were thread safe. This is good, except that because DoWork was also thread safe, all the synchronizing work in each Read, Write, and LogToSteam was a complete waste of time.
This is all related to the nature of imperative programming. Its side effects cause the need for this.
However, if you had a development platform where applications could be expressed as pure functions, with no dependencies or side effects, then it would be possible to create applications where the threading was managed without developer intervention.
So, what is the point of keeping non thread-safe code?
The rule of thumb is to avoid locking as much as possible. The ideal code is re-entrant and thread safe without any locking. But that would be utopia.
Coming back to reality, a good programmer tries their best to lock small sections rather than the entire context. An example would be locking a few lines of code at a time in various routines rather than locking everything in a function.
Also, one has to refactor the code to come up with a design that minimizes the locking, if not gets rid of it in its entirety.
E.g., consider a foobar() function that gets new data on each call and uses a switch() on the type of data to change a node in a tree. The locking can be mostly (if not completely) avoided, as each case statement would touch a different node in the tree. This may be a rather specific example, but I think it illustrates my point.
In the current implementation of CPython, there is an object known as the "GIL" or "Global Interpreter Lock". It is essentially a mutex that prevents two Python threads from executing Python code at the same time. This prevents two threads from being able to corrupt the state of the Python interpreter, but also prevents multiple threads from really executing together. Essentially, if I do this:
# Thread A
some_list.append(3)
# Thread B
some_list.append(4)
I can't corrupt the list, because at any given time only one of those threads is executing, since they must hold the GIL to do so. Now, the items in the list might be added in some indeterminate order, but the point is that the list isn't corrupted, and two things will always get added.
So, now to C#. C# essentially faces the same problem as Python, so, how does C# prevent this? I'd also be interested in hearing Java's story, if anyone knows it.
Clarification: I'm interested in what happens without explicit locking statements, especially to the VM. I am aware that locking primitives exist for both Java & C# - they exist in Python as well: The GIL is not used for multi-threaded code, other than to keep the interpreter sane. I am interested in the direct equivalent of the above, so, in C#, if I can remember enough... :-)
List<String> s;
// Reference to s is shared by two threads, which both execute this:
s.Add("hello");
// State of s?
// State of the VM? (And if sane, how so?)
Here's another example:
class A
{
    public String s;
}
// Thread A & B
some_A.s = some_other_value;
// some_A's state must change: how does it change?
// Is the VM still in good shape afterwards?
I'm not looking to write bad C# code, I understand the lock statements. Even in Python, the GIL doesn't give you magic-multi-threaded code: you must still lock shared resources. But the GIL prevents Python's "VM" from being corrupted - it is this behavior that I'm interested in.
Most other languages that support threading don't have an equivalent of the Python GIL; they require you to use mutexes, either implicitly or explicitly.
Using lock, you would do this:
lock(some_list)
{
    some_list.Add(3);
}
and in thread 2:
lock(some_list)
{
    some_list.Add(4);
}
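The lock statement ensures that only one thread at a time can execute a block guarded by a lock on the same object, some_list in this case. Note that it only protects against other code that also takes the lock; code that touches some_list without locking is not held back. See http://msdn.microsoft.com/en-us/library/c5kehkcz(VS.80).aspx for more information.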
C# does not have an equivalent of Python's GIL.
Though they face the same issue, their design goals make them different.
With the GIL, CPython ensures that operations such as appending to a list from two threads are simple. This also means that it allows only one thread to run at any time. This makes lists and dictionaries thread safe. Though this makes the job simpler and more intuitive, it makes it harder to exploit the advantage of multithreading on multicore machines.
With no GIL, C# does the opposite. It puts the burden of integrity on the developer of the program, but allows you to take advantage of running multiple threads simultaneously.
As per one of the discussions:
The GIL in CPython is purely a design choice of having a big lock vs. a lock per object and synchronisation to make sure that objects are kept in a coherent state. This consists of a trade-off: giving up the full power of multithreading.
It has been observed that most problems do not suffer from this disadvantage, and there are libraries which help you solve this issue specifically when required. That means that, for a certain class of problems, the burden of utilizing multiple cores is passed to the developer so that the rest can enjoy the simpler, more intuitive approach.
Note: Other implementations, like IronPython, do not have a GIL.
It may be instructive to look at the documentation for the Java equivalent of the class you're discussing:
Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more elements, or explicitly resizes the backing array; merely setting the value of an element is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the list. If no such object exists, the list should be "wrapped" using the Collections.synchronizedList method. This is best done at creation time, to prevent accidental unsynchronized access to the list:
List list = Collections.synchronizedList(new ArrayList(...));
The iterators returned by this class's iterator and listIterator methods are fail-fast: if the list is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove or add methods, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
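A minimal sketch of what the quoted documentation describes, using the `Collections.synchronizedList` wrapper it names. `SharedListDemo` is a hypothetical class. Two threads append to the same list without any explicit lock statement in the calling code, and the wrapper's internal locking keeps the list intact, so no element is lost. Swapping in a bare `ArrayList` makes the result unpredictable (lost elements or an exception are both possible).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SharedListDemo {
    // Two threads each append 'perThread' items; returns the final size of the shared list.
    static int addFromTwoThreads(int perThread) {
        List<Integer> list = Collections.synchronizedList(new ArrayList<Integer>());
        Runnable adder = () -> {
            for (int i = 0; i < perThread; i++) list.add(i); // each add() locks internally
        };
        Thread a = new Thread(adder);
        Thread b = new Thread(adder);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return list.size();
    }
}
```

Note that iterating over such a list still requires synchronizing on it manually; only the individual method calls are protected.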
Most complex data structures (for example, lists) can be corrupted when used from multiple threads without locking.
Since changes of references are atomic, a reference always stays a valid reference.
But there is a problem when interacting with security-critical code. So any data structure used by critical code must be one of the following:
Inaccessible from untrusted code, and locked/used correctly by trusted code
Immutable (String class)
Copied before use (valuetype parameters)
Written in trusted code and uses internal locking to guarantee a safe state
For example, critical code cannot trust a list accessible from untrusted code. If it gets passed a List, it has to create a private copy, do its precondition checks on the copy, and then operate on the copy.
I'm going to take a wild guess at what the question really means...
In Python data structures in the interpreter get corrupted because Python is using a form of reference counting.
Both C# and Java use garbage collection and in fact they do use a global lock when doing a full heap collection.
Data can be marked and moved between "generations" without a lock. But to actually clean it up everything must come to a stop. Hopefully a very short stop, but a full stop.
Here is an interesting link on CLR garbage collection as of 2007:
http://vineetgupta.spaces.live.com/blog/cns!8DE4BDC896BEE1AD!1104.entry
In Python, the yield keyword can be used in both push and pull contexts. I know how to do the pull context in C#, but how would I achieve the push? Here is the Python code I am trying to replicate in C#:
def coroutine(func):
    def start(*args, **kwargs):
        cr = func(*args, **kwargs)
        next(cr)  # advance to the first yield so the coroutine can accept send()
        return cr
    return start

@coroutine
def grep(pattern):
    print("Looking for %s" % pattern)
    try:
        while True:
            line = (yield)  # values are pushed in with grep_instance.send(line)
            if pattern in line:
                print(line, end="")
    except GeneratorExit:
        print("Going away. Goodbye")
If what you want is an "observable collection" -- that is, a collection which pushes results at you rather than letting the consumer pull them -- then you probably want to look into the Reactive Framework extensions. Here's an article on it:
http://www.infoq.com/news/2009/07/Reactive-Framework-LINQ-Events
Now, as you note, you can build both "push" and "pull" style iterators easily if you have coroutines available. (Or, as Thomas points out, you can build them with continuations as well.) In the current version of C# we do not have true coroutines (or continuations). However, we are very concerned about the pain users feel around asynchronous programming.
Implementing fiber-based coroutines as a first-class language feature is one technique that could possibly be used to make asynchronous programming easier, but that is just one possible idea of many that we are at present researching. If you have a really solid awesome scenario where coroutines do a better job than anything else -- including the reactive framework -- then I'd love to hear more about it. The more realistic data we have about what real problems people are facing in asynchronous programming, the more likely we are to come up with a good solution. Thanks!
UPDATE: We have recently announced that we are adding coroutine-like asynchronous control flows to the next version of C# and VB. You can try it yourself with our Community Technology Preview edition, which you can download here.
C# does not have general co-routines. A general co-routine is where the co-routine has its own stack, i.e. it can invoke other methods and those methods can "yield" values. Implementation of general co-routines requires making some smart things with stacks, possibly up to and including allocating stack frames (the hidden structures which contain local variables) on the heap. This can be done, some languages do that (e.g. Scheme), but it is somewhat tricky to do it right. Also, many programmers find the feature difficult to understand.
General co-routines can be emulated with threads. Each thread has its own stack. In a co-routine setup, both threads (the initial caller, and the thread for the co-routine) will alternate control, they will never actually run simultaneously. The "yield" mechanism is then an exchange between the two threads, and as such it is expensive (synchronization, a roundtrip through the OS kernel and scheduler...). Also, there is much room for memory leaks (the co-routine must be explicitly "stopped", otherwise the waiting thread will stick forever). Thus, this is rarely done.
C# provides a bastardized-down co-routine feature called iterators. The C# compiler automatically converts the iterator code into a specific state class, with local variables becoming class fields. Yielding is then, at the VM level, a plain return. Such a thing is doable as long as the "yield" is performed from the iterator code itself, not from a method which the iterator code invokes. C# iterators already cover many use cases and the C# designers were unwilling to go further down the road to continuations. Some sarcastic people are keen to state that implementing full-featured continuations would have prevented C# from being as efficient as its arch-enemy Java (efficient continuations are feasible, but this requires quite some work with the GC and the JIT compiler).
Maybe this will help.
http://blogs.msdn.com/ericlippert/archive/2009/07/23/iterator-blocks-part-five-push-vs-pull.aspx
Thanks @NickLarsen, you helped me remember the new stuff that MS have introduced, the IObservable interface.
link http://msdn.microsoft.com/en-us/library/dd783449(VS.100).aspx
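To show what the "push" direction looks like without coroutines, here is a rough Java sketch of the grep example from the question, written in the observer style that IObservable/IObserver formalizes. `PushGrep` and `grep` are hypothetical names. The caller pushes lines into the returned sink, and matches are pushed on to a callback. A C# version could use `Action<string>` or an `IObserver<string>` with `OnNext` in the same way.

```java
import java.util.function.Consumer;

final class PushGrep {
    // Returns a sink that lines can be pushed into; matching lines are forwarded to onMatch.
    static Consumer<String> grep(String pattern, Consumer<String> onMatch) {
        return line -> {
            if (line.contains(pattern)) onMatch.accept(line);
        };
    }
}
```

A caller would do something like `Consumer<String> sink = PushGrep.grep("python", System.out::println);` followed by `sink.accept(line)` for each incoming line. Unlike the Python coroutine, there is no suspended stack frame here; any state has to live in captured variables or fields.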
Actually .NET does not make "incorrect assumptions" about thread affinity, in fact it totally decouples the notion of a .NET level thread from the OS level thread.
What you have to do is associate a logical .NET thread state with your fiber (for that you need the CLR Hosting APIs, but you don't need to write a host yourself; you can use the ones you need directly from your own application), and everything (lock tracking, exception handling) works normally again.
An example can be found here: http://msdn.microsoft.com/en-us/magazine/cc164086.aspx
Btw Mono 2.6 contains low level Coroutine support and can be used to implement all higher level primitives easily.
I would love to see a fiber-based API for .Net.
I attempted to use the native fiber API in C# through p/invoke a while back, but because the runtime's exception handling (incorrectly) makes thread-based assumptions, things broke (badly) when exceptions happened.
One "killer app" for a fiber-based coroutine API is game programming; certain types of AI require a "lightweight" thread that you can time-slice at will. For example, game behavior trees require the ability to "pulse" the decision code every frame, allowing the AI code to cooperatively yield back to the caller when the decision slice is up. This is possible to implement with hard threads, but much, much more complicated.
So while true fiber use-cases are not mainstream, they definitely exist, and a small niche of us .Net coders would cheer mightily if the existing bugs in the fiber subsystem were worked out.
Well, I had a go at developing a full library to manage coroutines with only a single thread. The hard part was calling coroutines inside coroutines and returning parameters, but in the end I reached a pretty good result here. The only caveats are that blocking I/O operations must be made through tasks and all "return" statements must be replaced with "yield return".
With the application server based on this library I was able to nearly double the requests handled compared with standard async/await on IIS. (Search for Node.Cs and Node.Cs.Musicstore on GitHub to try it at home.)