(Edit: to clarify, my main goal is concurrency, but not necessarily for multi-core machines)
I'm fairly new to all concepts on concurrency, but I figured out I needed to have parallel drawing routines, for a number of reasons:
I wanted to draw different portions of a graphic separately (background refreshed less often than foreground, kept on a buffer).
I wanted control over priority (more priority to UI responsiveness than to drawing a complex graph).
I wanted to have per-frame drawing calculations multithreaded.
I wanted to offer cancelling for complex on-buffer drawing routines.
However, being such a beginner, my code soon looked like a mess and refactoring or bug-fixing became so awkward that I decided I need to play more with it before doing anything serious.
So, I'd like to know how to make clean, easy to maintain .NET multithreaded code that makes sense when I look at it after waking up the next day. The biggest issue I had was structuring the application so all parts talk to each other in a smart (as opposed to awkward and hacky) way.
Any suggestion is welcome, but I have a preference for sources that I can digest in my free time (e.g., not a 500+ page treatise on concurrency) and for C#/VB.NET, up to the latest version (since I see there have been advances). Basically I want something straight to the point so I can get started by playing with the concepts on my toy projects.
"but I figured out I needed to have parallel drawing routines"
Three words: NOT UNDER WINDOWS.
Simple as that. Standard Windows drawing is single-threaded by definition, for compatibility reasons. Any UI control (let's stick to the .NET world) shall ONLY be manipulated from its creating thread (so in reality it is more brutal than single-threaded: it is ONE SPECIFIC THREAD ONLY).
You can do the precalculation separately, but the real drawing has to be done from that one thread.
UNLESS you allocate a bitmap, have your own drawing there, and then turn that over to the UI thread for painting onto the window.
This has nothing to do with the whole Task Parallel Library etc. (which I downvoted) but goes back down to a very old requirement that is kept around for simplicity AND compatibility. This is the reason any UI thread is to be marked as a single-threaded apartment.
Also note that multithreaded drawing, if you implement it yourself, has serious implications. Which one wins optically (stays in the foreground)? This is not really determinable when drawing from multiple threads. You are free to try it, though.
In this case:
Having your own buffer and synchronization is a must. Stay away from any Windows-level graphics library (WPF or WinForms) except for the last step (drawing your bitmap).
DirectX 11 supposedly has some support for multi thread calls, but I am unsure how far that goes.
The Task Parallel Library is definitely the place to look for simplifying your code. I've personally written a (semi-long) introduction to Parallelism with .NET 4 that covers quite a few concepts that would be useful.
Be aware, however, that you probably will want to consider leaving your drawing single threaded. You should try to keep the computation multithreaded, and the actual drawing operations done on the GUI thread.
Most drawing APIs require all actual drawing calls to happen on the same synchronization context.
That being said, using the new collection classes like ConcurrentQueue simplifies this type of code. Try to think in terms of lots of threads (producers) adding "drawing operations" to a shared, concurrent queue, and one thread (the consumer) grabbing the operations and performing them.
This gives you a reasonably scalable, but fairly simple design on which you can build.
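The producer/consumer shape described above can be sketched roughly as follows. This is only an illustration under assumptions: DrawOp and DrawQueueSketch are hypothetical names, the "drawing" is just appending to a list, and the consumer is a plain task standing in for the GUI thread. BlockingCollection is used here as a thin wrapper over ConcurrentQueue so the consumer can block until work arrives.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for a real drawing command.
public sealed class DrawOp
{
    public DrawOp(int producerId, int sequence)
    {
        ProducerId = producerId;
        Sequence = sequence;
    }

    public int ProducerId { get; private set; }
    public int Sequence { get; private set; }
}

public static class DrawQueueSketch
{
    public static List<DrawOp> Run(int producers, int opsPerProducer)
    {
        var drawn = new List<DrawOp>();

        // BlockingCollection adds blocking "take" and completion
        // semantics on top of the underlying ConcurrentQueue.
        using (var queue = new BlockingCollection<DrawOp>(new ConcurrentQueue<DrawOp>()))
        {
            // The single consumer; in a real app this role belongs to the UI thread.
            Task consumer = Task.Run(() =>
            {
                foreach (DrawOp op in queue.GetConsumingEnumerable())
                    drawn.Add(op); // only this thread touches 'drawn'
            });

            // Many producers compute drawing operations concurrently.
            var producerTasks = new Task[producers];
            for (int p = 0; p < producers; p++)
            {
                int id = p; // copy the loop variable for the closure
                producerTasks[p] = Task.Run(() =>
                {
                    for (int i = 0; i < opsPerProducer; i++)
                        queue.Add(new DrawOp(id, i));
                });
            }

            Task.WaitAll(producerTasks);
            queue.CompleteAdding(); // lets the consumer loop finish
            consumer.Wait();
        }

        return drawn;
    }
}
```

Because only one thread ever performs the operations, the question of "which drawing wins" is reduced to the order in which operations leave the queue.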
Related
I would like to write an application that opens many sockets and files. Think of it as a web server (which is not true in my case, but it simplifies the problem here).
If I wrote it in C on Unix I would use poll/select; because there are no multiple threads, everything is easy to write while still being very efficient.
If I use multiple threads to use all cores of the CPU (given that I don't wanna use processes) I would use Unix FIFOs to transfer messages and use still poll/select on each thread (which works flawlessly with files/socket/fifos/). Things are still very simple while being quite efficient.
But when using C# it looks like there are different selects and most classes don't support that programming style at all (HttpWebListener just as one example). I don't like the BeginInvoke messiness because there are things happening in the background on which I don't have any control (ThreadPooling, Shutting down a blocking server gracefully, ...).
I wonder if there is any select/poll-like framework available for C#?
You can actually use your same approaches in C# - you just need to use the lower level Socket class, which provides Select and Poll.
That being said, the new asynchronous methods built on top of socket in the higher level classes tend to have many advantages. Once you learn and understand how they function, they can be very efficient and quite a bit nicer to develop against.
This extends all the way up the stack - with the "highest level" abstractions being frameworks like WCF, which provide huge benefits in terms of productivity, reliability, safety, and ease of development for many types of applications.
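For readers coming from poll/select, here is a minimal sketch of the low-level route using Socket.Select. The class and method names are hypothetical, the loopback client exists only to make the example self-contained, and a single Receive is assumed to be enough for a short message (a real server would loop and frame its messages).

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Sockets;
using System.Text;

public static class SelectSketch
{
    public static string ReceiveViaSelect(string message)
    {
        using (var listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
        using (var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
        {
            listener.Bind(new IPEndPoint(IPAddress.Loopback, 0)); // port 0: let the OS pick
            listener.Listen(1);
            client.Connect(listener.LocalEndPoint);
            client.Send(Encoding.UTF8.GetBytes(message));

            Socket accepted = null;
            var buffer = new byte[1024];

            for (int attempt = 0; attempt < 10; attempt++)
            {
                // Select removes every socket that is NOT ready from the list.
                var readable = new List<Socket> { listener };
                if (accepted != null) readable.Add(accepted);
                Socket.Select(readable, null, null, 1000000); // timeout in microseconds

                foreach (Socket s in readable)
                {
                    if (s == listener)
                    {
                        accepted = listener.Accept(); // a connection is pending
                    }
                    else
                    {
                        int n = s.Receive(buffer);    // data is waiting on the connection
                        s.Dispose();
                        return Encoding.UTF8.GetString(buffer, 0, n);
                    }
                }
            }

            throw new TimeoutException("No data arrived in time.");
        }
    }
}
```

The single loop plays the same role as a select() loop in C: one thread, one list of sockets, no thread pool involved.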
BeginInvoke (or Tasks based on the Begin/End pattern) is the standard model of async programming on .NET. It does indeed force the continuation callbacks to run on the thread pool. If you are fine with that, the Begin/End model is actually very efficient and nice (as nice as callback-based code can be...).
Off the top of my head I cannot see a compelling reason why I wouldn't want to use the thread pool for completion callbacks. Maybe you can squeeze out a little more efficiency using IOCPs.
Select/poll certainly isn't the way to become more efficient, although .NET sockets do support it.
You said "Shutting down a blocking server gracefully" would be a problem. I don't see why. Can you elaborate?
There are a lot of articles and discussions explaining why it is good to build thread-safe classes. It is said that if multiple threads access e.g. a field at the same time, there can only be some bad consequences. So, what is the point of keeping non thread-safe code? I'm focusing mostly on .NET, but I believe the main reasons are not language-dependent.
E.g. .NET static fields are not thread-safe. What would be the result if they were thread-safe by default? (without a need to perform "manual" locking). What are the benefits of using (actually defaulting to) non-thread-safety?
One thing that comes to my mind is performance (more of a guess, though). It's rather intuitive that, when a function or field doesn't need to be thread-safe, it shouldn't be. However, the question is: what for? Is thread-safety just an additional amount of code you always need to implement? In what scenarios can I be 100% sure that e.g. a field won't be used by two threads at once?
Writing thread-safe code:
Requires more skilled developers
Is harder and consumes more coding efforts
Is harder to test and debug
Usually has bigger performance cost
But! Thread-safe code is not always needed. If you can be sure that some piece of code will be accessed by only one thread, the list above becomes a huge and unnecessary overhead. It is like renting a van to go to a neighboring city when there are two of you and not much luggage.
Thread safety comes with costs - you need to lock fields that might cause problems if accessed simultaneously.
In applications that make no use of threads but need high performance, where every CPU cycle counts, there is no reason to have thread-safe classes.
So, what is the point of keeping non thread-safe code?
Cost. Like you assumed, there usually is a penalty in performance.
Also, writing thread-safe code is more difficult and time consuming.
Thread safety is not a "yes" or "no" proposition. The meaning of "thread safety" depends upon context; does it mean "concurrent-read safe, concurrent write unsafe"? Does it mean that the application just might return stale data instead of crashing? There are many things that it can mean.
The main reason not to make a class "thread safe" is the cost. If the type won't be accessed by multiple threads, there's no advantage to putting in the work and increasing the maintenance cost.
Writing threadsafe code is painfully difficult at times. For example, simple lazy loading requires two checks for '== null' and a lock. It's really easy to screw up.
[EDIT]
I didn't mean to suggest that threaded lazy loading was particularly difficult, it's the "Oh and I didn't remember to lock that first!" moments that come fast and hard once you think you're done with the locking that are really the challenge.
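For reference, the "two checks and a lock" pattern mentioned above looks roughly like this. ExpensiveResource and LazySketch are hypothetical names; the second accessor shows Lazy<T> (available since .NET 4), which packages the same idea so there is less to get wrong.

```csharp
using System;
using System.Threading;

// Hypothetical type that is costly to construct.
public sealed class ExpensiveResource
{
    public static int InstancesCreated;

    public ExpensiveResource()
    {
        Interlocked.Increment(ref InstancesCreated);
    }
}

public static class LazySketch
{
    private static readonly object Gate = new object();
    private static volatile ExpensiveResource _manual;

    // Hand-written double-checked locking.
    public static ExpensiveResource GetManual()
    {
        if (_manual == null)          // first check, no lock taken
        {
            lock (Gate)
            {
                if (_manual == null)  // second check, now under the lock
                    _manual = new ExpensiveResource();
            }
        }
        return _manual;
    }

    // The same guarantee delegated to the framework.
    private static readonly Lazy<ExpensiveResource> LazyInstance =
        new Lazy<ExpensiveResource>(
            () => new ExpensiveResource(),
            LazyThreadSafetyMode.ExecutionAndPublication);

    public static ExpensiveResource GetLazy()
    {
        return LazyInstance.Value;
    }
}
```

Dropping either null check or the lock still compiles and usually still "works", which is exactly why this kind of code is easy to get subtly wrong.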
There are situations where "thread-safe" doesn't make sense. This consideration is in addition to the higher developer skill and increased time (development, testing, and runtime all take hits).
For example, List<T> is a commonly-used non-thread-safe class. If we were to create a thread-safe equivalent, how would we implement GetEnumerator? Hint: there is no good solution.
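To make the GetEnumerator problem concrete, here is a small sketch (hypothetical names). The first method shows that List<T> invalidates its enumerator as soon as the list changes, even on a single thread. The second shows one possible compromise, enumerating a snapshot taken under a lock, which costs a copy and can return stale data, so it is a trade-off rather than a real solution.

```csharp
using System;
using System.Collections.Generic;

public static class EnumerationSketch
{
    // Returns true if modifying the list during enumeration throws.
    public static bool ModifyWhileEnumeratingThrows()
    {
        var list = new List<int> { 1, 2, 3 };
        try
        {
            foreach (int item in list)
                list.Add(item); // changes the list under the enumerator
            return false;
        }
        catch (InvalidOperationException)
        {
            return true; // "Collection was modified" style failure
        }
    }

    // Copies the list under the caller's lock, then enumerates the copy.
    public static int SumOfSnapshot(List<int> list, object gate)
    {
        int[] snapshot;
        lock (gate)
        {
            snapshot = list.ToArray();
        }

        int sum = 0;
        foreach (int item in snapshot)
            sum += item;
        return sum;
    }
}
```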
Turn this question on its head.
In the early days of programming there was no Thread-Safe code because there was no concept of threads. A program started, then proceeded step by step to the end. Events? What's that? Threads? Huh?
As hardware became more powerful, as concepts of what types of problems could be solved with software became more imaginative, and as developers became more ambitious, the software infrastructure became more sophisticated. It also became much more top-heavy. And here we are today, with a sophisticated, powerful, and in some cases unnecessarily top-heavy software ecosystem which includes threads and "thread-safety".
I realize the question is aimed more at application developers than, say, firmware developers, but looking at the whole forest does offer insights into how that one tree evolved.
So, what is the point of keeping non thread-safe code?
By allowing for code that isn't thread safe you're leaving it up to the programmer to decide what the correct level of isolation is.
As others have mentioned this allows for complexity reduction and improved performance.
Rico Mariani wrote two articles, "Putting your synchronization at the correct level" and "Putting your synchronization at the correct level -- solution", that have a nice example of this in action.
In the article he has a method called DoWork(). In it he calls other classes' Read twice, Write twice, and then LogToSteam.
Read, Write, and LogToSteam all shared a lock and were thread safe. This is good except for the fact that, because DoWork was also thread safe, all the synchronizing work in each Read, Write and LogToSteam was a complete waste of time.
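The idea can be illustrated with a small sketch. This is not the code from the articles; LedgerSketch and its members are hypothetical. The inner helpers take no locks at all, and the one method that defines the unit of work does the synchronizing once.

```csharp
using System;
using System.Collections.Generic;

public sealed class LedgerSketch
{
    private readonly object _gate = new object();
    private readonly List<string> _log = new List<string>();
    private int _balance;

    // Inner helpers are deliberately NOT thread safe;
    // callers are expected to hold _gate.
    private int Read() { return _balance; }
    private void Write(int value) { _balance = value; }
    private void Log(string line) { _log.Add(line); }

    // Synchronization lives at the level of the whole operation,
    // so read-modify-write-log is atomic and locked exactly once.
    public void DoWork(int amount)
    {
        lock (_gate)
        {
            int before = Read();
            Write(before + amount);
            Log("added " + amount);
        }
    }

    public int Balance
    {
        get { lock (_gate) { return _balance; } }
    }

    public int LogCount
    {
        get { lock (_gate) { return _log.Count; } }
    }
}
```

Had Read, Write and Log each taken the lock as well, the result would be the same but every call to DoWork would pay for four lock acquisitions instead of one.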
This is all related to the nature of imperative programming. Its side effects cause the need for this.
However, if you had a development platform where applications could be expressed as pure functions, with no dependencies or side effects, then it would be possible to create applications where the threading was managed without developer intervention.
So, what is the point of keeping non thread-safe code?
The rule of thumb is to avoid locking as much as possible. The ideal code is re-entrant and thread safe without any locking. But that would be utopia.
Coming back to reality, a good programmer tries their level best to lock small sections rather than the entire context. An example would be to lock a few lines of code at a time in various routines rather than locking everything in a function.
Also, one has to refactor the code to come up with a design that minimizes the locking, if not gets rid of it in its entirety.
E.g., consider a foobar() function that gets new data on each call and uses a switch() on the type of data to change a node in a tree. The locking can be mostly avoided (if not completely), as each case statement would touch a different node in the tree. This may be a rather specific example, but I think it illustrates my point.
Can C# be used for developing a real-time application that involves taking input from web cam continuously and processing the input?
You cannot use any mainstream garbage-collected language for “hard real-time systems”, as the garbage collector will sometimes stop the system from responding within a defined time. Avoiding allocating objects can help; however, you need a way to prove you are not creating any garbage and that the garbage collector will not kick in.
However most “real time” systems don’t in fact need to always respond within a hard time limit, so it all comes down to what you mean by “real time”.
Even when parts of the system need to be “hard real time”, often other large parts of the system, like the UI, don’t.
(I think your app needs to be fast rather than “real time”, if 1 frame is lost every 100 years how many people will get killed?)
I've used C# to create multiple realtime, high speed, machine vision applications that run 24/7 and have moving machinery dependent on the application. If something goes wrong in the software, something immediately and visibly goes wrong in the real world.
I've found that C#/.NET provides pretty good functionality for doing so. As others have said, definitely stay on top of garbage collection. Break up the processing into several logical steps, and have separate threads working on each. I've found the producer-consumer programming model to work well for this, perhaps with ConcurrentQueue for starters.
You could start with something like:
Thread 1 captures the camera image, converts it to some format, and puts it into an ImageQueue
Thread 2 consumes from the ImageQueue, processing the image and comes up with a data object that is put onto a ProcessedQueue
Thread 3 consumes from the ProcessedQueue and does something interesting with the results.
If Thread 2 takes too long, Threads 1 and 3 are still chugging along. If you have a multicore processor you'll be throwing more hardware at the math. You could also use several threads in place of any thread that I wrote above, although you'd have to take care of ordering the results manually.
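The three-thread layout above can be sketched as follows. This is an illustration under assumptions: frames are plain integers, "processing" is squaring them, and the names are hypothetical. Each queue is a bounded BlockingCollection, so a slow stage applies back-pressure instead of letting memory grow without limit.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PipelineSketch
{
    public static List<string> Run(int frameCount)
    {
        var results = new List<string>();

        using (var imageQueue = new BlockingCollection<int>(boundedCapacity: 4))
        using (var processedQueue = new BlockingCollection<int>(boundedCapacity: 4))
        {
            // Stage 1: "capture" frames and hand them to the image queue.
            Task capture = Task.Run(() =>
            {
                for (int frame = 0; frame < frameCount; frame++)
                    imageQueue.Add(frame);
                imageQueue.CompleteAdding();
            });

            // Stage 2: consume images, produce processed data objects.
            Task process = Task.Run(() =>
            {
                foreach (int frame in imageQueue.GetConsumingEnumerable())
                    processedQueue.Add(frame * frame); // placeholder for real image math
                processedQueue.CompleteAdding();
            });

            // Stage 3: consume processed data and act on it.
            Task report = Task.Run(() =>
            {
                foreach (int value in processedQueue.GetConsumingEnumerable())
                    results.Add("result " + value);
            });

            Task.WaitAll(capture, process, report);
        }

        return results;
    }
}
```

With one thread per stage the output order matches the input order. Running several workers for the middle stage would break that guarantee, which is the manual ordering work mentioned above.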
Edit
After reading other people's answers, you could probably argue with my definition of "realtime". In my case, the computer produces targets that it sends to motion controllers, which do the actual realtime motion. The motion controllers provide their own safety layers for things like timing, max/min ranges, smooth accel/decelerations and safety sensors. These controllers read sensors across an entire factory with a cycle time of less than 1ms.
Absolutely. The key will be to avoid garbage collection and memory management as much as possible. Try to avoid new-ing objects as much as possible, using buffers or object pools when you can.
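One way to reuse buffers instead of allocating per frame is sketched below. It assumes a runtime that ships System.Buffers (ArrayPool<T> is a later addition to .NET than this discussion); on older frameworks a hand-rolled pool would play the same role. The names and the checksum "processing" are hypothetical.

```csharp
using System;
using System.Buffers;

public static class FrameBufferSketch
{
    public static long ProcessFrames(int frameCount, int frameSize)
    {
        ArrayPool<byte> pool = ArrayPool<byte>.Shared;
        long checksum = 0;

        for (int frame = 0; frame < frameCount; frame++)
        {
            // Rent may hand back an array LARGER than requested,
            // so always use frameSize, not buffer.Length.
            byte[] buffer = pool.Rent(frameSize);
            try
            {
                // Placeholder for "capture": fill the buffer with frame data.
                for (int i = 0; i < frameSize; i++)
                    buffer[i] = (byte)(frame + i);

                // Placeholder for "processing": sum the bytes.
                for (int i = 0; i < frameSize; i++)
                    checksum += buffer[i];
            }
            finally
            {
                pool.Return(buffer); // hand the array back for the next frame
            }
        }

        return checksum;
    }
}
```

In steady state the loop rents the same arrays over and over, so the per-frame work creates no new garbage for the collector to chase.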
Of course, someone has even developed a library to do that: AForge.NET
As with any real-time application, and not just in C#, you'll have to manage the buffers well, as @David suggested.
Not only that, there's also the XNA Framework (for things like 3D games), and you can program DirectX using C# as well, both of which are very real-time.
And did you know that, if you want, you can do pointer manipulations in C# too?
It depends on how 'real-time' it needs to be; i.e., what your timing constraints are, and how quickly you need to 'do something'.
If you can handle 'doing something' maybe every 300ms or so in .NET, say on a timer event, I've found Windows to work okay. Note that this is something I found true on multiple systems of different ages and different speeds. As always, YMMV.
But that number is awfully long for a lot of applications. Maybe not for yours.
Do some research, make sure your app responds quickly enough for your application.
Learning about threading is fascinating, no doubt, and there are some really good resources for doing so. But my question is: is threading applied explicitly, either as part of design or development, in real-world applications?
I have worked on some extensively used and well-architected .NET apps in C# but found no trace of explicit usage. Is there no real need because this is managed by the CLR, or is there some other specific reason?
Also, any examples of threading coded in widely used .NET apps on CodePlex or Google Code are welcome.
The simplest place to use threading is performing a long operation in a GUI while keeping the UI responsive.
If you perform the operation on the UI thread, the entire GUI will freeze until it finishes. (Because it won't run a message loop)
By executing it on a background thread, the UI will remain responsive.
The BackgroundWorker class is very useful here.
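A minimal sketch of the BackgroundWorker pattern follows, with hypothetical names. It is written as a console-style helper so it can run on its own: the blocking wait at the end exists only for that reason. In a real GUI you would not wait; you would return to the message loop and update controls inside RunWorkerCompleted, which BackgroundWorker raises on the UI thread when one is present.

```csharp
using System;
using System.ComponentModel;
using System.Threading.Tasks;

public static class BackgroundWorkerSketch
{
    public static int RunLongOperation(int n)
    {
        var finished = new TaskCompletionSource<int>();

        using (var worker = new BackgroundWorker())
        {
            // Runs on a background thread; must not touch UI controls.
            worker.DoWork += (sender, e) =>
            {
                int limit = (int)e.Argument;
                int sum = 0;
                for (int i = 1; i <= limit; i++)
                    sum += i; // placeholder for the long-running work
                e.Result = sum;
            };

            // Raised when DoWork ends; the place to update the UI in a GUI app.
            worker.RunWorkerCompleted += (sender, e) =>
            {
                if (e.Error != null)
                    finished.SetException(e.Error);
                else
                    finished.SetResult((int)e.Result);
            };

            worker.RunWorkerAsync(n);

            // Console stand-in only: a GUI thread would never block like this.
            return finished.Task.GetAwaiter().GetResult();
        }
    }
}
```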
is threading applied explicitly either as part of design or development in real-world applications.
In order to take full advantage of modern, multi-core systems, threading must be part of the design from the start. While it's fairly easy (especially in .NET 4) to find small portions of code to thread, to get real scalability, you need to design your algorithms to handle being threaded, preferably at a "high level" in your code. The earlier this is done in the design phases, the easier it is to properly build threading into an application.
Is there no real need due to this being managed by CLR or is there any specific reason?
There is definitely a need. Threading doesn't come for free - it must be added in by the developer. The main reason this isn't found very often, especially in open source code, is really more a matter of difficulty. Even using .NET 4, properly designing algorithms to thread in a scalable, safe manner is difficult.
That entirely depends on the application.
For a client app that ever needs to do any significant work (or perform other potentially long-running tasks, such as making web service calls) I'd expect background threads to be used. This could be achieved via BackgroundWorker, explicit use of the thread pool, explicit use of Parallel Extensions, or creating new threads explicitly.
Web services and web applications are somewhat less likely to create their own threads, in my experience. You're more likely to effectively treat each request as having a separate thread (even if ASP.NET moves it around internally) and perform everything synchronously. Of course there are web applications which either execute asynchronously or start threads for other reasons - but I'd say this comes up less often than in client apps.
Definitely a +1 on the Parallel Extensions to .NET. Microsoft has done some great work here to improve the ThreadPool. You used to have one global queue which handled all tasks, even if they were spawned from a worker thread. Now they have a lock-free global queue and local queues for each worker thread. That's a very nice improvement.
I'm not as big a fan of things like Parallel.For, Parallel.ForEach, and Parallel.Invoke (regions), as I believe they should be pure language extensions rather than class libraries. Obviously, I understand why we have this intermediate step, but it's inevitable for C# to gain language improvements for concurrency and it's equally inevitable that we'll have to go back and change our code to take advantage of it :-)
Overall, if you're looking at building concurrent apps in .NET, you owe it to yourself to research the heck out of the Parallel Extensions. I also think, given that this is a pretty nascent effort from Microsoft, you should be very vocal about what works for you and what doesn't, independent of what you perceive your own skill level to be with concurrency. Microsoft is definitely listening, but I don't think there are that many people yet using the Parallel Extensions. I was at VSLive Redmond yesterday and watched a session on this topic and continue to be impressed with the team working on this.
Disclosure: I used to be the Marketing Director for Visual Studio and am now at a startup called Corensic where we're building tools to detect bugs in concurrent apps.
Most real-world usages of threading I've seen are simply to avoid blocking - UI, network, database calls, etc.
You might see it in use as BeginXXX and EndXXX method pairs, delegate.BeginInvoke calls, Control.Invoke calls.
Some systems I've seen, where threading would be a boon, actually use the isolation principle to achieve multiple "threads", in other words, split the work down into completely unrelated chunks and process them all independently of each other - "multi-threading" (or many-core utilisation) is automagically achieved by simply running all the processes at once.
I think it's fair to say that a lot of stock-in-trade applications (data presentation) largely do not require massive parallelisation, nor can they always be architected to suit it. The examples I've seen are all very specific problems. This may contribute to why you've not seen any noticeable implementations of it.
The question of whether to make use of an explicit threading implementation is normally a design consideration as others have mentioned here. Trying to implement concurrency as an afterthought usually requires a lot of radical and wholesale changes.
Keep in mind that simply throwing threads into an application doesn't inherently increase performance or speed, given that there is a cost in managing each thread, and also perhaps some memory overhead (not to mention, debugging it can be fun).
From my experience, the most common place to implement a threading design has been in Windows Services (background applications) and on applications which have had use case scenarios where a volume of work could be easily split up into smaller parcels of work (and handed off to threads to complete asynchronously).
As for examples, you could check out the Microsoft Robotics Studio (as far as I know there's a free version now) - it comes with a redistributable (I can't find it as a standalone download) of the Concurrency and Coordination Runtime, and there's some coverage of it on Microsoft's Channel 9.
As mentioned by others the Parallel Extensions team (blog is here) have done some great work with thread safety and parallel execution and you can find some samples/examples on the MSDN Code site.
Threading is used in all sorts of scenarios. Anything network-based depends on threading, whether explicit (sockets stuff) or implicit (web services). Threading keeps the UI responsive. And Windows services often have multiple parallel runs doing the same thing, working through queues of data that need to be processed.
Those are just the most common ones I've seen.
Most answers reference long-running tasks in a GUI application. Another very common usage scenario in my experience is producer/consumer queues. We have many utility applications that have to perform web requests etc., often to a large number of endpoints. We use the producer/consumer threading pattern (usually by integrating a custom thread pool) to allow high parallelization of these tasks.
In fact, at this very moment I am checking up on an application that uploads a 200MB file to 200 different FTP locations. We use SmartThreadPool and run up to around 50 uploads in parallel, which allows the whole batch to complete in under one hour (as opposed to over 50 hours were all the uploads to happen consecutively - so in our usage we find almost straight linear improvements in time).
As modern day programmers we love abstractions so we use threads by calling Async methods or BeginInvoke and by using things like BackgroundWorker or PFX in .Net 4.
Yet sometimes there is a need to do the threading yourself. For example, in a web app I built I have a mail queue that I add to from within the app, and there is a background thread that sends the emails. If the thread notices that the queue is filling up faster than it is sending, it creates another thread; if it then sees that that thread is idle, it kills it. This can be done with a higher-level abstraction, I guess, but I did it manually.
I can't resist the edge case - in some applications where either a high degree of operational certainty must be achieved or a high degree of operational uncertainty must be tolerated, threads and processes are considered from initial architecture design all the way through end delivery.
Case 1 - for systems that must achieve extremely high levels of operational reliability, three completely separate subsystems using three different mechanisms may be used in a voting architecture - spawn 3 threads/processes across each of the voters, wait for them to conclude/die/be killed, and proceed IFF they all say the same thing - example - complex avionic subsystems
Case 2 - for systems that must deal with a high degree of operational uncertainty - do the same thing, but once something/anything gets back to you, kill off the stragglers and go forth with the best answer you got - example - complex intraday trading algorithms endeavoring to destroy the business that employ them :-)
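Both cases map fairly directly onto Task.WhenAll and Task.WhenAny. The sketch below is only an illustration with hypothetical names: the voters are plain delegates rather than independent subsystems, and "killing the stragglers" is done by cooperative cancellation, since .NET tasks cannot be forcibly killed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class RedundancySketch
{
    // Case 1: run every voter and proceed only if all answers agree.
    public static async Task<int> UnanimousAsync(params Func<int>[] voters)
    {
        int[] answers = await Task.WhenAll(voters.Select(v => Task.Run(v)));
        if (answers.Distinct().Count() != 1)
            throw new InvalidOperationException("Voters disagree.");
        return answers[0];
    }

    // Case 2: take whichever answer arrives first and cancel the rest.
    public static async Task<int> FirstAnswerAsync(
        params Func<CancellationToken, Task<int>>[] sources)
    {
        using (var cts = new CancellationTokenSource())
        {
            List<Task<int>> running = sources.Select(s => s(cts.Token)).ToList();
            Task<int> winner = await Task.WhenAny(running);
            cts.Cancel(); // ask the stragglers to stop
            return await winner;
        }
    }
}
```

A possible usage: `await RedundancySketch.FirstAnswerAsync(ct => QueryA(ct), ct => QueryB(ct))`, where QueryA and QueryB are whatever cancellable operations you are racing.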
I'm writing a book on multicore programming using .NET 4 and I'm curious to know what parts of multicore programming people have found difficult to grok or anticipate being difficult to grok?
What's a useful unit of work to parallelize, and how do I find/organize one?
All these parallelism primitives aren't helpful if you fork a piece of work that is smaller than the forking overhead; in fact, that buys you a nice slowdown instead of what you are expecting.
So one of the big problems is finding units of work that are obviously more expensive than the parallelism primitives. A key problem here is that nobody knows what anything costs to execute, including the parallelism primitives themselves. Clearly calibrating these costs would be very helpful. (As an aside, we designed, implemented, and daily use a parallel programming language, PARLANSE, whose objective was to minimize the cost of the parallelism primitives by allowing the compiler to generate and optimize them, with the goal of making smaller bits of work "more parallelizable").
One might also consider discussing big-Oh notation and its applications. We all hope that the parallelism primitives have cost O(1). If that's the case, then if you find work with cost O(x) > O(1) then that work is a good candidate for parallelization. If your proposed work is also O(1), then whether it is effective or not depends on the constant factors and we are back to calibration as above.
There's the problem of collecting work into large enough units, if none of the pieces are large enough. Code motion, algorithm replacement, ... are all useful ideas to achieve this effect.
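One concrete way to collect tiny pieces of work into larger units in .NET 4 is a range partitioner: each parallel work item becomes a whole index range instead of a single element. The sketch below uses hypothetical names and a trivially cheap per-element operation (squaring), which is exactly the kind of work that would not pay for per-element forking. The right chunk size is an assumption to be calibrated, not a known constant.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class ChunkingSketch
{
    public static long SumOfSquares(int count, int chunkSize)
    {
        long total = 0;

        Parallel.ForEach(
            // Each work item is a [from, to) range of chunkSize indices.
            Partitioner.Create(0, count, chunkSize),
            // Per-thread running total, so the hot loop needs no locking.
            () => 0L,
            (range, loopState, local) =>
            {
                for (int i = range.Item1; i < range.Item2; i++)
                    local += (long)i * i;
                return local;
            },
            // Merge each thread's subtotal exactly once.
            local => Interlocked.Add(ref total, local));

        return total;
    }
}
```

The synchronization cost (one Interlocked.Add per worker rather than per element) shrinks along with the forking cost, which addresses both overheads at once.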
Lastly, there's the problem of synchronization: when do my parallel units have to interact, what primitives should I use, and how much do those primitives cost? (More than you expect!).
I guess some of it depends on how basic or advanced the book/audience is. When you go from single-threaded to multi-threaded programming for the first time, you typically fall off a huge cliff (and many never recover, see e.g. all the muddled questions about Control.Invoke).
Anyway, to add some thoughts that are less about the programming itself, and more about the other related tasks in the software process:
Measuring: deciding what metric you are aiming to improve, measuring it correctly (it is so easy to accidentally measure the wrong thing), using the right tools, differentiating signal versus noise, interpreting the results and understanding why they are as they are.
Testing: how to write tests that tolerate unimportant non-determinism/interleavings, but still pin down correct program behavior.
Debugging: tools, strategies, when "hard to debug" implies feedback to improve your code/design and better partition mutable state, etc.
Physical versus logical thread affinity: understanding the GUI thread, understanding how e.g. an F# MailboxProcessor/agent can encapsulate mutable state and run on multiple threads but always with only a single logical thread (one program counter).
Patterns (and when they apply): fork-join, map-reduce, producer-consumer, ...
I expect that there will be a large audience for e.g. "help, I've got a single-threaded app with 12% CPU utilization, and I want to learn just enough to make it go 4x faster without much work" and a smaller audience for e.g. "my app is scaling sub-linearly as we add cores because there seems to be contention here, is there a better approach to use?", and so a bit of the challenge may be serving each of those audiences.
Since you are writing a whole book on multi-core programming in .NET, I think you can also go beyond multi-core a little bit.
For example, you could use a chapter to talk about parallel computing in a distributed system in .NET. Unfortunately, there are no mature frameworks in .NET yet. DryadLINQ is the closest. (On the other side, Hadoop and its friends on the Java platform are really good.)
You can also use a chapter demonstrating some GPU computing stuff.
One thing that has tripped me up is which approach to use to solve a particular type of problem. There's agents, there's tasks, async computations, MPI for distribution - for many problems you could use multiple methods but I'm having difficulty understanding why I should use one over another.
To understand: low level memory details like the difference between acquire and release semantics of memory.
Most of the rest of the concepts and ideas (anything can interleave, race conditions, ...) are not that difficult with a little usage.
Of course the practice, especially if something is failing sometimes, is very hard as you need to work at multiple levels of abstraction to understand what is going on, so keep your design simple and as far as possible design out the need for locking etc. (e.g. using immutable data and higher level abstractions).
It's not so much the theoretical details but more the practical implementation details that trip people up.
What's the deal with immutable data structures?
All the time, people try to update a data structure from multiple threads, find it too hard, and someone chimes in "use immutable data structures!", and so our persistent coder writes this:
    ImmutableSet set;

    ThreadLoop1()
        foreach (Customer c in dataStore1)
            set = set.Add(ProcessCustomer(c));

    ThreadLoop2()
        foreach (Customer c in dataStore2)
            set = set.Add(ProcessCustomer(c));
Coder has heard all their lives that immutable data structures can be updated without locking, but the new code doesn't work for obvious reasons.
Even if you're targeting academics and experienced devs, a little primer on the basics of immutable programming idioms can't hurt.
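The reason the snippet above fails is that `set = set.Add(...)` is a read-modify-write on a shared variable: both threads can read the same old set, each add their own item, and the later write silently discards the earlier one. One possible fix, sketched here with hypothetical names and integers in place of customers, keeps the immutable set but swaps the reference atomically. It assumes the System.Collections.Immutable package, which is a later addition than this discussion.

```csharp
using System;
using System.Collections.Immutable;
using System.Threading.Tasks;

public static class ImmutableUpdateSketch
{
    private static ImmutableHashSet<int> _set = ImmutableHashSet<int>.Empty;

    public static ImmutableHashSet<int> Run()
    {
        _set = ImmutableHashSet<int>.Empty;

        Task loop1 = Task.Run(() => { for (int i = 0; i < 1000; i++) Add(i); });
        Task loop2 = Task.Run(() => { for (int i = 1000; i < 2000; i++) Add(i); });
        Task.WaitAll(loop1, loop2);

        return _set;
    }

    private static void Add(int value)
    {
        // Update reads the current set, applies the function, and
        // publishes the result with compare-and-swap, retrying if
        // another thread replaced the set in the meantime.
        ImmutableInterlocked.Update(ref _set, current => current.Add(value));
    }
}
```

The data structure itself never needed a lock; it is the shared variable pointing at it that needs an atomic update.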
How to partition roughly equal amounts of work between threads?
Getting this step right is hard. Sometimes you break up a single process into 10,000 steps which can be executed in parallel, but not all steps take the same amount of time. If you split the work on 4 threads, and the first 3 threads finish in 1 second, and the last thread takes 60 seconds, your multithreaded program isn't much better than the single-threaded version, right?
So how do you partition problems into roughly equal amounts of work across all threads? Lots of good heuristics for solving bin packing problems should be relevant here.
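When the cost of each step is not known up front, one alternative to packing the work in advance is dynamic partitioning: workers pull items as they become free, so a thread that drew cheap items simply takes more. The sketch below uses the load-balancing partitioner from .NET 4. The names are hypothetical and the "work" is a spin wait proportional to a made-up cost.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class LoadBalanceSketch
{
    public static long ProcessAll(IList<int> costs)
    {
        long totalWork = 0;

        // loadBalance: true hands out items on demand instead of
        // pre-splitting the list into one fixed slice per thread.
        var partitioner = Partitioner.Create(costs, loadBalance: true);

        Parallel.ForEach(partitioner, cost =>
        {
            Thread.SpinWait(cost);                // placeholder for uneven work
            Interlocked.Add(ref totalWork, cost); // record what was processed
        });

        return totalWork;
    }
}
```

With a static split, the one thread that drew the 60-second items decides the total run time. With on-demand partitioning the imbalance is limited to roughly the cost of the single most expensive item, at the price of a little more coordination.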
How many threads?
If your problem is nicely parallelizable, adding more threads should make it faster, right? Well not really, lots of things to consider here:
Even on a single-core processor, adding more threads can make a program faster because more threads give more opportunities for the OS to schedule your program, so it gets more execution time than the single-threaded version. But by the law of diminishing returns, adding more threads increases context switching, so at a certain point, even if your program has the most execution time, the performance could still be worse than the single-threaded version.
So how do you spin off just enough threads to minimize execution time?
And if there are lots of other apps spinning up threads and competing for resources, how do you detect performance changes and adjust your program automagically?
I find the concept of synchronized data moving across worker nodes in complex patterns very hard to visualize and program.
Usually I find debugging to be a bear, also.