C# threading in real-world apps

Learning about threading is fascinating, no doubt, and there are some really good resources for it. But my question is: is threading applied explicitly, either as part of design or development, in real-world applications?
I have worked on some extensively used and well-architected .NET apps in C# but found no trace of explicit usage. Is there no real need because this is managed by the CLR, or is there some other specific reason?
Also, any examples of threading code in widely used .NET apps on CodePlex or Google Code are welcome.

The simplest place to use threading is performing a long operation in a GUI while keeping the UI responsive.
If you perform the operation on the UI thread, the entire GUI will freeze until it finishes (because the UI thread can't run its message loop while it's busy).
By executing it on a background thread, the UI will remain responsive.
The BackgroundWorker class is very useful here.
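A minimal console sketch of the BackgroundWorker pattern, assuming .NET 6 or later (top-level statements). The summation is a hypothetical stand-in for the long operation. In a console app there is no UI SynchronizationContext, so RunWorkerCompleted fires on a thread-pool thread; when RunWorkerAsync is called from a WinForms or WPF UI thread, the completion handler is marshalled back to that thread instead.

```csharp
using System;
using System.ComponentModel;
using System.Threading;

// Signalled from RunWorkerCompleted so this console demo can wait for the result.
var done = new ManualResetEventSlim(false);
long result = 0;

var worker = new BackgroundWorker();

worker.DoWork += (sender, e) =>
{
    // Runs on a thread-pool thread: never touch UI controls from here.
    int n = (int)e.Argument!;
    long sum = 0;
    for (int i = 1; i <= n; i++) sum += i;   // stand-in for the long operation
    e.Result = sum;
};

worker.RunWorkerCompleted += (sender, e) =>
{
    // When started from a WinForms/WPF UI thread, this runs back on that thread,
    // so it is the safe place to update controls with the result.
    if (e.Error != null) Console.WriteLine($"Failed: {e.Error.Message}");
    else result = (long)e.Result!;
    done.Set();
};

worker.RunWorkerAsync(1_000_000);
done.Wait();   // a GUI app would not block here; it keeps pumping messages instead
Console.WriteLine($"Sum = {result}");
```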

is threading applied explicitly either as part of design or development in real-world applications.
In order to take full advantage of modern, multi-core systems, threading must be part of the design from the start. While it's fairly easy (especially in .NET 4) to find small portions of code to thread, to get real scalability, you need to design your algorithms to handle being threaded, preferably at a "high level" in your code. The earlier this is done in the design phases, the easier it is to properly build threading into an application.
Is there no real need due to this being managed by CLR or is there any specific reason?
There is definitely a need. Threading doesn't come for free - it must be added in by the developer. The main reason this isn't found very often, especially in open source code, is really more a matter of difficulty. Even using .NET 4, properly designing algorithms to thread in a scalable, safe manner is difficult.

That entirely depends on the application.
For a client app that ever needs to do any significant work (or perform other potentially long-running tasks, such as making web service calls) I'd expect background threads to be used. This could be achieved via BackgroundWorker, explicit use of the thread pool, explicit use of Parallel Extensions, or creating new threads explicitly.
Web services and web applications are somewhat less likely to create their own threads, in my experience. You're more likely to effectively treat each request as having a separate thread (even if ASP.NET moves it around internally) and perform everything synchronously. Of course there are web applications which either execute asynchronously or start threads for other reasons - but I'd say this comes up less often than in client apps.

Definitely a +1 on the Parallel Extensions to .NET. Microsoft has done some great work here to improve the ThreadPool. You used to have one global queue which handled all tasks, even if they were spawned from a worker thread. Now they have a lock-free global queue and local queues for each worker thread. That's a very nice improvement.
I'm not as big a fan of things like Parallel.For, Parallel.ForEach, and Parallel.Invoke (regions), as I believe they should be pure language extensions rather than class libraries. Obviously, I understand why we have this intermediate step, but it's inevitable for C# to gain language improvements for concurrency and it's equally inevitable that we'll have to go back and change our code to take advantage of it :-)
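For readers comparing these library constructs, here is a small sketch (assuming .NET 6+ and top-level statements; Fib is a hypothetical CPU-bound workload) of Parallel.For with thread-local partial sums, and Parallel.Invoke running two independent regions.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Parallel.For with thread-local partial sums: each worker accumulates privately and
// merges once at the end, so there is no shared write on every iteration.
long total = 0;
Parallel.For(0, 1_000_000,
    () => 0L,                                     // localInit: per-worker running sum
    (i, loopState, local) => local + i,           // body: touches only the local sum
    local => Interlocked.Add(ref total, local));  // localFinally: merge once per worker

// Parallel.Invoke runs independent actions ("regions") and returns when all finish.
int a = 0, b = 0;
Parallel.Invoke(
    () => a = Fib(25),
    () => b = Fib(26));

Console.WriteLine($"total = {total}, fib(25) = {a}, fib(26) = {b}");

// Deliberately naive recursive Fibonacci, used only as CPU-bound filler work.
static int Fib(int n) => n < 2 ? n : Fib(n - 1) + Fib(n - 2);
```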
Overall, if you're looking at building concurrent apps in .NET, you owe it to yourself to research the heck out of the Parallel Extensions. I also think, given that this is a pretty nascent effort from Microsoft, you should be very vocal about what works for you and what doesn't, independent of what you perceive your own skill level to be with concurrency. Microsoft is definitely listening, but I don't think there are that many people yet using the Parallel Extensions. I was at VSLive Redmond yesterday and watched a session on this topic and continue to be impressed with the team working on this.
Disclosure: I used to be the Marketing Director for Visual Studio and am now at a startup called Corensic where we're building tools to detect bugs in concurrent apps.

Most real-world usage of threading I've seen is simply to avoid blocking: UI, network, database calls, etc.
You might see it in use as BeginXXX and EndXXX method pairs, delegate.BeginInvoke calls, and Control.Invoke calls.
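A sketch of the Begin/End (APM) pair mentioned above, using Stream.BeginRead/EndRead on a MemoryStream as a convenient stand-in for a file or network stream, followed by the same pair wrapped into a Task with Task<int>.Factory.FromAsync. It assumes .NET 6+; delegate.BeginInvoke is deliberately not shown, since it throws PlatformNotSupportedException on .NET Core and later.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

var data = Encoding.ASCII.GetBytes("hello, APM");
using var stream = new MemoryStream(data);
var buffer = new byte[data.Length];

// 1) Classic APM: BeginRead starts the operation, the callback calls EndRead.
var completed = new ManualResetEventSlim(false);
int apmBytes = 0;
stream.BeginRead(buffer, 0, buffer.Length, ar =>
{
    apmBytes = stream.EndRead(ar);   // EndXXX returns the result (or rethrows the error)
    completed.Set();
}, null);
completed.Wait();

// 2) The same Begin/End pair wrapped as a Task, which composes with await.
stream.Position = 0;
int tapBytes = await Task<int>.Factory.FromAsync(
    stream.BeginRead, stream.EndRead, buffer, 0, buffer.Length, null);

Console.WriteLine($"{apmBytes} {tapBytes} {Encoding.ASCII.GetString(buffer)}");
```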
Some systems I've seen, where threading would be a boon, actually use the isolation principle to achieve multiple "threads", in other words, split the work down into completely unrelated chunks and process them all independently of each other - "multi-threading" (or many-core utilisation) is automagically achieved by simply running all the processes at once.
I think it's fair to say that a lot of stock-in-trade applications (data presentation) largely do not require massive parallelisation, nor can they always be architected to suit it. The examples I've seen are all very specific problems. This may account for why you've not seen any noticeable implementations of it.

The question of whether to make use of an explicit threading implementation is normally a design consideration as others have mentioned here. Trying to implement concurrency as an afterthought usually requires a lot of radical and wholesale changes.
Keep in mind that simply throwing threads into an application doesn't inherently increase performance or speed, given that there is a cost in managing each thread, and also perhaps some memory overhead (not to mention, debugging it can be fun).
From my experience, the most common place to implement a threading design has been in Windows Services (background applications) and on applications which have had use case scenarios where a volume of work could be easily split up into smaller parcels of work (and handed off to threads to complete asynchronously).
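Splitting a volume of work into independent parcels can be sketched with Partitioner.Create, which hands out index ranges to Parallel.ForEach. This assumes .NET 6+; counting primes is a hypothetical stand-in for the real workload, and the parcel size of 10,000 is an arbitrary assumption worth tuning.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

const int N = 200_000;
const int ParcelSize = 10_000;

// Partitioner.Create(from, to, rangeSize) yields Tuple<int, int> ranges [Item1, Item2).
// Each parcel is processed in isolation; the only shared write is one Interlocked.Add
// per parcel when it finishes.
int primeCount = 0;
Parallel.ForEach(Partitioner.Create(2, N + 1, ParcelSize), range =>
{
    int local = 0;
    for (int n = range.Item1; n < range.Item2; n++)
        if (IsPrime(n)) local++;
    Interlocked.Add(ref primeCount, local);
});

Console.WriteLine($"primes <= {N}: {primeCount}");

// Trial division: slow on purpose, so each parcel has real CPU work to do.
static bool IsPrime(int n)
{
    if (n < 2) return false;
    for (int d = 2; (long)d * d <= n; d++)
        if (n % d == 0) return false;
    return true;
}
```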
As for examples, you could check out Microsoft Robotics Studio (as far as I know there's a free version now). It comes with a redistributable of the Concurrency and Coordination Runtime (I can't find it as a standalone download), and there's some coverage of it on Microsoft's Channel 9.
As mentioned by others, the Parallel Extensions team has done some great work with thread safety and parallel execution, and you can find some samples/examples on the MSDN Code site.

Threading is used in all sorts of scenarios. Anything network-based depends on threading, whether explicitly (socket code) or implicitly (web services). Threading keeps the UI responsive. And Windows services often run multiple parallel workers doing the same processing, each working through queues of data waiting to be processed.
Those are just the most common ones I've seen.

Most answers reference long-running tasks in a GUI application. Another very common usage scenario in my experience is producer/consumer queues. We have many utility applications that have to perform web requests etc., often to a large number of endpoints. We use the producer/consumer threading pattern (usually by integrating a custom thread pool) to allow high parallelization of these tasks.
In fact, at this very moment I am checking up on an application that uploads a 200MB file to 200 different FTP locations. We use SmartThreadPool and run up to around 50 uploads in parallel, which allows the whole batch to complete in under one hour (as opposed to over 50 hours if all the uploads happened consecutively), so in our usage we find an almost linear improvement in time.
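A minimal sketch of a producer/consumer setup that caps concurrency the way this answer describes, using the built-in BlockingCollection<T> instead of SmartThreadPool (a swapped-in technique, not the answer's code). It assumes .NET 6+; Upload is a hypothetical stand-in (a 10 ms sleep) for the real FTP transfer, the destination URLs are made up, and the cap of 50 comes from the answer.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for one blocking FTP upload.
static void Upload(string destination) => Thread.Sleep(10);

const int MaxParallelUploads = 50;
var destinations = Enumerable.Range(1, 200).Select(i => $"ftp://host{i}/").ToList();

var queue = new BlockingCollection<string>();
int uploaded = 0;

// Consumers: a fixed number of workers, so at most MaxParallelUploads run at once.
// LongRunning asks the default scheduler for dedicated threads, since these block.
var consumers = Enumerable.Range(0, MaxParallelUploads)
    .Select(_ => Task.Factory.StartNew(() =>
    {
        foreach (var dest in queue.GetConsumingEnumerable())
        {
            Upload(dest);
            Interlocked.Increment(ref uploaded);
        }
    }, TaskCreationOptions.LongRunning))
    .ToArray();

// Producer: enqueue the jobs, then signal that no more will arrive.
foreach (var d in destinations) queue.Add(d);
queue.CompleteAdding();

await Task.WhenAll(consumers);
Console.WriteLine($"uploaded {uploaded} of {destinations.Count}");
```

Capping the number of consumers, rather than starting one task per destination, is what bounds the number of simultaneous transfers; each consumer exits once the queue is marked complete and drained.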

As modern day programmers we love abstractions so we use threads by calling Async methods or BeginInvoke and by using things like BackgroundWorker or PFX in .Net 4.
Yet sometimes there is a need to do the threading yourself. For example, in a web app I built I have a mail queue that I add to from within the app, and a background thread that sends the emails. If the thread notices that the queue is filling up faster than it is sending, it creates another thread; if it then sees that that thread is idle, it kills it. This could be done with a higher-level abstraction, I guess, but I did it manually.
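A hand-rolled version of such a mail queue might look like this sketch: one dedicated background thread, a Queue<string>, and Monitor.Wait/Pulse for signalling. It assumes .NET 6+; sending is simulated by recording the message (a hypothetical stand-in for a real send call), and the answer's extra step of spawning and killing helper threads under load is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

var pending = new Queue<string>();
var gate = new object();
bool shuttingDown = false;
var sent = new List<string>();   // written only by the sender thread until Join returns

void Enqueue(string message)
{
    lock (gate)
    {
        pending.Enqueue(message);
        Monitor.Pulse(gate);      // wake the sender if it is waiting
    }
}

var sender = new Thread(() =>
{
    while (true)
    {
        string message;
        lock (gate)
        {
            while (pending.Count == 0 && !shuttingDown)
                Monitor.Wait(gate);           // releases the lock while waiting
            if (pending.Count == 0) return;   // shutting down and fully drained
            message = pending.Dequeue();
        }
        sent.Add(message);   // hypothetical "send email", done outside the lock
    }
}) { IsBackground = true, Name = "mail-sender" };
sender.Start();

for (int i = 0; i < 5; i++) Enqueue($"mail #{i}");

// Graceful shutdown: the sender drains what is queued, then exits.
lock (gate) { shuttingDown = true; Monitor.Pulse(gate); }
sender.Join();
Console.WriteLine($"sent {sent.Count} messages");
```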

I can't resist the edge case: in some applications where either a high degree of operational certainty must be achieved or a high degree of operational uncertainty must be tolerated, threads and processes are considered from the initial architecture design all the way through to final delivery.
Case 1 - for systems that must achieve extremely high levels of operational reliability, three completely separate subsystems using three different mechanisms may be used in a voting architecture: spawn 3 threads/processes across the voters, wait for them to conclude/die/be killed, and proceed if and only if they all say the same thing. Example: complex avionics subsystems.
Case 2 - for systems that must deal with a high degree of operational uncertainty, do the same thing, but once something/anything gets back to you, kill off the stragglers and go forth with the best answer you've got. Example: complex intraday trading algorithms endeavoring to destroy the businesses that employ them :-)
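Both cases can be sketched with tasks, assuming .NET 6+: Task.WhenAll plus an agreement check for voting, and Task.WhenAny plus a CancellationTokenSource for "first answer wins". RunVoter is a hypothetical stand-in for one independent implementation; the delays and answers are made up for the demo.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical voter: simulated work that yields a fixed result and honours cancellation.
static Task<int> RunVoter(int result, int delayMs, CancellationToken token) =>
    Task.Run(async () =>
    {
        await Task.Delay(delayMs, token);
        return result;
    }, token);

// Case 1 (voting): wait for every voter and proceed only if all agree.
int[] votes = await Task.WhenAll(
    RunVoter(42, 30, CancellationToken.None),
    RunVoter(42, 10, CancellationToken.None),
    RunVoter(42, 20, CancellationToken.None));
bool unanimous = votes.All(v => v == votes[0]);

// Case 2 (first answer wins): take whichever finishes first, then cancel the stragglers.
using var cts = new CancellationTokenSource();
Task<int>[] racers =
{
    RunVoter(1, 500, cts.Token),
    RunVoter(2, 10, cts.Token),
    RunVoter(3, 500, cts.Token),
};
Task<int> first = await Task.WhenAny(racers);
cts.Cancel();   // the slower voters' Task.Delay calls end in cancellation
int winner = await first;

Console.WriteLine($"unanimous = {unanimous}, first answer = {winner}");
```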

Related

select/poll based framework in C#

I'd like to write an application that opens many sockets and files. Think of it as a web server (which is not true in my case, but it simplifies the problem here).
If I were writing it in C on Unix, I would use poll/select and be quite efficient, and because I don't have multiple threads, everything is easy to write while being very efficient.
If I used multiple threads to use all cores of the CPU (given that I don't want to use processes), I would use Unix FIFOs to transfer messages and still use poll/select on each thread (which works flawlessly with files/sockets/FIFOs). Things are still very simple while being quite efficient.
But in C# it looks like there are different selects, and most classes don't support that programming style at all (HttpListener is just one example). I don't like the BeginInvoke messiness because things happen in the background that I have no control over (thread pooling, shutting down a blocking server gracefully, ...).
I wonder if there is any select/poll-like framework available for C#?
You can actually use your same approaches in C# - you just need to use the lower level Socket class, which provides Select and Poll.
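A loopback-only sketch of a select()-style loop on the low-level Socket class, assuming .NET 6+. Socket.Select trims the list passed in so that only ready sockets remain, and its timeout argument is in microseconds. The iteration cap and the "ping" payload exist only for the demo; a real server would loop until shutdown.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Sockets;
using System.Text;

using var listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
listener.Bind(new IPEndPoint(IPAddress.Loopback, 0));   // port 0: let the OS pick one
listener.Listen(10);

using var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
client.Connect(listener.LocalEndPoint!);
client.Send(Encoding.ASCII.GetBytes("ping"));

var connections = new List<Socket>();
var received = new StringBuilder();
var buffer = new byte[1024];

// Single-threaded event loop: one Select call watches the listener and every connection.
for (int iteration = 0; iteration < 10 && received.Length < 4; iteration++)
{
    var readable = new List<Socket>(connections) { listener };
    Socket.Select(readable, null, null, 100_000);   // 100 ms; leaves only ready sockets

    foreach (var s in readable)
    {
        if (s == listener)
        {
            connections.Add(listener.Accept());   // a readable listener means a pending connection
        }
        else
        {
            int n = s.Receive(buffer);
            if (n == 0) { connections.Remove(s); s.Close(); }   // peer closed the connection
            else received.Append(Encoding.ASCII.GetString(buffer, 0, n));
        }
    }
}

Console.WriteLine($"received: {received}");
foreach (var s in connections) s.Close();
```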
That being said, the new asynchronous methods built on top of socket in the higher level classes tend to have many advantages. Once you learn and understand how they function, they can be very efficient and quite a bit nicer to develop against.
This extends all the way up the stack - with the "highest level" abstractions being frameworks like WCF, which provide huge benefits in terms of productivity, reliability, safety, and ease of development for many types of applications.
BeginInvoke (or Tasks based on the Begin/End pattern) are the standard model of async programming on .NET. They indeed force the continuation callbacks to run on the thread-pool. If you are fine with that the Begin/End model is actually very efficient and nice (as nice as callback-based code can be...).
Off the top of my head, I cannot see a compelling reason why I wouldn't want to use the thread-pool for completion callbacks. Maybe you can squeeze out a little more efficiency using IOCPs.
Select/poll certainly isn't the way to become more efficient, although .NET sockets support it.
You said
Shutting down a blocking server gracefully
would be a problem. I don't see why. Can you elaborate?

Best concurrency framework for low latency, high throughput data transfer on single machine [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
I am looking for ideas how a concurrent framework might be implemented for my specific architecture, using C#:
I implemented several modules/containers (implemented as classes) that each connect individually to a message bus. Each module either mainly produces or mainly consumes, but all modules also implement a request/reply pattern for communication between two given modules. I am very new to concurrent and asynchronous programming but essentially want to run the whole architecture concurrently rather than synchronously. I would really appreciate some pointers on which technology (TPL, ThreadPool, CTP, open source libraries, ...) to consider for my specific use case, given the following requirements:
The whole system only runs on a local machine (in-process, even the message bus)
At least one module performs heavy IO (it reads several million 16-byte messages per second from a physical drive), publishing multiple 16-byte chunks to a blocking collection the whole time.
Another module consumes from the blocking collection the whole time.
The entry point is the producer starting to publish messages; the system exits when the producer finishes publishing a finite set of 16-byte messages.
The only communication that circumvents the message bus is the publishing/consuming to/from the blocking collection for throughput and latency reasons. (Am happy to hear suggestions to get rid of the message bus if it is plausible)
Other modules handle operations such as writing to an SQL database, publishing to a GUI server, and connecting to APIs that communicate with outside servers. Such operations run less frequently/throttled and could potentially be run as tasks rather than occupying a whole thread for the entire run of the system.
I run on a 64-bit, quad-core machine with 16 GB of memory, but ideally I would like to implement a solution that can also run on a dual-core machine.
Given what I'd like to manage, which concurrency implementation would you suggest I focus on?
EDIT: I'd like to emphasize that the biggest problem I am facing is how to conveniently hook up each container/module to a thread/task pool so that each module runs asynchronously while still providing full in and out communication between modules. I am not too concerned with optimizing a single producer/consumer pattern until I have solved hooking up all the modules to a concurrent platform that can handle the number of tasks/threads involved dynamically.
I found n-act http://code.google.com/p/n-act/ , an Actors framework for .Net which implements pretty much what I am looking for. I described in my question that I look for bigger picture framework suggestions and it looks to me that an Actor Framework solves what I need. I am not saying that the n-act library will be what I implement but it is a neat example of setting up actors that can communicate asynchronously and can run on their own threads. Message passing also supports the new C#5 async/await functionality.
Disruptor was mentioned above and also the TPL and couple other ideas and I appreciate the input, it actually really got me thinking and I spent quite a bit of time to understand what each library/framework attempts to target and what problems it tries to solve, so the input was very fruitful.
For my particular case, however, I believe the Actor Framework is exactly what I need, because my main concern is the exchange of async data flow. Unfortunately I do not see much of the Actor model implemented in any .NET technology (yet). TPL Dataflow looks very promising but, as Weismat pointed out, it is not yet production ready.
If N-Act does not prove stable or usable then I will look for a custom implementation through the TPL. It's about time anyway to fully understand all that TPL has to offer and start thinking concurrently already at the design stage rather than trying to transfer synchronous models into an asynchronous framework.
In summary, "Actor Model" was what I was looking for.
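To make the actor idea concrete, here is a minimal sketch of one actor: a mailbox (System.Threading.Channels), a single loop that owns the actor's state, and request/reply via a TaskCompletionSource carried in each message. This is a hypothetical design assuming .NET 6+, not the n-act API; the "counter" actor exists only for illustration.

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Starts a hypothetical "counter" actor and returns handles for talking to it.
(Func<int, Task<int>> Ask, Action Stop, Task Completion) StartCounterActor()
{
    // The mailbox: many writers, one reader (the actor loop).
    var mailbox = Channel.CreateUnbounded<(int Amount, TaskCompletionSource<int> Reply)>(
        new UnboundedChannelOptions { SingleReader = true });
    int total = 0;   // actor-private state; only the loop below touches it

    Task loop = Task.Run(async () =>
    {
        // Messages are handled one at a time, in arrival order, so no locks are needed.
        await foreach (var msg in mailbox.Reader.ReadAllAsync())
        {
            total += msg.Amount;
            msg.Reply.SetResult(total);   // reply with the new running total
        }
    });

    Task<int> Ask(int amount)
    {
        var reply = new TaskCompletionSource<int>(TaskCreationOptions.RunContinuationsAsynchronously);
        if (!mailbox.Writer.TryWrite((amount, reply)))
            reply.SetException(new InvalidOperationException("actor has stopped"));
        return reply.Task;
    }

    return (Ask, () => mailbox.Writer.Complete(), loop);
}

var counter = StartCounterActor();
int[] replies = await Task.WhenAll(counter.Ask(1), counter.Ask(2), counter.Ask(3));
counter.Stop();
await counter.Completion;   // the loop ends once the mailbox is completed and drained

Console.WriteLine(string.Join(", ", replies));
```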
I recommend disruptor-net for a task like this, where you have high throughput, low latency, and a well-defined dataflow.
If you're willing to sacrifice some performance for some thread management, TPL Dataflow might work for you. It does a good job of using TPL for task scheduling.
You may look into the Concurrency and Coordination Runtime as well if you are looking for a framework-based concurrency solution. I think this might be a fit for your design ideas.
Otherwise I would follow the rule, that threads should be used when something will be running for the whole lifetime of your application and tasks for short-running items.
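That rule of thumb maps onto the TPL roughly as follows, in a sketch assuming .NET 6+: the application-lifetime loop is started with TaskCreationOptions.LongRunning, a hint that the default scheduler currently honours by giving the task its own thread, while short items go through Task.Run on the thread pool. The summing "work" is a hypothetical placeholder.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Long-lived loop: ask for a dedicated thread instead of tying up a pool thread.
var work = new BlockingCollection<int>();
long processed = 0;
Task pump = Task.Factory.StartNew(() =>
{
    foreach (int item in work.GetConsumingEnumerable())
        Interlocked.Add(ref processed, item);
}, TaskCreationOptions.LongRunning);

// Short-lived items: ordinary pooled tasks.
var shortJobs = new Task[10];
for (int i = 0; i < shortJobs.Length; i++)
{
    int value = i + 1;   // copy the loop variable for the closure
    shortJobs[i] = Task.Run(() => work.Add(value));
}

await Task.WhenAll(shortJobs);
work.CompleteAdding();   // lets the long-running loop finish
await pump;
Console.WriteLine($"processed total = {processed}");
```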
I believe it is more important that the responsibility for the concurrency is clearly defined, so that you can change the framework later.
As usual for writing fast code, there are no rules of thumb; you need a lot of testing with small stubs that measure the actual performance.

C# first class continuation via C++ interop or some other way?

We have a very high performance multitasking, near real-time C# application. This performance was achieved primarily by implementing cooperative multitasking in-house with a home grown scheduler. This is often called micro-threads. In this system all the tasks communicate with other tasks via queues.
The specific problem that we have seems to only be solvable via first class continuations which C# does not support.
Specifically, the problem arises in two cases dealing with queues. First, a task performs some work before placing an item on a queue. What if the queue is full?
Conversely, a different task may do some work and then need to take an item off of a queue. What if that queue is empty?
We have solved this in 90% of the cases by linking queues to tasks to avoid tasks getting invoked if any of their outbound queues are full or inbound queue is empty.
Furthermore certain tasks were converted into state machines so they can handle if a queue is full/empty and continue without waiting.
The real problem arises in a few edge cases where it is impractical to do either of those solutions. The idea in that scenario would be to save the stack state at the point and switch to a different task so that it can do the work and subsequently retry the waiting task whenever it is able to continue.
In the past, we attempted to have the waiting task call back into the scheduler (recursively) to allow the other tasks to run and later retry the waiting task. However, that led to too many "deadlock" situations.
There was an example somewhere of a custom CLR host to make the .NET threads actually operate as "fibers" which essentially allows switching stack state between threads. But now I can't seem to find any sample code for that. Plus it seems that will take some significant complexity to get it right.
Does anyone have any other creative ideas how to switch between tasks efficiently and avoid the above problems?
Are there any other CLR hosts that offer this, commercial or otherwise? Is there any add-on native library that can offer some form of continuations for C#?
There is the C# 5 CTP, which performs a continuation-passing-style transformation over methods declared with the new async keyword, and continuation-passing based calls when using the await keyword.
This is not actually a new CLR feature but rather a set of directives for the compiler to perform the CPS transformation over your code and a handful of library routines for manipulating and scheduling continuations. Activation records for async methods are placed on the heap instead of the stack, so they're not tied to a specific thread.
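Applied to the full/empty-queue problem in the question, await lets a producer suspend when a queue is full and a consumer suspend when it is empty, without blocking a thread: the compiler-generated state machine keeps the method's locals on the heap until it resumes. This sketch assumes .NET 6+ and uses System.Threading.Channels, a later library that was not part of the C# 5 CTP; the capacity of 2 is chosen only to force the "full" case.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// A small bounded queue: writers wait for space once it holds 2 items.
var queue = Channel.CreateBounded<int>(new BoundedChannelOptions(2)
{
    FullMode = BoundedChannelFullMode.Wait   // WriteAsync waits for space (also the default)
});

async Task ProduceAsync()
{
    for (int i = 0; i < 10; i++)
        await queue.Writer.WriteAsync(i);   // queue full: method suspends, thread is freed
    queue.Writer.Complete();
}

async Task<List<int>> ConsumeAsync()
{
    var items = new List<int>();
    while (await queue.Reader.WaitToReadAsync())   // queue empty: suspends until data or completion
        while (queue.Reader.TryRead(out int item))
            items.Add(item);
    return items;
}

var consumer = ConsumeAsync();
await ProduceAsync();
List<int> consumed = await consumer;
Console.WriteLine(string.Join(" ", consumed));
```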
Nope, not going to work. C# (and even IL) is too complex a language to perform such transformations (CPS) in a general way. The best you can get is what C# 5 will offer. That said, you will probably not be able to break/resume with higher-order loops/iterations, which is really what you want from general-purpose reifiable continuations.
Fiber mode was removed from v2 of the CLR because of issues under stress, see:
Fiber mode is gone...
Fibers and the CLR
Question to the CLR experts : fiber mode support in hosting
To my knowledge, fiber support has not yet been re-added, although from reading the above articles it may be added again (however, the fact that nothing has been mentioned on the topic for 6-7 years makes me believe that it's unlikely).
FYI, fiber support was intended as a way for existing applications that use fibers (such as SQL Server) to host the CLR in a way that lets them maximise performance, not as a method to allow .NET applications to create hundreds of threads. In short, fibers are not a magic-bullet solution to your problem. However, if you have an application that uses fibers and wishes to host the CLR, then the managed hosting APIs do provide the means for the CLR to "work nicely" with your application. A good source of information on this would be the managed hosting API documentation, or look into how SQL Server hosts the CLR, on which there are several highly informative articles around.
Also take a quick read of Threads, fibers, stacks and address space.
Actually, we decided on a direction to go with this. We're using the Observer pattern with Message Passing. We built a home-grown library to handle all communication between "Agents", which are similar to an Erlang process. Later we will consider using AppDomains to separate Agents from each other even better. Design ideas were borrowed from the Erlang programming language, which has extremely reliable multi-core and distributed processing.
The solution to your problem is to use lock-free algorithms, which guarantee system-wide progress of at least one task. You need to use CPU-dependent inline assembler to make sure that your compare-and-swap (CAS) is atomic. Wikipedia has an article on this, and patterns are described in the book by Douglas Schmidt called "Pattern-Oriented Software Architecture, Patterns for Concurrent and Networked Objects". It is not immediately clear to me how you would do that under the .NET Framework.
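On .NET, an atomic compare-and-swap is available without inline assembler through Interlocked.CompareExchange. This sketch (assuming .NET 6+) shows the standard CAS retry loop on a lock-free "update to maximum", an operation Interlocked has no single method for; the random input values are just test data.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

int max = int.MinValue;

// Lock-free "update to maximum" using a compare-and-swap retry loop.
void UpdateMax(int candidate)
{
    int current = Volatile.Read(ref max);
    while (candidate > current)
    {
        // Store candidate only if max still holds the value we last read.
        int seen = Interlocked.CompareExchange(ref max, candidate, current);
        if (seen == current) return;   // our CAS won
        current = seen;                // another thread got there first: re-check and retry
    }
}

var rng = new Random(1);
int[] values = new int[100_000];
for (int i = 0; i < values.Length; i++) values[i] = rng.Next();

Parallel.For(0, values.Length, i => UpdateMax(values[i]));
Console.WriteLine($"max = {max}");
```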
Another way of solving your problem is using the publish-subscribe pattern or possibly thread pools.
Hope this was helpful?

C# - Moving files - to queue or multi-thread

I have an app that moves a project and its files from preview to production using a Flex front-end and a .NET web service. Currently, the process takes about 5-10 minutes per project. Aside from latency concerns, it really shouldn't take that long. I'm wondering whether or not this is a good use case for multi-threading. Also, considering the user may want to push multiple projects, or one right after another, is there a way to queue the jobs?
Any suggestions and examples are greatly appreciated.
Thanks!
Something that does heavy disk IO typically isn't a good candidate for multithreading since the disks can really only do one thing at a time. However, if you're pushing to multiple servers or the servers have particularly good disk subsystems some light threading may be beneficial.
As a note - regardless of whether or not you decide to queue the jobs, you will use multi-threading. Queueing is just one way of handling what is ultimately solved using multi-threading.
And yes, I'd recommend you build a queue to push out each project.
You should compare the speed of your code against just copying in Windows (i.e., Explorer or the command line) and against copying with something advanced like TeraCopy. If your code is significantly slower than Windows, then look for parts of your code to optimize using a profiler. If your code is about as fast as Windows but slower than TeraCopy, then multithreading could help.
Multithreading is not generally helpful when the operation is I/O bound, but copying files involves reading from the disk AND writing over the network. These are two I/O operations, so if you separate them onto different threads, it could increase performance. For something like this you need a producer/consumer setup where you have a circular queue with one thread reading from disk and writing to the queue, and another thread reading from the queue and writing to the network. It's important to keep in mind that the two threads will not run at the same speed, so if the queue gets full, the disk-reading thread must wait before adding more data, and if it's empty, the network-writing thread must wait before reading. Also, the locking strategy could have a big impact on performance here and could cause the performance to degrade to slower than a single-threaded implementation.
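The two-thread setup described above can be sketched with a bounded BlockingCollection<byte[]> standing in for the circular queue: Add blocks while it is full and the consuming loop blocks while it is empty. This assumes .NET 6+; MemoryStreams stand in for the source file and the network stream, and the chunk size and queue depth are arbitrary assumptions worth measuring.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

// Reader thread: source -> bounded queue. Calling thread: bounded queue -> destination.
static void PipelinedCopy(Stream source, Stream destination, int chunkSize = 64 * 1024, int maxChunksInFlight = 8)
{
    // Add() blocks while the queue is full; the consuming loop blocks while it is empty.
    var chunks = new BlockingCollection<byte[]>(boundedCapacity: maxChunksInFlight);
    Exception? readError = null;

    var reader = new Thread(() =>
    {
        try
        {
            var buffer = new byte[chunkSize];
            int n;
            while ((n = source.Read(buffer, 0, buffer.Length)) > 0)
                chunks.Add(buffer.AsSpan(0, n).ToArray());   // copy, because buffer is reused
        }
        catch (Exception ex) { readError = ex; }
        finally { chunks.CompleteAdding(); }   // always let the writing loop finish
    }) { IsBackground = true };

    reader.Start();
    foreach (var chunk in chunks.GetConsumingEnumerable())
        destination.Write(chunk, 0, chunk.Length);
    reader.Join();

    if (readError != null) throw new IOException("Reading the source failed.", readError);
}

var data = new byte[1_000_000];
new Random(42).NextBytes(data);
using var src = new MemoryStream(data);
using var dst = new MemoryStream();
PipelinedCopy(src, dst, chunkSize: 4096);
Console.WriteLine($"copied {dst.Length} bytes");
```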
If you're moving things between just two computers, the network is going to be the bottleneck, so you may want to queue these operations.
Likewise, on the same machine, the I/O is going to be the bottleneck, so you'd want to queue there, too.
You should try using the ThreadPool.
ThreadPool.QueueUserWorkItem(MoveProject, project);
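A self-contained version of that one-liner, assuming .NET 6+: MoveProject is hypothetical and must match the WaitCallback signature (a single object parameter), and a CountdownEvent lets the caller wait until every queued project has been handled. Recording the project name stands in for the real file moves.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

var moved = new ConcurrentBag<string>();
var projects = new[] { "alpha", "beta", "gamma" };

// One count per queued project; Wait() returns once every work item has signalled.
using var allDone = new CountdownEvent(projects.Length);

// Hypothetical MoveProject: WaitCallback requires the signature void(object? state).
void MoveProject(object? state)
{
    try
    {
        var name = (string)state!;
        moved.Add(name);   // stand-in for copying the project's files to production
    }
    finally { allDone.Signal(); }   // signal even if the move fails
}

foreach (var project in projects)
    ThreadPool.QueueUserWorkItem(MoveProject, project);

allDone.Wait();
Console.WriteLine($"moved {moved.Count} projects");
```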
Agreed with everyone over the limited performance of running the tasks in parallel.
If you have full control over your deployment environment, you could use Rhino Queues:
http://ayende.com/Blog/archive/2008/08/01/Rhino-Queues.aspx
This will allow you to produce a queue of jobs asynchronously (say from a WCF service being called from your Silverlight/Flex app) and consume them synchronously.
Alternatively you could use WCF and MSMQ, but the learning curve is greater.
When dealing with multiple files, using multiple threads usually IS a good idea performance-wise. The main reason is that most disks nowadays support native command queuing.
I wrote an article recently on ddj.com about reading/writing files with multiple threads.
See http://www.ddj.com/go-parallel/article/showArticle.jhtml?articleID=220300055.
Also see related question
Will using multiple threads with a RandomAccessFile help performance?
In particular, my experience is that when dealing with very many files it IS a good idea to use a number of threads. On the contrary, using many threads in many cases does not slow down applications as much as commonly expected.
Having said that, I'd say there is no other way to find out than trying all the different approaches. It depends on very many conditions: hardware, OS, drivers, etc.
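One way to "try the different approaches" is to make the degree of parallelism a parameter and time each setting on the real hardware. This sketch assumes .NET 6+; the 200 files of 16 KB in a throwaway temp folder are made-up test data, and the degrees 1, 2, 4 and 8 are arbitrary choices.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Copies every file into targetDir using at most maxDegree concurrent copies; returns elapsed time.
static TimeSpan CopyAll(string[] sourceFiles, string targetDir, int maxDegree)
{
    var sw = Stopwatch.StartNew();
    Parallel.ForEach(sourceFiles, new ParallelOptions { MaxDegreeOfParallelism = maxDegree }, file =>
        File.Copy(file, Path.Combine(targetDir, Path.GetFileName(file)), overwrite: true));
    return sw.Elapsed;
}

// Throwaway test data in a unique temp folder.
string root = Path.Combine(Path.GetTempPath(), "copy-demo-" + Guid.NewGuid().ToString("N"));
string sourceDir = Directory.CreateDirectory(Path.Combine(root, "src")).FullName;
string[] files = Enumerable.Range(0, 200).Select(i =>
{
    string path = Path.Combine(sourceDir, $"file{i}.bin");
    File.WriteAllBytes(path, new byte[16 * 1024]);
    return path;
}).ToArray();

var timings = new Dictionary<int, TimeSpan>();
foreach (int degree in new[] { 1, 2, 4, 8 })
{
    string target = Directory.CreateDirectory(Path.Combine(root, $"dst{degree}")).FullName;
    timings[degree] = CopyAll(files, target, degree);
    Console.WriteLine($"MaxDegreeOfParallelism {degree}: {timings[degree].TotalMilliseconds:F0} ms");
}

Directory.Delete(root, recursive: true);
```

The first run also pays JIT and file-cache warm-up costs, so repeat or reorder the runs before drawing conclusions.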
The very first thing you should do is point any kind of profiling tool towards your software. If you can't do that (like, if you haven't got such a tool), insert logging code.
The very first thing you need to do is figure out what is taking a long time to complete, and then why is it taking a long time to complete. That your "copy" operation as a whole takes a long time to complete isn't good enough, you need to pinpoint the reason for this down to a method or a set of methods.
Until you do that, all the other things you can do to your code will likely be guesswork. My experience has taught me that when it comes to performance, 9 out of 10 reasons for things running slowly come as surprises to the guy(s) who wrote the code.
So measure first, then change.
For instance, you might discover that you're in fact reporting progress of copying the file on a byte-per-byte basis, to a GUI, using a synchronous call to the UI, in which case it wouldn't matter how fast the actual copying can run, you'll still be bound by message handling speed.
But that's just conjecture until you know, so measure first, then change.

How to make my code run on multiple cores?

I have built an application in C# that I would like to be optimized for multiple cores. I have some threads, should I do more?
Updated for more detail
C# 2.0
Run on Windows Vista and Windows Server 2003
Updated again
This code is running as a service
I do not want the complete code... my goal here is to get your experience and learn how to start. Like I say, I have already used threads. What more can I do?
I'd generalize that writing a highly optimized multi-threaded process is a lot harder than just throwing some threads in the mix.
I recommend starting with the following steps:
1. Split up your workloads into discrete, parallel-executable units.
2. Measure and characterize workload types (network-intensive, I/O-intensive, CPU-intensive, etc.); these become the basis for your worker pooling strategies. For example, you can have pretty large pools of workers for network-intensive applications, but it doesn't make sense to have more workers than hardware threads for CPU-intensive tasks.
3. Think about a queue/array or ThreadWorkerPool to manage pools of threads. The former gives finer-grained control than the latter.
4. Learn to prefer async I/O patterns over sync patterns if you can; this frees more CPU time to perform other tasks.
5. Work to eliminate, or at least reduce, serialization around contended resources such as disk.
6. Minimize I/O, and acquire and hold the minimum level of locks for the minimum period possible. (Reader/writer locks are your friend.)
7. Comb through the code to ensure that resources are locked in a consistent sequence, to minimize deadly embrace (deadlock).
8. Test like crazy. Race conditions and bugs in multithreaded applications are hellish to troubleshoot; often you only see the forensic aftermath of the massacre.
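The advice about locking resources in a consistent sequence can be sketched with hypothetical bank accounts, assuming .NET 6+: every transfer takes the lower-numbered account's lock first, so transfers running in opposite directions cannot each hold the lock the other needs. The account count, balances and transfer volumes are made up for the demo.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical accounts identified by index, each guarded by its own lock object.
decimal[] balances = { 100m, 100m, 100m, 100m };
object[] locks = { new object(), new object(), new object(), new object() };

void Transfer(int from, int to, decimal amount)
{
    if (from == to) return;
    // Always take the lower-numbered lock first, whatever the transfer direction.
    int first = Math.Min(from, to), second = Math.Max(from, to);
    lock (locks[first])
    lock (locks[second])
    {
        balances[from] -= amount;
        balances[to] += amount;
    }
}

// Concurrent transfers in both directions; with inconsistent lock order this could deadlock.
Parallel.Invoke(
    () => Parallel.For(0, 50_000, i => Transfer(i % 4, (i + 1) % 4, 1m)),
    () => Parallel.For(0, 50_000, i => Transfer((i + 1) % 4, i % 4, 1m)));

decimal total = 0;
foreach (var b in balances) total += b;
Console.WriteLine($"total = {total}");
```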
Bear in mind that it is entirely possible for a multi-threaded version to perform worse than a single-threaded version of the same app. There is no substitute for good engineering measurement.
You might want to take a look at the parallel extensions for .NET
http://msdn.com/concurrency
You might want to read Herb Sutter's column 'Effective Concurrency'.
To use multiple cores more efficiently, you should divide your work into parts that can be executed in parallel and use threads to spread the work over the cores. You could use threads, background workers, thread pools, etc.
For C#, start learning the LINQ-way of doing things, then make use of the Parallel LINQ library and its .AsParallel() extension.
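A minimal illustration of that advice, assuming .NET 6+: the same LINQ query run sequentially and then through AsParallel(), which lets PLINQ partition the source across cores. The query itself (sum of squares of multiples of 3) is an arbitrary example.

```csharp
using System;
using System.Linq;

// Sequential LINQ query...
long sequential = Enumerable.Range(1, 1_000_000)
    .Where(n => n % 3 == 0)
    .Select(n => (long)n * n)
    .Sum();

// ...and the same query with AsParallel(): PLINQ partitions the range across cores.
long parallel = Enumerable.Range(1, 1_000_000)
    .AsParallel()
    .Where(n => n % 3 == 0)
    .Select(n => (long)n * n)
    .Sum();

Console.WriteLine(sequential == parallel);
```

Only the AsParallel() call changes; whether it actually runs faster depends on how much work each element does, so measure both versions.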
Understanding the parallelism (or potential for parallelism) in the problem(s) you are trying to solve, your application and its algorithms is much more important than any details of thread synchronization, libraries, etc.
Start by reading Patterns for Parallel Programming (which focuses on 'finding concurrency' and higher-level design issues), and then move on to The Art of Multiprocessor Programming (practical details starting from a theoretical basis).
