Is it acceptable to use ThreadPool in a library? - C#

Is it acceptable to use ThreadPool in a library?
That could obviously cause some unpleasant problems if the user of your library is using ThreadPool as well (since ThreadPool is a static class, of course).
What's the convention?

Yes. I think it is appropriate to make use of the ThreadPool in library code. Even if the user also uses the ThreadPool outside your library, the ThreadPool is still good enough at tuning itself.
On the other hand, as a library developer, you should provide flexibility: the user may choose to use the ThreadPool, a specific thread (or threads), or even a third-party thread pool implementation.
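One way to give callers that flexibility is to route the library's background work through a TaskScheduler they can supply. The sketch below is a minimal illustration under that assumption; TextStats and CountWordsAsync are hypothetical names, and the default of TaskScheduler.Default simply means "use the shared ThreadPool".

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical library type: background work goes through an injectable
// TaskScheduler, so the caller decides where it actually runs.
public sealed class TextStats
{
    private readonly TaskScheduler _scheduler;

    // No scheduler supplied: fall back to the shared ThreadPool.
    public TextStats(TaskScheduler scheduler = null)
    {
        _scheduler = scheduler ?? TaskScheduler.Default;
    }

    public Task<int> CountWordsAsync(string text, CancellationToken ct = default(CancellationToken))
    {
        return Task.Factory.StartNew(
            // A null separator array splits on whitespace.
            () => text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length,
            ct,
            TaskCreationOptions.DenyChildAttach,
            _scheduler);
    }
}
```

A caller who wants to keep the library off most of the pool could pass, for example, new ConcurrentExclusiveSchedulerPair(TaskScheduler.Default, 2).ConcurrentScheduler, which caps the library at two concurrently running tasks; a scheduler backed by a dedicated thread or a third-party pool would plug in the same way.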

Yes.
As long as it's well documented, and you provide methods that let the user of the library control the thread pool, such as setting min/max threads, and perhaps the option not to use a thread pool at all.
You should also make it very clear which exposed parts of your library are thread-safe and which are not.

ThreadPool is designed to be used by multiple components simultaneously, so in itself it presents no particular problem if used from your library.
What can be a problem is the threading behavior of your library in general. The threading semantics of your library must be clearly documented. How the threads are created and used should be an implementation detail. The ThreadPool itself shouldn't present a problem unless one of its inherent properties (COM apartment affinity, the inability to cancel threads, etc.) presents a problem for your API or its consumers.

Related

C# Threading - Using a class in a thread-safe way vs. implementing it as thread-safe

Suppose I want to use a non-thread-safe class from the .NET Framework (the documentation states that it is not thread-safe). Sometimes I change the value of Property X from one thread, and sometimes from another thread, but I never access it from two threads at the same time. And sometimes I call Method Y from one thread, and sometimes from another thread, but never at the same time.
Does this mean that I use the class in a thread-safe way, and that the documentation's statement that it's not thread-safe is no longer relevant to my situation?
If the answer is no: can I do everything related to a specific object on the same thread, i.e. create it and always call its members on the same thread (but not the GUI thread)? If so, how do I do that? (If relevant, it's a WPF app.)
No, it is not thread-safe. As a general rule, you should never write multithreaded code without some kind of synchronization. In your first example, even if you somehow manage to ensure that modifying and reading never happen at the same time, there is still the problem of cached values and instruction reordering.
For example, a value may be cached in a CPU register: you update it on one thread and read it from another. If the second thread has it cached, it doesn't go to RAM to fetch it and doesn't see the updated value.
Take a look at this great post for more info on the problems with writing lock-free multithreaded code. It has a great explanation of how the CPU, the compiler and the CLI bytecode (JIT) compiler can reorder instructions.
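To make the visibility problem concrete, here is a minimal sketch (StopFlagWorker is a hypothetical name) of a worker loop polling a stop flag that another thread sets. With a plain field read, the JIT is allowed to hoist the read out of the loop, so the worker may spin forever, especially in optimized builds; reading with Volatile.Read and writing with Volatile.Write (or doing both inside a lock) rules that out.

```csharp
using System.Threading;

// Hypothetical worker: one thread spins until another thread sets the stop flag.
public sealed class StopFlagWorker
{
    private bool _stop;
    private long _iterations;

    // Only meaningful after the worker thread has been joined.
    public long Iterations => _iterations;

    public void Run()
    {
        // Volatile.Read forces a fresh read on every iteration, so the check
        // cannot be hoisted out of the loop and spin on a stale value.
        while (!Volatile.Read(ref _stop))
        {
            _iterations++;
        }
    }

    // Volatile.Write has release semantics, making the update visible to the reader.
    public void Stop() => Volatile.Write(ref _stop, true);
}
```

Typical use: start Run on a new Thread, later call Stop from another thread, then Join the worker.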
Suppose I want to use a non-thread-safe class from the .NET Framework (the documentation states that it is not thread-safe).
"Thread-safe" has a number of different meanings. Most objects fall into one of three categories:
Thread-affine. These objects can only be accessed from a single thread, never from another thread. Most UI components fall into this category.
Thread-safe. These objects can be accessed from any thread at any time. Most synchronization objects (including concurrent collections) fall into this category.
One-at-a-time. These objects can be accessed from one thread at a time. This is the "default" category, with most .NET types falling into this category.
Sometimes I change the value of Property X from one thread, and sometimes from another thread, but I never access it from two threads at the same time. And sometimes I call Method Y from one thread, and sometimes from another thread, but never at the same time.
As another answerer noted, you have to take into consideration instruction reordering and cached reads. In other words, it's not sufficient to just do these at different times; you'll need to implement proper barriers to ensure it is guaranteed to work correctly.
The easiest way to do this is to protect all access of the object with a lock statement. If all reads, writes, and method calls are all within the same lock, then this would work (assuming the object does have a one-at-a-time kind of threading model and not thread-affine).
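A minimal sketch of that approach, assuming a one-at-a-time type (List<int> stands in for the non-thread-safe class, and LockedList is a hypothetical wrapper name): every member goes through the same lock object, which gives both mutual exclusion and the memory barriers mentioned above.

```csharp
using System.Collections.Generic;

// Hypothetical wrapper: all reads, writes and method calls on the
// non-thread-safe List<int> happen inside the same lock.
public sealed class LockedList
{
    private readonly object _gate = new object();
    private readonly List<int> _items = new List<int>();

    public void Add(int value)
    {
        lock (_gate) { _items.Add(value); }
    }

    public int Count
    {
        get { lock (_gate) { return _items.Count; } }
    }

    public int Sum()
    {
        lock (_gate)
        {
            int total = 0;
            foreach (int item in _items) total += item;
            return total;
        }
    }
}
```

Note that a compound operation such as "check Count, then Add" still has to happen inside a single lock; locking each member separately does not make the combination atomic.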
Suppose I want to use a non-thread-safe class from the .NET Framework (the documentation states that it is not thread-safe). Sometimes I change the value of Property X from one thread, and sometimes from another thread, but I never access it from two threads at the same time. And sometimes I call Method Y from one thread, and sometimes from another thread, but never at the same time.
All classes are non-thread-safe by default, except for a few types such as the concurrent collections, which are designed specifically for thread safety. So for any other class you choose, if you access it from multiple threads or in a non-atomic manner, whether reading or writing, it's imperative to introduce thread safety when changing the object's state. This only applies to objects whose state can be modified in a multithreaded environment; methods as such are just functional implementations, not state that can be modified themselves, and they only need synchronization in order to maintain the object's state.
Does this mean that I use the class in a thread-safe way, and that the documentation's statement that it's not thread-safe is no longer relevant to my situation? If the answer is no: can I do everything related to a class on the same thread (but not the GUI thread)? If so, how do I do that? (If relevant, it's a WPF app.)
For a UI application, consider using async-await for I/O-bound operations, like file reads and database reads, and use the TPL for compute-bound operations. The benefits of async-await are:
It doesn't block the UI thread at all and keeps the UI completely responsive; in fact, after an await, UI controls can be updated directly with no cross-thread concern, since the code after the await resumes on the UI thread
With the TPL, compute-bound operations still block, but on threads taken from the thread pool, and those threads can't be used to update the UI because of cross-thread restrictions
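A rough sketch of that split, assuming a text file as the I/O source (LogScanner and CountLongLinesAsync are hypothetical names): the read is awaited, and only the CPU-bound parsing is pushed to the thread pool with Task.Run. StreamReader.ReadToEndAsync is used so the example also works on .NET Framework 4.5.

```csharp
using System.IO;
using System.Linq;
using System.Threading.Tasks;

public static class LogScanner
{
    // Counts the lines in a text file that are at least minLength characters long.
    public static async Task<int> CountLongLinesAsync(string path, int minLength)
    {
        string text;
        using (var reader = new StreamReader(path))
        {
            // I/O-bound: awaiting the read does not block the calling (e.g. UI) thread.
            text = await reader.ReadToEndAsync();
        }

        // CPU-bound: run the parsing on a thread-pool thread.
        return await Task.Run(() =>
            text.Split('\n').Count(line => line.TrimEnd('\r').Length >= minLength));
    }
}
```

Called from a WPF event handler (for example `ResultText.Text = (await LogScanner.CountLongLinesAsync(path, 80)).ToString();` inside an `async void` click handler, where ResultText is a hypothetical TextBlock), the line after the await runs back on the UI thread, so the control can be updated directly.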
And last: there are classes in which one method starts an operation, and another one ends it. For example, using the SpeechRecognitionEngine class you can start a speech recognition session with RecognizeAsync (this method predates the TPL, so it does not return a Task), and then cancel the recognition session with RecognizeAsyncCancel. What if I call RecognizeAsync from one thread and RecognizeAsyncCancel from another one? (It works, but is it "safe"? Will it fail under some conditions I'm not aware of?)
Since you mention an Async method, this might be an older implementation based on APM, which uses an AsyncCallback to coordinate, along the lines of BeginXX/EndXX. If that's the case, then not much coordination is required, as they use the AsyncCallback to execute a callback delegate. In fact, as mentioned earlier, no extra thread is involved here, whether it's the old version or the new async-await. Regarding cancellation, a CancellationTokenSource can be used with async-await; a separate cancellation task is not required. Coordination between multiple threads can be done via AutoResetEvent / ManualResetEvent.
If the calls mentioned above are synchronous, wrap them in a Task and await it from an async method, as follows:
await Task.Run(() => RecognizeAsync());
Though it's something of an anti-pattern, it can be useful for making the whole call chain async.
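To illustrate the CancellationTokenSource point: starting an operation on one thread and cancelling it from another is fine, because CancellationTokenSource.Cancel is documented as safe to call from any thread. ListenAsync below is a hypothetical stand-in for a long-running session; it is not the SpeechRecognitionEngine API.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class Listener
{
    // Hypothetical long-running async operation that runs until cancelled.
    public static async Task ListenAsync(CancellationToken ct)
    {
        while (true)
        {
            // Task.Delay throws OperationCanceledException once ct is cancelled,
            // which puts the Task returned by this async method into the Canceled state.
            await Task.Delay(10, ct);
        }
    }
}
```

A caller keeps the CancellationTokenSource, passes cts.Token to ListenAsync, and later calls cts.Cancel() from whichever thread decides to stop; awaiting the task then throws a TaskCanceledException.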
Edits (to answer OP questions)
Thanks for your detailed answer, but I didn't understand some of it. At the first point you are saying that "it's imperative to introduce thread safety", but how?
Thread safety is introduced using synchronization constructs like lock, Mutex, Semaphore, Monitor and Interlocked; all of them serve to protect an object from corruption caused by race conditions. I don't see any steps.
Are the steps I have taken, as described in my post, enough?
I don't see any thread safety steps in your post, please highlight which steps you are talking about
At the second point I'm asking how to use an object in the same thread all the time (whenever I use it). Async-Await has nothing to do with this, AFAIK.
Async-await is the only concurrency mechanism that doesn't involve any extra thread besides the calling thread, so it can ensure that everything always runs on the same thread, since it uses I/O completion ports (hardware-based concurrency). Otherwise, if you use the Task Parallel Library, there's no way for you to ensure that the same (or a given) thread is always used, as that's a very high-level abstraction.
Check out one of my recent detailed answers on threading here; it may help by covering some more detailed aspects.
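If you do need every call on an object to happen on one particular thread (the second part of the question), a common alternative to the approaches above is a dedicated thread that drains a work queue; in a WPF app, a Dispatcher running on its own thread is another option. The sketch below is not from the answers above, and SingleThreadExecutor is a hypothetical name, not a framework type. It creates one background thread and runs queued delegates on it, one at a time, in submission order.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: every delegate passed to Run executes on the same
// dedicated background thread.
public sealed class SingleThreadExecutor : IDisposable
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();
    private readonly Thread _thread;

    public SingleThreadExecutor()
    {
        _thread = new Thread(() =>
        {
            // Blocks while the queue is empty; the loop ends after CompleteAdding.
            foreach (Action work in _queue.GetConsumingEnumerable())
                work();
        });
        _thread.IsBackground = true;
        _thread.Start();
    }

    public int ThreadId => _thread.ManagedThreadId;

    // Queues func and returns a Task that completes with its result or exception.
    public Task<T> Run<T>(Func<T> func)
    {
        var tcs = new TaskCompletionSource<T>(TaskCreationOptions.RunContinuationsAsynchronously);
        _queue.Add(() =>
        {
            try { tcs.SetResult(func()); }
            catch (Exception ex) { tcs.SetException(ex); }
        });
        return tcs.Task;
    }

    public void Dispose()
    {
        _queue.CompleteAdding(); // let the worker drain remaining items and exit
        _thread.Join();
        _queue.Dispose();
    }
}
```

The idea is to create the non-thread-safe object inside a Run call and only ever touch it from delegates passed to Run, so it is never accessed from any other thread.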
It is not thread-safe, as the technical risk exists, but your policy is designed to cope with the problem and work around the risk. So, if things stand as you described, you do not have a thread-safe environment; however, you are safe. For now.

Threads Synchronization vs. Tasks Synchronization vs. ConcurrentDictionary (No Sync Needed): which to choose

If in our program we are using Threads to access, let's say, a shared collection, then we should ensure thread safety with a Mutex, Monitor, Semaphore, etc.
But if we are not using Threads, and instead multiple Tasks are trying to access a common shared collection, then we should also ensure safety by some means.
But if we use a ready-made thread-safe collection like ConcurrentDictionary, then locking and thread/task safety are not required, as they are already handled at the framework level.
So basically I want to know which approach to use when working with a shared resource in a concurrent consumer environment.
They're all great solutions for different problems. If you can tell us precisely what you're trying to do, what resources are shared, what kinds of accesses are required, then we can tell you which is probably right for your solution.
Overall, unless you've got very specific performance requirements, go with the easiest solution, that is, the ConcurrentDictionary. Since the synchronization logic is built in, you can be almost certain that nobody will mess it up. 'Manual' task and thread synchronization can be pretty tricky at times.
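For comparison, here is a small sketch (WordTally is a hypothetical name) that counts words from many parallel tasks twice: once with ConcurrentDictionary.AddOrUpdate and once with a plain Dictionary guarded by a lock. Both are correct; the first just keeps the locking out of your code.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class WordTally
{
    // Built-in synchronization: AddOrUpdate is safe to call from many tasks at once.
    public static IDictionary<string, int> CountConcurrent(IEnumerable<string> words)
    {
        var counts = new ConcurrentDictionary<string, int>();
        Parallel.ForEach(words, w => counts.AddOrUpdate(w, 1, (key, old) => old + 1));
        return counts;
    }

    // Manual synchronization: a plain Dictionary guarded by a single lock.
    public static IDictionary<string, int> CountWithLock(IEnumerable<string> words)
    {
        var counts = new Dictionary<string, int>();
        var gate = new object();
        Parallel.ForEach(words, w =>
        {
            lock (gate)
            {
                counts.TryGetValue(w, out int old); // old is 0 if the key is missing
                counts[w] = old + 1;
            }
        });
        return counts;
    }
}
```

One caveat with the ConcurrentDictionary version: under contention, the update delegate passed to AddOrUpdate may run more than once, so it should be a pure function like old + 1 without side effects.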

Are there any cases when it's preferable to use a plain old Thread object instead of one of the newer constructs?

I see a lot of people in blog posts and here on SO either avoiding or advising against the usage of the Thread class in recent versions of C# (and I mean of course 4.0+, with the addition of Task & friends). Even before, there were debates about the fact that a plain old thread's functionality can be replaced in many cases by the ThreadPool class.
Also, other specialized mechanisms are further rendering the Thread class less appealing, such as Timers replacing the ugly Thread + Sleep combo, while for GUIs we have BackgroundWorker, etc.
Still, the Thread seems to remain a very familiar concept for some people (myself included), people that, when confronted with a task that involves some kind of parallel execution, jump directly to using the good old Thread class. I've been wondering lately if it's time to amend my ways.
So my question is, are there any cases when it's necessary or useful to use a plain old Thread object instead of one of the above constructs?
The Thread class cannot be made obsolete because obviously it is an implementation detail of all those other patterns you mention.
But that's not really your question; your question is
are there any cases when it's necessary or useful to use a plain old Thread object instead of one of the above constructs?
Sure. In precisely those cases where one of the higher-level constructs does not meet your needs.
My advice is that if you find yourself in a situation where existing higher-abstraction tools do not meet your needs, and you wish to implement a solution using threads, then you should identify the missing abstraction that you really need, and then implement that abstraction using threads, and then use the abstraction.
Threads are a basic building block for certain things (namely parallelism and asynchrony) and thus should not be taken away. However, for most people and most use cases there are more appropriate tools, which you mentioned, such as thread pools (which provide a nice way of handling many small jobs in parallel without overloading the machine by spawning 2000 threads at once) and BackgroundWorker (which encapsulates useful events for a single short-lived piece of work).
But just because those are more appropriate in many cases, since they shield the programmer from needlessly reinventing the wheel, making silly mistakes and the like, that does not mean that the Thread class is obsolete. It is still used by the abstractions named above, and you would still need it if you need fine-grained control over threads that the more specialized classes don't cover.
In a similar vein, .NET doesn't forbid the use of arrays, despite List<T> being a better fit for many cases where people use arrays. Simply because you may still want to build things that are not covered by the standard lib.
Task and Thread are different abstractions. If you want to model a thread, the Thread class is still the most appropriate choice. E.g. if you need to interact with the current thread, I don't see any better types for this.
However, as you point out .NET has added several dedicated abstractions which are preferable over Thread in many cases.
The Thread class is not obsolete, it is still useful in special circumstances.
Where I work we wrote a 'background processor' as part of a content management system: a Windows service that monitors directories, e-mail addresses and RSS feeds, and every time something new shows up execute a task on it - typically to import the data.
Attempts to use the thread pool for this did not work: it tried to execute too much at the same time and thrashed the disks, so we implemented our own polling and execution system using the Thread class directly.
The new options make direct use and management of the (expensive) threads less frequent.
people that, when confronted with a task that involves some kind of parallel execution, jump directly to using the good old Thread class.
Which is a very expensive and relatively complex way of doing stuff in parallel.
Note that the expense matters most: You cannot use a full thread to do a small job, it would be counterproductive. The ThreadPool combats the costs, the Task class the complexities (exceptions, waiting and canceling).
To answer the question of "are there any cases when it's necessary or useful to use a plain old Thread object", I'd say a plain old Thread is useful (but not necessary) when you have a long running process that you won't ever interact with from a different thread.
For example, if you're writing an application that subscribes to receive messages from some sort of message queue, and your application is going to do more than just process those messages, then it would be useful to use a Thread, because the thread will be self-contained (i.e. you aren't waiting on it to finish) and it isn't short-lived. The ThreadPool class is more for queuing up a bunch of short-lived work items and letting the ThreadPool manage processing each one efficiently as a thread becomes available. Tasks can be used where you would use a Thread directly, but in the above scenario I don't think they would buy you much. They help you interact with the thread more easily (which the above scenario doesn't need), and they help determine how many threads should actually be used for a given set of tasks based on the number of processors you have (which isn't what you want here, so you'd mark your task as LongRunning, in which case the current 4.0 implementation would simply create a separate, non-pooled thread).
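The two options described above might look like this; BlockingCollection<string> stands in for the message queue, and QueueConsumers is a hypothetical name. In current implementations TaskCreationOptions.LongRunning makes the default scheduler create a dedicated thread, but the documentation describes it only as a hint, so a plain Thread is the option that guarantees one.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class QueueConsumers
{
    // Option 1: a plain dedicated Thread that lives as long as the queue does.
    // IsBackground = true so it does not keep the process alive at shutdown.
    public static Thread StartOnThread(BlockingCollection<string> queue, Action<string> handle)
    {
        var thread = new Thread(() =>
        {
            foreach (string message in queue.GetConsumingEnumerable())
                handle(message);
        });
        thread.IsBackground = true;
        thread.Name = "queue-consumer";
        thread.Start();
        return thread;
    }

    // Option 2: a Task marked LongRunning, hinting the default scheduler to use
    // a dedicated thread instead of tying up a thread-pool thread.
    public static Task StartAsLongRunningTask(BlockingCollection<string> queue, Action<string> handle)
    {
        return Task.Factory.StartNew(
            () =>
            {
                foreach (string message in queue.GetConsumingEnumerable())
                    handle(message);
            },
            CancellationToken.None,
            TaskCreationOptions.LongRunning,
            TaskScheduler.Default);
    }
}
```

In both cases the consumer ends when the producer calls CompleteAdding on the queue and the remaining messages have been handled.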
Probably not the answer you were expecting, but I use Thread all the time when coding against the .NET Micro Framework. MF is quite cut down and doesn't include higher level abstractions and the Thread class is super flexible when you need to get the last bit of performance out of a low MHz CPU.
You could compare the Thread class to ADO.NET. It's not the recommended tool for getting the job done, but it's not obsolete. Other tools build on top of it to ease the job.
It's not wrong to use the Thread class over other things, especially if those things don't provide functionality that you need.
It's definitely not obsolete.
The problem with multithreaded apps is that they are very hard to get right (behavior is often nondeterministic, and input, output and internal state all matter), so a programmer should push as much work as possible to frameworks and tools. Abstract it away. But the mortal enemy of abstraction is performance.
So my question is, are there any cases when it's necessary or useful to use a plain old Thread object instead of one of the above constructs?
I'd go with Threads and locks only if there are serious performance problems or high performance goals.
I've always used the Thread class when I need to keep count of and control over the threads I've spun up. I realize I could use the thread pool to hold all of my outstanding work, but I've never found a good way to keep track of how much work is currently being done or what its status is.
Instead, I create a collection and place the threads in it after I spin them up; the very last thing a thread does is remove itself from the collection. That way, I can always tell how many threads are running, and I can use the collection to ask each one what it's doing. If I need to kill them all, normally you'd have to set some kind of "Abort" flag in your application and wait for every thread to notice it on its own and self-terminate; in my case, I can walk the collection and issue a Thread.Abort to each one in turn.
In that case, I haven't found a better way than working directly with the Thread class. As Eric Lippert mentioned, the others are just higher-level abstractions, and it's appropriate to work with the lower-level classes when the available high-level implementations don't meet your needs. Just as you sometimes need to make Win32 API calls when .NET doesn't address your exact needs, there will always be cases where the Thread class is the best choice despite recent "advancements."
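One caveat if you follow this approach today: Thread.Abort is not supported on .NET Core and .NET 5+ (it throws PlatformNotSupportedException there), so the "Abort flag" route is the portable one. A minimal sketch that keeps the tracking collection but stops threads cooperatively through a CancellationToken could look like this (WorkerTracker is a hypothetical name):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical helper: tracks running threads and stops them cooperatively.
public sealed class WorkerTracker : IDisposable
{
    private readonly ConcurrentDictionary<int, Thread> _running = new ConcurrentDictionary<int, Thread>();
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();
    private int _nextId;

    public int RunningCount => _running.Count;

    public void Start(Action<CancellationToken> work)
    {
        int id = Interlocked.Increment(ref _nextId);
        var thread = new Thread(() =>
        {
            try { work(_cts.Token); }
            finally { _running.TryRemove(id, out _); } // last thing the thread does
        });
        thread.IsBackground = true;
        _running[id] = thread; // register before Start so the count is never short
        thread.Start();
    }

    // Instead of calling Thread.Abort on each thread: request cancellation,
    // then wait for each thread to notice the token and exit.
    public void StopAll(TimeSpan timeoutPerThread)
    {
        _cts.Cancel();
        foreach (Thread thread in _running.Values)
            thread.Join(timeoutPerThread);
    }

    public void Dispose() => _cts.Dispose();
}
```

This only works if each work delegate checks the token regularly; a thread that ignores it will simply make StopAll wait until the timeout.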

How do I create a fixed-size ThreadPool in .NET?

I want to create a fixed, arbitrary-size ThreadPool in .NET. I understand the default size is 25, but I wish to have a different size, e.g. 5 or 10. Anyone?
You should be careful about changing the size of the thread pool. There is just one fixed system thread pool, used by all kinds of things. Making it too small could cause problems in areas you didn't even think you were using.
If you want to have a relatively small thread pool for one specific task, you should use a separate pool. There are various third party pools available - I have a rather old one as part of MiscUtil, but it should be good enough for simple use cases. I'm sure you can find more advanced ones if you look.
It's unfortunate that there isn't an instantiable ThreadPool in the framework yet. I can't remember offhand whether Parallel Extensions will effectively provide one, but I don't think it will.
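Since this answer was written, .NET 4.5 added ConcurrentExclusiveSchedulerPair, which can cap how many tasks run at once without shrinking the process-wide pool. It is not a separate pool (the work still runs on ThreadPool threads), but it covers the "only 5 at a time" use case. A minimal sketch, with Throttled and RunAll as hypothetical names and Thread.Sleep standing in for real work:

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttled
{
    // Runs jobCount jobs on the shared ThreadPool, at most maxConcurrency at a time,
    // and returns the highest number of jobs observed running simultaneously.
    public static int RunAll(int jobCount, int maxConcurrency)
    {
        var pair = new ConcurrentExclusiveSchedulerPair(TaskScheduler.Default, maxConcurrency);
        var factory = new TaskFactory(pair.ConcurrentScheduler);
        int running = 0, peak = 0;

        Task[] jobs = Enumerable.Range(0, jobCount)
            .Select(_ => factory.StartNew(() =>
            {
                int now = Interlocked.Increment(ref running);
                // Lock-free "max" update: retry until peak >= now.
                int seen;
                while (now > (seen = Volatile.Read(ref peak)) &&
                       Interlocked.CompareExchange(ref peak, now, seen) != seen)
                {
                }
                Thread.Sleep(20); // simulated work
                Interlocked.Decrement(ref running);
            }))
            .ToArray();

        Task.WaitAll(jobs);
        pair.Complete(); // no more work will be queued to this pair
        return Volatile.Read(ref peak);
    }
}
```

Because the limit lives in the scheduler rather than in the global ThreadPool settings, other code in the process (including the BCL) is unaffected.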
You can use ThreadPool.SetMinThreads and ThreadPool.SetMaxThreads to have some control over the number of threads in the thread pool.
That being said, I recommend being cautious in using this. It's easy to get yourself into trouble, as many operations in the BCL rely on threadpool threads being available.
ThreadPool.SetMaxThreads()
ThreadPool.SetMaxThreads(5, 5), and then any work beyond five threads will get queued.
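SetMaxThreads does not throw when it rejects a value; it returns false. Per its documentation, the maximum cannot be set below the number of processors (Environment.ProcessorCount) or below the current minimum, so SetMaxThreads(5, 5) on a machine with more than five cores leaves the pool unchanged. A small sketch that reports what happened (PoolLimits and TryCap are hypothetical names):

```csharp
using System;
using System.Threading;

public static class PoolLimits
{
    // Attempts to change the process-wide ThreadPool maximum. Returns false,
    // leaving the pool untouched, if the runtime rejects the values.
    public static bool TryCap(int workerThreads, int completionPortThreads)
    {
        bool accepted = ThreadPool.SetMaxThreads(workerThreads, completionPortThreads);

        ThreadPool.GetMinThreads(out int minWorker, out int minIo);
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
        Console.WriteLine(
            $"processors={Environment.ProcessorCount} accepted={accepted} " +
            $"min=({minWorker},{minIo}) max=({maxWorker},{maxIo})");
        return accepted;
    }
}
```

Checking the return value matters precisely because the call is process-wide: silently failing to cap the pool, or capping it for every component in the process, are both easy to miss.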
You want the ThreadPool.SetMaxThreads() method.

Multi-threading libraries for .NET

I used multiple threads in a few programs, but still don't feel very comfortable about it.
What multi-threading libraries for C#/.NET are out there and which advantages does one have over the other?
By multi-threading libraries I mean everything which helps make programming with multiple threads easier.
Which built-in .NET facilities (e.g. ThreadPool) do you use regularly?
Which problems did you encounter?
There are various reasons for using multiple threads in an application:
UI responsiveness
Concurrent operations
Parallel speedup
The approach one should choose depends on what you're trying to do. For UI responsiveness, consider using BackgroundWorker, for example.
For concurrent operations (e.g. a server: something that doesn't have to be parallel, but probably does need to be concurrent even on a single-core system), consider using the thread pool or, if the tasks are long-lived and you need a lot of them, consider using one thread per task.
If you have a so-called embarrassingly parallel problem that can be easily divided up into small subproblems, consider using a pool of worker threads (as many threads as CPU cores) that pull tasks from a queue. The Microsoft Task Parallel Library (TPL) may help here. If the job can be easily expressed as a monadic stream computation (i.e. with a query in LINQ with work in transformations and aggregations etc.), Parallel LINQ (same link) which runs on top of TPL may help.
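As a small illustration of the embarrassingly parallel case, counting primes up to a limit can be split across cores with PLINQ's AsParallel; PrimeCounter is a hypothetical name, and trial division is used only to keep the example short.

```csharp
using System.Linq;

public static class PrimeCounter
{
    // Each number is tested independently, so PLINQ can partition the range
    // across cores and combine the partial counts.
    public static int CountPrimes(int limit)
    {
        if (limit < 2) return 0;
        return Enumerable.Range(2, limit - 1) // 2..limit inclusive
            .AsParallel()
            .Count(IsPrime);
    }

    private static bool IsPrime(int n)
    {
        for (int d = 2; (long)d * d <= n; d++)
            if (n % d == 0) return false;
        return true;
    }
}
```

Dropping the AsParallel() call gives the sequential version with the same result, which makes this style easy to compare and benchmark.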
There are other approaches, such as Actor-style parallelism as seen in Erlang, which are harder to implement efficiently in .NET because of the lack of a green threading model or means to implement same, such as CLR-supported continuations.
I like this one
http://www.codeplex.com/smartthreadpool
Check out the Power Threading library.
I have written a lot of threading code in my days, even implemented my own threading pool & dispatcher. A lot of it is documented here:
http://web.archive.org/web/20120708232527/http://devplanet.com/blogs/brianr/default.aspx
Just realize that I wrote these for very specific purposes and tested them in those conditions, and there is no real silver-bullet.
My advice would be to get comfortable with the thread pool before you move to any other libraries. A lot of the framework code uses the thread pool, so even if you happen to find The Best Threads Library(TM), you will still have to work with the thread pool, so you really need to understand it.
You should also keep in mind that a lot of work has been put into implementing and tuning the thread pool. The upcoming version of .NET has numerous improvements triggered by the development of the parallel libraries.
In my point of view many of the "problems" with the current thread pool can be amended by knowing its strengths and weaknesses.
Please keep in mind that you really should close threads (or allow the thread pool to dispose of them) when you no longer need them, unless you will need them again soon. The reason I say this is that each thread requires stack memory (usually 1 MB), so when applications sit on threads without using them, they are wasting memory.
For example, Outlook on my machine right now has 20 threads open and is using 0% CPU. That is simply a waste of (at least) 20 MB of memory. Word is also using another 10 threads with 0% CPU. 30 MB may not seem like much, but what if every application were wasting 10-20 threads?
Again, if you need access to a threadpool on a regular basis then you don't need to close it (creating/destroying threads has an overhead).
You don't have to use the threadpool explicitly, you can use BeginInvoke-EndInvoke if you need async calls. It uses the threadpool behind the scenes. See here: http://msdn.microsoft.com/en-us/library/2e08f6yc.aspx
You should take a look at the Concurrency and Coordination Runtime (CCR). The CCR can be a little daunting at first, as it requires a slightly different mindset. This video does a fairly good job of explaining how it works...
In my opinion this would be the way to go, and I also hear that it will use the same scheduler as the TPL.
For me, the built-in classes of the framework are more than enough. The ThreadPool is odd and lame, but you can easily write your own.
I often use the BackgroundWorker class for front ends, because it makes life much easier: invoking is done automatically for the event handlers.
I regularly start threads manually and save them in a dictionary with a ManualResetEvent, so I can check which of them have already ended. I use the WaitHandle.WaitAll() method for this. The problem there is that WaitHandle.WaitAll does not accept arrays with more than 64 WaitHandles at once.
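For the 64-handle limit mentioned above, one workaround (available since .NET 4) is a single CountdownEvent that every thread signals when it finishes, instead of one ManualResetEvent per thread. A sketch, with ThreadBatch as a hypothetical name and the increment standing in for real work:

```csharp
using System.Threading;

public static class ThreadBatch
{
    // Starts count threads and blocks until every one of them has signalled.
    public static int RunAndWait(int count)
    {
        int completed = 0;
        using (var allDone = new CountdownEvent(count))
        {
            for (int i = 0; i < count; i++)
            {
                var thread = new Thread(() =>
                {
                    try
                    {
                        Interlocked.Increment(ref completed); // placeholder for real work
                    }
                    finally
                    {
                        allDone.Signal(); // always signal, even if the work throws
                    }
                });
                thread.IsBackground = true;
                thread.Start();
            }
            allDone.Wait(); // a single wait, regardless of how many threads there are
        }
        return Volatile.Read(ref completed);
    }
}
```

If you still need per-thread status, the dictionary can stay; only the "wait for all of them" part moves to the CountdownEvent.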
You might want to look at the series of articles about threading patterns. Right now it has sample code for implementing a WorkerThread and a ThreadedQueue.
http://devpinoy.org/blogs/jakelite/archive/tags/Threading+Patterns/default.aspx
