Is it possible to purge a ThreadPool?
Remove items from the ThreadPool?
Anything like that?
ThreadPool.QueueUserWorkItem(GetDataThread);
RegisteredWaitHandle Handle = ThreadPool.RegisterWaitForSingleObject(CompletedEvent, WaitProc, null, 10000, true);
Any thoughts?
I recommend using the Task class (added in .NET 4.0) if you need this kind of behaviour. It supports cancellation, and you can have any number of tasks listening to the same cancellation token, which enables you to cancel them all with a single method call.
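To make the single-call cancellation concrete, here is a minimal sketch assuming .NET 4.0 or later. The class and method names (SharedCancellationDemo, StartAndCancel) are made up for illustration, and the sleep loop only stands in for real work.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class SharedCancellationDemo
{
    // Starts `count` tasks that all observe the same token, cancels them with
    // one call, and returns how many ended in the Canceled state.
    public static int StartAndCancel(int count)
    {
        using (var cts = new CancellationTokenSource())
        {
            CancellationToken token = cts.Token;

            Task[] tasks = Enumerable.Range(0, count)
                .Select(i => Task.Factory.StartNew(() =>
                {
                    // Simulated work that polls the shared token between steps.
                    for (int step = 0; step < 1000; step++)
                    {
                        token.ThrowIfCancellationRequested();
                        Thread.Sleep(10);
                    }
                }, token))
                .ToArray();

            cts.Cancel(); // One call signals every task listening on the token.

            try
            {
                Task.WaitAll(tasks);
            }
            catch (AggregateException)
            {
                // Expected: cancelled tasks surface here as cancellation exceptions.
            }

            return tasks.Count(t => t.IsCanceled);
        }
    }
}
```

Passing the token to StartNew as well as checking it inside the delegate means tasks that have not started yet are cancelled without ever running.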
Updated (non-4.0 solution):
You really only have two choices. One: implement your own event demultiplexer (this is far more complex than it appears, due to the 64-handle wait limitation); I can't recommend this - I had to do it once (in unmanaged code), and it was hideous.
That leaves the second choice: Have a signal to cancel the tasks. Naturally, RegisteredWaitHandle.Unregister can cancel the RWFSO part. The QUWI is more complex, but can be done by making the action aware of a "token" value. When the action executes, it first checks the token value against its stored token value; if they are different, then it shouldn't do anything.
One major thing to consider is race conditions. Just keep in mind that there is a race condition between cancelling an action and the ThreadPool executing it, so it is possible to see actions running after cancellation.
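A rough sketch of the token idea for QueueUserWorkItem follows. CancelableQueue, Wrap and CancelAll are hypothetical names, not part of the ThreadPool API or of Nito.Async, and Volatile.Read assumes .NET 4.5 or later.

```csharp
using System;
using System.Threading;

// Hypothetical helper: each work item remembers the token value that was
// current when it was queued and skips its work if the value has changed.
public sealed class CancelableQueue
{
    private int _currentToken;

    // Wraps an action so that it checks its stored token before running.
    public WaitCallback Wrap(Action action)
    {
        int storedToken = Volatile.Read(ref _currentToken);
        return _ =>
        {
            // Stale token: the item was cancelled while it sat in the queue.
            if (storedToken != Volatile.Read(ref _currentToken))
                return;
            action();
        };
    }

    public void Queue(Action action)
    {
        ThreadPool.QueueUserWorkItem(Wrap(action));
    }

    // Invalidates every queued item that has not yet passed its token check.
    public void CancelAll()
    {
        Interlocked.Increment(ref _currentToken);
    }
}
```

The race described above is still there: an item that has already passed its check when CancelAll is called will run to completion.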
I have a blog post on this concept, which I call "asynchronous callback contexts". The CallbackContext type mentioned in the blog post is available in the Nito.Async library.
There's no interface for removing a queued item. However, nothing stops you from "poisoning" the delegate so that it returns immediately.
edit
Based on what Paul said, I'm thinking you might also want to consider a pipelined architecture, where you have a fixed number of threads reading from a blocking queue (like .NET 4.0's BlockingCollection on a ConcurrentQueue). This way, if you want to cancel items, you can just access the queue yourself.
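Something along these lines is what I have in mind; treat it as a sketch, assuming .NET 4.0. WorkPipeline and CancelPending are names I made up, and there is no error handling for work items that throw.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// A fixed set of threads consuming work from a blocking queue that you own.
public sealed class WorkPipeline : IDisposable
{
    private readonly BlockingCollection<Action> _queue =
        new BlockingCollection<Action>(new ConcurrentQueue<Action>());
    private readonly Thread[] _workers;

    public WorkPipeline(int workerCount)
    {
        _workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            _workers[i] = new Thread(Consume) { IsBackground = true };
            _workers[i].Start();
        }
    }

    private void Consume()
    {
        // Blocks while the queue is empty; ends once CompleteAdding is called
        // and the queue has drained.
        foreach (Action work in _queue.GetConsumingEnumerable())
            work();
    }

    public void Enqueue(Action work)
    {
        _queue.Add(work);
    }

    // Removes items no worker has picked up yet; returns how many were dropped.
    public int CancelPending()
    {
        int dropped = 0;
        Action ignored;
        while (_queue.TryTake(out ignored))
            dropped++;
        return dropped;
    }

    public void Dispose()
    {
        _queue.CompleteAdding();
        foreach (Thread worker in _workers)
            worker.Join();
        _queue.Dispose();
    }
}
```

Because the queue belongs to you rather than to the ThreadPool, purging is just a matter of draining it; items already being executed are unaffected.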
Having said that, Stephen's advice about Task is likely better, in that it gives you all the control you would realistically want, without all the hard work that rolling your own pipelines involves. I mention this only for completeness.
The ThreadPool exists to help you manage your threads. You should not have to worry about purging it at all since it will make the best performance decisions on your behalf.
If you think you need tighter control over your threads then you could consider creating your own thread management class (similar to ThreadPool) but it would take a lot of work to match and exceed the functionality that ThreadPool has built in.
Take a look here at some of the ThreadPool optimizations and the ideas behind it.
For my second point, I found an article on Code Project that implements a "Cancelable Threadpool", probably for some of your own similar reasons. It would be a good place to start looking if you're going to write your own.
I see a lot of people in blog posts and here on SO either avoiding or advising against the usage of the Thread class in recent versions of C# (and I mean of course 4.0+, with the addition of Task & friends). Even before, there were debates about the fact that a plain old thread's functionality can be replaced in many cases by the ThreadPool class.
Also, other specialized mechanisms are further rendering the Thread class less appealing, such as Timers replacing the ugly Thread + Sleep combo, while for GUIs we have BackgroundWorker, etc.
Still, the Thread seems to remain a very familiar concept for some people (myself included), people that, when confronted with a task that involves some kind of parallel execution, jump directly to using the good old Thread class. I've been wondering lately if it's time to amend my ways.
So my question is, are there any cases when it's necessary or useful to use a plain old Thread object instead of one of the above constructs?
The Thread class cannot be made obsolete because obviously it is an implementation detail of all those other patterns you mention.
But that's not really your question; your question is
are there any cases when it's necessary or useful to use a plain old Thread object instead of one of the above constructs?
Sure. In precisely those cases where one of the higher-level constructs does not meet your needs.
My advice is that if you find yourself in a situation where existing higher-abstraction tools do not meet your needs, and you wish to implement a solution using threads, then you should identify the missing abstraction that you really need, and then implement that abstraction using threads, and then use the abstraction.
Threads are a basic building block for certain things (namely parallelism and asynchrony) and thus should not be taken away. However, for most people and most use cases there are more appropriate tools, which you mentioned, such as thread pools (which provide a nice way of handling many small jobs in parallel without overloading the machine by spawning 2000 threads at once) and BackgroundWorker (which encapsulates useful events for a single short-lived piece of work).
But just because those are more appropriate in many cases, since they shield the programmer from needlessly reinventing the wheel, making stupid mistakes and the like, does not mean that the Thread class is obsolete. It is still used by the abstractions named above, and you would still need it if you need fine-grained control over threads that is not covered by the more specialized classes.
In a similar vein, .NET doesn't forbid the use of arrays, despite List<T> being a better fit for many cases where people use arrays, simply because you may still want to build things that are not covered by the standard library.
Task and Thread are different abstractions. If you want to model a thread, the Thread class is still the most appropriate choice. E.g. if you need to interact with the current thread, I don't see any better types for this.
However, as you point out .NET has added several dedicated abstractions which are preferable over Thread in many cases.
The Thread class is not obsolete; it is still useful in special circumstances.
Where I work we wrote a 'background processor' as part of a content management system: a Windows service that monitors directories, e-mail addresses and RSS feeds, and every time something new shows up executes a task on it, typically to import the data.
Attempts to use the thread pool for this did not work: it tried to execute too much at the same time and thrashed the disks, so we implemented our own polling and execution system using the Thread class directly.
The new options make direct use and management of the (expensive) threads less frequent.
people that, when confronted with a task that involves some kind of parallel execution, jump directly to using the good old Thread class.
Which is a very expensive and relatively complex way of doing stuff in parallel.
Note that the expense matters most: You cannot use a full thread to do a small job, it would be counterproductive. The ThreadPool combats the costs, the Task class the complexities (exceptions, waiting and canceling).
To answer the question of "are there any cases when it's necessary or useful to use a plain old Thread object", I'd say a plain old Thread is useful (but not necessary) when you have a long running process that you won't ever interact with from a different thread.
For example, if you're writing an application that subscribes to receive messages from some sort of message queue and your application is going to do more than just process those messages, then it would be useful to use a Thread because the thread will be self-contained (i.e. you aren't waiting on it to get done), and it isn't short-lived. The ThreadPool class is more for queuing up a bunch of short-lived work items and letting the ThreadPool process each one efficiently as a thread becomes available. Tasks can be used where you would use Thread directly, but in the above scenario I don't think they would buy you much. They help you interact with the thread more easily (which the above scenario doesn't need) and they help determine how many threads should actually be used for the given set of tasks based on the number of processors you have (which isn't what you want, so you'd tell the Task your work is LongRunning, in which case the current 4.0 implementation would simply create a separate non-pooled Thread).
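If you want to see the LongRunning behaviour for yourself, a small check like the one below will do. It is only a sketch: LongRunningDemo is a made-up name, and the dedicated thread is an implementation detail of the default scheduler rather than a documented guarantee.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LongRunningDemo
{
    // Reports whether a task started with the given options ran on a
    // thread-pool thread.
    public static bool RunsOnPoolThread(TaskCreationOptions options)
    {
        Task<bool> task = Task.Factory.StartNew(
            () => Thread.CurrentThread.IsThreadPoolThread,
            CancellationToken.None,
            options,
            TaskScheduler.Default);
        return task.Result;
    }
}
```

Calling RunsOnPoolThread(TaskCreationOptions.LongRunning) should report false on the implementations I have seen, which matches the "separate non-pooled Thread" behaviour described above.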
Probably not the answer you were expecting, but I use Thread all the time when coding against the .NET Micro Framework. MF is quite cut down and doesn't include higher level abstractions and the Thread class is super flexible when you need to get the last bit of performance out of a low MHz CPU.
You could compare the Thread class to ADO.NET. It's not the recommended tool for getting the job done, but it's not obsolete. Other tools build on top of it to ease the job.
It's not wrong to use the Thread class over other things, especially if those things don't provide functionality that you need.
It's definitely not obsolete.
The problem with multithreaded apps is that they are very hard to get right (behavior is often nondeterministic, and input, output and internal state all matter), so a programmer should push as much work as possible onto the framework and tools. Abstract it away. But the mortal enemy of abstraction is performance.
So my question is, are there any cases when it's necessary or useful to use a plain old Thread object instead of one of the above constructs?
I'd go with Threads and locks only if there are serious performance problems or demanding performance goals.
I've always used the Thread class when I need to keep count and control over the threads I've spun up. I realize I could use the threadpool to hold all of my outstanding work, but I've never found a good way to keep track of how much work is currently being done or what the status is.
Instead, I create a collection and place the threads in them after I spin them up - the very last thing a thread does is remove itself from the collection. That way, I can always tell how many threads are running, and I can use the collection to ask each what it's doing. If there's a case when I need to kill them all, normally you'd have to set some kind of "Abort" flag in your application, wait for every thread to notice that on its own and self-terminate - in my case, I can walk the collection and issue a Thread.Abort to each one in turn.
In that case, I haven't found a better way than working directly with the Thread class. As Eric Lippert mentioned, the others are just higher-level abstractions, and it's appropriate to work with the lower-level classes when the available high-level implementations don't meet your need. Just as you sometimes need to do Win32 API calls when .NET doesn't address your exact needs, there will always be cases where the Thread class is the best choice despite recent "advancements."
I'm still fairly new to WF so bear with me if I don't get this worded correctly the first time. ;)
If you're doing selects against a well-normalized database, using primary keys, returning single records, in a fairly low volume environment (a few hundred requests per day), does it really make a difference whether you use CodeActivity vs AsyncCodeActivity?
While I've got some additional research to do on hosting and execution, it will be possible, but not probable, for multiple requests to be received at or near the same time. I'm not sure if that will change the answer or not.
Thanks!
Microsoft used non async in their ExecuteSqlQuery activity: http://wf.codeplex.com/releases/view/43585
Async Activities:
"This is useful for custom activities that must perform asynchronous work without holding the workflow scheduler thread and blocking any activities that may be able to run in parallel."
"As a result of going asynchronous, an AsyncCodeActivity may induce an idle point during execution. Due to the volatile nature of asynchronous work, an AsyncCodeActivity always creates a no persist block for the duration of the activity’s execution. This prevents the workflow runtime from persisting the workflow instance in the middle of the asynchronous work, and also prevents the workflow instance from unloading while the asynchronous code is executing."
Source: http://msdn.microsoft.com/en-us/library/ee358731.aspx
Edit: I noticed that I only pointed out the disadvantages of using async. I would consider the responses of Ron and Tim to make a better decision.
In general I strongly encourage activity developers who are doing any kind of I/O to use AsyncCodeActivity and to call the underlying Async APIs whenever possible. Even if the query is short this is always preferable.
Obviously - it's not going to make a difference unless you're actually calling an Async API inside your activity implementation.
That said, even if it makes a difference it might not make a noticeable difference in many apps. Potential reasons:
The query just runs too fast.
You aren't running multiple queries in parallel. (Running many async operations in parallel is faster than doing them synchronously and thereby sequentially.)
You don't run a large number of workflows in parallel such as would be needed to experience thread contention.
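To illustrate the second point outside of WF, here is a sketch assuming C# 5 and .NET 4.5 or later. FakeQueryAsync is a stand-in I invented that only simulates latency with Task.Delay; it is not a real database or WF API.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelQueries
{
    // Stand-in for an asynchronous query; the delay simulates I/O latency.
    private static async Task<int> FakeQueryAsync(int id, int latencyMs)
    {
        await Task.Delay(latencyMs);
        return id * 10;
    }

    // Awaits each query before starting the next, so the latencies add up.
    public static async Task<int[]> RunSequentiallyAsync(int count, int latencyMs)
    {
        var results = new int[count];
        for (int i = 0; i < count; i++)
            results[i] = await FakeQueryAsync(i, latencyMs);
        return results;
    }

    // Starts every query first, then waits for all of them, so the latencies overlap.
    public static Task<int[]> RunInParallelAsync(int count, int latencyMs)
    {
        Task<int>[] pending = Enumerable.Range(0, count)
            .Select(i => FakeQueryAsync(i, latencyMs))
            .ToArray();
        return Task.WhenAll(pending);
    }
}
```

Both methods return the same results in the same order; the difference is only in how long you wait, and with a single fast query per workflow you may not notice it at all.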
Is there any way I can abstract away what thread a particular delegate may execute on, such that I could execute it on the calling thread initially, but move execution to a background thread if it ends up taking longer than a certain amount of time?
Assume the delegate is written to be asynchronous. I'm not trying to take synchronous blocks and move them to background threads to increase parallelism, but rather I'm looking to increase performance of asynchronous execution by avoiding the overhead of threads for simple operations.
Basically I'm wondering if there's any way the execution of a delegate or lambda can be paused, moved to another thread and resumed, if I could establish clear stack boundaries, etc.
I doubt this is possible, I'm just curious.
It is possible, but it would be awkward and difficult to get right. The best way to make this happen is to use coroutines. The only mechanism in .NET that currently fits the coroutine paradigm is C#'s iterators via the yield return keyword. You could theoretically hack something together that allows the execution of a method to transition from one thread to another [1]. This would be nothing less than a blog-worthy hack, but I do think it is possible [2].
The next best option is to go ahead and upgrade to the Async CTP. This is a feature that will be available in C# and which will allow you to do exactly what you are asking for. This is accomplished elegantly with the proposed await keyword and some clever exploits that will also be included. The end result would look something like the following.
public async void SomeMethod()
{
    // Do stuff on the calling thread.
    await ThreadPool.SwitchTo(); // Switch to the ThreadPool.
    // Do stuff on a ThreadPool thread now!
    await MyForm.Dispatcher.SwitchTo(); // Switch to the UI thread.
    // Do stuff on the UI thread now!
}
This is just one of the many wicked cool tricks you can do with the new await keyword.
[1] The only way you can actually inject the execution of code onto an existing thread is if the target is specifically designed to accept the injection in the form of a work item.
[2] You can see my answer here for one such attempt at mimicking the await keyword with iterators. The MindTouch Dream framework is another, probably better, variation. The point is that it should be possible to cause the thread switching with some ingenious hacking.
Not easily.
If you structure your delegate as a state machine, you could track execution time between states and, when you reach your desired threshold, launch the next state in a new thread.
A simpler solution would be to launch it in a new thread to start with. Any reason that's not acceptable?
(posting from my phone - I'll provide some pseudocode when I'm at a real keyboard if necessary)
No I don't think that is possible. At least not directly with regular delegates. If you created some kind of IEnumerable that yielded after a little bit of work, then you could manually run a few iterations of it and then switch to running it on a background thread after so many iterations.
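A sketch of that iterator idea, assuming .NET 4.5 for Task.FromResult. HybridRunner, Run and SampleWork are hypothetical names, and the switch is triggered by a step count rather than elapsed time to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class HybridRunner
{
    // Runs up to `inlineSteps` steps on the calling thread. If the work is not
    // finished by then, the remaining steps continue on a background task.
    public static Task Run(IEnumerable<int> steps, int inlineSteps)
    {
        IEnumerator<int> cursor = steps.GetEnumerator();
        for (int i = 0; i < inlineSteps; i++)
        {
            if (!cursor.MoveNext())
            {
                cursor.Dispose();
                return Task.FromResult(true); // Finished inline, no thread hop.
            }
        }

        return Task.Factory.StartNew(() =>
        {
            using (cursor)
            {
                while (cursor.MoveNext()) { }
            }
        });
    }

    // Example work: each step records the id of the thread it ran on.
    public static IEnumerable<int> SampleWork(int stepCount, List<int> threadLog)
    {
        for (int i = 0; i < stepCount; i++)
        {
            threadLog.Add(Thread.CurrentThread.ManagedThreadId);
            yield return i;
        }
    }
}
```

Each yield return is a point where the work can be picked up by a different thread, which is why the delegate has to be written as an iterator for this to work.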
The ThreadPool and TPL's Task should be plenty performant; simply always run it on a background thread. Unless you have a specific benchmark showing that using a Task causes significant overhead, it sounds like you are trying to optimize prematurely.
My goal is to write a program that handles an arbitrary number of tasks based on given user input.
Let's say the number of tasks is 1000 in this case.
Now, I'd like to be able to have a dynamic number of threads that are spawned and start handling the tasks one by one.
I would assume I need to use a "synchronous" method, as opposed to an "asynchronous" one, so that if one task has a problem, it doesn't slow down the completion of the rest.
What method would I use to accomplish the above? Semaphores? ThreadPools? And how do I make sure that a thread does not try to start a task that is already being handled by another thread? Would a "lock" handle this?
Code examples and/or links to sites that will point me in the right direction will be appreciated.
edit: The problem with the MSDN Fibonacci example is that the waitall method can only handle up to 64 waits. I need more than that due to the 1000 tasks. How to fix that situation without creating deadlocks?
Are these tasks independent? If so, you basically want a producer/consumer queue or a custom threadpool, which are effectively different views on the same thing. You need to be able to place tasks in a queue, and have multiple threads be able to read from that queue.
I have a custom threadpool in MiscUtil or there's a simple (nongeneric due to age) producer/consumer queue in my threading tutorial (about half way down this page).
If these tasks are reasonably long-running, I wouldn't use the system threadpool for this - it will spawn more threads than you probably want. If you're using .NET 4.0 beta 1 you could use Parallel Extensions though.
I'm not quite sure about your comment on WaitAll... are you trying to work out when everything's finished? In the producer/consumer queue case, that would probably involve having some sort of "stop" entry in the queue (e.g. null references which the consuming threads would understand to mean "quit") and then add a "WaitUntilEmpty" method (which should be fairly easy to implement). Note that you wouldn't need to wait until the last items had been processed, as they'd all be stop signals... by the time the queue has emptied, all the real work items will definitely have been processed anyway.
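For illustration, a bare-bones version of such a queue might look like this. It is a sketch, not the MiscUtil implementation: WorkQueue and WorkQueueDemo are made-up names, and it joins the consumer threads instead of providing a WaitUntilEmpty method.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class WorkQueue
{
    private readonly Queue<Action> _items = new Queue<Action>();
    private readonly object _sync = new object();

    // Adds a work item; a null reference acts as a "stop" entry.
    public void Enqueue(Action work)
    {
        lock (_sync)
        {
            _items.Enqueue(work);
            Monitor.Pulse(_sync); // Wake one waiting consumer.
        }
    }

    private Action Dequeue()
    {
        lock (_sync)
        {
            while (_items.Count == 0)
                Monitor.Wait(_sync);
            return _items.Dequeue();
        }
    }

    // Consumer loop: runs items until it takes a null stop entry.
    public void Consume()
    {
        Action work;
        while ((work = Dequeue()) != null)
            work();
    }
}

public static class WorkQueueDemo
{
    // Processes `taskCount` items on `threadCount` threads; returns the number processed.
    public static int ProcessAll(int taskCount, int threadCount)
    {
        var queue = new WorkQueue();
        int processed = 0;

        var threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(queue.Consume);
            threads[i].Start();
        }

        for (int i = 0; i < taskCount; i++)
            queue.Enqueue(() => Interlocked.Increment(ref processed));

        for (int i = 0; i < threadCount; i++)
            queue.Enqueue(null); // One stop entry per consumer.

        foreach (Thread thread in threads)
            thread.Join();
        return processed;
    }
}
```

Because each item is dequeued under the lock, no two threads can ever pick up the same task, and there is no 64-handle limit to worry about.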
You'll probably want to use the ThreadPool to manage this.
I recommend reading up on MSDN on How to use the ThreadPool in C#. It covers many aspects of this, including firing tasks, and simple synchronization.
Using Threading in C# is the main section, and will cover other options.
If you happen to be using VS 2010 beta, and targeting .NET 4, the Task Parallel Library is a very good option for this - it simplifies some of these patterns.
You can't use it (yet) but the new Task class in .NET 4 would be ideal for this kind of situation.
Until then, the ThreadPool is your best bet. It has a (very) limited form of load-balancing. Note that if you try to start 1000 Threads you will probably get an Out Of Memory exception. The ThreadPool will handle that with ease.
Your sync problem can be handled with a simple (Interlocked) counter, if the timing is such that you can tolerate a Sleep(1) loop in the main thread. The ThreadPool is missing a more convenient way to do this.
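Roughly like this; a sketch only, with PoolCountdown as a made-up name and a counter increment standing in for the real task. Volatile.Read assumes .NET 4.5 or later; on older versions Thread.VolatileRead does the same job.

```csharp
using System;
using System.Threading;

public static class PoolCountdown
{
    // Queues `taskCount` work items on the ThreadPool and waits for all of
    // them using a shared counter instead of one wait handle per item.
    public static int RunAll(int taskCount)
    {
        int remaining = taskCount;
        int completed = 0;

        for (int i = 0; i < taskCount; i++)
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                Interlocked.Increment(ref completed); // Placeholder for real work.
                Interlocked.Decrement(ref remaining); // Signal that this item is done.
            });
        }

        // Poll until every item has signalled completion.
        while (Volatile.Read(ref remaining) > 0)
            Thread.Sleep(1);

        return completed;
    }
}
```

This sidesteps the 64-handle limit of WaitAll because nothing waits on a handle at all.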
A simple strategy to prevent a task from being picked up by two or more threads is a synchronized vector (protected by a mutex, for example).
See this http://msdn.microsoft.com/en-us/library/yy12yx1f.aspx
Perhaps you can use the BackgroundWorker class. It creates a nice abstraction on top of the thread pool. You can even subclass it if you want to setup many similar jobs.
As has been mentioned, .NET 4 features the excellent Task Parallel Library. But you can use the June 2008 CTP of it in .NET 3.5 just fine. I've been doing this for some hobby projects myself, but if this is a commercial project, you should check out if there are legal issues.
What exactly do I need delegates and threads for?
Delegates act as the logical (but safe) equivalent to function-pointers; they allow you to talk about an operation in an abstract way. The typical example of this is events, but I'm going to use a more "functional programming" example: searching in a list:
List<Person> people = ...
Person fred = people.Find( x => x.Name == "Fred");
Console.WriteLine(fred.Id);
The "lambda" here is essentially an instance of a delegate - a delegate of type Predicate<Person> - i.e. "given a person, is something true or false". Using delegates allows very flexible code - i.e. the List<T>.Find method can find all sorts of things based on the delegate that the caller passes in.
In this way, they act largely like a 1-method interface - but much more succinctly.
Delegates: Basically, a delegate is a way to reference a method. It's like a pointer to a method: you can set it to any method that matches its signature and use it to pass a reference to that method around.
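A tiny example of what that means in practice. BinaryOp, Add, Multiply and Apply are names invented for this sketch.

```csharp
using System;

public static class DelegateDemo
{
    // A delegate type: any method taking two ints and returning an int matches it.
    public delegate int BinaryOp(int left, int right);

    public static int Add(int a, int b) { return a + b; }

    public static int Multiply(int a, int b) { return a * b; }

    // The caller decides which method runs by passing a reference to it.
    public static int Apply(BinaryOp op, int x, int y)
    {
        return op(x, y);
    }
}
```

Apply(DelegateDemo.Add, 3, 4) gives 7 and Apply(DelegateDemo.Multiply, 3, 4) gives 12; Apply itself never needs to know which method it was handed.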
Threads: A thread is a sequential stream of instructions that execute one after another to complete a computation. You can have different threads running simultaneously to accomplish a specific task. A thread runs on a single logical processor.
Delegates are used to add methods to events dynamically.
Threads run inside of processes, and allow you to run 2 or more tasks at once that share resources.
I'd suggest searching for these terms; there is plenty of information out there. They are pretty fundamental concepts, and the wiki is a good high-level place to start:
http://en.wikipedia.org/wiki/Thread_(computer_science)
http://en.wikipedia.org/wiki/C_Sharp_(programming_language)
Concrete examples always help me so here is one for threads. Consider your web server. As requests arrive at the server, they are sent to the Web Server process for handling. It could handle each as it arrives, fully processing the request and producing the page before turning to the next one. But consider just how much of the processing takes place at hard drive speeds (rather than CPU speeds) as the requested page is pulled from the disk (or data is pulled from the database) before the response can be fulfilled.
By pulling threads from a thread pool and giving each request its own thread, we can take care of the non-disk needs for hundreds of requests before the disk has returned data for the first one. This will permit a degree of virtual parallelism that can significantly enhance performance. Keep in mind that there is a lot more to Web Server performance but this should give you a concrete model for how threading can be useful.
They are useful for the same reason high-level languages are useful. You don't need them for anything, since really they are just abstractions over what is really happening. They do make things significantly easier and faster to program or understand.
Marc Gravell provided a nice answer for 'what is a delegate.'
Andrew Troelsen defines a thread as
...a path of execution within a process. "Pro C# 2008 and the .NET 3.5 Platform," APress.
All processes that are run on your system have at least one thread. Let's call it the main thread. You can create additional threads for any variety of reasons, but the clearest example for illustrating the purpose of threads is printing.
Let's say you open your favorite word processing application (WPA), type a few lines, and then want to print those lines. If your WPA uses the main thread to print the document, the WPA's user interface will be 'frozen' until the printing is finished. This is because the main thread has to print the lines before it can process any user interface events, i.e., button clicks, mouse movements, etc. It's as if the code were written like this:
do
{
    ProcessUserInterfaceEvents();
    PrintDocument();
} while (true);
Clearly, this is not what users want. Users want the user interface to be responsive while the document is being printed.
The answer, of course, is to print the lines in a second thread. In this way, the user interface can focus on processing user interface events while the secondary thread focuses on printing the lines.
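In code, the idea looks roughly like the sketch below. PrintDemo and its PrintDocument method are hypothetical stand-ins: the sleep simulates a slow print job and the loop simulates user interface events.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class PrintDemo
{
    // Hypothetical stand-in for a slow print job.
    private static void PrintDocument(List<string> log)
    {
        Thread.Sleep(200);
        lock (log) { log.Add("printed"); }
    }

    // Prints on a secondary thread while the calling thread keeps handling events.
    public static List<string> Run()
    {
        var log = new List<string>();
        var printer = new Thread(() => PrintDocument(log));
        printer.Start();

        // The main thread stays free to process (simulated) UI events.
        for (int i = 0; i < 3; i++)
        {
            lock (log) { log.Add("ui event " + i); }
            Thread.Sleep(10);
        }

        printer.Join(); // Wait for the print job before returning the log.
        return log;
    }
}
```

The UI events are logged while the print job is still in progress, which is exactly the responsiveness the single-threaded loop above cannot give you.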
The illusion is that both tasks happen simultaneously. On a single processor machine, this cannot be true since the processor can only execute one thread at a time. However, switching between the threads happens so fast that the illusion is usually maintained. On a multi-processor (or multi-core) machine, this can be literally true since the main thread can run on one processor while the secondary thread runs on another processor.
In .NET, threading is a breeze. You can utilize the System.Threading.ThreadPool class, use asynchronous delegates, or create your own System.Threading.Thread objects.
If you are new to threading, I would throw out two cautions.
First, you can actually hurt your application's performance if you choose the wrong threading model. Be careful to avoid using too many threads or trying to thread things that should really happen sequentially.
Second (and more importantly), be aware that if you share data between threads, you will likely need to synchronize access to that shared data, e.g., using the lock keyword in C#. There is a wealth of information on this topic available online, so I won't repeat it here. Just be aware that you can run into intermittent, not-always-repeatable bugs if you do not do this carefully.
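As a minimal sketch of the lock keyword in use (Counter and CounterDemo are names made up for this example):

```csharp
using System;
using System.Threading;

public sealed class Counter
{
    private readonly object _sync = new object();
    private int _value;

    // Only one thread at a time can be inside the lock, so no update is lost.
    public void Increment()
    {
        lock (_sync) { _value++; }
    }

    public int Value
    {
        get { lock (_sync) { return _value; } }
    }
}

public static class CounterDemo
{
    // Several threads increment the same counter; returns the final value.
    public static int Run(int threadCount, int incrementsPerThread)
    {
        var counter = new Counter();
        var threads = new Thread[threadCount];

        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                for (int n = 0; n < incrementsPerThread; n++)
                    counter.Increment();
            });
            threads[i].Start();
        }

        foreach (Thread thread in threads)
            thread.Join();
        return counter.Value;
    }
}
```

Without the lock, _value++ is a read-modify-write that two threads can interleave, and the final count can come up short on some runs but not others.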
Your question is too vague...
But you probably just want to know how to use them in order to have a window, a time consuming process running and a progress bar...
So create a thread to do the time consuming process and use the delegates to increase the progress bar! :)