I have several low-imprtance tasks to be performed when some cpu time is available. I don't want this task to perform if other more import task are running. Ie if a normal/high priority task comes I want the low-importance task to pause until the importance task is done.
There is a pretty big number of low importance task to be performed (50 to 1000). So I don't want to create one thread per task. However I believe that the threadpool do not allow some priority specification, does it ?
How would you do solve this ?
You can new up a Thread and use a Dispatcher to send it takes of various priorities.
The priorities are a bit UI-centric but that doesn't really matter.
You shouldn't mess with the priority of the regular ThreadPool, since you aren't the only consumer. I suppose the logical approach would be to write your own - perhaps as simple as a producer/consumer queue, using your own Thread(s) as the consumer(s) - setting the thread priority yourself.
.NET 4.0 includes new libraries (the TPL etc) to make all this easier - until then you need additional code to create a custom thread pool or work queue.
When you are using the build in ThreadPool all threads execute with the default priority. If you mess with this setting it will be ignored. This is a case where you should roll your own ThreadPool. A few years ago I extended the SmartThreadPool to meet my needs. This may satisfy yours as well.
I'd create a shared Queue of pending task objects, with each object specifying its priority. Then write a dispatcher thread that watches the Queue and launches a new thread for each task, up to some max thread limit, and specifying the thread priority as it creates it. Its only a small amount of work to do that, and you can have the dispatcher report activity and even dynamically adjust the number of running threads. That concept has worked very well for me, and can be wrapped in a windows service to boot if you make your queue a database table.
Related
I've helped a client with an application which stopped doing it's work after a while (it eventually started doing work again).
The problem was that when a Task failed it used Thread.Sleep for 5 seconds (in the task). As there could be up to 800 tasks queued every two second you can imagine the problem if many of those jobs fails and invoked Thread.Sleep. None of those jobs were marked with TaskCreationOptions.LongRunning.
I rewrote the tasks so that Thread.Sleep (or Task.Delay for that matter) wasn't nessacary.
However, I'm interested in what the TaskScheduler (the default) did in that scenario. How and when do it increase the number of threads?
According to MSDN
Behind the scenes, tasks are queued to the ThreadPool, which has been
enhanced with algorithms (like hill-climbing) that determine and
adjust to the number of threads that maximizes throughput.
The ThreadPool will have a maximum number of threads depending on the environment. As you create more Tasks the pool can run them concurrently until it reaches its maximum number of threads, at which point any further tasks will be queued.
If you want to find out the maximum number of ThreadPool threads you can use System.Threading.ThreadPool.GetMaxThreads (you need to pass in two out int parameters, one that will be populated with the number of maximum worker threads and another that will be populated with the maximum number of asynchronous I/O threads).
If you want to get a better idea of what is happening in your application at runtime you can use Visual Studio's threads window by going to Debug -> Windows -> Threads (The entry will only be there when you are debugging so you'll need to set a break point in your application first).
This post is possibly of interest. It would seem the default task scheduler simply queues the task up in the ThreadPool's queue unless you use TaskCreationOptions.LongRunning. This means that it's up to the ThreadPool to decide when to create new threads.
I have an application in C# with a list of work to do. I'm looking to do as much of that work as possible in parallel. However I need to be able to control the maximum amount of parallel tasks.
From what I understand this is possible with a ThreadPool or with Tasks. Is there an difference in which one I use? My main concern is being able to control how many threads are active at one time.
Please take a look at ParallelOptions.MaxDegreeOfParallelism for Tasks.
I would advise you to use Tasks, because they provide a higher level abstraction than the ThreadPool.
A very good read on the topic can be found here. Really, a must-have book and it's free on top of that :)
In TPL you can use the WithDegreeOfParallelism on a ParallelEnumerable or ParallelOptions.MaxDegreeOfParallism
There is also the CountdownEvent which may be a better option if you are just using custom threads or tasks.
In the ThreadPool, when you use SetMaxThreads its global for the AppDomain so you could potentially be limiting unrelated code unnecessarily.
You cannot set the number of worker threads or the number of I/O completion threads to a number smaller than the number of processors in the computer.
If the common language runtime is hosted, for example by Internet Information Services (IIS) or SQL Server, the host can limit or prevent changes to the thread pool size.
Use caution when changing the maximum number of threads in the thread pool. While your code might benefit, the changes might have an adverse effect on code libraries you use.
Setting the thread pool size too large can cause performance problems. If too many threads are executing at the same time, the task switching overhead becomes a significant factor.
I agree with the other answer that you should use TPL over the ThreadPool as its a better abstraction of multi-threading, but its possible to accomplish what you want in both.
In this article on msdn, they explain why they recommend Tasks instead of ThreadPool for Parallelism.
Task have a very charming feature to me, you can build chains of tasks. Which are executed on certain results of the task before.
A feature I often use is following: Task A is running in background to do some long running work. I chain Task B after it, only executing when Task A has finished regulary and I configure it to run in the foreground, so I can easily update my controls with the result of long running Task A.
You can also create a semaphore to control how many threads can execute at a single time. You can create a new semaphore and in the constructor specify how many simultaneous threads are able to use that semaphore at a single time. Since I don't know how you are going to be using the threads, this would be a good starting point.
MSDN Article on the Semaphore class
-Wesley
As some may have seen in .NET 4.0, they've added a new namespace System.Threading.Tasks which basically is what is means, a task. I've only been using it for a few days, from using ThreadPool.
Which one is more efficient and less resource consuming? (Or just better overall?)
The objective of the Tasks namespace is to provide a pluggable architecture to make multi-tasking applications easier to write and more flexible.
The implementation uses a TaskScheduler object to control the handling of tasks. This has virtual methods that you can override to create your own task handling. Methods include for instance
protected virtual void QueueTask(Task task)
public virtual int MaximumConcurrencyLevel
There will be a tiny overhead to using the default implementation as there's a wrapper around the .NET threads implementation, but I'd not expect it to be huge.
There is a (draft) implementation of a custom TaskScheduler that implements multiple tasks on a single thread here.
which one is more efficient and less
resource consuming?
Irrelevant, there will be very little difference.
(Or just better overall)
The Task class will be the easier-to-use as it offers a very clean interface for starting and joining threads, and transfers exceptions. It also supports a (limited) form of load balancing.
"Starting with the .NET Framework 4, the TPL is the preferred way to write multithreaded and parallel code."
http://msdn.microsoft.com/en-us/library/dd460717.aspx
Thread
The bare metal thing, you probably don't need to use it, you probably can use a LongRunning Task and benefit from its facilities.
Tasks
Abstraction above the Threads. It uses the thread pool (unless you specify the task as a LongRunning operation, if so, a new thread is created under the hood for you).
Thread Pool
As the name suggests: a pool of threads. Is the .NET framework handling a limited number of threads for you. Why? Because opening 100 threads to execute expensive CPU operations on a CPU with just 8 cores definitely is not a good idea. The framework will maintain this pool for you, reusing the threads (not creating/killing them at each operation), and executing some of they in parallel in a way that your CPU will not burn.
OK, but when to use each one?
In resume: always use tasks.
Task is an abstratcion, so it is a lot easier to use. I advise you to always try to use Tasks and if you face some problem that makes you need to handle a thread by yourself (probably 1% of the time) then use threads.
BUT be aware that:
I/O Bound: For I/O bound operations (database calls, read/write files, APIs calls, etc) never use normal tasks, use LongRunning tasks or threads if you need to, but not normal tasks. Because it would lead you to a thread pool with a few threads busy and a lot of another tasks waiting for its turn to take the pool.
CPU Bound: For CPU bound operations just use the normal tasks and be happy.
Scheduling is an important aspect of parallel tasks.
Unlike threads, new tasks don't necessarily begin executing immediately. Instead, they are placed in a work queue. Tasks run when their associated task scheduler removes them from the queue, usually as cores become available. The task scheduler attempts to optimize overall throughput by controlling the system's degree of concurrency. As long as there are enough tasks and the tasks are sufficiently free of serializing dependencies, the program's performance scales with the number of available cores. In this way, tasks embody the concept of potential parallelism
As I saw on msdn http://msdn.microsoft.com/en-us/library/ff963549.aspx
ThreadPool and Task difference is very simple.
To understand task you should know about the threadpool.
ThreadPool is basically help to manage and reuse the free threads. In
other words a threadpool is the collection of background thread.
Simple definition of task can be:
Task work asynchronously manages the the unit of work. In easy words
Task doesn’t create new threads. Instead it efficiently manages the
threads of a threadpool.Tasks are executed by TaskScheduler, which queues tasks onto threads.
Another good point to consider about task is, when you use ThreadPool, you don't have any way to abort or wait on the running threads (unless you do it manually in the method of thread), but using task it is possible. Please correct me if I'm wrong
Currently, I have a large number of C# computations (method calls) residing in a queue that will be run sequentially. Each computation will use some high-latency service (network, disk...).
I was going to use Mono coroutines to allow the next computation in the computation queue to continue while a previous computation is waiting for the high latency service to return. However, I prefer to not depend on Mono coroutines.
Is there a design pattern that's implementable in pure C# that will enable me to process additional computations while waiting for high latency services to return?
Thanks
Update:
I need to execute a huge number (>10000) of tasks, and each task will be using some high-latency service. On Windows, you can't create that much threads.
Update:
Basically, I need a design pattern that emulates the advantages (as follows) of tasklets in Stackless Python (http://www.stackless.com/)
Huge # of tasks
If a task blocks the next task in the queue executes
No wasted cpu cycle
Minimal overhead switching between tasks
You can simulate cooperative microthreading using IEnumerable. Unfortunately this won't work with blocking APIs, so you need to find APIs that you can poll, or which have callbacks that you can use for signalling.
Consider a method
IEnumerable Thread ()
{
//do some stuff
Foo ();
//co-operatively yield
yield null;
//do some more stuff
Bar ();
//sleep 2 seconds
yield new TimeSpan (2000);
}
The C# compiler will unwrap this into a state machine - but the appearance is that of a co-operative microthread.
The pattern is quite straightforward. You implement a "scheduler" that keeps a list of all the active IEnumerators. As it cycles through the list, it "runs" each one using MoveNext (). If the value of MoveNext is false, the thread has ended, and the scheduler removes it from the list. If it's true, then the scheduler accesses the Current property to determine the current state of the thread. If it's a TimeSpan, the thread wishes to sleep, and the scheduler moved it onto some queue that can be flushed back into the main list when the sleep timespans have ended.
You can use other return objects to implement other signalling mechanisms. For example, define some kind of WaitHandle. If the thread yields one of these, it can be moved to a waiting queue until the handle is signalled. Or you could support WaitAll by yielding an array of wait handles. You could even implement priorities.
I did a simple implementation of this scheduler in about 150LOC but I haven't got round to blogging the code yet. It was for our PhyreSharp PhyreEngine wrapper (which won't be public), where it seems to work pretty well for controlling a couple of hundred characters in one of our demos. We borrowed the concept from the Unity3D engine -- they have some online docs that explain it from a user point of view.
.NET 4.0 comes with extensive support for Task parallelism:
How to: Use Parallel.Invoke to Execute Simple Parallel Tasks
How to: Return a Value from a Task
How to: Chain Multiple Tasks with Continuations
I'd recommend using the Thread Pool to execute multiple tasks from your queue at once in manageable batches using a list of active tasks that feeds off of the task queue.
In this scenario your main worker thread would initially pop N tasks from the queue into the active tasks list to be dispatched to the thread pool (most likely using QueueUserWorkItem), where N represents a manageable amount that won't overload the thread pool, bog your app down with thread scheduling and synchronization costs, or suck up available memory due to the combined I/O memory overhead of each task.
Whenever a task signals completion to the worker thread, you can remove it from the active tasks list and add the next one from your task queue to be executed.
This will allow you to have a rolling set of N tasks from your queue. You can manipulate N to affect the performance characteristics and find what is best in your particular circumstances.
Since you are ultimately bottlenecked by hardware operations (disk I/O and network I/O, CPU) I imagine smaller is better. Two thread pool tasks working on disk I/O most likely won't execute faster than one.
You could also implement flexibility in the size and contents of the active task list by restricting it to a set number of particular type of task. For example if you are running on a machine with 4 cores, you might find that the highest performing configuration is four CPU-bound tasks running concurrently along with one disk-bound task and a network task.
If you already have one task classified as a disk IO task, you may choose to wait until it is complete before adding another disk IO task, and you may choose to schedule a CPU-bound or network-bound task in the meanwhile.
Hope this makes sense!
PS: Do you have any dependancies on the order of tasks?
You should definitely check out the Concurrency and Coordination Runtime. One of their samples describes exactly what you're talking about: you call out to long-latency services, and the CCR efficiently allows some other task to run while you wait. It can handle huge number of tasks because it doesn't need to spawn a thread for each one, though it will use all your cores if you ask it to.
Isn't this a conventional use of multi-threaded processing?
Have a look at patterns such as Reactor here
Writing it to use Async IO might be sufficient.
This can lead to nasy, hard to debug code without strong structure in the design.
You should take a look at this:
http://www.replicator.org/node/80
This should do exactly what you want. It is a hack, though.
Some more information about the "Reactive" pattern (as mentioned by another poster) with respect to an implementation in .NET; aka "Linq to Events"
http://themechanicalbride.blogspot.com/2009/07/introducing-rx-linq-to-events.html
-Oisin
In fact, if you use one thread for a task, you will lose the game. Think about why Node.js can support huge number of conections. Using a few number of thread with async IO!!! Async and await functions can help on this.
foreach (var task in tasks)
{
await SendAsync(task.value);
ReadAsync();
}
SendAsync() and ReadAsync() are faked functions to async IO call.
Task parallelism is also a good choose. But I am not sure which one is faster. You can test both of them
in your case.
Yes of course you can. You just need to build a dispatcher mechanism that will call back on a lambda that you provide and goes into a queue. All the code I write in unity uses this approach and I never use coroutines. I wrap methods that use coroutines such as WWW stuff to just get rid of it. In theory, coroutines can be faster because there is less overhead. Practically they introduce new syntax to a language to do a fairly trivial task and furthermore you can't follow the stack trace properly on an error in a co-routine because all you'll see is ->Next. You'll have to then implement the ability to run the tasks in the queue on another thread. However, there is parallel functions in the latest .net and you'd be essentially writing similar functionality. It wouldn't be many lines of code really.
If anyone is interested I would send the code, don't have it on me.
So I have a program that serves as a sort of 'shell' for other programs. At its core, it gets passed a class, a method name, and some args, and handles the execution of the function, with the idea of allowing other programmers to basically schedule their processes to run on this shell service. Everything works fine except for one issue. Frequently, these processes that are scheduled for execution are very CPU heavy. At times, the processes called start using so much of the CPU, that the threads that I have which are responsible for checking schedules and firing off other jobs don't get a chance to run for quite some time, resulting in scheduling issues and a lack of sufficient responsiveness. Unfortunately, I can't insert Thread.Sleep() calls in the actual running code, as I don't really 'own' it.
So my question is: Is it possible to force an arbitrary thread (that I started) to sleep (yield) every so often, without modifying the actual code running in that thread? Barring that, is there some way to 'inject' Thread.Sleep() calls into code that I'm about to run dynamically, at run-time?
Not really. You could try changing the priority of the other thread with the Priority property, but you can't really inject a "sleep" call.
There's Thread.Suspend() but I strongly recommend you don't use it. You don't really want to get another thread to suspend at arbitrary times, as it might hold important resources. (That's why the method is marked as obsolete and has all kinds of warnings around it :)
Reducing the priority of the other tasks and potentially boosting the priority of your own tasks is probably the best approach.
No there is not (to my knowledge), you can however change the scheduling priority of a thread - http://msdn.microsoft.com/en-us/library/system.threading.thread.priority.aspx.
Lower the priority of the thread that is running the problem code as you start the thread (or raise the priority of your scheduling threads)
You can suspend a thread with Thread.Suspend (deprecated and not recommended) or you can lower its priority by changing Thread.Priority.
Actually, what you are stating is very much a security risk and thus isn't going to be supported by most platforms. It's generally not a good policy to allow other programs to affect your status without your permission.
If possible, you could probably load the prog in question on its own process that you create in your code, and then you can set that processe's priority or make it sleep, or what have you. I am no expert in windows programming, so your mileage may vary.
You should not be doing this. Threads should not Sleep() unless they themselves decide it's a good idea.
It's not relevant to your problem anyway. Your problem apparently is that management threads do not run, due to CPU starvation. To fix that, it's sufficient to give those threads a higher priority. At the end of each timeslice, the OS will determine which threads should run next. Thread priority is somewhat dynamic: threads that haven't run for a while will move up relative to threasd that just ran. But the higher base thread priority of your management threads will mean they don't need this dynamic boost to still get scheduled.
If this is still not sufficient, lower the priority of the worker threads. This will mean the worker thread has to be dormant for even more time before it gets a new timeslice.
Finally, make sure that the thread kicking off new worker threads runs at a very low priority. This will mean that no new threads are created if the system is hitting CPU limits. This implements a simple but robust throttling mechanism.