Let's say I have a business object that is very expensive to instantiate, and I would never want to create more than say 10 instances of that object in my application. So, that would mean I would never want to have more than 10 concurrent worker threads running at one time.
I'd like to use the new System.Threading.Tasks to create a task like this:
var task = Task.Factory.StartNew(() => myPrivateObject.DoSomethingProductive());
Is there a sample out there that would show how to:
create an 'object pool' for use by the TaskFactory?
limit the TaskFactory to a specified number of threads?
lock an instance in the object pool so it can only be used by one task at a time?
Igby's answer led me to this excellent blog post from Justin Etheridge. which then prompted me to write this sample:
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
namespace MyThreadedApplication
{
class Program
{
static void Main(string[] args)
{
// build a list of 10 expensive working object instances
var expensiveStuff = new BlockingCollection<ExpensiveWorkObject>();
for (int i = 65; i < 75; i++)
{
expensiveStuff.Add(new ExpensiveWorkObject(Convert.ToChar(i)));
}
Console.WriteLine("{0} expensive objects created", expensiveStuff.Count);
// build a list of work to be performed
Random r = new Random();
var work = new ConcurrentQueue<int>();
for (int i = 0; i < 1000; i++)
{
work.Enqueue(r.Next(10000));
}
Console.WriteLine("{0} items in work queue", work.Count);
// process the list of work items in fifteen threads
for (int i = 1; i < 15; i++)
{
Task.Factory.StartNew(() =>
{
while (true)
{
var expensiveThing = expensiveStuff.Take();
try
{
int workValue;
if (work.TryDequeue(out workValue))
{
expensiveThing.DoWork(workValue);
}
}
finally
{
expensiveStuff.Add(expensiveThing);
}
}
});
}
}
}
}
class ExpensiveWorkObject
{
char identity;
public void DoWork(int someDelay)
{
System.Threading.Thread.Sleep(someDelay);
Console.WriteLine("{0}: {1}", identity, someDelay);
}
public ExpensiveWorkObject(char Identifier)
{
identity = Identifier;
}
}
So, I'm using the BlockingCollection as an object pool, and the worker threads don't check the queue for available work until they have an exclusive control over one of the expensive object instances. I think this meets my requirements, but I would really like feedback from people who know this stuff better than I do...
Two thoughts:
Limited Concurrency Scheduler
You can use a custom task scheduler which limits the number of concurrent tasks. Internally it will allocate up to n Task instances. If you pass it more tasks than it has available instances, it will put them in a queue. Adding custom schedulers like this is a design feature of the TPL.
Here is a good example of such a scheduler. I have sucessfully used a modified version of this.
Object Pool
Another option is to use an object pool. It's a very similar concept except that instead of putting the limitation at the task level, you put it on the number of object instances, and force tasks to wait for a free instance to become available. This has the benefit of reducing the overhead of object creation, but you need to ensure the object is written in a way that allows instances of it to be recycled. You could create an object pool around a concurrent producer-consumer collection such as ConcurrentStack where the consumer adds the instance back to the collection when it's finished.
Related
New to threading and tasks here :)
So, I wrote a simple threading program that creates a few threads and runs them asynchronously then waits for them to finish.
I then changed it to a Task. The code does exactly the same thing and the only change is I change a couple of statements.
So, two questions really:
In the below code, what is the difference?
I'm struggling to figure out async/await. How would I integrate it into the below, or given all examples seem to be one method calls another that are both async/await return is this a bad example of using Task to do background work?
Thanks.
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
ThreadSample();
TaskSample();
}
private static void ThreadSample()
{
Random r = new Random();
MyThreadTest[] myThreads = new MyThreadTest[4];
Thread[] threads = new Thread[4];
for (int i = 0; i < 4; i++)
{
myThreads[i] = new MyThreadTest($"T{i}", r.Next(1, 500));
threads[i] = new Thread(new ThreadStart(myThreads[i].ThreadSample));
threads[i].Start();
}
for (int i = 0; i < 4; i++)
{
threads[i].Join();
}
System.Console.WriteLine("Finished");
System.Console.ReadKey();
}
private static void TaskSample()
{
Random r = new Random();
MyThreadTest[] myTasks = new MyThreadTest[4];
Task[] tasks = new Task[4];
for (int i = 0; i < 4; i++)
{
myTasks[i] = new MyThreadTest($"T{i}", r.Next(1, 500));
tasks[i] = new Task(new Action(myTasks[i].ThreadSample));
tasks[i].Start();
}
for (int i = 0; i < 4; i++)
{
tasks[i].Wait();
}
System.Console.WriteLine("Finished");
System.Console.ReadKey();
}
}
class MyThreadTest
{
private string name;
private int interval;
public MyThreadTest(string name, int interval)
{
this.name = name;
this.interval = interval;
Console.WriteLine($"Thread created: {name},{interval}");
}
public void ThreadSample()
{
for (int i = 0; i < 5; i++)
{
Thread.Sleep(interval);
Console.WriteLine($"{name} At {i} on thread {Thread.CurrentThread.ManagedThreadId}");
}
}
public void TaskSample()
{
for (int i = 0; i < 5; i++)
{
Thread.Sleep(interval);
Console.WriteLine($"{name} At {i} on thread {Thread.CurrentThread.ManagedThreadId}");
}
}
}
}
The Task Parallel Library (TPL) is an abstraction, and you shouldn't try to compare Tasks directly with threads. The Task object represents the abstract concept of an asynchronous task - a piece of code that should execute asynchronously and which will either complete, fault (throw an exception) or be canceled. The abstraction means you can write and use such tasks without worrying too much about exactly how they're executed asynchronously. There are lots of useful things like ContinueWith() you can use to compose, sequence and otherwise manage tasks.
Threads are a much lower level concrete system facility that can be used to run code asynchronously, but without all the niceties you get from the Task Parallel Library (TPL). If you want to sequence tasks or anything like that, you have to code it yourself.
In your example code, you're not actually directly creating any threads. Instead, the Actions you've written are being executed by the system thread pool. Of course, this can be changed. The TPL abstraction layer provides the TaskScheduler class which you can extend - if you have some special way of running code asynchronously, you can write a TaskScheduler to use TPL with it.
async/await is 100% compiler sugar. The compiler decomposes an async method into chunks, each of which becomes a Task, and those chunks execute sequentially with the help of a state machine, all generated by the compiler. One caution: by default, await captures the current SynchronizationContext and resumes on that context. So if you're doing this in WPF or Windows Forms, your continuation code after an await isn't actually running in a thread at all, it's running on the UI thread. You can disable this by calling ConfigureAwait(false). Really, async/await are primarily intended for asynchronous programming in UI environments where synchronization to a main thread is important.
In the below code, what is the difference?
The difference is big. Task is a unit of work, which will use a thread(s) from thread pool allocated based on estimated amount of work to be computed. if there is another Task, and there are paused, but still alive threads, in the pool, instead of spinning of a new thread (which is very costy) it reuses already created one. Multiple tasks can end-up using the same thread eventually (non simultaneously obviously)
Task based parallelism in nutshell is: Tasks are jobs, ThreadPool provisions resource to complete those jobs. Consequence, more clever, elastic thread/resource utilization, especially in general purpose programs targeting variety of execution environments and resource availability, for example VMs on cloud.
I'm struggling to figure out async/await.
await implied dependency of one task from another. If in your case you don't have it, other than waiting all of them to complete, what are you doing is pretty much enough.
If you need, you can achieve that with TPL too via, for example, ContinueWith
I have an integration service which runs a calculation heavy, data bound process. I want to make sure that there are never more than say, n = 5, (but n will be configurable, changeable at runtime) of these processes running at the same. The idea is to throttle the load on the server to a safe level. The amount of data processed by the method is limited by batching, so I don't need to worry about 1 process representing a much bigger load than another.
The processing method is called by another process, where requests to run payroll are held on a queue, and I can insert some logic at that point to determine whether to process this request now, or leave it on the queue.
So i want a seperate method on the same service as the processing method, which can tell me if the server can accept another call to the processing method. It's going to ask, "how many payroll runs are going on? is that less than n?" What's a neat way of achieving this?
-----------edit------------
I think I need to make it clear, the process that decides whether to take the request off the queue this is seperated from the service that processes the payroll data by a WCF boundary. Stopping a thread on the payroll processing process isn't going to prevent more requests coming in
You can use a Semaphore to do this.
public class Foo
{
private Semaphore semaphore;
public Foo(int numConcurrentCalls)
{
semaphore = new Semaphore(numConcurrentCalls, numConcurrentCalls);
}
public bool isReady()
{
return semaphore.WaitOne(0);
}
public void Bar()
{
try
{
semaphore.WaitOne();//it will only get past this line if there are less than
//"numConcurrentCalls" threads in this method currently.
//do stuff
}
finally
{
semaphore.Release();
}
}
}
Review the Object Pool pattern. This is what you're describing. While not strictly required by the pattern, you can expose the number of objects currently in the pool, the maximum (configured) number, the high-watermark, etc.
I think that you might want a BlockingCollection, where each item in the collection represents one of the concurrent calls.
Also see IProducerConsumerCollection.
If you were just using threads, I'd suggest you look at the methods for limiting thread concurrency (e.g. the TaskScheduler.MaximumConcurrencyLevel property, and this example.).
Also see ParallelEnumerable.WithDegreeOfParallelism
void ThreadTest()
{
ConcurrentQueue<int> q = new ConcurrentQueue<int>();
int MaxCount = 5;
Random r = new Random();
for (int i = 0; i <= 10000; i++)
{
q.Enqueue(r.Next(100000, 200000));
}
ThreadStart proc = null;
proc = () =>
{
int read = 0;
if (q.TryDequeue(out read))
{
Console.WriteLine(String.Format("[{1:HH:mm:ss}.{1:fff}] starting: {0}... #Thread {2}", read, DateTime.Now, Thread.CurrentThread.ManagedThreadId));
Thread.Sleep(r.Next(100, 1000));
Console.WriteLine(String.Format("[{1:HH:mm:ss}.{1:fff}] {0} ended! #Thread {2}", read, DateTime.Now, Thread.CurrentThread.ManagedThreadId));
proc();
}
};
for (int i = 0; i <= MaxCount; i++)
{
new Thread(proc).Start();
}
}
I've got application which has two main task: encoding, processing video.
These tasks are independant.
Each task I would like run with configurable number of threads.
For this reason for one task I usually use ThreadPool and SetMaxThreads. But now I've got two tasks and would like "two configurable(number of threads) threapool for each task".
Well, ThreadPool is a static class. So how can I implement my strategy(easy configurable number of threads for each task).
Thanks
You will probably want your own thread pool. If you are using .NET 4.0 then it is actually fairly easy to roll your own if you use the BlockingCollection class.
public class CustomThreadPool
{
private BlockingCollection<Action> m_WorkItems = new BlockingCollection<Action>();
public CustomThreadPool(int numberOfThreads)
{
for (int i = 0; i < numberOfThreads; i++)
{
var thread = new Thread(
() =>
{
while (true)
{
Action action = m_WorkItems.Take();
action();
}
});
thread.IsBackground = true;
thread.Start();
}
}
public void QueueUserWorkItem(Action action)
{
m_WorkItems.Add(action);
}
}
That is really all there is to it. You would create a CustomThreadPool for each actual pool you want to control. I posted the minimum amount of code to get a crude thread pool going. Naturally, you might want to tweak and expand this implementation to suit your specific need.
Current implementation: Waits until parallelCount values are collected, uses ThreadPool to process the values, waits until all threads complete, re-collect another set of values and so on...
Code:
private static int parallelCount = 5;
private int taskIndex;
private object[] paramObjects;
// Each ThreadPool thread should access only one item of the array,
// release object when done, to be used by another thread
private object[] reusableObjects = new object[parallelCount];
private void MultiThreadedGenerate(object paramObject)
{
paramObjects[taskIndex] = paramObject;
taskIndex++;
if (taskIndex == parallelCount)
{
MultiThreadedGenerate();
// Reset
taskIndex = 0;
}
}
/*
* Called when 'paramObjects' array gets filled
*/
private void MultiThreadedGenerate()
{
int remainingToGenerate = paramObjects.Count;
resetEvent.Reset();
for (int i = 0; i < paramObjects.Count; i++)
{
ThreadPool.QueueUserWorkItem(delegate(object obj)
{
try
{
int currentIndex = (int) obj;
Generate(currentIndex, paramObjects[currentIndex], reusableObjects[currentIndex]);
}
finally
{
if (Interlocked.Decrement(ref remainingToGenerate) == 0)
{
resetEvent.Set();
}
}
}, i);
}
resetEvent.WaitOne();
}
I've seen significant performance improvements with this approach, however there are a number of issues to consider:
[1] Collecting values in paramObjects and synchronization using resetEvent can be avoided as there is no dependency between the threads (or current set of values with the next set of values). I'm only doing this to manage access to reusableObjects (when a set paramObjects is done processing, I know that all objects in reusableObjects are free, so taskIndex is reset and each new task of the next set of values will have its unique 'reusableObj' to work with).
[2] There is no real connection between the size of reusableObjects and the number of threads the ThreadPool uses. I might initialize reusableObjects to have 10 objects, and say due to some limitations, ThreadPool can run only 3 threads for my MultiThreadedGenerate() method, then I'm wasting memory.
So by getting rid of paramObjects, how can the above code be refined in a way that as soon as one thread completes its job, that thread returns its taskIndex(or the reusableObj) it used and no longer needs so that it becomes available to the next value. Also, the code should create a reUsableObject and add it to some collection only when there is a demand for it. Is using a Queue here a good idea ?
Thank you.
There's really no reason to do your own manual threading and task management any more. You could restructure this to a more loosely-coupled model using Task Parallel Library (and possibly System.Collections.Concurrent for result collation).
Performance could be further improved if you don't need to wait for a full complement of work before handing off each Task for processing.
TPL came along in .Net 4.0 but was back-ported to .Net 3.5. Download here.
What is the most recommended .NET custom threadpool that can have separate instances i.e more than one threadpool per application?
I need an unlimited queue size (building a crawler), and need to run a separate threadpool in parallel for each site I am crawling.
Edit :
I need to mine these sites for information as fast as possible, using a separate threadpool for each site would give me the ability to control the number of threads working on each site at any given time. (no more than 2-3)
Thanks
Roey
I believe Smart Thread Pool can do this. It's ThreadPool class is instantiated so you should be able to create and manage your separate site specific instances as you require.
Ami bar wrote an excellent Smart thread pool that can be instantiated.
take a look here
Ask Jon Skeet: http://www.yoda.arachsys.com/csharp/miscutil/
Parallel extensions for .Net (TPL) should actually work much better if you want a large number of parallel running tasks.
Using BlockingCollection can be used as a queue for the threads.
Here is an implementation of it.
Updated at 2018-04-23:
public class WorkerPool<T> : IDisposable
{
BlockingCollection<T> queue = new BlockingCollection<T>();
List<Task> taskList;
private CancellationTokenSource cancellationToken;
int maxWorkers;
private bool wasShutDown;
int waitingUnits;
public WorkerPool(CancellationTokenSource cancellationToken, int maxWorkers)
{
this.cancellationToken = cancellationToken;
this.maxWorkers = maxWorkers;
this.taskList = new List<Task>();
}
public void enqueue(T value)
{
queue.Add(value);
waitingUnits++;
}
//call to signal that there are no more item
public void CompleteAdding()
{
queue.CompleteAdding();
}
//create workers and put then running
public void startWorkers(Action<T> worker)
{
for (int i = 0; i < maxWorkers; i++)
{
taskList.Add(new Task(() =>
{
string myname = "worker " + Guid.NewGuid().ToString();
try
{
while (!cancellationToken.IsCancellationRequested)
{
var value = queue.Take();
waitingUnits--;
worker(value);
}
}
catch (Exception ex) when (ex is InvalidOperationException) //throw when collection is closed with CompleteAdding method. No pretty way to do this.
{
//do nothing
}
}));
}
foreach (var task in taskList)
{
task.Start();
}
}
//wait for all workers to be finish their jobs
public void await()
{
while (waitingUnits >0 || !queue.IsAddingCompleted)
Thread.Sleep(100);
shutdown();
}
private void shutdown()
{
wasShutDown = true;
Task.WaitAll(taskList.ToArray());
}
//case something bad happen dismiss all pending work
public void Dispose()
{
if (!wasShutDown)
{
queue.CompleteAdding();
shutdown();
}
}
}
Then use like this:
WorkerPool<int> workerPool = new WorkerPool<int>(new CancellationTokenSource(), 5);
workerPool.startWorkers(value =>
{
log.Debug(value);
});
//enqueue all the work
for (int i = 0; i < 100; i++)
{
workerPool.enqueue(i);
}
//Signal no more work
workerPool.CompleteAdding();
//wait all pending work to finish
workerPool.await();
You can have as many polls has you like simply creating new WorkPool objects.
This free nuget library here: CodeFluentRuntimeClient has a CustomThreadPool class that you can reuse. It's very configurable, you can change pool threads priority, number, COM apartment state, even name (for debugging), and also culture.
Another approach is to use a Dataflow Pipeline. I added these later answer because i find Dataflows a much better approach for these kind of problem, the problem of having several thread pools. They provide a more flexible and structured approach and can easily scale vertically.
You can broke your code into one or more blocks, link then with Dataflows and let then the Dataflow engine allocate threads according to CPU and memory availability
I suggest to broke into 3 blocks, one for preparing the query to the site page , one access site page, and the last one to Analise the data.
This way the slow block (get) may have more threads allocated to compensate.
Here how would look like the Dataflow setup:
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
prepareBlock.LinkTo(get, linkOptions);
getBlock.LinkTo(analiseBlock, linkOptions);
Data will flow from prepareBlock to getBlock and then to analiseBlock.
The interfaces between blocks can be any class, just have to bee the same. See the full example on Dataflow Pipeline
Using the Dataflow would be something like this:
while ...{
...
prepareBlock.Post(...); //to send data to the pipeline
}
prepareBlock.Complete(); //when done
analiseBlock.Completion.Wait(cancellationTokenSource.Token); //to wait for all queues to empty or cancel