I am looking for a TaskScheduler that:
Allows me to define a number of dedicated threads (e.g. 8) - a standard LimitedConcurrencyLevelTaskScheduler (which uses thread pool threads) or WorkStealingTaskScheduler does this.
Allows me to create sub-TaskSchedulers that are fully ordered but schedule their tasks on the dedicated threads of the parent scheduler.
At the moment we use TaskScheduler.Default for the general pool (at the mercy of the threadpool growth algorithm etc) and new OrderedTaskScheduler() whenever we want to order tasks. I want to keep this behavior but limit both requirements to my own pool of dedicated threads.
QueuedTaskScheduler seems to get pretty close. I thought the QueuedTaskScheduler.ActivateNewQueue() method, which returns a child TaskScheduler would execute tasks IN ORDER on the pool of workers from the parent but that doesn't seem to be the case. The child TaskSchedulers seem to have the same level of parallelization as the parent.
I don't necessarily want the child taskscheduler tasks to be prioritised over the parent taskscheduler tasks (although it might be a nice feature in the future).
I have seen a related question here: Limited concurrency level task scheduler (with task priority) handling wrapped tasks but my requirements do not need to handle async tasks (all my enqueued tasks are completely synchronous from start to end, with no continuations).
I assume by "fully ordered" you also mean "one at a time".
In that case, I believe there's a built-in solution that should do quite well: ConcurrentExclusiveSchedulerPair.
Your "parent" scheduler would be a concurrent scheduler:
TaskScheduler _parent = new ConcurrentExclusiveSchedulerPair(TaskScheduler.Default, 8)
.ConcurrentScheduler;
And the "child" schedulers would be an exclusive scheduler that uses the concurrent scheduler underneath:
var myScheduler = new ConcurrentExclusiveSchedulerPair(_parent).ExclusiveScheduler;
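As a rough usage sketch (DoUnorderedWork and DoOrderedStep are hypothetical placeholders for your synchronous work items):
// Unordered work: up to 8 of these run concurrently on the shared "parent" pool.
Task.Factory.StartNew(() => DoUnorderedWork(), CancellationToken.None,
    TaskCreationOptions.None, _parent);
// Ordered work: tasks queued to a child exclusive scheduler run one at a time
// (and hence in order), but still on the same underlying pool as the parent.
Task.Factory.StartNew(() => DoOrderedStep(1), CancellationToken.None,
    TaskCreationOptions.None, myScheduler);
Task.Factory.StartNew(() => DoOrderedStep(2), CancellationToken.None,
    TaskCreationOptions.None, myScheduler);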
After carefully considering the other answers, I decided for my uses it was easier to create a custom QueuedTaskScheduler given I don't need to worry about async tasks or IO completion (although that has given me something to think about).
Firstly, when we grab work from the child work pools, we add a semaphore-based lock inside FindNextTask_NeedsLock:
var items = queueForTargetTask._workItems;
if (items.Count > 0
&& queueForTargetTask.TryLock() /* This is added */)
{
targetTask = items.Dequeue();
For the dedicated thread version, inside ThreadBasedDispatchLoop:
// ... and if we found one, run it
if (targetTask != null)
{
queueForTargetTask.ExecuteTask(targetTask);
queueForTargetTask.Release();
}
For the task scheduler version, inside ProcessPrioritizedAndBatchedTasks:
// Now if we finally have a task, run it. If the task
// was associated with one of the round-robin schedulers, we need to use it
// as a thunk to execute its task.
if (targetTask != null)
{
if (queueForTargetTask != null)
{
queueForTargetTask.ExecuteTask(targetTask);
queueForTargetTask.Release();
}
else
{
TryExecuteTask(targetTask);
}
}
Where we create the new child queues:
/// <summary>Creates and activates a new scheduling queue for this scheduler.</summary>
/// <returns>The newly created and activated queue at priority 0 and max concurrency of 1.</returns>
public TaskScheduler ActivateNewQueue() { return ActivateNewQueue(0, 1); }
/// <summary>Creates and activates a new scheduling queue for this scheduler.</summary>
/// <param name="priority">The priority level for the new queue.</param>
/// <returns>The newly created and activated queue at the specified priority.</returns>
public TaskScheduler ActivateNewQueue(int priority, int maxConcurrency)
{
// Create the queue
var createdQueue = new QueuedTaskSchedulerQueue(priority, maxConcurrency, this);
...
}
Finally, inside the nested QueuedTaskSchedulerQueue:
// This is added.
private readonly int _maxConcurrency;
private readonly Semaphore _semaphore;
internal bool TryLock()
{
return _semaphore.WaitOne(0);
}
internal void Release()
{
_semaphore.Release();
_pool.NotifyNewWorkItem();
}
/// <summary>Initializes the queue.</summary>
/// <param name="priority">The priority associated with this queue.</param>
/// <param name="maxConcurrency">Max concurrency for this scheduler.</param>
/// <param name="pool">The scheduler with which this queue is associated.</param>
internal QueuedTaskSchedulerQueue(int priority, int maxConcurrency, QueuedTaskScheduler pool)
{
_priority = priority;
_pool = pool;
_workItems = new Queue<Task>();
// This is added.
_maxConcurrency = maxConcurrency;
_semaphore = new Semaphore(_maxConcurrency, _maxConcurrency);
}
I hope this might be useful for someone trying to do the same as me and interleave unordered tasks with ordered tasks on a single, easy-to-use scheduler (that can use the default thread pool, or any other scheduler).
=== UPDATE ===
Inspired by Stephen Cleary, I ended up using:
private static readonly Lazy<TaskScheduler> Scheduler = new Lazy<TaskScheduler>(
() => new WorkStealingTaskScheduler(16));
public static TaskScheduler Default
{
get
{
return Scheduler.Value;
}
}
public static TaskScheduler CreateNewOrderedTaskScheduler()
{
return new QueuedTaskScheduler(Default, 1);
}
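Usage then looks roughly like this (the MySchedulers wrapper class and the work methods are hypothetical names; the point is that both the unordered and the ordered work stay on the 16 dedicated threads):
// Unordered work goes straight to the dedicated work-stealing pool.
Task.Factory.StartNew(DoUnorderedWork, CancellationToken.None,
    TaskCreationOptions.None, MySchedulers.Default);

// Each ordered stream gets its own child scheduler with a concurrency of 1,
// so its tasks run one at a time, still on the same dedicated threads.
var ordered = MySchedulers.CreateNewOrderedTaskScheduler();
Task.Factory.StartNew(Step1, CancellationToken.None, TaskCreationOptions.None, ordered);
Task.Factory.StartNew(Step2, CancellationToken.None, TaskCreationOptions.None, ordered);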
I understand your tasks have dependencies, which is why you want to (partially) order them. You could do this with ContinueWith chains. You just need to keep track of the latest task in any given chain: when a new one comes in, you set up the next continuation off that task and store the new task, dropping the old one.
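A minimal sketch of that idea (the _latestInChain field and Enqueue method are names I've made up):
private readonly object _chainLock = new object();
private Task _latestInChain = Task.CompletedTask;   // current tail of the chain

public Task Enqueue(Action work)
{
    lock (_chainLock)
    {
        // Chain the new work onto whatever ran last and remember it as the new tail;
        // the reference to the previous tail is simply dropped.
        _latestInChain = _latestInChain.ContinueWith(_ => work(), TaskScheduler.Default);
        return _latestInChain;
    }
}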
Alternative solution: have one SemaphoreSlim per chain and use await sem.WaitAsync() to manually control the DOP very flexibly. Note that async-waiting on a semaphore does not block any thread; it costs only a little memory and uses no OS resources at all, so you can have very many semaphores in use.
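For example, a sketch of the per-chain semaphore approach (names are illustrative), here with a DOP of 1 per chain:
private readonly SemaphoreSlim _chainSemaphore = new SemaphoreSlim(1, 1);

public async Task RunInChainAsync(Func<Task> work)
{
    await _chainSemaphore.WaitAsync();   // waits without blocking a thread
    try
    {
        await work();
    }
    finally
    {
        _chainSemaphore.Release();
    }
}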
I don't think schedulers are the right abstraction. Schedulers are for CPU-based work. The other coordination tools can work with any Task including async IO. Consider preferring ordinary task combinators and coordination primitives.
I'm new to C# asynchronous programming, just a question on the relationship between task and thread pool.
My understanding is:
When we create a Task, this Task is queued in the thread pool and the thread pool will schedule a worker thread to run this Task
And I saw the code below:
public Task InputOutputC() {
return Task.CompletedTask;
}
I don't quite get it. It seems that it returns a Task that has already completed, which would mean a worker thread has already run it. But if the point of a Task is to let a worker thread in the thread pool run it, and it has already finished, what's the point of returning it to the thread pool to be executed again?
the meaning of Task is to let a worker thread in the thread pool to run it,
Running code on the thread pool is one way in which Tasks manifest themselves.
Other ways to create Tasks are to write async methods and to use Task.CompletedTask1 or Task.FromResult<TResult>2.
Just because Task.Run causes code to run on the thread pool does not mean that these other uses of Task must also necessarily involve the thread pool.
For Task.CompletedTask especially, this is "I've already done the work required, but I want to present it to other code as a Task." No additional code runs anywhere.
We can see in the reference source that this property just returns the task:
/// <summary>A task that's already been completed successfully.</summary>
private static Task s_completedTask;
/// <summary>Gets a task that's already been completed successfully.</summary>
/// <remarks>May not always return the same instance.</remarks>
public static Task CompletedTask
{
get
{
var completedTask = s_completedTask;
if (completedTask == null)
s_completedTask = completedTask = new Task(false, (TaskCreationOptions)InternalTaskOptions.DoNotDispose, default(CancellationToken)); // benign initialization ----
return completedTask;
}
}
1As shown in the reference source above though, we often aren't even creating a new Task here, just reusing an existing one. But the team have obviously decided to forgo thread-safe initialisation of the property in favour of documenting that they won't guarantee to always return the same Task.
2These latter two are quite similar, in that they represent "I've already done the work required, now for some reason I need to pass some other code a Task"3.
3Often, and I'm guessing this is the case here, it's because you're implementing an interface or overriding a base class method that is Task-returning, but your code is fast and synchronous, so you have no need to be async.
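For example, a sketch of that last situation (IStorage and InMemoryStorage are made-up names):
public interface IStorage
{
    Task SaveAsync(string item);
}

public class InMemoryStorage : IStorage
{
    private readonly List<string> _items = new List<string>();

    // The interface demands a Task, but the work is fast and synchronous,
    // so we do it inline and hand back the already-completed task.
    // No thread pool work is queued here.
    public Task SaveAsync(string item)
    {
        _items.Add(item);
        return Task.CompletedTask;
    }
}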
Previously, I found some code on Stack Overflow that was really useful:
https://stackoverflow.com/a/15120092/858282
But it has forced me to use many Invokes and new MethodInvokers whenever I need to update the user interface with the result of the background tasks. Basically, I'm creating a WinForms app that needs data from a database, so data loading happens in the background.
What I'm finding easiest at present is to queue tasks that use the retrieved data, as they run after the data retrieval is complete [i.e. queueTask(getData); queueTask(useData)], and sometimes that's OK. But looking at the code I see TaskContinuationOptions, and I was wondering if any of those options allow the 'next queue item' to return to running on the UI thread, or if I could set a callback on a task's completion, so I don't have to use as many Invokes to prevent cross-threading errors.
tl;dr: a Task.ContinueWith that automagically returns to the UI thread, or allows a callback to a method running on the UI thread.
https://msdn.microsoft.com/en-us/library/system.threading.tasks.taskcontinuationoptions(v=vs.110).aspx
Answering my own question, after comments from @PeterBons.
Additional code to add to the answer at https://stackoverflow.com/a/15120092/858282:
/// <summary>
/// as per http://reedcopsey.com/2009/11/17/synchronizing-net-4-tasks-with-the-ui-thread/
/// from UI, store and pass 'TaskScheduler.FromCurrentSynchronizationContext()' into this method to avoid the
/// need for 'Invoke' to avoid cross threading UI exceptions
/// </summary>
public Task<T> QueueTask<T>(Func<T> work, TaskScheduler tScheduler, CancellationToken cancelToken = default(CancellationToken))
{
lock (key)
{
var task = previousTask.ContinueWith(t => work()
, cancelToken == default(CancellationToken) ? CancellationToken.None : cancelToken
, TaskContinuationOptions.None
, tScheduler);
previousTask = task;
return task;
}
}
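Called from the form, it can then be used roughly like this (the queue, field and control names below are just examples):
// Capture the UI scheduler once, while on the UI thread (e.g. in the form's constructor):
_uiScheduler = TaskScheduler.FromCurrentSynchronizationContext();

// Later, queue the background load and then the UI update; the second task is given
// the UI scheduler, so no Invoke/MethodInvoker is needed to touch controls:
queue.QueueTask(() => LoadDataFromDatabase(), TaskScheduler.Default);
queue.QueueTask(() => { resultsGrid.DataSource = _cachedData; return true; }, _uiScheduler);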
I've written a small pipeline using the TPL Dataflow API which receives data from multiple threads and performs handling on them.
Setup 1
When I configure it to use MaxDegreeOfParallelism = Environment.ProcessorCount (comes to 8 in my case) for each block, I notice it fills up buffers in multiple threads and processing the second block doesn't start until +- 1700 elements have been received across all threads. You can see this in action here.
Setup 2
When I set MaxDegreeOfParallelism = 1, I notice all elements are received on a single thread and processing by the second block already starts after +- 40 elements have been received. Data here.
Setup 3
When I set MaxDegreeOfParallelism = 1 and I introduce a delay of 1000ms before sending each input, I notice elements get sent as soon as they are received and every received element is put on a separate thread. Data here.
So far the setup. My questions are the following:
When I compare setups 1 & 2 I notice that processing elements starts much faster when done in serial compared to parallel (even after accounting for the fact that parallel has 8x as many threads). What causes this difference?
Since this will be run in an ASP.NET environment, I don't want to spawn unnecessary threads since they all come from a single threadpool. As shown in setup 3 it will still spread itself over multiple threads even when there is only a handful of data. This is also surprising because from setup 1 I would assume that data is spread sequentially over threads (notice how the first 50 elements all go to thread 16). Can I make sure it only creates new threads on an on-demand basis?
There is another concept called the BufferBlock<T>. If the TransformBlock<T> already queues input, what would be the practical difference of swapping the first step in my pipeline (ReceiveElement) for a BufferBlock?
class Program
{
static void Main(string[] args)
{
var dataflowProcessor = new DataflowProcessor<string>();
var amountOfTasks = 5;
var tasks = new Task[amountOfTasks];
for (var i = 0; i < amountOfTasks; i++)
{
tasks[i] = SpawnThread(dataflowProcessor, $"Task {i + 1}");
}
foreach (var task in tasks)
{
task.Start();
}
Task.WaitAll(tasks);
Console.WriteLine("Finished feeding threads"); // Needs to use async main
Console.Read();
}
private static Task SpawnThread(DataflowProcessor<string> dataflowProcessor, string taskName)
{
return new Task(async () =>
{
await FeedData(dataflowProcessor, taskName);
});
}
private static async Task FeedData(DataflowProcessor<string> dataflowProcessor, string threadName)
{
foreach (var i in Enumerable.Range(0, short.MaxValue))
{
await Task.Delay(1000); // Only used for the delayedSerialProcessing test
dataflowProcessor.Process($"Thread name: {threadName}\t Thread ID:{Thread.CurrentThread.ManagedThreadId}\t Value:{i}");
}
}
}
public class DataflowProcessor<T>
{
private static readonly ExecutionDataflowBlockOptions ExecutionOptions = new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = Environment.ProcessorCount
};
private static readonly TransformBlock<T, T> ReceiveElement = new TransformBlock<T, T>(element =>
{
Console.WriteLine($"Processing received element in thread {Thread.CurrentThread.ManagedThreadId}");
return element;
}, ExecutionOptions);
private static readonly ActionBlock<T> SendElement = new ActionBlock<T>(element =>
{
Console.WriteLine($"Processing sent element in thread {Thread.CurrentThread.ManagedThreadId}");
Console.WriteLine(element);
}, ExecutionOptions);
static DataflowProcessor()
{
ReceiveElement.LinkTo(SendElement);
ReceiveElement.Completion.ContinueWith(x =>
{
if (x.IsFaulted)
{
((IDataflowBlock) ReceiveElement).Fault(x.Exception);
}
else
{
ReceiveElement.Complete();
}
});
}
public void Process(T newElement)
{
ReceiveElement.Post(newElement);
}
}
Before you deploy your solution to the ASP.NET environment, I suggest you change your architecture: IIS can suspend threads in ASP.NET for its own use after the request is handled, so your task could be left unfinished. A better approach is to create a separate Windows service daemon, which handles your dataflow.
Now back to the TPL Dataflow.
I love the TPL Dataflow library, but its documentation is a real mess.
The only useful document I've found is Introduction to TPL Dataflow.
There are some clues in it which can be helpful, especially the ones about configuration settings (I suggest you investigate implementing your own TaskScheduler using your own ThreadPool implementation, and the MaxMessagesPerTask option) if you need them:
The built-in dataflow blocks are configurable, with a wealth of control provided over how and where blocks perform their work. Here are some key knobs available to the developer, all of which are exposed through the DataflowBlockOptions class and its derived types (ExecutionDataflowBlockOptions and GroupingDataflowBlockOptions), instances of which may be provided to blocks at construction time.
TaskScheduler customization, as #i3arnon mentioned:
By default, dataflow blocks schedule work to TaskScheduler.Default, which targets the internal workings of the .NET ThreadPool.
MaxDegreeOfParallelism
It defaults to 1, meaning only one thing may happen in a block at a time. If set to a value higher than 1, that number of messages may be processed concurrently by the block. If set to DataflowBlockOptions.Unbounded (-1), any number of messages may be processed concurrently, with the maximum automatically managed by the underlying scheduler targeted by the dataflow block. Note that MaxDegreeOfParallelism is a maximum, not a requirement.
MaxMessagesPerTask
TPL Dataflow is focused on both efficiency and control. Where there are necessary trade-offs between the two, the system strives to provide a quality default but also enable the developer to customize behavior according to a particular situation. One such example is the trade-off between performance and fairness. By default, dataflow blocks try to minimize the number of task objects that are necessary to process all of their data. This provides for very efficient execution; as long as a block has data available to be processed, that block’s tasks will remain to process the available data, only retiring when no more data is available (until data is available again, at which point more tasks will be spun up). However, this can lead to problems of fairness. If the system is currently saturated processing data from a given set of blocks, and then data arrives at other blocks, those latter blocks will either need to wait for the first blocks to finish processing before they’re able to begin, or alternatively risk oversubscribing the system. This may or may not be the correct behavior for a given situation. To address this, the MaxMessagesPerTask option exists.
It defaults to DataflowBlockOptions.Unbounded (-1), meaning that there is no maximum. However, if set to a positive number, that number will represent the maximum number of messages a given block may use a single task to process. Once that limit is reached, the block must retire the task and replace it with a replica to continue processing. These replicas are treated fairly with regards to all other tasks scheduled to the scheduler, allowing blocks to achieve a modicum of fairness between them. In the extreme, if MaxMessagesPerTask is set to 1, a single task will be used per message, achieving ultimate fairness at the potential expense of more tasks than may otherwise have been necessary.
MaxNumberOfGroups
The grouping blocks are capable of tracking how many groups they’ve produced, and automatically complete themselves (declining further offered messages) after that number of groups has been generated. By default, the number of groups is DataflowBlockOptions.Unbounded (-1), but it may be explicitly set to a value greater than one.
CancellationToken
This token is monitored during the dataflow block’s lifetime. If a cancellation request arrives prior to the block’s completion, the block will cease operation as politely and quickly as possible.
Greedy
By default, target blocks are greedy and want all data offered to them.
BoundedCapacity
This is the limit on the number of items the block may be storing and have in flight at any one time.
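Putting a few of those knobs together, a configuration sketch might look like this (the numbers and the cancellationSource variable are arbitrary examples, not recommendations):
var options = new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = 4,            // at most 4 messages processed concurrently
    MaxMessagesPerTask = 10,               // retire each pumping task after 10 messages, for fairness
    BoundedCapacity = 100,                 // at most 100 items buffered / in flight in the block
    CancellationToken = cancellationSource.Token,
    TaskScheduler = TaskScheduler.Default  // or a custom scheduler of your own
};

var receiveElement = new TransformBlock<string, string>(s => s, options);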
I'm playing around with a simple console app that creates one thread and I do some inter thread communication between the main and the worker thread.
I'm posting objects from the main thread to a concurrent queue and the worker thread is dequeueing that and does some processing.
What strikes me as odd is that when I profile this app, even though I have two cores, one core is 100% free and the other core has done all the work, and I see that both threads have been running on that core.
Why is this?
Is it because I use a wait handle that sets when I post a message and releases when the processing is done?
This is my sample code, now using 2 worker threads.
It still behaves the same: main, worker1 and worker2 are running on the same core.
Ideas?
[EDIT]
It sort of works now; at least, I get twice the performance compared to yesterday.
The trick was to slow down the consumer just enough to avoid signaling using the AutoResetEvent.
public class SingleThreadDispatcher
{
public long Count;
private readonly ConcurrentQueue<Action> _queue = new ConcurrentQueue<Action>();
private volatile bool _hasMoreTasks;
private volatile bool _running = true;
private int _status;
private readonly AutoResetEvent _signal = new AutoResetEvent(false);
public SingleThreadDispatcher()
{
var thread = new Thread(Run)
{
IsBackground = true,
Name = "worker" + Guid.NewGuid(),
};
thread.Start();
}
private void Run()
{
while (_running)
{
_signal.WaitOne();
do
{
_hasMoreTasks = false;
Action task;
while (_queue.TryDequeue(out task) && _running)
{
Count ++;
task();
}
//wait a short while to let _hasMoreTasks to maybe be set to true
//this avoids the roundtrip to the AutoResetEvent
//that is, if there is intense pressure on the pool, we let some new
//tasks have the chance to arrive and be processed w/o signaling
if(!_hasMoreTasks)
Thread.Sleep(5);
Interlocked.Exchange(ref _status, 0);
} while (_hasMoreTasks);
}
}
public void Schedule(Action task)
{
_hasMoreTasks = true;
_queue.Enqueue(task);
SetSignal();
}
private void SetSignal()
{
if (Interlocked.Exchange(ref _status, 1) == 0)
{
_signal.Set();
}
}
}
Is it because I use a wait handle that sets when I post a message and releases when the processing is done?
Without seeing your code it is hard to say for sure, but from your description it appears that the two threads that you wrote act as co-routines: when the main thread is running, the worker thread has nothing to do, and vice versa. It looks like the .NET scheduler is smart enough not to load the second core when this happens.
You can change this behavior in several ways - for example
by doing some work on the main thread before waiting on the handle, or
by adding more worker threads that would compete for the tasks that your main thread posts, and could both get a task to work on.
OK, I've figured out what the problem is.
The producer and consumer are pretty much equally fast in this case.
This results in the consumer finishing all its work quickly and then looping back to wait for the AutoResetEvent.
The next time the producer sends a task, it has to touch the AutoResetEvent and set it.
The solution was to add a very very small delay in the consumer, making it slightly slower than the producer.
This results in when the producer sends a task, it notices that the consumer is already active and it just has to post to the worker queue w/o touching the AutoResetEvent.
The original behavior resulted in a sort of ping-pong effect, that can be seen on the screenshot.
Dasblinkelight (probably) has the right answer.
Apart from that, it would also be the correct behaviour when one of your threads is I/O bound (that is, it's not stuck on the CPU) - in that case, you've got nothing to gain from using multiple cores, and .NET is smart enough to just change contexts on one core.
This is often the case for UI threads - it has very little work to do, so there usually isn't much of a reason for it to occupy a whole core for itself. And yes, if your concurrent queue is not used properly, it could simply mean that the main thread waits for the worker thread - again, in that case, there's no need to switch cores, since the original thread is waiting anyway.
You should use BlockingCollection rather than ConcurrentQueue. By default, BlockingCollection uses a ConcurrentQueue under the hood, but it has a much easier to use interface. In particular, it does non-busy waits. In addition, BlockingCollection supports cancellation, so your consumer becomes very simple. Here's an example:
public class SingleThreadDispatcher
{
public long Count;
private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();
private readonly CancellationTokenSource _cancellation = new CancellationTokenSource();
public SingleThreadDispatcher()
{
var thread = new Thread(Run)
{
IsBackground = true,
Name = "worker" + Guid.NewGuid(),
};
thread.Start();
}
private void Run()
{
foreach (var task in _queue.GetConsumingEnumerable(_cancellation.Token))
{
Count++;
task();
}
}
public void Schedule(Action task)
{
_queue.Add(task);
}
}
The loop with GetConsumingEnumerable will do a non-busy wait on the queue. There's no need to do it with a separate event. It will wait for an item to be added to the queue, or it will exit if you set the cancellation token.
To stop it normally, you just call _queue.CompleteAdding(). That tells the consumer that no more items will be added to the queue. The consumer will empty the queue and then exit.
If you want to quit early, then just call _cancellation.Cancel(). That will cause GetConsumingEnumerable to exit.
In general, you shouldn't ever have to use ConcurrentQueue directly. BlockingCollection is easier to use and provides equivalent performance.
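If you want the dispatcher itself to expose shutdown, a possible addition to the class above (the method names are my own) would be:
// Graceful stop: no more work is accepted; the consumer drains the queue and then exits.
public void CompleteAdding()
{
    _queue.CompleteAdding();
}

// Early stop: GetConsumingEnumerable observes the token and the loop ends.
// (Wrap the foreach in Run with a try/catch for OperationCanceledException
// if you want the worker thread to exit cleanly on cancellation.)
public void Cancel()
{
    _cancellation.Cancel();
}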
Using C# on the .NET 4.0 Framework, I have a Windows Forms main thread (the only one, until now) that waits for filesystem events and then must start some predefined processing on the files provided by those events.
I am planning to do the following:
A1. Immediately create a separate thread when the main process starts;
A2. Have the main thread put the file names to be processed in a Queue (FIFO);
A3. Have the new thread triggered by a timer every n seconds;
A4. Have the new thread read the queue; if there is an item, perform the processing, then have it remove the queue item just processed.
Because I have never programmed threads before (I am basically using Albahari as my compass) but I definitely want to, I have a few questions just to spot possible heavy headaches in advance:
Q1. Could I run into concurrency problems on the Queue if the main process only writes and the new thread only removes queue items? In other words: is synchronization a significant issue in this case?
Q2. I have seen that I could create a new thread from scratch or reuse one of the threads made available by an existing pool. Is it safer / simpler to use threads from the pool in this context?
Q3. Are there any drawbacks to keeping alive the new thread indefinitely and responding only to timer until the main process is closed?
If you are targeting .NET Framework 4, BlockingCollection sounds like it will solve your issues; i.e. creating a new thread-pool thread when "work" items become available on the queue (added to the queue in the event handler when new files arrive) and processing them asynchronously on that thread.
You can use one in a producer/consumer queue:
E.g.:
/// <summary>
/// Producer/consumer queue. Used when a task needs executing, it’s enqueued to ensure order,
/// allowing the caller to get on with other things. The number of consumers can be defined,
/// each running on a thread pool task thread.
/// Adapted from: http://www.albahari.com/threading/part5.aspx#_BlockingCollectionT
/// </summary>
public class ProducerConsumerQueue : IDisposable
{
private BlockingCollection<Action> _taskQ = new BlockingCollection<Action>();
public ProducerConsumerQueue(int workerCount)
{
// Create and start a separate Task for each consumer:
for (int i = 0; i < workerCount; i++)
{
Task.Factory.StartNew(Consume);
}
}
public void Dispose()
{
_taskQ.CompleteAdding();
}
public void EnqueueTask(Action action)
{
_taskQ.Add(action);
}
private void Consume()
{
// This sequence that we’re enumerating will block when no elements
// are available and will end when CompleteAdding is called.
// Note: This removes AND returns items from the collection.
foreach (Action action in _taskQ.GetConsumingEnumerable())
{
// Perform task.
action();
}
    }
}
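Usage from your filesystem event handler could then look roughly like this (watcher and ProcessFile are placeholders for your FileSystemWatcher and processing code):
var queue = new ProducerConsumerQueue(workerCount: 2);

// In the FileSystemWatcher event handler, just enqueue the work; the consumer
// tasks pick it up asynchronously on thread pool threads.
watcher.Created += (sender, e) => queue.EnqueueTask(() => ProcessFile(e.FullPath));

// On shutdown, stop accepting new work; the consumers drain the queue and finish.
queue.Dispose();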
}