I have a producer-consumer application in WPF. After I click a button, the following handler runs:
private async void Start_Click(object sender, RoutedEventArgs e)
{
    try
    {
        // set up data
        var producer = Producer();
        var consumer = Consumer();
        await Task.WhenAll(producer, consumer);
        // need to log the results in the Summary method
        Summary();
    }
    catch (Exception ex)
    {
        // log or surface the error
    }
}
The Summary method returns void; I assume that is appropriate:
private void Summary() { }
async Task Producer() { await something; }
async Task Consumer() { await something; }
EDIT:
My question: in the Summary() method I have to use the values calculated by the tasks; however, the Consumer task is a long-running process. The program runs Summary quickly, before the updated values are available, so it uses the initial values.
My thought:
await Task.WhenAll(producer, consumer);
Summary();
EDIT2: 11:08 AM 11/05/2014
private void Summary()
{
    myFail = 100 - mySuccess;
    _dataContext.MyFail = myFail; // update window upon property changed
}

async Task Consumer()
{
    try
    {
        Dictionary<string, string> dict = new Dictionary<string, string>();
        var executionDataflowBlockOptions = new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = 5,
            CancellationToken = cToken
        };
        var c = new ActionBlock<T>(
            t =>
            {
                if (cToken.IsCancellationRequested)
                    return;
                dict = Do(t, cToken);
                if (dict["Success"] == "Success")
                    mySuccess++;
                // ...
            },
            executionDataflowBlockOptions);
        // ...
    }
    catch
    {
        // ...
    }
}
The current problem is that mySuccess always has its initial value in the Summary method.
You can use the ContinueWith method to execute Summary after both the producer and the consumer have finished:
Task.WhenAll(producer, consumer)
    .ContinueWith(antecedent => Summary());
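Note that in WPF, if Summary updates UI-bound state (as your _dataContext assignment suggests), you may want the continuation to run on the UI thread. A minimal sketch, assuming this code is set up from the UI thread:

Task.WhenAll(producer, consumer)
    .ContinueWith(antecedent => Summary(),
        CancellationToken.None,
        TaskContinuationOptions.None,
        // run the continuation on the WPF dispatcher thread
        TaskScheduler.FromCurrentSynchronizationContext());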
EDIT 1
It seems that you are misusing the Producer/Consumer pattern.
The producer is supposed to produce the values and shovel them into one end of a communication pipe. On the other end of the pipe, the consumer consumes the values as they become available. In other words, the consumer waits for the producer to produce some value and to put the value in the pipe and for the value to arrive at the end of the pipe.
Usually this involves some sort of signaling mechanism where the producer signals (awakes) the consumer whenever a value has been created.
In your case, you don't have the signaling mechanism, and I strongly suspect that your producer is generating only one value. If the latter is the case, you can just return a value from the "producer".
If, however, your producer is creating more than one value, you can use the BlockingCollection<T> class to send values from the producer to the consumer.
In your Producer class, get a reference to the pipe and put data into it:
public class Producer
{
private BlockingCollection<Data> _pipe;
public void Start()
{
while(!done)
{
var value = ProduceValue();
_pipe.Add(value);
}
// Signal the consumer that we're finished
_pipe.CompleteAdding();
}
}
In the Consumer class wait for the values to arrive and process each one:
public class Consumer
{
private BlockingCollection<Data> _pipe;
public void Start()
{
foreach(var value in _pipe.GetConsumingEnumerable())
{
// GetConsumingEnumerable will block until a value arrives and
// will exit when producer calls CompleteAdding()
Process(value);
}
}
}
Having the above in place, you can use ContinueWith or await the WhenAll task to run the Summary, as sketched below.
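For completeness, here is a minimal sketch of how the two classes above could be wired together; the constructors taking the shared pipe are my assumption, since the snippets only show the private _pipe fields:

var pipe = new BlockingCollection<Data>();
var producer = new Producer(pipe);
var consumer = new Consumer(pipe);
// Start() blocks, so run each on a thread pool thread
var producerTask = Task.Run(() => producer.Start());
var consumerTask = Task.Run(() => consumer.Start());
await Task.WhenAll(producerTask, consumerTask);
Summary();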
EDIT 2
As promised in the comments I have analyzed the code you've posted on MSDN Forum. There are several problems in the code.
First of all, and the simplest one to fix: you're not incrementing the counter in a thread-safe manner. An increment (value++) is not an atomic operation, so you should be careful when incrementing shared fields. An easy way to do this is:
Interlocked.Increment(ref evenNumber);
Now, the actual problems in your code:
As I mentioned earlier, the consumer does not know when the producer has finished producing values. So, after the producer exits the for block, it should signal that it has finished. The consumer waits for the producer's finish signal; otherwise it would wait forever for the next value, but there won't be one.
You are linking the BufferBlock to the consumer code, which starts executing, but you're not waiting for the consumer block to finish - you're only waiting half a second before exiting the consumer method, leaving the worker threads of the consumer block to do their work in vain.
As a consequence of the above, your Report method executes before the processing is finished, outputting the value of the evenNumber counter at the moment the method executes, not after all processing has finished.
Below is the edited code with some comments:
class Program
{
public static BufferBlock<int> m_Queue = new BufferBlock<int>(new DataflowBlockOptions { BoundedCapacity = 1000 });
private static int evenNumber;
static void Main(string[] args)
{
var producer = Producer();
var consumer = Consumer();
Task.WhenAll(producer, consumer).Wait();
Report();
}
static void Report()
{
Console.WriteLine("There are {0} even numbers", evenNumber);
Console.Read();
}
static async Task Producer()
{
for (int i = 0; i < 500; i++)
{
// Send a value to the consumer and wait for the buffer to accept it
await m_Queue.SendAsync(i);
}
// Signal the consumer that there will be no more values
m_Queue.Complete();
}
static async Task Consumer()
{
var executionDataflowBlockOptions = new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 4
};
var consumerBlock = new ActionBlock<int>(x =>
{
int j = DoWork(x);
if (j % 2 == 0)
// Increment the counter in a thread-safe way
Interlocked.Increment(ref evenNumber);
}, executionDataflowBlockOptions);
// Link the buffer to the consumer
using (m_Queue.LinkTo(consumerBlock, new DataflowLinkOptions { PropagateCompletion = true }))
{
// Wait for the consumer to finish.
// This method will exit after all the data from the buffer was processed.
await consumerBlock.Completion;
}
}
static int DoWork(int x)
{
Thread.Sleep(100);
return x;
}
}
Related
I know that asynchronous programming has seen a lot of changes over the years. I'm somewhat embarrassed that I let myself get this rusty at just 34 years old, but I'm counting on StackOverflow to bring me up to speed.
What I am trying to do is manage a queue of "work" on a separate thread, but in such a way that only one item is processed at a time. I want to post work on this thread and it doesn't need to pass anything back to the caller. Of course I could simply spin up a new Thread object and have it loop over a shared Queue object, using sleeps, interrupts, wait handles, etc. But I know things have gotten better since then. We have BlockingCollection, Task, async/await, not to mention NuGet packages that probably abstract a lot of that.
I know that "What's the best..." questions are generally frowned upon so I'll rephrase it by saying "What is the currently recommended..." way to accomplish something like this using built-in .NET mechanisms preferably. But if a third party NuGet package simplifies things a bunch, it's just as well.
I considered a TaskScheduler instance with a fixed maximum concurrency of 1, but it seems there is probably a much less clunky way to do that by now.
Background
Specifically, what I am trying to do in this case is queue an IP geolocation task during a web request. The same IP might wind up getting queued for geolocation multiple times, but the task will know how to detect that and skip out early if it's already been resolved. But the request handler is just going to throw these () => LocateAddress(context.Request.UserHostAddress) calls into a queue and let the LocateAddress method handle duplicate work detection. The geolocation API I am using doesn't like to be bombarded with requests, which is why I want to limit it to a single concurrent task at a time. However, it would be nice if the approach easily scaled to more concurrent tasks with a simple parameter change.
To create an asynchronous, single-degree-of-parallelism queue of work you can simply create a SemaphoreSlim, initialized to one, and then have the enqueuing method await the acquisition of that semaphore before starting the requested work.
public class TaskQueue
{
private SemaphoreSlim semaphore;
public TaskQueue()
{
semaphore = new SemaphoreSlim(1);
}
public async Task<T> Enqueue<T>(Func<Task<T>> taskGenerator)
{
await semaphore.WaitAsync();
try
{
return await taskGenerator();
}
finally
{
semaphore.Release();
}
}
public async Task Enqueue(Func<Task> taskGenerator)
{
await semaphore.WaitAsync();
try
{
await taskGenerator();
}
finally
{
semaphore.Release();
}
}
}
Of course, to have a fixed degree of parallelism other than one, simply initialize the semaphore to some other number.
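For example, usage could look like this (DoWorkAsync is a placeholder for any Func<Task>; note that SemaphoreSlim does not strictly guarantee FIFO order under contention):

var queue = new TaskQueue();
// these run one at a time, never concurrently
var t1 = queue.Enqueue(() => DoWorkAsync(1));
var t2 = queue.Enqueue(() => DoWorkAsync(2));
await Task.WhenAll(t1, t2);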
Your best option as I see it is using TPL Dataflow's ActionBlock:
var actionBlock = new ActionBlock<string>(address =>
{
if (!IsDuplicate(address))
{
LocateAddress(address);
}
});
actionBlock.Post(context.Request.UserHostAddress);
TPL Dataflow is a robust, thread-safe, async-ready, and very configurable actor-based framework (available as a NuGet package).
Here's a simple example for a more complicated case. Let's assume you want to:
Enable concurrency (limited to the available cores).
Limit the queue size (so you won't run out of memory).
Have both LocateAddress and the queue insertion be async.
Cancel everything after an hour.
var actionBlock = new ActionBlock<string>(async address =>
{
if (!IsDuplicate(address))
{
await LocateAddressAsync(address);
}
}, new ExecutionDataflowBlockOptions
{
BoundedCapacity = 10000,
MaxDegreeOfParallelism = Environment.ProcessorCount,
CancellationToken = new CancellationTokenSource(TimeSpan.FromHours(1)).Token
});
await actionBlock.SendAsync(context.Request.UserHostAddress);
Actually, you don't need to run tasks on one thread; you need them to run serially (one after another) and FIFO. The TPL doesn't have a class for that, but here is my very lightweight, non-blocking implementation with tests: https://github.com/Gentlee/SerialQueue
Servy's implementation is also benchmarked there; the tests show it is twice as slow as mine, and it doesn't guarantee FIFO.
Example:
private readonly SerialQueue queue = new SerialQueue();
async Task SomeAsyncMethod()
{
var result = await queue.Enqueue(DoSomething);
}
Use BlockingCollection<Action> to create a producer/consumer pattern with one consumer (so only one thing runs at a time, as you want) and one or many producers.
First define a shared queue somewhere:
BlockingCollection<Action> queue = new BlockingCollection<Action>();
In your consumer Thread or Task you take from it:
// This will block until there's an item available
Action itemToRun = queue.Take();
itemToRun();
Then from any number of producers on other threads, simply add to the queue:
queue.Add(() => LocateAddress(context.Request.UserHostAddress));
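Putting the pieces together, a minimal end-to-end sketch might look like this (the shutdown handling via CompleteAdding is my addition):

var queue = new BlockingCollection<Action>();
// single consumer: actions run one at a time, in FIFO order
var consumer = Task.Run(() =>
{
    // GetConsumingEnumerable blocks until items arrive and
    // ends after CompleteAdding() is called
    foreach (var action in queue.GetConsumingEnumerable())
    {
        action();
    }
});
// producers, from any thread:
queue.Add(() => LocateAddress(context.Request.UserHostAddress));
// on shutdown:
queue.CompleteAdding();
await consumer;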
I'm posting a different solution here. To be honest I'm not sure whether this is a good solution.
I'm used to using BlockingCollection to implement a producer/consumer pattern, with a dedicated thread consuming the items. That's fine if there is always data coming in and the consumer thread won't sit there doing nothing.
I encountered a scenario where one of the applications wanted to send emails on a different thread, but the total number of emails was not that big.
My initial solution was to have a dedicated consumer thread (created by Task.Run()), but a lot of the time it just sits there and does nothing.
Old solution:
private readonly BlockingCollection<EmailData> _Emails =
new BlockingCollection<EmailData>(new ConcurrentQueue<EmailData>());
// producer can add data here
public void Add(EmailData emailData)
{
_Emails.Add(emailData);
}
public void Run()
{
// create a consumer thread
Task.Run(() =>
{
foreach (var emailData in _Emails.GetConsumingEnumerable())
{
SendEmail(emailData);
}
});
}
// sending email implementation
private void SendEmail(EmailData emailData)
{
throw new NotImplementedException();
}
As you can see, if there are not enough emails to be sent (and that is my case), the consumer thread will spend most of its time sitting there doing nothing.
I changed my implementation to:
// create an empty task
private Task _SendEmailTask = Task.Run(() => {});
// caller will dispatch the email to here
// continuewith will use a thread pool thread (different to
// _SendEmailTask thread) to send this email
private void Add(EmailData emailData)
{
_SendEmailTask = _SendEmailTask.ContinueWith((t) =>
{
SendEmail(emailData);
});
}
// actual implementation
private void SendEmail(EmailData emailData)
{
throw new NotImplementedException();
}
It's no longer a producer/consumer pattern, but there is no thread sitting idle; instead, every time there is an email to send, a thread pool thread is used to do it.
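One caveat worth adding (my note, not part of the original post): if Add can be called from multiple threads at once, the read-modify-write of _SendEmailTask is a race. A small lock makes the chaining atomic:

private readonly object _lock = new object();
private Task _SendEmailTask = Task.CompletedTask;

private void Add(EmailData emailData)
{
    lock (_lock)
    {
        // append to the chain atomically so two concurrent callers
        // cannot both continue from the same antecedent task
        _SendEmailTask = _SendEmailTask.ContinueWith(t => SendEmail(emailData));
    }
}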
My lib can:
Run queued items in random order
Use multiple queues
Run prioritized items first
Re-queue items
Raise an event when all queues have completed
Cancel a running item, or cancel one waiting to run
Dispatch events to the UI thread
public interface IQueue
{
bool IsPrioritize { get; }
bool ReQueue { get; }
/// <summary>
/// Don't use async
/// </summary>
/// <returns></returns>
Task DoWork();
bool CheckEquals(IQueue queue);
void Cancel();
}
public delegate void QueueComplete<T>(T queue) where T : IQueue;
public delegate void RunComplete();
public class TaskQueue<T> where T : IQueue
{
readonly List<T> Queues = new List<T>();
readonly List<T> Runnings = new List<T>();
[Browsable(false), DefaultValue((string)null)]
public Dispatcher Dispatcher { get; set; }
public event RunComplete OnRunComplete;
public event QueueComplete<T> OnQueueComplete;
int _MaxRun = 1;
public int MaxRun
{
get { return _MaxRun; }
set
{
bool flag = value > _MaxRun;
_MaxRun = value;
if (flag && Queues.Count != 0) RunNewQueue();
}
}
public int RunningCount
{
get { return Runnings.Count; }
}
public int QueueCount
{
get { return Queues.Count; }
}
public bool RunRandom { get; set; } = false;
// callers must lock Queues before calling this
void StartQueue(T queue)
{
if (null != queue)
{
Queues.Remove(queue);
lock (Runnings) Runnings.Add(queue);
queue.DoWork().ContinueWith(ContinueTaskResult, queue);
}
}
void RunNewQueue()
{
lock (Queues)//Prioritize
{
foreach (var q in Queues.Where(x => x.IsPrioritize)) StartQueue(q);
}
if (Runnings.Count >= MaxRun) return;//other
else if (Queues.Count == 0)
{
if (Runnings.Count == 0 && OnRunComplete != null)
{
if (Dispatcher != null && !Dispatcher.CheckAccess()) Dispatcher.Invoke(OnRunComplete);
else OnRunComplete.Invoke();//on completed
}
else return;
}
else
{
lock (Queues)
{
T queue;
if (RunRandom) queue = Queues.OrderBy(x => Guid.NewGuid()).FirstOrDefault();
else queue = Queues.FirstOrDefault();
StartQueue(queue);
}
if (Queues.Count > 0 && Runnings.Count < MaxRun) RunNewQueue();
}
}
void ContinueTaskResult(Task Result, object queue_obj) => QueueCompleted((T)queue_obj);
void QueueCompleted(T queue)
{
lock (Runnings) Runnings.Remove(queue);
if (queue.ReQueue) lock (Queues) Queues.Add(queue);
if (OnQueueComplete != null)
{
if (Dispatcher != null && !Dispatcher.CheckAccess()) Dispatcher.Invoke(OnQueueComplete, queue);
else OnQueueComplete.Invoke(queue);
}
RunNewQueue();
}
public void Add(T queue)
{
if (null == queue) throw new ArgumentNullException(nameof(queue));
lock (Queues) Queues.Add(queue);
RunNewQueue();
}
public void Cancel(T queue)
{
if (null == queue) throw new ArgumentNullException(nameof(queue));
lock (Queues) Queues.RemoveAll(o => o.CheckEquals(queue));
lock (Runnings) Runnings.ForEach(o => { if (o.CheckEquals(queue)) o.Cancel(); });
}
public void Reset(T queue)
{
if (null == queue) throw new ArgumentNullException(nameof(queue));
Cancel(queue);
Add(queue);
}
public void ShutDown()
{
MaxRun = 0;
lock (Queues) Queues.Clear();
lock (Runnings) Runnings.ForEach(o => o.Cancel());
}
}
I know this thread is old, but it seems all the existing solutions are extremely onerous. The simplest way I could find uses the LINQ Aggregate function to create a daisy-chained list of tasks.
var arr = new int[] { 1, 2, 3, 4, 5};
var queue = arr.Aggregate(Task.CompletedTask,
(prev, item) => prev.ContinueWith(antecedent => PerformWorkHere(item)));
The idea is to get your data into an IEnumerable (I'm using an int array), and then reduce that enumerable to a chain of tasks, starting with a default, completed task.
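To know when the whole chain has finished, you can await the final task; note that with the default options a ContinueWith continuation runs even if the previous item faulted, so one failed item does not stop the chain:

await queue; // completes when the last item has been processed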
I need to have a class that will execute actions in thread pool, but these actions should be queued. For example:
method 1
method 2
method 3
When someone calls method 1 from their thread, they can also call method 2 or method 3, and all three methods can run concurrently; but when another call comes in for method 1, 2, or 3, the thread pool should block that call until the earlier execution has completed.
Should I use channels?
To "Should I use channels?", the answer is yes, but there are other options available too.
Dataflow
.NET already offers this feature through the TPL Dataflow classes. You can use an ActionBlock class to pass messages (i.e., data) to a worker method that executes in the background with guaranteed order and a configurable degree of parallelism. Channels are a newer feature that does essentially the same job.
What you describe is actually the simplest way of using an ActionBlock - just post data messages to it and have it process them one by one:
void Method1(MyDataObject1 data){...}

var block = new ActionBlock<MyDataObject1>(Method1);
// Start sending data to the block
foreach(var msg in someListOfItems)
{
    block.Post(msg);
}
By default, an ActionBlock has an infinite input queue. It will use only one task to process messages asynchronously, in the order they are posted.
When you're done with it, you can tell it to Complete() and await asynchronously for all remaining items to finish processing:
block.Complete();
await block.Completion;
To handle different methods, you can simply use multiple blocks, e.g.:
var block1=new ActionBlock<MyDataObject1>(Method1);
var block2=new ActionBlock<MyDataObject1>(Method2);
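That maps directly onto your scenario: each method gets its own serialized queue, while different blocks run concurrently with each other. A brief sketch (the data variables are placeholders):

block1.Post(dataForMethod1);
block1.Post(moreDataForMethod1); // queued behind the previous item
block2.Post(dataForMethod2);     // runs concurrently with block1
// when shutting down:
block1.Complete();
block2.Complete();
await Task.WhenAll(block1.Completion, block2.Completion);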
Channels
Channels are a lower-level feature than blocks. This means you have to write more code, but you get far better control over how the "processing blocks" work. In fact, you could probably rewrite the TPL Dataflow library using channels.
You could create a processing block similar to an ActionBlock with the following (a bit naive) method:
ChannelWriter<TIn> Work<TIn>(Action<TIn> action)
{
    var channel = Channel.CreateUnbounded<TIn>();
    var workerTask = Task.Run(async () =>
    {
        await foreach(var msg in channel.Reader.ReadAllAsync())
        {
            action(msg);
        }
    });
    var writer = channel.Writer;
    return writer;
}
This method creates a channel and runs a task in the background to read the data asynchronously and process it. I'm cheating "a bit" here by using await foreach and ChannelReader.ReadAllAsync(), which are available in C# 8 and .NET Core 3.0.
This method can be used like a block:
ChannelWriter<DataObject1> writer1 = Work<DataObject1>(Method1);
foreach(var msg in someListOfItems)
{
    await writer1.WriteAsync(msg);
}
writer1.Complete();
There's a lot more to Channels though. SignalR for example uses them to allow streaming of notifications to the clients.
Here is my suggestion: for each synchronous method, an asynchronous counterpart should be added. For example, the method FireTheGun is synchronous:
private static void FireTheGun(int bulletsCount)
{
var ratata = Enumerable.Repeat("Ta", bulletsCount).Prepend("Ra");
Console.WriteLine(String.Join("-", ratata));
}
The asynchronous counterpart FireTheGunAsync is very simple, because the complexity of queuing the synchronous action is delegated to a helper method QueueAsync.
public static Task FireTheGunAsync(int bulletsCount)
{
return QueueAsync(FireTheGun, bulletsCount);
}
Here is the implementation of QueueAsync. Each action has its dedicated SemaphoreSlim, to prevent multiple concurrent executions:
private static ConcurrentDictionary<MethodInfo, SemaphoreSlim> semaphores =
new ConcurrentDictionary<MethodInfo, SemaphoreSlim>();
public static Task QueueAsync<T1>(Action<T1> action, T1 param1)
{
return Task.Run(async () =>
{
var semaphore = semaphores
.GetOrAdd(action.Method, key => new SemaphoreSlim(1));
await semaphore.WaitAsync();
try
{
action(param1);
}
finally
{
semaphore.Release();
}
});
}
Usage example:
FireTheGunAsync(5);
FireTheGunAsync(8);
Output:
Ra-Ta-Ta-Ta-Ta-Ta
Ra-Ta-Ta-Ta-Ta-Ta-Ta-Ta-Ta
Implementing versions of QueueAsync with a different number of parameters should be trivial.
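For example, a two-parameter overload only differs in its signature (a sketch following the same pattern as above):

public static Task QueueAsync<T1, T2>(Action<T1, T2> action, T1 param1, T2 param2)
{
    return Task.Run(async () =>
    {
        var semaphore = semaphores
            .GetOrAdd(action.Method, key => new SemaphoreSlim(1));
        await semaphore.WaitAsync();
        try
        {
            action(param1, param2);
        }
        finally
        {
            semaphore.Release();
        }
    });
}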
Update: My previous implementation of QueueAsync has the probably undesirable behavior of executing the actions in random order. This happens because the second task may be the first one to acquire the semaphore. Below is an implementation that guarantees the correct order of execution. The performance could be bad in case of high contention, because each task loops until it takes the semaphore in the right order.
private class QueueInfo
{
public SemaphoreSlim Semaphore = new SemaphoreSlim(1);
public int TicketToRide = 0;
public int Current = 0;
}
private static ConcurrentDictionary<MethodInfo, QueueInfo> queues =
new ConcurrentDictionary<MethodInfo, QueueInfo>();
public static Task QueueAsync<T1>(Action<T1> action, T1 param1)
{
var queue = queues.GetOrAdd(action.Method, key => new QueueInfo());
var ticket = Interlocked.Increment(ref queue.TicketToRide);
return Task.Run(async () =>
{
while (true) // Loop until our ticket becomes current
{
await queue.Semaphore.WaitAsync();
try
{
if (Interlocked.CompareExchange(ref queue.Current,
ticket, ticket - 1) == ticket - 1)
{
action(param1);
break;
}
}
finally
{
queue.Semaphore.Release();
}
}
});
}
What about this solution?
public class ConcurrentQueue
{
private Dictionary<byte, PoolFiber> Actionsfiber;
public ConcurrentQueue()
{
Actionsfiber = new Dictionary<byte, PoolFiber>()
{
{ 1, new PoolFiber() },
{ 2, new PoolFiber() },
{ 3, new PoolFiber() },
};
foreach (var fiber in Actionsfiber.Values)
{
fiber.Start();
}
}
public void ExecuteAction(Action Action , byte Code)
{
if (Actionsfiber.ContainsKey(Code))
Actionsfiber[Code].Enqueue(() => { Action.Invoke(); });
else
Console.WriteLine($"invalid byte code");
}
}
public static void SomeAction1()
{
Console.WriteLine($"{DateTime.Now} Action 1 is working");
for (long i = 0; i < 2400000000; i++)
{
}
Console.WriteLine($"{DateTime.Now} Action 1 stopped");
}
public static void SomeAction2()
{
Console.WriteLine($"{DateTime.Now} Action 2 is working");
for (long i = 0; i < 5000000000; i++)
{
}
Console.WriteLine($"{DateTime.Now} Action 2 stopped");
}
public static void SomeAction3()
{
Console.WriteLine($"{DateTime.Now} Action 3 is working");
for (long i = 0; i < 5000000000; i++)
{
}
Console.WriteLine($"{DateTime.Now} Action 3 stopped");
}
public static void Main(string[] args)
{
ConcurrentQueue concurrentQueue = new ConcurrentQueue();
concurrentQueue.ExecuteAction(SomeAction1, 1);
concurrentQueue.ExecuteAction(SomeAction2, 2);
concurrentQueue.ExecuteAction(SomeAction3, 3);
concurrentQueue.ExecuteAction(SomeAction1, 1);
concurrentQueue.ExecuteAction(SomeAction2, 2);
concurrentQueue.ExecuteAction(SomeAction3, 3);
Console.WriteLine($"press any key to exit the program");
Console.ReadKey();
}
The output:
8/5/2019 7:56:57 AM Action 1 is working
8/5/2019 7:56:57 AM Action 3 is working
8/5/2019 7:56:57 AM Action 2 is working
8/5/2019 7:57:08 AM Action 1 stopped
8/5/2019 7:57:08 AM Action 1 is working
8/5/2019 7:57:15 AM Action 2 stopped
8/5/2019 7:57:15 AM Action 2 is working
8/5/2019 7:57:16 AM Action 3 stopped
8/5/2019 7:57:16 AM Action 3 is working
8/5/2019 7:57:18 AM Action 1 stopped
8/5/2019 7:57:33 AM Action 2 stopped
8/5/2019 7:57:33 AM Action 3 stopped
PoolFiber is a class in the ExitGames.Concurrency.Fibers namespace.
More info:
How To Avoid Race Conditions And Other Multithreading Issues?
I am trying to simulate work between two collections asynchronously and in parallel. I have a ConcurrentQueue of customers and a collection of workers. I need the workers to take a Customer from the queue, perform work on the customer, and once done, take another customer right away.
I decided I'd use an event-based paradigm: the collection of workers performs an action on a customer; the customer holds an event handler that fires when the customer is done; and that handler would, hopefully, fire off the DoWork method once again. That way I can parallelize the workers taking customers from the queue. But I can't figure out how to pass a customer into DoWork in OnCustomerFinished()! The worker obviously shouldn't depend on a queue of customers.
public class Worker
{
public async Task DoWork(ConcurrentQueue<Customer> cust)
{
await Task.Run(() =>
{
if (cust.TryDequeue(out Customer temp))
{
Task.Delay(5000);
temp.IsDone = true;
}
});
}
public void OnCustomerFinished()
{
// This is where I'm stuck
DoWork(~HOW TO PASS THE QUEUE OF CUSTOMER HERE?~);
}
}
// Edit - This is the Customer Class
public class Customer
{
private bool _isDone = false;
public EventHandler<EventArgs> CustomerFinished;
public bool IsDone
{
private get { return _isDone; }
set
{
_isDone = value;
if (_isDone)
{
OnCustomerFinished();
}
}
}
protected virtual void OnCustomerFinished()
{
if (CustomerFinished != null)
{
CustomerFinished(this, EventArgs.Empty);
}
}
}
.NET already has pub/sub and worker mechanisms in the form of Dataflow blocks and, lately, Channels.
Dataflow
Dataflow blocks from the System.Threading.Tasks.Dataflow namespace are the "old" way (2012 and later) of building workers and pipelines of workers. Each block has an input and/or output buffer. Each message posted to the block is processed by one or more tasks in the background. For blocks with outputs, the output of each iteration is stored in the output buffer.
Blocks can be combined into pipelines similar to a CMD or Powershell pipeline, with each block running on its own task(s).
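For illustration, a two-stage pipeline could look like this (ParseCustomer and SaveCustomer are placeholders; TransformBlock and LinkTo are the actual Dataflow APIs):

var parse = new TransformBlock<string, Customer>(line => ParseCustomer(line));
var save = new ActionBlock<Customer>(c => SaveCustomer(c));
// link the blocks; completion flows downstream automatically
parse.LinkTo(save, new DataflowLinkOptions { PropagateCompletion = true });
parse.Post("some,csv,line");
parse.Complete();
await save.Completion;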
In the simplest case an ActionBlock can be used as a worker:
void ProcessCustomer(Customer customer)
{
....
}
var block = new ActionBlock<Customer>(cust => ProcessCustomer(cust));
That's it. There's no need to manually dequeue or poll.
The producer method can start sending customer instances to the block. Each of them will be processed in the background, in the order they were posted:
foreach(var customer in bigCustomerList)
{
block.Post(customer);
}
When done, e.g., when the application terminates, the producer only needs to call Complete() on the block and wait for any remaining entries to complete.
block.Complete();
await block.Completion;
Blocks can work with asynchronous methods too.
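For example, an asynchronous delegate works the same way (ProcessCustomerAsync is a placeholder):

var asyncBlock = new ActionBlock<Customer>(async cust => await ProcessCustomerAsync(cust));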
Channels
Channels are a newer mechanism, built into .NET Core 3 and available as a NuGet package for earlier .NET Framework and .NET Core versions. The producer writes to a channel using a ChannelWriter and the consumer reads from the channel using a ChannelReader. This may seem a bit strange until you realize it allows some powerful patterns.
The producer could be something like this, e.g., a producer that "produces" all customers in a list with a 0.5 sec delay:
ChannelReader<Customer> Producer(IEnumerable<Customer> customers, CancellationToken token = default)
{
    // Create a channel that can buffer an infinite number of entries
    var channel = Channel.CreateUnbounded<Customer>();
    var writer = channel.Writer;
    // Start a background task to produce the data
    _ = Task.Run(async () =>
    {
        foreach(var customer in customers)
        {
            // Exit gracefully in case of cancellation
            if (token.IsCancellationRequested)
            {
                return;
            }
            await writer.WriteAsync(customer, token);
            await Task.Delay(500);
        }
    }, token)
    // Ensure we complete the writer no matter what
    .ContinueWith(t => writer.Complete(t.Exception));
    return channel.Reader;
}
That's a bit more involved but notice that the only thing the function needs to return is the ChannelReader. The cancellation token is useful for terminating the producer early, eg after a timeout or if the application closes.
When the writer completes, all the channel's readers will also complete.
The consumer only needs that ChannelReader to work:
async Task Consumer(ChannelReader<Customer> reader,CancellationToken token=default)
{
while(await reader.WaitToReadAsync(token))
{
while(reader.TryRead(out var customer))
{
//Process the customer
}
}
}
Should the writer complete, WaitToReadAsync will return false and the loop will exit.
In .NET Core 3 the ChannelReader supports IAsyncEnumerable through the ReadAllAsync method, making the code even simpler:
async Task Consumer(ChannelReader<Customer> reader,CancellationToken token=default)
{
await foreach(var customer in reader.ReadAllAsync(token))
{
//Process the customer
}
}
The reader created by the producer can be passed directly to the consumer:
var customers=new []{......}
var reader=Producer(customers);
await Consumer(reader);
Intermediate steps can read from a previous channel reader and publish data to the next, e.g. an order generator:
ChannelReader<Order> CustomerOrders(ChannelReader<Customer> reader, CancellationToken token = default)
{
    var channel = Channel.CreateUnbounded<Order>();
    var writer = channel.Writer;
    // Start a background task to produce the data
    _ = Task.Run(async () =>
    {
        await foreach(var customer in reader.ReadAllAsync(token))
        {
            // Somehow create an order for the customer
            var order = new Order(...);
            await writer.WriteAsync(order, token);
        }
    }, token)
    // Ensure we complete the writer no matter what
    .ContinueWith(t => writer.Complete(t.Exception));
    return channel.Reader;
}
Again, all we need to do is pass the readers from one method to the next:
var customers=new []{......}
var customerReader=Producer(customers);
var orderReader=CustomerOrders(customerReader);
await ConsumeOrders(orderReader);
I have a fairly simple producer-consumer pattern where (simplified) I have two producers who produce output that is to be consumed by one consumer.
For this I use System.Threading.Tasks.Dataflow.BufferBlock<T>.
A BufferBlock object is created. One Consumer is listening to this BufferBlock, and processes any received input.
Two Producers send data to the BufferBlock simultaneously.
Simplified:
BufferBlock<int> bufferBlock = new BufferBlock<int>();
async Task Consume()
{
while(await bufferBlock.OutputAvailableAsync())
{
int dataToProcess = await bufferBlock.ReceiveAsync();
Process(dataToProcess);
}
}
async Task Produce1()
{
IEnumerable<int> numbersToProcess = ...;
foreach (int numberToProcess in numbersToProcess)
{
await bufferBlock.SendAsync(numberToProcess);
// ignore result for this example
}
}
async Task Produce2()
{
IEnumerable<int> numbersToProcess = ...;
foreach (int numberToProcess in numbersToProcess)
{
await bufferBlock.SendAsync(numberToProcess);
// ignore result for this example
}
}
I'd like to start the Consumer first and then start the Producers as separate tasks:
var taskConsumer = Consume(); // do not await yet
var taskProduce1 = Task.Run( () => Produce1());
var taskProduce2 = Task.Run( () => Produce2());
// await until both producers are finished:
await Task.WhenAll(new Task[] {taskProduce1, taskProduce2});
bufferBlock.Complete(); // signal that no more data is expected in bufferBlock
// await for the Consumer to finish:
await taskConsumer;
At first glance, this is exactly how the producer-consumer was meant: several producers produce data while a consumer is consuming the produced data.
Yet, the documentation for BufferBlock says this about thread safety:
Any instance members are not guaranteed to be thread safe.
And I thought that the P in TPL meant Parallel!
Should I worry? Is my code not thread safe?
Is there a different TPL Dataflow class that I should use?
Yes, the BufferBlock class is thread safe. I can't back this claim by pointing to an official document, because the "Thread Safety" section has been removed from the documentation. But I can see in the source that the class contains a lock object for synchronizing the incoming messages:
/// <summary>Gets the lock object used to synchronize incoming requests.</summary>
private object IncomingLock { get { return _source; } }
When the Post extension method is called (source code), the explicitly implemented ITargetBlock.OfferMessage method is invoked (source code). Below is an excerpt of this method:
DataflowMessageStatus ITargetBlock<T>.OfferMessage(DataflowMessageHeader messageHeader,
T messageValue, ISourceBlock<T> source, bool consumeToAccept)
{
//...
lock (IncomingLock)
{
//...
_source.AddMessage(messageValue);
//...
}
}
It would be strange indeed if this class, or any other XxxBlock class included in the TPL Dataflow library, was not thread-safe. It would severely hamper the ease of use of this great library.
I think an ActionBlock<T> would better suit what you're doing, since it has a built-in buffer that many producers can send data through. The default block options process the data on a single background task, but you can set new values for parallelism and bounded capacity. With ActionBlock<T>, the main area of concern for thread safety is the delegate you pass in to process each message. The operation of that function has to be independent of each message, i.e., not modifying shared state, just like any Parallel.* function.
public class ProducerConsumer
{
private ActionBlock<int> Consumer { get; }
public ProducerConsumer()
{
Consumer = new ActionBlock<int>(x => Process(x));
}
public async Task Start()
{
var producer1Tasks = Producer1();
var producer2Tasks = Producer2();
await Task.WhenAll(producer1Tasks.Concat(producer2Tasks));
Consumer.Complete();
await Consumer.Completion;
}
private void Process(int data)
{
// process
}
private IEnumerable<Task> Producer1() => Enumerable.Range(0, 100).Select(x => Consumer.SendAsync(x));
private IEnumerable<Task> Producer2() => Enumerable.Range(0, 100).Select(x => Consumer.SendAsync(x));
}
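Usage is then a single call (a sketch):

var producerConsumer = new ProducerConsumer();
// runs both producers, completes the consumer,
// and waits for every buffered item to be processed
await producerConsumer.Start();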
I have a method called WaitForAction, which takes an Action delegate and executes it in a new Task. The method blocks until the task completes or until a timeout expires. It uses a ManualResetEvent to wait for timeout/completion.
The following code shows an attempt to test the method in a multi-threaded environment.
class Program
{
public static void Main()
{
List<Foo> list = new List<Foo>();
for (int i = 0; i < 10; i++)
{
Foo foo = new Foo();
list.Add(foo);
foo.Bar();
}
SpinWait.SpinUntil(() => list.Count(f => f.finished || f.failed) == 10, 2000);
Debug.WriteLine(list.Count(f => f.finished));
}
}
public class Foo
{
public volatile bool finished = false;
public volatile bool failed = false;
public void Bar()
{
Task.Factory.StartNew(() =>
{
try
{
WaitForAction(1000, () => { });
finished = true;
}
catch
{
failed = true;
}
});
}
private void WaitForAction(int iMsToWait, Action action)
{
using (ManualResetEvent waitHandle = new ManualResetEvent(false))
{
Task.Factory.StartNew(() =>
{
action();
waitHandle.SafeSet();
});
if (waitHandle.SafeWaitOne(iMsToWait) == false)
{
throw new Exception("Timeout");
}
}
}
}
As the Action does nothing, I would expect the 10 tasks started by calling Foo.Bar 10 times to complete well within the timeout. Sometimes this happens, but usually the program takes 2 seconds to execute and reports that only 2 instances of Foo 'finished' without error. In other words, 8 calls to WaitForAction have timed out.
I'm assuming that WaitForAction is thread safe, as each call on a Task-provided thread has its own stack. I have more or less proved this by logging the thread ID and wait handle ID for each call.
I realise that this code presented is a daft example, but I am interested in the principle. Is it possible for the task scheduler to be scheduling a task running the action delegate to the same threadpool thread that is already waiting for another action to complete? Or is there something else going on that I've missed?
Task.Factory uses the ThreadPool by default. With every call to WaitHandle.WaitOne, you block a worker thread. The .NET 4/4.5 thread pool starts with a small number of worker threads depending on your hardware platform (e.g., 4 on my machine), and it re-evaluates the pool size periodically (I believe every second), creating new workers if necessary.
Since your program blocks all worker threads, and the thread pool doesn't grow fast enough, your waithandles timeout as you saw.
To confirm this, you can either 1) increase the timeouts or 2) increase the starting thread pool size by adding the following line at the beginning of your program:
ThreadPool.SetMinThreads(32, 4);
Then you should see that the timeouts don't occur.
I believe your question was more academic than anything else, but you can read about a better implementation of a task timeout mechanism here, e.g.
var task = Task.Run(someAction);
if (task == await Task.WhenAny(task, Task.Delay(millisecondsTimeout)))
await task;
else
throw new TimeoutException();