I know that asynchronous programming has seen a lot of changes over the years. I'm somewhat embarrassed that I let myself get this rusty at just 34 years old, but I'm counting on StackOverflow to bring me up to speed.
What I am trying to do is manage a queue of "work" on a separate thread, but in such a way that only one item is processed at a time. I want to post work on this thread and it doesn't need to pass anything back to the caller. Of course I could simply spin up a new Thread object and have it loop over a shared Queue object, using sleeps, interrupts, wait handles, etc. But I know things have gotten better since then. We have BlockingCollection, Task, async/await, not to mention NuGet packages that probably abstract a lot of that.
I know that "What's the best..." questions are generally frowned upon so I'll rephrase it by saying "What is the currently recommended..." way to accomplish something like this using built-in .NET mechanisms preferably. But if a third party NuGet package simplifies things a bunch, it's just as well.
I considered a TaskScheduler instance with a fixed maximum concurrency of 1, but seems there is probably a much less clunky way to do that by now.
Background
Specifically, what I am trying to do in this case is queue an IP geolocation task during a web request. The same IP might wind up getting queued for geolocation multiple times, but the task will know how to detect that and skip out early if it's already been resolved. But the request handler is just going to throw these () => LocateAddress(context.Request.UserHostAddress) calls into a queue and let the LocateAddress method handle duplicate work detection. The geolocation API I am using doesn't like to be bombarded with requests which is why I want to limit it to a single concurrent task at a time. However, it would be nice if the approach was allowed to easily scale to more concurrent tasks with a simple parameter change.
To create an asynchronous single degree of parallelism queue of work you can simply create a SemaphoreSlim, initialized to one, and then have the enqueing method await on the acquisition of that semaphore before starting the requested work.
public class TaskQueue
{
private SemaphoreSlim semaphore;
public TaskQueue()
{
semaphore = new SemaphoreSlim(1);
}
public async Task<T> Enqueue<T>(Func<Task<T>> taskGenerator)
{
await semaphore.WaitAsync();
try
{
return await taskGenerator();
}
finally
{
semaphore.Release();
}
}
public async Task Enqueue(Func<Task> taskGenerator)
{
await semaphore.WaitAsync();
try
{
await taskGenerator();
}
finally
{
semaphore.Release();
}
}
}
Of course, to have a fixed degree of parallelism other than one simply initialize the semaphore to some other number.
Your best option as I see it is using TPL Dataflow's ActionBlock:
var actionBlock = new ActionBlock<string>(address =>
{
if (!IsDuplicate(address))
{
LocateAddress(address);
}
});
actionBlock.Post(context.Request.UserHostAddress);
TPL Dataflow is robust, thread-safe, async-ready and very configurable actor-based framework (available as a nuget)
Here's a simple example for a more complicated case. Let's assume you want to:
Enable concurrency (limited to the available cores).
Limit the queue size (so you won't run out of memory).
Have both LocateAddress and the queue insertion be async.
Cancel everything after an hour.
var actionBlock = new ActionBlock<string>(async address =>
{
if (!IsDuplicate(address))
{
await LocateAddressAsync(address);
}
}, new ExecutionDataflowBlockOptions
{
BoundedCapacity = 10000,
MaxDegreeOfParallelism = Environment.ProcessorCount,
CancellationToken = new CancellationTokenSource(TimeSpan.FromHours(1)).Token
});
await actionBlock.SendAsync(context.Request.UserHostAddress);
Actually you don't need to run tasks in one thread, you need them to run serially (one after another), and FIFO. TPL doesn't have class for that, but here is my very lightweight, non-blocking implementation with tests. https://github.com/Gentlee/SerialQueue
Also have #Servy implementation there, tests show it is twice slower than mine and it doesn't guarantee FIFO.
Example:
private readonly SerialQueue queue = new SerialQueue();
async Task SomeAsyncMethod()
{
var result = await queue.Enqueue(DoSomething);
}
Use BlockingCollection<Action> to create a producer/consumer pattern with one consumer (only one thing running at a time like you want) and one or many producers.
First define a shared queue somewhere:
BlockingCollection<Action> queue = new BlockingCollection<Action>();
In your consumer Thread or Task you take from it:
//This will block until there's an item available
Action itemToRun = queue.Take()
Then from any number of producers on other threads, simply add to the queue:
queue.Add(() => LocateAddress(context.Request.UserHostAddress));
I'm posting a different solution here. To be honest I'm not sure whether this is a good solution.
I'm used to use BlockingCollection to implement a producer/consumer pattern, with a dedicated thread consuming those items. It's fine if there are always data coming in and consumer thread won't sit there and do nothing.
I encountered a scenario that one of the application would like to send emails on a different thread, but total number of emails is not that big.
My initial solution was to have a dedicated consumer thread (created by Task.Run()), but a lot of time it just sits there and does nothing.
Old solution:
private readonly BlockingCollection<EmailData> _Emails =
new BlockingCollection<EmailData>(new ConcurrentQueue<EmailData>());
// producer can add data here
public void Add(EmailData emailData)
{
_Emails.Add(emailData);
}
public void Run()
{
// create a consumer thread
Task.Run(() =>
{
foreach (var emailData in _Emails.GetConsumingEnumerable())
{
SendEmail(emailData);
}
});
}
// sending email implementation
private void SendEmail(EmailData emailData)
{
throw new NotImplementedException();
}
As you can see, if there are not enough emails to be sent (and it is my case), the consumer thread will spend most of them sitting there and do nothing at all.
I changed my implementation to:
// create an empty task
private Task _SendEmailTask = Task.Run(() => {});
// caller will dispatch the email to here
// continuewith will use a thread pool thread (different to
// _SendEmailTask thread) to send this email
private void Add(EmailData emailData)
{
_SendEmailTask = _SendEmailTask.ContinueWith((t) =>
{
SendEmail(emailData);
});
}
// actual implementation
private void SendEmail(EmailData emailData)
{
throw new NotImplementedException();
}
It's no longer a producer/consumer pattern, but it won't have a thread sitting there and does nothing, instead, every time it is to send an email, it will use thread pool thread to do it.
My lib, It can:
Run random in queue list
Multi queue
Run prioritize first
Re-queue
Event all queue completed
Cancel running or cancel wait for running
Dispatch event to UI thread
public interface IQueue
{
bool IsPrioritize { get; }
bool ReQueue { get; }
/// <summary>
/// Dont use async
/// </summary>
/// <returns></returns>
Task DoWork();
bool CheckEquals(IQueue queue);
void Cancel();
}
public delegate void QueueComplete<T>(T queue) where T : IQueue;
public delegate void RunComplete();
public class TaskQueue<T> where T : IQueue
{
readonly List<T> Queues = new List<T>();
readonly List<T> Runnings = new List<T>();
[Browsable(false), DefaultValue((string)null)]
public Dispatcher Dispatcher { get; set; }
public event RunComplete OnRunComplete;
public event QueueComplete<T> OnQueueComplete;
int _MaxRun = 1;
public int MaxRun
{
get { return _MaxRun; }
set
{
bool flag = value > _MaxRun;
_MaxRun = value;
if (flag && Queues.Count != 0) RunNewQueue();
}
}
public int RunningCount
{
get { return Runnings.Count; }
}
public int QueueCount
{
get { return Queues.Count; }
}
public bool RunRandom { get; set; } = false;
//need lock Queues first
void StartQueue(T queue)
{
if (null != queue)
{
Queues.Remove(queue);
lock (Runnings) Runnings.Add(queue);
queue.DoWork().ContinueWith(ContinueTaskResult, queue);
}
}
void RunNewQueue()
{
lock (Queues)//Prioritize
{
foreach (var q in Queues.Where(x => x.IsPrioritize)) StartQueue(q);
}
if (Runnings.Count >= MaxRun) return;//other
else if (Queues.Count == 0)
{
if (Runnings.Count == 0 && OnRunComplete != null)
{
if (Dispatcher != null && !Dispatcher.CheckAccess()) Dispatcher.Invoke(OnRunComplete);
else OnRunComplete.Invoke();//on completed
}
else return;
}
else
{
lock (Queues)
{
T queue;
if (RunRandom) queue = Queues.OrderBy(x => Guid.NewGuid()).FirstOrDefault();
else queue = Queues.FirstOrDefault();
StartQueue(queue);
}
if (Queues.Count > 0 && Runnings.Count < MaxRun) RunNewQueue();
}
}
void ContinueTaskResult(Task Result, object queue_obj) => QueueCompleted((T)queue_obj);
void QueueCompleted(T queue)
{
lock (Runnings) Runnings.Remove(queue);
if (queue.ReQueue) lock (Queues) Queues.Add(queue);
if (OnQueueComplete != null)
{
if (Dispatcher != null && !Dispatcher.CheckAccess()) Dispatcher.Invoke(OnQueueComplete, queue);
else OnQueueComplete.Invoke(queue);
}
RunNewQueue();
}
public void Add(T queue)
{
if (null == queue) throw new ArgumentNullException(nameof(queue));
lock (Queues) Queues.Add(queue);
RunNewQueue();
}
public void Cancel(T queue)
{
if (null == queue) throw new ArgumentNullException(nameof(queue));
lock (Queues) Queues.RemoveAll(o => o.CheckEquals(queue));
lock (Runnings) Runnings.ForEach(o => { if (o.CheckEquals(queue)) o.Cancel(); });
}
public void Reset(T queue)
{
if (null == queue) throw new ArgumentNullException(nameof(queue));
Cancel(queue);
Add(queue);
}
public void ShutDown()
{
MaxRun = 0;
lock (Queues) Queues.Clear();
lock (Runnings) Runnings.ForEach(o => o.Cancel());
}
}
I know this thread is old, but it seems all the present solutions are extremely onerous. The simplest way I could find uses the Linq Aggregate function to create a daisy-chained list of tasks.
var arr = new int[] { 1, 2, 3, 4, 5};
var queue = arr.Aggregate(Task.CompletedTask,
(prev, item) => prev.ContinueWith(antecedent => PerformWorkHere(item)));
The idea is to get your data into an IEnumerable (I'm using an int array), and then reduce that enumerable to a chain of tasks, starting with a default, completed, task.
Related
I have a windows service which is consuming a messaging system to fetch messages. I have also created a callback mechanism with the help of Timer class which helps me to check the message after some fixed time to fetch and process. Previously, the service is processing the message one by one. But I want after the message arrives the processing mechanism to execute in parallel. So if the first message arrived it should go for processing on one task and even if the processing is not finished for the first message still after the interval time configured using the callback method (callback is working now) next message should be picked and processed on a different task.
Below is my code:
Task.Factory.StartNew(() =>
{
Subsriber<Message> subsriber = new Subsriber<Message>()
{
Interval = 1000
};
subsriber.Callback(Process, m => m != null);
});
public static void Process(Message message)
{
if (message != null)
{
// Processing logic
}
else
{
}
}
But using the Task Factory I am not able to control the number of tasks in parallel so in my case I want to configure the number of tasks on which messages will run on the availability of the tasks?
Update:
Updated my above code to add multiple tasks
Below is the code:
private static void Main()
{
try
{
int taskCount = 5;
Task.Factory.StartNewAsync(() =>
{
Subscriber<Message> consumer = new
Subcriber<Message>()
{
Interval = 1000
};
consumer.CallBack(Process, msg => msg!=
null);
}, taskCount);
Console.ReadLine();
}
catch (Exception e)
{
Console.WriteLine(e.Message);
}
public static void StartNewAsync(this TaskFactory
target, Action action, int taskCount)
{
var tasks = new Task[taskCount];
for (int i = 0; i < taskCount; i++)
{
tasks[i] = target.StartNew(action);
}
}
public static void Process(Message message)
{
if (message != null)
{
}
else
{ }
}
}
I think what your looking for will result in quite a large sample. I'm trying just to demonstrate how you would do this with ActionBlock<T>. There's still a lot of unknowns so I left the sample as skeleton you can build off. In the sample the ActionBlock will handle and process in parallel all your messages as they're received from your messaging system
public class Processor
{
private readonly IMessagingSystem _messagingSystem;
private readonly ActionBlock<Message> _handler;
private bool _pollForMessages;
public Processor(IMessagingSystem messagingSystem)
{
_messagingSystem = messagingSystem;
_handler = new ActionBlock<Message>(msg => Process(msg), new ExecutionDataflowBlockOptions()
{
MaxDegreeOfParallelism = 5 //or any configured value
});
}
public async Task Start()
{
_pollForMessages = true;
while (_pollForMessages)
{
var msg = await _messagingSystem.ReceiveMessageAsync();
await _handler.SendAsync(msg);
}
}
public void Stop()
{
_pollForMessages = false;
}
private void Process(Message message)
{
//handle message
}
}
More Examples
And Ideas
Ok, sorry I'm short on time but here's the general idea/skeleton of what I was thinking as an alternative.
If I'm honest though I think the ActionBlock<T> is the better option as there's just so much done for you, with the only limit being that you can't dynamically scale the amount of work it will do it once, although I think the limit can be quite high. If you get into doing it this way you could have more control or just have a kind of dynamic amount of tasks running but you'll have to do a lot of things manually, e.g if you want to limit the amount of tasks running at a time, you'd have to implement a queueing system (something ActionBlock handles for you) and then maintain it. I guess it depends on how many messages you're receiving and how fast your process handles them.
You'll have to check it out and think of how it could apply to your direct use case as I think some of the details area a little sketchily implemented on my side around the concurrentbag idea.
So the idea behind what I've thrown together here is that you can start any number of tasks, or add to the tasks running or cancel tasks individually by using the collection.
The main thing I think is just making the method that the Callback runs fire off a thread that does the work, instead of subscribing within a separate thread.
I used Task.Factory.StartNew as you did, but stored the returned Task object in an object (TaskInfo) which also had it's CancellationTokenSource, it's Id (assigned externally) as properties, and then added that to a collection of TaskInfo which is a property on the class this is all a part of:
Updated - to avoid this being too confusing i've just updated the code that was here previously.
You'll have to update bits of it and fill in the blanks in places like with whatever you have for my HeartbeatController, and the few events that get called because they're beyond the scope of the question but the idea would be the same.
public class TaskContainer
{
private ConcurrentBag<TaskInfo> Tasks;
public TaskContainer(){
Tasks = new ConcurrentBag<TaskInfo>();
}
//entry point
//UPDATED
public void StartAndMonitor(int processorCount)
{
for (int i = 0; i <= processorCount; i++)
{
Processor task = new Processor(ProcessorId = i);
CreateProcessorTask(task);
}
this.IsRunning = true;
MonitorTasks();
}
private void CreateProcessorTask(Processor processor)
{
CancellationTokenSource cancellationTokenSource = new CancellationTokenSource();
Task taskInstance = Task.Factory.StartNew(
() => processor.Start(cancellationTokenSource.Token)
);
//bind status update event
processor.ProcessorStatusUpdated += ReportProcessorProcess;
Tasks.Add(new ProcessorInfo()
{
ProcessorId = processor.ProcessorId,
Task = taskInstance,
CancellationTokenSource = cancellationTokenSource
});
}
//this method gets called once but the HeartbeatController gets an action as a param that it then
//executes on a timer. I haven't included that but you get the idea
//This method also checks for tasks that have stopped and restarts them if the manifest call says they should be running.
//Will also start any new tasks included in the manifest and stop any that aren't included in the manifest.
internal void MonitorTasks()
{
HeartbeatController.Beat(() =>
{
HeartBeatHappened?.Invoke(this, null);
List<int> tasksToStart = new List<int>();
//this is an api call or whatever drives your config that says what tasks must be running.
var newManifest = this.GetManifest(Properties.Settings.Default.ResourceId);
//task Removed Check - If a Processor is removed from the task pool, cancel it if running and remove it from the Tasks List.
List<int> instanceIds = new List<int>();
newManifest.Processors.ForEach(x => instanceIds.Add(x.ProcessorId));
var removed = Tasks.Select(x => x.ProcessorId).ToList().Except(instanceIds).ToList();
if (removed.Count() > 0)
{
foreach (var extaskId in removed)
{
var task = Tasks.FirstOrDefault(x => x.ProcessorId == extaskId);
task.CancellationTokenSource?.Cancel();
}
}
foreach (var newtask in newManifest.Processors)
{
var oldtask = Tasks.FirstOrDefault(x => x.ProcessorId == newtask.ProcessorId);
//Existing task check
if (oldtask != null && oldtask.Task != null)
{
if (!oldtask.Task.IsCanceled && (oldtask.Task.IsCompleted || oldtask.Task.IsFaulted))
{
var ex = oldtask.Task.Exception;
tasksToStart.Add(oldtask.ProcessorId);
continue;
}
}
else //New task Check
tasksToStart.Add(newtask.ProcessorId);
}
foreach (var item in tasksToStart)
{
var taskToRemove = Tasks.FirstOrDefault(x => x.ProcessorId == item);
if (taskToRemove != null)
Tasks.Remove(taskToRemove);
var task = newManifest.Processors.FirstOrDefault(x => x.ProcessorId == item);
if (task != null)
{
CreateProcessorTask(task);
}
}
});
}
}
//UPDATED
public class Processor{
private int ProcessorId;
private Subsriber<Message> subsriber;
public Processor(int processorId) => ProcessorId = processorId;
public void Start(CancellationToken token)
{
Subsriber<Message> subsriber = new Subsriber<Message>()
{
Interval = 1000
};
subsriber.Callback(Process, m => m != null);
}
private void Process()
{
//do work
}
}
Hope this gives you an idea of how else you can approach your problem and that I didn't miss the point :).
Update
To use events to update progress or which tasks are processing, I'd extract them into their own class, which then has subscribe methods on it, and when creating a new instance of that class, assign the event to a handler in the parent class which can then update your UI or whatever you want it to do with that info.
So the content of Process() would look more like this:
Processor processor = new Processor();
Task task = Task.Factory.StartNew(() => processor.ProcessMessage(cancellationTokenSource.CancellationToken));
processor.StatusUpdated += ReportProcess;
I have a MessagesManager thread to which different threads may send messages and then this MessagesManager thread is responsible to publish these messages inside SendMessageToTcpIP() (start point of MessagesManager thread ).
class MessagesManager : IMessageNotifier
{
//private
private readonly AutoResetEvent _waitTillMessageQueueEmptyARE = new AutoResetEvent(false);
private ConcurrentQueue<string> MessagesQueue = new ConcurrentQueue<string>();
public void PublishMessage(string Message)
{
MessagesQueue.Enqueue(Message);
_waitTillMessageQueueEmptyARE.Set();
}
public void SendMessageToTcpIP()
{
//keep waiting till a new message comes
while (MessagesQueue.Count() == 0)
{
_waitTillMessageQueueEmptyARE.WaitOne();
}
//Copy the Concurrent Queue into a local queue - keep dequeuing the item once it is inserts into the local Queue
Queue<string> localMessagesQueue = new Queue<string>();
while (!MessagesQueue.IsEmpty)
{
string message;
bool isRemoved = MessagesQueue.TryDequeue(out message);
if (isRemoved)
localMessagesQueue.Enqueue(message);
}
//Use the Local Queue for further processing
while (localMessagesQueue.Count() != 0)
{
TcpIpMessageSenderClient.ConnectAndSendMessage(localMessagesQueue.Dequeue().PadRight(80, ' '));
Thread.Sleep(2000);
}
}
}
The different threads (3-4) send their message by calling the PublishMessage(string Message) (using same object to MessageManager). Once the message comes, I push that message into a concurrent queue and notifies the SendMessageToTcpIP() by setting _waitTillMessageQueueEmptyARE.Set();. Inside SendMessageToTcpIP(), I am copying the message from the concurrent queue inside a local queue and then publish one by one.
QUESTIONS: Is it thread safe to do enqueuing and dequeuing in this way? Could there be some strange effects due to it?
While this is probably thread-safe, there are built-in classes in .NET to help with "many publishers one consumer" pattern, like BlockingCollection. You can rewrite your class like this:
class MessagesManager : IDisposable {
// note that your ConcurrentQueue is still in play, passed to constructor
private readonly BlockingCollection<string> MessagesQueue = new BlockingCollection<string>(new ConcurrentQueue<string>());
public MessagesManager() {
// start consumer thread here
new Thread(SendLoop) {
IsBackground = true
}.Start();
}
public void PublishMessage(string Message) {
// no need to notify here, will be done for you
MessagesQueue.Add(Message);
}
private void SendLoop() {
// this blocks until new items are available
foreach (var item in MessagesQueue.GetConsumingEnumerable()) {
// ensure that you handle exceptions here, or whole thing will break on exception
TcpIpMessageSenderClient.ConnectAndSendMessage(item.PadRight(80, ' '));
Thread.Sleep(2000); // only if you are sure this is required
}
}
public void Dispose() {
// this will "complete" GetConsumingEnumerable, so your thread will complete
MessagesQueue.CompleteAdding();
MessagesQueue.Dispose();
}
}
.NET already provides ActionBlock< T> that allows posting messages to a buffer and processing them asynchronously. By default, only one message is processed at a time.
Your code could be rewritten as:
//In an initialization function
ActionBlock<string> _hmiAgent=new ActionBlock<string>(async msg=>{
TcpIpMessageSenderClient.ConnectAndSendMessage(msg.PadRight(80, ' '));
await Task.Delay(2000);
);
//In some other thread ...
foreach ( ....)
{
_hmiAgent.Post(someMessage);
}
// When the application closes
_hmiAgent.Complete();
await _hmiAgent.Completion;
ActionBlock offers many benefits - you can specify a limit to the number of items it can accept in a buffer and specify that multiple messages can be processed in parallel. You can also combine multiple blocks in a processing pipeline. In a desktop application, a message can be posted to a pipeline in response to an event, get processed by separate blocks and results posted to a final block that updates the UI.
Padding, for example, could be performed by an intermediary TransformBlock< TIn,TOut>. This transformation is trivial and the cost of using the block is greater than the method, but that's just an illustration:
//In an initialization function
TransformBlock<string> _hmiAgent=new TransformBlock<string,string>(
msg=>msg.PadRight(80, ' '));
ActionBlock<string> _tcpBlock=new ActionBlock<string>(async msg=>{
TcpIpMessageSenderClient.ConnectAndSendMessage());
await Task.Delay(2000);
);
var linkOptions=new DataflowLinkOptions{PropagateCompletion = true};
_hmiAgent.LinkTo(_tcpBlock);
The posting code doesn't change at all
_hmiAgent.Post(someMessage);
When the application terminates, we need to wait for the _tcpBlock to complete:
_hmiAgent.Complete();
await _tcpBlock.Completion;
Visual Studio 2015+ itself uses TPL Dataflow for such scenarios
Bar Arnon provides a better example in TPL Dataflow Is The Best Library You're Not Using, that shows how both synchronous and asynchronous methods can be used in a block.
The code is thread safe since both ConcurrentQueue and AutoResetEvent are thread safe. your strings are anyway being read and never being written to, so this code is thread safe.
However, You have to make sure you call SendMessageToTcpIP in some sort of a loop.
otherwise , you have a dangerous race condition - some messages may get lost:
while (!MessagesQueue.IsEmpty)
{
string message;
bool isRemoved = MessagesQueue.TryDequeue(out message);
if (isRemoved)
localMessagesQueue.Enqueue(message);
}
//<<--- what happens if another thread enqueues a message here?
while (localMessagesQueue.Count() != 0)
{
TcpIpMessageSenderClient.ConnectAndSendMessage(localMessagesQueue.Dequeue().PadRight(80, ' '));
Thread.Sleep(2000);
}
Other than that, AutoResetEvent is extremely heavy object. it uses a kernel object to synchronize threads. every call is a system call which may be costly. consider using user mode synchronization object (doesn't .net provides some sort of condition variable?)
This is an refactored code snippet of how I would implement this functionality:
class MessagesManager {
private readonly AutoResetEvent messagesAvailableSignal = new AutoResetEvent(false);
private readonly ConcurrentQueue<string> messageQueue = new ConcurrentQueue<string>();
public void PublishMessage(string Message) {
messageQueue.Enqueue(Message);
messagesAvailableSignal.Set();
}
public void SendMessageToTcpIP() {
while (true) {
messagesAvailableSignal.WaitOne();
while (!messageQueue.IsEmpty) {
string message;
if (messageQueue.TryDequeue(out message)) {
TcpIpMessageSenderClient.ConnectAndSendMessage(message.PadRight(80, ' '));
}
}
}
}
}
Points to note here:
This drains the queue completely: if there is at least one message, it will process all of them
The 2000ms Thread sleep is removed
I have a rather tricky question to solve. I have multiple (up to hundred or more) tasks, each of them produce a piece of data, say, string. These tasks can be spawn in every moment and there can be huge amount of them in one time and no at another. Each task must receive bool, indicating, whether is was completed correctly or not (that's important).
I want to implement some kind of buffer, to agregate data from tasks and flush it to external service, returning operation state (ok or fail). Also, my buffer must be flushed by timeout (to prevent waiting for new tasks to generate data for too long).
So far i tried to make some shared list of items. Tasks can add items to list and there is another task, checking timer or count of items in list and flushing them. But in this approach i can't tell status of flush operation to task, which is very bad for me.
I'll be gratefull for any approarch to solve my problem.
As I understand, you need to save result of each task to database/service, but you don't want to do it immediately.
There can be more than one solution to your problem, but it's difficult to come up with the best one, so I'll describe how I would have done it ... quickly.
A container for data you need to save/send.
public class TaskResultEventArgs : EventArgs
{
public bool Result { get; set; }
}
A notifier which also runs the task for you. I assumed you can delay execution of tasks.
public class NotifyingTaskRunner
{
public event EventHandler<TaskResultEventArgs> TaskCompleted;
public void RunAndNotify(Task<bool> task)
{
task.ContinueWith(t =>
{
OnTaskCompleted(this, new TaskResultEventArgs { Result = t.Result });
}, TaskContinuationOptions.OnlyOnRanToCompletion);
task.Start();
}
protected virtual void OnTaskCompleted(object sender, TaskResultEventArgs e)
{
var h = TaskCompleted;
if (h != null)
{
h.Invoke(sender, e);
}
}
}
A listener which can buffer and/or flush results (or you might want to delegate this to another class).
public class Listener
{
private ConcurrentQueue<bool> _queue = new ConcurrentQueue<bool>();
public Listener(NotifyingTaskRunner runner)
{
runner.TaskCompleted += Flush;
}
public async void Flush(object sender, TaskResultEventArgs e)
{
// Enqueue status to flush everything later (or flush it immediately)
_queue.Enqueue(e.Result);
}
}
And this is how you can use everything together.
var runner = new NotifyingTaskRunner();
var listener = new Listener(runner);
var t1 = new Task<bool>(() => { return true; });
var t2 = new Task<bool>(() => { return false; });
runner.RunAndNotify(t1);
runner.RunAndNotify(t2);
Please see below pseudo code
//Single or multiple Producers produce using below method
void Produce(object itemToQueue)
{
concurrentQueue.enqueue(itemToQueue);
consumerSignal.set;
}
//somewhere else we have started a consumer like this
//we have only one consumer
void StartConsumer()
{
while (!concurrentQueue.IsEmpty())
{
if (concurrentQueue.TrydeQueue(out item))
{
//long running processing of item
}
}
consumerSignal.WaitOne();
}
How do I port this pattern I have used since time immemorial to use taskfactory created tasks and the new signalling features of net 4. In other words if someone were to write this pattern using net 4 how would it look like ? Pseudo code is fine. Iam already using .net 4 concurrentQueue as you can see. How do I use a task and possibly use some newer signalling mechanism if possible. thanks
Solution to my problem below thanks to Jon/Dan. Sweet.
No manual signalling or while(true) or while(itemstoProcess) type loops like the old days
//Single or multiple Producers produce using below method
void Produce(object itemToQueue)
{
blockingCollection.add(item);
}
//somewhere else we have started a consumer like this
//this supports multiple consumers !
task(StartConsuming()).Start;
void StartConsuming()
{
foreach (object item in blockingCollection.GetConsumingEnumerable())
{
//long running processing of item
}
}
cancellations are handled using cancel tokens
You would use BlockingCollection<T>. There's an example in the documentation.
That class is specifically designed to make this trivial.
Your second block of code looks better. But, starting a Task and then immediately waiting on it is pointless. Just call Take and then process the item that is returned directly on the consuming thread. That is how the producer-consumer pattern is meant to be done. If you think the processing of work items is intensive enough to warrant more consumers then by all means start more consumers. BlockingCollection is safe multiple producers and multiple consumers.
public class YourCode
{
private BlockingCollection<object> queue = new BlockingCollection<object>();
public YourCode()
{
var thread = new Thread(StartConsuming);
thread.IsBackground = true;
thread.Start();
}
public void Produce(object item)
{
queue.Add(item);
}
private void StartConsuming()
{
while (true)
{
object item = queue.Take();
// Add your code to process the item here.
// Do not start another task or thread.
}
}
}
I've used a pattern before that creates a sort of 'on-demand' queue consumer (based on consuming from a ConcurrentQueue):
private void FireAndForget(Action fire)
{
_firedEvents.Enqueue(fire);
lock (_taskLock)
{
if (_launcherTask == null)
{
_launcherTask = new Task(LaunchEvents);
_launcherTask.ContinueWith(EventsComplete);
_launcherTask.Start();
}
}
}
private void LaunchEvents()
{
Action nextEvent;
while (_firedEvents.TryDequeue(out nextEvent))
{
if (_synchronized)
{
var syncEvent = nextEvent;
_mediator._syncContext.Send(state => syncEvent(), null);
}
else
{
nextEvent();
}
lock (_taskLock)
{
if (_firedEvents.Count == 0)
{
_launcherTask = null;
break;
}
}
}
}
private void EventsComplete(Task task)
{
if (task.IsFaulted && task.Exception != null)
{
// Do something with task Exception here
}
}
What is the most recommended .NET custom threadpool that can have separate instances i.e more than one threadpool per application?
I need an unlimited queue size (building a crawler), and need to run a separate threadpool in parallel for each site I am crawling.
Edit :
I need to mine these sites for information as fast as possible, using a separate threadpool for each site would give me the ability to control the number of threads working on each site at any given time. (no more than 2-3)
Thanks
Roey
I believe Smart Thread Pool can do this. It's ThreadPool class is instantiated so you should be able to create and manage your separate site specific instances as you require.
Ami bar wrote an excellent Smart thread pool that can be instantiated.
take a look here
Ask Jon Skeet: http://www.yoda.arachsys.com/csharp/miscutil/
Parallel extensions for .Net (TPL) should actually work much better if you want a large number of parallel running tasks.
Using BlockingCollection can be used as a queue for the threads.
Here is an implementation of it.
Updated at 2018-04-23:
public class WorkerPool<T> : IDisposable
{
BlockingCollection<T> queue = new BlockingCollection<T>();
List<Task> taskList;
private CancellationTokenSource cancellationToken;
int maxWorkers;
private bool wasShutDown;
int waitingUnits;
public WorkerPool(CancellationTokenSource cancellationToken, int maxWorkers)
{
this.cancellationToken = cancellationToken;
this.maxWorkers = maxWorkers;
this.taskList = new List<Task>();
}
public void enqueue(T value)
{
queue.Add(value);
waitingUnits++;
}
//call to signal that there are no more item
public void CompleteAdding()
{
queue.CompleteAdding();
}
//create workers and put then running
public void startWorkers(Action<T> worker)
{
for (int i = 0; i < maxWorkers; i++)
{
taskList.Add(new Task(() =>
{
string myname = "worker " + Guid.NewGuid().ToString();
try
{
while (!cancellationToken.IsCancellationRequested)
{
var value = queue.Take();
waitingUnits--;
worker(value);
}
}
catch (Exception ex) when (ex is InvalidOperationException) //throw when collection is closed with CompleteAdding method. No pretty way to do this.
{
//do nothing
}
}));
}
foreach (var task in taskList)
{
task.Start();
}
}
//wait for all workers to be finish their jobs
public void await()
{
while (waitingUnits >0 || !queue.IsAddingCompleted)
Thread.Sleep(100);
shutdown();
}
private void shutdown()
{
wasShutDown = true;
Task.WaitAll(taskList.ToArray());
}
//case something bad happen dismiss all pending work
public void Dispose()
{
if (!wasShutDown)
{
queue.CompleteAdding();
shutdown();
}
}
}
Then use like this:
WorkerPool<int> workerPool = new WorkerPool<int>(new CancellationTokenSource(), 5);
workerPool.startWorkers(value =>
{
log.Debug(value);
});
//enqueue all the work
for (int i = 0; i < 100; i++)
{
workerPool.enqueue(i);
}
//Signal no more work
workerPool.CompleteAdding();
//wait all pending work to finish
workerPool.await();
You can have as many polls has you like simply creating new WorkPool objects.
This free nuget library here: CodeFluentRuntimeClient has a CustomThreadPool class that you can reuse. It's very configurable, you can change pool threads priority, number, COM apartment state, even name (for debugging), and also culture.
Another approach is to use a Dataflow Pipeline. I added these later answer because i find Dataflows a much better approach for these kind of problem, the problem of having several thread pools. They provide a more flexible and structured approach and can easily scale vertically.
You can broke your code into one or more blocks, link then with Dataflows and let then the Dataflow engine allocate threads according to CPU and memory availability
I suggest to broke into 3 blocks, one for preparing the query to the site page , one access site page, and the last one to Analise the data.
This way the slow block (get) may have more threads allocated to compensate.
Here how would look like the Dataflow setup:
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
prepareBlock.LinkTo(get, linkOptions);
getBlock.LinkTo(analiseBlock, linkOptions);
Data will flow from prepareBlock to getBlock and then to analiseBlock.
The interfaces between blocks can be any class, just have to bee the same. See the full example on Dataflow Pipeline
Using the Dataflow would be something like this:
while ...{
...
prepareBlock.Post(...); //to send data to the pipeline
}
prepareBlock.Complete(); //when done
analiseBlock.Completion.Wait(cancellationTokenSource.Token); //to wait for all queues to empty or cancel