How to handle/enforce single instance threading - c#

I have a "worker" process that is running constantly on a dedicated server, sending emails, processing data extracts etc.
I want to have all of these processes running asynchronously, but I only want one instance of each process running at any one time. If a process is already running, I don't want to queue up running it again.
[example, simplified]
while (true)
{
// SLEEP HERE
Task task1 = Task.Factory.StartNew(() => DataScheduleWorker.Run());
Task task2 = Task.Factory.StartNew(() => EmailQueueWorker.Run());
}
Basically, I want this entire process to run endlessly, with each of the tasks running parallel to each other, but only one instance of each task running at any point in time.
How can I achieve this in C# 5? What's the cleanest/best way to implement this?
EDIT
Would something as simple as this suffice, or would this be deemed bad?:
Task dataScheduleTask = null;
while (true)
{
Thread.Sleep(600);
// Data schedule worker
if (dataScheduleTask != null && dataScheduleTask.IsCompleted) dataScheduleTask = null;
if (dataScheduleTask == null)
{
dataScheduleTask = Task.Factory.StartNew(() => DataScheduleWorker.Run());
}
}

This sounds like a perfect job for either an actors framework, or possibly TPL Dataflow. Fundamentally you've got one actor (or block) for each job, waiting for messages and processing them independently of the other actors. In either case, your goal should be to write as little of the thread handling and message passing code as possible - ideally none. This problem has already been largely solved; you just need to evaluate the available options and use the best one for your task. I would probably start with Dataflow, personally.
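If a full actor or Dataflow setup is more than you need, the "skip it if it is already running" rule from the question can also be sketched with one atomic flag per job. This is only a minimal sketch, not what the answer above recommends: SingleInstanceJob and TryStart are hypothetical names, and the Thread.Sleep lambda stands in for DataScheduleWorker.Run().

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs a job on the thread pool, but never two copies at once.
public sealed class SingleInstanceJob
{
    private readonly Action _work;
    private int _running; // 0 = idle, 1 = running

    public SingleInstanceJob(Action work)
    {
        _work = work;
    }

    // Returns the started task, or null if the previous run has not finished yet.
    public Task TryStart()
    {
        // Atomically flip 0 -> 1; if the flag was already 1, skip this tick.
        if (Interlocked.CompareExchange(ref _running, 1, 0) != 0)
            return null;

        return Task.Run(() =>
        {
            try { _work(); }
            finally { Volatile.Write(ref _running, 0); } // allow the next run
        });
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in for DataScheduleWorker.Run().
        var job = new SingleInstanceJob(() => Thread.Sleep(200));

        Task first = job.TryStart();  // starts the job
        Task second = job.TryStart(); // skipped: the first run is still active
        Console.WriteLine(second == null ? "second run skipped" : "second run started");

        first.Wait();
    }
}
```

The outer loop can then call TryStart() for every job on each tick without queuing duplicate runs; a job that is still busy is simply skipped until a later tick.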

Related

Why is async not allowing for parallel processing unless I manually create new threads? [duplicate]

This question already has answers here:
Async process start and wait for it to finish
I wrote a Windows service, and I would like it to work using the same exact logic that it currently has, but process everything in parallel.
The real codebase is fairly abstracted and private, so I can't post the source here but here's the gist of it.
The app is a persistent process scheduler. It leverages EntityFramework 6 to scan a database for records detailing (among other things): 1) a path for a process to run, 2) a date/time to run the process, and 3) the scheduled frequency that it is on.
Basic Functionality
It loops through the database for active records and returns all scheduled job details
Checks the date and time against the current date and time, within a buffer
If the job should run, it uses new Process().Start(...) with the path from the record, initializes the process if the file is found and is executable, and then waits for an exit or for the configured timeout threshold to elapse
The exit code or lack of one (in the event of hanging processes) for each process run is what single-handedly determines if the record remains active and continues to get cycled and re-scheduled dynamically into the future, or instead, deactivated with errors logged to the associated record in the DB.
The process continues perpetually unless explicitly stopped.
Currently Working in Parallel (1000% faster, but it looks like it is possibly skipping records!). Maybe I need to add a lock before accessing the db?
As it turns out I was using (var process) {...} and it was throwing that it was being disposed. After staring at the code a few days, I saw this stupid mistake I had made trying to be tidy ;p
var threads = new List<Thread>();
schedules.ForEach(schedule => {
    // I have also tried ThreadPool.QueueUserWorkItem(...) but then I read it's basically long-hand for Task.Run() and I don't think it was working the same as new Thread using this pattern.
    // A thread body cannot await, so block this dedicated thread until the async method finishes.
    var thread = new Thread(() => ProcessSchedule(schedule).GetAwaiter().GetResult());
    // Actually using slim semaphore in wild, but for simplicity's sake...
    thread.Start();
    threads.Add(thread);
});
// Before exiting...
while (threads.Any(instance => instance.IsAlive))
{
    await Task.Delay(debounceValue);
}
Working in Sequence without Issue Besides its Blocking Slowness...
var tasks = new List<Task>();
schedules.ForEach(schedule => {
    // I have also tried to just await this here, but that obviously will block anything on the same thread, so I add the tasks to a list and wait for completion after the loop is complete and the processes are working on their own process.
    tasks.Add(ProcessSchedule(schedule));
});
// Before exiting...
// I expected this to work but it still seems to go record by record :*(
// Also tried using Task.Run(() => await task(...)) with no luck...
await Task.WhenAll(tasks);
Note: I am passing the list of tasks or threads up another level in the real code so it can be processed and awaited while everything is working, but this is simplified, borderline-pseudo code strictly for demonstrating the concept I am struggling with as concisely as possible.
Inside of ProcessSchedule
Async method which starts a new process and waits for an exit. When one is received, a success or exit is written to the database using EntityFramework 6 on the schedule record which drove the process for this instance being parsed and evaluated. EG:
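// Process has no constructor that takes a ProcessStartInfo; set StartInfo instead.
// Exited is only raised when EnableRaisingEvents is true.
var process = new Process { StartInfo = startInfo, EnableRaisingEvents = true };
process.ErrorDataReceived += handleProcExitListener;
process.OutputDataReceived += handleProcExitListener;
process.Exited += (...) => handleProcExitListener(...);
process.Start();
// Monitor process, persist exit state via
await dbContext.SaveChangesAsync();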
I can say that:
I have no non-awaited async methods, unless it's using await Task.Run(MethodAsync), or is in Main(argz) with await Task.WhenAll(task);, etc.
Is async-await blocking me because DbContext is not thread safe by default or something? If this is the case, would someone please verify how I can achieve what I am looking for?
I have tried a number of techniques, but I am not able to get the application to run each process simultaneously, then wait and react upon the end state after spawning the processes, unless I use multithreading (new Thread directly maybe also using ThreadPool).
I have not had to resort to using threads in a while, mostly since the introduction of async-await in C#. Therefore, I am questioning myself using it without first fully understanding why. I would really appreciate some help grasping what I am missing.
It seems to me async is just a fancy pattern and facade for easy access to state-machine characteristics. Why then, when I researched the subject, did I read that using ThreadPool.QueueUserWorkItem(...) is rather obsolete since TPL and async/await? If async/await does not give you new threads to work with, is running a process in parallel possible without it? Also, these processes take anywhere from 10 min to 45 min each to run, so you can see the importance of running them all together.
Since I am stuck with .NET 4.8, I unfortunately cannot use Process.WaitForExitAsync(), which was introduced in .NET 5.
Solution
I modeled a solution on the following question, "Async process start and wait for it to finish":
public static Task<bool> WaitForExitAsync(this Process process, TimeSpan timeout)
{
ManualResetEvent processWaitObject = new ManualResetEvent(false);
processWaitObject.SafeWaitHandle = new SafeWaitHandle(process.Handle, false);
TaskCompletionSource<bool> tcs = new TaskCompletionSource<bool>();
RegisteredWaitHandle registeredProcessWaitHandle = null;
registeredProcessWaitHandle = ThreadPool.RegisterWaitForSingleObject(
processWaitObject,
delegate(object state, bool timedOut)
{
if (!timedOut)
{
registeredProcessWaitHandle.Unregister(null);
}
processWaitObject.Dispose();
tcs.SetResult(!timedOut);
},
null /* state */,
timeout,
true /* executeOnlyOnce */);
return tcs.Task;
}
Even though you have omitted some of the Process code, I'm assuming that you are calling the blocking method Process.WaitForExit instead of the async equivalent of it. I have created a mock type solution and this runs in parallel.
private static async Task RunPowershellProcess()
{
using var process = new Process();
process.StartInfo.FileName = @"C:\windows\system32\windowspowershell\v1.0\powershell.exe";
process.StartInfo.UseShellExecute = true;
process.Exited += (a, _) =>
{
var p = a as Process;
Console.WriteLine(p?.ExitCode);
};
process.EnableRaisingEvents = true;
process.Start();
await process.WaitForExitAsync();
}
static async Task Main(string[] args)
{
var tasks = new List<Task>(10);
for (var x = 0; x < 10; x++)
{
tasks.Add(RunPowershellProcess());
}
await Task.WhenAll(tasks);
}
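On .NET Framework 4.8, where Process.WaitForExitAsync is not available, a similar non-blocking wait can be approximated with the Exited event and a TaskCompletionSource. The following is a minimal sketch under that assumption: the extension name WaitForExitTaskAsync is made up here (to avoid clashing with the built-in method on newer runtimes), and timeout handling is left out.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class ProcessExtensions
{
    // Hypothetical extension: completes with the exit code once the process exits.
    // Call it only after process.Start().
    public static Task<int> WaitForExitTaskAsync(this Process process)
    {
        var tcs = new TaskCompletionSource<int>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        // Exited is only raised when EnableRaisingEvents is true.
        process.EnableRaisingEvents = true;
        process.Exited += (sender, e) => tcs.TrySetResult(process.ExitCode);

        // The process may already have exited before the handler was attached.
        if (process.HasExited)
            tcs.TrySetResult(process.ExitCode);

        return tcs.Task;
    }
}
```

A caller would start the process as usual and then write int exitCode = await process.WaitForExitTaskAsync();. Because no thread is blocked while waiting, many such tasks can be collected in a list and awaited together with Task.WhenAll.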

Waiting on a continuous UI background polling task

I am somewhat new to parallel programming in C# (when I started my project I worked through the MSDN examples for TPL) and would appreciate some input on the following example code.
It is one of several background worker tasks. This specific task pushes status messages to a log.
var uiCts = new CancellationTokenSource();
var globalMsgQueue = new ConcurrentQueue<string>();
var backgroundUiTask = new Task(
() =>
{
while (!uiCts.IsCancellationRequested)
{
while (globalMsgQueue.Count > 0)
ConsumeMsgQueue();
Thread.Sleep(backgroundUiTimeOut);
}
},
uiCts.Token);
// Somewhere else entirely
backgroundUiTask.Start();
Task.WaitAll(backgroundUiTask);
I'm asking for professional input after reading several topics such as "Alternatives to using Thread.Sleep for waiting", "Is it always bad to use Thread.Sleep()?", "When to use Task.Delay, when to use Thread.Sleep?", and "Continuous polling using Tasks".
These prompt me to use Task.Delay instead of Thread.Sleep as a first step and to introduce TaskCreationOptions.LongRunning.
But I wonder what other caveats I might be missing? Is polling the MsgQueue.Count a code smell? Would a better version rely on an event instead?
First of all, there's no reason to use Task.Start or use the Task constructor. Tasks aren't threads, they don't run themselves. They are a promise that something will complete in the future and may or may not produce any results. Some of them will run on a threadpool thread. Use Task.Run to create and run the task in a single step when you need to.
I assume the actual problem is how to create a buffered background worker. .NET already offers classes that can do this.
ActionBlock<T>
The ActionBlock class already implements this and a lot more - it allows you to specify how big the input buffer is, how many tasks will process incoming messages concurrently, supports cancellation and asynchronous completion.
A logging block could be as simple as this :
_logBlock=new ActionBlock<string>(msg=>File.AppendAllText("myLog.txt",msg));
The ActionBlock class itself takes care of buffering the inputs, feeding new messages to the worker function as they arrive, potentially blocking senders if the buffer gets full, etc. There's no need for polling.
Other code can use Post or SendAsync to send messages to the block:
_logBlock.Post("some message");
When we are done, we can tell the block to Complete() and await its Completion task so that it processes any remaining messages:
_logBlock.Complete();
await _logBlock.Completion;
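Putting these pieces together, a bounded, single-consumer logging block might look like the sketch below. It assumes the System.Threading.Tasks.Dataflow library is referenced; LogBlockDemo, the capacity of 100, and the in-memory list (standing in for the log file) are illustrative choices rather than part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class LogBlockDemo
{
    public static async Task<List<string>> RunAsync(IEnumerable<string> messages)
    {
        var written = new List<string>(); // stand-in for the log file

        var options = new ExecutionDataflowBlockOptions
        {
            BoundedCapacity = 100,      // size of the input buffer
            MaxDegreeOfParallelism = 1  // process one message at a time, in order
        };
        var logBlock = new ActionBlock<string>(msg => written.Add(msg), options);

        foreach (var msg in messages)
        {
            // SendAsync waits asynchronously while the buffer is full;
            // Post would return false instead of waiting.
            await logBlock.SendAsync(msg);
        }

        logBlock.Complete();       // signal that no more input is coming
        await logBlock.Completion; // wait until buffered messages are processed
        return written;
    }
}
```

With a bounded buffer, SendAsync is usually the safer choice for producers, since a full buffer then slows them down instead of silently dropping messages.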
Channels
A newer, lower-level option is to use Channels. You can think of channels as a kind of asynchronous queue, although they can be used to implement complex processing pipelines. If ActionBlock was written today, it would use Channels internally.
With channels, you need to provide the "worker" task yourself. There's no need for polling though, as the ChannelReader class allows you to read messages asynchronously or even use await foreach.
The writer method could look like this :
public ChannelWriter<string> LogIt(string path,CancellationToken token=default)
{
var channel=Channel.CreateUnbounded<string>();
var writer=channel.Writer;
_=Task.Run(async ()=>{
await foreach(var msg in channel.Reader.ReadAllAsync(token))
{
File.AppendAllText(path,msg);
}
},token).ContinueWith(t=>writer.TryComplete(t.Exception));
return writer;
}
....
_logWriter=LogIt(somePath);
Other code can send messages by using WriteAsync or TryWrite, eg :
_logWriter.TryWrite(someMessage);
When we're done, we can call Complete() or TryComplete() on the writer :
_logWriter.TryComplete();
The line
.ContinueWith(t=>writer.TryComplete(t.Exception));
is needed to ensure the channel is closed even if an exception occurs or the cancellation token is signaled.
This may seem cumbersome at first, but channels make it easy to run initialization code or carry state from one message to the next. For example, we could open a stream before the loop starts and use it instead of reopening the file each time we call File.AppendAllText:
public ChannelWriter<string> LogIt(string path,CancellationToken token=default)
{
var channel=Channel.CreateUnbounded<string>();
var writer=channel.Writer;
_=Task.Run(async ()=>{
//***** Can't do this with an ActionBlock ****
// Named "file" so it does not shadow the channel writer declared above.
using(var file=File.AppendText(path))
{
await foreach(var msg in channel.Reader.ReadAllAsync(token))
{
file.WriteLine(msg);
//Or
//await file.WriteLineAsync(msg);
}
}
},token).ContinueWith(t=>writer.TryComplete(t.Exception));
return writer;
}
Definitely Task.Delay is better than Thread.Sleep, because you will not be blocking the thread on the pool, and during the wait the thread on the pool will be available to handle other tasks. Then, you don't need to make your task long-running. Long-running tasks are run in a dedicated thread, and then Task.Delay is meaningless.
Instead, I would recommend a different approach. Just use System.Threading.Timer and make your life simple. The timer runs its callback on a thread-pool thread, so you will not have to worry about delay or sleep.
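As a rough sketch of the timer approach, the polling loop from the question could be replaced with something like the following. TimerDrain and its members are hypothetical names. Callbacks of a periodic System.Threading.Timer can overlap when one run takes longer than the period, so this sketch uses a one-shot timer and re-arms it at the end of each callback.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical helper: drains a queue on a timer instead of a sleep loop.
public sealed class TimerDrain : IDisposable
{
    private readonly ConcurrentQueue<string> _queue;
    private readonly Action<string> _consume;
    private readonly TimeSpan _period;
    private readonly Timer _timer;

    public TimerDrain(ConcurrentQueue<string> queue, Action<string> consume, TimeSpan period)
    {
        _queue = queue;
        _consume = consume;
        _period = period;
        // Create the timer disabled, then arm it once every field is assigned.
        _timer = new Timer(OnTick, null, Timeout.InfiniteTimeSpan, Timeout.InfiniteTimeSpan);
        _timer.Change(_period, Timeout.InfiniteTimeSpan);
    }

    private void OnTick(object state)
    {
        string msg;
        while (_queue.TryDequeue(out msg))
            _consume(msg);

        // Re-arm for the next tick; the timer may have been disposed meanwhile.
        try { _timer.Change(_period, Timeout.InfiniteTimeSpan); }
        catch (ObjectDisposedException) { }
    }

    public void Dispose()
    {
        _timer.Dispose();
    }
}
```

Disposing the TimerDrain stops further ticks, which takes the place of the CancellationTokenSource in the original loop.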
The TPL Dataflow library is the preferred tool for this kind of job. It allows building efficient producer-consumer pairs quite easily, and more complex pipelines as well, while offering a complete set of configuration options. In your case using a single ActionBlock should be enough.
A simpler solution you might consider is to use a BlockingCollection. It has the advantage of not requiring the installation of any package (because it is built-in), and it's also much easier to learn. You don't have to learn more than the methods Add, CompleteAdding, and GetConsumingEnumerable. It also supports cancellation. The drawback is that it's a blocking collection, so it blocks the consumer thread while waiting for new messages to arrive, and the producer thread while waiting for available space in the internal buffer (only if you specify a boundedCapacity in the constructor).
var uiCts = new CancellationTokenSource();
var globalMsgQueue = new BlockingCollection<string>();
var backgroundUiTask = new Task(() =>
{
foreach (var item in globalMsgQueue.GetConsumingEnumerable(uiCts.Token))
{
ConsumeMsgQueueItem(item);
}
}, uiCts.Token);
The BlockingCollection uses a ConcurrentQueue internally as a buffer.
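To round out the consumer snippet above, here is a small sketch that also shows the producer side, using only Add, CompleteAdding, and GetConsumingEnumerable. BlockingDemo is a hypothetical name, and the in-memory list stands in for ConsumeMsgQueueItem.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BlockingDemo
{
    public static List<string> Run(IEnumerable<string> messages)
    {
        var consumed = new List<string>();
        using (var queue = new BlockingCollection<string>(boundedCapacity: 10))
        {
            // LongRunning hints that the consumer should get its own thread,
            // because GetConsumingEnumerable blocks while the queue is empty.
            Task consumer = Task.Factory.StartNew(() =>
            {
                foreach (var item in queue.GetConsumingEnumerable())
                    consumed.Add(item); // stand-in for ConsumeMsgQueueItem(item)
            }, TaskCreationOptions.LongRunning);

            foreach (var msg in messages)
                queue.Add(msg);     // blocks only while the buffer is full

            queue.CompleteAdding(); // lets the consumer's foreach finish
            consumer.Wait();        // wait until everything has been consumed
        }
        return consumed;
    }
}
```

Calling CompleteAdding is what ends the consumer loop cleanly; without it the foreach would wait for new items indefinitely.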

Running multiple Task<> in an enterprise application in a safe way

I'm designing the software architecture for a product that can instantiate a series of "agents" doing some useful things.
Let's say each agent implements an interface with a function:
Task AsyncRun(CancellationToken token)
Because these agents do a lot of I/O, it makes sense for this to be an async function. Moreover, AsyncRun is supposed to never complete unless an exception or explicit cancellation occurs.
Now the question is: the main program has to run this on multiple agents, and I would like to know the correct way of running those multiple tasks and signalling each individual completion (due to cancellation or errors).
For example, I'm thinking of an infinite loop like this:
//.... all tasks created are in the array tasks..
while(true)
{
await Task.WhenAny(tasks);
//.... check each single task to understand which one(s) exited
// re-run the task if requested replacing in the array tasks
}
but I'm not sure if it is the correct (or even the best) way.
Moreover, I would like to know if this is the correct pattern, especially because the implementer can mismatch the RunAsync and do a blocking call, in which case the entire application will hang.
// re-run the task if requested replacing in the array tasks
This is the first thing I'd consider changing. It's far better to not let an application handle its own "restarting". If an operation failed, then there's no guarantee that an application can recover. This is true for any kind of operation in any language/runtime.
A better solution is to let another application restart this one. Allow the exception to propagate (logging it if possible), and allow it to terminate the application. Then have your "manager" process (literally a separate executable process) restart as necessary. This is the way all modern high-availability systems work, from the Win32 services manager, to ASP.NET, to the Kubernetes container manager, to the Azure Functions runtime.
Note that if you do want to take this route, it may make sense to split up the tasks to different processes, so they can be restarted independently. That way a restart in one won't cause a restart in others.
However, if you want to keep all your tasks in the same process, then the solution you have is fine. If you have a known number of tasks at the beginning of the process, and that number won't change (unless they fail), then you can simplify the code a bit by factoring out the restarting and using Task.WhenAll instead of Task.WhenAny:
async Task RunAsync(Func<CancellationToken, Task> work, CancellationToken token)
{
while (true)
{
try { await work(token); }
catch
{
// log...
}
if (we-should-not-restart)
break;
}
}
List<Func<CancellationToken, Task>> workToDo = ...;
var tasks = workToDo.Select(work => RunAsync(work, token));
await Task.WhenAll(tasks);
// Only gets here if they all complete/fail and were not restarted.
the implementer can mismatch the RunAsync and do a blocking call, in which case the entire application will hang.
The best way to prevent this is to wrap the call in Task.Run, so this:
await work(token);
becomes this:
await Task.Run(() => work(token));
In order to know whether the task completes successfully, or is cancelled or faulted, you could use a continuation. The continuation will be invoked as soon as the task finishes, whether that's because of failure, cancellation or completion:
using (var tokenSource = new CancellationTokenSource())
{
IEnumerable<IAgent> agents; // TODO: initialize
var tasks = new List<Task>();
foreach (var agent in agents)
{
var task = agent.RunAsync(tokenSource.Token)
.ContinueWith(t =>
{
if (t.IsCanceled)
{
// Do something if cancelled.
}
else if (t.IsFaulted)
{
// Do something if faulted (with t.Exception)
}
else
{
// Do something if the task has completed.
}
});
tasks.Add(task);
}
await Task.WhenAll(tasks);
}
In the end you will wait for the continued tasks. Also see this answer.
If you are afraid that the IAgent implementations will create blocking calls and want to prevent the application from hanging, you can wrap the call to the async method in Task.Run. This way the call to the agent is executed on the threadpool and is therefore non-blocking:
var task = Task.Run(async () =>
await agent.RunAsync(tokenSource.Token)
.ContinueWith(t =>
{
// Same as above
}));
You may want to use Task.Factory.StartNew instead, for example to mark the task as long-running.
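A sketch of that variant is below. Task.Factory.StartNew with a task-returning delegate yields a Task<Task>, so Unwrap is needed to observe the agent's actual completion. IAgent is the interface assumed in this answer; AgentRunner and StartLongRunning are hypothetical names. Note that the LongRunning hint only covers the synchronous part of RunAsync, up to its first await.

```csharp
using System.Threading;
using System.Threading.Tasks;

public interface IAgent
{
    Task RunAsync(CancellationToken token);
}

public static class AgentRunner
{
    public static Task StartLongRunning(IAgent agent, CancellationToken token)
    {
        return Task.Factory
            .StartNew(
                () => agent.RunAsync(token),     // Func<Task>, so StartNew returns Task<Task>
                token,
                TaskCreationOptions.LongRunning, // hint: prefer a dedicated thread
                TaskScheduler.Default)
            .Unwrap();                           // proxy task for the inner RunAsync task
    }
}
```

The returned task can be added to the tasks list and combined with the same ContinueWith handling shown above.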

EventHub ForEach Parallel Async

Always managing to confuse myself working with async, I'm after a bit of validation/confirmation here that i'm doing what i think i'm doing in the following scenarios..
given the following trivial example:
// pretend / assume these are json msgs or something ;)
var strEvents = new List<string> { "event1", "event2", "event3" };
i can post each event to an eventhub simply as follows:
foreach (var e in strEvents)
{
// Do some things
outEventHub.Add(e); // ICollector
}
the foreach will run on a single thread, and execute each thing inside sequentially.. the posting to eventhub will also remain on the same thread too i guess??
Changing ICollector to IAsyncCollector, and achieve the following:
foreach (var e in strEvents)
{
// Do some things
await outEventHub.AddAsync(e);
}
I think i am right here in saying that the foreach will run on a single thread, the actual sending to the event hub will be pushed off elsewhere? Or at least not block that same thread..
Changing to Parallel.ForEach, as these events will be arriving 100+ or so at a time:
Parallel.ForEach(events, async (e) =>
{
// Do some things
await outEventHub.AddAsync(e);
});
Starting to get a bit hazy now, as i am not sure what really is going on now... afaik each event has its own thread (within the bounds of the hardware) and steps within that thread do not block it.. so much for this trivial example.
Finally, i could turn them all in to Tasks i thought..
private static async Task DoThingAsync(string e, IAsyncCollector<string> outEventHub)
{
await outEventHub.AddAsync(e);
}
var t = new List<Task>();
foreach (var e in strEvents)
{
t.Add(DoThingAsync(e, outEventHub));
}
await Task.WhenAll(t);
now i am really hazy, and i think this is prepping everything on a single thread.. and then running everything exactly at the same time, on any thread available??
I appreciate that in order to determine which is right for the job at hand benchmarking is required... but an explanation of what the framework is doing in each situation would be super helpful for me right now..
Parallel != async
This is the main idea here. Both of them have their uses, and they can be used together, but they are very different. You are mostly right with your assumptions, but let me clarify:
Simple foreach
This is non-parallel and non-async. Nothing to talk about.
Await inside foreach
This is async code that is non-parallel.
foreach (var e in strEvents)
{
// Do some things
await outEventHub.AddAsync(e);
}
This will all take place on a single thread. It takes an event, starts adding it to your event hub, and while that is being completed (I'm guessing it does some sort of network IO) it hands the thread back to the thread pool (or the UI, if it was called on a UI thread) so it can do other work while waiting on AddAsync to return. But as you said, it is not parallel at all.
Parallel Foreach (async)
This one is a trap! In short, Parallel.ForEach is designed for synchronous workloads. We'll get back to this, but first let's assume you used it with the non-async code.
Parallel foreach (sync)
A.k.a. Parallel but not async.
Parallel.ForEach(events, (e) =>
{
// Do some things
outEventHub.Add(e);
});
Each item will get its own "Task", but they won't each spawn a thread. Creating threads is expensive, and in the optimal case there is no point in having more threads than CPU cores. Instead, these tasks run on the ThreadPool, which keeps about as many threads as is optimal. Each thread takes a task, works on it, then takes another one, and so on.
You can think of it as - on a 4 core machine - having 4 workers around a pile of tasks, so 4 of them are being run at a time. You can imagine that this is not ideal in case of IO bound workloads (which this most likely is). If your network is slow, you can have all 4 threads blocked on trying to send the event out, while they could be doing useful work. This leads us to...
Tasks
Async and potentially parallel (depends on the usage).
Your description is correct here, too, except regarding the ThreadPool: all the tasks are kicked off at once (on the main thread), and they then run on the pool's threads. While they are running, the main thread is released and can do other work as needed. Up to this point it is the same as the Parallel.ForEach case. But:
What happens is that a ThreadPool thread picks up a task, does the necessary preprocessing, then sends out the network request asynchronously. This means that the task will not block while waiting for the network; instead, it releases the ThreadPool thread to pick up another work item. When the network request completes, the task's continuation (the remaining lines of code after the network request) is scheduled back onto the list of tasks.
You can see that theoretically this is the most efficient process, so fast that you have to be careful not to flood your network.
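One way to avoid flooding the network is to cap the number of sends that are in flight at once. The sketch below uses SemaphoreSlim for this; ThrottledSender, sendAsync, and maxConcurrency are hypothetical names, with sendAsync standing in for outEventHub.AddAsync.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledSender
{
    public static async Task SendAllAsync(
        IEnumerable<string> events,
        Func<string, Task> sendAsync, // stand-in for outEventHub.AddAsync
        int maxConcurrency)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            // ToList forces every task to be created now rather than lazily.
            var tasks = events.Select(async e =>
            {
                await gate.WaitAsync();     // wait for a free slot
                try { await sendAsync(e); }
                finally { gate.Release(); } // free the slot for the next event
            }).ToList();

            // Observe completion (and exceptions) of every send.
            await Task.WhenAll(tasks);
        }
    }
}
```

A call such as await ThrottledSender.SendAllAsync(strEvents, e => outEventHub.AddAsync(e), 10); would then keep at most ten sends outstanding, where the limit of ten is an arbitrary example value.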
Back to Parallel.Foreach and async
At this point you should be able to spot the problem. All your async lambda async (e) => { await outEventHub.AddAsync(e);} does is kick off the work; it returns right after it hits the await. (Remember that async/await releases threads while waiting.) Parallel.ForEach returns right after it has started all of them. But nothing is awaiting these tasks! They become fire and forget, which is usually bad practice. It is as if you had deleted the await Task.WhenAll call from your task example.
I hope this cleared most things for you, if not, let me know what to improve on.
Why don't you send those events asynchronously in parallel, like this:
var tasks = new List<Task>();
foreach( var e in strEvents )
{
tasks.Add(outEventHub.AddAsync(e));
}
await Task.WhenAll(tasks);
await outEventHub.FlushAsync();

Manual threads vs Parallel.Foreach in task scheduler

I have a Windows Service that processes tasks created by users. This Service runs on a server with 4 cores. The tasks mostly involve heavy database work (generating a report for example). The server also has a few other services running so I don't want to spin up too many threads (let's say a maximum of 4).
If I use a BlockingCollection<MyCustomTask>, is it a better idea to create 4 Thread objects and use these to consume from the BlockingCollection<MyCustomTask> or should I use Parallel.ForEach to accomplish this?
I'm looking at the ParallelExtensionsExtras which contains a StaTaskScheduler which uses the former, like so (slightly modified the code for clarity):
var threads = Enumerable.Range(0, numberOfThreads).Select(i =>
{
var thread = new Thread(() =>
{
// Continually get the next task and try to execute it.
// This will continue until the scheduler is disposed and no more tasks remain.
foreach (var t in _tasks.GetConsumingEnumerable())
{
TryExecuteTask(t);
}
});
thread.IsBackground = true;
thread.SetApartmentState(ApartmentState.STA);
return thread;
}).ToList();
// Start all of the threads
threads.ForEach(t => t.Start());
However, there's also a BlockingCollectionPartitioner in the same ParallelExtensionsExtras which would enable the use of Parallel.ForEach on a BlockingCollection<Task>, like so:
var blockingCollection = new BlockingCollection<MyCustomTask>();
Parallel.ForEach(blockingCollection.GetConsumingEnumerable(), task =>
{
task.DoSomething();
});
It's my understanding that the latter leverages the ThreadPool. Would using Parallel.ForEach have any benefits in this case?
This answer is relevant if Task class in your code has nothing to do with System.Threading.Tasks.Task.
As a simple rule, use Parallel.ForEach to run tasks that will eventually end, such as executing some work in parallel with some other work.
Use Threads when they run a routine for the whole life of the application.
So, it looks like in your case you should use the Threads approach.
