Guarantee immediate start of parallel threads/tasks/whatever - c#

I will use "Process" to refer to the work that is going to happen in parallel, and "enqueue" to refer to whatever mechanism is used to initiate that process (whether that be Task.Run, ThreadPool.QueueUserWorkItem, new Thread() ... whatever).
We have a performance-sensitive program that spawns multiple parallel processes to gather data.
We're having an issue with the spawning: the processes do not begin immediately.
Specifically, if we prepare a process, start a timer, enqueue the process, and check the timer as the very first action in the process ... then we see that the delay occasionally stretches into 100s or even 1000s of milliseconds.
Given that the process itself is only supposed to run for 3-10 seconds, a 2-second delay between enqueuing and activation of the process is a major issue.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Our implementation started out using ThreadPool.QueueUserWorkItem (QUWI), and then we moved to Task.Run.
Our initial investigation led us to the thread-creation strategy used by the ThreadPool and to ThreadPool.SetMinThreads(), so we're pursuing that angle to see whether it completely resolves the issue.
But is there another change/approach that we should be looking at, if our goal is to have the process start immediately after enqueuing?

Taken from here (I strongly suggest you have a read up):
It seems that what you want can be achieved by overriding the default task scheduler... scary...
You can't assume that all parallel tasks will immediately run. Depending on the current work load and system configuration, tasks might be scheduled to run one after another, or they might run at the same time. For more information about how tasks are scheduled, see the section, "The Default Task Scheduler," later in this chapter.
Creating Tasks with Custom Scheduling
You can customize the details of how tasks in .NET are scheduled and run by overriding the default task scheduler that's used by the task factory methods. For example, you can provide a custom task scheduler as an argument to one of the overloaded versions of the TaskFactory.StartNew method.
There are some cases where you might want to override the default scheduler. The most common case occurs when you want your task to run in a particular thread context... Other cases occur when the load-balancing heuristics of the default task scheduler don't work well for your application. For more information, see the section, "Thread Injection," later in this chapter.
Unless you specify otherwise, any new tasks will use the current task scheduler...
You can implement your own task scheduler class. For more information, see the section, "Writing a Custom Task Scheduler," later in this chapter.
Thread Injection
The .NET thread pool automatically manages the number of worker threads in the pool...
Have a read of this SO post "replacing the task scheduler in c sharp with a custom built one"
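Before writing a whole custom scheduler, it may be worth measuring what the LongRunning hint does for you. A minimal sketch follows, assuming the current behaviour of the default scheduler, which gives a LongRunning task its own dedicated thread instead of queueing it to the thread pool (that is an implementation detail, not a documented guarantee). The StartLatencyProbe class and its method name are made up for illustration; the measurement mirrors the "start a timer, enqueue, check the timer as the first action" test from the question.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of any library.
public static class StartLatencyProbe
{
    // Starts 'work' with the LongRunning hint and returns the delay between
    // enqueuing and the first instruction executed inside the work item.
    public static TimeSpan MeasureStartDelay(Action work)
    {
        var stopwatch = Stopwatch.StartNew();
        TimeSpan startDelay = TimeSpan.Zero;

        Task task = Task.Factory.StartNew(
            () =>
            {
                startDelay = stopwatch.Elapsed; // very first action in the "process"
                work();
            },
            CancellationToken.None,
            TaskCreationOptions.LongRunning,    // hint: do not wait for a pool thread
            TaskScheduler.Default);

        task.Wait(); // blocking only to keep the sketch short
        return startDelay;
    }
}
```

Calling StartLatencyProbe.MeasureStartDelay(() => GatherData()) (GatherData being your own method) under realistic load, and comparing against the same measurement with Task.Run, should show whether thread injection in the pool is really the cause of the delay.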

Related

Long-running task without IHostedService running the entire life of the application?

I have a website page that needs the option of performing an operation that could take several minutes. To avoid performance issues and time outs, I want to run this operation outside of the HTTP request.
After some research, I found IHostedService and BackgroundService, which can be registered as a singleton using AddHostedService<T>().
But my concern is that a hosted service is always running. Doesn't that seem like a waste of resources when I just want it to run on demand?
Does anyone know a better option to run a lengthy task, or a way to use IHostedService that doesn't need to run endlessly?
Note that the operation calls and waits for an API call. And so I cannot report the progress of the operation, nor can I set a flag in a common database regarding whether the operation has completed.
One option to run a lengthy task on demand while avoiding performance issues and time outs is to use a message queue. You can have your Razor Pages website send a message to the queue when the operation is requested, and have a separate service, such as a background worker, consume messages from the queue and perform the operation. This allows you to decouple the task from the web request, and also allows for the possibility of adding more worker instances to handle the workload.
Another option is to use a task scheduler that runs on demand, such as Hangfire. It allows you to schedule background jobs and monitor their progress, which can be useful in your scenario where you cannot report the progress of the operation.
You can also use IHostedService, but you need to make sure that the service is only running when it is needed. You can use a flag or a semaphore to control whether the service is running or not. You can set the flag or semaphore when the operation is requested, and clear it when the operation is completed. The service can then check the flag or semaphore in its main loop, and exit if the flag is not set.
In summary: a message queue, a task scheduler, and IHostedService with a controlling flag/semaphore are all viable options for running a lengthy task on demand. The best option depends on your specific use case and requirements.
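To make the queue idea concrete, here is a minimal in-process sketch using System.Threading.Channels (available in .NET Core 3.0 and later). The OnDemandJobQueue class and its member names are hypothetical. The point it tries to show is that a worker awaiting an empty queue is parked without occupying a thread, so "always running" does not have to mean wasted resources. In an ASP.NET Core app the worker loop would typically be called from a BackgroundService's ExecuteAsync; that wiring is left out here to keep the sketch self-contained.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical in-process job queue; names are illustrative only.
public sealed class OnDemandJobQueue
{
    private readonly Channel<Func<CancellationToken, Task>> _jobs =
        Channel.CreateUnbounded<Func<CancellationToken, Task>>();

    // Called from the request handler; returns immediately.
    public bool Enqueue(Func<CancellationToken, Task> job) => _jobs.Writer.TryWrite(job);

    // Signals that no more jobs will arrive, so the worker loop can end.
    public void Complete() => _jobs.Writer.Complete();

    // Worker loop: awaits asynchronously while the queue is empty,
    // then runs jobs one at a time as they arrive.
    public async Task RunWorkerAsync(CancellationToken token)
    {
        await foreach (Func<CancellationToken, Task> job in _jobs.Reader.ReadAllAsync(token))
        {
            try
            {
                await job(token);
            }
            catch (Exception ex)
            {
                // A failed job should not kill the worker loop.
                Console.Error.WriteLine($"Job failed: {ex.Message}");
            }
        }
    }
}
```

Note that an in-memory queue like this loses pending jobs when the process restarts; a durable queue or Hangfire avoids that.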

Proper way to start and fire-and-forget asynchronous calls?

I have an async call (DoWorkAsync()) that I would like to start in a fire-and-forget way, i.e. I'm not interested in its result and would like the calling thread to continue even before the async method is finished.
What is the proper way to do this? I need this in both, .NET Framework 4.6 as well as .NET Core 2, in case there are differences.
public async Task<MyResult> DoWorkAsync(){...}
public void StarterA(){
Task.Run(() => DoWorkAsync());
}
public void StarterB(){
Task.Run(async () => await DoWorkAsync());
}
Is it one of those two or something different/better?
//edit: Ideally without any extra libraries.
What is the proper way to do this?
First, you need to decide whether you really want fire-and-forget. In my experience, about 90% of people who ask for this actually don't want fire-and-forget; they want a background processing service.
Specifically, fire-and-forget means:
You don't care when the action completes.
You don't care if there are any exceptions when executing the action.
You don't care if the action completes at all.
So the real-world use cases for fire-and-forget are astoundingly small. An action like updating a server-side cache would be OK. Sending emails, generating documents, or anything business related is not OK, because you would (1) want the action to be completed, and (2) get notified if the action had an error.
The vast majority of the time, people don't want fire-and-forget at all; they want a background processing service. The proper way to build one of those is to add a reliable queue (e.g., Azure Queue / Amazon SQS, or even a database), and have an independent background process (e.g., Azure Function / Amazon Lambda / .NET Core BackgroundService / Win32 service) processing that queue. This is essentially what Hangfire provides (using a database for a queue, and running the background process in-proc in the ASP.NET process).
Is it one of those two or something different/better?
In the general case, there's a number of small behavior differences when eliding async and await. It's not something you would want to do "by default".
However, in this specific case - where the async lambda is only calling a single method - eliding async and await is fine.
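If you decide you really do want fire-and-forget, one common pattern is a small helper that at least observes the task, so a failure gets logged rather than vanishing. This is only a sketch: the FireAndForgetExtensions class and the Forget method are names I made up, not framework APIs, and what you do in the error callback is up to you.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper; not part of the framework.
public static class FireAndForgetExtensions
{
    // Lets the caller continue immediately, but reports a failure
    // through 'onError' instead of dropping it silently.
    public static void Forget(this Task task, Action<Exception> onError)
    {
        // The returned task is intentionally not awaited.
        ObserveAsync(task, onError);
    }

    private static async Task ObserveAsync(Task task, Action<Exception> onError)
    {
        try
        {
            await task.ConfigureAwait(false);
        }
        catch (Exception ex)
        {
            onError(ex);
        }
    }
}
```

Usage would look like Task.Run(() => DoWorkAsync()).Forget(ex => Console.Error.WriteLine(ex)); and should behave the same on .NET Framework 4.6 and .NET Core 2.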
It depends on what you mean by proper :)
For instance: are you interested in the exceptions being thrown in your "fire and forget" calls? If not, then this is sort of fine. What you might need to think about, though, is the environment in which the task lives.
For instance, if this is an ASP.NET application and you do this inside the lifetime of a thread instantiated due to a call to an .aspx or .svc, the Task becomes a background thread of that (foreground) thread. The foreground thread might get cleaned up by the application pool before your "fire and forget" task is completed.
So also think about in which thread your tasks live.
I think this article gives you some useful information on that:
https://www.hanselman.com/blog/HowToRunBackgroundTasksInASPNET.aspx
Also note that if you do not return a value in your Tasks, a task will not return exception info. Source for that is the ref book for microsoft exam 70-483
There is probably a free version of that online somewhere ;P https://www.amazon.com/Exam-Ref-70-483-Programming-C/dp/0735676828
It may also be useful to know that if you have an async method being called by a non-async method and you wish to know its result, you can use .GetAwaiter().GetResult(). Be aware that this blocks the calling thread until the result is available.
Also I think it is important to note the difference between async and multi-threading.
Async is only useful if there are operations that use parts of the computer other than the CPU, such as networking or I/O operations. Using async then tells the system to go ahead and use CPU power somewhere else instead of "blocking" that thread in the CPU just to wait for a response.
Multi-threading is the allocation of operations on different threads in a CPU (for instance, creating a task which creates a background thread of the foreground thread... foreground threads being the threads that make up your application, they are primary, background threads exist linked to foreground threads. If you close the linked foreground thread, the background thread closes as well).
This allows the CPU to work on different tasks at the same time.
Combining these two makes sure the CPU does not get blocked up on just 4 threads if it is a 4-thread CPU, but can open more while it waits for async tasks that are waiting for I/O operations.
I hope this gives you the information needed to do whatever it is you are doing :)

The "bag of tasks" concept in C#, enqueue,pause,cancel logical tasks

The app I'm developing is composed this way:
A producer task scans the file system for text files and puts a reference to them in a bag.
Many consumer tasks take file refs from the bag concurrently and read the files (and do some short work with their content).
I must be able to pause and resume the whole process.
I've tried using TPL, creating a task for every file ref as they are put in the bag (in this case the bag is just a concept; the producer directly creates the consumer tasks as it finds files), but this way I don't have control over the tasks I create. I can't (or I don't know how to) pause them. I can write some code to suspend the thread currently executing the task, but that would ruin the point of working with logical tasks instead of manually creating threads, wouldn't it? I would want something like "tasks already assigned to a physical thread can complete, but waiting logical tasks should not start until the resume command".
How can I achieve this? Can it be done with TPL or should I use something else?
EDIT:
Your answers are all valid but my main doubt remains unanswered. We are talking about tasks: if I use TPL, my producer and my many consumers will be tasks (right?), not threads (well, OK, at the moment of execution tasks will be mapped onto threads). Every synchronization mechanism I've found (like the one proposed in the comment, "ManualResetEventSlim") works at thread level.
E.g. the description of the Wait() method of "ManualResetEventSlim" is "Blocks the current thread until the current ManualResetEventSlim is set."
My knowledge of tasks is purely academic. I don't know how things work in the "real world", but it seems logical to me that I need a way to coordinate (wait/signal/...) tasks at task level or things could get weird... like... two tasks may be mapped onto the same thread, but one was supposed to signal the other that was waiting, and then deadlock. I'm a bit confused. This is why I asked if my app could use TPL instead of old-style simple threads.
Yes, you can do that. First, you have a main thread, your application. There you have two workers, represented by threads. The first worker would be a producer and the second worker would be a consumer.
When your application starts, you start the workers. Both of them operate on the concurrent collection, the bag. The producer searches for files and puts references into the bag, and the consumer takes references from the bag and starts a task per reference.
When you want to signal pause, simply pause the producer. If you do that, the consumer also stops working once there is nothing in the bag. If this is not the desired behaviour, you can simply define that pausing the producer also clears the bag: back up your bag first and then clear it. This way all running tasks will finish their job and the consumer will not start new tasks, but it can still run and wait for the results.
EDIT:
Based on your edit: I don't know how to achieve it the way you want, but although it is nice to try new technologies, don't let that cloud your judgment. Using a ThreadPool is also a nice thing. It will take more time to start the application, but once it is running, consuming will be faster, because you already have workers ready.
It is not a bad idea: you can specify a maximum number of workers. If you create a task for every item in the bag, it will be more memory-consuming because you will keep allocating and releasing memory. This will not happen with a ThreadPool.
Sure, you can use TPL for this, and maybe also Reactive Extensions and LINQ to simplify grouping and pausing/resuming the thread work.
If you have just a short job on each file, it is a pretty good idea not to disturb the handler function with cancellations. You can just suspend queueing the workers instead.
I imagine something like this:
Your directory scanner thread puts the found files into an observable collection.
The consumer thread subscribes to the collection changes, gets/removes the files and assigns them to workers.
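As a rough sketch of the "suspend the workers, not the handler" idea, here is one way to do it with a BlockingCollection as the bag and a ManualResetEventSlim as the pause gate. The PausableBag class and its members are names invented for this example. The consumers are started with the LongRunning hint, which in current .NET versions gives each one a dedicated thread, so blocking that thread on the gate should not starve the thread pool. An item that is already being handled runs to completion; an item a consumer has just taken is held at the gate until Resume is called.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper around the "bag"; names are illustrative only.
public sealed class PausableBag
{
    private readonly BlockingCollection<string> _bag = new BlockingCollection<string>();
    private readonly ManualResetEventSlim _gate = new ManualResetEventSlim(true); // set = running

    public void Add(string fileRef) => _bag.Add(fileRef);  // called by the producer
    public void CompleteAdding() => _bag.CompleteAdding(); // producer has finished scanning
    public void Pause() => _gate.Reset();
    public void Resume() => _gate.Set();

    // Starts 'consumerCount' consumers; the returned task completes when the
    // bag is marked complete and every item has been handled.
    public Task RunConsumersAsync(int consumerCount, Action<string> handle)
    {
        var consumers = new Task[consumerCount];
        for (int i = 0; i < consumerCount; i++)
        {
            consumers[i] = Task.Factory.StartNew(() =>
            {
                foreach (string fileRef in _bag.GetConsumingEnumerable())
                {
                    _gate.Wait();    // while paused, hold the item here
                    handle(fileRef); // short work on one file
                }
            }, CancellationToken.None, TaskCreationOptions.LongRunning, TaskScheduler.Default);
        }
        return Task.WhenAll(consumers);
    }
}
```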

.NET Task Parallel Library

I have read the documentation and many tutorials on TPL, but none covers the model I want to achieve.
There were always fixed number of iterations for some algorithm.
I need constantly running threads (as many as possible):
while(true)
get data from MAIN thread
perform heavy time-consuming task (in separate thread)
update MAIN thread information
Additionally, I need a mechanism which will be able to set an alarm clock (e.g. 5 seconds). After five seconds all work must be suspended for a while and then resumed.
Should I use Task.ContinueWith the same task? But I am not processing result of previous task launch, but instead I update data structure in MAIN Thread and then decide what will be the input of new task iteration...
How can I leave to TPL decision how many task should be created for best efficiency?
Now I am using BackgroundWorkers, because they have a nice RunWorkerCompleted event - inside it I am on my main thread, so I can update my MAIN structure, check time constraints and then eventually call RunWorkerAsync again on the BackgroundWorker which completed. It is nice and clear, but probably very inefficient.
I need to make it highly efficient on multi-processor, multi-core servers.
One problem is that computation is always online, never stops. There is some networking also, which enables to ask remotely of current state of MAIN structure.
Second problem is critical time control (I must have a precise timer - once it fires, no thread can be restarted). Then comes a special high-priority task; after it ends, all work is resumed.
Third problem is that there is no upper bound for operations to do.
These three constraints, from what I observed, do not go along TPL well - I can't use something like Parallel.For because the collection is modified by results of task itself in realtime...
I don't know also how to combine:
ability to let TPL decide how many threads should be created
with sort of lifetime runing of threads (with pauses and synchronization points between consecutive restarts)
creating threads only once at the begining (they should be only restarted with constantly new parameters)
Can someone give me clues?
I know how to do it the bad, inefficient way. There are some small requirements which I described, which prevent me from doing this right. I am a little bit confused.
You need to use messaging + actors + a scheduler imo. And then you need to use a language capable for it. Have a look at this code that asynchronously receives from Azure Service Bus, enqueues in a shared queue and manages runtime state through an actor.
Inline:
Should I use Task.ContinueWith the same task?
No, ContinueWith will get your program killed based on exception handling inside of each continuation passing; there's no good way in TPL to marshal failed state into the call-side/main thread.
But I am not processing result of previous task launch, but
instead I update data structure in
MAIN Thread and then decide what will be the input of new task
iteration...
You need to move beyond threading for this, unless you're willing to spend A LOT of time on the problem.
How can I leave to TPL decision how many task should be created for
best efficiency?
That's handled by the framework that runs your async workflows.
Now I am using BackgroundWorkers, because they have a nice
RunWorkerCompleted event - inside it I am on my main thread so I can
update my MAIN structure, check time constraints and then eventually
call RunWorkerAsync again on the BackgroundWorker which completed. It is
nice and clear, but probably very inefficient. I need to make it
highly efficient on multi-processor, multi-core servers.
One problem is that computation is always online, never stops. There
is some networking also, which enables to ask remotely of current
state of MAIN structure. Second problem is critical time control (I
must have precise timer - when it stops which no thread can be
restarted).
If you run everything asynchronously, you can pass a message to your actor that suspends it. Your scheduling actor is responsible for calling all its subscribers with their scheduled messages; have a look at the paused state in the code linked. If you have outstanding requests you can pass them a cancellation token and handle a 'hard' cancellation/socket abort that way.
Then comes special high priority task after it ends, all
work is resumed. These two constraints, from what I observed, do not
go along TPL well - I can't use something like Parallel.For because
the collection is modified by results of task itself in realtime...
You probably need a pattern called pipes-and-filters. You pipe your input into a chain of workers (actors); each worker consumes from the other worker's output. Signalling is done using a control channel (in my case that is the inbox of the actor).
I think you should read
MSDN: How to implement a producer / consumer dataflow pattern
I had the same problem: one producer produced items, while several consumers consumed them and decided to send them to other consumers. Each consumer was working asynchronously and independent from other consumers.
Your main task is the producer. He produces items that your other tasks should process. The class with the code of your main task has a function:
public async Task ProduceOutputAsync(...)
Your main program starts this Task using:
var producerTask = Task.Run(() => MyProducer.ProduceOutputAsync(...));
Once this is called the producer task starts producing output. Meanwhile your main program can continue doing other things, like for instance start the consumers.
But let's first focus on the Producer task.
The producer task produces items of type T to be processed by other tasks. They are carried over to the other tasks using objects that implement ITargetBlock<T>.
Every time the producer task has finished creating an object of type T, it sends it to the target block using Post, or preferably the async version, SendAsync:
while (continueProducing())
{
    T product = await CreateProduct(...);
    // SendAsync returns false if the target block declined the item
    bool accepted = await this.TargetBlock.SendAsync(product);
    // process the return value
}
// if here, nothing to produce anymore. Notify the consumers:
this.TargetBlock.Complete();
The producer needs an ITargetBlock<T>. In my application a BufferBlock<T> was enough. Check MSDN for the other possible targets.
Anyway, the data flow block should also implement ISourceBlock<T>. Your receiver waits for input to arrive at the source, fetches it and processes it. Once finished, it can send the result to its own target block, and wait for the next input until there is no input expected anymore. Of course if your consumer doesn't produce output it doesn't have to send anything to a target.
Waiting for input is done as follows:
ISourceBlock<T> mySource = ...;
// OutputAvailableAsync returns false once the source is completed and empty
while (await mySource.OutputAvailableAsync())
{   // an object of type T is available at the source
    T objectToProcess = await mySource.ReceiveAsync();
    // keep in mind that someone else might have fetched your object
    // so only process it if you've got it.
    if (objectToProcess != null)
    {
        await ProcessAsync(objectToProcess);
        // if your processing produces output send the output to your target:
        var myOutput = await ProduceOutput(objectToProcess);
        await myTarget.SendAsync(myOutput);
    }
}
// if here, no input expected anymore, notify my consumers:
myTarget.Complete();
construct your producer
construct all consumers
give the producer a BufferBlock to send its output to
Start the producer MyProducer.ProduceOutputAsync(...)
While the producer produces output and sends it to the buffer block:
give the consumers the same BufferBlock
Start the consumers as a separate task
await Task.WhenAll(...) to wait for all tasks to complete.
Each consumer will stop as soon as it hears that no input is expected anymore.
After all tasks have completed your main function can read the results and return
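Putting the steps above together, a compact end-to-end sketch could look like this. Everything specific in it is an assumption made for illustration: the BufferBlockPipeline class, the int payload, and "squaring" as a stand-in for the real processing. It uses TryReceive after OutputAvailableAsync, so a consumer that loses the race for an item simply goes back to waiting. On .NET Framework you may need to add the System.Threading.Tasks.Dataflow NuGet package.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical demo class; names and payload type are illustrative only.
public static class BufferBlockPipeline
{
    // Sends 1..itemCount to a shared BufferBlock, lets 'consumerCount'
    // consumers square them, and returns the results in ascending order.
    public static async Task<int[]> RunAsync(int itemCount, int consumerCount)
    {
        var buffer = new BufferBlock<int>();
        var results = new ConcurrentBag<int>();

        Task producer = Task.Run(async () =>
        {
            for (int i = 1; i <= itemCount; i++)
            {
                await buffer.SendAsync(i);
            }
            buffer.Complete(); // tell the consumers no more input will come
        });

        Task[] consumers = Enumerable.Range(0, consumerCount)
            .Select(_ => Task.Run(async () =>
            {
                // Returns false once the buffer is completed and empty.
                while (await buffer.OutputAvailableAsync())
                {
                    // Another consumer may have taken the item first,
                    // so only process what TryReceive actually hands over.
                    while (buffer.TryReceive(out int item))
                    {
                        results.Add(item * item);
                    }
                }
            }))
            .ToArray();

        await producer;
        await Task.WhenAll(consumers);
        return results.OrderBy(x => x).ToArray();
    }
}
```

For example, await BufferBlockPipeline.RunAsync(4, 2) returns the array 1, 4, 9, 16 regardless of which consumer handled which item.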

C# lower thread priority in thread pool

I have several low-importance tasks to be performed when some CPU time is available. I don't want these tasks to run if other, more important tasks are running. I.e., if a normal/high-priority task comes in, I want the low-importance task to pause until the important task is done.
There is a pretty big number of low-importance tasks to be performed (50 to 1000), so I don't want to create one thread per task. However, I believe that the thread pool does not allow any priority specification, does it?
How would you solve this?
You can new up a Thread and use a Dispatcher to send it tasks of various priorities.
The priorities are a bit UI-centric but that doesn't really matter.
You shouldn't mess with the priority of the regular ThreadPool, since you aren't the only consumer. I suppose the logical approach would be to write your own - perhaps as simple as a producer/consumer queue, using your own Thread(s) as the consumer(s) - setting the thread priority yourself.
.NET 4.0 includes new libraries (the TPL etc) to make all this easier - until then you need additional code to create a custom thread pool or work queue.
When you are using the built-in ThreadPool, all threads execute with the default priority. If you mess with this setting it will be ignored. This is a case where you should roll your own ThreadPool. A few years ago I extended the SmartThreadPool to meet my needs. This may satisfy yours as well.
I'd create a shared Queue of pending task objects, with each object specifying its priority. Then write a dispatcher thread that watches the Queue and launches a new thread for each task, up to some max thread limit, specifying the thread priority as it creates it. It's only a small amount of work to do that, and you can have the dispatcher report activity and even dynamically adjust the number of running threads. That concept has worked very well for me, and can be wrapped in a Windows service to boot if you make your queue a database table.
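A stripped-down version of the "own thread plus a queue" approach might look like the sketch below: one dedicated thread at BelowNormal priority draining a BlockingCollection, so 50 to 1000 small jobs share a single thread and the regular ThreadPool is left alone. The LowPriorityWorker class is a made-up name. Keep in mind that thread priority only influences the OS scheduler when the CPU is contended, and how strongly it is honored depends on the platform, so this is closer to "yield to more important work" than a hard pause.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical single-threaded low-priority work queue.
public sealed class LowPriorityWorker : IDisposable
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();
    private readonly Thread _thread;

    public LowPriorityWorker()
    {
        _thread = new Thread(() =>
        {
            // Blocks while the queue is empty; ends after CompleteAdding.
            foreach (Action job in _queue.GetConsumingEnumerable())
            {
                try
                {
                    job();
                }
                catch (Exception ex)
                {
                    // One failing job should not stop the worker.
                    Console.Error.WriteLine(ex);
                }
            }
        })
        {
            IsBackground = true,
            Priority = ThreadPriority.BelowNormal
        };
        _thread.Start();
    }

    public void Enqueue(Action job) => _queue.Add(job);

    // Stops accepting jobs, waits for the queued ones to finish, then cleans up.
    public void Dispose()
    {
        _queue.CompleteAdding();
        _thread.Join();
        _queue.Dispose();
    }
}
```

If one thread is not enough, the same class could start a few consumer threads over the same queue, still far fewer than one thread per task.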
