I have the following problem: I need to run a function some time after the HTTP request has been processed.
A user can assign themselves to a task; after 45 minutes I have to check whether the task is done. If not, I have to reopen the task for others.
I have tried the following code:
[HttpPost]
[ActionName("addJob")]
public string AddJob([FromBody] Task task)
{
    // Add task ...
    RemoveTaskAfterTime(task);
    return "Job has been added";
}

private async Task RemoveTaskAfterTime(Task task)
{
    System.Diagnostics.Debug.WriteLine("started to wait");
    await Task.Delay(5000);
    System.Diagnostics.Debug.WriteLine("remove task");
}
For some reason, "started to wait" gets printed but "remove task" does not. It works with Thread.Sleep, but in that case the response also takes 45 minutes, so that's no solution.
Would be awesome if somebody could help me!
Thank you in advance
I suppose that the problem is in the Task.Delay that was used.
Task.Delay should be used in async methods
45 minutes is a long time to keep a wait in memory (although it is possible). What would happen to the jobs waiting in memory if the service (app pool, server, whatever) is restarted?
You can use the database to mark jobs as waiting in the AddJob method. Store the time at which the job started waiting so that its age can be checked later.
Then you can use a BackgroundService to check the age of all waiting jobs. You can run this check every minute (for example): find jobs that have been waiting for more than 45 minutes and release them (set the job status to available).
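The release step can be sketched roughly as follows. This is a minimal in-memory illustration, not a definitive implementation: the Job class, the JobStatus enum and the JobSweeper helper are hypothetical names, and in a real application the list would be a database query run from the background service once a minute.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical job record; in the design above this would be a database row.
public enum JobStatus { Available, Waiting, Done }

public class Job
{
    public int Id { get; set; }
    public JobStatus Status { get; set; }
    public DateTime? WaitingSinceUtc { get; set; }
}

public static class JobSweeper
{
    // Releases every job that has been waiting longer than maxAge
    // and returns how many jobs were released.
    public static int ReleaseStaleJobs(IEnumerable<Job> jobs, DateTime nowUtc, TimeSpan maxAge)
    {
        // Materialize the matches first so we do not mutate while filtering.
        var stale = jobs
            .Where(j => j.Status == JobStatus.Waiting
                     && j.WaitingSinceUtc.HasValue
                     && nowUtc - j.WaitingSinceUtc.Value > maxAge)
            .ToList();

        foreach (var job in stale)
        {
            job.Status = JobStatus.Available;
            job.WaitingSinceUtc = null;
        }
        return stale.Count;
    }

    public static void Main()
    {
        var now = DateTime.UtcNow;
        var jobs = new List<Job>
        {
            new Job { Id = 1, Status = JobStatus.Waiting, WaitingSinceUtc = now.AddMinutes(-50) },
            new Job { Id = 2, Status = JobStatus.Waiting, WaitingSinceUtc = now.AddMinutes(-10) },
        };
        // Only job 1 is older than 45 minutes.
        Console.WriteLine(ReleaseStaleJobs(jobs, now, TimeSpan.FromMinutes(45)));
    }
}
```

Because the waiting start time is stored with the job, a restart of the service loses nothing: the next sweep simply picks up where the previous one left off.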
Your problem is one of scope.
You probably haven't given any thought to this but AddJob is an instance method defined on a class. IIS handles the HTTP request by instantiating an object and calling the method. The child thread on which the Task runs is killed when the instance is disposed, because background threads are killed when all foreground threads of their owner are terminated. This is why your task starts but doesn't end.
If you want the Task to survive the object handling the request then you could make the task and its lifecycle management static. Of course that would not suit a server accepting any number of potentially concurrent requests, so the static Task would have to be a collection of Task into which you put the task object. We just introduced concurrency issues so you will need a thread-safe queue.
As soon as you start doing this sort of thing you take on responsibility for the object lifecycle, because it won't be garbage collected until you remove it from the collection.
You need a background process that periodically checks the time in queue for each of these objects and when they reach the required age the process should de-queue them and do whatever is supposed to happen when they reach the required age. This means you need to record the age of each task. You dequeue each task, check whether it's ripe and either process it or re-queue it.
Frankly I wouldn't use a Task object, I would create a class with properties for the housekeeping details and method implementing the behaviours. This is a combination of the Memento and Command design patterns.
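A minimal sketch of that idea, assuming an in-memory ConcurrentQueue (so it does not yet survive restarts). PendingCheck and PendingCheckQueue are hypothetical names; the class carries the housekeeping data (when it was enqueued) together with the behaviour to run once it is old enough.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical work item: housekeeping data plus the behaviour to run when ripe.
public class PendingCheck
{
    public DateTime EnqueuedUtc { get; }
    public Action OnExpired { get; }

    public PendingCheck(DateTime enqueuedUtc, Action onExpired)
    {
        EnqueuedUtc = enqueuedUtc;
        OnExpired = onExpired;
    }

    public bool IsRipe(DateTime nowUtc, TimeSpan requiredAge) =>
        nowUtc - EnqueuedUtc >= requiredAge;
}

public class PendingCheckQueue
{
    private readonly ConcurrentQueue<PendingCheck> _queue = new ConcurrentQueue<PendingCheck>();

    public void Enqueue(PendingCheck item) => _queue.Enqueue(item);

    // One sweep: inspect each item that was queued when the sweep started,
    // run the ripe ones and re-queue the rest. Returns the number processed.
    public int Sweep(DateTime nowUtc, TimeSpan requiredAge)
    {
        int processed = 0;
        int toInspect = _queue.Count;
        for (int i = 0; i < toInspect; i++)
        {
            if (!_queue.TryDequeue(out PendingCheck item)) break;
            if (item.IsRipe(nowUtc, requiredAge))
            {
                item.OnExpired();
                processed++;
            }
            else
            {
                _queue.Enqueue(item); // not old enough yet, put it back
            }
        }
        return processed;
    }
}

public static class PendingCheckDemo
{
    public static void Main()
    {
        var queue = new PendingCheckQueue();
        var now = DateTime.UtcNow;
        queue.Enqueue(new PendingCheck(now.AddMinutes(-46), () => Console.WriteLine("reopen task")));
        queue.Enqueue(new PendingCheck(now.AddMinutes(-5), () => Console.WriteLine("should not run yet")));
        queue.Sweep(now, TimeSpan.FromMinutes(45));
    }
}
```

The background process would call Sweep on a timer. Swapping the ConcurrentQueue for a persistent queue changes where the items live, not the shape of the loop.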
As mentioned in another answer in a robust solution your tasks will survive server restarts. You can achieve this using Memento/Command and a persistent message queue in place of the memory queue. On Windows MSMQ is available for free. An advantage of this way is MSMQ takes over responsibility for thread safety in queue management.
To use an external message queue you will need to learn about (de)serialisation. Another answer uses a database server rather than a message queue to persist the serialised messages; this does work, but it does not scale well. Purpose-built message queues rely on a bunch of assumptions that can't be made in a general-purpose database engine, and this allows them to handle unplanned outages much more robustly and to handle much higher levels of concurrency (or stress your server less for a given level of traffic).
Your controller action has to return Task<string> and be marked with async. Asynchronous methods used in the body of your action have to be awaited.
However, async/await is meant for shorter waits, usually network requests (eg. database or network service), not for 45 minute tasks. Client's browser connection will hit timeout in 1-2 minutes.
[HttpPost]
[ActionName("addJob")]
public async Task<string> AddJob([FromBody] Task task)
{
    // Add task ...
    await RemoveTaskAfterTime(task);
    return "Job has been added";
}

private async Task RemoveTaskAfterTime(Task task)
{
    System.Diagnostics.Debug.WriteLine("started to wait");
    await Task.Delay(5000);
    System.Diagnostics.Debug.WriteLine("remove task");
}
I thought that they were basically the same thing — writing programs that split tasks between processors (on machines that have 2+ processors). Then I'm reading this, which says:
Async methods are intended to be non-blocking operations. An await
expression in an async method doesn’t block the current thread while
the awaited task is running. Instead, the expression signs up the rest
of the method as a continuation and returns control to the caller of
the async method.
The async and await keywords don't cause additional threads to be
created. Async methods don't require multithreading because an async
method doesn't run on its own thread. The method runs on the current
synchronization context and uses time on the thread only when the
method is active. You can use Task.Run to move CPU-bound work to a
background thread, but a background thread doesn't help with a process
that's just waiting for results to become available.
and I'm wondering whether someone can translate that to English for me. It seems to draw a distinction between asynchronicity (is that a word?) and threading and imply that you can have a program that has asynchronous tasks but no multithreading.
Now I understand the idea of asynchronous tasks such as the example on pg. 467 of Jon Skeet's C# In Depth, Third Edition
async void DisplayWebsiteLength ( object sender, EventArgs e )
{
label.Text = "Fetching ...";
using ( HttpClient client = new HttpClient() )
{
Task<string> task = client.GetStringAsync("http://csharpindepth.com");
string text = await task;
label.Text = text.Length.ToString();
}
}
The async keyword means "This function, whenever it is called, will not be called in a context in which its completion is required for everything after its call to be called."
In other words, writing it in the middle of some task
int x = 5;
DisplayWebsiteLength();
double y = Math.Pow((double)x,2000.0);
, since DisplayWebsiteLength() has nothing to do with x or y, will cause DisplayWebsiteLength() to be executed "in the background", like
processor 1 | processor 2
-------------------------------------------------------------------
int x = 5; | DisplayWebsiteLength()
double y = Math.Pow((double)x,2000.0); |
Obviously that's a stupid example, but am I correct or am I totally confused or what?
(Also, I'm confused about why sender and e aren't ever used in the body of the above function.)
Your misunderstanding is extremely common. Many people are taught that multithreading and asynchrony are the same thing, but they are not.
An analogy usually helps. You are cooking in a restaurant. An order comes in for eggs and toast.
Synchronous: you cook the eggs, then you cook the toast.
Asynchronous, single threaded: you start the eggs cooking and set a timer. You start the toast cooking, and set a timer. While they are both cooking, you clean the kitchen. When the timers go off you take the eggs off the heat and the toast out of the toaster and serve them.
Asynchronous, multithreaded: you hire two more cooks, one to cook eggs and one to cook toast. Now you have the problem of coordinating the cooks so that they do not conflict with each other in the kitchen when sharing resources. And you have to pay them.
Now does it make sense that multithreading is only one kind of asynchrony? Threading is about workers; asynchrony is about tasks. In multithreaded workflows you assign tasks to workers. In asynchronous single-threaded workflows you have a graph of tasks where some tasks depend on the results of others; as each task completes it invokes the code that schedules the next task that can run, given the results of the just-completed task. But you (hopefully) only need one worker to perform all the tasks, not one worker per task.
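The second case of the analogy can be sketched in code. This is only an illustration: Kitchen and CookAsync are made-up names, and Task.Delay stands in for the timers. Both dishes are started before either is awaited, and no thread is dedicated to a dish while its timer is running.

```csharp
using System;
using System.Threading.Tasks;

public static class Kitchen
{
    // Stand-in for work that waits on a timer rather than on a CPU.
    private static async Task<string> CookAsync(string dish, int milliseconds)
    {
        await Task.Delay(milliseconds);
        return dish + " ready";
    }

    public static async Task<string[]> MakeBreakfastAsync()
    {
        Task<string> eggs = CookAsync("eggs", 300);   // start the eggs, do not wait
        Task<string> toast = CookAsync("toast", 100); // start the toast, do not wait

        // The caller is free to do other work here ("clean the kitchen").

        // Resume once both timers have gone off; results come back
        // in the order the tasks were passed in.
        return await Task.WhenAll(eggs, toast);
    }

    public static async Task Main()
    {
        string[] plates = await MakeBreakfastAsync();
        Console.WriteLine(string.Join(", ", plates)); // prints "eggs ready, toast ready"
    }
}
```

The whole breakfast takes roughly as long as the slower dish, not the sum of both, even though no extra cooks were hired.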
It will help to realize that many tasks are not processor-bound. For processor-bound tasks it makes sense to hire as many workers (threads) as there are processors, assign one task to each worker, assign one processor to each worker, and have each processor do the job of nothing else but computing the result as quickly as possible. But for tasks that are not waiting on a processor, you don't need to assign a worker at all. You just wait for the message to arrive that the result is available and do something else while you're waiting. When that message arrives then you can schedule the continuation of the completed task as the next thing on your to-do list to check off.
So let's look at Jon's example in more detail. What happens?
Someone invokes DisplayWebsiteLength. Who? We don't care.
It sets a label, creates a client, and asks the client to fetch something. The client returns an object representing the task of fetching something. That task is in progress.
Is it in progress on another thread? Probably not. Read Stephen's article on why there is no thread.
Now we await the task. What happens? We check to see if the task has completed between the time we created it and we awaited it. If yes, then we fetch the result and keep running. Let's suppose it has not completed. We sign up the remainder of this method as the continuation of that task and return.
Now control has returned to the caller. What does it do? Whatever it wants.
Now suppose the task completes. How did it do that? Maybe it was running on another thread, or maybe the caller that we just returned to allowed it to run to completion on the current thread. Regardless, we now have a completed task.
The completed task asks the correct thread -- again, likely the only thread -- to run the continuation of the task.
Control passes immediately back into the method we just left at the point of the await. Now there is a result available so we can assign text and run the rest of the method.
It's just like in my analogy. Someone asks you for a document. You send away in the mail for the document, and keep on doing other work. When it arrives in the mail you are signalled, and when you feel like it, you do the rest of the workflow -- open the envelope, pay the delivery fees, whatever. You don't need to hire another worker to do all that for you.
In-browser JavaScript is a great example of an asynchronous program that has no multithreading.
You don't have to worry about multiple pieces of code touching the same objects at the same time: each function will finish running before any other javascript is allowed to run on the page. (Update: Since this was written, JavaScript has added async functions and generator functions. These functions do not always run to completion before any other javascript is executed: whenever they reach a yield or await keyword, they yield execution to other javascript, and can continue execution later, similar to C#'s async methods.)
However, when doing something like an AJAX request, no code is running at all, so other javascript can respond to things like click events until that request comes back and invokes the callback associated with it. If one of these other event handlers is still running when the AJAX request gets back, its handler won't be called until they're done. There's only one JavaScript "thread" running, even though it's possible for you to effectively pause the thing you were doing until you have the information you need.
In C# applications, the same thing happens any time you're dealing with UI elements--you're only allowed to interact with UI elements when you're on the UI thread. If the user clicked a button, and you wanted to respond by reading a large file from the disk, an inexperienced programmer might make the mistake of reading the file within the click event handler itself, which would cause the application to "freeze" until the file finished loading because it's not allowed to respond to any more clicking, hovering, or any other UI-related events until that thread is freed.
One option programmers might use to avoid this problem is to create a new thread to load the file, and then tell that thread's code that when the file is loaded it needs to run the remaining code on the UI thread again so it can update UI elements based on what it found in the file. Until recently, this approach was very popular because it was what the C# libraries and language made easy, but it's fundamentally more complicated than it has to be.
If you think about what the CPU is doing when it reads a file at the level of the hardware and Operating System, it's basically issuing an instruction to read pieces of data from the disk into memory, and to hit the operating system with an "interrupt" when the read is complete. In other words, reading from disk (or any I/O really) is an inherently asynchronous operation. The concept of a thread waiting for that I/O to complete is an abstraction that the library developers created to make it easier to program against. It's not necessary.
Now, most I/O operations in .NET have a corresponding ...Async() method you can invoke, which returns a Task almost immediately. You can add callbacks to this Task to specify code that you want to have run when the asynchronous operation completes. You can also specify which thread you want that code to run on, and you can provide a token which the asynchronous operation can check from time to time to see if you decided to cancel the asynchronous task, giving it the opportunity to stop its work quickly and gracefully.
Until the async/await keywords were added, C# was much more obvious about how callback code gets invoked, because those callbacks were in the form of delegates that you associated with the task. In order to still give you the benefit of using the ...Async() operation, while avoiding complexity in code, async/await abstracts away the creation of those delegates. But they're still there in the compiled code.
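To make the comparison concrete, here is a small sketch of the two styles side by side. FetchLengthAsync is a hypothetical stand-in for any library ...Async() method, and the real compiler output is a state machine rather than a literal ContinueWith call, so treat this as an approximation of the idea only.

```csharp
using System;
using System.Threading.Tasks;

public static class CallbackVsAwait
{
    // Hypothetical stand-in for a library method that returns a Task.
    private static async Task<int> FetchLengthAsync()
    {
        await Task.Delay(50);
        return 42;
    }

    // Explicit callback style: attach a delegate that runs when the task completes.
    public static Task<string> DescribeWithCallback()
    {
        // t is already complete inside the continuation, so t.Result does not block.
        return FetchLengthAsync().ContinueWith(t => "length " + t.Result);
    }

    // async/await style: the compiler generates the continuation plumbing.
    public static async Task<string> DescribeWithAwait()
    {
        int length = await FetchLengthAsync();
        return "length " + length;
    }

    public static async Task Main()
    {
        Console.WriteLine(await DescribeWithCallback());
        Console.WriteLine(await DescribeWithAwait());
    }
}
```

Both methods produce the same string; the second one just reads like ordinary sequential code.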
So you can have your UI event handler await an I/O operation, freeing up the UI thread to do other things, and more-or-less automatically returning to the UI thread once you've finished reading the file--without ever having to create a new thread.
I have a .NET 5 console app which processes 100,000+ messages per minute from RabbitMQ.
When a message is received from RabbitMQ, a thread goes off and crunches some numbers; however, one of those operations is to call an external API to get information about its location.
When this external API service slows down and response times go up, I see thread starvation: the thread count in Windows Task Manager can get into the thousands and the app basically slows to doing nothing.
When the app loads the main thread establishes a connection to rabbitmq and subscribes to new messages arriving in the rabbitmq, and every time a message arrives, my console app consumes each message, and starts a threadpool item and continues getting new rabbitmq messages
private void Consumer_Received(object sender, BasicDeliverEventArgs deliveryArgs)
{
var data = Encoding.UTF8.GetString(deliveryArgs.Body.ToArray());
ThreadPool.QueueUserWorkItem(new WaitCallback(StartProcessing), data);
}
If I put a breakpoint, this void keeps being hit and a new threadpool process calls the StartProcessing void which is where the cpu crunching happens and the external api call
public void StartProcessing(object xdata)
{
//1. crunch cpu
//2. call external API
}
Each message is processed in around 100ms for the CPU stuff, and the external API takes between 80 and 500ms on a normal day, but when there are issues (possibly network) it can take up to 10 seconds to respond to one request. This is when the app starts to break.
My question is about this implementation and how to stop thread starvation.
This is a high throughput multithreaded app and it needs to process as many messages as possible.
The app needs to relieve back pressure when the external API is slow to respond and its constantly context switching threads.
Is using ThreadPool.QueueUserWorkItem the correct implementation or should I be using Async await etc?
I'm also open to hearing if this is a bad implementation and if there is another pattern I should be using for this.
//////////////////////////////////
UPDATE 1
//////////////////////////////////
So I changed the code to use async Task and it's super slow to get messages from RabbitMQ.
The old code got all messages (200,000) within a few seconds, the new code got through about 1,000 in a few minutes
the new code is
private void Consumer_Received(object sender, BasicDeliverEventArgs deliveryArgs)
{
StartProcessing(deliveryArgs.Body.ToArray()).ConfigureAwait(false);
}
public static async Task<bool> StartProcessing(ReadOnlyMemory<byte> data)
{
await Task.Run(() =>
{
ReadOnlySpan<byte> xdata = data.Span; //defensiveCopy of in memory pointer
//do stuff
}).ConfigureAwait(false);
return true;
}
Is there something wrong with my implementation?
The "StartProcessing" code should really be fire and forget, as the main thread should continue to the next message in RabbitMQ.
It seems like it's waiting for the message to be processed before continuing.
//////////////////////////////////
It sounds like this is the exact scenario asynchronous functions were made for.
If you are CPU-bound, using background threads will help you a bit, but only up to however many hardware threads you have.
But it sounds like you are mostly blocking on network IO. Using a thread that is just blocked until some kind of IO responds is quite wasteful, since each thread consumes some resources. And it can easily result in problems like maxing out the thread pool.
By now, .NET and many libraries have been updated to provide true asynchronous functions for IO. This releases the thread to do other stuff instead of blocking, and when the IO is done the remaining work is scheduled on a background thread. Using async/await lets you write the code more or less as you would regular synchronous code, letting the compiler rewrite it into a state machine to deal with the complicated issue of maintaining state. Ideally, you should not need more threads than the number of hardware threads you have, since each thread should be doing actual work.
Keep in mind that just because there is an async method returning a task, it does not necessarily mean it is truly asynchronous. Some base classes/interfaces, like Stream, have been extended with asynchronous versions. And some library vendors, rather than doing the work of providing an actual asynchronous implementation, just wrap the synchronous method, providing no real benefit.
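A rough sketch of the difference, using made-up names (AsyncWrappers, BlockingCall) and Thread.Sleep / Task.Delay as stand-ins for a blocking and a non-blocking wait. Both methods return a Task, but only the second frees the thread while waiting.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncWrappers
{
    // Synchronous stand-in for a blocking call (hypothetical).
    private static int BlockingCall()
    {
        Thread.Sleep(100); // the calling thread is blocked for the whole wait
        return 1;
    }

    // "Async over sync": returns a Task, but a thread pool thread
    // is still blocked for the full 100 ms.
    public static Task<int> WrappedAsync() => Task.Run(() => BlockingCall());

    // Genuinely asynchronous wait: no thread is held while the delay is pending.
    public static async Task<int> TrueAsync()
    {
        await Task.Delay(100);
        return 1;
    }

    public static async Task Main()
    {
        Console.WriteLine(await WrappedAsync());
        Console.WriteLine(await TrueAsync());
    }
}
```

With a handful of calls the two behave the same from the outside. With thousands of concurrent calls, the wrapped version needs thousands of blocked threads, which is the starvation described in the question.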
For example:
private async void Consumer_Received(...)
{
    try
    {
        var result = await Task.Run(() => MyCpuBoundWork());
        await MyNetworkCall(result);
    }
    catch
    {
        // handle exceptions
    }
}
As a message is received this will use another background thread to do the CPU bound work. I'm not sure how rabbitMq generates messages, the Task.Run part is only needed if it uses a single thread for all messages. After the CPU bound is done it will continue with the network call.
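If the external API can slow down a lot, it may also help to cap how many calls are in flight at once. The following is one possible sketch using SemaphoreSlim, which is my own addition and not something the question's code uses. ThrottledProcessor and PeakInFlight are hypothetical names, and the externalCall delegate stands in for the real API call.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public class ThrottledProcessor
{
    private readonly SemaphoreSlim _slots;
    private readonly object _gate = new object();
    private int _inFlight;

    // Highest number of calls that were ever running at the same time.
    public int PeakInFlight { get; private set; }

    public ThrottledProcessor(int maxConcurrent)
    {
        _slots = new SemaphoreSlim(maxConcurrent, maxConcurrent);
    }

    public async Task ProcessAsync(Func<Task> externalCall)
    {
        // Waits asynchronously for a free slot; no thread is blocked meanwhile.
        await _slots.WaitAsync();
        try
        {
            lock (_gate)
            {
                _inFlight++;
                if (_inFlight > PeakInFlight) PeakInFlight = _inFlight;
            }
            try { await externalCall(); }
            finally { lock (_gate) { _inFlight--; } }
        }
        finally
        {
            _slots.Release();
        }
    }
}

public static class ThrottleDemo
{
    public static async Task Main()
    {
        var processor = new ThrottledProcessor(4);
        // Twenty simulated API calls, at most four running at any moment.
        var work = Enumerable.Range(0, 20)
            .Select(_ => processor.ProcessAsync(() => Task.Delay(50)));
        await Task.WhenAll(work);
        Console.WriteLine(processor.PeakInFlight <= 4); // prints True
    }
}
```

Callers beyond the limit simply wait for a slot, so a slow API leads to a queue of cheap pending tasks rather than thousands of blocked threads. The limit itself is something you would have to tune.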
I have a windows service that is responsible for listening JMS messages. I am giving a simplified version of implementation details. As messages arrive they are handed over for processing to a different Task (thread) and limit a max number of tasks with the help of BlockingCollection. There is a retry mechanism in place to retry until the processing is successful with some amount of delay between each retry or max retry attempts are exhausted. The reason for retry mechanism is to cope with issues in Legacy applications that consume these messages. Legacy systems are built using Pessimistic locking and sometimes the processing of message runs into errors, which eventually goes thru after few retry attempts. Due to cost benefit analysis, it was decided not to address the issues in Legacy systems as those applications will be replaced in 2 to 3 years.
This retry mechanism runs on the same task thread that is responsible for handling the processing of message. Initially I used Thread.Sleep to introduce delay between each retry attempt. It worked, but when I try to shutdown the windows service, it is taking longer if there are messages currently being processed and waiting to be retried.
I then went on an adventure of implementing a way to cancel the waiting mechanism if a shutdown event was triggered.
I used two different approaches.
Option #1
One using ManualResetEvent and when I have to wait I have following code in place (posting only relevant code blocks)
private readonly ManualResetEvent _lockEvent = new ManualResetEvent(false);
if (_lockEvent.WaitOne(TimeSpan.FromMilliseconds(120000)))
{
Log.Info($"Thread interrupted. Retrying will resume after windows service restarts for message id {messageId}");
return;
}
When a shutdown event occurs, I cancel the cancellationTokenSource and set the ManualResetEvent. Everything appears to do what I want. Its just that I have to do two operations so that any code that depends on CancellationToken know to gracefully cancel and also gracefully break the retry waiting.
_subscriberCancellationTokenSource.Cancel();
_lockEvent.Set();
Option #2
After upgrading to .Net 4.6, I started using Task type wherever I can. I realized, I could use Task to implement a delay as well, so here is a simplified version of code that I tried
private void WaitBeforeRetrying(CancellationToken cancellationToken)
{
var waitingTask = Task.Delay(120000, cancellationToken);
waitingTask.Wait(cancellationToken);
}
Where ever I need delay, I just invoke the method by passing a CancellationToken
WaitBeforeRetrying(SubscriberCancellationToken);
When a shutdown event occurs, I simply invoke cancel on CancellationTokenSource and everything shuts down gracefully.
_subscriberCancellationTokenSource.Cancel();
Both Option 1 and Option 2 appear to be doing the job.
Are there any drawbacks for Option 2 over Option 1? Any other better option than what I have so far? Really appreciate any input.
UPDATE
After reading the comments from #EricLippert, I understood what I was doing wrong. Most of my threads were going into a waiting state instead of actually doing any productive work. It was the result of sprinkling few asynchronous calls in the synchronous workflow.
I now modified my delay method as follows
private async Task WaitBeforeRetrying(CancellationToken cancellationToken)
{
await Task.Delay(120000, cancellationToken);
}
And I invoke it as
await WaitBeforeRetrying(SubscriberCancellationToken);
I then refactored the rest of the code to propagate the async mechanism all the way to the top layer. It not only made it easy to cancel the delay if I don't have to wait, but also prevented the threads from sitting in a blocked state unnecessarily. Really appreciate everyone's feedback.
I don't think they're any different. The effect of both is that the thread is blocked until the time runs out.
If you're using this in ASP.NET, then blocking threads is not a good thing. In that case, you can make your method async and use await Task.Delay. That'll resume the code after the delay, but allow the thread to work on other things in the mean time.
private async Task WaitBeforeRetrying(CancellationToken cancellationToken)
{
await Task.Delay(120000, cancellationToken);
}
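Put together with a retry loop, it could look something like the sketch below. Retry and RunWithRetriesAsync are hypothetical names, and the attempt delegate stands in for the real message processing. Cancelling the token ends the wait immediately, which is what makes shutdown fast.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Retry
{
    // Returns true if an attempt succeeded, false if the attempts ran out
    // or cancellation was requested while waiting between attempts.
    public static async Task<bool> RunWithRetriesAsync(
        Func<bool> attempt, int maxAttempts, TimeSpan delay, CancellationToken token)
    {
        for (int i = 1; i <= maxAttempts; i++)
        {
            if (attempt()) return true;
            if (i == maxAttempts) break; // no point waiting after the last attempt

            try
            {
                await Task.Delay(delay, token); // no thread is blocked while waiting
            }
            catch (OperationCanceledException)
            {
                return false; // shutdown requested: stop waiting right away
            }
        }
        return false;
    }

    public static async Task Main()
    {
        using (var cts = new CancellationTokenSource())
        {
            cts.CancelAfter(200); // simulate a shutdown request after 200 ms
            bool ok = await RunWithRetriesAsync(
                () => false, 5, TimeSpan.FromMinutes(2), cts.Token);
            Console.WriteLine(ok); // prints False, after about 200 ms rather than 2 minutes
        }
    }
}
```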
I would like to preface this question with the following:
I'm familiar with the IAsyncStateMachine implementation that the await keyword in C# generates.
My question is not about the basic flow of control that ensues when you use the async and await keywords.
Assumption A
The default threading behaviour in any threading environment, whether it be at the Windows operating system level or in POSIX systems or in the .NET thread pool, has been that when a thread makes a request for an I/O bound operation, say for a disk read, it issues the request to the disk device driver and enters a waiting state. Of course, I am glossing over the details because they are not of moment to our discussion.
Importantly, that thread can do nothing useful until it is unblocked by an interrupt from the device driver notifying it of completion. During this time, the thread remains on the wait queue and cannot be re-used for any other work.
I would first like a confirmation of the above description.
Assumption B
Secondly, even with the introduction of TPL, and its enhancements done in v4.5 of the .NET framework, and with the language level support for asynchronous operations involving tasks, this default behaviour described in Assumption A has not changed.
Question
Then, I'm at a loss trying to reconcile Assumptions A and B with the claim that suddenly emerged in all TPL literature that:
When the, say, main thread, starts this request for this I/O bound
work, it immediately returns and continues executing the rest of
the queued up messages in the message pump.
Well, what makes that thread return back to do other work? Isn't that thread supposed to be in the waiting state in the wait queue?
You might be tempted to reply that the code in the state machine launches the task awaiter and if the awaiter hasn't completed, the main thread returns.
That begs the question -- what thread does the awaiter run on?
And the answer that springs up to mind is: whatever the implementation of the method be, of whose task it is awaiting.
That drives us down the rabbit hole further until we reach the last of such implementations that actually delivers the I/O request.
Where is that part of the source code in the .NET framework that changes this underlying fundamental mechanism about how threads work?
Side Note
While some blocking asynchronous methods such as WebClient.DownloadDataTaskAsync, if one were to follow their code
through their (the method's and not one's own) oval tract into their
intestines, one would see that they ultimately either execute the
download synchronously, blocking the current thread if the operation
was requested to be performed synchronously
(Task.RunSynchronously()) or if requested asynchronously, they
offload the blocking I/O bound call to a thread pool thread using the
Asynchronous Programming Model (APM) Begin and End methods.
This surely will cause the main thread to return immediately because
it just offloaded blocking I/O work to a thread pool thread, thereby
adding approximately diddlysquat to the application's scalability.
But this was a case where, within the bowels of the beast, the work
was secretly offloaded to a thread pool thread. In the case of an API
that doesn't do that, say an API that looks like this:
public async Task<string> GetDataAsync()
{
    var tcs = new TaskCompletionSource<string>();
    // If GetDataInternalAsync makes the network request
    // on the same thread as the calling thread, it will block, right?
    // How then do they claim that the thread will return immediately?
    // If you look inside the state machine, it just asks the TaskAwaiter
    // if it completed the task, and if it hasn't it registers a continuation
    // and comes back. But that implies that the awaiter is on another thread
    // and that thread is happily sleeping until it gets a kick in the butt
    // from a wait handle, right?
    // So, the only way would be to delegate the making of the request
    // to a thread pool thread, in which case, we have not really improved
    // scalability but only improved responsiveness of the main/UI thread
    var s = await GetDataInternalAsync();
    tcs.SetResult(s); // omitting SetException and
                      // cancellation for the sake of brevity
    return await tcs.Task; // an async Task<string> method must return a string, not a Task
}
Please be gentle with me if my question appears to be nonsensical. The extent of knowledge of things in almost all matters is limited. I am just learning anything.
When you are talking about an async I/O operation, the truth, as pointed out here by Stephen Cleary (http://blog.stephencleary.com/2013/11/there-is-no-thread.html) is that there is no thread. An async I/O operation is completed at a lower level than the threading model. It generally occurs within interrupt handler routines. Therefore, there is no I/O thread handling the request.
You ask how a thread that launches a blocking I/O request returns immediately. The answer is because an I/O request is not at its core actually blocking. You could block a thread such that you are intentionally saying not to do anything else until that I/O request finishes, but it was never the I/O that was blocking, it was the thread deciding to spin (or possibly yield its time slice).
The thread returns immediately because nothing has to sit there polling or querying the I/O operation. That is the core of true asynchronicity. An I/O request is made, and ultimately the completion bubbles up from an ISR. Yes, this may bubble up into the thread pool to set the task completion, but that happens in a nearly imperceptible amount of time. The work itself never had to be run on a thread. The request itself may have been issued from a thread, but as it is an asynchronous request, the thread can immediately return.
Let's forget C# for a moment. Let's say I am writing some embedded code and I request data from a SPI bus. I send the request, continue my main loop, and when the SPI data is ready, an ISR is triggered. My main loop resumes immediately precisely because my request is asynchronous. All it has to do is push some data into a shift register and continue on. When data is ready for me to read back, an interrupt triggers. This is not running on a thread. It may interrupt a thread to complete the ISR, but you could not say that it actually ran on that thread. Just because it's C#, this process is not ultimately any different.
Similarly, let's say I want to transfer data over USB. I place the data in a DMA location, set a flag to tell the bus to transfer my URB, and then immediately return. When I get a response back it is also moved into memory, and an interrupt occurs and sets a flag to let the system know: hey, here's a packet of data sitting in a buffer for you.
So once again, I/O is never truly blocking. It could appear to block, but that is not what is happening at the low level. It is higher level processes that may decide that an I/O operation has to happen synchronously with some other code. This is not to say of course that I/O is instant. Just that the CPU is not stuck doing work to service the I/O. It COULD block if implemented that way, and this COULD involve threads. But that is not how async I/O is implemented.
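A small model of this in C#, offered only as an analogy: FakeDevice is a made-up class, and a System.Threading.Timer callback plays the role of the completion interrupt. (The callback itself briefly runs on a thread pool thread, much like the completion "bubbling up" described above, but nothing waits on a thread for the 200 ms in between.)

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class FakeDevice
{
    // Hypothetical device read. The request is "issued" by arming a timer,
    // and the method returns at once with a task that is not yet complete.
    public static Task<string> ReadAsync(int latencyMs)
    {
        var tcs = new TaskCompletionSource<string>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        Timer timer = null;
        timer = new Timer(_ =>
        {
            // Plays the role of the completion interrupt: mark the task done.
            timer.Dispose();
            tcs.TrySetResult("payload");
        }, null, Timeout.Infinite, Timeout.Infinite);

        // Arm the timer only after 'timer' has been assigned,
        // so the callback never sees a null reference.
        timer.Change(latencyMs, Timeout.Infinite);
        return tcs.Task;
    }

    public static async Task Main()
    {
        Task<string> pending = ReadAsync(200);
        Console.WriteLine(pending.IsCompleted); // the request returned before the data arrived
        Console.WriteLine(await pending);
    }
}
```

The caller gets its thread back immediately; the continuation after the await is scheduled only once the completion has been signalled.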
I have read the documentation and many tutorials on TPL, but none covers the model I want to achieve.
There were always fixed number of iterations for some algorithm.
I need constantly running threads (as many as possible):
while(true)
get data from MAIN thread
perform heavy time-consuming task (in separate thread)
update MAIN thread information
Additionally I need a mechanism which will be able to set an alarm clock (e.g. 5 seconds). After five seconds all work must be suspended for a while and then resumed.
Should I use Task.ContinueWith the same task? But I am not processing result of previous task launch, but instead I update data structure in MAIN Thread and then decide what will be the input of new task iteration...
How can I leave to TPL decision how many task should be created for best efficiency?
Now I am using BackgroundWorkers, because they have a nice RunWorkerCompleted event: inside it I am on my main thread, so I can update my MAIN structure, check time constraints and then eventually call RunWorkerAsync again on the BackgroundWorker which completed. It is nice and clear, but probably very inefficient.
I need to make it highly efficient on multi-processor, multi-core servers.
One problem is that computation is always online, never stops. There is some networking also, which enables to ask remotely of current state of MAIN structure.
Second problem is critical time control (I must have precise timer - when it stops which no thread can be restarted). Then comes special high priority task after it ends, all work is resumed.
Third problem is that there is no upper bound for operations to do.
These three constraints, from what I observed, do not go along TPL well - I can't use something like Parallel.For because the collection is modified by results of task itself in realtime...
I also don't know how to combine:
the ability to let TPL decide how many threads should be created
with a kind of lifetime running of threads (with pauses and synchronization points between consecutive restarts)
creating threads only once at the beginning (they should only be restarted with constantly new parameters)
Can someone give me clues?
I know how to do it the bad, inefficient way. There are some small requirements, which I described, that prevent me from doing this right. I am a little bit confused.
You need to use messaging + actors + a scheduler, imo. And then you need to use a language capable of it. Have a look at this code that asynchronously receives from Azure Service Bus, enqueues in a shared queue and manages runtime state through an actor.
Inline:
Should I use Task.ContinueWith the same task?
No, ContinueWith will get your program killed based on exception handling inside of each continuation passing; there's no good way in TPL to marshal failed state into the call-side/main thread.
But I am not processing result of previous task launch, but instead I update data structure in MAIN Thread and then decide what will be the input of new task iteration...
You need to move beyond threading for this, unless you're willing to spend A LOT of time on the problem.
How can I leave to TPL decision how many task should be created for best efficiency?
That's handled by the framework that runs your async workflows.
Now I am using BackgroundWorkers, because they have the nice RunWorkerCompleted event - inside it I am on my main thread so I can update my MAIN structure, check time constraints and then eventually call RunWorkerAsync again on the BackgroundWorker which completed. It is nice and clear, but probably very inefficient. I need to make it highly efficient on multi-processor, multi-core servers. One problem is that the computation is always online and never stops. There is also some networking, which allows the current state of the MAIN structure to be queried remotely. The second problem is critical time control (I must have a precise timer - once it fires, no thread may be restarted).
If you run everything asynchronously, you can pass messages to your actor that suspend it. Your scheduling actor is responsible for calling all its subscribers with their scheduled messages; have a look at the paused state in the code linked. If you have outstanding requests you can pass them a cancellation token and handle a 'hard' cancellation/socket abort that way.
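The linked code is not reproduced here, so the following is only a rough sketch of the idea of an actor with a paused state, not the code referred to above. It assumes System.Threading.Channels is available (part of .NET Core 3.0 and later, a NuGet package otherwise); PausableActor, Message and Kind are hypothetical names, and doubling the payload is a placeholder for real work.

```csharp
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

public enum Kind { Work, Pause, Resume }

public readonly struct Message
{
    public Message(Kind kind, int payload = 0) { Kind = kind; Payload = payload; }
    public Kind Kind { get; }
    public int Payload { get; }
}

// Hypothetical actor: one inbox, one reader loop, so its state needs no locks.
public sealed class PausableActor
{
    private readonly Channel<Message> _inbox = Channel.CreateUnbounded<Message>();
    private readonly Queue<int> _deferred = new Queue<int>();
    private bool _paused;

    public List<int> Processed { get; } = new List<int>();

    public void Post(Message message) => _inbox.Writer.TryWrite(message);
    public void Complete() => _inbox.Writer.Complete();

    public async Task RunAsync()
    {
        while (await _inbox.Reader.WaitToReadAsync())
        {
            while (_inbox.Reader.TryRead(out Message m))
            {
                switch (m.Kind)
                {
                    case Kind.Pause:
                        _paused = true;
                        break;
                    case Kind.Resume:
                        _paused = false;
                        // Replay work that arrived while paused, in order.
                        while (_deferred.Count > 0) Handle(_deferred.Dequeue());
                        break;
                    case Kind.Work:
                        if (_paused) _deferred.Enqueue(m.Payload);
                        else Handle(m.Payload);
                        break;
                }
            }
        }
    }

    // Placeholder for the heavy work.
    private void Handle(int payload) => Processed.Add(payload * 2);
}
```

Control messages travel through the same inbox as work, so a Pause takes effect at a well-defined point in the message order rather than in the middle of an item. Work still deferred when the inbox is completed is simply dropped in this sketch.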
Then comes special high priority task after it ends, all work is resumed. These two constraints, from what I observed, do not go along TPL well - I can't use something like Parallel.For because the collection is modified by results of task itself in realtime...
You probably need a pattern called pipes-and-filters. You pipe your input into a chain of workers (actors); each worker consumes the previous worker's output. Signalling is done using a control channel (in my case that is the inbox of the actor).
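As a rough illustration of the pipes-and-filters shape, here is a two-stage chain built on System.Threading.Channels (an assumption on my part; the answer above uses actors). PipesAndFilters, FilterAsync and RunAsync are hypothetical names, the two transforms are arbitrary, and the control channel is left out to keep the sketch short.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class PipesAndFilters
{
    // One filter: read from the input pipe, transform, write to the output pipe,
    // and pass completion downstream when the input is exhausted.
    public static async Task FilterAsync(
        ChannelReader<int> input, ChannelWriter<int> output, Func<int, int> transform)
    {
        while (await input.WaitToReadAsync())
        {
            while (input.TryRead(out int item))
            {
                await output.WriteAsync(transform(item));
            }
        }
        output.Complete();
    }

    public static async Task<List<int>> RunAsync(IEnumerable<int> source)
    {
        var first = Channel.CreateUnbounded<int>();
        var second = Channel.CreateUnbounded<int>();
        var last = Channel.CreateUnbounded<int>();

        // Chain: source -> (+1) -> (*10) -> results
        Task stage1 = FilterAsync(first.Reader, second.Writer, x => x + 1);
        Task stage2 = FilterAsync(second.Reader, last.Writer, x => x * 10);

        foreach (int x in source)
        {
            await first.Writer.WriteAsync(x);
        }
        first.Writer.Complete();

        await Task.WhenAll(stage1, stage2);

        // Drain whatever the final stage produced.
        var results = new List<int>();
        while (last.Reader.TryRead(out int r)) results.Add(r);
        return results;
    }
}
```

Each stage only knows its own input and output, so stages can be added, removed or run in several copies without touching the others.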
I think you should read
MSDN: How to implement a producer / consumer dataflow pattern
I had the same problem: one producer produced items, while several consumers consumed them and decided to send them to other consumers. Each consumer was working asynchronously and independent from other consumers.
Your main task is the producer. It produces items that your other tasks should process. The class with the code of your main task has a function:
public async Task ProduceOutputAsync(...)
Your main program starts this Task using:
var producerTask = Task.Run(() => MyProducer.ProduceOutputAsync(...));
Once this is called the producer task starts producing output. Meanwhile your main program can continue doing other things, like for instance start the consumers.
But let's first focus on the Producer task.
The producer task produces items of type T to be processed by other tasks. They are carried over to the other tasks using objects that implement ITargetBlock<T>.
Every time the producer task has finished creating an object of type T it sends it to the target block using Post, or preferably the async version, SendAsync:
while (continueProducing())
{
    T product = await CreateProduct(...);
    // SendAsync returns false if the target declined the item
    bool accepted = await this.TargetBlock.SendAsync(product);
    // process the return value
}
// if here, nothing to produce anymore. Notify the consumers:
this.TargetBlock.Complete();
The producer needs an ITargetBlock<T>. In my application a BufferBlock<T> was enough. Check MSDN for the other possible targets.
Anyway, the data flow block should also implement ISourceBlock<T>. Your receiver waits for input to arrive at the source, fetches it and processes it. Once finished, it can send the result to its own target block, and wait for the next input until there is no input expected anymore. Of course if your consumer doesn't produce output it doesn't have to send anything to a target.
Waiting for input is done as follows:
ISourceBlock<T> mySource = ...;
while (await mySource.OutputAvailableAsync())
{
    // an object of type T is available at the source
    T objectToProcess = await mySource.ReceiveAsync();
    // keep in mind that someone else might have fetched your object
    // so only process it if you've got it.
    if (objectToProcess != null)
    {
        await ProcessAsync(objectToProcess);
        // if your processing produces output send the output to your target:
        var myOutput = await ProduceOutput(objectToProcess);
        await myTarget.SendAsync(myOutput);
    }
}
// if here, no input expected anymore, notify my consumers:
myTarget.Complete();
Putting it together, your main program does the following:
Construct your producer
Construct all consumers
Give the producer a BufferBlock to send its output to
Start the producer: MyProducer.ProduceOutputAsync(...)
While the producer produces output and sends it to the buffer block:
Give the consumers the same BufferBlock
Start each consumer as a separate task
await Task.WhenAll(...) to wait for all tasks to complete.
Each consumer will stop as soon as it hears that no input is expected anymore.
After all tasks have completed, your main function can read the results and return.
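The steps above could look roughly like this end to end. This is a sketch under assumptions, not the code from my application: ProducerConsumerDemo and its methods are made-up names, squaring an int stands in for the real processing, and System.Threading.Tasks.Dataflow has to be referenced (it is a NuGet package on .NET Framework). It uses TryReceive instead of ReceiveAsync so that a consumer which loses the race for an item just goes back to waiting.

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ProducerConsumerDemo
{
    // Producer: sends 1..count to the target, then signals that it is done.
    private static async Task ProduceAsync(ITargetBlock<int> target, int count)
    {
        for (int i = 1; i <= count; i++)
        {
            await target.SendAsync(i);
        }
        target.Complete();
    }

    // Consumer: waits for output, takes what it can get, stops on completion.
    private static async Task ConsumeAsync(BufferBlock<int> source, ConcurrentBag<int> results)
    {
        while (await source.OutputAvailableAsync())
        {
            // Another consumer may have fetched the item in the meantime,
            // so only process it if TryReceive actually hands one over.
            while (source.TryReceive(out int item))
            {
                results.Add(item * item); // placeholder for the real processing
            }
        }
    }

    public static async Task<int[]> RunAsync(int itemCount, int consumerCount)
    {
        var buffer = new BufferBlock<int>();
        var results = new ConcurrentBag<int>();

        Task producer = Task.Run(() => ProduceAsync(buffer, itemCount));
        Task[] consumers = Enumerable.Range(0, consumerCount)
            .Select(_ => Task.Run(() => ConsumeAsync(buffer, results)))
            .ToArray();

        await producer;
        await Task.WhenAll(consumers);

        // Items are processed in no particular order, so sort before returning.
        return results.OrderBy(x => x).ToArray();
    }
}
```

Calling ProducerConsumerDemo.RunAsync(5, 3) runs one producer and three consumers over the same BufferBlock and returns the squares of 1 to 5 once every task has finished.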