I have a fairly simple Azure Worker Process application with a main thread that calls a number of functions.
Each function basically grabs some data via an API call to an external site, manipulates the data, then pushes it somewhere else via an API call. Each function calls a different external site.
The code looks like this:
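```csharp
using System.Diagnostics;
using System.Threading;
using Microsoft.WindowsAzure.ServiceRuntime;

public partial class WorkerRole : RoleEntryPoint
{
    public override void Run()
    {
        Trace.TraceInformation("Entry point called", "Information");
        while (true)
        {
            Function1();
            Function2();
            // etc etc, there are nearly a hundred functions
        }
        Thread.Sleep(Timeout.Infinite); // never reached: the loop above never exits
    }
}
```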
I have no experience in using the async features, and very little with Azure itself. Each function takes an hour or so to run, so my 100 functions take 100 hours. How can I change the functions to run asynchronously? Is it all within .NET, or do I need to programmatically spin up additional Azure processes and call different functions on them?
Thanks for your help.
The worker role is your application entry point, the interface between your code and the host code. The Run method is how Azure gives you a thread to run your application code. If you need to run lots of work in parallel, you can use normal .NET threading techniques to do so:
Create a new System.Threading.Thread to run each task on. Good for long-running operations where you will not yield the thread.
Create tasks using Task.Factory.StartNew to have them scheduled for execution on the Thread Pool. Good if you have short-running tasks, or tasks which regularly yield the thread.
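A minimal sketch of both options, assuming DoLongWork and DoShortWork are hypothetical stand-ins for the real functions (here they just sleep or multiply):

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class ThreadingOptions
{
    // Hypothetical stand-in for a long-running function that holds its thread.
    static int DoLongWork(int id)
    {
        Thread.Sleep(50);
        return id;
    }

    // Hypothetical stand-in for a short piece of work.
    static int DoShortWork(int id) => id * 2;

    // Option 1: a dedicated thread for one long-running operation.
    public static int RunOnDedicatedThread(int id)
    {
        int result = 0;
        var thread = new Thread(() => result = DoLongWork(id)) { IsBackground = true };
        thread.Start();
        thread.Join(); // wait only so the sketch can return the result
        return result;
    }

    // Option 2: short tasks scheduled on the thread pool via Task.Factory.StartNew.
    public static int[] RunOnThreadPool(int count)
    {
        var tasks = new Task<int>[count];
        for (int i = 0; i < count; i++)
        {
            int id = i; // copy the loop variable so each task gets its own value
            tasks[i] = Task.Factory.StartNew(() => DoShortWork(id));
        }
        return Task.WhenAll(tasks).GetAwaiter().GetResult();
    }
}
```

The Join and WhenAll calls are only there so the sketch can hand back results; in a worker role you would start the work and keep the Run thread alive instead.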
Unless you have 100 cores available, trying to make 100 CPU-bound tasks execute simultaneously will cause a lot of thread-switching overhead and may result in substantially slower performance than you could reach if you were to queue up the activities to work over a smaller number of threads.
If you say you have 100 things which take 1 hour each, are they all doing CPU-bound activities for that full hour? If, like most applications, they're spending most of their time making requests of other resources and waiting for results, you should try to take advantage of asynchronous programming techniques, like using Task with async and await, to yield the thread often.
Yielding a thread is the best way to increase your system throughput - it's able to do other work whilst you wait for the task to complete. Your CPU cores are kept busy and all the work can proceed more quickly.
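A minimal sketch of that idea, assuming each of the hundred functions can be rewritten as an async method that awaits its API calls (the jobs passed in are hypothetical, e.g. `() => Function1Async()`). A SemaphoreSlim caps how many run at once, so you get concurrency without a hundred blocked threads:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledRunner
{
    // Runs every job, but never more than maxConcurrency at the same time.
    // Jobs are assumed to be I/O-bound async methods, so while one awaits its
    // API call no thread is held.
    public static async Task<int> RunAllAsync(IEnumerable<Func<Task>> jobs, int maxConcurrency)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);
        int completed = 0;

        List<Task> running = jobs.Select(async job =>
        {
            await gate.WaitAsync(); // wait for a free slot without blocking a thread
            try
            {
                await job();
                Interlocked.Increment(ref completed);
            }
            finally
            {
                gate.Release(); // free the slot for the next job
            }
        }).ToList(); // ToList starts all the wrappers immediately

        await Task.WhenAll(running);
        return completed;
    }
}
```

In Run() you could call `ThrottledRunner.RunAllAsync(jobs, 10).GetAwaiter().GetResult()` inside the `while (true)` loop; blocking the Run thread is acceptable there, since that thread exists to run your application code. The limit of 10 is an arbitrary example value.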
Related
I have a .NET 5 console app which is processing around 100,000+ messages per minute from RabbitMQ.
When a message is received from RabbitMQ, a thread goes off and crunches some numbers; however, one of those operations is to call an external API to get information about its location.
When this external API service slows down and response times go up, I see thread starvation: the thread count in Windows Task Manager can get into the thousands, and the app basically slows to doing nothing.
When the app loads, the main thread establishes a connection to RabbitMQ and subscribes to new messages. Every time a message arrives, my console app consumes it, queues a work item on the thread pool, and continues getting new RabbitMQ messages:
```csharp
private void Consumer_Received(object sender, BasicDeliverEventArgs deliveryArgs)
{
    var data = Encoding.UTF8.GetString(deliveryArgs.Body.ToArray());
    ThreadPool.QueueUserWorkItem(new WaitCallback(StartProcessing), data);
}
```
If I put a breakpoint, this method keeps being hit, and a thread pool thread calls the StartProcessing method, which is where the CPU crunching and the external API call happen:
```csharp
public void StartProcessing(object xdata)
{
    //1. crunch cpu
    //2. call external API
}
```
Each message takes around 100 ms for the CPU work, and the external API takes between 80 and 500 ms on a normal day, but when there are issues (possibly network) it can take up to 10 seconds to respond to one request. This is when the app starts to break.
My question is about this implementation and how to stop thread starvation.
This is a high throughput multithreaded app and it needs to process as many messages as possible.
The app needs to handle back pressure when the external API is slow to respond and it's constantly context switching threads.
Is using ThreadPool.QueueUserWorkItem the correct implementation, or should I be using async/await etc.?
I'm also open to hearing if this is a bad implementation and if there is another pattern I should be using for this.
//////////////////////////////////
UPDATE 1
//////////////////////////////////
So I changed the code to use an async Task, and it's super slow to get messages from RabbitMQ.
The old code got all messages (200,000) within a few seconds; the new code got through about 1,000 in a few minutes.
The new code is:
```csharp
private void Consumer_Received(object sender, BasicDeliverEventArgs deliveryArgs)
{
    StartProcessing(deliveryArgs.Body.ToArray()).ConfigureAwait(false);
}

public static async Task<bool> StartProcessing(ReadOnlyMemory<byte> data)
{
    await Task.Run(() =>
    {
        ReadOnlySpan<byte> xdata = data.Span; //defensiveCopy of in memory pointer
        //do stuff
    }).ConfigureAwait(false);
    return true;
}
```
Is there something wrong with my implementation?
The StartProcessing code should really be fire-and-forget, as the main thread should continue to the next message in RabbitMQ.
It seems like it's waiting for the message to process before continuing.
//////////////////////////////////
It sounds like this is the exact scenario asynchronous functions were made for.
If you are using the CPU, background threads will help you a bit, but only up to however many hardware threads you have.
But it sounds like you are mostly blocking on network IO. Using a thread that is just blocked until some kind of IO responds is quite wasteful since each thread consumes some resources. And it can easily result in problems like maxing out the thread pool.
By now, .NET and many libraries have been updated to provide truly asynchronous functions for IO. These release the thread to do other work instead of blocking, and when the IO is done, the remaining work is scheduled on a background thread. Using async/await lets you write the code more or less as you would regular synchronous code, letting the compiler rewrite it into a state machine that deals with the complicated issue of maintaining state. Ideally, you should not need more threads than the number of hardware threads you have, since each thread should be doing actual work.
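A small sketch of the difference, assuming Task.Delay stands in for a truly asynchronous I/O call (such as an HttpClient request) and Thread.Sleep for a blocking one; both methods are hypothetical:

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class AsyncVsBlocking
{
    // Truly asynchronous: while the delay is pending, no thread is blocked.
    public static async Task<int> FetchTrulyAsync(int id)
    {
        await Task.Delay(100); // stands in for a real async I/O call
        return id;
    }

    // Returns a Task, but a thread pool thread sits blocked for the whole
    // duration because the underlying call is synchronous.
    public static Task<int> FetchWrappedAsync(int id)
    {
        return Task.Run(() =>
        {
            Thread.Sleep(100); // stands in for a blocking, synchronous call
            return id;
        });
    }
}
```

Starting 1,000 calls to FetchTrulyAsync and awaiting Task.WhenAll typically finishes in roughly the length of one delay using only a handful of threads, whereas 1,000 calls to FetchWrappedAsync each need a blocked thread pool thread, and because the pool adds threads gradually they typically take far longer.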
Keep in mind that just because there is an async method returning a task, it does not necessarily mean it is truly asynchronous. Some base classes/interfaces, like Stream, have been extended with asynchronous versions. And some library vendors, rather than doing the work of providing an actual asynchronous implementation, just wrap the synchronous method, which provides no real benefit.
For example:
```csharp
private async void Consumer_Received(...)
{
    try
    {
        var result = await Task.Run(() => MyCpuBoundWork());
        await MyNetworkCall(result);
    }
    catch
    {
        // handle exceptions
    }
}
```
As a message is received, this will use another background thread to do the CPU-bound work. I'm not sure how RabbitMQ delivers messages; the Task.Run part is only needed if it uses a single thread for all messages. After the CPU-bound work is done, it will continue with the network call.
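A different technique from the answer above, aimed at the back-pressure part of the question, is a bounded System.Threading.Channels channel between the RabbitMQ callback and a fixed number of async workers. A sketch under those assumptions; BoundedProcessor and processAsync are hypothetical names, and processAsync would contain the CPU work plus the awaited API call:

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public sealed class BoundedProcessor
{
    private readonly Channel<byte[]> _queue;
    private readonly Task[] _workers;
    private int _processed;

    public int Processed => Volatile.Read(ref _processed);

    // capacity: how many messages may wait in memory before writers must wait.
    // workerCount: how many messages are processed at the same time.
    public BoundedProcessor(int capacity, int workerCount, Func<byte[], Task> processAsync)
    {
        _queue = Channel.CreateBounded<byte[]>(new BoundedChannelOptions(capacity)
        {
            FullMode = BoundedChannelFullMode.Wait // back pressure: writers wait when full
        });

        _workers = new Task[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            _workers[i] = Task.Run(async () =>
            {
                // Each worker pulls messages until the channel is completed and drained.
                await foreach (var message in _queue.Reader.ReadAllAsync())
                {
                    try
                    {
                        await processAsync(message);
                        Interlocked.Increment(ref _processed);
                    }
                    catch (Exception)
                    {
                        // Hypothetical: log or dead-letter the message here.
                    }
                }
            });
        }
    }

    // Call from the message-received callback. Completes immediately while there
    // is room, and only completes later (asynchronously) once the channel is full.
    public ValueTask EnqueueAsync(byte[] message) => _queue.Writer.WriteAsync(message);

    // Signal that no more messages will arrive and wait for the workers to drain.
    public Task CompleteAsync()
    {
        _queue.Writer.Complete();
        return Task.WhenAll(_workers);
    }
}
```

The bounded capacity is what provides the back pressure: when the external API slows down, the workers fall behind, the channel fills, and EnqueueAsync stops completing immediately, so the consumer stops taking new messages instead of piling up thousands of blocked threads. How best to await EnqueueAsync from the consumer callback depends on the RabbitMQ client version and consumer type, which this sketch does not cover.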
I have a nightly C# Windows service that updates tables and calls a long-running (~1 hour) stored procedure on 5-10 Azure SQL databases. Since I can run all database work at the same time in Azure SQL with little to no performance hit, I'd like to run them asynchronously. I can either call a method with async void (using await Task.Delay(10)) or create a new thread.
What is the better approach? Does it matter?
EDIT: Here is an example of the async void method. I want to make sure it's clear what it's doing.
```csharp
public static async void DoRanking(object e)
{
    // will start all orgs concurrently
    await Task.Delay(10);
    ... do a bunch of database work here ...
    ... call a long running stored procedure here ...
}
```
You can use the older Thread/ThreadPool APIs or the newer TPL. But in either case you shouldn't just start a long-running process on a threadpool thread.
Use a new Thread or a LongRunning Task:
> Specifies that a task will be a long-running, coarse-grained operation involving fewer, larger components than fine-grained systems. It provides a hint to the TaskScheduler that oversubscription may be warranted. Oversubscription lets you create more threads than the available number of hardware threads. It also provides a hint to the task scheduler that an additional thread might be required for the task so that it does not block the forward progress of other threads or work items on the local thread-pool queue.
Documentation for TaskCreationOptions.LongRunning
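A minimal sketch of the LongRunning option for the nightly job, assuming DoRanking here is a hypothetical synchronous stand-in for the per-database work and the database names are made up:

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class NightlyJob
{
    // Hypothetical stand-in for the per-database work (table updates plus the
    // long-running stored procedure). It blocks its thread, like a synchronous
    // ADO.NET call would.
    static string DoRanking(string database)
    {
        Thread.Sleep(100);
        return database + ": done";
    }

    public static string[] RunAll(IEnumerable<string> databases)
    {
        var tasks = new List<Task<string>>();
        foreach (var db in databases)
        {
            // LongRunning hints the scheduler to give this work its own thread
            // instead of tying up a thread pool thread for an hour.
            tasks.Add(Task.Factory.StartNew(
                () => DoRanking(db),
                CancellationToken.None,
                TaskCreationOptions.LongRunning,
                TaskScheduler.Default));
        }

        // Unlike async void, the tasks let the caller wait for completion and
        // see exceptions (GetResult rethrows the first failure).
        return Task.WhenAll(tasks).GetAwaiter().GetResult();
    }
}
```

Compared with the async void version in the question, the service gets back tasks it can wait on, so it knows when all databases are done and sees any exception. If the database code used truly asynchronous ADO.NET calls instead, plain async Task methods combined with Task.WhenAll would avoid dedicated threads altogether.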
I have an async call (DoWorkAsync()) that I would like to start in a fire-and-forget way, i.e. I'm not interested in its result and would like the calling thread to continue even before the async method is finished.
What is the proper way to do this? I need this in both, .NET Framework 4.6 as well as .NET Core 2, in case there are differences.
```csharp
public async Task<MyResult> DoWorkAsync(){...}

public void StarterA()
{
    Task.Run(() => DoWorkAsync());
}

public void StarterB()
{
    Task.Run(async () => await DoWorkAsync());
}
```
Is it one of those two or something different/better?
//edit: Ideally without any extra libraries.
> What is the proper way to do this?
First, you need to decide whether you really want fire-and-forget. In my experience, about 90% of people who ask for this actually don't want fire-and-forget; they want a background processing service.
Specifically, fire-and-forget means:
You don't care when the action completes.
You don't care if there are any exceptions when executing the action.
You don't care if the action completes at all.
So the real-world use cases for fire-and-forget are astoundingly small. An action like updating a server-side cache would be OK. Sending emails, generating documents, or anything business related is not OK, because you would (1) want the action to be completed, and (2) get notified if the action had an error.
The vast majority of the time, people don't want fire-and-forget at all; they want a background processing service. The proper way to build one of those is to add a reliable queue (e.g., Azure Queue / Amazon SQS, or even a database), and have an independent background process (e.g., Azure Function / AWS Lambda / .NET Core BackgroundService / Win32 service) processing that queue. This is essentially what Hangfire provides (using a database for a queue, and running the background process in-proc in the ASP.NET process).
> Is it one of those two or something different/better?
In the general case, there are a number of small behavior differences when eliding async and await. It's not something you would want to do "by default".
However, in this specific case - where the async lambda is only calling a single method - eliding async and await is fine.
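If fire-and-forget really is what you want, a common pattern is to at least observe the exceptions. A minimal sketch, assuming Forget is a hypothetical extension method of your own (it is not part of the BCL):

```csharp
using System;
using System.Threading.Tasks;

public static class FireAndForgetExtensions
{
    // Lets the caller continue without awaiting, but routes any exception to
    // onError instead of letting it disappear unobserved.
    // async void is deliberate here: nothing awaits this helper.
    public static async void Forget(this Task task, Action<Exception> onError)
    {
        try
        {
            await task.ConfigureAwait(false);
        }
        catch (Exception ex)
        {
            onError(ex);
        }
    }
}
```

Usage would be something like `Task.Run(() => DoWorkAsync()).Forget(ex => Console.Error.WriteLine(ex));`, where Task.Run keeps the synchronous part of DoWorkAsync (everything before its first await) off the calling thread. The caveats above still apply: nothing guarantees the work completes.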
It depends on what you mean by proper :)
For instance: are you interested in the exceptions being thrown in your "fire and forget" calls? If not, then this is sort of fine. What you might need to think about, though, is the environment the task lives in.
For instance, if this is an ASP.NET application and you start the task while handling a request to an .aspx or .svc page, the task runs on a thread pool thread that ASP.NET does not track. The application pool might recycle the process before your "fire and forget" task has completed.
So also think about in which thread your tasks live.
I think this article gives you some useful information on that:
https://www.hanselman.com/blog/HowToRunBackgroundTasksInASPNET.aspx
Also note that if you never await a task (or otherwise inspect it), any exception it throws is never observed by your code; an async void method does not even return a task you could observe. Source for that is the ref book for Microsoft exam 70-483.
There is probably a free version of that online somewhere ;P https://www.amazon.com/Exam-Ref-70-483-Programming-C/dp/0735676828
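It may also be useful to know that if you have an async method called from a non-async method and you need its result, you can use .GetAwaiter().GetResult(). This blocks the calling thread, and in an environment with a synchronization context (a UI thread or classic ASP.NET) it can deadlock.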
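A small sketch of the difference between blocking with .Result and with .GetAwaiter().GetResult(); FailAsync is a hypothetical method that always throws. It runs in a plain console context, where there is no synchronization context to deadlock on:

```csharp
using System;
using System.Threading.Tasks;

public static class BlockingOnTasks
{
    // Hypothetical async method that fails after yielding once.
    static async Task<int> FailAsync()
    {
        await Task.Yield();
        throw new InvalidOperationException("failed");
    }

    // .Result (and .Wait()) wrap the failure in an AggregateException.
    public static Type ExceptionFromResult()
    {
        try { _ = FailAsync().Result; }
        catch (Exception ex) { return ex.GetType(); }
        return null;
    }

    // .GetAwaiter().GetResult() rethrows the original exception.
    public static Type ExceptionFromGetResult()
    {
        try { _ = FailAsync().GetAwaiter().GetResult(); }
        catch (Exception ex) { return ex.GetType(); }
        return null;
    }
}
```

That unwrapping is the usual reason to prefer GetAwaiter().GetResult(), but both still block the calling thread.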
Also I think it is important to note the difference between async and multi-threading.
Async is only useful if there are operations that use parts of the computer other than the CPU, such as networking or other I/O operations. Using async tells the system to go ahead and use the CPU for something else instead of blocking a thread that is just waiting for a response.
Multi-threading is the allocation of operations to different threads (for instance, creating a task, which then runs on a thread pool thread). Foreground threads are the ones that keep your application running; background threads, which include all thread pool threads, do not keep it alive. When the last foreground thread exits, the process ends and any remaining background threads are stopped with it.
This allows the CPU to work on different tasks at the same time.
Combining the two means the CPU does not get blocked up on just 4 threads if it is a 4-thread CPU; it can keep doing other work while async tasks are waiting for I/O operations.
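A minimal sketch that checks which kind of thread you get in each case (the class and method names are hypothetical):

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class ThreadKinds
{
    // A thread created with new Thread(...) is a foreground thread by default:
    // the process stays alive until it finishes.
    public static bool NewThreadIsBackground()
    {
        bool isBackground = true;
        var t = new Thread(() => isBackground = Thread.CurrentThread.IsBackground);
        t.Start();
        t.Join();
        return isBackground;
    }

    // Thread pool threads (used by Task.Run) are background threads: they are
    // stopped when the last foreground thread of the process exits.
    public static bool PoolThreadIsBackground() =>
        Task.Run(() => Thread.CurrentThread.IsBackground).GetAwaiter().GetResult();
}
```

This is why a console app that only starts Task.Run work and then returns from Main can exit before that work finishes.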
I hope this gives you the information needed to do whatever it is you are doing :)
I know this question might be a bit trivial, but all the answers I find on the internet leave me confused.
I'm aware of the basic principles of how async/await works (how await asynchronously waits for the task to complete without blocking the main thread),
but I don't understand its real benefit, because it seems to me everything you do with async/await you can do using the Task Parallel Library.
Please consider this example, to better understand what I mean:
Let's say I have a SuperComplexMethod that returns some value and I would like to execute it in parallel, meanwhile doing some other things. Normally I would do it this way:
```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

internal class Program
{
    private static void Main()
    {
        //I will start a task first that will run asynchronously
        var task = Task.Run(() => SuperComplexMethod());

        //Then I will be doing some other work, and then get the result when I need it
        Console.WriteLine("Doing some other work...");
        var result = task.Result;
    }

    static string SuperComplexMethod()
    {
        Console.WriteLine("Doing very complex calculations...");
        Thread.Sleep(3000);
        return "Some result";
    }
}
```
Here is how I would have to do it using async/await:
```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

internal class Program
{
    private static void Main()
    {
        var task = SuperComplexMethodAsync();
        Console.WriteLine("Doing some other work...");
        var result = task.Result;
    }

    //I have to create this async wrapper that can wait for the task to complete
    async static Task<string> SuperComplexMethodAsync()
    {
        return await Task.Run(() => SuperComplexMethod());
    }

    static string SuperComplexMethod()
    {
        Console.WriteLine("Doing very complex calculations...");
        Thread.Sleep(3000);
        return "Some result";
    }
}
```
As you can see in the second example in order to use async/await approach, I have to create a wrapper method that starts a task and asynchronously waits for it to complete. Obviously it seems redundant to me, because I can achieve the very same behavior without using this wrapper marked async/await.
Can you please explain to me what is so special about async/await, and what actual benefits it provides over using the tools of the Task Parallel Library alone?
Arguably the main reason to use async/await is thread sparing. Imagine the following scenario (I'll simplify to make the point): a) you have a web application that has 10 threads available to process incoming requests; b) all requests involve I/O (e.g. connecting to a remote database, connecting to upstream network services via HTTP/SOAP) to process/complete; c) each request takes 2 seconds to process.
Now imagine 20 requests arrive at about the same time. Without async/await, your web app would start to process the first 10 requests. While this is happening the other 10 would just sit in the queue for 2 seconds, with your web app out of threads and hence unable to process them. Only when the first 10 complete would the second 10 begin to be processed.
Under async/await, the first 10 requests would instead begin tasks, and, while awaiting those tasks, the threads that were processing them would be returned to the web app to process other requests. So your web app would begin processing the second 10 almost straight away, rather than waiting. As each of the awaited tasks from the first 10 completes, the web app would continue processing the rest of their methods, either on a thread-pool thread or one of the web app's threads (which it is depends on how you configure things). We can usually expect in an I/O scenario that the I/O is by far the bulk of the duration of the call, so we can make a reasonable assumption that in the above scenario, the network/database call might take 1.9s and the rest of the code (adapting DTOs, some business logic, etc.) might take 0.1s. If we assume the continuation (after the await) is processed by a web app thread, that thread is now only tied up for 0.1 of the 2 seconds, instead of the full 2 seconds in the non async/await scenario.
You might naturally think: well, I've just pushed the threads out of one pool of threads and into another, and that will eventually fill up too. To understand why this isn't really true in practice in truly async scenarios, you need to read There Is No Thread.
The upshot is that you are now able to concurrently process many more requests than you have threads available to process them.
You'll notice the above is focused on I/O, and that's really where async/await shines. If your web app instead processed requests by performing complex mathematical calculations on the CPU, you would not see the above benefit, which is why async/await is neither really suited nor intended for CPU-bound activities.
Before others jump in with all the exceptions to the rules (and there are some), I'm only presenting a vanilla simplified scenario to show the value of async/await in I/O-bound scenarios. Covering everything about async/await would create a very long answer (and this one is long enough already!)
I should also add that there are other ways to process web requests asynchronously, ways that pre-date async/await, but async/await very significantly simplifies the implementation.
--
Moving briefly to say a WinForms or similar app, the scenario is very similar, except now you really only have one thread available to process UI requests, and any time you hold onto that thread, the UI will be unresponsive, so you can use a similar approach to move long-running operations off the UI thread. In the UI scenario, it becomes more reasonable to perform CPU-bound operations off the UI thread as well. When doing this, a thread pool thread will instead perform that CPU work, freeing up the UI thread to keep the UI responsive. Now there is a thread, but at least it's not the UI one. This is generally called "offloading", which is one of the other primary uses for async/await.
--
Your example is a console app - there's often not a lot to be gained in that context, except for the ability to fairly easily (arguably more easily than creating your own threads) execute several requests concurrently on the thread pool.
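A minimal console sketch of that point, assuming FetchAsync is a hypothetical I/O-bound request simulated with Task.Delay:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ConsoleConcurrency
{
    // Hypothetical I/O-bound request; Task.Delay stands in for a network call.
    static async Task<string> FetchAsync(int id)
    {
        await Task.Delay(100);
        return "result " + id;
    }

    // One after another: total time is roughly the sum of the delays.
    public static async Task<List<string>> FetchSequentiallyAsync(int count)
    {
        var results = new List<string>();
        for (int i = 0; i < count; i++)
            results.Add(await FetchAsync(i));
        return results;
    }

    // Start every request first, then await them together: total time is
    // roughly that of the slowest request.
    public static async Task<string[]> FetchConcurrentlyAsync(int count)
    {
        var tasks = Enumerable.Range(0, count).Select(i => FetchAsync(i)).ToList();
        return await Task.WhenAll(tasks);
    }
}
```

With three requests of about 100 ms each, the sequential version should take roughly 300 ms and the concurrent one roughly 100 ms, and no thread is held while the requests are waiting.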
When you use async and await, the compiler generates a state machine in the background:
```csharp
public async Task MyMethodAsync()
{
    Task<int> longRunningTask = LongRunningOperationAsync();
    // independent work which doesn't need the result of LongRunningOperationAsync can be done here

    //and now we call await on the task
    int result = await longRunningTask;
    //use the result
    Console.WriteLine(result);
}

public async Task<int> LongRunningOperationAsync() // assume we return an int from this long running operation
{
    await Task.Delay(1000); // 1 second delay
    return 1;
}
```
So what happens here:
1. Task<int> longRunningTask = LongRunningOperationAsync(); starts executing LongRunningOperationAsync.
2. Independent work is done on, let's assume, the main thread (thread ID = 1), and then await longRunningTask is reached.
Now, if longRunningTask hasn't finished and is still running, MyMethodAsync() will return an incomplete task to its calling method, so the main thread doesn't get blocked. When longRunningTask is done, a thread from the ThreadPool (can be any thread) will resume MyMethodAsync() in its previous context and continue execution (in this case printing the result to the console). If a SynchronizationContext was captured, such as a UI thread's, the continuation runs on that context instead.
A second case would be that the longRunningTask has already finished its execution and the result is available. When reaching the await longRunningTask we already have the result so the code will continue executing on the very same thread. (in this case printing result to console). Of course this is not the case for the above example, where there's a Task.Delay(1000) involved.
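A small sketch of both cases; the method names are hypothetical, Task.FromResult plays the role of an already-finished task, and Task.Delay plays one that is still running:

```csharp
using System;
using System.Threading.Tasks;

public static class AwaitContinuations
{
    // Second case: the awaited task is already complete, so await does not
    // yield and the method simply continues on the same thread.
    public static async Task<bool> ContinuesOnSameThreadWhenCompleteAsync()
    {
        int before = Environment.CurrentManagedThreadId;
        await Task.FromResult(1);
        return Environment.CurrentManagedThreadId == before;
    }

    // First case: the task is still running, so the method returns to its
    // caller at the await; without a SynchronizationContext the rest runs
    // later on whichever thread pool thread completes the delay.
    public static async Task<(int Before, int After)> ThreadIdsAcrossDelayAsync()
    {
        int before = Environment.CurrentManagedThreadId;
        await Task.Delay(100);
        return (before, Environment.CurrentManagedThreadId);
    }
}
```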
For more, refer to:
Async/Await - Best Practices
Simplifying Asynchronous
Here's the setup: I'm trying to make a relatively simple WinForms app, a feed reader using the FeedDotNet library. The question I have is about using the thread pool. Since FeedDotNet makes synchronous HttpWebRequests, it is blocking the GUI thread. So the best thing seemed to be putting the synchronous call on a ThreadPool thread and, while it is working, invoking the controls that need updating on the form. Some rough code:
```csharp
private void ThreadProc(object state)
{
    Interlocked.Increment(ref updatesPending);

    // check that main form isn't closed/closing so that we don't get an ObjectDisposedException exception
    if (this.IsDisposed || !this.IsHandleCreated) return;
    if (this.InvokeRequired)
        this.Invoke((MethodInvoker)delegate
        {
            if (!marqueeProgressBar.Visible)
                this.marqueeProgressBar.Visible = true;
        });

    ThreadAction t = state as ThreadAction;
    Feed feed = FeedReader.Read(t.XmlUri);

    Interlocked.Decrement(ref updatesPending);

    if (this.IsDisposed || !this.IsHandleCreated) return;
    if (this.InvokeRequired)
        this.Invoke((MethodInvoker)delegate { ProcessFeedResult(feed, t.Action, t.Node); });

    // finished everything, hide progress bar
    if (updatesPending == 0)
    {
        if (this.IsDisposed || !this.IsHandleCreated) return;
        if (this.InvokeRequired)
            this.Invoke((MethodInvoker)delegate { this.marqueeProgressBar.Visible = false; });
    }
}
```
this = main form instance
updatesPending = volatile int in the main form
ProcessFeedResult = method that does some operations on the Feed object

Since a thread pool thread can't return a result, is this an acceptable way of processing the result via the main thread?
The main thing I'm worried about is how this scales. I've tried ~250 requests at once. The max number of threads I've seen was around 53, and once all threads were completed, back to 21. I recall that in one exceptional instance of me playing around with the code, I had seen it rise as high as 120. This isn't normal, is it? Also, being on Windows XP, I reckon that with such a high number of connections there would be a bottleneck somewhere. Am I right?
What can I do to ensure maximum efficiency of threads/connections?
Having all these questions also made me wonder whether this is the right case for ThreadPool use. MSDN and other sources say it should be used for "short-lived" tasks. Are 1-2 seconds "short-lived" enough, considering I'm on a relatively fast connection? What if the user is on a 56K dial-up and one request could take 5-12 seconds or even more? Would the thread pool be an efficient solution then too?
Using the ThreadPool unchecked is probably a bad idea.
Out of the box you get 250 threads in the thread pool per CPU (the default maximum in .NET 2.0 SP1 and 3.5).
Imagine if, in a single burst, you flatten out someone's network connection and get them banned from getting notifications from a site because they are suspected of running a DoS attack.
Instead, when downloading stuff from the net you should build in tons of control. The user should be able to decide how many concurrent requests they make (and how many concurrent requests per domain), ideally you also want to offer controls for the amount of bandwidth.
Though this could be orchestrated with the ThreadPool, having dedicated threads or using something like a bunch of instances of the BackgroundWorker class is a better option.
My understanding of the ThreadPool is that it is designed for this type of situation. I think the definition of short-lived is of this order of time - perhaps even up to minutes. A "long-lived" thread would be one that was alive for the lifetime of the application.
Don't forget Microsoft would have spent some time getting the efficiency of the ThreadPool as high as it could. Do you think that you could write something that was more efficient? I know I couldn't.
The .NET thread pool is designed specifically for executing short-running tasks, for which the overhead of creating a dedicated thread would outweigh the benefit of running them on one. It is not designed for tasks which block for prolonged periods or have a long execution time.
The idea is for a task to hop onto a thread, run quickly, complete and hop off.
The BackgroundWorker class provides an easy way to execute tasks on a thread pool thread, and provides mechanisms for the task to report progress and handle cancel requests.
In this MSDN article on the BackgroundWorker Component, file downloads are explicitly given as examples of the appropriate use of this class. That should hopefully encourage you to use this class to perform the work you need.
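A minimal BackgroundWorker sketch, assuming the string built in DoWork is a hypothetical stand-in for FeedReader.Read(t.XmlUri). It runs the work on a pool thread, reports progress, and hands the result (or the exception) to RunWorkerCompleted:

```csharp
using System;
using System.ComponentModel;
using System.Threading;

public static class BackgroundWorkerSketch
{
    public static string Download(string url)
    {
        var worker = new BackgroundWorker { WorkerReportsProgress = true };
        var finished = new ManualResetEventSlim();
        string result = null;

        worker.DoWork += (s, e) =>
        {
            // Runs on a thread pool thread; the loop and sleep stand in for the real download.
            for (int p = 0; p <= 100; p += 50)
            {
                Thread.Sleep(10);
                worker.ReportProgress(p);
            }
            e.Result = "feed from " + (string)e.Argument;
        };

        // When RunWorkerAsync is called from a WinForms UI thread, this is raised
        // on that UI thread, so a progress bar could be updated here directly.
        worker.ProgressChanged += (s, e) => Console.WriteLine($"progress: {e.ProgressPercentage}%");

        worker.RunWorkerCompleted += (s, e) =>
        {
            // e.Error carries any exception thrown in DoWork; reading e.Result would rethrow it.
            result = e.Error == null ? (string)e.Result : "error: " + e.Error.Message;
            finished.Set();
        };

        worker.RunWorkerAsync(url);
        finished.Wait(); // only so the sketch can return the result; never block a UI thread like this
        return result;
    }
}
```

In the WinForms app you would call RunWorkerAsync from the UI and do the ProcessFeedResult work in RunWorkerCompleted instead of waiting on a handle.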
If you're worried about overusing the thread pool, you can be assured the runtime does manage the number of available threads based on demand. Tasks are queued on the thread pool for execution. When a thread becomes available to do work, the task is loaded onto the thread. At regular intervals, a monitoring process checks the state of the thread pool. If there are tasks waiting to be executed, it can create more threads. If there are several idle threads, it can shut down some to release resources.
In a worst-case scenario, where all threads are busy and you have work queued up, the runtime will be adding threads to deal with the extra workload. The application will be running more slowly as it has to wait for more threads to be made available, but it will continue to run.
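To see the limits the pool works with on a given machine, a small sketch using ThreadPool.GetMinThreads, GetMaxThreads and GetAvailableThreads (the PoolInfo name is hypothetical):

```csharp
using System.Threading;

public static class PoolInfo
{
    // Snapshot of the worker-thread limits. Up to the minimum, the pool creates
    // threads on demand; beyond it, new threads are injected gradually, which is
    // why a burst of blocking work items makes the pool grow slowly.
    public static (int MinWorkers, int MaxWorkers, int AvailableWorkers) Snapshot()
    {
        ThreadPool.GetMinThreads(out int minWorkers, out _);
        ThreadPool.GetMaxThreads(out int maxWorkers, out _);
        ThreadPool.GetAvailableThreads(out int availableWorkers, out _);
        return (minWorkers, maxWorkers, availableWorkers);
    }
}
```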
A few points, combining info from a few other answers:
Your ThreadProc does not contain exception handling. You should add that, or a single I/O error will halt your process.
Sam Saffron is quite right that you should limit the number of threads. You could use a (ThreadSafe) Queue to push your feeds into (WorkItems) and have 1+ threads reading from the queue in a loop.
The BackgroundWorker might be a good idea; it would provide you with both the exception handling and the synchronization you need.
And the BackgroundWorker uses the ThreadPool, and that is fine.
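A minimal sketch of the queue-plus-fixed-threads idea, assuming fetch is a hypothetical stand-in for FeedReader.Read and workerCount is the user-configurable limit:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class FeedQueue
{
    // A fixed number of dedicated threads pull URLs from a thread-safe queue.
    // Each item is wrapped in try/catch so one failed request does not stop a worker.
    public static ConcurrentBag<string> ProcessAll(IEnumerable<string> urls, int workerCount, Func<string, string> fetch)
    {
        var queue = new BlockingCollection<string>();
        var results = new ConcurrentBag<string>();

        var workers = new List<Thread>();
        for (int i = 0; i < workerCount; i++)
        {
            var worker = new Thread(() =>
            {
                // Blocks until an item is available; ends once CompleteAdding
                // has been called and the queue is empty.
                foreach (var url in queue.GetConsumingEnumerable())
                {
                    try { results.Add(fetch(url)); }
                    catch (Exception ex) { results.Add("failed: " + url + " (" + ex.Message + ")"); }
                }
            }) { IsBackground = true };
            worker.Start();
            workers.Add(worker);
        }

        foreach (var url in urls) queue.Add(url);
        queue.CompleteAdding(); // no more work: workers exit when the queue drains
        foreach (var w in workers) w.Join();
        return results;
    }
}
```

In the WinForms app you would not Join the workers on the UI thread; you would start them once, keep adding URLs to the queue, and marshal each result back with Invoke as the original code does.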
You may want to take a look to the "BackgroundWorker" class.