Is multithreading an API application a thing? - c#

I'm learning C#/.NET; two of the main reasons are its speed advantage over Node.js and its OO syntax.
Now the tutorial I am following all of a sudden introduced async, and that's cool, but I could have done that with Node.js as well, so I feel a little disappointed.
My thought was that maybe we could take this to the next level with multithreading, but that raised a lot of questions, such as discrepancies in the database (for example, thread one expects to read data that thread two has updated, but thread two had not run by the time thread one read it, so thread one works with outdated data).
Searching for this returns very little information; mostly it's people confusing multithreading with asynchronous programming.
So I'm guessing you would not want to mix API with multithreading?

Yes, it's a thing, and you're already doing it with async tasks.
.NET has a task scheduler that assigns your tasks to available threads from the thread pool. By default the pool starts with roughly one worker thread per available CPU core and adds more as demand requires.
Clarification: this doesn't mean 1 task : 1 thread. There's a large collection of work to be done by a number of workers. The scheduler hands a worker a job, and the worker works until the job is done or an 'await' is reached.
From the perspective of a regular async method, it can be hard to see where the 'multi-threading' comes into play. There isn't an obvious difference between Get() and await GetAsync() when your code has to sit and wait either way.
But it's not always about your code. This example might make it more clear.
List<Task> work = new();
foreach (var uri in uriList)
{
    work.Add(http.GetAsync(uri));
}
await Task.WhenAll(work);
This code starts every GetAsync call without waiting for the previous one to finish, so all the requests are in flight at the same time.
The framework making your API work is doing something similar. It would be pretty silly if the whole server was tied up because a single user requested a big file over dialup.
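A minimal runnable sketch of the same pattern, assuming a hypothetical FakeGetAsync helper that stands in for http.GetAsync and simulates network latency with Task.Delay (neither the helper nor the delays come from the answer above). It illustrates that the calls overlap, so the total time should be close to the slowest call rather than the sum of all three.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical stand-in for http.GetAsync(uri): waits without blocking a thread.
static async Task<string> FakeGetAsync(string uri, int delayMs)
{
    await Task.Delay(delayMs);
    return $"response from {uri}";
}

var stopwatch = Stopwatch.StartNew();
var work = new List<Task<string>>();
foreach (var (uri, delayMs) in new[] { ("a", 300), ("b", 200), ("c", 100) })
{
    work.Add(FakeGetAsync(uri, delayMs)); // starts the call; does not wait for it
}

// WhenAll completes when every task has completed; results keep the input order.
string[] responses = await Task.WhenAll(work);
stopwatch.Stop();

Console.WriteLine($"{responses.Length} responses in {stopwatch.ElapsedMilliseconds} ms");
```

Awaiting each call inside the loop instead would run the requests one after another.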

Async await is used for multi-threading but it is not used only for multi-threading.
I have not personally used or seen multi-threading in an API, only in console jobs. Using TPL in console jobs has improved the efficiency by more than 100% for me.
Async/await is powerful and should be used for asynchronous processing in APIs too.
Please go through Shiv's videos https://www.youtube.com/watch?v=iMcycFie-nk

Related

Is it bad practice to have async call within Task.Run?

I have a C# console app processing around 100,000 JSON messages from RabbitMQ every minute.
After getting each message (or a batch of messages) from RabbitMQ, I then call:
await Task.Run(async () =>
{
    // do lots of CPU stuff here, including 2 external API calls using await
});
Everything I've read says use await Task.Run for CPU bound operations. And use await async for the HTTP external calls.
If I change it to:
await Task.Run(() =>
Then it complains as I have an async API call in the lines below, so it needs the async keyword in the Task.Run statement.
There are 2000+ lines of code in this section (complex if/then business rules), and sometimes the API call is not needed.
So I'm faced with either a massive restructure of the application, with lots of testing needed, or, if it's OK to do API calls alongside the CPU-bound operations, leaving it as is.
To summarise, is this bad practice, or is it ok to have CPU bound work and API calls inside the same task? The task is processing one JSON message.
Everything I've read says use await task.run for cpu bound operations . And use await async for the http external calls
The general guideline is to use async/await for I/O. Task.Run is useful for CPU-bound operations if you need to offload them from a UI thread. For example, in server scenarios such as ASP.NET, you wouldn't want to use Task.Run for CPU-bound code. This is because ASP.NET already schedules your code on a separate thread pool thread.
In your case, you have a Console application, which doesn't have a UI thread. But it also doesn't have that automatic scheduling onto a thread pool thread that ASP.NET gives you.
if its ok to do api calls alongside the cpu bound operations then i'll leave it as is.
This is fine either way. Since the code is awaiting the Task.Run, it won't continue (presumably processing the next message) until the operation completes on another thread pool thread. So the Task.Run isn't helping much, but it isn't hurting much, either.
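As a small illustration of mixing both kinds of work inside one Task.Run, here is a sketch under stated assumptions: Crunch and CallApiAsync are hypothetical stand-ins for the business rules and the external API call, not code from the question. Task.Run has overloads that accept an async lambda and return a task that completes only when the lambda's own task completes.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for an external API call.
static async Task<int> CallApiAsync(int value)
{
    await Task.Delay(50);
    return value * 2;
}

// Hypothetical CPU-bound step: sums 1..n.
static int Crunch(int n)
{
    int sum = 0;
    for (int i = 1; i <= n; i++) sum += i;
    return sum;
}

int result = await Task.Run(async () =>
{
    int cpu = Crunch(100);          // CPU-bound work on a thread pool thread
    return await CallApiAsync(cpu); // the thread is released while the call is pending
});

Console.WriteLine(result); // prints 10100
```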
If you need more performance - specifically, processing messages concurrently - then you should look into something like TPL Dataflow or System.Threading.Channels that would allow you to replace the Task.Run with a queue of work that can run in parallel. That would give you something more like what ASP.NET provides out of the box.
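A rough sketch of the queue-of-work idea using System.Threading.Channels. The message strings, the HandleAsync handler, the capacity of 50 and the four consumers are all assumptions chosen for illustration; a real application would write to the channel from its RabbitMQ receive loop.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Bounded queue: the producer waits when 50 messages are already pending.
var channel = Channel.CreateBounded<string>(50);
int processed = 0;

// Hypothetical per-message handler mixing CPU work and an awaited call.
async Task HandleAsync(string message)
{
    string upper = message.ToUpperInvariant(); // stand-in for CPU-bound rules
    await Task.Delay(10);                      // stand-in for an external API call
    if (upper.Length > 0) Interlocked.Increment(ref processed);
}

// Four consumers drain the channel concurrently.
Task[] consumers = Enumerable.Range(0, 4).Select(worker => Task.Run(async () =>
{
    await foreach (string message in channel.Reader.ReadAllAsync())
    {
        await HandleAsync(message);
    }
})).ToArray();

// Producer: stand-in for the message receive loop.
for (int i = 0; i < 200; i++)
{
    await channel.Writer.WriteAsync($"message {i}");
}
channel.Writer.Complete(); // lets the consumers finish once the queue is empty

await Task.WhenAll(consumers);
Console.WriteLine(processed); // prints 200
```

The bounded capacity gives back-pressure: if the consumers fall behind, the producer pauses instead of piling up messages in memory.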
General
(...) use await Task.Run for CPU bound operations. And use await async for the HTTP external calls.
This advice comes from the fact that if you run code that doesn't 'let go' enough, you may not get much of the benefit Tasks give, because the current thread or thread pool simply cannot handle the work. In the extreme case, when you run 100% synchronous code, you won't get any parallelism because the current thread cannot let go and cannot do any other work, so your tasks are executed sequentially. It's important to remember that this is not a style issue; running busy synchronous code with Tasks just doesn't work well in some scenarios. In this sense the problem polices itself: if you structure the solution incorrectly, it doesn't do what you need.
If you run a mixture of busy and waiting then Task.Run may or may not be great and it will depend on the specific workload. If it works for you, it's fine - you're not doing anything incorrect.
Generally the picture is nuanced, and tasks can be and are used to do all kinds of jobs. In certain circumstances the situation is clear-cut, e.g. if you run long-running work on the UI thread you will lock the UI, which is bad, or if you have (long-running) busy synchronous code. It's worth keeping in mind that this was already a problem before C# had Tasks.
BTW. If you look at the reference documentation for Task.WhenAll Method it contains examples with both I/O (ping) and CPU (dummy for loop) style work. Yes, these are toy examples but it shows it isn't incorrect to run both types of work with tasks.
Parallel.ForEachAsync?
If you can use .NET 6, Parallel.ForEachAsync could improve performance of your solution and/or make the code look cleaner. Example on Twitter (as picture!).
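A minimal sketch of Parallel.ForEachAsync, assuming .NET 6 or later. The integer "messages", the squaring step and the degree of parallelism of 4 are placeholders invented for this example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

var results = new ConcurrentBag<int>();
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

// Processes up to four items at a time; each body may mix CPU work and awaits.
await Parallel.ForEachAsync(Enumerable.Range(1, 20), options, async (message, cancellationToken) =>
{
    int cpu = message * message;             // stand-in for CPU-bound rules
    await Task.Delay(20, cancellationToken); // stand-in for an external API call
    results.Add(cpu);
});

Console.WriteLine(results.Sum()); // prints 2870
```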

Can I change 'Task.Run' to 'async void'? (Xamarin.forms)

I'm building an app with Xamarin.Forms (PCL). Even though I have already launched my app on the store, I'm still a newbie in the C# world, and I'm having trouble with threads in particular.
On XF/iOS, I found that after the app had been running for a while (longer than a day), none of the Task.Run() calls in my code started a new thread. Someone suggested that I might be starting many threads that somehow never terminate, so new threads can't be started.
So I searched my project and found Task.Run in about 20 places in my code.
I used it whenever I called an 'async Task' method, even when a background thread was not necessary.
So I'm going to replace it with 'async void'. In fact, I have already made the change shown below, with no problems.
Let's say AAAAA() is an 'async Task' method from some NuGet library I'm using, so I cannot change the method.
void Something()
{
    ...
    Task.Run(async () => await XXXXX.AAAAA());
    ...
}
to
async void Something()
{
    ...
    await XXXXX.AAAAA();
    ...
}
But sometimes I can't easily change a method to async. In those cases I'm going to change the code like this:
void Something()
{
    ...
    AA();
    ...
}
async void AA()
{
    await XXXXX.AAAAA();
}
Is this OK as long as a background thread is not necessary?
I ask because I have watched lots of videos that say not to use "async void".
I wonder whether I can use it like this if there seems to be no problem.
Any advice will help me.
Thanks.
Don't use async void. It is considered a bad practice for several reasons.
Instead, try to solve your threading problems from the root with a good approach to asynchronous programming.
1. Define your task boundaries
Do not just "fire and forget". Expect your task to end and release resources. There are good reasons not to do Task.Run(...) and forget about it.
Async methods exist for a reason. They return in the Future (to quote the Java world). If you fire too many async tasks that take a long time to complete or get stuck in a loop, you drain your system resources and may end up unable to spawn new tasks.
So analyze your problem; don't just run random methods from random packages. Design your workflow and identify parallelisms.
A simple, straightforward solution is Task.Run(()=>).Wait(). This destroys all parallelism but constrains resource use and, most importantly, fits with your synchronous programming.
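To make the "task boundary" idea concrete, here is a sketch in which the method returns Task instead of void, so the caller can wait for it to end and observe a failure. LibraryCallAsync is a hypothetical stand-in for XXXXX.AAAAA(), and the failure is simulated; with async void the caller could neither await the call nor catch its exception.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for XXXXX.AAAAA() from the library.
static async Task LibraryCallAsync(bool fail)
{
    await Task.Delay(20);
    if (fail) throw new InvalidOperationException("library call failed");
}

// Returns Task (not void), so completion and exceptions flow to the caller.
static async Task SomethingAsync(bool fail)
{
    await LibraryCallAsync(fail);
}

string outcome;
try
{
    await SomethingAsync(fail: true);
    outcome = "completed";
}
catch (InvalidOperationException ex)
{
    outcome = $"caller observed: {ex.Message}";
}

Console.WriteLine(outcome); // prints "caller observed: library call failed"
```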
2. A Task is not a Thread
While I discourage the unbounded/uncontrolled use of threads, the truth is that Task.Run(...) won't necessarily spawn new threads. It queues the work to the thread pool, so under some circumstances (for example, when every pool thread is busy) the work may not start for a while.
For example, I was forced to do this to force a new thread to start:
Task.Factory.StartNew(() => ...,
    cancellationToken: tokenSource.Token,
    creationOptions: TaskCreationOptions.LongRunning,
    scheduler: TaskScheduler.Default);
TaskCreationOptions.LongRunning hints to the scheduler that the task will run for a long time; the default scheduler typically responds by giving it a dedicated thread instead of a thread pool thread. Normally Task.Run queues work to the thread pool, where a limited number of threads pick up one queued task after another. If your synchronous code blocks those threads, the runtime may be unable to start other tasks until a thread is released or the pool adds a new one.
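A small sketch for checking where each kind of task runs. It relies on how the default scheduler currently treats LongRunning, which is implementation behaviour rather than a documented guarantee, so treat the result as an observation about your runtime.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Task.Run queues the delegate to the thread pool.
bool runOnPool = await Task.Run(() => Thread.CurrentThread.IsThreadPoolThread);

// LongRunning hints that the work should not occupy a pool thread; the default
// scheduler currently responds by starting a dedicated thread for it.
bool longRunningOnPool = await Task.Factory.StartNew(
    () => Thread.CurrentThread.IsThreadPoolThread,
    CancellationToken.None,
    TaskCreationOptions.LongRunning,
    TaskScheduler.Default);

Console.WriteLine($"Task.Run on a pool thread: {runOnPool}");
Console.WriteLine($"LongRunning on a pool thread: {longRunningOnPool}");
```

Note that with an async delegate the LongRunning hint only covers the code up to the first await; the continuation usually resumes on a pool thread.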
3. TPL is made for 2 things
One is responsiveness. If your application is completely asynchronous, then good use of the TPL leaves your UI thread responsive during waits, e.g. if you click a button you won't see the whole window greyed out and "stuck". This behaviour was introduced by Microsoft to help developers who are unfamiliar with proper multithreaded programming.
The other is I/O optimization. If you need to download 5 files, parse a text file from disk and store a bunch of rows in the database, you can fire 7 tasks that overlap the I/O wait times of each task (e.g. SSL handshake, disk buffering, SQL response wait), so that all 7 tasks will complete in roughly the time taken by the longest.
If you invoke asynchronous methods just because you found them in your NuGet library, you are doing it wrong, as you may need to invoke the corresponding synchronous version instead.
Summarizing
Your question reveals a lack of understanding of parallel programming. In fact you said you are new to C#. Welcome to the world of .NET.
Parallel programming is not easy, and without knowledge of your application design it is impossible to help you in a single short answer. You need to work through several examples and/or ask questions about specific best practices for parts of your application, posting real or realistic code.

Async-Await vs ThreadPool vs MultiThreading on High-Performance Sockets (C10k Solutions?)

I'm really confused about async-awaits, pools and threads. The main problem starts with this question: "What can I do when I have to handle 10k socket I/O?" (aka The C10k Problem).
First, I tried to make a custom pooling architecture with threads that uses one main Queue and multiple Threads to process all incoming data. It was a great experience for understanding thread safety and multi-threading, but raw threads are overkill now that async-await exists.
Later, I implemented a simple architecture with async-await, but I can't understand why "The async and await keywords don't cause additional threads to be created." (from MSDN). I think there must be some threads doing the work, as with BackgroundWorker.
Finally, I implemented another architecture with ThreadPool, and it looks like my first custom pooling.
Now, I think there must be others who, like me, are confused about handling C10k. My project is a dedicated (central) server for my game: a hub/lobby server like MCSG's lobbies or COD's matchmaking servers. It will handle login operations, game server command executions/queries and information serving (such as version and patch).
The last part might be specific to my project, but I really need some good suggestions for real-world approaches to heavy, concurrent data handling.
(And yes, whether it is 1k, 10k or 100k connections depends on the server hardware, but this is a general question.)
The key point: Choosing Between the Task Parallel Library and the ThreadPool (MSDN Blog)
[ADDITIONAL] Good (basic) things to read for anyone who wants to understand what we are talking about:
Threads
Async, Await
ThreadPool
BackgroundWorker
async/await is roughly analogous to the "Serve many clients with each thread, and use asynchronous I/O and completion notification" approach in your referenced article.
While async and await by themselves do not cause any additional threads, they will make use of thread pool threads if an async method resumes on a thread pool context. Note that the async interaction with ThreadPool is highly optimized; it is very doubtful that you can use Thread or ThreadPool to get the same performance (with a reasonable time for development).
If you can, I'd recommend using an existing protocol - e.g., SignalR. This will greatly simplify your code, since there are many (many) pitfalls to writing your own TCP/IP protocol. SignalR can be self-hosted or hosted on ASP.NET.
No. If we use the asynchronous programming pattern that .NET introduced in 4.5, in most cases we do not need to create threads manually. The compiler does the difficult work that the developer used to do. Creating a new thread is costly; it takes time. Unless we need to control a thread, the “Task-based Asynchronous Pattern (TAP)” and the “Task Parallel Library (TPL)” are good enough for asynchronous and parallel programming. TAP and TPL use Task. In general, a Task uses a thread from the ThreadPool (a thread pool is a collection of threads already created and maintained by the .NET Framework). If we use Task, in most cases we do not need to use the thread pool directly. A thread can do many more useful things. You can read more about Thread Pooling.
You can avoid performance bottlenecks and enhance the overall responsiveness of your application by using asynchronous programming. Asynchrony is essential for activities that are potentially blocking, such as when your application accesses the web. Access to a web resource sometimes is slow or delayed. If such an activity is blocked within a synchronous process, the entire application must wait. In an asynchronous process, the application can continue with other work that doesn't depend on the web resource until the potentially blocking task finishes.
Await is specifically designed to deal with something taking time, most typically an I/O request. Traditionally this was done with a callback that ran when the I/O request completed. Writing code that relies on these callbacks is quite difficult; await greatly simplifies it. Await just takes care of dealing with the delay; it doesn't otherwise do anything that a thread does. The await expression, what's to the right of the await keyword, is what gets the job done. You can use await with any method that returns a Task. The XxxxAsync() methods are just precooked ones in the .NET Framework for common operations that take time, like downloading data from a web server.
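A sketch of why awaiting does not need one thread per pending operation. ReadAsync is a hypothetical stand-in for a socket read and uses Task.Delay to simulate 200 ms of latency; the count of 10,000 simply echoes the C10k scenario from the question. The thread count printed at the end depends on the machine and runtime, so no particular number is implied.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for one socket read whose data takes 200 ms to arrive.
static async Task<int> ReadAsync(int connectionId)
{
    await Task.Delay(200); // no thread is blocked while waiting
    return connectionId;
}

var stopwatch = Stopwatch.StartNew();

// Start 10,000 pending "reads" and wait for all of them.
int[] ids = await Task.WhenAll(Enumerable.Range(0, 10_000).Select(id => ReadAsync(id)));
stopwatch.Stop();

Console.WriteLine($"{ids.Length} waits finished in {stopwatch.ElapsedMilliseconds} ms");
Console.WriteLine($"OS threads in process: {Process.GetCurrentProcess().Threads.Count}");
```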
I would recommend you to read Asynchronous Programming with Async and Await

Scheduling of I/O-bound operations in .NET

If I'm on a thread which doesn't need to be responsive, and for which continued execution relies on the result of an I/O bound call (HttpClient request), is there any value in implementing the call asynchronously in .NET?
Will Windows know that I'm waiting on an I/O operation and refrain from scheduling the thread until data arrives?
I recall reading somewhere that it does, but I'm afraid I still have difficulty understanding how this works and when I can rely on it.
No, there is no value in using async there. As you suspect, Windows will know that the thread is waiting for IO and won't schedule the thread until the data arrives.
However, the idea of async is that you don't really need to create a new thread. The idea of async is that (I'm cutting a few corners here; there is better documentation available on the Internet) it tries to do something like you're doing here manually. So instead of you having to create a new thread, async does this for you. (It doesn't actually create a new thread, but you get the idea.)
If this needs to be high performance, I would not advise doing it the way you're implementing it now. Async would be much better for this. In your case, when you're doing 1000 requests, you would have 1000 threads, which is not a good idea. Async would handle this in a much smarter way and will give you better performance.
The basic advantage of using async (besides performance) is that it's like you're actually programming only on the UI thread. Previously, that would have locked up your application, but with async your application stays responsive. That's really the primary advantage of async.

How to do multi-threading with asynchronous webrequests

I'm trying to implement a .NET 4 helper/utility class that retrieves HTML page sources from a list of URLs for a web testing tool. The solution should be scalable and have high performance.
I have been researching and trying different solutions for many days, but cannot find a proper one.
Based on my understanding, the best way to achieve my goal would be to use asynchronous web requests running in parallel using TPL.
In order to have full control over headers etc., I'm using HttpWebResponse instead of WebClient, which wraps HttpWebResponse. In some cases the output should be chained to other tasks, so using TPL tasks could make sense.
What I have achieved so far after many different trials/approaches:
I implemented basic synchronous, asynchronous (APM) and parallel (using TPL tasks) solutions to compare their performance.
To see the performance of an asynchronous parallel solution, I used the APM approach (BeginGetResponse and BeginRead) and ran it in Parallel.ForEach. Everything works fine and I'm happy with the performance. Still, I feel that using a simple Parallel.ForEach is not the way to go; for example, I don't know how I would use task chaining.
Then I tried a more sophisticated system that wraps the APM solution in tasks, using TaskCompletionSource and an iterator to step through the APM flow. I believe this solution could be what I'm looking for, but there is a strange delay of 6-10 s, which happens 2-3 times when running a list of 500 URLs.
Based on the logs, when the delay happens, execution has gone back to the thread that calls the async fetch in a loop. The delay doesn't always happen when execution moves back to the loop, just 2-3 times; at other times it works fine. It looks as if the looping thread creates a set of tasks that are processed by other threads, and once most or all of those tasks are completed there is a delay (6-8 s) before the loop continues creating the remaining tasks and the other threads become active again.
The principle of iterator inside loop is:
IEnumerable<Task> DoExample(string input)
{
    var aResult = DoAAsync(input);
    yield return aResult;
    var bResult = DoBAsync(aResult.Result);
    yield return bResult;
    var cResult = DoCAsync(bResult.Result);
    yield return cResult;
    …
}
Task t = Iterate(DoExample("42"));
I'm handling the connection limit with System.Net.ServicePointManager.DefaultConnectionLimit and the timeout with ThreadPool.RegisterWaitForSingleObject.
My question is simply: what would be the best approach to implement a helper/utility class for retrieving HTML pages which would:
be scalable and have high performance
use webrequests
be easily chained to other tasks
be able to use timeout
use .NET 4 framework
If you think that the solution I presented above (APM, TaskCompletionSource and an iterator) is fine, I would appreciate any help in solving the delay problem.
I'm totally new to C# and Windows development, so please bear with me if something I'm trying doesn't make much sense.
Any help would be highly appreciated, as without getting this solved I will have to drop my test tool development.
Thanks
Using iterators was a great solution in the pre-TPL .NET (e.g., the Coordination and Concurrency Runtime (CCR) out of MS Robotics made heavy use of them and helped inspire TPL). One problem is that iterators alone aren't going to give you what you need - you also need a scheduler to effectively distribute the workload. That's almost done by Stephen Toub's snippet that you linked to - but note that one line:
enumerator.Current.ContinueWith(recursiveBody, TaskContinuationOptions.ExecuteSynchronously);
I think the intermittent problems you're seeing might be linked to forcing "ExecuteSynchronously" - it could be causing an uneven distribution of work across the available cores/threads.
Take a look at some of the other alternatives that Stephen proposes in his blog article. In particular, see what just doing a simple chaining of ContinueWith() calls will do (if necessary, followed by matching Unwrap() calls). The syntax won't be the prettiest, but it's the simplest and interferes as little as possible with the underlying work-stealing runtime, so you'll hopefully get better results.
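A minimal sketch of the ContinueWith/Unwrap chaining described above. The three DoXAsync helpers are hypothetical stand-ins that just append a letter; the real steps would wrap the APM calls. ContinueWith, Unwrap and Task.Factory.StartNew are all available in .NET 4, although this sketch is hosted with top-level statements from a newer compiler for brevity.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-ins for the DoAAsync/DoBAsync/DoCAsync steps.
static Task<string> DoAAsync(string input) => Task.Factory.StartNew(() => input + "-A");
static Task<string> DoBAsync(string input) => Task.Factory.StartNew(() => input + "-B");
static Task<string> DoCAsync(string input) => Task.Factory.StartNew(() => input + "-C");

// Each ContinueWith here yields Task<Task<string>>; Unwrap flattens it to Task<string>,
// which completes only when the inner step has completed.
Task<string> chained = DoAAsync("42")
    .ContinueWith(a => DoBAsync(a.Result)).Unwrap()
    .ContinueWith(b => DoCAsync(b.Result)).Unwrap();

string output = await chained;
Console.WriteLine(output); // prints 42-A-B-C
```

Reading a.Result inside a continuation does not block, because the antecedent task has already finished by the time the continuation runs.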
