.NET async framework - is there a limitation with thread-local data - c#

I am looking to write a Windows Service that will start various "jobs".
Each "job" will:
be distinct in what it accomplishes
run for the lifetime of the Service, so "long running". Typically, a job will get 10 tasks from the database and process them, then sleep, and then repeat this cycle again and again.
share the same "context". The application will be loosely coupled and call an IoC to get classes. It will also store some data on this context too
I need each job to be able to run in parallel and effectively run as separate programs.
My first thought was to create one thread per job. This is okay but has the drawback that a ManualResetEvent stops the thread in its tracks, and that Thread.Abort doesn't give the thread much chance to exit gracefully.
I then explored some of the new async framework in .NET 4.5 and boy does it seem to simplify coding.
However, whilst some of the data held on the context may be freely shared between each job, some cannot: so each job requires its own copy of certain data.
I attempted to solve this using ThreadLocal<T> properties. However, whilst this works fine for a specific thread that I've created, it doesn't work for async methods. The thread that starts an async method is often not the thread that finishes it, particularly when the method uses "await".
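To illustrate the problem, here is a minimal sketch (the field name and value are hypothetical): a value stored in a ThreadLocal<T> before an await is frequently gone afterwards, because the continuation can resume on a different thread pool thread. For what it's worth, AsyncLocal<T> (added in .NET 4.6) flows with the logical async context rather than with the thread.

// Requires: using System; using System.Threading; using System.Threading.Tasks;
static readonly ThreadLocal<string> JobData = new ThreadLocal<string>();

static async Task ProcessBatchAsync()
{
    JobData.Value = "job-specific data";   // stored on the thread that starts the method
    await Task.Delay(1000);                // the continuation may resume on a different pool thread
    Console.WriteLine(JobData.Value);      // frequently null here: the value stayed with the original thread
}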
So, what is the preferred pattern for what I am attempting to accomplish?
FYI: Albahari's posting was a great help.


Proper way to start and fire-and-forget asynchronous calls?

I have an async call (DoAsyncWork()) that I would like to start in a fire-and-forget way, i.e. I'm not interested in its result and would like the calling thread to continue even before the async method is finished.
What is the proper way to do this? I need this in both .NET Framework 4.6 and .NET Core 2, in case there are differences.
public async Task<MyResult> DoWorkAsync() { ... }

public void StarterA()
{
    Task.Run(() => DoWorkAsync());
}

public void StarterB()
{
    Task.Run(async () => await DoWorkAsync());
}
Is it one of those two or something different/better?
//edit: Ideally without any extra libraries.
What is the proper way to do this?
First, you need to decide whether you really want fire-and-forget. In my experience, about 90% of people who ask for this actually don't want fire-and-forget; they want a background processing service.
Specifically, fire-and-forget means:
You don't care when the action completes.
You don't care if there are any exceptions when executing the action.
You don't care if the action completes at all.
So the real-world use cases for fire-and-forget are astoundingly small. An action like updating a server-side cache would be OK. Sending emails, generating documents, or anything business related is not OK, because you would (1) want the action to be completed, and (2) get notified if the action had an error.
The vast majority of the time, people don't want fire-and-forget at all; they want a background processing service. The proper way to build one of those is to add a reliable queue (e.g., Azure Queue / Amazon SQS, or even a database), and have an independent background process (e.g., Azure Function / Amazon Lambda / .NET Core BackgroundService / Win32 service) processing that queue. This is essentially what Hangfire provides (using a database for a queue, and running the background process in-proc in the ASP.NET process).
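As a rough sketch of that kind of background processing service, using .NET Core's BackgroundService (the IJobQueue abstraction and its DequeueAsync/ProcessAsync methods are hypothetical placeholders, standing in for Azure Queue / SQS / a database table):

// Requires: using System; using System.Threading; using System.Threading.Tasks; using Microsoft.Extensions.Hosting;
public class QueueWorker : BackgroundService
{
    private readonly IJobQueue _queue;   // hypothetical abstraction over a reliable queue

    public QueueWorker(IJobQueue queue) => _queue = queue;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            var item = await _queue.DequeueAsync(stoppingToken);          // hypothetical dequeue
            if (item != null)
                await ProcessAsync(item);                                  // hypothetical work item handler
            else
                await Task.Delay(TimeSpan.FromSeconds(5), stoppingToken);  // nothing queued, back off briefly
        }
    }

    private Task ProcessAsync(object item) => Task.CompletedTask;          // placeholder
}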
Is it one of those two or something different/better?
In the general case, there are a number of small behavioral differences when eliding async and await. It's not something you would want to do "by default".
However, in this specific case - where the async lambda is only calling a single method - eliding async and await is fine.
It depends on what you mean by proper :)
For instance: are you interested in the exceptions being thrown in your "fire and forget" calls? If not, then this is sort of fine. Though what you might need to think about is in what environment the task lives.
For instance, if this is an ASP.NET application and you do this inside the lifetime of a thread instantiated by a call to an .aspx or .svc, the Task becomes a background thread of that (foreground) thread. The foreground thread might get cleaned up by the application pool before your "fire and forget" task is completed.
So also think about in which thread your tasks live.
I think this article gives you some useful information on that:
https://www.hanselman.com/blog/HowToRunBackgroundTasksInASPNET.aspx
Also note that if you do not return a value from your Tasks, a task will not surface exception info. The source for that is the reference book for Microsoft exam 70-483.
There is probably a free version of that online somewhere ;P https://www.amazon.com/Exam-Ref-70-483-Programming-C/dp/0735676828
It may also be useful to know that if you have an async method being called from non-async code and you wish to know its result, you can use .GetAwaiter().GetResult().
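For example, using the DoWorkAsync method from the question (note this blocks the calling thread, and can deadlock on a UI or classic ASP.NET synchronization context):

// Synchronously wait for the async method and get its result.
MyResult result = DoWorkAsync().GetAwaiter().GetResult();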
Also I think it is important to note the difference between async and multi-threading.
Async is only useful if there are operations that use parts of the computer other than the CPU, so things like networking or I/O operations. Using async then tells the system to go ahead and use CPU power somewhere else instead of "blocking" a thread in the CPU just waiting for a response.
Multi-threading is the allocation of operations across different threads (for instance, creating a task which creates a background thread of the foreground thread). Foreground threads are the threads that make up your application; they are primary. Background threads exist linked to foreground threads, and if you close the linked foreground thread, the background thread closes as well.
This allows the CPU to work on different tasks at the same time.
Combining these two makes sure the CPU does not get bogged down on just 4 threads on a 4-thread CPU, but can take on more work while it waits for async tasks that are waiting for I/O operations.
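A small sketch of that combination (inside an async method; the URLs are placeholders): several I/O-bound operations are in flight at once, and no thread sits blocked while they wait.

// Requires: using System.Net.Http; using System.Threading.Tasks;
using (var client = new HttpClient())
{
    Task<string>[] downloads =
    {
        client.GetStringAsync("https://example.com/a"),   // placeholder URLs
        client.GetStringAsync("https://example.com/b"),
        client.GetStringAsync("https://example.com/c")
    };
    string[] pages = await Task.WhenAll(downloads);        // threads stay free until all three complete
}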
I hope this gives you the information needed to do whatever it is you are doing :)

Async-Await vs ThreadPool vs MultiThreading on High-Performance Sockets (C10k Solutions?)

I'm really confused about async-awaits, pools and threads. The main problem starts with this question: "What can I do when I have to handle 10k socket I/O?" (aka The C10k Problem).
First, I tried to make a custom pooling architecture with threads that uses one main Queue and multiple Threads to process all incoming data. It was a great experience in understanding thread-safety and multi-threading, but raw threads are overkill compared with async-await nowadays.
Later, I implemented a simple architecture with async-await, but I can't understand why "The async and await keywords don't cause additional threads to be created." (from MSDN). I think there must be some threads doing the jobs, like a BackgroundWorker.
Finally, I implemented another architecture with ThreadPool, and it looks like my first custom pooling.
Now, I think there should be someone else out there who is confused about handling C10k. My project is a dedicated (central) server for my game project, a hub/lobby server like MCSG's lobbies or COD's matchmaking servers. I'll do the login operations, game server command executions/queries, and information serving (like version, patch).
The last part might be specific to my project, but I really need some good suggestions about real-world solutions for handling multiple (heavy) streams of data.
(Also yes, 1k-10k-100k connection handling depending on server hardware but this is a general question)
The key point: Choosing Between the Task Parallel Library and the ThreadPool (MSDN Blog)
[ADDITIONAL] Good (basic) things to read for anyone who wants to understand what we are talking about:
Threads
Async, Await
ThreadPool
BackgroundWorker
async/await is roughly analogous to the "Serve many clients with each thread, and use asynchronous I/O and completion notification" approach in your referenced article.
While async and await by themselves do not cause any additional threads, they will make use of thread pool threads if an async method resumes on a thread pool context. Note that the async interaction with ThreadPool is highly optimized; it is very doubtful that you can use Thread or ThreadPool to get the same performance (with a reasonable time for development).
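As a minimal sketch of that approach (assumptions: an echo-style handler inside an async method, C# 7 discard/local-function syntax, no error handling), each accepted client is served by an async method rather than by a dedicated thread:

// Requires: using System.Net; using System.Net.Sockets; using System.Threading.Tasks;
var listener = new TcpListener(IPAddress.Any, 9000);           // port is a placeholder
listener.Start();
while (true)
{
    TcpClient client = await listener.AcceptTcpClientAsync();  // no thread is blocked while waiting
    _ = HandleClientAsync(client);                              // handle each client concurrently
}

async Task HandleClientAsync(TcpClient client)
{
    using (client)
    using (NetworkStream stream = client.GetStream())
    {
        var buffer = new byte[4096];
        int read;
        while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
            await stream.WriteAsync(buffer, 0, read);           // echo back, as a stand-in for the real protocol
    }
}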
If you can, I'd recommend using an existing protocol - e.g., SignalR. This will greatly simplify your code, since there are many (many) pitfalls to writing your own TCP/IP protocol. SignalR can be self-hosted or hosted on ASP.NET.
No. If we use the asynchronous programming pattern that .NET introduced in 4.5, in most cases we do not need to create threads manually. The compiler does the difficult work the developer used to do. Creating a new thread is costly; it takes time. Unless we need to control a thread, the "Task-based Asynchronous Pattern (TAP)" and the "Task Parallel Library (TPL)" are good enough for asynchronous and parallel programming. TAP and TPL use Task. In general, a Task uses a thread from the ThreadPool (a thread pool is a collection of threads already created and maintained by the .NET framework). If we use Task, in most cases we do not need to use the thread pool directly, and the thread can do many more useful things. You can read more about Thread Pooling.
You can avoid performance bottlenecks and enhance the overall responsiveness of your application by using asynchronous programming. Asynchrony is essential for activities that are potentially blocking, such as when your application accesses the web. Access to a web resource sometimes is slow or delayed. If such an activity is blocked within a synchronous process, the entire application must wait. In an asynchronous process, the application can continue with other work that doesn't depend on the web resource until the potentially blocking task finishes.
Await is specifically designed to deal with something taking time, most typically an I/O request, which traditionally was handled with a callback when the I/O request was complete. Writing code that relies on these callbacks is quite difficult; await greatly simplifies it. Await just takes care of dealing with the delay; it doesn't otherwise do anything that a thread does. The await expression, what's to the right of the await keyword, is what gets the job done. You can use await with any method that returns a Task. The XxxxAsync() methods are just precooked ones in the .NET framework for common operations that take time, like downloading data from a web server.
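For example, one of those precooked methods in use inside an async method (a sketch; the URL is a placeholder):

// Requires: using System; using System.Net;
using (var client = new WebClient())
{
    string page = await client.DownloadStringTaskAsync("https://example.com");  // thread is free while the download runs
    Console.WriteLine(page.Length);
}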
I would recommend reading Asynchronous Programming with Async and Await.

Indefinitely running Producer/Consumer application

The problem I'm tasked to resolve is (from my understanding) a typical producer/consumer problem. We have data coming in 24/7/365. The incoming data (call it raw data) is stored in a table and is unusable for the end user. We then select all raw data that has not been processed and start processing it one by one. After each unit of data is processed, it's stored in another table and is ready to be consumed by the client application.
The process from loading the raw data to persisting the processed data takes 2-5 seconds on average. But it's highly dependent on the third-party web services that we use to process the data. If the web services are slow, we are no longer processing data as fast as we're getting it in and we accumulate a backlog, causing our customers to lose the live feed.
We want to make this process a multithreaded one. From my research I can see that the process can be divided into three discrete parts (a sketch of these stages in code follows the list):
LOADING - A loader task (producer) that runs indefinitely and loads unprocessed data from the DB into a BlockingCollection<T> (or some other variation of a concurrent collection). My choice of BlockingCollection is due to the fact that it is designed with the Producer/Consumer pattern in mind and offers the GetConsumingEnumerable() method.
PROCESSING - Multiple consumers that consume data from the above BlockingCollection<T>. In its current implementation I have a Parallel.ForEach loop through GetConsumingEnumerable() that on each iteration starts a task with two task continuations: First step of the task is to call a third party web service, wait for the result and output the result for the second task to consume. Second task does calculations based on the first task's output and outputs the result for the third task, which basically just stores that result into the second BlockingCollection<T> (this one being an output collection). So my consumers are effectively producers too. Ideally each unit of data that has been loaded by the task 1 would be queued for processing in parallel.
PERSISTING - A single consumer runs against the second BlockingCollection mentioned above and persists processed data into database.
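For reference, a minimal sketch of the three stages described above (RawItem, ProcessedItem, and the database/web-service calls are placeholders for the real types and methods, and the cancellation token is not shown being created):

// Requires: using System.Collections.Concurrent; using System.Linq; using System.Threading; using System.Threading.Tasks;
var toProcess = new BlockingCollection<RawItem>(boundedCapacity: 1000);
var toPersist = new BlockingCollection<ProcessedItem>(boundedCapacity: 1000);

// 1. LOADING - a single producer feeding the processing queue
Task loader = Task.Run(() =>
{
    while (!cancellation.IsCancellationRequested)             // 'cancellation' is a CancellationToken, not shown
        foreach (RawItem item in LoadUnprocessedFromDb())     // hypothetical query
            toProcess.Add(item);
});

// 2. PROCESSING - several consumers that are also producers for the persist stage
Task[] workers = Enumerable.Range(0, 8).Select(_ => Task.Run(() =>
{
    foreach (RawItem item in toProcess.GetConsumingEnumerable())
    {
        var serviceResult = CallThirdPartyService(item);      // hypothetical web-service call
        toPersist.Add(Calculate(item, serviceResult));        // hypothetical calculation
    }
})).ToArray();

// 3. PERSISTING - a single consumer writing to the database
Task persister = Task.Run(() =>
{
    foreach (ProcessedItem item in toPersist.GetConsumingEnumerable())
        SaveToDb(item);                                        // hypothetical insert
});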
The problem I'm facing is item number 2 from the list above. It does not seem to be fast enough (just by using Parallel.ForEach). Inside the Parallel.ForEach, instead of directly starting a task with continuations, I tried starting a wrapping thread that would in turn start the processing task. But this caused an OutOfMemoryException, because the thread count went out of control and soon reached 1200. I also tried scheduling work using the ThreadPool, to no avail.
Could you please advise if my approach is good enough for what we need done, or is there a better way of doing it?
If the bottleneck is some 3rd-party service, and it will not handle parallel execution but will queue your requests, then you cannot do a thing.
But first you can try this:
use the ThreadPool or Tasks (those will use the ThreadPool too) - don't fire up Threads yourself
try to make your requests async instead of tying up a thread exclusively (see the sketch after this list)
run your service/app through a performance profiler and check where you are "wasting" your time
make a spike/check against the 3rd-party service and see how it handles parallel requests
think about caching the answers from this service (if possible)
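A rough sketch of the "make the request async" suggestion (the queue, item types, and service call are hypothetical; you would typically run several of these consumers concurrently):

// Awaiting the third-party call keeps worker threads free while the service responds.
async Task ProcessAsync(BlockingCollection<RawItem> queue)
{
    foreach (RawItem item in queue.GetConsumingEnumerable())
    {
        var serviceResult = await CallThirdPartyServiceAsync(item);   // hypothetical async web-service call
        var processed = Calculate(item, serviceResult);               // hypothetical CPU-bound step
        Persist(processed);                                            // hypothetical store
    }
}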
That's all I can think of without further info right now.
I recently faced a problem which was very similar to yours. Here's what I did; I hope it helps:
It seems like your 1st and 3rd parts are rather simple and can be managed on their respective threads without any problem.
The 2nd part must first be started on a new thread. Then use a System.Threading.Timer to make your web-service calls; the method that calls the web service passes the response (result) to the processing method by invoking it asynchronously and letting it process the data at its own pace.
This solved my problem. I hope it helps you too; if you have any doubts, ask me and I'll explain it here...
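In code, that description might look roughly like this (a sketch only; the web-service call and processing method are placeholders, and Task.Run stands in for the asynchronous invoke mentioned above):

// Requires: using System; using System.Threading; using System.Threading.Tasks;
var timer = new System.Threading.Timer(_ =>
{
    var response = CallWebService();               // hypothetical web-service call
    Task.Run(() => ProcessResponse(response));     // hand the result off so the timer callback returns quickly
}, null, TimeSpan.Zero, TimeSpan.FromSeconds(5));  // interval is a placeholder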

parallel execution freezes service

I have a problem understanding how the Workflow Scheduler works. My architecture is the following: I have several operations that asynchronously call a service from the UI; it initializes a new WorkflowApplication and calls its Run() method. It then takes some time to accomplish the operation, goes through some steps, and then an activity does the big work.
I understand the workflow scheduler can process one workflow instance at a time, but while workflows are running, that seems to "freeze" my entire website; I can't access any other service, and it becomes slow until all the workflows finish. (I have also tried to call the service just once and start all the workflows inside it, but the behavior is pretty much the same.)
Could someone help me understand this? Is there some way to avoid it?
My first impression is that there is thread starvation, or similar contention on another resource, going on.
Are you using a WorkflowApplication or a WorkflowServiceHost to execute your workflow? I believe the former, but it isn't quite clear from your question.
If you are using a WorkflowApplication: are you setting the SynchronizationContext, and are you waiting for the workflow to complete before finishing the request?
How many workflows are you starting, and approximately how many ASP.NET requests are being executed?

.NET threading solution for long queries

Scenario
We have a C# .Net Web Application that records incidents. An external database needs to be queried when an incident is approved by a supervisor. The queries to this external database are sometimes taking a while to run. This lag is experienced through the browser.
Possible Solution
I want to use threading to eliminate the perceived hang in the browser. I have used the Thread class before and heard about ThreadPool. But, I just found BackgroundWorker in this post.
MSDN states:
The BackgroundWorker class allows you to run an operation on a separate, dedicated thread. Time-consuming operations like downloads and database transactions can cause your user interface (UI) to seem as though it has stopped responding while they are running. When you want a responsive UI and you are faced with long delays associated with such operations, the BackgroundWorker class provides a convenient solution.
Is BackgroundWorker the way to go when handling long running queries?
What happens when 2 or more BackgroundWorker processes are run simultaneously? Are they handled like a pool?
Yes, BackgroundWorker can significantly simplify your threading code for long-running operations. The key is registering for the DoWork, ProgressChanged, and RunWorkerCompleted events. These help you avoid having to have a bunch of synchronization objects passed back and forth with the thread to try to determine the progress of the operation.
Also, I believe the progress events are called on the UI thread, avoiding the need for calls to Control.Invoke to update your UI.
To answer your last question: yes, threads are allocated from the .NET thread pool, so while you may instantiate as many BackgroundWorker objects as you'd like, you can only run as many concurrent operations as the thread pool will allow.
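As a hedged sketch of that event-based pattern (the query and UI-update methods are placeholders; the progress/completed handlers run on the UI thread in a WinForms/WPF-style app):

// Requires: using System.ComponentModel;
var worker = new BackgroundWorker { WorkerReportsProgress = true };

worker.DoWork += (s, e) =>
{
    var w = (BackgroundWorker)s;
    w.ReportProgress(10);                                  // arbitrary progress values
    e.Result = RunExternalQuery((int)e.Argument);          // hypothetical long-running query
    w.ReportProgress(100);
};

worker.ProgressChanged += (s, e) =>
    UpdateStatus(e.ProgressPercentage);                    // hypothetical UI update

worker.RunWorkerCompleted += (s, e) =>
{
    if (e.Error != null) ShowError(e.Error);               // hypothetical error display
    else DisplayResult(e.Result);                          // hypothetical result display
};

worker.RunWorkerAsync(incidentId);                         // hypothetical argument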
If you're using .NET 4 (or can use the TPL backport from the Rx Framework), then one nice option is to use a Task created with the LongRunning hint.
This provides many options difficult to accomplish via the ThreadPool or BackgroundWorker, including allowing for continuations to be specified at creation time, as well as allowing for clean cancellation and exception/error handling.
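A sketch of what that might look like (the query and result-handling methods are placeholders; FromCurrentSynchronizationContext assumes there is a current synchronization context):

// Requires: using System; using System.Threading; using System.Threading.Tasks;
var cts = new CancellationTokenSource();

Task<QueryResult> queryTask = Task.Factory.StartNew(
    () => RunExternalQuery(cts.Token),                     // hypothetical long-running query
    cts.Token,
    TaskCreationOptions.LongRunning,                       // hint: use a dedicated thread rather than a pool thread
    TaskScheduler.Default);

queryTask.ContinueWith(
    t => DisplayResult(t.Result),                          // hypothetical result handling
    CancellationToken.None,
    TaskContinuationOptions.OnlyOnRanToCompletion,
    TaskScheduler.FromCurrentSynchronizationContext());

queryTask.ContinueWith(
    t => ShowError(t.Exception),                           // runs only if the query faulted
    CancellationToken.None,
    TaskContinuationOptions.OnlyOnFaulted,
    TaskScheduler.FromCurrentSynchronizationContext());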
I ran into a similar situation with long-running queries. I used the asynchronous invoke provided by delegates: you can use the BeginInvoke method of the delegate.
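Roughly like this (a sketch; the query method and argument are placeholders, and note that delegate BeginInvoke is .NET Framework only):

// APM-style asynchronous invoke on a delegate.
Func<int, QueryResult> query = RunExternalQuery;           // hypothetical query method

IAsyncResult ar = query.BeginInvoke(incidentId, asyncResult =>
{
    QueryResult result = query.EndInvoke(asyncResult);     // completes on a thread pool thread
    DisplayResult(result);                                  // hypothetical handling of the result
}, null);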
BackgroundWorkers are just like any other threads, except they can be killed or quit without exiting the main thread and your application.
The ThreadPool maintains a pool of background threads. It is the preferred way for most multi-threading scenarios because .NET manages the threads for you and re-uses them instead of creating new ones as needed, which is an expensive process.
Such threading scenarios are great for processor-intensive code.
For something like a query which happens externally, you also have the option of asynchronous data access. You can hand off the query request and give it the name of your callback method, which will be called when the query is finished, and then do something with the result (e.g., update the UI status or display the returned data).
.NET has built-in support for asynchronous data querying:
http://www.devx.com/dotnet/Article/26747
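For example, a hedged sketch of the callback-based asynchronous data access mentioned above (the connection string and query are placeholders; older ADO.NET versions needed "Asynchronous Processing=true" in the connection string for this):

// Requires: using System.Data.SqlClient;
var connection = new SqlConnection(connectionString);                          // placeholder connection string
var command = new SqlCommand("SELECT * FROM ExternalIncidents", connection);   // placeholder query
connection.Open();

command.BeginExecuteReader(asyncResult =>
{
    using (SqlDataReader reader = command.EndExecuteReader(asyncResult))
    {
        while (reader.Read())
        {
            // hypothetical: collect the returned data / update status for the UI
        }
    }
    connection.Close();
}, null);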
