We're developing a web application in ASP.NET MVC 4 intended for hundreds of users.
We need to have a background service per user to work in an interval of a few minutes.
We are not sure whether to use Windows Services (multiple Windows services) or a thread pool of processes. We are leaning toward Windows Services because they are easy to maintain through Windows Server, and that approach would save us the overhead of programming a UI and managing threads. A service can also easily run on a timed interval.
Is it possible for a Windows Service to automatically start a new instance for a user who has just signed up (so that we have multiple background Windows Service instances, one per user)? If not, the Windows Services option is ruled out.
If that is possible, should we choose the Windows Services approach or build our own managed thread pool of processes?
Certainly, starting a process per user guarantees you high memory overhead and non-scalability when you get into the 1000s. I don't see what starting a process (as opposed to a thread) could possibly save because the new process will contain at least one thread. Also, Windows Services have nothing to do with "logged in users". They are not made for multi-instancing.
You seem to want to run background work in ASP.NET MVC. Be aware that this is hard. Using one Windows Service can make sense here.
The hard thing about background work is that worker processes can exit for many reasons. You must tolerate this. A service has the same problem: You need to deploy new versions and restart the server regularly. You also need an HA strategy so you need multiple servers.
I'm not convinced that a Windows Service would be a better choice even for long-running background work.
With 100s of concurrent background workers you should probably use async IO to not have 100s of threads dedicated.
I assume your background work waits most of the time. You can make the waiting async (using a timer) and all the rest synchronous. That gives you a simple implementation and vast savings in memory usage.
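As a rough illustration of "async waiting, synchronous work", here is a minimal sketch. The names `PerUserScheduler`, `RunForUserAsync` and `RunAllAsync` are made up for this example, and it assumes the per-user work is a short synchronous delegate. Each user gets a logical loop, but no thread is held while a loop is waiting for its next interval.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of any framework.
public static class PerUserScheduler
{
    // Runs `work` for one user every `interval` until the token is cancelled.
    public static async Task RunForUserAsync(
        string userId, TimeSpan interval, Action<string> work, CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            work(userId); // synchronous body, kept deliberately simple
            try
            {
                // Timer-based wait: the thread goes back to the pool meanwhile.
                await Task.Delay(interval, token);
            }
            catch (OperationCanceledException)
            {
                break; // shutdown was requested while waiting
            }
        }
    }

    // Starts one loop per user; all loops share the thread pool.
    public static Task RunAllAsync(
        IEnumerable<string> userIds, TimeSpan interval,
        Action<string> work, CancellationToken token)
    {
        var loops = new List<Task>();
        foreach (var id in userIds)
        {
            loops.Add(Task.Run(() => RunForUserAsync(id, interval, work, token)));
        }
        return Task.WhenAll(loops);
    }
}
```

With hundreds of users this creates hundreds of pending timers rather than hundreds of threads. It does not address the process-exit problem described above; state that must survive a restart still has to be persisted somewhere.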
We wrote an open-source project, Revalee, to handle this type of workload. It uses a Windows Service to manage the task scheduling, but leverages your existing ASP.NET MVC application to handle the action out-of-band. Tasks can number in the hundreds of thousands and are persisted until successfully dispatched.
Here is a simple workflow diagram:
 user initiated action
       |
       |        ..............................
       |        :       future callback      :
       V        V                            :
 ======================        ===========================
 |    Your web app    |        | Revalee Windows Service |
 |                    |        |                         |
 ======================        ===========================
       |                                    ^
       |      registers a callback now      |
       |____________________________________|
Using a separate service to perform background tasks might introduce additional burden:
Having two separate apps will certainly increase the complexity of your project: if you need any communication between the web app and the service, it will be more complex (and slower) than having the whole thing inside the web app. You also need to deploy two separate projects, so the plumbing overhead roughly doubles.
Performance wise, there is nothing you gain from this approach. On the contrary, having your own managed pool inside the web app will allow better scheduling of these threads and quite possibly allow you to run them more efficiently than simply letting Windows take care of this. You really don't want to spawn hundreds of processes (or threads) which would compete for resources simultaneously on the same machine.
If nothing else, keeping the whole functionality inside the web app might simplify your hosting options. Installing and managing a Windows service might require more privileges than a cheap hosting provider is prepared to give you.
Having said that, running background tasks in ASP.NET means that you need to be prepared to have your threads aborted abruptly, due to exceptions, recycling or any other reason IIS can think of. Running these background tasks in a separate process will certainly be less susceptible to these ASP.NET quirks, so it ultimately boils down to a compromise: how important it is for you to make sure these tasks are never interrupted, and does it justify additional programming and maintenance effort?
[Edit]
If your concern is how to schedule these tasks inside the service, take a look at some scheduling libraries for .NET, like Quartz. It allows better control over scheduling than simply using a timer and (should you ever need them) provides some advanced features like persisting jobs (useful if you want to make sure your jobs will finish after restarts) and clustering for large scale applications.
Using a simple timer will work, but make sure that you understand how each of the .NET timers dispatches its events.
Background: I have a simple ASP.NET Core 3.1 site. Very rarely (three or four times per week), a user might fill out a form that triggers an email to be sent.
I don't want to delay the page response while running the 'send email' operation (even though it only takes a second or two), so from everything I've read, it seems like the code that should handle the email should be a background worker/hosted service, and the Razor pages code should place the data object to be sent in a collection that gets monitored by the background service.
What I'm not fully understanding is why this is necessary in modern ASP.NET Core.
If I were doing this in a normal C# application (not ASP), I'd simply make the 'send email' method async (it's using MailKit, which has async methods) and call the async method without awaiting it, allowing the work to be done on the thread pool while the response thread continues.
But existing answers and blog posts say that calling an async method without an await in ASP is dangerous, due to the fact that IIS can restart ASP processes (application pool recycling).
Yet, most things I've read say Application Recycling is an artifact of old ASP when memory leaks were common, and it's not really a thing on .Net Core. Additionally, many ASP applications aren't even hosted in IIS anymore.
Further, as far as I can tell, IHostedService/Background Worker objects aren't doing anything special - they don't seem to add any additional threading; they just look like singletons that have additional notification for environment startup and shutdown.
So:
Is calling a fire-and-forget async method in ASP.NET Core still considered poor practice, especially if the fire and forget task is short-lived? If so, why? [see edit below for clarification]
Other than notifications for shutdown, is there any reason why a background service is considered better than borrowing a managed threadpool thread (via Task.Run or QueueBackgroundWorkItem)? Wouldn't waking a background service (if it was awaiting on object to be placed in a collection) consume a pool thread in the same way?
Edit: I acknowledge that starting a task, and reporting success to the user, when there's a chance that operation could be terminated, is poor form. There's benefit to being notified of a shutdown and being able to finalize tasks.
Perhaps a better question is, does the old behavior of cycling still exist in modern ASP (on IIS or Kestrel)? Are there other reasons an orderly shutdown might be triggered (other than server shutdown/manual stop)?
I would still call it a poor practice.
The main concern here as well as in the referenced post is mainly about the promise of task completion.
Without being aware of the ghost background tasks, the runtime will not be able to notify the tasks to gracefully stop. This may or may not cause serious issues depending on the status of the tasks at the point the termination occurs.
Using a fire-and-forget task often means your task is at risk of having to start all over again when the process restarts. Sometimes that is not even possible because the context is lost. Imagine your fire-and-forget task calls another web API with parameters provided by a web request. Those parameters are likely to be wiped from memory if the process restarts.
And remember, recycling is not always triggered by IIS or the server. It can also be triggered by people. Say your application runs into a memory leak and you decide to recycle the app process every hour as temporary relief. Then you need to make sure you don't break your background tasks.
In terms of hosting - it is still possible to host ASP.Net Core applications in-process, in which the app pool gets recycled by IIS after a configured time period, or by default 29 hours.
In terms of lifetime - hosted services are types you register to DI, so DI features could be used, for example, this built-in hosted service implements IDisposable, which means proper clean up could be done upon shutting down.
Frankly, background tasks and hosted services both allow you to do fire and forget. But when you need reliability and resilience, hosted services win.
To answer the second half of your question, the app will wait for all hosted services' StopAsync methods to finish before shutting down. As long as you await your Tasks in the hosted service, this effectively means you can assume your Tasks will be allowed to finish running before the app shuts down. The app could still be force-shutdown, which in that case, nothing is guaranteed anymore.
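To make the "queue plus hosted service" idea concrete, here is a minimal sketch. `QueuedWorker<T>` is a hypothetical class that only mirrors the StartAsync/StopAsync shape of IHostedService so that it compiles without the hosting package; in a real ASP.NET Core app you would more likely derive from BackgroundService and register it with DI. It uses System.Threading.Channels for the queue.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical stand-in for a hosted service that drains a work queue.
public sealed class QueuedWorker<T>
{
    private readonly Channel<T> _queue = Channel.CreateUnbounded<T>();
    private readonly Func<T, Task> _handler;
    private Task _loop = Task.CompletedTask;

    public QueuedWorker(Func<T, Task> handler)
    {
        _handler = handler;
    }

    // Called from request code: returns immediately, the work runs later.
    // Returns false once the worker has started shutting down.
    public bool Enqueue(T item)
    {
        return _queue.Writer.TryWrite(item);
    }

    public Task StartAsync()
    {
        _loop = Task.Run(() => ProcessAsync());
        return Task.CompletedTask;
    }

    private async Task ProcessAsync()
    {
        // Waiting for the next item is asynchronous, so no thread is held while idle.
        while (await _queue.Reader.WaitToReadAsync())
        {
            while (_queue.Reader.TryRead(out var item))
            {
                try { await _handler(item); }
                catch (Exception ex) { Console.Error.WriteLine(ex); } // one bad item must not end the loop
            }
        }
    }

    // Stop accepting work, then wait for queued items to drain,
    // unless the host gives up first (its token fires).
    public async Task StopAsync(CancellationToken hostGivesUp)
    {
        _queue.Writer.TryComplete();
        var gaveUp = Task.Delay(Timeout.Infinite, hostGivesUp);
        await Task.WhenAny(_loop, gaveUp);
    }
}
```

The point of the sketch is the shutdown path: because the loop is awaited in StopAsync, queued emails get a chance to go out during an orderly shutdown, which a bare Task.Run call cannot offer. A forced kill still loses whatever is in memory.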
If you need more guarantees about your background tasks, you should move them to run in a separate process. You could use something like Runly to make it easier to break out functionality into background jobs. It also makes it easy to provide real-time feedback to the user so that you are not lying to the user when you say "everything is done" while something is still running in the background.
Full disclosure: I cofounded Runly.
This has been bugging me for a while. Assuming that I am not using any explicit form of task parallelism (e.g. Parallel.ForEach), how many threads are natively used when I deploy a web application to a Windows Server?
In other words, if the web server has an eight-core processor, does the application only use one of those cores if I am not explicitly telling it to use more?
I'll bet I have missed something simple here, so flame on--but I still want to know the answer. Thanks.
First, you have to consider that a web server is, by its nature, a multi-threaded system because it has to be able to respond to multiple requests at the same time (there are ways to do this in a single-threaded environment, but it's not very efficient, and IIS does not support it). If two users access the site at the same time, in most cases each request is serviced by its own thread.
Once you get into scalability issues, and async support, then a thread can actually service multiple requests (or rather, a request can be put on a queue and the IIS worker threads can be reused to process other requests).
So you are looking at this from the aspect of a single user. And that single user will likely run only a single thread of activity during that web request (ignoring async, io completion ports, etc..). i.e. you can think of a single web request as being equal to a single thread in the general sense (but there are so many things that can spin off other threads, that this isn't something that you can really count on).
When you use something like Parallel.ForEach it's so that your single users request can now execute things in multiple threads to improve efficiency.
An IIS web site has a worker process associated with it. That worker process can be configured and tuned to control how many threads it uses. But just remember that your web site is running under the control of IIS, not as its own application.
The topic of threads can get really confusing as you move along .NET versions.
But to oversimplify, it depends on the processModel configuration for ASP.NET in machine.config (autoConfig=true/false, etc.).
At a bare minimum, in .NET 4.0, the min/max WorkerThreads and min/max IOCompletionThreads determine the number of threads in play.
For example, if minWorkerThreads = 1 (the default value), then the total minimum number of threads is 1 × the number of cores.
The same applies to maxWorkerThreads.
Based on load, the worker process ramps the thread count up between minWorkerThreads and maxWorkerThreads as needed to service requests, using an undisclosed algorithm (subject, of course, to the availability of ThreadPool threads, etc.).
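The per-core arithmetic above, and a way to inspect the limits of the current process, can be sketched like this. `ThreadPoolInfo` is a made-up helper name; `ThreadPool.GetMinThreads` and `ThreadPool.GetMaxThreads` are the standard .NET calls. The numbers reported depend on the runtime and host, so treat the output as something to observe rather than rely on.

```csharp
using System;
using System.Threading;

// Hypothetical helper for illustration only.
public static class ThreadPoolInfo
{
    // processModel values are per CPU, so the effective limit is value * cores.
    public static int EffectiveThreads(int perCpuSetting, int coreCount)
    {
        return perCpuSetting * coreCount;
    }

    // Reports the limits the current process is actually running with.
    public static string Describe()
    {
        ThreadPool.GetMinThreads(out int minWorker, out int minIo);
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
        return "cores=" + Environment.ProcessorCount
             + ", worker=" + minWorker + ".." + maxWorker
             + ", io=" + minIo + ".." + maxIo;
    }
}
```

For the eight-core server in the question, `ThreadPoolInfo.EffectiveThreads(1, 8)` gives a minimum of 8 worker threads under the default minWorkerThreads setting.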
I have a Work Tracker WPF application deployed on Windows Server 2008. The Tracker application communicates with a (Tracker) Windows service via a WCF service.
The user can create, edit, add, delete, or cancel any work entry from the Work Tracker GUI application. Internally, it sends a request to the Windows service, which picks up the work request and processes it on multiple threads. Each work request entry actually creates n work files (based on work priority) in an output folder.
So each work request takes some time to complete the work addition process.
Now my question: if I cancel the work entry that is currently being created, I want to stop the corresponding work in the Windows service at RUNTIME. The thread that is creating output files for that work should be STOPPED. All of its threads should be killed, and all thread resources should be released once the user requests CANCEL.
My workaround:
I use the Windows service OnCustomCommand method to send custom values to the Windows service at runtime. What happens now is that the service finishes processing the current work on the current thread (i.e. creating output files for the work item received), and only then handles the custom command that cancels the request.
Is there any way to stop the work item request as soon as the custom command arrives?
Any workaround is much appreciated.
Summary
You are essentially talking about running a task host for long running tasks, and being able to cancel those tasks. Your specific question seems to want to know the best way to implement this in .NET. Your architecture is good, although you are brave to roll your own rather than using existing frameworks, and you haven't mentioned scaling your architecture later.
My preference is for using the TPL Task object. It supports cancellation, and is easy to poll for progress, etc. You can only use this in .NET 4 onwards.
It is hard to provide code without basically designing a whole job hosting engine for you and knowing your .NET version. I have described the steps in detail below, with references to example code.
Your approach of using the Windows Service OnCustomCommand is fine, you could also use a messaging service (see below) if you have that option for client-service comms. This would be more appropriate for a scenario where you have many clients talking to a central job service, and the job service is not on the same machine as the client.
Running and cancelling tasks on threads
Before we look at your exact context, it would be good to review MSDN - Asynchronous Programming Patterns. There are three main .NET patterns to run and cancel jobs on threads, and I list them in order of preference for use:
TAP: Task-based Asynchronous Pattern
Based on Task, which has been available only since .NET 4
The preferred way to run and control any thread-based activity from .NET 4 onwards
Much simpler to implement than EAP
EAP: Event-based Asynchronous Pattern
Your only option if you don't have .NET 4 or later.
Hard to implement, but once you have understood it you can roll it out and it is very reliable to use
APM: Asynchronous Programming Model
No longer relevant unless you maintain legacy code or use old APIs.
Even with .NET 1.1 you can implement a version of EAP, so I will not cover this as you say you are implementing your own solution
The architecture
Imagine this like a REST based service.
The client submits a job, and gets returned an identifier for the job
A job engine then picks up the job when it is ready, and starts running it
If the client doesn't want the job any more, then they delete the job, using its identifier
This way the client is completely isolated from the workings of the job engine, and the job engine can be improved over time.
The job engine
The approach is as follows:
For a submitted task, generate a universal identifier (UID) so that you can:
Identify a running task
Poll for results
Cancel the task if required
return that UID to the client
queue the job using that identifier
when you have resources
run the job by creating a Task
store the Task in a dictionary against the UID as a key
When the client wants results, they send the request with the UID and you return progress by checking against the Task that you retrieve from the dictionary. If the task is complete they can then send a request for the completed data, or in your case just go and read the completed files.
When they want to cancel they send the request with the UID, and you cancel the Task by finding it in the dictionary and telling it to cancel.
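The steps above can be sketched as follows. `JobEngine`, `Submit`, `GetStatus` and `Cancel` are hypothetical names chosen for this example. The sketch skips the explicit "queue until you have resources" step and hands each job straight to the thread pool via Task.Run, which does its own queuing, and it keeps everything in memory.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical in-memory job engine keyed by a UID.
public sealed class JobEngine
{
    private sealed class Job
    {
        public readonly Task Work;
        public readonly CancellationTokenSource Cts;

        public Job(Task work, CancellationTokenSource cts)
        {
            Work = work;
            Cts = cts;
        }
    }

    private readonly ConcurrentDictionary<Guid, Job> _jobs =
        new ConcurrentDictionary<Guid, Job>();

    // Generates the UID, starts the job as a Task, stores it, and returns the UID.
    public Guid Submit(Action<CancellationToken> body)
    {
        var id = Guid.NewGuid();
        var cts = new CancellationTokenSource();
        var work = Task.Run(() => body(cts.Token), cts.Token);
        _jobs[id] = new Job(work, cts);
        return id;
    }

    // Polling: look the Task up by UID and report its status (null if unknown).
    public TaskStatus? GetStatus(Guid id)
    {
        return _jobs.TryGetValue(id, out var job) ? job.Work.Status : (TaskStatus?)null;
    }

    // Cancellation: look the Task up by UID and signal its token.
    public bool Cancel(Guid id)
    {
        if (!_jobs.TryGetValue(id, out var job)) return false;
        job.Cts.Cancel();
        return true;
    }
}
```

In the Windows service from the question, the OnCustomCommand handler (or the WCF operation) would only need to map the incoming request to a call to `Cancel` with the right UID. A fuller version would also remove finished jobs from the dictionary and dispose their token sources.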
Cancelling inside a job
Inside your code you will need to regularly check your cancellation token to see if you should stop running code (see How do I abort/cancel TPL Tasks? if you are using the TAP pattern, or Albahari if you are using EAP). At that point you will exit your job processing, and your code, if designed well, should dispose of IDisposables where required, remove big strings from memory, etc.
The basic premise of cancellation is that you check your cancellation token:
After a block of work that takes a long time (e.g. a call to an external API)
Inside a loop (for, foreach, do or while) that you control, you check on each iteration
Within a long block of sequential code, that might take "some time", you insert points to check on a regular basis
You need to define how quickly you need to react to a cancellation - for a windows service it should be within milliseconds, preferably, to make sure that windows doesn't have problems restarting or stopping the service.
Some people do this whole process with threads, and by terminating the thread - this is ugly and not recommended any more.
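For the "check inside a loop you control" case, a minimal sketch might look like this. `WorkFileWriter.CreateWorkFiles` and the `writeFile` delegate are hypothetical stand-ins for the code that creates the n work files for one request; the only real API involved is `CancellationToken.ThrowIfCancellationRequested`.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the per-request file creation loop.
public static class WorkFileWriter
{
    // Writes up to fileCount files and returns how many were written.
    // Throws OperationCanceledException as soon as cancellation is observed.
    public static int CreateWorkFiles(
        int fileCount, Action<int> writeFile, CancellationToken token)
    {
        int written = 0;
        for (int i = 0; i < fileCount; i++)
        {
            // One check per iteration: reaction time is at most one file write.
            token.ThrowIfCancellationRequested();
            writeFile(i);
            written++;
        }
        return written;
    }
}
```

How quickly the job reacts is bounded by the longest stretch of code between two checks, so a slow `writeFile` would need its own checks inside it. Files already written before the cancel are left in place here; whether to delete them is a design decision the sketch does not make.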
Reliability
You need to ask: what happens if your server restarts, the windows service crashes, or any other exception happens causing you to lose incomplete jobs? In this case you may want a queue architecture that is reliable in order to be able to restart jobs, or rebuild the queue of jobs you haven't started yet.
If you don't want to scale, this is simple: use a local database in which the Windows service stores job information.
On submission of a job, record its details in the database
When you start a job, record that against the job record in the database
When the client collects the job, mark it for delayed garbage collection in the database, and then delete it after a set amount of time (1 hour, 1 day ...)
If your service restarts and there are "in progress jobs" then requeue them and then start your job engine again.
If you do want to scale, or your clients are on many computers, and you have a job engine "farm" of 1 or more servers, then look at using a message queue instead of directly communicating using OnCustomCommand.
Message queues have multiple benefits. They let you reliably submit jobs to a central queue that many workers can then pick up and process, and they decouple your clients from your servers so you can scale out your job-running services. This works locally or globally, but always reliably. You can even combine it with running your Windows service on cloud workers that you scale dynamically.
Examples of technologies are MSMQ (if you want to maintain your own, or must stay inside your own firewall) and Windows Azure Service Bus (WASB), which is cheap and already done for you. In either case you will want to use Patterns and Best Practices for Enterprise Integration. For WASB there are many developer resources (MSDN, MSDN samples for BrokeredMessaging, the new Task-based API) and NuGet packages for you to use.
I’m looking for the best way of using threads considering scalability and performance.
In my site I have two scenarios that need threading:
UI trigger: for example the user clicks a button, the server should read data from the DB and send some emails. Those actions take time and I don’t want the user request getting delayed. This scenario happens very frequently.
Background service: when the app starts, it triggers a thread that runs every 10 minutes, reads from the DB, and sends emails.
The solutions I found:
A. Use thread pool - BeginInvoke:
This is what I use today for both scenarios.
It works fine, but it uses the same threads that serve the pages, so I think I may run into scalability issues, can this become a problem?
B. No use of the pool – ThreadStart:
I know starting a new thread takes more resources than using a thread pool.
Can this approach work better for my scenarios?
What is the best way to reuse the opened threads?
C. Custom thread pool:
Because my scenarios occur frequently, maybe the best way is to start a new thread pool?
Thanks.
I would personally put this into a different service. Make your UI action write to the database, and have a separate service which either polls the database or reacts to a trigger, and sends the emails at that point.
By separating it into a different service, you don't need to worry about AppDomain recycling etc - and you can put it on an entire different server if and when you want to. I think it'll give you a more flexible solution.
I do this kind of thing by calling a webservice, which then calls a method using a delegate asynchronously. The original webservice call returns a Guid to allow tracking of the processing.
For the first scenario, use ASP.NET Asynchronous Pages. Async Pages are a very good choice when it comes to scalability, because during async execution the HTTP request thread is released and can be reused.
I agree with Jon Skeet, that for second scenario you should use separate service - windows service is a good choice here.
Out of your three solutions, don't use BeginInvoke. As you said, it will have a negative impact on scalability.
Between the other two, if the tasks are truly background and the user isn't waiting for a response, then a single, permanent thread should do the job. A thread pool makes more sense when you have multiple tasks that should be executing in parallel.
However, keep in mind that web servers sometimes crash, AppPools recycle, etc. So if any of the queued work needs to be reliably executed, then moving it out of process is probably a better idea (such as into a Windows Service). One way of doing that, which preserves the order of requests and maintains persistence, is to use Service Broker. You write the request to a Service Broker queue from your web tier (with an async request), and then read those messages from a service running on the same machine or a different one. You can also scale nicely that way by simply adding more instances of the service (or more threads in it).
In case it helps, I walk through using both a background thread and Service Broker in detail in my book, including code examples: Ultra-Fast ASP.NET.
I know there are many cases which are good cases to use multi-thread in an application, but when is it the best to multi-thread a .net web application?
A web application is almost certainly already multi threaded by the hosting environment (IIS etc). If your page is CPU-bound (and want to use multiple cores), then arguably multiple threads is a bad idea, as when your system is under load you are already using them.
The time it might help is when you are IO bound; for example, you have a web-page that needs to talk to 3 external web-services, talk to a database, and write a file (all unrelated). You can do those in parallel on different threads (ideally using the inbuilt async operations, to maximise completion-port usage) to reduce the overall processing time - all without impacting local CPU overly much (here the real delay is on the network).
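A minimal sketch of the I/O-bound case, using the Task-based async pattern rather than dedicated threads. `PageData`, `FetchAsync` and the source names are invented for this example, and `Task.Delay` stands in for the real network or disk latency. The three calls are started together and awaited as a group, so the total wait is roughly the slowest call rather than the sum.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical page helper; Task.Delay simulates I/O latency.
public static class PageData
{
    // Stand-in for an I/O-bound call (web service, database, file write).
    public static async Task<string> FetchAsync(string source, int latencyMs)
    {
        await Task.Delay(latencyMs); // no thread is blocked during the wait
        return source + ":ok";
    }

    // Starts all unrelated operations first, then waits for all of them.
    public static async Task<string[]> LoadAllAsync()
    {
        var pending = new[]
        {
            FetchAsync("serviceA", 100),
            FetchAsync("serviceB", 100),
            FetchAsync("database", 100),
        };
        // Results come back in the same order as the input tasks.
        return await Task.WhenAll(pending);
    }
}
```

Because the waits are asynchronous, the local CPU cost stays low and the request thread is free while the remote calls are in flight.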
Of course, in such cases you might also do better by simply queuing the work in the web application, and having a separate service dequeue and process them - but then you can't provide an immediate response to the caller (they'd need to check back later to verify completion etc).
IMHO you should avoid multithreading in a web-based application.
A multithreaded design may improve performance in a standard application (with the right design), but in a web application you usually want to maintain high throughput rather than maximize the speed of a single request.
If you have only a few concurrent connections, though, you can probably use parallel threads without degrading overall performance.
Multithreading is a technique to provide a single process with more processing time so that it runs faster. It has more threads, so it consumes more CPU cycles (from multiple CPUs, if you have any). For a desktop application, this makes a lot of sense. But granting more CPU cycles to one web user would take those same cycles away from the 99 other users who are making requests at the same time! So technically, it's a bad thing.
However, a web application might use other services and processes that are using multiple threads. Databases, for example, won't create a separate thread for every user that connects to them. They limit the number of threads to just a few, adding connections to a connection pool for faster usage. As long as there are connections available or pooled, the user will have database access. When the database runs out of connections, the user will have to wait.
So, basically, the use of multiple threads could be used for web applications to reduce the number of active users at a specific moment! It allows the system to share resources with multiple users without overloading the resource. Instead, users will just have to stand in line before it's their turn.
This would not be multi-threading in the web application itself, but multi-threading in a service that is consumed by the web application. In this case, it's used as a limitation by only allowing a small number of threads to be active.
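The "stand in line" behaviour can be sketched with a semaphore. `Throttle` is a hypothetical class written for this example; it swaps the fixed set of service threads described above for a `SemaphoreSlim` that caps how many operations run at once, which gives the same queuing effect. `PeakConcurrency` exists only so the cap can be observed.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical limiter: at most maxConcurrent operations run at a time.
public sealed class Throttle
{
    private readonly SemaphoreSlim _slots;
    private readonly object _gate = new object();
    private int _active;
    private int _peak;

    public Throttle(int maxConcurrent)
    {
        _slots = new SemaphoreSlim(maxConcurrent, maxConcurrent);
    }

    // Highest number of operations that were ever running simultaneously.
    public int PeakConcurrency
    {
        get { lock (_gate) { return _peak; } }
    }

    public async Task<T> RunAsync<T>(Func<Task<T>> operation)
    {
        await _slots.WaitAsync(); // callers queue here when all slots are taken
        try
        {
            lock (_gate)
            {
                _active++;
                if (_active > _peak) _peak = _active;
            }
            return await operation();
        }
        finally
        {
            lock (_gate) { _active--; }
            _slots.Release(); // hand the slot to the next waiting caller
        }
    }
}
```

A caller that arrives when every slot is busy simply waits its turn without occupying a thread, so the protected resource never sees more than `maxConcurrent` users at once.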
In order to benefit from multithreading your application has to do a significant amount of work that can be run in parallel. If this is not the case, the overhead of multithreading may very well top the benefits.
In my experience most web applications consist of a number of short running methods, so apart from the parallelism already offered by the hosting environment, I would say that it is rare to benefit from multithreading within the individual parts of a web application. There are probably examples where it will offer a benefit, but my guess is that it isn't very common.
ASP.NET is already capable of spawning several threads for processing several requests in parallel, so for simple request processing there is rarely a case when you would need to manually spawn another thread. However, there are a few uncommon scenarios that I have come across which warranted the creation of another thread:
If there is some operation that might take a while and can run in parallel with the rest of the page processing, you might spawn a secondary thread there. For example, if there was a webservice that you had to poll as a result of the request, you might spawn another thread in Page_Init, and check for results in Page_PreRender (waiting if necessary). Though it's still a question whether this would be a performance benefit or not: spawning a thread isn't cheap, and the time between a typical Page_Init and Page_PreRender is measured in milliseconds anyway. Keeping a thread pool for this might be a little bit more efficient, and ASP.NET also has something called "asynchronous pages" that might be even better suited for this need.
If there is a pool of resources that you wish to clean up periodically. For example, imagine that you are using some weird DBMS that comes with limited .NET bindings, but there is no pooling support (this was my case). In that case you might want to implement the DB connection pool yourself, and this would necessitate a "cleaner thread" which would wake up, say, once a minute and check if there are connections that have not been used for a long while (and thus can be closed off).
Another thing to keep in mind when implementing your own threads in ASP.NET - ASP.NET likes to kill off its processes if they have been inactive for a while. Thus you should not rely on your thread staying alive forever. It might get terminated at any moment and you better be ready for it.