I have the following action in my controller:
[HttpPost("run")]
public Task<object> Run([FromBody] ResearchRequest researchRequest)
{
researchService.RunAsync(researchRequest);
return new{ queued = true };
}
The controller needs to handle a task that takes several minutes.
Is this the correct way to release researchService.RunAsync to handle its job?
Or is there a better approach?
Thanks
If you want to check whether a process is already running, you could mark it as in progress somewhere on the server side (for example, in a task record in the database) and then have the UI call a method on the server to check that in-progress flag when it is displayed.
That way the user could navigate away from the page and come back to still see that the process had been started.
You can do that if RunAsync is making I/O requests, but you wouldn't return a Task:
[HttpPost("run")]
public object Run([FromBody] ResearchRequest researchRequest)
{
researchService.RunAsync(researchRequest);
return new{ queued = true };
}
That will start running RunAsync on the same thread, just like any other method. At the first await in RunAsync that acts on an incomplete Task, RunAsync will return its own incomplete Task, at which point control returns back to your Run action and your object is returned. You won't be waiting for whatever I/O operation RunAsync makes.
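To see that behavior in isolation, here is a minimal console sketch. The RunAsync below is a hypothetical stand-in for researchService.RunAsync, and Task.Delay simulates the I/O call; neither comes from the original code.

```csharp
using System;
using System.Threading.Tasks;

public static class FirstAwaitDemo
{
    // Hypothetical stand-in for researchService.RunAsync.
    private static async Task RunAsync()
    {
        // This part runs synchronously on the caller's thread.
        Console.WriteLine("RunAsync started on thread " + Environment.CurrentManagedThreadId);

        // First await on an incomplete Task: RunAsync returns its own incomplete Task here.
        await Task.Delay(500);

        Console.WriteLine("RunAsync finished");
    }

    public static void Main()
    {
        Console.WriteLine("Caller on thread " + Environment.CurrentManagedThreadId);

        Task pending = RunAsync(); // not awaited, like in the controller action

        // Reached as soon as RunAsync hits its first incomplete await.
        Console.WriteLine("Back in caller, IsCompleted = " + pending.IsCompleted);

        // Only here so the demo process does not exit before the work is done.
        pending.Wait();
    }
}
```

The "Back in caller" line prints before "RunAsync finished", which is the point: the caller is not waiting on the simulated I/O.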
If RunAsync is taking a long time because of CPU calculations (not I/O), then that won't do anything for you because, remember, it starts running on the same thread. You will have to start it on another thread, which you can do using Task.Run:
[HttpPost("run")]
public object Run([FromBody] ResearchRequest researchRequest)
{
Task.Run(() => researchService.RunAsync(researchRequest));
return new{ queued = true };
}
But!
In both cases, ASP.NET will have no idea that RunAsync is running in the background. If the IIS app pool is shut down or recycled for any reason, that job will be killed part way through. Note that by default, IIS is configured to shut down an app pool after 20 minutes of no HTTP requests coming in.
If that is unacceptable to you, then you're better off writing the job to a queue in a database or something and doing that background processing in a Windows service.
We're running an ASP.NET WebAPI 2 service and we want to log some requests with our logger to email/database.
Because it's background work in ASP.NET, I figured we should use HostingEnvironment.QueueBackgroundWorkItem to run it in the background.
I want my logs to all be in order. To my surprise, I could not find anything indicating whether or not QueueBackgroundWorkItem guarantees that the queued work items run in order.
So, my question is: Does QueueBackgroundWorkItem guarantee that work queued gets executed in order?
HostingEnvironment.QueueBackgroundWorkItem((e) => Console.WriteLine("A"));
HostingEnvironment.QueueBackgroundWorkItem((e) => Console.WriteLine("B"));
Do I know that the output of the above snippet is always:
A
B
Or can it be out of order?
There appears to be nothing contractual in the documentation.
Looking at the reference source, it appears to use a class called BackgroundWorkScheduler to actually execute these tasks.
In turn, this appears to run the tasks on the ThreadPool, and it explicitly allows multiple tasks to execute in parallel:
public void ScheduleWorkItem(Func<CancellationToken, Task> workItem) {
Debug.Assert(workItem != null);
if (_cancellationTokenHelper.IsCancellationRequested) {
return; // we're not going to run this work item
}
// Unsafe* since we want to get rid of Principal and other constructs specific to the current ExecutionContext
ThreadPool.UnsafeQueueUserWorkItem(state => {
lock (this) {
if (_cancellationTokenHelper.IsCancellationRequested) {
return; // we're not going to run this work item
}
else {
_numExecutingWorkItems++;
}
}
RunWorkItemImpl((Func<CancellationToken, Task>)state);
}, workItem);
}
So I'd say it's unsafe to assume anything about what order two queued tasks will complete in.
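If you need the items to run in order, one option is to serialize them yourself instead of relying on the scheduler. The following is a minimal sketch, not something ASP.NET provides: SerialQueue is a hypothetical helper that chains each work item onto the previous one, so items run one at a time in the order they were enqueued.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs queued actions one at a time, in enqueue order.
public sealed class SerialQueue
{
    private readonly object _gate = new object();
    private Task _tail = Task.CompletedTask;

    public Task Enqueue(Action work)
    {
        lock (_gate)
        {
            // Each item starts only after the previous one has finished,
            // whether that one succeeded or threw.
            _tail = _tail.ContinueWith(
                _ => work(),
                CancellationToken.None,
                TaskContinuationOptions.None,
                TaskScheduler.Default);
            return _tail;
        }
    }
}
```

Calling Enqueue(() => Console.WriteLine("A")) and then Enqueue(() => Console.WriteLine("B")) on the same instance prints A before B. Note that work queued this way is not tracked by ASP.NET on its own; you would still have to hand the chain to QueueBackgroundWorkItem (for example, one work item that awaits the returned Task) if you want the app pool shutdown protection.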
Does QueueBackgroundWorkItem guarantee that queued work gets executed in order?
From the documentation for HostingEnvironment.QueueBackgroundWorkItem:
New HostingEnvironment.QueueBackgroundWorkItem method that lets you schedule small background work items. ASP.NET tracks these items and prevents IIS from abruptly terminating the worker process until all background work items have completed. These will enable ASP.NET applications to reliably schedule Async work items.
We have some logic to calculate an expensive value per user for our ASP.NET Web Forms application. Currently it sits in the Page_Load of the header user control that is on every page, like this:
//note that we are not awaiting this
Task.Factory.StartNew(() => CacheManager.GetExpensiveValue(UserId));
And then in the static CacheManager.GetExpensiveValue(int userID):
private static object locker = new object();
lock (locker)
{
if (!AlreadyDone(userID))
{
var expensiveValue = ReallyExpensiveMethod(userID);
//our static cache wrapper class that uses an ObjectCache object
OurCache.Add(userID, expensiveValue);
}
else
{
return OurCache.Get(userID);
}
}
This works, but when ReallyExpensiveMethod() takes a REALLY long time (I'm also working on improving the performance of the logic behind that), users will block on that lock when navigating between pages.
My question is, how could I restructure this to not cause blocking? I've thought about utilizing a ConcurrentDictionary with the values in the dictionary being Task wrappers for ReallyExpensiveMethod() and the keys being the UserID to prevent duplicate work, but I'm unsure if that would actually get me anywhere.
We currently are not using any asynchronous logic in this app, and I'm sure the powers-that-be would rather not introduce a change that would require adding Async="true" to every single page in the app since this header logic is in every page.
My question is, how could I restructure this to not cause blocking? ... rather not introduce [asynchrony]
You're between a rock and a hard place, there. Any request has to block or asynchronously wait for the process to complete; there's no other option, unless you can use something like SignalR to send the process results to the client side (but that would probably require significant architectural changes).
That said, you can certainly minimize the effect of the lock; it is currently blocking gets for other users if one user is doing the process.
I'm assuming that this calculation is pure (causes no side effects), and that the cache is an in-process, in-memory cache.
In that case, I would cache the task instead of the result. While I'm not wild about parallel processing on ASP.NET, I suppose this would be OK.
I do recommend you use the cache. ConcurrentDictionary has similar logic, but no easy way to flush old entries.
So, something like this:
// In Page_Load
CacheManager.GetOrAdd(UserID);
Task<Results> CacheManager.GetOrAdd(int userId)
{
lock (locker)
{
if (!OurCache.Contains(userId))
{
var task = Task.Run(() => ReallyExpensiveMethod(userId));
OurCache.Add(userId, task);
return task;
}
else
return OurCache.Get(userId);
}
}
// Usage:
Results results = CacheManager.GetOrAdd(UserID).Result;
I'm not wild about the blocking (calling Task<T>.Result on the last line), but since you don't want to do asynchronous requests you're stuck with that kind of hack.
This code minimizes the time the lock is held. Instead of locking it for the duration of the processing, it is only locked long enough to start the processing on another thread and update the cache.
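The same idea can also be written without an explicit lock by storing a Lazy<Task<T>> in the cache. This is only a sketch under assumptions: it uses MemoryCache.Default from System.Runtime.Caching rather than your OurCache wrapper, and the class name, key prefix, int result type and 20-minute sliding expiration are all made up for illustration.

```csharp
using System;
using System.Runtime.Caching;
using System.Threading.Tasks;

// Hypothetical replacement for the CacheManager method above.
public static class ExpensiveValueCache
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    public static Task<int> GetOrAdd(int userId, Func<int, int> compute)
    {
        // Creating the Lazy is cheap; Task.Run only happens when .Value is first read.
        var fresh = new Lazy<Task<int>>(() => Task.Run(() => compute(userId)));

        // Assumed expiration policy; pick whatever suits the data.
        var policy = new CacheItemPolicy { SlidingExpiration = TimeSpan.FromMinutes(20) };

        // AddOrGetExisting returns the entry already in the cache,
        // or null if our new entry was the one that got inserted.
        var existing = (Lazy<Task<int>>)Cache.AddOrGetExisting("expensive:" + userId, fresh, policy);

        return (existing ?? fresh).Value;
    }
}
```

Because Lazy<T> defaults to thread-safe, run-once initialization, two requests for the same user end up sharing one task. One caveat: a task that fails stays cached until it expires, so you may want to remove faulted entries.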
I post a lot here regarding multithreading, and the great Stack Overflow community has helped me a lot in understanding multithreading.
All the examples I have seen online only deal with one thread.
My application is a scraper for an insurance company (family company ... all free of charge). Anyway, the user is able to select how many threads they want to run. So let's say, for example, the user wants the application to scrape 5 sites at one time, and then later in the day he chooses 20 threads because his computer isn't doing anything else, so it has the resources to spare.
Basically the application builds a list of say 1000 sites to scrape. A thread goes off and does that and updates the UI and builds the list.
When that's finished, another thread is called to start the scraping. Depending on the number of threads the user has chosen, it will create x number of threads.
What's the best way to create these threads? Should I create 1000 threads in a list and loop through them? If the user has set 5 threads to run, it would loop through 5 at a time.
I understand threading, but it's the application logic which is catching me out.
Any ideas or resources on the web that can help me out?
You could consider using a thread pool for that:
using System;
using System.Threading;
public class Example
{
public static void Main()
{
ThreadPool.SetMaxThreads(100, 10);
// Queue the task.
ThreadPool.QueueUserWorkItem(new WaitCallback(ThreadProc));
Console.WriteLine("Main thread does some work, then sleeps.");
Thread.Sleep(1000);
Console.WriteLine("Main thread exits.");
}
// This thread procedure performs the task.
static void ThreadProc(Object stateInfo)
{
Console.WriteLine("Hello from the thread pool.");
}
}
This scraper, does it use a lot of CPU when it's running?
If it does a lot of communication with these 1000 remote sites, downloading their pages, that may be taking more time than the actual analysis of the pages.
And how many CPU cores does your user have? If they have 2 (which is common these days) then beyond two simultaneous threads performing analysis, they aren't going to see any speed up.
So you probably need to "parallelize" the downloading of the pages. I doubt you need to do the same for the analysis of the pages.
Take a look into asynchronous IO, instead of explicit multi-threading. It lets you launch a bunch of downloads in parallel and then get called back when each one completes.
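As a rough sketch of what that could look like with async/await: the helper below starts all downloads but uses a SemaphoreSlim so that only maxConcurrent of them are in flight at once, which maps onto the "number of threads" setting in the question. ThrottledDownloader and its parameter names are made up for this example.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: downloads many pages with a cap on concurrent requests.
public static class ThrottledDownloader
{
    private static readonly HttpClient Client = new HttpClient();

    public static async Task<IReadOnlyList<string>> DownloadAllAsync(
        IEnumerable<string> urls, int maxConcurrent)
    {
        using (var gate = new SemaphoreSlim(maxConcurrent))
        {
            var downloads = urls.Select(async url =>
            {
                await gate.WaitAsync();          // wait for a free slot
                try
                {
                    return await Client.GetStringAsync(url);
                }
                finally
                {
                    gate.Release();              // free the slot for the next download
                }
            }).ToList();

            // Completes when every download has finished; throws if any of them failed.
            return await Task.WhenAll(downloads);
        }
    }
}
```

No thread is blocked while a download is waiting on the network, so a limit of 20 here does not mean 20 threads sitting idle.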
If you really just want the application, use something someone else already spent time developing and perfecting:
http://arachnode.net/
arachnode.net is a complete and comprehensive .NET web crawler for downloading, indexing and storing Internet content including e-mail addresses, files, hyperlinks, images, and Web pages.
Whether interested or involved in screen scraping, data mining, text mining, research or any other application where a high-performance crawling application is key to the success of your endeavors, arachnode.net provides the solution you need for success.
If you also want to write one yourself because it's a fun thing to write (I wrote one not long ago, and yes, it is a lot of fun), then you can refer to this PDF provided by arachnode.net, which explains in detail the theory behind a good web crawler:
http://arachnode.net/media/Default.aspx?Sort=Downloads&PageIndex=1
Download the pdf entitled: "Crawling the Web" (second link from top). Scroll to Section 2.6 entitled: "2.6 Multi-threaded Crawlers". That's what I used to build my crawler, and I must say, I think it works quite well.
I think this example is basically what you need.
public class WebScraper
{
private readonly int totalThreads;
private readonly List<System.Threading.Thread> threads;
private readonly List<Exception> exceptions;
private readonly object locker = new object();
private volatile bool stop;
public WebScraper(int totalThreads)
{
this.totalThreads = totalThreads;
threads = new List<System.Threading.Thread>(totalThreads);
exceptions = new List<Exception>();
for (int i = 0; i < totalThreads; i++)
{
var thread = new System.Threading.Thread(Execute);
thread.IsBackground = true;
threads.Add(thread);
}
}
public void Start()
{
foreach (var thread in threads)
{
thread.Start();
}
}
public void Stop()
{
stop = true;
foreach (var thread in threads)
{
if (thread.IsAlive)
{
thread.Join();
}
}
}
private void Execute()
{
try
{
while (!stop)
{
// Scrape away!
}
}
catch (Exception ex)
{
lock (locker)
{
// You could have a thread checking this collection and
// reporting it as you see fit.
exceptions.Add(ex);
}
}
}
}
The basic logic is:
Put the URLs to scrape into a single queue object that every thread can access, then create your threads. Let each thread run this loop:
1. lock the queue
2. check if there are items in the queue; if not, unlock the queue and end the thread
3. dequeue the first item in the queue
4. unlock the queue
5. process the item
6. invoke an event that updates the UI (remember to lock the UI controller)
7. return to step 1
Just let the Threads do the "get stuff from the queue" part (pulling the jobs) instead of giving them the urls (pushing the jobs), that way you just say
YourThreadManager.StartThreads(numberOfThreadsTheUserWants);
and everything else happens automagically. See the other replies to find out how to create and manage the threads.
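The pull loop described above can be sketched roughly like this. PullingWorkers is a hypothetical class (a stand-in for YourThreadManager), the process callback is whatever scrapes one URL, and the UI update is left to that callback.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical thread manager: every thread pulls its own work from a shared queue.
public sealed class PullingWorkers
{
    private readonly Queue<string> _urls;
    private readonly object _queueLock = new object();
    private readonly Action<string> _process;

    public PullingWorkers(IEnumerable<string> urls, Action<string> process)
    {
        _urls = new Queue<string>(urls);
        _process = process;
    }

    public List<Thread> StartThreads(int count)
    {
        var threads = new List<Thread>(count);
        for (int i = 0; i < count; i++)
        {
            var thread = new Thread(WorkLoop) { IsBackground = true };
            threads.Add(thread);
            thread.Start();
        }
        return threads;
    }

    private void WorkLoop()
    {
        while (true)
        {
            string url;
            lock (_queueLock)                    // lock the queue
            {
                if (_urls.Count == 0) return;    // nothing left: end this thread
                url = _urls.Dequeue();           // take the first item
            }                                    // queue is unlocked here
            _process(url);                       // process outside the lock
        }
    }
}
```

The lock is only held for the dequeue, so the threads never wait on each other while scraping.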
I solved a similar problem by creating a worker class that uses a callback to signal the main app that a worker is done. Then I create a queue of 1000 threads and then call a method that launches threads until the running thread limit is reached, keeping track of the active threads with a dictionary keyed by the thread's ManagedThreadId. As each thread completes, the callback removes its thread from the dictionary and calls the thread launcher.
If a connection is dropped or times out, the callback reinserts the thread back into the queue. Lock around the queue and the dictionary. I create threads instead of using the thread pool because the overhead of creating a thread is insignificant compared to the connection time, and it allows me to have a lot more threads in flight. The callback also provides a convenient place to update the user interface, and even allows you to change the thread limit while it's running. I've had over 50 open connections at one time. Remember to increase the maxconnection setting in your app.config (the default is two).
I would use a queue and a condition variable and mutex, and start just the requested number of threads, for example, 5 or 20 (and not start 1,000).
Each thread blocks on the condition variable. When woken up, it dequeues the first item, unlocks the queue, works with the item, locks the queue and checks for more items. If the queue is empty, sleep on the condition variable. If not, unlock, work, repeat.
While the mutex is locked, it can also check if the user has requested the count of threads to be reduced. Just check if count > max_count, and if so, the thread terminates itself.
Any time you have more sites to queue, just lock the mutex and add them to the queue, then broadcast on the condition variable. Any threads that are not already working will wake up and take new work.
Any time the user increases the requested thread count, just start them up and they will lock the queue, check for work, and either sleep on the condition variable or get going.
Each thread will be continually pulling more work from the queue, or sleeping. You don't need more than 5 or 20.
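In .NET the mutex and condition variable can both be played by Monitor: lock takes the mutex, Monitor.Wait sleeps on the condition, and Monitor.PulseAll broadcasts. Here is a minimal sketch of the scheme above; WorkQueue and its member names are invented for the example, and error handling inside the process callback is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical work queue with a user-adjustable number of worker threads.
public sealed class WorkQueue
{
    private readonly Queue<string> _items = new Queue<string>();
    private readonly object _mutex = new object();
    private readonly Action<string> _process;
    private int _running;     // worker threads currently alive
    private int _maxThreads;  // thread count requested by the user

    public WorkQueue(Action<string> process)
    {
        _process = process;
    }

    public void Enqueue(IEnumerable<string> sites)
    {
        lock (_mutex)
        {
            foreach (var site in sites) _items.Enqueue(site);
            Monitor.PulseAll(_mutex);            // broadcast: wake sleeping workers
        }
    }

    public void SetThreadCount(int count)
    {
        lock (_mutex)
        {
            _maxThreads = count;
            while (_running < count)             // start any extra threads needed
            {
                _running++;
                new Thread(Worker) { IsBackground = true }.Start();
            }
            Monitor.PulseAll(_mutex);            // let surplus sleepers notice and exit
        }
    }

    private void Worker()
    {
        while (true)
        {
            string site;
            lock (_mutex)
            {
                while (true)
                {
                    if (_running > _maxThreads) { _running--; return; } // count was reduced
                    if (_items.Count > 0) break;                        // there is work
                    Monitor.Wait(_mutex);        // sleep until someone pulses
                }
                site = _items.Dequeue();
            }
            _process(site);                      // work outside the lock
        }
    }
}
```

Call SetThreadCount(5) once at startup, Enqueue whenever you have more sites, and SetThreadCount again whenever the user changes the setting.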
Consider using the event-based asynchronous pattern (AsyncOperation and AsyncOperationManager Classes)
You might want to take a look at the ProcessQueue article on CodeProject.
Essentially, you'll want to create (and start) the number of threads that are appropriate, in your case that number comes from the user. Each of these threads should process a site, then find the next site needed to process. Even if you don't use the object itself (though it sounds like it would suit your purposes pretty well, though I'm obviously biased!) it should give you some good insight into how this sort of thing would be done.
I have a synchronous web service call that returns a message. I need to quickly return a message that basically says that order was received. I then need to spend a couple of minutes processing the order, but cannot block the service call for that long. So how can I return from the web service, and then do some more stuff? I'm guessing I need to fork some other thread or something before I return, but I'm not sure of the best approach.
string ProcessOrder(Order order)
{
if(order.IsValid)
{
return "Great!";
//Then I need to process the order
}
}
You can start a new thread and have it do what you need, while your main thread returns "Great!".
string ProcessOrder(Order order)
{
if(order.IsValid)
{
//Starts a new thread
ThreadPool.QueueUserWorkItem(th =>
{
//Process Order here
});
return "Great!";
}
}
You could start your large amount of work in a separate thread:
public string ProcessOrder(Order order)
{
if(order.IsValid)
{
System.Threading.ParameterizedThreadStart pts = new System.Threading.ParameterizedThreadStart(DoHardWork);
System.Threading.Thread t = new System.Threading.Thread(pts);
t.Start(order);
return "Great!!!";
}
}
public void DoHardWork(object order)
{
//Stuff Goes Here
}
Is the work you're doing "important"? I assume it is. You could use a thread, but you'll have to be OK with the possibility that your work might get interrupted if the machine restarts or if the ASP.NET worker process recycles. This would likely lead to the work not getting done even though you already told the client you had accepted it. This might or might not be acceptable, depending on your use case.
I would consider taking the work item you receive from the synchronous service request and putting it in a persistent queue. An easy way to do this is to use a transactional MSMQ queue. Your synchronous service puts the work request in the queue, and you have a few worker threads pulling work requests out of the queue. Wrap your queue read and the work in a transaction, and don't commit the transaction until the work is completed. If your machine or process shuts down in the middle of a request, the transaction never commits, so the request stays in the queue and is picked up again the next time the workers start up.
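A rough sketch of that pattern using System.Messaging (.NET Framework only) follows. It assumes a transactional private queue already exists at the path shown; the path, the OrderQueue class and the string payload are all made up for the example.

```csharp
using System;
using System.Messaging;

// Hypothetical wrapper around a transactional MSMQ queue.
public static class OrderQueue
{
    // Assumed queue path; the queue must have been created as transactional.
    private const string Path = @".\private$\orders";

    // Called from the web service before it returns "Great!".
    public static void Enqueue(string orderXml)
    {
        using (var queue = new MessageQueue(Path))
        using (var tx = new MessageQueueTransaction())
        {
            tx.Begin();
            queue.Send(orderXml, tx);
            tx.Commit();
        }
    }

    // Called in a loop by each worker thread.
    public static void ProcessNext(Action<string> processOrder)
    {
        using (var queue = new MessageQueue(Path))
        using (var tx = new MessageQueueTransaction())
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
            tx.Begin();
            try
            {
                Message message = queue.Receive(tx); // blocks until a message arrives
                processOrder((string)message.Body);
                tx.Commit();                         // only now is the message really gone
            }
            catch
            {
                tx.Abort();                          // message goes back on the queue
                throw;
            }
        }
    }
}
```

If the process dies between Receive and Commit, the transaction is never committed and the message is still in the queue when the workers come back.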
You could also look at utilizing the PIAB (Policy Injection Application Block) to accomplish work after a method call.