I am using IBackgroundJobManager with Hangfire integration.
Use case:
I am processing a single uploaded file. After the file is saved, I would like to start two separate Abp.BackgroundJobs sequentially. Only after the first job completes, should the second job start.
Here is my code:
var measureJob1 = await _backgroundJobManager.EnqueueAsync<FileProcessBackgroundJob, FileProcessJobArgsDto>(
new FileProcessJobArgsDto
{
Id = Id,
User = user,
})
.ContinueWith<AnalyticsBackgroundJob<AnalyticsJobArgsDto>>("measureJob",x => x);
Problem:
I cannot figure out the syntax I need for .ContinueWith<???>(???).
This seems to be an XY Problem.
How to start the second background job after the first completes? (Problem X)
There is no support for guaranteeing the sequential and conditional execution of jobs.
That said, background jobs are automatically retried, which allows for Way 3 below.
So, you can effectively achieve that in one of these ways:
Way 1: Combine the two jobs into one job.
Way 2: Enqueue the second job inside the first job.
Way 3: Enqueue both jobs (you don't need to use ContinueWith, await is much neater). In the second job, check if the first job created what it needed to. Otherwise, throw an exception and rely on retry.
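Enqueuing the second job from inside the first one could look roughly like the sketch below. It does not use the real ABP types: IJobQueue, InMemoryJobQueue and FileProcessJobSketch are hypothetical stand-ins so that the example runs on its own. In a real job you would call your injected IBackgroundJobManager at the same point.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for a job queue such as IBackgroundJobManager;
// it only records what was enqueued so the sketch runs on its own.
public interface IJobQueue
{
    Task<string> EnqueueAsync(string jobName, int fileId);
}

public sealed class InMemoryJobQueue : IJobQueue
{
    public List<string> Enqueued { get; } = new List<string>();

    public Task<string> EnqueueAsync(string jobName, int fileId)
    {
        Enqueued.Add(jobName + ":" + fileId);
        return Task.FromResult(Guid.NewGuid().ToString());
    }
}

// Hypothetical first job: does its own work, then enqueues the follow-up job.
public sealed class FileProcessJobSketch
{
    private readonly IJobQueue _queue;

    public FileProcessJobSketch(IJobQueue queue)
    {
        _queue = queue;
    }

    public async Task ExecuteAsync(int fileId)
    {
        // ... process the file here; an exception would skip the enqueue below ...

        // Only reached when processing succeeded, so the analytics job
        // cannot be queued before the file has been processed.
        await _queue.EnqueueAsync("AnalyticsJob", fileId);
    }
}
```

The ordering guarantee comes purely from where the enqueue call sits: the second job does not exist in the queue until the first job has finished its work.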
What is the syntax for what I need when using ContinueWith? (Solution Y)
There is no syntax for that, other than a less desirable variant of Way 3.
Task.ContinueWith is a .NET Task method. Its continuation runs when the EnqueueAsync task completes (that is, once the job has been queued), not when the actual background job finishes.
The syntax for Way 3 with ContinueWith would be something like:
var measureJobId = await _backgroundJobManager.EnqueueAsync<FileProcessBackgroundJob, FileProcessJobArgsDto>(
        new FileProcessJobArgsDto
        {
            Id = Id,
            User = user,
        }
    )
    // Runs once the first job has been enqueued, not once it has finished executing.
    .ContinueWith(task => _backgroundJobManager.EnqueueAsync<AnalyticsBackgroundJob, AnalyticsJobArgsDto>(
        new AnalyticsJobArgsDto("measureJob")
    ))
    .Unwrap();
Compare that to:
var processJobId = await _backgroundJobManager.EnqueueAsync<FileProcessBackgroundJob, FileProcessJobArgsDto>(
    new FileProcessJobArgsDto
    {
        Id = Id,
        User = user,
    }
);
var measureJobId = await _backgroundJobManager.EnqueueAsync<AnalyticsBackgroundJob, AnalyticsJobArgsDto>(
    new AnalyticsJobArgsDto("measureJob")
);
Related
I have an API which needs to be run in a loop for Mass processing.
Current single API is:
public async Task<ActionResult<CombinedAddressResponse>> GetCombinedAddress(AddressRequestDto request)
We are not allowed to touch/modify the original single API. However, it can be called in bulk using a foreach statement. What is the best way to run this asynchronously without locks?
Current Solution below is just providing a list, would this be it?
public async Task<ActionResult<List<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
var combinedAddressResponses = new List<CombinedAddressResponse>();
foreach(AddressRequestDto request in requests)
{
var newCombinedAddress = (await GetCombinedAddress(request)).Value;
combinedAddressResponses.Add(newCombinedAddress);
}
return combinedAddressResponses;
}
Update:
In the debugger, I have to go to combinedAddressResponse.Result.Value to see the data;
combinedAddressResponse.Value = null
Also, strangely, writing combinedAddressResponse.Result.Value gives the error below: "Action Result does not contain a definition for 'Value' and no accessible extension method"
I'm writing this code off the top of my head without an IDE or sleep, so please comment if I'm missing something or there's a better way.
But effectively I think you want to run all your requests at once (not sequentially) doing something like this:
public async Task<ActionResult<List<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
    var combinedAddressResponses = new List<CombinedAddressResponse>(requests.Count);
    var tasks = new List<Task<ActionResult<CombinedAddressResponse>>>(requests.Count);
    foreach (var request in requests)
    {
        tasks.Add(Task.Run(async () => await GetCombinedAddress(request)));
    }
    // This waits for all the tasks to complete
    await Task.WhenAll(tasks);
    combinedAddressResponses.AddRange(tasks.Select(x => x.Result.Value));
    return combinedAddressResponses;
}
looking for a way to speed things up and run in parallel thanks
What you need is "asynchronous concurrency". I use the term "concurrency" to mean "doing more than one thing at a time", and "parallel" to mean "doing more than one thing at a time using threads". Since you're on ASP.NET, you don't want to use additional threads; you'd want to use a form of concurrency that works asynchronously (which uses fewer threads). So, Parallel and Task.Run should not be parts of your solution.
The way to do asynchronous concurrency is to build a collection of tasks, and then use await Task.WhenAll. E.g.:
public async Task<ActionResult<IReadOnlyList<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
    // Build the collection of tasks by doing an asynchronous operation for each request.
    var tasks = requests.Select(async request =>
    {
        var combinedAddressResponse = await GetCombinedAddress(request);
        return combinedAddressResponse.Value;
    }).ToList();
    // Wait for all the tasks to complete and get the results.
    var results = await Task.WhenAll(tasks);
    return results;
}
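To see the pattern in isolation, here is a minimal self-contained sketch. GetCombinedAddressAsync is a hypothetical stand-in that uses Task.Delay to simulate the I/O of the real single-item API, and the request is reduced to an int; only the Select + Task.WhenAll shape is the point.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ConcurrentLookupSketch
{
    // Hypothetical stand-in for the single-item API: simulates I/O latency.
    public static async Task<string> GetCombinedAddressAsync(int requestId)
    {
        await Task.Delay(200);
        return "address-" + requestId;
    }

    public static async Task<IReadOnlyList<string>> GetCombinedAddressesAsync(IEnumerable<int> requestIds)
    {
        // ToList() forces enumeration, so every call is started before any is awaited.
        List<Task<string>> tasks = requestIds
            .Select(id => GetCombinedAddressAsync(id))
            .ToList();

        // Task.WhenAll returns the results in the same order as the input tasks.
        return await Task.WhenAll(tasks);
    }
}
```

Because all the delays overlap, the whole batch should take roughly as long as the slowest single call rather than the sum of all calls.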
I am calling this action (ASP.Net Core 2.0) over AJAX:
[HttpGet]
public async Task<IActionResult> GetPostsOfUser(Guid userId, Guid? categoryId)
{
var posts = await postService.GetPostsOfUserAsync(userId, categoryId);
var postVMs = await Task.WhenAll(
posts.Select(async p => new PostViewModel
{
PostId = p.Id,
PostContent = p.Content,
PostTitle = p.Title,
WriterAvatarUri = fileService.GetFileUri(p.Writer.Profile.AvatarId, Url),
WriterFullName = p.Writer.Profile.FullName,
WriterId = p.WriterId,
Liked = await postService.IsPostLikedByUserAsync(p.Id, UserId),// TODO this takes too long!!!!
}));
return Json(postVMs);
}
But it takes too long to respond (20 seconds!) when there are many post objects in the posts array (e.g. 30 posts).
That is caused by this line: await postService.IsPostLikedByUserAsync.
Digging into the source code of this function:
public async Task<bool> IsPostLikedByUserAsync(Guid postId, Guid userId)
{
logger.LogDebug("Place 0 passed!");
var user = await dbContext.Users
.SingleOrDefaultAsync(u => u.Id == userId);
logger.LogDebug("Place 1 passed!");
var post = await dbContext.Posts
.SingleOrDefaultAsync(u => u.Id == postId);
logger.LogDebug("Place 2 passed!");
if (user == null || post == null)
return false;
return post.PostLikes.SingleOrDefault(pl => pl.UserId == userId) != null;
}
The investigations showed, after some seconds, ALL "Place 1 passed!" logging methods are executed together for every post object. In other words, it seems that every post awaits until the previous post finishes executing this part:
var user = await dbContext.Users
.Include(u => u.PostLikes)
.SingleOrDefaultAsync(u => u.Id == userId);
And then -when every post finishes that part- the place 1 of log is executed for all post objects.
The same happens for logging place 2, every single post seems to await for the previous post to finish executing var post = await dbContext.Pos..., and then the function can go further to execute log place 2 (after few seconds from log 1, ALL log 2 appear together).
That means I have no asynchronous execution here. Could someone help me understand and solve this problem?
UPDATE:
Changing the code a bit to look like this:
/// <summary>
/// Returns all post of a user in a specific category.
/// If the category is null, then all of that user posts will be returned from all categories
/// </summary>
/// <param name="userId"></param>
/// <param name="categoryId"></param>
/// <returns></returns>
[Authorize]
[HttpGet]
public async Task<IActionResult> GetPostsOfUser(Guid userId, Guid? categoryId)
{
var posts = await postService.GetPostsOfUserAsync(userId, categoryId);
var i = 0;
var j = 0;
var postVMs = await Task.WhenAll(
posts.Select(async p =>
{
logger.LogDebug("DEBUG NUMBER HERE BEFORE RETURN: {0}", i++);
var isLiked = await postService.IsPostLikedByUserAsync(p.Id, UserId);// TODO this takes too long!!!!
logger.LogDebug("DEBUG NUMBER HERE AFTER RETURN: {0}", j++);
return new PostViewModel
{
PostId = p.Id,
PostContent = p.Content,
PostTitle = p.Title,
WriterAvatarUri = fileService.GetFileUri(p.Writer.Profile.AvatarId, Url),
WriterFullName = p.Writer.Profile.FullName,
WriterId = p.WriterId,
Liked = isLiked,
};
}));
return Json(postVMs);
}
That shows that the line "DEBUG NUMBER HERE AFTER RETURN" is printed for ALL Select lambdas together, which means that ALL of them wait for each other before going further. How can I prevent that?
UPDATE 2
Substituting the previous IsPostLikedByUserAsync method with the following one:
public async Task<bool> IsPostLikedByUserAsync(Guid postId, Guid userId)
{
    await Task.Delay(1000);
    return false; // placeholder result so the method compiles
}
Showed no problem in async running, I had to wait only 1 second, not 1 x 30.
That means it is something specific to EF.
Why does the problem happen ONLY with entity framework (with the original function)? I notice the problem even with only 3 post objects! Any new ideas?
The deductions you've made are not necessarily true.
If these methods were firing in a non-asynchronous fashion, you would see all of the logs from one method invocation reach the console before the next method invocation's console logs. You would see the pattern 123123123 instead of 111222333. What you are seeing is that the three awaits seem to synchronize after some asynchronous batching occurs. Thus it appears that the operations are made in stages. But why?
There are a couple reasons this might happen. Firstly, the scheduler may be scheduling all of your tasks to the same thread, causing each task to be queued and then processed when the previous execution flow is complete. Since Task.WhenAll is awaited outside of the Select loop, all synchronous portions of your async methods are executed before any one Task is awaited, therefore causing all of the "first" log invocations to be called immediately following the invocation of that method.
So then what's the deal with the others syncing up later? The same thing is happening. Once all of your methods hit their first await, the execution flow is yielded to whatever code invoked that method. In this case, that is your Select statement. Behind the scenes, however, all of those async operations are processing. This creates a race condition.
Shouldn't there be some chance of the third log of some methods being called before the second log of another method, due to varying request/response times? Most of the time, yes. Except you've introduced a sort of "delay" into the equation, making the race condition more predictable. Console logging is actually quite slow, and is also synchronous. This causes all of your methods to block at the logging line until the previous logs have completed. But blocking, by itself, may not be enough to make all of those log calls sync up in pretty little batches. There may be another factor at play.
It would appear that you are querying a database. Since this is an IO operation, it takes considerably longer to complete than other operations (including console logging, probably). This means that, although the queries aren't synchronous, they will in all likelihood receive a response after all of the queries/requests have already been sent, and therefore after the second log line from each method has already executed. The remaining log lines are processed eventually, and therefore fall into the last batch.
Your code is being processed asynchronously. It just doesn't look quite how you might expect. Async doesn't mean random order. It just means some code flow is paused until a later condition is met, allowing other code to be processed in the meantime. If the conditions happen to sync up, then so does your code flow.
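The batching is easy to reproduce without a database. In this sketch, IsLikedAsync is a hypothetical stand-in for IsPostLikedByUserAsync, with Task.Delay playing the role of the query and an in-memory queue playing the role of the logger.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class LogOrderSketch
{
    // Thread-safe log, because continuations may resume on different pool threads.
    public static readonly ConcurrentQueue<string> Log = new ConcurrentQueue<string>();

    // Hypothetical stand-in for IsPostLikedByUserAsync; Task.Delay simulates a query.
    private static async Task<bool> IsLikedAsync(int postId)
    {
        Log.Enqueue("place 0, post " + postId); // runs synchronously, before the first await
        await Task.Delay(200);
        Log.Enqueue("place 1, post " + postId); // runs only after the delay has elapsed
        return true;
    }

    public static async Task RunAsync()
    {
        // Task.WhenAll enumerates the Select, which starts all three calls
        // before any of them has finished.
        await Task.WhenAll(Enumerable.Range(1, 3).Select(id => IsLikedAsync(id)));
    }
}
```

After RunAsync completes, the log holds all three "place 0" entries first and the three "place 1" entries afterwards (the order within the second batch can vary). That is the batched pattern from the question, and it is what concurrent execution looks like; sequential execution would interleave place 0 and place 1 per post.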
Actually, async execution works, but not the way you expect. The Select statement starts tasks for all posts at once, and they all run concurrently, which leads to your performance problems.
The best approach to achieve the expected behavior is to reduce the degree of parallelism. There are no built-in tools to do that, so I can offer 2 workarounds:
Use TPL DataFlow library. It is developed by Microsoft but not very popular. You can easily find enough examples though.
Manage parallel tasks by yourself with SemaphoreSlim. It would look like this:
var semaphore = new SemaphoreSlim(degreeOfParallelism);
var cts = new CancellationTokenSource();
var postVMs = await Task.WhenAll(
    posts.Select(async p =>
    {
        // Wait for a free slot; at most degreeOfParallelism lambdas get past this line at once.
        await semaphore.WaitAsync(cts.Token).ConfigureAwait(false);
        try
        {
            cts.Token.ThrowIfCancellationRequested();
            return new PostViewModel
            {
                PostId = p.Id,
                PostContent = p.Content,
                PostTitle = p.Title,
                WriterAvatarUri = fileService.GetFileUri(p.Writer.Profile.AvatarId, Url),
                WriterFullName = p.Writer.Profile.FullName,
                WriterId = p.WriterId,
                Liked = await postService.IsPostLikedByUserAsync(p.Id, UserId),// TODO this takes too long!!!!
            };
        }
        finally
        {
            // Always free the slot, even if the query throws.
            semaphore.Release();
        }
    }));
And don't forget to use .ConfigureAwait(false) whenever it's possible.
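The same throttling idea can be pulled out into a small reusable helper. This is only a sketch: SelectThrottledAsync is a name made up for this example, not a framework method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottleSketch
{
    // Runs `work` for every item, with at most `maxConcurrency` calls in flight.
    public static async Task<TResult[]> SelectThrottledAsync<T, TResult>(
        IEnumerable<T> items, int maxConcurrency, Func<T, Task<TResult>> work)
    {
        using (var semaphore = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = items.Select(async item =>
            {
                await semaphore.WaitAsync();
                try
                {
                    return await work(item);
                }
                finally
                {
                    // Free the slot whether the work succeeded or threw.
                    semaphore.Release();
                }
            }).ToList();

            // Results come back in the same order as the input items.
            return await Task.WhenAll(tasks);
        }
    }
}
```

With something like this, the controller code would shrink to one call such as SelectThrottledAsync(posts, degreeOfParallelism, p => BuildViewModelAsync(p)), where BuildViewModelAsync is again a hypothetical method of yours.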
My C# application stops responding for a long time, and when I break into the debugger it stops in this function:
foreach (var item in list)
{
xmldiff.Compare(item, secondary, output);
...
}
I guess the running time of this function is long or it hangs. Anyway, I want to wait for a certain time (e.g. 5 seconds) for the execution of this function, and if it exceeds this time, I skip it and go to the next item in the loop. How can I do it? I found some similar questions, but they are mostly about processes or asynchronous methods.
You can do it the brutal way: spin up a thread to do the work, join it with timeout, then abort it, if the join didn't work.
Example:
var worker = new Thread(() => { xmldiff.Compare(item, secondary, output); });
worker.Start();
if (!worker.Join(TimeSpan.FromSeconds(5)))
    worker.Abort();
But be warned: aborting threads is not considered nice and can make your app unstable. Thread.Abort also only works on .NET Framework; on .NET Core and .NET 5+ it throws PlatformNotSupportedException. If at all possible, try to modify Compare to accept a CancellationToken to cancel the comparison.
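If Compare can be changed, the cooperative version could look something like this sketch. The Compare method here is a hypothetical cancellable stand-in (it just sleeps in small steps), and the items are plain ints standing for the amount of work; the real xmldiff API may not offer such an overload.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class TimeoutSketch
{
    // Hypothetical cancellable stand-in for xmldiff.Compare: it checks the token
    // between units of work instead of being aborted from outside.
    public static bool Compare(int workUnits, CancellationToken token)
    {
        for (int i = 0; i < workUnits; i++)
        {
            token.ThrowIfCancellationRequested();
            Thread.Sleep(10); // simulates one unit of comparison work
        }
        return true;
    }

    // Returns the items whose comparison finished within the timeout.
    public static List<int> CompareAll(IEnumerable<int> items, TimeSpan timeout)
    {
        var completed = new List<int>();
        foreach (int item in items)
        {
            // A fresh source per item; it cancels itself once the timeout elapses.
            using (var cts = new CancellationTokenSource(timeout))
            {
                try
                {
                    Compare(item, cts.Token);
                    completed.Add(item);
                }
                catch (OperationCanceledException)
                {
                    // Timed out: skip this item and continue with the next one.
                }
            }
        }
        return completed;
    }
}
```

The work stops at a point of its own choosing, so nothing is left half-written the way it can be after an abort. The price is that a Compare which never checks the token cannot be interrupted this way.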
I would avoid directly using threads and use Microsoft's Reactive Extensions (NuGet "Rx-Main") to abstract away the management of the threads.
I don't know the exact signature of xmldiff.Compare(item, secondary, output) but if I assume it produces an integer then I could do this with Rx:
var query =
from item in list.ToObservable()
from result in
Observable
.Start(() => xmldiff.Compare(item, secondary, output))
.Timeout(TimeSpan.FromSeconds(5.0), Observable.Return(-1))
select new { item, result };
var subscription =
query
.Subscribe(x =>
{
/* do something with `x.item` and/or `x.result` */
});
This automatically iterates through each item and starts a background computation of xmldiff.Compare, but only allows each computation to take as much as 5.0 seconds before returning a default value of -1.
The subscription variable is an IDisposable, so if you want to abort the entire query before it completes just call .Dispose().
I skip it and go to the next item in the loop
By "skip it", do you mean "leave it there" or "cancel it"? The two scenarios are quite different. But for both I suggest you use Task.
// generate 10 example tasks (not started yet)
var tasks = Enumerable
    .Range(0, 10)
    .Select(n => new Task(() => DoSomething(n)))
    .ToList();
var maxExecutionTime = TimeSpan.FromSeconds(5);
foreach (var task in tasks)
{
    // tasks created with `new Task(...)` do not run until they are started
    task.Start();
    if (task.Wait(maxExecutionTime))
    {
        // the task is finished in time
    }
    else
    {
        // the task is over time
        // just leave it there
        // the loop continues
        // if you want to cancel it, see
        // http://stackoverflow.com/questions/4783865/how-do-i-abort-cancel-tpl-tasks
    }
}
One thing to improve is "do you really need to run your tasks one by one?" If they are independent you can run them in parallel.
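For the independent case, one common shape is to start everything up front and race each task against a Task.Delay. A minimal sketch, with FinishedInTimeAsync and RunAllAsync being names invented for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelTimeoutSketch
{
    // Returns true when `work` completed within `timeout`; otherwise the work is left running.
    public static async Task<bool> FinishedInTimeAsync(Task work, TimeSpan timeout)
    {
        Task first = await Task.WhenAny(work, Task.Delay(timeout));
        return first == work;
    }

    public static Task<bool[]> RunAllAsync(IEnumerable<Func<Task>> jobs, TimeSpan timeout)
    {
        // Invoke every job up front so they run concurrently, each with its own timeout.
        var checks = jobs.Select(job => FinishedInTimeAsync(job(), timeout)).ToList();
        return Task.WhenAll(checks);
    }
}
```

Note that this only stops waiting; a job that runs over time keeps running in the background. A job that fails also counts as "finished" here, so await it afterwards if you need to observe its exception.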
I have a C# requirement for individually processing a 'great many' (perhaps > 100,000) records. Running this process sequentially is proving to be very slow with each record taking a good second or so to complete (with a timeout error set at 5 seconds).
I would like to try running these tasks asynchronously by using a set number of worker 'threads' (I use the term 'thread' here cautiously as I am not sure if I should be looking at a thread, or a task or something else).
I have looked at the ThreadPool, but I can't imagine it could queue the volume of requests required. My ideal pseudo code would look something like this...
public void ProcessRecords() {
SetMaxNumberOfThreads(20);
MyRecord rec;
while ((rec = GetNextRecord()) != null) {
var task = WaitForNextAvailableThreadFromPool(ProcessRecord(rec));
task.Start()
}
}
I will also need a mechanism that the processing method can report back to the parent/calling class.
Can anyone point me in the right direction with perhaps some example code?
A possible simple solution would be to use a TPL Dataflow block which is a higher abstraction over the TPL with configurations for degree of parallelism and so forth. You simply create the block (ActionBlock in this case), Post everything to it, wait asynchronously for completion and TPL Dataflow handles all the rest for you:
var block = new ActionBlock<MyRecord>(
    rec => ProcessRecord(rec),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 20 });
MyRecord rec;
while ((rec = GetNextRecord()) != null)
{
    block.Post(rec);
}
block.Complete();
await block.Completion;
Another benefit is that the block starts working as soon as the first record arrives and not only when all the records have been received.
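To convince yourself that the block really caps the concurrency, here is a small self-contained sketch that processes dummy int records and tracks how many run at once. The Task.Delay stands in for the real per-record work, and depending on your target framework you may need the System.Threading.Tasks.Dataflow NuGet package.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class DataflowSketch
{
    // Processes `recordCount` dummy records; returns how many were processed
    // and the highest number that were in flight at the same time.
    public static async Task<(int Processed, int Peak)> RunAsync(int recordCount, int maxParallelism)
    {
        int processed = 0, running = 0, peak = 0;
        var gate = new object();

        var block = new ActionBlock<int>(async record =>
        {
            lock (gate) { running++; if (running > peak) peak = running; }
            await Task.Delay(20); // simulates the per-record work
            lock (gate) { running--; processed++; }
        },
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxParallelism });

        for (int record = 0; record < recordCount; record++)
        {
            block.Post(record);
        }

        block.Complete();
        await block.Completion;
        return (processed, peak);
    }
}
```

Calling RunAsync(50, 4) should report 50 processed records and a peak no higher than 4.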
If you need to report back on each record you can use a TransformBlock to do the actual processing and link an ActionBlock to it that does the updates:
var transform = new TransformBlock<MyRecord, Report>(rec =>
{
    ProcessRecord(rec);
    return GenerateReport(rec);
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 20 });
var reporter = new ActionBlock<Report>(report =>
{
    RaiseEvent(report); // Or any other mechanism...
});
transform.LinkTo(reporter, new DataflowLinkOptions { PropagateCompletion = true });
MyRecord rec;
while ((rec = GetNextRecord()) != null)
{
    transform.Post(rec);
}
transform.Complete();
// Completion propagates to the reporter, so awaiting it covers both blocks.
await reporter.Completion;
Have you thought about using parallel processing with Actions?
i.e., create a method to process a single record, add a call to it for each record as an Action to a list, and then run Parallel.ForEach over the list.
Dim list As New List(Of Action)
list.Add(New Action(Sub() MyMethod(myParameter)))
Parallel.ForEach(list, Sub(t) t.Invoke())
This is in vb.net, but I think you get the gist.
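A rough C# equivalent, with the thread cap the question asks for, might look like the sketch below. ProcessAll is a name made up for this example; it skips the intermediate list of Actions and passes the records to Parallel.ForEach directly.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelForEachSketch
{
    // Applies `process` to every record on up to `maxThreads` threads.
    public static ConcurrentBag<TResult> ProcessAll<T, TResult>(
        IEnumerable<T> records, int maxThreads, Func<T, TResult> process)
    {
        var results = new ConcurrentBag<TResult>(); // thread-safe, unordered
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };

        // Blocks the calling thread until every record has been processed.
        Parallel.ForEach(records, options, record => results.Add(process(record)));
        return results;
    }
}
```

Keep in mind that Parallel.ForEach is meant for synchronous, CPU-bound work and blocks the caller until it is done; if each record mostly waits on I/O, an asynchronous approach is likely a better fit.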
NOTE: This is for a WinRT Universal app.
I'm fetching data from a service through use of an async method:
var activityList = await Task.Run(() => dataService.GetActivitiesAsync());
But for each activity I still need to load Athlete data.
So now I just loop the activities and load up the athlete data like so ( there is no bulk load option for athlete ):
foreach(var activity in activityList)
{
activity.Athlete = dataService.GetAthleteAsync(activity.AthleteID);
}
But this GetAthleteAsync will also return a Task, so I was just wondering isn't there a better way to get this done in a background thread? Somehow with Task.WhenAll or anything else?
The Athlete property on the Activity object has NotifyPropertyChanged so the UI will show the needed data when it's set.
The GetAthleteAsync method will already try to cache athletes based on the given id.
Any suggestions on how to make this perform better?
Some details, all methods are connecting to a web API.
Mostly we'll be targeting 10 unique Athletes (but depending on the user, this could increase).
Hmm, if dataService.GetActivitiesAsync() is async and everything is called from the UI thread, what you can do is this:
// no wrapping in Task, it is async
var activityList = await dataService.GetActivitiesAsync();
// Select a good enough tuple
var results = (from activity in activityList
               select new {
                   Activity = activity,
                   AthleteTask = dataService.GetAthleteAsync(activity.AthleteID)
               }).ToList(); // begin enumeration
// Wait for them to finish, ie relinquish control of the thread
await Task.WhenAll(results.Select(t => t.AthleteTask));
// Set the athletes
foreach (var pair in results)
{
    pair.Activity.Athlete = pair.AthleteTask.Result;
}
(Written off the top of my head, i.e. no syntax checking, so some method calls could be wrong)
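Since there are usually only about 10 unique athletes, one possible refinement is to start one request per distinct AthleteID instead of one per activity. The sketch below assumes that shape; ActivitySketch and the getAthleteAsync delegate are hypothetical stand-ins for the real Activity class and dataService.GetAthleteAsync, and the athlete is reduced to a string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for the real Activity class.
public sealed class ActivitySketch
{
    public int AthleteID { get; set; }
    public string Athlete { get; set; }
}

public static class AthleteLoaderSketch
{
    public static async Task LoadAthletesAsync(
        IReadOnlyCollection<ActivitySketch> activities,
        Func<int, Task<string>> getAthleteAsync)
    {
        // One request per distinct athlete ID, all started before any is awaited.
        Dictionary<int, Task<string>> lookups = activities
            .Select(a => a.AthleteID)
            .Distinct()
            .ToDictionary(id => id, id => getAthleteAsync(id));

        await Task.WhenAll(lookups.Values);

        foreach (var activity in activities)
        {
            // The tasks are complete here, so awaiting them returns immediately.
            activity.Athlete = await lookups[activity.AthleteID];
        }
    }
}
```

If GetAthleteAsync already caches reliably, the gain is small; the main effect is that concurrent requests for the same athlete are not sent twice before the cache has been filled.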