Tracking progress of a multi-step Task - C#

I am working on a simple server that exposes webservices to clients. Some of the requests may take a long time to complete, and are logically broken into multiple steps. For such requests, it is required to report progress during execution. In addition, a new request may be initiated before a previous one completes, and it is required that both execute concurrently (barring some system-specific limitations).
I was thinking of having the server return a TaskId to its clients, and having the clients track the progress of the requests using the TaskId. I think this is a good approach, and I am left with the issue of how tasks are managed.
Never having used the TPL, I was thinking it would be a good way to approach this problem. Indeed, it allows me to run multiple tasks concurrently without having to manually manage threads. I can even create multi-step tasks relatively easily using ContinueWith.
I can't come up with a good way of tracking a task's progress, though. I realize that when my requests consist of a single "step", then the step has to cooperatively report its state. This is something I would prefer to avoid at this point. However, when a request consists of multiple steps, I would like to know which step is currently executing and report progress accordingly. The only way I could come up with is extremely tiresome:
Task<int> firstTask = new Task<int>( () => { DoFirstStep(); return 42; } );
firstTask.
    ContinueWith<int>( task => { UpdateProgress("50%"); return task.Result; } ).
    ContinueWith<string>( task => { DoSecondStep(task.Result); return "blah"; } ).
    ContinueWith<string>( task => { UpdateProgress("100%"); return task.Result; } );
And even this is not perfect since I would like the Task to store its own progress, instead of having UpdateProgress update some known location. Plus it has the obvious downside of having to change a lot of places when adding a new step (since now the progress is 33%, 66%, 100% instead of 50%, 100%).
Does anyone have a good solution?
Thanks!

This isn't really a scenario that the Task Parallel Library supports all that fully.
You might consider an approach where you feed progress updates to a queue and read them on another Task:
static void Main(string[] args)
{
    Example();
}

static BlockingCollection<Tuple<int, int, string>> _progressMessages =
    new BlockingCollection<Tuple<int, int, string>>();

public static void Example()
{
    List<Task<int>> tasks = new List<Task<int>>();
    for (int i = 0; i < 10; i++)
        tasks.Add(Task.Factory.StartNew((object state) =>
        {
            int id = (int)state;
            DoFirstStep(id);
            _progressMessages.Add(new Tuple<int, int, string>(
                id, 1, "10.0%"));
            DoSecondStep(id);
            _progressMessages.Add(new Tuple<int, int, string>(
                id, 2, "50.0%"));
            // ...
            return 1;
        },
        (object)i));

    Task logger = Task.Factory.StartNew(() =>
    {
        foreach (var m in _progressMessages.GetConsumingEnumerable())
            Console.WriteLine("Task {0}: Step {1}, progress {2}.",
                m.Item1, m.Item2, m.Item3);
    });

    Task.WaitAll(tasks.ToArray());
    // Signal the logger that no more progress messages will arrive; without
    // this, GetConsumingEnumerable blocks forever and the logger never ends.
    _progressMessages.CompleteAdding();
    logger.Wait();
    Console.ReadLine();
}

private static void DoFirstStep(int id)
{
    Console.WriteLine("{0}: First step", id);
}

private static void DoSecondStep(int id)
{
    Console.WriteLine("{0}: Second step", id);
}
This sample doesn't show cancellation or error handling, and doesn't account for your requirement that your tasks may be long running. Long-running tasks place special requirements on the scheduler. More discussion of this can be found at http://parallelpatterns.codeplex.com/; download the book draft and look at Chapter 3.
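As a minimal sketch of what that looks like (ProcessLargeRequest is an illustrative placeholder, not from the question): the LongRunning hint tells the default scheduler to use a dedicated thread rather than tying up a thread pool thread for the duration of the request.

Task longTask = Task.Factory.StartNew(
    () => ProcessLargeRequest(),      // hypothetical long-running work
    CancellationToken.None,
    TaskCreationOptions.LongRunning,  // hint: don't consume a pool thread
    TaskScheduler.Default);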
This is simply an approach for using the Task Parallel Library in a scenario like this. The TPL may well not be the best approach here.
If your web services are running inside ASP.NET (or a similar web application server) then you should also consider the likely impact of using threads from the thread pool to execute tasks, rather than service web requests:
How does Task Parallel Library scale on a terminal server or in a web application?

I don't think the solution you are looking for will involve the Task API, or at least not directly. It doesn't support the notion of percentage complete, and the Task/ContinueWith functions would need to participate in that logic because the data is only available at that level (only the final invocation of ContinueWith is in any position to know the percentage complete, and even then, computing it algorithmically is a guess at best, because it can't know whether one step will take much longer than another). I suggest you create your own API to do this, possibly leveraging the Task API to do the actual work.
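A minimal sketch of what such an API could look like (ProgressTask and its members are illustrative names, not an existing type): it runs the steps sequentially on a single Task and derives the percentage from the step count, so adding a step automatically rescales the progress values, and the task object stores its own progress - both things the question asks for.

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Illustrative sketch: a task wrapper that owns its step list and progress.
public class ProgressTask
{
    private readonly List<Action> _steps = new List<Action>();

    // The task stores its own progress instead of writing to a known location.
    public int PercentComplete { get; private set; }

    public void AddStep(Action step) { _steps.Add(step); }

    public Task Start()
    {
        return Task.Factory.StartNew(() =>
        {
            for (int i = 0; i < _steps.Count; i++)
            {
                _steps[i]();
                // Derived from the step count: 3 steps give 33%, 66%, 100%.
                PercentComplete = (i + 1) * 100 / _steps.Count;
            }
        });
    }
}

The server could then keep its TaskId-to-ProgressTask map and let clients poll PercentComplete.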

This might help: http://blog.stephencleary.com/2010/06/reporting-progress-from-tasks.html. In addition to reporting progress, this solution also enables updating form controls without getting the "Cross-thread operation not valid" exception.
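A minimal sketch of that idea (WinForms assumed; progressLabel and the Do* methods are illustrative): continuations scheduled on the UI's TaskScheduler may safely touch controls, while the work itself stays on the default scheduler.

// Capture the UI scheduler on the UI thread.
var uiScheduler = TaskScheduler.FromCurrentSynchronizationContext();
Task.Factory.StartNew(() => DoFirstStep())
    .ContinueWith(t => { progressLabel.Text = "50%"; }, uiScheduler)   // UI thread
    .ContinueWith(t => DoSecondStep(), TaskScheduler.Default)          // worker
    .ContinueWith(t => { progressLabel.Text = "100%"; }, uiScheduler); // UI thread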

Related

How to ensure parallel tasks dequeue unique entries from ConcurrentQueue<T>?

Hi, I have a ConcurrentQueue that is loaded with files from a database. These files are to be processed by parallel Tasks that dequeue the files. However, I run into issues where, after some time, I start getting tasks that dequeue the same file at the same time (which leads to "used by another process" errors on the file). I also get more tasks than are supposed to be allocated: I have even seen 8 tasks running at once, which should not happen, since the active task limit is 5.
Rough code:
private void ParseQueuedTDXFiles()
{
    while (_signalParseQueuedFilesEvent.WaitOne())
    {
        Task.Run(() => SetParsersTask());
    }
}
The _signalParseQueuedFilesEvent is set on a timer in a Windows Service. The above function then calls SetParsersTask, which uses a concurrent dictionary to track how many active tasks there are and to make sure they stay below _ActiveTasksLimit:
private void SetParsersTask()
{
    if (_ConcurrentqueuedTdxFilesToParse.Count > 0)
    {
        // Concurrent dictionary used to control how many tasks should run
        if (_activeParserTasksDict.Count < _ActiveTasksLimit)
        {
            int parserCountToStart = _ActiveTasksLimit - _activeParserTasksDict.Count;
            Parallel.For(0, parserCountToStart, parserToStart =>
            {
                lock (_concurrentQueueLock)
                    Task.Run(() => PrepTdxParser());
            });
        }
    }
}
Which then calls this function which dequeues the Concurrent Queue:
private void PrepTdxParser()
{
    TdxFileToProcessData fileToProcess;
    lock (_concurrentQueueLock)
        _ConcurrentqueuedTdxFilesToParse.TryDequeue(out fileToProcess);
    if (!string.IsNullOrEmpty(fileToProcess.TdxFileName))
    {
        LaunchTDXParser(fileToProcess);
    }
}
I even put a lock on _ConcurrentqueuedTdxFilesToParse even though I know it doesn't need one. All to make sure that I never run into a situation where two Tasks are dequeuing the same file.
This function is where I add and remove Tasks as well as launch the file parser for the dequeued file:
private void LaunchTDXParser(TdxFileToProcessData fileToProcess)
{
    string fileName = fileToProcess.TdxFileName;
    Task startParserTask = new Task(() => ConfigureAndStartProcess(fileName));
    _activeParserTasksDict.TryAdd(fileName, startParserTask);
    startParserTask.Start();
    Task.WaitAll(startParserTask);
    _activeParserTasksDict.TryRemove(fileName, out Task taskToBeRemoved);
}
Can you guys help me understand why I am getting the same file dequeued in two different Tasks? And why I am getting more Tasks than the _ActiveTasksLimit?
There are a number of red flags in this¹ code:
Using a WaitHandle. This tool is too primitive. I've never seen a problem solved with WaitHandles that couldn't be solved in a simpler way without them.
Launching Task.Run tasks in a fire-and-forget fashion.
Launching a Parallel.For loop without configuring the MaxDegreeOfParallelism. This practically guarantees that the ThreadPool will get saturated.
Protecting a queue (_queuedTdxFilesToParse) with a lock (_concurrentQueueLock) only partially. If the queue is a Queue<T>, you must protect it on each and every operation, otherwise the behavior of the program is undefined. If the queue is a ConcurrentQueue<T>, there is no need to protect it, because it is thread-safe by itself (see the sketch after this list).
Calling Task.Factory.StartNew and Task.Start without configuring the scheduler argument.
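For instance, a ConcurrentQueue<T> dequeue is already atomic, so a minimal sketch of a correct, lock-free dequeue (names borrowed from the question) looks like this:

if (_ConcurrentqueuedTdxFilesToParse.TryDequeue(out TdxFileToProcessData fileToProcess))
{
    // TryDequeue returned true, so this call - and only this call - got the item;
    // two tasks can never receive the same file, and no lock is needed.
    LaunchTDXParser(fileToProcess);
}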
So I am not surprised that your code is not working as expected. I can't point to a specific error that needs to be fixed. For me the whole approach is dubious and needs to be reworked/scrapped. Some concepts and tools that you might want to research before attempting to rewrite this code:
The producer-consumer pattern.
The BlockingCollection<T> class.
The TPL Dataflow library.
Optionally you could consider familiarizing yourself with asynchronous programming. It can help reduce the number of threads that your program uses while running, resulting in a more efficient and scalable program. Two powerful asynchronous tools are the Channel<T> class and the Parallel.ForEachAsync API (available from .NET 6 and later); a sketch follows below.
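A minimal sketch of the producer-consumer approach with those tools (assumes .NET 6+; TdxFileToProcessData, ServiceDBHelper and LaunchTDXParser are borrowed from the question; the code belongs inside an async method):

// A bounded channel acts as the queue between the DB loader and the parsers.
var channel = Channel.CreateBounded<TdxFileToProcessData>(100);

// Producer: the only place that loads files from the DB into the queue.
var producer = Task.Run(async () =>
{
    foreach (var file in ServiceDBHelper.GetTdxFilesToEnqueueForProcessingFromDB(100))
        await channel.Writer.WriteAsync(file);
    channel.Writer.Complete(); // no more files
});

// Consumer: at most 5 parsers at a time; each file is read exactly once, no locks.
var consumer = Parallel.ForEachAsync(
    channel.Reader.ReadAllAsync(),
    new ParallelOptions { MaxDegreeOfParallelism = 5 },
    async (file, ct) => await Task.Run(() => LaunchTDXParser(file), ct));

await Task.WhenAll(producer, consumer);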
¹ This answer was intended for a related question that is now deleted.
So I fixed my problem. The solution was first to not add more parallelism than need be. I was trying to create a situation where private void SetParsersTask() would not be held up by tasks that still needed to finish processing a file. So I foolishly threw in Parallel.For in addition to Task.Run, which is already parallel. I fixed this by generating fire-and-forget tasks in a normal for loop instead of Parallel.For:
private void SetParsersTask()
{
    if (_queuedTdxFilesToParse.Count > 0)
    {
        if (_activeParserTasksDict.Count < _tdxParsersInstanceCount)
        {
            int parserCountToStart = _tdxParsersInstanceCount - _activeParserTasksDict.Count;
            _queuedTdxFilesToParse = new ConcurrentQueue<TdxFileToProcessData>(_queuedTdxFilesToParse.Distinct());
            for (int i = 0; i < parserCountToStart; i++)
            {
                Task.Run(() => PrepTdxParser());
            }
        }
    }
}
After that I was still getting the occasional duplicate file, so I moved the queue loading to another long-running thread. For that thread I use an AutoResetEvent so that the queue is populated only once at any instant, as opposed to another task potentially loading it with duplicate files. It could be that my enqueue and dequeue were both responsible, and now both are addressed:
var _loadQueueTask = Task.Factory.StartNew(() => LoadQueue(), TaskCreationOptions.LongRunning);

private void LoadQueue()
{
    while (_loadConcurrentQueueEvent.WaitOne())
    {
        if (_queuedTdxFilesToParse.Count < _tdxParsersInstanceCount)
        {
            int numFilesToGet = _tdxParsersInstanceCount - _activeParserTasksDict.Count;
            var filesToAdd = ServiceDBHelper.GetTdxFilesToEnqueueForProcessingFromDB(numFilesToGet);
            foreach (var fileToProc in filesToAdd)
            {
                ServiceDBHelper.UpdateTdxFileToProcessStatusAndUpdateDateTime(fileToProc.TdxFileName, 1, DateTime.Now);
                _queuedTdxFilesToParse.Enqueue(fileToProc);
            }
        }
    }
}
Thanks to Theo for pointing me to additional tools and making me look closer at my parallel loops.

Why is an additional async operation making my code faster than when the operation is not taking place at all?

I'm working on an SMS-based game (Value Added Service), in which a question must be sent to each subscriber on a daily basis. There are over 500,000 subscribers and therefore performance is a key factor. Since each subscriber can be in a different state of the competition with different variables, the database must be queried separately for each subscriber before sending a text message. To achieve the best performance I'm using the .NET Task Parallel Library (TPL) to spawn parallel threadpool threads and do as many async operations as possible in each thread to finally send texts ASAP.
Before describing the actual problem, there is some more information worth giving about the code.
At first there was no async operation in the code. I just scheduled some 500,000 tasks with the default task scheduler into the threadpool, and each task would work through the routines, blocking on all EF (Entity Framework) queries and sequentially finishing its job. It was good, but not fast enough. Then I changed all EF queries to async; the outcome was superb in speed, but there were so many deadlocks and timeouts in SQL Server that about a third of the subscribers never received a text! After trying different solutions, I decided not to do too many async database operations while I have over 500,000 tasks running on a 24-core server (with at least 24 concurrent threadpool threads)!
I rolled back all the changes (the async ones) except for one web service call in each task, which remained async.
Now the weird case:
In my code, I have a boolean variable named "isCrossSellActive". When the variable is set, some more DB operations take place and an async webservice call happens, on which the thread awaits. When this variable is false, none of these operations happen, including the async webservice call. Oddly, when the variable is set the code runs much faster than when it's not! It seems like for some reason the awaited async code (the cooperative thread) is making the code faster.
Here is the code:
public async Task AutoSendMessages(...)
{
    //Get list of subscriptions plus some initialization
    LimitedConcurrencyLevelTaskScheduler lcts = new LimitedConcurrencyLevelTaskScheduler(numberOfThreads);
    TaskFactory taskFactory = new TaskFactory(lcts);
    List<Task> tasks = new List<Task>();
    //....
    foreach (var sub in subscriptions)
    {
        AutoSendData data = new AutoSendData
        {
            ServiceId = serviceId,
            MSISDN = sub.subscriber,
            IsCrossSellActive = bolCrossSellHeader
        };
        tasks.Add(await taskFactory.StartNew(async (x) =>
        {
            await SendQuestion(x);
        }, data));
    }
    GC.Collect();
    try
    {
        Task.WaitAll(tasks.ToArray());
    }
    catch (AggregateException ae)
    {
        ae.Handle((ex) =>
        {
            _logRepo.LogException(1, "", ex);
            return true;
        });
    }
    await _autoSendRepo.SetAutoSendingStatusEnd(statusId);
}
public async Task SendQuestion(object data)
{
    //extract variables from input parameter
    try
    {
        if (isCrossSellActive)
        {
            int pieceCount = subscriptionRepo.GetSubscriberCarPieces(curSubscription.service, curSubscription.subscriber).Count(c => c.isConfirmed);
            foreach (var rule in csRules)
            {
                if (rule.Applies)
                {
                    if (await HttpClientHelper.GetJsonAsync<bool>(url, rule.TargetServiceBaseAddress))
                    {
                        int noOfAddedPieces = SomeCalculations();
                        if (noOfAddedPieces > 0)
                        {
                            crossSellRepo.SetPromissedPieces(curSubscription.subscriber, curSubscription.service,
                                rule.TargetShortCode, noOfAddedPieces, 0, rule.ExpirationLimitDays);
                        }
                    }
                }
            }
        }
        // The rest of the code. (Some db CRUD)
        await SmsClient.SendSoapMessage(subscriber, smsBody);
    }
    catch (Exception ex) { /*...*/ }
}
OK, thanks to @usr and the clue he gave me, the problem is finally solved!
His comment drew my attention to the awaited taskFactory.StartNew(...) line, which sequentially adds new tasks to the "tasks" list that is then waited on by Task.WaitAll(tasks).
At first I removed the await keyword before taskFactory.StartNew() and it led the code towards a horrible state of malfunction! I then returned the await keyword before taskFactory.StartNew(), debugged the code using breakpoints, and amazingly saw that the tasks ran one after another, sequentially, before the first task reached the first await inside the SendQuestion routine. When the "isCrossSellActive" flag was set, despite the extra work a task had to do, the first await keyword was reached earlier, thus enabling the next scheduled task to run. But when it's not set, the only await keyword is on the last line of the routine, so it is most likely to run sequentially to the end.
usr's suggestion to remove the await keyword in the loop seemed to be correct, but the problem was that the Task.WaitAll() line would then wait on the wrong kind of list, Task<Task> instead of Task. I finally used Task.Run instead of TaskFactory.StartNew and everything changed. Now the service is working well. The final code inside the loop is:
tasks.Add(Task.Run(async () =>
{
    await SendQuestion(data);
}));
and the problem was solved.
Thank you all.
P.S. Read this article on Task.Run and why TaskFactory.StartNew is dangerous: http://blog.stephencleary.com/2013/08/startnew-is-dangerous.html
It's extremely hard to tell unless you add some profiling that tells you which code is taking longer now.
Without seeing more numbers, my best guess would be that the SMS service doesn't like it when you send too many requests in a short time and chokes. When you add the extra DB calls, the extra delay makes the SMS service work better.
A few other small details:
await Task.WhenAll is usually a bit better than Task.WaitAll. WaitAll means the thread will sit around waiting, making a deadlock slightly more likely.
Instead of:
tasks.Add(await taskFactory.StartNew(async (x) =>
{
    await SendQuestion(x);
}, data));
You should be able to do
tasks.Add(SendQuestion(data));
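With that change, the loop collects the async operations directly and the method can await them all - a minimal sketch reusing the names from the question:

foreach (var sub in subscriptions)
{
    AutoSendData data = new AutoSendData
    {
        ServiceId = serviceId,
        MSISDN = sub.subscriber,
        IsCrossSellActive = bolCrossSellHeader
    };
    tasks.Add(SendQuestion(data)); // each call returns a Task directly
}
await Task.WhenAll(tasks); // asynchronous wait, no blocked thread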

Correct usage of Async/Await for Multiple Tasks To Db

I have a simple scenario, but I would like to know if my approach is correct: is it better to choose a single task to save my failed orders, or can I kick off multiple tasks and wait for them all to complete? What is the correct approach for this scenario when it comes to connecting to a DB and saving entities?
I already have a single-task version of the code below that saves one entity into the DB.
public async static Task SaveOrdersAsync(OrderService oService, OrderItemService oiService, IEnumerable<OrderTemplate> toSaveList, IUnitOfWork uow, IProgress<string> progress)
{
    var toSave = toSaveList as IList<OrderTemplate> ?? toSaveList.ToList();
    var tasks = new Task[toSave.Count()];
    for (var i = 0; i < tasks.Length; i++)
    {
        var i1 = i;
        tasks[i] = new Task(() => SaveToDb(oService, oiService, toSave.ElementAt(i1), uow), TaskCreationOptions.PreferFairness);
        var message = string.Format("- Order: {0} has been resaved.\n", toSave.ElementAt(i1).Order.FriendlyId);
        if (progress != null)
            progress.Report(message);
    }
    await Task.WhenAll(tasks);
}
At the moment, I have tested the above and believe that the tasks have not started, as the progress bar keeps looping around. My assumption was that Task.WhenAll would start my tasks for me - is that right?
Or should I be using this in the loop:
tasks[i] = Task.Run(() => SaveToDb(oService, oiService, toSave.ElementAt(i1), uow));
I think I am close, just want someone to tell me whether I am doing this correctly or not.
Feedback incorporated version:
public async static Task SaveOrdersAsync(OrderService oService, OrderItemService oiService, IEnumerable<OrderTemplate> toSaveList, IUnitOfWork uow, IProgress<string> progress)
{
    var saveList = toSaveList as IList<OrderTemplate> ?? toSaveList.ToList();
    var saveTask = Task.Run(() =>
    {
        foreach (var ot in saveList)
        {
            SaveToDbBatch(oService, oiService, ot);
            var message = string.Format("- Order: {0} has been resaved.\n", ot.Order.FriendlyId);
            if (progress != null)
                progress.Report(message);
        }
    });
    await saveTask;
    await Cache.UoW.SaveAsync();
}
What is the correct approach for this scenario when it comes to connecting to a Db and saving entities.
Generally speaking, you should:
Batch your saves, if possible. In other words, call a single method to update multiple records simultaneously. E.g., EF has SaveChangesAsync.
Use the natural async APIs for your database instead of Task.Run (or - even worse - the task constructor). E.g., EF has SaveChangesAsync. A sketch combining both points follows this list.
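A minimal sketch of both points together, assuming an EF DbContext (OrdersContext and its Orders set are illustrative names, not from the question):

using (var context = new OrdersContext())
{
    // Batch: queue all the changes in memory first...
    foreach (var ot in toSaveList)
        context.Orders.Add(ot.Order);
    // ...then persist them with one natural-async call; no Task.Run,
    // and no thread pool thread is blocked while the database works.
    await context.SaveChangesAsync();
}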
Yes, you're correct that creating a task doesn't start it. Calling Task.Run(...) is the better option.
However, an even better option is to use the task that is returned from your call to ExecuteAsync(...) and await on that. This is because the ExecuteAsync task is an IO task & not a thread, so it executes differently and doesn't use up a thread pool thread.
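For example, a minimal sketch of awaiting a natural async database API (plain ADO.NET shown; connectionString and orderId are illustrative assumptions):

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("UPDATE Orders SET Resaved = 1 WHERE Id = @id", conn))
{
    cmd.Parameters.AddWithValue("@id", orderId);
    await conn.OpenAsync();
    // The returned task represents IO, not a thread: awaiting it frees
    // the thread pool thread while the database does the work.
    await cmd.ExecuteNonQueryAsync();
}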
As a side-note: Depending on the complexity of the "Save" it might be more reliable to do each "Save" consecutively. This is because if there are any database errors (like constraint violations) caused by a parallel task, then it will be extremely hard to reproduce if they are executed in parallel (i.e. at random times).
new Task(...) does not start the task. It is not the responsibility of Task.WhenAll to start them. The Task ctor should almost never be used.
Use Task.Run.
Seems like consolidating this down to one task, as posted in my update, worked, and it also solved a side issue that I thought I would raise here in case someone else is keen on pursuing my original approach. But I agree with @jaytre that, depending on the complexity of your saves and the object being saved, it might be better to do each save consecutively for error handling - but that's up to you.
So if you pursue my original approach you might run into this error:
An EdmType cannot be mapped to CLR classes multiple times. The EdmType 'FrootPipe.Data.Order' is mapped more than once.
Which was basically down to a locking/synchronization issue - different tasks were accessing the model at more or less the same time as one another, all trying to re-add a failed order back into the data model. So the error for my scenario is a little hard to interpret, but some googling led me to the below.
For further reading see here: Entity framework MappingException: The type 'XXX has been mapped more than once

How to iteratively call a method without waiting for a response from the method in C# .NET 4.5.1?

First off, I apologize for the terrible wording of that question... here's the scenario:
I built a Web API method that receives a ProductID and then uploads that product's images to Amazon S3. This part is working just fine.
I am now trying to get a console app running that will grab a range of ProductIDs and loop through them, calling the API method without waiting for the results...
Can anyone point me in the right direction?
I suppose another caveat would be to not eat up all the resources on the machine running the console app...so maybe a thread cap?
UPDATE (This still seems to be synchronous):
class Program
{
    async static void DoUpload(int itemid)
    {
        Console.WriteLine("Starting #:" + itemid);
        Thread.Sleep(2000); //Simulates long call to API
        Console.WriteLine("Finishing #:" + itemid);
    }

    static void Main(string[] args)
    {
        for (int i = 0; i < 20; i++)
        {
            DoUpload(i);
        }
    }
}
There are a couple easy ways to do this.
I'd recommend using the Parallel class. It makes the most optimized use of your environment's many threads/cores. For your example, you'd simply do something like this:
var status = Parallel.For(0, 20, DoUpload);
while (!status.IsCompleted)
{
    //Do something while you wait
}
The other method would be to use Tasks, and send each call as a separate task. Be careful with this approach though because you could overload the system with pending tasks if you have too many iterations.
List<Task> tasks = new List<Task>();
for (int i = 0; i < 20; i++)
{
    int item = i; // copy the loop variable so each task captures its own value
    var task = Task.Run(() => DoUpload(item));
    tasks.Add(task);
}
//wait for completion of all tasks
Task.WaitAll(tasks.ToArray());
I do not recommend using Parallel.For. It does not give satisfactory control of parallelism (you probably don't want to hammer away with hundreds of requests, which will start to time out), and it requires unnecessary context switching.
Threads/cores aren't the limiting factor in the case of HTTP requests.
In the example change
Thread.Sleep(2000)
to
await Task.Delay(2000)
and when using real web api calls
await httpClient.PostAsync(...)
also remember to wait in Main
Console.ReadLine() // or something more sophisticated
otherwise the program will terminate before the calls have been made.
Then to control the level of parallelism I think the easiest solution is to use a Semaphore to count the number of outstanding calls, waiting in the main loop for the semaphore to be signaled again before issuing new requests.
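A minimal sketch of that idea, using SemaphoreSlim (the cap of 5 is an arbitrary choice, and Task.Delay stands in for the real API call):

static SemaphoreSlim _throttle = new SemaphoreSlim(5); // at most 5 outstanding calls

async static Task DoUpload(int itemid)
{
    await _throttle.WaitAsync(); // asynchronously wait for a free slot
    try
    {
        Console.WriteLine("Starting #:" + itemid);
        await Task.Delay(2000); // simulates the API call
        Console.WriteLine("Finishing #:" + itemid);
    }
    finally
    {
        _throttle.Release(); // free the slot so the next call can start
    }
}

static void Main(string[] args)
{
    var tasks = new List<Task>();
    for (int i = 0; i < 20; i++)
        tasks.Add(DoUpload(i));
    Task.WaitAll(tasks.ToArray()); // keep the process alive until all calls finish
}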

Async methods in C#

I have several processes that need to run in the background of a Windows Forms application, because they take too much time and I do not want to freeze the user interface until they completely finish. I would like to have an indicator to show the progress of each operation; so far I have a form to show the progress of each operation, but my operations run synchronously.
So my question is: what is the easiest way to run these operations (which access the database) asynchronously?
I forgot one important feature that the application requires: the user will have the option to cancel any operation at any time. I think this requirement complicates the application a lot, at least with my current skills, so basically I would like to emphasize that I need a solution that is easy to understand and easy to implement. I am aware there are good practices to follow, but at this point I would like some working code; later, with more time, I would refactor it.
.NET 4 added the Task Parallel Library, which provides a very clean mechanism for making synchronous operations asynchronous.
It allows you to wrap the sync operation into a Task, which you can then either wait on, or use with a continuation (some code that executes when the task completes).
This will often look something like:
Task processTask = Task.Factory.StartNew(() => YourProcess(foo, bar));
Once you have the task, you have quite a few options, including blocking:
// Do other work, then:
processTask.Wait(); // This blocks until the task is completed
Or, if you want a continuation (code to run when it's complete):
processTask.ContinueWith( t => ProcessCompletionMethod());
You can also use this to combine multiple asynchronous operations, and complete when any or all of them are finished, etc.
Note that using Task or Task<T> in this way has another huge advantage - if you later migrate to .NET 4.5, your API will work as-is, with no code changes, with the new async/await language features coming in C# 5.
I forgot one important feature that the application requires, the user will have the option to cancel any operation at any time.
The TPL was also designed, from its inception, to work nicely with the new cooperative cancellation model in .NET 4. This allows you to have a CancellationTokenSource which can be used to cancel any or all of your tasks.
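A minimal sketch of cooperative cancellation (ProcessChunk and workItemCount are hypothetical stand-ins for your operation's steps):

var cts = new CancellationTokenSource();
Task processTask = Task.Factory.StartNew(() =>
{
    for (int i = 0; i < workItemCount; i++)
    {
        cts.Token.ThrowIfCancellationRequested(); // cooperative check between steps
        ProcessChunk(i);
    }
}, cts.Token);

// Later, e.g. in a Cancel button's click handler:
cts.Cancel();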
Well, in C# there are several ways to accomplish this.
Personally, I would recommend you try the Reactive Extensions:
http://msdn.microsoft.com/en-us/data/gg577609.aspx
You can actually do something like this:
https://stackoverflow.com/a/10804404/1268570
I created this for you. It is really easy; although it is not thread-safe, it would be a good starting point.
In a form
var a = Observable.Start(() => Thread.Sleep(8000)).StartAsync(CancellationToken.None);
var b = Observable.Start(() => Thread.Sleep(15000)).StartAsync(CancellationToken.None);
var c = Observable.Start(() => Thread.Sleep(3000)).StartAsync(CancellationToken.None);
Manager.Add("a", a.ObserveOn(this).Subscribe(x => MessageBox.Show("a done")));
Manager.Add("b", b.ObserveOn(this).Subscribe(x => MessageBox.Show("b done")));
Manager.Add("c", c.ObserveOn(this).Subscribe(x => MessageBox.Show("c done")));
private void button1_Click(object sender, EventArgs e)
{
    Manager.Cancel("b");
}
Manager utility
public static class Manager
{
    private static IDictionary<string, IDisposable> runningOperations;

    static Manager()
    {
        runningOperations = new Dictionary<string, IDisposable>();
    }

    public static void Add(string key, IDisposable runningOperation)
    {
        if (runningOperations.ContainsKey(key))
        {
            throw new ArgumentOutOfRangeException("key");
        }
        runningOperations.Add(key, runningOperation);
    }

    public static void Cancel(string key)
    {
        IDisposable value = null;
        if (runningOperations.TryGetValue(key, out value))
        {
            value.Dispose();
            runningOperations.Remove(key);
        }
    }
}
If the ORM/database API doesn't come with async methods itself, have a look at the BackgroundWorker Class. It supports both cancellation (CancelAsync/CancellationPending) and progress reporting (ReportProgress/ProgressChanged).
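A minimal sketch of that approach (WinForms assumed; DoDatabaseChunk and progressBar1 are illustrative names):

var worker = new BackgroundWorker
{
    WorkerReportsProgress = true,
    WorkerSupportsCancellation = true
};
worker.DoWork += (s, e) =>
{
    for (int i = 1; i <= 100; i++)
    {
        if (worker.CancellationPending) { e.Cancel = true; return; } // user cancelled
        DoDatabaseChunk(i);       // one unit of the long-running operation
        worker.ReportProgress(i); // raises ProgressChanged on the UI thread
    }
};
worker.ProgressChanged += (s, e) => progressBar1.Value = e.ProgressPercentage;
worker.RunWorkerAsync();
// worker.CancelAsync(); // call from a Cancel button to request cancellation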
