Restarting a task in the background if certain errors occur - c#

I am using some REST requests using Mono.Mac (3.2.3) to communicate with a server, and as a retry mechanism I am quietly attempting to give the HTTP actions multiple tries if they fail, or time out.
I have the following;
var tries = 0;
while (tries <= ALLOWED_TRIES)
{
try
{
postTask.Start();
tries++;
if (!postTask.Wait(Timeout))
{
throw new TimeoutException("Operation timed out");
}
break;
} catch (Exception e) {
if (tries > ALLOWED_TRIES)
{
throw new Exception("Failed to access Resource.", e);
}
}
}
Where the task uses parameters of the parent method like so;
var postTask = new Task<HttpWebResponse>(() => {return someStuff(foo, bar);},
Task.Factory.CancellationToken,
Task.Factory.CreationOptions);
The problem seems to be that the task does not want to be run again with postTask.Start() after it's first completion (and subsequent failure). Is there a simple way of doing this, or am I misusing tasks in this way? Is there some sort of method that resets the task to its initial state, or am I better off using a factory of some sort?

You're indeed misusing the Task here, for a few reasons:
You cannot run the same task more than once. When it's done, it's done.
It is not recommended to construct a Task object manually, there's Task.Run and Task.Factory.Start for that.
You should not use Task.Run/Task.Factory.Start for a task which does IO-bound work. They are intended for CPU-bound work, as they "borrow" a thread from ThreadPool to execute the task action. Instead, use pure async Task-based APIs for this, which do not need a dedicate thread to complete.
For example, below you can call GetResponseWithRetryAsync from the UI thread and still keep the UI responsive:
async Task<HttpWebResponse> GetResponseWithRetryAsync(string url, int retries)
{
if (retries < 0)
throw new ArgumentOutOfRangeException();
var request = WebRequest.Create(url);
while (true)
{
try
{
var result = await request.GetResponseAsync();
return (HttpWebResponse)result;
}
catch (Exception ex)
{
if (--retries == 0)
throw; // rethrow last error
// otherwise, log the error and retry
Debug.Print("Retrying after error: " + ex.Message);
}
}
}
More reading:
"Task.Factory.StartNew" vs "new Task(...).Start".
Task.Run vs Task.Factory.StartNew.

I would recommend doing something like this:
private int retryCount = 3;
...
public async Task OperationWithBasicRetryAsync()
{
int currentRetry = 0;
for (; ;)
{
try
{
// Calling external service.
await TransientOperationAsync();
// Return or break.
break;
}
catch (Exception ex)
{
Trace.TraceError("Operation Exception");
currentRetry++;
// Check if the exception thrown was a transient exception
// based on the logic in the error detection strategy.
// Determine whether to retry the operation, as well as how
// long to wait, based on the retry strategy.
if (currentRetry > this.retryCount || !IsTransient(ex))
{
// If this is not a transient error
// or we should not retry re-throw the exception.
throw;
}
}
// Wait to retry the operation.
// Consider calculating an exponential delay here and
// using a strategy best suited for the operation and fault.
Await.Task.Delay();
}
}
// Async method that wraps a call to a remote service (details not shown).
private async Task TransientOperationAsync()
{
...
}
This code is from the Retry Pattern Design from Microsoft. You can check it out here: https://msdn.microsoft.com/en-us/library/dn589788.aspx

Related

Retry Mechanism catching exception wont work

I am implementing a retry mechanism for an API call to make the same request if the stated business logic occured. I made it really easy for now and will enchance it later with exceptionType handling etc. I simply throw exception if the logic happens and catch them in the retry mechanism but in below the code is just killing the thread and does not execute after the first try. Can you help me please what I am missing here ?
This is the retry logic I am trying to use.
public static T Do<T>(Func<T> action, TimeSpan retryInterval, int attemptCount)
{
var exceptions = new List<Exception>();
for (int attempted = 0; attempted < attemptCount; attempted++)
{
try
{
if (attempted > 0)
{
Thread.Sleep(retryInterval);
}
return action();
}
catch (Exception ex)
{
exceptions.Add(ex);
}
}
throw new AggregateException(exceptions);
}
This is how I invoke the retry method.
await Task.Run(() =>
{
RetryHelper.Do(() => ConfirmRequestRetryAsync(request, true), TimeSpan.FromSeconds(60), 10);
});
And this is the method which can throw exceptions due to logic.
public async void ConfirmRequestRetryAsync(ConfirmRequest request, bool flag)
{
logger.Info($"Confirm Request Async Called for the request : {JsonConvert.SerializeObject(request)}");
var confirmRequest = GetSignedConfirmRequest(request.PaymentId);
var confirmResponse = await MakeRequest(confirmRequest);
//Added flag and sending at false at first try to not throw exception
//Then in retry mechanism this exceptions will be use to trigger retry logic.
if (flag)
{
var statu = ConfirmResponseXmlConvert(confirmResponse);
if (statu.Item1 == "0" && statu.Item2 == "InProcess")
{
throw new Exception("InProcess");
}
else if (statu.Item1 == "-1" && statu.Item2 != "Declined")
{
throw new Exception("Error");
}
}
}
The ConfirmRequestRetryAsync is an async void method. This means it will return at the first await. The rest of the code will run at some later time, in the same thread context as the caller. So when the exceptions are thrown the retrying method has already returned and there is nothing to catch the exceptions.
The fix is to make it an async Task method, and await this in your retry-method. This might require two variants of the retrying method, one for async methods and one for non-async.
A rule of thumb for async void is to never allow exceptions to be thrown from them, since there is no possibility for anyone to catch them. Always use async Task if exceptions is a possibility.

Timeout for asynchronous Task<T> with additional exception handling

In my project, I reference types and interfaces from a dynamic link library.
The very first thing I have to do when using this specific library is to create an instance of EA.Repository, which is defined within the library and serves as kind of an entry point for further usage.
The instantiation EA.Repository repository = new EA.Repository() performs some complex stuff in the background, and I find myself confronted with three possible outcomes:
Instantiation takes some time but finishes successfully in the end
An exception is thrown (either immediately or after some time)
The instantiation blocks forever (in which case I'd like to cancel and inform the user)
I was able to come up with an asynchronous approach using Task:
public static void Connect()
{
// Do the lengthy instantiation asynchronously
Task<EA.Repository> task = Task.Run(() => { return new EA.Repository(); });
bool isCompletedInTime;
try
{
// Timeout after 5.0 seconds
isCompletedInTime = task.Wait(5000);
}
catch (Exception)
{
// If the instantiation fails (in time), throw a custom exception
throw new ConnectionException();
}
if (isCompletedInTime)
{
// If the instantiation finishes in time, store the object for later
EapManager.Repository = task.Result;
}
else
{
// If the instantiation did not finish in time, throw a custom exception
throw new TimeoutException();
}
}
(I know, you can probably already spot a lot of issues here. Please be patient with me... Recommendations would be appreciated!)
This approach works so far - I can simulate both the "exception" and the "timeout" scenario and I obtain the desired behavior.
However, I have identified another edge case: Let's assume the instantiation task takes long enough that the timeout expires and then throws an exception. In this case, I sometimes end up with an AggregateException saying that the task has not been observed.
I'm struggling to find a feasible solution to this. I can't really cancel the task when the timeout expires, because the blocking instantiation obviously prevents me from using the CancellationToken approach.
The only thing I could come up with is to start observing the task asynchronously (i.e. start another task) right before throwing my custom TimeoutException:
Task observerTask = Task.Run(() => {
try { task.Wait(); }
catch (Exception) { }
});
throw new TimeoutException();
Of course, if the instantiation really blocks forever, I already had the first task never finish. With the observer task, now I even have two!
I'm quite insecure about this whole approach, so any advice would be welcome!
Thank you very much in advance!
I'm not sure if I fully understood what you're trying to achieve, but what if you do something like this -
public static void Connect()
{
Task<EA.Repository> _realWork = Task.Run(() => { return new EA.Repository(); });
Task _timeoutTask = Task.Delay(5000);
Task.WaitAny(new Task[]{_realWork, timeoutTask});
if (_timeoutTask.Completed)
{
// timed out
}
else
{
// all good, access _realWork.Result
}
}
or you can even go a bit shorter -
public static void Connect()
{
Task<EA.Repository> _realWork = Task.Run(() => { return new EA.Repository(); });
var completedTaskIndex = Task.WaitAny(new Task[]{_realWork}, 5000);
if (completedTaskIndex == -1)
{
// timed out
}
else
{
// all good, access _realWork.Result
}
}
You can also always call Task.Run with a CancellationToken that will time out, but that will raise an exception - the above solutions give you control of the behaviour without an exception being thrown (even though you can always try/catch)
Here is an extension method that you could use to explicitly observe the tasks that may fail while unobserved:
public static Task<T> AsObserved<T>(this Task<T> task)
{
task.ContinueWith(t => t.Exception);
return task;
}
Usage example:
var task = Task.Run(() => new EA.Repository()).AsObserved();

What is the best way to implement a Retry Wrapper in C#?

We currently have a naive RetryWrapper which retries a given func upon the occurrence of an exception:
public T Repeat<T, TException>(Func<T> work, TimeSpan retryInterval, int maxExecutionCount = 3) where TException : Exception
{
...
And for the retryInterval we are using the below logic to "wait" before the next attempt.
_stopwatch.Start();
while (_stopwatch.Elapsed <= retryInterval)
{
// do nothing but actuallky it does! lots of CPU usage specially if retryInterval is high
}
_stopwatch.Reset();
I don't particularly like this logic, also ideally I would prefer the retry logic NOT to happen on the main thread, can you think of a better way?
Note: I am happy to consider answers for .Net >= 3.5
So long as your method signature returns a T, the main thread will have to block until all retries are completed. However, you can reduce CPU by having the thread sleep instead of doing a manual reset event:
Thread.Sleep(retryInterval);
If you are willing to change your API, you can make it so that you don't block the main thread. For example, you could use an async method:
public async Task<T> RepeatAsync<T, TException>(Func<T> work, TimeSpan retryInterval, int maxExecutionCount = 3) where TException : Exception
{
for (var i = 0; i < maxExecutionCount; ++i)
{
try { return work(); }
catch (TException ex)
{
// allow the program to continue in this case
}
// this will use a system timer under the hood, so no thread is consumed while
// waiting
await Task.Delay(retryInterval);
}
}
This can be consumed synchronously with:
RepeatAsync<T, TException>(work, retryInterval).Result;
However, you can also start the task and then wait for it later:
var task = RepeatAsync<T, TException>(work, retryInterval);
// do other work here
// later, if you need the result, just do
var result = task.Result;
// or, if the current method is async:
var result = await task;
// alternatively, you could just schedule some code to run asynchronously
// when the task finishes:
task.ContinueWith(t => {
if (t.IsFaulted) { /* log t.Exception */ }
else { /* success case */ }
});
Consider using Transient Fault Handling Application Block
The Microsoft Enterprise Library Transient Fault Handling Application
Block lets developers make their applications more resilient by adding
robust transient fault handling logic. Transient faults are errors
that occur because of some temporary condition such as network
connectivity issues or service unavailability. Typically, if you retry
the operation that resulted in a transient error a short time later,
you find that the error has disappeared.
It is available as a NuGet package.
using Microsoft.Practices.TransientFaultHandling;
using Microsoft.Practices.EnterpriseLibrary.WindowsAzure.TransientFaultHandling;
...
// Define your retry strategy: retry 5 times, starting 1 second apart
// and adding 2 seconds to the interval each retry.
var retryStrategy = new Incremental(5, TimeSpan.FromSeconds(1),
TimeSpan.FromSeconds(2));
// Define your retry policy using the retry strategy and the Windows Azure storage
// transient fault detection strategy.
var retryPolicy =
new RetryPolicy<StorageTransientErrorDetectionStrategy>(retryStrategy);
// Receive notifications about retries.
retryPolicy.Retrying += (sender, args) =>
{
// Log details of the retry.
var msg = String.Format("Retry - Count:{0}, Delay:{1}, Exception:{2}",
args.CurrentRetryCount, args.Delay, args.LastException);
Trace.WriteLine(msg, "Information");
};
try
{
// Do some work that may result in a transient fault.
retryPolicy.ExecuteAction(
() =>
{
// Your method goes here!
});
}
catch (Exception)
{
// All the retries failed.
}
How about using a timer instead of stopwatch?
For example:
TimeSpan retryInterval = new TimeSpan(0, 0, 5);
DateTime startTime;
DateTime retryTime;
Timer checkInterval = new Timer();
private void waitMethod()
{
checkInterval.Interval = 1000;
checkInterval.Tick += checkInterval_Tick;
startTime = DateTime.Now;
retryTime = startTime + retryInterval;
checkInterval.Start();
}
void checkInterval_Tick(object sender, EventArgs e)
{
if (DateTime.Now >= retryTime)
{
checkInterval.Stop();
// Retry Interval Elapsed
}
}

Azure ServiceBus & async - To be, or not to be?

I'm running Service Bus on Azure, pumping about 10-100 messages per second.
Recently I've switched to .net 4.5 and all excited refactored all the code to have 'async' and 'await' at least twice in each line to make sure it's done 'properly' :)
Now I'm wondering whether it's actually for better or for worse. If you could have a look at the code snippets and let me know what your thoughts are. I especially worried if the thread context switching is not giving me more grief than benefit, from all the asynchrony... (looking at !dumpheap it's definitely a factor)
Just a bit of description - I will be posting 2 methods - one that does a while loop on a ConcurrentQueue, waiting for new messages and the other method that sends one message at a time. I'm also using the Transient Fault Handling block exactly as Dr. Azure prescribed.
Sending loop (started at the beginning, waiting for new messages):
private async void SendingLoop()
{
try
{
await this.RecreateMessageFactory();
this.loopSemaphore.Reset();
Buffer<SendMessage> message = null;
while (true)
{
if (this.cancel.Token.IsCancellationRequested)
{
break;
}
this.semaphore.WaitOne();
if (this.cancel.Token.IsCancellationRequested)
{
break;
}
while (this.queue.TryDequeue(out message))
{
try
{
using (message)
{
//only take send the latest message
if (!this.queue.IsEmpty)
{
this.Log.Debug("Skipping qeued message, Topic: " + message.Value.Topic);
continue;
}
else
{
if (this.Topic == null || this.Topic.Path != message.Value.Topic)
await this.EnsureTopicExists(message.Value.Topic, this.cancel.Token);
if (this.cancel.Token.IsCancellationRequested)
break;
await this.SendMessage(message, this.cancel.Token);
}
}
}
catch (OperationCanceledException)
{
break;
}
catch (Exception ex)
{
ex.LogError();
}
}
}
}
catch (OperationCanceledException)
{ }
catch (Exception ex)
{
ex.LogError();
}
finally
{
if (this.loopSemaphore != null)
this.loopSemaphore.Set();
}
}
Sending a message:
private async Task SendMessage(Buffer<SendMessage> message, CancellationToken cancellationToken)
{
//this.Log.Debug("MessageBroadcaster.SendMessage to " + this.GetTopic());
bool entityNotFound = false;
if (this.MessageSender.IsClosed)
{
//this.Log.Debug("MessageBroadcaster.SendMessage MessageSender closed, recreating " + this.GetTopic());
await this.EnsureMessageSender(cancellationToken);
}
try
{
await this.sendMessageRetryPolicy.ExecuteAsync(async () =>
{
message.Value.Body.Seek(0, SeekOrigin.Begin);
using (var msg = new BrokeredMessage(message.Value.Body, false))
{
await Task.Factory.FromAsync(this.MessageSender.BeginSend, this.MessageSender.EndSend, msg, null);
}
}, cancellationToken);
}
catch (MessagingEntityNotFoundException)
{
entityNotFound = true;
}
catch (OperationCanceledException)
{ }
catch (ObjectDisposedException)
{ }
catch (Exception ex)
{
ex.LogError();
}
if (entityNotFound)
{
if (!cancellationToken.IsCancellationRequested)
{
await this.EnsureTopicExists(message.Value.Topic, cancellationToken);
}
}
}
The code above is from a 'Sender' class that sends 1 message/second. I have about 50-100 instances running at any given time, so it could be quite a number of threads.
Btw do not worry about EnsureMessageSender, RecreateMessageFactory, EnsureTopicExists too much, they are not called that often.
Would I not be better of just having one background thread working through the message queue and sending messages synchronously, provided all I need is send one message at a time, not worry about the async stuff and avoid the overheads coming with it.
Note that usually it's a matter of milliseconds to send one Message to Azure Service Bus, it's not really expensive. (Except at times when it's slow, times out or there is a problem with Service Bus backend, it could be hanging for a while trying to send stuff).
Thanks and sorry for the long post,
Stevo
Proposed Solution
Would this example be a solution to my situation?
static void Main(string[] args)
{
var broadcaster = new BufferBlock<int>(); //queue
var cancel = new CancellationTokenSource();
var run = Task.Run(async () =>
{
try
{
while (true)
{
//check if we are not finished
if (cancel.IsCancellationRequested)
break;
//async wait until a value is available
var val = await broadcaster.ReceiveAsync(cancel.Token).ConfigureAwait(false);
int next = 0;
//greedy - eat up and ignore all the values but last
while (broadcaster.TryReceive(out next))
{
Console.WriteLine("Skipping " + val);
val = next;
}
//check if we are not finished
if (cancel.IsCancellationRequested)
break;
Console.WriteLine("Sending " + val);
//simulate sending delay
await Task.Delay(1000).ConfigureAwait(false);
Console.WriteLine("Value sent " + val);
}
}
catch (Exception ex)
{
Console.WriteLine(ex);
}
}, cancel.Token);
//simulate sending messages. One every 200mls
for (int i = 0; i < 20; i++)
{
Console.WriteLine("Broadcasting " + i);
broadcaster.Post(i);
Thread.Sleep(200);
}
cancel.Cancel();
run.Wait();
}
You say:
The code above is from a 'Sender' class that sends 1 message/second. I
have about 50-100 instances running at any given time, so it could be
quite a number of threads.
This is a good case for async. You save lots of threads here. Async reduces context switching because it is not thread-based. It does not context-switch in case of something requiring a wait. Instead, the next work item is being processed on the same thread (if there is one).
For that reason you async solution will definitely scale better than a synchronous one. Whether it actually uses less CPU at 50-100 instances of your workflow needs to be measured. The more instances there are the higher the probability of async being faster becomes.
Now, there is one problem with the implementation: You're using a ConcurrentQueue which is not async-ready. So you actually do use 50-100 threads even in your async version. They will either block (which you wanted to avoid) or busy-wait burning 100% CPU (which seems to be the case in your implementation!). You need to get rid of this problem and make the queuing async, too. Maybe a SemaphoreSlim is of help here as it can be waited on asynchronously.
First, keep in mind that Task != Thread. Tasks (and async method continuations) are scheduled to the thread pool, where Microsoft has put in tons of optimizations that work wonders as long as your tasks are fairly short.
Reviewing your code, one line raises a flag: semaphore.WaitOne. I assume you're using this as a kind of signal that there is data available in the queue. This is bad because it's a blocking wait inside an async method. By using a blocking wait, the code changes from a lightweight continuation into a much heavier thread pool thread.
So, I would follow #usr's recommendation and replace the queue (and the semaphore) with an async-ready queue. TPL Dataflow's BufferBlock<T> is an async-ready producer/consumer queue available via NuGet. I recommend this one first because it sounds like your project could benefit from using dataflow more extensively than just as a queue (but the queue is a fine place to start).
Other async-ready data structures exist; my AsyncEx library has a couple of them. It's also not hard to build a simple one yourself; I have a blog post on the subject. But I recommend TPL Dataflow in your situation.

How to Correctly Cancel a TPL Task with Continuation

I have a long running operation which I am putting on a background thread using TPL. What I have currently works but I am confused over where I should be handling my AggregateException during a cancellation request.
In a button click event I start my process:
private void button1_Click(object sender, EventArgs e)
{
Utils.ShowWaitCursor();
buttonCancel.Enabled = buttonCancel.Visible = true;
try
{
// Thread cancellation.
cancelSource = new CancellationTokenSource();
token = cancelSource.Token;
// Get the database names.
string strDbA = textBox1.Text;
string strDbB = textBox2.Text;
// Start duplication on seperate thread.
asyncDupSqlProcs =
new Task<bool>(state =>
UtilsDB.DuplicateSqlProcsFrom(token, mainForm.mainConnection, strDbA, strDbB),
"Duplicating SQL Proceedures");
asyncDupSqlProcs.Start();
//TaskScheduler uiThread = TaskScheduler.FromCurrentSynchronizationContext();
asyncDupSqlProcs.ContinueWith(task =>
{
switch (task.Status)
{
// Handle any exceptions to prevent UnobservedTaskException.
case TaskStatus.Faulted:
Utils.ShowDefaultCursor();
break;
case TaskStatus.RanToCompletion:
if (asyncDupSqlProcs.Result)
{
Utils.ShowDefaultCursor();
Utils.InfoMsg(String.Format(
"SQL stored procedures and functions successfully copied from '{0}' to '{1}'.",
strDbA, strDbB));
}
break;
case TaskStatus.Canceled:
Utils.ShowDefaultCursor();
Utils.InfoMsg("Copy cancelled at users request.");
break;
default:
Utils.ShowDefaultCursor();
break;
}
}, TaskScheduler.FromCurrentSynchronizationContext()); // Or uiThread.
return;
}
catch (Exception)
{
// Do stuff...
}
}
In the method DuplicateSqlProcsFrom(CancellationToken _token, SqlConnection masterConn, string _strDatabaseA, string _strDatabaseB, bool _bCopyStoredProcs = true, bool _bCopyFuncs = true) I have
DuplicateSqlProcsFrom(CancellationToken _token, SqlConnection masterConn, string _strDatabaseA, string _strDatabaseB, bool _bCopyStoredProcs = true, bool _bCopyFuncs = true)
{
try
{
for (int i = 0; i < someSmallInt; i++)
{
for (int j = 0; j < someBigInt; j++)
{
// Some cool stuff...
}
if (_token.IsCancellationRequested)
_token.ThrowIfCancellationRequested();
}
}
catch (AggregateException aggEx)
{
if (aggEx.InnerException is OperationCanceledException)
Utils.InfoMsg("Copy operation cancelled at users request.");
return false;
}
catch (OperationCanceledException)
{
Utils.InfoMsg("Copy operation cancelled at users request.");
return false;
}
}
In a button Click event (or using a delegate (buttonCancel.Click += delegate { /Cancel the Task/ }) I cancel theTask` as follows:
private void buttonCancel_Click(object sender, EventArgs e)
{
try
{
cancelSource.Cancel();
asyncDupSqlProcs.Wait();
}
catch (AggregateException aggEx)
{
if (aggEx.InnerException is OperationCanceledException)
Utils.InfoMsg("Copy cancelled at users request.");
}
}
This catches the OperationCanceledException fine in method DuplicateSqlProcsFrom and prints my message, but in the call-back provided by the asyncDupSqlProcs.ContinueWith(task => { ... }); above the task.Status is always RanToCompletion; it should be cancelled!
What is the right way to capture and deal with the Cancel() task in this case. I know how this is done in the simple cases shown in this example from the CodeProject and from the examples on MSDN but I am confused in this case when running a continuation.
How do I capture the cancel task in this case and how to ensure the task.Status is dealt with properly?
You're catching the OperationCanceledException in your DuplicateSqlProcsFrom method, which prevents its Task from ever seeing it and accordingly setting its status to Canceled. Because the exception is handled, DuplicateSqlProcsFrom finishes without throwing any exceptions and its corresponding task finishes in the RanToCompletion state.
DuplicateSqlProcsFrom shouldn't be catching either OperationCanceledException or AggregateException, unless it's waiting on subtasks of its own. Any exceptions thrown (including OperationCanceledException) should be left uncaught to propagate to the continuation task. In your continuation's switch statement, you should be checking task.Exception in the Faulted case and handling Canceled in the appropriate case as well.
In your continuation lambda, task.Exception will be an AggregateException, which has some handy methods for determining what the root cause of an error was, and handling it. Check the MSDN docs particularly for the InnerExceptions (note the "S"), GetBaseException, Flatten and Handle members.
EDIT: on getting a TaskStatus of Faulted instead of Canceled.
On the line where you construct your asyncDupSqlProcs task, use a Task constructor which accepts both your DuplicateSqlProcsFrom delegate and the CancellationToken. That associates your token with the task.
When you call ThrowIfCancellationRequested on the token in DuplicateSqlProcsFrom, the OperationCanceledException that is thrown contains a reference to the token that was cancelled. When the Task catches the exception, it compares that reference to the CancellationToken associated with it. If they match, then the task transitions to Canceled. If they don't, the Task infrastructure has been written to assume that this is an unforeseen bug, and the task transitions to Faulted instead.
Task Cancellation in MSDN
Sacha Barber has great series of articles about TPL. Try this one, he describe simple task with continuation and canceling

Categories