How do I use a circuit breaker? - c#

I'm looking for a way to keep making remote calls to services outside my control until a connection succeeds. I also don't want to simply set a timer that executes an action every n seconds/minutes until it succeeds. After a bunch of research, it appears that the circuit breaker pattern is a great fit.
I found an implementation that uses a Castle Windsor interceptor, which looks awesome. The only problem is that I don't know how to use it. The only usage example I could find in the few articles on the topic simply uses the circuit breaker to call an action once, which doesn't seem very useful. From that, it seems I need to run my action through the circuit breaker in a while(true) loop.
How do I use the Windsor interceptor to execute an action making a call to an external service until it is successful without slamming our servers?
Could someone please fill in the missing pieces?
Here is what I was able to come up with
while (true)
{
    try
    {
        service.Subscribe();
        break;
    }
    catch (Exception e)
    {
        Console.WriteLine("Gotcha!");
        Thread.Sleep(TimeSpan.FromSeconds(10));
    }
}
Console.WriteLine("Success!");
public interface IService
{
    void Subscribe();
}
public class Service : IService
{
    private readonly Random _random = new Random();
    public void Subscribe()
    {
        var a = _random.Next(0, 10) % 2421;
        if (_random.Next(0, 10) % 2 != 0)
            throw new AbandonedMutexException();
    }
}
Based on that I think I now understand this concept as well as how to apply it.

This is an interesting idea if you have lots of threads hitting the same resource. The way this works is by pooling the count for attempts from all threads. Rather than worrying about writing a loop to try and hit the database 5 times before actually failing, you have the circuit breaker keep track of all attempts to hit the resource.
In one example, you have say 5 threads running a loop like this (pseudo-code):
int errorCount = 0;
while (errorCount < 10) // 10 tries
{
    if (tryConnect() == false)
        errorCount++;
    else
        break;
}
Assuming your error handling is correct, this loop runs once on each of the 5 threads, so the resource could be pinged a total of 50 times.
The circuit breaker tries to reduce the total number of attempts to reach the resource. Each thread, or request attempt, increments a single shared error counter. Once the error limit is reached, the circuit breaker will not try to connect to its resource for any more calls on any thread until the timeout has elapsed. The effect is still that of polling the resource until it's ready, but the total load is reduced.
static volatile int errorCount = 0;
while (errorCount < 10)
{
    if (tryConnect() == false)
        errorCount++;
    else
        break;
}
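The shared-counter idea above can be sketched without any interception framework. This is a minimal illustration, not the Windsor implementation: the class name SimpleCircuitBreaker, its TryExecute method, and the choice to pass the current time in as a parameter are all assumptions made for the example. It also simplifies the half-open state: once the timeout has elapsed, every waiting caller is let through, not just one.

```csharp
using System;

// Hypothetical minimal circuit breaker; all names here are illustrative.
public sealed class SimpleCircuitBreaker
{
    private readonly object _gate = new object();
    private readonly int _maxFailures;
    private readonly TimeSpan _openDuration;
    private int _failures;                           // shared by all callers
    private DateTime _openedAtUtc = DateTime.MinValue;

    public SimpleCircuitBreaker(int maxFailures, TimeSpan openDuration)
    {
        _maxFailures = maxFailures;
        _openDuration = openDuration;
    }

    // Returns true if the action ran and succeeded.
    // Returns false if the breaker was open (action skipped) or the action threw.
    public bool TryExecute(Action action, DateTime utcNow)
    {
        lock (_gate)
        {
            bool tripped = _failures >= _maxFailures;
            if (tripped && utcNow - _openedAtUtc < _openDuration)
            {
                return false; // open: fail fast without touching the resource
            }
            // Otherwise closed, or the timeout elapsed and a trial call is allowed.
        }

        try
        {
            action();
        }
        catch (Exception)
        {
            lock (_gate)
            {
                _failures++;
                if (_failures >= _maxFailures)
                {
                    _openedAtUtc = utcNow; // trip (or re-trip) the breaker
                }
            }
            return false;
        }

        lock (_gate)
        {
            _failures = 0; // success closes the circuit again
        }
        return true;
    }
}
```

A caller that wants to retry until success would loop on TryExecute(service.Subscribe, DateTime.UtcNow) with a short sleep between attempts. While the breaker is open, those iterations return immediately and never reach the remote service.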
With this interceptor implementation, the interceptor is being registered as a singleton. So, all instances of your resource class will have code redirected through the circuit breaker first for any call made to any method. The interceptor is just a proxy to your class. It basically overrides your methods and calls the interceptor method first before calling your method.
The Open/Closed bit might be confusing if you don't have any circuit theory knowledge.
wiki:
An electric circuit is an "open circuit" if it lacks a complete path
between the positive and negative terminals of its power source
In theory, this circuit is Open when the connection is down and Closed when the connection is available. The important part of your example is this:
public void Intercept(IInvocation invocation)
{
    using (TimedLock.Lock(monitor))
    {
        state.ProtectedCodeIsAboutToBeCalled(); /* only throws an exception when state is Open, otherwise, it doesn't do anything. */
    }
    try
    {
        invocation.Proceed(); /* tells the interceptor to call the 'actual' method for the class that's being proxied. */
    }
    catch (Exception e)
    {
        using (TimedLock.Lock(monitor))
        {
            failures++; /* increments the shared error count */
            state.ActUponException(e); /* only implemented in the ClosedState class, so it changes the state to Open if the error count is at its threshold. */
        }
        throw;
    }
    using (TimedLock.Lock(monitor))
    {
        state.ProtectedCodeHasBeenCalled(); /* only implemented in HalfOpen, if it succeeds the "switch" is thrown in the closed position */
    }
}

I've created a library called CircuitBreaker.Net that encapsulates all the logic needed to perform calls safely. It's easy to use; an example could look like:
// Initialize the circuit breaker
var circuitBreaker = new CircuitBreaker(
TaskScheduler.Default,
maxFailures: 3,
invocationTimeout: TimeSpan.FromMilliseconds(100),
circuitResetTimeout: TimeSpan.FromMilliseconds(10000));
try
{
// perform a potentially fragile call through the circuit breaker
circuitBreaker.Execute(externalService.Call);
// or its async version
// await circuitBreaker.ExecuteAsync(externalService.CallAsync);
}
catch (CircuitBreakerOpenException)
{
// the service is unavailable, failover here
}
catch (CircuitBreakerTimeoutException)
{
// handle timeouts
}
catch (Exception)
{
// handle other unexpected exceptions
}
It's available as a NuGet package. You can find the sources on GitHub.

Related

InvalidOperationException "NoGCRegion mode was already in progress" on call to GC.TryStartNoGCRegion

I'm doing performance testing comparing various algorithms and want to eliminate the scatter in the results due to garbage collection perhaps being done during the critical phase of the test. I'm turning off the garbage collection using GC.TryStartNoGCRegion(long) (see https://learn.microsoft.com/en-us/dotnet/api/system.gc.trystartnogcregion?view=net-6.0) before the critical test phase and reactivating it immediately afterwards.
My code looks like this:
long allocatedBefore;
int collectionsBefore;
long allocatedAfter;
int collectionsAfter;
bool noGCSucceeded;
try
{
// Just in case end "no GC region"
if (GCSettings.LatencyMode == GCLatencyMode.NoGCRegion)
{
GC.EndNoGCRegion();
}
// Exception is thrown sometimes in this line
noGCSucceeded = GC.TryStartNoGCRegion(solveAllocation);
allocatedBefore = GC.GetAllocatedBytesForCurrentThread();
collectionsBefore = getTotalGCCollectionCount();
stopwatch.Restart();
doMyTest();
stopwatch.Stop();
allocatedAfter = GC.GetAllocatedBytesForCurrentThread();
collectionsAfter = getTotalGCCollectionCount();
}
finally
{
// Reactivate garbage collection
if (GCSettings.LatencyMode == GCLatencyMode.NoGCRegion)
{
GC.EndNoGCRegion();
}
}
//...
private int getTotalGCCollectionCount()
{
int collections = 0;
for (int i = 0; i < GC.MaxGeneration; i++)
{
collections += GC.CollectionCount(i);
}
return collections;
}
The following exception is thrown from time to time (about once in 1500 tests):
System.InvalidOperationException
The NoGCRegion mode was already in progress
at System.GC.StartNoGCRegionWorker(Int64 totalSize, Boolean hasLohSize, Int64 lohSize, Boolean disallowFullBlockingGC)
at MyMethod.cs:line 409.
at MyCaller.cs:line 155.
The test might start a second thread that creates some objects it needs in a pool.
As far as I can see, the finally should always turn the GC back on, and the (theoretically unnecessary) check at the beginning should also do it in any case, but nevertheless, there is an error that NoGCRegion was already active.
The question "C# TryStartNoGCRegion 'The NoGCRegion mode was already in progress' exception when the GC is in LowLatency mode" got the same error message, but there was a clear code path that activated NoGCRegion more than once there. I can't see how that could happen here.
The test itself is not accessing GC operations except for GC.SuppressFinalize in some Dispose() methods.
The test itself does not run in parallel with any other test; my Main() method loops over a set of input files and calls the test method for each one.
The test method uses an external c++ library which would be unmanaged memory in the .NET context.
What could be causing the exception, and why doesn't the call to GC.EndNoGCRegion(); prevent the problem?

Implementing ConcurrentDictionary

I'm trying to create my own cache implementation for an API. This is the first time I've worked with ConcurrentDictionary, and I don't know whether I'm using it correctly. In a test, something threw an error, and so far I have not been able to reproduce it. Maybe someone experienced with concurrency / ConcurrentDictionary can look at the code and spot what may be wrong. Thank you!
private static readonly ConcurrentDictionary<string, ThrottleInfo> CacheList = new ConcurrentDictionary<string, ThrottleInfo>();
public override void OnActionExecuting(HttpActionContext actionExecutingContext)
{
if (CacheList.TryGetValue(userIdentifier, out var throttleInfo))
{
if (DateTime.Now >= throttleInfo.ExpiresOn)
{
if (CacheList.TryRemove(userIdentifier, out _))
{
//TODO:
}
}
else
{
if (throttleInfo.RequestCount >= defaultMaxRequest)
{
actionExecutingContext.Response = ResponseMessageExtension.TooManyRequestHttpResponseMessage();
}
else
{
throttleInfo.Increment();
}
}
}
else
{
if (CacheList.TryAdd(userIdentifier, new ThrottleInfo(Seconds)))
{
//TODO:
}
}
}
public class ThrottleInfo
{
private int _requestCount;
public int RequestCount => _requestCount;
public ThrottleInfo(int addSeconds)
{
Interlocked.Increment(ref _requestCount);
ExpiresOn = ExpiresOn.AddSeconds(addSeconds);
}
public void Increment()
{
// this is about as thread safe as you can get.
// From MSDN: Increments a specified variable and stores the result, as an atomic operation.
Interlocked.Increment(ref _requestCount);
// you can return the result of Increment if you want the new value,
//but DO NOT set the counter to the result :[i.e. counter = Interlocked.Increment(ref counter);] This will break the atomicity.
}
public DateTime ExpiresOn { get; } = DateTime.Now;
}
If I understand what you are trying to do: if ExpiresOn has passed, remove the entry; otherwise update it, or add it if it does not exist.
You can certainly take advantage of the AddOrUpdate method to simplify some of your code.
Take a look here for some good examples: https://learn.microsoft.com/en-us/dotnet/standard/collections/thread-safe/how-to-add-and-remove-items
Hope this helps.
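As a rough sketch of the AddOrUpdate suggestion: if the stored value is immutable and an expired entry is replaced rather than removed, the add / increment / reset decision can be made inside a single AddOrUpdate call. The names ThrottleSnapshot, Throttle and TryAcquire are hypothetical, and the current time is passed in as a parameter only to keep the example deterministic. Note that ConcurrentDictionary may invoke the update delegate more than once under contention, which is harmless here because the delegates have no side effects.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical immutable value: replaced as a whole, never mutated in place.
public sealed class ThrottleSnapshot
{
    public ThrottleSnapshot(int requestCount, DateTime expiresOn)
    {
        RequestCount = requestCount;
        ExpiresOn = expiresOn;
    }

    public int RequestCount { get; }
    public DateTime ExpiresOn { get; }
}

public sealed class Throttle
{
    private readonly ConcurrentDictionary<string, ThrottleSnapshot> _cache =
        new ConcurrentDictionary<string, ThrottleSnapshot>();
    private readonly int _maxRequests;
    private readonly TimeSpan _window;

    public Throttle(int maxRequests, TimeSpan window)
    {
        _maxRequests = maxRequests;
        _window = window;
    }

    // Returns true when the request is within the limit for the current window.
    public bool TryAcquire(string userIdentifier, DateTime utcNow)
    {
        ThrottleSnapshot current = _cache.AddOrUpdate(
            userIdentifier,
            // No entry yet: start a new window with one request.
            _ => new ThrottleSnapshot(1, utcNow + _window),
            // Entry exists: restart the window if it expired, otherwise count the request.
            (_, old) => utcNow >= old.ExpiresOn
                ? new ThrottleSnapshot(1, utcNow + _window)
                : new ThrottleSnapshot(old.RequestCount + 1, old.ExpiresOn));

        return current.RequestCount <= _maxRequests;
    }
}
```

In the action filter, OnActionExecuting would then set the "too many requests" response whenever TryAcquire returns false. Entries for users who never come back are not evicted in this sketch; a periodic cleanup would still be needed.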
The ConcurrentDictionary is sufficient as a thread-safe container only in cases where (1) the whole state that needs protection is its internal state (the keys and values it contains), and only if (2) this state can be mutated atomically using the specialized API it offers (GetOrAdd, AddOrUpdate). In your case the second requirement is not met, because you need to remove keys conditionally depending on the state of their value, and this scenario is not supported by the ConcurrentDictionary class.
So your current cache implementation is not thread-safe. The fact that it throws exceptions sporadically is a coincidence. It would still be non-thread-safe even if it never threw, because it would still not be error-proof: it could occasionally (or permanently) transition to a state incompatible with its specification (returning expired values, for example).
Regarding the ThrottleInfo class, it suffers from a visibility bug that could remain unobserved if you tested the class extensively in one machine, and then suddenly emerge when you deployed your app in another machine with a different CPU architecture. The non-volatile private int _requestCount field is exposed through the public property RequestCount, so there is no guarantee (based on the C# specification) that all threads will see its most recent value. You can read this article by Igor Ostrovsky about the peculiarities of the memory models, which may convince you (like me) that employing lock-free techniques (using the Interlocked class in this case) with multithreaded code is more trouble than it's worth. If you read it and like it, there is also a part 2 of this article.

Monitor.TryEnter for multiple resources

I tried searching for this but did not find the suggestion best suited for the issue that I am facing.
My issue is that we have list/stack of available resources (Calculation Engines). These resources are used to perform certain calculation.
The request to perform a calculation is triggered by an external process. So when a calculation request arrives, I need to check whether any of the available resources is currently free. If none is, I wait for some time and check again.
I was wondering what the best way to implement this is. I have the following code in place, but not sure if it is very safe.
If you have any further suggestions, that will be great:
void Process(int retries = 0) {
CalcEngineConnection connection = null;
bool securedConnection = false;
foreach (var calcEngineConnection in _connections) {
securedConnection = Monitor.TryEnter(calcEngineConnection);
if (securedConnection) {
connection = calcEngineConnection;
break;
}
}
if (securedConnection) {
//Dequeue the next request
var calcEnginePool = _pendingPool.Dequeue();
//Perform the operation and exit.
connection.RunCalc(calcEnginePool);
Monitor.Exit(connection);
}
else {
if (retries < 10)
retries += 1;
Thread.Sleep(200);
Process(retries);
}
}
I'm not sure that using Monitor is the best approach here anyway, but if you do decide to go that route, I'd refactor the above code to:
bool TryProcessWithRetries(int retries) {
for (int attempt = 0; attempt < retries; attempt++) {
if (TryProcess()) {
return true;
}
Thread.Sleep(200);
}
// Throw an exception here instead?
return false;
}
bool TryProcess() {
foreach (var connection in _connections) {
if (TryProcess(connection)) {
return true;
}
}
return false;
}
bool TryProcess(CalcEngineConnection connection) {
if (!Monitor.TryEnter(connection)) {
return false;
}
try {
var calcEnginePool = _pendingPool.Dequeue();
connection.RunCalc(calcEnginePool);
} finally {
Monitor.Exit(connection);
}
return true;
}
This decomposes the three pieces of logic:
Retrying several times
Trying each connection in a collection
Trying a single connection
It also avoids using recursion for the sake of it, and puts the Monitor.Exit call into a finally block, which it absolutely should be in.
You could replace the middle method implementation with:
return _connections.Any(TryProcess);
... but that may be a little too "clever" for its own good.
Personally I'd be tempted to move TryProcess into CalcEngineConnection itself - that way this code doesn't need to know about whether or not the connection is able to process something - it's up to the object itself. It means you can avoid having publicly visible locks, and also it would be flexible if some resources could (say) process two requests at a time in the future.
There are multiple issues that could potentially occur, but let's simplify your code first:
void Process(int retries = 0)
{
foreach (var connection in _connections)
{
if(Monitor.TryEnter(connection))
{
try
{
//Dequeue the next request
var calcEnginePool = _pendingPool.Dequeue();
//Perform the operation and exit.
connection.RunCalc(calcEnginePool);
}
finally
{
// Release the lock
Monitor.Exit(connection);
}
return;
}
}
if (retries < 10)
{
Thread.Sleep(200);
Process(retries+1);
}
}
This will correctly protect your connection, but note that one of the assumptions here is that your _connections list is safe and it will not be modified by another thread.
Furthermore, you might want to use a thread safe queue for the _connections because at certain load levels you might end up using only the first few connections (not sure if that will make a difference). In order to use all of your connections relatively evenly, I would place them in a queue and dequeue them. This will also guarantee that no two threads are using the same connection and you don't have to use the Monitor.TryEnter().
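The queue-based alternative might look roughly like this. ConnectionPool and TryRun are hypothetical names, and the sketch assumes a connection is only ever used by whoever took it from the pool. Idle connections sit in a FIFO queue, so they are handed out in rotation, and a connection that has been taken cannot be seen by another thread until it is put back.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical pool of idle connections backed by a thread-safe FIFO queue.
public sealed class ConnectionPool<T>
{
    private readonly BlockingCollection<T> _idle;

    public ConnectionPool(IEnumerable<T> connections)
    {
        _idle = new BlockingCollection<T>(new ConcurrentQueue<T>(connections));
    }

    // Waits up to 'timeout' for an idle connection, runs 'work' on it,
    // and always returns the connection to the back of the queue.
    // Returns false if no connection became idle in time.
    public bool TryRun(Action<T> work, TimeSpan timeout)
    {
        if (!_idle.TryTake(out T connection, timeout))
        {
            return false;
        }

        try
        {
            work(connection);
        }
        finally
        {
            _idle.Add(connection);
        }
        return true;
    }
}
```

With this shape, the retry loop and Thread.Sleep disappear: TryTake blocks until a connection is idle or the timeout expires, and the caller decides what to do when TryRun returns false.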

API Design for Timeouts: TimeoutException or boolean return with out parameter?

The scenario is RPC over message queues - since the underlying mechanism is asynchronous, clients should specify how long they want to wait for a response before timing out. As the client, which of these two code snippets would you rather use?
Most importantly: as a user of the GetResponseTo() method, why would you prefer one over the other? How does your choice make your code more extensible, more readable, more testable, etc?
try
{
IEvent response = _eventMgr.GetResponseTo(myRequest, myTimeSpan);
// I have my response!
}
catch(TimeoutException te)
{
// I didn't get a response to 'myRequest' within 'myTimeSpan'
}
OR
IEvent myResponse = null;
if (_eventMgr.GetResponseTo(myRequest, myTimeSpan, out myResponse))
{
// I got a response!
}
else
{
// I didn't get a response... :(
}
For your information, here's the current implementation of GetResponseTo():
public IEvent GetResponseTo(IEvent request, TimeSpan timeout)
{
if (null == request) { throw new ArgumentNullException("request"); }
// create an interceptor for the request
IEventInterceptor interceptor = new EventInterceptor(request, timeout);
// tell the dispatcher to watch for a response to this request
_eventDispatcher.AddInterceptor(interceptor);
// send the request
_queueManager.SendRequest(request);
// block this thread while we wait for a response. If the timeout elapses,
// this will throw a TimeoutException
interceptor.WaitForResponse();
// return the intercepted response
return interceptor.Response;
}
Neither the first nor the second. I would use the Task Parallel Library, which is the recommended way of doing all things asynchronous beginning with .NET 4.5:
Task<IEvent> task = _eventMgr.GetResponseToAsync(myRequest);
if (task.Wait(myTimeSpan))
{
// I got a response!
}
else
{
// I didn't get a response... :(
}
You could use the AutoResetEvent class; it will handle the plumbing for the second one.
Try to avoid your first code snippet, as exceptions are expensive.
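A minimal sketch of how an AutoResetEvent could back the bool-plus-out-parameter shape. ResponseWaiter, SetResponse and TryGetResponse are hypothetical names, not part of the question's event model. The dispatcher thread would call SetResponse when the matching response arrives, and the client thread blocks in TryGetResponse.

```csharp
using System;
using System.Threading;

// Hypothetical helper: one waiter per outstanding request.
public sealed class ResponseWaiter<T> : IDisposable
{
    private readonly AutoResetEvent _signal = new AutoResetEvent(false);
    private T _response;

    // Called by the thread that receives the response.
    public void SetResponse(T response)
    {
        _response = response;
        _signal.Set(); // wakes the waiting client, if any
    }

    // Blocks for at most 'timeout'. Returns false on timeout instead of throwing.
    public bool TryGetResponse(TimeSpan timeout, out T response)
    {
        if (_signal.WaitOne(timeout))
        {
            response = _response;
            return true;
        }

        response = default(T);
        return false;
    }

    public void Dispose()
    {
        _signal.Dispose();
    }
}
```

The exception-based overload can be layered on top of this by throwing TimeoutException when TryGetResponse returns false, so offering one shape does not rule out the other.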
Personally, I would prefer the exception version. If I specify a timeout, then in my opinion it IS an exceptional situation when I can't get a result within the specified timespan. I don't think event-based notification is the best decision here: the logic that follows depends on the result, so it doesn't make sense to me.
But if you want to provide asynchronous methods too, the Task approach is a good idea, as dtb stated.
Exceptions are heavy and messy: each API method call has to be wrapped in try/catch/finally to handle the custom exception. This approach is not developer-friendly, so I do not like it.
Considering that GetResponse() call itself is synchronous for API consumer - it is pretty normal to return a value of operation, but I would suggest introducing something more abstract and informative rather than simple bool state, so you can return any state provided by the underlying messaging system, this could be a custom error code, message, or even object. So since this is API - put interface as well:
enum OperationStatus
{
Unknown,
Timeout,
Ok
}
// pretty simple, only message and status code
interface IOperationResult<T>
{
OperationStatus Status { get; }
string Message { get; }
T Item { get; }
}
class GetResponseResult : IOperationResult<IEvent>
{
...
}
class EventManager
{
public IOperationResult<IEvent> GetResponseTo(
IRequest request,
TimeSpan timeInterval)
{
GetResponseResult result;
// wait for async request
// ...
if (timeout)
{
result = new GetResponseResult
{
Status = OperationStatus.Timeout,
Message = underlyingMessagingLib.ErrorMessage
};
}
else
{
result = new GetResponseResult
{
Status = OperationStatus.Ok,
Item = response
};
}
return result;
}
}
I have elected to use the out parameter.
I wanted to mark someone else as the answer, but I am not able to do so. I attempted to implement the TPL-based approach, but was unable to do so, based on the question/answer that I linked in my comments.
I do not want to muddy my event model by introducing even more concepts, as @sll suggests.
And even though @dasheddot prefers the exception version, @sll has a good point that someone trying to send a bunch of requests and get a bunch of responses in a loop might have to deal with a lot of exceptions.
// potentially 10 exceptions? meh... let's not go down this road.
for(int i=0;i<10;i++)
{
try
{
IEvent response = _eventMgr.GetResponseTo(myRequest, myTimeSpan);
// I have my response!
}
catch(TimeoutException te)
{
// I didn't get a response to 'myRequest' within 'myTimeSpan'
}
}

Is this a good/preferable pattern to Azure Queue construction for a T4 template?

I'm building a T4 template that will help people construct Azure queues in a consistent and simple manner. I'd like to make this self-documenting, and somewhat consistent.
First I made the queue name at the top of the file, the queue names have to be in lowercase so I added ToLower()
The public constructor uses the built-in StorageClient API's to access the connection strings. I've seen many different approaches to this, and would like to get something that works in almost all situations. (ideas? do share)
I dislike the unneeded HTTP requests to check whether the queues have been created, so I made it a static bool. I didn't implement a lock(monitorObject) since I don't think one is needed.
Instead of using a string and parsing it with commas (like most MSDN documentation) I'm serializing the object when passing it into the queue.
For further optimization I'm using a JSON serializer extension method to get the most out of the 8k limit. Not sure if an encoding will help optimize this any more
Added retry logic to handle certain scenarios that occur with the queue (see html link)
Q: Is "DataContext" appropriate name for this class?
Q: Is it a poor practice to name the Queue Action Name in the manner I have done?
What additional changes do you think I should make?
public class AgentQueueDataContext
{
// Queue names must always be in lowercase
// Is named like a const, but isn't one because .ToLower won't compile...
static string AGENT_QUEUE_ACTION_NAME = "AgentQueueActions".ToLower();
static bool QueuesWereCreated { get; set; }
DataModel.SecretDataSource secDataSource = null;
CloudStorageAccount cloudStorageAccount = null;
CloudQueueClient cloudQueueClient = null;
CloudQueue queueAgentQueueActions = null;
static AgentQueueDataContext()
{
QueuesWereCreated = false;
}
public AgentQueueDataContext() : this(false)
{
}
public AgentQueueDataContext(bool CreateQueues)
{
// This pattern of setting up queues is from:
// http://convective.wordpress.com/2009/11/15/queues-azure-storage-client-v1-0/
//
this.cloudStorageAccount = CloudStorageAccount.FromConfigurationSetting("DataConnectionString");
this.cloudQueueClient = cloudStorageAccount.CreateCloudQueueClient();
this.secDataSource = new DataModel.SecretDataSource();
queueAgentQueueActions = cloudQueueClient.GetQueueReference(AGENT_QUEUE_ACTION_NAME);
if (QueuesWereCreated == false || CreateQueues)
{
queueAgentQueueActions.CreateIfNotExist();
QueuesWereCreated = true;
}
}
// This is the method that will be spawned using ThreadStart
public void CheckQueue()
{
while (true)
{
try
{
CloudQueueMessage msg = queueAgentQueueActions.GetMessage();
bool DoRetryDelayLogic = false;
if (msg != null)
{
// Deserialize using JSON (allows more data to be stored)
AgentQueueEntry actionableMessage = msg.AsString.FromJSONString<AgentQueueEntry>();
switch (actionableMessage.ActionType)
{
case AgentQueueActionEnum.EnrollNew:
{
// Add to
break;
}
case AgentQueueActionEnum.LinkToSite:
{
// Link within Agent itself
// Link within Site
break;
}
case AgentQueueActionEnum.DisableKey:
{
// Disable key in site
// Disable key in AgentTable (update modification time)
break;
}
default:
{
break;
}
}
//
// Only delete the message if the requested agent has been missing for
// at least 10 minutes
//
if (DoRetryDelayLogic)
{
// Keep the message (skip deletion) while it is younger than 10 minutes
if (msg.InsertionTime != null)
if (msg.InsertionTime > DateTime.UtcNow - TimeSpan.FromMinutes(10))
continue;
// ToDo: Log error: AgentID xxx has not been found in table for xxx minutes.
// It is likely the result of the registration host crashing.
// Data is still consistent. Deleting queued message.
}
//
// If execution made it to this point, then we are either fully processed, or
// there is sufficient reason to discard the message.
//
try
{
queueAgentQueueActions.DeleteMessage(msg);
}
catch (StorageClientException ex)
{
// As of July 2010, this is the best way to detect this class of exception
// Description: http://blog.smarx.com/posts/deleting-windows-azure-queue-messages-handling-exceptions
if (ex.ExtendedErrorInformation.ErrorCode == "MessageNotFound")
{
// pop receipt must be invalid
// ignore or log (so we can tune the visibility timeout)
}
else
{
// not the error we were expecting
throw;
}
}
}
else
{
// allow control to fall to the bottom, where the sleep timer is...
}
}
catch (Exception e)
{
// Justification: Thread must not fail.
//Todo: Log this exception
// allow control to fall to the bottom, where the sleep timer is...
// Rationale: not doing so may cause queue thrashing on a specific corrupt entry
}
// todo: Thread.Sleep() is bad
// Replace with something better...
Thread.Sleep(9000);
}
}
}
Q: Is "DataContext" appropriate name for this class?
In .NET we have a lot of DataContext classes, so in the sense that you want names to appropriately communicate what the class does, I think XyzQueueDataContext properly communicates what the class does - although you can't query from it.
If you want to stay more aligned with accepted pattern languages, Patterns of Enterprise Application Architecture calls any class that encapsulates access to an external system a Gateway, while more specifically you may want to use the term Channel from the language of Enterprise Integration Patterns. That's what I would do.
Q: Is it a poor practice to name the Queue Action Name in the manner I have done?
Well, it certainly tightly couples the queue name to the class. This means that if you later decide that you want to decouple those, you can't.
As a general comment I think this class might benefit from trying to do less. Using the queue is not the same thing as managing it, so instead of having all of that queue management code there, I'd suggest injecting a CloudQueue into the instance. Here's how I implement my AzureChannel constructor:
private readonly CloudQueue queue;
public AzureChannel(CloudQueue queue)
{
if (queue == null)
{
throw new ArgumentNullException("queue");
}
this.queue = queue;
}
This better fits the Single Responsibility Principle and you can now implement queue management in its own (reusable) class.
