nservicebus + webhooks +Errors +MaxRetries - c#

Feature Description
The NServiceBus gateway, http://docs.particular.net/nservicebus/gateway/, seems to be a way to achieve an internal webhook using the NServiceBus infrastructure.
We need to go further with this concept to open up a few event to any 3rd party subscriber that has access to register a webhook url in our system.
Review
We plan to create two initial window services
1) WebHookBatchService, that can be added as a subscriber to specific messages of interest.
<UnicastBusConfig>
<MessageEndpointMappings>
.......
<add Messages="MyMessages.MyImportantMessage, MyMessages" Endpoint="WebHookBatchService.Queue"/>
.......
</MessageEndpointMappings>
</UnicastBusConfig>
2) WebHookProcessService - actually processes 1 message sent by the WebHookBatchService.
Once messages are received on the WebHookBatchService.Queue our WebHookBatchService will look up all the subscribers for the specific tenant + message type and foreach send individual messages to WebHookProcessService.Queue for the WebHookProcessService (which we can make an instance of nservicebus loadbalancer to bridge the batch and actual processor) to actually process the real messages probably using http://restsharp.org/.
Questions
Are there any existing open source projects that do this today?
Now since we have no control of the durability of the subscribers how should we manage errors?
http://wiki.shopify.com/WebHook
A webhook will be deleted if there are 19 consecutive failures for the exact same webhook.
It doesn't mention any delays in the webhook.. What have people experienced with standard delay in retry logic?
Here are some other thoughts:
proposal 0: MaxRetries="1". Purge WebHookProcessService.ErrorQueue nightly. (no retry - guaranteed message loss if it fails the first time)
proposal 1:
MaxRetries="1" on exception catch send email containing xml version of the message that would have been delivered over http.
Purge WebHookProcessService.ErrorQueue nightly.
-- I see potential a spam issues.
proposal 2: The nservicebus MaxRetries retries right away without delay. So i would need to create (1hr - 24hr) bucket queues and use a RetrySchedulerService although I see this as difficult to maintain and confusing for subscribers when they all at once get 25 messages in a non DateCreated ordered fashion when there service endpoint begins to work.
Digging for ideas...

The Gateway is typically used for communication between physical sites over HTTP. Since you are exposing an endpoint to the world to accept callbacks, I'm thinking you could just use the built-in WCF hosting and expose your endpoint through the firewall to 3rd parties. The rest of your setup sounds appropriate to me.
As for errors, you are correct, NSB retries immediately, but if you using web call backs this may get you by in the cases there are small hiccups. You will need to determine how you want to process the error queues, we just build in a new endpoint to process the error queues with logic to determine the retries, delay etc. A nice way to accomplish this is to use a Saga, which includes a Timeout manager. This enables a workflow where you can retry a specified number of times, try another communication, log everything, and ultimately notify someone who can contact the 3rd party to let them know there stuff is busted.

Related

Real time data storage with Azure Tables using ASP.NET

I am using a Lab View application to simulate a test running, which would post a JSON string to my ASP.NET application. Within the ASP.NET application I format the data with the proper partition and row keys, then send it to Azure Table Storage.
The problem that I am having is that after what seems like a random amount of time (i.e. 5 minutes, 2 hours, 5 hours), the data fails to be saved into Azure. I am try to catch any exceptions within the ASP.NET application and send the error message back to the Lab View app and the Lab View app is also catching any exceptions in may encounter so I can trouble shoot where the issue is occurring.
The only error that I am able to catch is a Timeout Error 56 in the Lab View program. My question is, does anyone have an idea of where I should be looking for the root cause of this? I do not know where to begin.
EDIT:
I am using a table storage writer that I found here to do batch operations with retries.
The constructor for exponential retry policy is below:
public ExponentialRetry(TimeSpan deltaBackoff, int maxAttempts)
when you (or the library you use to be exact) instantiate this as RetryPolicy = new ExponentialRetry(TimeSpan.FromMilliseconds(2),100) you are basically setting the max attempts as 100 which means you may end up waiting up to around 2^100 milliseconds (there is some more math behind this but just simplifying) for each of your individual batch requests to fail on the client side until the sdk gives up retrying.
The other issue with that code is it executes batch requests sequentially and synchronously, that has multiple bad effects, first, all subsequent batch requests are blocked by the current batch request, second your cores are blocked waiting on I/O operations, third it has no exception handling so if one of the batch operations throw an exception, the method bails out and would not continue any further processing other batch requests.
My recommendation, do not use that library, batch operations are fairly straight forward. The default retry policy if you do not explicitly define is the exponential retry policy anyways with sensible default parameters (does 3 retries) so you do not even need to define your own retry object. For best scalability and throughput run your batch operations async (and concurrently).
As to why things fail, when you write your own api, catch the StorageException and check the http status code on the exception itself. You could be getting throttled by azure as one of the possibilities but it is hard to say without further debugging or you providing the http status code for the failed batch operations to us.
You need to check whether an exception is transient or not. As Peter said on his comment, Azure Storage client already implements a retry policy. You can also wrap your code with another retry code (e.g using polly) or you should change the default policy associated to Azure Storage Client.

Detect and handle unresponsive RabbitMQ in .NET application

We recently had an outage where one of our APIs became unresponsive due to our rabbit cluster being given artificially high load. We where running out of threads in mono (.NET) and requests to the API failed. Although this is unlikely to happen again we would like to put some protection in against this. Ideally we would have calls to bus.Publish() timeout after a set amount of time but we can't workout how.
We then came across the blocked connections notification feature of RabbitMQ and thought this might help. However we can't figure out how to get at the connection object that is in the IServiceBus. So far we have tried
_serviceBus = serviceBus;
var connection =
((MassTransit.Transports.RabbitMq.RabbitMqEndpointAddress) _serviceBus.Endpoint.Address)
.ConnectionFactory.CreateConnection();
connection.ConnectionBlocked += Connection_ConnectionBlocked;
connection.ConnectionUnblocked += Connection_ConnectionUnblocked;
But when we do this we get a BrokerUnreachableException which I don't understand.
My questions are, is this the right approach to detect timeouts and fail (we have a backup mechanism to collect the data in the message and repost later) and if this is correct, how do we make it work?
I think you can manage this by combining System.Timer or Observable.Timer to schedule checks, and the check, which use request-response. Consumer for the request should be inside the same process. You can specify a cancellation token with reasonable timeout for the Request call and it you get a timeout - your messaging infrastructure is down or too busy, or your endpoint is too busy.

Conditional deletion of messages from RabbitMQ

I have a couple of queues where certain information is queued. Let us say I have "success" and "failed" queues in which Server side component has continuously written some data to these queues for clients.
Clients read this data and display it on a UI for end users. Now, I have a situation to purge any message in these queues older than 30 days. Clients would then only be able to see only 30 days of information at any point of time.
I have searched a lot and could see some command line options to purge whole queue but could not find a relevant suggestion.
Any help in the right direction is appreciated. Thanks
I don't think this is possible; looks like you're trying to use RabbitMq as data storage instead of message server.
The only way to understand if a message is "older" than 30, is to process the message, and by doing this you are removing the messagge from the queue.
Best thing to do here is to process the messages and store them in a long term storage; then you can implement a deletion policy to eliminate the older elements.
If you really want to go down this path, RabbitMQ implements TTL at queue level or message level; take a look at this: https://www.rabbitmq.com/ttl.html
[As discussed in comments]
To keep the message in the queue you can try to use a NACK instead of ACK as confirmation; this way RabbitMQ will consider the message undelivered and it will try to deliver it again and again. Remember to create a durable queue (https://www.rabbitmq.com/confirms.html).
You can also check this answer: Rabbitmq Ack or Nack, leaving messages on the queue

WMQ: Distributing MQ readers over several machines

I am using WMQ to access an IBM WebSphere MQ on a mainframe - using c#.
We are considering spreading out our service on several machines, and we then need to make sure that two services on two different machines cannot read/get the same MQ message at the same time.
My code for getting messages is this:
var connectionProperties = new Hashtable();
const string transport = MQC.TRANSPORT_MQSERIES_CLIENT;
connectionProperties.Add(MQC.TRANSPORT_PROPERTY, transport);
connectionProperties.Add(MQC.HOST_NAME_PROPERTY, mqServerIP);
connectionProperties.Add(MQC.PORT_PROPERTY, mqServerPort);
connectionProperties.Add(MQC.CHANNEL_PROPERTY, mqChannelName);
_mqManager = new MQQueueManager(mqManagerName, connectionProperties);
var queue = _mqManager.AccessQueue(_queueName, MQC.MQOO_INPUT_SHARED + MQC.MQOO_FAIL_IF_QUIESCING);
var queueMessage = new MQMessage {Format = MQC.MQFMT_STRING};
var queueGetMessageOptions = new MQGetMessageOptions {Options = MQC.MQGMO_WAIT, WaitInterval = 2000};
queue.Get(queueMessage, queueGetMessageOptions);
queue.Close();
_mqManager.Commit();
return queueMessage.ReadString(queueMessage.MessageLength);
Is WebSphere MQ transactional by default, or is there something I need to change in my configuration to enable this?
Or - do I need to ask our mainframe guys to do some of their magic?
Thx
Unless you actively BROWSE the message (ie read it but leave it there with no locks), only one getter will ever be able to 'get' the message. Even without transactionality, MQ will still only deliver the message once... but once delivered its gone
MQ is not transactional 'by default' - you need to get with GMO_SYNCPOINT (MQ transactions) and commit at the connection (MQQueueManager level) if you want transactionality (or integrate with .net transactions is another option)
If you use syncpoint then one getter will get the message, the other will ignore it, but if you subsequently have an issue and rollback, then it is made available to any getter (as you would want). It is this scenario where you might see a message twice, but thats because you aborted the transaction and hence asked for it to be put back to how it was before the get.
I wish I'd found this sooner because the accepted answer is incomplete. MQ provides once and only once delivery of messages as described in the other answer and IBM's documentation. If you have many clients listening on the same queue, MQ will deliver only one copy of the message. This is uncontested.
That said, MQ, or any other async messaging for that matter, must deal with session handling and ambiguous outcomes. The affect of these factors is such that any async messaging application should be designed to gracefully handle dupe messages.
Consider an application putting a message onto a queue. If the PUT call receives a 2009 Connection Broken response, it is unclear whether the connection failed before or after the channel agent received and acted on the API call. The application, having no way to tell the difference, must put the message again to assure it is received. Doing the PUT under syncpoint can result in a 2009 on the COMMIT (or equivalent return code in messaging transports other than MQ) and the app doesn't know if the COMMIT was successful or if the PUT will eventually be rolled back. To be safe it must PUT the message again.
Now consider the partner application receiving the messages. A GET issued outside of syncpoint that reaches the channel agent will permanently remove the message from the queue, even if the channel agent cannot then deliver it. So use of transacted sessions ensures that the message is not lost. But suppose that the message has been received and processed and the COMMIT returns a 2009 Connection Broken. The app has no way to know whether the message was removed during the COMMIT or will be rolled back and delivered again. At the very least the app can avoid losing messages by using transacted sessions to retrieve them, but can not guarantee to never receive a dupe.
This is of course endemic to all async messaging, not just MQ, which is why the JMS specification directly address it. The situation is addressed in all versions but in the JMS 1.1 spec look in section 4.4.13 Duplicate Production of Messages which states:
If a failure occurs between the time a client commits its work on a
Session and the commit method returns, the client cannot determine if
the transaction was committed or rolled back. The same ambiguity
exists when a failure occurs between the non-transactional send of a
PERSISTENT message and the return from the sending method.
It is up to a JMS application to deal with this ambiguity. In some
cases, this may cause a client to produce functionally duplicate
messages.
A message that is redelivered due to session recovery is not
considered a duplicate message.
If it is critical that the application receive one and only one copy of the message, use 2-Phase transactions. The transaction manager and XA protocol will provide very strong (but still not absolute) assurance that only one copy of the message will be processed by the application.
The behavior of the messaging transport in delivering one and only one copy of a given message is a measure of the reliability of the transport. By contrast, the behavior of an application which relies on receipt of one and only one copy of the message is a measure of the reliability of the application.
Any duplicate messages received from an IBM MQ transport are almost certainly going to be due to the application's failure to use XA to account for the ambiguous outcomes inherent in async messaging and not a defect in MQ. Please keep this in mind when the Production version of the application chokes on its first duplicate message.
On a related note, if Disaster Recovery is involved, the app must also gracefully recover from lost messages, or else find a way to violate the laws of relativity.

How do wildcards work in topic messaging with wso2/rabbitMq/c#?

If I publish a message to a wso2 topic like so:
channel.BasicPublish(someExchangeName,"farm.cow.brown",null,someMessage);
I can retrieve the message if I am listening to the routing key "farm.cow.brown":
channel.QueueBind(someQueueName,someExchangeName,"farm.cow.brown");
I think I should also be able to get the message if I am listening to a variation such as this:
channel.QueueBind(someQueueName,someExchangeName,"farm.cow.*");
Of the two listening examples above the first works, the second never does, regardless of the routing key combinations attempted (farm.cow.* , farm.*.brown , farm.cow.# , farm.# , etc.).
I am connecting to wso2 using rabbitMq and c#.
Thank you.
This is working for me now. It appears that to use a wildcard to listen to multiple topics/routing paths, there need to be existing queues for each topic.
Here is what I mean: consider the topics "farm.cow.brown" and "farm.cow.white" and a listener consuming route "farm.cow.*".
If there is an existing queue on "farm.cow.brown" but not on "farm.cow.white", I will only get messages published to "farm.cow.brown", even though "farm.cow.white" exists and is getting messages published to it.
If there is a queue on "farm.cow.brown" and another on "farm.cow.white", "farm.cow.*" will get all messages published to "farm.cow.brown" and published to "farm.cow.white".
If neither have queues, "farm.cow.*" retries no messages published to "farm.cow.brown" and "farm.cow.white".
(As an aside, the "farm.cow.*" examples above are work equivalently using "farm.#")
To restate, using wildcards only retrieves messages for topics that have existing queues or subscriptions.
This is my experience. I have been testing this for a few days and it appears to be the consistent behavior.

Categories