I am working with Azure Service Bus paired namespaces and need to be able to simulate a failover to the secondary namespace. I did sort of have this working by entering an incorrect connection string for the primary namespace, and saw it fail over and send the message to the secondary namespace. This no longer seems to do the trick. I cannot find a way through the Azure management portal or anywhere else to take a namespace offline. Does anyone have any ideas on how to do this?
Here is my code for reference:
var pairedNamespaceConfiguration = this.pairedNamespaceConfigurationDictionary[configurationKey];
MessagingFactory factory = MessagingFactory.CreateFromConnectionString(pairedNamespaceConfiguration.PrimaryNamespace.ConnectionString);
MessagingFactory secondaryMessagingFactory = MessagingFactory.CreateFromConnectionString(pairedNamespaceConfiguration.SecondaryNamespace.ConnectionString);
NamespaceManager secondaryNamespaceManager = NamespaceManager.CreateFromConnectionString(pairedNamespaceConfiguration.SecondaryNamespace.ConnectionString);
SendAvailabilityPairedNamespaceOptions sendAvailabilityOptions = new SendAvailabilityPairedNamespaceOptions(secondaryNamespaceManager, secondaryMessagingFactory, pairedNamespaceConfiguration.BacklogQueueCount, TimeSpan.FromSeconds(pairedNamespaceConfiguration.FailoverIntervalSeconds), false);
factory.PairNamespaceAsync(sendAvailabilityOptions).Wait();
MessageSender messageSender = factory.CreateMessageSender(pairedNamespaceConfiguration.PathName);
string messageContent = JsonConvert.SerializeObject(message);
using(BrokeredMessage brokeredMessage = new BrokeredMessage(messageContent))
{
messageSender.Send(brokeredMessage);
}
Modify your \Windows\system32\drivers\etc\hosts file to point the original namespace to something like 127.0.0.1. This will make the original namespace connection fail.
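For example, assuming the primary namespace is called myPrimaryNamespace (substitute your own namespace name), an entry like the following should make connection attempts to the primary fail:
127.0.0.1    myPrimaryNamespace.servicebus.windows.net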
I'm using this example, Geo-replication with Service Bus Relayed Messages, to implement the same thing. Maybe it will be useful for you as well.
All Service Bus entities reside in a namespace. A namespace is affiliated with a datacenter. To allow for a failover between datacenters, the user must create one Service Bus and ACS namespace (if ACS is used) per datacenter. Any Service Bus relay that needs to remain accessible in the presence of datacenter failures must be created in both namespaces.
The server opens two NetTcp relay endpoints, one in each of the two namespaces. The server processes any request that is received via one of these endpoints. Note that the two relays have to have different names (e.g., the address of the primary relay is sb://myPrimaryNamespace.servicebus.windows.net/myService-primary and the secondary is sb://mySecondaryNamespace.servicebus.windows.net/myService-secondary).
The client considers one of the two replicated relays as the active relay and the other one as a backup. It opens a channel to the active relay and invokes methods on the service. If the invocation fails with any exception that is not part of the service contract, the client abandons the channel, opens a channel to the backup relay, and invokes the service method again. The client will consider the new channel to be the active channel and continues to use that channel until the next fault occurs.
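To make the switch concrete, here is a minimal sketch of that client-side pattern, assuming a hypothetical IMyService relay contract and a tokenProvider created elsewhere (it needs System.ServiceModel and the Microsoft.ServiceBus relay types); it illustrates the described behaviour and is not the sample's actual code:
// The relay addresses reuse the example names above; IMyService and tokenProvider are placeholders.
var relayAddresses = new[]
{
    new Uri("sb://myPrimaryNamespace.servicebus.windows.net/myService-primary"),
    new Uri("sb://mySecondaryNamespace.servicebus.windows.net/myService-secondary")
};
int activeIndex = 0;

string InvokeWithFailover(Func<IMyService, string> call)
{
    for (int attempt = 0; attempt < relayAddresses.Length; attempt++)
    {
        var factory = new ChannelFactory<IMyService>(
            new NetTcpRelayBinding(), new EndpointAddress(relayAddresses[activeIndex]));
        factory.Endpoint.Behaviors.Add(new TransportClientEndpointBehavior(tokenProvider));
        IMyService channel = factory.CreateChannel();
        try
        {
            string result = call(channel);
            ((IClientChannel)channel).Close();
            return result;
        }
        catch (FaultException)
        {
            throw; // faults that are part of the service contract are not a reason to fail over
        }
        catch (CommunicationException)
        {
            // Anything else communication-related: abandon the channel and switch to the backup relay.
            ((IClientChannel)channel).Abort();
            activeIndex = (activeIndex + 1) % relayAddresses.Length;
        }
    }
    throw new InvalidOperationException("Neither relay endpoint is reachable.");
}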
Related
I've set-up a three node RabbitMQ cluster with a quorum durable queue.
I'm trying to find out how to implement a robust way to keep things running both on the producer and the consumer side (two .NET Core processes) in case of node failure.
I'm using the following options on the ConnectionFactory class:
var factory = new ConnectionFactory
{
HostName = hostname,
AutomaticRecoveryEnabled = true,
TopologyRecoveryEnabled = true,
VirtualHost = vhost
};
However, after starting the producer and consumer test processes (which attempt to flood the queue), whenever I stop the master node on the cluster, the clients never recover from this situation, and an OperationInterruptedException is thrown on each call to BasicPublish (on the producer) or BasicAck (on the consumer).
The clients connect to the cluster using a random IP chosen from the three nodes (as given by round-robin DNS resolution).
I've read somewhere that for durable classic non-mirrored queues this is the expected behavior, but what about quorum queues? Shouldn't they be a more efficient version of mirrored queues (although with some limitations)?
Is there a way to recover from a single node failure without implementing all the reconnection logic in my clients?
From what I can see in the ConnectionFactory class, you can specify a list of host names when creating a connection (not when declaring the factory). Have you tried this?
// Summary:
// Create a connection using a list of hostnames using the configured port. By default
// each hostname is tried in a random order until a successful connection is found
// or the list is exhausted using the DefaultEndpointResolver. The selection behaviour
// can be overridden by configuring the EndpointResolverFactory.
//
// Parameters:
// hostnames:
// List of hostnames to use for the initial connection and recovery.
//
// Returns:
// Open connection
//
// Exceptions:
// T:RabbitMQ.Client.Exceptions.BrokerUnreachableException:
// When no hostname was reachable.
public IConnection CreateConnection(IList<string> hostnames);
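For example, a minimal usage sketch with that overload, keeping the same recovery options (the host names below are placeholders for your three cluster nodes):
// Hedged sketch: pass every node so both the initial connection and recovery can pick a reachable one.
var factory = new ConnectionFactory
{
    AutomaticRecoveryEnabled = true,
    TopologyRecoveryEnabled = true,
    VirtualHost = vhost
};
var hostnames = new List<string> { "rabbit-node-1", "rabbit-node-2", "rabbit-node-3" };
using (IConnection connection = factory.CreateConnection(hostnames))
{
    // declare the quorum queue and publish/consume on channels created from this connection
}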
The clients connect to the cluster using a random IP chosen from the three nodes (as given by round-robin DNS resolution).
The problem is likely caused by DNS caching. For example, the producer/consumer resolved the hostname to a single IP address and then cached it (there is a DNS cache in .NET), causing the RabbitMQ client to always connect to the same RabbitMQ node (which could be the one that was shut down).
There are at least 3 approaches for making the failover process more robust:
pass multiple hostnames or IP addresses of the RabbitMQ cluster nodes to the RabbitMQ client
put the RabbitMQ nodes behind a layer-4 load balancer and configure the RabbitMQ client to connect to the load balancer instead of connecting directly to the cluster nodes
disable DNS cache
How can I detect whether the message broker configuration is valid, or whether the connection to the message broker has been lost, when using MassTransit with RabbitMQ? Publishing messages does not seem to complain right away if there is no broker connection, and it seems to recover when the RabbitMQ server comes up. Is there a way to listen in on the connection events and warn if the configuration is not valid?
If you use .NET Core and configure MassTransit as per the docs, you can resolve the instance of IBusHealth and use it in your service.
The AddMassTransit method registers the default instance, which you can ask for the bus health status at any time. That's the method code:
public HealthResult CheckHealth()
{
var endpointHealthResult = _endpointHealth.CheckHealth();
var data = new Dictionary<string, object> {["Endpoints"] = endpointHealthResult.Data};
return _healthy && endpointHealthResult.Status == BusHealthStatus.Healthy
? HealthResult.Healthy("Ready", data)
: HealthResult.Unhealthy($"Not ready: {_failureMessage}", data: data);
}
As you can see, if you call busHealth.CheckHealth() it will return either Healthy or Unhealthy, and in the latter case it will also give you the list of failing endpoints.
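A minimal sketch of using it, assuming the standard .NET Core DI container (serviceProvider stands in for whatever scope you already have) and that HealthResult exposes the description shown above:
// Hedged sketch: resolve the registered IBusHealth and inspect the result.
var busHealth = serviceProvider.GetRequiredService<IBusHealth>();
var result = busHealth.CheckHealth();
if (result.Status != BusHealthStatus.Healthy)
{
    // Description carries the "Not ready: ..." message; Data lists the failing endpoints.
    Console.WriteLine($"Bus not ready: {result.Description}");
}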
Since BusHealth only monitors the bus itself and all its receive endpoints, you might not get notified when your service fails to publish messages.
You can use the diagnostics listener or create your own publish or send observer, which is called before and after publish/send and on any failure.
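For the observer route, here is a hedged sketch of a publish observer (the console logging is just illustrative); you attach it with bus.ConnectPublishObserver(new PublishFailureObserver()):
// PublishFault is called when a publish fails, e.g. because the broker connection is down.
class PublishFailureObserver : IPublishObserver
{
    public Task PrePublish<T>(PublishContext<T> context) where T : class => Task.CompletedTask;

    public Task PostPublish<T>(PublishContext<T> context) where T : class => Task.CompletedTask;

    public Task PublishFault<T>(PublishContext<T> context, Exception exception) where T : class
    {
        Console.WriteLine($"Publish of {typeof(T).Name} failed: {exception.Message}");
        return Task.CompletedTask;
    }
}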
We are developing a .NET Core service that shall be hosted in Azure Service Fabric. This SF service needs to interact with 10,000 devices registered in Azure IoT Hub via its AMQP 1.0 SSL/TLS endpoints. Each IoT Hub device has its own security token and connection string provided by the IoT Hub service.
For our scenario we need to listen to all cloud-to-device messages coming from the 10,000 IoT Hub device instances and "route" these to a central Service Bus topic to which the actual "gateways" in the field listen. So basically we want to forward messages from 10,000 Service Bus queues into one central queue.
What is the best approach to handle these 10,000 AMQP listeners from an SF service? Is there a way we can reuse AMQP connections, sessions or links so we cache/share resources? And how can we dynamically spread the load of connection maintenance over the 5 nodes in the SF cluster?
We are evaluating these Nuget packages for the implementation:
Microsoft.Azure.ServiceBus
AMQPNetLite
Microsoft.Azure.Devices.Client
We are doing some tests using the Microsoft.Azure.Devices.Client lib, see a simplified code sample below:
using System;
using System.Fabric;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Devices.Client;
using Microsoft.ServiceFabric.Services.Runtime;
namespace ID.Monitoring.MonServer.ServiceFabric.ServiceBus
{
/// <summary>
/// An instance of this class is created for each service instance by the Service Fabric runtime.
/// </summary>
internal sealed class ServiceBus : StatelessService
{
private readonly DeviceClient _deviceClient;
private ConnectionStatus _status;
public ServiceBus(StatelessServiceContext context)
: base(context)
{
_deviceClient = DeviceClient.CreateFromConnectionString("HostName=id-monitoring-dev.azure-devices.net;DeviceId=100;SharedAccessSignature=SharedAccessSignature sr=id-monitoring-dev.azure-devices.net%2Fdevices%2F100&sig={token}&se=1553265888", TransportType.Amqp_Tcp_Only);
}
/// <summary>
/// This is the main entry point for your service instance.
/// </summary>
/// <param name="cancellationToken">Canceled when Service Fabric needs to shut down this service instance.</param>
protected override async Task RunAsync(CancellationToken cancellationToken)
{
_deviceClient.SetConnectionStatusChangesHandler(ConnectionStatusChangeHandler);
while (!cancellationToken.IsCancellationRequested)
{
if (_status != ConnectionStatus.Connected)
{
await _deviceClient.OpenAsync();
}
var receivedMessage = await _deviceClient.ReceiveAsync(TimeSpan.FromSeconds(10)).ConfigureAwait(false);
if (receivedMessage != null)
{
var messageData = Encoding.ASCII.GetString(receivedMessage.GetBytes());
//TODO: handle incoming message and publish to common
await _deviceClient.CompleteAsync(receivedMessage).ConfigureAwait(false);
}
}
}
private void ConnectionStatusChangeHandler(ConnectionStatus status, ConnectionStatusChangeReason reason)
{
_status = status;
}
}
}
Question: Does this scale well to 10,000 Service Fabric service instances? Or are there more efficient ways to maintain this many AMQP listeners from a Service Fabric service environment? Is there a way we can apply AMQP connection multiplexing, maybe?
Take a look at this.
The second answer provides a sample that allows you to multiplex multiple devices onto one AMQP connection.
The approach you've chosen to monitor your devices won't scale well and will be hard to maintain.
Currently, Service Fabric limits how many instances of a service you can place on a single node. For example, if you create an application with your ServiceBus service and try to span 10,000 instances, you will hit this limitation: the instance count cannot exceed the node count. That is, if you have a 5-node cluster, you will be able to run only 5 instances of your service using the default scaling approach.
To bypass this issue you have some options:
Partitioning:
To have a single stateless service running more partitions than the node count, you have to partition your service. Assuming you have a 5-node cluster and need 10,000 instances, you will need 2,000 partitions running on each node. If you use a shared process and have enough RAM for this, this approach might help you; please take a look at this thread and this thread before following this approach.
Multiple Named Services:
A named service is the running service definition for one service type; in this case you would create one per device, like:
ServiceBusType
ServiceBus-Device1
ServiceBus-Device2
ServiceBus-Device3
This approach will consume too many resources on your machines, as you will be running one instance for each device, but it is easy to manage, as you can spin up a new instance for each new device without affecting other running services.
Parallel Processing per instance:
Each instance would be responsible for processing multiple messages concurrently; in this case you would create 2,000 connections for each instance (if running 5 instances, one per node, in the cluster). This is lighter on resource consumption than the other approaches, but a bit harder to maintain, as you will have to handle the balancing yourself and might need an extra service to monitor and delegate tasks to all the services and ensure the messages are being processed evenly.
Summary:
One instance handling one connection, one message at a time, will require 10,000 instances of your service; partitioning will be similar, though you can use a shared process to reduce memory consumption. Memory consumption will still be high in both cases.
Multiple named services could be an option if the number of services were not so high; you also wouldn't be able to share connections, so I wouldn't recommend this approach for your scenario.
The third option is the most resource-friendly, but you will have to find a way to partition the connections evenly across the cluster nodes.
You can also use a mixed approach: for example, you can have a service handling multiple messages in parallel and a partitioned service to define the key range of devices.
Please take a look in the links I've mentioned.
I found that there is a DeviceClient constructor that allows the AmqpConnectionPoolSettings to be set.
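If that works for you, a hedged sketch of what enabling AMQP connection pooling looks like with that library (MaxPoolSize is an illustrative value, and deviceConnectionString is a placeholder):
// Pooling lets many DeviceClient instances share a smaller set of AMQP TCP connections
// instead of opening one connection per device.
var transportSettings = new ITransportSettings[]
{
    new AmqpTransportSettings(TransportType.Amqp_Tcp_Only)
    {
        AmqpConnectionPoolSettings = new AmqpConnectionPoolSettings
        {
            Pooling = true,
            MaxPoolSize = 100
        }
    }
};
var deviceClient = DeviceClient.CreateFromConnectionString(deviceConnectionString, transportSettings);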
I'm playing with Azure Service Fabric and a console app. I simply want my console app to connect to the cluster and do some stuff.
The console app tries to resolve the service address with the following:
static void Main(string[] args)
{
ServicePartitionResolver resolver = null;
try
{
resolver = new ServicePartitionResolver(
new string[] {
"localhost:19000",
"localhost:19001"
});
Uri serviceUri = new Uri("fabric:/StatefullServiceTEST/MyStatefulService");
ResolvedServicePartition partition = resolver.ResolveAsync(serviceUri, new ServicePartitionKey(), CancellationToken.None).GetAwaiter().GetResult();
}
catch (Exception ex)
{
Console.WriteLine($"Exception: {ex.Message}");
}
Console.WriteLine();
Console.Write("Press any key to exit...");
Console.ReadKey();
}
My problem is that resolver.ResolveAsync throws an exception that doesn't seem to have any connection with Service Fabric:
Unable to cast COM object of type 'System.__ComObject' to interface
type 'IFabricApplicationManagementClient10'. This operation failed
because the QueryInterface call on the COM component for the interface
with IID '{67001225-D106-41AE-8BD4-5A0A119C5C01}' failed due to the
following error: No such interface supported (Exception from HRESULT:
0x80004002 (E_NOINTERFACE)).
Any ideas on this?
UPDATE
I was not so clear explaining my problem and what I want to achieve.
I'm playing with Azure Service Fabric (both stateless and stateful services). My question is: what's the best way to call a microservice hosted in Azure Service Fabric?
You have to create a public-facing service (such as an ASP.NET Core Web API) which will expose the functionality of your service inside Service Fabric to the outside world (outside the Service Fabric cluster). The FabricClient approach is to be used for calling services from within the Service Fabric cluster, not from outside.
From your ASP.NET Core service you will use the FabricClient to access the hosted service, so in general your ASP.NET Core app acts as a reverse proxy to expose the functionality of the actual service.
You can't use ServicePartitionResolver; it is a Reliable Services feature and must be called from within a service running in your cluster.
I couldn't understand clearly what you want.
If you want to manage the service and get details about it, like querying running instances or replicas, adding or removing instances, and so on, use the FabricClient. Below is a quick snippet; check details here and here:
using System.Fabric;
using System.Security.Cryptography.X509Certificates;
string clientCertThumb = "71DE04467C9ED0544D021098BCD44C71E183414E";
string serverCertThumb = "A8136758F4AB8962AF2BF3F27921BE1DF67F4326";
string CommonName = "www.clustername.westus.azure.com";
string connection = "clustername.westus.cloudapp.azure.com:19000";
var xc = GetCredentials(clientCertThumb, serverCertThumb, CommonName);
var fc = new FabricClient(xc, connection);
or,
If you want to communicate with a running service, like an API, you should use the reverse proxy to resolve your services via URL, like the snippet below; more details here:
http://mycluster.eastus.cloudapp.azure.com:19081/MyApp/MyService
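For example, a hedged sketch of an external client calling such an address (the cluster address, port, and api/values route are placeholders, and the reverse proxy must be exposed on the load balancer):
// External caller going through the Service Fabric reverse proxy.
using (var http = new HttpClient())
{
    var response = await http.GetAsync(
        "http://mycluster.eastus.cloudapp.azure.com:19081/MyApp/MyService/api/values");
    string body = await response.Content.ReadAsStringAsync();
}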
You cannot access a service in an ASF cluster from the outside using the ServicePartitionResolver.
You have to have a public-facing endpoint on your cluster, like a stateless service acting as a web API, for example.
From the docs:
Services connecting to each other inside a cluster generally can directly access the endpoints of other services because the nodes in a cluster are on the same local network. In some environments, however, a cluster may be behind a load balancer that routes external ingress traffic through a limited set of ports. In these cases, services can still communicate with each other and resolve addresses using the Naming Service, but extra steps must be taken to allow external clients to connect to services.
A Service Fabric cluster in Azure is placed behind an Azure Load Balancer. All external traffic to the cluster must pass through the load balancer. The load balancer will automatically forward traffic inbound on a given port to a random node that has the same port open. The Azure Load Balancer only knows about ports open on the nodes, it does not know about ports open by individual services.
So, unless your console app is hosted in the cluster as a guest executable, you have some more work to do.
The following call
CloudStorageAccount.Parse(<connection-string>);
returns this error:
"No valid combination of account information found."
with the connection string copied directly from the CONNECTION STRING–PRIMARY KEY field on the Azure Service Bus Access Policies -> Policy blade, which looks like this:
Endpoint=sb://xxx.servicebus.windows.net/;SharedAccessKeyName=xxx;SharedAccessKey=xxx;EntityPath=xxx
I need CloudQueueClient and CloudQueue instances to do queue manipulation.
Am I missing something obvious, or is there another way to initialise CloudStorageAccount?
Update: the following syntax allows me to add a new queue using the service level (not queue level) credentials, but I'm not sure how I get from here to a CloudQueue or CloudQueueClient instance.
var queueNamespace = NamespaceManager.CreateFromConnectionString(
    "Endpoint=sb://<service-account>.servicebus.windows.net/;SharedAccessKeyName=sharedaccess;SharedAccessKey=xxx");
The reason you're getting this error is that you're trying to use the Azure Storage client library for Service Bus resources. Microsoft.WindowsAzure.Storage is the client library for Azure Storage, and queues in Azure Storage are not Service Bus queues.
For Service Bus queues you would need to use its client library that you can install via Nuget from https://www.nuget.org/packages/WindowsAzure.ServiceBus/.
Once you do that, you should be able to create a NamespaceManager using the following code:
var manager = Microsoft.ServiceBus.NamespaceManager.CreateFromConnectionString(ConnectionString);
and then you will be able to perform operations on your Service Bus Queues.
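For example, a hedged sketch of basic queue operations with that package (the queue name is a placeholder, and the NamespaceManager needs a namespace-level connection string without an EntityPath):
// Create the queue if needed, then send a message with the Service Bus client library.
var namespaceManager = Microsoft.ServiceBus.NamespaceManager.CreateFromConnectionString(ConnectionString);
if (!namespaceManager.QueueExists("myqueue"))
{
    namespaceManager.CreateQueue("myqueue");
}

var queueClient = Microsoft.ServiceBus.Messaging.QueueClient.CreateFromConnectionString(ConnectionString, "myqueue");
queueClient.Send(new Microsoft.ServiceBus.Messaging.BrokeredMessage("hello"));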
You may find this link useful as well: https://azure.microsoft.com/en-in/documentation/articles/service-bus-dotnet-get-started-with-queues/.