Azure - Consume Event Hub to DocumentDB (not Storage Account) - c#

I have two services running on Azure. An Event Hub and a CosmosDB/DocumentDB database.
My goal is to wire the two together with a Web App service so that everything arriving on the Event Hub is consumed and properly stored in the database.
I went through the Quick Starts and the Tutorials of both Event Hubs and CosmosDB and I cannot figure out a way of wiring the two.
I know how to establish a connection with DocumentDB, I know how to consume the data of an Event Hub, but I can't manage to do both. Here is the deal:
To create an Event Hub processor host, I've only found the following constructor:
public EventProcessorHost(string eventHubPath, string consumerGroupName, string eventHubConnectionString, string storageConnectionString, string leaseContainerName)
with storageConnectionString being a combination of two strings called StorageAccountName and StorageAccountKey in the official tutorial.
Well, that's actually my problem: a Storage Account is another service available on Azure. I've created one for testing purposes and it works just fine, but I need to store everything in a Cosmos DB (DocumentDB) database.
I am not excluding the possibility that going through a Storage Account is required, but if that's so, could you tell me why?
Thank you very much.

The Event Hub processor requires connection information for Azure Storage for lease management and checkpointing purposes. Practically, this means that if you have multiple instances of your processor running together, all the hard work of figuring out who is reading from which Event Hub partitions is completely managed for you.
The processor model is extremely generic: you implement the IEventProcessor interface and put your logic in ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages). Inside this method you're free to write the EventData into Cosmos DB or do anything else your heart desires with the messages coming off of the Event Hub. For example:
class CosmosEventHubProcessor : IEventProcessor
{
    private DocumentClient _documentClient;

    public Task OpenAsync(PartitionContext context)
    {
        // Initialize the DocumentClient here (endpoint URI and key from configuration).
        return Task.CompletedTask;
    }

    public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
    {
        // Write the events to Cosmos DB using the DocumentClient, then checkpoint.
        await context.CheckpointAsync();
    }

    public Task ProcessErrorAsync(PartitionContext context, Exception error) => Task.CompletedTask;

    public Task CloseAsync(PartitionContext context, CloseReason reason) => Task.CompletedTask;
}
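To plug the processor in, you register it with the EventProcessorHost built from the constructor in the question (a minimal sketch, reusing the same variable names as that constructor):

var host = new EventProcessorHost(eventHubPath, consumerGroupName, eventHubConnectionString, storageConnectionString, leaseContainerName);
// Starts pumping events from all partitions into CosmosEventHubProcessor instances.
await host.RegisterEventProcessorAsync<CosmosEventHubProcessor>();
// ... later, on shutdown:
await host.UnregisterEventProcessorAsync();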

Related

How can I receive an event from Azure Event Hub without using a blob storage container?

I am new to using Azure Event Hubs but I was wondering how I can receive events from the Event Hub without using a blob storage container. Would it be possible to set up event triggers to download the message data whenever a new message is posted (sent)? Would it make sense to use a function like the one below?
[FunctionName("EventHubTriggerCSharp")]
public void Run([EventHubTrigger("samples-workitems", Connection = "EventHubConnectionAppSetting")] string myEventHubMessage, ILogger log)
{
    log.LogInformation($"C# function triggered to process a message: {myEventHubMessage}");
}
(Taken from https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-event-hubs-trigger?tabs=in-process%2Cfunctionsv2%2Cextensionv5&pivots=programming-language-csharp)
Is this the right approach for this problem? Could someone walk me through why this would/wouldn't work and what exactly is happening here? I couldn't find a better description in the docs themselves. Thanks in advance.
Receiving events from an Event Hub without a blob storage container is not possible with the processor-based clients: to track your position in the stream and capture event messages reliably, you have to maintain a checkpoint store.
The default checkpoint store is Azure Blob Storage; if you don't use it, you need to maintain a custom checkpoint manager yourself.
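For context, here is a minimal sketch of wiring the newer Azure.Messaging.EventHubs.Processor client to its default blob checkpoint store; the connection strings and names below are placeholders, not values from the question:

var storageClient = new BlobContainerClient("<storage-connection-string>", "<checkpoint-container>");
var processor = new EventProcessorClient(
    storageClient,
    EventHubConsumerClient.DefaultConsumerGroupName,
    "<event-hub-connection-string>",
    "<event-hub-name>");
processor.ProcessEventAsync += async args =>
{
    // Handle args.Data here, then record progress in the checkpoint store.
    await args.UpdateCheckpointAsync();
};
processor.ProcessErrorAsync += args => Task.CompletedTask;
await processor.StartProcessingAsync();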
The event processor that supports the Functions trigger needs that storage in order to coordinate with the other instances of the function.
A Functions trigger cannot use any of the Event Hubs client library types that consume messages without storing checkpoints first. Alternatively, it can be done by using the Event Grid schema; that method uses block blob storage as well ("blobType": "BlockBlob" in the event schema).
I created a sample replica that sends event messages through the Event Hub trigger, and it fired as expected in the portal using blob storage as the checkpoint store.
You can also use EventProcessorHost.

Azure Service Bus - How to Add Topic Subscriber Programmatically

I'm trying to implement a basic pub/sub system with dynamic subscribers. I need to dynamically register a topic subscriber in my .NET APIs, but it seems like I can only do that manually from the Azure Portal. When my program starts, I want to be able to register a subscriber to a topic in the format of subscribername-{timestamp} because I want to be able to deploy as many staging/dev versions as I want without having to manually create these subscribers each time.
I feel like this is a fundamental feature that I'm just blindly missing. I can do this when working with queues, but if I try to do the same with a topic, I get continuous errors saying the subscriber path was not found. I have searched the internet to no end, and while I have found SOME solutions, they are very old and often not compatible with .NET 5, or the package is deprecated. I feel like I'm going against the grain and missing something with what I'm coming up with, so I'd like to get some input on what the proper practice is for this.
I'm using Azure.Messaging.ServiceBus for publishing and subscribing currently. Below is some code -
var processor = ServiceBusClient.CreateProcessor(TopicName, $"DynamicSubscriber-{DateTime.Now}");
try
{
    processor.ProcessErrorAsync += ErrorHandler;
    processor.ProcessMessageAsync += MessageHandler;
    await processor.StartProcessingAsync();
}
catch (Exception e)
{
    await processor.DisposeAsync();
    await ServiceBusClient.DisposeAsync();
}
finally
{
    Console.WriteLine("Press a key to exit.");
    Console.ReadLine();
}
Thank you #PeterBons! Yes, when creating, updating, fetching, or deleting Service Bus entities, ServiceBusAdministrationClient is the client class to be used.
There are also a few error details given in this article about using the queue approach with ServiceBusAdministrationClient, and in this SO thread.
The ServiceBusTopicSubscription class is used to set up the Azure Service Bus subscription. The class uses the ServiceBusClient to set up the message handler, while the ServiceBusAdministrationClient is used to implement filters and add or remove those rules. The Azure.Messaging.ServiceBus NuGet package is used to connect to the subscription.
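As an illustration, registering a subscription at startup could look like this (a minimal sketch; the topic name, connection string, and timestamp format are assumptions, and note that DateTime.Now's default format contains characters that are invalid in subscription names):

var adminClient = new ServiceBusAdministrationClient("<service-bus-connection-string>");
// Use a name-safe timestamp; ':' and '/' from DateTime.Now's default format are not allowed.
var subscriptionName = $"DynamicSubscriber-{DateTime.UtcNow:yyyyMMddHHmmss}";
if (!(await adminClient.SubscriptionExistsAsync("<topic-name>", subscriptionName)).Value)
{
    await adminClient.CreateSubscriptionAsync("<topic-name>", subscriptionName);
}
// The ServiceBusClient processor from the question can now attach to subscriptionName.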

How can I avoid duplicate background task processing in Service Fabric hosted services?

Sorry about the vague title, it's rather hard to explain. I have the following setup:
I'm running a .NET Core 2.2 Web API hosted in Service Fabric.
Part of this API's responsibilities is to monitor an external FTP storage for new incoming files.
Each file will trigger a Mediator Command to be invoked with processing logic.
I've implemented a hybrid solution based on https://learn.microsoft.com/en-us/dotnet/architecture/microservices/multi-container-microservice-net-applications/background-tasks-with-ihostedservice and https://blog.maartenballiauw.be/post/2017/08/01/building-a-scheduled-cache-updater-in-aspnet-core-2.html. In essence this is an IHostedService implementation that is registered in the Startup.cs of this API. It's basically a background service running in-process.
As for the problem: the solution above works fine on a 1-node cluster, but causes "duplicates" to be processed when running on a 5-node cluster. The problem lies in the fact that on a 5-node cluster there are, of course, 5 identical ScheduledTasks running, and they will all access the same file on the FTP at the same time.
I've realised this is caused somewhat by improper separation of concerns - aka the API shouldn't be responsible for this, rather a completely separate process should handle this.
This brings me to the different services supported on Service Fabric (Stateful, Stateless, Actors, and Guest Executables). The Actor seems to be the only one that runs single-threaded, even on a 5-node cluster. Additionally, an Actor doesn't seem well suited for this kind of scenario, as it needs to be triggered. In my case, I basically need a daemon that runs all the time on a schedule. If I'm not mistaken, the other stateful/stateless services will run with 5 "clones" as well and just cause the same issue I currently have.
I guess my question is: how can I do efficient background processing with Service Fabric and avoid these multi-threaded/duplicate issues? Thanks in advance for any input.
In Service Fabric you have two options with actors:
Reliable actor timers
Reliable actor reminders
You can use the actor's state to determine whether it has already processed your FTP file.
Have a look at this blog post to see how they used a reminder to run every 30 seconds.
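For illustration, registering such a reminder from inside a reliable actor might look like this (a sketch; the reminder name and the 30-second period are assumptions, and the actor must implement IRemindable):

// Called from inside the actor, e.g. in OnActivateAsync.
await RegisterReminderAsync(
    "ProcessFtpFiles",           // reminder name
    null,                        // optional state handed to ReceiveReminderAsync
    TimeSpan.FromSeconds(30),    // first due time
    TimeSpan.FromSeconds(30));   // period

// IRemindable implementation invoked by the runtime when the reminder fires.
public Task ReceiveReminderAsync(string reminderName, byte[] state, TimeSpan dueTime, TimeSpan period)
{
    // Check actor state here and process only files not yet handled.
    return Task.CompletedTask;
}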
It's important that the code in your actor allows for reentrancy.
Basically, because the actors are reliable, your code might get executed multiple times and be cancelled in the middle of an execution.
Instead of doing this:
public void Method(int fileId)
{
    _ftpService.Process(fileId);
}
Consider doing this:
public void Method(int fileId)
{
    // Reliable actors may run the same work more than once, so guard with state.
    if (_ftpService.IsNotProcessed(fileId))
    {
        _ftpService.Process(fileId);
        _ftpService.SetProcessed(fileId);
    }
}
If your actor has trouble disposing, you might want to check whether you are handling cancellation tokens in your code. I never had this issue, but we are using Autofac (with Autofac.ServiceFabric) to register our actors via RegisterActor<T>(), and we have cancellation tokens in most of our logic. The documentation of CancellationTokenSource can also help you.
Example:
private readonly CancellationTokenSource _cancellationTokenSource;
private readonly CancellationToken _cancellationToken;

public MyActor()
{
    _cancellationTokenSource = new CancellationTokenSource();
    _cancellationToken = _cancellationTokenSource.Token;
}

public async Task SomeMethod()
{
    while (/* condition */)
    {
        // Stop promptly once deactivation has been requested.
        _cancellationToken.ThrowIfCancellationRequested();
        /* other code */
    }
}

protected override Task OnDeactivateAsync()
{
    _cancellationTokenSource.Cancel();
    return base.OnDeactivateAsync();
}

Trigger WebJob at a particular time after record added to a database

I want to trigger an Azure WebJob 24 hours after I have added a record to a database using .NET. Obviously there will be multiple tasks for the WebJob to handle, all at their designated times. Is there a way (in the Azure libraries for .NET) in which I can schedule these tasks?
I am free to use message queues, but I want to try and avoid unnecessary polling of the WebJob for new messages.
If you want to trigger the execution of a WebJob 24 hours after a record insertion in a SQL database, I would definitely use Azure Queues for this. So after you insert the record, just add a message to the queue.
In order to do this you can easily leverage the initialVisibilityDelay parameter that can be passed to the CloudQueue.AddMessage() method. This will make the message invisible for 24 hours in your case, and then it will become visible to be processed by your WebJob. You don't have to schedule anything; just have a continuous WebJob running that listens to the queue.
Here's some sample code:
public void AddMessage(T message, TimeSpan visibilityDelay)
{
    var serializedMessage = JsonConvert.SerializeObject(message);
    var queue = GetQueueReference(message);
    // null timeToLive (use the default); visibilityDelay hides the message until it elapses.
    queue.AddMessage(new CloudQueueMessage(serializedMessage), null, visibilityDelay);
}

private static CloudQueue GetQueueReference(T message)
{
    var storageAccount = CloudStorageAccount.Parse("Insert connection string");
    var queueClient = storageAccount.CreateCloudQueueClient();
    var queueReference = queueClient.GetQueueReference("Insert Queue Name");
    queueReference.CreateIfNotExists();
    return queueReference;
}
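For example, the caller could hide a message for 24 hours like this (RecordCreatedMessage and queueRepository are hypothetical names, not from the original code):

// The message only becomes visible to the WebJob after the 24-hour delay.
queueRepository.AddMessage(new RecordCreatedMessage { RecordId = 42 }, TimeSpan.FromHours(24));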
Hope this helps
Since the event of adding a record to the database is the trigger here, you can use the Azure Management Libraries to create an Azure Scheduler job that executes 24 hours from the time the DB record is inserted. Azure Scheduler jobs can do only three things: make HTTP requests, make HTTPS requests, or put a message in a queue. Since you do not want to poll queues, here are two options (a sketch of the second follows the list):
Deploy the existing WebJob as a Web API where each task is reachable by a unique URL, so that the scheduler job can execute the right HTTP/HTTPS request.
Create a new Web API which accepts requests (like a man in the middle) and programmatically runs the existing WebJob on demand, again using the Azure Management Libraries.
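One way to run a triggered WebJob on demand programmatically is the Kudu WebJobs REST API (a hedged sketch; the site name, job name, and deployment credentials are placeholders):

// POST to the Kudu endpoint to run a triggered WebJob on demand.
using var client = new HttpClient();
var token = Convert.ToBase64String(Encoding.ASCII.GetBytes("<deployment-user>:<deployment-password>"));
client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Basic", token);
var response = await client.PostAsync(
    "https://<your-site>.scm.azurewebsites.net/api/triggeredwebjobs/<job-name>/run", null);
response.EnsureSuccessStatusCode();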
Please let me know if any of these strategies help.
Invoking a WebJob from your website is not a good idea; instead, you can add the WebJob code inside your website and simply call that code. You can still easily use the WebJob SDK from inside your website.
https://github.com/Azure/azure-webjobs-sdk-samples
The reason we wouldn't recommend invoking the WebJob from your website is that the invocation contains a secret you'd rather not store on your website (deployment credentials).
Recommendation:
To separate the WebJob and website code, the best thing to do is to communicate using a queue: the WebJob listens on the queue and the website pushes the request to the queue.

Function call on server by multiple clients: Isolate each client calls

My project was a standalone application; then I decided to split it into a client and a server, because I need powerful CPU usage and portability at the same time. Now multiple clients can connect to one server.
It was easy when requests were processed one by one. Now I need to call the same function and scope area again and again at the same time, via client requests.
Can anyone give me a clue how I should handle these operations? I need to know how I can isolate clients' processes from each other on the server side. My communication is asynchronous: the server receives a request and starts a new thread. I think I'll pass a parameter carrying the client information, and another parameter as a job id to route results back to the client, since a client may ask for multiple jobs and some jobs finish quicker than others.
Should I instantiate the Process class on each call? Can I use a static method, etc.? Any explanation will be of great help!
Below is the part of my code that needs modification:
static class Data
{
    public static readonly int[] ListOfValues;
}

class Process
{
    private int _bestValue;

    void FindBestValue(int from, int to)
    {
        // ...
        if (processResult > _bestValue) _bestValue = processResult;
        // ...
    }

    void RunAll()
    {
        for (int i = 0; i < 10; i++)
        {
            int from = i * 1000; // copy the loop variable so each thread gets its own range
            StartThread(() => FindBestValue(from, from + 999));
        }
    }
}
EDIT: I think I have to instantiate a new Process class and call the function for each client, and ignore a request from the same client for the same job, since that job is already running.
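For illustration, one way to "ignore the same client for the same job" is a small registry keyed by client and job id (a sketch; the names here are hypothetical):

// Track running jobs so a duplicate (client, job) request is ignored.
private static readonly ConcurrentDictionary<(string ClientId, int JobId), Process> RunningJobs =
    new ConcurrentDictionary<(string ClientId, int JobId), Process>();

bool TryStartJob(string clientId, int jobId)
{
    // TryAdd returns false if this client's job is already running.
    return RunningJobs.TryAdd((clientId, jobId), new Process());
}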
Without getting into your application design, since you didn't talk much about it, I think your problem is ideal for WCF web services. You get client isolation by design, because every request starts in its own thread. You can create the WCF host as a standalone application/Windows service.
You can wrap your communication with a WCF service and configure it to be a PerCall service (meaning each request will be processed separately from the others), as in the sketch below.
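A minimal sketch of a per-call WCF service (the contract and names are illustrative, not taken from the question):

[ServiceContract]
public interface IProcessingService
{
    [OperationContract]
    int FindBestValue(string clientId, int jobId, int from, int to);
}

// PerCall: WCF creates a fresh service instance for every request,
// so instance fields are never shared between clients.
[ServiceBehavior(InstanceContextMode = InstanceContextMode.PerCall)]
public class ProcessingService : IProcessingService
{
    private int _bestValue;

    public int FindBestValue(string clientId, int jobId, int from, int to)
    {
        // ... run the existing search logic over [from, to] ...
        return _bestValue;
    }
}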
That way you'll clean your business logic of synchronization concerns. That's the best approach, because managing and creating threads is not difficult to implement, but it is difficult to implement correctly and in a way that is optimized for resource consumption.
