Azure Event Hub - Receiving events sequentially - C#

I am using the code below to receive events from Azure Event Hubs:
https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-dotnet-framework-getstarted-send#receive-events
I want to handle the requests coming to the event hub sequentially. For example, if someone sends 5 events in very quick succession, I want to finish processing the first request before taking up the second.
How can I handle the events coming to the event hub sequentially?

Event Hubs uses partitions to enable horizontal scaling of event processing. You can specify the number of partitions, from 1 to 32, when creating the event hub. Message order is guaranteed only within a partition, not across partitions.
If you need order to be maintained, you must write events only to a specific partition and read only from that same partition. In Azure Event Hubs, partitions are distributed across different instances for high availability, which means a partition may go offline for maintenance and come back online later. So if you want to preserve order, you need to write to and read from a single partition, and your application logic may have to handle situations such as that partition going offline.
If you need ordering, I would recommend an Azure Service Bus queue instead, where ordering and availability are managed by Service Bus itself.
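If you do stick with Event Hubs, here is a minimal sketch of reading from one fixed partition, assuming the current Azure.Messaging.EventHubs SDK (the linked walkthrough uses an older SDK, but the idea is the same; the connection string, hub name, and partition id are placeholders):

using System;
using Azure.Messaging.EventHubs.Consumer;

// Pin the reader to a single partition so events arrive in write order.
await using var consumer = new EventHubConsumerClient(
    EventHubConsumerClient.DefaultConsumerGroupName,
    connectionString,          // placeholder
    "my-hub");                 // placeholder

await foreach (PartitionEvent partitionEvent in
    consumer.ReadEventsFromPartitionAsync("0", EventPosition.Earliest))
{
    // Finish handling one event before the loop yields the next.
    Console.WriteLine(partitionEvent.Data.EventBody.ToString());
}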

To make it sequential, you need to select a proper partition key. From the docs:
If you don't specify a partition key when publishing an event, a round-robin assignment is used. In many cases, using a partition key is a good choice if event ordering is important. When you use a partition key, these partitions require availability on a single node, and outages can occur over time; for example, when compute nodes reboot and patch.
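A minimal sketch of publishing with a partition key, assuming the current Azure.Messaging.EventHubs SDK (the connection string, hub name, and key value are placeholders):

using System.Text;
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Producer;

await using var producer = new EventHubProducerClient(connectionString, "my-hub");

// Events sharing a partition key land on the same partition,
// so a single reader sees them in the order they were sent.
var options = new SendEventOptions { PartitionKey = "order-12345" };

await producer.SendAsync(
    new[]
    {
        new EventData(Encoding.UTF8.GetBytes("event 1")),
        new EventData(Encoding.UTF8.GetBytes("event 2")),
    },
    options);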

In the batch of events you receive from Event Hubs, each event has an attribute called sequence_number. Since the batch is a list, you can sort it by sequence_number and then process the events in order.
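A minimal sketch, assuming the Azure.Messaging.EventHubs SDK, where the same value is exposed as EventData.SequenceNumber (older SDKs expose it via SystemProperties); batch and Process are placeholders:

using System.Linq;
using Azure.Messaging.EventHubs;

// Order the received batch by the sequence number the partition assigned,
// then handle the events one by one.
foreach (EventData evt in batch.OrderBy(e => e.SequenceNumber))
{
    Process(evt);   // placeholder for your own handler
}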

Related

Maintaining order of processing with multiple instances

SenderApp is sending values to a Service Bus queue, and on the other end we have two receiver instances.
The requirement is that we only save values that have changed to the DB (all incoming values are first saved to a Redis cache, where the comparison happens).
SenderApp sends three values to the queue in the following order: (1, 2, 1).
-----1---2---1------------------------>
The values go into the queue FIFO, and on the other end of the queue we have two instances of the receiver application.
This is where it gets interesting.
Let's say that, due to latency or some other factor, the second receiver instance is slow to process the value (2) and ends up saving it to the database last of all three values.
So it should be something like this:
Receiver instance #1
---------------------1---1------------>
Receiver instance #2
-----------------------------2-------->
Now we have a problem. Instance one compares the second sent value, which is 1, against the first value, which is also 1, so it doesn't get saved to the database. Values sent to the Service Bus queue have timestamps attached to them.
The solution also needs to be fairly scalable.
All ideas are welcome; maybe leverage the Redis cache, maybe Service Bus sessions?
Edit:
For more clarification:
Each incoming message has a device ID and a value attached to it, and we must not save consecutive duplicate values for a given device.
So, for example, say we have two incoming messages from the same device.
The first incoming message has value 1 and device ID 999 (we must save it).
But the next incoming message also has value 1 and device ID 999.
That would be a consecutive duplicate, so we must not save it.
What also makes this problem difficult is that we cannot save values directly on the sender side.
Competing consumers (receivers) contradict the requirement to handle messages in the order they were sent. To adhere to in-order processing, the Azure Service Bus message sessions feature could help. Only a single receiver processes a given session at any time; messages within a session are never handled by multiple receivers. This eliminates the chance of having messages from the same source processed in parallel and out of order. It is still a scalable approach, as different receivers can handle different sessions. If messages for a given session arrive after some time, the session is picked up by any of the competing receivers.
In this scenario, where a uniquely identifiable device emits messages, the device ID could be used as a session ID.
Worth noting that each receiver can handle multiple sessions in parallel, not just a single session. That way, the solution can scale out and up, depending on the compute used and complexity of the code executed per message.
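A hedged sketch of what that could look like with the Azure.Messaging.ServiceBus SDK, using the device ID as the session ID (the queue must be created with sessions enabled; connection string, queue name, and handler body are placeholders):

using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;

await using var client = new ServiceBusClient(connectionString);

// Sender side: stamp each message with the device id as its session id.
ServiceBusSender sender = client.CreateSender("device-values");
await sender.SendMessageAsync(new ServiceBusMessage("1") { SessionId = "999" });

// Receiver side: a session processor gives one session to one handler at a
// time, while still working several *different* sessions concurrently.
ServiceBusSessionProcessor processor = client.CreateSessionProcessor(
    "device-values",
    new ServiceBusSessionProcessorOptions
    {
        MaxConcurrentSessions = 8,
        AutoCompleteMessages = false,
    });

processor.ProcessMessageAsync += async args =>
{
    string deviceId = args.SessionId;
    string value = args.Message.Body.ToString();
    // Compare against the last cached value for deviceId; save only if changed.
    await args.CompleteMessageAsync(args.Message);
};
processor.ProcessErrorAsync += args => Task.CompletedTask;

await processor.StartProcessingAsync();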

Near Real Time Event Processing From Web API

I'm after a couple of ideas or opinions, if you don't mind. I'm trying to understand the best approach for a solution that needs to process near-real-time events received over a Web API using REST and JSON. Several events could be received every second.
As an event is received, it is processed against a number of rules, which can be computationally expensive. Each event would be checked against hundreds of rules to find a match. A rule might be based on multiple events, so I need to store state in memory, not on disk or in a database, as performance is key. The rules will be pushed in from a database as a one-time exercise and will likewise be held in memory. If a rule is changed, it will be re-pushed.
Would it be best to write this as a single C# Web API application that receives and correlates the events, or as a Web API plus a Windows service?
If the latter, how do I get the API and the Windows service to pass data between each other? They could be on the same or separate servers.
With the Windows service, rather than starting a new thread for every event received, I'm thinking I should create an event queue or buffer (some sort of FIFO array). I'd have several buffers assigned to different threads or processes to achieve some level of parallelism.
Similarly, if I built this as just a Web API, is it possible to create the same queuing/threading approach?
This question might be too big to give a single answer to. Designing a system like this depends on a multitude of factors, such as the requirements placed on the system.
For a general event-processing solution, it's a good idea to have a web API that saves incoming events into a queue system, which stores them for later processing. The queue can be an external service such as Azure Storage Queue, or any custom queue implementation that you can communicate with and that satisfies your requirements.
Then you would have one or more event processors that retrieve events from the queue. An event processor could be a custom program written by you. Generally, the queue should have a way to lease an event, so that if a processor fails (crashes), the event is returned to the queue for another processor to handle. Once a processor has handled an event, the event can be permanently removed from the queue.
That type of architecture is a good starting point for building a reliable and possibly even scalable solution for processing events. Of course, one has to consider the performance of the queue itself, as it can become a bottleneck if the number of events per second is huge.
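As a rough sketch of that lease-and-delete pattern with Azure Storage Queues (Azure.Storage.Queues SDK), where the visibility timeout acts as the lease; the connection string and queue name are placeholders:

using System;
using Azure.Storage.Queues;
using Azure.Storage.Queues.Models;

var queue = new QueueClient(connectionString, "events");   // placeholders

// Receiving "leases" the messages: they stay invisible for one minute.
// If this processor crashes before DeleteMessageAsync, they become
// visible again and another processor picks them up.
QueueMessage[] messages = await queue.ReceiveMessagesAsync(
    maxMessages: 16, visibilityTimeout: TimeSpan.FromMinutes(1));

foreach (QueueMessage msg in messages)
{
    Console.WriteLine(msg.Body.ToString());   // the rule engine would run here
    await queue.DeleteMessageAsync(msg.MessageId, msg.PopReceipt);
}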

Azure EventHub - How to remove the event?

As always, I would appreciate your help, as I am currently stuck!
We have a new project and we will be using Azure Event Hubs. I have created a demo app where we can add events to the event hub and consume them using IEventProcessor (the receiver project). The question is that every time I run the receiver project, I see the same events. Shouldn't we expect those events to be deleted/removed after we consume them?
Example in the Receiver project:
foreach (EventData eventData in messages)
{
    string data = Encoding.UTF8.GetString(eventData.GetBytes());
    Console.WriteLine(string.Format("Message received. Partition: '{0}', Data: '{1}'",
        context.Lease.PartitionId, data));
}
Is there a way to delete/remove the event after the Console.WriteLine, or will the message be retained for a day? With queues you can signal completion, but with Event Hubs I don't see any command I can use to delete/remove an event.
Any reply would be greatly appreciated. We have been instructed to use Event Hubs for various reasons; it's not a matter of choice.
Make sure you call context.CheckpointAsync before exiting ProcessEventsAsync. That will store the client offset for the partition, and the next processor instance which gets assigned that partition will resume from the last stored offset.
See http://msdn.microsoft.com/en-us/library/dn751578.aspx for documentation (not a lot of information, though).
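As a hedged sketch against the legacy EventProcessorHost API used in the question (the per-event handling is elided):

public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
{
    foreach (EventData eventData in messages)
    {
        // ... handle each event as in the question ...
    }

    // Persist the partition offset (to blob storage) so the next processor
    // instance assigned this partition resumes after the last checkpointed
    // event instead of replaying everything in the retention window.
    await context.CheckpointAsync();
}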
You should use the checkpoint and save the offset for the partition. AFAIK there is no way to remove events from the event hub; they are automatically erased after the configured retention period. But I have also seen messages delivered that had already passed their retention period:
https://social.msdn.microsoft.com/Forums/windows/en-US/93b1bf18-2229-4da8-994f-ddc7c675f62f/message-retention-day-not-working?forum=servbus
So I believe they get removed automatically when you hit the space quota, or perhaps by something like a scheduled task; I'm not sure, though.
The events in Azure Event Hubs are removed after they have been buffered for the defined retention period. You can set the retention period from 1 to 7 days. There is no need to remove them manually.
In addition to the other answers, there's one more confusing point: Event Hubs may not delete messages older than the retention period for up to 30 days, depending on the load on the hub.
E.g., the retention period is 1 day, but if there are few messages, they can be kept for a longer period.
Luckily, you are not billed for them.
The best option for consuming an event hub is the EventProcessorHost framework.
It gives you the ability to checkpoint messages you have already read. To do that, it stores (in blob storage) the offset of the last checkpointed message, so that processing can resume from there after a shutdown.
https://blogs.msdn.microsoft.com/servicebus/2015/01/16/event-processor-host-best-practices-part-1/
You will probably also need the storage emulator for development purposes, but if you have an Azure account you can use remote blob storage.
https://learn.microsoft.com/en-us/azure/storage/common/storage-use-emulator
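For reference, a hedged sketch of wiring up EventProcessorHost (the Microsoft.Azure.EventHubs.Processor flavor; all names, connection strings, and the MyEventProcessor class are placeholders):

using Microsoft.Azure.EventHubs;
using Microsoft.Azure.EventHubs.Processor;

var host = new EventProcessorHost(
    "my-hub",                                  // event hub path
    PartitionReceiver.DefaultConsumerGroupName,
    eventHubConnectionString,
    storageConnectionString,                   // where checkpoints are stored
    "checkpoints");                            // blob container name

// MyEventProcessor implements IEventProcessor and calls
// context.CheckpointAsync() as shown above.
await host.RegisterEventProcessorAsync<MyEventProcessor>();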

Duplicate detection in Azure Storage Queue

I want to know if there is an elegant way to ensure that a queue always has distinct messages (nothing related to a duplicate detection window or any time period, for that matter).
I know that the Service Bus queue provides session concepts (and, as I mentioned, Service Bus duplicate detection won't help me, since it depends on a time period), which could serve my purpose, but I don't want my component to depend on another Azure service just for this feature.
Thanks,
This is not possible to do reliably.
There is just no mechanism that can query a Storage queue and find out whether a message with the same contents is already there or was there before. You can try to implement your own logic using a storage table, but that will not be reliable: the insert into the table may succeed while the insert into the queue fails, and now you potentially have bad data in the table.
Your code should always assume that it can retrieve a message containing the same data that was already processed. This is because messages can come back to the queue when workers that are working on them crash or take too long.
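In other words, make the processing idempotent rather than trying to keep the queue distinct. A toy sketch, where the ConcurrentDictionary stands in for whatever durable store the workers would share (a table or cache in practice; even then the check stays best-effort):

using System;
using System.Collections.Concurrent;
using System.Security.Cryptography;
using System.Text;

var processed = new ConcurrentDictionary<string, bool>();

bool TryMarkProcessed(string payload)
{
    // Key each message by a hash of its contents; TryAdd returns false
    // when the same payload was already seen (and processed).
    byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(payload));
    return processed.TryAdd(Convert.ToHexString(hash), true);
}

foreach (string payload in new[] { "1", "2", "1", "1" })
{
    if (TryMarkProcessed(payload))
        Console.WriteLine($"processed {payload}");   // real handler goes here
    else
        Console.WriteLine($"skipped duplicate {payload}");
}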
You can use Service Bus. It is like an Azure Storage queue, but it allows messages of 256 KB-1 MB and supports duplicate detection.

NServiceBus - Content-based routing & auditing - is my approach OK?

I am having a little trouble deciding which way to go while designing the message flow in our system.
Because of the volatile nature of our business processes (e.g. calculating freight costs), we use a workflow framework to be able to change a process on the fly.
The general process looks something like this:
The interface is a service that connects to the customer's system via whatever interface the customer provides (web services, TCP endpoints, database polling, files, you name it). A command is then sent to the executor containing the received data and the ID of the workflow to be executed.
The first problem comes at the point where we want to distribute load across multiple worker services.
Say we have different processes like printing parcel labels, calculating prices, and sending notification mails. Printing labels should never be delayed because a ton of mailing workflows are being executed. So we want to be able to route commands to different workers based on the work they do.
Because all commands are of the form "execute workflow XY", we would have to implement our own content-based routing. NServiceBus does not support this out of the box, mostly because it's considered an anti-pattern.
Is there a better way to do this when you are not able to use different message types to route your messages?
The second problem comes when we want to add monitoring. Because an endpoint can only subscribe to one queue for each message type, we cannot just let all executors publish an "I completed a workflow" message. The current solution is to Bus.Send the message to a preconfigured auditing endpoint. This feels a little like cheating to me ;)
Is there a better way to consolidate the published messages of multiple workers into one queue again? If problem #1 didn't exist, all workers could use the same input queue, but that isn't possible in this scenario.
You can try to make your routing headers-based rather than content-based, which should be much easier. You are not interested in whether the workflow prints labels or not; you are interested in whether the command is high priority or not. So you can add this information to the message header.
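For example, with NServiceBus you can stamp the priority into a header and route on it; the header name, destination queue, and ExecuteWorkflow message type here are made up for illustration:

using NServiceBus;

var options = new SendOptions();
options.SetHeader("Workflow.Priority", "high");    // hypothetical header
options.SetDestination("LabelPrintingWorkers");    // dedicated queue for urgent work

await endpointInstance.Send(new ExecuteWorkflow { WorkflowId = workflowId }, options);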
