Azure EventHub - How to remove the event?


As always, I would appreciate your help, as I am currently stuck!
We have a new project and we will be using Azure Event Hubs. I have created a demo app in which we can add events to the event hub and also consume them using IEventProcessor (the Receiver project). The problem is that every time I execute the Receiver project, I see the same events. Shouldn't we expect those events to be deleted after we consume them?
Example in the Receiver project:
foreach (EventData eventData in messages)
{
    string data = Encoding.UTF8.GetString(eventData.GetBytes());
    Console.WriteLine(string.Format("Message received. Partition: '{0}', Data: '{1}'",
        context.Lease.PartitionId, data));
}
Is there a way to delete the event after the Console.WriteLine, or will the message be retained for a day? With queues you can signal completion, but with EventHub I don't see any command I can use to delete it.
Any reply would be greatly appreciated. We have been instructed to use EventHub for various reasons, so it's not a matter of choice.

Make sure you call context.CheckpointAsync before exiting ProcessEventsAsync. That will store the client offset for the partition, and the next processor instance which gets assigned that partition will resume from the last stored offset.
See http://msdn.microsoft.com/en-us/library/dn751578.aspx for documentation (not a lot of information, though).
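To see why the same events keep reappearing, it may help to picture a partition as an append-only log that reading never changes. The following is a minimal in-memory sketch, not the Event Hubs SDK: `PartitionLog`, `CheckpointStore` and `Receiver.ReadFromCheckpoint` are hypothetical names, and positions are plain list indexes rather than real Event Hubs offsets. It shows that consuming deletes nothing, and that a reader skips already-processed events only if a checkpoint was stored.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for one Event Hub partition: an append-only list.
// Reading never removes entries; only the retention policy (not modelled) does.
public class PartitionLog
{
    private readonly List<string> _events = new List<string>();
    public void Append(string data) => _events.Add(data);
    public int Count => _events.Count;
    public string this[long position] => _events[(int)position];
}

// Hypothetical checkpoint store: remembers, per partition, the next position to read.
// EventProcessorHost keeps the equivalent information in blob storage.
public class CheckpointStore
{
    private readonly Dictionary<string, long> _next = new Dictionary<string, long>();

    public long GetNextPosition(string partitionId) =>
        _next.TryGetValue(partitionId, out var position) ? position : 0;

    public void Checkpoint(string partitionId, long lastProcessed) =>
        _next[partitionId] = lastProcessed + 1;
}

public static class Receiver
{
    // Returns every event after the stored checkpoint; optionally checkpoints afterwards.
    public static List<string> ReadFromCheckpoint(
        PartitionLog log, CheckpointStore store, string partitionId, bool checkpoint)
    {
        var received = new List<string>();
        long start = store.GetNextPosition(partitionId);
        for (long position = start; position < log.Count; position++)
        {
            received.Add(log[position]);
        }
        if (checkpoint && received.Count > 0)
        {
            store.Checkpoint(partitionId, log.Count - 1);
        }
        return received;
    }
}
```

In the SDK used in the question, the role of `Checkpoint` is played by `await context.CheckpointAsync()` at the end of `ProcessEventsAsync`.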

You should use checkpointing and save the offset for the partition. AFAIK there is no way to remove events from the event hub. They are erased automatically once the retention period has passed. However, I have also seen messages being received after their retention period had already passed.
https://social.msdn.microsoft.com/Forums/windows/en-US/93b1bf18-2229-4da8-994f-ddc7c675f62f/message-retention-day-not-working?forum=servbus
So I believe they are removed automatically when the space quota is reached, or by some kind of scheduled task; I'm not sure, though.

The events in Azure Event Hubs are removed after they have been buffered for the defined retention period. You can set the retention period from 1 to 7 days. There is no need to remove them manually.

In addition to all the other answers, there's one more confusing point: Event Hubs may keep messages older than the retention period for up to 30 days, depending on the load on the hub.
For example, the retention period may be 1 day, but if there are few messages they can be kept for longer.
Luckily they are not billed.
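Because expired events may still be delivered, a consumer that must not act on them cannot rely on the hub having removed them. A minimal sketch of a consumer-side check, assuming the consumer knows the configured retention period; `ReceivedEvent` and `RetentionFilter` are hypothetical names (in the `Microsoft.ServiceBus.Messaging` SDK the enqueue time is exposed as `EventData.EnqueuedTimeUtc`).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of a received event: only the fields this filter needs.
public record ReceivedEvent(string Data, DateTime EnqueuedTimeUtc);

public static class RetentionFilter
{
    // Keeps events enqueued no longer than `retention` before nowUtc.
    // Older events may still be delivered by the hub, so they are dropped here instead.
    public static List<ReceivedEvent> WithinRetention(
        IEnumerable<ReceivedEvent> events, TimeSpan retention, DateTime nowUtc) =>
        events.Where(e => nowUtc - e.EnqueuedTimeUtc <= retention).ToList();
}
```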

The best option for consuming an event hub is the EventProcessorHost framework.
It lets you checkpoint any message you have already read. To do that, it stores (in blob storage) the offset of the last checkpointed message so that processing can resume after a shutdown.
https://blogs.msdn.microsoft.com/servicebus/2015/01/16/event-processor-host-best-practices-part-1/
You will probably also need the storage emulator for development, but if you have an Azure account you can use a remote blob storage account instead.
https://learn.microsoft.com/en-us/azure/storage/common/storage-use-emulator

Related

Azure Event Hub - Receiving events Sequentially

I am using the code below to receive events from Azure Event Hubs:
https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-dotnet-framework-getstarted-send#receive-events
I want to handle the requests arriving at the event hub sequentially. For example, if someone sends 5 events in quick succession, I want to finish processing request 1 before I start processing request 2.
How can I handle the events arriving at the event hub sequentially?

Event Hubs uses partitions to enable horizontal scaling of event processing. You can specify the number of partitions, from 1 to 32, when creating the event hub. Message order is guaranteed only within a partition, not across partitions.
If you need order to be maintained, you need to write events only to a specific partition and read only from that same partition. In Azure Event Hubs, partitions are distributed across different instances for high availability, which means a partition may go offline for maintenance and come back online later. So if you need to manage order, you have to write to and read from a single partition, and your application logic may need to handle situations such as that partition going offline.
If you need to manage order, I would recommend an Azure Service Bus queue instead, where order and availability are managed by Service Bus itself.

To make processing sequential, you need to choose the right partition key. From the docs:
If you don't specify a partition key when publishing an event, a round-robin assignment is used. In many cases, using a partition key is a good choice if event ordering is important. When you use a partition key, these partitions require availability on a single node, and outages can occur over time; for example, when compute nodes reboot and patch.
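A partition key preserves order because every event with the same key is routed to the same partition, and order is guaranteed within a partition. The sketch below only illustrates that key-to-partition mapping: Event Hubs uses its own internal hashing, so `PartitionKeyDemo` and the FNV-1a hash are assumptions chosen for illustration, not the service's algorithm.

```csharp
using System;
using System.Text;

public static class PartitionKeyDemo
{
    // Deterministic 32-bit FNV-1a hash of the key's UTF-8 bytes.
    public static uint Fnv1a(string key)
    {
        uint hash = 2166136261;
        foreach (byte b in Encoding.UTF8.GetBytes(key))
        {
            hash ^= b;
            hash *= 16777619u;
        }
        return hash;
    }

    // Maps a key to one of partitionCount partitions: the same key always gives the same partition.
    public static int PartitionFor(string key, int partitionCount)
    {
        if (partitionCount <= 0) throw new ArgumentOutOfRangeException(nameof(partitionCount));
        return (int)(Fnv1a(key) % (uint)partitionCount);
    }
}
```

A deterministic hash is used because `string.GetHashCode` is randomized per process in .NET Core, so the same key could map to a different partition after a restart. The key itself should be whatever must stay in order, such as a device or customer ID.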

Each event in the batch you receive from the event hub has a sequence_number attribute. Since the batch is a list, you can sort it by sequence_number and then process the events in that order.
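Sequence numbers are assigned per partition, so sorting gives a meaningful order only within one partition. A minimal sketch, assuming a hypothetical `BatchEvent` record and a caller-supplied async `handle` delegate: it groups a received batch by partition, orders each group by sequence number, and awaits each handler before starting the next, which also gives the one-at-a-time processing the question asks for.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical event shape: partition id, per-partition sequence number, payload.
public record BatchEvent(string PartitionId, long SequenceNumber, string Data);

public static class SequentialProcessor
{
    // Orders each partition's events by sequence number and handles them one at a time:
    // the next handler call does not start until the previous one has completed.
    public static async Task ProcessInOrderAsync(
        IEnumerable<BatchEvent> batch, Func<BatchEvent, Task> handle)
    {
        foreach (var partition in batch.GroupBy(x => x.PartitionId))
        {
            foreach (var evt in partition.OrderBy(x => x.SequenceNumber))
            {
                await handle(evt);
            }
        }
    }
}
```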

Deleting a message from Azure Queue Service by value

I'm using an Azure Storage Queue Service to track a queue of long-running jobs which are enqueued by multiple disparate clients, but I now have a requirement to remove messages from the queue if any exist that match some given criteria.
I realise that this is somewhat of an anti-pattern for a queue, but the service does provide some functionality in addition to simple queueing (such as Delete Message and Peek Messages), so I thought I'd try to implement it.
The solution I've come up with works, but is not very elegant and is quite inefficient - and I'm wondering if it can be done better - or if I should bin the whole approach and use a mechanism which supports this requirement by design (which would require a fair amount of work across different systems). Here is the simplified code:
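var queue = MethodThatGetsTheAppropriateQueueReference();
await queue.FetchAttributesAsync(); // populates ApproximateMessageCount
if (queue.ApproximateMessageCount.HasValue)
{
    // Get all messages and find any messages to be removed.
    // This makes those messages unavailable to other clients
    // for the visibilityTimeout period.
    // I've set this to the minimum of 1 second - not ideal, though.
    // GetMessagesAsync returns at most 32 messages per call, so read in batches.
    int remaining = queue.ApproximateMessageCount.Value;
    while (remaining > 0)
    {
        var messages = (await queue.GetMessagesAsync(Math.Min(remaining, 32))).ToList();
        if (messages.Count == 0) break; // the count is only approximate
        remaining -= messages.Count;
        // Delete applicable messages, awaiting each call so failures are not lost
        // (List.ForEach with an async call fires and forgets the tasks).
        foreach (var message in messages.Where(x => x.AsString.Contains(someGuid)))
        {
            await queue.DeleteMessageAsync(message);
        }
    }
}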
Note: originally I tried using PeekMessagesAsync() to avoid affecting messages which do not need to be deleted, but this does not give you a PopReceipt, which DeleteMessageAsync() requires.
The questions:
1. Is there a way to do this without pulling ALL of the messages down? (there could be quite a few)
2. If 1 isn't possible, is there a way to get the PopReceipt for a message if we use PeekMessagesAsync()?

Is there a way to do this without pulling ALL of the messages down? (there could be quite a few)
Unfortunately no. You have to Get messages (a maximum of 32 at a time) and analyze the contents of the messages to determine if the message should be deleted.
If 1 isn't possible, is there a way to get the PopReceipt for a message if we use PeekMessagesAsync()?
Again, no. To get a PopReceipt, a message must be dequeued, which is only possible via GetMessagesAsync(). PeekMessagesAsync() simply returns the message without altering its visibility.
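The difference between the two calls can be modelled in a few lines. This is a hypothetical in-memory queue, not the Azure Storage client (`InMemoryQueue` and its methods are illustrative names): `Peek` returns contents only, while `Get` hides each message for a visibility timeout and issues a new pop receipt that `Delete` must present. The service's 32-message limit per call is not modelled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
#nullable enable

public class QueueMessageModel
{
    public string Id { get; init; } = "";
    public string Content { get; init; } = "";
    public DateTime VisibleAtUtc { get; set; }
    public string? PopReceipt { get; set; }
}

// Hypothetical in-memory model of a storage queue's Peek/Get/Delete semantics.
public class InMemoryQueue
{
    private readonly List<QueueMessageModel> _messages = new();
    private int _nextId;

    public int Count => _messages.Count;

    public void Add(string content, DateTime nowUtc) =>
        _messages.Add(new QueueMessageModel { Id = (_nextId++).ToString(), Content = content, VisibleAtUtc = nowUtc });

    // Peek: reads visible messages without changing them, so no pop receipt is issued.
    public List<(string Id, string Content)> Peek(int max, DateTime nowUtc) =>
        _messages.Where(x => x.VisibleAtUtc <= nowUtc).Take(max)
                 .Select(x => (x.Id, x.Content)).ToList();

    // Get: hides each visible message for the timeout and issues a fresh pop receipt.
    public List<(string Id, string Content, string PopReceipt)> Get(
        int max, TimeSpan visibilityTimeout, DateTime nowUtc)
    {
        var result = new List<(string Id, string Content, string PopReceipt)>();
        foreach (var m in _messages.Where(x => x.VisibleAtUtc <= nowUtc).Take(max).ToList())
        {
            var receipt = Guid.NewGuid().ToString();
            m.VisibleAtUtc = nowUtc + visibilityTimeout;
            m.PopReceipt = receipt;
            result.Add((m.Id, m.Content, receipt));
        }
        return result;
    }

    // Delete: succeeds only with the pop receipt from the most recent Get.
    public bool Delete(string id, string popReceipt) =>
        _messages.RemoveAll(x => x.Id == id && x.PopReceipt == popReceipt) > 0;
}
```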
Possible Solution
You may want to look into Service Bus Topics and Subscriptions for this kind of functionality.
What you could do is create a topic where all messages will be sent.
Then you would create 2 subscriptions: in one subscription you set a rule that checks the message for the matching value, and in the other you set a rule that checks for a non-matching value.
Azure Service Bus then checks each arriving message against the rules and delivers it to the appropriate subscription. This way you get a clean separation of messages that should and shouldn't be deleted.
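The routing described above can be pictured as a topic that copies each message to every subscription whose rule matches. This is a hypothetical in-memory sketch (`Topic`, `Subscribe` and `Publish` are illustrative names, not Service Bus APIs). Note that Service Bus SQL filters evaluate message properties rather than the message body, so the value to match would normally be copied into an application property when sending, which is what the sketch assumes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical message: a body plus application properties that rules can inspect.
public record TopicMessage(string Body, IReadOnlyDictionary<string, string> Properties);

public class Topic
{
    private readonly List<(Func<TopicMessage, bool> Rule, List<TopicMessage> Inbox)> _subscriptions = new();

    // Registers a subscription with a filter rule and returns its inbox.
    public List<TopicMessage> Subscribe(Func<TopicMessage, bool> rule)
    {
        var inbox = new List<TopicMessage>();
        _subscriptions.Add((rule, inbox));
        return inbox;
    }

    // Delivers a copy of the message to every subscription whose rule matches.
    public void Publish(TopicMessage message)
    {
        foreach (var (rule, inbox) in _subscriptions)
        {
            if (rule(message)) inbox.Add(message);
        }
    }
}
```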

Using an Azure Service Bus Queue and BrokeredMessage.ScheduledEnqueueTimeUtc to renew subscriptions

I have a subscription model and want to perform renewal-related logic such as issuing a new invoice, sending emails, etc. For example, a user purchases the subscription today, and the renewal is a year from now. I've been using an Azure queue recently, and I think it could work for such renewals.
Is it possible to use the queue with messages pushed using BrokeredMessage.ScheduledEnqueueTimeUtc (http://msdn.microsoft.com/en-us/library/microsoft.servicebus.messaging.brokeredmessage.scheduledenqueuetimeutc.aspx) for such long-term scheduled messages?
I've used it for shorter-term scheduling, like sending notifications in 1 minute's time, and it works great.
This way, I can even have multiple processes listening to the queue and be sure that only one process performs the renewal logic. This would solve a lot of locking-related problems, since that is more or less built into the queue via leasing and related features.

Yes, you can use it for long-term scheduling; scheduled messages have the same guarantees as normal ones. But there are a few things you need to be aware of:
ScheduledEnqueueTimeUtc is the time when the message becomes available on the queue (within hundreds of milliseconds), but not necessarily when it is delivered; that depends on the load and state of the queue. So it's fine for business processes but not for time-sensitive (millisecond) usage. That's not a problem in your case, unless your subscription cancellation is really time-sensitive.
It counts against your storage quota (not really a problem with current quotas, but over a span of years it might become one).
As far as I'm aware, you can't access scheduled messages before ScheduledEnqueueTimeUtc; they are invisible until then.
Extremely awesome source of information on Azure messaging
From a technological perspective it's fine, but since you are thinking in terms of years, I would also consider other potential problems:
Message versioning
What happens if you want to move from Azure to something else (AWS?)
What if next year you decide to replace Azure Service Bus with NServiceBus
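The behaviour described in the points above (the message is stored immediately, counts toward the quota, and cannot be received before its scheduled time) can be modelled as follows. `ScheduledQueue` is a hypothetical in-memory type, not a Service Bus class; with the SDK from the question you would instead set `BrokeredMessage.ScheduledEnqueueTimeUtc`, for example to the purchase date plus one year, before sending.

```csharp
using System;
using System.Collections.Generic;
#nullable enable

// Hypothetical in-memory model of scheduled delivery, not a Service Bus type.
public class ScheduledQueue
{
    private readonly List<(DateTime AvailableAtUtc, string Body)> _items = new();

    // The message is stored (and uses quota) immediately, even though it is not yet receivable.
    public void Schedule(string body, DateTime availableAtUtc) => _items.Add((availableAtUtc, body));

    public int StoredCount => _items.Count;

    // Receives the earliest message whose scheduled time has passed; null if none is due.
    public string? TryReceive(DateTime nowUtc)
    {
        int best = -1;
        for (int i = 0; i < _items.Count; i++)
        {
            bool due = _items[i].AvailableAtUtc <= nowUtc;
            if (due && (best < 0 || _items[i].AvailableAtUtc < _items[best].AvailableAtUtc))
            {
                best = i;
            }
        }
        if (best < 0) return null;
        string body = _items[best].Body;
        _items.RemoveAt(best);
        return body;
    }
}
```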

Duplicate detection in Azure Storage Queue

Is there an elegant way to ensure that a queue always contains distinct messages (not based on a duplicate detection window or any other time period)?
I know that Service Bus queues provide sessions, which could serve my purpose (as mentioned, Service Bus duplicate detection won't help me because it depends on a time window), but I don't want my component to depend on another Azure service just for this feature.
Thanks,

This is not possible to do reliably.
There is just no mechanism that can query a Storage queue and find out if a message with the same contents is already there or was there before. You can try to implement your own logic using some storage table, but that will not be reliable - as the entry into the table may succeed and then entry into the queue may fail - and now you would potentially have bad data in the table.
Your code should always assume that it can retrieve a message containing the same data that was already processed. This is because messages can come back to the queue when workers that are working on them crash or take too long.
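Since duplicates cannot be ruled out, a common way to live with them is to make processing idempotent, so handling the same message twice has no extra effect. A minimal sketch, assuming each message carries a stable key identifying the job (`jobKey`, hypothetical). The in-memory `HashSet` is only for illustration: a real implementation would keep processed keys in durable storage, and the answer's caveat about two separate writes not being atomic applies to that store too.

```csharp
using System;
using System.Collections.Generic;

public class IdempotentHandler
{
    private readonly HashSet<string> _processedKeys = new HashSet<string>();
    private readonly Action<string> _work;

    public IdempotentHandler(Action<string> work) => _work = work;

    // Runs the work the first time a key is seen; returns false for a duplicate.
    public bool Handle(string jobKey)
    {
        if (_processedKeys.Contains(jobKey)) return false;
        _work(jobKey);
        _processedKeys.Add(jobKey); // recorded only after the work succeeded
        return true;
    }
}
```

If the work throws, the key is not recorded, so a redelivered copy of the message is processed again.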

You can use Service Bus. It is like an Azure Storage queue, but it allows messages of 256 KB to 1 MB and supports duplicate detection.

What's the easiest way to schedule a function to run at a specific time using C#

Suppose I have a lot of messages in a database that I want to send, where each row specifies the date and time to send the message and a flag indicating whether it has been sent.
These won't always be at fixed intervals, and more than one message may need to be sent at the same time.
In this case it would just queue them up and send them in the order in which they were created.
Is the easiest thing to do just to have a function that runs over and over again, so that once it completes it simply runs again?
So it would:
Start running and check the current date/time
Check for any unsent messages
Send all the messages due to go out before and up to the time it started running
Start all over again and take the current date/time
My concern is whether it would be horribly inefficient to have a method running continuously, possibly for hours or days, without actually sending a message.
I think the main strain would be on the database, which would constantly be hit with queries.
Is there a better way to schedule something like this?
Or should I just do the above, but make it wait 5 minutes after each run before running again?
Does Workflow 4 offer anything suitable for scheduling perhaps?

You could always do a pre-emptive read of the next time value in the series and do a single sleep until then, instead of looping through short sleeps over and over.
Not sure if that's as elaborate as you want though
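Computing that single wait is straightforward. A minimal sketch, assuming the pending due times have already been read from the table; `NextWake.DelayUntilNext` is a hypothetical helper, and the `maxWait` cap is an addition so that a message inserted later with an earlier due time is still picked up within a bounded delay.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NextWake
{
    // How long to wait before the next check: until the earliest pending due time,
    // never negative, and never longer than maxWait.
    public static TimeSpan DelayUntilNext(IEnumerable<DateTime> pendingDueUtc, DateTime nowUtc, TimeSpan maxWait)
    {
        var list = pendingDueUtc.ToList();
        if (list.Count == 0) return maxWait;
        var untilNext = list.Min() - nowUtc;
        if (untilNext < TimeSpan.Zero) return TimeSpan.Zero;
        return untilNext < maxWait ? untilNext : maxWait;
    }
}
```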

Maybe have a compiled view in the database which returns messages that are not sent (I assume there's a flag on each record?) and for which the intended send time is prior to the current time. Then a Windows Service or console application on a scheduled interval can hit that view (which can be performance-tuned in the database pretty well, I'd imagine) and send any messages returned by it.
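The view's filter can be expressed as the following LINQ query over hypothetical `OutgoingMessage` rows (send time, creation time and a sent flag, as in the question). Ordering by creation time after send time follows the question's requirement that messages due at the same time are sent in the order they were created.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape matching the question's table.
public record OutgoingMessage(int Id, DateTime SendAtUtc, DateTime CreatedUtc, bool Sent);

public static class DueMessages
{
    // Unsent messages whose send time has arrived, earliest first, ties broken by creation time.
    public static List<OutgoingMessage> SelectDue(IEnumerable<OutgoingMessage> rows, DateTime nowUtc) =>
        rows.Where(m => !m.Sent && m.SendAtUtc <= nowUtc)
            .OrderBy(m => m.SendAtUtc)
            .ThenBy(m => m.CreatedUtc)
            .ToList();
}
```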

You could use a windows service to accomplish this. Or if you're using MSSQL, you could even use a SQL Server Agent Job.

Several answers have suggested sending some messages and then sleeping until the next message is due to be sent.
How you sleep in this instance is all important.
In theory, you can tell a thread to sleep for hours; however, if the app (or service) needs to shut down during that time, you're in trouble. The process will be terminated and no cleanup will be executed. This is less than ideal.
Don't get confused between the concept of polling for work to do, and sleeping between polls.
If you have to wait 5 minutes (or 5 hours) before next polling the database, that's fine; however, you never want to sleep for more than a second or two at a time.
What I'd do . . .
Write a Windows service. The service has one active thread that polls the database, checks whether any messages are due to be sent, and sends them.
It then polls again after a configurable delay (1 minute, 5 minutes, 1 hour, whatever suits).
However it will never sleep for more than a second while it's waiting to poll the database.
Can you be sure that new messages are only ever scheduled after the last message already in the DB? If so, you can check the time of the next message and not poll until then.
However, if I find that the next message doesn't need to be sent for 5 hours, is it possible that while I'm waiting a message was added that should be sent in 30 minutes?
If so, you can never trust the "next message time" and skip polling until then; you have to keep polling at your fixed interval. NB: it's worth saying again that your polling interval and your sleep interval are not the same thing.
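In .NET, a cancellable `Task.Delay` is an alternative to waking every second. This swaps the answer's short-sleep approach for a `CancellationToken`: the loop waits the full polling interval, but a shutdown request ends the wait immediately, so the problem of a sleeping thread blocking shutdown does not arise. A minimal sketch, assuming a hypothetical `sendDueMessages` delegate that performs one poll-and-send pass.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Poller
{
    // Runs one poll-and-send pass, then waits pollInterval; stops promptly when cancelled.
    public static async Task RunAsync(Func<CancellationToken, Task> sendDueMessages,
                                      TimeSpan pollInterval, CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            await sendDueMessages(stoppingToken);
            try
            {
                await Task.Delay(pollInterval, stoppingToken);
            }
            catch (OperationCanceledException)
            {
                break; // shutdown requested during the wait
            }
        }
    }
}
```

In a .NET Worker Service, this loop fits inside `BackgroundService.ExecuteAsync`, which receives the host's stopping token.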

How about writing a Windows service that does this for you? The service runs in the background, compares the current time with your database records at a particular interval (e.g. every 5 minutes), sends the emails, and updates the corresponding records to set the Email Sent flag to true.
You could even have a SQL job that selects the records that have not been sent and are due at the current time, and calls a stored procedure that invokes a .NET assembly to send the email. The .NET assembly can use SmtpClient to send emails.

It depends on what you use. Using a scheduled task or a service is perfectly acceptable for the scenario you describe.
You have to be careful though that you do not tie up resources if the process runs too often. It might be more efficient for it to run less often at peak times and more often during off-peak times.

Whatever method you prefer (writing a Windows service, using Task Scheduler, etc.), please bear in mind that your initial suggestion is exactly what is called busy waiting, which you should avoid unless you really know what you're doing.

What you describe isn't that bad if you extend it with "when there are no messages due, select the next time a message will be due and sleep until then".
Alternatively use a DB with "notification support" making the whole thing event-driven i.e. the DB sends you an event whenever a message is due.

You can use this .NET Scheduled Timer to check time intervals and run the function (sending messages) at specific intervals.

I would create a Windows service with a timer. It can sleep for a configured number of seconds and then compare the date/time from the database. If they match, it sends the e-mail and sets the flag in the database for sent e-mails.

I recently implemented a Windows service which uses a class called IntervalHeap from the C5 collection class library. I then added a persistence layer which keeps track of the items and their intervals in case the service is stopped or crashes.
It has been in production for a few months and has been working very well.
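On .NET 6 and later, the built-in `PriorityQueue<TElement, TPriority>` can play the role the answer gives to C5's `IntervalHeap` (a swapped-in type, not the one the answer used): items come out in due-time order. A minimal sketch; `DueTimeScheduler` is a hypothetical name, and the persistence layer the answer mentions is not shown.

```csharp
using System;
using System.Collections.Generic;

// Items come out in due-time order; requires .NET 6+ for PriorityQueue<TElement, TPriority>.
public class DueTimeScheduler
{
    private readonly PriorityQueue<string, DateTime> _queue = new();

    public int Count => _queue.Count;

    public void Add(string item, DateTime dueUtc) => _queue.Enqueue(item, dueUtc);

    // Removes and returns every item due at or before nowUtc, earliest first.
    public List<string> TakeDue(DateTime nowUtc)
    {
        var due = new List<string>();
        while (_queue.TryPeek(out var item, out var dueUtc) && dueUtc <= nowUtc)
        {
            _queue.Dequeue();
            due.Add(item);
        }
        return due;
    }
}
```

`PriorityQueue` is not stable, so items with identical due times are not guaranteed to come out in insertion order; using a (due time, sequence number) tuple as the priority would fix that.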

We do this at a financial institution to send out internal e-mails from our intranet applications. Once every 15 minutes, scheduling software (an enterprise scheduler, not a Windows scheduled task) fires off a job. We have a view called PendingEmail on top of a table called EmailQueue that lists only what needs to be sent in the current run (the EmailQueue table has a PopDate, which is the effective date on which the e-mail should be sent). The application sends e-mails for whatever it finds in the PendingEmail view.
The job sends out a maximum batch size of emails every 15 minutes, marking each record with whether it was successfully sent or whether there was an error (invalid email address, etc.) and what the Exception was, and whether we would like to try re-sending it the next time around. It updates that EmailQueue table all at once, not each record individually. The batch size was put in place to prevent the job from taking more than 15 minutes and stomping on itself.
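The batch-and-record pattern can be sketched as follows. Everything here is hypothetical (`PendingEmail`, `SendOutcome`, the `send` delegate), and treating `FormatException` as "do not retry" is an assumption for illustration. One run takes at most `batchSize` due items, records a per-item outcome including the exception message, and returns all outcomes so they can be written back in a single update.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
#nullable enable

public record PendingEmail(int Id, string To, DateTime PopDateUtc);
public record SendOutcome(int Id, bool Sent, string? Error, bool Retry);

public static class EmailBatch
{
    // Sends at most batchSize due emails (oldest PopDate first) and records each outcome.
    public static List<SendOutcome> Run(IEnumerable<PendingEmail> pending, DateTime nowUtc,
                                        int batchSize, Action<PendingEmail> send)
    {
        var outcomes = new List<SendOutcome>();
        foreach (var email in pending.Where(e => e.PopDateUtc <= nowUtc)
                                     .OrderBy(e => e.PopDateUtc)
                                     .Take(batchSize))
        {
            try
            {
                send(email);
                outcomes.Add(new SendOutcome(email.Id, true, null, false));
            }
            catch (FormatException ex)
            {
                // Assumed: a malformed address will not succeed on retry.
                outcomes.Add(new SendOutcome(email.Id, false, ex.Message, false));
            }
            catch (Exception ex)
            {
                // Assumed: other failures (network, server) may succeed next run.
                outcomes.Add(new SendOutcome(email.Id, false, ex.Message, true));
            }
        }
        return outcomes;
    }
}
```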
I don't think polling every so often really consumes many resources, unless you do it every 5 seconds or so. If you're sending out millions of messages, you may need to distribute the work across multiple machines. If you're going to write custom code, I would use a Timer rather than Thread.Sleep() and set the Timer to tick every 5 minutes or whatever interval you'd like. On every tick an event fires; you subscribe to it to start the routine that sends your messages.
See this post on Thread.Sleep() vs. the Timer class:
Compare using Thread.Sleep and Timer for delayed execution
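A minimal sketch of the Timer suggestion using `System.Threading.Timer`, assuming a hypothetical `sendDueMessages` callback. The `Interlocked` guard is an addition not in the answer: it skips a tick while the previous run is still going, so the job cannot overlap itself (the same concern the batch size addresses in the answer above).

```csharp
using System;
using System.Threading;

public sealed class TickingSender : IDisposable
{
    private readonly Action _sendDueMessages;
    private readonly Timer _timer;
    private int _running; // 0 = idle, 1 = a run is in progress

    public TickingSender(Action sendDueMessages, TimeSpan interval)
    {
        _sendDueMessages = sendDueMessages;
        // First tick after one interval, then every interval, on a thread-pool thread.
        _timer = new Timer(_ => OnTick(), null, interval, interval);
    }

    private void OnTick()
    {
        // Skip this tick if the previous run has not finished yet.
        if (Interlocked.CompareExchange(ref _running, 1, 0) != 0) return;
        try
        {
            _sendDueMessages();
        }
        catch (Exception)
        {
            // Log in real code: an unhandled exception on a timer thread would end the process.
        }
        finally
        {
            Volatile.Write(ref _running, 0);
        }
    }

    public void Dispose() => _timer.Dispose();
}
```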

Many databases allow events to be fired by triggers, e.g. 'after insert'. The trigger is run by the database process/thread, and the actions it can take are database-specific. It could, for instance, call a C or Java procedure that signals a named semaphore on which your emailer is waiting, or execute an emailer app directly. Look up 'trigger' or 'create trigger' for your database.
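Inside a single process, the "signal instead of poll" idea can be sketched with `SemaphoreSlim`, which here stands in for the named semaphore the answer mentions; a cross-process signal from a database trigger is outside this sketch. `WorkSignal` is a hypothetical name. The emailer waits on the semaphore with a timeout as a fallback poll, and whoever inserts a message releases it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class WorkSignal
{
    private readonly SemaphoreSlim _signal = new SemaphoreSlim(0);

    // Called by the producer (standing in for the trigger) after inserting a message.
    public void Notify() => _signal.Release();

    // Waits until notified or until the fallback timeout elapses; true means notified.
    public Task<bool> WaitForWorkAsync(TimeSpan fallbackPoll, CancellationToken token) =>
        _signal.WaitAsync(fallbackPoll, token);
}
```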
