Duplicate detection in Azure Storage Queue - C#

I want to know if there is any elegant way to ensure that a queue always holds distinct messages (nothing related to a duplicate detection window or any time period, for that matter)?
I know that Service Bus queues provide a sessions concept (and, as mentioned, Service Bus duplicate detection won't help me because it depends on a time period), which could serve my purpose, but I don't want my component to depend on another Azure service just for this one feature.
Thanks,

This is not possible to do reliably.
There is simply no mechanism to query a Storage queue and find out whether a message with the same contents is already there or was there before. You can try to implement your own logic using a storage table, but that will not be reliable: the insert into the table may succeed and the subsequent insert into the queue may fail, leaving bad data in the table.
Your code should always assume that it can retrieve a message containing the same data as one that was already processed, because messages come back to the queue when the workers processing them crash or take too long.
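To make that concrete, here is a minimal sketch of the best-effort table approach, assuming the Azure.Data.Tables and Azure.Storage.Queues packages; the table and queue names are hypothetical. Note how the failure mode described above survives: if the send fails after the marker row is written, the marker is orphaned.

using System;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;
using Azure;
using Azure.Data.Tables;
using Azure.Storage.Queues;

public static class BestEffortDedup
{
    // Hypothetical names -- substitute your own connection string, table, and queue.
    static readonly TableClient Table = new TableClient("<connection-string>", "seenmessages");
    static readonly QueueClient Queue = new QueueClient("<connection-string>", "work");

    public static async Task EnqueueOnceAsync(string messageBody)
    {
        // Key the marker row by a hash of the message contents.
        string hash = Convert.ToHexString(
            SHA256.HashData(Encoding.UTF8.GetBytes(messageBody)));

        try
        {
            // AddEntityAsync fails with 409 Conflict if the row already exists.
            await Table.AddEntityAsync(new TableEntity("dedup", hash));
        }
        catch (RequestFailedException ex) when (ex.Status == 409)
        {
            return; // a message with identical contents was enqueued before
        }

        // If this call fails, the marker row is orphaned -- exactly the
        // reliability gap described above.
        await Queue.SendMessageAsync(messageBody);
    }
}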

You can use Service Bus. It is like an Azure Storage queue, but it allows messages of 256 KB to 1 MB (depending on tier) and supports duplicate detection.

Related

Requeue IBM MQ Message

We are running multiple instances of a Windows service that reads messages from a topic, runs a report, converts the results into a PDF, and emails it to a user. In case of exceptions we simply log the exception and move on.
The use case we want to handle is when the service is shut down we want to preserve the jobs that are currently running so they can be reprocessed by another instance of the service or when the service is restarted.
Is there a way of requeueing a message? The hacky solution would be to just republish the message from the consuming service, but there must be another way.
When incoming messages are processed, their data is put into an internal queue structure (not a message queue) and processed in batches on parallel threads, so the IBM MQ transaction facilities seem hard to use here. Is that what I should be using, though?
Your requirement seems hard to implement unless you get rid of the "internal queue structure (not a message queue)", at least if that structure is not based on transaction-oriented middleware. MQ queues/topics work well for multi-threaded consumers, so it is not apparent what you gain from the intermediate step of moving the data to just another queue. If you start your transaction by consuming the message from MQ under syncpoint, you can have it rolled back when something goes wrong.
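A minimal sketch of that transactional consume, assuming the IBM MQ classes for .NET (IBM.WMQ namespace); the queue manager name and the Process method are placeholders:

using System;
using IBM.WMQ;

class TransactionalConsumer
{
    static void Main()
    {
        // Hypothetical connection details -- adjust to your environment.
        var qmgr = new MQQueueManager("QM1");
        var queue = qmgr.AccessQueue("THE.REPORTING.QUEUE",
            MQC.MQOO_INPUT_AS_Q_DEF + MQC.MQOO_FAIL_IF_QUIESCING);

        var gmo = new MQGetMessageOptions();
        // SYNCPOINT makes the get part of a unit of work.
        gmo.Options |= MQC.MQGMO_SYNCPOINT | MQC.MQGMO_WAIT;
        gmo.WaitInterval = 5000; // milliseconds

        var msg = new MQMessage();
        try
        {
            queue.Get(msg, gmo);
            Process(msg.ReadString(msg.MessageLength)); // run report, build PDF, email...
            qmgr.Commit();   // only now is the message permanently removed
        }
        catch (Exception)
        {
            qmgr.Backout();  // message returns to the queue for another instance
            throw;
        }
    }

    static void Process(string body) { /* hypothetical report/PDF/email work */ }
}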
If I understood your use case correctly, you can use Durable subscriptions:
Durable subscriptions continue to exist when a subscribing application's connection to the queue manager is closed.
The details are explained in DEFINE SUB (create a durable subscription). Example:
DEFINE QLOCAL(THE.REPORTING.QUEUE) REPLACE DEFPSIST(YES)
DEFINE TOPIC(THE.REPORTING.TOPIC) REPLACE +
TOPICSTR('/Path/To/My/Interesting/Thing') DEFPSIST(YES) DURSUB(YES)
DEFINE SUB(THE.REPORTING.SUB) REPLACE +
TOPICOBJ(THE.REPORTING.TOPIC) DEST(THE.REPORTING.QUEUE)
Your service instances can consume now from THE.REPORTING.QUEUE.
While I readily admit that my knowledge is shaky, from what I understood from IBM's [sketchy, inadequate, obtuse] documentation, there really is no good built-in solution. With transactions, the queue manager assumes all is well unless it receives a rollback request, and when it does, it rolls back to a syncpoint; so if you're trying to roll back one message but two other messages have completed in the meantime, it will roll back all three.
We ended up coding our own solution updating the way we’re logging messages and marking them as completed in the DB. Then on both startup and shutdown we find the uncompleted messages and programmatically publish them back to the queue, limiting the DB search by machine name so if we have multiple instances of the service running they won’t duplicate message processing.
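A rough sketch of that recovery step, assuming a hypothetical MessageLog table with Completed and MachineName columns and an existing publisher method:

using System;
using System.Data.SqlClient;

class StartupRecovery
{
    // On startup/shutdown: find this machine's uncompleted messages and
    // republish them. Table, columns, and PublishToQueue are placeholders.
    static void RepublishUncompleted(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            @"SELECT Payload FROM dbo.MessageLog
              WHERE Completed = 0 AND MachineName = @machine", conn))
        {
            cmd.Parameters.AddWithValue("@machine", Environment.MachineName);
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    PublishToQueue(reader.GetString(0)); // hypothetical publisher
                }
            }
        }
    }

    static void PublishToQueue(string payload) { /* existing publish logic */ }
}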

Trying to understand Azure Service Bus Sessions

So I am trying to understand Azure Service Bus session IDs for creating FIFO ordering in my queue.
The idea I have is pretty straightforward, but I don't know if it's the right way to think when it comes to FIFO.
What I am thinking of are these steps for creating FIFO in my queue:
TO CREATE:
First: Check the queue for messages and their session IDs to expose the ID hierarchy.
Next: Create a new message with the latest session ID in the hierarchy, incremented by 1 (+1).
Next: Send to Service Bus Queue.
TO READ:
First: Check the queue for messages and their session IDs to expose the ID hierarchy.
Next: Read-and-delete the message with the earliest session ID in the hierarchy.
Next: Process...
Keep in mind I haven't included error handling and such (for example, in the read-and-delete part) because I have already figured that out.
So the question is: is this the right way of thinking? And also, how do I achieve this in C#? I can't really find something that explains this concept in a straightforward manner.
To elaborate:
Let's say you have 9 total queue messages grouped into three sessions, with session IDs 1, 2, and 3. Each group of 3 messages will then be processed in order (first in, first out).
However, parallelism can still occur between sessions - or between each group of messages - if there is more than one queue listener.
Each listener/processor of the queue holding all 9 messages gets a lock on all the messages that share the same session ID and then processes them one at a time until the session is complete (usually when there are no more messages left in the queue with that session ID, unless you turn off AutoComplete and decide to close the session manually whenever you deem it necessary).
Hopefully that makes sense.
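As an illustration (my sketch, not the linked sample), here is what that looks like with the Microsoft.Azure.ServiceBus client mentioned in the next answer, against a queue created with sessions enabled; the connection string, queue name, and handler body are placeholders:

using System;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;

// Sender: messages sharing a SessionId are delivered FIFO within that session.
var sender = new QueueClient("<connection-string>", "orders");
await sender.SendAsync(new Message(Encoding.UTF8.GetBytes("step 1"))
{
    SessionId = "session-1" // all "session-1" messages are processed in order
});

// Receiver: the pump locks one session per handler; with multiple listeners,
// different sessions are still processed in parallel.
var receiver = new QueueClient("<connection-string>", "orders");
receiver.RegisterSessionHandler(
    async (session, message, token) =>
    {
        Console.WriteLine($"{session.SessionId}: {Encoding.UTF8.GetString(message.Body)}");
        await session.CompleteAsync(message.SystemProperties.LockToken);
    },
    new SessionHandlerOptions(args => Task.CompletedTask)
    {
        MaxConcurrentSessions = 3, // parallelism across sessions, not within one
        AutoComplete = false
    });

Console.ReadKey(); // keep the process alive while the message pump runs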
So I am trying to understand Azure Service Bus session IDs for creating FIFO ordering in my queue.
Assuming you've gone through the documentation on Message Sessions and haven't skipped the linked sample for Microsoft.Azure.ServiceBus and WindowsAzure.ServiceBus, you'll notice that the latter sample has an extensive explanation on how sessions operate.
You don't "create" a FIFO queue; you just use a queue with sessions, and that's how you achieve what you need. Sessions have their use cases. One of them is your scenario, where you have one indefinite session with a single session ID to keep your messages in order.
Note: be aware of the limitations (no parallel processing, which will affect your throughput).
how do I achieve this in C#? I can't really find something that explains this concept in a straightforward manner.
The older client sample provides an answer to your implementation related question with very solid breakdown and explanations (WindowsAzure.ServiceBus).

Amazon SQS with C# and SQL Server

I have been asked to use Amazon SQS in our new system. Our business depends on routing tasks/requests from clients to our support agents. Once a client submits a task/request, it should be queued in my SQL Server database, and each queued task should be assigned to a non-busy agent, because the flow says an agent can handle only one task/request at a time. So, if 10 tasks/requests come into the system, all should be queued; the system should then forward each task to an agent who is currently free, and once that agent solves his task, he should get the next one, if any. Otherwise, the system should wait until some agent finishes his current task before assigning him a new one. And, of course, there must not be any duplication in task/request handling... and so on.
What do I need now?
A simple reference that clarifies what Amazon SQS is, since this is my first time using a queuing service.
How can I use it with C# and SQL Server? I have read this topic, but I still feel that something is missing, as I am not able to get started. I am just aiming at a way to process a task at run time and assign it to an agent, then close it and get a new one, as I explained above.
Asking us to design a system based on a paragraph of prose is a pretty tall order.
SQS is simply a cloud queue system. Based on your description, I'm not sure it would make your system any better.
First off, you are already storing everything in your database, so why do you need to store things in the queue as well? If you want queue semantics while keeping the data in your database, you could consider SQL Server Service Broker (https://technet.microsoft.com/en-us/library/ms345108(v=sql.90).aspx#sqlsvcbr_topic2), which supports queues within SQL. Alternatively, unless your scale is pretty high (maybe 100+ tasks/second), you could just query the table for tasks that need to be picked up.
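If you go the "query the table" route, a common claim pattern (a sketch under assumed table and column names, not part of the answer) uses UPDLOCK and READPAST hints so that two agents can never pick up the same task:

using System.Data.SqlClient;

class TaskQueue
{
    // Atomically claim the oldest queued task for an agent. READPAST skips
    // rows locked by concurrent agents, so no task is claimed twice.
    public static int? ClaimNextTask(string connectionString, int agentId)
    {
        const string sql = @"
            WITH next AS (
                SELECT TOP (1) TaskId, Status, AgentId
                FROM dbo.Tasks WITH (ROWLOCK, UPDLOCK, READPAST)
                WHERE Status = 'Queued'
                ORDER BY CreatedAt)
            UPDATE next SET Status = 'InProgress', AgentId = @agentId
            OUTPUT inserted.TaskId;";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@agentId", agentId);
            conn.Open();
            object result = cmd.ExecuteScalar();
            return result == null ? (int?)null : (int)result;
        }
    }
}

An index on (Status, CreatedAt) keeps this cheap even as the table grows.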
Secondly, it sounds like you might have a workflow around tasks that could extend to more than just a single queue for agents to pick them up. For example, do you have any follow-up on the tasks (emailing clients to ask them how their service was, putting a task on hold until a client gets back to you, etc.)? If so, you might want to look at Simple Workflow Service (https://aws.amazon.com/swf/), or since you are already on Microsoft's stack, you could look at Windows Workflow (https://msdn.microsoft.com/en-us/library/ee342461.aspx).
BTW, SQS does not guarantee exactly-once delivery by default, so if duplication is a big problem for you, you will either have to do your own deduplication or use FIFO queues (http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/FIFO-queues.html), which support deduplication but are limited to 300 transactions/second (roughly 100 messages/second, accounting for the standard send -> receive -> delete API calls; with batching that number could be much higher, but given your use case it doesn't sound like you could use batching without a lot of work).
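For completeness, here is a sketch of sending to a FIFO queue with the AWS SDK for .NET; the queue URL and IDs are placeholders. Messages sharing a MessageGroupId are delivered in order, and SQS drops messages whose MessageDeduplicationId was already seen within the five-minute deduplication window:

using Amazon.SQS;
using Amazon.SQS.Model;

var sqs = new AmazonSQSClient();

await sqs.SendMessageAsync(new SendMessageRequest
{
    QueueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/tasks.fifo", // placeholder
    MessageBody = "{\"taskId\": 42}",
    MessageGroupId = "support-tasks",   // messages in a group are FIFO-ordered
    MessageDeduplicationId = "task-42"  // duplicates within 5 minutes are dropped
});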

Generic object for passing data to Queue

I need to create a WebJob that handles some work that is not super time-sensitive. I use both DocumentDb and Azure Table Storage, so I deal with denormalized data that needs to be handled by some backend process to keep it consistent.
I have multiple uses for this WebJob so I'm trying to figure out the best way to pass data to it.
Which is the right approach for sending requests to my WebJob?
Create a generic object with properties that can store the data while the request is in the queue. In other words, some type of container that I use for transporting data through the queue.
Persist the data in some backend database and send a simple command via the queue?
One example I can think of is that I have a list of 15 entities that I need to update in my Azure Table Storage. This may take multiple read/write operations which will take time.
If I use approach #1, I'd "package" the list of 15 entities in an object and put it in a message in my queue. Depending on the situation, some messages may get a bit fat, which concerns me.
If I use approach #2, I'd save the IDs of the entities in a table somewhere (SQL or Azure Table Storage) and send some type of batch ID via a message. Then my WebJob would receive the batch ID, first retrieve the entities, and then process them. Though this approach seems like the better one, I'm afraid it will be pretty slow.
Please keep in mind that the primary use of this particular WebJob is to speed up response times for end users in situations that require multiple backend operations. I'm trying to handle them in a WebJob so that what's time-sensitive gets processed right away and the other "not-so-time-sensitive" operations can be handled by the WebJob.
I want my solution to be both very robust and as fast as possible -- though the job is not highly time sensitive, I still want to process the backend job quickly.
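As a sketch of approach #2, the WebJobs SDK can bind a JSON queue message straight to a POCO, so the message stays a thin command; the BatchCommand type and queue name here are hypothetical:

using Microsoft.Azure.WebJobs;

// Thin command message: only the batch ID travels through the queue.
public class BatchCommand
{
    public string BatchId { get; set; }
}

public class Functions
{
    // The WebJobs SDK deserializes the JSON queue message into BatchCommand.
    public static void ProcessBatch([QueueTrigger("batch-commands")] BatchCommand cmd)
    {
        // Hypothetical: look up the entity IDs stored under cmd.BatchId
        // (SQL or Azure Table Storage), then do the slow read/write work here.
    }
}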

using an Azure Service Bus Queue and BrokeredMessage.ScheduledEnqueueTimeUtc to renew subscriptions

I have a subscription model and want to perform renewal-related logic like issuing a new invoice, sending emails, etc. For example, a user would purchase the subscription today, and the renewal is in a year's time. I've been using an Azure queue recently and think it would apply to such a renewal.
Is it possible to use the Azure queue by pushing messages with BrokeredMessage.ScheduledEnqueueTimeUtc (http://msdn.microsoft.com/en-us/library/microsoft.servicebus.messaging.brokeredmessage.scheduledenqueuetimeutc.aspx) for such long-term scheduled messages?
I've used it for shorter terms, like sending notifications in one minute's time, and it works great.
This way, I can even have multiple processes listening to the queue and be sure that only one process will perform the renewal logic. This would solve a lot of locking-related problems, as that is kind of built into the Azure queue via leasing and related features.
Yes, you can use it for long-term scheduling; scheduled messages have the same guarantees as normal ones. But there are a few things you need to be aware of:
ScheduledEnqueueTimeUtc is the time when the message becomes available (within hundreds of milliseconds) on the queue, not necessarily when it is delivered; that depends on the load and state of the queue. So it's fine for business processes but not for time-sensitive (millisecond) usage. Not a problem in your case, unless your subscription cancellation is really time-sensitive.
It counts against your storage quota (not really a problem with current quotas, but if you think in terms of years, it might become one).
As far as I'm aware, you can't access scheduled messages before ScheduledEnqueueTimeUtc; they are invisible.
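For reference, scheduling a message far in the future is a single property on the message. A minimal sketch, assuming the WindowsAzure.ServiceBus client (Microsoft.ServiceBus.Messaging) that BrokeredMessage belongs to; the queue name and payload are placeholders:

using System;
using Microsoft.ServiceBus.Messaging;

var client = QueueClient.CreateFromConnectionString("<connection-string>", "renewals");

var message = new BrokeredMessage("subscription-1234") // hypothetical payload
{
    // Invisible until this time; it then becomes available on the queue,
    // though not necessarily delivered at that exact instant.
    ScheduledEnqueueTimeUtc = DateTime.UtcNow.AddYears(1)
};

client.Send(message);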
An extremely awesome source of information on Azure messaging.
From a technological perspective it's fine, but in your case I would also think about other potential problems if you're thinking in terms of years:
Message versioning
What happens when you would like to move from Azure to something else (AWS?)
What if you decide next year to swap Azure Service Bus for NServiceBus
