I'm using an Azure Storage Queue Service to track a queue of long-running jobs which are enqueued by multiple disparate clients, but I now have a requirement to remove messages from the queue if any exist matching some given criteria.
I realise that this is somewhat of an anti-pattern for a queue, but the service does provide some functionality in addition to simple queueing (such as Delete Message and Peek Messages) so I thought I'd try to implement it.
The solution I've come up with works, but is not very elegant and is quite inefficient - and I'm wondering if it can be done better - or if I should bin the whole approach and use a mechanism which supports this requirement by design (which would require a fair amount of work across different systems). Here is the simplified code:
var queue = MethodThatGetsTheAppropriateQueueReference();
await queue.FetchAttributesAsync(); // populates the current queue length
if (queue.ApproximateMessageCount.HasValue)
{
    // Get all messages and find any messages to be removed.
    // This makes those messages unavailable to other clients
    // for the visibilityTimeout period.
    // I've set this to the minimum of 1 second - not ideal though
    var messages = await queue.GetMessagesAsync(queue.ApproximateMessageCount.Value);
    var messagesToDelete = messages.Where(x => x.AsString.Contains(someGuid));
    // Delete applicable messages, awaiting every delete so failures are observed
    await Task.WhenAll(messagesToDelete.Select(x => queue.DeleteMessageAsync(x)));
}
Note: I originally tried using PeekMessagesAsync() to avoid affecting messages which do not need to be deleted, but this does not give you a PopReceipt, which is required by DeleteMessageAsync().
The questions:
1. Is there a way to do this without pulling ALL of the messages down? (there could be quite a few)
2. If 1 isn't possible, is there a way to get the PopReceipt for a message if we use PeekMessagesAsync()?
Is there a way to do this without pulling ALL of the messages down? (there could be quite a few)
Unfortunately no. You have to Get messages (a maximum of 32 at a time) and analyze the contents of the messages to determine if the message should be deleted.
If 1 isn't possible, is there a way to get the PopReceipt for a message if we use PeekMessagesAsync()?
Again, no. To get a PopReceipt, a message must be dequeued, which is only possible via GetMessagesAsync(). PeekMessagesAsync() simply returns the message without altering its visibility.
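To make the "32 at a time" loop concrete, here is a minimal sketch in Python. It runs against a hypothetical in-memory FakeQueue rather than the Azure SDK, so the class, its method names and the receipt format are all assumptions made for illustration. It only shows the shape of the loop: get a batch, inspect each body, delete the matches by pop receipt, and leave the rest to reappear once their visibility timeout elapses.

```python
MAX_BATCH = 32  # a single Get Messages call returns at most 32 messages


class FakeQueue:
    """Hypothetical in-memory stand-in for a storage queue (not the Azure SDK)."""

    def __init__(self, bodies):
        self.visible = list(enumerate(bodies))  # (id, body) pairs
        self.hidden = {}  # pop receipt -> (id, body) for dequeued messages

    def get_messages(self, n):
        if not 1 <= n <= MAX_BATCH:
            raise ValueError("n must be between 1 and 32")
        batch, self.visible = self.visible[:n], self.visible[n:]
        result = []
        for msg_id, body in batch:
            receipt = f"receipt-{msg_id}"
            self.hidden[receipt] = (msg_id, body)
            result.append((receipt, body))
        return result

    def delete_message(self, receipt):
        del self.hidden[receipt]

    def expire_visibility_timeouts(self):
        """Simulate the timeout elapsing: undeleted messages become visible again."""
        self.visible.extend(sorted(self.hidden.values()))
        self.hidden.clear()


def delete_matching(queue, needle):
    """Walk the queue in batches of 32 and delete every message containing needle."""
    deleted = 0
    while True:
        batch = queue.get_messages(MAX_BATCH)
        if not batch:
            break
        for receipt, body in batch:
            if needle in body:
                queue.delete_message(receipt)
                deleted += 1
    return deleted
```

With a real queue, the visibility timeout needs to be long enough for the whole sweep to finish; otherwise messages from early batches reappear and are fetched again before the loop ends.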
Possible Solution
You may want to look into Service Bus Topics and Subscriptions for this kind of functionality.
What you could do is create a topic where all messages will be sent.
Then you would create 2 subscriptions: in one subscription you set a rule that checks a message property for the matching value, and in the other subscription you set a rule that checks that the property does not match. (Subscription rules are evaluated against message properties, not the message body, so the value has to be sent as a property.)
Azure Service Bus checks each arriving message against the rules and pushes it into the appropriate subscription. This way you get a clean separation between messages that should and shouldn't be deleted.
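The routing idea can be sketched without the Service Bus SDK. The Topic class below is a hypothetical in-memory model, and the jobId property and subscription names are made up for illustration. Real subscriptions would use SQL or correlation filters instead of Python lambdas.

```python
class Topic:
    """Hypothetical in-memory topic that copies messages into matching subscriptions."""

    def __init__(self):
        self.subscriptions = {}  # name -> (rule, delivered messages)

    def add_subscription(self, name, rule):
        self.subscriptions[name] = (rule, [])

    def send(self, message):
        # Each subscription whose rule accepts the properties receives the message.
        for rule, inbox in self.subscriptions.values():
            if rule(message["properties"]):
                inbox.append(message)

    def inbox(self, name):
        return self.subscriptions[name][1]


def build_topic(cancelled_job_id):
    """Create one subscription for matching messages and one for everything else."""
    topic = Topic()
    topic.add_subscription("to-discard", lambda p: p.get("jobId") == cancelled_job_id)
    topic.add_subscription("to-process", lambda p: p.get("jobId") != cancelled_job_id)
    return topic
```

Workers would then read only from the "to-process" subscription, while the "to-discard" subscription can be drained or left to expire.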
Related
I am developing a service in a multiservice architecture using RabbitMQ and the MassTransit library.
The service receives transactions via a consumer. Filtering rules (set in a JSON configuration file and imported into the service via Options) determine the address to which each transaction's information must be sent, and the item is published to a separate queue for later sending.
In the consumer of that sending queue, I just send the data to the address that was specified for the transaction.
Now there is a need to send data in batches, and MassTransit's batch consumer functionality could help here.
But dispatching is difficult. For example, the consumer receives 4 transactions: 2 of them need to be sent to one address and the other 2 to another. In the code, I build one array of transactions per address and try to send each. If both arrays are sent successfully, everything is fine. If both arrays fail, the entire batch goes to retry, which is also good. But if one of the arrays is sent successfully and the other is not, the entire batch is still retried.
The actual question is: is it possible to create two separate queues for one entity (using one interface) and send data to each of them separately according to rules? Or is there another way to solve this problem that would divide transactions into batches according to the sending address?
is it possible to create two separate queues for one entity
I would ask that you try to simplify this process. If the architecture is so confusing that it takes readers 30 minutes to understand the question, it's too complex. Think about supporting this code in 12 months' time.
However, an option is to use a Batch that sends to a Batch.
The first Batch reads a custom message header (say __filterby) to split the messages into two different queues (endpoints).
The code then re-batches to a dedicated endpoint/consumer based on the logic. This means one endpoint/queue. Here is some pseudo code to explain what I mean.
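public async Task Consume(ConsumeContext<Batch<OrderAudit>> context)
{
    // Pseudo code: split the batch by the custom "__filterby" header.
    var arrayA = context.Message
        .Where(m => m.Headers.Get<string>("__filterby") == "arraya")
        .ToList();
    // Send arrayA to the endpoint/consumer dedicated to IArrayA

    var arrayB = context.Message
        .Where(m => m.Headers.Get<string>("__filterby") == "arrayb")
        .ToList();
    // Send arrayB to the endpoint/consumer dedicated to IArrayB
}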
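The same split can be shown as a plain, library-free Python sketch. The dictionary fields and the send callback are hypothetical and not part of MassTransit. The point is that each address group is sent on its own, so only the group that failed has to be retried or re-published.

```python
from collections import defaultdict


def dispatch_batch(transactions, send):
    """Group a batch by address, send each group separately, return what failed."""
    groups = defaultdict(list)
    for tx in transactions:
        groups[tx["address"]].append(tx)

    failed = []
    for address, group in groups.items():
        try:
            send(address, group)
        except Exception:
            # Only this address's transactions need to be retried.
            failed.extend(group)
    return failed
```

One possible follow-up (an assumption, not something the question confirms) is to publish the returned failures to a per-address queue instead of throwing, so the successful group is never re-sent.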
Also, this feels close to having a RabbitMQ exchange direct traffic to multiple queues based on a topic/routing_key. You could re-work the solution to fit this pattern.
References that might help
https://masstransit-project.com/troubleshooting/common-gotchas.html#sharing-a-queue
https://masstransit-project.com/usage/producers.html#message-initializers
https://www.rabbitmq.com/tutorials/tutorial-five-python.html
We are running multiple instances of a windows service that reads messages from a Topic, runs a report, then converts the results into a PDF and emails them to a user. In case of exceptions we simply log the exception and move on.
The use case we want to handle is when the service is shut down we want to preserve the jobs that are currently running so they can be reprocessed by another instance of the service or when the service is restarted.
Is there a way of requeueing a message? The hacky solution would be to just republish the message from the consuming service, but there must be another way.
When incoming messages are processed, their data is put in an internal queue structure (not a message queue) and processed in batches of parallel threads, so the IbmMq transaction stuff seems hard to implement. Is that what I should be using though?
Your requirement seems hard to implement unless you get rid of the "internal queue structure (not a message queue)", since it is not based on transaction-oriented middleware. The MQ queue / topic works well for multi-threaded consumers, so it is not apparent what you gain from the intermediate step of moving the data to just another queue. If you start your transaction by consuming the message from MQ, you can have it rolled back when something goes wrong.
If I understood your use case correctly, you can use Durable subscriptions:
Durable subscriptions continue to exist when a subscribing application's connection to the queue manager is closed.
The details are explained in DEFINE SUB (create a durable subscription). Example:
DEFINE QLOCAL(THE.REPORTING.QUEUE) REPLACE DEFPSIST(YES)
DEFINE TOPIC(THE.REPORTING.TOPIC) REPLACE +
TOPICSTR('/Path/To/My/Interesting/Thing') DEFPSIST(YES) DURSUB(YES)
DEFINE SUB(THE.REPORTING.SUB) REPLACE +
TOPICOBJ(THE.REPORTING.TOPIC) DEST(THE.REPORTING.QUEUE)
Your service instances can now consume from THE.REPORTING.QUEUE.
While I readily admit that my knowledge is shaky, from what I understood from IBM’s [sketchy, inadequate, obtuse] documentation there really is no good built-in solution. With transactions the Queue Manager assumes all is well unless it receives a rollback request, and when it does it rolls back to a syncpoint, so if you’re trying to roll back one message but two other messages have completed in the meantime it will roll back all three.
We ended up coding our own solution updating the way we’re logging messages and marking them as completed in the DB. Then on both startup and shutdown we find the uncompleted messages and programmatically publish them back to the queue, limiting the DB search by machine name so if we have multiple instances of the service running they won’t duplicate message processing.
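A rough sketch of that bookkeeping, using Python's built-in sqlite3 module in place of the real database. The table layout, column names and the publish callback are all hypothetical. It records a message when work starts, marks it completed when done, and on startup or shutdown republishes whatever this machine left unfinished.

```python
import sqlite3


def open_log():
    """Create an in-memory message log (a real service would use its own DB)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE message_log ("
        "id TEXT PRIMARY KEY, machine TEXT, body TEXT, completed INTEGER DEFAULT 0)"
    )
    return db


def record_started(db, msg_id, machine, body):
    db.execute(
        "INSERT INTO message_log (id, machine, body) VALUES (?, ?, ?)",
        (msg_id, machine, body),
    )


def record_completed(db, msg_id):
    db.execute("UPDATE message_log SET completed = 1 WHERE id = ?", (msg_id,))


def republish_uncompleted(db, machine, publish):
    """Republish this machine's unfinished messages; return how many were sent."""
    rows = db.execute(
        "SELECT id, body FROM message_log "
        "WHERE completed = 0 AND machine = ? ORDER BY id",
        (machine,),
    ).fetchall()
    for msg_id, body in rows:
        publish(body)
        # Drop the entry so the republished copy is logged afresh when consumed.
        db.execute("DELETE FROM message_log WHERE id = ?", (msg_id,))
    return len(rows)
```

Filtering by machine name is what keeps two running instances from republishing each other's in-flight work.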
SenderApp is sending values to service bus queue and on the other end we have two receiver instances.
Requirement is that we only save values that have changed to db (all incoming values are first saved to redis cache where comparison happens).
SenderApp sends three values in following order to queue: (1, 2, 1).
-----1---2---1------------------------>
The values go into the queue in FIFO order, and on the other end of the queue we have two instances of the receiver application.
This is where it gets interesting.
Let's say that, due to latency or some other factor, the second receiver instance is slow to process the value (2) and ends up saving it to the database last of all three values.
So it should be something like this:
Receiver instance #1
---------------------1---1------------>
Receiver instance #2
-----------------------------2-------->
Now we have a problem. Instance one is comparing the second sent value which is 1 against the first value which is also 1 and it doesn't get saved to database. Values that are sent to service bus queue have timestamps attached to them.
It also needs to be a fairly scalable solution.
All ideas are welcome: maybe leverage the Redis cache, maybe Service Bus sessions?
Edit:
For more clarification:
Each incoming message has device id and value attached to it so we must not save any consecutive duplicate values for that specific device.
So for example we have two incoming messages from the same device.
First incoming message has a value 1 with device id 999 (we must save it).
But now the next incoming message also has value 1 and device id 999.
So that would be consecutive duplicate and we must not save it.
Also what makes this problem difficult is that we also can not save values directly on sender side.
Explanatory graph of the general flow below:
Competing consumers (receivers) will contradict the requirement to handle messages in the order they were sent. To adhere to the in-order processing, the Azure Service Bus message sessions feature could help. A single receiver only processes a session at any time; there's no processing by multiple receivers. This eliminates the chance of having messages from the same source processed in parallel and out of order. This is still a scalable approach as receivers can handle different sessions. If messages for a given session arrive after some time, the session processing would be picked up by any competing receivers.
In this scenario, where a uniquely identifiable device emits messages, the device ID could be used as a session ID.
Worth noting that each receiver can handle multiple sessions in parallel, not just a single session. That way, the solution can scale out and up, depending on the compute used and complexity of the code executed per message.
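As a minimal illustration of why sessions fix the ordering problem, the Python sketch below partitions messages by device ID (playing the role of the session ID) and processes each partition strictly in order. The message shape, the last_saved dictionary standing in for the Redis cache, and the save callback are assumptions made for illustration. No Service Bus API is used.

```python
from collections import defaultdict


def group_into_sessions(messages):
    """Partition messages by session ID (here the device ID), keeping send order."""
    sessions = defaultdict(list)
    for msg in messages:
        sessions[msg["device_id"]].append(msg)
    return sessions


def process_session(session_messages, last_saved, save):
    """Handle one session sequentially, skipping consecutive duplicate values."""
    for msg in session_messages:
        device = msg["device_id"]
        if last_saved.get(device) != msg["value"]:
            save(device, msg["value"])
            last_saved[device] = msg["value"]
```

Because one receiver owns a session at a time, the sequence (1, 2, 1) for device 999 is compared in the order it was sent, so all three values are saved, while a repeated value from the same device is dropped.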
Here is what I am trying to do:
1. Dequeue a message
2. Do an action with the message
3. If the action fails, put the message back in the queue
4. If the action succeeds, acknowledge the message
My problem right now is that, if the action fails, the message isn't re-queued, but stays unacknowledged. If I go in RabbitMQ web configuration interface, I see that the messages are flagged as unacknowledged, even though the basic.Nack has been stepped over.
var delivery = subscription.Next();
var messageBody = delivery.Body;
try
{
    action.Invoke(messageBody);
    subscription.Ack(delivery);
}
catch (Exception)
{
    // multiple: false, requeue: true
    subscription.Model.BasicNack(delivery.DeliveryTag, false, true);
    throw; // rethrow without resetting the stack trace
}
Update:
So I've noticed that messages go from Ready to Unacknowledged really fast, at a rate far faster than I'm actually calling subscription.Next(). It is as if the .NET client caches all the messages in memory (the memory footprint of my app is actually growing quite fast), processes those messages from memory, and sends the Ack() afterwards, unflagging the message from Unacknowledged.
Update 2:
It seems the queue was being emptied really fast because I hadn't set BasicQos on my Model. The following fixed everything. Basic.Nack() still doesn't seem to work, though:
Model.BasicQos(0, 1, false)
I suspect you're using:
channel.BasicConsume(your_queue_name, false, consumer); to retrieve messages.
I ran several tests with a RabbitMQ 3.2.4 server and client. I was unable to get either channel.BasicAck(...) or channel.BasicNack(...) to work as expected.
That said, I was able to get the expected Ack | Nack behavior when I used:
BasicGetResult result = channel.BasicGet(your_queue_name, false);
So you may want to consider a different retrieval method to get messages. I realize that the Consume & Dequeue are the "preferred" methods but they weren't working in my case. I wanted fair, one-at-a-time dispatch with acknowledgments. Using BasicGet was the only way I could achieve that.
The downside to that approach is you'll possibly lose the client side event iterator you're using with subscription.Next().
If I had to venture a guess, I think that something about the local Queue collection is messing up the channel's ability to provide an acknowledgement. And it's worth pointing out that creating the consumer with new QueueingBasicConsumer(channel); triggers a call to pre-fetch events from the server's queue. The consumer's Queue is just a SharedQueue<RabbitMQ.Client.Events.BasicDeliverEventArgs> and SharedQueue is just an extension of IEnumerable.
Also keep in mind that the same channel that pulls the message needs to provide the Ack | Nack. You cannot Ack | Nack a message from a different channel. Or at least I haven't figured out how to do so, nor have others. That's a problem if you wrap your RabbitMQ objects within using statements (so you don't leave network resources lying around) and you have a long process to run before you can safely acknowledge.
This SO Answer lays out a decent workflow to get around the likely reality that your pulling channel is not going to be the channel that sends the Ack | Nack. The trick is setting a TTL and not bothering with sending a Nack - just let the new message expire and requeue automatically.
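The interaction between the prefetch limit, unacknowledged messages, and a Nack with requeue can be modelled in a few lines of Python. FakeChannel is a hypothetical in-memory model, not the RabbitMQ client. It only mimics two behaviours: the broker stops delivering once prefetch_count messages are unacknowledged, and a Nack with requeue puts the message back in the Ready state.

```python
from collections import deque


class FakeChannel:
    """Hypothetical in-memory channel; models prefetch and requeue only."""

    def __init__(self, messages, prefetch_count):
        self.ready = deque(messages)
        self.unacked = {}  # delivery tag -> body
        self.prefetch_count = prefetch_count
        self._next_tag = 1

    def next_delivery(self):
        # Nothing is delivered while prefetch_count messages are still unacked.
        if not self.ready or len(self.unacked) >= self.prefetch_count:
            return None
        tag, self._next_tag = self._next_tag, self._next_tag + 1
        self.unacked[tag] = self.ready.popleft()
        return tag, self.unacked[tag]

    def ack(self, tag):
        del self.unacked[tag]

    def nack(self, tag, requeue):
        body = self.unacked.pop(tag)
        if requeue:
            self.ready.appendleft(body)


def handle_one(channel, action):
    """Dequeue one message, run the action, then ack or nack-with-requeue."""
    delivery = channel.next_delivery()
    if delivery is None:
        return False
    tag, body = delivery
    try:
        action(body)
        channel.ack(tag)
    except Exception:
        channel.nack(tag, requeue=True)
    return True
```

In this model the ack or nack always goes through the same channel object that delivered the message, which mirrors the same-channel constraint described above.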
I want to know if there is any elegant way to ensure that a queue always contains distinct messages (nothing related to the Duplicate Detection Window or any time period, for that matter)?
I know that Service Bus queues provide the session concept, which could serve my purpose (as I mentioned, Service Bus duplicate detection won't help me because it depends on a time period), but I don't want my component to depend on another Azure service just because of this feature.
Thanks,
This is not possible to do reliably.
There is just no mechanism that can query a Storage queue and find out if a message with the same contents is already there or was there before. You can try to implement your own logic using some storage table, but that will not be reliable - as the entry into the table may succeed and then entry into the queue may fail - and now you would potentially have bad data in the table.
Your code should always assume that it can retrieve a message containing the same data that was already processed. This is because messages can come back to the queue when workers that are working on them crash or take too long.
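One common way to live with redelivery is to make the consumer idempotent. The Python sketch below is only an illustration: message_key is a hypothetical identifier (for example a job ID or a hash of the contents), and the in-memory seen set stands in for whatever durable store a real worker would use.

```python
def make_idempotent_handler(process, seen=None):
    """Wrap process so that a message key is only ever handled once."""
    seen = set() if seen is None else seen

    def handle(message_key, payload):
        if message_key in seen:
            return False  # already processed; the duplicate can simply be deleted
        process(payload)
        seen.add(message_key)  # record the key only after the work succeeded
        return True

    return handle
```

Recording the key after the work finishes means a crash mid-processing leads to a retry rather than a lost message, at the cost of the work occasionally running twice.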
You can use Service Bus. It is like Azure Storage Queue, but it allows messages of 256 KB to 1 MB and supports duplicate detection.