Maintain order of messages in outgoing queue - .NET - C#

In my scenario, messages arrive in a predefined order. Once they enter our system, multiple receivers pick up messages from the incoming queue and process them. Once processing is done, the processed messages should be sent out in the same order in which they arrived. In a scaled-up system, how do we ensure this?
The system requires high processing speed and throughput. In the .NET world, which queue would be ideal for this scenario?

It's a fundamental problem around ordered delivery - somewhere in your system you need to throttle everything down to a single thread. This is unavoidable.
Where you choose to do this can make a large difference to throughput. You could choose to make your entire message processor single-threaded. This would ensure order was maintained but at a cost of low throughput.
However, there is a way you can still process messages concurrently, but then you need to somehow assemble them in the correct order again. There is an integration design pattern called Resequencer - http://eaipatterns.com/Resequencer.html.
However, the resequencer pattern relies on you being able to stamp each message with a timestamp or sequence number on the way into your system, if there is nothing already in your messages to indicate ordering.
Additionally, is ordered delivery a requirement across the entire message set coming in from the queue? For example, it may be that only some of your messages have a need to be delivered in order.
Or it could be that you can group your messages into "sets" under a correlating identifier - within each set order needs to be maintained but you can still have concurrent processing on a "per-set" basis.
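To make the resequencer idea concrete, here is a minimal sketch in C#. The class name and API are my own illustration, not from any library: workers can finish in any order, and Emit only releases messages once the contiguous run starting at the next expected sequence number is complete.

```csharp
using System.Collections.Generic;

// Minimal resequencer sketch: messages are stamped with a sequence
// number on the way in; Emit releases them strictly in stamped order.
public class Resequencer<T>
{
    private readonly SortedDictionary<long, T> _buffer = new SortedDictionary<long, T>();
    private long _next;

    public Resequencer(long firstSequence = 0) { _next = firstSequence; }

    // Called from any worker's completion path; returns the messages
    // that are now safe to send out, in their original order.
    public IReadOnlyList<T> Emit(long sequence, T message)
    {
        var released = new List<T>();
        lock (_buffer)
        {
            _buffer[sequence] = message;
            // Release the contiguous run starting at the expected number.
            while (_buffer.TryGetValue(_next, out var ready))
            {
                _buffer.Remove(_next);
                _next++;
                released.Add(ready);
            }
        }
        return released;
    }
}
```

The single lock is the "throttle down to one thread" point mentioned above, but it only serializes the cheap reordering step, not the expensive message processing.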

Related

Maintaining order of processing with multiple instances

SenderApp is sending values to service bus queue and on the other end we have two receiver instances.
Requirement is that we only save values that have changed to db (all incoming values are first saved to redis cache where comparison happens).
SenderApp sends three values in following order to queue: (1, 2, 1).
-----1---2---1------------------------>
Now the values go in queue with FIFO method and on the other end of queue we have two instances of receiver application.
This is where it gets interesting.
Let's say that, due to latency or some other factor, the second receiver instance is slow to process the value (2) and ends up saving it to the database last of all three values.
So it should be something like this:
Receiver instance #1
---------------------1---1------------>
Receiver instance #2
-----------------------------2-------->
Now we have a problem: instance one compares the second value it received (1) against the first value it received (also 1), so the value doesn't get saved to the database. Values sent to the service bus queue have timestamps attached to them.
Also it needs to be fairly scalable solution.
All ideas are welcome - maybe leverage the Redis cache, maybe Service Bus sessions?
Edit:
For more clarification:
Each incoming message has device id and value attached to it so we must not save any consecutive duplicate values for that specific device.
So for example we have two incoming messages from the same device.
First incoming message has a value 1 with device id 999 (we must save it).
But now the next incoming message also has value 1 and device id 999.
So that would be consecutive duplicate and we must not save it.
Also what makes this problem difficult is that we also can not save values directly on sender side.
Competing consumers (receivers) contradict the requirement to handle messages in the order they were sent. To achieve in-order processing, the Azure Service Bus message sessions feature can help. Only a single receiver processes a given session at any time; a session is never shared across receivers. This eliminates the chance of messages from the same source being processed in parallel and out of order. The approach is still scalable, as different receivers can handle different sessions. If messages for a given session arrive after some time, that session's processing is picked up by whichever competing receiver locks it first.
In this scenario, where a uniquely identifiable device emits messages, the device ID could be used as a session ID.
Worth noting that each receiver can handle multiple sessions in parallel, not just a single session. That way, the solution can scale out and up, depending on the compute used and complexity of the code executed per message.
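A minimal sketch of the session approach, assuming the Azure.Messaging.ServiceBus package and a session-enabled queue; the queue name "telemetry", the connection-string placeholder, and the device id "999" are illustrative only.

```csharp
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;

class SessionExample
{
    static async Task Main()
    {
        await using var client = new ServiceBusClient("<connection-string>");

        // Sender side: the device id becomes the session id, so all
        // messages from one device land in a single FIFO session.
        var sender = client.CreateSender("telemetry");
        await sender.SendMessageAsync(new ServiceBusMessage("1") { SessionId = "999" });

        // Receiver side: one session is owned by one receiver at a time,
        // but each processor can still work several sessions concurrently.
        var processor = client.CreateSessionProcessor("telemetry",
            new ServiceBusSessionProcessorOptions { MaxConcurrentSessions = 8 });
        processor.ProcessMessageAsync += async args =>
        {
            // Messages arrive in order per device here, so comparing
            // against the cached previous value is race-free.
            await args.CompleteMessageAsync(args.Message);
        };
        processor.ProcessErrorAsync += args => Task.CompletedTask;
        await processor.StartProcessingAsync();
    }
}
```

Raising MaxConcurrentSessions scales a single receiver up; adding receiver instances scales out, since the service distributes session locks among them.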

Deleting a message from Azure Queue Service by value

I'm using an Azure Storage Queue Service to track a queue of long running jobs which are en-queued by multiple disparate clients, but I now have a requirement to remove messages from the queue if any exist matching some given criteria.
I realise that this is somewhat anti-pattern for a queue, but the service does provide some functionality in addition to simple queueing (such as Delete Message and Peek Messages) so I thought I'd try to implement it.
The solution I've come up with works, but is not very elegant and is quite inefficient - and I'm wondering if it can be done better - or if I should bin the whole approach and use a mechanism which supports this requirement by design (which would require a fair amount of work across different systems). Here is the simplified code:
var queue = MethodThatGetsTheAppropriateQueueReference();
await queue.FetchAttributesAsync(); // populates the current queue length
if (queue.ApproximateMessageCount.HasValue)
{
    // Get all messages and find any messages to be removed.
    // This makes those messages unavailable to other clients
    // for the visibilityTimeout period.
    // I've set this to the minimum of 1 second - not ideal though.
    // Note: the service caps each GetMessagesAsync call at 32 messages.
    var messages = await queue.GetMessagesAsync(queue.ApproximateMessageCount.Value);
    var messagesToDelete = messages.Where(x => x.AsString.Contains(someGuid));
    // Delete applicable messages - await each delete rather than
    // firing and forgetting, so failures are actually observed.
    foreach (var message in messagesToDelete)
    {
        await queue.DeleteMessageAsync(message);
    }
}
Note: originally I tried using PeekMessagesAsync() to avoid affecting messages which do not need to be deleted, but this does not give you a PopReceipt, which is required by DeleteMessageAsync().
The questions:
1. Is there a way to do this without pulling ALL of the messages down? (There could be quite a few.)
2. If 1 isn't possible, is there a way to get the PopReceipt for a message if we use PeekMessagesAsync()?
Is there a way to do this without pulling ALL of the messages down?
(there could be quite a few)
Unfortunately, no. You have to get messages (a maximum of 32 at a time) and analyze the contents of each one to determine whether it should be deleted.
If 1 isn't possible, is there a way to get the PopReceipt for a message
if we use PeekMessagesAsync()?
Again, no. In order to get a PopReceipt, a message must be dequeued, which is only possible via GetMessagesAsync(). PeekMessagesAsync() simply returns the message without altering its visibility.
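Given the 32-message cap, the scan can at least be done in service-maximum batches rather than one oversized call. This is a sketch against the classic CloudQueue client used in the question; the method and parameter names are illustrative, and non-matching messages simply reappear once their visibility timeout expires.

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Storage.Queue;

static class QueuePurge
{
    // Drains the queue in batches of 32 (the per-call service maximum),
    // deleting only messages that contain the given marker value.
    public static async Task<int> DeleteMatchingAsync(CloudQueue queue, string someGuid)
    {
        var deleted = 0;
        while (true)
        {
            var batch = (await queue.GetMessagesAsync(32)).ToList();
            if (batch.Count == 0) break; // nothing visible left to inspect
            foreach (var msg in batch)
            {
                if (msg.AsString.Contains(someGuid))
                {
                    // The PopReceipt needed here came from GetMessagesAsync.
                    await queue.DeleteMessageAsync(msg);
                    deleted++;
                }
            }
        }
        return deleted;
    }
}
```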
Possible Solution
You may want to look into Service Bus Topics and Subscriptions for this kind of functionality.
What you could do is create a topic where all messages will be sent.
Then you would create 2 subscriptions: in one subscription you set a rule which matches messages carrying the value in question, and in the other subscription a rule which matches messages without it.
Azure Service Bus checks each arriving message against these rules and pushes it into the appropriate subscription. This way you get a clean separation between messages that should and shouldn't be deleted.
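A sketch of that setup with Azure.Messaging.ServiceBus.Administration. One caveat worth hedging: SQL subscription rules filter on message properties, not the raw body, so the matching value would need to be promoted into an application property (here assumed to be called "JobId"); the topic and subscription names are placeholders.

```csharp
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus.Administration;

class TopicSetup
{
    static async Task Main()
    {
        var admin = new ServiceBusAdministrationClient("<connection-string>");
        await admin.CreateTopicAsync("jobs");

        // Messages whose JobId property matches land here, ready to discard.
        await admin.CreateSubscriptionAsync(
            new CreateSubscriptionOptions("jobs", "to-delete"),
            new CreateRuleOptions("match", new SqlRuleFilter("JobId = 'some-guid'")));

        // Everything else flows to the normal processing subscription.
        await admin.CreateSubscriptionAsync(
            new CreateSubscriptionOptions("jobs", "to-keep"),
            new CreateRuleOptions("no-match", new SqlRuleFilter("JobId <> 'some-guid'")));
    }
}
```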

when MS DTC is not used in a message queue transaction

I'm creating a design to process a large number of jobs, using MSMQ to scale out. Each job is processed and the database is updated for that job id, marking it complete. On error, the job should go back to the queue, so I need a transactional MSMQ. Now, the job could be processed successfully but the database update might still fail for whatever reason. In that case too, I would need the job back in the queue so it can be re-attempted and saved to the database successfully. This means I will need to enable MS DTC to manage the transaction across the database server.
I am reading Pro MSMQ book and it mentions "For most applications, transactions are not needed at all. There is a tendency to overuse transactions and affect the performance of the entire system unnecessarily. Before deciding to use transactions, analyze the ACID properties requirement of the entire system and the performance impact.".
I'm failing to understand which cases would not involve a database update. Wouldn't any record picked up from a queue need to be processed and updated somewhere? I would think about 90% of systems need this type of functionality. I get it if it's a middle-layer queue that passes messages on to another queue or something like that, but for processing the last record in the pipeline, isn't MS DTC always required?
Thoughts?
Edit:- the full text:
Transactional messaging offers a lot of benefits, such as message
integrity and message order, over nontransactional messaging, but the
performance price you pay for using transactional messaging is huge.
Use internal transactions only if it is absolutely essential to
maintain the order of messages in the queue. External transactions
offer the benefit of propagating the transaction context across
multiple resource managers. Such transactions are useful when there is
a large-scale distributed system with multiple databases and message
queues, and the transactional integrity between the messages
exchanged between these resource managers is critical. The overhead
incurred while using external transactions is significantly more
than the one incurred by internal Message Queuing transactions. For
most applications, transactions are not needed at all. There is a
tendency to overuse transactions and affect the performance of the
entire system unnecessarily. Before deciding to use transactions,
analyze the ACID properties requirement of the entire system and the
resulting performance impact.
As the book says, the main problem with enlisting in these transactions is that you are sacrificing performance at high volumes by locking multiple resources. There are alternative ways for you to achieve your goals without losing consistency. There is always a trade-off between consistency, availability and partition tolerance (read about the CAP theorem) and you need to decide which attributes of the system are needed to successfully meet the business requirements.
To tackle your problem without a transaction, instead of popping the message off the queue, you can Peek the message; if your processing succeeds you pop the message and discard it, and if the processing fails you move the message to an error queue. Error queues can be retried automatically (the failure could be a transitory issue). The error queue(s) should be actively monitored to ensure your system is processing correctly, i.e. you need alerts to fire if an error queue goes over a threshold or is growing at an abnormal rate.
Note that this approach won't work for a single queue with multiple processors as you have commented. This will work where you partition your data and messages and bind a processor to a queue.
E.g. I'm doing central sales processing for a chain of retailers. I might say that I process east coast retailers on queue QA and west coast retailers on queue QB. I have a processor PA bound to queue QA (it can be multiple exes, or threads in a single exe) and a processor PB bound to QB. This way messages are processed in order for each distinct entity in the system.
The key is to pick the right data partitioning scheme so that work is spread evenly.
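The peek-then-pop loop can be sketched with classic System.Messaging against MSMQ. The queue paths and the Process handler are placeholders; this assumes a single processor bound to the queue, as described above, since another consumer receiving between the Peek and the Receive would break the pattern.

```csharp
using System;
using System.Messaging;

class PeekProcessor
{
    static void Main()
    {
        var work = new MessageQueue(@".\private$\sales-qa");
        var errors = new MessageQueue(@".\private$\sales-qa-errors");

        while (true)
        {
            // Peek leaves the message on the queue; throws
            // MessageQueueException if nothing arrives within the timeout.
            var msg = work.Peek(TimeSpan.FromSeconds(5));
            try
            {
                Process(msg);    // your handler - no DTC involved
                work.Receive();  // success: pop and discard
            }
            catch (Exception)
            {
                // failure: pop and park on the error queue for retry
                errors.Send(work.Receive());
            }
        }
    }

    static void Process(Message msg) { /* process and update the database */ }
}
```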

ActiveMQ: Publish Messages in bulk, persistent, but not async?

is it possible to store a large amount of messages in bulk?
I want to send them synchronously and persistently, but still get a lot of speed by sending many at one time.
I am using NMS, the .NET port of the Java framework. But if you only know how to do this in Java, that would help too; maybe I can more easily find a .NET solution from there.
I thought of things like transactions, but I only got transactions to work for consumers, not for producers.
Conventional wisdom used to suggest that if you wanted maximum throughput when sending in bulk, you should use a SESSION_TRANSACTED acknowledgement mode and batch all of the message sends together with a .commit().
Unfortunately, here's a benchmark showing this not to be the case - http://www.jakubkorab.net/2011/09/batching-jms-messages-for-performance-not-so-fast.html - and that you are better off just sending them as normal, without transactions. If you are already using transactions, then it may make sense to try to batch them.
My advice here also is that unless you are dealing with messages that are extremely time-sensitive, the rate at which you produce isn't going to be that big of a deal - be more concerned with bandwidth than with the speed of individual message sends. If you don't mind your messages being out of order, you can have multiple producers send to a given destination; if you need them in order, use multiple producers and then a resequencer after they are in the broker.
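For reference, producer-side transactions do work in NMS; here is a sketch of a transacted, persistent bulk send. The broker URL, queue name, and batch size are placeholders, and per the benchmark above you should measure whether the batched commit actually beats plain sends for your payloads.

```csharp
using Apache.NMS;
using Apache.NMS.ActiveMQ;

class BulkSender
{
    static void Main()
    {
        var factory = new ConnectionFactory("tcp://localhost:61616");
        using (var connection = factory.CreateConnection())
        using (var session = connection.CreateSession(AcknowledgementMode.Transactional))
        {
            var producer = session.CreateProducer(session.GetQueue("bulk.queue"));
            producer.DeliveryMode = MsgDeliveryMode.Persistent;
            connection.Start();

            for (var i = 0; i < 1000; i++)
                producer.Send(session.CreateTextMessage("payload " + i));

            // All 1000 sends reach the broker (or fail) as one unit.
            session.Commit();
        }
    }
}
```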

MSMQ with dynamic priorities

I'm doing a project with some timing constraints right now. Setup is: A web service accepts (tiny) xml files and I have to process these, fast.
First and most naive idea was to handle this processing in the request dispatcher itself, but that didn't scale and was doomed from the start.
So now I'm looking at a varying load of incoming requests, each producing ~50 jobs on my side. The technologies available are limited by the customer's rules: if it's not SQL Server or MSMQ, it probably won't fly.
I thought about going down the MSMQ route (the web service just submits messages; multiple consumer processes pick them up later on), and small proof-of-concept modules worked like a charm.
There's one problem though: The priority of these jobs might change a lot, in the queue. The system is fairly time critical, so if we - for whatever reasons - cannot process incoming jobs in a timely fashion, we need to prefer the latest ones.
Basically the use case changes from reliable messaging in general to LIFO under (too) heavy load. Old entries still have to be processed, but they have just lost all of their priority.
Is there any manageable way to build something like this in MS MQ?
Expanding the business side, as requested:
The processing of the incoming job is bound to some tracks, where physical goods are moved around. If I cannot process the messages in time, the things are "gone".
I still want the results for statistical purpose, but really need to focus on the newer messages now.
Think of me being able to influence mechanical things and reroute items moving on a track - as long as they haven't moved past point X yet.
So, if I understand this, you want to be able to switch between sorting the queue by priority OR by arrival time, depending on the situation. MSMQ can only sort the queue by priority AND by arrival time.
Although I understand what you are trying to do, I don't quite see the business justification for it. Can you expand on this?
I would propose using a service to move messages from the incoming queue to a number of work queues for processing. Under normal load, there would be a several queues, each with a monitoring thread.
Under heavy load, new traffic would all go to just one "panic" queue until the load dropped. The threads on the other work queues could be paused if necessary.
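A rough sketch of that dispatcher service with System.Messaging; the queue paths, the threshold, and the depth probe are all placeholders (in practice a queue-length performance counter would be much cheaper than GetAllMessages).

```csharp
using System.Messaging;

class Router
{
    const int PanicThreshold = 1000; // illustrative cut-over point

    static void Main()
    {
        var incoming = new MessageQueue(@".\private$\incoming");
        var work = new MessageQueue(@".\private$\work");
        var panic = new MessageQueue(@".\private$\panic");

        while (true)
        {
            var msg = incoming.Receive(); // blocks until a message arrives
            // Under heavy load, divert new traffic to the panic queue so
            // its dedicated threads can favour the freshest messages.
            var overloaded = work.GetAllMessages().Length > PanicThreshold;
            (overloaded ? panic : work).Send(msg);
        }
    }
}
```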
Cheers, John Breakwell
