SenderApp sends values to a Service Bus queue, and on the other end we have two receiver instances.
The requirement is that we only save values that have changed to the db (all incoming values are first saved to a Redis cache, where the comparison happens).
SenderApp sends three values to the queue in the following order: (1, 2, 1).
-----1---2---1------------------------>
Now the values go through the queue in FIFO order, and on the other end of the queue we have two instances of the receiver application.
This is where it gets interesting.
Let's say that due to latency or some other factor, the second receiver instance is slow to process the value (2) and ends up saving it to the database last of all three values.
So it should be something like this:
Receiver instance #1
---------------------1---1------------>
Receiver instance #2
-----------------------------2-------->
Now we have a problem. Instance one compares the second sent value, which is 1, against the first value, which is also 1, so it doesn't get saved to the database. Values sent to the Service Bus queue have timestamps attached to them.
The solution also needs to be fairly scalable.
All ideas are welcome; maybe leverage the Redis cache, maybe Service Bus sessions?
Edit:
For more clarification:
Each incoming message has device id and value attached to it so we must not save any consecutive duplicate values for that specific device.
So for example we have two incoming messages from the same device.
First incoming message has a value 1 with device id 999 (we must save it).
But now the next incoming message also has value 1 and device id 999.
So that would be consecutive duplicate and we must not save it.
Also what makes this problem difficult is that we also can not save values directly on sender side.
(Explanatory graph of the general flow omitted.)
Competing consumers (receivers) contradict the requirement to handle messages in the order they were sent. To achieve in-order processing, the Azure Service Bus message sessions feature can help. Only a single receiver processes a given session at any time; there's no processing by multiple receivers. This eliminates the chance of messages from the same source being processed in parallel and out of order. It is still a scalable approach, as different receivers can handle different sessions. If messages for a given session arrive again after some time, session processing will be picked up by whichever competing receiver is available.
In this scenario, where a uniquely identifiable device emits messages, the device ID could be used as a session ID.
Worth noting that each receiver can handle multiple sessions in parallel, not just a single session. That way, the solution can scale out and up, depending on the compute used and complexity of the code executed per message.
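With device-scoped sessions guaranteeing per-device ordering, the compare-against-last-value step becomes straightforward. Below is a minimal sketch of that check; a plain dict stands in for the Redis cache, and `save_to_db` is a hypothetical callback. Without sessions, the read-modify-write below would need to be atomic (e.g. Redis GETSET or a Lua script).

```python
# Sketch of the per-device "save only if changed" check.
# A plain dict stands in for the Redis cache; in production, an atomic
# operation (e.g. Redis GETSET or a Lua script) would be needed so two
# receivers cannot interleave between the read and the write.

last_value = {}  # device_id -> last seen value (stand-in for Redis)

def handle_message(device_id, value, save_to_db):
    """Save `value` only if it differs from the last value for this device."""
    previous = last_value.get(device_id)
    last_value[device_id] = value
    if previous != value:
        save_to_db(device_id, value)
        return True   # saved
    return False      # consecutive duplicate, skipped
```

Because only one receiver processes a given device's session at a time, no other receiver can interleave between the read and the write for that device.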
Related
My function is sending a payload to different sftp servers. Those servers are limited in how many connections they can accept.
I need a solution to throttle our connections to those servers.
The function is triggered by storage queues, and the first draft of the design is:
I then learned that you can only have one trigger per function, which led me to sandwich in another aggregating queue:
I can set the batchSize/newBatchThreshold on the originating queues, but I'm not certain this will work because the originating queues will not be aware of when to push messages to the aggregate queue.
I need the function to not scale out to more than N instances for all messages from queue X, since the sftp server X will not accept more than N connections.
Additionally, I need the function to scale out to no more than M instances for all messages from queue Y, since the sftp server Y will not accept more than M connections.
The total instances would be M + N for the above scenario.
How do we adjust our design in order to fit these requirements?
There's no 100% bullet-proof solution to this; the issue is tracked here.
The best way could be setting WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT to 1 in application settings of the Function App which is triggered by the aggregate queue. Then, you should only get one concurrent instance of the Function App, so the batchSize setting will actually be useful for rate limiting.
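As a sketch, the relevant trigger settings might look like the fragment below. This assumes the Functions v2+ host.json schema for the storage queue extension; the exact values depend on how many parallel SFTP connections one instance should hold.

```json
{
  "version": "2.0",
  "extensions": {
    "queues": {
      "batchSize": 4,
      "newBatchThreshold": 0
    }
  }
}
```

With WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT set to 1, a single instance processes roughly batchSize + newBatchThreshold messages concurrently, so that sum becomes your effective connection cap.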
You don't need to limit queue processors X/Y/Z in this case, let the messages flow to the aggregate.
Now, I didn't understand whether only messages from queue X touch SFTP X, or whether it's many-to-many. If it's one-to-one, it makes sense to get rid of the aggregate queue, have three Functions, and limit the concurrency for each of the queues separately.
Anyway, the limit settings are as I suggested above.
In case this still doesn't satisfy your requirements, you may switch to another messaging service. For instance, send all messages of one type into a separate session of Service Bus or a single partition of Event Hub, which will naturally limit the concurrency on the broker level.
Option 1: Depend on the sftp's error response
Does the sftp server return a 429 (Too Many Requests) response, or something similar? When you get such a response, you can just exit from the function without deleting the message from the queue. The message will become visible again after 30 seconds and will trigger the function again. 30 seconds is the default value of visibilitytimeout and is customizable on a per-message basis.
Option 2: Distributed locks
I don't know of a distributed locking solution with counters off the top of my head. An alternative would be to implement a locking solution on your own using a SQL db and atomic transactions. When processing a message from queue X, a function would look into the db to see if the lock counter for X is less than N, increase it by 1 if so, and then process the message. In this case, you will have to make sure the locks get released even if your function crashes; that is, implement locks with a lease expiration time.
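Here is a sketch of such a counter-based lock with lease expiry, using an in-memory SQLite database as a stand-in for the shared SQL db. Table and function names are made up, and the expiry handling is deliberately coarse; a real implementation would track one row per lease.

```python
import sqlite3
import time

# Sketch of a counting "lock" with lease expiry. SQLite stands in for the
# shared SQL db; the table holds one row per SFTP server with the number
# of currently held leases.

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE locks (server TEXT PRIMARY KEY, held INTEGER, expires REAL)")

def try_acquire(server, limit, lease_seconds=60):
    """Atomically take one of `limit` slots for `server`; True on success."""
    now = time.time()
    with conn:  # one transaction
        conn.execute("INSERT OR IGNORE INTO locks VALUES (?, 0, ?)",
                     (server, now))
        # Coarse lease expiry: if the last lease expired, assume the
        # workers crashed and reset the counter. (A real implementation
        # would track one row per lease instead.)
        conn.execute("UPDATE locks SET held = 0 WHERE server = ? AND expires < ?",
                     (server, now))
        cur = conn.execute(
            "UPDATE locks SET held = held + 1, expires = ? "
            "WHERE server = ? AND held < ?",
            (now + lease_seconds, server, limit))
        return cur.rowcount == 1  # 1 row updated means we got a slot

def release(server):
    """Give back one slot for `server`."""
    with conn:
        conn.execute("UPDATE locks SET held = held - 1 "
                     "WHERE server = ? AND held > 0", (server,))
```

The conditional UPDATE is the key: the `held < limit` check and the increment happen in one atomic statement, so concurrent workers cannot both take the last slot.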
I'm using an Azure Storage Queue service to track a queue of long-running jobs which are enqueued by multiple disparate clients, but I now have a requirement to remove messages from the queue if any exist matching some given criteria.
I realise that this is somewhat anti-pattern for a queue, but the service does provide some functionality in addition to simple queueing (such as Delete Message and Peek Messages) so I thought I'd try to implement it.
The solution I've come up with works, but is not very elegant and is quite inefficient - and I'm wondering if it can be done better - or if I should bin the whole approach and use a mechanism which supports this requirement by design (which would require a fair amount of work across different systems). Here is the simplified code:
var queue = MethodThatGetsTheAppropriateQueueReference();
await queue.FetchAttributesAsync(); // populates the current queue length
if (queue.ApproximateMessageCount.HasValue)
{
    // Get all messages and find any messages to be removed.
    // This makes those messages unavailable to other clients
    // for the visibilityTimeOut period.
    // I've set this to the minimum of 1 second - not ideal though.
    var messages = await queue.GetMessagesAsync(queue.ApproximateMessageCount.Value);
    var messagesToDelete = messages.Where(x => x.AsString.Contains(someGuid));

    // Delete applicable messages, awaiting each delete so failures surface
    // (the original fire-and-forget ForEach swallowed the returned Tasks).
    foreach (var message in messagesToDelete)
    {
        await queue.DeleteMessageAsync(message);
    }
}
Note: originally I tried using PeekMessagesAsync() to avoid affecting messages which do not need to be deleted, but this does not give you a PopReceipt, which is required by DeleteMessageAsync().
The questions:
Is there a way to do this without pulling ALL of the messages down? (There could be quite a few.)
If 1 isn't possible, is there a way to get the PopReceipt for a message if we use PeekMessagesAsync()?
Is there a way to do this without pulling ALL of the messages down?
(there could be quite a few)
Unfortunately, no. You have to get messages (a maximum of 32 at a time) and analyze their contents to determine whether each message should be deleted.
If 1 isn't possible, is there a way to get the PopReceipt for a message
if we use PeekMessagesAsync()?
Again, no. In order to get a PopReceipt, a message must be dequeued, which is only possible via GetMessagesAsync(). PeekMessagesAsync() simply returns the message without altering its visibility.
Possible Solution
You may want to look into Service Bus Topics and Subscriptions for this kind of functionality.
What you could do is create a topic to which all messages are sent.
Then you would create two subscriptions: in one subscription you set a rule that checks the message for the matching value, and in the other subscription you set a rule that checks for a non-matching value.
Azure Service Bus will check each arriving message against the rules and push it into the appropriate subscription. This way you get a clean separation between messages that should and shouldn't be deleted.
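Worth noting: Service Bus subscription rules evaluate message properties rather than the raw body, so the value to match would need to be promoted to a user property when sending. A sketch of the two SQL filter expressions (`TargetGuid` is a hypothetical user property name):

```sql
-- Rule on the "to-delete" subscription: messages carrying the target guid
TargetGuid = 'some-guid-value'

-- Rule on the "to-keep" subscription: everything else
TargetGuid <> 'some-guid-value'
```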
I want to know if there is any elegant way to ensure that a queue always contains distinct messages (nothing related to the duplicate detection window, or any time period for that matter).
I know that Service Bus queues provide a sessions concept (as mentioned, Service Bus duplicate detection won't help me because it depends on a time period), which could serve my purpose, but I don't want to make my component dependent on another Azure service just for this feature.
This is not possible to do reliably.
There is just no mechanism that can query a Storage queue and find out whether a message with the same contents is already there or was there before. You can try to implement your own logic using a storage table, but that will not be reliable: the entry into the table may succeed and then the entry into the queue may fail, and now you potentially have bad data in the table.
Your code should always assume that it can retrieve a message containing the same data that was already processed. This is because messages can come back to the queue when workers that are working on them crash or take too long.
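One practical consequence: key each message with an id and make processing idempotent, so a redelivered message becomes a no-op. A minimal sketch (a set stands in for a durable store such as a storage table; `handler` is a hypothetical processing callback):

```python
# Sketch of an idempotent consumer: processing is keyed by a message id,
# so a message redelivered after a worker crash or timeout is skipped.
# A set stands in for a durable store (e.g. a storage table).

processed_ids = set()

def process_once(message_id, payload, handler):
    """Run `handler` at most once per message_id; return True if it ran."""
    if message_id in processed_ids:
        return False          # duplicate delivery: skip
    handler(payload)
    processed_ids.add(message_id)
    return True
```

Note that the id is recorded only after the handler succeeds, so a crash in between can still cause one reprocessing; the handler itself should therefore tolerate being run twice.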
You can use Service Bus. It's like the Azure Storage queue, but it allows messages of 256 KB to 1 MB and supports duplicate detection.
I'm currently working on an application that communicates with an electronic device via the SerialPort.
This communication is done in a half-duplex fashion, where the application is the master and the device is the slave. The master needs to send a message to the device, and the device needs to respond before the next message is sent.
If the message doesn't receive a response, the application needs to resend it.
The content of the next message depends on the result of the current one, i.e. each new message has an incremented sequence number, and sometimes data for the next message is taken from the response to the current one.
To send messages I use an interface to System.IO.Ports.SerialPort. When messages are received a SerialDataReceivedEventHandler is fired.
What's the best way for me to manage this? Is there a pattern that I can base this on?
I have worked on something similar in the past. This is the basic structure of the messaging I have used:
Application: Send Command Seq #1 -->
<-- Device: Acknowledge command, seq #1 (including any command-specific response)
If the device didn't acknowledge within 1 second, the same command would be resent, same sequence number. This "retry" sequence would happen 3 times and then it would time out, and retire the command.
The sequence number from the application side would increment by one for every command it sent that was not a retry. It would loop back to sequence number 1 after hitting 99.
The application will have to keep track of its state based on which command(s) are "in-flight" between it and the device, and what kind of response it has received. It can identify the response to a specific command by the sequence number the device puts in its acknowledgement.
To keep it simple to transition from state to state you can make it so that there is only ever one active command and don't move on until that one has been ack'ed or timed out and retired.
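The send/ack/retry cycle described above can be sketched as follows. Here `send` and `wait_for_ack` are hypothetical transport callbacks; `wait_for_ack(seq, timeout)` returns the device's response, or None on timeout.

```python
# Sketch of the single-in-flight command cycle: make up to `retries`
# attempts with the same sequence number, then retire the command.
# Sequence numbers wrap from 99 back to 1, as described above.

def run_command(payload, seq, send, wait_for_ack, retries=3, timeout=1.0):
    """Send one command; make up to `retries` attempts, then give up."""
    for _ in range(retries):
        send(seq, payload)
        response = wait_for_ack(seq, timeout)
        if response is not None:
            return response       # device acknowledged this seq
    return None                   # timed out on every attempt: retire it

def next_seq(seq):
    """Increment the sequence number, wrapping back to 1 after 99."""
    return 1 if seq == 99 else seq + 1
```

Keeping only one command in flight means the application's state machine only ever has to decide between "acknowledged, advance" and "retired, handle the failure".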
Shane Wealti's approach is right, but I think what you might really be asking yourself is: should you use threads? Even though it is a master/slave scenario, should you have a listening thread? When do you listen to the port, and when do you send?
The simplest approach - no threads
You don't need to use threads in this scenario, thanks to the master/slave configuration; all you have is two functions.
SendCommand(char *txBfr)
{
    // write the command to the serial port
}

ReceiveResponse(char *rxBfr)
{
    // block until a response arrives or the timeout expires
}

SendCommand(txBfr);
ReceiveResponse(rxBfr);
// process receive buffer, prepare new command
SendCommand(txBfr);
ReceiveResponse(rxBfr);
// and so on
This approach is the simplest one and is completely functional. However, since there are no threads, if your ReceiveResponse() times out in, say, one second, your GUI will be unresponsive for that second. Note that you are not listening to the port all the time, only when you are expecting a reply.
I might edit this to add Comprehensive approach using threads later but don't have the time right now.
In my scenario, messages arrive in a predefined order. Once they enter our system, multiple receivers pick up messages from the incoming queue and process them. Once processing is done, the processed messages should be sent out in the same order as they arrived. In a scaled-up system, how do we ensure this?
The system requires high processing speed and throughput. In the .NET world, which queue would be ideal for this scenario?
It's a fundamental problem around ordered delivery - somewhere in your system you need to throttle everything down to a single thread. This is unavoidable.
Where you choose to do this can make a large difference to throughput. You could choose to make your entire message processor single-threaded. This would ensure order was maintained but at a cost of low throughput.
However, there is a way you can still process messages concurrently, but then you need to somehow assemble them in the correct order again. There is an integration design pattern called Resequencer - http://eaipatterns.com/Resequencer.html.
However, the resequencer pattern relies on being able to stamp each message with a timestamp or sequence number on the way into your system, if there is nothing already in your messages to indicate ordering.
Additionally, is ordered delivery a requirement across the entire message set coming in from the queue? For example, it may be that only some of your messages have a need to be delivered in order.
Or it could be that you can group your messages into "sets" under a correlating identifier - within each set order needs to be maintained but you can still have concurrent processing on a "per-set" basis.
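The per-set approach combines naturally with the Resequencer pattern: process concurrently, then release each set's messages downstream in sequence order. A minimal sketch, with one resequencer instance per correlating identifier (class and method names are made up):

```python
import heapq

# Sketch of the Resequencer pattern: messages may arrive (finish
# processing) out of order, but are released downstream strictly in
# sequence order. One instance per correlation "set" gives per-set
# ordering while different sets still flow in parallel.

class Resequencer:
    def __init__(self, first_seq=1):
        self.next_seq = first_seq
        self.pending = []  # min-heap of (seq, message)

    def push(self, seq, message):
        """Accept an out-of-order message; return the batch now releasable."""
        heapq.heappush(self.pending, (seq, message))
        released = []
        # Drain the heap while the lowest buffered seq is the one we expect.
        while self.pending and self.pending[0][0] == self.next_seq:
            released.append(heapq.heappop(self.pending)[1])
            self.next_seq += 1
        return released
```

A gap (a missing sequence number) simply holds back everything after it, which is exactly the single-threaded choke point the answer describes, but confined to the release step rather than the whole pipeline.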