How does Peek work in a partition-enabled Service Bus queue?

I understand from the Microsoft docs that during the first Peek() operation, any one of the available message brokers respond and send their oldest message. Then on subsequent Peek() operation, we can traverse across the partitions to peek every message with increased sequence number.
My question is, during the very first Peek() operation, I will get a message from any of the first responded partitions. Is there a guarantee that I can peek all the messages from the queue?
In a much simpler way, there are three Partitions:
Partition "A" has 10 messages with sequence number from 1 to 10.
Partition "B" has 10 messages with sequence number from 11 to 20.
Partition "C" has 10 messages with sequence number from 21 to 30.
Now if I perform a Peek() operation and Partition "B" responds first, the first message I get has sequence number 11. The next peek operation will look for a message with an incremented sequence number. Won't I miss the messages from Partition "A", with sequence numbers 1-10, which the peek operation can never reach because it always searches for the incremented sequence number?
UPDATE
QueueClient queueClient = messagingFactory.CreateQueueClient("QueueName", ReceiveMode.PeekLock);
BrokeredMessage message = null;
while (iteration < messageCount)
{
    // According to the docs, this peeks the oldest message from any responding broker;
    // subsequent iterations peek the message with the incremented sequence number.
    message = queueClient.Peek();
    if (message == null)
        break;
    Console.WriteLine(message.SequenceNumber);
    iteration++;
}
Is there a guarantee that I can browse all the messages of a partitioned queue using the snippet above?

There is no guarantee that the returned message is the oldest one across all partitions.
It therefore depends which message broker responds first, and the oldest message from that partition will be shown. There is no general rule as to which partition will respond first in your example, but it is guaranteed that the oldest message from that partition is displayed first.
If you want to retrieve the messages by sequence number, use the overloaded Peek method that takes a starting sequence number, Peek(fromSequenceNumber), see: https://learn.microsoft.com/en-us/azure/service-bus-messaging/message-browsing

For partitioned entities, the sequence number is issued relative to the partition.
[...]
The SequenceNumber value is a unique 64-bit integer assigned to a message as it is accepted and stored by the broker and functions as its internal identifier. For partitioned entities, the topmost 16 bits reflect the partition identifier.
(https://learn.microsoft.com/en-us/azure/service-bus-messaging/message-sequencing)
So you cannot compare sequence numbers across partitions to see which one is older.
As an example, I just created a partitioned queue and put a couple of messages into two partitions (in this order):
1. Partition 1, SequenceNumber 61924494876344321
2. Partition 2, SequenceNumber 28991922601197569
3. Partition 1, SequenceNumber 61924494876344322
4. Partition 1, SequenceNumber 61924494876344323
5. Partition 2, SequenceNumber 28991922601197570
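The split described in the quoted documentation can be checked against the numbers above. The following is a minimal sketch, assuming only what the quote states (the topmost 16 bits carry the partition identifier); treating the remaining 48 bits as a per-partition counter is my reading of the example values, and the class and method names are made up for illustration.

```csharp
using System;

public static class SequenceNumberParts
{
    // Topmost 16 bits: the partition identifier, per the quoted documentation.
    public static int PartitionId(long sequenceNumber) =>
        (int)((ulong)sequenceNumber >> 48);

    // Remaining 48 bits: assumed here to be the number issued within that partition.
    public static long PartitionLocalNumber(long sequenceNumber) =>
        sequenceNumber & 0xFFFFFFFFFFFFL;
}

public static class Program
{
    public static void Main()
    {
        // Sequence numbers taken from the list above.
        long[] observed =
        {
            61924494876344321,
            28991922601197569,
            61924494876344322,
        };

        foreach (long n in observed)
        {
            Console.WriteLine(
                $"{n}: partition id {SequenceNumberParts.PartitionId(n)}, " +
                $"local number {SequenceNumberParts.PartitionLocalNumber(n)}");
        }
    }
}
```

For these values the top 16 bits decode to 220 and 103 (the "Partition 1" and "Partition 2" labels in the list are just names for the two partitions), and the lower bits count 1, 2, 3 within each partition. That is why the raw numbers say nothing about which message is older across partitions.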
Browse/Peek messages: Available only in the older WindowsAzure.ServiceBus library. PeekBatch does not always return the number of messages specified in the MessageCount property. There are two common reasons for this behavior. One reason is that the aggregated size of the collection of messages exceeds the maximum size of 256 KB. Another reason is that if the queue or topic has the EnablePartitioning property set to true, a partition may not have enough messages to complete the requested number of messages. In general, if an application wants to receive a specific number of messages, it should call PeekBatch repeatedly until it gets that number of messages, or there are no more messages to peek.
(https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-partitioning, Emphasis added)
As such, you should be able to repeatedly call Peek / PeekBatch to eventually get all the messages. At least, if you use the official SDKs.

Related

How to have a Consumer start from the last unconsumed message?

Using a handler passed to SetPartitionsAssignedHandler I need to be able to consume from the message immediately following the last message consumed by the group.
I have a consumer on a single partition topic. Because I need to be able to set a custom offset in certain circumstances, I have implemented a handler and passed it to SetPartitionsAssignedHandler. If the handler determines that a specific offset is needed, it figures out the offset and returns a TopicPartitionOffset with that value set in an Offset instance. This works as expected. What does not work is if no specific offset is expected. I've tried returning
* a TopicPartitionOffset with Offset.End - consumes from the next message posted to the topic
* a TopicPartitionOffset with Offset.Beginning - consumes from the beginning of the partition
* a TopicPartitionOffset with either Offset.Stored or Offset.Unset - consumes from the last message consumed, but always consumes that message again. I've checked that that's what's happening by looking at the offset of the first message consumed.
* nothing - the consumer never consumes any messages
I've searched around, including going through the code, but the TopicPartitionOffset information is passed into the librdkafka.dll and that determines what is done with the offset information, so I can't see why Stored and Unset both reconsume the last consumed message. I also can't see why not returning a TopicPartitionOffset results in the consumer never consuming anything.
The group ID is consistent, so it's not a case of the group ID changing that's causing a problem. And if the group ID was changing then using Offset.Stored or Offset.Unset would result in the partition being read from the beginning anyway.
I've found this same question asked with no answer at "consumer consuming the same message twice at the starting only". I've also looked at "How to make kafka consumer to read from last consumed offset but not from beginning", but setting the offset reset to earliest and the group ID does not result in the desired behaviour, because giving a handler to SetPartitionsAssignedHandler evidently overrides whatever the default behaviour would be. I didn't find any other questions that seemed relevant, and so far no other relevant information has come up anywhere. I also went through the existing issues listed on the GitHub repo before looking through the code to see if I could spot anything there.
The ConsumerConfig is constructed like this.
private ConsumerConfig GetConsumerConfig(Config.Consumer config)
{
    ConsumerConfig consumerCfg = GetBaseConfiguruation();
    if (!consumerCfg.Any())
    {
        return consumerCfg;
    }
    consumerCfg.BootstrapServers = "localhost:9092";
    consumerCfg.GroupId = "TestConsumer";
    consumerCfg.EnableAutoCommit = false;
    consumerCfg.AutoOffsetReset = AutoOffsetReset.Earliest;
    return consumerCfg;
}
private ConsumerConfig GetBaseConfiguruation()
{
    Option<string> ipAddr = LocalIpAddress.GetLocalIpAddress();
    return ipAddr.HasValue
        ? new ConsumerConfig()
        {
            ClientId = $"{ipAddr.ValueOrFailure()}",
            AutoCommitIntervalMs = 1000,
            SessionTimeoutMs = 30000,
            StatisticsIntervalMs = 60000,
            FetchMinBytes = 64 * 1024,
            FetchWaitMaxMs = 200000,
            MaxPartitionFetchBytes = 3 * 102400
        }
        : new ConsumerConfig();
}
The Group ID is in the app.config, so it's always the same unless the config is changed. I'm not changing it between executions of the app.
The consumer is constructed with this configuration.
private IConsumer<string, string> CreateConsumer(ConsumerConfig config)
{
    return new ConsumerBuilder<string, string>(config)
        .SetErrorHandler(OnConsumeError)
        .SetStatisticsHandler(OnStatistics)
        .SetLogHandler(OnLog)
        .SetOffsetsCommittedHandler(OnOffsetCommit)
        .SetPartitionsAssignedHandler(OnPartitionsAssigned)
        .Build();
}
When the consumer connects, it subscribes to one or more configured topics, usually just one, by calling
IConsumer<T,U>.Subscribe(List<string>)
The exact configuration in the app.config isn't relevant here - it consists of the topic name, an optional offset and information on where the message goes for processing.
This is simplified code representing the construction of the TopicPartitionOffset when no specific offset is needed (with AutoOffsetReset.Earliest hardcoded to force use of Offset.Stored).
private List<TopicPartitionOffset> OnPartitionsAssigned(
    IConsumer<string, string> consumer,
    List<TopicPartition> topicPartitions)
{
    List<TopicPartitionOffset> offsetPartitions =
        topicPartitions.Select(partition => GetPartitionOffset(AutoOffsetReset.Earliest, partition))
            .ToList();
    return offsetPartitions;
}
private static TopicPartitionOffset GetPartitionOffset(AutoOffsetReset offsetReset, TopicPartition partition)
{
    return new TopicPartitionOffset(
        partition.Topic,
        partition.Partition,
        AutoOffsetReset.Latest == offsetReset ? Offset.End : Offset.Stored);
}
The only significant difference when a specific offset is needed is that instead of Offset.End or Offset.Stored a numeric value is determined and used.
I expected that using Offset.Stored would result in consuming from the message after the last message consumed on the partition (by the group). But it always results in reconsuming the last message that was consumed.
Update:
After further investigation I tried getting the committed offsets for the partitions in the handler passed to SetPartitionsAssignedHandler. Then, instead of using one of the Offset special values, I construct a TopicPartitionOffset with an Offset whose Value is 1 higher than the last committed one. This works well, except for cases where multiple messages were received in a batch. It appears that Commit(IEnumerable) has a bug which results in only the first offset for a partition being committed. So if we receive 3 messages on a given partition and I include all of them in the commit, only the lowest offset for that partition is committed.
So pseudo code that produces the desired result (when not using auto-commit):
1. Set partition offset to one greater than the last committed.
2. Read messages.
3. After processing a batch of messages, get the highest offset for each partition and commit that list.
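The bookkeeping in the steps above can be sketched without any Kafka types. This is a minimal illustration, not the client library's API: OffsetPlanner, ResumeOffset and HighestPerPartition are hypothetical names, and partitions and offsets are modelled as plain int and long values. It follows the scheme described in the update, where the committed value is the last processed offset and the assignment handler adds one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OffsetPlanner
{
    // Step 1: resume from one past the last committed offset.
    public static long ResumeOffset(long lastCommitted) => lastCommitted + 1;

    // Step 3: reduce a processed batch to the highest offset seen per partition,
    // so that exactly one offset per partition is handed to the commit call.
    public static Dictionary<int, long> HighestPerPartition(
        IEnumerable<(int Partition, long Offset)> batch) =>
        batch.GroupBy(m => m.Partition)
             .ToDictionary(g => g.Key, g => g.Max(m => m.Offset));
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical batch: three messages on partition 0, one on partition 1.
        var batch = new List<(int Partition, long Offset)>
        {
            (0, 41), (0, 42), (0, 43), (1, 7),
        };

        foreach (var entry in OffsetPlanner.HighestPerPartition(batch))
        {
            Console.WriteLine(
                $"partition {entry.Key}: commit {entry.Value}, " +
                $"next start {OffsetPlanner.ResumeOffset(entry.Value)}");
        }
    }
}
```

Collapsing the batch to one entry per partition avoids passing several offsets for the same partition in a single commit, which is the situation the update reports as problematic.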

Enqueue array[] with different elements

What I'm trying to do is:
private Queue<Array[]> queue = new Queue<Array[]>(10);
queue.Enqueue({ bufferarray, networkstream});
I'm not sure if this is even possible, if I use 2 queues with same parameters and always call them after each other will I always pull matching values?
Edit for clarification: I'm trying to enqueue the received bytes of a TCP stream and the TCP stream itself, either into one queue or into two different queues, provided that I will dequeue matching values.

If you have 2 queues with the same parameters and always enqueue a value on both, you can dequeue from both and the values will match with no problem, because the queues behave identically: you always get the oldest element of each queue.
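As an alternative to two parallel queues, a single queue can hold both values as one element, which keeps each buffer paired with its stream by construction. This is a minimal sketch, assuming C# 7 value tuples are available; PairedQueueDemo and RoundTrip are hypothetical names, and a MemoryStream stands in for the NetworkStream so the example runs without a socket.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class PairedQueueDemo
{
    // Enqueue a (buffer, stream) pair as one element and dequeue it again.
    public static (byte[] Buffer, Stream Source) RoundTrip(byte[] buffer, Stream source)
    {
        var queue = new Queue<(byte[] Buffer, Stream Source)>(10);
        queue.Enqueue((buffer, source));
        return queue.Dequeue();
    }
}

public static class Program
{
    public static void Main()
    {
        // MemoryStream is a stand-in for the NetworkStream from the question.
        using (var stream = new MemoryStream())
        {
            var received = new byte[] { 1, 2, 3 };
            var (buffer, source) = PairedQueueDemo.RoundTrip(received, stream);

            Console.WriteLine(buffer.Length);
            Console.WriteLine(ReferenceEquals(source, stream));
        }
    }
}
```

Because NetworkStream derives from Stream, the same tuple type accepts it unchanged.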

How could messages get out of sequence?

I have used QuickFix/.NET for a long time, but in the last two days the engine appears to have sent messages out of sequence twice.
Here is an example, the 3rd message is out of sequence:
20171117-14:44:34.627 : 8=FIX.4.4 9=70 35=0 34=6057 49=TRD 52=20171117-14:44:34.622 56=SS 10=208
20171117-14:44:34.635 : 8=FIX.4.4 9=0070 35=0 34=6876 49=SS 56=TRD 52=20171117-14:44:34.634 10=060
20171117-14:45:04.668 : 8=FIX.4.4 9=224 35=D 34=6059 49=TRD 52=20171117-14:45:04.668 56=SS 11=AGG-171117T095204000182 38=100000 40=D 44=112.402 54=2 55=USD/XXX 59=3 60=20171117-09:45:04.647 278=2cK-3ovrjdrk00X1j8h03+ 10=007
20171117-14:45:04.668 : 8=FIX.4.4 9=70 35=0 34=6058 49=TRD 52=20171117-14:45:04.642 56=SS 10=209
I understand that the QuickFix logger is not in a separate thread.
What could cause this to happen?

The message numbers are generated by the GetNextSenderMsgSeqNum method in quickfix/n, which uses locking.
public int GetNextSenderMsgSeqNum()
{
    lock (sync_) { return this.MessageStore.GetNextSenderMsgSeqNum(); }
}
In my opinion, the messages are generated in sequence and your application is displaying them in a different order.
In some situations the sender and receiver are out of sync: the receiver expects a different sequence number, and the initiator tells the acceptor that a different sequence number is expected.
In that case, the sequence number can be changed to the expected one, either with the method call that updates the sequence number or by going to the store folder, opening the file with the .seqnums extension, and updating the sequence numbers there.
I hope this will help.

As the datetime is exactly the same on both messages, this may be a sorting problem. This is common in any sorted list where the sort key is identical on two different items. If this were within your own code, I would suggest resolving it by including an extra element as part of the key, such as a sequence number.
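The tie-break idea can be sketched with the outgoing messages from the question. This is a minimal illustration with hypothetical names (LogOrdering, Sort); it assumes the log timestamps keep the fixed-width format shown in the question, so that ordinal string comparison matches chronological order, and it uses MsgSeqNum (tag 34) as the second key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LogOrdering
{
    // Sort by log timestamp first, then by MsgSeqNum to break ties.
    public static List<(string Timestamp, int MsgSeqNum)> Sort(
        IEnumerable<(string Timestamp, int MsgSeqNum)> entries) =>
        entries.OrderBy(e => e.Timestamp, StringComparer.Ordinal)
               .ThenBy(e => e.MsgSeqNum)
               .ToList();
}

public static class Program
{
    public static void Main()
    {
        // Outgoing (49=TRD) messages from the question, in logged order.
        var logged = new List<(string Timestamp, int MsgSeqNum)>
        {
            ("20171117-14:44:34.627", 6057),
            ("20171117-14:45:04.668", 6059),
            ("20171117-14:45:04.668", 6058),
        };

        foreach (var entry in LogOrdering.Sort(logged))
        {
            Console.WriteLine($"{entry.Timestamp} 34={entry.MsgSeqNum}");
        }
    }
}
```

Only one direction of traffic is sorted here, because each side of a FIX session numbers its own messages independently.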

Multiple messages sent by QuickFix with identical timestamps may be sent out of sequence.
A previous answer on StackOverflow suggested re-ordering them on the receiving side, but was not accepted:
QuickFix - messages out of sequence
If you decide to limit yourself to one message per millisecond, say with a sleep() command in between sends, be sure to increase your processes' scheduling priority: https://msdn.microsoft.com/en-us/library/windows/desktop/ms685100(v=vs.85).aspx
You normally get a very long sleep even though you asked for only one millisecond, but I've gotten roughly 1-2 ms with ABOVE_NORMAL_PRIORITY_CLASS. (Windows 10)
You might try to disable Nagle's algorithm, which aggregates multiple TCP messages together and sends them at once. Nagle in and of itself can't cause messages to be sent out of order, but QuickFix may be manually buffering the messages in some weird way. Try telling QuickFix to send them immediately with SocketNodelay: http://quickfixn.org/tutorial/configuration.html

Azure Storage Queues - Counting visible messages

I have a distributed application which shares load through Azure Storage queues. In order to verify that everything is working well, I wrote a small application that runs every 10 minutes and checks how many items are in each queue. If the number is above a threshold, it sends me a notification message.
This is how I'm running over all queues:
Dictionary<string, int> dic = new Dictionary<string, int>();
foreach (CloudQueue queue in QueuesToMonitor)
{
    queue.FetchAttributes();
    dic.Add(queue.Name, queue.ApproximateMessageCount.HasValue ? queue.ApproximateMessageCount.Value : -1);
}
This code works fine, but it also counts messages which are hidden. I want to exclude those messages from the count (because those tasks are not ready to be executed).
For example, I checked one of my queues and was told that 579 items are in the queue. But it is actually empty of visible items. I verified this with Azure Storage Explorer.
How can I count only the visible items in a queue?

Short answer to your question is that you can't get a count of only visible messages in a queue.
Approximate messages count will give you an approximate count of total messages in a queue and will include both visible and invisible messages.
One thing you could possibly do is PEEK at messages, which will return a list of visible messages. However, it will only return a maximum of the top 32 messages from the queue. So your logic for sending the notification message would work only if the threshold is less than 32.

Check that an event with a given integer number does not happen more than N times per second

My system generates events identified by an integer number. In total there are about 10 000 event numbers, from 1 to 10 000. Every time I receive a new event with some number, I need to check how many times I've already received an event with that number in the last second:
* if I have received this event more than ~3-10 times in the last second, then I need to ignore it
* otherwise I need to process it.
So I just need to detect and ignore a "flood" of events with the same number.
There are two requirements:
* the overhead of the flood control should be really MINIMAL, as it is used in HFT trading
* at the same time, I do not need "exact" control, just "rough" flood control. I.e. it's OK to stop receiving events somewhere between 3 and 10 events per second.
So my proposal would be:
* create an int[10 000] array
* every second, reset all items in this array to 0 (resetting an item of the array is atomic, and we can iterate over the array without any problems and without locking because we do not insert or delete items; however, perhaps someone can recommend a special function to "zero" an array, taking into account that I may read the array at the same time from another thread)
* every time a new event is received, Interlocked.Increment the corresponding item in the array, and process the event only if the result is less than a threshold (~3).
So flood control would be just one Interlocked.Increment operation and one comparison operation.
What do you think? Can you recommend something better?
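The proposal above can be sketched as a small class. This is a minimal illustration under stated assumptions: FloodGate, TryAccept and Reset are hypothetical names, the threshold is interpreted as "accept at most this many events per window", and the once-per-second call to Reset is left to the caller (for example a timer), so the window is only approximate.

```csharp
using System;
using System.Threading;

public sealed class FloodGate
{
    private readonly int[] _counts;
    private readonly int _threshold;

    public FloodGate(int maxEventId, int threshold)
    {
        _counts = new int[maxEventId + 1]; // event ids run from 1 to maxEventId
        _threshold = threshold;
    }

    // One atomic increment and one comparison per event, as in the proposal.
    public bool TryAccept(int eventId) =>
        Interlocked.Increment(ref _counts[eventId]) <= _threshold;

    // Starts a new window. Each int write is atomic, so readers never see a torn value;
    // an increment racing with the reset may land in either window.
    public void Reset()
    {
        for (int i = 0; i < _counts.Length; i++)
        {
            Volatile.Write(ref _counts[i], 0);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var gate = new FloodGate(10000, 3);

        for (int i = 0; i < 5; i++)
        {
            Console.WriteLine(gate.TryAccept(42)); // accepted until the threshold is passed
        }

        gate.Reset();
        Console.WriteLine(gate.TryAccept(42)); // accepted again in the new window
    }
}
```

The race between Reset and TryAccept can let a few extra events through around a window boundary, which fits the stated "rough" requirement.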

One problem with your approach is that if you clear the counters every second, you might have had a flood right before the end of the second, but since you've just cleared the counters you will continue accepting new events.
That might be OK for you, as you are fine with an approximation.
Another approach may be to have an array of queues of timestamps.
When a new event comes in, you get the relevant queue from the array and remove from its head all the timestamps that occurred more than a second in the past.
Then you check the size of the queue: if it is bigger than the threshold you do nothing; otherwise you add the new event's timestamp to the queue and process the event.
I realize that this approach might be slower than just incrementing integers, but it will be more accurate.
I suppose you can run some benchmarks to find out how much slower it is and whether it fits your needs or not.
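The queue-of-timestamps approach can be sketched as follows. This is a minimal illustration with hypothetical names (SlidingWindowGate, TryAccept); it assumes the caller supplies the current time in ticks, which keeps the example deterministic, rejects an event once the queue already holds the threshold number of timestamps, and uses a per-queue lock that the counter version does not need.

```csharp
using System;
using System.Collections.Generic;

public sealed class SlidingWindowGate
{
    private readonly Queue<long>[] _stamps;
    private readonly int _threshold;
    private readonly long _windowTicks;

    public SlidingWindowGate(int maxEventId, int threshold, TimeSpan window)
    {
        _stamps = new Queue<long>[maxEventId + 1]; // event ids run from 1 to maxEventId
        for (int i = 0; i < _stamps.Length; i++)
        {
            _stamps[i] = new Queue<long>();
        }
        _threshold = threshold;
        _windowTicks = window.Ticks;
    }

    public bool TryAccept(int eventId, long nowTicks)
    {
        Queue<long> queue = _stamps[eventId];
        lock (queue)
        {
            // Drop timestamps older than the window from the head of the queue.
            while (queue.Count > 0 && nowTicks - queue.Peek() > _windowTicks)
            {
                queue.Dequeue();
            }

            if (queue.Count >= _threshold)
            {
                return false; // too many recent events: ignore this one
            }

            queue.Enqueue(nowTicks);
            return true;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var gate = new SlidingWindowGate(10000, 3, TimeSpan.FromSeconds(1));
        long now = DateTime.UtcNow.Ticks;

        for (int i = 0; i < 5; i++)
        {
            Console.WriteLine(gate.TryAccept(42, now + i));
        }
    }
}
```

Compared with the counter array, each call here takes a lock and may dequeue several entries, so a benchmark along the lines suggested above would show whether the extra accuracy is worth the cost.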
