Azure Storage Queues - Counting visible messages - C#

I have a distributed application that shares load via Azure Storage queues. To verify that everything is working properly, I wrote a small application that runs every 10 minutes and checks how many items are in each queue. If the number is above a threshold, it sends me a notification message.
This is how I'm running over all queues:
Dictionary<string, int> dic = new Dictionary<string, int>();
foreach (CloudQueue queue in QueuesToMonitor)
{
    // Populates queue.ApproximateMessageCount from the service.
    queue.FetchAttributes();
    dic.Add(queue.Name, queue.ApproximateMessageCount ?? -1);
}
This code works fine, but it also counts hidden (invisible) messages. I want to exclude those from the count, because those tasks are not ready to be executed.
For example, I checked one of my queues and got back a count of 579 items, but the queue actually has no visible items. I verified this with Azure Storage Explorer:
How can I count only the visible items in a queue?

The short answer to your question is that you can't get a count of only the visible messages in a queue.
The approximate message count gives you an approximate count of the total messages in a queue, including both visible and invisible messages.
One thing you could possibly do is peek at messages, which returns only visible messages. However, it returns at most the top 32 messages in the queue, so your notification logic would only work if the threshold is less than 32.
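A minimal sketch of that approach, using the same WindowsAzure.Storage client as in the question (needs System.Linq for Count(); remember the 32-message cap):
// Peeks up to 32 visible messages without changing their visibility.
int visibleCount = queue.PeekMessages(32).Count();
if (visibleCount >= threshold) // only meaningful for thresholds below 32
    Notify(queue.Name, visibleCount); // Notify is a hypothetical helper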


How does Peek work in a partition-enabled Service Bus queue?

I understand from the Microsoft docs that during the first Peek() operation, any one of the available message brokers responds and sends its oldest message. Then on subsequent Peek() operations, we can traverse across the partitions, peeking every message with an increasing sequence number.
My question is: during the very first Peek() operation, I will get a message from whichever partition responds first. Is there a guarantee that I can peek all the messages in the queue?
Put more simply, suppose there are three partitions:
Partition "A" has 10 messages with sequence number from 1 to 10.
Partition "B" has 10 messages with sequence number from 11 to 20.
Partition "C" has 10 messages with sequence number from 21 to 30.
Now if I perform a Peek() operation and Partition "B" responds first, the first message I'll get is the one with sequence number 11. The next peek operation will look for a message with an incremented sequence number. Won't I miss the messages from Partition "A" (sequence numbers 1-10), which the peek operation can never reach since it always searches for the incremented sequence number?
UPDATE
QueueClient queueClient = messagingFactory.CreateQueueClient("QueueName", ReceiveMode.PeekLock);
BrokeredMessage message = null;
int iteration = 0;
while (iteration < messageCount)
{
    // According to the docs, Peek() returns the oldest message from whichever
    // broker responds first, and subsequent calls peek the message with the
    // next higher sequence number.
    message = queueClient.Peek();
    if (message == null)
        break;
    Console.WriteLine(message.SequenceNumber);
    iteration++;
}
Is there a guarantee that I can browse all the messages of a partitioned queue using the snippet above?
There is no guarantee that the returned message is the oldest one across all partitions.
It depends on which message broker responds first: the oldest message from that partition is returned. There is no general rule as to which partition will respond first in your example, but it is guaranteed that the oldest message from that partition is returned first.
If you want to retrieve messages by sequence number, use the overloaded Peek(fromSequenceNumber) method; see: https://learn.microsoft.com/en-us/azure/service-bus-messaging/message-browsing
For partitioned entities, the sequence number is issued relative to the partition.
[...]
The SequenceNumber value is a unique 64-bit integer assigned to a message as it is accepted and stored by the broker and functions as its internal identifier. For partitioned entities, the topmost 16 bits reflect the partition identifier.
(https://learn.microsoft.com/en-us/azure/service-bus-messaging/message-sequencing)
So you cannot compare sequence numbers across partitions to see which one is older.
As an example, I just created a partitioned queue and put a few messages into two partitions (in order):
1. Partition 1, SequenceNumber 61924494876344321
2. Partition 2, SequenceNumber 28991922601197569
3. Partition 1, SequenceNumber 61924494876344322
4. Partition 1, SequenceNumber 61924494876344323
5. Partition 2, SequenceNumber 28991922601197570
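Since the topmost 16 bits of the sequence number carry the partition identifier (per the quote above), a plain right shift makes the split visible for the numbers in this example:
long seqPartition1 = 61924494876344321; // first message in "Partition 1" above
long seqPartition2 = 28991922601197569; // first message in "Partition 2" above
Console.WriteLine(seqPartition1 >> 48); // 220 -- internal id behind "Partition 1"
Console.WriteLine(seqPartition2 >> 48); // 103 -- internal id behind "Partition 2"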
Browse/Peek messages: Available only in the older WindowsAzure.ServiceBus library. PeekBatch does not always return the number of messages specified in the MessageCount property. There are two common reasons for this behavior. One reason is that the aggregated size of the collection of messages exceeds the maximum size of 256 KB. Another reason is that if the queue or topic has the EnablePartitioning property set to true, a partition may not have enough messages to complete the requested number of messages. In general, if an application wants to receive a specific number of messages, it should call PeekBatch repeatedly until it gets that number of messages, or there are no more messages to peek.
(https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-partitioning, Emphasis added)
As such, you should be able to repeatedly call Peek / PeekBatch to eventually get all the messages. At least, if you use the official SDKs.
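For instance, a rough sketch with the older WindowsAzure.ServiceBus client, reusing queueClient from the snippet above (needs System.Linq; each batch may return fewer than 32 messages, as quoted):
var peeked = new List<BrokeredMessage>();
while (true)
{
    // PeekBatch advances the client's internal browse cursor, so repeated
    // calls walk forward through the queue; an empty batch means we're done.
    var batch = queueClient.PeekBatch(32).ToList();
    if (batch.Count == 0)
        break;
    peeked.AddRange(batch);
}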

How could messages get out of sequence?

I have used QuickFix/.NET for a long time, but in the last two days the engine appears to have sent messages out of sequence twice.
Here is an example; the third message is out of sequence:
20171117-14:44:34.627 : 8=FIX.4.4 9=70 35=0 34=6057 49=TRD 52=20171117-14:44:34.622 56=SS 10=208
20171117-14:44:34.635 : 8=FIX.4.4 9=0070 35=0 34=6876 49=SS 56=TRD 52=20171117-14:44:34.634 10=060
20171117-14:45:04.668 : 8=FIX.4.4 9=224 35=D 34=6059 49=TRD 52=20171117-14:45:04.668 56=SS 11=AGG-171117T095204000182 38=100000 40=D 44=112.402 54=2 55=USD/XXX 59=3 60=20171117-09:45:04.647 278=2cK-3ovrjdrk00X1j8h03+ 10=007
20171117-14:45:04.668 : 8=FIX.4.4 9=70 35=0 34=6058 49=TRD 52=20171117-14:45:04.642 56=SS 10=209
I understand that the QuickFix logger is not in a separate thread.
What could cause this to happen?
The message numbers are generated using the GetNextSenderMsgSeqNum method in QuickFix/n, which uses locking.
public int GetNextSenderMsgSeqNum()
{
    lock (sync_) { return this.MessageStore.GetNextSenderMsgSeqNum(); }
}
In my opinion, the messages are generated in sequence and your application is displaying them in a different order.
In some situations the sender and receiver get out of sync, with the receiver expecting a different sequence number; the initiator then tells the acceptor which sequence number is expected.
In that case, the sequence number can be changed to the expected one, either via the method call that updates the sequence, or by going to the store folder, opening the file with the .seqnums extension, and updating the sequence numbers there.
I hope this helps.
As the datetime is exactly the same on both messages, this may be a sorting problem. This is common in any sorted list where the index is identical for two different items. If this were within your own code, I would suggest resolving it by including an extra element as part of the key, such as a sequence number.
Multiple messages sent by QuickFix with identical timestamps may be sent out of sequence.
A previous answer on StackOverflow suggested re-ordering them on the receiving side, but was not accepted:
QuickFix - messages out of sequence
If you decide to limit yourself to one message per millisecond, say with a sleep() call between sends, be sure to increase your process's scheduling priority: https://msdn.microsoft.com/en-us/library/windows/desktop/ms685100(v=vs.85).aspx
You normally get a much longer sleep than the one millisecond you asked for, but I've gotten roughly 1-2 ms with ABOVE_NORMAL_PRIORITY_CLASS. (Windows 10)
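In C# this is a one-liner (ABOVE_NORMAL_PRIORITY_CLASS corresponds to ProcessPriorityClass.AboveNormal; needs System.Diagnostics):
// Raise the scheduling priority so short sleeps wake closer to 1-2 ms.
Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.AboveNormal;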
You might try to disable Nagle's algorithm, which aggregates multiple TCP messages together and sends them at once. Nagle in and of itself can't cause messages to be sent out of order, but QuickFix may be manually buffering the messages in some weird way. Try telling QuickFix to send them immediately with SocketNodelay: http://quickfixn.org/tutorial/configuration.html
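In QuickFix/n this is a session-configuration setting; a sketch of the relevant lines (see the configuration page linked above):
[SESSION]
SocketNodelay=Y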

Bigquery internalError when streaming data

I'm getting the following error while streaming data:
Google.Apis.Requests.RequestError
Internal Error [500]
Errors [
Message[Internal Error] Location[ - ] Reason[internalError] Domain[global]
]
My code:
public bool InsertAll(BigqueryService s, String datasetId, String tableId, List<TableDataInsertAllRequest.RowsData> data)
{
    try
    {
        TabledataResource t = s.Tabledata;
        TableDataInsertAllRequest req = new TableDataInsertAllRequest()
        {
            Kind = "bigquery#tableDataInsertAllRequest",
            Rows = data
        };
        TableDataInsertAllResponse response = t.InsertAll(req, projectId, datasetId, tableId).Execute();
        if (response.InsertErrors != null)
        {
            // Returns true when per-row insert errors came back.
            return true;
        }
    }
    catch (Exception)
    {
        throw; // rethrow without resetting the stack trace ("throw e" would)
    }
    return false;
}
I'm streaming data constantly, and many times a day I get this error. How can I fix it?
We've seen several problems:
the request randomly fails with type 'Backend error'
the request randomly fails with type 'Connection error'
the request randomly fails with type 'timeout' (watch out here, as only some rows may fail, not the whole payload)
some other error messages are non-descriptive, and so vague that they don't help you; just retry.
we see hundreds of such failures each day, so they are pretty much constant and not related to Cloud health.
For all of these we opened cases with paid Google Enterprise Support, but unfortunately they didn't resolve them. It seems the recommended option is exponential backoff with retry; even Support told us to do so. Also, the failure rate fits within the 99.9% uptime of the SLA, so there are no grounds for objection.
There's something to keep in mind with regard to the SLA: it's a very strictly defined structure; the details are here. The 99.9% is uptime, which does not translate directly into a failure rate. What this means is that if BQ has 30 minutes of downtime one month, and you do 10,000 inserts within that window but no inserts at other times of the month, the numbers will be skewed. This is why we suggest an exponential backoff algorithm. The SLA is explicitly based on uptime and not error rate, but logically the two correlate closely if you do streaming inserts throughout the month at different times with a backoff-and-retry setup. Technically, you should experience on average about 1 failed insert per 1,000 if you do inserts throughout the month and have a proper retry mechanism set up.
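A minimal sketch of such a backoff, assuming you wrap the InsertAll call from the question in it (retry counts, delays, and the exact exception type are illustrative and may differ with your client-library version; needs System and System.Threading):
// Retries a call with exponential backoff plus jitter. "call" would wrap
// t.InsertAll(req, projectId, datasetId, tableId).Execute() from above.
static T ExecuteWithBackoff<T>(Func<T> call, int maxRetries = 5)
{
    var rand = new Random();
    for (int attempt = 0; ; attempt++)
    {
        try
        {
            return call();
        }
        catch (Google.GoogleApiException) when (attempt < maxRetries)
        {
            // Wait 2^attempt seconds plus up to 1 s of jitter, then retry.
            Thread.Sleep(TimeSpan.FromSeconds(Math.Pow(2, attempt))
                         + TimeSpan.FromMilliseconds(rand.Next(1000)));
        }
    }
}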
You can check out this chart about your project health:
https://console.developers.google.com/project/YOUR-APP-ID/apiui/apiview/bigquery?tabId=usage&duration=P1D
About timings: since streaming has a limited payload size (see the Quota policy), it's easier to talk about timings, as the payload is limited in the same way for both of us; but I'll mention other side effects too.
We measure between 1200-2500 ms for each streaming request, and this has been consistent over the last month, as you can see in the chart.
The approach you've chosen, if it takes hours, means it does not scale and won't scale. You need to rethink the approach with async processes that can retry.
Processing IO-bound or CPU-bound tasks in the background is now common practice in most web applications. There's plenty of software to help build background jobs, some based on a messaging system like Beanstalkd.
Basically, you need to distribute insert jobs across a closed network, prioritize them, and consume (run) them. Well, that's exactly what Beanstalkd provides.
Beanstalkd gives you the ability to organize jobs in tubes, each tube corresponding to a job type.
You need an API/producer that can put jobs on a tube, say a JSON representation of the row. This was a killer feature for our use case. So we have an API that gets the rows and places them on a tube; this takes just a few milliseconds, so you can achieve fast response times.
On the other side, you now have a bunch of jobs on some tubes. You need an agent: an agent/consumer can reserve a job.
It also helps with job management and retries: when a job is processed successfully, the consumer can delete it from the tube. In case of failure, the consumer can bury the job; it will not be pushed back onto the tube, but will be available for further inspection.
A consumer can also release a job; Beanstalkd will push it back onto the tube and make it available to another client.
Beanstalkd clients are available in most common languages, and a web interface can be useful for debugging. A sketch of the flow follows.
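To make the flow concrete: IBeanstalkClient and its methods below are hypothetical stand-ins for whichever client library you pick, though the verbs match Beanstalkd's protocol (use, put, watch, reserve, delete, bury).
// Hypothetical client interface; real libraries expose similar verbs.
public interface IBeanstalkClient
{
    void Use(string tube);                 // producer: select target tube
    void Put(string jobBody);              // producer: enqueue a job
    void Watch(string tube);               // consumer: subscribe to a tube
    (long id, string body) Reserve();      // consumer: take a job
    void Delete(long id);                  // success: remove the job
    void Bury(long id);                    // failure: park it for inspection
}

// Producer: serialize the row and enqueue it (takes milliseconds).
void Produce(IBeanstalkClient client, string rowAsJson)
{
    client.Use("bigquery-inserts");
    client.Put(rowAsJson);
}

// Consumer: reserve a job, attempt the insert, delete on success, bury on failure.
void Consume(IBeanstalkClient client)
{
    client.Watch("bigquery-inserts");
    var (id, body) = client.Reserve();
    try
    {
        // ... perform the BigQuery streaming insert for "body" here ...
        client.Delete(id); // success: remove the job from the tube
    }
    catch (Exception)
    {
        client.Bury(id); // failure: keep the job around for inspection
    }
}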

Controlling that an event with a given integer number does not happen more than N times per second

My system generates events identified by an integer number. In total there are about 10,000 events, numbered 1 to 10,000. Every time I receive a new event, I need to check how many times I've already received an event with that number in the last second:
if I have received this event more than ~3-10 times in the last second, I need to ignore it
otherwise I need to process it.
So I just need to control and ignore a "flood" of events with the same number.
There are two requirements:
the overhead of the flood control should be absolutely MINIMAL, as it is used in HFT trading
at the same time, I do not need "exact" control; "rough" flood control is fine. I.e., it's OK to stop receiving events somewhere between 3 and 10 events per second.
So my proposal would be:
create an int[10 000] array
every second, reset all items in this array to 0 (resetting an item of the array is atomic, and we can iterate over the array without locking because we never insert or delete items; perhaps someone can recommend a special function to zero the array, taking into account that I may read the array from another thread at the same time)
every time a new event is received, Interlocked.Increment the corresponding item in the array, and process the event only if the result is below the threshold (~3).
So the flood control would be just one Interlocked.Increment operation and one comparison, as in the sketch below.
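A minimal sketch of this proposal (class and member names are illustrative; needs System and System.Threading):
class FloodControl
{
    private readonly int[] counts = new int[10000]; // one slot per event number
    private const int Threshold = 3;

    // Hot path: one atomic increment plus one comparison per event.
    public bool ShouldProcess(int eventNumber)
    {
        return Interlocked.Increment(ref counts[eventNumber - 1]) <= Threshold;
    }

    // Call once per second, e.g. from a System.Threading.Timer. Clearing is
    // not atomic as a whole, but individual int writes are, which is fine
    // given the "rough" control requirement.
    public void Reset()
    {
        Array.Clear(counts, 0, counts.Length);
    }
}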
What do you think? Can you recommend something better?
One problem with your approach: if you clear the counters every second, you might have had a flood right before the end of the second, but since you've just cleared the counters, you will continue accepting new events.
That might be OK for you, as you are fine with approximation.
Another approach would be an array of queues of timestamps.
When a new event comes in, you get the relevant queue from the array and drop from its head all the timestamps that occurred more than a second in the past.
Then you check the size of the queue: if it is bigger than the threshold, you do nothing; otherwise you enter the new event's timestamp into the queue and process the event.
I realize this approach may be slower than just incrementing integers, but it will be more accurate.
I suggest you run some benchmarks to find out how much slower it is and whether it fits your needs.
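A sketch of that variant, using Stopwatch ticks as timestamps and one short per-event lock (names are illustrative; needs System.Collections.Generic and System.Diagnostics):
class SlidingWindowFloodControl
{
    private readonly Queue<long>[] stamps = new Queue<long>[10000];
    private readonly int threshold;

    public SlidingWindowFloodControl(int threshold)
    {
        this.threshold = threshold;
        for (int i = 0; i < stamps.Length; i++)
            stamps[i] = new Queue<long>();
    }

    public bool ShouldProcess(int eventNumber)
    {
        var q = stamps[eventNumber - 1];
        long now = Stopwatch.GetTimestamp();
        long windowStart = now - Stopwatch.Frequency; // one second ago
        lock (q) // per-event lock keeps contention low
        {
            // Drop timestamps older than one second from the head.
            while (q.Count > 0 && q.Peek() < windowStart)
                q.Dequeue();
            if (q.Count >= threshold)
                return false; // flood: ignore this event
            q.Enqueue(now);
            return true;
        }
    }
}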

WPF dispatcher performance (100-200 updates/sec)

In a WPF Window, I've got a line chart that plots real-time data (Quinn-Curtis RealTime chart for WPF). In short, for each new value, I call a SetCurrentValue(x, y) method, and then the UpdateDraw() method to update the chart.
The data comes in via a TCP connection in another thread. Every new value that comes in raises a DataReceived event, and its handler should plot the value on the chart and then update it. Logically, I can't call UpdateDraw() directly, since my chart lives on the UI thread, which is not the thread where the data comes in.
So I call Dispatcher.Invoke(new Action(UpdateDraw)) - and this works fine, as long as I update at most 30 times/sec. When updating more often, the Dispatcher can't keep up and the chart updates more slowly than the data comes in. I tested this in a single-threaded setup with simulated data, and without the Dispatcher there are no problems.
So, my conclusion is that the Dispatcher is too slow for this situation. I actually need to update 100-200 times/sec!
Is there a way to put a turbo on the Dispatcher, or are there other ways to solve this? Any suggestions are welcome.
An option would be to use a shared queue to communicate the data.
Where the data comes in, you push it to the end of the queue:
lock (sharedQueue)
{
    sharedQueue.Enqueue(data);
}
On the UI thread, you find a way to read this data, e.g. using a timer:
var incomingData = new List<DataObject>();
lock (sharedQueue)
{
    while (sharedQueue.Count > 0)
        incomingData.Add(sharedQueue.Dequeue());
}
// Use the data in the incomingData list to plot.
The idea here is that you don't signal for every piece of data that comes in. Because you have a constant stream of data, I suspect that's not a problem. I'm not saying the exact implementation given above is the best one, but this is the general idea.
I'm not sure how you should check for new data, because I do not have enough insight into the details of the application, but this may be a start for you.
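For example, a DispatcherTimer runs its Tick handler on the UI thread, so it fits this polling pattern; a sketch reusing sharedQueue and DataObject from above (chart, its SetCurrentValue/UpdateDraw calls, and the X/Y members are assumptions based on the question):
// Poll the shared queue ~30 times per second on the UI thread.
var timer = new System.Windows.Threading.DispatcherTimer
{
    Interval = TimeSpan.FromMilliseconds(33)
};
timer.Tick += (s, e) =>
{
    var incomingData = new List<DataObject>();
    lock (sharedQueue)
    {
        while (sharedQueue.Count > 0)
            incomingData.Add(sharedQueue.Dequeue());
    }
    foreach (var d in incomingData)
        chart.SetCurrentValue(d.X, d.Y); // hypothetical members, per the question
    if (incomingData.Count > 0)
        chart.UpdateDraw(); // one redraw per tick instead of one per sample
};
timer.Start();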
Your requirements are bonkers: you seriously do NOT need 100-200 updates per second, especially as the screen normally refreshes at 60 updates per second. People won't see them anyway.
Enter new data into a queue.
Trigger a pull event on/for the dispatcher.
Sanitize the data in the queue (throw out duplicates; the last valid value wins) and put the result in.
30 updates per second are enough; people won't see a difference. I had performance issues on some financial data under high load with a T&S until I did that; now the graph looks better.
Keep Dispatcher calls as few as you can.
I'd still like to know why you want to update a chart 200 times per second when your monitor can't even display it that fast. (Remember, normal flat-screen monitors have a refresh rate of 60 fps.)
What's the use of updating something 200 times per second when you can only SEE updates 60 times per second?
You might as well batch incoming data and update the chart at 60 fps, since you won't be able to see the difference anyway.
If it's not just about displaying the data, and you're doing something else with it - say you are monitoring it to see if it reaches a certain threshold - then I recommend splitting the system into two parts: one part monitoring at full speed, the other independently displaying at the maximum speed your monitor can handle: 60 fps.
So please tell us why you want to update a UI control more often than it can be displayed to the user.
WPF drawing occurs on a separate thread. Depending on your chart's complexity, your PC would need a mega-decent video card to keep up with 100 frames per second. WPF uses Direct3D to draw everything on screen, and video-driver optimization for this was added in Vista (and improved in Windows 7). So on XP you might run into trouble simply because of your high data-output rate on a poorly suited OS.
Despite all that, I see no reason to print information to the screen at more than 30-60 frames per second. Come on! Even FPS shooters don't demand such strong reflexes from the player. Do you want to tell me your poor chart does? :) If this output produces some side effects that are what you actually need, then it's a completely different story. Tell us more about the problem then.
