How could messages get out of sequence? - c#

I have used QuickFix/.NET for a long time, but in the last two days the engine appears to have sent messages out of sequence twice.
Here is an example; the third message is out of sequence:
20171117-14:44:34.627 : 8=FIX.4.4 9=70 35=0 34=6057 49=TRD 52=20171117-14:44:34.622 56=SS 10=208
20171117-14:44:34.635 : 8=FIX.4.4 9=0070 35=0 34=6876 49=SS 56=TRD 52=20171117-14:44:34.634 10=060
20171117-14:45:04.668 : 8=FIX.4.4 9=224 35=D 34=6059 49=TRD 52=20171117-14:45:04.668 56=SS 11=AGG-171117T095204000182 38=100000 40=D 44=112.402 54=2 55=USD/XXX 59=3 60=20171117-09:45:04.647 278=2cK-3ovrjdrk00X1j8h03+ 10=007
20171117-14:45:04.668 : 8=FIX.4.4 9=70 35=0 34=6058 49=TRD 52=20171117-14:45:04.642 56=SS 10=209
I understand that the QuickFix logger is not in a separate thread.
What could cause this to happen?

The message numbers are generated by the GetNextSenderMsgSeqNum method in QuickFIX/n, which uses locking:
public int GetNextSenderMsgSeqNum()
{
    lock (sync_) { return this.MessageStore.GetNextSenderMsgSeqNum(); }
}
In my opinion, the messages are generated in sequence and your application is displaying them in a different order.
In some situations the sender and receiver are out of sync: the receiver expects a different sequence number, and the initiator sends a message to the acceptor saying that a different sequence number is expected.
In that case, the sequence number can be changed to the expected one, either with the method call that updates the sequence number, or by going to the store folder, opening the file with the .seqnums extension, and editing the sequence numbers there.
I hope this will help.

As the datetime is exactly the same on both messages, this may be a sorting problem. This is common in any sorted list where two different items share the same key. If this were within your own code, I would suggest resolving it by including an extra element as part of the key, such as a sequence number.
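As a rough illustration of the tie-breaker idea, the sketch below orders log entries from one sender by timestamp and then by sequence number (FIX tag 34). The LogEntry and LogSorter names are hypothetical and are not part of QuickFIX/n.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record of one logged FIX message; not a QuickFIX/n type.
public sealed class LogEntry
{
    public DateTime Timestamp { get; }
    public int MsgSeqNum { get; }

    public LogEntry(DateTime timestamp, int msgSeqNum)
    {
        Timestamp = timestamp;
        MsgSeqNum = msgSeqNum;
    }
}

public static class LogSorter
{
    // Sort by timestamp first, then by sequence number to break ties.
    public static List<LogEntry> Sort(IEnumerable<LogEntry> entries)
    {
        return entries
            .OrderBy(e => e.Timestamp)
            .ThenBy(e => e.MsgSeqNum)
            .ToList();
    }
}
```

With the two outbound messages from the question (both logged at 14:45:04.668), this puts 34=6058 before 34=6059.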

Multiple messages sent by QuickFix with identical timestamps may be sent out of sequence.
A previous answer on StackOverflow suggested re-ordering them on the receiving side, but was not accepted:
QuickFix - messages out of sequence
If you decide to limit yourself to one message per millisecond, say with a sleep() command in between sends, be sure to increase your process's scheduling priority: https://msdn.microsoft.com/en-us/library/windows/desktop/ms685100(v=vs.85).aspx
You normally get a very long sleep even though you asked for only one millisecond, but I've gotten roughly 1-2 ms with ABOVE_NORMAL_PRIORITY_CLASS. (Windows 10)
You might try to disable Nagle's algorithm, which aggregates multiple TCP messages together and sends them at once. Nagle in and of itself can't cause messages to be sent out of order, but QuickFix may be manually buffering the messages in some weird way. Try telling QuickFix to send them immediately with SocketNodelay: http://quickfixn.org/tutorial/configuration.html


How Peek works in a Partition enabled Service Bus Queue?

I understand from the Microsoft docs that during the first Peek() operation, any one of the available message brokers responds and sends its oldest message. Then, on subsequent Peek() operations, we can traverse the partitions to peek every message with an increased sequence number.
My question is, during the very first Peek() operation, I will get a message from any of the first responded partitions. Is there a guarantee that I can peek all the messages from the queue?
In a much simpler way, there are three Partitions:
Partition "A" has 10 messages with sequence number from 1 to 10.
Partition "B" has 10 messages with sequence number from 11 to 20.
Partition "C" has 10 messages with sequence number from 21 to 30.
Now if I perform a Peek() operation and Partition "B" responds first, the first message that I'll get is the one with sequence number 11. The next peek operation will look for a message with an incremented sequence number. Won't I miss the messages from Partition "A", which have sequence numbers 1-10, since the peek operation always searches for an incremented sequence number and can never reach them?
UPDATE
QueueClient queueClient = messagingFactory.CreateQueueClient("QueueName", ReceiveMode.PeekLock);
BrokeredMessage message = null;
while (iteration < messageCount)
{
    // According to the docs, this peeks the oldest message from any responding broker,
    // and later iterations peek the message with an incremented sequence number.
    message = queueClient.Peek();
    if (message == null)
        break;
    Console.WriteLine(message.SequenceNumber);
    iteration++;
}
Is there a guarantee that I can browse all the messages of a partitioned queue using the snippet above?
There is no guarantee that the returned message is the oldest one across all partitions.
It therefore depends which message broker responds first, and the oldest message from that partition will be shown. There is no general rule as to which partition will respond first in your example, but it is guaranteed that the oldest message from that partition is displayed first.
If you want to retrieve the messages by sequence number, use the Peek overload that takes a sequence number, Peek(fromSequenceNumber); see: https://learn.microsoft.com/en-us/azure/service-bus-messaging/message-browsing
For partitioned entities, the sequence number is issued relative to the partition.
[...]
The SequenceNumber value is a unique 64-bit integer assigned to a message as it is accepted and stored by the broker and functions as its internal identifier. For partitioned entities, the topmost 16 bits reflect the partition identifier.
(https://learn.microsoft.com/en-us/azure/service-bus-messaging/message-sequencing)
So you cannot compare sequence numbers across partitions to see which one is older.
As an example, I just created a partitioned queue and put a couple of messages into two partitions (in this order):
1. Partition 1, SequenceNumber 61924494876344321
2. Partition 2, SequenceNumber 28991922601197569
3. Partition 1, SequenceNumber 61924494876344322
4. Partition 1, SequenceNumber 61924494876344323
5. Partition 2, SequenceNumber 28991922601197570
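To see why these values cannot be compared across partitions, here is a small sketch that splits a sequence number into its parts. It assumes, as the quoted documentation states, that the topmost 16 bits hold the partition identifier; the SequenceNumberParts helper is hypothetical and not part of any Service Bus SDK.

```csharp
using System;

public static class SequenceNumberParts
{
    // Top 16 bits: partition identifier (assumption taken from the quoted docs).
    public static int PartitionId(long sequenceNumber)
    {
        return (int)((ulong)sequenceNumber >> 48);
    }

    // Lower 48 bits: the number issued relative to that partition.
    public static long PartitionLocalNumber(long sequenceNumber)
    {
        return sequenceNumber & 0xFFFFFFFFFFFFL;
    }
}
```

For the values listed above, 61924494876344321 decodes to identifier 220 with local number 1, and 28991922601197569 decodes to identifier 103 with local number 1. The large difference between the raw values comes from the identifier bits, not from message age.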
Browse/Peek messages: Available only in the older WindowsAzure.ServiceBus library. PeekBatch does not always return the number of messages specified in the MessageCount property. There are two common reasons for this behavior. One reason is that the aggregated size of the collection of messages exceeds the maximum size of 256 KB. Another reason is that if the queue or topic has the EnablePartitioning property set to true, a partition may not have enough messages to complete the requested number of messages. In general, if an application wants to receive a specific number of messages, it should call PeekBatch repeatedly until it gets that number of messages, or there are no more messages to peek.
(https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-partitioning, Emphasis added)
As such, you should be able to repeatedly call Peek / PeekBatch to eventually get all the messages. At least, if you use the official SDKs.
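The "call PeekBatch repeatedly" advice could be sketched roughly as follows. The peekBatch delegate stands in for a call such as queueClient.PeekBatch(n) (wrapped so that it returns a list); the BatchPeeker helper itself is hypothetical, and is written against a delegate so the loop can be tried without a Service Bus connection.

```csharp
using System;
using System.Collections.Generic;

public static class BatchPeeker
{
    // Keeps asking for the remaining number of messages until enough
    // have been collected or a call comes back empty.
    public static List<T> PeekUpTo<T>(Func<int, IReadOnlyList<T>> peekBatch, int wanted)
    {
        var result = new List<T>();
        while (result.Count < wanted)
        {
            IReadOnlyList<T> batch = peekBatch(wanted - result.Count);
            if (batch == null || batch.Count == 0)
            {
                break; // nothing left to peek
            }
            result.AddRange(batch);
        }
        return result;
    }
}
```

A short batch is treated as normal here: the loop only stops when a call returns nothing at all.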

Bigquery internalError when streaming data

I'm getting the following error while streaming data:
Google.Apis.Requests.RequestError
Internal Error [500]
Errors [
Message[Internal Error] Location[ - ] Reason[internalError] Domain[global]
]
My code:
public bool InsertAll(BigqueryService s, String datasetId, String tableId, List<TableDataInsertAllRequest.RowsData> data)
{
    try
    {
        TabledataResource t = s.Tabledata;
        TableDataInsertAllRequest req = new TableDataInsertAllRequest()
        {
            Kind = "bigquery#tableDataInsertAllRequest",
            Rows = data
        };
        TableDataInsertAllResponse response = t.InsertAll(req, projectId, datasetId, tableId).Execute();
        // Note: this returns true when the response reports per-row insert errors.
        if (response.InsertErrors != null)
        {
            return true;
        }
    }
    catch (Exception)
    {
        throw; // rethrow without resetting the stack trace
    }
    return false;
}
I'm streaming data constantly and many times a day I have this error. How can I fix this?
We have seen several problems:
the request randomly fails with type 'Backend error'
the request randomly fails with type 'Connection error'
the request randomly fails with type 'timeout' (watch out here, as only some rows fail and not the whole payload)
some other error messages are non-descriptive, and so vague that they don't help you; just retry.
we see hundreds of such failures each day, so they are pretty much constant and not related to Cloud health.
For all of these we opened cases with paid Google Enterprise Support, but unfortunately they didn't resolve them. It seems the recommended option is exponential backoff with retry; even support told us to do so. Also, the failure rate fits the 99.9% uptime we have in the SLA, so there are no grounds for objection.
There's something to keep in mind regarding the SLA: it's a very strictly defined structure, the details are here. The 99.9% is uptime, which does not translate directly into a failure rate. This means that if BQ has a 30-minute downtime one month, and you do 10,000 inserts within that period but no inserts at other times of the month, the numbers will be skewed. This is why we suggest an exponential backoff algorithm. The SLA is explicitly based on uptime and not error rate, but logically the two correlate closely if you do streaming inserts throughout the month at different times with a backoff-retry setup. Technically, you should experience on average about 1 failed insert in 1,000 if you are doing inserts throughout the month and have set up a proper retry mechanism.
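A minimal sketch of retry with exponential backoff, under some assumptions: the Retry helper is hypothetical (it is not part of the Google API client), it retries on any exception, and it doubles the delay after each failed attempt without adding jitter. The sleep delegate is injectable so the logic can be exercised without actually waiting.

```csharp
using System;
using System.Threading;

public static class Retry
{
    // Runs action, retrying on any exception with a delay that doubles
    // after each failed attempt. The last failure is rethrown to the caller.
    public static T WithBackoff<T>(Func<T> action, int maxAttempts,
                                   TimeSpan baseDelay, Action<TimeSpan> sleep = null)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        sleep = sleep ?? (d => Thread.Sleep(d));
        TimeSpan delay = baseDelay;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                sleep(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}
```

In the question's code, the Execute() call would be the action passed in. A real setup would probably retry only on transient errors (such as the 500 above) and also resend rows that failed individually.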
You can check out this chart about your project health:
https://console.developers.google.com/project/YOUR-APP-ID/apiui/apiview/bigquery?tabId=usage&duration=P1D
About times: since streaming has a limited payload size (see Quota policy), it's easier to talk about times, as the payload is limited in the same way for both of us, but I will mention other side effects too.
We measure between 1200-2500 ms for each streaming request, and this was consistent over the last month as you can see in the chart.
If the approach you've chosen takes hours, it does not scale and won't scale. You need to rethink the approach with async processes that can retry.
Processing IO-bound or CPU-bound tasks in the background is now common practice in most web applications. There's plenty of software to help build background jobs, some of it based on a messaging system like Beanstalkd.
Basically, you need to distribute insert jobs across a closed network, prioritize them, and consume (run) them. Well, that's exactly what Beanstalkd provides.
Beanstalkd gives the possibility to organize jobs in tubes, each tube corresponding to a job type.
You need an API/producer which can put jobs on a tube, let's say a JSON representation of the row. This was a killer feature for our use case. So we have an API which gets the rows and places them on a tube; this takes just a few milliseconds, so you can achieve fast response times.
On the other side, you now have a bunch of jobs on some tubes. You need an agent. An agent/consumer can reserve a job.
It also helps you with job management and retries: when a job is successfully processed, a consumer can delete the job from the tube. In the case of failure, the consumer can bury the job. This job will not be pushed back to the tube, but will be available for further inspection.
A consumer can also release a job; Beanstalkd will push this job back into the tube and make it available to another client.
Beanstalkd clients can be found for most common languages, and a web interface can be useful for debugging.
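To make the reserve/delete/release/bury lifecycle concrete, here is a toy in-memory model of a single tube. It is not a Beanstalkd client and does not talk to a server; ToyTube and JobOutcome are hypothetical names used only to show how the consumer's decision maps onto what happens to the job.

```csharp
using System;
using System.Collections.Generic;

public enum JobOutcome { Delete, Release, Bury }

// Toy in-memory model of one tube; NOT a Beanstalkd client.
public sealed class ToyTube
{
    private readonly Queue<string> ready = new Queue<string>();
    private readonly List<string> buried = new List<string>();

    public int ReadyCount { get { return ready.Count; } }
    public IReadOnlyList<string> Buried { get { return buried; } }

    // Producer side: put a job (e.g. a JSON row) on the tube.
    public void Put(string job) { ready.Enqueue(job); }

    // Consumer side: reserve the next job and let the consumer decide its fate.
    // Returns false when the tube has no ready jobs.
    public bool ProcessNext(Func<string, JobOutcome> consumer)
    {
        if (ready.Count == 0) return false;
        string job = ready.Dequeue(); // reserve
        switch (consumer(job))
        {
            case JobOutcome.Release:
                ready.Enqueue(job);   // back in the tube for another attempt
                break;
            case JobOutcome.Bury:
                buried.Add(job);      // kept aside for inspection
                break;
            case JobOutcome.Delete:
                break;                // processed successfully, job is gone
        }
        return true;
    }
}
```

A consumer for the streaming case might return Delete after a successful insert, Release after a transient error, and Bury when a row is rejected as invalid.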

control that event with integer number is not happening more than N times per second

My system generates events, each identified by an integer. In total there are about 10 000 event numbers, from 1 to 10 000. Every time I receive a new event, I need to check how many times I've already received an event with that number in the last second:
if I have received this event more than ~3-10 times in the last second, then I need to ignore it
otherwise I need to process it.
So I just need to detect and ignore a "flood" of events with the same number.
There are two requirements:
the overhead of the flood control should be really MINIMAL, as it is used in HFT trading
at the same time, I do not need "exact" control, just "rough" flood control. I.e. it's OK to stop receiving events somewhere between 3 and 10 events per second.
So my proposal would be:
create an int[10 000] array
every second, reset all items in this array to 0 (resetting an array item is atomic, and we can iterate over the array without problems and without locking because we do not insert or delete items; however, perhaps someone can recommend a special function to "zero" an array, taking into account that I may read the array at the same time from another thread)
every time a new event is received, we Interlocked.Increment the corresponding item in the array, and only if the result is less than a threshold (~3) do we process it.
So flood control would be just one Interlocked.Increment operation and one comparison.
What do you think? Can you recommend something better?
One problem with your approach is that if you clear the counters every second, you might have had a flood right before the end of the second, but since you've just cleared the counters you will continue accepting new events.
It might be OK for you as you are good with approximation only.
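For reference, the counter-per-second proposal from the question could look roughly like this. It is a sketch under assumptions: the class name is hypothetical, maxPerWindow is the number of events let through per window, and Reset is expected to be called about once per second from a timer or background thread.

```csharp
using System;
using System.Threading;

public sealed class FixedWindowFloodControl
{
    private readonly int[] counters;
    private readonly int maxPerWindow;

    public FixedWindowFloodControl(int eventIdCount, int maxPerWindow)
    {
        counters = new int[eventIdCount];
        this.maxPerWindow = maxPerWindow;
    }

    // Hot path: one atomic increment and one comparison.
    public bool ShouldProcess(int eventId)
    {
        return Interlocked.Increment(ref counters[eventId]) <= maxPerWindow;
    }

    // Starts a new window. Increments racing with the reset may be lost,
    // which only makes the control slightly more permissive.
    public void Reset()
    {
        for (int i = 0; i < counters.Length; i++)
        {
            Volatile.Write(ref counters[i], 0);
        }
    }
}
```

With event numbers from 1 to 10 000, the array would need 10 001 slots (or the index would be eventId - 1).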
Another approach may be to have an array of queues of time stamps.
When a new event comes in, you get the relevant queue from the array and remove from its head all the timestamps that are more than a second old.
Then you check the size of the queue: if it is bigger than the threshold you do nothing; otherwise you enter the new event's timestamp into the queue and process the event.
I realize that this approach might be slower than just incrementing integers, but it will be more accurate.
I suppose you can run some benchmarks to find out how much slower it is and whether it fits your needs or not.
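The queue-of-timestamps approach could be sketched as below. Assumptions: the class name is hypothetical, the current time is passed in as ticks (for example DateTime.UtcNow.Ticks) so the logic can be tested, and the code is not thread-safe, so each queue would need a lock if events arrive on several threads.

```csharp
using System;
using System.Collections.Generic;

public sealed class SlidingWindowFloodControl
{
    private readonly Queue<long>[] recent;
    private readonly int maxPerWindow;
    private readonly long windowTicks;

    public SlidingWindowFloodControl(int eventIdCount, int maxPerWindow, TimeSpan window)
    {
        recent = new Queue<long>[eventIdCount];
        for (int i = 0; i < recent.Length; i++)
        {
            recent[i] = new Queue<long>();
        }
        this.maxPerWindow = maxPerWindow;
        windowTicks = window.Ticks;
    }

    public bool ShouldProcess(int eventId, long nowTicks)
    {
        Queue<long> q = recent[eventId];
        // Drop timestamps that have fallen out of the window.
        while (q.Count > 0 && nowTicks - q.Peek() >= windowTicks)
        {
            q.Dequeue();
        }
        if (q.Count >= maxPerWindow)
        {
            return false; // flood: ignore the event
        }
        q.Enqueue(nowTicks);
        return true;
    }
}
```

Only accepted events are recorded, so the queue for each event number never holds more than maxPerWindow entries.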

C# best way to differentiate socket messages

I'm new to sockets, and I'm creating an online tic-tac-toe game. I know how to make the connections between the clients and the server, but I want to add a chat too.
So I'm doing this: when a user chats, I send a message with a prefix, "CHAT: HELLO WORLD",
and when a user makes a move, I send a message without the prefix... is this the best way?
THX!!!
In defining a wire protocol over a stream-based protocol like TCP, you have a few options for constructing messages:
Fixed-length
All messages are the same length; every sequence of x bytes represents a new message.
Length-prefixed (variable length)
The first byte(s) of the message represent the length of the payload to follow.
String-terminated (variable length)
Read bytes from the stream until you come to a specified byte-string that represents the end of a message, e.g. the newline character \n.
If you ever intend on changing the protocol (protip: you will, even if you don't think you will), it is crucial that you include an identifier for the protocol version in each message to prevent issues when dealing with clients using an older iteration of the protocol. Clearly, this is the first thing you must determine before deciphering the rest of the payload, so this should be the first byte(s) of the message (following any length-prefix) - how could we determine the version if we don't know where it is located in every message we receive?
Typically you would go with a format that includes a packet length, type and payload.
In your case you could go with a Byte (type), Int16 (length), Byte[] (payload).
The type can be represented in code as an enum. Length would just represent the length of the payload.
public enum PacketType : byte
{
    PlayerMove = 1,
    PlayerChat = 2
}
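A minimal sketch of writing such a packet, assuming the layout described above: one type byte, a 16-bit length, then the payload. The PacketCodec helper is hypothetical, and writing the length high byte first is an arbitrary choice that both sides would have to agree on.

```csharp
using System;

public enum PacketType : byte
{
    PlayerMove = 1,
    PlayerChat = 2
}

public static class PacketCodec
{
    public const int HeaderSize = 3; // 1 byte type + 2 bytes length

    public static byte[] Encode(PacketType type, byte[] payload)
    {
        if (payload == null) throw new ArgumentNullException(nameof(payload));
        if (payload.Length > short.MaxValue)
        {
            throw new ArgumentException("Payload does not fit in an Int16 length.", nameof(payload));
        }
        var packet = new byte[HeaderSize + payload.Length];
        packet[0] = (byte)type;
        packet[1] = (byte)(payload.Length >> 8);   // length, high byte first
        packet[2] = (byte)(payload.Length & 0xFF);
        Buffer.BlockCopy(payload, 0, packet, HeaderSize, payload.Length);
        return packet;
    }
}
```

For example, encoding the UTF-8 bytes of "HELLO WORLD" as PacketType.PlayerChat gives a 14-byte packet: 3 header bytes followed by the 11 payload bytes.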
You need to define a protocol. Remember to allow room for additional features :-).
E.g. using regular expressions over complete lines (each ending with the chosen line terminator):
A line matching ^:[a-c][1-3]: is a move (colon, position, colon, user name).
A line matching ^!.*?: is a chat message (exclamation point, name, colon, text).
Anything else (in V1) is an error.
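The line-based rules above could be checked along these lines. The LineProtocol and LineKind names are hypothetical; the two patterns are the ones given in the answer, and the line terminator is assumed to have been stripped already.

```csharp
using System.Text.RegularExpressions;

public enum LineKind { Move, Chat, Error }

public static class LineProtocol
{
    private static readonly Regex MovePattern = new Regex(@"^:[a-c][1-3]:");
    private static readonly Regex ChatPattern = new Regex(@"^!.*?:");

    // Classifies one complete line.
    public static LineKind Classify(string line)
    {
        if (MovePattern.IsMatch(line)) return LineKind.Move;
        if (ChatPattern.IsMatch(line)) return LineKind.Chat;
        return LineKind.Error;
    }
}
```

Because a move starts with ':' and a chat message with '!', the first character already tells the two apart, which avoids ambiguity.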
Remember:
Data is sent in packets, you might need multiple reads from the socket to get a complete message.
Avoid ambiguity: resolving "it might be x or y" is hard.
Specify a text encoding (eg. UTF-8).
I assume you're using TCP?
You need to make sure you 'frame' both messages so you can identify them and also avoid potential blocking issues (in case the client stops sending while you are still expecting to read CHAT: or whatever you define). With TCP your byte order is guaranteed but reading does not guarantee a complete 'packet' so you'll need to implement some way of building up a buffer and identifying when your 'message' is complete.
A reasonably simple way of doing this is to make sure each 'message' has a header with the type and size specified.
E.g.:
Enumerate your message types (move and chat currently), so say 'chat' is 0x01 and your message is 1020 bytes. You can prefix your 'message' with 0x0103FC (type 0x01, length 0x03FC = 1020) so the server knows how many bytes to expect, and build up a buffer using async socket calls until the 1020 bytes are read (or you arbitrarily decide that the client is not sending anymore).
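Building up the buffer could be sketched like this, assuming the header from the example: one type byte followed by a two-byte length, high byte first. The FrameAssembler class is hypothetical; it is fed whatever each socket read returned and hands back every frame that is complete so far.

```csharp
using System;
using System.Collections.Generic;

public sealed class FrameAssembler
{
    private const int HeaderSize = 3; // 1 byte type + 2 bytes length
    private readonly List<byte> buffer = new List<byte>();

    // data/count are the buffer and byte count from one socket read.
    // Returns (type, payload) pairs for all frames completed by this read.
    public List<KeyValuePair<byte, byte[]>> Append(byte[] data, int count)
    {
        for (int i = 0; i < count; i++)
        {
            buffer.Add(data[i]);
        }
        var frames = new List<KeyValuePair<byte, byte[]>>();
        while (buffer.Count >= HeaderSize)
        {
            int length = (buffer[1] << 8) | buffer[2];
            if (buffer.Count < HeaderSize + length)
            {
                break; // payload incomplete, wait for more bytes
            }
            byte[] payload = buffer.GetRange(HeaderSize, length).ToArray();
            frames.Add(new KeyValuePair<byte, byte[]>(buffer[0], payload));
            buffer.RemoveRange(0, HeaderSize + length);
        }
        return frames;
    }
}
```

This sketch has no timeout and no maximum frame size; deciding when a client has stopped sending, as mentioned above, would be handled outside it.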

Problem reading from a TCPClient

I'm making a simple client application in C#, and have reached a problem.
The server application sends a string in the format of "<number> <param> <param>" etc. In other words, the first symbol is an integer, and the rest are whatever, all are separated by one space each.
The problem I get when reading this string is that my program first reads a string with the <number>, and then the next time I read I get the rest of the message.
For example, if I were to do a writeline on what I receive, it would look like this:
(if he sends "1 0 0 0")
1
0 0 0
(EDIT: The formatting doesn't seem to permit this. The 1 is on a row of its own, the rest are supposed to be on the row below, including the space preceding the first 0)
I've run out of ideas how to fix this. Here's the method (I commented out some stuff I tried):
http://pastebin.com/0bXC9J2f
EDIT (again): I forgot, it seems to work just fine when I'm in debug and just go through everything step by step, so I can't find any source of the problem that way.
TCP is stream based and not message based. One Read can contain any of the following alternatives:
A teeny weeny part of message
A half message
Exactly one message
One and a half message
Two messages
Thus you need to use some kind of method to see whether a complete message has arrived. The most common methods are:
Add a footer (for instance an empty line) which indicates end of message
Add a fixed length header containing the length of the message
If your protocol is straight TCP, then you cannot send messages, strings or anything else except octet (byte) streams. Does your 'string' have a null at the end? If so, you need to append received data until the null arrives; then you have your message.
If this is your problem, then you should code your protocol so that it works no matter how many read calls are made on the socket. E.g. if a null-terminated string of [99 data bytes+#0] is sent by the server, your protocol should be able to assemble the correct string whether 100 bytes are returned in one call, 1 byte is received in each of 100 calls, or anything in between.
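A sketch of assembling null-terminated strings regardless of how the reads are split. Assumptions: the server terminates each message with a zero byte and encodes text as UTF-8; the NullTerminatedAssembler class is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public sealed class NullTerminatedAssembler
{
    private readonly List<byte> pending = new List<byte>();

    // data/count are the buffer and byte count from one read call.
    // Returns every message completed by this read (possibly none).
    public List<string> Append(byte[] data, int count)
    {
        var messages = new List<string>();
        for (int i = 0; i < count; i++)
        {
            if (data[i] == 0)
            {
                messages.Add(Encoding.UTF8.GetString(pending.ToArray()));
                pending.Clear();
            }
            else
            {
                pending.Add(data[i]);
            }
        }
        return messages;
    }
}
```

With the example from the question, a first read containing only "1" yields no message, and a second read containing " 0 0 0" plus the terminator yields the single message "1 0 0 0".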
Rgds,
Martin
