How to have a Consumer start from the last unconsumed message? - c#

Using a handler passed to SetPartitionsAssignedHandler I need to be able to consume from the message immediately following the last message consumed by the group.
I have a consumer on a single-partition topic. Because I need to be able to set a custom offset in certain circumstances, I have implemented a handler and passed it to SetPartitionsAssignedHandler. If the handler determines that a specific offset is needed, it figures out the offset and returns a TopicPartitionOffset with that value set in an Offset instance. This works as expected. What does not work is the case where no specific offset is needed. I've tried returning:
* a TopicPartitionOffset with Offset.End - consumes from the next message posted to the topic
* a TopicPartitionOffset with Offset.Beginning - consumes from the beginning of the partition
* a TopicPartitionOffset with either Offset.Stored or Offset.Unset - consumes from the last message consumed, but always consumes that message again. I've checked that that's what's happening by looking at the offset of the first message consumed.
* nothing - the consumer never consumes any messages
I've searched around, including going through the code, but the TopicPartitionOffset information is passed into librdkafka.dll, which determines what is done with the offset information, so I can't see why Stored and Unset both reconsume the last consumed message. I also can't see why not returning a TopicPartitionOffset results in the consumer never consuming anything.
The group ID is consistent, so it's not a case of the group ID changing that's causing a problem. And if the group ID was changing then using Offset.Stored or Offset.Unset would result in the partition being read from the beginning anyway.
I've found this same question asked with no answer at consumer consuming the same message twice at the starting only. I've also looked at How to make kafka consumer to read from last consumed offset but not from beginning, but setting the offset reset to earliest and the group ID does not result in the desired behaviour, because giving a handler to SetPartitionsAssignedHandler evidently overrides whatever the default behaviour would be. I didn't find any other questions that seemed relevant, and so far no other relevant information has come up anywhere. I did also go through the existing issues listed on the GitHub repo before looking through the code to see if I could spot anything there.
The ConsumerConfig is constructed like this.
private ConsumerConfig GetConsumerConfig(Config.Consumer config)
{
    ConsumerConfig consumerCfg = GetBaseConfiguration();
    if (!consumerCfg.Any())
    {
        return consumerCfg;
    }
    consumerCfg.BootstrapServers = "localhost:9092";
    consumerCfg.GroupId = "TestConsumer";
    consumerCfg.EnableAutoCommit = false;
    consumerCfg.AutoOffsetReset = AutoOffsetReset.Earliest;
    return consumerCfg;
}

private ConsumerConfig GetBaseConfiguration()
{
    Option<string> ipAddr = LocalIpAddress.GetLocalIpAddress();
    return ipAddr.HasValue
        ? new ConsumerConfig()
        {
            ClientId = $"{ipAddr.ValueOrFailure()}",
            AutoCommitIntervalMs = 1000,
            SessionTimeoutMs = 30000,
            StatisticsIntervalMs = 60000,
            FetchMinBytes = 64 * 1024,
            FetchWaitMaxMs = 200000,
            MaxPartitionFetchBytes = 3 * 102400
        }
        : new ConsumerConfig();
}
The Group ID is in the app.config, so it's always the same unless the config is changed. I'm not changing it between executions of the app.
The consumer is constructed with this configuration.
private IConsumer<string, string> CreateConsumer(ConsumerConfig config)
{
    return new ConsumerBuilder<string, string>(config)
        .SetErrorHandler(OnConsumeError)
        .SetStatisticsHandler(OnStatistics)
        .SetLogHandler(OnLog)
        .SetOffsetsCommittedHandler(OnOffsetCommit)
        .SetPartitionsAssignedHandler(OnPartitionsAssigned)
        .Build();
}
When the consumer connects it subscribes to one or more configured topics, usually just one, by calling
IConsumer<T,U>.Subscribe(List<string>)
The exact configuration in the app.config isn't relevant here - it consists of the topic name, an optional offset and information on where the message goes for processing.
This is simplified code representing the construction of the TopicPartitionOffset when no specific offset is needed (and with AutoOffsetReset.Earliest hardcoded to force use of Offset.Stored).
private List<TopicPartitionOffset> OnPartitionsAssigned(
    IConsumer<string, string> consumer,
    List<TopicPartition> topicPartitions)
{
    List<TopicPartitionOffset> offsetPartitions =
        topicPartitions.Select(partition => GetPartitionOffset(AutoOffsetReset.Earliest, partition))
                       .ToList();
    return offsetPartitions;
}

private static TopicPartitionOffset GetPartitionOffset(AutoOffsetReset offsetReset, TopicPartition partition)
{
    return new TopicPartitionOffset(
        partition.Topic,
        partition.Partition,
        AutoOffsetReset.Latest == offsetReset ? Offset.End : Offset.Stored);
}
The only significant difference when a specific offset is needed is that instead of Offset.End or Offset.Stored a numeric value is determined and used.
I expected that using Offset.Stored would result in consuming from the message after the last message consumed on the partition (by the group). But it always results in reconsuming the last message that was consumed.
Update:
After further investigation I tried getting the committed offsets for the partitions in the handler passed to SetPartitionsAssignedHandler. Then, instead of using one of the Offset special values, I construct a TopicPartitionOffset with an Offset whose Value is 1 higher than the last committed offset. This works well, except for cases where multiple messages were received in a batch. It appears that Commit(IEnumerable) has a bug which results in only the first offset for a partition being committed. So if we receive 3 messages on a given partition and I include multiple offsets for that partition, only the lowest one is committed.
So pseudo code that produces the desired result (when not using auto-commit), sketched in C# below:
1. Set each partition's offset to one greater than the last committed offset.
2. Read messages.
3. After processing a batch of messages, get the highest offset for each partition and commit that list.
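A minimal sketch of that pseudo code, assuming Confluent.Kafka 1.x (using the Confluent.Kafka and System.Linq namespaces); the five-second timeout and the CommitHighestPerPartition helper name are mine, not part of the real code:
private List<TopicPartitionOffset> OnPartitionsAssigned(
    IConsumer<string, string> consumer,
    List<TopicPartition> topicPartitions)
{
    // Ask the broker for the last committed offset of each assigned partition.
    List<TopicPartitionOffset> committed =
        consumer.Committed(topicPartitions, TimeSpan.FromSeconds(5));

    return committed
        .Select(tpo => tpo.Offset == Offset.Unset
            // Nothing committed yet for this partition; let auto.offset.reset decide.
            ? new TopicPartitionOffset(tpo.TopicPartition, Offset.Unset)
            // Resume from the message after the last committed offset.
            : new TopicPartitionOffset(tpo.TopicPartition, new Offset(tpo.Offset.Value + 1)))
        .ToList();
}

// After processing a batch, commit only the highest consumed offset per partition.
private void CommitHighestPerPartition(
    IConsumer<string, string> consumer,
    IEnumerable<ConsumeResult<string, string>> batch)
{
    List<TopicPartitionOffset> highest = batch
        .GroupBy(result => result.TopicPartition)
        .Select(group => new TopicPartitionOffset(group.Key, new Offset(group.Max(r => r.Offset.Value))))
        .ToList();

    consumer.Commit(highest);
}
Note that the usual Kafka convention is to commit the offset of the next message to read (consumed offset + 1), in which case the committed value could be used directly on assignment; the sketch instead follows the pseudo code above and adds 1 at assignment time.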

Related

Biztalk XLANG transform() output the same random value in a loop inside Biztalk orchestration

I wrote a map to generate an HL7 message header (MSH).
The MSH.10 segment by definition should be unique, so I put the following in my map.
public string MessageControlId()
{
    //return System.DateTime.Now.ToString("yyyyMMddHHmmssffff");
    string firstPart = System.DateTime.Now.ToString("yyyyMMdd");
    string middlePart = new Random().Next(1000, 9999).ToString();
    string lastPart = System.DateTime.Now.ToString("ffff");
    return firstPart + middlePart + lastPart;
}
Then in my orchestration I call the header map multiple times in a loop. My goal is to generate multiple HL7 messages, each with its own message header and a unique MSH.10 value.
The code below is based on Microsoft Biztalk XLANG syntax which invokes the map to transform and create the message header via the transform() statement.
tMapType = System.Type.GetType(msgBre.HeaderMapName);
transform (msgHeader) = tMapType(msgBilling);
However, when I tested this out I see multiple HL7 messages generated, but many of them have duplicate values in their MSH.10 segment. I grouped them in different colors below.
I expected a separate value each time because in my code I generate a random number between 1000 and 9999, plus I also generate the time value down to fractions of a second.
Do you know why this occurs? My only theory is that when I call the transform() function, it does not really invoke the map to recreate the header each time... that seems wrong to me.
UPDATE:
Thanks to hulihunskeli's input, I was able to solve this by going into my orchestration in BizTalk and adding a 200ms delay just prior to the loop repeating, which seems to have solved it. I guess this is one of those things where the processing time of the loop is just too quick for the function to generate a new object that would ensure a unique number.
You are using a Random object, which is a pseudo-random number generator, so it returns the same sequence of numbers for the same seed. You are not giving a seed explicitly to the constructor, so it uses the default seed, which is based on the system clock. If you create Random objects in a tight loop with the default seed, the Next() function will return the same number multiple times, which I think is what happens here.
You should either give a unique seed explicitly or use the same Random object all the time (if that is possible).
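For illustration, a minimal change along those lines (the static field is just for this sketch; if the map can run on multiple threads at once you would also need to synchronise access to it):
// Reuse one Random instance so successive calls continue a single pseudo-random
// sequence instead of re-seeding from the clock on every call.
private static readonly Random _random = new Random();

public string MessageControlId()
{
    string firstPart = System.DateTime.Now.ToString("yyyyMMdd");
    string middlePart = _random.Next(1000, 9999).ToString();
    string lastPart = System.DateTime.Now.ToString("ffff");
    return firstPart + middlePart + lastPart;
}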

Confluent Kafka returns -1001 for offset position

I'm trying to get the Kafka offset using Confluent Kafka.
This is the code I'm using to obtain it:
var offsetPosition = consumer.Position(new TopicPartition(topicConfiguration.Topic, topicConfiguration.Partition));
It always gives me a value of -1001 though. What am I doing wrong?
Additional Info
I think this may be because it is Unset. This is what the doc says:
Unset in case there was no previous message consumed by this consumer.
I'm not sure what I should do with this though.
You aren't doing anything wrong, that is the default value. From the docs:
If an offset value of Offset.Invalid (-1001) is specified, consumption
will resume from the last committed offset, or according to the
'auto.offset.reset' configuration parameter if no offsets have been
committed yet.
If you want to specify where to start from, you can use the consumer.Assign method.
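For example, something along these lines; the offset value here is only a placeholder for wherever you actually want to start:
// Assign the consumer directly to the partition, starting from an explicit offset.
var startFrom = new TopicPartitionOffset(
    topicConfiguration.Topic,
    topicConfiguration.Partition,
    new Offset(42)); // placeholder offset

consumer.Assign(startFrom);

// Position() only reports a real offset once the consumer has fetched a message
// from the partition; until then it returns Offset.Unset (-1001).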

How Peek works in a Partition enabled Service Bus Queue?

I understand from the Microsoft docs that during the first Peek() operation, any one of the available message brokers responds and sends its oldest message. Then on subsequent Peek() operations, we can traverse across the partitions to peek every message with an increased sequence number.
My question is: during the very first Peek() operation, I will get a message from whichever partition responds first. Is there a guarantee that I can peek all the messages from the queue?
To put it more simply, say there are three partitions:
Partition "A" has 10 messages with sequence number from 1 to 10.
Partition "B" has 10 messages with sequence number from 11 to 20.
Partition "C" has 10 messages with sequence number from 21 to 30.
Now if I perform a Peek() operation and Partition "B" responds first, the first message that I'll get is the message with sequence number 11. The next peek operation will look for a message with an incremented sequence number. Won't I miss the messages from Partition "A" with sequence numbers 1-10, which the peek operation can never reach since it always searches for an incremented sequence number?
UPDATE
QueueClient queueClient = messagingFactory.CreateQueueClient("QueueName", ReceiveMode.PeekLock);
BrokeredMessage message = null;
while (iteration < messageCount)
{
    // According to the docs, Peek() returns the oldest message from any responding
    // broker, and subsequent iterations peek the message with the next sequence number.
    message = queueClient.Peek();
    if (message == null)
        break;
    Console.WriteLine(message.SequenceNumber);
    iteration++;
}
Is there a guarantee that I can browse all the messages of a partitioned queue using the snippet above?
There is no guarantee that the returned message is the oldest one across all partitions.
It therefore depends on which message broker responds first, and the oldest message from that partition will be shown. There is no general rule as to which partition will respond first in your example, but it is guaranteed that the oldest message from that partition is displayed first.
If you want to retrieve the messages by sequence number, use the overloaded Peek method Peek(Sequencenumber), see: https://learn.microsoft.com/en-us/azure/service-bus-messaging/message-browsing
For partitioned entities, the sequence number is issued relative to the partition.
[...]
The SequenceNumber value is a unique 64-bit integer assigned to a message as it is accepted and stored by the broker and functions as its internal identifier. For partitioned entities, the topmost 16 bits reflect the partition identifier.
(https://learn.microsoft.com/en-us/azure/service-bus-messaging/message-sequencing)
So you cannot compare sequence numbers across partitions to see which one is older.
As an example, I just created a partitioned queue and put a couple of messages into two partitions (in order):
1. Partition 1, SequenceNumber 61924494876344321
2. Partition 2, SequenceNumber 28991922601197569
3. Partition 1, SequenceNumber 61924494876344322
4. Partition 1, SequenceNumber 61924494876344323
5. Partition 2, SequenceNumber 28991922601197570
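Incidentally, you can see the partition prefix in those numbers by splitting off the topmost 16 bits; a quick sketch using the first sequence number from the list above:
// Split a partitioned-entity sequence number into the partition prefix
// (topmost 16 bits) and the per-partition counter (remaining 48 bits).
long sequenceNumber = 61924494876344321;
long partitionPrefix = sequenceNumber >> 48;                 // 220 for this number
long perPartitionCounter = sequenceNumber & 0xFFFFFFFFFFFF;  // 1 for this number
Console.WriteLine($"partition prefix {partitionPrefix}, counter {perPartitionCounter}");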
Browse/Peek messages: Available only in the older WindowsAzure.ServiceBus library. PeekBatch does not always return the number of messages specified in the MessageCount property. There are two common reasons for this behavior. One reason is that the aggregated size of the collection of messages exceeds the maximum size of 256 KB. Another reason is that if the queue or topic has the EnablePartitioning property set to true, a partition may not have enough messages to complete the requested number of messages. In general, if an application wants to receive a specific number of messages, it should call PeekBatch repeatedly until it gets that number of messages, or there are no more messages to peek.
(https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-partitioning, Emphasis added)
As such, you should be able to repeatedly call Peek / PeekBatch to eventually get all the messages. At least, if you use the official SDKs.
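A minimal sketch of that using the older WindowsAzure.ServiceBus QueueClient from your snippet (requires System.Linq; the batch size of 50 is arbitrary):
// Peek keeps a cursor per client, so repeated calls walk the whole queue,
// partition by partition, until nothing more is returned.
var peeked = new List<BrokeredMessage>();
while (true)
{
    IEnumerable<BrokeredMessage> batch = queueClient.PeekBatch(50);
    if (!batch.Any())
        break;
    peeked.AddRange(batch);
}
Console.WriteLine($"Peeked {peeked.Count} messages in total.");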

How could messages get out of sequence?

I have used QuickFix/.NET for a long time, but in the last two days the engine appears to have sent messages out of sequence twice.
Here is an example, the 3rd message is out of sequence:
20171117-14:44:34.627 : 8=FIX.4.4 9=70 35=0 34=6057 49=TRD 52=20171117-14:44:34.622 56=SS 10=208
20171117-14:44:34.635 : 8=FIX.4.4 9=0070 35=0 34=6876 49=SS 56=TRD 52=20171117-14:44:34.634 10=060
20171117-14:45:04.668 : 8=FIX.4.4 9=224 35=D 34=6059 49=TRD 52=20171117-14:45:04.668 56=SS 11=AGG-171117T095204000182 38=100000 40=D 44=112.402 54=2 55=USD/XXX 59=3 60=20171117-09:45:04.647 278=2cK-3ovrjdrk00X1j8h03+ 10=007
20171117-14:45:04.668 : 8=FIX.4.4 9=70 35=0 34=6058 49=TRD 52=20171117-14:45:04.642 56=SS 10=209
I understand that the QuickFix logger is not in a separate thread.
What could cause this to happen?
The message sequence numbers are generated using the GetNextSenderMsgSeqNum method in QuickFix/n, which uses locking.
public int GetNextSenderMsgSeqNum()
{
    lock (sync_) { return this.MessageStore.GetNextSenderMsgSeqNum(); }
}
In my opinion, the messages are generated in sequence and your application is displaying them in a different order.
In some situations the sender and receiver get out of sync, where the receiver expects a different sequence number; the initiator then tells the acceptor which sequence number is expected.
In that case, the sequence number can be changed to the expected one, either by the method call to update the sequence, or by going to the store folder, opening the file with the .seqnums extension and updating the sequence numbers.
I hope this will help.
As the datetime is exactly the same on both messages, this may be a problem of sorting. This is common across any sorted list where the index is identical on two different items. If this were within your own code, I would suggest that to resolve it you include an extra element as part of the key, such as a sequence number.
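For instance, if you re-order the messages yourself, a rough sketch (assuming you have parsed each log entry into an object exposing its SendingTime, tag 52, and MsgSeqNum, tag 34):
// Order by sending time, then use the FIX sequence number as a tiebreaker so
// messages that share the same millisecond keep their original order.
var ordered = messages
    .OrderBy(m => m.SendingTime)  // tag 52
    .ThenBy(m => m.MsgSeqNum)     // tag 34
    .ToList();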
Multiple messages sent by QuickFix with identical timestamps may be sent out of sequence.
A previous answer on StackOverflow suggested re-ordering them on the receiving side, but was not accepted:
QuickFix - messages out of sequence
If you decide to limit yourself to one message per millisecond, say with a sleep() command in between sends, be sure to increase your process's scheduling priority: https://msdn.microsoft.com/en-us/library/windows/desktop/ms685100(v=vs.85).aspx
You normally get a very long sleep even though you asked for only one millisecond, but I've gotten roughly 1-2 ms with ABOVE_NORMAL_PRIORITY_CLASS. (Windows 10)
You might try to disable Nagle's algorithm, which aggregates multiple TCP messages together and sends them at once. Nagle in and of itself can't cause messages to be sent out of order, but QuickFix may be manually buffering the messages in some weird way. Try telling QuickFix to send them immediately with SocketNodelay: http://quickfixn.org/tutorial/configuration.html

Bigquery internalError when streaming data

I'm getting the following error while streaming data:
Google.Apis.Requests.RequestError
Internal Error [500]
Errors [
Message[Internal Error] Location[ - ] Reason[internalError] Domain[global]
]
My code:
public bool InsertAll(BigqueryService s, String datasetId, String tableId, List<TableDataInsertAllRequest.RowsData> data)
{
    try
    {
        TabledataResource t = s.Tabledata;
        TableDataInsertAllRequest req = new TableDataInsertAllRequest()
        {
            Kind = "bigquery#tableDataInsertAllRequest",
            Rows = data
        };
        TableDataInsertAllResponse response = t.InsertAll(req, projectId, datasetId, tableId).Execute();
        if (response.InsertErrors != null)
        {
            return true;
        }
    }
    catch (Exception e)
    {
        throw e;
    }
    return false;
}
I'm streaming data constantly and many times a day I have this error. How can I fix this?
We have seen several problems:
* the request randomly fails with type 'Backend error'
* the request randomly fails with type 'Connection error'
* the request randomly fails with type 'timeout' (watch out here, as only some rows may fail, not the whole payload)
* some other error messages are non-descriptive, and so vague that they don't help you; just retry
We see hundreds of such failures each day, so they are pretty much constant and not related to Cloud health.
For all of these we opened cases with paid Google Enterprise Support, but unfortunately they didn't resolve them. It seems the recommended option is exponential backoff with retry; even the support told us to do so. Also, the failure rate fits within the 99.9% uptime we have in the SLA, so there are no grounds for objection.
There's something to keep in mind in regards to the SLA: it's a very strictly defined structure, the details are here. The 99.9% is uptime, not directly translated into a failure rate. What this means is that if BQ has a 30 minute downtime one month, and you do 10,000 inserts within that period but no inserts at other times of the month, the numbers will be skewed. This is why we suggest an exponential backoff algorithm. The SLA is explicitly based on uptime and not error rate, but logically the two correlate closely if you do streaming inserts throughout the month at different times with a backoff-and-retry setup. Technically, you should experience on average about 1 in 1000 failed inserts if you are doing inserts throughout the month and have set up the proper retry mechanism.
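As an illustration, a minimal backoff wrapper around the InsertAll call from the question; the attempt count and base delay are arbitrary, and it assumes failures surface as Google.GoogleApiException:
// Retry a streaming insert with exponential backoff on transient 5xx errors.
public TableDataInsertAllResponse InsertAllWithRetry(
    TabledataResource tabledata, TableDataInsertAllRequest request,
    string projectId, string datasetId, string tableId, int maxAttempts = 5)
{
    TimeSpan delay = TimeSpan.FromMilliseconds(500);
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            return tabledata.InsertAll(request, projectId, datasetId, tableId).Execute();
        }
        catch (Google.GoogleApiException e) when (attempt < maxAttempts && (int)e.HttpStatusCode >= 500)
        {
            // Transient backend/internal error: wait and retry, doubling the delay each time.
            System.Threading.Thread.Sleep(delay);
            delay = TimeSpan.FromMilliseconds(delay.TotalMilliseconds * 2);
        }
    }
}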
You can check out this chart about your project health:
https://console.developers.google.com/project/YOUR-APP-ID/apiui/apiview/bigquery?tabId=usage&duration=P1D
About times: since streaming has a limited payload size (see the Quota policy), it's easier to talk about times, as the payload is limited in the same way for both of us, but I will mention other side effects too.
We measure between 1200-2500 ms for each streaming request, and this was consistent over the last month as you can see in the chart.
If the approach you've chosen takes hours, that means it does not scale and won't scale. You need to rethink the approach with async processes that can retry.
Processing IO-bound or CPU-bound tasks in the background is now a common practice in most web applications. There's plenty of software to help build background jobs, some based on a messaging system like Beanstalkd.
Basically, you need to distribute insert jobs across a closed network, to prioritize them, and to consume (run) them. Well, that's exactly what Beanstalkd provides.
Beanstalkd gives the possibility to organize jobs in tubes, each tube corresponding to a job type.
You need an API/producer which can put jobs on a tube, let's say a json representation of the row. This was a killer feature for our use case. So we have an API which gets the rows and places them on a tube; this takes just a few milliseconds, so you can achieve fast response times.
On the other part, you have now a bunch of jobs on some tubes. You need an agent. An agent/consumer can reserve a job.
It helps you also with job management and retries: When a job is successfully processed, a consumer can delete the job from the tube. In the case of failure, the consumer can bury the job. This job will not be pushed back to the tube, but will be available for further inspection.
A consumer can release a job, Beanstalkd will push this job back in the tube, and make it available for another client.
Beanstalkd clients can be found in most common languages, a web interface can be useful for debugging.
