I am using Confluent Kafka .NET to create a consumer for a partitioned topic.
As Confluent Kafka .NET does not support consuming in batches, I built a function that consumes messages until the batch size is reached. The idea of this function is to build batches containing messages from the same partition only, which is why I stop building the batch once I consume a result from a different partition, and return whatever messages I was able to consume up to that point.
Goal or Objective: I want to be able to process the messages I returned in the batch and commit the offsets for those messages only. For example:
Message Consumed From Partition | Offset | Stored in Batch
--------------------------------|--------|----------------
0                               | 0      | Yes
0                               | 1      | Yes
2                               | 0      | No
From the table above I would like to process both messages I got from partition 0. The message from partition 2 would be ignored and (hopefully) picked up later in another call to ConsumeBatch.
To commit, I simply call the synchronous Commit function, passing the offset of the latest message I processed as a parameter. In this case I would pass the offset of the second message of the batch shown in the table above (partition 0, offset 1).
ISSUE:
The problem is that, for some reason, when I build a batch like the one shown above, the messages I decide not to process because of validations are being ignored forever, i.e. message 0 of partition 2 will never be picked up by the consumer again.
As you can see in the consumer configuration below, I have set both EnableAutoCommit and EnableAutoOffsetStore to false. I thought this would be enough for the consumer to not do anything with the offsets and to pick up the ignored messages in another Consume call, but it isn't. The offset somehow increases up to the latest consumed message for each partition, regardless of my configuration.
Can anybody shed some light on what I am missing here to achieve the desired behavior, if it is possible at all?
Simplified version of the function to build the batch:
public IEnumerable<ConsumeResult<string, string>> ConsumeBatch(int batchSize)
{
    List<ConsumeResult<string, string>> consumedMessages = new List<ConsumeResult<string, string>>();
    int latestPartition = -1; // The partition from which we consumed the last message

    for (int i = 0; i < batchSize; i++)
    {
        var result = _consumer.Consume(100);
        if (result != null)
        {
            if (latestPartition == -1 || result.Partition.Value == latestPartition)
            {
                consumedMessages.Add(result);
                latestPartition = result.Partition.Value;
            }
            else
                break;
        }
        else
            break;
    }
    return consumedMessages;
}
ConsumerConfig used to instantiate my consumer client:
_consumerConfig = new ConsumerConfig
{
    BootstrapServers = _bootstrapServers,
    EnableAutoCommit = false,
    AutoCommitIntervalMs = 0,
    GroupId = "WorkerConsumers",
    AutoOffsetReset = AutoOffsetReset.Earliest,
    EnableAutoOffsetStore = false,
};
Additional Information:
This is being tested with:
1 topic with 6 partitions and replication factor of 2
3 brokers
1 single-threaded consumer client that belongs to a consumer group
Local environment with WSL2 on Windows 10
The key was to use the Seek function to reset the partition's offset to a specific position so that the ignored message could be picked up again as part of another batch. Note that EnableAutoCommit and EnableAutoOffsetStore only control committed/stored offsets; the consumer's in-memory position still advances with every Consume call, which is why the skipped message never came back until I explicitly seeked back to it.
In the same function above:
public IEnumerable<ConsumeResult<string, string>> ConsumeBatch(int batchSize)
{
    List<ConsumeResult<string, string>> consumedMessages = new List<ConsumeResult<string, string>>();
    int latestPartition = -1; // The partition from which we consumed the last message

    for (int i = 0; i < batchSize; i++)
    {
        var result = _consumer.Consume(100);
        if (result != null)
        {
            if (latestPartition == -1 || result.Partition.Value == latestPartition)
            {
                consumedMessages.Add(result);
                latestPartition = result.Partition.Value;
            }
            else
            {
                // Seek back to this message's offset. This guarantees that the
                // message, which will not be included in the current batch,
                // will be picked up again as part of a later batch.
                _consumer.Seek(result.TopicPartitionOffset); // IMPORTANT LINE!
                break;
            }
        }
        else
            break;
    }
    return consumedMessages;
}
I think in general, if you want to consume a message without altering the offsets in any way (kind of peeking at the topic partition), you can call Consume and then use Seek(result.TopicPartitionOffset) to set the position of that topic partition back to where it was before consuming the message.
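For illustration, here is a minimal sketch of that peek pattern, assuming the same _consumer field (an IConsumer<string, string>) as above; the Peek helper name is my own:
public ConsumeResult<string, string> Peek(int timeoutMs = 100)
{
    // Consume one message, then rewind the partition's position so the
    // next Consume call returns this same message again.
    var result = _consumer.Consume(timeoutMs);
    if (result != null)
    {
        _consumer.Seek(result.TopicPartitionOffset);
    }
    return result;
}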
I have created a program that reads a message from the dead-letter queue (DLQ) based on its sequence number, copies its contents, and sends it to the active queue as a new message with the same MessageId and other properties. Afterwards I delete that message from the DLQ as well.
public static async Task<string> GetDeadLetterMessagesAsync(string connectionString,
    string queueName, long seqNum, int countDLQMessages)
{
    // Creating a Service Bus client
    var serviceBusClient = new ServiceBusClient(connectionString);
    Console.WriteLine("ServiceBusClient is created");

    // ReceiverOptions to access the dead-letter queue
    var receiverOptions = new ServiceBusReceiverOptions { SubQueue = SubQueue.DeadLetter };
    Console.WriteLine("receiverOptions is created");

    // Create a receiver to access the dead-letter queue of the main queue
    var receiver = serviceBusClient.CreateReceiver(queueName, receiverOptions);
    Console.WriteLine("receiver is created");

    // ServiceBusSender used to send the message back to the active queue
    ServiceBusSender sender = serviceBusClient.CreateSender(queueName);

    IReadOnlyList<ServiceBusReceivedMessage> receivedMessages =
        await receiver.ReceiveMessagesAsync(countDLQMessages);
    Console.WriteLine("read the messages in the service bus DLQ");

    var totalMessageCount = receivedMessages.Count;
    if (totalMessageCount == 0)
    {
        return "No Message is available in DLQ";
    }

    // Binary search on the dead-letter messages in the DLQ by sequence number
    int lower = 0;
    int upper = receivedMessages.Count - 1;
    Console.WriteLine("ReceivedMessage List");
    Console.WriteLine(receivedMessages[0].Body);
    int sequenceFlag = 0;
    while (lower <= upper)
    {
        int middle = lower + (upper - lower) / 2;
        if (seqNum == receivedMessages[middle].SequenceNumber)
        {
            var body = receivedMessages[middle].Body;
            var messageId = receivedMessages[middle].MessageId;
            var correlationId = receivedMessages[middle].CorrelationId;
            var msg = new ServiceBusMessage
            {
                Body = body,
                MessageId = messageId,
                CorrelationId = correlationId
            };

            // Send the dead-letter message to the active queue along with
            // its other properties (MessageId, etc.)
            await sender.SendMessageAsync(msg);
            Console.WriteLine("Message has been published from dead letter to Active Queue.");

            // Complete the dead-letter message, which removes it from the DLQ
            await receiver.CompleteMessageAsync(receivedMessages[middle]);

            // Set sequenceFlag to 1 when the requested sequence number exists in the DLQ
            sequenceFlag = 1;

            // Clean up the service bus resources used by sender and receiver
            await sender.DisposeAsync();
            await receiver.DisposeAsync();
            break;
        }
        else if (seqNum < receivedMessages[middle].SequenceNumber)
            upper = middle - 1;
        else
            lower = middle + 1;
    }

    if (sequenceFlag != 1)
    {
        return $"Sequence number: {seqNum} doesn't exist in queue - {queueName}";
    }
    else
    {
        return "Data is moved to Active queue";
    }
}
This code works when there are fewer than about 100 messages in the DLQ, but when there are more than 100 or 1000 messages, it doesn't find the dead-letter message and reports that no message is available for sequence number 'xyz'.
This issue happens because the ReceiveMessagesAsync(maxMessages) method doesn't guarantee that it will return exactly the number of messages requested in its parameter.
Is there any other method that can extract n messages at once, so that we can perform the further actions on them?
I have also tried to peek the message based on its sequence number and send its content to the active queue as a new message, but I am not able to complete that specific message in the DLQ.
Can we delete/complete the dead letter message based on sequence number?
Receiving N messages is not guaranteed when using the ReceiveMessagesAsync method. If you need to receive N messages, you would need to loop until you've reached that amount, e.g.:
var remaining = numMessages;
var receivedMsgs = new List<ServiceBusReceivedMessage>();
while (remaining > 0)
{
    // Loop in case we don't receive all messages in one attempt
    var received = await receiver.ReceiveMessagesAsync(remaining);
    if (received.Count == 0)
        break; // nothing left to receive; avoid looping forever on an empty queue
    receivedMsgs.AddRange(received);
    remaining -= received.Count;
}
Alternatively, if you are able to design your application in such a way that the messages can be deferred rather than dead-lettered for this scenario, you can receive deferred messages directly based on the sequence number:
var deferredMessage = await receiver.ReceiveDeferredMessageAsync(seqNumber);
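For example, the deferral flow could look roughly like this (a sketch assuming an existing ServiceBusReceiver named receiver on the main queue; names are illustrative):
// Defer a message instead of dead-lettering it, remembering its sequence number.
var message = await receiver.ReceiveMessageAsync();
long seqNumber = message.SequenceNumber;
await receiver.DeferMessageAsync(message);

// Later: retrieve that exact message directly by sequence number and settle it.
var deferredMessage = await receiver.ReceiveDeferredMessageAsync(seqNumber);
await receiver.CompleteMessageAsync(deferredMessage);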
I have a scenario where one activity function has retrieved a set of records which can be anywhere from 1000 to a million and stored in an object. This object is then used by the next activity function to send messages in parallel to service bus.
Currently I am using a for loop over this object to send each record to Service Bus. Please let me know if there is a better pattern where the stored content is drained and sent to Service Bus, with the function scaling out automatically rather than being restricted to a for loop.
I have used a for loop in the orchestrator function to call activity functions for the records in the object.
I have looked at the scaling of the activity function: for a set of 18,000 records it scaled up to 15 instances and processed the whole set in 4 minutes.
Currently the function app is using the Consumption plan. I checked that only this function app is using the plan and it is not shared.
The topic to which the message is sent has another service listening to it, to read the message.
The instance counts for both the orchestrator and activity functions are at their defaults.
for (int i = 0; i < number_messages; i++)
{
    taskList[i] = context.CallActivityAsync<string>(
        "Sendtoservicebus",
        (messages[i], runId, CorrelationId, Code));
}

try
{
    await Task.WhenAll(taskList);
}
catch (AggregateException ae)
{
    ae.Flatten();
}
The messages should be quickly sent to service bus by scaling out the activity functions appropriately.
I would suggest using batches for sending messages.
The Azure Service Bus client supports sending messages in batches (the SendBatch and SendBatchAsync methods of QueueClient and TopicClient). However, the size of a single batch must stay below 256 KB, otherwise the whole batch will be rejected.
We will start with a simple use case: the size of each message is known to us, defined by a hypothetical Func<T, long> getSize function. Here is a helpful extension method that will split an arbitrary collection based on a metric function and a maximum chunk size:
public static List<List<T>> ChunkBy<T>(this IEnumerable<T> source, Func<T, long> metric, long maxChunkSize)
{
    return source
        .Aggregate(
            new
            {
                Sum = 0L,
                Current = (List<T>)null,
                Result = new List<List<T>>()
            },
            (agg, item) =>
            {
                var value = metric(item);
                if (agg.Current == null || agg.Sum + value > maxChunkSize)
                {
                    // Start a new chunk
                    var current = new List<T> { item };
                    agg.Result.Add(current);
                    return new { Sum = value, Current = current, agg.Result };
                }
                // Add to the current chunk
                agg.Current.Add(item);
                return new { Sum = agg.Sum + value, agg.Current, agg.Result };
            })
        .Result;
}
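For instance, using string length as a stand-in metric (a toy example, not real Service Bus sizes):
// Splits into [ ["foo", "barbaz"], ["qux", "longer"] ]:
// each chunk's total length stays at or below 10.
var chunks = new[] { "foo", "barbaz", "qux", "longer" }
    .ChunkBy(s => (long)s.Length, maxChunkSize: 10);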
Now, the implementation of SendBigBatchAsync is simple:
public async Task SendBigBatchAsync<T>(IEnumerable<T> messages, Func<T, long> getSize)
{
    var chunks = messages.ChunkBy(getSize, MaxServiceBusMessage);
    foreach (var chunk in chunks)
    {
        var brokeredMessages = chunk.Select(m => new BrokeredMessage(m));
        await client.SendBatchAsync(brokeredMessages);
    }
}

private const long MaxServiceBusMessage = 256000;
private readonly QueueClient client;
How do we determine the size of each message? How do we implement the getSize function?
The BrokeredMessage class exposes a Size property, so it might be tempting to rewrite our method the following way:
public async Task SendBigBatchAsync<T>(IEnumerable<T> messages)
{
    var brokeredMessages = messages.Select(m => new BrokeredMessage(m));
    var chunks = brokeredMessages.ChunkBy(bm => bm.Size, MaxServiceBusMessage);
    foreach (var chunk in chunks)
    {
        await client.SendBatchAsync(chunk);
    }
}
There is a caveat, though: the Size property does not appear to account for the full serialized message (headers and properties) until a send has actually been attempted, so chunks built this way can still get rejected. The last possibility I want to consider is to actually allow yourself to violate the max size of the batch, but then handle the exception, retry the send operation, and adjust future calculations based on the actual measured size of the failed messages. The size is known after trying SendBatch, even if the operation failed, so we can use this information:
// Sender is reused across requests
public class BatchSender
{
    private readonly QueueClient queueClient;
    private long batchSizeLimit = 262000;
    private long headerSizeEstimate = 54; // start with the smallest header possible

    public BatchSender(QueueClient queueClient)
    {
        this.queueClient = queueClient;
    }

    public async Task SendBigBatchAsync<T>(IEnumerable<T> messages)
    {
        var packets = (from m in messages
                       let bm = new BrokeredMessage(m)
                       select new { Source = m, Brokered = bm, BodySize = bm.Size }).ToList();
        var chunks = packets.ChunkBy(p => this.headerSizeEstimate + p.Brokered.Size, this.batchSizeLimit);
        foreach (var chunk in chunks)
        {
            try
            {
                await this.queueClient.SendBatchAsync(chunk.Select(p => p.Brokered));
            }
            catch (MessageSizeExceededException)
            {
                var maxHeader = chunk.Max(p => p.Brokered.Size - p.BodySize);
                if (maxHeader > this.headerSizeEstimate)
                {
                    // If failed messages had bigger headers, remember this header size
                    // as the max observed and use it in future calculations
                    this.headerSizeEstimate = maxHeader;
                }
                else
                {
                    // Reduce max batch size to 95% of the current value
                    this.batchSizeLimit = (long)(this.batchSizeLimit * .95);
                }
                // Re-send only the failed chunk, re-chunked with the adjusted estimates
                await this.SendBigBatchAsync(chunk.Select(p => p.Source));
            }
        }
    }
}
You can use this blog for further reference. Hope it helps.
I've created a topic in Kafka with 9 partitions, naming it aptly 'test', and knocked together two simple applications in C# (.NET Core) using the Confluent.Kafka client library: a producer and a consumer. I did little more than tweak examples from the documentation.
I am running two instances of the consumer application and one instance of the producer. I don't see much point in pasting the consumer code here, it's a trivial 'get a message, print it on screen' app, however, it does also print the number of the partition the message came from.
This is the producer app:
static async Task Main(string[] args)
{
    var random = new Random();
    var config = new ProducerConfig
    {
        BootstrapServers = "10.0.0.5:9092",
        Partitioner = Partitioner.ConsistentRandom
    };
    int counter = 0;

    while (true)
    {
        using (var p = new ProducerBuilder<string, string>(config).Build())
        {
            try
            {
                p.BeginProduce(
                    "test",
                    new Message<string, string>
                    {
                        //Key = random.Next().ToString(),
                        Value = $"test {++counter}"
                    });
                if (counter % 10 == 0)
                    p.Flush();
            }
            catch (ProduceException<string, string> e)
            {
                Console.WriteLine($"Delivery failed: {e.Error.Reason}");
            }
        }
    }
}
Problem: If the Key property of the message is not set, all messages get sent to partition number 7, meaning that one of my consumer instances is idle. I had to manually randomize the key in order to distribute messages between partitions (see the commented-out line). (The original code, as copied from the docs, used Null as the type of the key, and this sent all messages to the 7th partition too.)
Why is that? According to the documentation of the ProducerConfig.Partitioner property, the consistent_random option should ensure random distribution when the key is not specified. I tried using the Partitioner.Random option, which should use random distribution regardless of the key, but this did not help.
Is this the expected behaviour, am I doing something wrong, or did I come across a bug?
I am using version 1.0.0-RC2 of Confluent.Kafka NuGet.
Complete documentation of the Partitioner config:
// Summary:
// Partitioner: `random` - random distribution, `consistent` - CRC32 hash of key
// (Empty and NULL keys are mapped to single partition), `consistent_random` - CRC32
// hash of key (Empty and NULL keys are randomly partitioned), `murmur2` - Java
// Producer compatible Murmur2 hash of key (NULL keys are mapped to single partition),
// `murmur2_random` - Java Producer compatible Murmur2 hash of key (NULL keys are
// randomly partitioned. This is functionally equivalent to the default partitioner
// in the Java Producer.). default: consistent_random importance: high
I encountered the same issue.
It seems that when a new client is initiated, the first message will always go to the same partition.
Partitioner.Random will work if you use the same client for all your messages.
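In other words, hoist the producer out of the loop so a single client produces all messages. A sketch against the question's code (broker address as in the question; BeginProduce matches the 1.0.0-RC2 API the question uses):
var config = new ProducerConfig
{
    BootstrapServers = "10.0.0.5:9092",
    Partitioner = Partitioner.ConsistentRandom
};
int counter = 0;

// Build the producer once and reuse it for every message, so the
// partitioner can spread keyless messages across partitions.
using (var p = new ProducerBuilder<string, string>(config).Build())
{
    while (true)
    {
        p.BeginProduce("test", new Message<string, string> { Value = $"test {++counter}" });
        if (counter % 10 == 0)
            p.Flush();
    }
}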
As part of an effort to automate starting/stopping some of our NServiceBus services, I'd like to know when a service has finished processing all the messages in its input queue.
The problem is that, while the NServiceBus service is running, my C# code is reporting one less message than is actually there. So it thinks that the queue is empty when there is still one message left. If the service is stopped, it reports the "correct" number of messages. This is confusing because, when I inspect the queues myself using the Private Queues view in the Computer Management application, it displays the "correct" number.
I'm using a variant of the following C# code to find the message count:
var queue = new MessageQueue(path);
return queue.GetAllMessages().Length;
I know this will perform horribly when there are many messages. The queues I'm inspecting should only ever have a handful of messages at a time.
I have looked at other related questions, but haven't found the help I need.
Any insight or suggestions would be appreciated!
Update: I should have mentioned that this service is behind a Distributor, which is shut down before trying to shut down this service. So I have confidence that new messages will not be added to the service's input queue.
The thing is that it's not actually "one less message"; rather, it depends on the number of messages currently being processed by the endpoint, which, in a multi-threaded process, can be as high as the number of threads.
There's also the issue of client processes that continue to send messages to that same queue.
Probably the only "sure" way of handling this is to count the messages multiple times with a delay in between; if the number stays at zero over a certain number of attempts, you can assume the queue is empty.
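A rough sketch of that idea, reusing the GetAllMessages-based count from the question (requires System.Messaging; the attempt count and delay are arbitrary):
// Returns true only if the queue count stays at zero across several checks.
static bool IsQueueEmpty(string path, int attempts = 5, int delayMs = 2000)
{
    using (var queue = new MessageQueue(path))
    {
        for (int i = 0; i < attempts; i++)
        {
            if (queue.GetAllMessages().Length > 0)
                return false;
            Thread.Sleep(delayMs);
        }
    }
    return true;
}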
WMI was the answer! Here's a first pass at the code. It could doubtless be improved.
public int GetMessageCount(string queuePath)
{
    const string queryString = "select * from Win32_PerfRawData_MSMQ_MSMQQueue";
    var query = new WqlObjectQuery(queryString);
    var searcher = new ManagementObjectSearcher(query);
    var queues = searcher.Get();
    foreach (ManagementObject queue in queues)
    {
        var name = queue["Name"].ToString();
        if (AreTheSameQueue(queuePath, name))
        {
            // Depending on the machine (32/64-bit), this value is a different type.
            // Casting directly to UInt64 or UInt32 only works on the matching CPU architecture.
            // To work around this run-time unknown, convert to string and then parse to int.
            var countAsString = queue["MessagesInQueue"].ToString();
            var messageCount = int.Parse(countAsString);
            return messageCount;
        }
    }
    return 0;
}
private static bool AreTheSameQueue(string path1, string path2)
{
    // Tests whether two queue paths are equivalent, accounting for differences
    // in case and length (if one path was truncated, for example by WMI).
    string sanitizedPath1 = Sanitize(path1);
    string sanitizedPath2 = Sanitize(path2);
    if (sanitizedPath1.Length > sanitizedPath2.Length)
    {
        return sanitizedPath1.StartsWith(sanitizedPath2);
    }
    if (sanitizedPath1.Length < sanitizedPath2.Length)
    {
        return sanitizedPath2.StartsWith(sanitizedPath1);
    }
    return sanitizedPath1 == sanitizedPath2;
}

private static string Sanitize(string queueName)
{
    var machineName = Environment.MachineName.ToLowerInvariant();
    return queueName.ToLowerInvariant().Replace(machineName, ".");
}
I am seeing some dead-instance weirdness running parallelized nested-loop web stress tests using Selenium WebDriver, simple example being, say, hit 300 unique pages with 100 impressions each.
I'm "successfully" getting 4 - 8 WebDriver instances going using a ThreadLocal<FirefoxWebDriver> to isolate them per task thread, and MaxDegreeOfParallelism on a ParallelOptions instance to limit the threads. I'm partitioning and parallelizing the outer loop only (the collection of pages), and checking .IsValueCreated on the ThreadLocal<> container inside the beginning of each partition's "long running task" method. To facilitate cleanup later, I add each new instance to a ConcurrentDictionary keyed by thread id.
No matter what parallelizing or partitioning strategy I use, the WebDriver instances will occasionally do one of the following:
Launch but never show a URL or run an impression
Launch, run any number of impressions fine, then just sit idle at some point
When either of these happen, the parallel loop eventually seems to notice that a thread isn't doing anything, and it spawns a new partition. If n is the number of threads allowed, this results in having n productive threads only about 50-60% of the time.
Cleanup still works fine at the end; there may be 2n open browsers or more, but the productive and unproductive ones alike get cleaned up.
Is there a way to monitor for these useless WebDriver instances and a) scavenge them right away, plus b) get the parallel loop to replace the task segment immediately, instead of lagging behind for several minutes as it often does now?
I was having a similar problem. It turns out that WebDriver doesn't have the best method for finding open ports. As described here, it takes a system-wide lock on ports, finds an open port, and then starts the instance. This can starve the other instances that you're trying to start of ports.
I got around this by specifying a random port number directly in the delegate for the ThreadLocal<IWebDriver> like this:
var ports = new List<int>();
var rand = new Random((int)DateTime.Now.Ticks & 0x0000FFFF);
var driver = new ThreadLocal<IWebDriver>(() =>
{
    var profile = new FirefoxProfile();
    int port;
    lock (ports) // the factory can run on several threads at once
    {
        port = rand.Next(50) + 7050;
        while (ports.Contains(port) && ports.Count != 50)
            port = rand.Next(50) + 7050;
        ports.Add(port);
    }
    profile.Port = port;
    return new FirefoxDriver(profile);
});
This works pretty consistently for me, although there is an unresolved issue if you end up using all 50 ports in the list.
Since there is no OnReady event nor an IsReady property, I worked around it by sleeping the thread for several seconds after creating each instance. Doing that seems to give me 100% durable, functioning WebDriver instances.
Thanks to your suggestion, I've implemented IsReady functionality in my open-source project Webinator. Use that if you want, or use the code outlined below.
I tried instantiating 25 instances, and all of them were functional, so I'm pretty confident in the algorithm at this point (I leverage HtmlAgilityPack to see if elements exist, but I'll skip it for the sake of simplicity here):
public void WaitForReady(IWebDriver driver)
{
    var js = @"{ var temp=document.createElement('div'); temp.id='browserReady';" +
             @"b=document.getElementsByTagName('body')[0]; b.appendChild(temp); }";
    ((IJavaScriptExecutor)driver).ExecuteScript(js);

    WaitForSuccess(() =>
    {
        IWebElement element = null;
        try
        {
            element = driver.FindElement(By.Id("browserReady"));
        }
        catch
        {
            // element not found
        }
        return element != null;
    },
    timeoutInMilliseconds: 10000);

    js = @"{var temp=document.getElementById('browserReady');" +
         @" temp.parentNode.removeChild(temp);}";
    ((IJavaScriptExecutor)driver).ExecuteScript(js);
}
private bool WaitForSuccess(Func<bool> action, int timeoutInMilliseconds)
{
    if (action == null) return false;

    bool success;
    const int PollRate = 250;
    var maxTries = timeoutInMilliseconds / PollRate;
    int tries = 0;
    do
    {
        success = action();
        tries++;
        if (!success && tries <= maxTries)
        {
            Thread.Sleep(PollRate);
        }
    }
    while (!success && tries < maxTries);
    return success;
}
The assumption is that if the browser is responding to JavaScript calls and can find elements, then it's probably a reliable instance and ready to be used.
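For instance, it could be plugged into the ThreadLocal<> factory from the earlier answer so each driver is verified before use (a sketch; assumes WaitForReady is in scope):
var driver = new ThreadLocal<IWebDriver>(() =>
{
    var d = new FirefoxDriver();
    // Block until the new browser responds to JavaScript and element lookups.
    WaitForReady(d);
    return d;
});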