Confluent Kafka .NET ProduceAsync tasks - C#

var msgs = new List<string> { "msg1", "msg2", "msg3" };
var tasks = new List<Task>();
foreach (var msg in msgs)
{
    tasks.Add(_producer.ProduceAsync(...));
}
var deliveryReports = Task.WhenAll(tasks).Result;
My Kafka producer config:
Batch size: 10
Linger: 100 ms
My question is: do the tasks get completed in the order they were
created? Can I guarantee that the task representing msg1 completes
before the task representing msg2 or msg3?
Thanks.

OK, I think I now understand how the producer and the broker work to achieve ordering.
When ProduceAsync is called, it adds the message to the send buffer, creates a promise that will be used to complete the future, and returns the future. In .NET terms, it creates a TaskCompletionSource object and returns its Task.
The client library (librdkafka) waits until it has the configured number of messages, or until the timeout period elapses, to batch the messages. A batch is created containing the messages in the same order as in the send buffer. The batch is partitioned (randomly if the default partitioner is used) based on the destination partitions/topics, i.e. split into smaller batches. Each post-split batch is sent to the respective leader broker/ISR (the individual sends happen sequentially), and each is acked by its respective leader broker according to request.required.acks. The client library invokes a callback for each ack it receives, and the callback completes its respective future, i.e. calls something like taskCompletionSource.SetResult(...).
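To make that mechanism concrete, here is a minimal, self-contained sketch (not the actual Confluent.Kafka source) of how a callback-based produce API can be wrapped into a Task with TaskCompletionSource; DeliveryReport, ProduceWithCallback and ProduceAsyncSketch are hypothetical names used only for illustration:

using System;
using System.Threading.Tasks;

// Hypothetical stand-in for the real delivery report type.
public class DeliveryReport
{
    public string Topic { get; set; }
    public long Offset { get; set; }
}

public static class ProducerSketch
{
    // Simulates the client library invoking a delivery callback once the broker acks the batch.
    private static void ProduceWithCallback(
        string topic, string payload, Action<DeliveryReport, Exception> onDelivery)
    {
        Task.Run(async () =>
        {
            await Task.Delay(100);                      // pretend: lingering + batching + broker ack
            onDelivery(new DeliveryReport { Topic = topic, Offset = 0 }, null);
        });
    }

    // The ProduceAsync pattern: wrap the delivery callback in a TaskCompletionSource.
    public static Task<DeliveryReport> ProduceAsyncSketch(string topic, string payload)
    {
        var tcs = new TaskCompletionSource<DeliveryReport>();
        ProduceWithCallback(topic, payload, (report, error) =>
        {
            if (error != null) tcs.SetException(error); // delivery failed
            else tcs.SetResult(report);                 // delivery acked: complete the returned Task
        });
        return tcs.Task;                                // returned immediately; completes on ack
    }
}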

There are a couple of things here.
First, librdkafka can manage retries for you, and by default it does ('retries' is set to 2); this can cause re-ordering of message delivery and of delivery reports. To ensure this doesn't happen, you can set 'max.in.flight' to 1 (or set 'retries' to 0 and manage retries yourself).
With librdkafka configured to supply delivery reports back to .NET in the order the messages are sent, the question becomes one of Task completion ordering guarantees. I need to think about this for more than 5 minutes to give a good answer, but for now assume ordering is not guaranteed (I will write more later). You can get guaranteed ordering by using the variants of ProduceAsync that accept an IDeliveryHandler. Note that in version 1.0 these methods will be changed somewhat and will be called BeginProduce.
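For reference, with the dictionary-style configuration used by the pre-1.0 client, that setting might look like the sketch below; the keys are standard librdkafka settings, but the broker address and values are only illustrative:

var producerConfig = new Dictionary<string, object>
{
    { "bootstrap.servers", "localhost:9092" }, // illustrative broker address
    { "max.in.flight", 1 },                    // at most one in-flight request: preserves ordering even with retries
    // alternatively: { "retries", 0 } and handle retries in your own code
};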


Consume messages from Kafka Topic - no response

var configs = new Dictionary<string, string>
{
{"bootstrap.servers", MY_SERVER},
{"security.protocol", "SASL_PLAINTEXT"},
{"sasl.mechanism", "SCRAM-SHA-256"},
{"sasl.username", "MY_USERNAME"},
{"sasl.password", "MY_PWD"},
{"group.id", "sample_group"} // added
};
var consumerConfig = new ConsumerConfig(configs);
using (var schemaRegistry = new CachedSchemaRegistryClient(schemaRegistryConfig))
using (var consumer = new ConsumerBuilder<string, MyModel>(consumerConfig)
    .SetKeyDeserializer(new AvroDeserializer<string>(schemaRegistry, avroSerializerConfig).AsSyncOverAsync())
    .SetValueDeserializer(new AvroDeserializer<MyModel>(schemaRegistry, avroSerializerConfig).AsSyncOverAsync())
    .Build())
{
    consumer.Subscribe(TOPIC_NAME);
    while (true)
    {
        var result = consumer.Consume(); // stuck here
        Console.WriteLine(result);
    }
}
As noted in the code, no response ever comes back from consumer.Consume(). It does not throw any error, not even during consumer.Subscribe(). What could the possible reasons be? (I am new to the Kafka consumer.)
1. Maybe there is no message in the topic, so there is nothing to receive?
2. The code asked for the missing 'group.id', so I added {"group.id", "sample_group"} to the config and wrapped it with ConsumerConfig. Is a random name ("sample_group") allowed for group.id, or should it be something retrieved from the topic information?
3. Anything else?
Your code looks fine, and the fact that no errors or exceptions show up is also a good sign.
"1. Maybe there is no message in Topic, so nothing to receive?"
Even if there are no messages in the Kafka topic, your observation matches the expected behavior. In the while(true) loop you continuously try to fetch data from the topic, and if nothing can be fetched the consumer simply tries again in the next iteration. Consumers on Kafka topics are meant to read a topic sequentially while running continuously, and it is perfectly normal for a consumer to have consumed all messages and stay idle for a while until new messages arrive in the topic. During that waiting time the consumer will not stop or crash.
Keep in mind that messages in a Kafka topic have by default a retention period of 7 days. After that time, the messages will be deleted.
"2. The code asked for missing 'group.id', so I added {"group.id", "sample_group"} in config and wrap with ConsumerConfig. Is random name ("sample_group") allowed for group.id or should it be something retrieved from Topic information?"
Yes, the name "sample_group" is allowed as a consumer group name. There are no reserved consumer group names, so this name will not cause any trouble.
"3. anything else?"
By default, a Kafka consumer reads messages from the "latest" offset. That means if you run a consumer group for the very first time, it will not read all messages from the beginning but rather from the end. Check the consumer configuration in the .NET Kafka API documentation for auto.offset.reset; you might set this configuration to "earliest" in case you want to read all messages from the beginning. Please note that as soon as you have run your application with a given consumer group for the first time, this auto.offset.reset setting will no longer have any impact on subsequent runs, because the consumer group is now registered within Kafka.
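With the dictionary-style configuration from the question, that could look roughly like the snippet below; "earliest" is just one option and only makes sense if re-reading the topic from the beginning is what you want:

var configs = new Dictionary<string, string>
{
    {"bootstrap.servers", MY_SERVER},
    {"group.id", "sample_group"},
    {"auto.offset.reset", "earliest"} // read from the beginning the first time this group runs
    // ... SASL settings as before ...
};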
What you can usually do to check that the consumer actually reads messages is to start your consumer before you start producing messages to that topic. Then, (almost) independently of your configuration, you should see data flowing through your application.

Apache NMS using ActiveMQ: How do I use transactional acknowledge mode but still acknowledge/roll back a single message every time?

I use Apache NMS (in c#) to receive messages from ActiveMQ.
I want to be able to acknowledge every message I received, or roll back a message in case I had an error.
I solved the first part by using the CreateSession(AcknowledgementMode.IndividualAcknowledge), and then for every received message I use message.Acknowledge().
The problem is that in this mode there is no rollback option: if a message is not acknowledged, I can never receive it again for another try. It can only be sent to another consumer, but there isn't another consumer, so it just stays stuck in the queue.
So I tried to use AcknowledgementMode.Transactional instead, but that has another problem: I can only use session.Commit() or session.Rollback(), and there is no way to know which specific message I am committing or rolling back.
What is the correct way to do this?
Stay with INDIVIDUAL_ACKNOWLEDGE and then try session.recover() and session.close(). Both of those should signal to the broker that the messages are not going to be acknowledged.
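A minimal sketch of that pattern with Apache.NMS is below; it assumes an already-created IConnection, and the queue name, timeout and error handling are purely illustrative:

using System;
using Apache.NMS;

public static class IndividualAckSketch
{
    // Acknowledge on success; Recover() on failure so unacked messages become redeliverable.
    public static void ConsumeOne(IConnection connection)
    {
        using (ISession session = connection.CreateSession(AcknowledgementMode.IndividualAcknowledge))
        using (IMessageConsumer consumer = session.CreateConsumer(session.GetQueue("MY.QUEUE"))) // illustrative queue name
        {
            IMessage message = consumer.Receive(TimeSpan.FromSeconds(5));
            if (message == null) return;          // nothing arrived within the timeout

            try
            {
                // ... process the message ...
                message.Acknowledge();            // success: acknowledge just this message
            }
            catch (Exception)
            {
                session.Recover();                // failure: signal that the message was not acknowledged
            }
        }
    }
}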
My solution to this was to throw an exception whenever, for any reason (an exception from a db SaveChanges call, for example), I did not want to acknowledge the message with message.Acknowledge().
When you throw an exception inside your IMessageConsumer Listener handler, the message will be sent to your consumer again about 5 times, and it will then be moved to the default DLQ queue for investigation.
However, you can change this behaviour using the RedeliveryPolicy on the connection object.
Example of a RedeliveryPolicy:
RedeliveryPolicy redeliveryPolicy = new RedeliveryPolicy
{
    InitialRedeliveryDelay = 5000,  // every 5 secs
    MaximumRedeliveries = 10,       // the message will be redelivered 10 times
    UseCollisionAvoidance = true,   // randomizes the delay around the 5 secs
    CollisionAvoidancePercent = 50, // used along with the option above
    UseExponentialBackOff = false
};
If the message fails again (after the 10 redeliveries) it will be moved to a default DLQ queue (this queue is created automatically).
You can use this queue to investigate the messages that have not been acknowledged, using another consumer.
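The policy only takes effect once it is attached to the connection. Below is a minimal sketch of that wiring, assuming the concrete Apache.NMS.ActiveMQ Connection type exposes a RedeliveryPolicy property; the broker URI is illustrative and redeliveryPolicy is the object from the snippet above:

using Apache.NMS;
using Apache.NMS.ActiveMQ;

var factory = new ConnectionFactory("tcp://localhost:61616");   // illustrative broker URI
using (var connection = (Connection)factory.CreateConnection())
{
    connection.RedeliveryPolicy = redeliveryPolicy;              // attach the policy before Start()
    connection.Start();
    // ... create the session and consumer as usual ...
}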

SubscriptionClient.ReceiveBatch not retrieving all the brokered messages

I have a console application to read all the brokered messages present in the subscription on the Azure Service Bus. I have around 3500 messages in there. This is my code to read the messages:
SubscriptionClient client = messagingFactory.CreateSubscriptionClient(topic, subscription);
long count = namespaceManager.GetSubscription(topic, subscription).MessageCountDetails.ActiveMessageCount;
Console.WriteLine("Total messages to process : {0}", count.ToString()); //Here the number is showing correctly
IEnumerable<BrokeredMessage> dlIE = null;
dlIE = client.ReceiveBatch(Convert.ToInt32(count));
When I execute the code, in dlIE I can see only 256 messages. I have also tried setting the prefetch count via client.PrefetchCount, but it still returns only 256 messages.
I think there is some limit to the number of messages that can be retrieved at a time. However, no such limit is mentioned on the MSDN page for the ReceiveBatch method. What can I do to retrieve all messages at once?
Note:
I only want to read the messages and then leave them on the service bus. Therefore I do not use the message.Complete() method.
I cannot remove and re-create the topic/subscription from the Service Bus.
Edit:
I used PeekBatch instead of ReceiveBatch like this:
IEnumerable<BrokeredMessage> dlIE = null;
List<BrokeredMessage> bmList = new List<BrokeredMessage>();
long i = 0;
dlIE = subscriptionClient.PeekBatch(Convert.ToInt32(count)); // count is the total number of messages in the subscription
bmList.AddRange(dlIE);
i = dlIE.Count();
if (i < count)
{
    while (i < count)
    {
        IEnumerable<BrokeredMessage> dlTemp = null;
        dlTemp = subscriptionClient.PeekBatch(i, Convert.ToInt32(count));
        bmList.AddRange(dlTemp);
        i = i + dlTemp.Count();
    }
}
I have 3255 messages in the subscription. The first time PeekBatch is called it gets 250 messages, so it goes into the while loop with PeekBatch(250, 3255). Each call only returns 250 messages. The final output list contains 3500 messages, with duplicates. I am not able to understand how this is happening.
I have figured it out. The subscription client remembers the last batch it retrieved and when called again, retrieves the next batch.
So the code would be :
IEnumerable<BrokeredMessage> dlIE = null;
List<BrokeredMessage> bmList = new List<BrokeredMessage>();
long i = 0;
while (i < count)
{
    dlIE = subscriptionClient.PeekBatch(Convert.ToInt32(count));
    bmList.AddRange(dlIE);
    i = i + dlIE.Count();
}
Thanks to MikeWo for guidance
Note: There seems to be some kind of size limit on the number of messages you can peek at a time. I tried with different subscriptions and the number of messages fetched was different for each.
Is the topic you are writing to partitioned by chance? When you receive messages from a partitioned entity it will only fetch from one of the partitions at a time. From MSDN:
"When a client wants to receive a message from a partitioned queue, or from a subscription of a partitioned topic, Service Bus queries all fragments for messages, then returns the first message that is returned from any of the messaging stores to the receiver. Service Bus caches the other messages and returns them when it receives additional receive requests. A receiving client is not aware of the partitioning; the client-facing behavior of a partitioned queue or topic (for example, read, complete, defer, deadletter, prefetching) is identical to the behavior of a regular entity."
It's probably not a good idea to assume that even with a non-partitioned entity you'd get all messages in one go with either the Receive or Peek methods. It would be much more efficient to loop through the messages in much smaller batches, especially if your messages have any decent size to them or are indeterminate in size.
Since you don't actually want to remove the messages from the queue, I'd suggest using PeekBatch instead of ReceiveBatch. This gives you a copy of the message and doesn't lock it. I'd highly suggest a loop using the same SubscriptionClient in conjunction with PeekBatch. By using the same SubscriptionClient with PeekBatch, the last pulled sequence number is kept under the hood, so as you loop through, it keeps track of its position and works through the whole queue. This would essentially let you read through the entire queue.
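A sketch of that loop with a small, fixed batch size is below; the batch size of 100 is arbitrary, and it assumes PeekBatch returns an empty batch once the peek cursor has passed the last message:

var allMessages = new List<BrokeredMessage>();
while (true)
{
    // Same SubscriptionClient instance, so the peek cursor advances on each call.
    var batch = subscriptionClient.PeekBatch(100).ToList();
    if (batch.Count == 0)
        break;                   // assumed: empty batch means we've reached the end
    allMessages.AddRange(batch);
}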
I came across a similar issue where client.ReceiveBatchAsync(...) would not retrieve any data from the subscription in Azure Service Bus.
After some digging around, I found out that there is a flag on each subscription to enable batched operations. This can only be enabled through PowerShell. Below is the command I used:
$subObject = Get-AzureRmServiceBusSubscription -ResourceGroup '#resourceName' -NamespaceName '#namespaceName' -Topic '#topicName' -SubscriptionName '#subscriptionName'
$subObject.EnableBatchedOperations = $True
Set-AzureRmServiceBusSubscription -ResourceGroup '#resourceName' -NamespaceName '#namespaceName' -Topic '#topicName' -SubscriptionObj $subObject
More details can be found here. While it still didn't load all the messages, at least it started to clear the queue. As far as I'm aware, the batch size parameter is only a suggestion to the service bus, not a rule.
Hope it helps!

How does PubSub work in BookSleeve/ Redis?

I wonder what the best way is to publish and subscribe to channels using BookSleeve. I currently implement several static methods (see below) that let me publish content to a specific channel, with the newly created channel being stored in private static Dictionary<string, RedisSubscriberConnection> subscribedChannels.
Is this the right approach, given that I want to publish to channels and subscribe to channels within the same application (note: my wrapper is a static class)? Is it enough to create one channel even if I want to both publish and subscribe? Obviously I would not publish to the same channel as I subscribe to within the same application. But I tested it:
RedisClient.SubscribeToChannel("Test").Wait();
RedisClient.Publish("Test", "Test Message");
and it worked.
Here are my questions:
1) Will it be more efficient to set up a dedicated publish channel and a dedicated subscribe channel rather than using one channel for both?
2) What is the difference between "channel" and "PatternSubscription" semantically? My understanding is that I can subscribe to several "topics" through PatternSubscription() on the same channel, correct? But if I want to have different callbacks invoked for each "topic", I would have to set up a channel for each topic, correct? Is that efficient or would you advise against that?
Here are the code snippets.
Thanks!!!
public static Task<long> Publish(string channel, byte[] message)
{
    return connection.Publish(channel, message);
}

public static Task SubscribeToChannel(string channelName)
{
    string subscriptionString = ChannelSubscriptionString(channelName);
    RedisSubscriberConnection channel = connection.GetOpenSubscriberChannel();
    subscribedChannels[subscriptionString] = channel;
    return channel.PatternSubscribe(subscriptionString, OnSubscribedChannelMessage);
}

public static Task UnsubscribeFromChannel(string channelName)
{
    string subscriptionString = ChannelSubscriptionString(channelName);
    if (subscribedChannels.Keys.Contains(subscriptionString))
    {
        RedisSubscriberConnection channel = subscribedChannels[subscriptionString];
        Task task = channel.PatternUnsubscribe(subscriptionString);
        // remove the channel subscription
        channel.Close(true);
        subscribedChannels.Remove(subscriptionString);
        return task;
    }
    else
    {
        return null;
    }
}

private static string ChannelSubscriptionString(string channelName)
{
    return channelName + "*";
}
1: there is only one channel in your example (Test); a channel is just the name used for a particular pub/sub exchange. It is, however, necessary to use 2 connections due to specifics of how the redis API works. A connection that has any subscriptions cannot do anything else except:
listen to messages
manage its own subscriptions (subscribe, psubscribe, unsubscribe, punsubscribe)
However, I don't understand this:
private static Dictionary<string, RedisSubscriberConnection>
You shouldn't need more than one subscriber connection unless you are catering for something specific to you. A single subscriber connection can handle an arbitrary number of subscriptions. A quick check on client list on one of my servers, and I have one connection with (at time of writing) 23,002 subscriptions. Which could probably be reduced, but: it works.
2: pattern subscriptions support wildcards; so rather than subscribing to /topic/1, /topic/2/ etc you could subscribe to /topic/*. The name of the actual channel used by publish is provided to the receiver as part of the callback signature.
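For example, the distinction can be sketched roughly as follows (assuming BookSleeve's subscribe callbacks receive the concrete channel name plus the raw payload, as in the question's OnSubscribedChannelMessage handler):

// Exact-name subscription: only messages published to "/topic/1" arrive here.
connection.GetOpenSubscriberChannel()
    .Subscribe("/topic/1", (channelName, payload) =>
    {
        // channelName is always "/topic/1" for this subscription
    });

// Pattern subscription: one registration covers "/topic/1", "/topic/2", ...
connection.GetOpenSubscriberChannel()
    .PatternSubscribe("/topic/*", (channelName, payload) =>
    {
        // channelName tells you which concrete channel the publisher used,
        // so a single callback can dispatch per topic.
    });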
Either can work. It should be noted that the performance of publish is impacted by the total number of unique subscriptions - but frankly it is still stupidly fast (as in: 0ms) even if you have tens of thousands of subscribed channels using subscribe rather than psubscribe.
But from publish
Time complexity: O(N+M) where N is the number of clients subscribed to the receiving channel and M is the total number of subscribed patterns (by any client).
I recommend reading the redis documentation of pub/sub.
Edit for follow on questions:
a) I assume I would have to "publish" synchronously (using Result or Wait()) if I want to guarantee the order of sending items from the same publisher is preserved when receiving items, correct?
that won't make any difference at all; since you mention Result / Wait(), I assume you're talking about BookSleeve - in which case the multiplexer already preserves command order. Redis itself is single threaded, and will always process commands on a single connection in order. However: the callbacks on the subscriber may be executed asynchronously and may be handed (separately) to a worker thread. I am currently investigating whether I can force this to be in-order from RedisSubscriberConnection.
Update: from 1.3.22 onwards you can set the CompletionMode to PreserveOrder - then all callbacks will be completed sequentially rather than concurrently.
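Presumably that amounts to a single property assignment on the connection, along these lines (the enum type name here is an assumption based on the description above; check the BookSleeve release notes for the exact member):

conn.CompletionMode = ResultCompletionMode.PreserveOrder; // enum type name assumed; completes callbacks sequentially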
b) after making adjustments according to your suggestions I get a great performance when publishing few items regardless of the size of the payload. However, when sending 100,000 or more items by the same publisher performance drops rapidly (down to 7-8 seconds just to send from my machine).
Firstly, that time sounds high - testing locally I get (for 100,000 publications, including waiting for the response for all of them) 1766ms (local) or 1219ms (remote) (that might sound counter-intuitive, but my "local" isn't running the same version of redis; my "remote" is 2.6.12 on Centos; my "local" is 2.6.8-pre2 on Windows).
I can't make your actual server faster or speed up the network, but: in case this is packet fragmentation, I have added (just for you) a SuspendFlush() / ResumeFlush() pair. This disables eager-flushing (i.e. when the send-queue is empty; other types of flushing still happen); you might find this helps:
conn.SuspendFlush();
try {
// start lots of operations...
} finally {
conn.ResumeFlush();
}
Note that you shouldn't call Wait (or access Result) until you have resumed, because until you call ResumeFlush() some operations could still be sitting in the send buffer. With that all in place, I get (for 100,000 operations):
local: 1766ms (eager-flush) vs 1554ms (suspend-flush)
remote: 1219ms (eager-flush) vs 796ms (suspend-flush)
As you can see, it helps more with remote servers, as it will be putting fewer packets through the network.
I cannot use transactions because later on the to-be-published items are not all available at once. Is there a way to optimize with that knowledge in mind?
I think that is addressed by the above - but note that recently CreateBatch was added too. A batch operates a lot like a transaction - just: without the transaction. Again, it is another mechanism to reduce packet fragmentation. In your particular case, I suspect the suspend/resume (on flush) is your best bet.
Do you recommend having one general RedisConnection and one RedisSubscriberConnection or any other configuration to have such wrapper perform desired functions?
As long as you're not performing blocking operations (blpop, brpop, brpoplpush etc), or putting oversized BLOBs down the wire (potentially delaying other operations while it clears), then a single connection of each type usually works pretty well. But YMMV depending on your exact usage requirements.

MSMQ Receive() method timeout

My original question from a while ago is MSMQ Slow Queue Reading; however, I have advanced from that and now understand the problem a bit more clearly.
My code (well actually part of an open source library I am using) looks like this:
queue.Receive(TimeSpan.FromSeconds(10), MessageQueueTransactionType.Automatic);
This uses the System.Messaging.MessageQueue.Receive method, and queue is a MessageQueue. The problem is as follows.
The above line of code will be called with the specified timeout (10 seconds). The Receive(...) function is a blocking function, and is supposed to block until a message arrives in the queue at which time it will return. If no message is received before the timeout is hit, it will return at the timeout. If a message is in the queue when the function is called, it will return that message immediately.
However, what is happening is the Receive(...) function is being called, seeing that there is no message in the queue, and hence waiting for a new message to come in. When a new message comes in (before the timeout), it isn't detecting this new message and continues waiting. The timeout is eventually hit, at which point the code continues and calls Receive(...) again, where it picks up the message and processes it.
Now, this problem only occurs after a number of days/weeks. I can make it work normally again by deleting & recreating the queue. It happens on different computers, and different queues. So it seems like something is building up, until some point when it breaks the triggering/notification ability that the Receive(...) function uses.
I've checked a lot of different things, and everything seems normal & isn't different from a queue that is working normally. There is plenty of disk space (13gig free) and RAM (about 350MB free out of 1GB from what I can tell). I have checked registry entries which all appear the same as other queues, and the performance monitor doesn't show anything out of the normal. I have also run the TMQ tool and can't see anything noticably wrong from that.
I am using Windows XP on all the machines and they all have Service Pack 3 installed. I am not sending a large number of messages to the queues; at most it would be one every 2 seconds, but generally much less frequently than that. The messages are small too, nowhere near the 4 MB limit.
The only thing I have just noticed is that the p0000001.mq and r0000067.mq files in C:\WINDOWS\system32\msmq\storage are both 4,096 KB; however, they are that size on other computers too, which are not currently experiencing the problem. The problem does not happen to every queue on the computer at once: I can delete and recreate one problem queue on a computer and the other queues still experience the problem.
I am not very experienced with MSMQ so if you post possible things to check can you please explain how to check them or where I can find more details on what you are talking about.
Currently the situation is:
ComputerA - 4 queues normal
ComputerB - 2 queues experiencing problem, 1 queue normal
ComputerC - 2 queues experiencing problem
ComputerD - 1 queue normal
ComputerE - 2 queues normal
So I have a large number of computers/queues to compare and test against.
Any particular reason you aren't using an event handler to listen to the queues? The System.Messaging library allows you to attach a handler to a queue instead of, if I understand what you are doing correctly, looping Receive every 10 seconds. Try something like this:
class MSMQListener
{
    public void StartListening(string queuePath)
    {
        MessageQueue msQueue = new MessageQueue(queuePath);
        msQueue.ReceiveCompleted += QueueMessageReceived;
        msQueue.BeginReceive();
    }

    private void QueueMessageReceived(object source, ReceiveCompletedEventArgs args)
    {
        MessageQueue msQueue = (MessageQueue)source;

        // complete the pending receive to get the message
        Message msMessage = msQueue.EndReceive(args.AsyncResult);

        // do something with the message

        // begin receiving again
        msQueue.BeginReceive();
    }
}
We are also using NServiceBus and had a similar problem inside our network.
Basically, MSMQ is using UDP with two-phase commits. After a message is received, it has to be acknowledged. Until it is acknowledged, it cannot be received on the client side as the receive transaction hasn't been finalized.
This was caused by different things in different times for us:
once, this was due to the Distributed Transaction Coordinator being unable to communicate between machines because of a firewall misconfiguration
another time, we were using cloned virtual machines without sysprep, which made internal MSMQ ids non-unique; a message would then be received on one machine and acknowledged by another. Eventually MSMQ figures things out, but it takes quite a while.
Try this overloaded method:
public Message Receive(TimeSpan timeout, Cursor cursor)
To get a cursor for a MessageQueue, call the CreateCursor method for that queue.
A Cursor is used with such methods as Peek(TimeSpan, Cursor, PeekAction) and Receive(TimeSpan, Cursor) when you need to read messages that are not at the front of the queue. This includes reading messages synchronously or asynchronously. Cursors do not need to be used to read only the first message in a queue.
When reading messages within a transaction, Message Queuing does not roll back cursor movement if the transaction is aborted. For example, suppose there is a queue with two messages, A1 and A2. If you remove message A1 while in a transaction, Message Queuing moves the cursor to message A2. However, if the transaction is aborted for any reason, message A1 is inserted back into the queue but the cursor remains pointing at message A2.
To close the cursor, call Close.
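Putting those pieces together, a minimal sketch of a cursor-based receive might look like this; the queue path and timeout are illustrative:

using System;
using System.Messaging;

public static Message ReceiveViaCursor(string path)
{
    var queue = new MessageQueue(path);              // e.g. @".\private$\myqueue"
    Cursor cursor = queue.CreateCursor();
    try
    {
        // Receive the message at the cursor's current position, waiting up to 10 seconds.
        return queue.Receive(TimeSpan.FromSeconds(10), cursor);
    }
    catch (MessageQueueException ex)
    {
        if (ex.MessageQueueErrorCode != MessageQueueErrorCode.IOTimeout)
            throw;                                   // not a timeout: rethrow
        return null;                                 // no message arrived within the timeout
    }
    finally
    {
        cursor.Close();                              // close the cursor when done
    }
}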
If you want something completely synchronous and without events, you can try this method:
public object Receive(string path, int millisecondsTimeout)
{
    var mq = new System.Messaging.MessageQueue(path);
    var asyncResult = mq.BeginReceive();
    var handles = new System.Threading.WaitHandle[] { asyncResult.AsyncWaitHandle };
    var index = System.Threading.WaitHandle.WaitAny(handles, millisecondsTimeout);
    if (index == System.Threading.WaitHandle.WaitTimeout) // WaitTimeout == 258
    {
        mq.Close();
        return null;
    }
    var result = mq.EndReceive(asyncResult);
    return result;
}
