Random partitioner does not distribute messages between Kafka topic partitions - c#

I've created a topic in Kafka with 9 partitions, naming it aptly 'test', and knocked together two simple applications in C# (.NET Core), using Confluent.Kafka client library: a producer and a consumer. I did little more than tweak examples from the documentation.
I am running two instances of the consumer application and one instance of the producer. I don't see much point in pasting the consumer code here: it's a trivial 'get a message, print it on screen' app. It does, however, also print the number of the partition each message came from.
This is the producer app:
static async Task Main(string[] args)
{
var random = new Random();
var config = new ProducerConfig {
BootstrapServers = "10.0.0.5:9092",
Partitioner = Partitioner.ConsistentRandom
};
int counter = 0;
while (true)
{
using (var p = new ProducerBuilder<string, string>(config).Build())
{
try
{
p.BeginProduce(
"test",
new Message<string, string>
{
//Key = random.Next().ToString(),
Value = $"test {++counter}"
});
if (counter % 10 == 0)
p.Flush();
}
catch (ProduceException<Null, string> e)
{
Console.WriteLine($"Delivery failed: {e.Error.Reason}");
}
}
}
}
Problem: If the Key property of the message is not set, all messages get sent to the partition number 7, meaning that one of my consumer instances is idle. I had to manually randomise the key in order to distribute them between partitions (see the commented out line). (The original code, as copied from the docs, used Null as the type of the key, and this sent all messages to the 7th partition too.)
Why is that? According to the documentation of the ProducerConfig.Partitioner property, the consistent_random option should ensure random distribution if the key is not specified. I also tried the Partitioner.Random option, which should use random distribution regardless of the key, but this did not help.
Is this the expected behaviour, am I doing something wrong, or did I come across a bug?
I am using version 1.0.0-RC2 of Confluent.Kafka NuGet.
Complete documentation of the Partitioner config:
// Summary:
// Partitioner: `random` - random distribution, `consistent` - CRC32 hash of key
// (Empty and NULL keys are mapped to single partition), `consistent_random` - CRC32
// hash of key (Empty and NULL keys are randomly partitioned), `murmur2` - Java
// Producer compatible Murmur2 hash of key (NULL keys are mapped to single partition),
// `murmur2_random` - Java Producer compatible Murmur2 hash of key (NULL keys are
// randomly partitioned. This is functionally equivalent to the default partitioner
// in the Java Producer.). default: consistent_random importance: high

I encountered the same issue.
It seems that when a client is first initialised, its first message always goes to the same partition.
Partitioner.Random will work if you use the same client for all your messages, rather than building a new producer for each one.
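A minimal sketch of that suggestion, reusing the broker address and topic from the question: the producer is built once, outside the loop, and every message goes through it. It assumes a released 1.x version of Confluent.Kafka, where the method is named Produce (the 1.0.0-RC2 code above calls it BeginProduce), and it uses Null as the key type, as in the original docs example. The message count of 100 is arbitrary.

```csharp
using System;
using Confluent.Kafka;

class Program
{
    static void Main()
    {
        var config = new ProducerConfig
        {
            BootstrapServers = "10.0.0.5:9092",        // address taken from the question
            Partitioner = Partitioner.ConsistentRandom
        };

        // Build the producer once and reuse it for every message.
        using (var producer = new ProducerBuilder<Null, string>(config).Build())
        {
            for (int i = 1; i <= 100; i++)
            {
                producer.Produce(
                    "test",
                    new Message<Null, string> { Value = $"test {i}" },
                    report => Console.WriteLine(
                        report.Error.IsError
                            ? $"Delivery failed: {report.Error.Reason}"
                            : $"Delivered to partition {report.Partition.Value}"));
            }

            // Wait for outstanding deliveries before the producer is disposed.
            producer.Flush(TimeSpan.FromSeconds(10));
        }
    }
}
```

Printing the partition from the delivery report makes it easy to check whether the messages are actually spread across partitions.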

Related

Kafka: Consume partition with manual batching - Messages are being skipped

I am using Confluent Kafka .NET to create a consumer for a partitioned topic.
As Confluent Kafka .NET does not support consuming in batches, I built a function that consumes messages until the batch size is reached. The idea of this function is to build batches with messages from the same partition only, that is why I stop building the batch once I consume a result that has a different partition and return whatever number of messages I was able to consume up to that point.
Goal or Objective: I want to be able to process the messages I returned in the batch, and commit the offsets for those messages only. i.e:
Message Consumed From Partition | Offset | Stored in Batch
0 | 0 | Yes
0 | 1 | Yes
2 | 0 | No
From the table above I would like to process both messages I got from partition 0. Message from partition 2 would be ignored and (hopefully) PICKED UP LATER in another call to ConsumeBatch.
To commit I simply call the synchronous Commit function passing the offset of the latest message I processed as parameter. In this case I would pass the offset of the second message of the batch shown in the table above (Partition 0 - Offset 1).
ISSUE:
The problem is that, for some reason, when I build a batch like the one shown above, the messages I decide not to process because of validations are ignored forever. For example, message 0 of partition 2 will never be picked up by the consumer again.
As you can see in the consumer configuration below, I have set both EnableAutoCommit and EnableAutoOffsetStore to false. I thought this would be enough for the consumer to leave the offsets alone and pick up ignored messages in another Consume call, but it isn't. The offset somehow advances to the latest consumed message for each partition, regardless of my configuration.
Can anybody shed some light on what I am missing to achieve the desired behavior, if it is possible?
Simplified version of the function to build the batch:
public IEnumerable<ConsumeResult<string, string>> ConsumeBatch(int batchSize)
{
List<ConsumeResult<string, string>> consumedMessages = new List<ConsumeResult<string, string>>();
int latestPartition = -1; // The partition from where we consumed the last message
for (int i = 0; i < batchSize; i++)
{
var result = _consumer.Consume(100);
if (result != null)
{
if (latestPartition == -1 || result.Partition.Value == latestPartition)
{
consumedMessages.Add(result);
latestPartition = result.Partition.Value;
}
else
break;
}
else
break;
}
return consumedMessages;
}
ConsumerConfig used to instantiate my consumer client:
_consumerConfig = new ConsumerConfig
{
BootstrapServers = _bootstrapServers,
EnableAutoCommit = false,
AutoCommitIntervalMs = 0,
GroupId = "WorkerConsumers",
AutoOffsetReset = AutoOffsetReset.Earliest,
EnableAutoOffsetStore = false,
};
Additional Information:
This is being tested with:
1 topic with 6 partitions and replication factor of 2
3 brokers
1 single-threaded consumer client that belongs to a consumer group
Local environment with wsl2 on Windows 10
The key was to use the Seek function to reset the partition's offset to a specific position so that the ignored message could be picked up again as part of another batch.
In the same function above:
public IEnumerable<ConsumeResult<string, string>> ConsumeBatch(int batchSize)
{
List<ConsumeResult<string, string>> consumedMessages = new List<ConsumeResult<string, string>>();
int latestPartition = -1; // The partition from where we consumed the last message
for (int i = 0; i < batchSize; i++)
{
var result = _consumer.Consume(100);
if (result != null)
{
if (latestPartition == -1 || result.Partition.Value == latestPartition)
{
consumedMessages.Add(result);
latestPartition = result.Partition.Value;
}
else
{
// Seek back to this message so that, although it is left out of the current batch, it is consumed again in a later batch
_consumer.Seek(result.TopicPartitionOffset); // IMPORTANT LINE!!!!!!!
break;
}
}
else
break;
}
return consumedMessages;
}
I think in general, if you want to consume a message without altering the offsets in any way (kinda peeking the topic partition), you can call Consume and then use Seek(result.TopicPartitionOffset) to set the offset of that topic partition back to where it was before consuming the message.
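The peek pattern described above could be wrapped in a small helper. This is only a sketch: Peek is a hypothetical extension method, not part of Confluent.Kafka, and it relies solely on the Consume and Seek calls already used in the answer.

```csharp
using System;
using Confluent.Kafka;

static class ConsumerPeekExtensions
{
    // Hypothetical helper: consumes one message, then seeks back to its offset
    // so that the next Consume call on that partition returns the same message.
    public static ConsumeResult<TKey, TValue> Peek<TKey, TValue>(
        this IConsumer<TKey, TValue> consumer, TimeSpan timeout)
    {
        var result = consumer.Consume(timeout);
        if (result != null)
        {
            // Rewind only the partition the message came from.
            consumer.Seek(result.TopicPartitionOffset);
        }
        return result; // null when nothing arrived within the timeout
    }
}
```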

How to produce a message for same topic on different partition with null key

I've just come across Kafka and am new to it. While setting it up, I created 6 partitions [0,1...5], and I am trying to store messages with a null key in different partitions.
For example:
var Config = new Dictionary<string, object> {
{ "group.id", "topic1_group" },
{ "bootstrap.servers", "localhost:9092" },
{ "default.topic.config",new Dictionary<string, object> {
{ "acks", 1}
}
}
};
var producer = new Producer<string, string>(Config, new StringSerializer(Encoding.UTF8), new StringSerializer(Encoding.UTF8));
return await producer.ProduceAsync("topic1", null, "Message1");
But when I try to run my producer class code, it gets stored in the same partition.
Edit:
Kafka Response: partition 4, offset 10
Now, producing the next message to topic1 again:
Kafka Response: partition 4, offset 11
The servers and client are updated to Kafka version 0.10.0 and 0.9.0 respectively.
It looks like you are returning the result of a single send request.
Try producing messages in a loop rather than restarting the app, because restarting the app also restarts the round-robin partitioning.

C# Threading + lock being weird [duplicate]

This question already has answers here:
Random.Next returns always the same values [duplicate]
(4 answers)
Closed 6 years ago.
My program is a multi-threaded proxy checker. Whenever I return the proxy IP addresses from my method and try to echo them out, I get a bunch of them and the threads behave in a completely unintended way. Each thread is supposed to be supplied with one line of IP addresses. Here's a screenshot of what's echoing. After this the IP variable will return and contain null.
My plagued code (bear with me, based off a public example):
static List<String> ips = new List<String>();// this is at the start of the program class
static Random rnd = new Random();
private static String getip()
{
if (ips.Count == 0)
{
return null;
}
return ips[rnd.Next(0, ips.Count)];
}
Also the get IP is called in a while (true) loop as it's a proxy checker, I don't think that code is too necessary.
The other code:
while (true)
{
string ip = getip();
try
{
using (var client = new ProxyClient(ip, user, pass))
{
Console.WriteLine(ip, user, pass);
client.Connect();
if (client.IsConnected)
{
return true;
}
else
{
client.Disconnect();
return false;
}
}
}
catch
{
removeip(ip);
}
Thread.Sleep(30);
}
For example, thread 1 should have 127.0.0.1 (first IP from list), thread 2, 127.0.0.2 (second IP from list) etc etc, the problem at the moment is located in the screenshot.
Edit: this is not a duplicate. I didn't explain what I need properly; this note from Eric J explains what I'm trying to do. It wasn't just the random issue.
NOTE
If you want each thread to get its own unique IP rather than a random one, you'll need to do something different than pick a random IP. You can after all get the same random IP more than once (if you flip a coin twice, you might get head twice or tails twice).
A good strategy would be to start from your List<String> ips and create one thread for each entry in that list.
I don't see a need to lock the code in question. List<T> is thread-safe for read access. You only need to lock it if you are adding to or modifying the list (see Thread Safety at the bottom of the MSDN entry).
The problem you are experiencing is unrelated. When you create a new Random(), the seed for the pseudo-random number generator is based on the system clock. Multiple calls in quick succession can happen on the same clock tick, meaning that they get the same number sequence.
Initialize your random outside of getip() to avoid the problem (e.g. make it a static field of your class).
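static List<String> ips = new List<String>(); // this is at the start of the program class
// See the discussion in comments about multithreaded access to Random()
// In particular see http://stackoverflow.com/a/19271004/141172
// Random is not thread-safe. One option (an assumption here, not necessarily
// the approach of the linked answer) is a single shared instance guarded by a lock.
static readonly Random rnd = new Random();
static readonly object rndLock = new object();
private static String getip()
{
    if (ips.Count == 0)
    {
        return null;
    }
    lock (rndLock)
    {
        return ips[rnd.Next(0, ips.Count)];
    }
}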
NOTE
If you want each thread to get its own unique IP rather than a random one, you'll need to do something different than pick a random IP. You can after all get the same random IP more than once (if you flip a coin twice, you might get head twice or tails twice).
A good strategy would be to start from your List<String> ips and create one thread for each entry in that list. Pass the IP it should be responsible for as a parameter.
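A minimal sketch of the one-thread-per-IP strategy, under some assumptions: CheckProxy is a hypothetical stand-in for the real ProxyClient connection test (here it just accepts loopback addresses), and the names ProxyCheckDemo and CheckAll are made up for illustration. Each thread receives its IP as the start parameter, so no two threads share an address.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class ProxyCheckDemo
{
    // Hypothetical stand-in for the real proxy connection test.
    public static bool CheckProxy(string ip)
    {
        return ip.StartsWith("127.");
    }

    // Starts one thread per IP and returns the check result for each IP.
    public static Dictionary<string, bool> CheckAll(IReadOnlyList<string> ips)
    {
        var results = new Dictionary<string, bool>();
        var gate = new object();
        var threads = new List<Thread>();

        foreach (var ip in ips)
        {
            // The IP is passed to the thread as its start parameter.
            var thread = new Thread(state =>
            {
                var assignedIp = (string)state;
                bool ok = CheckProxy(assignedIp);
                lock (gate) // Dictionary is not safe for concurrent writes
                {
                    results[assignedIp] = ok;
                }
            });
            threads.Add(thread);
            thread.Start(ip);
        }

        foreach (var thread in threads)
        {
            thread.Join();
        }
        return results;
    }
}
```

For a long list of proxies, a bounded pool of workers would probably be kinder to the machine than one thread per entry; the sketch keeps the one-to-one mapping only because that is what the note describes.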

How to programatically check if NServiceBus has finished processing all messages

As part of an effort to automate starting/stopping some of our NServiceBus services, I'd like to know when a service has finished processing all the messages in its input queue.
The problem is that, while the NServiceBus service is running, my C# code is reporting one less message than is actually there. So it thinks that the queue is empty when there is still one message left. If the service is stopped, it reports the "correct" number of messages. This is confusing because, when I inspect the queues myself using the Private Queues view in the Computer Management application, it displays the "correct" number.
I'm using a variant of the following C# code to find the message count:
var queue = new MessageQueue(path);
return queue.GetAllMessages().Length;
I know this will perform horribly when there are many messages. The queues I'm inspecting should only ever have a handful of messages at a time.
I have looked at other related questions, but haven't found the help I need.
Any insight or suggestions would be appreciated!
Update: I should have mentioned that this service is behind a Distributor, which is shut down before trying to shut down this service. So I have confidence that new messages will not be added to the service's input queue.
The thing is that it's not actually "one less message", but rather dependent on the number of messages currently being processed by the endpoint which, in a multi-threaded process, can be as high as the number of threads.
There's also the issue of client processes that continue to send messages to that same queue.
Probably the only "sure" way of handling this is to count the messages multiple times with a delay in between; if the number stays at zero over a certain number of attempts, then you can assume the queue is empty.
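The repeated-count idea might look something like the sketch below. The names QueueDrainCheck and IsDrained are hypothetical, and getCount is whatever function returns the current message count (for example the GetAllMessages().Length call from the question). The thresholds are arbitrary and would need tuning.

```csharp
using System;
using System.Threading;

static class QueueDrainCheck
{
    // Returns true once getCount() has reported zero on requiredZeroReads
    // consecutive reads; gives up and returns false after maxReads reads.
    public static bool IsDrained(
        Func<int> getCount, int requiredZeroReads, TimeSpan delay, int maxReads)
    {
        int consecutiveZeros = 0;
        for (int read = 0; read < maxReads; read++)
        {
            if (getCount() == 0)
            {
                consecutiveZeros++;
                if (consecutiveZeros >= requiredZeroReads)
                {
                    return true;
                }
            }
            else
            {
                consecutiveZeros = 0; // a message showed up again, start over
            }
            Thread.Sleep(delay);
        }
        return false;
    }
}
```

A call such as IsDrained(() => queue.GetAllMessages().Length, 5, TimeSpan.FromSeconds(1), 60) would require five empty reads in a row, one second apart, before treating the queue as empty.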
WMI was the answer! Here's a first pass at the code. It could doubtless be improved.
public int GetMessageCount(string queuePath)
{
const string queryText = "select * from Win32_PerfRawData_MSMQ_MSMQQueue";
var query = new WqlObjectQuery(queryText);
var searcher = new ManagementObjectSearcher(query);
var queues = searcher.Get();
foreach (ManagementObject queue in queues)
{
var name = queue["Name"].ToString();
if (AreTheSameQueue(queuePath, name))
{
// Depending on the machine (32/64-bit), this value is a different type.
// Casting directly to UInt64 or UInt32 only works on the relative CPU architecture.
// To work around this run-time unknown, convert to string and then parse to int.
var countAsString = queue["MessagesInQueue"].ToString();
var messageCount = int.Parse(countAsString);
return messageCount;
}
}
return 0;
}
private static bool AreTheSameQueue(string path1, string path2)
{
// Tests whether two queue paths are equivalent, accounting for differences
// in case and length (if one path was truncated, for example by WMI).
string sanitizedPath1 = Sanitize(path1);
string sanitizedPath2 = Sanitize(path2);
if (sanitizedPath1.Length > sanitizedPath2.Length)
{
return sanitizedPath1.StartsWith(sanitizedPath2);
}
if (sanitizedPath1.Length < sanitizedPath2.Length)
{
return sanitizedPath2.StartsWith(sanitizedPath1);
}
return sanitizedPath1 == sanitizedPath2;
}
private static string Sanitize(string queueName)
{
var machineName = Environment.MachineName.ToLowerInvariant();
return queueName.ToLowerInvariant().Replace(machineName, ".");
}

Track dead WebDriver instances during parallel task

I am seeing some dead-instance weirdness running parallelized nested-loop web stress tests using Selenium WebDriver, simple example being, say, hit 300 unique pages with 100 impressions each.
I'm "successfully" getting 4 - 8 WebDriver instances going using a ThreadLocal<FirefoxWebDriver> to isolate them per task thread, and MaxDegreeOfParallelism on a ParallelOptions instance to limit the threads. I'm partitioning and parallelizing the outer loop only (the collection of pages), and checking .IsValueCreated on the ThreadLocal<> container inside the beginning of each partition's "long running task" method. To facilitate cleanup later, I add each new instance to a ConcurrentDictionary keyed by thread id.
No matter what parallelizing or partitioning strategy I use, the WebDriver instances will occasionally do one of the following:
Launch but never show a URL or run an impression
Launch, run any number of impressions fine, then just sit idle at some point
When either of these happen, the parallel loop eventually seems to notice that a thread isn't doing anything, and it spawns a new partition. If n is the number of threads allowed, this results in having n productive threads only about 50-60% of the time.
Cleanup still works fine at the end; there may be 2n open browsers or more, but the productive and unproductive ones alike get cleaned up.
Is there a way to monitor for these useless WebDriver instances and a) scavenge them right away, plus b) get the parallel loop to replace the task segment immediately, instead of lagging behind for several minutes as it often does now?
I was having a similar problem. It turns out that WebDriver doesn't have the best method for finding open ports. As described here it gets a system wide lock on ports, finds an open port, and then starts the instance. This can starve the other instances that you're trying to start of ports.
I got around this by specifying a random port number directly in the delegate for the ThreadLocal<IWebDriver> like this:
var ports = new List<int>();
var rand = new Random((int)DateTime.Now.Ticks & 0x0000FFFF);
var driver = new ThreadLocal<IWebDriver>(() =>
{
var profile = new FirefoxProfile();
var port = rand.Next(50) + 7050;
while(ports.Contains(port) && ports.Count != 50) port = rand.Next(50) + 7050;
profile.Port = port;
ports.Add(port);
return new FirefoxDriver(profile);
});
This works pretty consistently for me, although one issue remains unresolved: what happens if all 50 ports in the list end up in use.
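One possible way to handle both the exhaustion case and concurrent access to the port list is sketched below. It is a variation, not the code above: it swaps random picking for a thread-safe queue of ports, and PortPool is a hypothetical class name. When the pool is empty it fails fast instead of looping.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

sealed class PortPool
{
    private readonly ConcurrentQueue<int> _free;

    // Fills the pool with count consecutive ports starting at firstPort.
    public PortPool(int firstPort, int count)
    {
        _free = new ConcurrentQueue<int>(Enumerable.Range(firstPort, count));
    }

    // Hands out each port to at most one caller at a time;
    // throws instead of spinning when no port is left.
    public int Take()
    {
        if (_free.TryDequeue(out int port))
        {
            return port;
        }
        throw new InvalidOperationException("No free ports left in the pool.");
    }

    // Returns a port to the pool once its browser instance is closed.
    public void Release(int port)
    {
        _free.Enqueue(port);
    }
}
```

Inside the ThreadLocal factory, the port would then come from a shared pool, for example profile.Port = pool.Take() with pool created as new PortPool(7050, 50) to match the range used above.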
Since there is no OnReady event nor an IsReady property, I worked around it by sleeping the thread for several seconds after creating each instance. Doing that seems to give me 100% durable, functioning WebDriver instances.
Thanks to your suggestion, I've implemented IsReady functionality in my open-source project Webinator. Use that if you want, or use the code outlined below.
I tried instantiating 25 instances, and all of them were functional, so I'm pretty confident in the algorithm at this point (I leverage HtmlAgilityPack to see if elements exist, but I'll skip it for the sake of simplicity here):
public void WaitForReady(IWebDriver driver)
{
var js = @"{ var temp=document.createElement('div'); temp.id='browserReady';" +
@"b=document.getElementsByTagName('body')[0]; b.appendChild(temp); }";
((IJavaScriptExecutor)driver).ExecuteScript(js);
WaitForSuccess(() =>
{
IWebElement element = null;
try
{
element = driver.FindElement(By.Id("browserReady"));
}
catch
{
// element not found
}
return element != null;
},
timeoutInMilliseconds: 10000);
js = @"{var temp=document.getElementById('browserReady');" +
@" temp.parentNode.removeChild(temp);}";
((IJavaScriptExecutor)driver).ExecuteScript(js);
}
private bool WaitForSuccess(Func<bool> action, int timeoutInMilliseconds)
{
if (action == null) return false;
bool success;
const int PollRate = 250;
var maxTries = timeoutInMilliseconds / PollRate;
int tries = 0;
do
{
success = action();
tries++;
if (!success && tries < maxTries) // skip the sleep after the final attempt
{
Thread.Sleep(PollRate);
}
}
while (!success && tries < maxTries);
return success;
}
The assumption is if the browser is responding to javascript functions and is finding elements, then it's probably a reliable instance and ready to be used.
