How to sort a ConcurrentBag? - c#

I am working on a client/server application. The server sends messages to the client over TCP, but the order in which they arrive cannot be guaranteed. I don't want to get into why the order cannot be guaranteed (it is to do with threads on the server).
Anyway, on the client, I am processing messages like this:
private Queue<byte[]> rawMessagesIn = new Queue<byte[]>();
public ConcurrentBag<ServerToClient> messages = new ConcurrentBag<ServerToClient>();

public void Start()
{
    var processTask = Task.Factory.StartNew(() =>
    {
        while (run)
        {
            process();
        }
    });
}

void process()
{
    if (rawMessagesIn.Count > 0)
    {
        var raw_message = rawMessagesIn.Dequeue();
        var message = (ServerToClient)Utils.Deserialize(raw_message);
        messages.Add(message);
    }
}

private void OnDataReceived(object sender, byte[] data)
{
    rawMessagesIn.Enqueue(data);
}
Now, it is important that when I call messages.TryTake() or messages.TryPeek(), the message that comes out is the next in the sequence. Every message has a number/integer representing its order; for example, message.number = 1.
I need to use TryPeek because the available message might be the correct next one, in which case we remove it from the bag, or it might be a message that is required further in the future, in which case it should not be removed yet.
I have tried using messages.OrderBy(x => x.number).ToList(), but I cannot see how it will work. If I use OrderBy and get a sorted list SL whose item at index 0 is the correct one, I cannot simply remove or modify it, because I do not know its position in the ConcurrentBag!
Does anyone have a suggestion for me?

My suggestion is to switch from manually managing queues to a TransformBlock<TInput,TOutput> from the TPL Dataflow library. This component is a combination of an input queue, an output queue, and a processor that transforms the TInput to TOutput. The EnsureOrdered functionality is built in, and it is the default. Example:
private readonly TransformBlock<byte[], ServerToClient> _transformer;

public Client() // Constructor
{
    _transformer = new((byte[] raw_message) =>
    {
        ServerToClient message = (ServerToClient)Utils.Deserialize(raw_message);
        return message;
    }, new ExecutionDataflowBlockOptions()
    {
        EnsureOrdered = true, // Just for clarity. true is the default.
        MaxDegreeOfParallelism = 1, // the default is 1
    });
}

private void OnDataReceived(object sender, byte[] data)
{
    bool accepted = _transformer.Post(data);
    // The accepted will be false in case the _transformer has failed.
}

public bool TryReceiveAll(out IList<ServerToClient> messages)
{
    return _transformer.TryReceiveAll(out messages);
}
There are many ways to consume the ServerToClient messages that are stored in the output queue of the block. The example above demonstrates the TryReceiveAll method. There are also TryReceive, Receive, ReceiveAsync and ReceiveAllAsync (some of them are extension methods). You can also use the lower-level method OutputAvailableAsync as shown here. Linking the block to another dataflow block is also an option.
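For example, a minimal consuming loop could be sketched like this (the Handle method is hypothetical; OutputAvailableAsync and ReceiveAsync are standard TPL Dataflow methods):
// A sketch of a consuming loop: OutputAvailableAsync asynchronously waits
// until the block has produced at least one message (or has completed),
// and ReceiveAsync then takes the next message, in order.
// Assumes this loop is the only consumer of the block.
private async Task ConsumeLoopAsync()
{
    while (await _transformer.OutputAvailableAsync())
    {
        ServerToClient message = await _transformer.ReceiveAsync();
        Handle(message); // hypothetical handler
    }
}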

Related

How to use multiple consumers in different programming languages for the same group ID in Kafka

I wanted to set up load balancing in Kafka (across multiple programming languages) for a topic, so I did the following.
Created a topic with 4 partitions.
Created a producer in C# (producing messages every second)
Created one consumer(consumer1) in C# (consumer group: testConsumerGrp)
Created one more consumer(consumer2) in NodeJs (consumer group: testConsumerGrp)
I used confluent.kafka in C# and kafkajs in NodeJs.
I open the producer and keep it running.
If I run only C# consumer, it works fine.
If I run only NodeJs consumer, it works fine.
If I run multiple C# consumer (only c# and less than 4 instances), it works fine.
If I run multiple NodeJs consumer (only NodeJs and less than 4 instances), it works fine.
If I run one C# and one NodeJs consumer, I get an Inconsistent group protocol error.
Can't we use two programming languages for the same consumer group?
Producer in C# - Windows Forms
using System;
using System.Collections.Generic;
using System.Windows.Forms;
using Confluent.Kafka;

namespace KafkaProducer
{
    public partial class frmProducer : Form
    {
        const string TOPIC = "testTopic";
        private IProducer<Null, string> pBuilder;

        public frmProducer()
        {
            InitializeComponent();
        }

        private async void timer1_Tick(object sender, EventArgs e)
        {
            try
            {
                // instead of sending some value, we send current DateTime as value
                var dr = await pBuilder.ProduceAsync(TOPIC, new Message<Null, string> { Value = DateTime.Now.ToLongTimeString() });
                // once done, add the value into list box
                listBox1.Items.Add($"{dr.Value} - Sent to Partition: {dr.Partition.Value}");
                listBox1.TopIndex = listBox1.Items.Count - 1;
            }
            catch (ProduceException<Null, string> err)
            {
                MessageBox.Show($"Failed to deliver msg: {err.Error.Reason}");
            }
        }

        private void frmProducer_Load(object sender, EventArgs e)
        {
            ProducerConfig config = new ProducerConfig { BootstrapServers = "localhost:9092" };
            pBuilder = new ProducerBuilder<Null, string>(config).Build();
            timer1.Enabled = true;
        }

        private void frmProducer_FormClosing(object sender, FormClosingEventArgs e)
        {
            timer1.Enabled = false;
            pBuilder.Dispose();
        }
    }
}
Consumer in C# - Windows Forms
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Forms;
using Confluent.Kafka;

namespace KafkaConsumer
{
    public partial class frmConsumer : Form
    {
        CancellationTokenSource cts = new CancellationTokenSource();

        public frmConsumer()
        {
            InitializeComponent();
        }

        private void StartListen()
        {
            var conf = new ConsumerConfig
            {
                GroupId = "test-consumer-group",
                BootstrapServers = "localhost:9092",
                AutoOffsetReset = AutoOffsetReset.Earliest
            };

            using (var c = new ConsumerBuilder<Ignore, string>(conf).Build())
            {
                c.Subscribe("testTopic");
                //TopicPartitionTimestamp tpts = new TopicPartitionTimestamp("testTopic", new Partition(), Timestamp. )
                //c.OffsetsForTimes()
                try
                {
                    while (true)
                    {
                        try
                        {
                            var cr = c.Consume(cts.Token);
                            // Adding the consumed values into the UI
                            listBox1.Invoke(new Action(() =>
                            {
                                listBox1.Items.Add($"{cr.Value} - from Partition: {cr.Partition.Value}");
                                listBox1.TopIndex = listBox1.Items.Count - 1;
                            }));
                        }
                        catch (ConsumeException err)
                        {
                            MessageBox.Show($"Error occured: {err.Error.Reason}");
                        }
                    }
                }
                catch (OperationCanceledException)
                {
                    // Ensure the consumer leaves the group cleanly and final offsets are committed.
                    c.Close();
                }
            }
        }

        private void Form1_FormClosing(object sender, FormClosingEventArgs e)
        {
            cts.Cancel();
        }

        private async void frmConsumer_Load(object sender, EventArgs e)
        {
            await Task.Run(() => StartListen());
        }
    }
}
Consumer in NodeJs
const { Kafka } = require("kafkajs");
const kafka = new Kafka({
clientId: 'my-app',
brokers: ["localhost:9092"]
});
const consumer = kafka.consumer({ groupId: "test-consumer-group" });
const run = async () => {
// Consuming
await consumer.connect();
await consumer.subscribe({ topic: "testTopic", fromBeginning: false });
await consumer.run({
eachMessage: async ({ topic, partition, message }) => {
console.log(message.value.toString() + " - from Partition " + partition);
}
});
};
run().catch(console.error);
If I run the C# and NodeJs consumers at the same time, I get the Inconsistent group protocol error.
How can I use consumers from different programming languages in the same Kafka consumer group?
Short answer:
This may not have as much to do with the different languages as you might think. This is happening due to the differences in the protocols of the 2 consumer clients (and their libraries).
Try setting the following property in both the consumer clients:
partition.assignment.strategy = round-robin
Note: I've just supplied the general property, so you'll need to look at the language-specific versions for your clients. You could even set this to range, but keep it consistent across the clients.
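For example, with Confluent.Kafka the strategy can be set directly on the ConsumerConfig (kafkajs takes an equivalent partitionAssigners option on kafka.consumer); a sketch against the config shown in the question:
var conf = new ConsumerConfig
{
    GroupId = "test-consumer-group",
    BootstrapServers = "localhost:9092",
    AutoOffsetReset = AutoOffsetReset.Earliest,
    // Match kafkajs' default assignor so both clients advertise the
    // same protocol when joining the consumer group:
    PartitionAssignmentStrategy = PartitionAssignmentStrategy.RoundRobin
};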
The explanation goes like this:
Reading through the protocol on Kafka's wiki to find out the root cause of Inconsistent group protocol - it turns out that this is returned when:
There is an active consumer group with active/running consumers
And a new consumer arrives to join this group with a protocol type (or a set of protocols) that is not compatible with that of the current group
Now, there could be various aspects in the ConsumerGroupProtocolMetadata but one of the aspects that does seem to differ in the libraries of the clients that you're using is the partition.assignment.strategy.
The dotnet client, which is a wrapper around librdkafka, defaults the value of the above property to range. Here's the reference.
whereas
kafkajs, as per the documentation, defaults it to round-robin - hence the inconsistency.
Hope this helps.
I know this comes one year too late, but this happens because of the same group naming.
When you start the C# client, it creates a group for its consumers.
E.g. group-1 (group-1-consumer-1, group-1-consumer-2, etc.) - these names are allocated automatically, so don't bother with them. I think you can set them manually, but it is not recommended, to avoid potential name collisions.
Now, once you set this in motion, you cannot join the same group from a different group runner (from another microservice).
See what Lalit quoted from Kafka wiki:
There is an active consumer group with active/running consumers
Now, when you start the NodeJs one, you should use a different group name, as it will most likely carry out other tasks with that data.
Yes, you can subscribe both groups to the same topics: Kafka keeps an offset for each group, so each knows where it left off.
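As a sketch of that setup (the group names here are hypothetical), the C# consumer keeps its own group:
// Each service joins under its own group ID; Kafka tracks a separate
// offset per group, so both services receive every message on the topic.
var conf = new ConsumerConfig
{
    GroupId = "csharp-consumer-group",   // hypothetical name
    BootstrapServers = "localhost:9092",
    AutoOffsetReset = AutoOffsetReset.Earliest
};
while the kafkajs consumer would pass a different name, e.g. kafka.consumer({ groupId: "nodejs-consumer-group" }).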

Sending messages at scale to Service Bus from durable functions

I have a scenario where one activity function has retrieved a set of records, which can be anywhere from 1000 to a million, and stored them in an object. This object is then used by the next activity function to send messages in parallel to Service Bus.
Currently I am using a for loop over this object to send each record in it to Service Bus. Please let me know if there is a better pattern where the object's content (wherever it is stored) is drained and sent to Service Bus while the function scales out automatically, without restricting the processing to a for loop.
I have used a for loop in the orchestrator function to call the activity function for the records in the object.
I have looked at the scaling of the activity function: for a set of 18,000 records it scaled up to 15 instances and processed the whole set in 4 minutes.
Currently the function app is on the consumption plan. I checked that only this function app is using the plan and it is not shared.
The topic to which the messages are sent has another service listening to it to read them.
The instance counts for both the orchestrating and activity functions are the defaults.
for (int i = 0; i < number_messages; i++)
{
    taskList[i] = context.CallActivityAsync<string>("Sendtoservicebus",
        (messages[i], runId, CorrelationId, Code));
}

try
{
    await Task.WhenAll(taskList);
}
catch (AggregateException ae)
{
    ae.Flatten();
}
The messages should be quickly sent to service bus by scaling out the activity functions appropriately.
I would suggest using batches to send the messages.
The Azure Service Bus client supports sending messages in batches (the SendBatch and SendBatchAsync methods of QueueClient and TopicClient). However, the size of a single batch must stay below 256 KB, otherwise the whole batch will be rejected.
We will start with a simple use case: the size of each message is known to us, given by a hypothetical Func<T, long> getSize function. Here is a helpful extension method that will split an arbitrary collection based on a metric function and a maximum chunk size:
public static List<List<T>> ChunkBy<T>(this IEnumerable<T> source, Func<T, long> metric, long maxChunkSize)
{
    return source
        .Aggregate(
            new
            {
                Sum = 0L,
                Current = (List<T>)null,
                Result = new List<List<T>>()
            },
            (agg, item) =>
            {
                var value = metric(item);
                if (agg.Current == null || agg.Sum + value > maxChunkSize)
                {
                    var current = new List<T> { item };
                    agg.Result.Add(current);
                    return new { Sum = value, Current = current, agg.Result };
                }
                agg.Current.Add(item);
                return new { Sum = agg.Sum + value, agg.Current, agg.Result };
            })
        .Result;
}
Now, the implementation of SendBigBatchAsync is simple:
public async Task SendBigBatchAsync<T>(IEnumerable<T> messages, Func<T, long> getSize)
{
    var chunks = messages.ChunkBy(getSize, MaxServiceBusMessage);
    foreach (var chunk in chunks)
    {
        var brokeredMessages = chunk.Select(m => new BrokeredMessage(m));
        await client.SendBatchAsync(brokeredMessages);
    }
}

private const long MaxServiceBusMessage = 256000;
private readonly QueueClient client;
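For example, if the records happen to be serialized to UTF-8 JSON before sending (Json.NET's JsonConvert is assumed here), the getSize metric could be approximated like this:
// Hypothetical usage: approximate each message's body size by the
// byte count of its JSON representation.
await SendBigBatchAsync(messages, m => Encoding.UTF8.GetByteCount(JsonConvert.SerializeObject(m)));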
But how do we determine the size of each message? How do we implement the getSize function?
The BrokeredMessage class exposes a Size property, so it might be tempting to rewrite our method the following way:
public async Task SendBigBatchAsync<T>(IEnumerable<T> messages)
{
    var brokeredMessages = messages.Select(m => new BrokeredMessage(m));
    var chunks = brokeredMessages.ChunkBy(bm => bm.Size, MaxServiceBusMessage);
    foreach (var chunk in chunks)
    {
        await client.SendBatchAsync(chunk);
    }
}
The last possibility that I want to consider is to actually allow yourself to violate the max size of the batch, but then handle the exception, retry the send operation, and adjust future calculations based on the actual measured size of the failed messages. The size is known after trying to SendBatch, even if the operation failed, so we can use this information.
// Sender is reused across requests
public class BatchSender
{
    private readonly QueueClient queueClient;
    private long batchSizeLimit = 262000;
    private long headerSizeEstimate = 54; // start with the smallest header possible

    public BatchSender(QueueClient queueClient)
    {
        this.queueClient = queueClient;
    }

    public async Task SendBigBatchAsync<T>(IEnumerable<T> messages)
    {
        var packets = (from m in messages
                       let bm = new BrokeredMessage(m)
                       select new { Source = m, Brokered = bm, BodySize = bm.Size }).ToList();
        var chunks = packets.ChunkBy(p => this.headerSizeEstimate + p.Brokered.Size, this.batchSizeLimit);
        foreach (var chunk in chunks)
        {
            try
            {
                await this.queueClient.SendBatchAsync(chunk.Select(p => p.Brokered));
            }
            catch (MessageSizeExceededException)
            {
                var maxHeader = chunk.Max(p => p.Brokered.Size - p.BodySize);
                if (maxHeader > this.headerSizeEstimate)
                {
                    // If failed messages had bigger headers, remember this header size
                    // as max observed and use it in future calculations
                    this.headerSizeEstimate = maxHeader;
                }
                else
                {
                    // Reduce max batch size to 95% of current value
                    this.batchSizeLimit = (long)(this.batchSizeLimit * .95);
                }
                // Re-send only the failed chunk, re-chunked with the adjusted estimates
                await this.SendBigBatchAsync(chunk.Select(p => p.Source));
            }
        }
    }
}
You can use this blog for further reference. Hope it helps.

Integration test that publishes to a topic and subscribes to another in Azure Service Bus is unreliable - is there a race condition?

I am trying to write an integration / acceptance test to test some code in Azure; at the moment the code in question simply subscribes to one topic and publishes to another.
I have written the test, but it doesn't always pass; it seems as though there could be a race condition. I've tried writing it a couple of ways, including using OnMessage and also using Receive (the example I show here).
When using OnMessage the test seemed to always exit prematurely (around 30 seconds), which I guess perhaps means its inappropriate for this test anyway.
My query concerning my example specifically: I assumed that once I created the subscription to the target topic, any message sent to it could be picked up with Receive(), no matter at what point in time the message arrived; that is, if the message arrives at the target topic before I call Receive(), I should still be able to read it afterwards by calling Receive(). Could anyone please shed any light on this?
namespace somenamespace
{
    [TestClass]
    public class SampleTopicTest
    {
        private static TopicClient topicClient;
        private static SubscriptionClient subClientKoEligible;
        private static SubscriptionClient subClientKoIneligible;
        private static OnMessageOptions options;
        public const string TEST_MESSAGE_SUB = "TestMessageSub";
        private static NamespaceManager namespaceManager;
        private static string topicFleKoEligible;
        private static string topicFleKoIneligible;
        private BrokeredMessage message;

        [ClassInitialize]
        public static void BeforeClass(TestContext testContext)
        {
            //client for publishing messages
            string connectionString = ConfigurationManager.AppSettings["ServiceBusConnectionString"];
            string topicDataReady = ConfigurationManager.AppSettings["DataReadyTopicName"];
            topicClient = TopicClient.CreateFromConnectionString(connectionString, topicDataReady);
            topicFleKoEligible = ConfigurationManager.AppSettings["KnockOutEligibleTopicName"];
            topicFleKoIneligible = ConfigurationManager.AppSettings["KnockOutIneligibleTopicName"];

            //create test subscriptions to receive messages
            namespaceManager = NamespaceManager.CreateFromConnectionString(connectionString);
            if (!namespaceManager.SubscriptionExists(topicFleKoEligible, TEST_MESSAGE_SUB))
            {
                namespaceManager.CreateSubscription(topicFleKoEligible, TEST_MESSAGE_SUB);
            }
            if (!namespaceManager.SubscriptionExists(topicFleKoIneligible, TEST_MESSAGE_SUB))
            {
                namespaceManager.CreateSubscription(topicFleKoIneligible, TEST_MESSAGE_SUB);
            }

            //subscriber client koeligible
            subClientKoEligible = SubscriptionClient.CreateFromConnectionString(connectionString, topicFleKoEligible, TEST_MESSAGE_SUB);
            subClientKoIneligible = SubscriptionClient.CreateFromConnectionString(connectionString, topicFleKoIneligible, TEST_MESSAGE_SUB);
            options = new OnMessageOptions()
            {
                AutoComplete = false,
                AutoRenewTimeout = TimeSpan.FromMinutes(1),
            };
        }

        [TestMethod]
        public void E2EPOCTopicTestLT50()
        {
            Random rnd = new Random();
            string customerId = rnd.Next(1, 49).ToString();
            FurtherLendingCustomer sentCustomer = new FurtherLendingCustomer { CustomerId = customerId };
            BrokeredMessage sentMessage = new BrokeredMessage(sentCustomer.ToJson());
            sentMessage.CorrelationId = Guid.NewGuid().ToString();
            string messageId = sentMessage.MessageId;
            topicClient.Send(sentMessage);

            Boolean messageRead = false;
            //wait for a message to arrive on the ko eligible subscription
            while ((message = subClientKoEligible.Receive(TimeSpan.FromMinutes(2))) != null)
            {
                //read message
                string messageString = message.GetBody<String>();
                //deserialize
                FurtherLendingCustomer receivedCustomer = JsonConvert.DeserializeObject<FurtherLendingCustomer>(messageString.Substring(messageString.IndexOf("{")));
                //assertion
                Assert.AreEqual(sentCustomer.CustomerId, receivedCustomer.CustomerId, "verify customer id");
                //pop message
                message.Complete();
                messageRead = true;
                //leave loop after processing one message
                break;
            }
            if (!messageRead)
                Assert.Fail("Didn't receive any message after 2 mins");
        }
    }
}
As the official documentation states about SubscriptionClient.Receive(TimeSpan):
Parameters
serverWaitTime (TimeSpan): The time span the server waits for receiving a message before it times out.
Null can be returned by this API if the operation exceeded the specified timeout, or if the operation succeeded but there are no more messages to be received.
Per my test, if a message is sent to the topic and then delivered to your subscription within your specified serverWaitTime, then you can receive the message no matter whether it arrives at the target topic before or after you call Receive.
When using OnMessage the test seemed to always exit prematurely (around 30 seconds), which I guess perhaps means its inappropriate for this test anyway.
[TestMethod]
public void ReceiveMessages()
{
    subClient.OnMessage(msg =>
    {
        System.Diagnostics.Trace.TraceInformation($"{DateTime.Now}:{msg.GetBody<string>()}");
        msg.Complete();
    });
    Task.Delay(TimeSpan.FromMinutes(5)).Wait();
}
SubscriptionClient.OnMessage is basically a loop invoking Receive. After calling OnMessage, you need to keep the method from returning (hence the Task.Delay above) for as long as you want messages to be pumped. Here is a blog about event-driven message programming for Windows Azure Service Bus that you could refer to.
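If you do want to test with OnMessage, one option is to let the callback signal the test and wait on that signal instead of a fixed delay - a sketch under the same setup as above:
// The OnMessage callback completes a TaskCompletionSource, so the test
// blocks until a message actually arrives or a timeout elapses, instead
// of exiting as soon as the method returns.
[TestMethod]
public void ReceiveOneMessageViaOnMessage()
{
    var tcs = new TaskCompletionSource<string>();
    subClient.OnMessage(msg =>
    {
        tcs.TrySetResult(msg.GetBody<string>());
        msg.Complete();
    });
    Assert.IsTrue(tcs.Task.Wait(TimeSpan.FromMinutes(2)), "No message received within 2 minutes");
}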
Additionally, I found that your topicClient for sending messages and the subClientKoEligible for receiving a message are not targeted at the same topic path.

BinaryFormatter.Deserialize hangs the whole thread

I have two simple applications connected via named pipes. On the client side I have a method that checks for incoming messages every n ms:
private void timer_Elapsed(Object sender, ElapsedEventArgs e)
{
    IFormatter f = new BinaryFormatter();
    try
    {
        object temp = f.Deserialize(pipeClient); //hangs here
        result = (Func<T>)temp;
    }
    catch
    {
    }
}
In the beginning the pipe is empty, and the f.Deserialize method hangs the whole application. I can't even check whether the pipe is empty. Is there any solution to this problem?
UPD: I tried XmlSerializer; everything's the same.
The thing that is hanging on you is the pipeClient.Read() call that both formatters make internally.
This is the expected behavior of a Stream when you call Read:
Return Value
Type: System.Int32
The total number of bytes that are read into buffer. This might be less than the number of bytes requested if that number of bytes is not currently available, or 0 if the end of the stream is reached.
So the stream will block till data shows up or throw a timeout exception if it is the type of stream that supports timeouts. It will never just return without reading anything unless you are "at the end of the stream" which for a PipeStream (or similarly a NetworkStream) only happens when the connection is closed.
The way you solve the problem is to not use a timer to check whether a new message has arrived; instead, start up a background thread and have it sit in a loop. It will block itself until a message shows up.
class YourClass
{
    private readonly PipeStream _pipeClient;

    public YourClass(PipeStream pipeClient)
    {
        _pipeClient = pipeClient;
        var task = new Task(MessageHandler, TaskCreationOptions.LongRunning);
        task.Start();
    }

    //SNIP...

    private void MessageHandler()
    {
        while (_pipeClient.IsConnected)
        {
            IFormatter f = new BinaryFormatter();
            try
            {
                object temp = f.Deserialize(_pipeClient);
                result = (Func<T>)temp;
            }
            catch
            {
                //You really should do some kind of logging.
            }
        }
    }
}

SignalR notification system

This is my first time playing around with SignalR. I am trying to build a notification system where the server checks at regular intervals to see if there is something (by querying the database) to broadcast, and if there is, it broadcasts it to all the clients.
I came across this post on Stack Overflow and was wondering if modifying the code to make a DB call at a particular interval was indeed the right way to do it. If not, is there a better way?
I did see a lot of notification-related questions posted here, but none with any code in them. Hence this post.
This is the exact code that I am using:
public class NotificationHub : Hub
{
    public void Start()
    {
        Thread thread = new Thread(Notify);
        thread.Start();
    }

    public void Notify()
    {
        List<CDCNotification> notifications = new List<CDCNotification>();
        while (true)
        {
            notifications.Clear();
            notifications.Add(new CDCNotification()
            {
                Server = "Server A",
                Application = "Some App",
                Message = "This is a long ass message and amesaadfasd asdf message",
                ImgURL = "../Content/Images/accept-icon.png"
            });
            Clients.shownotification(notifications);
            Thread.Sleep(20000);
        }
    }
}
I am already seeing some weird behaviour where the notifications arrive more often than they should. Even though I am supposed to get one every 20 s, I get one roughly every 4-5 s, and I get multiple messages.
Here is my client:
var notifier = $.connection.notificationHub;
notifier.shownotification = function (data) {
    $.each(data, function (i, sample) {
        var output = Mustache.render("<img class='pull-left' src='{{ImgURL}}'/> <div><strong>{{Application}}</strong></div><em>{{Server}}</em> <p>{{Message}}</p>", sample);
        $.sticky(output);
    });
};
$.connection.hub.start(function () { notifier.start(); });
Couple of notes:
As soon as a second client connects to your server, there will be 2 threads sending notifications; therefore, if you have more than one client, you will see intervals smaller than 20 s.
Handling threads manually within ASP.NET is considered bad practice; you should avoid it if possible.
In general this smells a lot like polling, which is exactly the kind of thing SignalR lets you get rid of, since the server can signal the clients when something happens.
To solve the immediate problem you need to do something like this (again, threads in a web application are generally not a good idea):
public class NotificationHub : Hub
{
    public static bool initialized = false;
    public static object initLock = new object();

    public void Start()
    {
        if (initialized)
            return;
        lock (initLock)
        {
            if (initialized)
                return;
            Thread thread = new Thread(Notify);
            thread.Start();
            initialized = true;
        }
    }

    public void Notify()
    {
        List<CDCNotification> notifications = new List<CDCNotification>();
        while (true)
        {
            notifications.Clear();
            notifications.Add(new CDCNotification()
            {
                Server = "Server A",
                Application = "Some App",
                Message = "This is a long ass message and amesaadfasd asdf message",
                ImgURL = "../Content/Images/accept-icon.png"
            });
            Clients.shownotification(notifications);
            Thread.Sleep(20000);
        }
    }
}
The static initialized flag prevents multiple threads from being created. The locking around it ensures the flag is only set once.
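An equivalent lock-free variant can be sketched with Interlocked (an int flag replaces the bool, since Interlocked does not operate on bool):
// Interlocked.CompareExchange flips the flag from 0 to 1 exactly once,
// so only the first caller ever starts the background thread.
private static int started = 0;

public void Start()
{
    if (Interlocked.CompareExchange(ref started, 1, 0) == 0)
    {
        new Thread(Notify) { IsBackground = true }.Start();
    }
}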
I am working on the same task over here. Instead of continuously checking the database, I created my own events and listener, where an event is RAISED when a NOTIFICATION IS ADDED :) What do you think about that?
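A minimal sketch of that idea (all names here are hypothetical, and the hub-context call assumes a SignalR version that exposes GlobalHost):
// The data layer raises a .NET event when a notification is stored, and
// a handler pushes it to all connected clients - no polling loop needed.
public static class NotificationSource
{
    public static event Action<CDCNotification> NotificationAdded;

    public static void Add(CDCNotification n)
    {
        // ... persist the notification to the database ...
        var handler = NotificationAdded;
        if (handler != null)
            handler(n);
    }
}

// Wired up once at application start:
NotificationSource.NotificationAdded += n =>
    GlobalHost.ConnectionManager.GetHubContext<NotificationHub>()
              .Clients.All.shownotification(new[] { n });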
