Does Kafka allow reading a message's content asynchronously? - C#

Does anyone know whether the Kafka clients allow sending and reading the content of a message in an asynchronous way?
I am currently using the Confluent.Kafka producers and consumers in C#, which allow making an async call containing the whole message payload. However, it would be interesting to publish the value of a message (content of several MBs) asynchronously, and to be able to read it asynchronously as well, instead of receiving the message in one shot.
using (var producer = new ProducerBuilder<string, string>(config).Build())
{
    await producer.ProduceAsync(_topic, new Message<string, string> { Key = _file, Value = <pass async content here> });
}
Is there any way of achieving this?
Thanks

The producer needs to flush the event and send it to the broker, where it is written to disk and (optionally) acknowledged, all for the entire record, before consumers can read it.
If you'd like to stream chunks of files, you should send them as binary, but you will need to chunk the data yourself and deal with potential ordering problems in the consumer (e.g. two clients streaming the same filename, your key, at the same time, with interwoven values). A sketch of this approach follows.
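A minimal sketch of that manual chunking, assuming a byte[]-valued producer and a hypothetical 1 MB chunk size; the chunkIndex/chunkCount header names are illustrative, not part of any Kafka convention:
const int ChunkSize = 1024 * 1024; // hypothetical chunk size
using (var producer = new ProducerBuilder<string, byte[]>(config).Build())
{
    byte[] data = File.ReadAllBytes(_file);
    int chunkCount = (data.Length + ChunkSize - 1) / ChunkSize;
    for (int i = 0; i < chunkCount; i++)
    {
        int offset = i * ChunkSize;
        int length = Math.Min(ChunkSize, data.Length - offset);
        var chunk = new byte[length];
        Array.Copy(data, offset, chunk, 0, length);
        var message = new Message<string, byte[]>
        {
            Key = _file, // shared key keeps all chunks on one partition
            Value = chunk,
            Headers = new Headers
            {
                { "chunkIndex", BitConverter.GetBytes(i) },
                { "chunkCount", BitConverter.GetBytes(chunkCount) }
            }
        };
        // Awaiting each chunk in turn preserves ordering from this producer.
        await producer.ProduceAsync(_topic, message);
    }
}
The consumer would buffer chunks by key until chunkCount of them have arrived, which is also where the interleaving problem mentioned above has to be handled.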
The recommendation for dealing with files (i.e. large binary content) is to not send them through Kafka, but rather to upload them to a shared filesystem or object store and then send the URI as a string through an event (the claim-check pattern).
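For illustration, a minimal sketch of that claim-check pattern, assuming Azure Blob Storage as the shared store; the Azure.Storage.Blobs client and the "uploads" container are assumptions, not part of the original answer:
// Upload the large payload to shared storage first.
var blobClient = new BlobClient(storageConnectionString, "uploads", _file);
using (var stream = File.OpenRead(_file))
{
    await blobClient.UploadAsync(stream);
}

// Then publish only the URI through Kafka.
using (var producer = new ProducerBuilder<string, string>(config).Build())
{
    await producer.ProduceAsync(_topic, new Message<string, string>
    {
        Key = _file,
        Value = blobClient.Uri.ToString() // consumers download from here
    });
}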

In HttpContent, why is ReadFromJsonAsync an async method?

I have a question about HttpContent.ReadFromJsonAsync (link).
A common way to make a request to an endpoint looks like this:
var response = await Http.GetAsync(path...);
if (!response.IsSuccessStatusCode)
{
    // [do something]
}
else
{
    myObject = (await response.Content.ReadFromJsonAsync<MyObject>())!;
}
I am having a hard time understanding why, when I want to get the object, it is necessary to perform another await operation. In my head, I already got the response in the GetAsync method, and all that is missing is to deserialize the object. I understand that the await is not related to converting JSON to an object, but is a network thing.
I tried to find the reason for this behaviour in the official MS docs, but I couldn't find anything.
Searching on Google, I found that even though the content of the response may already have been received by the time ReadFromJsonAsync is called, the method still needs to read the content of the response from the network and parse it in order to deserialize it into the specified object type.
But I cannot figure out why this is necessary, nor "where" the content is "waiting to be read". I know that response.Content.ReadFromJsonAsync doesn't make a new network request, so what's going on behind the scenes?
Is it temporarily stored in some socket (or is it nonsense to think so)? Is there a time limit for reading it?
Thanks!
There are, for example, overloads of HttpClient.GetAsync that accept an HttpCompletionOption parameter, which allows GetAsync to complete as soon as the response headers have been read, while the content of the response has not yet been completely received.
Therefore, it makes sense for ReadFromJsonAsync to be async: reading HttpResponseMessage.Content can become an I/O-bound operation that includes waiting for and receiving the rest of the response body.
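For example, a minimal sketch assuming an injected httpClient and the same MyObject type as in the question:
var response = await httpClient.GetAsync(path, HttpCompletionOption.ResponseHeadersRead);
if (response.IsSuccessStatusCode)
{
    // This await can genuinely perform network I/O: only the headers are
    // guaranteed to have arrived, so reading the body may still block.
    var myObject = await response.Content.ReadFromJsonAsync<MyObject>();
}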
HTTP breaks the response up into multiple frames, and makes a clear distinction between the metadata and the response data. The first set of frames contains the status code and headers, and you can quite often decide what to do with the response based on this information alone.
Mozilla documentation for HTTP Messages
In C#, the response object can be returned from a request as soon as all of the header frames have been received.
By default, the GetAsync method will wait for all the response data to be returned. However, there are overloads that allow you to start processing the response as soon as the headers are received.
Why not just wait for all the data in the first place?
The response content could be massive! Imagine you want to download a 4 GB image and save it to a file on the local PC. If the HTTP implementation waited for all the data frames to be received, you would end up using at least 4 GB of RAM to buffer the data.
Instead of waiting for the data frames, the content is exposed through a Stream. Data is appended to a buffer, and is read from the buffer by the application on demand. Reading from the stream is an asynchronous operation, because you may be waiting for more frames to be received. The key difference here is that the buffer can have a relatively small size limit. If the response contains more data than can fit into the buffer, you'll have to read from the stream multiple times - which is normal use of the Stream API.
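As an illustration, a minimal sketch of streaming a large download straight to disk; the URL and file name are placeholders:
using var response = await httpClient.GetAsync(
    "https://example.com/large-image.bin",
    HttpCompletionOption.ResponseHeadersRead);
response.EnsureSuccessStatusCode();

// CopyToAsync reads the content stream in small buffered chunks, so the
// whole multi-gigabyte body is never held in memory at once.
using var contentStream = await response.Content.ReadAsStreamAsync();
using var fileStream = File.Create("large-image.bin");
await contentStream.CopyToAsync(fileStream);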

Compressed messages are not showing up in Event Hub

I am using a GZip compressor to compress messages and trying to push the data into Event Hubs from my console application. When I push the messages, no exception is thrown, but the data never shows up. This is the code I wrote to push the data into the event hub after compressing:
var eventHubClient = EventHubClient.CreateFromConnectionString("");
var eventData = new EventData(System.Text.Encoding.UTF8.GetBytes(result.Result.Value));
eventData.Properties.Add("Compression","GZip");
eventHubClient.SendAsync(eventData);
eventHubClient.Close();
The difficulty is not related to compressing the message body. You're not awaiting the completion of the SendAsync call before calling Close. This effectively cancels the send, because you've closed the network connection that it was using.
To ensure the send is complete before closing, you'll need to adjust your code to something similar to:
var eventHubClient = EventHubClient.CreateFromConnectionString("<< CONNECTION STRING >>");
var eventData = new EventData(System.Text.Encoding.UTF8.GetBytes(result.Result.Value));
eventData.Properties.Add("Compression","GZip");
await eventHubClient.SendAsync(eventData);
eventHubClient.Close();
For better performance, I'd also suggest creating the EventHubClient once and treating it as a singleton for the lifetime of your application.
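For example, a minimal sketch of that singleton approach for a long-running console application; the provider class name is illustrative:
static class EventHubClientProvider
{
    // Lazy<T> gives thread-safe, on-demand creation of the shared client.
    private static readonly Lazy<EventHubClient> _client =
        new Lazy<EventHubClient>(() =>
            EventHubClient.CreateFromConnectionString("<< CONNECTION STRING >>"));

    public static EventHubClient Instance => _client.Value;
}
The application would then call EventHubClientProvider.Instance.SendAsync(...) everywhere and close the client once at shutdown.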

C# - Bluetooth programming

In my program, I send a command to a device and it sends some data back. Whenever the data is available, the following event handler gets invoked.
private void notify_change(GattCharacteristic sender, GattValueChangedEventArgs args)
{
    lock (this._dataRec)
    {
        notCounter++;
        // Copy the characteristic value out of the buffer into a byte array.
        byte[] bArray = new byte[args.CharacteristicValue.Length];
        DataReader.FromBuffer(args.CharacteristicValue).ReadBytes(bArray);
        // Queue the decoded string and wake up any waiting reader.
        dataQ.Enqueue(Encoding.ASCII.GetString(bArray));
        Monitor.Pulse(this._dataRec);
    }
}
Sometimes I notice that this gets called before the previous data has been read (or something like that; judging from the list of commands, data seems to be missing). It looks like the buffer gets overwritten whenever the function is invoked. Is there a way to append data to the buffer rather than overwriting it?
In my program, I send a command to a device and it sends some data back.
Since you trigger the responses with your calls, be sure that you don't overflow the buffer on the device side. The minimal theoretical gap between two packets is 7.5 ms, but in practice it is about 20-30 ms. So if you are sending in a loop, your device will lose (overwrite) packets if the gap is smaller than your hardware setup can handle.
The GATT protocol has two options for receiving unsolicited information: notifications and indications. Notifications come without acknowledgement from the receiver, whereas indications require an acknowledgment. So you probably need indications, and if that is not an option, you need to ensure that the data is copied before the next notification arrives.
See the corresponding sections of the Bluetooth specification for the details; a subscription sketch follows.
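If you go the indications route, a minimal sketch of subscribing with the Windows.Devices.Bluetooth.GenericAttributeProfile API used in the question, assuming characteristic is the GattCharacteristic you already hold:
// Ask the device to use indications (acknowledged) instead of notifications.
GattCommunicationStatus status = await characteristic
    .WriteClientCharacteristicConfigurationDescriptorAsync(
        GattClientCharacteristicConfigurationDescriptorValue.Indicate);

if (status == GattCommunicationStatus.Success)
{
    // The same handler signature as notify_change above works here.
    characteristic.ValueChanged += notify_change;
}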

Is it possible to publish multiple messages at once using the RabbitMQ client for C#?

Right now, our publishing code for large amounts of messages looks like so:
foreach (var message in messages)
{
    publisher.Publish(message);
}
Is there a way to send more than one message over the channel at once?
publisher.Publish(messages);
or as so if we chunk
var chunks = messages.Chunk(100);
foreach (var chunk in chunks)
{
    publisher.Publish(chunk);
}
With the current version of RabbitMQ (3.8.2), you can send batched messages with the C# client SDK as follows:
var basicPublishBatch = channel.CreateBasicPublishBatch();
basicPublishBatch.Add("exchange", "routeKey", false, null, new byte[] { 1 });
basicPublishBatch.Add("exchange", "routeKey", false, null, new byte[] { 1 });
basicPublishBatch.Publish();
Check this PR:
https://github.com/rabbitmq/rabbitmq-dotnet-client/pull/368
For RabbitMQ, the AMQP protocol is asynchronous for both produce and consume operations, so it is not clear what an out-of-the-box batch consumer endpoint would buy you.
What you can do is create endpoints for chunked messages and process them inside your workflow, if that speeds up your operations. One solution, for example, would be to include a batching component before the publisher class and send custom chunked messages, as sketched below.
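Putting the two answers together, a minimal sketch of the chunked approach from the question on top of CreateBasicPublishBatch; the exchange and routing key are placeholders:
var chunks = messages.Chunk(100);
foreach (var chunk in chunks)
{
    var batch = channel.CreateBasicPublishBatch();
    foreach (var message in chunk)
    {
        // The body must be a byte payload; UTF-8 encode string messages.
        batch.Add("exchange", "routeKey", false, null, Encoding.UTF8.GetBytes(message));
    }
    batch.Publish(); // hands the whole chunk of 100 to the channel at once
}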

Named pipes: Bad data when reading too slowly?

I've built a C# application using NamedPipeServerStream/NamedPipeClientStream, and I'm having trouble serializing objects when the client reads too slowly. Here is a minimal example.
The client (pure consumer):
// Pipe name is a placeholder.
var pipeClient = new NamedPipeClientStream(".", "testpipe", PipeDirection.In);
pipeClient.Connect();
var formatter = new BinaryFormatter();
StreamReader reader = null;
if (useString)
    reader = new StreamReader(pipeClient);
while (true)
{
    Thread.Sleep(5000); // simulate a slow client
    if (useString)
    {
        string line = reader.ReadLine();
    }
    else
    {
        Event evt = (Event)formatter.Deserialize(pipeClient);
    }
}
The server (pure producer):
while (true)
{
    i++;
    Thread.Sleep(1000);
    if (useStrings)
    {
        StreamWriter writer = new StreamWriter(m_pipeServer);
        writer.WriteLine("START data payload {0} END", i);
        writer.Flush();
    }
    else
    {
        BinaryFormatter formatter = new BinaryFormatter();
        formatter.Serialize(m_pipeServer, new Event(i));
    }
    m_pipeServer.Flush();
    m_pipeServer.WaitForPipeDrain();
}
And "Event" is a simple class with a single property tracking the payload: i.
The behavior I expect is simply "missing" events when the server produces too much for the client to read. However, in the string case I get a random ordering of events:
START data payload 0 END
START data payload 1 END
START data payload 2 END
START data payload 4 END
START data payload 15 END
START data payload 16 END
START data payload 24 END
START data payload 3 END
START data payload 35 END
START data payload 34 END
START data payload 17 END
And for the binary serializer I get an exception (this is less surprising):
SerializationException: Binary stream '0' does not contain a valid BinaryHeader. Possible causes are invalid stream or object version change between serialization and deserialization.
Lastly, note that if I remove the call to Sleep on the client, everything works fine: all events are received, in order (as expected).
So I'm trying to figure out how to serialize binary events over a named pipe when the client may read too slowly and miss events. In my scenario, missing events is completely fine. However, I'm surprised that the string events come out of order but intact, instead of truncated (due to buffer rollover) or simply dropped.
The binary formatter case is actually the one I care about. I'm trying to serialize and pass relatively small events (~300 bytes) across a named pipe to multiple consumer programs, but I'm concerned those clients won't be able to keep up with the volume.
How do I properly produce/consume these events across a named pipe if we exhaust the buffer? My desired behavior is simply dropping events that the client can't keep up with.
I wouldn't trust the transport layer (i.e. the pipe) to drop packets that the client can't keep up with. I would create a circular queue on the client. A dedicated thread would then service the pipe and put messages on the queue. A separate client thread (or multiple threads) would service the queue. Doing it this way, you should be able to keep the pipe clean.
Since it's a circular queue, newer messages will overwrite older ones. A client reading will always get the oldest message that hasn't yet been processed.
Creating a circular queue is pretty easy (a sketch follows), and you can even make it implement IProducerConsumerCollection<T> so that you can use it as the backing store for BlockingCollection<T>.
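A minimal sketch of such a drop-oldest circular queue, assuming (as the question states) that losing the oldest unprocessed events is acceptable; the dedicated pipe-reader thread calls Write and consumer threads call TryRead:
class CircularQueue<T>
{
    private readonly T[] _buffer;
    private readonly object _gate = new object();
    private int _head, _count;

    public CircularQueue(int capacity) { _buffer = new T[capacity]; }

    public void Write(T item)
    {
        lock (_gate)
        {
            _buffer[(_head + _count) % _buffer.Length] = item;
            if (_count == _buffer.Length)
                _head = (_head + 1) % _buffer.Length; // full: overwrite oldest
            else
                _count++;
        }
    }

    public bool TryRead(out T item)
    {
        lock (_gate)
        {
            if (_count == 0) { item = default(T); return false; }
            item = _buffer[_head]; // always the oldest unprocessed message
            _head = (_head + 1) % _buffer.Length;
            _count--;
            return true;
        }
    }
}
Because Write never blocks, the pipe-reader thread can keep draining the pipe at full speed, and slow consumers simply see older messages disappear.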
