EventHubBufferedProducerClient - can't enqueue new events after flushing - c#

I am working with the EventHubBufferedProducerClient class of the Azure SDK for .NET (Azure.Messaging.EventHubs v. 5.7.5); I need to send two groups of messages, with the second group starting after publishing the first.
No issues with the first group: I enqueue them and then use the FlushAsync method to make sure all the messages in the buffer are sent for publication.
When I try to enqueue a message of the second group, though, I receive an ObjectDisposedException: 'The CancellationTokenSource has been disposed.'.
NB: I do not use the EventHubProducerClient because I need to tailor the Partition Key to each message.
I also tried the following "toy code" (I hid the actual connection string and hub name for posting), to be sure the issue is not related to the processing of the data before and after publication; the issue also reproduces with this code.
static async Task Main(string[] args)
{
    EventHubBufferedProducerClient client = new EventHubBufferedProducerClient("connectionstring", "eventhubname");
    client.SendEventBatchFailedAsync += failedArgs =>
    {
        return Task.CompletedTask;
    };

    for (int i = 0; i < 3; i++)
    {
        EventData data = new EventData($"string {i}");
        await client.EnqueueEventAsync(data);
    }
    await client.FlushAsync();

    for (int i = 3; i < 6; i++)
    {
        EventData data = new EventData($"string {i}");
        await client.EnqueueEventAsync(data); // EXCEPTION HERE at the first iteration
    }
    await client.FlushAsync();
}
I know I can "solve" this by creating a new instance of the client to enqueue and publish the second group of events, but I'm not sure it's the best solution; I'm also quite curious to understand why the issue happens.
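For reference, the workaround mentioned above would look roughly like this; a minimal sketch that simply replaces the flushed client with a fresh instance (same connection string and failure handler as in the toy code):

await client.FlushAsync();
await client.CloseAsync(); // done with the first client

// Workaround sketch: a brand-new client for the second group of events.
client = new EventHubBufferedProducerClient("connectionstring", "eventhubname");
client.SendEventBatchFailedAsync += _ => Task.CompletedTask;

for (int i = 3; i < 6; i++)
{
    await client.EnqueueEventAsync(new EventData($"string {i}"));
}
await client.FlushAsync();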
Thank you for your help!

Related

Azure WebPubSub memory leak

I have a Web App and I noticed that after a while it restarts due to lack of memory.
After researching, I found that memory increases after sending a message via WebPubSub.
This can be easily reproduced (sample):
using Azure.Core;
using Azure.Messaging.WebPubSub;

var connectionString = "<ConnectionString>";
var hub = "<HubName>";
var serviceClient = new WebPubSubServiceClient(connectionString, hub);

Console.ReadKey();

Task[] tasks = new Task[100];
for (int i = 0; i < 100; i++)
{
    tasks[i] = serviceClient.SendToUserAsync("testUser", RequestContent.Create("Message"), ContentType.TextPlain);
}
Task.WaitAll(tasks);

Console.ReadKey();
During debugging, I noticed that a new HttpConnection is created during each send and the old one remains. Thus, when sending 100 messages, 100 connections are created, and during the next send even more are created.
I concluded that the problem is in the WebPubSub SDK, but maybe it's not so, and someone can help me solve it.
UPD:
When sending 100 messages in parallel, 100 connections are created in the HttpConnectionPool, hence the sharp increase in unmanaged memory. The next time you send 100 messages, the existing connections from the pool are used and no new connections are created, but a lot of data is still allocated on the heap.
So now I'm finding out how long connections live in the pool, what data lives on the heap, and how to free it. Calling GC.Collect() after Task.WaitAll(tasks) solves the problem.
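On the pooled-connection question: if the goal is to bound how long connections live, one option is to give the client a transport whose SocketsHttpHandler has explicit pool limits. This is only a sketch, assuming the SDK version in use exposes the standard Azure.Core Transport option on WebPubSubServiceClientOptions; connectionString and hub are the variables from above:

using System;
using System.Net.Http;
using Azure.Core.Pipeline;
using Azure.Messaging.WebPubSub;

// Bound how long pooled HTTP connections may live and sit idle.
var handler = new SocketsHttpHandler
{
    PooledConnectionLifetime = TimeSpan.FromMinutes(2),
    PooledConnectionIdleTimeout = TimeSpan.FromSeconds(30)
};

var options = new WebPubSubServiceClientOptions
{
    Transport = new HttpClientTransport(new HttpClient(handler))
};

var serviceClient = new WebPubSubServiceClient(connectionString, hub, options);

PooledConnectionLifetime and PooledConnectionIdleTimeout are standard .NET handler knobs; shorter values trade connection reuse for faster reclamation.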
Both Response and RequestContent are IDisposable. Does using the code below help?
var serviceClient = new WebPubSubServiceClient(connectionString, hub);

Task[] tasks = new Task[100];
for (int i = 0; i < 100; i++)
{
    tasks[i] = SendToUser(serviceClient);
}
Task.WaitAll(tasks);

Console.ReadKey();

private static async Task SendToUser(WebPubSubServiceClient serviceClient)
{
    using var content = RequestContent.Create("Message");
    using var response = await serviceClient.SendToUserAsync("testUser", content, ContentType.TextPlain);
}

How to correctly close the EventHubReceiver when working with Azure IoT in C#?

I am writing an application that should be able to read and display IoT data. The basic functionality works for me with this code (I removed some checks etc. so that the code would be shorter):
public void Run()
{
    _eventHubClient = EventHubClient.CreateFromConnectionString(ConnectionString, "messages/events");
    var partitions = _eventHubClient.GetRuntimeInformation().PartitionIds;

    cts = new CancellationTokenSource();
    var tasks = partitions.Select(partition => ReceiveMessagesFromDeviceAsync(partition, cts.Token));
    Task.WaitAll(tasks.ToArray());
}

public void Cancel()
{
    cts.Cancel();
}

private async Task ReceiveMessagesFromDeviceAsync(string partition, CancellationToken cancellationToken)
{
    var eventHubReceiver = _eventHubClient.GetDefaultConsumerGroup().CreateReceiver(partition, DateTime.UtcNow);
    while (true)
    {
        if (cancellationToken.IsCancellationRequested)
        {
            break;
        }

        var eventData = await eventHubReceiver.ReceiveAsync(new TimeSpan(0, 0, 1));
        if (eventData == null)
        {
            // ReceiveAsync returns null when the one-second wait times out.
            continue;
        }

        var data = Encoding.UTF8.GetString(eventData.GetBytes());
        Console.WriteLine("Message received at {2}. Partition: {0} Data: '{1}'", partition, data, eventData.EnqueuedTimeUtc);
    }
}
My problem is that I need to be able to stop and restart the connection again. Everything works okay until the moment when I start it for the 6th time; then I get the QuotaExceededException: "Exceeded the maximum number of allowed receivers per partition in a consumer group which is 5". I have googled the exception and I understand the problem; what I don't know is how to correctly close the previous receivers after I close a connection, so that I can open it again later. I have tried calling
eventHubReceiver.Close()
in the Cancel() method but it didn't seem to help.
I would be very grateful for any hints on how to solve this, thanks.
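One approach worth sketching, written against the code above (and assuming this older SDK's receiver inherits the usual Close/CloseAsync from ClientEntity): close each receiver inside the task that owns it, in a finally block, so cancelling the token also releases the receiver's slot on the service:

private async Task ReceiveMessagesFromDeviceAsync(string partition, CancellationToken cancellationToken)
{
    var eventHubReceiver = _eventHubClient.GetDefaultConsumerGroup().CreateReceiver(partition, DateTime.UtcNow);
    try
    {
        while (!cancellationToken.IsCancellationRequested)
        {
            var eventData = await eventHubReceiver.ReceiveAsync(new TimeSpan(0, 0, 1));
            if (eventData == null)
            {
                continue; // the one-second receive timed out; check for cancellation and try again
            }
            var data = Encoding.UTF8.GetString(eventData.GetBytes());
            Console.WriteLine("Message received at {2}. Partition: {0} Data: '{1}'", partition, data, eventData.EnqueuedTimeUtc);
        }
    }
    finally
    {
        // Closing the receiver releases its slot in the consumer group, so a
        // restart does not hit the 5-receivers-per-partition quota.
        await eventHubReceiver.CloseAsync();
    }
}

Because Run() waits for all the receive tasks, every receiver should be closed by the time a new Run() begins, so the sixth start no longer exceeds the quota.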

What is wrong with my Code (SendPingAsync)

I'm writing a C# ping application.
I started with a synchronous ping method, but I figured out that pinging several servers with one click takes more and more time.
So I decided to try the asynchronous method.
Can someone help me out?
public async Task<string> CustomPing(string ip, int amountOfPackets, int sizeOfPackets)
{
    // timeout
    int Timeout = 2000;

    // packet size logic
    string packet = "";
    for (int j = 0; j < sizeOfPackets; j++)
    {
        packet += "b";
    }
    byte[] buffer = Encoding.ASCII.GetBytes(packet);

    // time variable
    long ms = 0;

    // main loop
    using (Ping ping = new Ping())
    {
        for (int i = 0; i < amountOfPackets; i++)
        {
            PingReply reply = await ping.SendPingAsync(ip, Timeout, buffer);
            ms += reply.RoundtripTime;
        }
    }

    return (ms / amountOfPackets + " ms");
}
I defined a "Server" class (IP or host, city, country).
Then I create a server list:
List<Server> ServerList = new List<Server>()
{
    new Server("www.google.de", "Some City", "Some Country")
};
Then I loop through this list and try to call the method like this:
foreach (var server in ServerList)
    ListBox.Items.Add("The average response time of your custom server is: " + server.CustomPing(server.IP, amountOfPackets, sizeOfPackets));
Unfortunately, this behaves quite differently from the synchronous method: at the point where my method should return the value, it returns
System.Threading.Tasks.Task`1[System.String]
Since you have an async method, it will return the task when it is called like this:
Task<string> task = server.CustomPing(server.IP, amountOfPackets, sizeOfPackets);
When you add it directly to your ListBox while concatenating it with a string, it will use the ToString method, which by default prints the full class name of the object. This should explain your output:
System.Threading.Tasks.Task`1[System.String]
The [System.String] part actually tells you the return type of the task's result. This is what you want, and to get it you need to await it, like this:
foreach (var server in ServerList)
    ListBox.Items.Add("The average response time of your custom server is: " + await server.CustomPing(server.IP, amountOfPackets, sizeOfPackets));
1) this has to be done in another async method, and
2) this will mess up all the parallelism that you are aiming for, because it will wait for each method call to finish.
What you can do instead is start all the tasks one after the other, collect the returned tasks, and wait for all of them to finish. Preferably you would do this in an async method like a click handler:
private async void Button1_Click(object sender, EventArgs e)
{
    Task<string>[] allTasks = ServerList.Select(server => server.CustomPing(server.IP, amountOfPackets, sizeOfPackets)).ToArray();

    // WhenAll will wait for all tasks to finish and return the return values of each method call
    string[] results = await Task.WhenAll(allTasks);

    // now you can execute your loop and display the results:
    foreach (var result in results)
    {
        ListBox.Items.Add(result);
    }
}
The class System.Threading.Tasks.Task<TResult> is a helper class for multitasking. While it resides in the Threading namespace, it works for threadless multitasking just as well. Indeed, if you see a function return a Task, you can usually use it for any form of multitasking. Tasks are very agnostic in how they are used. You can even run one synchronously, if you do not mind the little extra overhead of a Task that is not doing a lot.
Task helps with some of the most important rules/conventions of multitasking:
Do not accidentally swallow exceptions. Thread-based multitasking is notoriously good at doing just that.
Do not use the result after a cancellation.
It does that by throwing exceptions in your face (usually an AggregateException) if you try to access the Result property when convention tells us you should not do that.
As well as having all those other useful properties for multitasking.
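A tiny illustration of that second rule; a self-contained sketch where cancelling the token up front makes Task.Run hand back an already-canceled task:

using System;
using System.Threading;
using System.Threading.Tasks;

var cts = new CancellationTokenSource();
cts.Cancel(); // cancel before the task is even started

// Task.Run observes the already-canceled token and returns a canceled task.
var task = Task.Run(() => 42, cts.Token);

try
{
    // Accessing Result on a canceled task throws an AggregateException
    // wrapping a TaskCanceledException.
    Console.WriteLine(task.Result);
}
catch (AggregateException ex) when (ex.InnerException is TaskCanceledException)
{
    Console.WriteLine("Task was canceled; its Result must not be used.");
}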

Under what conditions does .NET decide to cancel a PLINQ task?

I've had a look around for some documentation, and all the documentation appears to suggest the following...
PLINQ tasks do not have a default timeout
PLINQ tasks can deadlock, and .NET/TPL will never cancel them for you to release the deadlock
However, in my application this is not the case. I cannot replicate a minimal example in a simpler console app, but I will show the closest reproduction I have tried, and the actual PLINQ query that is being cancelled prior to completion. There is no exception of any type other than the task-cancellation exceptions, which all suggest that the task was requested to be cancelled directly (no other exception occurred anywhere). There is no cancellation code anywhere in my application, so it can only be .NET deciding to cancel it for me?
I am aware that the examples below hammer out HttpClients; this is not the cause, as the console example shows.
Attempt at reproduction; this code never cancels, despite the epic running time:
var j = 0;
var ints = new List<int>();
for (int i = 0; i < 5000; i++)
{
    ints.Add(i);
}

ints.AsParallel().WithExecutionMode(ParallelExecutionMode.ForceParallelism).WithDegreeOfParallelism(8).ForAll(n =>
{
    int count = 0;
    while (count < 100 && j == 0)
    {
        var httpClient = new HttpClient();
        var response = httpClient.GetStringAsync("https://hostname/").GetAwaiter().GetResult();
        count++;
        Thread.Sleep(1000);
    }
});
But this code usually gets a couple of minutes in before stalling. I'm not sure whether it stalls first and .NET then notices this and cancels it (but that violates point 2...), or whether .NET just cancels it because it took too long overall (but that's point 1...). usernames contains 5000 elements in the running code, hence the console test with 5000 elements. Papi just wraps HttpClient and uses SendAsync to send an HttpRequestMessage off; I can't see SendAsync being the cause though.
importSuccess = usernames.AsParallel().WithExecutionMode(ParallelExecutionMode.ForceParallelism).WithDegreeOfParallelism(8).All(u =>
{
    var apiClone = new Papi(api.Path);
    apiClone.SessionId = api.SessionId;
    var userDetails = String.Format("stuff {0}", u);
    var importResponse = apiClone.Post().WithString(userDetails, "application/json").Send(apiClone.SessionId).GetAwaiter().GetResult();
    if (importResponse.IsSuccessStatusCode)
    {
        var body = importResponse.Content.ReadAsStringAsync().GetAwaiter().GetResult().ToLower();
        if (body == "true")
        {
            return true;
        }
    }
    return false;
});
Again, the above PLINQ query throws a TaskCanceledException after a couple of minutes; no other exception is observed. Has anyone ever had a PLINQ query be cancelled without writing the cancellation themselves, or does anyone know why this might happen?
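One cause worth ruling out, though it is only my suggestion and not something stated in the question: HttpClient enforces a default timeout of 100 seconds, and a timeout surfaces as a TaskCanceledException even though no user code requested cancellation, which would fit "a couple of minutes" with no cancellation code anywhere. A small diagnostic sketch (on .NET 5 and later a timeout carries a TimeoutException as the inner exception):

using System;
using System.Net.Http;
using System.Threading.Tasks;

var httpClient = new HttpClient(); // Timeout defaults to 100 seconds

try
{
    var response = httpClient.GetStringAsync("https://hostname/").GetAwaiter().GetResult();
}
catch (TaskCanceledException ex)
{
    // On .NET 5+ an HttpClient timeout wraps a TimeoutException;
    // a cooperative cancellation does not.
    Console.WriteLine(ex.InnerException is TimeoutException
        ? "HttpClient timed out; nothing requested cancellation."
        : "Canceled for some other reason.");
}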

How to do async 'paged' processing of EF select result

I'm writing something that loads records from SQL Server onto an Azure queue. The thing is, the number of items in the select result might be very large, so I would like to start the queueing while data is still being retrieved.
I'm trying to leverage EF6 (async), all async methods, and the TPL for parallel enqueueing. So I have:
// This defines the queue that Generator will publish to and
// QueueManager will read from. More info:
// http://msdn.microsoft.com/en-us/library/hh228601(v=vs.110).aspx
var queue = new BufferBlock<ProcessQueueItem>();

// Configure the queue listener first
var result = this.ReceiveAndEnqueue(queue);

// Start the generation process
var tasks = generator.Generate(batchId);
ReceiveAndEnqueue is simple:
private async Task ReceiveAndEnqueue(ISourceBlock<ProcessQueueItem> queue)
{
    while (await queue.OutputAvailableAsync())
    {
        var processQueueItem = await queue.ReceiveAsync();
        await this.queueManager.Enqueue(processQueueItem);
        this.tasksEnqueued++;
    }
}
The generator's Generate() signature is as follows:
public void Generate(Guid someId, ITargetBlock<ProcessQueueItem> target)
which calls the SendAsync() method on the target to place new items. What I'm doing right now is dividing the total number of results into "batches", loading them in, and sending them asynchronously until all is done:
public void Generate(Guid batchId, ITargetBlock<ProcessQueueItem> target)
{
    var accountPromise = this.AccountStatusRepository.GetAccountsByBatchId(batchId.ToString());
    accountPromise.Wait();
    var accounts = accountPromise.Result;

    // Batch configuration
    var itemCount = accounts.Count();
    var numBatches = (int)Math.Ceiling((double)itemCount / this.batchSize);
    Debug.WriteLine("Found {0} items that will be put in {1} batches of {2}", itemCount, numBatches, this.batchSize);

    for (int i = 0; i < numBatches; i++)
    {
        var itemsToTake = Math.Min(this.batchSize, itemCount - currentIndex);
        Debug.WriteLine("Running batch - skip {0} and take {1}", currentIndex, itemsToTake);

        // Take a subset of the items and place them onto the queue
        var batch = accounts.Skip(currentIndex).Take(itemsToTake);

        // Generate a list of tasks to enqueue the items
        var taskList = new List<Task>(itemsToTake);
        taskList.AddRange(batch.Select(account => target.SendAsync(account.AsProcessQueueItem(batchId))));

        // Return control when all tasks have been enqueued
        Task.WaitAll(taskList.ToArray());
        currentIndex = currentIndex + this.batchSize;
    }
}
This works. However, my colleague remarked: "Can't we make the interface simpler, and let Generate() look like this?"
public Task<IEnumerable<ProcessQueueItem>> Generate(Guid someId)
A lot cleaner, with no dependency of the Generate method on the TPL library. I totally agree; I'm just afraid that if I do that, I'm going to have to call
var result = Generate(someId).Result;
at some point before enqueueing all the items. That will make me wait until ALL the stuff is loaded in and is in memory.
So what my question comes down to is: how can I start using EF query results as soon as they "drip in" from a select? As if EF would run a "yield" over the results, if you catch my drift.
EDIT
I think I made a thinking mistake. EF loads items lazily by default. So I can just return all the results as IQueryable<>, but that doesn't mean they're actually loaded from the DB. I'll then iterate over them and enqueue them.
EDIT 2
Nope, that doesn't work, since I need to transform the objects from the database in the Generate() method...
OK, this is what I ended up with:
public IEnumerable<ProcessQueueItem> Generate(Guid batchId)
{
    var accounts = this.AccountStatusRepository.GetAccountsByBatchId(batchId.ToString());
    foreach (var accountStatus in accounts)
    {
        yield return accountStatus.AsProcessQueueItem(batchId);
    }
}
The repository returns an IEnumerable over just some DataContext.Stuff.Where(...). The generator uses the extension method to transform each entity to the domain model (ProcessQueueItem), which by means of yield is immediately handed to the caller of the method, which will start calling the QueueManager to begin queueing.
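On the calling side this streams straight into the dataflow block. A minimal sketch of that consumption, reusing the queue, generator, and ReceiveAndEnqueue names from above (inside an async method):

var queue = new BufferBlock<ProcessQueueItem>();
var receiveTask = this.ReceiveAndEnqueue(queue);

// Items are sent as they are yielded; the full result set is never held in memory.
foreach (var item in generator.Generate(batchId))
{
    await queue.SendAsync(item);
}

queue.Complete();   // signal that no more items will arrive
await receiveTask;  // ReceiveAndEnqueue exits once the queue is drained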
